A platform whose product is LLM output has to be able to answer “what exactly did the model see, and what did it cost?” for any claim it ever produced. Seyn’s observability is built around one architectural rule:
Every LLM call in the platform goes through a single central inference function. No stage, extractor, or chat handler calls a model directly.
That one choke point is what makes everything below possible.
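A minimal sketch of what such a choke point could look like. Every name here (InferenceGateway, InferenceRecord, fake_provider) is hypothetical and chosen for illustration, not taken from Seyn's codebase. The sketch also assumes the prompt hash is computed over the template text rather than the rendered prompt.

```python
import hashlib
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceRecord:
    """One logged call. Field names are illustrative assumptions."""
    stage: str
    model: str
    prompt_version: str
    prompt_hash: str
    input_tokens: int
    output_tokens: int
    latency_s: float
    input_event_ids: tuple


class InferenceGateway:
    """The only object that holds a provider, so every call is logged."""

    def __init__(self, provider):
        # provider: callable (model, prompt) -> (text, input_tokens, output_tokens)
        self._provider = provider
        self.log = []

    def infer(self, *, stage, model, prompt_version, template, variables,
              input_event_ids):
        prompt = template.format(**variables)
        start = time.perf_counter()
        text, input_tokens, output_tokens = self._provider(model, prompt)
        latency_s = time.perf_counter() - start
        # Assumption: the hash covers the template, not the rendered prompt.
        prompt_hash = hashlib.sha256(template.encode("utf-8")).hexdigest()
        self.log.append(InferenceRecord(
            stage=stage, model=model, prompt_version=prompt_version,
            prompt_hash=prompt_hash, input_tokens=input_tokens,
            output_tokens=output_tokens, latency_s=latency_s,
            input_event_ids=tuple(input_event_ids)))
        return text


def fake_provider(model, prompt):
    # Stand-in for a real model client; counts whitespace-separated words.
    return f"[{model}] ok", len(prompt.split()), 2


gateway = InferenceGateway(fake_provider)
answer = gateway.infer(
    stage="extract", model="fast-tier", prompt_version="v3",
    template="Summarise: {events}", variables={"events": "login logout"},
    input_event_ids=["evt-1", "evt-2"])
```

Because stages receive the gateway rather than a model client, there is no code path that reaches a model without leaving a record.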

The inference log

Every call through the central function is recorded:
| Field | Why it matters |
| --- | --- |
| Model | Which model produced this output: fast-tier or frontier. |
| Prompt version | Prompts are versioned; a logged call pins the exact template, not “whatever the prompt was at the time.” |
| Prompt hash | Detects drift even within a version. |
| Token counts | Per-call cost accounting, aggregable per stage, per run, per org. |
| Latency | Wall-clock per call. |
| Input event IDs | The provenance link: the precise set of events the model reasoned over. |
The inference log does double duty by design: it is simultaneously the platform’s cost and quality audit trail and the middle link of the provenance chain. One mechanism, two guarantees; they can’t drift apart. Operators can browse inference logs per extraction run in the dashboard and inspect any stage’s actual prompts and outputs.
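To illustrate the double duty, the sketch below runs a cost query, a drift check, and a provenance lookup over the same rows. The row layout, key names, and sample values are hypothetical; the real log schema is not shown in this document.

```python
from collections import defaultdict

# Hypothetical log rows; the actual schema may differ.
LOG = [
    {"run": "run-1", "stage": "extract", "prompt_version": "v3",
     "prompt_hash": "aaa", "input_tokens": 1000, "output_tokens": 200,
     "input_event_ids": ["evt-1", "evt-2"]},
    {"run": "run-1", "stage": "synthesis", "prompt_version": "v1",
     "prompt_hash": "bbb", "input_tokens": 400, "output_tokens": 300,
     "input_event_ids": ["evt-2"]},
    {"run": "run-2", "stage": "extract", "prompt_version": "v3",
     "prompt_hash": "ccc", "input_tokens": 900, "output_tokens": 100,
     "input_event_ids": ["evt-3"]},
]


def tokens_by(rows, key):
    """Cost view: total tokens grouped by any logged dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["input_tokens"] + row["output_tokens"]
    return dict(totals)


def drifted_versions(rows):
    """Quality view: (stage, version) pairs logged with more than one hash."""
    hashes = defaultdict(set)
    for row in rows:
        hashes[(row["stage"], row["prompt_version"])].add(row["prompt_hash"])
    return sorted(key for key, seen in hashes.items() if len(seen) > 1)


def events_seen(rows, run):
    """Provenance view: every event ID the model reasoned over in a run."""
    return sorted({event for row in rows if row["run"] == run
                   for event in row["input_event_ids"]})


per_stage = tokens_by(LOG, "stage")
per_run = tokens_by(LOG, "run")
drift = drifted_versions(LOG)
run_1_events = events_seen(LOG, "run-1")
```

All three queries read the same records, which is why the cost trail and the provenance trail cannot disagree about which calls happened.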

Tracing

On top of the log, every call emits a trace to a self-hosted tracing instance (Langfuse). Observability data, including full prompts and responses, never leaves Seyn infrastructure. Each extraction run produces an end-to-end trace with one span per LLM call: organisation and run context, stage name, prompt version, model, full prompts and responses, token counts, and latency.
Tracing is best-effort by contract: if the tracing backend is unreachable, spans are dropped and the run completes normally. Observability must never become an availability dependency for the thing it observes.
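The best-effort contract can be sketched as a wrapper that swallows any tracing failure. The tracer interface below (an object with an emit method) is a hypothetical stand-in and is not the Langfuse client API.

```python
import logging

logger = logging.getLogger("tracing")


class UnreachableTracer:
    """Hypothetical tracer that simulates a backend outage."""

    def emit(self, span):
        raise ConnectionError("tracing backend unreachable")


def emit_best_effort(tracer, span):
    """Try to emit a span; on any failure, drop it and report False."""
    try:
        tracer.emit(span)
        return True
    except Exception:
        # The span is lost, but the failure never reaches the caller.
        logger.warning("dropping span for stage %s", span.get("stage"))
        return False


def run_stage(tracer, stage, work):
    """Run the stage's work, then trace it without risking the result."""
    result = work()
    emit_best_effort(tracer, {"stage": stage})
    return result


result = run_stage(UnreachableTracer(), "extract", lambda: 42)
```

A real deployment would likely also need a timeout or a background queue so that a slow backend cannot stall the run; the sketch only covers the failure case.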

Versioned prompts

Prompt templates are versioned and reviewed like any other change. A rule extracted six months ago can still be traced to the exact prompt text that produced it. There is no “prompt was edited in a dashboard somewhere” failure mode.
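One way to get this property is an append-only registry in which a published (name, version) pair can never be bound to different text. This is a sketch under that assumption; PromptRegistry and its methods are hypothetical names, and the document does not say how Seyn stores its templates.

```python
import hashlib


class PromptRegistry:
    """Append-only store: a (name, version) pair is immutable once set."""

    def __init__(self):
        self._templates = {}

    def register(self, name, version, text):
        key = (name, version)
        existing = self._templates.get(key)
        if existing is not None and existing != text:
            # Editing in place is rejected; changes need a new version.
            raise ValueError(f"{name}@{version} already exists with different text")
        self._templates[key] = text

    def get(self, name, version):
        return self._templates[(name, version)]

    def content_hash(self, name, version):
        """Hash of the template text, comparable with a logged prompt hash."""
        return hashlib.sha256(self.get(name, version).encode("utf-8")).hexdigest()


registry = PromptRegistry()
registry.register("extract_rules", "v1", "List the rules in: {events}")
registry.register("extract_rules", "v2", "List every rule found in: {events}")

# An old log entry that pinned v1 still resolves to the original text.
old_text = registry.get("extract_rules", "v1")
```

Comparing content_hash against the hash stored in a logged call is one way to confirm that the template on record is the text that actually ran.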

Cost controls

  • Model routing per stage. High-volume stages run on a fast, cheap model tier; only synthesis gets the frontier model.
  • Prompt caching. Stable prompt prefixes are cached at the provider, cutting the input cost of repeat batches to roughly a tenth of the uncached cost.
  • Per-stage token budgets. Each stage operates under an explicit budget; oversized corpora are chunked rather than blowing through it.
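Routing and budgets from the list above can be sketched as a lookup table plus a greedy chunker. Apart from synthesis, the stage names are hypothetical, as are the budget value and the whitespace token counter, which stands in for a real tokenizer.

```python
# Hypothetical routing table: only synthesis gets the frontier tier.
STAGE_MODEL = {
    "extract": "fast-tier",
    "classify": "fast-tier",
    "synthesis": "frontier",
}


def model_for(stage):
    return STAGE_MODEL[stage]


def count_tokens(text):
    # Stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def chunk_to_budget(docs, budget):
    """Greedily pack documents into chunks that each fit the token budget."""
    chunks, current, used = [], [], 0
    for doc in docs:
        n = count_tokens(doc)
        if n > budget:
            # A single oversized document would need splitting upstream.
            raise ValueError("document exceeds the stage budget on its own")
        if used + n > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(doc)
        used += n
    if current:
        chunks.append(current)
    return chunks


docs = ["a b c", "d e f", "g h", "i j k l"]
chunks = chunk_to_budget(docs, budget=6)
```

Each chunk then becomes its own call through the central inference function, so the budget holds per call and the log still accounts for every token.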

Provenance

How the inference log anchors the audit chain.

Knowledge

The extraction stages whose calls all flow through here.