Architecture

Seyn separates write-time work from query-time work. Ingestion and analysis run asynchronously in the background and can take minutes; querying is synchronous and reads only what’s already been extracted and indexed. Almost every “why don’t I see X?” question comes down to which side of that line X is on.

The five layers

Layer	What it does	Where it’s documented
Connectors	Pull raw activity from client systems via delta sync, API polling, or upload. Read-only, always.	Connectors
Events	Transform heterogeneous payloads into one event schema; resolve actors to canonical identities.	Events
Extraction	Staged LLM analysis over events; produces a new knowledge library version per run.	Knowledge → Extraction
Knowledge	An append-only assertion substrate, projected into versioned rules and libraries, indexed four ways.	Knowledge
Querying	Hybrid retrieval serving chat, dashboard, MCP, and the public API.	Query

Two cross-cutting systems run through all five layers: the provenance chain (every layer records where its outputs came from) and multi-tenancy (every row in every layer is organisation-scoped).
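To make the two cross-cutting rules concrete, here is a minimal Python sketch under stated assumptions. The Row type, its org_id, layer, row_id and sources fields, and the scoped and trace helpers are hypothetical illustrations, not Seyn’s schema. The sketch applies tenant filtering before anything else, then walks provenance from a rule back to the raw record it came from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical shape: every row carries its tenant and its upstream sources.
    org_id: str                      # organisation scope, present on every row
    layer: str                       # e.g. "raw", "event", "rule"
    row_id: str
    sources: tuple[str, ...] = ()    # provenance: IDs of the rows it came from


def scoped(rows: list[Row], org_id: str) -> list[Row]:
    """Keep only one tenant's rows; org_id should come from verified auth claims."""
    return [r for r in rows if r.org_id == org_id]


def trace(row_id: str, rows: list[Row], org_id: str) -> list[str]:
    """Walk the provenance chain back to its roots, never leaving one organisation."""
    by_id = {r.row_id: r for r in scoped(rows, org_id)}
    chain: list[str] = []
    frontier = [row_id]
    while frontier:
        current = frontier.pop()
        if current in by_id and current not in chain:
            chain.append(current)
            frontier.extend(by_id[current].sources)
    return chain


rows = [
    Row("org_a", "raw", "raw1"),
    Row("org_a", "event", "ev1", ("raw1",)),
    Row("org_a", "rule", "rule1", ("ev1",)),
    Row("org_b", "rule", "rule9", ("ev9",)),  # another tenant's row
]
print(trace("rule1", rows, "org_a"))  # ['rule1', 'ev1', 'raw1']
```

Because scoping happens before the walk, a row belonging to another organisation can never appear in a provenance chain, even if IDs were to collide.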

The write path

A connector sync lands deduplicated raw records. Each sync produces a run record with per-stage counters you can watch in the dashboard.
Normalizers fan raw records out into events (who did what to which entity, when) and resolve actor identities across systems.
An extraction run executes the staged LLM analysis and writes a new knowledge library version, logging every model call as it goes.
Indexes are generated for the new knowledge: a semantic embedding, a full-text vector, parent context references, and graph edges between related rules.

Each step is a separate durable background job with its own retry and checkpoint behaviour, so one failing stage doesn’t silently corrupt the layers below it.
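A minimal Python sketch of that design follows. It assumes a hypothetical in-memory Checkpoint and illustrative stage names (sync, normalize, extract, index); the real system uses durable background jobs whose API this page doesn’t describe. Each stage retries on its own, a stage already recorded in the checkpoint is skipped when the pipeline resumes, and a stage that exhausts its retries stops the run so later stages never consume its partial output.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Checkpoint:
    # Hypothetical: remembers which stages finished. A real job system would
    # persist this durably instead of keeping it in memory.
    completed: set[str] = field(default_factory=set)


def run_stage(name: str, work: Callable[[], None], cp: Checkpoint,
              max_attempts: int = 3) -> bool:
    """Run one stage with its own retries; skip it if already checkpointed."""
    if name in cp.completed:
        return True  # resumed pipeline: this stage already succeeded
    for attempt in range(1, max_attempts + 1):
        try:
            work()
        except Exception:
            if attempt == max_attempts:
                return False  # retries exhausted; stage is NOT checkpointed
            continue
        cp.completed.add(name)
        return True
    return False


def run_write_path(stages: list[tuple[str, Callable[[], None]]],
                   cp: Checkpoint) -> list[str]:
    """Run stages in order and stop at the first one that exhausts its retries,
    so downstream stages never run on a failed stage's partial output."""
    finished = []
    for name, work in stages:
        if not run_stage(name, work, cp):
            break
        finished.append(name)
    return finished


# Usage: the extraction stage fails once (a transient error), then succeeds.
calls = {"extract": 0}


def flaky_extract() -> None:
    calls["extract"] += 1
    if calls["extract"] == 1:
        raise RuntimeError("transient model error")


def noop() -> None:
    pass


cp = Checkpoint()
stages = [("sync", noop), ("normalize", noop),
          ("extract", flaky_extract), ("index", noop)]
print(run_write_path(stages, cp))  # ['sync', 'normalize', 'extract', 'index']
```

Running the same stages again with the same checkpoint skips all four, which is what lets a resumed job pick up where it stopped instead of redoing earlier work.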

The read path

A query, whether it comes from chat, the dashboard, MCP, or the API, hits the query pipeline: three retrieval signals (structured, full-text, semantic) run concurrently, and their results are fused, reranked, and optionally expanded through the knowledge graph and parent context. The read path never calls back into the write path. If knowledge isn’t in the active library version, no amount of querying will surface it; you need an extraction run.
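The sketch below illustrates the fuse-then-rerank shape of that pipeline in Python. Several details are assumptions rather than documented behaviour: each signal returns a ranked list of rule IDs, fusion uses reciprocal rank fusion (RRF) with k = 60, and the reranker is any callable that may raise when unavailable. Graph and parent-context expansion are left out. The passthrough fallback mirrors the reranking row in the Technology table: if the reranker fails, the fused order is returned unchanged.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical interfaces: a signal maps a query to ranked rule IDs,
# a reranker reorders a candidate list for a query.
Signal = Callable[[str], list[str]]
Reranker = Callable[[str, list[str]], list[str]]


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score each ID by the sum of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, rule_id in enumerate(ranking, start=1):
            scores[rule_id] = scores.get(rule_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


def retrieve(query: str, signals: list[Signal],
             reranker: Optional[Reranker] = None) -> list[str]:
    # Run all retrieval signals concurrently; map() preserves signal order.
    with ThreadPoolExecutor(max_workers=max(1, len(signals))) as pool:
        rankings = list(pool.map(lambda signal: signal(query), signals))
    fused = rrf_fuse(rankings)
    if reranker is None:
        return fused
    try:
        return reranker(query, fused)
    except Exception:
        # Passthrough fallback: keep the fused order instead of failing the query.
        return fused


def structured(q: str) -> list[str]:
    return ["r1", "r2"]


def full_text(q: str) -> list[str]:
    return ["r2", "r3"]


def semantic(q: str) -> list[str]:
    return ["r2", "r1", "r4"]


def broken_reranker(q: str, ids: list[str]) -> list[str]:
    raise TimeoutError("reranker unavailable")


print(retrieve("refund policy", [structured, full_text, semantic], broken_reranker))
# ['r2', 'r1', 'r3', 'r4']
```

RRF is used here only because it combines rankings without calibrating scores across signals that use different scales; this page doesn’t say which fusion method Seyn actually uses.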

Technology

Concern	Choice	Why
Database	PostgreSQL 16 + pgvector	One database for relational data, vectors, and full-text search. No sync drift between a search index and the source of truth.
LLM	Claude (Anthropic)	Fast models for high-volume stages, frontier model for synthesis. Every call goes through one central, logged function.
Reranking	Cohere Rerank 3.5	With a passthrough fallback: querying degrades gracefully, never hard-fails.
Jobs	Durable background tasks	Per-org concurrency caps, checkpointed stages, cron watchdogs.
Auth	Clerk	Organisations map 1:1 to tenants; roles come from signed claims, never from request parameters.
Tracing	Self-hosted Langfuse	Full LLM traces without prompt data leaving Seyn infrastructure.
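One way to picture the “one central, logged function” for LLM calls is the hypothetical Python wrapper below. The call_llm name, the CallRecord fields (including org_id, added to reflect organisation scoping), the in-memory CALL_LOG, the placeholder model names and the client callable are all assumptions standing in for the real Anthropic client and log storage. The try/finally ensures a record is written whether the call succeeds or raises.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("llm")


@dataclass
class CallRecord:
    # Hypothetical log fields; a real system would also store trace IDs, tokens, etc.
    org_id: str
    stage: str
    model: str
    latency_s: float
    ok: bool


CALL_LOG: list[CallRecord] = []  # stand-in for a durable call-log table


def call_llm(org_id: str, stage: str, model: str, prompt: str,
             client: Callable[[str, str], str]) -> str:
    """Single entry point for model calls: every call is recorded,
    whether it returns normally or raises."""
    start = time.monotonic()
    ok = False
    try:
        result = client(model, prompt)
        ok = True
        return result
    finally:
        record = CallRecord(org_id, stage, model, time.monotonic() - start, ok)
        CALL_LOG.append(record)
        logger.info("llm call %s", record)


def fake_client(model: str, prompt: str) -> str:
    # Placeholder for a real API client.
    return f"[{model}] summary of {len(prompt)} chars"


print(call_llm("org_123", "synthesis", "frontier-model", "hello", fake_client))
# [frontier-model] summary of 5 chars
```

Funnelling every call through one function means stage-level model choices (a fast model for high-volume stages, a frontier model for synthesis) are just different arguments, and nothing can reach a model without leaving a record.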

When something looks wrong, walk the chain in order

Sync first

Did the connector sync actually complete? Check the sync-run status and counters on the connector detail page. If the sync is still running or has failed, the data hasn’t arrived.

Extraction second

Has an extraction run completed since that sync? New events don’t affect answers until a run produces a new library version.

Library third

Is the library version you expect actually active? Queries read the active version, not drafts.

Query last

Use explain mode to see exactly which signals matched and how results were ranked. If a rule exists but doesn’t surface, this shows you why.

This order resolves almost every “I connected a source but chat doesn’t know about it” debugging session. The cause is nearly always the sync or the extraction run, not querying.
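The same order can be written as a small decision function. This Python sketch assumes hypothetical SyncRun and ExtractionRun records with status, finished_at and library_version fields; the names and shapes are illustrative, not a documented Seyn API. It returns the first step in the chain that explains missing knowledge and only falls through to query debugging when the earlier steps check out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SyncRun:
    status: str                       # e.g. "running", "failed", "completed"
    finished_at: Optional[datetime]


@dataclass
class ExtractionRun:
    status: str
    finished_at: Optional[datetime]
    library_version: Optional[int]    # version this run produced, if any


def diagnose(sync: SyncRun, latest_extraction: Optional[ExtractionRun],
             active_version: int) -> str:
    """Return the first step in the chain that explains missing knowledge."""
    # 1. Sync: if it hasn't completed, the data hasn't arrived.
    if sync.status != "completed" or sync.finished_at is None:
        return "sync: not completed"
    # 2. Extraction: a run must have completed after the sync finished.
    run = latest_extraction
    if (run is None or run.status != "completed" or run.finished_at is None
            or run.finished_at < sync.finished_at):
        return "extraction: no completed run since sync"
    # 3. Library: queries read the active version, not drafts.
    if run.library_version != active_version:
        return "library: new version not active"
    # 4. Query: the knowledge is in the active library; inspect with explain mode.
    return "query: inspect with explain mode"


t0 = datetime(2024, 5, 1, 9, 0)
t1 = datetime(2024, 5, 1, 9, 30)
sync = SyncRun("completed", t0)
print(diagnose(sync, ExtractionRun("completed", t1, 7), active_version=6))
# library: new version not active
```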

Related

Events

The common schema everything is analysed in.

Query

The read path in full detail.
