ADR-064: Project-Scope Search Index Architecture — Pre-1.0 Commitment for Surface 5

Status: v0.1 (drafted 2026-05-16). Pre-1.0 Architect-lane ADR per MUX/UI Round 2 (Surface 5 user-facing search is post-1.0; this ADR commits to the index architecture before 1.0 so new surfaces have a known indexing shape)
Date: 2026-05-16 (v0.1; third ADR in the MUX/UI Round 2 sequence: ADR-062 (e2e suite) → ADR-063 (audit-envelope read) → ADR-064 (search index))
Supersedes: None (extends existing fragmented search surfaces with a coherent project-wide architecture)
Issues: #786 (GLUE-HISTORY-DIFF: existing conversation search via title; predecessor); #1090 (MUX/UI gap: Round 2 ratified Surface 5 deferral with pre-1.0 index ADR commitment)
Related: ADR-054 (Cross-Session Memory Architecture: Layer 3 User History uses a similar text-search shape and is a prior reference instance), ADR-062 (Project-Scope E2E Suite: Phase 5 cross-host extension informs BYOC-distributed indexing), ADR-063 (User-Facing Audit Envelope Read Surface: audit envelope searchability is a forward-question this ADR scopes), Pattern-072 (Registries that Grow into Architectural Shapes, Proven: per-surface indexing declarations are the same-shape registry pattern)
Deciders: Chief Architect (drafted); Lead Developer (implementation refinement when Surface 5 ships); CIO (methodology shelf consideration for per-surface indexing declarations)


Context and Problem Statement

The project has accumulated fragmented search surfaces across multiple domains:

| Surface | Current implementation | Index type |
| --- | --- | --- |
| Conversation list filter | web/api/routes/conversations.py:262, search: str query param; Postgres LIKE on title | Text (title only) |
| User history search | web/api/routes/user_history.py:109, /api/v1/users/me/history/search (title/preview/topics) | Text (multi-field) |
| Knowledge graph query | web/api/routes/knowledge_graph.py:266, search_term on node names/descriptions | Text (graph nodes) |
| Document ingestion | services/knowledge_graph/ingestion.py: ChromaDB vector store + Postgres FTS for metadata | Vector + Text |
| Editorial draft/calendar | services/editorial/{draft,calendar}.py: Postgres FTS | Text |

Each surface chose its own indexing shape based on local needs. No project-wide commitment exists for:

MUX/UI Round 2 ratified Surface 5 (user-facing search interface) as post-1.0 because the unified-search UX is its own project. This ADR provides the architect-lane commitment that lands pre-1.0: the index architecture. Surfaces that ship between now and 1.0 then have a known indexing shape to follow, rather than each one negotiating an ad-hoc indexing decision at filing time.

Why pre-1.0 commitment matters even though Surface 5 is post-1.0

Three reasons the index decision can’t wait:

  1. New surfaces ship between now and 1.0. Every new surface that touches user-visible data raises an implicit search question. Without a commitment, each surface either skips indexing (cumulative drift; post-1.0 search shows uneven coverage) or invents its own (cumulative fragmentation; post-1.0 search has to bridge inconsistencies).
  2. BYOC distribution model coupling. PDR-005 BYOC implies cross-host coordination; “where does the search index live” has a different answer when the user is on Claude Desktop vs. Slack vs. the FastAPI surface. Committing to the architecture now keeps 1.0 from boxing us into a single-host index assumption.
  3. Pattern-072 (Proven) recognition. Per-surface indexing declarations follow the same registry shape as task_type, safe_surface(), and the probe registry. Naming the registry shape pre-1.0 means new declarations land in a consistent place rather than re-discovering the shape per surface.

Decision

Principle

Search index architecture is a per-surface declaration recorded in a project-wide registry. Storage is layered across Postgres FTS (text-structured) and ChromaDB (vector-semantic), access control is filtered at query time, and freshness is synchronous for text indices and asynchronous for vector indices. Cross-host search distribution is deferred to BYOC Phase 5 (per ADR-062’s cross-host trigger), but the architecture is forward-compatible with it.

The principle is a synthesis of three commitments: layered storage, declarative registry, and access control discipline.

Three-Layer Decision Tree (Per-Surface)

When a new surface ships, three questions decide its indexing:

Q1 — Should this surface be searchable?

Default: NO (every surface added to the search index adds maintenance + freshness + access-control cost). Making a surface searchable requires an explicit decision based on three criteria:

Surfaces NOT searched: internal request IDs, audit envelopes (Pattern-071 defensive posture), system telemetry, transient state.

Q2 — Which index type does this surface use?

Two layers, chosen by data shape:

Default to Postgres FTS unless semantic similarity is the use case.
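The Q2 default can be written down as a small selection function. This is a minimal sketch rather than project code: the function name choose_index_type and its two boolean parameters are hypothetical, and only the three index_type values come from this ADR.

```python
from typing import Literal

IndexType = Literal["postgres_fts", "chromadb_vector", "both"]


def choose_index_type(
    needs_keyword_match: bool, needs_semantic_similarity: bool
) -> IndexType:
    """Pick an index layer by data shape, defaulting to Postgres FTS.

    Hypothetical helper: the two flags are illustrative stand-ins for
    whatever criteria a surface owner actually evaluates.
    """
    if needs_semantic_similarity and needs_keyword_match:
        return "both"
    if needs_semantic_similarity:
        return "chromadb_vector"
    # Default branch: structured text, or no stated need at all.
    return "postgres_fts"
```

Keeping the default in the final branch means a surface that states no preference still lands on Postgres FTS, which matches the rule above.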

Q3 — Freshness model for this surface?

Two models, chosen by index type:
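As an illustration of the synchronous-text, asynchronous-vector split named in the Principle, the write path might look like the sketch below. Everything here is an assumption made for illustration: the in-memory dicts stand in for a Postgres FTS table and a ChromaDB collection, fake_embed is a placeholder for a real embedding call, and the function names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

text_index: Dict[str, str] = {}            # stand-in for a Postgres FTS-backed table
vector_index: Dict[str, List[float]] = {}  # stand-in for a ChromaDB collection
pending_embeddings: Deque[Tuple[str, str]] = deque()  # work deferred to a background job


def fake_embed(text: str) -> List[float]:
    """Placeholder embedding; a real system would call an embedding model."""
    return [float(len(text))]


def write_document(doc_id: str, text: str) -> None:
    """Handle a write: text index updated in-line, vector index deferred."""
    text_index[doc_id] = text                  # sync_on_write: visible immediately
    pending_embeddings.append((doc_id, text))  # async: embedded later


def drain_embedding_queue() -> int:
    """Process deferred embeddings; returns how many documents were embedded."""
    processed = 0
    while pending_embeddings:
        doc_id, text = pending_embeddings.popleft()
        vector_index[doc_id] = fake_embed(text)
        processed += 1
    return processed
```

Under this shape, a keyword search sees a document as soon as the write returns, while semantic search sees it only after the queue has been drained. How eagerly the queue is drained is where an async_eager versus async_lazy distinction would plausibly apply.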

Per-Surface Index Declaration Registry (Pattern-072 Shape)

Following Pattern-072 (Registries that Grow into Architectural Shapes, Proven via #1094), each searchable surface declares its index shape in a central registry. Proposed location: services/search/index_declarations.py (or analogous).

from dataclasses import dataclass
from typing import Callable, Literal


@dataclass
class IndexDeclaration:
    surface: str  # the surface name (e.g., "conversations", "user_history", "knowledge_graph_nodes")
    enabled: bool  # whether this surface is in the project-wide search index
    index_type: Literal["postgres_fts", "chromadb_vector", "both"]
    freshness: Literal["sync_on_write", "async_eager", "async_lazy"]
    access_control: Callable  # query-time filter; takes (user_id, raw_results) → filtered_results
    notes: str  # rationale for inclusion / exclusion / configuration choices

The registry serves as:

The registry is the third+ application of the Pattern-072 shape (after the task_type registry and the probe registry from ADR-062). The pattern-recognition trigger for promoting the registry shape to a “standard architectural primitive” has fired multiple times across distinct surface domains.
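A minimal sketch of how such a registry might be populated and consumed follows. It assumes a module-level dict keyed by surface name; register, searchable_surfaces, owner_only, and deny_all are hypothetical names, and the two example entries (including the placeholder index settings on the disabled one) are illustrative rather than ratified declarations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Literal


@dataclass(frozen=True)
class IndexDeclaration:
    surface: str
    enabled: bool
    index_type: Literal["postgres_fts", "chromadb_vector", "both"]
    freshness: Literal["sync_on_write", "async_eager", "async_lazy"]
    access_control: Callable
    notes: str


def owner_only(user_id: str, raw_results: List[dict]) -> List[dict]:
    """Keep only rows owned by the requesting user (illustrative filter)."""
    return [row for row in raw_results if row.get("user_id") == user_id]


def deny_all(user_id: str, raw_results: List[dict]) -> List[dict]:
    """Filter for surfaces that are deliberately excluded from search."""
    return []


INDEX_DECLARATIONS: Dict[str, IndexDeclaration] = {}


def register(decl: IndexDeclaration) -> None:
    """Add a declaration, refusing duplicate surface names."""
    if decl.surface in INDEX_DECLARATIONS:
        raise ValueError(f"surface already declared: {decl.surface}")
    INDEX_DECLARATIONS[decl.surface] = decl


register(IndexDeclaration(
    "conversations", True, "postgres_fts", "sync_on_write",
    owner_only, "Title search over per-user conversations.",
))
register(IndexDeclaration(
    "audit_envelopes", False, "postgres_fts", "sync_on_write",
    deny_all, "Excluded from search (Pattern-071 defensive posture).",
))


def searchable_surfaces() -> List[str]:
    """Names of surfaces that participate in project-wide search."""
    return sorted(name for name, d in INDEX_DECLARATIONS.items() if d.enabled)
```

Recording excluded surfaces alongside included ones is a design choice in this sketch: it keeps the rationale for exclusion in the same file a reviewer would check when a new surface ships.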

Access Control: Query-Time Filtering, Never Index-Time-Only

Search results are post-filtered by JWT user authorization at query time, not at index time alone.

Rationale:

Each surface’s access_control: Callable in the registry takes raw results and filters per user. Common shape: rejoin results against the source table with user-ownership check; drop entries the user can’t access.

Exception (acceptable index-time filtering): partition indices by user_id when the data is structurally per-user (e.g., user history is naturally user-scoped; the index is queryable only with user_id key). Cross-user-shared indices (knowledge graph nodes; document corpus) require query-time filtering.
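The two access-control shapes described in this section can be contrasted in a short sketch. The data and names are hypothetical: DOCUMENT_READERS stands in for an ownership check against the source table, and the per-user dict stands in for an index partitioned by user_id.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the source table's current access state.
DOCUMENT_READERS: Dict[str, Set[str]] = {
    "doc-1": {"alice", "bob"},
    "doc-2": {"bob"},
}


def filter_by_source_acl(user_id: str, raw_results: List[dict]) -> List[dict]:
    """Query-time filter for a cross-user-shared index.

    Each hit is re-checked against the source of truth, so a permission
    revoked after indexing still hides the result.
    """
    return [
        hit for hit in raw_results
        if user_id in DOCUMENT_READERS.get(hit["id"], set())
    ]


def search_user_partition(
    index_by_user: Dict[str, List[str]], user_id: str, term: str
) -> List[str]:
    """Index-time partitioning for structurally per-user data.

    The index can only be read through a user_id key, so other users'
    entries are never candidates in the first place.
    """
    needle = term.lower()
    return [text for text in index_by_user.get(user_id, []) if needle in text.lower()]
```

In the first function the index may return hits the user cannot see, and the filter drops them. In the second, the partition key does the work and no post-filter is needed.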

BYOC Posture (Forward-Compatible)

Server-side indexing remains canonical; cross-host search distribution is deferred to BYOC Phase 5 per ADR-062’s cross-host trigger.

When BYOC MCP server packaging ships:

This keeps the server as single-source-of-truth for index state; clients are stateless consumers. Per-host content (e.g., Slack messages that haven’t been mirrored to server-side substrate) is not in the unified search until the substrate-sync question is resolved (separate ADR or BYOC-side decision).

Out of Scope


Consequences

Positive

Negative / Tradeoffs

Non-Consequences (explicitly out of scope)


Validation

Existing Reference Instances

The architecture is grounded by five existing search-adjacent implementations, each demonstrating one aspect of the principle:

| Instance | Validates |
| --- | --- |
| Conversation list filter (conversations.py:262) | Postgres LIKE → upgrade path to Postgres FTS; per-user partitioning works |
| User history search (user_history.py:109) | Multi-field Postgres-based text search; per-user query-time filtering |
| Knowledge graph node query (knowledge_graph.py:266) | Cross-entity text search; query-time access control |
| Document ingestion (knowledge_graph/ingestion.py) | ChromaDB vector store + Postgres FTS for metadata; layered storage in production |
| Editorial draft + calendar (editorial/{draft,calendar}.py) | Postgres FTS for structured text content |

Phase 2 implementation (when Surface 5 ships) folds these into the IndexDeclaration registry as the first five entries.
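To make the LIKE → Postgres FTS upgrade path in the first row concrete, the two query shapes might compare as sketched below. The table name conversations, the columns id, title, and user_id, and the psycopg-style %s placeholders are all assumptions; the actual query at conversations.py:262 is not reproduced here.

```python
from typing import Tuple


def like_title_query(user_id: str, term: str) -> Tuple[str, tuple]:
    """Substring match on title, scoped to one user (current shape, assumed)."""
    sql = (
        "SELECT id, title FROM conversations "
        "WHERE user_id = %s AND title LIKE %s"
    )
    return sql, (user_id, f"%{term}%")


def fts_title_query(user_id: str, term: str) -> Tuple[str, tuple]:
    """Full-text match on title, scoped to one user (possible upgrade)."""
    sql = (
        "SELECT id, title FROM conversations "
        "WHERE user_id = %s "
        "AND to_tsvector('english', title) @@ plainto_tsquery('english', %s)"
    )
    return sql, (user_id, term)
```

Both functions keep the user_id predicate, so the per-user scoping noted in the table carries over unchanged. In practice the to_tsvector expression would usually be backed by a GIN index or a stored tsvector column rather than computed per row.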

Pattern-072 (Proven) Recognition Trigger

The IndexDeclaration registry is the third+ application of the registry-as-architectural-shape primitive that Pattern-072 names (Proven via the #1094 close-out, 2026-05-15):

| Instance | Surface |
| --- | --- |
| 1 | task_type registry → model + handler dispatch |
| 2 | safe_surface() registry → permission-gating |
| 3 | Probe registry (ADR-062 Layer 1) → e2e suite |
| 4 | IndexDeclaration registry (this ADR) → search corpus management |

Pattern-072’s recognition discipline (typed enum, documented consumer set, explicit default policy, register-time validation) applies cleanly. IndexDeclaration is a dataclass (typed); the registry is a single file (documented consumers); enabled=False is the explicit default; and register-time validation at Phase 2 confirms that required fields are present.
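Register-time validation could take roughly the following form. This is a sketch under assumptions: validate_declaration is a hypothetical helper, and the specific checks (non-empty surface and notes, values drawn from the two Literal sets) are one plausible reading of “required fields present”, not a ratified rule set.

```python
from dataclasses import dataclass
from typing import Callable, Literal, get_args

IndexType = Literal["postgres_fts", "chromadb_vector", "both"]
Freshness = Literal["sync_on_write", "async_eager", "async_lazy"]


@dataclass
class IndexDeclaration:
    surface: str
    enabled: bool
    index_type: IndexType
    freshness: Freshness
    access_control: Callable
    notes: str


def validate_declaration(decl: IndexDeclaration) -> None:
    """Raise ValueError if a declaration is malformed (hypothetical helper)."""
    # Dataclass annotations are not enforced at runtime, so check values here.
    if not isinstance(decl.surface, str) or not decl.surface.strip():
        raise ValueError("surface must be a non-empty string")
    if not isinstance(decl.enabled, bool):
        raise ValueError("enabled must be a bool")
    if decl.index_type not in get_args(IndexType):
        raise ValueError(f"unknown index_type: {decl.index_type!r}")
    if decl.freshness not in get_args(Freshness):
        raise ValueError(f"unknown freshness: {decl.freshness!r}")
    if not callable(decl.access_control):
        raise ValueError("access_control must be callable")
    # Assumption: an empty rationale counts as a missing required field.
    if not isinstance(decl.notes, str) or not decl.notes.strip():
        raise ValueError("notes must record the rationale")
```

Deriving the allowed values from the Literal aliases with get_args keeps the type hints and the runtime check from drifting apart when a new index type or freshness model is added.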

ADR-054 (Cross-Session Memory) Reference Instance

ADR-054 Layer 3 (User History) uses the same text-search shape this ADR generalizes. The User History search at user_history.py:109 is one of the five reference instances above; ADR-054’s Layer 3 commitment to per-user text search is the structural precedent for ADR-064’s per-surface declarative approach.


Cross-references


Open Items (Phase 2+ work, post-1.0 or surface-specific; not gated by this ADR)

— Chief Architect, 2026-05-16 v0.1 (Pre-1.0 Architect-lane ADR per MUX/UI Round 2 Surface 5 ratification; commits to project-wide search index architecture before Surface 5 user-facing search ships post-1.0)