# 08 -- Query Engine Specification

**Status:** Draft
**Authors:** tidalDB Engineering
**Date:** 2026-02-20
**Depends on:** Storage Engine (01), Entity Model (02), Signal System (03), Relationships (04), Cohorts (05), Text Retrieval (06), Vector Retrieval (07)
**Research:** `docs/research/ann_for_tidaldb.md`, `docs/research/tidaldb_signal_ledger.md`, `docs/research/tantivy.md`

---

## Table of Contents

1. [Overview](#1-overview)
2. [Query Operations](#2-query-operations)
3. [Query Parsing](#3-query-parsing)
4. [Query Planning](#4-query-planning)
5. [Execution Pipeline](#5-execution-pipeline)
6. [Query Composition](#6-query-composition)
7. [Filter Evaluation](#7-filter-evaluation)
8. [Pagination](#8-pagination)
9. [SUGGEST Operation](#9-suggest-operation)
10. [Query Context](#10-query-context)
11. [Performance Targets](#11-performance-targets)
12. [Query Caching](#12-query-caching)
13. [Error Handling and Fallbacks](#13-error-handling-and-fallbacks)
14. [Integration Architecture](#14-integration-architecture)
15. [Invariants and Correctness Guarantees](#15-invariants-and-correctness-guarantees)

---

## 1. Overview

The query engine is the brain of tidalDB. It is the single module that orchestrates every other subsystem -- storage, signals, text retrieval, vector retrieval, relationships, cohorts -- to answer one question: "given a user and a context, what content should they see, in what order?"

The query engine has three responsibilities:

1. **Parse** the query into a typed AST that captures all semantic intent.
2. **Plan** the execution strategy by choosing candidate generation, filter evaluation, and scoring approaches based on cost estimation.
3. **Execute** the plan by coordinating subsystem calls, assembling the result set, and enforcing diversity and pagination constraints.

### Design Principles

**The query engine is an orchestrator, not a data store.** It holds no data of its own. It reads from the signal ledger, the entity store, the text index, the vector index, the relationship store, and the cohort system. If the query engine process crashes, no data is lost and no recovery procedure is needed.

**Deep module, small interface.** The public API is three methods: `retrieve()`, `search()`, `suggest()`. Everything behind those methods -- query parsing, plan selection, selectivity estimation, pipeline orchestration, diversity enforcement, cursor management -- is internal. The caller provides a declarative query. The engine decides how to execute it.

**Composition is a first-class operation.** The most complex query in the system -- `SEARCH items QUERY "piano" WITHIN TRENDING FOR COHORT young_us_jazz WINDOW 24h` -- composes text/semantic search with cohort-scoped trending. This is not a special case bolted on after the fact. The planner treats composition as a standard plan shape, and the pipeline handles it without branching logic.

**No re-ranking by the application.** The result order from the query engine is the final order. The application renders it. If the application is tempted to re-rank, the ranking profile is wrong and should be fixed in schema.

---

## 2. Query Operations

tidalDB exposes three query operations. Each maps to a public method on `TidalDB`.

### 2.1 RETRIEVE

Feed generation, browse, related content, trending, following, notifications -- every discovery surface that does not involve a user-provided search string.

```rust
pub fn retrieve(&self, query: Retrieve) -> Result<Results, QueryError>;
```

RETRIEVE generates a ranked list by:
1. Generating candidates from the profile's candidate strategy (ANN, scan, relationship, cohort trending, or hybrid).
2. Filtering candidates against metadata predicates, user state, and relationship exclusions.
3. Loading signal state for surviving candidates.
4. Scoring via the ranking profile (boosts, penalties, gates, decay).
5. Enforcing diversity constraints.
6. Paginating and returning the result set.

The profile determines the candidate generation strategy. The caller never specifies how candidates are found -- only which profile to use and which filters to apply.

### 2.2 SEARCH

Text and semantic retrieval. The user provides a query string, optionally a query embedding, and the engine returns results ranked by a combination of text relevance, semantic similarity, signal strength, and personalization.

```rust
pub fn search(&self, query: Search) -> Result<Results, QueryError>;
```

SEARCH differs from RETRIEVE in one critical way: the candidate generation strategy always involves text and/or vector retrieval driven by the user's query, not by a profile's static candidate source. The ranking profile still controls scoring, but candidates are generated from the query string and/or embedding.

### 2.3 SUGGEST

Autocomplete and trending query suggestions. Returns completions for a partial query string.

```rust
pub fn suggest(&self, query: Suggest) -> Result<Vec<SuggestResult>, QueryError>;
```

SUGGEST is a lightweight operation that bypasses the full execution pipeline. It reads from the text index term dictionary, popular query tracking, and optionally the user's personal search history. See [Section 9](#9-suggest-operation) for details.

---

## 3. Query Parsing

### 3.1 Input Types

The query engine's API accepts Rust structs rather than a textual query language; the one free-text input is the `Search.query` string, which is parsed per Section 3.2. Parsing in this context means validating the input struct against the schema, resolving references (profile names, cohort names, field names), and constructing a typed AST that the planner can reason about.

```rust
/// A RETRIEVE query. Declarative: specifies what, not how.
pub struct Retrieve {
    /// Target entity type.
    pub entity: EntityKind,
    /// User context for personalization. None for unpersonalized queries.
    pub for_user: Option<UserId>,
    /// Surface context for the feedback loop.
    pub context: Option<String>,
    /// Named ranking profile. Determines candidate strategy and scoring.
    pub profile: String,
    /// Profile version. None = latest.
    pub profile_version: Option<u32>,
    /// Metadata and state filters.
    pub filters: Vec<Filter>,
    /// Sort mode override. None = use profile default.
    pub sort: Option<Sort>,
    /// Diversity constraints override. None = use profile default.
    pub diversity: Option<DiversitySpec>,
    /// Anchor item for related/similar queries.
    pub similar_to: Option<EntityId>,
    /// Explicit item exclusions (e.g., previously returned items).
    pub exclude_ids: Vec<EntityId>,
    /// Maximum results to return.
    pub limit: usize,
    /// Cursor from a previous result set for pagination.
    pub cursor: Option<Cursor>,
    /// Cohort scope for cohort-trending queries.
    pub for_cohort: Option<CohortRef>,
    /// Trending window for cohort-trending queries.
    pub window: Option<Window>,
}

/// A SEARCH query. Combines text/semantic retrieval with ranking.
pub struct Search {
    /// The user's query string. Parsed into a SearchQuery AST.
    pub query: String,
    /// Optional query embedding for semantic search.
    pub vector: Option<Vec<f32>>,
    /// Target entity type.
    pub entity: EntityKind,
    /// User context for personalization.
    pub for_user: Option<UserId>,
    /// Named ranking profile. Controls scoring after retrieval.
    pub profile: String,
    /// Metadata and state filters.
    pub filters: Vec<Filter>,
    /// Sort mode override.
    pub sort: Option<Sort>,
    /// Diversity constraints override.
    pub diversity: Option<DiversitySpec>,
    /// Maximum results to return.
    pub limit: usize,
    /// Cursor for pagination.
    pub cursor: Option<Cursor>,
    /// Composition: restrict search to trending candidates.
    pub within_trending: Option<WithinTrending>,
}

/// A SUGGEST query. Lightweight autocomplete.
pub struct Suggest {
    /// Partial query string (the prefix typed so far).
    pub prefix: String,
    /// User context for personalized suggestions.
    pub for_user: Option<UserId>,
    /// Target entity type for term completions.
    pub entity: Option<EntityKind>,
    /// Maximum suggestions to return.
    pub limit: usize,
}

/// Cohort reference: named, ad-hoc predicate, or auto-derived.
pub enum CohortRef {
    /// A named cohort defined in schema.
    Named(String),
    /// An inline predicate (ad-hoc cohort).
    Predicate(Predicate),
    /// Derive cohort automatically from the querying user's attributes.
    Auto,
}

/// Composition clause: restrict candidates to trending items.
pub struct WithinTrending {
    /// The cohort to scope trending to.
    pub cohort: CohortRef,
    /// The time window for trending computation.
    pub window: Window,
    /// Minimum velocity threshold for candidate inclusion.
    pub min_velocity: Option<f64>,
    /// Maximum candidates to draw from trending.
    pub max_candidates: Option<usize>,
}
```

### 3.2 Search Query Grammar

The `query` field of a `Search` struct is a user-typed string. The query parser transforms it into a `SearchQuery` AST. The grammar follows the specification in the Text Retrieval spec (06, Section 4) and the API reference.

**EBNF Grammar:**

```
query       ::= expression
expression  ::= and_expr ( 'OR'? and_expr )*     (* 'OR' may be omitted: implicit OR *)
and_expr    ::= unary_expr ( 'AND' unary_expr )*
unary_expr  ::= 'NOT' atom | '-' atom | atom
atom        ::= phrase | prefix | field_scope | hashtag | '(' expression ')' | term
phrase      ::= '"' <any text> '"'
prefix      ::= <word> '*'
field_scope ::= <field_name> ':' ( phrase | term )
hashtag     ::= '#' <word>
term        ::= <word>
```

**Operator precedence** (highest to lowest):
1. Grouping `()`
2. Field scope `field:`
3. NOT / `-`
4. AND
5. OR (implicit between bare terms)

**Default behavior:** Bare space-separated terms are treated as implicit OR with BM25 ranking. Documents matching more terms score higher. This matches user expectations from web search.
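To make the precedence rules and the implicit-OR default concrete, here is a minimal recursive-descent sketch. It is not the production parser: it covers terms, quoted phrases, `word*` prefixes, `NOT`/`-`, `AND`, explicit and implicit `OR`, and parentheses, and it omits `field:` scoping and `#hashtag` atoms for brevity. The `parse` function and the `Tok` type are illustrative names, and lowercasing stands in for the real text analyzer.

```rust
/// Subset of the SearchQuery AST; FieldScoped and Hashtag are omitted in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum SearchQuery {
    Term(String),
    Phrase(Vec<String>),
    Prefix(String),
    And(Vec<SearchQuery>),
    Or(Vec<SearchQuery>),
    Not(Box<SearchQuery>),
}

#[derive(Debug, Clone, PartialEq)]
enum Tok { Word(String), Phrase(String), LParen, RParen, Minus }

fn tokenize(input: &str) -> Result<Vec<Tok>, String> {
    let (mut toks, mut rest) = (Vec::new(), input.trim_start());
    while let Some(c) = rest.chars().next() {
        let consumed = match c {
            // '-' is an operator only at the start of a token ("lo-fi" stays one word).
            '(' | ')' | '-' => {
                toks.push(match c { '(' => Tok::LParen, ')' => Tok::RParen, _ => Tok::Minus });
                1
            }
            '"' => {
                let end = rest[1..].find('"').ok_or("unbalanced quote")?;
                if rest[1..=end].trim().is_empty() { return Err("empty phrase".into()); }
                toks.push(Tok::Phrase(rest[1..=end].to_string()));
                end + 2 // opening quote + phrase body + closing quote
            }
            _ => {
                let end = rest
                    .find(|ch: char| ch.is_whitespace() || "()\"".contains(ch))
                    .unwrap_or(rest.len());
                toks.push(Tok::Word(rest[..end].to_string()));
                end
            }
        };
        rest = rest[consumed..].trim_start();
    }
    Ok(toks)
}

struct Parser { toks: Vec<Tok>, pos: usize }

impl Parser {
    fn peek(&self) -> Option<&Tok> { self.toks.get(self.pos) }
    fn is_kw(&self, kw: &str) -> bool { matches!(self.peek(), Some(Tok::Word(w)) if w == kw) }
    fn collapse(mut v: Vec<SearchQuery>, wrap: fn(Vec<SearchQuery>) -> SearchQuery) -> SearchQuery {
        if v.len() == 1 { v.pop().unwrap() } else { wrap(v) }
    }

    // expression ::= and_expr ( 'OR'? and_expr )*   (adjacent and_exprs are OR-ed)
    fn expression(&mut self) -> Result<SearchQuery, String> {
        let mut parts = vec![self.and_expr()?];
        loop {
            if self.is_kw("OR") { self.pos += 1; }
            else if matches!(self.peek(), None | Some(Tok::RParen)) { break; }
            parts.push(self.and_expr()?);
        }
        Ok(Self::collapse(parts, SearchQuery::Or))
    }

    // and_expr ::= unary_expr ( 'AND' unary_expr )*
    fn and_expr(&mut self) -> Result<SearchQuery, String> {
        let mut parts = vec![self.unary()?];
        while self.is_kw("AND") {
            self.pos += 1;
            parts.push(self.unary()?);
        }
        Ok(Self::collapse(parts, SearchQuery::And))
    }

    // unary_expr ::= 'NOT' atom | '-' atom | atom
    fn unary(&mut self) -> Result<SearchQuery, String> {
        if self.is_kw("NOT") || self.peek() == Some(&Tok::Minus) {
            self.pos += 1;
            return Ok(SearchQuery::Not(Box::new(self.atom()?)));
        }
        self.atom()
    }

    fn atom(&mut self) -> Result<SearchQuery, String> {
        let tok = self.peek().cloned().ok_or("unexpected end of query")?;
        self.pos += 1;
        match tok {
            Tok::LParen => {
                let inner = self.expression()?;
                if self.peek() != Some(&Tok::RParen) { return Err("missing ')'".into()); }
                self.pos += 1;
                Ok(inner)
            }
            Tok::Phrase(p) => Ok(SearchQuery::Phrase(p.split_whitespace().map(str::to_lowercase).collect())),
            Tok::Word(w) => match w.strip_suffix('*') {
                Some(stem) if !stem.is_empty() => Ok(SearchQuery::Prefix(stem.to_lowercase())),
                _ => Ok(SearchQuery::Term(w.to_lowercase())),
            },
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}

pub fn parse(input: &str) -> Result<SearchQuery, String> {
    let mut p = Parser { toks: tokenize(input)?, pos: 0 };
    let q = p.expression()?;
    if p.pos != p.toks.len() { return Err("unexpected ')'".into()); }
    Ok(q)
}
```

With this shape, `a b AND c` parses as `Or([a, And([b, c])])`: AND binds tighter than the implicit OR, matching the precedence list above.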

### 3.3 Search Query AST

```rust
/// Parsed search query. Recursive AST for text retrieval.
///
/// The parser transforms user-typed query strings into this tree.
/// The text index (Tantivy) translates it into native query types.
/// The AST is also used by the query planner for cost estimation
/// (number of terms, phrase presence, field scoping).
pub enum SearchQuery {
    /// A single search term, lowercased and analyzed.
    Term(String),
    /// An exact phrase match (quoted string).
    Phrase(Vec<String>),
    /// A prefix match (wildcard). "pian*" matches "piano", "pianist".
    Prefix(String),
    /// Conjunction: all children must match.
    And(Vec<SearchQuery>),
    /// Disjunction: any child may match. BM25 scores accumulate.
    Or(Vec<SearchQuery>),
    /// Negation: exclude documents matching the child.
    Not(Box<SearchQuery>),
    /// Field-scoped query: restrict matching to a specific field.
    FieldScoped {
        field: FieldName,
        query: Box<SearchQuery>,
    },
    /// Hashtag match: equivalent to FieldScoped { field: "hashtags", query: Term(tag) }.
    Hashtag(String),
}
```

### 3.4 Validation and Resolution

Parsing produces a `ValidatedQuery` -- a fully-resolved internal representation that the planner consumes. Validation performs:

1. **Profile resolution:** Look up the named profile in the schema catalog. Return `QueryError::UnknownProfile` if not found.
2. **Filter validation:** Verify every filter field exists on the target entity type. Verify operator/type compatibility (e.g., `min` on a numeric field, not on a keyword). Return `QueryError::InvalidFilter` on mismatch.
3. **Cohort resolution:** If `for_cohort` or `within_trending.cohort` is `Named(name)`, look up the named cohort. If `Auto`, verify `for_user` is provided (cannot auto-derive without a user). Return `QueryError::UnknownCohort` or `QueryError::MissingUserForAutoCohort`.
4. **User existence:** If `for_user` is `Some(id)`, verify the user exists in the entity store. Return `QueryError::UnknownUser` if not found.
5. **Embedding availability:** If the profile's candidate strategy is `Ann` with `VectorSource::UserPreference`, verify the user has a preference vector. If not, fall back to the population default vector.
6. **Search query parsing:** Parse the `query` string into a `SearchQuery` AST. Return `QueryError::InvalidQuery` on syntax errors (unbalanced quotes, empty phrases).
7. **Cursor validation:** If a cursor is provided, verify its `query_hash` matches the current query. Return `QueryError::InvalidCursor` if the query parameters changed between pages.
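Step 7 relies on a cursor carrying a hash of the query parameters that define "the same query". The sketch below assumes a hypothetical `QueryFingerprint` holding a small subset of those parameters and uses the standard library's `DefaultHasher`. `DefaultHasher` output is not guaranteed to be stable across Rust releases, so a real cursor format that must survive restarts or upgrades would want a fixed, versioned hash function instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of the parameters that must not change between pages.
#[derive(Hash)]
pub struct QueryFingerprint<'a> {
    pub profile: &'a str,
    pub filters: &'a [String],
    pub for_user: Option<u64>,
}

/// Hypothetical cursor shape; only the field named in the spec is shown.
pub struct Cursor {
    pub query_hash: u64,
}

pub fn fingerprint_hash(fp: &QueryFingerprint) -> u64 {
    let mut h = DefaultHasher::new();
    fp.hash(&mut h);
    h.finish()
}

/// Rejects a cursor minted for a different query (maps to QueryError::InvalidCursor).
pub fn validate_cursor(cursor: &Cursor, fp: &QueryFingerprint) -> Result<(), String> {
    if cursor.query_hash == fingerprint_hash(fp) {
        Ok(())
    } else {
        Err("query parameters changed between pages".to_string())
    }
}
```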

```rust
/// Errors returned by the query engine.
pub enum QueryError {
    /// The named profile does not exist in the schema catalog.
    UnknownProfile(String),
    /// A filter references a field that does not exist on the target entity.
    InvalidFilter { field: String, reason: String },
    /// The named cohort does not exist.
    UnknownCohort(String),
    /// Auto cohort derivation requires a user context.
    MissingUserForAutoCohort,
    /// The user ID does not exist in the entity store.
    UnknownUser(UserId),
    /// The search query string has a syntax error.
    InvalidQuery(String),
    /// The pagination cursor is invalid or stale.
    InvalidCursor(String),
    /// The profile's candidate strategy requires a vector that is unavailable.
    MissingVector(String),
    /// An internal subsystem error (storage, index, signal).
    Internal(String),
    /// The database is still warming up and cannot serve queries yet.
    NotReady,
}
```

---

## 4. Query Planning

The planner transforms a validated query into an execution plan. The plan is a sequence of physical operations with estimated costs. The planner's job is to minimize end-to-end latency while guaranteeing correctness.

### 4.1 Candidate Generation Strategies

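The planner selects one of five candidate generation strategies based on the ranking profile's `candidate` field and the query type. The composed `SEARCH WITHIN TRENDING` plan (the ComposedSearch row in the comparison table) is not a sixth variant; it is carried in `ExecutionPlan.composition` (Section 6).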

```rust
/// Physical candidate generation strategy selected by the planner.
pub(crate) enum CandidateStrategy {
    /// Approximate nearest neighbor search via HNSW.
    /// Used for personalized feeds (user preference vector)
    /// and related content (anchor item embedding).
    Ann {
        query_vector: Vec<f32>,
        index: EntityKind,
        slot: EmbeddingSlot,
        top_k: usize,
        /// Filter predicate for the adaptive query planner.
        /// Selectivity determines strategy (in-graph, ACORN, brute-force).
        filter: Option<FilterBitmap>,
        /// Selected ANN strategy from the adaptive planner.
        ann_strategy: AnnStrategy,
    },

    /// Full scan with signal-based scoring.
    /// Used for trending (velocity sort), browse (field sort),
    /// and any query where candidates are not similarity-driven.
    Scan {
        entity: EntityKind,
        /// Pre-filter bitmap to narrow the scan.
        filter: Option<FilterBitmap>,
        /// Sort expression that determines scan order.
        sort: SortExpression,
    },

    /// Hybrid text + vector retrieval with fusion.
    /// Used for SEARCH queries with both text and vector.
    Hybrid {
        text_query: SearchQuery,
        query_vector: Option<Vec<f32>>,
        entity: EntityKind,
        text_top_k: usize,
        vector_top_k: usize,
        fusion: FusionStrategy,
        /// Filter predicate pushed into both retrieval legs.
        filter: Option<FilterBitmap>,
    },

    /// Relationship traversal for candidate generation.
    /// Used for following feeds, social graph scoped queries.
    Relationship {
        user_id: UserId,
        edge_kind: RelationshipKind,
        depth: TraversalDepth,
        /// Max fan-out per hop.
        max_fan_out: usize,
    },

    /// Cohort-scoped trending as candidate source.
    /// Used for "trending among people like me" queries.
    CohortTrending {
        cohort: ResolvedCohort,
        window: Window,
        min_velocity: f64,
        top_k: usize,
    },
}

/// ANN strategy selected by the adaptive query planner (from Vector Retrieval spec Section 9).
pub(crate) enum AnnStrategy {
    /// Standard HNSW search, no filter.
    Standard { ef_search: usize },
    /// In-graph predicate filter. Selectivity > 20%.
    InGraphFilter { ef_search: usize },
    /// Pre-filter + widened HNSW (ACORN-1). Selectivity 1-20%.
    Acorn { ef_search: usize },
    /// Pre-filter + brute-force. Selectivity < 1%.
    BruteForce,
}
```

#### 4.1.1 Strategy Comparison Table

| Strategy | Use Case | Candidate Source | Typical Latency | Candidate Count | Filter Push-Down | When to Choose |
|----------|----------|-----------------|-----------------|-----------------|-----------------|----------------|
| **Ann** | Personalized feed, related content, "more like this" | HNSW vector index (user pref vector or anchor embedding) | 8-15ms | 200-500 | Yes (in-graph / ACORN / brute-force via adaptive planner) | Profile specifies `Candidate::Ann` or query has `similar_to` |
| **Scan** | Trending, browse by field, top-N by signal | Signal/metadata sorted index, full entity scan | 5-20ms | 200-1000 | Yes (bitmap skip during scan) | Sort mode is signal-based (velocity, decay_score, count) or metadata-based (created_at, duration) |
| **Hybrid** | SEARCH queries with text + vector | Tantivy BM25 + HNSW ANN, parallel execution, RRF/linear fusion | 10-20ms (parallel) | 300-600 (merged) | Yes (pushed into both Tantivy fast-fields and HNSW predicate) | SEARCH query has both text query and query embedding |
| **Relationship** | Following feed, social-graph scoped | BFS traversal of social graph edges | 5-15ms | 100-1000 | No (filters applied post-traversal) | Profile specifies `Candidate::Relationship` (following, social) |
| **CohortTrending** | "Trending among people like me" | Cohort-scoped signal velocity scan | 10-20ms | 200-500 | Post-filter only (metadata filters after velocity sort) | Query has `for_cohort` with trending sort or profile specifies cohort trending |
| **ComposedSearch** | SEARCH WITHIN TRENDING FOR COHORT | Phase 1-2: cohort trending candidates; Phase 3: text/vector search within that set | 25-40ms (4 phases) | 50-200 (after search within 500 trending) | Metadata filters applied to trending set before search phase | Query has `within_trending` clause |

**Cost Model Summary:**

| Strategy | CPU Cost | Memory Cost | I/O Cost | Concurrency Impact |
|----------|----------|-------------|----------|-------------------|
| **Ann** | O(log N * ef_search * M) | O(ef_search) visited set | 0 (in-memory HNSW) | None (read-only graph traversal) |
| **Scan** | O(K) where K = candidates to emit | O(K) result buffer | Possible cold-tier signal reads | None (snapshot isolation) |
| **Hybrid** | O(BM25) + O(ANN) parallel | O(text_k + vector_k) + merge buffer | Tantivy segment reads | None (separate readers) |
| **Relationship** | O(fan_out^depth) bounded by max_fan_out | O(visited) set for cycle detection | Edge list reads from storage | None (immutable edge snapshots) |
| **CohortTrending** | O(tracked_items) velocity scan | O(top_k) sorted buffer | Cohort signal reads (hot or warm tier) | None (atomic reads) |
| **ComposedSearch** | Sum of CohortTrending + Hybrid on small set | O(trending_k) + O(search results) | Same as CohortTrending + brute-force vector | None |

### 4.2 Plan Construction

The planner constructs an `ExecutionPlan` -- the complete recipe for executing the query.

```rust
/// The complete execution plan. Immutable once constructed.
/// Logged at DEBUG level for every query for observability.
pub(crate) struct ExecutionPlan {
    /// How candidates are generated.
    candidate_strategy: CandidateStrategy,
    /// Pre-computed filter bitmap (if filters are present).
    filter_bitmap: Option<FilterBitmap>,
    /// Which signals to load for scoring.
    required_signals: Vec<SignalRef>,
    /// The scoring function from the ranking profile.
    scoring: ScoringPlan,
    /// Diversity enforcement strategy.
    diversity: Option<DiversityPlan>,
    /// Pagination state.
    pagination: PaginationPlan,
    /// Estimated total cost for logging and monitoring.
    estimated_cost: CostEstimate,
    /// Whether this is a composed query (SEARCH WITHIN TRENDING).
    composition: Option<CompositionPlan>,
}

/// Cost estimate for plan logging and monitoring.
pub(crate) struct CostEstimate {
    /// Estimated number of candidates before filtering.
    candidate_count: usize,
    /// Estimated number of candidates after filtering.
    filtered_count: usize,
    /// Estimated wall-clock time in microseconds.
    estimated_latency_us: u64,
}
```

### 4.3 Planner Decision Tree

The planner selects the candidate strategy based on the query type and profile configuration. The decision tree is deterministic -- given the same inputs, the planner always produces the same plan.

```
Query Planner Decision Tree

                         ┌────────────────────┐
                         │  Query Operation?   │
                         └─────────┬──────────┘
                                   │
              ┌────────────────────┼────────────────────┐
              │                    │                    │
        ┌─────▼──────┐    ┌───────▼───────┐    ┌──────▼──────┐
        │  RETRIEVE   │    │   SEARCH      │    │  SUGGEST    │
        └─────┬──────┘    └───────┬───────┘    └─────────────┘
              │                    │              (bypass pipeline,
              │                    │               see Section 9)
              ▼                    ▼
    ┌──────────────────┐  ┌──────────────────────┐
    │ Profile.candidate │  │ Has within_trending? │
    └────────┬─────────┘  └─────────┬────────────┘
             │                      │
    ┌────────┼────────┬──────┐     ├──── yes ──► ComposedSearch
    │        │        │      │     │              (Section 6)
    ▼        ▼        ▼      ▼     │
   Ann     Scan   Relation  Cohort └──── no ──►  Profile.candidate?
    │        │    -ship    Trending              │
    │        │      │        │           ┌──────┼──────┐
    │        │      │        │           │      │      │
    │        │      │        │          Hybrid  Ann   Scan
    │        │      │        │           │      │      │
    ▼        ▼      ▼        ▼           ▼      ▼      ▼
  ANN     Signal  BFS    Cohort      Text+Vec  ANN  Signal
  search  scan   trav.  velocity     parallel  only  sort
    │        │      │     scan         │        │      │
    └────────┴──────┴──────┴───────────┴────────┴──────┘
                           │
                    ┌──────▼──────┐
                    │ Has filters? │
                    └──────┬──────┘
                     yes   │   no
                      ▼    │    ▼
              Build filter │  Skip filter
              bitmap       │  evaluation
                      │    │    │
                      └────┴────┘
                           │
                    ┌──────▼──────────┐
                    │ For ANN: select │
                    │ ANN strategy    │
                    │ via selectivity │
                    └─────────────────┘
                           │
                    ┌──────▼──────────┐
                    │ Build scoring   │
                    │ plan from       │
                    │ profile def     │
                    └─────────────────┘
                           │
                    ┌──────▼──────────┐
                    │ Build diversity │
                    │ plan            │
                    └─────────────────┘
                           │
                    ┌──────▼──────────┐
                    │ Build pagination│
                    │ plan from cursor│
                    └─────────────────┘
                           │
                           ▼
                    ExecutionPlan ready
```
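The top half of the decision tree can be read as a single `match`. This sketch uses hypothetical `Operation`, `ProfileCandidate`, and `PlanShape` enums that only name the branches; it reflects the tree and the "When to Choose" column of the comparison table (including `similar_to` forcing ANN for RETRIEVE). Combinations the tree does not show, such as a SEARCH with a relationship profile, are rejected here as an assumption rather than guessed.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Operation { Retrieve, Search, Suggest }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ProfileCandidate { Ann, Scan, Relationship, CohortTrending, Hybrid }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PlanShape { Ann, Scan, Hybrid, Relationship, CohortTrending, ComposedSearch, SuggestBypass }

/// Deterministic: the same inputs always yield the same plan shape.
pub fn choose_shape(
    op: Operation,
    profile: ProfileCandidate,
    within_trending: bool,
    similar_to: bool,
) -> Result<PlanShape, String> {
    match op {
        Operation::Suggest => Ok(PlanShape::SuggestBypass),
        Operation::Search if within_trending => Ok(PlanShape::ComposedSearch),
        Operation::Search => match profile {
            ProfileCandidate::Hybrid => Ok(PlanShape::Hybrid),
            ProfileCandidate::Ann => Ok(PlanShape::Ann),
            ProfileCandidate::Scan => Ok(PlanShape::Scan),
            other => Err(format!("SEARCH does not support {:?} candidates", other)),
        },
        // An anchor item means "related content", which is ANN-driven.
        Operation::Retrieve if similar_to => Ok(PlanShape::Ann),
        Operation::Retrieve => match profile {
            ProfileCandidate::Ann => Ok(PlanShape::Ann),
            ProfileCandidate::Scan => Ok(PlanShape::Scan),
            ProfileCandidate::Relationship => Ok(PlanShape::Relationship),
            ProfileCandidate::CohortTrending => Ok(PlanShape::CohortTrending),
            ProfileCandidate::Hybrid => Err("RETRIEVE has no text query for Hybrid".to_string()),
        },
    }
}
```

Filter bitmap construction, ANN strategy selection, scoring, diversity, and pagination then run in the fixed order shown in the lower half of the tree.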

### 4.4 Selectivity Estimation

For filtered ANN queries, the planner must estimate filter selectivity before choosing the ANN strategy. Selectivity estimation uses the bitmap cardinality from the Entity Model's metadata indexes (spec 02) and the Vector Retrieval spec's adaptive query planner (spec 07, Section 9).

```
Selectivity Estimation

For each filter predicate:
  keyword equality:     cardinality(bitmap[field][value]) / total_entities
  keyword IN-list:      cardinality(union(bitmaps)) / total_entities
  numeric range:        estimate from sorted index statistics
  boolean:              cardinality(bitmap[field][true_or_false]) / total_entities
  unseen (user state):  1 - (user_seen_count / total_entities)
  relationship:         edge_count / total_entities

For compound filters (AND):
  selectivity = product of individual selectivities
  (independence assumption; refined by correlation cache)

For compound filters (OR):
  selectivity = sum(individual) - sum(pairwise) + ...
  (approximation: min(1.0, sum(individual) * 0.9))

Result: float in [0.0, 1.0]
  Maps to ANN strategy (checked in this order):
    no filter present  -->  Standard
    > 0.20             -->  InGraphFilter
    0.01-0.20          -->  Acorn (widened ef_search)
    < 0.01             -->  BruteForce
```

The correlation cache (maintained by the background materializer) stores joint selectivity estimates for frequently co-occurring filter pairs. When the independence assumption is known to be inaccurate (e.g., `category:jazz AND format:audio`), the cache provides a corrected estimate.
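The estimation rules and thresholds above can be sketched as a few pure functions. This is an illustration under stated assumptions: the OR estimate is floored at the largest individual selectivity (a union can never be smaller than its largest member), which the spec's `* 0.9` approximation does not mention; `ef_search` values are omitted; and "no filter" is modeled as `None` so it cannot collide with the `> 0.20` branch. The correlation-cache correction is not shown.

```rust
#[derive(Debug, PartialEq)]
pub enum AnnStrategy { Standard, InGraphFilter, Acorn, BruteForce }

/// AND under the independence assumption: product of selectivities.
pub fn and_selectivity(parts: &[f64]) -> f64 {
    parts.iter().product::<f64>().clamp(0.0, 1.0)
}

/// OR approximation: 0.9 * sum, capped at 1.0 and floored at the largest part.
pub fn or_selectivity(parts: &[f64]) -> f64 {
    if parts.len() <= 1 {
        return parts.first().copied().unwrap_or(0.0);
    }
    let largest = parts.iter().copied().fold(0.0_f64, f64::max);
    (parts.iter().sum::<f64>() * 0.9).min(1.0).max(largest)
}

/// The "unseen" filter keeps everything the user has NOT seen.
pub fn unseen_selectivity(user_seen_count: u64, total_entities: u64) -> f64 {
    if total_entities == 0 {
        return 0.0;
    }
    1.0 - user_seen_count as f64 / total_entities as f64
}

/// `None` means the query has no filter at all.
pub fn choose_ann_strategy(selectivity: Option<f64>) -> AnnStrategy {
    match selectivity {
        None => AnnStrategy::Standard,
        Some(s) if s > 0.20 => AnnStrategy::InGraphFilter,
        Some(s) if s >= 0.01 => AnnStrategy::Acorn,
        Some(_) => AnnStrategy::BruteForce,
    }
}
```

For example, two independent filters matching 50% and 10% of the corpus give an estimated selectivity of 0.05, which falls in the ACORN band.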

### 4.5 Scoring Plan

The scoring plan is derived from the ranking profile definition and determines which signals, relationships, and boosts are evaluated for each candidate.

```rust
/// Scoring plan derived from the ranking profile.
pub(crate) struct ScoringPlan {
    /// Signal-based boosts: signal name, window, metric, weight.
    signal_boosts: Vec<SignalBoostPlan>,
    /// Relationship-based boosts: edge kind, weight.
    relationship_boosts: Vec<RelationshipBoostPlan>,
    /// Social proof boost weight (if enabled).
    social_proof_weight: Option<f64>,
    /// Cohort trending boost (if enabled).
    cohort_trending_boost: Option<CohortBoostPlan>,
    /// Temporal decay: field, half-life.
    temporal_decay: Option<TemporalDecayPlan>,
    /// Quality gates: minimum signal thresholds.
    gates: Vec<GatePlan>,
    /// Scoring penalties.
    penalties: Vec<PenaltyPlan>,
    /// Hard exclusions (hide, block).
    excludes: Vec<ExcludePlan>,
    /// Exploration fraction: percentage of results from unfamiliar creators.
    exploration: f64,
}

pub(crate) struct SignalBoostPlan {
    signal: SignalName,
    window: Window,
    metric: SignalMetric,   // Value, Velocity, Ratio, UniqueRatio
    weight: f64,
}
```
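The scoring plan does not, in this section, fix how its parts combine. The sketch below shows one plausible evaluation order for a single candidate, and every combination rule in it is an assumption: hard excludes and gates drop the candidate first, weighted boosts are summed, penalties are subtracted, and temporal decay multiplies the result by `0.5^(age / half_life)`. `CandidateSignals`, `Boost`, and `Gate` are hypothetical flattened stand-ins for the plan structs above, with signal values already loaded by Stage 3.

```rust
/// Hypothetical: an already-loaded signal value and the profile weight for it.
pub struct Boost { pub value: f64, pub weight: f64 }

/// Hypothetical: an already-loaded signal value and its quality-gate minimum.
pub struct Gate { pub value: f64, pub min: f64 }

pub struct CandidateSignals {
    pub boosts: Vec<Boost>,
    pub gates: Vec<Gate>,
    pub penalty: f64,
    pub excluded: bool, // hide/block already resolved for this user
    pub age_hours: f64,
}

/// Returns None when the candidate is removed by an exclude or a failed gate.
pub fn score(c: &CandidateSignals, half_life_hours: Option<f64>) -> Option<f64> {
    if c.excluded || c.gates.iter().any(|g| g.value < g.min) {
        return None;
    }
    let mut s: f64 = c.boosts.iter().map(|b| b.value * b.weight).sum();
    s -= c.penalty;
    if let Some(h) = half_life_hours {
        // Exponential decay: the score halves every `h` hours of age.
        s *= 0.5_f64.powf(c.age_hours / h);
    }
    Some(s)
}
```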

---

## 5. Execution Pipeline

The execution pipeline is a six-stage sequence. Every query -- RETRIEVE and SEARCH -- flows through the same pipeline. The candidate generation stage varies by plan; the remaining stages are uniform.

### 5.1 Pipeline Architecture

```
                   RETRIEVE / SEARCH query
                           │
                           ▼
                ┌──────────────────────┐
  Stage 1       │  CANDIDATE GENERATION │  Generate initial candidate set.
                │                      │  Strategy depends on plan:
                │  ANN / Scan / Hybrid │  ANN, Scan, Hybrid, Relationship,
                │  Relationship /      │  CohortTrending, or Composed.
                │  CohortTrending      │
                │                      │  Output: Vec<RawCandidate>
                └──────────┬───────────┘  (entity_id, retrieval_score)
                           │
                           │  200-1000 candidates
                           ▼
                ┌──────────────────────┐
  Stage 2       │  FILTER EVALUATION    │  Apply metadata and state filters.
                │                      │  Bitmap intersection for metadata.
                │  Bitmap intersection │  Hash set for seen/excluded IDs.
                │  + user state check  │  Relationship check for blocked.
                │  + exclusion check   │
                │                      │  Output: Vec<FilteredCandidate>
                └──────────┬───────────┘  (same as input, minus excluded)
                           │
                           │  100-500 candidates (typical)
                           ▼
                ┌──────────────────────┐
  Stage 3       │  SIGNAL LOADING       │  Load signal state from hot tier.
                │                      │  One atomic read per signal per
                │  Hot tier reads      │  candidate. Apply lazy decay.
                │  (lock-free atomics) │
                │                      │  Output: Vec<ScoredCandidate>
                └──────────┬───────────┘  (+ signal_snapshot per candidate)
                           │
                           │  100-500 candidates with signal state
                           ▼
                ┌──────────────────────┐
  Stage 4       │  SCORING              │  Apply ranking profile:
                │                      │  - Signal boosts (decay, velocity)
                │  Profile boosts,     │  - Relationship boosts
                │  gates, penalties,   │  - Social proof
                │  temporal decay      │  - Temporal decay
                │                      │  - Quality gates (min thresholds)
                │                      │  - Penalties (skip, negative signals)
                │                      │  - Hard excludes (hide, block)
                │                      │
                │                      │  Output: Vec<RankedCandidate>
                └──────────┬───────────┘  (entity_id, final_score)
                           │
                           │  50-300 candidates with scores
                           ▼
                ┌──────────────────────┐
  Stage 5       │  DIVERSITY            │  Enforce variety constraints:
                │  ENFORCEMENT          │  - max_per_creator
                │                      │  - format_mix
                │  Creator cap,        │  - topic_diversity (MMR)
                │  format mix,         │  - exploration injection
                │  topic MMR           │
                │                      │  Output: Vec<DiverseCandidate>
                └──────────┬───────────┘  (reordered, not reduced)
                           │
                           │  limit + buffer candidates
                           ▼
                ┌──────────────────────┐
  Stage 6       │  PAGINATION           │  Apply cursor position.
                │                      │  Slice to requested limit.
                │  Cursor decode,      │  Encode next_cursor.
                │  offset, limit       │
                │                      │  Output: Results
                └──────────────────────┘  (results, next_cursor,
                                           total_candidates)
```
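Stage 5 is described as reordering rather than reducing. For the `max_per_creator` constraint, a minimal sketch of that behavior is a single stable pass that keeps each creator's first N items in place and defers the overflow to the tail, preserving score order within each group. `Ranked` and `cap_per_creator` are hypothetical names, the cap is applied over the whole list rather than per page, and format mix, MMR, and exploration injection are not shown.

```rust
use std::collections::HashMap;

/// Hypothetical ranked candidate, already sorted by final_score descending.
pub struct Ranked {
    pub id: u64,
    pub creator: u32,
    pub score: f64,
}

/// Keeps at most `max_per_creator` items per creator in the head of the list;
/// overflow items are moved to the tail, not dropped.
pub fn cap_per_creator(ranked: Vec<Ranked>, max_per_creator: usize) -> Vec<Ranked> {
    let mut counts: HashMap<u32, usize> = HashMap::new();
    let mut head = Vec::with_capacity(ranked.len());
    let mut tail = Vec::new();
    for r in ranked {
        let n = counts.entry(r.creator).or_insert(0);
        if *n < max_per_creator {
            *n += 1;
            head.push(r);
        } else {
            tail.push(r);
        }
    }
    head.extend(tail);
    head
}
```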

### 5.2 Stage 1: Candidate Generation

Candidate generation is the most variable stage. The planner selects one of the five `CandidateStrategy` variants, or the composed plan for `SEARCH WITHIN TRENDING`.

**ANN (Approximate Nearest Neighbor):**
Queries the HNSW vector index via the `VectorIndex` trait. The query vector comes from the user's preference vector (`VectorSource::UserPreference`), an anchor item's embedding (`similar_to`), or an explicit query embedding (`Search.vector`). The adaptive query planner (spec 07, Section 9) selects the ANN strategy based on filter selectivity. Output: `(entity_id, cosine_similarity)` pairs, sorted by similarity descending.

**Scan:**
Iterates over entities in the entity store, sorted by a signal expression (velocity, decay score, field value). The filter bitmap is applied during the scan to skip non-matching entities. Used for trending (velocity sort), browse (field sort), and queries where no similarity signal exists. Output: `(entity_id, sort_value)` pairs.

**Hybrid (Text + Vector):**
Executes text retrieval (BM25 via `TextIndex`) and vector retrieval (ANN via `VectorIndex`) in parallel. Fuses results using Reciprocal Rank Fusion (RRF) or linear combination, per the profile's fusion configuration (spec 06, Section 11). Output: `(entity_id, fused_score, text_score, vector_score)` tuples.
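Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over the lists it appears in, using 1-based ranks, so it needs only ranks and not comparable raw scores. The sketch below fuses any number of ranked ID lists. `k = 60` is the value commonly used in the RRF literature; the actual constant is whatever the fusion configuration in spec 06 specifies. Ties are broken by ID so the output is deterministic.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over ranked ID lists (best first).
pub fn rrf(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (rank, id) in list.iter().enumerate() {
            // `enumerate` is 0-based; RRF ranks are 1-based.
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

An item ranked well by both legs (ID 1 below appears 1st in text and 2nd in vector) outranks an item that is first in only one leg.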

**Relationship:**
Traverses the social graph via `RelationshipStore::traverse_graph()`. Starting from the querying user, follows edges of the specified kind (e.g., `follows`) up to the configured depth. Collects item IDs by loading creator-to-item mappings for followed creators. Output: `(entity_id, edge_weight)` pairs.

**CohortTrending:**
Reads cohort-scoped signal velocity for items with active cohort tracking. Filters to items above the minimum velocity threshold within the specified window. Sorts by velocity descending. Output: `(entity_id, cohort_velocity)` pairs.

**ComposedSearch:**
The composed strategy for `SEARCH WITHIN TRENDING`. Detailed in [Section 6](#6-query-composition).

### 5.3 Stage 2: Filter Evaluation

Filter evaluation reduces the candidate set by applying metadata predicates, user state checks, and exclusion lists. See [Section 7](#7-filter-evaluation) for the full design.

The key insight: filters are evaluated against pre-computed roaring bitmaps. For each metadata filter, the bitmap for that field/value is loaded from the Entity Model's bitmap indexes. The intersection of all filter bitmaps produces the surviving candidate set. This is an O(|bitmap|) operation, independent of the number of candidates.

**Filter push-down optimization:** For ANN queries, metadata filters are pushed into the vector index via the predicate callback (in-graph filter) or pre-filter bitmap (brute-force, ACORN). The candidate generation stage already applies these filters, so Stage 2 only needs to check user-state filters (unseen, saved, in-progress) and exclusion lists (blocked creators, excluded IDs).

**Short-circuit on empty:** If any filter bitmap has zero cardinality (e.g., `category:nonexistent`), the pipeline returns an empty result set immediately without proceeding to later stages.

### 5.4 Stage 3: Signal Loading

For each surviving candidate, the pipeline loads signal state from the hot tier. This is the most latency-sensitive stage after candidate generation.

**Access pattern:**
For each candidate entity ID, for each signal referenced in the scoring plan's boosts, gates, and penalties:
1. Index into the hot tier's `HotSignalState` array using the entity ID and signal type index.
2. Load `last_update_ns` with `Ordering::Acquire`.
3. Load `decay_scores[i]` with `Ordering::Acquire`.
4. Compute the lazy-decayed score: `score(now) = stored_score * exp(-lambda * (now - last_update))`.
5. Store the result in the candidate's signal snapshot.

**Memory ordering rationale:** Acquire on `last_update_ns` ensures we see the most recent decay score that was stored with Release by a concurrent signal writer. Without Acquire, we could read a new timestamp with an old score, pairing the score with the wrong timestamp and producing an incorrectly decayed value. See Signal System spec (03, Section 3) for the full ordering proof.
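
The following sketch covers steps 2-4 and the orderings described above, using only `std::sync::atomic`. Several details are assumptions made for illustration:

- `HotSlot` is a hypothetical type.
- `lambda` is expressed per second.
- The `f64` score is stored as raw bits in an `AtomicU64`, because std has no `AtomicF64`.

Two independent atomics do not, on their own, make the `(timestamp, score)` pair consistent under concurrent writers. The complete protocol is the one in spec 03, Section 3.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical hot-tier slot for one (entity, signal) pair.
/// The f64 score is stored as raw bits because std has no AtomicF64.
struct HotSlot {
    last_update_ns: AtomicU64,
    decay_score_bits: AtomicU64,
}

impl HotSlot {
    fn new() -> Self {
        HotSlot {
            last_update_ns: AtomicU64::new(0),
            decay_score_bits: AtomicU64::new(0f64.to_bits()),
        }
    }

    /// Writer side (simplified): store the score first, then publish the
    /// timestamp with Release so a reader that Acquires this timestamp also
    /// observes this score or a later one.
    fn publish(&self, score: f64, now_ns: u64) {
        self.decay_score_bits.store(score.to_bits(), Ordering::Release);
        self.last_update_ns.store(now_ns, Ordering::Release);
    }

    /// Reader side, steps 2-4: Acquire-load the timestamp, Acquire-load the
    /// score, then apply score(now) = stored * exp(-lambda * (now - last_update)).
    fn read_decayed(&self, now_ns: u64, lambda_per_sec: f64) -> f64 {
        let last = self.last_update_ns.load(Ordering::Acquire);
        let stored = f64::from_bits(self.decay_score_bits.load(Ordering::Acquire));
        let elapsed_secs = now_ns.saturating_sub(last) as f64 / 1e9;
        stored * (-lambda_per_sec * elapsed_secs).exp()
    }
}
```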

**Cost model:** Each signal read is ~15ns (one cache-line load + one `exp()` call). For 200 candidates with 6 signals each: 200 * 6 * 15ns = 18us. This is negligible.

If a candidate entity has been evicted from the hot tier (no recent signals), its signal state is loaded from the warm tier. This requires a hash table lookup (~50ns) and potentially a disk read from the cold tier (~100us). The planner accounts for this by padding the latency estimate for scan-based queries over the full corpus.

### 5.5 Stage 4: Scoring

The scoring stage applies the ranking profile's formula to each candidate. Every term in the profile definition maps to a scoring operation:

```
For each candidate:
    base_score = retrieval_score (from Stage 1: similarity, BM25, velocity)

    // Signal boosts
    for each boost in profile.boosts:
        signal_value = candidate.signals[boost.signal][boost.window][boost.metric]
        base_score += boost.weight * signal_value

    // Relationship boosts
    for each rel_boost in profile.relationship_boosts:
        edge_weight = relationship_store.load_weight(user, candidate.creator, rel_boost.edge)
        base_score += rel_boost.weight * edge_weight

    // Social proof
    if profile.social_proof_weight > 0:
        proof_score = social_proof_map.lookup(candidate.entity_id)
        base_score += profile.social_proof_weight * proof_score

    // Temporal decay
    if profile.temporal_decay is Some:
        age = now - candidate.metadata[decay_field]
        decay_factor = exp(-ln(2) / half_life * age)
        base_score *= decay_factor

    // Quality gates (hard minimum thresholds)
    for each gate in profile.gates:
        if candidate.signals[gate.signal][gate.window][gate.metric] < gate.min:
            base_score = -inf  // eliminate candidate
            break

    // Penalties
    for each penalty in profile.penalties:
        signal_value = candidate.signals[penalty.signal][penalty.window][penalty.metric]
        base_score += penalty.weight * signal_value  // weight is negative

    // Hard excludes (hide, block)
    for each exclude in profile.excludes:
        if exclude matches candidate:
            base_score = -inf  // eliminate candidate
            break

    candidate.final_score = base_score
```

Candidates with `final_score == -inf` (gated or excluded) are removed. Remaining candidates are sorted by `final_score` descending.
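
The pseudocode can be condensed into a small Rust sketch. It relies on these assumptions:

- Each string key in `signals` stands in for one `signals[signal][window][metric]` lookup.
- Missing signals count as 0.0.
- Relationship boosts and social proof are omitted.
- `Candidate`, `Profile`, `score`, and `rank` are hypothetical names.

Gates and hard excludes are checked before any arithmetic. This returns the same ranking as the pseudocode, because an eliminated candidate is dropped no matter what else is added to it.

```rust
use std::collections::HashMap;

/// Hypothetical, flattened candidate: each key in `signals` stands in for one
/// signals[signal][window][metric] lookup from the Stage 3 snapshot.
struct Candidate {
    entity_id: u64,
    retrieval_score: f64,
    signals: HashMap<String, f64>,
    age_secs: f64,
    excluded: bool,
}

/// Hypothetical profile subset (relationship boosts and social proof omitted).
struct Profile {
    boosts: Vec<(String, f64)>,    // (signal key, weight)
    gates: Vec<(String, f64)>,     // (signal key, minimum value)
    penalties: Vec<(String, f64)>, // (signal key, negative weight)
    half_life_secs: Option<f64>,
}

/// Missing signals are treated as 0.0 (an assumption).
fn signal(c: &Candidate, key: &str) -> f64 {
    c.signals.get(key).copied().unwrap_or(0.0)
}

/// Returns None where the pseudocode would assign -inf (gated or excluded).
fn score(c: &Candidate, p: &Profile) -> Option<f64> {
    // Eliminations are checked first: the outcome is the same, and no work is wasted.
    if c.excluded || p.gates.iter().any(|(k, min)| signal(c, k) < *min) {
        return None;
    }
    let mut s = c.retrieval_score;
    for (k, w) in &p.boosts {
        s += w * signal(c, k);
    }
    if let Some(half_life) = p.half_life_secs {
        s *= (-std::f64::consts::LN_2 / half_life * c.age_secs).exp();
    }
    for (k, w) in &p.penalties {
        s += w * signal(c, k);
    }
    Some(s)
}

/// Drops eliminated candidates and sorts the rest by final score descending.
fn rank(cands: &[Candidate], p: &Profile) -> Vec<(u64, f64)> {
    let mut out: Vec<(u64, f64)> = cands
        .iter()
        .filter_map(|c| score(c, p).map(|s| (c.entity_id, s)))
        .collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1));
    out
}
```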

**Social proof computation:** For personalized queries (`for_user` is `Some`), social proof measures how many of the user's social connections engaged with this item. The social proof map is built as a side product of relationship traversal (depth-2 BFS, bounded fan-out) and cached for the duration of the query. Cost: <10ms for depth-2 traversal (spec 04, Section 13).

### 5.6 Stage 5: Diversity Enforcement

Diversity enforcement reorders the scored result set to ensure variety without reducing the result count (unless insufficient candidates exist). Four mechanisms operate in sequence:

**max_per_creator:** No more than N items from the same creator in the final result set. Implementation: iterate through scored results. For each creator, maintain a count. If a candidate exceeds the cap, demote it (push it down the list, do not remove it). This preserves the N best-scoring items from each creator at their natural positions.
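
Here is one way to implement demotion. It assumes that overflow items move to the tail of the list and keep their original relative order; the spec says only that they are pushed down. Items are `(entity_id, creator_id)` pairs, both assumed to be `u64`, already sorted best-first. `enforce_max_per_creator` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Reorders a best-first list so each creator contributes at most `cap`
/// items before the overflow. Overflow items are demoted to the tail
/// (relative order preserved), never removed.
fn enforce_max_per_creator(ranked: &[(u64, u64)], cap: usize) -> Vec<(u64, u64)> {
    let mut counts: HashMap<u64, usize> = HashMap::new();
    let mut head = Vec::with_capacity(ranked.len());
    let mut tail = Vec::new();
    for &(item, creator) in ranked {
        let count = counts.entry(creator).or_insert(0);
        *count += 1;
        if *count <= cap {
            head.push((item, creator));
        } else {
            tail.push((item, creator));
        }
    }
    head.extend(tail);
    head
}
```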

**format_mix:** Ensure a mix of content formats (video, short, article, podcast). Implementation: round-robin insertion. After max_per_creator, partition candidates by format. Interleave from each format bucket in proportion to its representation in the scored set, biased toward higher-scoring items.

**topic_diversity (MMR):** Maximal Marginal Relevance. Re-scores candidates to balance relevance and novelty:

```
MMR_score(d) = lambda * relevance(d) - (1 - lambda) * max_sim(d, selected)

where:
  lambda = 1.0 - topic_diversity  (topic_diversity in [0.0, 1.0])
  relevance(d) = candidate's final_score from Stage 4
  max_sim(d, selected) = maximum embedding cosine similarity between d
                          and any already-selected result
```

MMR is the most expensive diversity operation (O(k * n) distance computations where k = selected count and n = remaining candidates). For typical result sizes (limit = 50, candidates = 200), this is 50 * 200 * ~500ns = 5ms. Within budget.
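
The greedy MMR loop can be sketched as follows. After every pick, each candidate's `max_sim` is updated incrementally. This keeps the cost at O(k * n) distance computations, instead of recomputing similarities against the whole selected set each time.

The sketch assumes relevance scores have been normalized to a range comparable with cosine similarity, because the formula mixes the two directly. `mmr_select` and `cosine` are illustrative helpers, not existing tidalDB functions.

```rust
/// Cosine similarity of two equal-length vectors (0.0 if either has zero norm).
fn cosine(a: &[f32], b: &[f32]) -> f64 {
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (x, y) in a.iter().zip(b) {
        let (x, y) = (f64::from(*x), f64::from(*y));
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na.sqrt() * nb.sqrt()) }
}

/// Greedy MMR: returns up to `k` candidate indices in selection order.
/// lambda = 1.0 - topic_diversity, as in the formula above.
fn mmr_select(relevance: &[f64], embeddings: &[Vec<f32>], k: usize, topic_diversity: f64) -> Vec<usize> {
    let lambda = 1.0 - topic_diversity;
    let n = relevance.len();
    let mut chosen = vec![false; n];
    // max_sim[i] = highest similarity between candidate i and any selected item.
    // NEG_INFINITY marks "nothing selected yet" and is treated as 0 in the score.
    let mut max_sim = vec![f64::NEG_INFINITY; n];
    let mut selected = Vec::with_capacity(k.min(n));
    while selected.len() < k.min(n) {
        let mut best: Option<(usize, f64)> = None;
        for i in (0..n).filter(|&i| !chosen[i]) {
            let redundancy = if max_sim[i].is_finite() { max_sim[i] } else { 0.0 };
            let mmr = lambda * relevance[i] - (1.0 - lambda) * redundancy;
            if best.map_or(true, |(_, b)| mmr > b) {
                best = Some((i, mmr));
            }
        }
        let (pick, _) = best.expect("an unchosen candidate remains while selected < n");
        chosen[pick] = true;
        selected.push(pick);
        // Incremental update: one distance computation per remaining candidate.
        for i in (0..n).filter(|&i| !chosen[i]) {
            max_sim[i] = max_sim[i].max(cosine(&embeddings[i], &embeddings[pick]));
        }
    }
    selected
}
```

With `topic_diversity = 0.0` the loop reduces to plain relevance order. With `topic_diversity = 0.5`, a near-duplicate of the first pick drops behind a less relevant but dissimilar item.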

**Exploration injection:** If `profile.exploration > 0`, the pipeline reserves that fraction of result slots for items from creators the user does not follow and has not interacted with. These are drawn from the candidate set but bypass the relationship boost. Exploration items are scored normally (they may still score well on signal boosts and text relevance) but are guaranteed representation in the final set.

### 5.7 Stage 6: Pagination

See [Section 8](#8-pagination) for the full pagination design. The pagination stage:

1. If a cursor is provided, decode it and skip every result that sorts at or before the cursor's `(last_score, last_entity_id)` position.
2. Slice the next `limit` results from that position.
3. If more results exist beyond the slice, encode a `next_cursor` for the response.
4. Construct the `Results` struct with the sliced results, the cursor, and the total candidate count.

---

## 6. Query Composition

Query composition is the mechanism that powers `SEARCH WITHIN TRENDING FOR COHORT`. This is the most complex query type in the system, and the reason the query engine exists as a distinct module rather than a thin wrapper over subsystems.

### 6.1 What Composition Means

A composed query has two phases: a **restriction phase** that generates a constrained candidate set, and a **search phase** that retrieves within that set.

```
SEARCH items
QUERY "piano"
WITHIN TRENDING FOR COHORT young_us_jazz
WINDOW 24h
LIMIT 20
```

Semantics: "Find items matching 'piano' that are currently trending among young US jazz fans in the last 24 hours."

`WITHIN TRENDING` is a candidate generation strategy, not a filter. Items not trending in the cohort are never considered, regardless of their text relevance to "piano." The search operates only within the trending candidate set.

### 6.2 Composition vs. Filtering

The distinction is critical and worth making explicit:

**Filter:** "Find items matching 'piano', then remove items that are not trending." This is wrong because it generates candidates from the full text index, scores them, and then discards non-trending results. If only 50 of the text index's top-500 candidates happen to be trending, you get poor recall and wasted work.

**Composition:** "Generate the trending candidate set first (e.g., top 500 trending items), then search for 'piano' within that set." This generates candidates from the right population and searches within it. Every result is both trending AND relevant to the query.

### 6.3 Four-Phase Execution Flow

```
Composed Search: SEARCH "piano" WITHIN TRENDING FOR COHORT young_us_jazz WINDOW 24h

Phase 1: Cohort Resolution                              < 2ms
┌──────────────────────────────────────────────────────────────┐
│ Resolve "young_us_jazz" predicate:                           │
│   region_bitmap["US"] ∩ age_bitmap["18-24"]                  │
│     ∩ interests_bitmap["jazz"]                               │
│   --> user bitmap D (cohort membership)                      │
│                                                              │
│ Check cohort population: |D| >= 2000 active users?           │
│   yes --> proceed                                            │
│   no  --> fallback to parent cohort + warning                │
└──────────────────────────────────────────────────────────────┘
                          │
                          ▼
Phase 2: Cohort Trending Candidate Generation            < 20ms
┌──────────────────────────────────────────────────────────────┐
│ For items with cohort tracking active:                       │
│   Read cohort-scoped velocity for window=24h                 │
│                                                              │
│ Signal path (from Cohorts spec Section 6.3):                 │
│   - If exact_tracking: true --> Level 2 segment counter      │
│   - If single Level 1 dim  --> Level 1 rollup lookup         │
│   - If composite            --> independence estimation      │
│                                                              │
│ Filter to items with velocity > min_velocity threshold       │
│ Sort by cohort velocity descending                           │
│ Take top max_candidates (default: 500)                       │
│                                                              │
│ Output: trending_set = Vec<(EntityId, f64)>                  │
│         (entity_id, cohort_velocity)                         │
└──────────────────────────────────────────────────────────────┘
                          │
                          ▼
Phase 3: Search Within Trending Set                      < 10ms
┌──────────────────────────────────────────────────────────────┐
│ Convert trending_set entity IDs to a roaring bitmap          │
│                                                              │
│ Text search path:                                            │
│   TextIndex::score_candidates(                               │
│     entity_kind: Item,                                       │
│     query: SearchQuery parsed from "piano",                  │
│     candidate_ids: &trending_set_ids,                        │
│   )                                                          │
│   --> BM25 scores for trending items matching "piano"        │
│                                                              │
│ Vector search path (if query embedding provided):            │
│   Brute-force distance computation against trending_set      │
│   (set is small enough -- 500 items -- for exact search)     │
│   --> cosine similarity scores                               │
│                                                              │
│ Fusion (RRF or linear combination):                          │
│   Merge text and vector scores                               │
│   Carry cohort_velocity as an additional feature             │
│                                                              │
│ Output: Vec<ComposedCandidate>                               │
│   (entity_id, text_score, vector_score, fused_score,         │
│    cohort_velocity)                                          │
└──────────────────────────────────────────────────────────────┘
                          │
                          ▼
Phase 4: Final Ranking                                   < 5ms
┌──────────────────────────────────────────────────────────────┐
│ Combine search relevance with cohort trending score:         │
│                                                              │
│   final_score = alpha * fused_relevance_score                │
│               + beta * normalized_cohort_velocity            │
│               + signal_boosts + relationship_boosts          │
│               - penalties                                    │
│                                                              │
│ Where alpha + beta are derived from the ranking profile.     │
│ Default: alpha=0.6 (relevance), beta=0.4 (trending).         │
│                                                              │
│ Apply diversity constraints                                  │
│ Return top limit (20) results                                │
│                                                              │
│ Output: Results                                              │
└──────────────────────────────────────────────────────────────┘

Total estimated latency: < 37ms (within 50ms budget)
```
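
Phase 4's weighted combination can be sketched as below, under these assumptions:

- The diagram does not say how `normalized_cohort_velocity` is computed. This sketch divides by the maximum velocity in the trending set.
- Signal boosts, relationship boosts, and penalties are folded into a single precomputed `adjustments` term.
- Diversity constraints are omitted.
- `Phase4Input` and `final_rank` are hypothetical names.

```rust
/// Hypothetical Phase 3 output row: `adjustments` folds signal boosts,
/// relationship boosts, and (negative) penalties into one precomputed term.
struct Phase4Input {
    entity_id: u64,
    fused_score: f64,
    cohort_velocity: f64,
    adjustments: f64,
}

/// final = alpha * fused + beta * normalized_velocity + adjustments,
/// where velocity is normalized by the set's maximum (an assumption).
/// Returns the top `limit` (entity_id, final_score) pairs, best first.
fn final_rank(cands: &[Phase4Input], alpha: f64, beta: f64, limit: usize) -> Vec<(u64, f64)> {
    let max_v = cands.iter().map(|c| c.cohort_velocity).fold(0.0f64, f64::max);
    let mut scored: Vec<(u64, f64)> = cands
        .iter()
        .map(|c| {
            let norm_v = if max_v > 0.0 { c.cohort_velocity / max_v } else { 0.0 };
            (c.entity_id, alpha * c.fused_score + beta * norm_v + c.adjustments)
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```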

### 6.4 Composition Plan Type

```rust
/// Plan for a composed query (SEARCH WITHIN TRENDING).
pub(crate) struct CompositionPlan {
    /// Phase 1: cohort to resolve.
    cohort: ResolvedCohort,
    /// Phase 2: trending candidate generation.
    trending_window: Window,
    trending_min_velocity: f64,
    trending_max_candidates: usize,
    /// Phase 3: search within trending.
    search_query: SearchQuery,
    search_vector: Option<Vec<f32>>,
    fusion: FusionStrategy,
    /// Phase 4: relevance/trending weight balance.
    relevance_weight: f64,
    trending_weight: f64,
}
```

### 6.5 Why the Trending Set Is a Candidate Strategy, Not a Filter

Consider an item that matches "piano" perfectly (BM25 score = 12.5) but has zero velocity in the cohort. With filtering, this item would appear in the initial text retrieval results (top 500 by BM25), pass through scoring, and only be removed at filter evaluation. This wastes a candidate slot that could have gone to a less-relevant but trending item.

With composition, the trending set is generated first. Only trending items enter the search phase. A text-relevant item with zero trending velocity is never evaluated. This means:

1. Every returned result is both trending AND text-relevant.
2. No candidate slots are wasted on non-trending items.
3. The search phase operates on a small set (500 items), making brute-force vector search practical.
4. The latency budget is spent on results that will actually be returned.

### 6.6 Fallback Behavior

If the cohort population is below the minimum threshold (from Cohorts spec Section 9.4: 2000 active users for search within cohort trending), the engine:

1. Emits `CohortWarning::InsufficientPopulation` in the response.
2. Falls back to the nearest parent cohort in the hierarchy that meets the threshold.
3. Adds a cohort-relative boost from the original cohort (if any exact data exists) as a secondary signal.

If the trending set is empty (no items trending in the cohort for this window), the engine:

1. Emits `CompositionWarning::EmptyTrendingSet` in the response.
2. Falls back to a standard SEARCH without the `WITHIN TRENDING` restriction.
3. Adds a note to the response indicating the fallback.

---

## 7. Filter Evaluation

### 7.1 Bitmap-Based Architecture

Filters are evaluated using roaring bitmaps from the Entity Model's metadata indexes (spec 02, Cohort-Ready Design). Each keyword field value, each boolean value, and each numeric range bucket has a pre-computed bitmap of entity IDs matching that value. Filter evaluation is bitmap algebra.

```
Filter: category:jazz AND format:video AND unseen(user_123)

Step 1: metadata filters (bitmap intersection)
  category_bitmap["jazz"]      --> bitmap A  (items in jazz category)
  format_bitmap["video"]       --> bitmap B  (items in video format)
  A ∩ B                        --> bitmap C  (jazz videos)

Step 2: user-state filters
  user_123.seen_set            --> bitmap D  (items user has seen)
  C \ D                        --> bitmap E  (unseen jazz videos)

Step 3: exclusion filters
  user_123.blocked_creators    --> bitmap F  (items by blocked creators)
  E \ F                        --> bitmap G  (final filter bitmap)

Result: bitmap G applied to candidate set
```
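
The three steps above reduce to one intersection followed by two set differences. In the sketch below, `std::collections::BTreeSet<u32>` stands in for a roaring bitmap. It has the same set-algebra semantics but none of roaring's compressed containers. `apply_filters` is a hypothetical helper, and the blocked-creator set is assumed to be already expanded to item IDs.

```rust
use std::collections::BTreeSet;

/// Stand-in for a roaring bitmap of entity IDs.
type Bitmap = BTreeSet<u32>;

/// Step 1: intersect every metadata bitmap into the candidate set.
/// Step 2: subtract the user's seen set.
/// Step 3: subtract items by blocked creators.
fn apply_filters(candidates: &Bitmap, metadata: &[&Bitmap], seen: &Bitmap, blocked_items: &Bitmap) -> Bitmap {
    let mut acc: Bitmap = candidates.clone();
    for m in metadata {
        acc = acc.intersection(m).copied().collect();
    }
    acc = acc.difference(seen).copied().collect();
    acc.difference(blocked_items).copied().collect()
}
```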

### 7.2 Filter Push-Down

For candidate generation strategies that support it, filters are pushed into the generation phase to reduce the number of candidates that enter later stages.

| Strategy | Push-Down Mechanism |
|----------|-------------------|
| **ANN** | Metadata filter bitmap passed to `VectorIndex::filtered_search()` as predicate callback or pre-filter set. User-state filters evaluated in Stage 2. |
| **Scan** | Filter bitmap used to skip non-matching entities during iteration. |
| **Hybrid** | Metadata filter bitmap passed to both text and vector retrieval. Tantivy uses fast-field filtering. USearch uses predicate callback. |
| **Relationship** | Filters applied after traversal (edge targets are not pre-filtered). |
| **CohortTrending** | Metadata filters applied to the trending candidate set after velocity computation. |

### 7.3 Filter Types

```rust
/// A filter predicate for query evaluation.
pub enum Filter {
    /// Exact equality on a keyword or boolean field.
    Eq { field: FieldName, value: FieldValue },
    /// Any of the specified values (OR within dimension).
    Any { field: FieldName, values: Vec<FieldValue> },
    /// Numeric range.
    Range { field: FieldName, min: Option<f64>, max: Option<f64> },
    /// Minimum value threshold.
    Min { field: FieldName, value: f64 },
    /// Maximum value threshold.
    Max { field: FieldName, value: f64 },
    /// Duration preset (short, medium, long).
    Preset { field: FieldName, preset: String },
    /// Created within a duration.
    CreatedWithin(Duration),
    /// Created after a timestamp.
    CreatedAfter(Timestamp),
    /// Created before a timestamp.
    CreatedBefore(Timestamp),
    /// Since a timestamp (for notifications).
    Since(Timestamp),
    /// Items the user has not seen.
    Unseen,
    /// Items the user has engaged with in a specific state.
    UserState(String),
    /// Items not by blocked creators.
    NotBlocked,
    /// Items from creators connected to the user by the given relationship kind (e.g., follows).
    Relationship(RelationshipKind),
    /// Items engaged by the user's social graph.
    SocialGraph { user_id: UserId, depth: TraversalDepth },
    /// Items in a specific collection.
    InCollection(String),
}
```

### 7.4 Short-Circuit Evaluation

Filter bitmaps are evaluated in ascending cardinality order. The smallest bitmap is evaluated first, minimizing the size of subsequent intersections.

```
Evaluation order: sort filters by estimated bitmap cardinality ascending.

If any bitmap has cardinality 0:
  --> return empty Results immediately (short-circuit)

If bitmap intersection yields 0 after any step:
  --> return empty Results immediately (short-circuit)
```
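
Here is a sketch of cardinality-ordered intersection with early exit, again using `BTreeSet<u32>` as a stand-in for roaring bitmaps. It differs from the planner in a few ways:

- It sorts by exact length, whereas the planner sorts by *estimated* cardinality.
- It returns `None` when there are no filters, since nothing is being restricted.
- It reports how many intersections actually ran, so the short-circuit is observable.

The function name is hypothetical.

```rust
use std::collections::BTreeSet;

/// Stand-in for a roaring bitmap of entity IDs.
type Bitmap = BTreeSet<u32>;

/// Intersects filter bitmaps smallest-first. Stops as soon as the running
/// result is empty, so an empty input bitmap costs zero intersections.
/// Returns (result, intersections performed), or None if there are no filters.
fn intersect_short_circuit(mut bitmaps: Vec<&Bitmap>) -> Option<(Bitmap, usize)> {
    if bitmaps.is_empty() {
        return None;
    }
    bitmaps.sort_by_key(|b| b.len());
    let mut acc = bitmaps[0].clone();
    let mut steps = 0;
    for b in &bitmaps[1..] {
        if acc.is_empty() {
            break; // short-circuit: nothing can survive further intersections
        }
        acc = acc.intersection(b).copied().collect();
        steps += 1;
    }
    Some((acc, steps))
}
```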

This optimization is significant for multi-filter queries. A `category:nonexistent` filter short-circuits the entire pipeline in <1ms.

### 7.5 User-State Filter Implementation

User-state filters (unseen, saved, in_progress, liked) require looking up the user's per-item state. These are stored as relationship edges in the relationship store and as signal events in the signal ledger.

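**Unseen filter:** The user's "seen" set is a bloom filter (for approximate, fast check) backed by the signal ledger (for exact verification). The bloom filter is maintained in memory and updated on every signal write. A Bloom filter never produces false negatives, so a "not seen" answer is final; only "possibly seen" answers need to be confirmed against the ledger. False positive rate: <1% at 10M items per user with 128-bit fingerprints.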
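
Here is a minimal standard Bloom filter that shows the split between "definitely unseen" and "possibly seen". It uses double hashing over std's `DefaultHasher`.

This is an illustrative sketch, not the production structure. `SeenFilter` is hypothetical, and sizing uses the textbook formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2. At n = 10M and p = 0.01, those formulas give roughly 9.6 bits per item, about 12 MB per user.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical per-user "seen" filter: a standard Bloom filter over entity IDs.
struct SeenFilter {
    bits: Vec<u64>,
    m: u64, // number of bits
    k: u32, // number of probes per item
}

impl SeenFilter {
    /// Sizes for `n` items at false-positive rate `p`:
    /// m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 probes.
    fn with_rate(n: u64, p: f64) -> Self {
        assert!(n > 0 && p > 0.0 && p < 1.0);
        let ln2 = std::f64::consts::LN_2;
        let m = (-(n as f64) * p.ln() / (ln2 * ln2)).ceil().max(64.0) as u64;
        let k = ((m as f64 / n as f64) * ln2).round().max(1.0) as u32;
        SeenFilter { bits: vec![0; ((m + 63) / 64) as usize], m, k }
    }

    /// Two 64-bit hashes for double hashing; `| 1` keeps h2 non-zero.
    fn hashes(id: u64) -> (u64, u64) {
        let mut a = DefaultHasher::new();
        id.hash(&mut a);
        let mut b = DefaultHasher::new();
        (id, 0x9e37_79b9_7f4a_7c15_u64).hash(&mut b);
        (a.finish(), b.finish() | 1)
    }

    /// Probe positions h1 + i * h2 (mod m) for i in 0..k.
    fn positions(&self, id: u64) -> Vec<u64> {
        let (h1, h2) = Self::hashes(id);
        (0..u64::from(self.k)).map(|i| h1.wrapping_add(i.wrapping_mul(h2)) % self.m).collect()
    }

    fn insert(&mut self, id: u64) {
        for bit in self.positions(id) {
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// false: definitely not seen. true: possibly seen (confirm against the ledger).
    fn maybe_seen(&self, id: u64) -> bool {
        self.positions(id)
            .iter()
            .all(|&bit| (self.bits[(bit / 64) as usize] & (1u64 << (bit % 64))) != 0)
    }
}
```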

**Other user-state filters:** `saved`, `liked`, `in_progress` are loaded from the relationship store via `RelationshipStore::load_edge_set(user, edge_kind)`. These return a roaring bitmap of matching entity IDs. Cost: <100us per load (spec 04, Section 13).

---

## 8. Pagination

### 8.1 Cursor-Based Design

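tidalDB uses cursor-based pagination, not offset-based. Offset pagination (`LIMIT 50 OFFSET 100`) breaks under concurrent writes: if new items are inserted between pages, the user sees duplicates or gaps. Cursor pagination is stable: each page resumes from the `(score, entity_id)` position of the last item returned, so insertions ahead of that position do not shift later pages.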

### 8.2 Cursor Structure

```rust
/// Opaque pagination cursor. Encoded as a base64 string.
pub struct Cursor {
    /// Score of the last item on the previous page.
    /// The next page starts from items with score < last_score, or with
    /// score == last_score and entity_id < last_entity_id (tie-break).
    last_score: f64,
    /// Entity ID of the last item on the previous page.
    /// Tie-breaker: items with the same score are ordered by entity ID descending.
    last_entity_id: EntityId,
    /// Hash of the query parameters. Used to detect query changes between pages.
    query_hash: u64,
    /// Sequence number at cursor creation time. Used to detect stale cursors.
    created_at_seqno: u64,
}
```

### 8.3 Cursor Semantics

**Page 1 (no cursor):** Execute the full pipeline. Return the top `limit` results. Encode a cursor from the last result's score and entity ID.

**Page N (with cursor):** Execute the full pipeline but with an additional constraint: only consider candidates with `(score, entity_id) < (cursor.last_score, cursor.last_entity_id)` in the sort order. The pipeline generates candidates, filters, scores, and diversifies as normal, but the pagination stage skips results that precede the cursor position.
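
The keyset comparison can be sketched in a few lines. The sketch sorts by score descending, with entity ID descending as the tie-breaker, matching the `Cursor` field comments, and assumes `EntityId` is a `u64`. The real pipeline re-executes every stage per page; this sketch simply pages over an already-ranked vector. `CursorPos` and `page` are hypothetical names.

```rust
use std::cmp::Ordering;

/// Hypothetical decoded cursor position (EntityId assumed to be u64).
#[derive(Clone, Copy, Debug, PartialEq)]
struct CursorPos {
    last_score: f64,
    last_entity_id: u64,
}

/// Pipeline sort order: score descending, then entity ID descending.
fn rank_order(a: &(u64, f64), b: &(u64, f64)) -> Ordering {
    b.1.total_cmp(&a.1).then(b.0.cmp(&a.0))
}

/// Returns one page plus the cursor for the next page (None when exhausted).
fn page(mut ranked: Vec<(u64, f64)>, cursor: Option<CursorPos>, limit: usize) -> (Vec<(u64, f64)>, Option<CursorPos>) {
    ranked.sort_by(rank_order);
    // Skip everything at or before the cursor position in the sort order.
    let start = match cursor {
        Some(c) => {
            let pos = (c.last_entity_id, c.last_score);
            ranked
                .iter()
                .position(|it| rank_order(it, &pos) == Ordering::Greater)
                .unwrap_or(ranked.len())
        }
        None => 0,
    };
    let end = (start + limit).min(ranked.len());
    let items = ranked[start..end].to_vec();
    let next = if end < ranked.len() {
        items.last().map(|&(id, score)| CursorPos { last_score: score, last_entity_id: id })
    } else {
        None
    };
    (items, next)
}
```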

**Query-change detection:** The cursor contains a hash of the query parameters (profile, filters, sort, for_user). If the hash does not match the current query, `QueryError::InvalidCursor` is returned. This prevents confusing results from mixing parameters across pages.

**Cursor expiry:** Cursors do not expire by time. However, if the underlying data has changed significantly (e.g., a score recomputation shifted all scores), the cursor may produce slightly inconsistent results. A previously returned item may reappear if its score dropped below the cursor position, and an item whose score rose above the cursor position may be skipped. This is acceptable for content ranking -- strict consistency across pages is not required.

### 8.4 Alternative: Exclude IDs

For applications that prefer simplicity over cursor semantics, `exclude_ids` can be used. Pass the IDs from previous pages. The pipeline treats these as hard exclusions in Stage 2. This is less efficient than cursor-based pagination, because candidate generation re-fetches items that Stage 2 then discards and the exclusion list grows with every page. It is, however, simpler to implement on the application side.

### 8.5 Cursor Encoding

The cursor is serialized as a base64-encoded byte sequence:

```
Cursor Wire Format (32 bytes before base64)

+----------+-----------+-----------+----------+
| f64 LE   | u64 BE    | u64 LE    | u64 LE   |
| score    | entity_id | query_hash| seqno    |
| 8 bytes  | 8 bytes   | 8 bytes   | 8 bytes  |
+----------+-----------+-----------+----------+

Base64 encoded: 44 characters (with padding)
```

Entity ID uses big-endian for lexicographic sort compatibility with the storage engine's key encoding.
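
Here is a sketch of the layout above: four 8-byte fields, 32 bytes in total. It uses only `to_le_bytes`/`to_be_bytes` and assumes `EntityId` fits in a `u64`. Base64 encoding itself is left to whatever encoder the implementation uses. `base64_padded_len` only shows the length arithmetic: padded base64 of n bytes is 4 * ceil(n / 3) characters. All three functions are hypothetical.

```rust
/// Hypothetical encoder for the 4 x 8-byte layout (EntityId assumed to be u64).
fn encode_cursor(score: f64, entity_id: u64, query_hash: u64, seqno: u64) -> [u8; 32] {
    let mut out = [0u8; 32];
    out[0..8].copy_from_slice(&score.to_le_bytes());
    out[8..16].copy_from_slice(&entity_id.to_be_bytes()); // big-endian, as specified
    out[16..24].copy_from_slice(&query_hash.to_le_bytes());
    out[24..32].copy_from_slice(&seqno.to_le_bytes());
    out
}

/// Inverse of `encode_cursor`: (score, entity_id, query_hash, seqno).
fn decode_cursor(b: &[u8; 32]) -> (f64, u64, u64, u64) {
    let field = |start: usize| {
        let mut f = [0u8; 8];
        f.copy_from_slice(&b[start..start + 8]);
        f
    };
    (
        f64::from_le_bytes(field(0)),
        u64::from_be_bytes(field(8)),
        u64::from_le_bytes(field(16)),
        u64::from_le_bytes(field(24)),
    )
}

/// Padded base64 length for `n` input bytes: 4 * ceil(n / 3).
fn base64_padded_len(n: usize) -> usize {
    4 * ((n + 2) / 3)
}
```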

---

## 9. SUGGEST Operation

### 9.1 Architecture

SUGGEST bypasses the six-stage execution pipeline entirely. It is a lightweight operation designed for sub-10ms response times on every keystroke.

```
SUGGEST "jazz pia" FOR USER user_123

                    ┌─────────────────────┐
                    │  Parse prefix:       │
                    │  last_token = "pia"  │
                    │  context = "jazz"    │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
    ┌─────────▼──────┐ ┌──────▼────────┐ ┌─────▼──────────┐
    │ Term Prefix    │ │ Popular Query │ │ Personal       │
    │ Completions    │ │ Completions   │ │ History        │
    │                │ │               │ │                │
    │ TextIndex::    │ │ query_log     │ │ user_123's     │
    │ suggest()      │ │ signal-       │ │ recent         │
    │                │ │ weighted      │ │ searches       │
    │ "pia*" in term │ │ "jazz pia*"  │ │ and engaged    │
    │ dictionary     │ │ by click     │ │ items          │
    │                │ │ velocity     │ │                │
    └────────┬───────┘ └──────┬───────┘ └──────┬─────────┘
             │                │                │
             └────────────────┼────────────────┘
                              │
                    ┌─────────▼──────────┐
                    │ Merge, deduplicate, │
                    │ rank by:            │
                    │  1. Personal recency│
                    │  2. Query velocity  │
                    │  3. Term frequency  │
                    └─────────┬──────────┘
                              │
                              ▼
                    ["jazz piano",
                     "jazz piano tutorial",
                     "jazz piano chords",
                     "jazz pianist",
                     "jazz piano solo"]
```
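
The merge-and-rank box can be read as a lexicographic ordering: source priority first, then score. The sketch below assumes that reading and makes a few simplifications:

- The three ranking criteria are mapped onto `Origin`, a hypothetical, simplified stand-in for the `SuggestionSource` enum in Section 9.2. Its variants are personal history, popular query, and term completion.
- Duplicates are detected by exact text.
- For each string, the best-ranked entry is kept.

```rust
use std::collections::HashMap;

/// Simplified stand-in for SuggestionSource. derive(Ord) compares variants in
/// declaration order, so earlier variants are treated as higher priority.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Origin {
    PersonalHistory,
    PopularQuery,
    TermCompletion,
}

/// Deduplicates by exact text, keeping the higher-priority origin (then the
/// higher score), and returns the top `limit` sorted by origin priority and
/// score descending, with text as a deterministic final tie-breaker.
fn merge_suggestions(inputs: Vec<(String, Origin, f64)>, limit: usize) -> Vec<(String, Origin, f64)> {
    let mut best: HashMap<String, (Origin, f64)> = HashMap::new();
    for (text, origin, score) in inputs {
        let replace = match best.get(&text) {
            Some(&(o, s)) => origin < o || (origin == o && score > s),
            None => true,
        };
        if replace {
            best.insert(text, (origin, score));
        }
    }
    let mut out: Vec<(String, Origin, f64)> =
        best.into_iter().map(|(t, (o, s))| (t, o, s)).collect();
    out.sort_by(|a, b| a.1.cmp(&b.1).then(b.2.total_cmp(&a.2)).then(a.0.cmp(&b.0)));
    out.truncate(limit);
    out
}
```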

### 9.2 Response Type

```rust
pub struct SuggestResult {
    /// The suggested completion string.
    pub text: String,
    /// Source of the suggestion for UI rendering.
    pub source: SuggestionSource,
    /// Relevance/popularity score for ranking.
    pub score: f64,
}

pub enum SuggestionSource {
    /// From the term dictionary (term prefix completion).
    TermCompletion,
    /// From popular query tracking (search_click signal velocity).
    PopularQuery,
    /// From the user's personal search history.
    PersonalHistory,
    /// From trending queries (high-velocity recent searches).
    TrendingQuery,
}
```

### 9.3 Trending Searches (Empty Prefix)

When the prefix is empty, SUGGEST returns trending searches: queries with the highest search_click signal velocity in the recent window (1h or 24h). These are displayed in the search UI before the user types anything.

If `for_user` is provided, trending searches are personalized: the list includes a mix of globally trending queries and queries trending in the user's inferred cohort (auto-derived from attributes).

### 9.4 Performance

| Operation | Target | Conditions |
|-----------|--------|------------|
| Prefix autocomplete (typed prefix) | < 10ms p99 | 500K unique terms, 10M documents |
| Trending suggestions (empty prefix) | < 5ms p99 | In-memory signal state |
| Personalized suggestions | < 10ms p99 | User history in hot tier |

---

## 10. Query Context

Several query parameters modify the execution context without changing the pipeline structure. They inject additional state that the planner and executor consume.

### 10.1 FOR USER

```rust
for_user: Some("user_123")
```

Provides user context for personalization. Effects:

1. **User preference vector** is loaded and used as the query vector for `Candidate::Ann { query_vector: VectorSource::UserPreference }`.
2. **User state filters** become available (`unseen`, `saved`, `liked`, `in_progress`).
3. **Relationship exclusions** are active (`not_blocked`). The user's blocked set is loaded.
4. **Relationship boosts** are computed (`interaction_weight` edges from user to creators).
5. **Social proof** is computed (engagement overlap between user's social graph and candidates).
6. **Exploration injection** draws from creators outside the user's engagement graph.
7. **Auto cohort** derivation is possible (`CohortRef::Auto`).

Without `for_user`, the query is unpersonalized: no user state, no relationship filtering, no social proof. This is valid for global trending, category browse, and other unpersonalized surfaces.

### 10.2 FOR COHORT

```rust
for_cohort: Some(CohortRef::Named("young_us_jazz"))
```

Scopes signal aggregation to the specified cohort. The query engine resolves the cohort to a user bitmap, maps it to the signal system's dimensional hierarchy, and reads cohort-scoped signal aggregates instead of global aggregates.

Three cohort reference types (from Cohorts spec Section 8.4):

| CohortRef | Resolution |
|-----------|-----------|
| `Named("young_us_jazz")` | Look up named cohort in schema. Use cached bitmap. |
| `Predicate(Predicate::and(...))` | Evaluate predicate at query time. Build bitmap from attribute indexes. |
| `Auto` | Derive cohort from querying user's region, age_range, and top inferred interest. Requires `for_user`. |

### 10.3 CONTEXT

```rust
context: Some("feed")
```

A string tag identifying the discovery surface (feed, search, browse, related, notification, etc.). Context does not affect query execution directly. It is recorded alongside query results for the feedback loop (spec 10). When the user later interacts with a result, the feedback system knows which surface produced it, enabling per-surface ranking profile optimization.

### 10.4 SIMILAR TO

```rust
similar_to: Some(EntityId::from("item_abc"))
```

Anchors the query to a specific item. The anchor item's embedding is used as the query vector for ANN search (instead of the user's preference vector). Used for:

- Related content / "Up Next" (`RETRIEVE items SIMILAR TO item_abc`)
- Creator discovery (`RETRIEVE creators SIMILAR TO creator_xyz`)
- Visual similarity (`RETRIEVE items SIMILAR TO item_abc` with visual embedding slot)

If both `similar_to` and `for_user` are provided, the engine can blend the anchor embedding with the user preference vector:

```
query_vector = alpha * anchor_embedding + (1 - alpha) * user_preference
normalize(query_vector)
```

Where `alpha` is configurable (default: 0.7 -- biased toward the anchor). This produces "items similar to this one, tailored to this user's taste."
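
Here is the blend-then-normalize step, sketched for `f32` embeddings. `blend_query_vector` is a hypothetical helper. It returns `None` on mismatched dimensions or on a zero-length blend, which can happen when the two vectors point in opposite directions. That behavior is a design assumption; this spec does not specify it.

```rust
/// Blends an anchor embedding with a user preference vector,
/// alpha * anchor + (1 - alpha) * user_pref, then L2-normalizes the result.
/// Returns None if the dimensions differ or the blend has zero norm.
fn blend_query_vector(anchor: &[f32], user_pref: &[f32], alpha: f32) -> Option<Vec<f32>> {
    if anchor.len() != user_pref.len() {
        return None;
    }
    let mut v: Vec<f32> = anchor
        .iter()
        .zip(user_pref)
        .map(|(a, u)| alpha * a + (1.0 - alpha) * u)
        .collect();
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return None;
    }
    for x in &mut v {
        *x /= norm;
    }
    Some(v)
}
```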

---

## 11. Performance Targets

### 11.1 End-to-End Query Latency

| Query Type | Target p50 | Target p99 | Conditions |
|-----------|-----------|-----------|-----------|
| RETRIEVE (personalized feed, ANN) | < 30ms | < 50ms | 10M items, 1M users, warm cache |
| RETRIEVE (trending, scan) | < 20ms | < 40ms | 10M items, global velocity sort |
| RETRIEVE (following, relationship) | < 25ms | < 40ms | User follows 500 creators |
| RETRIEVE (cohort trending) | < 40ms | < 60ms | Includes cohort resolution |
| SEARCH (text only) | < 20ms | < 40ms | 10M items, 3-term query |
| SEARCH (hybrid text + vector) | < 40ms | < 60ms | 10M items, includes fusion |
| SEARCH WITHIN TRENDING FOR COHORT | < 45ms | < 70ms | Full composition |
| SUGGEST (typed prefix) | < 8ms | < 15ms | 500K terms, 10M documents |
| SUGGEST (trending, empty prefix) | < 3ms | < 8ms | In-memory signal state |

### 11.2 Per-Stage Performance Budget

The end-to-end budget is decomposed into per-stage budgets. Exceeding any single stage budget, or the total budget, is logged at WARN level.

```
Performance Budget Breakdown: RETRIEVE (personalized feed, ANN)
Target: < 30ms p50

Stage                           Budget (p50)    Notes
──────────────────────────────  ────────────    ─────────────────────────
1. Candidate generation (ANN)      12ms        HNSW search, ef_search=200
2. Filter evaluation                 2ms        Bitmap intersection
3. Signal loading                  0.1ms        200 candidates * 6 signals * 15ns
4. Scoring                          2ms         200 candidates, profile eval
5. Diversity enforcement            3ms         MMR with topic_diversity
6. Pagination                     0.1ms         Cursor encode/decode
──────────────────────────────  ────────────
Subtotal                          19.2ms
Overhead (plan, alloc, I/O)        3ms
──────────────────────────────  ────────────
Total                             22.2ms        Headroom: 7.8ms


Performance Budget Breakdown: SEARCH (hybrid text + vector)
Target: < 40ms p50

Stage                           Budget (p50)    Notes
──────────────────────────────  ────────────    ─────────────────────────
1. Candidate generation
   - Text retrieval (BM25)         8ms          Tantivy search, 3 terms
   - Vector retrieval (ANN)       10ms          HNSW search, ef_search=200
   (parallel, total = max)        10ms          Both legs run concurrently
   - Fusion (RRF)                  1ms          HashMap merge, sort
2. Filter evaluation                2ms          Bitmap intersection
3. Signal loading                 0.1ms          400 candidates * 6 signals
4. Scoring                         3ms           400 candidates, profile eval
5. Diversity enforcement           3ms           MMR
6. Pagination                    0.1ms
──────────────────────────────  ────────────
Subtotal                         19.2ms
Overhead (plan, alloc, I/O)       4ms
──────────────────────────────  ────────────
Total                            23.2ms          Headroom: 16.8ms


Performance Budget Breakdown: SEARCH WITHIN TRENDING FOR COHORT
Target: < 45ms p50

Phase                           Budget (p50)    Notes
──────────────────────────────  ────────────    ─────────────────────────
1. Cohort resolution                2ms          Cached bitmap intersection
2. Trending candidate gen         15ms           Scan cohort-tracked items
3. Search within trending           8ms          BM25 on 500 candidates +
                                                 brute-force vector on 500
4. Final ranking                    5ms          Signal load + scoring +
                                                 diversity + pagination
──────────────────────────────  ────────────
Subtotal                          30ms
Overhead (plan, alloc, I/O)        5ms
──────────────────────────────  ────────────
Total                             35ms           Headroom: 10ms
```
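The subtotal arithmetic charges concurrent legs at the cost of their slowest leg. The sketch below shows that accounting and checks it against the SEARCH (hybrid) breakdown. The `Stage` type and helper names are hypothetical.

```rust
/// One line item in a latency budget, in milliseconds.
pub enum Stage {
    /// Runs on the critical path by itself.
    Serial(f64),
    /// Legs that run concurrently; only the slowest leg contributes.
    Parallel(Vec<f64>),
}

/// Sum of stage costs, charging each parallel group at its maximum leg.
pub fn subtotal_ms(stages: &[Stage]) -> f64 {
    stages
        .iter()
        .map(|stage| match stage {
            Stage::Serial(ms) => *ms,
            Stage::Parallel(legs) => legs.iter().copied().fold(0.0, f64::max),
        })
        .sum()
}

/// Remaining headroom against a p50 target once fixed overhead is added.
pub fn headroom_ms(target_ms: f64, stages: &[Stage], overhead_ms: f64) -> f64 {
    target_ms - (subtotal_ms(stages) + overhead_ms)
}
```

Feeding in the SEARCH (hybrid) rows gives a 19.2ms subtotal and 16.8ms of headroom against the 40ms target. BM25 and ANN run in parallel at 8ms and 10ms, followed by RRF at 1ms and the remaining serial stages. The RETRIEVE rows give 19.2ms and 7.8ms.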

### 11.3 Throughput Targets

| Metric | Target | Conditions |
|--------|--------|-----------|
| RETRIEVE queries per second | > 2,000 QPS | 10M items, 8 cores, steady-state signal writes |
| SEARCH queries per second | > 1,000 QPS | 10M items, 8 cores, includes fusion |
| SUGGEST queries per second | > 10,000 QPS | Lightweight, in-memory |

The query engine is read-heavy by design. All data it reads is either immutable (entity metadata), lock-free atomic (hot tier signal state), or snapshot-isolated (Tantivy reader, USearch view). Concurrent queries do not contend with each other.

---

## 12. Query Caching

### 12.1 Philosophy: Cache Structure, Not Results

The query engine does **not** cache query results. Content ranking is inherently temporal -- signals decay, velocities change, new items arrive. A cached result set from 30 seconds ago may already be stale. Instead, tidalDB caches the **structural components** that are expensive to recompute but change infrequently.

### 12.2 What Is Cached

| Cached Structure | TTL | Invalidation | Rationale |
|------------------|-----|-------------|-----------|
| **Cohort membership bitmaps** | 5 minutes | On cohort predicate change or attribute write | Bitmap intersection for named cohorts is O(dimensions). Once computed, the bitmap is reused across all queries targeting that cohort. |
| **Filter bitmaps** | Per-query (request-scoped) | N/A -- built fresh per query | Filter bitmaps are computed from metadata indexes. They are cheap to build (roaring bitmap ops are <2ms) and are not shared across queries because filter combinations vary. |
| **User preference vectors** | Until next embedding write | On `update_embedding()` call | The user's preference vector is loaded once per query from the entity store. It does not change during query execution. |
| **User state sets (seen, blocked)** | Request-scoped with bloom filter | Bloom filter updated on signal write | The user's seen bloom filter is maintained in memory. The blocked set is loaded from the relationship store per query (~100us). |
| **Selectivity correlation cache** | Background refresh (every 60s) | On bulk metadata writes | Joint selectivity estimates for frequently co-occurring filter pairs. Maintained by the background materializer. |
| **Tantivy segment readers** | Until segment merge | On Tantivy commit/merge | Tantivy internally manages segment reader pools. The text index trait wraps this. Readers are snapshot-isolated and reused across queries. |
| **HNSW graph** | Persistent (memory-mapped) | On index rebuild | The USearch HNSW graph is memory-mapped and shared across all concurrent queries. No per-query caching needed. |
| **Social proof map** | Request-scoped | N/A | Built during query execution via depth-2 BFS. Not shared across queries because it depends on the querying user. |
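The cohort-bitmap row combines a TTL with explicit invalidation. The sketch below makes three assumptions: a hypothetical `CohortCache` type, a `Vec<u32>` of user IDs standing in for a roaring bitmap, and an explicit `now` argument so that expiry can be tested deterministically.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cohort-bitmap cache. `Vec<u32>` stands in for a roaring bitmap.
pub struct CohortCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Vec<u32>)>,
}

impl CohortCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the cached bitmap if it is younger than the TTL; otherwise
    /// recomputes it with `resolve` and caches the result.
    pub fn get_or_resolve(
        &mut self,
        cohort: &str,
        now: Instant,
        resolve: impl FnOnce() -> Vec<u32>,
    ) -> &[u32] {
        let fresh = match self.entries.get(cohort) {
            Some((cached_at, _)) => now.saturating_duration_since(*cached_at) < self.ttl,
            None => false,
        };
        if !fresh {
            self.entries.insert(cohort.to_string(), (now, resolve()));
        }
        &self.entries[cohort].1
    }

    /// Invalidation hook for a predicate change or attribute write.
    pub fn invalidate(&mut self, cohort: &str) {
        self.entries.remove(cohort);
    }
}
```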

### 12.3 What Is NOT Cached (and Why)

| Not Cached | Reason |
|------------|--------|
| **Query result sets** | Results depend on real-time signal state (decay scores, velocities). Caching would serve stale rankings. The cost of re-execution (<50ms) is lower than the correctness cost of stale results. |
| **Signal scores** | Signals are read from the hot tier with lock-free atomics (~15ns per read). Caching would add staleness without meaningful latency reduction. |
| **Scored/ranked candidates** | Scoring depends on the querying user's relationship state, social proof, and exploration injection. Two users with the same query get different scores. |
| **Trending candidate sets** | Trending velocity changes continuously. A 30-second-old trending set may have materially different rankings. |
| **Execution plans** | Plans are cheap to construct (<1ms) and depend on current selectivity estimates, which change with data writes. |

### 12.4 Warm-Up on Startup

On database startup, the following structures are warmed before the query engine accepts requests:

1. **HNSW index**: Memory-map the on-disk graph. Pre-fault pages for the entry-point neighborhood.
2. **Hot tier signal state**: Load recent signal events from the WAL into the hot tier's atomic arrays.
3. **Named cohort bitmaps**: Pre-compute membership bitmaps for all schema-defined cohorts.
4. **Tantivy readers**: Open segment readers for all entity types with text indexes.
5. **Bloom filters**: Rebuild per-user seen bloom filters from recent signal events (or load from checkpoint).

The warm-up sequence is logged with per-step timing. The database reports "ready" only after all warm-up steps complete. During warm-up, queries return `QueryError::NotReady`.
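The ready gate amounts to a flag that flips only after every warm-up step succeeds. The sketch below uses hypothetical names (`ReadinessGate`, `warm_up`) and a `QueryError` stand-in that has only `NotReady`. It uses `eprintln!` in place of the real structured logger.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Instant;

/// Stand-in for the spec's QueryError, trimmed to the variant used here.
#[derive(Debug, PartialEq)]
pub enum QueryError {
    NotReady,
}

/// Hypothetical readiness flag checked by every query entry point.
pub struct ReadinessGate {
    ready: AtomicBool,
}

impl ReadinessGate {
    pub fn new() -> Self {
        Self { ready: AtomicBool::new(false) }
    }

    /// Runs warm-up steps in order and logs per-step timing. The first
    /// failure aborts warm-up and leaves the gate closed.
    pub fn warm_up(
        &self,
        steps: &[&str],
        mut run_step: impl FnMut(&str) -> Result<(), String>,
    ) -> Result<(), String> {
        for &step in steps {
            let started = Instant::now();
            run_step(step)?;
            eprintln!("warm-up step {step} completed in {:?}", started.elapsed());
        }
        // Release pairs with the Acquire in `check`: a query that sees `true`
        // also sees what the warm-up thread wrote before opening the gate.
        self.ready.store(true, Ordering::Release);
        Ok(())
    }

    /// Called at the top of retrieve()/search()/suggest().
    pub fn check(&self) -> Result<(), QueryError> {
        if self.ready.load(Ordering::Acquire) {
            Ok(())
        } else {
            Err(QueryError::NotReady)
        }
    }
}
```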

### 12.5 Cache Sizing

| Structure | Memory per Unit | Sizing Formula | Example (10M items, 1M users) |
|-----------|----------------|----------------|-------------------------------|
| Cohort bitmap | ~1.2 MB per 10M items (roaring) | num_named_cohorts * 1.2 MB | 50 cohorts * 1.2 MB = 60 MB |
| User seen bloom filter | ~2 KB per user (128-bit, 10K items seen) | num_active_users * 2 KB | 100K active * 2 KB = 200 MB |
| Selectivity correlation cache | ~16 bytes per pair | top_100_pairs * 16 B | 100 * 16 B = 1.6 KB (negligible) |
| Hot tier signal state | 64 bytes per entity (cache-line aligned) | num_hot_entities * 64 B | 500K hot * 64 B = 32 MB |
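The seen-filter row depends on a target false-positive rate that this section does not state. The standard bloom filter sizing formulas let the row be checked for a chosen rate:

- Bits needed: m = -n ln(p) / (ln 2)^2
- Capacity of m bits: n = -m (ln 2)^2 / ln(p)
- Optimal hash count: k = (m / n) ln 2

Below is a minimal sketch with hypothetical helper names.

```rust
use std::f64::consts::LN_2;

/// Bits needed to hold `n_items` at false-positive rate `fp_rate`.
pub fn bloom_bits(n_items: f64, fp_rate: f64) -> f64 {
    -n_items * fp_rate.ln() / LN_2.powi(2)
}

/// Items a filter of `m_bits` can hold at false-positive rate `fp_rate`.
pub fn bloom_capacity(m_bits: f64, fp_rate: f64) -> f64 {
    -m_bits * LN_2.powi(2) / fp_rate.ln()
}

/// Optimal number of hash functions for the given size and load.
pub fn bloom_hashes(m_bits: f64, n_items: f64) -> u32 {
    ((m_bits / n_items) * LN_2).round().max(1.0) as u32
}
```

Assume a 1% target rate. Then 10K seen items need about 95,900 bits (roughly 12 KB) and 7 hash functions. A 2 KB filter (16,384 bits) holds about 1,700 items at that rate. The per-user figure, and the 200 MB total derived from it, therefore depends on the false-positive rate the seen filter is allowed.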

---

## 13. Error Handling and Fallbacks

### 13.1 Design Principle: Degrade, Do Not Fail

The query engine follows a strict hierarchy: **correct results > degraded results > empty results > error**. An error is returned only when the engine cannot produce any meaningful result. In all other cases, the engine degrades gracefully and annotates the response with warnings that explain what was degraded and why.

### 13.2 Per-Stage Fallback Strategies

```
Error Handling by Pipeline Stage

Stage 1: Candidate Generation
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ HNSW index unavail-   │ Fall back to Scan with    │ VectorIndex-     │
│ able (corrupt, not    │ signal-based sort.         │ Unavailable      │
│ loaded)               │ Personalization lost.      │                  │
│                       │                           │                  │
│ User pref vector      │ Use population centroid    │ UsingDefault-    │
│ missing               │ vector (spec 07, cold     │ Vector           │
│                       │ start). Results are        │                  │
│                       │ unpersonalized.            │                  │
│                       │                           │                  │
│ Tantivy index         │ Fall back to vector-only   │ TextIndex-       │
│ unavailable           │ search (if embedding       │ Unavailable      │
│                       │ provided) or return empty. │                  │
│                       │                           │                  │
│ Relationship store    │ Skip relationship-based    │ Relationship-    │
│ read error            │ candidates. Fall back to   │ StoreUnavailable │
│                       │ Scan with trending sort.   │                  │
│                       │                           │                  │
│ CohortTrending: zero  │ Fall back to global        │ EmptyTrending-   │
│ trending items        │ trending (drop cohort      │ Set              │
│                       │ scope).                    │                  │
└──────────────────────────────────────────────────────────────────────┘

Stage 2: Filter Evaluation
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ Metadata bitmap       │ Skip that filter           │ FilterSkipped    │
│ missing (field not    │ dimension. Return results  │ { field }        │
│ indexed)              │ that may not satisfy the   │                  │
│                       │ missing filter.            │                  │
│                       │                           │                  │
│ User seen bloom       │ Skip unseen filter.        │ SeenFilter-      │
│ filter unavailable    │ User may see previously    │ Unavailable      │
│ (cold start)          │ seen items. Acceptable     │                  │
│                       │ for first session.         │                  │
│                       │                           │                  │
│ Blocked set load      │ THIS IS NOT DEGRADABLE.    │ N/A -- returns   │
│ failure               │ Return QueryError::        │ Err              │
│                       │ Internal. Blocked content  │                  │
│                       │ must never appear.         │                  │
└──────────────────────────────────────────────────────────────────────┘

Stage 3: Signal Loading
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ Hot tier miss         │ Read from warm tier        │ None (expected   │
│ (entity evicted)      │ (hash table lookup).       │ for cold items)  │
│                       │ If warm miss, read from    │                  │
│                       │ cold tier (disk).          │                  │
│                       │                           │                  │
│ Warm tier read error  │ Use zero signal values.    │ SignalDegraded   │
│                       │ Item scored on retrieval   │ { count }        │
│                       │ score and metadata only.   │                  │
│                       │                           │                  │
│ All signal tiers      │ Score using retrieval      │ SignalSystem-    │
│ unavailable           │ score + metadata only.     │ Unavailable      │
│                       │ Ranking is degraded but    │                  │
│                       │ results are returned.      │                  │
└──────────────────────────────────────────────────────────────────────┘

Stage 4: Scoring
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ Social proof compute  │ Skip social proof term.    │ SocialProof-     │
│ timeout (>10ms)       │ Score without it.          │ Timeout          │
│                       │                           │                  │
│ Relationship weight   │ Skip relationship boost    │ Relationship-    │
│ load failure          │ terms. Score without them. │ BoostSkipped     │
│                       │                           │                  │
│ NaN/Inf in score      │ Replace with 0.0 and log   │ ScoreAnomaly     │
│ computation           │ at WARN. Likely a bug in   │ { entity_id }    │
│                       │ profile definition.        │                  │
└──────────────────────────────────────────────────────────────────────┘

Stage 5: Diversity Enforcement
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ MMR embedding load    │ Skip topic_diversity       │ DiversityMMR-    │
│ failure (missing      │ enforcement. Apply only    │ Skipped          │
│ embeddings for some   │ max_per_creator and        │                  │
│ candidates)           │ format_mix.                │                  │
│                       │                           │                  │
│ Insufficient candi-   │ Return whatever candidates │ InsufficientFor- │
│ dates after diversity │ survived. Do not pad with  │ Diversity        │
│ enforcement           │ lower-quality items.       │                  │
└──────────────────────────────────────────────────────────────────────┘

Stage 6: Pagination
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ Invalid cursor        │ Return QueryError::        │ N/A -- returns   │
│ (decode failure)      │ InvalidCursor. Client      │ Err              │
│                       │ must restart from page 1.  │                  │
│                       │                           │                  │
│ Stale cursor (query   │ Return QueryError::        │ N/A -- returns   │
│ hash mismatch)        │ InvalidCursor with         │ Err              │
│                       │ explanation.               │                  │
└──────────────────────────────────────────────────────────────────────┘
```
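Both Stage 6 failure modes can be detected at decode time if the cursor carries a hash of the query that produced it. Section 8 covers pagination, so the encoding below is illustrative only. It assumes a hypothetical text format, `hash.score_bits.entity_id`, and a `QueryError` stand-in that has only `InvalidCursor`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq)]
pub enum QueryError {
    InvalidCursor(String),
}

/// Keyset position: resume after the (score, entity_id) of the last item served.
#[derive(Debug, PartialEq)]
pub struct Cursor {
    pub query_hash: u64,
    pub last_score_bits: u64, // f64::to_bits, avoids lossy float formatting
    pub last_entity_id: String,
}

/// Hash of the query's semantic fields. DefaultHasher output is not guaranteed
/// stable across Rust releases; a persisted cursor would need a stable hash.
pub fn query_hash<Q: Hash>(query: &Q) -> u64 {
    let mut hasher = DefaultHasher::new();
    query.hash(&mut hasher);
    hasher.finish()
}

impl Cursor {
    pub fn encode(&self) -> String {
        format!("{:016x}.{:016x}.{}", self.query_hash, self.last_score_bits, self.last_entity_id)
    }

    pub fn decode(encoded: &str, expected_hash: u64) -> Result<Cursor, QueryError> {
        let bad = |why: &str| QueryError::InvalidCursor(why.to_string());
        // splitn(3) keeps any '.' inside the entity id in the last field.
        let mut fields = encoded.splitn(3, '.');
        let (hash_hex, score_hex, entity) = match (fields.next(), fields.next(), fields.next()) {
            (Some(h), Some(s), Some(e)) if !e.is_empty() => (h, s, e),
            _ => return Err(bad("decode failure: expected 3 fields")),
        };
        let query_hash = u64::from_str_radix(hash_hex, 16).map_err(|_| bad("decode failure: hash"))?;
        let last_score_bits =
            u64::from_str_radix(score_hex, 16).map_err(|_| bad("decode failure: score"))?;
        if query_hash != expected_hash {
            return Err(bad("stale cursor: query hash mismatch, restart from page 1"));
        }
        Ok(Cursor { query_hash, last_score_bits, last_entity_id: entity.to_string() })
    }
}
```

The planner would recompute `query_hash` over the same fields on every page request. Any change to the profile, filters, or user then yields a stale-cursor error instead of silently mixing two result sets.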

### 13.3 Non-Degradable Invariants

Some invariants cannot be traded for availability. If these fail, the query engine returns an error rather than degraded results.

| Invariant | Why It Cannot Degrade |
|-----------|----------------------|
| **Blocked content exclusion** | Trust and safety. Returning blocked content violates the user's explicit boundary. This is not a ranking quality issue -- it is a correctness requirement. |
| **Hidden content exclusion** | Same as blocked. The user has explicitly said "never show me this." |
| **Profile existence** | If the profile does not exist, the engine cannot score candidates. No meaningful ranking is possible. |
| **Entity type existence** | If the entity type is not in the schema, the engine does not know which store, index, or signal definitions to use. |
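The split between degradable stages and non-degradable invariants can be sketched in a single scoring loop. A blocked-set failure aborts the query with `Internal`. A failed signal read, or a non-finite score, degrades the result and records a warning. Everything in the sketch is hypothetical:

- the `score_candidates` function;
- trimmed `QueryWarning` and `QueryError` stand-ins;
- `u64` in place of `EntityId`;
- a placeholder additive score.

```rust
use std::collections::HashSet;

/// Stand-ins for the spec's enums, trimmed to the variants used here.
#[derive(Debug, PartialEq)]
pub enum QueryWarning {
    SignalDegraded { count: usize },
    ScoreAnomaly { entity_id: u64 },
}

#[derive(Debug, PartialEq)]
pub enum QueryError {
    Internal(String),
}

/// Scores (entity_id, retrieval_score) pairs, highest first.
pub fn score_candidates(
    candidates: &[(u64, f64)],
    load_blocked: impl FnOnce() -> Result<HashSet<u64>, String>,
    load_signal: impl Fn(u64) -> Result<f64, String>,
    warnings: &mut Vec<QueryWarning>,
) -> Result<Vec<(u64, f64)>, QueryError> {
    // Non-degradable: without the blocked set, exclusion cannot be guaranteed.
    let blocked = load_blocked().map_err(QueryError::Internal)?;

    let mut signal_failures = 0;
    let mut scored = Vec::with_capacity(candidates.len());
    for &(id, retrieval) in candidates {
        if blocked.contains(&id) {
            continue;
        }
        // Degradable: a failed read contributes zero signal.
        let signal = load_signal(id).unwrap_or_else(|_| {
            signal_failures += 1;
            0.0
        });
        // Placeholder combination; a real profile weights many terms.
        let mut score = retrieval + signal;
        if !score.is_finite() {
            warnings.push(QueryWarning::ScoreAnomaly { entity_id: id });
            score = 0.0;
        }
        scored.push((id, score));
    }
    if signal_failures > 0 {
        warnings.push(QueryWarning::SignalDegraded { count: signal_failures });
    }
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    Ok(scored)
}
```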

### 13.4 Warning Accumulation

Warnings are accumulated during query execution and returned alongside results. The caller can inspect warnings to understand degradation and surface appropriate UI cues.

```rust
/// Warnings emitted during query execution.
/// Accumulated in the response, never swallowed silently.
pub enum QueryWarning {
    /// Vector index was unavailable. Results are not personalized.
    VectorIndexUnavailable,
    /// User's preference vector was missing. Population default used.
    UsingDefaultVector,
    /// Text index was unavailable. Search results are vector-only.
    TextIndexUnavailable,
    /// Relationship store read failed. Social features disabled.
    RelationshipStoreUnavailable,
    /// Cohort trending set was empty. Fell back to global trending.
    EmptyTrendingSet,
    /// Cohort population too small. Fell back to parent cohort.
    InsufficientCohortPopulation { cohort: String, parent: String },
    /// A filter was skipped due to missing index.
    FilterSkipped { field: String },
    /// User seen filter unavailable (bloom filter not loaded).
    SeenFilterUnavailable,
    /// Signal state was unavailable for some candidates.
    SignalDegraded { count: usize },
    /// Signal system entirely unavailable. Ranking is metadata-only.
    SignalSystemUnavailable,
    /// Social proof computation timed out.
    SocialProofTimeout,
    /// Relationship boosts were skipped.
    RelationshipBoostSkipped,
    /// Score anomaly detected (NaN/Inf replaced with 0.0).
    ScoreAnomaly { entity_id: EntityId },
    /// MMR diversity skipped due to missing embeddings.
    DiversityMMRSkipped,
    /// Fewer results than requested after diversity enforcement.
    InsufficientForDiversity { requested: usize, returned: usize },
}

/// Query results with warnings.
pub struct Results {
    /// The ranked result set.
    pub results: Vec<ResultItem>,
    /// Pagination cursor for the next page.
    pub next_cursor: Option<String>,
    /// Total candidates before pagination.
    pub total_candidates: usize,
    /// Warnings about degraded behavior during this query.
    pub warnings: Vec<QueryWarning>,
}
```

### 13.5 Observability on Degradation

Every fallback path logs at a level proportional to its severity:

| Severity | Log Level | Examples |
|----------|-----------|---------|
| **Expected** | DEBUG | Hot tier miss -> warm tier read. Population default vector. |
| **Degraded** | WARN | Signal system unavailable. Text index unavailable. Social proof timeout. NaN/Inf score replaced with 0.0. |
| **Critical** | ERROR | Blocked set load failure (query returns error). |

Additionally, the query engine emits structured metrics for monitoring:

- `query.warnings_total` (counter, tagged by warning type) -- rate of each warning type
- `query.fallback_total` (counter, tagged by fallback type) -- rate of each fallback activation
- `query.degraded_total` (counter) -- total queries with at least one warning

Operators can alert on `query.degraded_total` rate exceeding a threshold (e.g., >5% of queries degraded) to catch systemic subsystem failures.

---

## 14. Integration Architecture

### 14.1 Subsystem Coordination

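The query engine coordinates six subsystems, each behind a narrow interface. The vector index, text index, relationship store, and entity store are trait objects. The signal ledger and cohort system are concrete internal types. No subsystem knows about any other. The query engine is the only module that holds references to all of them.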

```
                    ┌─────────────────────────────────────────────────────────────┐
                    │                      QUERY ENGINE                            │
                    │                                                             │
                    │  retrieve() / search() / suggest()                          │
                    │       │                                                     │
                    │       ▼                                                     │
                    │  ┌──────────┐   ┌──────────┐   ┌────────────────────────┐  │
                    │  │ Parser   │──►│ Planner  │──►│ Executor               │  │
                    │  └──────────┘   └──────────┘   │                        │  │
                    │                                 │  Stage 1 ─► Stage 6   │  │
                    │                                 └────────────┬───────────┘  │
                    └─────────────────────────────────────────────┬───────────────┘
                                                                  │
                    ┌──────────────┬──────────────┬───────────────┼────────────────┐
                    │              │              │               │                │
            ┌───────▼──────┐ ┌────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐ ┌───────▼──────┐
            │ VectorIndex  │ │ TextIndex │ │ Signal    │ │ Relationship│ │ Entity       │
            │ (trait)      │ │ (trait)   │ │ Ledger    │ │ Store       │ │ Store        │
            │              │ │           │ │ (hot tier)│ │ (trait)     │ │ (trait)      │
            │ USearch HNSW │ │ Tantivy   │ │           │ │             │ │              │
            │ or           │ │ or        │ │ Atomic    │ │ Dual-index  │ │ redb or      │
            │ BruteForce   │ │ MockText  │ │ reads     │ │ forward/rev │ │ fjall        │
            └──────────────┘ └───────────┘ └───────────┘ └─────────────┘ └──────────────┘
                    │              │              │               │                │
                    │         Spec 06       Spec 03         Spec 04          Spec 01-02
                    │
               Spec 07

            ┌──────────────┐
            │ Cohort       │
            │ System       │
            │              │
            │ Bitmap       │
            │ resolution   │
            │ + signal     │
            │ dimensional  │
            │ hierarchy    │
            └──────────────┘
                    │
               Spec 05
```

### 14.2 Trait Dependencies

```rust
/// The query engine holds references to all subsystems via trait objects.
pub struct QueryEngine {
    vector_index: Arc<dyn VectorIndex>,
    text_index: Arc<dyn TextIndex>,
    signal_ledger: Arc<SignalLedger>,
    relationship_store: Arc<dyn RelationshipStore>,
    entity_store: Arc<dyn EntityStore>,
    cohort_system: Arc<CohortSystem>,
    schema_catalog: Arc<SchemaCatalog>,
}
```

Every external dependency is accessed through a trait (`VectorIndex`, `TextIndex`, `RelationshipStore`, `EntityStore`). The signal ledger and cohort system are internal (not backed by external libraries) but are still accessed through well-defined interfaces. This enables:

1. **Unit testing** with mock implementations of every trait-backed subsystem.
2. **Swapping implementations** (e.g., replacing USearch with a custom HNSW) without touching query engine code.
3. **Performance isolation** -- a slow subsystem can be profiled independently.
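Because each external dependency sits behind `Arc<dyn Trait>`, a test can hand the engine a mock. The sketch below uses a hypothetical single-method `TextIndex` and a one-field `QueryEngine`. The real trait from spec 06 has a richer signature.

```rust
use std::sync::Arc;

/// Hypothetical, trimmed-down stand-in for the spec's TextIndex trait.
pub trait TextIndex: Send + Sync {
    fn search(&self, query: &str, k: usize) -> Vec<(u64, f32)>;
}

/// Test double that returns canned (entity_id, score) hits in order.
pub struct MockText {
    pub hits: Vec<(u64, f32)>,
}

impl TextIndex for MockText {
    fn search(&self, _query: &str, k: usize) -> Vec<(u64, f32)> {
        self.hits.iter().take(k).cloned().collect()
    }
}

pub struct QueryEngine {
    text_index: Arc<dyn TextIndex>,
}

impl QueryEngine {
    pub fn new(text_index: Arc<dyn TextIndex>) -> Self {
        Self { text_index }
    }

    /// Returns the best text hit, if any.
    pub fn top_text_hit(&self, query: &str) -> Option<u64> {
        self.text_index.search(query, 1).first().map(|&(id, _)| id)
    }
}
```

The same pattern extends to `VectorIndex`, `RelationshipStore`, and `EntityStore`. The `Send + Sync` supertraits are what make `QueryEngine` itself `Send + Sync`, which is the bound invariant 9 relies on.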

### 14.3 Data Flow: RETRIEVE Personalized Feed

```
db.retrieve(Retrieve {
    entity: Item,
    for_user: Some("user_123"),
    profile: "for_you",
    filters: [unseen, not_blocked, eq("format", "video")],
    diversity: Some(DiversitySpec { max_per_creator: 2, format_mix: true }),
    limit: 50,
})

  1. Parser:
     - Resolve "for_you" profile --> ProfileDef with Candidate::Ann
     - Validate filters against Item entity definition
     - Verify user_123 exists

  2. Planner:
     - Load user_123 preference vector from entity store
     - Build filter bitmap: format_bitmap["video"]
     - Estimate selectivity: ~15% (video format)
     - Select ANN strategy: InGraphFilter (selectivity > 1%)
     - Build scoring plan from profile (view velocity, interaction_weight, ...)
     - Build diversity plan (max_per_creator: 2, format_mix: true)

  3. Executor:
     Stage 1 (ANN):
       vector_index.filtered_search(
         user_preference_vector, k=500,
         |entity_id| filter_bitmap.contains(entity_id)
       )
       --> 500 candidate (entity_id, similarity) pairs

     Stage 2 (Filter):
       Load user_123 seen bloom filter
       Load user_123 blocked creator set
       Remove seen items, remove blocked items
       --> ~350 candidates

     Stage 3 (Signal Load):
       For each candidate, load hot tier signal state:
         view.decay_score, view.velocity(24h), like.decay_score,
         skip.decay_score(24h), completion.value(all_time)
       --> 350 candidates with signal snapshots

     Stage 4 (Scoring):
       Apply profile: base = similarity_score
         + 0.3 * view.velocity(24h)
         + 0.2 * interaction_weight(user, creator)
         + 0.15 * social_proof
         * temporal_decay(created_at, 48h half-life)
         - 0.5 * skip(24h)
       Gate: completion(all_time) >= 0.3
       Exclude: hide signal present
       --> ~250 scored, sorted candidates

     Stage 5 (Diversity):
       max_per_creator: 2 (demote extras)
       format_mix: interleave video/short/article
       --> 250 reordered candidates

     Stage 6 (Pagination):
       Slice [0..50]
       Encode next_cursor from result[49]
       --> Results { results: [50], next_cursor: Some(...) }
```
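Stage 5's `max_per_creator` step demotes extras rather than dropping them. The sketch below shows that demotion, assuming candidates arrive sorted by score and using plain string IDs. The function name is hypothetical, and `format_mix` interleaving is omitted.

```rust
use std::collections::HashMap;

/// Keeps the first `max_per_creator` items per creator in score order and
/// moves overflow items, also in score order, behind all kept items.
/// Input: (entity_id, creator_id, score), sorted by score descending.
pub fn enforce_max_per_creator<'a>(
    ranked: &[(&'a str, &'a str, f64)],
    max_per_creator: usize,
) -> Vec<&'a str> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    let mut kept = Vec::new();
    let mut demoted = Vec::new();
    for &(entity, creator, _score) in ranked {
        let seen = counts.entry(creator).or_insert(0);
        if *seen < max_per_creator {
            *seen += 1;
            kept.push(entity);
        } else {
            demoted.push(entity);
        }
    }
    kept.extend(demoted);
    kept
}
```

Stage 6 then slices the first `limit` entries. Kept and demoted items each retain score order, so relative ordering within each group is preserved, consistent with invariant 2.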

### 14.4 Data Flow: SEARCH WITHIN TRENDING FOR COHORT

```
db.search(Search {
    query: "piano",
    vector: Some(query_embedding),
    entity: Item,
    profile: "search",
    within_trending: Some(WithinTrending {
        cohort: CohortRef::Named("young_us_jazz"),
        window: Window::hours(24),
        min_velocity: None,
        max_candidates: Some(500),
    }),
    limit: 20,
})

  1. Parser:
     - Resolve "search" profile
     - Parse "piano" --> SearchQuery::Term("piano")
     - Resolve "young_us_jazz" cohort
     - Validate all inputs

  2. Planner:
     - Detect WithinTrending --> CompositionPlan
     - Resolve cohort bitmap (cached intersection of region:US, age:18-24, jazz)
     - Check cohort population: 45,000 active users in 24h --> sufficient
     - Plan: Phase 1 (cohort) + Phase 2 (trending) + Phase 3 (search) + Phase 4 (rank)

  3. Executor:
     Phase 1 (Cohort Resolution): 1.5ms
       cohort_system.resolve("young_us_jazz")
       --> bitmap D, cardinality 45,000

     Phase 2 (Trending Candidates): 12ms
       signal_ledger.scan_cohort_velocity(
         cohort: "young_us_jazz",
         signal: "view",
         window: 24h,
       )
       --> 500 items with highest cohort velocity
       --> trending_ids bitmap

     Phase 3 (Search Within): 8ms
       text_index.score_candidates(Item, SearchQuery::Term("piano"), &trending_ids)
       --> 73 items matching "piano" with BM25 scores

       Brute-force vector distance on 500 trending items:
       --> 500 similarity scores

       RRF fusion:
       --> 73 fused candidates (items matching text + in trending set)

     Phase 4 (Final Ranking): 4ms
       Load signals, apply scoring profile
       final_score = 0.6 * fused_relevance + 0.4 * normalized_velocity + boosts
       Diversity: max_per_creator: 2
       --> Results { results: [20], next_cursor: Some(...) }

     Total: ~25.5ms
```
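Phase 3 fuses the BM25 and brute-force vector rankings with Reciprocal Rank Fusion (RRF), keeping only items that the text leg matched. The sketch below works over entity IDs with 1-based ranks, and the helper names are hypothetical. The test uses `k = 60`, the constant commonly used since the original RRF paper; this spec does not fix that value.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
/// where rank is 1-based. Items absent from a list get no term from it.
pub fn rrf(lists: &[&[u64]], k: f64) -> HashMap<u64, f64> {
    let mut fused = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *fused.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    fused
}

/// Fuses both legs, then keeps only text hits (the 73 of 500 in the trace
/// above), sorted by fused score descending with ties broken by id.
pub fn fuse_within_text_hits(text_ranked: &[u64], vector_ranked: &[u64], k: f64) -> Vec<(u64, f64)> {
    let fused = rrf(&[text_ranked, vector_ranked], k);
    let mut out: Vec<(u64, f64)> = text_ranked.iter().map(|id| (*id, fused[id])).collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out
}
```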

---

## 15. Invariants and Correctness Guarantees

These invariants must hold at all times. Property tests, integration tests, and crash recovery tests enforce them.

| # | Invariant | Test Strategy |
|---|-----------|--------------|
| 1 | **Every returned result passed all filters.** No result in the response violates any filter predicate specified in the query. | Property test: for every result in response, assert all filters hold. Fuzz test: random filter combinations, verify all results pass. |
| 2 | **Results are sorted by final_score descending** (within diversity reordering tolerance). After diversity enforcement, the relative score order is preserved within each diversity bucket. | Property test: verify sort order of results. Integration test: compare sorted output to naive sort of same candidates. |
| 3 | **Blocked content never appears.** If `for_user` is provided and the user has blocked a creator or item, that content is never in the result set -- regardless of score, filter, or diversity settings. | Property test: inject blocked relationships, verify zero results from blocked creators/items across all query types. |
| 4 | **Hidden content never appears.** If a user has sent a `hide` signal for an item, that item never appears for that user. | Same as above for hide signals. |
| 5 | **Cursor pagination does not produce duplicates.** Given stable data, paginating through a result set with cursors produces each result exactly once. | Integration test: paginate through full result set, verify no duplicate entity IDs. |
| 6 | **Composition restricts, not filters.** `WITHIN TRENDING` operates as a candidate generation strategy. Every result in a composed query has non-zero trending velocity in the specified cohort and window. | Property test: for every result in composed query, assert cohort velocity > 0. |
| 7 | **Gated candidates are excluded.** If a ranking profile defines a `Gate::min(signal, window, threshold)`, no result with a signal value below the threshold appears in the response. | Property test: inject candidates below gate threshold, verify they are absent from results. |
| 8 | **Diversity max_per_creator is respected.** If `max_per_creator: 2`, no more than 2 items from any single creator appear in the result set. | Property test: count per-creator items in results, assert <= max_per_creator. |
| 9 | **The query engine holds no mutable state.** The engine is a pure function of its inputs and the current state of the subsystems it reads from. Two identical queries at the same moment produce identical results. | Architecture invariant: no mutable fields on QueryEngine. Verified by code review and Sync + Send bounds. |
| 10 | **Unknown profiles, fields, or cohorts produce typed errors, not panics.** Every invalid reference produces a `QueryError` variant. The engine never panics on user input. | Fuzz test: random strings for profile names, field names, cohort names. Verify all return `Err`, never panic. |
| 11 | **Signal reads use Acquire ordering.** Every load from the hot tier's `AtomicU64` fields uses `Ordering::Acquire`, pairing with the `Ordering::Release` stores made by signal writers. When a load observes a value written by a concurrent writer, every write that writer made before its Release store is also visible to the reader. | Code review + integration test: concurrent signal write + query read, verify monotonic score progression. |
| 12 | **Empty results are not errors.** A query with filters that match no items returns `Results { results: [], next_cursor: None, total_candidates: 0 }`. Not an error. | Unit test: query with impossible filter combination returns empty Results, not Err. |
| 13 | **Fallback on insufficient cohort population.** When a cohort has fewer active users than the minimum threshold, the engine falls back to a parent cohort and emits a warning. It does not return an error or an empty set (unless no parent meets the threshold either). | Integration test: create tiny cohort, query with FOR COHORT, verify fallback to parent and warning in response. |
| 14 | **Query plan is logged for every query.** At DEBUG level, every query logs its execution plan including strategy, estimated selectivity, candidate count, and latency. This is the primary observability mechanism. | Integration test: verify log output contains expected plan fields for each query type. |
| 15 | **Every degradation is surfaced as a warning.** If the query engine takes any fallback path (missing vector, signal tier miss, skipped filter, etc.), it emits a `QueryWarning` in the response. No degradation is silently swallowed. | Integration test: disable each subsystem, verify corresponding warning appears in response. |
| 16 | **Queries before warm-up return NotReady, not incorrect results.** The database does not serve queries until all warm-up steps (HNSW load, WAL replay, bloom filter rebuild, cohort bitmap computation) have completed. | Integration test: issue query before warm-up completes, verify `QueryError::NotReady`. |
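The sketch below illustrates invariant 11 and the integration test that verifies it. It is a minimal illustration, assuming a hot-tier cell that stores an `f64` score as raw bits in an `AtomicU64` beside a companion `updated_at` field; `HotScore`, `publish`, `read`, and `concurrent_check` are hypothetical names, not tidalDB's actual types. The writer stores `updated_at` first and then publishes the score with `Release`, so a reader whose `Acquire` load sees score *i* is guaranteed to see an `updated_at` of at least *i*. The non-decreasing scores a single reader observes come from per-location coherence, which Rust's atomics provide at every ordering.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical hot-tier cell: an `f64` score stored as raw bits in an
/// `AtomicU64`, plus a companion field the writer sets before publishing.
pub struct HotScore {
    updated_at: AtomicU64,
    score_bits: AtomicU64,
}

impl HotScore {
    pub fn new() -> Self {
        HotScore {
            updated_at: AtomicU64::new(0),
            score_bits: AtomicU64::new(0f64.to_bits()),
        }
    }

    /// Writer side: write the companion field, then publish the score with
    /// `Release` so the earlier write is visible to `Acquire` readers.
    pub fn publish(&self, score: f64, updated_at: u64) {
        self.updated_at.store(updated_at, Ordering::Relaxed);
        self.score_bits.store(score.to_bits(), Ordering::Release);
    }

    /// Reader side: the `Acquire` load pairs with the writer's `Release` store.
    pub fn read(&self) -> (f64, u64) {
        let score = f64::from_bits(self.score_bits.load(Ordering::Acquire));
        let updated_at = self.updated_at.load(Ordering::Relaxed);
        (score, updated_at)
    }
}

/// One writer publishes scores 1..=n (with `updated_at` equal to the score)
/// while one reader polls. Returns true if the reader never saw the score
/// decrease and never saw a score without the write that preceded it.
pub fn concurrent_check(n: u64) -> bool {
    let cell = Arc::new(HotScore::new());
    let writer_cell = Arc::clone(&cell);
    let writer = thread::spawn(move || {
        for i in 1..=n {
            writer_cell.publish(i as f64, i);
        }
    });

    let mut last = 0.0_f64;
    let mut ok = true;
    loop {
        let (score, updated_at) = cell.read();
        // Coherence: a later load of the same atomic never returns an older value.
        if score < last {
            ok = false;
        }
        // Acquire/Release: observing score i implies observing updated_at >= i.
        if (updated_at as f64) < score {
            ok = false;
        }
        last = score;
        if score >= n as f64 {
            break;
        }
    }
    writer.join().unwrap();
    ok
}
```

A single run can pass trivially if the writer finishes before the reader starts polling, so a check like this is a smoke test for the invariant rather than a proof of it.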

---

## Appendix A: Query Error Reference

| Error | When | Recovery |
|-------|------|----------|
| `UnknownProfile(name)` | Profile name not in schema catalog | Define the profile via `define_profile()` |
| `InvalidFilter { field, reason }` | Filter references unknown field or type mismatch | Check entity definition for valid field names and types |
| `UnknownCohort(name)` | Named cohort not defined in schema | Define the cohort via `define_cohort()` |
| `MissingUserForAutoCohort` | `CohortRef::Auto` used without `for_user` | Provide `for_user` or use `CohortRef::Named` |
| `UnknownUser(id)` | User ID not in entity store | Ingest the user via `write_user()` |
| `InvalidQuery(msg)` | Search query string has a syntax error | Fix query syntax (unbalanced quotes, empty phrase, etc.) |
| `InvalidCursor(msg)` | Cursor hash mismatch or decode failure | Start from page 1 (no cursor) |
| `MissingVector(msg)` | ANN candidate strategy requires a vector that does not exist | Provide embedding or use a non-ANN profile |
| `Internal(msg)` | Subsystem failure (storage I/O, index corruption) | Check logs, restart database if persistent |
| `NotReady` | Database is still warming up (loading HNSW, replaying WAL, rebuilding bloom filters, computing cohort bitmaps) | Retry after startup completes; monitor the readiness health check |
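The variants above map naturally onto a single Rust error enum. The following is a minimal sketch, assuming `String` payloads and a `u64` user id (the real types live in the entity model and may differ); `recovery()` and `resolve_profile` are hypothetical helpers, not part of the documented API. It also shows invariant 10 in miniature: an unknown profile name becomes `QueryError::UnknownProfile` rather than a panic.

```rust
use std::fmt;

/// Hypothetical user identifier; the entity model's real type may differ.
pub type UserId = u64;

/// Sketch of the error variants listed in Appendix A.
#[derive(Debug, Clone, PartialEq)]
pub enum QueryError {
    UnknownProfile(String),
    InvalidFilter { field: String, reason: String },
    UnknownCohort(String),
    MissingUserForAutoCohort,
    UnknownUser(UserId),
    InvalidQuery(String),
    InvalidCursor(String),
    MissingVector(String),
    Internal(String),
    NotReady,
}

impl QueryError {
    /// Recovery hint paraphrasing the "Recovery" column of Appendix A.
    pub fn recovery(&self) -> &'static str {
        match self {
            QueryError::UnknownProfile(_) => "define the profile via define_profile()",
            QueryError::InvalidFilter { .. } => "check the entity definition for valid field names and types",
            QueryError::UnknownCohort(_) => "define the cohort via define_cohort()",
            QueryError::MissingUserForAutoCohort => "provide for_user or use CohortRef::Named",
            QueryError::UnknownUser(_) => "ingest the user via write_user()",
            QueryError::InvalidQuery(_) => "fix the query syntax",
            QueryError::InvalidCursor(_) => "start from page 1 (no cursor)",
            QueryError::MissingVector(_) => "provide an embedding or use a non-ANN profile",
            QueryError::Internal(_) => "check logs; restart the database if persistent",
            QueryError::NotReady => "retry after startup completes",
        }
    }
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QueryError::UnknownProfile(name) => write!(f, "unknown profile: {name}"),
            QueryError::InvalidFilter { field, reason } => {
                write!(f, "invalid filter on `{field}`: {reason}")
            }
            QueryError::UnknownCohort(name) => write!(f, "unknown cohort: {name}"),
            QueryError::MissingUserForAutoCohort => write!(f, "CohortRef::Auto requires for_user"),
            QueryError::UnknownUser(id) => write!(f, "unknown user: {id}"),
            QueryError::InvalidQuery(msg) => write!(f, "invalid query: {msg}"),
            QueryError::InvalidCursor(msg) => write!(f, "invalid cursor: {msg}"),
            QueryError::MissingVector(msg) => write!(f, "missing vector: {msg}"),
            QueryError::Internal(msg) => write!(f, "internal error: {msg}"),
            QueryError::NotReady => write!(f, "database is still warming up"),
        }
    }
}

impl std::error::Error for QueryError {}

/// Hypothetical catalog lookup: an unknown name becomes a typed error, never a panic.
pub fn resolve_profile<'a>(catalog: &'a [&'a str], name: &str) -> Result<&'a str, QueryError> {
    catalog
        .iter()
        .copied()
        .find(|p| *p == name)
        .ok_or_else(|| QueryError::UnknownProfile(name.to_string()))
}
```

Deriving `PartialEq` keeps the fuzz and unit tests described in section 15 simple, since they can compare returned errors directly against expected variants.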

---

## Appendix B: Sort Mode Implementation Reference

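Sort modes (from API.md) are implemented as sort expressions in the scan candidate strategy. Most modes map to a single signal read or metadata field access; the rest combine several inputs (`Hot`, `Rising`, `Controversial`, `HiddenGems`, `Shuffle`), read a relationship timestamp (`DateSaved`), or use a score computed at search time (`Relevance`, `Personalized`).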

| Sort Mode | Implementation | Signal/Field | Direction |
|-----------|---------------|-------------|-----------|
| `Relevance` | BM25 + vector fusion score | Computed at search time | DESC |
| `Personalized` | User preference vector similarity | Cosine similarity | DESC |
| `New` | Metadata field read | `created_at` | DESC |
| `Old` | Metadata field read | `created_at` | ASC |
| `Hot` | `score / (age_hours + 2)^1.8` | Composite of signal + timestamp | DESC |
| `Trending` | Signal velocity read | `view.velocity(6h)` + `share.velocity(6h)` | DESC |
| `Rising` | Velocity relative to baseline | `velocity / baseline` | DESC |
| `TopAllTime` | Signal accumulator | `like.decay_score(all_time)` | DESC |
| `TopHour` | Signal windowed count | `like.count(1h)` | DESC |
| `TopToday` | Signal windowed count | `like.count(24h)` | DESC |
| `TopWeek` | Signal windowed count | `like.count(7d)` | DESC |
| `TopMonth` | Signal windowed count | `like.count(30d)` | DESC |
| `MostViewed` | Signal windowed count | `view.count(all_time)` | DESC |
| `MostLiked` | Signal windowed count | `like.count(all_time)` | DESC |
| `MostCommented` | Signal windowed count | `comment.count(all_time)` | DESC |
| `MostShared` | Signal windowed count | `share.count(all_time)` | DESC |
| `Shortest` | Metadata field read | `duration` | ASC |
| `Longest` | Metadata field read | `duration` | DESC |
| `AlphabeticalAsc` | Metadata field read | `title` | ASC |
| `AlphabeticalDesc` | Metadata field read | `title` | DESC |
| `Shuffle` | Weighted random | `rand() * quality_score` | DESC |
| `LiveViewerCount` | Real-time counter | `live_viewers.count(now)` | DESC |
| `DateSaved` | Relationship timestamp | `saved.timestamp` | DESC |
| `CreatorEngagementRate` | Creator signal ratio | `creator.engagement_rate` | DESC |
| `Controversial` | Signal product | `positive_count * negative_count` | DESC |
| `HiddenGems` | Quality / reach ratio | `quality_score / view_count` | DESC |
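The composite modes in the table need more than a single field read. The sketch below computes sort keys for four of them, assuming a hypothetical `ItemStats` row that already carries the signal and metadata values the table names; `ItemStats`, `SortMode`, `sort_key`, and `rank` are illustrative names, not tidalDB's API. It treats the `Controversial` key as the product `positive_count * negative_count`. The zero-denominator guards for `Rising` and `HiddenGems` and the ascending-id tie-break are assumptions the table does not specify.

```rust
/// Hypothetical per-item inputs; field names are illustrative, not tidalDB's schema.
#[derive(Debug, Clone)]
pub struct ItemStats {
    pub id: u64,
    pub score: f64,
    pub age_hours: f64,
    pub velocity: f64,
    pub baseline: f64,
    pub positive_count: u64,
    pub negative_count: u64,
    pub quality_score: f64,
    pub view_count: u64,
}

#[derive(Debug, Clone, Copy)]
pub enum SortMode {
    Hot,
    Rising,
    Controversial,
    HiddenGems,
}

/// Sort key for each composite mode; all four modes here sort DESC.
pub fn sort_key(mode: SortMode, s: &ItemStats) -> f64 {
    match mode {
        // score / (age_hours + 2)^1.8
        SortMode::Hot => s.score / (s.age_hours + 2.0).powf(1.8),
        // velocity / baseline; a zero baseline maps to 0.0 (assumption)
        SortMode::Rising => {
            if s.baseline > 0.0 {
                s.velocity / s.baseline
            } else {
                0.0
            }
        }
        // positive_count * negative_count
        SortMode::Controversial => (s.positive_count as f64) * (s.negative_count as f64),
        // quality_score / view_count; zero views are clamped to 1 (assumption)
        SortMode::HiddenGems => s.quality_score / (s.view_count.max(1) as f64),
    }
}

/// Orders items by descending key, breaking ties by ascending id so that
/// equal keys always come back in the same order.
pub fn rank(mode: SortMode, items: &mut [ItemStats]) {
    items.sort_by(|a, b| {
        sort_key(mode, b)
            .total_cmp(&sort_key(mode, a))
            .then_with(|| a.id.cmp(&b.id))
    });
}
```

`f64::total_cmp` gives a total order even if a NaN key slips through, so the comparator never violates `sort_by`'s ordering requirements; the id tie-break keeps results deterministic, in the spirit of invariant 9.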