08 -- Query Engine Specification

Status: Draft
Authors: tidalDB Engineering
Date: 2026-02-20
Depends on: Storage Engine (01), Entity Model (02), Signal System (03), Relationships (04), Cohorts (05), Text Retrieval (06), Vector Retrieval (07)
Research: docs/research/ann_for_tidaldb.md, docs/research/tidaldb_signal_ledger.md, docs/research/tantivy.md

1. Overview
2. Query Operations
3. Query Parsing
4. Query Planning
5. Execution Pipeline
6. Query Composition
7. Filter Evaluation
8. Pagination
9. SUGGEST Operation
10. Query Context
11. Performance Targets
12. Query Caching
13. Error Handling and Fallbacks
14. Integration Architecture
15. Invariants and Correctness Guarantees

1. Overview

The query engine is the brain of tidalDB. It is the single module that orchestrates every other subsystem -- storage, signals, text retrieval, vector retrieval, relationships, cohorts -- to answer one question: "given a user and a context, what content should they see, in what order?"

The query engine has three responsibilities:

1. Parse the query into a typed AST that captures all semantic intent.
2. Plan the execution strategy by choosing candidate generation, filter evaluation, and scoring approaches based on cost estimation.
3. Execute the plan by coordinating subsystem calls, assembling the result set, and enforcing diversity and pagination constraints.

Design Principles

The query engine is an orchestrator, not a data store. It holds no data of its own. It reads from the signal ledger, the entity store, the text index, the vector index, the relationship store, and the cohort system. If the query engine process crashes, no data is lost and no recovery procedure is needed.

Deep module, small interface. The public API is three methods: retrieve(), search(), suggest(). Everything behind those methods -- query parsing, plan selection, selectivity estimation, pipeline orchestration, diversity enforcement, cursor management -- is internal. The caller provides a declarative query. The engine decides how to execute it.

Composition is a first-class operation. The most complex query in the system -- SEARCH items QUERY "piano" WITHIN TRENDING FOR COHORT young_us_jazz WINDOW 24h -- composes text/semantic search with cohort-scoped trending. This is not a special case bolted on after the fact. The planner treats composition as a standard plan shape, and the pipeline handles it without branching logic.

No re-ranking by the application. The result order from the query engine is the final order. The application renders it. If the application is tempted to re-rank, the ranking profile is wrong and should be fixed in schema.

2. Query Operations

tidalDB exposes three query operations. Each maps to a public method on TidalDB.

2.1 RETRIEVE

Feed generation, browse, related content, trending, following, notifications -- every discovery surface that does not involve a user-provided search string.

pub fn retrieve(&self, query: Retrieve) -> Result<Results, QueryError>;

RETRIEVE generates a ranked list by:

1. Generating candidates from the profile's candidate strategy (ANN, scan, relationship, cohort trending, or hybrid).
2. Filtering candidates against metadata predicates, user state, and relationship exclusions.
3. Loading signal state for surviving candidates.
4. Scoring via the ranking profile (boosts, penalties, gates, decay).
5. Enforcing diversity constraints.
6. Paginating and returning the result set.

The profile determines the candidate generation strategy. The caller never specifies how candidates are found -- only which profile to use and which filters to apply.

2.2 SEARCH

Text and semantic retrieval. The user provides a query string, optionally a query embedding, and the engine returns results ranked by a combination of text relevance, semantic similarity, signal strength, and personalization.

pub fn search(&self, query: Search) -> Result<Results, QueryError>;

SEARCH differs from RETRIEVE in one critical way: the candidate generation strategy always involves text and/or vector retrieval driven by the user's query, not by a profile's static candidate source. The ranking profile still controls scoring, but candidates are generated from the query string and/or embedding.

2.3 SUGGEST

Autocomplete and trending query suggestions. Returns completions for a partial query string.

pub fn suggest(&self, query: Suggest) -> Result<Vec<SuggestResult>, QueryError>;

SUGGEST is a lightweight operation that bypasses the full execution pipeline. It reads from the text index term dictionary, popular query tracking, and optionally the user's personal search history. See Section 9 for details.

3. Query Parsing

3.1 Input Types

The query engine accepts Rust structs, not text strings. Parsing in this context means validating the input struct against the schema, resolving references (profile names, cohort names, field names), and constructing a typed AST that the planner can reason about.

/// A RETRIEVE query. Declarative: specifies what, not how.
pub struct Retrieve {
    /// Target entity type.
    pub entity: EntityKind,
    /// User context for personalization. None for unpersonalized queries.
    pub for_user: Option<UserId>,
    /// Surface context for the feedback loop.
    pub context: Option<String>,
    /// Named ranking profile. Determines candidate strategy and scoring.
    pub profile: String,
    /// Profile version. None = latest.
    pub profile_version: Option<u32>,
    /// Metadata and state filters.
    pub filters: Vec<Filter>,
    /// Sort mode override. None = use profile default.
    pub sort: Option<Sort>,
    /// Diversity constraints override. None = use profile default.
    pub diversity: Option<DiversitySpec>,
    /// Anchor item for related/similar queries.
    pub similar_to: Option<EntityId>,
    /// Explicit item exclusions (e.g., previously returned items).
    pub exclude_ids: Vec<EntityId>,
    /// Maximum results to return.
    pub limit: usize,
    /// Cursor from a previous result set for pagination.
    pub cursor: Option<Cursor>,
    /// Cohort scope for cohort-trending queries.
    pub for_cohort: Option<CohortRef>,
    /// Trending window for cohort-trending queries.
    pub window: Option<Window>,
}

/// A SEARCH query. Combines text/semantic retrieval with ranking.
pub struct Search {
    /// The user's query string. Parsed into a SearchQuery AST.
    pub query: String,
    /// Optional query embedding for semantic search.
    pub vector: Option<Vec<f32>>,
    /// Target entity type.
    pub entity: EntityKind,
    /// User context for personalization.
    pub for_user: Option<UserId>,
    /// Named ranking profile. Controls scoring after retrieval.
    pub profile: String,
    /// Metadata and state filters.
    pub filters: Vec<Filter>,
    /// Sort mode override.
    pub sort: Option<Sort>,
    /// Diversity constraints override.
    pub diversity: Option<DiversitySpec>,
    /// Maximum results to return.
    pub limit: usize,
    /// Cursor for pagination.
    pub cursor: Option<Cursor>,
    /// Composition: restrict search to trending candidates.
    pub within_trending: Option<WithinTrending>,
}

/// A SUGGEST query. Lightweight autocomplete.
pub struct Suggest {
    /// Partial query string (the prefix typed so far).
    pub prefix: String,
    /// User context for personalized suggestions.
    pub for_user: Option<UserId>,
    /// Target entity type for term completions.
    pub entity: Option<EntityKind>,
    /// Maximum suggestions to return.
    pub limit: usize,
}

/// Cohort reference: named, ad-hoc predicate, or auto-derived.
pub enum CohortRef {
    /// A named cohort defined in schema.
    Named(String),
    /// An inline predicate (ad-hoc cohort).
    Predicate(Predicate),
    /// Derive cohort automatically from the querying user's attributes.
    Auto,
}

/// Composition clause: restrict candidates to trending items.
pub struct WithinTrending {
    /// The cohort to scope trending to.
    pub cohort: CohortRef,
    /// The time window for trending computation.
    pub window: Window,
    /// Minimum velocity threshold for candidate inclusion.
    pub min_velocity: Option<f64>,
    /// Maximum candidates to draw from trending.
    pub max_candidates: Option<usize>,
}

3.2 Search Query Grammar

The query field of a Search struct is a user-typed string. The query parser transforms it into a SearchQuery AST. The grammar follows the specification in the Text Retrieval spec (06, Section 4) and the API reference.

EBNF Grammar:

query       ::= expression
expression  ::= and_expr ( 'OR'? and_expr )*
and_expr    ::= unary_expr ( 'AND' unary_expr )*
unary_expr  ::= 'NOT' atom | '-' atom | atom
atom        ::= phrase | prefix | field_scope | hashtag | '(' expression ')' | term
phrase      ::= '"' <any text> '"'
prefix      ::= <word> '*'
field_scope ::= <field_name> ':' ( phrase | term )
hashtag     ::= '#' <word>
term        ::= <word>

Operator precedence (highest to lowest):

1. Grouping ()
2. Field scope field:
3. NOT / -
4. AND
5. OR (implicit between bare terms)

Default behavior: Bare space-separated terms are treated as implicit OR with BM25 ranking. Documents matching more terms score higher. This matches user expectations from web search.

3.3 Search Query AST

/// Parsed search query. Recursive AST for text retrieval.
///
/// The parser transforms user-typed query strings into this tree.
/// The text index (Tantivy) translates it into native query types.
/// The AST is also used by the query planner for cost estimation
/// (number of terms, phrase presence, field scoping).
pub enum SearchQuery {
    /// A single search term, lowercased and analyzed.
    Term(String),
    /// An exact phrase match (quoted string).
    Phrase(Vec<String>),
    /// A prefix match (wildcard). "pian*" matches "piano", "pianist".
    Prefix(String),
    /// Conjunction: all children must match.
    And(Vec<SearchQuery>),
    /// Disjunction: any child may match. BM25 scores accumulate.
    Or(Vec<SearchQuery>),
    /// Negation: exclude documents matching the child.
    Not(Box<SearchQuery>),
    /// Field-scoped query: restrict matching to a specific field.
    FieldScoped {
        field: FieldName,
        query: Box<SearchQuery>,
    },
    /// Hashtag match: equivalent to FieldScoped("hashtags", Term(tag)).
    Hashtag(String),
}
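The doc comment above notes that the planner inspects the AST for cost estimation (number of terms, phrase presence, field scoping). A minimal sketch of such a walk follows. `QueryShape`, `query_shape`, and the `String` alias for `FieldName` are hypothetical names introduced for illustration, not part of the spec.

```rust
/// Stand-in for the spec's `FieldName` type (assumed here to be string-like).
pub type FieldName = String;

/// Copy of the spec's AST, reproduced so the sketch is self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum SearchQuery {
    Term(String),
    Phrase(Vec<String>),
    Prefix(String),
    And(Vec<SearchQuery>),
    Or(Vec<SearchQuery>),
    Not(Box<SearchQuery>),
    FieldScoped { field: FieldName, query: Box<SearchQuery> },
    Hashtag(String),
}

/// Hypothetical summary of the AST features a planner might inspect.
#[derive(Debug, Default, PartialEq)]
pub struct QueryShape {
    /// Number of leaf nodes (terms, phrases, prefixes, hashtags).
    pub leaf_count: usize,
    /// True if any phrase node is present.
    pub has_phrase: bool,
    /// True if any field-scoped node (including hashtags) is present.
    pub has_field_scope: bool,
}

/// Walk the AST once and collect its shape.
pub fn query_shape(q: &SearchQuery) -> QueryShape {
    let mut shape = QueryShape::default();
    visit(q, &mut shape);
    shape
}

fn visit(q: &SearchQuery, shape: &mut QueryShape) {
    match q {
        SearchQuery::Term(_) | SearchQuery::Prefix(_) => shape.leaf_count += 1,
        SearchQuery::Phrase(_) => {
            shape.leaf_count += 1;
            shape.has_phrase = true;
        }
        // A hashtag is equivalent to a field-scoped term on "hashtags".
        SearchQuery::Hashtag(_) => {
            shape.leaf_count += 1;
            shape.has_field_scope = true;
        }
        SearchQuery::And(children) | SearchQuery::Or(children) => {
            for child in children {
                visit(child, shape);
            }
        }
        SearchQuery::Not(inner) => visit(inner, shape),
        SearchQuery::FieldScoped { query, .. } => {
            shape.has_field_scope = true;
            visit(query, shape);
        }
    }
}
```

How the planner weights these features is not specified in this section; the sketch only shows that they can be gathered in a single recursive pass.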

3.4 Validation and Resolution

Parsing produces a ValidatedQuery -- a fully-resolved internal representation that the planner consumes. Validation performs:

Profile resolution: Look up the named profile in the schema catalog. Return QueryError::UnknownProfile if not found.
Filter validation: Verify every filter field exists on the target entity type. Verify operator/type compatibility (e.g., min on a numeric field, not on a keyword). Return QueryError::InvalidFilter on mismatch.
Cohort resolution: If for_cohort or within_trending.cohort is Named(name), look up the named cohort. If Auto, verify for_user is provided (cannot auto-derive without a user). Return QueryError::UnknownCohort or QueryError::MissingUserForAutoCohort.
User existence: If for_user is Some(id), verify the user exists in the entity store. Return QueryError::UnknownUser if not found.
Embedding availability: If the profile's candidate strategy is Ann with VectorSource::UserPreference, verify the user has a preference vector. If not, fall back to the population default vector.
Search query parsing: Parse the query string into a SearchQuery AST. Return QueryError::InvalidQuery on syntax errors (unbalanced quotes, empty phrases).
Cursor validation: If a cursor is provided, verify its query_hash matches the current query. Return QueryError::InvalidCursor if the query parameters changed between pages.

/// Errors returned by the query engine.
pub enum QueryError {
    /// The named profile does not exist in the schema catalog.
    UnknownProfile(String),
    /// A filter references a field that does not exist on the target entity.
    InvalidFilter { field: String, reason: String },
    /// The named cohort does not exist.
    UnknownCohort(String),
    /// Auto cohort derivation requires a user context.
    MissingUserForAutoCohort,
    /// The user ID does not exist in the entity store.
    UnknownUser(UserId),
    /// The search query string has a syntax error.
    InvalidQuery(String),
    /// The pagination cursor is invalid or stale.
    InvalidCursor(String),
    /// The profile's candidate strategy requires a vector that is unavailable.
    MissingVector(String),
    /// An internal subsystem error (storage, index, signal).
    Internal(String),
    /// The database is still warming up and cannot serve queries yet.
    NotReady,
}
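The cursor validation step could be sketched as follows. This is an illustration under stated assumptions: the spec names only the `query_hash` field, so the `Cursor` layout, the `QueryFingerprint` struct, and the choice of which parameters feed the hash are all hypothetical. `DefaultHasher` is used only to keep the example dependency-free; its algorithm is not guaranteed stable across Rust releases, so a persisted cursor would need a stable hash instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical set of query parameters that must not change between pages.
#[derive(Hash)]
pub struct QueryFingerprint<'a> {
    pub profile: &'a str,
    /// Filters rendered to canonical strings (an assumption of this sketch).
    pub filters: &'a [String],
}

/// Hypothetical cursor layout; the spec only names `query_hash`.
pub struct Cursor {
    pub query_hash: u64,
    pub offset: usize,
}

/// Only the variant needed by this sketch.
#[derive(Debug, PartialEq)]
pub enum QueryError {
    InvalidCursor(String),
}

/// Hash the parameters that define "the same query".
pub fn query_hash(fp: &QueryFingerprint<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    fp.hash(&mut hasher);
    hasher.finish()
}

/// Accept the cursor only if it was issued for the same query.
pub fn validate_cursor(cursor: &Cursor, fp: &QueryFingerprint<'_>) -> Result<usize, QueryError> {
    if cursor.query_hash == query_hash(fp) {
        Ok(cursor.offset)
    } else {
        Err(QueryError::InvalidCursor(
            "query parameters changed between pages".to_string(),
        ))
    }
}
```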

4. Query Planning

The planner transforms a validated query into an execution plan. The plan is a sequence of physical operations with estimated costs. The planner's job is to minimize end-to-end latency while guaranteeing correctness.

4.1 Candidate Generation Strategies

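The planner selects one of five candidate generation strategies based on the ranking profile's candidate field and the query type. SEARCH queries with a within_trending clause use a sixth, composed plan shape (ComposedSearch, Section 6) that is built from these strategies and recorded in the plan's composition field.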

/// Physical candidate generation strategy selected by the planner.
pub(crate) enum CandidateStrategy {
    /// Approximate nearest neighbor search via HNSW.
    /// Used for personalized feeds (user preference vector)
    /// and related content (anchor item embedding).
    Ann {
        query_vector: Vec<f32>,
        index: EntityKind,
        slot: EmbeddingSlot,
        top_k: usize,
        /// Filter predicate for the adaptive query planner.
        /// Selectivity determines strategy (in-graph, ACORN, brute-force).
        filter: Option<FilterBitmap>,
        /// Selected ANN strategy from the adaptive planner.
        ann_strategy: AnnStrategy,
    },

    /// Full scan with signal-based scoring.
    /// Used for trending (velocity sort), browse (field sort),
    /// and any query where candidates are not similarity-driven.
    Scan {
        entity: EntityKind,
        /// Pre-filter bitmap to narrow the scan.
        filter: Option<FilterBitmap>,
        /// Sort expression that determines scan order.
        sort: SortExpression,
    },

    /// Hybrid text + vector retrieval with fusion.
    /// Used for SEARCH queries with both text and vector.
    Hybrid {
        text_query: SearchQuery,
        query_vector: Option<Vec<f32>>,
        entity: EntityKind,
        text_top_k: usize,
        vector_top_k: usize,
        fusion: FusionStrategy,
        /// Filter predicate pushed into both retrieval legs.
        filter: Option<FilterBitmap>,
    },

    /// Relationship traversal for candidate generation.
    /// Used for following feeds, social graph scoped queries.
    Relationship {
        user_id: UserId,
        edge_kind: RelationshipKind,
        depth: TraversalDepth,
        /// Max fan-out per hop.
        max_fan_out: usize,
    },

    /// Cohort-scoped trending as candidate source.
    /// Used for "trending among people like me" queries.
    CohortTrending {
        cohort: ResolvedCohort,
        window: Window,
        min_velocity: f64,
        top_k: usize,
    },
}

/// ANN strategy selected by the adaptive query planner (from Vector Retrieval spec Section 9).
pub(crate) enum AnnStrategy {
    /// Standard HNSW search, no filter.
    Standard { ef_search: usize },
    /// In-graph predicate filter. Selectivity > 20%.
    InGraphFilter { ef_search: usize },
    /// Pre-filter + widened HNSW (ACORN-1). Selectivity 1-20%.
    Acorn { ef_search: usize },
    /// Pre-filter + brute-force. Selectivity < 1%.
    BruteForce,
}

4.1.1 Strategy Comparison Table

Strategy	Use Case	Candidate Source	Typical Latency	Candidate Count	Filter Push-Down	When to Choose
Ann	Personalized feed, related content, "more like this"	HNSW vector index (user pref vector or anchor embedding)	8-15ms	200-500	Yes (in-graph / ACORN / brute-force via adaptive planner)	Profile specifies `Candidate::Ann` or query has `similar_to`
Scan	Trending, browse by field, top-N by signal	Signal/metadata sorted index, full entity scan	5-20ms	200-1000	Yes (bitmap skip during scan)	Sort mode is signal-based (velocity, decay_score, count) or metadata-based (created_at, duration)
Hybrid	SEARCH queries with text + vector	Tantivy BM25 + HNSW ANN, parallel execution, RRF/linear fusion	10-20ms (parallel)	300-600 (merged)	Yes (pushed into both Tantivy fast-fields and HNSW predicate)	SEARCH query has both text query and query embedding
Relationship	Following feed, social-graph scoped	BFS traversal of social graph edges	5-15ms	100-1000	No (filters applied post-traversal)	Profile specifies `Candidate::Relationship` (following, social)
CohortTrending	"Trending among people like me"	Cohort-scoped signal velocity scan	10-20ms	200-500	Post-filter only (metadata filters after velocity sort)	Query has `for_cohort` with trending sort or profile specifies cohort trending
ComposedSearch	SEARCH WITHIN TRENDING FOR COHORT	Phase 1-2: cohort trending candidates; Phase 3: text/vector search within that set	25-40ms (4 phases)	50-200 (after search within 500 trending)	Metadata filters applied to trending set before search phase	Query has `within_trending` clause

Cost Model Summary:

Strategy	CPU Cost	Memory Cost	I/O Cost	Concurrency Impact
Ann	O(log N * ef_search * M)	O(ef_search) visited set	0 (in-memory HNSW)	None (read-only graph traversal)
Scan	O(K) where K = candidates to emit	O(K) result buffer	Possible cold-tier signal reads	None (snapshot isolation)
Hybrid	O(BM25) + O(ANN) parallel	O(text_k + vector_k) + merge buffer	Tantivy segment reads	None (separate readers)
Relationship	O(fan_out^depth) bounded by max_fan_out	O(visited) set for cycle detection	Edge list reads from storage	None (immutable edge snapshots)
CohortTrending	O(tracked_items) velocity scan	O(top_k) sorted buffer	Cohort signal reads (hot or warm tier)	None (atomic reads)
ComposedSearch	Sum of CohortTrending + Hybrid on small set	O(trending_k) + O(search results)	Same as CohortTrending + brute-force vector	None

4.2 Plan Construction

The planner constructs an ExecutionPlan -- the complete recipe for executing the query.

/// The complete execution plan. Immutable once constructed.
/// Logged at DEBUG level for every query for observability.
pub(crate) struct ExecutionPlan {
    /// How candidates are generated.
    candidate_strategy: CandidateStrategy,
    /// Pre-computed filter bitmap (if filters are present).
    filter_bitmap: Option<FilterBitmap>,
    /// Which signals to load for scoring.
    required_signals: Vec<SignalRef>,
    /// The scoring function from the ranking profile.
    scoring: ScoringPlan,
    /// Diversity enforcement strategy.
    diversity: Option<DiversityPlan>,
    /// Pagination state.
    pagination: PaginationPlan,
    /// Estimated total cost for logging and monitoring.
    estimated_cost: CostEstimate,
    /// Whether this is a composed query (SEARCH WITHIN TRENDING).
    composition: Option<CompositionPlan>,
}

/// Cost estimate for plan logging and monitoring.
pub(crate) struct CostEstimate {
    /// Estimated number of candidates before filtering.
    candidate_count: usize,
    /// Estimated number of candidates after filtering.
    filtered_count: usize,
    /// Estimated wall-clock time in microseconds.
    estimated_latency_us: u64,
}

4.3 Planner Decision Tree

The planner selects the candidate strategy based on the query type and profile configuration. The decision tree is deterministic -- given the same inputs, the planner always produces the same plan.

Query Planner Decision Tree

                         ┌────────────────────┐
                         │  Query Operation?   │
                         └─────────┬──────────┘
                                   │
              ┌────────────────────┼────────────────────┐
              │                    │                    │
        ┌─────▼──────┐    ┌───────▼───────┐    ┌──────▼──────┐
        │  RETRIEVE   │    │   SEARCH      │    │  SUGGEST    │
        └─────┬──────┘    └───────┬───────┘    └─────────────┘
              │                    │              (bypass pipeline,
              │                    │               see Section 9)
              ▼                    ▼
    ┌──────────────────┐  ┌──────────────────────┐
    │ Profile.candidate │  │ Has within_trending? │
    └────────┬─────────┘  └─────────┬────────────┘
             │                      │
    ┌────────┼────────┬──────┐     ├──── yes ──► ComposedSearch
    │        │        │      │     │              (Section 6)
    ▼        ▼        ▼      ▼     │
   Ann     Scan   Relation  Cohort └──── no ──►  Profile.candidate?
    │        │    -ship    Trending              │
    │        │      │        │           ┌──────┼──────┐
    │        │      │        │           │      │      │
    │        │      │        │          Hybrid  Ann   Scan
    │        │      │        │           │      │      │
    ▼        ▼      ▼        ▼           ▼      ▼      ▼
  ANN     Signal  BFS    Cohort      Text+Vec  ANN  Signal
  search  scan   trav.  velocity     parallel  only  sort
    │        │      │     scan         │        │      │
    └────────┴──────┴──────┴───────────┴────────┴──────┘
                           │
                    ┌──────▼──────┐
                    │ Has filters? │
                    └──────┬──────┘
                     yes   │   no
                      ▼    │    ▼
              Build filter │  Skip filter
              bitmap       │  evaluation
                      │    │    │
                      └────┴────┘
                           │
                    ┌──────▼──────────┐
                    │ For ANN: select │
                    │ ANN strategy    │
                    │ via selectivity │
                    └─────────────────┘
                           │
                    ┌──────▼──────────┐
                    │ Build scoring   │
                    │ plan from       │
                    │ profile def     │
                    └─────────────────┘
                           │
                    ┌──────▼──────────┐
                    │ Build diversity │
                    │ plan            │
                    └─────────────────┘
                           │
                    ┌──────▼──────────┐
                    │ Build pagination│
                    │ plan from cursor│
                    └─────────────────┘
                           │
                           ▼
                    ExecutionPlan ready

4.4 Selectivity Estimation

For filtered ANN queries, the planner must estimate filter selectivity before choosing the ANN strategy. Selectivity estimation uses the bitmap cardinality from the Entity Model's metadata indexes (spec 02) and the Vector Retrieval spec's adaptive query planner (spec 07, Section 9).

Selectivity Estimation

For each filter predicate:
  keyword equality:     cardinality(bitmap[field][value]) / total_entities
  keyword IN-list:      cardinality(union(bitmaps)) / total_entities
  numeric range:        estimate from sorted index statistics
  boolean:              cardinality(bitmap[field][true_or_false]) / total_entities
  unseen (user state):  1 - (user_seen_count / total_entities)
  relationship:         edge_count / total_entities

For compound filters (AND):
  selectivity = product of individual selectivities
  (independence assumption; refined by correlation cache)

For compound filters (OR):
  selectivity = sum(individual) - sum(pairwise) + ...
  (approximation: min(1.0, sum(individual) * 0.9))

Result: float in [0.0, 1.0]
  Maps to ANN strategy via thresholds (checked in this order):
    1.0 (no filter)  -->  Standard
    > 0.20           -->  InGraphFilter
    0.01-0.20        -->  Acorn (widened ef_search)
    < 0.01           -->  BruteForce
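The threshold mapping and the compound-filter rules can be sketched as below. The thresholds and the `sum * 0.9` approximation come from the text above. The function names, the clamp to 1.0, and the factor by which `ef_search` is widened for ACORN are assumptions of this sketch, and the correlation cache is not modeled.

```rust
/// Copy of the spec's ANN strategy enum, reproduced for a self-contained sketch.
#[derive(Debug, PartialEq)]
pub enum AnnStrategy {
    Standard { ef_search: usize },
    InGraphFilter { ef_search: usize },
    Acorn { ef_search: usize },
    BruteForce,
}

/// AND of filters under the independence assumption: product of selectivities.
pub fn and_selectivity(parts: &[f64]) -> f64 {
    parts.iter().product::<f64>().clamp(0.0, 1.0)
}

/// OR of filters using the sum * 0.9 approximation.
/// The clamp to 1.0 is an assumption added here so the result stays a fraction.
pub fn or_selectivity(parts: &[f64]) -> f64 {
    (parts.iter().sum::<f64>() * 0.9).clamp(0.0, 1.0)
}

/// Map estimated selectivity (fraction of entities passing the filter)
/// to an ANN strategy. `None` means the query has no filter.
pub fn select_ann_strategy(selectivity: Option<f64>, base_ef: usize) -> AnnStrategy {
    match selectivity {
        None => AnnStrategy::Standard { ef_search: base_ef },
        Some(s) if s >= 1.0 => AnnStrategy::Standard { ef_search: base_ef },
        Some(s) if s > 0.20 => AnnStrategy::InGraphFilter { ef_search: base_ef },
        // Widening factor of 2 is a placeholder, not a value from the spec.
        Some(s) if s >= 0.01 => AnnStrategy::Acorn { ef_search: base_ef * 2 },
        Some(_) => AnnStrategy::BruteForce,
    }
}
```

For example, two independent filters with selectivities 0.5 and 0.1 combine to 0.05 under AND, which falls in the ACORN band.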

The correlation cache (maintained by the background materializer) stores joint selectivity estimates for frequently co-occurring filter pairs. When the independence assumption is known to be inaccurate (e.g., category:jazz AND format:audio), the cache provides a corrected estimate.

4.5 Scoring Plan

The scoring plan is derived from the ranking profile definition and determines which signals, relationships, and boosts are evaluated for each candidate.

/// Scoring plan derived from the ranking profile.
pub(crate) struct ScoringPlan {
    /// Signal-based boosts: signal name, window, metric, weight.
    signal_boosts: Vec<SignalBoostPlan>,
    /// Relationship-based boosts: edge kind, weight.
    relationship_boosts: Vec<RelationshipBoostPlan>,
    /// Social proof boost weight (if enabled).
    social_proof_weight: Option<f64>,
    /// Cohort trending boost (if enabled).
    cohort_trending_boost: Option<CohortBoostPlan>,
    /// Temporal decay: field, half-life.
    temporal_decay: Option<TemporalDecayPlan>,
    /// Quality gates: minimum signal thresholds.
    gates: Vec<GatePlan>,
    /// Scoring penalties.
    penalties: Vec<PenaltyPlan>,
    /// Hard exclusions (hide, block).
    excludes: Vec<ExcludePlan>,
    /// Exploration fraction: percentage of results from unfamiliar creators.
    exploration: f64,
}

pub(crate) struct SignalBoostPlan {
    signal: SignalName,
    window: Window,
    metric: SignalMetric,   // Value, Velocity, Ratio, UniqueRatio
    weight: f64,
}
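To make the roles of boosts and gates concrete, here is a rough sketch of evaluating one candidate. This section does not define how the plan's components combine, so the formula (retrieval score plus a weighted sum of signal values, with gates rejecting the candidate outright) is purely an assumption for illustration. `SignalBoost`, `Gate`, and `score` are simplified hypothetical stand-ins for the plan types.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `SignalBoostPlan` (window and metric omitted).
pub struct SignalBoost {
    pub signal: &'static str,
    pub weight: f64,
}

/// Simplified stand-in for `GatePlan`: a minimum signal threshold.
pub struct Gate {
    pub signal: &'static str,
    pub min: f64,
}

/// Score one candidate from its loaded signal snapshot.
/// Returns `None` when any quality gate fails.
/// ASSUMPTION: boosts combine as a weighted sum added to the retrieval score.
pub fn score(
    retrieval_score: f64,
    signals: &HashMap<&str, f64>,
    boosts: &[SignalBoost],
    gates: &[Gate],
) -> Option<f64> {
    // Missing signals are treated as 0.0 in this sketch.
    let value = |name: &str| signals.get(name).copied().unwrap_or(0.0);

    if gates.iter().any(|g| value(g.signal) < g.min) {
        return None;
    }
    let boost: f64 = boosts.iter().map(|b| b.weight * value(b.signal)).sum();
    Some(retrieval_score + boost)
}
```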

5. Execution Pipeline

The execution pipeline is a six-stage sequence. Every query -- RETRIEVE and SEARCH -- flows through the same pipeline. The candidate generation stage varies by plan; the remaining stages are uniform.

5.1 Pipeline Architecture

                   RETRIEVE / SEARCH query
                           │
                           ▼
                ┌──────────────────────┐
  Stage 1       │  CANDIDATE GENERATION │  Generate initial candidate set.
                │                      │  Strategy depends on plan:
                │  ANN / Scan / Hybrid │  ANN, Scan, Hybrid, Relationship,
                │  Relationship /      │  CohortTrending, or Composed.
                │  CohortTrending      │
                │                      │  Output: Vec<RawCandidate>
                └──────────┬───────────┘  (entity_id, retrieval_score)
                           │
                           │  200-1000 candidates
                           ▼
                ┌──────────────────────┐
  Stage 2       │  FILTER EVALUATION    │  Apply metadata and state filters.
                │                      │  Bitmap intersection for metadata.
                │  Bitmap intersection │  Hash set for seen/excluded IDs.
                │  + user state check  │  Relationship check for blocked.
                │  + exclusion check   │
                │                      │  Output: Vec<FilteredCandidate>
                └──────────┬───────────┘  (same as input, minus excluded)
                           │
                           │  100-500 candidates (typical)
                           ▼
                ┌──────────────────────┐
  Stage 3       │  SIGNAL LOADING       │  Load signal state from hot tier.
                │                      │  One atomic read per signal per
                │  Hot tier reads      │  candidate. Apply lazy decay.
                │  (lock-free atomics) │
                │                      │  Output: Vec<ScoredCandidate>
                └──────────┬───────────┘  (+ signal_snapshot per candidate)
                           │
                           │  100-500 candidates with signal state
                           ▼
                ┌──────────────────────┐
  Stage 4       │  SCORING              │  Apply ranking profile:
                │                      │  - Signal boosts (decay, velocity)
                │  Profile boosts,     │  - Relationship boosts
                │  gates, penalties,   │  - Social proof
                │  temporal decay      │  - Temporal decay
                │                      │  - Quality gates (min thresholds)
                │                      │  - Penalties (skip, negative signals)
                │                      │  - Hard excludes (hide, block)
                │                      │
                │                      │  Output: Vec<RankedCandidate>
                └──────────┬───────────┘  (entity_id, final_score)
                           │
                           │  50-300 candidates with scores
                           ▼
                ┌──────────────────────┐
  Stage 5       │  DIVERSITY            │  Enforce variety constraints:
                │  ENFORCEMENT          │  - max_per_creator
                │                      │  - format_mix
                │  Creator cap,        │  - topic_diversity (MMR)
                │  format mix,         │  - exploration injection
                │  topic MMR           │
                │                      │  Output: Vec<DiverseCandidate>
                └──────────┬───────────┘  (reordered, not reduced)
                           │
                           │  limit + buffer candidates
                           ▼
                ┌──────────────────────┐
  Stage 6       │  PAGINATION           │  Apply cursor position.
                │                      │  Slice to requested limit.
                │  Cursor decode,      │  Encode next_cursor.
                │  offset, limit       │
                │                      │  Output: Results
                └──────────────────────┘  (results, next_cursor,
                                           total_candidates)

5.2 Stage 1: Candidate Generation

Candidate generation is the most variable stage. The planner selects one of six physical strategies.

ANN (Approximate Nearest Neighbor): Queries the HNSW vector index via the VectorIndex trait. The query vector comes from the user's preference vector (VectorSource::UserPreference), an anchor item's embedding (similar_to), or an explicit query embedding (Search.vector). The adaptive query planner (spec 07, Section 9) selects the ANN strategy based on filter selectivity. Output: (entity_id, cosine_similarity) pairs, sorted by similarity descending.

Scan: Iterates over entities in the entity store, sorted by a signal expression (velocity, decay score, field value). The filter bitmap is applied during the scan to skip non-matching entities. Used for trending (velocity sort), browse (field sort), and queries where no similarity signal exists. Output: (entity_id, sort_value) pairs.

Hybrid (Text + Vector): Executes text retrieval (BM25 via TextIndex) and vector retrieval (ANN via VectorIndex) in parallel. Fuses results using Reciprocal Rank Fusion (RRF) or linear combination, per the profile's fusion configuration (spec 06, Section 11). Output: (entity_id, fused_score, text_score, vector_score) tuples.
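As a reference point for the fusion step, Reciprocal Rank Fusion scores each document as the sum of 1 / (k + rank) over the lists it appears in, with 1-based ranks. The sketch below fuses two ranked ID lists. The function name `rrf_fuse`, the use of `u64` IDs, and the tie-break on ID are assumptions; the constant k is a tunable parameter (60 is a commonly used default) and its value in tidalDB is not stated here.

```rust
use std::collections::HashMap;

/// Fuse two ranked lists (best first) with Reciprocal Rank Fusion.
/// Returns (entity_id, fused_score) sorted by fused score descending.
pub fn rrf_fuse(text_ranked: &[u64], vector_ranked: &[u64], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in [text_ranked, vector_ranked] {
        for (i, id) in list.iter().enumerate() {
            // Ranks are 1-based: the top result contributes 1 / (k + 1).
            let rank = i as f64 + 1.0;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Documents that appear in both lists accumulate two contributions, so an item ranked moderately well by both legs can outrank an item ranked highly by only one.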

Relationship: Traverses the social graph via RelationshipStore::traverse_graph(). Starting from the querying user, follows edges of the specified kind (e.g., follows) up to the configured depth. Collects item IDs by loading creator-to-item mappings for followed creators. Output: (entity_id, edge_weight) pairs.
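The traversal part of this strategy can be sketched as a breadth-first search bounded by depth and per-hop fan-out. This is not the `RelationshipStore::traverse_graph()` implementation: the adjacency map, the `traverse` function, and the rule of taking the first `max_fan_out` edges per node are assumptions, and the creator-to-item lookup is left out.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first traversal from `start`, following at most `max_fan_out`
/// edges per node and stopping after `max_depth` hops.
/// Returns reached node IDs in discovery order, excluding `start`.
pub fn traverse(
    edges: &HashMap<u64, Vec<u64>>,
    start: u64,
    max_depth: usize,
    max_fan_out: usize,
) -> Vec<u64> {
    // The visited set doubles as cycle detection.
    let mut visited: HashSet<u64> = HashSet::from([start]);
    let mut queue: VecDeque<(u64, usize)> = VecDeque::from([(start, 0)]);
    let mut reached = Vec::new();

    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue; // do not expand beyond the configured depth
        }
        let neighbors = edges.get(&node).map(Vec::as_slice).unwrap_or(&[]);
        for &next in neighbors.iter().take(max_fan_out) {
            if visited.insert(next) {
                reached.push(next);
                queue.push_back((next, depth + 1));
            }
        }
    }
    reached
}
```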

CohortTrending: Reads cohort-scoped signal velocity for items with active cohort tracking. Filters to items above the minimum velocity threshold within the specified window. Sorts by velocity descending. Output: (entity_id, cohort_velocity) pairs.
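The filter-then-sort step described above might look like the following sketch. It assumes the per-item cohort velocities have already been read into a slice. The function name, the inclusive threshold comparison, and the tie-break on ID are assumptions.

```rust
/// Keep items at or above `min_velocity`, sort by velocity descending,
/// and return at most `top_k` (entity_id, cohort_velocity) pairs.
pub fn cohort_trending(
    velocities: &[(u64, f64)],
    min_velocity: f64,
    top_k: usize,
) -> Vec<(u64, f64)> {
    let mut hits: Vec<(u64, f64)> = velocities
        .iter()
        .copied()
        // ASSUMPTION: the threshold is inclusive.
        .filter(|&(_, v)| v >= min_velocity)
        .collect();
    // Highest velocity first; ties broken by ID for deterministic output.
    hits.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    hits.truncate(top_k);
    hits
}
```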

ComposedSearch: The composed strategy for SEARCH WITHIN TRENDING. Detailed in Section 6.
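
The Reciprocal Rank Fusion step named under the Hybrid strategy can be sketched as follows. This is a minimal illustration, not the engine's implementation: the function name `rrf_fuse`, the use of plain `u64` IDs, and the constant `k` (60.0 is a commonly used default, assumed here) are all hypothetical.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over two ranked lists of entity IDs (best first).
/// `k` dampens the influence of top ranks; its value is an assumption here.
pub fn rrf_fuse(text_ranked: &[u64], vector_ranked: &[u64], k: f64) -> Vec<(u64, f64)> {
    let mut fused: HashMap<u64, f64> = HashMap::new();
    for list in [text_ranked, vector_ranked] {
        for (rank, &id) in list.iter().enumerate() {
            // Ranks are 1-based in the usual RRF formulation.
            *fused.entry(id).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut out: Vec<(u64, f64)> = fused.into_iter().collect();
    // Fused score descending; ties broken by entity ID for determinism.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out
}
```

An item that appears in both lists accumulates two reciprocal-rank terms, so it tends to outrank items found by only one retrieval leg.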

5.3 Stage 2: Filter Evaluation

Filter evaluation reduces the candidate set by applying metadata predicates, user state checks, and exclusion lists. See Section 7 for the full design.

The key insight: filters are evaluated against pre-computed roaring bitmaps. For each metadata filter, the bitmap for that field/value is loaded from the Entity Model's bitmap indexes. The intersection of all filter bitmaps produces the surviving candidate set. This is an O(|bitmap|) operation, independent of the number of candidates.

Filter push-down optimization: For ANN queries, metadata filters are pushed into the vector index via the predicate callback (in-graph filter) or pre-filter bitmap (brute-force, ACORN). The candidate generation stage already applies these filters, so Stage 2 only needs to check user-state filters (unseen, saved, in-progress) and exclusion lists (blocked creators, excluded IDs).

Short-circuit on empty: If any filter bitmap has zero cardinality (e.g., category:nonexistent), the pipeline returns an empty result set immediately without proceeding to later stages.

5.4 Stage 3: Signal Loading

For each surviving candidate, the pipeline loads signal state from the hot tier. This is the most latency-sensitive stage after candidate generation.

Access pattern: For each candidate entity ID, for each signal referenced in the scoring plan's boosts, gates, and penalties:

Index into the hot tier's HotSignalState array using the entity ID and signal type index.
Load last_update_ns with Ordering::Acquire.
Load decay_scores[i] with Ordering::Acquire.
Compute the lazy-decayed score: score(now) = stored_score * exp(-lambda * (now - last_update)).
Store the result in the candidate's signal snapshot.

Memory ordering rationale: Acquire on last_update_ns ensures we see the most recent decay score that was stored with Release by a concurrent signal writer. Without Acquire, we could read a new timestamp with an old score, producing an under-decayed value (the stale score decayed over too short an interval). See Signal System spec (03, Section 3) for the full ordering proof.

Cost model: Each signal read is ~15ns (one cache-line load + one exp() call). For 200 candidates with 6 signals each: 200 * 6 * 15ns = 18us. This is negligible.
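
The read path above can be sketched as follows, assuming the score is stored as raw `f64` bits in an `AtomicU64`. `SignalSlot` and its field names are hypothetical stand-ins for one entry of the hot tier, and the sketch shows only the reader side; the writer protocol that makes the pair of loads consistent is the subject of spec 03.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for one signal slot in the hot tier.
pub struct SignalSlot {
    last_update_ns: AtomicU64,
    score_bits: AtomicU64, // f64 stored as raw bits
}

impl SignalSlot {
    pub fn new(last_update_ns: u64, score: f64) -> Self {
        Self {
            last_update_ns: AtomicU64::new(last_update_ns),
            score_bits: AtomicU64::new(score.to_bits()),
        }
    }

    /// Reads the slot and applies decay lazily.
    /// `lambda_per_ns` is the decay rate, ln(2) / half_life_ns.
    pub fn decayed_score(&self, now_ns: u64, lambda_per_ns: f64) -> f64 {
        let last = self.last_update_ns.load(Ordering::Acquire);
        let stored = f64::from_bits(self.score_bits.load(Ordering::Acquire));
        // saturating_sub guards against a timestamp later than `now_ns`.
        let elapsed = now_ns.saturating_sub(last) as f64;
        stored * (-lambda_per_ns * elapsed).exp()
    }
}
```

With a one-hour half-life, a stored score of 8.0 read exactly one hour after its last update comes back as roughly 4.0.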

If a candidate entity has been evicted from the hot tier (no recent signals), its signal state is loaded from the warm tier. This requires a hash table lookup (~50ns) and potentially a disk read from the cold tier (~100us). The planner accounts for this by padding the latency estimate for scan-based queries over the full corpus.

5.5 Stage 4: Scoring

The scoring stage applies the ranking profile's formula to each candidate. Every term in the profile definition maps to a scoring operation:

For each candidate:
    base_score = retrieval_score (from Stage 1: similarity, BM25, velocity)

    // Signal boosts
    for each boost in profile.boosts:
        signal_value = candidate.signals[boost.signal][boost.window][boost.metric]
        base_score += boost.weight * signal_value

    // Relationship boosts
    for each rel_boost in profile.relationship_boosts:
        edge_weight = relationship_store.load_weight(user, candidate.creator, rel_boost.edge)
        base_score += rel_boost.weight * edge_weight

    // Social proof
    if profile.social_proof_weight > 0:
        proof_score = social_proof_map.lookup(candidate.entity_id)
        base_score += profile.social_proof_weight * proof_score

    // Temporal decay
    if profile.temporal_decay is Some:
        age = now - candidate.metadata[decay_field]
        decay_factor = exp(-ln(2) / half_life * age)
        base_score *= decay_factor

    // Quality gates (hard minimum thresholds)
    for each gate in profile.gates:
        if candidate.signals[gate.signal][gate.window][gate.metric] < gate.min:
            base_score = -inf  // eliminate candidate
            break

    // Penalties
    for each penalty in profile.penalties:
        signal_value = candidate.signals[penalty.signal][penalty.window]
        base_score += penalty.weight * signal_value  // weight is negative

    // Hard excludes (hide, block)
    for each exclude in profile.excludes:
        if exclude matches candidate:
            base_score = -inf  // eliminate candidate
            break

    candidate.final_score = base_score

Candidates with final_score == -inf (gated or excluded) are removed. Remaining candidates are sorted by final_score descending.
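
A reduced version of the scoring loop, written as Rust, may help make the ordering of terms concrete. The `Candidate` struct below is a hypothetical flattening of the signal snapshot (each term is already resolved to a weight and a value), and relationship boosts, social proof, and hard excludes are omitted. Gates are checked first as an early exit, which yields the same result as the pseudocode because an eliminated candidate stays at negative infinity.

```rust
/// Hypothetical, simplified inputs for one candidate.
pub struct Candidate {
    pub id: u64,
    pub retrieval_score: f64,
    pub boost_terms: Vec<(f64, f64)>,   // (weight, signal_value)
    pub penalty_terms: Vec<(f64, f64)>, // (negative weight, signal_value)
    pub gate_checks: Vec<(f64, f64)>,   // (signal_value, minimum)
    pub age_secs: f64,
}

pub fn score(c: &Candidate, half_life_secs: Option<f64>) -> f64 {
    // A failed quality gate eliminates the candidate outright.
    if c.gate_checks.iter().any(|&(value, min)| value < min) {
        return f64::NEG_INFINITY;
    }
    let mut s = c.retrieval_score;
    s += c.boost_terms.iter().map(|&(w, v)| w * v).sum::<f64>();
    if let Some(h) = half_life_secs {
        // Temporal decay is multiplicative, applied before penalties.
        s *= (-std::f64::consts::LN_2 / h * c.age_secs).exp();
    }
    s += c.penalty_terms.iter().map(|&(w, v)| w * v).sum::<f64>();
    s
}

/// Scores all candidates, drops eliminated ones, sorts by score descending.
pub fn rank(cands: &[Candidate], half_life_secs: Option<f64>) -> Vec<(u64, f64)> {
    let mut scored: Vec<(u64, f64)> = cands
        .iter()
        .map(|c| (c.id, score(c, half_life_secs)))
        .filter(|&(_, s)| s != f64::NEG_INFINITY)
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```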

Social proof computation: For personalized queries (for_user is Some), social proof measures how many of the user's social connections engaged with this item. The social proof map is built as a side product of relationship traversal (depth-2 BFS, bounded fan-out) and cached for the duration of the query. Cost: <10ms for depth-2 traversal (spec 04, Section 13).

5.6 Stage 5: Diversity Enforcement

Diversity enforcement reorders the scored result set to ensure variety without reducing the result count (unless insufficient candidates exist). Three mechanisms operate in sequence:

max_per_creator: No more than N items from the same creator in the final result set. Implementation: iterate through scored results. For each creator, maintain a count. If a candidate exceeds the cap, demote it (push it down the list, do not remove it). This preserves the best-scoring item from each creator at its natural position.

format_mix: Ensure a mix of content formats (video, short, article, podcast). Implementation: round-robin insertion. After max_per_creator, partition candidates by format. Interleave from each format bucket in proportion to its representation in the scored set, biased toward higher-scoring items.

topic_diversity (MMR): Maximal Marginal Relevance. Re-scores candidates to balance relevance and novelty:

MMR_score(d) = lambda * relevance(d) - (1 - lambda) * max_sim(d, selected)

where:
  lambda = 1.0 - topic_diversity  (topic_diversity in [0.0, 1.0])
  relevance(d) = candidate's final_score from Stage 4
  max_sim(d, selected) = maximum embedding cosine similarity between d
                          and any already-selected result

MMR is the most expensive diversity operation (O(k * n) distance computations where k = selected count and n = remaining candidates). For typical result sizes (limit = 50, candidates = 200), this is 50 * 200 * ~500ns = 5ms. Within budget.
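
The greedy selection implied by the MMR formula can be sketched as below. The function name and the closure-based similarity lookup are illustrative assumptions; the sketch also assumes relevance and similarity are on comparable scales and clamps max_sim at zero when nothing has been selected yet.

```rust
/// Greedy MMR selection. `relevance[i]` is candidate i's Stage 4 score and
/// `sim(i, j)` returns the embedding cosine similarity of candidates i and j.
pub fn mmr_select(
    relevance: &[f64],
    sim: impl Fn(usize, usize) -> f64,
    topic_diversity: f64,
    limit: usize,
) -> Vec<usize> {
    let lambda = 1.0 - topic_diversity;
    let mut remaining: Vec<usize> = (0..relevance.len()).collect();
    let mut selected: Vec<usize> = Vec::new();
    while selected.len() < limit && !remaining.is_empty() {
        let mut best_pos = 0;
        let mut best_score = f64::NEG_INFINITY;
        for (pos, &cand) in remaining.iter().enumerate() {
            // Similarity to the closest already-selected result (0 if none).
            let max_sim = selected
                .iter()
                .map(|&s| sim(cand, s))
                .fold(0.0_f64, f64::max);
            let score = lambda * relevance[cand] - (1.0 - lambda) * max_sim;
            if score > best_score {
                best_score = score;
                best_pos = pos;
            }
        }
        selected.push(remaining.remove(best_pos));
    }
    selected
}
```

With topic_diversity = 0.0 the selection reduces to plain relevance order; raising it pushes near-duplicates of already selected items further down.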

Exploration injection: If profile.exploration > 0, the pipeline reserves that fraction of result slots for items from creators the user does not follow and has not interacted with. These are drawn from the candidate set but bypass the relationship boost. Exploration items are scored normally (they may still score well on signal boosts and text relevance) but are guaranteed representation in the final set.

5.7 Stage 6: Pagination

See Section 8 for the full pagination design. The pagination stage:

If a cursor is provided, decode it and skip to the cursor position.
Slice the result set to [cursor_offset .. cursor_offset + limit].
If more results exist beyond the slice, encode a next_cursor for the response.
Construct the Results struct with the sliced results, the cursor, and the total candidate count.

6. Query Composition

Query composition is the mechanism that powers SEARCH WITHIN TRENDING FOR COHORT. This is the most complex query type in the system, and the reason the query engine exists as a distinct module rather than a thin wrapper over subsystems.

6.1 What Composition Means

A composed query has two phases: a restriction phase that generates a constrained candidate set, and a search phase that retrieves within that set.

SEARCH items
QUERY "piano"
WITHIN TRENDING FOR COHORT young_us_jazz
WINDOW 24h
LIMIT 20

Semantics: "Find items matching 'piano' that are currently trending among young US jazz fans in the last 24 hours."

WITHIN TRENDING is a candidate generation strategy, not a filter. Items not trending in the cohort are never considered, regardless of their text relevance to "piano." The search operates only within the trending candidate set.

6.2 Composition vs. Filtering

The distinction is critical and worth making explicit:

Filter: "Find items matching 'piano', then remove items that are not trending." This is wrong because it generates candidates from the full text index, scores them, and then discards non-trending results. If only 50 of the text index's top-500 candidates happen to be trending, you get poor recall and wasted work.

Composition: "Generate the trending candidate set first (e.g., top 500 trending items), then search for 'piano' within that set." This generates candidates from the right population and searches within it. Every result is both trending AND relevant to the query.

6.3 Four-Phase Execution Flow

Composed Search: SEARCH "piano" WITHIN TRENDING FOR COHORT young_us_jazz WINDOW 24h

Phase 1: Cohort Resolution                              < 2ms
┌──────────────────────────────────────────────────────────────┐
│ Resolve "young_us_jazz" predicate:                           │
│   region_bitmap["US"] ∩ age_bitmap["18-24"]                  │
│     ∩ interests_bitmap["jazz"]                               │
│   --> user bitmap D (cohort membership)                      │
│                                                              │
│ Check cohort population: |D| >= 2000 active users?           │
│   yes --> proceed                                            │
│   no  --> fallback to parent cohort + warning                │
└──────────────────────────────────────────────────────────────┘
                          │
                          ▼
Phase 2: Cohort Trending Candidate Generation            < 20ms
┌──────────────────────────────────────────────────────────────┐
│ For items with cohort tracking active:                       │
│   Read cohort-scoped velocity for window=24h                 │
│                                                              │
│ Signal path (from Cohorts spec Section 6.3):                 │
│   - If exact_tracking: true --> Level 2 segment counter      │
│   - If single Level 1 dim  --> Level 1 rollup lookup         │
│   - If composite            --> independence estimation      │
│                                                              │
│ Filter to items with velocity > min_velocity threshold       │
│ Sort by cohort velocity descending                           │
│ Take top max_candidates (default: 500)                       │
│                                                              │
│ Output: trending_set = Vec<(EntityId, f64)>                  │
│         (entity_id, cohort_velocity)                         │
└──────────────────────────────────────────────────────────────┘
                          │
                          ▼
Phase 3: Search Within Trending Set                      < 10ms
┌──────────────────────────────────────────────────────────────┐
│ Convert trending_set entity IDs to a roaring bitmap          │
│                                                              │
│ Text search path:                                            │
│   TextIndex::score_candidates(                               │
│     entity_kind: Item,                                       │
│     query: SearchQuery parsed from "piano",                  │
│     candidate_ids: &trending_set_ids,                        │
│   )                                                          │
│   --> BM25 scores for trending items matching "piano"        │
│                                                              │
│ Vector search path (if query embedding provided):            │
│   Brute-force distance computation against trending_set      │
│   (set is small enough -- 500 items -- for exact search)     │
│   --> cosine similarity scores                               │
│                                                              │
│ Fusion (RRF or linear combination):                          │
│   Merge text and vector scores                               │
│   Carry cohort_velocity as an additional feature             │
│                                                              │
│ Output: Vec<ComposedCandidate>                               │
│   (entity_id, text_score, vector_score, fused_score,         │
│    cohort_velocity)                                          │
└──────────────────────────────────────────────────────────────┘
                          │
                          ▼
Phase 4: Final Ranking                                   < 5ms
┌──────────────────────────────────────────────────────────────┐
│ Combine search relevance with cohort trending score:         │
│                                                              │
│   final_score = alpha * fused_relevance_score                │
│               + beta * normalized_cohort_velocity            │
│               + signal_boosts + relationship_boosts          │
│               - penalties                                    │
│                                                              │
│ Where alpha and beta are derived from the ranking profile.   │
│ Default: alpha=0.6 (relevance), beta=0.4 (trending).         │
│                                                              │
│ Apply diversity constraints                                  │
│ Return top limit (20) results                                │
│                                                              │
│ Output: Results                                              │
└──────────────────────────────────────────────────────────────┘

Total estimated latency: < 37ms (within the 45ms p50 target for this query type in Section 11.1)
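
The Phase 4 blend can be illustrated with a short sketch. The normalization used here (dividing by the maximum velocity in the set) is an assumption, since the diagram only names a normalized cohort velocity; the struct is a reduced, hypothetical version of ComposedCandidate, fused scores are assumed to lie in [0, 1], and signal boosts, relationship boosts, penalties, and diversity are left out.

```rust
/// Reduced, hypothetical view of a Phase 3 output row.
pub struct ComposedCandidate {
    pub entity_id: u64,
    pub fused_score: f64,
    pub cohort_velocity: f64,
}

/// Blends relevance and trending strength, then keeps the top `limit`.
pub fn final_ranking(
    cands: &[ComposedCandidate],
    alpha: f64,
    beta: f64,
    limit: usize,
) -> Vec<(u64, f64)> {
    // Assumed normalization: velocity relative to the fastest item in the set.
    let max_v = cands.iter().map(|c| c.cohort_velocity).fold(0.0_f64, f64::max);
    let mut scored: Vec<(u64, f64)> = cands
        .iter()
        .map(|c| {
            let norm_v = if max_v > 0.0 { c.cohort_velocity / max_v } else { 0.0 };
            (c.entity_id, alpha * c.fused_score + beta * norm_v)
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```

With the default weights, an item with moderate relevance but the highest cohort velocity can outrank a highly relevant item that is barely trending.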

6.4 Composition Plan Type

/// Plan for a composed query (SEARCH WITHIN TRENDING).
pub(crate) struct CompositionPlan {
    /// Phase 1: cohort to resolve.
    cohort: ResolvedCohort,
    /// Phase 2: trending candidate generation.
    trending_window: Window,
    trending_min_velocity: f64,
    trending_max_candidates: usize,
    /// Phase 3: search within trending.
    search_query: SearchQuery,
    search_vector: Option<Vec<f32>>,
    fusion: FusionStrategy,
    /// Phase 4: relevance/trending weight balance.
    relevance_weight: f64,
    trending_weight: f64,
}

6.5 Why Composition Matters

Consider an item that matches "piano" perfectly (BM25 score = 12.5) but has zero velocity in the cohort. With filtering, this item would appear in the initial text retrieval results (top 500 by BM25), pass through scoring, and only be removed at filter evaluation. This wastes a candidate slot that could have gone to a less-relevant but trending item.

With composition, the trending set is generated first. Only trending items enter the search phase. A text-relevant item with zero trending velocity is never evaluated. This means:

Every returned result is both trending AND text-relevant.
No candidate slots are wasted on non-trending items.
The search phase operates on a small set (500 items), making brute-force vector search practical.
The latency budget is spent on results that will actually be returned.

6.6 Fallback Behavior

If the cohort population is below the minimum threshold (from Cohorts spec Section 9.4: 2000 active users for search within cohort trending), the engine:

Emits CohortWarning::InsufficientPopulation in the response.
Falls back to the nearest parent cohort in the hierarchy that meets the threshold.
Adds a cohort-relative boost from the original cohort (if any exact data exists) as a secondary signal.

If the trending set is empty (no items trending in the cohort for this window), the engine:

Emits CompositionWarning::EmptyTrendingSet in the response.
Falls back to a standard SEARCH without the WITHIN TRENDING restriction.
Adds a note to the response indicating the fallback.
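
The parent-cohort fallback can be sketched as a walk up the hierarchy until a cohort meets the population threshold. `CohortNode` and `resolve_with_fallback` are hypothetical names, and the sketch assumes the hierarchy is a tree (no cycles) stored in a slice with parent indexes.

```rust
/// Hypothetical cohort node; `parent` indexes into the same slice.
pub struct CohortNode {
    pub name: &'static str,
    pub active_users: u64,
    pub parent: Option<usize>,
}

/// Returns the index of the cohort to use and whether a fallback occurred.
/// `None` means no cohort on the path to the root meets the threshold.
pub fn resolve_with_fallback(
    cohorts: &[CohortNode],
    start: usize,
    min_population: u64,
) -> Option<(usize, bool)> {
    let mut current = start;
    loop {
        if cohorts[current].active_users >= min_population {
            // `true` signals that a warning should accompany the response.
            return Some((current, current != start));
        }
        current = cohorts[current].parent?;
    }
}
```

The boolean in the return value corresponds to the point where the engine would emit the insufficient-population warning.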

7. Filter Evaluation

7.1 Bitmap-Based Architecture

Filters are evaluated using roaring bitmaps from the Entity Model's metadata indexes (spec 02, Cohort-Ready Design). Each keyword field value, each boolean value, and each numeric range bucket has a pre-computed bitmap of entity IDs matching that value. Filter evaluation is bitmap algebra.

Filter: category:jazz AND format:video AND unseen(user_123)

Step 1: metadata filters (bitmap intersection)
  category_bitmap["jazz"]      --> bitmap A  (items in jazz category)
  format_bitmap["video"]       --> bitmap B  (items in video format)
  A ∩ B                        --> bitmap C  (jazz videos)

Step 2: user-state filters
  user_123.seen_set            --> bitmap D  (items user has seen)
  C \ D                        --> bitmap E  (unseen jazz videos)

Step 3: exclusion filters
  user_123.blocked_creators    --> bitmap F  (items by blocked creators)
  E \ F                        --> bitmap G  (final filter bitmap)

Result: bitmap G applied to candidate set

7.2 Filter Push-Down

For candidate generation strategies that support it, filters are pushed into the generation phase to reduce the number of candidates that enter later stages.

Strategy	Push-Down Mechanism
ANN	Metadata filter bitmap passed to `VectorIndex::filtered_search()` as predicate callback or pre-filter set. User-state filters evaluated in Stage 2.
Scan	Filter bitmap used to skip non-matching entities during iteration.
Hybrid	Metadata filter bitmap passed to both text and vector retrieval. Tantivy uses fast-field filtering. USearch uses predicate callback.
Relationship	Filters applied after traversal (edge targets are not pre-filtered).
CohortTrending	Metadata filters applied to the trending candidate set after velocity computation.

7.3 Filter Types

/// A filter predicate for query evaluation.
pub enum Filter {
    /// Exact equality on a keyword or boolean field.
    Eq { field: FieldName, value: FieldValue },
    /// Any of the specified values (OR within dimension).
    Any { field: FieldName, values: Vec<FieldValue> },
    /// Numeric range.
    Range { field: FieldName, min: Option<f64>, max: Option<f64> },
    /// Minimum value threshold.
    Min { field: FieldName, value: f64 },
    /// Maximum value threshold.
    Max { field: FieldName, value: f64 },
    /// Duration preset (short, medium, long).
    Preset { field: FieldName, preset: String },
    /// Created within a duration.
    CreatedWithin(Duration),
    /// Created after a timestamp.
    CreatedAfter(Timestamp),
    /// Created before a timestamp.
    CreatedBefore(Timestamp),
    /// Since a timestamp (for notifications).
    Since(Timestamp),
    /// Items the user has not seen.
    Unseen,
    /// Items the user has engaged with in a specific state.
    UserState(String),
    /// Items not by blocked creators.
    NotBlocked,
    /// Items from followed creators only.
    Relationship(RelationshipKind),
    /// Items engaged by the user's social graph.
    SocialGraph { user_id: UserId, depth: TraversalDepth },
    /// Items in a specific collection.
    InCollection(String),
}

7.4 Short-Circuit Evaluation

Filter bitmaps are evaluated in ascending cardinality order. The smallest bitmap is evaluated first, minimizing the size of subsequent intersections.

Evaluation order: sort filters by estimated bitmap cardinality ascending.

If any bitmap has cardinality 0:
  --> return empty Results immediately (short-circuit)

If bitmap intersection yields 0 after any step:
  --> return empty Results immediately (short-circuit)

This optimization is significant for multi-filter queries. A category:nonexistent filter short-circuits the entire pipeline in <1ms.
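
A minimal sketch of the ordering and short-circuit logic follows. `BTreeSet<u32>` stands in for a roaring bitmap so the example needs only the standard library; the function name is hypothetical.

```rust
use std::collections::BTreeSet;

/// Intersects filter sets smallest-first and stops as soon as the running
/// result is empty. Returns `None` when there are no filters at all, which
/// the caller would treat as "every candidate passes".
pub fn intersect_filters(mut filters: Vec<BTreeSet<u32>>) -> Option<BTreeSet<u32>> {
    // Ascending cardinality: the smallest set bounds every later intersection.
    filters.sort_by_key(|f| f.len());
    let mut iter = filters.into_iter();
    let mut acc = iter.next()?;
    for f in iter {
        if acc.is_empty() {
            break; // short-circuit: nothing can survive further intersections
        }
        acc = acc.intersection(&f).copied().collect();
    }
    Some(acc)
}
```

Because an empty set sorts first, a filter with zero cardinality ends the loop before any intersection work is done.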

7.5 User-State Filter Implementation

User-state filters (unseen, saved, in_progress, liked) require looking up the user's per-item state. These are stored as relationship edges in the relationship store and as signal events in the signal ledger.

Unseen filter: The user's "seen" set is a bloom filter (for approximate, fast check) backed by the signal ledger (for exact verification). The bloom filter is maintained in memory and updated on every signal write. False positive rate: <1% at 10M items per user with 128-bit fingerprints.

Other user-state filters: saved, liked, in_progress are loaded from the relationship store via RelationshipStore::load_edge_set(user, edge_kind). These return a roaring bitmap of matching entity IDs. Cost: <100us per load (spec 04, Section 13).

8. Pagination

8.1 Cursor-Based Design

tidalDB uses cursor-based pagination, not offset-based. Offset pagination (LIMIT 50 OFFSET 100) breaks under concurrent writes: if new items are inserted between pages, the user sees duplicates or gaps. Cursor pagination is stable.

8.2 Cursor Structure

/// Opaque pagination cursor. Encoded as a base64 string.
pub struct Cursor {
    /// Score of the last item on the previous page.
    /// The next page starts from items with score < last_score,
    /// or with score == last_score and entity ID < last_entity_id (tie-break).
    last_score: f64,
    /// Entity ID of the last item on the previous page.
    /// Tie-breaker: items with the same score are ordered by entity ID descending.
    last_entity_id: EntityId,
    /// Hash of the query parameters. Used to detect query changes between pages.
    query_hash: u64,
    /// Sequence number at cursor creation time. Used to detect stale cursors.
    created_at_seqno: u64,
}

8.3 Cursor Semantics

Page 1 (no cursor): Execute the full pipeline. Return the top limit results. Encode a cursor from the last result's score and entity ID.

Page N (with cursor): Execute the full pipeline but with an additional constraint: only consider candidates with (score, entity_id) < (cursor.last_score, cursor.last_entity_id) in the sort order. The pipeline generates candidates, filters, scores, and diversifies as normal, but the pagination stage skips results that precede the cursor position.

Query mismatch detection: The cursor contains a hash of the query parameters (profile, filters, sort, for_user). If the hash does not match the current query, QueryError::InvalidCursor is returned. This prevents confusing results from mixing parameters across pages.

Cursor expiry: Cursors do not expire by time. However, if the underlying data has changed significantly (e.g., a score recomputation shifted all scores), the cursor may produce slightly inconsistent results: a previously returned item may reappear if its score has dropped below the cursor position, and an unreturned item may be skipped if its score has risen above it. This is acceptable for content ranking -- strict consistency across pages is not required.
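
The cursor comparison can be sketched as a keyset predicate over (score, entity ID), both descending. `Position`, `is_after_cursor`, and `next_page` are hypothetical names, and plain `u64` stands in for EntityId.

```rust
use std::cmp::Ordering;

/// Position of a result in the sort order: score descending, then
/// entity ID descending as the tie-breaker.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Position {
    pub score: f64,
    pub entity_id: u64,
}

/// True if `item` sorts strictly after the cursor (belongs on a later page).
pub fn is_after_cursor(item: Position, cursor: Position) -> bool {
    match item.score.total_cmp(&cursor.score) {
        Ordering::Less => true,
        Ordering::Greater => false,
        Ordering::Equal => item.entity_id < cursor.entity_id,
    }
}

/// Returns the next page from an already-sorted result list.
pub fn next_page(sorted: &[Position], cursor: Option<Position>, limit: usize) -> Vec<Position> {
    sorted
        .iter()
        .copied()
        .filter(|p| cursor.map_or(true, |c| is_after_cursor(*p, c)))
        .take(limit)
        .collect()
}
```

The tie-break on entity ID is what keeps two items with identical scores from being duplicated or dropped at a page boundary.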

8.4 Alternative: Exclude IDs

For applications that prefer simplicity over cursor semantics, exclude_ids can be used. Pass the IDs from previous pages. The pipeline treats these as hard exclusions in Stage 2. This is less efficient than cursor-based pagination (the pipeline re-scores items it will discard) but simpler to implement on the application side.

8.5 Cursor Encoding

The cursor is serialized as a base64-encoded byte sequence:

Cursor Wire Format (32 bytes before base64)

+----------+-----------+-----------+----------+
| f64 LE   | u64 BE    | u64 LE    | u64 LE   |
| score    | entity_id | query_hash| seqno    |
| 8 bytes  | 8 bytes   | 8 bytes   | 8 bytes  |
+----------+-----------+-----------+----------+

Base64 encoded: 44 characters (with padding)

Entity ID uses big-endian for lexicographic sort compatibility with the storage engine's key encoding.
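
The byte layout can be sketched with the standard library's endian conversions. `CursorFields`, `encode`, and `decode` are hypothetical names, plain `u64` stands in for EntityId, and the final base64 step is omitted because it would need an external crate.

```rust
/// Mirrors the four cursor fields; plain integers stand in for EntityId.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct CursorFields {
    pub last_score: f64,
    pub last_entity_id: u64,
    pub query_hash: u64,
    pub created_at_seqno: u64,
}

/// Packs the four 8-byte fields (32 bytes total) in wire order.
pub fn encode(c: &CursorFields) -> [u8; 32] {
    let mut buf = [0u8; 32];
    buf[0..8].copy_from_slice(&c.last_score.to_le_bytes());
    // Big-endian so byte order matches numeric order for the entity ID.
    buf[8..16].copy_from_slice(&c.last_entity_id.to_be_bytes());
    buf[16..24].copy_from_slice(&c.query_hash.to_le_bytes());
    buf[24..32].copy_from_slice(&c.created_at_seqno.to_le_bytes());
    buf
}

pub fn decode(buf: &[u8; 32]) -> CursorFields {
    // Copies the i-th 8-byte field into a fixed-size array for conversion.
    let field = |i: usize| -> [u8; 8] {
        let mut out = [0u8; 8];
        out.copy_from_slice(&buf[i * 8..i * 8 + 8]);
        out
    };
    CursorFields {
        last_score: f64::from_le_bytes(field(0)),
        last_entity_id: u64::from_be_bytes(field(1)),
        query_hash: u64::from_le_bytes(field(2)),
        created_at_seqno: u64::from_le_bytes(field(3)),
    }
}
```

A production decoder would also need to reject input of the wrong length and validate the query hash before trusting the score and entity ID.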

9. SUGGEST Operation

9.1 Architecture

SUGGEST bypasses the six-stage execution pipeline entirely. It is a lightweight operation designed for sub-10ms response times on every keystroke.

SUGGEST "jazz pia" FOR USER user_123

                    ┌─────────────────────┐
                    │  Parse prefix:       │
                    │  last_token = "pia"  │
                    │  context = "jazz"    │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
    ┌─────────▼──────┐ ┌──────▼────────┐ ┌─────▼──────────┐
    │ Term Prefix    │ │ Popular Query │ │ Personal       │
    │ Completions    │ │ Completions   │ │ History        │
    │                │ │               │ │                │
    │ TextIndex::    │ │ query_log     │ │ user_123's     │
    │ suggest()      │ │ signal-       │ │ recent         │
    │                │ │ weighted      │ │ searches       │
    │ "pia*" in term │ │ "jazz pia*"  │ │ and engaged    │
    │ dictionary     │ │ by click     │ │ items          │
    │                │ │ velocity     │ │                │
    └────────┬───────┘ └──────┬───────┘ └──────┬─────────┘
             │                │                │
             └────────────────┼────────────────┘
                              │
                    ┌─────────▼──────────┐
                    │ Merge, deduplicate, │
                    │ rank by:            │
                    │  1. Personal recency│
                    │  2. Query velocity  │
                    │  3. Term frequency  │
                    └─────────┬──────────┘
                              │
                              ▼
                    ["jazz piano",
                     "jazz piano tutorial",
                     "jazz piano chords",
                     "jazz pianist",
                     "jazz piano solo"]

9.2 Response Type

pub struct SuggestResult {
    /// The suggested completion string.
    pub text: String,
    /// Source of the suggestion for UI rendering.
    pub source: SuggestionSource,
    /// Relevance/popularity score for ranking.
    pub score: f64,
}

pub enum SuggestionSource {
    /// From the term dictionary (term prefix completion).
    TermCompletion,
    /// From popular query tracking (search_click signal velocity).
    PopularQuery,
    /// From the user's personal search history.
    PersonalHistory,
    /// From trending queries (high-velocity recent searches).
    TrendingQuery,
}
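
The merge, deduplicate, and rank step from Section 9.1 might look like the sketch below. It interprets the ranking order (personal recency, then query velocity, then term frequency) as a strict source priority, which is an assumption; `Source` and `Suggestion` are simplified, hypothetical versions of the response types above.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Source {
    // Declared in priority order so the derived Ord ranks personal first.
    PersonalHistory,
    PopularQuery,
    TermCompletion,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Suggestion {
    pub text: String,
    pub source: Source,
    pub score: f64,
}

/// Merges suggestions from all sources, keeps the highest-priority copy of
/// each text, and returns at most `limit` entries.
pub fn merge_suggestions(mut all: Vec<Suggestion>, limit: usize) -> Vec<Suggestion> {
    // Source priority first, then score descending within a source.
    all.sort_by(|a, b| a.source.cmp(&b.source).then(b.score.total_cmp(&a.score)));
    let mut seen = HashSet::new();
    // `insert` returns false for a text already kept from a better source.
    all.retain(|s| seen.insert(s.text.clone()));
    all.truncate(limit);
    all
}
```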

9.3 Trending Suggestions

When the prefix is empty, SUGGEST returns trending searches: queries with the highest search_click signal velocity in the recent window (1h or 24h). These are displayed in the search UI before the user types anything.

If for_user is provided, trending searches are personalized: the list includes a mix of globally trending queries and queries trending in the user's inferred cohort (auto-derived from attributes).

9.4 Performance

Operation	Target	Conditions
Prefix autocomplete (typed prefix)	< 10ms p99	500K unique terms, 10M documents
Trending suggestions (empty prefix)	< 5ms p99	In-memory signal state
Personalized suggestions	< 10ms p99	User history in hot tier

10. Query Context

Several query parameters modify the execution context without changing the pipeline structure. They inject additional state that the planner and executor consume.

10.1 FOR USER

for_user: Some("user_123")

Provides user context for personalization. Effects:

User preference vector is loaded and used as the query vector for Candidate::Ann { query_vector: VectorSource::UserPreference }.
User state filters become available (unseen, saved, liked, in_progress).
Relationship exclusions are active (not_blocked). The user's blocked set is loaded.
Relationship boosts are computed (interaction_weight edges from user to creators).
Social proof is computed (engagement overlap between user's social graph and candidates).
Exploration injection draws from creators outside the user's engagement graph.
Auto cohort derivation is possible (CohortRef::Auto).

Without for_user, the query is unpersonalized: no user state, no relationship filtering, no social proof. This is valid for global trending, category browse, and other unpersonalized surfaces.

10.2 FOR COHORT

for_cohort: Some(CohortRef::Named("young_us_jazz"))

Scopes signal aggregation to the specified cohort. The query engine resolves the cohort to a user bitmap, maps it to the signal system's dimensional hierarchy, and reads cohort-scoped signal aggregates instead of global aggregates.

Three cohort reference types (from Cohorts spec Section 8.4):

CohortRef	Resolution
`Named("young_us_jazz")`	Look up named cohort in schema. Use cached bitmap.
`Predicate(Predicate::and(...))`	Evaluate predicate at query time. Build bitmap from attribute indexes.
`Auto`	Derive cohort from querying user's region, age_range, and top inferred interest. Requires `for_user`.

10.3 CONTEXT

context: Some("feed")

A string tag identifying the discovery surface (feed, search, browse, related, notification, etc.). Context does not affect query execution directly. It is recorded alongside query results for the feedback loop (spec 10). When the user later interacts with a result, the feedback system knows which surface produced it, enabling per-surface ranking profile optimization.

10.4 SIMILAR TO

similar_to: Some(EntityId::from("item_abc"))

Anchors the query to a specific item. The anchor item's embedding is used as the query vector for ANN search (instead of the user's preference vector). Used for:

Related content / "Up Next" (RETRIEVE items SIMILAR TO item_abc)
Creator discovery (RETRIEVE creators SIMILAR TO creator_xyz)
Visual similarity (RETRIEVE items SIMILAR TO item_abc with visual embedding slot)

If both similar_to and for_user are provided, the engine can blend the anchor embedding with the user preference vector:

query_vector = alpha * anchor_embedding + (1 - alpha) * user_preference
normalize(query_vector)

Where alpha is configurable (default: 0.7 -- biased toward the anchor). This produces "items similar to this one, tailored to this user's taste."
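
The blend and normalization can be written out as a short function. The name `blend_query_vector` is hypothetical, and the handling of mismatched lengths and zero-norm results (returning `None`) is an assumption rather than documented engine behavior.

```rust
/// Blends an anchor embedding with a user preference vector and L2-normalizes
/// the result. Returns None if the lengths differ or the blend has zero norm.
pub fn blend_query_vector(anchor: &[f32], user_pref: &[f32], alpha: f32) -> Option<Vec<f32>> {
    if anchor.len() != user_pref.len() {
        return None;
    }
    let mut blended: Vec<f32> = anchor
        .iter()
        .zip(user_pref)
        .map(|(a, u)| alpha * a + (1.0 - alpha) * u)
        .collect();
    let norm = blended.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return None;
    }
    for x in &mut blended {
        *x /= norm;
    }
    Some(blended)
}
```

Normalizing after the blend keeps cosine similarity comparable with queries that use a single unit-length source vector.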

11. Performance Targets

11.1 End-to-End Query Latency

Query Type	Target p50	Target p99	Conditions
RETRIEVE (personalized feed, ANN)	< 30ms	< 50ms	10M items, 1M users, warm cache
RETRIEVE (trending, scan)	< 20ms	< 40ms	10M items, global velocity sort
RETRIEVE (following, relationship)	< 25ms	< 40ms	User follows 500 creators
RETRIEVE (cohort trending)	< 40ms	< 60ms	Includes cohort resolution
SEARCH (text only)	< 20ms	< 40ms	10M items, 3-term query
SEARCH (hybrid text + vector)	< 40ms	< 60ms	10M items, includes fusion
SEARCH WITHIN TRENDING FOR COHORT	< 45ms	< 70ms	Full composition
SUGGEST (typed prefix)	< 8ms	< 15ms	500K terms, 10M documents
SUGGEST (trending, empty prefix)	< 3ms	< 8ms	In-memory signal state

11.2 Per-Stage Performance Budget

The end-to-end budget is decomposed into per-stage budgets. Exceeding any stage budget triggers a warning log. Exceeding the total budget logs at WARN level.

Performance Budget Breakdown: RETRIEVE (personalized feed, ANN)
Target: < 30ms p50

Stage                           Budget (p50)    Notes
──────────────────────────────  ────────────    ─────────────────────────
1. Candidate generation (ANN)      12ms        HNSW search, ef_search=200
2. Filter evaluation                 2ms        Bitmap intersection
3. Signal loading                  0.1ms        200 candidates * 6 signals * 15ns
4. Scoring                          2ms         200 candidates, profile eval
5. Diversity enforcement            3ms         MMR with topic_diversity
6. Pagination                     0.1ms         Cursor encode/decode
──────────────────────────────  ────────────
Subtotal                          19.2ms
Overhead (plan, alloc, I/O)        3ms
──────────────────────────────  ────────────
Total                             22.2ms        Headroom: 7.8ms


Performance Budget Breakdown: SEARCH (hybrid text + vector)
Target: < 40ms p50

Stage                           Budget (p50)    Notes
──────────────────────────────  ────────────    ─────────────────────────
1. Candidate generation
   - Text retrieval (BM25)         8ms          Tantivy search, 3 terms
   - Vector retrieval (ANN)       10ms          HNSW search, ef_search=200
   (parallel, total = max)        10ms          Both legs run concurrently
   - Fusion (RRF)                  1ms          HashMap merge, sort
2. Filter evaluation                2ms          Bitmap intersection
3. Signal loading                 0.1ms          400 candidates * 6 signals
4. Scoring                         3ms           400 candidates, profile eval
5. Diversity enforcement           3ms           MMR
6. Pagination                    0.1ms
──────────────────────────────  ────────────
Subtotal                         19.2ms
Overhead (plan, alloc, I/O)       4ms
──────────────────────────────  ────────────
Total                            23.2ms          Headroom: 16.8ms


Performance Budget Breakdown: SEARCH WITHIN TRENDING FOR COHORT
Target: < 45ms p50

Phase                           Budget (p50)    Notes
──────────────────────────────  ────────────    ─────────────────────────
1. Cohort resolution                2ms          Cached bitmap intersection
2. Trending candidate gen         15ms           Scan cohort-tracked items
3. Search within trending           8ms          BM25 on 500 candidates +
                                                 brute-force vector on 500
4. Final ranking                    5ms          Signal load + scoring +
                                                 diversity + pagination
──────────────────────────────  ────────────
Subtotal                          30ms
Overhead (plan, alloc, I/O)        5ms
──────────────────────────────  ────────────
Total                             35ms           Headroom: 10ms
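
The arithmetic in these breakdowns can be checked mechanically. The following is a minimal sketch, assuming hypothetical names (`StageBudget`, `total_and_headroom`, `overruns`, `retrieve_budget`) that are not part of the tidalDB API; it reproduces the RETRIEVE totals and flags stages whose measured latency exceeds the budget.

```rust
/// Hypothetical per-stage budget entry; not part of the tidalDB API.
#[derive(Debug, Clone, Copy)]
pub struct StageBudget {
    pub name: &'static str,
    pub budget_ms: f64,
}

/// Sums the stage budgets plus fixed overhead and returns (total, headroom)
/// against the end-to-end target. Negative headroom means over budget.
pub fn total_and_headroom(stages: &[StageBudget], overhead_ms: f64, target_ms: f64) -> (f64, f64) {
    let subtotal: f64 = stages.iter().map(|s| s.budget_ms).sum();
    let total = subtotal + overhead_ms;
    (total, target_ms - total)
}

/// Returns the names of stages whose measured latency exceeded their budget.
/// `measured_ms` is expected to be parallel to `stages`.
pub fn overruns(stages: &[StageBudget], measured_ms: &[f64]) -> Vec<&'static str> {
    stages
        .iter()
        .zip(measured_ms)
        .filter(|(s, m)| **m > s.budget_ms)
        .map(|(s, _)| s.name)
        .collect()
}

/// The RETRIEVE (personalized feed, ANN) stage budgets from the table above.
pub fn retrieve_budget() -> Vec<StageBudget> {
    vec![
        StageBudget { name: "candidate_generation", budget_ms: 12.0 },
        StageBudget { name: "filter_evaluation", budget_ms: 2.0 },
        StageBudget { name: "signal_loading", budget_ms: 0.1 },
        StageBudget { name: "scoring", budget_ms: 2.0 },
        StageBudget { name: "diversity", budget_ms: 3.0 },
        StageBudget { name: "pagination", budget_ms: 0.1 },
    ]
}
```

With 3ms of overhead and a 30ms target, `total_and_headroom(&retrieve_budget(), 3.0, 30.0)` yields the 22.2ms total and 7.8ms headroom shown in the table.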

11.3 Throughput Targets

Metric	Target	Conditions
RETRIEVE queries per second	> 2,000 QPS	10M items, 8 cores, steady-state signal writes
SEARCH queries per second	> 1,000 QPS	10M items, 8 cores, includes fusion
SUGGEST queries per second	> 10,000 QPS	Lightweight, in-memory

The query engine is read-heavy by design. All data it reads is either immutable (entity metadata), lock-free atomic (hot tier signal state), or snapshot-isolated (Tantivy reader, USearch view). Concurrent queries do not contend with each other.

12. Query Caching

12.1 Philosophy: Cache Structure, Not Results

The query engine does not cache query results. Content ranking is inherently temporal -- signals decay, velocities change, new items arrive. A cached result set from 30 seconds ago may already be stale. Instead, tidalDB caches the structural components that are expensive to recompute but change infrequently.

12.2 What Is Cached

Cached Structure	TTL	Invalidation	Rationale
Cohort membership bitmaps	5 minutes	On cohort predicate change or attribute write	Bitmap intersection for named cohorts is O(dimensions). Once computed, the bitmap is reused across all queries targeting that cohort.
Filter bitmaps	Per-query (request-scoped)	N/A -- built fresh per query	Filter bitmaps are computed from metadata indexes. They are cheap to build (roaring bitmap ops are <2ms) and are not shared across queries because filter combinations vary.
User preference vectors	Until next embedding write	On `update_embedding()` call	The user's preference vector is loaded once per query from the entity store. It does not change during query execution.
User state sets (seen, blocked)	Request-scoped with bloom filter	Bloom filter updated on signal write	The user's seen bloom filter is maintained in memory. The blocked set is loaded from the relationship store per query (~100us).
Selectivity correlation cache	Background refresh (every 60s)	On bulk metadata writes	Joint selectivity estimates for frequently co-occurring filter pairs. Maintained by the background materializer.
Tantivy segment readers	Until segment merge	On Tantivy commit/merge	Tantivy internally manages segment reader pools. The text index trait wraps this. Readers are snapshot-isolated and reused across queries.
HNSW graph	Persistent (memory-mapped)	On index rebuild	The USearch HNSW graph is memory-mapped and shared across all concurrent queries. No per-query caching needed.
Social proof map	Request-scoped	N/A	Built during query execution via depth-2 BFS. Not shared across queries because it depends on the querying user.
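
As an illustration of the cohort bitmap row, here is a minimal sketch of a TTL cache with explicit invalidation. The type `CohortBitmapCache` is hypothetical, and `Vec<u32>` stands in for a roaring bitmap so the sketch has no dependencies. Time is passed in explicitly to keep the behavior testable.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache for cohort membership bitmaps.
/// `Vec<u32>` is a stand-in for a roaring bitmap.
pub struct CohortBitmapCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Vec<u32>)>,
}

impl CohortBitmapCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the cached bitmap if it is younger than the TTL at `now`.
    pub fn get(&self, cohort: &str, now: Instant) -> Option<&Vec<u32>> {
        self.entries.get(cohort).and_then(|(stored_at, bitmap)| {
            if now.duration_since(*stored_at) < self.ttl {
                Some(bitmap)
            } else {
                None
            }
        })
    }

    /// Stores a freshly computed bitmap, stamped with `now`.
    pub fn put(&mut self, cohort: &str, bitmap: Vec<u32>, now: Instant) {
        self.entries.insert(cohort.to_string(), (now, bitmap));
    }

    /// Explicit invalidation, e.g. on a cohort predicate change or attribute write.
    pub fn invalidate(&mut self, cohort: &str) {
        self.entries.remove(cohort);
    }
}
```

A cache like this would presumably live inside the cohort system rather than the query engine, so that the engine itself stays free of mutable state.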

12.3 What Is NOT Cached (and Why)

Not Cached	Reason
Query result sets	Results depend on real-time signal state (decay scores, velocities). Caching would serve stale rankings. The cost of re-execution (<50ms) is lower than the correctness cost of stale results.
Signal scores	Signals are read from the hot tier with lock-free atomics (~15ns per read). Caching would add staleness without meaningful latency reduction.
Scored/ranked candidates	Scoring depends on the querying user's relationship state, social proof, and exploration injection. Two users with the same query get different scores.
Trending candidate sets	Trending velocity changes continuously. A 30-second-old trending set may have materially different rankings.
Execution plans	Plans are cheap to construct (<1ms) and depend on current selectivity estimates, which change with data writes.

12.4 Warm-Up on Startup

On database startup, the following structures are warmed before the query engine accepts requests:

HNSW index: Memory-map the on-disk graph. Pre-fault pages for the entry-point neighborhood.
Hot tier signal state: Load recent signal events from the WAL into the hot tier's atomic arrays.
Named cohort bitmaps: Pre-compute membership bitmaps for all schema-defined cohorts.
Tantivy readers: Open segment readers for all entity types with text indexes.
Bloom filters: Rebuild per-user seen bloom filters from recent signal events (or load from checkpoint).

The warm-up sequence is logged with per-step timing. The database reports "ready" only after all warm-up steps complete. During warm-up, queries return QueryError::NotReady.
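
One way to enforce the "ready only after all steps complete" rule is a countdown gate. This is a sketch under assumptions: `WarmupGate` is a hypothetical type, and the `QueryError` here is reduced to the single variant the sketch needs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Reduced stand-in for the engine's error type.
#[derive(Debug, PartialEq, Eq)]
pub enum QueryError {
    NotReady,
}

/// Hypothetical readiness gate: counts outstanding warm-up steps.
pub struct WarmupGate {
    remaining: AtomicUsize,
}

impl WarmupGate {
    pub fn new(steps: usize) -> Self {
        Self { remaining: AtomicUsize::new(steps) }
    }

    /// Called once by each warm-up step when it finishes.
    /// Saturates at zero, so a spurious extra call cannot underflow.
    pub fn step_done(&self) {
        let _ = self
            .remaining
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1));
    }

    /// Called at the top of every query entry point.
    pub fn check(&self) -> Result<(), QueryError> {
        if self.remaining.load(Ordering::Acquire) == 0 {
            Ok(())
        } else {
            Err(QueryError::NotReady)
        }
    }
}
```

With the five steps listed above, the gate would be created as `WarmupGate::new(5)` and every query path would call `check()` before parsing.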

12.5 Cache Sizing

Structure	Memory per Unit	Sizing Formula	Example (10M items, 1M users)
Cohort bitmap	~1.2 MB per 10M items (roaring)	num_named_cohorts * 1.2 MB	50 cohorts * 1.2 MB = 60 MB
User seen bloom filter	~2 KB per user (128-bit, 10K items seen)	num_active_users * 2 KB	100K active * 2 KB = 200 MB
Selectivity correlation cache	~16 bytes per pair	top_100_pairs * 16 B	100 * 16 B = 1.6 KB (negligible)
Hot tier signal state	64 bytes per entity (cache-line aligned)	num_hot_entities * 64 B	500K hot * 64 B = 32 MB
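
The sizing formulas can be written out as a small calculator. This sketch uses hypothetical names (`SizingInputs`, `cache_bytes`) and decimal units (1 MB = 1,000,000 bytes), which is what the example column's arithmetic implies.

```rust
/// Inputs to the sizing formulas above (hypothetical struct).
pub struct SizingInputs {
    pub named_cohorts: u64,
    pub active_users: u64,
    pub correlated_pairs: u64,
    pub hot_entities: u64,
}

// Per-unit sizes in bytes, taken from the "Memory per Unit" column.
pub const COHORT_BITMAP_BYTES: u64 = 1_200_000; // ~1.2 MB per 10M items
pub const SEEN_BLOOM_BYTES: u64 = 2_000; // ~2 KB per user
pub const SELECTIVITY_PAIR_BYTES: u64 = 16;
pub const HOT_SIGNAL_BYTES: u64 = 64; // one cache line per entity

/// Returns bytes for [cohort bitmaps, seen bloom filters,
/// selectivity cache, hot tier signal state], in that order.
pub fn cache_bytes(i: &SizingInputs) -> [u64; 4] {
    [
        i.named_cohorts * COHORT_BITMAP_BYTES,
        i.active_users * SEEN_BLOOM_BYTES,
        i.correlated_pairs * SELECTIVITY_PAIR_BYTES,
        i.hot_entities * HOT_SIGNAL_BYTES,
    ]
}
```

For the example column (50 cohorts, 100K active users, 100 pairs, 500K hot entities) the four structures sum to roughly 292 MB, dominated by the per-user bloom filters.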

13. Error Handling and Fallbacks

13.1 Design Principle: Degrade, Do Not Fail

The query engine follows a strict hierarchy: correct results > degraded results > empty results > error. An error is returned only when the engine cannot produce any meaningful result. In all other cases, the engine degrades gracefully and annotates the response with warnings that explain what was degraded and why.

13.2 Per-Stage Fallback Strategies

Error Handling by Pipeline Stage

Stage 1: Candidate Generation
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ HNSW index unavail-   │ Fall back to Scan with    │ VectorIndex-     │
│ able (corrupt, not    │ signal-based sort.         │ Unavailable      │
│ loaded)               │ Personalization lost.      │                  │
│                       │                           │                  │
│ User pref vector      │ Use population centroid    │ UsingDefault-    │
│ missing               │ vector (spec 07, cold     │ Vector           │
│                       │ start). Results are        │                  │
│                       │ unpersonalized.            │                  │
│                       │                           │                  │
│ Tantivy index         │ Fall back to vector-only   │ TextIndex-       │
│ unavailable           │ search (if embedding       │ Unavailable      │
│                       │ provided) or return empty. │                  │
│                       │                           │                  │
│ Relationship store    │ Skip relationship-based    │ Relationship-    │
│ read error            │ candidates. Fall back to   │ StoreUnavailable │
│                       │ Scan with trending sort.   │                  │
│                       │                           │                  │
│ CohortTrending: zero  │ Fall back to global        │ EmptyTrending-   │
│ trending items        │ trending (drop cohort      │ Set              │
│                       │ scope).                    │                  │
└──────────────────────────────────────────────────────────────────────┘

Stage 2: Filter Evaluation
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ Metadata bitmap       │ Skip that filter           │ FilterSkipped    │
│ missing (field not    │ dimension. Return results  │ { field }        │
│ indexed)              │ that may not satisfy the   │                  │
│                       │ missing filter.            │                  │
│                       │                           │                  │
│ User seen bloom       │ Skip unseen filter.        │ SeenFilter-      │
│ filter unavailable    │ User may see previously    │ Unavailable      │
│ (cold start)          │ seen items. Acceptable     │                  │
│                       │ for first session.         │                  │
│                       │                           │                  │
│ Blocked set load      │ THIS IS NOT DEGRADABLE.    │ N/A -- returns   │
│ failure               │ Return QueryError::        │ Err              │
│                       │ Internal. Blocked content  │                  │
│                       │ must never appear.         │                  │
└──────────────────────────────────────────────────────────────────────┘

Stage 3: Signal Loading
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ Hot tier miss         │ Read from warm tier        │ None (expected   │
│ (entity evicted)      │ (hash table lookup).       │ for cold items)  │
│                       │ If warm miss, read from    │                  │
│                       │ cold tier (disk).          │                  │
│                       │                           │                  │
│ Warm tier read error  │ Use zero signal values.    │ SignalDegraded   │
│                       │ Item scored on retrieval   │ { count }        │
│                       │ score and metadata only.   │                  │
│                       │                           │                  │
│ All signal tiers      │ Score using retrieval      │ SignalSystem-    │
│ unavailable           │ score + metadata only.     │ Unavailable      │
│                       │ Ranking is degraded but    │                  │
│                       │ results are returned.      │                  │
└──────────────────────────────────────────────────────────────────────┘

Stage 4: Scoring
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ Social proof compute  │ Skip social proof term.    │ SocialProof-     │
│ timeout (>10ms)       │ Score without it.          │ Timeout          │
│                       │                           │                  │
│ Relationship weight   │ Skip relationship boost    │ Relationship-    │
│ load failure          │ terms. Score without them. │ BoostSkipped     │
│                       │                           │                  │
│ NaN/Inf in score      │ Replace with 0.0 and log   │ ScoreAnomaly     │
│ computation           │ at WARN. Likely a bug in   │ { entity_id }    │
│                       │ profile definition.        │                  │
└──────────────────────────────────────────────────────────────────────┘

Stage 5: Diversity Enforcement
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ MMR embedding load    │ Skip topic_diversity       │ DiversityMMR-    │
│ failure (missing      │ enforcement. Apply only    │ Skipped          │
│ embeddings for some   │ max_per_creator and        │                  │
│ candidates)           │ format_mix.                │                  │
│                       │                           │                  │
│ Insufficient candi-   │ Return whatever candidates │ InsufficientFor- │
│ dates after diversity │ survived. Do not pad with  │ Diversity        │
│ enforcement           │ lower-quality items.       │                  │
└──────────────────────────────────────────────────────────────────────┘

Stage 6: Pagination
┌──────────────────────────────────────────────────────────────────────┐
│ Failure Mode          │ Fallback                  │ Warning Emitted  │
│───────────────────────┼───────────────────────────┼──────────────────│
│ Invalid cursor        │ Return QueryError::        │ N/A -- returns   │
│ (decode failure)      │ InvalidCursor. Client      │ Err              │
│                       │ must restart from page 1.  │                  │
│                       │                           │                  │
│ Stale cursor (query   │ Return QueryError::        │ N/A -- returns   │
│ hash mismatch)        │ InvalidCursor with         │ Err              │
│                       │ explanation.               │                  │
└──────────────────────────────────────────────────────────────────────┘
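
The Stage 2 table mixes a degradable failure (seen filter) with a non-degradable one (blocked set). A minimal sketch of that asymmetry follows, assuming a hypothetical function `apply_user_state_filters`, reduced `QueryWarning` and `QueryError` enums, and `HashSet<u64>` standing in for both the bloom filter and the blocked set.

```rust
use std::collections::HashSet;

/// Reduced stand-ins for the engine's warning and error types.
#[derive(Debug, PartialEq)]
pub enum QueryWarning {
    SeenFilterUnavailable,
}

#[derive(Debug, PartialEq)]
pub enum QueryError {
    Internal(String),
}

/// Applies the user-state filters of Stage 2.
/// `seen == None` models an unavailable bloom filter (degradable).
/// `blocked == Err(_)` models a failed blocked-set load (not degradable).
pub fn apply_user_state_filters(
    candidates: Vec<u64>,
    seen: Option<&HashSet<u64>>,
    blocked: Result<&HashSet<u64>, String>,
    warnings: &mut Vec<QueryWarning>,
) -> Result<Vec<u64>, QueryError> {
    // Blocked content must never appear: fail the query instead of degrading.
    let blocked = blocked.map_err(QueryError::Internal)?;

    // A missing seen filter only costs ranking quality: warn and continue.
    if seen.is_none() {
        warnings.push(QueryWarning::SeenFilterUnavailable);
    }

    Ok(candidates
        .into_iter()
        .filter(|id| !blocked.contains(id))
        .filter(|id| seen.map_or(true, |s| !s.contains(id)))
        .collect())
}
```

The blocked-set check runs first, so a failed load returns an error before any warning is recorded.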

13.3 Non-Degradable Invariants

Some invariants cannot be traded for availability. If these fail, the query engine returns an error rather than degraded results.

Invariant	Why It Cannot Degrade
Blocked content exclusion	Trust and safety. Returning blocked content violates the user's explicit boundary. This is not a ranking quality issue -- it is a correctness requirement.
Hidden content exclusion	Same as blocked. The user has explicitly said "never show me this."
Profile existence	If the profile does not exist, the engine cannot score candidates. No meaningful ranking is possible.
Entity type existence	If the entity type is not in the schema, the engine does not know which store, index, or signal definitions to use.

13.4 Warning Accumulation

Warnings are accumulated during query execution and returned alongside results. The caller can inspect warnings to understand degradation and surface appropriate UI cues.

/// Warnings emitted during query execution.
/// Accumulated in the response, never swallowed silently.
pub enum QueryWarning {
    /// Vector index was unavailable. Results are not personalized.
    VectorIndexUnavailable,
    /// User's preference vector was missing. Population default used.
    UsingDefaultVector,
    /// Text index was unavailable. Search results are vector-only.
    TextIndexUnavailable,
    /// Relationship store read failed. Social features disabled.
    RelationshipStoreUnavailable,
    /// Cohort trending set was empty. Fell back to global trending.
    EmptyTrendingSet,
    /// Cohort population too small. Fell back to parent cohort.
    InsufficientCohortPopulation { cohort: String, parent: String },
    /// A filter was skipped due to missing index.
    FilterSkipped { field: String },
    /// User seen filter unavailable (bloom filter not loaded).
    SeenFilterUnavailable,
    /// Signal state was unavailable for some candidates.
    SignalDegraded { count: usize },
    /// Signal system entirely unavailable. Ranking is metadata-only.
    SignalSystemUnavailable,
    /// Social proof computation timed out.
    SocialProofTimeout,
    /// Relationship boosts were skipped.
    RelationshipBoostSkipped,
    /// Score anomaly detected (NaN/Inf replaced with 0.0).
    ScoreAnomaly { entity_id: EntityId },
    /// MMR diversity skipped due to missing embeddings.
    DiversityMMRSkipped,
    /// Fewer results than requested after diversity enforcement.
    InsufficientForDiversity { requested: usize, returned: usize },
}

/// Query results with warnings.
pub struct Results {
    /// The ranked result set.
    pub results: Vec<ResultItem>,
    /// Pagination cursor for the next page.
    pub next_cursor: Option<String>,
    /// Total candidates before pagination.
    pub total_candidates: usize,
    /// Warnings about degraded behavior during this query.
    pub warnings: Vec<QueryWarning>,
}
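
To show how a stage might feed this accumulator, here is a sketch of the NaN/Inf fallback from Stage 4. The function `sanitize_scores` is hypothetical, `EntityId` is aliased to `u64` as a stand-in for the real type, and `QueryWarning` is reduced to the one variant used.

```rust
/// Stand-in for the schema's EntityId type.
pub type EntityId = u64;

/// Reduced stand-in for the warning enum above.
#[derive(Debug, PartialEq)]
pub enum QueryWarning {
    ScoreAnomaly { entity_id: EntityId },
}

/// Replaces non-finite scores with 0.0 and records one warning per anomaly.
pub fn sanitize_scores(scored: &mut [(EntityId, f64)], warnings: &mut Vec<QueryWarning>) {
    for (id, score) in scored.iter_mut() {
        if !score.is_finite() {
            *score = 0.0;
            warnings.push(QueryWarning::ScoreAnomaly { entity_id: *id });
        }
    }
}
```

The warnings vector would be threaded through every stage and moved into `Results::warnings` when the response is assembled.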

13.5 Observability on Degradation

Every fallback path logs at a level proportional to its severity:

Severity	Log Level	Examples
Expected	DEBUG	Hot tier miss -> warm tier read. Population default vector.
Degraded	WARN	Signal system unavailable. Text index unavailable. Social proof timeout. Score NaN/Inf replaced with 0.0.
Critical	ERROR	Blocked set load failure (query returns error).

Additionally, the query engine emits structured metrics for monitoring:

query.warnings_total (counter, tagged by warning type) -- rate of each warning type
query.fallback_total (counter, tagged by fallback type) -- rate of each fallback activation
query.degraded_total (counter) -- total queries with at least one warning

Operators can alert on query.degraded_total rate exceeding a threshold (e.g., >5% of queries degraded) to catch systemic subsystem failures.
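
The degraded-rate alert reduces to a ratio of two counters. The sketch below assumes a hypothetical in-process `QueryMetrics` struct; a real deployment would more likely export the counters to a metrics backend and evaluate the threshold there.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical in-process counters mirroring query.degraded_total
/// and a total-queries counter.
#[derive(Default)]
pub struct QueryMetrics {
    queries_total: AtomicU64,
    degraded_total: AtomicU64,
}

impl QueryMetrics {
    /// Records one finished query and whether it carried any warnings.
    pub fn record_query(&self, warning_count: usize) {
        self.queries_total.fetch_add(1, Ordering::Relaxed);
        if warning_count > 0 {
            self.degraded_total.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// Fraction of queries that returned at least one warning.
    pub fn degraded_ratio(&self) -> f64 {
        let total = self.queries_total.load(Ordering::Relaxed);
        if total == 0 {
            return 0.0;
        }
        self.degraded_total.load(Ordering::Relaxed) as f64 / total as f64
    }

    /// True when the degraded ratio exceeds `threshold` (e.g. 0.05 for 5%).
    pub fn should_alert(&self, threshold: f64) -> bool {
        self.degraded_ratio() > threshold
    }
}
```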

14. Integration Architecture

14.1 Subsystem Coordination

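The query engine coordinates six subsystems. External backends are accessed through trait boundaries; the internal signal ledger and cohort system are accessed through well-defined interfaces. No subsystem knows about any other. The query engine is the only module that holds references to all of them.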

                    ┌─────────────────────────────────────────────────────────────┐
                    │                      QUERY ENGINE                            │
                    │                                                             │
                    │  retrieve() / search() / suggest()                          │
                    │       │                                                     │
                    │       ▼                                                     │
                    │  ┌──────────┐   ┌──────────┐   ┌────────────────────────┐  │
                    │  │ Parser   │──►│ Planner  │──►│ Executor               │  │
                    │  └──────────┘   └──────────┘   │                        │  │
                    │                                 │  Stage 1 ─► Stage 6   │  │
                    │                                 └────────────┬───────────┘  │
                    └─────────────────────────────────────────────┬───────────────┘
                                                                  │
                    ┌──────────────┬──────────────┬───────────────┼────────────────┐
                    │              │              │               │                │
            ┌───────▼──────┐ ┌────▼──────┐ ┌─────▼─────┐ ┌──────▼──────┐ ┌───────▼──────┐
            │ VectorIndex  │ │ TextIndex │ │ Signal    │ │ Relationship│ │ Entity       │
            │ (trait)      │ │ (trait)   │ │ Ledger    │ │ Store       │ │ Store        │
            │              │ │           │ │ (hot tier)│ │ (trait)     │ │ (trait)      │
            │ USearch HNSW │ │ Tantivy   │ │           │ │             │ │              │
            │ or           │ │ or        │ │ Atomic    │ │ Dual-index  │ │ redb or      │
            │ BruteForce   │ │ MockText  │ │ reads     │ │ forward/rev │ │ fjall        │
            └──────────────┘ └───────────┘ └───────────┘ └─────────────┘ └──────────────┘
                    │              │              │               │                │
                    │         Spec 06       Spec 03         Spec 04          Spec 01-02
                    │
               Spec 07

            ┌──────────────┐
            │ Cohort       │
            │ System       │
            │              │
            │ Bitmap       │
            │ resolution   │
            │ + signal     │
            │ dimensional  │
            │ hierarchy    │
            └──────────────┘
                    │
               Spec 05

14.2 Trait Dependencies

/// The query engine holds references to all subsystems via trait objects.
pub struct QueryEngine {
    vector_index: Arc<dyn VectorIndex>,
    text_index: Arc<dyn TextIndex>,
    signal_ledger: Arc<SignalLedger>,
    relationship_store: Arc<dyn RelationshipStore>,
    entity_store: Arc<dyn EntityStore>,
    cohort_system: Arc<CohortSystem>,
    schema_catalog: Arc<SchemaCatalog>,
}

Every external dependency is accessed through a trait (VectorIndex, TextIndex, RelationshipStore, EntityStore). The signal ledger and cohort system are internal (not backed by external libraries) but are still accessed through well-defined interfaces. This enables:

Unit testing with mock implementations of every subsystem.
Swapping implementations (e.g., replacing USearch with a custom HNSW) without touching query engine code.
Performance isolation -- a slow subsystem can be profiled independently.
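
As an illustration of the mock-based testing point, here is a sketch with a brute-force mock behind a trait object. The `VectorIndex` trait shown is a simplified, hypothetical shape (the real trait is defined in spec 07), and `EntityId` is aliased to `u64`.

```rust
use std::sync::Arc;

/// Stand-in for the schema's EntityId type.
pub type EntityId = u64;

/// Hypothetical, simplified trait shape; the real trait lives in spec 07.
pub trait VectorIndex: Send + Sync {
    fn filtered_search(
        &self,
        query: &[f32],
        k: usize,
        allow: &dyn Fn(EntityId) -> bool,
    ) -> Vec<(EntityId, f32)>;
}

/// Brute-force mock: scores every stored vector by dot product.
pub struct MockVectorIndex {
    pub vectors: Vec<(EntityId, Vec<f32>)>,
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

impl VectorIndex for MockVectorIndex {
    fn filtered_search(
        &self,
        query: &[f32],
        k: usize,
        allow: &dyn Fn(EntityId) -> bool,
    ) -> Vec<(EntityId, f32)> {
        let mut hits: Vec<(EntityId, f32)> = self
            .vectors
            .iter()
            .filter(|(id, _)| allow(*id))
            .map(|(id, v)| (*id, dot(query, v)))
            .collect();
        // Highest similarity first.
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits.truncate(k);
        hits
    }
}

/// Reduced engine holding only the subsystem under test.
pub struct QueryEngine {
    vector_index: Arc<dyn VectorIndex>,
}

impl QueryEngine {
    pub fn new(vector_index: Arc<dyn VectorIndex>) -> Self {
        Self { vector_index }
    }

    /// Stage 1 only: returns candidate ids in similarity order.
    pub fn candidates(&self, user_vec: &[f32], k: usize, allow: &dyn Fn(EntityId) -> bool) -> Vec<EntityId> {
        self.vector_index
            .filtered_search(user_vec, k, allow)
            .into_iter()
            .map(|(id, _)| id)
            .collect()
    }
}
```

Because the engine only sees `Arc<dyn VectorIndex>`, the same test would run unchanged against a USearch-backed implementation.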

14.3 Data Flow: RETRIEVE Personalized Feed

db.retrieve(Retrieve {
    entity: Item,
    for_user: Some("user_123"),
    profile: "for_you",
    filters: [unseen, not_blocked, eq("format", "video")],
    diversity: Some(DiversitySpec { max_per_creator: 2, format_mix: true }),
    limit: 50,
})

  1. Parser:
     - Resolve "for_you" profile --> ProfileDef with Candidate::Ann
     - Validate filters against Item entity definition
     - Verify user_123 exists

  2. Planner:
     - Load user_123 preference vector from entity store
     - Build filter bitmap: format_bitmap["video"]
     - Estimate selectivity: ~15% (video format)
     - Select ANN strategy: InGraphFilter (selectivity > 1%)
     - Build scoring plan from profile (view velocity, interaction_weight, ...)
     - Build diversity plan (max_per_creator: 2, format_mix: true)

  3. Executor:
     Stage 1 (ANN):
       vector_index.filtered_search(
         user_preference_vector, k=500,
         |entity_id| filter_bitmap.contains(entity_id)
       )
       --> 500 candidate (entity_id, similarity) pairs

     Stage 2 (Filter):
       Load user_123 seen bloom filter
       Load user_123 blocked creator set
       Remove seen items, remove blocked items
       --> ~350 candidates

     Stage 3 (Signal Load):
       For each candidate, load hot tier signal state:
         view.decay_score, view.velocity(24h), like.decay_score,
         skip.decay_score(24h), completion.value(all_time)
       --> 350 candidates with signal snapshots

     Stage 4 (Scoring):
       Apply profile: base = similarity_score
         + 0.3 * view.velocity(24h)
         + 0.2 * interaction_weight(user, creator)
         + 0.15 * social_proof
         * temporal_decay(created_at, 48h half-life)
         - 0.5 * skip(24h)
       Gate: completion(all_time) >= 0.3
       Exclude: hide signal present
       --> ~250 scored, sorted candidates

     Stage 5 (Diversity):
       max_per_creator: 2 (demote extras)
       format_mix: interleave video/short/article (no-op in this example: the format filter leaves only video)
       --> 250 reordered candidates

     Stage 6 (Pagination):
       Slice [0..50]
       Encode next_cursor from result[49]
       --> Results { results: [50], next_cursor: Some(...) }
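
Stage 5's "demote extras" step can be read as a stable partition: items within the per-creator cap keep their score order, and items over the cap move to the tail. The sketch below implements that one reading; `Scored` and `demote_over_cap` are hypothetical names, and the real engine may exclude over-cap items instead of demoting them.

```rust
use std::collections::HashMap;

/// Hypothetical scored candidate.
#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub entity_id: u64,
    pub creator_id: u64,
    pub score: f64,
}

/// Takes candidates already sorted by score descending and moves every item
/// beyond `max_per_creator` for its creator to the end, preserving relative
/// order within both the kept and the demoted groups.
pub fn demote_over_cap(sorted: Vec<Scored>, max_per_creator: usize) -> Vec<Scored> {
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut kept = Vec::with_capacity(sorted.len());
    let mut demoted = Vec::new();
    for candidate in sorted {
        let n = per_creator.entry(candidate.creator_id).or_insert(0);
        *n += 1;
        if *n <= max_per_creator {
            kept.push(candidate);
        } else {
            demoted.push(candidate);
        }
    }
    kept.extend(demoted);
    kept
}
```

The candidate count is unchanged, which matches the "250 reordered candidates" in the walkthrough above.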

14.4 Data Flow: SEARCH WITHIN TRENDING FOR COHORT

db.search(Search {
    query: "piano",
    vector: Some(query_embedding),
    entity: Item,
    profile: "search",
    within_trending: Some(WithinTrending {
        cohort: CohortRef::Named("young_us_jazz"),
        window: Window::hours(24),
        min_velocity: None,
        max_candidates: Some(500),
    }),
    limit: 20,
})

  1. Parser:
     - Resolve "search" profile
     - Parse "piano" --> SearchQuery::Term("piano")
     - Resolve "young_us_jazz" cohort
     - Validate all inputs

  2. Planner:
     - Detect WithinTrending --> CompositionPlan
     - Resolve cohort bitmap (cached intersection of region:US, age:18-24, jazz)
     - Check cohort population: 45,000 active users in 24h --> sufficient
     - Plan: Phase 1 (cohort) + Phase 2 (trending) + Phase 3 (search) + Phase 4 (rank)

  3. Executor:
     Phase 1 (Cohort Resolution): 1.5ms
       cohort_system.resolve("young_us_jazz")
       --> bitmap D, cardinality 45,000

     Phase 2 (Trending Candidates): 12ms
       signal_ledger.scan_cohort_velocity(
         cohort: "young_us_jazz",
         signal: "view",
         window: 24h,
       )
       --> 500 items with highest cohort velocity
       --> trending_ids bitmap

     Phase 3 (Search Within): 8ms
       text_index.score_candidates(Item, SearchQuery::Term("piano"), &trending_ids)
       --> 73 items matching "piano" with BM25 scores

       Brute-force vector distance on 500 trending items:
       --> 500 similarity scores

       RRF fusion:
       --> 73 fused candidates (items matching text + in trending set)

     Phase 4 (Final Ranking): 4ms
       Load signals, apply scoring profile
       final_score = 0.6 * fused_relevance + 0.4 * normalized_velocity + boosts
       Diversity: max_per_creator: 2
       --> Results { results: [20], next_cursor: Some(...) }

     Total: ~25.5ms
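
The RRF fusion step in Phase 3 can be sketched as follows. This is standard reciprocal rank fusion, where each leg contributes 1 / (k + rank) with 1-based ranks; the function name `rrf_fuse` is hypothetical, and the constant k = 60 used in the example is the conventional default, not a value stated in this spec.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion over any number of ranked id lists.
/// Each list is ordered best-first. Returns (id, fused_score) sorted by
/// score descending, with ties broken by id for deterministic output.
pub fn rrf_fuse(legs: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for leg in legs {
        for (i, id) in leg.iter().enumerate() {
            // Ranks are 1-based: the top item of a leg contributes 1 / (k + 1).
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Plain RRF keeps every id that appears in any leg. The walkthrough above ends with 73 fused candidates because it additionally keeps only items that matched the text leg; that restriction would be applied as a filter on the fused list.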

15. Invariants and Correctness Guarantees

These invariants must hold at all times. Property tests, integration tests, and crash recovery tests enforce them.

#	Invariant	Test Strategy
1	Every returned result passed all evaluated filters. No result in the response violates any filter predicate that the engine evaluated. A filter skipped because its index is missing (13.2, Stage 2) is reported as a `FilterSkipped` warning.	Property test: for every result in response, assert all filters hold. Fuzz test: random filter combinations, verify all results pass.
2	Results are sorted by final_score descending (within diversity reordering tolerance). After diversity enforcement, the relative score order is preserved within each diversity bucket.	Property test: verify sort order of results. Integration test: compare sorted output to naive sort of same candidates.
3	Blocked content never appears. If `for_user` is provided and the user has blocked a creator or item, that content is never in the result set -- regardless of score, filter, or diversity settings.	Property test: inject blocked relationships, verify zero results from blocked creators/items across all query types.
4	Hidden content never appears. If a user has sent a `hide` signal for an item, that item never appears for that user.	Same as above for hide signals.
5	Cursor pagination does not produce duplicates. Given stable data, paginating through a result set with cursors produces each result exactly once.	Integration test: paginate through full result set, verify no duplicate entity IDs.
6	Composition restricts, not filters. `WITHIN TRENDING` operates as a candidate generation strategy. Every result in a composed query has non-zero trending velocity in the specified cohort and window.	Property test: for every result in composed query, assert cohort velocity > 0.
7	Gated candidates are excluded. If a ranking profile defines a `Gate::min(signal, window, threshold)`, no result with a signal value below the threshold appears in the response.	Property test: inject candidates below gate threshold, verify they are absent from results.
8	Diversity max_per_creator is respected. If `max_per_creator: 2`, no more than 2 items from any single creator appear in the result set.	Property test: count per-creator items in results, assert <= max_per_creator.
9	The query engine holds no mutable state. The engine is a pure function of its inputs and the current state of the subsystems it reads from. Two identical queries at the same moment produce identical results.	Architecture invariant: no mutable fields on QueryEngine. Verified by code review and Sync + Send bounds.
10	Unknown profiles, fields, or cohorts produce typed errors, not panics. Every invalid reference produces a `QueryError` variant. The engine never panics on user input.	Fuzz test: random strings for profile names, field names, cohort names. Verify all return `Err`, never panic.
11	Signal reads use Acquire ordering. Every load from the hot tier's `AtomicU64` fields uses `Ordering::Acquire`, pairing with the `Ordering::Release` store of a concurrent signal writer, so a reader that observes a score also observes every write the writer made before publishing it.	Code review + integration test: concurrent signal write + query read, verify monotonic score progression.
12	Empty results are not errors. A query with filters that match no items returns `Results { results: [], next_cursor: None, total_candidates: 0 }`. Not an error.	Unit test: query with impossible filter combination returns empty Results, not Err.
13	Fallback on insufficient cohort population. When a cohort has fewer active users than the minimum threshold, the engine falls back to a parent cohort and emits a warning. It does not return an error or an empty set (unless no parent meets the threshold either).	Integration test: create tiny cohort, query with FOR COHORT, verify fallback to parent and warning in response.
14	Query plan is logged for every query. At DEBUG level, every query logs its execution plan including strategy, estimated selectivity, candidate count, and latency. This is the primary observability mechanism.	Integration test: verify log output contains expected plan fields for each query type.
15	Every degradation is surfaced as a warning. If the query engine takes any fallback path (missing vector, signal tier read error, skipped filter, etc.), it emits a `QueryWarning` in the response. No degradation is silently swallowed. An expected hot-tier miss served from the warm or cold tier is not a degradation and emits no warning (13.2, Stage 3).	Integration test: disable each subsystem, verify corresponding warning appears in response.
16	Queries before warm-up return NotReady, not incorrect results. The database does not serve queries until all warm-up steps (HNSW load, WAL replay, bloom filter rebuild, cohort bitmap computation) have completed.	Integration test: issue query before warm-up completes, verify `QueryError::NotReady`.
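
Invariant 11 can be illustrated with a single hot-tier slot. This is a sketch under assumptions: `ScoreSlot` is a hypothetical type that stores an `f64` score as raw bits in an `AtomicU64`, which is one common way to get atomic floating-point reads.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical hot-tier slot: an f64 score stored as raw bits.
pub struct ScoreSlot(AtomicU64);

impl ScoreSlot {
    pub fn new(initial: f64) -> Self {
        Self(AtomicU64::new(initial.to_bits()))
    }

    /// Writer side: publish a new score with Release ordering so that
    /// everything written before this store is visible to a reader that
    /// observes it with Acquire.
    pub fn store(&self, score: f64) {
        self.0.store(score.to_bits(), Ordering::Release);
    }

    /// Reader side: load with Acquire ordering, as the invariant requires.
    pub fn load(&self) -> f64 {
        f64::from_bits(self.0.load(Ordering::Acquire))
    }
}
```

A concurrent test in the spirit of the stated strategy would have one thread store increasing scores while another asserts that successive loads never decrease.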

Appendix A: Query Error Reference

Error	When	Recovery
`UnknownProfile(name)`	Profile name not in schema catalog	Define the profile via `define_profile()`
`InvalidFilter { field, reason }`	Filter references unknown field or type mismatch	Check entity definition for valid field names and types
`UnknownCohort(name)`	Named cohort not defined in schema	Define the cohort via `define_cohort()`
`MissingUserForAutoCohort`	`CohortRef::Auto` used without `for_user`	Provide `for_user` or use `CohortRef::Named`
`UnknownUser(id)`	User ID not in entity store	Ingest the user via `write_user()`
`InvalidQuery(msg)`	Search query string has syntax error	Fix query syntax (unbalanced quotes, empty phrase, etc.)
`InvalidCursor(msg)`	Cursor hash mismatch or decode failure	Start from page 1 (no cursor)
`MissingVector(msg)`	ANN candidate strategy requires a vector that does not exist	Provide embedding or use a non-ANN profile
`Internal(msg)`	Subsystem failure (storage I/O, index corruption)	Check logs, restart database if persistent
`NotReady`	Database is still warming up (loading HNSW, replaying WAL, building bloom filters)	Retry after startup completes; monitor ready health check
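
The table maps naturally onto a Rust enum. The sketch below is dependency-free and implements `Display` by hand; the repository declares `thiserror`, which would presumably generate the same impl. The payload types (for example `String` for the user id) and the message wording are assumptions, as is the `is_retryable` helper.

```rust
use std::fmt;

/// Sketch of the error enum implied by the table above.
#[derive(Debug, Clone, PartialEq)]
pub enum QueryError {
    UnknownProfile(String),
    InvalidFilter { field: String, reason: String },
    UnknownCohort(String),
    MissingUserForAutoCohort,
    UnknownUser(String),
    InvalidQuery(String),
    InvalidCursor(String),
    MissingVector(String),
    Internal(String),
    NotReady,
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QueryError::UnknownProfile(name) => write!(f, "unknown profile: {name}"),
            QueryError::InvalidFilter { field, reason } => {
                write!(f, "invalid filter on field '{field}': {reason}")
            }
            QueryError::UnknownCohort(name) => write!(f, "unknown cohort: {name}"),
            QueryError::MissingUserForAutoCohort => {
                write!(f, "CohortRef::Auto requires for_user")
            }
            QueryError::UnknownUser(id) => write!(f, "unknown user: {id}"),
            QueryError::InvalidQuery(msg) => write!(f, "invalid query: {msg}"),
            QueryError::InvalidCursor(msg) => write!(f, "invalid cursor: {msg}"),
            QueryError::MissingVector(msg) => write!(f, "missing vector: {msg}"),
            QueryError::Internal(msg) => write!(f, "internal error: {msg}"),
            QueryError::NotReady => write!(f, "database is still warming up"),
        }
    }
}

impl std::error::Error for QueryError {}

impl QueryError {
    /// True when retrying the identical request later may succeed.
    /// Per the Recovery column, only NotReady resolves on its own.
    pub fn is_retryable(&self) -> bool {
        matches!(self, QueryError::NotReady)
    }
}
```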

Appendix B: Sort Mode Implementation Reference

Sort modes (from API.md) are implemented as sort expressions in the scan candidate strategy. Each mode maps to a signal read or metadata field access.

Sort Mode	Implementation	Signal/Field	Direction
`Relevance`	BM25 + vector fusion score	Computed at search time	DESC
`Personalized`	User preference vector similarity	Cosine similarity	DESC
`New`	Metadata field read	`created_at`	DESC
`Old`	Metadata field read	`created_at`	ASC
`Hot`	`score / (age_hours + 2)^1.8`	Composite of signal + timestamp	DESC
`Trending`	Signal velocity read	`view.velocity(6h)` + `share.velocity(6h)`	DESC
`Rising`	Velocity relative to baseline	`velocity / baseline`	DESC
`TopAllTime`	Signal accumulator	`like.decay_score(all_time)`	DESC
`TopHour`	Signal windowed count	`like.count(1h)`	DESC
`TopToday`	Signal windowed count	`like.count(24h)`	DESC
`TopWeek`	Signal windowed count	`like.count(7d)`	DESC
`TopMonth`	Signal windowed count	`like.count(30d)`	DESC
`MostViewed`	Signal windowed count	`view.count(all_time)`	DESC
`MostLiked`	Signal windowed count	`like.count(all_time)`	DESC
`MostCommented`	Signal windowed count	`comment.count(all_time)`	DESC
`MostShared`	Signal windowed count	`share.count(all_time)`	DESC
`Shortest`	Metadata field read	`duration`	ASC
`Longest`	Metadata field read	`duration`	DESC
`AlphabeticalAsc`	Metadata field read	`title`	ASC
`AlphabeticalDesc`	Metadata field read	`title`	DESC
`Shuffle`	Weighted random	`rand() * quality_score`	DESC
`LiveViewerCount`	Real-time counter	`live_viewers.count(now)`	DESC
`DateSaved`	Relationship timestamp	`saved.timestamp`	DESC
`CreatorEngagementRate`	Creator signal ratio	`creator.engagement_rate`	DESC
`Controversial`	Signal product	`max(positive_count * negative_count)`	DESC
`HiddenGems`	Quality / reach ratio	`quality_score / view_count`	DESC
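
As a worked example of one row, the `Hot` expression `score / (age_hours + 2)^1.8` can be written directly. The function names below are hypothetical, and clamping negative ages to zero is an added assumption to guard against clock skew.

```rust
/// Hot score: engagement score damped by age.
/// Negative ages are clamped to zero (assumption, guards against clock skew).
pub fn hot_score(score: f64, age_hours: f64) -> f64 {
    score / (age_hours.max(0.0) + 2.0).powf(1.8)
}

/// Sorts (entity_id, score, age_hours) tuples by hot score, descending.
pub fn sort_by_hot(items: &mut [(u64, f64, f64)]) {
    items.sort_by(|a, b| hot_score(b.1, b.2).total_cmp(&hot_score(a.1, a.2)));
}
```

The `+ 2` offset keeps brand-new items from dividing by a value near zero, and the 1.8 exponent makes age dominate quickly: an item needs several times the score of a newer one to outrank it after a day.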
