Vector Retrieval Specification

Status: Draft
Author: tidalDB Engineering
Last Updated: 2026-02-20
Depends on: Storage Engine (01), Entity Model (02), Signal System (03)
Research: docs/research/ann_for_tidaldb.md


Table of Contents

  1. Design Principles
  2. HNSW Index Internals
  3. Filtered ANN (ACORN Framework)
  4. Quantization
  5. Multiple Embedding Spaces
  6. Embedding Lifecycle
  7. Index Persistence and Recovery
  8. Hybrid Fusion with Text Retrieval
  9. Adaptive Query Planning
  10. User Preference Vector
  11. Trait Abstraction
  12. Performance Targets
  13. Invariants and Correctness Guarantees

1. Design Principles

Vector retrieval is one leg of tidalDB's retrieval system. The other is text retrieval (Tantivy/BM25). Together they produce candidate sets that the ranking engine scores. Vector retrieval handles: personalized feed generation (user preference vector vs item embeddings), semantic search (query embedding vs item embeddings), visual similarity (image embedding vs visual embeddings), creator discovery (catalog embedding similarity), and collaborative filtering via embedding-space proximity.

Invariants

These hold at all times. Property tests and crash recovery tests enforce them.

  1. The database indexes vectors. It does not generate them. External embeddings are provided by the application. Database-managed embeddings (user preference, creator catalog) are computed from external embeddings via documented formulas. No ML model inference occurs inside tidalDB.

  2. Filtered ANN is a first-class operation. Every ANN query can carry metadata predicates. The query planner selects the optimal strategy (pre-filter, in-graph filter, brute-force) based on estimated selectivity. Post-filter-and-hope is never the strategy.

  3. Trait-abstracted engine. USearch is the production HNSW implementation. It sits behind a VectorIndex trait boundary. No module outside storage/vector/ knows that USearch exists. A BruteForceIndex exists for correctness verification and small-dataset deployments.

  4. Multiple embedding spaces per entity type. An item can have a content embedding (1536d), a visual embedding (512d), and an audio embedding (256d). Each space has its own HNSW index. Cross-space queries are supported via multi-index fan-out.

  5. Embeddings are L2-normalized at insertion. Cosine similarity is computed as L2 distance over unit vectors (mathematically equivalent, more SIMD-friendly). The application does not need to pre-normalize. The database handles it.

  6. Index is derived state. The HNSW index can be rebuilt from entity store embedding columns. If the index file is corrupted, crash recovery rebuilds it. The entity store is the source of truth for vector data.


2. HNSW Index Internals

Algorithm Overview

Hierarchical Navigable Small World (HNSW) is a proximity graph algorithm for approximate nearest neighbor search. It builds a multi-layer graph where:

  • Layer 0 contains all vectors, connected to their M nearest neighbors.
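  • Higher layers contain exponentially fewer nodes: a node in layer l also appears in layer l+1 with probability 1/M. This follows from drawing each node's top level as floor(-ln(U) * mL), with U uniform on (0, 1) and level multiplier mL = 1/ln(M). These sparse layers enable logarithmic search complexity by providing long-range "express lane" connections.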
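The layer assignment can be sketched as follows. This is a minimal illustration of the standard HNSW level-sampling rule, not USearch's internal code; the function names `assign_level` and `prob_at_least` are hypothetical, and the uniform sample `u` is passed in by the caller so the sketch needs no random-number crate.

```rust
/// Draw an HNSW top level from a uniform sample `u` in (0, 1].
/// `m` is the max-connections parameter M; mL = 1 / ln(M) is the level multiplier.
fn assign_level(u: f64, m: usize) -> usize {
    let m_l = 1.0 / (m as f64).ln();
    (-u.ln() * m_l).floor() as usize
}

/// Probability that a node reaches at least `level`, which is (1/M)^level.
fn prob_at_least(level: u32, m: usize) -> f64 {
    (1.0 / m as f64).powi(level as i32)
}
```

With M=16, about 1 node in 16 reaches layer 1 and about 1 in 256 reaches layer 2, which is why the upper layers stay small.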

Search procedure:

HNSW Search(query, K, ef_search)

1. Start at the entry point in the highest layer.
2. Greedily traverse to the nearest node to the query at this layer.
3. Drop to the next layer, using the nearest node found as the new entry point.
4. Repeat until reaching layer 0.
5. At layer 0, perform a beam search with beam width = ef_search.
   - Maintain a priority queue of ef_search candidates.
   - For each candidate, evaluate all M neighbors.
   - Expand the best unexplored candidate.
   - Continue until no unexplored candidate is closer than the farthest result.
6. Return the top K results from the priority queue.
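Step 5 is the part that determines recall, so a small sketch may help. The code below is an illustrative layer-0 beam search over a plain adjacency list, assuming squared L2 distance; the `Cand` type, `beam_search`, and the in-memory graph layout are hypothetical and do not reflect how USearch stores its graph.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::{BinaryHeap, HashSet};

/// A (distance, node id) pair, totally ordered by distance so it can live in a heap.
#[derive(Clone, Copy)]
struct Cand { dist: f32, id: usize }

impl Ord for Cand {
    fn cmp(&self, other: &Self) -> Ordering {
        self.dist.total_cmp(&other.dist).then(self.id.cmp(&other.id))
    }
}
impl PartialOrd for Cand {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> { Some(self.cmp(other)) }
}
impl PartialEq for Cand {
    fn eq(&self, other: &Self) -> bool { self.cmp(other) == Ordering::Equal }
}
impl Eq for Cand {}

fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Beam search on one layer: keep at most `ef_search` results, always expand
/// the closest unexplored candidate, and stop once that candidate is farther
/// than the farthest kept result.
fn beam_search(
    vectors: &[Vec<f32>],
    neighbors: &[Vec<usize>],
    entry: usize,
    query: &[f32],
    k: usize,
    ef_search: usize,
) -> Vec<usize> {
    let start = Cand { dist: l2_sq(&vectors[entry], query), id: entry };
    let mut visited = HashSet::from([entry]);
    let mut frontier = BinaryHeap::from([Reverse(start)]); // min-heap of unexplored nodes
    let mut results = BinaryHeap::from([start]); // max-heap, farthest result on top

    while let Some(Reverse(cur)) = frontier.pop() {
        let worst = results.peek().map_or(f32::INFINITY, |c| c.dist);
        if cur.dist > worst {
            break;
        }
        for &n in &neighbors[cur.id] {
            if !visited.insert(n) {
                continue; // already evaluated
            }
            let cand = Cand { dist: l2_sq(&vectors[n], query), id: n };
            let worst = results.peek().map_or(f32::INFINITY, |c| c.dist);
            if results.len() < ef_search || cand.dist < worst {
                frontier.push(Reverse(cand));
                results.push(cand);
                if results.len() > ef_search {
                    results.pop(); // evict the farthest result
                }
            }
        }
    }
    let mut best = results.into_sorted_vec(); // ascending by distance
    best.truncate(k);
    best.into_iter().map(|c| c.id).collect()
}
```

A larger `ef_search` keeps more candidates alive before the stop condition fires, which is where the recall/latency tradeoff comes from.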

The key insight: upper layers provide logarithmic navigation to the right neighborhood. Layer 0 provides high-recall local search within that neighborhood. ef_search controls the quality/speed tradeoff at layer 0.

Parameter Reference

| Parameter | Symbol | Description | Default | Range |
|---|---|---|---|---|
| Max connections per layer | M | Number of bidirectional links per node in layer 0. Upper layers use M. | 16 | 8-64 |
| Construction beam width | ef_construction | Beam width during index build. Higher = better graph quality, slower build. | 200 | 100-500 |
| Search beam width | ef_search | Beam width during query. Higher = better recall, slower query. | 200 | 50-500 |
| Distance metric | metric | Distance function for similarity computation. | L2 (cosine via normalized vectors) | L2, InnerProduct, Cosine |

Parameter Recommendations for tidalDB

| Workload | M | ef_construction | ef_search | Rationale |
|---|---|---|---|---|
| Standard (10M, 1536d, recall >95%) | 16 | 200 | 200 | Per USearch benchmarks: 126K QPS at f32, >95% recall@100. M=16 is the production default for ScyllaDB and Qdrant at this dimensionality. |
| High-recall (filtered ANN, compound predicates) | 32 | 300 | 300 | Under selective filters, effective connectivity drops. M=32 provides ~2x the surviving edges per node. Memory overhead: +300 bytes/node (~10% of the 3,072-byte vector at 1536d f16). Research doc recommends benchmarking M=16 vs M=32 under tidalDB's filter distribution. |
| Low-latency (autocomplete, typeahead) | 16 | 200 | 100 | ef_search=100 halves query time with ~2% recall loss. Acceptable for suggestion candidates that are re-ranked anyway. |
| Bulk rebuild (compaction, recovery) | 16 | 128 | -- | Lower ef_construction for faster rebuilds during compaction. Graph quality is slightly lower but rebuilt indexes serve queries immediately; a background process can rebuild with ef_construction=200 later. |

Distance Metrics

tidalDB uses L2 distance over L2-normalized vectors as the universal distance metric. This is mathematically equivalent to cosine distance for unit vectors:

For unit vectors a, b:
  ||a - b||^2 = 2 - 2 * cos(a, b)

Minimizing L2 distance = maximizing cosine similarity.

Why L2 over native cosine: USearch and every SIMD library optimize L2 distance computation more aggressively than cosine. L2 avoids the per-query normalization step. SimSIMD (USearch's distance kernel) processes L2 at near-memory-bandwidth speeds with AVX-512 and NEON SIMD.

Inner product (MIPS) support: If tidalDB later adds collaborative filtering embeddings where vector magnitude carries meaning (e.g., popularity-scaled embeddings), MIPS queries are converted to L2 via the XBOX transformation: append one extra dimension sqrt(max_norm^2 - ||v||^2) to each stored vector and 0 to the query vector. This reduces MIPS to L2 search with no recall loss. The transformation is applied transparently at the VectorIndex trait boundary.
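A minimal sketch of the augmentation described above, assuming `max_norm` is the largest norm among the stored vectors and is known up front. The function names are hypothetical; how tidalDB would track `max_norm` as new vectors arrive is not specified here.

```rust
/// Augment a stored vector with one extra dimension: sqrt(max_norm^2 - ||v||^2).
/// After augmentation every stored vector has norm exactly `max_norm`.
fn xbox_stored(v: &[f32], max_norm: f32) -> Vec<f32> {
    let norm_sq: f32 = v.iter().map(|x| x * x).sum();
    // Clamp at zero to absorb rounding error when ||v|| is (almost) max_norm.
    let extra = (max_norm * max_norm - norm_sq).max(0.0).sqrt();
    let mut out = v.to_vec();
    out.push(extra);
    out
}

/// Augment a query vector with a trailing zero.
fn xbox_query(q: &[f32]) -> Vec<f32> {
    let mut out = q.to_vec();
    out.push(0.0);
    out
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

The ordering is preserved because the squared distance between the augmented query and an augmented stored vector equals ||q||^2 + max_norm^2 - 2 * (q . v): the first two terms are constant per query, so the smallest distance corresponds to the largest inner product.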

Layer Structure and Memory

At 10M vectors with M=16, the HNSW graph structure consumes approximately:

Graph memory per node:
  Layer 0 connections: M * sizeof(u64) = 16 * 8 = 128 bytes
  Upper layer connections (expected ~1.3 layers per node): ~40 bytes
  Node metadata (level, neighbors array offsets): ~32 bytes
  Total per node: ~200 bytes (USearch reports ~300 bytes at M=16 including alignment)

Total graph at 10M nodes: ~2-3 GB

This is modest compared to vector storage (see Quantization, Section 4). The graph structure must always reside in memory for acceptable latency. Vector data can optionally be memory-mapped.


3. Filtered ANN (ACORN Framework)

The Problem

tidalDB queries almost always carry metadata predicates: "nearest neighbors that are category:jazz AND format:video AND created within the last 7 days, excluding items the user has already seen." Naive post-filtering (run ANN, discard non-matching results) fails catastrophically when filters retain less than ~10% of the corpus -- recall drops to near zero because the top-K ANN candidates contain almost no filter-matching items.

Three Strategies

tidalDB implements three filtered ANN strategies, selected at query time by the adaptive query planner (Section 9).

Strategy 1: In-Graph Filter (USearch Predicate Callback)

When: Filter selectivity > 20% (more than 20% of items match the filter).

USearch's filtered_search(query, k, |key| predicate(key)) evaluates the predicate on each candidate node during HNSW traversal. Nodes failing the predicate are skipped for results but still used for graph navigation -- preserving search quality. This is the same approach used by ScyllaDB in production at 1 billion vectors.

In-Graph Filter Execution

query vector ──► HNSW entry point (top layer)
                      │
                      ▼ greedy descent through upper layers
                      │
                 Layer 0 beam search (ef_search candidates)
                      │
                 For each candidate node:
                   ├── Compute distance to query     ◄── always
                   ├── Add to navigation set          ◄── always (preserves graph connectivity)
                   └── Add to result set only if      ◄── predicate(node.key) == true
                       predicate passes
                      │
                      ▼
                 Top K results from result set

Predicate evaluation cost: The predicate receives a u64 key (entity ID) and must resolve all filter conditions. For tidalDB, this means:

  1. Bitmap lookup for keyword filters (roaring bitmap intersection): ~50-200ns
  2. Range check for numeric/timestamp filters: ~10ns
  3. Set membership for seen-item exclusion (bloom filter or hash set): ~20ns

Total per-node predicate cost: ~100-300ns. At ef_search=200, the search evaluates ~2000-5000 nodes, so predicate overhead is 0.2-1.5ms -- well within budget.
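The three checks can be sketched as a single predicate over the entity key. This is an illustration only: `QueryFilter` and its fields are hypothetical, `HashSet` stands in for the roaring bitmaps and bloom filter mentioned above, and entity IDs are assumed to be dense enough to index a timestamp column directly.

```rust
use std::collections::HashSet;

/// Hypothetical per-query filter state resolved before the search starts.
struct QueryFilter {
    /// Entity IDs matching the keyword filter (stand-in for a roaring bitmap).
    category_members: HashSet<u64>,
    /// Creation timestamp per entity, indexed by entity ID (assumes dense IDs).
    created_at: Vec<u64>,
    /// Lower bound of the timestamp range filter.
    min_created_at: u64,
    /// Items the user has already seen (stand-in for a bloom filter or hash set).
    seen: HashSet<u64>,
}

impl QueryFilter {
    /// Evaluate all conditions for one candidate key, cheapest check first so
    /// that most rejected nodes never reach the more expensive lookups.
    fn matches(&self, key: u64) -> bool {
        let recent = self
            .created_at
            .get(key as usize)
            .map_or(false, |&t| t >= self.min_created_at);
        recent && !self.seen.contains(&key) && self.category_members.contains(&key)
    }
}
```

A closure such as `|key| filter.matches(key)` is the shape of callback the filtered search expects.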

Strategy 2: Pre-Filter with Brute-Force (Selective Filters)

When: Filter selectivity < 1% (fewer than 1% of items match the filter).

When the filter is extremely selective, the matching set is small enough for exact brute-force computation. This gives perfect recall with no graph traversal overhead.

Pre-Filter Execution

1. Resolve filter predicates to roaring bitmaps
2. Intersect bitmaps → candidate set (e.g., 5,000 items from 10M)
3. For each candidate:
   a. Load embedding vector (from entity store or mmap'd vector storage)
   b. Compute L2 distance to query
4. Return top K by distance

Cost model: At 1536d f16, each distance computation takes ~500ns (SIMD-accelerated). For 5,000 candidates: 5000 * 500ns = 2.5ms. Plus bitmap intersection: ~100us. Total: ~3ms -- faster than HNSW traversal for this case.

Breakeven point: Brute-force beats in-graph filtering when the filtered set is smaller than approximately ef_search * 10 nodes (~2,000-5,000 for typical ef_search values). The adaptive query planner uses this heuristic.
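The pre-filter path reduces to an exact scan over the candidate IDs. A minimal sketch, assuming the bitmap intersection has already produced `candidates` and that `load_embedding` (a hypothetical callback) fetches a vector from the entity store or mmap'd storage:

```rust
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k over a pre-filtered candidate set, ordered by ascending distance.
fn brute_force_top_k(
    candidates: &[u64],
    load_embedding: impl Fn(u64) -> Vec<f32>,
    query: &[f32],
    k: usize,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = candidates
        .iter()
        .map(|&id| (id, l2_sq(&load_embedding(id), query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

For a few thousand candidates a full sort is fine; a bounded heap of size K would avoid sorting the tail if the candidate set grows.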

Strategy 3: Pre-Filter with ACORN Subgraph Expansion

When: Filter selectivity 1-20% (the "danger zone").

This is the most challenging selectivity range. The filtered set is too large for brute-force but too sparse for standard HNSW traversal to maintain recall. The ACORN approach (Patel et al., SIGMOD 2024) addresses this by expanding the effective neighbor list during traversal.

ACORN-1 (two-hop expansion): Instead of checking only a node's direct M neighbors, also check neighbors-of-neighbors. This effectively increases the graph degree to M^2 under the filter, dramatically improving connectivity in sparse regions.

tidalDB implements ACORN-1 within USearch's predicate callback by maintaining traversal state:

ACORN-1 via Predicate Callback (conceptual)

For each candidate node during filtered_search:
  1. Standard: evaluate direct neighbors (USearch does this)
  2. Extension: for each direct neighbor that FAILS the predicate,
     load THAT neighbor's neighbor list and evaluate those nodes too

  This is implemented by widening ef_search (e.g., 2x-3x normal)
  and accepting the additional traversal cost.

Fallback within this strategy: If widened ef_search still returns fewer than K results, fall back to pre-filter brute-force. The query planner tracks this and adjusts thresholds for future queries.

Selectivity Estimation

The query planner needs fast, accurate selectivity estimates before choosing a strategy. tidalDB uses bitmap cardinality from its metadata indexes:

Selectivity Estimation

1. For each filter predicate:
   - Keyword equality: cardinality(bitmap[field][value]) / total_entities
   - Keyword IN-list: cardinality(union(bitmap[field][v] for v in values)) / total
   - Numeric range: estimate from sorted index statistics
   - Boolean: cardinality(bitmap[field][true_or_false]) / total
   - Seen-item exclusion: 1 - (user_seen_count / total), since the filter keeps the unseen items

2. For compound predicates (AND):
   - Independence assumption: selectivity = product of individual selectivities
   - Refinement: maintain joint statistics for common filter combinations

3. For compound predicates (OR):
   - selectivity = sum(individual) - sum(pairwise intersections) + ...
   - Approximation: min(1, sum(individual) * 0.9) (overlap discount, capped so the estimate stays a valid fraction)

Independence assumption caveat: Correlated filters (e.g., category:jazz AND format:audio) violate the independence assumption. The selectivity of category:jazz AND format:audio may be 0.1% even though category:jazz is 5% and format:audio is 10% (expected: 0.5%). tidalDB maintains a correlation cache for frequently co-occurring filter pairs, updated by the background materializer.
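Putting the estimate and the thresholds from the three strategies together gives a sketch like the one below. The enum and function names are hypothetical, the thresholds are the 1% and 20% boundaries stated above, and the cap at 1.0 in the OR estimate is an added assumption so the result stays a valid fraction.

```rust
#[derive(Debug, PartialEq)]
enum Strategy {
    InGraphFilter,       // selectivity > 20%
    AcornExpansion,      // selectivity 1-20%
    PreFilterBruteForce, // selectivity < 1%
}

/// AND of independent predicates: product of the individual selectivities.
fn and_selectivity(parts: &[f64]) -> f64 {
    parts.iter().product()
}

/// OR approximation: discounted sum, capped at 1.0 (the cap is an assumption).
fn or_selectivity(parts: &[f64]) -> f64 {
    (parts.iter().sum::<f64>() * 0.9).min(1.0)
}

fn choose_strategy(selectivity: f64) -> Strategy {
    if selectivity > 0.20 {
        Strategy::InGraphFilter
    } else if selectivity < 0.01 {
        Strategy::PreFilterBruteForce
    } else {
        Strategy::AcornExpansion
    }
}
```

For the example in the caveat, `and_selectivity(&[0.05, 0.10])` gives 0.005 (0.5%), which already selects brute-force; a measured joint selectivity from the correlation cache would replace that product when one is available.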


4. Quantization

Quantization Levels

tidalDB supports three quantization levels. The default is f16, selected based on the research doc's analysis showing minimal recall loss at half the memory cost.

| Level | Bytes/Dim | Memory at 10M x 1536d | Recall@100 vs f32 | Latency Impact | When to Use |
|---|---|---|---|---|---|
| f32 (full precision) | 4 | 57.2 GB | baseline | baseline | Embedding models that require full precision; correctness verification benchmarks |
| f16 (half precision, default) | 2 | 28.6 GB | >99% (typically <0.5% loss) | ~1.1x (slightly faster due to cache) | Default for all production workloads. OpenAI, Cohere, and most transformer embeddings tolerate f16 with negligible quality loss. |
| int8 (scalar quantization) | 1 | 14.3 GB | 97-99% (1-3% loss) | ~0.9x (faster SIMD on integer ops) | Memory-constrained deployments. Acceptable when the ranking pipeline has a re-scoring stage with full-precision vectors. |

Memory Budget at Scale

Complete memory budget including graph overhead (M=16, ~300 bytes/node):

| Scale | f32 Total | f16 Total | int8 Total |
|---|---|---|---|
| 1M vectors | 6.0 GB | 3.2 GB | 1.7 GB |
| 10M vectors | 60 GB | 31.5 GB | 17.2 GB |
| 100M vectors | 601 GB | 314 GB | 172 GB |

tidalDB's target deployment: 10M vectors at f16 = ~31.5 GB. On a 64 GB machine, this leaves ~32 GB for entity store, signal ledger hot tier, OS page cache, and application overhead. This is a comfortable fit.

Quantization Implementation

USearch handles quantization natively. Vectors are quantized at insertion time and stored in the quantized format. Distance computation uses quantization-aware SIMD kernels (SimSIMD).

// USearch quantization configuration (at index creation)
use usearch::{new_index, IndexOptions, MetricKind, ScalarKind};

let index = new_index(&IndexOptions {
    dimensions: 1536,
    metric: MetricKind::L2sq,        // squared L2; ranks identically to L2
    quantization: ScalarKind::F16,   // default
    connectivity: 16,                // M parameter
    expansion_add: 200,              // ef_construction
    expansion_search: 200,           // ef_search (adjustable per query)
    multi: false,                    // one vector per key
})?;

Quantization Selection per Embedding Slot

Different embedding slots may warrant different quantization levels:

| Embedding Slot | Dimensions | Recommended Quantization | Rationale |
|---|---|---|---|
| Item content | 1536 | f16 | Primary retrieval vector. Must maintain high recall. |
| Item visual | 512 | f16 | Visual similarity. f16 sufficient for CLIP-family embeddings. |
| Item audio | 256 | f16 | Audio fingerprint. Low dimensionality keeps memory modest at any precision. |
| User preference | 1536 | f16 | Database-managed, updated frequently. f16 precision sufficient for preference matching. |
| Creator catalog | 1536 | f16 | Database-managed, updated daily. f16 sufficient. |

Quantization level is configured per embedding slot in the entity schema definition and cannot be changed without rebuilding the HNSW index for that slot.

Product Quantization (PQ) -- Future Consideration

Product quantization compresses vectors by 4-32x by splitting them into subvectors and codebook-quantizing each subvector. USearch does not currently support PQ natively. If tidalDB needs to serve datasets exceeding available RAM (>100M vectors), PQ would be implemented as a separate indexing tier:

  • Hot tier: HNSW with f16 vectors in RAM (active entities)
  • Cold tier: PQ-compressed vectors on disk with a coarse IVF index (archived entities)

This is a post-v1 optimization. The single-node target of 10M-50M vectors fits comfortably in RAM with f16.


5. Multiple Embedding Spaces

Architecture

Each entity type can define up to 4 embedding slots (per Entity Model Specification, Section "Embedding Slot Constraints"). Each slot has:

  • Its own dimensionality
  • Its own HNSW index (independent graph structure)
  • Its own quantization level
  • Its own set of HNSW parameters

Multiple Embedding Spaces

                    ┌─────────────────────────────────────────────┐
                    │              Entity Store                    │
                    │                                             │
                    │  Item "item_abc":                           │
                    │    content_embedding: [f32; 1536]           │
                    │    visual_embedding:  [f32; 512]            │
                    │    audio_embedding:   [f32; 256]            │
                    └──────────┬──────────┬──────────┬────────────┘
                               │          │          │
                    ┌──────────▼──┐  ┌────▼──────┐  ┌▼───────────┐
                    │   HNSW      │  │   HNSW    │  │   HNSW     │
                    │  "content"  │  │  "visual" │  │  "audio"   │
                    │  1536d, f16 │  │  512d, f16│  │  256d, f16 │
                    │  M=16       │  │  M=16     │  │  M=16      │
                    │  10M nodes  │  │  10M nodes│  │  10M nodes │
                    │  ~31.5 GB   │  │  ~12.3 GB │  │  ~7.6 GB   │
                    └─────────────┘  └───────────┘  └────────────┘

Slot Registry

The EmbeddingSlotRegistry maps slot names to their HNSW indexes and configuration:

/// Registry of all embedding slots across all entity types.
pub(crate) struct EmbeddingSlotRegistry {
    /// Maps (EntityKind, slot_name) -> EmbeddingSlotState
    slots: HashMap<(EntityKind, String), EmbeddingSlotState>,
}

pub(crate) struct EmbeddingSlotState {
    /// The HNSW index for this slot.
    index: Box<dyn VectorIndex>,
    /// Dimensions for this slot.
    dimensions: usize,
    /// Quantization level.
    quantization: ScalarKind,
    /// Whether this slot is database-managed.
    source: EmbeddingSource,
    /// HNSW parameters.
    params: HnswParams,
}

Cross-Space Queries

Some queries require searching across multiple embedding spaces simultaneously. For example: "find items whose visual embedding is near this image AND whose content embedding is near this text."

Cross-space queries execute as parallel independent searches with result intersection:

Cross-Space Query Execution

1. Parse query: two vector constraints
   - visual_embedding NEAR image_vec, top 200
   - content_embedding NEAR text_vec, top 200

2. Execute in parallel:
   - Thread A: visual_index.search(image_vec, 200, filter)
   - Thread B: content_index.search(text_vec, 200, filter)

3. Intersect result sets:
   - Items appearing in BOTH result sets get combined score
   - Score combination: configurable (RRF, weighted sum, min-of-ranks)

4. Return top K from intersection

When the intersection is empty: If no items appear in both result sets (common when K is small), fall back to score-weighted union: rank by alpha * visual_rank + (1-alpha) * content_rank with alpha configured per ranking profile.

Default Embedding Slots by Entity Type

Per the Entity Model Specification:

| Entity Type | Slot Name | Dimensions | Source | HNSW Index |
|---|---|---|---|---|
| Item | content | 1536 (default, configurable) | External | Yes |
| Item | visual | 512 (optional) | External | Yes, if slot defined |
| Item | audio | 256 (optional) | External | Yes, if slot defined |
| User | preference | 1536 (matches Item.content) | DatabaseManaged | Yes |
| Creator | catalog | 1536 (matches Item.content) | DatabaseManaged | Yes |

6. Embedding Lifecycle

Insert

When write_item(), write_user(), or write_creator() is called with an embedding:

Embedding Insert Path

1. Validate dimensions match slot definition.
2. L2-normalize the vector to unit length.
   norm = sqrt(sum(v[i]^2 for i in 0..d))
   v[i] = v[i] / norm
3. Store normalized f32 vector in entity store (META key with EMB:slot_name suffix).
   This is the source of truth.
4. Quantize to slot's precision level (f16, int8).
5. Insert into HNSW index: index.insert(entity_id, quantized_vector).
6. Entity is immediately searchable via ANN.

Normalization edge case: If the vector has zero norm (all zeros), insertion fails with SchemaError::ZeroNormEmbedding. A zero vector has no direction and cannot participate in cosine similarity.
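Step 2 and the zero-norm check can be sketched as below. The `SchemaError` enum here is a one-variant stand-in for the real error type, and the function signature is an assumption.

```rust
/// Stand-in for tidalDB's schema error type; only the variant used here is shown.
#[derive(Debug, PartialEq)]
enum SchemaError {
    ZeroNormEmbedding,
}

/// Scale `v` to unit length, rejecting the zero vector.
fn l2_normalize(v: &[f32]) -> Result<Vec<f32>, SchemaError> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return Err(SchemaError::ZeroNormEmbedding);
    }
    Ok(v.iter().map(|x| x / norm).collect())
}
```

This sketch does not handle NaN or infinite components; a production version would likely reject those at dimension validation time.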

Update

When update_item() (or equivalent) is called with a new embedding:

Embedding Update Path

1. Validate dimensions match slot definition.
2. L2-normalize the new vector.
3. Update entity store with new normalized vector.
4. Remove old vector from HNSW index: index.delete(entity_id).
5. Insert new vector into HNSW index: index.insert(entity_id, quantized_new_vector).

HNSW does not support in-place updates. The graph structure stores neighbor lists that depend on the vector's position. Changing a vector requires removing the old node and inserting a new one. USearch implements deletion as lazy tombstoning -- the node remains in the graph but is excluded from results. Tombstoned nodes are reclaimed during periodic index rebuilds (Section 7).

Concurrent read safety: Each USearch add/remove call is internally synchronized, so a reader never observes a partially updated graph. The delete-insert pair is not atomic as a unit, however: a query that lands between the two steps will not see the entity in ANN results. This is acceptable -- the window is microseconds, and the next query will find it. The same applies to database-managed embeddings (user preference, creator catalog), which update frequently.

Delete

When an entity is archived or deleted:

Embedding Delete Path

1. Mark entity as deleted in HNSW index: index.delete(entity_id).
   - Tombstone: node remains in graph structure for navigation
   - Excluded from search results
2. Do NOT remove from entity store immediately (archive preserves data).
3. For hard delete: remove entity store embedding key after HNSW removal.

Tombstone accumulation: Lazy deletion means tombstoned nodes consume memory and degrade graph quality over time. The index rebuild process (Section 7) reclaims tombstoned space. The rebuild threshold is configurable:

| Parameter | Default | Description |
|---|---|---|
| vector_tombstone_ratio | 0.10 | Trigger rebuild when tombstoned nodes exceed 10% of total |
| vector_rebuild_interval | 24 hours | Minimum time between automatic rebuilds |
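One way to read the two parameters together is that a rebuild fires only when the tombstone ratio is exceeded and the minimum interval has elapsed. The sketch below encodes that reading; `RebuildPolicy` and `should_rebuild` are hypothetical names, and the requirement that both conditions hold is an assumption.

```rust
use std::time::Duration;

/// Hypothetical holder for the two rebuild parameters.
struct RebuildPolicy {
    tombstone_ratio: f64,   // vector_tombstone_ratio
    min_interval: Duration, // vector_rebuild_interval
}

impl RebuildPolicy {
    /// True when tombstones exceed the ratio AND enough time has passed
    /// since the last rebuild.
    fn should_rebuild(&self, tombstoned: u64, total: u64, since_last: Duration) -> bool {
        if total == 0 {
            return false; // empty index: nothing to reclaim
        }
        let ratio = tombstoned as f64 / total as f64;
        ratio > self.tombstone_ratio && since_last >= self.min_interval
    }
}
```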

Batch Operations

For initial data load or bulk import:

use rayon::prelude::*;

// Batch insert for initial data load
let vectors: Vec<(EntityId, Vec<f32>)> = load_from_external();

// Reserve capacity upfront (required by USearch)
index.reserve(vectors.len())?;

// Parallel batch insert (USearch supports concurrent add).
// try_for_each short-circuits on a failed insert and returns an error.
vectors.par_iter().try_for_each(|(id, vec)| {
    let normalized = l2_normalize(vec);
    index.insert(*id, &normalized)
})?;

// Persist index to disk
index.save(&index_path)?;

Capacity planning: USearch requires reserve(capacity) before first insertion. tidalDB reserves 2x the expected entity count at schema definition time. If the index fills, a new index is built with 2x the current capacity and atomically swapped.


7. Index Persistence and Recovery

Persistence Modes

USearch provides three persistence modes. tidalDB uses all three at different lifecycle stages:

| Mode | Function | Description | Use Case |
|---|---|---|---|
| save(path) | Full serialization | Writes entire index (graph + vectors) to a single file. Requires O(index_size) disk I/O. | Checkpoint persistence, backup |
| load(path) | Full deserialization | Reads entire index into writable RAM. Supports add/delete. | Normal operation (writable index) |
| view(path) | Memory-mapped read-only | Zero-copy mmap of the saved index file. Instant availability, read-only. | Fast restart, recovery serving |

Persistence Strategy

Index Persistence Lifecycle

Normal Operation:
  ┌─────────────────────────────────────────────────────────┐
  │  Writable HNSW Index in RAM                             │
  │  (loaded via load() on startup)                         │
  │                                                         │
  │  Inserts, deletes, searches ── all in-memory            │
  └────────────────────────┬────────────────────────────────┘
                           │
                  Periodic save() ── coordinated with WAL checkpoint
                           │
                           ▼
  ┌─────────────────────────────────────────────────────────┐
  │  Index File on Disk                                     │
  │  data/vector/{entity_kind}_{slot_name}.usearch          │
  │                                                         │
  │  Also: entity store EMB: keys (source of truth)         │
  └─────────────────────────────────────────────────────────┘

Restart (normal):
  1. view() the latest index file ── immediate read-only serving
  2. Background: load() into writable RAM
  3. Replay WAL from last checkpoint seqno:
     - SignalEvent with embedding update → apply to writable index
     - EntityWrite with embedding → insert/update in writable index
  4. Atomic swap: replace view()'d index with writable index
  5. Resume normal operation

Restart (corrupted index file):
  1. Log warning: index file checksum mismatch or missing
  2. Scan entity store for all EMB:{slot_name} keys
  3. Bulk-rebuild HNSW index from entity embeddings
  4. save() rebuilt index
  5. Resume normal operation

Checkpoint Coordination

Index persistence is coordinated with the storage engine checkpoint (Spec 01, Section 8):

  1. The checkpoint procedure flushes signal state to the warm tier.
  2. After signal state is flushed, the vector index is saved: index.save(path).
  3. The checkpoint record includes the index save status.
  4. On recovery, the checkpoint seqno tells us how stale the index file is.

Save duration: At 10M vectors x 1536d x f16, the index file is approximately 28.6 GB (vectors) + 3 GB (graph) = ~31.5 GB, matching the memory budget in Section 4. Writing it at NVMe sequential speeds (2 GB/s) takes roughly 16-17 seconds. This is too long for the synchronous checkpoint path.

Solution: incremental persistence. tidalDB does not save the full index at every checkpoint. Instead:

| Persistence Event | Trigger | Method | Duration |
|---|---|---|---|
| Delta log | Every checkpoint (30s) | Append new inserts/deletes to a delta journal file | <10ms |
| Full save | Configurable interval (default: 6 hours) or on graceful shutdown | index.save() | ~17s for 10M vectors |
| Recovery | On startup if delta journal exists | load() full save + replay delta journal | Full load time + delta replay |

The delta journal is a simple append-only file:

Delta Journal Record Format

+--------+-----------+--------+---------------------------+
| OpType | EntityId  | SlotId | Embedding (if insert)     |
| 1 byte | 8 bytes   | 2 bytes| d * bytes_per_dim bytes   |
+--------+-----------+--------+---------------------------+

OpType: 0x01 = Insert, 0x02 = Delete
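A sketch of encoding one journal record in the layout above. The record format does not state a byte order, so little-endian is assumed here; `DeltaOp` and `encode` are hypothetical names, and the embedding is passed as already-quantized raw bytes.

```rust
/// One delta journal operation. `embedding` holds d * bytes_per_dim raw bytes.
enum DeltaOp<'a> {
    Insert { entity_id: u64, slot_id: u16, embedding: &'a [u8] },
    Delete { entity_id: u64, slot_id: u16 },
}

/// Serialize a record as: OpType (1 byte) | EntityId (8) | SlotId (2) | payload.
/// Byte order is little-endian by assumption.
fn encode(op: &DeltaOp<'_>) -> Vec<u8> {
    let mut buf = Vec::new();
    match op {
        DeltaOp::Insert { entity_id, slot_id, embedding } => {
            buf.push(0x01);
            buf.extend_from_slice(&entity_id.to_le_bytes());
            buf.extend_from_slice(&slot_id.to_le_bytes());
            buf.extend_from_slice(embedding);
        }
        DeltaOp::Delete { entity_id, slot_id } => {
            buf.push(0x02);
            buf.extend_from_slice(&entity_id.to_le_bytes());
            buf.extend_from_slice(&slot_id.to_le_bytes());
        }
    }
    buf
}
```

Because the payload length is implied by the slot's dimensions and quantization, a reader needs the slot registry to know how many bytes follow an insert header. A per-record checksum is not part of the format shown and is left out here.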

Index Size Estimation Formula

For capacity planning and monitoring:

Index file size = vector_storage + graph_storage + metadata

vector_storage = num_vectors * dimensions * bytes_per_dimension
  f32: num_vectors * 1536 * 4 = num_vectors * 6,144 bytes
  f16: num_vectors * 1536 * 2 = num_vectors * 3,072 bytes
  int8: num_vectors * 1536 * 1 = num_vectors * 1,536 bytes

graph_storage = num_vectors * ~300 bytes (at M=16, per USearch internals)

metadata = ~1 MB (dimensions, metric, parameters)

| Vectors | f16 Index Size | f32 Index Size | int8 Index Size |
|---|---|---|---|
| 1M | 3.2 GB | 6.1 GB | 1.7 GB |
| 10M | 32 GB | 61 GB | 17 GB |
| 50M | 160 GB | 305 GB | 87 GB |
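The formula translates directly into a small estimator for monitoring or capacity checks. The function name and the two constants are illustrative; the 300 bytes/node and ~1 MB figures are the approximations quoted in the formula.

```rust
/// Estimated index file size in bytes:
/// vector_storage + graph_storage + metadata.
fn index_size_bytes(num_vectors: u64, dimensions: u64, bytes_per_dim: u64) -> u64 {
    const GRAPH_BYTES_PER_NODE: u64 = 300; // approximate, at M=16
    const METADATA_BYTES: u64 = 1_000_000; // approximate
    num_vectors * dimensions * bytes_per_dim
        + num_vectors * GRAPH_BYTES_PER_NODE
        + METADATA_BYTES
}
```

For 10M vectors at 1536d f16 this returns 33,721,000,000 bytes, which is about 33.7 x 10^9 bytes or roughly 31.4 GiB, so the rounded figures in the tables depend on which gigabyte convention is used.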

Filesystem Layout

{data_dir}/
  vector/
    item_content.usearch           # Item content embedding HNSW index
    item_content.delta             # Delta journal since last full save
    item_visual.usearch            # Item visual embedding HNSW index (if defined)
    item_visual.delta
    item_audio.usearch             # Item audio embedding HNSW index (if defined)
    item_audio.delta
    user_preference.usearch        # User preference vector HNSW index
    user_preference.delta
    creator_catalog.usearch        # Creator catalog embedding HNSW index
    creator_catalog.delta

8. Hybrid Fusion with Text Retrieval

Overview

Hybrid search combines vector similarity (semantic meaning) with text relevance (lexical matching). The text retrieval system (Tantivy/BM25, see separate text retrieval specification) and the vector retrieval system produce independent candidate sets with independent scores. Fusion merges them into a single ranked list.

This section specifies the vector side of the fusion interface. The text retrieval specification covers the text side and the shared fusion orchestration.

Score Production and Normalization

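Vector scores: L2 distance is non-negative and unbounded in general. Since all tidalDB vectors are L2-normalized, it is bounded here: the maximum possible L2 distance is 2.0 (diametrically opposite vectors on the unit sphere) and the minimum is 0.0 (identical vectors). Note that an index configured with MetricKind::L2sq (as in Section 4) reports the squared distance, which lies in [0, 4] for unit vectors; that value is l2_distance^2 in the conversion below and must not be squared again.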

Conversion to similarity score:

cosine_similarity = 1 - (l2_distance^2 / 2)

Range: [-1, 1] for unit vectors
  1.0 = identical
  0.0 = orthogonal
 -1.0 = opposite

Normalized to [0, 1] for fusion:
  normalized_score = (cosine_similarity + 1) / 2

Range: [0, 1]
  1.0 = identical
  0.5 = orthogonal
  0.0 = opposite
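The two conversions compose into one small function. The names are hypothetical; the squared-distance variant is included on the assumption that the index reports squared L2, and the clamp guards against values slightly outside [0, 1] from quantization or rounding error.

```rust
/// Map a squared L2 distance between unit vectors to a [0, 1] fusion score.
fn normalized_score_from_sq(l2_distance_sq: f32) -> f32 {
    let cosine_similarity = 1.0 - l2_distance_sq / 2.0;
    ((cosine_similarity + 1.0) / 2.0).clamp(0.0, 1.0)
}

/// Same mapping, starting from a plain (non-squared) L2 distance.
fn normalized_score(l2_distance: f32) -> f32 {
    normalized_score_from_sq(l2_distance * l2_distance)
}
```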

Fusion Modes

tidalDB supports two fusion modes, configurable per ranking profile:

Reciprocal Rank Fusion (RRF)

RRF combines results by rank position, not by score. This avoids the calibration problem (BM25 scores and cosine similarities are on incomparable scales).

RRF_score(d) = 1 / (k + rank_text(d)) + 1 / (k + rank_vector(d))

where:
  k = 60 (default, from Cormack et al. 2009)
  rank_text(d) = position of document d in BM25 results (1-indexed, inf if absent)
  rank_vector(d) = position of document d in ANN results (1-indexed, inf if absent)
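As a sketch, RRF over two ranked ID lists looks like this. `rrf_fuse` is a hypothetical name; a document absent from one list simply contributes nothing for that list, which matches treating its rank as infinite.

```rust
use std::collections::HashMap;

/// Fuse two ranked lists (best first) with Reciprocal Rank Fusion.
/// Returns (id, score) pairs sorted by descending score, ties broken by id.
fn rrf_fuse(text: &[u64], vector: &[u64], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in [text, vector] {
        for (i, &id) in list.iter().enumerate() {
            // Ranks are 1-indexed, so position i has rank i + 1.
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

With text results [1, 2, 3], vector results [2, 4], and k = 60, document 2 ranks first because it is the only one present in both lists.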

When to use RRF: Default fusion mode. Robust across query types. No tuning required. Recommended as the starting point for all hybrid search profiles.

Convex Combination (Weighted Sum)

hybrid_score(d) = alpha * text_score(d) + (1 - alpha) * vector_score(d)

where:
  alpha in [0, 1], configurable per profile
  text_score: BM25 score, min-max normalized to [0, 1] within the result set
  vector_score: cosine similarity, normalized to [0, 1] as above
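A minimal sketch of the weighted sum, including the min-max step for BM25 scores. The function names are hypothetical, and mapping every score to 1.0 when all BM25 scores in the result set are equal is an assumption, since the formula above does not define that case.

```rust
/// Min-max normalize BM25 scores to [0, 1] within one result set.
/// If all scores are equal, every entry maps to 1.0 (an assumption).
fn min_max_normalize(scores: &[f32]) -> Vec<f32> {
    let min = scores.iter().copied().fold(f32::INFINITY, f32::min);
    let max = scores.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    if max <= min {
        return vec![1.0; scores.len()];
    }
    scores.iter().map(|s| (s - min) / (max - min)).collect()
}

/// Convex combination of an already-normalized text score and vector score.
fn hybrid_score(alpha: f32, text_score: f32, vector_score: f32) -> f32 {
    alpha * text_score + (1.0 - alpha) * vector_score
}
```

Because min-max normalization depends on the result set, the same document can receive different text scores in different queries; this is one reason alpha needs tuning against labeled data.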

When to use convex combination: After relevance labels exist to tune alpha. With labeled data, convex combination outperforms RRF because it uses score magnitude, not just rank. Without tuning, the alpha setting is a guess that can hurt more than it helps.

Two-Phase Execution Modes

The query planner selects the execution mode based on the ranking profile configuration:

Hybrid Search Execution Modes

Mode 1: Parallel (default for SEARCH queries)
  ┌──────────────┐     ┌──────────────┐
  │ Tantivy BM25 │     │ USearch ANN  │
  │ top-500      │     │ top-500      │
  └──────┬───────┘     └──────┬───────┘
         │                    │
         └────────┬───────────┘
                  ▼
          ┌──────────────┐
          │  Fuse (RRF)  │
          │  Deduplicate  │
          │  Top-K        │
          └──────┬───────┘
                 ▼
          Scoring pipeline

Mode 2: Vector-first (for RETRIEVE with ANN candidate generation)
  ┌──────────────┐
  │ USearch ANN  │
  │ top-500      │
  └──────┬───────┘
         │ candidate IDs
         ▼
  ┌──────────────┐
  │ Tantivy seek │
  │ score candidates │
  └──────┬───────┘
         │ BM25 scores for candidates
         ▼
  ┌──────────────┐
  │ Fuse scores  │
  │ Top-K        │
  └──────────────┘

Mode 3: Text-first (for SEARCH with text-dominant queries)
  ┌──────────────┐
  │ Tantivy BM25 │
  │ top-500      │
  └──────┬───────┘
         │ candidate IDs
         ▼
  ┌────────────────────┐
  │ Load embeddings    │
  │ Compute vector dist│
  └──────┬─────────────┘
         │ vector scores for candidates
         ▼
  ┌──────────────┐
  │ Fuse scores  │
  │ Top-K        │
  └──────────────┘

Mode selection heuristic:

| Condition | Mode | Rationale |
|---|---|---|
| SEARCH query with both text and vector | Parallel | Both retrieval paths are fast; parallel minimizes latency |
| RETRIEVE with Candidate::Ann | Vector-first | ANN is the primary candidate generator; text scores are secondary |
| SEARCH with text only (no vector provided) | Text-only | No vector to search with |
| SEARCH with vector only (empty text query) | Vector-only | No text to match with |
| RETRIEVE with Candidate::Hybrid | Parallel | Profile explicitly requests hybrid |

Profile Configuration for Fusion

From the API specification, ranking profiles configure fusion:

Candidate::Hybrid {
    text_weight: 0.6,
    vector_weight: 0.4,
    fusion: Fusion::Rrf { k: 60 },  // or Fusion::Convex { alpha: 0.6 }
}

The text_weight and vector_weight in the Hybrid candidate spec control candidate set sizing, not score weights:

  • text_weight: 0.6 means retrieve ceil(top_k * 0.6 / min(0.6, 0.4)) = ceil(top_k * 1.5) from BM25
  • vector_weight: 0.4 means retrieve ceil(top_k * 1.0) from ANN
  • The fusion method (RRF or Convex) determines how scores are combined

In practice, both legs retrieve approximately the same number of candidates (500 each for top_k=200) and RRF handles the weighting implicitly through rank positions.


9. Adaptive Query Planning

Decision Tree

The adaptive query planner evaluates filter selectivity and index metadata to select the optimal ANN strategy for each query. The decision is made before the search begins and logged for observability.

Adaptive Query Planner Decision Tree

                    ┌─────────────────────┐
                    │  Estimate filter     │
                    │  selectivity S       │
                    └──────────┬──────────┘
                               │
                    ┌──────────▼──────────┐
                    │  S = 100%?          │
                    │  (no filter)        │
                    └───┬─────────────┬───┘
                    yes │             │ no
                        ▼             ▼
              ┌─────────────┐  ┌──────────────────┐
              │ Standard    │  │ S > 20%?         │
              │ HNSW search │  │ (high selectivity)│
              │ (no filter) │  └──┬────────────┬──┘
              └─────────────┘  yes│            │no
                                  ▼            ▼
                        ┌─────────────┐  ┌──────────────┐
                        │ In-graph    │  │ S > 1%?      │
                        │ filter      │  │ (danger zone) │
                        │ (predicate  │  └──┬────────┬──┘
                        │  callback)  │  yes│        │no
                        └─────────────┘     ▼        ▼
                                 ┌──────────────┐  ┌──────────────┐
                                 │ Pre-filter + │  │ Pre-filter + │
                                 │ ACORN-1      │  │ brute-force  │
                                 │ (widened     │  │ (exact, fast │
                                 │  ef_search)  │  │  on small    │
                                 └──────────────┘  │  sets)       │
                                                   └──────────────┘
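
The decision tree can be sketched as a pure function of estimated selectivity. This is an illustration under assumptions: AnnStrategy and its InGraphFilter variant appear elsewhere in this specification, but the other variant names, the PlannerThresholds struct, and the exact boundary handling (strict greater-than at 20% and 1%) are hypothetical.

```rust
/// Strategy chosen by the planner. Variant names other than
/// InGraphFilter are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AnnStrategy {
    Standard,
    InGraphFilter,
    PreFilterAcorn,
    PreFilterBruteForce,
}

/// Hypothetical container for the two selectivity thresholds.
pub struct PlannerThresholds {
    pub in_graph_min_selectivity: f64,
    pub brute_force_max_selectivity: f64,
}

impl Default for PlannerThresholds {
    fn default() -> Self {
        Self {
            in_graph_min_selectivity: 0.20,
            brute_force_max_selectivity: 0.01,
        }
    }
}

/// Map estimated selectivity S (fraction of vectors passing the filter)
/// to a strategy, following the tree top to bottom.
pub fn select_strategy(selectivity: f64, t: &PlannerThresholds) -> AnnStrategy {
    if selectivity >= 1.0 {
        AnnStrategy::Standard // no filter
    } else if selectivity > t.in_graph_min_selectivity {
        AnnStrategy::InGraphFilter // predicate callback during traversal
    } else if selectivity > t.brute_force_max_selectivity {
        AnnStrategy::PreFilterAcorn // widened ef_search
    } else {
        AnnStrategy::PreFilterBruteForce // exact scan over the small filtered set
    }
}
```

Keeping the decision a pure function of selectivity and thresholds makes it straightforward to log and to unit test in isolation.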

Threshold Reference

| Selectivity Range | Strategy | ef_search Multiplier | Expected Recall@100 | Expected Latency (10M, 1536d) |
|---|---|---|---|---|
| 100% (no filter) | Standard HNSW | 1x (200) | >97% | <10ms |
| 20-100% | In-graph predicate filter | 1x (200) | >95% | <15ms |
| 1-20% | Pre-filter + widened HNSW (ACORN-1) | 2-3x (400-600) | >90% | <25ms |
| <1% | Pre-filter + brute-force | N/A | 100% (exact) | <10ms (small filtered set) |

Runtime Statistics and Threshold Tuning

The query planner collects per-query statistics to validate and adjust thresholds:

/// Statistics collected per ANN query for planner feedback.
pub(crate) struct AnnQueryStats {
    /// Estimated selectivity before execution.
    estimated_selectivity: f64,
    /// Actual selectivity (results matching filter / total evaluated).
    actual_selectivity: f64,
    /// Strategy selected by planner.
    strategy: AnnStrategy,
    /// Number of results returned.
    results_returned: usize,
    /// Requested K.
    requested_k: usize,
    /// Wall clock time for the ANN query.
    latency: Duration,
    /// Number of distance computations performed.
    distance_computations: u64,
}

Threshold adjustment: If a query using in-graph filtering at estimated selectivity 25% returns fewer than K results (recall failure), the planner raises in_graph_min_selectivity for subsequent queries with similar filter patterns, so that those queries fall back to the pre-filter + ACORN-1 strategy. Conversely, if brute-force queries at 2% selectivity take longer than 20ms, the planner lowers brute_force_max_selectivity, so that brute-force is reserved for smaller filtered sets. Adjustments are bounded to prevent oscillation:

| Parameter | Default | Min | Max |
|---|---|---|---|
| in_graph_min_selectivity | 0.20 | 0.05 | 0.50 |
| brute_force_max_selectivity | 0.01 | 0.001 | 0.05 |
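
One way to enforce these bounds is to clamp every adjustment. The sketch below is hypothetical (the BoundedThreshold type and its methods are not part of the specification) and shows only the bounding, not the policy that decides when or by how much to adjust.

```rust
/// A planner threshold that can be nudged but never leaves [min, max].
/// Hypothetical type for illustration.
#[derive(Debug, Clone, Copy)]
pub struct BoundedThreshold {
    value: f64,
    min: f64,
    max: f64,
}

impl BoundedThreshold {
    /// Create a threshold; the default is clamped into the allowed range.
    pub fn new(default: f64, min: f64, max: f64) -> Self {
        assert!(min <= max, "min must not exceed max");
        Self { value: default.clamp(min, max), min, max }
    }

    /// Apply a signed adjustment and return the new, bounded value.
    pub fn adjust(&mut self, delta: f64) -> f64 {
        self.value = (self.value + delta).clamp(self.min, self.max);
        self.value
    }

    pub fn value(&self) -> f64 {
        self.value
    }
}
```

For example, BoundedThreshold::new(0.20, 0.05, 0.50) models in_graph_min_selectivity: repeated upward adjustments saturate at 0.50 and repeated downward adjustments saturate at 0.05.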

Query Plan Logging

Every ANN query logs its plan at DEBUG level for observability:

[DEBUG] ANN query plan: strategy=InGraphFilter, estimated_selectivity=0.35,
        ef_search=200, K=100, filters=[category=jazz, format=video],
        index=item_content (10,234,567 vectors)

Failed queries (fewer than K results returned) log at WARN level:

[WARN] ANN query underflow: strategy=InGraphFilter, requested_k=100,
       returned=47, estimated_selectivity=0.12, actual_selectivity=0.03,
       recommendation=raise_in_graph_threshold

10. User Preference Vector

Overview

The user preference vector is a database-managed embedding that represents a user's taste profile in the same vector space as item content embeddings. It is the primary query vector for Candidate::Ann { query_vector: VectorSource::UserPreference } -- the "For You" feed.

Unlike external embeddings (provided by the application), the preference vector is computed and maintained entirely by the database. The application never writes it directly.

Update Algorithm

On every signal event involving a user and an item, the preference vector is updated:

Preference Vector Update

Given:
  pref         = current user preference vector (1536d, L2-normalized)
  item_emb     = item's content embedding (1536d, L2-normalized)
  signal_type  = type of signal (view, like, skip, hide, completion, ...)
  signal_weight = weight of the signal event (0.0 - 1.0)
  lr           = learning rate for this signal type

Positive signals (view, like, completion, share, save):
  delta = lr * signal_weight * (item_emb - pref)
  pref_new = pref + delta

Negative signals (skip, hide, not_interested, block):
  delta = lr * signal_weight * (item_emb - pref)
  pref_new = pref - delta

Re-normalize:
  pref_new = pref_new / ||pref_new||
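
The update rule above can be sketched as an in-place function. The names update_preference and SignalPolarity are hypothetical, the mapping from signal type to polarity is left to the caller, and leaving a zero-norm result unchanged is an added guard that the text does not specify.

```rust
/// Whether a signal pulls the preference toward or away from the item.
/// Hypothetical enum for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SignalPolarity {
    Positive,
    Negative,
}

/// Apply one signal to the preference vector in place, then re-normalize.
pub fn update_preference(
    pref: &mut [f32],
    item_emb: &[f32],
    lr: f32,
    signal_weight: f32,
    polarity: SignalPolarity,
) {
    assert_eq!(pref.len(), item_emb.len(), "dimension mismatch");
    let sign = match polarity {
        SignalPolarity::Positive => 1.0,
        SignalPolarity::Negative => -1.0,
    };
    for (p, &e) in pref.iter_mut().zip(item_emb) {
        // delta = lr * signal_weight * (item_emb - pref), per component.
        let delta = lr * signal_weight * (e - *p);
        // Positive signals add delta; negative signals subtract it.
        *p += sign * delta;
    }
    // Re-normalize to unit length. Assumption: skip if the norm is zero.
    let norm = pref.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for p in pref.iter_mut() {
            *p /= norm;
        }
    }
}
```

With pref = [1, 0], item_emb = [0, 1], lr = 0.5, and signal_weight = 1.0, a positive signal moves the vector to [0.5, 0.5] before normalization, while a negative signal moves it to [1.5, -0.5].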

Learning Rate Configuration

Learning rates are configured per signal type in the ranking profile:

| Signal Type | Default Learning Rate | Rationale |
|---|---|---|
| view | 0.005 | Weak positive. Many views are passive (autoplay). |
| like | 0.02 | Moderate positive. Deliberate user action. |
| completion (>80%) | 0.03 | Strong positive. User consumed the full content. |
| share | 0.04 | Strongest positive. User endorsed publicly. |
| save | 0.015 | Moderate positive. Intent to return. |
| skip (<3s) | 0.01 | Weak negative. May be accidental or contextual. |
| hide | 0.05 | Strong negative. Deliberate rejection. |
| not_interested | 0.03 | Moderate negative. Topic-level rejection. |

Effective learning rate decay: The learning rate decays with user maturity (number of signal events) to prevent wild swings in established profiles:

effective_lr = base_lr * min(1.0, maturity_cap / user_signal_count)

where:
  maturity_cap = 1000 (configurable)
  user_signal_count = total signals written for this user

Effect:
  New user (10 signals):   effective_lr = base_lr * 1.0 (full learning)
  Maturing user (500):     effective_lr = base_lr * 1.0 (still full)
  Mature user (5000):      effective_lr = base_lr * 0.2 (stabilized)
  Very mature (50000):     effective_lr = base_lr * 0.02 (very stable)
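
The decay formula translates directly into code. In this sketch the function name effective_lr is hypothetical, and treating a signal count of zero as full learning is an assumption added to avoid division by zero.

```rust
/// effective_lr = base_lr * min(1.0, maturity_cap / user_signal_count).
/// Hypothetical helper for illustration.
pub fn effective_lr(base_lr: f32, user_signal_count: u32, maturity_cap: u32) -> f32 {
    // Assumption: a user with no signals yet learns at the full base rate.
    if user_signal_count == 0 {
        return base_lr;
    }
    let ratio = maturity_cap as f32 / user_signal_count as f32;
    base_lr * ratio.min(1.0)
}
```

With base_lr = 0.02 (like) and maturity_cap = 1000, a user with 5000 signals learns at 0.004 and a user with 50000 signals at 0.0004.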

Momentum (EWMA Smoothing)

To prevent oscillation from noisy signals (a jazz fan who watches one cooking video should not shift their preference vector dramatically), updates use exponentially weighted moving average (EWMA) smoothing:

Momentum Update

momentum_state = alpha * delta + (1 - alpha) * momentum_state_prev
pref_new = pref + momentum_state

where:
  alpha = 0.3 (configurable)
  delta = lr * signal_weight * direction * (item_emb - pref)

The momentum state is stored per-user in the signal ledger as 8 bytes of compact scalar state, not as a full 1536d vector. A full momentum vector would require 1536 * 4 bytes = 6 KB per user -- at 10M users, roughly 60 GB. Instead, tidalDB maintains a scalar momentum magnitude, stored next to the signal count used for learning rate decay:

/// Per-user preference update state. Stored in signal ledger.
pub(crate) struct PreferenceUpdateState {
    /// Scalar momentum magnitude (EWMA of recent update magnitudes).
    momentum_magnitude: f32,
    /// Number of signals processed (for learning rate decay).
    signal_count: u32,
}
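
A sketch of how the scalar state might be advanced on each signal is shown below. The struct mirrors the two fields above but is a standalone copy with public fields for illustration. The observe method is hypothetical, and how the smoothed magnitude is then applied to the preference update is not specified here, so the sketch covers only the EWMA step.

```rust
/// Standalone copy of the per-user state, for illustration only.
#[derive(Debug, Clone, Copy, Default)]
pub struct PreferenceUpdateState {
    /// EWMA of recent update magnitudes.
    pub momentum_magnitude: f32,
    /// Number of signals processed.
    pub signal_count: u32,
}

impl PreferenceUpdateState {
    /// Fold the magnitude of one update into the EWMA and count the signal.
    /// `alpha` is the smoothing factor (0.3 by default in this spec).
    pub fn observe(&mut self, delta_magnitude: f32, alpha: f32) {
        self.momentum_magnitude =
            alpha * delta_magnitude + (1.0 - alpha) * self.momentum_magnitude;
        // Saturate instead of wrapping for extremely active users.
        self.signal_count = self.signal_count.saturating_add(1);
    }
}
```

Starting from zero with alpha = 0.3, two updates of magnitude 1.0 yield 0.3 and then 0.51, so a single outlier moves the state by at most 30% of its magnitude.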

Cold Start Initialization

When a new user is created with no embedding:

Cold Start Strategy

1. If user has explicit_interests (from signup):
   a. Look up representative items for each interest (e.g., top-3 items tagged "jazz")
   b. Average their content embeddings (weighted equally)
   c. L2-normalize the result
   d. Use as initial preference vector

2. If user has no explicit_interests:
   a. Use population centroid: average of all item content embeddings
   b. L2-normalize
   c. This is a "knows nothing" starting point

3. Alternative (if cohort data available):
   a. Use cohort centroid: average preference vector of users in the same
      demographic cohort (region, age_range, language)
   b. Better than population centroid when cohort is meaningful
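
All three strategies reduce to averaging a set of embeddings and L2-normalizing the result. The sketch below is hypothetical (normalized_centroid is not a name from this specification). It sums rather than averages, which gives the same direction once normalized, and it returns None for an empty input, mismatched dimensions, or a zero-length sum.

```rust
/// L2-normalized centroid of equally weighted embeddings.
/// Hypothetical helper for illustration.
pub fn normalized_centroid(embeddings: &[Vec<f32>]) -> Option<Vec<f32>> {
    let dims = embeddings.first()?.len();
    let mut sum = vec![0.0_f32; dims];
    for e in embeddings {
        if e.len() != dims {
            return None; // mismatched dimensions
        }
        for (s, &x) in sum.iter_mut().zip(e) {
            *s += x;
        }
    }
    // Dividing by the count is unnecessary: normalization removes the scale.
    let norm = sum.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return None; // embeddings cancel out; no meaningful direction
    }
    Some(sum.into_iter().map(|x| x / norm).collect())
}
```

The same function serves the explicit-interest, population, and cohort cases; only the input set differs.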

Cold start duration: After approximately 20 signal events (empirical threshold from recommendation systems literature; Netflix, Spotify, and YouTube all converge on ~20 interactions for reasonable personalization), the preference vector becomes user-specific. Before this threshold, the ranking profile should weight exploration higher and preference similarity lower. This is configured via the profile's exploration parameter.

Preference Vector in HNSW Index

The user preference vector is indexed in its own HNSW graph (user_preference.usearch). This enables:

  • Cohort queries: "Find users with similar taste" for collaborative filtering.
  • User clustering: Background computation can cluster user preference vectors to identify taste segments.
  • User-to-user similarity: For social recommendations ("users like you also watch...").

Update frequency in HNSW: The preference vector changes on every signal event. Updating the HNSW index on every change would be prohibitively expensive (delete + insert per signal). Instead:

  1. In-memory: The latest preference vector is always in the hot tier EntitySignalState (or loaded on demand).
  2. HNSW index: Updated periodically (every N signals or every T minutes, whichever comes first).
  3. Query-time override: When a RETRIEVE query uses VectorSource::UserPreference, it reads the preference vector directly from the hot tier, not from the HNSW index. The HNSW index of user preference vectors is used only for user-to-user similarity queries.

| Parameter | Default | Description |
|---|---|---|
| pref_hnsw_update_interval | 100 signals or 15 minutes | How often the user's HNSW node is updated |
| pref_learning_rate_cap | 1000 | Signal count at which learning rate begins to decay |
| pref_momentum_alpha | 0.3 | EWMA smoothing factor for momentum |
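
The "every N signals or every T minutes, whichever comes first" rule can be sketched as a small policy type. PrefHnswRefreshPolicy and should_refresh are hypothetical names, and the defaults mirror pref_hnsw_update_interval.

```rust
use std::time::Duration;

/// When to push a user's latest preference vector into the HNSW index.
/// Hypothetical type for illustration.
pub struct PrefHnswRefreshPolicy {
    pub max_signals: u32,
    pub max_age: Duration,
}

impl Default for PrefHnswRefreshPolicy {
    fn default() -> Self {
        Self {
            max_signals: 100,
            max_age: Duration::from_secs(15 * 60),
        }
    }
}

impl PrefHnswRefreshPolicy {
    /// True once either limit is reached, whichever comes first.
    pub fn should_refresh(&self, signals_since_refresh: u32, since_refresh: Duration) -> bool {
        signals_since_refresh >= self.max_signals || since_refresh >= self.max_age
    }
}
```

Because RETRIEVE queries read the preference vector from the hot tier, a stale HNSW node affects only user-to-user similarity queries.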

Full Recomputation

The preference vector accumulates drift from incremental updates (floating-point rounding, ordering effects from concurrent updates). A daily background job recomputes each user's preference vector from scratch:

Full Recomputation (daily, per user)

1. Load user's signal history (last 90 days or configurable window)
2. For each signal event (chronological order):
   a. Load item's content embedding
   b. Apply update formula with original signal weight and learning rate
3. L2-normalize final result
4. Replace current preference vector
5. Update HNSW index

This is expensive (90 days * ~50 signals/day * 1536d vector load per signal per user) but runs as a low-priority background task. At 10M users, processing 1000 users/second, full recomputation takes ~2.8 hours.


11. Trait Abstraction

VectorIndex Trait

All vector search operations go through this trait. No module outside storage/vector/ references USearch types.

use std::path::Path;

/// A unique identifier for an entity in the vector index.
/// Corresponds to the u64 hash of the application-provided entity ID.
pub type VectorId = u64;

/// A scored search result from the vector index.
#[derive(Debug, Clone)]
pub struct VectorSearchResult {
    pub id: VectorId,
    /// L2 distance from query vector. Lower = more similar.
    pub distance: f32,
}

/// Configuration for HNSW index construction.
#[derive(Debug, Clone)]
pub struct VectorIndexConfig {
    /// Number of dimensions per vector.
    pub dimensions: usize,
    /// Distance metric.
    pub metric: DistanceMetric,
    /// Quantization level.
    pub quantization: QuantizationLevel,
    /// Maximum connections per node per layer.
    pub connectivity: usize,
    /// Beam width during index construction.
    pub ef_construction: usize,
    /// Default beam width during search (overridable per query).
    pub ef_search: usize,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DistanceMetric {
    /// L2 squared distance. Default for cosine over normalized vectors.
    L2,
    /// Inner product. For MIPS workloads (with XBOX transformation).
    InnerProduct,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum QuantizationLevel {
    F32,
    F16,
    Int8,
}

/// The vector index trait. All ANN operations go through this interface.
///
/// Implementations must be `Send + Sync` for concurrent search + insert.
pub trait VectorIndex: Send + Sync {
    /// Insert a vector into the index. The vector is L2-normalized by the caller.
    ///
    /// If a vector with this ID already exists, it is replaced (delete + insert).
    ///
    /// # Errors
    /// Returns `VectorError::CapacityExceeded` if the index is full and cannot
    /// be resized. Returns `VectorError::DimensionMismatch` if the vector length
    /// does not match the index dimensions.
    fn insert(&self, id: VectorId, embedding: &[f32]) -> Result<(), VectorError>;

    /// Search for the K nearest neighbors to the query vector.
    ///
    /// Results are ordered by ascending distance (most similar first).
    ///
    /// # Arguments
    /// * `query` - The query vector. Must be L2-normalized.
    /// * `k` - Number of results to return.
    /// * `ef_search` - Beam width override. If 0, uses the index default.
    fn search(
        &self,
        query: &[f32],
        k: usize,
        ef_search: usize,
    ) -> Result<Vec<VectorSearchResult>, VectorError>;

    /// Search for the K nearest neighbors that satisfy a filter predicate.
    ///
    /// The predicate is evaluated during graph traversal (in-graph filtering).
    /// Nodes failing the predicate are used for navigation but excluded from results.
    ///
    /// # Arguments
    /// * `query` - The query vector. Must be L2-normalized.
    /// * `k` - Number of results to return.
    /// * `ef_search` - Beam width override. If 0, uses the index default.
    /// * `filter` - Predicate evaluated per candidate node. Return `true` to include.
    fn filtered_search(
        &self,
        query: &[f32],
        k: usize,
        ef_search: usize,
        filter: &dyn Fn(VectorId) -> bool,
    ) -> Result<Vec<VectorSearchResult>, VectorError>;

    /// Remove a vector from the index (lazy tombstone).
    ///
    /// The node remains in the graph for navigation but is excluded from results.
    /// Tombstoned space is reclaimed on rebuild.
    ///
    /// # Errors
    /// Returns `VectorError::NotFound` if the ID is not in the index.
    fn delete(&self, id: VectorId) -> Result<(), VectorError>;

    /// Reserve capacity for at least `additional` more vectors.
    ///
    /// Must be called before inserts if the index is at capacity.
    fn reserve(&self, additional: usize) -> Result<(), VectorError>;

    /// Persist the index to disk.
    fn save(&self, path: &Path) -> Result<(), VectorError>;

    /// Load an index from disk into writable memory.
    fn load(path: &Path, config: &VectorIndexConfig) -> Result<Self, VectorError>
    where
        Self: Sized;

    /// Memory-map an index from disk for read-only access.
    fn view(path: &Path) -> Result<Self, VectorError>
    where
        Self: Sized;

    /// Number of vectors in the index (including tombstoned).
    fn len(&self) -> usize;

    /// Number of live (non-tombstoned) vectors.
    fn len_live(&self) -> usize;

    /// Whether the index is empty.
    fn is_empty(&self) -> bool {
        self.len_live() == 0
    }

    /// Ratio of tombstoned vectors to total vectors.
    fn tombstone_ratio(&self) -> f64 {
        if self.len() == 0 {
            0.0
        } else {
            (self.len() - self.len_live()) as f64 / self.len() as f64
        }
    }
}

/// Errors from vector index operations.
#[derive(Debug)]
pub enum VectorError {
    /// Vector dimensions do not match index configuration.
    DimensionMismatch { expected: usize, got: usize },
    /// Index is at capacity and cannot accept more vectors.
    CapacityExceeded { capacity: usize },
    /// Vector ID not found in the index.
    NotFound { id: VectorId },
    /// I/O error during persistence.
    Io(std::io::Error),
    /// Index file is corrupted or incompatible.
    CorruptedIndex(String),
    /// USearch or backend-specific error.
    Backend(String),
}

Implementations

UsearchIndex (Production)

The production implementation wrapping USearch via its Rust crate (usearch, Apache-2.0, C++ FFI via cxx).

pub struct UsearchIndex {
    inner: usearch::Index,
    config: VectorIndexConfig,
}

impl VectorIndex for UsearchIndex {
    // Delegates to usearch::Index methods.
    // insert() calls inner.add(key, &vector).
    // search() calls inner.search(&query, k).
    // filtered_search() calls inner.filtered_search(&query, k, |key| filter(key)).
    // delete() calls inner.remove(key).
    // save() calls inner.save(path).
    // load() calls usearch::Index::load(path) with options.
    // view() calls usearch::Index::view(path).
}

BruteForceIndex (Correctness Verification)

An exact nearest-neighbor implementation using linear scan. Used for:

  1. Correctness testing: Compare HNSW recall against exact results.
  2. Small datasets: When the index has fewer than 10,000 vectors, brute-force is faster than HNSW.
  3. Pre-filter fallback: The adaptive query planner uses brute-force for very selective filters.
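
use std::collections::HashMap;
use std::sync::RwLock;

pub struct BruteForceIndex {
    vectors: RwLock<HashMap<VectorId, Vec<f32>>>,
    config: VectorIndexConfig,
}

impl VectorIndex for BruteForceIndex {
    fn search(&self, query: &[f32], k: usize, _ef: usize)
        -> Result<Vec<VectorSearchResult>, VectorError>
    {
        if query.len() != self.config.dimensions {
            return Err(VectorError::DimensionMismatch {
                expected: self.config.dimensions,
                got: query.len(),
            });
        }
        // A poisoned lock becomes a backend error rather than a panic
        // (clippy::unwrap_used is denied in this repository).
        let vectors = self.vectors.read()
            .map_err(|e| VectorError::Backend(e.to_string()))?;
        // Exact scan: squared L2 distance from the query to every stored vector.
        let mut distances: Vec<(VectorId, f32)> = vectors
            .iter()
            .map(|(&id, v)| (id, l2_distance_sq(query, v)))
            .collect();
        // total_cmp is a total order over f32, so a NaN distance cannot panic the sort.
        distances.sort_by(|a, b| a.1.total_cmp(&b.1));
        Ok(distances.into_iter().take(k).map(|(id, d)| {
            VectorSearchResult { id, distance: d }
        }).collect())
    }
    // ... other methods similarly straightforward
}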
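
For the correctness-testing use case, recall@K is the fraction of the exact top-K that the approximate search also returned. The helper below is a hypothetical sketch (recall_at_k is not defined in this specification) operating on ID lists ordered best first. Returning 1.0 when the exact list is empty is an assumption.

```rust
use std::collections::HashSet;

/// Recall@k of an approximate result list against the exact result list.
/// Both slices are ordered most-similar first. Hypothetical helper.
pub fn recall_at_k(approx: &[u64], exact: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    // Assumption: with no ground truth there is nothing to miss.
    if truth.is_empty() {
        return 1.0;
    }
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / truth.len() as f64
}
```

A nightly recall benchmark would average this value over many queries, taking `exact` from BruteForceIndex and `approx` from the HNSW index.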

MockVectorIndex (Testing)

A configurable mock for unit tests that returns predetermined results or records calls for verification.

pub struct MockVectorIndex {
    /// Predetermined results to return from search calls.
    search_results: RwLock<Vec<Vec<VectorSearchResult>>>,
    /// Record of all insert/delete/search calls.
    call_log: RwLock<Vec<VectorIndexCall>>,
    config: VectorIndexConfig,
}

12. Performance Targets

Latency Targets

All targets measured at 10M vectors, 1536 dimensions, f16 quantization, M=16, on a single machine with NVMe SSD.

| Operation | Target | Conditions |
|---|---|---|
| ANN search (unfiltered) | <10ms p99 | K=100, ef_search=200 |
| ANN search (filtered, >20% selectivity) | <15ms p99 | K=100, ef_search=200, in-graph predicate |
| ANN search (filtered, 1-20% selectivity) | <25ms p99 | K=100, ef_search=400-600, ACORN-1 widened |
| ANN search (filtered, <1% selectivity) | <10ms p99 | K=100, brute-force over filtered set |
| Vector insert | <1ms p99 | Single vector, index not at capacity |
| Vector delete (tombstone) | <100us p99 | Lazy tombstone, no graph restructuring |
| Batch insert | <50ms per 1000 vectors | Parallel insertion, pre-reserved capacity |
| Index load (from disk) | <30s | 10M vectors, f16, NVMe SSD |
| Index view (mmap) | <1s | Immediate read-only availability |
| Preference vector update | <50us | Single update, hot-tier entity |

Recall Targets

| Configuration | Recall@100 Target | Measurement |
|---|---|---|
| Unfiltered, f32 | >97% | vs brute-force exact search |
| Unfiltered, f16 | >96% | vs brute-force exact search |
| Unfiltered, int8 | >93% | vs brute-force exact search |
| Filtered (>20% selectivity) | >95% | vs filtered brute-force |
| Filtered (1-20% selectivity) | >90% | vs filtered brute-force |
| Filtered (<1% selectivity) | 100% | exact (brute-force strategy) |

Throughput Targets

| Operation | Target QPS | Conditions |
|---|---|---|
| Unfiltered search | >10,000 | K=100, ef_search=200, concurrent readers |
| Filtered search | >5,000 | K=100, moderate selectivity |
| Mixed read/write | >8,000 search + 1,000 insert/sec | Concurrent operations |

Memory Budget

| Component | 10M Vectors (f16, 1536d) | Notes |
|---|---|---|
| Item content HNSW | ~31.5 GB | Vectors (28.6 GB) + graph (~3 GB) |
| Item visual HNSW (optional, 512d) | ~5.8 GB | If visual slot is defined |
| Item audio HNSW (optional, 256d) | ~3.2 GB | If audio slot is defined |
| User preference HNSW | ~31.5 GB at 10M users | Same dimensionality as item content |
| Creator catalog HNSW | ~0.3 GB at 100K creators | Same dimensionality, far fewer entities |
| Delta journals | <100 MB | Small, append-only |
| Minimum (content only) | ~31.5 GB | Single embedding slot |
| Typical (content + preference) | ~63 GB | Two 1536d indexes |

For a 64 GB machine with items only (no visual/audio slots), the content HNSW index at f16 leaves ~32 GB for entity store, signal ledger, OS page cache, and application overhead. If both item content and user preference indexes are needed at 10M scale, a 128 GB machine is recommended.
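
The raw vector payload behind these figures is count * dimensions * bytes per scalar. The sketch below reproduces the 28.6 figure for item content, which corresponds to binary gigabytes (GiB). Both function names are hypothetical, and the HNSW graph overhead (~3 GB in the table) is not modeled.

```rust
/// Bytes needed for the raw vector payload, excluding graph overhead.
/// Hypothetical helper for illustration.
pub fn vector_bytes(count: u64, dimensions: u64, bytes_per_scalar: u64) -> u64 {
    count * dimensions * bytes_per_scalar
}

/// Convert bytes to binary gigabytes (GiB, 2^30 bytes).
pub fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}
```

For 10M vectors at 1536 dimensions and 2 bytes per f16 scalar, vector_bytes returns 30,720,000,000 bytes, or about 28.6 GiB. The 100K-creator catalog comes to 307,200,000 bytes, or about 0.29 GiB.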

Benchmark Definitions

These benchmarks must be tracked from day one using criterion:

// bench_ann_search_unfiltered: K=100, ef=200, 10M random f16 vectors, 1536d
// bench_ann_search_filtered_20pct: same + 20% selectivity keyword filter
// bench_ann_search_filtered_5pct: same + 5% selectivity compound filter
// bench_ann_search_filtered_half_pct: same + 0.5% selectivity (brute-force)
// bench_ann_insert_single: single vector insert, pre-reserved capacity
// bench_ann_insert_batch_1000: 1000 vector batch insert
// bench_ann_delete_single: single tombstone deletion
// bench_preference_update: update user preference vector on signal event
// bench_recall_at_100: measure recall@100 vs brute-force (nightly, not CI)
// bench_hybrid_fusion_rrf: parallel text+vector search with RRF fusion

Regressions in these benchmarks are treated as bugs.


13. Invariants and Correctness Guarantees

These invariants must be verified by property tests and crash recovery tests.

| # | Invariant | Test Strategy |
|---|---|---|
| 1 | A vector inserted via insert() is retrievable via search() immediately (within the same thread). | Property test: insert N vectors, search for each, verify present in results. |
| 2 | A vector removed via delete() never appears in search() or filtered_search() results. | Property test: insert, delete, search, verify absent. Concurrent variant: delete while searching. |
| 3 | filtered_search returns only results for which filter(id) == true. | Property test: random filter predicates, verify all results satisfy predicate. Compare count against brute-force filtered search. |
| 4 | All stored vectors are L2-normalized. | |
| 5 | Recall@100 exceeds the configured minimum (95% for standard, 90% for filtered) measured against brute-force. | Nightly benchmark: 100K random vectors, 1000 random queries, compute mean recall. Fail if below threshold. |
| 6 | Index save() + load() produces an index that returns identical results to the pre-save index for the same queries. | Property test: build index, search, save, load, search again, compare results. |
| 7 | Index save() + view() produces an index that returns identical results (read-only). | Same as above but with view(). |
| 8 | After crash recovery (rebuild from entity store), the reconstructed index achieves the same recall as the original. | Crash test: build index, simulate crash (delete index file), rebuild from entity store, measure recall. |
| 9 | The preference vector update is deterministic: the same sequence of signals on the same initial vector produces the same result regardless of concurrency. | Property test: generate random signal sequences, apply sequentially, verify result matches. |
| 10 | Cross-space queries (multi-index search) return only entities present in all searched indexes. Entities missing from any index are excluded, not scored as zero. | Property test: insert overlapping entity sets into two indexes, cross-space search, verify intersection semantics. |

References

  • ANN Research for tidalDB -- USearch evaluation, ACORN analysis, filtered ANN strategies, memory analysis, quantization comparison
  • Storage Engine Specification -- WAL, checkpoint, key encoding, crash recovery
  • Entity Model Specification -- Embedding slots, normalization, entity lifecycle
  • Signal System Specification -- Signal write path (triggers preference vector update)
  • Tantivy Research -- Text retrieval, BM25 scoring, hybrid fusion with RRF
  • VISION.md -- Retrieval modes, query surface, design principles
  • API.md -- SEARCH operation, RETRIEVE with ANN candidates, VectorSource
  • USE_CASES.md -- UC-01 (For You), UC-02 (Search), UC-05 (Related), UC-11 (Visual/Semantic)
  • CODING_GUIDELINES.md -- USearch as HNSW engine, f16 default, adaptive filtered search, trait abstraction
  • Malkov & Yashunin, "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs" (IEEE TPAMI, 2018) -- HNSW algorithm
  • Patel et al., "ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data" (SIGMOD, 2024) -- Filtered ANN with subgraph expansion
  • Cormack, Clarke & Buettcher, "Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods" (SIGIR, 2009) -- RRF fusion
  • Pal et al., "PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest" (KDD, 2020) -- Multi-vector user preference modeling
  • Cormode et al., "Forward Decay: A Practical Time Decay Model for Streaming Systems" (ICDE, 2009) -- Decay formulas applied to preference vector learning rate