tidaldb/docs/research/ann_for_tidaldb.md

# ANN for tidalDB: USearch with adaptive filtered search

**Use USearch (Unum Cloud) as tidalDB's vector index engine, with an adaptive query planner layered on top.** USearch is the only actively maintained Rust-available ANN library that supports predicate-based filtering during HNSW graph traversal — the exact algorithmic primitive tidalDB needs. ScyllaDB, ClickHouse, and DuckDB already embed USearch in production at scale, validating the approach. The filtered search callback architecture lets tidalDB evaluate arbitrary metadata predicates (date ranges, followed-creator sets, seen-item exclusions) inline during graph traversal, avoiding both post-filter recall collapse and pre-filter brute-force degradation. At 10M vectors of dimension 1536, expect **~60 GB RAM at float32** or **~15 GB with int8 quantization**, with mmap persistence for instant restart via USearch's `view()` mode.

---

## Rust ANN library landscape: most options are inadequate

The Rust ecosystem for HNSW is fragmented. Of eight libraries evaluated, only two support filtered search with active maintenance — and one dramatically outperforms the other.

| Library | Type | Stars | Active | Filtered Search | mmap | Incremental Add/Delete | Quantization | QPS (high-dim) |
|---|---|---|---|---|---|---|---|---|
| **usearch** | C++ FFI | ~3,500 | ✅ (Jan 2026) | ✅ predicate callback | ✅ `view()` | ✅ / ✅ (lazy) | f16, i8, binary | **~127K–167K** |
| **hnsw_rs** | Pure Rust | 221 | ✅ | ✅ `Filterable` trait | ✅ (vectors) | ✅ / ❌ | ❌ | ~10K–30K est. |
| **hannoy** | Pure Rust (LMDB) | New | ✅ | ✅ RoaringBitmap | ✅ (LMDB) | ✅ / ✅ | Binary only | Competitive w/ FAISS |
| **hnswlib-rs** (CoreNN) | Pure Rust | New | ✅ | ❌ | ❌ (bincode) | ✅ / ✅ (tombstone) | i8 | Unknown |
| **hora** | Pure Rust | ~2,600 | ❌ (2021) | ❌ | ❌ | ❌ | ❌ | N/A — abandoned |
| **instant-distance** | Pure Rust | 343 | ⚠️ Low | ❌ | ❌ | ❌ | ❌ | N/A |
| **arroy** | Pure Rust (LMDB) | ~500 | ✅ (superseded) | ✅ RoaringBitmap | ✅ | ❌ | Binary | Degrades at high-dim |
| **hnsw** (rust-cv) | Pure Rust | ~200 | ❌ (2021) | ❌ | ❌ | ❌ | ❌ | N/A — abandoned |

**USearch dominates on every axis that matters for tidalDB.** On a 92-core Intel Xeon with 10M vectors, USearch achieves **126,582 QPS at float32** and **166,667 QPS at int8** — roughly **150x faster than Lucene** at equivalent recall. ScyllaDB reports **12,000 QPS at >97% recall on 100M vectors of dimension 768** in production, with p99 latency under 40ms. No pure-Rust library publishes competitive numbers at this scale.

The pure-Rust alternative **hnsw_rs** (jean-pierreBoth) deserves mention as a fallback. It supports filtering via a `Filterable` trait, offers mmap for vector data, has SIMD acceleration via simdeez, and supports an unusually broad set of distance metrics including Jaccard, Hamming, and Hellinger divergence. It lacks quantization and deletion support, and has no published high-dimensional benchmarks, but at **145K crates.io downloads** it has real production usage in genomics and bioinformatics.

**hannoy** (the new Meilisearch HNSW backend) is architecturally interesting — an LMDB-backed, disk-native HNSW with DiskANN-inspired graph patching for incremental updates. It replaced arroy in Meilisearch v1.29 (December 2025), delivering **10x search speedup and 2x index size reduction**. However, it is tightly coupled to Meilisearch internals and far too new (v0.0.3) for production embedding in another database.

Qdrant's internal HNSW implementation (pure Rust, with ACORN-1 support) cannot be extracted as a standalone library. The code is deeply coupled to Qdrant's segment/storage/API layers, and the only third-party extraction attempt (qdrant-lib, 63 stars) carries massive dependency bloat.

---

## The filtered ANN problem: why USearch's callback architecture works

The core challenge is clear: "find 100 nearest items, but only from items created in the last 7 days by followed creators, excluding seen items." Naive post-filtering fails catastrophically when the filter retains less than ~10% of the corpus — recall drops to near zero because the top-k ANN candidates contain almost no filter-matching items. Pure pre-filtering (brute-force over the filtered set) works perfectly at 1% selectivity but becomes prohibitively slow when the filtered set exceeds a few hundred thousand vectors.

**Production systems converge on the same solution: evaluate filters during graph traversal, with an adaptive query planner that switches strategies based on estimated filter selectivity.** Here is how the major systems handle it:

**Qdrant** builds a "filterable HNSW" with extra graph edges per payload value, ensuring subgraph connectivity under filters. Its query planner estimates filter cardinality, then selects: payload-index-only scan for very selective filters (<1-2%), filterable HNSW traversal for moderate selectivity, and ACORN two-hop expansion for compound low-selectivity filters. Starting with v1.16, Qdrant integrates ACORN as a configurable per-query option — **2-10x slower than regular HNSW but dramatically better recall** for restrictive multi-attribute filters.

**Weaviate** takes a simpler approach: build both an inverted index and an HNSW index per shard. Pre-filtering produces a RoaringBitmap allow-list; HNSW traversal follows all edges normally but only adds allow-listed nodes to results. A `flatSearchCutoff` parameter triggers brute-force when the filtered set is small enough. Their documentation shows **recall@10 remaining near 0.99 from 100% down to 1% selectivity** with this hybrid approach. Since v1.27, Weaviate adds ACORN-style two-hop expansion for low-correlation filter scenarios.

**Pinecone** uses a single-stage approach with adaptive IVF: metadata bitmaps per field are intersected with IVF cluster assignments, excluding irrelevant clusters entirely. Their published benchmarks show **recall@10 of 0.989 on YFCC**, stable across all selectivities — and filters actually **speed up search by 35%** at 1% selectivity because they reduce the effective search space.

**The critical insight across all systems**: at extreme selectivity (<1-2%), everyone falls back to pre-filter + brute-force over the small matched set, which gives exact results quickly. The differentiating engineering happens in the **5-30% selectivity "danger zone"** where the filtered set is too large for brute-force but too sparse for standard HNSW to maintain recall.

USearch's `filtered_search(query, k, |key| predicate(key))` implements the correct in-graph filtering primitive. The predicate receives each candidate node's `Key` (u64) during traversal. Nodes failing the predicate are skipped for results but **still used for graph navigation** — preserving search quality. tidalDB's architecture would be:

```
User query → parse filter conditions → estimate selectivity via metadata indexes
  → if selectivity < 2%: pre-filter (roaring bitmap) → brute-force top-k
  → if selectivity 2-100%: index.filtered_search(vector, k, |key| check_metadata(key))
```

This mirrors exactly how ScyllaDB uses USearch in production.

---

## What ACORN teaches us about predicate-agnostic search

The ACORN paper (Patel et al., Stanford, SIGMOD 2024) introduces the most theoretically elegant solution to filtered ANN. Its core insight: expand each HNSW node's neighbor list from M to **γ·M candidates** during construction. When a selective filter eliminates most nodes, the surviving ~γ·M × selectivity edges still approximate the standard M edges needed for navigability. With γ=12 and 8% selectivity, each node retains roughly 12 × 16 × 0.08 ≈ 15 usable edges — close to the standard M=16.

ACORN achieves **2–1,000x higher QPS at 0.9 recall** compared to pre-filtering and post-filtering across multiple datasets (SIFT1M, LAION, TripClick). The lightweight ACORN-1 variant uses two-hop neighbor scanning instead of graph densification — at most 5x lower QPS than full ACORN-γ but with **9-53x lower construction time**. ACORN-1 is what Qdrant (v1.16) and Weaviate (v1.27) have adopted.

**For 1536-dimensional embeddings specifically, ACORN is the strongest academic approach.** The ETH Zurich FANNS benchmark (2025) tested all major filtered ANN methods on 2.7M documents with 1024-dim transformer embeddings. ACORN was the **only method supporting all filter types** (exact match, range, set membership, and combinations), and it **maintained performance while Filtered-DiskANN, CAPS, and UNG all failed to reach >25% recall** on this high-dimensional dataset.

Other notable academic approaches and their applicability:

- **Filtered-DiskANN** (Microsoft, WWW 2023): builds label-aware Vamana graphs. Excellent for categorical label filters — deployed in Microsoft Ads with **30-80% revenue gains**. Limited to equality predicates on ~1,000 labels; struggles with high-dimensional transformer embeddings.
- **SeRF** (SIGMOD 2024): segment graph specifically for range filters on ordered attributes (timestamps, prices). Excellent for tidalDB's "last 7 days" filter component but static — no incremental updates.
- **NHQ** (TKDE 2022 / NeurIPS 2024): fuses embedding distance with attribute dissimilarity into a combined metric. Fast (10-315x over baselines) but returns approximate filter matches — not guaranteed to satisfy predicates. Unsuitable when hard filter compliance is required.
- **CAPS** (2023): partition-based approach with **10% the index size** of graph methods. Impressive on low-dimensional data but fails on transformer embeddings at scale.

**Practical recommendation**: tidalDB does not need to implement ACORN directly. USearch's predicate callback during traversal achieves the same effect (skipping non-matching nodes while preserving graph navigation). If recall degrades under very selective compound filters, tidalDB can implement ACORN-1 style two-hop expansion by having the predicate maintain state and exploring neighbors-of-neighbors — or simply fall back to pre-filter + brute-force for the most selective cases. The adaptive query planner handles this automatically.

---

## Memory, persistence, and quantization at scale

At 1536 dimensions with HNSW (M=16), memory is dominated by raw vectors — the graph adds only ~300 bytes per node (~5% overhead):

| Scale | Float32 Vectors | HNSW Graph | **Total** | With f16 | With int8 | With Binary + Rescore |
|---|---|---|---|---|---|---|
| **1M** | 5.72 GB | 0.29 GB | **6.0 GB** | 3.2 GB | 1.7 GB | 0.5 GB |
| **10M** | 57.2 GB | 2.86 GB | **60 GB** | 31.5 GB | 17.2 GB | 4.7 GB |
| **100M** | 572 GB | 28.6 GB | **601 GB** | 314 GB | 172 GB | 47 GB |

**USearch's f16 quantization is the optimal default for tidalDB.** It halves memory with negligible recall loss (<1%) — bringing 10M vectors from 60 GB to ~32 GB, comfortably fitting on a single 64 GB node. Int8 quantization reduces to 17 GB with 1-3% recall loss. Binary quantization achieves 32x compression but requires full-precision rescoring from disk for acceptable recall.

**Persistence strategy**: USearch provides three modes — `save()` for full serialization, `load()` for deserialization into RAM, and `view()` for zero-copy mmap serving. The recommended pattern for tidalDB:

1. **Active index in RAM** for reads and writes during operation
2. **Periodic `save()`** to persist to disk (coordinated with tidalDB's WAL/checkpointing)
3. **On restart: `view()`** for immediate read-only serving while a writable copy loads in background
4. **For datasets exceeding RAM**: mmap vectors to NVMe SSD while keeping the HNSW graph (~29 GB for 100M vectors) in memory. Expect 2-10x latency increase for mmap'd vectors depending on OS page cache hit rates. Milvus specifically recommends HNSW over IVF for mmap workloads because graph traversal locality is better than IVF's random cluster access.

**Incremental updates**: USearch supports `add(key, vector)` and `remove(key)` natively. Deletion is lazy (tombstoning). One constraint: `reserve(capacity)` must be called before first write, requiring capacity planning. tidalDB should either over-provision (2x expected count) or implement segment-based index management — build new segments for incoming data, periodically merge segments into a rebuilt index that reclaims tombstoned space. This mirrors Qdrant's and Tantivy/Lucene's proven segment architecture.

**The DiskANN alternative** (Microsoft) uses a fundamentally different approach for datasets that don't fit in RAM: a single-layer Vamana graph with Product Quantization in memory for coarse search, full vectors on SSD for rescoring. DiskANN achieves **<5ms mean latency at 95%+ recall on 1 billion 128D vectors** using SSD + 64 GB RAM. A pure-Rust DiskANN implementation exists (infinilabs/diskann) but is early-stage. For tidalDB's single-node scale (≤100M vectors), HNSW with mmap is simpler and sufficient.

---

## Multi-vector retrieval needs no special indexing

For "For You" feeds driven by a user's history of interactions, the question is how to query with a preference derived from multiple embeddings. **The answer is PinnerSage-style multi-query with result merging** — no special index modifications required.

Pinterest's PinnerSage system (KDD 2020, 400M+ MAU in production) proved that **averaging multiple interest embeddings catastrophically loses information**. Averaging embeddings for "hiking," "cooking," and "cars" produces a centroid best represented by "energy boosting breakfast" — semantically unrelated to any actual interest. Instead, PinnerSage clusters user interactions via Ward hierarchical clustering into 3-100 coherent interest groups per user, represents each with a medoid (actual item, not centroid), and issues **separate ANN queries per interest cluster**.

For tidalDB, this means:
1. Pre-compute user interest clusters offline (3-10 clusters per user)
2. Store cluster medoids/centroids per user
3. At query time: issue 3-10 standard `filtered_search` calls (one per top cluster), merge and deduplicate results by score
4. For users with <5 interactions: simple weighted average is acceptable

This requires only standard single-vector ANN queries — USearch's filtered_search works directly. The total query cost scales linearly with cluster count, but since each query is independent, they can execute in parallel.

For cosine vs. inner product: OpenAI 1536D embeddings are designed for cosine similarity. **Normalize vectors at insertion time** and use L2 distance (equivalent to cosine for unit vectors, and more SIMD-friendly). If tidalDB later adds collaborative-filtering-style embeddings where magnitude carries meaning, implement the XBOX transformation (append one extra dimension) to convert MIPS to L2.

---

## Implementation recommendation: wrap USearch, build the planner

**Recommended architecture**: Embed USearch as tidalDB's vector index via its Rust crate (Apache-2.0, single dependency on `cxx`), and build three layers on top.

**Layer 1 — Metadata indexes** (tidalDB builds): Maintain roaring bitmaps per high-cardinality filter value (creator_id → bitmap of their item keys), B-tree indexes for range attributes (created_at), and a bloom filter or hash set for per-user seen-item exclusion. These enable both fast cardinality estimation and efficient predicate evaluation inside USearch's callback.

**Layer 2 — Adaptive query planner** (tidalDB builds): Before each search, estimate filter selectivity from metadata index statistics:
- **Selectivity <2%**: Pre-filter via bitmap intersection → brute-force L2 scan over matched vectors (exact, fast on small sets)
- **Selectivity 2-100%**: Call `index.filtered_search(query, k, |key| check_all_filters(key))` — USearch handles in-graph filtered traversal
- **Fallback**: If filtered_search returns fewer than k results with high ef_search, widen the search or fall back to pre-filter + brute-force

**Layer 3 — Persistence and lifecycle** (tidalDB builds): WAL-based durability wrapping USearch's save/load/view. Segment-based index management for growing datasets. Periodic compaction to reclaim tombstoned vectors. On restart, `view()` for immediate read serving.

**Why not build HNSW from scratch**: Implementing a correct, high-performance, concurrent HNSW with SIMD-optimized distance computation is **6-12 months of dedicated systems work**. USearch's C++ core has been battle-tested across ScyllaDB (1B vectors), ClickHouse, and DuckDB. The FFI boundary via CXX is thin and well-maintained. The engineering effort is better spent on tidalDB's metadata filtering, query planning, and persistence layers — the parts that differentiate a database from a bare index.

**Why not use hnsw_rs instead**: It's pure Rust (avoiding FFI), but lacks quantization, deletion support, and published high-dimensional benchmarks. For a performance-critical vector database, USearch's 10-100x performance advantage (via SimSIMD and architectural optimizations) outweighs FFI purity concerns. If tidalDB later needs to eliminate C++ dependencies, hnsw_rs is a viable migration target — its `Filterable` trait provides the same predicate-during-traversal capability.

---

## Open questions requiring benchmarking in tidalDB's conditions

**Must benchmark before committing:**
- USearch filtered_search latency as a function of predicate evaluation cost. If tidalDB's `check_all_filters(key)` requires random access to a metadata store, the overhead per HNSW hop could dominate. Benchmark with realistic filter complexity (bitmap lookup + range check + set membership) to establish the latency budget per predicate call.
- Recall@10 and QPS at 1536D for USearch at 1M and 10M vectors with tidalDB's actual filter selectivity distribution. No published benchmark tests USearch's filtered search at this dimensionality.
- Memory overhead of USearch's graph structure at 1536D with M=16 vs M=32 — higher M improves recall under selective filters but increases memory.
- `reserve()` capacity planning: what happens when the index fills up? Is there a graceful resize path or does it require a full rebuild?

**Should investigate:**
- USearch's behavior under concurrent writes + filtered reads — ScyllaDB validates concurrent operation but tidalDB's access patterns may differ.
- The crossover point where pre-filter brute-force beats filtered HNSW for tidalDB's data distribution. This determines the query planner's switching threshold.
- Whether USearch's `view()` mmap mode supports concurrent search adequately, or if a writable `load()` is always needed for production serving.
- f16 vs. int8 quantization recall impact specifically for OpenAI text-embedding-3-large vectors — quantization tolerance varies by embedding model.
- Incremental index degradation: after 100K inserts + 50K deletes without compaction, how much does recall degrade?
- ACORN-1 style two-hop expansion: can this be implemented within USearch's predicate callback (by maintaining traversal state), or would it require patching USearch's C++ core?