Task 01: VectorIndex Trait + BruteForceIndex
Context
- Milestone: 2 -- Ranked Retrieval
- Phase: m2p1 -- Vector Index Integration (USearch)
- Depends On: None (uses types from m1p1 but no m2p1 tasks)
- Blocks: Task 02 (USearch Backend), Task 03 (Embedding Lifecycle), Task 04 (Adaptive Query Planner)
- Complexity: M
Objective
Deliver the VectorIndex trait -- the public interface for all ANN operations in tidalDB -- along with the full type system for vector search (VectorId, VectorSearchResult, VectorIndexConfig, DistanceMetric, QuantizationLevel, VectorError) and two pure-Rust implementations: BruteForceIndex (exact linear-scan search) and MockVectorIndex (predetermined results for unit tests).
The VectorIndex trait is the abstraction boundary. No module outside storage/vector/ will ever know whether USearch, hnsw_rs, or brute-force is behind it. This is the same pattern as StorageEngine in m1p3: define the trait first, implement brute-force for correctness, then add the production backend in the next task.
BruteForceIndex is not a throwaway. It serves three permanent roles:
- Correctness oracle -- recall measurements compare HNSW results against `BruteForceIndex` exact results.
- Small datasets -- when the index has fewer than ~10,000 vectors, brute-force is faster than HNSW because there is no graph construction overhead.
- Pre-filter fallback -- the adaptive query planner (Task 04) uses `BruteForceIndex`-style linear scan over bitmap-filtered candidate sets when selectivity < 1%.
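The correctness-oracle role can be illustrated with a small recall helper. This is only a sketch: `recall_at_k` is a hypothetical name that does not appear in the spec, and it assumes both result lists contain unique IDs ordered nearest first.

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k IDs that also appear in the approximate top-k.
/// `exact` would come from a brute-force scan, `approx` from an ANN index.
/// Assumes each slice holds unique IDs ordered nearest first.
fn recall_at_k(exact: &[u64], approx: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        // Nothing to find, so nothing was missed.
        return 1.0;
    }
    let hits = approx
        .iter()
        .take(k)
        .copied()
        .filter(|id| truth.contains(id))
        .count();
    hits as f64 / truth.len() as f64
}
```

With exact IDs `[1, 2, 3, 4]` and approximate IDs `[1, 2, 9, 4]`, three of the four true neighbors are recovered, so recall at k = 4 is 0.75.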
No unsafe code in this task. Pure Rust throughout.
Requirements
- `VectorIndex` trait: `insert`, `search`, `filtered_search`, `delete`, `reserve`, `save`, `load`, `view`, `len`, `len_live`, `is_empty`, `tombstone_ratio`
- All trait methods match the signatures in Spec 07, Section 11
- `VectorIndex: Send + Sync` bound
- `VectorId = u64` type alias
- `VectorSearchResult { id: VectorId, distance: f32 }` with `Debug`, `Clone`
- `VectorIndexConfig` with all HNSW parameters
- `DistanceMetric` enum: `L2`, `InnerProduct`
- `QuantizationLevel` enum: `F32`, `F16`, `Int8`
- `VectorError` enum with `Display`, `Debug`, `From<std::io::Error>`
- `BruteForceIndex`: `RwLock<HashMap<VectorId, Vec<f32>>>` for storage, linear scan for search
- `BruteForceIndex::search` returns results sorted by ascending L2 squared distance
- `BruteForceIndex::filtered_search` applies predicate during linear scan, returns only matching results
- `BruteForceIndex::delete` removes the vector from the HashMap (true delete, not tombstone)
- `BruteForceIndex::save`/`load`/`view` use a simple binary format for test persistence
- `MockVectorIndex`: predetermined results, call recording for test assertions
- No `unsafe` code
Technical Design
Module Structure
tidal/src/storage/vector/
mod.rs -- VectorIndex trait, all types, re-exports
brute.rs -- BruteForceIndex, MockVectorIndex
Public API
// === storage/vector/mod.rs ===
use std::path::Path;
/// A unique identifier for an entity in the vector index.
/// Corresponds to the u64 representation of the application-provided entity ID.
pub type VectorId = u64;
/// A scored search result from the vector index.
#[derive(Debug, Clone)]
pub struct VectorSearchResult {
/// Entity ID in the vector index.
pub id: VectorId,
/// L2 squared distance from query vector. Lower = more similar.
/// For L2-normalized vectors, range is [0.0, 4.0] where 0.0 = identical.
pub distance: f32,
}
/// Configuration for vector index construction.
#[derive(Debug, Clone)]
pub struct VectorIndexConfig {
/// Number of dimensions per vector.
pub dimensions: usize,
/// Distance metric.
pub metric: DistanceMetric,
/// Quantization level for stored vectors.
pub quantization: QuantizationLevel,
/// Maximum connections per node per layer (M parameter). Default: 16.
pub connectivity: usize,
/// Beam width during index construction. Default: 200.
pub ef_construction: usize,
/// Default beam width during search (overridable per query). Default: 200.
pub ef_search: usize,
}
impl Default for VectorIndexConfig {
fn default() -> Self {
Self {
dimensions: 1536,
metric: DistanceMetric::L2,
quantization: QuantizationLevel::F16,
connectivity: 16,
ef_construction: 200,
ef_search: 200,
}
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DistanceMetric {
/// L2 squared distance. Default for cosine over normalized vectors.
L2,
/// Inner product. For MIPS workloads (with XBOX transformation).
InnerProduct,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizationLevel {
/// Full precision (4 bytes per dimension).
F32,
/// Half precision (2 bytes per dimension). Default.
F16,
/// Scalar quantization (1 byte per dimension).
Int8,
}
/// Errors from vector index operations.
#[derive(Debug)]
pub enum VectorError {
/// Vector dimensions do not match index configuration.
DimensionMismatch { expected: usize, got: usize },
/// Index is at capacity and cannot accept more vectors.
CapacityExceeded { capacity: usize },
/// Vector ID not found in the index.
NotFound { id: VectorId },
/// I/O error during persistence.
Io(std::io::Error),
/// Index file is corrupted or incompatible.
CorruptedIndex(String),
/// USearch or backend-specific error.
Backend(String),
/// Vector has zero L2 norm and cannot be normalized.
ZeroNormVector,
}
// Note: `ZeroNormVector` is not in Spec 07 Section 11 but is required by `l2_normalize()` in Task 03. Spec 07 should be updated to include it.
impl std::fmt::Display for VectorError { /* variant-specific messages */ }
impl std::error::Error for VectorError {}
impl From<std::io::Error> for VectorError { /* wraps as VectorError::Io */ }
/// The vector index trait. All ANN operations go through this interface.
///
/// Implementations must be `Send + Sync` for concurrent search + insert.
///
/// # Contract
///
/// - Vectors passed to `insert()` must already be L2-normalized. The trait
/// does not normalize -- the caller (embedding lifecycle, Task 03) is
/// responsible for normalization before insertion.
/// - `search()` and `filtered_search()` return results sorted by ascending
/// distance (most similar first).
/// - `delete()` marks a vector as tombstoned. Tombstoned vectors are excluded
/// from search results but may remain in the index structure.
pub trait VectorIndex: Send + Sync {
/// Insert a vector into the index.
///
/// If a vector with this ID already exists, it is replaced (delete + insert).
///
/// # Errors
///
/// - `VectorError::CapacityExceeded` if the index is full.
/// - `VectorError::DimensionMismatch` if `embedding.len() != config.dimensions`.
fn insert(&self, id: VectorId, embedding: &[f32]) -> Result<(), VectorError>;
/// Search for the K nearest neighbors to the query vector.
///
/// Results are ordered by ascending distance (most similar first).
///
/// # Arguments
///
/// * `query` -- The query vector. Must be L2-normalized.
/// * `k` -- Number of results to return.
/// * `ef_search` -- Beam width override. If 0, uses the index default.
fn search(
&self,
query: &[f32],
k: usize,
ef_search: usize,
) -> Result<Vec<VectorSearchResult>, VectorError>;
/// Search for the K nearest neighbors that satisfy a filter predicate.
///
/// The predicate is evaluated during traversal. Nodes failing the predicate
/// are used for navigation but excluded from results (in-graph filtering).
///
/// # Arguments
///
/// * `query` -- The query vector. Must be L2-normalized.
/// * `k` -- Number of results to return.
/// * `ef_search` -- Beam width override. If 0, uses the index default.
/// * `filter` -- Predicate per candidate node. Return `true` to include.
fn filtered_search(
&self,
query: &[f32],
k: usize,
ef_search: usize,
filter: &dyn Fn(VectorId) -> bool,
) -> Result<Vec<VectorSearchResult>, VectorError>;
/// Remove a vector from the index (lazy tombstone).
///
/// # Errors
///
/// - `VectorError::NotFound` if the ID is not in the index.
fn delete(&self, id: VectorId) -> Result<(), VectorError>;
/// Reserve capacity for at least `additional` more vectors.
fn reserve(&self, additional: usize) -> Result<(), VectorError>;
/// Persist the index to disk.
fn save(&self, path: &Path) -> Result<(), VectorError>;
/// Load an index from disk into writable memory.
fn load(path: &Path, config: &VectorIndexConfig) -> Result<Self, VectorError>
where
Self: Sized;
/// Memory-map an index from disk for read-only access.
// config required by USearch to initialize the mmap'd index with correct parameters
fn view(path: &Path, config: &VectorIndexConfig) -> Result<Self, VectorError>
where
Self: Sized;
/// Number of vectors in the index (including tombstoned).
fn len(&self) -> usize;
/// Number of live (non-tombstoned) vectors.
fn len_live(&self) -> usize;
/// Whether the index is empty.
fn is_empty(&self) -> bool {
self.len_live() == 0
}
/// Ratio of tombstoned vectors to total vectors.
fn tombstone_ratio(&self) -> f64 {
if self.len() == 0 {
0.0
} else {
(self.len() - self.len_live()) as f64 / self.len() as f64
}
}
}
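The two default methods behave differently depending on whether a backend deletes eagerly or lazily. The sketch below isolates just that counting logic. `Counts`, `TrueDelete`, and `LazyDelete` are hypothetical stand-ins invented for illustration, not types from the spec.

```rust
/// Hypothetical, trimmed-down stand-in for the counting part of the trait.
trait Counts {
    fn len(&self) -> usize;
    fn len_live(&self) -> usize;

    /// Empty means "no live vectors", even if tombstones remain.
    fn is_empty(&self) -> bool {
        self.len_live() == 0
    }

    /// Share of stored vectors that are tombstoned; 0.0 for an empty index.
    fn tombstone_ratio(&self) -> f64 {
        if self.len() == 0 {
            0.0
        } else {
            (self.len() - self.len_live()) as f64 / self.len() as f64
        }
    }
}

/// Models a backend that removes vectors immediately.
struct TrueDelete {
    stored: usize,
}

/// Models a backend that keeps deleted vectors as tombstones.
struct LazyDelete {
    stored: usize,
    tombstoned: usize,
}

impl Counts for TrueDelete {
    fn len(&self) -> usize {
        self.stored
    }
    fn len_live(&self) -> usize {
        self.stored
    }
}

impl Counts for LazyDelete {
    fn len(&self) -> usize {
        self.stored
    }
    fn len_live(&self) -> usize {
        self.stored - self.tombstoned
    }
}
```

A true-delete backend always reports a ratio of 0.0, while a lazy backend with one tombstone among four stored vectors reports 0.25.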
BruteForceIndex
// === storage/vector/brute.rs ===
use std::collections::HashMap;
use std::sync::RwLock;
use std::path::Path;
use std::io::{Read, Write, BufReader, BufWriter};
use std::fs::File;
use super::{VectorIndex, VectorId, VectorSearchResult, VectorIndexConfig, VectorError};
/// Exact nearest-neighbor search via linear scan.
///
/// Used for:
/// 1. Correctness verification (recall measurement against HNSW).
/// 2. Small datasets (< 10,000 vectors where brute-force is faster).
/// 3. Pre-filter fallback (adaptive query planner uses brute-force for
/// very selective filters where the filtered set is small).
pub struct BruteForceIndex {
vectors: RwLock<HashMap<VectorId, Vec<f32>>>,
config: VectorIndexConfig,
}
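impl BruteForceIndex {
    /// Create an empty index with the given configuration.
    pub fn new(config: VectorIndexConfig) -> Self {
        Self { vectors: RwLock::new(HashMap::new()), config }
    }
    /// Number of vectors (HashMap length).
    fn vector_count(&self) -> usize {
        self.vectors.read().expect("vectors lock poisoned").len()
    }
}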
Search implementation:
- Acquire read lock on `vectors`
- Compute L2 squared distance between query and every stored vector
- Collect `(VectorId, f32)` pairs into a Vec
- Sort by ascending distance
- Take first `k` results
- Return as `Vec<VectorSearchResult>`
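The steps above can be sketched as a free function over a plain `HashMap`. This is a minimal illustration, not the required implementation: it omits the `RwLock` and the `VectorSearchResult` wrapper, `linear_scan` and `l2_sq` are hypothetical names, and breaking distance ties by ID is an assumption added here so the output does not depend on `HashMap` iteration order.

```rust
use std::collections::HashMap;

/// Sum of squared differences between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k by linear scan; returns (id, distance) pairs, nearest first.
fn linear_scan(
    vectors: &HashMap<u64, Vec<f32>>,
    query: &[f32],
    k: usize,
) -> Vec<(u64, f32)> {
    // Score every stored vector against the query.
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        .map(|(id, v)| (*id, l2_sq(query, v)))
        .collect();
    // total_cmp gives a total order over f32. Ties fall back to the ID
    // (an assumption of this sketch) to keep the result deterministic.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
    // Keep at most k results; fewer if the index is smaller than k.
    scored.truncate(k);
    scored
}
```

Sorting the full candidate list costs O(n log n). A bounded heap would reduce that for small `k`, but the simple sort matches the steps as written.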
L2 squared distance function:
/// Compute L2 squared distance between two vectors of equal length.
///
/// For L2-normalized vectors, this is equivalent to `2 - 2 * cos(a, b)`.
/// Returns sum of squared differences.
pub(crate) fn l2_distance_sq(a: &[f32], b: &[f32]) -> f32 {
debug_assert_eq!(a.len(), b.len());
a.iter()
.zip(b.iter())
.map(|(x, y)| {
let d = x - y;
d * d
})
.sum()
}
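The equivalence stated in the doc comment follows from expanding the squared norm, assuming both `a` and `b` are unit vectors so that their dot product equals `cos(a, b)`:

```latex
\begin{aligned}
\lVert a - b \rVert^2
  &= \lVert a \rVert^2 + \lVert b \rVert^2 - 2\,(a \cdot b) \\
  &= 1 + 1 - 2\cos(a, b) \\
  &= 2 - 2\cos(a, b)
\end{aligned}
```

Because the cosine lies in [-1, 1], the distance lies in [0, 4]: identical vectors give 0, orthogonal vectors give 2, and opposite vectors give 4.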
Persistence (save/load/view):
BruteForceIndex uses a simple binary format for test persistence:
Header:
[magic: 4 bytes "BFVI"]
[version: 1 byte (0x01)]
[dimensions: 4 bytes LE]
[count: 8 bytes LE]
Per vector:
[id: 8 bytes LE]
[vector: dimensions * 4 bytes, f32 LE]
view() loads the same file as load() (brute-force has no mmap mode -- it is always in-memory). This is acceptable because BruteForceIndex is not the production backend.
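The layout above can be exercised in memory before any file I/O is involved. The sketch below encodes to and decodes from a byte buffer. `encode`, `decode`, and `read_array` are hypothetical helper names, and errors are plain `String`s here rather than `VectorError` to keep the example self-contained.

```rust
use std::io::{Cursor, Read};

const MAGIC: &[u8; 4] = b"BFVI";
const VERSION: u8 = 0x01;

/// Serialize the header followed by one record per vector.
fn encode(dimensions: u32, vectors: &[(u64, Vec<f32>)]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(MAGIC);
    out.push(VERSION);
    out.extend_from_slice(&dimensions.to_le_bytes());
    out.extend_from_slice(&(vectors.len() as u64).to_le_bytes());
    for (id, v) in vectors {
        out.extend_from_slice(&id.to_le_bytes());
        for x in v {
            out.extend_from_slice(&x.to_le_bytes());
        }
    }
    out
}

/// Read exactly N bytes or report truncation.
fn read_array<const N: usize, R: Read>(r: &mut R) -> Result<[u8; N], String> {
    let mut buf = [0u8; N];
    r.read_exact(&mut buf)
        .map_err(|e| format!("truncated file: {e}"))?;
    Ok(buf)
}

/// Parse a buffer produced by `encode`, validating magic and version.
fn decode(bytes: &[u8]) -> Result<(u32, Vec<(u64, Vec<f32>)>), String> {
    let mut r = Cursor::new(bytes);
    let magic = read_array::<4, _>(&mut r)?;
    if &magic != MAGIC {
        return Err("bad magic".into());
    }
    let version = read_array::<1, _>(&mut r)?[0];
    if version != VERSION {
        return Err(format!("unsupported version {version}"));
    }
    let dimensions = u32::from_le_bytes(read_array::<4, _>(&mut r)?);
    let count = u64::from_le_bytes(read_array::<8, _>(&mut r)?);
    let mut vectors = Vec::new();
    for _ in 0..count {
        let id = u64::from_le_bytes(read_array::<8, _>(&mut r)?);
        let mut v = Vec::new();
        for _ in 0..dimensions {
            v.push(f32::from_le_bytes(read_array::<4, _>(&mut r)?));
        }
        vectors.push((id, v));
    }
    Ok((dimensions, vectors))
}
```

The header is 17 bytes (4 + 1 + 4 + 8), and each record is `8 + dimensions * 4` bytes, so two 3-dimensional vectors serialize to 57 bytes. In the real implementation, the failures reported here as strings would presumably map to `VectorError::Io` and `VectorError::CorruptedIndex`.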
Filtered search: Same as search(), except that any vector for which filter(id) returns false is skipped before its distance is computed. Brute-force filtered search therefore computes distances only for vectors that pass the filter, which is why it is fast for very selective filters.
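A small sketch makes the cost argument concrete by also returning how many distances were computed. `filtered_scan` is a hypothetical name, the counter exists only for illustration, and the ID tie-break is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Returns the top-k (id, distance) pairs among IDs accepted by `filter`,
/// plus the number of distance computations performed.
fn filtered_scan(
    vectors: &HashMap<u64, Vec<f32>>,
    query: &[f32],
    k: usize,
    filter: &dyn Fn(u64) -> bool,
) -> (Vec<(u64, f32)>, usize) {
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        // Reject by ID first, before any floating-point work.
        .filter(|(id, _)| filter(**id))
        .map(|(id, v)| {
            let d: f32 = query
                .iter()
                .zip(v)
                .map(|(x, y)| (x - y) * (x - y))
                .sum();
            (*id, d)
        })
        .collect();
    // Every surviving entry had exactly one distance computed.
    let computed = scored.len();
    // Ascending distance; ties broken by ID (assumption of this sketch).
    scored.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
    scored.truncate(k);
    (scored, computed)
}
```

With ten stored vectors and an even-ID predicate, only five distances are computed. With a predicate that rejects everything, none are.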
MockVectorIndex
/// Configurable mock for unit tests.
///
/// Returns predetermined results from search calls and records all method
/// invocations for verification.
pub struct MockVectorIndex {
search_results: RwLock<Vec<Vec<VectorSearchResult>>>,
call_log: RwLock<Vec<VectorIndexCall>>,
config: VectorIndexConfig,
inserted_count: RwLock<usize>,
}
#[derive(Debug, Clone)]
pub enum VectorIndexCall {
Insert { id: VectorId },
Delete { id: VectorId },
Search { k: usize, ef_search: usize },
FilteredSearch { k: usize, ef_search: usize },
Reserve { additional: usize },
Save,
Load,
View,
}
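impl MockVectorIndex {
    /// Create a mock with predetermined search results.
    ///
    /// Each call to `search()` or `filtered_search()` pops the first element
    /// from `search_results`. If empty, returns an empty Vec.
    pub fn new(config: VectorIndexConfig, search_results: Vec<Vec<VectorSearchResult>>) -> Self {
        Self {
            search_results: RwLock::new(search_results),
            call_log: RwLock::new(Vec::new()),
            config,
            inserted_count: RwLock::new(0),
        }
    }
    /// Get a snapshot of the recorded call log.
    pub fn calls(&self) -> Vec<VectorIndexCall> {
        self.call_log.read().expect("call_log lock poisoned").clone()
    }
    /// Clear the call log.
    pub fn clear_calls(&self) {
        self.call_log.write().expect("call_log lock poisoned").clear();
    }
}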
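The pop-first-and-record behavior can be sketched with simplified types. This is not the `MockVectorIndex` itself: `ScriptedIndex` and `Call` are hypothetical names, results are bare IDs, and the sketch swaps in `Mutex<VecDeque<_>>` in place of the `RwLock<Vec<_>>` shown above so that popping from the front is O(1).

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical, reduced version of the call enum.
#[derive(Debug, Clone, PartialEq)]
enum Call {
    Insert { id: u64 },
    Search { k: usize },
}

/// Scripted search results plus a log of every call made.
struct ScriptedIndex {
    scripted: Mutex<VecDeque<Vec<u64>>>,
    log: Mutex<Vec<Call>>,
}

impl ScriptedIndex {
    fn new(results: Vec<Vec<u64>>) -> Self {
        Self {
            scripted: Mutex::new(results.into()),
            log: Mutex::new(Vec::new()),
        }
    }

    /// Records the call; a mock stores no vector data.
    fn insert(&self, id: u64) {
        self.log.lock().unwrap().push(Call::Insert { id });
    }

    /// Records the call, then returns the next scripted result,
    /// or an empty Vec once the script is exhausted.
    fn search(&self, k: usize) -> Vec<u64> {
        self.log.lock().unwrap().push(Call::Search { k });
        self.scripted.lock().unwrap().pop_front().unwrap_or_default()
    }

    /// Snapshot of the call log for assertions.
    fn calls(&self) -> Vec<Call> {
        self.log.lock().unwrap().clone()
    }
}
```

Interior mutability behind a lock is what lets the mock record calls through `&self`, matching the trait's shared-reference method signatures while remaining `Send + Sync`.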
Error Handling
- `insert()` with wrong dimensions: returns `VectorError::DimensionMismatch { expected, got }`.
- `search()` with wrong query dimensions: returns `VectorError::DimensionMismatch`.
- `delete()` for unknown ID: returns `VectorError::NotFound { id }`.
- `save()`/`load()` I/O failures: return `VectorError::Io(e)`.
- `load()` with corrupt file: returns `VectorError::CorruptedIndex(msg)`.
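One way the `Display` and `From<std::io::Error>` impls might look is sketched below on a reduced copy of the enum. The variant subset, the message wording, and the `read_index_file` helper are all assumptions made for illustration. The spec only requires variant-specific messages and the `Io` wrapping.

```rust
use std::fmt;
use std::io;

/// Reduced copy of the error enum (a subset of the variants).
#[derive(Debug)]
enum VectorError {
    DimensionMismatch { expected: usize, got: usize },
    NotFound { id: u64 },
    Io(io::Error),
    CorruptedIndex(String),
}

impl fmt::Display for VectorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Message wording is hypothetical.
        match self {
            Self::DimensionMismatch { expected, got } => {
                write!(f, "dimension mismatch: expected {expected}, got {got}")
            }
            Self::NotFound { id } => write!(f, "vector {id} not found"),
            Self::Io(e) => write!(f, "I/O error: {e}"),
            Self::CorruptedIndex(msg) => write!(f, "corrupted index: {msg}"),
        }
    }
}

impl std::error::Error for VectorError {}

impl From<io::Error> for VectorError {
    fn from(e: io::Error) -> Self {
        Self::Io(e)
    }
}

/// Hypothetical helper: `?` converts the io::Error via the From impl.
fn read_index_file(path: &str) -> Result<Vec<u8>, VectorError> {
    Ok(std::fs::read(path)?)
}
```

The `From` impl is what allows `save()` and `load()` to use `?` directly on file operations without manual `map_err` calls.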
Test Strategy
Property Tests
use proptest::prelude::*;
// Insert + search roundtrip: every inserted vector is retrievable.
proptest! {
#[test]
fn insert_search_roundtrip(
dim in 2usize..64,
n_vectors in 1usize..200,
k in 1usize..50,
) {
let k = k.min(n_vectors);
let config = VectorIndexConfig {
dimensions: dim,
..VectorIndexConfig::default()
};
let index = BruteForceIndex::new(config);
// Insert random unit vectors
let mut rng = proptest::test_runner::TestRng::deterministic_rng(
proptest::test_runner::RngAlgorithm::ChaCha
);
for id in 0..n_vectors as u64 {
let v: Vec<f32> = (0..dim).map(|_| rng.gen::<f32>() - 0.5).collect();
let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
let unit: Vec<f32> = v.iter().map(|x| x / norm).collect();
index.insert(id, &unit).unwrap();
}
// Search for each inserted vector: it should be the top-1 result
for id in 0..n_vectors as u64 {
// Note: test must be in the module (or use pub(crate) vectors field) to access this private field.
let v = index.vectors.read().unwrap()[&id].clone();
let results = index.search(&v, 1, 0).unwrap();
prop_assert!(!results.is_empty());
prop_assert_eq!(results[0].id, id);
prop_assert!(results[0].distance < 1e-6, "self-search should return distance ~0");
}
}
}
// Delete excludes tombstoned IDs from search results.
proptest! {
#[test]
fn delete_excludes_from_results(
dim in 2usize..32,
n_vectors in 5usize..100,
) {
let config = VectorIndexConfig {
dimensions: dim,
..VectorIndexConfig::default()
};
let index = BruteForceIndex::new(config);
// Insert vectors
let vectors: Vec<Vec<f32>> = (0..n_vectors).map(|_| {
let v: Vec<f32> = (0..dim).map(|i| ((i * 7 + 13) % 100) as f32 / 100.0 - 0.5).collect();
let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
v.iter().map(|x| x / norm).collect()
}).collect();
for (id, v) in vectors.iter().enumerate() {
index.insert(id as u64, v).unwrap();
}
// Delete the first vector
index.delete(0).unwrap();
// Search should not return deleted ID
let query = &vectors[0];
let results = index.search(query, n_vectors, 0).unwrap();
prop_assert!(results.iter().all(|r| r.id != 0),
"deleted vector should not appear in results");
prop_assert_eq!(results.len(), n_vectors - 1);
}
}
// filtered_search honors all predicates.
proptest! {
#[test]
fn filtered_search_honors_predicate(
dim in 2usize..32,
n_vectors in 10usize..100,
k in 1usize..20,
) {
let k = k.min(n_vectors / 2);
let config = VectorIndexConfig {
dimensions: dim,
..VectorIndexConfig::default()
};
let index = BruteForceIndex::new(config);
for id in 0..n_vectors as u64 {
let v: Vec<f32> = (0..dim).map(|i| ((id as usize * 3 + i * 7) % 100) as f32 / 100.0).collect();
let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
let unit: Vec<f32> = v.iter().map(|x| x / norm).collect();
index.insert(id, &unit).unwrap();
}
// Filter: only even IDs
let predicate = |id: VectorId| id % 2 == 0;
let query: Vec<f32> = (0..dim).map(|i| (i as f32) / dim as f32).collect();
let norm: f32 = query.iter().map(|x| x * x).sum::<f32>().sqrt();
let unit_query: Vec<f32> = query.iter().map(|x| x / norm).collect();
let results = index.filtered_search(&unit_query, k, 0, &predicate).unwrap();
for r in &results {
prop_assert!(r.id % 2 == 0,
"filtered_search returned odd ID {}", r.id);
}
}
}
// Search results are sorted by ascending distance.
proptest! {
#[test]
fn results_sorted_by_distance(
dim in 2usize..32,
n_vectors in 5usize..100,
k in 2usize..50,
) {
let k = k.min(n_vectors);
let config = VectorIndexConfig {
dimensions: dim,
..VectorIndexConfig::default()
};
let index = BruteForceIndex::new(config);
for id in 0..n_vectors as u64 {
let v: Vec<f32> = (0..dim).map(|i| ((id as usize + i) % 100) as f32 / 100.0).collect();
let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
let unit: Vec<f32> = v.iter().map(|x| x / norm).collect();
index.insert(id, &unit).unwrap();
}
let query: Vec<f32> = vec![1.0 / (dim as f32).sqrt(); dim];
let results = index.search(&query, k, 0).unwrap();
for w in results.windows(2) {
prop_assert!(w[0].distance <= w[1].distance,
"results not sorted: {} > {}", w[0].distance, w[1].distance);
}
}
}
Unit Tests
#[test]
fn brute_force_new_is_empty() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
assert_eq!(index.len(), 0);
assert_eq!(index.len_live(), 0);
assert!(index.is_empty());
assert!((index.tombstone_ratio() - 0.0).abs() < f64::EPSILON);
}
#[test]
fn brute_force_insert_and_len() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
index.insert(2, &[0.0, 1.0, 0.0]).unwrap();
assert_eq!(index.len(), 2);
assert_eq!(index.len_live(), 2);
assert!(!index.is_empty());
}
#[test]
fn brute_force_dimension_mismatch() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
let result = index.insert(1, &[1.0, 0.0]); // 2 dims instead of 3
assert!(matches!(result, Err(VectorError::DimensionMismatch { expected: 3, got: 2 })));
}
#[test]
fn brute_force_search_dimension_mismatch() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
let result = index.search(&[1.0, 0.0], 1, 0); // 2 dims query
assert!(matches!(result, Err(VectorError::DimensionMismatch { .. })));
}
#[test]
fn brute_force_self_search_distance_zero() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
let v = [1.0, 0.0, 0.0];
index.insert(42, &v).unwrap();
let results = index.search(&v, 1, 0).unwrap();
assert_eq!(results.len(), 1);
assert_eq!(results[0].id, 42);
assert!(results[0].distance < 1e-6);
}
#[test]
fn brute_force_search_empty_index() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
let results = index.search(&[1.0, 0.0, 0.0], 10, 0).unwrap();
assert!(results.is_empty());
}
#[test]
fn brute_force_search_k_larger_than_index() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
index.insert(2, &[0.0, 1.0, 0.0]).unwrap();
let results = index.search(&[1.0, 0.0, 0.0], 100, 0).unwrap();
assert_eq!(results.len(), 2); // returns all available, not error
}
#[test]
fn brute_force_orthogonal_vectors_distance() {
// For unit vectors a, b: ||a - b||^2 = 2 - 2*cos(a,b)
// Orthogonal unit vectors: cos = 0, so distance = 2.0
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
let results = index.search(&[0.0, 1.0, 0.0], 1, 0).unwrap();
assert!((results[0].distance - 2.0).abs() < 1e-5,
"orthogonal unit vectors should have L2^2 distance of 2.0, got {}", results[0].distance);
}
#[test]
fn brute_force_identical_vectors_distance() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
let v = [0.577_350_3, 0.577_350_3, 0.577_350_3]; // unit vector
index.insert(1, &v).unwrap();
let results = index.search(&v, 1, 0).unwrap();
assert!(results[0].distance < 1e-6);
}
#[test]
fn brute_force_delete_and_search() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
index.insert(2, &[0.0, 1.0, 0.0]).unwrap();
index.insert(3, &[0.0, 0.0, 1.0]).unwrap();
index.delete(2).unwrap();
assert_eq!(index.len(), 2); // BruteForce does true delete
assert_eq!(index.len_live(), 2);
let results = index.search(&[0.0, 1.0, 0.0], 10, 0).unwrap();
assert!(results.iter().all(|r| r.id != 2));
}
#[test]
fn brute_force_delete_not_found() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
let result = index.delete(999);
assert!(matches!(result, Err(VectorError::NotFound { id: 999 })));
}
#[test]
fn brute_force_insert_replaces_existing() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
index.insert(1, &[0.0, 1.0, 0.0]).unwrap(); // replace
assert_eq!(index.len(), 1); // still 1 vector
let results = index.search(&[0.0, 1.0, 0.0], 1, 0).unwrap();
assert_eq!(results[0].id, 1);
assert!(results[0].distance < 1e-6, "should match the replacement vector");
}
#[test]
fn brute_force_filtered_search_excludes_non_matching() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
for id in 0..10u64 {
let v = [1.0, 0.0, 0.0]; // all same direction
index.insert(id, &v).unwrap();
}
// Only include even IDs
let results = index.filtered_search(&[1.0, 0.0, 0.0], 10, 0, &|id| id % 2 == 0).unwrap();
assert_eq!(results.len(), 5);
assert!(results.iter().all(|r| r.id % 2 == 0));
}
#[test]
fn brute_force_filtered_search_empty_result() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
// Predicate that matches nothing
let results = index.filtered_search(&[1.0, 0.0, 0.0], 10, 0, &|_| false).unwrap();
assert!(results.is_empty());
}
#[test]
fn brute_force_save_load_roundtrip() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config.clone());
index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
index.insert(2, &[0.0, 1.0, 0.0]).unwrap();
let dir = tempfile::tempdir().unwrap();
let path = dir.path().join("test.bfvi");
index.save(&path).unwrap();
let loaded = BruteForceIndex::load(&path, &config).unwrap();
assert_eq!(loaded.len(), 2);
// Search should produce identical results
let results_orig = index.search(&[1.0, 0.0, 0.0], 2, 0).unwrap();
let results_loaded = loaded.search(&[1.0, 0.0, 0.0], 2, 0).unwrap();
assert_eq!(results_orig.len(), results_loaded.len());
for (a, b) in results_orig.iter().zip(results_loaded.iter()) {
assert_eq!(a.id, b.id);
assert!((a.distance - b.distance).abs() < 1e-6);
}
}
#[test]
fn brute_force_reserve_is_noop() {
// BruteForce uses HashMap, which resizes automatically.
// reserve() is a noop but must not error.
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let index = BruteForceIndex::new(config);
assert!(index.reserve(1_000_000).is_ok());
}
#[test]
fn l2_distance_sq_correctness() {
let a = [1.0, 0.0, 0.0];
let b = [0.0, 1.0, 0.0];
let dist = l2_distance_sq(&a, &b);
assert!((dist - 2.0).abs() < 1e-6);
let c = [1.0, 0.0, 0.0];
assert!(l2_distance_sq(&a, &c) < 1e-6);
}
#[test]
fn mock_vector_index_returns_predetermined() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let results = vec![
vec![VectorSearchResult { id: 42, distance: 0.1 }],
vec![VectorSearchResult { id: 99, distance: 0.5 }],
];
let mock = MockVectorIndex::new(config, results);
let r1 = mock.search(&[1.0, 0.0, 0.0], 1, 0).unwrap();
assert_eq!(r1[0].id, 42);
let r2 = mock.search(&[0.0, 1.0, 0.0], 1, 0).unwrap();
assert_eq!(r2[0].id, 99);
// Third call: no more results, returns empty
let r3 = mock.search(&[0.0, 0.0, 1.0], 1, 0).unwrap();
assert!(r3.is_empty());
}
#[test]
fn mock_vector_index_records_calls() {
let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
let mock = MockVectorIndex::new(config, vec![]);
mock.insert(1, &[1.0, 0.0, 0.0]).unwrap();
mock.delete(1).unwrap();
mock.search(&[1.0, 0.0, 0.0], 10, 200).unwrap();
mock.filtered_search(&[1.0, 0.0, 0.0], 5, 0, &|_| true).unwrap();
let calls = mock.calls();
assert_eq!(calls.len(), 4);
assert!(matches!(calls[0], VectorIndexCall::Insert { id: 1 }));
assert!(matches!(calls[1], VectorIndexCall::Delete { id: 1 }));
assert!(matches!(calls[2], VectorIndexCall::Search { k: 10, ef_search: 200 }));
assert!(matches!(calls[3], VectorIndexCall::FilteredSearch { k: 5, ef_search: 0 }));
}
#[test]
fn vector_index_is_send_and_sync() {
fn assert_send_sync<T: Send + Sync>() {}
assert_send_sync::<BruteForceIndex>();
assert_send_sync::<MockVectorIndex>();
}
#[test]
fn vector_index_config_defaults() {
let config = VectorIndexConfig::default();
assert_eq!(config.dimensions, 1536);
assert_eq!(config.metric, DistanceMetric::L2);
assert_eq!(config.quantization, QuantizationLevel::F16);
assert_eq!(config.connectivity, 16);
assert_eq!(config.ef_construction, 200);
assert_eq!(config.ef_search, 200);
}
Acceptance Criteria
- `VectorIndex` trait with all methods from Spec 07, Section 11
- `VectorIndex: Send + Sync` bound
- `VectorId = u64` type alias
- `VectorSearchResult`, `VectorIndexConfig`, `DistanceMetric`, `QuantizationLevel`, `VectorError` types with correct derives
- `VectorIndexConfig::default()` returns dimensions=1536, L2, F16, M=16, ef_construction=200, ef_search=200
- `VectorError` implements `Display`, `Error`, `From<std::io::Error>`
- `l2_distance_sq()` computes correct L2 squared distance
- `BruteForceIndex::search()` returns exact nearest neighbors sorted by ascending distance
- `BruteForceIndex::filtered_search()` returns only results where `filter(id) == true`
- `BruteForceIndex::insert()` validates dimensions and rejects mismatches
- `BruteForceIndex::insert()` replaces existing vectors with the same ID
- `BruteForceIndex::delete()` removes vectors; they never appear in search results
- `BruteForceIndex::delete()` returns `NotFound` for unknown IDs
- `BruteForceIndex::save()` and `load()` roundtrip produces identical search results
- `MockVectorIndex` returns predetermined results and records call history
- All property tests pass: insert+search roundtrip, delete exclusion, filtered_search predicate honor, result ordering
- `BruteForceIndex` and `MockVectorIndex` are `Send + Sync`
- No `unsafe` code
- `cargo clippy -- -D warnings` passes
- All property tests and unit tests pass
Research References
- docs/research/ann_for_tidaldb.md -- Section "Implementation recommendation: wrap USearch, build the planner": "A `BruteForceIndex` exists for correctness verification and small-dataset deployments", brute-force breakeven point (~2,000-5,000 vectors)
Spec References
- docs/specs/07-vector-retrieval.md -- Section 11 (VectorIndex trait: full API signatures, VectorError variants, BruteForceIndex implementation sketch, MockVectorIndex), Section 12 (performance targets), Section 13 (invariants 1-3: insert retrievability, delete exclusion, filtered_search predicate compliance)
Implementation Notes
- Add `pub mod vector;` to `tidal/src/storage/mod.rs`. The vector module is a submodule of storage because vector indexes are a storage concern (persistence, key encoding, entity store integration).
- `BruteForceIndex` uses true deletion (`HashMap::remove`), not lazy tombstoning. This means `len()` and `len_live()` always return the same value. The `tombstone_ratio()` default implementation handles this correctly (returns 0.0). USearch (Task 02) uses lazy tombstoning, where `len() > len_live()`.
- The `ef_search` parameter is ignored by `BruteForceIndex` (exact search has no beam width). It is accepted for trait compliance but unused.
- `view()` for `BruteForceIndex` delegates to `load()` since there is no mmap mode. This is documented on the method.
- `reserve()` for `BruteForceIndex` is a no-op since HashMap resizes automatically. This is documented on the method.
- Do NOT add the `usearch` crate dependency in this task. That is Task 02.
- Do NOT implement `l2_normalize()` in this task. That is Task 03 (Embedding Lifecycle).
- Do NOT implement the adaptive query planner in this task. That is Task 04.
- The `l2_distance_sq()` function is `pub(crate)` -- it is used by `BruteForceIndex` and by Task 04's planner for brute-force fallback. It is not a public API.