# Task 01: VectorIndex Trait + BruteForceIndex

## Context

**Milestone:** 2 -- Ranked Retrieval
**Phase:** m2p1 -- Vector Index Integration (USearch)
**Depends On:** None (uses types from m1p1 but no m2p1 tasks)
**Blocks:** Task 02 (USearch Backend), Task 03 (Embedding Lifecycle), Task 04 (Adaptive Query Planner)
**Complexity:** M

## Objective

Deliver the `VectorIndex` trait -- the public interface for all ANN operations in tidalDB -- along with the full type system for vector search (`VectorId`, `VectorSearchResult`, `VectorIndexConfig`, `DistanceMetric`, `QuantizationLevel`, `VectorError`) and two pure-Rust implementations: `BruteForceIndex` (exact linear-scan search) and `MockVectorIndex` (predetermined results for unit tests).

The `VectorIndex` trait is the abstraction boundary. No module outside `storage/vector/` will ever know whether USearch, hnsw_rs, or brute-force is behind it. This is the same pattern as `StorageEngine` in m1p3: define the trait first, implement brute-force for correctness, then add the production backend in the next task.

`BruteForceIndex` is not a throwaway. It serves three permanent roles:

1. **Correctness oracle** -- recall measurements compare HNSW results against `BruteForceIndex` exact results.
2. **Small datasets** -- when the index has fewer than ~10,000 vectors, brute-force is faster than HNSW because there is no graph construction overhead.
3. **Pre-filter fallback** -- the adaptive query planner (Task 04) uses `BruteForceIndex`-style linear scan over bitmap-filtered candidate sets when selectivity < 1%.

No unsafe code in this task. Pure Rust throughout.
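To make the correctness-oracle role concrete, recall@k can be measured by comparing the IDs an approximate backend returns against the exact IDs from `BruteForceIndex`. The following is a minimal sketch, not part of Spec 07: the helper name `recall_at_k` is hypothetical, and it assumes both result lists are already sorted by ascending distance and contain no duplicate IDs.

```rust
use std::collections::HashSet;

/// Hypothetical helper (an illustration, not part of the spec'd API):
/// fraction of the exact top-k IDs that also appear in the first k
/// entries of the approximate result list.
///
/// Assumes both slices are ordered most-similar-first and that IDs
/// within each slice are unique.
fn recall_at_k(exact: &[u64], approx: &[u64], k: usize) -> f64 {
    // Never ask for more ground-truth entries than actually exist.
    let k = k.min(exact.len());
    if k == 0 {
        // Nothing to find, so nothing was missed.
        return 1.0;
    }
    let truth: HashSet<u64> = exact[..k].iter().copied().collect();
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(id))
        .count();
    hits as f64 / k as f64
}
```

For example, with exact IDs `[7, 3, 9, 1]`, approximate IDs `[7, 9, 4, 3]`, and `k = 3`, two of the three true neighbors (7 and 9) are found, so the recall is 2/3.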
## Requirements

- `VectorIndex` trait: `insert`, `search`, `filtered_search`, `delete`, `reserve`, `save`, `load`, `view`, `len`, `len_live`, `is_empty`, `tombstone_ratio`
- All trait methods match the signatures in Spec 07, Section 11
- `VectorIndex: Send + Sync` bound
- `VectorId = u64` type alias
- `VectorSearchResult { id: VectorId, distance: f32 }` with `Debug`, `Clone`
- `VectorIndexConfig` with all HNSW parameters
- `DistanceMetric` enum: `L2`, `InnerProduct`
- `QuantizationLevel` enum: `F32`, `F16`, `Int8`
- `VectorError` enum with `Display`, `Debug`, `From<std::io::Error>`
- `BruteForceIndex`: `RwLock<HashMap<VectorId, Vec<f32>>>` for storage, linear scan for search
- `BruteForceIndex::search` returns results sorted by ascending L2 squared distance
- `BruteForceIndex::filtered_search` applies predicate during linear scan, returns only matching results
- `BruteForceIndex::delete` removes the vector from the HashMap (true delete, not tombstone)
- `BruteForceIndex::save`/`load`/`view` use a simple binary format for test persistence
- `MockVectorIndex`: predetermined results, call recording for test assertions
- No `unsafe` code

## Technical Design

### Module Structure

```
tidal/src/storage/vector/
  mod.rs      -- VectorIndex trait, all types, re-exports
  brute.rs    -- BruteForceIndex, MockVectorIndex
```

### Public API

```rust
// === storage/vector/mod.rs ===

use std::path::Path;

/// A unique identifier for an entity in the vector index.
/// Corresponds to the u64 representation of the application-provided entity ID.
pub type VectorId = u64;

/// A scored search result from the vector index.
#[derive(Debug, Clone)]
pub struct VectorSearchResult {
    /// Entity ID in the vector index.
    pub id: VectorId,
    /// L2 squared distance from query vector. Lower = more similar.
    /// For L2-normalized vectors, range is [0.0, 4.0] where 0.0 = identical.
    pub distance: f32,
}

/// Configuration for vector index construction.
#[derive(Debug, Clone)]
pub struct VectorIndexConfig {
    /// Number of dimensions per vector.
    pub dimensions: usize,
    /// Distance metric.
    pub metric: DistanceMetric,
    /// Quantization level for stored vectors.
    pub quantization: QuantizationLevel,
    /// Maximum connections per node per layer (M parameter). Default: 16.
    pub connectivity: usize,
    /// Beam width during index construction. Default: 200.
    pub ef_construction: usize,
    /// Default beam width during search (overridable per query). Default: 200.
    pub ef_search: usize,
}

impl Default for VectorIndexConfig {
    fn default() -> Self {
        Self {
            dimensions: 1536,
            metric: DistanceMetric::L2,
            quantization: QuantizationLevel::F16,
            connectivity: 16,
            ef_construction: 200,
            ef_search: 200,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DistanceMetric {
    /// L2 squared distance. Default for cosine over normalized vectors.
    L2,
    /// Inner product. For MIPS workloads (with XBOX transformation).
    InnerProduct,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizationLevel {
    /// Full precision (4 bytes per dimension).
    F32,
    /// Half precision (2 bytes per dimension). Default.
    F16,
    /// Scalar quantization (1 byte per dimension).
    Int8,
}

/// Errors from vector index operations.
#[derive(Debug)]
pub enum VectorError {
    /// Vector dimensions do not match index configuration.
    DimensionMismatch { expected: usize, got: usize },
    /// Index is at capacity and cannot accept more vectors.
    CapacityExceeded { capacity: usize },
    /// Vector ID not found in the index.
    NotFound { id: VectorId },
    /// I/O error during persistence.
    Io(std::io::Error),
    /// Index file is corrupted or incompatible.
    CorruptedIndex(String),
    /// USearch or backend-specific error.
    Backend(String),
    /// Vector has zero L2 norm and cannot be normalized.
    ZeroNormVector,
}

// Note: `ZeroNormVector` is not in Spec 07 Section 11 but is required by
// `l2_normalize()` in Task 03. Spec 07 should be updated to include it.

impl std::fmt::Display for VectorError { /* variant-specific messages */ }
impl std::error::Error for VectorError {}
impl From<std::io::Error> for VectorError { /* wraps as VectorError::Io */ }

/// The vector index trait. All ANN operations go through this interface.
///
/// Implementations must be `Send + Sync` for concurrent search + insert.
///
/// # Contract
///
/// - Vectors passed to `insert()` must already be L2-normalized. The trait
///   does not normalize -- the caller (embedding lifecycle, Task 03) is
///   responsible for normalization before insertion.
/// - `search()` and `filtered_search()` return results sorted by ascending
///   distance (most similar first).
/// - `delete()` marks a vector as tombstoned. Tombstoned vectors are excluded
///   from search results but may remain in the index structure.
pub trait VectorIndex: Send + Sync {
    /// Insert a vector into the index.
    ///
    /// If a vector with this ID already exists, it is replaced (delete + insert).
    ///
    /// # Errors
    ///
    /// - `VectorError::CapacityExceeded` if the index is full.
    /// - `VectorError::DimensionMismatch` if `embedding.len() != config.dimensions`.
    fn insert(&self, id: VectorId, embedding: &[f32]) -> Result<(), VectorError>;

    /// Search for the K nearest neighbors to the query vector.
    ///
    /// Results are ordered by ascending distance (most similar first).
    ///
    /// # Arguments
    ///
    /// * `query` -- The query vector. Must be L2-normalized.
    /// * `k` -- Number of results to return.
    /// * `ef_search` -- Beam width override. If 0, uses the index default.
    fn search(
        &self,
        query: &[f32],
        k: usize,
        ef_search: usize,
    ) -> Result<Vec<VectorSearchResult>, VectorError>;

    /// Search for the K nearest neighbors that satisfy a filter predicate.
    ///
    /// The predicate is evaluated during traversal. Nodes failing the predicate
    /// are used for navigation but excluded from results (in-graph filtering).
    ///
    /// # Arguments
    ///
    /// * `query` -- The query vector. Must be L2-normalized.
    /// * `k` -- Number of results to return.
    /// * `ef_search` -- Beam width override. If 0, uses the index default.
    /// * `filter` -- Predicate per candidate node. Return `true` to include.
    fn filtered_search(
        &self,
        query: &[f32],
        k: usize,
        ef_search: usize,
        filter: &dyn Fn(VectorId) -> bool,
    ) -> Result<Vec<VectorSearchResult>, VectorError>;

    /// Remove a vector from the index (lazy tombstone).
    ///
    /// # Errors
    ///
    /// - `VectorError::NotFound` if the ID is not in the index.
    fn delete(&self, id: VectorId) -> Result<(), VectorError>;

    /// Reserve capacity for at least `additional` more vectors.
    fn reserve(&self, additional: usize) -> Result<(), VectorError>;

    /// Persist the index to disk.
    fn save(&self, path: &Path) -> Result<(), VectorError>;

    /// Load an index from disk into writable memory.
    fn load(path: &Path, config: &VectorIndexConfig) -> Result<Self, VectorError>
    where
        Self: Sized;

    /// Memory-map an index from disk for read-only access.
    // config required by USearch to initialize the mmap'd index with correct parameters
    fn view(path: &Path, config: &VectorIndexConfig) -> Result<Self, VectorError>
    where
        Self: Sized;

    /// Number of vectors in the index (including tombstoned).
    fn len(&self) -> usize;

    /// Number of live (non-tombstoned) vectors.
    fn len_live(&self) -> usize;

    /// Whether the index is empty.
    fn is_empty(&self) -> bool {
        self.len_live() == 0
    }

    /// Ratio of tombstoned vectors to total vectors.
    fn tombstone_ratio(&self) -> f64 {
        if self.len() == 0 {
            0.0
        } else {
            (self.len() - self.len_live()) as f64 / self.len() as f64
        }
    }
}
```

### BruteForceIndex

```rust
// === storage/vector/brute.rs ===

use std::collections::HashMap;
use std::sync::RwLock;
use std::path::Path;
use std::io::{Read, Write, BufReader, BufWriter};
use std::fs::File;

use super::{VectorIndex, VectorId, VectorSearchResult, VectorIndexConfig, VectorError};

/// Exact nearest-neighbor search via linear scan.
///
/// Used for:
/// 1. Correctness verification (recall measurement against HNSW).
/// 2. Small datasets (< 10,000 vectors where brute-force is faster).
/// 3. Pre-filter fallback (adaptive query planner uses brute-force for
///    very selective filters where the filtered set is small).
pub struct BruteForceIndex {
    vectors: RwLock<HashMap<VectorId, Vec<f32>>>,
    config: VectorIndexConfig,
}

impl BruteForceIndex {
    pub fn new(config: VectorIndexConfig) -> Self;

    /// Number of vectors (HashMap length).
    fn vector_count(&self) -> usize;
}
```

**Search implementation:**

- Acquire read lock on `vectors`
- Compute L2 squared distance between query and every stored vector
- Collect `(VectorId, f32)` pairs into a Vec
- Sort by ascending distance
- Take first `k` results
- Return as `Vec<VectorSearchResult>`

**L2 squared distance function:**

```rust
/// Compute L2 squared distance between two vectors of equal length.
///
/// For L2-normalized vectors, this is equivalent to `2 - 2 * cos(a, b)`.
/// Returns sum of squared differences.
pub(crate) fn l2_distance_sq(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| {
            let d = x - y;
            d * d
        })
        .sum()
}
```

**Persistence (save/load/view):**

`BruteForceIndex` uses a simple binary format for test persistence:

```
Header:
  [magic: 4 bytes "BFVI"]
  [version: 1 byte (0x01)]
  [dimensions: 4 bytes LE]
  [count: 8 bytes LE]

Per vector:
  [id: 8 bytes LE]
  [vector: dimensions * 4 bytes, f32 LE]
```

`view()` loads the same file as `load()` (brute-force has no mmap mode -- it is always in-memory). This is acceptable because `BruteForceIndex` is not the production backend.

**Filtered search:**

Same as `search()`, but vectors for which `filter(id)` returns `false` are skipped before their distance is computed. Brute-force filtered search therefore computes distances only for vectors that pass the filter, which is why it is fast for very selective filters.

### MockVectorIndex

```rust
/// Configurable mock for unit tests.
///
/// Returns predetermined results from search calls and records all method
/// invocations for verification.
pub struct MockVectorIndex {
    search_results: RwLock<Vec<Vec<VectorSearchResult>>>,
    call_log: RwLock<Vec<VectorIndexCall>>,
    config: VectorIndexConfig,
    inserted_count: RwLock<usize>,
}

#[derive(Debug, Clone)]
pub enum VectorIndexCall {
    Insert { id: VectorId },
    Delete { id: VectorId },
    Search { k: usize, ef_search: usize },
    FilteredSearch { k: usize, ef_search: usize },
    Reserve { additional: usize },
    Save,
    Load,
    View,
}

impl MockVectorIndex {
    /// Create a mock with predetermined search results.
    ///
    /// Each call to `search()` or `filtered_search()` pops the first element
    /// from `search_results`. If empty, returns an empty Vec.
    pub fn new(config: VectorIndexConfig, search_results: Vec<Vec<VectorSearchResult>>) -> Self;

    /// Get the recorded call log.
    pub fn calls(&self) -> Vec<VectorIndexCall>;

    /// Clear the call log.
    pub fn clear_calls(&self);
}
```

### Error Handling

- `insert()` with wrong dimensions: returns `VectorError::DimensionMismatch { expected, got }`.
- `search()` with wrong query dimensions: returns `VectorError::DimensionMismatch`.
- `delete()` for unknown ID: returns `VectorError::NotFound { id }`.
- `save()`/`load()` I/O failures: returns `VectorError::Io(e)`.
- `load()` with corrupt file: returns `VectorError::CorruptedIndex(msg)`.

## Test Strategy

### Property Tests

```rust
use proptest::prelude::*;

// Insert + search roundtrip: every inserted vector is retrievable.
proptest!
{
    #[test]
    fn insert_search_roundtrip(
        dim in 2usize..64,
        n_vectors in 1usize..200,
        k in 1usize..50,
    ) {
        let k = k.min(n_vectors);
        let config = VectorIndexConfig { dimensions: dim, ..VectorIndexConfig::default() };
        let index = BruteForceIndex::new(config);

        // Insert random unit vectors
        let mut rng = proptest::test_runner::TestRng::deterministic_rng(
            proptest::test_runner::RngAlgorithm::ChaCha
        );
        for id in 0..n_vectors as u64 {
            let v: Vec<f32> = (0..dim).map(|_| rng.gen::<f32>() - 0.5).collect();
            let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            let unit: Vec<f32> = v.iter().map(|x| x / norm).collect();
            index.insert(id, &unit).unwrap();
        }

        // Search for each inserted vector: it should be the top-1 result
        for id in 0..n_vectors as u64 {
            // Note: test must be in the module (or use pub(crate) vectors field) to access this private field.
            let v = index.vectors.read().unwrap()[&id].clone();
            let results = index.search(&v, 1, 0).unwrap();
            prop_assert!(!results.is_empty());
            prop_assert_eq!(results[0].id, id);
            prop_assert!(results[0].distance < 1e-6, "self-search should return distance ~0");
        }
    }
}

// Delete excludes tombstoned IDs from search results.
proptest!
{
    #[test]
    fn delete_excludes_from_results(
        dim in 2usize..32,
        n_vectors in 5usize..100,
    ) {
        let config = VectorIndexConfig { dimensions: dim, ..VectorIndexConfig::default() };
        let index = BruteForceIndex::new(config);

        // Insert vectors
        let vectors: Vec<Vec<f32>> = (0..n_vectors).map(|_| {
            let v: Vec<f32> = (0..dim).map(|i| ((i * 7 + 13) % 100) as f32 / 100.0 - 0.5).collect();
            let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            v.iter().map(|x| x / norm).collect()
        }).collect();

        for (id, v) in vectors.iter().enumerate() {
            index.insert(id as u64, v).unwrap();
        }

        // Delete the first vector
        index.delete(0).unwrap();

        // Search should not return deleted ID
        let query = &vectors[0];
        let results = index.search(query, n_vectors, 0).unwrap();
        prop_assert!(results.iter().all(|r| r.id != 0), "deleted vector should not appear in results");
        prop_assert_eq!(results.len(), n_vectors - 1);
    }
}

// filtered_search honors all predicates.
proptest! {
    #[test]
    fn filtered_search_honors_predicate(
        dim in 2usize..32,
        n_vectors in 10usize..100,
        k in 1usize..20,
    ) {
        let k = k.min(n_vectors / 2);
        let config = VectorIndexConfig { dimensions: dim, ..VectorIndexConfig::default() };
        let index = BruteForceIndex::new(config);

        for id in 0..n_vectors as u64 {
            let v: Vec<f32> = (0..dim).map(|i| ((id as usize * 3 + i * 7) % 100) as f32 / 100.0).collect();
            let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            let unit: Vec<f32> = v.iter().map(|x| x / norm).collect();
            index.insert(id, &unit).unwrap();
        }

        // Filter: only even IDs
        let predicate = |id: VectorId| id % 2 == 0;
        let query: Vec<f32> = (0..dim).map(|i| (i as f32) / dim as f32).collect();
        let norm: f32 = query.iter().map(|x| x * x).sum::<f32>().sqrt();
        let unit_query: Vec<f32> = query.iter().map(|x| x / norm).collect();

        let results = index.filtered_search(&unit_query, k, 0, &predicate).unwrap();
        for r in &results {
            prop_assert!(r.id % 2 == 0, "filtered_search returned odd ID {}", r.id);
        }
    }
}

// Search results are sorted by ascending distance.
proptest!
{
    #[test]
    fn results_sorted_by_distance(
        dim in 2usize..32,
        n_vectors in 5usize..100,
        k in 2usize..50,
    ) {
        let k = k.min(n_vectors);
        let config = VectorIndexConfig { dimensions: dim, ..VectorIndexConfig::default() };
        let index = BruteForceIndex::new(config);

        for id in 0..n_vectors as u64 {
            let v: Vec<f32> = (0..dim).map(|i| ((id as usize + i) % 100) as f32 / 100.0).collect();
            let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            let unit: Vec<f32> = v.iter().map(|x| x / norm).collect();
            index.insert(id, &unit).unwrap();
        }

        let query: Vec<f32> = vec![1.0 / (dim as f32).sqrt(); dim];
        let results = index.search(&query, k, 0).unwrap();

        for w in results.windows(2) {
            prop_assert!(w[0].distance <= w[1].distance,
                "results not sorted: {} > {}", w[0].distance, w[1].distance);
        }
    }
}
```

### Unit Tests

```rust
#[test]
fn brute_force_new_is_empty() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    assert_eq!(index.len(), 0);
    assert_eq!(index.len_live(), 0);
    assert!(index.is_empty());
    assert!((index.tombstone_ratio() - 0.0).abs() < f64::EPSILON);
}

#[test]
fn brute_force_insert_and_len() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    index.insert(2, &[0.0, 1.0, 0.0]).unwrap();
    assert_eq!(index.len(), 2);
    assert_eq!(index.len_live(), 2);
    assert!(!index.is_empty());
}

#[test]
fn brute_force_dimension_mismatch() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    let result = index.insert(1, &[1.0, 0.0]); // 2 dims instead of 3
    assert!(matches!(result, Err(VectorError::DimensionMismatch { expected: 3, got: 2 })));
}

#[test]
fn brute_force_search_dimension_mismatch() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    let result = index.search(&[1.0, 0.0], 1, 0); // 2 dims query
    assert!(matches!(result, Err(VectorError::DimensionMismatch { .. })));
}

#[test]
fn brute_force_self_search_distance_zero() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    let v = [1.0, 0.0, 0.0];
    index.insert(42, &v).unwrap();
    let results = index.search(&v, 1, 0).unwrap();
    assert_eq!(results.len(), 1);
    assert_eq!(results[0].id, 42);
    assert!(results[0].distance < 1e-6);
}

#[test]
fn brute_force_search_empty_index() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    let results = index.search(&[1.0, 0.0, 0.0], 10, 0).unwrap();
    assert!(results.is_empty());
}

#[test]
fn brute_force_search_k_larger_than_index() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    index.insert(2, &[0.0, 1.0, 0.0]).unwrap();
    let results = index.search(&[1.0, 0.0, 0.0], 100, 0).unwrap();
    assert_eq!(results.len(), 2); // returns all available, not error
}

#[test]
fn brute_force_orthogonal_vectors_distance() {
    // For unit vectors a, b: ||a - b||^2 = 2 - 2*cos(a,b)
    // Orthogonal unit vectors: cos = 0, so distance = 2.0
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    let results = index.search(&[0.0, 1.0, 0.0], 1, 0).unwrap();
    assert!((results[0].distance - 2.0).abs() < 1e-5,
        "orthogonal unit vectors should have L2^2 distance of 2.0, got {}", results[0].distance);
}

#[test]
fn brute_force_identical_vectors_distance() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    let v = [0.577_350_3, 0.577_350_3, 0.577_350_3]; // unit vector
    index.insert(1, &v).unwrap();
    let results = index.search(&v, 1, 0).unwrap();
    assert!(results[0].distance < 1e-6);
}

#[test]
fn brute_force_delete_and_search() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    index.insert(2, &[0.0, 1.0, 0.0]).unwrap();
    index.insert(3, &[0.0, 0.0, 1.0]).unwrap();

    index.delete(2).unwrap();
    assert_eq!(index.len(), 2); // BruteForce does true delete
    assert_eq!(index.len_live(), 2);

    let results = index.search(&[0.0, 1.0, 0.0], 10, 0).unwrap();
    assert!(results.iter().all(|r| r.id != 2));
}

#[test]
fn brute_force_delete_not_found() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    let result = index.delete(999);
    assert!(matches!(result, Err(VectorError::NotFound { id: 999 })));
}

#[test]
fn brute_force_insert_replaces_existing() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    index.insert(1, &[0.0, 1.0, 0.0]).unwrap(); // replace
    assert_eq!(index.len(), 1); // still 1 vector

    let results = index.search(&[0.0, 1.0, 0.0], 1, 0).unwrap();
    assert_eq!(results[0].id, 1);
    assert!(results[0].distance < 1e-6, "should match the replacement vector");
}

#[test]
fn brute_force_filtered_search_excludes_non_matching() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    for id in 0..10u64 {
        let v = [1.0, 0.0, 0.0]; // all same direction
        index.insert(id, &v).unwrap();
    }

    // Only include even IDs
    let results = index.filtered_search(&[1.0, 0.0, 0.0], 10, 0, &|id| id % 2 == 0).unwrap();
    assert_eq!(results.len(), 5);
    assert!(results.iter().all(|r| r.id % 2 == 0));
}

#[test]
fn brute_force_filtered_search_empty_result() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();

    // Predicate that matches nothing
    let results = index.filtered_search(&[1.0, 0.0, 0.0], 10, 0, &|_| false).unwrap();
    assert!(results.is_empty());
}

#[test]
fn brute_force_save_load_roundtrip() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config.clone());
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    index.insert(2, &[0.0, 1.0, 0.0]).unwrap();

    let dir = tempfile::tempdir().unwrap();
    let path = dir.path().join("test.bfvi");
    index.save(&path).unwrap();

    let loaded = BruteForceIndex::load(&path, &config).unwrap();
    assert_eq!(loaded.len(), 2);

    // Search should produce identical results
    let results_orig = index.search(&[1.0, 0.0, 0.0], 2, 0).unwrap();
    let results_loaded = loaded.search(&[1.0, 0.0, 0.0], 2, 0).unwrap();
    assert_eq!(results_orig.len(), results_loaded.len());
    for (a, b) in results_orig.iter().zip(results_loaded.iter()) {
        assert_eq!(a.id, b.id);
        assert!((a.distance - b.distance).abs() < 1e-6);
    }
}

#[test]
fn brute_force_reserve_is_noop() {
    // BruteForce uses HashMap, which resizes automatically.
    // reserve() is a noop but must not error.
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    assert!(index.reserve(1_000_000).is_ok());
}

#[test]
fn l2_distance_sq_correctness() {
    let a = [1.0, 0.0, 0.0];
    let b = [0.0, 1.0, 0.0];
    let dist = l2_distance_sq(&a, &b);
    assert!((dist - 2.0).abs() < 1e-6);

    let c = [1.0, 0.0, 0.0];
    assert!(l2_distance_sq(&a, &c) < 1e-6);
}

#[test]
fn mock_vector_index_returns_predetermined() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let results = vec![
        vec![VectorSearchResult { id: 42, distance: 0.1 }],
        vec![VectorSearchResult { id: 99, distance: 0.5 }],
    ];
    let mock = MockVectorIndex::new(config, results);

    let r1 = mock.search(&[1.0, 0.0, 0.0], 1, 0).unwrap();
    assert_eq!(r1[0].id, 42);

    let r2 = mock.search(&[0.0, 1.0, 0.0], 1, 0).unwrap();
    assert_eq!(r2[0].id, 99);

    // Third call: no more results, returns empty
    let r3 = mock.search(&[0.0, 0.0, 1.0], 1, 0).unwrap();
    assert!(r3.is_empty());
}

#[test]
fn mock_vector_index_records_calls() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let mock = MockVectorIndex::new(config, vec![]);

    mock.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    mock.delete(1).unwrap();
    mock.search(&[1.0, 0.0, 0.0], 10, 200).unwrap();
    mock.filtered_search(&[1.0, 0.0, 0.0], 5, 0, &|_| true).unwrap();

    let calls = mock.calls();
    assert_eq!(calls.len(), 4);
    assert!(matches!(calls[0], VectorIndexCall::Insert { id: 1 }));
    assert!(matches!(calls[1], VectorIndexCall::Delete { id: 1 }));
    assert!(matches!(calls[2], VectorIndexCall::Search { k: 10, ef_search: 200 }));
    assert!(matches!(calls[3], VectorIndexCall::FilteredSearch { k: 5, ef_search: 0 }));
}

#[test]
fn vector_index_is_send_and_sync() {
    fn assert_send_sync<T: Send + Sync>() {}
    assert_send_sync::<BruteForceIndex>();
    assert_send_sync::<MockVectorIndex>();
}

#[test]
fn vector_index_config_defaults() {
    let config = VectorIndexConfig::default();
    assert_eq!(config.dimensions, 1536);
    assert_eq!(config.metric, DistanceMetric::L2);
    assert_eq!(config.quantization, QuantizationLevel::F16);
    assert_eq!(config.connectivity, 16);
    assert_eq!(config.ef_construction, 200);
    assert_eq!(config.ef_search, 200);
}
```

## Acceptance Criteria

- [ ] `VectorIndex` trait with all methods from Spec 07, Section 11
- [ ] `VectorIndex: Send + Sync` bound
- [ ] `VectorId = u64` type alias
- [ ] `VectorSearchResult`, `VectorIndexConfig`, `DistanceMetric`, `QuantizationLevel`, `VectorError` types with correct derives
- [ ] `VectorIndexConfig::default()` returns dimensions=1536, L2, F16, M=16, ef_construction=200, ef_search=200
- [ ] `VectorError` implements `Display`, `Error`, `From<std::io::Error>`
- [ ] `l2_distance_sq()` computes correct L2 squared distance
- [ ] `BruteForceIndex::search()` returns exact nearest neighbors sorted by ascending distance
- [ ] `BruteForceIndex::filtered_search()` returns only results where `filter(id) == true`
- [ ] `BruteForceIndex::insert()` validates dimensions and rejects mismatches
- [ ] `BruteForceIndex::insert()` replaces existing vectors with the same ID
- [ ] `BruteForceIndex::delete()` removes vectors; they never appear in search results
- [ ] `BruteForceIndex::delete()` returns `NotFound` for unknown IDs
- [ ] `BruteForceIndex::save()` and `load()` roundtrip produces identical search results
- [ ] `MockVectorIndex` returns predetermined results and records call history
- [ ] All property tests pass: insert+search roundtrip, delete exclusion, filtered_search predicate honor, result ordering
- [ ] `BruteForceIndex` and `MockVectorIndex` are `Send + Sync`
- [ ] No `unsafe` code
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All property tests and unit tests pass

## Research References

- [docs/research/ann_for_tidaldb.md](../../../research/ann_for_tidaldb.md) -- Section "Implementation recommendation: wrap USearch, build the planner": "A `BruteForceIndex` exists for correctness verification and small-dataset deployments", brute-force breakeven point (~2,000-5,000 vectors)

## Spec References

- [docs/specs/07-vector-retrieval.md](../../../specs/07-vector-retrieval.md) -- Section 11 (VectorIndex trait: full API signatures, VectorError variants, BruteForceIndex implementation sketch, MockVectorIndex), Section 12 (performance targets), Section 13 (invariants 1-3: insert retrievability, delete exclusion, filtered_search predicate compliance)

## Implementation Notes

- Add `pub mod vector;` to `tidal/src/storage/mod.rs`. The vector module is a submodule of storage because vector indexes are a storage concern (persistence, key encoding, entity store integration).
- `BruteForceIndex` uses true deletion (`HashMap::remove`), not lazy tombstoning. This means `len()` and `len_live()` always return the same value. The `tombstone_ratio()` default implementation handles this correctly (returns 0.0). USearch (Task 02) uses lazy tombstoning, where `len() > len_live()`.
- The `ef_search` parameter is ignored by `BruteForceIndex` (exact search has no beam width). It is accepted for trait compliance but unused.
- `view()` for `BruteForceIndex` delegates to `load()` since there is no mmap mode. This is documented on the method.
- `reserve()` for `BruteForceIndex` is a no-op since HashMap resizes automatically. This is documented on the method.
- Do NOT add the `usearch` crate dependency in this task. That is Task 02.
- Do NOT implement `l2_normalize()` in this task. That is Task 03 (Embedding Lifecycle).
- Do NOT implement the adaptive query planner in this task. That is Task 04.
- The `l2_distance_sq()` function is `pub(crate)` -- it is used by `BruteForceIndex` and by Task 04's planner for brute-force fallback. It is not a public API.
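
Since `l2_distance_sq()` is shared between `BruteForceIndex` and the Task 04 fallback, the linear scan itself may be worth picturing as a standalone routine. The sketch below is only an illustration under stated assumptions: the free function `linear_scan_top_k` is a hypothetical name, it works on a plain `HashMap` rather than the locked field, it returns `(id, distance)` tuples instead of `VectorSearchResult`, and the tie-break on ID is an added choice that the spec does not require.

```rust
use std::collections::HashMap;

/// Squared L2 distance, mirroring `l2_distance_sq` from this task.
fn l2_distance_sq(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical free function (not part of the spec'd API): exact top-k
/// over the stored vectors whose ID passes `filter`.
fn linear_scan_top_k(
    vectors: &HashMap<u64, Vec<f32>>,
    query: &[f32],
    k: usize,
    filter: &dyn Fn(u64) -> bool,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        // Apply the predicate first so rejected vectors cost no distance work.
        .filter(|(id, _)| filter(**id))
        .map(|(id, v)| (*id, l2_distance_sq(query, v)))
        .collect();
    // f32 is not `Ord`; `total_cmp` supplies a total order. Breaking ties on
    // ID (an assumption of this sketch) keeps the output deterministic
    // despite HashMap iteration order.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
    scored.truncate(k);
    scored
}
```

Passing `&|_| true` as the filter gives the unfiltered `search()` behavior, while `ef_search` has no counterpart here because an exact scan has no beam width.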