tidaldb/docs/planning/milestone-2/phase-1/task-01-vector-index-trait-and-brute-force.md
jordan 6fdaa1584b feat: complete M1 signal engine — m0p3 samples/docs, m1p5 TidalDb API, examples, and periodic checkpoint
2026-02-20 22:45:10 -07:00


Task 01: VectorIndex Trait + BruteForceIndex

Context

  • Milestone: 2 -- Ranked Retrieval
  • Phase: m2p1 -- Vector Index Integration (USearch)
  • Depends On: None (uses types from m1p1 but no m2p1 tasks)
  • Blocks: Task 02 (USearch Backend), Task 03 (Embedding Lifecycle), Task 04 (Adaptive Query Planner)
  • Complexity: M

Objective

Deliver the VectorIndex trait -- the public interface for all ANN operations in tidalDB -- along with the full type system for vector search (VectorId, VectorSearchResult, VectorIndexConfig, DistanceMetric, QuantizationLevel, VectorError) and two pure-Rust implementations: BruteForceIndex (exact linear-scan search) and MockVectorIndex (predetermined results for unit tests).

The VectorIndex trait is the abstraction boundary. No module outside storage/vector/ will ever know whether USearch, hnsw_rs, or brute-force is behind it. This is the same pattern as StorageEngine in m1p3: define the trait first, implement brute-force for correctness, then add the production backend in the next task.

BruteForceIndex is not a throwaway. It serves three permanent roles:

  1. Correctness oracle -- recall measurements compare HNSW results against BruteForceIndex exact results.
  2. Small datasets -- when the index has fewer than ~10,000 vectors, brute-force is faster than HNSW because there is no graph construction overhead.
  3. Pre-filter fallback -- the adaptive query planner (Task 04) uses BruteForceIndex-style linear scan over bitmap-filtered candidate sets when selectivity < 1%.
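The correctness-oracle role can be illustrated with a small recall computation. This is a minimal sketch, not part of the task deliverables: `recall_at_k` is a hypothetical helper name, and it assumes both ID lists are already sorted by ascending distance and contain no duplicate IDs.

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k IDs that also appear in the approximate top-k.
/// `exact` would come from a brute-force scan, `approx` from the ANN backend.
/// (Hypothetical helper for illustration; assumes IDs are unique per list.)
pub fn recall_at_k(exact: &[u64], approx: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        // Nothing to find, so nothing was missed.
        return 1.0;
    }
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / truth.len() as f64
}
```

For example, if the exact top-4 is `[1, 2, 3, 4]` and the approximate top-4 is `[1, 2, 9, 4]`, three of the four true neighbors were found, giving a recall of 0.75.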

No unsafe code in this task. Pure Rust throughout.

Requirements

  • VectorIndex trait: insert, search, filtered_search, delete, reserve, save, load, view, len, len_live, is_empty, tombstone_ratio
  • All trait methods match the signatures in Spec 07, Section 11
  • VectorIndex: Send + Sync bound
  • VectorId = u64 type alias
  • VectorSearchResult { id: VectorId, distance: f32 } with Debug, Clone
  • VectorIndexConfig with all HNSW parameters
  • DistanceMetric enum: L2, InnerProduct
  • QuantizationLevel enum: F32, F16, Int8
  • VectorError enum with Display, Debug, From<std::io::Error>
  • BruteForceIndex: RwLock<HashMap<VectorId, Vec<f32>>> for storage, linear scan for search
  • BruteForceIndex::search returns results sorted by ascending L2 squared distance
  • BruteForceIndex::filtered_search applies predicate during linear scan, returns only matching results
  • BruteForceIndex::delete removes the vector from the HashMap (true delete, not tombstone)
  • BruteForceIndex::save/load/view use a simple binary format for test persistence
  • MockVectorIndex: predetermined results, call recording for test assertions
  • No unsafe code

Technical Design

Module Structure

tidal/src/storage/vector/
  mod.rs      -- VectorIndex trait, all types, re-exports
  brute.rs    -- BruteForceIndex, MockVectorIndex

Public API

// === storage/vector/mod.rs ===

use std::path::Path;

/// A unique identifier for an entity in the vector index.
/// Corresponds to the u64 representation of the application-provided entity ID.
pub type VectorId = u64;

/// A scored search result from the vector index.
#[derive(Debug, Clone)]
pub struct VectorSearchResult {
    /// Entity ID in the vector index.
    pub id: VectorId,
    /// L2 squared distance from query vector. Lower = more similar.
    /// For L2-normalized vectors, range is [0.0, 4.0] where 0.0 = identical.
    pub distance: f32,
}

/// Configuration for vector index construction.
#[derive(Debug, Clone)]
pub struct VectorIndexConfig {
    /// Number of dimensions per vector.
    pub dimensions: usize,
    /// Distance metric.
    pub metric: DistanceMetric,
    /// Quantization level for stored vectors.
    pub quantization: QuantizationLevel,
    /// Maximum connections per node per layer (M parameter). Default: 16.
    pub connectivity: usize,
    /// Beam width during index construction. Default: 200.
    pub ef_construction: usize,
    /// Default beam width during search (overridable per query). Default: 200.
    pub ef_search: usize,
}

impl Default for VectorIndexConfig {
    fn default() -> Self {
        Self {
            dimensions: 1536,
            metric: DistanceMetric::L2,
            quantization: QuantizationLevel::F16,
            connectivity: 16,
            ef_construction: 200,
            ef_search: 200,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DistanceMetric {
    /// L2 squared distance. Default for cosine over normalized vectors.
    L2,
    /// Inner product. For MIPS workloads (with XBOX transformation).
    InnerProduct,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QuantizationLevel {
    /// Full precision (4 bytes per dimension).
    F32,
    /// Half precision (2 bytes per dimension). Default.
    F16,
    /// Scalar quantization (1 byte per dimension).
    Int8,
}

/// Errors from vector index operations.
#[derive(Debug)]
pub enum VectorError {
    /// Vector dimensions do not match index configuration.
    DimensionMismatch { expected: usize, got: usize },
    /// Index is at capacity and cannot accept more vectors.
    CapacityExceeded { capacity: usize },
    /// Vector ID not found in the index.
    NotFound { id: VectorId },
    /// I/O error during persistence.
    Io(std::io::Error),
    /// Index file is corrupted or incompatible.
    CorruptedIndex(String),
    /// USearch or backend-specific error.
    Backend(String),
    /// Vector has zero L2 norm and cannot be normalized.
    ZeroNormVector,
}

// Note: `ZeroNormVector` is not in Spec 07 Section 11 but is required by `l2_normalize()` in Task 03. Spec 07 should be updated to include it.

impl std::fmt::Display for VectorError { /* variant-specific messages */ }
impl std::error::Error for VectorError {}
impl From<std::io::Error> for VectorError { /* wraps as VectorError::Io */ }

/// The vector index trait. All ANN operations go through this interface.
///
/// Implementations must be `Send + Sync` for concurrent search + insert.
///
/// # Contract
///
/// - Vectors passed to `insert()` must already be L2-normalized. The trait
///   does not normalize -- the caller (embedding lifecycle, Task 03) is
///   responsible for normalization before insertion.
/// - `search()` and `filtered_search()` return results sorted by ascending
///   distance (most similar first).
/// - `delete()` excludes a vector from all subsequent search results.
///   Implementations may tombstone lazily (the vector remains in the index
///   structure) or remove the vector eagerly, as BruteForceIndex does.
pub trait VectorIndex: Send + Sync {
    /// Insert a vector into the index.
    ///
    /// If a vector with this ID already exists, it is replaced (delete + insert).
    ///
    /// # Errors
    ///
    /// - `VectorError::CapacityExceeded` if the index is full.
    /// - `VectorError::DimensionMismatch` if `embedding.len() != config.dimensions`.
    fn insert(&self, id: VectorId, embedding: &[f32]) -> Result<(), VectorError>;

    /// Search for the K nearest neighbors to the query vector.
    ///
    /// Results are ordered by ascending distance (most similar first).
    ///
    /// # Arguments
    ///
    /// * `query` -- The query vector. Must be L2-normalized.
    /// * `k` -- Number of results to return.
    /// * `ef_search` -- Beam width override. If 0, uses the index default.
    fn search(
        &self,
        query: &[f32],
        k: usize,
        ef_search: usize,
    ) -> Result<Vec<VectorSearchResult>, VectorError>;

    /// Search for the K nearest neighbors that satisfy a filter predicate.
    ///
    /// The predicate is evaluated during traversal. Nodes failing the predicate
    /// are used for navigation but excluded from results (in-graph filtering).
    ///
    /// # Arguments
    ///
    /// * `query` -- The query vector. Must be L2-normalized.
    /// * `k` -- Number of results to return.
    /// * `ef_search` -- Beam width override. If 0, uses the index default.
    /// * `filter` -- Predicate per candidate node. Return `true` to include.
    fn filtered_search(
        &self,
        query: &[f32],
        k: usize,
        ef_search: usize,
        filter: &dyn Fn(VectorId) -> bool,
    ) -> Result<Vec<VectorSearchResult>, VectorError>;

    /// Remove a vector from the index (lazy tombstone).
    ///
    /// # Errors
    ///
    /// - `VectorError::NotFound` if the ID is not in the index.
    fn delete(&self, id: VectorId) -> Result<(), VectorError>;

    /// Reserve capacity for at least `additional` more vectors.
    fn reserve(&self, additional: usize) -> Result<(), VectorError>;

    /// Persist the index to disk.
    fn save(&self, path: &Path) -> Result<(), VectorError>;

    /// Load an index from disk into writable memory.
    fn load(path: &Path, config: &VectorIndexConfig) -> Result<Self, VectorError>
    where
        Self: Sized;

    /// Memory-map an index from disk for read-only access.
    // config required by USearch to initialize the mmap'd index with correct parameters
    fn view(path: &Path, config: &VectorIndexConfig) -> Result<Self, VectorError>
    where
        Self: Sized;

    /// Number of vectors in the index (including tombstoned).
    fn len(&self) -> usize;

    /// Number of live (non-tombstoned) vectors.
    fn len_live(&self) -> usize;

    /// Whether the index contains no live (non-tombstoned) vectors.
    fn is_empty(&self) -> bool {
        self.len_live() == 0
    }

    /// Ratio of tombstoned vectors to total vectors.
    fn tombstone_ratio(&self) -> f64 {
        if self.len() == 0 {
            0.0
        } else {
            (self.len() - self.len_live()) as f64 / self.len() as f64
        }
    }
}
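The `Display`, `Error`, and `From<std::io::Error>` impls are elided above. One possible shape is sketched below, with the enum repeated so the block stands alone. The message wording and the `source()` override are assumptions for illustration; Spec 07 may prescribe different text.

```rust
use std::fmt;
use std::io;

pub type VectorId = u64;

#[derive(Debug)]
pub enum VectorError {
    DimensionMismatch { expected: usize, got: usize },
    CapacityExceeded { capacity: usize },
    NotFound { id: VectorId },
    Io(io::Error),
    CorruptedIndex(String),
    Backend(String),
    ZeroNormVector,
}

impl fmt::Display for VectorError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Message text is illustrative, not taken from the spec.
        match self {
            Self::DimensionMismatch { expected, got } => {
                write!(f, "dimension mismatch: expected {expected}, got {got}")
            }
            Self::CapacityExceeded { capacity } => {
                write!(f, "index is at capacity ({capacity} vectors)")
            }
            Self::NotFound { id } => write!(f, "vector {id} not found"),
            Self::Io(e) => write!(f, "I/O error: {e}"),
            Self::CorruptedIndex(msg) => write!(f, "corrupted index: {msg}"),
            Self::Backend(msg) => write!(f, "backend error: {msg}"),
            Self::ZeroNormVector => write!(f, "vector has zero L2 norm"),
        }
    }
}

impl std::error::Error for VectorError {
    // Assumption: expose the wrapped I/O error as the source for error chains.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Self::Io(e) => Some(e),
            _ => None,
        }
    }
}

impl From<io::Error> for VectorError {
    fn from(e: io::Error) -> Self {
        Self::Io(e)
    }
}
```

With the `From` impl in place, `save()` and `load()` can use `?` directly on `std::io` calls and have failures surface as `VectorError::Io`.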

BruteForceIndex

// === storage/vector/brute.rs ===

use std::collections::HashMap;
use std::sync::RwLock;
use std::path::Path;
use std::io::{Read, Write, BufReader, BufWriter};
use std::fs::File;
use super::{VectorIndex, VectorId, VectorSearchResult, VectorIndexConfig, VectorError};

/// Exact nearest-neighbor search via linear scan.
///
/// Used for:
/// 1. Correctness verification (recall measurement against HNSW).
/// 2. Small datasets (< 10,000 vectors where brute-force is faster).
/// 3. Pre-filter fallback (adaptive query planner uses brute-force for
///    very selective filters where the filtered set is small).
pub struct BruteForceIndex {
    vectors: RwLock<HashMap<VectorId, Vec<f32>>>,
    config: VectorIndexConfig,
}

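impl BruteForceIndex {
    /// Create an empty index with the given configuration.
    pub fn new(config: VectorIndexConfig) -> Self {
        Self {
            vectors: RwLock::new(HashMap::new()),
            config,
        }
    }

    /// Number of vectors (HashMap length).
    fn vector_count(&self) -> usize {
        self.vectors.read().unwrap().len()
    }
}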

Search implementation:

  • Acquire read lock on vectors
  • Compute L2 squared distance between query and every stored vector
  • Collect (VectorId, f32) pairs into a Vec
  • Sort by ascending distance
  • Take first k results
  • Return as Vec<VectorSearchResult>
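The steps above can be sketched as a free function over a plain `HashMap`, leaving out the lock and the result struct. The name `brute_force_top_k` and the tie-break on ID are assumptions made for this illustration. The tie-break is there only so that equal distances produce a deterministic order despite `HashMap` iteration order.

```rust
use std::collections::HashMap;

/// Exact top-k by squared L2 distance, returned as (id, distance) pairs.
/// (Hypothetical helper; the real method would return VectorSearchResult.)
pub fn brute_force_top_k(
    vectors: &HashMap<u64, Vec<f32>>,
    query: &[f32],
    k: usize,
) -> Vec<(u64, f32)> {
    // Score every stored vector against the query.
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        .map(|(&id, v)| {
            let dist: f32 = v
                .iter()
                .zip(query)
                .map(|(x, y)| (x - y) * (x - y))
                .sum();
            (id, dist)
        })
        .collect();
    // total_cmp is a total order on f32, so a NaN cannot panic the sort.
    // Ties are broken by ID (an assumption) to make the output deterministic.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
    // truncate is a no-op when fewer than k vectors exist.
    scored.truncate(k);
    scored
}
```

A full sort costs O(n log n). For large n and small k, a bounded heap or `select_nth_unstable_by` would avoid sorting the tail, but the simple version matches the steps listed here.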

L2 squared distance function:

/// Compute L2 squared distance between two vectors of equal length.
///
/// For L2-normalized vectors, this is equivalent to `2 - 2 * cos(a, b)`.
/// Returns sum of squared differences.
pub(crate) fn l2_distance_sq(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| {
            let d = x - y;
            d * d
        })
        .sum()
}
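The identity quoted in the doc comment follows from expanding the squared norm. The derivation below also shows where the [0.0, 4.0] range for normalized vectors comes from:

```latex
\begin{aligned}
\lVert a - b \rVert^2 &= \sum_i (a_i - b_i)^2 \\
                      &= \lVert a \rVert^2 + \lVert b \rVert^2 - 2\,(a \cdot b) \\
                      &= 2 - 2\cos(a, b)
                         \quad \text{when } \lVert a \rVert = \lVert b \rVert = 1 .
\end{aligned}
```

Since cos(a, b) lies in [-1, 1], the distance is 0 for identical unit vectors, 2 for orthogonal ones, and 4 for opposite ones. Ranking by ascending L2 squared distance is therefore the same as ranking by descending cosine similarity.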

Persistence (save/load/view):

BruteForceIndex uses a simple binary format for test persistence:

Header:
  [magic: 4 bytes "BFVI"]
  [version: 1 byte (0x01)]
  [dimensions: 4 bytes LE]
  [count: 8 bytes LE]

Per vector:
  [id: 8 bytes LE]
  [vector: dimensions * 4 bytes, f32 LE]

view() loads the same file as load() (brute-force has no mmap mode -- it is always in-memory). This is acceptable because BruteForceIndex is not the production backend.
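The layout above can be exercised against any `Read`/`Write` pair. The following is a sketch only: `write_bfvi` and `read_bfvi` are hypothetical names, the integer fields are assumed to be unsigned, and a real `load()` would map a bad header to `VectorError::CorruptedIndex` rather than an `io::Error`.

```rust
use std::io::{self, Read, Write};

const MAGIC: &[u8; 4] = b"BFVI";
const VERSION: u8 = 0x01;

/// Write the header followed by one record per vector.
/// Every vector is assumed to have exactly `dims` elements.
pub fn write_bfvi<W: Write>(mut w: W, dims: u32, vectors: &[(u64, Vec<f32>)]) -> io::Result<()> {
    w.write_all(MAGIC)?;
    w.write_all(&[VERSION])?;
    w.write_all(&dims.to_le_bytes())?;
    w.write_all(&(vectors.len() as u64).to_le_bytes())?;
    for (id, v) in vectors {
        w.write_all(&id.to_le_bytes())?;
        for x in v {
            w.write_all(&x.to_le_bytes())?;
        }
    }
    Ok(())
}

/// Read a file written by `write_bfvi`, returning (dimensions, records).
pub fn read_bfvi<R: Read>(mut r: R) -> io::Result<(u32, Vec<(u64, Vec<f32>)>)> {
    let mut magic = [0u8; 4];
    r.read_exact(&mut magic)?;
    let mut version = [0u8; 1];
    r.read_exact(&mut version)?;
    if &magic != MAGIC || version[0] != VERSION {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad magic or version"));
    }
    let mut b4 = [0u8; 4];
    let mut b8 = [0u8; 8];
    r.read_exact(&mut b4)?;
    let dims = u32::from_le_bytes(b4);
    r.read_exact(&mut b8)?;
    let count = u64::from_le_bytes(b8);
    // Grow incrementally instead of trusting `count` for a pre-allocation,
    // so a corrupt header cannot trigger a huge allocation up front.
    let mut out = Vec::new();
    for _ in 0..count {
        r.read_exact(&mut b8)?;
        let id = u64::from_le_bytes(b8);
        let mut v = Vec::new();
        for _ in 0..dims {
            r.read_exact(&mut b4)?;
            v.push(f32::from_le_bytes(b4));
        }
        out.push((id, v));
    }
    Ok((dims, out))
}
```

The header is 17 bytes (4 + 1 + 4 + 8) and each record is 8 + 4 * dimensions bytes, so two 3-dimensional vectors serialize to 57 bytes. A truncated file surfaces as an `UnexpectedEof` error from `read_exact`.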

Filtered search: Same as search(), but any vector for which filter(id) returns false is skipped before its distance is computed. Brute-force filtered search therefore computes distances only for vectors that pass the filter, which is why it is fast for very selective filters.
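A minimal sketch of the filter-before-distance ordering follows, again over a plain `HashMap`. The function name `brute_force_filtered_top_k` and the tie-break on ID are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Exact top-k among the vectors whose ID passes `filter`.
/// (Hypothetical helper; the real method would return VectorSearchResult.)
pub fn brute_force_filtered_top_k(
    vectors: &HashMap<u64, Vec<f32>>,
    query: &[f32],
    k: usize,
    filter: &dyn Fn(u64) -> bool,
) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        // The predicate runs first, so rejected vectors cost no distance work.
        .filter(|(id, _)| filter(**id))
        .map(|(&id, v)| {
            let dist: f32 = v
                .iter()
                .zip(query)
                .map(|(x, y)| (x - y) * (x - y))
                .sum();
            (id, dist)
        })
        .collect();
    // Ascending distance; ties broken by ID (an assumption) for determinism.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1).then(a.0.cmp(&b.0)));
    scored.truncate(k);
    scored
}
```

The predicate is still evaluated once per stored vector, so the saving is in distance computations and sort size, not in the number of IDs visited.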

MockVectorIndex

/// Configurable mock for unit tests.
///
/// Returns predetermined results from search calls and records all method
/// invocations for verification.
pub struct MockVectorIndex {
    search_results: RwLock<Vec<Vec<VectorSearchResult>>>,
    call_log: RwLock<Vec<VectorIndexCall>>,
    config: VectorIndexConfig,
    inserted_count: RwLock<usize>,
}

#[derive(Debug, Clone)]
pub enum VectorIndexCall {
    Insert { id: VectorId },
    Delete { id: VectorId },
    Search { k: usize, ef_search: usize },
    FilteredSearch { k: usize, ef_search: usize },
    Reserve { additional: usize },
    Save,
    Load,
    View,
}

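impl MockVectorIndex {
    /// Create a mock with predetermined search results.
    ///
    /// Each call to `search()` or `filtered_search()` removes and returns the
    /// first element of `search_results` (FIFO). If empty, returns an empty Vec.
    pub fn new(config: VectorIndexConfig, search_results: Vec<Vec<VectorSearchResult>>) -> Self {
        Self {
            search_results: RwLock::new(search_results),
            call_log: RwLock::new(Vec::new()),
            config,
            inserted_count: RwLock::new(0),
        }
    }

    /// Get a snapshot of the recorded call log.
    pub fn calls(&self) -> Vec<VectorIndexCall> {
        self.call_log.read().unwrap().clone()
    }

    /// Clear the call log.
    pub fn clear_calls(&self) {
        self.call_log.write().unwrap().clear();
    }
}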
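The canned-result and call-recording behavior can be sketched in isolation. The types below (`Hit`, `Call`, `CannedSearch`) are hypothetical stand-ins, not the task's types. The sketch also swaps in `Mutex<VecDeque<_>>` where the struct above uses `RwLock<Vec<_>>`, purely because `pop_front` expresses the first-in, first-out behavior directly.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub id: u64,
    pub distance: f32,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Call {
    Search { k: usize, ef_search: usize },
}

/// Hands out predetermined result sets in order and records every call.
pub struct CannedSearch {
    queue: Mutex<VecDeque<Vec<Hit>>>,
    calls: Mutex<Vec<Call>>,
}

impl CannedSearch {
    pub fn new(results: Vec<Vec<Hit>>) -> Self {
        Self {
            queue: Mutex::new(results.into()),
            calls: Mutex::new(Vec::new()),
        }
    }

    /// Record the call, then return the next canned result set,
    /// or an empty Vec once the queue is exhausted.
    pub fn search(&self, k: usize, ef_search: usize) -> Vec<Hit> {
        self.calls.lock().unwrap().push(Call::Search { k, ef_search });
        self.queue.lock().unwrap().pop_front().unwrap_or_default()
    }

    /// Snapshot of the calls recorded so far.
    pub fn calls(&self) -> Vec<Call> {
        self.calls.lock().unwrap().clone()
    }
}
```

Interior mutability is needed because the trait's `search()` takes `&self`. The mock must mutate its queue and log behind a lock to stay `Send + Sync`.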

Error Handling

  • insert() with wrong dimensions: returns VectorError::DimensionMismatch { expected, got }.
  • search() with wrong query dimensions: returns VectorError::DimensionMismatch.
  • delete() for unknown ID: returns VectorError::NotFound { id }.
  • save()/load() I/O failures: returns VectorError::Io(e).
  • load() with corrupt file: returns VectorError::CorruptedIndex(msg).

Test Strategy

Property Tests

use proptest::prelude::*;

// Insert + search roundtrip: every inserted vector is retrievable.
proptest! {
    #[test]
    fn insert_search_roundtrip(
        dim in 2usize..64,
        n_vectors in 1usize..200,
        k in 1usize..50,
    ) {
        let k = k.min(n_vectors);
        let config = VectorIndexConfig {
            dimensions: dim,
            ..VectorIndexConfig::default()
        };
        let index = BruteForceIndex::new(config);

        // Insert random unit vectors
        let mut rng = proptest::test_runner::TestRng::deterministic_rng(
            proptest::test_runner::RngAlgorithm::ChaCha
        );
        for id in 0..n_vectors as u64 {
            let v: Vec<f32> = (0..dim).map(|_| rng.gen::<f32>() - 0.5).collect();
            let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            let unit: Vec<f32> = v.iter().map(|x| x / norm).collect();
            index.insert(id, &unit).unwrap();
        }

        // Search for each inserted vector: it should be the top-1 result
        for id in 0..n_vectors as u64 {
            // Note: test must be in the module (or use pub(crate) vectors field) to access this private field.
            let v = index.vectors.read().unwrap()[&id].clone();
            let results = index.search(&v, 1, 0).unwrap();
            prop_assert!(!results.is_empty());
            prop_assert_eq!(results[0].id, id);
            prop_assert!(results[0].distance < 1e-6, "self-search should return distance ~0");
        }
    }
}

// Delete excludes tombstoned IDs from search results.
proptest! {
    #[test]
    fn delete_excludes_from_results(
        dim in 2usize..32,
        n_vectors in 5usize..100,
    ) {
        let config = VectorIndexConfig {
            dimensions: dim,
            ..VectorIndexConfig::default()
        };
        let index = BruteForceIndex::new(config);

        // Insert vectors
        // Each vector depends on its index n, so the inserted vectors are distinct.
        let vectors: Vec<Vec<f32>> = (0..n_vectors).map(|n| {
            let v: Vec<f32> = (0..dim).map(|i| ((n * 3 + i * 7 + 13) % 100) as f32 / 100.0 - 0.5).collect();
            let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            v.iter().map(|x| x / norm).collect()
        }).collect();
        for (id, v) in vectors.iter().enumerate() {
            index.insert(id as u64, v).unwrap();
        }

        // Delete the first vector
        index.delete(0).unwrap();

        // Search should not return deleted ID
        let query = &vectors[0];
        let results = index.search(query, n_vectors, 0).unwrap();
        prop_assert!(results.iter().all(|r| r.id != 0),
            "deleted vector should not appear in results");
        prop_assert_eq!(results.len(), n_vectors - 1);
    }
}

// filtered_search honors all predicates.
proptest! {
    #[test]
    fn filtered_search_honors_predicate(
        dim in 2usize..32,
        n_vectors in 10usize..100,
        k in 1usize..20,
    ) {
        let k = k.min(n_vectors / 2);
        let config = VectorIndexConfig {
            dimensions: dim,
            ..VectorIndexConfig::default()
        };
        let index = BruteForceIndex::new(config);

        for id in 0..n_vectors as u64 {
            let v: Vec<f32> = (0..dim).map(|i| ((id as usize * 3 + i * 7) % 100) as f32 / 100.0).collect();
            let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            let unit: Vec<f32> = v.iter().map(|x| x / norm).collect();
            index.insert(id, &unit).unwrap();
        }

        // Filter: only even IDs
        let predicate = |id: VectorId| id % 2 == 0;
        let query: Vec<f32> = (0..dim).map(|i| (i as f32) / dim as f32).collect();
        let norm: f32 = query.iter().map(|x| x * x).sum::<f32>().sqrt();
        let unit_query: Vec<f32> = query.iter().map(|x| x / norm).collect();

        let results = index.filtered_search(&unit_query, k, 0, &predicate).unwrap();
        for r in &results {
            prop_assert!(r.id % 2 == 0,
                "filtered_search returned odd ID {}", r.id);
        }
    }
}

// Search results are sorted by ascending distance.
proptest! {
    #[test]
    fn results_sorted_by_distance(
        dim in 2usize..32,
        n_vectors in 5usize..100,
        k in 2usize..50,
    ) {
        let k = k.min(n_vectors);
        let config = VectorIndexConfig {
            dimensions: dim,
            ..VectorIndexConfig::default()
        };
        let index = BruteForceIndex::new(config);

        for id in 0..n_vectors as u64 {
            let v: Vec<f32> = (0..dim).map(|i| ((id as usize + i) % 100) as f32 / 100.0).collect();
            let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
            let unit: Vec<f32> = v.iter().map(|x| x / norm).collect();
            index.insert(id, &unit).unwrap();
        }

        let query: Vec<f32> = vec![1.0 / (dim as f32).sqrt(); dim];
        let results = index.search(&query, k, 0).unwrap();
        for w in results.windows(2) {
            prop_assert!(w[0].distance <= w[1].distance,
                "results not sorted: {} > {}", w[0].distance, w[1].distance);
        }
    }
}

Unit Tests

#[test]
fn brute_force_new_is_empty() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    assert_eq!(index.len(), 0);
    assert_eq!(index.len_live(), 0);
    assert!(index.is_empty());
    assert!((index.tombstone_ratio() - 0.0).abs() < f64::EPSILON);
}

#[test]
fn brute_force_insert_and_len() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    index.insert(2, &[0.0, 1.0, 0.0]).unwrap();
    assert_eq!(index.len(), 2);
    assert_eq!(index.len_live(), 2);
    assert!(!index.is_empty());
}

#[test]
fn brute_force_dimension_mismatch() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    let result = index.insert(1, &[1.0, 0.0]); // 2 dims instead of 3
    assert!(matches!(result, Err(VectorError::DimensionMismatch { expected: 3, got: 2 })));
}

#[test]
fn brute_force_search_dimension_mismatch() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    let result = index.search(&[1.0, 0.0], 1, 0); // 2 dims query
    assert!(matches!(result, Err(VectorError::DimensionMismatch { .. })));
}

#[test]
fn brute_force_self_search_distance_zero() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    let v = [1.0, 0.0, 0.0];
    index.insert(42, &v).unwrap();
    let results = index.search(&v, 1, 0).unwrap();
    assert_eq!(results.len(), 1);
    assert_eq!(results[0].id, 42);
    assert!(results[0].distance < 1e-6);
}

#[test]
fn brute_force_search_empty_index() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    let results = index.search(&[1.0, 0.0, 0.0], 10, 0).unwrap();
    assert!(results.is_empty());
}

#[test]
fn brute_force_search_k_larger_than_index() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    index.insert(2, &[0.0, 1.0, 0.0]).unwrap();
    let results = index.search(&[1.0, 0.0, 0.0], 100, 0).unwrap();
    assert_eq!(results.len(), 2); // returns all available, not error
}

#[test]
fn brute_force_orthogonal_vectors_distance() {
    // For unit vectors a, b: ||a - b||^2 = 2 - 2*cos(a,b)
    // Orthogonal unit vectors: cos = 0, so distance = 2.0
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    let results = index.search(&[0.0, 1.0, 0.0], 1, 0).unwrap();
    assert!((results[0].distance - 2.0).abs() < 1e-5,
        "orthogonal unit vectors should have L2^2 distance of 2.0, got {}", results[0].distance);
}

#[test]
fn brute_force_identical_vectors_distance() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    let v = [0.577_350_3, 0.577_350_3, 0.577_350_3]; // unit vector
    index.insert(1, &v).unwrap();
    let results = index.search(&v, 1, 0).unwrap();
    assert!(results[0].distance < 1e-6);
}

#[test]
fn brute_force_delete_and_search() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    index.insert(2, &[0.0, 1.0, 0.0]).unwrap();
    index.insert(3, &[0.0, 0.0, 1.0]).unwrap();

    index.delete(2).unwrap();
    assert_eq!(index.len(), 2); // BruteForce does true delete
    assert_eq!(index.len_live(), 2);

    let results = index.search(&[0.0, 1.0, 0.0], 10, 0).unwrap();
    assert!(results.iter().all(|r| r.id != 2));
}

#[test]
fn brute_force_delete_not_found() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    let result = index.delete(999);
    assert!(matches!(result, Err(VectorError::NotFound { id: 999 })));
}

#[test]
fn brute_force_insert_replaces_existing() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    index.insert(1, &[0.0, 1.0, 0.0]).unwrap(); // replace

    assert_eq!(index.len(), 1); // still 1 vector
    let results = index.search(&[0.0, 1.0, 0.0], 1, 0).unwrap();
    assert_eq!(results[0].id, 1);
    assert!(results[0].distance < 1e-6, "should match the replacement vector");
}

#[test]
fn brute_force_filtered_search_excludes_non_matching() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    for id in 0..10u64 {
        let v = [1.0, 0.0, 0.0]; // all same direction
        index.insert(id, &v).unwrap();
    }

    // Only include even IDs
    let results = index.filtered_search(&[1.0, 0.0, 0.0], 10, 0, &|id| id % 2 == 0).unwrap();
    assert_eq!(results.len(), 5);
    assert!(results.iter().all(|r| r.id % 2 == 0));
}

#[test]
fn brute_force_filtered_search_empty_result() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();

    // Predicate that matches nothing
    let results = index.filtered_search(&[1.0, 0.0, 0.0], 10, 0, &|_| false).unwrap();
    assert!(results.is_empty());
}

#[test]
fn brute_force_save_load_roundtrip() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config.clone());
    index.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    index.insert(2, &[0.0, 1.0, 0.0]).unwrap();

    let dir = tempfile::tempdir().unwrap();
    let path = dir.path().join("test.bfvi");
    index.save(&path).unwrap();

    let loaded = BruteForceIndex::load(&path, &config).unwrap();
    assert_eq!(loaded.len(), 2);

    // Search should produce identical results
    let results_orig = index.search(&[1.0, 0.0, 0.0], 2, 0).unwrap();
    let results_loaded = loaded.search(&[1.0, 0.0, 0.0], 2, 0).unwrap();
    assert_eq!(results_orig.len(), results_loaded.len());
    for (a, b) in results_orig.iter().zip(results_loaded.iter()) {
        assert_eq!(a.id, b.id);
        assert!((a.distance - b.distance).abs() < 1e-6);
    }
}

#[test]
fn brute_force_reserve_is_noop() {
    // BruteForce uses HashMap, which resizes automatically.
    // reserve() is a noop but must not error.
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let index = BruteForceIndex::new(config);
    assert!(index.reserve(1_000_000).is_ok());
}

#[test]
fn l2_distance_sq_correctness() {
    let a = [1.0, 0.0, 0.0];
    let b = [0.0, 1.0, 0.0];
    let dist = l2_distance_sq(&a, &b);
    assert!((dist - 2.0).abs() < 1e-6);

    let c = [1.0, 0.0, 0.0];
    assert!(l2_distance_sq(&a, &c) < 1e-6);
}

#[test]
fn mock_vector_index_returns_predetermined() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let results = vec![
        vec![VectorSearchResult { id: 42, distance: 0.1 }],
        vec![VectorSearchResult { id: 99, distance: 0.5 }],
    ];
    let mock = MockVectorIndex::new(config, results);

    let r1 = mock.search(&[1.0, 0.0, 0.0], 1, 0).unwrap();
    assert_eq!(r1[0].id, 42);

    let r2 = mock.search(&[0.0, 1.0, 0.0], 1, 0).unwrap();
    assert_eq!(r2[0].id, 99);

    // Third call: no more results, returns empty
    let r3 = mock.search(&[0.0, 0.0, 1.0], 1, 0).unwrap();
    assert!(r3.is_empty());
}

#[test]
fn mock_vector_index_records_calls() {
    let config = VectorIndexConfig { dimensions: 3, ..VectorIndexConfig::default() };
    let mock = MockVectorIndex::new(config, vec![]);

    mock.insert(1, &[1.0, 0.0, 0.0]).unwrap();
    mock.delete(1).unwrap();
    mock.search(&[1.0, 0.0, 0.0], 10, 200).unwrap();
    mock.filtered_search(&[1.0, 0.0, 0.0], 5, 0, &|_| true).unwrap();

    let calls = mock.calls();
    assert_eq!(calls.len(), 4);
    assert!(matches!(calls[0], VectorIndexCall::Insert { id: 1 }));
    assert!(matches!(calls[1], VectorIndexCall::Delete { id: 1 }));
    assert!(matches!(calls[2], VectorIndexCall::Search { k: 10, ef_search: 200 }));
    assert!(matches!(calls[3], VectorIndexCall::FilteredSearch { k: 5, ef_search: 0 }));
}

#[test]
fn vector_index_is_send_and_sync() {
    fn assert_send_sync<T: Send + Sync>() {}
    assert_send_sync::<BruteForceIndex>();
    assert_send_sync::<MockVectorIndex>();
}

#[test]
fn vector_index_config_defaults() {
    let config = VectorIndexConfig::default();
    assert_eq!(config.dimensions, 1536);
    assert_eq!(config.metric, DistanceMetric::L2);
    assert_eq!(config.quantization, QuantizationLevel::F16);
    assert_eq!(config.connectivity, 16);
    assert_eq!(config.ef_construction, 200);
    assert_eq!(config.ef_search, 200);
}

Acceptance Criteria

  • VectorIndex trait with all methods from Spec 07, Section 11
  • VectorIndex: Send + Sync bound
  • VectorId = u64 type alias
  • VectorSearchResult, VectorIndexConfig, DistanceMetric, QuantizationLevel, VectorError types with correct derives
  • VectorIndexConfig::default() returns dimensions=1536, L2, F16, M=16, ef_construction=200, ef_search=200
  • VectorError implements Display, Error, From<std::io::Error>
  • l2_distance_sq() computes correct L2 squared distance
  • BruteForceIndex::search() returns exact nearest neighbors sorted by ascending distance
  • BruteForceIndex::filtered_search() returns only results where filter(id) == true
  • BruteForceIndex::insert() validates dimensions and rejects mismatches
  • BruteForceIndex::insert() replaces existing vectors with the same ID
  • BruteForceIndex::delete() removes vectors; they never appear in search results
  • BruteForceIndex::delete() returns NotFound for unknown IDs
  • BruteForceIndex::save() and load() roundtrip produces identical search results
  • MockVectorIndex returns predetermined results and records call history
  • All property tests pass: insert+search roundtrip, delete exclusion, filtered_search predicate honor, result ordering
  • BruteForceIndex and MockVectorIndex are Send + Sync
  • No unsafe code
  • cargo clippy -- -D warnings passes
  • All property tests and unit tests pass

Research References

  • docs/research/ann_for_tidaldb.md -- Section "Implementation recommendation: wrap USearch, build the planner": "A BruteForceIndex exists for correctness verification and small-dataset deployments", brute-force breakeven point (~2,000-5,000 vectors)

Spec References

  • docs/specs/07-vector-retrieval.md -- Section 11 (VectorIndex trait: full API signatures, VectorError variants, BruteForceIndex implementation sketch, MockVectorIndex), Section 12 (performance targets), Section 13 (invariants 1-3: insert retrievability, delete exclusion, filtered_search predicate compliance)

Implementation Notes

  • Add pub mod vector; to tidal/src/storage/mod.rs. The vector module is a submodule of storage because vector indexes are a storage concern (persistence, key encoding, entity store integration).
  • BruteForceIndex uses true deletion (HashMap::remove), not lazy tombstoning. This means len() and len_live() always return the same value. The tombstone_ratio() default implementation handles this correctly (returns 0.0). USearch (Task 02) uses lazy tombstoning, where len() > len_live().
  • The ef_search parameter is ignored by BruteForceIndex (exact search has no beam width). It is accepted for trait compliance but unused.
  • view() for BruteForceIndex delegates to load() since there is no mmap mode. This is documented on the method.
  • reserve() for BruteForceIndex is a no-op since HashMap resizes automatically. This is documented on the method.
  • Do NOT add the usearch crate dependency in this task. That is Task 02.
  • Do NOT implement l2_normalize() in this task. That is Task 03 (Embedding Lifecycle).
  • Do NOT implement the adaptive query planner in this task. That is Task 04.
  • The l2_distance_sq() function is pub(crate) -- it is used by BruteForceIndex and by Task 04's planner for brute-force fallback. It is not a public API.