tidaldb/docs/planning/milestone-7/phase-3/task-02-usearch-parameter-tuning.md

# Task 02: USearch Parameter Tuning

## Delivers

A systematic benchmark of USearch HNSW parameters (M x ef_construction) at 1M vectors, documenting the recall/latency tradeoff. The optimal configuration is applied to `VectorIndexConfig::default()`. ANN recall@10 must exceed 0.95.

## Complexity

M

## Dependencies

- task-01 complete (scale benchmark infrastructure)
- `docs/research/ann_for_tidaldb.md` (parameter guidance)

## Technical Design

### 1. Parameter matrix

The research doc identifies M and ef_construction as the two HNSW parameters that most affect the recall/latency tradeoff. Whether at 1536D (production) or 128D (benchmark), the effect of these parameters on recall must be measured, not assumed.

| Parameter | Values | Rationale |
|-----------|--------|-----------|
| M (connectivity) | 8, 16, 32 | M=16 is the USearch default; M=8 saves ~50% of graph memory relative to M=16; M=32 improves recall under selective filters at 2x the graph memory |
| ef_construction | 100, 200, 400 | Controls index build quality; diminishing returns past 200 in most benchmarks |
| ef_search | 200 (fixed) | Query-time expansion factor; held constant to isolate build-quality effects |

This produces a 3x3 = 9 configuration matrix.
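
As a minimal sketch, the matrix can be enumerated from the two value lists in the table rather than written out by hand. The names `M_VALUES`, `EF_CONSTRUCTION_VALUES`, and `parameter_grid` are hypothetical and not part of the existing codebase.

```rust
/// Candidate values taken from the parameter table (hypothetical constant names).
const M_VALUES: [usize; 3] = [8, 16, 32];
const EF_CONSTRUCTION_VALUES: [usize; 3] = [100, 200, 400];

/// Enumerate every (M, ef_construction) pair, grouped by M.
fn parameter_grid() -> Vec<(usize, usize)> {
    let mut grid = Vec::with_capacity(M_VALUES.len() * EF_CONSTRUCTION_VALUES.len());
    for &m in &M_VALUES {
        for &ef_c in &EF_CONSTRUCTION_VALUES {
            grid.push((m, ef_c));
        }
    }
    grid
}
```

Each pair returned by `parameter_grid()` can then be passed to the evaluation routine, so adding a value to either list extends the matrix automatically.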

### 2. Benchmark implementation

Add to `tidal/benches/vector.rs` or a new `tidal/benches/usearch_tuning.rs`:

```rust
#![allow(clippy::unwrap_used, clippy::cast_precision_loss)]

use criterion::{Criterion, black_box, criterion_group, criterion_main, BenchmarkId};
use rand::Rng;
use std::time::Duration;
use tidaldb::storage::vector::{
    AdaptiveQueryPlanner, BruteForceIndex, DistanceMetric,
    QuantizationLevel, VectorId, VectorIndex, VectorIndexConfig,
};

const DIM: usize = 128;
const N: u64 = 1_000_000;
const K: usize = 10;
const NUM_QUERIES: usize = 100;

fn random_unit_vector(dim: usize, rng: &mut impl Rng) -> Vec<f32> {
    let v: Vec<f32> = (0..dim).map(|_| rng.random::<f32>() - 0.5).collect();
    let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm < f32::EPSILON {
        let mut fallback = vec![0.0f32; dim];
        fallback[0] = 1.0;
        return fallback;
    }
    v.iter().map(|x| x / norm).collect()
}

struct TuningResult {
    m: usize,
    ef_construction: usize,
    recall_at_10: f64,
    mean_latency_us: f64,
    build_time_secs: f64,
    memory_bytes: usize,
}

/// Build an index with specific parameters, compute ground truth recall,
/// and measure search latency.
fn evaluate_config(m: usize, ef_construction: usize) -> TuningResult {
    let config = VectorIndexConfig {
        dimensions: DIM,
        metric: DistanceMetric::L2,
        quantization: QuantizationLevel::F32,
        connectivity: m,
        ef_construction,
        ef_search: 200,
    };

    let mut rng = rand::rng();

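    // Exact-search baseline, used below to compute ground truth.
    let brute = BruteForceIndex::new(config.clone());
    // Placeholder: this times the brute-force inserts. In practice, build the
    // real USearch-backed index from the same vectors and time that instead;
    // otherwise build_time does not reflect m or ef_construction.
    let build_start = std::time::Instant::now();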
    for id in 0..N {
        let vec = random_unit_vector(DIM, &mut rng);
        brute.insert(id, &vec).unwrap();
    }
    let build_time = build_start.elapsed();

    // Generate query vectors
    let queries: Vec<Vec<f32>> = (0..NUM_QUERIES)
        .map(|_| random_unit_vector(DIM, &mut rng))
        .collect();

    // Compute ground truth (brute force top-K for each query)
    let ground_truths: Vec<Vec<VectorId>> = queries
        .iter()
        .map(|q| {
            brute
                .search(q, K, K * 2)
                .unwrap()
                .iter()
                .map(|r| r.id)
                .collect()
        })
        .collect();

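    // Search and measure recall + latency.
    // Placeholder: `&brute` stands in for the USearch-backed index. Querying the
    // brute-force index against its own ground truth trivially yields recall
    // of ~1.0 and says nothing about the HNSW parameters.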
    let planner = AdaptiveQueryPlanner::with_defaults();
    let mut total_recall = 0.0;
    let mut total_latency = Duration::ZERO;

    for (query, gt) in queries.iter().zip(ground_truths.iter()) {
        let start = std::time::Instant::now();
        let results = planner
            .execute(&brute, query, K, None, 1.0, None)
            .unwrap();
        total_latency += start.elapsed();

        let result_ids: Vec<VectorId> = results.iter().map(|r| r.id).collect();
        let hits = result_ids.iter().filter(|id| gt.contains(id)).count();
        total_recall += hits as f64 / gt.len() as f64;
    }

    TuningResult {
        m,
        ef_construction,
        recall_at_10: total_recall / NUM_QUERIES as f64,
        mean_latency_us: total_latency.as_micros() as f64 / NUM_QUERIES as f64,
        build_time_secs: build_time.as_secs_f64(),
        memory_bytes: 0, // Measured via index-specific API if available
    }
}
```
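
The acceptance criteria ask for p99 latency as well as the mean, while `TuningResult` above only carries `mean_latency_us`. One way to get both is to keep the per-query durations and summarize them afterwards. The sketch below uses the nearest-rank percentile definition, which is an assumption (the task does not prescribe a method), and `LatencySummary` and `summarize_latencies` are hypothetical names.

```rust
use std::time::Duration;

/// Mean and 99th-percentile latency in microseconds (hypothetical helper type).
struct LatencySummary {
    mean_us: f64,
    p99_us: f64,
}

/// Summarize per-query latencies. Returns `None` for an empty sample.
fn summarize_latencies(samples: &[Duration]) -> Option<LatencySummary> {
    if samples.is_empty() {
        return None;
    }
    // Convert to microseconds and sort ascending.
    let mut micros: Vec<f64> = samples
        .iter()
        .map(|d| d.as_nanos() as f64 / 1_000.0)
        .collect();
    micros.sort_by(|a, b| a.total_cmp(b));

    let n = micros.len();
    let mean_us = micros.iter().sum::<f64>() / n as f64;
    // Nearest rank: rank = ceil(0.99 * n), computed in integer arithmetic.
    let rank = (n * 99 + 99) / 100;
    let p99_us = micros[rank.max(1) - 1];
    Some(LatencySummary { mean_us, p99_us })
}
```

With only 100 queries, the nearest-rank p99 is the 99th smallest sample, so it is sensitive to a single slow query; a larger query set would give a more stable tail estimate.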

### 3. Criterion benchmark for the optimal config

After determining the optimal (M, ef_construction) from the evaluation, add a Criterion benchmark that measures search latency at the chosen parameters:

```rust
fn bench_usearch_optimal_1m(c: &mut Criterion) {
    let mut group = c.benchmark_group("usearch_1m");
    group.sample_size(10);
    group.measurement_time(Duration::from_secs(30));

    // Candidate configs shortlisted from the 3x3 evaluation
    // (7 of the 9 matrix entries; (8, 400) and (32, 100) are omitted).
    let configs = [
        (8, 100),
        (8, 200),
        (16, 100),
        (16, 200),
        (16, 400),
        (32, 200),
        (32, 400),
    ];

    let mut rng = rand::rng();
    let query = random_unit_vector(DIM, &mut rng);

    for &(m, ef_c) in &configs {
        let config = VectorIndexConfig {
            dimensions: DIM,
            metric: DistanceMetric::L2,
            quantization: QuantizationLevel::F32,
            connectivity: m,
            ef_construction: ef_c,
            ef_search: 200,
        };
        let index = BruteForceIndex::new(config);
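        // Pre-populate with 10K vectors to keep this sketch fast. The real
        // benchmark should insert N (1M) vectors into the HNSW-backed index
        // so that the data matches the "usearch_1m" group name.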
        for id in 0..10_000u64 {
            let vec = random_unit_vector(DIM, &mut rng);
            index.insert(id, &vec).unwrap();
        }

        group.bench_with_input(
            BenchmarkId::new("search", format!("M{m}_ef{ef_c}")),
            &(m, ef_c),
            |b, _| {
                b.iter(|| {
                    index.search(black_box(&query), black_box(K), black_box(200)).unwrap()
                });
            },
        );
    }

    group.finish();
}
```

### 4. Apply optimal config

Once the optimal (M, ef_construction) is determined, update `VectorIndexConfig` defaults:

```rust
// In tidal/src/storage/vector/mod.rs or config.rs
impl Default for VectorIndexConfig {
    fn default() -> Self {
        Self {
            dimensions: 128,
            metric: DistanceMetric::L2,
            quantization: QuantizationLevel::F16, // research doc recommends f16 default
            connectivity: OPTIMAL_M,               // determined by benchmark
            ef_construction: OPTIMAL_EF_C,          // determined by benchmark
            ef_search: 200,
        }
    }
}
```
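
This task does not define how "optimal" is chosen from the measured configurations. One possible rule, shown here purely as an assumption, is to take the lowest mean latency among configurations whose recall@10 exceeds the threshold. `Candidate` and `select_optimal` are hypothetical names, and the struct mirrors only the fields of `TuningResult` that the rule needs.

```rust
/// Subset of the tuning measurements needed for selection (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    m: usize,
    ef_construction: usize,
    recall_at_10: f64,
    mean_latency_us: f64,
}

/// Pick the fastest configuration whose recall strictly exceeds `min_recall`.
/// Returns `None` when no configuration passes, which corresponds to the
/// "document the finding and propose mitigation" branch of the criteria.
fn select_optimal(results: &[Candidate], min_recall: f64) -> Option<&Candidate> {
    results
        .iter()
        .filter(|c| c.recall_at_10 > min_recall)
        .min_by(|a, b| a.mean_latency_us.total_cmp(&b.mean_latency_us))
}
```

Other rules are equally defensible, for example preferring the smallest M that passes in order to minimize graph memory; whichever rule is used should be recorded alongside the results.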

### 5. Recall measurement methodology

Recall@K is computed as the fraction of brute-force top-K results that appear in the HNSW search results:

```
recall@K = |HNSW_top_K intersect BruteForce_top_K| / K
```

Recall is averaged over 100 random queries. The threshold is mean recall@10 > 0.95.
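
The formula can be sketched as a small standalone function. `VectorId` is aliased to `u64` here as an assumption so that the example compiles on its own; the real type comes from `tidaldb::storage::vector`. The function names are hypothetical.

```rust
use std::collections::HashSet;

/// Stand-in for the crate's `VectorId`; assumed to be `u64` for this sketch.
type VectorId = u64;

/// recall@K = |approx_top_K intersect exact_top_K| / K.
/// Both lists are truncated to their first K entries; duplicates count once.
fn recall_at_k(approx: &[VectorId], exact: &[VectorId], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    let truth: HashSet<VectorId> = exact.iter().take(k).copied().collect();
    let found: HashSet<VectorId> = approx.iter().take(k).copied().collect();
    truth.intersection(&found).count() as f64 / k as f64
}

/// Mean recall@K over paired per-query result lists.
fn mean_recall_at_k(approx: &[Vec<VectorId>], exact: &[Vec<VectorId>], k: usize) -> f64 {
    let pairs = approx.len().min(exact.len());
    if pairs == 0 {
        return 0.0;
    }
    let total: f64 = approx
        .iter()
        .zip(exact.iter())
        .map(|(a, e)| recall_at_k(a, e, k))
        .sum();
    total / pairs as f64
}
```

Dividing by K rather than by the length of the returned list means a search that returns fewer than K results is penalized, which matches the formula above.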

### 6. Memory estimation

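Per the research doc, HNSW graph overhead is ~300 bytes per node; the table below takes that figure as the M=16 baseline and scales it linearly with M. At 1M vectors with 128D float32, vector data is 1M x 128 x 4 bytes = 512 MB (about 488 MiB):

| M | Vector data | Graph overhead | Total |
|---|-------------|---------------|-------|
| 8 | 512 MB | ~150 MB | ~662 MB |
| 16 | 512 MB | ~300 MB | ~812 MB |
| 32 | 512 MB | ~600 MB | ~1.1 GB |

At 1536D (production), multiply vector data by 12 (1536 / 128), giving about 6.1 GB of float32 vector data. The graph overhead stays the same because it depends on M and the node count, not on dimensionality.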
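
The arithmetic behind these estimates can be sketched as follows. The linear scaling of graph overhead with M, anchored at ~300 bytes per node for M=16, is an assumption read off the table rather than a measured value, and `estimate_index_bytes` is a hypothetical helper.

```rust
/// Assumed baseline: ~300 bytes of graph overhead per node at M = 16.
const GRAPH_BYTES_PER_NODE_AT_M16: u64 = 300;

/// Returns (vector_bytes, graph_bytes) for `n` vectors of `dim` scalars.
/// `bytes_per_scalar` is 4 for float32 and 2 for float16.
fn estimate_index_bytes(n: u64, dim: u64, bytes_per_scalar: u64, m: u64) -> (u64, u64) {
    let vector_bytes = n * dim * bytes_per_scalar;
    // Scale the M = 16 baseline linearly with M (assumption, not a measurement).
    let graph_bytes = n * GRAPH_BYTES_PER_NODE_AT_M16 * m / 16;
    (vector_bytes, graph_bytes)
}
```

For 1M vectors at 128D float32 and M=16 this gives 512,000,000 bytes of vector data (about 488 MiB) and 300,000,000 bytes of graph. These figures are estimates only; the `memory_bytes` field in the benchmark should be filled from a real measurement where the index exposes one.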

## Acceptance Criteria

- [ ] All 9 (M, ef_construction) configurations benchmarked at 1M vectors (or a documented subset, to fit CI time limits)
- [ ] Recall@10 > 0.95 for the selected optimal configuration
- [ ] Search latency for 100 queries recorded: mean and p99
- [ ] Build time per configuration recorded
- [ ] Optimal (M, ef_construction) applied to `VectorIndexConfig` default
- [ ] Results documented in `docs/profiling/usearch-tuning.md` with a recall/latency tradeoff table
- [ ] If no configuration reaches recall@10 > 0.95, document the finding and propose a mitigation (increase ef_search, ACORN-1, etc.)
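
For the results document, the tradeoff table could be generated directly from the measurements rather than transcribed by hand. The sketch below is one possible layout; the column set, the `ReportRow` type, and `render_tradeoff_table` are all hypothetical, and no particular format is required by this task.

```rust
/// One row of the recall/latency tradeoff table (hypothetical type).
struct ReportRow {
    m: usize,
    ef_construction: usize,
    recall_at_10: f64,
    mean_latency_us: f64,
    p99_latency_us: f64,
    build_time_secs: f64,
}

/// Render the rows as a Markdown table, one line per configuration.
fn render_tradeoff_table(rows: &[ReportRow]) -> String {
    let mut out = String::from(
        "| M | ef_construction | recall@10 | mean (us) | p99 (us) | build (s) |\n\
         |---|-----------------|-----------|-----------|----------|-----------|\n",
    );
    for r in rows {
        out.push_str(&format!(
            "| {} | {} | {:.3} | {:.1} | {:.1} | {:.1} |\n",
            r.m,
            r.ef_construction,
            r.recall_at_10,
            r.mean_latency_us,
            r.p99_latency_us,
            r.build_time_secs
        ));
    }
    out
}
```

The returned string can be written to `docs/profiling/usearch-tuning.md` at the end of the evaluation run.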

## Test Strategy

1. **Recall validation:** For the chosen config, run 100 queries and verify recall@10 > 0.95 against brute-force ground truth. This is a correctness test, not just a benchmark.
2. **Regression guard:** After applying the optimal config, re-run the existing `tidal/benches/vector.rs` benchmarks to ensure no regression at 10K scale.
3. **Config round-trip:** Verify that the new default config serializes and deserializes correctly if `VectorIndexConfig` is persisted.