tidaldb/docs/planning/milestone-5/phase-2/task-02-retrieval-mode-router.md

# Task 02: Retrieval Mode Router

## Delivers

`RetrievalMode` enum and `route_results()` function. `RetrievalMode::determine()` selects text-only, vector-only, or hybrid retrieval based on which inputs the query provides. `route_results()` sends pre-retrieved result lists down the matching path: score conversion for the single-mode cases, `HybridFusion::fuse()` for hybrid. Also delivers the `ann_to_ranked()` helper and a Criterion benchmark confirming that fusion takes < 1 ms at 1000 candidates per list.

## Complexity: S

## Dependencies

- Task 01 COMPLETE: `HybridFusion` with `fuse()` method in `tidal/src/query/fusion.rs`
- m5p1 COMPLETE: `EntityId` type
- m2p1 COMPLETE: `VectorSearchResult { id: VectorId, distance: f32 }` in `tidal/src/storage/vector/`

## Technical Design

### RetrievalMode

```rust
// tidal/src/query/fusion.rs (additions)

/// Which retrieval system(s) to use for a search query.
///
/// Determined by what the query provides:
/// - `TextOnly` — only `query_text` is present
/// - `VectorOnly` — only `query_vector` is present
/// - `Hybrid` — both `query_text` and `query_vector` are present
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetrievalMode {
    /// Execute BM25 text search only.
    TextOnly,
    /// Execute ANN vector search only.
    VectorOnly,
    /// Execute both and fuse results via RRF.
    Hybrid,
}

impl RetrievalMode {
    /// Determine the retrieval mode from query contents.
    ///
    /// Returns `None` if neither text nor vector is provided (invalid query).
    #[must_use]
    pub fn determine(has_text: bool, has_vector: bool) -> Option<Self> {
        match (has_text, has_vector) {
            (true, false) => Some(Self::TextOnly),
            (false, true) => Some(Self::VectorOnly),
            (true, true) => Some(Self::Hybrid),
            (false, false) => None,
        }
    }
}
```
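A call site still has to derive the two flags from the query. The sketch below shows one way to do that. `SearchQuery` and `mode_for` are hypothetical names, and treating blank text or an empty vector as absent is an assumption, not something this task specifies. The enum is copied here only so the sketch compiles on its own.

```rust
/// Minimal copy of the enum from this task so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetrievalMode {
    TextOnly,
    VectorOnly,
    Hybrid,
}

impl RetrievalMode {
    #[must_use]
    pub fn determine(has_text: bool, has_vector: bool) -> Option<Self> {
        match (has_text, has_vector) {
            (true, false) => Some(Self::TextOnly),
            (false, true) => Some(Self::VectorOnly),
            (true, true) => Some(Self::Hybrid),
            (false, false) => None,
        }
    }
}

/// Hypothetical query shape; the real query type is defined elsewhere.
pub struct SearchQuery {
    pub query_text: Option<String>,
    pub query_vector: Option<Vec<f32>>,
}

/// Hypothetical helper: derive the flags from the optional fields.
/// Assumption: blank text and zero-length vectors count as "not provided".
pub fn mode_for(query: &SearchQuery) -> Option<RetrievalMode> {
    let has_text = query
        .query_text
        .as_deref()
        .is_some_and(|text| !text.trim().is_empty());
    let has_vector = query
        .query_vector
        .as_deref()
        .is_some_and(|vector| !vector.is_empty());
    RetrievalMode::determine(has_text, has_vector)
}
```

Keeping `determine()` on plain booleans leaves the "what counts as present" policy at the call site, where it can change without touching the router.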

### route_results()

```rust
/// Route pre-retrieved result lists through the appropriate fusion path.
///
/// - `TextOnly`: widens BM25 scores to `f64`, preserving the caller's descending order.
/// - `VectorOnly`: replaces each ANN distance with a rank-based score, `1 / (k + rank)`.
/// - `Hybrid`: calls `HybridFusion::fuse()` and returns the fused result.
///
/// # Inputs
///
/// - `bm25_results`: `(EntityId, f32)` where f32 is BM25 score, **pre-sorted descending**.
/// - `ann_results`: `(EntityId, f32)` where f32 is L2-squared distance, **pre-sorted ascending**.
/// - Both slices may be empty; callers pass `&[]` for unused modes.
///
/// # Returns
///
/// `Vec<(EntityId, f64)>` sorted descending by score. Scores are not normalized and
/// are not comparable across modes: `TextOnly` returns raw BM25 scores, `VectorOnly`
/// returns `1 / (k + rank)`, and `Hybrid` returns raw RRF values (at most about
/// 0.033 for k=60 with two lists).
pub fn route_results(
    mode: RetrievalMode,
    bm25_results: &[(EntityId, f32)],
    ann_results: &[(EntityId, f32)],
    fusion: &HybridFusion,
) -> Vec<(EntityId, f64)> {
    match mode {
        RetrievalMode::TextOnly => {
            // Convert f32 BM25 scores to f64; already sorted descending by caller.
            bm25_results
                .iter()
                .map(|(id, score)| (*id, f64::from(*score)))
                .collect()
        }
        RetrievalMode::VectorOnly => {
            // Convert rank position to a score using the same RRF formula for
            // consistency: score = 1.0 / (k + rank). This puts ANN-only scores on
            // the same scale as one list's contribution to a hybrid score.
            let k = f64::from(fusion.k);
            ann_results
                .iter()
                .enumerate()
                .map(|(i, (id, _distance))| {
                    let rank = (i + 1) as f64;
                    (*id, 1.0 / (k + rank))
                })
                .collect()
        }
        RetrievalMode::Hybrid => fusion.fuse(bm25_results, ann_results),
    }
}
```
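To make the score scales concrete, here is a small worked sketch of the rank formula. `rank_score` and `max_fused_score` are hypothetical helpers, and the upper bound assumes `fuse()` sums one `1 / (k + rank)` term per list, as textbook RRF does. The actual Task 01 implementation may differ.

```rust
/// Reciprocal-rank score for a 1-based rank: 1 / (k + rank).
pub fn rank_score(rank: usize, k: f64) -> f64 {
    1.0 / (k + rank as f64)
}

/// Upper bound on a fused score: the id is ranked first in every list.
/// Assumes the fused score is the sum of per-list rank scores.
pub fn max_fused_score(lists: usize, k: f64) -> f64 {
    lists as f64 * rank_score(1, k)
}
```

With k = 60, the best `VectorOnly` score is 1/61 (about 0.0164), rank 1000 scores 1/1060 (about 0.00094), and the best possible two-list hybrid score is 2/61 (about 0.0328). Callers should therefore not apply one score threshold across modes.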

### ann_to_ranked()

A helper to convert `Vec<VectorSearchResult>` (returned by `VectorIndex::search()`) to `Vec<(EntityId, f32)>` suitable as input to `fuse()` or `route_results()`:

```rust
use crate::storage::vector::VectorSearchResult;

/// Convert ANN search results to a ranked list for fusion input.
///
/// The input slice is already sorted ascending by distance (best first).
/// This function maps each hit to `(EntityId, f32)`, where the f32 is the distance
/// exactly as the index reported it. The caller passes the result to `fuse()` or
/// `route_results()`, which use position as rank.
#[must_use]
pub fn ann_to_ranked(ann_results: &[VectorSearchResult]) -> Vec<(EntityId, f32)> {
    ann_results
        .iter()
        .map(|r| (EntityId::new(r.id), r.distance))
        .collect()
}
```
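Because downstream code uses position as rank, an unsorted input would silently produce wrong scores. One option is to check the precondition in debug builds, sketched below. `VectorSearchResult` here is a stand-in with `VectorId` assumed to be `u64`, `EntityId` is replaced by a bare `u64`, and both function names are hypothetical.

```rust
/// Stand-in for the real `VectorSearchResult`; `VectorId` is assumed to be `u64`.
#[derive(Debug, Clone, Copy)]
pub struct VectorSearchResult {
    pub id: u64,
    pub distance: f32,
}

/// Returns true when distances never decrease from one hit to the next.
/// A NaN distance makes the comparison false, so such input counts as unsorted.
pub fn is_sorted_by_distance(results: &[VectorSearchResult]) -> bool {
    results
        .windows(2)
        .all(|pair| pair[0].distance <= pair[1].distance)
}

/// Hypothetical variant of the conversion that checks ordering in debug builds only.
pub fn ann_to_ranked_checked(results: &[VectorSearchResult]) -> Vec<(u64, f32)> {
    debug_assert!(
        is_sorted_by_distance(results),
        "ANN results must be sorted ascending by distance"
    );
    results.iter().map(|r| (r.id, r.distance)).collect()
}
```

The check is a single linear pass and compiles away in release builds, so it should not affect the benchmark target.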

### Module Integration

Add to `tidal/src/query/mod.rs`:
```rust
pub use fusion::{HybridFusion, RetrievalMode, ann_to_ranked, route_results};
```

### Criterion Benchmark

```rust
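// tidal/benches/fusion.rs

use criterion::{black_box, criterion_group, criterion_main, Criterion};
// Also import `EntityId` and `HybridFusion` from the crate under test;
// the exact paths depend on its public re-exports.

fn bench_rrf_1k_per_list(c: &mut Criterion) {
    // 1000 BM25 results, scores descending
    let bm25: Vec<(EntityId, f32)> = (0u64..1000)
        .map(|i| (EntityId::new(i), (1000 - i) as f32))
        .collect();
    // 1000 ANN results, distances ascending, 50% overlap with BM25 (ids 500..1000)
    let ann: Vec<(EntityId, f32)> = (500u64..1500)
        .enumerate()
        .map(|(i, id)| (EntityId::new(id), i as f32 * 0.001))
        .collect();

    let fusion = HybridFusion::new();

    c.bench_function("rrf_fuse_1k_per_list", |b| {
        b.iter(|| {
            let results = fusion.fuse(black_box(&bm25), black_box(&ann));
            black_box(results)
        });
    });
}

// Required because the bench target sets `harness = false`.
criterion_group!(benches, bench_rrf_1k_per_list);
criterion_main!(benches);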
```

## Acceptance Criteria

- [ ] `RetrievalMode` enum with `TextOnly`, `VectorOnly`, `Hybrid` variants in `fusion.rs`
- [ ] `RetrievalMode::determine(has_text, has_vector) -> Option<RetrievalMode>` returns correct variant
- [ ] `determine(false, false)` returns `None`
- [ ] `route_results(mode, bm25, ann, fusion) -> Vec<(EntityId, f64)>` implemented
- [ ] `TextOnly` path: BM25 scores converted to f64, list preserved
- [ ] `VectorOnly` path: ANN results converted to rank-based scores via `1.0 / (k + rank)`
- [ ] `Hybrid` path: calls `HybridFusion::fuse()` and returns result
- [ ] `ann_to_ranked(ann_results: &[VectorSearchResult]) -> Vec<(EntityId, f32)>` helper
- [ ] `RetrievalMode`, `route_results`, `ann_to_ranked` exported from `tidal/src/query/mod.rs`
- [ ] `tidal/benches/fusion.rs` created with Criterion benchmark `rrf_fuse_1k_per_list`
- [ ] Benchmark result confirms fusion < 1ms for 1000 candidates per list
- [ ] `[[bench]] name = "fusion" harness = false` added to `tidal/Cargo.toml`
- [ ] Unit tests: `determine_text_only`, `determine_vector_only`, `determine_hybrid`, `determine_none`, `route_text_only_passthrough`, `route_vector_only_rank_based`, `route_hybrid_calls_fuse`, `ann_to_ranked_converts_correctly`
- [ ] `cargo check`, `cargo fmt`, `cargo clippy -- -D warnings` all pass

## Test Strategy

```rust
#[test]
fn determine_text_only() {
    assert_eq!(RetrievalMode::determine(true, false), Some(RetrievalMode::TextOnly));
}

#[test]
fn determine_hybrid() {
    assert_eq!(RetrievalMode::determine(true, true), Some(RetrievalMode::Hybrid));
}

#[test]
fn determine_none() {
    assert_eq!(RetrievalMode::determine(false, false), None);
}

#[test]
fn route_text_only_passthrough() {
    let bm25 = vec![(EntityId::new(1), 1.0f32), (EntityId::new(2), 0.5f32)];
    let fusion = HybridFusion::new();
    let results = route_results(RetrievalMode::TextOnly, &bm25, &[], &fusion);
    assert_eq!(results.len(), 2);
    assert!((results[0].1 - 1.0f64).abs() < 1e-6);  // f32 → f64 exact
    assert!((results[1].1 - 0.5f64).abs() < 1e-6);
}

#[test]
fn route_vector_only_rank_based() {
    // ANN input order: rank 1 (index 0) gets score 1/(60+1), assuming the default k = 60
    let ann = vec![
        (EntityId::new(1), 0.1f32),  // rank 1
        (EntityId::new(2), 0.2f32),  // rank 2
    ];
    let fusion = HybridFusion::new();
    let results = route_results(RetrievalMode::VectorOnly, &[], &ann, &fusion);
    assert_eq!(results.len(), 2);
    let expected_rank1 = 1.0 / (60.0 + 1.0);
    let expected_rank2 = 1.0 / (60.0 + 2.0);
    assert!((results[0].1 - expected_rank1).abs() < 1e-9);
    assert!((results[1].1 - expected_rank2).abs() < 1e-9);
}

#[test]
fn ann_to_ranked_converts_correctly() {
    use crate::storage::vector::VectorSearchResult;
    let ann_results = vec![
        VectorSearchResult { id: 42, distance: 0.1 },
        VectorSearchResult { id: 99, distance: 0.3 },
    ];
    let ranked = ann_to_ranked(&ann_results);
    assert_eq!(ranked.len(), 2);
    assert_eq!(ranked[0].0.as_u64(), 42);
    assert!((ranked[0].1 - 0.1f32).abs() < 1e-6);
    assert_eq!(ranked[1].0.as_u64(), 99);
}
```
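The acceptance criteria also list `determine_vector_only` and `route_hybrid_calls_fuse`, which are not shown above. The sketch below suggests what they might look like. To compile on its own it uses stand-ins: `EntityId` as a plain `u64`, a textbook RRF `HybridFusion` with an assumed default of k = 60, and a copy of `route_results()`. In the crate, the tests would use the real types instead.

```rust
use std::collections::HashMap;

// Stand-ins so the sketch compiles alone; the real items live in the crate.
type EntityId = u64;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetrievalMode {
    TextOnly,
    VectorOnly,
    Hybrid,
}

impl RetrievalMode {
    pub fn determine(has_text: bool, has_vector: bool) -> Option<Self> {
        match (has_text, has_vector) {
            (true, false) => Some(Self::TextOnly),
            (false, true) => Some(Self::VectorOnly),
            (true, true) => Some(Self::Hybrid),
            (false, false) => None,
        }
    }
}

pub struct HybridFusion {
    pub k: u32,
}

impl HybridFusion {
    pub fn new() -> Self {
        Self { k: 60 } // assumed default
    }

    /// Textbook RRF: sum 1 / (k + rank) over every list an id appears in.
    pub fn fuse(&self, bm25: &[(EntityId, f32)], ann: &[(EntityId, f32)]) -> Vec<(EntityId, f64)> {
        let k = f64::from(self.k);
        let mut scores: HashMap<EntityId, f64> = HashMap::new();
        for list in [bm25, ann] {
            for (i, (id, _)) in list.iter().enumerate() {
                *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
            }
        }
        let mut fused: Vec<(EntityId, f64)> = scores.into_iter().collect();
        // Highest score first; ties broken by id so the order is deterministic.
        fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
        fused
    }
}

pub fn route_results(
    mode: RetrievalMode,
    bm25: &[(EntityId, f32)],
    ann: &[(EntityId, f32)],
    fusion: &HybridFusion,
) -> Vec<(EntityId, f64)> {
    match mode {
        RetrievalMode::TextOnly => bm25.iter().map(|(id, s)| (*id, f64::from(*s))).collect(),
        RetrievalMode::VectorOnly => {
            let k = f64::from(fusion.k);
            ann.iter()
                .enumerate()
                .map(|(i, (id, _))| (*id, 1.0 / (k + (i + 1) as f64)))
                .collect()
        }
        RetrievalMode::Hybrid => fusion.fuse(bm25, ann),
    }
}

#[test]
fn determine_vector_only() {
    assert_eq!(RetrievalMode::determine(false, true), Some(RetrievalMode::VectorOnly));
}

#[test]
fn route_hybrid_calls_fuse() {
    let bm25: Vec<(EntityId, f32)> = vec![(1, 2.0), (2, 1.0)];
    let ann: Vec<(EntityId, f32)> = vec![(2, 0.1), (3, 0.2)];
    let fusion = HybridFusion::new();
    let routed = route_results(RetrievalMode::Hybrid, &bm25, &ann, &fusion);
    // Hybrid must be a pure delegation to `fuse()`.
    assert_eq!(routed, fusion.fuse(&bm25, &ann));
    // Entity 2 appears in both lists, so it ranks first.
    assert_eq!(routed[0].0, 2);
}
```

Comparing against a direct `fuse()` call keeps the hybrid test independent of the exact RRF scores, so it stays valid if Task 01 later tunes the fusion.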