Task 03: M2 UAT Integration Test

Context

Milestone: 2 -- Ranked Retrieval
Phase: m2p5 -- Query Parser and RETRIEVE Executor
Depends On: Task 01 (Retrieve, Results, QueryError types), Task 02 (RetrieveExecutor, TidalDb::retrieve())
Blocks: Milestone 3 (personalized ranking)
Complexity: M

Objective

Deliver the Milestone 2 User Acceptance Test as a Rust integration test in tidal/tests/m2_uat.rs. This test exercises the complete M2 scenario from the roadmap: open a database with a full schema (5 signal types, 6 ranking profiles), write 10K items with metadata and embeddings, write 10K signal events, execute all 6 profile queries verifying ordering and filter correctness, write a signal burst and verify rank change, and re-verify after shutdown and reopen.

This is the milestone gate. If it passes, Milestone 2 is done. The test proves that "a single query retrieves, scores, and ranks content using live signals" -- the M2 thesis.
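
The ordering checks that this test repeats for every profile reduce to two small predicates. The sketch below is illustrative only: `RankedRow` is a hypothetical stand-in for the result row type, assuming it exposes a numeric `score` and a 1-based `rank` as the test code in this task does. It is not the tidaldb API.

```rust
/// Hypothetical stand-in for one ranked result row. The real result
/// type may carry more fields; only `score` and `rank` matter here.
#[derive(Debug, Clone, PartialEq)]
pub struct RankedRow {
    pub entity_id: u64,
    pub score: f64,
    pub rank: usize,
}

/// Returns true when scores never increase from one row to the next.
/// Ties are allowed. Any comparison involving NaN is false, so a NaN
/// among two or more rows fails the check.
pub fn is_sorted_desc(rows: &[RankedRow]) -> bool {
    rows.windows(2).all(|pair| pair[0].score >= pair[1].score)
}

/// Returns true when ranks are sequential starting at `first_rank`
/// (1 for the first page, 11 for a second page of size 10, and so on).
pub fn ranks_are_sequential(rows: &[RankedRow], first_rank: usize) -> bool {
    rows.iter()
        .enumerate()
        .all(|(i, row)| row.rank == first_rank + i)
}
```

In the UAT itself these checks are written inline as `windows(2)` loops so that a failing assertion can name the offending pair of scores. A helper like this trades that detail for brevity.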

Requirements

  • Full M2 UAT scenario from ROADMAP.md implemented as tidal/tests/m2_uat.rs
  • 10K items with metadata (category, format, creator_id) and 64-dim embeddings
  • 10K signal events spanning 7 days across 5 signal types
  • All 6 RETRIEVE queries executed and verified:
    1. trending with max_per_creator:1 diversity -- 25 results, creator-diverse, score-sorted
    2. hot with category:jazz filter -- only jazz items, score-sorted
    3. new -- created_at descending
    4. top_week -- signal-based ordering within 7d window
    5. hidden_gems -- quality/reach ratio ordering
    6. controversial -- dual-signal ranking
  • Signal burst for item #500, re-query trending, verify rank change
  • Shutdown and reopen, re-verify all queries
  • All tests use tempfile::TempDir for isolation
  • Tests must pass cargo test --test m2_uat
  • Deterministic test data (fixed timestamps, reproducible event sequences)
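
The determinism requirement can be met without a random number generator by deriving every value from the event index. The following is a minimal sketch under stated assumptions: all function names are hypothetical helpers, not part of tidaldb. It shows two properties worth checking for any such generator: a stride that is coprime to the entity count visits every entity exactly once, and equal-width slots place events across the whole time span.

```rust
use std::collections::HashSet;

/// Greatest common divisor (Euclid's algorithm).
pub fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

/// Map index `i` to `(i * stride + 1) % n`. Over `n` consecutive indexes
/// every value in `0..n` is produced exactly once when gcd(stride, n) == 1.
pub fn strided_index(i: u64, stride: u64, n: u64) -> u64 {
    (i * stride + 1) % n
}

/// Count the distinct targets reached by the first `count` indexes.
pub fn distinct_targets(count: u64, stride: u64, n: u64) -> usize {
    (0..count)
        .map(|i| strided_index(i, stride, n))
        .collect::<HashSet<u64>>()
        .len()
}

/// Place event `i` of `count` at a fixed offset inside `span_nanos`,
/// using equal-width slots so the events cover the whole span.
/// Assumes `count > 0`.
pub fn evenly_spread_offset(i: u64, count: u64, span_nanos: u64) -> u64 {
    i * (span_nanos / count)
}
```

With a stride of 7919 and 10,000 entities, `gcd` is 1, so 10,000 events reach 10,000 distinct entities. A generator meant to give some items far more events than others would need a different mapping.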

Technical Design

Module Structure

tidal/tests/
  m2_uat.rs   -- Full M2 UAT integration test

Test Implementation

// === tidal/tests/m2_uat.rs ===

use std::collections::HashMap;
use std::time::Duration;
use tempfile::TempDir;

use tidaldb::query::retrieve::Retrieve;
use tidaldb::ranking::diversity::DiversityConstraints;
use tidaldb::schema::*;
use tidaldb::storage::indexes::filter::FilterExpr;
use tidaldb::{Config, TidalDB};

// ============================================================
// Test Helpers
// ============================================================

/// Build the M2 schema: 5 signal types, 6 ranking profiles, 64-dim embeddings.
fn m2_schema() -> Schema {
    let mut builder = SchemaBuilder::new();

    // Embedding slot for items: 64-dim (small for test speed)
    builder.embedding_slot("default", EntityKind::Item, 64);

    // Signal types
    builder
        .signal(
            "view",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(7 * 24 * 3600), // 7 days
            },
        )
        .windows(&[
            Window::OneHour,
            Window::TwentyFourHours,
            Window::SevenDays,
            Window::AllTime,
        ])
        .velocity(true)
        .add();

    builder
        .signal(
            "like",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(14 * 24 * 3600), // 14 days
            },
        )
        .windows(&[
            Window::TwentyFourHours,
            Window::SevenDays,
            Window::AllTime,
        ])
        .velocity(true)
        .add();

    builder
        .signal(
            "skip",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(24 * 3600), // 1 day
            },
        )
        .windows(&[Window::OneHour, Window::TwentyFourHours])
        .velocity(false)
        .add();

    builder
        .signal(
            "share",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(3 * 24 * 3600), // 3 days
            },
        )
        .windows(&[
            Window::OneHour,
            Window::TwentyFourHours,
            Window::SevenDays,
            // AllTime is read back for item #500 after the share burst.
            Window::AllTime,
        ])
        .velocity(true)
        .add();

    builder
        .signal(
            "completion",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(30 * 24 * 3600), // 30 days
            },
        )
        .windows(&[Window::SevenDays, Window::AllTime])
        .velocity(false)
        .add();

    // Built-in profiles are auto-registered: trending, hot, new, top_week,
    // hidden_gems, controversial, most_viewed, most_liked, shuffle, etc.

    builder.build().unwrap()
}

/// Categories used for test items. 10 distinct values.
const CATEGORIES: &[&str] = &[
    "jazz", "rock", "classical", "electronic", "hip_hop",
    "country", "blues", "folk", "metal", "pop",
];

/// Formats used for test items. 4 distinct values.
const FORMATS: &[&str] = &["video", "audio", "article", "short"];

/// Generate deterministic item metadata.
///
/// Returns (category, format, creator_id, created_at_offset_nanos).
fn item_metadata(
    item_index: u64,
) -> (String, String, EntityId, u64) {
    let category = CATEGORIES[(item_index as usize) % CATEGORIES.len()].to_string();
    let format = FORMATS[(item_index as usize) % FORMATS.len()].to_string();
    // 200 creators, distributed round-robin
    let creator_id = EntityId::new((item_index % 200) + 1);
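    // Spread creation times across 30 days. The offset is subtracted from
    // "now", so index 0 is the newest item and the highest index is the oldest.
    let thirty_days_nanos = 30u64 * 24 * 3600 * 1_000_000_000;
    // Divide before multiplying: item_index * thirty_days_nanos overflows u64
    // once item_index reaches 7_117.
    let created_at_offset = item_index * (thirty_days_nanos / 10_000);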
    (category, format, creator_id, created_at_offset)
}

/// Generate a deterministic 64-dim embedding for an item.
///
/// Uses a simple deterministic formula based on the item index.
/// The embeddings are normalized to unit length for cosine similarity.
fn generate_embedding(item_index: u64, dimensions: usize) -> Vec<f32> {
    let mut vec: Vec<f32> = (0..dimensions)
        .map(|d| {
            // Deterministic pseudo-random value from the item index and dimension
            (item_index as f32 * 0.7 + d as f32 * 1.3).sin()
        })
        .collect();

    // L2 normalize
    let norm: f32 = vec.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for v in &mut vec {
            *v /= norm;
        }
    }

    vec
}

/// Generate deterministic signal events spanning a time range.
///
/// Distributes events across entities with a prime stride for a reproducible
/// but scrambled visiting order. The stride 7919 is coprime to 10_000, so when
/// `count == entity_count == 10_000` every entity receives exactly one event.
fn generate_signal_events(
    count: usize,
    entity_count: u64,
    base_time_nanos: u64,
    span_nanos: u64,
) -> Vec<(EntityId, &'static str, f64, u64)> {
    let signal_types = ["view", "like", "skip", "share", "completion"];
    let mut events = Vec::with_capacity(count);

    for i in 0..count {
        // Entity distribution: uniform. The prime stride scrambles the order in
        // which entities are visited but does not skew event counts.
        let entity_raw = ((i as u64) * 7919 + 1) % entity_count;
        let entity_id = EntityId::new(entity_raw + 1);

        // Signal type: round-robin
        let signal = signal_types[i % signal_types.len()];

        // Weight: always 1.0 for count-based signals
        let weight = 1.0;

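        // Timestamp: split the span into `count` equal slots and visit them in a
        // scrambled order so events cover the whole span. A raw
        // (i * 104729 + 1) % span_nanos stays below ~1.05 s for 10K events,
        // which would pile every event at the very start of the span.
        let slot = ((i as u64) * 104_729) % (count as u64);
        let offset = slot * (span_nanos / count as u64);
        let ts = base_time_nanos.saturating_sub(span_nanos) + offset;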

        events.push((entity_id, signal, weight, ts));
    }

    events
}

/// Count how many results belong to each creator.
///
/// Results whose metadata or creator_id is missing are skipped.
fn creator_counts(
    results: &[tidaldb::query::retrieve::RetrieveResult],
    db: &TidalDB,
) -> HashMap<EntityId, usize> {
    let mut counts: HashMap<EntityId, usize> = HashMap::new();
    for result in results {
        if let Ok(Some(meta)) = db.get_item_metadata(result.entity_id) {
            if let Some(creator_id) = meta.creator_id {
                *counts.entry(creator_id).or_insert(0) += 1;
            }
        }
    }
    counts
}

/// Get the category of an item from the database.
fn item_category(db: &TidalDB, entity_id: EntityId) -> Option<String> {
    db.get_item_metadata(entity_id)
        .ok()
        .flatten()
        .and_then(|m| m.category.clone())
}

// ============================================================
// THE M2 UAT TEST
// ============================================================
//
// This is the definitive acceptance test for Milestone 2.
// It matches the UAT scenario in ROADMAP.md.
#[test]
fn milestone_2_uat() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema: schema.clone(),
    })
    .unwrap();

    // ============================================================
    // Setup: Write 10K items with metadata and embeddings
    // ============================================================

    let now = Timestamp::now();
    let now_nanos = now.as_nanos();

    for i in 0..10_000u64 {
        let (category, format, creator_id, created_at_offset) = item_metadata(i);
        let embedding = generate_embedding(i, 64);
        let created_at_nanos = now_nanos.saturating_sub(created_at_offset);

        db.write_item_with_metadata(
            EntityId::new(i + 1),
            &category,
            &format,
            creator_id,
            Timestamp::from_nanos(created_at_nanos),
            Some(&embedding),
        )
        .unwrap();
    }

    // Verify item count
    assert_eq!(db.item_count().unwrap(), 10_000);

    // ============================================================
    // Setup: Write 10K signal events spanning 7 days
    // ============================================================

    let seven_days_nanos = 7u64 * 24 * 3600 * 1_000_000_000;
    let events = generate_signal_events(10_000, 10_000, now_nanos, seven_days_nanos);

    for (entity_id, signal_type, weight, ts_nanos) in &events {
        db.signal(signal_type, *entity_id, *weight, Timestamp::from_nanos(*ts_nanos))
            .unwrap();
    }

    // ============================================================
    // Query 1: Trending with diversity
    // ============================================================
    // RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25

    let trending_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("trending")
        .diversity(DiversityConstraints::new().max_per_creator(1))
        .limit(25)
        .build()
        .unwrap();

    let trending_results = db.retrieve(&trending_query).unwrap();

    // Verify: got results (up to 25)
    assert!(
        !trending_results.is_empty(),
        "trending query should return results"
    );
    assert!(
        trending_results.len() <= 25,
        "trending query should return at most 25 results, got {}",
        trending_results.len()
    );

    // Verify: scores are sorted descending
    for pair in trending_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "trending results should be sorted descending: {} >= {} (ranks {} and {})",
            pair[0].score,
            pair[1].score,
            pair[0].rank,
            pair[1].rank,
        );
    }

    // Verify: creator diversity (max 1 per creator)
    let creators = creator_counts(&trending_results.items, &db);
    for (creator_id, count) in &creators {
        assert!(
            *count <= 1,
            "max_per_creator:1 violated: creator {} appears {} times",
            creator_id,
            count,
        );
    }

    // Verify: ranks are 1-based and sequential
    for (i, item) in trending_results.items.iter().enumerate() {
        assert_eq!(
            item.rank,
            i + 1,
            "rank should be 1-based sequential, got {} at position {}",
            item.rank,
            i,
        );
    }

    // ============================================================
    // Query 2: Hot with category filter
    // ============================================================
    // RETRIEVE items FILTER category:jazz USING PROFILE hot LIMIT 20

    let jazz_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("hot")
        .filter(FilterExpr::eq("category", "jazz"))
        .limit(20)
        .build()
        .unwrap();

    let jazz_results = db.retrieve(&jazz_query).unwrap();

    // Verify: only jazz items returned
    for item in &jazz_results.items {
        let category = item_category(&db, item.entity_id);
        assert_eq!(
            category.as_deref(),
            Some("jazz"),
            "hot+jazz query returned non-jazz item: entity={}, category={:?}",
            item.entity_id,
            category,
        );
    }

    // Verify: scores are sorted descending
    for pair in jazz_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "jazz results should be sorted descending: {} >= {}",
            pair[0].score,
            pair[1].score,
        );
    }

    // ============================================================
    // Query 3: New (created_at descending)
    // ============================================================
    // RETRIEVE items USING PROFILE new LIMIT 20

    let new_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("new")
        .limit(20)
        .build()
        .unwrap();

    let new_results = db.retrieve(&new_query).unwrap();

    assert!(
        !new_results.is_empty(),
        "new query should return results"
    );
    assert!(
        new_results.len() <= 20,
        "new query should return at most 20 results"
    );

    // Verify: scores are sorted descending (new profile uses created_at as score)
    for pair in new_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "new results should be sorted descending: {} >= {} (entities {} and {})",
            pair[0].score,
            pair[1].score,
            pair[0].entity_id,
            pair[1].entity_id,
        );
    }

    // ============================================================
    // Query 4: Top week (signal-based ordering within 7d window)
    // ============================================================
    // RETRIEVE items USING PROFILE top_week LIMIT 20

    let top_week_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("top_week")
        .limit(20)
        .build()
        .unwrap();

    let top_week_results = db.retrieve(&top_week_query).unwrap();

    assert!(
        !top_week_results.is_empty(),
        "top_week query should return results"
    );

    // Verify: scores are sorted descending
    for pair in top_week_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "top_week results should be sorted descending: {} >= {}",
            pair[0].score,
            pair[1].score,
        );
    }

    // ============================================================
    // Query 5: Hidden gems
    // ============================================================
    // ROADMAP UAT: RETRIEVE items USING PROFILE hidden_gems FILTER min_completion_rate:0.7 LIMIT 10
    //
    // M2 limitation: `min_completion_rate` is a signal-derived filter (completion
    // rate = completion_count / view_count). The m2p2 filter engine supports
    // metadata field filters (BitmapIndex, RangeIndex) but not computed signal
    // ratios. Signal-derived predicates are an M3+ extension to the filter engine.
    // For M2, the hidden_gems query runs without the completion rate filter;
    // all items are candidates and the hidden_gems scoring formula naturally
    // surfaces items with high completion-to-view ratios.

    let hidden_gems_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("hidden_gems")
        // TODO M3: add .filter(FilterExpr::signal_ratio("completion", "view", 0.7))
        // once signal-derived predicates are supported in the filter engine.
        .limit(10)
        .build()
        .unwrap();

    let hidden_gems_results = db.retrieve(&hidden_gems_query).unwrap();

    assert!(
        !hidden_gems_results.is_empty(),
        "hidden_gems query should return results"
    );

    // Verify: scores are sorted descending
    for pair in hidden_gems_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "hidden_gems results should be sorted descending: {} >= {}",
            pair[0].score,
            pair[1].score,
        );
    }

    // ============================================================
    // Query 6: Controversial (dual-signal ranking)
    // ============================================================
    // RETRIEVE items USING PROFILE controversial LIMIT 10

    let controversial_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("controversial")
        .limit(10)
        .build()
        .unwrap();

    let controversial_results = db.retrieve(&controversial_query).unwrap();

    assert!(
        !controversial_results.is_empty(),
        "controversial query should return results"
    );

    // Verify: scores are sorted descending
    for pair in controversial_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "controversial results should be sorted descending: {} >= {}",
            pair[0].score,
            pair[1].score,
        );
    }

    // ============================================================
    // Signal Burst: Write 100 "share" signals for item #500
    // ============================================================

    // Record pre-burst trending results
    let pre_burst_trending = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("trending")
        .limit(50)
        .build()
        .unwrap();
    let pre_burst_results = db.retrieve(&pre_burst_trending).unwrap();
    let pre_burst_rank = pre_burst_results
        .items
        .iter()
        .position(|r| r.entity_id == EntityId::new(500));

    // Write 100 "share" signals for item #500 at the current time
    let burst_time = Timestamp::now();
    for _ in 0..100 {
        db.signal("share", EntityId::new(500), 1.0, burst_time)
            .unwrap();
    }

    // Re-execute trending query
    let post_burst_results = db.retrieve(&pre_burst_trending).unwrap();
    let post_burst_rank = post_burst_results
        .items
        .iter()
        .position(|r| r.entity_id == EntityId::new(500));

    // Verify: item #500 either entered the top 50 or held or improved its
    // position. Positions are 0-based indexes into the result list.
    match (pre_burst_rank, post_burst_rank) {
        (None, Some(rank)) => {
            // Item was not in the top 50 before, now it is -- signal burst worked
            assert!(
                rank < 50,
                "item #500 should appear in top 50 after burst, found at position {}",
                rank
            );
        }
        (Some(pre), Some(post)) => {
            // Item was in top 50 and should have moved up
            assert!(
                post <= pre,
                "item #500 should rank higher after burst: pre={}, post={}",
                pre,
                post
            );
        }
        (None, None) => {
            // Item #500 is still outside the top 50 after 100 share signals, which
            // can happen in a crowded ranking. Its rank cannot be compared here,
            // so fall back to confirming that the burst was recorded by reading
            // the share count directly.
            let share_count = db
                .read_windowed_count(EntityId::new(500), "share", Window::AllTime)
                .unwrap();
            assert!(
                share_count >= 100,
                "item #500 should have at least 100 shares after burst, got {}",
                share_count
            );
        }
        (Some(_), None) => {
            panic!(
                "item #500 was in trending before burst but disappeared after -- this is wrong"
            );
        }
    }

    // ============================================================
    // Restart: graceful shutdown and reopen
    // ============================================================

    db.shutdown().unwrap();

    let db2 = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema: schema.clone(),
    })
    .unwrap();

    // Re-verify: items survived
    assert_eq!(
        db2.item_count().unwrap(),
        10_000,
        "item count should survive restart"
    );

    // Re-verify: trending query still works
    let recovered_trending = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("trending")
        .limit(25)
        .build()
        .unwrap();
    let recovered_results = db2.retrieve(&recovered_trending).unwrap();
    assert!(
        !recovered_results.is_empty(),
        "trending query should work after restart"
    );

    // Re-verify: scores are sorted descending after restart
    for pair in recovered_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "trending results after restart should be sorted: {} >= {}",
            pair[0].score,
            pair[1].score,
        );
    }

    // Re-verify: hot+jazz filter still works
    let recovered_jazz = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("hot")
        .filter(FilterExpr::eq("category", "jazz"))
        .limit(20)
        .build()
        .unwrap();
    let recovered_jazz_results = db2.retrieve(&recovered_jazz).unwrap();
    for item in &recovered_jazz_results.items {
        let category = item_category(&db2, item.entity_id);
        assert_eq!(
            category.as_deref(),
            Some("jazz"),
            "jazz filter should still work after restart"
        );
    }

    // Re-verify: signal burst for item #500 survived
    let recovered_share_count = db2
        .read_windowed_count(EntityId::new(500), "share", Window::AllTime)
        .unwrap();
    assert!(
        recovered_share_count >= 100,
        "share signals for item #500 should survive restart, got {}",
        recovered_share_count
    );

    db2.shutdown().unwrap();
}

// ============================================================
// SIGNAL SNAPSHOT TRANSPARENCY TEST
// ============================================================
//
// Verifies that RETRIEVE results include signal snapshots
// for debugging and ranking transparency.
#[test]
fn retrieve_results_include_signal_snapshots() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema,
    })
    .unwrap();

    // Write 100 items with embeddings
    for i in 0..100u64 {
        let (category, format, creator_id, created_at_offset) = item_metadata(i);
        let embedding = generate_embedding(i, 64);
        let now = Timestamp::now();

        db.write_item_with_metadata(
            EntityId::new(i + 1),
            &category,
            &format,
            creator_id,
            Timestamp::from_nanos(now.as_nanos().saturating_sub(created_at_offset)),
            Some(&embedding),
        )
        .unwrap();
    }

    // Write enough signals so profiles have data to score with
    let now = Timestamp::now();
    for i in 0..500u64 {
        let entity = EntityId::new((i % 100) + 1);
        db.signal("view", entity, 1.0, now).unwrap();
        if i % 3 == 0 {
            db.signal("like", entity, 1.0, now).unwrap();
        }
    }

    // Query with hot profile
    let query = Retrieve::builder()
        .profile("hot")
        .limit(10)
        .build()
        .unwrap();

    let results = db.retrieve(&query).unwrap();

    // At least some results should have signal snapshots
    let has_snapshots = results
        .items
        .iter()
        .any(|r| !r.signal_snapshot.is_empty());
    assert!(
        has_snapshots,
        "at least some results should include signal snapshots"
    );

    // Signal snapshots should be capped at 10
    for item in &results.items {
        assert!(
            item.signal_snapshot.len() <= 10,
            "signal snapshot should be capped at 10, got {}",
            item.signal_snapshot.len()
        );
    }

    db.shutdown().unwrap();
}

// ============================================================
// EXCLUDE LIST TEST
// ============================================================
//
// Verifies that EXCLUDE IDs are removed from results.
#[test]
fn retrieve_excludes_specified_ids() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema,
    })
    .unwrap();

    // Write 50 items
    for i in 0..50u64 {
        let (category, format, creator_id, created_at_offset) = item_metadata(i);
        let embedding = generate_embedding(i, 64);
        let now = Timestamp::now();

        db.write_item_with_metadata(
            EntityId::new(i + 1),
            &category,
            &format,
            creator_id,
            Timestamp::from_nanos(now.as_nanos().saturating_sub(created_at_offset)),
            Some(&embedding),
        )
        .unwrap();
    }

    // Write signals
    let now = Timestamp::now();
    for i in 0..200u64 {
        let entity = EntityId::new((i % 50) + 1);
        db.signal("view", entity, 1.0, now).unwrap();
    }

    // Query without excludes
    let query_no_exclude = Retrieve::builder()
        .profile("hot")
        .limit(20)
        .build()
        .unwrap();
    let results_no_exclude = db.retrieve(&query_no_exclude).unwrap();

    // Pick the top 3 IDs to exclude
    let exclude_ids: Vec<EntityId> = results_no_exclude
        .items
        .iter()
        .take(3)
        .map(|r| r.entity_id)
        .collect();

    // Query with excludes
    let query_with_exclude = Retrieve::builder()
        .profile("hot")
        .exclude_ids(exclude_ids.clone())
        .limit(20)
        .build()
        .unwrap();
    let results_with_exclude = db.retrieve(&query_with_exclude).unwrap();

    // Verify: excluded IDs are not in results
    for item in &results_with_exclude.items {
        assert!(
            !exclude_ids.contains(&item.entity_id),
            "excluded entity {} should not appear in results",
            item.entity_id,
        );
    }

    db.shutdown().unwrap();
}

// ============================================================
// PAGINATION TEST
// ============================================================
//
// Verifies that offset-based cursor pagination works correctly
// in the absence of concurrent writes. Note: offset cursors are
// NOT stable under concurrent signal writes (the ranked list can
// shift between pages). This test only covers the non-concurrent
// case. See Cursor doc in task-01 for the full limitation note.
#[test]
fn retrieve_pagination_via_cursor() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema,
    })
    .unwrap();

    // Write 100 items
    for i in 0..100u64 {
        let (category, format, creator_id, created_at_offset) = item_metadata(i);
        let embedding = generate_embedding(i, 64);
        let now = Timestamp::now();

        db.write_item_with_metadata(
            EntityId::new(i + 1),
            &category,
            &format,
            creator_id,
            Timestamp::from_nanos(now.as_nanos().saturating_sub(created_at_offset)),
            Some(&embedding),
        )
        .unwrap();
    }

    // Write signals
    let now = Timestamp::now();
    for i in 0..500u64 {
        let entity = EntityId::new((i % 100) + 1);
        db.signal("view", entity, 1.0, now).unwrap();
    }

    // Page 1: first 10 results
    let page1_query = Retrieve::builder()
        .profile("hot")
        .limit(10)
        .build()
        .unwrap();
    let page1 = db.retrieve(&page1_query).unwrap();

    assert_eq!(page1.len(), 10, "page 1 should have 10 results");
    assert!(
        page1.next_cursor.is_some(),
        "page 1 should have a next cursor"
    );

    // Page 2: next 10 results using cursor
    let page2_query = Retrieve::builder()
        .profile("hot")
        .limit(10)
        .cursor(page1.next_cursor.unwrap())
        .build()
        .unwrap();
    let page2 = db.retrieve(&page2_query).unwrap();

    assert_eq!(page2.len(), 10, "page 2 should have 10 results");

    // Verify: no overlap between pages
    let page1_ids: Vec<EntityId> = page1.items.iter().map(|r| r.entity_id).collect();
    let page2_ids: Vec<EntityId> = page2.items.iter().map(|r| r.entity_id).collect();
    for id in &page2_ids {
        assert!(
            !page1_ids.contains(id),
            "entity {} appears on both page 1 and page 2",
            id,
        );
    }

    // Verify: page 2 ranks continue from page 1
    assert_eq!(page2.items[0].rank, 11, "page 2 should start at rank 11");

    db.shutdown().unwrap();
}

// ============================================================
// QUERY VALIDATION ERROR TEST
// ============================================================
//
// Verifies that invalid queries produce clear errors.
#[test]
fn retrieve_rejects_invalid_queries() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema,
    })
    .unwrap();

    // Unknown profile
    let unknown_profile = Retrieve::builder()
        .profile("nonexistent_profile")
        .limit(10)
        .build()
        .unwrap();
    let result = db.retrieve(&unknown_profile);
    assert!(
        matches!(result, Err(tidaldb::query::retrieve::QueryError::ProfileNotFound(_))),
        "unknown profile should return ProfileNotFound, got: {:?}",
        result,
    );

    // Limit = 0 (caught at builder level)
    let result = Retrieve::builder().profile("new").limit(0).build();
    assert!(
        matches!(result, Err(tidaldb::query::retrieve::QueryError::InvalidLimit { .. })),
        "limit=0 should return InvalidLimit"
    );

    // Limit > 500 (caught at builder level)
    let result = Retrieve::builder().profile("new").limit(501).build();
    assert!(
        matches!(result, Err(tidaldb::query::retrieve::QueryError::InvalidLimit { .. })),
        "limit=501 should return InvalidLimit"
    );

    db.shutdown().unwrap();
}

// ============================================================
// DETERMINISTIC RESULTS TEST
// ============================================================
//
// Verifies INV-QUERY-1: same query with same state produces
// identical results.
#[test]
fn retrieve_deterministic_results() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema,
    })
    .unwrap();

    // Write 100 items with signals
    let now = Timestamp::now();
    for i in 0..100u64 {
        let (category, format, creator_id, created_at_offset) = item_metadata(i);
        let embedding = generate_embedding(i, 64);
        db.write_item_with_metadata(
            EntityId::new(i + 1),
            &category,
            &format,
            creator_id,
            Timestamp::from_nanos(now.as_nanos().saturating_sub(created_at_offset)),
            Some(&embedding),
        )
        .unwrap();

        db.signal("view", EntityId::new(i + 1), 1.0, now).unwrap();
        if i % 3 == 0 {
            db.signal("like", EntityId::new(i + 1), 1.0, now).unwrap();
        }
    }

    let query = Retrieve::builder()
        .profile("hot")
        .limit(20)
        .build()
        .unwrap();

    let results1 = db.retrieve(&query).unwrap();
    let results2 = db.retrieve(&query).unwrap();

    assert_eq!(results1.len(), results2.len(), "result counts must match");

    for (r1, r2) in results1.items.iter().zip(results2.items.iter()) {
        assert_eq!(
            r1.entity_id, r2.entity_id,
            "entity IDs must match at rank {}",
            r1.rank,
        );
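        // INV-QUERY-1 asks for identical scores, so compare bit patterns
        // instead of allowing a tolerance that could hide nondeterminism.
        assert_eq!(
            r1.score.to_bits(),
            r2.score.to_bits(),
            "scores must be identical for entity {} at rank {}: {} vs {}",
            r1.entity_id,
            r1.rank,
            r1.score,
            r2.score,
        );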
    }

    db.shutdown().unwrap();
}

Acceptance Criteria

  • milestone_2_uat test passes: all 6 queries return correctly ordered results
  • Query 1 (trending): results sorted descending, creator diversity enforced (max 1 per creator), ranks are 1-based sequential
  • Query 2 (hot + jazz filter): only jazz items returned, sorted descending by hot score
  • Query 3 (new): results sorted by created_at descending
  • Query 4 (top_week): results sorted by 7d signal-based score
  • Query 5 (hidden_gems): results sorted by quality/reach ratio
  • Query 6 (controversial): results sorted by dual-signal score
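  • Signal burst: writing 100 "share" signals for item #500 causes it to rise in trending rank (or appear if previously absent); if it is outside the top 50 both before and after the burst, its share count is verified directly instead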
  • Crash recovery: shutdown and reopen preserves all items, signals, and query functionality
  • retrieve_results_include_signal_snapshots test passes: at least some results have non-empty snapshots, all capped at 10
  • retrieve_excludes_specified_ids test passes: excluded IDs never appear in results
  • retrieve_pagination_via_cursor test passes: pages do not overlap, ranks continue correctly
  • retrieve_rejects_invalid_queries test passes: clear errors for unknown profile, invalid limit
  • retrieve_deterministic_results test passes: same query produces identical results (INV-QUERY-1)
  • cargo test --test m2_uat passes
  • No unsafe code in tests
  • Test data is deterministic (fixed seeds, reproducible event sequences)
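
The ordering criteria above (descending scores, 1-based sequential ranks, at most one item per creator) can be checked with small helpers. The following is a minimal sketch, not the test's actual code: RankedRow, check_ordering, and check_creator_cap are hypothetical names, and the real result type returned by retrieve() may expose these fields differently.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one ranked row; the real result type may differ.
#[derive(Debug, Clone)]
struct RankedRow {
    entity_id: u64,
    creator_id: u64,
    rank: usize,
    score: f64,
}

/// Checks that ranks are sequential starting at `offset + 1` and that
/// scores never increase from one row to the next.
fn check_ordering(rows: &[RankedRow], offset: usize) -> Result<(), String> {
    for (i, row) in rows.iter().enumerate() {
        let expected = offset + i + 1;
        if row.rank != expected {
            return Err(format!(
                "entity {} has rank {}, expected {}",
                row.entity_id, row.rank, expected
            ));
        }
    }
    for pair in rows.windows(2) {
        if pair[0].score < pair[1].score {
            return Err(format!(
                "score rises from {} (rank {}) to {} (rank {})",
                pair[0].score, pair[0].rank, pair[1].score, pair[1].rank
            ));
        }
    }
    Ok(())
}

/// Checks that no creator contributes more than `max_per_creator` rows.
fn check_creator_cap(rows: &[RankedRow], max_per_creator: usize) -> Result<(), String> {
    let mut seen: HashMap<u64, usize> = HashMap::new();
    for row in rows {
        let count = seen.entry(row.creator_id).or_insert(0);
        *count += 1;
        if *count > max_per_creator {
            return Err(format!(
                "creator {} appears {} times (max {})",
                row.creator_id, *count, max_per_creator
            ));
        }
    }
    Ok(())
}
```

Returning Result<(), String> rather than panicking lets each UAT step wrap the error in its own assertion message, so a failure names both the broken rule and the step that hit it. For a second page of 10 results, check_ordering would be called with offset 10.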

Research References

Spec References

  • docs/specs/08-query-engine.md -- Section 2 (RETRIEVE operation), Section 5 (execution pipeline), Section 8 (pagination), Section 15 (invariants: INV-QUERY-1 deterministic, INV-QUERY-2 filter correctness)
  • docs/specs/09-ranking-scoring.md -- Section 11 (sort mode formulas verified by query ordering), Section 16 (INV-RANK-1 deterministic scoring, INV-RANK-5 diversity never reduces result count)

Implementation Notes

  • Signal count (10K vs ROADMAP's 100K): The ROADMAP UAT specifies 100K signal events. This test uses 10K to keep cargo test --test m2_uat under 30 seconds. 10K signals across 10K items averages 1 signal per entity, which is sparse but sufficient for testing the correctness of the ranking logic. For scale validation, add a #[ignore] test:
    #[test]
    #[ignore = "scale test: takes 2-3 minutes, run with --ignored"]
    fn milestone_2_uat_100k_signals() {
        // same as milestone_2_uat but with 100K signals
    }
    
    Run with: cargo test --test m2_uat -- --ignored milestone_2_uat_100k_signals
  • The generate_embedding function uses sin() for deterministic pseudo-random vectors. The embeddings are L2-normalized, so cosine and L2 distance produce the same nearest-neighbor ordering in USearch (for unit vectors, squared L2 distance equals 2 - 2 * cosine similarity). Use 64 dimensions for test speed -- the trait abstraction handles any dimension.
  • The generate_signal_events function uses prime strides (7919, 104729) for reproducible distribution without a PRNG dependency. The distribution is power-law-ish: some entities get more events than others, creating interesting ranking dynamics.
  • The write_item_with_metadata API is a convenience wrapper expected to exist on TidalDb for M2. If it does not exist, this task must add it. It stores structured metadata (category, format, creator_id, created_at) that the bitmap/range indexes and the RETRIEVE executor can read. The exact API shape depends on how metadata is stored after m2p2 (bitmap indexes) is integrated.
  • The signal burst test (100 "share" signals for item #500) verifies signal freshness: a signal written during the test is reflected in the very next query. Item #500 may fall outside the top 50 both before and after the burst (possible given the skewed signal distribution); in that case the test falls back to verifying the signal count directly.
  • The crash recovery section re-verifies item count, trending query, jazz filter, and signal persistence. It does NOT require exact score-level equality with pre-crash results (decay scores advance with time, so scores computed at a later time after restart will differ slightly). It verifies functional correctness: queries work, filters apply, signals survived.
  • Test execution time target: < 30 seconds for the full m2_uat test. At 10K items with 64-dim embeddings and 10K signals, setup should take ~5 seconds (item writes + signal writes), and the 6 queries should each take < 100ms. If the test is too slow, reduce item count to 5K or embedding dimension to 32.
  • All test assertions include descriptive failure messages. A failing assertion should tell the developer exactly what went wrong and which UAT step failed.
  • The m2_uat.rs file is an integration test (in tidal/tests/), not a unit test (in src/). It links against the compiled crate and tests the public API exactly as a user would.
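
The embedding note above relies on two properties: the vectors are reproducible from the item index alone, and they have unit length, so cosine and L2 distance rank neighbors the same way. The sketch below illustrates both under stated assumptions. sketch_embedding is a hypothetical function, not the test's generate_embedding, and the 0.618 frequency constant is an arbitrary choice made for this example.

```rust
/// Hypothetical sin()-based embedding: deterministic for a given seed,
/// then scaled to unit L2 norm. Not the test's generate_embedding.
fn sketch_embedding(seed: u64, dims: usize) -> Vec<f32> {
    let mut v: Vec<f32> = (0..dims)
        .map(|d| ((seed as f64 + 1.0) * (d as f64 + 1.0) * 0.618).sin() as f32)
        .collect();
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}

/// Dot product; for unit vectors this is the cosine similarity.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Squared Euclidean distance between two vectors of equal length.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

For unit vectors a and b, squared_l2(a, b) equals 2 - 2 * dot(a, b) up to floating-point rounding, so sorting by ascending L2 distance and sorting by descending cosine similarity give the same order.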