jordan 6fdaa1584b feat: complete M1 signal engine — m0p3 samples/docs, m1p5 TidalDb API, examples, and periodic checkpoint

- m0p3: CONTRIBUTING.md with run-samples checklist, all 4 examples
  (quickstart, cli_embedding, axum_embedding, actix_embedding), doc-test
  coverage for every public API surface
- m1p5: TidalDb public API — write_item, signal, read_decay_score,
  read_windowed_count, read_velocity; StorageBox enum routing memory vs
  fjall; WalSender/WalHandleWriter bridge; WAL replay on open
- Periodic checkpoint: 30s background thread for persistent+schema mode;
  FjallBackend::Clone (O(1), fjall::Keyspace is ref-counted); graceful
  shutdown via Arc<AtomicBool> + join before final checkpoint
- ROADMAP.md: M0 and M1 fully marked COMPLETE (341 tests passing)
- Milestone 2 planning scaffolding added under docs/planning/milestone-2/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-20 22:45:10 -07:00


Task 03: M2 UAT Integration Test

Context

Milestone: 2 -- Ranked Retrieval
Phase: m2p5 -- Query Parser and RETRIEVE Executor
Depends On: Task 01 (Retrieve, Results, QueryError types), Task 02 (RetrieveExecutor, TidalDb::retrieve())
Blocks: Milestone 3 (personalized ranking)
Complexity: M

Objective

Deliver the Milestone 2 User Acceptance Test as a Rust integration test in tidal/tests/m2_uat.rs. This test exercises the complete M2 scenario from the roadmap: open a database with a full schema (5 signal types, 6 ranking profiles), write 10K items with metadata and embeddings, write 10K signal events, execute all 6 profile queries verifying ordering and filter correctness, write a signal burst and verify rank change, and re-verify after shutdown and reopen.

This is the milestone gate. If it passes, Milestone 2 is done. The test proves that "a single query retrieves, scores, and ranks content using live signals" -- the M2 thesis.
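
The rank-change step of the scenario is easiest to reason about as a pure function over the item's position before and after the burst. The following is a minimal sketch, assuming positions are 0-based indexes into a top-N list and `None` means the item fell outside it. `BurstOutcome` and `classify_burst` are hypothetical names used for illustration, not tidaldb types.

```rust
/// Possible outcomes when comparing an item's position before and after
/// a signal burst. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum BurstOutcome {
    /// The item was outside the top N and is now at this position.
    Entered(usize),
    /// The item moved to a strictly better (smaller) position.
    Improved { from: usize, to: usize },
    /// The item kept its position, or stayed outside the top N.
    Unchanged,
    /// The item moved down or dropped out of the top N.
    Regressed,
}

/// Classify a burst from the 0-based positions before and after it.
fn classify_burst(pre: Option<usize>, post: Option<usize>) -> BurstOutcome {
    match (pre, post) {
        (None, Some(to)) => BurstOutcome::Entered(to),
        (Some(from), Some(to)) if to < from => BurstOutcome::Improved { from, to },
        (Some(from), Some(to)) if to == from => BurstOutcome::Unchanged,
        (None, None) => BurstOutcome::Unchanged,
        // Covers (Some, Some) with a worse position and (Some, None).
        _ => BurstOutcome::Regressed,
    }
}
```

A test could assert that the outcome is never `Regressed`, and treat `Unchanged` as the cue to fall back to a direct signal read.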

Requirements

Full M2 UAT scenario from ROADMAP.md implemented as tidal/tests/m2_uat.rs
10K items with metadata (category, format, creator_id) and 64-dim embeddings
10K signal events spanning 7 days across 5 signal types
All 6 RETRIEVE queries executed and verified:
1. trending with max_per_creator:1 diversity -- 25 results, creator-diverse, score-sorted
2. hot with category:jazz filter -- only jazz items, score-sorted
3. new -- created_at descending
4. top_week -- signal-based ordering within 7d window
5. hidden_gems -- quality/reach ratio ordering
6. controversial -- dual-signal ranking
Signal burst for item #500, re-query trending, verify rank change
Shutdown and reopen, re-verify all queries
All tests use tempfile::TempDir for isolation
Tests must pass cargo test --test m2_uat
Deterministic test data (fixed timestamps, reproducible event sequences)
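
The deterministic-data requirement can be met without a random number generator. The sketch below is one way to spread `count` timestamps evenly over a window: a stride coprime with `count` permutes the evenly spaced slots, so consecutive events land far apart while the sequence stays identical on every run. The function name `spread_timestamps` and the stride constant are illustrative assumptions, not part of the tidaldb API.

```rust
/// Spread `count` timestamps evenly across `[base_nanos - span_nanos, base_nanos)`.
///
/// Hypothetical helper for illustration. `STRIDE` is prime, so it is
/// coprime with any `count` that is not a multiple of it; multiplying the
/// index by it modulo `count` then visits every slot exactly once.
fn spread_timestamps(count: u64, base_nanos: u64, span_nanos: u64) -> Vec<u64> {
    const STRIDE: u64 = 104_729;
    if count == 0 {
        return Vec::new();
    }
    let start = base_nanos.saturating_sub(span_nanos);
    let step = span_nanos / count; // width of one slot in nanoseconds
    (0..count)
        .map(|i| {
            let slot = (i * STRIDE) % count; // permuted slot index
            start + slot * step
        })
        .collect()
}
```

Because the slot index is reduced modulo `count` before it is scaled, the timestamps cover the whole window instead of clustering near its start.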

Technical Design

Module Structure

tidal/tests/
  m2_uat.rs   -- Full M2 UAT integration test

Test Implementation

// === tidal/tests/m2_uat.rs ===

use std::collections::HashMap;
use std::time::Duration;
use tempfile::TempDir;

use tidaldb::query::retrieve::Retrieve;
use tidaldb::ranking::diversity::DiversityConstraints;
use tidaldb::schema::*;
use tidaldb::storage::indexes::filter::FilterExpr;
use tidaldb::{Config, TidalDB};

// ============================================================
// Test Helpers
// ============================================================

/// Build the M2 schema: 5 signal types, 6 ranking profiles, 64-dim embeddings.
fn m2_schema() -> Schema {
    let mut builder = SchemaBuilder::new();

    // Embedding slot for items: 64-dim (small for test speed)
    builder.embedding_slot("default", EntityKind::Item, 64);

    // Signal types
    builder
        .signal(
            "view",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(7 * 24 * 3600), // 7 days
            },
        )
        .windows(&[
            Window::OneHour,
            Window::TwentyFourHours,
            Window::SevenDays,
            Window::AllTime,
        ])
        .velocity(true)
        .add();

    builder
        .signal(
            "like",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(14 * 24 * 3600), // 14 days
            },
        )
        .windows(&[
            Window::TwentyFourHours,
            Window::SevenDays,
            Window::AllTime,
        ])
        .velocity(true)
        .add();

    builder
        .signal(
            "skip",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(24 * 3600), // 1 day
            },
        )
        .windows(&[Window::OneHour, Window::TwentyFourHours])
        .velocity(false)
        .add();

    builder
        .signal(
            "share",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(3 * 24 * 3600), // 3 days
            },
        )
        .windows(&[
            Window::OneHour,
            Window::TwentyFourHours,
            Window::SevenDays,
        ])
        .velocity(true)
        .add();

    builder
        .signal(
            "completion",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(30 * 24 * 3600), // 30 days
            },
        )
        .windows(&[Window::SevenDays, Window::AllTime])
        .velocity(false)
        .add();

    // Built-in profiles are auto-registered: trending, hot, new, top_week,
    // hidden_gems, controversial, most_viewed, most_liked, shuffle, etc.

    builder.build().unwrap()
}

/// Categories used for test items. 10 distinct values.
const CATEGORIES: &[&str] = &[
    "jazz", "rock", "classical", "electronic", "hip_hop",
    "country", "blues", "folk", "metal", "pop",
];

/// Formats used for test items. 4 distinct values.
const FORMATS: &[&str] = &["video", "audio", "article", "short"];

/// Generate deterministic item metadata.
///
/// Returns (category, format, creator_id, created_at_offset_nanos).
fn item_metadata(
    item_index: u64,
) -> (String, String, EntityId, u64) {
    let category = CATEGORIES[(item_index as usize) % CATEGORIES.len()].to_string();
    let format = FORMATS[(item_index as usize) % FORMATS.len()].to_string();
    // 200 creators, distributed round-robin
    let creator_id = EntityId::new((item_index % 200) + 1);
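    // Spread creation times across 30 days. The offset is subtracted from
    // "now", so item 0 is the newest and the highest index is the oldest.
    let thirty_days_nanos = 30u64 * 24 * 3600 * 1_000_000_000;
    // Divide before multiplying: item_index * thirty_days_nanos overflows
    // u64 once item_index exceeds 7_116.
    let created_at_offset = item_index * (thirty_days_nanos / 10_000);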
    (category, format, creator_id, created_at_offset)
}

/// Generate a deterministic 64-dim embedding for an item.
///
/// Uses a simple deterministic formula based on the item index.
/// The embeddings are normalized to unit length for cosine similarity.
fn generate_embedding(item_index: u64, dimensions: usize) -> Vec<f32> {
    let mut vec: Vec<f32> = (0..dimensions)
        // Deterministic pseudo-random value from item index and dimension
        .map(|d| (item_index as f32 * 0.7 + d as f32 * 1.3).sin())
        .collect();

    // L2 normalize
    let norm: f32 = vec.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for v in &mut vec {
            *v /= norm;
        }
    }

    vec
}

/// Generate deterministic signal events spanning a time range.
///
/// Distributes events across entities and signal types with a prime
/// stride for reproducible but varied patterns. The entity stride is
/// uniform: when `count == entity_count` and the stride is coprime with
/// `entity_count`, every entity receives exactly one event.
fn generate_signal_events(
    count: usize,
    entity_count: u64,
    base_time_nanos: u64,
    span_nanos: u64,
) -> Vec<(EntityId, &'static str, f64, u64)> {
    let signal_types = ["view", "like", "skip", "share", "completion"];
    let mut events = Vec::with_capacity(count);

    for i in 0..count {
        // Entity distribution: prime stride over the ID space (uniform, not skewed)
        let entity_raw = ((i as u64) * 7919 + 1) % entity_count;
        let entity_id = EntityId::new(entity_raw + 1);

        // Signal type: round-robin
        let signal = signal_types[i % signal_types.len()];

        // Weight: always 1.0 for count-based signals
        let weight = 1.0;

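        // Timestamp: spread across the time span. The raw product
        // i * 104729 stays near 1e9 ns for 10K events, so reducing it
        // modulo span_nanos would cluster every event in the first second.
        // Permute evenly spaced slots instead.
        let slot = ((i as u64) * 104_729) % (count as u64);
        let offset = slot * (span_nanos / count as u64);
        let ts = base_time_nanos.saturating_sub(span_nanos) + offset;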

        events.push((entity_id, signal, weight, ts));
    }

    events
}

/// Count how many results each creator contributes to a result set.
fn creator_counts(
    results: &[tidaldb::query::retrieve::RetrieveResult],
    db: &TidalDB,
) -> HashMap<EntityId, usize> {
    let mut counts: HashMap<EntityId, usize> = HashMap::new();
    for result in results {
        if let Ok(Some(meta)) = db.get_item_metadata(result.entity_id) {
            if let Some(creator_id) = meta.creator_id {
                *counts.entry(creator_id).or_insert(0) += 1;
            }
        }
    }
    counts
}

/// Get the category of an item from the database.
fn item_category(db: &TidalDB, entity_id: EntityId) -> Option<String> {
    db.get_item_metadata(entity_id)
        .ok()
        .flatten()
        .and_then(|m| m.category.clone())
}

// ============================================================
// THE M2 UAT TEST
// ============================================================
//
// This is the definitive acceptance test for Milestone 2.
// It matches the UAT scenario in ROADMAP.md.
#[test]
fn milestone_2_uat() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema: schema.clone(),
    })
    .unwrap();

    // ============================================================
    // Setup: Write 10K items with metadata and embeddings
    // ============================================================

    let now = Timestamp::now();
    let now_nanos = now.as_nanos();

    for i in 0..10_000u64 {
        let (category, format, creator_id, created_at_offset) = item_metadata(i);
        let embedding = generate_embedding(i, 64);
        let created_at_nanos = now_nanos.saturating_sub(created_at_offset);

        db.write_item_with_metadata(
            EntityId::new(i + 1),
            &category,
            &format,
            creator_id,
            Timestamp::from_nanos(created_at_nanos),
            Some(&embedding),
        )
        .unwrap();
    }

    // Verify item count
    assert_eq!(db.item_count().unwrap(), 10_000);

    // ============================================================
    // Setup: Write 10K signal events spanning 7 days
    // ============================================================

    let seven_days_nanos = 7u64 * 24 * 3600 * 1_000_000_000;
    let events = generate_signal_events(10_000, 10_000, now_nanos, seven_days_nanos);

    for (entity_id, signal_type, weight, ts_nanos) in &events {
        db.signal(signal_type, *entity_id, *weight, Timestamp::from_nanos(*ts_nanos))
            .unwrap();
    }

    // ============================================================
    // Query 1: Trending with diversity
    // ============================================================
    // RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25

    let trending_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("trending")
        .diversity(DiversityConstraints::new().max_per_creator(1))
        .limit(25)
        .build()
        .unwrap();

    let trending_results = db.retrieve(&trending_query).unwrap();

    // Verify: got results (up to 25)
    assert!(
        !trending_results.is_empty(),
        "trending query should return results"
    );
    assert!(
        trending_results.len() <= 25,
        "trending query should return at most 25 results, got {}",
        trending_results.len()
    );

    // Verify: scores are sorted descending
    for pair in trending_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "trending results should be sorted descending: {} >= {} (ranks {} and {})",
            pair[0].score,
            pair[1].score,
            pair[0].rank,
            pair[1].rank,
        );
    }

    // Verify: creator diversity (max 1 per creator)
    let creators = creator_counts(&trending_results.items, &db);
    for (creator_id, count) in &creators {
        assert!(
            *count <= 1,
            "max_per_creator:1 violated: creator {} appears {} times",
            creator_id,
            count,
        );
    }

    // Verify: ranks are 1-based and sequential
    for (i, item) in trending_results.items.iter().enumerate() {
        assert_eq!(
            item.rank,
            i + 1,
            "rank should be 1-based sequential, got {} at position {}",
            item.rank,
            i,
        );
    }

    // ============================================================
    // Query 2: Hot with category filter
    // ============================================================
    // RETRIEVE items FILTER category:jazz USING PROFILE hot LIMIT 20

    let jazz_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("hot")
        .filter(FilterExpr::eq("category", "jazz"))
        .limit(20)
        .build()
        .unwrap();

    let jazz_results = db.retrieve(&jazz_query).unwrap();

    // Verify: only jazz items returned
    for item in &jazz_results.items {
        let category = item_category(&db, item.entity_id);
        assert_eq!(
            category.as_deref(),
            Some("jazz"),
            "hot+jazz query returned non-jazz item: entity={}, category={:?}",
            item.entity_id,
            category,
        );
    }

    // Verify: scores are sorted descending
    for pair in jazz_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "jazz results should be sorted descending: {} >= {}",
            pair[0].score,
            pair[1].score,
        );
    }

    // ============================================================
    // Query 3: New (created_at descending)
    // ============================================================
    // RETRIEVE items USING PROFILE new LIMIT 20

    let new_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("new")
        .limit(20)
        .build()
        .unwrap();

    let new_results = db.retrieve(&new_query).unwrap();

    assert!(
        !new_results.is_empty(),
        "new query should return results"
    );
    assert!(
        new_results.len() <= 20,
        "new query should return at most 20 results"
    );

    // Verify: scores are sorted descending (new profile uses created_at as score)
    for pair in new_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "new results should be sorted descending: {} >= {} (entities {} and {})",
            pair[0].score,
            pair[1].score,
            pair[0].entity_id,
            pair[1].entity_id,
        );
    }

    // ============================================================
    // Query 4: Top week (signal-based ordering within 7d window)
    // ============================================================
    // RETRIEVE items USING PROFILE top_week LIMIT 20

    let top_week_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("top_week")
        .limit(20)
        .build()
        .unwrap();

    let top_week_results = db.retrieve(&top_week_query).unwrap();

    assert!(
        !top_week_results.is_empty(),
        "top_week query should return results"
    );

    // Verify: scores are sorted descending
    for pair in top_week_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "top_week results should be sorted descending: {} >= {}",
            pair[0].score,
            pair[1].score,
        );
    }

    // ============================================================
    // Query 5: Hidden gems
    // ============================================================
    // ROADMAP UAT: RETRIEVE items USING PROFILE hidden_gems FILTER min_completion_rate:0.7 LIMIT 10
    //
    // M2 limitation: `min_completion_rate` is a signal-derived filter (completion
    // rate = completion_count / view_count). The m2p2 filter engine supports
    // metadata field filters (BitmapIndex, RangeIndex) but not computed signal
    // ratios. Signal-derived predicates are an M3+ extension to the filter engine.
    // For M2, the hidden_gems query runs without the completion rate filter;
    // all items are candidates and the hidden_gems scoring formula naturally
    // surfaces items with high completion-to-view ratios.

    let hidden_gems_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("hidden_gems")
        // TODO M3: add .filter(FilterExpr::signal_ratio("completion", "view", 0.7))
        // once signal-derived predicates are supported in the filter engine.
        .limit(10)
        .build()
        .unwrap();

    let hidden_gems_results = db.retrieve(&hidden_gems_query).unwrap();

    assert!(
        !hidden_gems_results.is_empty(),
        "hidden_gems query should return results"
    );

    // Verify: scores are sorted descending
    for pair in hidden_gems_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "hidden_gems results should be sorted descending: {} >= {}",
            pair[0].score,
            pair[1].score,
        );
    }

    // ============================================================
    // Query 6: Controversial (dual-signal ranking)
    // ============================================================
    // RETRIEVE items USING PROFILE controversial LIMIT 10

    let controversial_query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("controversial")
        .limit(10)
        .build()
        .unwrap();

    let controversial_results = db.retrieve(&controversial_query).unwrap();

    assert!(
        !controversial_results.is_empty(),
        "controversial query should return results"
    );

    // Verify: scores are sorted descending
    for pair in controversial_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "controversial results should be sorted descending: {} >= {}",
            pair[0].score,
            pair[1].score,
        );
    }

    // ============================================================
    // Signal Burst: Write 100 "share" signals for item #500
    // ============================================================

    // Record pre-burst trending results
    let pre_burst_trending = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("trending")
        .limit(50)
        .build()
        .unwrap();
    let pre_burst_results = db.retrieve(&pre_burst_trending).unwrap();
    let pre_burst_rank = pre_burst_results
        .items
        .iter()
        .position(|r| r.entity_id == EntityId::new(500));

    // Write 100 "share" signals for item #500 at the current time
    let burst_time = Timestamp::now();
    for _ in 0..100 {
        db.signal("share", EntityId::new(500), 1.0, burst_time)
            .unwrap();
    }

    // Re-execute trending query
    let post_burst_results = db.retrieve(&pre_burst_trending).unwrap();
    let post_burst_rank = post_burst_results
        .items
        .iter()
        .position(|r| r.entity_id == EntityId::new(500));

    // Verify: item #500 should be present (or rose from absent to present)
    // and its rank should have improved (or appeared)
    match (pre_burst_rank, post_burst_rank) {
        (None, Some(rank)) => {
            // Item was not in the top 50 before, now it is -- signal burst worked
            assert!(
                rank < 50,
                "item #500 should appear in top 50 after burst, found at position {}",
                rank
            );
        }
        (Some(pre), Some(post)) => {
            // Item was already in the top 50 and must not have moved down
            assert!(
                post <= pre,
                "item #500 should not rank lower after burst: pre={}, post={}",
                pre,
                post
            );
        }
        (None, None) => {
            // Item #500 can stay outside the top 50 in a crowded ranking even
            // after 100 share signals. In that case, at least verify that the
            // burst was recorded by reading the windowed count directly.
            // "share" declares no AllTime window in m2_schema(), so read SevenDays.
            let share_count = db
                .read_windowed_count(EntityId::new(500), "share", Window::SevenDays)
                .unwrap();
            assert!(
                share_count >= 100,
                "item #500 should have at least 100 shares after burst, got {}",
                share_count
            );
        }
        (Some(_), None) => {
            panic!(
                "item #500 was in trending before burst but disappeared after -- this is wrong"
            );
        }
    }

    // ============================================================
    // Restart Recovery: graceful shutdown and reopen
    // ============================================================

    db.shutdown().unwrap();

    let db2 = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema: schema.clone(),
    })
    .unwrap();

    // Re-verify: items survived
    assert_eq!(
        db2.item_count().unwrap(),
        10_000,
        "item count should survive restart"
    );

    // Re-verify: trending query still works
    let recovered_trending = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("trending")
        .limit(25)
        .build()
        .unwrap();
    let recovered_results = db2.retrieve(&recovered_trending).unwrap();
    assert!(
        !recovered_results.is_empty(),
        "trending query should work after restart"
    );

    // Re-verify: scores are sorted descending after restart
    for pair in recovered_results.items.windows(2) {
        assert!(
            pair[0].score >= pair[1].score,
            "trending results after restart should be sorted: {} >= {}",
            pair[0].score,
            pair[1].score,
        );
    }

    // Re-verify: hot+jazz filter still works
    let recovered_jazz = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("hot")
        .filter(FilterExpr::eq("category", "jazz"))
        .limit(20)
        .build()
        .unwrap();
    let recovered_jazz_results = db2.retrieve(&recovered_jazz).unwrap();
    for item in &recovered_jazz_results.items {
        let category = item_category(&db2, item.entity_id);
        assert_eq!(
            category.as_deref(),
            Some("jazz"),
            "jazz filter should still work after restart"
        );
    }

    // Re-verify: signal burst for item #500 survived.
    // "share" declares no AllTime window in m2_schema(), so read SevenDays.
    let recovered_share_count = db2
        .read_windowed_count(EntityId::new(500), "share", Window::SevenDays)
        .unwrap();
    assert!(
        recovered_share_count >= 100,
        "share signals for item #500 should survive restart, got {}",
        recovered_share_count
    );

    db2.shutdown().unwrap();
}

// ============================================================
// SIGNAL SNAPSHOT TRANSPARENCY TEST
// ============================================================
//
// Verifies that RETRIEVE results include signal snapshots
// for debugging and ranking transparency.
#[test]
fn retrieve_results_include_signal_snapshots() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema,
    })
    .unwrap();

    // Write 100 items with embeddings
    for i in 0..100u64 {
        let (category, format, creator_id, created_at_offset) = item_metadata(i);
        let embedding = generate_embedding(i, 64);
        let now = Timestamp::now();

        db.write_item_with_metadata(
            EntityId::new(i + 1),
            &category,
            &format,
            creator_id,
            Timestamp::from_nanos(now.as_nanos().saturating_sub(created_at_offset)),
            Some(&embedding),
        )
        .unwrap();
    }

    // Write enough signals so profiles have data to score with
    let now = Timestamp::now();
    for i in 0..500u64 {
        let entity = EntityId::new((i % 100) + 1);
        db.signal("view", entity, 1.0, now).unwrap();
        if i % 3 == 0 {
            db.signal("like", entity, 1.0, now).unwrap();
        }
    }

    // Query with hot profile
    let query = Retrieve::builder()
        .profile("hot")
        .limit(10)
        .build()
        .unwrap();

    let results = db.retrieve(&query).unwrap();

    // At least some results should have signal snapshots
    let has_snapshots = results
        .items
        .iter()
        .any(|r| !r.signal_snapshot.is_empty());
    assert!(
        has_snapshots,
        "at least some results should include signal snapshots"
    );

    // Signal snapshots should be capped at 10
    for item in &results.items {
        assert!(
            item.signal_snapshot.len() <= 10,
            "signal snapshot should be capped at 10, got {}",
            item.signal_snapshot.len()
        );
    }

    db.shutdown().unwrap();
}

// ============================================================
// EXCLUDE LIST TEST
// ============================================================
//
// Verifies that EXCLUDE IDs are removed from results.
#[test]
fn retrieve_excludes_specified_ids() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema,
    })
    .unwrap();

    // Write 50 items
    for i in 0..50u64 {
        let (category, format, creator_id, created_at_offset) = item_metadata(i);
        let embedding = generate_embedding(i, 64);
        let now = Timestamp::now();

        db.write_item_with_metadata(
            EntityId::new(i + 1),
            &category,
            &format,
            creator_id,
            Timestamp::from_nanos(now.as_nanos().saturating_sub(created_at_offset)),
            Some(&embedding),
        )
        .unwrap();
    }

    // Write signals
    let now = Timestamp::now();
    for i in 0..200u64 {
        let entity = EntityId::new((i % 50) + 1);
        db.signal("view", entity, 1.0, now).unwrap();
    }

    // Query without excludes
    let query_no_exclude = Retrieve::builder()
        .profile("hot")
        .limit(20)
        .build()
        .unwrap();
    let results_no_exclude = db.retrieve(&query_no_exclude).unwrap();

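    // Pick the top 3 IDs to exclude
    let exclude_ids: Vec<EntityId> = results_no_exclude
        .items
        .iter()
        .take(3)
        .map(|r| r.entity_id)
        .collect();
    // Guard against a vacuous pass when the baseline query returns too few rows
    assert_eq!(exclude_ids.len(), 3, "baseline query should return at least 3 results");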

    // Query with excludes
    let query_with_exclude = Retrieve::builder()
        .profile("hot")
        .exclude_ids(exclude_ids.clone())
        .limit(20)
        .build()
        .unwrap();
    let results_with_exclude = db.retrieve(&query_with_exclude).unwrap();

    // Verify: excluded IDs are not in results
    for item in &results_with_exclude.items {
        assert!(
            !exclude_ids.contains(&item.entity_id),
            "excluded entity {} should not appear in results",
            item.entity_id,
        );
    }

    db.shutdown().unwrap();
}

// ============================================================
// PAGINATION TEST
// ============================================================
//
// Verifies that offset-based cursor pagination works correctly
// in the absence of concurrent writes. Note: offset cursors are
// NOT stable under concurrent signal writes (the ranked list can
// shift between pages). This test only covers the non-concurrent
// case. See Cursor doc in task-01 for the full limitation note.
#[test]
fn retrieve_pagination_via_cursor() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema,
    })
    .unwrap();

    // Write 100 items
    for i in 0..100u64 {
        let (category, format, creator_id, created_at_offset) = item_metadata(i);
        let embedding = generate_embedding(i, 64);
        let now = Timestamp::now();

        db.write_item_with_metadata(
            EntityId::new(i + 1),
            &category,
            &format,
            creator_id,
            Timestamp::from_nanos(now.as_nanos().saturating_sub(created_at_offset)),
            Some(&embedding),
        )
        .unwrap();
    }

    // Write signals
    let now = Timestamp::now();
    for i in 0..500u64 {
        let entity = EntityId::new((i % 100) + 1);
        db.signal("view", entity, 1.0, now).unwrap();
    }

    // Page 1: first 10 results
    let page1_query = Retrieve::builder()
        .profile("hot")
        .limit(10)
        .build()
        .unwrap();
    let page1 = db.retrieve(&page1_query).unwrap();

    assert_eq!(page1.len(), 10, "page 1 should have 10 results");
    assert!(
        page1.next_cursor.is_some(),
        "page 1 should have a next cursor"
    );

    // Page 2: next 10 results using cursor
    let page2_query = Retrieve::builder()
        .profile("hot")
        .limit(10)
        .cursor(page1.next_cursor.unwrap())
        .build()
        .unwrap();
    let page2 = db.retrieve(&page2_query).unwrap();

    assert_eq!(page2.len(), 10, "page 2 should have 10 results");

    // Verify: no overlap between pages
    let page1_ids: Vec<EntityId> = page1.items.iter().map(|r| r.entity_id).collect();
    let page2_ids: Vec<EntityId> = page2.items.iter().map(|r| r.entity_id).collect();
    for id in &page2_ids {
        assert!(
            !page1_ids.contains(id),
            "entity {} appears on both page 1 and page 2",
            id,
        );
    }

    // Verify: page 2 ranks continue from page 1
    assert_eq!(page2.items[0].rank, 11, "page 2 should start at rank 11");

    db.shutdown().unwrap();
}

// ============================================================
// QUERY VALIDATION ERROR TEST
// ============================================================
//
// Verifies that invalid queries produce clear errors.
#[test]
fn retrieve_rejects_invalid_queries() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema,
    })
    .unwrap();

    // Unknown profile
    let unknown_profile = Retrieve::builder()
        .profile("nonexistent_profile")
        .limit(10)
        .build()
        .unwrap();
    let result = db.retrieve(&unknown_profile);
    assert!(
        matches!(result, Err(tidaldb::query::retrieve::QueryError::ProfileNotFound(_))),
        "unknown profile should return ProfileNotFound, got: {:?}",
        result,
    );

    // Limit = 0 (caught at builder level)
    let result = Retrieve::builder().profile("new").limit(0).build();
    assert!(
        matches!(result, Err(tidaldb::query::retrieve::QueryError::InvalidLimit { .. })),
        "limit=0 should return InvalidLimit"
    );

    // Limit > 500 (caught at builder level)
    let result = Retrieve::builder().profile("new").limit(501).build();
    assert!(
        matches!(result, Err(tidaldb::query::retrieve::QueryError::InvalidLimit { .. })),
        "limit=501 should return InvalidLimit"
    );

    db.shutdown().unwrap();
}

// ============================================================
// DETERMINISTIC RESULTS TEST
// ============================================================
//
// Verifies INV-QUERY-1: same query with same state produces
// identical results.
#[test]
fn retrieve_deterministic_results() {
    let dir = TempDir::new().unwrap();
    let schema = m2_schema();

    let db = TidalDB::open(Config {
        data_dir: dir.path().to_owned(),
        schema,
    })
    .unwrap();

    // Write 100 items with signals
    let now = Timestamp::now();
    for i in 0..100u64 {
        let (category, format, creator_id, created_at_offset) = item_metadata(i);
        let embedding = generate_embedding(i, 64);
        db.write_item_with_metadata(
            EntityId::new(i + 1),
            &category,
            &format,
            creator_id,
            Timestamp::from_nanos(now.as_nanos().saturating_sub(created_at_offset)),
            Some(&embedding),
        )
        .unwrap();

        db.signal("view", EntityId::new(i + 1), 1.0, now).unwrap();
        if i % 3 == 0 {
            db.signal("like", EntityId::new(i + 1), 1.0, now)
                .unwrap();
        }
    }

    let query = Retrieve::builder()
        .profile("hot")
        .limit(20)
        .build()
        .unwrap();

    let results1 = db.retrieve(&query).unwrap();
    let results2 = db.retrieve(&query).unwrap();

    assert_eq!(results1.len(), results2.len(), "result counts must match");

    for (r1, r2) in results1.items.iter().zip(results2.items.iter()) {
        assert_eq!(
            r1.entity_id, r2.entity_id,
            "entity IDs must match at rank {}",
            r1.rank,
        );
        assert!(
            (r1.score - r2.score).abs() < f64::EPSILON,
            "scores must be identical for entity {} at rank {}: {} vs {}",
            r1.entity_id,
            r1.rank,
            r1.score,
            r2.score,
        );
    }

    db.shutdown().unwrap();
}

Acceptance Criteria

milestone_2_uat test passes: all 6 queries return correctly ordered results
Query 1 (trending): results sorted descending, creator diversity enforced (max 1 per creator), ranks are 1-based sequential
Query 2 (hot + jazz filter): only jazz items returned, sorted descending by hot score
Query 3 (new): results sorted by created_at descending
Query 4 (top_week): results sorted by 7d signal-based score
Query 5 (hidden_gems): results sorted by quality/reach ratio
Query 6 (controversial): results sorted by dual-signal score
Signal burst: writing 100 "share" signals for item #500 causes it to rise in trending rank (or appear if previously absent)
Crash recovery: shutdown and reopen preserves all items, signals, and query functionality
retrieve_results_include_signal_snapshots test passes: at least some results have non-empty snapshots, all capped at 10
retrieve_excludes_specified_ids test passes: excluded IDs never appear in results
retrieve_pagination_via_cursor test passes: pages do not overlap, ranks continue correctly
retrieve_rejects_invalid_queries test passes: clear errors for unknown profile, invalid limit
retrieve_deterministic_results test passes: same query produces identical results (INV-QUERY-1)
cargo test --test m2_uat passes
No unsafe code in tests
Test data is deterministic (fixed seeds, reproducible event sequences)

Research References

docs/research/tidaldb_signal_ledger.md -- Signal write/read latencies referenced in test timing expectations
docs/research/ann_for_tidaldb.md -- ANN recall@k expectations for verifying retrieval correctness

Spec References

docs/specs/08-query-engine.md -- Section 2 (RETRIEVE operation), Section 5 (execution pipeline), Section 8 (pagination), Section 15 (invariants: INV-QUERY-1 deterministic, INV-QUERY-2 filter correctness)
docs/specs/09-ranking-scoring.md -- Section 11 (sort mode formulas verified by query ordering), Section 16 (INV-RANK-1 deterministic scoring, INV-RANK-5 diversity never reduces result count)

Implementation Notes

Signal count (10K vs ROADMAP's 100K): The ROADMAP UAT specifies 100K signal events. This test uses 10K to keep cargo test --test m2_uat under 30 seconds. 10K signals across 10K items averages 1 signal per entity, which is sparse but sufficient for correctness testing of ranking logic. For scale validation, add an #[ignore]-marked test:
```
#[test]
#[ignore = "scale test: takes 2-3 minutes, run with --ignored"]
fn milestone_2_uat_100k_signals() {
    // same as milestone_2_uat but with 100K signals
}
```
Run with: cargo test --test m2_uat -- --ignored milestone_2_uat_100k_signals
The generate_embedding function uses sin() for deterministic pseudo-random vectors. The embeddings are L2-normalized so they work correctly with USearch's cosine/L2 equivalence. Use 64 dimensions for test speed -- the trait abstraction handles any dimension.
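The cosine/L2 equivalence mentioned above can be checked directly. The following is a minimal sketch, not the helper from m2_uat.rs: the sin() argument and the dot and squared_l2 helpers are assumptions made for illustration. It shows that for unit-length vectors the squared L2 distance equals 2 * (1 - cosine similarity), so both metrics rank neighbors identically.

```rust
/// Hypothetical stand-in for the test helper: a deterministic, L2-normalized vector.
/// The sin() argument is an illustrative choice, not the file's actual formula.
fn generate_embedding(seed: u64, dim: usize) -> Vec<f32> {
    let raw: Vec<f64> = (0..dim)
        .map(|j| ((seed as f64 + 1.0) * (j as f64 + 1.0) * 0.1).sin())
        .collect();
    let norm = raw.iter().map(|x| x * x).sum::<f64>().sqrt();
    // Guard against an all-zero vector so we never divide by zero.
    let scale = if norm > 0.0 { 1.0 / norm } else { 0.0 };
    raw.iter().map(|x| (x * scale) as f32).collect()
}

/// Dot product; equals cosine similarity when both inputs have unit length.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Squared Euclidean distance between two vectors of equal length.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

#[test]
fn unit_vectors_make_l2_and_cosine_agree() {
    let a = generate_embedding(1, 64);
    let b = generate_embedding(2, 64);
    // Normalization: the vector has unit length (within f32 rounding).
    assert!((dot(&a, &a) - 1.0).abs() < 1e-5);
    // Equivalence: ||a - b||^2 == 2 * (1 - cos(a, b)) for unit vectors.
    let cosine = dot(&a, &b);
    assert!((squared_l2(&a, &b) - 2.0 * (1.0 - cosine)).abs() < 1e-4);
    // Determinism: the same seed always yields the same vector.
    assert_eq!(a, generate_embedding(1, 64));
}
```

Computing in f64 and casting to f32 at the end keeps the normalization error small. The tolerances are loose on purpose because the stored components are f32.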
The generate_signal_events function uses prime strides (7919, 104729) for reproducible distribution without a PRNG dependency. The distribution is power-law-ish: some entities get more events than others, creating interesting ranking dynamics.
The write_item_with_metadata API is a convenience wrapper expected to exist on TidalDb for M2. If it does not exist, this task must add it. It stores structured metadata (category, format, creator_id, created_at) that the bitmap/range indexes and the RETRIEVE executor can read. The exact API shape depends on how metadata is stored after m2p2 (bitmap indexes) is integrated.
The signal burst test (100 "share" signals for item #500) verifies signal freshness: a signal written during the test is reflected in the very next query. Item #500 may not appear in the top 50 before or after the burst, because the deterministic signal distribution is uneven. In that case the test falls back to verifying the signal count directly.
The crash recovery section re-verifies item count, trending query, jazz filter, and signal persistence. It does NOT require exact score-level equality with pre-crash results (decay scores advance with time, so scores computed at a later time after restart will differ slightly). It verifies functional correctness: queries work, filters apply, signals survived.
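To illustrate why the recovery check compares behavior rather than exact scores, here is a small sketch under an assumed half-life decay model. The decayed and ranked_ids functions and the sample data are hypothetical, not TidalDb APIs. Under this assumption every score shrinks a little between the pre-shutdown and post-reopen query, while the ordering is preserved as long as all items share one half-life. Profiles that mix signals with different half-lives may reorder slightly, which is another reason to assert on functional correctness only.

```rust
/// Assumed decay model for illustration only: value halves every `half_life_secs`.
fn decayed(value: f64, elapsed_secs: f64, half_life_secs: f64) -> f64 {
    value * 0.5_f64.powf(elapsed_secs / half_life_secs)
}

/// Returns item IDs sorted by descending score, ties broken by ascending ID.
fn ranked_ids(scores: &[(u64, f64)]) -> Vec<u64> {
    let mut sorted = scores.to_vec();
    sorted.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    sorted.into_iter().map(|(id, _)| id).collect()
}

#[test]
fn order_survives_a_restart_delay() {
    let half_life = 3600.0;
    // (id, raw value, age in seconds at the first query) -- made-up data.
    let items = [(1u64, 10.0, 60.0), (2, 8.0, 10.0), (3, 12.0, 900.0)];
    let score_at = |delay: f64| -> Vec<(u64, f64)> {
        items
            .iter()
            .map(|&(id, value, age)| (id, decayed(value, age + delay, half_life)))
            .collect()
    };

    let before = score_at(0.0); // query before shutdown
    let after = score_at(5.0); // same query five seconds later, after reopen

    // Every score has decayed, so exact equality would fail...
    assert!(before.iter().zip(&after).all(|(b, a)| a.1 < b.1));
    // ...but the ranking is unchanged under a shared half-life.
    assert_eq!(ranked_ids(&before), ranked_ids(&after));
}
```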
Test execution time target: < 30 seconds for the full m2_uat test. At 10K items with 64-dim embeddings and 10K signals, setup should take ~5 seconds (item writes + signal writes), and the 6 queries should each take < 100ms. If the test is too slow, reduce item count to 5K or embedding dimension to 32.
All test assertions include descriptive failure messages. A failing assertion should tell the developer exactly what went wrong and which UAT step failed.
The m2_uat.rs file is an integration test (in tidal/tests/), not a unit test (in src/). It links against the compiled crate and tests the public API exactly as a user would.
