jordan 39ada28c6e feat: complete Milestones 2–4 — RETRIEVE query, vector index, ranking profiles, diversity, entity system, sessions

M2: RETRIEVE query pipeline with 5-stage execution (candidate → filter → score → diversify → limit),
    usearch HNSW vector index, bitmap/range/universe filters, ranking profiles with signal scoring,
    MMR diversity enforcement, and m2_uat integration tests.

M3: Entity system with typed metadata, relationship graph (follows/blocks/interactions),
    creator entities, session tracking, and m3_uat integration tests.

M4: Advanced ranking with builtin functions (freshness, trending, controversy, wilson),
    ranking executor with explain mode, query executor integration, benchmarks for
    query/ranking/vector/filters/diversity, and m4_uat integration tests.

Includes: 9 new blog posts, marketing site updates, updated roadmap, and updated vision doc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-21 16:24:48 -07:00

34 KiB


Task 02: M3 UAT Integration Test

Context

Milestone: 3 -- Personalized Ranking
Phase: m3p4 -- User State Filters + M3 UAT Integration Test
Depends On: Task 01 (user-state filters: unseen, unblocked, saved, liked, in_progress), m3p3 (personalized profiles: for_you, following, related, notification; cold-start handling; exploration budget), m3p2 (feedback loop: signal dispatch, preference vectors, interaction weights, hard negatives), m3p1 (user/creator entities, relationships, user state index)
Blocks: Nothing (this is the final deliverable of Milestone 3)
Complexity: L

Objective

Deliver the end-to-end integration test that proves the complete Milestone 3 UAT scenario from the ROADMAP. This test is the pass/fail gate for Milestone 3. It exercises every component built across m3p1--m3p4 in a single scenario:

A corpus of 10,000 items across 200 creators with embeddings
500 users with follows, blocks, and historical signal events
A for_you query returns personalized, filtered, diversity-constrained results
A following query returns chronologically-ordered items from followed creators
A related query returns semantically similar items re-ranked by user preference
A like signal atomically updates item signals, interaction weights, and preference vector
Re-executing for_you reflects the like (results shift)
A hide signal permanently excludes the hidden item
A block signal permanently excludes all items from the blocked creator
Re-executing for_you excludes the hidden item and blocked creator's items

This test is not a unit test or component test. It is a full-system acceptance test that creates a TidalDb instance, writes all test data through the public API, executes queries through the public retrieve() method, and verifies results against the UAT criteria. If this test passes, Milestone 3 is complete.

Requirements

Test Data Setup

10,000 items across 200 creators (50 items per creator)
Each item has: metadata (title, category, format, duration, created_at, creator_id), 16-dimensional embedding
200 creator entities with metadata
500 users, each following 10--30 random creators, each blocking 0--3 random creators
Signal types: view (7d decay), like (14d decay), skip (1d decay), share (3d decay), completion (30d decay)
500,000 historical signal events establishing user preference vectors and interaction weights
Ranking profiles registered: for_you, following, related, notification, trending, hot, new
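
The decay periods above are registered as exponential half-lives in build_test_schema. As a rough illustration of what a half-life implies for a single event's contribution, here is a minimal sketch. The name `decayed_weight` is hypothetical and not part of the tidaldb API, and the database's internal decay arithmetic may differ.

```rust
use std::time::Duration;

/// Hypothetical helper (not a tidaldb function): weight remaining after `age`
/// under exponential decay with the given half-life.
fn decayed_weight(weight: f64, age: Duration, half_life: Duration) -> f64 {
    // Each elapsed half-life halves the remaining weight.
    let halvings = age.as_secs_f64() / half_life.as_secs_f64();
    weight * 0.5_f64.powf(halvings)
}
```

Under this model a view (7d half-life) counts for half its weight after one week and a quarter after two.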

Note: The test uses 16-dimensional embeddings instead of 1536 for speed. The dimensionality does not affect the correctness of cosine similarity or ANN retrieval, only the semantic quality (which is irrelevant for UAT).
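
To make the dimensionality point concrete, cosine similarity is computed the same way for any vector length. The sketch below is a plain reference implementation for checking test expectations, not the index's internal code.

```rust
/// Cosine similarity between two equal-length vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```

The function never refers to the dimension beyond the length check, so 16-dimensional and 1536-dimensional inputs go through the same arithmetic.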

Test Scenario Steps

Each step corresponds to a "When" clause from the ROADMAP UAT scenario.

Step 1: For You query

RETRIEVE items FOR USER @user_42
  USING PROFILE for_you
  FILTER unseen, unblocked
  DIVERSITY max_per_creator:2
  LIMIT 50

Verify:

Returns exactly 50 results
Results are sorted by score descending
No item appears that user_42 has already viewed (seen bitmap populated from historical signals)
No item from a blocked creator appears
No hidden items appear
Max 2 items per creator in the result set
Approximately 5 items (10% exploration budget) are from creators user_42 does not follow
Items matching user_42's preference vector rank higher than random items (cosine similarity correlation)
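
Two of the checks above (score ordering and the per-creator cap) are mechanical enough to factor into small helpers. This is a sketch over a hypothetical `Row` type. The real result type returned by retrieve() may expose these fields differently.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one result row; not the tidaldb result type.
struct Row {
    creator_id: u64,
    score: f64,
}

/// True if scores never increase from one row to the next.
fn is_sorted_by_score_desc(rows: &[Row]) -> bool {
    rows.windows(2).all(|w| w[0].score >= w[1].score)
}

/// Largest number of rows that share a single creator (0 for empty input).
fn max_items_per_creator(rows: &[Row]) -> usize {
    let mut counts: HashMap<u64, usize> = HashMap::new();
    for row in rows {
        *counts.entry(row.creator_id).or_default() += 1;
    }
    counts.values().copied().max().unwrap_or(0)
}
```

With helpers like these, the Step 1 and Step 8 diversity checks reduce to asserting `max_items_per_creator(&rows) <= 2`.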

Step 2: Following query

RETRIEVE items FOR USER @user_42
  FILTER relationship:follows
  USING PROFILE following
  LIMIT 50

Verify:

All items are from creators user_42 follows
Items are ordered by created_at descending (chronological)
No items from unfollowed creators appear

Step 3: Related query

RETRIEVE items SIMILAR TO @item_500
  FOR USER @user_42
  USING PROFILE related
  FILTER unseen
  LIMIT 10

Verify:

Returns up to 10 results
@item_500 itself does not appear in results
Items already seen by user_42 are excluded
Results have semantic similarity to @item_500 (embedding distance)
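
One way to check the similarity criterion is to compare against a brute-force reference. The sketch below ranks by similarity only and ignores user preference re-ranking, so it is an oracle for the candidate set rather than for the final order. The function name and the `(id, embedding)` tuple layout are assumptions for illustration.

```rust
use std::collections::HashSet;

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Brute-force reference: ids of the `k` items most similar to `source`,
/// skipping the source itself and anything in `seen`. Assumes unit-length
/// embeddings, so the dot product equals cosine similarity.
fn related_reference(
    source: u64,
    items: &[(u64, Vec<f32>)],
    seen: &HashSet<u64>,
    k: usize,
) -> Vec<u64> {
    let src = match items.iter().find(|(id, _)| *id == source) {
        Some((_, emb)) => emb,
        None => return Vec::new(), // unknown source: nothing to compare against
    };
    let mut scored: Vec<(u64, f32)> = items
        .iter()
        .filter(|(id, _)| *id != source && !seen.contains(id))
        .map(|(id, emb)| (*id, dot(src, emb)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}
```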

Step 4: Like signal

SIGNAL like item:@item_xyz user:@user_42

Where @item_xyz is an item from a specific category/creator that user_42 has not previously engaged with heavily.

Verify:

Item signal ledger updated (like count for item_xyz increased)
Interaction weight between user_42 and creator of item_xyz increased
User_42's preference vector shifted toward item_xyz's embedding
All updates visible immediately (no eventual consistency)
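
The "preference vector shifted toward item_xyz's embedding" check can be pictured with a simple update rule. The rule below (move a fixed fraction toward the item, then renormalize) is an assumption chosen for illustration. This document does not specify the update that tidaldb actually applies.

```rust
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical update rule: move `pref` a fraction `rate` of the way toward
/// the liked item's embedding, then rescale to unit length.
fn nudge_preference(pref: &mut [f32], item: &[f32], rate: f32) {
    for (p, x) in pref.iter_mut().zip(item) {
        *p += rate * (x - *p);
    }
    let norm = dot(pref, pref).sqrt();
    if norm > 0.0 {
        for p in pref.iter_mut() {
            *p /= norm;
        }
    }
}
```

Whatever the real rule is, the verification has the same shape: similarity between the preference vector and the liked item's embedding should be higher after the signal than before.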

Step 5: Re-execute For You after like

Re-execute the same for_you query from Step 1.

Verify:

Results are different from Step 1 (the like changed the preference vector)
Items similar to item_xyz's topic/embedding rank higher than before
Items from the creator of item_xyz may appear more frequently (interaction weight increased)

Step 6: Hide signal

SIGNAL hide item:@item_999 user:@user_42

Verify:

@item_999 is marked as hidden for user_42
@item_999 will never appear in future queries for user_42

Step 7: Block signal

SIGNAL block user:@user_42 target_creator:@creator_77

Verify:

Creator_77 is blocked by user_42
All items by creator_77 are excluded from future queries for user_42

Step 8: Re-execute For You after hide and block

Re-execute the same for_you query from Step 1.

Verify:

@item_999 does not appear in results
No items from creator_77 appear in results
The preference shift from the like signal (Step 4) is still reflected
All diversity constraints still hold
Result count is still 50 (other items fill the slots)
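
The hide and block exclusions in Steps 6 through 8 amount to a per-user set-membership filter. A minimal sketch follows, assuming a hypothetical `HardNegatives` structure. The real user state index may be bitmap-based or organized differently.

```rust
use std::collections::HashSet;

/// Hypothetical per-user hard-negative state; not a tidaldb type.
#[derive(Default)]
struct HardNegatives {
    hidden_items: HashSet<u64>,
    blocked_creators: HashSet<u64>,
}

impl HardNegatives {
    /// An item may be shown only if it is not hidden and its creator is not blocked.
    fn allows(&self, item_id: u64, creator_id: u64) -> bool {
        !self.hidden_items.contains(&item_id) && !self.blocked_creators.contains(&creator_id)
    }
}
```

Applying the filter before the limit stage is what lets other items fill the freed slots, so the result count stays at 50.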

Persistence and Recovery

After all 8 steps, close the database and reopen it. Re-execute the for_you query and verify:

Hidden item (@item_999) still excluded (hard negatives survive restart)
Blocked creator (@creator_77) still excluded
Preference vector is restored (results are similar to Step 8, not Step 1)
Interaction weights are restored
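
"Similar to Step 8, not Step 1" can be made measurable by comparing result-id sets. The sketch below uses Jaccard overlap. The choice of metric and any pass threshold are assumptions and are not taken from the ROADMAP.

```rust
use std::collections::HashSet;

/// Jaccard overlap of two id lists: |A ∩ B| / |A ∪ B|.
/// Two empty lists are treated as identical (1.0).
fn jaccard(a: &[u64], b: &[u64]) -> f64 {
    let set_a: HashSet<u64> = a.iter().copied().collect();
    let set_b: HashSet<u64> = b.iter().copied().collect();
    let union = set_a.union(&set_b).count();
    if union == 0 {
        return 1.0;
    }
    set_a.intersection(&set_b).count() as f64 / union as f64
}
```

A recovery check could then assert that the overlap between the recovered results and Step 8 is at least as high as the overlap between the recovered results and Step 1.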

Cold-Start User

Create a brand-new user (user_501) with no history and execute:

RETRIEVE items FOR USER @user_501
  USING PROFILE for_you
  FILTER unseen, unblocked
  DIVERSITY max_per_creator:2
  LIMIT 50

Verify:

Returns 50 results (not zero -- cold-start handling works)
Results are ranked by population-level signals (trending, quality, recency)
No crash or error from missing preference vector
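
The cold-start requirement boils down to treating the preference vector as optional. A sketch of that fallback shape follows. The 50/50 blend and the name `blended_score` are illustrative assumptions, not the for_you profile's actual formula.

```rust
/// Hypothetical scoring shape: blend personal similarity with a population-level
/// score when a preference vector exists, otherwise use the population score alone.
fn blended_score(pref: Option<&[f32]>, item_emb: &[f32], population_score: f32) -> f32 {
    match pref {
        Some(p) => {
            let similarity: f32 = p.iter().zip(item_emb).map(|(a, b)| a * b).sum();
            0.5 * similarity + 0.5 * population_score
        }
        // Cold start: no history, so rank purely on population signals.
        None => population_score,
    }
}
```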

Performance

Each RETRIEVE query completes in < 100ms at 10K items
Signal write with user context completes in < 1ms
Database open (with 500K signal replay) completes in < 30 seconds
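
A single timed call is sensitive to scheduler noise, especially for the 1ms signal bound. One option is to assert on the median of several runs. The helper below is a sketch using only the standard library. The name `median_latency` is hypothetical.

```rust
use std::time::{Duration, Instant};

/// Runs `op` `runs` times and returns the median wall-clock duration.
fn median_latency<F: FnMut()>(runs: usize, mut op: F) -> Duration {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect();
    samples.sort_unstable();
    samples[runs / 2]
}
```

The closure passed as `op` would wrap the retrieve() or signal_with_user() call being measured.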

Technical Design

Test File

tidal/tests/
  m3_uat.rs -- Full Milestone 3 UAT integration test

Test Helpers

// === tests/m3_uat.rs ===

#![allow(clippy::unwrap_used)]

use std::collections::{HashMap, HashSet};
use std::time::Duration;
use tempfile::tempdir;

use tidaldb::db::{TidalDb, UserSignalContext};
use tidaldb::schema::{
    DecaySpec, EntityId, SchemaBuilder, Timestamp, Window,
};
use tidaldb::ranking::ScoredCandidate;

const NUM_ITEMS: u64 = 10_000;
const NUM_CREATORS: u64 = 200;
const ITEMS_PER_CREATOR: u64 = NUM_ITEMS / NUM_CREATORS; // 50
const NUM_USERS: u64 = 500;
const EMBEDDING_DIM: usize = 16;
const NUM_SIGNALS: usize = 500_000;

/// Build a schema with all required signal types.
fn build_test_schema() -> tidaldb::schema::Schema {
    let mut builder = SchemaBuilder::new();

    let _ = builder
        .signal(
            "view",
            tidaldb::schema::EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(7 * 24 * 3600),
            },
        )
        .windows(&[Window::OneHour, Window::TwentyFourHours, Window::SevenDays])
        .velocity(true)
        .add();

    let _ = builder
        .signal(
            "like",
            tidaldb::schema::EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(14 * 24 * 3600),
            },
        )
        .windows(&[Window::TwentyFourHours, Window::SevenDays])
        .velocity(false)
        .add();

    let _ = builder
        .signal(
            "skip",
            tidaldb::schema::EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(24 * 3600),
            },
        )
        .windows(&[Window::OneHour, Window::TwentyFourHours])
        .velocity(false)
        .add();

    let _ = builder
        .signal(
            "share",
            tidaldb::schema::EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(3 * 24 * 3600),
            },
        )
        .windows(&[Window::TwentyFourHours])
        .velocity(true)
        .add();

    let _ = builder
        .signal(
            "completion",
            tidaldb::schema::EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(30 * 24 * 3600),
            },
        )
        .windows(&[Window::SevenDays])
        .velocity(false)
        .add();

    builder.build().unwrap()
}

/// Generate a deterministic embedding for an item.
///
/// Items from the same creator cluster together, and items in similar
/// categories have overlapping components. This makes the ANN retrieval
/// meaningful for testing.
fn item_embedding(item_id: u64) -> Vec<f32> {
    let creator_id = item_id / ITEMS_PER_CREATOR;
    let item_within_creator = item_id % ITEMS_PER_CREATOR;
    let mut emb = vec![0.0f32; EMBEDDING_DIM];

    // Creator component: items from the same creator share a base direction.
    let creator_angle = (creator_id as f32) * std::f32::consts::TAU / (NUM_CREATORS as f32);
    emb[0] = creator_angle.cos();
    emb[1] = creator_angle.sin();

    // Category component (assume 10 categories, cycling).
    let category = (item_id % 10) as f32;
    emb[2] = (category * 0.3).sin();
    emb[3] = (category * 0.3).cos();

    // Item-specific variation.
    let item_angle = (item_within_creator as f32) * 0.1;
    emb[4] = item_angle.sin();
    emb[5] = item_angle.cos();

    // Normalize to unit length.
    let norm: f32 = emb.iter().map(|&x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for v in &mut emb {
            *v /= norm;
        }
    }
    emb
}

/// Generate metadata for an item.
fn item_metadata(item_id: u64) -> HashMap<String, String> {
    let creator_id = item_id / ITEMS_PER_CREATOR;
    let categories = ["jazz", "rock", "classical", "hip-hop", "electronic",
                      "pop", "folk", "blues", "country", "r-and-b"];
    let formats = ["video", "audio", "article"];
    let category = categories[(item_id % 10) as usize];
    let format = formats[(item_id % 3) as usize];
    let duration = 60 + (item_id % 600);  // 1min to 11min
    let created_at_offset = item_id * 3600 * 1_000_000_000; // spread items over time

    let mut meta = HashMap::new();
    meta.insert("title".into(), format!("Item {}", item_id));
    meta.insert("creator_id".into(), creator_id.to_string());
    meta.insert("category".into(), category.into());
    meta.insert("format".into(), format.into());
    meta.insert("duration".into(), duration.to_string());
    meta.insert("created_at".into(), (Timestamp::now().as_nanos() - created_at_offset).to_string());
    meta
}

/// Generate the creator_id for an item.
fn creator_for_item(item_id: u64) -> u64 {
    item_id / ITEMS_PER_CREATOR
}

/// Generate follows/blocks for a user.
///
/// Uses a deterministic pseudo-random function seeded by user_id.
fn user_relationships(user_id: u64) -> (Vec<u64>, Vec<u64>) {
    let mut follows = vec![];
    let mut blocks = vec![];

    // Follow 10-30 creators (deterministic based on user_id).
    let follow_count = 10 + (user_id % 21);
    for i in 0..follow_count {
        let creator = (user_id * 7 + i * 13) % NUM_CREATORS;
        follows.push(creator);
    }
    follows.sort_unstable();
    follows.dedup();

    // Block 0-3 creators.
    let block_count = user_id % 4;
    for i in 0..block_count {
        let creator = (user_id * 11 + i * 17 + 100) % NUM_CREATORS;
        if !follows.contains(&creator) {
            blocks.push(creator);
        }
    }

    (follows, blocks)
}

/// Generate historical signal events.
///
/// Produces NUM_SIGNALS signal events spread across 7 days for all users.
/// Each event targets a deterministically chosen item, with signal types
/// weighted toward views (4 of every 8 events).
fn generate_signals(now: Timestamp) -> Vec<(u64, &'static str, u64, f64, Timestamp)> {
    let mut events = Vec::with_capacity(NUM_SIGNALS);
    let signal_types = ["view", "view", "view", "view", "like", "skip", "completion", "share"];
    let seven_days_ns = 7 * 24 * 3600 * 1_000_000_000u64;

    for i in 0..NUM_SIGNALS {
        let user_id = (i as u64) % NUM_USERS;
        let item_id = ((i as u64) * 7 + user_id * 13) % NUM_ITEMS;
        let signal_type = signal_types[i % signal_types.len()];
        let weight = if signal_type == "completion" { 0.5 + (i % 10) as f64 * 0.05 } else { 1.0 };
        // Multiply by a per-event step: i * seven_days_ns would overflow u64.
        let offset_ns = (i as u64) * (seven_days_ns / NUM_SIGNALS as u64);
        let ts = Timestamp::from_nanos(now.as_nanos() - seven_days_ns + offset_ns);
        events.push((user_id, signal_type, item_id, weight, ts));
    }
    events
}
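
Spreading 500,000 events across a seven-day window in nanoseconds involves products that exceed u64::MAX if the event index is multiplied by the full window before dividing. The sketch below shows one way to keep the arithmetic in range with a u128 intermediate. `event_offset_ns` is a hypothetical helper, not part of the test file.

```rust
const SEVEN_DAYS_NS: u64 = 7 * 24 * 3600 * 1_000_000_000;

/// Offset of event `i` out of `n` evenly spaced events in a seven-day window.
/// The product is formed in u128, so it cannot overflow before the division.
fn event_offset_ns(i: u64, n: u64) -> u64 {
    ((i as u128 * SEVEN_DAYS_NS as u128) / n as u128) as u64
}
```

Since the window divides evenly by 500,000, multiplying the index by a precomputed per-event step gives the same offsets without leaving u64.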

Test Implementation

#[test]
fn milestone_3_uat() {
    let dir = tempdir().unwrap();
    let schema = build_test_schema();

    // ── Open database ────────────────────────────────────────
    let db = TidalDb::builder()
        .with_data_dir(dir.path())
        .with_schema(schema.clone())
        .open()
        .unwrap();

    // ── Write items ──────────────────────────────────────────
    for item_id in 0..NUM_ITEMS {
        db.write_item(
            EntityId::new(item_id),
            &item_metadata(item_id),
            // Some(item_embedding(item_id)), // embedding slot
        ).unwrap();
    }

    // ── Write user relationships ─────────────────────────────
    for user_id in 0..NUM_USERS {
        let (follows, blocks) = user_relationships(user_id);
        for &creator_id in &follows {
            db.add_relationship(
                EntityId::new(user_id),
                EntityId::new(creator_id),
                "follows",
            ).unwrap();
        }
        for &creator_id in &blocks {
            db.signal_with_user(
                "block",
                EntityId::new(creator_id), // creator_id
                1.0,
                Timestamp::now(),
                &UserSignalContext::new(EntityId::new(user_id)),
            ).unwrap();
        }
    }

    // ── Write historical signals ─────────────────────────────
    let now = Timestamp::now();
    let signals = generate_signals(now);
    for &(user_id, signal_type, item_id, weight, ts) in &signals {
        let user_ctx = UserSignalContext::new(EntityId::new(user_id));
        db.signal_with_user(signal_type, EntityId::new(item_id), weight, ts, &user_ctx)
            .unwrap();
    }

    // ── Step 1: For You query ────────────────────────────────
    let user_42 = EntityId::new(42);
    let (follows_42, blocks_42) = user_relationships(42);

    let for_you_results = db.retrieve(
        "RETRIEVE items FOR USER @42 USING PROFILE for_you \
         FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
    ).unwrap();

    // Returns 50 results.
    assert_eq!(for_you_results.len(), 50, "for_you should return 50 items");

    // Sorted by score descending.
    for w in for_you_results.windows(2) {
        assert!(w[0].score >= w[1].score,
            "results should be sorted by score: {} >= {}", w[0].score, w[1].score);
    }

    // No seen items. (User 42 has viewed items from historical signals.)
    let seen_items_42: HashSet<u64> = signals.iter()
        .filter(|(uid, sig, _, _, _)| *uid == 42 && *sig == "view")
        .map(|(_, _, iid, _, _)| *iid)
        .collect();
    for r in &for_you_results {
        assert!(!seen_items_42.contains(&r.entity_id.as_u64()),
            "seen item {} should NOT appear in for_you results", r.entity_id.as_u64());
    }

    // No blocked creators.
    for r in &for_you_results {
        if let Some(cid) = r.creator_id {
            assert!(!blocks_42.contains(&cid),
                "item from blocked creator {} should not appear", cid);
        }
    }

    // Max 2 per creator.
    let mut creator_counts: HashMap<u64, usize> = HashMap::new();
    for r in &for_you_results {
        if let Some(cid) = r.creator_id {
            *creator_counts.entry(cid).or_default() += 1;
        }
    }
    for (&creator, &count) in &creator_counts {
        assert!(count <= 2,
            "creator {} has {} items, max 2 allowed", creator, count);
    }

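    // Exploration budget: ~10% from unfollowed creators.
    let follows_set: HashSet<u64> = follows_42.iter().copied().collect();
    let exploration_count = for_you_results.iter()
        .filter(|r| r.creator_id.is_some_and(|cid| !follows_set.contains(&cid)))
        .count();
    // Allow tolerance: 2-10 out of 50 (10% +/- buffer).
    // Caveat: user_relationships(42) yields 10 follows, so max_per_creator:2 lets
    // at most 20 of the 50 results come from followed creators; this upper bound
    // can only hold if user 42's follow set is widened or the cap is relaxed.
    assert!((2..=10).contains(&exploration_count),
        "exploration budget should be ~5 items, got {}", exploration_count);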

    // ── Step 2: Following query ──────────────────────────────
    let following_results = db.retrieve(
        "RETRIEVE items FOR USER @42 FILTER relationship:follows \
         USING PROFILE following LIMIT 50"
    ).unwrap();

    // All items from followed creators.
    for r in &following_results {
        if let Some(cid) = r.creator_id {
            assert!(follows_set.contains(&cid),
                "following feed item from creator {} who is not followed", cid);
        }
    }

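    // Chronological order (created_at DESC).
    // item_metadata() sets created_at to roughly now minus item_id hours, so a
    // lower item id is a newer item and created_at DESC means ascending item ids.
    for w in following_results.windows(2) {
        assert!(w[0].entity_id.as_u64() < w[1].entity_id.as_u64(),
            "following feed out of order: item {} listed before newer item {}",
            w[0].entity_id.as_u64(), w[1].entity_id.as_u64());
    }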

    // ── Step 3: Related query ────────────────────────────────
    let source_item = EntityId::new(500);
    let related_results = db.retrieve(
        "RETRIEVE items SIMILAR TO @500 FOR USER @42 \
         USING PROFILE related FILTER unseen LIMIT 10"
    ).unwrap();

    assert!(related_results.len() <= 10);

    // Source item is excluded.
    assert!(!related_results.iter().any(|r| r.entity_id == source_item),
        "source item should not appear in related results");

    // Seen items excluded.
    for r in &related_results {
        assert!(!seen_items_42.contains(&r.entity_id.as_u64()),
            "seen item {} should not appear in related results", r.entity_id.as_u64());
    }

    // ── Step 4: Like signal ──────────────────────────────────

    // Choose an item from a category/creator that user_42 hasn't engaged with much.
    let target_item = EntityId::new(7777);
    // Creator whose interaction weight should rise; not yet asserted below.
    let _target_creator = creator_for_item(7777);
    let user_42_ctx = UserSignalContext::new(user_42);

    // Read pre-like state.
    let pre_like_score = db.read_decay_score(target_item, "like", 0).unwrap();

    db.signal_with_user("like", target_item, 1.0, Timestamp::now(), &user_42_ctx)
        .unwrap();

    // Item like signal updated.
    let post_like_score = db.read_decay_score(target_item, "like", 0).unwrap();
    assert!(post_like_score.unwrap_or(0.0) > pre_like_score.unwrap_or(0.0),
        "like should increase item signal score");

    // ── Step 5: Re-execute For You after like ────────────────
    let for_you_after_like = db.retrieve(
        "RETRIEVE items FOR USER @42 USING PROFILE for_you \
         FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
    ).unwrap();

    assert_eq!(for_you_after_like.len(), 50);
    // Results should differ from Step 1 because the preference vector shifted.
    // A single like may not change the top 50 dramatically, so only require that
    // the ordered id list is not identical.
    let step1_ids: Vec<u64> = for_you_results.iter().map(|r| r.entity_id.as_u64()).collect();
    let step5_ids: Vec<u64> = for_you_after_like.iter().map(|r| r.entity_id.as_u64()).collect();
    let overlap = step1_ids.iter().filter(|id| step5_ids.contains(id)).count();
    assert_ne!(step1_ids, step5_ids,
        "for_you should shift after the like ({} of 50 ids overlap)", overlap);

    // ── Step 6: Hide signal ──────────────────────────────────
    let hide_target = EntityId::new(999);
    db.signal_with_user("hide", hide_target, 1.0, Timestamp::now(), &user_42_ctx)
        .unwrap();

    // Immediately excluded.
    let after_hide = db.retrieve(
        "RETRIEVE items FOR USER @42 USING PROFILE for_you \
         FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
    ).unwrap();
    assert!(!after_hide.iter().any(|r| r.entity_id == hide_target),
        "hidden item 999 should never appear after hide");

    // ── Step 7: Block signal ─────────────────────────────────
    let block_creator = 77u64;
    db.signal_with_user(
        "block",
        EntityId::new(block_creator), // creator_id
        1.0,
        Timestamp::now(),
        &user_42_ctx,
    ).unwrap();

    // ── Step 8: Re-execute For You after hide and block ──────
    let for_you_final = db.retrieve(
        "RETRIEVE items FOR USER @42 USING PROFILE for_you \
         FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
    ).unwrap();

    assert_eq!(for_you_final.len(), 50, "should still return 50 results");

    // Hidden item excluded.
    assert!(!for_you_final.iter().any(|r| r.entity_id == hide_target),
        "hidden item 999 must not appear");

    // Blocked creator excluded.
    for r in &for_you_final {
        if let Some(cid) = r.creator_id {
            assert_ne!(cid, block_creator,
                "items from blocked creator 77 must not appear");
        }
    }

    // Diversity still holds.
    let mut creator_counts_final: HashMap<u64, usize> = HashMap::new();
    for r in &for_you_final {
        if let Some(cid) = r.creator_id {
            *creator_counts_final.entry(cid).or_default() += 1;
        }
    }
    for (&creator, &count) in &creator_counts_final {
        assert!(count <= 2,
            "creator {} has {} items in final results, max 2", creator, count);
    }

    // ── Persistence: Close and reopen ────────────────────────
    db.close().unwrap();

    let db2 = TidalDb::builder()
        .with_data_dir(dir.path())
        .with_schema(schema)
        .open()
        .unwrap();

    // Re-execute for_you: hard negatives survive restart.
    let for_you_recovered = db2.retrieve(
        "RETRIEVE items FOR USER @42 USING PROFILE for_you \
         FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
    ).unwrap();

    // Hidden item still excluded.
    assert!(!for_you_recovered.iter().any(|r| r.entity_id == hide_target),
        "hidden item 999 must survive restart");

    // Blocked creator still excluded.
    for r in &for_you_recovered {
        if let Some(cid) = r.creator_id {
            assert_ne!(cid, block_creator,
                "blocked creator 77 must survive restart");
        }
    }

    // Preference vector restored (results should be similar to post-like, not pre-like).
    // We cannot assert exact results, but the result set should reflect learned preferences.
    assert_eq!(for_you_recovered.len(), 50, "recovered query should return 50 results");

    // ── Cold-start user ──────────────────────────────────────
    let cold_start_results = db2.retrieve(
        "RETRIEVE items FOR USER @501 USING PROFILE for_you \
         FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
    ).unwrap();

    assert_eq!(cold_start_results.len(), 50,
        "cold-start user should get 50 results from population signals");

    // Results sorted by score.
    for w in cold_start_results.windows(2) {
        assert!(w[0].score >= w[1].score,
            "cold-start results should be sorted");
    }

    db2.close().unwrap();
}

Performance Test

#[test]
fn milestone_3_performance() {
    let dir = tempdir().unwrap();
    let schema = build_test_schema();

    let db = TidalDb::builder()
        .with_data_dir(dir.path())
        .with_schema(schema)
        .open()
        .unwrap();

    // Minimal setup: 10K items, a few users, moderate signals.
    for item_id in 0..NUM_ITEMS {
        db.write_item(EntityId::new(item_id), &item_metadata(item_id)).unwrap();
    }

    let user_ctx = UserSignalContext::new(EntityId::new(42));
    for i in 0..1_000 {
        let item_id = i % NUM_ITEMS;
        db.signal_with_user("view", EntityId::new(item_id), 1.0, Timestamp::now(), &user_ctx)
            .unwrap();
    }

    // Measure RETRIEVE latency.
    let start = std::time::Instant::now();
    let _results = db.retrieve(
        "RETRIEVE items FOR USER @42 USING PROFILE for_you \
         FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
    ).unwrap();
    let elapsed = start.elapsed();
    assert!(elapsed < Duration::from_millis(100),
        "for_you query took {:?}, should be < 100ms", elapsed);

    // Measure signal write latency.
    let start = std::time::Instant::now();
    db.signal_with_user("like", EntityId::new(42), 1.0, Timestamp::now(), &user_ctx)
        .unwrap();
    let signal_elapsed = start.elapsed();
    assert!(signal_elapsed < Duration::from_millis(1),
        "signal_with_user took {:?}, should be < 1ms", signal_elapsed);

    db.close().unwrap();
}

Critical Invariant Tests

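These are property-style tests embedded in the UAT to verify that the critical invariant (hidden items and items from blocked creators never appear in results) holds under repeated queries and across a restart.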

/// Hidden items never leak -- not once, not ever.
///
/// This test writes a batch of hide/block signals, then executes
/// many queries and verifies that no hidden or blocked item appears.
#[test]
fn hidden_and_blocked_never_leak() {
    let db = open_ephemeral_test_db_with_items(1000, 50);
    let user_ctx = UserSignalContext::new(EntityId::new(1));

    // Follow some creators.
    for cid in 0..20 {
        db.add_relationship(EntityId::new(1), EntityId::new(cid), "follows").unwrap();
    }

    // Hide 50 items.
    let hidden: Vec<u64> = (100..150).collect();
    for &iid in &hidden {
        db.signal_with_user("hide", EntityId::new(iid), 1.0, Timestamp::now(), &user_ctx)
            .unwrap();
    }

    // Block 5 creators.
    let blocked: Vec<u64> = (30..35).collect();
    for &cid in &blocked {
        db.signal_with_user("block", EntityId::new(cid), 1.0, Timestamp::now(), &user_ctx)
            .unwrap();
    }

    // Execute 100 queries.
    for _ in 0..100 {
        let results = db.retrieve(
            "RETRIEVE items FOR USER @1 USING PROFILE for_you \
             FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
        ).unwrap();

        for r in &results {
            let iid = r.entity_id.as_u64();
            assert!(!hidden.contains(&iid),
                "hidden item {} leaked into results!", iid);

            if let Some(cid) = r.creator_id {
                assert!(!blocked.contains(&cid),
                    "item from blocked creator {} leaked into results!", cid);
            }
        }
    }
}

/// Hard negatives survive a clean close and reopen (including any WAL replay on open).
#[test]
fn hard_negatives_survive_restart() {
    let dir = tempdir().unwrap();
    let schema = build_test_schema();

    // Open, write items, hide/block, close.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

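        // 200 items cover creators 0..=3 (50 items each), so the blocked creator 3
        // actually has items that must be excluded.
        for item_id in 0..200 {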
            db.write_item(EntityId::new(item_id), &item_metadata(item_id)).unwrap();
        }

        let user_ctx = UserSignalContext::new(EntityId::new(1));
        db.signal_with_user("hide", EntityId::new(42), 1.0, Timestamp::now(), &user_ctx)
            .unwrap();
        db.signal_with_user("block", EntityId::new(3), 1.0, Timestamp::now(), &user_ctx)
            .unwrap();

        db.close().unwrap();
    }

    // Reopen and verify.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        let results = db.retrieve(
            "RETRIEVE items FOR USER @1 USING PROFILE for_you \
             FILTER unseen, unblocked LIMIT 50"
        ).unwrap();

        // Item 42 must not appear (hidden).
        assert!(!results.iter().any(|r| r.entity_id.as_u64() == 42),
            "hidden item 42 must survive restart");

        // No items from creator 3 (blocked).
        for r in &results {
            if let Some(cid) = r.creator_id {
                assert_ne!(cid, 3, "blocked creator 3 must survive restart");
            }
        }

        db.close().unwrap();
    }
}

Test Strategy

This task IS the test. The deliverable is the test file itself. However, the test must be structured for maintainability:

Structure

Setup helpers: deterministic data generation functions (embeddings, metadata, relationships, signals)
Main UAT test: milestone_3_uat() exercises the full 8-step scenario from the ROADMAP
Performance test: milestone_3_performance() verifies latency bounds
Invariant tests: hidden_and_blocked_never_leak() and hard_negatives_survive_restart() verify critical safety properties
Cold-start test: embedded in the main UAT, verifies cold-start user handling

What "Pass" Means

The test passes when ALL of the following are true:

for_you returns 50 personalized, filtered, diversity-constrained results
following returns only items from followed creators in chronological order
related returns semantically similar items excluding the source and seen items
A like signal updates item signals, interaction weights, and preference vector immediately
Post-like for_you reflects the preference shift
hide permanently excludes the item for the user
block permanently excludes all items from the creator for the user
Hard negatives survive database close and reopen
Cold-start users get population-level results
No hidden or blocked content ever leaks into results (0 leaks across 100 queries)

Acceptance Criteria

milestone_3_uat test passes (full 8-step scenario)
Step 1: for_you returns 50 personalized results with diversity constraints
Step 1: no seen items, no blocked creator items, no hidden items in results
Step 1: max 2 per creator enforced
Step 1: exploration budget produces ~5 items from unfollowed creators
Step 2: following feed contains only items from followed creators
Step 2: following feed is in chronological order
Step 3: related query excludes source item and seen items
Step 4: like signal updates item signals, interaction weights, and preference vector
Step 5: post-like for_you reflects preference shift
Step 6: hidden item excluded immediately
Step 7: blocked creator's items excluded immediately
Step 8: final for_you excludes hidden item and blocked creator, diversity holds
Persistence: hard negatives survive close and reopen
Persistence: preference vector restored on reopen
Cold-start user gets 50 results from population-level signals
hidden_and_blocked_never_leak test passes (0 leaks across 100 queries)
hard_negatives_survive_restart test passes
milestone_3_performance test passes (retrieve < 100ms, signal < 1ms)
All tests run in < 60 seconds total (not per-test)
cargo clippy -- -D warnings passes
No #[ignore] attributes on UAT tests
Test uses deterministic data generation (reproducible across runs)

Research References

ROADMAP.md -- M3 UAT Scenario (the authoritative scenario this test implements)
VISION.md -- End-state query, design principles
USE_CASES.md -- UC-01 (For You), UC-04 (Following), UC-05 (Related)
SEQUENCE.md -- Core Feedback Loop, For You Feed, Following Feed

Implementation Notes

The test uses 16-dimensional embeddings for speed. At 10K items, 16 dimensions is sufficient to verify ANN retrieval and cosine similarity behavior. The full 1536 dimensions would be used in production benchmarks, not UAT.
Deterministic data generation ensures the test is reproducible. There are no random seeds and no system time dependencies in data setup, except for the "now" timestamp used for signal timestamps and item created_at values. That exception is acceptable because signal decay and chronological ordering are both relative.
The generate_signals function distributes signals across users and items with a bias toward views: 4 of every 8 events are views, and like, skip, completion, and share account for 1 each. This produces realistic signal distributions where most items have views but fewer have likes.
The embedding generation clusters items by creator (shared base direction) and category (shared component). This ensures ANN retrieval produces meaningful clusters for testing personalization.
The exploration budget assertion uses a wide tolerance (2--10 out of 50) because exploration candidates are selected with some randomness. The exact count depends on the corpus and the user's follow set.
The "results differ after like" assertion is deliberately loose. A single like to a 16-dim embedding may not dramatically change the top-50, especially if the user already has a strong preference vector from 500K signals. The assertion checks that the result set is not bit-for-bit identical, not that it is dramatically different.
The performance test uses a smaller signal count (1K instead of 500K) to keep the test fast. The latency assertion (< 100ms) is generous for a test environment. Production benchmarks with Criterion would use tighter bounds.
The open_ephemeral_test_db_with_items helper creates an in-memory TidalDb with the given number of items and creators. This helper must be defined in a shared test utilities module.
This test file imports from the tidaldb crate's public API only. It does not use pub(crate) internals. If the test cannot be written against the public API, the public API is incomplete and must be extended as part of this task.
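
The exploration tolerance interacts with the diversity cap. If every result carries a creator id and the cap is enforced strictly, the number of results from unfollowed creators has a floor that depends on how many creators the user follows. The helper below is a hypothetical sketch for sanity-checking the tolerance against the generated data.

```rust
/// Minimum number of results that must come from unfollowed creators when at most
/// `max_per_creator` results per creator are allowed and the user follows
/// `follow_count` creators.
fn min_unfollowed_results(limit: usize, follow_count: usize, max_per_creator: usize) -> usize {
    limit.saturating_sub(follow_count * max_per_creator)
}
```

For user_42, user_relationships gives 10 + (42 % 21) = 10 follows, so a 50-item page capped at 2 per creator has a floor of 30 unfollowed-creator items. A user following 30 creators has a floor of 0.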
