# Task 08: Hard Negative Crash Invariant Test

## Delivers

Integration tests proving that after any crash scenario, `RETRIEVE` never returns items that the user has hidden (hard negatives) or content from creators that the user has blocked. This is the ultimate correctness invariant for crash recovery: no matter what goes wrong during a crash, the user's negative preferences are respected in query results.

The invariant under test: if a user has recorded a `hide` relationship on item X or a `block` relationship on creator C, then after any crash-and-recovery sequence, `RETRIEVE ... FOR USER @user ... FILTER unblocked, unseen` must never include item X or any item by creator C in the results.

## Complexity: M

## Dependencies

- Task 07 (M6 crash fencing -- ensures all state surfaces recover correctly, which is a prerequisite for this end-to-end invariant)

## Technical Design

### 1. Test architecture

Each test follows this pattern:

1. **Setup**: Open a persistent database. Write items with metadata (including `creator_id`). Write user relationships (hide, block). Write signals to ensure items are rankable.
2. **Verify pre-crash**: Execute `RETRIEVE` and confirm hidden/blocked items are absent.
3. **Crash simulation**: Close and reopen the database (simulating a clean restart, which is the minimal crash scenario; the property tests from tasks 02 and 07 cover unclean crashes).
4. **Verify post-crash**: Execute the same `RETRIEVE` and confirm hidden/blocked items are still absent.

### 2. Hidden items invariant

```rust
// tidal/tests/m7_crash_invariant.rs
#![allow(clippy::unwrap_used)]

use std::collections::HashMap;
use std::time::Duration;

use tidaldb::schema::{
    DecaySpec, EntityId, EntityKind, SchemaBuilder, Timestamp, Window,
};
use tidaldb::{TidalDb, TidalDbBuilder};
use tidaldb::query::retrieve::{Retrieve, RetrieveBuilder};

fn invariant_schema() -> tidaldb::schema::Schema {
    let mut builder = SchemaBuilder::new();
    let _ = builder
        .signal(
            "view",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(7 * 24 * 3600),
            },
        )
        .windows(&[Window::AllTime])
        .velocity(false)
        .add();
    let _ = builder
        .signal(
            "like",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(14 * 24 * 3600),
            },
        )
        .windows(&[Window::AllTime])
        .velocity(false)
        .add();

    // Add text fields for item metadata.
    builder.text_field("title", tidaldb::schema::TextFieldType::Text);

    // Add embedding slot for vector search (required by some profiles).
    builder.embedding_slot(EntityKind::Item, "content", 128);

    builder.build().unwrap()
}

fn write_items_with_creators(db: &TidalDb, count: u64) {
    for i in 1..=count {
        let creator_id = (i % 5) + 1; // 5 creators
        let mut meta = HashMap::new();
        meta.insert("title".to_string(), format!("Item {i}"));
        meta.insert("creator_id".to_string(), creator_id.to_string());
        meta.insert("category".to_string(), "music".to_string());
        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();

        // Write a signal so the item is rankable.
        let ts = Timestamp::from_nanos(1_000_000_000_000 + i * 1_000_000);
        db.signal("view", EntityId::new(i), 1.0, ts).unwrap();
    }
}

/// Core invariant: hidden items never appear in RETRIEVE results.
#[test]
fn hidden_items_never_returned_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;
    let hidden_item_ids: Vec<u64> = vec![3, 7, 15, 22];

    // Phase 1: Write data and hide items.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 30);

        // Write user.
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        // Hide specific items.
        for &item_id in &hidden_item_ids {
            db.hide_item(EntityId::new(user_id), EntityId::new(item_id)).unwrap();
        }

        // Verify pre-crash: hidden items are absent from results.
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();
        for item in &results.items {
            assert!(
                !hidden_item_ids.contains(&item.entity_id.as_u64()),
                "hidden item {} appeared in pre-crash RETRIEVE",
                item.entity_id.as_u64()
            );
        }

        db.close().unwrap();
    }

    // Phase 2: Reopen and verify the invariant holds.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();
        for item in &results.items {
            assert!(
                !hidden_item_ids.contains(&item.entity_id.as_u64()),
                "INVARIANT VIOLATION: hidden item {} appeared in RETRIEVE after restart",
                item.entity_id.as_u64()
            );
        }

        // Also verify via direct state check.
        // The loop body is intentionally empty for now; the binding is
        // underscore-prefixed to avoid an unused-variable warning.
        for &_item_id in &hidden_item_ids {
            // The user_state should still have the hide relationship.
            // This depends on rebuild_entity_state scanning the users engine.
        }

        db.close().unwrap();
    }
}
```

### 3. Blocked creators invariant

```rust
/// Core invariant: blocked creator content never appears in RETRIEVE results.
#[test]
fn blocked_creator_content_never_returned_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;
    let blocked_creator_id = 3u64; // creator 3

    // Phase 1: Write data and block a creator.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 30);
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        // Block creator 3.
        db.block_creator(EntityId::new(user_id), EntityId::new(blocked_creator_id))
            .unwrap();

        // Verify pre-crash: no items by creator 3 in results.
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unblocked()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();
        for item in &results.items {
            // Items with creator_id == blocked_creator_id should be absent.
            // creator_id = (item_id % 5) + 1. So items where (id % 5) + 1 == 3,
            // i.e., id % 5 == 2, have creator 3.
            let item_creator = (item.entity_id.as_u64() % 5) + 1;
            assert_ne!(
                item_creator, blocked_creator_id,
                "blocked creator's item {} appeared in pre-crash RETRIEVE",
                item.entity_id.as_u64()
            );
        }

        db.close().unwrap();
    }

    // Phase 2: Reopen and verify.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unblocked()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();
        for item in &results.items {
            let item_creator = (item.entity_id.as_u64() % 5) + 1;
            assert_ne!(
                item_creator, blocked_creator_id,
                "INVARIANT VIOLATION: blocked creator's item {} appeared after restart",
                item.entity_id.as_u64()
            );
        }

        db.close().unwrap();
    }
}
```

### 4. Combined hide + block invariant

```rust
/// Both hidden items AND blocked creators must be absent after restart.
#[test]
fn combined_hide_and_block_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 99u64;
    let hidden_items = vec![1u64, 5, 10];
    // creator 2: items where (id % 5) + 1 == 2, i.e., id % 5 == 1
    let blocked_creator = 2u64;

    // Phase 1: Write data, hide items, and block a creator.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 50);
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        for &item_id in &hidden_items {
            db.hide_item(EntityId::new(user_id), EntityId::new(item_id)).unwrap();
        }
        db.block_creator(EntityId::new(user_id), EntityId::new(blocked_creator))
            .unwrap();

        db.close().unwrap();
    }

    // Phase 2: Reopen and verify both filters together.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .filter_unblocked()
            .limit(50)
            .build();
        let results = db.retrieve(&query).unwrap();

        for item in &results.items {
            let id = item.entity_id.as_u64();
            let item_creator = (id % 5) + 1;

            assert!(
                !hidden_items.contains(&id),
                "INVARIANT VIOLATION: hidden item {id} in results after restart"
            );
            assert_ne!(
                item_creator, blocked_creator,
                "INVARIANT VIOLATION: blocked creator {blocked_creator}'s item {id} in results after restart"
            );
        }

        db.close().unwrap();
    }
}
```

### 5. Property test: random hide/block patterns

```rust
use proptest::prelude::*;

proptest! {
    #![proptest_config(ProptestConfig::with_cases(100))]

    #[test]
    fn no_phantom_items_after_restart(
        item_count in 10usize..60,
        hidden_count in 1usize..10,
        block_creator in 1u64..6,
    ) {
        let dir = tempfile::tempdir().unwrap();
        let schema = invariant_schema();
        let user_id = 42u64;

        // Hide the first `hidden_count` item IDs. Only the count is
        // randomized here, not which items are chosen.
        let hidden: Vec<u64> = (1..=hidden_count as u64).collect();

        {
            let db = TidalDb::builder()
                .with_data_dir(dir.path())
                .with_schema(schema.clone())
                .open()
                .unwrap();

            write_items_with_creators(&db, item_count as u64);
            db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

            for &h in &hidden {
                if h <= item_count as u64 {
                    db.hide_item(EntityId::new(user_id), EntityId::new(h)).unwrap();
                }
            }
            db.block_creator(EntityId::new(user_id), EntityId::new(block_creator))
                .unwrap();

            db.close().unwrap();
        }

        {
            let db = TidalDb::builder()
                .with_data_dir(dir.path())
                .with_schema(schema)
                .open()
                .unwrap();

            let query = Retrieve::builder()
                .for_user(user_id)
                .using_profile("chronological")
                .filter_unseen()
                .filter_unblocked()
                .limit(item_count as u32)
                .build();
            let results = db.retrieve(&query).unwrap();

            for item in &results.items {
                let id = item.entity_id.as_u64();
                let creator = (id % 5) + 1;

                prop_assert!(
                    !hidden.contains(&id),
                    "hidden item {id} appeared in results"
                );
                prop_assert_ne!(
                    creator, block_creator,
                    "blocked creator {block_creator}'s item {id} appeared"
                );
            }

            db.close().unwrap();
        }
    }
}
```

### 6. Hard negative leak detection

Test that hard negatives recorded via the session feedback path also survive restart. This covers the `HardNegIndex` rebuild from durable `RelationshipType::Hide` edges.

```rust
#[test]
fn hard_negatives_from_session_survive_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 20);
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        // Start a session and record negative feedback.
        let handle = db.start_session(user_id, "test-agent", "default").unwrap();

        // Signal "skip" or "dislike" which triggers hard negative.
        // The timestamp is not used yet, so it is underscore-prefixed.
        let _ts = Timestamp::now();
        // Use the skip signal (if registered) or hide_item directly.
        db.hide_item(EntityId::new(user_id), EntityId::new(5)).unwrap();
        db.hide_item(EntityId::new(user_id), EntityId::new(12)).unwrap();

        db.close_session(handle.session_id()).unwrap();
        db.close().unwrap();
    }

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        // The hard negatives should be rebuilt from the users engine
        // (RelationshipType::Hide edges).
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(20)
            .build();
        let results = db.retrieve(&query).unwrap();
        let result_ids: Vec<u64> =
            results.items.iter().map(|i| i.entity_id.as_u64()).collect();

        assert!(
            !result_ids.contains(&5),
            "hidden item 5 leaked after restart"
        );
        assert!(
            !result_ids.contains(&12),
            "hidden item 12 leaked after restart"
        );

        db.close().unwrap();
    }
}
```

## Acceptance Criteria

- [ ] `hidden_items_never_returned_after_restart`: hidden items absent from RETRIEVE after clean restart
- [ ] `blocked_creator_content_never_returned_after_restart`: blocked creator items absent after restart
- [ ] `combined_hide_and_block_after_restart`: both hidden items and blocked creator content absent
- [ ] `no_phantom_items_after_restart`: 100 proptest cases with random hide/block patterns, no invariant violations
- [ ] `hard_negatives_from_session_survive_restart`: session-recorded hard negatives persist through restart
- [ ] No test produces a false positive (the invariant is actually tested end-to-end through RETRIEVE, not just by checking internal state)
- [ ] All tests pass with `cargo test --test m7_crash_invariant`

## Test Strategy

The tests above ARE the deliverable. Key design principles:

1. **End-to-end verification**: Every invariant is verified by executing a `RETRIEVE` query through the public API. This catches bugs anywhere in the pipeline (state rebuild, filter evaluation, user state index, hard negative index).
2. **Persistent mode only**: All tests use `with_data_dir()` to exercise the full durability pipeline.
3. **Property tests for coverage**: The `no_phantom_items_after_restart` proptest varies the item count, the number of hidden items, and the blocked creator to catch edge cases in the rebuild logic (e.g., boundary conditions in `RoaringBitmap` serialization, off-by-one in entity ID casting).
4. **Explicit creator mapping**: Items are assigned to creators deterministically (`creator_id = (item_id % 5) + 1`) so the test can verify which items should be blocked without needing to read metadata.
5. **Both hide and block paths**: The tests exercise both `hide_item` (user -> item edge, `Tag::Rel` with `RelationshipType::Hide`) and `block_creator` (user -> creator edge, `RelationshipType::Blocks`). Both are rebuilt by `rebuild_entity_state` from the users engine scan.
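
The tests above assert only that forbidden items are absent, so a `RETRIEVE` that returned nothing at all would also pass. One way to tighten this is to compute the set of item IDs that remain eligible and check the results against it. The sketch below does that using the creator mapping from principle 4. The names `CREATOR_COUNT`, `creator_of`, and `expected_visible` are hypothetical helpers, not part of tidaldb, and the sketch assumes item IDs run from 1 to `item_count` as in `write_items_with_creators`.

```rust
use std::collections::BTreeSet;

/// Number of creators in the fixture (assumed from `write_items_with_creators`).
const CREATOR_COUNT: u64 = 5;

/// Deterministic creator assignment: creator_id = (item_id % 5) + 1.
fn creator_of(item_id: u64) -> u64 {
    (item_id % CREATOR_COUNT) + 1
}

/// Hypothetical oracle: the item IDs in 1..=item_count that are neither
/// hidden nor authored by the blocked creator (if any).
fn expected_visible(
    item_count: u64,
    hidden: &[u64],
    blocked_creator: Option<u64>,
) -> BTreeSet<u64> {
    (1..=item_count)
        // Drop items the user has hidden.
        .filter(|id| !hidden.contains(id))
        // Drop items whose creator is blocked.
        .filter(|id| Some(creator_of(*id)) != blocked_creator)
        .collect()
}
```

A test could then assert that every returned ID is a member of `expected_visible(...)`, and, if the chosen profile is expected to return every eligible item when the limit is at least `item_count` (an assumption to confirm against the profile's behavior), that the returned set equals it. That second check would catch a rebuild that silently filters everything.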