tidaldb/docs/planning/milestone-7/phase-1/task-08-hard-negative-crash-invariant.md
2026-02-23 22:41:16 -07:00

# Task 08: Hard Negative Crash Invariant Test
## Delivers
Integration tests proving that after any crash scenario, `RETRIEVE` never returns items that the user has hidden (hard negatives) or content from creators that the user has blocked. This is the ultimate correctness invariant for crash recovery: no matter what goes wrong during a crash, the user's negative preferences are respected in query results.
The invariant under test: if a user has recorded a `hide` relationship on item X or a `block` relationship on creator C, then after any crash-and-recovery sequence, `RETRIEVE ... FOR USER @user ... FILTER unblocked, unseen` must never include item X or any item by creator C in the results.
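Stated as a predicate over a result set, the invariant might look like the sketch below. This is an illustration only: `invariant_violations` and the `creator_of` lookup are hypothetical names, not part of the tidaldb API, and item and creator IDs are modeled as plain `u64` values.

```rust
use std::collections::HashSet;

/// Return every result ID that violates the hard negative invariant:
/// either the item itself is hidden, or its creator is blocked.
/// `creator_of` is a hypothetical lookup from item ID to creator ID.
fn invariant_violations(
    results: &[u64],
    hidden_items: &HashSet<u64>,
    blocked_creators: &HashSet<u64>,
    creator_of: impl Fn(u64) -> u64,
) -> Vec<u64> {
    results
        .iter()
        .copied()
        .filter(|&id| {
            hidden_items.contains(&id) || blocked_creators.contains(&creator_of(id))
        })
        .collect()
}
```

The invariant holds for a given `RETRIEVE` result exactly when this function returns an empty vector.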
## Complexity: M
## Dependencies
- Task 07 (M6 crash fencing -- ensures all state surfaces recover correctly, which is prerequisite for this end-to-end invariant)
## Technical Design
### 1. Test architecture
Each test follows this pattern:
1. **Setup**: Open persistent database. Write items with metadata (including `creator_id`). Write user relationships (hide, block). Write signals to ensure items are rankable.
2. **Verify pre-crash**: Execute `RETRIEVE` and confirm hidden/blocked items are absent.
3. **Crash simulation**: Close and reopen the database (simulating a clean restart, which is the minimal crash scenario; the property tests from tasks 02 and 07 cover unclean crashes).
4. **Verify post-crash**: Execute the same `RETRIEVE` and confirm hidden/blocked items are still absent.
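The four steps can be illustrated without tidaldb using a toy file-backed store. Everything here (`ToyStore`, `run_restart_check`, the one-ID-per-line file format) is a hypothetical stand-in chosen for the sketch; it only shows the shape of setup, pre-restart check, close and reopen, and post-restart check.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Toy stand-in for a persistent database: hidden IDs live in one text file.
struct ToyStore {
    path: PathBuf,
    hidden: BTreeSet<u64>,
}

impl ToyStore {
    /// Open the store, rebuilding in-memory state from disk if the file exists.
    fn open(path: &Path) -> io::Result<Self> {
        let hidden: BTreeSet<u64> = match fs::read_to_string(path) {
            Ok(text) => text
                .lines()
                .filter_map(|line| line.trim().parse::<u64>().ok())
                .collect(),
            Err(e) if e.kind() == io::ErrorKind::NotFound => BTreeSet::new(),
            Err(e) => return Err(e),
        };
        Ok(Self { path: path.to_path_buf(), hidden })
    }

    fn hide(&mut self, item: u64) {
        self.hidden.insert(item);
    }

    /// All items in 1..=count that are not hidden.
    fn retrieve(&self, count: u64) -> Vec<u64> {
        (1..=count).filter(|id| !self.hidden.contains(id)).collect()
    }

    /// Persist the hidden set, one ID per line, and drop the handle.
    fn close(self) -> io::Result<()> {
        let text: String = self.hidden.iter().map(|id| format!("{id}\n")).collect();
        fs::write(&self.path, text)
    }
}

/// Setup, pre-restart query, close and reopen, post-restart query.
fn run_restart_check(path: &Path) -> io::Result<(Vec<u64>, Vec<u64>)> {
    let hidden = [3u64, 7];
    let mut db = ToyStore::open(path)?;
    for &id in &hidden {
        db.hide(id);
    }
    let before = db.retrieve(10);
    db.close()?;

    let db = ToyStore::open(path)?;
    let after = db.retrieve(10);
    db.close()?;
    Ok((before, after))
}
```

A caller would assert that neither `before` nor `after` contains a hidden ID. The real tests below do the same against `TidalDb` and `RETRIEVE`.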
### 2. Hidden items invariant
```rust
// tidal/tests/m7_crash_invariant.rs
#![allow(clippy::unwrap_used)]

use std::collections::HashMap;
use std::time::Duration;

use tidaldb::schema::{
    DecaySpec, EntityId, EntityKind, SchemaBuilder, Timestamp, Window,
};
use tidaldb::{TidalDb, TidalDbBuilder};
use tidaldb::query::retrieve::{Retrieve, RetrieveBuilder};

fn invariant_schema() -> tidaldb::schema::Schema {
    let mut builder = SchemaBuilder::new();
    let _ = builder
        .signal(
            "view",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(7 * 24 * 3600),
            },
        )
        .windows(&[Window::AllTime])
        .velocity(false)
        .add();
    let _ = builder
        .signal(
            "like",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(14 * 24 * 3600),
            },
        )
        .windows(&[Window::AllTime])
        .velocity(false)
        .add();
    // Add text fields for item metadata.
    builder.text_field("title", tidaldb::schema::TextFieldType::Text);
    // Add embedding slot for vector search (required by some profiles).
    builder.embedding_slot(EntityKind::Item, "content", 128);
    builder.build().unwrap()
}

fn write_items_with_creators(db: &TidalDb, count: u64) {
    for i in 1..=count {
        let creator_id = (i % 5) + 1; // 5 creators
        let mut meta = HashMap::new();
        meta.insert("title".to_string(), format!("Item {i}"));
        meta.insert("creator_id".to_string(), creator_id.to_string());
        meta.insert("category".to_string(), "music".to_string());
        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();
        // Write a signal so the item is rankable.
        let ts = Timestamp::from_nanos(1_000_000_000_000 + i * 1_000_000);
        db.signal("view", EntityId::new(i), 1.0, ts).unwrap();
    }
}

/// Core invariant: hidden items never appear in RETRIEVE results.
#[test]
fn hidden_items_never_returned_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;
    let hidden_item_ids: Vec<u64> = vec![3, 7, 15, 22];

    // Phase 1: Write data and hide items.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();
        write_items_with_creators(&db, 30);
        // Write user.
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();
        // Hide specific items.
        for &item_id in &hidden_item_ids {
            db.hide_item(EntityId::new(user_id), EntityId::new(item_id)).unwrap();
        }
        // Verify pre-crash: hidden items are absent from results.
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();
        for item in &results.items {
            assert!(
                !hidden_item_ids.contains(&item.entity_id.as_u64()),
                "hidden item {} appeared in pre-crash RETRIEVE",
                item.entity_id.as_u64()
            );
        }
        db.close().unwrap();
    }

    // Phase 2: Reopen and verify the invariant holds.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();
        for item in &results.items {
            assert!(
                !hidden_item_ids.contains(&item.entity_id.as_u64()),
                "INVARIANT VIOLATION: hidden item {} appeared in RETRIEVE after restart",
                item.entity_id.as_u64()
            );
        }
        // Placeholder: a direct state check is not implemented yet. For each
        // ID in hidden_item_ids, the user_state should still have the hide
        // relationship. This depends on rebuild_entity_state scanning the
        // users engine.
        db.close().unwrap();
    }
}
```
### 3. Blocked creators invariant
```rust
/// Core invariant: blocked creator content never appears in RETRIEVE results.
#[test]
fn blocked_creator_content_never_returned_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;
    let blocked_creator_id = 3u64; // creator 3

    // Phase 1: Write data and block a creator.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();
        write_items_with_creators(&db, 30);
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();
        // Block creator 3.
        db.block_creator(EntityId::new(user_id), EntityId::new(blocked_creator_id))
            .unwrap();
        // Verify pre-crash: no items by creator 3 in results.
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unblocked()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();
        for item in &results.items {
            // Items with creator_id == blocked_creator_id should be absent.
            // creator_id = (item_id % 5) + 1. So items where (id % 5) + 1 == 3,
            // i.e., id % 5 == 2, have creator 3.
            let item_creator = (item.entity_id.as_u64() % 5) + 1;
            assert_ne!(
                item_creator, blocked_creator_id,
                "blocked creator's item {} appeared in pre-crash RETRIEVE",
                item.entity_id.as_u64()
            );
        }
        db.close().unwrap();
    }

    // Phase 2: Reopen and verify.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unblocked()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();
        for item in &results.items {
            let item_creator = (item.entity_id.as_u64() % 5) + 1;
            assert_ne!(
                item_creator, blocked_creator_id,
                "INVARIANT VIOLATION: blocked creator's item {} appeared after restart",
                item.entity_id.as_u64()
            );
        }
        db.close().unwrap();
    }
}
```
### 4. Combined hide + block invariant
```rust
/// Both hidden items AND blocked creators must be absent after restart.
#[test]
fn combined_hide_and_block_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 99u64;
    let hidden_items = vec![1u64, 5, 10];
    let blocked_creator = 2u64; // creator 2: items where (id % 5) + 1 == 2, i.e., id % 5 == 1

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();
        write_items_with_creators(&db, 50);
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();
        for &item_id in &hidden_items {
            db.hide_item(EntityId::new(user_id), EntityId::new(item_id)).unwrap();
        }
        db.block_creator(EntityId::new(user_id), EntityId::new(blocked_creator))
            .unwrap();
        db.close().unwrap();
    }

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .filter_unblocked()
            .limit(50)
            .build();
        let results = db.retrieve(&query).unwrap();
        for item in &results.items {
            let id = item.entity_id.as_u64();
            let item_creator = (id % 5) + 1;
            assert!(
                !hidden_items.contains(&id),
                "INVARIANT VIOLATION: hidden item {id} in results after restart"
            );
            assert_ne!(
                item_creator, blocked_creator,
                "INVARIANT VIOLATION: blocked creator {blocked_creator}'s item {id} in results after restart"
            );
        }
        db.close().unwrap();
    }
}
```
### 5. Property test: random hide/block patterns
```rust
use proptest::prelude::*;

proptest! {
    #![proptest_config(ProptestConfig::with_cases(100))]

    #[test]
    fn no_phantom_items_after_restart(
        item_count in 10usize..60,
        hidden_count in 1usize..10,
        block_creator in 1u64..6,
    ) {
        let dir = tempfile::tempdir().unwrap();
        let schema = invariant_schema();
        let user_id = 42u64;
        // Hide the first `hidden_count` item IDs. Only the count is random;
        // the hidden set is always the prefix 1..=hidden_count.
        let hidden: Vec<u64> = (1..=hidden_count as u64).collect();

        {
            let db = TidalDb::builder()
                .with_data_dir(dir.path())
                .with_schema(schema.clone())
                .open()
                .unwrap();
            write_items_with_creators(&db, item_count as u64);
            db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();
            for &h in &hidden {
                if h <= item_count as u64 {
                    db.hide_item(EntityId::new(user_id), EntityId::new(h)).unwrap();
                }
            }
            db.block_creator(EntityId::new(user_id), EntityId::new(block_creator))
                .unwrap();
            db.close().unwrap();
        }

        {
            let db = TidalDb::builder()
                .with_data_dir(dir.path())
                .with_schema(schema)
                .open()
                .unwrap();
            let query = Retrieve::builder()
                .for_user(user_id)
                .using_profile("chronological")
                .filter_unseen()
                .filter_unblocked()
                .limit(item_count as u32)
                .build();
            let results = db.retrieve(&query).unwrap();
            for item in &results.items {
                let id = item.entity_id.as_u64();
                let creator = (id % 5) + 1;
                prop_assert!(
                    !hidden.contains(&id),
                    "hidden item {id} appeared in results"
                );
                prop_assert_ne!(
                    creator, block_creator,
                    "blocked creator {block_creator}'s item {id} appeared"
                );
            }
            db.close().unwrap();
        }
    }
}
```
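The property test builds its hidden set as `(1..=hidden_count as u64).collect()`, so every case hides a contiguous prefix of IDs. If non-contiguous patterns are wanted, one option is to derive the subset from a random bitmask. The helper below is a sketch under that assumption: `hidden_subset_from_mask` is a hypothetical name, and the mask would come from a proptest input such as `mask in any::<u64>()`.

```rust
/// Map a 64-bit mask onto item IDs 1..=item_count: bit (id - 1) set means
/// "hide item id". Bits at or above item_count are ignored.
/// Supports item_count up to 64, which covers the 10..60 range used above.
fn hidden_subset_from_mask(item_count: u64, mask: u64) -> Vec<u64> {
    assert!(item_count <= 64, "mask only covers 64 items");
    (1..=item_count)
        .filter(|&id| (mask & (1u64 << (id - 1))) != 0)
        .collect()
}
```

Because the subset is a pure function of `item_count` and `mask`, proptest can still shrink a failing case toward a smaller mask.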
### 6. Hard negative leak detection
Test that hard negatives recorded while a session is open also survive restart. The test calls `hide_item` directly rather than going through a feedback signal, so what it covers is the `HardNegIndex` rebuild from durable `RelationshipType::Hide` edges.
```rust
#[test]
fn hard_negatives_from_session_survive_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();
        write_items_with_creators(&db, 20);
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();
        // Start a session, then record hard negatives while it is open.
        let handle = db.start_session(user_id, "test-agent", "default").unwrap();
        // invariant_schema() registers no "skip" or "dislike" signal, so the
        // hides are recorded directly with hide_item.
        db.hide_item(EntityId::new(user_id), EntityId::new(5)).unwrap();
        db.hide_item(EntityId::new(user_id), EntityId::new(12)).unwrap();
        db.close_session(handle.session_id()).unwrap();
        db.close().unwrap();
    }

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();
        // The hard negatives should be rebuilt from the users engine
        // (RelationshipType::Hide edges).
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(20)
            .build();
        let results = db.retrieve(&query).unwrap();
        let result_ids: Vec<u64> = results.items.iter().map(|i| i.entity_id.as_u64()).collect();
        assert!(
            !result_ids.contains(&5),
            "hidden item 5 leaked after restart"
        );
        assert!(
            !result_ids.contains(&12),
            "hidden item 12 leaked after restart"
        );
        db.close().unwrap();
    }
}
```
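The rebuild this test relies on can be pictured as a scan over durable edges that repopulates an in-memory index. The sketch below is a toy model, not tidaldb's actual data structures: `Rel`, `Edge`, and `rebuild_hidden_index` are hypothetical names, and the real `HardNegIndex` layout is not described in this task.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy relationship kinds, standing in for the durable edge types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Rel {
    Hide,
    Block,
}

/// One durable user -> target edge.
#[derive(Clone, Copy, Debug)]
struct Edge {
    user: u64,
    target: u64,
    rel: Rel,
}

/// Rebuild a per-user hidden-item index by scanning all durable edges
/// and keeping only the Hide edges.
fn rebuild_hidden_index(edges: &[Edge]) -> HashMap<u64, BTreeSet<u64>> {
    let mut index: HashMap<u64, BTreeSet<u64>> = HashMap::new();
    for edge in edges.iter().filter(|e| e.rel == Rel::Hide) {
        index.entry(edge.user).or_default().insert(edge.target);
    }
    index
}
```

In this model the index holds no state of its own: anything missing from the durable edges is missing after the rebuild, which is the failure mode the restart test is designed to surface.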
## Acceptance Criteria
- [ ] `hidden_items_never_returned_after_restart`: hidden items absent from RETRIEVE after clean restart
- [ ] `blocked_creator_content_never_returned_after_restart`: blocked creator items absent after restart
- [ ] `combined_hide_and_block_after_restart`: both hidden items and blocked creator content absent
- [ ] `no_phantom_items_after_restart`: 100 proptest cases with randomized item count, hidden-item count, and blocked creator, no invariant violations
- [ ] `hard_negatives_from_session_survive_restart`: session-recorded hard negatives persist through restart
- [ ] No test passes spuriously: each invariant is checked end-to-end through `RETRIEVE`, not only by inspecting internal state
- [ ] All tests pass with `cargo test --test m7_crash_invariant`
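One way a test could pass spuriously is an empty result set after restart, since every per-item assertion then holds trivially. A possible guard, sketched here with hypothetical helpers (`expected_visible`, `check_non_vacuous`) and the fixture mapping `creator_id = (item_id % 5) + 1`, is to compute which items should remain visible and require that the results are non-empty and drawn only from that set. Whether the `chronological` profile returns every visible item is not stated in this task, so the sketch checks containment rather than equality.

```rust
use std::collections::BTreeSet;

/// Creator assignment used by the test fixtures: (item_id % 5) + 1.
fn creator_of(item_id: u64) -> u64 {
    (item_id % 5) + 1
}

/// Items in 1..=item_count that are neither hidden nor by the blocked creator.
fn expected_visible(item_count: u64, hidden: &[u64], blocked_creator: u64) -> BTreeSet<u64> {
    (1..=item_count)
        .filter(|id| !hidden.contains(id) && creator_of(*id) != blocked_creator)
        .collect()
}

/// Reject an empty result set when something should be visible, and reject
/// any result that falls outside the expected visible set.
fn check_non_vacuous(results: &[u64], expected: &BTreeSet<u64>) -> Result<(), String> {
    if !expected.is_empty() && results.is_empty() {
        return Err("RETRIEVE returned nothing; the check would pass vacuously".into());
    }
    match results.iter().find(|&&id| !expected.contains(&id)) {
        Some(id) => Err(format!("unexpected item {id} in results")),
        None => Ok(()),
    }
}
```

For the combined test's fixture (50 items, hidden items 1, 5, and 10, blocked creator 2), `expected_visible` yields 38 items: ten belong to creator 2, and of the three hidden items only 5 and 10 are not already covered by the block.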
## Test Strategy
The tests above ARE the deliverable. Key design principles:
1. **End-to-end verification**: Every invariant is verified by executing a `RETRIEVE` query through the public API. This catches bugs anywhere in the pipeline (state rebuild, filter evaluation, user state index, hard negative index).
2. **Persistent mode only**: All tests use `with_data_dir()` to exercise the full durability pipeline.
3. **Property tests for coverage**: The `no_phantom_items_after_restart` proptest varies the item count, the number of hidden items, and the blocked creator to catch edge cases in the rebuild logic (e.g., boundary conditions in `RoaringBitmap` serialization, off-by-one in entity ID casting).
4. **Explicit creator mapping**: Items are assigned to creators deterministically (`creator_id = (item_id % 5) + 1`) so the test can verify which items should be blocked without needing to read metadata.
5. **Both hide and block paths**: The tests exercise both `hide_item` (user -> item edge, `Tag::Rel` with `RelationshipType::Hide`) and `block_creator` (user -> creator edge, `RelationshipType::Blocks`). Both are rebuilt by `rebuild_entity_state` from the users engine scan.
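The deterministic creator mapping from principle 4 can be written out as a small helper, which makes it easy to list the items a given block should remove. `creator_of` and `items_by_creator` are hypothetical names for this sketch; the formula itself is the one used in `write_items_with_creators`.

```rust
/// Fixture mapping from the tests: creator_id = (item_id % 5) + 1.
fn creator_of(item_id: u64) -> u64 {
    (item_id % 5) + 1
}

/// Item IDs in 1..=item_count that the fixture assigns to `creator`.
fn items_by_creator(creator: u64, item_count: u64) -> Vec<u64> {
    (1..=item_count)
        .filter(|&id| creator_of(id) == creator)
        .collect()
}
```

With 30 items, blocking creator 3 should remove items 2, 7, 12, 17, 22, and 27, matching the `id % 5 == 2` comment in the blocked-creator test.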