# Task 08: Hard Negative Crash Invariant Test

## Delivers

Integration tests proving that after any crash scenario, `RETRIEVE` never returns items that the user has hidden (hard negatives) or content from creators that the user has blocked. This is the ultimate correctness invariant for crash recovery: no matter what goes wrong during a crash, the user's negative preferences are respected in query results.

The invariant under test: if a user has recorded a `hide` relationship on item X or a `block` relationship on creator C, then after any crash-and-recovery sequence, `RETRIEVE ... FOR USER @user ... FILTER unblocked, unseen` must never include item X or any item by creator C in the results.

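
The invariant can also be written as a predicate over a result set. The sketch below is illustrative only and is not part of the TidalDB API: `ReturnedItem` and `invariant_violations` are hypothetical names, and the result set is reduced to plain item and creator IDs.

```rust
use std::collections::HashSet;

/// One returned item, reduced to the two fields the invariant cares about.
/// Hypothetical type for illustration; not a TidalDB type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReturnedItem {
    pub item_id: u64,
    pub creator_id: u64,
}

/// Returns the IDs of every returned item that violates the invariant:
/// the item itself is hidden, or its creator is blocked.
/// An empty vector means the invariant holds for this result set.
pub fn invariant_violations(
    results: &[ReturnedItem],
    hidden_items: &HashSet<u64>,
    blocked_creators: &HashSet<u64>,
) -> Vec<u64> {
    results
        .iter()
        .filter(|r| {
            hidden_items.contains(&r.item_id) || blocked_creators.contains(&r.creator_id)
        })
        .map(|r| r.item_id)
        .collect()
}
```

A test would assert that this returns an empty vector both before and after the restart.
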
## Complexity: M

## Dependencies

- Task 07 (M6 crash fencing; ensures all state surfaces recover correctly, which is a prerequisite for this end-to-end invariant)

## Technical Design

### 1. Test architecture

Each test follows this pattern:

1. **Setup**: Open a persistent database. Write items with metadata (including `creator_id`). Write user relationships (hide, block). Write signals so that items are rankable.
2. **Verify pre-crash**: Execute `RETRIEVE` and confirm hidden/blocked items are absent.
3. **Crash simulation**: Close and reopen the database. This simulates a clean restart, which is the minimal crash scenario; the property tests from tasks 02 and 07 cover unclean crashes.
4. **Verify post-crash**: Execute the same `RETRIEVE` and confirm hidden/blocked items are still absent.

The hidden-items and blocked-creators tests run all four steps. The combined, property, and session tests omit step 2 and verify only after the restart.

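
The four steps can be sketched independently of TidalDB. The example below is a minimal stand-in under stated assumptions: `ToyStore` is a hypothetical file-backed set of hidden item IDs (one ID per line), not the real database, and `run_restart_check` is a hypothetical helper that returns the result sets seen before and after the reopen.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the database: hidden item IDs persisted
/// as one ID per line. This is not TidalDB.
pub struct ToyStore {
    path: PathBuf,
    hidden: BTreeSet<u64>,
}

impl ToyStore {
    /// Opens the store, rebuilding in-memory state from the file if it exists.
    pub fn open(path: &Path) -> io::Result<Self> {
        let hidden: BTreeSet<u64> = match fs::read_to_string(path) {
            Ok(text) => text
                .lines()
                .filter_map(|line| line.trim().parse::<u64>().ok())
                .collect(),
            Err(e) if e.kind() == io::ErrorKind::NotFound => BTreeSet::new(),
            Err(e) => return Err(e),
        };
        Ok(Self { path: path.to_path_buf(), hidden })
    }

    /// Records a hide and rewrites the file immediately.
    pub fn hide(&mut self, item: u64) -> io::Result<()> {
        self.hidden.insert(item);
        let text: String = self.hidden.iter().map(|id| format!("{id}\n")).collect();
        fs::write(&self.path, text)
    }

    /// Returns every item ID in `1..=count` that is not hidden.
    pub fn retrieve(&self, count: u64) -> Vec<u64> {
        (1..=count).filter(|id| !self.hidden.contains(id)).collect()
    }
}

/// Runs setup, pre-crash verify, restart, and post-crash verify.
/// Returns the result sets observed before and after the reopen.
pub fn run_restart_check(
    path: &Path,
    hidden: &[u64],
    count: u64,
) -> io::Result<(Vec<u64>, Vec<u64>)> {
    let before = {
        // Step 1: setup.
        let mut store = ToyStore::open(path)?;
        for &item in hidden {
            store.hide(item)?;
        }
        // Step 2: query before the restart.
        store.retrieve(count)
        // Step 3: the store is dropped at the end of this block. Nothing is
        // buffered, so this models a clean close.
    };
    // Step 4: reopen from disk and run the same query.
    let after = ToyStore::open(path)?.retrieve(count);
    Ok((before, after))
}
```

The scoped block mirrors the `{ ... }` phases in the tests below: the first handle must be gone before the second one opens the same path.
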
### 2. Hidden items invariant

```rust
// tidal/tests/m7_crash_invariant.rs

#![allow(clippy::unwrap_used)]

use std::collections::HashMap;
use std::time::Duration;

use tidaldb::schema::{
    DecaySpec, EntityId, EntityKind, SchemaBuilder, Timestamp, Window,
};
use tidaldb::{TidalDb, TidalDbBuilder};
use tidaldb::query::retrieve::{Retrieve, RetrieveBuilder};

/// Schema shared by every test in this file: two decayed item signals,
/// a title text field, and one embedding slot.
fn invariant_schema() -> tidaldb::schema::Schema {
    let mut builder = SchemaBuilder::new();
    let _ = builder
        .signal(
            "view",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(7 * 24 * 3600),
            },
        )
        .windows(&[Window::AllTime])
        .velocity(false)
        .add();
    let _ = builder
        .signal(
            "like",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(14 * 24 * 3600),
            },
        )
        .windows(&[Window::AllTime])
        .velocity(false)
        .add();
    // Add text fields for item metadata.
    builder.text_field("title", tidaldb::schema::TextFieldType::Text);
    // Add embedding slot for vector search (required by some profiles).
    builder.embedding_slot(EntityKind::Item, "content", 128);
    builder.build().unwrap()
}

/// Writes items 1..=count, assigning each to one of 5 creators via
/// `(i % 5) + 1`, and records one `view` signal per item.
fn write_items_with_creators(db: &TidalDb, count: u64) {
    for i in 1..=count {
        let creator_id = (i % 5) + 1; // 5 creators
        let mut meta = HashMap::new();
        meta.insert("title".to_string(), format!("Item {i}"));
        meta.insert("creator_id".to_string(), creator_id.to_string());
        meta.insert("category".to_string(), "music".to_string());
        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();

        // Write a signal so the item is rankable.
        let ts = Timestamp::from_nanos(1_000_000_000_000 + i * 1_000_000);
        db.signal("view", EntityId::new(i), 1.0, ts).unwrap();
    }
}

/// Core invariant: hidden items never appear in RETRIEVE results.
#[test]
fn hidden_items_never_returned_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;
    let hidden_item_ids: Vec<u64> = vec![3, 7, 15, 22];

    // Phase 1: Write data and hide items.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 30);

        // Write user.
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        // Hide specific items.
        for &item_id in &hidden_item_ids {
            db.hide_item(EntityId::new(user_id), EntityId::new(item_id)).unwrap();
        }

        // Verify pre-crash: hidden items are absent from results.
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();
        for item in &results.items {
            assert!(
                !hidden_item_ids.contains(&item.entity_id.as_u64()),
                "hidden item {} appeared in pre-crash RETRIEVE",
                item.entity_id.as_u64()
            );
        }

        db.close().unwrap();
    }

    // Phase 2: Reopen and verify the invariant holds.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();

        for item in &results.items {
            assert!(
                !hidden_item_ids.contains(&item.entity_id.as_u64()),
                "INVARIANT VIOLATION: hidden item {} appeared in RETRIEVE after restart",
                item.entity_id.as_u64()
            );
        }

        // A direct state check is left as a placeholder: the user state should
        // still hold the hide relationship for every ID in hidden_item_ids.
        // This depends on rebuild_entity_state scanning the users engine.

        db.close().unwrap();
    }
}
```

### 3. Blocked creators invariant

```rust
/// Core invariant: blocked creator content never appears in RETRIEVE results.
#[test]
fn blocked_creator_content_never_returned_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;
    let blocked_creator_id = 3u64; // creator 3

    // Phase 1: Write data and block a creator.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 30);

        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        // Block creator 3.
        db.block_creator(EntityId::new(user_id), EntityId::new(blocked_creator_id))
            .unwrap();

        // Verify pre-crash: no items by creator 3 in results.
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unblocked()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();

        for item in &results.items {
            // Items with creator_id == blocked_creator_id should be absent.
            // creator_id = (item_id % 5) + 1. So items where (id % 5) + 1 == 3,
            // i.e., id % 5 == 2, have creator 3.
            let item_creator = (item.entity_id.as_u64() % 5) + 1;
            assert_ne!(
                item_creator, blocked_creator_id,
                "blocked creator's item {} appeared in pre-crash RETRIEVE",
                item.entity_id.as_u64()
            );
        }

        db.close().unwrap();
    }

    // Phase 2: Reopen and verify.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unblocked()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();

        for item in &results.items {
            let item_creator = (item.entity_id.as_u64() % 5) + 1;
            assert_ne!(
                item_creator, blocked_creator_id,
                "INVARIANT VIOLATION: blocked creator's item {} appeared after restart",
                item.entity_id.as_u64()
            );
        }

        db.close().unwrap();
    }
}
```

### 4. Combined hide + block invariant

```rust
/// Both hidden items AND blocked creators must be absent after restart.
#[test]
fn combined_hide_and_block_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 99u64;
    let hidden_items = vec![1u64, 5, 10];
    let blocked_creator = 2u64; // creator 2: items where (id % 5) + 1 == 2, i.e., id % 5 == 1

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 50);
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        for &item_id in &hidden_items {
            db.hide_item(EntityId::new(user_id), EntityId::new(item_id)).unwrap();
        }
        db.block_creator(EntityId::new(user_id), EntityId::new(blocked_creator))
            .unwrap();

        db.close().unwrap();
    }

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .filter_unblocked()
            .limit(50)
            .build();
        let results = db.retrieve(&query).unwrap();

        for item in &results.items {
            let id = item.entity_id.as_u64();
            let item_creator = (id % 5) + 1;

            assert!(
                !hidden_items.contains(&id),
                "INVARIANT VIOLATION: hidden item {id} in results after restart"
            );
            assert_ne!(
                item_creator, blocked_creator,
                "INVARIANT VIOLATION: blocked creator {blocked_creator}'s item {id} in results after restart"
            );
        }

        db.close().unwrap();
    }
}
```

### 5. Property test: random hide/block patterns

```rust
use proptest::prelude::*;

proptest! {
    #![proptest_config(ProptestConfig::with_cases(100))]

    #[test]
    fn no_phantom_items_after_restart(
        item_count in 10usize..60,
        // Random subset of candidate item IDs to hide (1 to 9 distinct IDs).
        hidden in proptest::collection::btree_set(1u64..60, 1..10),
        block_creator in 1u64..6,
    ) {
        let dir = tempfile::tempdir().unwrap();
        let schema = invariant_schema();
        let user_id = 42u64;

        {
            let db = TidalDb::builder()
                .with_data_dir(dir.path())
                .with_schema(schema.clone())
                .open()
                .unwrap();

            write_items_with_creators(&db, item_count as u64);
            db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

            // Hide only the chosen IDs that were actually written.
            for &h in &hidden {
                if h <= item_count as u64 {
                    db.hide_item(EntityId::new(user_id), EntityId::new(h)).unwrap();
                }
            }
            db.block_creator(EntityId::new(user_id), EntityId::new(block_creator))
                .unwrap();

            db.close().unwrap();
        }

        {
            let db = TidalDb::builder()
                .with_data_dir(dir.path())
                .with_schema(schema)
                .open()
                .unwrap();

            let query = Retrieve::builder()
                .for_user(user_id)
                .using_profile("chronological")
                .filter_unseen()
                .filter_unblocked()
                .limit(item_count as u32)
                .build();
            let results = db.retrieve(&query).unwrap();

            for item in &results.items {
                let id = item.entity_id.as_u64();
                let creator = (id % 5) + 1;

                prop_assert!(
                    !hidden.contains(&id),
                    "hidden item {id} appeared in results"
                );
                prop_assert_ne!(
                    creator, block_creator,
                    "blocked creator {block_creator}'s item {id} appeared"
                );
            }

            db.close().unwrap();
        }
    }
}
```

### 6. Hard negative leak detection

Test that hard negatives recorded via the session feedback path also survive restart. This covers the `HardNegIndex` rebuild from durable `RelationshipType::Hide` edges. The test schema registers only `view` and `like`, so the test records the hard negatives with `hide_item` while a session is open; if a `skip` signal is registered, it can be used instead.

```rust
#[test]
fn hard_negatives_from_session_survive_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 20);
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        // Start a session and record negative feedback while it is open.
        let handle = db.start_session(user_id, "test-agent", "default").unwrap();

        // No "skip" or "dislike" signal is registered in the test schema,
        // so record the hard negatives with hide_item directly.
        db.hide_item(EntityId::new(user_id), EntityId::new(5)).unwrap();
        db.hide_item(EntityId::new(user_id), EntityId::new(12)).unwrap();

        db.close_session(handle.session_id()).unwrap();
        db.close().unwrap();
    }

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        // The hard negatives should be rebuilt from the users engine
        // (RelationshipType::Hide edges).
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(20)
            .build();
        let results = db.retrieve(&query).unwrap();

        let result_ids: Vec<u64> = results.items.iter().map(|i| i.entity_id.as_u64()).collect();
        assert!(
            !result_ids.contains(&5),
            "hidden item 5 leaked after restart"
        );
        assert!(
            !result_ids.contains(&12),
            "hidden item 12 leaked after restart"
        );

        db.close().unwrap();
    }
}
```

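
To make the rebuild path concrete, here is a minimal sketch of reconstructing in-memory negative-preference state by replaying durable edges. It is an assumption-laden illustration, not the actual `HardNegIndex` or `rebuild_entity_state` code: `Rel`, `Edge`, `NegativeIndex`, and `rebuild` are hypothetical names, and the edges are modeled as plain `(user, kind, target)` tuples.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical edge kinds, named after the relationship types in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rel {
    Hide,
    Blocks,
}

/// A durable edge as it might come back from a scan: (user, kind, target).
pub type Edge = (u64, Rel, u64);

/// In-memory negative-preference state for all users (hypothetical layout).
#[derive(Debug, Default)]
pub struct NegativeIndex {
    /// user ID -> hidden item IDs
    pub hidden_items: HashMap<u64, HashSet<u64>>,
    /// user ID -> blocked creator IDs
    pub blocked_creators: HashMap<u64, HashSet<u64>>,
}

/// Rebuilds the index from scratch by replaying every durable edge.
/// Duplicate edges are harmless because the targets are stored in sets.
pub fn rebuild(edges: &[Edge]) -> NegativeIndex {
    let mut index = NegativeIndex::default();
    for &(user, rel, target) in edges {
        let map = match rel {
            Rel::Hide => &mut index.hidden_items,
            Rel::Blocks => &mut index.blocked_creators,
        };
        map.entry(user).or_default().insert(target);
    }
    index
}
```

Because the index is derived entirely from the edge list, nothing in memory at close time has to survive the restart; the test above checks that the real rebuild has the same property.
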
## Acceptance Criteria

- [ ] `hidden_items_never_returned_after_restart`: hidden items absent from RETRIEVE after clean restart
- [ ] `blocked_creator_content_never_returned_after_restart`: blocked creator items absent after restart
- [ ] `combined_hide_and_block_after_restart`: both hidden items and blocked creator content absent
- [ ] `no_phantom_items_after_restart`: 100 proptest cases with random hide/block patterns, no invariant violations
- [ ] `hard_negatives_from_session_survive_restart`: session-recorded hard negatives persist through restart
- [ ] No test produces a false positive (the invariant is actually tested end-to-end through RETRIEVE, not just by checking internal state)
- [ ] All tests pass with `cargo test --test m7_crash_invariant`

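
One way a test could pass without testing anything is if `RETRIEVE` returned no items at all: an absence-only assertion is then vacuously true. The sketch below shows one possible guard, comparing the returned IDs with the exact set the invariant allows. `check_exact` is a hypothetical helper, and it assumes the query limit is at least the item count and that no other filter removes items.

```rust
use std::collections::BTreeSet;

/// Compares a result set with the exact set the invariant allows.
/// `leaked` lists IDs that were returned but not expected; `missing` lists
/// IDs that were expected but not returned. An empty result set therefore
/// fails whenever at least one item should have been returned.
pub fn check_exact(
    returned: &[u64],
    all_items: &[u64],
    excluded: &BTreeSet<u64>,
) -> Result<(), String> {
    let returned: BTreeSet<u64> = returned.iter().copied().collect();
    let expected: BTreeSet<u64> = all_items
        .iter()
        .copied()
        .filter(|id| !excluded.contains(id))
        .collect();

    let leaked: Vec<u64> = returned.difference(&expected).copied().collect();
    let missing: Vec<u64> = expected.difference(&returned).copied().collect();

    if leaked.is_empty() && missing.is_empty() {
        Ok(())
    } else {
        Err(format!("leaked: {leaked:?}, missing: {missing:?}"))
    }
}
```

A weaker guard with the same intent is to assert that the result set is non-empty before checking for absences.
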
## Test Strategy

The tests above ARE the deliverable. Key design principles:

1. **End-to-end verification**: Every invariant is verified by executing a `RETRIEVE` query through the public API. This catches bugs anywhere in the pipeline (state rebuild, filter evaluation, user state index, hard negative index).

2. **Persistent mode only**: All tests use `with_data_dir()` to exercise the full durability pipeline.

3. **Property tests for coverage**: The `no_phantom_items_after_restart` proptest generates random combinations of hidden items and blocked creators to catch edge cases in the rebuild logic (e.g., boundary conditions in `RoaringBitmap` serialization, off-by-one in entity ID casting).

4. **Explicit creator mapping**: Items are assigned to creators deterministically (`creator_id = (item_id % 5) + 1`) so the test can verify which items should be blocked without needing to read metadata.

5. **Both hide and block paths**: The tests exercise both `hide_item` (user -> item edge, `Tag::Rel` with `RelationshipType::Hide`) and `block_creator` (user -> creator edge, `RelationshipType::Blocks`). Both are rebuilt by `rebuild_entity_state` from the users engine scan.
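
The creator mapping in principle 4 is simple enough to work through by hand. The sketch below restates the formula from `write_items_with_creators` as standalone functions; `creator_of` and `items_by_creator` are hypothetical helper names used only for this illustration.

```rust
/// Creator assigned to an item by the test helper: `(item_id % 5) + 1`.
pub fn creator_of(item_id: u64) -> u64 {
    (item_id % 5) + 1
}

/// All item IDs in `1..=count` that belong to `creator`.
pub fn items_by_creator(creator: u64, count: u64) -> Vec<u64> {
    (1..=count).filter(|&id| creator_of(id) == creator).collect()
}
```

With 30 items, creator 3 owns items 2, 7, 12, 17, 22, and 27. With 50 items, creator 2 owns items 1, 6, 11, and so on up to 46, so in the combined test item 1 is excluded twice: it is hidden and its creator is blocked.
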