2026-02-23 22:41:16 -07:00

Task 08: Hard Negative Crash Invariant Test

Delivers

Integration tests proving that after any crash scenario, RETRIEVE never returns items that the user has hidden (hard negatives) or content from creators that the user has blocked. This is the ultimate correctness invariant for crash recovery: no matter what goes wrong during a crash, the user's negative preferences are respected in query results.

The invariant under test: if a user has recorded a hide relationship on item X or a block relationship on creator C, then after any crash-and-recovery sequence, RETRIEVE ... FOR USER @user ... FILTER unblocked, unseen must never include item X or any item by creator C in the results.
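The invariant can be expressed as a pure predicate over the item IDs a RETRIEVE returns. The sketch below is a hypothetical, standalone helper and is not part of the tidaldb API. It also rejects an empty result set, because an absence check over zero items passes trivially. `creator_of` stands in for whatever metadata lookup a real check would use; the tests in this task derive the creator as `(item_id % 5) + 1`.

```rust
use std::collections::HashSet;

/// Hypothetical helper: checks the hard-negative invariant over the item IDs
/// returned by a RETRIEVE query. On failure, returns a message naming the
/// offending IDs. `creator_of` maps an item ID to its creator ID (an assumption).
fn check_hard_negatives(
    result_ids: &[u64],
    hidden: &HashSet<u64>,
    blocked_creators: &HashSet<u64>,
    creator_of: impl Fn(u64) -> u64,
) -> Result<(), String> {
    // An empty result set would make the absence checks below pass vacuously.
    if result_ids.is_empty() {
        return Err("empty result set: the invariant check would pass vacuously".to_string());
    }
    // Collect every returned item that is hidden or authored by a blocked creator.
    let violations: Vec<u64> = result_ids
        .iter()
        .copied()
        .filter(|&id| hidden.contains(&id) || blocked_creators.contains(&creator_of(id)))
        .collect();
    if violations.is_empty() {
        Ok(())
    } else {
        Err(format!("invariant violated by items {violations:?}"))
    }
}
```

With the task's creator mapping, a result set of `[1, 2, 4]` with creator 3 blocked fails on item 2, since `(2 % 5) + 1 == 3`.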

Complexity: M

Dependencies

  • Task 07 (M6 crash fencing -- ensures all state surfaces recover correctly, which is prerequisite for this end-to-end invariant)

Technical Design

1. Test architecture

Each test follows this pattern:

  1. Setup: Open a persistent database. Write items with metadata (including creator_id). Write user relationships (hide, block). Write signals so that the items are rankable.
  2. Verify pre-crash: Execute RETRIEVE and confirm hidden/blocked items are absent.
  3. Crash simulation: Close and reopen the database (simulating a clean restart, which is the minimal crash scenario; the property tests from tasks 02 and 07 cover unclean crashes).
  4. Verify post-crash: Execute the same RETRIEVE and confirm hidden/blocked items are still absent.
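The four-step pattern above can be illustrated without tidaldb. `ToyStore` below is a hypothetical stand-in, not part of tidaldb. Hide and block edges are appended to a log and fsynced before they are acknowledged. Reopening rebuilds the in-memory filter sets by replaying that log, which loosely mirrors how the real tests rely on state being rebuilt at open. Dropping the store and calling `open` again plays the role of close-and-reopen.

```rust
use std::collections::HashSet;
use std::fs::{File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::{Path, PathBuf};

/// Hypothetical toy store: durable edge log plus in-memory filter sets.
struct ToyStore {
    log_path: PathBuf,
    hidden: HashSet<u64>,           // hidden item IDs
    blocked_creators: HashSet<u64>, // blocked creator IDs
}

impl ToyStore {
    /// Open (or reopen) the store, rebuilding state by replaying the log.
    fn open(dir: &Path) -> io::Result<Self> {
        let mut store = ToyStore {
            log_path: dir.join("edges.log"),
            hidden: HashSet::new(),
            blocked_creators: HashSet::new(),
        };
        if let Ok(file) = File::open(&store.log_path) {
            for line in BufReader::new(file).lines() {
                let line = line?;
                // Records look like "hide 7" or "block 3"; malformed lines are skipped.
                if let Some((kind, id)) = line.split_once(' ') {
                    if let Ok(id) = id.parse::<u64>() {
                        match kind {
                            "hide" => { store.hidden.insert(id); }
                            "block" => { store.blocked_creators.insert(id); }
                            _ => {}
                        }
                    }
                }
            }
        }
        Ok(store)
    }

    /// Append an edge, fsync it, then update the in-memory set.
    fn record(&mut self, kind: &str, id: u64) -> io::Result<()> {
        if kind != "hide" && kind != "block" {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "unknown edge kind"));
        }
        let mut f = OpenOptions::new().create(true).append(true).open(&self.log_path)?;
        writeln!(f, "{kind} {id}")?;
        f.sync_all()?; // durable before the edge is acknowledged
        if kind == "hide" {
            self.hidden.insert(id);
        } else {
            self.blocked_creators.insert(id);
        }
        Ok(())
    }

    /// Stand-in for RETRIEVE ... FILTER unblocked, unseen over items 1..=count,
    /// using the tests' creator mapping (id % 5) + 1.
    fn retrieve(&self, count: u64) -> Vec<u64> {
        (1..=count)
            .filter(|id| !self.hidden.contains(id))
            .filter(|id| !self.blocked_creators.contains(&((id % 5) + 1)))
            .collect()
    }
}
```

Because the filter sets are derived only from the log, anything acknowledged before the drop is present again after `open`. That is the same property the real tests check through RETRIEVE.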

2. Hidden items invariant

// tidal/tests/m7_crash_invariant.rs

#![allow(clippy::unwrap_used)]

use std::collections::HashMap;
use std::time::Duration;

use tidaldb::schema::{
    DecaySpec, EntityId, EntityKind, SchemaBuilder, Timestamp, Window,
};
use tidaldb::TidalDb;
use tidaldb::query::retrieve::Retrieve;

fn invariant_schema() -> tidaldb::schema::Schema {
    let mut builder = SchemaBuilder::new();
    let _ = builder
        .signal(
            "view",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(7 * 24 * 3600),
            },
        )
        .windows(&[Window::AllTime])
        .velocity(false)
        .add();
    let _ = builder
        .signal(
            "like",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(14 * 24 * 3600),
            },
        )
        .windows(&[Window::AllTime])
        .velocity(false)
        .add();
    // Add text fields for item metadata.
    builder.text_field("title", tidaldb::schema::TextFieldType::Text);
    // Add embedding slot for vector search (required by some profiles).
    builder.embedding_slot(EntityKind::Item, "content", 128);
    builder.build().unwrap()
}

fn write_items_with_creators(db: &TidalDb, count: u64) {
    for i in 1..=count {
        let creator_id = (i % 5) + 1; // 5 creators
        let mut meta = HashMap::new();
        meta.insert("title".to_string(), format!("Item {i}"));
        meta.insert("creator_id".to_string(), creator_id.to_string());
        meta.insert("category".to_string(), "music".to_string());
        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();

        // Write a signal so the item is rankable.
        let ts = Timestamp::from_nanos(1_000_000_000_000 + i * 1_000_000);
        db.signal("view", EntityId::new(i), 1.0, ts).unwrap();
    }
}

/// Core invariant: hidden items never appear in RETRIEVE results.
#[test]
fn hidden_items_never_returned_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;
    let hidden_item_ids: Vec<u64> = vec![3, 7, 15, 22];

    // Phase 1: Write data and hide items.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 30);

        // Write user.
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        // Hide specific items.
        for &item_id in &hidden_item_ids {
            db.hide_item(EntityId::new(user_id), EntityId::new(item_id)).unwrap();
        }

        // Verify pre-crash: hidden items are absent from results.
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();
        for item in &results.items {
            assert!(
                !hidden_item_ids.contains(&item.entity_id.as_u64()),
                "hidden item {} appeared in pre-crash RETRIEVE",
                item.entity_id.as_u64()
            );
        }

        db.close().unwrap();
    }

    // Phase 2: Reopen and verify the invariant holds.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(30)
            .build();
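        let results = db.retrieve(&query).unwrap();
        // Guard against a vacuous pass: an empty result set satisfies the loop below trivially.
        assert!(!results.items.is_empty(), "RETRIEVE returned no items after restart");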

        for item in &results.items {
            assert!(
                !hidden_item_ids.contains(&item.entity_id.as_u64()),
                "INVARIANT VIOLATION: hidden item {} appeared in RETRIEVE after restart",
                item.entity_id.as_u64()
            );
        }

        // A direct state check (the rebuilt user state still holding the hide
        // relationships, which depends on rebuild_entity_state scanning the users
        // engine) is deliberately omitted: the invariant is asserted end-to-end
        // through RETRIEVE above.

        db.close().unwrap();
    }
}

3. Blocked creators invariant

/// Core invariant: blocked creator content never appears in RETRIEVE results.
#[test]
fn blocked_creator_content_never_returned_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;
    let blocked_creator_id = 3u64; // creator 3

    // Phase 1: Write data and block a creator.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 30);

        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        // Block creator 3.
        db.block_creator(EntityId::new(user_id), EntityId::new(blocked_creator_id))
            .unwrap();

        // Verify pre-crash: no items by creator 3 in results.
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unblocked()
            .limit(30)
            .build();
        let results = db.retrieve(&query).unwrap();

        for item in &results.items {
            // Items with creator_id == blocked_creator_id should be absent.
            // creator_id = (item_id % 5) + 1. So items where (id % 5) + 1 == 3,
            // i.e., id % 5 == 2, have creator 3.
            let item_creator = (item.entity_id.as_u64() % 5) + 1;
            assert_ne!(
                item_creator, blocked_creator_id,
                "blocked creator's item {} appeared in pre-crash RETRIEVE",
                item.entity_id.as_u64()
            );
        }

        db.close().unwrap();
    }

    // Phase 2: Reopen and verify.
    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unblocked()
            .limit(30)
            .build();
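        let results = db.retrieve(&query).unwrap();
        // Guard against a vacuous pass: an empty result set satisfies the loop below trivially.
        assert!(!results.items.is_empty(), "RETRIEVE returned no items after restart");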

        for item in &results.items {
            let item_creator = (item.entity_id.as_u64() % 5) + 1;
            assert_ne!(
                item_creator, blocked_creator_id,
                "INVARIANT VIOLATION: blocked creator's item {} appeared after restart",
                item.entity_id.as_u64()
            );
        }

        db.close().unwrap();
    }
}
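Items are assigned to creators deterministically, so the set of item IDs that blocking creator 3 must remove can be computed in advance. This standalone sketch uses hypothetical helper names and reproduces the mapping from write_items_with_creators. For 30 items, creator 3 owns items 2, 7, 12, 17, 22 and 27, which are exactly the IDs with `id % 5 == 2`.

```rust
/// Creator assignment used by write_items_with_creators: creator_id = (item_id % 5) + 1.
fn creator_of(item_id: u64) -> u64 {
    (item_id % 5) + 1
}

/// Item IDs in 1..=count authored by `creator`, i.e. the IDs that a
/// block_creator call for that creator must remove from RETRIEVE results.
fn items_by_creator(count: u64, creator: u64) -> Vec<u64> {
    (1..=count).filter(|&id| creator_of(id) == creator).collect()
}
```

A stricter post-restart check could assert that none of these IDs appears. It could also assert that the remaining IDs do appear, but that assumes the chronological profile returns every eligible item within limit(30).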

4. Combined hide + block invariant

/// Both hidden items AND blocked creators must be absent after restart.
#[test]
fn combined_hide_and_block_after_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 99u64;
    let hidden_items = vec![1u64, 5, 10];
    let blocked_creator = 2u64; // creator 2: items where (id % 5) + 1 == 2, i.e., id % 5 == 1

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 50);
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        for &item_id in &hidden_items {
            db.hide_item(EntityId::new(user_id), EntityId::new(item_id)).unwrap();
        }
        db.block_creator(EntityId::new(user_id), EntityId::new(blocked_creator))
            .unwrap();

        db.close().unwrap();
    }

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .filter_unblocked()
            .limit(50)
            .build();
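        let results = db.retrieve(&query).unwrap();
        // Guard against a vacuous pass: an empty result set satisfies the loop below trivially.
        assert!(!results.items.is_empty(), "RETRIEVE returned no items after restart");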

        for item in &results.items {
            let id = item.entity_id.as_u64();
            let item_creator = (id % 5) + 1;

            assert!(
                !hidden_items.contains(&id),
                "INVARIANT VIOLATION: hidden item {id} in results after restart"
            );
            assert_ne!(
                item_creator, blocked_creator,
                "INVARIANT VIOLATION: blocked creator {blocked_creator}'s item {id} in results after restart"
            );
        }

        db.close().unwrap();
    }
}
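The combined test only checks for absence. An oracle that computes the expected survivors also makes the overlap between the two filters visible. In this test, hidden item 1 is itself authored by the blocked creator 2, so 12 distinct items are excluded from 50 and 38 should remain. `expected_survivors` is a hypothetical helper that assumes the task's creator mapping.

```rust
use std::collections::BTreeSet;

/// Hypothetical oracle: item IDs in 1..=count that should survive both
/// FILTER unseen (hidden items) and FILTER unblocked (blocked creator),
/// assuming creator_id = (item_id % 5) + 1.
fn expected_survivors(count: u64, hidden: &BTreeSet<u64>, blocked_creator: u64) -> BTreeSet<u64> {
    (1..=count)
        .filter(|id| !hidden.contains(id))
        .filter(|id| (id % 5) + 1 != blocked_creator)
        .collect()
}
```

Comparing the returned IDs against this set, rather than only checking for absence, would also catch over-filtering. That comparison assumes the chronological profile returns every eligible item within limit(50).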

5. Property test: random hide/block patterns

use proptest::prelude::*;

proptest! {
    #![proptest_config(ProptestConfig::with_cases(100))]

    #[test]
    fn no_phantom_items_after_restart(
        item_count in 10usize..60,
        // Random subset of 1 to 9 distinct item IDs to hide; IDs beyond
        // item_count are skipped when recording hides below.
        hidden in prop::collection::btree_set(1u64..60, 1..10),
        block_creator in 1u64..6,
    ) {
        let dir = tempfile::tempdir().unwrap();
        let schema = invariant_schema();
        let user_id = 42u64;

        {
            let db = TidalDb::builder()
                .with_data_dir(dir.path())
                .with_schema(schema.clone())
                .open()
                .unwrap();

            write_items_with_creators(&db, item_count as u64);
            db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

            for &h in &hidden {
                if h <= item_count as u64 {
                    db.hide_item(EntityId::new(user_id), EntityId::new(h)).unwrap();
                }
            }
            db.block_creator(EntityId::new(user_id), EntityId::new(block_creator))
                .unwrap();

            db.close().unwrap();
        }

        {
            let db = TidalDb::builder()
                .with_data_dir(dir.path())
                .with_schema(schema)
                .open()
                .unwrap();

            let query = Retrieve::builder()
                .for_user(user_id)
                .using_profile("chronological")
                .filter_unseen()
                .filter_unblocked()
                .limit(item_count as u32)
                .build();
            let results = db.retrieve(&query).unwrap();

            for item in &results.items {
                let id = item.entity_id.as_u64();
                let creator = (id % 5) + 1;

                prop_assert!(
                    !hidden.contains(&id),
                    "hidden item {id} appeared in results"
                );
                prop_assert_ne!(
                    creator, block_creator,
                    "blocked creator {block_creator}'s item {id} appeared"
                );
            }

            db.close().unwrap();
        }
    }
}

6. Hard negative leak detection

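Test that hard negatives recorded during a session also survive restart. This exercises the HardNegIndex rebuild from durable RelationshipType::Hide edges. Because invariant_schema() registers no negative-feedback signal such as skip or dislike, the test records the hides with hide_item while the session is open.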

#[test]
fn hard_negatives_from_session_survive_restart() {
    let dir = tempfile::tempdir().unwrap();
    let schema = invariant_schema();
    let user_id = 42u64;

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema.clone())
            .open()
            .unwrap();

        write_items_with_creators(&db, 20);
        db.write_user(EntityId::new(user_id), &HashMap::new()).unwrap();

        // Open a session so the hard negatives are recorded while it is active.
        let handle = db.start_session(user_id, "test-agent", "default").unwrap();

        // invariant_schema() registers no "skip" or "dislike" signal, so the hard
        // negatives are recorded with hide_item directly. Either path is expected
        // to persist them as RelationshipType::Hide edges.
        db.hide_item(EntityId::new(user_id), EntityId::new(5)).unwrap();
        db.hide_item(EntityId::new(user_id), EntityId::new(12)).unwrap();

        db.close_session(handle.session_id()).unwrap();
        db.close().unwrap();
    }

    {
        let db = TidalDb::builder()
            .with_data_dir(dir.path())
            .with_schema(schema)
            .open()
            .unwrap();

        // The hard negatives should be rebuilt from the users engine
        // (RelationshipType::Hide edges).
        let query = Retrieve::builder()
            .for_user(user_id)
            .using_profile("chronological")
            .filter_unseen()
            .limit(20)
            .build();
        let results = db.retrieve(&query).unwrap();

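        let result_ids: Vec<u64> = results.items.iter().map(|i| i.entity_id.as_u64()).collect();
        // Guard against a vacuous pass: absence checks on an empty result are meaningless.
        assert!(!result_ids.is_empty(), "RETRIEVE returned no items after restart");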
        assert!(
            !result_ids.contains(&5),
            "hidden item 5 leaked after restart"
        );
        assert!(
            !result_ids.contains(&12),
            "hidden item 12 leaked after restart"
        );

        db.close().unwrap();
    }
}
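The test above relies on hard-negative state being derived from durable relationship edges when the database opens. The following is a minimal standalone sketch of such a rebuild. It assumes a simplified edge record: `Edge`, `UserNegatives`, `rebuild_negatives`, and this `RelationshipType` enum are hypothetical, and the real HardNegIndex and rebuild_entity_state may be structured differently.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the relationship kinds stored in the users engine.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RelationshipType {
    Hide,
    Blocks,
    Follows, // an unrelated edge kind, ignored by the rebuild
}

/// A durable user -> target edge as it might be read back during a users-engine scan.
struct Edge {
    user: u64,
    rel: RelationshipType,
    target: u64,
}

/// Per-user hard-negative state rebuilt on open.
#[derive(Default, Debug)]
struct UserNegatives {
    hidden_items: HashSet<u64>,
    blocked_creators: HashSet<u64>,
}

/// Rebuild per-user hard negatives from a full scan of durable edges;
/// edge kinds that are not hard negatives are skipped.
fn rebuild_negatives(edges: &[Edge]) -> HashMap<u64, UserNegatives> {
    let mut index: HashMap<u64, UserNegatives> = HashMap::new();
    for e in edges {
        let entry = index.entry(e.user).or_default();
        match e.rel {
            RelationshipType::Hide => { entry.hidden_items.insert(e.target); }
            RelationshipType::Blocks => { entry.blocked_creators.insert(e.target); }
            RelationshipType::Follows => {}
        }
    }
    index
}
```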

Acceptance Criteria

  • hidden_items_never_returned_after_restart: hidden items absent from RETRIEVE after clean restart
  • blocked_creator_content_never_returned_after_restart: blocked creator items absent after restart
  • combined_hide_and_block_after_restart: both hidden items and blocked creator content absent
  • no_phantom_items_after_restart: 100 proptest cases with random hide/block patterns, no invariant violations
  • hard_negatives_from_session_survive_restart: session-recorded hard negatives persist through restart
  • No test can pass vacuously: the invariant is checked end-to-end through RETRIEVE, not just by inspecting internal state, and an empty post-restart result set in the fixed-scenario tests counts as a failure
  • All tests pass with cargo test --test m7_crash_invariant

Test Strategy

The tests above ARE the deliverable. Key design principles:

  1. End-to-end verification: Every invariant is verified by executing a RETRIEVE query through the public API. This catches bugs anywhere in the pipeline (state rebuild, filter evaluation, user state index, hard negative index).

  2. Persistent mode only: All tests use with_data_dir() to exercise the full durability pipeline.

  3. Property tests for coverage: The no_phantom_items_after_restart proptest generates random combinations of hidden items and blocked creators to catch edge cases in the rebuild logic (e.g., boundary conditions in RoaringBitmap serialization, off-by-one in entity ID casting).

  4. Explicit creator mapping: Items are assigned to creators deterministically (creator_id = (item_id % 5) + 1) so the test can verify which items should be blocked without needing to read metadata.

  5. Both hide and block paths: The tests exercise both hide_item (user -> item edge, Tag::Rel with RelationshipType::Hide) and block_creator (user -> creator edge, RelationshipType::Blocks). Both are rebuilt by rebuild_entity_state from the users engine scan.
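One way an "off-by-one in entity ID casting" bug can arise involves u32-keyed roaring bitmaps, such as the `roaring` crate's `RoaringBitmap`. It is an assumption that the rebuilt filter sets use them; this document does not say which width tidaldb uses. Narrowing a u64 entity ID with `as` truncates silently, so `(1 << 32) + 5` becomes 5 and hiding one item would also hide another. The hypothetical helpers below contrast the truncating cast with a checked conversion that surfaces the problem.

```rust
/// Buggy variant (hypothetical): silently drops the high 32 bits.
fn truncating_key(entity_id: u64) -> u32 {
    entity_id as u32
}

/// Checked variant (hypothetical): refuses IDs that do not fit a u32 bitmap key.
fn to_bitmap_key(entity_id: u64) -> Result<u32, String> {
    u32::try_from(entity_id).map_err(|_| format!("entity id {entity_id} does not fit in u32"))
}
```

A proptest strategy that occasionally draws IDs at or above `1 << 32` would exercise this boundary. The current strategies only draw small IDs.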