# Task 03: Hard Negatives

## Context

**Milestone:** 3 -- Personalized Ranking
**Phase:** m3p2 -- Feedback Loop
**Depends On:** m3p1 Task 02 (relationship graph: `RelationshipType::Hide`, `RelationshipType::Blocks`), m3p1 Task 03 (`UserStateIndex`: `add_hide`, `add_block`), m1p2 (WAL: durability infrastructure)
**Blocks:** Task 04 (Atomic Signal Dispatch integrates hard negative writes), m3p4 (User State Filters depend on crash-safe hide/block state)
**Complexity:** L

## Objective

Deliver crash-safe hard negative storage. When a user hides an item or blocks a creator, that exclusion must be permanent and must survive process crash + WAL replay without leaking. A hidden item that reappears after restart is a trust-destroying bug.

Hard negatives have three storage layers:

1. **WAL event**: durable, append-only, replayed on crash recovery
2. **Relationship edge**: permanent storage in users keyspace (`RelationshipType::Hide` or `RelationshipType::Blocks`)
3. **In-memory bitmap**: `UserStateIndex.add_hide()` / `UserStateIndex.add_block()` for O(1) query-time filtering

This task delivers the WAL event format for relationship changes, the replay logic that reconstructs hide/block state from WAL events, and the `HardNegativeStore` that coordinates all three layers.

The critical invariant: **for any sequence of hide/block/signal events, a RETRIEVE query NEVER returns a hidden item or a blocked creator's items**. This invariant is enforced by a property test that generates random event sequences and verifies exclusion.
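
The three layers and the crash-recovery story above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyEvent` and `ToyLayers` are hypothetical names that do not exist in tidal, the WAL is a plain `Vec` assumed to survive a crash, and the bitmap is approximated by a `HashSet`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a WAL relationship event (hide only, for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ToyEvent {
    pub user_id: u64,
    pub item_id: u64,
    pub is_add: bool,
}

/// Toy model of the three layers. None of these are tidal's real types.
#[derive(Default)]
pub struct ToyLayers {
    /// Layer 1: append-only log, assumed durable across a crash.
    pub wal: Vec<ToyEvent>,
    /// Layer 2: persistent relationship edges keyed by (user, item).
    pub edges: HashSet<(u64, u64)>,
    /// Layer 3: in-memory filter, lost on crash.
    pub hidden: HashSet<(u64, u64)>,
}

impl ToyLayers {
    /// Steps 2 and 3 for one event. Set insert/remove makes this idempotent.
    pub fn apply(&mut self, ev: ToyEvent) {
        let key = (ev.user_id, ev.item_id);
        if ev.is_add {
            self.edges.insert(key);
            self.hidden.insert(key);
        } else {
            self.edges.remove(&key);
            self.hidden.remove(&key);
        }
    }

    /// Full write path: WAL first, then storage and the in-memory filter.
    pub fn hide(&mut self, user_id: u64, item_id: u64) {
        let ev = ToyEvent { user_id, item_id, is_add: true };
        self.wal.push(ev);
        self.apply(ev);
    }

    /// Simulate a restart: memory is wiped, then the WAL is replayed in order.
    pub fn crash_and_recover(&mut self) {
        self.hidden.clear();
        let events = self.wal.clone();
        for ev in events {
            self.apply(ev);
        }
    }

    /// Query-time filter: drop every candidate the user has hidden.
    pub fn retrieve(&self, user_id: u64, candidates: &[u64]) -> Vec<u64> {
        candidates
            .iter()
            .copied()
            .filter(|item| !self.hidden.contains(&(user_id, *item)))
            .collect()
    }
}
```

Pushing an event onto `wal` without calling `apply` models a crash between the WAL append and the other two writes. After `crash_and_recover`, that item is still excluded by `retrieve`, which is the behavior the invariant demands.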
## Requirements

- `WalRelationshipEvent` struct: WAL event for relationship changes (hide, block, unblock)
- WAL event format extends existing WAL wire format with a `RelationshipChange` tag
- `HardNegativeStore` struct: coordinates WAL write, storage write, and bitmap update
- `store.hide_item(user_id, item_id, timestamp)` atomically: appends to WAL, writes Hide edge, adds to `UserStateIndex`
- `store.block_creator(user_id, creator_id, timestamp)` atomically: appends to WAL, writes Blocks edge, adds to `UserStateIndex`
- `store.replay_wal_event(event)` replays a WAL relationship event into storage + bitmap
- Hide events point from user to item (`RelationshipType::Hide`)
- Block events point from user to creator (`RelationshipType::Blocks`)
- Hide and block are permanent -- no automatic expiry
- WAL replay reconstructs all hide/block state correctly
- Crash at any point during the hide/block write path never produces a state where the hide/block is lost
- `unblock_creator(user_id, creator_id)` and `unhide_item(user_id, item_id)` for explicit reversal

## Technical Design

### Module Structure

```
tidal/src/
  entities/
    hard_neg.rs    -- HardNegativeStore, WalRelationshipEvent, replay
```

### WAL Event Extension

```rust
// === entities/hard_neg.rs ===

use crate::schema::{EntityId, Timestamp};
use crate::entities::relationship::RelationshipType;

/// WAL event for a relationship change.
///
/// Extends the WAL event format to support durable relationship mutations.
/// The WAL is the source of truth for crash recovery: on restart, all
/// relationship change events are replayed to rebuild in-memory state.
#[derive(Debug, Clone, PartialEq)]
pub struct WalRelationshipEvent {
    /// User performing the action.
    pub user_id: EntityId,
    /// Target entity (item for hide, creator for block).
    pub target_id: EntityId,
    /// Relationship type (Hide or Blocks).
    pub rel_type: RelationshipType,
    /// Whether this is an add (true) or remove (false) operation.
    pub is_add: bool,
    /// Timestamp of the event.
    pub timestamp_nanos: u64,
}

/// Wire format for WAL relationship events.
///
/// ```text
/// [tag:       1 byte  = 0x52 ('R' for Relationship)]
/// [user_id:   8 bytes BE]
/// [target_id: 8 bytes BE]
/// [rel_type:  1 byte]
/// [is_add:    1 byte (0 or 1)]
/// [timestamp: 8 bytes LE]
/// ```
///
/// Total: 27 bytes (fixed size).
pub const WAL_REL_TAG: u8 = 0x52; // 'R'

pub fn serialize_wal_rel_event(event: &WalRelationshipEvent) -> [u8; 27] {
    let mut buf = [0u8; 27];
    buf[0] = WAL_REL_TAG;
    buf[1..9].copy_from_slice(&event.user_id.to_be_bytes());
    buf[9..17].copy_from_slice(&event.target_id.to_be_bytes());
    buf[17] = event.rel_type.as_byte();
    buf[18] = if event.is_add { 1 } else { 0 };
    // The timestamp is little-endian, unlike the big-endian IDs (see wire format above).
    buf[19..27].copy_from_slice(&event.timestamp_nanos.to_le_bytes());
    buf
}

pub fn deserialize_wal_rel_event(buf: &[u8]) -> Option<WalRelationshipEvent> {
    // Reject short buffers and events carrying a different tag byte.
    if buf.len() < 27 || buf[0] != WAL_REL_TAG {
        return None;
    }
    let user_id = EntityId::new(u64::from_be_bytes(buf[1..9].try_into().ok()?));
    let target_id = EntityId::new(u64::from_be_bytes(buf[9..17].try_into().ok()?));
    let rel_type = RelationshipType::from_byte(buf[17])?;
    let is_add = buf[18] != 0;
    let timestamp_nanos = u64::from_le_bytes(buf[19..27].try_into().ok()?);
    Some(WalRelationshipEvent {
        user_id,
        target_id,
        rel_type,
        is_add,
        timestamp_nanos,
    })
}
```

### HardNegativeStore

```rust
use crate::entities::user_state::UserStateIndex;
use crate::storage::StorageEngine;

/// Coordinates hard negative writes across WAL, storage, and in-memory bitmap.
///
/// The write order is:
/// 1. Append to WAL (durability first)
/// 2. Write relationship edge to storage (persistent state)
/// 3. Update in-memory bitmap (query-time filter)
///
/// If the process crashes after step 1 but before step 2/3, WAL replay
/// on the next startup will re-execute steps 2 and 3 from the WAL event.
/// This is safe because relationship writes are idempotent.
pub struct HardNegativeStore {
    /// Reference to the user state index for in-memory bitmap updates.
    /// The store borrows this; ownership remains with TidalDb.
    _phantom: std::marker::PhantomData<()>,
}

impl HardNegativeStore {
    /// Hide an item for a user.
    ///
    /// Atomically:
    /// 1. Append WAL relationship event
    /// 2. Write Hide edge to users keyspace
    /// 3. Add item to UserStateIndex.hidden_items
    ///
    /// # Ordering guarantee
    ///
    /// After this method returns, the item is excluded from all
    /// RETRIEVE queries for this user, even if the process crashes
    /// immediately after (WAL replay will re-apply).
    pub fn hide_item(
        user_id: EntityId,
        item_id: EntityId,
        timestamp: Timestamp,
        wal_writer: &dyn WalRelWriter,
        storage: &dyn StorageEngine,
        user_state: &UserStateIndex,
    ) -> crate::Result<()> {
        let event = WalRelationshipEvent {
            user_id,
            target_id: item_id,
            rel_type: RelationshipType::Hide,
            is_add: true,
            timestamp_nanos: timestamp.as_nanos(),
        };

        // 1. WAL first.
        wal_writer.append_relationship(&event)?;

        // 2. Persist to storage.
        Self::write_edge_to_storage(storage, &event)?;

        // 3. Update in-memory bitmap.
        user_state.add_hide(user_id, item_id);

        Ok(())
    }

    /// Block a creator for a user.
    ///
    /// Atomically:
    /// 1. Append WAL relationship event
    /// 2. Write Blocks edge to users keyspace
    /// 3. Add creator to UserStateIndex.blocked_creators
    pub fn block_creator(
        user_id: EntityId,
        creator_id: EntityId,
        timestamp: Timestamp,
        wal_writer: &dyn WalRelWriter,
        storage: &dyn StorageEngine,
        user_state: &UserStateIndex,
    ) -> crate::Result<()> {
        let event = WalRelationshipEvent {
            user_id,
            target_id: creator_id,
            rel_type: RelationshipType::Blocks,
            is_add: true,
            timestamp_nanos: timestamp.as_nanos(),
        };

        // 1. WAL first.
        wal_writer.append_relationship(&event)?;

        // 2. Persist to storage.
        Self::write_edge_to_storage(storage, &event)?;

        // 3. Update in-memory bitmap.
        user_state.add_block(user_id, creator_id);

        Ok(())
    }

    /// Unhide an item (explicit reversal).
    pub fn unhide_item(
        user_id: EntityId,
        item_id: EntityId,
        timestamp: Timestamp,
        wal_writer: &dyn WalRelWriter,
        storage: &dyn StorageEngine,
        user_state: &UserStateIndex,
    ) -> crate::Result<()> {
        let event = WalRelationshipEvent {
            user_id,
            target_id: item_id,
            rel_type: RelationshipType::Hide,
            is_add: false,
            timestamp_nanos: timestamp.as_nanos(),
        };

        // Same order as the add path: WAL, then storage, then bitmap.
        wal_writer.append_relationship(&event)?;
        Self::delete_edge_from_storage(storage, &event)?;
        user_state.remove_hide(user_id, item_id);

        Ok(())
    }

    /// Unblock a creator (explicit reversal).
    pub fn unblock_creator(
        user_id: EntityId,
        creator_id: EntityId,
        timestamp: Timestamp,
        wal_writer: &dyn WalRelWriter,
        storage: &dyn StorageEngine,
        user_state: &UserStateIndex,
    ) -> crate::Result<()> {
        let event = WalRelationshipEvent {
            user_id,
            target_id: creator_id,
            rel_type: RelationshipType::Blocks,
            is_add: false,
            timestamp_nanos: timestamp.as_nanos(),
        };

        // Same order as the add path: WAL, then storage, then bitmap.
        wal_writer.append_relationship(&event)?;
        Self::delete_edge_from_storage(storage, &event)?;
        user_state.remove_block(user_id, creator_id);

        Ok(())
    }

    /// Replay a WAL relationship event during crash recovery.
    ///
    /// Applies steps 2 (storage write) and 3 (bitmap update) from the
    /// WAL event. This is idempotent: replaying the same event twice
    /// produces the same state.
    pub fn replay_wal_event(
        event: &WalRelationshipEvent,
        storage: &dyn StorageEngine,
        user_state: &UserStateIndex,
    ) -> crate::Result<()> {
        if event.is_add {
            Self::write_edge_to_storage(storage, event)?;
            match event.rel_type {
                RelationshipType::Hide => {
                    user_state.add_hide(event.user_id, event.target_id);
                }
                RelationshipType::Blocks => {
                    user_state.add_block(event.user_id, event.target_id);
                }
                _ => {} // Other types handled elsewhere
            }
        } else {
            Self::delete_edge_from_storage(storage, event)?;
            match event.rel_type {
                RelationshipType::Hide => {
                    user_state.remove_hide(event.user_id, event.target_id);
                }
                RelationshipType::Blocks => {
                    user_state.remove_block(event.user_id, event.target_id);
                }
                _ => {}
            }
        }
        Ok(())
    }

    /// Write a relationship edge to storage.
    fn write_edge_to_storage(
        storage: &dyn StorageEngine,
        event: &WalRelationshipEvent,
    ) -> crate::Result<()> {
        use crate::entities::relationship::{
            encode_relationship_key, encode_relationship_value,
        };
        let key = encode_relationship_key(event.user_id, event.rel_type, event.target_id);
        let value = encode_relationship_value(1.0, event.timestamp_nanos);
        storage.put(&key, &value).map_err(crate::LumenError::from)
    }

    /// Delete a relationship edge from storage.
    fn delete_edge_from_storage(
        storage: &dyn StorageEngine,
        event: &WalRelationshipEvent,
    ) -> crate::Result<()> {
        use crate::entities::relationship::encode_relationship_key;
        let key = encode_relationship_key(event.user_id, event.rel_type, event.target_id);
        storage.delete(&key).map_err(crate::LumenError::from)
    }
}

/// Trait for appending relationship events to the WAL.
///
/// Separated from `WalWriter` (signal events) because the event format
/// and tag byte are different. The `WalHandleWriter` implements both traits.
pub trait WalRelWriter: Send + Sync {
    fn append_relationship(&self, event: &WalRelationshipEvent) -> crate::Result<()>;
}

/// No-op WAL writer for testing.
pub struct NoopWalRelWriter;

impl WalRelWriter for NoopWalRelWriter {
    fn append_relationship(&self, _event: &WalRelationshipEvent) -> crate::Result<()> {
        Ok(())
    }
}
```

## Test Strategy

### Unit Tests

```rust
#[test]
fn wal_rel_event_serialize_roundtrip() {
    let event = WalRelationshipEvent {
        user_id: EntityId::new(42),
        target_id: EntityId::new(999),
        rel_type: RelationshipType::Hide,
        is_add: true,
        timestamp_nanos: 1_000_000_000_000_000_000,
    };
    let bytes = serialize_wal_rel_event(&event);
    let recovered = deserialize_wal_rel_event(&bytes).unwrap();
    assert_eq!(recovered, event);
}

#[test]
fn wal_rel_event_serialize_block() {
    let event = WalRelationshipEvent {
        user_id: EntityId::new(1),
        target_id: EntityId::new(77),
        rel_type: RelationshipType::Blocks,
        is_add: true,
        timestamp_nanos: 2_000_000_000_000_000_000,
    };
    let bytes = serialize_wal_rel_event(&event);
    assert_eq!(bytes.len(), 27);
    let recovered = deserialize_wal_rel_event(&bytes).unwrap();
    assert_eq!(recovered.rel_type, RelationshipType::Blocks);
    assert!(recovered.is_add);
}

#[test]
fn wal_rel_event_remove() {
    let event = WalRelationshipEvent {
        user_id: EntityId::new(1),
        target_id: EntityId::new(42),
        rel_type: RelationshipType::Hide,
        is_add: false,
        timestamp_nanos: 1_000_000_000_000_000_000,
    };
    let bytes = serialize_wal_rel_event(&event);
    let recovered = deserialize_wal_rel_event(&bytes).unwrap();
    assert!(!recovered.is_add);
}

#[test]
fn hide_item_updates_all_layers() {
    let user_state = UserStateIndex::new();
    let storage = InMemoryBackend::default();
    let wal = NoopWalRelWriter;

    HardNegativeStore::hide_item(
        EntityId::new(1),
        EntityId::new(42),
        Timestamp::now(),
        &wal,
        &storage,
        &user_state,
    ).unwrap();

    // In-memory: hidden
    assert!(user_state.is_hidden(EntityId::new(1), EntityId::new(42)));

    // Storage: edge exists
    let key = encode_relationship_key(
        EntityId::new(1),
        RelationshipType::Hide,
        EntityId::new(42),
    );
    assert!(storage.get(&key).unwrap().is_some());
}

#[test]
fn block_creator_updates_all_layers() {
    let user_state = UserStateIndex::new();
    let storage = InMemoryBackend::default();
    let wal = NoopWalRelWriter;

    HardNegativeStore::block_creator(
        EntityId::new(1),
        EntityId::new(77),
        Timestamp::now(),
        &wal,
        &storage,
        &user_state,
    ).unwrap();

    assert!(user_state.is_blocked(EntityId::new(1), EntityId::new(77)));

    let key = encode_relationship_key(
        EntityId::new(1),
        RelationshipType::Blocks,
        EntityId::new(77),
    );
    assert!(storage.get(&key).unwrap().is_some());
}

#[test]
fn unhide_item_removes_from_all_layers() {
    let user_state = UserStateIndex::new();
    let storage = InMemoryBackend::default();
    let wal = NoopWalRelWriter;

    // Hide then unhide.
    HardNegativeStore::hide_item(
        EntityId::new(1),
        EntityId::new(42),
        Timestamp::now(),
        &wal,
        &storage,
        &user_state,
    ).unwrap();
    assert!(user_state.is_hidden(EntityId::new(1), EntityId::new(42)));

    HardNegativeStore::unhide_item(
        EntityId::new(1),
        EntityId::new(42),
        Timestamp::now(),
        &wal,
        &storage,
        &user_state,
    ).unwrap();
    assert!(!user_state.is_hidden(EntityId::new(1), EntityId::new(42)));

    let key = encode_relationship_key(
        EntityId::new(1),
        RelationshipType::Hide,
        EntityId::new(42),
    );
    assert!(storage.get(&key).unwrap().is_none());
}

#[test]
fn replay_wal_event_reconstructs_state() {
    let user_state = UserStateIndex::new();
    let storage = InMemoryBackend::default();

    let event = WalRelationshipEvent {
        user_id: EntityId::new(1),
        target_id: EntityId::new(42),
        rel_type: RelationshipType::Hide,
        is_add: true,
        timestamp_nanos: 1_000_000_000_000_000_000,
    };

    HardNegativeStore::replay_wal_event(&event, &storage, &user_state).unwrap();

    assert!(user_state.is_hidden(EntityId::new(1), EntityId::new(42)));

    let key = encode_relationship_key(
        EntityId::new(1),
        RelationshipType::Hide,
        EntityId::new(42),
    );
    assert!(storage.get(&key).unwrap().is_some());
}

#[test]
fn replay_is_idempotent() {
    let user_state = UserStateIndex::new();
    let storage = InMemoryBackend::default();

    let event = WalRelationshipEvent {
        user_id: EntityId::new(1),
        target_id: EntityId::new(42),
        rel_type: RelationshipType::Hide,
        is_add: true,
        timestamp_nanos: 1_000_000_000_000_000_000,
    };

    // Replay same event twice.
    HardNegativeStore::replay_wal_event(&event, &storage, &user_state).unwrap();
    HardNegativeStore::replay_wal_event(&event, &storage, &user_state).unwrap();

    // State should be the same as replaying once.
    assert!(user_state.is_hidden(EntityId::new(1), EntityId::new(42)));
}

#[test]
fn replay_remove_after_add() {
    let user_state = UserStateIndex::new();
    let storage = InMemoryBackend::default();

    let add = WalRelationshipEvent {
        user_id: EntityId::new(1),
        target_id: EntityId::new(42),
        rel_type: RelationshipType::Hide,
        is_add: true,
        timestamp_nanos: 1_000_000_000_000_000_000,
    };
    let remove = WalRelationshipEvent {
        user_id: EntityId::new(1),
        target_id: EntityId::new(42),
        rel_type: RelationshipType::Hide,
        is_add: false,
        timestamp_nanos: 2_000_000_000_000_000_000,
    };

    HardNegativeStore::replay_wal_event(&add, &storage, &user_state).unwrap();
    HardNegativeStore::replay_wal_event(&remove, &storage, &user_state).unwrap();

    assert!(!user_state.is_hidden(EntityId::new(1), EntityId::new(42)));
}
```

### Property Tests

```rust
use proptest::prelude::*;

/// Represents a user action in a random event sequence.
#[derive(Debug, Clone)]
enum UserAction {
    Hide(u64),    // item_id
    Unhide(u64),  // item_id
    Block(u64),   // creator_id
    Unblock(u64), // creator_id
}

fn action_strategy() -> impl Strategy<Value = UserAction> {
    prop_oneof![
        (1u64..100).prop_map(UserAction::Hide),
        (1u64..100).prop_map(UserAction::Unhide),
        (1u64..50).prop_map(UserAction::Block),
        (1u64..50).prop_map(UserAction::Unblock),
    ]
}

proptest! {
    #[test]
    fn hidden_items_never_leak(
        actions in proptest::collection::vec(action_strategy(), 1..100),
    ) {
        let user_state = UserStateIndex::new();
        let storage = InMemoryBackend::default();
        let wal = NoopWalRelWriter;
        let user = EntityId::new(1);
        let mut ts_ns = 1_000_000_000_000_000_000u64;

        for action in &actions {
            let ts = Timestamp::from_nanos(ts_ns);
            match action {
                UserAction::Hide(item_id) => {
                    let _ = HardNegativeStore::hide_item(
                        user, EntityId::new(*item_id), ts,
                        &wal, &storage, &user_state,
                    );
                }
                UserAction::Unhide(item_id) => {
                    let _ = HardNegativeStore::unhide_item(
                        user, EntityId::new(*item_id), ts,
                        &wal, &storage, &user_state,
                    );
                }
                UserAction::Block(creator_id) => {
                    let _ = HardNegativeStore::block_creator(
                        user, EntityId::new(*creator_id), ts,
                        &wal, &storage, &user_state,
                    );
                }
                UserAction::Unblock(creator_id) => {
                    let _ = HardNegativeStore::unblock_creator(
                        user, EntityId::new(*creator_id), ts,
                        &wal, &storage, &user_state,
                    );
                }
            }
            ts_ns += 1_000_000_000;
        }

        // Verify the consistency invariant: once the whole sequence has been
        // applied, the in-memory index and storage agree on every item that
        // was ever hidden and every creator that was ever blocked. The final
        // state of each target depends on the last action applied to it, so
        // a later unhide/unblock is allowed to clear the flag in both layers.
        for action in &actions {
            match action {
                UserAction::Hide(item_id) => {
                    let hidden = user_state.is_hidden(user, EntityId::new(*item_id));
                    // Storage should agree with in-memory.
                    let key = encode_relationship_key(
                        user, RelationshipType::Hide, EntityId::new(*item_id),
                    );
                    let in_storage = storage.get(&key).unwrap().is_some();
                    prop_assert_eq!(hidden, in_storage,
                        "memory/storage disagree for hidden item {}", item_id);
                }
                UserAction::Block(creator_id) => {
                    let blocked = user_state.is_blocked(user, EntityId::new(*creator_id));
                    let key = encode_relationship_key(
                        user, RelationshipType::Blocks, EntityId::new(*creator_id),
                    );
                    let in_storage = storage.get(&key).unwrap().is_some();
                    prop_assert_eq!(blocked, in_storage,
                        "memory/storage disagree for blocked creator {}", creator_id);
                }
                _ => {}
            }
        }
    }

    #[test]
    fn wal_rel_event_roundtrip(
        user_id in 1u64..10000,
        target_id in 1u64..10000,
        type_byte in 2u8..=5u8, // Blocks=2, InteractionWeight=3, Hide=4, Mute=5
        is_add in proptest::bool::ANY,
        ts in 0u64..u64::MAX,
    ) {
        if let Some(rel_type) = RelationshipType::from_byte(type_byte) {
            let event = WalRelationshipEvent {
                user_id: EntityId::new(user_id),
                target_id: EntityId::new(target_id),
                rel_type,
                is_add,
                timestamp_nanos: ts,
            };
            let bytes = serialize_wal_rel_event(&event);
            let recovered = deserialize_wal_rel_event(&bytes);
            prop_assert!(recovered.is_some());
            prop_assert_eq!(recovered.unwrap(), event);
        }
    }
}
```

## Acceptance Criteria

- [ ] `WalRelationshipEvent` struct with user_id, target_id, rel_type, is_add, timestamp
- [ ] `serialize_wal_rel_event` produces fixed 27-byte wire format
- [ ] `deserialize_wal_rel_event` roundtrips correctly (property tested)
- [ ] `HardNegativeStore::hide_item` writes to WAL, storage, and in-memory bitmap
- [ ] `HardNegativeStore::block_creator` writes to WAL, storage, and in-memory bitmap
- [ ] `HardNegativeStore::unhide_item` removes from all three layers
- [ ] `HardNegativeStore::unblock_creator` removes from all three layers
- [ ] `replay_wal_event` reconstructs state from WAL events
- [ ] Replay is idempotent (same event replayed twice = same state)
- [ ] Replay of add followed by remove produces correct final state
- [ ] Property test: memory and storage always agree on hide/block state
- [ ] Property test: WAL event serialize/deserialize roundtrip
- [ ] `remove_hide` and `remove_block` methods added to `UserStateIndex` (extends m3p1 Task 03)
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All tests pass

## Research References

- [docs/research/tidaldb_signal_ledger.md](../../../research/tidaldb_signal_ledger.md) -- WAL-first durability pattern
- [thoughts.md](../../../../thoughts.md) -- Part V.5-6 (quarantine-first, WAL convergence)

## Implementation Notes

- The `WalRelWriter` trait is separate from the existing `WalWriter` trait (signal events) because the wire formats differ: signal events use a signal-specific format with signal_type_id, entity_id, weight, timestamp; relationship events use the `WalRelationshipEvent` format. The `WalHandleWriter` from `db/wal_bridge.rs` will implement both traits.
- The `WAL_REL_TAG` byte (0x52) must not collide with the existing WAL event tags. Check the WAL module's tag bytes before implementation.
- The `HardNegativeStore` is stateless (no data fields, only a `PhantomData` marker) -- it operates on references passed to each method. This is intentional: the state lives in `UserStateIndex` (in-memory) and the storage engine (persistent). The store is a coordinator, not an owner.
- `remove_hide` and `remove_block` are not implemented in m3p1 Task 03. This task must add them to `UserStateIndex`. For `remove_hide`: remove the item_id from `hidden_items` bitmap. For `remove_block`: remove the creator_id from `blocked_creators` set.
- The write order (WAL -> storage -> bitmap) is critical. If the process crashes after WAL write but before storage/bitmap update, WAL replay will complete the operation. If the process crashes after storage write but before bitmap update, `UserStateIndex::rebuild_from_relationships` (m3p1 Task 03) will rebuild the bitmap from storage on next startup.
- Hide and block edges are written with `weight: 1.0` (see `write_edge_to_storage`). The weight is meaningless for boolean relationships; only the relationship type matters.
- Do NOT implement the integration with `db.signal()` in this task. The wiring that routes "hide" and "block" signal types to `HardNegativeStore` instead of the signal ledger is done in Task 04 (Atomic Signal Dispatch).
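
The note above about adding `remove_hide` and `remove_block` to `UserStateIndex` can be sketched as follows. This is a hypothetical stand-in, not the real type: the name `SketchUserStateIndex`, the `RwLock<HashMap<u64, HashSet<u64>>>` layout, and the use of plain `u64` IDs are all assumptions made so the example compiles on its own. The real index is described as using a bitmap for hidden items. Methods take `&self` because the store above calls them through a shared reference.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

/// Per-user sets guarded by a lock (assumed layout, for illustration only).
type UserSets = RwLock<HashMap<u64, HashSet<u64>>>;

/// Hypothetical stand-in for `UserStateIndex`.
#[derive(Default)]
pub struct SketchUserStateIndex {
    hidden_items: UserSets,
    blocked_creators: UserSets,
}

/// Insert `target` into the user's set, creating the set on first use.
fn add(sets: &UserSets, user: u64, target: u64) {
    sets.write().unwrap().entry(user).or_default().insert(target);
}

/// Remove `target` from the user's set. A missing user or target is a no-op,
/// which keeps WAL replay of a remove event idempotent.
fn remove(sets: &UserSets, user: u64, target: u64) {
    if let Some(set) = sets.write().unwrap().get_mut(&user) {
        set.remove(&target);
    }
}

/// Membership check used at query time.
fn contains(sets: &UserSets, user: u64, target: u64) -> bool {
    sets.read()
        .unwrap()
        .get(&user)
        .map_or(false, |set| set.contains(&target))
}

impl SketchUserStateIndex {
    pub fn add_hide(&self, user: u64, item: u64) {
        add(&self.hidden_items, user, item);
    }

    pub fn remove_hide(&self, user: u64, item: u64) {
        remove(&self.hidden_items, user, item);
    }

    pub fn is_hidden(&self, user: u64, item: u64) -> bool {
        contains(&self.hidden_items, user, item)
    }

    pub fn add_block(&self, user: u64, creator: u64) {
        add(&self.blocked_creators, user, creator);
    }

    pub fn remove_block(&self, user: u64, creator: u64) {
        remove(&self.blocked_creators, user, creator);
    }

    pub fn is_blocked(&self, user: u64, creator: u64) -> bool {
        contains(&self.blocked_creators, user, creator)
    }
}
```

The property worth preserving in the real implementation is that removal never fails or panics when the entry is absent, since replay may apply the same remove event more than once.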