Task 03: Hard Negatives
Context
Milestone: 3 -- Personalized Ranking
Phase: m3p2 -- Feedback Loop
Depends On: m3p1 Task 02 (relationship graph: RelationshipType::Hide, RelationshipType::Blocks), m3p1 Task 03 (UserStateIndex: add_hide, add_block), m1p2 (WAL: durability infrastructure)
Blocks: Task 04 (Atomic Signal Dispatch integrates hard negative writes), m3p4 (User State Filters depend on crash-safe hide/block state)
Complexity: L
Objective
Deliver crash-safe hard negative storage. When a user hides an item or blocks a creator, that exclusion must be permanent and must survive process crash + WAL replay without leaking. A hidden item that reappears after restart is a trust-destroying bug.
Hard negatives have three storage layers:
- WAL event: durable, append-only, replayed on crash recovery
- Relationship edge: permanent storage in the users keyspace (`RelationshipType::Hide` or `RelationshipType::Blocks`)
- In-memory bitmap: `UserStateIndex.add_hide()` / `UserStateIndex.add_block()` for O(1) query-time filtering
This task delivers the WAL event format for relationship changes, the replay logic that reconstructs hide/block state from WAL events, and the HardNegativeStore that coordinates all three layers.
The critical invariant: for any sequence of hide/block/signal events, a RETRIEVE query NEVER returns a hidden item or a blocked creator's items. This invariant is enforced by a property test that generates random event sequences and verifies exclusion.
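To make the invariant concrete, here is a minimal, self-contained sketch of the query-time check it implies. `HardNegatives`, `Candidate`, and `exclude` are hypothetical names. Plain `HashMap`/`HashSet` collections stand in for the `UserStateIndex` hidden-item bitmap and blocked-creator set, and the real RETRIEVE filter stage may be structured differently.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical candidate: an item id plus the id of its creator.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Candidate {
    pub item_id: u64,
    pub creator_id: u64,
}

/// Hypothetical per-user hard-negative state (stand-in for UserStateIndex).
#[derive(Default)]
pub struct HardNegatives {
    hidden_items: HashMap<u64, HashSet<u64>>,     // user -> hidden item ids
    blocked_creators: HashMap<u64, HashSet<u64>>, // user -> blocked creator ids
}

impl HardNegatives {
    pub fn hide(&mut self, user: u64, item: u64) {
        self.hidden_items.entry(user).or_default().insert(item);
    }

    pub fn block(&mut self, user: u64, creator: u64) {
        self.blocked_creators.entry(user).or_default().insert(creator);
    }

    /// Drops every candidate the user hid, and every candidate whose
    /// creator the user blocked. Other users' state is never consulted.
    pub fn exclude(&self, user: u64, candidates: &[Candidate]) -> Vec<Candidate> {
        let hidden = self.hidden_items.get(&user);
        let blocked = self.blocked_creators.get(&user);
        candidates
            .iter()
            .copied()
            .filter(|c| !hidden.map_or(false, |h| h.contains(&c.item_id)))
            .filter(|c| !blocked.map_or(false, |b| b.contains(&c.creator_id)))
            .collect()
    }
}
```

For example, after `hide(1, 10)` and `block(1, 7)`, user 1 never sees item 10 or any item by creator 7, while user 2's results are unchanged. The property tests later in this task check the corresponding condition against the real index.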
Requirements
- `WalRelationshipEvent` struct: WAL event for relationship changes (hide, unhide, block, unblock)
- WAL event format extends the existing WAL wire format with a `RelationshipChange` tag
- `HardNegativeStore` struct: coordinates the WAL write, storage write, and bitmap update
- `store.hide_item(user_id, item_id, timestamp)` atomically: appends to the WAL, writes the Hide edge, adds to `UserStateIndex`
- `store.block_creator(user_id, creator_id, timestamp)` atomically: appends to the WAL, writes the Blocks edge, adds to `UserStateIndex`
- `store.replay_wal_event(event)` replays a WAL relationship event into storage + bitmap
- Hide events point from user to item (`RelationshipType::Hide`)
- Block events point from user to creator (`RelationshipType::Blocks`)
- Hide and block are permanent: no automatic expiry
- WAL replay reconstructs all hide/block state correctly
- Once the WAL append has completed, a crash at any later point in the hide/block write path never produces a state where the hide/block is lost
- `unblock_creator(user_id, creator_id)` and `unhide_item(user_id, item_id)` for explicit reversal
Technical Design
Module Structure
tidal/src/
  entities/
    hard_neg.rs -- HardNegativeStore, WalRelationshipEvent, replay
WAL Event Extension
// === entities/hard_neg.rs ===
use crate::schema::{EntityId, Timestamp};
use crate::entities::relationship::RelationshipType;
/// WAL event for a relationship change.
///
/// Extends the WAL event format to support durable relationship mutations.
/// The WAL is the source of truth for crash recovery: on restart, all
/// relationship change events are replayed to rebuild in-memory state.
#[derive(Debug, Clone, PartialEq)]
pub struct WalRelationshipEvent {
/// User performing the action.
pub user_id: EntityId,
/// Target entity (item for hide, creator for block).
pub target_id: EntityId,
/// Relationship type (Hide or Blocks).
pub rel_type: RelationshipType,
/// Whether this is an add (true) or remove (false) operation.
pub is_add: bool,
/// Timestamp of the event.
pub timestamp_nanos: u64,
}
/// Wire format for WAL relationship events.
///
/// ```text
/// [tag: 1 byte = 0x52 ('R' for Relationship)]
/// [user_id: 8 bytes BE]
/// [target_id: 8 bytes BE]
/// [rel_type: 1 byte]
/// [is_add: 1 byte (0 or 1)]
/// [timestamp: 8 bytes LE]
/// ```
///
/// Total: 27 bytes (fixed size).
pub const WAL_REL_TAG: u8 = 0x52; // 'R'
pub fn serialize_wal_rel_event(event: &WalRelationshipEvent) -> [u8; 27] {
let mut buf = [0u8; 27];
buf[0] = WAL_REL_TAG;
buf[1..9].copy_from_slice(&event.user_id.to_be_bytes());
buf[9..17].copy_from_slice(&event.target_id.to_be_bytes());
buf[17] = event.rel_type.as_byte();
buf[18] = if event.is_add { 1 } else { 0 };
buf[19..27].copy_from_slice(&event.timestamp_nanos.to_le_bytes());
buf
}
pub fn deserialize_wal_rel_event(buf: &[u8]) -> Option<WalRelationshipEvent> {
if buf.len() < 27 || buf[0] != WAL_REL_TAG {
return None;
}
let user_id = EntityId::new(u64::from_be_bytes(buf[1..9].try_into().ok()?));
let target_id = EntityId::new(u64::from_be_bytes(buf[9..17].try_into().ok()?));
let rel_type = RelationshipType::from_byte(buf[17])?;
let is_add = buf[18] != 0;
let timestamp_nanos = u64::from_le_bytes(buf[19..27].try_into().ok()?);
Some(WalRelationshipEvent {
user_id,
target_id,
rel_type,
is_add,
timestamp_nanos,
})
}
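A worked, byte-level example of the 27-byte layout, as a std-only sketch. `encode_rel_event` and `decode_rel_event` are hypothetical standalone versions of `serialize_wal_rel_event` / `deserialize_wal_rel_event` that take plain `u64` ids and a raw type byte instead of `EntityId` / `RelationshipType`. Using 4 for Hide follows the `type_byte` comment in the property tests and is an assumption about `RelationshipType::as_byte`.

```rust
/// Tag byte from the wire format ('R').
const WAL_REL_TAG: u8 = 0x52;
/// Fixed record size: 1 (tag) + 8 + 8 + 1 (type) + 1 (is_add) + 8 (timestamp).
const WAL_REL_LEN: usize = 27;

/// Illustrative encoder using plain integers (hypothetical helper).
fn encode_rel_event(user_id: u64, target_id: u64, rel_type: u8, is_add: bool, ts_nanos: u64) -> [u8; WAL_REL_LEN] {
    let mut buf = [0u8; WAL_REL_LEN];
    buf[0] = WAL_REL_TAG;
    buf[1..9].copy_from_slice(&user_id.to_be_bytes()); // ids are big-endian
    buf[9..17].copy_from_slice(&target_id.to_be_bytes());
    buf[17] = rel_type;
    buf[18] = is_add as u8;
    buf[19..27].copy_from_slice(&ts_nanos.to_le_bytes()); // timestamp is little-endian
    buf
}

/// Returns (user_id, target_id, rel_type, is_add, ts_nanos), or None for a
/// wrong tag or a truncated record.
fn decode_rel_event(buf: &[u8]) -> Option<(u64, u64, u8, bool, u64)> {
    if buf.len() < WAL_REL_LEN || buf[0] != WAL_REL_TAG {
        return None;
    }
    let user_id = u64::from_be_bytes(buf[1..9].try_into().ok()?);
    let target_id = u64::from_be_bytes(buf[9..17].try_into().ok()?);
    let ts_nanos = u64::from_le_bytes(buf[19..27].try_into().ok()?);
    Some((user_id, target_id, buf[17], buf[18] != 0, ts_nanos))
}
```

For user 42, target 999, Hide, add, and a timestamp of 10^18 ns, the record is `52 | 00 00 00 00 00 00 00 2A | 00 00 00 00 00 00 03 E7 | 04 | 01 | 00 00 64 A7 B3 B6 E0 0D`. The ids read in natural order, while the timestamp bytes appear reversed because that field is little-endian. Any WAL reader has to honor this mixed byte order exactly as specified.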
HardNegativeStore
use crate::entities::user_state::UserStateIndex;
use crate::storage::StorageEngine;
/// Coordinates hard negative writes across WAL, storage, and in-memory bitmap.
///
/// The write order is:
/// 1. Append to WAL (durability first)
/// 2. Write relationship edge to storage (persistent state)
/// 3. Update in-memory bitmap (query-time filter)
///
/// If the process crashes after step 1 but before step 2/3, WAL replay
/// on the next startup will re-execute steps 2 and 3 from the WAL event.
/// This is safe because relationship writes are idempotent.
/// Stateless coordinator: every method receives the WAL writer, storage
/// engine, and `UserStateIndex` by reference; ownership of that state
/// remains with TidalDb.
pub struct HardNegativeStore;
impl HardNegativeStore {
/// Hide an item for a user.
///
/// In order (a crash after step 1 is completed by WAL replay):
/// 1. Append WAL relationship event
/// 2. Write Hide edge to users keyspace
/// 3. Add item to UserStateIndex.hidden_items
///
/// # Ordering guarantee
///
/// After this method returns `Ok(())`, the item is excluded from all
/// RETRIEVE queries for this user, even if the process crashes
/// immediately after: WAL replay re-applies the hide, provided
/// `append_relationship` returns only once the event is durable.
pub fn hide_item(
user_id: EntityId,
item_id: EntityId,
timestamp: Timestamp,
wal_writer: &dyn WalRelWriter,
storage: &dyn StorageEngine,
user_state: &UserStateIndex,
) -> crate::Result<()> {
let event = WalRelationshipEvent {
user_id,
target_id: item_id,
rel_type: RelationshipType::Hide,
is_add: true,
timestamp_nanos: timestamp.as_nanos(),
};
// 1. WAL first.
wal_writer.append_relationship(&event)?;
// 2. Persist to storage.
Self::write_edge_to_storage(storage, &event)?;
// 3. Update in-memory bitmap.
user_state.add_hide(user_id, item_id);
Ok(())
}
/// Block a creator for a user.
///
/// In order (a crash after step 1 is completed by WAL replay):
/// 1. Append WAL relationship event
/// 2. Write Blocks edge to users keyspace
/// 3. Add creator to UserStateIndex.blocked_creators
pub fn block_creator(
user_id: EntityId,
creator_id: EntityId,
timestamp: Timestamp,
wal_writer: &dyn WalRelWriter,
storage: &dyn StorageEngine,
user_state: &UserStateIndex,
) -> crate::Result<()> {
let event = WalRelationshipEvent {
user_id,
target_id: creator_id,
rel_type: RelationshipType::Blocks,
is_add: true,
timestamp_nanos: timestamp.as_nanos(),
};
// 1. WAL first.
wal_writer.append_relationship(&event)?;
// 2. Persist to storage.
Self::write_edge_to_storage(storage, &event)?;
// 3. Update in-memory bitmap.
user_state.add_block(user_id, creator_id);
Ok(())
}
/// Unhide an item (explicit reversal).
pub fn unhide_item(
user_id: EntityId,
item_id: EntityId,
timestamp: Timestamp,
wal_writer: &dyn WalRelWriter,
storage: &dyn StorageEngine,
user_state: &UserStateIndex,
) -> crate::Result<()> {
let event = WalRelationshipEvent {
user_id,
target_id: item_id,
rel_type: RelationshipType::Hide,
is_add: false,
timestamp_nanos: timestamp.as_nanos(),
};
wal_writer.append_relationship(&event)?;
Self::delete_edge_from_storage(storage, &event)?;
user_state.remove_hide(user_id, item_id);
Ok(())
}
/// Unblock a creator (explicit reversal).
pub fn unblock_creator(
user_id: EntityId,
creator_id: EntityId,
timestamp: Timestamp,
wal_writer: &dyn WalRelWriter,
storage: &dyn StorageEngine,
user_state: &UserStateIndex,
) -> crate::Result<()> {
let event = WalRelationshipEvent {
user_id,
target_id: creator_id,
rel_type: RelationshipType::Blocks,
is_add: false,
timestamp_nanos: timestamp.as_nanos(),
};
wal_writer.append_relationship(&event)?;
Self::delete_edge_from_storage(storage, &event)?;
user_state.remove_block(user_id, creator_id);
Ok(())
}
/// Replay a WAL relationship event during crash recovery.
///
/// Applies steps 2 (storage write) and 3 (bitmap update) from the
/// WAL event. This is idempotent: replaying the same event twice
/// produces the same state.
pub fn replay_wal_event(
event: &WalRelationshipEvent,
storage: &dyn StorageEngine,
user_state: &UserStateIndex,
) -> crate::Result<()> {
if event.is_add {
Self::write_edge_to_storage(storage, event)?;
match event.rel_type {
RelationshipType::Hide => {
user_state.add_hide(event.user_id, event.target_id);
}
RelationshipType::Blocks => {
user_state.add_block(event.user_id, event.target_id);
}
_ => {} // Other types handled elsewhere
}
} else {
Self::delete_edge_from_storage(storage, event)?;
match event.rel_type {
RelationshipType::Hide => {
user_state.remove_hide(event.user_id, event.target_id);
}
RelationshipType::Blocks => {
user_state.remove_block(event.user_id, event.target_id);
}
_ => {}
}
}
Ok(())
}
/// Write a relationship edge to storage.
fn write_edge_to_storage(
storage: &dyn StorageEngine,
event: &WalRelationshipEvent,
) -> crate::Result<()> {
use crate::entities::relationship::{
encode_relationship_key, encode_relationship_value,
};
let key = encode_relationship_key(event.user_id, event.rel_type, event.target_id);
let value = encode_relationship_value(1.0, event.timestamp_nanos);
storage.put(&key, &value).map_err(crate::LumenError::from)
}
/// Delete a relationship edge from storage.
fn delete_edge_from_storage(
storage: &dyn StorageEngine,
event: &WalRelationshipEvent,
) -> crate::Result<()> {
use crate::entities::relationship::encode_relationship_key;
let key = encode_relationship_key(event.user_id, event.rel_type, event.target_id);
storage.delete(&key).map_err(crate::LumenError::from)
}
}
/// Trait for appending relationship events to the WAL.
///
/// Separated from `WalWriter` (signal events) because the event format
/// and tag byte are different. The `WalHandleWriter` implements both traits.
pub trait WalRelWriter: Send + Sync {
fn append_relationship(&self, event: &WalRelationshipEvent) -> crate::Result<()>;
}
/// No-op WAL writer for testing.
pub struct NoopWalRelWriter;
impl WalRelWriter for NoopWalRelWriter {
fn append_relationship(&self, _event: &WalRelationshipEvent) -> crate::Result<()> {
Ok(())
}
}
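The crash-safety argument can be checked exhaustively on a toy model. This sketch is not the crate's code: `HideEvent`, `Durable`, `apply`, `write_with_crash`, and `recover` are hypothetical. A `Vec` stands in for the WAL, a `BTreeSet` of `(user, item)` pairs for the users keyspace, and a `HashSet` for the in-memory bitmap. It assumes the WAL append is durable once it returns, and that startup rebuilds the bitmap from storage (the role given to `UserStateIndex::rebuild_from_relationships`) and then replays the whole WAL.

```rust
use std::collections::{BTreeSet, HashSet};

/// One hide (is_add = true) or unhide (is_add = false) event.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct HideEvent {
    pub user: u64,
    pub item: u64,
    pub is_add: bool,
}

/// The state that survives a crash: the WAL and the storage engine.
#[derive(Default)]
pub struct Durable {
    pub wal: Vec<HideEvent>,
    pub storage: BTreeSet<(u64, u64)>, // (user, item) Hide edges
}

/// Steps 2 and 3 for one event; re-applying an event changes nothing.
pub fn apply(durable: &mut Durable, bitmap: &mut HashSet<(u64, u64)>, ev: HideEvent) {
    let edge = (ev.user, ev.item);
    if ev.is_add {
        durable.storage.insert(edge);
        bitmap.insert(edge);
    } else {
        durable.storage.remove(&edge);
        bitmap.remove(&edge);
    }
}

/// Runs the WAL -> storage -> bitmap path for `ev`, crashing after
/// `steps_completed` steps (0..=3). Step 3 only touches memory, which the
/// crash discards, so it has no durable effect in this model.
pub fn write_with_crash(durable: &mut Durable, ev: HideEvent, steps_completed: usize) {
    if steps_completed >= 1 {
        durable.wal.push(ev); // step 1: WAL append
    }
    if steps_completed >= 2 {
        // step 2: storage write (add) or delete (remove)
        let edge = (ev.user, ev.item);
        if ev.is_add {
            durable.storage.insert(edge);
        } else {
            durable.storage.remove(&edge);
        }
    }
}

/// Startup: rebuild the bitmap from storage, then replay every WAL event in order.
pub fn recover(durable: &mut Durable) -> HashSet<(u64, u64)> {
    let mut bitmap: HashSet<(u64, u64)> = durable.storage.iter().copied().collect();
    for ev in durable.wal.clone() {
        apply(durable, &mut bitmap, ev);
    }
    bitmap
}
```

Looping `steps_completed` over 0..=3 and calling `recover` shows three things. The hide survives whenever the WAL append completed. A crash before the append leaves no trace, which is acceptable because the call was never acknowledged. Recovering twice yields the same bitmap, which is the idempotence `replay_wal_event` relies on.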
Test Strategy
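Unit Tests
// Assumes these tests live in a `#[cfg(test)] mod tests` inside entities/hard_neg.rs.
use super::*;
use crate::entities::relationship::encode_relationship_key;
// `InMemoryBackend` (the in-memory StorageEngine used below) must also be in scope;
// its module path is not specified in this task.
#[test]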
fn wal_rel_event_serialize_roundtrip() {
let event = WalRelationshipEvent {
user_id: EntityId::new(42),
target_id: EntityId::new(999),
rel_type: RelationshipType::Hide,
is_add: true,
timestamp_nanos: 1_000_000_000_000_000_000,
};
let bytes = serialize_wal_rel_event(&event);
let recovered = deserialize_wal_rel_event(&bytes).unwrap();
assert_eq!(recovered, event);
}
#[test]
fn wal_rel_event_serialize_block() {
let event = WalRelationshipEvent {
user_id: EntityId::new(1),
target_id: EntityId::new(77),
rel_type: RelationshipType::Blocks,
is_add: true,
timestamp_nanos: 2_000_000_000_000_000_000,
};
let bytes = serialize_wal_rel_event(&event);
assert_eq!(bytes.len(), 27);
let recovered = deserialize_wal_rel_event(&bytes).unwrap();
assert_eq!(recovered.rel_type, RelationshipType::Blocks);
assert!(recovered.is_add);
}
#[test]
fn wal_rel_event_remove() {
let event = WalRelationshipEvent {
user_id: EntityId::new(1),
target_id: EntityId::new(42),
rel_type: RelationshipType::Hide,
is_add: false,
timestamp_nanos: 1_000_000_000_000_000_000,
};
let bytes = serialize_wal_rel_event(&event);
let recovered = deserialize_wal_rel_event(&bytes).unwrap();
assert!(!recovered.is_add);
}
#[test]
fn hide_item_updates_all_layers() {
let user_state = UserStateIndex::new();
let storage = InMemoryBackend::default();
let wal = NoopWalRelWriter;
HardNegativeStore::hide_item(
EntityId::new(1), EntityId::new(42),
Timestamp::now(), &wal, &storage, &user_state,
).unwrap();
// In-memory: hidden
assert!(user_state.is_hidden(EntityId::new(1), EntityId::new(42)));
// Storage: edge exists
let key = encode_relationship_key(
EntityId::new(1), RelationshipType::Hide, EntityId::new(42),
);
assert!(storage.get(&key).unwrap().is_some());
}
#[test]
fn block_creator_updates_all_layers() {
let user_state = UserStateIndex::new();
let storage = InMemoryBackend::default();
let wal = NoopWalRelWriter;
HardNegativeStore::block_creator(
EntityId::new(1), EntityId::new(77),
Timestamp::now(), &wal, &storage, &user_state,
).unwrap();
assert!(user_state.is_blocked(EntityId::new(1), EntityId::new(77)));
let key = encode_relationship_key(
EntityId::new(1), RelationshipType::Blocks, EntityId::new(77),
);
assert!(storage.get(&key).unwrap().is_some());
}
#[test]
fn unhide_item_removes_from_all_layers() {
let user_state = UserStateIndex::new();
let storage = InMemoryBackend::default();
let wal = NoopWalRelWriter;
// Hide then unhide.
HardNegativeStore::hide_item(
EntityId::new(1), EntityId::new(42),
Timestamp::now(), &wal, &storage, &user_state,
).unwrap();
assert!(user_state.is_hidden(EntityId::new(1), EntityId::new(42)));
HardNegativeStore::unhide_item(
EntityId::new(1), EntityId::new(42),
Timestamp::now(), &wal, &storage, &user_state,
).unwrap();
assert!(!user_state.is_hidden(EntityId::new(1), EntityId::new(42)));
let key = encode_relationship_key(
EntityId::new(1), RelationshipType::Hide, EntityId::new(42),
);
assert!(storage.get(&key).unwrap().is_none());
}
#[test]
fn replay_wal_event_reconstructs_state() {
let user_state = UserStateIndex::new();
let storage = InMemoryBackend::default();
let event = WalRelationshipEvent {
user_id: EntityId::new(1),
target_id: EntityId::new(42),
rel_type: RelationshipType::Hide,
is_add: true,
timestamp_nanos: 1_000_000_000_000_000_000,
};
HardNegativeStore::replay_wal_event(&event, &storage, &user_state).unwrap();
assert!(user_state.is_hidden(EntityId::new(1), EntityId::new(42)));
let key = encode_relationship_key(
EntityId::new(1), RelationshipType::Hide, EntityId::new(42),
);
assert!(storage.get(&key).unwrap().is_some());
}
#[test]
fn replay_is_idempotent() {
let user_state = UserStateIndex::new();
let storage = InMemoryBackend::default();
let event = WalRelationshipEvent {
user_id: EntityId::new(1),
target_id: EntityId::new(42),
rel_type: RelationshipType::Hide,
is_add: true,
timestamp_nanos: 1_000_000_000_000_000_000,
};
// Replay same event twice.
HardNegativeStore::replay_wal_event(&event, &storage, &user_state).unwrap();
HardNegativeStore::replay_wal_event(&event, &storage, &user_state).unwrap();
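// State should be the same as replaying once: still hidden, edge still stored.
assert!(user_state.is_hidden(EntityId::new(1), EntityId::new(42)));
let key = encode_relationship_key(
EntityId::new(1), RelationshipType::Hide, EntityId::new(42),
);
assert!(storage.get(&key).unwrap().is_some());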
}
#[test]
fn replay_remove_after_add() {
let user_state = UserStateIndex::new();
let storage = InMemoryBackend::default();
let add = WalRelationshipEvent {
user_id: EntityId::new(1),
target_id: EntityId::new(42),
rel_type: RelationshipType::Hide,
is_add: true,
timestamp_nanos: 1_000_000_000_000_000_000,
};
let remove = WalRelationshipEvent {
user_id: EntityId::new(1),
target_id: EntityId::new(42),
rel_type: RelationshipType::Hide,
is_add: false,
timestamp_nanos: 2_000_000_000_000_000_000,
};
HardNegativeStore::replay_wal_event(&add, &storage, &user_state).unwrap();
HardNegativeStore::replay_wal_event(&remove, &storage, &user_state).unwrap();
assert!(!user_state.is_hidden(EntityId::new(1), EntityId::new(42)));
}
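The unit tests pass `NoopWalRelWriter`, so the WAL layer is never actually checked, even in `hide_item_updates_all_layers`. One option is a recording test double. To stay self-contained, the sketch below uses a local `RelWriter` trait and `RelEvent` struct that mirror `WalRelWriter` and `WalRelationshipEvent`, with a `String` error in place of `crate::Result`. `RecordingWalRelWriter` is a hypothetical name; in the crate it would implement `WalRelWriter` directly.

```rust
use std::sync::Mutex;

/// Simplified local copy of the event (plain u64 ids) so the sketch compiles alone.
#[derive(Debug, Clone, PartialEq)]
pub struct RelEvent {
    pub user_id: u64,
    pub target_id: u64,
    pub is_add: bool,
}

/// Same shape as `WalRelWriter`, with a String error instead of `crate::Result`.
pub trait RelWriter: Send + Sync {
    fn append_relationship(&self, event: &RelEvent) -> Result<(), String>;
}

/// Test double that records every appended event instead of discarding it.
#[derive(Default)]
pub struct RecordingWalRelWriter {
    events: Mutex<Vec<RelEvent>>,
}

impl RecordingWalRelWriter {
    /// Snapshot of the events appended so far, in append order.
    pub fn events(&self) -> Vec<RelEvent> {
        self.events.lock().unwrap().clone()
    }
}

impl RelWriter for RecordingWalRelWriter {
    fn append_relationship(&self, event: &RelEvent) -> Result<(), String> {
        // A poisoned lock is reported as an error rather than a panic.
        self.events.lock().map_err(|e| e.to_string())?.push(event.clone());
        Ok(())
    }
}
```

A `Mutex<Vec<_>>` keeps the writer `Send + Sync`, as the trait bound requires. `events()` returns a copy, so assertions never hold the lock.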
Property Tests
use proptest::prelude::*;
/// Represents a user action in a random event sequence.
#[derive(Debug, Clone)]
enum UserAction {
Hide(u64), // item_id
Unhide(u64), // item_id
Block(u64), // creator_id
Unblock(u64), // creator_id
}
fn action_strategy() -> impl Strategy<Value = UserAction> {
prop_oneof![
(1u64..100).prop_map(UserAction::Hide),
(1u64..100).prop_map(UserAction::Unhide),
(1u64..50).prop_map(UserAction::Block),
(1u64..50).prop_map(UserAction::Unblock),
]
}
proptest! {
#[test]
fn hidden_items_never_leak(
actions in proptest::collection::vec(action_strategy(), 1..100),
) {
let user_state = UserStateIndex::new();
let storage = InMemoryBackend::default();
let wal = NoopWalRelWriter;
let user = EntityId::new(1);
let mut ts_ns = 1_000_000_000_000_000_000u64;
// Reference model: the expected final state, where the last action on an id wins.
let mut expected_hidden = std::collections::HashSet::new();
let mut expected_blocked = std::collections::HashSet::new();
for action in &actions {
let ts = Timestamp::from_nanos(ts_ns);
match action {
UserAction::Hide(item_id) => {
HardNegativeStore::hide_item(
user, EntityId::new(*item_id), ts, &wal, &storage, &user_state,
).unwrap();
expected_hidden.insert(*item_id);
}
UserAction::Unhide(item_id) => {
HardNegativeStore::unhide_item(
user, EntityId::new(*item_id), ts, &wal, &storage, &user_state,
).unwrap();
expected_hidden.remove(item_id);
}
UserAction::Block(creator_id) => {
HardNegativeStore::block_creator(
user, EntityId::new(*creator_id), ts, &wal, &storage, &user_state,
).unwrap();
expected_blocked.insert(*creator_id);
}
UserAction::Unblock(creator_id) => {
HardNegativeStore::unblock_creator(
user, EntityId::new(*creator_id), ts, &wal, &storage, &user_state,
).unwrap();
expected_blocked.remove(creator_id);
}
}
ts_ns += 1_000_000_000;
}
// Check every id the strategy can generate: the in-memory state must match the
// model (no hidden item or blocked creator leaks, nothing reversed stays excluded),
// and storage must agree with memory.
for item_id in 1u64..100 {
let hidden = user_state.is_hidden(user, EntityId::new(item_id));
let key = encode_relationship_key(user, RelationshipType::Hide, EntityId::new(item_id));
let in_storage = storage.get(&key).unwrap().is_some();
prop_assert_eq!(hidden, expected_hidden.contains(&item_id),
"in-memory state disagrees with model for item {}", item_id);
prop_assert_eq!(hidden, in_storage,
"memory/storage disagree for hidden item {}", item_id);
}
for creator_id in 1u64..50 {
let blocked = user_state.is_blocked(user, EntityId::new(creator_id));
let key = encode_relationship_key(user, RelationshipType::Blocks, EntityId::new(creator_id));
let in_storage = storage.get(&key).unwrap().is_some();
prop_assert_eq!(blocked, expected_blocked.contains(&creator_id),
"in-memory state disagrees with model for creator {}", creator_id);
prop_assert_eq!(blocked, in_storage,
"memory/storage disagree for blocked creator {}", creator_id);
}
}
#[test]
fn wal_rel_event_roundtrip(
user_id in 1u64..10000,
target_id in 1u64..10000,
type_byte in 2u8..=5u8, // Blocks=2, InteractionWeight=3, Hide=4, Mute=5
is_add in proptest::bool::ANY,
ts in 0u64..u64::MAX,
) {
if let Some(rel_type) = RelationshipType::from_byte(type_byte) {
let event = WalRelationshipEvent {
user_id: EntityId::new(user_id),
target_id: EntityId::new(target_id),
rel_type,
is_add,
timestamp_nanos: ts,
};
let bytes = serialize_wal_rel_event(&event);
let recovered = deserialize_wal_rel_event(&bytes);
prop_assert!(recovered.is_some());
prop_assert_eq!(recovered.unwrap(), event);
}
}
}
Acceptance Criteria
- `WalRelationshipEvent` struct with user_id, target_id, rel_type, is_add, timestamp
- `serialize_wal_rel_event` produces the fixed 27-byte wire format
- `deserialize_wal_rel_event` roundtrips correctly (property tested)
- `HardNegativeStore::hide_item` writes to the WAL, storage, and in-memory bitmap
- `HardNegativeStore::block_creator` writes to the WAL, storage, and in-memory bitmap
- `HardNegativeStore::unhide_item` appends a removal event to the WAL and removes the edge from storage and in-memory state
- `HardNegativeStore::unblock_creator` appends a removal event to the WAL and removes the edge from storage and in-memory state
- `replay_wal_event` reconstructs state from WAL events
- Replay is idempotent (same event replayed twice = same state)
- Replay of add followed by remove produces the correct final state
- Property test: memory and storage always agree on hide/block state
- Property test: WAL event serialize/deserialize roundtrip
- `remove_hide` and `remove_block` methods added to `UserStateIndex` (extends m3p1 Task 03)
- `cargo clippy -- -D warnings` passes
- All tests pass
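A minimal sketch of the `remove_hide` / `remove_block` additions. It assumes, as the calls on `&UserStateIndex` in this task suggest, that the index mutates through `&self` behind a lock. `UserStateSketch` and the `*_id` helper functions are hypothetical; per-user `HashSet`s stand in for the real hidden-item bitmap and blocked-creator set, and the real locking scheme may differ.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

/// Per-user id sets behind a lock, so mutation works through `&self`.
type PerUserSets = RwLock<HashMap<u64, HashSet<u64>>>;

fn insert_id(sets: &PerUserSets, user: u64, id: u64) {
    sets.write().unwrap().entry(user).or_default().insert(id);
}

/// Removing an absent id is a no-op; empty per-user entries are dropped.
fn remove_id(sets: &PerUserSets, user: u64, id: u64) {
    let mut map = sets.write().unwrap();
    let now_empty = match map.get_mut(&user) {
        Some(ids) => {
            ids.remove(&id);
            ids.is_empty()
        }
        None => false,
    };
    if now_empty {
        map.remove(&user);
    }
}

fn contains_id(sets: &PerUserSets, user: u64, id: u64) -> bool {
    sets.read().unwrap().get(&user).map_or(false, |ids| ids.contains(&id))
}

/// Hypothetical, simplified stand-in for `UserStateIndex`.
#[derive(Default)]
pub struct UserStateSketch {
    hidden_items: PerUserSets,
    blocked_creators: PerUserSets,
}

impl UserStateSketch {
    pub fn add_hide(&self, user: u64, item: u64) { insert_id(&self.hidden_items, user, item) }
    pub fn remove_hide(&self, user: u64, item: u64) { remove_id(&self.hidden_items, user, item) }
    pub fn is_hidden(&self, user: u64, item: u64) -> bool { contains_id(&self.hidden_items, user, item) }
    pub fn add_block(&self, user: u64, creator: u64) { insert_id(&self.blocked_creators, user, creator) }
    pub fn remove_block(&self, user: u64, creator: u64) { remove_id(&self.blocked_creators, user, creator) }
    pub fn is_blocked(&self, user: u64, creator: u64) -> bool { contains_id(&self.blocked_creators, user, creator) }
}
```

Making removal of an absent id a no-op is what lets replay apply the same remove event twice without error.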
Research References
- docs/research/tidaldb_signal_ledger.md -- WAL-first durability pattern
- thoughts.md -- Part V.5-6 (quarantine-first, WAL convergence)
Implementation Notes
- The `WalRelWriter` trait is separate from the existing `WalWriter` trait (signal events). This is because the wire format is different: signal events use a signal-specific format with signal_type_id, entity_id, weight, timestamp; relationship events use the `WalRelationshipEvent` format. The `WalHandleWriter` from `db/wal_bridge.rs` will implement both traits.
- The `WAL_REL_TAG` byte (0x52) must not collide with the existing WAL event tags. Check the WAL module's tag bytes before implementation.
- The `HardNegativeStore` is stateless (no fields): it operates on references passed to each method. This is intentional: the state lives in `UserStateIndex` (in-memory) and the storage engine (persistent). The store is a coordinator, not an owner.
- `remove_hide` and `remove_block` are not implemented in m3p1 Task 03. This task must add them to `UserStateIndex`. For `remove_hide`: remove the item_id from the `hidden_items` bitmap. For `remove_block`: remove the creator_id from the `blocked_creators` set.
- The write order (WAL -> storage -> bitmap) is critical. If the process crashes after the WAL write but before the storage/bitmap update, WAL replay will complete the operation. If the process crashes after the storage write but before the bitmap update, `UserStateIndex::rebuild_from_relationships` (m3p1 Task 03) will rebuild the bitmap from storage on next startup.
- Hide events use `weight: 1.0` in the relationship edge (the weight is meaningless for boolean relationships). This is fine because the relationship type is what matters, not the weight.
- Do NOT implement the integration with `db.signal()` in this task. The wiring that routes "hide" and "block" signal types to `HardNegativeStore` instead of the signal ledger is done in Task 04 (Atomic Signal Dispatch).
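The tag-collision check can be enforced at compile time rather than by inspection. In this sketch `WAL_SIGNAL_TAG` is a HYPOTHETICAL placeholder: the real signal-event tag lives in the WAL module and is not given in this task. `all_distinct`, `WalRecordKind`, and `classify` are illustrative names. A `const` assertion fails the build if two tags collide, and `classify` shows how a replay loop might route a record by its first byte.

```rust
/// Relationship-event tag from this task ('R').
const WAL_REL_TAG: u8 = 0x52;
/// HYPOTHETICAL placeholder for the existing signal-event tag; replace with the real value.
const WAL_SIGNAL_TAG: u8 = 0x53;

/// Every tag the WAL reader dispatches on.
const WAL_TAGS: [u8; 2] = [WAL_SIGNAL_TAG, WAL_REL_TAG];

/// True if no tag appears twice. A `const fn`, so it can run at compile time.
const fn all_distinct(tags: &[u8]) -> bool {
    let mut i = 0;
    while i < tags.len() {
        let mut j = i + 1;
        while j < tags.len() {
            if tags[i] == tags[j] {
                return false;
            }
            j += 1;
        }
        i += 1;
    }
    true
}

// Fails the build if a new tag collides with an existing one.
const _: () = assert!(all_distinct(&WAL_TAGS), "duplicate WAL event tag");

/// Which decoder a WAL record should be handed to.
#[derive(Debug, PartialEq)]
enum WalRecordKind {
    Signal,
    Relationship,
    Unknown(u8),
}

/// Routes a record by its first (tag) byte; None for an empty record.
fn classify(record: &[u8]) -> Option<WalRecordKind> {
    let tag = *record.first()?;
    Some(match tag {
        WAL_SIGNAL_TAG => WalRecordKind::Signal,
        WAL_REL_TAG => WalRecordKind::Relationship,
        other => WalRecordKind::Unknown(other),
    })
}
```

Listing every tag in one array keeps the check honest: adding a new event type means adding its tag to `WAL_TAGS`, and a collision then surfaces at build time instead of as a mis-decoded record during crash recovery.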