# Task 03: User-State Bitmap Indexes

## Context

**Milestone:** 3 -- Personalized Ranking
**Phase:** m3p1 -- User and Creator Entities with Relationships
**Depends On:** Task 01 (User/Creator entities, `CreatorItemsBitmap`), Task 02 (Relationship graph: follows, blocks, hide edges)
**Blocks:** m3p2 (Feedback Loop updates seen/hide bitmaps), m3p3 (Personalized Profiles use follows bitmap for `following` profile), m3p4 (User State Filters compose these bitmaps with metadata filters)
**Complexity:** M

## Objective

Deliver three user-state bitmap structures that power the `unseen`, `unblocked`, and `relationship:follows` filters in the RETRIEVE executor:

1. **`UserSeenBitmap`**: per-user roaring bitmap of item IDs the user has viewed. Updated on `view` signals. Used by the `unseen` filter to exclude already-seen items.
2. **`UserBlockedSet`**: per-user set of blocked creator IDs and hidden item IDs. Built from `blocks` and `hide` relationship edges. Used by the `unblocked` filter to exclude all items from blocked creators and all hidden items.
3. **`FollowsBitmap`**: per-user roaring bitmap of item IDs from followed creators. Built by taking the union of the `CreatorItemsBitmap` entries (Task 01) for every creator in the user's `follows` edges. Used by `FILTER relationship:follows` to restrict candidates to followed creators' items.

These bitmaps are maintained in-memory for hot-path query performance and reconstructed from the storage engine on restart (seen bitmaps are the exception in M3; see Implementation Notes). They compose with the existing `FilterExpr` / `FilterResult` system from m2p2.

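As a rough illustration of how the three structures combine at query time, the sketch below uses only the standard library. `BTreeSet<u32>` stands in for `RoaringBitmap`, and `UserState`, `follows_items`, `retrieve`, and `example` are hypothetical names chosen for this sketch, not the API specified in this task.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Stand-in for `RoaringBitmap` (assumption): an ordered set of u32 item IDs.
type ItemSet = BTreeSet<u32>;

/// Hypothetical per-user state mirroring the three structures above.
#[derive(Default)]
struct UserState {
    seen: ItemSet,
    hidden: ItemSet,
    blocked_creators: HashSet<u64>,
    followed_creators: HashSet<u64>,
}

/// Union the item sets of every followed creator (the follows bitmap).
fn follows_items(state: &UserState, creator_items: &HashMap<u64, ItemSet>) -> ItemSet {
    let mut out = ItemSet::new();
    for creator in &state.followed_creators {
        if let Some(items) = creator_items.get(creator) {
            out.extend(items.iter().copied());
        }
    }
    out
}

/// Apply `relationship:follows`, then `unseen`, then `unblocked`
/// to a list of (item_id, creator_id) candidates.
fn retrieve(
    state: &UserState,
    creator_items: &HashMap<u64, ItemSet>,
    candidates: &[(u32, u64)],
) -> Vec<u32> {
    let follows = follows_items(state, creator_items);
    candidates
        .iter()
        .filter(|(item, _)| follows.contains(item))
        .filter(|(item, _)| !state.seen.contains(item))
        .filter(|(item, creator)| {
            !state.hidden.contains(item) && !state.blocked_creators.contains(creator)
        })
        .map(|(item, _)| *item)
        .collect()
}

/// Sample data: the user follows creators 10, 20, and 40, has seen item 100,
/// has hidden item 200, and has blocked creator 40.
fn example() -> Vec<u32> {
    let creator_items: HashMap<u64, ItemSet> = HashMap::from([
        (10, ItemSet::from([100, 101])),
        (20, ItemSet::from([200])),
        (30, ItemSet::from([300])),
        (40, ItemSet::from([400])),
    ]);
    let state = UserState {
        seen: ItemSet::from([100]),
        hidden: ItemSet::from([200]),
        blocked_creators: HashSet::from([40]),
        followed_creators: HashSet::from([10, 20, 40]),
    };
    let candidates: [(u32, u64); 5] = [(100, 10), (101, 10), (200, 20), (300, 30), (400, 40)];
    retrieve(&state, &creator_items, &candidates)
}
```

With this sample data only item 101 survives: 300 is not from a followed creator, 100 is seen, 200 is hidden, and 400 belongs to a blocked creator.
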
## Requirements

- `UserSeenBitmap`: per-user `RoaringBitmap` of viewed item IDs
- `UserBlockedSet`: per-user `HashSet` of blocked creator IDs + `RoaringBitmap` of hidden item IDs
- `FollowsBitmap`: per-user `RoaringBitmap` of item IDs from followed creators
- `UserStateIndex`: container holding all three structures for all users, backed by one `DashMap` per structure
- `user_state.mark_seen(user_id, item_id)` adds to seen bitmap
- `user_state.is_seen(user_id, item_id)` checks membership
- `user_state.add_block(user_id, creator_id)` adds to blocked set
- `user_state.add_hide(user_id, item_id)` adds to hidden set
- `user_state.add_follow(user_id, creator_id)` adds the creator to the user's follows set, from which the follows bitmap is built
- `user_state.remove_follow(user_id, creator_id)` removes the creator from the follows set
- `user_state.unseen_predicate(user_id)` returns a predicate (for `FilterResult::Predicate`) excluding seen items
- `user_state.unblocked_predicate(user_id)` returns a predicate (for `FilterResult::Predicate`) excluding blocked + hidden items
- `user_state.follows_bitmap(user_id, creator_items)` returns a bitmap (for `FilterResult::Bitmap`) of followed creators' items
- Memory budget: ~125KB per user at 1M items for the seen bitmap (upper bound: 1M bits stored densely; roaring compression keeps sparse sets smaller)
- Reconstruction from storage on restart via `rebuild_from_relationships()`

## Technical Design

### Module Structure

```
tidal/src/
  entities/
    user_state.rs    -- UserStateIndex, all bitmap types
```

### Core Types

```rust
// === entities/user_state.rs ===

use dashmap::DashMap;
use roaring::RoaringBitmap;
use std::collections::HashSet;

use crate::schema::EntityId;
use super::CreatorItemsBitmap;

/// Per-user blocked creators and hidden items.
#[derive(Debug, Default, Clone)]
pub struct BlockedState {
    /// Creator IDs the user has blocked.
    pub blocked_creators: HashSet<u64>,
    /// Item IDs the user has hidden (stored as u32, the RoaringBitmap key width).
    pub hidden_items: RoaringBitmap,
}

/// Centralized user-state index for fast query-time filtering.
///
/// All structures are in-memory for hot-path performance.
/// They are rebuilt from the storage engine on startup and incrementally
/// maintained on signal writes and relationship changes.
#[derive(Default)] // satisfies clippy's `new_without_default`
pub struct UserStateIndex {
    /// Per-user seen item bitmaps. Key: user_id as u64.
    seen: DashMap<u64, RoaringBitmap>,
    /// Per-user blocked/hidden state. Key: user_id as u64.
    blocked: DashMap<u64, BlockedState>,
    /// Per-user followed creator IDs. Key: user_id as u64.
    follows: DashMap<u64, HashSet<u64>>,
}

impl UserStateIndex {
    pub fn new() -> Self {
        Self {
            seen: DashMap::new(),
            blocked: DashMap::new(),
            follows: DashMap::new(),
        }
    }

    // ── Seen ─────────────────────────────────────────────────

    /// Mark an item as seen by a user.
    ///
    /// Item IDs are narrowed to u32 (see the `u32` limitation in
    /// Implementation Notes).
    pub fn mark_seen(&self, user_id: EntityId, item_id: EntityId) {
        self.seen
            .entry(user_id.as_u64())
            .or_default()
            .insert(item_id.as_u64() as u32);
    }

    /// Check if a user has seen an item.
    pub fn is_seen(&self, user_id: EntityId, item_id: EntityId) -> bool {
        self.seen
            .get(&user_id.as_u64())
            .is_some_and(|bm| bm.contains(item_id.as_u64() as u32))
    }

    /// Get the count of seen items for a user.
    pub fn seen_count(&self, user_id: EntityId) -> u64 {
        self.seen
            .get(&user_id.as_u64())
            .map_or(0, |bm| bm.len())
    }

    // ── Blocked / Hidden ──────────────────────────────────────

    /// Add a creator to the user's blocked set.
    pub fn add_block(&self, user_id: EntityId, creator_id: EntityId) {
        self.blocked
            .entry(user_id.as_u64())
            .or_default()
            .blocked_creators
            .insert(creator_id.as_u64());
    }

    /// Add an item to the user's hidden set.
    pub fn add_hide(&self, user_id: EntityId, item_id: EntityId) {
        self.blocked
            .entry(user_id.as_u64())
            .or_default()
            .hidden_items
            .insert(item_id.as_u64() as u32);
    }

    /// Check if a creator is blocked by a user.
    pub fn is_blocked(&self, user_id: EntityId, creator_id: EntityId) -> bool {
        self.blocked
            .get(&user_id.as_u64())
            .is_some_and(|s| s.blocked_creators.contains(&creator_id.as_u64()))
    }

    /// Check if an item is hidden by a user.
    pub fn is_hidden(&self, user_id: EntityId, item_id: EntityId) -> bool {
        self.blocked
            .get(&user_id.as_u64())
            .is_some_and(|s| s.hidden_items.contains(item_id.as_u64() as u32))
    }

    // ── Follows ──────────────────────────────────────────────

    /// Add a follow relationship.
    pub fn add_follow(&self, user_id: EntityId, creator_id: EntityId) {
        self.follows
            .entry(user_id.as_u64())
            .or_default()
            .insert(creator_id.as_u64());
    }

    /// Remove a follow relationship. No-op if the user follows nobody.
    pub fn remove_follow(&self, user_id: EntityId, creator_id: EntityId) {
        if let Some(mut set) = self.follows.get_mut(&user_id.as_u64()) {
            set.remove(&creator_id.as_u64());
        }
    }

    /// Get the set of creator IDs a user follows.
    pub fn followed_creators(&self, user_id: EntityId) -> Vec<EntityId> {
        self.follows
            .get(&user_id.as_u64())
            .map_or_else(Vec::new, |set| {
                set.iter().map(|&id| EntityId::new(id)).collect()
            })
    }

    // ── Filter builders ──────────────────────────────────────

    /// Build an "unseen" filter predicate for a user.
    ///
    /// Returns a closure that returns `true` for items the user has NOT seen.
    /// The closure owns a snapshot of the bitmap taken at call time.
    pub fn unseen_predicate(
        &self,
        user_id: EntityId,
    ) -> Box<dyn Fn(u64) -> bool + Send + Sync> {
        // Clone the bitmap so the DashMap guard is dropped before returning.
        let seen_bitmap = self.seen
            .get(&user_id.as_u64())
            .map(|bm| bm.value().clone());

        Box::new(move |item_id: u64| match &seen_bitmap {
            Some(bm) => !bm.contains(item_id as u32),
            None => true, // no seen data = everything is unseen
        })
    }

    /// Build an "unblocked" filter predicate for a user.
    ///
    /// Returns a closure that returns `true` for items that are:
    /// - NOT from a blocked creator
    /// - NOT in the user's hidden set
    ///
    /// The caller passes each item's creator_id (if known) to the closure.
    pub fn unblocked_predicate(
        &self,
        user_id: EntityId,
    ) -> Box<dyn Fn(u64, Option<u64>) -> bool + Send + Sync> {
        // Clone the state so the DashMap guard is dropped before returning.
        let state = self.blocked
            .get(&user_id.as_u64())
            .map(|s| s.value().clone());

        Box::new(move |item_id: u64, creator_id: Option<u64>| match &state {
            Some(s) => {
                let hidden = s.hidden_items.contains(item_id as u32);
                // An unknown creator (None) is treated as not blocked.
                let blocked = creator_id
                    .is_some_and(|cid| s.blocked_creators.contains(&cid));
                !hidden && !blocked
            }
            None => true, // no block/hide data = nothing is excluded
        })
    }

    /// Build a "follows" filter bitmap for a user.
    ///
    /// Returns the union of all item bitmaps for creators the user follows.
    pub fn follows_bitmap(
        &self,
        user_id: EntityId,
        creator_items: &CreatorItemsBitmap,
    ) -> RoaringBitmap {
        let creator_ids: Vec<EntityId> = self.followed_creators(user_id);
        creator_items.items_for_creators(&creator_ids)
    }

    // ── Reconstruction ───────────────────────────────────────

    /// Rebuild user-state structures from storage.
    ///
    /// Scans all relationship edges to reconstruct:
    /// - Blocked/hidden sets from blocks/hide relationship edges
    /// - Follows sets from follows relationship edges
    ///
    /// Seen bitmaps are not rebuilt in M3 (see Implementation Notes).
    pub fn rebuild_from_relationships(
        &self,
        _storage: &dyn crate::storage::StorageEngine, // unused until the scan is implemented
    ) -> crate::Result<()> {
        // Scan all Rel-tagged keys in users keyspace.
        // For each key, decode the relationship type and update the
        // appropriate bitmap/set.
        // Implementation detail: use entity_tag_prefix scanning.
        // This is called once on startup.
        Ok(())
    }
}
```

### Integration with FilterExpr

The `unseen`, `unblocked`, and `follows` filters are new `FilterExpr` variants that the query executor evaluates using the `UserStateIndex`:

```rust
// Extend FilterExpr in storage/indexes/filter.rs

pub enum FilterExpr {
    // ... existing variants ...

    /// Exclude items the user has seen. Requires user context.
    Unseen,
    /// Exclude items from blocked creators and hidden items. Requires user context.
    Unblocked,
    /// Only items from followed creators.
    /// Requires user context.
    Follows,
}
```

The executor resolves these variants at query time by consulting the `UserStateIndex` attached to the `TidalDb` instance.

## Test Strategy

### Unit Tests

```rust
#[test]
fn mark_seen_and_check() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);
    let item = EntityId::new(42);

    assert!(!index.is_seen(user, item));
    index.mark_seen(user, item);
    assert!(index.is_seen(user, item));
}

#[test]
fn seen_count_increments() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);

    assert_eq!(index.seen_count(user), 0);
    index.mark_seen(user, EntityId::new(1));
    index.mark_seen(user, EntityId::new(2));
    index.mark_seen(user, EntityId::new(2)); // duplicate
    assert_eq!(index.seen_count(user), 2);
}

#[test]
fn block_and_check() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);
    let creator = EntityId::new(10);

    assert!(!index.is_blocked(user, creator));
    index.add_block(user, creator);
    assert!(index.is_blocked(user, creator));
}

#[test]
fn hide_and_check() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);
    let item = EntityId::new(42);

    assert!(!index.is_hidden(user, item));
    index.add_hide(user, item);
    assert!(index.is_hidden(user, item));
}

#[test]
fn follow_and_list() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);

    index.add_follow(user, EntityId::new(10));
    index.add_follow(user, EntityId::new(20));

    let creators = index.followed_creators(user);
    assert_eq!(creators.len(), 2);
}

#[test]
fn unfollow_removes_creator() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);

    index.add_follow(user, EntityId::new(10));
    index.add_follow(user, EntityId::new(20));
    index.remove_follow(user, EntityId::new(10));

    let creators = index.followed_creators(user);
    assert_eq!(creators.len(), 1);
    assert!(creators.contains(&EntityId::new(20)));
}

#[test]
fn unseen_predicate_excludes_seen_items() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);

    index.mark_seen(user, EntityId::new(5));
    index.mark_seen(user, EntityId::new(10));

    let pred = index.unseen_predicate(user);
    assert!(!pred(5));  // seen -> excluded
    assert!(!pred(10)); // seen -> excluded
    assert!(pred(15));  // unseen -> included
    assert!(pred(1));   // unseen -> included
}

#[test]
fn unseen_predicate_for_unknown_user_includes_all() {
    let index = UserStateIndex::new();
    let pred = index.unseen_predicate(EntityId::new(999));
    assert!(pred(1));
    assert!(pred(100));
}

#[test]
fn unblocked_predicate_excludes_blocked_and_hidden() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);

    index.add_block(user, EntityId::new(77)); // block creator 77
    index.add_hide(user, EntityId::new(42));  // hide item 42

    let pred = index.unblocked_predicate(user);

    // Item from blocked creator -> excluded
    assert!(!pred(100, Some(77)));
    // Hidden item -> excluded regardless of creator
    assert!(!pred(42, Some(1)));
    assert!(!pred(42, None));
    // Normal item from unblocked creator -> included
    assert!(pred(50, Some(10)));
    // Item with unknown creator -> included (not blocked)
    assert!(pred(50, None));
}

#[test]
fn follows_bitmap_union_of_creator_items() {
    let index = UserStateIndex::new();
    let creator_items = CreatorItemsBitmap::new();

    // Creator 10 has items 100, 101
    creator_items.add_item(EntityId::new(10), EntityId::new(100));
    creator_items.add_item(EntityId::new(10), EntityId::new(101));
    // Creator 20 has items 200, 201
    creator_items.add_item(EntityId::new(20), EntityId::new(200));
    creator_items.add_item(EntityId::new(20), EntityId::new(201));
    // Creator 30 has items 300 (not followed)
    creator_items.add_item(EntityId::new(30), EntityId::new(300));

    let user = EntityId::new(1);
    index.add_follow(user, EntityId::new(10));
    index.add_follow(user, EntityId::new(20));

    let bitmap = index.follows_bitmap(user, &creator_items);
    assert!(bitmap.contains(100));
    assert!(bitmap.contains(101));
    assert!(bitmap.contains(200));
    assert!(bitmap.contains(201));
    assert!(!bitmap.contains(300)); // not followed
    assert_eq!(bitmap.len(), 4);
}

#[test]
fn different_users_have_independent_state() {
    let index = UserStateIndex::new();
    let user_a = EntityId::new(1);
    let user_b = EntityId::new(2);

    index.mark_seen(user_a, EntityId::new(42));
    index.add_block(user_b, EntityId::new(77));

    assert!(index.is_seen(user_a, EntityId::new(42)));
    assert!(!index.is_seen(user_b, EntityId::new(42)));
    assert!(!index.is_blocked(user_a, EntityId::new(77)));
    assert!(index.is_blocked(user_b, EntityId::new(77)));
}
```

### Property Tests

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn seen_items_never_pass_unseen_filter(
        user_id in 1u64..100,
        seen_items in proptest::collection::vec(1u64..10000, 1..100),
        test_item in 1u64..10000,
    ) {
        let index = UserStateIndex::new();
        let user = EntityId::new(user_id);

        for &item in &seen_items {
            index.mark_seen(user, EntityId::new(item));
        }

        let pred = index.unseen_predicate(user);

        if seen_items.contains(&test_item) {
            prop_assert!(!pred(test_item),
                "seen item {} should be excluded by unseen filter", test_item);
        } else {
            prop_assert!(pred(test_item),
                "unseen item {} should pass unseen filter", test_item);
        }
    }

    #[test]
    fn blocked_creators_items_never_pass_unblocked_filter(
        user_id in 1u64..100,
        blocked_creators in proptest::collection::vec(1u64..100, 1..10),
        test_creator in 1u64..100,
        test_item in 1u64..10000,
    ) {
        let index = UserStateIndex::new();
        let user = EntityId::new(user_id);

        for &cid in &blocked_creators {
            index.add_block(user, EntityId::new(cid));
        }

        let pred = index.unblocked_predicate(user);

        if blocked_creators.contains(&test_creator) {
            prop_assert!(!pred(test_item, Some(test_creator)),
                "item from blocked creator {} should be excluded", test_creator);
        } else {
            prop_assert!(pred(test_item, Some(test_creator)),
                "item from unblocked creator {} should pass", test_creator);
        }
    }
}
```

## Acceptance Criteria

- [ ] `UserStateIndex` with `DashMap`-backed seen, blocked, and follows structures
- [ ] `mark_seen` / `is_seen` / `seen_count` work correctly
- [ ] `add_block` / `is_blocked` / `add_hide` / `is_hidden` work correctly
- [ ] `add_follow` / `remove_follow` / `followed_creators` work correctly
- [ ] `unseen_predicate` returns closure excluding all seen items
- [ ] `unseen_predicate` for unknown user includes all items
- [ ] `unblocked_predicate` excludes items from blocked creators AND hidden items
- [ ] `follows_bitmap` returns union of item sets for followed creators
- [ ] Different users have fully independent state (no cross-contamination)
- [ ] `FilterExpr` extended with `Unseen`, `Unblocked`, `Follows` variants
- [ ] Memory: roaring bitmap for 1M items is < 200KB per user in typical usage
- [ ] Property test: seen items NEVER pass unseen filter
- [ ] Property test: blocked creators' items NEVER pass unblocked filter
- [ ] All unit and property tests pass
- [ ] `cargo clippy -- -D warnings` passes

## Research References

- [docs/research/ann_for_tidaldb.md](../../../research/ann_for_tidaldb.md) -- Roaring bitmap selectivity estimation
- [VISION.md](../../../../VISION.md) -- "unseen" and "unblocked" as first-class filter primitives

## Implementation Notes

- The `UserStateIndex` is stored as a field on `TidalDb`, allocated during `open()`. It is `Send + Sync` because all inner maps are `DashMap`.
- `RoaringBitmap` uses `u32` keys. For M3 at up to 1M items, `u32` is sufficient. If item IDs exceed `u32::MAX`, the bitmap must be partitioned. This is unlikely before M7 (production hardening). Document the `u32` limitation.
- The `unblocked_predicate` takes an `(item_id, Option<u64>)` pair because the predicate needs to know the creator for each item. The executor must look up creator_id per item when evaluating this filter. In the RETRIEVE executor, creator_id is already available from `ScoredCandidate::creator_id` (set in m2p3).
- On startup, `rebuild_from_relationships` scans all `Tag::Rel` keys in the users keyspace and populates the `follows`, `blocked`, and `hidden` structures.
Seen bitmaps are NOT rebuilt from storage on startup for M3 -- they start empty and are populated from signal writes during the session. Full seen-state persistence (checkpoint + restore) is deferred to m3p4 Task 01, where it is implemented properly.
- The `CreatorItemsBitmap` (from Task 01) must be updated when new items are written. The `FollowsBitmap` then becomes stale if new items arrive for a followed creator. Two approaches: (a) rebuild the follows bitmap eagerly on every item write (expensive), (b) rebuild it lazily at query time and cache the result. Recommendation: (b) -- cache the follows bitmap per user with a generation counter that increments on item writes. Invalidate on write, rebuild lazily on query.
- Do NOT implement signal-triggered updates to these bitmaps in this task. That wiring is done in m3p2 (Feedback Loop), where the signal dispatch atomically calls `mark_seen`, `add_hide`, `add_block` as part of the signal write path.
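
The generation-counter cache recommended above for the follows bitmap might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the planned implementation: `FollowsCache`, `bump`, and `get_or_rebuild` are hypothetical names, `BTreeSet<u32>` stands in for `RoaringBitmap`, a `Mutex<HashMap>` stands in for `DashMap`, and a single global counter invalidates every user's entry on any item write.

```rust
use std::collections::{BTreeSet, HashMap};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

/// Stand-in for `RoaringBitmap` (assumption): an ordered set of u32 item IDs.
type ItemSet = BTreeSet<u32>;

/// Hypothetical follows-bitmap cache keyed by user ID.
/// Each entry records the generation at which it was built.
struct FollowsCache {
    /// Incremented on every item write; stale entries are detected by comparison.
    generation: AtomicU64,
    entries: Mutex<HashMap<u64, (u64, ItemSet)>>,
}

impl FollowsCache {
    fn new() -> Self {
        Self {
            generation: AtomicU64::new(0),
            entries: Mutex::new(HashMap::new()),
        }
    }

    /// Invalidate all cached sets by advancing the generation.
    /// Nothing is rebuilt here; the cost is deferred to the next query.
    fn bump(&self) {
        self.generation.fetch_add(1, Ordering::Release);
    }

    /// Return the cached set if it was built at the current generation;
    /// otherwise call `build`, store the result, and return it.
    /// The lock is held while `build` runs, which keeps the sketch simple
    /// but serializes rebuilds.
    fn get_or_rebuild<F>(&self, user_id: u64, build: F) -> ItemSet
    where
        F: FnOnce() -> ItemSet,
    {
        let current = self.generation.load(Ordering::Acquire);
        let mut entries = self.entries.lock().unwrap();
        if let Some((built_at, set)) = entries.get(&user_id) {
            if *built_at == current {
                return set.clone();
            }
        }
        let fresh = build();
        entries.insert(user_id, (current, fresh.clone()));
        fresh
    }
}
```

In this sketch the item write path would call `bump()`, and the query path would call `get_or_rebuild(user_id, || ...)` with a closure that computes the union of the followed creators' items. A per-creator generation would avoid invalidating users who do not follow the creator that received the write, at the cost of more bookkeeping.
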