tidaldb/docs/planning/milestone-3/phase-1/task-03-user-state-bitmap-indexes.md
jordan 39ada28c6e feat: complete Milestones 2–4 — RETRIEVE query, vector index, ranking profiles, diversity, entity system, sessions
M2: RETRIEVE query pipeline with 5-stage execution (candidate → filter → score → diversify → limit),
    usearch HNSW vector index, bitmap/range/universe filters, ranking profiles with signal scoring,
    MMR diversity enforcement, and m2_uat integration tests.

M3: Entity system with typed metadata, relationship graph (follows/blocks/interactions),
    creator entities, session tracking, and m3_uat integration tests.

M4: Advanced ranking with builtin functions (freshness, trending, controversy, wilson),
    ranking executor with explain mode, query executor integration, benchmarks for
    query/ranking/vector/filters/diversity, and m4_uat integration tests.

Includes: 9 new blog posts, marketing site updates, updated roadmap, and updated vision doc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 16:24:48 -07:00


Task 03: User-State Bitmap Indexes

Context

Milestone: 3 -- Personalized Ranking
Phase: m3p1 -- User and Creator Entities with Relationships
Depends On: Task 01 (User/Creator entities, CreatorItemsBitmap), Task 02 (Relationship graph: follows, blocks, hide edges)
Blocks: m3p2 (Feedback Loop updates seen/hide bitmaps), m3p3 (Personalized Profiles use follows bitmap for following profile), m3p4 (User State Filters compose these bitmaps with metadata filters)
Complexity: M

Objective

Deliver three user-state bitmap structures that power the unseen, unblocked, and relationship:follows filters in the RETRIEVE executor:

  1. UserSeenBitmap: per-user roaring bitmap of item IDs the user has viewed. Updated on view signals. Used by the unseen filter to exclude already-seen items.

  2. UserBlockedSet: per-user set of blocked creator IDs and hidden item IDs. Built from blocks and hide relationship edges. Used by the unblocked filter to exclude all items from blocked creators and all hidden items.

  3. FollowsBitmap: per-user roaring bitmap of item IDs from followed creators. Built by taking the union of the CreatorItemsBitmap entries (from Task 01) for every creator the user follows. Used by FILTER relationship:follows to restrict candidates to followed creators' items.

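These bitmaps are maintained in-memory for hot-path query performance. Relationship-derived state (follows, blocks, hides) is reconstructed from the storage engine on restart; seen bitmaps start empty in M3 (see Implementation Notes). All three compose with the existing FilterExpr / FilterResult system from m2p2.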
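The restart path can be pictured as a replay of stored relationship edges into empty in-memory structures. The sketch below is illustrative only: `RelKind`, `RelEdge`, `RebuiltState`, and `rebuild` are hypothetical names, the real edge encoding belongs to Task 02, and plain `std` collections stand in for `DashMap` and `RoaringBitmap`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical decoded relationship kinds (the real encoding lives in Task 02).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelKind {
    Follows,
    Blocks,
    Hide,
}

/// Hypothetical decoded edge: `user` --kind--> `target`.
#[derive(Debug, Clone, Copy)]
pub struct RelEdge {
    pub user: u64,
    pub kind: RelKind,
    pub target: u64,
}

/// Plain-std stand-in for the relationship-derived user state.
#[derive(Debug, Default)]
pub struct RebuiltState {
    pub follows: HashMap<u64, HashSet<u64>>,
    pub blocked_creators: HashMap<u64, HashSet<u64>>,
    pub hidden_items: HashMap<u64, HashSet<u64>>,
}

/// Replay every stored edge into fresh structures.
/// Replaying the same edge twice is harmless because sets deduplicate.
pub fn rebuild(edges: impl IntoIterator<Item = RelEdge>) -> RebuiltState {
    let mut state = RebuiltState::default();
    for edge in edges {
        // Pick the per-user map that matches the edge kind.
        let map = match edge.kind {
            RelKind::Follows => &mut state.follows,
            RelKind::Blocks => &mut state.blocked_creators,
            RelKind::Hide => &mut state.hidden_items,
        };
        map.entry(edge.user).or_default().insert(edge.target);
    }
    state
}
```

Because the replay is idempotent, the order in which edges are scanned does not matter for this add-only sketch. Handling removals (for example an unfollow) would require either storing only live edges or replaying in write order.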

Requirements

  • UserSeenBitmap: per-user RoaringBitmap of viewed item IDs
  • UserBlockedSet: per-user HashSet<u64> of blocked creator IDs + RoaringBitmap of hidden item IDs (the BlockedState struct below)
  • FollowsBitmap: per-user RoaringBitmap of item IDs from followed creators
  • UserStateIndex: container holding all three structures for all users, backed by DashMap per structure
  • user_state.mark_seen(user_id, item_id) adds to seen bitmap
  • user_state.is_seen(user_id, item_id) checks membership
  • user_state.add_block(user_id, creator_id) adds to blocked set
  • user_state.add_hide(user_id, item_id) adds to hidden set
  • user_state.add_follow(user_id, creator_id) adds the creator to the user's follows set
  • user_state.remove_follow(user_id, creator_id) removes the creator from the user's follows set
  • user_state.unseen_predicate(user_id) returns a predicate (for FilterResult::Predicate) that excludes seen items
  • user_state.unblocked_predicate(user_id) returns a predicate over (item_id, Option<creator_id>) (for FilterResult::Predicate) that excludes blocked creators' items and hidden items
  • user_state.follows_bitmap(user_id, creator_items) returns the bitmap (for FilterResult::Bitmap) of followed creators' items
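  • Memory budget: ~125KB per user at 1M items for the seen bitmap. This is roughly the size of a fully dense 1M-bit bitmap (1,000,000 / 8 = 125,000 bytes); sparse seen sets compress well below it under roaring bitmap compression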
  • Reconstruction from storage on restart via rebuild_from_relationships()
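The memory budget above can be sanity-checked with a rough estimate. The sketch below assumes the standard roaring layout: each container covers 2^16 consecutive IDs, holds up to 4096 values as a sorted array of `u16` (2 bytes each), and switches to a fixed 8 KiB bitmap beyond that. It counts payload bytes only, ignores container headers and run containers, and assumes seen items are spread evenly across the ID space. The function names are hypothetical.

```rust
/// Each roaring container covers 2^16 consecutive IDs.
const CONTAINER_SPAN: u64 = 1 << 16;
/// Containers with at most this many values are stored as a sorted u16 array.
const ARRAY_MAX: u64 = 4096;
/// A bitmap container is a fixed 2^16-bit block (8192 bytes).
const BITMAP_BYTES: u64 = CONTAINER_SPAN / 8;

/// Payload bytes for one container holding `cardinality` values.
fn container_bytes(cardinality: u64) -> u64 {
    if cardinality == 0 {
        0
    } else if cardinality <= ARRAY_MAX {
        cardinality * 2 // array container: 2 bytes per value
    } else {
        BITMAP_BYTES // bitmap container: fixed size
    }
}

/// Estimate payload bytes for `seen` items spread evenly over `universe` IDs.
/// The remainder of the integer division is ignored; this is an estimate.
fn estimate_seen_bytes(universe: u64, seen: u64) -> u64 {
    let containers = (universe + CONTAINER_SPAN - 1) / CONTAINER_SPAN;
    if containers == 0 {
        return 0;
    }
    let per_container = seen / containers;
    containers * container_bytes(per_container)
}
```

Under these assumptions, a user who has seen 10,000 of 1M items needs about 20,000 bytes, while a user who has seen everything needs 16 bitmap containers, or 131,072 bytes. That ceiling is close to the ~125KB figure, which corresponds to 1,000,000 bits divided by 8.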

Technical Design

Module Structure

tidal/src/
  entities/
    user_state.rs -- UserStateIndex, all bitmap types

Core Types

// === entities/user_state.rs ===

use dashmap::DashMap;
use roaring::RoaringBitmap;
use std::collections::HashSet;

use crate::schema::EntityId;
use super::CreatorItemsBitmap;

/// Per-user blocked creators and hidden items.
#[derive(Debug, Default, Clone)]
pub struct BlockedState {
    /// Creator IDs the user has blocked.
    pub blocked_creators: HashSet<u64>,
    /// Item IDs the user has hidden.
    pub hidden_items: RoaringBitmap,
}

/// Centralized user-state index for fast query-time filtering.
///
/// All structures are in-memory for hot-path performance. They are
/// rebuilt from the storage engine on startup and incrementally
/// maintained on signal writes and relationship changes.
// Deriving Default keeps clippy's `new_without_default` lint quiet.
#[derive(Default)]
pub struct UserStateIndex {
    /// Per-user seen item bitmaps. Key: user_id as u64.
    seen: DashMap<u64, RoaringBitmap>,
    /// Per-user blocked/hidden state. Key: user_id as u64.
    blocked: DashMap<u64, BlockedState>,
    /// Per-user followed creator IDs. Key: user_id as u64.
    follows: DashMap<u64, HashSet<u64>>,
}

impl UserStateIndex {
    pub fn new() -> Self {
        Self {
            seen: DashMap::new(),
            blocked: DashMap::new(),
            follows: DashMap::new(),
        }
    }

    // ── Seen ─────────────────────────────────────────────────

    /// Mark an item as seen by a user.
    ///
    /// Item IDs are stored as `u32` bitmap keys; the cast truncates IDs
    /// above `u32::MAX` (see Implementation Notes).
    pub fn mark_seen(&self, user_id: EntityId, item_id: EntityId) {
        self.seen
            .entry(user_id.as_u64())
            .or_default()
            .insert(item_id.as_u64() as u32);
    }

    /// Check if a user has seen an item.
    pub fn is_seen(&self, user_id: EntityId, item_id: EntityId) -> bool {
        self.seen
            .get(&user_id.as_u64())
            .is_some_and(|bm| bm.contains(item_id.as_u64() as u32))
    }

    /// Get the count of seen items for a user.
    pub fn seen_count(&self, user_id: EntityId) -> u64 {
        self.seen
            .get(&user_id.as_u64())
            .map_or(0, |bm| bm.len())
    }

    // ── Blocked / Hidden ──────────────────────────────────────

    /// Add a creator to the user's blocked set.
    pub fn add_block(&self, user_id: EntityId, creator_id: EntityId) {
        self.blocked
            .entry(user_id.as_u64())
            .or_default()
            .blocked_creators
            .insert(creator_id.as_u64());
    }

    /// Add an item to the user's hidden set.
    pub fn add_hide(&self, user_id: EntityId, item_id: EntityId) {
        self.blocked
            .entry(user_id.as_u64())
            .or_default()
            .hidden_items
            .insert(item_id.as_u64() as u32);
    }

    /// Check if a creator is blocked by a user.
    pub fn is_blocked(&self, user_id: EntityId, creator_id: EntityId) -> bool {
        self.blocked
            .get(&user_id.as_u64())
            .is_some_and(|s| s.blocked_creators.contains(&creator_id.as_u64()))
    }

    /// Check if an item is hidden by a user.
    pub fn is_hidden(&self, user_id: EntityId, item_id: EntityId) -> bool {
        self.blocked
            .get(&user_id.as_u64())
            .is_some_and(|s| s.hidden_items.contains(item_id.as_u64() as u32))
    }

    // ── Follows ──────────────────────────────────────────────

    /// Add a follow relationship.
    pub fn add_follow(&self, user_id: EntityId, creator_id: EntityId) {
        self.follows
            .entry(user_id.as_u64())
            .or_default()
            .insert(creator_id.as_u64());
    }

    /// Remove a follow relationship.
    pub fn remove_follow(&self, user_id: EntityId, creator_id: EntityId) {
        if let Some(mut set) = self.follows.get_mut(&user_id.as_u64()) {
            set.remove(&creator_id.as_u64());
        }
    }

    /// Get the set of creator IDs a user follows.
    pub fn followed_creators(&self, user_id: EntityId) -> Vec<EntityId> {
        self.follows
            .get(&user_id.as_u64())
            .map_or_else(Vec::new, |set| {
                set.iter().map(|&id| EntityId::new(id)).collect()
            })
    }

    // ── Filter builders ──────────────────────────────────────

    /// Build an "unseen" filter predicate for a user.
    ///
    /// Returns a closure that returns `true` for items the user has NOT seen.
    pub fn unseen_predicate(
        &self,
        user_id: EntityId,
    ) -> Box<dyn Fn(u64) -> bool + Send + Sync> {
        let seen_bitmap = self.seen
            .get(&user_id.as_u64())
            .map(|bm| bm.clone());
        Box::new(move |item_id: u64| {
            match &seen_bitmap {
                Some(bm) => !bm.contains(item_id as u32),
                None => true, // no seen data = everything is unseen
            }
        })
    }

    /// Build an "unblocked" filter predicate for a user.
    ///
    /// Returns a closure that returns `true` for items that are:
    /// - NOT from a blocked creator
    /// - NOT in the user's hidden set
    ///
    /// The caller passes each item's creator_id (if known) as the second argument.
    pub fn unblocked_predicate(
        &self,
        user_id: EntityId,
    ) -> Box<dyn Fn(u64, Option<u64>) -> bool + Send + Sync> {
        let state = self.blocked
            .get(&user_id.as_u64())
            .map(|s| s.clone());
        Box::new(move |item_id: u64, creator_id: Option<u64>| {
            match &state {
                Some(s) => {
                    // Check hidden items
                    if s.hidden_items.contains(item_id as u32) {
                        return false;
                    }
                    // Check blocked creators
                    if let Some(cid) = creator_id {
                        if s.blocked_creators.contains(&cid) {
                            return false;
                        }
                    }
                    true
                }
                None => true,
            }
        })
    }

    /// Build a "follows" filter bitmap for a user.
    ///
    /// Returns the union of all item bitmaps for creators the user follows.
    pub fn follows_bitmap(
        &self,
        user_id: EntityId,
        creator_items: &CreatorItemsBitmap,
    ) -> RoaringBitmap {
        let creator_ids = self.followed_creators(user_id);
        creator_items.items_for_creators(&creator_ids)
    }

    // ── Reconstruction ───────────────────────────────────────

    /// Rebuild relationship-derived user state from storage.
    ///
    /// Scans all relationship edges to reconstruct:
    /// - Blocked/hidden sets from blocks/hide relationship edges
    /// - Follows sets from follows relationship edges
    ///
    /// Seen bitmaps are not rebuilt here in M3; they start empty and are
    /// repopulated by signal writes (see Implementation Notes).
    pub fn rebuild_from_relationships(
        &self,
        _storage: &dyn crate::storage::StorageEngine,
    ) -> crate::Result<()> {
        // Stub: the parameter is underscore-prefixed until the scan is
        // implemented, so `-D warnings` does not reject the unused variable.
        // Planned: scan all Rel-tagged keys in the users keyspace
        // (entity_tag_prefix scanning), decode the relationship type of
        // each key, and update the matching bitmap/set.
        // This is called once on startup.
        Ok(())
    }
}
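The `as u32` casts in the code above truncate silently: an item ID just past `u32::MAX` would alias a small ID and could mark the wrong item as seen or hidden. A checked conversion is one way to surface that limit. This is a sketch only; `IdOutOfRange`, `bitmap_key`, and `truncating_key` are hypothetical names, not part of the planned API.

```rust
/// Error returned when an ID does not fit the bitmap's 32-bit key space.
#[derive(Debug, PartialEq, Eq)]
pub struct IdOutOfRange(pub u64);

/// Convert a 64-bit entity ID to a bitmap key, refusing IDs above `u32::MAX`.
pub fn bitmap_key(id: u64) -> Result<u32, IdOutOfRange> {
    u32::try_from(id).map_err(|_| IdOutOfRange(id))
}

/// The behaviour of a plain `as` cast, shown for comparison:
/// the upper 32 bits are dropped without any error.
pub fn truncating_key(id: u64) -> u32 {
    id as u32
}
```

With the checked variant, `mark_seen` and `add_hide` could return a `Result` or log and skip out-of-range IDs. Which of those fits better depends on how the signal write path in m3p2 handles errors.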

Integration with FilterExpr

The unseen, unblocked, and follows filters are new FilterExpr variants that the query executor evaluates using the UserStateIndex:

// Extend FilterExpr in storage/indexes/filter.rs
pub enum FilterExpr {
    // ... existing variants ...

    /// Exclude items the user has seen. Requires user context.
    Unseen,
    /// Exclude items from blocked creators and hidden items. Requires user context.
    Unblocked,
    /// Only items from followed creators. Requires user context.
    Follows,
}

The executor resolves these variants at query time by consulting the UserStateIndex attached to the TidalDb instance.
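A minimal sketch of how that resolution could look in the filter stage follows. Everything here is an assumption made for illustration: `UserFilter` mirrors only the three new variants, `UserSnapshot` stands in for lookups on the UserStateIndex, `Candidate` is a simplified stand-in for the executor's candidate type, and `std` sets replace roaring bitmaps.

```rust
use std::collections::{BTreeSet, HashSet};

/// Hypothetical mirror of the three new filter variants.
#[derive(Debug, Clone, Copy)]
pub enum UserFilter {
    Unseen,
    Unblocked,
    Follows,
}

/// Simplified candidate: an item and, if known, its creator.
pub struct Candidate {
    pub item_id: u64,
    pub creator_id: Option<u64>,
}

/// Per-user state captured once at the start of a query.
#[derive(Default)]
pub struct UserSnapshot {
    pub seen: HashSet<u64>,
    pub hidden: HashSet<u64>,
    pub blocked_creators: HashSet<u64>,
    /// Union of items from followed creators.
    pub followed_items: BTreeSet<u64>,
}

impl UserSnapshot {
    /// Evaluate one filter variant against one candidate.
    fn passes(&self, filter: UserFilter, c: &Candidate) -> bool {
        match filter {
            UserFilter::Unseen => !self.seen.contains(&c.item_id),
            UserFilter::Unblocked => {
                // Hidden items fail regardless of creator; an unknown
                // creator is treated as not blocked.
                !self.hidden.contains(&c.item_id)
                    && !c
                        .creator_id
                        .is_some_and(|cid| self.blocked_creators.contains(&cid))
            }
            UserFilter::Follows => self.followed_items.contains(&c.item_id),
        }
    }
}

/// Keep the candidates that pass every requested filter, preserving order.
pub fn apply_filters(
    snapshot: &UserSnapshot,
    filters: &[UserFilter],
    candidates: Vec<Candidate>,
) -> Vec<u64> {
    candidates
        .into_iter()
        .filter(|c| filters.iter().all(|&f| snapshot.passes(f, c)))
        .map(|c| c.item_id)
        .collect()
}
```

In the real executor the follows variant would more likely be applied as a bitmap intersection before per-item predicates run, since it narrows the candidate set without touching each item. The per-item form is used here only to keep the sketch uniform.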

Test Strategy

Unit Tests

#[test]
fn mark_seen_and_check() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);
    let item = EntityId::new(42);
    assert!(!index.is_seen(user, item));
    index.mark_seen(user, item);
    assert!(index.is_seen(user, item));
}

#[test]
fn seen_count_increments() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);
    assert_eq!(index.seen_count(user), 0);
    index.mark_seen(user, EntityId::new(1));
    index.mark_seen(user, EntityId::new(2));
    index.mark_seen(user, EntityId::new(2)); // duplicate
    assert_eq!(index.seen_count(user), 2);
}

#[test]
fn block_and_check() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);
    let creator = EntityId::new(10);
    assert!(!index.is_blocked(user, creator));
    index.add_block(user, creator);
    assert!(index.is_blocked(user, creator));
}

#[test]
fn hide_and_check() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);
    let item = EntityId::new(42);
    assert!(!index.is_hidden(user, item));
    index.add_hide(user, item);
    assert!(index.is_hidden(user, item));
}

#[test]
fn follow_and_list() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);
    index.add_follow(user, EntityId::new(10));
    index.add_follow(user, EntityId::new(20));
    let creators = index.followed_creators(user);
    assert_eq!(creators.len(), 2);
}

#[test]
fn unfollow_removes_creator() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);
    index.add_follow(user, EntityId::new(10));
    index.add_follow(user, EntityId::new(20));
    index.remove_follow(user, EntityId::new(10));
    let creators = index.followed_creators(user);
    assert_eq!(creators.len(), 1);
    assert!(creators.contains(&EntityId::new(20)));
}

#[test]
fn unseen_predicate_excludes_seen_items() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);
    index.mark_seen(user, EntityId::new(5));
    index.mark_seen(user, EntityId::new(10));

    let pred = index.unseen_predicate(user);
    assert!(!pred(5));   // seen -> excluded
    assert!(!pred(10));  // seen -> excluded
    assert!(pred(15));   // unseen -> included
    assert!(pred(1));    // unseen -> included
}

#[test]
fn unseen_predicate_for_unknown_user_includes_all() {
    let index = UserStateIndex::new();
    let pred = index.unseen_predicate(EntityId::new(999));
    assert!(pred(1));
    assert!(pred(100));
}

#[test]
fn unblocked_predicate_excludes_blocked_and_hidden() {
    let index = UserStateIndex::new();
    let user = EntityId::new(1);
    index.add_block(user, EntityId::new(77)); // block creator 77
    index.add_hide(user, EntityId::new(42));  // hide item 42

    let pred = index.unblocked_predicate(user);
    // Item from blocked creator -> excluded
    assert!(!pred(100, Some(77)));
    // Hidden item -> excluded regardless of creator
    assert!(!pred(42, Some(1)));
    assert!(!pred(42, None));
    // Normal item from unblocked creator -> included
    assert!(pred(50, Some(10)));
    // Item with unknown creator -> included (not blocked)
    assert!(pred(50, None));
}

#[test]
fn follows_bitmap_union_of_creator_items() {
    let index = UserStateIndex::new();
    let creator_items = CreatorItemsBitmap::new();

    // Creator 10 has items 100, 101
    creator_items.add_item(EntityId::new(10), EntityId::new(100));
    creator_items.add_item(EntityId::new(10), EntityId::new(101));
    // Creator 20 has items 200, 201
    creator_items.add_item(EntityId::new(20), EntityId::new(200));
    creator_items.add_item(EntityId::new(20), EntityId::new(201));
    // Creator 30 has items 300 (not followed)
    creator_items.add_item(EntityId::new(30), EntityId::new(300));

    let user = EntityId::new(1);
    index.add_follow(user, EntityId::new(10));
    index.add_follow(user, EntityId::new(20));

    let bitmap = index.follows_bitmap(user, &creator_items);
    assert!(bitmap.contains(100));
    assert!(bitmap.contains(101));
    assert!(bitmap.contains(200));
    assert!(bitmap.contains(201));
    assert!(!bitmap.contains(300)); // not followed
    assert_eq!(bitmap.len(), 4);
}

#[test]
fn different_users_have_independent_state() {
    let index = UserStateIndex::new();
    let user_a = EntityId::new(1);
    let user_b = EntityId::new(2);

    index.mark_seen(user_a, EntityId::new(42));
    index.add_block(user_b, EntityId::new(77));

    assert!(index.is_seen(user_a, EntityId::new(42)));
    assert!(!index.is_seen(user_b, EntityId::new(42)));
    assert!(!index.is_blocked(user_a, EntityId::new(77)));
    assert!(index.is_blocked(user_b, EntityId::new(77)));
}
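One behaviour the tests above do not pin down is that `unseen_predicate` clones the user's bitmap when the closure is built, so a predicate is a snapshot: items marked seen afterwards do not affect it. The sketch below reproduces that behaviour with `std` types only. `SeenIndex` is a hypothetical stand-in, with `RwLock<HashMap<..>>` in place of `DashMap` and `HashSet<u64>` in place of `RoaringBitmap`.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

/// Std-only stand-in for the `seen` part of the user-state index.
#[derive(Default)]
pub struct SeenIndex {
    seen: RwLock<HashMap<u64, HashSet<u64>>>,
}

impl SeenIndex {
    /// Record that `user` has seen `item`.
    pub fn mark_seen(&self, user: u64, item: u64) {
        self.seen
            .write()
            .unwrap()
            .entry(user)
            .or_default()
            .insert(item);
    }

    /// Clone the user's seen set into the closure, so the predicate
    /// reflects the state at the moment it was built.
    pub fn unseen_predicate(&self, user: u64) -> Box<dyn Fn(u64) -> bool + Send + Sync> {
        let snapshot = self.seen.read().unwrap().get(&user).cloned();
        Box::new(move |item| !snapshot.as_ref().is_some_and(|s| s.contains(&item)))
    }
}
```

A test along these lines would build a predicate, mark another item as seen, and check that the old predicate still passes that item while a freshly built one rejects it. Snapshot semantics give each query a consistent view, at the cost of one bitmap clone per query.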

Property Tests

use proptest::prelude::*;

proptest! {
    #[test]
    fn seen_items_never_pass_unseen_filter(
        user_id in 1u64..100,
        seen_items in proptest::collection::vec(1u64..10000, 1..100),
        test_item in 1u64..10000,
    ) {
        let index = UserStateIndex::new();
        let user = EntityId::new(user_id);
        for &item in &seen_items {
            index.mark_seen(user, EntityId::new(item));
        }

        let pred = index.unseen_predicate(user);
        if seen_items.contains(&test_item) {
            prop_assert!(!pred(test_item),
                "seen item {} should be excluded by unseen filter", test_item);
        } else {
            prop_assert!(pred(test_item),
                "unseen item {} should pass unseen filter", test_item);
        }
    }

    #[test]
    fn blocked_creators_items_never_pass_unblocked_filter(
        user_id in 1u64..100,
        blocked_creators in proptest::collection::vec(1u64..100, 1..10),
        test_creator in 1u64..100,
        test_item in 1u64..10000,
    ) {
        let index = UserStateIndex::new();
        let user = EntityId::new(user_id);
        for &cid in &blocked_creators {
            index.add_block(user, EntityId::new(cid));
        }

        let pred = index.unblocked_predicate(user);
        if blocked_creators.contains(&test_creator) {
            prop_assert!(!pred(test_item, Some(test_creator)),
                "item from blocked creator {} should be excluded", test_creator);
        } else {
            prop_assert!(pred(test_item, Some(test_creator)),
                "item from unblocked creator {} should pass", test_creator);
        }
    }
}

Acceptance Criteria

  • UserStateIndex with DashMap-backed seen, blocked, and follows structures
  • mark_seen / is_seen / seen_count work correctly
  • add_block / is_blocked / add_hide / is_hidden work correctly
  • add_follow / remove_follow / followed_creators work correctly
  • unseen_predicate returns closure excluding all seen items
  • unseen_predicate for unknown user includes all items
  • unblocked_predicate excludes items from blocked creators AND hidden items
  • follows_bitmap returns union of item sets for followed creators
  • Different users have fully independent state (no cross-contamination)
  • FilterExpr extended with Unseen, Unblocked, Follows variants
  • Memory: roaring bitmap for 1M items is < 200KB per user in typical usage
  • Property test: seen items NEVER pass unseen filter
  • Property test: blocked creators' items NEVER pass unblocked filter
  • All unit and property tests pass
  • cargo clippy -- -D warnings passes

Implementation Notes

  • The UserStateIndex is stored as a field on TidalDb, allocated during open(). It is Send + Sync because every field is a DashMap whose keys and values (u64, RoaringBitmap, BlockedState, HashSet<u64>) are themselves Send + Sync.
  • RoaringBitmap uses u32 keys. For M3 at up to 1M items, u32 is sufficient. If item IDs exceed u32::MAX, the bitmap must be partitioned. This is unlikely before M7 (production hardening). Document the u32 limitation.
  • The unblocked_predicate takes an (item_id, Option<creator_id>) pair because the predicate needs to know the creator for each item. The executor must look up creator_id per item when evaluating this filter. In the RETRIEVE executor, creator_id is already available from ScoredCandidate::creator_id (set in m2p3).
  • On startup, rebuild_from_relationships scans all Tag::Rel keys in the users keyspace and populates the follows, blocked, and hidden structures. Seen bitmaps are NOT rebuilt from storage on startup for M3 -- they start empty and are populated from signal writes during the session. Full seen-state persistence (checkpoint + restore) is deferred to m3p4 Task 01 where it is implemented properly.
  • The CreatorItemsBitmap (from Task 01) must be updated when new items are written. A materialized follows bitmap then becomes stale when new items arrive for a followed creator. Two approaches: (a) rebuild the follows bitmap eagerly on every item write (expensive), or (b) rebuild it lazily at query time and cache the result. Recommendation: (b) -- cache the follows bitmap per user with a generation counter that increments on item writes. A cached bitmap whose generation is older than the current counter is treated as invalid and rebuilt on the next query.
  • Do NOT implement signal-triggered updates to these bitmaps in this task. That wiring is done in m3p2 (Feedback Loop) where the signal dispatch atomically calls mark_seen, add_hide, add_block as part of the signal write path.
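The generation-counter recommendation in the notes above can be sketched as follows. This is one possible shape under stated assumptions, not the planned implementation: `FollowsCache` and its methods are hypothetical, it uses `&mut self` and `std` collections instead of `DashMap` and `RoaringBitmap`, and the `rebuilds` counter exists only to make the caching observable.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Std-only sketch of a lazily rebuilt, generation-stamped follows cache.
#[derive(Default)]
pub struct FollowsCache {
    creator_items: HashMap<u64, BTreeSet<u64>>,
    follows: HashMap<u64, HashSet<u64>>,
    /// Incremented on every item write.
    generation: u64,
    /// Per-user cached union, stamped with the generation it was built at.
    cached: HashMap<u64, (u64, BTreeSet<u64>)>,
    /// Number of rebuilds performed (illustration only).
    pub rebuilds: u64,
}

impl FollowsCache {
    /// Record a new item for a creator and bump the generation.
    pub fn add_item(&mut self, creator: u64, item: u64) {
        self.creator_items.entry(creator).or_default().insert(item);
        self.generation += 1;
    }

    /// Follow a creator and drop the user's cached union.
    pub fn add_follow(&mut self, user: u64, creator: u64) {
        self.follows.entry(user).or_default().insert(creator);
        self.cached.remove(&user);
    }

    /// Unfollow a creator and drop the user's cached union.
    pub fn remove_follow(&mut self, user: u64, creator: u64) {
        if let Some(set) = self.follows.get_mut(&user) {
            set.remove(&creator);
        }
        self.cached.remove(&user);
    }

    /// Return the union of followed creators' items, rebuilding only
    /// when the cached entry is missing or from an older generation.
    pub fn follows_items(&mut self, user: u64) -> &BTreeSet<u64> {
        let fresh = matches!(self.cached.get(&user), Some((g, _)) if *g == self.generation);
        if !fresh {
            let mut items = BTreeSet::new();
            if let Some(creators) = self.follows.get(&user) {
                for creator in creators {
                    if let Some(set) = self.creator_items.get(creator) {
                        items.extend(set.iter().copied());
                    }
                }
            }
            self.rebuilds += 1;
            self.cached.insert(user, (self.generation, items));
        }
        &self.cached[&user].1
    }
}
```

A single global generation invalidates every user's cache on any item write, including writes by creators the user does not follow. If that proves too coarse, a per-creator generation would be a possible refinement at the cost of more bookkeeping.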