Task 03: User-State Bitmap Indexes
Context
Milestone: 3 -- Personalized Ranking
Phase: m3p1 -- User and Creator Entities with Relationships
Depends On: Task 01 (User/Creator entities, CreatorItemsBitmap), Task 02 (Relationship graph: follows, blocks, hide edges)
Blocks: m3p2 (Feedback Loop updates seen/hide bitmaps), m3p3 (Personalized Profiles use follows bitmap for following profile), m3p4 (User State Filters compose these bitmaps with metadata filters)
Complexity: M
Objective
Deliver three user-state bitmap structures that power the unseen, unblocked, and relationship:follows filters in the RETRIEVE executor:
- UserSeenBitmap: per-user roaring bitmap of item IDs the user has viewed. Updated on view signals. Used by the unseen filter to exclude already-seen items.
- UserBlockedSet: per-user set of blocked creator IDs and hidden item IDs. Built from blocks and hide relationship edges. Used by the unblocked filter to exclude all items from blocked creators and all hidden items.
- FollowsBitmap: per-user roaring bitmap of item IDs from followed creators. Built as the union of the CreatorItemsBitmap entries (from Task 01) for every creator the user follows. Used by FILTER relationship:follows to restrict candidates to followed creators' items.
These structures are kept in memory for hot-path query performance. The follows and blocked/hidden state is reconstructed from the storage engine on restart; in M3 the seen bitmaps start empty after a restart (see Implementation Notes). They compose with the existing FilterExpr / FilterResult system from m2p2.
Requirements
- UserSeenBitmap: per-user RoaringBitmap of viewed item IDs
- UserBlockedSet: per-user HashSet<EntityId> of blocked creator IDs + RoaringBitmap of hidden item IDs
- FollowsBitmap: per-user RoaringBitmap of item IDs from followed creators
- UserStateIndex: container holding all three structures for all users, backed by one DashMap per structure
- user_state.mark_seen(user_id, item_id) adds to seen bitmap
- user_state.is_seen(user_id, item_id) checks membership
- user_state.add_block(user_id, creator_id) adds to blocked set
- user_state.add_hide(user_id, item_id) adds to hidden set
- user_state.add_follow(user_id, creator_id) records the follow; the follows bitmap is derived from creator items at query time
- user_state.remove_follow(user_id, creator_id) removes the follow
- user_state.unseen_predicate(user_id) returns a predicate excluding seen items, for use as FilterResult::Predicate
- user_state.unblocked_predicate(user_id) returns a predicate over (item_id, creator_id) excluding blocked + hidden items, for use as FilterResult::Predicate
- user_state.follows_bitmap(user_id, creator_items) returns a bitmap of followed creators' items, for use as FilterResult::Bitmap
- Memory budget: ~125KB per user at 1M items for the seen bitmap (the size of a dense 1M-bit bitmap; roaring compression keeps sparse seen sets smaller)
- Reconstruction from storage on restart via rebuild_from_relationships()
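The ~125KB figure corresponds to a dense bitmap over 1M item IDs (1,000,000 bits / 8 = 125,000 bytes). As a rough sketch of where that number comes from, the function below estimates payload size under the standard roaring layout: each 65,536-ID chunk is stored either as an array container (2 bytes per value, up to 4,096 values) or as a fixed 8 KiB bitmap container. The helper estimate_roaring_bytes is hypothetical and not part of the roaring crate; it assumes seen IDs are spread uniformly and ignores run containers and per-container headers.

```rust
/// Number of IDs covered by one roaring container (the low 16 bits).
const CHUNK_SIZE: u64 = 1 << 16;
/// Above this cardinality a chunk is stored as a fixed-size bitmap.
const ARRAY_MAX: u64 = 4096;
/// Size of one bitmap container: 65,536 bits / 8.
const BITMAP_CONTAINER_BYTES: u64 = 8192;

/// Hypothetical helper: approximate payload bytes for a bitmap in which
/// `seen` IDs are spread uniformly over `universe` item IDs.
/// Ignores run containers and per-container headers.
pub fn estimate_roaring_bytes(universe: u64, seen: u64) -> u64 {
    if universe == 0 || seen == 0 {
        return 0;
    }
    let seen = seen.min(universe);
    let chunks = universe.div_ceil(CHUNK_SIZE);
    // Average number of seen IDs per chunk under the uniform assumption.
    let per_chunk = seen.div_ceil(chunks);
    let container_bytes = if per_chunk <= ARRAY_MAX {
        2 * per_chunk // sorted array of u16 values
    } else {
        BITMAP_CONTAINER_BYTES
    };
    chunks * container_bytes
}
```

Under these assumptions, a user who has seen all 1M items costs 16 x 8 KiB = 131,072 bytes, while a user who has seen 10,000 items costs about 20,000 bytes.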
Technical Design
Module Structure
tidal/src/
  entities/
    user_state.rs -- UserStateIndex, all bitmap types
Core Types
// === entities/user_state.rs ===
use dashmap::DashMap;
use roaring::RoaringBitmap;
use std::collections::HashSet;
use crate::schema::EntityId;
use super::CreatorItemsBitmap;
/// Per-user blocked creators and hidden items.
#[derive(Debug, Default, Clone)]
pub struct BlockedState {
/// Creator IDs the user has blocked.
pub blocked_creators: HashSet<u64>,
/// Item IDs the user has hidden.
pub hidden_items: RoaringBitmap,
}
/// Centralized user-state index for fast query-time filtering.
///
/// All structures are in-memory for hot-path performance. They are
/// rebuilt from the storage engine on startup and incrementally
/// maintained on signal writes and relationship changes.
#[derive(Default)] // keeps clippy::new_without_default quiet alongside `new()`
pub struct UserStateIndex {
/// Per-user seen item bitmaps. Key: user_id as u64.
seen: DashMap<u64, RoaringBitmap>,
/// Per-user blocked/hidden state. Key: user_id as u64.
blocked: DashMap<u64, BlockedState>,
/// Per-user followed creator IDs. Key: user_id as u64.
follows: DashMap<u64, HashSet<u64>>,
}
impl UserStateIndex {
pub fn new() -> Self {
Self {
seen: DashMap::new(),
blocked: DashMap::new(),
follows: DashMap::new(),
}
}
// ── Seen ─────────────────────────────────────────────────
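/// Mark an item as seen by a user.
///
/// Item IDs are narrowed to `u32` because `RoaringBitmap` uses `u32` keys;
/// IDs above `u32::MAX` would be truncated (see Implementation Notes).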
pub fn mark_seen(&self, user_id: EntityId, item_id: EntityId) {
self.seen
.entry(user_id.as_u64())
.or_default()
.insert(item_id.as_u64() as u32);
}
/// Check if a user has seen an item.
pub fn is_seen(&self, user_id: EntityId, item_id: EntityId) -> bool {
self.seen
.get(&user_id.as_u64())
.is_some_and(|bm| bm.contains(item_id.as_u64() as u32))
}
/// Get the count of seen items for a user.
pub fn seen_count(&self, user_id: EntityId) -> u64 {
self.seen
.get(&user_id.as_u64())
.map_or(0, |bm| bm.len())
}
// ── Blocked / Hidden ──────────────────────────────────────
/// Add a creator to the user's blocked set.
pub fn add_block(&self, user_id: EntityId, creator_id: EntityId) {
self.blocked
.entry(user_id.as_u64())
.or_default()
.blocked_creators
.insert(creator_id.as_u64());
}
/// Add an item to the user's hidden set.
pub fn add_hide(&self, user_id: EntityId, item_id: EntityId) {
self.blocked
.entry(user_id.as_u64())
.or_default()
.hidden_items
.insert(item_id.as_u64() as u32);
}
/// Check if a creator is blocked by a user.
pub fn is_blocked(&self, user_id: EntityId, creator_id: EntityId) -> bool {
self.blocked
.get(&user_id.as_u64())
.is_some_and(|s| s.blocked_creators.contains(&creator_id.as_u64()))
}
/// Check if an item is hidden by a user.
pub fn is_hidden(&self, user_id: EntityId, item_id: EntityId) -> bool {
self.blocked
.get(&user_id.as_u64())
.is_some_and(|s| s.hidden_items.contains(item_id.as_u64() as u32))
}
// ── Follows ──────────────────────────────────────────────
/// Add a follow relationship.
pub fn add_follow(&self, user_id: EntityId, creator_id: EntityId) {
self.follows
.entry(user_id.as_u64())
.or_default()
.insert(creator_id.as_u64());
}
/// Remove a follow relationship.
pub fn remove_follow(&self, user_id: EntityId, creator_id: EntityId) {
if let Some(mut set) = self.follows.get_mut(&user_id.as_u64()) {
set.remove(&creator_id.as_u64());
}
}
/// Get the set of creator IDs a user follows.
pub fn followed_creators(&self, user_id: EntityId) -> Vec<EntityId> {
self.follows
.get(&user_id.as_u64())
.map_or_else(Vec::new, |set| {
set.iter().map(|&id| EntityId::new(id)).collect()
})
}
// ── Filter builders ──────────────────────────────────────
/// Build an "unseen" filter predicate for a user.
///
/// Returns a closure that returns `true` for items the user has NOT seen.
pub fn unseen_predicate(
&self,
user_id: EntityId,
) -> Box<dyn Fn(u64) -> bool + Send + Sync> {
let seen_bitmap = self.seen
.get(&user_id.as_u64())
.map(|bm| bm.clone());
Box::new(move |item_id: u64| {
match &seen_bitmap {
Some(bm) => !bm.contains(item_id as u32),
None => true, // no seen data = everything is unseen
}
})
}
/// Build an "unblocked" filter predicate for a user.
///
/// Returns a closure that returns `true` for items that are:
/// - NOT from a blocked creator
/// - NOT in the user's hidden set
///
/// The caller passes each item's creator_id (if known) to the closure.
pub fn unblocked_predicate(
&self,
user_id: EntityId,
) -> Box<dyn Fn(u64, Option<u64>) -> bool + Send + Sync> {
let state = self.blocked
.get(&user_id.as_u64())
.map(|s| s.clone());
Box::new(move |item_id: u64, creator_id: Option<u64>| {
match &state {
Some(s) => {
// Check hidden items
if s.hidden_items.contains(item_id as u32) {
return false;
}
// Check blocked creators
if let Some(cid) = creator_id {
if s.blocked_creators.contains(&cid) {
return false;
}
}
true
}
None => true,
}
})
}
/// Build a "follows" filter bitmap for a user.
///
/// Returns the union of all item bitmaps for creators the user follows.
pub fn follows_bitmap(
&self,
user_id: EntityId,
creator_items: &CreatorItemsBitmap,
) -> RoaringBitmap {
let creator_ids = self.followed_creators(user_id);
creator_items.items_for_creators(&creator_ids)
}
// ── Reconstruction ───────────────────────────────────────
/// Rebuild relationship-derived user state from storage.
///
/// Scans all relationship edges to reconstruct:
/// - Blocked/hidden sets from blocks/hide relationship edges
/// - Follows sets from follows relationship edges
///
/// Seen bitmaps are not rebuilt in M3; they start empty and are
/// repopulated by signal writes (see Implementation Notes).
pub fn rebuild_from_relationships(
&self,
_storage: &dyn crate::storage::StorageEngine, // unused until the scan is implemented
) -> crate::Result<()> {
// TODO: scan all Rel-tagged keys in the users keyspace.
// For each key, decode the relationship type and update the
// appropriate bitmap/set.
// Implementation detail: use entity_tag_prefix scanning.
// This is called once on startup.
Ok(())
}
}
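The rebuild_from_relationships stub above leaves the scan itself unimplemented. A minimal sketch of the replay logic follows, assuming relationship keys have already been decoded into edges. The Edge, RelKind, and Rebuilt types are hypothetical stand-ins (the real key encoding and storage scan are not specified in this task), and BTreeSet<u32> plays the role of RoaringBitmap so that the example needs only the standard library.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical decoded relationship kind; the real key encoding is not
/// specified in this task.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelKind {
    Follows,
    Blocks,
    Hide,
}

/// Hypothetical decoded edge: `user_id` --kind--> `target_id`.
#[derive(Debug, Clone, Copy)]
pub struct Edge {
    pub user_id: u64,
    pub kind: RelKind,
    pub target_id: u64,
}

/// Stand-in for the relationship-derived maps of `UserStateIndex`.
#[derive(Debug, Default)]
pub struct Rebuilt {
    pub follows: HashMap<u64, HashSet<u64>>,
    pub blocked_creators: HashMap<u64, HashSet<u64>>,
    pub hidden_items: HashMap<u64, BTreeSet<u32>>,
}

/// Replay every edge once. Seen state is deliberately not touched,
/// matching the M3 decision to start seen bitmaps empty.
pub fn rebuild_from_edges<I>(edges: I) -> Result<Rebuilt, String>
where
    I: IntoIterator<Item = Edge>,
{
    let mut out = Rebuilt::default();
    for e in edges {
        match e.kind {
            RelKind::Follows => {
                out.follows.entry(e.user_id).or_default().insert(e.target_id);
            }
            RelKind::Blocks => {
                out.blocked_creators
                    .entry(e.user_id)
                    .or_default()
                    .insert(e.target_id);
            }
            RelKind::Hide => {
                // Item IDs must fit the bitmap's u32 key space; fail
                // loudly instead of truncating.
                let item = u32::try_from(e.target_id)
                    .map_err(|_| format!("item id {} exceeds u32::MAX", e.target_id))?;
                out.hidden_items.entry(e.user_id).or_default().insert(item);
            }
        }
    }
    Ok(out)
}
```

Using a checked conversion here, rather than an as cast, is one way to surface the u32 limitation at startup instead of silently aliasing two item IDs to the same bit.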
Integration with FilterExpr
The unseen, unblocked, and follows filters are new FilterExpr variants that the query executor evaluates using the UserStateIndex:
// Extend FilterExpr in storage/indexes/filter.rs
pub enum FilterExpr {
// ... existing variants ...
/// Exclude items the user has seen. Requires user context.
Unseen,
/// Exclude items from blocked creators and hidden items. Requires user context.
Unblocked,
/// Only items from followed creators. Requires user context.
Follows,
}
The executor resolves these variants at query time by consulting the UserStateIndex attached to the TidalDb instance.
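To illustrate how that resolution might work, the sketch below applies the three user-context variants to a candidate list. It is not the actual executor: UserFilter, Candidate, UserSnapshot, and apply_user_filters are hypothetical names, and BTreeSet<u32> stands in for RoaringBitmap so that the example depends only on the standard library.

```rust
use std::collections::{BTreeSet, HashSet};

/// The three user-context variants, as a separate enum so the sketch
/// does not depend on the real `FilterExpr`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UserFilter {
    Unseen,
    Unblocked,
    Follows,
}

/// Hypothetical candidate carrying the creator looked up by the executor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Candidate {
    pub item_id: u32,
    pub creator_id: Option<u64>,
}

/// Hypothetical per-user snapshot taken once per query.
#[derive(Debug, Default)]
pub struct UserSnapshot {
    pub seen: BTreeSet<u32>,
    pub hidden: BTreeSet<u32>,
    pub blocked_creators: HashSet<u64>,
    pub followed_items: BTreeSet<u32>,
}

impl UserSnapshot {
    /// Returns true when the candidate survives the given filter.
    fn passes(&self, filter: UserFilter, c: &Candidate) -> bool {
        match filter {
            UserFilter::Unseen => !self.seen.contains(&c.item_id),
            UserFilter::Unblocked => {
                !self.hidden.contains(&c.item_id)
                    && !c
                        .creator_id
                        .is_some_and(|id| self.blocked_creators.contains(&id))
            }
            UserFilter::Follows => self.followed_items.contains(&c.item_id),
        }
    }
}

/// Keep only candidates that pass every requested filter (logical AND).
pub fn apply_user_filters(
    snapshot: &UserSnapshot,
    filters: &[UserFilter],
    candidates: &[Candidate],
) -> Vec<Candidate> {
    candidates
        .iter()
        .copied()
        .filter(|c| filters.iter().all(|&f| snapshot.passes(f, c)))
        .collect()
}
```

Taking a snapshot per query mirrors the cloning done by unseen_predicate and unblocked_predicate: the filter sees a consistent view even if signal writes land while the query runs.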
Test Strategy
Unit Tests
#[test]
fn mark_seen_and_check() {
let index = UserStateIndex::new();
let user = EntityId::new(1);
let item = EntityId::new(42);
assert!(!index.is_seen(user, item));
index.mark_seen(user, item);
assert!(index.is_seen(user, item));
}
#[test]
fn seen_count_increments() {
let index = UserStateIndex::new();
let user = EntityId::new(1);
assert_eq!(index.seen_count(user), 0);
index.mark_seen(user, EntityId::new(1));
index.mark_seen(user, EntityId::new(2));
index.mark_seen(user, EntityId::new(2)); // duplicate
assert_eq!(index.seen_count(user), 2);
}
#[test]
fn block_and_check() {
let index = UserStateIndex::new();
let user = EntityId::new(1);
let creator = EntityId::new(10);
assert!(!index.is_blocked(user, creator));
index.add_block(user, creator);
assert!(index.is_blocked(user, creator));
}
#[test]
fn hide_and_check() {
let index = UserStateIndex::new();
let user = EntityId::new(1);
let item = EntityId::new(42);
assert!(!index.is_hidden(user, item));
index.add_hide(user, item);
assert!(index.is_hidden(user, item));
}
#[test]
fn follow_and_list() {
let index = UserStateIndex::new();
let user = EntityId::new(1);
index.add_follow(user, EntityId::new(10));
index.add_follow(user, EntityId::new(20));
let creators = index.followed_creators(user);
assert_eq!(creators.len(), 2);
}
#[test]
fn unfollow_removes_creator() {
let index = UserStateIndex::new();
let user = EntityId::new(1);
index.add_follow(user, EntityId::new(10));
index.add_follow(user, EntityId::new(20));
index.remove_follow(user, EntityId::new(10));
let creators = index.followed_creators(user);
assert_eq!(creators.len(), 1);
assert!(creators.contains(&EntityId::new(20)));
}
#[test]
fn unseen_predicate_excludes_seen_items() {
let index = UserStateIndex::new();
let user = EntityId::new(1);
index.mark_seen(user, EntityId::new(5));
index.mark_seen(user, EntityId::new(10));
let pred = index.unseen_predicate(user);
assert!(!pred(5)); // seen -> excluded
assert!(!pred(10)); // seen -> excluded
assert!(pred(15)); // unseen -> included
assert!(pred(1)); // unseen -> included
}
#[test]
fn unseen_predicate_for_unknown_user_includes_all() {
let index = UserStateIndex::new();
let pred = index.unseen_predicate(EntityId::new(999));
assert!(pred(1));
assert!(pred(100));
}
#[test]
fn unblocked_predicate_excludes_blocked_and_hidden() {
let index = UserStateIndex::new();
let user = EntityId::new(1);
index.add_block(user, EntityId::new(77)); // block creator 77
index.add_hide(user, EntityId::new(42)); // hide item 42
let pred = index.unblocked_predicate(user);
// Item from blocked creator -> excluded
assert!(!pred(100, Some(77)));
// Hidden item -> excluded regardless of creator
assert!(!pred(42, Some(1)));
assert!(!pred(42, None));
// Normal item from unblocked creator -> included
assert!(pred(50, Some(10)));
// Item with unknown creator -> included (not blocked)
assert!(pred(50, None));
}
#[test]
fn follows_bitmap_union_of_creator_items() {
let index = UserStateIndex::new();
let creator_items = CreatorItemsBitmap::new();
// Creator 10 has items 100, 101
creator_items.add_item(EntityId::new(10), EntityId::new(100));
creator_items.add_item(EntityId::new(10), EntityId::new(101));
// Creator 20 has items 200, 201
creator_items.add_item(EntityId::new(20), EntityId::new(200));
creator_items.add_item(EntityId::new(20), EntityId::new(201));
// Creator 30 has items 300 (not followed)
creator_items.add_item(EntityId::new(30), EntityId::new(300));
let user = EntityId::new(1);
index.add_follow(user, EntityId::new(10));
index.add_follow(user, EntityId::new(20));
let bitmap = index.follows_bitmap(user, &creator_items);
assert!(bitmap.contains(100));
assert!(bitmap.contains(101));
assert!(bitmap.contains(200));
assert!(bitmap.contains(201));
assert!(!bitmap.contains(300)); // not followed
assert_eq!(bitmap.len(), 4);
}
#[test]
fn different_users_have_independent_state() {
let index = UserStateIndex::new();
let user_a = EntityId::new(1);
let user_b = EntityId::new(2);
index.mark_seen(user_a, EntityId::new(42));
index.add_block(user_b, EntityId::new(77));
assert!(index.is_seen(user_a, EntityId::new(42)));
assert!(!index.is_seen(user_b, EntityId::new(42)));
assert!(!index.is_blocked(user_a, EntityId::new(77)));
assert!(index.is_blocked(user_b, EntityId::new(77)));
}
Property Tests
use proptest::prelude::*;
proptest! {
#[test]
fn seen_items_never_pass_unseen_filter(
user_id in 1u64..100,
seen_items in proptest::collection::vec(1u64..10000, 1..100),
test_item in 1u64..10000,
) {
let index = UserStateIndex::new();
let user = EntityId::new(user_id);
for &item in &seen_items {
index.mark_seen(user, EntityId::new(item));
}
let pred = index.unseen_predicate(user);
if seen_items.contains(&test_item) {
prop_assert!(!pred(test_item),
"seen item {} should be excluded by unseen filter", test_item);
} else {
prop_assert!(pred(test_item),
"unseen item {} should pass unseen filter", test_item);
}
}
#[test]
fn blocked_creators_items_never_pass_unblocked_filter(
user_id in 1u64..100,
blocked_creators in proptest::collection::vec(1u64..100, 1..10),
test_creator in 1u64..100,
test_item in 1u64..10000,
) {
let index = UserStateIndex::new();
let user = EntityId::new(user_id);
for &cid in &blocked_creators {
index.add_block(user, EntityId::new(cid));
}
let pred = index.unblocked_predicate(user);
if blocked_creators.contains(&test_creator) {
prop_assert!(!pred(test_item, Some(test_creator)),
"item from blocked creator {} should be excluded", test_creator);
} else {
prop_assert!(pred(test_item, Some(test_creator)),
"item from unblocked creator {} should pass", test_creator);
}
}
}
Acceptance Criteria
- UserStateIndex with DashMap-backed seen, blocked, and follows structures
- mark_seen / is_seen / seen_count work correctly
- add_block / is_blocked / add_hide / is_hidden work correctly
- add_follow / remove_follow / followed_creators work correctly
- unseen_predicate returns closure excluding all seen items
- unseen_predicate for unknown user includes all items
- unblocked_predicate excludes items from blocked creators AND hidden items
- follows_bitmap returns union of item sets for followed creators
- Different users have fully independent state (no cross-contamination)
- FilterExpr extended with Unseen, Unblocked, Follows variants
- Memory: roaring bitmap for 1M items is < 200KB per user in typical usage
- Property test: seen items NEVER pass unseen filter
- Property test: blocked creators' items NEVER pass unblocked filter
- All unit and property tests pass
- cargo clippy -- -D warnings passes
Research References
- docs/research/ann_for_tidaldb.md -- Roaring bitmap selectivity estimation
- VISION.md -- "unseen" and "unblocked" as first-class filter primitives
Implementation Notes
- The UserStateIndex is stored as a field on TidalDb, allocated during open(). It is Send + Sync because all inner maps are DashMap.
- RoaringBitmap uses u32 keys. For M3 at up to 1M items, u32 is sufficient. If item IDs exceed u32::MAX, the bitmap must be partitioned. This is unlikely before M7 (production hardening). Document the u32 limitation.
- The unblocked_predicate takes an (item_id, Option<creator_id>) pair because the predicate needs to know the creator for each item. The executor must look up creator_id per item when evaluating this filter. In the RETRIEVE executor, creator_id is already available from ScoredCandidate::creator_id (set in m2p3).
- On startup, rebuild_from_relationships scans all Tag::Rel keys in the users keyspace and populates the follows, blocked, and hidden structures. Seen bitmaps are NOT rebuilt from storage on startup for M3 -- they start empty and are populated from signal writes during the session. Full seen-state persistence (checkpoint + restore) is deferred to m3p4 Task 01 where it is implemented properly.
- The CreatorItemsBitmap (from Task 01) must be updated when new items are written. The FollowsBitmap then becomes stale if new items arrive for a followed creator. Two approaches: (a) rebuild follows bitmap on every item write (expensive), (b) rebuild follows bitmap on every query (cached). Recommendation: (b) -- cache the follows bitmap per user with a generation counter that increments on item writes. Invalidate on write, rebuild lazily on query.
- Do NOT implement signal-triggered updates to these bitmaps in this task. That wiring is done in m3p2 (Feedback Loop) where the signal dispatch atomically calls mark_seen, add_hide, add_block as part of the signal write path.
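The generation-counter cache recommended above for the follows bitmap could look roughly like the following. This is a sketch under stated assumptions, not the planned implementation: CreatorItems and FollowsCache are hypothetical stand-ins for CreatorItemsBitmap and the per-user cache, BTreeSet<u32> replaces RoaringBitmap to keep the example standard-library only, and a single global generation is used, so any item write invalidates every user's cached set.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Stand-in for `CreatorItemsBitmap`: creator -> item IDs, plus a
/// generation that increments on every item write.
#[derive(Debug, Default)]
pub struct CreatorItems {
    items: HashMap<u64, BTreeSet<u32>>,
    generation: u64,
}

impl CreatorItems {
    pub fn add_item(&mut self, creator_id: u64, item_id: u32) {
        self.items.entry(creator_id).or_default().insert(item_id);
        self.generation += 1; // makes every cached follows set stale
    }

    pub fn generation(&self) -> u64 {
        self.generation
    }

    /// Union of the item sets of the given creators.
    fn union_for(&self, creators: &HashSet<u64>) -> BTreeSet<u32> {
        creators
            .iter()
            .filter_map(|c| self.items.get(c))
            .flatten()
            .copied()
            .collect()
    }
}

/// Per-user cache: the follows set and the generation it was built at.
#[derive(Debug, Default)]
pub struct FollowsCache {
    entries: HashMap<u64, (u64, BTreeSet<u32>)>,
    /// Number of lazy rebuilds performed (kept for illustration).
    pub rebuilds: u64,
}

impl FollowsCache {
    /// Return the cached set, rebuilding it only if it is missing or was
    /// built at an older generation.
    pub fn get(
        &mut self,
        user_id: u64,
        followed: &HashSet<u64>,
        creator_items: &CreatorItems,
    ) -> &BTreeSet<u32> {
        let current = creator_items.generation();
        let fresh = self
            .entries
            .get(&user_id)
            .is_some_and(|(built_at, _)| *built_at == current);
        if !fresh {
            self.rebuilds += 1;
            let rebuilt = creator_items.union_for(followed);
            self.entries.insert(user_id, (current, rebuilt));
        }
        &self.entries[&user_id].1
    }

    /// Drop a user's entry, e.g. after a follow or unfollow.
    pub fn invalidate(&mut self, user_id: u64) {
        self.entries.remove(&user_id);
    }
}
```

A global counter is the simplest form of the idea. If rebuild churn becomes a problem, a per-creator generation would let a write invalidate only the users who follow that creator, at the cost of more bookkeeping.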