Task 01: FOR USER Query Context
Context
Milestone: 3 -- Personalized Ranking
Phase: m3p3 -- Personalized Ranking Profiles
Depends On: m3p2 (feedback loop: UserStateIndex, InteractionWeightLedger, preference vectors populated), m2p5 (query parser, RETRIEVE executor)
Blocks: Task 02 (Personalized Profiles need UserContext for scoring), Task 03 (Cold Start needs UserContext to detect cold-start state), m3p4 (User State Filters need FOR USER to resolve user state)
Complexity: M
Objective
Deliver the FOR USER @user_id clause in the query parser and the UserContext struct that loads all user state needed for personalized ranking. When a RETRIEVE query includes FOR USER @42, the executor loads user 42's preference vector, interaction weights, followed creators, and blocked state into a UserContext, and passes it to the profile executor for personalized scoring.
The UserContext is the bridge between the query language and the personalization engine. Without it, profiles cannot access user state. With it, the same profile definition can produce different rankings for different users.
Requirements
- `FOR USER @user_id` clause parsed by the query parser
- `UserContext` struct with all user state needed for personalized ranking
- `UserContext::load(user_id, user_state, interaction_weights, pref_reader, now)` loads all state
- `UserContext` contains: preference vector, top-N interaction weights, followed creator set, blocked creator set, hidden item set
- `UserContext::is_cold_start` is true if no preference vector is available
- Query parser produces `Option<EntityId>` for the user_id from `FOR USER`
- RETRIEVE executor passes `UserContext` to `ProfileExecutor` when available
- `ProfileExecutor::score_with_context()` accepts optional `UserContext`
- `SIMILAR TO @item_id` clause parsed for the `related` profile
- Query AST extended with `user_id: Option<EntityId>` and `similar_to: Option<EntityId>`
Technical Design
Module Structure
tidal/src/
  db/
    user_context.rs   -- UserContext struct, load logic
  query/
    mod.rs            -- Extended query AST with user_id, similar_to
UserContext
// === db/user_context.rs ===
use std::collections::HashSet;
use crate::schema::EntityId;
use crate::entities::preference::PreferenceVector;
use crate::entities::interaction::InteractionWeightLedger;
use crate::entities::user_state::UserStateIndex;
/// All user state needed for personalized ranking in a single query.
///
/// Loaded once per query from the user state index, interaction weight
/// ledger, and preference vector storage. Passed to the profile executor
/// for personalized scoring.
///
/// If the user does not exist or has no state, fields are empty/None.
/// The profile executor handles this gracefully by falling back to
/// population-level signals (cold-start path).
#[derive(Debug, Clone)]
pub struct UserContext {
/// The user performing the query.
pub user_id: EntityId,
/// User preference embedding for ANN retrieval and scoring.
/// `None` for cold-start users.
pub preference_vector: Option<PreferenceVector>,
/// Top creators by interaction weight (descending).
/// Used for social proof and creator-affinity scoring.
/// Typically top-50 creators.
pub top_creators: Vec<(EntityId, f64)>,
/// Set of creator IDs the user follows.
/// Used by the `following` profile and exploration budget.
pub followed_creators: HashSet<u64>,
/// Set of creator IDs the user has blocked.
/// Used for hard exclusion filtering.
pub blocked_creators: HashSet<u64>,
/// Set of item IDs the user has hidden.
/// Used for hard exclusion filtering.
pub hidden_items: HashSet<u32>,
/// Whether this user is a cold-start user (no engagement history).
pub is_cold_start: bool,
}
impl UserContext {
/// Load user context from all state sources.
///
/// This is called once per RETRIEVE/SEARCH query that includes
/// `FOR USER @user_id`. It reads from in-memory indexes (fast).
///
/// # Parameters
///
/// - `user_id`: the querying user
/// - `user_state`: the global user state index (seen, blocked, follows)
/// - `interaction_weights`: the interaction weight ledger
/// - `pref_reader`: closure to read the preference vector from storage
/// - `now`: current timestamp for decay computation
pub fn load(
user_id: EntityId,
user_state: &UserStateIndex,
interaction_weights: &InteractionWeightLedger,
pref_reader: &dyn Fn(EntityId) -> Option<PreferenceVector>,
now: crate::schema::Timestamp,
) -> Self {
let preference_vector = pref_reader(user_id);
let is_cold_start = preference_vector.as_ref()
.map_or(true, |p| p.is_cold_start());
let top_creators = interaction_weights.read_top_creators(user_id, 50, now);
let followed = user_state.followed_creators(user_id);
let followed_creators: HashSet<u64> = followed.iter()
.map(|e| e.as_u64())
.collect();
// Read blocked state.
let blocked_creators = user_state.blocked_creator_ids(user_id);
let hidden_items = user_state.hidden_item_ids(user_id);
Self {
user_id,
preference_vector,
top_creators,
followed_creators,
blocked_creators,
hidden_items,
is_cold_start,
}
}
/// Check if a creator is followed by this user.
pub fn is_following(&self, creator_id: EntityId) -> bool {
self.followed_creators.contains(&creator_id.as_u64())
}
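/// Get the interaction weight for a specific creator.
///
/// Only the loaded `top_creators` list is searched, so this returns 0.0
/// both when there is no interaction history and when the creator falls
/// outside the top-N that `load` retained.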
pub fn interaction_weight(&self, creator_id: EntityId) -> f64 {
self.top_creators.iter()
.find(|(c, _)| *c == creator_id)
.map_or(0.0, |(_, w)| *w)
}
}
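The `blocked_creators` and `hidden_items` sets are described above as inputs to hard exclusion filtering. The following is a minimal sketch of what such a pass could look like. The `Candidate` struct and the `apply_hard_exclusions` function are hypothetical names introduced for illustration; the real candidate type and the pipeline stage that applies the exclusion are not specified in this task.

```rust
use std::collections::HashSet;

/// Hypothetical candidate shape (an assumption for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub item_id: u32,
    pub creator_id: u64,
}

/// Drop every candidate whose creator is blocked or whose item is hidden.
/// Each check is a hash-set probe, so the pass is linear in the number
/// of candidates.
pub fn apply_hard_exclusions(
    candidates: Vec<Candidate>,
    blocked_creators: &HashSet<u64>,
    hidden_items: &HashSet<u32>,
) -> Vec<Candidate> {
    candidates
        .into_iter()
        .filter(|c| {
            !blocked_creators.contains(&c.creator_id)
                && !hidden_items.contains(&c.item_id)
        })
        .collect()
}
```

Keeping both sets on `UserContext` means the filter needs no further lookups into `UserStateIndex` once the context has been loaded.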
Query Parser Extension
// Extensions to the query AST in query/mod.rs
/// Parsed RETRIEVE query.
#[derive(Debug, Clone)]
pub struct RetrieveQuery {
/// Entity type to retrieve (always "items" for now).
pub entity_type: String,
/// Optional user context: `FOR USER @user_id`
pub user_id: Option<EntityId>,
/// Optional source item for related queries: `SIMILAR TO @item_id`
pub similar_to: Option<EntityId>,
/// Ranking profile name: `USING PROFILE <name>`
pub profile: Option<String>,
/// Filter expressions: `FILTER <conditions>`
pub filters: Vec<FilterExpr>,
/// Diversity constraints: `DIVERSITY <constraints>`
pub diversity: Option<DiversityConstraints>,
/// Result limit: `LIMIT <n>`
pub limit: Option<usize>,
/// Excluded IDs: `EXCLUDE [ids]`
pub excludes: Vec<EntityId>,
}
/// Parse the `FOR USER @<id>` clause.
///
/// Returns `Some(EntityId)` if the clause is present.
fn parse_for_user(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
    // Look for "FOR" "USER" "@<numeric_id>" (three tokens).
    // The highest index read is `*pos + 2`, so that is the bound to check;
    // requiring `*pos + 3` would reject a clause at the end of the query.
    if *pos + 2 < tokens.len()
        && tokens[*pos].is_keyword("FOR")
        && tokens[*pos + 1].is_keyword("USER")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}
/// Parse the `SIMILAR TO @<id>` clause.
///
/// Returns `Some(EntityId)` if the clause is present.
fn parse_similar_to(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
    // Same three-token shape and bounds check as `parse_for_user`.
    if *pos + 2 < tokens.len()
        && tokens[*pos].is_keyword("SIMILAR")
        && tokens[*pos + 1].is_keyword("TO")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}
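To make the clause shape concrete, here is a self-contained sketch that parses both clauses from a whitespace-tokenized query, in either order, including the case where a clause is the last thing in the query. It is not the project's parser: `ParsedIds`, `parse_at_clause`, and `parse_ids` are hypothetical names, plain `&str` tokens stand in for the real `Token` type, and keywords are matched exactly (the behavior of `is_keyword` is not shown in this document).

```rust
/// Hypothetical result type for this sketch; the real AST is `RetrieveQuery`.
#[derive(Debug, Default, PartialEq)]
pub struct ParsedIds {
    pub user_id: Option<u64>,
    pub similar_to: Option<u64>,
}

/// Match `<kw1> <kw2> @<id>` starting at `pos`. On success, advance `pos`
/// past the three tokens; otherwise leave `pos` untouched.
fn parse_at_clause(tokens: &[&str], pos: &mut usize, kw1: &str, kw2: &str) -> Option<u64> {
    // Three tokens are read, so the last index touched is `*pos + 2`.
    if *pos + 2 >= tokens.len() {
        return None;
    }
    if tokens[*pos] != kw1 || tokens[*pos + 1] != kw2 {
        return None;
    }
    let id = tokens[*pos + 2].strip_prefix('@')?.parse::<u64>().ok()?;
    *pos += 3;
    Some(id)
}

/// Scan the whole token stream so the two clauses may appear in either order.
pub fn parse_ids(query: &str) -> ParsedIds {
    let tokens: Vec<&str> = query.split_whitespace().collect();
    let mut out = ParsedIds::default();
    let mut pos = 0;
    while pos < tokens.len() {
        if let Some(id) = parse_at_clause(&tokens, &mut pos, "FOR", "USER") {
            out.user_id = Some(id);
        } else if let Some(id) = parse_at_clause(&tokens, &mut pos, "SIMILAR", "TO") {
            out.similar_to = Some(id);
        } else {
            pos += 1; // not one of the two clauses; skip this token
        }
    }
    out
}
```

In this sketch a malformed id such as `@abc` simply yields `None` for that clause. Whether the real parser should report an error instead is a design decision this task leaves open.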
Executor Integration
// In the RETRIEVE executor pipeline:
impl TidalDb {
    pub fn retrieve(&self, query: &str) -> crate::Result<Vec<ScoredCandidate>> {
        let parsed = parse_retrieve(query)?;
        // Capture the timestamp once so context loading and scoring
        // both see the same `now`.
        let now = Timestamp::now();
        // Load user context if FOR USER is present.
        let user_context = parsed.user_id.map(|uid| {
            UserContext::load(
                uid,
                &self.user_state,
                &self.interaction_weights,
                &|id| self.read_user_preference(id).ok().flatten(),
                now,
            )
        });
        // ... candidate retrieval, filtering, scoring ...
        // Pass user_context to the profile executor.
        let scored = match &user_context {
            Some(ctx) => executor.score_with_context(candidates, profile, now, ctx),
            None => executor.score(candidates, profile, now),
        };
        // ...
    }
}
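The dispatch above can be illustrated with a toy, self-contained sketch in which the same scoring routine produces a different order once a context is supplied. Everything here is a stand-in: `MiniContext`, `Scored`, `score_all`, and `retrieve_ids` are hypothetical names, and the fixed `+1.0` boost for followed creators is a placeholder, not the personalized scoring that Task 02 will define.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `UserContext`, with only the field used here.
pub struct MiniContext {
    pub followed_creators: HashSet<u64>,
}

/// Hypothetical stand-in for a scored candidate.
#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub item_id: u32,
    pub creator_id: u64,
    pub score: f64,
}

/// Placeholder scoring: keep the base score, and add a fixed boost for
/// followed creators when a context is present.
pub fn score_all(mut items: Vec<Scored>, ctx: Option<&MiniContext>) -> Vec<Scored> {
    if let Some(ctx) = ctx {
        for item in items.iter_mut() {
            if ctx.followed_creators.contains(&item.creator_id) {
                item.score += 1.0;
            }
        }
    }
    // Sort descending by score; `total_cmp` gives a total order over f64.
    items.sort_by(|a, b| b.score.total_cmp(&a.score));
    items
}

/// Run the toy pipeline and return item ids in ranked order.
pub fn retrieve_ids(user: Option<&MiniContext>) -> Vec<u32> {
    let candidates = vec![
        Scored { item_id: 1, creator_id: 10, score: 0.9 },
        Scored { item_id: 2, creator_id: 20, score: 0.5 },
    ];
    score_all(candidates, user)
        .into_iter()
        .map(|s| s.item_id)
        .collect()
}
```

Without a context, item 1 ranks first on its base score. With a context that follows creator 20, item 2 moves ahead. The sketch folds both paths into one function taking `Option<&MiniContext>`, whereas the executor above keeps separate `score` and `score_with_context` entry points; either shape carries the same information.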
UserStateIndex Extensions
// Extensions needed on UserStateIndex (from m3p1 Task 03):
impl UserStateIndex {
/// Get the set of blocked creator IDs for a user.
pub fn blocked_creator_ids(&self, user_id: EntityId) -> HashSet<u64> {
self.blocked
.get(&user_id.as_u64())
.map_or_else(HashSet::new, |s| s.blocked_creators.clone())
}
/// Get the set of hidden item IDs for a user (as u32 values).
pub fn hidden_item_ids(&self, user_id: EntityId) -> HashSet<u32> {
self.blocked
.get(&user_id.as_u64())
.map_or_else(HashSet::new, |s| {
s.hidden_items.iter().collect()
})
}
}
Test Strategy
Unit Tests
#[test]
fn user_context_loads_empty_for_unknown_user() {
let user_state = UserStateIndex::new();
let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
let ctx = UserContext::load(
EntityId::new(999),
&user_state,
&iw_ledger,
&|_| None,
Timestamp::now(),
);
assert!(ctx.is_cold_start);
assert!(ctx.preference_vector.is_none());
assert!(ctx.followed_creators.is_empty());
assert!(ctx.blocked_creators.is_empty());
assert!(ctx.top_creators.is_empty());
}
#[test]
fn user_context_loads_follows_and_blocks() {
let user_state = UserStateIndex::new();
let user = EntityId::new(1);
user_state.add_follow(user, EntityId::new(10));
user_state.add_follow(user, EntityId::new(20));
user_state.add_block(user, EntityId::new(77));
let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
let ctx = UserContext::load(
user, &user_state, &iw_ledger, &|_| None, Timestamp::now(),
);
assert_eq!(ctx.followed_creators.len(), 2);
assert!(ctx.followed_creators.contains(&10));
assert!(ctx.followed_creators.contains(&20));
assert!(ctx.blocked_creators.contains(&77));
}
#[test]
fn user_context_loads_interaction_weights() {
let user_state = UserStateIndex::new();
let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
let user = EntityId::new(1);
let ts = Timestamp::now();
iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);
iw_ledger.update_weight(user, EntityId::new(20), 3.0, ts);
let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);
assert_eq!(ctx.top_creators.len(), 2);
assert_eq!(ctx.top_creators[0].0, EntityId::new(10)); // highest weight
}
#[test]
fn user_context_detects_cold_start() {
let user_state = UserStateIndex::new();
let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
// No preference vector -> cold start.
let ctx = UserContext::load(
EntityId::new(1), &user_state, &iw_ledger, &|_| None, Timestamp::now(),
);
assert!(ctx.is_cold_start);
// With preference vector -> not cold start.
let pref = PreferenceVector::from_embedding(vec![0.1; 16], 16).unwrap();
let ctx2 = UserContext::load(
EntityId::new(1), &user_state, &iw_ledger,
&|_| Some(pref.clone()), Timestamp::now(),
);
assert!(!ctx2.is_cold_start);
}
#[test]
fn user_context_is_following() {
let user_state = UserStateIndex::new();
let user = EntityId::new(1);
user_state.add_follow(user, EntityId::new(10));
let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());
assert!(ctx.is_following(EntityId::new(10)));
assert!(!ctx.is_following(EntityId::new(20)));
}
#[test]
fn user_context_interaction_weight_lookup() {
let user_state = UserStateIndex::new();
let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
let user = EntityId::new(1);
let ts = Timestamp::now();
iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);
let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);
assert!(ctx.interaction_weight(EntityId::new(10)) > 4.0);
assert!((ctx.interaction_weight(EntityId::new(99)) - 0.0).abs() < f64::EPSILON);
}
#[test]
fn parse_for_user_clause() {
// This test depends on the actual parser implementation.
// The expected behavior:
let query = "RETRIEVE items FOR USER @42 USING PROFILE for_you LIMIT 50";
let parsed = parse_retrieve(query).unwrap();
assert_eq!(parsed.user_id, Some(EntityId::new(42)));
}
#[test]
fn parse_similar_to_clause() {
let query = "RETRIEVE items SIMILAR TO @100 FOR USER @42 USING PROFILE related LIMIT 10";
let parsed = parse_retrieve(query).unwrap();
assert_eq!(parsed.user_id, Some(EntityId::new(42)));
assert_eq!(parsed.similar_to, Some(EntityId::new(100)));
}
#[test]
fn parse_without_for_user() {
let query = "RETRIEVE items USING PROFILE trending LIMIT 25";
let parsed = parse_retrieve(query).unwrap();
assert!(parsed.user_id.is_none());
}
Property Tests
use proptest::prelude::*;
proptest! {
#[test]
fn user_context_follows_set_matches_user_state(
follow_ids in proptest::collection::hash_set(1u64..100, 0..20),
) {
let user_state = UserStateIndex::new();
let user = EntityId::new(1);
for &cid in &follow_ids {
user_state.add_follow(user, EntityId::new(cid));
}
let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());
prop_assert_eq!(ctx.followed_creators.len(), follow_ids.len());
for &cid in &follow_ids {
prop_assert!(ctx.followed_creators.contains(&cid),
"followed creator {} not in context", cid);
}
}
}
Acceptance Criteria
- `UserContext` struct with preference_vector, top_creators, followed/blocked/hidden sets
- `UserContext::load()` populates all fields from user state index and interaction weights
- `UserContext::is_cold_start` is true when no preference vector is available
- `UserContext::is_following()` checks followed creator set
- `UserContext::interaction_weight()` looks up decayed weight for creator
- `FOR USER @user_id` clause parsed by query parser
- `SIMILAR TO @item_id` clause parsed by query parser
- Query AST extended with `user_id: Option<EntityId>` and `similar_to: Option<EntityId>`
- RETRIEVE executor loads `UserContext` when `FOR USER` is present
- `UserStateIndex` extended with `blocked_creator_ids()` and `hidden_item_ids()` accessors
- Parsing gracefully handles missing `FOR USER` (returns `None`)
- Property test: follows set in context matches user state index
- `cargo clippy -- -D warnings` passes
- All tests pass
Research References
- VISION.md -- `FOR USER` clause in query language
- USE_CASES.md -- All personalized surfaces require user context
- ai-lookup/features/query-language.md -- Query language reference
Implementation Notes
- `UserContext::load` reads from in-memory data structures only (no storage I/O except for the preference vector). The user state index, interaction weights, and follows/blocks sets are all in memory. This ensures loading is fast (< 1ms).
- The preference vector is the only component that requires a storage read. The `pref_reader` closure abstracts this, allowing tests to inject mock preference vectors without storage setup.
- The `top_creators` field is limited to 50 entries. At M3 scale (200 creators per user), scanning all interaction weights for a user is fast. At larger scale, a sorted index may be needed.
- The `hidden_item_ids` accessor returns `HashSet<u32>` because `RoaringBitmap` uses `u32` keys. This matches the `UserStateIndex` implementation from m3p1 Task 03.
- The `SIMILAR TO` clause parser should be flexible in ordering: `RETRIEVE items SIMILAR TO @100 FOR USER @42 ...` and `RETRIEVE items FOR USER @42 SIMILAR TO @100 ...` should both work.
- Do NOT implement the personalized scoring logic in this task. This task delivers the context loading and query parsing. The scoring is done in Task 02.
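The note on `top_creators` describes a full scan of one user's interaction weights followed by a cut to 50 entries. A rough sketch of that scan is shown below. The details are assumptions made for illustration: `WeightEntry` is a hypothetical storage layout, and exponential half-life decay is only one plausible decay model. The actual `InteractionWeightLedger` and its decay function come from m3p2 and are not reproduced in this document.

```rust
/// Hypothetical per-creator record for one user (an assumed layout).
#[derive(Debug, Clone, Copy)]
pub struct WeightEntry {
    pub creator_id: u64,
    pub weight: f64,
    pub updated_at_secs: u64,
}

/// Assumed decay model: the weight halves every `half_life_secs`.
pub fn decayed(entry: &WeightEntry, now_secs: u64, half_life_secs: f64) -> f64 {
    let age = now_secs.saturating_sub(entry.updated_at_secs) as f64;
    entry.weight * 0.5f64.powf(age / half_life_secs)
}

/// Full scan: decay every entry, sort descending, keep the first `n`.
pub fn top_creators(
    entries: &[WeightEntry],
    n: usize,
    now_secs: u64,
    half_life_secs: f64,
) -> Vec<(u64, f64)> {
    let mut scored: Vec<(u64, f64)> = entries
        .iter()
        .map(|e| (e.creator_id, decayed(e, now_secs, half_life_secs)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(n);
    scored
}
```

With a few hundred entries per user, the scan and sort are cheap. Because decay depends on `now`, the ordering can change between queries even without new interactions, which is one reason a precomputed sorted index would need care at larger scale.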