# Task 01: FOR USER Query Context

## Context

**Milestone:** 3 -- Personalized Ranking
**Phase:** m3p3 -- Personalized Ranking Profiles
**Depends On:** m3p2 (feedback loop: `UserStateIndex`, `InteractionWeightLedger`, preference vectors populated), m2p5 (query parser, RETRIEVE executor)
**Blocks:** Task 02 (Personalized Profiles need `UserContext` for scoring), Task 03 (Cold Start needs `UserContext` to detect cold-start state), m3p4 (User State Filters need `FOR USER` to resolve user state)
**Complexity:** M

## Objective

Deliver the `FOR USER @user_id` clause in the query parser and the `UserContext` struct that loads all user state needed for personalized ranking. When a RETRIEVE query includes `FOR USER @42`, the executor loads user 42's preference vector, interaction weights, followed creators, and blocked state into a `UserContext`, and passes it to the profile executor for personalized scoring.

The `UserContext` is the bridge between the query language and the personalization engine. Without it, profiles cannot access user state. With it, the same profile definition can produce different rankings for different users.
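To make the bridging role concrete, here is a toy sketch of one ranking function that orders the same candidates differently depending on whether a user context is supplied. Everything in it is hypothetical: `MiniContext`, `Candidate`, `rank`, and the additive affinity bonus are illustrations only, not the scoring design. The real scoring logic belongs to Task 02.

```rust
use std::collections::HashMap;

/// Hypothetical, stripped-down stand-in for `UserContext`.
struct MiniContext {
    /// Creator id -> interaction weight for this user.
    creator_affinity: HashMap<u64, f64>,
}

/// Hypothetical candidate item carrying a population-level score.
struct Candidate {
    item_id: u64,
    creator_id: u64,
    base_score: f64,
}

/// Rank candidates by score, highest first.
/// With a context, the user's creator affinity is added to the base score
/// (an assumed formula, chosen only to keep the example small).
fn rank(candidates: &[Candidate], ctx: Option<&MiniContext>) -> Vec<u64> {
    let mut scored: Vec<(u64, f64)> = candidates
        .iter()
        .map(|c| {
            let bonus = ctx
                .and_then(|u| u.creator_affinity.get(&c.creator_id).copied())
                .unwrap_or(0.0);
            (c.item_id, c.base_score + bonus)
        })
        .collect();
    // `total_cmp` gives a total order over f64, so the sort cannot panic on NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(id, _)| id).collect()
}
```

Calling `rank(&items, None)` corresponds to a query without `FOR USER`, while `rank(&items, Some(&ctx))` corresponds to the personalized path; the profile (here, the function) is the same in both cases.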
## Requirements

- `FOR USER @user_id` clause parsed by the query parser
- `UserContext` struct with all user state needed for personalized ranking
- `UserContext::load(user_id, user_state, interaction_weights, pref_reader, now)` loads all state
- `UserContext` contains: preference vector, top-N interaction weights, followed creator set, blocked creator set, hidden item set
- `UserContext::is_cold_start` is true if there is no preference vector (or the vector itself reports cold start)
- Query parser produces `Option<EntityId>` for the user_id from `FOR USER`
- RETRIEVE executor passes `UserContext` to `ProfileExecutor` when available
- `ProfileExecutor::score_with_context()` accepts optional `UserContext`
- `SIMILAR TO @item_id` clause parsed for the `related` profile
- Query AST extended with `user_id: Option<EntityId>` and `similar_to: Option<EntityId>`

## Technical Design

### Module Structure

```
tidal/src/
  db/
    user_context.rs   -- UserContext struct, load logic
  query/
    mod.rs            -- Extended query AST with user_id, similar_to
```

### UserContext

```rust
// === db/user_context.rs ===

use std::collections::HashSet;

use crate::schema::EntityId;
use crate::entities::preference::PreferenceVector;
use crate::entities::interaction::InteractionWeightLedger;
use crate::entities::user_state::UserStateIndex;

/// All user state needed for personalized ranking in a single query.
///
/// Loaded once per query from the user state index, interaction weight
/// ledger, and preference vector storage. Passed to the profile executor
/// for personalized scoring.
///
/// If the user does not exist or has no state, fields are empty/None.
/// The profile executor handles this gracefully by falling back to
/// population-level signals (cold-start path).
#[derive(Debug, Clone)]
pub struct UserContext {
    /// The user performing the query.
    pub user_id: EntityId,

    /// User preference embedding for ANN retrieval and scoring.
    /// `None` for cold-start users.
    pub preference_vector: Option<PreferenceVector>,

    /// Top creators by interaction weight (descending).
    /// Used for social proof and creator-affinity scoring.
    /// Typically top-50 creators.
    pub top_creators: Vec<(EntityId, f64)>,

    /// Set of creator IDs the user follows.
    /// Used by the `following` profile and exploration budget.
    pub followed_creators: HashSet<u64>,

    /// Set of creator IDs the user has blocked.
    /// Used for hard exclusion filtering.
    pub blocked_creators: HashSet<u64>,

    /// Set of item IDs the user has hidden.
    /// Used for hard exclusion filtering.
    pub hidden_items: HashSet<u32>,

    /// Whether this user is a cold-start user (no engagement history).
    pub is_cold_start: bool,
}

impl UserContext {
    /// Load user context from all state sources.
    ///
    /// This is called once per RETRIEVE/SEARCH query that includes
    /// `FOR USER @user_id`. It reads from in-memory indexes (fast).
    ///
    /// # Parameters
    ///
    /// - `user_id`: the querying user
    /// - `user_state`: the global user state index (seen, blocked, follows)
    /// - `interaction_weights`: the interaction weight ledger
    /// - `pref_reader`: closure to read the preference vector from storage
    /// - `now`: current timestamp for decay computation
    pub fn load(
        user_id: EntityId,
        user_state: &UserStateIndex,
        interaction_weights: &InteractionWeightLedger,
        pref_reader: &dyn Fn(EntityId) -> Option<PreferenceVector>,
        now: crate::schema::Timestamp,
    ) -> Self {
        let preference_vector = pref_reader(user_id);
        // Cold start: no vector at all, or a vector that reports cold start itself.
        let is_cold_start = preference_vector.as_ref()
            .map_or(true, |p| p.is_cold_start());

        let top_creators = interaction_weights.read_top_creators(user_id, 50, now);

        let followed = user_state.followed_creators(user_id);
        let followed_creators: HashSet<u64> = followed.iter()
            .map(|e| e.as_u64())
            .collect();

        // Read blocked state.
        let blocked_creators = user_state.blocked_creator_ids(user_id);
        let hidden_items = user_state.hidden_item_ids(user_id);

        Self {
            user_id,
            preference_vector,
            top_creators,
            followed_creators,
            blocked_creators,
            hidden_items,
            is_cold_start,
        }
    }

    /// Check if a creator is followed by this user.
    pub fn is_following(&self, creator_id: EntityId) -> bool {
        self.followed_creators.contains(&creator_id.as_u64())
    }

    /// Get the interaction weight for a specific creator.
    ///
    /// Returns 0.0 if the creator is not among the loaded top creators
    /// (no interaction history, or outside the top 50).
    pub fn interaction_weight(&self, creator_id: EntityId) -> f64 {
        self.top_creators.iter()
            .find(|(c, _)| *c == creator_id)
            .map_or(0.0, |(_, w)| *w)
    }
}
```

### Query Parser Extension

```rust
// Extensions to the query AST in query/mod.rs

/// Parsed RETRIEVE query.
#[derive(Debug, Clone)]
pub struct RetrieveQuery {
    /// Entity type to retrieve (always "items" for now).
    pub entity_type: String,
    /// Optional user context: `FOR USER @user_id`
    pub user_id: Option<EntityId>,
    /// Optional source item for related queries: `SIMILAR TO @item_id`
    pub similar_to: Option<EntityId>,
    /// Ranking profile name: `USING PROFILE `
    pub profile: Option,
    /// Filter expressions: `FILTER `
    pub filters: Vec,
    /// Diversity constraints: `DIVERSITY `
    pub diversity: Option,
    /// Result limit: `LIMIT `
    pub limit: Option,
    /// Excluded IDs: `EXCLUDE [ids]`
    pub excludes: Vec,
}

/// Parse the `FOR USER @user_id` clause.
///
/// Returns `Some(EntityId)` if the clause is present.
/// `pos` is advanced only on a successful match.
fn parse_for_user(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
    // Look for three tokens: "FOR" "USER" "@id".
    // `<=` so that the clause is also recognized at the very end of the query.
    if *pos + 3 <= tokens.len()
        && tokens[*pos].is_keyword("FOR")
        && tokens[*pos + 1].is_keyword("USER")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}

/// Parse the `SIMILAR TO @item_id` clause.
///
/// Returns `Some(EntityId)` if the clause is present.
fn parse_similar_to(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
    // Three tokens: "SIMILAR" "TO" "@id"; `<=` allows the clause at end of input.
    if *pos + 3 <= tokens.len()
        && tokens[*pos].is_keyword("SIMILAR")
        && tokens[*pos + 1].is_keyword("TO")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}
```

### Executor Integration

```rust
// In the RETRIEVE executor pipeline:

impl TidalDb {
    pub fn retrieve(&self, query: &str) -> crate::Result> {
        let parsed = parse_retrieve(query)?;

        // One timestamp shared by context loading and scoring.
        let now = Timestamp::now();

        // Load user context if FOR USER is present.
        let user_context = parsed.user_id.map(|uid| {
            UserContext::load(
                uid,
                &self.user_state,
                &self.interaction_weights,
                &|id| self.read_user_preference(id).ok().flatten(),
                now,
            )
        });

        // ... candidate retrieval, filtering, scoring ...

        // Pass user_context to the profile executor.
        let scored = match &user_context {
            Some(ctx) => executor.score_with_context(candidates, profile, now, ctx),
            None => executor.score(candidates, profile, now),
        };

        // ...
    }
}
```

### UserStateIndex Extensions

```rust
// Extensions needed on UserStateIndex (from m3p1 Task 03):

impl UserStateIndex {
    /// Get the set of blocked creator IDs for a user.
    pub fn blocked_creator_ids(&self, user_id: EntityId) -> HashSet<u64> {
        self.blocked
            .get(&user_id.as_u64())
            .map_or_else(HashSet::new, |s| s.blocked_creators.clone())
    }

    /// Get the set of hidden item IDs for a user (as u32 values).
    pub fn hidden_item_ids(&self, user_id: EntityId) -> HashSet<u32> {
        self.blocked
            .get(&user_id.as_u64())
            .map_or_else(HashSet::new, |s| {
                s.hidden_items.iter().collect()
            })
    }
}
```

## Test Strategy

### Unit Tests

```rust
#[test]
fn user_context_loads_empty_for_unknown_user() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());

    let ctx = UserContext::load(
        EntityId::new(999),
        &user_state,
        &iw_ledger,
        &|_| None,
        Timestamp::now(),
    );

    assert!(ctx.is_cold_start);
    assert!(ctx.preference_vector.is_none());
    assert!(ctx.followed_creators.is_empty());
    assert!(ctx.blocked_creators.is_empty());
    assert!(ctx.top_creators.is_empty());
}

#[test]
fn user_context_loads_follows_and_blocks() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);
    user_state.add_follow(user, EntityId::new(10));
    user_state.add_follow(user, EntityId::new(20));
    user_state.add_block(user, EntityId::new(77));

    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());

    let ctx = UserContext::load(
        user,
        &user_state,
        &iw_ledger,
        &|_| None,
        Timestamp::now(),
    );

    assert_eq!(ctx.followed_creators.len(), 2);
    assert!(ctx.followed_creators.contains(&10));
    assert!(ctx.followed_creators.contains(&20));
    assert!(ctx.blocked_creators.contains(&77));
}

#[test]
fn user_context_loads_interaction_weights() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let user = EntityId::new(1);
    let ts = Timestamp::now();

    iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);
    iw_ledger.update_weight(user, EntityId::new(20), 3.0, ts);

    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);

    assert_eq!(ctx.top_creators.len(), 2);
    assert_eq!(ctx.top_creators[0].0, EntityId::new(10)); // highest weight
}

#[test]
fn user_context_detects_cold_start() {
    let user_state = UserStateIndex::new();
    let iw_ledger =
        InteractionWeightLedger::new(InteractionWeightConfig::default());

    // No preference vector -> cold start.
    let ctx = UserContext::load(
        EntityId::new(1),
        &user_state,
        &iw_ledger,
        &|_| None,
        Timestamp::now(),
    );
    assert!(ctx.is_cold_start);

    // With preference vector -> not cold start.
    let pref = PreferenceVector::from_embedding(vec![0.1; 16], 16).unwrap();
    let ctx2 = UserContext::load(
        EntityId::new(1),
        &user_state,
        &iw_ledger,
        &|_| Some(pref.clone()),
        Timestamp::now(),
    );
    assert!(!ctx2.is_cold_start);
}

#[test]
fn user_context_is_following() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);
    user_state.add_follow(user, EntityId::new(10));

    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());

    assert!(ctx.is_following(EntityId::new(10)));
    assert!(!ctx.is_following(EntityId::new(20)));
}

#[test]
fn user_context_interaction_weight_lookup() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let user = EntityId::new(1);
    let ts = Timestamp::now();

    iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);

    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);

    assert!(ctx.interaction_weight(EntityId::new(10)) > 4.0);
    assert!((ctx.interaction_weight(EntityId::new(99)) - 0.0).abs() < f64::EPSILON);
}

#[test]
fn parse_for_user_clause() {
    // This test depends on the actual parser implementation.
    // The expected behavior:
    let query = "RETRIEVE items FOR USER @42 USING PROFILE for_you LIMIT 50";
    let parsed = parse_retrieve(query).unwrap();
    assert_eq!(parsed.user_id, Some(EntityId::new(42)));
}

#[test]
fn parse_similar_to_clause() {
    let query = "RETRIEVE items SIMILAR TO @100 FOR USER @42 USING PROFILE related LIMIT 10";
    let parsed = parse_retrieve(query).unwrap();
    assert_eq!(parsed.user_id, Some(EntityId::new(42)));
    assert_eq!(parsed.similar_to, Some(EntityId::new(100)));
}

#[test]
fn parse_without_for_user() {
    let query = "RETRIEVE items USING PROFILE trending LIMIT 25";
    let parsed = parse_retrieve(query).unwrap();
    assert!(parsed.user_id.is_none());
}
```

### Property Tests

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn user_context_follows_set_matches_user_state(
        follow_ids in proptest::collection::hash_set(1u64..100, 0..20),
    ) {
        let user_state = UserStateIndex::new();
        let user = EntityId::new(1);

        for &cid in &follow_ids {
            user_state.add_follow(user, EntityId::new(cid));
        }

        let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
        let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());

        prop_assert_eq!(ctx.followed_creators.len(), follow_ids.len());
        for &cid in &follow_ids {
            prop_assert!(ctx.followed_creators.contains(&cid),
                "followed creator {} not in context", cid);
        }
    }
}
```

## Acceptance Criteria

- [ ] `UserContext` struct with preference_vector, top_creators, followed/blocked/hidden sets
- [ ] `UserContext::load()` populates all fields from user state index and interaction weights
- [ ] `UserContext::is_cold_start` is true when there is no preference vector
- [ ] `UserContext::is_following()` checks followed creator set
- [ ] `UserContext::interaction_weight()` looks up decayed weight for creator
- [ ] `FOR USER @user_id` clause parsed by query parser
- [ ] `SIMILAR TO @item_id` clause parsed by query parser
- [ ] Query AST extended with `user_id: Option<EntityId>` and `similar_to: Option<EntityId>`
- [ ] RETRIEVE executor loads `UserContext` when `FOR USER` is present
- [ ] `UserStateIndex` extended with `blocked_creator_ids()` and `hidden_item_ids()` accessors
- [ ] Parsing gracefully handles missing `FOR USER` (returns `None`)
- [ ] Property test: follows set in context matches user state index
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All tests pass

## Research References

- [VISION.md](../../../../VISION.md) -- `FOR USER` clause in query language
- [USE_CASES.md](../../../../USE_CASES.md) -- All personalized surfaces require user context
- [ai-lookup/features/query-language.md](../../../../ai-lookup/features/query-language.md) -- Query language reference

## Implementation Notes

- `UserContext::load` reads from in-memory data structures only (no storage I/O except for the preference vector). The user state index, interaction weights, and follows/blocks sets are all in memory. This keeps loading fast (< 1ms).
- The preference vector is the only component that requires a storage read. The `pref_reader` closure abstracts this, allowing tests to inject mock preference vectors without storage setup.
- The `top_creators` field is limited to 50 entries. At M3 scale (200 creators per user), scanning all interaction weights for a user is fast. At larger scale, a sorted index may be needed.
- The `hidden_item_ids` accessor returns `HashSet<u32>` because `RoaringBitmap` uses `u32` keys. This matches the `UserStateIndex` implementation from m3p1 Task 03.
- The `SIMILAR TO` clause parser should be flexible in ordering: `RETRIEVE items SIMILAR TO @100 FOR USER @42 ...` and `RETRIEVE items FOR USER @42 SIMILAR TO @100 ...` should both work.
- Do NOT implement the personalized scoring logic in this task. This task delivers the context loading and query parsing. The scoring is done in Task 02.
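
The ordering note above can be sketched as a clause loop that tries each optional clause at the current position and stops when none matches. This is a minimal, self-contained sketch under stated assumptions, not the actual parser: `Tok`, `Clauses`, `tokenize`, `two_word_clause`, and `parse_optional_clauses` are hypothetical stand-ins, IDs are plain `u64` rather than `EntityId`, and rejecting a repeated clause is an assumed policy that the spec does not state.

```rust
/// Hypothetical token type standing in for the parser's real `Token`.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String), // keyword or bare identifier
    At(u64),      // `@id` reference
}

/// Optional clauses collected so far (u64 stands in for `EntityId`).
#[derive(Debug, Default, PartialEq)]
struct Clauses {
    user_id: Option<u64>,
    similar_to: Option<u64>,
}

/// Whitespace tokenizer, just enough for this sketch.
fn tokenize(query: &str) -> Vec<Tok> {
    query
        .split_whitespace()
        .map(|w| match w.strip_prefix('@').and_then(|id| id.parse::<u64>().ok()) {
            Some(id) => Tok::At(id),
            None => Tok::Word(w.to_string()),
        })
        .collect()
}

/// Match `kw1 kw2 @id` at `pos`. `slice::get` with a range returns `None`
/// when fewer than three tokens remain, so no manual bounds check is needed.
fn two_word_clause(tokens: &[Tok], pos: usize, kw1: &str, kw2: &str) -> Option<u64> {
    match tokens.get(pos..pos + 3)? {
        [Tok::Word(a), Tok::Word(b), Tok::At(id)]
            if a.eq_ignore_ascii_case(kw1) && b.eq_ignore_ascii_case(kw2) =>
        {
            Some(*id)
        }
        _ => None,
    }
}

/// Consume `FOR USER @id` and `SIMILAR TO @id` in either order, starting at `pos`.
/// Returns an error if the same clause appears twice (assumed policy).
fn parse_optional_clauses(tokens: &[Tok], pos: &mut usize) -> Result<Clauses, String> {
    let mut out = Clauses::default();
    loop {
        if let Some(id) = two_word_clause(tokens, *pos, "FOR", "USER") {
            if out.user_id.replace(id).is_some() {
                return Err("duplicate FOR USER clause".to_string());
            }
        } else if let Some(id) = two_word_clause(tokens, *pos, "SIMILAR", "TO") {
            if out.similar_to.replace(id).is_some() {
                return Err("duplicate SIMILAR TO clause".to_string());
            }
        } else {
            // The next token belongs to some other clause (e.g. USING PROFILE).
            return Ok(out);
        }
        *pos += 3;
    }
}
```

Because the loop retries both clause matchers after every successful match, `SIMILAR TO @100 FOR USER @42` and `FOR USER @42 SIMILAR TO @100` yield the same `Clauses` value and leave `pos` at the same offset. A strictly sequential call to one clause parser followed by the other would accept only one of the two orderings.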