tidaldb/docs/planning/milestone-3/phase-3/task-01-for-user-query-context.md
jordan 39ada28c6e feat: complete Milestones 2–4 — RETRIEVE query, vector index, ranking profiles, diversity, entity system, sessions
2026-02-21 16:24:48 -07:00

# Task 01: FOR USER Query Context
## Context
**Milestone:** 3 -- Personalized Ranking
**Phase:** m3p3 -- Personalized Ranking Profiles
**Depends On:** m3p2 (feedback loop: `UserStateIndex`, `InteractionWeightLedger`, preference vectors populated), m2p5 (query parser, RETRIEVE executor)
**Blocks:** Task 02 (Personalized Profiles need `UserContext` for scoring), Task 03 (Cold Start needs `UserContext` to detect cold-start state), m3p4 (User State Filters need `FOR USER` to resolve user state)
**Complexity:** M
## Objective
Deliver the `FOR USER @user_id` clause in the query parser and the `UserContext` struct that loads all user state needed for personalized ranking. When a RETRIEVE query includes `FOR USER @42`, the executor loads user 42's preference vector, interaction weights, followed creators, and blocked state into a `UserContext`, and passes it to the profile executor for personalized scoring.
The `UserContext` is the bridge between the query language and the personalization engine. Without it, profiles cannot access user state. With it, the same profile definition can produce different rankings for different users.
## Requirements
- `FOR USER @user_id` clause parsed by the query parser
- `UserContext` struct with all user state needed for personalized ranking
- `UserContext::load(user_id, user_state, interaction_weights, pref_reader, now)` loads all state
- `UserContext` contains: preference vector, top-N interaction weights, followed creator set, blocked creator set, hidden item set
- `UserContext::is_cold_start` (a field set by `load`) is true if the user has no preference vector, or if the stored vector itself reports cold start
- Query parser produces `Option<EntityId>` for the user_id from `FOR USER`
- RETRIEVE executor passes `UserContext` to `ProfileExecutor` when available
- `ProfileExecutor::score_with_context()` accepts optional `UserContext`
- `SIMILAR TO @item_id` clause parsed for the `related` profile
- Query AST extended with `user_id: Option<EntityId>` and `similar_to: Option<EntityId>`
## Technical Design
### Module Structure
```
tidal/src/
  db/
    user_context.rs   -- UserContext struct, load logic
  query/
    mod.rs            -- Extended query AST with user_id, similar_to
```
### UserContext
```rust
// === db/user_context.rs ===
use std::collections::HashSet;
use crate::schema::EntityId;
use crate::entities::preference::PreferenceVector;
use crate::entities::interaction::InteractionWeightLedger;
use crate::entities::user_state::UserStateIndex;

/// All user state needed for personalized ranking in a single query.
///
/// Loaded once per query from the user state index, interaction weight
/// ledger, and preference vector storage. Passed to the profile executor
/// for personalized scoring.
///
/// If the user does not exist or has no state, fields are empty/None.
/// The profile executor handles this gracefully by falling back to
/// population-level signals (cold-start path).
#[derive(Debug, Clone)]
pub struct UserContext {
    /// The user performing the query.
    pub user_id: EntityId,
    /// User preference embedding for ANN retrieval and scoring.
    /// `None` for cold-start users.
    pub preference_vector: Option<PreferenceVector>,
    /// Top creators by interaction weight (descending).
    /// Used for social proof and creator-affinity scoring.
    /// Typically top-50 creators.
    pub top_creators: Vec<(EntityId, f64)>,
    /// Set of creator IDs the user follows.
    /// Used by the `following` profile and exploration budget.
    pub followed_creators: HashSet<u64>,
    /// Set of creator IDs the user has blocked.
    /// Used for hard exclusion filtering.
    pub blocked_creators: HashSet<u64>,
    /// Set of item IDs the user has hidden.
    /// Used for hard exclusion filtering.
    pub hidden_items: HashSet<u32>,
    /// Whether this user is a cold-start user (no engagement history).
    pub is_cold_start: bool,
}
impl UserContext {
    /// Load user context from all state sources.
    ///
    /// This is called once per RETRIEVE/SEARCH query that includes
    /// `FOR USER @user_id`. Everything except the preference vector is
    /// read from in-memory indexes; the preference vector goes through
    /// `pref_reader`.
    ///
    /// # Parameters
    ///
    /// - `user_id`: the querying user
    /// - `user_state`: the global user state index (seen, blocked, follows)
    /// - `interaction_weights`: the interaction weight ledger
    /// - `pref_reader`: closure to read the preference vector from storage
    /// - `now`: current timestamp for decay computation
    pub fn load(
        user_id: EntityId,
        user_state: &UserStateIndex,
        interaction_weights: &InteractionWeightLedger,
        pref_reader: &dyn Fn(EntityId) -> Option<PreferenceVector>,
        now: crate::schema::Timestamp,
    ) -> Self {
        let preference_vector = pref_reader(user_id);
        // Cold start: no vector at all, or a vector that reports cold start itself.
        let is_cold_start = preference_vector.as_ref()
            .map_or(true, |p| p.is_cold_start());
        let top_creators = interaction_weights.read_top_creators(user_id, 50, now);
        let followed = user_state.followed_creators(user_id);
        let followed_creators: HashSet<u64> = followed.iter()
            .map(|e| e.as_u64())
            .collect();
        // Read blocked and hidden state.
        let blocked_creators = user_state.blocked_creator_ids(user_id);
        let hidden_items = user_state.hidden_item_ids(user_id);
        Self {
            user_id,
            preference_vector,
            top_creators,
            followed_creators,
            blocked_creators,
            hidden_items,
            is_cold_start,
        }
    }

    /// Check if a creator is followed by this user.
    pub fn is_following(&self, creator_id: EntityId) -> bool {
        self.followed_creators.contains(&creator_id.as_u64())
    }

    /// Get the interaction weight for a specific creator.
    ///
    /// Returns 0.0 if the creator is not in the loaded `top_creators` list,
    /// i.e. there is no interaction history or the creator falls outside
    /// the top 50.
    pub fn interaction_weight(&self, creator_id: EntityId) -> f64 {
        self.top_creators.iter()
            .find(|(c, _)| *c == creator_id)
            .map_or(0.0, |(_, w)| *w)
    }
}
```
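
The `top_creators` list and the `interaction_weight` lookup can be illustrated in isolation. The following is a minimal sketch, not the ledger's implementation: `CreatorWeights`, `top_n_creators`, and `lookup_weight` are hypothetical names, creators are plain `u64` ids, and the weights are assumed to be already decayed.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one user's slice of the interaction weight
/// ledger: creator id -> already-decayed weight.
pub type CreatorWeights = HashMap<u64, f64>;

/// Return the `n` highest-weight creators, sorted by weight descending.
/// Ties are broken by creator id so the output is deterministic.
pub fn top_n_creators(weights: &CreatorWeights, n: usize) -> Vec<(u64, f64)> {
    let mut entries: Vec<(u64, f64)> = weights.iter().map(|(&c, &w)| (c, w)).collect();
    entries.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    entries.truncate(n);
    entries
}

/// Linear lookup in the truncated list. A creator outside the top `n`
/// reads as 0.0, exactly like a creator with no history.
pub fn lookup_weight(top: &[(u64, f64)], creator: u64) -> f64 {
    top.iter()
        .find(|(c, _)| *c == creator)
        .map_or(0.0, |(_, w)| *w)
}
```

One consequence of truncating before lookup: a creator with a small but non-zero weight that falls below the cutoff is indistinguishable from a creator the user has never interacted with.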
### Query Parser Extension
```rust
// Extensions to the query AST in query/mod.rs

/// Parsed RETRIEVE query.
#[derive(Debug, Clone)]
pub struct RetrieveQuery {
    /// Entity type to retrieve (always "items" for now).
    pub entity_type: String,
    /// Optional user context: `FOR USER @user_id`
    pub user_id: Option<EntityId>,
    /// Optional source item for related queries: `SIMILAR TO @item_id`
    pub similar_to: Option<EntityId>,
    /// Ranking profile name: `USING PROFILE <name>`
    pub profile: Option<String>,
    /// Filter expressions: `FILTER <conditions>`
    pub filters: Vec<FilterExpr>,
    /// Diversity constraints: `DIVERSITY <constraints>`
    pub diversity: Option<DiversityConstraints>,
    /// Result limit: `LIMIT <n>`
    pub limit: Option<usize>,
    /// Excluded IDs: `EXCLUDE [ids]`
    pub excludes: Vec<EntityId>,
}

/// Parse the `FOR USER @<id>` clause.
///
/// Returns `Some(EntityId)` and advances `pos` past the clause if it is present.
fn parse_for_user(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
    // Look for three tokens: "FOR" "USER" "@<numeric_id>".
    // The highest index read is `*pos + 2`, so that is the index to
    // bounds-check; this lets the clause match at the very end of a query.
    if *pos + 2 < tokens.len()
        && tokens[*pos].is_keyword("FOR")
        && tokens[*pos + 1].is_keyword("USER")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}

/// Parse the `SIMILAR TO @<id>` clause.
///
/// Returns `Some(EntityId)` and advances `pos` past the clause if it is present.
fn parse_similar_to(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
    // Same three-token shape as `FOR USER`: "SIMILAR" "TO" "@<numeric_id>".
    if *pos + 2 < tokens.len()
        && tokens[*pos].is_keyword("SIMILAR")
        && tokens[*pos + 1].is_keyword("TO")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}
```
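
Both clause parsers share the same shape, two keywords followed by an `@<id>` token. The sketch below shows that shape as a standalone function. It rests on assumptions: `Token` here is a hypothetical plain-text wrapper (the real `Token` API is only partially shown above), `tokenize` is a hypothetical whitespace splitter that keeps `@42` as a single token, keyword matching is exact, and ids are returned as bare `u64` rather than `EntityId`.

```rust
/// Hypothetical token type: a token is modeled as plain text.
#[derive(Debug, Clone, PartialEq)]
pub struct Token(String);

impl Token {
    /// Exact (case-sensitive) keyword match; an assumption of this sketch.
    fn is_keyword(&self, kw: &str) -> bool {
        self.0 == kw
    }

    /// Returns the numeric id after `@`, or `None` if the token is not `@<u64>`.
    fn at_id(&self) -> Option<u64> {
        self.0.strip_prefix('@')?.parse::<u64>().ok()
    }
}

/// Split on whitespace; `@42` stays a single token.
pub fn tokenize(input: &str) -> Vec<Token> {
    input.split_whitespace().map(|s| Token(s.to_string())).collect()
}

/// Match `<kw1> <kw2> @<id>` starting at `pos`.
/// `pos` is advanced by three only when the whole clause matches.
pub fn parse_at_clause(
    tokens: &[Token],
    pos: &mut usize,
    kw1: &str,
    kw2: &str,
) -> Option<u64> {
    // `get` returns None past the end of the slice, so a clause that is
    // cut short simply fails to match instead of panicking.
    let first = tokens.get(*pos)?;
    let second = tokens.get(*pos + 1)?;
    let third = tokens.get(*pos + 2)?;
    if first.is_keyword(kw1) && second.is_keyword(kw2) {
        let id = third.at_id()?;
        *pos += 3;
        return Some(id);
    }
    None
}
```

Using `slice::get` instead of index arithmetic means a clause at the very end of the query, such as `RETRIEVE items FOR USER @42`, is handled without a separate length check.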
### Executor Integration
```rust
// In the RETRIEVE executor pipeline:
impl TidalDb {
    pub fn retrieve(&self, query: &str) -> crate::Result<Vec<ScoredCandidate>> {
        let parsed = parse_retrieve(query)?;
        // Capture the timestamp once so context loading and scoring agree.
        let now = Timestamp::now();
        // Load user context if FOR USER is present.
        // A failed preference read is treated as "no vector" (cold start)
        // rather than failing the query.
        let user_context = parsed.user_id.map(|uid| {
            UserContext::load(
                uid,
                &self.user_state,
                &self.interaction_weights,
                &|id| self.read_user_preference(id).ok().flatten(),
                now,
            )
        });
        // ... candidate retrieval, filtering, scoring ...
        // Pass user_context to the profile executor.
        let scored = match &user_context {
            Some(ctx) => executor.score_with_context(candidates, profile, now, ctx),
            None => executor.score(candidates, profile, now),
        };
        // ...
    }
}
```
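
To make the `Some(ctx)` / `None` dispatch concrete, here is a deliberately small sketch of how one ranking function can yield different orderings depending on whether a context is supplied. Everything in it is hypothetical: `Ctx` is a trimmed-down stand-in for `UserContext`, `Candidate` and `FOLLOW_BOOST` are invented for illustration, and the additive boost is not the scoring rule, which is defined in Task 02.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down context: only the followed-creator set.
pub struct Ctx {
    pub followed_creators: HashSet<u64>,
}

/// Hypothetical candidate: an item, its creator, and a population-level score.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub item_id: u64,
    pub creator_id: u64,
    pub base_score: f64,
}

/// Illustrative boost only; not the scoring rule from Task 02.
const FOLLOW_BOOST: f64 = 0.5;

/// Rank candidates by score, highest first, and return their item ids.
/// With `None` the ordering depends on base scores alone.
pub fn rank(mut candidates: Vec<Candidate>, ctx: Option<&Ctx>) -> Vec<u64> {
    let score = |c: &Candidate| match ctx {
        Some(ctx) if ctx.followed_creators.contains(&c.creator_id) => {
            c.base_score + FOLLOW_BOOST
        }
        _ => c.base_score,
    };
    // `sort_by` is stable, so equal scores keep their input order.
    candidates.sort_by(|a, b| score(b).total_cmp(&score(a)));
    candidates.iter().map(|c| c.item_id).collect()
}
```

Calling `rank` with `None`, with a context that follows creator 20, and with an empty context shows the three cases: population ordering, personalized ordering, and a known user with no state, which falls back to the population ordering.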
### UserStateIndex Extensions
```rust
// Extensions needed on UserStateIndex (from m3p1 Task 03):
impl UserStateIndex {
    /// Get the set of blocked creator IDs for a user.
    pub fn blocked_creator_ids(&self, user_id: EntityId) -> HashSet<u64> {
        self.blocked
            .get(&user_id.as_u64())
            .map_or_else(HashSet::new, |s| s.blocked_creators.clone())
    }

    /// Get the set of hidden item IDs for a user (as u32 values).
    pub fn hidden_item_ids(&self, user_id: EntityId) -> HashSet<u32> {
        self.blocked
            .get(&user_id.as_u64())
            .map_or_else(HashSet::new, |s| {
                s.hidden_items.iter().collect()
            })
    }
}
```
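
The blocked and hidden sets returned by these accessors are described on `UserContext` as inputs to hard exclusion filtering. A minimal sketch of such a filter follows, assuming a hypothetical `ItemRef` candidate shape and a hypothetical `apply_exclusions` helper; the id widths mirror the sets above (`u64` creator ids, `u32` item ids). The actual filter stage belongs to m3p4 and may look different.

```rust
use std::collections::HashSet;

/// Hypothetical candidate shape for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct ItemRef {
    pub item_id: u32,
    pub creator_id: u64,
}

/// Drop every candidate whose creator is blocked or whose item is hidden.
/// Input order is preserved for the candidates that remain.
pub fn apply_exclusions(
    candidates: Vec<ItemRef>,
    blocked_creators: &HashSet<u64>,
    hidden_items: &HashSet<u32>,
) -> Vec<ItemRef> {
    candidates
        .into_iter()
        .filter(|c| {
            !blocked_creators.contains(&c.creator_id) && !hidden_items.contains(&c.item_id)
        })
        .collect()
}
```

Because both sets are empty for an unknown user, the same filter is a no-op on the cold-start path and needs no special casing.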
## Test Strategy
### Unit Tests
```rust
#[test]
fn user_context_loads_empty_for_unknown_user() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(
        EntityId::new(999),
        &user_state,
        &iw_ledger,
        &|_| None,
        Timestamp::now(),
    );
    assert!(ctx.is_cold_start);
    assert!(ctx.preference_vector.is_none());
    assert!(ctx.followed_creators.is_empty());
    assert!(ctx.blocked_creators.is_empty());
    assert!(ctx.top_creators.is_empty());
}

#[test]
fn user_context_loads_follows_and_blocks() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);
    user_state.add_follow(user, EntityId::new(10));
    user_state.add_follow(user, EntityId::new(20));
    user_state.add_block(user, EntityId::new(77));
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(
        user, &user_state, &iw_ledger, &|_| None, Timestamp::now(),
    );
    assert_eq!(ctx.followed_creators.len(), 2);
    assert!(ctx.followed_creators.contains(&10));
    assert!(ctx.followed_creators.contains(&20));
    assert!(ctx.blocked_creators.contains(&77));
}

#[test]
fn user_context_loads_interaction_weights() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let user = EntityId::new(1);
    let ts = Timestamp::now();
    iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);
    iw_ledger.update_weight(user, EntityId::new(20), 3.0, ts);
    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);
    assert_eq!(ctx.top_creators.len(), 2);
    assert_eq!(ctx.top_creators[0].0, EntityId::new(10)); // highest weight
}

#[test]
fn user_context_detects_cold_start() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    // No preference vector -> cold start.
    let ctx = UserContext::load(
        EntityId::new(1), &user_state, &iw_ledger, &|_| None, Timestamp::now(),
    );
    assert!(ctx.is_cold_start);
    // With preference vector -> not cold start.
    let pref = PreferenceVector::from_embedding(vec![0.1; 16], 16).unwrap();
    let ctx2 = UserContext::load(
        EntityId::new(1), &user_state, &iw_ledger,
        &|_| Some(pref.clone()), Timestamp::now(),
    );
    assert!(!ctx2.is_cold_start);
}

#[test]
fn user_context_is_following() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);
    user_state.add_follow(user, EntityId::new(10));
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());
    assert!(ctx.is_following(EntityId::new(10)));
    assert!(!ctx.is_following(EntityId::new(20)));
}

#[test]
fn user_context_interaction_weight_lookup() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let user = EntityId::new(1);
    let ts = Timestamp::now();
    iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);
    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);
    assert!(ctx.interaction_weight(EntityId::new(10)) > 4.0);
    assert!((ctx.interaction_weight(EntityId::new(99)) - 0.0).abs() < f64::EPSILON);
}

#[test]
fn parse_for_user_clause() {
    // This test depends on the actual parser implementation.
    // The expected behavior:
    let query = "RETRIEVE items FOR USER @42 USING PROFILE for_you LIMIT 50";
    let parsed = parse_retrieve(query).unwrap();
    assert_eq!(parsed.user_id, Some(EntityId::new(42)));
}

#[test]
fn parse_similar_to_clause() {
    let query = "RETRIEVE items SIMILAR TO @100 FOR USER @42 USING PROFILE related LIMIT 10";
    let parsed = parse_retrieve(query).unwrap();
    assert_eq!(parsed.user_id, Some(EntityId::new(42)));
    assert_eq!(parsed.similar_to, Some(EntityId::new(100)));
}

#[test]
fn parse_without_for_user() {
    let query = "RETRIEVE items USING PROFILE trending LIMIT 25";
    let parsed = parse_retrieve(query).unwrap();
    assert!(parsed.user_id.is_none());
}
```
### Property Tests
```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn user_context_follows_set_matches_user_state(
        follow_ids in proptest::collection::hash_set(1u64..100, 0..20),
    ) {
        let user_state = UserStateIndex::new();
        let user = EntityId::new(1);
        for &cid in &follow_ids {
            user_state.add_follow(user, EntityId::new(cid));
        }
        let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
        let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());
        prop_assert_eq!(ctx.followed_creators.len(), follow_ids.len());
        for &cid in &follow_ids {
            prop_assert!(ctx.followed_creators.contains(&cid),
                "followed creator {} not in context", cid);
        }
    }
}
```
## Acceptance Criteria
- [ ] `UserContext` struct with preference_vector, top_creators, followed/blocked/hidden sets
- [ ] `UserContext::load()` populates all fields from user state index and interaction weights
- [ ] `UserContext::is_cold_start` is true when there is no preference vector
- [ ] `UserContext::is_following()` checks followed creator set
- [ ] `UserContext::interaction_weight()` looks up decayed weight for creator
- [ ] `FOR USER @user_id` clause parsed by query parser
- [ ] `SIMILAR TO @item_id` clause parsed by query parser
- [ ] Query AST extended with `user_id: Option<EntityId>` and `similar_to: Option<EntityId>`
- [ ] RETRIEVE executor loads `UserContext` when `FOR USER` is present
- [ ] `UserStateIndex` extended with `blocked_creator_ids()` and `hidden_item_ids()` accessors
- [ ] Parsing gracefully handles missing `FOR USER` (returns `None`)
- [ ] Property test: follows set in context matches user state index
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All tests pass
## Research References
- [VISION.md](../../../../VISION.md) -- `FOR USER` clause in query language
- [USE_CASES.md](../../../../USE_CASES.md) -- All personalized surfaces require user context
- [ai-lookup/features/query-language.md](../../../../ai-lookup/features/query-language.md) -- Query language reference
## Implementation Notes
- Apart from the preference vector, `UserContext::load` reads only in-memory data structures: the user state index, interaction weights, and follows/blocks sets are all held in memory. This should keep loading fast (< 1ms).
- The preference vector is the only component that requires a storage read. The `pref_reader` closure abstracts this, allowing tests to inject mock preference vectors without storage setup.
- The `top_creators` field is limited to 50 entries. At M3 scale (200 creators per user), scanning all interaction weights for a user is fast. At larger scale, a sorted index may be needed.
- The `hidden_item_ids` accessor returns `HashSet<u32>` because `RoaringBitmap` stores `u32` values. This matches the `UserStateIndex` implementation from m3p1 Task 03.
- The `SIMILAR TO` clause parser should be flexible in ordering: `RETRIEVE items SIMILAR TO @100 FOR USER @42 ...` and `RETRIEVE items FOR USER @42 SIMILAR TO @100 ...` should both work.
- Do NOT implement the personalized scoring logic in this task. This task delivers the context loading and query parsing. The scoring is done in Task 02.
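
The order-flexibility note for `SIMILAR TO` and `FOR USER` can be met by trying each clause parser in a loop until neither matches. The following is a minimal sketch under stated assumptions: tokens are plain `&str` slices from a whitespace split, ids are bare `u64`, and `Clauses`, `at_clause`, and `parse_user_clauses` are hypothetical names. A repeated clause overwrites the earlier value here; a real parser would probably reject it instead.

```rust
/// Clauses recovered by the sketch; each is optional.
#[derive(Debug, Default, PartialEq)]
pub struct Clauses {
    pub user_id: Option<u64>,
    pub similar_to: Option<u64>,
}

/// Try to match `<kw1> <kw2> @<id>` at `pos`; advance `pos` only on success.
fn at_clause(tokens: &[&str], pos: &mut usize, kw1: &str, kw2: &str) -> Option<u64> {
    if *tokens.get(*pos)? != kw1 || *tokens.get(*pos + 1)? != kw2 {
        return None;
    }
    let id = tokens.get(*pos + 2)?.strip_prefix('@')?.parse::<u64>().ok()?;
    *pos += 3;
    Some(id)
}

/// Starting at `pos`, keep trying both clause parsers until neither matches.
/// On return, `pos` points at the first token that belongs to neither clause.
pub fn parse_user_clauses(tokens: &[&str], pos: &mut usize) -> Clauses {
    let mut out = Clauses::default();
    loop {
        if let Some(id) = at_clause(tokens, pos, "FOR", "USER") {
            out.user_id = Some(id);
        } else if let Some(id) = at_clause(tokens, pos, "SIMILAR", "TO") {
            out.similar_to = Some(id);
        } else {
            // Neither clause starts here: hand control back to the caller.
            return out;
        }
    }
}
```

Each successful match advances `pos` by three tokens, so the loop terminates, and both `SIMILAR TO @100 FOR USER @42` and `FOR USER @42 SIMILAR TO @100` produce the same `Clauses` value.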