tidaldb/docs/planning/milestone-3/phase-3/task-01-for-user-query-context.md

# Task 01: FOR USER Query Context

## Context

**Milestone:** 3 -- Personalized Ranking
**Phase:** m3p3 -- Personalized Ranking Profiles
**Depends On:** m3p2 (feedback loop: `UserStateIndex`, `InteractionWeightLedger`, preference vectors populated), m2p5 (query parser, RETRIEVE executor)
**Blocks:** Task 02 (Personalized Profiles need `UserContext` for scoring), Task 03 (Cold Start needs `UserContext` to detect cold-start state), m3p4 (User State Filters need `FOR USER` to resolve user state)
**Complexity:** M

## Objective

Deliver the `FOR USER @user_id` clause in the query parser and the `UserContext` struct that loads all user state needed for personalized ranking. When a RETRIEVE query includes `FOR USER @42`, the executor loads user 42's preference vector, interaction weights, followed creators, and blocked state into a `UserContext`, and passes it to the profile executor for personalized scoring.

The `UserContext` is the bridge between the query language and the personalization engine. Without it, profiles cannot access user state. With it, the same profile definition can produce different rankings for different users.

## Requirements

- `FOR USER @user_id` clause parsed by the query parser
- `UserContext` struct with all user state needed for personalized ranking
- `UserContext::load(user_id, user_state, interaction_weights, pref_reader, now)` loads all state
- `UserContext` contains: preference vector, top-N interaction weights, followed creator set, blocked creator set, hidden item set, cold-start flag
- `UserContext::is_cold_start` is true if there is no preference vector, or if the stored vector itself reports cold start
- Query parser produces `Option<EntityId>` for the user_id from `FOR USER`
- RETRIEVE executor passes `UserContext` to `ProfileExecutor` when available
- `ProfileExecutor::score_with_context()` accepts a `&UserContext`; queries without `FOR USER` fall back to `ProfileExecutor::score()`
- `SIMILAR TO @item_id` clause parsed for the `related` profile
- Query AST extended with `user_id: Option<EntityId>` and `similar_to: Option<EntityId>`

## Technical Design

### Module Structure

```
tidal/src/
  db/
    user_context.rs -- UserContext struct, load logic
  query/
    mod.rs          -- Extended query AST with user_id, similar_to
```

### UserContext

```rust
// === db/user_context.rs ===

use std::collections::HashSet;
use crate::schema::EntityId;
use crate::entities::preference::PreferenceVector;
use crate::entities::interaction::InteractionWeightLedger;
use crate::entities::user_state::UserStateIndex;

/// All user state needed for personalized ranking in a single query.
///
/// Loaded once per query from the user state index, interaction weight
/// ledger, and preference vector storage. Passed to the profile executor
/// for personalized scoring.
///
/// If the user does not exist or has no state, fields are empty/None.
/// The profile executor handles this gracefully by falling back to
/// population-level signals (cold-start path).
#[derive(Debug, Clone)]
pub struct UserContext {
    /// The user performing the query.
    pub user_id: EntityId,

    /// User preference embedding for ANN retrieval and scoring.
    /// `None` for cold-start users.
    pub preference_vector: Option<PreferenceVector>,

    /// Top creators by interaction weight (descending).
    /// Used for social proof and creator-affinity scoring.
    /// Typically top-50 creators.
    pub top_creators: Vec<(EntityId, f64)>,

    /// Set of creator IDs the user follows.
    /// Used by the `following` profile and exploration budget.
    pub followed_creators: HashSet<u64>,

    /// Set of creator IDs the user has blocked.
    /// Used for hard exclusion filtering.
    pub blocked_creators: HashSet<u64>,

    /// Set of item IDs the user has hidden.
    /// Used for hard exclusion filtering.
    pub hidden_items: HashSet<u32>,

    /// Whether this user is a cold-start user (no engagement history).
    pub is_cold_start: bool,
}

impl UserContext {
    /// Load user context from all state sources.
    ///
    /// This is called once per RETRIEVE/SEARCH query that includes
    /// `FOR USER @user_id`. All reads hit in-memory indexes except the
    /// preference vector, which goes through `pref_reader`.
    ///
    /// # Parameters
    ///
    /// - `user_id`: the querying user
    /// - `user_state`: the global user state index (seen, blocked, follows)
    /// - `interaction_weights`: the interaction weight ledger
    /// - `pref_reader`: closure to read the preference vector from storage
    /// - `now`: current timestamp for decay computation
    pub fn load(
        user_id: EntityId,
        user_state: &UserStateIndex,
        interaction_weights: &InteractionWeightLedger,
        pref_reader: &dyn Fn(EntityId) -> Option<PreferenceVector>,
        now: crate::schema::Timestamp,
    ) -> Self {
        let preference_vector = pref_reader(user_id);
        let is_cold_start = preference_vector.as_ref()
            .map_or(true, |p| p.is_cold_start());

        let top_creators = interaction_weights.read_top_creators(user_id, 50, now);

        let followed = user_state.followed_creators(user_id);
        let followed_creators: HashSet<u64> = followed.iter()
            .map(|e| e.as_u64())
            .collect();

        // Read blocked state.
        let blocked_creators = user_state.blocked_creator_ids(user_id);
        let hidden_items = user_state.hidden_item_ids(user_id);

        Self {
            user_id,
            preference_vector,
            top_creators,
            followed_creators,
            blocked_creators,
            hidden_items,
            is_cold_start,
        }
    }

    /// Check if a creator is followed by this user.
    pub fn is_following(&self, creator_id: EntityId) -> bool {
        self.followed_creators.contains(&creator_id.as_u64())
    }

    /// Get the interaction weight for a specific creator.
    ///
    /// Returns 0.0 if the creator is not in `top_creators`. This covers
    /// both "no interaction history" and "ranked below the top 50".
    pub fn interaction_weight(&self, creator_id: EntityId) -> f64 {
        self.top_creators.iter()
            .find(|(c, _)| *c == creator_id)
            .map_or(0.0, |(_, w)| *w)
    }
}
```
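The `top_creators` list and the `interaction_weight` lookup above can be illustrated in isolation. The following is a minimal sketch, not the ledger's actual implementation: it assumes the per-user weights are available as a plain `HashMap<u64, f64>`, and the names `top_n_creators` and `weight_for` are hypothetical. It shows why a creator outside the top N reads as 0.0 even when the user has some history with them.

```rust
use std::collections::HashMap;

/// Return up to `n` `(creator_id, weight)` pairs, highest weight first.
/// Ties are broken by ascending creator ID so the output is deterministic.
/// Hypothetical helper; the real ledger exposes `read_top_creators`.
pub fn top_n_creators(weights: &HashMap<u64, f64>, n: usize) -> Vec<(u64, f64)> {
    let mut entries: Vec<(u64, f64)> = weights.iter().map(|(&id, &w)| (id, w)).collect();
    // `total_cmp` is a total order over f64, so a NaN weight cannot panic the sort.
    entries.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    entries.truncate(n);
    entries
}

/// Linear lookup in the truncated list.
/// Creators that did not make the top N read as 0.0.
pub fn weight_for(top: &[(u64, f64)], creator_id: u64) -> f64 {
    top.iter()
        .find(|(id, _)| *id == creator_id)
        .map_or(0.0, |(_, w)| *w)
}
```

With N fixed at 50, the linear scan in `weight_for` touches at most 50 entries per lookup, which is why a `Vec` is a reasonable choice over a map at this size.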

### Query Parser Extension

```rust
// Extensions to the query AST in query/mod.rs

/// Parsed RETRIEVE query.
#[derive(Debug, Clone)]
pub struct RetrieveQuery {
    /// Entity type to retrieve (always "items" for now).
    pub entity_type: String,
    /// Optional user context: `FOR USER @user_id`
    pub user_id: Option<EntityId>,
    /// Optional source item for related queries: `SIMILAR TO @item_id`
    pub similar_to: Option<EntityId>,
    /// Ranking profile name: `USING PROFILE <name>`
    pub profile: Option<String>,
    /// Filter expressions: `FILTER <conditions>`
    pub filters: Vec<FilterExpr>,
    /// Diversity constraints: `DIVERSITY <constraints>`
    pub diversity: Option<DiversityConstraints>,
    /// Result limit: `LIMIT <n>`
    pub limit: Option<usize>,
    /// Excluded IDs: `EXCLUDE [ids]`
    pub excludes: Vec<EntityId>,
}

/// Parse the `FOR USER @<id>` clause.
///
/// Returns `Some(EntityId)` if the clause is present.
fn parse_for_user(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
    // Look for three tokens: "FOR", "USER", "@<numeric_id>".
    // `*pos + 2` must be a valid index, so the clause may end the input.
    if *pos + 2 < tokens.len()
        && tokens[*pos].is_keyword("FOR")
        && tokens[*pos + 1].is_keyword("USER")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}

/// Parse the `SIMILAR TO @<id>` clause.
///
/// Returns `Some(EntityId)` if the clause is present.
fn parse_similar_to(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
    // Three tokens are read, so `*pos + 2` must be a valid index.
    if *pos + 2 < tokens.len()
        && tokens[*pos].is_keyword("SIMILAR")
        && tokens[*pos + 1].is_keyword("TO")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}
```
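Because `FOR USER` and `SIMILAR TO` may appear in either order, the caller needs a loop that tries both clause parsers at each position. The sketch below illustrates that idea under simplifying assumptions: it tokenizes on whitespace instead of using the real `Token` type, matches keywords case-insensitively (an assumption, not confirmed by the parser), and returns a hypothetical `Clauses` struct in place of `RetrieveQuery`.

```rust
/// Hypothetical result type for this sketch; the real AST is `RetrieveQuery`.
#[derive(Debug, Default, PartialEq)]
pub struct Clauses {
    pub user_id: Option<u64>,
    pub similar_to: Option<u64>,
}

/// Parse an `@<id>` token into a numeric ID.
fn parse_at_id(token: &str) -> Option<u64> {
    token.strip_prefix('@')?.parse::<u64>().ok()
}

/// Scan whitespace-separated tokens for `FOR USER @<id>` and
/// `SIMILAR TO @<id>` in any order. Unrecognized tokens are skipped.
pub fn parse_clauses(query: &str) -> Clauses {
    let tokens: Vec<&str> = query.split_whitespace().collect();
    let mut out = Clauses::default();
    let mut pos = 0;
    while pos < tokens.len() {
        // A three-token clause fits only when `pos + 2` is a valid index.
        if pos + 2 < tokens.len() {
            let first = tokens[pos].to_ascii_uppercase();
            let second = tokens[pos + 1].to_ascii_uppercase();
            let id = parse_at_id(tokens[pos + 2]);
            match (first.as_str(), second.as_str(), id) {
                ("FOR", "USER", Some(id)) => {
                    out.user_id = Some(id);
                    pos += 3;
                    continue;
                }
                ("SIMILAR", "TO", Some(id)) => {
                    out.similar_to = Some(id);
                    pos += 3;
                    continue;
                }
                _ => {}
            }
        }
        pos += 1;
    }
    out
}
```

In this sketch a malformed ID such as `@abc` leaves the field as `None`, the same as an absent clause. Whether the real parser should instead report an error for that case is a design decision this task leaves open.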

### Executor Integration

```rust
// In the RETRIEVE executor pipeline:

impl TidalDb {
    pub fn retrieve(&self, query: &str) -> crate::Result<Vec<ScoredCandidate>> {
        let parsed = parse_retrieve(query)?;

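        // Capture one timestamp so context loading and scoring agree on "now".
        let now = Timestamp::now();

        // Load user context if FOR USER is present.
        // Note: `.ok().flatten()` treats a storage error like a missing
        // vector, so such a user takes the cold-start path.
        let user_context = parsed.user_id.map(|uid| {
            UserContext::load(
                uid,
                &self.user_state,
                &self.interaction_weights,
                &|id| self.read_user_preference(id).ok().flatten(),
                now,
            )
        });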

        // ... candidate retrieval, filtering, scoring ...
        // Pass user_context to the profile executor.
        let scored = match &user_context {
            Some(ctx) => executor.score_with_context(candidates, profile, now, ctx),
            None => executor.score(candidates, profile, now),
        };
        // ...
    }
}
```

### UserStateIndex Extensions

```rust
// Extensions needed on UserStateIndex (from m3p1 Task 03):

impl UserStateIndex {
    /// Get the set of blocked creator IDs for a user.
    pub fn blocked_creator_ids(&self, user_id: EntityId) -> HashSet<u64> {
        self.blocked
            .get(&user_id.as_u64())
            .map_or_else(HashSet::new, |s| s.blocked_creators.clone())
    }

    /// Get the set of hidden item IDs for a user (as u32 values).
    pub fn hidden_item_ids(&self, user_id: EntityId) -> HashSet<u32> {
        self.blocked
            .get(&user_id.as_u64())
            .map_or_else(HashSet::new, |s| {
                s.hidden_items.iter().collect()
            })
    }
}
```
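The two sets returned by these accessors are described as inputs to hard exclusion filtering. As a rough illustration of how a downstream stage might consume them, the sketch below filters a candidate list against both sets. The `Candidate` struct and `apply_hard_exclusions` function are hypothetical and are not part of this task's deliverables.

```rust
use std::collections::HashSet;

/// Hypothetical candidate shape for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub item_id: u32,
    pub creator_id: u64,
}

/// Drop candidates whose creator is blocked or whose item is hidden.
/// Order of the surviving candidates is preserved.
pub fn apply_hard_exclusions(
    candidates: Vec<Candidate>,
    blocked_creators: &HashSet<u64>,
    hidden_items: &HashSet<u32>,
) -> Vec<Candidate> {
    candidates
        .into_iter()
        .filter(|c| {
            !blocked_creators.contains(&c.creator_id) && !hidden_items.contains(&c.item_id)
        })
        .collect()
}
```

Both checks are O(1) set lookups, so the filter is linear in the number of candidates regardless of how many creators or items the user has excluded.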

## Test Strategy

### Unit Tests

```rust
#[test]
fn user_context_loads_empty_for_unknown_user() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());

    let ctx = UserContext::load(
        EntityId::new(999),
        &user_state,
        &iw_ledger,
        &|_| None,
        Timestamp::now(),
    );

    assert!(ctx.is_cold_start);
    assert!(ctx.preference_vector.is_none());
    assert!(ctx.followed_creators.is_empty());
    assert!(ctx.blocked_creators.is_empty());
    assert!(ctx.top_creators.is_empty());
}

#[test]
fn user_context_loads_follows_and_blocks() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);

    user_state.add_follow(user, EntityId::new(10));
    user_state.add_follow(user, EntityId::new(20));
    user_state.add_block(user, EntityId::new(77));

    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(
        user, &user_state, &iw_ledger, &|_| None, Timestamp::now(),
    );

    assert_eq!(ctx.followed_creators.len(), 2);
    assert!(ctx.followed_creators.contains(&10));
    assert!(ctx.followed_creators.contains(&20));
    assert!(ctx.blocked_creators.contains(&77));
}

#[test]
fn user_context_loads_interaction_weights() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let user = EntityId::new(1);
    let ts = Timestamp::now();

    iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);
    iw_ledger.update_weight(user, EntityId::new(20), 3.0, ts);

    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);

    assert_eq!(ctx.top_creators.len(), 2);
    assert_eq!(ctx.top_creators[0].0, EntityId::new(10)); // highest weight
}

#[test]
fn user_context_detects_cold_start() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());

    // No preference vector -> cold start.
    let ctx = UserContext::load(
        EntityId::new(1), &user_state, &iw_ledger, &|_| None, Timestamp::now(),
    );
    assert!(ctx.is_cold_start);

    // With preference vector -> not cold start.
    let pref = PreferenceVector::from_embedding(vec![0.1; 16], 16).unwrap();
    let ctx2 = UserContext::load(
        EntityId::new(1), &user_state, &iw_ledger,
        &|_| Some(pref.clone()), Timestamp::now(),
    );
    assert!(!ctx2.is_cold_start);
}

#[test]
fn user_context_is_following() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);
    user_state.add_follow(user, EntityId::new(10));

    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());

    assert!(ctx.is_following(EntityId::new(10)));
    assert!(!ctx.is_following(EntityId::new(20)));
}

#[test]
fn user_context_interaction_weight_lookup() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let user = EntityId::new(1);
    let ts = Timestamp::now();

    iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);

    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);

    assert!(ctx.interaction_weight(EntityId::new(10)) > 4.0);
    assert!((ctx.interaction_weight(EntityId::new(99)) - 0.0).abs() < f64::EPSILON);
}

#[test]
fn parse_for_user_clause() {
    // This test depends on the actual parser implementation.
    // The expected behavior:
    let query = "RETRIEVE items FOR USER @42 USING PROFILE for_you LIMIT 50";
    let parsed = parse_retrieve(query).unwrap();
    assert_eq!(parsed.user_id, Some(EntityId::new(42)));
}

#[test]
fn parse_similar_to_clause() {
    let query = "RETRIEVE items SIMILAR TO @100 FOR USER @42 USING PROFILE related LIMIT 10";
    let parsed = parse_retrieve(query).unwrap();
    assert_eq!(parsed.user_id, Some(EntityId::new(42)));
    assert_eq!(parsed.similar_to, Some(EntityId::new(100)));
}

#[test]
fn parse_without_for_user() {
    let query = "RETRIEVE items USING PROFILE trending LIMIT 25";
    let parsed = parse_retrieve(query).unwrap();
    assert!(parsed.user_id.is_none());
}
```

### Property Tests

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn user_context_follows_set_matches_user_state(
        follow_ids in proptest::collection::hash_set(1u64..100, 0..20),
    ) {
        let user_state = UserStateIndex::new();
        let user = EntityId::new(1);
        for &cid in &follow_ids {
            user_state.add_follow(user, EntityId::new(cid));
        }

        let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
        let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());

        prop_assert_eq!(ctx.followed_creators.len(), follow_ids.len());
        for &cid in &follow_ids {
            prop_assert!(ctx.followed_creators.contains(&cid),
                "followed creator {} not in context", cid);
        }
    }
}
```

## Acceptance Criteria

- [ ] `UserContext` struct with preference_vector, top_creators, followed/blocked/hidden sets
- [ ] `UserContext::load()` populates all fields from user state index and interaction weights
- [ ] `UserContext::is_cold_start` is true when there is no preference vector or the stored vector reports cold start
- [ ] `UserContext::is_following()` checks followed creator set
- [ ] `UserContext::interaction_weight()` looks up decayed weight for creator
- [ ] `FOR USER @user_id` clause parsed by query parser
- [ ] `SIMILAR TO @item_id` clause parsed by query parser
- [ ] Query AST extended with `user_id: Option<EntityId>` and `similar_to: Option<EntityId>`
- [ ] RETRIEVE executor loads `UserContext` when `FOR USER` is present
- [ ] `UserStateIndex` extended with `blocked_creator_ids()` and `hidden_item_ids()` accessors
- [ ] Parsing gracefully handles missing `FOR USER` (returns `None`)
- [ ] Property test: follows set in context matches user state index
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All tests pass

## Research References

- [VISION.md](../../../../VISION.md) -- `FOR USER` clause in query language
- [USE_CASES.md](../../../../USE_CASES.md) -- All personalized surfaces require user context
- [ai-lookup/features/query-language.md](../../../../ai-lookup/features/query-language.md) -- Query language reference

## Implementation Notes

- `UserContext::load` reads from in-memory data structures only, with the preference vector as the single storage read. The user state index, interaction weights, and follows/blocks sets are all in memory, so the target load time is under 1 ms.
- The preference vector is the only component that requires a storage read. The `pref_reader` closure abstracts this, allowing tests to inject mock preference vectors without storage setup.
- The `top_creators` field is limited to 50 entries. At M3 scale (200 creators per user), scanning all interaction weights for a user is fast. At larger scale, a sorted index may be needed.
- The `hidden_item_ids` accessor returns `HashSet<u32>` because `RoaringBitmap` uses `u32` keys. This matches the `UserStateIndex` implementation from m3p1 Task 03.
- The `SIMILAR TO` clause parser should be flexible in ordering: `RETRIEVE items SIMILAR TO @100 FOR USER @42 ...` and `RETRIEVE items FOR USER @42 SIMILAR TO @100 ...` should both work.
- Do NOT implement the personalized scoring logic in this task. This task delivers the context loading and query parsing. The scoring is done in Task 02.
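
The `pref_reader` indirection and the cold-start derivation can be exercised without any storage. The sketch below is one way to picture it, under stated assumptions: `Pref` is a hypothetical stand-in for `PreferenceVector`, its `update_count` field and the rule "zero updates means cold start" are invented for illustration, and `load_pref` mirrors only the first two lines of `UserContext::load`.

```rust
/// Hypothetical stand-in for `PreferenceVector`.
#[derive(Debug, Clone, PartialEq)]
pub struct Pref {
    pub embedding: Vec<f32>,
    /// Assumed field: number of feedback updates applied so far.
    pub update_count: u32,
}

impl Pref {
    /// Assumption for this sketch: a vector with no updates is cold start.
    pub fn is_cold_start(&self) -> bool {
        self.update_count == 0
    }
}

/// Read the preference vector through the injected reader and derive the
/// cold-start flag: true when the vector is missing or reports cold start.
pub fn load_pref(
    user_id: u64,
    pref_reader: &dyn Fn(u64) -> Option<Pref>,
) -> (Option<Pref>, bool) {
    let pref = pref_reader(user_id);
    let cold = pref.as_ref().map_or(true, |p| p.is_cold_start());
    (pref, cold)
}
```

A test can back the reader with a `HashMap` and a closure such as `|id| store.get(&id).cloned()`, which keeps the unit tests independent of the storage layer.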