# Task 01: FOR USER Query Context

## Context

**Milestone:** 3 -- Personalized Ranking
**Phase:** m3p3 -- Personalized Ranking Profiles
**Depends On:** m3p2 (feedback loop: `UserStateIndex`, `InteractionWeightLedger`, preference vectors populated), m2p5 (query parser, RETRIEVE executor)
**Blocks:** Task 02 (Personalized Profiles need `UserContext` for scoring), Task 03 (Cold Start needs `UserContext` to detect cold-start state), m3p4 (User State Filters need `FOR USER` to resolve user state)
**Complexity:** M

## Objective

Deliver the `FOR USER @user_id` clause in the query parser and the `UserContext` struct that loads all user state needed for personalized ranking. When a RETRIEVE query includes `FOR USER @42`, the executor loads user 42's preference vector, interaction weights, followed creators, and blocked state into a `UserContext`, and passes it to the profile executor for personalized scoring.

The `UserContext` is the bridge between the query language and the personalization engine. Without it, profiles cannot access user state. With it, the same profile definition can produce different rankings for different users.

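To make the "same profile, different rankings" idea concrete, here is a minimal, self-contained sketch. `ToyContext`, `Item`, and `rank` are hypothetical stand-ins, not the real `UserContext` or `ProfileExecutor`, and the additive creator boost is an assumed scoring rule chosen only for illustration (the actual scoring belongs to Task 02).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `UserContext`: creator ID -> interaction weight.
struct ToyContext {
    creator_weights: HashMap<u64, f64>,
}

/// Hypothetical candidate: item ID, creator ID, and a population-level base score.
#[derive(Debug, Clone, Copy)]
struct Item {
    id: u64,
    creator: u64,
    base_score: f64,
}

/// Rank items with one fixed "profile". With a context, each item's score gets
/// the user's weight for its creator added (assumed rule, for illustration only);
/// without a context, only the base score is used.
fn rank(items: &[Item], ctx: Option<&ToyContext>) -> Vec<u64> {
    let mut scored: Vec<(u64, f64)> = items
        .iter()
        .map(|it| {
            let boost = ctx
                .and_then(|c| c.creator_weights.get(&it.creator).copied())
                .unwrap_or(0.0);
            (it.id, it.base_score + boost)
        })
        .collect();
    // Highest score first; `total_cmp` gives a total order on f64.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(id, _)| id).collect()
}
```

Calling `rank(&items, None)` orders by base score alone, while passing `Some(&ctx)` for a user with a strong weight on one creator can move that creator's items ahead, without touching the profile itself.
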
## Requirements

- `FOR USER @user_id` clause parsed by the query parser
- `UserContext` struct with all user state needed for personalized ranking
- `UserContext::load(user_id, user_state, interaction_weights, pref_reader, now)` loads all state
- `UserContext` contains: preference vector, top-N interaction weights, followed creator set, blocked creator set, hidden item set
- `UserContext::is_cold_start` is true if there is no preference vector, or if the stored vector itself reports cold start
- Query parser produces `Option<EntityId>` for the user_id from `FOR USER`
- RETRIEVE executor passes `UserContext` to `ProfileExecutor` when available
- `ProfileExecutor::score_with_context()` scores with a `UserContext`; queries without `FOR USER` continue to use `score()`
- `SIMILAR TO @item_id` clause parsed for the `related` profile
- Query AST extended with `user_id: Option<EntityId>` and `similar_to: Option<EntityId>`

## Technical Design

### Module Structure

```
tidal/src/
  db/
    user_context.rs   -- UserContext struct, load logic
  query/
    mod.rs            -- Extended query AST with user_id, similar_to
```

### UserContext

```rust
// === db/user_context.rs ===

use std::collections::HashSet;
use crate::schema::EntityId;
use crate::entities::preference::PreferenceVector;
use crate::entities::interaction::InteractionWeightLedger;
use crate::entities::user_state::UserStateIndex;

/// All user state needed for personalized ranking in a single query.
///
/// Loaded once per query from the user state index, interaction weight
/// ledger, and preference vector storage. Passed to the profile executor
/// for personalized scoring.
///
/// If the user does not exist or has no state, fields are empty/None.
/// The profile executor handles this gracefully by falling back to
/// population-level signals (cold-start path).
#[derive(Debug, Clone)]
pub struct UserContext {
    /// The user performing the query.
    pub user_id: EntityId,

    /// User preference embedding for ANN retrieval and scoring.
    /// `None` for cold-start users.
    pub preference_vector: Option<PreferenceVector>,

    /// Top creators by interaction weight (descending).
    /// Used for social proof and creator-affinity scoring.
    /// Typically top-50 creators.
    pub top_creators: Vec<(EntityId, f64)>,

    /// Set of creator IDs the user follows.
    /// Used by the `following` profile and exploration budget.
    pub followed_creators: HashSet<u64>,

    /// Set of creator IDs the user has blocked.
    /// Used for hard exclusion filtering.
    pub blocked_creators: HashSet<u64>,

    /// Set of item IDs the user has hidden.
    /// Used for hard exclusion filtering.
    pub hidden_items: HashSet<u32>,

    /// Whether this user is a cold-start user (no engagement history).
    pub is_cold_start: bool,
}

impl UserContext {
    /// Load user context from all state sources.
    ///
    /// This is called once per RETRIEVE/SEARCH query that includes
    /// `FOR USER @user_id`. Everything except the preference vector is
    /// read from in-memory indexes (fast); the preference vector is read
    /// through `pref_reader`.
    ///
    /// # Parameters
    ///
    /// - `user_id`: the querying user
    /// - `user_state`: the global user state index (seen, blocked, follows)
    /// - `interaction_weights`: the interaction weight ledger
    /// - `pref_reader`: closure to read the preference vector from storage
    /// - `now`: current timestamp for decay computation
    pub fn load(
        user_id: EntityId,
        user_state: &UserStateIndex,
        interaction_weights: &InteractionWeightLedger,
        pref_reader: &dyn Fn(EntityId) -> Option<PreferenceVector>,
        now: crate::schema::Timestamp,
    ) -> Self {
        let preference_vector = pref_reader(user_id);
        // Cold start if there is no vector, or the vector itself says so.
        let is_cold_start = preference_vector.as_ref()
            .map_or(true, |p| p.is_cold_start());

        // Top 50 creators by decayed interaction weight.
        let top_creators = interaction_weights.read_top_creators(user_id, 50, now);

        let followed = user_state.followed_creators(user_id);
        let followed_creators: HashSet<u64> = followed.iter()
            .map(|e| e.as_u64())
            .collect();

        // Read blocked state.
        let blocked_creators = user_state.blocked_creator_ids(user_id);
        let hidden_items = user_state.hidden_item_ids(user_id);

        Self {
            user_id,
            preference_vector,
            top_creators,
            followed_creators,
            blocked_creators,
            hidden_items,
            is_cold_start,
        }
    }

    /// Check if a creator is followed by this user.
    pub fn is_following(&self, creator_id: EntityId) -> bool {
        self.followed_creators.contains(&creator_id.as_u64())
    }

    /// Get the interaction weight for a specific creator.
    ///
    /// Returns 0.0 if the creator is not in `top_creators`, i.e. there is
    /// no interaction history or the creator fell below the top-50 cutoff.
    pub fn interaction_weight(&self, creator_id: EntityId) -> f64 {
        self.top_creators.iter()
            .find(|(c, _)| *c == creator_id)
            .map_or(0.0, |(_, w)| *w)
    }
}
```

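The `blocked_creators` and `hidden_items` sets are documented as inputs to hard exclusion filtering. The following is a minimal sketch of what such a filter could look like. `Candidate`, `Exclusions`, and `apply_hard_exclusions` are hypothetical names introduced here for illustration; the real filter is out of scope for this task (user state filters are listed under m3p4).

```rust
use std::collections::HashSet;

/// Hypothetical candidate row: item ID (u32, matching the hidden-item set)
/// and the ID of the creator who published it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Candidate {
    item_id: u32,
    creator_id: u64,
}

/// Hypothetical subset of `UserContext` holding only the exclusion sets.
struct Exclusions {
    blocked_creators: HashSet<u64>,
    hidden_items: HashSet<u32>,
}

/// Drop every candidate whose creator is blocked or whose item is hidden.
/// Surviving candidates keep their relative order.
fn apply_hard_exclusions(candidates: Vec<Candidate>, ex: &Exclusions) -> Vec<Candidate> {
    candidates
        .into_iter()
        .filter(|c| {
            !ex.blocked_creators.contains(&c.creator_id)
                && !ex.hidden_items.contains(&c.item_id)
        })
        .collect()
}
```

Because both checks are `HashSet` lookups, the filter is a single pass over the candidates, which is one reason to materialize the sets once per query in the context rather than consulting the index per candidate.
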
### Query Parser Extension

```rust
// Extensions to the query AST in query/mod.rs

/// Parsed RETRIEVE query.
#[derive(Debug, Clone)]
pub struct RetrieveQuery {
    /// Entity type to retrieve (always "items" for now).
    pub entity_type: String,
    /// Optional user context: `FOR USER @user_id`
    pub user_id: Option<EntityId>,
    /// Optional source item for related queries: `SIMILAR TO @item_id`
    pub similar_to: Option<EntityId>,
    /// Ranking profile name: `USING PROFILE <name>`
    pub profile: Option<String>,
    /// Filter expressions: `FILTER <conditions>`
    pub filters: Vec<FilterExpr>,
    /// Diversity constraints: `DIVERSITY <constraints>`
    pub diversity: Option<DiversityConstraints>,
    /// Result limit: `LIMIT <n>`
    pub limit: Option<usize>,
    /// Excluded IDs: `EXCLUDE [ids]`
    pub excludes: Vec<EntityId>,
}

/// Parse the `FOR USER @<id>` clause.
///
/// Returns `Some(EntityId)` if the clause is present. Returns `None` and
/// leaves `pos` unchanged if the clause is absent or the ID is not a
/// valid `u64`.
fn parse_for_user(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
    // Look for three tokens: "FOR" "USER" "@<numeric_id>".
    // `<=` (not `<`) so the clause still matches when it ends the query.
    if *pos + 3 <= tokens.len()
        && tokens[*pos].is_keyword("FOR")
        && tokens[*pos + 1].is_keyword("USER")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}

/// Parse the `SIMILAR TO @<id>` clause.
///
/// Returns `Some(EntityId)` if the clause is present. Returns `None` and
/// leaves `pos` unchanged if the clause is absent or the ID is not a
/// valid `u64`.
fn parse_similar_to(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
    // Same three-token shape as `parse_for_user`: "SIMILAR" "TO" "@<numeric_id>".
    if *pos + 3 <= tokens.len()
        && tokens[*pos].is_keyword("SIMILAR")
        && tokens[*pos + 1].is_keyword("TO")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}
```

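Both clause orders (`SIMILAR TO ... FOR USER ...` and `FOR USER ... SIMILAR TO ...`) are expected to parse, and either clause may be the last thing in the query. The sketch below shows one way a driver loop could achieve that. It is self-contained and deliberately simplified: it splits on whitespace instead of using the real `Token` type, matches keywords case-sensitively, and `Clauses`, `parse_at_id`, and `parse_clauses` are hypothetical names, not part of the actual parser.

```rust
/// Hypothetical result holding only the two clauses discussed here.
#[derive(Debug, Default, PartialEq, Eq)]
struct Clauses {
    user_id: Option<u64>,
    similar_to: Option<u64>,
}

/// Parse `@<digits>` into a numeric ID; anything else yields `None`.
fn parse_at_id(token: &str) -> Option<u64> {
    token.strip_prefix('@')?.parse().ok()
}

/// Scan whitespace-separated tokens and pick up `FOR USER @id` and
/// `SIMILAR TO @id` wherever they appear. Unrelated tokens are skipped.
fn parse_clauses(query: &str) -> Clauses {
    let tokens: Vec<&str> = query.split_whitespace().collect();
    let mut out = Clauses::default();
    let mut pos = 0;
    while pos < tokens.len() {
        // A clause occupies three tokens starting at `pos`, so `pos + 3`
        // may equal `tokens.len()` when the clause ends the query.
        if pos + 3 <= tokens.len() {
            let (a, b) = (tokens[pos], tokens[pos + 1]);
            let id = parse_at_id(tokens[pos + 2]);
            if a == "FOR" && b == "USER" && id.is_some() {
                out.user_id = id;
                pos += 3;
                continue;
            }
            if a == "SIMILAR" && b == "TO" && id.is_some() {
                out.similar_to = id;
                pos += 3;
                continue;
            }
        }
        // Not one of our clauses: advance one token and try again.
        pos += 1;
    }
    out
}
```

Trying each clause parser at every position, rather than in a fixed sequence, is what makes the order irrelevant. A real implementation would probably also reject duplicate clauses and report malformed IDs instead of silently skipping them.
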
### Executor Integration

```rust
// In the RETRIEVE executor pipeline:

impl TidalDb {
    pub fn retrieve(&self, query: &str) -> crate::Result<Vec<ScoredCandidate>> {
        let parsed = parse_retrieve(query)?;
        // One timestamp per query, shared by context loading and scoring.
        let now = Timestamp::now();

        // Load user context if FOR USER is present.
        let user_context = parsed.user_id.map(|uid| {
            UserContext::load(
                uid,
                &self.user_state,
                &self.interaction_weights,
                // A failed read is treated as "no preference vector" (cold-start path).
                &|id| self.read_user_preference(id).ok().flatten(),
                now,
            )
        });

        // ... candidate retrieval, filtering, scoring ...
        // Pass user_context to the profile executor.
        let scored = match &user_context {
            Some(ctx) => executor.score_with_context(candidates, profile, now, ctx),
            None => executor.score(candidates, profile, now),
        };
        // ...
    }
}
```

### UserStateIndex Extensions

```rust
// Extensions needed on UserStateIndex (from m3p1 Task 03):

impl UserStateIndex {
    /// Get the set of blocked creator IDs for a user.
    pub fn blocked_creator_ids(&self, user_id: EntityId) -> HashSet<u64> {
        self.blocked
            .get(&user_id.as_u64())
            .map_or_else(HashSet::new, |s| s.blocked_creators.clone())
    }

    /// Get the set of hidden item IDs for a user (as u32 values).
    pub fn hidden_item_ids(&self, user_id: EntityId) -> HashSet<u32> {
        self.blocked
            .get(&user_id.as_u64())
            .map_or_else(HashSet::new, |s| {
                s.hidden_items.iter().collect()
            })
    }
}
```

## Test Strategy

### Unit Tests

```rust
#[test]
fn user_context_loads_empty_for_unknown_user() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());

    let ctx = UserContext::load(
        EntityId::new(999),
        &user_state,
        &iw_ledger,
        &|_| None,
        Timestamp::now(),
    );

    assert!(ctx.is_cold_start);
    assert!(ctx.preference_vector.is_none());
    assert!(ctx.followed_creators.is_empty());
    assert!(ctx.blocked_creators.is_empty());
    assert!(ctx.top_creators.is_empty());
}

#[test]
fn user_context_loads_follows_and_blocks() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);

    user_state.add_follow(user, EntityId::new(10));
    user_state.add_follow(user, EntityId::new(20));
    user_state.add_block(user, EntityId::new(77));

    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(
        user, &user_state, &iw_ledger, &|_| None, Timestamp::now(),
    );

    assert_eq!(ctx.followed_creators.len(), 2);
    assert!(ctx.followed_creators.contains(&10));
    assert!(ctx.followed_creators.contains(&20));
    assert!(ctx.blocked_creators.contains(&77));
}

#[test]
fn user_context_loads_interaction_weights() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let user = EntityId::new(1);
    let ts = Timestamp::now();

    iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);
    iw_ledger.update_weight(user, EntityId::new(20), 3.0, ts);

    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);

    assert_eq!(ctx.top_creators.len(), 2);
    assert_eq!(ctx.top_creators[0].0, EntityId::new(10)); // highest weight
}

#[test]
fn user_context_detects_cold_start() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());

    // No preference vector -> cold start.
    let ctx = UserContext::load(
        EntityId::new(1), &user_state, &iw_ledger, &|_| None, Timestamp::now(),
    );
    assert!(ctx.is_cold_start);

    // With preference vector -> not cold start.
    let pref = PreferenceVector::from_embedding(vec![0.1; 16], 16).unwrap();
    let ctx2 = UserContext::load(
        EntityId::new(1), &user_state, &iw_ledger,
        &|_| Some(pref.clone()), Timestamp::now(),
    );
    assert!(!ctx2.is_cold_start);
}

#[test]
fn user_context_is_following() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);
    user_state.add_follow(user, EntityId::new(10));

    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());

    assert!(ctx.is_following(EntityId::new(10)));
    assert!(!ctx.is_following(EntityId::new(20)));
}

#[test]
fn user_context_interaction_weight_lookup() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let user = EntityId::new(1);
    let ts = Timestamp::now();

    iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);

    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);

    assert!(ctx.interaction_weight(EntityId::new(10)) > 4.0);
    assert!((ctx.interaction_weight(EntityId::new(99)) - 0.0).abs() < f64::EPSILON);
}

#[test]
fn parse_for_user_clause() {
    // This test depends on the actual parser implementation.
    // The expected behavior:
    let query = "RETRIEVE items FOR USER @42 USING PROFILE for_you LIMIT 50";
    let parsed = parse_retrieve(query).unwrap();
    assert_eq!(parsed.user_id, Some(EntityId::new(42)));
}

#[test]
fn parse_similar_to_clause() {
    let query = "RETRIEVE items SIMILAR TO @100 FOR USER @42 USING PROFILE related LIMIT 10";
    let parsed = parse_retrieve(query).unwrap();
    assert_eq!(parsed.user_id, Some(EntityId::new(42)));
    assert_eq!(parsed.similar_to, Some(EntityId::new(100)));
}

#[test]
fn parse_without_for_user() {
    let query = "RETRIEVE items USING PROFILE trending LIMIT 25";
    let parsed = parse_retrieve(query).unwrap();
    assert!(parsed.user_id.is_none());
}
```

### Property Tests

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn user_context_follows_set_matches_user_state(
        follow_ids in proptest::collection::hash_set(1u64..100, 0..20),
    ) {
        let user_state = UserStateIndex::new();
        let user = EntityId::new(1);
        for &cid in &follow_ids {
            user_state.add_follow(user, EntityId::new(cid));
        }

        let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
        let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());

        prop_assert_eq!(ctx.followed_creators.len(), follow_ids.len());
        for &cid in &follow_ids {
            prop_assert!(ctx.followed_creators.contains(&cid),
                "followed creator {} not in context", cid);
        }
    }
}
```

## Acceptance Criteria

- [ ] `UserContext` struct with preference_vector, top_creators, followed/blocked/hidden sets
- [ ] `UserContext::load()` populates all fields from user state index and interaction weights
- [ ] `UserContext::is_cold_start` is true when there is no preference vector
- [ ] `UserContext::is_following()` checks followed creator set
- [ ] `UserContext::interaction_weight()` looks up decayed weight for creator
- [ ] `FOR USER @user_id` clause parsed by query parser
- [ ] `SIMILAR TO @item_id` clause parsed by query parser
- [ ] Query AST extended with `user_id: Option<EntityId>` and `similar_to: Option<EntityId>`
- [ ] RETRIEVE executor loads `UserContext` when `FOR USER` is present
- [ ] `UserStateIndex` extended with `blocked_creator_ids()` and `hidden_item_ids()` accessors
- [ ] Parsing gracefully handles missing `FOR USER` (returns `None`)
- [ ] Property test: follows set in context matches user state index
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All tests pass

## Research References

- [VISION.md](../../../../VISION.md) -- `FOR USER` clause in query language
- [USE_CASES.md](../../../../USE_CASES.md) -- All personalized surfaces require user context
- [ai-lookup/features/query-language.md](../../../../ai-lookup/features/query-language.md) -- Query language reference

## Implementation Notes

- `UserContext::load` reads from in-memory data structures only, with no storage I/O except for the preference vector. The user state index, interaction weights, and follows/blocks sets are all in memory. This keeps loading fast (< 1 ms).
- The preference vector is the only component that requires a storage read. The `pref_reader` closure abstracts this, allowing tests to inject mock preference vectors without storage setup.
- The `top_creators` field is limited to 50 entries, so `interaction_weight()` returns 0.0 for any creator outside the top 50. At M3 scale (200 creators per user), scanning all interaction weights for a user is fast. At larger scale, a sorted index may be needed.
- The `hidden_item_ids` accessor returns `HashSet<u32>` because `RoaringBitmap` uses `u32` keys. This matches the `UserStateIndex` implementation from m3p1 Task 03.
- The `SIMILAR TO` clause parser should be flexible in ordering: `RETRIEVE items SIMILAR TO @100 FOR USER @42 ...` and `RETRIEVE items FOR USER @42 SIMILAR TO @100 ...` should both work.
- Do NOT implement the personalized scoring logic in this task. This task delivers the context loading and query parsing. The scoring is done in Task 02.

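The note on `top_creators` describes a scan-then-truncate approach. A minimal sketch of that selection, under stated assumptions, is shown below. `top_n_creators` and `weight_of` are hypothetical helpers that operate on plain `(creator ID, weight)` pairs; they assume the weights have already been decayed and do not reflect how `InteractionWeightLedger::read_top_creators` is actually implemented.

```rust
/// Return the `n` heaviest (creator ID, weight) pairs, heaviest first.
/// Ties are broken by ascending creator ID so the output is deterministic.
fn top_n_creators(weights: &[(u64, f64)], n: usize) -> Vec<(u64, f64)> {
    let mut sorted = weights.to_vec();
    // `total_cmp` gives a total order on f64; comparing b to a sorts descending.
    sorted.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    sorted.truncate(n);
    sorted
}

/// Look up a creator's weight in the truncated list; 0.0 when absent.
fn weight_of(top: &[(u64, f64)], creator: u64) -> f64 {
    top.iter()
        .find(|(c, _)| *c == creator)
        .map_or(0.0, |(_, w)| *w)
}
```

A full sort costs O(k log k) for k creators, which is negligible at a few hundred entries per user. The sketch also makes the cutoff visible: a creator with real history that ranks below position `n` is indistinguishable from one with no history at all.
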