jordan 39ada28c6e feat: complete Milestones 2–4 — RETRIEVE query, vector index, ranking profiles, diversity, entity system, sessions

M2: RETRIEVE query pipeline with 5-stage execution (candidate → filter → score → diversify → limit),
    usearch HNSW vector index, bitmap/range/universe filters, ranking profiles with signal scoring,
    MMR diversity enforcement, and m2_uat integration tests.

M3: Entity system with typed metadata, relationship graph (follows/blocks/interactions),
    creator entities, session tracking, and m3_uat integration tests.

M4: Advanced ranking with builtin functions (freshness, trending, controversy, wilson),
    ranking executor with explain mode, query executor integration, benchmarks for
    query/ranking/vector/filters/diversity, and m4_uat integration tests.

Includes: 9 new blog posts, marketing site updates, updated roadmap, and updated vision doc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-21 16:24:48 -07:00


Task 01: FOR USER Query Context

Context

Milestone: 3 -- Personalized Ranking
Phase: m3p3 -- Personalized Ranking Profiles
Depends On: m3p2 (feedback loop: UserStateIndex, InteractionWeightLedger, preference vectors populated), m2p5 (query parser, RETRIEVE executor)
Blocks: Task 02 (Personalized Profiles need UserContext for scoring), Task 03 (Cold Start needs UserContext to detect cold-start state), m3p4 (User State Filters need FOR USER to resolve user state)
Complexity: M

Objective

Deliver the FOR USER @user_id clause in the query parser and the UserContext struct that loads all user state needed for personalized ranking. When a RETRIEVE query includes FOR USER @42, the executor loads user 42's preference vector, interaction weights, followed creators, and blocked state into a UserContext, and passes it to the profile executor for personalized scoring.

The UserContext is the bridge between the query language and the personalization engine. Without it, profiles cannot access user state. With it, the same profile definition can produce different rankings for different users.

Requirements

FOR USER @user_id clause parsed by the query parser
UserContext struct with all user state needed for personalized ranking
UserContext::load(user_id, user_state, interaction_weights, pref_reader, now) loads all state
UserContext contains: preference vector, top-N interaction weights, followed creator set, blocked creator set, hidden item set
UserContext::is_cold_start is true if the user has no preference vector, or if the stored vector itself reports cold start
Query parser produces Option<EntityId> for the user_id from FOR USER
RETRIEVE executor passes UserContext to ProfileExecutor when available
ProfileExecutor::score_with_context() accepts optional UserContext
SIMILAR TO @item_id clause parsed for the related profile
Query AST extended with user_id: Option<EntityId> and similar_to: Option<EntityId>

Technical Design

Module Structure

tidal/src/
  db/
    user_context.rs -- UserContext struct, load logic
  query/
    mod.rs          -- Extended query AST with user_id, similar_to

UserContext

// === db/user_context.rs ===

use std::collections::HashSet;
use crate::schema::EntityId;
use crate::entities::preference::PreferenceVector;
use crate::entities::interaction::InteractionWeightLedger;
use crate::entities::user_state::UserStateIndex;

/// All user state needed for personalized ranking in a single query.
///
/// Loaded once per query from the user state index, interaction weight
/// ledger, and preference vector storage. Passed to the profile executor
/// for personalized scoring.
///
/// If the user does not exist or has no state, fields are empty/None.
/// The profile executor handles this gracefully by falling back to
/// population-level signals (cold-start path).
#[derive(Debug, Clone)]
pub struct UserContext {
    /// The user performing the query.
    pub user_id: EntityId,

    /// User preference embedding for ANN retrieval and scoring.
    /// `None` for cold-start users.
    pub preference_vector: Option<PreferenceVector>,

    /// Top creators by interaction weight (descending).
    /// Used for social proof and creator-affinity scoring.
    /// Typically top-50 creators.
    pub top_creators: Vec<(EntityId, f64)>,

    /// Set of creator IDs the user follows.
    /// Used by the `following` profile and exploration budget.
    pub followed_creators: HashSet<u64>,

    /// Set of creator IDs the user has blocked.
    /// Used for hard exclusion filtering.
    pub blocked_creators: HashSet<u64>,

    /// Set of item IDs the user has hidden.
    /// Used for hard exclusion filtering.
    pub hidden_items: HashSet<u32>,

    /// Whether this user is a cold-start user: no preference vector,
    /// or a preference vector that itself reports cold start.
    pub is_cold_start: bool,
}

impl UserContext {
    /// Load user context from all state sources.
    ///
    /// This is called once per RETRIEVE/SEARCH query that includes
    /// `FOR USER @user_id`. It reads from in-memory indexes (fast).
    ///
    /// # Parameters
    ///
    /// - `user_id`: the querying user
    /// - `user_state`: the global user state index (seen, blocked, follows)
    /// - `interaction_weights`: the interaction weight ledger
    /// - `pref_reader`: closure to read the preference vector from storage
    /// - `now`: current timestamp for decay computation
    pub fn load(
        user_id: EntityId,
        user_state: &UserStateIndex,
        interaction_weights: &InteractionWeightLedger,
        pref_reader: &dyn Fn(EntityId) -> Option<PreferenceVector>,
        now: crate::schema::Timestamp,
    ) -> Self {
        let preference_vector = pref_reader(user_id);
        let is_cold_start = preference_vector.as_ref()
            .map_or(true, |p| p.is_cold_start());

        let top_creators = interaction_weights.read_top_creators(user_id, 50, now);

        let followed = user_state.followed_creators(user_id);
        let followed_creators: HashSet<u64> = followed.iter()
            .map(|e| e.as_u64())
            .collect();

        // Read blocked state.
        let blocked_creators = user_state.blocked_creator_ids(user_id);
        let hidden_items = user_state.hidden_item_ids(user_id);

        Self {
            user_id,
            preference_vector,
            top_creators,
            followed_creators,
            blocked_creators,
            hidden_items,
            is_cold_start,
        }
    }

    /// Check if a creator is followed by this user.
    pub fn is_following(&self, creator_id: EntityId) -> bool {
        self.followed_creators.contains(&creator_id.as_u64())
    }

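    /// Get the interaction weight for a specific creator.
    ///
    /// Only the loaded `top_creators` are searched, so this returns 0.0
    /// both when there is no interaction history and when the creator
    /// falls outside the top-N cutoff.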
    pub fn interaction_weight(&self, creator_id: EntityId) -> f64 {
        self.top_creators.iter()
            .find(|(c, _)| *c == creator_id)
            .map_or(0.0, |(_, w)| *w)
    }
}
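
The blocked and hidden sets are described as inputs to hard exclusion filtering. A minimal sketch of that filter follows. `Candidate` and `Exclusions` are hypothetical stand-ins introduced only for illustration (the real candidate type is not defined in this task, and `Exclusions` mirrors just the two relevant `UserContext` fields).

```rust
use std::collections::HashSet;

/// Hypothetical candidate shape, assumed for this sketch only.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub item_id: u32,
    pub creator_id: u64,
}

/// Stand-in for the exclusion-related fields of `UserContext`.
pub struct Exclusions {
    pub blocked_creators: HashSet<u64>,
    pub hidden_items: HashSet<u32>,
}

/// Drop every candidate whose creator is blocked or whose item is hidden.
/// The relative order of the surviving candidates is preserved.
pub fn apply_hard_exclusions(candidates: Vec<Candidate>, ex: &Exclusions) -> Vec<Candidate> {
    candidates
        .into_iter()
        .filter(|c| {
            !ex.blocked_creators.contains(&c.creator_id)
                && !ex.hidden_items.contains(&c.item_id)
        })
        .collect()
}
```

Both checks are O(1) set lookups, so the filter stays linear in the number of candidates.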

Query Parser Extension

// Extensions to the query AST in query/mod.rs

/// Parsed RETRIEVE query.
#[derive(Debug, Clone)]
pub struct RetrieveQuery {
    /// Entity type to retrieve (always "items" for now).
    pub entity_type: String,
    /// Optional user context: `FOR USER @user_id`
    pub user_id: Option<EntityId>,
    /// Optional source item for related queries: `SIMILAR TO @item_id`
    pub similar_to: Option<EntityId>,
    /// Ranking profile name: `USING PROFILE <name>`
    pub profile: Option<String>,
    /// Filter expressions: `FILTER <conditions>`
    pub filters: Vec<FilterExpr>,
    /// Diversity constraints: `DIVERSITY <constraints>`
    pub diversity: Option<DiversityConstraints>,
    /// Result limit: `LIMIT <n>`
    pub limit: Option<usize>,
    /// Excluded IDs: `EXCLUDE [ids]`
    pub excludes: Vec<EntityId>,
}

/// Parse the `FOR USER @<id>` clause.
///
/// Returns `Some(EntityId)` if the clause is present.
fn parse_for_user(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
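    // Look for "FOR" "USER" "@<numeric_id>": three tokens at pos, pos + 1, pos + 2.
    // The bound is `pos + 2` so the clause also parses when it ends the query.
    if *pos + 2 < tokens.len()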
        && tokens[*pos].is_keyword("FOR")
        && tokens[*pos + 1].is_keyword("USER")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}

/// Parse the `SIMILAR TO @<id>` clause.
///
/// Returns `Some(EntityId)` if the clause is present.
fn parse_similar_to(tokens: &[Token], pos: &mut usize) -> Option<EntityId> {
    // Three tokens are read (pos, pos + 1, pos + 2), so the bound is `pos + 2`.
    if *pos + 2 < tokens.len()
        && tokens[*pos].is_keyword("SIMILAR")
        && tokens[*pos + 1].is_keyword("TO")
        && tokens[*pos + 2].is_at_prefix()
    {
        let id_str = tokens[*pos + 2].strip_at_prefix();
        if let Ok(id) = id_str.parse::<u64>() {
            *pos += 3;
            return Some(EntityId::new(id));
        }
    }
    None
}
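
Both clause orders are expected to parse (`SIMILAR TO ... FOR USER ...` and `FOR USER ... SIMILAR TO ...`). One way to get that is a loop that tries each clause parser at the current position. The sketch below is not the project's parser: it uses plain `&str` tokens instead of `Token`, raw `u64` instead of `EntityId`, and assumes keywords match case-insensitively. `Clauses`, `parse_at_clause`, and `parse_clauses` are hypothetical names.

```rust
/// Hypothetical, simplified result of parsing the two optional clauses.
#[derive(Debug, Default, PartialEq)]
pub struct Clauses {
    pub user_id: Option<u64>,
    pub similar_to: Option<u64>,
}

/// Try to parse `<kw1> <kw2> @<id>` at `pos`; advance `pos` only on success.
fn parse_at_clause(tokens: &[&str], pos: &mut usize, kw1: &str, kw2: &str) -> Option<u64> {
    // Exactly three tokens are needed; `get` returns None if fewer remain.
    let window = tokens.get(*pos..*pos + 3)?;
    if !window[0].eq_ignore_ascii_case(kw1) || !window[1].eq_ignore_ascii_case(kw2) {
        return None;
    }
    let id = window[2].strip_prefix('@')?.parse::<u64>().ok()?;
    *pos += 3;
    Some(id)
}

/// Scan the token stream, accepting the two clauses in either order.
pub fn parse_clauses(tokens: &[&str]) -> Clauses {
    let mut out = Clauses::default();
    let mut pos = 0;
    while pos < tokens.len() {
        if let Some(id) = parse_at_clause(tokens, &mut pos, "FOR", "USER") {
            out.user_id = Some(id);
        } else if let Some(id) = parse_at_clause(tokens, &mut pos, "SIMILAR", "TO") {
            out.similar_to = Some(id);
        } else {
            // Not one of these two clauses; skip the token.
            pos += 1;
        }
    }
    out
}
```

A real parser would dispatch the remaining clauses (`USING PROFILE`, `FILTER`, `LIMIT`, and so on) from the same loop rather than skipping unknown tokens.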

Executor Integration

// In the RETRIEVE executor pipeline:

impl TidalDb {
    pub fn retrieve(&self, query: &str) -> crate::Result<Vec<ScoredCandidate>> {
        let parsed = parse_retrieve(query)?;

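        // Capture the query time once so context loading and scoring use the same `now`.
        let now = Timestamp::now();

        // Load user context if FOR USER is present.
        // A failed preference read is treated as "no preference vector" (cold start).
        let user_context = parsed.user_id.map(|uid| {
            UserContext::load(
                uid,
                &self.user_state,
                &self.interaction_weights,
                &|id| self.read_user_preference(id).ok().flatten(),
                now,
            )
        });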

        // ... candidate retrieval, filtering, scoring ...
        // Pass user_context to the profile executor.
        let scored = match &user_context {
            Some(ctx) => executor.score_with_context(candidates, profile, now, ctx),
            None => executor.score(candidates, profile, now),
        };
        // ...
    }
}

UserStateIndex Extensions

// Extensions needed on UserStateIndex (from m3p1 Task 03):

impl UserStateIndex {
    /// Get the set of blocked creator IDs for a user.
    pub fn blocked_creator_ids(&self, user_id: EntityId) -> HashSet<u64> {
        self.blocked
            .get(&user_id.as_u64())
            .map_or_else(HashSet::new, |s| s.blocked_creators.clone())
    }

    /// Get the set of hidden item IDs for a user (as u32 values).
    pub fn hidden_item_ids(&self, user_id: EntityId) -> HashSet<u32> {
        self.blocked
            .get(&user_id.as_u64())
            .map_or_else(HashSet::new, |s| {
                s.hidden_items.iter().collect()
            })
    }
}
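
Hidden item IDs are stored as `u32` while entity IDs elsewhere in this design are `u64`, so a membership check needs a narrowing conversion. The sketch below shows one way to do it without silent truncation. `is_hidden` is a hypothetical helper, and treating IDs above `u32::MAX` as "not hidden" is an assumption.

```rust
use std::collections::HashSet;
use std::convert::TryFrom;

/// Check whether a 64-bit item ID is present in a set keyed by 32-bit IDs.
/// An ID that does not fit in `u32` cannot be in the set, so it is
/// reported as not hidden instead of being truncated with `as u32`.
pub fn is_hidden(hidden_items: &HashSet<u32>, item_id: u64) -> bool {
    u32::try_from(item_id).map_or(false, |id| hidden_items.contains(&id))
}
```

A plain `item_id as u32` cast would wrap large IDs onto small ones and could hide unrelated items.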

Test Strategy

Unit Tests

#[test]
fn user_context_loads_empty_for_unknown_user() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());

    let ctx = UserContext::load(
        EntityId::new(999),
        &user_state,
        &iw_ledger,
        &|_| None,
        Timestamp::now(),
    );

    assert!(ctx.is_cold_start);
    assert!(ctx.preference_vector.is_none());
    assert!(ctx.followed_creators.is_empty());
    assert!(ctx.blocked_creators.is_empty());
    assert!(ctx.top_creators.is_empty());
}

#[test]
fn user_context_loads_follows_and_blocks() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);

    user_state.add_follow(user, EntityId::new(10));
    user_state.add_follow(user, EntityId::new(20));
    user_state.add_block(user, EntityId::new(77));

    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(
        user, &user_state, &iw_ledger, &|_| None, Timestamp::now(),
    );

    assert_eq!(ctx.followed_creators.len(), 2);
    assert!(ctx.followed_creators.contains(&10));
    assert!(ctx.followed_creators.contains(&20));
    assert!(ctx.blocked_creators.contains(&77));
}

#[test]
fn user_context_loads_interaction_weights() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let user = EntityId::new(1);
    let ts = Timestamp::now();

    iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);
    iw_ledger.update_weight(user, EntityId::new(20), 3.0, ts);

    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);

    assert_eq!(ctx.top_creators.len(), 2);
    assert_eq!(ctx.top_creators[0].0, EntityId::new(10)); // highest weight
}

#[test]
fn user_context_detects_cold_start() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());

    // No preference vector -> cold start.
    let ctx = UserContext::load(
        EntityId::new(1), &user_state, &iw_ledger, &|_| None, Timestamp::now(),
    );
    assert!(ctx.is_cold_start);

    // With preference vector -> not cold start.
    let pref = PreferenceVector::from_embedding(vec![0.1; 16], 16).unwrap();
    let ctx2 = UserContext::load(
        EntityId::new(1), &user_state, &iw_ledger,
        &|_| Some(pref.clone()), Timestamp::now(),
    );
    assert!(!ctx2.is_cold_start);
}

#[test]
fn user_context_is_following() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);
    user_state.add_follow(user, EntityId::new(10));

    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());

    assert!(ctx.is_following(EntityId::new(10)));
    assert!(!ctx.is_following(EntityId::new(20)));
}

#[test]
fn user_context_interaction_weight_lookup() {
    let user_state = UserStateIndex::new();
    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let user = EntityId::new(1);
    let ts = Timestamp::now();

    iw_ledger.update_weight(user, EntityId::new(10), 5.0, ts);

    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, ts);

    assert!(ctx.interaction_weight(EntityId::new(10)) > 4.0);
    assert!((ctx.interaction_weight(EntityId::new(99)) - 0.0).abs() < f64::EPSILON);
}

#[test]
fn parse_for_user_clause() {
    // This test depends on the actual parser implementation.
    // The expected behavior:
    let query = "RETRIEVE items FOR USER @42 USING PROFILE for_you LIMIT 50";
    let parsed = parse_retrieve(query).unwrap();
    assert_eq!(parsed.user_id, Some(EntityId::new(42)));
}

#[test]
fn parse_similar_to_clause() {
    let query = "RETRIEVE items SIMILAR TO @100 FOR USER @42 USING PROFILE related LIMIT 10";
    let parsed = parse_retrieve(query).unwrap();
    assert_eq!(parsed.user_id, Some(EntityId::new(42)));
    assert_eq!(parsed.similar_to, Some(EntityId::new(100)));
}

#[test]
fn parse_without_for_user() {
    let query = "RETRIEVE items USING PROFILE trending LIMIT 25";
    let parsed = parse_retrieve(query).unwrap();
    assert!(parsed.user_id.is_none());
}

Property Tests

use proptest::prelude::*;

proptest! {
    #[test]
    fn user_context_follows_set_matches_user_state(
        follow_ids in proptest::collection::hash_set(1u64..100, 0..20),
    ) {
        let user_state = UserStateIndex::new();
        let user = EntityId::new(1);
        for &cid in &follow_ids {
            user_state.add_follow(user, EntityId::new(cid));
        }

        let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
        let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());

        prop_assert_eq!(ctx.followed_creators.len(), follow_ids.len());
        for &cid in &follow_ids {
            prop_assert!(ctx.followed_creators.contains(&cid),
                "followed creator {} not in context", cid);
        }
    }
}

Acceptance Criteria

UserContext struct with preference_vector, top_creators, followed/blocked/hidden sets
UserContext::load() populates all fields from user state index and interaction weights
UserContext::is_cold_start is true when there is no preference vector
UserContext::is_following() checks followed creator set
UserContext::interaction_weight() looks up decayed weight for creator
FOR USER @user_id clause parsed by query parser
SIMILAR TO @item_id clause parsed by query parser
Query AST extended with user_id: Option<EntityId> and similar_to: Option<EntityId>
RETRIEVE executor loads UserContext when FOR USER is present
UserStateIndex extended with blocked_creator_ids() and hidden_item_ids() accessors
Parsing gracefully handles missing FOR USER (returns None)
Property test: follows set in context matches user state index
cargo clippy -- -D warnings passes
All tests pass

Research References

VISION.md -- FOR USER clause in query language
USE_CASES.md -- All personalized surfaces require user context
ai-lookup/features/query-language.md -- Query language reference

Implementation Notes

UserContext::load reads only in-memory data structures, with one exception: the preference vector. The user state index, interaction weights, and follows/blocks sets are all in memory, which should keep loading fast (target: < 1 ms).
The preference vector is the only component that requires a storage read. The pref_reader closure abstracts this, allowing tests to inject mock preference vectors without storage setup.
The top_creators field is limited to 50 entries. At M3 scale (200 creators per user), scanning all interaction weights for a user is fast. At larger scale, a sorted index may be needed.
The hidden_item_ids accessor returns HashSet<u32> because RoaringBitmap uses u32 keys. This matches the UserStateIndex implementation from m3p1 Task 03.
The SIMILAR TO clause parser should be flexible in ordering: RETRIEVE items SIMILAR TO @100 FOR USER @42 ... and RETRIEVE items FOR USER @42 SIMILAR TO @100 ... should both work.
Do NOT implement the personalized scoring logic in this task. This task delivers the context loading and query parsing. The scoring is done in Task 02.
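
The top-50 cutoff for top_creators amounts to a top-N selection over a user's interaction weights. The sketch below shows one way to do this without fully sorting every weight. It is not the ledger's actual `read_top_creators` implementation: `top_n_by_weight` is a hypothetical helper, creator IDs are raw `u64`, and weights are assumed to be already decayed.

```rust
/// Return the `n` highest-weight `(creator_id, weight)` pairs, sorted by
/// weight in descending order.
pub fn top_n_by_weight(mut weights: Vec<(u64, f64)>, n: usize) -> Vec<(u64, f64)> {
    if n == 0 {
        return Vec::new();
    }
    if weights.len() > n {
        // Partition so the n largest weights occupy the first n slots
        // (in arbitrary order), then discard the rest.
        weights.select_nth_unstable_by(n - 1, |a, b| b.1.total_cmp(&a.1));
        weights.truncate(n);
    }
    // Only the survivors (at most n entries) are fully sorted.
    weights.sort_by(|a, b| b.1.total_cmp(&a.1));
    weights
}
```

Selection is linear on average in the number of weights, and only the `n` kept entries pay the sorting cost. `f64::total_cmp` gives a total order, so NaN weights cannot cause a panic in the comparator.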
