jordan 39ada28c6e feat: complete Milestones 2–4 — RETRIEVE query, vector index, ranking profiles, diversity, entity system, sessions

M2: RETRIEVE query pipeline with 5-stage execution (candidate → filter → score → diversify → limit),
    usearch HNSW vector index, bitmap/range/universe filters, ranking profiles with signal scoring,
    MMR diversity enforcement, and m2_uat integration tests.

M3: Entity system with typed metadata, relationship graph (follows/blocks/interactions),
    creator entities, session tracking, and m3_uat integration tests.

M4: Advanced ranking with builtin functions (freshness, trending, controversy, wilson),
    ranking executor with explain mode, query executor integration, benchmarks for
    query/ranking/vector/filters/diversity, and m4_uat integration tests.

Includes: 9 new blog posts, marketing site updates, updated roadmap, and updated vision doc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-21 16:24:48 -07:00

21 KiB


Task 02: Personalized Profiles

Context

Milestone: 3 -- Personalized Ranking
Phase: m3p3 -- Personalized Ranking Profiles
Depends On: Task 01 (UserContext with preference vector, interaction weights, follows), m2p3 (profile engine, ProfileExecutor, RankingProfile), m2p1 (vector index for ANN retrieval), m2p4 (diversity enforcement)
Blocks: Task 03 (Cold Start and Exploration needs the for_you profile to inject exploration candidates)
Complexity: L

Objective

Deliver four personalized ranking profiles: for_you, following, related, and notification. These profiles are registered as builtins in the ProfileRegistry alongside the existing M2 profiles (trending, hot, new, etc.). Each profile uses the UserContext from Task 01 to personalize candidate retrieval and scoring.

The ProfileExecutor is extended with a score_with_context method that accepts an optional UserContext. When user context is available, the executor applies personalization factors: preference match (cosine similarity between user and item embeddings), creator affinity (interaction weight boost), and social proof (engagement from followed creators).

These four profiles cover UC-01 (For You Feed), UC-04 (Following Feed), UC-05 (Related/Up Next), and UC-07 (Notifications).

Requirements

for_you Profile

Candidate strategy: ANN retrieval using user preference vector (top 200 candidates)
Scoring formula: preference_match * 0.4 + engagement_velocity * 0.3 + recency * 0.2 + creator_affinity * 0.1
preference_match: cosine similarity between user preference vector and item embedding
engagement_velocity: 24h view velocity plus twice the 24h share velocity, read from the signal ledger and clamped to 1.0
recency: exponential decay from item age (half-life 48h)
creator_affinity: interaction weight between user and item's creator, normalized to [0, 1]
Gate: completion_rate > 0.02 (filters very low quality)
Diversity: max_per_creator from query, format_mix 0.6
Exploration: 10% budget (injected by Task 03)
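
The for_you formula can be checked end to end with the minimal sketch below. It assumes that every input arrives as a plain value instead of being read from UserContext and the signal ledger, and `ForYouInputs` and `for_you_from_raw` are hypothetical names. Each factor is normalized to [0, 1] as described above:

- cosine remapped via (c + 1) / 2, with a neutral 0.5 when it is undefined;
- shares weighted 2x and clamped;
- a 48h half-life decay;
- w / (w + 5) saturation for affinity.

The factors are then combined with the 0.4/0.3/0.2/0.1 weights.

```rust
/// Raw per-candidate inputs (hypothetical struct; the real path reads these
/// from UserContext and the signal ledger).
pub struct ForYouInputs<'a> {
    pub user_pref: &'a [f32],
    pub item_emb: &'a [f32],
    pub view_velocity: f64,  // assumed pre-scaled so ~1.0 means "high"
    pub share_velocity: f64, // same scale as view_velocity
    pub age_hours: f64,
    pub creator_weight: f64, // raw user-to-creator interaction weight
}

/// Cosine similarity, or None when dimensions differ or a vector is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    let denom = (na * nb).sqrt();
    if denom > 0.0 { Some((dot / denom).clamp(-1.0, 1.0)) } else { None }
}

pub fn for_you_from_raw(inp: &ForYouInputs) -> f64 {
    // preference_match: cosine remapped to [0, 1]; neutral 0.5 when undefined.
    let pref_match = cosine(inp.user_pref, inp.item_emb).map_or(0.5, |c| (c + 1.0) / 2.0);
    // engagement_velocity: shares count double, clamped to [0, 1].
    let engagement = (inp.view_velocity + 2.0 * inp.share_velocity).clamp(0.0, 1.0);
    // recency: 0.5^(age / 48h), i.e. exponential decay with a 48h half-life.
    let recency = 0.5f64.powf(inp.age_hours.max(0.0) / 48.0);
    // creator_affinity: saturating w / (w + 5).
    let w = inp.creator_weight.max(0.0);
    let affinity = w / (w + 5.0);
    pref_match * 0.4 + engagement * 0.3 + recency * 0.2 + affinity * 0.1
}
```

Consider an item that matches the user's taste perfectly: view velocity 0.2, share velocity 0.1, 48h old, and a creator weight of 5. It scores 0.4 + 0.12 + 0.1 + 0.05 = 0.67.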

following Profile

Candidate strategy: relationship-based (items from followed creators only)
Scoring: created_at DESC (chronological), with tiebreaker on completion_rate
No engagement-based scoring -- chronological is the default for following feeds
No diversity enforcement (creator identity IS the filter)
No exploration budget
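
The intended following-feed order can be sketched as follows: newest first, with ties on created_at broken by higher completion_rate. The `FeedItem` struct and `following_feed` function are hypothetical, and a plain `HashSet` of creator ids stands in for the follows data. The profile definition delegates ordering to Sort::New, and this section does not say whether Sort::New applies the completion_rate tiebreaker.

```rust
use std::collections::HashSet;

/// Minimal item record for this sketch (hypothetical; not the tidal schema).
#[derive(Debug, Clone, PartialEq)]
pub struct FeedItem {
    pub id: u64,
    pub creator_id: u64,
    pub created_at_ns: u64,
    pub completion_rate: f64,
}

/// Keep only items from followed creators, then order newest first,
/// breaking timestamp ties by higher completion rate. No diversity step.
pub fn following_feed(items: &[FeedItem], followed: &HashSet<u64>) -> Vec<FeedItem> {
    let mut feed: Vec<FeedItem> = items
        .iter()
        .filter(|it| followed.contains(&it.creator_id))
        .cloned()
        .collect();
    feed.sort_by(|a, b| {
        b.created_at_ns
            .cmp(&a.created_at_ns)
            .then_with(|| b.completion_rate.total_cmp(&a.completion_rate))
    });
    feed
}
```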

related Profile

Candidate strategy: ANN retrieval using source item embedding (top 100 candidates)
Scoring: item_similarity * 0.5 + preference_match * 0.3 + engagement * 0.2
item_similarity: cosine between source item and candidate item embeddings
preference_match: cosine between user preference vector and candidate, if available
engagement: normalized population signals (view count, like count)
Filter: exclude the source item itself
Diversity: max_per_creator 2
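
The related pipeline can be sketched on already-retrieved candidates, assuming the three factors are precomputed in [0, 1]. The `RelatedCandidate` struct and `related_feed` function are hypothetical. The sketch works in three steps:

1. Drop the source item.
2. Apply the 0.5/0.3/0.2 weights.
3. Walk the list in score order and keep at most `max_per_creator` items per creator.

This greedy cap is a simplification for illustration. It is not the m2p4 diversity enforcement.

```rust
use std::collections::HashMap;

/// ANN candidate with precomputed factors (hypothetical shape for this sketch).
#[derive(Debug, Clone)]
pub struct RelatedCandidate {
    pub id: u64,
    pub creator_id: u64,
    pub item_similarity: f64, // [0, 1], e.g. derived from ANN distance
    pub pref_match: f64,      // [0, 1], 0.5 when the user has no vector
    pub engagement: f64,      // [0, 1], normalized population signals
}

/// Exclude the source item, score, sort descending, then cap per creator.
pub fn related_feed(
    source_id: u64,
    candidates: &[RelatedCandidate],
    max_per_creator: usize,
    limit: usize,
) -> Vec<(u64, f64)> {
    let mut scored: Vec<(&RelatedCandidate, f64)> = candidates
        .iter()
        .filter(|c| c.id != source_id)
        .map(|c| (c, c.item_similarity * 0.5 + c.pref_match * 0.3 + c.engagement * 0.2))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));

    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut out = Vec::new();
    for (c, score) in scored {
        if out.len() >= limit {
            break;
        }
        let count = per_creator.entry(c.creator_id).or_insert(0);
        if *count < max_per_creator {
            *count += 1;
            out.push((c.id, score));
        }
    }
    out
}
```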

notification Profile

Candidate strategy: scan recent items from followed creators (last 48h)
Scoring: relationship_strength * 0.6 + item_quality * 0.4
relationship_strength: interaction weight between user and creator, normalized
item_quality: composite of view velocity + completion rate
Filter: only items from followed creators, created within 48h
Sort: descending by score
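
The notification steps above (filter to followed creators, keep the last 48h, score, sort descending) can be sketched with plain collections. A `HashSet` stands in for the follows bitmap, and `RecentItem` and `notification_candidates` are hypothetical names. Two assumptions go beyond the requirements:

- relationship_strength uses the same w / (w + 5) saturation as creator_affinity, because the requirements only say "normalized";
- item_quality is a precomputed [0, 1] input.

```rust
use std::collections::{HashMap, HashSet};

/// 48 hours in nanoseconds.
const WINDOW_NS: u64 = 48 * 3600 * 1_000_000_000;

/// Minimal item record for this sketch (hypothetical).
#[derive(Debug, Clone)]
pub struct RecentItem {
    pub id: u64,
    pub creator_id: u64,
    pub created_at_ns: u64,
    pub item_quality: f64, // assumed precomputed in [0, 1]
}

pub fn notification_candidates(
    items: &[RecentItem],
    followed: &HashSet<u64>,
    interaction_weights: &HashMap<u64, f64>,
    now_ns: u64,
) -> Vec<(u64, f64)> {
    let cutoff = now_ns.saturating_sub(WINDOW_NS);
    let mut scored: Vec<(u64, f64)> = items
        .iter()
        // Only followed creators, and only items created within the window.
        .filter(|it| followed.contains(&it.creator_id) && it.created_at_ns >= cutoff)
        .map(|it| {
            let w = interaction_weights.get(&it.creator_id).copied().unwrap_or(0.0).max(0.0);
            let relationship_strength = w / (w + 5.0);
            (it.id, relationship_strength * 0.6 + it.item_quality * 0.4)
        })
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```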

Technical Design

Module Structure

tidal/src/
  ranking/
    personalized.rs -- Personalized scoring functions
    builtins.rs     -- Extended with new profile definitions

Personalized Scoring Functions

// === ranking/personalized.rs ===

use crate::db::user_context::UserContext;
use crate::schema::{EntityId, Timestamp};

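/// Compute the preference match score between a user and an item.
///
/// Computes cosine similarity (dot product divided by both L2 norms), which
/// lies in [-1.0, 1.0], and remaps it to [0.0, 1.0]. Dividing by the norms
/// keeps the result in range even if an embedding is not unit-normalized.
/// Returns 0.5 (neutral) if either vector is unavailable, the dimensions
/// differ, or either vector has zero norm.
pub fn preference_match(
    user_ctx: &UserContext,
    item_embedding: Option<&[f32]>,
) -> f64 {
    const NEUTRAL: f64 = 0.5;

    let (Some(pref), Some(item)) = (&user_ctx.preference_vector, item_embedding) else {
        return NEUTRAL; // Missing data: neutral score
    };
    let Some(pref_data) = pref.as_slice() else {
        return NEUTRAL; // Cold start: neutral score
    };
    if pref_data.len() != item.len() {
        return NEUTRAL; // Dimension mismatch: neutral score
    }

    let (mut dot, mut pref_sq, mut item_sq) = (0.0f64, 0.0f64, 0.0f64);
    for (&a, &b) in pref_data.iter().zip(item.iter()) {
        let (a, b) = (f64::from(a), f64::from(b));
        dot += a * b;
        pref_sq += a * a;
        item_sq += b * b;
    }
    let denom = (pref_sq * item_sq).sqrt();
    if denom <= 0.0 {
        return NEUTRAL; // Zero vector: cosine is undefined
    }

    // Clamp guards against rounding pushing the ratio slightly past +/-1.
    let cosine = (dot / denom).clamp(-1.0, 1.0);
    // Remap [-1, 1] to [0, 1].
    (cosine + 1.0) / 2.0
}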

/// Compute the creator affinity score for a user-creator pair.
///
/// Normalizes the interaction weight to [0.0, 1.0) using a sigmoid-like
/// transformation: `affinity = weight / (weight + k)` where k is a
/// half-saturation constant (default 5.0), so a weight of k maps to 0.5.
/// Negative weights are clamped to 0.0. This keeps the output non-negative
/// and avoids a division by zero at `weight == -k`.
pub fn creator_affinity(
    user_ctx: &UserContext,
    creator_id: Option<EntityId>,
) -> f64 {
    const K: f64 = 5.0; // Half-saturation constant
    match creator_id {
        Some(cid) => {
            let weight = user_ctx.interaction_weight(cid).max(0.0);
            weight / (weight + K)
        }
        None => 0.0, // Unknown creator: no affinity
    }
}

/// Compute a recency score based on item age.
///
/// Uses exponential decay with a 48-hour half-life.
/// Items created at `now` get score 1.0; items 48h old get 0.5.
pub fn recency_score(
    created_at_ns: u64,
    now: Timestamp,
) -> f64 {
    let now_ns = now.as_nanos();
    if created_at_ns >= now_ns {
        return 1.0;
    }
    let age_secs = (now_ns - created_at_ns) as f64 / 1_000_000_000.0;
    let half_life_secs = 48.0 * 3600.0;
    let lambda = std::f64::consts::LN_2 / half_life_secs;
    (-lambda * age_secs).exp()
}

/// Composite for_you score for a single candidate.
pub fn for_you_score(
    pref_match: f64,
    engagement_vel: f64,
    recency: f64,
    affinity: f64,
) -> f64 {
    pref_match * 0.4 + engagement_vel * 0.3 + recency * 0.2 + affinity * 0.1
}

/// Composite related score for a single candidate.
pub fn related_score(
    item_similarity: f64,
    pref_match: f64,
    engagement: f64,
) -> f64 {
    item_similarity * 0.5 + pref_match * 0.3 + engagement * 0.2
}

/// Composite notification score for a single candidate.
pub fn notification_score(
    relationship_strength: f64,
    item_quality: f64,
) -> f64 {
    relationship_strength * 0.6 + item_quality * 0.4
}

Profile Definitions

// === ranking/builtins.rs (extensions) ===

/// Register the personalized profiles.
pub fn register_personalized_builtins(registry: &mut ProfileRegistry) -> crate::Result<()> {
    // for_you
    registry.register(RankingProfile {
        name: "for_you".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Ann {
            slot: "user_preference".into(),
            limit: 200,
        },
        boosts: vec![],
        decay: None,
        gates: vec![Gate {
            signal: "completion".into(),
            agg: SignalAgg::Ratio,
            window: Window::AllTime,
            min_threshold: 0.02,
        }],
        penalties: vec![Penalty {
            signal: "skip".into(),
            agg: SignalAgg::Value,
            window: Window::TwentyFourHours,
            weight: 0.1,
        }],
        excludes: vec![],
        diversity: DiversitySpec {
            max_per_creator: Some(2),
            format_mix_max_fraction: Some(0.6),
        },
        exploration: 0.1, // 10%
        sort: None, // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    // following
    registry.register(RankingProfile {
        name: "following".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Relationship,
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec::default(),
        exploration: 0.0,
        sort: Some(Sort::New), // Chronological
        is_builtin: true,
    })?;

    // related
    registry.register(RankingProfile {
        name: "related".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Ann {
            slot: "default".into(), // Source item embedding
            limit: 100,
        },
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec {
            max_per_creator: Some(2),
            format_mix_max_fraction: None,
        },
        exploration: 0.0,
        sort: None, // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    // notification
    registry.register(RankingProfile {
        name: "notification".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Relationship,
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec::default(),
        exploration: 0.0,
        sort: None, // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    Ok(())
}

ProfileExecutor Extension

impl<'a> ProfileExecutor<'a> {
    /// Score candidates with an optional user context for personalized ranking.
    ///
    /// When `user_ctx` is `Some`, the profile name determines which scoring
    /// formula is used:
    /// - `for_you`: preference match + engagement + recency + affinity
    /// - `related`: item similarity + preference match + engagement
    /// - `notification`: relationship strength + item quality
    ///
    /// Any other profile, or any call without a user context, falls back to
    /// the population-level `compute_raw_score`.
    pub fn score_with_context(
        &self,
        candidates: &[EntityId],
        profile: &RankingProfile,
        now: Timestamp,
        user_ctx: Option<&UserContext>,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
        item_created_at: &dyn Fn(EntityId) -> Option<u64>,
    ) -> Vec<ScoredCandidate> {
        let profile_name = profile.name.as_str();

        let mut scored: Vec<ScoredCandidate> = candidates
            .iter()
            .filter(|&&eid| passes_gates(eid, &profile.gates, self.ledger))
            .map(|&entity_id| {
                let raw = match (profile_name, user_ctx) {
                    ("for_you", Some(ctx)) => {
                        self.score_for_you(entity_id, ctx, now, item_embeddings, item_created_at)
                    }
                    ("related", Some(ctx)) => self.score_related(entity_id, ctx, item_embeddings),
                    ("notification", Some(ctx)) => self.score_notification(entity_id, ctx),
                    _ => self.compute_raw_score(entity_id, profile, now),
                };
                ScoredCandidate {
                    entity_id,
                    score: raw,
                    signal_snapshot: vec![],
                    creator_id: None, // TODO: populate from item metadata (needed by max_per_creator)
                    format: None,
                }
            })
            .collect();

        // Highest score first. total_cmp is a total order, so a stray NaN
        // cannot make the comparator inconsistent.
        scored.sort_by(|a, b| b.score.total_cmp(&a.score));
        normalize(&mut scored);
        scored
    }

    fn score_for_you(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
        now: Timestamp,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
        item_created_at: &dyn Fn(EntityId) -> Option<u64>,
    ) -> f64 {
        let item_emb = item_embeddings(entity_id);
        let pref_match = preference_match(user_ctx, item_emb.as_deref());

        // Shares count double; clamp so the factor stays in [0, 1].
        let view_vel = read_agg(entity_id, "view", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        let share_vel = read_agg(entity_id, "share", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        let engagement_vel = (view_vel + 2.0 * share_vel).clamp(0.0, 1.0);

        // Items with no known creation time get a neutral recency.
        let recency = item_created_at(entity_id)
            .map_or(0.5, |ts| recency_score(ts, now));

        // TODO: read the creator from item metadata; until then affinity is 0.0.
        let creator_id: Option<EntityId> = None;
        let affinity = creator_affinity(user_ctx, creator_id);

        for_you_score(pref_match, engagement_vel, recency, affinity)
    }

    fn score_related(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
    ) -> f64 {
        let item_emb = item_embeddings(entity_id);
        let pref_match = preference_match(user_ctx, item_emb.as_deref());

        let views = read_agg(entity_id, "view", &SignalAgg::Value, Window::AllTime, self.ledger);
        let likes = read_agg(entity_id, "like", &SignalAgg::Value, Window::AllTime, self.ledger);
        // log10 compresses counts; counts below 1 contribute 0, and the cap keeps
        // the factor in [0, 1] (10^5 views plus 10^5 likes saturates it).
        let engagement = ((views.log10().max(0.0) + likes.log10().max(0.0)) / 10.0).min(1.0);

        // item_similarity should come from the ANN distances computed during
        // candidate retrieval. For now, use preference match as a proxy.
        let item_similarity = pref_match;

        related_score(item_similarity, pref_match, engagement)
    }

    fn score_notification(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
    ) -> f64 {
        // TODO: read the creator from item metadata; until then this is 0.0.
        let creator_id: Option<EntityId> = None;
        let rel_strength = creator_affinity(user_ctx, creator_id);

        let view_vel = read_agg(entity_id, "view", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        // Completion *rate*, matching the Ratio aggregation used by the for_you gate.
        let completion = read_agg(entity_id, "completion", &SignalAgg::Ratio, Window::AllTime, self.ledger);
        // Clamp: log10 of a high view velocity can exceed 1.
        let item_quality = ((view_vel.log10().max(0.0) + completion) / 2.0).clamp(0.0, 1.0);

        notification_score(rel_strength, item_quality)
    }
}
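
The dispatch rule described for score_with_context is easy to misread, so this minimal model states it as data. A personalized formula applies only when the profile name matches and a user context is present. Everything else, including the following profile (which relies on Sort::New), takes the population path. `ScoringPath` and `choose_path` are hypothetical names used only to illustrate the rule.

```rust
/// Scoring path chosen for a candidate (hypothetical enum for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum ScoringPath {
    ForYou,
    Related,
    Notification,
    Population,
}

/// Personalized formulas require both a matching name and a user context.
pub fn choose_path(profile_name: &str, has_user_ctx: bool) -> ScoringPath {
    match (profile_name, has_user_ctx) {
        ("for_you", true) => ScoringPath::ForYou,
        ("related", true) => ScoringPath::Related,
        ("notification", true) => ScoringPath::Notification,
        // Unknown profiles, builtins like "following", or no context.
        _ => ScoringPath::Population,
    }
}
```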

Test Strategy

Unit Tests

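// Assumes these tests sit in a #[cfg(test)] module inside personalized.rs;
// PreferenceVector must also be imported from wherever it is defined.
use super::*;
use std::collections::HashSet;

#[test]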
fn preference_match_identical_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        top_creators: vec![],
        followed_creators: HashSet::new(),
        blocked_creators: HashSet::new(),
        hidden_items: HashSet::new(),
        is_cold_start: false,
    };
    let item = [1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 1.0).abs() < 0.01, "identical vectors: {}", score);
}

#[test]
fn preference_match_orthogonal_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        ..cold_start_context()
    };
    let item = [0.0f32, 1.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.5).abs() < 0.01, "orthogonal: {}", score);
}

#[test]
fn preference_match_opposite_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        ..cold_start_context()
    };
    let item = [-1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.0).abs() < 0.01, "opposite: {}", score);
}

#[test]
fn preference_match_cold_start_returns_neutral() {
    let ctx = cold_start_context();
    let item = [1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.5).abs() < f64::EPSILON);
}

#[test]
fn creator_affinity_zero_for_no_interaction() {
    let ctx = cold_start_context();
    let score = creator_affinity(&ctx, Some(EntityId::new(10)));
    assert!((score - 0.0).abs() < f64::EPSILON);
}

#[test]
fn creator_affinity_saturates() {
    let ctx = UserContext {
        user_id: EntityId::new(1),
        top_creators: vec![(EntityId::new(10), 100.0)],
        ..cold_start_context()
    };
    let score = creator_affinity(&ctx, Some(EntityId::new(10)));
    // weight=100, k=5: 100/(100+5) = 0.952
    assert!(score > 0.9, "high affinity: {}", score);
}

#[test]
fn recency_score_now_is_one() {
    let now = Timestamp::now();
    let score = recency_score(now.as_nanos(), now);
    assert!((score - 1.0).abs() < 0.01);
}

#[test]
fn recency_score_48h_is_half() {
    let now = Timestamp::now();
    let forty_eight_hours_ago = now.as_nanos() - 48 * 3600 * 1_000_000_000;
    let score = recency_score(forty_eight_hours_ago, now);
    assert!((score - 0.5).abs() < 0.05, "48h recency: {}", score);
}

#[test]
fn for_you_score_range() {
    let score = for_you_score(1.0, 1.0, 1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);
    let score_zero = for_you_score(0.0, 0.0, 0.0, 0.0);
    assert!((score_zero - 0.0).abs() < f64::EPSILON);
}

#[test]
fn related_score_range() {
    let score = related_score(1.0, 1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);
}

#[test]
fn notification_score_range() {
    let score = notification_score(1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);
}

fn cold_start_context() -> UserContext {
    UserContext {
        user_id: EntityId::new(1),
        preference_vector: None,
        top_creators: vec![],
        followed_creators: HashSet::new(),
        blocked_creators: HashSet::new(),
        hidden_items: HashSet::new(),
        is_cold_start: true,
    }
}

Property Tests

use proptest::prelude::*;

proptest! {
    #[test]
    fn preference_match_always_in_unit_range(
        pref_vec in proptest::collection::vec(-1.0f32..1.0, 16),
        item_vec in proptest::collection::vec(-1.0f32..1.0, 16),
    ) {
        if let Some(pref) = PreferenceVector::from_embedding(pref_vec, 16) {
            let ctx = UserContext {
                user_id: EntityId::new(1),
                preference_vector: Some(pref),
                ..cold_start_context()
            };
            let score = preference_match(&ctx, Some(&item_vec));
            prop_assert!(score >= 0.0 && score <= 1.0,
                "preference match out of range: {}", score);
        }
    }

    #[test]
    fn for_you_score_always_in_unit_range(
        pm in 0.0f64..1.0,
        ev in 0.0f64..1.0,
        r in 0.0f64..1.0,
        a in 0.0f64..1.0,
    ) {
        let score = for_you_score(pm, ev, r, a);
        prop_assert!(score >= 0.0 && score <= 1.0,
            "for_you score out of range: {}", score);
    }

    #[test]
    fn creator_affinity_always_in_unit_range(
        weight in 0.0f64..1000.0,
    ) {
        let ctx = UserContext {
            user_id: EntityId::new(1),
            top_creators: vec![(EntityId::new(10), weight)],
            ..cold_start_context()
        };
        let score = creator_affinity(&ctx, Some(EntityId::new(10)));
        prop_assert!(score >= 0.0 && score <= 1.0,
            "creator affinity out of range: {}", score);
    }
}

Acceptance Criteria

preference_match returns cosine similarity remapped to [0, 1], neutral 0.5 for missing data
creator_affinity returns sigmoid-normalized interaction weight in [0, 1]
recency_score returns exponential decay with 48h half-life
for_you_score combines four factors with weights summing to 1.0
related_score combines three factors with weights summing to 1.0
notification_score combines two factors with weights summing to 1.0
All scoring functions return values in [0.0, 1.0] (property tested)
for_you profile registered as builtin with correct configuration
following profile registered with Sort::New and CandidateStrategy::Relationship
related profile registered with ANN candidate strategy
notification profile registered with relationship-based candidates
ProfileExecutor::score_with_context dispatches to correct scoring function by profile name
Cold-start users get neutral scores (0.5 preference match, 0.0 affinity)
cargo clippy -- -D warnings passes
All tests pass
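
The "weights summing to 1.0" criteria could become a runtime check if weights are ever made configurable, as the implementation notes suggest. This is a minimal sketch assuming a hypothetical `ScoreWeights` type that is not part of the current profile system. Non-negative weights that sum to 1.0 guarantee that a combination of [0, 1] factors stays in [0, 1].

```rust
/// Linear-combination weights for a personalized profile (hypothetical type).
#[derive(Debug, Clone)]
pub struct ScoreWeights {
    pub name: &'static str,
    pub weights: Vec<f64>,
}

impl ScoreWeights {
    /// Accept only finite, non-negative weights that sum to 1.0 within `tol`.
    pub fn validated(name: &'static str, weights: Vec<f64>, tol: f64) -> Result<Self, String> {
        if weights.iter().any(|w| !w.is_finite() || *w < 0.0) {
            return Err(format!("{name}: weights must be finite and non-negative"));
        }
        let sum: f64 = weights.iter().sum();
        if (sum - 1.0).abs() > tol {
            return Err(format!("{name}: weights sum to {sum}, expected 1.0"));
        }
        Ok(Self { name, weights })
    }

    /// Weighted sum of factor values; panics if the factor count differs.
    pub fn combine(&self, factors: &[f64]) -> f64 {
        assert_eq!(factors.len(), self.weights.len(), "factor count mismatch");
        self.weights.iter().zip(factors).map(|(w, f)| w * f).sum()
    }
}
```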

Research References

docs/research/ann_for_tidaldb.md -- Cosine similarity via dot product on unit vectors
VISION.md -- Ranking profile formulas
USE_CASES.md -- UC-01, UC-04, UC-05, UC-07
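
The research note's shortcut ("cosine similarity via dot product on unit vectors") holds only when both vectors are L2-normalized. The small standalone sketch below illustrates it: after normalization, a plain dot product equals the full cosine. The helper names are hypothetical. The sketch shows why a bare dot product is a valid preference_match only if embeddings are stored unit-normalized.

```rust
/// L2-normalize a vector; returns None for the zero vector.
pub fn l2_normalize(v: &[f32]) -> Option<Vec<f32>> {
    let norm = v.iter().map(|&x| f64::from(x) * f64::from(x)).sum::<f64>().sqrt();
    if norm == 0.0 {
        return None;
    }
    Some(v.iter().map(|&x| (f64::from(x) / norm) as f32).collect())
}

/// Plain dot product accumulated in f64.
pub fn dot(a: &[f32], b: &[f32]) -> f64 {
    a.iter().zip(b).map(|(&x, &y)| f64::from(x) * f64::from(y)).sum()
}

/// Full cosine similarity; None if either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> Option<f64> {
    let denom = (dot(a, a) * dot(b, b)).sqrt();
    (denom > 0.0).then(|| dot(a, b) / denom)
}
```

For example, [3, 4] and [4, 3] have a dot product of 24 but a cosine of 24 / 25 = 0.96. Their normalized forms [0.6, 0.8] and [0.8, 0.6] give a dot product of 0.96 directly.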

Implementation Notes

The scoring functions are intentionally simple linear combinations. The weights (0.4/0.3/0.2/0.1 for for_you) are starting points that can be tuned without code changes if the profile system is extended to accept configurable weights. For M3, hardcoded weights are sufficient.
creator_affinity uses a sigmoid-like w/(w+k) transformation instead of raw weight. This bounds the output to [0, 1] and prevents high-weight creators from completely dominating the score. The half-saturation constant k=5.0 means a weight of 5 produces affinity 0.5.
For the related profile, the item_similarity should ideally come from the ANN distance between the source item and candidate item. In this task, we use preference match as a proxy. The full implementation should pipe ANN distances through from the candidate retrieval phase.
The profile named following uses Sort::New from the existing sort system. No custom scoring is needed -- the executor's existing score_by_sort handles chronological ordering.
The notification profile's CandidateStrategy::Relationship means candidates are sourced from followed creators' items. The RETRIEVE executor must implement this candidate sourcing strategy, which uses the FollowsBitmap from m3p1 Task 03.
Do NOT implement the exploration budget injection in this task. The exploration: 0.1 field on the for_you profile is defined here but not enforced. Enforcement is done in Task 03 (Cold Start and Exploration).
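
The ANN distances could be piped into item_similarity for the related profile. This sketch assumes the vector index reports cosine distance as d = 1 - cos, in the range [0, 2]; check the index's metric definition before relying on it. Under that assumption, the [0, 1] scale that preference_match uses, (cos + 1) / 2, becomes 1 - d / 2. `similarity_from_cosine_distance` and `with_similarity` are hypothetical helper names.

```rust
/// Convert a cosine distance (assumed d = 1 - cos, range [0, 2]) into the
/// [0, 1] similarity scale used by preference_match: (cos + 1) / 2 = 1 - d / 2.
pub fn similarity_from_cosine_distance(distance: f32) -> f64 {
    // Clamp guards against float drift slightly outside [0, 2].
    let d = f64::from(distance).clamp(0.0, 2.0);
    1.0 - d / 2.0
}

/// Attach similarities to ANN hits given as (id, distance) pairs.
pub fn with_similarity(hits: &[(u64, f32)]) -> Vec<(u64, f64)> {
    hits.iter()
        .map(|&(id, d)| (id, similarity_from_cosine_distance(d)))
        .collect()
}
```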
