# Task 02: Personalized Profiles

## Context

**Milestone:** 3 -- Personalized Ranking
**Phase:** m3p3 -- Personalized Ranking Profiles
**Depends On:** Task 01 (`UserContext` with preference vector, interaction weights, follows), m2p3 (profile engine, `ProfileExecutor`, `RankingProfile`), m2p1 (vector index for ANN retrieval), m2p4 (diversity enforcement)
**Blocks:** Task 03 (Cold Start and Exploration needs `for_you` profile to inject exploration candidates)
**Complexity:** L

## Objective

Deliver four personalized ranking profiles: `for_you`, `following`, `related`, and `notification`. These profiles are registered as builtins in the `ProfileRegistry` alongside the existing M2 profiles (trending, hot, new, etc.). Each profile uses the `UserContext` from Task 01 to personalize candidate retrieval and scoring.

The `ProfileExecutor` is extended with a `score_with_context` method that accepts an optional `UserContext`. When user context is available, the executor applies personalization factors: preference match (cosine similarity between user and item embeddings), creator affinity (interaction weight boost), and social proof (engagement from followed creators).

These four profiles cover UC-01 (For You Feed), UC-04 (Following Feed), UC-05 (Related/Up Next), and UC-07 (Notifications).
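The fallback described above (personalized factors when context is available, population-level scoring otherwise) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the executor's actual code: `UserCtxSketch`, `preference_match_sketch`, `score_item`, and the 0.6/0.4 blend are hypothetical names and values chosen only to show how scoring might degrade gracefully when there is no user context or no preference vector.

```rust
/// Hypothetical stand-in for the real `UserContext` from Task 01.
pub struct UserCtxSketch {
    /// Unit-length preference vector; `None` for cold-start users.
    pub preference: Option<Vec<f32>>,
}

/// Dot product remapped from [-1, 1] to [0, 1].
/// Assumes both inputs are unit vectors; returns neutral 0.5 on dimension mismatch.
pub fn preference_match_sketch(pref: &[f32], item: &[f32]) -> f64 {
    if pref.len() != item.len() {
        return 0.5;
    }
    let dot: f64 = pref
        .iter()
        .zip(item)
        .map(|(&a, &b)| f64::from(a) * f64::from(b))
        .sum();
    (dot.clamp(-1.0, 1.0) + 1.0) / 2.0
}

/// Blend in the preference match only when a context with a preference
/// vector is available; otherwise return the population score unchanged.
/// The 0.6 / 0.4 split is an illustrative assumption, not a spec value.
pub fn score_item(population_score: f64, item: &[f32], ctx: Option<&UserCtxSketch>) -> f64 {
    match ctx.and_then(|c| c.preference.as_deref()) {
        Some(pref) => 0.6 * population_score + 0.4 * preference_match_sketch(pref, item),
        None => population_score,
    }
}
```

With a matching unit vector and a population score of 0.5, the sketch yields 0.6 * 0.5 + 0.4 * 1.0 = 0.7. Passing `None`, or a context without a preference vector, leaves the population score at 0.5.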
## Requirements

### for_you Profile

- Candidate strategy: ANN retrieval using user preference vector (top 200 candidates)
- Scoring formula: `preference_match * 0.4 + engagement_velocity * 0.3 + recency * 0.2 + creator_affinity * 0.1`
  - `preference_match`: cosine similarity between user preference vector and item embedding
  - `engagement_velocity`: normalized view + share velocity from signal ledger
  - `recency`: exponential decay from item age (half-life 48h)
  - `creator_affinity`: interaction weight between user and item's creator, normalized to [0, 1]
- Gate: completion_rate > 0.02 (filters very low quality)
- Diversity: max_per_creator from query, format_mix 0.6
- Exploration: 10% budget (injected by Task 03)

### following Profile

- Candidate strategy: relationship-based (items from followed creators only)
- Scoring: `created_at` DESC (chronological), with tiebreaker on `completion_rate`
- No engagement-based scoring -- chronological is the default for following feeds
- No diversity enforcement (creator identity IS the filter)
- No exploration budget

### related Profile

- Candidate strategy: ANN retrieval using source item embedding (top 100 candidates)
- Scoring: `item_similarity * 0.5 + preference_match * 0.3 + engagement * 0.2`
  - `item_similarity`: cosine between source item and candidate item embeddings
  - `preference_match`: cosine between user preference vector and candidate, if available
  - `engagement`: normalized population signals (view count, like count)
- Filter: exclude the source item itself
- Diversity: max_per_creator:2

### notification Profile

- Candidate strategy: scan recent items from followed creators (last 48h)
- Scoring: `relationship_strength * 0.6 + item_quality * 0.4`
  - `relationship_strength`: interaction weight between user and creator, normalized
  - `item_quality`: composite of view velocity + completion rate
- Filter: only items from followed creators, created within 48h
- Sort: descending by score

## Technical Design

### Module Structure

```
tidal/src/
  ranking/
    personalized.rs   -- Personalized scoring functions
    builtins.rs       -- Extended with new profile definitions
```

### Personalized Scoring Functions

```rust
// === ranking/personalized.rs ===

use crate::db::user_context::UserContext;
use crate::schema::{EntityId, Timestamp};

/// Compute the preference match score between a user and an item.
///
/// Cosine similarity is computed as a dot product, which assumes both
/// vectors are unit-normalized. The result is clamped to [-1.0, 1.0] and
/// remapped to [0.0, 1.0].
/// Returns 0.5 (neutral) if either vector is unavailable.
pub fn preference_match(
    user_ctx: &UserContext,
    item_embedding: Option<&[f32]>,
) -> f64 {
    match (&user_ctx.preference_vector, item_embedding) {
        (Some(pref), Some(item)) => {
            if let Some(pref_data) = pref.as_slice() {
                if pref_data.len() == item.len() {
                    let cosine: f64 = pref_data.iter()
                        .zip(item.iter())
                        .map(|(&a, &b)| f64::from(a) * f64::from(b))
                        .sum();
                    // Clamp guards against non-unit inputs and rounding drift,
                    // then remap [-1, 1] to [0, 1].
                    (cosine.clamp(-1.0, 1.0) + 1.0) / 2.0
                } else {
                    0.5 // Dimension mismatch: neutral score
                }
            } else {
                0.5 // Cold start: neutral score
            }
        }
        _ => 0.5, // Missing data: neutral score
    }
}

/// Compute the creator affinity score for a user-creator pair.
///
/// Normalizes the interaction weight to [0.0, 1.0] using a sigmoid-like
/// transformation: `affinity = weight / (weight + k)` where k is a
/// half-saturation constant (default 5.0).
pub fn creator_affinity(
    user_ctx: &UserContext,
    creator_id: Option<EntityId>,
) -> f64 {
    const K: f64 = 5.0; // Half-saturation constant

    match creator_id {
        Some(cid) => {
            let weight = user_ctx.interaction_weight(cid);
            weight / (weight + K)
        }
        None => 0.0,
    }
}

/// Compute a recency score based on item age.
///
/// Uses exponential decay with a 48-hour half-life.
/// Items created at `now` get score 1.0; items 48h old get 0.5.
pub fn recency_score(
    created_at_ns: u64,
    now: Timestamp,
) -> f64 {
    let now_ns = now.as_nanos();
    // Items timestamped in the future are treated as brand new.
    if created_at_ns >= now_ns {
        return 1.0;
    }
    let age_secs = (now_ns - created_at_ns) as f64 / 1_000_000_000.0;
    let half_life_secs = 48.0 * 3600.0;
    let lambda = std::f64::consts::LN_2 / half_life_secs;
    (-lambda * age_secs).exp()
}

/// Composite for_you score for a single candidate.
pub fn for_you_score(
    pref_match: f64,
    engagement_vel: f64,
    recency: f64,
    affinity: f64,
) -> f64 {
    pref_match * 0.4 + engagement_vel * 0.3 + recency * 0.2 + affinity * 0.1
}

/// Composite related score for a single candidate.
pub fn related_score(
    item_similarity: f64,
    pref_match: f64,
    engagement: f64,
) -> f64 {
    item_similarity * 0.5 + pref_match * 0.3 + engagement * 0.2
}

/// Composite notification score for a single candidate.
pub fn notification_score(
    relationship_strength: f64,
    item_quality: f64,
) -> f64 {
    relationship_strength * 0.6 + item_quality * 0.4
}
```

### Profile Definitions

```rust
// === ranking/builtins.rs (extensions) ===

/// Register the personalized profiles.
pub fn register_personalized_builtins(registry: &mut ProfileRegistry) -> crate::Result<()> {
    // for_you
    registry.register(RankingProfile {
        name: "for_you".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Ann {
            slot: "user_preference".into(),
            limit: 200,
        },
        boosts: vec![],
        decay: None,
        gates: vec![Gate {
            signal: "completion".into(),
            agg: SignalAgg::Ratio,
            window: Window::AllTime,
            min_threshold: 0.02,
        }],
        penalties: vec![Penalty {
            signal: "skip".into(),
            agg: SignalAgg::Value,
            window: Window::TwentyFourHours,
            weight: 0.1,
        }],
        excludes: vec![],
        diversity: DiversitySpec {
            max_per_creator: Some(2),
            format_mix_max_fraction: Some(0.6),
        },
        exploration: 0.1, // 10%
        sort: None,       // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    // following
    registry.register(RankingProfile {
        name: "following".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Relationship,
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec::default(),
        exploration: 0.0,
        sort: Some(Sort::New), // Chronological
        is_builtin: true,
    })?;

    // related
    registry.register(RankingProfile {
        name: "related".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Ann {
            slot: "default".into(), // Source item embedding
            limit: 100,
        },
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec {
            max_per_creator: Some(2),
            format_mix_max_fraction: None,
        },
        exploration: 0.0,
        sort: None, // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    // notification
    registry.register(RankingProfile {
        name: "notification".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Relationship,
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec::default(),
        exploration: 0.0,
        sort: None, // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    Ok(())
}
```

### ProfileExecutor Extension

```rust
impl<'a> ProfileExecutor<'a> {
    /// Score candidates with user context for personalized ranking.
    ///
    /// When `user_ctx` is provided, the executor uses personalized scoring
    /// functions. The profile name determines which scoring formula is used:
    /// - `for_you`: preference match + engagement + recency + affinity
    /// - `related`: item similarity + preference match + engagement
    /// - `notification`: relationship strength + item quality
    /// - All others: delegates to `score()` (population-level)
    pub fn score_with_context(
        &self,
        candidates: &[EntityId],
        profile: &RankingProfile,
        now: Timestamp,
        user_ctx: &UserContext,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
        item_created_at: &dyn Fn(EntityId) -> Option<u64>,
    ) -> Vec<ScoredCandidate> {
        let profile_name = profile.name.as_str();

        let mut scored: Vec<ScoredCandidate> = candidates
            .iter()
            // Drop candidates that fail the profile's gates before scoring.
            .filter(|&&eid| passes_gates(eid, &profile.gates, self.ledger))
            .map(|&entity_id| {
                let raw = match profile_name {
                    "for_you" => self.score_for_you(entity_id, user_ctx, now, item_embeddings, item_created_at),
                    "related" => self.score_related(entity_id, user_ctx, item_embeddings),
                    "notification" => self.score_notification(entity_id, user_ctx),
                    _ => self.compute_raw_score(entity_id, profile, now),
                };
                ScoredCandidate {
                    entity_id,
                    score: raw,
                    signal_snapshot: vec![],
                    creator_id: None,
                    format: None,
                }
            })
            .collect();

        // Highest score first; NaN comparisons are treated as equal.
        scored.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
        normalize(&mut scored);
        scored
    }

    fn score_for_you(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
        now: Timestamp,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
        item_created_at: &dyn Fn(EntityId) -> Option<u64>,
    ) -> f64 {
        let item_emb = item_embeddings(entity_id);
        let pref_match = preference_match(user_ctx, item_emb.as_deref());

        let view_vel = read_agg(entity_id, "view", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        let share_vel = read_agg(entity_id, "share", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        // Shares count double; cap at 1.0 to keep the factor in [0, 1].
        let engagement_vel = (view_vel + 2.0 * share_vel).min(1.0);

        // Unknown creation time falls back to a neutral 0.5.
        let recency =
            item_created_at(entity_id).map_or(0.5, |ts| recency_score(ts, now));

        let creator_id = None; // Read from metadata in actual implementation
        let affinity = creator_affinity(user_ctx, creator_id);

        for_you_score(pref_match, engagement_vel, recency, affinity)
    }

    fn score_related(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
    ) -> f64 {
        let item_emb = item_embeddings(entity_id);
        let pref_match = preference_match(user_ctx, item_emb.as_deref());

        let views = read_agg(entity_id, "view", &SignalAgg::Value, Window::AllTime, self.ledger);
        let likes = read_agg(entity_id, "like", &SignalAgg::Value, Window::AllTime, self.ledger);
        // log10(0) is -inf, so floor each term at 0; cap the sum at 1.0 so the
        // composite stays in [0, 1].
        let engagement = ((views.log10().max(0.0) + likes.log10().max(0.0)) / 10.0).min(1.0);

        // item_similarity is computed by the caller from ANN distances.
        // For now, use preference match as a proxy.
        let item_similarity = pref_match;

        related_score(item_similarity, pref_match, engagement)
    }

    fn score_notification(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
    ) -> f64 {
        let creator_id = None; // Read from metadata
        let rel_strength = creator_affinity(user_ctx, creator_id);

        let view_vel = read_agg(entity_id, "view", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        let completion = read_agg(entity_id, "completion", &SignalAgg::DecayScore, Window::AllTime, self.ledger);
        // Clamp so the composite notification score stays in [0, 1].
        let item_quality = ((view_vel.log10().max(0.0) + completion) / 2.0).clamp(0.0, 1.0);

        notification_score(rel_strength, item_quality)
    }
}
```

## Test Strategy

### Unit Tests

```rust
use std::collections::HashSet;

#[test]
fn preference_match_identical_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        top_creators: vec![],
        followed_creators: HashSet::new(),
        blocked_creators: HashSet::new(),
        hidden_items: HashSet::new(),
        is_cold_start: false,
    };
    let item = [1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 1.0).abs() < 0.01, "identical vectors: {}", score);
}

#[test]
fn preference_match_orthogonal_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        ..cold_start_context()
    };
    let item = [0.0f32, 1.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.5).abs() < 0.01, "orthogonal: {}", score);
}

#[test]
fn preference_match_opposite_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        ..cold_start_context()
    };
    let item = [-1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.0).abs() < 0.01, "opposite: {}", score);
}

#[test]
fn preference_match_cold_start_returns_neutral() {
    let ctx = cold_start_context();
    let item = [1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.5).abs() < f64::EPSILON);
}

#[test]
fn creator_affinity_zero_for_no_interaction() {
    let ctx = cold_start_context();
    let score = creator_affinity(&ctx, Some(EntityId::new(10)));
    assert!((score - 0.0).abs() < f64::EPSILON);
}

#[test]
fn creator_affinity_saturates() {
    let ctx = UserContext {
        user_id: EntityId::new(1),
        top_creators: vec![(EntityId::new(10), 100.0)],
        ..cold_start_context()
    };
    let score = creator_affinity(&ctx, Some(EntityId::new(10)));
    // weight=100, k=5: 100/(100+5) = 0.952
    assert!(score > 0.9, "high affinity: {}", score);
}

#[test]
fn recency_score_now_is_one() {
    let now = Timestamp::now();
    let score = recency_score(now.as_nanos(), now);
    assert!((score - 1.0).abs() < 0.01);
}

#[test]
fn recency_score_48h_is_half() {
    let now = Timestamp::now();
    let forty_eight_hours_ago = now.as_nanos() - 48 * 3600 * 1_000_000_000;
    let score = recency_score(forty_eight_hours_ago, now);
    assert!((score - 0.5).abs() < 0.05, "48h recency: {}", score);
}

#[test]
fn for_you_score_range() {
    let score = for_you_score(1.0, 1.0, 1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);

    let score_zero = for_you_score(0.0, 0.0, 0.0, 0.0);
    assert!((score_zero - 0.0).abs() < f64::EPSILON);
}

#[test]
fn related_score_range() {
    let score = related_score(1.0, 1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);
}

#[test]
fn notification_score_range() {
    let score = notification_score(1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);
}

/// Shared fixture: a user with no preference vector and no interactions.
fn cold_start_context() -> UserContext {
    UserContext {
        user_id: EntityId::new(1),
        preference_vector: None,
        top_creators: vec![],
        followed_creators: HashSet::new(),
        blocked_creators: HashSet::new(),
        hidden_items: HashSet::new(),
        is_cold_start: true,
    }
}
```

### Property Tests

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn preference_match_always_in_unit_range(
        pref_vec in proptest::collection::vec(-1.0f32..1.0, 16),
        item_vec in proptest::collection::vec(-1.0f32..1.0, 16),
    ) {
        if let Some(pref) = PreferenceVector::from_embedding(pref_vec, 16) {
            let ctx = UserContext {
                user_id: EntityId::new(1),
                preference_vector: Some(pref),
                ..cold_start_context()
            };
            let score = preference_match(&ctx, Some(&item_vec));
            prop_assert!(score >= 0.0 && score <= 1.0,
                "preference match out of range: {}", score);
        }
    }

    #[test]
    fn for_you_score_always_in_unit_range(
        pm in 0.0f64..1.0,
        ev in 0.0f64..1.0,
        r in 0.0f64..1.0,
        a in 0.0f64..1.0,
    ) {
        let score = for_you_score(pm, ev, r, a);
        prop_assert!(score >= 0.0 && score <= 1.0,
            "for_you score out of range: {}", score);
    }

    #[test]
    fn creator_affinity_always_in_unit_range(
        weight in 0.0f64..1000.0,
    ) {
        let ctx = UserContext {
            user_id: EntityId::new(1),
            top_creators: vec![(EntityId::new(10), weight)],
            ..cold_start_context()
        };
        let score = creator_affinity(&ctx, Some(EntityId::new(10)));
        prop_assert!(score >= 0.0 && score <= 1.0,
            "creator affinity out of range: {}", score);
    }
}
```

## Acceptance Criteria

- [ ] `preference_match` returns cosine similarity remapped to [0, 1], neutral 0.5 for missing data
- [ ] `creator_affinity`
  returns sigmoid-normalized interaction weight in [0, 1]
- [ ] `recency_score` returns exponential decay with 48h half-life
- [ ] `for_you_score` combines four factors with weights summing to 1.0
- [ ] `related_score` combines three factors with weights summing to 1.0
- [ ] `notification_score` combines two factors with weights summing to 1.0
- [ ] All scoring functions return values in [0.0, 1.0] (property tested)
- [ ] `for_you` profile registered as builtin with correct configuration
- [ ] `following` profile registered with `Sort::New` and `CandidateStrategy::Relationship`
- [ ] `related` profile registered with ANN candidate strategy
- [ ] `notification` profile registered with relationship-based candidates
- [ ] `ProfileExecutor::score_with_context` dispatches to correct scoring function by profile name
- [ ] Cold-start users get neutral scores (0.5 preference match, 0.0 affinity)
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All tests pass

## Research References

- [docs/research/ann_for_tidaldb.md](../../../research/ann_for_tidaldb.md) -- Cosine similarity via dot product on unit vectors
- [VISION.md](../../../../VISION.md) -- Ranking profile formulas
- [USE_CASES.md](../../../../USE_CASES.md) -- UC-01, UC-04, UC-05, UC-07

## Implementation Notes

- The scoring functions are intentionally simple linear combinations. The weights (0.4/0.3/0.2/0.1 for for_you) are starting points that can be tuned without code changes if the profile system is extended to accept configurable weights. For M3, hardcoded weights are sufficient.
- `creator_affinity` uses a sigmoid-like `w/(w+k)` transformation instead of raw weight. This bounds the output to [0, 1] and prevents high-weight creators from completely dominating the score. The half-saturation constant `k=5.0` means a weight of 5 produces affinity 0.5.
- For the `related` profile, the `item_similarity` should ideally come from the ANN distance between the source item and candidate item. In this task, we use preference match as a proxy. The full implementation should pipe ANN distances through from the candidate retrieval phase.
- The `following` profile uses `Sort::New` from the existing sort system. No custom scoring is needed -- the executor's existing `score_by_sort` handles chronological ordering.
- The `notification` profile's `CandidateStrategy::Relationship` means candidates are sourced from followed creators' items. The RETRIEVE executor must implement this candidate sourcing strategy, which uses the `FollowsBitmap` from m3p1 Task 03.
- Do NOT implement the exploration budget injection in this task. The `exploration: 0.1` field on the `for_you` profile is defined here but not enforced. Enforcement is done in Task 03 (Cold Start and Exploration).
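
The first two notes above (tunable weights and the half-saturation transform) can be illustrated with a small standalone sketch. It is not part of the M3 scope, which keeps the weights hardcoded. `ForYouWeights`, `is_valid`, and `saturating_affinity` are hypothetical names introduced here only to show what a configurable variant might look like and how `w/(w+k)` behaves.

```rust
/// Hypothetical configurable weights for the `for_you` formula.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ForYouWeights {
    pub preference_match: f64,
    pub engagement_velocity: f64,
    pub recency: f64,
    pub creator_affinity: f64,
}

impl Default for ForYouWeights {
    /// The hardcoded M3 weights: 0.4 / 0.3 / 0.2 / 0.1.
    fn default() -> Self {
        Self {
            preference_match: 0.4,
            engagement_velocity: 0.3,
            recency: 0.2,
            creator_affinity: 0.1,
        }
    }
}

impl ForYouWeights {
    /// True when every weight is non-negative and the weights sum to 1.0
    /// (within a small tolerance). Under that condition the composite score
    /// stays in [0, 1] whenever each factor is in [0, 1].
    pub fn is_valid(&self) -> bool {
        let parts = [
            self.preference_match,
            self.engagement_velocity,
            self.recency,
            self.creator_affinity,
        ];
        parts.iter().all(|w| *w >= 0.0) && (parts.iter().sum::<f64>() - 1.0).abs() < 1e-9
    }

    /// Weighted linear combination of the four factors.
    pub fn score(&self, pref_match: f64, engagement_vel: f64, recency: f64, affinity: f64) -> f64 {
        pref_match * self.preference_match
            + engagement_vel * self.engagement_velocity
            + recency * self.recency
            + affinity * self.creator_affinity
    }
}

/// Half-saturation transform `w / (w + k)`.
/// Assumes `weight >= 0` and `k > 0`; the result then lies in [0, 1).
pub fn saturating_affinity(weight: f64, k: f64) -> f64 {
    weight / (weight + k)
}
```

With `k = 5.0`, a weight of 5 maps to 0.5, a weight of 20 maps to 0.8, and a weight of 100 maps to roughly 0.952, so large interaction weights approach but never reach 1.0.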