# Task 03: Cold Start and Exploration

## Context

**Milestone:** 3 -- Personalized Ranking

**Phase:** m3p3 -- Personalized Ranking Profiles

**Depends On:** Task 02 (Personalized Profiles: `for_you` profile with `exploration: 0.1`), Task 01 (`UserContext::is_cold_start`), m2p4 (diversity enforcement, `DiversitySelector`)

**Blocks:** m3p4 (User State Filters + UAT need complete `for_you` behavior)

**Complexity:** M

## Objective

Deliver cold-start handling for new users and new items, plus the exploration budget injection for the `for_you` profile. These mechanisms prevent the personalization system from collapsing into a filter bubble or failing silently for users/items with no history.

**Cold-start users**: A new user with no engagement history has no preference vector. The `for_you` profile falls back to population-level signals: trending, quality (completion rate), and recency. This ensures new users see a reasonable feed on their first visit.

**Cold-start items**: New items with no signal history have no engagement data. Without intervention, they would never appear in personalized feeds, creating a chicken-and-egg problem. The exploration window gives new items a brief period (configurable, default 24h) where they are eligible for inclusion in the exploration budget of `for_you` feeds.

**Exploration budget**: The `for_you` profile reserves 10% of its result set (e.g., 5 of 50 items) for exploration candidates: items from creators the user does NOT follow. This prevents the feed from becoming a closed loop of familiar content and exposes users to new creators.
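The budget arithmetic described above can be sketched in a few lines. This is a minimal illustration, assuming the slot count is rounded up with `ceil` (as the acceptance criteria in this task state); `exploration_slots` is a hypothetical helper name, and the clamp on the fraction is an added safety assumption, not something the task specifies.

```rust
/// Hypothetical helper (illustration only, not project code): split a result
/// limit into (exploration slots, personalized slots).
fn exploration_slots(total_limit: usize, exploration_fraction: f64) -> (usize, usize) {
    // Assumption: clamp the fraction into [0, 1] so a misconfigured value
    // can never reserve more slots than the limit allows.
    let fraction = exploration_fraction.clamp(0.0, 1.0);

    // Round up, so any non-zero fraction reserves at least one slot.
    let exploration = ((total_limit as f64) * fraction).ceil() as usize;

    // Whatever is left goes to the personalized pipeline.
    let personalized = total_limit.saturating_sub(exploration);

    (exploration, personalized)
}
```

With `LIMIT 50` and a fraction of `0.1`, this yields 5 exploration slots and 45 personalized slots. Because of the rounding, a small limit such as 1 with any positive fraction gives the single slot to exploration.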
## Requirements

### Cold-Start User Fallback

- When `UserContext::is_cold_start` is true, `for_you` scoring uses population-level signals only
- Fallback formula: `trending_velocity * 0.4 + completion_rate * 0.3 + recency * 0.3`
- No preference match (no vector to compare against)
- No creator affinity (no interaction history)
- Candidate strategy: full corpus scan sorted by trending (not ANN, since no query vector)
- Diversity constraints still apply

### Cold-Start Item Exploration Window

- `ExplorationWindow` struct: items created within the last N hours (default 24h) with fewer than M signals (default 10) are eligible for exploration
- Eligible items are added to an `exploration_pool` bitmap
- The exploration budget draws from this pool when filling exploration slots
- After the exploration window expires, items must earn organic engagement to appear
- The window is configurable via `ExplorationConfig`

### Exploration Budget

- The `for_you` profile has `exploration: 0.1` (10%)
- For `LIMIT 50`, 5 items come from the exploration budget
- Exploration candidates are items NOT from followed creators
- Selection from the exploration pool: random shuffle, then score by population signals
- Exploration candidates are placed at positions throughout the result set (interleaved), not clustered at the end
- The remaining 90% (45 items) come from the standard personalized scoring

## Technical Design

### Module Structure

```
tidal/src/
  ranking/
    exploration.rs    -- ExplorationConfig, ExplorationWindow, budget injection
```

### ExplorationConfig

```rust
// === ranking/exploration.rs ===

use std::time::Duration;

/// Configuration for cold-start and exploration behavior.
#[derive(Debug, Clone)]
pub struct ExplorationConfig {
    /// How long new items are eligible for exploration.
    /// Default: 24 hours.
    pub item_window: Duration,

    /// Signal-count threshold for an item to be considered "cold start".
    /// Items with this many signals or more are no longer in the exploration pool.
    /// Default: 10.
    pub max_signals_for_cold_item: u64,

    /// Fraction of results reserved for exploration.
    /// Default: 0.1 (10%).
    pub exploration_fraction: f64,
}

impl Default for ExplorationConfig {
    fn default() -> Self {
        Self {
            item_window: Duration::from_secs(24 * 3600),
            max_signals_for_cold_item: 10,
            exploration_fraction: 0.1,
        }
    }
}
```

### Cold-Start User Scoring

```rust
use crate::db::user_context::UserContext;
use crate::schema::{EntityId, Timestamp, Window};
use crate::signals::SignalLedger;
// `SignalAgg`, `read_agg`, `recency_score`, and `ScoredCandidate` are assumed
// to be in scope from the existing ranking modules.

/// Score a candidate for a cold-start user.
///
/// Uses population-level signals only: trending velocity, completion rate,
/// and recency. No personalization factors.
pub fn cold_start_score(
    entity_id: EntityId,
    now: Timestamp,
    ledger: &SignalLedger,
    item_created_at: &dyn Fn(EntityId) -> Option<u64>,
) -> f64 {
    let view_vel = read_agg(entity_id, "view", &SignalAgg::Velocity, Window::TwentyFourHours, ledger);
    let share_vel = read_agg(entity_id, "share", &SignalAgg::Velocity, Window::TwentyFourHours, ledger);
    let trending = (view_vel + 2.0 * share_vel).min(1.0);

    let completion = read_agg(entity_id, "completion", &SignalAgg::DecayScore, Window::AllTime, ledger);
    let completion_rate = completion.min(1.0);

    // Items with an unknown creation time get a neutral recency of 0.5.
    let recency = item_created_at(entity_id)
        .map_or(0.5, |ts| recency_score(ts, now));

    trending * 0.4 + completion_rate * 0.3 + recency * 0.3
}
```

### Exploration Budget Injection

```rust
/// Inject exploration candidates into a personalized result set.
///
/// Takes the scored personalized results and injects exploration candidates
/// at regular intervals (interleaved, not appended).
///
/// # Parameters
///
/// - `personalized`: scored candidates from the personalized pipeline (sorted desc)
/// - `exploration_candidates`: candidates from unfollowed creators, scored by population signals
/// - `total_limit`: target result count (e.g., 50)
/// - `exploration_fraction`: fraction of results for exploration (e.g., 0.1)
///
/// # Returns
///
/// A merged result set with exploration candidates interleaved.
pub fn inject_exploration(
    personalized: &[ScoredCandidate],
    exploration_candidates: &[ScoredCandidate],
    total_limit: usize,
    exploration_fraction: f64,
) -> Vec<ScoredCandidate> {
    let exploration_count = ((total_limit as f64) * exploration_fraction).ceil() as usize;
    let personalized_count = total_limit.saturating_sub(exploration_count);

    let personalized_slice = &personalized[..personalized_count.min(personalized.len())];
    let exploration_slice =
        &exploration_candidates[..exploration_count.min(exploration_candidates.len())];

    if exploration_slice.is_empty() {
        // No exploration candidates available: return personalized only.
        return personalized[..total_limit.min(personalized.len())].to_vec();
    }

    // Interleave: place exploration candidates at regular intervals.
    let mut result = Vec::with_capacity(total_limit);
    // `.max(1)` guards against a zero step (and a modulo-by-zero panic) if the
    // fraction is misconfigured above 1.0.
    let step = (total_limit / exploration_slice.len()).max(1);

    let mut p_idx = 0;
    let mut e_idx = 0;

    for i in 0..total_limit {
        if e_idx < exploration_slice.len() && i > 0 && i % step == 0 {
            result.push(exploration_slice[e_idx].clone());
            e_idx += 1;
        } else if p_idx < personalized_slice.len() {
            result.push(personalized_slice[p_idx].clone());
            p_idx += 1;
        } else if e_idx < exploration_slice.len() {
            // Personalized items exhausted: fill the tail with exploration items.
            result.push(exploration_slice[e_idx].clone());
            e_idx += 1;
        }
    }

    result
}

/// Select exploration candidates from the item corpus.
///
/// Exploration candidates are items NOT from followed creators.
/// They are scored by population-level signals and returned in
/// descending score order.
#[allow(clippy::too_many_arguments)]
pub fn select_exploration_candidates(
    all_item_ids: &[EntityId],
    user_ctx: &UserContext,
    // Not read yet: kept in the signature for the cold-start prioritization
    // described in the Implementation Notes.
    _exploration_config: &ExplorationConfig,
    now: Timestamp,
    ledger: &SignalLedger,
    item_creator_lookup: &dyn Fn(EntityId) -> Option<EntityId>,
    item_created_at: &dyn Fn(EntityId) -> Option<u64>,
    limit: usize,
) -> Vec<ScoredCandidate> {
    let mut candidates: Vec<ScoredCandidate> = all_item_ids.iter()
        .filter(|&&item_id| {
            // Exclude items from followed creators.
            let from_followed = item_creator_lookup(item_id)
                .is_some_and(|c| user_ctx.followed_creators.contains(&c.as_u64()));
            !from_followed
        })
        .filter(|&&item_id| {
            // Exclude hidden items and items from blocked creators.
            if user_ctx.hidden_items.contains(&(item_id.as_u64() as u32)) {
                return false;
            }
            if let Some(c) = item_creator_lookup(item_id) {
                if user_ctx.blocked_creators.contains(&c.as_u64()) {
                    return false;
                }
            }
            true
        })
        .map(|&item_id| {
            let score = cold_start_score(item_id, now, ledger, item_created_at);
            ScoredCandidate {
                entity_id: item_id,
                score,
                signal_snapshot: vec![],
                creator_id: item_creator_lookup(item_id),
                format: None,
            }
        })
        .collect();

    // Sort by score descending.
    candidates.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
    candidates.truncate(limit);
    candidates
}

/// Check if an item is within the cold-start exploration window.
pub fn is_cold_start_item(
    item_id: EntityId,
    now: Timestamp,
    config: &ExplorationConfig,
    item_created_at: &dyn Fn(EntityId) -> Option<u64>,
    total_signal_count: &dyn Fn(EntityId) -> u64,
) -> bool {
    let now_ns = now.as_nanos();
    let window_ns = config.item_window.as_nanos() as u64;

    // Created within window. An unknown creation time is treated as timestamp 0,
    // which falls outside any realistic window.
    let created = item_created_at(item_id).unwrap_or(0);
    if now_ns.saturating_sub(created) > window_ns {
        return false;
    }

    // Below signal threshold.
    total_signal_count(item_id) < config.max_signals_for_cold_item
}
```

## Test Strategy

### Unit Tests

```rust
#[test]
fn cold_start_score_uses_population_signals() {
    let ledger = test_ledger_with_signals();
    let entity = EntityId::new(1);
    let now = Timestamp::now();

    let score = cold_start_score(entity, now, &ledger, &|_| Some(now.as_nanos()));
    assert!(score > 0.0, "cold start score should be positive: {}", score);
    assert!(score <= 1.0, "cold start score should be <= 1.0: {}", score);
}

#[test]
fn inject_exploration_correct_count() {
    let personalized: Vec<ScoredCandidate> = (0..45)
        .map(|i| make_candidate(i + 1, (45 - i) as f64, Some(i as u64 + 1), None))
        .collect();
    let exploration: Vec<ScoredCandidate> = (0..5)
        .map(|i| make_candidate(i + 100, (5 - i) as f64, Some(i as u64 + 50), None))
        .collect();

    let result = inject_exploration(&personalized, &exploration, 50, 0.1);
    assert_eq!(result.len(), 50);

    // Count exploration items (IDs >= 100).
    let explore_count = result.iter().filter(|c| c.entity_id.as_u64() >= 100).count();
    assert_eq!(explore_count, 5, "should have 5 exploration items");
}

#[test]
fn inject_exploration_empty_exploration_pool() {
    let personalized: Vec<ScoredCandidate> = (0..50)
        .map(|i| make_candidate(i + 1, (50 - i) as f64, Some(1), None))
        .collect();
    let exploration: Vec<ScoredCandidate> = vec![];

    let result = inject_exploration(&personalized, &exploration, 50, 0.1);
    assert_eq!(result.len(), 50);
    // All items from personalized.
    assert!(result.iter().all(|c| c.entity_id.as_u64() <= 50));
}

#[test]
fn inject_exploration_interleaves() {
    let personalized: Vec<ScoredCandidate> = (0..45)
        .map(|i| make_candidate(i + 1, (45 - i) as f64, Some(1), None))
        .collect();
    let exploration: Vec<ScoredCandidate> = (0..5)
        .map(|i| make_candidate(i + 100, 1.0, Some(50), None))
        .collect();

    let result = inject_exploration(&personalized, &exploration, 50, 0.1);

    // Exploration items should be spread through the list, not all at the end.
    let first_half = &result[..25];
    let second_half = &result[25..];
    let explore_first = first_half.iter().filter(|c| c.entity_id.as_u64() >= 100).count();
    let explore_second = second_half.iter().filter(|c| c.entity_id.as_u64() >= 100).count();

    // At least one exploration item should be in each half.
    assert!(explore_first > 0 && explore_second > 0);
}

#[test]
fn select_exploration_excludes_followed_creators() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);
    user_state.add_follow(user, EntityId::new(10));

    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());

    let all_items: Vec<EntityId> = (1..=20).map(EntityId::new).collect();
    let creator_lookup = |id: EntityId| -> Option<EntityId> {
        // Items 1-10 from creator 10 (followed), 11-20 from creator 20 (not followed).
        if id.as_u64() <= 10 {
            Some(EntityId::new(10))
        } else {
            Some(EntityId::new(20))
        }
    };

    let ledger = empty_test_ledger();
    let config = ExplorationConfig::default();
    let now = Timestamp::now();

    let candidates = select_exploration_candidates(
        &all_items, &ctx, &config, now, &ledger,
        &creator_lookup, &|_| Some(now.as_nanos()), 10,
    );

    // Only items from creator 20 (unfollowed) should appear.
    assert!(candidates.iter().all(|c| c.creator_id == Some(EntityId::new(20))),
        "exploration should exclude followed creators");
}

#[test]
fn select_exploration_excludes_blocked_and_hidden() {
    let user_state = UserStateIndex::new();
    let user = EntityId::new(1);
    user_state.add_block(user, EntityId::new(30));
    user_state.add_hide(user, EntityId::new(15));

    let iw_ledger = InteractionWeightLedger::new(InteractionWeightConfig::default());
    let ctx = UserContext::load(user, &user_state, &iw_ledger, &|_| None, Timestamp::now());

    let all_items: Vec<EntityId> = (1..=20).map(EntityId::new).collect();
    let creator_lookup = |id: EntityId| -> Option<EntityId> {
        // Items 1-10 from creator 20, items 11-20 from blocked creator 30.
        if id.as_u64() <= 10 {
            Some(EntityId::new(20))
        } else {
            Some(EntityId::new(30))
        }
    };

    let ledger = empty_test_ledger();
    let config = ExplorationConfig::default();
    let now = Timestamp::now();

    let candidates = select_exploration_candidates(
        &all_items, &ctx, &config, now, &ledger,
        &creator_lookup, &|_| Some(now.as_nanos()), 20,
    );

    // Item 15 hidden, items 11-20 from blocked creator 30.
    assert!(candidates.iter().all(|c| c.entity_id.as_u64() != 15),
        "hidden items excluded");
    assert!(candidates.iter().all(|c| c.creator_id != Some(EntityId::new(30))),
        "blocked creator items excluded");
}

#[test]
fn is_cold_start_item_within_window() {
    let config = ExplorationConfig::default(); // 24h window
    let now = Timestamp::now();

    // Item created 1 hour ago with 0 signals: cold start.
    let one_hour_ago = now.as_nanos() - 3600 * 1_000_000_000;
    assert!(is_cold_start_item(
        EntityId::new(1), now, &config,
        &|_| Some(one_hour_ago), &|_| 0,
    ));
}

#[test]
fn is_cold_start_item_outside_window() {
    let config = ExplorationConfig::default();
    let now = Timestamp::now();

    // Item created 48 hours ago: outside 24h window.
    let forty_eight_hours_ago = now.as_nanos() - 48 * 3600 * 1_000_000_000;
    assert!(!is_cold_start_item(
        EntityId::new(1), now, &config,
        &|_| Some(forty_eight_hours_ago), &|_| 0,
    ));
}

#[test]
fn is_cold_start_item_too_many_signals() {
    let config = ExplorationConfig::default(); // threshold: 10 signals
    let now = Timestamp::now();

    // Item created recently but has 20 signals: not cold start.
    assert!(!is_cold_start_item(
        EntityId::new(1), now, &config,
        &|_| Some(now.as_nanos()), &|_| 20,
    ));
}
```

### Property Tests

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn inject_exploration_preserves_total_count(
        n_personalized in 0usize..100,
        n_exploration in 0usize..20,
        total_limit in 1usize..100,
        frac in 0.0f64..0.3,
    ) {
        let personalized: Vec<ScoredCandidate> = (0..n_personalized)
            .map(|i| make_candidate(i as u64 + 1, (n_personalized - i) as f64, Some(1), None))
            .collect();
        let exploration: Vec<ScoredCandidate> = (0..n_exploration)
            .map(|i| make_candidate(i as u64 + 1000, 1.0, Some(50), None))
            .collect();

        let result = inject_exploration(&personalized, &exploration, total_limit, frac);

        // The result can never exceed the limit or the number of available inputs.
        let upper_bound = total_limit.min(n_personalized + n_exploration);
        prop_assert!(result.len() <= upper_bound,
            "result {} > bound {}", result.len(), upper_bound);
    }

    #[test]
    fn cold_start_score_always_in_unit_range(
        entity_id in 1u64..1000,
    ) {
        let ledger = empty_test_ledger();
        let now = Timestamp::now();
        let score = cold_start_score(
            EntityId::new(entity_id), now, &ledger, &|_| Some(now.as_nanos()),
        );
        prop_assert!(score >= 0.0 && score <= 1.0,
            "cold start score out of range: {}", score);
    }
}
```

## Acceptance Criteria

- [ ] `ExplorationConfig` with configurable item window, signal threshold, and fraction
- [ ] `cold_start_score` returns population-level score in [0, 1]
- [ ] Cold-start users get `for_you` results ranked by trending + quality + recency
- [ ] `is_cold_start_item` correctly identifies new items within window and below signal threshold
- [ ] `inject_exploration` interleaves exploration candidates into personalized results
- [ ] Exploration count = `ceil(limit * fraction)` (e.g., 5 for limit=50, fraction=0.1)
- [ ] Exploration candidates exclude: followed creators, blocked creators, hidden items
- [ ] Exploration candidates are interleaved, not clustered at the end
- [ ] Empty exploration pool gracefully falls back to personalized-only results
- [ ] `select_exploration_candidates` returns items from unfollowed creators scored by population signals
- [ ] Property test: result count <= total_limit
- [ ] Property test: cold_start_score always in [0, 1]
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All tests pass

## Research References

- [VISION.md](../../../../VISION.md) -- Exploration budget, cold-start handling
- [USE_CASES.md](../../../../USE_CASES.md) -- UC-01 (For You: exploration budget prevents filter bubbles)
- [thoughts.md](../../../../thoughts.md) -- Part V.16 (cold-start user defaults to population signals)

## Implementation Notes

- The exploration budget is enforced at the RETRIEVE executor level, after personalized scoring and before the final result assembly. The executor calls `select_exploration_candidates` to get exploration items, then `inject_exploration` to merge them with personalized results.
- For cold-start users, the candidate strategy switches from ANN (no query vector available) to full corpus scan sorted by `cold_start_score`. This is correct because without a preference vector, ANN retrieval has no meaningful query.
- The interleaving strategy is simple: place one exploration candidate every `total_limit / exploration_count` positions. This distributes exploration items evenly. More sophisticated interleaving (e.g., placing them at positions where the personalized score drops) is deferred to M6.
- `select_exploration_candidates` does a linear scan of all items. At M3 scale (10K items), this is fast. At larger scale, maintaining an exploration pool bitmap would be more efficient.
- The `ExplorationConfig` is stored as a field on `TidalDb`, initialized from defaults in `TidalDbBuilder::open()`. Custom values can be set via `builder.with_exploration_config(config)`.
- Cold-start item detection (`is_cold_start_item`) is used during exploration candidate selection to prioritize genuinely new items. However, all unfollowed items are eligible for exploration, not just cold-start items. Cold-start items get a small score boost within the exploration pool.
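
To make the interleaving rule from these notes concrete, the sketch below computes which result indices end up holding exploration items. It is an illustration only: `exploration_positions` is a hypothetical helper that mirrors the loop in `inject_exploration` (skip index 0, place one item every `total_limit / exploration_count` positions, then fill the tail once personalized items run out), and the `.max(1)` guard on the step is an added assumption.

```rust
/// Hypothetical helper (illustration only, not project code): return the
/// result indices that receive exploration items under the
/// "one every `total_limit / n_exploration` positions" rule.
fn exploration_positions(
    total_limit: usize,
    n_personalized: usize,
    n_exploration: usize,
) -> Vec<usize> {
    if n_exploration == 0 {
        return Vec::new();
    }

    // Assumption: never let the step reach zero, which would make `i % step` panic.
    let step = (total_limit / n_exploration).max(1);

    let mut personalized_left = n_personalized;
    let mut exploration_left = n_exploration;
    let mut positions = Vec::new();

    for i in 0..total_limit {
        if exploration_left > 0 && i > 0 && i % step == 0 {
            // Regular interleave slot.
            positions.push(i);
            exploration_left -= 1;
        } else if personalized_left > 0 {
            personalized_left -= 1;
        } else if exploration_left > 0 {
            // Personalized items ran out: exploration items fill the tail.
            positions.push(i);
            exploration_left -= 1;
        }
    }

    positions
}
```

For a limit of 50 with 45 personalized and 5 exploration items, this returns positions 10, 20, 30, 40, and 49. Because index 0 is always given to a personalized item, only four regular slots exist below 50, so the fifth exploration item lands in the final position rather than at an even interval.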