tidaldb/docs/planning/milestone-3/phase-3/task-02-personalized-profiles.md
jordan 39ada28c6e feat: complete Milestones 2–4 — RETRIEVE query, vector index, ranking profiles, diversity, entity system, sessions
2026-02-21 16:24:48 -07:00

# Task 02: Personalized Profiles
## Context
**Milestone:** 3 -- Personalized Ranking
**Phase:** m3p3 -- Personalized Ranking Profiles
**Depends On:** Task 01 (`UserContext` with preference vector, interaction weights, follows), m2p3 (profile engine, `ProfileExecutor`, `RankingProfile`), m2p1 (vector index for ANN retrieval), m2p4 (diversity enforcement)
**Blocks:** Task 03 (Cold Start and Exploration needs `for_you` profile to inject exploration candidates)
**Complexity:** L
## Objective
Deliver four personalized ranking profiles: `for_you`, `following`, `related`, and `notification`. These profiles are registered as builtins in the `ProfileRegistry` alongside the existing M2 profiles (trending, hot, new, etc.). Each profile uses the `UserContext` from Task 01 to personalize candidate retrieval and scoring.
The `ProfileExecutor` is extended with a `score_with_context` method that takes a `UserContext`. Given that context, the executor applies personalization factors: preference match (cosine similarity between user and item embeddings), creator affinity (interaction weight boost), and social proof (engagement from followed creators). Cold-start users, whose context carries no preference vector, get a neutral 0.5 preference match and 0.0 affinity.
These four profiles cover UC-01 (For You Feed), UC-04 (Following Feed), UC-05 (Related/Up Next), and UC-07 (Notifications).
## Requirements
### for_you Profile
- Candidate strategy: ANN retrieval using user preference vector (top 200 candidates)
- Scoring formula: `preference_match * 0.4 + engagement_velocity * 0.3 + recency * 0.2 + creator_affinity * 0.1`
- `preference_match`: cosine similarity between user preference vector and item embedding
- `engagement_velocity`: normalized view + share velocity from signal ledger
- `recency`: exponential decay from item age (half-life 48h)
- `creator_affinity`: interaction weight between user and item's creator, normalized to [0, 1]
- Gate: completion_rate > 0.02 (filters very low quality)
- Diversity: `max_per_creator` from the query (the builtin definition defaults to 2), `format_mix` capped at a 0.6 fraction
- Exploration: 10% budget (injected by Task 03)
### following Profile
- Candidate strategy: relationship-based (items from followed creators only)
- Scoring: `created_at` DESC (chronological), with tiebreaker on `completion_rate`
- No engagement-based scoring -- chronological is the default for following feeds
- No diversity enforcement (creator identity IS the filter)
- No exploration budget
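
The chronological ordering with a `completion_rate` tiebreaker can be sketched as a two-key comparator. This is a minimal illustration, not the executor's actual sort path: the `FollowingCandidate` struct and its field names are hypothetical, and entity ids are shown as plain `u64` values.

```rust
/// Hypothetical candidate record for this sketch; the field names are
/// assumptions, not types from the codebase.
#[derive(Debug, Clone, PartialEq)]
pub struct FollowingCandidate {
    pub entity_id: u64,
    pub created_at_ns: u64,
    pub completion_rate: f64,
}

/// Sort newest first; when two items share a timestamp, the one with the
/// higher completion rate comes first.
pub fn sort_following(candidates: &mut [FollowingCandidate]) {
    candidates.sort_by(|a, b| {
        b.created_at_ns
            .cmp(&a.created_at_ns)
            // `total_cmp` gives a total order on f64, so NaN cannot panic here.
            .then_with(|| b.completion_rate.total_cmp(&a.completion_rate))
    });
}
```

Because `sort_by` is stable, items that tie on both keys keep their incoming order.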
### related Profile
- Candidate strategy: ANN retrieval using source item embedding (top 100 candidates)
- Scoring: `item_similarity * 0.5 + preference_match * 0.3 + engagement * 0.2`
- `item_similarity`: cosine between source item and candidate item embeddings
- `preference_match`: cosine between user preference vector and candidate, if available
- `engagement`: normalized population signals (view count, like count)
- Filter: exclude the source item itself
- Diversity: max_per_creator:2
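
The source-item filter and the per-creator cap can be sketched as two small passes over a ranked list. This is an illustrative greedy cap under stated assumptions, not the m2p4 diversity implementation: candidates are modelled as hypothetical `(item_id, creator_id)` pairs of plain `u64` values, already ordered best first.

```rust
use std::collections::HashMap;

/// Remove the source item from a ranked candidate list, preserving order.
/// Each candidate is a hypothetical `(item_id, creator_id)` pair.
pub fn exclude_source(ranked: &[(u64, u64)], source_id: u64) -> Vec<(u64, u64)> {
    ranked
        .iter()
        .copied()
        .filter(|&(item_id, _)| item_id != source_id)
        .collect()
}

/// Keep at most `max_per_creator` items per creator, preserving rank order.
pub fn cap_per_creator(ranked: &[(u64, u64)], max_per_creator: usize) -> Vec<(u64, u64)> {
    let mut seen: HashMap<u64, usize> = HashMap::new();
    ranked
        .iter()
        .copied()
        .filter(|&(_, creator_id)| {
            // Count every occurrence; keep only the first `max_per_creator`.
            let count = seen.entry(creator_id).or_insert(0);
            *count += 1;
            *count <= max_per_creator
        })
        .collect()
}
```

Filtering the source item before applying the cap means the source never consumes one of its creator's two slots.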
### notification Profile
- Candidate strategy: scan recent items from followed creators (last 48h)
- Scoring: `relationship_strength * 0.6 + item_quality * 0.4`
- `relationship_strength`: interaction weight between user and creator, normalized
- `item_quality`: composite of view velocity + completion rate
- Filter: only items from followed creators, created within 48h
- Sort: descending by score
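
The 48-hour window filter can be sketched as a simple age check on nanosecond timestamps. This is a minimal sketch with assumptions: items are hypothetical `(item_id, created_at_ns)` pairs, the boundary is treated as inclusive, and future-dated items are treated as age zero. None of these choices are fixed by the requirements above.

```rust
const NANOS_PER_HOUR: u64 = 3_600 * 1_000_000_000;
const NOTIFICATION_WINDOW_NS: u64 = 48 * NANOS_PER_HOUR;

/// True when the item was created within the last 48 hours.
/// `saturating_sub` makes a future-dated item count as age zero
/// (an assumption of this sketch).
pub fn within_notification_window(created_at_ns: u64, now_ns: u64) -> bool {
    now_ns.saturating_sub(created_at_ns) <= NOTIFICATION_WINDOW_NS
}

/// Keep only the ids of items that fall inside the notification window.
/// Each item is a hypothetical `(item_id, created_at_ns)` pair.
pub fn notification_candidates(items: &[(u64, u64)], now_ns: u64) -> Vec<u64> {
    items
        .iter()
        .filter(|&&(_, created_at_ns)| within_notification_window(created_at_ns, now_ns))
        .map(|&(item_id, _)| item_id)
        .collect()
}
```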
## Technical Design
### Module Structure
```
tidal/src/
  ranking/
    personalized.rs  -- Personalized scoring functions
    builtins.rs      -- Extended with new profile definitions
```
### Personalized Scoring Functions
```rust
// === ranking/personalized.rs ===
use crate::db::user_context::UserContext;
// Only the types used in this file are imported; an unused `Window` or
// `SignalLedger` import would fail `cargo clippy -- -D warnings`.
use crate::schema::{EntityId, Timestamp};
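
/// Compute the preference match score between a user and an item.
///
/// Computes cosine similarity in [-1.0, 1.0] and remaps it to [0.0, 1.0].
/// Returns 0.5 (neutral) if either vector is unavailable, the dimensions
/// differ, or either vector has zero (or non-finite) norm.
pub fn preference_match(
    user_ctx: &UserContext,
    item_embedding: Option<&[f32]>,
) -> f64 {
    const NEUTRAL: f64 = 0.5;
    // Missing data: neutral score.
    let (Some(pref), Some(item)) = (&user_ctx.preference_vector, item_embedding) else {
        return NEUTRAL;
    };
    // Cold start: neutral score.
    let Some(pref_data) = pref.as_slice() else {
        return NEUTRAL;
    };
    if pref_data.len() != item.len() {
        return NEUTRAL; // Dimension mismatch: neutral score
    }
    // Accumulate the dot product and both squared norms in one pass, so the
    // result is a true cosine even when an input is not unit-length.
    let (dot, pref_sq, item_sq) = pref_data.iter().zip(item.iter()).fold(
        (0.0_f64, 0.0_f64, 0.0_f64),
        |(dot, p_sq, i_sq), (&a, &b)| {
            let (a, b) = (f64::from(a), f64::from(b));
            (dot + a * b, p_sq + a * a, i_sq + b * b)
        },
    );
    let denom = (pref_sq * item_sq).sqrt();
    if denom == 0.0 || !denom.is_finite() {
        return NEUTRAL; // Cosine is undefined for a zero or non-finite norm
    }
    // Remap [-1, 1] to [0, 1]; the clamp absorbs floating-point drift.
    ((dot / denom + 1.0) / 2.0).clamp(0.0, 1.0)
}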

/// Compute the creator affinity score for a user-creator pair.
///
/// Normalizes the interaction weight to [0.0, 1.0) using a sigmoid-like
/// transformation: `affinity = weight / (weight + k)` where k is a
/// half-saturation constant (5.0). Negative weights are treated as 0.
pub fn creator_affinity(
    user_ctx: &UserContext,
    creator_id: Option<EntityId>,
) -> f64 {
    const K: f64 = 5.0; // Half-saturation constant: weight 5 gives affinity 0.5
    creator_id.map_or(0.0, |cid| {
        // Clamp at zero so a negative weight cannot push the ratio below 0
        // or drive the denominator to zero.
        let weight = user_ctx.interaction_weight(cid).max(0.0);
        weight / (weight + K)
    })
}

/// Compute a recency score based on item age.
///
/// Uses exponential decay with a 48-hour half-life.
/// Items created at `now` get score 1.0; items 48h old get 0.5.
pub fn recency_score(
    created_at_ns: u64,
    now: Timestamp,
) -> f64 {
    let now_ns = now.as_nanos();
    if created_at_ns >= now_ns {
        return 1.0;
    }
    let age_secs = (now_ns - created_at_ns) as f64 / 1_000_000_000.0;
    let half_life_secs = 48.0 * 3600.0;
    let lambda = std::f64::consts::LN_2 / half_life_secs;
    (-lambda * age_secs).exp()
}

/// Composite for_you score for a single candidate.
pub fn for_you_score(
    pref_match: f64,
    engagement_vel: f64,
    recency: f64,
    affinity: f64,
) -> f64 {
    pref_match * 0.4 + engagement_vel * 0.3 + recency * 0.2 + affinity * 0.1
}

/// Composite related score for a single candidate.
pub fn related_score(
    item_similarity: f64,
    pref_match: f64,
    engagement: f64,
) -> f64 {
    item_similarity * 0.5 + pref_match * 0.3 + engagement * 0.2
}

/// Composite notification score for a single candidate.
pub fn notification_score(
    relationship_strength: f64,
    item_quality: f64,
) -> f64 {
    relationship_strength * 0.6 + item_quality * 0.4
}
```
### Profile Definitions
```rust
// === ranking/builtins.rs (extensions) ===
/// Register the personalized profiles.
pub fn register_personalized_builtins(registry: &mut ProfileRegistry) -> crate::Result<()> {
    // for_you
    registry.register(RankingProfile {
        name: "for_you".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Ann {
            slot: "user_preference".into(),
            limit: 200,
        },
        boosts: vec![],
        decay: None,
        gates: vec![Gate {
            signal: "completion".into(),
            agg: SignalAgg::Ratio,
            window: Window::AllTime,
            min_threshold: 0.02,
        }],
        penalties: vec![Penalty {
            signal: "skip".into(),
            agg: SignalAgg::Value,
            window: Window::TwentyFourHours,
            weight: 0.1,
        }],
        excludes: vec![],
        diversity: DiversitySpec {
            max_per_creator: Some(2),
            format_mix_max_fraction: Some(0.6),
        },
        exploration: 0.1, // 10%
        sort: None,       // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    // following
    registry.register(RankingProfile {
        name: "following".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Relationship,
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec::default(),
        exploration: 0.0,
        sort: Some(Sort::New), // Chronological
        is_builtin: true,
    })?;

    // related
    registry.register(RankingProfile {
        name: "related".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Ann {
            slot: "default".into(), // Source item embedding
            limit: 100,
        },
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec {
            max_per_creator: Some(2),
            format_mix_max_fraction: None,
        },
        exploration: 0.0,
        sort: None, // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    // notification
    registry.register(RankingProfile {
        name: "notification".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Relationship,
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec::default(),
        exploration: 0.0,
        sort: None, // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    Ok(())
}
```
### ProfileExecutor Extension
```rust
impl<'a> ProfileExecutor<'a> {
    /// Score candidates with user context for personalized ranking.
    ///
    /// The profile name determines which scoring formula is used:
    /// - `for_you`: preference match + engagement + recency + affinity
    /// - `related`: item similarity + preference match + engagement
    /// - `notification`: relationship strength + item quality
    /// - All others: fall back to `compute_raw_score()` (population-level)
    pub fn score_with_context(
        &self,
        candidates: &[EntityId],
        profile: &RankingProfile,
        now: Timestamp,
        user_ctx: &UserContext,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
        item_created_at: &dyn Fn(EntityId) -> Option<u64>,
    ) -> Vec<ScoredCandidate> {
        let profile_name = profile.name.as_str();
        let mut scored: Vec<ScoredCandidate> = candidates
            .iter()
            .filter(|&&eid| passes_gates(eid, &profile.gates, self.ledger))
            .map(|&entity_id| {
                let raw = match profile_name {
                    "for_you" => self.score_for_you(
                        entity_id,
                        user_ctx,
                        now,
                        item_embeddings,
                        item_created_at,
                    ),
                    "related" => self.score_related(entity_id, user_ctx, item_embeddings),
                    "notification" => self.score_notification(entity_id, user_ctx),
                    _ => self.compute_raw_score(entity_id, profile, now),
                };
                ScoredCandidate {
                    entity_id,
                    score: raw,
                    signal_snapshot: vec![],
                    // Placeholders: populate from metadata so that diversity
                    // enforcement has creator and format to work with.
                    creator_id: None,
                    format: None,
                }
            })
            .collect();
        // Highest score first; NaN scores compare as equal.
        scored.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
        normalize(&mut scored);
        scored
    }

    fn score_for_you(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
        now: Timestamp,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
        item_created_at: &dyn Fn(EntityId) -> Option<u64>,
    ) -> f64 {
        let item_emb = item_embeddings(entity_id);
        let pref_match = preference_match(user_ctx, item_emb.as_deref());
        let view_vel = read_agg(entity_id, "view", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        let share_vel = read_agg(entity_id, "share", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        // Shares count double; clamp keeps the factor inside [0, 1].
        let engagement_vel = (view_vel + 2.0 * share_vel).clamp(0.0, 1.0);
        // Unknown creation time falls back to a neutral 0.5.
        let recency = item_created_at(entity_id)
            .map_or(0.5, |ts| recency_score(ts, now));
        let creator_id = None; // Read from metadata in actual implementation
        let affinity = creator_affinity(user_ctx, creator_id);
        for_you_score(pref_match, engagement_vel, recency, affinity)
    }

    fn score_related(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
    ) -> f64 {
        let item_emb = item_embeddings(entity_id);
        let pref_match = preference_match(user_ctx, item_emb.as_deref());
        let views = read_agg(entity_id, "view", &SignalAgg::Value, Window::AllTime, self.ledger);
        let likes = read_agg(entity_id, "like", &SignalAgg::Value, Window::AllTime, self.ledger);
        // log10(0) is -inf, which `.max(0.0)` maps to 0. The `.min(1.0)` cap
        // keeps very popular items from pushing the factor above 1.
        let engagement = ((views.log10().max(0.0) + likes.log10().max(0.0)) / 10.0).min(1.0);
        // item_similarity is computed by the caller from ANN distances.
        // For now, use preference match as a proxy.
        let item_similarity = pref_match;
        related_score(item_similarity, pref_match, engagement)
    }

    fn score_notification(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
    ) -> f64 {
        let creator_id = None; // Read from metadata
        let rel_strength = creator_affinity(user_ctx, creator_id);
        let view_vel = read_agg(entity_id, "view", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        let completion = read_agg(entity_id, "completion", &SignalAgg::DecayScore, Window::AllTime, self.ledger);
        // Clamp so the composite stays inside [0, 1] for high-velocity items.
        let item_quality = ((view_vel.log10().max(0.0) + completion) / 2.0).clamp(0.0, 1.0);
        notification_score(rel_strength, item_quality)
    }
}
```
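
Replacing the preference-match proxy in `score_related` requires turning the ANN distance for each candidate into an `item_similarity` in [0, 1]. The sketch below assumes the index reports cosine distance defined as `1 - cosine_similarity` (range [0, 2]); that definition must be checked against the metric actually configured on the vector index. The function names, the `HashMap` carrying distances, and the plain `u64` ids are all hypothetical.

```rust
use std::collections::HashMap;

/// Convert a cosine distance (assumed to be `1 - cosine_similarity`, so in
/// [0, 2]) into a similarity in [0, 1], matching the remapping used for
/// preference match. NaN distances map to the neutral 0.5.
pub fn similarity_from_cosine_distance(distance: f32) -> f64 {
    let d = f64::from(distance);
    if d.is_nan() {
        return 0.5;
    }
    // cos = 1 - d, and (cos + 1) / 2 simplifies to 1 - d / 2.
    (1.0 - d / 2.0).clamp(0.0, 1.0)
}

/// Look up a candidate's similarity from distances collected during
/// candidate retrieval; candidates without a recorded distance are neutral.
pub fn item_similarity_for(distances: &HashMap<u64, f32>, item_id: u64) -> f64 {
    distances
        .get(&item_id)
        .map_or(0.5, |&d| similarity_from_cosine_distance(d))
}
```

With this in place the scoring function would take the looked-up value as `item_similarity` instead of reusing `pref_match`.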
## Test Strategy
### Unit Tests
```rust
use std::collections::HashSet;
// `PreferenceVector` is defined by Task 01; import it from that module.

#[test]
fn preference_match_identical_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        top_creators: vec![],
        followed_creators: HashSet::new(),
        blocked_creators: HashSet::new(),
        hidden_items: HashSet::new(),
        is_cold_start: false,
    };
    let item = [1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 1.0).abs() < 0.01, "identical vectors: {score}");
}

#[test]
fn preference_match_orthogonal_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        ..cold_start_context()
    };
    let item = [0.0f32, 1.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.5).abs() < 0.01, "orthogonal: {score}");
}

#[test]
fn preference_match_opposite_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        ..cold_start_context()
    };
    let item = [-1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.0).abs() < 0.01, "opposite: {score}");
}

#[test]
fn preference_match_cold_start_returns_neutral() {
    let ctx = cold_start_context();
    let item = [1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.5).abs() < f64::EPSILON);
}

#[test]
fn creator_affinity_zero_for_no_interaction() {
    let ctx = cold_start_context();
    let score = creator_affinity(&ctx, Some(EntityId::new(10)));
    assert!((score - 0.0).abs() < f64::EPSILON);
}

#[test]
fn creator_affinity_saturates() {
    let ctx = UserContext {
        user_id: EntityId::new(1),
        top_creators: vec![(EntityId::new(10), 100.0)],
        ..cold_start_context()
    };
    let score = creator_affinity(&ctx, Some(EntityId::new(10)));
    // weight=100, k=5: 100/(100+5) = 0.952
    assert!(score > 0.9, "high affinity: {score}");
}

#[test]
fn recency_score_now_is_one() {
    let now = Timestamp::now();
    let score = recency_score(now.as_nanos(), now);
    assert!((score - 1.0).abs() < 0.01);
}

#[test]
fn recency_score_48h_is_half() {
    let now = Timestamp::now();
    let forty_eight_hours_ago = now.as_nanos() - 48 * 3600 * 1_000_000_000;
    let score = recency_score(forty_eight_hours_ago, now);
    assert!((score - 0.5).abs() < 0.05, "48h recency: {score}");
}

#[test]
fn for_you_score_range() {
    let score = for_you_score(1.0, 1.0, 1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);
    let score_zero = for_you_score(0.0, 0.0, 0.0, 0.0);
    assert!((score_zero - 0.0).abs() < f64::EPSILON);
}

#[test]
fn related_score_range() {
    let score = related_score(1.0, 1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);
}

#[test]
fn notification_score_range() {
    let score = notification_score(1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);
}

/// Shared fixture: a user with no preference vector and no interactions.
fn cold_start_context() -> UserContext {
    UserContext {
        user_id: EntityId::new(1),
        preference_vector: None,
        top_creators: vec![],
        followed_creators: HashSet::new(),
        blocked_creators: HashSet::new(),
        hidden_items: HashSet::new(),
        is_cold_start: true,
    }
}
```
### Property Tests
```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn preference_match_always_in_unit_range(
        pref_vec in proptest::collection::vec(-1.0f32..1.0, 16),
        item_vec in proptest::collection::vec(-1.0f32..1.0, 16),
    ) {
        if let Some(pref) = PreferenceVector::from_embedding(pref_vec, 16) {
            let ctx = UserContext {
                user_id: EntityId::new(1),
                preference_vector: Some(pref),
                ..cold_start_context()
            };
            let score = preference_match(&ctx, Some(&item_vec));
            prop_assert!((0.0..=1.0).contains(&score),
                "preference match out of range: {}", score);
        }
    }

    #[test]
    fn for_you_score_always_in_unit_range(
        pm in 0.0f64..1.0,
        ev in 0.0f64..1.0,
        r in 0.0f64..1.0,
        a in 0.0f64..1.0,
    ) {
        let score = for_you_score(pm, ev, r, a);
        prop_assert!((0.0..=1.0).contains(&score),
            "for_you score out of range: {}", score);
    }

    #[test]
    fn creator_affinity_always_in_unit_range(
        weight in 0.0f64..1000.0,
    ) {
        let ctx = UserContext {
            user_id: EntityId::new(1),
            top_creators: vec![(EntityId::new(10), weight)],
            ..cold_start_context()
        };
        let score = creator_affinity(&ctx, Some(EntityId::new(10)));
        prop_assert!((0.0..=1.0).contains(&score),
            "creator affinity out of range: {}", score);
    }
}
```
## Acceptance Criteria
- [ ] `preference_match` returns cosine similarity remapped to [0, 1], neutral 0.5 for missing data
- [ ] `creator_affinity` returns sigmoid-normalized interaction weight in [0, 1]
- [ ] `recency_score` returns exponential decay with 48h half-life
- [ ] `for_you_score` combines four factors with weights summing to 1.0
- [ ] `related_score` combines three factors with weights summing to 1.0
- [ ] `notification_score` combines two factors with weights summing to 1.0
- [ ] All scoring functions return values in [0.0, 1.0] (property tested)
- [ ] `for_you` profile registered as builtin with correct configuration
- [ ] `following` profile registered with `Sort::New` and `CandidateStrategy::Relationship`
- [ ] `related` profile registered with ANN candidate strategy
- [ ] `notification` profile registered with relationship-based candidates
- [ ] `ProfileExecutor::score_with_context` dispatches to correct scoring function by profile name
- [ ] Cold-start users get neutral scores (0.5 preference match, 0.0 affinity)
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All tests pass
## Research References
- [docs/research/ann_for_tidaldb.md](../../../research/ann_for_tidaldb.md) -- Cosine similarity via dot product on unit vectors
- [VISION.md](../../../../VISION.md) -- Ranking profile formulas
- [USE_CASES.md](../../../../USE_CASES.md) -- UC-01, UC-04, UC-05, UC-07
## Implementation Notes
- The scoring functions are intentionally simple linear combinations. The weights (0.4/0.3/0.2/0.1 for for_you) are starting points that can be tuned without code changes if the profile system is extended to accept configurable weights. For M3, hardcoded weights are sufficient.
- `creator_affinity` uses a sigmoid-like `w/(w+k)` transformation instead of raw weight. This bounds the output to [0, 1] and prevents high-weight creators from completely dominating the score. The half-saturation constant `k=5.0` means a weight of 5 produces affinity 0.5.
- For the `related` profile, the `item_similarity` should ideally come from the ANN distance between the source item and candidate item. In this task, we use preference match as a proxy. The full implementation should pipe ANN distances through from the candidate retrieval phase.
- The `following` profile uses `Sort::New` from the existing sort system. No custom scoring is needed -- the executor's existing `score_by_sort` handles chronological ordering.
- The `notification` profile's `CandidateStrategy::Relationship` means candidates are sourced from followed creators' items. The RETRIEVE executor must implement this candidate sourcing strategy, which uses the `FollowsBitmap` from m3p1 Task 03.
- Do NOT implement the exploration budget injection in this task. The `exploration: 0.1` field on the `for_you` profile is defined here but not enforced. Enforcement is done in Task 03 (Cold Start and Exploration).
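
If the profile system is later extended to accept configurable weights, the `for_you` formula could be driven by a small weights struct instead of literals. The following is a hedged sketch of that idea only. `ForYouWeights` and its methods are hypothetical and are not part of the M3 design, which keeps the weights hardcoded.

```rust
/// Hypothetical weight set for the for_you formula. The defaults mirror
/// the hardcoded 0.4 / 0.3 / 0.2 / 0.1 split.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ForYouWeights {
    pub preference_match: f64,
    pub engagement_velocity: f64,
    pub recency: f64,
    pub creator_affinity: f64,
}

impl Default for ForYouWeights {
    fn default() -> Self {
        Self {
            preference_match: 0.4,
            engagement_velocity: 0.3,
            recency: 0.2,
            creator_affinity: 0.1,
        }
    }
}

impl ForYouWeights {
    /// True when the weights sum to 1.0 (within tolerance), which keeps
    /// the composite score in [0, 1] for inputs in [0, 1].
    pub fn is_normalized(&self) -> bool {
        let sum = self.preference_match
            + self.engagement_velocity
            + self.recency
            + self.creator_affinity;
        (sum - 1.0).abs() < 1e-9
    }

    /// Linear combination of the four factors under these weights.
    pub fn score(&self, pref_match: f64, engagement_vel: f64, recency: f64, affinity: f64) -> f64 {
        pref_match * self.preference_match
            + engagement_vel * self.engagement_velocity
            + recency * self.recency
            + affinity * self.creator_affinity
    }
}
```

Validating `is_normalized` at profile registration time would preserve the "weights sum to 1.0" acceptance criterion once the weights become tunable.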