Task 02: Personalized Profiles
Context
Milestone: 3 -- Personalized Ranking
Phase: m3p3 -- Personalized Ranking Profiles
Depends On: Task 01 (UserContext with preference vector, interaction weights, follows), m2p3 (profile engine, ProfileExecutor, RankingProfile), m2p1 (vector index for ANN retrieval), m2p4 (diversity enforcement)
Blocks: Task 03 (Cold Start and Exploration needs for_you profile to inject exploration candidates)
Complexity: L
Objective
Deliver four personalized ranking profiles: for_you, following, related, and notification. These profiles are registered as builtins in the ProfileRegistry alongside the existing M2 profiles (trending, hot, new, etc.). Each profile uses the UserContext from Task 01 to personalize candidate retrieval and scoring.
The ProfileExecutor is extended with a score_with_context method that accepts an optional UserContext. When user context is available, the executor applies personalization factors: preference match (cosine similarity between user and item embeddings), creator affinity (interaction weight boost), and social proof (engagement from followed creators).
These four profiles cover UC-01 (For You Feed), UC-04 (Following Feed), UC-05 (Related/Up Next), and UC-07 (Notifications).
Requirements
for_you Profile
- Candidate strategy: ANN retrieval using user preference vector (top 200 candidates)
- Scoring formula:
  preference_match * 0.4 + engagement_velocity * 0.3 + recency * 0.2 + creator_affinity * 0.1
  - preference_match: cosine similarity between user preference vector and item embedding
  - engagement_velocity: normalized view + share velocity from signal ledger
  - recency: exponential decay from item age (half-life 48h)
  - creator_affinity: interaction weight between user and item's creator, normalized to [0, 1]
- Gate: completion_rate > 0.02 (filters very low quality)
- Diversity: max_per_creator from query, format_mix 0.6
- Exploration: 10% budget (injected by Task 03)
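As a worked example of the formula and gate above, the following minimal sketch scores a single item from already-normalized inputs. The function names `recency` and `for_you_example` are illustrative only and are not part of the crate; the weights, the 48h half-life, and the 0.02 gate come from the requirements above.

```rust
/// Exponential decay with a 48-hour half-life: 1.0 at age 0, 0.5 at 48h.
fn recency(age_hours: f64) -> f64 {
    const HALF_LIFE_HOURS: f64 = 48.0;
    (-std::f64::consts::LN_2 * age_hours / HALF_LIFE_HOURS).exp()
}

/// Returns None when the item fails the completion-rate gate,
/// otherwise the weighted for_you score.
/// All inputs are assumed to be normalized to [0, 1] already.
fn for_you_example(
    preference_match: f64,
    engagement_velocity: f64,
    age_hours: f64,
    creator_affinity: f64,
    completion_rate: f64,
) -> Option<f64> {
    if completion_rate <= 0.02 {
        return None; // Gate: completion_rate > 0.02
    }
    Some(
        preference_match * 0.4
            + engagement_velocity * 0.3
            + recency(age_hours) * 0.2
            + creator_affinity * 0.1,
    )
}
```

With preference_match 0.8, engagement_velocity 0.5, an age of 48h (recency 0.5), and creator_affinity 0.5, the terms are 0.32 + 0.15 + 0.10 + 0.05, so the item scores about 0.62.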
following Profile
- Candidate strategy: relationship-based (items from followed creators only)
- Scoring: created_at DESC (chronological), with tiebreaker on completion_rate
- No engagement-based scoring -- chronological is the default for following feeds
- No diversity enforcement (creator identity IS the filter)
- No exploration budget
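The chronological ordering with a completion_rate tiebreaker can be sketched as a two-key sort. The `FollowingItem` struct and `sort_following` function are hypothetical stand-ins for illustration, and the direction of the tiebreaker (higher completion_rate first) is an assumption, since the requirement does not state it.

```rust
/// Hypothetical item view used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct FollowingItem {
    id: u64,
    created_at_ns: u64,
    completion_rate: f64,
}

/// Sort newest first; among items with the same timestamp,
/// put the higher completion_rate first (assumed direction).
fn sort_following(items: &mut [FollowingItem]) {
    items.sort_by(|a, b| {
        b.created_at_ns
            .cmp(&a.created_at_ns)
            // total_cmp gives a total order on f64, so NaN cannot panic here.
            .then_with(|| b.completion_rate.total_cmp(&a.completion_rate))
    });
}
```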
related Profile
- Candidate strategy: ANN retrieval using source item embedding (top 100 candidates)
- Scoring:
  item_similarity * 0.5 + preference_match * 0.3 + engagement * 0.2
  - item_similarity: cosine between source item and candidate item embeddings
  - preference_match: cosine between user preference vector and candidate, if available
  - engagement: normalized population signals (view count, like count)
- Filter: exclude the source item itself
- Diversity: max_per_creator:2
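The source-item filter and the per-creator cap can be illustrated with a simple greedy pass over scored candidates. This is only a sketch: `Candidate` and `related_postfilter` are hypothetical names, and the greedy cap shown here is a simplification, not the m2p4 diversity implementation.

```rust
use std::collections::HashMap;

/// Hypothetical scored candidate used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candidate {
    item_id: u64,
    creator_id: u64,
    score: f64,
}

/// Drop the source item, order by score descending, then keep at most
/// `max_per_creator` items from each creator.
fn related_postfilter(
    source_item: u64,
    mut candidates: Vec<Candidate>,
    max_per_creator: usize,
) -> Vec<Candidate> {
    // Filter: exclude the source item itself.
    candidates.retain(|c| c.item_id != source_item);
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));

    // Count how many items we have already kept per creator.
    let mut kept_per_creator: HashMap<u64, usize> = HashMap::new();
    candidates
        .into_iter()
        .filter(|c| {
            let n = kept_per_creator.entry(c.creator_id).or_insert(0);
            *n += 1;
            *n <= max_per_creator
        })
        .collect()
}
```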
notification Profile
- Candidate strategy: scan recent items from followed creators (last 48h)
- Scoring:
  relationship_strength * 0.6 + item_quality * 0.4
  - relationship_strength: interaction weight between user and creator, normalized
  - item_quality: composite of view velocity + completion rate
- Filter: only items from followed creators, created within 48h
- Sort: descending by score
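The notification filter (followed creators only, created within 48h) might look like the sketch below. `Item`, `notification_candidates`, and the plain `u64` identifiers are hypothetical simplifications; the real executor would presumably work with EntityId and the follows relationship data rather than a HashSet.

```rust
use std::collections::HashSet;

const HOUR_NS: u64 = 3_600 * 1_000_000_000;
const WINDOW_NS: u64 = 48 * HOUR_NS;

/// Hypothetical item view used only for this sketch.
struct Item {
    id: u64,
    creator_id: u64,
    created_at_ns: u64,
}

/// Keep ids of items whose creator is followed and that were created
/// within the last 48 hours relative to `now_ns`.
fn notification_candidates(
    items: &[Item],
    followed: &HashSet<u64>,
    now_ns: u64,
) -> Vec<u64> {
    // saturating_sub avoids underflow when now_ns is smaller than the window.
    let cutoff = now_ns.saturating_sub(WINDOW_NS);
    items
        .iter()
        .filter(|it| followed.contains(&it.creator_id) && it.created_at_ns >= cutoff)
        .map(|it| it.id)
        .collect()
}
```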
Technical Design
Module Structure
tidal/src/
ranking/
personalized.rs -- Personalized scoring functions
builtins.rs -- Extended with new profile definitions
Personalized Scoring Functions
// === ranking/personalized.rs ===
use crate::db::user_context::UserContext;
use crate::schema::{EntityId, Timestamp, Window};
use crate::signals::SignalLedger;
/// Compute the preference match score between a user and an item.
///
/// Returns cosine similarity in [-1.0, 1.0], remapped to [0.0, 1.0].
/// Returns 0.5 (neutral) if either vector is unavailable.
pub fn preference_match(
user_ctx: &UserContext,
item_embedding: Option<&[f32]>,
) -> f64 {
match (&user_ctx.preference_vector, item_embedding) {
(Some(pref), Some(item)) => {
if let Some(pref_data) = pref.as_slice() {
if pref_data.len() == item.len() {
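let dot: f64 = pref_data.iter()
.zip(item.iter())
.map(|(&a, &b)| f64::from(a) * f64::from(b))
.sum();
// The dot product equals cosine similarity only for unit vectors;
// clamp so non-normalized inputs cannot push the score outside [0, 1].
let cosine = dot.clamp(-1.0, 1.0);
// Remap [-1, 1] to [0, 1].
(cosine + 1.0) / 2.0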
} else {
0.5 // Dimension mismatch: neutral score
}
} else {
0.5 // Cold start: neutral score
}
}
_ => 0.5, // Missing data: neutral score
}
}
/// Compute the creator affinity score for a user-creator pair.
///
/// Normalizes the interaction weight to [0.0, 1.0] using a sigmoid-like
/// transformation: `affinity = weight / (weight + k)` where k is a
/// half-saturation constant (default 5.0).
pub fn creator_affinity(
user_ctx: &UserContext,
creator_id: Option<EntityId>,
) -> f64 {
const K: f64 = 5.0; // Half-saturation constant
match creator_id {
Some(cid) => {
// Guard against negative weights so the ratio stays in [0, 1).
let weight = user_ctx.interaction_weight(cid).max(0.0);
weight / (weight + K)
}
None => 0.0,
}
}
/// Compute a recency score based on item age.
///
/// Uses exponential decay with a 48-hour half-life.
/// Items created at `now` get score 1.0; items 48h old get 0.5.
pub fn recency_score(
created_at_ns: u64,
now: Timestamp,
) -> f64 {
let now_ns = now.as_nanos();
if created_at_ns >= now_ns {
return 1.0;
}
let age_secs = (now_ns - created_at_ns) as f64 / 1_000_000_000.0;
let half_life_secs = 48.0 * 3600.0;
let lambda = std::f64::consts::LN_2 / half_life_secs;
(-lambda * age_secs).exp()
}
/// Composite for_you score for a single candidate.
pub fn for_you_score(
pref_match: f64,
engagement_vel: f64,
recency: f64,
affinity: f64,
) -> f64 {
pref_match * 0.4 + engagement_vel * 0.3 + recency * 0.2 + affinity * 0.1
}
/// Composite related score for a single candidate.
pub fn related_score(
item_similarity: f64,
pref_match: f64,
engagement: f64,
) -> f64 {
item_similarity * 0.5 + pref_match * 0.3 + engagement * 0.2
}
/// Composite notification score for a single candidate.
pub fn notification_score(
relationship_strength: f64,
item_quality: f64,
) -> f64 {
relationship_strength * 0.6 + item_quality * 0.4
}
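The preference_match function above treats the dot product as cosine similarity, which holds only when both vectors have unit length. If item embeddings are not guaranteed to be normalized, a full cosine computation is one option. The helpers below are a hedged sketch with hypothetical names (`cosine_similarity`, `remap_to_unit`), not functions from the crate.

```rust
/// Cosine similarity for vectors of any length.
/// Returns None for mismatched dimensions, empty input, or a zero vector,
/// which a caller could map to the neutral score 0.5.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b.iter()) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    // Clamp to absorb floating-point rounding just outside [-1, 1].
    Some((dot / (norm_a.sqrt() * norm_b.sqrt())).clamp(-1.0, 1.0))
}

/// Remap a cosine in [-1, 1] to a score in [0, 1].
fn remap_to_unit(cosine: f64) -> f64 {
    (cosine + 1.0) / 2.0
}
```

The extra square roots cost more than a plain dot product, so this only makes sense where normalization at write time cannot be relied on.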
Profile Definitions
// === ranking/builtins.rs (extensions) ===
/// Register the personalized profiles.
pub fn register_personalized_builtins(registry: &mut ProfileRegistry) -> crate::Result<()> {
// for_you
registry.register(RankingProfile {
name: "for_you".into(),
version: 1,
candidate_strategy: CandidateStrategy::Ann {
slot: "user_preference".into(),
limit: 200,
},
boosts: vec![],
decay: None,
gates: vec![Gate {
signal: "completion".into(),
agg: SignalAgg::Ratio,
window: Window::AllTime,
min_threshold: 0.02,
}],
penalties: vec![Penalty {
signal: "skip".into(),
agg: SignalAgg::Value,
window: Window::TwentyFourHours,
weight: 0.1,
}],
excludes: vec![],
diversity: DiversitySpec {
max_per_creator: Some(2),
format_mix_max_fraction: Some(0.6),
},
exploration: 0.1, // 10%
sort: None, // Custom scoring via score_with_context
is_builtin: true,
})?;
// following
registry.register(RankingProfile {
name: "following".into(),
version: 1,
candidate_strategy: CandidateStrategy::Relationship,
boosts: vec![],
decay: None,
gates: vec![],
penalties: vec![],
excludes: vec![],
diversity: DiversitySpec::default(),
exploration: 0.0,
sort: Some(Sort::New), // Chronological
is_builtin: true,
})?;
// related
registry.register(RankingProfile {
name: "related".into(),
version: 1,
candidate_strategy: CandidateStrategy::Ann {
slot: "default".into(), // Source item embedding
limit: 100,
},
boosts: vec![],
decay: None,
gates: vec![],
penalties: vec![],
excludes: vec![],
diversity: DiversitySpec {
max_per_creator: Some(2),
format_mix_max_fraction: None,
},
exploration: 0.0,
sort: None, // Custom scoring via score_with_context
is_builtin: true,
})?;
// notification
registry.register(RankingProfile {
name: "notification".into(),
version: 1,
candidate_strategy: CandidateStrategy::Relationship,
boosts: vec![],
decay: None,
gates: vec![],
penalties: vec![],
excludes: vec![],
diversity: DiversitySpec::default(),
exploration: 0.0,
sort: None, // Custom scoring via score_with_context
is_builtin: true,
})?;
Ok(())
}
ProfileExecutor Extension
impl<'a> ProfileExecutor<'a> {
/// Score candidates with user context for personalized ranking.
///
/// When `user_ctx` is provided, the executor uses personalized scoring
/// functions. The profile name determines which scoring formula is used:
/// - `for_you`: preference match + engagement + recency + affinity
/// - `related`: item similarity + preference match + engagement
/// - `notification`: relationship strength + item quality
/// - All others: delegates to `score()` (population-level)
pub fn score_with_context(
&self,
candidates: &[EntityId],
profile: &RankingProfile,
now: Timestamp,
user_ctx: &UserContext,
item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
item_created_at: &dyn Fn(EntityId) -> Option<u64>,
) -> Vec<ScoredCandidate> {
let profile_name = profile.name.as_str();
let mut scored: Vec<ScoredCandidate> = candidates
.iter()
.filter(|&&eid| passes_gates(eid, &profile.gates, self.ledger))
.map(|&entity_id| {
let raw = match profile_name {
"for_you" => self.score_for_you(entity_id, user_ctx, now, item_embeddings, item_created_at),
"related" => self.score_related(entity_id, user_ctx, item_embeddings),
"notification" => self.score_notification(entity_id, user_ctx),
_ => self.compute_raw_score(entity_id, profile, now),
};
ScoredCandidate {
entity_id,
score: raw,
signal_snapshot: vec![],
creator_id: None,
format: None,
}
})
.collect();
scored.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
normalize(&mut scored);
scored
}
fn score_for_you(
&self,
entity_id: EntityId,
user_ctx: &UserContext,
now: Timestamp,
item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
item_created_at: &dyn Fn(EntityId) -> Option<u64>,
) -> f64 {
let item_emb = item_embeddings(entity_id);
let pref_match = preference_match(user_ctx, item_emb.as_deref());
let view_vel = read_agg(entity_id, "view", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
let share_vel = read_agg(entity_id, "share", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
let engagement_vel = (view_vel + 2.0 * share_vel).min(1.0);
let recency = item_created_at(entity_id)
.map_or(0.5, |ts| recency_score(ts, now));
let creator_id = None; // Read from metadata in actual implementation
let affinity = creator_affinity(user_ctx, creator_id);
for_you_score(pref_match, engagement_vel, recency, affinity)
}
fn score_related(
&self,
entity_id: EntityId,
user_ctx: &UserContext,
item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
) -> f64 {
let item_emb = item_embeddings(entity_id);
let pref_match = preference_match(user_ctx, item_emb.as_deref());
let views = read_agg(entity_id, "view", &SignalAgg::Value, Window::AllTime, self.ledger);
let likes = read_agg(entity_id, "like", &SignalAgg::Value, Window::AllTime, self.ledger);
// Cap at 1.0: very large counts would otherwise push the sum above 10.
let engagement = ((views.log10().max(0.0) + likes.log10().max(0.0)) / 10.0).min(1.0);
// item_similarity is computed by the caller from ANN distances.
// For now, use preference match as a proxy.
let item_similarity = pref_match;
related_score(item_similarity, pref_match, engagement)
}
fn score_notification(
&self,
entity_id: EntityId,
user_ctx: &UserContext,
) -> f64 {
let creator_id = None; // Read from metadata
let rel_strength = creator_affinity(user_ctx, creator_id);
let view_vel = read_agg(entity_id, "view", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
let completion = read_agg(entity_id, "completion", &SignalAgg::DecayScore, Window::AllTime, self.ledger);
// Clamp so the composite stays in [0, 1] regardless of signal magnitude.
let item_quality = ((view_vel.log10().max(0.0) + completion) / 2.0).clamp(0.0, 1.0);
notification_score(rel_strength, item_quality)
}
}
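score_related above uses preference match as a proxy for item_similarity. If the candidate phase exposes ANN distances, they could be converted to similarities and looked up during scoring. The sketch below assumes the index reports cosine distance as 1 - cosine similarity (a value in [0, 2]); that convention, the `(item_id, distance)` hit shape, and both function names are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Convert a cosine distance (assumed to be 1 - cosine, so in [0, 2])
/// into a similarity in [0, 1]. This equals (cosine + 1) / 2.
fn similarity_from_cosine_distance(distance: f32) -> f64 {
    let d = f64::from(distance).clamp(0.0, 2.0);
    1.0 - d / 2.0
}

/// Build a lookup from hypothetical ANN hits of the form (item_id, distance)
/// so the scoring step can read item_similarity per candidate.
fn build_similarity_map(ann_hits: &[(u64, f32)]) -> HashMap<u64, f64> {
    ann_hits
        .iter()
        .map(|&(item_id, distance)| (item_id, similarity_from_cosine_distance(distance)))
        .collect()
}
```

A candidate missing from the map could fall back to the neutral 0.5, matching how preference_match handles missing data.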
Test Strategy
Unit Tests
#[test]
fn preference_match_identical_vectors() {
let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
let ctx = UserContext {
user_id: EntityId::new(1),
preference_vector: Some(pref),
top_creators: vec![],
followed_creators: HashSet::new(),
blocked_creators: HashSet::new(),
hidden_items: HashSet::new(),
is_cold_start: false,
};
let item = [1.0f32, 0.0, 0.0];
let score = preference_match(&ctx, Some(&item));
assert!((score - 1.0).abs() < 0.01, "identical vectors: {}", score);
}
#[test]
fn preference_match_orthogonal_vectors() {
let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
let ctx = UserContext {
user_id: EntityId::new(1),
preference_vector: Some(pref),
..cold_start_context()
};
let item = [0.0f32, 1.0, 0.0];
let score = preference_match(&ctx, Some(&item));
assert!((score - 0.5).abs() < 0.01, "orthogonal: {}", score);
}
#[test]
fn preference_match_opposite_vectors() {
let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
let ctx = UserContext {
user_id: EntityId::new(1),
preference_vector: Some(pref),
..cold_start_context()
};
let item = [-1.0f32, 0.0, 0.0];
let score = preference_match(&ctx, Some(&item));
assert!((score - 0.0).abs() < 0.01, "opposite: {}", score);
}
#[test]
fn preference_match_cold_start_returns_neutral() {
let ctx = cold_start_context();
let item = [1.0f32, 0.0, 0.0];
let score = preference_match(&ctx, Some(&item));
assert!((score - 0.5).abs() < f64::EPSILON);
}
#[test]
fn creator_affinity_zero_for_no_interaction() {
let ctx = cold_start_context();
let score = creator_affinity(&ctx, Some(EntityId::new(10)));
assert!((score - 0.0).abs() < f64::EPSILON);
}
#[test]
fn creator_affinity_saturates() {
let ctx = UserContext {
user_id: EntityId::new(1),
top_creators: vec![(EntityId::new(10), 100.0)],
..cold_start_context()
};
let score = creator_affinity(&ctx, Some(EntityId::new(10)));
// weight=100, k=5: 100/(100+5) = 0.952
assert!(score > 0.9, "high affinity: {}", score);
}
#[test]
fn recency_score_now_is_one() {
let now = Timestamp::now();
let score = recency_score(now.as_nanos(), now);
assert!((score - 1.0).abs() < 0.01);
}
#[test]
fn recency_score_48h_is_half() {
let now = Timestamp::now();
let forty_eight_hours_ago = now.as_nanos() - 48 * 3600 * 1_000_000_000;
let score = recency_score(forty_eight_hours_ago, now);
assert!((score - 0.5).abs() < 0.05, "48h recency: {}", score);
}
#[test]
fn for_you_score_range() {
let score = for_you_score(1.0, 1.0, 1.0, 1.0);
assert!((score - 1.0).abs() < f64::EPSILON);
let score_zero = for_you_score(0.0, 0.0, 0.0, 0.0);
assert!((score_zero - 0.0).abs() < f64::EPSILON);
}
#[test]
fn related_score_range() {
let score = related_score(1.0, 1.0, 1.0);
assert!((score - 1.0).abs() < f64::EPSILON);
}
#[test]
fn notification_score_range() {
let score = notification_score(1.0, 1.0);
assert!((score - 1.0).abs() < f64::EPSILON);
}
fn cold_start_context() -> UserContext {
UserContext {
user_id: EntityId::new(1),
preference_vector: None,
top_creators: vec![],
followed_creators: HashSet::new(),
blocked_creators: HashSet::new(),
hidden_items: HashSet::new(),
is_cold_start: true,
}
}
Property Tests
use proptest::prelude::*;
proptest! {
#[test]
fn preference_match_always_in_unit_range(
pref_vec in proptest::collection::vec(-1.0f32..1.0, 16),
item_vec in proptest::collection::vec(-1.0f32..1.0, 16),
) {
if let Some(pref) = PreferenceVector::from_embedding(pref_vec, 16) {
let ctx = UserContext {
user_id: EntityId::new(1),
preference_vector: Some(pref),
..cold_start_context()
};
let score = preference_match(&ctx, Some(&item_vec));
prop_assert!((0.0..=1.0).contains(&score),
"preference match out of range: {}", score);
}
}
#[test]
fn for_you_score_always_in_unit_range(
pm in 0.0f64..1.0,
ev in 0.0f64..1.0,
r in 0.0f64..1.0,
a in 0.0f64..1.0,
) {
let score = for_you_score(pm, ev, r, a);
prop_assert!((0.0..=1.0).contains(&score),
"for_you score out of range: {}", score);
}
#[test]
fn creator_affinity_always_in_unit_range(
weight in 0.0f64..1000.0,
) {
let ctx = UserContext {
user_id: EntityId::new(1),
top_creators: vec![(EntityId::new(10), weight)],
..cold_start_context()
};
let score = creator_affinity(&ctx, Some(EntityId::new(10)));
prop_assert!((0.0..=1.0).contains(&score),
"creator affinity out of range: {}", score);
}
}
Acceptance Criteria
- preference_match returns cosine similarity remapped to [0, 1], neutral 0.5 for missing data
- creator_affinity returns sigmoid-normalized interaction weight in [0, 1]
- recency_score returns exponential decay with 48h half-life
- for_you_score combines four factors with weights summing to 1.0
- related_score combines three factors with weights summing to 1.0
- notification_score combines two factors with weights summing to 1.0
- All scoring functions return values in [0.0, 1.0] (property tested)
- for_you profile registered as builtin with correct configuration
- following profile registered with Sort::New and CandidateStrategy::Relationship
- related profile registered with ANN candidate strategy
- notification profile registered with relationship-based candidates
- ProfileExecutor::score_with_context dispatches to correct scoring function by profile name
- Cold-start users get neutral scores (0.5 preference match, 0.0 affinity)
- cargo clippy -- -D warnings passes
- All tests pass
Research References
- docs/research/ann_for_tidaldb.md -- Cosine similarity via dot product on unit vectors
- VISION.md -- Ranking profile formulas
- USE_CASES.md -- UC-01, UC-04, UC-05, UC-07
Implementation Notes
- The scoring functions are intentionally simple linear combinations. The weights (0.4/0.3/0.2/0.1 for for_you) are starting points that can be tuned without code changes if the profile system is extended to accept configurable weights. For M3, hardcoded weights are sufficient.
- creator_affinity uses a sigmoid-like w/(w+k) transformation instead of raw weight. This bounds the output to [0, 1] and prevents high-weight creators from completely dominating the score. The half-saturation constant k=5.0 means a weight of 5 produces affinity 0.5.
- For the related profile, the item_similarity should ideally come from the ANN distance between the source item and candidate item. In this task, we use preference match as a proxy. The full implementation should pipe ANN distances through from the candidate retrieval phase.
- The following profile uses Sort::New from the existing sort system. No custom scoring is needed -- the executor's existing score_by_sort handles chronological ordering.
- The notification profile's CandidateStrategy::Relationship means candidates are sourced from followed creators' items. The RETRIEVE executor must implement this candidate sourcing strategy, which uses the FollowsBitmap from m3p1 Task 03.
- Do NOT implement the exploration budget injection in this task. The exploration: 0.1 field on the for_you profile is defined here but not enforced. Enforcement is done in Task 03 (Cold Start and Exploration).
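To make the half-saturation behavior of the w/(w+k) transformation concrete, the standalone sketch below evaluates it at a few weights. The free function `affinity` is a hypothetical simplification of creator_affinity that takes the weight directly instead of reading it from UserContext, and it assumes k > 0.

```rust
/// Saturating transform w / (w + k), with negative weights treated as 0.
/// Assumes k > 0.
fn affinity(weight: f64, k: f64) -> f64 {
    let w = weight.max(0.0);
    w / (w + k)
}
```

With k = 5.0, a weight of 0 maps to 0.0, a weight of 5 maps to 0.5, a weight of 45 maps to 0.9, and a weight of 100 maps to roughly 0.952, so each additional interaction contributes less once a creator is already well established.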