tidaldb/docs/planning/milestone-3/phase-3/task-02-personalized-profiles.md

# Task 02: Personalized Profiles

## Context

**Milestone:** 3 -- Personalized Ranking
**Phase:** m3p3 -- Personalized Ranking Profiles
**Depends On:** Task 01 (`UserContext` with preference vector, interaction weights, follows), m2p3 (profile engine, `ProfileExecutor`, `RankingProfile`), m2p1 (vector index for ANN retrieval), m2p4 (diversity enforcement)
**Blocks:** Task 03 (Cold Start and Exploration needs `for_you` profile to inject exploration candidates)
**Complexity:** L

## Objective

Deliver four personalized ranking profiles: `for_you`, `following`, `related`, and `notification`. These profiles are registered as builtins in the `ProfileRegistry` alongside the existing M2 profiles (trending, hot, new, etc.). Each profile uses the `UserContext` from Task 01 to personalize candidate retrieval and scoring.

The `ProfileExecutor` is extended with a `score_with_context` method that takes a `UserContext`. With user context available, the executor applies personalization factors: preference match (cosine similarity between user and item embeddings), creator affinity (interaction weight boost), and social proof (engagement from followed creators).

These four profiles cover UC-01 (For You Feed), UC-04 (Following Feed), UC-05 (Related/Up Next), and UC-07 (Notifications).

## Requirements

### for_you Profile
- Candidate strategy: ANN retrieval using user preference vector (top 200 candidates)
- Scoring formula: `preference_match * 0.4 + engagement_velocity * 0.3 + recency * 0.2 + creator_affinity * 0.1`
- `preference_match`: cosine similarity between user preference vector and item embedding
- `engagement_velocity`: normalized view + share velocity from signal ledger
- `recency`: exponential decay from item age (half-life 48h)
- `creator_affinity`: interaction weight between user and item's creator, normalized to [0, 1]
- Gate: `completion_rate` > 0.02 (filters out very low-quality items)
- Diversity: `max_per_creator` from the query (the builtin definition sets 2), `format_mix` 0.6
- Exploration: 10% budget (injected by Task 03)
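
As a worked illustration of the gate and formula above, the following minimal sketch scores a single candidate from inputs that are already normalized. The `Candidate` struct, its field names, and `gated_for_you_score` are hypothetical and exist only for this example; the weights and the 0.02 threshold come from the requirements.

```rust
/// Hypothetical, already-normalized inputs for one candidate (illustration only).
struct Candidate {
    preference_match: f64,    // remapped cosine, in [0, 1]
    engagement_velocity: f64, // normalized, in [0, 1]
    recency: f64,             // exponential decay, in (0, 1]
    creator_affinity: f64,    // normalized, in [0, 1]
    completion_rate: f64,     // fraction of views that completed
}

/// Returns `None` when the quality gate rejects the candidate,
/// otherwise the weighted for_you score.
fn gated_for_you_score(c: &Candidate) -> Option<f64> {
    // Gate from the requirements: completion_rate must exceed 0.02.
    if c.completion_rate <= 0.02 {
        return None;
    }
    Some(
        c.preference_match * 0.4
            + c.engagement_velocity * 0.3
            + c.recency * 0.2
            + c.creator_affinity * 0.1,
    )
}
```

With a preference match of 0.8 and the other three factors at 0.5, the score works out to 0.32 + 0.15 + 0.10 + 0.05 = 0.62.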

### following Profile
- Candidate strategy: relationship-based (items from followed creators only)
- Scoring: `created_at` DESC (chronological), with tiebreaker on `completion_rate`
- No engagement-based scoring -- chronological is the default for following feeds
- No diversity enforcement (creator identity IS the filter)
- No exploration budget
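
The ordering rule above (newest first, ties broken by completion rate) can be sketched as a plain comparator. This is only an illustration of the intended semantics: `FollowedItem` and `sort_following` are hypothetical names, and the real ordering is expected to come from the existing sort system.

```rust
/// Hypothetical minimal view of a followed creator's item (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct FollowedItem {
    id: u64,
    created_at_ns: u64,
    completion_rate: f64,
}

/// Orders items newest first; equal timestamps fall back to the higher
/// completion rate.
fn sort_following(items: &mut [FollowedItem]) {
    items.sort_by(|a, b| {
        b.created_at_ns
            .cmp(&a.created_at_ns)
            // `total_cmp` is a total order over f64, so NaN cannot cause a panic.
            .then_with(|| b.completion_rate.total_cmp(&a.completion_rate))
    });
}
```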

### related Profile
- Candidate strategy: ANN retrieval using source item embedding (top 100 candidates)
- Scoring: `item_similarity * 0.5 + preference_match * 0.3 + engagement * 0.2`
- `item_similarity`: cosine between source item and candidate item embeddings
- `preference_match`: cosine between user preference vector and candidate, if available
- `engagement`: normalized population signals (view count, like count)
- Filter: exclude the source item itself
- Diversity: max_per_creator:2
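
The source-item filter and the per-creator cap above could behave roughly as follows. This is a sketch of the semantics only, under the assumption that the cap is applied after sorting by score; the actual diversity enforcement lives in m2p4, and `RelatedCandidate` and `filter_related` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical scored candidate for the `related` profile (illustration only).
#[derive(Debug, Clone, Copy)]
struct RelatedCandidate {
    item_id: u64,
    creator_id: u64,
    score: f64,
}

/// Drops the source item, sorts by descending score, then keeps at most
/// `max_per_creator` items per creator.
fn filter_related(
    mut candidates: Vec<RelatedCandidate>,
    source_item: u64,
    max_per_creator: usize,
) -> Vec<RelatedCandidate> {
    candidates.retain(|c| c.item_id != source_item);
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));

    // Count how many items each creator has contributed so far.
    let mut seen: HashMap<u64, usize> = HashMap::new();
    candidates
        .into_iter()
        .filter(|c| {
            let count = seen.entry(c.creator_id).or_insert(0);
            *count += 1;
            *count <= max_per_creator
        })
        .collect()
}
```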

### notification Profile
- Candidate strategy: scan recent items from followed creators (last 48h)
- Scoring: `relationship_strength * 0.6 + item_quality * 0.4`
- `relationship_strength`: interaction weight between user and creator, normalized
- `item_quality`: composite of view velocity + completion rate
- Filter: only items from followed creators, created within 48h
- Sort: descending by score
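
The candidate filter above (followed creators only, created within 48 hours) can be sketched as below. `RecentItem` and `notification_candidates` are hypothetical names, and a plain `HashSet<u64>` stands in for whatever follow-set representation the executor actually uses.

```rust
use std::collections::HashSet;

/// 48 hours expressed in nanoseconds.
const WINDOW_NS: u64 = 48 * 3600 * 1_000_000_000;

/// Hypothetical minimal item record (illustration only).
struct RecentItem {
    item_id: u64,
    creator_id: u64,
    created_at_ns: u64,
}

/// Keeps items from followed creators that were created within the last 48 hours.
fn notification_candidates<'a>(
    items: &'a [RecentItem],
    followed: &HashSet<u64>,
    now_ns: u64,
) -> Vec<&'a RecentItem> {
    // Saturating subtraction avoids underflow when `now_ns` is small.
    let cutoff = now_ns.saturating_sub(WINDOW_NS);
    items
        .iter()
        .filter(|i| followed.contains(&i.creator_id) && i.created_at_ns >= cutoff)
        .collect()
}
```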

## Technical Design

### Module Structure

```
tidal/src/
  ranking/
    personalized.rs -- Personalized scoring functions
    builtins.rs     -- Extended with new profile definitions
```

### Personalized Scoring Functions

```rust
// === ranking/personalized.rs ===

use crate::db::user_context::UserContext;
use crate::schema::{EntityId, Timestamp, Window};
use crate::signals::SignalLedger;

/// Compute the preference match score between a user and an item.
///
/// Computes the dot product of the two vectors, which equals cosine
/// similarity in [-1.0, 1.0] when both are unit-normalized, and remaps it
/// to [0.0, 1.0]. The dot product is clamped first so that rounding error
/// or a non-unit input cannot push the result out of range.
/// Returns 0.5 (neutral) if either vector is unavailable or the
/// dimensions differ.
pub fn preference_match(
    user_ctx: &UserContext,
    item_embedding: Option<&[f32]>,
) -> f64 {
    match (&user_ctx.preference_vector, item_embedding) {
        (Some(pref), Some(item)) => {
            if let Some(pref_data) = pref.as_slice() {
                if pref_data.len() == item.len() {
                    let dot: f64 = pref_data.iter()
                        .zip(item.iter())
                        .map(|(&a, &b)| f64::from(a) * f64::from(b))
                        .sum();
                    // Clamp to the cosine range, then remap [-1, 1] to [0, 1].
                    (dot.clamp(-1.0, 1.0) + 1.0) / 2.0
                } else {
                    0.5 // Dimension mismatch: neutral score
                }
            } else {
                0.5 // Cold start: neutral score
            }
        }
        _ => 0.5, // Missing data: neutral score
    }
}

/// Compute the creator affinity score for a user-creator pair.
///
/// Normalizes the interaction weight to [0.0, 1.0] using a sigmoid-like
/// saturating transformation: `affinity = weight / (weight + k)` where k is
/// a half-saturation constant (5.0). Negative weights are treated as 0.0 so
/// the denominator stays positive.
pub fn creator_affinity(
    user_ctx: &UserContext,
    creator_id: Option<EntityId>,
) -> f64 {
    const K: f64 = 5.0; // Half-saturation constant
    match creator_id {
        Some(cid) => {
            let weight = user_ctx.interaction_weight(cid).max(0.0);
            weight / (weight + K)
        }
        None => 0.0, // Unknown creator: no affinity
    }
}

/// Compute a recency score based on item age.
///
/// Uses exponential decay with a 48-hour half-life.
/// Items created at `now` get score 1.0; items 48h old get 0.5.
pub fn recency_score(
    created_at_ns: u64,
    now: Timestamp,
) -> f64 {
    let now_ns = now.as_nanos();
    if created_at_ns >= now_ns {
        return 1.0;
    }
    let age_secs = (now_ns - created_at_ns) as f64 / 1_000_000_000.0;
    let half_life_secs = 48.0 * 3600.0;
    let lambda = std::f64::consts::LN_2 / half_life_secs;
    (-lambda * age_secs).exp()
}

/// Composite for_you score for a single candidate.
pub fn for_you_score(
    pref_match: f64,
    engagement_vel: f64,
    recency: f64,
    affinity: f64,
) -> f64 {
    pref_match * 0.4 + engagement_vel * 0.3 + recency * 0.2 + affinity * 0.1
}

/// Composite related score for a single candidate.
pub fn related_score(
    item_similarity: f64,
    pref_match: f64,
    engagement: f64,
) -> f64 {
    item_similarity * 0.5 + pref_match * 0.3 + engagement * 0.2
}

/// Composite notification score for a single candidate.
pub fn notification_score(
    relationship_strength: f64,
    item_quality: f64,
) -> f64 {
    relationship_strength * 0.6 + item_quality * 0.4
}
```
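
To make the two curves concrete, here is a small standalone sketch that re-implements the saturation and decay formulas without `UserContext` or `Timestamp`. The function names `affinity` and `recency` are hypothetical, and `recency` takes the age in hours purely for readability; `0.5^(age / half_life)` is algebraically the same as `exp(-ln 2 * age / half_life)`.

```rust
/// Saturating affinity `w / (w + k)` with k = 5, as in the formula above.
fn affinity(weight: f64) -> f64 {
    const K: f64 = 5.0;
    let w = weight.max(0.0); // treat negative weights as zero
    w / (w + K)
}

/// Exponential decay with a 48-hour half-life, taking the item age in hours.
fn recency(age_hours: f64) -> f64 {
    const HALF_LIFE_HOURS: f64 = 48.0;
    0.5_f64.powf(age_hours.max(0.0) / HALF_LIFE_HOURS)
}
```

Under these definitions, weights of 0, 5, and 100 map to affinities of 0, 0.5, and 100/105 (about 0.952), and ages of 0h, 48h, and 96h map to recency scores of 1, 0.5, and 0.25.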

### Profile Definitions

```rust
// === ranking/builtins.rs (extensions) ===

/// Register the personalized profiles.
pub fn register_personalized_builtins(registry: &mut ProfileRegistry) -> crate::Result<()> {
    // for_you
    registry.register(RankingProfile {
        name: "for_you".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Ann {
            slot: "user_preference".into(),
            limit: 200,
        },
        boosts: vec![],
        decay: None,
        gates: vec![Gate {
            signal: "completion".into(),
            agg: SignalAgg::Ratio,
            window: Window::AllTime,
            min_threshold: 0.02,
        }],
        penalties: vec![Penalty {
            signal: "skip".into(),
            agg: SignalAgg::Value,
            window: Window::TwentyFourHours,
            weight: 0.1,
        }],
        excludes: vec![],
        diversity: DiversitySpec {
            max_per_creator: Some(2),
            format_mix_max_fraction: Some(0.6),
        },
        exploration: 0.1, // 10%
        sort: None, // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    // following
    registry.register(RankingProfile {
        name: "following".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Relationship,
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec::default(),
        exploration: 0.0,
        sort: Some(Sort::New), // Chronological
        is_builtin: true,
    })?;

    // related
    registry.register(RankingProfile {
        name: "related".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Ann {
            slot: "default".into(), // Source item embedding
            limit: 100,
        },
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec {
            max_per_creator: Some(2),
            format_mix_max_fraction: None,
        },
        exploration: 0.0,
        sort: None, // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    // notification
    registry.register(RankingProfile {
        name: "notification".into(),
        version: 1,
        candidate_strategy: CandidateStrategy::Relationship,
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: DiversitySpec::default(),
        exploration: 0.0,
        sort: None, // Custom scoring via score_with_context
        is_builtin: true,
    })?;

    Ok(())
}
```

### ProfileExecutor Extension

```rust
impl<'a> ProfileExecutor<'a> {
    /// Score candidates with user context for personalized ranking.
    ///
    /// Candidates that fail the profile's gates are dropped before scoring.
    /// The profile name determines which scoring formula is used:
    /// - `for_you`: preference match + engagement + recency + affinity
    /// - `related`: item similarity + preference match + engagement
    /// - `notification`: relationship strength + item quality
    /// - All others: falls back to population-level scoring via
    ///   `compute_raw_score`
    pub fn score_with_context(
        &self,
        candidates: &[EntityId],
        profile: &RankingProfile,
        now: Timestamp,
        user_ctx: &UserContext,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
        item_created_at: &dyn Fn(EntityId) -> Option<u64>,
    ) -> Vec<ScoredCandidate> {
        let profile_name = profile.name.as_str();

        let mut scored: Vec<ScoredCandidate> = candidates
            .iter()
            .filter(|&&eid| passes_gates(eid, &profile.gates, self.ledger))
            .map(|&entity_id| {
                let raw = match profile_name {
                    "for_you" => self.score_for_you(entity_id, user_ctx, now, item_embeddings, item_created_at),
                    "related" => self.score_related(entity_id, user_ctx, item_embeddings),
                    "notification" => self.score_notification(entity_id, user_ctx),
                    _ => self.compute_raw_score(entity_id, profile, now),
                };
                ScoredCandidate {
                    entity_id,
                    score: raw,
                    signal_snapshot: vec![],
                    creator_id: None,
                    format: None,
                }
            })
            .collect();

        scored.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
        normalize(&mut scored);
        scored
    }

    fn score_for_you(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
        now: Timestamp,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
        item_created_at: &dyn Fn(EntityId) -> Option<u64>,
    ) -> f64 {
        let item_emb = item_embeddings(entity_id);
        let pref_match = preference_match(user_ctx, item_emb.as_deref());

        let view_vel = read_agg(entity_id, "view", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        let share_vel = read_agg(entity_id, "share", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        // Shares count double; the sum is clamped so the factor stays in [0, 1].
        let engagement_vel = (view_vel + 2.0 * share_vel).clamp(0.0, 1.0);

        let recency = item_created_at(entity_id)
            .map_or(0.5, |ts| recency_score(ts, now));

        let creator_id = None; // Read from metadata in actual implementation
        let affinity = creator_affinity(user_ctx, creator_id);

        for_you_score(pref_match, engagement_vel, recency, affinity)
    }

    fn score_related(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
        item_embeddings: &dyn Fn(EntityId) -> Option<Vec<f32>>,
    ) -> f64 {
        let item_emb = item_embeddings(entity_id);
        let pref_match = preference_match(user_ctx, item_emb.as_deref());

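        let views = read_agg(entity_id, "view", &SignalAgg::Value, Window::AllTime, self.ledger);
        let likes = read_agg(entity_id, "like", &SignalAgg::Value, Window::AllTime, self.ledger);
        // log10 of a zero count is -inf, so each term is floored at 0.0;
        // the result is capped at 1.0 to keep the factor in [0, 1].
        let engagement = ((views.log10().max(0.0) + likes.log10().max(0.0)) / 10.0).min(1.0);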

        // item_similarity is computed by the caller from ANN distances.
        // For now, use preference match as a proxy.
        let item_similarity = pref_match;

        related_score(item_similarity, pref_match, engagement)
    }

    fn score_notification(
        &self,
        entity_id: EntityId,
        user_ctx: &UserContext,
    ) -> f64 {
        let creator_id = None; // Read from metadata
        let rel_strength = creator_affinity(user_ctx, creator_id);

        let view_vel = read_agg(entity_id, "view", &SignalAgg::Velocity, Window::TwentyFourHours, self.ledger);
        let completion = read_agg(entity_id, "completion", &SignalAgg::DecayScore, Window::AllTime, self.ledger);
        // Clamp so the composite stays in [0, 1] even for very high velocities.
        let item_quality = ((view_vel.log10().max(0.0) + completion) / 2.0).clamp(0.0, 1.0);

        notification_score(rel_strength, item_quality)
    }
}
```
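
`score_with_context` ends by sorting and then calling `normalize`, which this task does not define. As a rough sketch of what that final step might look like, assuming min-max scaling (an assumption made for this example, not something the task specifies), the following operates on bare `f64` scores. `sort_and_min_max` is a hypothetical name.

```rust
/// Sorts scores in descending order and rescales them to [0, 1] by min-max.
/// Assumption: min-max scaling is one plausible behavior for a normalize
/// step; the real helper may differ.
fn sort_and_min_max(scores: &mut [f64]) {
    // `total_cmp` is a total order on f64, so the sort never panics on NaN.
    scores.sort_by(|a, b| b.total_cmp(a));

    // After a descending sort, the first element is the max and the last the min.
    let (Some(&max), Some(&min)) = (scores.first(), scores.last()) else {
        return; // Empty input: nothing to rescale.
    };
    let span = max - min;
    for s in scores.iter_mut() {
        // All-equal scores would divide by zero; map them to 1.0 instead.
        *s = if span > 0.0 { (*s - min) / span } else { 1.0 };
    }
}
```

Using `total_cmp` here is an alternative to `partial_cmp(..).unwrap_or(Ordering::Equal)`: it yields a deterministic order even when a score is NaN.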

## Test Strategy

### Unit Tests

```rust
#[test]
fn preference_match_identical_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        top_creators: vec![],
        followed_creators: HashSet::new(),
        blocked_creators: HashSet::new(),
        hidden_items: HashSet::new(),
        is_cold_start: false,
    };
    let item = [1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 1.0).abs() < 0.01, "identical vectors: {}", score);
}

#[test]
fn preference_match_orthogonal_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        ..cold_start_context()
    };
    let item = [0.0f32, 1.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.5).abs() < 0.01, "orthogonal: {}", score);
}

#[test]
fn preference_match_opposite_vectors() {
    let pref = PreferenceVector::from_embedding(vec![1.0, 0.0, 0.0], 3).unwrap();
    let ctx = UserContext {
        user_id: EntityId::new(1),
        preference_vector: Some(pref),
        ..cold_start_context()
    };
    let item = [-1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.0).abs() < 0.01, "opposite: {}", score);
}

#[test]
fn preference_match_cold_start_returns_neutral() {
    let ctx = cold_start_context();
    let item = [1.0f32, 0.0, 0.0];
    let score = preference_match(&ctx, Some(&item));
    assert!((score - 0.5).abs() < f64::EPSILON);
}

#[test]
fn creator_affinity_zero_for_no_interaction() {
    let ctx = cold_start_context();
    let score = creator_affinity(&ctx, Some(EntityId::new(10)));
    assert!((score - 0.0).abs() < f64::EPSILON);
}

#[test]
fn creator_affinity_saturates() {
    let ctx = UserContext {
        user_id: EntityId::new(1),
        top_creators: vec![(EntityId::new(10), 100.0)],
        ..cold_start_context()
    };
    let score = creator_affinity(&ctx, Some(EntityId::new(10)));
    // weight=100, k=5: 100/(100+5) = 0.952
    assert!(score > 0.9, "high affinity: {}", score);
}

#[test]
fn recency_score_now_is_one() {
    let now = Timestamp::now();
    let score = recency_score(now.as_nanos(), now);
    assert!((score - 1.0).abs() < 0.01);
}

#[test]
fn recency_score_48h_is_half() {
    let now = Timestamp::now();
    let forty_eight_hours_ago = now.as_nanos() - 48 * 3600 * 1_000_000_000;
    let score = recency_score(forty_eight_hours_ago, now);
    assert!((score - 0.5).abs() < 0.05, "48h recency: {}", score);
}

#[test]
fn for_you_score_range() {
    let score = for_you_score(1.0, 1.0, 1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);
    let score_zero = for_you_score(0.0, 0.0, 0.0, 0.0);
    assert!((score_zero - 0.0).abs() < f64::EPSILON);
}

#[test]
fn related_score_range() {
    let score = related_score(1.0, 1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);
}

#[test]
fn notification_score_range() {
    let score = notification_score(1.0, 1.0);
    assert!((score - 1.0).abs() < f64::EPSILON);
}

fn cold_start_context() -> UserContext {
    UserContext {
        user_id: EntityId::new(1),
        preference_vector: None,
        top_creators: vec![],
        followed_creators: HashSet::new(),
        blocked_creators: HashSet::new(),
        hidden_items: HashSet::new(),
        is_cold_start: true,
    }
}
```

### Property Tests

```rust
use proptest::prelude::*;

proptest! {
    #[test]
    fn preference_match_always_in_unit_range(
        pref_vec in proptest::collection::vec(-1.0f32..1.0, 16),
        item_vec in proptest::collection::vec(-1.0f32..1.0, 16),
    ) {
        if let Some(pref) = PreferenceVector::from_embedding(pref_vec, 16) {
            let ctx = UserContext {
                user_id: EntityId::new(1),
                preference_vector: Some(pref),
                ..cold_start_context()
            };
            let score = preference_match(&ctx, Some(&item_vec));
            prop_assert!(score >= 0.0 && score <= 1.0,
                "preference match out of range: {}", score);
        }
    }

    #[test]
    fn for_you_score_always_in_unit_range(
        pm in 0.0f64..1.0,
        ev in 0.0f64..1.0,
        r in 0.0f64..1.0,
        a in 0.0f64..1.0,
    ) {
        let score = for_you_score(pm, ev, r, a);
        prop_assert!(score >= 0.0 && score <= 1.0,
            "for_you score out of range: {}", score);
    }

    #[test]
    fn creator_affinity_always_in_unit_range(
        weight in 0.0f64..1000.0,
    ) {
        let ctx = UserContext {
            user_id: EntityId::new(1),
            top_creators: vec![(EntityId::new(10), weight)],
            ..cold_start_context()
        };
        let score = creator_affinity(&ctx, Some(EntityId::new(10)));
        prop_assert!(score >= 0.0 && score <= 1.0,
            "creator affinity out of range: {}", score);
    }
}
```

## Acceptance Criteria

- [ ] `preference_match` returns cosine similarity remapped to [0, 1], neutral 0.5 for missing data
- [ ] `creator_affinity` returns sigmoid-normalized interaction weight in [0, 1]
- [ ] `recency_score` returns exponential decay with 48h half-life
- [ ] `for_you_score` combines four factors with weights summing to 1.0
- [ ] `related_score` combines three factors with weights summing to 1.0
- [ ] `notification_score` combines two factors with weights summing to 1.0
- [ ] All scoring functions return values in [0.0, 1.0] (property tested)
- [ ] `for_you` profile registered as builtin with correct configuration
- [ ] `following` profile registered with `Sort::New` and `CandidateStrategy::Relationship`
- [ ] `related` profile registered with ANN candidate strategy
- [ ] `notification` profile registered with relationship-based candidates
- [ ] `ProfileExecutor::score_with_context` dispatches to correct scoring function by profile name
- [ ] Cold-start users get neutral scores (0.5 preference match, 0.0 affinity)
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All tests pass

## Research References

- [docs/research/ann_for_tidaldb.md](../../../research/ann_for_tidaldb.md) -- Cosine similarity via dot product on unit vectors
- [VISION.md](../../../../VISION.md) -- Ranking profile formulas
- [USE_CASES.md](../../../../USE_CASES.md) -- UC-01, UC-04, UC-05, UC-07

## Implementation Notes

- The scoring functions are intentionally simple linear combinations. The weights (0.4/0.3/0.2/0.1 for for_you) are starting points. They are hardcoded for M3, which is sufficient; tuning them without code changes would require extending the profile system to accept configurable weights.
- `creator_affinity` uses a sigmoid-like `w/(w+k)` transformation instead of raw weight. This bounds the output to [0, 1] and prevents high-weight creators from completely dominating the score. The half-saturation constant `k=5.0` means a weight of 5 produces affinity 0.5.
- For the `related` profile, the `item_similarity` should ideally come from the ANN distance between the source item and candidate item. In this task, we use preference match as a proxy. The full implementation should pipe ANN distances through from the candidate retrieval phase.
- The `following` profile uses `Sort::New` from the existing sort system. No custom scoring is needed -- the executor's existing `score_by_sort` handles chronological ordering.
- The `notification` profile's `CandidateStrategy::Relationship` means candidates are sourced from followed creators' items. The RETRIEVE executor must implement this candidate sourcing strategy, which uses the `FollowsBitmap` from m3p1 Task 03.
- Do NOT implement the exploration budget injection in this task. The `exploration: 0.1` field on the `for_you` profile is defined here but not enforced. Enforcement is done in Task 03 (Cold Start and Exploration).
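
For the `related` note above, piping ANN distances into `item_similarity` might look like the following sketch. It assumes the index reports cosine distance as `1 - cosine` (so values fall in [0, 2]); that convention, the `(u64, f32)` hit shape, and both function names are assumptions made for this example and should be checked against the actual vector index.

```rust
use std::collections::HashMap;

/// Converts a cosine distance (assumed to be `1 - cosine`) into an item
/// similarity in [0, 1], using the same remapping as `preference_match`.
fn similarity_from_cosine_distance(distance: f32) -> f64 {
    let cosine = (1.0 - f64::from(distance)).clamp(-1.0, 1.0);
    (cosine + 1.0) / 2.0
}

/// Builds a lookup from candidate id to item similarity out of ANN hits,
/// given as hypothetical `(item_id, distance)` pairs.
fn similarity_lookup(ann_hits: &[(u64, f32)]) -> HashMap<u64, f64> {
    ann_hits
        .iter()
        .map(|&(id, dist)| (id, similarity_from_cosine_distance(dist)))
        .collect()
}
```

A lookup like this could be handed to the scoring step so that `score_related` reads the true item similarity instead of reusing the preference match.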