tidaldb/docs/planning/milestone-3/phase-3/OVERVIEW.md
jordan 39ada28c6e feat: complete Milestones 2–4 — RETRIEVE query, vector index, ranking profiles, diversity, entity system, sessions
2026-02-21 16:24:48 -07:00


Milestone 3, Phase 3: Personalized Ranking Profiles

Phase Deliverable

Four personalized ranking profiles (for_you, following, related, notification) that incorporate user context into scoring, plus cold-start handling for new users and new items. The query parser accepts a FOR USER @user_id clause, which is resolved into a UserContext carrying the user's preference vector, interaction weights, followed creators, and blocked state. The profile executor uses this context to score candidates with personalization factors.

Before m3p3, all ranking profiles operate on population-level signals (trending, hot, new, etc.). After m3p3, profiles can weight candidates by how well they match the user's learned preferences, boost items from creators the user frequently engages with, and inject exploration candidates from unfollowed creators to prevent filter bubbles.
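
As a rough illustration of the data a resolved UserContext might carry, here is a minimal sketch. The field names, the u64 id types, and the helper methods are assumptions made for this example; the actual definition planned for db/user_context.rs may differ, and "blocked state" is modeled here as a set of blocked creators.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical shape of the per-query user context.
/// All field names and id types are assumptions for illustration.
#[derive(Debug, Clone, Default)]
pub struct UserContext {
    pub user_id: u64,
    /// None for cold-start users with no learned preferences yet.
    pub preference_vector: Option<Vec<f32>>,
    /// Interaction weight per creator id (assumed to be in [0, 1]).
    pub interaction_weights: HashMap<u64, f32>,
    pub followed_creators: HashSet<u64>,
    pub blocked_creators: HashSet<u64>,
}

impl UserContext {
    /// A user without a preference vector is treated as cold-start.
    pub fn is_cold_start(&self) -> bool {
        self.preference_vector.is_none()
    }

    /// Relationship strength toward a creator; unknown creators score 0.0.
    pub fn relationship_strength(&self, creator_id: u64) -> f32 {
        self.interaction_weights
            .get(&creator_id)
            .copied()
            .unwrap_or(0.0)
    }

    /// Items from blocked creators are never eligible for this user.
    pub fn is_eligible(&self, creator_id: u64) -> bool {
        !self.blocked_creators.contains(&creator_id)
    }
}
```

Keeping the preference vector as an Option makes the cold-start case explicit at the type level, so a profile cannot read a missing vector by accident.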

Acceptance Criteria

  • FOR USER @user_id clause parsed by the query parser
  • UserContext loaded from UserStateIndex, InteractionWeightLedger, preference vector
  • for_you profile: ANN retrieval using user preference vector, scoring = preference_match * engagement * recency * social_proof, 10% exploration budget
  • following profile: candidates restricted to followed creators, sorted by created_at DESC
  • related profile: ANN retrieval using source item embedding, re-ranked by user preference
  • notification profile: candidates from followed creators' recent items, scored by relationship_strength * item_quality
  • Cold-start users (no preference vector): fall back to population-level signals (trending/quality)
  • Cold-start items (no signals): exploration window -- appear in ~2% of for_you feeds
  • Exploration budget: ~5 of 50 for_you results from unfollowed creators
  • ProfileExecutor extended with score_with_user_context() method
  • Profile registration: for_you, following, related, notification added to ProfileRegistry as builtins
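
The for_you scoring rule and the cold-start user fallback listed above could be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: the Candidate struct, the use of cosine similarity clamped at zero for preference_match, and trending * quality as the fallback formula are all hypothetical choices, and every factor is assumed to be pre-normalized to [0, 1].

```rust
/// Hypothetical per-candidate inputs; every factor is assumed to be in [0, 1].
#[derive(Debug, Clone)]
pub struct Candidate {
    pub embedding: Vec<f32>,
    pub engagement: f32,
    pub recency: f32,
    pub social_proof: f32,
    pub trending: f32,
    pub quality: f32,
}

/// Cosine similarity; returns 0.0 if either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// for_you score: preference_match * engagement * recency * social_proof.
/// With no preference vector (cold-start user), fall back to
/// population-level signals; trending * quality is an assumed formula.
pub fn for_you_score(preference: Option<&[f32]>, c: &Candidate) -> f32 {
    match preference {
        Some(pref) => {
            // Assumption: negative similarity is clamped so the product stays >= 0.
            let preference_match = cosine(pref, &c.embedding).max(0.0);
            preference_match * c.engagement * c.recency * c.social_proof
        }
        None => c.trending * c.quality,
    }
}
```

Because the factors multiply, a candidate that scores zero on any one of them drops out entirely, which is why the sketch clamps preference_match rather than letting a negative similarity flip the sign of the product.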

Dependencies

  • Requires: m3p2 (feedback loop: preference vectors populated, interaction weights updated, user state bitmaps maintained), m2p3 (ranking profile engine, ProfileExecutor), m2p5 (query parser, RETRIEVE executor), m2p1 (vector index for ANN retrieval with user preference vector)
  • Blocks: m3p4 (User State Filters need FOR USER parsing to inject user context into filter evaluation)

Research References

Task Index

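| #  | Task                       | Delivers                                                                        | Depends On | Complexity |
|----|----------------------------|---------------------------------------------------------------------------------|------------|------------|
| 01 | FOR USER Query Context     | UserContext loader, query parser extension for FOR USER, planner integration    | None       | M          |
| 02 | Personalized Profiles      | for_you, following, related, notification profiles, executor extensions         | Task 01    | L          |
| 03 | Cold Start and Exploration | Cold-start fallback, exploration budget injection, new-item exploration window  | Task 02    | M          |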

Task Dependency DAG

Task 01: FOR USER Query Context
    |
    v
Task 02: Personalized Profiles
    |
    v
Task 03: Cold Start and Exploration

All three tasks are sequential: Task 02 needs the user context from Task 01, and Task 03 needs the profiles from Task 02 to inject exploration candidates.

File Layout

tidal/src/
  db/
    user_context.rs  -- UserContext loader, query context resolution (Task 01)
  query/
    mod.rs           -- Extended parser with FOR USER clause (Task 01)
  ranking/
    personalized.rs  -- Personalized scoring functions, profile definitions (Task 02)
    exploration.rs   -- Cold-start fallback, exploration budget (Task 03)
    builtins.rs      -- Extended with for_you, following, related, notification (Task 02)
tidal/tests/
  m3p3_personalized.rs -- Phase integration tests

Open Questions

  1. Exploration budget implementation: Should exploration candidates be selected randomly from the full corpus, or from a "new items" pool? Recommendation: random selection from the full corpus minus followed creators' items. This maximizes serendipity. New-item exploration is a separate budget within the exploration slice.
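
  One way the exploration budget from question 1 might be applied is sketched below. It assumes the exploration pool has already been sampled randomly from the corpus by the caller, and it spaces exploration slots evenly through the page; the Item struct, function names, and even-spacing policy are hypothetical, and the separate new-item budget is not modeled.

```rust
use std::collections::HashSet;

/// Hypothetical minimal item representation.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub id: u64,
    pub creator_id: u64,
}

/// Number of exploration slots in a page, e.g. 50 * 0.10 = 5.
pub fn exploration_slots(limit: usize, budget: f64) -> usize {
    (((limit as f64) * budget).round() as usize).min(limit)
}

/// Merge personalized results with exploration candidates.
/// `pool` is assumed to be pre-sampled at random by the caller.
pub fn inject_exploration(
    ranked: &[Item],
    pool: &[Item],
    followed: &HashSet<u64>,
    limit: usize,
    budget: f64,
) -> Vec<Item> {
    let slots = exploration_slots(limit, budget);
    let ranked_ids: HashSet<u64> = ranked.iter().map(|i| i.id).collect();

    // Only unfollowed creators qualify, and duplicates of ranked items are skipped.
    let mut explore = pool
        .iter()
        .filter(|i| !followed.contains(&i.creator_id) && !ranked_ids.contains(&i.id))
        .take(slots);
    let mut personal = ranked.iter();

    // Every `stride`-th position is reserved for an exploration item.
    let stride = if slots == 0 { 0 } else { limit / slots };
    let mut page = Vec::with_capacity(limit);
    for pos in 0..limit {
        let reserved = slots > 0 && (pos + 1) % stride == 0;
        // If the preferred source is exhausted, backfill from the other one.
        let next = if reserved {
            explore.next().or_else(|| personal.next())
        } else {
            personal.next().or_else(|| explore.next())
        };
        match next {
            Some(item) => page.push(item.clone()),
            None => break,
        }
    }
    page
}
```

  With limit = 50 and budget = 0.10, this reserves positions 10, 20, 30, 40, and 50 for exploration items, which matches the "~5 of 50" target in the acceptance criteria.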

  2. Social proof signal: How should "social proof" (engagement from followed creators' followers) be implemented? For M3, social proof is approximated by the item's population-level engagement signals (view velocity, like count). True social graph traversal ("trending among my follows' follows") is deferred to M6.

  3. SIMILAR TO clause: The related profile needs a source item for ANN retrieval. Should SIMILAR TO @item_id be a separate parser clause, or embedded in the profile configuration? Recommendation: separate clause (RETRIEVE items SIMILAR TO @item_abc USING PROFILE related ...). This keeps the profile generic and the source item explicit.
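
  To make the "separate clause" recommendation concrete, here is a minimal sketch of parsing SIMILAR TO, USING PROFILE, and FOR USER as independent clauses. It is a whitespace-token toy covering only this subset of the grammar; the RetrieveQuery struct, the function names, the error strings, and the rule that the related profile requires SIMILAR TO are assumptions, not the existing parser in query/mod.rs.

```rust
/// Hypothetical parsed form of a RETRIEVE query (subset of clauses only).
#[derive(Debug, Default, PartialEq)]
pub struct RetrieveQuery {
    pub target: String,
    pub similar_to: Option<String>,
    pub profile: Option<String>,
    pub for_user: Option<String>,
}

/// Parse an `@id` reference, returning the id without the `@` prefix.
fn entity_ref(tok: Option<&str>, clause: &str) -> Result<String, String> {
    match tok.and_then(|t| t.strip_prefix('@')) {
        Some(id) if !id.is_empty() => Ok(id.to_string()),
        _ => Err(format!("{clause} expects an @id reference")),
    }
}

pub fn parse_retrieve(input: &str) -> Result<RetrieveQuery, String> {
    let mut toks = input.split_whitespace();
    if toks.next() != Some("RETRIEVE") {
        return Err("query must start with RETRIEVE".to_string());
    }
    let target = toks.next().ok_or("missing retrieval target")?.to_string();
    let mut query = RetrieveQuery { target, ..Default::default() };

    // Each clause is a two-keyword head followed by one argument.
    while let Some(first) = toks.next() {
        match (first, toks.next()) {
            ("SIMILAR", Some("TO")) => {
                query.similar_to = Some(entity_ref(toks.next(), "SIMILAR TO")?);
            }
            ("USING", Some("PROFILE")) => {
                let name = toks.next().ok_or("USING PROFILE expects a name")?;
                query.profile = Some(name.to_string());
            }
            ("FOR", Some("USER")) => {
                query.for_user = Some(entity_ref(toks.next(), "FOR USER")?);
            }
            (a, b) => {
                return Err(format!("unexpected tokens: {a} {}", b.unwrap_or("")));
            }
        }
    }

    // Assumed rule: the related profile needs an explicit source item.
    if query.profile.as_deref() == Some("related") && query.similar_to.is_none() {
        return Err("profile related requires SIMILAR TO @item_id".to_string());
    }
    Ok(query)
}
```

  Because the source item lives in its own clause, the related profile definition stays free of per-query state, and the parser can reject a related query that omits the source item before planning starts.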

  4. Notification frequency capping: Should the notification profile enforce per-creator notification limits (e.g., max 3 per creator per day)? Recommendation: deferred to M6. For M3, the notification profile scores followed creators' recent items by relationship_strength * item_quality, as stated in the acceptance criteria, without capping.