M2: RETRIEVE query pipeline with 5-stage execution (candidate → filter → score → diversify → limit),
usearch HNSW vector index, bitmap/range/universe filters, ranking profiles with signal scoring,
MMR diversity enforcement, and m2_uat integration tests.
M3: Entity system with typed metadata, relationship graph (follows/blocks/interactions),
creator entities, session tracking, and m3_uat integration tests.
M4: Advanced ranking with builtin functions (freshness, trending, controversy, wilson),
ranking executor with explain mode, query executor integration, benchmarks for
query/ranking/vector/filters/diversity, and m4_uat integration tests.
Includes: 9 new blog posts, marketing site updates, updated roadmap, and updated vision doc.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Task 01: User Preference Vector
Context
Milestone: 3 -- Personalized Ranking
Phase: m3p2 -- Feedback Loop
Depends On: m3p1 Task 01 (UserEntity with embedding slot, StorageBox::users_engine()), m2p1 (vector index for reading item embeddings), m1p3 (storage engine)
Blocks: Task 02 (Interaction Weight Ledger needs entity lookup patterns), Task 04 (Atomic Signal Dispatch integrates preference updates), m3p3 (Personalized Profiles use preference vectors for ANN retrieval)
Complexity: L
Objective
Deliver the user preference vector management system: a 1536-dimensional embedding that represents a user's learned taste, updated atomically on engagement signals via exponential moving average (EMA). When a user likes or completes an item, their preference vector shifts toward that item's embedding. When they skip an item, it shifts away. The vector is normalized after every update to maintain unit length for correct cosine-distance ANN queries.
The preference vector is the core personalization primitive. In m3p3, the for_you profile uses it as the ANN query vector to retrieve items matching the user's taste. Without a correct, normalized, and promptly updated preference vector, personalized ranking cannot work.
Cold-start users (no engagement history) have embedding: None. The query planner in m3p3 detects this and falls back to population-level signals. The first engagement signal that triggers a preference update initializes the vector from the engaged item's embedding.
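The cold-start rule above amounts to a small branch in the planner. The sketch below is illustrative only: `QuerySource` and `choose_source` are hypothetical names, not part of the m3p3 design, and the preference is modeled as a plain `Option<&[f32]>` rather than the real entity type.

```rust
/// Hypothetical planner decision; not part of the documented API.
#[derive(Debug, PartialEq)]
pub enum QuerySource {
    /// Use the user's preference vector as the ANN query vector.
    Personalized(Vec<f32>),
    /// No engagement history yet: fall back to population-level signals.
    PopulationFallback,
}

/// Pick the retrieval source from an optional preference embedding.
pub fn choose_source(preference: Option<&[f32]>) -> QuerySource {
    match preference {
        Some(v) => QuerySource::Personalized(v.to_vec()),
        None => QuerySource::PopulationFallback,
    }
}
```

Modeling cold start as `None` rather than as a zero vector keeps the "no data" case out of the ANN path entirely, so a zero query vector never reaches the index.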
Requirements
- PreferenceVector struct: wraps Vec<f32> with the invariant that ||v|| == 1.0 (unit normalized, within floating-point tolerance)
- PreferenceVector::update_toward(item_embedding, alpha) applies EMA: v_new = normalize(alpha * item + (1 - alpha) * v)
- PreferenceVector::update_away(item_embedding, alpha) applies the negative update: v_new = normalize(v - alpha * item), clamped to prevent sign flip
- PreferenceVector::initialize_from(item_embedding) sets the vector to the normalized item embedding (cold-start initialization)
- PreferenceConfig struct: alpha: f32 (default 0.1), negative_alpha: f32 (default 0.05), dimensions: usize (default 1536)
- serialize_preference / deserialize_preference for storage under key [user_id][0x00][META][PREF:default]
- Normalization is L2 (Euclidean) -- result is unit vector
- Zero vector after subtraction is handled gracefully (keep previous vector)
- PreferenceVector is Send + Sync
- Storage codec produces compact f32 array (no JSON overhead)
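The requirements mention clamping to prevent a sign flip, but the Technical Design only shows the zero-magnitude fallback. One possible reading, sketched here as an assumption rather than the intended design, is to reject a negative update whose result would point away from the previous vector (non-positive dot product). `update_away_guarded` and `dot` are hypothetical helpers.

```rust
/// Dot product of two equal-length slices.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical guarded negative update. Returns the new unit vector, or
/// `None` when the caller should keep the previous preference.
pub fn update_away_guarded(current: &[f32], item: &[f32], alpha: f32) -> Option<Vec<f32>> {
    if current.len() != item.len() {
        return None;
    }
    let raw: Vec<f32> = current
        .iter()
        .zip(item)
        .map(|(c, i)| c - alpha * i)
        .collect();
    // Assumed meaning of "sign flip": the result no longer points in the
    // same half-space as the previous preference.
    if dot(&raw, current) <= 0.0 {
        return None;
    }
    let norm = dot(&raw, &raw).sqrt();
    if !norm.is_finite() || norm < f32::EPSILON {
        return None;
    }
    Some(raw.iter().map(|x| x / norm).collect())
}
```

With a unit preference, a unit item embedding, and alpha below 1.0 the guard never fires; it only matters for un-normalized item embeddings or oversized learning rates.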
Technical Design
Module Structure
tidal/src/
    entities/
        preference.rs -- PreferenceVector, PreferenceConfig, update logic, serialization
Core Types
// === entities/preference.rs ===
/// Configuration for preference vector updates.
#[derive(Debug, Clone)]
pub struct PreferenceConfig {
    /// Learning rate for positive signals (like, completion, share).
    /// Controls how much the vector shifts toward the engaged item.
    /// Range: (0.0, 1.0). Default: 0.1.
    pub alpha: f32,
    /// Learning rate for negative signals (skip).
    /// Controls how much the vector shifts away from the skipped item.
    /// Lower than alpha to prevent rapid drift from sparse negative feedback.
    /// Range: (0.0, 1.0). Default: 0.05.
    pub negative_alpha: f32,
    /// Embedding dimensionality. Must match item embeddings.
    /// Default: 1536.
    pub dimensions: usize,
}
impl Default for PreferenceConfig {
    fn default() -> Self {
        Self {
            alpha: 0.1,
            negative_alpha: 0.05,
            dimensions: 1536,
        }
    }
}
/// A normalized user preference embedding.
///
/// Invariant: the vector is always L2-normalized (unit length) after any mutation,
/// so cosine similarity reduces to a dot product and ANN distance calculations stay correct.
///
/// The vector may be `None` for cold-start users who have not yet engaged with any
/// content. The first positive signal initializes it from the engaged item's embedding.
#[derive(Debug, Clone)]
pub struct PreferenceVector {
    /// The normalized embedding. `None` for cold-start users.
    data: Option<Vec<f32>>,
    /// Expected dimensionality for validation.
    dimensions: usize,
}
impl PreferenceVector {
    /// Create an empty (cold-start) preference vector.
    pub fn cold_start(dimensions: usize) -> Self {
        Self {
            data: None,
            dimensions,
        }
    }
    /// Create a preference vector from an existing embedding.
    ///
    /// The embedding is normalized on construction.
    ///
    /// Returns `None` if the embedding has the wrong number of dimensions
    /// or zero magnitude.
    pub fn from_embedding(embedding: Vec<f32>, dimensions: usize) -> Option<Self> {
        if embedding.len() != dimensions {
            return None;
        }
        let normalized = l2_normalize(&embedding)?;
        Some(Self {
            data: Some(normalized),
            dimensions,
        })
    }
    /// Returns the inner embedding, or `None` for cold-start users.
    pub fn as_slice(&self) -> Option<&[f32]> {
        self.data.as_deref()
    }
    /// Whether this is a cold-start (no engagement) preference vector.
    pub fn is_cold_start(&self) -> bool {
        self.data.is_none()
    }
    /// Initialize from the first engaged item's embedding.
    ///
    /// Only called when `is_cold_start()` is true. Sets the preference
    /// to the normalized item embedding.
    ///
    /// Returns `true` if initialization succeeded, `false` if the embedding
    /// has the wrong number of dimensions or zero magnitude.
    pub fn initialize_from(&mut self, item_embedding: &[f32]) -> bool {
        if item_embedding.len() != self.dimensions {
            return false;
        }
        if let Some(normalized) = l2_normalize(item_embedding) {
            self.data = Some(normalized);
            true
        } else {
            false
        }
    }
    /// Shift the preference vector toward an item embedding (positive signal).
    ///
    /// EMA update: `v_new = normalize(alpha * item + (1 - alpha) * v)`
    ///
    /// If the user is cold-start, initializes from the item embedding instead.
    /// A dimension mismatch is a silent no-op.
    pub fn update_toward(&mut self, item_embedding: &[f32], alpha: f32) {
        if item_embedding.len() != self.dimensions {
            return;
        }
        match &self.data {
            None => {
                // Cold start: initialize from item embedding.
                self.initialize_from(item_embedding);
            }
            Some(current) => {
                // Blend element-wise: alpha * item + (1 - alpha) * current.
                let blended: Vec<f32> = current
                    .iter()
                    .zip(item_embedding)
                    .map(|(&cur, &item)| alpha.mul_add(item, (1.0 - alpha) * cur))
                    .collect();
                if let Some(normalized) = l2_normalize(&blended) {
                    self.data = Some(normalized);
                }
                // If normalization fails (zero vector), keep the previous vector.
            }
        }
    }
    /// Shift the preference vector away from an item embedding (negative signal).
    ///
    /// Subtraction: `v_new = normalize(v - alpha * item)`
    ///
    /// If the result is a zero vector (complete cancellation), the previous
    /// vector is retained. This prevents preference collapse from a single
    /// strong negative signal.
    ///
    /// No-op for cold-start users (cannot move away from nothing).
    pub fn update_away(&mut self, item_embedding: &[f32], alpha: f32) {
        if item_embedding.len() != self.dimensions {
            return;
        }
        if let Some(current) = &self.data {
            // Subtract element-wise: current - alpha * item.
            let shifted: Vec<f32> = current
                .iter()
                .zip(item_embedding)
                .map(|(&cur, &item)| cur - alpha * item)
                .collect();
            // Only update if the result is normalizable (non-zero magnitude).
            if let Some(normalized) = l2_normalize(&shifted) {
                self.data = Some(normalized);
            }
            // Otherwise keep the previous vector.
        }
        // Cold-start: no-op (cannot move away from nothing).
    }
}
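To get a feel for what the learning rate does, the EMA step can be replayed in isolation. The sketch below is a standalone illustration, not the module's code: `ema_toward` and `similarity_after` are hypothetical helpers that start from a preference orthogonal to the item and report the cosine similarity after a number of consecutive positive signals.

```rust
/// Standalone version of the EMA step, for illustration only.
/// Assumes the blended vector is never zero (true for the inputs used below).
fn ema_toward(v: &[f32], item: &[f32], alpha: f32) -> Vec<f32> {
    let raw: Vec<f32> = v
        .iter()
        .zip(item)
        .map(|(c, i)| alpha * i + (1.0 - alpha) * c)
        .collect();
    let norm = raw.iter().map(|x| x * x).sum::<f32>().sqrt();
    raw.iter().map(|x| x / norm).collect()
}

/// Cosine similarity to the item after `steps` consecutive positive signals,
/// starting from a preference along x and an item along y.
pub fn similarity_after(steps: usize, alpha: f32) -> f32 {
    let item = [0.0f32, 1.0];
    let mut v = vec![1.0f32, 0.0];
    for _ in 0..steps {
        v = ema_toward(&v, &item, alpha);
    }
    // Both vectors are unit length, so the dot product with the
    // y-axis item is just the y component.
    v[1]
}
```

With the default alpha of 0.1, one signal moves the similarity from 0.0 to roughly 0.11, and repeated signals keep raising it toward 1.0 without ever overshooting.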
Normalization
/// L2-normalize a vector to unit length.
///
/// Returns `None` if the vector has zero magnitude (all zeros or
/// the subtraction result cancelled out completely) or a non-finite
/// magnitude (a NaN or infinite component), so a bad item embedding
/// cannot poison the stored preference.
fn l2_normalize(v: &[f32]) -> Option<Vec<f32>> {
    let magnitude = l2_norm(v);
    if !magnitude.is_finite() || magnitude < f32::EPSILON {
        return None;
    }
    let inv_mag = 1.0 / magnitude;
    Some(v.iter().map(|x| x * inv_mag).collect())
}
/// Compute L2 norm of a vector.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}
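To see why every update re-normalizes, and when the zero-magnitude branch can actually fire, it helps to write out the norm of the raw update. The derivation below assumes both the current preference $v$ and the item embedding $e$ are unit length, with $\cos\theta = v \cdot e$. The unit-length item is an assumption; the design does not state that item embeddings arrive pre-normalized.

```latex
\begin{aligned}
\lVert \alpha e + (1-\alpha) v \rVert^2
  &= \alpha^2 \lVert e \rVert^2 + (1-\alpha)^2 \lVert v \rVert^2
     + 2\alpha(1-\alpha)\,(v \cdot e) \\
  &= \alpha^2 + (1-\alpha)^2 + 2\alpha(1-\alpha)\cos\theta \\
  &= 1 - 2\alpha(1-\alpha)(1-\cos\theta), \\[4pt]
\lVert v - \alpha e \rVert^2
  &= 1 + \alpha^2 - 2\alpha\cos\theta .
\end{aligned}
```

For $0 < \alpha < 1$ the positive blend is shorter than 1 unless $e = v$, which is why the EMA result must be re-normalized; for example $\alpha = 0.5$ and $\theta = 90^\circ$ give a norm of $1/\sqrt{2} \approx 0.707$. Under these assumptions the positive blend is exactly zero only for $\alpha = 0.5$ with $e = -v$, and the negative update is exactly zero only for $\alpha = 1$ with $e = v$.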
Storage Codec
/// Storage key suffix for preference vectors.
///
/// Full key: `[user_id: 8 BE][0x00][META: 0x03][PREF][slot_name_bytes]`
/// For the default slot: `[user_id][0x00][0x03][0x50][default]`
pub const PREF_TAG: u8 = 0x50; // 'P' -- preference vector marker within META
/// Serialize a preference vector to bytes.
///
/// Format:
/// ```text
/// [dimensions: 4 bytes LE]
/// [has_data: 1 byte (0 or 1)]
/// if has_data:
///   [f32 values: dimensions * 4 bytes, LE]
/// ```
pub fn serialize_preference(pref: &PreferenceVector) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(pref.dimensions as u32).to_le_bytes());
    match pref.as_slice() {
        None => {
            buf.push(0); // no data
        }
        Some(data) => {
            buf.push(1); // has data
            for &val in data {
                buf.extend_from_slice(&val.to_le_bytes());
            }
        }
    }
    buf
}
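/// Deserialize a preference vector from bytes.
///
/// Returns `None` if the buffer is truncated or the stored values cannot
/// be normalized. Trailing bytes beyond the expected length are ignored.
pub fn deserialize_preference(bytes: &[u8]) -> Option<PreferenceVector> {
    if bytes.len() < 5 {
        return None;
    }
    let dimensions = u32::from_le_bytes(bytes[0..4].try_into().ok()?) as usize;
    let has_data = bytes[4];
    if has_data == 0 {
        return Some(PreferenceVector::cold_start(dimensions));
    }
    // Checked arithmetic: a corrupt header must not overflow the length computation.
    let expected_len = dimensions.checked_mul(4)?.checked_add(5)?;
    if bytes.len() < expected_len {
        return None;
    }
    // Decode each 4-byte little-endian chunk into an f32.
    let data: Vec<f32> = bytes[5..expected_len]
        .chunks_exact(4)
        .map(|chunk| f32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
        .collect();
    // Re-normalizes on load, so the unit-length invariant holds after a roundtrip.
    PreferenceVector::from_embedding(data, dimensions)
}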
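The wire format is easiest to check against concrete bytes. The sketch below re-implements the documented layout on plain slices so it can run on its own; `encode` is a hypothetical stand-in for `serialize_preference`, not a second codec.

```rust
/// Encode using the documented layout:
/// dimensions (u32, little-endian), has_data flag, then f32 values (little-endian).
pub fn encode(dimensions: u32, data: Option<&[f32]>) -> Vec<u8> {
    let mut buf = dimensions.to_le_bytes().to_vec();
    match data {
        None => buf.push(0), // cold start: header only
        Some(values) => {
            buf.push(1);
            for v in values {
                buf.extend_from_slice(&v.to_le_bytes());
            }
        }
    }
    buf
}
```

Encoding the 2-dimensional unit vector (1.0, 0.0) yields 13 bytes: `02 00 00 00` for the dimension count, `01` for the flag, `00 00 80 3f` for 1.0, and `00 00 00 00` for 0.0. A cold-start vector with 1536 dimensions is just the 5-byte header `00 06 00 00 00`.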
TidalDb Integration
impl TidalDb {
    /// Read the preference vector for a user.
    ///
    /// Returns `None` if the user does not exist, has no preference vector
    /// stored, or the stored bytes cannot be decoded.
    pub fn read_user_preference(
        &self,
        user_id: EntityId,
    ) -> crate::Result<Option<PreferenceVector>> {
        let storage = self.storage.as_ref()
            .ok_or_else(|| LumenError::Internal("no storage".into()))?;
        let mut suffix = vec![PREF_TAG];
        suffix.extend_from_slice(b"default");
        let key = encode_key(user_id, Tag::Meta, &suffix);
        match storage.users_engine().get(&key)? {
            Some(bytes) => Ok(deserialize_preference(&bytes)),
            None => Ok(None),
        }
    }
    /// Write the preference vector for a user.
    pub(crate) fn write_user_preference(
        &self,
        user_id: EntityId,
        pref: &PreferenceVector,
    ) -> crate::Result<()> {
        let storage = self.storage.as_ref()
            .ok_or_else(|| LumenError::Internal("no storage".into()))?;
        let mut suffix = vec![PREF_TAG];
        suffix.extend_from_slice(b"default");
        let key = encode_key(user_id, Tag::Meta, &suffix);
        let value = serialize_preference(pref);
        storage.users_engine().put(&key, &value).map_err(LumenError::from)
    }
    /// Read an item's embedding for preference vector updates.
    ///
    /// Used by the signal dispatch to get the item embedding when
    /// updating the user preference vector.
    pub(crate) fn read_item_embedding(
        &self,
        item_id: EntityId,
    ) -> crate::Result<Option<Vec<f32>>> {
        // Read from the vector index or from storage directly.
        // Implementation delegates to the embedding slot registry from m2p1.
        // Returns None if the item has no embedding.
        let _ = item_id; // placeholder: avoids an unused-parameter warning
        todo!("delegate to EmbeddingSlotRegistry::get(item_id, \"default\")")
    }
}
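Both accessors build the same key, so it is worth spelling out the bytes that layout produces. The sketch below follows the key comment on `PREF_TAG`; `preference_key` is a hypothetical stand-in for `encode_key`, and treating `EntityId` as a `u64` is an assumption based on the `[user_id: 8 BE]` field.

```rust
/// META tag byte, as given in the documented key layout.
const META_TAG: u8 = 0x03;
/// Preference marker within META ('P').
const PREF_TAG: u8 = 0x50;

/// Hypothetical stand-in for `encode_key`; assumes `EntityId` is a `u64`.
pub fn preference_key(user_id: u64, slot_name: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + 3 + slot_name.len());
    key.extend_from_slice(&user_id.to_be_bytes()); // 8 bytes, big-endian
    key.push(0x00); // separator byte from the layout
    key.push(META_TAG);
    key.push(PREF_TAG);
    key.extend_from_slice(slot_name.as_bytes());
    key
}
```

For user 1 and the default slot this gives an 18-byte key. Big-endian ids make the lexicographic byte order match numeric order, so all keys for one user sort together in the keyspace.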
Test Strategy
Unit Tests
#[test]
fn cold_start_is_none() {
    let pref = PreferenceVector::cold_start(1536);
    assert!(pref.is_cold_start());
    assert!(pref.as_slice().is_none());
}
#[test]
fn from_embedding_normalizes() {
    let raw = vec![3.0, 4.0]; // magnitude = 5.0
    let pref = PreferenceVector::from_embedding(raw, 2).unwrap();
    let data = pref.as_slice().unwrap();
    assert!((data[0] - 0.6).abs() < 1e-6);
    assert!((data[1] - 0.8).abs() < 1e-6);
}
#[test]
fn from_embedding_wrong_dimensions_returns_none() {
    let raw = vec![1.0, 2.0, 3.0];
    assert!(PreferenceVector::from_embedding(raw, 2).is_none());
}
#[test]
fn from_zero_embedding_returns_none() {
    let raw = vec![0.0, 0.0];
    assert!(PreferenceVector::from_embedding(raw, 2).is_none());
}
#[test]
fn initialize_from_sets_normalized_vector() {
    let mut pref = PreferenceVector::cold_start(3);
    let item = [3.0f32, 0.0, 4.0];
    assert!(pref.initialize_from(&item));
    assert!(!pref.is_cold_start());
    let data = pref.as_slice().unwrap();
    assert!((l2_norm(data) - 1.0).abs() < 1e-5);
}
#[test]
fn update_toward_shifts_preference() {
    let initial = vec![1.0, 0.0, 0.0]; // unit vector along x
    let mut pref = PreferenceVector::from_embedding(initial, 3).unwrap();
    let item = [0.0f32, 1.0, 0.0]; // unit vector along y
    let alpha = 0.5;
    pref.update_toward(&item, alpha);
    let data = pref.as_slice().unwrap();
    // After EMA with alpha=0.5: raw = (0.5, 0.5, 0.0), normalized = (1/sqrt(2), 1/sqrt(2), 0)
    let expected_component = 1.0 / 2.0f32.sqrt();
    assert!((data[0] - expected_component).abs() < 1e-5);
    assert!((data[1] - expected_component).abs() < 1e-5);
    assert!((l2_norm(data) - 1.0).abs() < 1e-5);
}
#[test]
fn update_toward_cold_start_initializes() {
    let mut pref = PreferenceVector::cold_start(3);
    let item = [1.0f32, 0.0, 0.0];
    pref.update_toward(&item, 0.1);
    assert!(!pref.is_cold_start());
    let data = pref.as_slice().unwrap();
    // Cold start initializes from item embedding, normalized.
    assert!((data[0] - 1.0).abs() < 1e-5);
}
#[test]
fn update_away_shifts_preference_away() {
    let initial = vec![0.6, 0.8, 0.0]; // unit vector
    let mut pref = PreferenceVector::from_embedding(initial, 3).unwrap();
    let item = [1.0f32, 0.0, 0.0]; // x-axis
    pref.update_away(&item, 0.1);
    let data = pref.as_slice().unwrap();
    // x component should decrease, y component should relatively increase.
    assert!(data[0] < 0.6, "x should decrease after update_away");
    assert!((l2_norm(data) - 1.0).abs() < 1e-5);
}
#[test]
fn update_away_complete_cancellation_keeps_previous() {
    let initial = vec![1.0, 0.0]; // unit x
    let mut pref = PreferenceVector::from_embedding(initial, 2).unwrap();
    // Try to subtract the entire vector
    let item = [1.0f32, 0.0];
    pref.update_away(&item, 1.0); // alpha=1.0 -> 1.0 - 1.0*1.0 = 0.0
    // Should keep previous vector because result would be zero.
    let data = pref.as_slice().unwrap();
    assert!((data[0] - 1.0).abs() < 1e-5);
}
#[test]
fn update_away_cold_start_is_noop() {
    let mut pref = PreferenceVector::cold_start(3);
    let item = [1.0f32, 0.0, 0.0];
    pref.update_away(&item, 0.1);
    assert!(pref.is_cold_start());
}
#[test]
fn serialize_deserialize_roundtrip_with_data() {
    let raw = vec![0.6, 0.8]; // will be normalized
    let pref = PreferenceVector::from_embedding(raw, 2).unwrap();
    let bytes = serialize_preference(&pref);
    let recovered = deserialize_preference(&bytes).unwrap();
    let data = recovered.as_slice().unwrap();
    assert_eq!(data.len(), 2);
    assert!((l2_norm(data) - 1.0).abs() < 1e-5);
}
#[test]
fn serialize_deserialize_roundtrip_cold_start() {
    let pref = PreferenceVector::cold_start(1536);
    let bytes = serialize_preference(&pref);
    let recovered = deserialize_preference(&bytes).unwrap();
    assert!(recovered.is_cold_start());
}
#[test]
fn serialize_size_is_compact() {
    let pref = PreferenceVector::from_embedding(vec![0.1; 1536], 1536).unwrap();
    let bytes = serialize_preference(&pref);
    // 4 (dimensions) + 1 (has_data) + 1536 * 4 (f32 values) = 6149 bytes
    assert_eq!(bytes.len(), 4 + 1 + 1536 * 4);
}
#[test]
fn preference_config_default_values() {
    let config = PreferenceConfig::default();
    assert!((config.alpha - 0.1).abs() < f32::EPSILON);
    assert!((config.negative_alpha - 0.05).abs() < f32::EPSILON);
    assert_eq!(config.dimensions, 1536);
}
Property Tests
use proptest::prelude::*;
proptest! {
    #[test]
    fn preference_always_unit_normalized_after_update(
        initial in proptest::collection::vec(-1.0f32..1.0, 16),
        item in proptest::collection::vec(-1.0f32..1.0, 16),
        alpha in 0.01f32..0.99,
    ) {
        // Use 16 dims for speed in property tests (not 1536).
        if let Some(mut pref) = PreferenceVector::from_embedding(initial, 16) {
            pref.update_toward(&item, alpha);
            if let Some(data) = pref.as_slice() {
                let norm = l2_norm(data);
                prop_assert!((norm - 1.0).abs() < 1e-4,
                    "norm after update_toward: {}", norm);
            }
        }
    }
    #[test]
    fn preference_always_unit_normalized_after_negative_update(
        initial in proptest::collection::vec(-1.0f32..1.0, 16),
        item in proptest::collection::vec(-1.0f32..1.0, 16),
        alpha in 0.01f32..0.5,
    ) {
        if let Some(mut pref) = PreferenceVector::from_embedding(initial, 16) {
            pref.update_away(&item, alpha);
            if let Some(data) = pref.as_slice() {
                let norm = l2_norm(data);
                prop_assert!((norm - 1.0).abs() < 1e-4,
                    "norm after update_away: {}", norm);
            }
        }
    }
    #[test]
    fn serialize_deserialize_preserves_values(
        values in proptest::collection::vec(-1.0f32..1.0, 16),
    ) {
        if let Some(pref) = PreferenceVector::from_embedding(values, 16) {
            let bytes = serialize_preference(&pref);
            let recovered = deserialize_preference(&bytes);
            prop_assert!(recovered.is_some());
            let recovered = recovered.unwrap();
            let orig = pref.as_slice().unwrap();
            let recv = recovered.as_slice().unwrap();
            for (a, b) in orig.iter().zip(recv.iter()) {
                prop_assert!((a - b).abs() < 1e-6,
                    "mismatch: {} vs {}", a, b);
            }
        }
    }
    #[test]
    fn update_toward_moves_closer_to_item(
        initial in proptest::collection::vec(-1.0f32..1.0, 16),
        item in proptest::collection::vec(-1.0f32..1.0, 16),
        alpha in 0.01f32..0.5,
    ) {
        if let (Some(mut pref), Some(item_norm)) = (
            PreferenceVector::from_embedding(initial, 16),
            l2_normalize(&item),
        ) {
            let before = dot_product(pref.as_slice().unwrap(), &item_norm);
            pref.update_toward(&item, alpha);
            if let Some(data) = pref.as_slice() {
                let after = dot_product(data, &item_norm);
                // Cosine similarity should increase or stay the same
                // (within floating-point tolerance).
                prop_assert!(after >= before - 1e-4,
                    "similarity should not decrease: before={}, after={}", before, after);
            }
        }
    }
}
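The last property test calls a `dot_product` helper that is not defined in this section. A minimal version, consistent with its description as a simple sum of element-wise products, might look like the following; the `debug_assert_eq!` length check is an addition of this sketch.

```rust
/// Test-only helper: sum of element-wise products of two slices.
/// For unit vectors this equals their cosine similarity.
fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    debug_assert_eq!(a.len(), b.len(), "dot_product expects equal lengths");
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}
```

Placing it in the test module, for example under `#[cfg(test)]`, keeps it out of the public API and avoids a dead-code warning in non-test builds.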
Acceptance Criteria
- PreferenceVector::cold_start(dims) creates a None-data vector
- PreferenceVector::from_embedding(vec, dims) normalizes and validates dimensions
- PreferenceVector::from_embedding rejects zero vectors and wrong dimensions
- update_toward applies EMA and re-normalizes; result is always unit length
- update_toward on cold-start initializes from item embedding
- update_away subtracts and re-normalizes; keeps previous on zero result
- update_away on cold-start is a no-op
- serialize_preference / deserialize_preference roundtrip for data and cold-start
- Storage size = 4 + 1 + dims * 4 bytes for populated vectors
- PreferenceConfig has sensible defaults: alpha=0.1, negative_alpha=0.05, dims=1536
- Property test: normalization invariant holds after any update sequence
- Property test: update_toward increases cosine similarity to item
- cargo clippy -- -D warnings passes
- All tests pass
Research References
- docs/research/ann_for_tidaldb.md -- Embedding normalization, cosine distance via dot product
- thoughts.md -- Part V.16 (user preference vector as database-managed embedding)
- VISION.md -- User preference embeddings are first-class
Implementation Notes
- The PreferenceVector uses Vec<f32> internally, not [f32; 1536], because the dimension count comes from schema configuration. Property tests use 16 dimensions for speed; integration tests use 1536.
- l2_normalize uses f32::EPSILON as the zero-magnitude threshold. This is conservative but safe. A vector with norm < EPSILON is treated as zero and the update is rejected.
- The preference vector is stored in the users keyspace under a Tag::Meta key with a PREF_TAG prefix byte. This co-locates it with the user metadata for cache-friendly reads.
- read_item_embedding is a placeholder in this task. The actual implementation delegates to the EmbeddingSlotRegistry from m2p1. The key observation is that the signal dispatch (Task 04) calls this method, not the preference module itself. The preference module only receives the item embedding as a parameter.
- Do NOT implement the signal dispatch wiring in this task. This task delivers the preference vector type, update logic, and storage codec. The wiring into db.signal() is done in Task 04.
- The dot_product helper used in property tests is a simple sum of element-wise products. It is not part of the public API.