jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs

Milestone 8 (phases 1-4):
- Shard-aware WAL segment naming, BatchHeader v2, ShardRouter
- Transport trait, InProcessTransport, WalShipper, FollowerDb
- HLC, PNCounter, LWWRegister, CrdtSignalState, ReconciliationEngine
- Session replication bridge with SeqNo/HWM, idempotency store

Forage application:
- Multi-source discovery engine with MAB exploration
- Embedding-based label system, server handlers, UI refresh

Other:
- QUICKSTART.md, README.md, milestone-8 planning docs
- Hard negative union semantics, RLHF export enhancements
- Recovery benchmark and visibility test expansions
- Split 8 oversized source files per CODING_GUIDELINES §9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-24 13:17:19 -07:00

9.8 KiB

Quickstart

Get a working ranked feed in 10 minutes.

Prerequisites: Rust 1.91+, Cargo. No external services.

Run the example

The fastest path is the included example, which demonstrates the complete loop — schema, ingest, signals, ranking:

cargo run --manifest-path tidal/Cargo.toml --example quickstart

The rest of this guide explains what it does and extends it with personalization and search.

Step 1: Add the dependency

[dependencies]
tidaldb = { git = "https://github.com/your-org/tidalDB", rev = "..." }

Step 2: Define a schema

Define the schema before opening the database. It declares signal types (which events you record and how they decay), text fields (for BM25 search), and embedding slots (for vector search).

use std::time::Duration;
use tidaldb::schema::{SchemaBuilder, EntityKind, DecaySpec, Window, TextFieldType};

let mut schema = SchemaBuilder::new();

// View signal: 7-day half-life, three windows, velocity enabled.
// You declare the decay. tidalDB applies it at query time — no formula to maintain.
let _ = schema.signal("view", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(7 * 24 * 3600),
}).windows(&[Window::OneHour, Window::TwentyFourHours, Window::AllTime]).velocity(true).add();

// Like signal: 30-day half-life. Durable engagement decays slowly.
let _ = schema.signal("like", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(30 * 24 * 3600),
}).windows(&[Window::AllTime]).velocity(false).add();

// Share signal: 3-day half-life. Short-lived but strongly trending.
let _ = schema.signal("share", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(3 * 24 * 3600),
}).windows(&[Window::TwentyFourHours, Window::AllTime]).velocity(true).add();

// Hide signal: permanent. A user who hid an item should not see it again.
let _ = schema.signal("hide", EntityKind::Item, DecaySpec::Permanent).add();

// Text fields for BM25 full-text search.
schema.text_field("title", TextFieldType::Text);
schema.text_field("category", TextFieldType::Keyword);

// Embedding slot for semantic / vector search (128D in this example).
// In production, use the dimensionality of your embedding model.
schema.embedding_slot("content", EntityKind::Item, 128);

let schema = schema.build()?;

Decay types:

Exponential { half_life } — weight halves every half_life. Use for views, likes, shares.
Linear { lifetime } — weight drops to zero over lifetime.
Permanent — never decays. Use for hides, blocks, follows.
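
To make the three decay types concrete, here is a minimal sketch of the weight each one assigns to a signal of a given age. It follows the descriptions above (halving per half_life, reaching zero at lifetime) using the textbook formulas; the function names are hypothetical, and tidalDB's internal computation may differ in detail.

```rust
use std::time::Duration;

/// Exponential decay: the weight halves every `half_life`.
/// Assumes `half_life` is non-zero.
fn exponential_weight(age: Duration, half_life: Duration) -> f64 {
    0.5f64.powf(age.as_secs_f64() / half_life.as_secs_f64())
}

/// Linear decay: the weight falls from 1.0 to 0.0 over `lifetime`,
/// then stays at zero. Assumes `lifetime` is non-zero.
fn linear_weight(age: Duration, lifetime: Duration) -> f64 {
    (1.0 - age.as_secs_f64() / lifetime.as_secs_f64()).max(0.0)
}

/// Permanent signals keep full weight regardless of age.
fn permanent_weight(_age: Duration) -> f64 {
    1.0
}
```

Under these formulas, a view that is 14 days old with a 7-day half-life contributes a quarter of its original weight, while a hide keeps its full weight indefinitely.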

Step 3: Open the database

use tidaldb::TidalDb;

// Ephemeral: in-memory, ideal for tests and this tutorial.
let db = TidalDb::builder().ephemeral().with_schema(schema).open()?;

// Persistent: durable storage at a path on disk.
// let db = TidalDb::builder().with_data_dir("/var/lib/myapp/tidaldb").with_schema(schema).open()?;

db.health_check()?;

TidalDb is Send + Sync. Wrap it in Arc<TidalDb> to share across threads or tasks.
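
The sharing pattern is the standard one for any Send + Sync value. The sketch below uses a hypothetical stand-in type, `Handle`, instead of `TidalDb` so that it compiles without the crate; `Handle`, `record`, and `run_workers` are illustrative names, not tidalDB APIs.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `TidalDb`: any `Send + Sync` handle is shared the same way.
struct Handle {
    writes: AtomicU64,
}

impl Handle {
    /// Takes `&self`, so many threads can call it through the same `Arc`.
    fn record(&self) {
        self.writes.fetch_add(1, Ordering::Relaxed);
    }
}

/// Spawns `workers` threads that each record `per_worker` writes
/// through one shared handle, then returns the total.
fn run_workers(workers: u64, per_worker: u64) -> u64 {
    let handle = Arc::new(Handle { writes: AtomicU64::new(0) });
    let joins: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc copies a pointer, not the underlying value.
            let handle = Arc::clone(&handle);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    handle.record();
                }
            })
        })
        .collect();
    for join in joins {
        join.join().expect("worker panicked");
    }
    handle.writes.load(Ordering::Relaxed)
}
```

With the real database, the same shape applies: build the handle once, wrap it in `Arc`, and move a clone into each thread or task.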

Step 4: Ingest content

Write items with metadata as HashMap<String, String> key-value pairs. Then write their embeddings separately.

tidalDB does not generate embeddings. You bring your model; tidalDB handles retrieval and ranking over the vectors you produce.

use std::collections::HashMap;
use tidaldb::schema::{EntityId, Timestamp};

let tracks = [
    (1u64, "Introduction to Jazz Piano",  "music",    "1320"),
    (2,    "Rust Async Programming",       "tech",     "3600"),
    (3,    "Sourdough Bread Masterclass",  "cooking",  "2700"),
    (4,    "Jazz Improvisation Techniques","music",    "1800"),
    (5,    "Building a Compiler in Rust",  "tech",     "5400"),
    (6,    "French Pastry Fundamentals",   "cooking",  "2100"),
    (7,    "Modal Jazz: Coltrane Changes", "music",    "2400"),
    (8,    "WebAssembly from Scratch",     "tech",     "2700"),
    (9,    "Knife Skills for Home Cooks",  "cooking",   "900"),
    (10,   "Bebop Piano Vocabulary",       "music",    "1500"),
];

for (id, title, category, duration) in &tracks {
    let mut meta = HashMap::new();
    meta.insert("title".to_string(), title.to_string());
    meta.insert("category".to_string(), category.to_string());
    meta.insert("format".to_string(), "video".to_string());
    meta.insert("duration".to_string(), duration.to_string());
    meta.insert("created_at".to_string(), Timestamp::now().as_nanos().to_string());

    db.write_item_with_metadata(EntityId::new(*id), &meta)?;

    // In production: embed the title with your model.
    // Here we use random unit vectors for illustration. `random_unit_vector`
    // and `rng` are placeholders you supply; this snippet does not define them.
    let embedding = random_unit_vector(128, &mut rng);
    db.write_item_embedding(EntityId::new(*id), &embedding)?;
}

println!("Ingested {} items.", db.item_count());
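
The ingest loop calls `random_unit_vector(128, &mut rng)` without showing either name. One possible stdlib-only version is sketched below, assuming the embeddings are `f32` vectors; the `XorShift64` generator is a hypothetical stand-in for whatever RNG you prefer (for example, one from the `rand` crate), and the bundled example may implement the helper differently.

```rust
/// Tiny xorshift64 generator: a stdlib-only stand-in for a real RNG crate.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from zero, so swap in a fixed non-zero seed.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Next value, uniform in (0, 1].
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Top 53 bits, shifted up by one so 0.0 is never returned.
        ((self.0 >> 11) + 1) as f64 / (1u64 << 53) as f64
    }
}

/// Samples a direction uniformly on the unit sphere: draw Gaussian
/// components (Box-Muller), then divide by the L2 norm.
fn random_unit_vector(dim: usize, rng: &mut XorShift64) -> Vec<f32> {
    let v: Vec<f64> = (0..dim)
        .map(|_| {
            let (u1, u2) = (rng.next_f64(), rng.next_f64());
            (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos()
        })
        .collect();
    let norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    v.iter().map(|x| (x / norm) as f32).collect()
}
```

With this helper, `let mut rng = XorShift64::new(42);` placed before the loop supplies the missing `rng`. Random vectors carry no meaning, so search results over them only demonstrate the plumbing, not relevance.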

On write, tidalDB:

Stores the entity and metadata
Indexes text fields into the BM25 index
Inserts the embedding into the HNSW vector index
Initializes the signal ledger with an exploration budget
Makes the item immediately queryable

Step 5: Record engagement signals

When a user engages with content, write a signal. The feedback loop closes at write time — no Kafka consumer to lag, no feature store sync to schedule.

let now = Timestamp::now();

// Global signals — these update the item's aggregate signal ledger.
db.signal("view", EntityId::new(1), 1.0, now)?;  // Jazz Piano viewed
db.signal("view", EntityId::new(4), 1.0, now)?;  // Jazz Improv viewed
db.signal("view", EntityId::new(7), 1.0, now)?;  // Modal Jazz viewed
db.signal("like", EntityId::new(4), 1.0, now)?;  // Jazz Improv liked
db.signal("share", EntityId::new(7), 1.0, now)?; // Modal Jazz shared

For signals with user context, use signal_with_context. This also updates the user's preference vector and interaction weights — enabling personalization.

let user_id = 42u64;
let creator_id = 100u64;

// User 42 viewed item 4. Their preference vector shifts toward jazz content.
db.signal_with_context("view", EntityId::new(4), 1.0, now, Some(user_id), Some(creator_id))?;
db.signal_with_context("like", EntityId::new(7), 1.0, now, Some(user_id), Some(creator_id))?;

// Negative signals are first-class citizens.
// This hide is recorded globally; use signal_with_context to attribute it to a user.
db.signal("hide", EntityId::new(2), 1.0, now)?; // The Rust video was hidden.

A ranking query issued 100ms later sees the updated state. No ETL required.

Step 6: Retrieve a ranked feed

tidalDB ships 25 built-in ranking profiles. The application names a profile; the database executes the full scoring pipeline.

use tidaldb::query::retrieve::Retrieve;

// Global trending: items with the highest share + view velocity.
let query = Retrieve::builder().profile("trending").limit(10).build()?;
let results = db.retrieve(&query)?;

println!("Trending ({} candidates):", results.total_candidates);
for item in &results.items {
    let sigs: Vec<_> = item.signals.iter()
        .map(|s| format!("{}={:.3}", s.name, s.value))
        .collect();
    println!("  #{} id={} score={:.4} [{}]",
        item.rank, item.entity_id.as_u64(), item.score, sigs.join(", "));
}

Step 7: Personalize

Switch the profile to for_you and scope the query to a user with for_user. Because user 42 viewed and liked jazz content, their results differ from global trending.

// Personalized feed for user 42.
let query = Retrieve::builder()
    .for_user(user_id)
    .profile("for_you")
    .limit(10)
    .build()?;
let results = db.retrieve(&query)?;

println!("For You (user {}):", user_id);
for item in &results.items {
    println!("  #{} id={} score={:.4}", item.rank, item.entity_id.as_u64(), item.score);
}

Other useful profiles:

"hot" — score with age decay (Reddit model)
"following" — content from followed creators (requires for_user + written follows relationships)
"hidden_gems" — high completion rate, low reach
"top_week" — cumulative quality over the last 7 days
"shuffle" — random, quality-weighted

Step 8: Search

Search combines BM25 full-text and ANN semantic similarity via Reciprocal Rank Fusion.

use tidaldb::query::search::Search;

// Flush the text index so recently written items are searchable.
// In production with persistent mode this happens automatically on a ~2s commit cycle.
db.flush_text_index()?;

// Keyword search, personalized for user 42.
let query = Search::builder()
    .query("jazz piano")
    .for_user(user_id)
    .limit(5)
    .build()?;
let results = db.search(&query)?;

println!("Search 'jazz piano':");
for item in &results.items {
    println!("  #{} id={} bm25={:.3?} semantic={:.3?}",
        item.rank,
        item.entity_id.as_u64(),
        item.bm25_score,
        item.semantic_score,
    );
}

Add a query embedding for hybrid search — text relevance + semantic similarity:

let query_vector = your_model.embed("jazz piano");  // same model as item embeddings
let query = Search::builder()
    .query("jazz piano")
    .vector(query_vector)
    .for_user(user_id)
    .limit(5)
    .build()?;
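
For intuition about the fusion step, here is a minimal sketch of textbook Reciprocal Rank Fusion over two ranked lists of item ids. The function name, the tie-breaking rule, and the constant k = 60 used in the usage note are assumptions for illustration; the source does not state which constant or tie-breaking tidalDB uses.

```rust
use std::collections::HashMap;

/// Fuses ranked lists of item ids. Each list contributes 1 / (k + rank)
/// for every id it contains, with rank starting at 1.
fn reciprocal_rank_fusion(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Given a BM25 ranking of `[1, 10, 4]` and a semantic ranking of `[10, 7, 1]` with k = 60, item 10 comes out first because it ranks near the top of both lists, followed by item 1, then the items that appear in only one list. RRF uses ranks only, so the BM25 and similarity scores never need to be put on a common scale.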

Step 9: Close

db.close()?;

This flushes the WAL, checkpoints signal state, and persists indexes. In persistent mode, the next open recovers to the last checkpointed state.

What to explore next

Topic	Where to look
Full API reference	API.md
Filters — format, duration, location, engagement thresholds	API.md — Filters
Diversity constraints	API.md — Diversity Constraints
All 25 ranking profiles	API.md — Sort Modes
Cohort-scoped trending	API.md — Cohorts
Collections and saved searches	API.md — Collections
Axum embedding example	`tidal/examples/axum_embedding.rs`
14 content discovery surfaces	USE_CASES.md
Architecture and design decisions	ARCHITECTURE.md
