Quickstart
Get a working ranked feed in 10 minutes.
Prerequisites: Rust 1.91+, Cargo. No external services.
Run the example
The fastest path is the included example, which demonstrates the complete loop — schema, ingest, signals, ranking:
cargo run --manifest-path tidal/Cargo.toml --example quickstart
The rest of this guide explains what it does and extends it with personalization and search.
Step 1: Add the dependency
[dependencies]
tidaldb = { git = "https://github.com/your-org/tidalDB", rev = "..." }
Step 2: Define a schema
Schema is defined before opening the database. It declares signal types (what events you'll record and how they decay), text fields (for BM25 search), and embedding slots (for vector search).
use std::time::Duration;
use tidaldb::schema::{SchemaBuilder, EntityKind, DecaySpec, Window, TextFieldType};
let mut schema = SchemaBuilder::new();
// View signal: 7-day half-life, three windows, velocity enabled.
// You declare the decay. tidalDB applies it at query time — no formula to maintain.
let _ = schema.signal("view", EntityKind::Item, DecaySpec::Exponential {
half_life: Duration::from_secs(7 * 24 * 3600),
}).windows(&[Window::OneHour, Window::TwentyFourHours, Window::AllTime]).velocity(true).add();
// Like signal: 30-day half-life. Durable engagement decays slowly.
let _ = schema.signal("like", EntityKind::Item, DecaySpec::Exponential {
half_life: Duration::from_secs(30 * 24 * 3600),
}).windows(&[Window::AllTime]).velocity(false).add();
// Share signal: 3-day half-life. Short-lived but strongly trending.
let _ = schema.signal("share", EntityKind::Item, DecaySpec::Exponential {
half_life: Duration::from_secs(3 * 24 * 3600),
}).windows(&[Window::TwentyFourHours, Window::AllTime]).velocity(true).add();
// Hide signal: permanent. A user who hid an item should not see it again.
let _ = schema.signal("hide", EntityKind::Item, DecaySpec::Permanent).add();
// Text fields for BM25 full-text search.
schema.text_field("title", TextFieldType::Text);
schema.text_field("category", TextFieldType::Keyword);
// Embedding slot for semantic / vector search (128D in this example).
// In production, use the dimensionality of your embedding model.
schema.embedding_slot("content", EntityKind::Item, 128);
let schema = schema.build()?;
Decay types:
- Exponential { half_life }: weight halves every half_life. Use for views, likes, shares.
- Linear { lifetime }: weight drops to zero over lifetime.
- Permanent: never decays. Use for hides, blocks, follows.
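To make the three decay shapes concrete, here is a minimal sketch of the textbook formulas. It is not tidalDB's implementation: the `Decay` enum and `decay_weight` function are hypothetical names introduced only for this illustration, and the sketch assumes a non-zero `half_life` and `lifetime`.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the three decay kinds described above
/// (an assumption for illustration, not tidalDB's `DecaySpec`).
#[derive(Debug, Clone, Copy)]
pub enum Decay {
    Exponential { half_life: Duration },
    Linear { lifetime: Duration },
    Permanent,
}

/// Weight in [0, 1] that an event of the given age would carry.
pub fn decay_weight(decay: Decay, age: Duration) -> f64 {
    let age = age.as_secs_f64();
    match decay {
        // Halves every `half_life`: 0.5^(age / half_life).
        Decay::Exponential { half_life } => 0.5_f64.powf(age / half_life.as_secs_f64()),
        // Falls on a straight line from 1 to 0 over `lifetime`, then stays at 0.
        Decay::Linear { lifetime } => (1.0 - age / lifetime.as_secs_f64()).max(0.0),
        // Never loses weight.
        Decay::Permanent => 1.0,
    }
}
```

Under these formulas, a view with a 7-day half-life carries weight 0.5 after one week and 0.25 after two, while a permanent hide still carries weight 1.0 at any age.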
Step 3: Open the database
use tidaldb::TidalDb;
// Ephemeral: in-memory, ideal for tests and this tutorial.
let db = TidalDb::builder().ephemeral().with_schema(schema).open()?;
// Persistent: durable storage at a path on disk.
// let db = TidalDb::builder().with_data_dir("/var/lib/myapp/tidaldb").with_schema(schema).open()?;
db.health_check()?;
TidalDb is Send + Sync. Wrap it in Arc<TidalDb> to share across threads or tasks.
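The sharing pattern can be sketched without the real database. In the block below, `FakeDb` is a hypothetical stand-in for any `Send + Sync` handle whose methods take `&self`; it is an assumption for illustration and says nothing about how `TidalDb` is built internally.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a `Send + Sync` database handle.
#[derive(Default)]
pub struct FakeDb {
    writes: AtomicU64,
}

impl FakeDb {
    /// Takes `&self`, so many threads can call it through one `Arc`.
    pub fn record(&self) {
        self.writes.fetch_add(1, Ordering::Relaxed);
    }

    /// Number of writes recorded so far.
    pub fn writes(&self) -> u64 {
        self.writes.load(Ordering::Relaxed)
    }
}

/// Spawns `workers` threads that each record `per_worker` writes,
/// then waits for all of them to finish.
pub fn write_from_threads(db: &Arc<FakeDb>, workers: usize, per_worker: u64) {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own reference-counted pointer to the same handle.
            let db = Arc::clone(db);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    db.record();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
}
```

With the real handle, the same shape would apply: create one `Arc`, clone it once per thread or task, and call the database methods through the clone.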
Step 4: Ingest content
Write items with metadata as HashMap<String, String> key-value pairs. Then write their embeddings separately.
tidalDB does not generate embeddings. You bring your model; tidalDB handles retrieval and ranking over the vectors you produce.
use std::collections::HashMap;
use tidaldb::schema::{EntityId, Timestamp};
let tracks = [
(1u64, "Introduction to Jazz Piano", "music", "1320"),
(2, "Rust Async Programming", "tech", "3600"),
(3, "Sourdough Bread Masterclass", "cooking", "2700"),
(4, "Jazz Improvisation Techniques","music", "1800"),
(5, "Building a Compiler in Rust", "tech", "5400"),
(6, "French Pastry Fundamentals", "cooking", "2100"),
(7, "Modal Jazz: Coltrane Changes", "music", "2400"),
(8, "WebAssembly from Scratch", "tech", "2700"),
(9, "Knife Skills for Home Cooks", "cooking", "900"),
(10, "Bebop Piano Vocabulary", "music", "1500"),
];
for (id, title, category, duration) in &tracks {
let mut meta = HashMap::new();
meta.insert("title".to_string(), title.to_string());
meta.insert("category".to_string(), category.to_string());
meta.insert("format".to_string(), "video".to_string());
meta.insert("duration".to_string(), duration.to_string());
meta.insert("created_at".to_string(), Timestamp::now().as_nanos().to_string());
db.write_item_with_metadata(EntityId::new(*id), &meta)?;
// In production: embed the title with your model.
// Here we use random unit vectors for illustration; `random_unit_vector`
// and `rng` are helpers that this snippet does not define.
let embedding = random_unit_vector(128, &mut rng);
db.write_item_embedding(EntityId::new(*id), &embedding)?;
}
println!("Ingested {} items.", db.item_count());
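The ingest loop calls `random_unit_vector(128, &mut rng)` without showing either name. One possible dependency-free version is sketched below. The `XorShift64` generator and this particular `random_unit_vector` body are assumptions for illustration; the bundled example may define them differently, for instance with the `rand` crate.

```rust
/// Tiny xorshift64* generator so the sketch needs no external crates.
/// Not suitable for cryptography or serious statistics.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The state must be non-zero, so substitute a constant for a zero seed.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Next pseudo-random value in [-1, 1).
    pub fn next_f32(&mut self) -> f32 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        // Keep the top 24 bits, which an f32 can represent exactly.
        let bits = self.0.wrapping_mul(0x2545_F491_4F6C_DD1D) >> 40;
        (bits as f32 / (1u32 << 24) as f32) * 2.0 - 1.0
    }
}

/// Returns a pseudo-random vector of length `dim` scaled to unit L2 norm.
/// Components are drawn from a cube and then normalized, so directions are
/// not perfectly uniform on the sphere; that is fine for placeholder data.
pub fn random_unit_vector(dim: usize, rng: &mut XorShift64) -> Vec<f32> {
    assert!(dim > 0, "dimension must be positive");
    loop {
        let v: Vec<f32> = (0..dim).map(|_| rng.next_f32()).collect();
        let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
        // Retry in the unlikely case of a near-zero vector.
        if norm > 1e-6 {
            return v.into_iter().map(|x| x / norm).collect();
        }
    }
}
```

To use it with the loop above, create the generator once before the loop, for example `let mut rng = XorShift64::new(42);`. The dimension passed in must match the size declared for the embedding slot in the schema (128 in this guide).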
On write, tidalDB:
- Stores the entity and metadata
- Indexes text fields into the BM25 index
- Inserts the embedding into the HNSW vector index
- Initializes the signal ledger with an exploration budget
- Makes the item immediately queryable
Step 5: Record engagement signals
When a user engages with content, write a signal. The feedback loop closes at write time — no Kafka consumer to lag, no feature store sync to schedule.
let now = Timestamp::now();
// Global signals — these update the item's aggregate signal ledger.
db.signal("view", EntityId::new(1), 1.0, now)?; // Jazz Piano viewed
db.signal("view", EntityId::new(4), 1.0, now)?;
db.signal("view", EntityId::new(7), 1.0, now)?; // Modal Jazz viewed
db.signal("like", EntityId::new(4), 1.0, now)?; // Jazz Improv liked
db.signal("share", EntityId::new(7), 1.0, now)?; // Modal Jazz shared
For signals with user context, use signal_with_context. This also updates the user's preference vector and interaction weights — enabling personalization.
let user_id = 42u64;
let creator_id = 100u64;
// User 42 viewed item 4. Their preference vector shifts toward jazz content.
db.signal_with_context("view", EntityId::new(4), 1.0, now, Some(user_id), Some(creator_id))?;
db.signal_with_context("like", EntityId::new(7), 1.0, now, Some(user_id), Some(creator_id))?;
// Negative signals are equal citizens.
db.signal("hide", EntityId::new(2), 1.0, now)?; // User hid the Rust video.
A ranking query issued 100ms later sees the updated state. No ETL required.
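One common way to make a decayed aggregate visible immediately after a write is to store a single value together with the time it was last updated, and to decay it forward on every read and write. The sketch below shows that general technique. `DecayedCounter` is a hypothetical type, and nothing here is taken from tidalDB's signal ledger, whose internals this guide does not describe.

```rust
/// Exponentially decayed counter: stores one value plus the time it was last
/// brought up to date, so both reads and writes are O(1).
#[derive(Debug, Clone, Copy)]
pub struct DecayedCounter {
    half_life_secs: f64,
    value: f64,
    updated_at_secs: f64,
}

impl DecayedCounter {
    pub fn new(half_life_secs: f64) -> Self {
        Self { half_life_secs, value: 0.0, updated_at_secs: 0.0 }
    }

    /// Value as seen at `now_secs`, without mutating the stored state.
    pub fn read(&self, now_secs: f64) -> f64 {
        let elapsed = (now_secs - self.updated_at_secs).max(0.0);
        self.value * 0.5_f64.powf(elapsed / self.half_life_secs)
    }

    /// Decays the stored value forward to `now_secs`, then adds the new event.
    /// An event older than the last update is added at full weight, which is
    /// a simplification this sketch accepts.
    pub fn record(&mut self, weight: f64, now_secs: f64) {
        self.value = self.read(now_secs) + weight;
        self.updated_at_secs = self.updated_at_secs.max(now_secs);
    }
}
```

With a 100-second half-life, one event at t=0 reads as 1.0 immediately and 0.5 at t=100; a second event at t=100 brings the value to 1.5 with no batch job in between.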
Step 6: Retrieve a ranked feed
tidalDB ships 25 built-in ranking profiles. The application names a profile; the database executes the full scoring pipeline.
use tidaldb::query::retrieve::Retrieve;
// Global trending: items with the highest share + view velocity.
let query = Retrieve::builder().profile("trending").limit(10).build()?;
let results = db.retrieve(&query)?;
println!("Trending ({} candidates):", results.total_candidates);
for item in &results.items {
let sigs: Vec<_> = item.signals.iter()
.map(|s| format!("{}={:.3}", s.name, s.value))
.collect();
println!(" #{} id={} score={:.4} [{}]",
item.rank, item.entity_id.as_u64(), item.score, sigs.join(", "));
}
Step 7: Personalize
Swap the profile to for_you. Because user 42 signaled views and likes on jazz content, their results differ from global trending.
// Personalized feed for user 42.
let query = Retrieve::builder()
.for_user(user_id)
.profile("for_you")
.limit(10)
.build()?;
let results = db.retrieve(&query)?;
println!("For You (user {}):", user_id);
for item in &results.items {
println!(" #{} id={} score={:.4}", item.rank, item.entity_id.as_u64(), item.score);
}
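The idea of a preference vector that shifts toward engaged content can be illustrated with a simple moving-average update followed by cosine scoring. This is a generic sketch under stated assumptions: `nudge_preference`, `cosine`, and the `rate` parameter are hypothetical, and the guide does not say how the for_you profile actually updates or uses the vector.

```rust
/// Moves `pref` a fraction `rate` of the way toward `item`, then re-normalizes
/// it to unit length. Both slices must have the same dimensionality.
pub fn nudge_preference(pref: &mut [f32], item: &[f32], rate: f32) {
    assert_eq!(pref.len(), item.len(), "dimension mismatch");
    for (p, x) in pref.iter_mut().zip(item) {
        *p = (1.0 - rate) * *p + rate * *x;
    }
    let norm = pref.iter().map(|p| p * p).sum::<f32>().sqrt();
    if norm > 0.0 {
        for p in pref.iter_mut() {
            *p /= norm;
        }
    }
}

/// Cosine similarity; returns 0.0 if either vector has zero length.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}
```

In a toy two-dimensional space where jazz is `[1.0, 0.0]` and tech is `[0.0, 1.0]`, a user who starts at the tech vector and receives two jazz nudges at `rate = 0.5` ends up closer to jazz than to tech.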
Other useful profiles:
- "hot": score with age decay (Reddit model)
- "following": content from followed creators (requires for_user + written follows relationships)
- "hidden_gems": high completion rate, low reach
- "top_week": cumulative quality over the last 7 days
- "shuffle": random, quality-weighted
Step 8: Search
Search combines BM25 full-text and ANN semantic similarity via Reciprocal Rank Fusion.
use tidaldb::query::search::Search;
// Flush the text index so recently written items are searchable.
// In production with persistent mode this happens automatically on a ~2s commit cycle.
db.flush_text_index()?;
// Keyword search, personalized for user 42.
let query = Search::builder()
.query("jazz piano")
.for_user(user_id)
.limit(5)
.build()?;
let results = db.search(&query)?;
println!("Search 'jazz piano':");
for item in &results.items {
println!(" #{} id={} bm25={:.3?} semantic={:.3?}",
item.rank,
item.entity_id.as_u64(),
item.bm25_score,
item.semantic_score,
);
}
Add a query embedding for hybrid search — text relevance + semantic similarity:
let query_vector = your_model.embed("jazz piano"); // same model as item embeddings
let query = Search::builder()
.query("jazz piano")
.vector(query_vector)
.for_user(user_id)
.limit(5)
.build()?;
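Reciprocal Rank Fusion, mentioned at the start of this step, merges ranked lists by summing 1 / (k + rank) for each item across the lists. The sketch below shows the general technique. The function name is hypothetical, and the constant `k = 60.0` is a commonly used default rather than a value confirmed for tidalDB.

```rust
use std::collections::HashMap;

/// Fuses several ranked ID lists with Reciprocal Rank Fusion.
/// Each list is ordered best-first; a larger `k` flattens the advantage of
/// top-ranked items.
pub fn reciprocal_rank_fusion(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            // Ranks are 1-based: the top hit contributes 1 / (k + 1).
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (idx + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Because only ranks are used, the BM25 and semantic scores never need to be put on a common scale. An item that appears near the top of both lists outranks one that tops only a single list.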
Step 9: Close
db.close()?;
This flushes the WAL, checkpoints signal state, and persists indexes. In persistent mode, the next open recovers to the last checkpointed state.
What to explore next
| Topic | Where to look |
|---|---|
| Full API reference | API.md |
| Filters — format, duration, location, engagement thresholds | API.md — Filters |
| Diversity constraints | API.md — Diversity Constraints |
| All 25 ranking profiles | API.md — Sort Modes |
| Cohort-scoped trending | API.md — Cohorts |
| Collections and saved searches | API.md — Collections |
| Axum embedding example | tidal/examples/axum_embedding.rs |
| 14 content discovery surfaces | USE_CASES.md |
| Architecture and design decisions | ARCHITECTURE.md |