# Quickstart

Get a working ranked feed in 10 minutes.

**Prerequisites:** Rust 1.91+, Cargo. No external services.

---

## Run the example

The fastest path is the included example, which demonstrates the complete loop: schema, ingest, signals, ranking.

```bash
cargo run --manifest-path tidal/Cargo.toml --example quickstart
```

The rest of this guide explains what it does and extends it with personalization and search.

---

## Step 1: Add the dependency

```toml
[dependencies]
tidaldb = { git = "https://github.com/your-org/tidalDB", rev = "..." }
```

---

## Step 2: Define a schema

Define the schema before opening the database. It declares signal types (which events you will record and how they decay), text fields (for BM25 search), and embedding slots (for vector search).

```rust
use std::time::Duration;
use tidaldb::schema::{SchemaBuilder, EntityKind, DecaySpec, Window, TextFieldType};

let mut schema = SchemaBuilder::new();

// View signal: 7-day half-life, three windows, velocity enabled.
// You declare the decay; tidalDB applies it at query time, so there is no formula to maintain.
let _ = schema.signal("view", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(7 * 24 * 3600),
}).windows(&[Window::OneHour, Window::TwentyFourHours, Window::AllTime]).velocity(true).add();

// Like signal: 30-day half-life. Durable engagement decays slowly.
let _ = schema.signal("like", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(30 * 24 * 3600),
}).windows(&[Window::AllTime]).velocity(false).add();

// Share signal: 3-day half-life. Short-lived but a strong trending indicator.
let _ = schema.signal("share", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(3 * 24 * 3600),
}).windows(&[Window::TwentyFourHours, Window::AllTime]).velocity(true).add();

// Hide signal: permanent. A user who hid an item should not see it again.
let _ = schema.signal("hide", EntityKind::Item, DecaySpec::Permanent).add();

// Text fields for BM25 full-text search.
schema.text_field("title", TextFieldType::Text);
schema.text_field("category", TextFieldType::Keyword);

// Embedding slot for semantic / vector search (128 dimensions in this example).
// In production, use the dimensionality of your embedding model.
schema.embedding_slot("content", EntityKind::Item, 128);

let schema = schema.build()?;
```

**Decay types:**

- `Exponential { half_life }`: weight halves every `half_life`. Use for views, likes, shares.
- `Linear { lifetime }`: weight drops to zero over `lifetime`.
- `Permanent`: never decays. Use for hides, blocks, follows.
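
To make the three decay shapes concrete, the sketch below computes the weight multiplier each one assigns to an event of a given age. The `Decay` enum and `decay_weight` function are hypothetical illustrations written for this guide, not tidalDB's `DecaySpec`, and the 10-day linear lifetime is an arbitrary example value.

```rust
use std::time::Duration;

/// Hypothetical mirror of the three decay shapes (not tidalDB's `DecaySpec`).
enum Decay {
    Exponential { half_life: Duration },
    Linear { lifetime: Duration },
    Permanent,
}

/// Weight multiplier for an event that is `age` old.
fn decay_weight(decay: &Decay, age: Duration) -> f64 {
    match decay {
        // Halves once per half-life: 0.5^(age / half_life).
        Decay::Exponential { half_life } => {
            0.5f64.powf(age.as_secs_f64() / half_life.as_secs_f64())
        }
        // Falls in a straight line from 1.0 to 0.0, then stays at 0.0.
        Decay::Linear { lifetime } => {
            (1.0 - age.as_secs_f64() / lifetime.as_secs_f64()).max(0.0)
        }
        // Never decays.
        Decay::Permanent => 1.0,
    }
}

fn main() {
    const DAY: u64 = 24 * 3600;
    let view = Decay::Exponential { half_life: Duration::from_secs(7 * DAY) };
    let linear = Decay::Linear { lifetime: Duration::from_secs(10 * DAY) };
    let hide = Decay::Permanent;

    for days in [0u64, 7, 14, 30] {
        let age = Duration::from_secs(days * DAY);
        println!(
            "age {:>2}d  view={:.3}  linear={:.3}  hide={:.1}",
            days,
            decay_weight(&view, age),
            decay_weight(&linear, age),
            decay_weight(&hide, age),
        );
    }
}
```

With the 7-day half-life used for `view`, a week-old view counts half as much as a fresh one, and a month-old view counts about 5%.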

---

## Step 3: Open the database

```rust
use tidaldb::TidalDb;

// Ephemeral: in-memory, ideal for tests and this tutorial.
let db = TidalDb::builder().ephemeral().with_schema(schema).open()?;

// Persistent: durable storage at a path on disk.
// let db = TidalDb::builder().with_data_dir("/var/lib/myapp/tidaldb").with_schema(schema).open()?;

db.health_check()?;
```
`TidalDb` is `Send + Sync`. Wrap it in `Arc<TidalDb>` to share across threads or tasks.
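
A minimal sketch of the sharing pattern, using a hypothetical `StandInDb` type in place of `TidalDb` so that it compiles on its own. Cloning an `Arc` copies a pointer and increments a reference count; every thread works against the same underlying instance.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `TidalDb`: any `Send + Sync` type is shared the
/// same way. The counter only mimics recording signals.
struct StandInDb {
    signals_written: AtomicU64,
}

impl StandInDb {
    fn signal(&self) {
        self.signals_written.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    let db = Arc::new(StandInDb { signals_written: AtomicU64::new(0) });

    let handles: Vec<_> = (0..4)
        .map(|_| {
            // Each worker gets its own clone of the Arc (a pointer copy).
            let db = Arc::clone(&db);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    db.signal();
                }
            })
        })
        .collect();

    for h in handles {
        h.join().expect("worker thread panicked");
    }

    println!("signals written: {}", db.signals_written.load(Ordering::Relaxed));
}
```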

---

## Step 4: Ingest content

Write each item with its metadata as a `HashMap<String, String>` of key-value pairs, then write its embedding in a separate call.

**tidalDB does not generate embeddings.** You bring your model; tidalDB handles retrieval and ranking over the vectors you produce.

```rust
use std::collections::HashMap;
use tidaldb::schema::{EntityId, Timestamp};

let tracks = [
    (1u64, "Introduction to Jazz Piano",    "music",   "1320"),
    (2,    "Rust Async Programming",        "tech",    "3600"),
    (3,    "Sourdough Bread Masterclass",   "cooking", "2700"),
    (4,    "Jazz Improvisation Techniques", "music",   "1800"),
    (5,    "Building a Compiler in Rust",   "tech",    "5400"),
    (6,    "French Pastry Fundamentals",    "cooking", "2100"),
    (7,    "Modal Jazz: Coltrane Changes",  "music",   "2400"),
    (8,    "WebAssembly from Scratch",      "tech",    "2700"),
    (9,    "Knife Skills for Home Cooks",   "cooking", "900"),
    (10,   "Bebop Piano Vocabulary",        "music",   "1500"),
];

for (id, title, category, duration) in &tracks {
    let mut meta = HashMap::new();
    meta.insert("title".to_string(), title.to_string());
    meta.insert("category".to_string(), category.to_string());
    meta.insert("format".to_string(), "video".to_string());
    meta.insert("duration".to_string(), duration.to_string());
    meta.insert("created_at".to_string(), Timestamp::now().as_nanos().to_string());

    db.write_item_with_metadata(EntityId::new(*id), &meta)?;

    // In production: embed the title with your model.
    // Here we use random unit vectors for illustration; `rng` and
    // `random_unit_vector` are local helpers not shown in this snippet.
    let embedding = random_unit_vector(128, &mut rng);
    db.write_item_embedding(EntityId::new(*id), &embedding)?;
}

println!("Ingested {} items.", db.item_count());
```
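
The ingest loop relies on an `rng` and a `random_unit_vector` helper. One possible standard-library-only implementation is sketched here; `XorShift64` is a hypothetical minimal PRNG, and returning `Vec<f32>` is an assumption about what the embedding call accepts. Sampling independent standard normals and normalizing gives a direction that is uniformly distributed on the unit sphere.

```rust
/// Hypothetical minimal xorshift64* PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must never hold an all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Uniform sample in the open interval (0, 1).
    fn next_f64(&mut self) -> f64 {
        // Top 53 bits give [0, 2^53); adding 0.5 keeps the result away from 0.0.
        ((self.next_u64() >> 11) as f64 + 0.5) / (1u64 << 53) as f64
    }
}

/// A `dim`-dimensional vector with unit L2 norm and a uniformly random direction.
fn random_unit_vector(dim: usize, rng: &mut XorShift64) -> Vec<f32> {
    let gaussians: Vec<f64> = (0..dim)
        .map(|_| {
            // Box-Muller transform: two uniforms -> one standard normal.
            let (u1, u2) = (rng.next_f64(), rng.next_f64());
            (-2.0 * u1.ln()).sqrt() * (std::f64::consts::TAU * u2).cos()
        })
        .collect();
    let norm = gaussians.iter().map(|g| g * g).sum::<f64>().sqrt();
    gaussians.iter().map(|g| (g / norm) as f32).collect()
}

fn main() {
    let mut rng = XorShift64::new(42);
    let v = random_unit_vector(128, &mut rng);
    let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    println!("dim={} norm={:.4}", v.len(), norm);
}
```

A fixed seed makes the toy embeddings, and therefore the search results, reproducible across runs. Beyond a demo, a vetted crate such as `rand` is the better choice.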

On write, tidalDB:

1. Stores the entity and metadata
2. Indexes text fields into the BM25 index
3. Inserts the embedding into the HNSW vector index
4. Initializes the signal ledger with an exploration budget
5. Makes the item immediately queryable
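
Text fields are scored with BM25. The textbook Okapi formula is short enough to sketch; this version uses naive whitespace tokenization and the common defaults `k1 = 1.2`, `b = 0.75`. tidalDB's tokenizer, parameters, and IDF variant are not documented in this guide, so read this as an illustration of the ranking function, not a model of the index.

```rust
use std::collections::HashMap;

/// Textbook Okapi BM25 over lowercased, whitespace-tokenized documents.
struct Bm25 {
    docs: Vec<Vec<String>>,
    doc_freq: HashMap<String, usize>, // number of documents containing each term
    avg_len: f64,
    k1: f64,
    b: f64,
}

fn tokenize(text: &str) -> Vec<String> {
    text.split_whitespace().map(|t| t.to_lowercase()).collect()
}

impl Bm25 {
    fn new(corpus: &[&str]) -> Self {
        let docs: Vec<Vec<String>> = corpus.iter().map(|d| tokenize(d)).collect();
        let mut doc_freq = HashMap::new();
        for doc in &docs {
            // Count each term at most once per document.
            let mut seen: Vec<&String> = doc.iter().collect();
            seen.sort();
            seen.dedup();
            for term in seen {
                *doc_freq.entry(term.clone()).or_insert(0) += 1;
            }
        }
        let total: usize = docs.iter().map(|d| d.len()).sum();
        let avg_len = total as f64 / docs.len().max(1) as f64;
        Bm25 { docs, doc_freq, avg_len, k1: 1.2, b: 0.75 }
    }

    fn score(&self, query: &str, doc_idx: usize) -> f64 {
        let doc = &self.docs[doc_idx];
        let n = self.docs.len() as f64;
        // Longer-than-average documents are penalized, shorter ones boosted.
        let len_norm = 1.0 - self.b + self.b * doc.len() as f64 / self.avg_len;
        tokenize(query)
            .iter()
            .map(|term| {
                let tf = doc.iter().filter(|t| *t == term).count() as f64;
                if tf == 0.0 {
                    return 0.0;
                }
                let df = *self.doc_freq.get(term).unwrap_or(&0) as f64;
                // Lucene-style IDF, which stays positive even for very common terms.
                let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
                idf * tf * (self.k1 + 1.0) / (tf + self.k1 * len_norm)
            })
            .sum()
    }
}

fn main() {
    let titles = [
        "Introduction to Jazz Piano",
        "Rust Async Programming",
        "Jazz Improvisation Techniques",
        "Bebop Piano Vocabulary",
    ];
    let index = Bm25::new(&titles);
    for (i, title) in titles.iter().enumerate() {
        println!("{:.3}  {}", index.score("jazz piano", i), title);
    }
}
```

For the query "jazz piano", the title containing both terms ranks first, the two single-term matches tie behind it, and "Rust Async Programming" scores 0.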

---

## Step 5: Record engagement signals

When a user engages with content, write a signal. The feedback loop closes at write time: no Kafka consumer to lag, no feature store sync to schedule.

```rust
let now = Timestamp::now();

// Global signals: these update the item's aggregate signal ledger.
db.signal("view", EntityId::new(1), 1.0, now)?;   // Jazz Piano viewed
db.signal("view", EntityId::new(4), 1.0, now)?;   // Jazz Improv viewed
db.signal("view", EntityId::new(7), 1.0, now)?;   // Modal Jazz viewed
db.signal("like", EntityId::new(4), 1.0, now)?;   // Jazz Improv liked
db.signal("share", EntityId::new(7), 1.0, now)?;  // Modal Jazz shared
```

For signals with user context, use `signal_with_context`. This also updates the user's preference vector and interaction weights, which enables personalization.

```rust
let user_id = 42u64;
let creator_id = 100u64;

// User 42 viewed item 4 and liked item 7. Their preference vector shifts toward jazz content.
db.signal_with_context("view", EntityId::new(4), 1.0, now, Some(user_id), Some(creator_id))?;
db.signal_with_context("like", EntityId::new(7), 1.0, now, Some(user_id), Some(creator_id))?;

// Negative signals are first-class citizens.
db.signal("hide", EntityId::new(2), 1.0, now)?; // Item 2 (the Rust async video) hidden.
```

A ranking query issued 100 ms later sees the updated state. No ETL required.
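
Because decay is applied when a value is read, a write only has to fold the new event into a stored total. The sketch below shows one common way to do that for exponential decay: keep the decayed value plus the time it was last updated, and age it on demand. `DecayedCounter` is a hypothetical illustration, not tidalDB's signal ledger.

```rust
/// An exponentially decayed counter that never stores individual events.
struct DecayedCounter {
    half_life_secs: f64,
    value: f64,            // decayed total as of `last_update_secs`
    last_update_secs: f64,
}

impl DecayedCounter {
    fn new(half_life_secs: f64) -> Self {
        DecayedCounter { half_life_secs, value: 0.0, last_update_secs: 0.0 }
    }

    /// Multiplier that ages a value by `elapsed` seconds.
    fn factor(&self, elapsed: f64) -> f64 {
        0.5f64.powf(elapsed / self.half_life_secs)
    }

    /// Record an event of weight `w` at time `t` (seconds, non-decreasing).
    fn add(&mut self, w: f64, t: f64) {
        self.value = self.value * self.factor(t - self.last_update_secs) + w;
        self.last_update_secs = t;
    }

    /// Decayed total as seen at time `t`, without mutating state.
    fn value_at(&self, t: f64) -> f64 {
        self.value * self.factor(t - self.last_update_secs)
    }
}

fn main() {
    const DAY: f64 = 86_400.0;
    // Same 7-day half-life as the "view" signal in Step 2.
    let mut views = DecayedCounter::new(7.0 * DAY);
    views.add(1.0, 0.0);
    views.add(1.0, 7.0 * DAY);
    // The first view has halved by day 7, so the total is 1.5 at that moment
    // and 0.75 one half-life later.
    println!("day 7: {:.2}, day 14: {:.2}", views.value_at(7.0 * DAY), views.value_at(14.0 * DAY));
}
```

Each update is O(1) and needs no stored event history. Windowed counts such as the 1-hour and 24-hour windows need different bookkeeping and are not covered here.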

---

## Step 6: Retrieve a ranked feed

tidalDB ships 25 built-in ranking profiles. The application names a profile; the database executes the full scoring pipeline.

```rust
use tidaldb::query::retrieve::Retrieve;

// Global trending: items with the highest share + view velocity.
let query = Retrieve::builder().profile("trending").limit(10).build()?;
let results = db.retrieve(&query)?;

println!("Trending ({} candidates):", results.total_candidates);
for item in &results.items {
    let sigs: Vec<_> = item.signals.iter()
        .map(|s| format!("{}={:.3}", s.name, s.value))
        .collect();
    println!("  #{} id={} score={:.4} [{}]",
        item.rank, item.entity_id.as_u64(), item.score, sigs.join(", "));
}
```

---

## Step 7: Personalize

Swap the profile to `for_you` and scope the query to the user with `for_user`. Because user 42 viewed and liked jazz content in Step 5, their results differ from global trending.

```rust
// Personalized feed for user 42.
let query = Retrieve::builder()
    .for_user(user_id)
    .profile("for_you")
    .limit(10)
    .build()?;
let results = db.retrieve(&query)?;

println!("For You (user {}):", user_id);
for item in &results.items {
    println!("  #{} id={} score={:.4}", item.rank, item.entity_id.as_u64(), item.score);
}
```

Other useful profiles:

- `"hot"`: score with age decay (Reddit model)
- `"following"`: content from followed creators (requires `for_user` and written `follows` relationships)
- `"hidden_gems"`: high completion rate, low reach
- `"top_week"`: cumulative quality over the last 7 days
- `"shuffle"`: random, quality-weighted
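
For background on the "Reddit model", the hot formula from Reddit's original open-source codebase is sketched below in Rust: the base-10 log of the net vote score, plus a term that grows linearly with submission time. Whether tidalDB's `"hot"` profile uses the same constants or the same inputs (votes versus engagement signals) is not stated in this guide, so this is not the profile's implementation.

```rust
/// Reddit-style hot score: log10 of net votes plus a linear time bonus.
fn hot(ups: i64, downs: i64, posted_at_unix_secs: i64) -> f64 {
    let s = ups - downs;
    // Each 10x increase in net score adds 1.0.
    let order = (s.abs().max(1) as f64).log10();
    let sign = s.signum() as f64;
    // 1134028003 is Reddit's reference epoch (December 2005).
    let seconds = (posted_at_unix_secs - 1_134_028_003) as f64;
    // 45000 s = 12.5 h: a post 12.5 h newer needs 10x fewer net votes to tie.
    sign * order + seconds / 45_000.0
}

fn main() {
    let t0: i64 = 1_700_000_000; // an arbitrary Unix timestamp
    // 1,000 net votes versus 100 net votes posted 12.5 hours later: they tie.
    println!("older, 1000 votes: {:.4}", hot(1_000, 0, t0));
    println!("newer,  100 votes: {:.4}", hot(100, 0, t0 + 45_000));
}
```

Giving newer items a time bonus produces the same ordering as decaying older ones, but an item's score stays fixed until it receives another vote, so scores do not need recomputing merely because time has passed.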

---

## Step 8: Search

Search combines BM25 full-text relevance with approximate-nearest-neighbor (ANN) semantic similarity and merges the two rankings with Reciprocal Rank Fusion (RRF).
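
Reciprocal Rank Fusion merges ranked lists using only rank positions, so BM25 scores and cosine similarities never need to share a scale. A minimal sketch follows; the `rrf` function and the example rankings are hypothetical, and `k = 60` is the value from the original RRF paper, since this guide does not state the constant tidalDB uses.

```rust
use std::collections::HashMap;

/// Each list contributes 1 / (k + rank) for every id it contains (rank starts at 1).
fn rrf(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical rankings of item ids from each retriever.
    let bm25 = vec![1, 10, 4];    // keyword matches for "jazz piano"
    let semantic = vec![4, 1, 7]; // nearest neighbors of the query vector
    for (id, score) in rrf(&[bm25, semantic], 60.0) {
        println!("id={id} rrf={score:.5}");
    }
}
```

Item 1 wins because it sits near the top of both lists; item 4 tops the semantic list but is last in the BM25 list, so it comes second.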

```rust
use tidaldb::query::search::Search;

// Flush the text index so recently written items are searchable.
// In persistent mode this happens automatically on a ~2 s commit cycle.
db.flush_text_index()?;

// Keyword search, personalized for user 42.
let query = Search::builder()
    .query("jazz piano")
    .for_user(user_id)
    .limit(5)
    .build()?;
let results = db.search(&query)?;

println!("Search 'jazz piano':");
for item in &results.items {
    println!("  #{} id={} bm25={:.3?} semantic={:.3?}",
        item.rank,
        item.entity_id.as_u64(),
        item.bm25_score,
        item.semantic_score,
    );
}
```

Add a query embedding for hybrid search (text relevance plus semantic similarity):

```rust
// `your_model` is a placeholder for your own embedding model. Use the same
// model, and therefore the same dimensionality, as the item embeddings.
let query_vector = your_model.embed("jazz piano");
let query = Search::builder()
    .query("jazz piano")
    .vector(query_vector)
    .for_user(user_id)
    .limit(5)
    .build()?;
let results = db.search(&query)?;
```
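
The semantic side of search compares the query vector with item embeddings. An HNSW index approximates exact nearest-neighbor search; the exact version, a brute-force cosine top-k, is sketched below and is a common ground truth for measuring an ANN index's recall on sample queries. The functions and toy vectors are illustrative and not part of tidalDB.

```rust
/// Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exact top-k by cosine similarity; scores every item, so O(n * dim) per query.
fn top_k(query: &[f32], items: &[(u64, Vec<f32>)], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = items
        .iter()
        .map(|(id, v)| (*id, cosine(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    scored.truncate(k);
    scored
}

fn main() {
    // Toy 3-dimensional embeddings; real ones come from your model.
    let items: Vec<(u64, Vec<f32>)> = vec![
        (1, vec![0.9, 0.1, 0.0]),
        (2, vec![0.0, 1.0, 0.0]),
        (4, vec![0.8, 0.2, 0.1]),
    ];
    let query = [1.0f32, 0.0, 0.0];
    for (id, sim) in top_k(&query, &items, 2) {
        println!("id={id} cosine={sim:.3}");
    }
}
```

Its cost grows linearly with the catalog, which is the cost an HNSW index is designed to avoid.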

---

## Step 9: Close

```rust
db.close()?;
```

This flushes the WAL, checkpoints signal state, and persists indexes. In persistent mode, the next open recovers to the last checkpointed state.

---

## What to explore next

| Topic | Where to look |
|-------|---------------|
| Full API reference | [API.md](API.md) |
| Filters: format, duration, location, engagement thresholds | [API.md: Filters](API.md#filters) |
| Diversity constraints | [API.md: Diversity Constraints](API.md#diversity-constraints) |
| All 25 ranking profiles | [API.md: Sort Modes](API.md#sort-modes) |
| Cohort-scoped trending | [API.md: Cohorts](API.md#cohort-definitions) |
| Collections and saved searches | [API.md: Collections](API.md#collections) |
| Axum embedding example | `tidal/examples/axum_embedding.rs` |
| 14 content discovery surfaces | [USE_CASES.md](USE_CASES.md) |
| Architecture and design decisions | [ARCHITECTURE.md](ARCHITECTURE.md) |