
# API Reference
How developers interact with tidalDB. This document covers initialization, schema definition, write operations, queries, and the feedback loop.
tidalDB is an embeddable Rust library. You link it into your process. There is no separate server, no network protocol, no client SDK. The API is Rust types and method calls.
---
## Table of Contents
- [Initialization](#initialization)
- [Schema Definition](#schema-definition)
  - [Entity Types](#entity-types)
  - [Signal Definitions](#signal-definitions)
  - [Ranking Profiles](#ranking-profiles)
  - [Cohort Definitions](#cohort-definitions)
- [Write Path](#write-path)
  - [Ingesting Entities](#ingesting-entities)
  - [Updating Entities](#updating-entities)
  - [Writing Relationships](#writing-relationships)
  - [Writing Signals](#writing-signals)
- [Query Language](#query-language)
  - [RETRIEVE — Feeds, Browse, Related](#retrieve--feeds-browse-related)
  - [SEARCH — Text + Semantic Retrieval](#search--text--semantic-retrieval)
  - [Query Composition — SEARCH within Scoped Results](#query-composition--search-within-scoped-results)
  - [SUGGEST — Autocomplete and Suggestions](#suggest--autocomplete-and-suggestions)
- [Filters](#filters)
- [Sort Modes](#sort-modes)
- [Diversity Constraints](#diversity-constraints)
- [Pagination](#pagination)
- [Response Format](#response-format)
- [Lifecycle and Operations](#lifecycle-and-operations)
- [Summary](#summary)
---
## Initialization
Open a database, providing a path and configuration.
```rust
use tidaldb::{Config, Durability, TidalDB};

let db = TidalDB::open(Config {
    path: "./data/my_app",
    // Memory budget for in-memory signal state and hot caches.
    // Higher = faster ranking queries. 10M entities need ~1-2 GB.
    memory_budget: 2 * 1024 * 1024 * 1024, // 2 GB
    // WAL fsync strategy for signal writes.
    signal_durability: Durability::Batched { max_batch: 100, max_delay_ms: 10 },
    // Number of threads for background materializer and segment merging.
    background_threads: 4,
})?;
```
**`Durability`** controls fsync behavior for signal writes:
| Level | Behavior | Use Case |
|---|---|---|
| `Immediate` | fsync every write | Financial, purchase events |
| `Batched { max_batch, max_delay_ms }` | fsync per batch | Default for engagement signals |
| `Eventual` | fsync on OS schedule | Impressions, low-value telemetry |
The database is `Send + Sync`. Share it across threads with `Arc<TidalDB>`.
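To make the `Batched` durability level concrete, here is a minimal sketch of a flush decision driven by `max_batch` and `max_delay_ms`. It is an illustration only: `BatchPolicy` and `PendingBatch` are hypothetical names, and tidalDB's actual WAL batching may work differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical policy mirroring `Durability::Batched { max_batch, max_delay_ms }`.
struct BatchPolicy {
    max_batch: usize,
    max_delay: Duration,
}

/// Writes buffered since the last fsync (illustrative, not tidalDB's type).
struct PendingBatch {
    writes: usize,
    oldest: Option<Instant>,
}

impl PendingBatch {
    fn new() -> Self {
        PendingBatch { writes: 0, oldest: None }
    }

    /// Record one buffered write that arrived at `now`.
    fn push(&mut self, now: Instant) {
        if self.oldest.is_none() {
            self.oldest = Some(now);
        }
        self.writes += 1;
    }

    /// True when the batch is full or its oldest write has waited long enough.
    fn should_flush(&self, policy: &BatchPolicy, now: Instant) -> bool {
        if self.writes == 0 {
            return false;
        }
        let too_many = self.writes >= policy.max_batch;
        let too_old = self
            .oldest
            .map_or(false, |t| now.duration_since(t) >= policy.max_delay);
        too_many || too_old
    }

    /// Clear the batch after an fsync.
    fn reset(&mut self) {
        self.writes = 0;
        self.oldest = None;
    }
}
```

Under this reading, `max_delay_ms` bounds how long an acknowledged write can sit unsynced, while `max_batch` bounds how many writes share one fsync.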
---
## Schema Definition
Schema is defined before writing data. It declares entity types, signal types, and ranking profiles. Schema is versioned — old profiles remain queryable by name and version.
### Entity Types
Entities are the nodes of the system. Three built-in types: **Item**, **User**, **Creator**.
```rust
use tidaldb::schema::*;
db.define_entity(EntityDef {
kind: EntityKind::Item,
metadata_fields: vec![
Field::text("title"), // full-text indexed
Field::text("description"), // full-text indexed
Field::keyword("category"), // exact match, filterable
Field::keywords("tags"), // multi-value, filterable
Field::keyword("format"), // video, short, podcast, article, etc.
Field::keyword("language"), // ISO code
Field::keyword("content_rating"), // G, PG, PG-13, R
Field::keyword("status"), // published, live, scheduled, archived
Field::keyword("availability"), // free, premium, subscriber_only
Field::duration("duration"), // filterable, sortable
Field::timestamp("created_at"), // filterable, sortable
Field::bool("has_subtitles"),
Field::bool("downloadable"),
],
// Embedding slot — you provide the vector, tidalDB indexes it.
embedding_dimensions: 1536,
})?;
db.define_entity(EntityDef {
kind: EntityKind::User,
metadata_fields: vec![
// Demographic (application-set)
Field::keyword("locale"), // en-US, ja-JP, etc.
Field::keyword("region"), // country or region code
Field::keyword("timezone"), // IANA timezone
Field::keyword("age_range"), // 18-24, 25-34, 35-44, etc.
Field::keyword("gender"), // optional demographic
// Interests (mixed: app-set + DB-computed)
Field::keywords("explicit_interests"), // stated interests
Field::keywords("inferred_interests"), // DB-computed from engagement
Field::keywords("primary_categories"), // top categories by engagement
// Behavioral (DB-computed)
Field::keyword("engagement_level"), // power, regular, casual, dormant
Field::keyword("format_preference"), // short, long, mixed
Field::keyword("session_pattern"), // binge, browse, search
Field::i64("platform_tenure_days"), // days since first signal
],
// User preference vector — managed by the database.
// Updated automatically on every signal write.
embedding_dimensions: 1536,
})?;
db.define_entity(EntityDef {
kind: EntityKind::Creator,
metadata_fields: vec![
Field::text("name"),
Field::keyword("handle"),
Field::keyword("language"),
Field::keyword("region"),
Field::bool("verified"),
],
// Creator embedding — aggregated from their item catalog.
embedding_dimensions: 1536,
})?;
```
**Field types:**
| Type | Behavior |
|---|---|
| `text` | Full-text indexed (BM25), searchable with tokenization |
| `keyword` | Exact match, filterable, facetable |
| `keywords` | Multi-value keyword (tags, categories) |
| `i64` / `f64` | Numeric, range-filterable, sortable |
| `bool` | Boolean filter |
| `timestamp` | Time value, range-filterable |
| `duration` | Duration value, range-filterable, sortable |
### Signal Definitions
Signals are typed, timestamped event streams. Decay, velocity, and windowed aggregation are declared in schema — not computed in application code.
```rust
db.define_signal(SignalDef {
name: "view",
target: EntityKind::Item,
decay: Decay::Exponential { half_life: Duration::days(7) },
windows: vec![
Window::hours(1),
Window::hours(24),
Window::days(7),
Window::days(30),
Window::all_time(),
],
velocity: true, // compute rate-of-change per window
})?;
db.define_signal(SignalDef {
name: "like",
target: EntityKind::Item,
decay: Decay::Exponential { half_life: Duration::days(7) },
windows: vec![Window::hours(1), Window::hours(24), Window::days(7), Window::all_time()],
velocity: true,
})?;
db.define_signal(SignalDef {
name: "skip",
target: EntityKind::Item,
decay: Decay::Exponential { half_life: Duration::hours(24) },
windows: vec![Window::hours(1), Window::hours(24)],
velocity: false,
})?;
db.define_signal(SignalDef {
name: "hide",
target: EntityKind::Item,
decay: Decay::Permanent, // never decays
windows: vec![], // no aggregation — binary flag
velocity: false,
})?;
db.define_signal(SignalDef {
name: "completion",
target: EntityKind::Item,
decay: Decay::Exponential { half_life: Duration::days(30) },
windows: vec![Window::all_time()],
velocity: false,
})?;
```
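The `velocity: true` flag asks for a rate of change per window. The sketch below shows one plausible definition: the change in event count between the previous window and the current one, divided by the window length. This definition is an assumption for illustration; the document does not specify the formula tidalDB uses, and `count_in` and `velocity` are hypothetical helpers.

```rust
/// Count events whose timestamp (in seconds) falls in [start, end).
fn count_in(events: &[u64], start: u64, end: u64) -> usize {
    events.iter().filter(|&&t| t >= start && t < end).count()
}

/// Assumed velocity: (current window count - previous window count)
/// per second of window length. Positive means accelerating engagement.
fn velocity(events: &[u64], now: u64, window_secs: u64) -> f64 {
    let cur_start = now.saturating_sub(window_secs);
    let prev_start = cur_start.saturating_sub(window_secs);
    let current = count_in(events, cur_start, now) as f64;
    let previous = count_in(events, prev_start, cur_start) as f64;
    (current - previous) / window_secs as f64
}
```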
**Decay types:**
| Decay | Behavior |
|---|---|
| `Exponential { half_life }` | Signal weight halves every `half_life` duration |
| `Linear { lifetime }` | Signal weight drops linearly to zero over `lifetime` |
| `Permanent` | Never decays — hides, blocks, follows |
The full signal reference is in [USE_CASES.md Appendix C](USE_CASES.md#appendix-c--signal-reference).
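The `Exponential` and `Linear` decay types follow directly from their descriptions in the table. A minimal sketch of both curves, using plain seconds rather than tidalDB's `Duration` type (the function names are illustrative, not part of the API):

```rust
/// Weight remaining after `age_secs` when the weight halves every `half_life_secs`.
fn exponential_decay(weight: f64, age_secs: f64, half_life_secs: f64) -> f64 {
    weight * 0.5_f64.powf(age_secs / half_life_secs)
}

/// Weight remaining after `age_secs` when it falls linearly to zero at `lifetime_secs`.
fn linear_decay(weight: f64, age_secs: f64, lifetime_secs: f64) -> f64 {
    weight * (1.0 - age_secs / lifetime_secs).max(0.0)
}
```

With a 7-day half-life, a view recorded 14 days ago contributes a quarter of its original weight; with a 30-day linear lifetime, anything older than 30 days contributes nothing.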
### Ranking Profiles
Named, versioned scoring functions. The application says `USING PROFILE for_you`. The database executes the entire pipeline.
```rust
db.define_profile(ProfileDef {
name: "for_you",
version: 1,
candidate: Candidate::Ann {
query_vector: VectorSource::UserPreference,
index: EntityKind::Item,
top_k: 500,
},
boosts: vec![
Boost::signal("view", Window::hours(24), Velocity, 0.3),
Boost::relationship("interaction_weight", 0.2),
Boost::social_proof(0.15),
],
decay: ProfileDecay {
field: "created_at",
half_life: Duration::hours(48),
},
gates: vec![
Gate::min("completion", Window::all_time(), 0.3),
],
penalties: vec![
Penalty::signal("skip", Window::hours(24), -0.5),
],
excludes: vec![
Exclude::signal("hide"),
Exclude::relationship("blocked"),
],
diversity: DiversitySpec {
max_per_creator: Some(2),
format_mix: true,
topic_diversity: None,
},
exploration: 0.10, // 10% of results from creators the user doesn't follow
})?;
db.define_profile(ProfileDef {
name: "trending",
version: 1,
candidate: Candidate::Scan { entity: EntityKind::Item },
boosts: vec![
Boost::signal("share", Window::hours(6), Velocity, 0.5),
Boost::signal("view", Window::hours(6), Velocity, 0.3),
Boost::signal("view", Window::hours(24), UniqueRatio, 0.2), // new-user reach
],
gates: vec![
Gate::min_ratio("engagement_ratio", 0.03),
],
// No personalization. No user context. Pure velocity.
..ProfileDef::default()
})?;
db.define_profile(ProfileDef {
name: "search",
version: 1,
candidate: Candidate::Hybrid {
text_weight: 0.6,
vector_weight: 0.4,
fusion: Fusion::Rrf { k: 60 },
},
boosts: vec![
Boost::signal("completion", Window::all_time(), Value, 0.15),
Boost::signal("like", Window::all_time(), Ratio, 0.10),
],
decay: ProfileDecay {
field: "created_at",
half_life: Duration::days(90), // slow decay for evergreen content
},
diversity: DiversitySpec {
max_per_creator: Some(2),
..Default::default()
},
..ProfileDef::default()
})?;
db.define_profile(ProfileDef {
name: "following",
version: 1,
candidate: Candidate::Relationship { edge: "follows" },
boosts: vec![],
// Pure reverse chronological. Minimal algorithmic intervention.
sort: Sort::Field("created_at", Desc),
..ProfileDef::default()
})?;
```
See [ai-lookup/services/ranking-profiles.md](ai-lookup/services/ranking-profiles.md) for the full list of built-in profiles.
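To show how the pieces of a profile could interact, the sketch below scores one candidate using the `for_you` numbers: a completion gate of 0.3, a view-velocity boost of 0.3, a skip penalty of 0.5, and a 48-hour recency half-life. How tidalDB actually combines these terms is not specified here, so the additive boost, subtractive penalty, and multiplicative recency are assumptions, and `CandidateFeatures` is a hypothetical type.

```rust
/// Hypothetical per-candidate inputs; field names are illustrative.
struct CandidateFeatures {
    base_score: f64,        // e.g. similarity from candidate retrieval
    age_hours: f64,         // time since created_at
    view_velocity_24h: f64, // normalized boost input
    skip_rate_24h: f64,     // normalized penalty input
    completion_rate: f64,   // gate input
    hidden: bool,           // exclude input
}

/// Returns None when the candidate is excluded or fails a gate.
fn score(c: &CandidateFeatures) -> Option<f64> {
    // Excludes and gates remove the candidate outright.
    if c.hidden || c.completion_rate < 0.3 {
        return None;
    }
    // Assumed combination: add boosts, subtract penalties, scale by recency.
    let recency = 0.5_f64.powf(c.age_hours / 48.0);
    let boosted = c.base_score + 0.3 * c.view_velocity_24h;
    let penalized = boosted - 0.5 * c.skip_rate_24h;
    Some(penalized * recency)
}
```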
### Cohort Definitions
Cohorts are named predicates over user attributes. They define audience segments for scoped signal aggregation and trending.
```rust
db.define_cohort(CohortDef {
name: "us_young_music",
predicate: Predicate::and(vec![
Predicate::eq("locale", "en-US"),
Predicate::any("age_range", &["18-24", "25-34"]),
Predicate::any("primary_categories", &["music", "concerts"]),
]),
// Signal aggregation level — controls storage cost vs. query flexibility
aggregation: CohortAggregation::Materialized,
})?;
db.define_cohort(CohortDef {
name: "jp_casual",
predicate: Predicate::and(vec![
Predicate::eq("region", "JP"),
Predicate::eq("engagement_level", "casual"),
]),
aggregation: CohortAggregation::Materialized,
})?;
// Ad-hoc cohorts can be queried without pre-definition,
// but use query-time aggregation (slower).
```
**`CohortAggregation`:**
| Level | Behavior | Use Case |
|---|---|---|
| `Materialized` | Pre-computed per-cohort signal aggregates, updated at signal write time | High-traffic cohorts, production trending pages |
| `OnDemand` | Computed at query time by filtering signal events | Ad-hoc analysis, rare cohort combinations |
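A cohort predicate is a boolean expression over user attributes. The sketch below evaluates `eq`, `any`, and `and` against a user's attribute map, which is roughly the check a write-time or query-time membership test has to perform. `Rule`, `UserAttrs`, and `matches` are hypothetical stand-ins, not tidalDB's `Predicate` implementation.

```rust
use std::collections::HashMap;

/// Toy predicate tree mirroring `Predicate::eq`, `Predicate::any`, `Predicate::and`.
enum Rule {
    Eq(&'static str, &'static str),
    Any(&'static str, Vec<&'static str>),
    And(Vec<Rule>),
}

/// Each attribute holds one or more values (multi-value fields such as interests).
type UserAttrs = HashMap<&'static str, Vec<&'static str>>;

fn matches(rule: &Rule, user: &UserAttrs) -> bool {
    match rule {
        // The field must contain exactly this value.
        Rule::Eq(field, value) => user.get(field).map_or(false, |vs| vs.contains(value)),
        // The field must contain at least one of the accepted values.
        Rule::Any(field, values) => user
            .get(field)
            .map_or(false, |vs| vs.iter().any(|v| values.contains(v))),
        // Every sub-rule must hold.
        Rule::And(parts) => parts.iter().all(|part| matches(part, user)),
    }
}
```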
---
## Write Path
### Ingesting Entities
Items enter the system with metadata and an embedding. The application provides the embedding — tidalDB does not generate vectors.
```rust
db.write_item(WriteItem {
id: "item_abc",
creator_id: "creator_xyz",
metadata: metadata! {
"title" => "Introduction to Jazz Piano",
"description" => "A beginner's guide to jazz piano...",
"category" => "music",
"tags" => ["jazz", "piano", "tutorial", "beginner"],
"format" => "video",
"language" => "en",
"duration" => Duration::minutes(22),
"content_rating" => "G",
"status" => "published",
"availability" => "free",
"has_subtitles" => true,
"downloadable" => false,
"created_at" => Utc::now(),
},
embedding: &content_vector, // [f32; 1536] — you compute this externally
})?;
```
On commit, tidalDB:
1. Stores the item in the entity store
2. Indexes its text fields in the inverted index (BM25)
3. Inserts its embedding into the ANN index (HNSW)
4. Initializes its signal ledger (all zeros)
5. Links it to its creator entity
6. Assigns it a cold-start exploration budget
7. Makes it **immediately queryable**
```rust
db.write_creator(WriteCreator {
id: "creator_xyz",
metadata: metadata! {
"name" => "Jazz Academy",
"handle" => "jazzacademy",
"language" => "en",
"verified" => true,
},
embedding: &creator_vector, // aggregated from catalog, computed externally
})?;
db.write_user(WriteUser {
id: "user_123",
metadata: metadata! {
"locale" => "en-US",
"region" => "US",
"timezone" => "America/New_York",
"age_range" => "18-24",
"explicit_interests" => ["jazz", "piano", "music production"],
},
// Initial preference vector. If None, uses cohort-level or population-level default.
embedding: None,
})?;
```
### Updating Entities
Update metadata or embeddings on existing entities.
```rust
db.update_item("item_abc", UpdateItem {
metadata: Some(metadata! {
"status" => "archived",
}),
embedding: None, // unchanged
})?;
```
### Writing Relationships
Relationships are first-class edges between entities. Weighted, directional, traversable in queries.
```rust
db.write_relationship(Relationship {
kind: "follows",
from: ("user", "user_123"),
to: ("creator", "creator_xyz"),
weight: 1.0,
timestamp: Utc::now(),
})?;
// Block a creator — permanent, hard filter in all future queries.
db.write_relationship(Relationship {
kind: "blocked",
from: ("user", "user_123"),
to: ("creator", "creator_bad"),
weight: 1.0,
timestamp: Utc::now(),
})?;
```
### Writing Signals
Signals are how the feedback loop closes. A single signal write atomically updates:
1. The item's signal ledger (windowed aggregates, velocity, decay score)
2. The user's preference vector (shifted toward or away from the item's embedding)
3. The user-to-creator relationship weight
4. The user-to-item relationship (seen, liked, hidden, etc.)
```rust
// User viewed an item
db.signal(Signal {
kind: "view",
item: "item_abc",
user: "user_123",
timestamp: Utc::now(),
weight: 1.0,
context: None,
})?;
// User completed 94% of the video
db.signal(Signal {
kind: "completion",
item: "item_abc",
user: "user_123",
timestamp: Utc::now(),
weight: 0.94, // completion ratio
context: None,
})?;
// User liked an item
db.signal(Signal {
kind: "like",
item: "item_abc",
user: "user_123",
timestamp: Utc::now(),
weight: 1.0,
context: None,
})?;
// User skipped after 3 seconds (strong negative)
db.signal(Signal {
kind: "skip",
item: "item_xyz",
user: "user_123",
timestamp: Utc::now(),
weight: 1.0,
context: Some(json!({ "dwell_ms": 3200, "source": "autoplay" })),
})?;
// User tapped "Not interested" (permanent negative on this item)
db.signal(Signal {
kind: "hide",
item: "item_xyz",
user: "user_123",
timestamp: Utc::now(),
weight: 1.0,
context: None,
})?;
// Search click with rank context (trains query relevance)
db.signal(Signal {
kind: "search_click",
item: "item_abc",
user: "user_123",
timestamp: Utc::now(),
weight: 1.0,
context: Some(json!({
"query": "jazz piano tutorial",
"rank_at_click": 3
})),
})?;
```
The next ranking query — even 100ms later — reflects the updated state.
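The preference-vector shift described in step 2 can be pictured as a small step toward (or away from) the item's embedding. The update rule below is an assumed, generic one (linear interpolation followed by L2 normalization); the document does not state tidalDB's actual rule, and `update_preference` and `rate` are hypothetical.

```rust
/// Move `pref` toward `item` when `rate` is positive (like, completion)
/// or away from it when `rate` is negative (skip, hide), then re-normalize.
fn update_preference(pref: &mut [f32], item: &[f32], rate: f32) {
    assert_eq!(pref.len(), item.len(), "embedding dimensions must match");
    for (p, x) in pref.iter_mut().zip(item) {
        *p += rate * (x - *p);
    }
    // Keep the vector at unit length so similarity scores stay comparable.
    let norm = pref.iter().map(|v| v * v).sum::<f32>().sqrt();
    if norm > 0.0 {
        for p in pref.iter_mut() {
            *p /= norm;
        }
    }
}
```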
---
## Query Language
Three operations: **RETRIEVE** (feed generation, browse, related), **SEARCH** (text + semantic retrieval), **SUGGEST** (autocomplete).
All queries return ranked results with scores. The application renders — it never re-ranks.
### RETRIEVE — Feeds, Browse, Related
RETRIEVE generates ranked content lists. It handles personalized feeds, category browse, trending, following, related content, and every other surface described in [USE_CASES.md](USE_CASES.md).
```rust
// Personalized For You feed
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
for_user: Some("user_123"),
context: Some("feed"),
profile: "for_you",
filters: vec![
Filter::unseen(),
Filter::not_blocked(),
Filter::eq("format", "video"),
Filter::preset("duration", "short"), // under 4 minutes
],
diversity: Some(DiversitySpec {
max_per_creator: Some(2),
format_mix: true,
topic_diversity: None,
}),
exclude_ids: vec![], // previously returned items
limit: 50,
cursor: None,
..Default::default()
})?;
```
```rust
// Trending globally
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
for_user: None, // no personalization
profile: "trending",
filters: vec![],
diversity: Some(DiversitySpec {
max_per_creator: Some(1),
..Default::default()
}),
limit: 25,
..Default::default()
})?;
```
```rust
// Trending in a category
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
profile: "trending",
filters: vec![
Filter::eq("category", "jazz"),
],
diversity: Some(DiversitySpec {
max_per_creator: Some(1),
..Default::default()
}),
limit: 25,
..Default::default()
})?;
```
```rust
// Trending among people I follow (social graph scoped)
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
for_user: Some("user_123"),
profile: "trending",
filters: vec![
Filter::social_graph("user_123", Depth::Two),
],
limit: 25,
..Default::default()
})?;
```
```rust
// Trending within a cohort — what's hot among US young music fans
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
profile: "trending",
cohort: Some("us_young_music"), // scopes signal aggregation
filters: vec![],
diversity: Some(DiversitySpec {
max_per_creator: Some(1),
..Default::default()
}),
limit: 25,
..Default::default()
})?;
```
```rust
// Ad-hoc cohort (not pre-defined) — slower but flexible
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
profile: "trending",
cohort_predicate: Some(Predicate::and(vec![
Predicate::eq("region", "DE"),
Predicate::any("primary_categories", &["cooking", "food"]),
])),
limit: 25,
..Default::default()
})?;
```
```rust
// Following feed — pure chronological
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
for_user: Some("user_123"),
profile: "following",
filters: vec![
Filter::relationship("follows"),
Filter::unseen(),
],
limit: 50,
cursor: Some(cursor_from_last_batch),
..Default::default()
})?;
```
```rust
// Related content / Up Next — anchored to a specific item
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
for_user: Some("user_123"),
profile: "related",
similar_to: Some("item_abc"), // anchor item
filters: vec![
Filter::unseen(),
],
diversity: Some(DiversitySpec {
max_per_creator: Some(1), // at most one item per creator in the result set
..Default::default()
}),
limit: 10,
..Default::default()
})?;
```
```rust
// Browse category with explicit sort mode
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
profile: "browse",
sort: Some(Sort::TopWeek), // override profile's default sort
filters: vec![
Filter::eq("category", "jazz"),
Filter::preset("duration", "short"),
Filter::eq("has_subtitles", true),
],
limit: 20,
..Default::default()
})?;
```
```rust
// Hidden gems — high quality, low reach
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
profile: "hidden_gems",
filters: vec![
Filter::created_within(Duration::days(30)),
],
limit: 20,
..Default::default()
})?;
```
```rust
// User's saved items (watch later), sorted by when they were saved
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
for_user: Some("user_123"),
sort: Some(Sort::DateSaved),
filters: vec![
Filter::user_state("saved"),
],
limit: 20,
..Default::default()
})?;
```
```rust
// Creator discovery — find creators like another creator
let results = db.retrieve(Retrieve {
entity: EntityKind::Creator,
for_user: Some("user_123"),
similar_to: Some("creator_xyz"),
sort: Some(Sort::CreatorEngagementRate),
limit: 10,
..Default::default()
})?;
```
```rust
// Live content — what's live right now that this user cares about
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
for_user: Some("user_123"),
profile: "live",
filters: vec![
Filter::eq("status", "live"),
],
sort: Some(Sort::LiveViewerCount),
limit: 20,
..Default::default()
})?;
```
```rust
// Notification prioritization
let results = db.retrieve(Retrieve {
entity: EntityKind::Item,
for_user: Some("user_123"),
profile: "notification",
filters: vec![
Filter::since(last_seen_timestamp),
],
limit: 20,
..Default::default()
})?;
```
### SEARCH — Text + Semantic Retrieval
Search combines full-text BM25 relevance with semantic similarity. Text relevance is the floor — an irrelevant result never surfaces just because the user likes the creator.
```rust
// Basic keyword search, personalized for this user
let results = db.search(Search {
query: "rust tutorial beginner",
vector: Some(&query_embedding), // embed the query externally, pass it in
for_user: Some("user_123"),
profile: "search",
filters: vec![],
diversity: Some(DiversitySpec {
max_per_creator: Some(2),
..Default::default()
}),
limit: 20,
..Default::default()
})?;
```
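The `search` profile fuses the BM25 and vector result lists with `Fusion::Rrf { k: 60 }`. The sketch below shows weighted reciprocal rank fusion: each list contributes `weight / (k + rank)` per document, with ranks starting at 1. Applying `text_weight` and `vector_weight` as multipliers on the RRF terms is an assumption about how the profile's weights are used, and `rrf` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Fuse two ranked ID lists (best first) into one list sorted by fused score.
fn rrf(text: &[&str], vector: &[&str], text_w: f64, vector_w: f64, k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (w, list) in [(text_w, text), (vector_w, vector)] {
        for (i, id) in list.iter().enumerate() {
            // Ranks are 1-based, so the top result contributes w / (k + 1).
            *scores.entry(id.to_string()).or_insert(0.0) += w / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; break ties by ID for a deterministic order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

A document that appears in both lists accumulates two terms, which is why an item ranked second by text and first by vector can outrank the top text-only hit.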
**Query syntax** — users can type these directly. The database parses them.
| Syntax | Meaning |
|---|---|
| `jazz piano tutorial` | Match any of these terms (OR), rank by relevance |
| `"jazz piano"` | Exact phrase match |
| `jazz AND piano NOT beginner` | Boolean operators |
| `jazz -beginner` | Exclude term |
| `jazz pian*` | Wildcard prefix |
| `title:jazz` | Field-scoped search |
| `tag:tutorial` | Search within specific field |
| `creator:jazzacademy` | Search by creator handle |
| `#jazz` | Hashtag match |
```rust
// Exact phrase with date filter
let results = db.search(Search {
query: "\"machine learning\" fundamentals",
filters: vec![
Filter::created_within(Duration::days(30)),
Filter::eq("format", "video"),
Filter::preset("duration", "long"),
],
limit: 20,
..Default::default()
})?;
```
```rust
// Semantic-only search (no text query, just a vector)
let results = db.search(Search {
query: "",
vector: Some(&image_embedding), // visual similarity search
for_user: Some("user_123"),
profile: "search",
limit: 20,
..Default::default()
})?;
```
```rust
// People/creator search
let results = db.search(Search {
query: "jazz piano",
entity: EntityKind::Creator,
filters: vec![
Filter::eq("verified", true),
Filter::min("follower_count", 1000),
],
sort: Some(Sort::CreatorEngagementRate),
limit: 10,
..Default::default()
})?;
```
### Query Composition — SEARCH within Scoped Results
SEARCH can be composed with RETRIEVE scopes. This enables searching within trending, within a cohort, or within any candidate set.
```rust
// Search within globally trending items
let results = db.search(Search {
query: "jazz piano",
vector: Some(&query_embedding),
within: Some(WithinScope::Trending {
window: Duration::hours(24),
}),
profile: "search",
limit: 20,
..Default::default()
})?;
```
```rust
// Search within cohort-scoped trending — the full three-layer query
let results = db.search(Search {
query: "jazz piano",
vector: Some(&query_embedding),
within: Some(WithinScope::CohortTrending {
cohort: "us_young_music",
window: Duration::hours(24),
}),
profile: "search",
limit: 20,
..Default::default()
})?;
```
```rust
// Search within a user's following feed
let results = db.search(Search {
query: "jazz piano",
for_user: Some("user_123"),
within: Some(WithinScope::Following),
profile: "search",
limit: 20,
..Default::default()
})?;
```
**`WithinScope`:**
| Scope | Candidate Set |
|---|---|
| `Trending { window }` | Items with high global velocity in window |
| `CohortTrending { cohort, window }` | Items with high velocity among cohort members |
| `Following` | Items from followed creators (requires for_user) |
| `Category { name }` | Items in a category |
| `Collection { id }` | Items in a collection |
### SUGGEST — Autocomplete and Suggestions
```rust
// Autocomplete on partial query
let suggestions = db.suggest(Suggest {
prefix: "jazz pia",
for_user: Some("user_123"),
limit: 5,
})?;
// Returns: ["jazz piano", "jazz piano tutorial", "jazz piano chords", ...]
// Trending searches (empty prefix)
let trending = db.suggest(Suggest {
prefix: "",
for_user: None,
limit: 10,
})?;
```
---
## Filters
All filters are composable. Any combination of filters produces a valid, efficiently executed query. Multi-select within a dimension uses OR logic; filters across dimensions combine with AND logic.
### Content Attribute Filters
```rust
Filter::eq("category", "jazz") // exact match
Filter::any("category", &["jazz", "blues"]) // OR within dimension
Filter::eq("format", "video")
Filter::eq("language", "en")
Filter::eq("content_rating", "PG")
Filter::eq("status", "published")
Filter::eq("availability", "free")
Filter::eq("has_subtitles", true)
Filter::eq("downloadable", true)
Filter::eq("original_only", true) // exclude crossposts
```
### Duration Filters
```rust
Filter::preset("duration", "short") // under 4 minutes
Filter::preset("duration", "medium") // 4-20 minutes
Filter::preset("duration", "long") // over 20 minutes
Filter::range("duration", 5 * 60..15 * 60) // custom range (seconds)
```
### Date / Time Filters
```rust
Filter::created_within(Duration::days(7))
Filter::created_preset("today") // today, week, month, year
Filter::created_after("2025-01-01T00:00:00Z")
Filter::created_before("2025-06-01T00:00:00Z")
Filter::since(timestamp) // for notifications
```
### Creator Filters
```rust
Filter::eq("creator", "creator_xyz")
Filter::exclude_creator("creator_bad")
Filter::eq("creator_verified", true)
Filter::creator_followed_by_user() // requires for_user
Filter::creator_new_to_user() // never seen this creator
Filter::min("creator_min_followers", 1000)
Filter::max("creator_max_followers", 50000)
```
### Engagement Threshold Filters
```rust
Filter::min("min_views", 10000)
Filter::max("max_views", 5000) // for hidden gems
Filter::min("min_likes", 100)
Filter::min("min_like_ratio", 0.85)
Filter::min("min_completion_rate", 0.5)
Filter::min("min_comments", 50)
```
### User State Filters
```rust
Filter::unseen() // not yet viewed by this user
Filter::user_state("seen") // already viewed
Filter::user_state("in_progress") // partially watched
Filter::user_state("saved") // bookmarked / watch later
Filter::user_state("liked") // user has liked
Filter::user_state("downloaded") // available offline
Filter::in_collection("playlist_abc") // in specific collection
Filter::not_blocked() // exclude blocked creators
```
### Geographic Filters
```rust
Filter::eq("content_region", "US")
Filter::eq("trending_in_region", "US")
Filter::near_location(lat, lng, radius_km)
```
### Cohort Filters
```rust
Filter::cohort("us_young_music") // pre-defined cohort
Filter::cohort_predicate(Predicate::and(vec![ // ad-hoc cohort
Predicate::eq("locale", "en-US"),
Predicate::eq("age_range", "18-24"),
]))
```
### Social Graph Filters
```rust
Filter::relationship("follows") // from followed creators only
Filter::social_graph("user_123", Depth::Two) // engaged by user's follows
```
See [USE_CASES.md Appendix A](USE_CASES.md#appendix-a--filter-reference) for the complete filter reference.
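The composition rule for this section (OR within a dimension, AND across dimensions) can be sketched over plain maps. This is a simplified model with single-valued string attributes; `passes` is a hypothetical helper, not how tidalDB evaluates filters internally.

```rust
use std::collections::HashMap;

/// `filters` maps each dimension to its accepted values.
/// An item passes if, for every dimension, its value is one of the accepted ones.
fn passes<'a>(
    item: &HashMap<&'a str, &'a str>,
    filters: &HashMap<&'a str, Vec<&'a str>>,
) -> bool {
    filters.iter().all(|(dim, accepted)| {
        // OR within the dimension: any accepted value matches.
        item.get(dim).map_or(false, |v| accepted.contains(v))
    })
}
```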
---
## Sort Modes
Sort modes are built-in. The application names a mode. The database executes it. No application-side sorting logic.
Sort can be specified as an override on any RETRIEVE or SEARCH query using the `sort` field, or it can be embedded in a ranking profile.
```rust
Sort::Relevance // text + semantic match (search only)
Sort::Personalized // user preference match
Sort::New // created_at DESC
Sort::Old // created_at ASC
Sort::Hot // score / (age + 2)^gravity (Hacker News-style model)
Sort::Trending // pure engagement velocity
Sort::Rising // velocity relative to baseline, age-boosted
Sort::Controversial // max(positive * negative signals)
Sort::HiddenGems // high quality, low reach
Sort::TopAllTime // cumulative quality, no decay
Sort::TopHour
Sort::TopToday
Sort::TopWeek
Sort::TopMonth
Sort::TopYear
Sort::MostViewed
Sort::MostLiked
Sort::MostCommented
Sort::MostShared
Sort::Shortest // duration ASC
Sort::Longest // duration DESC
Sort::AlphabeticalAsc // title A-Z
Sort::AlphabeticalDesc // title Z-A
Sort::Shuffle // random, quality-weighted
Sort::LiveViewerCount // current viewer count DESC
Sort::DateSaved // when user bookmarked DESC
Sort::CreatorEngagementRate
```
See [USE_CASES.md Appendix B](USE_CASES.md#appendix-b--sort-mode-reference) for the complete sort mode reference.
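The formula in the `Sort::Hot` comment is easy to evaluate by hand. A minimal sketch, assuming age is measured in hours and `gravity` is a tunable exponent (neither unit nor default is stated in this document; `hot_score` is an illustrative name):

```rust
/// score / (age + 2)^gravity: older items need proportionally more score to rank.
fn hot_score(score: f64, age_hours: f64, gravity: f64) -> f64 {
    score / (age_hours + 2.0).powf(gravity)
}
```

The `+ 2` keeps brand-new items from dividing by a near-zero age, and a larger `gravity` makes the ranking turn over faster.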
---
## Diversity Constraints
Diversity is a post-scoring pass. After candidates are scored, diversity constraints reorder the result set to enforce variety — without reducing the result count.
```rust
DiversitySpec {
// No more than N items from the same creator in the result set.
max_per_creator: Some(2),
// Ensure a mix of content formats (video, short, article, etc.)
format_mix: true,
// Topic diversity — 0.0 (no enforcement) to 1.0 (maximize diversity).
// Uses maximal marginal relevance (MMR).
topic_diversity: Some(0.7),
}
```
Diversity is specified per query or per ranking profile. Query-level diversity overrides the profile default.
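One way to picture the `max_per_creator` pass is a greedy walk over the scored candidates that skips any item whose creator has already reached the cap, continuing until `limit` results are collected. This is a sketch under that assumption; tidalDB's actual diversity pass (including `format_mix` and MMR-based `topic_diversity`) is not shown, and `select_diverse` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// `ranked` holds (item_id, creator_id) pairs sorted by score, best first.
fn select_diverse<'a>(
    ranked: &[(&'a str, &'a str)],
    max_per_creator: usize,
    limit: usize,
) -> Vec<&'a str> {
    let mut per_creator: HashMap<&str, usize> = HashMap::new();
    let mut out = Vec::with_capacity(limit);
    for &(item, creator) in ranked {
        if out.len() == limit {
            break;
        }
        let count = per_creator.entry(creator).or_insert(0);
        if *count < max_per_creator {
            *count += 1;
            out.push(item);
        }
        // Over-cap items are skipped; lower-ranked candidates backfill the page.
    }
    out
}
```

Because skipped items are replaced by the next eligible candidates, the page still fills to `limit` as long as the candidate pool is large enough.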
---
## Pagination
Cursor-based pagination for stable result sets across pages.
```rust
// First page
let page1 = db.retrieve(Retrieve {
profile: "for_you",
for_user: Some("user_123"),
limit: 50,
cursor: None,
..Default::default()
})?;
// Next page — pass the cursor from the previous response
let page2 = db.retrieve(Retrieve {
profile: "for_you",
for_user: Some("user_123"),
limit: 50,
cursor: page1.next_cursor,
..Default::default()
})?;
```
Alternatively, use `exclude_ids` to exclude previously returned items:
```rust
let page2 = db.retrieve(Retrieve {
profile: "for_you",
for_user: Some("user_123"),
exclude_ids: page1.results.iter().map(|r| r.id.clone()).collect(),
limit: 50,
..Default::default()
})?;
```
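The contents of tidalDB's `Cursor` are not documented here. As a rough mental model, a keyset cursor remembers the last returned (score, id) pair and resumes strictly after it, which keeps pages stable even if earlier items are removed. The sketch below uses a hypothetical `PageCursor` and `page` function over an in-memory list.

```rust
/// Hypothetical keyset cursor: position of the last item already returned.
#[derive(Clone, Debug, PartialEq)]
struct PageCursor {
    score: f64,
    id: String,
}

/// `ranked` is sorted by score descending, then by ID ascending.
fn page(
    ranked: &[(String, f64)],
    cursor: Option<&PageCursor>,
    limit: usize,
) -> (Vec<(String, f64)>, Option<PageCursor>) {
    // Find the first item that sorts strictly after the cursor.
    let start = match cursor {
        None => 0,
        Some(c) => ranked
            .iter()
            .position(|(id, s)| *s < c.score || (*s == c.score && *id > c.id))
            .unwrap_or(ranked.len()),
    };
    let items: Vec<(String, f64)> = ranked[start..].iter().take(limit).cloned().collect();
    // Only hand back a cursor when more items remain.
    let next = if start + items.len() < ranked.len() {
        items.last().map(|(id, s)| PageCursor { score: *s, id: id.clone() })
    } else {
        None
    };
    (items, next)
}
```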
---
## Response Format
All queries return a `Results` struct.
```rust
use std::collections::HashMap;

pub struct Results {
    /// Ranked items with scores.
    pub results: Vec<RankedItem>,
    /// Cursor for fetching the next page.
    pub next_cursor: Option<Cursor>,
    /// Total candidate count before diversity/limit (useful for faceted UI).
    pub total_candidates: u64,
}

pub struct RankedItem {
    /// Entity ID.
    pub id: String,
    /// Final composite score after ranking profile, boosts, penalties, diversity.
    pub score: f64,
    /// Signal snapshot at query time, so the application can display counts.
    pub signals: SignalSnapshot,
}

pub struct SignalSnapshot {
    /// Raw signal values at query time, keyed by signal name, then window.
    /// e.g., { "view": { "24h": 12450, "7d": 89200, "all_time": 1230000 } }
    pub values: HashMap<String, HashMap<String, f64>>,
}
```
The application uses `results` to render the UI. It uses `signals` to display engagement counts (views, likes, etc.). It never re-ranks — the order from tidalDB is the final order.
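Reading a count out of `SignalSnapshot.values` is a two-level map lookup. A minimal sketch, assuming window keys are the strings shown in the example comment (`"24h"`, `"7d"`, `"all_time"`); `signal_value` is a hypothetical helper, not part of the API.

```rust
use std::collections::HashMap;

/// Look up one signal value by signal name and window key.
/// Returns None if either the signal or the window is absent.
fn signal_value(
    values: &HashMap<String, HashMap<String, f64>>,
    signal: &str,
    window: &str,
) -> Option<f64> {
    values.get(signal)?.get(window).copied()
}
```

Returning `Option` lets the UI distinguish "no data for this window" from a genuine zero.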
---
## Lifecycle and Operations
### Shutdown
```rust
// Graceful shutdown — flushes WAL, finalizes materializer, persists ANN index.
db.shutdown()?;
```
### Profile Management
```rust
// List all profiles
let profiles = db.list_profiles()?;
// Get a specific profile (with version)
let profile = db.get_profile("for_you", None)?; // latest
let profile = db.get_profile("for_you", Some(1))?; // specific version
// A/B test by querying a different profile for the variant group
let control = db.retrieve(Retrieve {
profile: "for_you", // uses latest version
..query.clone()
})?;
let variant = db.retrieve(Retrieve {
profile: "for_you_v2", // experimental profile
..query.clone()
})?;
```
### Schema Inspection
```rust
// List defined signals
let signals = db.list_signals()?;
// List entity types with their fields
let entities = db.list_entities()?;
```
### Saved Searches
```rust
// Save a search as a persistent feed — user gets new results over time.
db.save_search(SavedSearch {
user: "user_123",
name: "Jazz tutorials",
query: "jazz tutorial",
filters: vec![
Filter::eq("format", "video"),
Filter::eq("language", "en"),
],
})?;
// Query a saved search for new results since last check.
let results = db.retrieve_saved_search("user_123", "Jazz tutorials", since)?;
```
### Collections
```rust
// Create a user collection (playlist, board, etc.)
db.create_collection(Collection {
id: "playlist_abc",
owner: "user_123",
name: "Jazz Favorites",
visibility: Visibility::Private, // Private, Shared, Public
})?;
// Add an item to a collection
db.add_to_collection("playlist_abc", "item_abc")?;
// Items can belong to multiple collections.
// Public collections are themselves rankable — they appear in browse and search.
```
---
## Summary
| Operation | What the Application Does | What tidalDB Does |
|---|---|---|
| **Ingest content** | Compute embedding, call `write_item` | Index text, insert vector, initialize signals, apply cold start |
| **Record engagement** | Call `signal` with event type | Update signal ledger, user preferences, relationship weights — atomically |
| **Serve a feed** | Call `retrieve` with a profile name | Candidate retrieval, scoring, diversity enforcement, pagination |
| **Search** | Embed query, call `search` | BM25 + ANN + fusion + personalization + diversity |
| **Define ranking** | Declare a `ProfileDef` | The database executes the entire ranking pipeline |
| **Handle cold start** | Nothing | Exploration budget, population priors — automatic |
| **Handle negative signals** | Call `signal` with skip/hide; call `write_relationship` to block | Permanent exclusion, preference decay, relationship zeroing |
| **Scope trending by cohort** | Specify cohort name or predicate in query | Cohort-scoped signal aggregation, same ranking profile |
| **Search within scope** | Specify `within` on search query | Intersects text/vector retrieval with scoped candidate set |
One process. One query interface. One operational model.