2026-02-23 22:41:16 -07:00


# API Reference
> **Quick API Reference:** The examples below reflect the current implementation API. Use `cargo doc --manifest-path tidal/Cargo.toml --open` for full documentation.

How developers interact with tidalDB. This document covers initialization, schema definition, write operations, queries, and the feedback loop.

tidalDB is an embeddable Rust library. You link it into your process. There is no separate server, no network protocol, no client SDK. The API is Rust types and method calls.

---
## Table of Contents
- [Initialization](#initialization)
- [Schema Definition](#schema-definition)
  - [Entity Types](#entity-types)
  - [Signal Definitions](#signal-definitions)
  - [Ranking Profiles](#ranking-profiles)
  - [Cohort Definitions](#cohort-definitions)
- [Write Path](#write-path)
  - [Ingesting Entities](#ingesting-entities)
  - [Writing Embeddings](#writing-embeddings)
  - [Writing Relationships](#writing-relationships)
  - [Writing Signals](#writing-signals)
- [Query Language](#query-language)
  - [RETRIEVE -- Feeds, Browse, Related](#retrieve--feeds-browse-related)
  - [SEARCH -- Text + Semantic Retrieval](#search--text--semantic-retrieval)
  - [Query Composition -- SEARCH within Scoped Results](#query-composition--search-within-scoped-results)
  - [SUGGEST -- Autocomplete and Suggestions](#suggest--autocomplete-and-suggestions)
- [Filters](#filters)
- [Sort Modes](#sort-modes)
- [Diversity Constraints](#diversity-constraints)
- [Pagination](#pagination)
- [Response Format](#response-format)
- [Lifecycle and Operations](#lifecycle-and-operations)
- [Summary](#summary)
---
## Initialization
Open a database using the builder pattern. Define the schema first, then pass it to the builder.
```rust
use tidaldb::TidalDb;
use tidaldb::schema::{SchemaBuilder, EntityKind, DecaySpec, Window};
use std::time::Duration;
// 1. Define the schema (signal types, text fields, embedding slots).
let mut schema = SchemaBuilder::new();
let _ = schema.signal("view", EntityKind::Item, DecaySpec::Exponential {
half_life: Duration::from_secs(7 * 24 * 3600),
})
.windows(&[Window::OneHour, Window::TwentyFourHours, Window::AllTime])
.velocity(true)
.add();
let schema = schema.build().expect("valid schema");
// 2a. Ephemeral (in-memory) -- no filesystem access, ideal for testing.
let db = TidalDb::builder()
.ephemeral()
.with_schema(schema.clone())
.open()?;
// 2b. Persistent -- durable storage at the given path.
let db = TidalDb::builder()
.with_data_dir("/var/lib/tidaldb/my_app")
.with_schema(schema)
.open()?;
```
The database is `Send + Sync`. Share it across threads with `Arc<TidalDb>`.

---
## Schema Definition
Schema is defined before opening the database using `SchemaBuilder`. It declares signal types, text fields for full-text search, and embedding slots for vector search.
### Entity Types
Entities are the nodes of the system. Three built-in types: **Item**, **User**, **Creator**. Entity metadata is stored as `HashMap<String, String>` key-value pairs.
### Signal Definitions
Signals are typed, timestamped event streams. Decay, velocity, and windowed aggregation are declared in schema -- not computed in application code.
```rust
use tidaldb::schema::{SchemaBuilder, EntityKind, DecaySpec, Window, TextFieldType};
use std::time::Duration;
let mut schema = SchemaBuilder::new();
// View signal: exponential decay, 7-day half-life, three windows + velocity.
let _ = schema.signal("view", EntityKind::Item, DecaySpec::Exponential {
half_life: Duration::from_secs(7 * 24 * 3600),
})
.windows(&[Window::OneHour, Window::TwentyFourHours, Window::SevenDays, Window::AllTime])
.velocity(true)
.add();
// Like signal: slower decay (14 days).
let _ = schema.signal("like", EntityKind::Item, DecaySpec::Exponential {
half_life: Duration::from_secs(14 * 24 * 3600),
})
.windows(&[Window::TwentyFourHours, Window::SevenDays, Window::AllTime])
.velocity(true)
.add();
// Skip signal: fast decay (1 day), no velocity.
let _ = schema.signal("skip", EntityKind::Item, DecaySpec::Exponential {
half_life: Duration::from_secs(24 * 3600),
})
.windows(&[Window::OneHour, Window::TwentyFourHours])
.velocity(false)
.add();
// Hide signal: permanent (never decays), no windows.
let _ = schema.signal("hide", EntityKind::Item, DecaySpec::Permanent).add();
// Share signal: for trending and social features.
let _ = schema.signal("share", EntityKind::Item, DecaySpec::Exponential {
half_life: Duration::from_secs(3 * 24 * 3600),
})
.windows(&[Window::OneHour, Window::TwentyFourHours, Window::AllTime])
.velocity(true)
.add();
// Completion signal: long-lived quality metric.
let _ = schema.signal("completion", EntityKind::Item, DecaySpec::Exponential {
half_life: Duration::from_secs(30 * 24 * 3600),
})
.windows(&[Window::AllTime])
.velocity(false)
.add();
// Text fields for BM25 full-text search.
schema.text_field("title", TextFieldType::Text);
schema.text_field("description", TextFieldType::Text);
schema.text_field("category", TextFieldType::Keyword);
schema.text_field("tags", TextFieldType::Keyword);
// Creator text fields for creator search.
schema.creator_text_field("name", TextFieldType::Text);
schema.creator_text_field("handle", TextFieldType::Keyword);
// Embedding slots for vector search (you provide the vectors).
schema.embedding_slot("content", EntityKind::Item, 128);
schema.embedding_slot("content", EntityKind::Creator, 128);
let schema = schema.build()?;
```
**Decay types:**
| Decay | Behavior |
|---|---|
| `Exponential { half_life }` | Signal weight halves every `half_life` duration |
| `Linear { lifetime }` | Signal weight drops linearly to zero over `lifetime` |
| `Permanent` | Never decays -- hides, blocks, follows |

The full signal reference is in [USE_CASES.md Appendix C](USE_CASES.md#appendix-c--signal-reference).
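
The three decay rules in the table translate into a weight multiplier applied to a signal once it has aged. The standalone sketch below illustrates those formulas; it is not tidalDB's implementation. `Decay`, `decay_factor`, and `decayed_weight` are hypothetical names, and clamping linear decay at zero once `lifetime` has passed is an assumption.
```rust
use std::time::Duration;

/// Hypothetical stand-in for the decay variants in the table above.
#[derive(Clone, Copy, Debug)]
enum Decay {
    Exponential { half_life: Duration },
    Linear { lifetime: Duration },
    Permanent,
}

/// Multiplier in [0.0, 1.0] applied to a signal of the given age.
fn decay_factor(decay: Decay, age: Duration) -> f64 {
    match decay {
        // Weight halves once per elapsed half-life: 0.5^(age / half_life).
        Decay::Exponential { half_life } => {
            0.5_f64.powf(age.as_secs_f64() / half_life.as_secs_f64())
        }
        // Weight falls in a straight line and stays at zero after `lifetime` (assumed clamp).
        Decay::Linear { lifetime } => {
            (1.0 - age.as_secs_f64() / lifetime.as_secs_f64()).max(0.0)
        }
        // Hides, blocks, and follows keep their full weight forever.
        Decay::Permanent => 1.0,
    }
}

/// Decayed value of one signal event written with `weight`.
fn decayed_weight(decay: Decay, weight: f64, age: Duration) -> f64 {
    weight * decay_factor(decay, age)
}
```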
### Ranking Profiles
tidalDB ships 25 built-in ranking profiles. The application says `profile("trending")`. The database executes the entire pipeline.
Built-in profiles include: `trending`, `hot`, `new`, `for_you`, `following`, `related`, `notification`, `search`, `top_week`, `top_month`, `top_all_time`, `hidden_gems`, `controversial`, `most_viewed`, `most_liked`, `shuffle`, `cohort_trending`, `live`, `alphabetical_asc`, `alphabetical_desc`, `shortest`, `longest`, `most_commented`, `most_shared`, `date_saved`.
See [ai-lookup/services/ranking-profiles.md](ai-lookup/services/ranking-profiles.md) for the full list of built-in profiles.
### Cohort Definitions
Cohorts are named predicates over user attributes. They define audience segments for scoped signal aggregation and trending.
```rust
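use tidaldb::query::retrieve::Retrieve;
// Define the cohort via db.define_cohort() after opening.
// Cohort signal aggregation then happens at signal write time.
// Scope queries to the cohort by name:
let query = Retrieve::builder()
    .profile("cohort_trending")
    .cohort("us_young_music")
    .limit(25)
    .build()?;
let results = db.retrieve(&query)?;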
```
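Think of a cohort as a named predicate over user attributes plus per-cohort counters that are updated when a member's signal is written. That is why cohort-scoped trending can be served without scanning users at query time. The sketch below illustrates the idea only. `Cohort`, `CohortCounters`, and the attribute keys in the usage are hypothetical and do not describe tidalDB's `define_cohort` API.
```rust
use std::collections::HashMap;

/// User attributes as key-value pairs, mirroring entity metadata.
type Attrs = HashMap<String, String>;

/// Hypothetical cohort: a name plus a predicate over user attributes.
struct Cohort {
    name: String,
    predicate: Box<dyn Fn(&Attrs) -> bool>,
}

/// Per-cohort, per-item signal totals, updated at write time.
#[derive(Default)]
struct CohortCounters {
    totals: HashMap<(String, u64), f64>,
}

impl CohortCounters {
    /// Attribute one signal write to every cohort the user belongs to.
    fn record(&mut self, cohorts: &[Cohort], user: &Attrs, item: u64, weight: f64) {
        for cohort in cohorts.iter().filter(|c| (c.predicate)(user)) {
            *self.totals.entry((cohort.name.clone(), item)).or_insert(0.0) += weight;
        }
    }

    /// Cohort-scoped total for an item (0.0 if no member has signaled it).
    fn total(&self, cohort: &str, item: u64) -> f64 {
        self.totals
            .get(&(cohort.to_string(), item))
            .copied()
            .unwrap_or(0.0)
    }
}
```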
---
## Write Path
### Ingesting Entities
Items enter the system with metadata as `HashMap<String, String>` key-value pairs. The application provides the embedding -- tidalDB does not generate vectors.
```rust
use std::collections::HashMap;
use tidaldb::schema::EntityId;
let mut metadata = HashMap::new();
metadata.insert("title".to_string(), "Introduction to Jazz Piano".to_string());
metadata.insert("description".to_string(), "A beginner's guide...".to_string());
metadata.insert("category".to_string(), "music".to_string());
metadata.insert("tags".to_string(), "jazz,piano,tutorial,beginner".to_string());
metadata.insert("format".to_string(), "video".to_string());
metadata.insert("language".to_string(), "en".to_string());
metadata.insert("duration".to_string(), "1320".to_string()); // seconds
metadata.insert("creator_id".to_string(), "100".to_string());
db.write_item_with_metadata(EntityId::new(1), &metadata)?;
```
On commit, tidalDB:
1. Stores the item in the entity store
2. Indexes its text fields in the inverted index (BM25)
3. Inserts it into the bitmap and range indexes used for filtering
4. Adds it to the universe bitmap used by RETRIEVE queries

The item is then **immediately queryable**.
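
Steps 3 and 4 are what make filtering cheap: each keyword value maps to the set of item IDs that carry it, and every written item joins a universe set. The sketch below models that with `BTreeSet<u64>` in place of real bitmaps. It assumes, as the `tags` example suggests, that a multi-valued keyword field is stored as a comma-separated string. `KeywordIndex` and its methods are hypothetical, not tidalDB types.
```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical keyword index: (field, value) -> item IDs, plus a universe set.
#[derive(Default)]
struct KeywordIndex {
    postings: HashMap<(String, String), BTreeSet<u64>>,
    universe: BTreeSet<u64>,
}

impl KeywordIndex {
    /// Index an item's keyword fields; comma-separated values are split (assumed).
    fn insert(&mut self, id: u64, metadata: &HashMap<String, String>, keyword_fields: &[&str]) {
        for &field in keyword_fields {
            if let Some(raw) = metadata.get(field) {
                for value in raw.split(',').map(str::trim).filter(|v| !v.is_empty()) {
                    self.postings
                        .entry((field.to_string(), value.to_string()))
                        .or_default()
                        .insert(id);
                }
            }
        }
        // Every written item joins the universe set.
        self.universe.insert(id);
    }

    /// Items whose `field` contains exactly `value` (an `eq`-style filter).
    fn eq(&self, field: &str, value: &str) -> BTreeSet<u64> {
        self.postings
            .get(&(field.to_string(), value.to_string()))
            .cloned()
            .unwrap_or_default()
    }
}
```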
### Writing Embeddings
Embeddings are written separately from metadata. tidalDB L2-normalizes and indexes them into the HNSW vector index.
```rust
use tidaldb::schema::EntityId;
// Item embedding. `compute_embedding` stands in for your own model; the vector
// length should match the slot dimension declared in the schema (128 above).
let embedding: Vec<f32> = compute_embedding("Introduction to Jazz Piano");
db.write_item_embedding(EntityId::new(1), &embedding)?;
// Creator embedding (`compute_creator_embedding` is likewise your own code).
let creator_embedding: Vec<f32> = compute_creator_embedding("Jazz Academy");
db.write_creator_embedding(EntityId::new(100), &creator_embedding)?;
```
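L2 normalization rescales a vector to unit length, so the cosine similarity between two stored vectors reduces to a plain dot product. The functions below are a standalone sketch of that math, not tidalDB calls. How tidalDB treats an all-zero vector is not specified in this document; this sketch simply returns `None`.
```rust
/// Scale `v` to unit L2 norm; `None` for an all-zero vector.
fn l2_normalize(v: &[f32]) -> Option<Vec<f32>> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return None;
    }
    Some(v.iter().map(|x| x / norm).collect())
}

/// Dot product; equals cosine similarity when both inputs are unit vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```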
### Writing Relationships
Relationships are directional edges between entities (follows, blocks). Used for the `following` profile and blocked-creator filtering.
```rust
use tidaldb::schema::EntityId;
use tidaldb::schema::Timestamp;
// User follows a creator.
db.write_relationship(
EntityId::new(123), // user
EntityId::new(100), // creator
"follows",
1.0, // weight
Timestamp::now(),
)?;
```
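The `following` profile and blocked-creator filtering both reduce to lookups on directional edges: whom the user follows or blocks, and which creator each item belongs to (the `creator_id` metadata key from the ingestion example). The sketch below models that with plain maps. `EdgeStore` and `following_candidates` are hypothetical names, and it does not show the ranking that tidalDB applies on top of the candidate set.
```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical directed edge store: (from, relation) -> {to: weight}.
#[derive(Default)]
struct EdgeStore {
    edges: HashMap<(u64, String), HashMap<u64, f64>>,
}

impl EdgeStore {
    /// Directional: writing user -> creator does not imply creator -> user.
    fn write(&mut self, from: u64, to: u64, relation: &str, weight: f64) {
        self.edges
            .entry((from, relation.to_string()))
            .or_default()
            .insert(to, weight);
    }

    /// All targets of `from` under `relation`.
    fn targets(&self, from: u64, relation: &str) -> BTreeSet<u64> {
        self.edges
            .get(&(from, relation.to_string()))
            .map(|m| m.keys().copied().collect())
            .unwrap_or_default()
    }
}

/// Items whose creator the user follows, minus items from blocked creators.
fn following_candidates(store: &EdgeStore, user: u64, item_creator: &HashMap<u64, u64>) -> BTreeSet<u64> {
    let followed = store.targets(user, "follows");
    let blocked = store.targets(user, "blocks");
    item_creator
        .iter()
        .filter(|&(_, &c)| followed.contains(&c) && !blocked.contains(&c))
        .map(|(&item, _)| item)
        .collect()
}
```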
### Writing Signals
Signals are how the feedback loop closes. A single signal write atomically updates:
1. The item's signal ledger (windowed aggregates, velocity, decay score)
2. The WAL (write-ahead log) for durability
```rust
use tidaldb::schema::{EntityId, Timestamp};
// User viewed an item.
db.signal("view", EntityId::new(1), 1.0, Timestamp::now())?;
// User completed 94% of the video.
db.signal("completion", EntityId::new(1), 0.94, Timestamp::now())?;
// User liked an item.
db.signal("like", EntityId::new(1), 1.0, Timestamp::now())?;
// User skipped after 3 seconds (strong negative).
db.signal("skip", EntityId::new(2), 1.0, Timestamp::now())?;
// User tapped "Not interested" (permanent negative on this item).
db.signal("hide", EntityId::new(2), 1.0, Timestamp::now())?;
```
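The WAL half of a signal write follows the standard write-ahead pattern. Each event is appended to a durable log and then applied to in-memory state, so the state can be rebuilt after a restart by replaying the log. The sketch below shows that pattern using a `Vec<String>` in place of the log file and a tab-separated record. The record format, `Ledger`, `write_signal`, and `replay` are all hypothetical and do not reflect tidalDB's WAL.
```rust
use std::collections::HashMap;

/// Hypothetical in-memory ledger: (signal, item) -> summed weight.
#[derive(Default, Debug, PartialEq)]
struct Ledger {
    totals: HashMap<(String, u64), f64>,
}

impl Ledger {
    fn apply(&mut self, signal: &str, item: u64, weight: f64) {
        *self.totals.entry((signal.to_string(), item)).or_insert(0.0) += weight;
    }
}

/// Append the record to the log first, then apply it to the ledger.
fn write_signal(log: &mut Vec<String>, ledger: &mut Ledger, signal: &str, item: u64, weight: f64) {
    log.push(format!("{signal}\t{item}\t{weight}"));
    ledger.apply(signal, item, weight);
}

/// Rebuild a ledger from scratch by re-applying every logged record.
fn replay(log: &[String]) -> Ledger {
    let mut ledger = Ledger::default();
    for record in log {
        let mut parts = record.split('\t');
        let (Some(signal), Some(item), Some(weight)) = (parts.next(), parts.next(), parts.next()) else {
            continue; // skip a torn or malformed record
        };
        if let (Ok(item), Ok(weight)) = (item.parse::<u64>(), weight.parse::<f64>()) {
            ledger.apply(signal, item, weight);
        }
    }
    ledger
}
```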
For signals with user context (updates preference vectors, seen state, interaction weights):
```rust
use tidaldb::schema::{EntityId, Timestamp};
db.signal_with_context(
"view",
EntityId::new(1), // item
1.0, // weight
Timestamp::now(),
Some(123), // for_user
Some(100), // creator_id
)?;
```
The next ranking query -- even 100ms later -- reflects the updated state.

---
## Query Language
Three operations: **RETRIEVE** (feed generation, browse, related), **SEARCH** (text + semantic retrieval), **SUGGEST** (autocomplete).
All queries return ranked results with scores. The application renders -- it never re-ranks.
### RETRIEVE -- Feeds, Browse, Related
RETRIEVE generates ranked content lists. It handles personalized feeds, category browse, trending, following, related content, and every other surface described in [USE_CASES.md](USE_CASES.md).
```rust
use tidaldb::query::retrieve::Retrieve;
use tidaldb::schema::EntityId;
// Personalized For You feed.
let query = Retrieve::builder()
.for_user(123)
.profile("for_you")
.limit(50)
.build()?;
let results = db.retrieve(&query)?;
```
```rust
// Trending globally.
let query = Retrieve::builder()
.profile("trending")
.limit(25)
.build()?;
let results = db.retrieve(&query)?;
```
```rust
use tidaldb::storage::indexes::filter::FilterExpr;
// Trending in a category.
let query = Retrieve::builder()
.profile("trending")
.filter(FilterExpr::eq("category", "jazz"))
.limit(25)
.build()?;
let results = db.retrieve(&query)?;
```
```rust
// Trending within a cohort -- what's hot among US young music fans.
let query = Retrieve::builder()
.profile("cohort_trending")
.cohort("us_young_music")
.limit(25)
.build()?;
let results = db.retrieve(&query)?;
```
```rust
// Following feed -- content from followed creators.
let query = Retrieve::builder()
.for_user(123)
.profile("following")
.limit(50)
.build()?;
let results = db.retrieve(&query)?;
```
```rust
use tidaldb::ranking::diversity::DiversityConstraints;
// Related content / Up Next -- anchored to a specific item.
let query = Retrieve::builder()
.for_user(123)
.profile("related")
.similar_to(EntityId::new(1))
.diversity(DiversityConstraints::new().max_per_creator(1))
.limit(10)
.build()?;
let results = db.retrieve(&query)?;
```
```rust
// Browse category with explicit sort mode.
let query = Retrieve::builder()
.profile("top_week")
.filter(FilterExpr::eq("category", "jazz"))
.limit(20)
.build()?;
let results = db.retrieve(&query)?;
```
```rust
// Hidden gems -- high quality, low reach.
let query = Retrieve::builder()
.profile("hidden_gems")
.limit(20)
.build()?;
let results = db.retrieve(&query)?;
```
```rust
// Exclude previously seen items.
let query = Retrieve::builder()
.for_user(123)
.profile("for_you")
.exclude(vec![EntityId::new(1), EntityId::new(2)])
.limit(50)
.build()?;
let results = db.retrieve(&query)?;
```
```rust
// Creator profile -- items from a specific creator.
let query = Retrieve::builder()
.profile("new")
.for_creator(EntityId::new(100))
.limit(20)
.build()?;
let results = db.retrieve(&query)?;
```
```rust
// Notification prioritization.
let query = Retrieve::builder()
.for_user(123)
.profile("notification")
.limit(20)
.build()?;
let results = db.retrieve(&query)?;
```
### SEARCH -- Text + Semantic Retrieval
Search combines full-text BM25 relevance with semantic similarity via RRF (Reciprocal Rank Fusion). Text relevance is the floor -- an irrelevant result never surfaces just because the user likes the creator.
```rust
use tidaldb::query::search::Search;
// Basic keyword search, personalized for this user.
let query = Search::builder()
.query("rust tutorial beginner")
.for_user(123)
.limit(20)
.build()?;
let results = db.search(&query)?;
```
```rust
// Hybrid search: text + vector.
let query_embedding: Vec<f32> = embed("rust tutorial beginner");
let query = Search::builder()
.query("rust tutorial beginner")
.vector(query_embedding)
.for_user(123)
.limit(20)
.build()?;
let results = db.search(&query)?;
```
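Reciprocal Rank Fusion merges ranked lists by position rather than raw score, which sidesteps the fact that BM25 scores and cosine similarities live on different scales. Each document earns `1 / (k + rank)` from every list it appears in, and the sums are sorted. The sketch below is the textbook formulation with `k = 60`, the constant commonly used for RRF. It does not show whether tidalDB uses the same `k`, or how it applies the text-relevance floor and personalization described above.
```rust
use std::collections::HashMap;

/// Fuse ranked ID lists (best first) with Reciprocal Rank Fusion.
/// Returns (id, fused_score) pairs, highest score first.
fn rrf(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (i, &id) in list.iter().enumerate() {
            // Ranks are 1-based, so the top hit in a list earns 1 / (k + 1).
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Sort by score descending; break ties by ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```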
```rust
// Creator search.
use tidaldb::schema::EntityKind;
let query = Search::builder()
.query("jazz piano")
.entity_kind(EntityKind::Creator)
.limit(10)
.build()?;
let results = db.search(&query)?;
```
### Query Composition -- SEARCH within Scoped Results
SEARCH can be composed with scope constraints. This enables searching within trending, within a cohort, or within any candidate set.
```rust
use tidaldb::query::search::{Search, WithinScope};
// Search within globally trending items.
let query = Search::builder()
.query("jazz piano")
.within(WithinScope::Trending { window_hours: 24 })
.limit(20)
.build()?;
let results = db.search(&query)?;
```
```rust
// Search within cohort-scoped trending.
let query = Search::builder()
.query("jazz piano")
.within(WithinScope::CohortTrending {
cohort: "us_young_music".into(),
window_hours: 24,
})
.limit(20)
.build()?;
let results = db.search(&query)?;
```
```rust
// Search within a user's following feed.
let query = Search::builder()
.query("jazz piano")
.for_user(123)
.within(WithinScope::Following)
.limit(20)
.build()?;
let results = db.search(&query)?;
```
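A scope acts as a candidate set: only search hits that belong to it survive, and they keep their original rank order, so the scope narrows results without re-ranking them. The sketch below shows that intersection using plain ID sets. `apply_scope` is a hypothetical helper, and filtering after retrieval is an illustrative simplification; tidalDB may push the scope into retrieval instead.
```rust
use std::collections::HashSet;

/// Keep ranked search hits that fall inside the scope's candidate set,
/// preserving rank order and truncating to `limit`.
fn apply_scope(ranked_hits: &[u64], scope: &HashSet<u64>, limit: usize) -> Vec<u64> {
    ranked_hits
        .iter()
        .copied()
        .filter(|id| scope.contains(id))
        .take(limit)
        .collect()
}
```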
**`WithinScope`:**
| Scope | Candidate Set |
|---|---|
| `Trending { window_hours }` | Items with high global velocity in window |
| `CohortTrending { cohort, window_hours }` | Items with high velocity among cohort members |
| `Following` | Items from followed creators (requires `for_user`) |
| `Category { name }` | Items in a category |
| `Collection { id }` | Items in a collection |
### SUGGEST -- Autocomplete and Suggestions
```rust
use tidaldb::query::suggest::Suggest;
// Autocomplete on partial query.
let req = Suggest { prefix: "jazz pia".into(), for_user: None, limit: 5 };
let suggestions = db.suggest(&req)?;
// Returns Vec<Suggestion> with text and frequency.
// Trending searches (empty prefix).
let req = Suggest { prefix: "".into(), for_user: None, limit: 10 };
let trending = db.suggest(&req)?;
```
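The two calls above behave like a frequency-ranked prefix match over past queries. An empty prefix matches every query, so it returns the most frequent ones, which is how the trending-searches call works. The sketch below uses a linear scan to illustrate those semantics; a real index would more likely use a trie or similar structure. `suggest` here is a hypothetical free function, not `db.suggest`, and case-insensitive matching is an assumption.
```rust
/// Return up to `limit` past queries starting with `prefix`,
/// most frequent first (ties broken alphabetically).
fn suggest<'a>(history: &'a [(&'a str, u64)], prefix: &str, limit: usize) -> Vec<&'a str> {
    let prefix = prefix.to_lowercase();
    let mut matches: Vec<&(&str, u64)> = history
        .iter()
        .filter(|(text, _)| text.to_lowercase().starts_with(&prefix))
        .collect();
    matches.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    matches.into_iter().take(limit).map(|&(text, _)| text).collect()
}
```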
---
## Filters
All filters are composable. Any combination of filters produces a valid, efficiently-executed query. Filters use the `FilterExpr` type from `tidaldb::storage::indexes::filter::FilterExpr`.
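
Bitmap-backed filters explain why composition works: each predicate yields a set of matching item IDs, and combining predicates is set algebra on those sets (intersection for AND, union for OR, difference from the universe for NOT). The sketch below evaluates a hypothetical `Filter` tree over `BTreeSet<u64>`. It is not `FilterExpr`, and this document does not show whether `FilterExpr` exposes AND/OR/NOT combinators.
```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical filter tree (not tidalDB's `FilterExpr`).
enum Filter {
    Eq(&'static str, &'static str),
    And(Vec<Filter>),
    Or(Vec<Filter>),
    Not(Box<Filter>),
}

/// Posting lists: (field, value) -> matching item IDs.
type Postings = HashMap<(&'static str, &'static str), BTreeSet<u64>>;

fn eval(f: &Filter, postings: &Postings, universe: &BTreeSet<u64>) -> BTreeSet<u64> {
    match f {
        Filter::Eq(field, value) => postings.get(&(*field, *value)).cloned().unwrap_or_default(),
        // AND: start from the full universe and keep only IDs every child matches.
        Filter::And(children) => children.iter().fold(universe.clone(), |mut acc, c| {
            let child = eval(c, postings, universe);
            acc.retain(|id| child.contains(id));
            acc
        }),
        // OR: start from the empty set and add every child's matches.
        Filter::Or(children) => children.iter().fold(BTreeSet::new(), |mut acc, c| {
            acc.extend(eval(c, postings, universe));
            acc
        }),
        // NOT: everything in the universe that the child does not match.
        Filter::Not(child) => universe.difference(&eval(child, postings, universe)).copied().collect(),
    }
}
```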
### Content Attribute Filters
```rust
use tidaldb::storage::indexes::filter::FilterExpr;
let by_category = FilterExpr::eq("category", "jazz"); // exact match on category
let by_format = FilterExpr::eq("format", "video");    // exact match on format
let by_tag = FilterExpr::eq("tags", "tutorial");      // tag match
```
### Engagement Threshold Filters
```rust
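use tidaldb::storage::indexes::filter::FilterExpr;
let popular = FilterExpr::MinSignal { signal: "view".into(), threshold: 10000.0 }; // minimum view threshold
let niche = FilterExpr::MaxSignal { signal: "view".into(), threshold: 5000.0 };    // maximum view threshold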
```
### Geographic Filters
```rust
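use tidaldb::storage::indexes::filter::FilterExpr;
// Items within 50 km of New York City (40.7128, -74.0060).
let near_nyc = FilterExpr::NearLocation { lat: 40.7128, lng: -74.0060, radius_km: 50.0 };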
```
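`NearLocation` describes a radius query on the globe. The standard test for it is the haversine great-circle distance, sketched below. Modeling the Earth as a sphere with a 6,371 km radius is an assumption of this sketch. It does not show tidalDB's own geo index or any bounding-box prefilter that index may use.
```rust
/// Mean Earth radius in kilometres (spherical approximation).
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Great-circle distance between two (lat, lng) points in degrees, via haversine.
fn haversine_km(lat1: f64, lng1: f64, lat2: f64, lng2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lng2 - lng1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_KM * a.sqrt().atan2((1.0 - a).sqrt())
}

/// True if the item's (lat, lng) falls within `radius_km` of the query point.
fn near_location(item: (f64, f64), center: (f64, f64), radius_km: f64) -> bool {
    haversine_km(center.0, center.1, item.0, item.1) <= radius_km
}
```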
### Collection Filters
```rust
use tidaldb::entities::CollectionId;
use tidaldb::storage::indexes::filter::FilterExpr;
let in_collection = FilterExpr::InCollection(CollectionId::new(42));
```
See [USE_CASES.md Appendix A](USE_CASES.md#appendix-a--filter-reference) for the complete filter reference.

---
## Sort Modes
Sort modes are embedded in ranking profiles. The application names a profile. The database executes the ranking pipeline. 25 built-in profiles cover the most common sort needs.
| Profile | Sort Mode |
|---|---|
| `new` | `created_at` DESC |
| `trending` | Engagement velocity |
| `hot` | Score / (age + 2)^gravity |
| `top_week` / `top_month` / `top_all_time` | Cumulative quality by window |
| `most_viewed` / `most_liked` | Signal count by window |
| `most_commented` / `most_shared` | Signal count (AllTime) |
| `hidden_gems` | High quality, low reach |
| `controversial` | max(positive * negative signals) |
| `shuffle` | Random, quality-weighted |
| `live` | Live viewer count DESC |
| `date_saved` | When user bookmarked DESC |
| `alphabetical_asc` / `alphabetical_desc` | Title A-Z / Z-A |
| `shortest` / `longest` | Duration ASC / DESC |

See [USE_CASES.md Appendix B](USE_CASES.md#appendix-b--sort-mode-reference) for the complete sort mode reference.
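
The `hot` row uses the familiar gravity formula: an item's score is divided by its age plus an offset of 2, raised to a gravity exponent. As a result, fresh engagement outranks older engagement of the same size. The sketch below evaluates that formula. Measuring age in hours and using a gravity of 1.8 are assumptions taken from the commonly cited Hacker News variant, not tidalDB's profile parameters.
```rust
/// `hot` score from the table: score / (age_hours + 2)^gravity.
fn hot_score(score: f64, age_hours: f64, gravity: f64) -> f64 {
    score / (age_hours + 2.0).powf(gravity)
}

/// Order (id, score, age_hours) tuples by hot score, highest first.
fn rank_hot(items: &mut [(u64, f64, f64)], gravity: f64) {
    items.sort_by(|a, b| {
        hot_score(b.1, b.2, gravity).total_cmp(&hot_score(a.1, a.2, gravity))
    });
}
```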
---
## Diversity Constraints
Diversity is a post-scoring pass. After candidates are scored, diversity constraints reorder the result set to enforce variety -- without reducing the result count.
```rust
use tidaldb::ranking::diversity::DiversityConstraints;
let diversity = DiversityConstraints::new()
.max_per_creator(2) // No more than 2 items per creator
.max_format_fraction(0.4); // No format > 40% of results
let query = Retrieve::builder()
.profile("for_you")
.for_user(123)
.diversity(diversity)
.limit(50)
.build()?;
```
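One way to enforce `max_per_creator` without shrinking the result count is a greedy pass. Walk candidates in score order, take each one whose creator still has room, and defer the rest. If the accepted list falls short of the limit, append deferred items in score order and report that the constraint was not fully met, as the `constraints_satisfied` response field suggests. The sketch below handles the creator limit only. It is an illustration under those assumptions, not tidalDB's diversity algorithm, and `diversify` is a hypothetical name.
```rust
use std::collections::HashMap;

/// A scored candidate: (item_id, creator_id, score).
type Candidate = (u64, u64, f64);

/// Greedy creator-diversity pass. Returns up to `limit` item IDs and whether
/// `max_per_creator` held for every returned item.
fn diversify(mut candidates: Vec<Candidate>, max_per_creator: usize, limit: usize) -> (Vec<u64>, bool) {
    // Highest score first; the pass only reorders, it never rescores.
    candidates.sort_by(|a, b| b.2.total_cmp(&a.2));
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let (mut picked, mut deferred) = (Vec::new(), Vec::new());
    for (item, creator, _) in candidates {
        let count = per_creator.entry(creator).or_insert(0);
        if *count < max_per_creator && picked.len() < limit {
            *count += 1;
            picked.push(item);
        } else {
            deferred.push(item);
        }
    }
    // Backfill from deferred items so the result count is not reduced.
    let shortfall = limit.saturating_sub(picked.len());
    let satisfied = shortfall == 0 || deferred.is_empty();
    picked.extend(deferred.into_iter().take(shortfall));
    (picked, satisfied)
}
```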
Diversity is specified per query or per ranking profile. Query-level diversity overrides the profile default.

---
## Pagination
Pagination is cursor-based, which keeps result sets stable across pages.
```rust
use tidaldb::query::retrieve::Retrieve;
// First page.
let query = Retrieve::builder()
.for_user(123)
.profile("for_you")
.limit(50)
.build()?;
let page1 = db.retrieve(&query)?;
// Next page -- pass the cursor from the previous response.
if let Some(cursor) = page1.next_cursor {
let query = Retrieve::builder()
.for_user(123)
.profile("for_you")
.cursor(cursor)
.limit(50)
.build()?;
let page2 = db.retrieve(&query)?;
}
```
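Cursors keep pages stable because they encode a position in the ranked order rather than an offset. An item inserted ahead of the cursor therefore cannot push an already-returned item onto the next page. A common design is a keyset cursor that holds the last returned (score, id) pair and uses the ID as a tie-breaker; the sketch below shows that idea. This document treats tidalDB's `Cursor` contents as opaque, so `KeysetCursor` and `page_after` are hypothetical.
```rust
/// Hypothetical keyset cursor: the last (score, id) pair returned.
#[derive(Clone, Copy, Debug, PartialEq)]
struct KeysetCursor {
    score: f64,
    id: u64,
}

/// True if `a` sorts before `b`: higher score first, then lower ID.
fn before(a: (f64, u64), b: (f64, u64)) -> bool {
    a.0 > b.0 || (a.0 == b.0 && a.1 < b.1)
}

/// Return the next page of (score, id) rows after `cursor`, plus the next cursor.
fn page_after(
    rows: &[(f64, u64)],
    cursor: Option<KeysetCursor>,
    limit: usize,
) -> (Vec<(f64, u64)>, Option<KeysetCursor>) {
    let mut sorted = rows.to_vec();
    sorted.sort_by(|a, b| b.0.total_cmp(&a.0).then(a.1.cmp(&b.1)));
    let page: Vec<(f64, u64)> = sorted
        .into_iter()
        .filter(|&row| cursor.map_or(true, |c| before((c.score, c.id), row)))
        .take(limit)
        .collect();
    // A full page may have more behind it; a short page ends the feed.
    let next = if page.len() == limit {
        page.last().map(|&(score, id)| KeysetCursor { score, id })
    } else {
        None
    };
    (page, next)
}
```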
Alternatively, pass the IDs of previously returned items to `exclude`:
```rust
let seen_ids: Vec<_> = page1.items.iter().map(|r| r.entity_id).collect();
let query = Retrieve::builder()
.for_user(123)
.profile("for_you")
.exclude(seen_ids)
.limit(50)
.build()?;
let page2 = db.retrieve(&query)?;
```
---
## Response Format
### RETRIEVE Response
```rust
pub struct Results {
/// Ranked items with scores.
pub items: Vec<RetrieveResult>,
/// Cursor for fetching the next page.
pub next_cursor: Option<Cursor>,
/// Total candidate count before diversity/limit.
pub total_candidates: usize,
/// Whether all diversity constraints were satisfied.
pub constraints_satisfied: bool,
/// Warnings generated during query execution.
pub warnings: Vec<String>,
/// The degradation level under which this query was executed.
pub degradation_level: DegradationLevel,
}
pub struct RetrieveResult {
/// Entity ID.
pub entity_id: EntityId,
/// Normalized score in [0.0, 1.0].
pub score: f64,
/// 1-based rank.
pub rank: usize,
/// Signal values that contributed to this score.
pub signals: Vec<Signal>,
}
```
### SEARCH Response
```rust
pub struct SearchResults {
pub items: Vec<SearchResultItem>,
pub next_cursor: Option<Cursor>,
pub total_candidates: usize,
pub constraints_satisfied: bool,
pub warnings: Vec<String>,
pub degradation_level: DegradationLevel,
}
pub struct SearchResultItem {
pub entity_id: EntityId,
pub score: f64,
pub rank: usize,
pub bm25_score: Option<f32>,
pub semantic_score: Option<f32>,
pub signals: Vec<Signal>,
pub metadata: Option<HashMap<String, String>>,
}
```
The application uses `items` to render the UI. It uses `signals` to display engagement counts (views, likes, etc.). It never re-ranks -- the order from tidalDB is the final order.

---
## Lifecycle and Operations
### Shutdown
```rust
// Graceful shutdown -- flushes WAL, checkpoints signal state, persists indexes.
db.close()?;
// Or, equivalently (call one or the other, not both):
// db.shutdown()?;
```
### Health Check
```rust
db.health_check()?; // Returns Ok(()) if operational.
```
### Item Count
```rust
let count: u64 = db.item_count(); // Number of items in the universe bitmap.
```
### Reading Signal State
```rust
use tidaldb::schema::{EntityId, Window};
// Read decay score (applies lazy decay to current time).
let score: Option<f64> = db.read_decay_score(EntityId::new(1), "view", 0)?;
// Read windowed event count.
let count: u64 = db.read_windowed_count(EntityId::new(1), "view", Window::OneHour)?;
// Read velocity (events per second).
let velocity: f64 = db.read_velocity(EntityId::new(1), "view", Window::OneHour)?;
```
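`read_windowed_count` and `read_velocity` both summarize events inside a time window. The sketch below keeps raw event timestamps and derives both numbers. It computes velocity as events in the window divided by the window length in seconds, which matches the "events per second" unit above but may differ from tidalDB's actual computation. A real ledger would more likely keep bucketed aggregates than every timestamp. `WindowedCounter` is a hypothetical type.
```rust
use std::collections::VecDeque;

/// Hypothetical sliding-window counter over event timestamps (seconds).
struct WindowedCounter {
    window_secs: u64,
    events: VecDeque<u64>,
}

impl WindowedCounter {
    fn new(window_secs: u64) -> Self {
        Self { window_secs, events: VecDeque::new() }
    }

    /// Record an event; timestamps are assumed to arrive in order.
    fn record(&mut self, ts: u64) {
        self.events.push_back(ts);
    }

    /// Drop events outside the window (now - window_secs, now], then count the rest.
    fn count(&mut self, now: u64) -> u64 {
        let cutoff = now.saturating_sub(self.window_secs);
        while matches!(self.events.front(), Some(&t) if t <= cutoff) {
            self.events.pop_front();
        }
        self.events.len() as u64
    }

    /// Events per second over the window.
    fn velocity(&mut self, now: u64) -> f64 {
        self.count(now) as f64 / self.window_secs as f64
    }
}
```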
### Saved Searches
```rust
use tidaldb::schema::{EntityId, Timestamp};
// Save a search as a persistent feed.
db.save_search(EntityId::new(123), "Jazz tutorials", "jazz tutorial", None)?;
// Query a saved search for results newer than `since`, a `Timestamp` you
// recorded earlier (for example, when the user last opened this feed).
let results = db.retrieve_saved_search(EntityId::new(123), "Jazz tutorials", Some(since))?;
// List all saved searches for a user.
let searches = db.list_saved_searches(EntityId::new(123))?;
// Delete a saved search.
db.delete_saved_search(EntityId::new(123), "Jazz tutorials")?;
```
### Collections
```rust
use tidaldb::schema::EntityId;
use tidaldb::entities::collection::Visibility;
// Create a user collection (playlist, board, etc.)
let collection_id = db.create_collection(EntityId::new(123), "Jazz Favorites", Visibility::Private)?;
// Add an item to a collection.
db.add_to_collection(collection_id, EntityId::new(1))?;
// Remove an item from a collection.
db.remove_from_collection(collection_id, EntityId::new(1))?;
// List collections for a user.
let collections = db.list_collections(EntityId::new(123))?;
```
### Text Index Management
```rust
// Force a synchronous commit and reload of the text index.
// Useful in tests after writing items to make them immediately searchable.
db.flush_text_index()?;
db.flush_creator_text_index()?;
// Manual reload (for ephemeral mode).
db.reload_text_index()?;
```
---
## Summary
| Operation | What the Application Does | What tidalDB Does |
|---|---|---|
| **Ingest content** | Compute embedding, call `write_item_with_metadata` + `write_item_embedding` | Index text, insert vector, initialize signals, apply cold start |
| **Record engagement** | Call `signal` with event type | Update signal ledger, WAL-backed durability |
| **Record engagement with context** | Call `signal_with_context` with user/creator IDs | Update ledger + user preferences + interaction weights + cohort attribution |
| **Serve a feed** | Call `retrieve` with a profile name | Candidate retrieval, scoring, diversity enforcement, pagination |
| **Search** | Embed query, call `search` | BM25 + ANN + RRF fusion + personalization + diversity |
| **Handle cold start** | Nothing | Exploration budget, population priors -- automatic |
| **Handle negative signals** | Call `signal` with skip/hide | Preference decay, exclusion in future queries |
| **Scope trending by cohort** | Specify cohort name in retrieve query | Cohort-scoped signal aggregation, same ranking profile |
| **Search within scope** | Specify `within` on search query | Intersects text/vector retrieval with scoped candidate set |

One process. One query interface. One operational model.