tidaldb/API.md
jordan 413b712c0a chore: initialize tidalDB repository with schema foundation and standards
2026-02-20 12:52:20 -07:00


API Reference

How developers interact with tidalDB. This document covers initialization, schema definition, write operations, queries, and the feedback loop.

tidalDB is an embeddable Rust library. You link it into your process. There is no separate server, no network protocol, no client SDK. The API is Rust types and method calls.



Initialization

Open a database, providing a path and configuration.

use tidaldb::{Config, Durability, TidalDB};

let db = TidalDB::open(Config {
    path: "./data/my_app",
    // Memory budget for in-memory signal state and hot caches.
    // Higher = faster ranking queries. 10M entities need ~1-2 GB.
    memory_budget: 2 * 1024 * 1024 * 1024, // 2 GB
    // WAL fsync strategy for signal writes.
    signal_durability: Durability::Batched { max_batch: 100, max_delay_ms: 10 },
    // Number of threads for background materializer and segment merging.
    background_threads: 4,
})?;

Durability controls fsync behavior for signal writes:

| Level | Behavior | Use Case |
| --- | --- | --- |
| Immediate | fsync every write | Financial, purchase events |
| Batched { max_batch, max_delay_ms } | fsync per batch | Default for engagement signals |
| Eventual | fsync on OS schedule | Impressions, low-value telemetry |

The database is Send + Sync. Share it across threads with Arc<TidalDB>.
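As a rough illustration of the Batched level in the table above, the rule can be read as "fsync once either limit is reached". The sketch below uses a stand-in enum, not tidalDB's actual type, and the `should_fsync` helper and its arguments are hypothetical.

```rust
/// Stand-in for the durability levels described above (not the real tidalDB type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Durability {
    Immediate,
    Batched { max_batch: usize, max_delay_ms: u64 },
    Eventual,
}

/// Decide whether the WAL should be fsynced now.
/// `pending` is the number of writes buffered since the last fsync;
/// `elapsed_ms` is the time since the oldest buffered write.
pub fn should_fsync(policy: Durability, pending: usize, elapsed_ms: u64) -> bool {
    match policy {
        // Every write is flushed as soon as it is buffered.
        Durability::Immediate => pending > 0,
        // Flush when the batch is full or the oldest write has waited long enough.
        Durability::Batched { max_batch, max_delay_ms } => {
            pending > 0 && (pending >= max_batch || elapsed_ms >= max_delay_ms)
        }
        // Never force a flush; the OS decides when pages reach disk.
        Durability::Eventual => false,
    }
}
```

With `max_batch: 100` and `max_delay_ms: 10`, a single buffered write is flushed after at most 10 ms, and a burst is flushed as soon as it reaches 100 writes.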


Schema Definition

Schema is defined before writing data. It declares entity types, signal types, and ranking profiles. Schema is versioned — old profiles remain queryable by name and version.

Entity Types

Entities are the nodes of the system. Three built-in types: Item, User, Creator.

use tidaldb::schema::*;

db.define_entity(EntityDef {
    kind: EntityKind::Item,
    metadata_fields: vec![
        Field::text("title"),              // full-text indexed
        Field::text("description"),        // full-text indexed
        Field::keyword("category"),        // exact match, filterable
        Field::keywords("tags"),           // multi-value, filterable
        Field::keyword("format"),          // video, short, podcast, article, etc.
        Field::keyword("language"),        // ISO code
        Field::keyword("content_rating"),  // G, PG, PG-13, R
        Field::keyword("status"),          // published, live, scheduled, archived
        Field::keyword("availability"),    // free, premium, subscriber_only
        Field::duration("duration"),       // filterable, sortable
        Field::timestamp("created_at"),    // filterable, sortable
        Field::bool("has_subtitles"),
        Field::bool("downloadable"),
    ],
    // Embedding slot — you provide the vector, tidalDB indexes it.
    embedding_dimensions: 1536,
})?;

db.define_entity(EntityDef {
    kind: EntityKind::User,
    metadata_fields: vec![
        // Demographic (application-set)
        Field::keyword("locale"),            // en-US, ja-JP, etc.
        Field::keyword("region"),            // country or region code
        Field::keyword("timezone"),          // IANA timezone
        Field::keyword("age_range"),         // 18-24, 25-34, 35-44, etc.
        Field::keyword("gender"),            // optional demographic

        // Interests (mixed: app-set + DB-computed)
        Field::keywords("explicit_interests"),   // stated interests
        Field::keywords("inferred_interests"),   // DB-computed from engagement
        Field::keywords("primary_categories"),   // top categories by engagement

        // Behavioral (DB-computed)
        Field::keyword("engagement_level"),      // power, regular, casual, dormant
        Field::keyword("format_preference"),     // short, long, mixed
        Field::keyword("session_pattern"),       // binge, browse, search
        Field::i64("platform_tenure_days"),      // days since first signal
    ],
    // User preference vector — managed by the database.
    // Updated automatically on every signal write.
    embedding_dimensions: 1536,
})?;

db.define_entity(EntityDef {
    kind: EntityKind::Creator,
    metadata_fields: vec![
        Field::text("name"),
        Field::keyword("handle"),
        Field::keyword("language"),
        Field::keyword("region"),
        Field::bool("verified"),
    ],
    // Creator embedding — aggregated from their item catalog.
    embedding_dimensions: 1536,
})?;

Field types:

| Type | Behavior |
| --- | --- |
| text | Full-text indexed (BM25), searchable with tokenization |
| keyword | Exact match, filterable, facetable |
| keywords | Multi-value keyword (tags, categories) |
| i64 / f64 | Numeric, range-filterable, sortable |
| bool | Boolean filter |
| timestamp | Time value, range-filterable |
| duration | Duration value, range-filterable, sortable |

Signal Definitions

Signals are typed, timestamped event streams. Decay, velocity, and windowed aggregation are declared in schema — not computed in application code.

db.define_signal(SignalDef {
    name: "view",
    target: EntityKind::Item,
    decay: Decay::Exponential { half_life: Duration::days(7) },
    windows: vec![
        Window::hours(1),
        Window::hours(24),
        Window::days(7),
        Window::days(30),
        Window::all_time(),
    ],
    velocity: true, // compute rate-of-change per window
})?;

db.define_signal(SignalDef {
    name: "like",
    target: EntityKind::Item,
    decay: Decay::Exponential { half_life: Duration::days(7) },
    windows: vec![Window::hours(1), Window::hours(24), Window::days(7), Window::all_time()],
    velocity: true,
})?;

db.define_signal(SignalDef {
    name: "skip",
    target: EntityKind::Item,
    decay: Decay::Exponential { half_life: Duration::hours(24) },
    windows: vec![Window::hours(1), Window::hours(24)],
    velocity: false,
})?;

db.define_signal(SignalDef {
    name: "hide",
    target: EntityKind::Item,
    decay: Decay::Permanent, // never decays
    windows: vec![],         // no aggregation — binary flag
    velocity: false,
})?;

db.define_signal(SignalDef {
    name: "completion",
    target: EntityKind::Item,
    decay: Decay::Exponential { half_life: Duration::days(30) },
    windows: vec![Window::all_time()],
    velocity: false,
})?;
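To make the `windows` and `velocity` settings above more concrete, here is a minimal sketch of a windowed aggregate and one possible velocity definition. It is an assumption, not tidalDB's implementation: timestamps are plain seconds, events are `(timestamp, weight)` pairs, and velocity is taken to be the change between the current window and the one before it.

```rust
/// Sum the weights of events whose timestamp falls in (now - window, now].
/// Timestamps and the window length are in seconds.
pub fn window_sum(events: &[(u64, f64)], now: u64, window_secs: u64) -> f64 {
    let start = now.saturating_sub(window_secs);
    events
        .iter()
        .filter(|(ts, _)| *ts > start && *ts <= now)
        .map(|(_, w)| *w)
        .sum()
}

/// One possible velocity: the difference between the current window and the
/// previous window of the same length, per second. Positive means the signal
/// is accelerating. `window_secs` must be greater than zero.
pub fn velocity(events: &[(u64, f64)], now: u64, window_secs: u64) -> f64 {
    let current = window_sum(events, now, window_secs);
    let previous = window_sum(events, now.saturating_sub(window_secs), window_secs);
    (current - previous) / window_secs as f64
}
```

A real engine would keep these aggregates incrementally rather than rescanning events; the scan here only shows what the numbers mean.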

Decay types:

| Decay | Behavior |
| --- | --- |
| Exponential { half_life } | Signal weight halves every half_life duration |
| Linear { lifetime } | Signal weight drops linearly to zero over lifetime |
| Permanent | Never decays (hides, blocks, follows) |

The full signal reference is in USE_CASES.md Appendix C.
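The three decay behaviors described above reduce to simple weight functions of an event's age. The sketch below is illustrative only: the enum is a stand-in with durations expressed as plain seconds, and `decay_weight` is a hypothetical helper.

```rust
/// Stand-in for the decay models described above; durations are plain seconds.
#[derive(Debug, Clone, Copy)]
pub enum Decay {
    Exponential { half_life_secs: f64 },
    Linear { lifetime_secs: f64 },
    Permanent,
}

/// Weight multiplier for an event that is `age_secs` old.
pub fn decay_weight(decay: Decay, age_secs: f64) -> f64 {
    match decay {
        // Halves every half-life: 0.5^(age / half_life).
        Decay::Exponential { half_life_secs } => 0.5f64.powf(age_secs / half_life_secs),
        // Falls in a straight line from 1.0 to 0.0 over the lifetime, then stays at 0.
        Decay::Linear { lifetime_secs } => (1.0 - age_secs / lifetime_secs).clamp(0.0, 1.0),
        // Never loses weight.
        Decay::Permanent => 1.0,
    }
}
```

For a signal with a 7-day half-life, an event from a week ago counts half as much as one from right now, and an event from two weeks ago counts a quarter.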

Ranking Profiles

Named, versioned scoring functions. The application names a profile, such as for_you, in its query. The database executes the entire pipeline.

db.define_profile(ProfileDef {
    name: "for_you",
    version: 1,
    candidate: Candidate::Ann {
        query_vector: VectorSource::UserPreference,
        index: EntityKind::Item,
        top_k: 500,
    },
    boosts: vec![
        Boost::signal("view", Window::hours(24), Velocity, 0.3),
        Boost::relationship("interaction_weight", 0.2),
        Boost::social_proof(0.15),
    ],
    decay: ProfileDecay {
        field: "created_at",
        half_life: Duration::hours(48),
    },
    gates: vec![
        Gate::min("completion", Window::all_time(), 0.3),
    ],
    penalties: vec![
        Penalty::signal("skip", Window::hours(24), -0.5),
    ],
    excludes: vec![
        Exclude::signal("hide"),
        Exclude::relationship("blocked"),
    ],
    diversity: DiversitySpec {
        max_per_creator: Some(2),
        format_mix: true,
        topic_diversity: None,
    },
    exploration: 0.10, // 10% of results from creators the user doesn't follow
})?;

db.define_profile(ProfileDef {
    name: "trending",
    version: 1,
    candidate: Candidate::Scan { entity: EntityKind::Item },
    boosts: vec![
        Boost::signal("share", Window::hours(6), Velocity, 0.5),
        Boost::signal("view", Window::hours(6), Velocity, 0.3),
        Boost::signal("view", Window::hours(24), UniqueRatio, 0.2), // new-user reach
    ],
    gates: vec![
        Gate::min_ratio("engagement_ratio", 0.03),
    ],
    // No personalization. No user context. Pure velocity.
    ..ProfileDef::default()
})?;

db.define_profile(ProfileDef {
    name: "search",
    version: 1,
    candidate: Candidate::Hybrid {
        text_weight: 0.6,
        vector_weight: 0.4,
        fusion: Fusion::Rrf { k: 60 },
    },
    boosts: vec![
        Boost::signal("completion", Window::all_time(), Value, 0.15),
        Boost::signal("like", Window::all_time(), Ratio, 0.10),
    ],
    decay: ProfileDecay {
        field: "created_at",
        half_life: Duration::days(90), // slow decay for evergreen content
    },
    diversity: DiversitySpec {
        max_per_creator: Some(2),
        ..Default::default()
    },
    ..ProfileDef::default()
})?;

db.define_profile(ProfileDef {
    name: "following",
    version: 1,
    candidate: Candidate::Relationship { edge: "follows" },
    boosts: vec![],
    // Pure reverse chronological. Minimal algorithmic intervention.
    sort: Sort::Field("created_at", Desc),
    ..ProfileDef::default()
})?;

See ai-lookup/services/ranking-profiles.md for the full list of built-in profiles.
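This document does not spell out how boosts, penalties, gates, and recency decay combine into a single score. The sketch below shows one plausible combination as an assumption, not the actual formula: a gate drops the candidate, boosts and penalties are added to a base score as weighted signal values, and the total is multiplied by an exponential recency factor. All type and field names are hypothetical.

```rust
/// Inputs for one candidate. All fields are illustrative, not tidalDB types.
pub struct Candidate {
    /// Score from candidate retrieval, e.g. ANN similarity.
    pub base: f64,
    /// (signal value, weight) pairs with positive weights.
    pub boosts: Vec<(f64, f64)>,
    /// (signal value, weight) pairs with negative weights.
    pub penalties: Vec<(f64, f64)>,
    /// Value checked against the gate, e.g. completion rate.
    pub gate_value: f64,
    /// Age of the item in hours, measured from created_at.
    pub age_hours: f64,
}

/// Returns None when the candidate fails the gate, otherwise a decayed score.
pub fn score(c: &Candidate, gate_min: f64, half_life_hours: f64) -> Option<f64> {
    if c.gate_value < gate_min {
        return None;
    }
    // Weighted sum of boosts (positive weights) and penalties (negative weights).
    let adjust: f64 = c
        .boosts
        .iter()
        .chain(c.penalties.iter())
        .map(|(value, weight)| value * weight)
        .sum();
    // Recency factor halves every `half_life_hours`.
    let recency = 0.5f64.powf(c.age_hours / half_life_hours);
    Some((c.base + adjust) * recency)
}
```

Excludes, diversity, and exploration are left out here; they act on the candidate set or on the final ordering rather than on an individual score.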

Cohort Definitions

Cohorts are named predicates over user attributes. They define audience segments for scoped signal aggregation and trending.

db.define_cohort(CohortDef {
    name: "us_young_music",
    predicate: Predicate::and(vec![
        Predicate::eq("locale", "en-US"),
        Predicate::any("age_range", &["18-24", "25-34"]),
        Predicate::any("primary_categories", &["music", "concerts"]),
    ]),
    // Signal aggregation level — controls storage cost vs. query flexibility
    aggregation: CohortAggregation::Materialized,
})?;

db.define_cohort(CohortDef {
    name: "jp_casual",
    predicate: Predicate::and(vec![
        Predicate::eq("region", "JP"),
        Predicate::eq("engagement_level", "casual"),
    ]),
    aggregation: CohortAggregation::Materialized,
})?;

// Ad-hoc cohorts can be queried without pre-definition,
// but use query-time aggregation (slower).
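A cohort predicate is a boolean test over user attributes. The following minimal sketch shows how `eq`, `any`, and `and` could be evaluated against one user; the enum, the `Attrs` alias, and `cohort_matches` are stand-ins invented for illustration, not tidalDB types.

```rust
use std::collections::HashMap;

/// Stand-in predicate type mirroring eq / any / and from the cohort examples.
pub enum Predicate {
    Eq(String, String),
    Any(String, Vec<String>),
    And(Vec<Predicate>),
}

/// User attributes as field -> values (single-value fields hold one entry).
pub type Attrs = HashMap<String, Vec<String>>;

/// True when the user's attributes satisfy the predicate.
pub fn cohort_matches(p: &Predicate, user: &Attrs) -> bool {
    match p {
        // The field must contain exactly this value.
        Predicate::Eq(field, value) => user
            .get(field)
            .map_or(false, |vals| vals.iter().any(|v| v == value)),
        // The field must contain at least one of the listed values.
        Predicate::Any(field, options) => user
            .get(field)
            .map_or(false, |vals| vals.iter().any(|v| options.contains(v))),
        // Every sub-predicate must hold.
        Predicate::And(parts) => parts.iter().all(|part| cohort_matches(part, user)),
    }
}
```

Evaluating such a test per signal event is roughly what query-time aggregation has to do, which is why pre-defined cohorts are the faster path.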

CohortAggregation:

| Level | Behavior | Use Case |
| --- | --- | --- |
| Materialized | Pre-computed per-cohort signal aggregates, updated at signal write time | High-traffic cohorts, production trending pages |
| OnDemand | Computed at query time by filtering signal events | Ad-hoc analysis, rare cohort combinations |

Write Path

Ingesting Entities

Items enter the system with metadata and an embedding. The application provides the embedding — tidalDB does not generate vectors.

db.write_item(WriteItem {
    id: "item_abc",
    creator_id: "creator_xyz",
    metadata: metadata! {
        "title" => "Introduction to Jazz Piano",
        "description" => "A beginner's guide to jazz piano...",
        "category" => "music",
        "tags" => ["jazz", "piano", "tutorial", "beginner"],
        "format" => "video",
        "language" => "en",
        "duration" => Duration::minutes(22),
        "content_rating" => "G",
        "status" => "published",
        "availability" => "free",
        "has_subtitles" => true,
        "downloadable" => false,
        "created_at" => Utc::now(),
    },
    embedding: &content_vector, // [f32; 1536] — you compute this externally
})?;

On commit:

  1. The item is stored in the entity store
  2. Its text fields are indexed in the inverted index (BM25)
  3. Its embedding is inserted into the ANN index (HNSW)
  4. Its signal ledger is initialized (all zeros)
  5. It is linked to its creator entity
  6. It is given a cold-start exploration budget
  7. It is immediately queryable

Creators and users are written the same way:

db.write_creator(WriteCreator {
    id: "creator_xyz",
    metadata: metadata! {
        "name" => "Jazz Academy",
        "handle" => "jazzacademy",
        "language" => "en",
        "verified" => true,
    },
    embedding: &creator_vector, // aggregated from catalog, computed externally
})?;

db.write_user(WriteUser {
    id: "user_123",
    metadata: metadata! {
        "locale" => "en-US",
        "region" => "US",
        "timezone" => "America/New_York",
        "age_range" => "18-24",
        "explicit_interests" => ["jazz", "piano", "music production"],
    },
    // Initial preference vector. If None, uses cohort-level or population-level default.
    embedding: None,
})?;

Updating Entities

Update metadata or embeddings on existing entities.

db.update_item("item_abc", UpdateItem {
    metadata: Some(metadata! {
        "status" => "archived",
    }),
    embedding: None, // unchanged
})?;

Writing Relationships

Relationships are first-class edges between entities. Weighted, directional, traversable in queries.

db.write_relationship(Relationship {
    kind: "follows",
    from: ("user", "user_123"),
    to: ("creator", "creator_xyz"),
    weight: 1.0,
    timestamp: Utc::now(),
})?;

// Block a creator — permanent, hard filter in all future queries.
db.write_relationship(Relationship {
    kind: "blocked",
    from: ("user", "user_123"),
    to: ("creator", "creator_bad"),
    weight: 1.0,
    timestamp: Utc::now(),
})?;

Writing Signals

Signals are how the feedback loop closes. A single signal write atomically updates:

  1. The item's signal ledger (windowed aggregates, velocity, decay score)
  2. The user's preference vector (shifted toward or away from the item's embedding)
  3. The user-to-creator relationship weight
  4. The user-to-item relationship (seen, liked, hidden, etc.)

// User viewed an item
db.signal(Signal {
    kind: "view",
    item: "item_abc",
    user: "user_123",
    timestamp: Utc::now(),
    weight: 1.0,
    context: None,
})?;

// User completed 94% of the video
db.signal(Signal {
    kind: "completion",
    item: "item_abc",
    user: "user_123",
    timestamp: Utc::now(),
    weight: 0.94, // completion ratio
    context: None,
})?;

// User liked an item
db.signal(Signal {
    kind: "like",
    item: "item_abc",
    user: "user_123",
    timestamp: Utc::now(),
    weight: 1.0,
    context: None,
})?;

// User skipped after 3 seconds (strong negative)
db.signal(Signal {
    kind: "skip",
    item: "item_xyz",
    user: "user_123",
    timestamp: Utc::now(),
    weight: 1.0,
    context: Some(json!({ "dwell_ms": 3200, "source": "autoplay" })),
})?;

// User tapped "Not interested" (permanent negative on this item)
db.signal(Signal {
    kind: "hide",
    item: "item_xyz",
    user: "user_123",
    timestamp: Utc::now(),
    weight: 1.0,
    context: None,
})?;

// Search click with rank context (trains query relevance)
db.signal(Signal {
    kind: "search_click",
    item: "item_abc",
    user: "user_123",
    timestamp: Utc::now(),
    weight: 1.0,
    context: Some(json!({
        "query": "jazz piano tutorial",
        "rank_at_click": 3
    })),
})?;

The next ranking query — even 100ms later — reflects the updated state.
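The preference-vector update in step 2 of the signal write can be pictured as a small step toward or away from the item's embedding. The sketch below is an assumption about the general technique, not tidalDB's actual update rule; `shift_preference`, the `rate` parameter, and the re-normalization step are hypothetical.

```rust
/// Move `pref` toward `item` (positive weight) or away from it (negative weight),
/// then re-normalize to unit length. `rate` is a hypothetical learning rate.
pub fn shift_preference(pref: &mut [f32], item: &[f32], signed_weight: f32, rate: f32) {
    assert_eq!(pref.len(), item.len(), "dimension mismatch");
    for (p, x) in pref.iter_mut().zip(item) {
        *p += rate * signed_weight * (x - *p);
    }
    let norm = pref.iter().map(|v| v * v).sum::<f32>().sqrt();
    if norm > 0.0 {
        for p in pref.iter_mut() {
            *p /= norm;
        }
    }
}

/// Cosine similarity, used here only to observe the effect of a shift.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|v| v * v).sum::<f32>().sqrt();
    let nb = b.iter().map(|v| v * v).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}
```

A like would call this with a positive weight and a skip with a negative one, so the user's similarity to the item rises or falls before the next query runs.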


Query Language

Three operations: RETRIEVE (feed generation, browse, related), SEARCH (text + semantic retrieval), SUGGEST (autocomplete).

All queries return ranked results with scores. The application renders — it never re-ranks.

RETRIEVE generates ranked content lists. It handles personalized feeds, category browse, trending, following, related content, and every other surface described in USE_CASES.md.

// Personalized For You feed
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    for_user: Some("user_123"),
    context: Some("feed"),
    profile: "for_you",
    filters: vec![
        Filter::unseen(),
        Filter::not_blocked(),
        Filter::eq("format", "video"),
        Filter::preset("duration", "short"),   // under 4 minutes
    ],
    diversity: Some(DiversitySpec {
        max_per_creator: Some(2),
        format_mix: true,
        topic_diversity: None,
    }),
    exclude_ids: vec![],  // previously returned items
    limit: 50,
    cursor: None,
    ..Default::default() // remaining fields (sort, cohort, similar_to) keep their defaults
})?;

// Trending globally
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    for_user: None, // no personalization
    profile: "trending",
    filters: vec![],
    diversity: Some(DiversitySpec {
        max_per_creator: Some(1),
        ..Default::default()
    }),
    limit: 25,
    ..Default::default()
})?;

// Trending in a category
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    profile: "trending",
    filters: vec![
        Filter::eq("category", "jazz"),
    ],
    diversity: Some(DiversitySpec {
        max_per_creator: Some(1),
        ..Default::default()
    }),
    limit: 25,
    ..Default::default()
})?;

// Trending among people I follow (social graph scoped)
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    for_user: Some("user_123"),
    profile: "trending",
    filters: vec![
        Filter::social_graph("user_123", Depth::Two),
    ],
    limit: 25,
    ..Default::default()
})?;

// Trending within a cohort: what's hot among US young music fans
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    profile: "trending",
    cohort: Some("us_young_music"),  // scopes signal aggregation
    filters: vec![],
    diversity: Some(DiversitySpec {
        max_per_creator: Some(1),
        ..Default::default()
    }),
    limit: 25,
    ..Default::default()
})?;

// Ad-hoc cohort (not pre-defined): slower but flexible
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    profile: "trending",
    cohort_predicate: Some(Predicate::and(vec![
        Predicate::eq("region", "DE"),
        Predicate::any("primary_categories", &["cooking", "food"]),
    ])),
    limit: 25,
    ..Default::default()
})?;

// Following feed: pure chronological
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    for_user: Some("user_123"),
    profile: "following",
    filters: vec![
        Filter::relationship("follows"),
        Filter::unseen(),
    ],
    limit: 50,
    cursor: Some(cursor_from_last_batch),
    ..Default::default()
})?;

// Related content / Up Next, anchored to a specific item
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    for_user: Some("user_123"),
    profile: "related",
    similar_to: Some("item_abc"), // anchor item
    filters: vec![
        Filter::unseen(),
    ],
    diversity: Some(DiversitySpec {
        max_per_creator: Some(1), // at most one item per creator
        ..Default::default()
    }),
    limit: 10,
    ..Default::default()
})?;

// Browse category with explicit sort mode
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    profile: "browse",
    sort: Some(Sort::TopWeek), // override profile's default sort
    filters: vec![
        Filter::eq("category", "jazz"),
        Filter::preset("duration", "short"),
        Filter::eq("has_subtitles", true),
    ],
    limit: 20,
    ..Default::default()
})?;

// Hidden gems: high quality, low reach
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    profile: "hidden_gems",
    filters: vec![
        Filter::created_within(Duration::days(30)),
    ],
    limit: 20,
    ..Default::default()
})?;

// User's saved items (personal library)
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    for_user: Some("user_123"),
    sort: Some(Sort::DateSaved),
    filters: vec![
        Filter::user_state("saved"),
    ],
    limit: 20,
    ..Default::default()
})?;

// Creator discovery: find creators like another creator
let results = db.retrieve(Retrieve {
    entity: EntityKind::Creator,
    for_user: Some("user_123"),
    similar_to: Some("creator_xyz"),
    sort: Some(Sort::CreatorEngagementRate),
    limit: 10,
    ..Default::default()
})?;

// Live content: what's live right now that this user cares about
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    for_user: Some("user_123"),
    profile: "live",
    filters: vec![
        Filter::eq("status", "live"),
    ],
    sort: Some(Sort::LiveViewerCount),
    limit: 20,
    ..Default::default()
})?;

// Notification prioritization
let results = db.retrieve(Retrieve {
    entity: EntityKind::Item,
    for_user: Some("user_123"),
    profile: "notification",
    filters: vec![
        Filter::since(last_seen_timestamp),
    ],
    limit: 20,
    ..Default::default()
})?;

SEARCH — Text + Semantic Retrieval

Search combines full-text BM25 relevance with semantic similarity. Text relevance is the floor — an irrelevant result never surfaces just because the user likes the creator.

// Basic keyword search, personalized for this user
let results = db.search(Search {
    query: "rust tutorial beginner",
    vector: Some(&query_embedding), // embed the query externally, pass it in
    for_user: Some("user_123"),
    profile: "search",
    filters: vec![],
    diversity: Some(DiversitySpec {
        max_per_creator: Some(2),
        ..Default::default()
    }),
    limit: 20,
    ..Default::default()
})?;
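The search profile defined earlier names reciprocal rank fusion (`Fusion::Rrf { k: 60 }`) with a text weight and a vector weight. As a minimal sketch of that general technique, the function below fuses a BM25 ranking and an ANN ranking; how tidalDB applies the two weights internally is an assumption here, and `rrf_fuse` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Weighted reciprocal rank fusion over two ranked ID lists (best first).
/// Each list contributes weight / (k + rank), with rank starting at 1.
pub fn rrf_fuse(
    text: &[&str],
    vector: &[&str],
    text_weight: f64,
    vector_weight: f64,
    k: f64,
) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (list, weight) in [(text, text_weight), (vector, vector_weight)] {
        for (i, id) in list.iter().enumerate() {
            *scores.entry((*id).to_string()).or_insert(0.0) += weight / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by ID so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because only ranks are used, the BM25 scores and vector distances never need to be put on a common scale, which is the usual reason to pick rank fusion over score blending.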

Query syntax — users can type these directly. The database parses them.

| Syntax | Meaning |
| --- | --- |
| jazz piano tutorial | Match any of these terms (OR), rank by relevance |
| "jazz piano" | Exact phrase match |
| jazz AND piano NOT beginner | Boolean operators |
| jazz -beginner | Exclude term |
| jazz pian* | Wildcard prefix |
| title:jazz | Field-scoped search |
| tag:tutorial | Search within specific field |
| creator:jazzacademy | Search by creator handle |
| #jazz | Hashtag match |

// Exact phrase with date filter
let results = db.search(Search {
    query: "\"machine learning\" fundamentals",
    filters: vec![
        Filter::created_within(Duration::days(30)),
        Filter::eq("format", "video"),
        Filter::preset("duration", "long"),
    ],
    limit: 20,
    ..Default::default()
})?;

// Semantic-only search (no text query, just a vector)
let results = db.search(Search {
    query: "",
    vector: Some(&image_embedding), // visual similarity search
    for_user: Some("user_123"),
    profile: "search",
    limit: 20,
    ..Default::default()
})?;

// People/creator search
let results = db.search(Search {
    query: "jazz piano",
    entity: EntityKind::Creator,
    filters: vec![
        Filter::eq("verified", true),
        Filter::min("follower_count", 1000),
    ],
    sort: Some(Sort::CreatorEngagementRate),
    limit: 10,
    ..Default::default()
})?;

Query Composition — SEARCH within Scoped Results

SEARCH can be composed with RETRIEVE scopes. This enables searching within trending, within a cohort, or within any candidate set.

// Search within globally trending items
let results = db.search(Search {
    query: "jazz piano",
    vector: Some(&query_embedding),
    within: Some(WithinScope::Trending {
        window: Duration::hours(24),
    }),
    profile: "search",
    limit: 20,
    ..Default::default()
})?;

// Search within cohort-scoped trending: the full three-layer query
let results = db.search(Search {
    query: "jazz piano",
    vector: Some(&query_embedding),
    within: Some(WithinScope::CohortTrending {
        cohort: "us_young_music",
        window: Duration::hours(24),
    }),
    profile: "search",
    limit: 20,
    ..Default::default()
})?;

// Search within a user's following feed
let results = db.search(Search {
    query: "jazz piano",
    for_user: Some("user_123"),
    within: Some(WithinScope::Following),
    profile: "search",
    limit: 20,
    ..Default::default()
})?;

WithinScope:

| Scope | Candidate Set |
| --- | --- |
| Trending { window } | Items with high global velocity in window |
| CohortTrending { cohort, window } | Items with high velocity among cohort members |
| Following | Items from followed creators (requires for_user) |
| Category { name } | Items in a category |
| Collection { id } | Items in a collection |

SUGGEST — Autocomplete and Suggestions

// Autocomplete on partial query
let suggestions = db.suggest(Suggest {
    prefix: "jazz pia",
    for_user: Some("user_123"),
    limit: 5,
})?;
// Returns: ["jazz piano", "jazz piano tutorial", "jazz piano chords", ...]

// Trending searches (empty prefix)
let trending = db.suggest(Suggest {
    prefix: "",
    for_user: None,
    limit: 10,
})?;
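Conceptually, prefix suggestion is "match the prefix, then rank by popularity". The sketch below shows that idea over an in-memory query log; it ignores personalization, and the `(query, count)` history format and the `suggest` function shown here are assumptions rather than tidalDB internals.

```rust
/// Return up to `limit` completions for `prefix`, most popular first.
/// `history` holds (query, times searched). An empty prefix matches everything,
/// which yields a "trending searches" style list.
pub fn suggest(history: &[(&str, u32)], prefix: &str, limit: usize) -> Vec<String> {
    let mut hits: Vec<&(&str, u32)> = history
        .iter()
        .filter(|(query, _)| query.starts_with(prefix))
        .collect();
    // Most searched first; ties are broken alphabetically for a stable order.
    hits.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    hits.into_iter()
        .take(limit)
        .map(|(query, _)| (*query).to_string())
        .collect()
}
```

A production implementation would more likely use a trie or a sorted term dictionary than a linear scan; the scan keeps the example short.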

Filters

All filters are composable. Any combination of filters produces a valid, efficiently executed query. Multi-select within a dimension uses OR logic. Cross-dimension uses AND logic.
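The OR-within, AND-across rule can be shown in a few lines. This is a minimal sketch over string-valued fields only; the `passes` function and the map-based representation are hypothetical and stand in for whatever structures the engine actually uses.

```rust
use std::collections::HashMap;

/// Check an item against the selected filter values.
/// `selected` maps a dimension (field) to the values chosen for it:
/// values within one dimension are ORed, dimensions are ANDed together.
pub fn passes(item: &HashMap<&str, &str>, selected: &HashMap<&str, Vec<&str>>) -> bool {
    selected.iter().all(|(field, allowed)| {
        item.get(*field)
            .map_or(false, |value| allowed.contains(value))
    })
}
```

Selecting jazz and blues under category together with video under format therefore means "(jazz OR blues) AND video", and an empty selection lets every item through.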

Content Attribute Filters

Filter::eq("category", "jazz")                // exact match
Filter::any("category", &["jazz", "blues"])    // OR within dimension
Filter::eq("format", "video")
Filter::eq("language", "en")
Filter::eq("content_rating", "PG")
Filter::eq("status", "published")
Filter::eq("availability", "free")
Filter::eq("has_subtitles", true)
Filter::eq("downloadable", true)
Filter::eq("original_only", true)              // exclude crossposts

Duration Filters

Filter::preset("duration", "short")            // under 4 minutes
Filter::preset("duration", "medium")           // 4-20 minutes
Filter::preset("duration", "long")             // over 20 minutes
Filter::range("duration", 5 * 60..15 * 60)     // custom range (seconds)

Date / Time Filters

Filter::created_within(Duration::days(7))
Filter::created_preset("today")               // today, week, month, year
Filter::created_after("2025-01-01T00:00:00Z")
Filter::created_before("2025-06-01T00:00:00Z")
Filter::since(timestamp)                      // for notifications

Creator Filters

Filter::eq("creator", "creator_xyz")
Filter::exclude_creator("creator_bad")
Filter::eq("creator_verified", true)
Filter::creator_followed_by_user()             // requires for_user
Filter::creator_new_to_user()                  // never seen this creator
Filter::min("creator_min_followers", 1000)
Filter::max("creator_max_followers", 50000)

Engagement Threshold Filters

Filter::min("min_views", 10000)
Filter::max("max_views", 5000)                 // for hidden gems
Filter::min("min_likes", 100)
Filter::min("min_like_ratio", 0.85)
Filter::min("min_completion_rate", 0.5)
Filter::min("min_comments", 50)

User State Filters

Filter::unseen()                               // not yet viewed by this user
Filter::user_state("seen")                     // already viewed
Filter::user_state("in_progress")              // partially watched
Filter::user_state("saved")                    // bookmarked / watch later
Filter::user_state("liked")                    // user has liked
Filter::user_state("downloaded")               // available offline
Filter::in_collection("playlist_abc")          // in specific collection
Filter::not_blocked()                          // exclude blocked creators

Geographic Filters

Filter::eq("content_region", "US")
Filter::eq("trending_in_region", "US")
Filter::near_location(lat, lng, radius_km)

Cohort Filters

Filter::cohort("us_young_music")              // pre-defined cohort
Filter::cohort_predicate(Predicate::and(vec![ // ad-hoc cohort
    Predicate::eq("locale", "en-US"),
    Predicate::eq("age_range", "18-24"),
]))

Social Graph Filters

Filter::relationship("follows")                // from followed creators only
Filter::social_graph("user_123", Depth::Two)   // engaged by user's follows

See USE_CASES.md Appendix A for the complete filter reference.


Sort Modes

Sort modes are built-in. The application names a mode. The database executes it. No application-side sorting logic.

Sort can be specified as an override on any RETRIEVE query using the sort field, or it can be embedded in a ranking profile.

Sort::Relevance           // text + semantic match (search only)
Sort::Personalized        // user preference match
Sort::New                 // created_at DESC
Sort::Old                 // created_at ASC
Sort::Hot                 // score / (age + 2)^gravity (Hacker News style)
Sort::Trending            // pure engagement velocity
Sort::Rising              // velocity relative to baseline, age-boosted
Sort::Controversial       // highest when positive and negative signals are both large
Sort::HiddenGems          // high quality, low reach
Sort::TopAllTime          // cumulative quality, no decay
Sort::TopHour
Sort::TopToday
Sort::TopWeek
Sort::TopMonth
Sort::TopYear
Sort::MostViewed
Sort::MostLiked
Sort::MostCommented
Sort::MostShared
Sort::Shortest            // duration ASC
Sort::Longest             // duration DESC
Sort::AlphabeticalAsc     // title A-Z
Sort::AlphabeticalDesc    // title Z-A
Sort::Shuffle             // random, quality-weighted
Sort::LiveViewerCount     // current viewer count DESC
Sort::DateSaved           // when user bookmarked DESC
Sort::CreatorEngagementRate

See USE_CASES.md Appendix B for the complete sort mode reference.
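The Hot formula listed above, score / (age + 2)^gravity, is easy to try out directly. In the sketch below the age unit (hours) and the gravity value used in the usage note are assumptions, since this document does not state them, and both function names are hypothetical.

```rust
/// Gravity-style hot score: score / (age_hours + 2)^gravity.
/// A larger gravity makes older items sink faster.
pub fn hot_score(score: f64, age_hours: f64, gravity: f64) -> f64 {
    score / (age_hours + 2.0).powf(gravity)
}

/// Sort (id, score, age_hours) triples by hot score, hottest first.
pub fn rank_by_hot(items: &mut [(&str, f64, f64)], gravity: f64) {
    items.sort_by(|a, b| {
        hot_score(b.1, b.2, gravity).total_cmp(&hot_score(a.1, a.2, gravity))
    });
}
```

With a gravity of 1.8, an item with 40 points after one hour outranks an item with 500 points after three days, which is the intended effect: recency can beat raw volume.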


Diversity Constraints

Diversity is a post-scoring pass. After candidates are scored, diversity constraints reorder the result set to enforce variety — without reducing the result count.

DiversitySpec {
    // No more than N items from the same creator in the result set.
    max_per_creator: Some(2),

    // Ensure a mix of content formats (video, short, article, etc.)
    format_mix: true,

    // Topic diversity — 0.0 (no enforcement) to 1.0 (maximize diversity).
    // Uses maximal marginal relevance (MMR).
    topic_diversity: Some(0.7),
}

Diversity is specified per query or per ranking profile. Query-level diversity overrides the profile default.
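One way to enforce `max_per_creator` by reordering rather than dropping is to demote overflow items to the back of the list. The sketch below illustrates that idea under the assumption that the input is already sorted by score; it is not a description of tidalDB's actual diversity pass, and `cap_per_creator` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Reorder a score-sorted list so that items beyond `max_per_creator` for a
/// creator are pushed to the back instead of dropped.
/// Input and output items are (item_id, creator_id) pairs.
pub fn cap_per_creator<'a>(
    ranked: &[(&'a str, &'a str)],
    max_per_creator: usize,
) -> Vec<(&'a str, &'a str)> {
    let mut seen: HashMap<&str, usize> = HashMap::new();
    let mut kept = Vec::with_capacity(ranked.len());
    let mut overflow = Vec::new();
    for &(item, creator) in ranked {
        let count = seen.entry(creator).or_insert(0);
        if *count < max_per_creator {
            *count += 1;
            kept.push((item, creator));
        } else {
            overflow.push((item, creator));
        }
    }
    // Demoted items keep their relative order and the total count is unchanged.
    kept.extend(overflow);
    kept
}
```

When the candidate pool is larger than `limit`, truncating the reordered list afterwards leaves at most `max_per_creator` items per creator on the page.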


Pagination

Cursor-based pagination for stable result sets across pages.

// First page
let page1 = db.retrieve(Retrieve {
    profile: "for_you",
    for_user: Some("user_123"),
    limit: 50,
    cursor: None,
    ..Default::default()
})?;

// Next page — pass the cursor from the previous response
let page2 = db.retrieve(Retrieve {
    profile: "for_you",
    for_user: Some("user_123"),
    limit: 50,
    cursor: page1.next_cursor,
    ..Default::default()
})?;

Alternatively, use exclude_ids to exclude previously returned items:

let page2 = db.retrieve(Retrieve {
    profile: "for_you",
    for_user: Some("user_123"),
    exclude_ids: page1.results.iter().map(|r| r.id.clone()).collect(),
    limit: 50,
    ..Default::default()
})?;
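For intuition about why a cursor gives stable pages, here is a minimal keyset-style sketch: the cursor records the score and ID of the last item returned, and the next page starts strictly after that position. The `Cursor` fields and the `page` function are assumptions made for illustration; the real cursor is opaque to the application.

```rust
/// Position of the last item already returned.
#[derive(Debug, Clone, PartialEq)]
pub struct Cursor {
    pub score: f64,
    pub id: String,
}

/// Return one page from `ranked` (sorted by score descending, then ID ascending)
/// plus the cursor for the following page, if any items remain.
pub fn page(
    ranked: &[(String, f64)],
    cursor: Option<&Cursor>,
    limit: usize,
) -> (Vec<(String, f64)>, Option<Cursor>) {
    let start = match cursor {
        None => 0,
        // First position strictly after the cursor in (score desc, id asc) order.
        Some(c) => ranked
            .iter()
            .position(|(id, s)| *s < c.score || (*s == c.score && *id > c.id))
            .unwrap_or(ranked.len()),
    };
    let end = (start + limit).min(ranked.len());
    let items = ranked[start..end].to_vec();
    let next = if end < ranked.len() {
        items.last().map(|(id, s)| Cursor { score: *s, id: id.clone() })
    } else {
        None
    };
    (items, next)
}
```

Unlike an offset, this position does not shift when new items are inserted ahead of it, so a page boundary never repeats or skips an item that was already ranked.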

Response Format

All queries return a Results struct.

pub struct Results {
    /// Ranked items with scores.
    pub results: Vec<RankedItem>,
    /// Cursor for fetching the next page.
    pub next_cursor: Option<Cursor>,
    /// Total candidate count before diversity/limit (useful for faceted UI).
    pub total_candidates: u64,
}

pub struct RankedItem {
    /// Entity ID.
    pub id: String,
    /// Final composite score after ranking profile, boosts, penalties, diversity.
    pub score: f64,
    /// Signal snapshot at query time — lets the application display counts.
    pub signals: SignalSnapshot,
}

pub struct SignalSnapshot {
    /// Raw signal values at query time.
    /// e.g., { "view": { "24h": 12450, "7d": 89200, "all_time": 1230000 } }
    pub values: HashMap<String, HashMap<String, f64>>,
}

The application uses results to render the UI. It uses signals to display engagement counts (views, likes, etc.). It never re-ranks — the order from tidalDB is the final order.
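Reading an engagement count out of the snapshot is a nested map lookup. The sketch below re-declares `SignalSnapshot` locally so that it stands alone; `signal_value` and `display_count` are hypothetical application-side helpers, and the "24h" window key follows the example in the struct comment above.

```rust
use std::collections::HashMap;

/// Local copy of the snapshot shape shown above, so the example is self-contained.
pub struct SignalSnapshot {
    pub values: HashMap<String, HashMap<String, f64>>,
}

/// Look up one windowed value, e.g. ("view", "24h"); returns 0.0 when absent.
pub fn signal_value(snapshot: &SignalSnapshot, signal: &str, window: &str) -> f64 {
    snapshot
        .values
        .get(signal)
        .and_then(|windows| windows.get(window))
        .copied()
        .unwrap_or(0.0)
}

/// Compact display form for counts, e.g. 1500 becomes "1.5K".
pub fn display_count(n: f64) -> String {
    if n >= 1_000_000.0 {
        format!("{:.1}M", n / 1_000_000.0)
    } else if n >= 1_000.0 {
        format!("{:.1}K", n / 1_000.0)
    } else {
        format!("{}", n as u64)
    }
}
```

Falling back to 0.0 for a missing signal or window keeps rendering code free of error handling for items that have not received that signal yet.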


Lifecycle and Operations

Shutdown

// Graceful shutdown — flushes WAL, finalizes materializer, persists ANN index.
db.shutdown()?;

Profile Management

// List all profiles
let profiles = db.list_profiles()?;

// Get a specific profile (with version)
let profile = db.get_profile("for_you", None)?;      // latest
let profile = db.get_profile("for_you", Some(1))?;   // specific version

// A/B test by sending the same query to a control profile and an experimental profile
let control = db.retrieve(Retrieve {
    profile: "for_you",       // control: latest version of the production profile
    ..query.clone()
})?;

let variant = db.retrieve(Retrieve {
    profile: "for_you_v2",    // experimental profile
    ..query.clone()
})?;
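An A/B test also needs a stable rule for which users see which profile. That assignment lives in the application, not in tidalDB; the sketch below is one hypothetical way to do it, hashing the user ID with FNV-1a so that a user always lands in the same bucket.

```rust
/// FNV-1a 64-bit hash: stable across runs and platforms, unlike a randomly
/// seeded hasher.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Pick a profile name for a user: roughly `variant_percent` percent of users
/// get `variant`, everyone else gets `control`.
pub fn assign_profile<'a>(
    user_id: &str,
    control: &'a str,
    variant: &'a str,
    variant_percent: u64,
) -> &'a str {
    if fnv1a(user_id.as_bytes()) % 100 < variant_percent {
        variant
    } else {
        control
    }
}
```

The returned name can be passed as the `profile` field of a query, so the same request shape serves both arms of the experiment.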

Schema Inspection

// List defined signals
let signals = db.list_signals()?;

// List entity types with their fields
let entities = db.list_entities()?;

Saved Searches

// Save a search as a persistent feed — user gets new results over time.
db.save_search(SavedSearch {
    user: "user_123",
    name: "Jazz tutorials",
    query: "jazz tutorial",
    filters: vec![
        Filter::eq("format", "video"),
        Filter::eq("language", "en"),
    ],
})?;

// Query a saved search for new results since last check.
let results = db.retrieve_saved_search("user_123", "Jazz tutorials", since)?;

Collections

// Create a user collection (playlist, board, etc.)
db.create_collection(Collection {
    id: "playlist_abc",
    owner: "user_123",
    name: "Jazz Favorites",
    visibility: Visibility::Private, // Private, Shared, Public
})?;

// Add an item to a collection
db.add_to_collection("playlist_abc", "item_abc")?;

// Items can belong to multiple collections.
// Public collections are themselves rankable — they appear in browse and search.

Summary

| Operation | What the Application Does | What tidalDB Does |
| --- | --- | --- |
| Ingest content | Compute embedding, call write_item | Index text, insert vector, initialize signals, apply cold start |
| Record engagement | Call signal with event type | Update signal ledger, user preferences, relationship weights, atomically |
| Serve a feed | Call retrieve with a profile name | Candidate retrieval, scoring, diversity enforcement, pagination |
| Search | Embed query, call search | BM25 + ANN + fusion + personalization + diversity |
| Define ranking | Declare a ProfileDef | The database executes the entire ranking pipeline |
| Handle cold start | Nothing | Exploration budget, population priors (automatic) |
| Handle negative signals | Call signal with skip/hide/block | Permanent exclusion, preference decay, relationship zeroing |
| Scope trending by cohort | Specify cohort name or predicate in query | Cohort-scoped signal aggregation, same ranking profile |
| Search within scope | Specify within on search query | Intersects text/vector retrieval with scoped candidate set |

One process. One query interface. One operational model.