Personalized content ranking database

tidalDB

An embeddable Rust database for the personalized content ranking problem.

Pre-release. API is stabilizing. Not yet recommended for production.


Every content platform eventually builds the same distributed system from scratch: Elasticsearch for retrieval, Redis for hot signals, Kafka for event ingestion, a feature store for user profiles, a vector database for semantic search, and a ranking service that stitches them together. The seams between those systems are where correctness dies — stale signals, inconsistent ranking, cache invalidation bugs, ETL lag.

The root cause: existing databases treat ranking as an afterthought. They have no native concept of signals that evolve over time, no understanding of user context, no diversity as a query constraint.

Ranking is not a feature. It is a primitive.

tidalDB is a single-node, embeddable Rust library built for one question: given a user and a context, what content should they see, and in what order? No server, no network protocol, no client SDK. Link it into your process.


What it looks like

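// Runs inside a function that returns a Result. `user_id`, `creator_id`, and `your_model`
// are placeholders supplied by your application; `Search` also needs to be in scope (import omitted here).
use std::collections::HashMap;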
use std::time::Duration;
use tidaldb::{TidalDb, query::retrieve::Retrieve, schema::{DecaySpec, EntityId, EntityKind, SchemaBuilder, Timestamp, Window}};

// Declare signals with native decay — no application formulas.
let mut schema = SchemaBuilder::new();
let _ = schema.signal("view", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(7 * 24 * 3600),
}).windows(&[Window::OneHour, Window::TwentyFourHours, Window::AllTime]).velocity(true).add();
let _ = schema.signal("like", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(30 * 24 * 3600),
}).windows(&[Window::AllTime]).velocity(false).add();
let schema = schema.build()?;

// Open — ephemeral for tests, persistent for production.
let db = TidalDb::builder().ephemeral().with_schema(schema).open()?;

// Ingest content with metadata.
let mut meta = HashMap::new();
meta.insert("title".to_string(), "Introduction to Jazz Piano".to_string());
meta.insert("category".to_string(), "music".to_string());
db.write_item_with_metadata(EntityId::new(1), &meta)?;

// Write an embedding (you generate it, tidalDB indexes and ranks over it).
db.write_item_embedding(EntityId::new(1), &your_model.embed("Introduction to Jazz Piano"))?;

// Record engagement — the feedback loop closes here, no ETL required.
db.signal("view", EntityId::new(1), 1.0, Timestamp::now())?;
db.signal_with_context("like", EntityId::new(1), 1.0, Timestamp::now(), Some(user_id), Some(creator_id))?;

// Retrieve a ranked feed. Name the profile. tidalDB executes the pipeline.
let results = db.retrieve(&Retrieve::builder().for_user(user_id).profile("for_you").limit(50).build()?)?;

// Search: BM25 + semantic similarity fused via RRF.
let results = db.search(&Search::builder().query("jazz piano tutorial").for_user(user_id).limit(20).build()?)?;

db.close()?;
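
The half_life values declared in the schema above describe exponential decay: an event's contribution halves every half-life. The sketch below illustrates that arithmetic only. It is not tidalDB's internal implementation, and the function names decayed_weight and decayed_score are hypothetical.

```rust
use std::time::Duration;

/// Exponentially decayed weight of a single signal event.
/// `age` is how long ago the event happened; `half_life` is the time it
/// takes for the weight to halve. Illustrative only, not tidalDB's code.
fn decayed_weight(weight: f64, age: Duration, half_life: Duration) -> f64 {
    let half_lives = age.as_secs_f64() / half_life.as_secs_f64();
    weight * 0.5_f64.powf(half_lives)
}

/// Sum of decayed weights over a list of (weight, age) events.
fn decayed_score(events: &[(f64, Duration)], half_life: Duration) -> f64 {
    events
        .iter()
        .map(|&(weight, age)| decayed_weight(weight, age, half_life))
        .sum()
}

fn main() {
    let week = Duration::from_secs(7 * 24 * 3600);
    // Three views: one just now, one a week ago, one two weeks ago.
    let events = [(1.0, Duration::ZERO), (1.0, week), (1.0, week * 2)];
    // With a 7-day half-life the contributions are 1 + 0.5 + 0.25.
    println!("{:.3}", decayed_score(&events, week)); // prints 1.750
}
```

Because the score depends only on each event's age at read time, nothing needs to be rewritten as time passes, which is the property the schema comment refers to.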

What it replaces

| System | tidalDB equivalent |
|---|---|
| Elasticsearch | Tantivy BM25 text index (derived, crash-recoverable) |
| Redis | Lock-free in-memory signal ledger — decay scores, windowed counters |
| Kafka | Write-ahead log — durable, ordered, replayable |
| Feature store | Signal aggregates + user preference vectors (updated at write time) |
| Vector DB | USearch HNSW — embedded, f16 quantized, predicate-filtered ANN |
| Ranking service | 25 named profiles, scored at query time, swappable by name |

Key capabilities

  • Signals with native decay — declare view with a 7-day half-life; the database applies it at query time. No trending_score_7d field to maintain.
  • 25 built-in ranking profiles: trending, hot, for_you, following, related, hidden_gems, top_week, shuffle, controversial, and more. Name the profile; the database executes the full pipeline.
  • Hybrid search — BM25 full-text + ANN semantic similarity, fused via Reciprocal Rank Fusion, personalized by user preference vector.
  • Composable filters — filter by category, format, duration, language, engagement threshold, location, collection membership, and more — any combination, all composable.
  • Diversity as a query constraint: a limit such as max_per_creator: 2 belongs in the query, not your API layer.
  • Feedback loop in the write path — a signal write atomically updates the item's ledger, the user's preference vector, and relationship weights. The next ranking query — 100ms later — reflects it.
  • Cold start handled — new content gets an exploration budget; new users get sensible defaults. No application logic required.
  • Cohort-scoped trending — "trending among US users aged 18-24 who engage with jazz" is one query, not a pipeline.
  • Embeddable first — runs in your process. Arc<TidalDb> is Send + Sync. No operational overhead.
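
The hybrid search bullet above names Reciprocal Rank Fusion (RRF). As a rough sketch of the general technique, each item scores the sum of 1 / (k + rank) across the ranked lists it appears in. The function name rrf_fuse and the constant k = 60 (a commonly used default) are assumptions for illustration; the README does not state which constant tidalDB uses or how it applies personalization.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked lists of item ids.
/// Each item scores sum(1 / (k + rank)) using 1-based ranks.
/// Illustrative only; this is not tidalDB's search code.
fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = idx as f64 + 1.0;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    let bm25 = vec![1, 2, 3]; // ids ranked by a text index
    let ann = vec![3, 1, 4]; // ids ranked by vector similarity
    for (id, score) in rrf_fuse(&[bm25, ann], 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

Items that rank well in both lists (here ids 1 and 3) rise above items that appear in only one, without the two scoring scales ever needing to be normalized against each other.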

Getting started

Pick the path that matches how you plan to use tidalDB today. Every option below is self-contained and ships in this repo.

1. Embed tidalDB inside your Rust service (library mode)

Setup

  1. Add the git dependency:
    [dependencies]
    tidaldb = { git = "https://github.com/your-org/tidalDB", rev = "..." }
    
  2. Define your schema before opening the database (decay, windows, text fields, embeddings). The snippet in Quickstart, Step 2 is a ready-to-copy template.
  3. Choose storage mode when building:
    let db = tidaldb::TidalDb::builder()
        .with_schema(schema)
        .ephemeral()               // in-memory for tests
        // .with_data_dir("/var/lib/tidaldb") // persistent deployment
        .open()?;
    
  4. Run the end-to-end sample:
    cargo run --manifest-path tidal/Cargo.toml --example quickstart
    

Usage

  • Call db.signal(...), db.signal_with_context(...), and db.retrieve(...) / db.search(...) from the same process; no network stack required.
  • Wrap the instance in Arc<TidalDb> to share it across threads or tasks.
  • Persisted deployments can be inspected with the CLI tool: cargo run -p tidalctl -- status --path /var/lib/tidaldb.
  • Full walkthrough: QUICKSTART.md and API.md.
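
The Arc<TidalDb> pattern mentioned above is the standard Rust way to share one handle across threads. The sketch below uses a hypothetical stand-in type, FakeDb, that only counts writes, so it runs without the tidaldb crate; the real handle's methods and signatures are those shown earlier in this README, not the ones here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a database handle that is Send + Sync.
/// It only counts writes; it is not the tidalDB API.
struct FakeDb {
    writes: AtomicU64,
}

impl FakeDb {
    /// Record one signal. Takes `&self`, so no lock is needed by callers.
    fn signal(&self) {
        self.writes.fetch_add(1, Ordering::Relaxed);
    }
}

/// Spawn `workers` threads that each record `per_worker` signals through
/// a shared handle, then return the total number of writes observed.
fn run_workers(workers: usize, per_worker: u64) -> u64 {
    let db = Arc::new(FakeDb { writes: AtomicU64::new(0) });
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own reference-counted pointer to the same handle.
            let db = Arc::clone(&db);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    db.signal();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    db.writes.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", run_workers(4, 1000)); // prints 4000
}
```

The same shape applies to async tasks: clone the Arc once per task and move the clone into it.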

2. Run the standalone HTTP server (tidal-server)

Why: you want a ready-to-run HTTP facade without writing Axum/Actix glue.

cargo run -p tidal-server -- \
  standalone \
  --listen 127.0.0.1:9400 \
  --schema tidal-server/config/default-schema.yaml

Options:

  • --data-dir /var/lib/tidaldb switches to persistent storage.
  • Provide your own schema file (YAML) to match your signal mix.

Usage:

# register metadata + embedding
curl -X POST http://127.0.0.1:9400/items \
  -H 'Content-Type: application/json' \
  -d '{ "entity_id": 1, "metadata": { "title": "Jazz Piano", "category": "music" } }'
curl -X POST http://127.0.0.1:9400/embeddings \
  -H 'Content-Type: application/json' \
  -d '{ "entity_id": 1, "values": [0.1, 0.2, 0.3] }'

# write engagement (supports user/creator context)
curl -X POST http://127.0.0.1:9400/signals \
  -H 'Content-Type: application/json' \
  -d '{ "entity_id": 1, "signal": "view", "weight": 1.0, "user_id": 42 }'

# query
curl "http://127.0.0.1:9400/feed?user_id=42&profile=for_you&limit=20"
curl "http://127.0.0.1:9400/search?query=jazz%20piano&user_id=42&limit=5"
curl http://127.0.0.1:9400/health

The default schema lives at tidal-server/config/default-schema.yaml. Edit it (or provide your own path) to align with your application's signals, text fields, and embedding slots.

3. Wrap it in an HTTP service you control

Expose tidalDB through your favorite web framework; the repo ships runnable templates.

  • Axum sample (tidal/examples/axum_embedding.rs)

    cargo run --example axum_embedding --manifest-path tidal/Cargo.toml
    

    Usage:

    curl -X POST http://127.0.0.1:3000/signal \
         -H 'Content-Type: application/json' \
         -d '{ "entity_id": 1, "signal": "view", "weight": 1.0 }'
    curl "http://127.0.0.1:3000/feed?user_id=42"
    curl http://127.0.0.1:3000/health
    

    The example handles schema setup, wraps Arc<TidalDb> in Axum State, and maps TidalError to HTTP responses.

  • Actix sample (tidal/examples/actix_embedding.rs)

    cargo run --example actix_embedding --manifest-path tidal/Cargo.toml
    # curl http://127.0.0.1:3001/health
    

    Demonstrates sharing Arc<TidalDb> through web::Data and using Actix's shutdown hooks.

Use either sample as a starting point for microservices that prefer a client/server boundary.

4. Run the Forage demo server (Axum + UI)

Want to see tidalDB powering a live personalization surface? Forage is a thin Axum server + feed UI that talks to a tidalDB instance embedded in-process.

cargo run -p forage-server --manifest-path applications/forage/server/Cargo.toml
open http://localhost:4242

Flags:

  • --ephemeral to keep everything in-memory.
  • --data-dir ~/.forage/data to point at a custom persistent directory.

Usage:

curl -X POST http://localhost:4242/signal \
     -H "Content-Type: application/json" \
     -d '{ "user_id": 1, "item_id": 42, "signal_type": "view" }'
curl "http://localhost:4242/feed?user=1&limit=7"

The UI shows seeded users, exploration labels, and real-time adaptation; see applications/forage/readme.md for the full loop.

5. Run the cluster server + Docker image

Need a single endpoint that fronts the built-in simulated cluster? Use tidal-server in cluster mode. It spins up the multi-region fabric, ships WAL batches between regions, and exposes /signals, /feed, /search plus cluster-management routes.

cargo run -p tidal-server -- \
  cluster \
  --listen 0.0.0.0:9500 \
  --schema tidal-server/config/default-schema.yaml \
  --topology tidal-server/config/default-cluster.yaml

Key endpoints:

curl http://127.0.0.1:9500/health
curl -X POST http://127.0.0.1:9500/signals -H 'Content-Type: application/json' -d '{ "entity_id": 1, "signal": "view", "weight": 1.0 }'
curl "http://127.0.0.1:9500/feed?profile=trending&region=eu-west"
curl http://127.0.0.1:9500/cluster/status
curl -X POST http://127.0.0.1:9500/cluster/promote -H 'Content-Type: application/json' -d '{ "region": "eu-west" }'

Cluster mode currently replicates global signals (no user_id / creator_id contexts) so that followers can stay in sync with the leader's WAL stream. See docs/runbooks/cluster.md for operational steps, failure drills, and API references.
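
To make the idea of a follower staying in sync with a leader's WAL stream concrete, here is a minimal sketch of sequence-numbered log shipping. Everything in it (WalBatch, Follower, Apply, catch_up) is hypothetical and chosen for illustration; it assumes batches carry consecutive sequence numbers and is not a description of how tidal-server actually replicates.

```rust
/// A batch of log entries tagged with a consecutive sequence number.
#[derive(Debug)]
struct WalBatch {
    seq: u64,
    entries: Vec<String>,
}

/// Outcome of offering one batch to a follower.
#[derive(Debug, PartialEq)]
enum Apply {
    Applied,
    Duplicate,
    Gap { expected: u64 },
}

#[derive(Default)]
struct Follower {
    last_applied: u64,
    log: Vec<String>,
}

impl Follower {
    /// Apply a batch only if it is the next one in sequence.
    fn apply(&mut self, batch: &WalBatch) -> Apply {
        let expected = self.last_applied + 1;
        if batch.seq < expected {
            Apply::Duplicate
        } else if batch.seq > expected {
            Apply::Gap { expected }
        } else {
            self.log.extend(batch.entries.iter().cloned());
            self.last_applied = batch.seq;
            Apply::Applied
        }
    }

    /// Replay the leader's log in order; already-applied batches are skipped.
    fn catch_up(&mut self, leader_log: &[WalBatch]) {
        for batch in leader_log {
            let _ = self.apply(batch);
        }
    }
}

fn main() {
    let leader: Vec<WalBatch> = (1..=3)
        .map(|seq| WalBatch { seq, entries: vec![format!("signal-{seq}")] })
        .collect();
    let mut follower = Follower::default();
    follower.apply(&leader[0]);
    // Batch 2 is lost in a partition, so batch 3 is rejected as a gap.
    println!("{:?}", follower.apply(&leader[2]));
    // After the partition heals, replaying the log brings the follower up to date.
    follower.catch_up(&leader);
    println!("{}", follower.last_applied);
}
```

Rejecting out-of-order batches keeps the follower's log a strict prefix of the leader's, so a replay after a partition converges without special-case merging.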

Prefer containers? Build the provided image and run it anywhere:

docker build -f docker/cluster/Dockerfile -t tidal-cluster .
docker run --rm -p 9500:9500 tidal-cluster

Mount your own schema/topology files with -v if you want different regions or signal definitions.

6. Simulate a multi-region cluster in tests

The raw SimulatedCluster harness (no HTTP) remains available for property tests and fuzzing.

cargo test --test m8_uat
cargo test --test m8_uat uat_step3 -- --nocapture   # run a single scenario

Tweak tidal/tests/m8_uat.rs to script specific replication, failover, and migration scenarios inside your own test suites.

MSRV: Rust 1.91


Documentation

| Document | Contents |
|---|---|
| QUICKSTART.md | Step-by-step guide: schema, ingest, signals, ranking, search |
| API.md | Full API reference with code examples |
| VISION.md | Problem statement and design thesis |
| ARCHITECTURE.md | Storage, signal system, vector index, query pipeline |
| USE_CASES.md | 14 content discovery surfaces, filter and sort references |

Status

Milestones completed:

  • Storage engine, WAL, entity store, signal ledger
  • RETRIEVE query: candidate retrieval, filtering, scoring, diversity, pagination
  • Vector index (USearch HNSW) with adaptive filtered search
  • 25 built-in ranking profiles
  • BM25 full-text search (Tantivy) + hybrid RRF fusion
  • Creator search and creator profiles
  • Cohort-scoped signal aggregation and trending
  • Social graph (follows, blocks, following feed)
  • Collections, saved searches, autocomplete suggestions
  • Session and agent context (short-lived signals, preference decay)
  • Crash recovery, graceful degradation, rate limiting, diagnostics
  • Scale: tested to 1M items; scale benchmarks passing

The API surface is stable for the implemented features. Breaking changes are possible before 1.0.


License

MIT