tidalDB
An embeddable Rust database for the personalized content ranking problem.
Pre-release. API is stabilizing. Not yet recommended for production.
Every content platform eventually builds the same distributed system from scratch: Elasticsearch for retrieval, Redis for hot signals, Kafka for event ingestion, a feature store for user profiles, a vector database for semantic search, and a ranking service that stitches them together. The seams between those systems are where correctness dies — stale signals, inconsistent ranking, cache invalidation bugs, ETL lag.
The root cause: existing databases treat ranking as an afterthought. They have no native concept of signals that evolve over time, no understanding of user context, no diversity as a query constraint.
Ranking is not a feature. It is a primitive.
tidalDB is a single-node, embeddable Rust library built for one question: given a user and a context, what content should they see, and in what order? No server, no network protocol, no client SDK. Link it into your process.
What it looks like
use std::collections::HashMap;
use std::time::Duration;
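use tidaldb::{
    TidalDb,
    query::retrieve::Retrieve,
    schema::{DecaySpec, EntityId, EntityKind, SchemaBuilder, Timestamp, Window},
};
// `user_id`, `creator_id`, and `your_model` below are placeholders supplied by your application.
// `Search` must be imported as well; see API.md for its module path.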
// Declare signals with native decay — no application formulas.
let mut schema = SchemaBuilder::new();
let _ = schema
    .signal("view", EntityKind::Item, DecaySpec::Exponential {
        half_life: Duration::from_secs(7 * 24 * 3600), // 7 days
    })
    .windows(&[Window::OneHour, Window::TwentyFourHours, Window::AllTime])
    .velocity(true)
    .add();
let _ = schema
    .signal("like", EntityKind::Item, DecaySpec::Exponential {
        half_life: Duration::from_secs(30 * 24 * 3600), // 30 days
    })
    .windows(&[Window::AllTime])
    .velocity(false)
    .add();
let schema = schema.build()?;
// Open — ephemeral for tests, persistent for production.
let db = TidalDb::builder().ephemeral().with_schema(schema).open()?;
// Ingest content with metadata.
let mut meta = HashMap::new();
meta.insert("title".to_string(), "Introduction to Jazz Piano".to_string());
meta.insert("category".to_string(), "music".to_string());
db.write_item_with_metadata(EntityId::new(1), &meta)?;
// Write an embedding (you generate it, tidalDB indexes and ranks over it).
db.write_item_embedding(EntityId::new(1), &your_model.embed("Introduction to Jazz Piano"))?;
// Record engagement — the feedback loop closes here, no ETL required.
db.signal("view", EntityId::new(1), 1.0, Timestamp::now())?;
db.signal_with_context("like", EntityId::new(1), 1.0, Timestamp::now(), Some(user_id), Some(creator_id))?;
// Retrieve a ranked feed. Name the profile. tidalDB executes the pipeline.
let results = db.retrieve(&Retrieve::builder().for_user(user_id).profile("for_you").limit(50).build()?)?;
// Search: BM25 + semantic similarity fused via RRF.
let results = db.search(&Search::builder().query("jazz piano tutorial").for_user(user_id).limit(20).build()?)?;
db.close()?;
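
The schema above declares a half-life per signal and leaves the arithmetic to the database. As a rough illustration of what exponential half-life decay means, here is a minimal standalone sketch. The `Event` type and both functions are hypothetical names for this example, not part of the tidalDB API, and tidalDB may compute decay differently internally.

```rust
use std::time::Duration;

/// One recorded signal event: its weight and how long ago it happened.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct Event {
    pub weight: f64,
    pub age: Duration,
}

/// Exponential decay: a contribution halves every `half_life`.
/// Assumes `half_life` is non-zero.
pub fn decayed_weight(weight: f64, age: Duration, half_life: Duration) -> f64 {
    let half_lives_elapsed = age.as_secs_f64() / half_life.as_secs_f64();
    weight * 0.5_f64.powf(half_lives_elapsed)
}

/// Score at query time: the sum of every event's decayed contribution.
pub fn decayed_score(events: &[Event], half_life: Duration) -> f64 {
    events
        .iter()
        .map(|e| decayed_weight(e.weight, e.age, half_life))
        .sum()
}
```

With a 7-day half-life, a view recorded just now counts as 1.0 and a view from a week ago counts as 0.5, so the two together score 1.5. Because the decay is evaluated from the event age, no stored score field needs periodic rewriting.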
What it replaces
| System | tidalDB equivalent |
|---|---|
| Elasticsearch | Tantivy BM25 text index (derived, crash-recoverable) |
| Redis | Lock-free in-memory signal ledger — decay scores, windowed counters |
| Kafka | Write-ahead log — durable, ordered, replayable |
| Feature store | Signal aggregates + user preference vectors (updated at write time) |
| Vector DB | USearch HNSW — embedded, f16 quantized, predicate-filtered ANN |
| Ranking service | 25 named profiles, scored at query time, swappable by name |
Key capabilities
- Signals with native decay: declare `view` with a 7-day half-life; the database applies it at query time. No `trending_score_7d` field to maintain.
- 25 built-in ranking profiles: `trending`, `hot`, `for_you`, `following`, `related`, `hidden_gems`, `top_week`, `shuffle`, `controversial`, and more. Name the profile; the database executes the full pipeline.
- Hybrid search: BM25 full-text + ANN semantic similarity, fused via Reciprocal Rank Fusion, personalized by user preference vector.
- Composable filters: filter by category, format, duration, language, engagement threshold, location, collection membership, and more, in any combination.
- Diversity as a query constraint: `max_per_creator: 2` belongs in the query, not your API layer.
- Feedback loop in the write path: a signal write atomically updates the item's ledger, the user's preference vector, and relationship weights. The next ranking query, 100ms later, reflects it.
- Cold start handled: new content gets an exploration budget; new users get sensible defaults. No application logic required.
- Cohort-scoped trending: "trending among US users aged 18-24 who engage with jazz" is one query, not a pipeline.
- Embeddable first: runs in your process. `Arc<TidalDb>` is `Send + Sync`. No operational overhead.
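
Two of the capabilities above, Reciprocal Rank Fusion and a per-creator diversity cap, are simple enough to sketch in isolation. The code below is a minimal standalone illustration, not tidalDB's implementation. The function names, the `u64` item and creator ids, and the constant `k = 60` in the usage note (a conventional RRF default) are assumptions for this example. Personalization by preference vector is left out.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked lists of item ids.
/// Each list contributes 1 / (k + rank) per item, with rank starting at 1.
pub fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, &item) in ranking.iter().enumerate() {
            let rank = idx as f64 + 1.0;
            *scores.entry(item).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by item id so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}

/// Walk a ranked list in order, keeping at most `max_per_creator` items
/// from any single creator. `creator_of` maps an item id to its creator id.
pub fn cap_per_creator(
    ranked: &[u64],
    creator_of: impl Fn(u64) -> u64,
    max_per_creator: usize,
) -> Vec<u64> {
    let mut taken: HashMap<u64, usize> = HashMap::new();
    let mut kept = Vec::new();
    for &item in ranked {
        let count = taken.entry(creator_of(item)).or_insert(0);
        if *count < max_per_creator {
            *count += 1;
            kept.push(item);
        }
    }
    kept
}
```

For example, fusing a text ranking `[1, 2, 3]` with a vector ranking `[3, 1, 4]` at `k = 60.0` orders the items 1, 3, 2, 4: items that appear in both lists rise to the top. If items 1, 2, and 3 share a creator and the cap is 2, the final list is 1, 3, 4.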
Getting started
Pick the path that matches how you plan to use tidalDB today. Every option below is self-contained and ships in this repo.
1. Embed tidalDB inside your Rust service (library mode)
Setup
- Add the git dependency:

      [dependencies]
      tidaldb = { git = "https://github.com/your-org/tidalDB", rev = "..." }

- Define your schema before opening the database (decay, windows, text fields, embeddings). The snippet in Quickstart, Step 2 is a ready-to-copy template.
- Choose storage mode when building:

      let db = tidaldb::TidalDb::builder()
          .with_schema(schema)
          .ephemeral() // in-memory for tests
          // .with_data_dir("/var/lib/tidaldb") // persistent deployment
          .open()?;

- Run the end-to-end sample:

      cargo run --manifest-path tidal/Cargo.toml --example quickstart
Usage
- Call `db.signal(...)`, `db.signal_with_context(...)`, and `db.retrieve(...)` / `db.search(...)` from the same process; no network stack required.
- Wrap the instance in `Arc<TidalDb>` to share it across threads or tasks.
- Persisted deployments can be inspected with the CLI tool: `cargo run -p tidalctl -- status --path /var/lib/tidaldb`.
- Full walkthrough: QUICKSTART.md and API.md.
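
The `Arc` sharing pattern mentioned above can be sketched without the database itself. In the example below, `Handle` is a hypothetical stand-in for a `Send + Sync` database handle whose write method takes `&self`; it is not the tidalDB type, and `write_from_threads` is an illustrative helper. In a real service, `Arc<TidalDb>` would take the place of `Arc<Handle>`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a `Send + Sync` database handle.
#[derive(Default)]
pub struct Handle {
    signals_written: AtomicU64,
}

impl Handle {
    /// Record one signal. Takes `&self`, so callers need no outer `Mutex`.
    pub fn signal(&self) {
        self.signals_written.fetch_add(1, Ordering::Relaxed);
    }

    /// Total number of signals recorded so far.
    pub fn signals_written(&self) -> u64 {
        self.signals_written.load(Ordering::Relaxed)
    }
}

/// Spawn `workers` threads that each write `per_worker` signals
/// through clones of the same shared handle, then wait for all of them.
pub fn write_from_threads(db: Arc<Handle>, workers: usize, per_worker: u64) {
    let joins: Vec<_> = (0..workers)
        .map(|_| {
            let db = Arc::clone(&db); // cheap pointer clone, same underlying handle
            thread::spawn(move || {
                for _ in 0..per_worker {
                    db.signal();
                }
            })
        })
        .collect();
    for join in joins {
        join.join().expect("worker thread panicked");
    }
}
```

Cloning the `Arc` only bumps a reference count, so every worker talks to one shared instance rather than opening its own.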
2. Run the standalone HTTP server (tidal-server)
Why: you want a ready-to-run HTTP facade without writing Axum/Actix glue.
cargo run -p tidal-server -- \
standalone \
--listen 127.0.0.1:9400 \
--schema tidal-server/config/default-schema.yaml
Options:
- `--data-dir /var/lib/tidaldb` switches to persistent storage.
- Provide your own schema file (YAML) to match your signal mix.
Usage:
# register metadata + embedding
curl -X POST http://127.0.0.1:9400/items \
-H 'Content-Type: application/json' \
-d '{ "entity_id": 1, "metadata": { "title": "Jazz Piano", "category": "music" } }'
curl -X POST http://127.0.0.1:9400/embeddings \
-H 'Content-Type: application/json' \
-d '{ "entity_id": 1, "values": [0.1, 0.2, 0.3] }'
# write engagement (supports user/creator context)
curl -X POST http://127.0.0.1:9400/signals \
-H 'Content-Type: application/json' \
-d '{ "entity_id": 1, "signal": "view", "weight": 1.0, "user_id": 42 }'
# query
curl "http://127.0.0.1:9400/feed?user_id=42&profile=for_you&limit=20"
curl "http://127.0.0.1:9400/search?query=jazz%20piano&user_id=42&limit=5"
curl http://127.0.0.1:9400/health
The default schema lives at tidal-server/config/default-schema.yaml. Edit
it (or provide your own path) to align with your application’s signals,
text fields, and embedding slots.
3. Wrap it in an HTTP service you control
Expose tidalDB through your favorite web framework; the repo ships runnable templates.
- Axum sample (`tidal/examples/axum_embedding.rs`)

      cargo run --example axum_embedding --manifest-path tidal/Cargo.toml

  Usage:

      curl -X POST http://127.0.0.1:3000/signal \
        -H 'Content-Type: application/json' \
        -d '{ "entity_id": 1, "signal": "view", "weight": 1.0 }'
      curl "http://127.0.0.1:3000/feed?user_id=42"
      curl http://127.0.0.1:3000/health

  The example handles schema setup, wraps `Arc<TidalDb>` in Axum `State`, and maps `TidalError` to HTTP responses.

- Actix sample (`tidal/examples/actix_embedding.rs`)

      cargo run --example actix_embedding --manifest-path tidal/Cargo.toml
      # curl http://127.0.0.1:3001/health

  Demonstrates sharing `Arc<TidalDb>` through `web::Data` and using Actix's shutdown hooks.

Use either sample as a starting point for microservices that prefer a client/server boundary.
4. Run the Forage demo server (Axum + UI)
Want to see tidalDB powering a live personalization surface? Forage is a thin Axum server + feed UI that talks to a tidalDB instance embedded in-process.
cargo run -p forage-server --manifest-path applications/forage/server/Cargo.toml
open http://localhost:4242
Flags:
- `--ephemeral` to keep everything in-memory.
- `--data-dir ~/.forage/data` to point at a custom persistent directory.
Usage:
curl -X POST http://localhost:4242/signal \
-H "Content-Type: application/json" \
-d '{ "user_id": 1, "item_id": 42, "signal_type": "view" }'
curl "http://localhost:4242/feed?user=1&limit=7"
The UI shows seeded users, exploration labels, and real-time adaptation; see applications/forage/readme.md for the full loop.
5. Run the cluster server + Docker image
Need a single endpoint that fronts the built-in simulated cluster? Use
tidal-server in cluster mode. It spins up the multi-region fabric,
ships WAL batches between regions, and exposes /signals, /feed,
/search plus cluster-management routes.
cargo run -p tidal-server -- \
cluster \
--listen 0.0.0.0:9500 \
--schema tidal-server/config/default-schema.yaml \
--topology tidal-server/config/default-cluster.yaml
Key endpoints:
curl http://127.0.0.1:9500/health
curl -X POST http://127.0.0.1:9500/signals -d '{ "entity_id": 1, "signal": "view", "weight": 1.0 }'
curl "http://127.0.0.1:9500/feed?profile=trending&region=eu-west"
curl http://127.0.0.1:9500/cluster/status
curl -X POST http://127.0.0.1:9500/cluster/promote -d '{ "region": "eu-west" }'
Cluster mode currently replicates global signals (no user_id /
creator_id contexts) so that followers can stay in sync with the leader’s
WAL stream. See docs/runbooks/cluster.md for
operational steps, failure drills, and API references.
Prefer containers? Build the provided image and run it anywhere:
docker build -f docker/cluster/Dockerfile -t tidal-cluster .
docker run --rm -p 9500:9500 tidal-cluster
Mount your own schema/topology files with -v if you want different regions
or signal definitions.
6. Simulate a multi-region cluster in tests
The raw SimulatedCluster harness (no HTTP) remains available for property
tests and fuzzing.
cargo test --test m8_uat
cargo test --test m8_uat uat_step3 -- --nocapture # run a single scenario
Tweak tidal/tests/m8_uat.rs to script specific replication, failover, and
migration scenarios inside your own test suites.
MSRV: Rust 1.91
Documentation
| Document | Contents |
|---|---|
| QUICKSTART.md | Step-by-step guide: schema, ingest, signals, ranking, search |
| API.md | Full API reference with code examples |
| VISION.md | Problem statement and design thesis |
| ARCHITECTURE.md | Storage, signal system, vector index, query pipeline |
| USE_CASES.md | 14 content discovery surfaces, filter and sort references |
Status
Milestones completed:
- Storage engine, WAL, entity store, signal ledger
- RETRIEVE query: candidate retrieval, filtering, scoring, diversity, pagination
- Vector index (USearch HNSW) with adaptive filtered search
- 25 built-in ranking profiles
- BM25 full-text search (Tantivy) + hybrid RRF fusion
- Creator search and creator profiles
- Cohort-scoped signal aggregation and trending
- Social graph (follows, blocks, following feed)
- Collections, saved searches, autocomplete suggestions
- Session and agent context (short-lived signals, preference decay)
- Crash recovery, graceful degradation, rate limiting, diagnostics
- Scale: tested to 1M items; scale benchmarks passing
The API surface is stable for the implemented features. Breaking changes are possible before 1.0.
License
MIT