Personalized content ranking database

tidalDB

An embeddable Rust database for the personalized content ranking problem.

Pre-release. API is stabilizing. Not yet recommended for production.


Every content platform eventually builds the same distributed system from scratch: Elasticsearch for retrieval, Redis for hot signals, Kafka for event ingestion, a feature store for user profiles, a vector database for semantic search, and a ranking service that stitches them together. The seams between those systems are where correctness dies — stale signals, inconsistent ranking, cache invalidation bugs, ETL lag.

The root cause: existing databases treat ranking as an afterthought. They have no native concept of signals that evolve over time, no understanding of user context, and no way to express diversity as a query constraint.

Ranking is not a feature. It is a primitive.

tidalDB is a single-node, embeddable Rust library built for one question: given a user and a context, what content should they see, and in what order? No server, no network protocol, no client SDK. Link it into your process.


What it looks like

use std::collections::HashMap;
use std::time::Duration;
use tidaldb::{
    query::retrieve::Retrieve,
    schema::{DecaySpec, EntityId, EntityKind, SchemaBuilder, Timestamp, Window},
    TidalDb,
};
// `Search` must also be in scope; its import path is not shown here (see API.md).
// `user_id`, `creator_id`, and `your_model` are placeholders supplied by your application.
// The `?` operator assumes this code runs inside a function that returns a Result.

// Declare signals with native decay; no application-side formulas.
let mut schema = SchemaBuilder::new();
let _ = schema
    .signal("view", EntityKind::Item, DecaySpec::Exponential {
        half_life: Duration::from_secs(7 * 24 * 3600), // 7 days
    })
    .windows(&[Window::OneHour, Window::TwentyFourHours, Window::AllTime])
    .velocity(true)
    .add();
let _ = schema
    .signal("like", EntityKind::Item, DecaySpec::Exponential {
        half_life: Duration::from_secs(30 * 24 * 3600), // 30 days
    })
    .windows(&[Window::AllTime])
    .velocity(false)
    .add();
let schema = schema.build()?;

// Open the database: ephemeral for tests, persistent for production.
let db = TidalDb::builder().ephemeral().with_schema(schema).open()?;

// Ingest content with metadata.
let mut meta = HashMap::new();
meta.insert("title".to_string(), "Introduction to Jazz Piano".to_string());
meta.insert("category".to_string(), "music".to_string());
db.write_item_with_metadata(EntityId::new(1), &meta)?;

// Write an embedding (you generate it; tidalDB indexes and ranks over it).
db.write_item_embedding(EntityId::new(1), &your_model.embed("Introduction to Jazz Piano"))?;

// Record engagement. The feedback loop closes here; no ETL required.
db.signal("view", EntityId::new(1), 1.0, Timestamp::now())?;
db.signal_with_context(
    "like",
    EntityId::new(1),
    1.0,
    Timestamp::now(),
    Some(user_id),
    Some(creator_id),
)?;

// Retrieve a ranked feed. Name the profile; tidalDB executes the pipeline.
let feed = db.retrieve(
    &Retrieve::builder().for_user(user_id).profile("for_you").limit(50).build()?,
)?;

// Search: BM25 + semantic similarity fused via RRF.
let hits = db.search(
    &Search::builder().query("jazz piano tutorial").for_user(user_id).limit(20).build()?,
)?;

db.close()?;
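
The search call above fuses BM25 and semantic rankings via Reciprocal Rank Fusion (RRF). As a rough illustration of the technique itself, here is a minimal standalone sketch. It is not tidalDB's implementation: the function name `rrf_fuse`, the use of plain `u64` ids, and the constant `k = 60.0` (a commonly used default) are all assumptions made for this example.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids with Reciprocal Rank Fusion.
/// Each list is ordered best-first. Every appearance of a document at
/// 1-based rank `r` contributes 1 / (k + r) to its fused score.
/// Hypothetical helper for illustration; not part of the tidalDB API.
fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (index, &doc_id) in ranking.iter().enumerate() {
            let rank = (index + 1) as f64; // ranks are 1-based
            *scores.entry(doc_id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    let bm25 = vec![1, 2, 3]; // text ranking, best first
    let ann = vec![3, 1, 4]; // vector ranking, best first
    for (id, score) in rrf_fuse(&[bm25, ann], 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

Because RRF works on ranks rather than raw scores, the BM25 and vector-similarity scales never need to be normalized against each other. In this example item 1 ranks first because it sits near the top of both lists.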

What it replaces

| System | tidalDB equivalent |
|---|---|
| Elasticsearch | Tantivy BM25 text index (derived, crash-recoverable) |
| Redis | Lock-free in-memory signal ledger: decay scores, windowed counters |
| Kafka | Write-ahead log: durable, ordered, replayable |
| Feature store | Signal aggregates + user preference vectors (updated at write time) |
| Vector DB | USearch HNSW: embedded, f16 quantized, predicate-filtered ANN |
| Ranking service | 25 named profiles, scored at query time, swappable by name |

Key capabilities

  • Signals with native decay — declare view with a 7-day half-life; the database applies it at query time. No trending_score_7d field to maintain.
  • 25 built-in ranking profiles: trending, hot, for_you, following, related, hidden_gems, top_week, shuffle, controversial, and more. Name the profile; the database executes the full pipeline.
  • Hybrid search — BM25 full-text + ANN semantic similarity, fused via Reciprocal Rank Fusion, personalized by user preference vector.
  • Composable filters — filter by category, format, duration, language, engagement threshold, location, collection membership, and more — any combination, all composable.
  • Diversity as a query constraint: a limit such as max_per_creator: 2 belongs in the query, not your API layer.
  • Feedback loop in the write path — a signal write atomically updates the item's ledger, the user's preference vector, and relationship weights. The next ranking query — 100ms later — reflects it.
  • Cold start handled — new content gets an exploration budget; new users get sensible defaults. No application logic required.
  • Cohort-scoped trending — "trending among US users aged 18-24 who engage with jazz" is one query, not a pipeline.
  • Embeddable first — runs in your process. Arc<TidalDb> is Send + Sync. No operational overhead.
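
To make the half-life idea in the first bullet concrete, the sketch below computes an exponentially decayed score by hand. It illustrates the general formula `weight * 0.5^(age / half_life)` and is not a description of tidalDB's internals. The function names `decayed_weight` and `decayed_score`, and the representation of events as `(weight, age)` pairs, are assumptions made for this example.

```rust
use std::time::Duration;

/// Decayed contribution of one event: weight * 0.5^(age / half_life).
/// Hypothetical helper for illustration; not part of the tidalDB API.
fn decayed_weight(weight: f64, age: Duration, half_life: Duration) -> f64 {
    weight * 0.5_f64.powf(age.as_secs_f64() / half_life.as_secs_f64())
}

/// Score of an item "now": the sum of its decayed event weights.
/// Each event is given as (weight, age of the event).
fn decayed_score(events: &[(f64, Duration)], half_life: Duration) -> f64 {
    events
        .iter()
        .map(|&(weight, age)| decayed_weight(weight, age, half_life))
        .sum()
}

fn main() {
    let day = 24 * 3600;
    let half_life = Duration::from_secs(7 * day); // the 7-day "view" half-life
    let events = [
        (1.0, Duration::from_secs(0)),        // a view just now
        (1.0, Duration::from_secs(7 * day)),  // a view one half-life ago
        (1.0, Duration::from_secs(14 * day)), // a view two half-lives ago
    ];
    // The three views contribute 1, 0.5, and 0.25 respectively.
    println!("{:.2}", decayed_score(&events, half_life));
}
```

Evaluating decay at query time means the stored data never goes stale: the same event log yields a lower score tomorrow without any background job rewriting a precomputed field.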

Getting started

tidalDB is not yet published to crates.io. Add it as a git dependency:

[dependencies]
tidaldb = { git = "https://github.com/your-org/tidalDB", rev = "..." }

Then follow the Quickstart to get a working ranked feed in 10 minutes, or run the included example:

cargo run --manifest-path tidal/Cargo.toml --example quickstart

MSRV: Rust 1.91


Documentation

| Document | Contents |
|---|---|
| QUICKSTART.md | Step-by-step guide: schema, ingest, signals, ranking, search |
| API.md | Full API reference with code examples |
| VISION.md | Problem statement and design thesis |
| ARCHITECTURE.md | Storage, signal system, vector index, query pipeline |
| USE_CASES.md | 14 content discovery surfaces, filter and sort references |

Status

Milestones completed:

  • Storage engine, WAL, entity store, signal ledger
  • RETRIEVE query: candidate retrieval, filtering, scoring, diversity, pagination
  • Vector index (USearch HNSW) with adaptive filtered search
  • 25 built-in ranking profiles
  • BM25 full-text search (Tantivy) + hybrid RRF fusion
  • Creator search and creator profiles
  • Cohort-scoped signal aggregation and trending
  • Social graph (follows, blocks, following feed)
  • Collections, saved searches, autocomplete suggestions
  • Session and agent context (short-lived signals, preference decay)
  • Crash recovery, graceful degradation, rate limiting, diagnostics
  • Scale: tested to 1M items; scale benchmarks passing
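
The diversity stage of the RETRIEVE pipeline is listed above without detail. One simple way to picture a per-creator cap is a greedy pass over score-sorted candidates, sketched below. This is an illustrative assumption, not tidalDB's actual algorithm; the `Candidate` struct and the `cap_per_creator` function are hypothetical names introduced for this example.

```rust
use std::collections::HashMap;

/// A scored candidate. Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    item_id: u64,
    creator_id: u64,
    score: f64,
}

/// Greedy diversity cap: walk candidates best-first and keep an item only
/// while its creator has fewer than `max_per_creator` items already kept.
/// Stops once `limit` items have been selected.
fn cap_per_creator(
    mut candidates: Vec<Candidate>,
    max_per_creator: usize,
    limit: usize,
) -> Vec<Candidate> {
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut kept_per_creator: HashMap<u64, usize> = HashMap::new();
    let mut selected = Vec::new();
    for candidate in candidates {
        if selected.len() == limit {
            break;
        }
        let kept = kept_per_creator.entry(candidate.creator_id).or_insert(0);
        if *kept < max_per_creator {
            *kept += 1;
            selected.push(candidate);
        }
    }
    selected
}

fn main() {
    let candidates = vec![
        Candidate { item_id: 1, creator_id: 10, score: 0.9 },
        Candidate { item_id: 2, creator_id: 10, score: 0.8 },
        Candidate { item_id: 3, creator_id: 10, score: 0.7 },
        Candidate { item_id: 4, creator_id: 20, score: 0.6 },
    ];
    // With a cap of 2 per creator, item 3 is skipped in favor of item 4.
    for c in cap_per_creator(candidates, 2, 3) {
        println!("item {} by creator {}", c.item_id, c.creator_id);
    }
}
```

Applying the cap inside the query, before the limit is enforced, is what lets a page of results stay full: a cap applied afterwards in the API layer would leave holes that need a second fetch.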

The API surface is stable for the implemented features, but tidalDB is still pre-release and breaking changes are possible before 1.0.


License

MIT