
tidalDB

An embeddable Rust database for the personalized content ranking problem.

Pre-release. API is stabilizing. Not yet recommended for production.

Every content platform eventually builds the same distributed system from scratch: Elasticsearch for retrieval, Redis for hot signals, Kafka for event ingestion, a feature store for user profiles, a vector database for semantic search, and a ranking service that stitches them together. The seams between those systems are where correctness dies — stale signals, inconsistent ranking, cache invalidation bugs, ETL lag.

The root cause: existing databases treat ranking as an afterthought. They have no native concept of signals that evolve over time, no understanding of user context, no diversity as a query constraint.

Ranking is not a feature. It is a primitive.

tidalDB is a single-node, embeddable Rust library built for one question: given a user and a context, what content should they see, and in what order? The core library needs no server, network protocol, or client SDK; link it into your process. Optional HTTP front ends (standalone and simulated-cluster modes) also ship in this repo.

What it looks like

```rust
use std::collections::HashMap;
use std::time::Duration;
use tidaldb::{TidalDb, query::retrieve::Retrieve, schema::{DecaySpec, EntityId, EntityKind, SchemaBuilder, Timestamp, Window}};
// `Search` is used below; this module path is assumed by analogy with `query::retrieve` (see API.md).
use tidaldb::query::search::Search;

// `user_id`, `creator_id`, and `your_model` are placeholders for your own IDs and embedding model.

// Declare signals with native decay: no application formulas.
let mut schema = SchemaBuilder::new();
let _ = schema.signal("view", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(7 * 24 * 3600),
}).windows(&[Window::OneHour, Window::TwentyFourHours, Window::AllTime]).velocity(true).add();
let _ = schema.signal("like", EntityKind::Item, DecaySpec::Exponential {
    half_life: Duration::from_secs(30 * 24 * 3600),
}).windows(&[Window::AllTime]).velocity(false).add();
let schema = schema.build()?;

// Open: ephemeral for tests, persistent for production.
let db = TidalDb::builder().ephemeral().with_schema(schema).open()?;

// Ingest content with metadata.
let mut meta = HashMap::new();
meta.insert("title".to_string(), "Introduction to Jazz Piano".to_string());
meta.insert("category".to_string(), "music".to_string());
db.write_item_with_metadata(EntityId::new(1), &meta)?;

// Write an embedding (you generate it; tidalDB indexes and ranks over it).
db.write_item_embedding(EntityId::new(1), &your_model.embed("Introduction to Jazz Piano"))?;

// Record engagement: the feedback loop closes here, no ETL required.
db.signal("view", EntityId::new(1), 1.0, Timestamp::now())?;
db.signal_with_context("like", EntityId::new(1), 1.0, Timestamp::now(), Some(user_id), Some(creator_id))?;

// Retrieve a ranked feed. Name the profile; tidalDB executes the pipeline.
let results = db.retrieve(&Retrieve::builder().for_user(user_id).profile("for_you").limit(50).build()?)?;

// Search: BM25 + semantic similarity fused via RRF.
let results = db.search(&Search::builder().query("jazz piano tutorial").for_user(user_id).limit(20).build()?)?;

db.close()?;
```
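The schema above gives `view` a 7-day half-life and `like` a 30-day one. As a rough illustration of what exponential decay with a half-life means, here is a standalone sketch (not tidalDB's internal code; `decayed_weight` is a hypothetical helper): an event's contribution is `value * 0.5^(age / half_life)`, so it halves every half-life.

```rust
use std::time::Duration;

/// Hypothetical helper: weight of an event of size `value` after `age` has
/// elapsed, under exponential decay: value * 0.5^(age / half_life).
fn decayed_weight(value: f64, age: Duration, half_life: Duration) -> f64 {
    let halvings = age.as_secs_f64() / half_life.as_secs_f64();
    value * 0.5_f64.powf(halvings)
}

fn main() {
    let week = Duration::from_secs(7 * 24 * 3600);
    // A view recorded exactly one half-life ago counts half as much.
    println!("{:.3}", decayed_weight(1.0, week, week)); // 0.500
    // Two half-lives ago: a quarter.
    println!("{:.3}", decayed_weight(1.0, week * 2, week)); // 0.250
}
```

Because the decay is a pure function of age, it can be evaluated at query time instead of being baked into a stored field.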

What it replaces

| System | tidalDB equivalent |
|---|---|
| Elasticsearch | Tantivy BM25 text index (derived, crash-recoverable) |
| Redis | Lock-free in-memory signal ledger: decay scores, windowed counters |
| Kafka | Write-ahead log: durable, ordered, replayable |
| Feature store | Signal aggregates + user preference vectors (updated at write time) |
| Vector DB | USearch HNSW: embedded, f16 quantized, predicate-filtered ANN |
| Ranking service | 25 named profiles, scored at query time, swappable by name |
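The windowed counters in the Redis row correspond to windows such as `Window::OneHour` and `Window::TwentyFourHours` in the schema. One common way to maintain a rolling count is a ring of time buckets. The sketch below illustrates that generic technique only: `WindowedCounter`, the one-minute buckets, and the single-threaded design are assumptions, not tidalDB's ledger (which the table describes as lock-free).

```rust
/// Hypothetical rolling counter over the last `buckets.len()` slots of width
/// `bucket_secs`. Single-threaded illustration; no locking or atomics.
struct WindowedCounter {
    bucket_secs: u64,
    buckets: Vec<u64>, // event count per bucket, indexed by slot % len
    slots: Vec<u64>,   // absolute slot each bucket currently holds (u64::MAX = empty)
}

impl WindowedCounter {
    fn new(bucket_secs: u64, num_buckets: usize) -> Self {
        Self { bucket_secs, buckets: vec![0; num_buckets], slots: vec![u64::MAX; num_buckets] }
    }

    /// Record `n` events at unix time `ts` (seconds).
    fn add(&mut self, ts: u64, n: u64) {
        let slot = ts / self.bucket_secs;
        let i = (slot % self.buckets.len() as u64) as usize;
        if self.slots[i] != slot {
            // The bucket holds an expired slot: reset it before reusing it.
            self.slots[i] = slot;
            self.buckets[i] = 0;
        }
        self.buckets[i] += n;
    }

    /// Total events in the window ending at unix time `now`.
    fn count(&self, now: u64) -> u64 {
        let current = now / self.bucket_secs;
        let len = self.buckets.len() as u64;
        self.slots
            .iter()
            .zip(&self.buckets)
            .filter(|&(&slot, _)| slot != u64::MAX && slot <= current && current - slot < len)
            .map(|(_, &c)| c)
            .sum()
    }
}

fn main() {
    // One-hour window built from 60 one-minute buckets.
    let mut views = WindowedCounter::new(60, 60);
    views.add(0, 1);
    views.add(30 * 60, 2);
    println!("{}", views.count(45 * 60)); // 3: both events fall inside the last hour
    println!("{}", views.count(80 * 60)); // 2: the event at t=0 has aged out
}
```

Memory stays fixed per counter regardless of event volume; the trade-off is that expiry happens at bucket granularity (one minute here).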

Key capabilities

- Signals with native decay: declare `view` with a 7-day half-life; the database applies it at query time. No `trending_score_7d` field to maintain.
- 25 built-in ranking profiles: `trending`, `hot`, `for_you`, `following`, `related`, `hidden_gems`, `top_week`, `shuffle`, `controversial`, and more. Name the profile; the database executes the full pipeline.
- Hybrid search: BM25 full-text plus ANN semantic similarity, fused via Reciprocal Rank Fusion and personalized by the user's preference vector.
- Composable filters: filter by category, format, duration, language, engagement threshold, location, collection membership, and more, in any combination.
- Diversity as a query constraint: `max_per_creator: 2` belongs in the query, not your API layer.
- Feedback loop in the write path: a signal write atomically updates the item's ledger, the user's preference vector, and relationship weights, so the next ranking query (even one issued 100 ms later) reflects it.
- Cold start handled: new content gets an exploration budget; new users get sensible defaults. No application logic required.
- Cohort-scoped trending: "trending among US users aged 18-24 who engage with jazz" is one query, not a pipeline.
- Embeddable first: runs in your process. `Arc<TidalDb>` is `Send + Sync`. No separate service to operate.
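Reciprocal Rank Fusion, named in the hybrid-search item, merges ranked lists by giving each item `1 / (k + rank)` from every list it appears in and summing. The sketch shows the standard formula in isolation. `rrf_fuse` is a hypothetical helper, `k = 60` is a commonly used default (the constant tidalDB uses is not stated here), and the preference-vector personalization is not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical helper: fuse ranked lists of item IDs with Reciprocal Rank
/// Fusion. Each list contributes 1 / (k + rank) per item, rank starting at 1.
fn rrf_fuse(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (i, &id) in list.iter().enumerate() {
            *scores.entry(id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; break ties by ID so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    let bm25 = vec![3, 1, 2]; // text-match ranking
    let ann = vec![1, 4, 3]; // embedding-similarity ranking
    // Item 1 ends up first: it sits near the top of both lists.
    for (id, score) in rrf_fuse(&[bm25, ann], 60.0) {
        println!("item {id}: {score:.5}");
    }
}
```

RRF uses only ranks, not raw scores, so BM25 scores and cosine similarities never need to be normalized onto a common scale.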

Getting started

Pick the path that matches how you plan to use tidalDB today. Every option below is self-contained and ships in this repo.

1. Embed tidalDB inside your Rust service (library mode)

Setup

Add the git dependency:

```toml
[dependencies]
tidaldb = { git = "https://github.com/your-org/tidalDB", rev = "..." }
```

Define your schema (decay, windows, text fields, embeddings) before opening the database. Step 2 of QUICKSTART.md has a ready-to-copy template.

Choose storage mode when building:

```rust
let db = tidaldb::TidalDb::builder()
    .with_schema(schema)
    .ephemeral()               // in-memory for tests
    // .with_data_dir("/var/lib/tidaldb") // persistent deployment
    .open()?;
```

Run the end-to-end sample:

```shell
cargo run --manifest-path tidal/Cargo.toml --example quickstart
```

Usage

- Call `db.signal(...)`, `db.signal_with_context(...)`, and `db.retrieve(...)` / `db.search(...)` from the same process; no network stack is required.
- Wrap the instance in `Arc<TidalDb>` to share it across threads or tasks.
- Inspect persisted deployments with the CLI tool: `cargo run -p tidalctl -- status --path /var/lib/tidaldb`.
- Full walkthrough: QUICKSTART.md and API.md.
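Because `Arc<TidalDb>` is `Send + Sync` and `db.signal(...)` is called on a shared reference, sharing one instance is the usual `Arc::clone`-per-thread pattern. The sketch shows that pattern with a hypothetical stand-in type, `FakeDb`, so it compiles without tidalDB; in real code you would clone an `Arc<TidalDb>` and call its own `signal(...)` instead.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `TidalDb`: a `Send + Sync` handle whose write
/// method takes `&self`, like `db.signal(...)` in the example above.
struct FakeDb {
    views: AtomicU64,
}

impl FakeDb {
    fn signal(&self, _name: &str, _entity: u64, weight: f64) {
        // Stand-in for a real signal write: just tallies whole-number weights.
        self.views.fetch_add(weight as u64, Ordering::Relaxed);
    }
}

fn main() {
    let db = Arc::new(FakeDb { views: AtomicU64::new(0) });

    // One Arc clone per worker; no Mutex, since the handle is Sync and its methods take &self.
    let handles: Vec<_> = (0..4)
        .map(|worker| {
            let db = Arc::clone(&db);
            thread::spawn(move || {
                for _ in 0..100 {
                    db.signal("view", worker, 1.0);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().expect("worker panicked");
    }
    println!("{}", db.views.load(Ordering::Relaxed)); // 400
}
```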

2. Run the standalone HTTP server (`tidal-server`)

Why: you want a ready-to-run HTTP facade without writing Axum/Actix glue.

```shell
cargo run -p tidal-server -- \
  standalone \
  --listen 127.0.0.1:9400 \
  --schema tidal-server/config/default-schema.yaml
```

Options:

- `--data-dir /var/lib/tidaldb` switches to persistent storage.
- Provide your own schema file (YAML) to match your signal mix.

Usage:

```shell
# register metadata + embedding
curl -X POST http://127.0.0.1:9400/items \
  -H 'Content-Type: application/json' \
  -d '{ "entity_id": 1, "metadata": { "title": "Jazz Piano", "category": "music" } }'
curl -X POST http://127.0.0.1:9400/embeddings \
  -H 'Content-Type: application/json' \
  -d '{ "entity_id": 1, "values": [0.1, 0.2, 0.3] }'

# write engagement (supports user/creator context)
curl -X POST http://127.0.0.1:9400/signals \
  -H 'Content-Type: application/json' \
  -d '{ "entity_id": 1, "signal": "view", "weight": 1.0, "user_id": 42 }'

# query
curl "http://127.0.0.1:9400/feed?user_id=42&profile=for_you&limit=20"
curl "http://127.0.0.1:9400/search?query=jazz%20piano&user_id=42&limit=5"
curl http://127.0.0.1:9400/health
```

The default schema lives at tidal-server/config/default-schema.yaml. Edit it (or provide your own path) to align with your application’s signals, text fields, and embedding slots.

3. Wrap it in an HTTP service you control

Expose tidalDB through your favorite web framework; the repo ships runnable templates.

Axum sample (tidal/examples/axum_embedding.rs)

```shell
cargo run --example axum_embedding --manifest-path tidal/Cargo.toml
```

Usage:

```shell
curl -X POST http://127.0.0.1:3000/signal \
     -H 'Content-Type: application/json' \
     -d '{ "entity_id": 1, "signal": "view", "weight": 1.0 }'
curl "http://127.0.0.1:3000/feed?user_id=42"
curl http://127.0.0.1:3000/health
```

The example handles schema setup, wraps Arc<TidalDb> in Axum State, and maps TidalError to HTTP responses.

Actix sample (tidal/examples/actix_embedding.rs)
```
cargo run --example actix_embedding --manifest-path tidal/Cargo.toml
# curl http://127.0.0.1:3001/health
```
Demonstrates sharing Arc<TidalDb> through web::Data and using Actix’s shutdown hooks.

Use either sample as a starting point for microservices that prefer a client/server boundary.

4. Run the Forage demo server (Axum + UI)

Want to see tidalDB powering a live personalization surface? Forage is a thin Axum server + feed UI that talks to a tidalDB instance embedded in-process.

```shell
cargo run -p forage-server --manifest-path applications/forage/server/Cargo.toml
open http://localhost:4242
```

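Flags (pass them after `--`, e.g. `cargo run -p forage-server --manifest-path applications/forage/server/Cargo.toml -- --ephemeral`):

- `--ephemeral` keeps everything in memory.
- `--data-dir ~/.forage/data` points at a custom persistent directory.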

Usage:

```shell
curl -X POST http://localhost:4242/signal \
     -H "Content-Type: application/json" \
     -d '{ "user_id": 1, "item_id": 42, "signal_type": "view" }'
curl "http://localhost:4242/feed?user=1&limit=7"
```

The UI shows seeded users, exploration labels, and real-time adaptation; see applications/forage/readme.md for the full loop.

5. Run the cluster server + Docker image

Need a single endpoint that fronts the built-in simulated cluster? Use tidal-server in cluster mode. It spins up the multi-region fabric, ships WAL batches between regions, and exposes /signals, /feed, /search plus cluster-management routes.

```shell
cargo run -p tidal-server -- \
  cluster \
  --listen 0.0.0.0:9500 \
  --schema tidal-server/config/default-schema.yaml \
  --topology tidal-server/config/default-cluster.yaml
```

Key endpoints:

```shell
curl http://127.0.0.1:9500/health
curl -X POST http://127.0.0.1:9500/signals \
  -H 'Content-Type: application/json' \
  -d '{ "entity_id": 1, "signal": "view", "weight": 1.0 }'
curl "http://127.0.0.1:9500/feed?profile=trending&region=eu-west"
curl http://127.0.0.1:9500/cluster/status
curl -X POST http://127.0.0.1:9500/cluster/promote \
  -H 'Content-Type: application/json' \
  -d '{ "region": "eu-west" }'
```

Cluster mode currently replicates global signals (no user_id / creator_id contexts) so that followers can stay in sync with the leader’s WAL stream. See docs/runbooks/cluster.md for operational steps, failure drills, and API references.
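Staying in sync with a leader's WAL stream generally means applying batches strictly in sequence order, ignoring duplicates, and requesting re-delivery when a gap appears (for example after a network partition heals). The sketch illustrates that general idea only; `WalBatch`, `Follower`, and the `Apply` outcomes are hypothetical names, not tidalDB's replication types.

```rust
/// Hypothetical replicated WAL batch: a sequence number plus opaque payload.
struct WalBatch {
    seq: u64,
    payload: Vec<u8>,
}

#[derive(Debug, PartialEq)]
enum Apply {
    Applied,
    Duplicate,             // already applied; safe to ignore
    Gap { expected: u64 }, // batches are missing; ask the leader to resend from `expected`
}

/// Hypothetical follower that tracks the next sequence number it needs.
struct Follower {
    next_seq: u64,
    applied_bytes: usize,
}

impl Follower {
    fn receive(&mut self, batch: &WalBatch) -> Apply {
        if batch.seq < self.next_seq {
            Apply::Duplicate
        } else if batch.seq > self.next_seq {
            Apply::Gap { expected: self.next_seq }
        } else {
            // In order: apply (here we only tally payload size) and advance.
            self.applied_bytes += batch.payload.len();
            self.next_seq += 1;
            Apply::Applied
        }
    }
}

fn main() {
    let mut f = Follower { next_seq: 1, applied_bytes: 0 };
    let b = |seq| WalBatch { seq, payload: vec![0; 8] };
    println!("{:?}", f.receive(&b(1))); // Applied
    println!("{:?}", f.receive(&b(3))); // Gap { expected: 2 }
    println!("{:?}", f.receive(&b(2))); // Applied (re-delivered after the gap)
    println!("{:?}", f.receive(&b(3))); // Applied
    println!("{:?}", f.receive(&b(2))); // Duplicate
}
```

Making duplicates harmless is what lets a leader simply resend everything from the follower's reported position after a partition, without tracking exactly which batches were lost.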

Prefer containers? Build the provided image and run it anywhere:

```shell
docker build -f docker/cluster/Dockerfile -t tidal-cluster .
docker run --rm -p 9500:9500 tidal-cluster
```

Mount your own schema/topology files with -v if you want different regions or signal definitions.

6. Simulate a multi-region cluster in tests

The raw SimulatedCluster harness (no HTTP) remains available for property tests and fuzzing.

```shell
cargo test --test m8_uat
cargo test --test m8_uat uat_step3 -- --nocapture   # run a single scenario
```

Tweak tidal/tests/m8_uat.rs to script specific replication, failover, and migration scenarios inside your own test suites.

MSRV: Rust 1.91

Documentation

| Document | Contents |
|---|---|
| QUICKSTART.md | Step-by-step guide: schema, ingest, signals, ranking, search |
| API.md | Full API reference with code examples |
| VISION.md | Problem statement and design thesis |
| ARCHITECTURE.md | Storage, signal system, vector index, query pipeline |
| USE_CASES.md | 14 content discovery surfaces, filter and sort references |

Status

Milestones completed:

- Storage engine, WAL, entity store, signal ledger
- RETRIEVE query: candidate retrieval, filtering, scoring, diversity, pagination
- Vector index (USearch HNSW) with adaptive filtered search
- 25 built-in ranking profiles
- BM25 full-text search (Tantivy) + hybrid RRF fusion
- Creator search and creator profiles
- Cohort-scoped signal aggregation and trending
- Social graph (follows, blocks, following feed)
- Collections, saved searches, autocomplete suggestions
- Session and agent context (short-lived signals, preference decay)
- Crash recovery, graceful degradation, rate limiting, diagnostics
- Scale: tested to 1M items; scale benchmarks passing

The API surface is stable for the implemented features. Breaking changes are possible before 1.0.

License

MIT
