---
title: "Why we're building tidalDB"
date: "2026-02-20"
author: "Jordan Washburn"
description: "Every content platform builds the same 6-system stack from scratch. We're replacing it with one database."
tags: ["vision", "architecture"]
---

Every platform that serves personalized content (a media library, a social feed, a marketplace, a content discovery surface) eventually builds the same distributed system from scratch. Elasticsearch for retrieval. Redis for hot signals. Kafka for event ingestion. A feature store for user profiles. A vector database for semantic search. A ranking service that tries to stitch all of the above together into a single ordered list.

We've built this stack. We've operated it. We've watched the seams between systems become the place where correctness dies: stale signals in Redis that don't match Elasticsearch, Kafka consumers that lag by seconds when they should lag by zero, cache invalidation bugs that surface as "why did the user see that item again?"

The root cause is clear: none of these systems were built for the ranking problem. They treat it as an afterthought. A sort clause. A float field. A bolt-on scoring function.

## The observation

Ranking is not a feature. It is a primitive.

A signal that decays over time is not a field you update with a cron job. It is a type the database understands, with a half-life declared in schema and a decayed value computed at query time.

A "trending" sort is not a formula your application computes and stores in a column. It is a built-in sort mode that reads signal velocity natively.

A diversity constraint ("no more than 2 items from the same creator") is not post-processing logic in your API layer. It is a query parameter the database enforces after scoring.

Once you see it this way, the 6-system stack looks like what it is: scar tissue from forcing the wrong abstraction.

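
To make two of these ideas concrete, here is a minimal Rust sketch of a signal whose value is decayed at read time from a declared half-life, and of a per-creator cap applied after scoring. It is an illustration under stated assumptions, not tidalDB's implementation or API: `Event`, `Scored`, `decayed_value`, and `diversify` are hypothetical names, and plain exponential decay stands in for whatever scheme the database actually uses.

```rust
use std::collections::HashMap;

/// Hypothetical signal event: a weight recorded at a timestamp (in seconds).
#[derive(Debug, Clone, Copy)]
struct Event {
    at_secs: f64,
    weight: f64,
}

/// Decayed value of a signal at `now_secs` for a given half-life.
/// Each event contributes `weight * 0.5^(age / half_life)`, so nothing
/// has to be rewritten as time passes; the decay is applied on read.
/// Events timestamped after `now_secs` are ignored.
fn decayed_value(events: &[Event], half_life_secs: f64, now_secs: f64) -> f64 {
    events
        .iter()
        .filter(|e| e.at_secs <= now_secs)
        .map(|e| e.weight * 0.5_f64.powf((now_secs - e.at_secs) / half_life_secs))
        .sum()
}

/// Hypothetical scored candidate produced by a ranking step.
#[derive(Debug, Clone, PartialEq)]
struct Scored {
    item_id: u32,
    creator_id: u32,
    score: f64,
}

/// Keep at most `max_per_creator` items per creator, highest score first,
/// and stop once `limit` items have been selected.
fn diversify(mut scored: Vec<Scored>, max_per_creator: usize, limit: usize) -> Vec<Scored> {
    // Sort by descending score; total_cmp gives a total order over f64.
    scored.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut per_creator: HashMap<u32, usize> = HashMap::new();
    let mut out = Vec::with_capacity(limit.min(scored.len()));
    for s in scored {
        if out.len() == limit {
            break;
        }
        let count = per_creator.entry(s.creator_id).or_insert(0);
        if *count < max_per_creator {
            *count += 1;
            out.push(s);
        }
    }
    out
}
```

With a 7-day half-life, a single `view` of weight 1.0 reads as 0.5 seven days later. Calling `diversify(candidates, 2, 50)` drops a creator's third-best item even when it outscores items from other creators.
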
## What tidalDB is

A single-node-first, embeddable Rust database designed specifically for personalized content ranking. One process. One query interface. One operational model.

The core primitives:

- **Entities**: Items, Users, Creators. Each with metadata, an embedding slot, and an attached signal ledger.
- **Signals**: Typed, timestamped event streams with native decay, velocity, and windowed aggregation. You declare a `view` signal with a 7-day half-life. The database does the rest.
- **Ranking Profiles**: Named, versioned scoring functions that live in the database. Reference signals, relationships, recency curves, and diversity rules. Swap at query time by name.
- **One query**: Candidate retrieval, filtering, personalized ranking, and diversity enforcement in a single operation.

The query that currently takes 6 systems to produce:

```
RETRIEVE items
FOR USER @user_id
CONTEXT feed
USING PROFILE for_you
FILTER unseen, unblocked, format:video, duration:short
DIVERSITY max_per_creator:2, format_mix:true
LIMIT 50
```

## The feedback loop

When a user views, likes, skips, or hides content, the signal is written directly to the database. The item's signal ledger updates. The user's preference vector shifts. The relationship weight between user and creator adjusts. All atomically, all in the same write transaction.

The next ranking query, even 100ms later, reflects the updated state. No Kafka consumer to lag. No feature store sync to schedule. No cache to invalidate. The write path and the read path are one system.

## What we're building first

tidalDB is in active development. We're building in Rust, starting single-node, and working toward the first public release. The roadmap:

1. **Storage foundation**: WAL, entity store, signal ledger with forward-decay scoring
2. **Query engine**: The RETRIEVE/SEARCH/SUGGEST operations with filtering and ranking
3. **Vector and text search**: HNSW via USearch, BM25 via Tantivy, hybrid fusion with RRF
4. **The full query surface**: All sort modes, all filters, diversity enforcement, pagination

We're building in public. Every architectural decision, every benchmark result, every trade-off gets documented here.

## Why open source

The personalized content ranking problem is universal. Every content platform needs it. Making the solution proprietary would limit adoption to teams willing to vendor-lock on a database. That's not the goal.

The goal is a tool that an engineering team can embed in their process, point at their data, and get correct ranking in one query. Open source, MIT licensed, embeddable.

If you're operating a 6-system stack for content ranking and wondering why it has to be this hard: it doesn't. That's why we're building tidalDB.

---

Follow the build on [GitHub](https://github.com/orchard9/tidalDB) or read the next post when it drops.