---
title: "Every content platform builds the same 6 systems from scratch"
date: "2026-02-20"
author: "tidalDB"
description: "The Elasticsearch + Redis + Kafka + feature store + vector DB + ranking service stack is not an architecture. It is scar tissue. Here is why."
tags: ["architecture", "vision", "recommendation-systems"]
---
You have operated this system. You may be operating it right now.
Elasticsearch for retrieval. Redis for hot signals. Kafka for event ingestion. A feature store for user profiles. A vector database for semantic similarity. A ranking service that stitches all five together into a sorted list and hopes the data is consistent by the time it arrives.
Six systems. Six deployment targets. Six failure modes. Six sets of credentials, backup strategies, scaling characteristics, and on-call runbooks. All of them maintained by your team, all of them in service of one question: *given this user, right now, what should they see?*
This post is about why the stack exists, why it persists, and what should be true instead.
## The six systems, named
They show up in the same order at every company. The specifics vary -- Solr instead of Elasticsearch, Memcached instead of Redis, Pulsar instead of Kafka -- but the shape is identical.
**System 1: The search index.** Elasticsearch, Solr, or Typesense. Ingests your content catalog, tokenizes text, builds an inverted index, returns results ranked by BM25. It handles keyword search well. It handles everything else poorly. You will spend months teaching it to sort by "trending" using a score field that you update from outside, on a schedule, and that is stale before the update finishes.
**System 2: The cache layer.** Redis or Memcached. Holds the hot data that the search index cannot serve fast enough -- trending scores, view counts, precomputed ranking features. You will write a cache invalidation layer. It will have bugs. The bugs will manifest as users seeing content that should have been suppressed, or not seeing content that should have surfaced. These bugs will be intermittent, hard to reproduce, and never fully resolved.
**System 3: The event bus.** Kafka, Pulsar, or Kinesis. Ingests engagement signals -- views, likes, skips, shares -- and routes them to consumers that update every other system. The consumers will lag. Not always. Not predictably. But at 2am on a Saturday when a piece of content goes viral, the lag between "user liked this" and "the ranking query reflects it" will stretch from milliseconds to seconds to minutes. Your users will notice.
**System 4: The feature store.** Feast, Tecton, or a homegrown Redis-backed key-value store. Holds user profiles, engagement histories, computed features. Exists because the ranking service needs user context at query time and cannot afford to compute it on the fly. The feature store introduces its own consistency problem: the features it serves are snapshots. By the time they reach the ranker, the user may have liked three more items and blocked a creator. The features do not know this.
**System 5: The vector database.** Pinecone, Weaviate, Qdrant, Milvus, or pgvector bolted onto PostgreSQL. Holds content embeddings for semantic similarity search. Takes a user preference vector or a query embedding, returns the nearest neighbors. The problem: it knows nothing about signals, recency, relationships, or diversity. It returns semantically similar content. Whether that content is trending, stale, hidden by the user, or from a blocked creator -- not its concern.
**System 6: The ranking service.** The application you wrote. A microservice that calls systems 1 through 5 in sequence, merges their outputs, applies scoring logic, enforces diversity rules, handles edge cases, and returns a sorted list. This is the system that has the most bugs, the most latency, and the most institutional knowledge locked in the heads of two engineers who are not allowed to go on vacation at the same time.
Six systems. None of them were built for the ranking problem. All of them are pressed into service because there is no single system that was.
## Where correctness dies
The failure modes are not in the systems themselves. Redis is fast. Kafka is durable. Elasticsearch is a competent search engine. The failure modes live in the seams between them.
**Stale signals.** A user likes an item. The event enters Kafka. A consumer processes it and updates Redis. Another consumer updates the feature store. A third updates Elasticsearch's score field. Each update happens at a different time. Between the first update and the last, the ranking service is reading a mix of old and new state. The feed the user sees is computed from data that contradicts itself.
This is not a theoretical concern. It is Tuesday.
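To make the mixed-state read concrete, here is a minimal simulation. The store names and the fixed lag values are illustrative assumptions, not measurements from any real deployment; real consumer lag varies with load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    name: str
    lag_ms: int  # hypothetical, fixed consumer lag for this store

    def visible_likes(self, event_times_ms, now_ms):
        # An event is visible here only once this store's consumer has applied it.
        return sum(1 for t in event_times_ms if t + self.lag_ms <= now_ms)


def snapshot(stores, event_times_ms, now_ms):
    """What a ranking service would read from each store at now_ms."""
    return {s.name: s.visible_likes(event_times_ms, now_ms) for s in stores}


def is_consistent(view):
    """True only if every store reports the same like count."""
    return len(set(view.values())) == 1


stores = [Store("redis", 5), Store("feature_store", 400), Store("search_index", 2000)]
like_events_ms = [0, 100, 200]  # three likes on the same item

mid_flight = snapshot(stores, like_events_ms, now_ms=500)
settled = snapshot(stores, like_events_ms, now_ms=3000)
```

At 500 ms the three stores report three different like counts for the same item. Only after the slowest consumer catches up do they agree, and by then new events have usually arrived.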
**Cache invalidation.** The trending score in Redis says an item is hot. The engagement data in the feature store says it is not -- the initial burst of views came from a bot network and the quality signals collapsed an hour ago. The cache TTL has not expired. The item remains in the trending feed for another 14 minutes. Fourteen minutes is an eternity in a content platform. Thousands of users see a recommendation the system already knows is wrong.
**ETL lag.** The feature store runs a batch pipeline every 15 minutes to recompute user preference vectors. A user blocks a creator at minute 1. For the next 14 minutes, the blocked creator's content still appears in the user's feed. Not because the system is broken. Because the architecture is designed around batched state synchronization, and batched state synchronization is, by definition, eventually wrong.
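The arithmetic behind that 14-minute window is simple enough to write down. This sketch assumes batch runs fire at fixed multiples of the interval and complete instantly; a real pipeline also adds its own runtime on top.

```python
import math


def next_batch_run(event_min, interval_min):
    """First scheduled run at or after the event (runs at 0, interval, 2*interval, ...)."""
    return math.ceil(event_min / interval_min) * interval_min


def staleness_minutes(event_min, interval_min):
    """Minutes during which the event is not yet reflected in served features."""
    return next_batch_run(event_min, interval_min) - event_min


# A block at minute 1 with a 15-minute batch interval.
stale = staleness_minutes(event_min=1, interval_min=15)
```

Shrinking the interval shrinks the window but never closes it. Any nonzero batch interval leaves some period in which the served features are wrong.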
**The feedback gap.** A user skips three items in a row from the same creator. The skip events enter Kafka. They will eventually update the user's preference vector in the feature store and the creator's penalty score in Redis. Eventually. In the meantime, the ranking service is still using the stale preference vector and the stale creator score. It recommends a fourth item from the same creator. The user taps "Not interested." A fifth item appears. The user closes the app.
This is not a bug in any one system. It is the architecture working exactly as designed. The architecture is the bug.
**Agents make the seams worse.** When you add an LLM-mediated agent to the loop, the agent needs to ground its answer in fresh memory and emit feedback (preference hints, critiques, reward). In the 6-system stack those feedback signals live in a scratchpad or a sidecar vector store. None of the six systems knows about them, which means the agent is reasoning over a different world than the ranking service. Latency compounds; correctness dies even faster.
## How we got here
The 6-system stack is not the product of deliberate design. It is an accretion. Understanding how it forms explains why it persists.
**Phase 1: Search.** The platform launches with a content catalog and a search bar. The team picks Elasticsearch because it handles full-text search. This is a reasonable decision. Elasticsearch is good at search.
**Phase 2: Ranking.** Users want more than search. They want a feed -- personalized, sorted by relevance, refreshed on every visit. Elasticsearch can sort by a score field, so the team adds a `ranking_score` field and updates it with a cron job. The cron job reads engagement data from the application database, computes a formula, and writes the result to Elasticsearch. This works for six months.
**Phase 3: Speed.** The ranking formula needs real-time signal data -- view counts, like counts, trending velocity. The application database cannot serve these at the read frequency the ranking service demands. The team adds Redis as a hot cache. Now the ranking formula reads from Redis instead of the application database. Engagement data flows into Redis via application writes. This works, but cache invalidation becomes a recurring source of bugs.
**Phase 4: Scale.** The platform grows. Engagement events arrive at thousands per second. Writing directly to Redis and Elasticsearch from the application path introduces latency on every user action. The team adds Kafka as a buffer. Events flow into Kafka, and consumers asynchronously update Redis and Elasticsearch -- and, later, every system the team adds after them. The system is now eventually consistent. "Eventually" is doing a lot of work in that sentence.
**Phase 5: Personalization.** Users want personalized results, not just globally popular content. Personalization requires per-user feature vectors -- engagement history, topic affinity, creator preferences. These features are too expensive to compute at query time. The team adds a feature store that batch-computes user vectors and serves them as key-value lookups. The feature store is always stale by the duration of its batch interval.
**Phase 6: Semantic search.** Users expect "find me something like this" to work. Keyword matching cannot do this. The team adds a vector database for embedding-based similarity search. The vector database knows nothing about engagement, recency, or user context. The ranking service must call it separately and merge its results with the keyword results, the cached signals from Redis, and the user features from the feature store.
Each step is individually rational. The result is collectively irrational. A distributed system with six sources of truth, six consistency models, and one ranking service trying to produce a coherent answer from all of them.
## The root cause
The stack exists because existing databases were not built with ranking in mind. This is not a criticism -- PostgreSQL, Elasticsearch, and Redis were built to solve different problems, and they solve them well. But when you ask a search engine to be a ranking engine, you inherit the wrong abstraction.
A search engine models data as documents with fields. You search for documents matching a query. You sort by a field. The field is a static value that you update from outside.
But ranking is not a static value. A "trending score" is a velocity -- the rate of change of engagement signals over a time window. It changes every second. An "engagement decay score" is a function of time since the last signal event. It changes continuously, without any new data arriving. A "personalized relevance score" is a function of the user's preference vector, the item's embedding, the user's relationship to the creator, the item's signal history, and the diversity of the current result set. It is different for every user, every query, every moment.
None of these are fields. They are computations that depend on temporal state, user context, and signal dynamics. Forcing them into a field-update model is what creates the 6-system stack. You need Redis because the search engine cannot compute these values fast enough. You need Kafka because updating them synchronously is too slow. You need a feature store because user context is too expensive to derive at query time. You need a vector database because semantic similarity is a different index structure entirely.
The seams are not incidental. They are structural. They exist because the foundational abstraction -- data as documents with static fields -- does not fit the problem.
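The claim that a decay score or a trending score is a computation rather than a field can be shown in a few lines. The exponential half-life form and the trailing-window velocity below are assumptions chosen for illustration; the post does not specify which decay function or window definition a real system would use.

```python
def decayed_score(weight, last_event_ts, now_ts, half_life_s):
    """Exponential decay (assumed form): the value halves every half_life_s seconds."""
    age_s = max(0.0, now_ts - last_event_ts)
    return weight * 0.5 ** (age_s / half_life_s)


def velocity(event_ts, now_ts, window_s):
    """Events per second inside the trailing window (now - window_s, now]."""
    recent = sum(1 for t in event_ts if now_ts - window_s < t <= now_ts)
    return recent / window_s


# The decayed score changes with the clock alone; no new event arrives in between.
after_one_half_life = decayed_score(10.0, last_event_ts=0, now_ts=3600, half_life_s=3600)
after_two_half_lives = decayed_score(10.0, last_event_ts=0, now_ts=7200, half_life_s=3600)

# Two of the five events fall inside the last 10 seconds.
trending = velocity([0, 10, 50, 55, 58], now_ts=60, window_s=10)
```

Both functions take `now_ts` as an input. A stored field has no such input, which is why a field-update model needs an external job to keep rewriting it.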
## What should be true
A database that understands ranking as a primitive would not need the stack. Here is what it would look like.
**Signals are a schema-level type.** A "view" signal is not a counter you increment in Redis and hope stays consistent. It is a typed, timestamped event stream declared in the database schema, with a decay rate, a set of time windows, and velocity computation -- all maintained by the database. You write the event. The database handles aggregation, windowing, and decay. When you query for "trending," the database reads signal velocity directly. No external cache. No stale scores.
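As a rough sketch of what "declared in the schema" could mean, the toy ledger below takes a signal definition with a decay half-life and a set of windows, accepts raw events, and answers aggregate reads itself. `SignalDef` and `SignalLedger` are hypothetical names invented for this example, not tidalDB's API, and the in-memory list stands in for real storage.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalDef:
    name: str
    half_life_s: float   # decay rate, declared once with the signal
    windows_s: tuple     # time windows the ledger aggregates over


class SignalLedger:
    def __init__(self, definitions):
        self.defs = {d.name: d for d in definitions}
        self.events = defaultdict(list)  # (signal, item_id) -> [timestamps]

    def write(self, signal, item_id, ts):
        # The caller writes the raw event only; aggregation happens on read.
        if signal not in self.defs:
            raise KeyError(f"undeclared signal: {signal}")
        self.events[(signal, item_id)].append(ts)

    def window_counts(self, signal, item_id, now):
        ts = self.events.get((signal, item_id), [])
        return {w: sum(1 for t in ts if now - w < t <= now)
                for w in self.defs[signal].windows_s}

    def decayed_total(self, signal, item_id, now):
        half_life = self.defs[signal].half_life_s
        ts = self.events.get((signal, item_id), [])
        return sum(0.5 ** ((now - t) / half_life) for t in ts if t <= now)


ledger = SignalLedger([SignalDef("view", half_life_s=3600, windows_s=(60, 3600))])
for ts in (10, 3590, 3595):
    ledger.write("view", "item-a", ts)

counts = ledger.window_counts("view", "item-a", now=3600)
total = ledger.decayed_total("view", "item-a", now=3600)
```

The application never computes a window or a decay factor. It declares them once and writes events.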
**User context is a database-managed state.** The user's preference vector is not a row in a feature store updated every 15 minutes. It is a living embedding that the database shifts every time the user engages with content. A like shifts it toward the item's embedding. A skip adds the item to a hard-negative bitmap -- the user never sees it again. The next query reflects both. Not in 15 minutes. Now.
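A minimal sketch of that update rule, under stated assumptions: the shift is a linear interpolation toward the item embedding followed by re-normalization, the `rate` value is arbitrary, and a Python `set` stands in for the hard-negative bitmap. None of these choices is taken from tidalDB's implementation.

```python
import math


def shift_toward(pref, item_emb, rate=0.1):
    """Move pref a fraction `rate` of the way toward item_emb, then re-normalize."""
    moved = [p + rate * (e - p) for p, e in zip(pref, item_emb)]
    norm = math.sqrt(sum(x * x for x in moved)) or 1.0
    return [x / norm for x in moved]


class UserState:
    def __init__(self, pref):
        self.pref = list(pref)
        self.hard_negatives = set()  # stand-in for a bitmap of excluded item ids

    def like(self, item_emb):
        self.pref = shift_toward(self.pref, item_emb)

    def skip(self, item_id):
        self.hard_negatives.add(item_id)

    def eligible(self, item_id):
        return item_id not in self.hard_negatives


user = UserState(pref=[1.0, 0.0])
user.like(item_emb=[0.0, 1.0])   # preference moves toward the liked item
user.skip("item-2")              # item-2 is excluded from later queries
```

Both effects are visible to the very next read of `user.pref` and `user.eligible(...)`; there is no batch job between the engagement and the state it changes.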
**The write path and the read path are one system.** When a user likes an item, the database atomically updates the item's signal ledger, the user's preference vector, and the user-to-creator relationship weight. No event bus between the engagement and the ranking update. No consumer lag. No eventual consistency. The write *is* the ranking update.
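The shape of that guarantee can be sketched with a single in-process lock: one write touches all three pieces of state inside one critical section, and reads go through the same lock, so a reader never observes a partial update. This is only an illustration. A lock is a stand-in for a real transaction, it says nothing about durability, and the class and field names are hypothetical.

```python
import threading


class RankingState:
    def __init__(self):
        self._lock = threading.Lock()
        self.item_likes = {}        # item_id -> like count
        self.user_liked = {}        # user_id -> set of liked item ids
        self.creator_affinity = {}  # (user_id, creator_id) -> weight

    def like(self, user_id, item_id, creator_id):
        # All three updates happen inside one critical section.
        with self._lock:
            self.item_likes[item_id] = self.item_likes.get(item_id, 0) + 1
            self.user_liked.setdefault(user_id, set()).add(item_id)
            key = (user_id, creator_id)
            self.creator_affinity[key] = self.creator_affinity.get(key, 0.0) + 1.0

    def read(self, user_id, item_id, creator_id):
        # Readers take the same lock, so they see all of a write or none of it.
        with self._lock:
            return (
                self.item_likes.get(item_id, 0),
                item_id in self.user_liked.get(user_id, set()),
                self.creator_affinity.get((user_id, creator_id), 0.0),
            )


state = RankingState()
before = state.read("u1", "i1", "c1")
state.like("u1", "i1", "c1")
after = state.read("u1", "i1", "c1")
```

Contrast this with the Kafka fan-out, where the same three updates land in three systems at three different times.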
**Negative signals are equal citizens.** A skip is not the absence of a like. It is data. A hide is a permanent exclusion. A block removes all of a creator's content from all future queries. These are not afterthought filter operations applied in the ranking service. They are first-class signal types with their own decay rates, their own velocity, and their own weight in the scoring function.
**Diversity is a query constraint.** "No more than 2 items per creator" is not a post-processing step in your API layer. It is a parameter the database enforces after scoring, as part of the query execution pipeline. The application specifies the constraint. The database enforces it. The result set is reordered, not reduced.
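One plausible reading of "reordered, not reduced" is sketched below: walk the scored list in order, let each creator take at most `max_per_creator` slots, and move the overflow to the end instead of dropping it. This interpretation and the function name are assumptions; the post does not describe the actual enforcement algorithm.

```python
def enforce_max_per_creator(ranked, max_per_creator):
    """ranked: list of (item_id, creator_id) in score order, best first.

    Returns the same items, with each creator's overflow deferred to the end.
    """
    counts = {}
    kept, deferred = [], []
    for item_id, creator_id in ranked:
        if counts.get(creator_id, 0) < max_per_creator:
            counts[creator_id] = counts.get(creator_id, 0) + 1
            kept.append((item_id, creator_id))
        else:
            deferred.append((item_id, creator_id))
    return kept + deferred


ranked = [("a1", "A"), ("a2", "A"), ("a3", "A"), ("b1", "B"), ("c1", "C")]
diverse = enforce_max_per_creator(ranked, max_per_creator=2)
```

The output has the same length as the input. The third item from creator A is pushed behind `b1` and `c1` rather than removed.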
**All sort modes are native.** Trending, hot, rising, controversial, hidden gems, top-this-week, shuffle -- these are not formulas your application computes and passes to the database as a sort key. They are built-in sort modes the database executes natively, using signal velocity, windowed aggregation, and decay functions it already maintains.
This is not a fantasy. Every one of these properties follows from a single architectural decision: model signals, decay, velocity, and user context as database primitives, not as application logic distributed across six systems.
## One question, one query
The 6-system stack exists to answer one question: given this user, right now, what should they see?
That question should be one query.
Not six network calls. Not a ranking service that merges five data sources and hopes they agree. Not a system where "consistency" means "consistent within each subsystem, inconsistent across all of them."
One query that retrieves candidates, applies filters, scores using live signals and user context, enforces diversity, and returns a ranked list. One query where the data is never stale because the write path and the read path share a storage model. One query where a signal written 100 milliseconds ago is reflected in the result.
```
RETRIEVE items
FOR USER @user_id
CONTEXT feed
USING PROFILE for_you
FILTER unseen, unblocked, format:video, duration:short
DIVERSITY max_per_creator:2, format_mix:true
LIMIT 50
```
That is what six systems currently produce. It should be one query that an agent can issue, write its feedback against, and trust to be correct on the next round.
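To show how the stages compose when they share one storage model, here is a toy in-memory version of that pipeline: filter, score, diversify, limit. Everything in it is an assumption made for illustration, including the `Item` fields, the scoring formula (dot product plus velocity), and the function signature. It does not implement the query language above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    id: str
    creator: str
    fmt: str
    embedding: tuple
    velocity: float  # live signal velocity, assumed to be maintained elsewhere


def retrieve(items, user_pref, seen, blocked, fmt, max_per_creator, limit):
    # Stage 1: filters (unseen, unblocked, format).
    candidates = [it for it in items
                  if it.id not in seen and it.creator not in blocked and it.fmt == fmt]

    # Stage 2: score with user context plus live signals (hypothetical formula).
    def score(it):
        return sum(p * e for p, e in zip(user_pref, it.embedding)) + it.velocity

    ranked = sorted(candidates, key=score, reverse=True)

    # Stage 3: diversity, deferring each creator's overflow instead of dropping it.
    counts, kept, deferred = {}, [], []
    for it in ranked:
        if counts.get(it.creator, 0) < max_per_creator:
            counts[it.creator] = counts.get(it.creator, 0) + 1
            kept.append(it)
        else:
            deferred.append(it)

    # Stage 4: limit.
    return [it.id for it in (kept + deferred)[:limit]]


catalog = [
    Item("v1", "A", "video", (1.0, 0.0), 0.5),
    Item("v2", "A", "video", (0.9, 0.1), 0.3),
    Item("v3", "A", "video", (0.8, 0.2), 0.1),
    Item("v4", "B", "video", (0.2, 0.8), 0.4),
    Item("v5", "C", "video", (1.0, 0.0), 1.0),  # creator C is blocked
    Item("p1", "B", "image", (1.0, 0.0), 0.9),  # wrong format
    Item("v6", "B", "video", (1.0, 0.0), 0.9),  # already seen
]
feed = retrieve(catalog, user_pref=(1.0, 0.0), seen={"v6"}, blocked={"C"},
                fmt="video", max_per_creator=2, limit=3)
```

Every stage reads the same state in the same call, so there is nothing to reconcile afterward.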
The database that treats ranking as a primitive -- not as an afterthought bolted on top of a search engine, not as a formula computed in a microservice, not as a cache warmed from a batch pipeline -- does not need the stack. It replaces it.
## A fair read of the existing systems
To be clear: these systems are good at what they were designed to do.
- **Search indexes (Elasticsearch, Solr, Typesense):** excellent full-text retrieval, BM25 relevance, and query/filter infrastructure.
- **Caches (Redis, Memcached):** excellent low-latency read/write paths for hot counters and precomputed features.
- **Event buses (Kafka, Pulsar, Kinesis):** excellent durable, high-throughput event transport and decoupled consumer architectures.
- **Feature stores (Feast, Tecton, homegrown):** excellent offline/online feature serving patterns for ML pipelines.
- **Vector databases (Pinecone, Weaviate, Qdrant, Milvus, pgvector):** excellent nearest-neighbor retrieval over embeddings with metadata filtering.
- **Ranking services (custom microservices):** excellent place to encode product-specific ranking logic when no single system owns the full problem.
- **Integrated retrieval/ranking platforms (for example, Vespa):** excellent end-to-end search and ranking infrastructure when teams can operate larger specialized serving systems.
**What makes tidalDB different (one line):** it treats signals, user context, ranking, diversity, and feedback writes as one atomic database system instead of six synchronized subsystems.
**Where we are intentionally focused:** personalized content loops where feedback intent is explicit -- `skip_for_now` (soft), `not_for_me` (preference), `low_quality` (quality), `hide/mute` (hard exclude) -- and the next ranked result updates immediately. We are not aiming for the breadth of generic search infrastructure.
Every content platform builds the same 6 systems because no database was built for this problem. The stack is not an architecture. It is scar tissue from the absence of one.

---
*tidalDB is an open-source, embeddable Rust database for personalized content ranking. Follow the build on [GitHub](https://github.com/orchard9/tidalDB).*