jordan 192c473f55 feat: complete Milestone 5 — full-text search, RRF fusion, and creator search

- M5p1: BM25 text indexing via Tantivy with background syncer (0.26ms @ 10K docs)
- M5p2: RRF fusion layer combining BM25 + ANN scores (46µs @ 1K candidates)
- M5p3: unified Search query API (8-stage pipeline, BM25 + vector + ranking)
- M5p4: creator text + vector indexing and creator search executor (< 20ms @ 200 creators)
- Refactor db/mod.rs into focused sub-modules (creators, items, sessions, signals, etc.)
- Decompose monolithic files into directory modules (query/executor, ranking/diversity, etc.)
- Split brute.rs → brute/mod.rs + brute/tests.rs; extract search executor helpers
- Add benches: fusion, search, session, text_index
- Add M5 UAT test suites (m5_uat, m5_search, m5p4_creator_search, text_index)
- Update blog posts, roadmap, content strategy, and M5 planning docs
- Add tmp/ and .claude/worktrees/ to .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-21 23:53:16 -07:00


Content Strategy

Blog posts mapped to the tidalDB roadmap. Each entry identifies the moment worth writing about, the thesis that makes it shareable, and the type of post it demands.

The audience is engineers who have built or are currently maintaining recommendation and discovery systems -- the people running the 6-system stack this database replaces. They know what Kafka lag feels like at 3am. They know why cache invalidation bugs in the ranking pipeline are the ones that never get root-caused. They will smell marketing language from the first sentence. Respect that.

Publishing Principles

Write when something is true, not when something is scheduled. A blog post published the day a milestone passes its UAT is credible. A blog post published before the code works is fiction.

One insight per post. The reader should leave with a single idea they did not have before. If the post contains two insights, it is two posts.

Code proves claims. Every technical assertion is backed by a code example or a benchmark number from the actual codebase. Not a prototype. Not a plan. The shipped code.

The title is the thesis. If the title does not work as a standalone sentence that makes an engineer stop scrolling, the post is not ready.

Content Calendar

Pre-Implementation (Now)

These posts can be written before the engine is feature-complete. They draw on the vision, architecture research, and the problem space -- not on shipped code.

Post 1: "Every content platform builds the same 6 systems from scratch" [PUBLISHED]

Type: Vision / Problem Statement
Thesis: The Elasticsearch + Redis + Kafka + feature store + vector DB + ranking service stack is not an architecture. It is scar tissue. The seams between these systems are where correctness dies.
Source material: VISION.md, thoughts.md (Part VI)
When to publish: Any time. This post defines the problem and does not depend on implementation progress.
Why it matters: This is the foundational narrative. Every subsequent post assumes the reader understands this problem. It also serves as the litmus test for whether the audience cares -- if this post does not resonate, the subsequent ones will not either.
Structure: Problem statement. The 6 systems named and indicted. The seams enumerated (stale signals, ETL lag, cache invalidation, operational burden). The thesis: ranking is not a feature, it is a primitive. End with the one-query vision, not with a product pitch.

Milestone 1: Signal Engine

M1 proves that temporal signals with O(1) decay, velocity, and windowed aggregation work as a database primitive. This is the most technically interesting milestone for blog content because the math is elegant and the performance numbers are dramatic.

Post 2: "Running decay scores are O(1) -- here is the math" [PUBLISHED]

Type: Technical Deep Dive
Roadmap phase: m1p4 (Signal Ledger) completion
Thesis: The forward-decay formula S(t) = S(t_prev) * exp(-lambda * dt) + weight eliminates raw-event scanning at query time. Three exp() calls on write, one on read. 15 nanoseconds per entity. Every platform computing trending_score = views / (age + 2)^1.8 in application code is doing O(N) work that should be O(1).
Source material: docs/research/tidaldb_signal_ledger.md, ARCHITECTURE.md (Signal System section), m1p4 task docs
When to publish: After m1p4 passes UAT with benchmark numbers in hand.
Code to include: The EntitySignalState struct. The forward-decay write path. The out-of-order event correction. Benchmark output showing 200-entity scoring pass under 5 microseconds.
Why it matters: This is the post that demonstrates tidalDB is not vaporware. The math is verifiable. The benchmarks are reproducible. Engineers who have implemented trending scores in Redis will immediately understand the value.
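
As a reading aid for the thesis above, here is a minimal sketch of the forward-decay update with a single decay rate. `DecayState`, `record`, and `score_at` are hypothetical names, not the shipped `EntitySignalState`, and the out-of-order branch is one plausible correction, assumed rather than taken from the codebase.

```rust
/// Hypothetical running-score state for one entity and one decay rate.
#[derive(Debug, Clone, Copy)]
pub struct DecayState {
    pub score: f64,
    pub last_ts: f64, // seconds
}

impl DecayState {
    pub fn new(ts: f64) -> Self {
        Self { score: 0.0, last_ts: ts }
    }

    /// Write path: S(t) = S(t_prev) * exp(-lambda * dt) + weight.
    pub fn record(&mut self, ts: f64, weight: f64, lambda: f64) {
        if ts >= self.last_ts {
            let dt = ts - self.last_ts;
            self.score = self.score * (-lambda * dt).exp() + weight;
            self.last_ts = ts;
        } else {
            // Out-of-order event (assumed handling): decay the late
            // event's weight forward to the stored timestamp.
            let age = self.last_ts - ts;
            self.score += weight * (-lambda * age).exp();
        }
    }

    /// Read path: one exp(), no scan over raw events.
    pub fn score_at(&self, now: f64, lambda: f64) -> f64 {
        let dt = (now - self.last_ts).max(0.0);
        self.score * (-lambda * dt).exp()
    }
}
```

Neither path touches the event history, which is the whole argument: the cost per entity is constant no matter how many events have been written.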

Post 3: "What three databases taught us before we wrote a line of code" [PUBLISHED]

Type: Architecture Decision Record
Roadmap phase: m1p1-m1p3 completion (the foundation phases)
Thesis: We studied Engram (cognitive memory), Citadel (append-only logging), and StemeDB (knowledge graph) -- three purpose-built databases in the same codebase -- and stole their best patterns. WAL-first durability from Citadel. Cache-line aligned hot structs from Engram. Subject-prefix key encoding from StemeDB. Background materialization from StemeDB. Here is what converged and what the gaps taught us.
Source material: thoughts.md (all six parts), CODING_GUIDELINES.md
When to publish: After m1p3 (Storage Engine) is complete. The patterns referenced are already implemented.
Code to include: Key encoding format. Cache-line aligned struct. Group commit writer. Side-by-side comparison of the pattern in the source database and in tidalDB.
Why it matters: Engineers respect builders who study prior art. This post establishes technical credibility and shows the architectural foundation is grounded in real patterns, not invented from scratch.

Post 4: "Signals wrote 100ms ago. The query sees them now." [PUBLISHED]

Type: Devlog / Milestone Announcement
Roadmap phase: m1p5 (Entity CRUD and Signal Write API) -- M1 complete
Thesis: Milestone 1 is done. A developer can open a tidalDB instance, define signal types with decay rates and windows, write 10,000 engagement events, and read back decay-correct scores that match analytical computation to 6 decimal places. Including after a crash. The UAT scenario passes.
Source material: The m1p5 integration test, benchmark results, git log for the M1 period
When to publish: The day M1 UAT passes.
Code to include: The full UAT scenario (or a clean excerpt). TidalDB::open() with schema. Signal write. Decay score read. Before/after crash recovery.
Why it matters: This is the first "it works" post. It converts skeptics from "interesting idea" to "this is real." The UAT code is the proof.

Milestone 2: Ranked Retrieval

M2 proves that a single query can retrieve, filter, score, and enforce diversity over live signals. This is where tidalDB stops being a signal engine and starts being a database.

Post 5: "One query. Six systems. Under 50 milliseconds." [PUBLISHED]

Type: Technical Deep Dive / Announcement
Roadmap phase: m2p5 (RETRIEVE Query Executor) -- M2 complete
Thesis: RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25 executes in under 50ms on 10K items. It retrieves candidates via ANN, filters by metadata, scores using live decay signals and velocity, enforces diversity, and returns a ranked list. That is what Elasticsearch + Redis + a ranking service produce. It is one query here.
Source material: m2p5 integration test, benchmark results, the dependency DAG showing how all M2 phases compose
When to publish: After M2 UAT passes.
Code to include: The RETRIEVE query. The ranked result with signal snapshots. The trending profile definition. A before/after signal burst showing the ranking change.
Why it matters: This is the money post. The one-query thesis is no longer a vision document -- it is a benchmark. Engineers who operate the 6-system stack will immediately understand what this eliminates.

Post 6: "Diversity enforcement in 3 microseconds" [PUBLISHED]

Type: Technical Deep Dive
Roadmap phase: m2p4 (Diversity Enforcement)
Thesis: "No more than 2 items per creator" does not belong in your API layer. It belongs in the query. tidalDB enforces diversity as a post-scoring reordering pass -- it does not reduce result count. The greedy selection algorithm runs in under 3 microseconds for 200 candidates.
Source material: m2p4 task docs, VISION.md (diversity section), benchmark results
When to publish: After m2p4 is complete.
Code to include: The DiversitySpec. The greedy selector. A concrete example showing reordering (creator A dominates pre-diversity, balanced post-diversity). Benchmark numbers.
Why it matters: Every team building a feed implements diversity in the API layer. Showing that it belongs in the database -- and costs 3 microseconds -- is a strong differentiator. This is the kind of post that gets shared in Slack channels.
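
A sketch of what a greedy max-per-creator pass could look like. `Scored` and `enforce_max_per_creator` are hypothetical names. Appending over-quota items to the tail is an assumption made here so that the result count is preserved; the source does not say how the real selector places them.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub item_id: u64,
    pub creator_id: u64,
    pub score: f32,
}

/// Greedy reordering pass. `ranked` must already be sorted by score,
/// highest first. Items beyond a creator's quota are deferred to the
/// tail, so the output has the same length as the input.
pub fn enforce_max_per_creator(ranked: Vec<Scored>, max_per_creator: usize) -> Vec<Scored> {
    let mut seen: HashMap<u64, usize> = HashMap::new();
    let mut kept = Vec::with_capacity(ranked.len());
    let mut deferred = Vec::new();
    for item in ranked {
        let count = seen.entry(item.creator_id).or_insert(0);
        if *count < max_per_creator {
            *count += 1;
            kept.push(item);
        } else {
            deferred.push(item);
        }
    }
    kept.extend(deferred);
    kept
}
```

The pass is a single walk over the candidates with one hash lookup each, which is why the cost stays small at 200 candidates.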

Post 7: "Ranking profiles are data, not code" [PUBLISHED]

Type: Architecture Decision Record
Roadmap phase: m2p3 (Ranking Profile Engine)
Thesis: Changing how content is ranked should not require a code change, a deployment, or a restart. tidalDB treats ranking profiles as versioned schema declarations. Define a profile. Name it. Swap it at query time. A/B test two profiles by name. The database executes the entire pipeline.
Source material: m2p3 task docs, API.md (ranking profiles section), VISION.md
When to publish: After m2p3 is complete.
Code to include: A trending profile definition. A for_you profile definition. The same RETRIEVE query with two different profile names producing different orderings. The profile versioning API.
Why it matters: This reframes ranking as a database concern. Engineers who maintain ranking services as separate microservices will recognize the operational simplification.

Milestone 3: Personalized Ranking

M3 is where the feedback loop closes. Signal writes update the user's preference vector, the creator's interaction weight, and the item's signal ledger -- atomically, in one write. The "For You" query works.

Post 8: "The feedback loop that closes in one write" [PUBLISHED]

Type: Technical Deep Dive
Roadmap phase: m3p2 (Feedback Loop) completion
Thesis: When a user likes an item, the database atomically updates: the item's like count, the user-to-creator interaction weight, and the user's preference vector (shifted toward the item's embedding). One db.signal("like", ...) call. No Kafka consumer to lag. No feature store to sync. No cache to invalidate. The next ranking query -- even 100ms later -- reflects the change.
Source material: m3p2 task docs, ARCHITECTURE.md (Write Path section), SEQUENCE.md
When to publish: After m3p2 passes UAT.
Code to include: The signal write. The 10-step atomic update path. A before/after query showing the preference shift. The property test that proves hidden items and blocked creators never surface.
Why it matters: The closed feedback loop is the core architectural thesis of tidalDB. This post proves it works. It is the strongest argument against the 6-system stack, because the stack's primary failure mode is feedback lag.
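
To make the "one write, three updates" claim concrete, here is an in-memory sketch. `State`, `Item`, `like`, and the `rate` parameter are all hypothetical, and the preference shift is modeled as a simple exponential moving average, which is an assumption. It shows the shape of the update only; durability and the 10-step path are out of scope.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory state touched by a single like.
#[derive(Default)]
pub struct State {
    pub like_counts: HashMap<u64, u64>,           // item -> likes
    pub creator_weight: HashMap<(u64, u64), f32>, // (user, creator) -> weight
    pub preference: HashMap<u64, Vec<f32>>,       // user -> preference vector
}

pub struct Item<'a> {
    pub id: u64,
    pub creator: u64,
    pub embedding: &'a [f32],
}

impl State {
    /// One call updates all three pieces of state before returning.
    pub fn like(&mut self, user: u64, item: &Item<'_>, rate: f32) {
        *self.like_counts.entry(item.id).or_insert(0) += 1;
        *self.creator_weight.entry((user, item.creator)).or_insert(0.0) += 1.0;
        let pref = self
            .preference
            .entry(user)
            .or_insert_with(|| vec![0.0; item.embedding.len()]);
        // Shift the preference vector toward the item's embedding.
        for (p, x) in pref.iter_mut().zip(item.embedding.iter()) {
            *p += rate * (*x - *p);
        }
    }
}
```

Because every update happens behind one `&mut self` call, a reader that runs after `like` returns cannot observe the count without the preference shift.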

Post 9: "Negative signals are equal citizens" [PUBLISHED]

Type: Architecture Decision Record
Roadmap phase: m3p2 (Feedback Loop)
Thesis: A skip is not the absence of a like. It is data. tidalDB treats negative signals -- skips, hides, blocks, "not interested" -- with the same precision and immediacy as positive signals. A skip within 3 seconds is a strong quality signal. A hide creates a permanent exclusion. A block removes all of a creator's content from all future queries. These are not afterthoughts. They are first-class signal types with their own decay rates, velocity, and ranking weight.
Source material: VISION.md (negative signals section), USE_CASES.md (UC-01 feedback), m3p2 task docs
When to publish: After m3p2 is complete. Can be bundled with or separated from Post 8.
Code to include: Signal type definitions for skip, hide, block. The penalty clause in a ranking profile. The property test: 10,000 random signal sequences never produce a result where a hidden item or blocked creator appears.
Why it matters: Most recommendation systems handle negative feedback as an afterthought -- a manual "not interested" button that writes to a separate blocklist. tidalDB's approach is architecturally different and engineers building these systems will recognize the improvement immediately.

Post 10: "Cold start without application logic" [PUBLISHED]

Type: Technical Deep Dive
Roadmap phase: m3p3 (Personalized Ranking Profiles)
Thesis: New items with no signals get an exploration budget. New users with no history get a sensible default from population-level signals. The application does not manage either. The exploration rate decays as signals accumulate. This is declared per ranking profile, not implemented in application code.
Source material: m3p3 task docs, VISION.md (cold start section)
When to publish: After m3p3 is complete.
Code to include: The exploration budget in a profile definition. A new item appearing in a for_you feed despite having zero signals. The decay of exploration as signals arrive.
Why it matters: Cold start is the problem everyone hacks around and no one solves cleanly. Showing a database-native solution is a strong differentiator.
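
One way to picture an exploration budget that decays as signals accumulate. The halving form, the function names, and the `half_life_signals` parameter are assumptions for illustration; the source does not specify the decay shape.

```rust
/// Hypothetical exploration bonus: starts at `budget` for an item with
/// zero signals and halves every `half_life_signals` signals received.
pub fn exploration_bonus(budget: f64, signal_count: u64, half_life_signals: f64) -> f64 {
    budget * 0.5_f64.powf(signal_count as f64 / half_life_signals)
}

/// Final score is the signal-derived base score plus the bonus.
pub fn final_score(base: f64, budget: f64, signal_count: u64, half_life_signals: f64) -> f64 {
    base + exploration_bonus(budget, signal_count, half_life_signals)
}
```

With a budget of 0.2 and a half-life of 50 signals, a brand-new item competes with a 0.2 head start, and the head start is effectively gone after a few hundred signals.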

Milestone 5: Hybrid Search

M5 merges full-text search with semantic similarity and signal-ranked results. Search and retrieval become the same system.

Post 11: "Search and ranking are the same system" [PUBLISHED]

Type: Technical Deep Dive / Architecture Preview
Roadmap phase: Published during M4. Describes the architecture that M5 will complete.
Status: PUBLISHED. Written as an architectural intent post -- explains the unified pipeline design, what is already built (RETRIEVE, USearch, ranking), and what remains (Tantivy, RRF fusion, SEARCH query). Does not claim SEARCH is shipped.
Thesis: Text retrieval, vector retrieval, and signal-based ranking belong in one pipeline. The data model is already unified. Fusion is arithmetic. The RETRIEVE pipeline, the HNSW index, and the ranking profiles are all in place. Three pieces of wiring remain.
Source material: Published post at site/content/blog/search-and-ranking.mdx
Why it matters: Frames the M5 work for the audience before it ships. The architectural argument stands regardless of what's wired today.
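
"Fusion is arithmetic" can be shown with standard reciprocal rank fusion (RRF): each document scores the sum of 1 / (k + rank) across the ranked lists it appears in. The sketch below is generic RRF with hypothetical names, not the tidalDB fusion layer; k = 60 is the conventional default, not a value taken from the codebase.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion over any number of ranked ID lists.
/// Ranks are 1-based; ties break on the lower ID for determinism.
pub fn rrf_fuse(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (idx, id) in list.iter().enumerate() {
            let rank = idx as f64 + 1.0;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

RRF only needs ranks, not comparable scores, which is what lets a BM25 list and an ANN list be merged without normalizing either one.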

Post 12: "Tantivy as a derived index, not a source of truth"

Type: Architecture Decision Record
Roadmap phase: m5p1 (Tantivy Integration)
Thesis: The entity store is the source of truth. Tantivy is a materialized view. If the index is corrupted, it can be rebuilt from the entity store. Crash recovery replays from a stored sequence number. Consistency is DB-primary, not two-phase commit. This is simpler, deterministic, and the right model for an embedded database.
Source material: docs/research/tantivy.md, m5p1 task docs (once written), ARCHITECTURE.md
When to publish: After m5p1 is complete.
Code to include: The outbox pattern. The crash recovery sequence number. The background indexer. The consistency model.
Why it matters: This is a useful architectural pattern beyond tidalDB. Engineers building systems with derived indexes will find this directly applicable.
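
The derived-index model can be sketched without Tantivy at all. Below, a `BTreeMap` stands in for the text index, and `Change`, `DerivedIndex`, and `catch_up` are hypothetical names. The assumption is that the outbox is ordered by sequence number and that the index persists the last sequence it applied.

```rust
use std::collections::BTreeMap;

/// Hypothetical outbox entry written by the source of truth.
/// `text: None` means the document was deleted.
#[derive(Debug, Clone)]
pub struct Change {
    pub seq: u64,
    pub id: u64,
    pub text: Option<String>,
}

/// Stand-in for the derived text index plus its stored sequence number.
#[derive(Debug, Default, PartialEq)]
pub struct DerivedIndex {
    pub docs: BTreeMap<u64, String>,
    pub applied_seq: u64,
}

impl DerivedIndex {
    /// Apply every entry newer than `applied_seq`. Replaying entries
    /// that were already applied is a no-op, so recovery is idempotent.
    pub fn catch_up(&mut self, outbox: &[Change]) {
        for change in outbox {
            if change.seq <= self.applied_seq {
                continue;
            }
            match &change.text {
                Some(text) => {
                    self.docs.insert(change.id, text.clone());
                }
                None => {
                    self.docs.remove(&change.id);
                }
            }
            self.applied_seq = change.seq;
        }
    }
}
```

An index that crashed partway through and an index rebuilt from an empty state converge to the same contents, which is the property that removes the need for two-phase commit.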

Milestone 6: Full Surface Coverage

M6 completes all 14 use cases. The content here shifts from "how does the engine work" to "what can you build with it."

Post 13: "14 use cases, one query engine"

Type: Devlog / Announcement
Roadmap phase: M6 complete
Thesis: For You feeds, trending, search, following, related content, notifications, hidden gems, controversial, live content, creator discovery, user library, cohort-scoped trending -- every surface a content platform needs, driven by the same query primitives. The application specifies profiles, filters, and context. The database executes ranking.
Source material: USE_CASES.md, M6 UAT results
When to publish: After M6 UAT passes.
Code to include: A curated selection of 4-5 queries spanning different surfaces (for_you, trending, search, hidden_gems, cohort_trending). Each with a brief setup and result.
Why it matters: This is the completeness post. It demonstrates that the database is not a toy or a prototype -- it handles the full surface area of a real content platform.

Post 14: "Cohort-scoped trending"

Type: Technical Deep Dive
Roadmap phase: M6, cohort-scoped trending phase
Thesis: "What's trending" means different things to different audiences. A 22-year-old in Tokyo and a 45-year-old in Texas see different trending pages -- not because of personalization (individual preference), but because different content is genuinely trending within their respective audience segments. tidalDB maintains per-cohort signal aggregation using RoaringBitmaps for O(1) membership testing and sparse fan-out for storage efficiency.
Source material: USE_CASES.md (UC-15), ARCHITECTURE.md (Cohort-scoped aggregation), API.md (Cohort Definitions)
When to publish: After cohort-scoped trending passes integration tests.
Code to include: Cohort definition. Three-layer query (global trending, cohort trending, search within cohort trending). The fan-out write path. Storage cost analysis.
Why it matters: Cohort-scoped trending is a differentiator. Most systems compute trending globally. Slicing by audience segment is a product feature that usually requires a separate analytics pipeline. tidalDB does it natively.
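
A sketch of per-cohort aggregation with sparse fan-out on write. To stay dependency-free it uses `HashSet` for membership where the thesis names RoaringBitmaps, and plain counts where the real system would keep decayed signals. `CohortTrending` and its methods are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
pub struct CohortTrending {
    members: HashMap<String, HashSet<u64>>,     // cohort -> user IDs
    counts: HashMap<String, HashMap<u64, u64>>, // cohort -> item -> count
}

impl CohortTrending {
    pub fn add_member(&mut self, cohort: &str, user: u64) {
        self.members.entry(cohort.to_string()).or_default().insert(user);
    }

    /// Fan out one engagement event to the cohorts the user belongs to.
    /// Counters exist only for (cohort, item) pairs that were touched.
    pub fn record(&mut self, user: u64, item: u64) {
        for (cohort, users) in &self.members {
            if users.contains(&user) {
                *self
                    .counts
                    .entry(cohort.clone())
                    .or_default()
                    .entry(item)
                    .or_insert(0) += 1;
            }
        }
    }

    /// Top `n` items for a cohort, highest count first.
    pub fn top(&self, cohort: &str, n: usize) -> Vec<(u64, u64)> {
        let mut rows: Vec<(u64, u64)> = match self.counts.get(cohort) {
            Some(items) => items.iter().map(|(item, count)| (*item, *count)).collect(),
            None => Vec::new(),
        };
        rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        rows.truncate(n);
        rows
    }
}
```

Two cohorts fed by the same event stream end up with different top lists, with no per-user personalization involved.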

Production Hardening

The final phase is about trust. The content shifts from "what it does" to "why you can trust it."

Post 15: "Kill it at any point. It comes back correct."

Type: Technical Deep Dive
Roadmap phase: Production hardening -- crash recovery hardening phase
Thesis: We injected faults at every write-path stage. Recovery time is under 30 seconds at 1M items. WAL replay produces state identical to pre-crash. No phantom items, no lost signals, no inconsistent aggregates. The WAL is the source of truth. Everything else is derived state that can be rebuilt.
Source material: Crash recovery test results, fault injection methodology, WAL implementation
When to publish: After crash recovery hardening passes.
Code to include: The crash simulation test. Recovery time measurements. The WAL checkpoint and replay sequence.
Why it matters: Trust is the precondition for adoption. Engineers will not embed a database they cannot crash-test. This post is the trust credential.

Post 16: "Graceful degradation: less precise, never wrong"

Type: Architecture Decision Record
Roadmap phase: Production hardening -- graceful degradation phase
Thesis: Under 3x overload, tidalDB does not return errors. It reduces candidate set size, uses coarser aggregates, skips diversity enforcement, and serves from materialized cache -- in that order. Results are less precise but never incorrect. The degradation order is documented and configurable.
Source material: Graceful degradation task docs, ARCHITECTURE.md (Graceful degradation)
When to publish: After graceful degradation is complete.
Code to include: The degradation cascade. Load test results at 1x, 2x, 3x. Latency distribution at each level.
Why it matters: This is how production systems should behave. Engineers who have been paged for "ranking service returned 500" will appreciate a system that degrades gracefully instead.
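
The cascade order in the thesis can be expressed as a cumulative plan. `QueryPlan`, `plan_for_level`, the level numbering, and the candidate limits are hypothetical; only the ordering of the four steps comes from the text above.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct QueryPlan {
    pub candidate_limit: usize,
    pub coarse_aggregates: bool,
    pub enforce_diversity: bool,
    pub serve_from_cache: bool,
}

/// Level 0 is full fidelity. Each higher level adds one degradation
/// step on top of the previous ones, in the documented order.
pub fn plan_for_level(level: u8) -> QueryPlan {
    let mut plan = QueryPlan {
        candidate_limit: 1000,
        coarse_aggregates: false,
        enforce_diversity: true,
        serve_from_cache: false,
    };
    if level >= 1 {
        plan.candidate_limit = 250; // 1. smaller candidate set
    }
    if level >= 2 {
        plan.coarse_aggregates = true; // 2. coarser aggregates
    }
    if level >= 3 {
        plan.enforce_diversity = false; // 3. skip diversity enforcement
    }
    if level >= 4 {
        plan.serve_from_cache = true; // 4. serve from materialized cache
    }
    plan
}
```

Every level still yields a valid plan, never an error, which matches the "less precise, never wrong" framing.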

Ongoing / Anytime

These posts are not tied to specific milestones. They can be written whenever the insight is clear.

"Why not SQL"

Type: Architecture Decision Record
Status: READY -- code shipped, decision documented
Thesis: The custom query language exists because SQL cannot express ranking semantics without losing optimization opportunities. FOR USER means "load this user's preference vector and relationship graph." USING PROFILE means "apply this named scoring function." DIVERSITY means "enforce post-ranking constraints." These are not WHERE clauses.
Source material: thoughts.md (Part II.4), VISION.md (query examples), API.md
Code to read:
- tidal/src/query/retrieve.rs -- RetrieveBuilder showing FOR USER, USING PROFILE, DIVERSITY as typed builder methods, not string predicates
- tidal/src/ranking/profile.rs -- RankingProfile and CandidateStrategy showing how profiles express scoring intent that SQL GROUP BY cannot
- tidal/src/query/executor.rs -- dispatcher showing that FOR USER loads preference vector and relationship graph -- state SQL has no model for
When to publish: Any time after M1. Best paired with M2 when the RETRIEVE query is functional. Both are complete.
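
For the post itself, the contrast with string predicates could be illustrated along these lines. This is a hypothetical builder written for this document, not the actual `RetrieveBuilder` in tidal/src/query/retrieve.rs; every method name and the default limit of 25 are assumptions.

```rust
/// Hypothetical query description: each ranking concept is a typed
/// field, not a fragment of a WHERE clause.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Retrieve {
    pub user: Option<u64>,
    pub profile: Option<String>,
    pub max_per_creator: Option<usize>,
    pub limit: usize,
}

impl Retrieve {
    pub fn items() -> Self {
        Self { limit: 25, ..Self::default() }
    }

    /// FOR USER: tells the executor which user's state to load.
    pub fn for_user(mut self, user_id: u64) -> Self {
        self.user = Some(user_id);
        self
    }

    /// USING PROFILE: selects a named scoring function.
    pub fn using_profile(mut self, name: &str) -> Self {
        self.profile = Some(name.to_string());
        self
    }

    /// DIVERSITY: a post-ranking constraint, not a filter.
    pub fn diversity_max_per_creator(mut self, n: usize) -> Self {
        self.max_per_creator = Some(n);
        self
    }

    pub fn limit(mut self, n: usize) -> Self {
        self.limit = n;
        self
    }
}
```

Because the planner receives these as structured fields, it can decide when to load user state and when to apply diversity, which a generic SQL predicate would hide from it.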

"Why we chose fjall over RocksDB (for now)"

Type: Architecture Decision Record
Status: READY -- code shipped, decision documented
Thesis: Pure Rust, #![forbid(unsafe_code)], fast compile times, trait-abstracted for swap. fjall is not the fastest LSM-tree. It is the right one for an embeddable database built by a small team that values correctness over raw throughput, with a trait boundary that makes the decision reversible.
Source material: thoughts.md (Part V.9), CODING_GUIDELINES.md
Code to read:
- tidal/src/storage/engine.rs -- the StorageEngine trait (six methods, zero fjall imports -- this is the abstraction boundary)
- tidal/src/storage/fjall.rs -- FjallBackend implementing the trait; note the fjall::Keyspace is the only fjall type that crosses the boundary
- tidal/src/storage/memory.rs -- InMemoryBackend proving the trait is genuinely swappable (used in all tests)
- tidal/Cargo.toml -- fjall version pin, no unsafe in the crate's feature flags
When to publish: Any time. Code has been shipped since m1p3.
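
The "trait boundary makes the decision reversible" argument is easiest to see in miniature. The trait below is a hypothetical four-method stand-in, not the six-method `StorageEngine` trait, and `MemoryBackend` is a sketch in the spirit of the in-memory backend used in tests.

```rust
use std::collections::BTreeMap;

/// Hypothetical storage boundary: callers depend on this trait only.
pub trait KvBackend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn delete(&mut self, key: &[u8]);
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
}

/// Ordered in-memory backend; an LSM-backed one would implement the
/// same trait without any caller changing.
#[derive(Default)]
pub struct MemoryBackend {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KvBackend for MemoryBackend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn delete(&mut self, key: &[u8]) {
        self.map.remove(key);
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Engine code is generic over the backend and never names a concrete one.
pub fn count_prefix<B: KvBackend>(db: &B, prefix: &[u8]) -> usize {
    db.scan_prefix(prefix).len()
}
```

Swapping the backend is then a matter of adding another `impl KvBackend`, which is what keeps a "for now" decision cheap to revisit.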

"USearch, not from scratch"

Type: Architecture Decision Record
Status: READY -- code shipped, decision documented
Thesis: Correct, high-performance, concurrent HNSW with SIMD distance computation is 6-12 months of dedicated work. We are not a vector database company. USearch runs in ScyllaDB, ClickHouse, and DuckDB. The FFI boundary is thin. Build what differentiates you. Borrow what does not.
Source material: docs/research/ann_for_tidaldb.md, ARCHITECTURE.md (Vector Index)
Code to read:
- tidal/src/storage/vector/mod.rs -- VectorIndex trait and the module comment explaining the design decisions (VectorId = u64, L2 squared, ef_search uniformity)
- tidal/src/storage/vector/usearch_index.rs -- UsearchIndex wrapping the USearch FFI; the wrapper is thin by design
- tidal/src/storage/vector/planner.rs -- AdaptiveQueryPlanner with four strategies (HNSW, in-graph filter, widened beam, pre-filter brute-force) -- this is tidalDB's contribution on top of the borrowed index
- tidal/src/storage/vector/brute.rs -- BruteForceIndex and MockVectorIndex proving the trait boundary is real
When to publish: Any time. Code has been shipped since m2p1.

Post Cadence

Milestone	Posts	Approximate Pace	Status
Pre-implementation	1	Publish when ready	PUBLISHED
M1 (Signal Engine)	2-3	One per phase completion	PUBLISHED
M2 (Ranked Retrieval)	3	One per major phase	PUBLISHED
M3 (Personalized Ranking)	2-3	One per key insight	PUBLISHED
M4 (Agent Session Layer)	1 (Post 11, architectural preview)	Published during M4	PUBLISHED
M5 (Hybrid Search)	1 (Post 12)	After m5p1 ships	Blocked on M5
M6 (Full Coverage)	2 (Posts 13-14)	At milestone boundaries	Blocked on M6
Production Hardening	2 (Posts 15-16)	At milestone boundaries	Blocked on hardening phase
Ongoing / ADRs	3 (fjall, SQL, USearch)	When the decision is fresh	READY

Target: 16-20 posts across the full roadmap. Not more. Each one earns its place.

What Not to Write

Progress updates that are changelogs. ("We merged 47 PRs this month.") Nobody cares.
Posts that announce intent without shipped code. ("We plan to build...") Ship first.
Posts with titles that are labels. ("Q1 Update," "Phase 3 Complete.") The title is the thesis.
Posts that explain what a concept is without showing why the reader should care. ("Windowed aggregation is...") Start with the problem.
Posts that use "we're excited to announce." You are not excited. You are precise.

Reference: Roadmap to Post Mapping

Roadmap Phase	Post #	Title	Status
Pre-implementation	1	Every content platform builds the same 6 systems from scratch	PUBLISHED
m1p1-m1p3 (Foundation)	3	What three databases taught us before we wrote a line of code	PUBLISHED
m1p4 (Signal Ledger)	2	Running decay scores are O(1) -- here is the math	PUBLISHED
m1p5 (M1 Complete)	4	Signals wrote 100ms ago. The query sees them now.	PUBLISHED
m2p3 (Ranking Profiles)	7	Ranking profiles are data, not code	PUBLISHED
m2p4 (Diversity)	6	Diversity enforcement in 3 microseconds	PUBLISHED
m2p5 (M2 Complete)	5	One query. Six systems. Under 50 milliseconds.	PUBLISHED
m3p2 (Feedback Loop)	8, 9	The feedback loop that closes in one write / Negative signals are equal citizens	PUBLISHED
m3p3 (Personalized Profiles)	10	Cold start without application logic	PUBLISHED
M4 Complete (Agent Session Layer)	11	Search and ranking are the same system	PUBLISHED
Any time	--	Why we chose fjall over RocksDB (for now)	READY
Any time	--	Why not SQL	READY
Any time	--	USearch, not from scratch	READY
m5p1 (Tantivy Integration)	12	Tantivy as a derived index, not a source of truth	Blocked on m5p1
M6 Complete	13, 14	14 use cases, one query engine / Cohort-scoped trending	Blocked on M6
Production hardening (Crash Recovery)	15	Kill it at any point. It comes back correct.	Blocked on hardening phase
Production hardening (Graceful Degradation)	16	Graceful degradation: less precise, never wrong	Blocked on hardening phase

Current Queue

As of M4 complete (Agent Session Layer), posts 1-11 published.

Ready to write now

Post	Status
"Why we chose fjall over RocksDB (for now)"	READY -- m1p3 shipped, decision is documented
"Why not SQL"	READY -- any time after M1
"USearch, not from scratch"	READY -- m2p1 shipped

Next milestone-gated posts

Post	Blocked on
Post 12: "Tantivy as a derived index, not a source of truth"	m5p1 (Tantivy integration)
Post 13: "14 use cases, one query engine"	M6 complete
Post 14: "Cohort-scoped trending"	M6 complete
Post 15: "Kill it at any point. It comes back correct."	Production hardening (crash recovery)
Post 16: "Graceful degradation: less precise, never wrong"	Production hardening (graceful degradation)
