# Content Strategy

Blog posts mapped to the tidalDB roadmap. Each entry identifies the moment worth writing about, the thesis that makes it shareable, and the type of post it demands.

The audience is engineers who have built or are currently maintaining recommendation and discovery systems -- the people running the 6-system stack this database replaces. They know what Kafka lag feels like at 3am. They know why cache invalidation bugs in the ranking pipeline are the ones that never get root-caused. They will smell marketing language from the first sentence. Respect that.

---

## Publishing Principles

**Write when something is true, not when something is scheduled.** A blog post published the day a milestone passes its UAT is credible. A blog post published before the code works is fiction.

**One insight per post.** The reader should leave with a single idea they did not have before. If the post contains two insights, it is two posts.

**Code proves claims.** Every technical assertion is backed by a code example or a benchmark number from the actual codebase. Not a prototype. Not a plan. The shipped code.

**The title is the thesis.** If the title does not work as a standalone sentence that makes an engineer stop scrolling, the post is not ready.

---

## Content Calendar

### Pre-Implementation (Now)

These posts can be written before the engine is feature-complete. They draw on the vision, architecture research, and the problem space -- not on shipped code.

#### Post 1: "Every content platform builds the same 6 systems from scratch" [PUBLISHED]

- **Type:** Vision / Problem Statement
- **Thesis:** The Elasticsearch + Redis + Kafka + feature store + vector DB + ranking service stack is not an architecture. It is scar tissue. The seams between these systems are where correctness dies.
- **Source material:** VISION.md, thoughts.md (Part VI)
- **When to publish:** Any time. This post defines the problem and does not depend on implementation progress.
- **Why it matters:** This is the foundational narrative. Every subsequent post assumes the reader understands this problem. It also serves as the litmus test for whether the audience cares -- if this post does not resonate, the subsequent ones will not either.
- **Structure:** Problem statement. The 6 systems named and indicted. The seams enumerated (stale signals, ETL lag, cache invalidation, operational burden). The thesis: ranking is not a feature, it is a primitive. End with the one-query vision, not with a product pitch.

---

### Milestone 1: Signal Engine

M1 proves that temporal signals with O(1) decay, velocity, and windowed aggregation work as a database primitive. This is the most technically interesting milestone for blog content because the math is elegant and the performance numbers are dramatic.

#### Post 2: "Running decay scores are O(1) -- here is the math" [PUBLISHED]

- **Type:** Technical Deep Dive
- **Roadmap phase:** m1p4 (Signal Ledger) completion
- **Thesis:** The forward-decay formula `S(t) = S(t_prev) * exp(-lambda * dt) + weight` eliminates raw-event scanning at query time. Three `exp()` calls on write, one on read. 15 nanoseconds per entity. Every platform computing `trending_score = views / (age + 2)^1.8` in application code is doing O(N) work that should be O(1).
- **Source material:** docs/research/tidaldb_signal_ledger.md, ARCHITECTURE.md (Signal System section), m1p4 task docs
- **When to publish:** After m1p4 passes UAT with benchmark numbers in hand.
- **Code to include:** The `EntitySignalState` struct. The forward-decay write path. The out-of-order event correction. Benchmark output showing 200-entity scoring pass under 5 microseconds.
- **Why it matters:** This is the post that demonstrates tidalDB is not vaporware. The math is verifiable. The benchmarks are reproducible. Engineers who have implemented trending scores in Redis will immediately understand the value.
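
The recurrence in the thesis can be checked in a few lines. The following is a minimal sketch, not the shipped code: `DecayState`, `record`, `score_at`, and `naive_score` are hypothetical names, and the real `EntitySignalState` may look different. It tracks a single decay rate, so its write path needs only one `exp()` call rather than the three mentioned above.

```rust
/// Hypothetical per-entity state for one signal type with one decay rate.
/// The real `EntitySignalState` is not shown in this document and may differ.
#[derive(Debug, Clone, Copy)]
pub struct DecayState {
    /// Decayed score as of `last_ts`.
    pub score: f64,
    /// Timestamp in seconds of the newest event folded in so far.
    pub last_ts: f64,
}

impl DecayState {
    pub fn new(ts: f64) -> Self {
        Self { score: 0.0, last_ts: ts }
    }

    /// Write path: S(t) = S(t_prev) * exp(-lambda * dt) + weight.
    pub fn record(&mut self, ts: f64, weight: f64, lambda: f64) {
        if ts >= self.last_ts {
            let dt = ts - self.last_ts;
            self.score = self.score * (-lambda * dt).exp() + weight;
            self.last_ts = ts;
        } else {
            // Out-of-order event: decay the late weight forward to `last_ts`
            // and add it, leaving the stored timestamp untouched.
            let dt = self.last_ts - ts;
            self.score += weight * (-lambda * dt).exp();
        }
    }

    /// Read path: a single exp() regardless of how many events were written.
    pub fn score_at(&self, now: f64, lambda: f64) -> f64 {
        let dt = (now - self.last_ts).max(0.0);
        self.score * (-lambda * dt).exp()
    }
}

/// O(N) reference: sum every raw event's decayed weight at query time.
pub fn naive_score(events: &[(f64, f64)], now: f64, lambda: f64) -> f64 {
    events
        .iter()
        .map(|&(ts, w)| w * (-lambda * (now - ts)).exp())
        .sum()
}
```

Feeding the same `(timestamp, weight)` events to both paths, including one that arrives late, should give scores that agree to floating-point tolerance. That equivalence is the claim the post has to demonstrate with the real benchmark.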

#### Post 3: "What three databases taught us before we wrote a line of code" [PUBLISHED]

- **Type:** Architecture Decision Record
- **Roadmap phase:** m1p1-m1p3 completion (the foundation phases)
- **Thesis:** We studied Engram (cognitive memory), Citadel (append-only logging), and StemeDB (knowledge graph) -- three purpose-built databases in the same codebase -- and stole their best patterns. WAL-first durability from Citadel. Cache-line aligned hot structs from Engram. Subject-prefix key encoding from StemeDB. Background materialization from StemeDB. Here is what converged and what the gaps taught us.
- **Source material:** thoughts.md (all six parts), CODING_GUIDELINES.md
- **When to publish:** After m1p3 (Storage Engine) is complete. The patterns referenced are already implemented.
- **Code to include:** Key encoding format. Cache-line aligned struct. Group commit writer. Side-by-side comparison of the pattern in the source database and in tidalDB.
- **Why it matters:** Engineers respect builders who study prior art. This post establishes technical credibility and shows the architectural foundation is grounded in real patterns, not invented from scratch.

#### Post 4: "Signals wrote 100ms ago. The query sees them now." [PUBLISHED]

- **Type:** Devlog / Milestone Announcement
- **Roadmap phase:** m1p5 (Entity CRUD and Signal Write API) -- M1 complete
- **Thesis:** Milestone 1 is done. A developer can open a tidalDB instance, define signal types with decay rates and windows, write 10,000 engagement events, and read back decay-correct scores that match analytical computation to 6 decimal places. Including after a crash. The UAT scenario passes.
- **Source material:** The m1p5 integration test, benchmark results, git log for the M1 period
- **When to publish:** The day M1 UAT passes.
- **Code to include:** The full UAT scenario (or a clean excerpt). `TidalDB::open()` with schema. Signal write. Decay score read. Before/after crash recovery.
- **Why it matters:** This is the first "it works" post. It converts skeptics from "interesting idea" to "this is real." The UAT code is the proof.

---

### Milestone 2: Ranked Retrieval

M2 proves that a single query can retrieve, filter, score, and enforce diversity over live signals. This is where tidalDB stops being a signal engine and starts being a database.

#### Post 5: "One query. Six systems. Under 50 milliseconds." [PUBLISHED]

- **Type:** Technical Deep Dive / Announcement
- **Roadmap phase:** m2p5 (RETRIEVE Query Executor) -- M2 complete
- **Thesis:** `RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25` executes in under 50ms on 10K items. It retrieves candidates via ANN, filters by metadata, scores using live decay signals and velocity, enforces diversity, and returns a ranked list. That is what Elasticsearch + Redis + a ranking service produce. It is one query here.
- **Source material:** m2p5 integration test, benchmark results, the dependency DAG showing how all M2 phases compose
- **When to publish:** After M2 UAT passes.
- **Code to include:** The RETRIEVE query. The ranked result with signal snapshots. The trending profile definition. A before/after signal burst showing the ranking change.
- **Why it matters:** This is the money post. The one-query thesis is no longer a vision document -- it is a benchmark. Engineers who operate the 6-system stack will immediately understand what this eliminates.

#### Post 6: "Diversity enforcement in 3 microseconds" [PUBLISHED]

- **Type:** Technical Deep Dive
- **Roadmap phase:** m2p4 (Diversity Enforcement)
- **Thesis:** "No more than 2 items per creator" does not belong in your API layer. It belongs in the query. tidalDB enforces diversity as a post-scoring reordering pass -- it does not reduce result count. The greedy selection algorithm runs in under 3 microseconds for 200 candidates.
- **Source material:** m2p4 task docs, VISION.md (diversity section), benchmark results
- **When to publish:** After m2p4 is complete.
- **Code to include:** The DiversitySpec. The greedy selector. A concrete example showing reordering (creator A dominates pre-diversity, balanced post-diversity). Benchmark numbers.
- **Why it matters:** Every team building a feed implements diversity in the API layer. Showing that it belongs in the database -- and costs 3 microseconds -- is a strong differentiator. This is the kind of post that gets shared in Slack channels.
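
A greedy reordering pass of the kind described above might look like the sketch below. It assumes candidates arrive sorted by descending score and that over-quota items are appended after the diverse prefix so the result count is preserved. `Candidate` and `enforce_max_per_creator` are illustrative names, not the actual `DiversitySpec` API.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub item_id: u64,
    pub creator_id: u64,
    pub score: f64,
}

/// Greedy reordering. `ranked` must already be sorted by descending score.
/// Items that exceed the per-creator cap are deferred, not dropped, so the
/// output has the same length as the input.
pub fn enforce_max_per_creator(ranked: Vec<Candidate>, max_per_creator: usize) -> Vec<Candidate> {
    let mut taken: HashMap<u64, usize> = HashMap::new();
    let mut selected = Vec::with_capacity(ranked.len());
    let mut deferred = Vec::new();
    for c in ranked {
        let count = taken.entry(c.creator_id).or_insert(0);
        if *count < max_per_creator {
            *count += 1;
            selected.push(c);
        } else {
            deferred.push(c);
        }
    }
    // Deferred items keep their relative score order at the tail.
    selected.extend(deferred);
    selected
}
```

With a cap of 2 and one creator holding the top three slots, the third item moves to the tail and the next creators move up. This is the before/after example the post calls for.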

#### Post 7: "Ranking profiles are data, not code" [PUBLISHED]

- **Type:** Architecture Decision Record
- **Roadmap phase:** m2p3 (Ranking Profile Engine)
- **Thesis:** Changing how content is ranked should not require a code change, a deployment, or a restart. tidalDB treats ranking profiles as versioned schema declarations. Define a profile. Name it. Swap it at query time. A/B test two profiles by name. The database executes the entire pipeline.
- **Source material:** m2p3 task docs, API.md (ranking profiles section), VISION.md
- **When to publish:** After m2p3 is complete.
- **Code to include:** A `trending` profile definition. A `for_you` profile definition. The same RETRIEVE query with two different profile names producing different orderings. The profile versioning API.
- **Why it matters:** This reframes ranking as a database concern. Engineers who maintain ranking services as separate microservices will recognize the operational simplification.
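
To make "profiles are data" concrete, here is a small sketch under stated assumptions: a profile is a versioned list of per-signal weights, and a query selects one by name. `Profile`, `ProfileRegistry`, `define`, and `rank` are hypothetical and are not the types in `tidal/src/ranking/profile.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical profile shape: a version plus per-signal weights.
#[derive(Debug, Clone)]
pub struct Profile {
    pub version: u32,
    pub weights: Vec<(String, f64)>,
}

pub struct Item {
    pub id: u64,
    pub signals: HashMap<String, f64>,
}

#[derive(Default)]
pub struct ProfileRegistry {
    profiles: HashMap<String, Profile>,
}

impl ProfileRegistry {
    /// Define or redefine a profile; redefinition bumps the version.
    pub fn define(&mut self, name: &str, weights: &[(&str, f64)]) -> u32 {
        let version = self.profiles.get(name).map_or(1, |p| p.version + 1);
        let weights = weights.iter().map(|(s, w)| (s.to_string(), *w)).collect();
        self.profiles.insert(name.to_string(), Profile { version, weights });
        version
    }

    /// Rank items under the named profile; `None` if the name is unknown.
    pub fn rank(&self, name: &str, items: &[Item]) -> Option<Vec<u64>> {
        let profile = self.profiles.get(name)?;
        let mut scored: Vec<(u64, f64)> = items
            .iter()
            .map(|item| {
                // Weighted sum over the signals the profile names.
                let score: f64 = profile
                    .weights
                    .iter()
                    .map(|(signal, w)| w * item.signals.get(signal).copied().unwrap_or(0.0))
                    .sum();
                (item.id, score)
            })
            .collect();
        scored.sort_by(|a, b| b.1.total_cmp(&a.1));
        Some(scored.into_iter().map(|(id, _)| id).collect())
    }
}
```

The ranking code never changes; only the named weights do. Calling `rank` with two different profile names over the same items is the A/B comparison in miniature.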

---

### Milestone 3: Personalized Ranking

M3 is where the feedback loop closes. Signal writes update the user's preference vector, the creator's interaction weight, and the item's signal ledger -- atomically, in one write. The "For You" query works.

#### Post 8: "The feedback loop that closes in one write" [PUBLISHED]

- **Type:** Technical Deep Dive
- **Roadmap phase:** m3p2 (Feedback Loop) completion
- **Thesis:** When a user likes an item, the database atomically updates: the item's like count, the user-to-creator interaction weight, and the user's preference vector (shifted toward the item's embedding). One `db.signal("like", ...)` call. No Kafka consumer to lag. No feature store to sync. No cache to invalidate. The next ranking query -- even 100ms later -- reflects the change.
- **Source material:** m3p2 task docs, ARCHITECTURE.md (Write Path section), SEQUENCE.md
- **When to publish:** After m3p2 passes UAT.
- **Code to include:** The signal write. The 10-step atomic update path. A before/after query showing the preference shift. The property test that proves hidden items and blocked creators never surface.
- **Why it matters:** The closed feedback loop is the core architectural thesis of tidalDB. This post proves it works. It is the strongest argument against the 6-system stack, because the stack's primary failure mode is feedback lag.
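
The three updates named in the thesis can be illustrated with an in-memory sketch. It is not the 10-step path: there is no WAL and no concurrency here, and the preference shift `p += rate * (e - p)` is an assumed update rule. `Db`, `Item`, `User`, `signal_like`, and `learning_rate` are hypothetical names.

```rust
use std::collections::HashMap;

pub struct Item {
    pub creator_id: u64,
    pub embedding: Vec<f32>,
    pub likes: u64,
}

pub struct User {
    pub preference: Vec<f32>,
    /// Interaction weight per creator id.
    pub creator_weight: HashMap<u64, f32>,
}

pub struct Db {
    pub items: HashMap<u64, Item>,
    pub users: HashMap<u64, User>,
    /// Assumed step size for moving the preference vector toward an embedding.
    pub learning_rate: f32,
}

impl Db {
    /// One call, three updates: item like count, user-to-creator weight,
    /// and the user's preference vector.
    pub fn signal_like(&mut self, user_id: u64, item_id: u64) -> Result<(), String> {
        // Validate everything before mutating so a failure leaves no partial state.
        let item = self.items.get_mut(&item_id).ok_or("unknown item")?;
        let user = self.users.get_mut(&user_id).ok_or("unknown user")?;
        if item.embedding.len() != user.preference.len() {
            return Err("embedding dimension mismatch".to_string());
        }
        item.likes += 1;
        *user.creator_weight.entry(item.creator_id).or_insert(0.0) += 1.0;
        for (p, e) in user.preference.iter_mut().zip(&item.embedding) {
            *p += self.learning_rate * (*e - *p);
        }
        Ok(())
    }
}
```

Because all three pieces of state live behind one `&mut self` call, any read that follows sees either none or all of the updates. That is the property the post needs to show against the real write path.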

#### Post 9: "Negative signals are equal citizens" [PUBLISHED]

- **Type:** Architecture Decision Record
- **Roadmap phase:** m3p2 (Feedback Loop)
- **Thesis:** A skip is not the absence of a like. It is data. tidalDB treats negative signals -- skips, hides, blocks, "not interested" -- with the same precision and immediacy as positive signals. A skip within 3 seconds is a strong quality signal. A hide creates a permanent exclusion. A block removes all of a creator's content from all future queries. These are not afterthoughts. They are first-class signal types with their own decay rates, velocity, and ranking weight.
- **Source material:** VISION.md (negative signals section), USE_CASES.md (UC-01 feedback), m3p2 task docs
- **When to publish:** After m3p2 is complete. Can be bundled with or separated from Post 8.
- **Code to include:** Signal type definitions for skip, hide, block. The penalty clause in a ranking profile. The property test: 10,000 random signal sequences never produce a result where a hidden item or blocked creator appears.
- **Why it matters:** Most recommendation systems handle negative feedback as an afterthought -- a manual "not interested" button that writes to a separate blocklist. tidalDB's approach is architecturally different and engineers building these systems will recognize the improvement immediately.

#### Post 10: "Cold start without application logic" [PUBLISHED]

- **Type:** Technical Deep Dive
- **Roadmap phase:** m3p3 (Personalized Ranking Profiles)
- **Thesis:** New items with no signals get an exploration budget. New users with no history get a sensible default from population-level signals. The application does not manage either. The exploration rate decays as signals accumulate. This is declared per ranking profile, not implemented in application code.
- **Source material:** m3p3 task docs, VISION.md (cold start section)
- **When to publish:** After m3p3 is complete.
- **Code to include:** The exploration budget in a profile definition. A new item appearing in a for_you feed despite having zero signals. The decay of exploration as signals arrive.
- **Why it matters:** Cold start is the problem everyone hacks around and no one solves cleanly. Showing a database-native solution is a strong differentiator.
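
One way an exploration budget could decay as signals accumulate is an exponential falloff in the signal count. The sketch below assumes that shape. The formula and the names `Exploration`, `budget`, `saturation`, and `final_score` are illustrative, not the profile syntax tidalDB ships.

```rust
/// Hypothetical exploration settings attached to a ranking profile.
pub struct Exploration {
    /// Bonus granted to an item with zero signals.
    pub budget: f64,
    /// Signal count at which the bonus has fallen to budget / e.
    pub saturation: f64,
}

impl Exploration {
    /// Bonus shrinks smoothly toward zero as the item collects signals.
    pub fn bonus(&self, signal_count: u64) -> f64 {
        self.budget * (-(signal_count as f64) / self.saturation).exp()
    }
}

/// Final score is the signal-derived base score plus the exploration bonus.
pub fn final_score(base: f64, signal_count: u64, e: &Exploration) -> f64 {
    base + e.bonus(signal_count)
}
```

With a budget of 0.3, a brand-new item with a base score of zero can outrank an established item scoring 0.2, and the advantage fades without any application code tracking item age.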

---

### Milestone 5: Hybrid Search

M5 merges full-text search with semantic similarity and signal-ranked results. Search and retrieval become the same system.

#### Post 11: "Search and ranking are the same system" [PUBLISHED]

- **Type:** Technical Deep Dive / Architecture Preview
- **Roadmap phase:** Published during M4. Describes the architecture that M5 will complete.
- **Status:** PUBLISHED. Written as an architectural intent post -- explains the unified pipeline design, what is already built (RETRIEVE, USearch, ranking), and what remains (Tantivy, RRF fusion, SEARCH query). Does not claim SEARCH is shipped.
- **Thesis:** Text retrieval, vector retrieval, and signal-based ranking belong in one pipeline. The data model is already unified. Fusion is arithmetic. The RETRIEVE pipeline, the HNSW index, and the ranking profiles are all in place. Three pieces of wiring remain.
- **Source material:** Published post at `site/content/blog/search-and-ranking.mdx`
- **Why it matters:** Frames the M5 work for the audience before it ships. The architectural argument stands regardless of what's wired today.
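
"Fusion is arithmetic" refers to reciprocal rank fusion (RRF), where each document scores the sum of `1 / (k + rank)` over the lists it appears in. A minimal sketch follows. The function name `rrf_fuse` and the tie-break by id are assumptions, and `k = 60` is only the conventional default, not a value taken from the tidalDB codebase.

```rust
use std::collections::HashMap;

/// Fuse several ranked id lists (best first) with reciprocal rank fusion.
/// Returns (id, fused score) sorted by descending score, ties broken by id.
pub fn rrf_fuse(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (idx, id) in list.iter().enumerate() {
            // Ranks are 1-based: the top result contributes 1 / (k + 1).
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (idx + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Given a text ranking `[1, 2, 3]` and a vector ranking `[3, 1, 4]`, items 1 and 3 rise to the top because both retrievers returned them. RRF only needs ranks, so the text and vector scores never have to be normalized against each other.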

#### Post 12: "Tantivy as a derived index, not a source of truth"

- **Type:** Architecture Decision Record
- **Roadmap phase:** m5p1 (Tantivy Integration)
- **Thesis:** The entity store is the source of truth. Tantivy is a materialized view. If the index is corrupted, it can be rebuilt from the entity store. Crash recovery replays from a stored sequence number. Consistency is DB-primary, not two-phase commit. This is simpler, deterministic, and the right model for an embedded database.
- **Source material:** docs/research/tantivy.md, m5p1 task docs (once written), ARCHITECTURE.md
- **When to publish:** After m5p1 is complete.
- **Code to include:** The outbox pattern. The crash recovery sequence number. The background indexer. The consistency model.
- **Why it matters:** This is a useful architectural pattern beyond tidalDB. Engineers building systems with derived indexes will find this directly applicable.
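
The derived-index model can be sketched without Tantivy at all. In the hypothetical code below, the entity store keeps an append-only outbox and the index records the last sequence number it applied. Recovery is then "replay everything after that number", and a rebuild is the same replay starting from zero. All names are illustrative, and the sketch handles inserts only, not updates or deletes.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Source of truth: an append-only log of (sequence, entity id, text).
#[derive(Default)]
pub struct EntityStore {
    pub outbox: Vec<(u64, u64, String)>,
}

impl EntityStore {
    /// Append an entity and return its sequence number (1-based).
    pub fn put(&mut self, entity_id: u64, text: &str) -> u64 {
        let seq = self.outbox.len() as u64 + 1;
        self.outbox.push((seq, entity_id, text.to_string()));
        seq
    }
}

/// Derived state: term -> entity ids, plus the last sequence number applied.
#[derive(Default, Debug, PartialEq)]
pub struct DerivedIndex {
    pub applied_seq: u64,
    pub postings: BTreeMap<String, BTreeSet<u64>>,
}

impl DerivedIndex {
    /// Apply every outbox entry newer than `applied_seq`, in order.
    pub fn catch_up(&mut self, store: &EntityStore) {
        let from = self.applied_seq;
        for (seq, id, text) in store.outbox.iter().filter(|e| e.0 > from) {
            for term in text.split_whitespace() {
                self.postings.entry(term.to_lowercase()).or_default().insert(*id);
            }
            self.applied_seq = *seq;
        }
    }
}
```

An index that fell behind and caught up should equal one rebuilt from scratch. That equality is the determinism argument, and it holds here because the index has no state the store cannot regenerate.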

---

### Milestone 6: Full Surface Coverage

M6 completes all 14 use cases. The content here shifts from "how does the engine work" to "what can you build with it."

#### Post 13: "14 use cases, one query engine"

- **Type:** Devlog / Announcement
- **Roadmap phase:** M6 complete
- **Thesis:** For You feeds, trending, search, following, related content, notifications, hidden gems, controversial, live content, creator discovery, user library, cohort-scoped trending -- every surface a content platform needs, driven by the same query primitives. The application specifies profiles, filters, and context. The database executes ranking.
- **Source material:** USE_CASES.md, M6 UAT results
- **When to publish:** After M6 UAT passes.
- **Code to include:** A curated selection of 4-5 queries spanning different surfaces (for_you, trending, search, hidden_gems, cohort_trending). Each with a brief setup and result.
- **Why it matters:** This is the completeness post. It demonstrates that the database is not a toy or a prototype -- it handles the full surface area of a real content platform.

#### Post 14: "Cohort-scoped trending: what is hot for people like you"

- **Type:** Technical Deep Dive
- **Roadmap phase:** M6, cohort-scoped trending phase
- **Thesis:** "What's trending" means different things to different audiences. A 22-year-old in Tokyo and a 45-year-old in Texas see different trending pages -- not because of personalization (individual preference), but because different content is genuinely trending within their respective audience segments. tidalDB maintains per-cohort signal aggregation using RoaringBitmaps for O(1) membership testing and sparse fan-out for storage efficiency.
- **Source material:** USE_CASES.md (UC-15), ARCHITECTURE.md (Cohort-scoped aggregation), API.md (Cohort Definitions)
- **When to publish:** After cohort-scoped trending passes integration tests.
- **Code to include:** Cohort definition. Three-layer query (global trending, cohort trending, search within cohort trending). The fan-out write path. Storage cost analysis.
- **Why it matters:** Cohort-scoped trending is a differentiator. Most systems compute trending globally. Slicing by audience segment is a product feature that usually requires a separate analytics pipeline. tidalDB does it natively.
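
The fan-out write path can be illustrated with plain standard-library collections. This sketch swaps in a `HashSet` where the thesis names RoaringBitmaps, and it scans every cohort on each write, which a real implementation would presumably avoid. `CohortTrending` and its methods are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
pub struct CohortTrending {
    /// cohort name -> member user ids (a HashSet stands in for a bitmap here).
    members: HashMap<String, HashSet<u64>>,
    /// item id -> global view count.
    global: HashMap<u64, u64>,
    /// Sparse: a (cohort, item) row exists only once a member has viewed the item.
    per_cohort: HashMap<(String, u64), u64>,
}

impl CohortTrending {
    pub fn add_member(&mut self, cohort: &str, user_id: u64) {
        self.members.entry(cohort.to_string()).or_default().insert(user_id);
    }

    /// Write path: bump the global counter, then fan out to the user's cohorts.
    pub fn record_view(&mut self, user_id: u64, item_id: u64) {
        *self.global.entry(item_id).or_insert(0) += 1;
        for (cohort, users) in &self.members {
            if users.contains(&user_id) {
                *self.per_cohort.entry((cohort.clone(), item_id)).or_insert(0) += 1;
            }
        }
    }

    /// Items by descending count, globally (`None`) or within one cohort.
    pub fn trending(&self, cohort: Option<&str>) -> Vec<(u64, u64)> {
        let mut rows: Vec<(u64, u64)> = match cohort {
            None => self.global.iter().map(|(item, n)| (*item, *n)).collect(),
            Some(name) => self
                .per_cohort
                .iter()
                .filter(|((c, _), _)| c.as_str() == name)
                .map(|((_, item), n)| (*item, *n))
                .collect(),
        };
        rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        rows
    }
}
```

The same events produce different top items globally and per cohort. Storage stays sparse because a cohort never allocates a counter for an item none of its members touched.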

---

### Production Hardening

The final phase is about trust. The content shifts from "what it does" to "why you can trust it."

#### Post 15: "Kill it at any point. It comes back correct."

- **Type:** Technical Deep Dive
- **Roadmap phase:** Production hardening -- crash recovery hardening phase
- **Thesis:** We injected faults at every write-path stage. Recovery time is under 30 seconds at 1M items. WAL replay produces state identical to pre-crash. No phantom items, no lost signals, no inconsistent aggregates. The WAL is the source of truth. Everything else is derived state that can be rebuilt.
- **Source material:** Crash recovery test results, fault injection methodology, WAL implementation
- **When to publish:** After crash recovery hardening passes.
- **Code to include:** The crash simulation test. Recovery time measurements. The WAL checkpoint and replay sequence.
- **Why it matters:** Trust is the precondition for adoption. Engineers will not embed a database they cannot crash-test. This post is the trust credential.

#### Post 16: "Graceful degradation: less precise, never wrong"

- **Type:** Architecture Decision Record
- **Roadmap phase:** Production hardening -- graceful degradation phase
- **Thesis:** Under 3x overload, tidalDB does not return errors. It reduces candidate set size, uses coarser aggregates, skips diversity enforcement, and serves from materialized cache -- in that order. Results are less precise but never incorrect. The degradation order is documented and configurable.
- **Source material:** Graceful degradation task docs, ARCHITECTURE.md (Graceful degradation)
- **When to publish:** After graceful degradation is complete.
- **Code to include:** The degradation cascade. Load test results at 1x, 2x, 3x. Latency distribution at each level.
- **Why it matters:** This is how production systems should behave. Engineers who have been paged for "ranking service returned 500" will appreciate a system that degrades gracefully instead.
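
The degradation order above maps naturally onto an ordered enum where each level implies all earlier ones. The following sketch assumes that cumulative reading. The load thresholds, the halving of the candidate set, and the names `Level`, `QueryPlan`, `level_for`, and `plan_for` are all hypothetical.

```rust
/// Degradation steps in the order given above; each level includes all earlier ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Normal,
    ReducedCandidates,
    CoarseAggregates,
    SkipDiversity,
    MaterializedCache,
}

#[derive(Debug, PartialEq)]
pub struct QueryPlan {
    pub candidate_limit: usize,
    pub coarse_aggregates: bool,
    pub enforce_diversity: bool,
    pub serve_from_cache: bool,
}

/// `thresholds[i]` is the load factor at which step i + 1 kicks in
/// (hypothetical values, expected in ascending order).
pub fn level_for(load: f64, thresholds: &[f64; 4]) -> Level {
    let steps = [
        Level::ReducedCandidates,
        Level::CoarseAggregates,
        Level::SkipDiversity,
        Level::MaterializedCache,
    ];
    let mut level = Level::Normal;
    for (limit, step) in thresholds.iter().zip(steps) {
        if load >= *limit {
            level = step;
        }
    }
    level
}

/// Translate a level into concrete query settings. Halving the candidate
/// set is an arbitrary choice for illustration.
pub fn plan_for(level: Level, full_candidates: usize) -> QueryPlan {
    QueryPlan {
        candidate_limit: if level >= Level::ReducedCandidates {
            full_candidates / 2
        } else {
            full_candidates
        },
        coarse_aggregates: level >= Level::CoarseAggregates,
        enforce_diversity: level < Level::SkipDiversity,
        serve_from_cache: level >= Level::MaterializedCache,
    }
}
```

Every level still yields a valid plan, so the query path has no branch that returns an error purely because of load.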

---

### Ongoing / Anytime

These posts are not tied to specific milestones. They can be written whenever the insight is clear.

#### "Why not SQL"

- **Type:** Architecture Decision Record
- **Status:** READY -- code shipped, decision documented
- **Thesis:** The custom query language exists because SQL cannot express ranking semantics without losing optimization opportunities. `FOR USER` means "load this user's preference vector and relationship graph." `USING PROFILE` means "apply this named scoring function." `DIVERSITY` means "enforce post-ranking constraints." These are not WHERE clauses.
- **Source material:** thoughts.md (Part II.4), VISION.md (query examples), API.md
- **Code to read:**
  - `tidal/src/query/retrieve.rs` -- `RetrieveBuilder` showing FOR USER, USING PROFILE, DIVERSITY as typed builder methods, not string predicates
  - `tidal/src/ranking/profile.rs` -- `RankingProfile` and `CandidateStrategy` showing how profiles express scoring intent that SQL GROUP BY cannot
  - `tidal/src/query/executor.rs` -- dispatcher showing that FOR USER loads preference vector and relationship graph -- state SQL has no model for
- **When to publish:** Any time after M1. Best paired with M2, when the RETRIEVE query is functional. Both milestones are complete.

#### "Why we chose fjall over RocksDB (for now)"

- **Type:** Architecture Decision Record
- **Status:** READY -- code shipped, decision documented
- **Thesis:** Pure Rust, `#![forbid(unsafe_code)]`, fast compile times, trait-abstracted for swap. fjall is not the fastest LSM-tree. It is the right one for an embeddable database built by a small team that values correctness over raw throughput, with a trait boundary that makes the decision reversible.
- **Source material:** thoughts.md (Part V.9), CODING_GUIDELINES.md
- **Code to read:**
  - `tidal/src/storage/engine.rs` -- the `StorageEngine` trait (six methods, zero fjall imports -- this is the abstraction boundary)
  - `tidal/src/storage/fjall.rs` -- `FjallBackend` implementing the trait; note the `fjall::Keyspace` is the only fjall type that crosses the boundary
  - `tidal/src/storage/memory.rs` -- `InMemoryBackend` proving the trait is genuinely swappable (used in all tests)
  - `tidal/Cargo.toml` -- fjall version pin, no `unsafe` in the crate's feature flags
- **When to publish:** Any time. Code has been shipped since m1p3.
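
The "trait boundary makes the decision reversible" argument is easiest to see in miniature. The sketch below is not the real `StorageEngine` trait, which the list above describes as having six methods. `KvEngine`, `MemoryEngine`, `entity_key`, and `count_kind` are hypothetical stand-ins that show the shape of the idea: callers depend only on the trait, so a backend swap touches one `impl`.

```rust
use std::collections::BTreeMap;

/// Hypothetical storage abstraction; no backend types appear in it.
pub trait KvEngine {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
}

/// In-memory backend, the kind of implementation tests would use.
#[derive(Default)]
pub struct MemoryEngine {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KvEngine for MemoryEngine {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Keys are ordered, so all matches form one contiguous range.
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Prefix-style key: "<kind>/" followed by the big-endian id.
pub fn entity_key(kind: &str, id: u64) -> Vec<u8> {
    let mut key = kind.as_bytes().to_vec();
    key.push(b'/');
    key.extend_from_slice(&id.to_be_bytes());
    key
}

/// Generic over the trait: works unchanged with any backend.
pub fn count_kind<E: KvEngine>(engine: &E, kind: &str) -> usize {
    let mut prefix = kind.as_bytes().to_vec();
    prefix.push(b'/');
    engine.scan_prefix(&prefix).len()
}
```

A persistent backend would implement the same three methods, and `count_kind` would not change. The post should make the same point with the real trait and both shipped backends.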

#### "USearch, not from scratch"

- **Type:** Architecture Decision Record
- **Status:** READY -- code shipped, decision documented
- **Thesis:** Correct, high-performance, concurrent HNSW with SIMD distance computation is 6-12 months of dedicated work. We are not a vector database company. USearch runs in ScyllaDB, ClickHouse, and DuckDB. The FFI boundary is thin. Build what differentiates you. Borrow what does not.
- **Source material:** docs/research/ann_for_tidaldb.md, ARCHITECTURE.md (Vector Index)
- **Code to read:**
  - `tidal/src/storage/vector/mod.rs` -- `VectorIndex` trait and the module comment explaining the design decisions (VectorId = u64, L2 squared, ef_search uniformity)
  - `tidal/src/storage/vector/usearch_index.rs` -- `UsearchIndex` wrapping the USearch FFI; the wrapper is thin by design
  - `tidal/src/storage/vector/planner.rs` -- `AdaptiveQueryPlanner` with four strategies (HNSW, in-graph filter, widened beam, pre-filter brute-force) -- this is tidalDB's contribution on top of the borrowed index
  - `tidal/src/storage/vector/brute.rs` -- `BruteForceIndex` and `MockVectorIndex` proving the trait boundary is real
- **When to publish:** Any time. Code has been shipped since m2p1.

---

## Post Cadence

| Milestone | Posts | Approximate Pace | Status |
|-----------|-------|-----------------|--------|
| Pre-implementation | 1 | Publish when ready | PUBLISHED |
| M1 (Signal Engine) | 3 (Posts 2-4) | One per phase completion | PUBLISHED |
| M2 (Ranked Retrieval) | 3 (Posts 5-7) | One per major phase | PUBLISHED |
| M3 (Personalized Ranking) | 3 (Posts 8-10) | One per key insight | PUBLISHED |
| M4 (Agent Session Layer) | 1 (Post 11, architectural preview) | Published during M4 | PUBLISHED |
| M5 (Hybrid Search) | 1 (Post 12) | After m5p1 ships | Blocked on M5 |
| M6 (Full Coverage) | 2 (Posts 13-14) | At milestone boundaries | Blocked on M6 |
| Production Hardening | 2 (Posts 15-16) | At milestone boundaries | Blocked on hardening phase |
| Ongoing / ADRs | 3 (fjall, SQL, USearch) | When the decision is fresh | READY |

**Target: 16-20 posts across the full roadmap.** Not more. Each one earns its place.

---

## What Not to Write

- Progress updates that are changelogs. ("We merged 47 PRs this month.") Nobody cares.
- Posts that announce intent without shipped code. ("We plan to build...") Ship first.
- Posts with titles that are labels. ("Q1 Update," "Phase 3 Complete.") The title is the thesis.
- Posts that explain what a concept is without showing why the reader should care. ("Windowed aggregation is...") Start with the problem.
- Posts that use "we're excited to announce." You are not excited. You are precise.

---

## Reference: Roadmap to Post Mapping

| Roadmap Phase | Post # | Title | Status |
|---------------|--------|-------|--------|
| Pre-implementation | 1 | Every content platform builds the same 6 systems from scratch | PUBLISHED |
| m1p1-m1p3 (Foundation) | 3 | What three databases taught us before we wrote a line of code | PUBLISHED |
| m1p4 (Signal Ledger) | 2 | Running decay scores are O(1) -- here is the math | PUBLISHED |
| m1p5 (M1 Complete) | 4 | Signals wrote 100ms ago. The query sees them now. | PUBLISHED |
| m2p3 (Ranking Profiles) | 7 | Ranking profiles are data, not code | PUBLISHED |
| m2p4 (Diversity) | 6 | Diversity enforcement in 3 microseconds | PUBLISHED |
| m2p5 (M2 Complete) | 5 | One query. Six systems. Under 50 milliseconds. | PUBLISHED |
| m3p2 (Feedback Loop) | 8, 9 | The feedback loop that closes in one write / Negative signals are equal citizens | PUBLISHED |
| m3p3 (Personalized Profiles) | 10 | Cold start without application logic | PUBLISHED |
| M4 Complete (Agent Session Layer) | 11 | Search and ranking are the same system | PUBLISHED |
| Any time | -- | Why we chose fjall over RocksDB (for now) | READY |
| Any time | -- | Why not SQL | READY |
| Any time | -- | USearch, not from scratch | READY |
| m5p1 (Tantivy Integration) | 12 | Tantivy as a derived index, not a source of truth | Blocked on m5p1 |
| M6 Complete | 13, 14 | 14 use cases, one query engine / Cohort-scoped trending | Blocked on M6 |
| Production Hardening (Crash Recovery) | 15 | Kill it at any point. It comes back correct. | Blocked on hardening phase |
| Production Hardening (Graceful Degradation) | 16 | Graceful degradation: less precise, never wrong | Blocked on hardening phase |

---

## Current Queue

**As of M4 complete (Agent Session Layer), posts 1-11 published.**

### Ready to write now

| Post | Status |
|------|--------|
| "Why we chose fjall over RocksDB (for now)" | READY -- m1p3 shipped, decision is documented |
| "Why not SQL" | READY -- any time after M1 |
| "USearch, not from scratch" | READY -- m2p1 shipped |

### Next milestone-gated posts

| Post | Blocked on |
|------|------------|
| Post 12: "Tantivy as a derived index, not a source of truth" | m5p1 (Tantivy integration) |
| Post 13: "14 use cases, one query engine" | M6 complete |
| Post 14: "Cohort-scoped trending" | M6 (cohort-scoped trending phase) |
| Post 15: "Kill it at any point. It comes back correct." | Production hardening (crash recovery) |
| Post 16: "Graceful degradation: less precise, never wrong" | Production hardening (graceful degradation) |