diff --git a/.claude/skills/write-blog/skill.md b/.claude/skills/write-blog/skill.md
index be2732d..cec92dc 100644
--- a/.claude/skills/write-blog/skill.md
+++ b/.claude/skills/write-blog/skill.md
@@ -8,47 +8,75 @@ agent: tidal-storyteller
Write and publish blog posts for tidalDB using the **tidal-storyteller** agent.
+## Content Strategy Reference
+
+**Read `docs/content-strategy.md` before writing any post.** It maps every blog post idea to a specific roadmap phase, names the thesis, identifies the source material, and specifies when the post is ready to publish.
+
+The content strategy defines 16-20 posts across the full roadmap. Do not invent posts outside this plan without first checking whether the strategy already covers the topic. If it does, follow the strategy's guidance for that post. If the strategy has a gap, propose adding the post to the strategy -- do not write an orphan post.
+
+### Determining What to Write Now
+
+1. Check the **Current Status** section in `docs/planning/ROADMAP.md` to identify which phases are complete
+2. Cross-reference with the **Reference: Roadmap to Post Mapping** table in `docs/content-strategy.md`
+3. A post is ready to write once its roadmap phase has passed UAT and its benchmark numbers exist
+4. Exception: Post 1 ("Every content platform builds the same 6 systems from scratch") can be written any time -- it depends on the problem, not the implementation
+
+### Current Queue (update as phases complete)
+
+As of m1p3 complete (m1p4 is next):
+
+| Priority | Post | Status |
+|----------|------|--------|
+| 1 | Post 1: "Every content platform builds the same 6 systems from scratch" | Ready -- no code dependency |
+| 2 | Post 3: "What three databases taught us before we wrote a line of code" | Ready -- m1p1-m1p3 complete, thoughts.md is source |
+| 3 | "Why we chose fjall over RocksDB (for now)" | Ready -- m1p3 complete |
+| 4 | Post 2: "Running decay scores are O(1)" | Blocked on m1p4 benchmarks |
+| 5 | Post 4: "Signals wrote 100ms ago. The query sees them now." | Blocked on m1p5 / M1 UAT |
+
## When to Use
-- After completing a roadmap phase or milestone
-- When an architectural decision deserves a public narrative
-- When a benchmark result tells a compelling story
-- For "building in public" devlog entries
+- After completing a roadmap phase or milestone -- check the content strategy for which post maps to that phase
+- When an architectural decision deserves a public narrative -- check whether the strategy already has an ADR planned for it
+- When a benchmark result tells a compelling story -- the strategy specifies which posts need benchmark data
- When announcing a release, feature, or open-source milestone
## Context to Load
-Before writing, the agent must read:
-1. **Relevant source files** — the code that was written or changed
-2. **Git log** — `git log --oneline` for the period covered
-3. **Research docs** — `docs/research/` for technical backing
-4. **Previous blog posts** — maintain voice consistency across posts
-5. **VISION.md** — for tonal calibration (match its conviction)
-6. **thoughts.md** — for the deeper "why" behind architectural patterns
+Before writing, the agent must read -- in this order:
+
+1. **`docs/content-strategy.md`** -- find the post in the strategy, read its thesis, source material, and publication criteria
+2. **The source material named in the strategy** -- the specific docs, research files, and task docs listed for that post
+3. **Relevant source files** -- the actual Rust code that was written or changed
+4. **Git log** -- `git log --oneline` for the period covered
+5. **Previous blog posts** -- the blog content directory under `site/`, for voice consistency
+6. **VISION.md** -- for tonal calibration (match its conviction)
+7. **thoughts.md** -- for the deeper "why" behind architectural patterns
## Blog Post Types
### Architecture Decision Record (ADR)
**When:** A major architectural choice was made and the reasoning is worth sharing.
+**Strategy posts:** Post 3 (three databases), Post 7 (ranking profiles), Post 9 (negative signals), Post 12 (Tantivy), Post 16 (graceful degradation), plus the "anytime" ADRs (Why not SQL, Why fjall, USearch not from scratch).
**Structure:**
1. The problem in one sentence
2. What we considered (2-3 options, honestly assessed)
-3. What we chose and why — the specific evidence
+3. What we chose and why -- the specific evidence
4. Code showing the result
-5. What we'd watch for (risks, trade-offs acknowledged)
+5. What we would watch for (risks, trade-offs acknowledged)
**Title pattern:** Thesis statement, not label.
-- "Running decay scores are O(1) — here's the math" not "Signal System Architecture"
+- "Running decay scores are O(1) -- here is the math" not "Signal System Architecture"
- "Why we chose fjall over RocksDB (for now)" not "Storage Engine Decision"
### Devlog / Progress Update
**When:** A phase or milestone was completed.
+**Strategy posts:** Post 4 (M1 complete), Post 5 (M2 complete), Post 13 (M5 complete).
**Structure:**
1. What we set out to build (the goal, in one sentence)
2. The hardest part (the interesting engineering, not a changelog)
3. What surprised us (the insight the reader takes away)
4. Code showing the key breakthrough
-5. What's next (one sentence, not a roadmap dump)
+5. What is next (one sentence, not a roadmap dump)
**Title pattern:** The insight, not the timeframe.
- "10M signals, 4 microseconds" not "Phase 2 Complete"
@@ -56,6 +84,7 @@ Before writing, the agent must read:
### Technical Deep Dive
**When:** A specific technique deserves its own focused explanation.
+**Strategy posts:** Post 2 (decay math), Post 6 (diversity), Post 8 (feedback loop), Post 10 (cold start), Post 11 (hybrid search), Post 14 (cohort trending), Post 15 (crash recovery).
**Structure:**
1. The problem this solves (relatable, concrete)
2. Why the obvious approach fails (with numbers)
@@ -67,13 +96,25 @@ Before writing, the agent must read:
- "Forward decay eliminates 99% of read-time computation" not "How We Handle Decay"
- "Diversity enforcement in 3 microseconds" not "Our Ranking System"
+### Vision / Problem Statement
+**When:** Defining the problem space before or independent of implementation.
+**Strategy posts:** Post 1 (the 6-system stack).
+**Structure:**
+1. The problem, made visceral -- name the systems, name the failure modes
+2. Why it exists (historical accident, not intentional design)
+3. The thesis: what should be true instead
+4. The one-query vision (end with the destination, not a product pitch)
+
+**Title pattern:** The indictment.
+- "Every content platform builds the same 6 systems from scratch"
+
### Announcement
**When:** A release, open-source milestone, or public launch.
**Structure:**
1. What it is (one sentence)
2. What you can do with it (3-5 bullet points with code)
3. Install/quickstart command (prominent, copy-pasteable)
-4. What's different about this (the thesis — why this exists)
+4. What is different about this (the thesis -- why this exists)
5. Links: GitHub, docs, community
## Writing Standards
@@ -81,7 +122,7 @@ Before writing, the agent must read:
### Voice
- Active voice. Short sentences. Concrete nouns.
- First person plural ("we") for team decisions, second person ("you") for reader actions
-- Technical precision without jargon — say "O(1) per write" not "blazingly fast"
+- Technical precision without jargon -- say "O(1) per write" not "blazingly fast"
- Humor only when it lands naturally. Never forced.
### Structure
@@ -92,11 +133,14 @@ Before writing, the agent must read:
- 800-1500 words for devlogs, 1500-3000 for deep dives
### Code Examples
-- Must be real — from the actual codebase or a working reproduction
+- Must be real -- from the actual codebase or a working reproduction
- Must be copy-pasteable
- Include enough context to understand without reading the whole post
- Syntax highlighted with the site's muted dark palette
-- Annotated with comments only where the code isn't self-evident
+- Annotated with comments only where the code is not self-evident
+
+### Audience Calibration
+The reader is an engineer who has built or maintains a recommendation/discovery system. They know what Kafka consumer lag feels like. They know why ranking pipeline cache invalidation bugs never get root-caused. They will recognize the 6-system stack because they operate it. Do not explain what Elasticsearch is. Do not explain what a vector database does. Start from shared pain.
### Frontmatter
```yaml
@@ -111,21 +155,31 @@ tags: ["signals", "architecture", "rust"]
## Workflow
-1. **Gather context** — read source files, git log, research docs, previous posts
-2. **Find the headline** — the one insight worth sharing. Write it as a thesis.
-3. **Write the draft** — narrative first, code second
-4. **Cut in half** — remove every sentence that doesn't earn its place
-5. **Add code** — working examples that show the key insight
-6. **Read aloud** — if you stumble, rewrite
-7. **Write as MDX** — save to the blog content directory with proper frontmatter
+1. **Check the strategy** -- read `docs/content-strategy.md`, find the post, confirm the phase is complete
+2. **Gather context** -- read the source material listed in the strategy entry for this post
+3. **Find the headline** -- the strategy provides a working title. Sharpen it. If it does not work as a tweet, rewrite it.
+4. **Write the draft** -- narrative first, code second
+5. **Verify code** -- every example must compile and run against the current codebase
+6. **Cut in half** -- remove every sentence that does not earn its place
+7. **Read aloud** -- if you stumble, rewrite
+8. **Write as MDX** -- save to the blog content directory with proper frontmatter
## Quality Checks
+- [ ] Post matches a specific entry in `docs/content-strategy.md`
+- [ ] The roadmap phase for this post is complete (code shipped, not planned)
- [ ] Title works as a standalone tweet
- [ ] First paragraph earns the reader's second paragraph
-- [ ] Every code example is correct and copy-pasteable
-- [ ] No marketing language ("leverage," "seamless," "robust," "empower")
+- [ ] Every code example is correct, copy-pasteable, and from the shipped codebase
+- [ ] Benchmark numbers come from actual `criterion` runs, not estimates
+- [ ] No marketing language ("leverage," "seamless," "robust," "empower," "excited to announce")
- [ ] Under 3000 words (deep dives) or 1500 words (devlogs)
- [ ] Ends with something the reader remembers tomorrow
- [ ] Frontmatter is complete (title, date, author, description, tags)
- [ ] Would a CTO forward this to their team? If not, rewrite.
+
+## After Publishing
+
+1. Update the **Current Queue** table in this skill to reflect the new state
+2. If the post revealed a new insight worth a follow-up, propose adding it to `docs/content-strategy.md`
+3. Do not write the next post until there is something true to say about it
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index 0f47465..c8528a9 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -18,7 +18,7 @@ tidalDB treats ranking as a primitive. Signals, decay, velocity, user preference
## Domain Model
-Five first-class entity types:
+Six first-class entity types:
| Type | What it represents |
|------|--------------------|
@@ -27,8 +27,9 @@ Five first-class entity types:
| **Creator** | An author — has attributes, an embedding slot, a signal ledger |
| **Relationship** | A weighted, directional edge between any two entities (follows, blocks, interaction weight) |
| **Cohort** | A named, live predicate over user attributes (e.g. `age_range ∈ {18-24} AND locale = en-US`) |
+| **Session / Agent Context** | A short-lived, agent-scoped memory surface binding a user, agent identity, and session metadata (tools, reward hints, policy) |
-Five schema-level primitives:
+Six schema-level primitives:
| Primitive | What it captures |
|-----------|-----------------|
@@ -36,6 +37,7 @@ Five schema-level primitives:
| **Ranking Profile** | A named, versioned scoring function: candidate retrieval strategy, boosts, penalties, quality gates, diversity rules, exploration budget |
| **Relationship** | Weighted edges: follows, blocks, interaction strength — used as ranking inputs |
| **Cohort** | Live predicate membership — enables cohort-scoped signal aggregation and trending |
+| **Session** | Agent-scoped conversational context: short-lived signals, reward hints, policy tags, decay curves |
| **Filter** | Composable predicates over entity attributes, signal values, and relationship state |
---
@@ -51,7 +53,9 @@ storage/ ← depends on schema; knows nothing about signals or ranking
↑
signals/ ← depends on storage; knows nothing about queries or ranking
↑
-query/ ← depends on storage + signals; orchestrates execution
+agent/ ← depends on signals; manages sessions, policy, agent APIs
+ ↑
+query/ ← depends on agent + signals; orchestrates execution
↑
ranking/ ← depends on signals; invoked by the query executor
```
@@ -79,6 +83,14 @@ Signal ingestion and aggregation. Owns:
- **Aggregation** — windowed counters (SWAG-based), velocity computation
- **Materialization** — background worker that writes pre-computed aggregates to O(1) lookup keys
+### `agent/`
+
+Session and policy management for agents. Owns:
+- **Session store** — lifecycle of `(user, agent, session_id)` plus short-lived signals
+- **Session materializers** — aggressive-decay aggregates agents can query (`last_5m_reward`, “tools used”, etc.)
+- **Policy enforcement** — per-agent read/write rules, rate limiting, and isolation guardrails
+- **API surface** — typed commands (`start_session`, `append_signal`, `close_session`) used by query/ranking layers
+
### `query/`
Query parsing and execution. Owns:
diff --git a/CLAUDE.md b/CLAUDE.md
index 3f46309..10dbe9d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -110,3 +110,5 @@ The pre-commit hook runs automatically on staged files:
- **site/ (Next.js):** `eslint` (if node_modules installed)
All cargo commands use `--manifest-path tidal/Cargo.toml` since the Rust project is not at repo root.
+
+**Tests must be fast.** Slow or hanging tests are bugs — diagnose root cause, then remove, fix, or refactor; never leave them hanging.
diff --git a/Cargo.lock b/Cargo.lock
new file mode 100644
index 0000000..b23fe3f
--- /dev/null
+++ b/Cargo.lock
@@ -0,0 +1,1372 @@
+# This file is automatically @generated by Cargo.
+# It is not intended for manual editing.
+version = 4
+
+[[package]]
+name = "aho-corasick"
+version = "1.1.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ddd31a130427c27518df266943a5308ed92d4b226cc639f5a8f1002816174301"
+dependencies = [
+ "memchr",
+]
+
+[[package]]
+name = "anes"
+version = "0.1.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "4b46cbb362ab8752921c97e041f5e366ee6297bd428a31275b9fcf1e380f7299"
+
+[[package]]
+name = "anstyle"
+version = "1.0.13"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5192cca8006f1fd4f7237516f40fa183bb07f8fbdfedaa0036de5ea9b0b45e78"
+
+[[package]]
+name = "anyhow"
+version = "1.0.102"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c"
+
+[[package]]
+name = "arrayref"
+version = "0.3.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "76a2e8124351fda1ef8aaaa3bbd7ebbcb486bbcd4225aca0aa0d84bb2db8fecb"
+
+[[package]]
+name = "arrayvec"
+version = "0.7.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50"
+
+[[package]]
+name = "autocfg"
+version = "1.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c08606f8c3cbf4ce6ec8e28fb0014a2c086708fe954eaa885384a6165172e7e8"
+
+[[package]]
+name = "bit-set"
+version = "0.8.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "08807e080ed7f9d5433fa9b275196cfc35414f66a0c79d864dc51a0d825231a3"
+dependencies = [
+ "bit-vec",
+]
+
+[[package]]
+name = "bit-vec"
+version = "0.8.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5e764a1d40d510daf35e07be9eb06e75770908c27d411ee6c92109c9840eaaf7"
+
+[[package]]
+name = "bitflags"
+version = "2.11.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "843867be96c8daad0d758b57df9392b6d8d271134fce549de6ce169ff98a92af"
+
+[[package]]
+name = "blake3"
+version = "1.8.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2468ef7d57b3fb7e16b576e8377cdbde2320c60e1491e961d11da40fc4f02a2d"
+dependencies = [
+ "arrayref",
+ "arrayvec",
+ "cc",
+ "cfg-if",
+ "constant_time_eq",
+ "cpufeatures",
+]
+
+[[package]]
+name = "bumpalo"
+version = "3.20.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5d20789868f4b01b2f2caec9f5c4e0213b41e3e5702a50157d699ae31ced2fcb"
+
+[[package]]
+name = "byteorder-lite"
+version = "0.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8f1fe948ff07f4bd06c30984e69f5b4899c516a3ef74f34df92a2df2ab535495"
+
+[[package]]
+name = "byteview"
+version = "0.10.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1c53ba0f290bfc610084c05582d9c5d421662128fc69f4bf236707af6fd321b9"
+
+[[package]]
+name = "cast"
+version = "0.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5"
+
+[[package]]
+name = "cc"
+version = "1.2.56"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "aebf35691d1bfb0ac386a69bac2fde4dd276fb618cf8bf4f5318fe285e821bb2"
+dependencies = [
+ "find-msvc-tools",
+ "shlex",
+]
+
+[[package]]
+name = "cfg-if"
+version = "1.0.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9330f8b2ff13f34540b44e946ef35111825727b38d33286ef986142615121801"
+
+[[package]]
+name = "ciborium"
+version = "0.2.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "42e69ffd6f0917f5c029256a24d0161db17cea3997d185db0d35926308770f0e"
+dependencies = [
+ "ciborium-io",
+ "ciborium-ll",
+ "serde",
+]
+
+[[package]]
+name = "ciborium-io"
+version = "0.2.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "05afea1e0a06c9be33d539b876f1ce3692f4afea2cb41f740e7743225ed1c757"
+
+[[package]]
+name = "ciborium-ll"
+version = "0.2.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "57663b653d948a338bfb3eeba9bb2fd5fcfaecb9e199e87e1eda4d9e8b240fd9"
+dependencies = [
+ "ciborium-io",
+ "half",
+]
+
+[[package]]
+name = "clap"
+version = "4.5.60"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2797f34da339ce31042b27d23607e051786132987f595b02ba4f6a6dffb7030a"
+dependencies = [
+ "clap_builder",
+]
+
+[[package]]
+name = "clap_builder"
+version = "4.5.60"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "24a241312cea5059b13574bb9b3861cabf758b879c15190b37b6d6fd63ab6876"
+dependencies = [
+ "anstyle",
+ "clap_lex",
+]
+
+[[package]]
+name = "clap_lex"
+version = "1.0.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3a822ea5bc7590f9d40f1ba12c0dc3c2760f3482c6984db1573ad11031420831"
+
+[[package]]
+name = "compare"
+version = "0.0.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ea0095f6103c2a8b44acd6fd15960c801dafebf02e21940360833e0673f48ba7"
+
+[[package]]
+name = "constant_time_eq"
+version = "0.4.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3d52eff69cd5e647efe296129160853a42795992097e8af39800e1060caeea9b"
+
+[[package]]
+name = "cpufeatures"
+version = "0.2.17"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "59ed5838eebb26a2bb2e58f6d5b5316989ae9d08bab10e0e6d103e656d1b0280"
+dependencies = [
+ "libc",
+]
+
+[[package]]
+name = "criterion"
+version = "0.5.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f2b12d017a929603d80db1831cd3a24082f8137ce19c69e6447f54f5fc8d692f"
+dependencies = [
+ "anes",
+ "cast",
+ "ciborium",
+ "clap",
+ "criterion-plot",
+ "is-terminal",
+ "itertools",
+ "num-traits",
+ "once_cell",
+ "oorandom",
+ "plotters",
+ "rayon",
+ "regex",
+ "serde",
+ "serde_derive",
+ "serde_json",
+ "tinytemplate",
+ "walkdir",
+]
+
+[[package]]
+name = "criterion-plot"
+version = "0.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6b50826342786a51a89e2da3a28f1c32b06e387201bc2d19791f622c673706b1"
+dependencies = [
+ "cast",
+ "itertools",
+]
+
+[[package]]
+name = "crossbeam"
+version = "0.8.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1137cd7e7fc0fb5d3c5a8678be38ec56e819125d8d7907411fe24ccb943faca8"
+dependencies = [
+ "crossbeam-channel",
+ "crossbeam-deque",
+ "crossbeam-epoch",
+ "crossbeam-queue",
+ "crossbeam-utils",
+]
+
+[[package]]
+name = "crossbeam-channel"
+version = "0.5.15"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "82b8f8f868b36967f9606790d1903570de9ceaf870a7bf9fbbd3016d636a2cb2"
+dependencies = [
+ "crossbeam-utils",
+]
+
+[[package]]
+name = "crossbeam-deque"
+version = "0.8.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9dd111b7b7f7d55b72c0a6ae361660ee5853c9af73f70c3c2ef6858b950e2e51"
+dependencies = [
+ "crossbeam-epoch",
+ "crossbeam-utils",
+]
+
+[[package]]
+name = "crossbeam-epoch"
+version = "0.9.18"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5b82ac4a3c2ca9c3460964f020e1402edd5753411d7737aa39c3714ad1b5420e"
+dependencies = [
+ "crossbeam-utils",
+]
+
+[[package]]
+name = "crossbeam-queue"
+version = "0.3.12"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0f58bbc28f91df819d0aa2a2c00cd19754769c2fad90579b3592b1c9ba7a3115"
+dependencies = [
+ "crossbeam-utils",
+]
+
+[[package]]
+name = "crossbeam-skiplist"
+version = "0.1.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "df29de440c58ca2cc6e587ec3d22347551a32435fbde9d2bff64e78a9ffa151b"
+dependencies = [
+ "crossbeam-epoch",
+ "crossbeam-utils",
+]
+
+[[package]]
+name = "crossbeam-utils"
+version = "0.8.21"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28"
+
+[[package]]
+name = "crunchy"
+version = "0.2.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "460fbee9c2c2f33933d720630a6a0bac33ba7053db5344fac858d4b8952d77d5"
+
+[[package]]
+name = "dashmap"
+version = "6.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5041cc499144891f3790297212f32a74fb938e5136a14943f338ef9e0ae276cf"
+dependencies = [
+ "cfg-if",
+ "crossbeam-utils",
+ "hashbrown 0.14.5",
+ "lock_api",
+ "once_cell",
+ "parking_lot_core",
+]
+
+[[package]]
+name = "either"
+version = "1.15.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "48c757948c5ede0e46177b7add2e67155f70e33c07fea8284df6576da70b3719"
+
+[[package]]
+name = "enum_dispatch"
+version = "0.3.13"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "aa18ce2bc66555b3218614519ac839ddb759a7d6720732f979ef8d13be147ecd"
+dependencies = [
+ "once_cell",
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "equivalent"
+version = "1.0.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "877a4ace8713b0bcf2a4e7eec82529c029f1d0619886d18145fea96c3ffe5c0f"
+
+[[package]]
+name = "errno"
+version = "0.3.14"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "39cab71617ae0d63f51a36d69f866391735b51691dbda63cf6f96d042b63efeb"
+dependencies = [
+ "libc",
+ "windows-sys",
+]
+
+[[package]]
+name = "fastrand"
+version = "2.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be"
+
+[[package]]
+name = "find-msvc-tools"
+version = "0.1.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5baebc0774151f905a1a2cc41989300b1e6fbb29aff0ceffa1064fdd3088d582"
+
+[[package]]
+name = "fjall"
+version = "3.0.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5a2799b4198427a08c774838e44d0b77f677208f19a1927671cd2cd36bb30d69"
+dependencies = [
+ "byteorder-lite",
+ "byteview",
+ "dashmap",
+ "flume",
+ "log",
+ "lsm-tree",
+ "lz4_flex",
+ "tempfile",
+ "xxhash-rust",
+]
+
+[[package]]
+name = "flume"
+version = "0.12.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5e139bc46ca777eb5efaf62df0ab8cc5fd400866427e56c68b22e414e53bd3be"
+dependencies = [
+ "spin",
+]
+
+[[package]]
+name = "fnv"
+version = "1.0.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1"
+
+[[package]]
+name = "foldhash"
+version = "0.1.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d9c4f5dac5e15c24eb999c26181a6ca40b39fe946cbe4c263c7209467bc83af2"
+
+[[package]]
+name = "getrandom"
+version = "0.3.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "899def5c37c4fd7b2664648c28120ecec138e4d395b459e5ca34f9cce2dd77fd"
+dependencies = [
+ "cfg-if",
+ "libc",
+ "r-efi",
+ "wasip2",
+]
+
+[[package]]
+name = "getrandom"
+version = "0.4.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "139ef39800118c7683f2fd3c98c1b23c09ae076556b435f8e9064ae108aaeeec"
+dependencies = [
+ "cfg-if",
+ "libc",
+ "r-efi",
+ "wasip2",
+ "wasip3",
+]
+
+[[package]]
+name = "half"
+version = "2.7.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6ea2d84b969582b4b1864a92dc5d27cd2b77b622a8d79306834f1be5ba20d84b"
+dependencies = [
+ "cfg-if",
+ "crunchy",
+ "zerocopy",
+]
+
+[[package]]
+name = "hashbrown"
+version = "0.14.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e5274423e17b7c9fc20b6e7e208532f9b19825d82dfd615708b70edd83df41f1"
+
+[[package]]
+name = "hashbrown"
+version = "0.15.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9229cfe53dfd69f0609a49f65461bd93001ea1ef889cd5529dd176593f5338a1"
+dependencies = [
+ "foldhash",
+]
+
+[[package]]
+name = "hashbrown"
+version = "0.16.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "841d1cc9bed7f9236f321df977030373f4a4163ae1a7dbfe1a51a2c1a51d9100"
+
+[[package]]
+name = "heck"
+version = "0.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2304e00983f87ffb38b55b444b5e3b60a884b5d30c0fca7d82fe33449bbe55ea"
+
+[[package]]
+name = "hermit-abi"
+version = "0.5.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "fc0fef456e4baa96da950455cd02c081ca953b141298e41db3fc7e36b1da849c"
+
+[[package]]
+name = "id-arena"
+version = "2.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3d3067d79b975e8844ca9eb072e16b31c3c1c36928edf9c6789548c524d0d954"
+
+[[package]]
+name = "indexmap"
+version = "2.13.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7714e70437a7dc3ac8eb7e6f8df75fd8eb422675fc7678aff7364301092b1017"
+dependencies = [
+ "equivalent",
+ "hashbrown 0.16.1",
+ "serde",
+ "serde_core",
+]
+
+[[package]]
+name = "interval-heap"
+version = "0.0.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "11274e5e8e89b8607cfedc2910b6626e998779b48a019151c7604d0adcb86ac6"
+dependencies = [
+ "compare",
+]
+
+[[package]]
+name = "is-terminal"
+version = "0.4.17"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3640c1c38b8e4e43584d8df18be5fc6b0aa314ce6ebf51b53313d4306cca8e46"
+dependencies = [
+ "hermit-abi",
+ "libc",
+ "windows-sys",
+]
+
+[[package]]
+name = "itertools"
+version = "0.10.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b0fd2260e829bddf4cb6ea802289de2f86d6a7a690192fbe91b3f46e0f2c8473"
+dependencies = [
+ "either",
+]
+
+[[package]]
+name = "itoa"
+version = "1.0.17"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "92ecc6618181def0457392ccd0ee51198e065e016d1d527a7ac1b6dc7c1f09d2"
+
+[[package]]
+name = "js-sys"
+version = "0.3.87"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "93f0862381daaec758576dcc22eb7bbf4d7efd67328553f3b45a412a51a3fb21"
+dependencies = [
+ "once_cell",
+ "wasm-bindgen",
+]
+
+[[package]]
+name = "leb128fmt"
+version = "0.1.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "09edd9e8b54e49e587e4f6295a7d29c3ea94d469cb40ab8ca70b288248a81db2"
+
+[[package]]
+name = "libc"
+version = "0.2.182"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6800badb6cb2082ffd7b6a67e6125bb39f18782f793520caee8cb8846be06112"
+
+[[package]]
+name = "linux-raw-sys"
+version = "0.11.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "df1d3c3b53da64cf5760482273a98e575c651a67eec7f77df96b5b642de8f039"
+
+[[package]]
+name = "lock_api"
+version = "0.4.14"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "224399e74b87b5f3557511d98dff8b14089b3dadafcab6bb93eab67d3aace965"
+dependencies = [
+ "scopeguard",
+]
+
+[[package]]
+name = "log"
+version = "0.4.29"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
+
+[[package]]
+name = "lsm-tree"
+version = "3.0.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "86e8d0b8e0cf2531a437788ce94d95570dbaabfe9888db20022c2d5ccec9b221"
+dependencies = [
+ "byteorder-lite",
+ "byteview",
+ "crossbeam-skiplist",
+ "enum_dispatch",
+ "interval-heap",
+ "log",
+ "lz4_flex",
+ "quick_cache",
+ "rustc-hash",
+ "self_cell",
+ "sfa",
+ "tempfile",
+ "varint-rs",
+ "xxhash-rust",
+]
+
+[[package]]
+name = "lz4_flex"
+version = "0.11.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "08ab2867e3eeeca90e844d1940eab391c9dc5228783db2ed999acbc0a9ed375a"
+dependencies = [
+ "twox-hash",
+]
+
+[[package]]
+name = "memchr"
+version = "2.8.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f8ca58f447f06ed17d5fc4043ce1b10dd205e060fb3ce5b979b8ed8e59ff3f79"
+
+[[package]]
+name = "num-traits"
+version = "0.2.19"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "071dfc062690e90b734c0b2273ce72ad0ffa95f0c74596bc250dcfd960262841"
+dependencies = [
+ "autocfg",
+]
+
+[[package]]
+name = "once_cell"
+version = "1.21.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "42f5e15c9953c5e4ccceeb2e7382a716482c34515315f7b03532b8b4e8393d2d"
+
+[[package]]
+name = "oorandom"
+version = "11.1.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d6790f58c7ff633d8771f42965289203411a5e5c68388703c06e14f24770b41e"
+
+[[package]]
+name = "parking_lot_core"
+version = "0.9.12"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "2621685985a2ebf1c516881c026032ac7deafcda1a2c9b7850dc81e3dfcb64c1"
+dependencies = [
+ "cfg-if",
+ "libc",
+ "redox_syscall",
+ "smallvec",
+ "windows-link",
+]
+
+[[package]]
+name = "pin-project-lite"
+version = "0.2.16"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "3b3cff922bd51709b605d9ead9aa71031d81447142d828eb4a6eba76fe619f9b"
+
+[[package]]
+name = "plotters"
+version = "0.3.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5aeb6f403d7a4911efb1e33402027fc44f29b5bf6def3effcc22d7bb75f2b747"
+dependencies = [
+ "num-traits",
+ "plotters-backend",
+ "plotters-svg",
+ "wasm-bindgen",
+ "web-sys",
+]
+
+[[package]]
+name = "plotters-backend"
+version = "0.3.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "df42e13c12958a16b3f7f4386b9ab1f3e7933914ecea48da7139435263a4172a"
+
+[[package]]
+name = "plotters-svg"
+version = "0.3.7"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "51bae2ac328883f7acdfea3d66a7c35751187f870bc81f94563733a154d7a670"
+dependencies = [
+ "plotters-backend",
+]
+
+[[package]]
+name = "ppv-lite86"
+version = "0.2.21"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "85eae3c4ed2f50dcfe72643da4befc30deadb458a9b590d720cde2f2b1e97da9"
+dependencies = [
+ "zerocopy",
+]
+
+[[package]]
+name = "prettyplease"
+version = "0.2.37"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "479ca8adacdd7ce8f1fb39ce9ecccbfe93a3f1344b3d0d97f20bc0196208f62b"
+dependencies = [
+ "proc-macro2",
+ "syn",
+]
+
+[[package]]
+name = "proc-macro2"
+version = "1.0.106"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8fd00f0bb2e90d81d1044c2b32617f68fcb9fa3bb7640c23e9c748e53fb30934"
+dependencies = [
+ "unicode-ident",
+]
+
+[[package]]
+name = "proptest"
+version = "1.10.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "37566cb3fdacef14c0737f9546df7cfeadbfbc9fef10991038bf5015d0c80532"
+dependencies = [
+ "bit-set",
+ "bit-vec",
+ "bitflags",
+ "num-traits",
+ "rand",
+ "rand_chacha",
+ "rand_xorshift",
+ "regex-syntax",
+ "rusty-fork",
+ "tempfile",
+ "unarray",
+]
+
+[[package]]
+name = "quick-error"
+version = "1.2.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a1d01941d82fa2ab50be1e79e6714289dd7cde78eba4c074bc5a4374f650dfe0"
+
+[[package]]
+name = "quick_cache"
+version = "0.6.18"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7ada44a88ef953a3294f6eb55d2007ba44646015e18613d2f213016379203ef3"
+dependencies = [
+ "equivalent",
+ "hashbrown 0.16.1",
+]
+
+[[package]]
+name = "quote"
+version = "1.0.44"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "21b2ebcf727b7760c461f091f9f0f539b77b8e87f2fd88131e7f1b433b3cece4"
+dependencies = [
+ "proc-macro2",
+]
+
+[[package]]
+name = "r-efi"
+version = "5.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "69cdb34c158ceb288df11e18b4bd39de994f6657d83847bdffdbd7f346754b0f"
+
+[[package]]
+name = "rand"
+version = "0.9.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6db2770f06117d490610c7488547d543617b21bfa07796d7a12f6f1bd53850d1"
+dependencies = [
+ "rand_chacha",
+ "rand_core",
+]
+
+[[package]]
+name = "rand_chacha"
+version = "0.9.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d3022b5f1df60f26e1ffddd6c66e8aa15de382ae63b3a0c1bfc0e4d3e3f325cb"
+dependencies = [
+ "ppv-lite86",
+ "rand_core",
+]
+
+[[package]]
+name = "rand_core"
+version = "0.9.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "76afc826de14238e6e8c374ddcc1fa19e374fd8dd986b0d2af0d02377261d83c"
+dependencies = [
+ "getrandom 0.3.4",
+]
+
+[[package]]
+name = "rand_xorshift"
+version = "0.4.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "513962919efc330f829edb2535844d1b912b0fbe2ca165d613e4e8788bb05a5a"
+dependencies = [
+ "rand_core",
+]
+
+[[package]]
+name = "rayon"
+version = "1.11.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "368f01d005bf8fd9b1206fb6fa653e6c4a81ceb1466406b81792d87c5677a58f"
+dependencies = [
+ "either",
+ "rayon-core",
+]
+
+[[package]]
+name = "rayon-core"
+version = "1.13.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "22e18b0f0062d30d4230b2e85ff77fdfe4326feb054b9783a3460d8435c8ab91"
+dependencies = [
+ "crossbeam-deque",
+ "crossbeam-utils",
+]
+
+[[package]]
+name = "redox_syscall"
+version = "0.5.18"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ed2bf2547551a7053d6fdfafda3f938979645c44812fbfcda098faae3f1a362d"
+dependencies = [
+ "bitflags",
+]
+
+[[package]]
+name = "regex"
+version = "1.12.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e10754a14b9137dd7b1e3e5b0493cc9171fdd105e0ab477f51b72e7f3ac0e276"
+dependencies = [
+ "aho-corasick",
+ "memchr",
+ "regex-automata",
+ "regex-syntax",
+]
+
+[[package]]
+name = "regex-automata"
+version = "0.4.14"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6e1dd4122fc1595e8162618945476892eefca7b88c52820e74af6262213cae8f"
+dependencies = [
+ "aho-corasick",
+ "memchr",
+ "regex-syntax",
+]
+
+[[package]]
+name = "regex-syntax"
+version = "0.8.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a96887878f22d7bad8a3b6dc5b7440e0ada9a245242924394987b21cf2210a4c"
+
+[[package]]
+name = "rustc-hash"
+version = "2.1.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "357703d41365b4b27c590e3ed91eabb1b663f07c4c084095e60cbed4362dff0d"
+
+[[package]]
+name = "rustix"
+version = "1.1.3"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "146c9e247ccc180c1f61615433868c99f3de3ae256a30a43b49f67c2d9171f34"
+dependencies = [
+ "bitflags",
+ "errno",
+ "libc",
+ "linux-raw-sys",
+ "windows-sys",
+]
+
+[[package]]
+name = "rustversion"
+version = "1.0.22"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b39cdef0fa800fc44525c84ccb54a029961a8215f9619753635a9c0d2538d46d"
+
+[[package]]
+name = "rusty-fork"
+version = "0.3.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "cc6bf79ff24e648f6da1f8d1f011e9cac26491b619e6b9280f2b47f1774e6ee2"
+dependencies = [
+ "fnv",
+ "quick-error",
+ "tempfile",
+ "wait-timeout",
+]
+
+[[package]]
+name = "same-file"
+version = "1.0.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "93fc1dc3aaa9bfed95e02e6eadabb4baf7e3078b0bd1b4d7b6b0b68378900502"
+dependencies = [
+ "winapi-util",
+]
+
+[[package]]
+name = "scopeguard"
+version = "1.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "94143f37725109f92c262ed2cf5e59bce7498c01bcc1502d7b9afe439a4e9f49"
+
+[[package]]
+name = "self_cell"
+version = "1.2.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b12e76d157a900eb52e81bc6e9f3069344290341720e9178cde2407113ac8d89"
+
+[[package]]
+name = "semver"
+version = "1.0.27"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d767eb0aabc880b29956c35734170f26ed551a859dbd361d140cdbeca61ab1e2"
+
+[[package]]
+name = "serde"
+version = "1.0.228"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e"
+dependencies = [
+ "serde_core",
+ "serde_derive",
+]
+
+[[package]]
+name = "serde_core"
+version = "1.0.228"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad"
+dependencies = [
+ "serde_derive",
+]
+
+[[package]]
+name = "serde_derive"
+version = "1.0.228"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "serde_json"
+version = "1.0.149"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "83fc039473c5595ace860d8c4fafa220ff474b3fc6bfdb4293327f1a37e94d86"
+dependencies = [
+ "itoa",
+ "memchr",
+ "serde",
+ "serde_core",
+ "zmij",
+]
+
+[[package]]
+name = "sfa"
+version = "1.0.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "a1296838937cab56cd6c4eeeb8718ec777383700c33f060e2869867bd01d1175"
+dependencies = [
+ "byteorder-lite",
+ "log",
+ "xxhash-rust",
+]
+
+[[package]]
+name = "shlex"
+version = "1.3.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
+
+[[package]]
+name = "smallvec"
+version = "1.15.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "67b1b7a3b5fe4f1376887184045fcf45c69e92af734b7aaddc05fb777b6fbd03"
+
+[[package]]
+name = "spin"
+version = "0.9.8"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "6980e8d7511241f8acf4aebddbb1ff938df5eebe98691418c4468d0b72a96a67"
+dependencies = [
+ "lock_api",
+]
+
+[[package]]
+name = "syn"
+version = "2.0.117"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e665b8803e7b1d2a727f4023456bbbbe74da67099c585258af0ad9c5013b9b99"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "unicode-ident",
+]
+
+[[package]]
+name = "tempfile"
+version = "3.25.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0136791f7c95b1f6dd99f9cc786b91bb81c3800b639b3478e561ddb7be95e5f1"
+dependencies = [
+ "fastrand",
+ "getrandom 0.4.1",
+ "once_cell",
+ "rustix",
+ "windows-sys",
+]
+
+[[package]]
+name = "tidalctl"
+version = "0.1.0"
+dependencies = [
+ "serde_json",
+ "tidaldb",
+]
+
+[[package]]
+name = "tidaldb"
+version = "0.1.0"
+dependencies = [
+ "blake3",
+ "criterion",
+ "crossbeam",
+ "dashmap",
+ "fjall",
+ "proptest",
+ "tempfile",
+ "tracing",
+]
+
+[[package]]
+name = "tinytemplate"
+version = "1.2.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "be4d6b5f19ff7664e8c98d03e2139cb510db9b0a60b55f8e8709b689d939b6bc"
+dependencies = [
+ "serde",
+ "serde_json",
+]
+
+[[package]]
+name = "tracing"
+version = "0.1.44"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "63e71662fa4b2a2c3a26f570f037eb95bb1f85397f3cd8076caed2f026a6d100"
+dependencies = [
+ "pin-project-lite",
+ "tracing-attributes",
+ "tracing-core",
+]
+
+[[package]]
+name = "tracing-attributes"
+version = "0.1.31"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7490cfa5ec963746568740651ac6781f701c9c5ea257c58e057f3ba8cf69e8da"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "tracing-core"
+version = "0.1.36"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "db97caf9d906fbde555dd62fa95ddba9eecfd14cb388e4f491a66d74cd5fb79a"
+dependencies = [
+ "once_cell",
+]
+
+[[package]]
+name = "twox-hash"
+version = "2.1.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9ea3136b675547379c4bd395ca6b938e5ad3c3d20fad76e7fe85f9e0d011419c"
+
+[[package]]
+name = "unarray"
+version = "0.1.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "eaea85b334db583fe3274d12b4cd1880032beab409c0d774be044d4480ab9a94"
+
+[[package]]
+name = "unicode-ident"
+version = "1.0.24"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e6e4313cd5fcd3dad5cafa179702e2b244f760991f45397d14d4ebf38247da75"
+
+[[package]]
+name = "unicode-xid"
+version = "0.2.6"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ebc1c04c71510c7f702b52b7c350734c9ff1295c464a03335b00bb84fc54f853"
+
+[[package]]
+name = "varint-rs"
+version = "2.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "8f54a172d0620933a27a4360d3db3e2ae0dd6cceae9730751a036bbf182c4b23"
+
+[[package]]
+name = "wait-timeout"
+version = "0.2.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "09ac3b126d3914f9849036f826e054cbabdc8519970b8998ddaf3b5bd3c65f11"
+dependencies = [
+ "libc",
+]
+
+[[package]]
+name = "walkdir"
+version = "2.5.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "29790946404f91d9c5d06f9874efddea1dc06c5efe94541a7d6863108e3a5e4b"
+dependencies = [
+ "same-file",
+ "winapi-util",
+]
+
+[[package]]
+name = "wasip2"
+version = "1.0.2+wasi-0.2.9"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9517f9239f02c069db75e65f174b3da828fe5f5b945c4dd26bd25d89c03ebcf5"
+dependencies = [
+ "wit-bindgen",
+]
+
+[[package]]
+name = "wasip3"
+version = "0.4.0+wasi-0.3.0-rc-2026-01-06"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "5428f8bf88ea5ddc08faddef2ac4a67e390b88186c703ce6dbd955e1c145aca5"
+dependencies = [
+ "wit-bindgen",
+]
+
+[[package]]
+name = "wasm-bindgen"
+version = "0.2.110"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "1de241cdc66a9d91bd84f097039eb140cdc6eec47e0cdbaf9d932a1dd6c35866"
+dependencies = [
+ "cfg-if",
+ "once_cell",
+ "rustversion",
+ "wasm-bindgen-macro",
+ "wasm-bindgen-shared",
+]
+
+[[package]]
+name = "wasm-bindgen-macro"
+version = "0.2.110"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e12fdf6649048f2e3de6d7d5ff3ced779cdedee0e0baffd7dff5cdfa3abc8a52"
+dependencies = [
+ "quote",
+ "wasm-bindgen-macro-support",
+]
+
+[[package]]
+name = "wasm-bindgen-macro-support"
+version = "0.2.110"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0e63d1795c565ac3462334c1e396fd46dbf481c40f51f5072c310717bc4fb309"
+dependencies = [
+ "bumpalo",
+ "proc-macro2",
+ "quote",
+ "syn",
+ "wasm-bindgen-shared",
+]
+
+[[package]]
+name = "wasm-bindgen-shared"
+version = "0.2.110"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e9f9cdac23a5ce71f6bf9f8824898a501e511892791ea2a0c6b8568c68b9cb53"
+dependencies = [
+ "unicode-ident",
+]
+
+[[package]]
+name = "wasm-encoder"
+version = "0.244.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "990065f2fe63003fe337b932cfb5e3b80e0b4d0f5ff650e6985b1048f62c8319"
+dependencies = [
+ "leb128fmt",
+ "wasmparser",
+]
+
+[[package]]
+name = "wasm-metadata"
+version = "0.244.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "bb0e353e6a2fbdc176932bbaab493762eb1255a7900fe0fea1a2f96c296cc909"
+dependencies = [
+ "anyhow",
+ "indexmap",
+ "wasm-encoder",
+ "wasmparser",
+]
+
+[[package]]
+name = "wasmparser"
+version = "0.244.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "47b807c72e1bac69382b3a6fb3dbe8ea4c0ed87ff5629b8685ae6b9a611028fe"
+dependencies = [
+ "bitflags",
+ "hashbrown 0.15.5",
+ "indexmap",
+ "semver",
+]
+
+[[package]]
+name = "web-sys"
+version = "0.3.87"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f2c7c5718134e770ee62af3b6b4a84518ec10101aad610c024b64d6ff29bb1ff"
+dependencies = [
+ "js-sys",
+ "wasm-bindgen",
+]
+
+[[package]]
+name = "winapi-util"
+version = "0.1.11"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "c2a7b1c03c876122aa43f3020e6c3c3ee5c05081c9a00739faf7503aeba10d22"
+dependencies = [
+ "windows-sys",
+]
+
+[[package]]
+name = "windows-link"
+version = "0.2.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "f0805222e57f7521d6a62e36fa9163bc891acd422f971defe97d64e70d0a4fe5"
+
+[[package]]
+name = "windows-sys"
+version = "0.61.2"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ae137229bcbd6cdf0f7b80a31df61766145077ddf49416a728b02cb3921ff3fc"
+dependencies = [
+ "windows-link",
+]
+
+[[package]]
+name = "wit-bindgen"
+version = "0.51.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d7249219f66ced02969388cf2bb044a09756a083d0fab1e566056b04d9fbcaa5"
+dependencies = [
+ "wit-bindgen-rust-macro",
+]
+
+[[package]]
+name = "wit-bindgen-core"
+version = "0.51.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ea61de684c3ea68cb082b7a88508a8b27fcc8b797d738bfc99a82facf1d752dc"
+dependencies = [
+ "anyhow",
+ "heck",
+ "wit-parser",
+]
+
+[[package]]
+name = "wit-bindgen-rust"
+version = "0.51.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b7c566e0f4b284dd6561c786d9cb0142da491f46a9fbed79ea69cdad5db17f21"
+dependencies = [
+ "anyhow",
+ "heck",
+ "indexmap",
+ "prettyplease",
+ "syn",
+ "wasm-metadata",
+ "wit-bindgen-core",
+ "wit-component",
+]
+
+[[package]]
+name = "wit-bindgen-rust-macro"
+version = "0.51.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "0c0f9bfd77e6a48eccf51359e3ae77140a7f50b1e2ebfe62422d8afdaffab17a"
+dependencies = [
+ "anyhow",
+ "prettyplease",
+ "proc-macro2",
+ "quote",
+ "syn",
+ "wit-bindgen-core",
+ "wit-bindgen-rust",
+]
+
+[[package]]
+name = "wit-component"
+version = "0.244.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9d66ea20e9553b30172b5e831994e35fbde2d165325bec84fc43dbf6f4eb9cb2"
+dependencies = [
+ "anyhow",
+ "bitflags",
+ "indexmap",
+ "log",
+ "serde",
+ "serde_derive",
+ "serde_json",
+ "wasm-encoder",
+ "wasm-metadata",
+ "wasmparser",
+ "wit-parser",
+]
+
+[[package]]
+name = "wit-parser"
+version = "0.244.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ecc8ac4bc1dc3381b7f59c34f00b67e18f910c2c0f50015669dde7def656a736"
+dependencies = [
+ "anyhow",
+ "id-arena",
+ "indexmap",
+ "log",
+ "semver",
+ "serde",
+ "serde_derive",
+ "serde_json",
+ "unicode-xid",
+ "wasmparser",
+]
+
+[[package]]
+name = "xxhash-rust"
+version = "0.8.15"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "fdd20c5420375476fbd4394763288da7eb0cc0b8c11deed431a91562af7335d3"
+
+[[package]]
+name = "zerocopy"
+version = "0.8.39"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "db6d35d663eadb6c932438e763b262fe1a70987f9ae936e60158176d710cae4a"
+dependencies = [
+ "zerocopy-derive",
+]
+
+[[package]]
+name = "zerocopy-derive"
+version = "0.8.39"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "4122cd3169e94605190e77839c9a40d40ed048d305bfdc146e7df40ab0f3e517"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
+[[package]]
+name = "zmij"
+version = "1.0.21"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b8848ee67ecc8aedbaf3e4122217aff892639231befc6a1b58d29fff4c2cabaa"
diff --git a/Cargo.toml b/Cargo.toml
new file mode 100644
index 0000000..f38ddc1
--- /dev/null
+++ b/Cargo.toml
@@ -0,0 +1,8 @@
+[workspace]
+members = ["tidal", "tidalctl"]
+resolver = "2"
+
+[workspace.package]
+edition = "2024"
+rust-version = "1.91"
+license = "MIT"
diff --git a/VISION.md b/VISION.md
index 31056a4..24034e9 100644
--- a/VISION.md
+++ b/VISION.md
@@ -20,6 +20,7 @@ A database purpose-built for personalized content delivery should model the worl
- Content has metadata, embeddings, and signals. Signals are not fields — they are typed, timestamped streams with native decay, velocity, and windowed aggregation semantics.
- Users have preferences, histories, and relationships. These are not rows — they are living profiles that update continuously as events arrive.
+- Agents mediate most interactions. They retrieve context, elicit preferences, and publish structured feedback (reward, tool usage, confidence) as first-class signals. The system must let them read and write memory instantly.
- A query is not "give me items matching these filters sorted by this field." It is "given this user, this context, and this surface — what should they see, in what order, subject to these constraints?"
- Filters, sort modes, and diversity rules are first-class query citizens — not application logic bolted on top.
- Engagement is not application logic that happens to write back into the database. It is a first-class write path that closes the feedback loop natively.
@@ -46,11 +47,14 @@ It is strongly opinionated. It does not try to be a general-purpose database. It
**Cohorts** are named predicates over user attributes — demographic, behavioral, and interest-based segments. A cohort is not a static list of users — it is a live query over user state. "US users aged 18-24 who engage with jazz content" is a cohort. The database maintains per-cohort signal aggregation so that trending, rising, and quality signals can be scoped to any cohort at query time. This enables the three-layer trending model: global trending, cohort-scoped trending, and search within cohort-scoped trending.
+**Sessions / Agent Context** capture in-flight conversations and tool use. They bind a user, an agent, and a session identifier to short-lived signals (preference hints, rewards, critiques) with aggressive decay. Sessions can be forked, merged, and policy-limited so an agent only sees what it is allowed to remember.
+
**The Query** is a single operation that encapsulates candidate retrieval, filtering, ranking, and diversity enforcement:
```
RETRIEVE items
FOR USER @user_id
+FOR SESSION @session_id
CONTEXT feed
USING PROFILE for_you
FILTER unseen, unblocked, format:video, duration:short
@@ -76,6 +80,7 @@ Search within cohort-scoped trending:
```
SEARCH items
QUERY "piano tutorial"
+FOR SESSION @session_id
WITHIN TRENDING
COHORT locale:US, age:18-24, interest:jazz
WINDOW 24h
@@ -155,7 +160,7 @@ Every one of these surfaces is driven by the same underlying query primitives. T
### The Feedback Loop
-When a user engages with content — views, likes, skips, hides — that event is written to the database as a signal. The database updates the item's signal ledger, the user's implicit preference profile, and the relationship weight between the user and the creator — automatically, as part of the write transaction. The next ranking query reflects this immediately. There is no Kafka consumer to lag, no feature store sync to schedule, no cache to invalidate.
+When a user engages with content — directly or via an agent — that event is written to the database as a signal. The agent can attach structured metadata (reward, confidence, tool invocation) in the same write. The database updates the item's signal ledger, the user's implicit preference profile, the relationship weight, and the session-scoped memory — automatically, as part of the write transaction. The next ranking or grounding query reflects this immediately. There is no Kafka consumer to lag, no feature store sync to schedule, no cache to invalidate.
Negative signals are equal citizens. A skip, a hide, a block, a "not interested," a downvote — these update the system with the same immediacy and precision as a like or a completion.
@@ -189,6 +194,8 @@ It is not trying to solve moderation, payments, authentication, or content deliv
**Cohorts are live queries, not static lists.** A cohort is a predicate over user attributes — demographics, interests, behavioral segments. Users flow in and out of cohorts as their attributes change. Signal aggregation runs per-cohort so trending and quality signals reflect what's happening within any audience segment.
+**Agents own managed contexts.** Sessions scope short-lived memory, rewards, and tool usage. Agents can only read/write within their sessions, and policy guards live in schema, not ad-hoc middleware.
+
**Correctness over cleverness.** Ranking is already approximate by nature. The database does not need to be more clever than the signals it has. It needs to be fast, consistent, and operationally simple.
## Who This Is For
diff --git a/docs/content-strategy.md b/docs/content-strategy.md
new file mode 100644
index 0000000..5ee81be
--- /dev/null
+++ b/docs/content-strategy.md
@@ -0,0 +1,304 @@
+# Content Strategy
+
+Blog posts mapped to the tidalDB roadmap. Each entry identifies the moment worth writing about, the thesis that makes it shareable, and the type of post it demands.
+
+The audience is engineers who have built or are currently maintaining recommendation and discovery systems -- the people running the 6-system stack this database replaces. They know what Kafka lag feels like at 3am. They know why cache invalidation bugs in the ranking pipeline are the ones that never get root-caused. They will smell marketing language from the first sentence. Respect that.
+
+---
+
+## Publishing Principles
+
+**Write when something is true, not when something is scheduled.** A blog post published the day a milestone passes its UAT is credible. A blog post published before the code works is fiction.
+
+**One insight per post.** The reader should leave with a single idea they did not have before. If the post contains two insights, it is two posts.
+
+**Code proves claims.** Every technical assertion is backed by a code example or a benchmark number from the actual codebase. Not a prototype. Not a plan. The shipped code.
+
+**The title is the thesis.** If the title does not work as a standalone sentence that makes an engineer stop scrolling, the post is not ready.
+
+---
+
+## Content Calendar
+
+### Pre-Implementation (Now)
+
+These posts can be written before the engine is feature-complete. They draw on the vision, architecture research, and the problem space -- not on shipped code.
+
+#### Post 1: "Every content platform builds the same 6 systems from scratch"
+
+- **Type:** Vision / Problem Statement
+- **Thesis:** The Elasticsearch + Redis + Kafka + feature store + vector DB + ranking service stack is not an architecture. It is scar tissue. The seams between these systems are where correctness dies.
+- **Source material:** VISION.md, thoughts.md (Part VI)
+- **When to publish:** Any time. This post defines the problem and does not depend on implementation progress.
+- **Why it matters:** This is the foundational narrative. Every subsequent post assumes the reader understands this problem. It also serves as the litmus test for whether the audience cares -- if this post does not resonate, the subsequent ones will not either.
+- **Structure:** Problem statement. The 6 systems named and indicted. The seams enumerated (stale signals, ETL lag, cache invalidation, operational burden). The thesis: ranking is not a feature, it is a primitive. End with the one-query vision, not with a product pitch.
+
+---
+
+### Milestone 1: Signal Engine
+
+M1 proves that temporal signals with O(1) decay, velocity, and windowed aggregation work as a database primitive. This is the most technically interesting milestone for blog content because the math is elegant and the performance numbers are dramatic.
+
+#### Post 2: "Running decay scores are O(1) -- here is the math"
+
+- **Type:** Technical Deep Dive
+- **Roadmap phase:** m1p4 (Signal Ledger) completion
+- **Thesis:** The forward-decay formula `S(t) = S(t_prev) * exp(-lambda * dt) + weight` eliminates raw-event scanning at query time. Three `exp()` calls on write, one on read. 15 nanoseconds per entity. Every platform computing `trending_score = views / (age + 2)^1.8` in application code is doing O(N) work that should be O(1).
+- **Source material:** docs/research/tidaldb_signal_ledger.md, ARCHITECTURE.md (Signal System section), m1p4 task docs
+- **When to publish:** After m1p4 passes UAT with benchmark numbers in hand.
+- **Code to include:** The `EntitySignalState` struct. The forward-decay write path. The out-of-order event correction. Benchmark output showing a 200-entity scoring pass in under 5 microseconds.
+- **Why it matters:** This is the post that demonstrates tidalDB is not vaporware. The math is verifiable. The benchmarks are reproducible. Engineers who have implemented trending scores in Redis will immediately understand the value.
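+
+A worked check of the thesis formula may help when drafting this post. The sketch below assumes a single signal with one decay rate `lambda`, events `(t_i, w_i)` arriving in time order, and an initial score of zero; the indices, the half-life, and the numbers are illustrative assumptions, not values from the codebase. It shows that the running update equals the full decayed sum over raw events, which is why nothing has to be rescanned at query time.
+
+```latex
+% Assumptions: one signal, one decay rate \lambda, in-order events (t_i, w_i), S(t_0) = 0.
+\begin{aligned}
+S(t_n) &= S(t_{n-1})\, e^{-\lambda (t_n - t_{n-1})} + w_n \\
+       &= \sum_{i=1}^{n} w_i\, e^{-\lambda (t_n - t_i)} \\
+S(t)   &= S(t_n)\, e^{-\lambda (t - t_n)}, \quad t \ge t_n
+\end{aligned}
+
+% Hypothetical numbers: half-life of 1 h, so \lambda = \ln 2 per hour; w_1 = w_2 = 1 at t_1 = 0 h, t_2 = 1 h.
+% Write path:      S(1) = 1 \cdot e^{-\ln 2} + 1 = 1.5
+% Read at t = 2 h: S(2) = 1.5 \cdot e^{-\ln 2} = 0.75
+% Raw-event scan:  e^{-2 \ln 2} + e^{-\ln 2} = 0.25 + 0.5 = 0.75
+```
+
+The write path and the raw-event scan agree, and the running form touches only the stored score and the last update time. Out-of-order events need the separate correction listed under "Code to include" and are not covered by this sketch.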
+
+#### Post 3: "What three databases taught us before we wrote a line of code"
+
+- **Type:** Architecture Decision Record
+- **Roadmap phase:** m1p1-m1p3 completion (the foundation phases)
+- **Thesis:** We studied Engram (cognitive memory), Citadel (append-only logging), and StemeDB (knowledge graph) -- three purpose-built databases in the same codebase -- and stole their best patterns. WAL-first durability from Citadel. Cache-line aligned hot structs from Engram. Subject-prefix key encoding and background materialization from StemeDB. Here is what converged and what the gaps taught us.
+- **Source material:** thoughts.md (all six parts), CODING_GUIDELINES.md
+- **When to publish:** After m1p3 (Storage Engine) is complete. The patterns referenced are already implemented.
+- **Code to include:** Key encoding format. Cache-line aligned struct. Group commit writer. Side-by-side comparison of the pattern in the source database and in tidalDB.
+- **Why it matters:** Engineers respect builders who study prior art. This post establishes technical credibility and shows the architectural foundation is grounded in real patterns, not invented from scratch.
+
+#### Post 4: "Signals wrote 100ms ago. The query sees them now."
+
+- **Type:** Devlog / Milestone Announcement
+- **Roadmap phase:** m1p5 (Entity CRUD and Signal Write API) -- M1 complete
+- **Thesis:** Milestone 1 is done. A developer can open a tidalDB instance, define signal types with decay rates and windows, write 10,000 engagement events, and read back decay-correct scores that match analytical computation to 6 decimal places. Including after a crash. The UAT scenario passes.
+- **Source material:** The m1p5 integration test, benchmark results, git log for the M1 period
+- **When to publish:** The day M1 UAT passes.
+- **Code to include:** The full UAT scenario (or a clean excerpt). `TidalDB::open()` with schema. Signal write. Decay score read. Before/after crash recovery.
+- **Why it matters:** This is the first "it works" post. It converts skeptics from "interesting idea" to "this is real." The UAT code is the proof.
+
+---
+
+### Milestone 2: Ranked Retrieval
+
+M2 proves that a single query can retrieve, filter, score, and enforce diversity over live signals. This is where tidalDB stops being a signal engine and starts being a database.
+
+#### Post 5: "One query. Six systems. Under 50 milliseconds."
+
+- **Type:** Technical Deep Dive / Announcement
+- **Roadmap phase:** m2p5 (RETRIEVE Query Executor) -- M2 complete
+- **Thesis:** `RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25` executes in under 50ms on 10K items. It retrieves candidates via ANN, filters by metadata, scores using live decay signals and velocity, enforces diversity, and returns a ranked list. That is what Elasticsearch + Redis + a ranking service produce. It is one query here.
+- **Source material:** m2p5 integration test, benchmark results, the dependency DAG showing how all M2 phases compose
+- **When to publish:** After M2 UAT passes.
+- **Code to include:** The RETRIEVE query. The ranked result with signal snapshots. The trending profile definition. A before/after signal burst showing the ranking change.
+- **Why it matters:** This is the money post. The one-query thesis is no longer a vision document -- it is a benchmark. Engineers who operate the 6-system stack will immediately understand what this eliminates.
+
+#### Post 6: "Diversity enforcement in 3 microseconds"
+
+- **Type:** Technical Deep Dive
+- **Roadmap phase:** m2p4 (Diversity Enforcement)
+- **Thesis:** "No more than 2 items per creator" does not belong in your API layer. It belongs in the query. tidalDB enforces diversity as a post-scoring reordering pass -- it does not reduce result count. The greedy selection algorithm runs in under 3 microseconds for 200 candidates.
+- **Source material:** m2p4 task docs, VISION.md (diversity section), benchmark results
+- **When to publish:** After m2p4 is complete.
+- **Code to include:** The DiversitySpec. The greedy selector. A concrete example showing reordering (creator A dominates pre-diversity, balanced post-diversity). Benchmark numbers.
+- **Why it matters:** Every team building a feed implements diversity in the API layer. Showing that it belongs in the database -- and costs 3 microseconds -- is a strong differentiator. This is the kind of post that gets shared in Slack channels.
+
+#### Post 7: "Ranking profiles are data, not code"
+
+- **Type:** Architecture Decision Record
+- **Roadmap phase:** m2p3 (Ranking Profile Engine)
+- **Thesis:** Changing how content is ranked should not require a code change, a deployment, or a restart. tidalDB treats ranking profiles as versioned schema declarations. Define a profile. Name it. Swap it at query time. A/B test two profiles by name. The database executes the entire pipeline.
+- **Source material:** m2p3 task docs, API.md (ranking profiles section), VISION.md
+- **When to publish:** After m2p3 is complete.
+- **Code to include:** A `trending` profile definition. A `for_you` profile definition. The same RETRIEVE query with two different profile names producing different orderings. The profile versioning API.
+- **Why it matters:** This reframes ranking as a database concern. Engineers who maintain ranking services as separate microservices will recognize the operational simplification.
+
+---
+
+### Milestone 3: Personalized Ranking
+
+M3 is where the feedback loop closes. Signal writes update the user's preference vector, the creator's interaction weight, and the item's signal ledger -- atomically, in one write. The "For You" query works.
+
+#### Post 8: "The feedback loop that closes in one write"
+
+- **Type:** Technical Deep Dive
+- **Roadmap phase:** m3p2 (Feedback Loop) completion
+- **Thesis:** When a user likes an item, the database atomically updates three things: the item's like count, the user-to-creator interaction weight, and the user's preference vector (shifted toward the item's embedding). One `db.signal("like", ...)` call. No Kafka consumer to lag. No feature store to sync. No cache to invalidate. The next ranking query -- even 100ms later -- reflects the change.
+- **Source material:** m3p2 task docs, ARCHITECTURE.md (Write Path section), SEQUENCE.md
+- **When to publish:** After m3p2 passes UAT.
+- **Code to include:** The signal write. The 10-step atomic update path. A before/after query showing the preference shift. The property test that proves hidden items and blocked creators never surface.
+- **Why it matters:** The closed feedback loop is the core architectural thesis of tidalDB. This post proves it works. It is the strongest argument against the 6-system stack, because the stack's primary failure mode is feedback lag.
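+
+The preference shift in the thesis can be written as a small worked equation. This is a sketch under stated assumptions: the step size $\alpha$, the unit-length normalization, and the symbols $u$ (user preference vector) and $e$ (item embedding) are illustrative and not taken from the m3p2 docs:
+
+$$
+u' = \frac{(1 - \alpha)\,u + \alpha\,e}{\lVert (1 - \alpha)\,u + \alpha\,e \rVert}, \qquad 0 < \alpha < 1
+$$
+
+For unit-length $u \neq \pm e$, the updated vector $u'$ has strictly higher cosine similarity to $e$ than $u$ did, which is the property a before/after query would need to demonstrate.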
+
+#### Post 9: "Negative signals are equal citizens"
+
+- **Type:** Architecture Decision Record
+- **Roadmap phase:** m3p2 (Feedback Loop)
+- **Thesis:** A skip is not the absence of a like. It is data. tidalDB treats negative signals -- skips, hides, blocks, "not interested" -- with the same precision and immediacy as positive signals. A skip within 3 seconds is a strong quality signal. A hide creates a permanent exclusion. A block removes all of a creator's content from all future queries. These are not afterthoughts. They are first-class signal types with their own decay rates, velocity, and ranking weight.
+- **Source material:** VISION.md (negative signals section), USE_CASES.md (UC-01 feedback), m3p2 task docs
+- **When to publish:** After m3p2 is complete. Can be bundled with or separated from Post 8.
+- **Code to include:** Signal type definitions for skip, hide, block. The penalty clause in a ranking profile. The property test: 10,000 random signal sequences never produce a result where a hidden item or blocked creator appears.
+- **Why it matters:** Most recommendation systems handle negative feedback as an afterthought -- a manual "not interested" button that writes to a separate blocklist. tidalDB's approach is architecturally different, and engineers building these systems will recognize the improvement immediately.
+
+#### Post 10: "Cold start without application logic"
+
+- **Type:** Technical Deep Dive
+- **Roadmap phase:** m3p3 (Personalized Ranking Profiles)
+- **Thesis:** New items with no signals get an exploration budget. New users with no history get a sensible default from population-level signals. The application does not manage either. The exploration rate decays as signals accumulate. This is declared per ranking profile, not implemented in application code.
+- **Source material:** m3p3 task docs, VISION.md (cold start section)
+- **When to publish:** After m3p3 is complete.
+- **Code to include:** The exploration budget in a profile definition. A new item appearing in a for_you feed despite having zero signals. The decay of exploration as signals arrive.
+- **Why it matters:** Cold start is the problem everyone hacks around and no one solves cleanly. Showing a database-native solution is a strong differentiator.
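+
+One way to picture the decaying exploration rate, purely as an illustration: the functional form and the parameters $\varepsilon_0$ (initial exploration weight) and $n_0$ (half-decay signal count) are hypothetical, not taken from the m3p3 docs:
+
+$$
+\varepsilon(n) = \frac{\varepsilon_0}{1 + n / n_0}
+$$
+
+Here $n$ is the number of signals an item has received. At $n = 0$ the item gets the full budget $\varepsilon_0$; at $n = n_0$ it gets half; as $n$ grows the boost approaches zero and ranking is driven by real signals.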
+
+---
+
+### Milestone 4: Hybrid Search
+
+M4 merges full-text search with semantic similarity and signal-ranked results. Search and retrieval become the same system.
+
+#### Post 11: "Search and ranking are the same system"
+
+- **Type:** Technical Deep Dive / Announcement
+- **Roadmap phase:** m4p3 (SEARCH Query Executor) -- M4 complete
+- **Thesis:** `SEARCH items QUERY "jazz piano" VECTOR [embedding] FOR USER @user_42 USING PROFILE search LIMIT 20` combines BM25 text relevance, semantic vector similarity, and user personalization in one ranked list. The fusion uses Reciprocal Rank Fusion. Personalization re-ranks within the relevant set -- an irrelevant result never surfaces merely because the user likes its creator. This is one query. It replaces Elasticsearch + a vector DB + a ranking service.
+- **Source material:** m4p3 integration test, docs/research/tantivy.md, ARCHITECTURE.md (Text Search, Hybrid Fusion)
+- **When to publish:** After M4 UAT passes.
+- **Code to include:** The SEARCH query. The RRF formula. A comparison: the same query with BM25 only, ANN only, and fused. The personalization overlay changing result order for two different users.
+- **Why it matters:** Search is the most complex surface and the one engineers know best. Showing that text search, semantic search, and ranking collapse into one query is the most concrete demonstration of the 6-to-1 thesis.
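+
+For the RRF formula in this post, a minimal statement of standard Reciprocal Rank Fusion may be enough. The constant $k$ (commonly 60) and the choice of exactly two input rankings $R$ (BM25 and ANN) are assumptions here, not confirmed tidalDB parameters:
+
+$$
+\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)}
+$$
+
+With $k = 60$, an item ranked 1st by BM25 and 3rd by ANN scores $1/61 + 1/63 \approx 0.03227$, marginally above an item ranked 2nd in both lists ($2/62 \approx 0.03226$). Only ranks enter the sum, so the two score scales never need to be normalized against each other.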
+
+#### Post 12: "Tantivy as a derived index, not a source of truth"
+
+- **Type:** Architecture Decision Record
+- **Roadmap phase:** m4p1 (Tantivy Integration)
+- **Thesis:** The entity store is the source of truth. Tantivy is a materialized view. If the index is corrupted, it can be rebuilt from the entity store. Crash recovery replays from a stored sequence number. Consistency is DB-primary, not two-phase commit. This is simpler, deterministic, and the right model for an embedded database.
+- **Source material:** docs/research/tantivy.md, m4p1 task docs, ARCHITECTURE.md
+- **When to publish:** After m4p1 is complete.
+- **Code to include:** The outbox pattern. The crash recovery sequence number. The background indexer. The consistency model.
+- **Why it matters:** This is a useful architectural pattern beyond tidalDB. Engineers building systems with derived indexes will find this directly applicable.
+
+---
+
+### Milestone 5: Full Surface Coverage
+
+M5 completes all 14 use cases. The content here shifts from "how does the engine work" to "what can you build with it."
+
+#### Post 13: "14 use cases, one query engine"
+
+- **Type:** Devlog / Announcement
+- **Roadmap phase:** M5 complete
+- **Thesis:** For You feeds, trending, search, following, related content, notifications, hidden gems, controversial, live content, creator discovery, user library, cohort-scoped trending -- every surface a content platform needs, driven by the same query primitives. The application specifies profiles, filters, and context. The database executes ranking.
+- **Source material:** USE_CASES.md, M5 UAT results
+- **When to publish:** After M5 UAT passes.
+- **Code to include:** A curated selection of 4-5 queries spanning different surfaces (for_you, trending, search, hidden_gems, cohort_trending). Each with a brief setup and result.
+- **Why it matters:** This is the completeness post. It demonstrates that the database is not a toy or a prototype -- it handles the full surface area of a real content platform.
+
+#### Post 14: "Cohort-scoped trending: what is hot for people like you"
+
+- **Type:** Technical Deep Dive
+- **Roadmap phase:** M5, likely Phase 3 (Social Graph and Collaborative Filtering)
+- **Thesis:** "What's trending" means different things to different audiences. A 22-year-old in Tokyo and a 45-year-old in Texas see different trending pages -- not because of personalization (individual preference), but because different content is genuinely trending within their respective audience segments. tidalDB maintains per-cohort signal aggregation using RoaringBitmaps for O(1) membership testing and sparse fan-out for storage efficiency.
+- **Source material:** USE_CASES.md (UC-15), ARCHITECTURE.md (Cohort-scoped aggregation), API.md (Cohort Definitions)
+- **When to publish:** After cohort-scoped trending passes integration tests.
+- **Code to include:** Cohort definition. Three-layer query (global trending, cohort trending, search within cohort trending). The fan-out write path. Storage cost analysis.
+- **Why it matters:** Cohort-scoped trending is a differentiator. Most systems compute trending globally. Slicing by audience segment is a product feature that usually requires a separate analytics pipeline. tidalDB does it natively.
+
+---
+
+### Milestone 6: Production Hardening
+
+M6 is about trust. The content shifts from "what it does" to "why you can trust it."
+
+#### Post 15: "Kill it at any point. It comes back correct."
+
+- **Type:** Technical Deep Dive
+- **Roadmap phase:** m6p1 (Crash Recovery Hardening)
+- **Thesis:** We injected faults at every write-path stage. Recovery time is under 30 seconds at 1M items. WAL replay produces state identical to the pre-crash state. No phantom items, no lost signals, no inconsistent aggregates. The WAL is the source of truth. Everything else is derived state that can be rebuilt.
+- **Source material:** m6p1 test results, fault injection methodology
+- **When to publish:** After m6p1 passes.
+- **Code to include:** The crash simulation test. Recovery time measurements. The WAL checkpoint and replay sequence.
+- **Why it matters:** Trust is the precondition for adoption. Engineers will not embed a database they cannot crash-test. This post is the trust credential.
+
+#### Post 16: "Graceful degradation: less precise, never wrong"
+
+- **Type:** Architecture Decision Record
+- **Roadmap phase:** m6p2 (Graceful Degradation)
+- **Thesis:** Under 3x overload, tidalDB does not return errors. It reduces candidate set size, uses coarser aggregates, skips diversity enforcement, and serves from materialized cache -- in that order. Results are less precise but never incorrect. The degradation order is documented and configurable.
+- **Source material:** m6p2 task docs, ARCHITECTURE.md (Graceful degradation)
+- **When to publish:** After m6p2 is complete.
+- **Code to include:** The degradation cascade. Load test results at 1x, 2x, 3x. Latency distribution at each level.
+- **Why it matters:** This is how production systems should behave. Engineers who have been paged for "ranking service returned 500" will appreciate a system that degrades gracefully instead.
+
+---
+
+### Ongoing / Anytime
+
+These posts are not tied to specific milestones. They can be written whenever the insight is clear.
+
+#### "Why not SQL"
+
+- **Type:** Architecture Decision Record
+- **Thesis:** The custom query language exists because SQL cannot express ranking semantics without losing optimization opportunities. `FOR USER` means "load this user's preference vector and relationship graph." `USING PROFILE` means "apply this named scoring function." `DIVERSITY` means "enforce post-ranking constraints." These are not WHERE clauses.
+- **Source material:** thoughts.md (Part II.4), VISION.md (query examples), API.md
+- **When to publish:** Any time after M1. Best paired with M2 when the RETRIEVE query is functional.
+
+#### "Why we chose fjall over RocksDB (for now)"
+
+- **Type:** Architecture Decision Record
+- **Thesis:** Pure Rust, `#![forbid(unsafe_code)]`, fast compile times, trait-abstracted for swap. fjall is not the fastest LSM-tree. It is the right one for an embeddable database built by a small team that values correctness over raw throughput, with a trait boundary that makes the decision reversible.
+- **Source material:** thoughts.md (Part V.9), m1p3 task docs, CODING_GUIDELINES.md
+- **When to publish:** After m1p3 is complete (already shipped). This post is ready now.
+
+#### "USearch, not from scratch"
+
+- **Type:** Architecture Decision Record
+- **Thesis:** Correct, high-performance, concurrent HNSW with SIMD distance computation is 6-12 months of dedicated work. We are not a vector database company. USearch runs in ScyllaDB, ClickHouse, and DuckDB. The FFI boundary is thin. Build what differentiates you. Borrow what does not.
+- **Source material:** docs/research/ann_for_tidaldb.md, m2p1 task docs, ARCHITECTURE.md (Vector Index)
+- **When to publish:** After m2p1 (USearch integration) is complete.
+
+---
+
+## Post Cadence
+
+| Milestone | Posts | Approximate Pace |
+|-----------|-------|-----------------|
+| Pre-implementation | 1 | Publish when ready |
+| M1 (Signal Engine) | 2-3 | One per phase completion |
+| M2 (Ranked Retrieval) | 3 | One per major phase |
+| M3 (Personalized Ranking) | 2-3 | One per key insight |
+| M4 (Hybrid Search) | 2 | One per major phase |
+| M5 (Full Coverage) | 2 | At milestone boundaries |
+| M6 (Production Hardening) | 2 | At milestone boundaries |
+| Ongoing / ADRs | 2-3 | When the decision is fresh |
+
+**Target: 16-20 posts across the full roadmap.** Not more. Each one earns its place.
+
+---
+
+## What Not to Write
+
+- Progress updates that are changelogs. ("We merged 47 PRs this month.") Nobody cares.
+- Posts that announce intent without shipped code. ("We plan to build...") Ship first.
+- Posts with titles that are labels. ("Q1 Update," "Phase 3 Complete.") The title is the thesis.
+- Posts that explain what a concept is without showing why the reader should care. ("Windowed aggregation is...") Start with the problem.
+- Posts that use "we're excited to announce." You are not excited. You are precise.
+
+---
+
+## Reference: Roadmap to Post Mapping
+
+| Roadmap Phase | Post # | Title (Working) |
+|---------------|--------|-----------------|
+| Pre-implementation | 1 | Every content platform builds the same 6 systems from scratch |
+| m1p1-m1p3 (Foundation) | 3 | What three databases taught us before we wrote a line of code |
+| m1p4 (Signal Ledger) | 2 | Running decay scores are O(1) -- here is the math |
+| m1p5 (M1 Complete) | 4 | Signals wrote 100ms ago. The query sees them now. |
+| m2p3 (Ranking Profiles) | 7 | Ranking profiles are data, not code |
+| m2p4 (Diversity) | 6 | Diversity enforcement in 3 microseconds |
+| m2p5 (M2 Complete) | 5 | One query. Six systems. Under 50 milliseconds. |
+| m3p2 (Feedback Loop) | 8, 9 | The feedback loop that closes in one write / Negative signals are equal citizens |
+| m3p3 (Personalized Profiles) | 10 | Cold start without application logic |
+| m4p1 (Tantivy) | 12 | Tantivy as a derived index, not a source of truth |
+| m4p3 (M4 Complete) | 11 | Search and ranking are the same system |
+| M5 Complete | 13, 14 | 14 use cases, one query engine / Cohort-scoped trending |
+| m6p1 (Crash Recovery) | 15 | Kill it at any point. It comes back correct. |
+| m6p2 (Graceful Degradation) | 16 | Graceful degradation: less precise, never wrong |
+| Any time | -- | Why not SQL / Why fjall / USearch, not from scratch |
+
+---
+
+## Immediate Next Actions
+
+1. **Write Post 1** ("Every content platform builds the same 6 systems from scratch") -- this can be published now. It establishes the problem and the audience. It does not depend on shipped code.
+2. **Write Post 3** ("What three databases taught us") -- m1p1 through m1p3 are complete. The source material (thoughts.md) is rich. The code exists.
+3. **Prepare Post 2 outline** ("Running decay scores are O(1)") -- the research doc exists, the math is decided, but the implementation is not yet shipped (m1p4 is next). Write the outline. Wait for the benchmarks.
diff --git a/docs/personal-briefing-beachhead.md b/docs/personal-briefing-beachhead.md
new file mode 100644
index 0000000..07465ac
--- /dev/null
+++ b/docs/personal-briefing-beachhead.md
@@ -0,0 +1,259 @@
+# Use Case: Personal Briefing Feed (Knowledge Workers + Consumers)
+
+**Date:** 2026-02-21
+**Author:** @tidal-visionary
+**Status:** Proposed beachhead
+
+---
+
+## 1. One-Line Definition
+
+A daily and in-the-moment briefing feed that ranks what matters most for a person right now, adapts immediately to lightweight feedback, and explains why each item is shown.
+
+---
+
+## 2. The User (Not Developers)
+
+### Primary Persona
+
+**Information-overloaded decision maker**
+
+- Works in product, strategy, operations, media, investing, policy, or as a highly engaged consumer.
+- Consumes content to make decisions, not just to be entertained.
+- Feels overwhelmed by tabs, newsletters, feeds, podcasts, and chatbots.
+- Wants control over what appears (`more`, `less`, `hide`, `mute`) without heavy setup.
+- Expects trust signals and source quality, not clickbait recirculation.
+
+### Secondary Persona
+
+**Curious consumer with intent**
+
+- Follows multiple topics (career, health, finance, AI, hobbies).
+- Wants a short, high-value briefing instead of infinite scroll.
+- Will use the product if value appears on day 1 with minimal configuration.
+
+---
+
+## 3. User Job To Be Done
+
+### Functional Job
+
+"Help me understand what matters now in my domains without spending an hour hunting."
+
+### Emotional Job
+
+"Make me feel informed and in control, not behind and overwhelmed."
+
+### Social Job
+
+"Help me sound current and prepared in meetings and conversations."
+
+---
+
+## 4. Two-Sentence Hook
+
+Every morning, get a briefing ranked for what actually matters to you right now, with clear "why this" reasons and no endless scrolling.
+Tap `more`, `less`, or `hide` once, and the next refresh immediately adapts so your feed gets smarter in minutes, not weeks.
+
+---
+
+## 5. End-to-End User Experience
+
+### 5.1 Day-0 to Day-1
+
+1. User picks 5-10 interests, desired depth, and hard excludes.
+2. User chooses time budget (`5 min`, `10 min`, `20 min`) and preferred formats.
+3. Product delivers first `Today Brief` with 10-20 ranked items.
+4. Each card shows a short explanation:
+ - "Trending in your cohort"
+ - "Matches your finance + AI priority"
+ - "New source for exploration"
+5. User gives 3-5 feedback actions (`more`, `less`, `hide topic`, `mute source`, `save`).
+6. Feed refresh shows immediate adaptation.
+
+### 5.2 Daily Loop
+
+1. Morning brief arrives via app/email/push.
+2. User scans top cards quickly.
+3. User opens selected cards for deeper summary/source context.
+4. User gives lightweight feedback.
+5. System updates ranking immediately for next retrieval.
+6. Midday and evening updates are optional and scoped by time budget.
+
+### 5.3 Session-Aware Interaction
+
+1. User asks: "Only show items relevant to this week's strategy memo."
+2. Session context constrains ranking while preserving global user profile.
+3. Session expires or is closed; long-term profile remains stable.
+
+---
+
+## 6. Pressure Test (From the User Point of View)
+
+### 6.1 Test Method
+
+Pressure test assumes a skeptical user comparing against current habits:
+
+- Existing feeds (X, LinkedIn, YouTube, Reddit, news apps)
+- Newsletters and podcasts
+- General AI assistants
+
+The product only wins if the user can feel practical value within 1-3 sessions.
+
+### 6.2 Core User Questions and Required Answers
+
+| User Question | User Risk | Product Must Prove |
+|---|---|---|
+| "Why not just use my current feeds?" | No reason to switch | Better relevance + less noise + explicit control in one place |
+| "Will this take too much setup?" | Onboarding drop-off | First useful briefing in < 3 minutes |
+| "Can I trust this?" | Wrong or low-quality items | Clear source quality signals + transparent "why this" |
+| "Will it trap me in a bubble?" | Repetitive narrow feed | Enforced diversity + exploration budget |
+| "If I say less of this, does it actually change?" | Learned helplessness | Visible adaptation on next refresh |
+| "Does it respect my time?" | Fatigue and churn | Time-budget mode with 5-minute high-value brief |
+| "Can I safely use this for work decisions?" | Reputation risk | Freshness guarantees, quality gates, and easy source verification |
+| "Is my data private?" | Trust barrier | Explicit controls for retention, session scope, and deletion |
+
+### 6.3 Failure Modes That Kill Adoption
+
+| Failure Mode | User Reaction | Severity |
+|---|---|---|
+| Feed still noisy after 2 days | "This is another feed app." | Critical |
+| Feedback actions appear ignored | "I have no control." | Critical |
+| Explanations are generic or fake | "This is smoke and mirrors." | High |
+| One source dominates repeatedly | "Biased and boring." | High |
+| Important updates are stale | "I cannot rely on this." | High |
+| Too many notifications | "Annoying, uninstall." | Medium |
+| Onboarding asks too much | "Not worth it." | Medium |
+
+### 6.4 Pass Criteria (User-Perceived)
+
+1. User can identify at least 3 briefing items as "actually useful" in the first session.
+2. User sees obvious feed adaptation after 3 feedback actions.
+3. User can explain why each top card appeared using the provided reason labels.
+4. User does not feel forced to scroll endlessly; time-budget mode feels real.
+5. User returns on Day 2 without a reminder from support or onboarding prompts.
+
+---
+
+## 7. Why This Beachhead Fits tidalDB
+
+This use case directly exercises tidalDB primitives without requiring a broad horizontal platform story on day 1:
+
+- **Signals:** real-time positive and negative feedback.
+- **Ranking profiles:** configurable briefing logic by context and time budget.
+- **Diversity:** hard constraints to avoid source/topic over-concentration.
+- **Cohorts:** "trending for people like me" layer.
+- **Sessions:** short-lived task context (memo prep, market scan, exam prep).
+- **Closed feedback loop:** next retrieval reflects feedback immediately.
+
+This is materially different from search-first systems and generic chat assistants because ranking quality improves through continuous, explicit user control.
+
+---
+
+## 8. Product Requirements (User-First)
+
+### 8.1 Must-Have V1 Capabilities
+
+1. `Today Brief` ranking with clear reasons.
+2. Lightweight controls: `more`, `less`, `hide topic`, `mute source`, `save`.
+3. Immediate feedback reflection on next refresh.
+4. Time-budget view (`5/10/20` minute mode).
+5. Diversity constraints for source and topic spread.
+6. Baseline cohort view: "trending for people like you."
+7. Source transparency and one-tap source access.
+
+### 8.2 Should-Have V1.5 Capabilities
+
+1. Session-scoped task mode ("for this meeting only").
+2. Morning/midday/evening briefing cadence controls.
+3. Digest email + mobile app parity.
+4. Credibility filters (verified sources, quality thresholds).
+
+### 8.3 Explicit Non-Goals for Beachhead
+
+1. Building a developer platform first.
+2. Full social graph product.
+3. Monetization optimization surfaces.
+4. End-to-end enterprise admin suite in V1.
+
+---
+
+## 9. User-Facing Metrics
+
+### 9.1 Activation
+
+1. First briefing completion rate.
+2. Median time to first "useful item saved."
+3. Feedback action rate in first session.
+
+### 9.2 Retention
+
+1. D1, D7, D30 return rate.
+2. Average sessions per active day.
+3. Percentage of users using briefing at least 4 days per week.
+
+### 9.3 Quality
+
+1. "Useful item rate" per session.
+2. Repeated-unwanted-item rate after negative feedback.
+3. Diversity score by source/topic in top 10 results.
+4. Freshness score for time-sensitive domains.
+
+### 9.4 Trust
+
+1. Explanation usefulness rating.
+2. Source credibility acceptance rate.
+3. Reported "feed felt biased/repetitive" rate.
+
+---
+
+## 10. Launch Plan (Beachhead Scope)
+
+### Phase A: Concierge Pilot (20-50 users)
+
+1. Target one segment: strategy/product/analyst professionals.
+2. Run daily brief with strong manual QA on source quality and reasons.
+3. Capture explicit user interview feedback after each week.
+
+### Phase B: Productized Beta (200-500 users)
+
+1. Self-serve onboarding under 3 minutes.
+2. Reliable immediate feedback loop.
+3. Basic cohort view and time-budget mode.
+
+### Phase C: Scaled Consumer Entry
+
+1. Multi-domain templates (finance, tech, health, creator economy).
+2. Push/email cadence personalization.
+3. Quality and trust controls as default UX, not advanced settings.
+
+---
+
+## 11. Strategic Risks and Mitigations
+
+| Risk | Impact | Mitigation |
+|---|---|---|
+| Trying to solve too many surfaces at once | Slow execution, weak product feel | Ship one briefing surface first |
+| Over-personalization creates bubble | Reduced discovery trust | Enforce exploration/diversity budgets |
+| Weak source quality gates | Credibility collapse | Add quality floor and transparent sourcing |
+| Slow adaptation to user feedback | Perceived irrelevance | Prioritize immediate write-to-read reflection |
+| Too much AI summary, not enough evidence | Trust erosion | Keep source links and quote-backed rationale visible |
+
+---
+
+## 12. Kill Criteria (Be Honest Early)
+
+Stop or pivot if any of the following remain true after two iteration cycles:
+
+1. D7 retention remains below the acceptable threshold for the target segment.
+2. Users do not perceive adaptation after direct feedback actions.
+3. "Useful item rate" fails to outperform a simple baseline feed.
+4. User interviews repeatedly describe the product as "another noisy feed."
+
+---
+
+## 13. Decision
+
+This is the right forward-looking beachhead for tidalDB if the goal is knowledge workers and consumers rather than developers.
+
+It is narrow enough to ship, painful enough to matter, and aligned with tidalDB's actual architectural advantage: real-time, feedback-aware ranking with explicit user control and transparent reasons.
diff --git a/docs/planning/PRODUCT_ROADMAP.md b/docs/planning/PRODUCT_ROADMAP.md
new file mode 100644
index 0000000..7aac200
--- /dev/null
+++ b/docs/planning/PRODUCT_ROADMAP.md
@@ -0,0 +1,217 @@
+# Product Roadmap: Personal Briefing Feed
+
+**Date:** 2026-02-21
+**Owner:** Product track (knowledge workers + consumers)
+**Status:** Draft for execution
+
+---
+
+## Vision
+
+Ship a daily briefing product that helps users identify what matters now, adapt the feed instantly with lightweight feedback, and trust the results enough to return habitually.
+
+## Product Thesis
+
+People do not need another infinite feed. They need a controllable, high-signal briefing that respects time, explains relevance, and improves immediately from feedback.
+
+---
+
+## Milestone Summary
+
+| # | Name | Proves | Depends On |
+|---|------|--------|------------|
+| P0 | Beachhead Validation | Users care enough to return for a daily briefing | M0 + partial M1 |
+| P1 | Concierge Alpha | Briefing usefulness and immediate adaptation in a narrow segment | M1 + partial M2 |
+| PG1 | Personalization Core Done (Blocking Gate) | Core personalization loop is correct, immediate, and meaningfully better than baseline | P1 + M1/M2/M3 core slices |
+| P2 | Productized Beta | Self-serve onboarding, trust UX, and repeatability without manual curation | M2 + partial M3 |
+| P3 | Public Launch | Reliability, quality floor, and trust controls at public volume | M3 + M5 core + M6 partial |
+| P4 | Scale + Revenue Fit | Sustainable growth and monetization without quality collapse | M6 + M7 |
+
+---
+
+## Current Status
+
+| Phase | Status |
+|-------|--------|
+| p0: Beachhead Validation | NOT STARTED |
+| p1: Concierge Alpha | NOT STARTED |
+| pg1: Personalization Core Done gate | NOT STARTED |
+| p2: Productized Beta | NOT STARTED |
+| p3: Public Launch | NOT STARTED |
+| p4: Scale + Revenue Fit | NOT STARTED |
+
+**Current phase:** p0 (Beachhead Validation)
+
+---
+
+## Milestone P0: Beachhead Validation
+
+### Milestone Thesis
+
+Validate that a personal briefing feed solves a painful daily job for target users and creates early repeat usage.
+
+### Done When
+
+- 20-50 users recruited in target segment.
+- Daily briefing prototype used for 2+ weeks.
+- Median user gives at least one feedback action per session.
+- Interview evidence confirms value over baseline feeds.
+- D2 and D7 retention meet agreed threshold.
+
+### Phase Plan
+
+- Phase 1: Target Segment and Research Ops
+- Phase 2: Concierge Briefing Loop
+- Phase 3: Signal and Retention Evaluation
+
+---
+
+## Milestone P1: Concierge Alpha
+
+### Milestone Thesis
+
+Deliver a high-value daily `Today Brief` with transparent reasons and immediate feedback reflection for a narrow cohort.
+
+### Done When
+
+- Ranked briefing cards with source links and reason labels are live.
+- `more/less/hide/mute/save` controls are available and used.
+- Next refresh visibly reflects negative feedback.
+- Time-budget mode (`5/10/20`) is used by pilot users.
+- Weekly active usage indicates repeated utility.
+
+### Phase Plan
+
+- Phase 1: Briefing UX and Reason Labels
+- Phase 2: Feedback Controls and Real-Time Adaptation
+- Phase 3: Quality and Diversity Guardrails
+
+---
+
+## Milestone P2: Productized Beta
+
+### Milestone Thesis
+
+Turn concierge alpha into a self-serve product with stable onboarding and trust-preserving defaults.
+
+### Done When
+
+- Onboarding can be completed in under 3 minutes.
+- Cohort layer (`trending for people like you`) is available.
+- Explanations are clear and consistent per card.
+- D7 retention and useful-item rate beat baseline.
+- Manual curation is no longer required for normal operation.
+
+### Blocking Prerequisite
+
+P2 cannot start until **PG1 Personalization Core Done** passes.
+
+### Phase Plan
+
+- Phase 1: Self-Serve Onboarding and Profile Bootstrapping
+- Phase 2: Cohort and Context Views
+- Phase 3: Trust Controls and Preference Persistence
+
+---
+
+## Milestone P3: Public Launch
+
+### Milestone Thesis
+
+Launch publicly with reliability and trust controls suitable for broader consumer and knowledge-worker usage.
+
+### Done When
+
+- Reliability and latency SLOs for briefing generation are met.
+- Freshness, duplicate suppression, and source quality floor are enforced.
+- Notification cadence controls prevent fatigue.
+- Product support and incident playbook are active.
+
+### Phase Plan
+
+- Phase 1: Reliability and SLO Enforcement
+- Phase 2: Operational Quality Controls
+- Phase 3: Launch Readiness and Support Ops
+
+---
+
+## Milestone P4: Scale + Revenue Fit
+
+### Milestone Thesis
+
+Scale the product and validate monetization while preserving user trust and briefing quality.
+
+### Done When
+
+- Monetization model validated.
+- Revenue and quality metrics monitored together.
+- Retention remains stable as volume increases.
+- Next segment expansion is backed by product data.
+
+### Phase Plan
+
+- Phase 1: Monetization Experiments
+- Phase 2: Quality-Protecting Growth
+- Phase 3: Expansion Strategy
+
+---
+
+## Milestone PG1: Personalization Core Done (Blocking Gate)
+
+### Milestone Thesis
+
+Before expanding product breadth, prove the personalization loop is technically correct, immediately responsive, and materially useful to users.
+
+### Must Pass (All Required)
+
+- **Hard negatives are exact:** `hide/mute/block` items and creators never leak after write, restart, or replay.
+- **Immediate adaptation is real:** `more/less/skip/save` actions change next-refresh ranking within target latency budget.
+- **Replay correctness:** user personalization state (seen-state, preference shifts, relationship weights) rebuilds deterministically from checkpoint + WAL replay.
+- **Quality uplift vs baseline:** useful-item rate and repeated-unwanted-item rate beat a non-personalized baseline feed.
+- **Diversity with personalization:** relevance holds while source/topic domination stays within guardrails.
+
+### Why This Gate Exists
+
+Without this gate, roadmap execution drifts into breadth (more surfaces, more modes) before the core personalization promise is trustworthy.
+
+---
+
+## Product Metrics
+
+### Activation
+
+- First briefing completion rate
+- Time to first useful item saved
+- Feedback action rate in first session
+
+### Retention
+
+- D1/D7/D30 retention
+- Sessions per active user per week
+- Percentage of users with 4+ briefing days per week
+
+### Quality and Trust
+
+- Useful-item rate per briefing
+- Repeated-unwanted-item rate after negative feedback
+- Diversity score in top 10 items
+- Explanation usefulness rating
+
+---
+
+## Dependencies on Engine Track
+
+| Product Capability | Primary Engine Dependencies |
+|--------------------|-----------------------------|
+| Real-time adaptation from feedback | M1 Signal Engine, M3 Feedback Loop |
+| Cohort briefing layer | M3 User model, M6 cohort surface support |
+| Search-within-briefing | M5 Hybrid Search |
+| Public reliability | M7 Production Hardening |
+
+---
+
+## Planning References
+
+- Use case: `docs/personal-briefing-beachhead.md`
+- Engine roadmap: `docs/planning/ROADMAP.md`
+- Product milestone execution: `docs/planning/milestone-p/`
diff --git a/docs/planning/ROADMAP.md b/docs/planning/ROADMAP.md
index ab3282f..e3eeef4 100644
--- a/docs/planning/ROADMAP.md
+++ b/docs/planning/ROADMAP.md
@@ -14,12 +14,30 @@ A single embeddable database can replace the 6-system content ranking stack by t
| # | Name | Proves | Enables |
|---|------|--------|---------|
+| M0 | Embeddable Runtime | tidalDB can run in-process with zero-config defaults and tooling | Cuts proof-of-concept friction, enables internal dogfooding |
| M1 | Signal Engine | Signals are a database primitive with O(1) decay, not application math | UC-03 (partial), UC-06 (partial), UC-14 (partial) |
| M2 | Ranked Retrieval | A single query retrieves, scores, and ranks content using live signals | UC-03, UC-04, UC-06, UC-08, UC-13, UC-14 |
| M3 | Personalized Ranking | User context shapes retrieval and ranking -- the "For You" query works | UC-01, UC-05, UC-07, UC-09 (partial) |
-| M4 | Hybrid Search | Text + semantic + signal-ranked search in one query | UC-02, UC-10, UC-11 |
-| M5 | Full Surface Coverage | Every use case, every sort mode, every filter, every feedback loop | UC-01 through UC-14 complete |
-| M6 | Production Hardening | Crash safety, graceful degradation, operational readiness | All UCs at production quality |
+| M4 | Agent Memory | Agents can create sessions, write signals, and enforce policy inside tidalDB | Agent-mediated personalization, RLHF loops, conversational memory |
+| M5 | Hybrid Search | Text + semantic + signal-ranked search in one query | UC-02, UC-10, UC-11 |
+| M6 | Full Surface Coverage | Every use case, every sort mode, every filter, every feedback loop | UC-01 through UC-14 complete |
+| M7 | Production Hardening | Crash safety, graceful degradation, operational readiness | All UCs at production quality |
+
+### Product Milestone Summary (New)
+
+The roadmap now has two tracks:
+
+- **Engine Track (M0-M7):** proves tidalDB capabilities.
+- **Product Track (P0-P4):** proves end-user value for the beachhead product.
+
+| # | Name | Proves | Depends On |
+|---|------|--------|------------|
+| P0 | Beachhead Validation | Knowledge workers and consumers care about a personal briefing feed enough to use it repeatedly | M0 (embedding/runtime), partial M1 |
+| P1 | Concierge Alpha | Daily "Today Brief" with explicit feedback controls creates Day-2 retention in a small cohort | M1 complete, partial M2 |
+| PG1 | Personalization Core Done (Blocking Gate) | Personalization loop is correct, immediate, and measurably better than baseline | P1 + M1/M2/M3 core slices |
+| P2 | Productized Beta | Self-serve onboarding + real-time adaptation + explanation UX works without manual curation | M2 complete, partial M3 |
+| P3 | Public Launch | The product is reliable, useful, and trusted at real user volume | M3 + M5 core, M6 partial |
+| P4 | Scale + Revenue Fit | Sustainable retention and monetization without quality collapse | M6 + M7 |
---
@@ -27,13 +45,22 @@ A single embeddable database can replace the 6-system content ranking stack by t
| Phase | Status | Tests |
|-------|--------|-------|
+| **m0p1: Embeddable Runtime Skeleton** | COMPLETE | 329 passing (293 unit + 36 integration + 3 doc) |
+| **m0p2: Tooling & Diagnostics** | COMPLETE | 349 passing (+7 metrics unit + 7 metrics integration + 9 tidalctl CLI) |
+| m0p3: Samples & Docs | NOT STARTED | -- |
| **m1p1: Core Type System and Schema** | COMPLETE | 77 passing |
| **m1p2: Write-Ahead Log** | COMPLETE | passing (unit + integration) |
| **m1p3: Storage Engine Trait and fjall Backend** | COMPLETE | 140 passing (128 unit + 12 integration) |
| m1p4: Signal Ledger | NOT STARTED | -- |
| m1p5: Entity CRUD and Signal Write API | NOT STARTED | -- |
+| P0: Beachhead Validation | NOT STARTED | -- |
+| P1: Concierge Alpha | NOT STARTED | -- |
+| PG1: Personalization Core Done gate | NOT STARTED | -- |
+| P2: Productized Beta | NOT STARTED | -- |
+| P3: Public Launch | NOT STARTED | -- |
+| P4: Scale + Revenue Fit | NOT STARTED | -- |
-**Current phase:** m1p4 (Signal Ledger) is next. m1p2 and m1p3 are complete, unblocking m1p4.
+**Current phase:** m0p2 (Tooling & Diagnostics) or m1p4 (Signal Ledger) — m0p1 unblocks m0p2; m1p2 and m1p3 unblock m1p4.
**Lessons learned:**
- m1p3 keyspaces are organized per `EntityKind` ("items", "users", "creators"), not by data category. The `Tag` enum in key encoding provides the data-category namespace within each entity-kind keyspace.
@@ -42,6 +69,146 @@ A single embeddable database can replace the 6-system content ranking stack by t
---
+## Product Track: Personal Briefing Feed (Knowledge Workers + Consumers)
+
+This track defines the milestones for the **actual product experience** (not only the database engine).
+Use case reference: `docs/personal-briefing-beachhead.md`.
+Dedicated roadmap: `docs/planning/PRODUCT_ROADMAP.md`.
+
+### P0: Beachhead Validation -- "Do users care enough to return?"
+
+**Milestone Thesis**
+
+Validate that a personal briefing feed solves a painful daily job for users and drives repeat use.
+
+**Acceptance Criteria**
+- [ ] Recruit 20-50 target users (knowledge workers + high-intent consumers).
+- [ ] Run daily briefing prototype (can include manual source QA).
+- [ ] At least one meaningful feedback action per session for the median user (`more`, `less`, `hide`, `mute`, `save`).
+- [ ] User interviews confirm value vs baseline feeds ("less noise", "more useful", "saves time").
+- [ ] D2 retention reaches agreed threshold for target segment.
+
+### P1: Concierge Alpha -- "High-value daily brief for a narrow cohort"
+
+**Milestone Thesis**
+
+Deliver a reliable daily `Today Brief` experience with immediate visible adaptation after user feedback.
+
+**Acceptance Criteria**
+- [ ] App surface: ranked brief, reason labels, source links, save/feedback controls.
+- [ ] Feedback loop: next refresh reflects `less/hide/mute` actions immediately.
+- [ ] Time-budget mode (`5/10/20` min) is available and used.
+- [ ] Diversity constraints prevent source/topic domination in top results.
+- [ ] Weekly active usage demonstrates repeated utility.
+
+### P2: Productized Beta -- "Self-serve and repeatable without handholding"
+
+**Milestone Thesis**
+
+Turn the alpha into a self-serve product with stable onboarding, trust UX, and measurable quality.
+
+**Acceptance Criteria**
+- [ ] Self-serve onboarding completed in under 3 minutes.
+- [ ] "Why this" explanations are present and understandable on every briefing card.
+- [ ] Cohort layer available ("trending for people like you").
+- [ ] Trust controls available (source transparency, mute/hide persistence).
+- [ ] D7 retention and "useful item rate" exceed baseline comparison feed.
+- [ ] **PG1 Personalization Core Done gate has passed.**
+
+### P3: Public Launch -- "Trusted at real volume"
+
+**Milestone Thesis**
+
+Launch publicly with reliability, quality, and trust guardrails suitable for broad use.
+
+**Acceptance Criteria**
+- [ ] Reliability and latency SLOs defined and met for briefing generation.
+- [ ] Quality floor enforced (freshness, source quality, duplicate suppression).
+- [ ] Notification cadence controls prevent spam.
+- [ ] Core support and incident process in place for user-facing regressions.
+
+### P4: Scale + Revenue Fit -- "Sustainable business without degrading quality"
+
+**Milestone Thesis**
+
+Prove the product can grow and monetize while preserving user trust and briefing quality.
+
+**Acceptance Criteria**
+- [ ] Monetization model validated (subscription, team plan, or equivalent).
+- [ ] Revenue metrics tracked alongside quality metrics (no quality-revenue trade-off regressions).
+- [ ] Retention and engagement remain stable as volume increases.
+- [ ] Product roadmap for next segment expansion is data-backed.
+
+### PG1: Personalization Core Done (Blocking Gate)
+
+**Milestone Thesis**
+
+Before product breadth expansion, the core personalization loop must be provably correct and immediately responsive.
+
+**Acceptance Criteria**
+- [ ] Hard negatives (`hide/mute/block`) never leak after write, restart, or replay.
+- [ ] Explicit feedback (`more/less/skip/save`) changes next-refresh ranking within target latency.
+- [ ] User personalization state rebuilds deterministically from checkpoint + WAL replay.
+- [ ] Useful-item rate and repeated-unwanted-item rate outperform a non-personalized baseline.
+- [ ] Diversity guardrails hold while maintaining personalization quality.
+
+---
+
+## Milestone 0: Embeddable Runtime -- "Runs in your process in minutes"
+
+### Milestone Thesis
+
+Before we prove any ranking math, developers must be able to embed tidalDB inside an existing service with zero operational prep. M0 delivers the runtime glue — an ergonomic builder API, deterministic storage layout, a tiny admin CLI, and living examples — so the very first experience is `cargo add tidaldb`, `TidalDb::builder().in_memory().open()`, and a passing smoke test.
+
+### Phases
+
+#### Phase 1: Embeddable Runtime Skeleton
+
+**Delivers:** A cohesive `Config`/`Builder` API for single-process use, with in-memory and filesystem-backed defaults, sandboxed data directories, and graceful shutdown hooks developers can call from tests or application drop handlers.
+
+- Builder exposes `ephemeral()` / `single_process()` shortcuts and eagerly validates directories.
+- Shutdown hooks drain WAL writer threads and surface errors.
+- Temp-directory helper guarantees deterministic cleanup (used in doctests).
+
+#### Phase 2: Tooling & Diagnostics
+
+**Delivers:** `tidalctl` (a minimal CLI) for inspecting embedded instances, plus a lightweight metrics surface (Prometheus text or JSON) tagged with the same IDs future distributed deployments will use.
+
+- `tidalctl status --path` returns JSON with WAL seq, config hash, uptime.
+- Optional metrics endpoint (disabled by default) exposes `/metrics` and `/healthz`.
+- Tooling reuses the same path helpers from Phase 1.
+
+#### Phase 3: Samples & Docs
+
+**Delivers:** Quick-start samples (For You POC + integration tests) compiled as doctests, and reference snippets for embedding tidalDB inside Axum/Actix or a CLI app. Keeps DX in lockstep with the runtime.
+
+- Quickstart example + doctest run under CI (`cargo test --doc --examples`).
+- Axum/Actix embedding examples include graceful shutdown + metrics wiring.
+- CONTRIBUTING updated with “run samples” checklist.
+### UAT Scenario
+
+```
+Given:
+ // in tests/lib.rs
+ let db = TidalDb::builder()
+ .ephemeral()
+ .with_temp_dir()
+ .open()
+ .unwrap();
+
+When:
+ db.health_check(); // ok
+ tidalctl status --path // prints WAL, storage, signal counts
+ cargo test --doc // quick-start snippet compiles & runs
+
+Then:
+ - Builder defaults require zero manual config
+ - CLI connects to the same files used by the embedded process
+ - Samples stay in sync (failing doctest fails CI)
+```
+
+---
+
## Milestone 1: Signal Engine -- "Signals are a database primitive"
### Milestone Thesis
@@ -586,13 +753,13 @@ Then:
### Deferred to Later Milestones
-- **SEARCH query with personalization** -- deferred to M4; M3 proves personalized RETRIEVE
-- **Tantivy integration** -- deferred to M4
-- **People/creator search (UC-10)** -- deferred to M4
-- **Social graph traversal for trending ("trending among my follows")** -- deferred to M5; requires graph query capabilities beyond simple follows filter
-- **Collaborative filtering** -- basic co-engagement signals used in `related` profile; full matrix-factorization-style CF deferred to M5
-- **User-created collections/boards (UC-09.4)** -- deferred to M5
-- **Live content status tracking (UC-12)** -- deferred to M5
+- **SEARCH query with personalization** -- deferred to M5; M3 proves personalized RETRIEVE
+- **Tantivy integration** -- deferred to M5
+- **People/creator search (UC-10)** -- deferred to M5
+- **Social graph traversal for trending ("trending among my follows")** -- deferred to M6; requires graph query capabilities beyond simple follows filter
+- **Collaborative filtering** -- basic co-engagement signals used in `related` profile; full matrix-factorization-style CF deferred to M6
+- **User-created collections/boards (UC-09.4)** -- deferred to M6
+- **Live content status tracking (UC-12)** -- deferred to M6
### Integration Test
@@ -635,7 +802,54 @@ The full "For You" query works: `RETRIEVE items FOR USER @user_id USING PROFILE
---
-## Milestone 4: Hybrid Search -- "Text + semantic + signals in one query"
+## Milestone 4: Agent Memory -- "Agents own the personalization substrate"
+
+### Milestone Thesis
+
+Agents mediate the user interaction: they ground LLM responses, collect preferences, and emit feedback. This milestone proves a developer can embed tidalDB alongside an agent runtime, create sessions, append structured feedback signals (reward, tool usage, critiques), enforce per-agent policy, and query session memory in milliseconds.
+
+### Phases
+
+#### Phase 1: Session Schema & Lifecycle
+
+**Delivers:** `SessionId`, `AgentId`, and `AgentPolicy` types in schema plus builder flags (`with_sessions(true)`). APIs to `start_session`, `append_session_metadata`, `close_session`. WAL entries tagged with agent metadata and CLI output listing active sessions.
+
+#### Phase 2: Session Materializers & Short-Lived Aggregates
+
+**Delivers:** `SessionMaterializer` (minute-scale decay buckets for reward/pref hints, tool usage counters) registered via the existing materializer trait. Query APIs `session_view(session_id)` and `session_velocity(session_id, signal_type)` with <5µs read latency. Integration tests proving hot path throughput at 50k updates/sec.
+
+#### Phase 3: Policy & Safety Layer
+
+**Delivers:** Declarative schema-bound policies (allowed signal types, max QPS, storage TTL). Enforcement in the signal write path (reject or queue). Audit log per agent (accessible via CLI/metrics) plus rate-limiters to isolate noisy agents.
+
+#### Phase 4: Agent-Facing APIs & Explanations
+
+**Delivers:** `retrieve_for_session` / `search_for_session` endpoints returning ranked items plus a `session_snapshot` (top signals, reasons, reward velocity). Agent-friendly error codes, documentation, and samples (user → agent → tidalDB). Session data plumbed into ranking profiles via new `SessionContext`.
+
+### UAT Scenario
+
+```
+Given:
+ An agent opens session S for user @u123 with metadata {tool:"planner"}
+ Policy allows signals preference_hint and reward; forbids raw_log
+
+When:
+ 1. Agent writes preference_hint ("more jazz today")
+ 2. Agent writes reward(+0.8) after delivering an answer
+ 3. Agent executes RETRIEVE ... FOR USER @u123 FOR SESSION @S USING PROFILE for_you LIMIT 10
+ 4. Agent receives ranked items and session_snapshot (reward_velocity, last_tool)
+ 5. Agent attempts to write raw_log → rejected with policy violation
+ 6. Session closes; CLI shows duration, writes, rejections
+
+Then:
+ - Session aggregates reflect preference/reward immediately
+ - Policy enforcement blocks disallowed write with audit trail
+ - After closure, querying session S returns archived snapshot with final signals
+```
+
+---
+
+## Milestone 5: Hybrid Search -- "Text + semantic + signals in one query"
### Milestone Thesis
@@ -762,7 +976,7 @@ A developer can execute SEARCH queries that combine full-text BM25 relevance wit
---
-## Milestone 5: Full Surface Coverage -- "Every use case works"
+## Milestone 6: Full Surface Coverage -- "Every use case works"
### Milestone Thesis
@@ -851,8 +1065,8 @@ Then:
### Deferred to Later Milestones
- **Signal rollups (hourly/daily materialization)** -- built if 100K-item benchmarks show bucketed counters exceeding the latency budget for 30d+ windows
-- **Multi-vector user interest clustering (PinnerSage)** -- deferred to M6 or beyond; single preference vector serves through M5
-- **ACORN-1 two-hop expansion for very selective filters** -- deferred to M6; USearch predicate callback sufficient through M5
+- **Multi-vector user interest clustering (PinnerSage)** -- deferred to M7 or beyond; single preference vector serves through M6
+- **ACORN-1 two-hop expansion for very selective filters** -- deferred to M7; USearch predicate callback sufficient through M6
### Done When
@@ -860,7 +1074,7 @@ All 14 use cases pass their UAT scenarios as defined in USE_CASES.md. All 25+ so
---
-## Milestone 6: Production Hardening -- "Ready for real workloads"
+## Milestone 7: Production Hardening -- "Ready for real workloads"
### Milestone Thesis
@@ -899,13 +1113,13 @@ Then:
### Phases
-(Phases for M6 are provisional -- detailed decomposition happens after M5 ships.)
+(Phases for M7 are provisional -- detailed decomposition happens after M6 ships.)
#### Phase 1: Crash Recovery Hardening
**Delivers:** Comprehensive crash recovery testing and hardening. Fault injection at every write-path stage. Recovery time targets. WAL compaction and checkpoint optimization.
-**Depends On:** M5 complete
+**Depends On:** M6 complete
**Complexity:** XL
#### Phase 2: Graceful Degradation Under Load
@@ -929,7 +1143,7 @@ Then:
**Depends On:** Phase 1
**Complexity:** M
-### Deferred (Post-M6 / Future)
+### Deferred (Post-M7 / Future)
- **Horizontal distribution** -- the single-node architecture scales vertically first; distribution is a separate product decision
- **Multi-tenancy** -- per-tenant isolation within a single tidalDB instance
@@ -946,22 +1160,22 @@ tidalDB operates correctly at 1M items under sustained concurrent read/write loa
## Use Case Coverage Progression
-| UC | Description | M1 | M2 | M3 | M4 | M5 | M6 |
-|----|-------------|----|----|----|----|----|----|
-| UC-01 | For You Feed | - | - | **Full** | Full | Full | Full |
-| UC-02 | Search | - | - | - | **Core** | **Full** | Full |
-| UC-03 | Trending/Rising | Signals | **Full** | Full | Full | Full | Full |
-| UC-04 | Following Feed | - | Partial | **Full** | Full | Full | Full |
-| UC-05 | Related/Up Next | - | - | **Core** | Core | **Full** | Full |
-| UC-06 | Browse/Category | Signals | **Core** | Core | Core | **Full** | Full |
-| UC-07 | Notifications | - | - | **Core** | Core | **Full** | Full |
-| UC-08 | Creator Profile | - | **Core** | Core | Core | **Full** | Full |
-| UC-09 | User Library | - | - | Partial | Partial | **Full** | Full |
-| UC-10 | People Search | - | - | - | **Core** | **Full** | Full |
-| UC-11 | Visual/Semantic | - | - | - | Partial | **Full** | Full |
-| UC-12 | Live Content | - | - | - | - | **Full** | Full |
-| UC-13 | Hidden Gems | - | **Full** | Full | Full | Full | Full |
-| UC-14 | Controversial/Hot | Signals | **Full** | Full | Full | Full | Full |
+| UC | Description | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
+|----|-------------|----|----|----|----|----|----|----|
+| UC-01 | For You Feed | - | - | **Full** | Full | Full | Full | Full |
+| UC-02 | Search | - | - | - | - | **Core** | **Full** | Full |
+| UC-03 | Trending/Rising | Signals | **Full** | Full | Full | Full | Full | Full |
+| UC-04 | Following Feed | - | Partial | **Full** | Full | Full | Full | Full |
+| UC-05 | Related/Up Next | - | - | **Core** | Core | Core | **Full** | Full |
+| UC-06 | Browse/Category | Signals | **Core** | Core | Core | Core | **Full** | Full |
+| UC-07 | Notifications | - | - | **Core** | Core | Core | **Full** | Full |
+| UC-08 | Creator Profile | - | **Core** | Core | Core | Core | **Full** | Full |
+| UC-09 | User Library | - | - | Partial | Partial | Partial | **Full** | Full |
+| UC-10 | People Search | - | - | - | - | **Core** | **Full** | Full |
+| UC-11 | Visual/Semantic | - | - | - | - | Partial | **Full** | Full |
+| UC-12 | Live Content | - | - | - | - | - | **Full** | Full |
+| UC-13 | Hidden Gems | - | **Full** | Full | Full | Full | Full | Full |
+| UC-14 | Controversial/Hot | Signals | **Full** | Full | Full | Full | Full | Full |
Legend:
- `-` = Not addressed
diff --git a/docs/planning/milestone-0/phase-1/OVERVIEW.md b/docs/planning/milestone-0/phase-1/OVERVIEW.md
new file mode 100644
index 0000000..1a292a0
--- /dev/null
+++ b/docs/planning/milestone-0/phase-1/OVERVIEW.md
@@ -0,0 +1,15 @@
+# Milestone 0 · Phase 1 — Embeddable Runtime Skeleton
+
+**Objective:** ship an ergonomic, zero-config builder so engineers can spin up a tidalDB instance inside tests or services without touching the signal stack yet.
+
+**Success criteria**
+- `TidalDb::builder()` exposes `ephemeral()` and `single_process(data_dir)` shortcuts with sensible defaults.
+- `Config` validates eagerly (missing dirs, bad permissions) so failures happen before WAL threads spin up.
+- Builder registers shutdown hooks (`Drop` + explicit `close()`), returning errors if background workers fail to drain.
+- Temp-directory helper guarantees deterministic cleanup (used by doctests + integration tests).
+
+**Dependencies:** none — runs before any signal-specific work.
+
+**Blocked by:** n/a
+
+**Unblocks:** M0 Phase 2 (CLI needs stable layout), all later milestones (test harnesses use the builder).
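The shutdown criterion above (explicit `close()` that reports drain failures, with `Drop` as a fallback) can be sketched as follows. This is a minimal illustration under stated assumptions, not the tidalDB implementation: `DemoDb`, its single worker thread, and the `String` error type are hypothetical stand-ins.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical handle that owns one background worker (a stand-in for WAL threads).
pub struct DemoDb {
    stop: Option<Sender<()>>,
    worker: Option<JoinHandle<Result<(), String>>>,
}

impl DemoDb {
    pub fn open() -> Self {
        let (stop, rx) = channel::<()>();
        let worker = thread::spawn(move || {
            // Block until the sender is dropped or a message arrives, then "drain".
            let _ = rx.recv();
            Ok(())
        });
        DemoDb { stop: Some(stop), worker: Some(worker) }
    }

    /// Shared by `close()` and `Drop`; safe to call more than once.
    fn shutdown(&mut self) -> Result<(), String> {
        // Dropping the sender closes the channel, which wakes the worker.
        drop(self.stop.take());
        match self.worker.take() {
            Some(handle) => handle.join().map_err(|_| "worker panicked".to_string())?,
            None => Ok(()), // already shut down
        }
    }

    /// Explicit close: the caller sees whether the worker drained cleanly.
    pub fn close(mut self) -> Result<(), String> {
        self.shutdown()
    }
}

impl Drop for DemoDb {
    fn drop(&mut self) {
        // Drop cannot return an error, so this path is best-effort only.
        let _ = self.shutdown();
    }
}
```

Routing both paths through one idempotent `shutdown()` means a value that was already closed does nothing when it is later dropped.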
diff --git a/docs/planning/milestone-0/phase-1/task-01-builder-and-config.md b/docs/planning/milestone-0/phase-1/task-01-builder-and-config.md
new file mode 100644
index 0000000..b1c2c99
--- /dev/null
+++ b/docs/planning/milestone-0/phase-1/task-01-builder-and-config.md
@@ -0,0 +1,13 @@
+# Task 01 — Builder + Config API
+
+**Goal:** expose a fluent API that hides all the knobs required to open a single-process tidalDB instance.
+
+## Deliverables
+- `TidalDbBuilder` with methods: `ephemeral()`, `with_data_dir(Path)`, `cache_dir(Path)`, `wal_dir(Path)`, `validate()` and `open()`.
+- `Config` struct that can be serialized (for CLI) and implements `Default` tuned for embeddable use (single WAL segment, no background threads yet).
+- Unit tests proving builder errors when directories are missing/unwritable.
+
+## Acceptance Criteria
+- `cargo test builder` passes on macOS + Linux runners.
+- Builder docs contain runnable snippet used in Phase 3 doctests.
+- No public API exposes fjall/USearch internals; consumers only see `Config` + `TidalDbBuilder`.
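A rough sketch of the eager-validation behaviour this task describes, assuming a builder that checks its directory before anything is opened. `DemoBuilder` and `DemoConfig` are hypothetical names used only for illustration; the real `TidalDbBuilder` surface may differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the planned `Config`; the field is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct DemoConfig {
    /// `None` means ephemeral (nothing on disk).
    pub data_dir: Option<PathBuf>,
}

/// Hypothetical fluent builder mirroring the deliverable's shape.
#[derive(Debug, Default)]
pub struct DemoBuilder {
    data_dir: Option<PathBuf>,
}

impl DemoBuilder {
    pub fn ephemeral(mut self) -> Self {
        self.data_dir = None;
        self
    }

    pub fn with_data_dir(mut self, dir: impl AsRef<Path>) -> Self {
        self.data_dir = Some(dir.as_ref().to_path_buf());
        self
    }

    /// Fails fast on a missing, non-directory, or read-only path,
    /// before any background work would start.
    pub fn validate(&self) -> io::Result<DemoConfig> {
        if let Some(dir) = &self.data_dir {
            let meta = fs::metadata(dir)?; // NotFound if the directory is missing
            if !meta.is_dir() {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidInput,
                    "data_dir is not a directory",
                ));
            }
            if meta.permissions().readonly() {
                return Err(io::Error::new(
                    io::ErrorKind::PermissionDenied,
                    "data_dir is read-only",
                ));
            }
        }
        Ok(DemoConfig { data_dir: self.data_dir.clone() })
    }
}
```

The read-only check here is a coarse approximation; a stricter version could attempt to create and remove a probe file.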
diff --git a/docs/planning/milestone-0/phase-1/task-02-sandboxed-storage.md b/docs/planning/milestone-0/phase-1/task-02-sandboxed-storage.md
new file mode 100644
index 0000000..88e113a
--- /dev/null
+++ b/docs/planning/milestone-0/phase-1/task-02-sandboxed-storage.md
@@ -0,0 +1,13 @@
+# Task 02 — Sandboxed Storage Layout
+
+**Goal:** deterministic filesystem layout for embedded instances so tooling/cleanup is reliable.
+
+## Deliverables
+- `Paths` helper that derives `{base}/wal`, `{base}/items`, `{base}/users`, etc., and ensures directories exist with correct perms.
+- Temp-dir helper (`TempTidalHome`) for tests; implements Drop to delete directories unless `preserve=true`.
+- Documentation table describing folder purpose for future CLI use.
+
+## Acceptance Criteria
+- Integration test proves two builders pointing to different `TempTidalHome` roots never collide.
+- Cleanup confirmed via test that drops the helper and asserts directories removed.
+- Paths helper reused by upcoming CLI (Phase 2) — single source of truth.
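The layout and cleanup behaviour above can be sketched with the standard library alone. This is an illustrative sketch, not the shipped helper: the `Paths` and `TempTidalHome` names come from the task, but every method, the `tidal-home-<pid>-<n>` naming scheme, and the subset of directories shown are assumptions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter so each temp home gets a distinct directory name.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Derives the fixed subdirectories from a single base directory.
pub struct Paths {
    pub base: PathBuf,
}

impl Paths {
    pub fn new(base: impl AsRef<Path>) -> Self {
        Paths { base: base.as_ref().to_path_buf() }
    }
    pub fn wal(&self) -> PathBuf { self.base.join("wal") }
    pub fn items(&self) -> PathBuf { self.base.join("items") }
    pub fn users(&self) -> PathBuf { self.base.join("users") }

    /// Creates every directory in the layout if it does not exist yet.
    pub fn ensure(&self) -> io::Result<()> {
        for dir in [self.wal(), self.items(), self.users()] {
            fs::create_dir_all(dir)?;
        }
        Ok(())
    }
}

/// Test helper: a unique home under the system temp dir, removed on drop.
pub struct TempTidalHome {
    paths: Paths,
    preserve: bool,
}

impl TempTidalHome {
    pub fn new(preserve: bool) -> io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let name = format!("tidal-home-{}-{}", std::process::id(), id);
        let paths = Paths::new(std::env::temp_dir().join(name));
        paths.ensure()?;
        Ok(TempTidalHome { paths, preserve })
    }

    pub fn paths(&self) -> &Paths {
        &self.paths
    }
}

impl Drop for TempTidalHome {
    fn drop(&mut self) {
        if !self.preserve {
            // Best-effort cleanup; errors are ignored because Drop cannot fail.
            let _ = fs::remove_dir_all(&self.paths.base);
        }
    }
}
```

Combining the process ID with a counter keeps two homes created in the same test binary from colliding, which is the property the first acceptance criterion asks for.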
diff --git a/docs/planning/milestone-0/phase-2/OVERVIEW.md b/docs/planning/milestone-0/phase-2/OVERVIEW.md
new file mode 100644
index 0000000..2d2a29c
--- /dev/null
+++ b/docs/planning/milestone-0/phase-2/OVERVIEW.md
@@ -0,0 +1,12 @@
+# Milestone 0 · Phase 2 — Tooling & Diagnostics
+
+**Objective:** give developers minimal introspection tooling that works even when tidalDB is embedded. This phase adds `tidalctl` plus a metrics endpoint so later milestones can reuse the same plumbing.
+
+**Success criteria**
+- `tidalctl status --path ` prints build info, WAL seq, open segments, and builder config snapshot.
+- Metrics exporter (text or JSON) exposes uptime, WAL queue depth, and build hash; tag everything with `partition_id=0` to future-proof multi-node rollouts.
+- Tooling uses the same `Paths` helper from Phase 1 — no duplicated layout logic.
+
+**Dependencies:** Phase 1 (stable directories + config serialization).
+
+**Unblocks:** future performance debugging + automated tests that assert health via CLI before/after workloads.
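As a sketch of the text exporter described above, the function below renders a few gauges in Prometheus text exposition format and tags each one with `partition_id="0"`. The metric names, the `render_metrics` signature, and the choice of gauges are assumptions made for illustration; only the label convention comes from the success criteria.

```rust
use std::fmt::Write as _;

/// Escapes a label value per the Prometheus text format (backslash, quote, newline).
fn escape_label(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Appends one gauge (HELP, TYPE, and a single sample line) to `out`.
fn write_gauge(out: &mut String, name: &str, help: &str, labels: &[(&str, &str)], value: f64) {
    let label_text = labels
        .iter()
        .map(|(k, v)| format!("{}=\"{}\"", k, escape_label(v)))
        .collect::<Vec<_>>()
        .join(",");
    // Writing to a String cannot fail, so the results are ignored.
    let _ = writeln!(out, "# HELP {} {}", name, help);
    let _ = writeln!(out, "# TYPE {} gauge", name);
    let _ = writeln!(out, "{}{{{}}} {}", name, label_text, value);
}

/// Renders the whole metrics page; every sample carries the partition label.
pub fn render_metrics(uptime_seconds: f64, healthy: bool, version: &str, build_hash: &str) -> String {
    let partition = ("partition_id", "0");
    let mut out = String::new();
    write_gauge(
        &mut out,
        "tidaldb_uptime_seconds",
        "Seconds since database opened.",
        &[partition],
        uptime_seconds,
    );
    write_gauge(
        &mut out,
        "tidaldb_health_ok",
        "Whether the database is healthy. 1 = ok, 0 = degraded.",
        &[partition],
        if healthy { 1.0 } else { 0.0 },
    );
    write_gauge(
        &mut out,
        "tidaldb_info",
        "Build and version information.",
        &[("version", version), ("build_hash", build_hash), partition],
        1.0,
    );
    out
}
```

Keeping rendering a pure function of its inputs lets tests assert on the text without starting an HTTP server.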
diff --git a/docs/planning/milestone-0/phase-2/SCOPING.md b/docs/planning/milestone-0/phase-2/SCOPING.md
new file mode 100644
index 0000000..010c489
--- /dev/null
+++ b/docs/planning/milestone-0/phase-2/SCOPING.md
@@ -0,0 +1,443 @@
+# Milestone 0, Phase 2: Tooling & Diagnostics -- Scoping Decisions
+
+**Date:** 2026-02-20
+**Author:** @tidal-visionary (Spencer Kimball)
+**Status:** APPROVED -- ready for implementation
+
+---
+
+## Context
+
+m0p1 (Embeddable Runtime Skeleton) is complete. m1p1-p3 (Type System, WAL, Storage Engine) are also complete. The codebase has:
+
+- `TidalDb` as a thin handle: holds `Config`, has `health_check()`, `close()`, `Drop`
+- A full WAL implementation (`WalHandle`, `SegmentWriter`, `CheckpointManager`) that writes segment files (`wal-{seq:020}.seg`) and checkpoint metadata (`checkpoint.meta`) to disk
+- No `db.signal()` yet in the public API (deferred to m1p5)
+- No WAL writes from the `TidalDb` public API -- the WAL is implemented but not wired to the `TidalDb` facade
+- `Config` has no serde derive -- it is a plain struct with no serialization
+- Single crate `tidal/`, no workspace
+
+The task documents in `phase-2/` were written before m1p2 and m1p3 shipped. They assumed WAL writes would be accessible from the public API. They are not. This scoping document corrects the task definitions to match reality.
+
+---
+
+## 1. tidalctl Scope at M0
+
+### What tidalctl Can Do
+
+tidalctl is a **cold inspector**. There is no live process to connect to. The CLI reads files from disk and reports what it finds. This is the correct model for an embeddable database -- there is no server process listening on a port. The inspector reads the same files the embedded library writes.
+
+### Commands
+
+#### `tidalctl status --path `
+
+Reads the tidalDB home directory and prints a JSON report:
+
+```json
+{
+ "version": "0.1.0",
+ "build_hash": "29400d4",
+ "status": "ok",
+ "storage_mode": "persistent",
+ "wal": {
+ "segments": 3,
+ "first_seq": 1,
+ "last_segment_seq": 201,
+ "checkpoint_seq": 150,
+ "checkpoint_ts": "2026-02-20T14:30:00Z",
+ "wal_dir_bytes": 49152
+ },
+ "dirs": {
+ "base": "/var/lib/tidaldb",
+ "wal": "/var/lib/tidaldb/wal",
+ "items": "/var/lib/tidaldb/items",
+ "users": "/var/lib/tidaldb/users",
+ "creators": "/var/lib/tidaldb/creators",
+ "cache": "/var/lib/tidaldb/cache"
+ }
+}
+```
+
+**How each field is computed:**
+
+| Field | Source | Notes |
+|-------|--------|-------|
+| `version` | Compiled into binary via `env!("CARGO_PKG_VERSION")` | Always available |
+| `build_hash` | Compiled via `option_env!("GIT_HASH")` or build script | Falls back to `"unknown"` |
+| `status` | `"ok"` if dir exists, has wal subdir, and at least one segment | `"empty"` if no WAL segments, `"error"` if dir missing |
+| `storage_mode` | Inferred: if WAL dir exists with segments, `"persistent"` | No way to know `ephemeral` from disk -- ephemeral leaves no trace |
+| `wal.segments` | `segment::list_segments(&wal_dir)?.len()` | Already implemented in `tidal/src/wal/segment.rs` |
+| `wal.first_seq` | First element of `list_segments()` result | 0 if empty |
+| `wal.last_segment_seq` | Last element of `list_segments()` result | 0 if empty |
+| `wal.checkpoint_seq` | `CheckpointManager::read(&wal_dir)?` | null if no checkpoint file |
+| `wal.checkpoint_ts` | Same -- the `ts` field, formatted as ISO 8601 | null if no checkpoint |
+| `wal.wal_dir_bytes` | Sum of file sizes in WAL dir | Filesystem stat |
+| `dirs.*` | `Paths::new(base)` expanded | Existence checked per dir |
+
+**No config file is written.** tidalctl does not need `TidalDb::open()` to write a `.tidaldb.json` config snapshot. The CLI reports what it can observe on the filesystem. The config is a runtime concept -- it exists in memory while the process runs and is not persisted. This is correct for M0. If future milestones need a config file for operational tooling, that is a separate decision.
+
+**No live process query.** tidalctl reads disk. It does not connect to a running process. No Unix socket, no HTTP, no PID file. This is the right model for an embeddable library.
+
+#### `tidalctl paths --path `
+
+Prints the resolved directory layout:
+
+```json
+{
+ "base": "/var/lib/tidaldb",
+ "wal": "/var/lib/tidaldb/wal",
+ "items": "/var/lib/tidaldb/items",
+ "users": "/var/lib/tidaldb/users",
+ "creators": "/var/lib/tidaldb/creators",
+ "cache": "/var/lib/tidaldb/cache",
+ "exists": {
+ "base": true,
+ "wal": true,
+ "items": true,
+ "users": false,
+ "creators": false,
+ "cache": false
+ }
+}
+```
+
+This uses `Paths::new(dir)` -- the same path helper from m0p1. No duplication.
+
+#### Common Flags
+
+- `--path ` (required): the tidalDB home directory
+- `--pretty` (optional): pretty-print JSON output (default: compact)
+- `--format json|text` (optional, default `json`): `text` prints human-friendly tabular output
+
+### What tidalctl Does NOT Do at M0
+
+- No `tidalctl init` (creating a fresh tidalDB home) -- the library creates dirs on open
+- No `tidalctl repair` (WAL repair) -- crash recovery is automatic in `WalHandle::open()`
+- No `tidalctl compact` (storage compaction) -- no compaction exists yet
+- No `tidalctl dump` (WAL event dump) -- useful but not needed for the m0p2 UAT
+- No live process communication of any kind
+
+---
+
+## 2. Metrics Scope at M0
+
+### The Problem with the Original Task
+
+The original task says: "Integration test hits `/metrics` and asserts counters increment when WAL appends."
+
+At M0, the `TidalDb` public API has no WAL write path. `WalHandle::append()` exists but is not wired to `TidalDb`. There are no signal writes from the public API. A test that asserts "counters increment when WAL appends" cannot be written without either (a) using WAL internals directly or (b) waiting for m1p5.
+
+### Corrected Scope
+
+The metrics surface at M0 serves one purpose: **prove the plumbing works so later milestones can add counters without redesigning the metrics layer.** The counters themselves are scaffolding. The architecture is the deliverable.
+
+### Endpoints
+
+#### `GET /healthz`
+
+```json
+{
+ "status": "ok",
+ "uptime_seconds": 127.3,
+ "version": "0.1.0",
+ "build_hash": "29400d4"
+}
+```
+
+#### `GET /metrics`
+
+Prometheus text exposition format:
+
+```
+# HELP tidaldb_uptime_seconds Seconds since database opened.
+# TYPE tidaldb_uptime_seconds gauge
+tidaldb_uptime_seconds{partition_id="0"} 127.3
+
+# HELP tidaldb_health_ok Whether the database is healthy. 1 = ok, 0 = degraded.
+# TYPE tidaldb_health_ok gauge
+tidaldb_health_ok{partition_id="0"} 1
+
+# HELP tidaldb_info Build and version information.
+# TYPE tidaldb_info gauge
+tidaldb_info{version="0.1.0",build_hash="29400d4",partition_id="0"} 1
+```
+
+### Exact Counters at M0
+
+| Counter | Type | Source | Note |
+|---------|------|--------|------|
+| `tidaldb_uptime_seconds` | Gauge | `Instant::now() - opened_at` | Computed on read |
+| `tidaldb_health_ok` | Gauge | `health_check().is_ok() as u8` | 1 or 0 |
+| `tidaldb_info` | Gauge (info-pattern) | Build constants | Static, always 1 |
+
+That is the complete set. Three metrics. No WAL counters, no signal counters, no storage counters. Those arrive in m1p5 when the WAL is wired to the public API.
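
The three rows above map onto a very small renderer. The following is a minimal sketch rather than the project's implementation: `render_metrics` and its parameters are hypothetical names, `partition_id` is hard-coded to `"0"` as in the sample output, and label values are assumed to contain nothing that needs escaping.

```rust
use std::fmt::Write;

/// Renders the three M0 gauges in Prometheus text exposition format.
/// Hypothetical helper; the name and signature are illustration only.
fn render_metrics(uptime_secs: f64, healthy: bool, version: &str, build_hash: &str) -> String {
    let mut out = String::new();
    let health = u8::from(healthy); // true -> 1, false -> 0

    // Writing into a String cannot fail, so the results are ignored.
    let _ = writeln!(out, "# HELP tidaldb_uptime_seconds Seconds since database opened.");
    let _ = writeln!(out, "# TYPE tidaldb_uptime_seconds gauge");
    let _ = writeln!(out, "tidaldb_uptime_seconds{{partition_id=\"0\"}} {uptime_secs:.1}");
    let _ = writeln!(out);
    let _ = writeln!(out, "# HELP tidaldb_health_ok Whether the database is healthy. 1 = ok, 0 = degraded.");
    let _ = writeln!(out, "# TYPE tidaldb_health_ok gauge");
    let _ = writeln!(out, "tidaldb_health_ok{{partition_id=\"0\"}} {health}");
    let _ = writeln!(out);
    let _ = writeln!(out, "# HELP tidaldb_info Build and version information.");
    let _ = writeln!(out, "# TYPE tidaldb_info gauge");
    let _ = writeln!(
        out,
        "tidaldb_info{{version=\"{version}\",build_hash=\"{build_hash}\",partition_id=\"0\"}} 1"
    );
    out
}

fn main() {
    print!("{}", render_metrics(127.3, true, "0.1.0", "29400d4"));
}
```

Adding a deferred counter later would, under this shape, mean one more parameter and one more group of `writeln!` calls.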
+
+### What the Integration Test Verifies
+
+The integration test at M0 verifies:
+
+1. `TidalDb::builder().ephemeral().enable_metrics("127.0.0.1:0").open()` succeeds (port 0 = OS assigns)
+2. `GET /healthz` returns 200 with `status: "ok"` and `uptime_seconds > 0`
+3. `GET /metrics` returns 200 with valid Prometheus text format
+4. `tidaldb_uptime_seconds` increases between two reads separated by a sleep
+5. `tidaldb_health_ok` is 1
+6. `db.close()` stops the metrics server cleanly (no leaked threads, no port still bound)
+
+No WAL assertions. No signal assertions. The test proves the HTTP server starts, serves correct responses, and shuts down cleanly.
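
Check 4 needs a way to pull one sample value out of the `/metrics` body. A minimal sketch, assuming a hypothetical `gauge_value` helper and sample lines of the simple `name{labels} value` shape with no trailing timestamp:

```rust
/// Returns the value of the first sample line for `name`, skipping comments.
/// Hypothetical test helper; not a full Prometheus parser.
fn gauge_value(body: &str, name: &str) -> Option<f64> {
    body.lines()
        .filter(|line| !line.starts_with('#'))
        .find(|line| {
            // Require a label block or a space right after the name, so that
            // `tidaldb_uptime` does not match `tidaldb_uptime_seconds`.
            line.strip_prefix(name)
                .is_some_and(|rest| rest.starts_with('{') || rest.starts_with(' '))
        })
        .and_then(|line| line.rsplit(' ').next())
        .and_then(|value| value.parse().ok())
}

fn main() {
    // Two bodies standing in for two scrapes separated by a sleep.
    let first = "# TYPE tidaldb_uptime_seconds gauge\ntidaldb_uptime_seconds{partition_id=\"0\"} 127.3\n";
    let second = "# TYPE tidaldb_uptime_seconds gauge\ntidaldb_uptime_seconds{partition_id=\"0\"} 127.5\n";

    let a = gauge_value(first, "tidaldb_uptime_seconds").expect("first sample");
    let b = gauge_value(second, "tidaldb_uptime_seconds").expect("second sample");
    assert!(b > a, "uptime should increase between reads");
    println!("uptime went from {a} to {b}");
}
```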
+
+### What Is Deferred
+
+| Counter | Deferred To | Why |
+|---------|-------------|-----|
+| `tidaldb_wal_seq` | m1p5 | WAL not wired to public API yet |
+| `tidaldb_wal_segments` | m1p5 | Same |
+| `tidaldb_wal_bytes_total` | m1p5 | Same |
+| `tidaldb_signal_writes_total` | m1p5 | `db.signal()` does not exist yet |
+| `tidaldb_signal_read_latency` | m1p5 | Signal reads do not exist yet |
+| `tidaldb_query_latency` | m2p5 | Query executor does not exist yet |
+| `tidaldb_query_count` | m2p5 | Same |
+
+---
+
+## 3. HTTP Approach: Sync (Option A)
+
+**Chosen: (a) Sync HTTP via `tiny_http` in a background thread.**
+
+Rationale:
+
+1. **Minimal deps is an explicit tidalDB requirement.** Tokio is 200+ transitive dependencies. `tiny_http` is 5. For an embeddable library, dependency weight matters -- every dep is a compile-time cost and an audit surface for every user.
+
+2. **The metrics endpoint does ~2 requests per scrape interval.** This is not a high-throughput server. A single-threaded sync HTTP listener on a background thread handles thousands of req/s. Prometheus scrapes every 15-30s. `tiny_http` handles this with zero contention.
+
+3. **No Tokio runtime conflict.** If the host application uses Tokio (likely for an Axum/Actix service), embedding a second Tokio runtime inside tidalDB creates footguns: nested `block_on` calls (which panic when made from async code), unexpected thread pools, and surprising panic behavior. A background `std::thread` with sync HTTP avoids all of this.
+
+4. **The "Future implementor" spec is wrong for M0.** The original task assumed tidalDB would share the host's async runtime. That is a leaky abstraction. An embeddable library should not assume or require any particular async runtime. A background thread with sync HTTP is the correct primitive.
+
+5. **Feature flag is premature.** Option (c) with feature flags adds compile-time complexity for a surface that serves 3 metrics. Ship sync now. If M7 (Production Hardening) needs async HTTP for high-frequency scraping, add it then. The internal `MetricsRegistry` / counter abstraction is the same either way -- only the HTTP transport changes.
+
+### Implementation Shape
+
+```rust
+// Builder API
+let db = TidalDb::builder()
+ .ephemeral()
+ .enable_metrics("127.0.0.1:9090") // Starts background thread
+ .open()?;
+
+// Internal: spawns std::thread with tiny_http::Server
+// Thread reads from Arc<MetricsState> (uptime, health_ok, build_info)
+// Thread exits cleanly when TidalDb::close() sets a shutdown flag
+```
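
The thread lifecycle described in the comments above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the planned code: it swaps `tiny_http::Server` for a plain `std::net::TcpListener`, `MetricsServer` and `respond` are hypothetical names, and every request gets the same fixed body (no `/metrics` vs `/healthz` routing).

```rust
use std::io::{ErrorKind, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle owning the listener thread and its shutdown flag.
struct MetricsServer {
    addr: SocketAddr,
    shutdown: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}

impl MetricsServer {
    fn start(bind: &str) -> std::io::Result<Self> {
        let listener = TcpListener::bind(bind)?;
        let addr = listener.local_addr()?; // port 0 resolves to the OS-assigned port
        listener.set_nonblocking(true)?; // accept() must not block past shutdown
        let shutdown = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&shutdown);
        let handle = thread::spawn(move || {
            while !flag.load(Ordering::Acquire) {
                match listener.accept() {
                    Ok((stream, _peer)) => {
                        let _ = respond(stream); // a failed client must not end the loop
                    }
                    Err(e) if e.kind() == ErrorKind::WouldBlock => {
                        thread::sleep(Duration::from_millis(10));
                    }
                    Err(_) => break,
                }
            }
            // `listener` is dropped here, which releases the port.
        });
        Ok(Self { addr, shutdown, thread: Some(handle) })
    }

    /// Sets the flag and joins the thread, so the port is free on return.
    fn close(mut self) {
        self.shutdown.store(true, Ordering::Release);
        if let Some(handle) = self.thread.take() {
            let _ = handle.join();
        }
    }
}

/// Answers every request with a fixed health body; routing is omitted.
fn respond(mut stream: TcpStream) -> std::io::Result<()> {
    stream.set_nonblocking(false)?; // some platforms inherit the listener's mode
    stream.set_read_timeout(Some(Duration::from_secs(1)))?;
    let mut buf = [0u8; 1024];
    let _ = stream.read(&mut buf)?; // read and ignore the request head
    let body = "{\"status\":\"ok\"}";
    write!(
        stream,
        "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    )
}

fn main() -> std::io::Result<()> {
    let server = MetricsServer::start("127.0.0.1:0")?;
    let addr = server.addr;

    let mut client = TcpStream::connect(addr)?;
    client.write_all(b"GET /healthz HTTP/1.1\r\nHost: localhost\r\n\r\n")?;
    let mut response = String::new();
    client.read_to_string(&mut response)?;
    println!("{}", response.lines().next().unwrap_or(""));

    server.close();
    println!("refused after close: {}", TcpStream::connect(addr).is_err());
    Ok(())
}
```

Joining the thread inside `close` is what makes the "port no longer bound" check meaningful: by the time `close` returns, the listener has been dropped.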
+
+### Dependency Addition
+
+```toml
+# In tidal/Cargo.toml, behind a feature flag:
+[features]
+metrics = ["dep:tiny_http"]
+
+[dependencies]
+tiny_http = { version = "0.12", optional = true }
+```
+
+The `metrics` feature is opt-in. Users who do not need the HTTP endpoint pay zero compile cost. The `MetricsState` struct (atomic counters) exists unconditionally -- only the HTTP server is gated.
+
+---
+
+## 4. Workspace Structure: Workspace with Separate Binary Crate
+
+**Confirmed: workspace layout.**
+
+### Structure
+
+```
+tidalDB/
+ Cargo.toml # [workspace] members = ["tidal", "tidalctl"]
+ tidal/
+ Cargo.toml # [package] name = "tidaldb" (the library)
+ src/
+ tidalctl/
+ Cargo.toml # [package] name = "tidalctl" (the binary)
+ src/
+ main.rs
+```
+
+### Why Workspace, Not `[[bin]]`
+
+1. **Separate dependency trees.** tidalctl needs `clap` for argument parsing. The tidaldb library should not carry `clap` as a dependency -- embeddable libraries do not parse CLI arguments. A `[[bin]]` inside `tidal/` would either make `clap` unconditional or require a feature flag, both of which pollute the library.
+
+2. **Independent versioning path.** tidalctl may version independently from tidaldb. The CLI is a companion tool, not part of the library API surface.
+
+3. **`cargo install tidalctl` works naturally.** Users install the CLI separately from embedding the library. A workspace member with `[[bin]]` in its own crate gives `cargo install --path tidalctl` the right behavior.
+
+4. **Shared dependencies via workspace.** `tidalctl` depends on `tidaldb` (for `Paths`, `WalConfig`, segment parsing, checkpoint reading). The workspace ensures they share the same compiled artifacts.
+
+### tidalctl Dependencies
+
+```toml
+[package]
+name = "tidalctl"
+version = "0.1.0"
+edition = "2024"
+
+[dependencies]
+tidaldb = { path = "../tidal" }
+clap = { version = "4", features = ["derive"] }
+serde = { version = "1", features = ["derive"] }
+serde_json = "1"
+```
+
+### What This Means for Pre-Commit Hooks and CI
+
+The root `Cargo.toml` becomes the workspace root. All `cargo` commands (`fmt`, `clippy`, `test`) need to run from the workspace root or with `--workspace`. The pre-commit hook currently uses `--manifest-path tidal/Cargo.toml` -- this must be updated to use the workspace root.
+
+---
+
+## 5. Deferred Items
+
+### Explicitly NOT in m0p2
+
+| Item | Why Deferred | Arrives In |
+|------|-------------|------------|
+| Config serialization to disk (`.tidaldb.json`) | tidalctl inspects filesystem artifacts, not config files. Config is a runtime concept. | Revisit in M7 if operational tooling needs it |
+| `tidalctl init` command | Library creates dirs on open. A separate init command is redundant. | Possibly never |
+| `tidalctl repair` command | Crash recovery is automatic in `WalHandle::open()`. Manual repair is a production concern. | M7 |
+| `tidalctl dump` (WAL event dump) | Useful for debugging but not required for m0p2 UAT | M1 or M2 when developers need to debug signal event streams |
+| WAL counters in metrics | WAL not wired to public API yet | m1p5 |
+| Signal counters in metrics | `db.signal()` does not exist yet | m1p5 |
+| Query counters in metrics | Query executor does not exist yet | m2p5 |
+| Async HTTP for metrics | Sync HTTP is sufficient for Prometheus scraping | M7 if needed |
+| `tidalctl` connecting to live process | Embeddable library has no server process | Possibly never |
+| Serde on `Config` | tidalctl does not read a config file. Config serde is needed only if we write a config file, which is deferred. | When needed |
+
+---
+
+## 6. Acceptance Criteria
+
+### Task 1: tidalctl CLI
+
+- [ ] **AC-1:** `tidalctl status --path ` against a directory with WAL segments and checkpoint outputs valid JSON containing `version`, `wal.segments`, `wal.checkpoint_seq`, and `dirs.base`
+- [ ] **AC-2:** `tidalctl status --path ` against an empty directory (no WAL, no segments) outputs JSON with `status: "empty"` and `wal.segments: 0`
+- [ ] **AC-3:** `tidalctl status --path /nonexistent` exits with non-zero status and prints a JSON error object to stderr
+- [ ] **AC-4:** `tidalctl paths --path ` outputs JSON with all six directory paths and existence flags matching actual filesystem state
+- [ ] **AC-5:** `--pretty` flag produces indented JSON; absence produces compact JSON
+- [ ] **AC-6:** `cargo test -p tidalctl` passes with tests for: valid home, empty home, missing home, pretty flag, paths command
+
+### Task 2: Metrics Surface
+
+- [ ] **AC-7:** `TidalDb::builder().ephemeral().enable_metrics("127.0.0.1:0").open()` starts a background HTTP thread bound to an OS-assigned port
+- [ ] **AC-8:** `GET /healthz` returns HTTP 200 with JSON containing `status: "ok"` and `uptime_seconds > 0`
+- [ ] **AC-9:** `GET /metrics` returns HTTP 200 with valid Prometheus text format containing `tidaldb_uptime_seconds`, `tidaldb_health_ok`, and `tidaldb_info`
+- [ ] **AC-10:** `tidaldb_uptime_seconds` increases monotonically between reads (verified by sleeping 100ms between two fetches)
+- [ ] **AC-11:** `TidalDb::close()` stops the metrics HTTP thread; subsequent connection attempts to the port are refused
+- [ ] **AC-12:** Building `tidaldb` without the `metrics` feature flag compiles successfully with no `tiny_http` dependency; `enable_metrics()` is either absent or produces a compile error that guides the user to enable the feature
+
+---
+
+## 7. UAT Scenario
+
+### Given
+
+```
+A developer has:
+ - Built the workspace: `cargo build --workspace`
+ - Created a persistent tidalDB instance that wrote WAL segments:
+ let home = TempTidalHome::new()?;
+ let paths = home.paths();
+ paths.ensure_all()?;
+ let wal_config = WalConfig { dir: home.path().to_path_buf(), ..Default::default() };
+ let (wal, _) = WalHandle::open(wal_config)?;
+ wal.append(event_1)?;
+ wal.append(event_2)?;
+ wal.checkpoint(2)?;
+ wal.shutdown()?;
+ - Opened a TidalDb with metrics enabled:
+ let db = TidalDb::builder()
+ .ephemeral()
+ .enable_metrics("127.0.0.1:0")
+ .open()?;
+```
+
+### When
+
+```
+1. Run: tidalctl status --path
+2. Run: tidalctl paths --path
+3. HTTP GET /healthz on the metrics port
+4. HTTP GET /metrics on the metrics port
+5. Sleep 200ms
+6. HTTP GET /metrics again
+7. db.close()
+8. Attempt HTTP GET /healthz on the metrics port
+```
+
+### Then
+
+```
+Step 1: JSON output with wal.segments >= 1, wal.checkpoint_seq == 2,
+ status == "ok", version matches Cargo.toml
+
+Step 2: JSON output with dirs.wal == "/wal", exists.wal == true
+
+Step 3: HTTP 200, body contains "status":"ok", uptime_seconds > 0
+
+Step 4: HTTP 200, body contains tidaldb_uptime_seconds,
+ tidaldb_health_ok 1, tidaldb_info{version="0.1.0"...} 1
+
+Step 5: (sleep)
+
+Step 6: tidaldb_uptime_seconds > value from step 4
+
+Step 7: close() returns Ok(())
+
+Step 8: Connection refused (metrics server stopped)
+```
+
+### Pass/Fail Gate
+
+m0p2 is **done** when:
+- `cargo test -p tidalctl` passes
+- `cargo test -p tidaldb --features metrics` passes (metrics integration tests)
+- `cargo build --workspace` succeeds and `cargo clippy --workspace -- -D warnings` reports no warnings
+- All 12 acceptance criteria above are verified by automated tests
+- tidalctl uses `Paths` from the tidaldb crate (no duplicated layout logic)
+
+---
+
+## Implementation Notes
+
+### Build Hash
+
+Use a build script (`tidal/build.rs`) or `option_env!("GIT_HASH")` set by CI. For local builds, fall back to `"dev"`. Both tidalctl and the metrics endpoint use the same constant.
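
The `option_env!` route with a `"dev"` fallback could look like the sketch below. `BUILD_HASH` is a hypothetical constant name, and CI is assumed to export `GIT_HASH` at compile time; a `match` is used because it is valid in a `const` initializer.

```rust
/// Short git hash injected at compile time, or "dev" for local builds.
/// `GIT_HASH` is read when the crate is compiled, not when it runs.
pub const BUILD_HASH: &str = match option_env!("GIT_HASH") {
    Some(hash) => hash,
    None => "dev",
};

fn main() {
    println!("build_hash={BUILD_HASH}");
}
```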
+
+### Metrics State Sharing
+
+```rust
+pub(crate) struct MetricsState {
+ opened_at: Instant,
+ health_ok: AtomicBool,
+ // Future milestones add: wal_seq: AtomicU64, signal_writes: AtomicU64, etc.
+}
+```
+
+This struct is `Arc`-shared between `TidalDb` and the metrics HTTP thread. Adding new counters in future milestones is a one-line addition to this struct plus a one-line addition to the Prometheus renderer. The plumbing is paid for once in m0p2.
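
A minimal sketch of that sharing, assuming hypothetical accessor names (`uptime_seconds`, `health_ok`) and a plain reader thread standing in for the metrics HTTP thread:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

struct MetricsState {
    opened_at: Instant,
    health_ok: AtomicBool,
}

impl MetricsState {
    fn new() -> Self {
        Self { opened_at: Instant::now(), health_ok: AtomicBool::new(true) }
    }

    /// Computed on read, so there is no timer thread to maintain.
    fn uptime_seconds(&self) -> f64 {
        self.opened_at.elapsed().as_secs_f64()
    }

    /// 1 when healthy, 0 when degraded, matching the gauge encoding.
    fn health_ok(&self) -> u8 {
        self.health_ok.load(Ordering::Relaxed) as u8
    }
}

fn main() {
    let state = Arc::new(MetricsState::new());
    let reader = Arc::clone(&state); // the clone the metrics thread would hold

    // The owner flips the flag; the reader thread observes it without a lock.
    state.health_ok.store(false, Ordering::Relaxed);
    thread::sleep(Duration::from_millis(10));

    let (uptime, health) = thread::spawn(move || (reader.uptime_seconds(), reader.health_ok()))
        .join()
        .expect("reader thread panicked");
    println!("uptime={uptime:.3}s health_ok={health}");
}
```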
+
+### tidalctl WAL Inspection
+
+tidalctl depends on `tidaldb` as a library. It calls:
+- `tidaldb::db::Paths::new(dir)` for path resolution
+- `tidaldb::wal::segment::list_segments(&wal_dir)` for segment enumeration
+- `tidaldb::wal::checkpoint::CheckpointManager::read(&wal_dir)` for checkpoint state
+
+These are all `pub` functions already. No new internal APIs need to be exposed. The WAL module's public surface is sufficient.
+
+### Complexity Estimates
+
+| Task | Complexity | Rationale |
+|------|-----------|-----------|
+| Workspace setup (root Cargo.toml, pre-commit hook update) | S | Mechanical, no design decisions |
+| tidalctl CLI (clap, status, paths) | M | Two commands, JSON output, error handling, tests |
+| Metrics surface (tiny_http, feature flag, MetricsState, endpoints) | M | Background thread lifecycle, Prometheus format, integration test |
+| Build hash plumbing | S | Build script or env var, shared constant |
+
+**Total phase complexity: M** (two M tasks + two S tasks, all independent after workspace setup)
diff --git a/docs/planning/milestone-0/phase-2/task-01-tidalctl.md b/docs/planning/milestone-0/phase-2/task-01-tidalctl.md
new file mode 100644
index 0000000..5f8f65e
--- /dev/null
+++ b/docs/planning/milestone-0/phase-2/task-01-tidalctl.md
@@ -0,0 +1,13 @@
+# Task 01 — `tidalctl` CLI
+
+**Goal:** ship a tiny, dependency-light binary that inspects an embedded tidalDB home.
+
+## Deliverables
+- `tidalctl status --path ` and `tidalctl paths --path ` commands.
+- Shared crate for serialization so CLI can read builder `Config` dumps atomically.
+- Documentation on how to vendor the CLI into host repos (cargo install or binary download).
+
+## Acceptance Criteria
+- Works against both temp homes and long-lived dirs.
+- Output is JSON by default with `--pretty` option.
+- `cargo test -p tidalctl` exercises happy/error paths.
diff --git a/docs/planning/milestone-0/phase-2/task-02-metrics-surface.md b/docs/planning/milestone-0/phase-2/task-02-metrics-surface.md
new file mode 100644
index 0000000..26326e0
--- /dev/null
+++ b/docs/planning/milestone-0/phase-2/task-02-metrics-surface.md
@@ -0,0 +1,13 @@
+# Task 02 — Metrics / Health Surface
+
+**Goal:** expose runtime stats (uptime, WAL lag, open handles) without requiring full observability stack.
+
+## Deliverables
+- Optional HTTP endpoint (disabled by default) serving `/metrics` (Prometheus text) and `/healthz` (JSON).
+- Builder flag `enable_metrics(addr)` + `with_metrics(false)` toggle for tests.
+- Metrics counters tagged with `process`, `partition_id`, `build_hash` for future distributed rollouts.
+
+## Acceptance Criteria
+- Endpoint can run on the same Tokio runtime as host service (returns `Future` implementor).
+- Minimal deps (hyper/tiny_http) to keep embeddable footprint light.
+- Integration test hits `/metrics` and asserts counters increment when WAL appends.
diff --git a/docs/planning/milestone-0/phase-3/OVERVIEW.md b/docs/planning/milestone-0/phase-3/OVERVIEW.md
new file mode 100644
index 0000000..774d0b0
--- /dev/null
+++ b/docs/planning/milestone-0/phase-3/OVERVIEW.md
@@ -0,0 +1,12 @@
+# Milestone 0 · Phase 3 — Samples & Docs
+
+**Objective:** provide living examples so every change to the builder/tooling is exercised in CI. This phase adds doctest-backed quickstarts plus integration tests wiring tidalDB into typical Rust stacks.
+
+**Success criteria**
+- Quick-start snippet (For You microdemo) lives in `/examples` and is referenced from docs/site; compiled via `cargo test --doc`.
+- Axum/Actix embedding examples show graceful shutdown and metrics wiring.
+- CONTRIBUTING section updated with “run the samples” checklist.
+
+**Dependencies:** Phase 1 + 2 (uses builder + CLI output).
+
+**Unblocks:** Developer onboarding, future blog posts, automated regression tests.
diff --git a/docs/planning/milestone-0/phase-3/task-01-quickstart-and-doctests.md b/docs/planning/milestone-0/phase-3/task-01-quickstart-and-doctests.md
new file mode 100644
index 0000000..df89587
--- /dev/null
+++ b/docs/planning/milestone-0/phase-3/task-01-quickstart-and-doctests.md
@@ -0,0 +1,12 @@
+# Task 01 — Quickstart + Doctests
+
+**Goal:** demonstrate embedding tidalDB in <20 LOC and keep it compiling forever.
+
+## Deliverables
+- `examples/quickstart.rs` referenced from README + VISION.
+- Corresponding snippet in docs that uses Rust fenced code block with `no_run` + doctest harness.
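+- CI hook (`cargo test --doc` plus `cargo test --examples`, run as separate invocations because `--doc` cannot be combined with other target flags) wired into main workflow.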
+
+## Acceptance Criteria
+- For now, the quickstart asserts builder + health_check; once M1 lands, it writes a single signal and reads it back.
+- Fails CI if builder API changes without updating docs.
diff --git a/docs/planning/milestone-0/phase-3/task-02-embedding-guides.md b/docs/planning/milestone-0/phase-3/task-02-embedding-guides.md
new file mode 100644
index 0000000..a7b4626
--- /dev/null
+++ b/docs/planning/milestone-0/phase-3/task-02-embedding-guides.md
@@ -0,0 +1,14 @@
+# Task 02 — Embedding Guides (Axum/Actix/CLI)
+
+**Goal:** show how to wire tidalDB into real hosts so customers don’t guess.
+
+## Deliverables
+- Axum guide: builder in `State`, graceful shutdown via `with_graceful_shutdown`.
+- Actix guide: `Data` example + metric endpoint mounting.
+- CLI embedding: minimal binary using builder + tidalctl for smoke test.
+- Docs page (mdx) referencing all three.
+
+## Acceptance Criteria
+- Each guide lives under `/examples` with README instructions.
+- Guides double as integration tests (run during CI).
+- Includes “cleanup” guidance referencing TempTidalHome helper.
diff --git a/docs/planning/milestone-p/OVERVIEW.md b/docs/planning/milestone-p/OVERVIEW.md
new file mode 100644
index 0000000..65bf5ca
--- /dev/null
+++ b/docs/planning/milestone-p/OVERVIEW.md
@@ -0,0 +1,20 @@
+# Milestone P Overview
+
+This directory contains the execution plan for the product track defined in `docs/planning/PRODUCT_ROADMAP.md`.
+
+## Phase Map
+
+- `phase-1/` -> P0 Beachhead Validation
+- `phase-2/` -> P1 Concierge Alpha
+- `PG1 gate` -> Personalization Core Done (must pass before P2)
+- `phase-3/` -> P2 Productized Beta
+- `phase-4/` -> P3 Public Launch
+- `phase-5/` -> P4 Scale + Revenue Fit
+
+## Usage
+
+Each phase follows the same pattern:
+
+1. Read `OVERVIEW.md` for objective, success criteria, and task dependencies.
+2. Execute tasks in numbered order.
+3. Report completion back to `PRODUCT_ROADMAP.md` and `ROADMAP.md` status tables.
diff --git a/docs/planning/milestone-p/phase-1/OVERVIEW.md b/docs/planning/milestone-p/phase-1/OVERVIEW.md
new file mode 100644
index 0000000..6801d7a
--- /dev/null
+++ b/docs/planning/milestone-p/phase-1/OVERVIEW.md
@@ -0,0 +1,25 @@
+# Milestone P · Phase 1 — P0 Beachhead Validation
+
+**Objective:** prove that the personal briefing concept is valuable enough for repeated use among target users.
+
+## Success Criteria
+
+- 20-50 target users onboarded.
+- Two-week concierge pilot completed.
+- Median user performs >= 1 explicit feedback action per session.
+- D2 and D7 retention measured and reviewed.
+- Qualitative interviews confirm stronger perceived value vs baseline feed habits.
+
+**Dependencies:** M0 complete, M1 partial (signal write/read path sufficient for prototype adaptation).
+
+**Blocked by:** final target segment and recruitment criteria.
+
+**Unblocks:** P1 Concierge Alpha build.
+
+## Task Index
+
+| # | Task | Delivers | Depends On | Complexity |
+|---|------|----------|------------|------------|
+| 01 | Target Segment and Recruitment | persona definition, recruitment script, candidate pool | None | S |
+| 02 | Concierge Pilot Loop | daily briefing workflow, manual QA process, interview cadence | Task 01 | M |
+| 03 | Validation Readout | retention and interview analysis, go/no-go decision | Task 02 | S |
diff --git a/docs/planning/milestone-p/phase-1/task-01-target-segment-and-recruitment.md b/docs/planning/milestone-p/phase-1/task-01-target-segment-and-recruitment.md
new file mode 100644
index 0000000..8724e7f
--- /dev/null
+++ b/docs/planning/milestone-p/phase-1/task-01-target-segment-and-recruitment.md
@@ -0,0 +1,16 @@
+# P0 Task 01: Target Segment and Recruitment
+
+## Deliverable
+
+A clear target segment definition and a recruited pilot cohort of 20-50 users for the beachhead validation phase.
+
+## Acceptance Criteria
+
+- [ ] Primary segment defined (knowledge-worker profile + consumer profile).
+- [ ] Inclusion/exclusion criteria documented.
+- [ ] Recruitment script and consent language prepared.
+- [ ] 20-50 users recruited and scheduled.
+
+## Depends On
+
+- None
diff --git a/docs/planning/milestone-p/phase-1/task-02-concierge-pilot-loop.md b/docs/planning/milestone-p/phase-1/task-02-concierge-pilot-loop.md
new file mode 100644
index 0000000..3942efd
--- /dev/null
+++ b/docs/planning/milestone-p/phase-1/task-02-concierge-pilot-loop.md
@@ -0,0 +1,16 @@
+# P0 Task 02: Concierge Pilot Loop
+
+## Deliverable
+
+A two-week concierge pilot that sends daily briefings, captures feedback, and records perceived usefulness.
+
+## Acceptance Criteria
+
+- [ ] Daily briefing sent to each participant.
+- [ ] Feedback controls captured (`more`, `less`, `hide`, `mute`, `save`).
+- [ ] Weekly interviews executed per participant sample.
+- [ ] Pilot operations documented (timing, QA, issue handling).
+
+## Depends On
+
+- Task 01
diff --git a/docs/planning/milestone-p/phase-1/task-03-validation-readout.md b/docs/planning/milestone-p/phase-1/task-03-validation-readout.md
new file mode 100644
index 0000000..99663cb
--- /dev/null
+++ b/docs/planning/milestone-p/phase-1/task-03-validation-readout.md
@@ -0,0 +1,16 @@
+# P0 Task 03: Validation Readout
+
+## Deliverable
+
+A go/no-go decision memo with quantitative and qualitative outcomes from the pilot.
+
+## Acceptance Criteria
+
+- [ ] D2 and D7 retention reported.
+- [ ] Feedback action rate reported.
+- [ ] Useful-item perception findings summarized.
+- [ ] Clear recommendation: proceed to P1, iterate, or pivot.
+
+## Depends On
+
+- Task 02
diff --git a/docs/planning/milestone-p/phase-2/OVERVIEW.md b/docs/planning/milestone-p/phase-2/OVERVIEW.md
new file mode 100644
index 0000000..d863c64
--- /dev/null
+++ b/docs/planning/milestone-p/phase-2/OVERVIEW.md
@@ -0,0 +1,25 @@
+# Milestone P · Phase 2 — P1 Concierge Alpha
+
+**Objective:** deliver a narrow, high-quality `Today Brief` experience that users find useful and controllable every day.
+
+## Success Criteria
+
+- Daily ranked brief is live for pilot cohort.
+- Reason labels and source links present for each card.
+- Immediate adaptation after negative feedback visible on next refresh.
+- Time-budget mode (`5/10/20`) in active usage.
+- Weekly active usage confirms repeat behavior.
+
+**Dependencies:** P0 completed, M1 complete, M2 partial.
+
+**Blocked by:** validated user segment from P0.
+
+**Unblocks:** P2 Productized Beta.
+
+## Task Index
+
+| # | Task | Delivers | Depends On | Complexity |
+|---|------|----------|------------|------------|
+| 01 | Briefing UX and Reason Labels | card UI spec, reasons taxonomy, source exposure rules | None | M |
+| 02 | Feedback Loop UX | controls and immediate-reflection behavior | Task 01 | M |
+| 03 | Quality and Diversity Baseline | quality gates and diversity constraints in top results | Task 02 | M |
diff --git a/docs/planning/milestone-p/phase-2/task-01-briefing-ux-and-reason-labels.md b/docs/planning/milestone-p/phase-2/task-01-briefing-ux-and-reason-labels.md
new file mode 100644
index 0000000..c2c3723
--- /dev/null
+++ b/docs/planning/milestone-p/phase-2/task-01-briefing-ux-and-reason-labels.md
@@ -0,0 +1,15 @@
+# P1 Task 01: Briefing UX and Reason Labels
+
+## Deliverable
+
+Core daily briefing experience with clear reason labels and transparent source access.
+
+## Acceptance Criteria
+
+- [ ] Briefing cards defined with title, summary, source, timestamp, and reason label.
+- [ ] Reason taxonomy documented and constrained.
+- [ ] Source link behavior consistent across cards.
+
+## Depends On
+
+- None
diff --git a/docs/planning/milestone-p/phase-2/task-02-feedback-loop-ux.md b/docs/planning/milestone-p/phase-2/task-02-feedback-loop-ux.md
new file mode 100644
index 0000000..d2580b1
--- /dev/null
+++ b/docs/planning/milestone-p/phase-2/task-02-feedback-loop-ux.md
@@ -0,0 +1,15 @@
+# P1 Task 02: Feedback Loop UX
+
+## Deliverable
+
+User controls for `more/less/hide/mute/save` and immediate next-refresh adaptation behavior.
+
+## Acceptance Criteria
+
+- [ ] Feedback control semantics documented.
+- [ ] Next-refresh behavior specified for each control.
+- [ ] Negative-feedback persistence rules documented.
+
+## Depends On
+
+- Task 01
diff --git a/docs/planning/milestone-p/phase-2/task-03-quality-and-diversity-baseline.md b/docs/planning/milestone-p/phase-2/task-03-quality-and-diversity-baseline.md
new file mode 100644
index 0000000..757c15d
--- /dev/null
+++ b/docs/planning/milestone-p/phase-2/task-03-quality-and-diversity-baseline.md
@@ -0,0 +1,15 @@
+# P1 Task 03: Quality and Diversity Baseline
+
+## Deliverable
+
+Default quality floor and diversity guardrails for top-ranked briefing items.
+
+## Acceptance Criteria
+
+- [ ] Duplicate suppression policy defined.
+- [ ] Source/topic diversity thresholds defined for top 10 items.
+- [ ] Freshness floor defined for time-sensitive domains.
+
+## Depends On
+
+- Task 02
diff --git a/docs/planning/milestone-p/phase-3/OVERVIEW.md b/docs/planning/milestone-p/phase-3/OVERVIEW.md
new file mode 100644
index 0000000..ecd2aaa
--- /dev/null
+++ b/docs/planning/milestone-p/phase-3/OVERVIEW.md
@@ -0,0 +1,24 @@
+# Milestone P · Phase 3 — P2 Productized Beta
+
+**Objective:** make the product self-serve and reliable enough to run without manual curation.
+
+## Success Criteria
+
+- Onboarding under 3 minutes for most users.
+- Cohort layer available and understandable.
+- Explanation and trust controls present by default.
+- D7 retention and useful-item rate exceed baseline comparison feed.
+
+**Dependencies:** P1 completed, **PG1 Personalization Core Done gate passed**, M2 complete, M3 partial.
+
+**Blocked by:** stable alpha behavior and instrumented metrics.
+
+**Unblocks:** P3 Public Launch.
+
+## Task Index
+
+| # | Task | Delivers | Depends On | Complexity |
+|---|------|----------|------------|------------|
+| 01 | Self-Serve Onboarding | onboarding flow, defaults, profile bootstrap | None | M |
+| 02 | Cohort and Context Views | cohort-trending surface and session context mode | Task 01 | M |
+| 03 | Trust Controls and Persistence | mute/hide persistence, quality visibility, transparency UX | Task 02 | M |
diff --git a/docs/planning/milestone-p/phase-3/task-01-self-serve-onboarding.md b/docs/planning/milestone-p/phase-3/task-01-self-serve-onboarding.md
new file mode 100644
index 0000000..54e9033
--- /dev/null
+++ b/docs/planning/milestone-p/phase-3/task-01-self-serve-onboarding.md
@@ -0,0 +1,15 @@
+# P2 Task 01: Self-Serve Onboarding
+
+## Deliverable
+
+An onboarding flow that consistently produces a usable briefing in under three minutes.
+
+## Acceptance Criteria
+
+- [ ] Minimal step flow defined and instrumented.
+- [ ] Default profile bootstrapping rules documented.
+- [ ] Onboarding completion and drop-off metrics tracked.
+
+## Depends On
+
+- None
diff --git a/docs/planning/milestone-p/phase-3/task-02-cohort-and-context-views.md b/docs/planning/milestone-p/phase-3/task-02-cohort-and-context-views.md
new file mode 100644
index 0000000..71555eb
--- /dev/null
+++ b/docs/planning/milestone-p/phase-3/task-02-cohort-and-context-views.md
@@ -0,0 +1,15 @@
+# P2 Task 02: Cohort and Context Views
+
+## Deliverable
+
+Product surfaces for `trending for people like you` and optional session-scoped context mode.
+
+## Acceptance Criteria
+
+- [ ] Cohort view behavior and labels defined.
+- [ ] Session context mode behavior defined.
+- [ ] Switching between views preserves user trust and clarity.
+
+## Depends On
+
+- Task 01
diff --git a/docs/planning/milestone-p/phase-3/task-03-trust-controls-and-persistence.md b/docs/planning/milestone-p/phase-3/task-03-trust-controls-and-persistence.md
new file mode 100644
index 0000000..3b36d96
--- /dev/null
+++ b/docs/planning/milestone-p/phase-3/task-03-trust-controls-and-persistence.md
@@ -0,0 +1,15 @@
+# P2 Task 03: Trust Controls and Persistence
+
+## Deliverable
+
+Persistent user controls and transparency defaults for trustworthy daily use.
+
+## Acceptance Criteria
+
+- [ ] Mute/hide preferences persist across sessions.
+- [ ] Explanation visibility defaults defined.
+- [ ] Source transparency and quality indicators visible in briefing UI.
+
+## Depends On
+
+- Task 02
diff --git a/docs/planning/milestone-p/phase-4/OVERVIEW.md b/docs/planning/milestone-p/phase-4/OVERVIEW.md
new file mode 100644
index 0000000..dc46a71
--- /dev/null
+++ b/docs/planning/milestone-p/phase-4/OVERVIEW.md
@@ -0,0 +1,24 @@
+# Milestone P · Phase 4 — P3 Public Launch
+
+**Objective:** launch the product publicly with reliability, quality floor, and support readiness.
+
+## Success Criteria
+
+- Briefing generation reliability and latency SLOs met.
+- Quality controls enforced (freshness, duplicates, source floor).
+- Notification cadence controls protect user attention.
+- Support and incident response workflow operating.
+
+**Dependencies:** P2 completed, M3 + M5 core, M6 partial.
+
+**Blocked by:** beta KPIs and operational readiness.
+
+**Unblocks:** P4 Scale + Revenue Fit.
+
+## Task Index
+
+| # | Task | Delivers | Depends On | Complexity |
+|---|------|----------|------------|------------|
+| 01 | Reliability and SLOs | launch SLOs, monitoring, error budgets | None | M |
+| 02 | Quality Operations | quality floor checks, regression dashboards, alerting | Task 01 | M |
+| 03 | Launch and Support Playbook | launch checklist, incident roles, communication templates | Task 02 | S |
diff --git a/docs/planning/milestone-p/phase-4/task-01-reliability-and-slos.md b/docs/planning/milestone-p/phase-4/task-01-reliability-and-slos.md
new file mode 100644
index 0000000..a04ea51
--- /dev/null
+++ b/docs/planning/milestone-p/phase-4/task-01-reliability-and-slos.md
@@ -0,0 +1,15 @@
+# P3 Task 01: Reliability and SLOs
+
+## Deliverable
+
+A launch reliability baseline with concrete SLOs and monitoring coverage.
+
+## Acceptance Criteria
+
+- [ ] Briefing latency and availability SLOs documented.
+- [ ] Key failure modes mapped to alerts.
+- [ ] Error budget policy defined.
+
+## Depends On
+
+- None
diff --git a/docs/planning/milestone-p/phase-4/task-02-quality-operations.md b/docs/planning/milestone-p/phase-4/task-02-quality-operations.md
new file mode 100644
index 0000000..fdca748
--- /dev/null
+++ b/docs/planning/milestone-p/phase-4/task-02-quality-operations.md
@@ -0,0 +1,15 @@
+# P3 Task 02: Quality Operations
+
+## Deliverable
+
+Operational quality controls to prevent stale, duplicate, or low-credibility briefing output.
+
+## Acceptance Criteria
+
+- [ ] Quality floor rules enforced.
+- [ ] Regression dashboard for useful-item rate and repeated-unwanted-item rate.
+- [ ] On-call quality triage workflow defined.
+
+## Depends On
+
+- Task 01
diff --git a/docs/planning/milestone-p/phase-4/task-03-launch-and-support-playbook.md b/docs/planning/milestone-p/phase-4/task-03-launch-and-support-playbook.md
new file mode 100644
index 0000000..a11e017
--- /dev/null
+++ b/docs/planning/milestone-p/phase-4/task-03-launch-and-support-playbook.md
@@ -0,0 +1,15 @@
+# P3 Task 03: Launch and Support Playbook
+
+## Deliverable
+
+A launch playbook covering rollout, support triage, and user communication.
+
+## Acceptance Criteria
+
+- [ ] Rollout stages documented.
+- [ ] Support triage categories and ownership defined.
+- [ ] Incident communication templates prepared.
+
+## Depends On
+
+- Task 02
diff --git a/docs/planning/milestone-p/phase-5/OVERVIEW.md b/docs/planning/milestone-p/phase-5/OVERVIEW.md
new file mode 100644
index 0000000..f3928a5
--- /dev/null
+++ b/docs/planning/milestone-p/phase-5/OVERVIEW.md
@@ -0,0 +1,24 @@
+# Milestone P · Phase 5 — P4 Scale + Revenue Fit
+
+**Objective:** prove product economics and growth quality at larger scale.
+
+## Success Criteria
+
+- Monetization model validated.
+- Revenue metrics tracked alongside quality metrics.
+- Retention stable as user volume grows.
+- Next target segment chosen with evidence.
+
+**Dependencies:** P3 completed, M6 + M7.
+
+**Blocked by:** public launch performance and data quality.
+
+**Unblocks:** next product line or market expansion.
+
+## Task Index
+
+| # | Task | Delivers | Depends On | Complexity |
+|---|------|----------|------------|------------|
+| 01 | Monetization Experiments | pricing/tests, conversion funnel instrumentation | None | M |
+| 02 | Quality-Safe Growth | guardrails that prevent revenue-over-quality regressions | Task 01 | M |
+| 03 | Segment Expansion Plan | data-backed expansion strategy and milestone proposal | Task 02 | S |
diff --git a/docs/planning/milestone-p/phase-5/task-01-monetization-experiments.md b/docs/planning/milestone-p/phase-5/task-01-monetization-experiments.md
new file mode 100644
index 0000000..9e050a5
--- /dev/null
+++ b/docs/planning/milestone-p/phase-5/task-01-monetization-experiments.md
@@ -0,0 +1,15 @@
+# P4 Task 01: Monetization Experiments
+
+## Deliverable
+
+A validated monetization path with measurable conversion and retention impact.
+
+## Acceptance Criteria
+
+- [ ] Pricing hypotheses and experiment plan documented.
+- [ ] Conversion funnel instrumented.
+- [ ] Early conversion and churn impact assessed.
+
+## Depends On
+
+- None
diff --git a/docs/planning/milestone-p/phase-5/task-02-quality-safe-growth.md b/docs/planning/milestone-p/phase-5/task-02-quality-safe-growth.md
new file mode 100644
index 0000000..b5008ae
--- /dev/null
+++ b/docs/planning/milestone-p/phase-5/task-02-quality-safe-growth.md
@@ -0,0 +1,15 @@
+# P4 Task 02: Quality-Safe Growth
+
+## Deliverable
+
+Growth guardrails that prevent quality degradation under monetization pressure.
+
+## Acceptance Criteria
+
+- [ ] Revenue and quality KPI review cadence defined.
+- [ ] Regression triggers and rollback policy defined.
+- [ ] User trust metrics included in growth decisions.
+
+## Depends On
+
+- Task 01
diff --git a/docs/planning/milestone-p/phase-5/task-03-segment-expansion-plan.md b/docs/planning/milestone-p/phase-5/task-03-segment-expansion-plan.md
new file mode 100644
index 0000000..ccffbcb
--- /dev/null
+++ b/docs/planning/milestone-p/phase-5/task-03-segment-expansion-plan.md
@@ -0,0 +1,15 @@
+# P4 Task 03: Segment Expansion Plan
+
+## Deliverable
+
+A data-backed proposal for the next target segment and roadmap extension.
+
+## Acceptance Criteria
+
+- [ ] Expansion segment candidate analysis completed.
+- [ ] Required product/engine capabilities identified.
+- [ ] Proposed milestone sequence drafted.
+
+## Depends On
+
+- Task 02
diff --git a/docs/research/tidaldb_tooling_and_diagnostics.md b/docs/research/tidaldb_tooling_and_diagnostics.md
new file mode 100644
index 0000000..181ba14
--- /dev/null
+++ b/docs/research/tidaldb_tooling_and_diagnostics.md
@@ -0,0 +1,696 @@
+# Research: CLI Framework and Embedded HTTP for m0p2 Tooling & Diagnostics
+
+## Question
+
+What is the minimum-viable set of dependencies and design patterns for:
+1. A `tidalctl` CLI binary (2 subcommands, 1 required arg, 1 optional flag, JSON output)
+2. An optional embedded HTTP endpoint (`/healthz` JSON, `/metrics` Prometheus text format)
+3. Prometheus text format output for 5-10 counters/gauges
+4. Config serialization for CLI-to-library communication
+
+## TidalDB Context
+
+tidalDB is an embeddable, single-node-first Rust database. The dependency philosophy from CODING_GUIDELINES.md is explicit: "Every dependency must justify its existence against 'could we write this in 200 lines?'" The library crate has `#![forbid(unsafe_code)]` at crate level. MSRV is 1.91 (Rust 2024 edition).
+
+**m0p2 scope is narrow:**
+- `tidalctl status --path ` and `tidalctl paths --path ` -- two subcommands, one required flag (`--path`), one optional flag (`--pretty`), JSON output
+- `/healthz` returning JSON health status
+- `/metrics` returning Prometheus text format with ~5-10 metrics (uptime, WAL sequence, queue depth, build hash)
+- The HTTP endpoint is feature-gated (`metrics` feature), disabled by default
+- Expected concurrent connections to the metrics endpoint: <10 (dev/ops tooling only)
+
+**Existing dependency context (from Cargo.lock):** `criterion` (dev-dependency) already pulls in `clap 4.5.60`, `serde 1.0.228`, `serde_json 1.0.149`, and `serde_derive 1.0.228`. These are compiled in every `cargo test` and `cargo bench` invocation today. `serde`/`serde_json` are also listed as approved dependencies in CODING_GUIDELINES.md (line 296).
+
+---
+
+## Question 1: CLI Argument Parsing for `tidalctl`
+
+### Approaches Surveyed
+
+#### Approach 1: `clap` 4.x (derive API)
+
+**How it works:** Declarative derive macros on structs generate a full argument parser with help text, error messages, completions, and subcommand routing. The derive API maps directly from struct fields to CLI flags.
+
+**Used by:** TiKV (`tikv-ctl`), Meilisearch, SurrealDB, Vector, Nushell, ripgrep, bat, fd. The dominant choice in the Rust CLI ecosystem. Criterion (already a tidalDB dev-dep) uses clap 4 internally.
+
+**Evidence:**
+- argparse-rosetta-rs benchmarks (2024): 3s full debug build, 392ms incremental. 654 KiB release binary overhead (full features) or 427 KiB (minimal features).
+- MSRV: 1.74. Compatible with tidalDB's 1.91.
+- Rain's Rust CLI Recommendations: "use clap unless you have a really simple application."
+
+**Strengths:**
+- Auto-generated `--help` with subcommand tree, argument descriptions, and defaults.
+- Compile-time validation of argument structure via derive macros.
+- Shell completions via `clap_complete`.
+- Already in Cargo.lock via criterion -- zero additional compile-time cost in dev builds.
+
+**Weaknesses:**
+- 654 KiB binary overhead (full) / 427 KiB (minimal) added to the `tidalctl` release binary.
+- Proc-macro dependency chain (syn, quote, proc-macro2) -- though these are already compiled for criterion.
+- Overkill for 2 subcommands.
+
+#### Approach 2: `argh` 0.1.13 (Google's derive parser)
+
+**How it works:** Derive-based parser optimized for code size, designed for Google Fuchsia's CLI conventions. Similar derive API to clap but with a smaller binary footprint.
+
+**Used by:** Google Fuchsia tooling. Limited adoption outside Google's ecosystem.
+
+**Evidence:**
+- argparse-rosetta-rs benchmarks: 3s full debug build (same as clap due to proc-macro overhead), 203ms incremental. 38 KiB binary overhead.
+- MSRV: not explicitly declared. Uses 2018 edition. Last release ~12 months ago.
+- License: BSD-3-Clause. "This is not an officially supported Google product."
+
+**Strengths:**
+- Much smaller binary overhead than clap (38 KiB vs 427-654 KiB).
+- Derive-based API similar to clap.
+
+**Weaknesses:**
+- Not in Cargo.lock -- adds a new dependency tree.
+- Fuchsia-specific conventions (not standard Unix `--flag=value` in all cases).
+- Lower community adoption; maintenance uncertain (not officially supported by Google).
+- No shell completions.
+- 3s initial compile (proc-macro overhead same as clap).
+
+#### Approach 3: `pico-args` 0.5.0
+
+**How it works:** Manual argument extraction via method calls. No derive, no proc-macros, no help generation. Parse arguments by calling `opt_value_from_str("--path")`, `contains("--pretty")`, and `subcommand()`.
+
+**Used by:** RazrFalcon's suite of tools (resvg, usvg, svgcleaner). Popular in the "small tool" Rust ecosystem. 11M+ total downloads on crates.io.
+
+**Evidence:**
+- argparse-rosetta-rs benchmarks: 384ms full debug build, 185ms incremental. 23 KiB binary overhead.
+- Zero dependencies. Zero proc-macros. 666 lines of code.
+- MSRV: 1.32. Compatible with any Rust version.
+- License: MIT.
+- No unsafe code (`#![forbid(unsafe_code)]`).
+
+**Strengths:**
+- Negligible compile-time and binary size impact.
+- Zero dependencies -- no transitive risk.
+- API is simple enough for 2 subcommands.
+- Matches tidalDB's dependency philosophy perfectly.
+
+**Weaknesses:**
+- No auto-generated `--help`. Must be hand-written (10-15 lines for this CLI).
+- No derive -- argument parsing is imperative code.
+- Subcommand routing is manual string matching.
+- Error messages are less polished than clap.
+
+#### Approach 4: `lexopt` 0.3.1
+
+**How it works:** Low-level lexer that yields tokens (`Short`, `Long`, `Value`). The application matches on tokens in a loop. One file, zero dependencies, zero macros.
+
+**Used by:** cargo (as `clap_lex` which is derived from lexopt's design), uutils.
+
+**Evidence:**
+- argparse-rosetta-rs benchmarks: 385ms full debug build, 184ms incremental. 34 KiB binary overhead.
+- Zero dependencies. MSRV 1.31. License: MIT/Apache-2.0.
+
+**Strengths:**
+- Handles `OsString` correctly (important for path arguments).
+- Slightly more structured than raw `std::env::args()`.
+
+**Weaknesses:**
+- More boilerplate than pico-args for the same result.
+- No subcommand abstraction -- everything is a token loop.
+- Slightly larger binary overhead than pico-args for less ergonomic API.
+
+#### Approach 5: Manual (`std::env::args()`)
+
+**How it works:** Read `std::env::args()` into a `Vec<String>`, match on the first positional argument for the subcommand, iterate remaining args for flags.
+
+**Used by:** Many internal tools. SQLite's CLI is hand-rolled in C (not using getopt). DuckDB's CLI is based on SQLite's hand-rolled parser.
+
+**Evidence:**
+- Zero dependencies, zero binary overhead, zero compile time addition.
+- For 2 subcommands + 2 flags, this is approximately 50-80 lines of Rust.
+
+**Strengths:**
+- Absolute minimum footprint.
+- No dependency to maintain, audit, or version-pin.
+- Complete control over error messages.
+
+**Weaknesses:**
+- Must handle edge cases manually: `--path=` vs `--path `, `--` separator, unknown flags.
+- No help generation.
+- More code to maintain than pico-args for equivalent behavior.
+- Easy to introduce subtle parsing bugs (e.g., `--path` at end of args without value).
+
+### Comparison
+
+| Criterion | clap 4.x | argh 0.1.13 | pico-args 0.5.0 | lexopt 0.3.1 | Manual |
+|---|---|---|---|---|---|
+| Full debug build | 3s | 3s | 384ms | 385ms | 0ms |
+| Incremental build | 392ms | 203ms | 185ms | 184ms | 0ms |
+| Binary overhead (release) | 427-654 KiB | 38 KiB | 23 KiB | 34 KiB | 0 KiB |
+| Dependencies | ~10 transitive | ~3 (proc-macro) | 0 | 0 | 0 |
+| Auto `--help` | Yes | Yes | No | No | No |
+| Subcommand support | Native | Native | Manual matching | Manual matching | Manual matching |
+| Proc-macros | Yes (derive) | Yes (derive) | No | No | No |
+| `#![forbid(unsafe_code)]` | No (clap uses unsafe) | Unknown | Yes | Yes | Yes |
+| MSRV | 1.74 | ~1.56 (2018 ed.) | 1.32 | 1.31 | N/A |
+| Already in Cargo.lock | Yes (via criterion) | No | No | No | N/A |
+| License | MIT/Apache-2.0 | BSD-3-Clause | MIT | MIT/Apache-2.0 | N/A |
+| Lines of code (user-side) | ~25 (derive struct) | ~25 (derive struct) | ~40 (imperative) | ~50 (token loop) | ~60-80 |
+
+### Recommendation: Manual `std::env::args()` for `tidalctl`
+
+**The case is clear when you look at the actual scope.** `tidalctl` has 2 subcommands, 1 required flag, and 1 optional flag. This is a 60-line match statement, not a parser configuration problem.
+
+The key arguments:
+
+1. **The CODING_GUIDELINES.md test:** "Could we write this in 200 lines?" -- Yes, in about 60 lines, including help text and error messages. No dependency passes this bar for this scope.
+
+2. **`tidalctl` is a separate binary crate, not the library.** It will have its own `Cargo.toml`. Even though clap is in the workspace Cargo.lock via criterion, `tidalctl`'s release build would need to compile clap into the binary, adding 427+ KiB. The CLI binary should be small -- the `status` command reads a config file and prints JSON; it should not be a 1+ MiB binary.
+
+3. **The "escape hatch" argument favors manual.** If `tidalctl` grows to 5+ subcommands (e.g., `tidalctl compact`, `tidalctl backup`, `tidalctl schema`), switching from manual to pico-args or clap is a straightforward refactor. The reverse migration (clap to manual) is harder because derive macros become load-bearing.
+
+4. **Production precedent:** SQLite and DuckDB both use hand-rolled CLI parsers. For embedded database tooling with few commands, this is the norm, not the exception.
+
+**If the team prefers a library:** pico-args 0.5.0 is the right choice. Zero dependencies, 23 KiB overhead, `#![forbid(unsafe_code)]`, and the API is natural for this use case. Pin to `pico-args = "0.5"`.
+
+**Do not use clap for `tidalctl` at this scope.** It is the right tool for a CLI with 10+ subcommands and complex argument validation. It is overkill for 2 subcommands and would add 427 KiB to a binary that should be 100-200 KiB total.
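+
+A minimal sketch of the manual approach, assuming the scope described above (two subcommands, required `--path`, optional `--pretty`). The `Command` and `Cli` types and the `parse_args` function are hypothetical names chosen for illustration, not part of tidalDB. It accepts both `--path <value>` and `--path=<value>` and rejects unknown flags:
+
+```rust
+use std::path::PathBuf;
+
+/// Subcommands of the hypothetical `tidalctl` parser.
+#[derive(Debug, PartialEq)]
+pub enum Command {
+    Status,
+    Paths,
+}
+
+/// Parsed invocation (illustrative shape, not the real tidalctl API).
+#[derive(Debug, PartialEq)]
+pub struct Cli {
+    pub command: Command,
+    pub path: PathBuf,
+    pub pretty: bool,
+}
+
+/// Parse arguments, excluding the program name.
+pub fn parse_args<I>(args: I) -> Result<Cli, String>
+where
+    I: IntoIterator<Item = String>,
+{
+    let mut args = args.into_iter();
+    let command = match args.next().as_deref() {
+        Some("status") => Command::Status,
+        Some("paths") => Command::Paths,
+        Some(other) => return Err(format!("unknown subcommand: {other}")),
+        None => return Err("missing subcommand (expected `status` or `paths`)".to_string()),
+    };
+
+    let mut path = None;
+    let mut pretty = false;
+    while let Some(arg) = args.next() {
+        if arg == "--pretty" {
+            pretty = true;
+        } else if arg == "--path" {
+            // `--path <value>` form: the value is the next argument.
+            let value = args.next().ok_or("--path requires a value")?;
+            path = Some(PathBuf::from(value));
+        } else if let Some(value) = arg.strip_prefix("--path=") {
+            // `--path=<value>` form.
+            if value.is_empty() {
+                return Err("--path requires a value".to_string());
+            }
+            path = Some(PathBuf::from(value));
+        } else {
+            return Err(format!("unknown argument: {arg}"));
+        }
+    }
+
+    let path = path.ok_or("missing required flag: --path")?;
+    Ok(Cli { command, path, pretty })
+}
+```
+
+A real binary would call `parse_args(std::env::args().skip(1))` and print a hand-written usage string on `Err`. This sketch takes `String` arguments, so paths that are not valid UTF-8 would need `std::env::args_os()` instead.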
+
+---
+
+## Question 2: Sync Embedded HTTP for Metrics Endpoint
+
+### Design Tension
+
+The m0p2 task document says: "Endpoint can run on the same Tokio runtime as host service (returns `Future` implementor)." But the research question notes: "Needs to work without Tokio as a hard dependency." These are in tension.
+
+**Resolution:** The metrics endpoint should be designed as a synchronous server running on a background `std::thread`. When a host application has Tokio, it can use `tokio::task::spawn_blocking` to run the blocking parts on its runtime's blocking pool. The API should return a handle that owns a `std::thread::JoinHandle<()>`, not a `Future`. This is simpler, avoids a Tokio dependency, and is compatible with both async and sync host applications.
+
+A future `metrics-tokio` feature flag could add a `Future`-returning wrapper, but m0p2 does not need it.
+
+### Approaches Surveyed
+
+#### Approach 1: `tiny_http` 0.12.0
+
+**How it works:** Synchronous HTTP server using `std::net::TcpListener` internally with a thread pool. Handles HTTP/1.1 parsing, keep-alive, chunked transfer, content encoding. You call `server.recv()` in a loop and respond synchronously.
+
+**Used by:** devserver, nickel (legacy), numerous internal tools. 1.1K GitHub stars, 395 downstream crates.
+
+**Evidence:**
+- Version 0.12.0, released October 2022. Edition 2018. MSRV 1.57.
+- Core dependencies: `ascii`, `chunked_transfer`, `httpdate` -- minimal tree (~5 crates without TLS).
+- Size: 120 KB crate, ~2.5K source lines.
+- License: MIT/Apache-2.0.
+- No TLS needed for localhost metrics (disable all `ssl-*` features).
+- Uses some `unsafe` internally (HTTP parsing optimizations).
+
+**Strengths:**
+- Fully synchronous -- no Tokio dependency.
+- Handles HTTP edge cases (keep-alive, chunked, pipelining) correctly.
+- Mature, battle-tested for low-traffic use cases.
+- Simple API: `server.recv()` -> `Request` -> `request.respond(Response)`.
+
+**Weaknesses:**
+- Last release October 2022 -- 3+ years old. Active maintenance is uncertain.
+- Internal thread pool adds complexity tidalDB does not need for 2 endpoints.
+- Pulls in `ascii` and `chunked_transfer` crates -- small but nonzero dependency surface.
+- Uses `unsafe` internally, which cannot be audited as easily as a hand-rolled solution.
+- MSRV 1.57 is fine, but edition 2018 is dated.
+
+#### Approach 2: `rouille` 0.6.2
+
+**How it works:** Macro-based synchronous web framework built on top of `tiny_http`. Adds routing macros, form parsing, and session handling.
+
+**Used by:** Small Rust web projects. 1.1K GitHub stars.
+
+**Evidence:**
+- Built on `tiny_http` -- inherits its HTTP handling.
+- Adds significant API surface (routing macros, sessions, forms) that tidalDB does not need.
+- Last commit activity has slowed.
+- License: MIT/Apache-2.0.
+
+**Strengths:**
+- Routing macros reduce boilerplate for multi-endpoint servers.
+
+**Weaknesses:**
+- Wrapper around `tiny_http` -- adds dependency on top of dependency.
+- Routing macros are unnecessary for 2 endpoints.
+- Maintenance status unclear.
+- Fails the "200 lines" test -- we are adding a framework when we need 2 `if` branches.
+
+#### Approach 3: Hand-rolled (`std::net::TcpListener`)
+
+**How it works:** Bind a `TcpListener`, accept connections in a loop on a background thread, parse the HTTP request line (just the method and path), write a raw HTTP response. For 2 endpoints with static-ish content, this is ~80-120 lines.
+
+**Used by:** The Rust Book's web server tutorial uses this exact pattern. Prometheus client libraries in other languages often use minimal HTTP for the `/metrics` endpoint. SQLite does not embed an HTTP server, but the pattern is standard for database diagnostics (e.g., RocksDB statistics are often exposed via a hand-rolled HTTP endpoint in embedding applications).
+
+**Evidence:**
+- Zero dependencies. Zero binary overhead.
+- The Rust standard library's `TcpListener` + `BufReader` handles everything needed for HTTP/1.1 request parsing at this scale.
+- For `/healthz` and `/metrics` with <10 concurrent connections, HTTP keep-alive and chunked transfer are unnecessary -- `Connection: close` on every response is acceptable.
+
+**Strengths:**
+- Zero dependencies -- maximally embeddable.
+- Audit surface is 80-120 lines of code that the team wrote and understands.
+- No `unsafe` (stays within `#![forbid(unsafe_code)]`).
+- Thread model is explicit: one `std::thread::spawn` with a loop, one `TcpListener`.
+- Trivially testable: connect with `std::net::TcpStream` in integration tests.
+
+**Weaknesses:**
+- Must handle HTTP parsing manually. But for this scope: read the first line, split on spaces, match path. Malformed requests get a 400 response. This is ~20 lines.
+- No keep-alive, no chunked transfer, no content encoding. Acceptable for dev/ops metrics endpoint at <10 connections.
+- If requirements grow (TLS, WebSocket, many endpoints), must migrate to a real server. But m0p2 has 2 endpoints.
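+
+The "read the first line, split on spaces, match path" step can be sketched as a pure function, which keeps it testable without opening a socket. This is only an illustration under assumptions: the `Response` type, the `route` and `to_http` names, and the `/healthz` JSON body are hypothetical, and the metrics text is passed in already rendered.
+
+```rust
+/// Minimal response description (hypothetical type for this sketch).
+#[derive(Debug, PartialEq)]
+pub struct Response {
+    pub status: &'static str,
+    pub content_type: &'static str,
+    pub body: String,
+}
+
+fn plain(status: &'static str, body: &str) -> Response {
+    Response { status, content_type: "text/plain; charset=utf-8", body: body.to_string() }
+}
+
+/// Route on the request line only, e.g. `GET /healthz HTTP/1.1`.
+/// Headers and any request body are ignored.
+pub fn route(request_line: &str, metrics_text: &str) -> Response {
+    let mut parts = request_line.split_whitespace();
+    // Expect exactly three tokens: method, path, HTTP version.
+    let (method, path) = match (parts.next(), parts.next(), parts.next(), parts.next()) {
+        (Some(m), Some(p), Some(v), None) if v.starts_with("HTTP/") => (m, p),
+        _ => return plain("400 Bad Request", "bad request\n"),
+    };
+    if method != "GET" {
+        return plain("405 Method Not Allowed", "method not allowed\n");
+    }
+    match path {
+        "/healthz" => Response {
+            status: "200 OK",
+            content_type: "application/json",
+            // Assumed health payload; the real shape is not specified here.
+            body: "{\"status\":\"ok\"}".to_string(),
+        },
+        "/metrics" => Response {
+            status: "200 OK",
+            content_type: "text/plain; version=0.0.4; charset=utf-8",
+            body: metrics_text.to_string(),
+        },
+        _ => plain("404 Not Found", "not found\n"),
+    }
+}
+
+/// Serialize a response with `Connection: close`, so keep-alive is never needed.
+pub fn to_http(resp: &Response) -> String {
+    format!(
+        "HTTP/1.1 {}\r\nContent-Type: {}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
+        resp.status,
+        resp.content_type,
+        resp.body.len(),
+        resp.body
+    )
+}
+```
+
+The sketch matches paths exactly, so a request such as `/metrics?foo=1` would return 404; stripping a query string before matching is a small extension if scrapers need it.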
+
+#### Approach 4: `axum` + Tokio (async)
+
+**How it works:** Full async web framework built on `hyper` and `tokio`. Tower middleware ecosystem, type-safe extractors, Router-based routing.
+
+**Used by:** Most production Rust web services. The ecosystem standard for async HTTP.
+
+**Evidence:**
+- Pulls in `tokio`, `hyper`, `tower`, `http`, and dozens of transitive dependencies.
+- Binary size impact: 1-3 MiB.
+- Compile time: 10-20s for a clean build.
+
+**Strengths:**
+- Production-grade HTTP handling.
+- Seamless integration if the host application already runs Tokio.
+
+**Weaknesses:**
+- **Fundamentally incompatible with tidalDB's embeddable philosophy.** Adding Tokio as a dependency means every embedder must link Tokio, even if they never enable metrics. Feature-gating mitigates this, but the `metrics` feature would still pull in the entire async runtime.
+- Massive dependency tree for 2 endpoints.
+- Does not pass the "200 lines" test by orders of magnitude.
+
+#### Approach 5: `warp` (async, Tokio-based)
+
+Same category as axum. Pulls Tokio. Same disqualification for the same reasons.
+
+### Comparison
+
+| Criterion | tiny_http 0.12 | rouille 0.6 | Hand-rolled | axum + Tokio |
+|---|---|---|---|---|
+| Async? | No (sync) | No (sync) | No (sync) | Yes |
+| Dependencies | ~5 crates | ~8 crates (via tiny_http) | 0 | ~50+ crates |
+| Binary size impact | ~50-80 KiB | ~80-120 KiB | 0 KiB | 1-3 MiB |
+| Compile time impact | ~1-2s | ~2-3s | 0s | 10-20s |
+| HTTP correctness | Full HTTP/1.1 | Full HTTP/1.1 | Minimal (sufficient) | Full HTTP/1.1 + HTTP/2 |
+| `#![forbid(unsafe_code)]` | No (internal unsafe) | No | Yes | No |
+| MSRV | 1.57 | Unknown | N/A (std only) | ~1.70+ |
+| Maintenance | Last release Oct 2022 | Uncertain | N/A (owned code) | Active |
+| License | MIT/Apache-2.0 | MIT/Apache-2.0 | N/A | MIT |
+| Shutdown coordination | `server.unblock()` | `server.unblock()` | `AtomicBool` flag | `tokio::sync::oneshot` |
+| Concurrent connections | Thread pool | Thread pool | Sequential (acceptable) | Async (unlimited) |
+
+### Recommendation: Hand-rolled `std::net::TcpListener`
+
+**For 2 endpoints serving <10 concurrent connections in a dev/ops context, a hand-rolled HTTP listener is the correct choice.**
+
+The arguments:
+
+1. **The "200 lines" test is decisive.** The entire metrics HTTP server -- binding, accept loop, request parsing, routing, response formatting, graceful shutdown -- fits in ~100-120 lines of safe Rust. No dependency justifies its existence here.
+
+2. **Zero dependency cost.** The `metrics` feature flag should add only tidalDB's own code, not a third-party HTTP server. An embedder who enables `metrics` should not be surprised by new transitive dependencies.
+
+3. **`#![forbid(unsafe_code)]` compatibility.** tiny_http uses unsafe internally. A hand-rolled solution stays within tidalDB's safety guarantees.
+
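+4. **Shutdown is trivial with an `AtomicBool`.** The background thread checks the shutdown flag (`shutdown.load(Ordering::Acquire)`, pairing with the `Release` store in `Drop`) on each accept iteration. Because `accept` blocks by default, the loop needs a way to wake up: either call `TcpListener::set_nonblocking(true)` and sleep for about 100ms whenever `accept` returns `WouldBlock`, or have the shutdown path connect to the listener's own address to unblock a pending `accept`. `std::net::TcpListener` does not expose an accept timeout, so a timeout-based variant would need a lower-level socket API.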
+
+5. **The "escape hatch" works both directions.** If m0p2 grows beyond 2 endpoints or needs TLS, migrating to tiny_http or axum is straightforward -- the endpoint handler functions remain the same, only the server harness changes.
+
+**API design:**
+
+```rust
+/// Start the metrics HTTP server on a background thread.
+///
+/// Returns a handle that stops the server when dropped.
+pub fn start_metrics_server(addr: std::net::SocketAddr, db: Arc) -> MetricsHandle;
+
+pub struct MetricsHandle {
+    shutdown: Arc<AtomicBool>,
+    thread: Option<std::thread::JoinHandle<()>>,
+}
+
+impl Drop for MetricsHandle {
+    fn drop(&mut self) {
+        // Signal the accept loop to stop, then wait for the thread to exit.
+        self.shutdown.store(true, Ordering::Release);
+        if let Some(handle) = self.thread.take() {
+            let _ = handle.join();
+        }
+    }
+}
+```
+
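+**Tokio compatibility:** Because `start_metrics_server(...)` only spawns a thread and returns, an embedder running Tokio can call it directly or wrap it in `tokio::task::spawn_blocking(|| start_metrics_server(...))`. Dropping the handle joins the background thread and therefore blocks briefly, so the drop is best done off the async executor. No tidalDB code needs to know about Tokio.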
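+
+The non-blocking accept loop with an `AtomicBool` shutdown flag could look roughly like the following. This is a sketch under assumptions, not the m0p2 implementation: it omits the `db` parameter, answers every request with a fixed body, and adds a hypothetical `local_addr()` accessor and `start_polling_server` name so that a test can bind to port 0.
+
+```rust
+use std::io::{self, BufRead, BufReader, Write};
+use std::net::{SocketAddr, TcpListener};
+use std::sync::atomic::{AtomicBool, Ordering};
+use std::sync::Arc;
+use std::thread::{self, JoinHandle};
+use std::time::Duration;
+
+pub struct MetricsHandle {
+    shutdown: Arc<AtomicBool>,
+    thread: Option<JoinHandle<()>>,
+    local_addr: SocketAddr,
+}
+
+impl MetricsHandle {
+    /// Address the listener is actually bound to (useful with port 0).
+    pub fn local_addr(&self) -> SocketAddr {
+        self.local_addr
+    }
+}
+
+impl Drop for MetricsHandle {
+    fn drop(&mut self) {
+        self.shutdown.store(true, Ordering::Release);
+        if let Some(handle) = self.thread.take() {
+            let _ = handle.join();
+        }
+    }
+}
+
+/// Hypothetical entry point: bind, then poll `accept` on a background thread.
+pub fn start_polling_server(addr: SocketAddr) -> io::Result<MetricsHandle> {
+    let listener = TcpListener::bind(addr)?;
+    // Non-blocking accept lets the loop re-check the shutdown flag regularly.
+    listener.set_nonblocking(true)?;
+    let local_addr = listener.local_addr()?;
+
+    let shutdown = Arc::new(AtomicBool::new(false));
+    let flag = Arc::clone(&shutdown);
+
+    let thread = thread::spawn(move || {
+        while !flag.load(Ordering::Acquire) {
+            match listener.accept() {
+                Ok((stream, _peer)) => {
+                    // Handle the connection in blocking mode with a read timeout,
+                    // so a stalled client cannot hang the loop indefinitely.
+                    let _ = stream.set_nonblocking(false);
+                    let _ = stream.set_read_timeout(Some(Duration::from_secs(1)));
+                    let mut reader = BufReader::new(stream);
+                    let mut request_line = String::new();
+                    if reader.read_line(&mut request_line).is_ok() {
+                        // Fixed placeholder response; real routing would go here.
+                        let _ = reader.get_mut().write_all(
+                            b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\nConnection: close\r\n\r\nok\n",
+                        );
+                    }
+                }
+                // `WouldBlock` (no pending connection) or a transient error:
+                // back off for the poll interval, then re-check the flag.
+                Err(_) => thread::sleep(Duration::from_millis(100)),
+            }
+        }
+    });
+
+    Ok(MetricsHandle { shutdown, thread: Some(thread), local_addr })
+}
+```
+
+With this shape, dropping the handle returns within roughly one poll interval plus any in-flight request. Connections are served sequentially, which matches the expected load of fewer than 10 connections.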
+
+---
+
+## Question 3: Prometheus Text Format
+
+### Format Specification
+
+The Prometheus text exposition format (version 0.0.4) is line-oriented, UTF-8 encoded, with `\n` line endings:
+
+```
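+# HELP <metric_name> <help text>
+# TYPE <metric_name> <type>
+<metric_name>{<label_name>="<label_value>",...} <value> [<timestamp>]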
+```
+
+Rules:
+- `# HELP` and `# TYPE` must appear before the first sample for a metric.
+- Only one `# HELP` and one `# TYPE` per metric name.
+- If `# TYPE` is omitted, metric defaults to `untyped`.
+- Label values must escape a backslash as `\\`, a double quote as `\"`, and a line feed as the two-character sequence `\n`.
+- Values are Go `ParseFloat` format: integers, floats, `NaN`, `+Inf`, `-Inf`.
+- Timestamp is optional (milliseconds since epoch). Prometheus will use scrape time if omitted.
+- Content-Type: `text/plain; version=0.0.4; charset=utf-8`.
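+
+The label-value escaping rule is small enough to sketch directly. The function name `escape_label_value` is hypothetical; the three replacements follow the rule listed above.
+
+```rust
+/// Escape a label value for the Prometheus text exposition format:
+/// backslash, double quote, and line feed each become a two-character escape.
+pub fn escape_label_value(value: &str) -> String {
+    let mut out = String::with_capacity(value.len());
+    for ch in value.chars() {
+        match ch {
+            '\\' => out.push_str("\\\\"),
+            '"' => out.push_str("\\\""),
+            '\n' => out.push_str("\\n"),
+            other => out.push(other),
+        }
+    }
+    out
+}
+```
+
+Values that contain none of these characters pass through unchanged, so applying the function to fixed labels such as `partition_id="0"` is harmless and protects against a future label whose value comes from user input.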
+
+### Example for tidalDB's metrics
+
+```
+# HELP tidaldb_uptime_seconds Seconds since the database was opened.
+# TYPE tidaldb_uptime_seconds gauge
+tidaldb_uptime_seconds{partition_id="0"} 3723.5
+
+# HELP tidaldb_wal_sequence Current WAL sequence number.
+# TYPE tidaldb_wal_sequence counter
+tidaldb_wal_sequence{partition_id="0"} 148293
+
+# HELP tidaldb_wal_queue_depth Number of WAL entries pending flush.
+# TYPE tidaldb_wal_queue_depth gauge
+tidaldb_wal_queue_depth{partition_id="0"} 12
+
+# HELP tidaldb_build_info Build metadata. Value is always 1.
+# TYPE tidaldb_build_info gauge
+tidaldb_build_info{version="0.1.0",build_hash="abc123",partition_id="0"} 1
+
+# HELP tidaldb_open_segments Number of open WAL segments.
+# TYPE tidaldb_open_segments gauge
+tidaldb_open_segments{partition_id="0"} 3
+```
+
+### Approaches Surveyed
+
+#### Approach 1: `prometheus` crate (tikv/rust-prometheus) 0.13.x
+
+**How it works:** Registry-based. Create `Counter`, `Gauge`, `Histogram` objects, register them with a `Registry`, call `TextEncoder::encode()` to produce the exposition format.
+
+**Used by:** TiKV, Linkerd, numerous Rust services. The de facto standard.
+
+**Evidence:**
+- Well-maintained (tikv organization). License: Apache-2.0.
+- Pulls in `protobuf` (for optional protobuf format), `lazy_static`, `parking_lot`, `memchr`.
+- Forces string allocations during metric collection (Collector trait limitation).
+- Binary size: ~100-200 KiB.
+- MSRV: 1.56.
+
+**Strengths:**
+- Battle-tested encoding. Guaranteed format correctness.
+- Histogram and summary support built-in.
+
+**Weaknesses:**
+- Significant dependency tree for 5 counters/gauges.
+- `protobuf` dependency is unnecessary for text-only exposition.
+- Allocation-heavy collector API (documented ~40% slower than prometheus-client).
+- Overkill: we need `writeln!` for 5 metrics, not a registry system.
+
+#### Approach 2: `prometheus-client` crate 0.22.x
+
+**How it works:** OpenMetrics-compatible. Type-safe labels via Rust type system (not string pairs). Visitor-based encoding (no allocations).
+
+**Used by:** Official Prometheus Rust client. Recommended for new projects.
+
+**Evidence:**
+- Prometheus organization maintained. License: Apache-2.0.
+- No unsafe code.
+- ~40% faster encoding than tikv/rust-prometheus due to visitor pattern.
+- Smaller dependency footprint than tikv version.
+
+**Strengths:**
+- Type-safe labels catch errors at compile time.
+- No allocation during encoding.
+- Official Prometheus project.
+
+**Weaknesses:**
+- Still a registry-based abstraction layer for 5 metrics.
+- Adds dependency tree that is not justified for the scope.
+
+#### Approach 3: Hand-written format
+
+**How it works:** Use `write!` / `writeln!` to a `String` or `Vec<u8>`, following the format spec directly. For 5 counters/gauges with static names and 1-2 labels, this is a function that reads metric values and formats them.
+
+**Evidence:**
+- The format is trivially simple for counters and gauges. The complete formatting logic for 5 metrics is ~30-40 lines.
+- No histograms or summaries needed at m0p2 scope.
+- Validation: the output must match `# HELP`, `# TYPE`, then metric lines. A unit test can assert the format parses correctly (or simply check line structure).
+
+**Strengths:**
+- Zero dependencies.
+- Complete control over output format.
+- Trivially auditable -- the format spec is 1 page.
+- No registry overhead, no trait objects, no allocations beyond the output buffer.
+
+**Weaknesses:**
+- Must follow the spec precisely. If a label value contains `"` or `\n`, it must be escaped. For tidalDB's labels (`partition_id="0"`, `version="0.1.0"`), these are compile-time string literals -- no escaping needed.
+- If tidalDB grows to 50+ metrics with histograms, a library becomes justified. But at 5-10 counters/gauges, it is not.
+
+### Comparison
+
+| Criterion | prometheus (tikv) | prometheus-client | Hand-written |
+|---|---|---|---|
+| Dependencies | ~8 (incl. protobuf) | ~3 | 0 |
+| Binary size | ~100-200 KiB | ~50-100 KiB | 0 KiB |
+| Histogram support | Yes | Yes | No (not needed) |
+| Allocation during encode | Yes (Collector trait) | No (visitor pattern) | No (write! to buffer) |
+| Format correctness | Guaranteed | Guaranteed | Unit-tested |
+| Lines of code (user-side) | ~30 (register + encode) | ~30 (register + encode) | ~40 (format directly) |
+| `#![forbid(unsafe_code)]` | Unknown | Yes | Yes |
+
+### Recommendation: Hand-written Prometheus text format
+
+For 5-10 counters and gauges with known-safe label values, hand-writing the exposition format is the clear choice. The implementation is approximately 40 lines:
+
+```rust
+use std::fmt::Write;
+
+/// Render all metrics in the Prometheus text exposition format.
+pub fn render_prometheus_metrics(metrics: &MetricsSnapshot) -> String {
+    let mut out = String::with_capacity(1024);
+
+    write_gauge(&mut out, "tidaldb_uptime_seconds",
+        "Seconds since the database was opened",
+        &[("partition_id", "0")], metrics.uptime_secs);
+
+    write_counter(&mut out, "tidaldb_wal_sequence",
+        "Current WAL sequence number",
+        &[("partition_id", "0")], metrics.wal_sequence);
+
+    // ... more metrics
+    out
+}
+
+/// Write `# HELP`, `# TYPE ... gauge`, and one sample line.
+fn write_gauge(out: &mut String, name: &str, help: &str,
+               labels: &[(&str, &str)], value: f64) {
+    // Writing to a `String` cannot fail, so the `fmt::Result` is discarded.
+    let _ = writeln!(out, "# HELP {name} {help}");
+    let _ = writeln!(out, "# TYPE {name} gauge");
+    write_sample(out, name, labels, value);
+}
+
+/// Write `# HELP`, `# TYPE ... counter`, and one sample line.
+fn write_counter(out: &mut String, name: &str, help: &str,
+                 labels: &[(&str, &str)], value: f64) {
+    let _ = writeln!(out, "# HELP {name} {help}");
+    let _ = writeln!(out, "# TYPE {name} counter");
+    write_sample(out, name, labels, value);
+}
+
+/// Write `name{k="v",...} value`. Label values are written as-is, so callers
+/// must pass values that need no escaping.
+fn write_sample(out: &mut String, name: &str,
+                labels: &[(&str, &str)], value: f64) {
+    let _ = write!(out, "{name}{{");
+    for (i, (k, v)) in labels.iter().enumerate() {
+        if i > 0 { let _ = write!(out, ","); }
+        let _ = write!(out, "{k}=\"{v}\"");
+    }
+    let _ = writeln!(out, "}} {value}");
+}
+```
+
+**When to migrate:** If tidalDB needs histograms (e.g., query latency distributions) or 50+ metrics, adopt `prometheus-client` (the official Prometheus crate, not tikv's). Pin to `prometheus-client = "0.22"`. But that is a post-m0p2 decision.
+
+---
+
+## Question 4: Serde for Config Serialization
+
+### Current State
+
+`Config` is a 4-field struct (`mode: StorageMode`, `data_dir: Option`, `wal_dir: Option`, `cache_dir: Option`). It currently has no serialization support. The CLI needs to read a serialized config snapshot from disk.
+
+### Approaches Surveyed
+
+#### Approach 1: `serde` + `serde_json` (feature-gated on library crate)
+
+**How it works:** Add `#[derive(Serialize, Deserialize)]` to `Config` and `StorageMode` behind a `serde` feature flag. The CLI binary depends on the library with the `serde` feature enabled. `serde_json` handles the JSON encoding.
+
+**Evidence:**
+- `serde` (1.0.228) and `serde_json` (1.0.149) are already in `Cargo.lock` via criterion.
+- CODING_GUIDELINES.md line 296 explicitly approves serde/serde_json: "serialization (at API boundaries only, not in hot paths)."
+- Best practice from Rust API Guidelines and community consensus: library crates should feature-gate serde behind an optional `serde` feature.
+- Binary size: serde_json adds ~70-100 KiB to release binaries. serde_derive's proc-macro adds ~5-10s to initial compile, but is already compiled for criterion.
+- fjall (tidalDB's storage engine) does not use serde -- adding it to tidalDB does not create a circular dependency or conflict.
+
+**Strengths:**
+- Industry standard. Every Rust developer knows serde.
+- Already approved in CODING_GUIDELINES.md.
+- Already compiled in dev builds (via criterion).
+- Feature-gated: embedders who do not need serialization pay zero cost.
+- Config is at an API boundary (CLI reads library's config), exactly where serde belongs.
+
+**Weaknesses:**
+- serde_derive adds proc-macro compile time. Mitigated by: already compiled for criterion.
+- Monomorphization can bloat binary. Mitigated by: Config is a small struct with 4 fields; the generated code is minimal.
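+
+Feature-gating the derives might look like the sketch below. It assumes an optional dependency declared as `serde = { version = "1", optional = true, features = ["derive"] }` and exposed through a `serde` feature. The `StorageMode` variant names and the `PathBuf` payload of the `Option` fields are placeholders, since neither is spelled out here.
+
+```rust
+use std::path::PathBuf;
+
+// With the `serde` feature off, `cfg_attr` expands to nothing, so the crate
+// compiles without serde in the dependency graph.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
+pub enum StorageMode {
+    // Hypothetical variant names for illustration only.
+    Persistent,
+    Ephemeral,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq)]
+#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
+pub struct Config {
+    pub mode: StorageMode,
+    // `PathBuf` is an assumption about the elided type parameter.
+    pub data_dir: Option<PathBuf>,
+    pub wal_dir: Option<PathBuf>,
+    pub cache_dir: Option<PathBuf>,
+}
+```
+
+The CLI crate would then depend on the library with `features = ["serde"]` and use `serde_json` itself, while embedders who leave the feature off see no serde types in the public API.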
+
+#### Approach 2: `miniserde`
+
+**How it works:** Lightweight alternative to serde that uses trait objects instead of monomorphization. ~12x less code than serde + serde_derive + serde_json combined.
+
+**Evidence:**
+- JSON-only. No format plugins.
+- No error messages on deserialization failure.
+- Does not support enums with data (only C-style enums). `StorageMode` is C-style, so this works.
+- Does not support `#[serde(rename)]` or most serde attributes.
+- Limited type support (no tuple structs, no enums with variant data).
+
+**Strengths:**
+- Smaller binary size than serde.
+- Faster compile time (no proc-macro overhead comparable to serde_derive).
+
+**Weaknesses:**
+- serde is already compiled in the workspace. miniserde adds a *new* dependency tree rather than reusing what exists.
+- No error messages -- if the CLI reads a corrupt config file, it gets `None` with no indication of what went wrong.
+- Would become a migration tax later when tidalDB needs serde for other types (e.g., schema definitions, ranking profiles).
+
+#### Approach 3: Hand-written JSON serialization
+
+**How it works:** Implement `Display` for `Config` that writes JSON manually, and a `from_json_str` function that parses it. For a 4-field struct, this is ~50-80 lines.
+
+**Evidence:**
+- Zero dependencies.
+- But: manual JSON parsing is error-prone. Escaping, nested objects, null handling, and whitespace tolerance all need implementation.
+- tidalDB will need JSON serialization in multiple places beyond Config (API responses, query results, schema export). Implementing a JSON parser from scratch to avoid an already-approved dependency is false economy.
+
+**Strengths:**
+- Zero dependency cost.
+
+**Weaknesses:**
+- The ~50-80 line estimate covers only the happy path. Done correctly, JSON parsing is not even a 200-line problem: escaping, unicode, nested structures, and error reporting are exactly what serde_json solves.
+- Creates maintenance burden that serde eliminates.
+- CODING_GUIDELINES.md already approved serde for this exact use case.
+
+### Comparison
+
+| Criterion | serde + serde_json | miniserde | Hand-written |
+|---|---|---|---|
+| Already in Cargo.lock | Yes (via criterion) | No | N/A |
+| Approved in CODING_GUIDELINES | Yes (explicitly) | No | N/A |
+| Error messages on parse failure | Yes (detailed) | None | Custom |
+| Enum support | Full | C-style only | Custom |
+| Future reuse in tidalDB | High (schema, API, query results) | Low | Low |
+| Binary size overhead | ~70-100 KiB | ~30-50 KiB | 0 KiB |
+| Compile time overhead | 0s (already compiled) | New compilation | 0s |
+| Correctness risk | None (battle-tested) | Low | Medium (hand-rolled parser) |
+
+### Recommendation: `serde` + `serde_json`, feature-gated
+
+**This is the one dependency question where the answer is unambiguously "use the library."**
+
+1. **Already approved.** CODING_GUIDELINES.md says: "serde / serde_json -- serialization (at API boundaries only, not in hot paths)." Config serialization for CLI communication is the textbook API boundary use case.
+
+2. **Already compiled.** Both crates are in Cargo.lock via criterion. Adding them as optional dependencies of the main crate adds zero compile time for developers who are already running tests and benchmarks.
+
+3. **Future-proof.** tidalDB will need JSON serialization for: config export, schema definitions, query result formatting, API responses, ranking profile serialization. Every one of these will use serde. Starting with Config establishes the pattern.
+
+4. **Feature-gate it.** The library crate adds:
+
+```toml
+[dependencies]
+serde = { version = "1", features = ["derive"], optional = true }
+serde_json = { version = "1", optional = true }
+
+[features]
+serde = ["dep:serde", "dep:serde_json"]
+```
+
+And on the struct:
+
+```rust
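+use std::path::PathBuf;
+
+#[derive(Debug, Clone)]
+#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
+pub struct Config {
+    pub mode: StorageMode,
+    // Optional directory overrides; `PathBuf` is the assumed element type.
+    pub data_dir: Option<PathBuf>,
+    pub wal_dir: Option<PathBuf>,
+    pub cache_dir: Option<PathBuf>,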
+}
+```
+
+Embedders who do not need serialization pay nothing. The `tidalctl` binary crate depends on `tidaldb = { path = "../tidal", features = ["serde"] }`.
+
+---
+
+## Open Questions
+
+1. **Config file format and location.** m0p2 task-01 says the CLI reads a "Config dump." Where does the running database write this? Likely `{data_dir}/config.json` written atomically during `TidalDb::open()`. The exact path should be a `Paths` method (e.g., `paths.config_file()`). This is an implementation decision for the engineer, not a research question.
+
+2. **Metrics collection mechanism.** The hand-rolled metrics HTTP server needs to read metrics from the database. What is the interface? Options: (a) `TidalDb` exposes a `pub fn metrics_snapshot(&self) -> MetricsSnapshot` method; (b) a shared `Arc` counter registry. Option (a) is simpler and keeps the metrics code behind the public API. The engineer should decide based on what metrics are available at m0p2 (uptime and build info are trivial; WAL sequence requires WAL to be wired up).
+
+3. **Graceful shutdown of the HTTP listener.** `std::net::TcpListener::accept()` blocks. There are three ways to unblock it for shutdown: (a) `set_nonblocking(true)` with a polling loop (simple, slight CPU waste); (b) connect-to-self to unblock accept (clever, no CPU waste); (c) use `SO_REUSEADDR` + `shutdown` on a cloned socket. Option (a) with a 200ms sleep between polls is the simplest and is sufficient for a diagnostics endpoint. Benchmark the CPU overhead if concerned; it should be negligible at a 200ms poll interval.
+
+4. **When to add `clap`.** If `tidalctl` grows beyond 5 subcommands or needs dynamic completions, switch to `clap`. The migration from manual to clap is a single-commit refactor: define a derive struct matching the existing `match` arms. Document this as the escape hatch in the `tidalctl` crate README.
+
+5. **When to add `prometheus-client`.** If tidalDB needs histograms (query latency distributions, signal write latency distributions) or exceeds 20 metrics, adopt `prometheus-client = "0.22"`. The hand-written format functions become a `MetricFamily` registration. Document the threshold.
+
+6. **Integration testing the HTTP endpoint.** The test should `start_metrics_server` on an ephemeral port, `GET /metrics` with `std::net::TcpStream`, and assert the response contains expected metric lines. This is straightforward with the hand-rolled approach and does not require an HTTP client library -- raw TCP + string matching is sufficient.
+
+---
+
+## Summary of Recommendations
+
+| Component | Recommendation | Justification |
+|---|---|---|
+| CLI argument parsing | Manual `std::env::args()` | 2 subcommands, 60 lines. "200 lines" test passes. Upgrade path to pico-args/clap exists. |
+| HTTP metrics server | Hand-rolled `std::net::TcpListener` | 2 endpoints, <10 connections. ~100 lines of safe Rust. Zero dependencies. |
+| Prometheus text format | Hand-written `write!` formatting | 5-10 counters/gauges. ~40 lines. Format spec is trivial for this scope. |
+| Config serialization | `serde` + `serde_json`, feature-gated | Already approved, already compiled, future-proof. Feature-gate as `serde`. |
+
+**Total new dependencies for m0p2:** One optional dependency pair (`serde` + `serde_json`) that is already in Cargo.lock and already approved. Everything else is standard library code.
+
+**Estimated code footprint for m0p2 tooling:**
+- `tidalctl` binary: ~150-200 lines (arg parsing + config reading + JSON output)
+- Metrics HTTP server: ~100-120 lines (listener + routing + response)
+- Prometheus formatter: ~40-50 lines (metric rendering)
+- Config serde derives: ~5 lines (derive attributes + feature gate)
+
+---
+
+## Sources
+
+### CLI Argument Parsing
+- [Rain's Rust CLI Recommendations: Picking an Argument Parser](https://rust-cli-recommendations.sunshowers.io/cli-parser.html)
+- [argparse-rosetta-rs: Benchmark data for Rust argument parsers](https://github.com/rosetta-rs/argparse-rosetta-rs) -- compile time, binary size, parse time comparisons
+- [pico-args: Ultra simple CLI arguments parser](https://github.com/RazrFalcon/pico-args) -- 666 lines, zero deps, `#![forbid(unsafe_code)]`
+- [lexopt: Minimalist pedantic command line parser](https://github.com/blyxxyz/lexopt) -- MSRV 1.31, zero deps
+- [clap: Full featured CLI parser](https://docs.rs/clap/latest/clap/) -- MSRV 1.74, derive API
+- [argh: Google's derive-based parser](https://github.com/google/argh) -- BSD-3-Clause, Fuchsia conventions
+- [Rust CLI argument parsing libraries comparison (jpab.uk)](https://www.jpab.uk/blog/review-rust-cli-flag-parsers/)
+
+### HTTP Servers
+- [tiny-http: Low level HTTP server library in Rust](https://github.com/tiny-http/tiny-http) -- v0.12.0, MSRV 1.57, 1.1K stars
+- [Rust Book: Building a Multithreaded Web Server](https://doc.rust-lang.org/book/ch21-02-multithreaded.html) -- std::net::TcpListener pattern
+- [rouille: Synchronous micro-framework on crates.io](https://crates.io/crates/rouille)
+- [Is there any popular synchronous HTTP crate? (Rust Forum)](https://users.rust-lang.org/t/is-there-any-popular-synchronous-http-crate/108111)
+
+### Prometheus Text Format
+- [Prometheus Exposition Formats (official specification)](https://prometheus.io/docs/instrumenting/exposition_formats/) -- format version 0.0.4
+- [tikv/rust-prometheus: Instrumentation library](https://github.com/tikv/rust-prometheus) -- Collector trait, string allocation issue
+- [prometheus/client_rust: Official Prometheus Rust client](https://github.com/prometheus/client_rust) -- visitor pattern, no unsafe, ~40% faster encoding
+- [OpenMetrics specification](https://prometheus.io/docs/specs/om/open_metrics_spec/)
+
+### Serialization
+- [Serde: Serialization framework for Rust](https://serde.rs/) -- feature flags documentation
+- [Serde use within a library -- best practices (Rust Forum)](https://users.rust-lang.org/t/serde-use-within-a-library-best-practices/111059) -- feature-gating consensus
+- [miniserde: Data structure serialization library](https://docs.rs/miniserde) -- 12x less code than serde, JSON-only, limited type support
+- [Rust serialization benchmarks](https://github.com/djkoloski/rust_serialization_benchmark)
+- [Rust API Guidelines: Feature flag naming for serde](https://github.com/rust-lang/api-guidelines/discussions/180)
diff --git a/site/content/blog/every-platform-builds-the-same-6-systems.mdx b/site/content/blog/every-platform-builds-the-same-6-systems.mdx
new file mode 100644
index 0000000..2f30204
--- /dev/null
+++ b/site/content/blog/every-platform-builds-the-same-6-systems.mdx
@@ -0,0 +1,145 @@
+---
+title: "Every content platform builds the same 6 systems from scratch"
+date: "2026-02-20"
+author: "tidalDB"
+description: "The Elasticsearch + Redis + Kafka + feature store + vector DB + ranking service stack is not an architecture. It is scar tissue. Here is why."
+tags: ["architecture", "vision", "recommendation-systems"]
+---
+
+You have operated this system. You may be operating it right now.
+
+Elasticsearch for retrieval. Redis for hot signals. Kafka for event ingestion. A feature store for user profiles. A vector database for semantic similarity. A ranking service that stitches all five together into a sorted list and hopes the data is consistent by the time it arrives.
+
+Six systems. Six deployment targets. Six failure modes. Six sets of credentials, backup strategies, scaling characteristics, and on-call runbooks. All of them maintained by your team, all of them in service of one question: *given this user, right now, what should they see?*
+
+This post is about why the stack exists, why it persists, and what should be true instead.
+
+## The six systems, named
+
+They show up in the same order at every company. The specifics vary -- Solr instead of Elasticsearch, Memcached instead of Redis, Pulsar instead of Kafka -- but the shape is identical.
+
+**System 1: The search index.** Elasticsearch, Solr, or Typesense. Ingests your content catalog, tokenizes text, builds an inverted index, returns results ranked by BM25. It handles keyword search well. It handles everything else poorly. You will spend months teaching it to sort by "trending" using a score field you update from outside, on a schedule, that is stale before the update finishes.
+
+**System 2: The cache layer.** Redis or Memcached. Holds the hot data that the search index cannot serve fast enough -- trending scores, view counts, precomputed ranking features. You will write a cache invalidation layer. It will have bugs. The bugs will manifest as users seeing content that should have been suppressed, or not seeing content that should have surfaced. These bugs will be intermittent, hard to reproduce, and never fully resolved.
+
+**System 3: The event bus.** Kafka, Pulsar, or Kinesis. Ingests engagement signals -- views, likes, skips, shares -- and routes them to consumers that update every other system. The consumers will lag. Not always. Not predictably. But at 2am on a Saturday when a piece of content goes viral, the lag between "user liked this" and "the ranking query reflects it" will stretch from milliseconds to seconds to minutes. Your users will notice.
+
+**System 4: The feature store.** Feast, Tecton, or a homegrown Redis-backed key-value store. Holds user profiles, engagement histories, computed features. Exists because the ranking service needs user context at query time and cannot afford to compute it on the fly. The feature store introduces its own consistency problem: the features it serves are snapshots. By the time they reach the ranker, the user may have liked three more items and blocked a creator. The features do not know this.
+
+**System 5: The vector database.** Pinecone, Weaviate, Qdrant, Milvus, or pgvector bolted onto PostgreSQL. Holds content embeddings for semantic similarity search. Takes a user preference vector or a query embedding, returns the nearest neighbors. The problem: it knows nothing about signals, recency, relationships, or diversity. It returns semantically similar content. Whether that content is trending, stale, hidden by the user, or from a blocked creator -- not its concern.
+
+**System 6: The ranking service.** The application you wrote. A microservice that calls systems 1 through 5 in sequence, merges their outputs, applies scoring logic, enforces diversity rules, handles edge cases, and returns a sorted list. This is the system that has the most bugs, the most latency, and the most institutional knowledge locked in the heads of two engineers who are not allowed to go on vacation at the same time.
+
+Six systems. None of them were built for the ranking problem. All of them are pressed into service because there is no single system that was.
+
+## Where correctness dies
+
+The failure modes are not in the systems themselves. Redis is fast. Kafka is durable. Elasticsearch is a competent search engine. The failure modes live in the seams between them.
+
+**Stale signals.** A user likes an item. The event enters Kafka. A consumer processes it and updates Redis. Another consumer updates the feature store. A third updates Elasticsearch's score field. Each update happens at a different time. Between the first update and the last, the ranking service is reading a mix of old and new state. The feed the user sees is computed from data that contradicts itself.
+
+This is not a theoretical concern. It is Tuesday.
+
+**Cache invalidation.** The trending score in Redis says an item is hot. The engagement data in the feature store says it is not -- the initial burst of views came from a bot network and the quality signals collapsed an hour ago. The cache TTL has not expired. The item remains in the trending feed for another 14 minutes. Fourteen minutes is an eternity in a content platform. Thousands of users see a recommendation the system already knows is wrong.
+
+**ETL lag.** The feature store runs a batch pipeline every 15 minutes to recompute user preference vectors. A user blocks a creator at minute 1. For the next 14 minutes, the blocked creator's content still appears in the user's feed. Not because the system is broken. Because the architecture is designed around batched state synchronization, and batched state synchronization is, by definition, eventually wrong.
+
+**The feedback gap.** A user skips three items in a row from the same creator. The skip events enter Kafka. They will eventually update the user's preference vector in the feature store and the creator's penalty score in Redis. Eventually. In the meantime, the ranking service is still using the stale preference vector and the stale creator score. It recommends a fourth item from the same creator. The user taps "Not interested." A fifth item appears. The user closes the app.
+
+This is not a bug in any one system. It is the architecture working exactly as designed. The architecture is the bug.
+
+**Agents make the seams worse.** When you add an LLM-mediated agent to the loop, the agent needs to ground its answer in fresh memory and emit feedback (preference hints, critiques, reward). In the 6-system stack those feedback signals live in a scratchpad or a sidecar vector store. None of the six systems know about them, which means the agent is reasoning over a different world than the ranking service. Latency compounds; correctness dies even faster.
+
+## How we got here
+
+The 6-system stack is not the product of deliberate design. It is an accretion. Understanding how it forms explains why it persists.
+
+**Phase 1: Search.** The platform launches with a content catalog and a search bar. The team picks Elasticsearch because it handles full-text search. This is a reasonable decision. Elasticsearch is good at search.
+
+**Phase 2: Ranking.** Users want more than search. They want a feed -- personalized, sorted by relevance, refreshed on every visit. Elasticsearch can sort by a score field, so the team adds a `ranking_score` field and updates it with a cron job. The cron job reads engagement data from the application database, computes a formula, and writes the result to Elasticsearch. This works for six months.
+
+**Phase 3: Speed.** The ranking formula needs real-time signal data -- view counts, like counts, trending velocity. The application database cannot serve these at the read frequency the ranking service demands. The team adds Redis as a hot cache. Now the ranking formula reads from Redis instead of the application database. Engagement data flows into Redis via application writes. This works, but cache invalidation becomes a recurring source of bugs.
+
+**Phase 4: Scale.** The platform grows. Engagement events arrive at thousands per second. Writing directly to Redis and Elasticsearch from the application path introduces latency on every user action. The team adds Kafka as a buffer. Events flow into Kafka, and consumers asynchronously update Redis and Elasticsearch (and, later, every system added after them). The system is now eventually consistent. "Eventually" is doing a lot of work in that sentence.
+
+**Phase 5: Personalization.** Users want personalized results, not just globally popular content. Personalization requires per-user feature vectors -- engagement history, topic affinity, creator preferences. These features are too expensive to compute at query time. The team adds a feature store that batch-computes user vectors and serves them as key-value lookups. The feature store is always stale by the duration of its batch interval.
+
+**Phase 6: Semantic search.** Users expect "find me something like this" to work. Keyword matching cannot do this. The team adds a vector database for embedding-based similarity search. The vector database knows nothing about engagement, recency, or user context. The ranking service must call it separately and merge its results with the keyword results, the cached signals from Redis, and the user features from the feature store.
+
+Each step is individually rational. The result is collectively irrational. A distributed system with six sources of truth, six consistency models, and one ranking service trying to produce a coherent answer from all of them.
+
+## The root cause
+
+The stack exists because existing databases were not built with ranking in mind. This is not a criticism -- PostgreSQL, Elasticsearch, and Redis were built to solve different problems, and they solve them well. But when you ask a search engine to be a ranking engine, you inherit the wrong abstraction.
+
+A search engine models data as documents with fields. You search for documents matching a query. You sort by a field. The field is a static value that you update from outside.
+
+But ranking is not a static value. A "trending score" is a velocity -- the rate of change of engagement signals over a time window. It changes every second. An "engagement decay score" is a function of time since the last signal event. It changes continuously, without any new data arriving. A "personalized relevance score" is a function of the user's preference vector, the item's embedding, the user's relationship to the creator, the item's signal history, and the diversity of the current result set. It is different for every user, every query, every moment.
+
+None of these are fields. They are computations that depend on temporal state, user context, and signal dynamics. Forcing them into a field-update model is what creates the 6-system stack. You need Redis because the search engine cannot compute these values fast enough. You need Kafka because updating them synchronously is too slow. You need a feature store because user context is too expensive to derive at query time. You need a vector database because semantic similarity is a different index structure entirely.
+
+The seams are not incidental. They are structural. They exist because the foundational abstraction -- data as documents with static fields -- does not fit the problem.
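+
+To make "velocity" and "decay" concrete, here is one common way to write them down. This is a sketch under assumptions rather than a definition taken from any of the systems named above: `N(t)` is the cumulative event count at time `t`, `W` is the window length, `s_0` is the score at the last event time `t_0`, and `h` is a half-life. Exponential decay is itself an assumed shape.
+
+```latex
+v_W(t) = \frac{N(t) - N(t - W)}{W}
+\qquad
+s(t) = s_0 \cdot 2^{-(t - t_0)/h}
+```
+
+An item with 1,200 views in the last hour has a one-hour velocity of 20 views per minute. With a 7-day half-life, a score of 8 reads as 4 a week after the last event, even though no new data arrived. Neither value can be stored once and sorted on, because both change with `t`.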
+
+## What should be true
+
+A database that understands ranking as a primitive would not need the stack. Here is what it would look like.
+
+**Signals are a schema-level type.** A "view" signal is not a counter you increment in Redis and hope stays consistent. It is a typed, timestamped event stream declared in the database schema, with a decay rate, a set of time windows, and velocity computation -- all maintained by the database. You write the event. The database handles aggregation, windowing, and decay. When you query for "trending," the database reads signal velocity directly. No external cache. No stale scores.
+
+**User context is a database-managed state.** The user's preference vector is not a row in a feature store updated every 15 minutes. It is a living embedding that the database shifts every time the user engages with content. A like shifts it toward the item's embedding. A skip shifts it away. The next query reflects this. Not in 15 minutes. Now.
+
+**The write path and the read path are one system.** When a user likes an item, the database atomically updates the item's signal ledger, the user's preference vector, and the user-to-creator relationship weight. No event bus between the engagement and the ranking update. No consumer lag. No eventual consistency. The write *is* the ranking update.
+
+**Negative signals are equal citizens.** A skip is not the absence of a like. It is data. A hide is a permanent exclusion. A block removes all of a creator's content from all future queries. These are not afterthought filter operations applied in the ranking service. They are first-class signal types with their own decay rates, their own velocity, and their own weight in the scoring function.
+
+**Diversity is a query constraint.** "No more than 2 items per creator" is not a post-processing step in your API layer. It is a parameter the database enforces after scoring, as part of the query execution pipeline. The application specifies the constraint. The database enforces it. The result set is reordered, not reduced.
+
+**All sort modes are native.** Trending, hot, rising, controversial, hidden gems, top-this-week, shuffle -- these are not formulas your application computes and passes to the database as a sort key. They are built-in sort modes the database executes natively, using signal velocity, windowed aggregation, and decay functions it already maintains.
+
+This is not a fantasy. Every one of these properties follows from a single architectural decision: model signals, decay, velocity, and user context as database primitives, not as application logic distributed across six systems.
+
+## One question, one query
+
+The 6-system stack exists to answer one question: given this user, right now, what should they see?
+
+That question should be one query.
+
+Not six network calls. Not a ranking service that merges five data sources and hopes they agree. Not a system where "consistency" means "consistent within each subsystem, inconsistent across all of them."
+
+One query that retrieves candidates, applies filters, scores using live signals and user context, enforces diversity, and returns a ranked list. One query where the data is never stale because the write path and the read path share a storage model. One query where a signal written 100 milliseconds ago is reflected in the result.
+
+```
+RETRIEVE items
+FOR USER @user_id
+CONTEXT feed
+USING PROFILE for_you
+FILTER unseen, unblocked, format:video, duration:short
+DIVERSITY max_per_creator:2, format_mix:true
+LIMIT 50
+```
+
+That is what six systems currently produce. It should be one query that an agent can issue, write its feedback against, and trust to be correct on the next round.
+
+The database that treats ranking as a primitive -- not as an afterthought bolted on top of a search engine, not as a formula computed in a microservice, not as a cache warmed from a batch pipeline -- does not need the stack. It replaces it.
+
+## A fair read of the existing systems
+
+To be clear: these systems are good at what they were designed to do.
+
+- **Search indexes (Elasticsearch, Solr, Typesense):** excellent full-text retrieval, BM25 relevance, and query/filter infrastructure.
+- **Caches (Redis, Memcached):** excellent low-latency read/write paths for hot counters and precomputed features.
+- **Event buses (Kafka, Pulsar, Kinesis):** excellent durable, high-throughput event transport and decoupled consumer architectures.
+- **Feature stores (Feast, Tecton, homegrown):** excellent offline/online feature serving patterns for ML pipelines.
+- **Vector databases (Pinecone, Weaviate, Qdrant, Milvus, pgvector):** excellent nearest-neighbor retrieval over embeddings with metadata filtering.
+- **Ranking services (custom microservices):** excellent place to encode product-specific ranking logic when no single system owns the full problem.
+- **Integrated retrieval/ranking platforms (for example, Vespa):** excellent end-to-end search and ranking infrastructure when teams can operate larger specialized serving systems.
+
+**What makes tidalDB different (one line):** it treats signals, user context, ranking, diversity, and feedback writes as one atomic database system instead of six synchronized subsystems.
+
+**Where we are intentionally focused:** personalized content loops where feedback intent is explicit -- `skip_for_now` (soft), `not_for_me` (preference), `low_quality` (quality), `hide/mute` (hard exclude) -- and the next ranked result reflects that feedback immediately. Generic search infrastructure breadth is not the goal.
+
+Every content platform builds the same 6 systems because no database was built for this problem. The stack is not an architecture. It is scar tissue from the absence of one.
+
+---
+
+*tidalDB is an open-source, embeddable Rust database for personalized content ranking. Follow the build on [GitHub](https://github.com/orchard9/tidalDB).*
diff --git a/site/content/blog/why-tidaldb.mdx b/site/content/blog/why-tidaldb.mdx
index aee695f..47a323a 100644
--- a/site/content/blog/why-tidaldb.mdx
+++ b/site/content/blog/why-tidaldb.mdx
@@ -2,46 +2,53 @@
title: "Why we're building tidalDB"
date: "2026-02-20"
author: "Jordan Washburn"
-description: "Every content platform builds the same 6-system stack from scratch. We're replacing it with one database."
+description: "tidalDB is a single-process Rust database for personalized content ranking. Here is what it does and how it works."
tags: ["vision", "architecture"]
---
-Every platform that serves personalized content — a media library, a social feed, a marketplace, a content discovery surface — eventually builds the same distributed system from scratch.
+tidalDB is a database that answers one question: given this user, right now, what should they see?
-Elasticsearch for retrieval. Redis for hot signals. Kafka for event ingestion. A feature store for user profiles. A vector database for semantic search. A ranking service that tries to stitch all of the above together into a single ordered list.
+Agents now sit between the user and many surfaces, so session memory still matters. But the core focus is personalized content ranking. tidalDB is not trying to out-feature every search platform. It is a database where signals, decay, negative feedback, and diversity are schema-level primitives — and ranking updates immediately after user actions.
-We've built this stack. We've operated it. We've watched the seams between systems become the place where correctness dies — stale signals in Redis that don't match Elasticsearch, Kafka consumers that lag by seconds when they should lag by zero, cache invalidation bugs that surface as "why did the user see that item again?"
+I wrote separately about [why every content platform ends up operating six systems](/blog/every-platform-builds-the-same-6-systems) to answer that question. This post is about what we are building instead.
-The root cause is clear: none of these systems were built for the ranking problem. They treat it as an afterthought. A sort clause. A float field. A bolt-on scoring function.
+## The primitives
-## The observation
+tidalDB has five core concepts. Everything else follows from them.
-Ranking is not a feature. It is a primitive.
+**Entities** are Items, Users, and Creators. Each carries metadata, an embedding slot, and a signal ledger. You define them in schema with typed fields — text fields are full-text indexed, keyword fields are filterable, embeddings are ANN-indexed. The database owns the indexes.
-A signal that decays over time is not a field you update with a cron job. It is a type the database understands — with a half-life declared in schema and a decayed value computed at query time.
+**Signals** are typed, timestamped event streams with decay and velocity built in. You declare a signal type once:
-A "trending" sort is not a formula your application computes and stores in a column. It is a built-in sort mode that reads signal velocity natively.
+```rust
+db.define_signal(SignalDef {
+ name: "view",
+ target: EntityKind::Item,
+ decay: Decay::Exponential { half_life: Duration::days(7) },
+ windows: vec![
+ Window::hours(1),
+ Window::hours(24),
+ Window::days(7),
+ Window::all_time(),
+ ],
+ velocity: true,
+})?;
+```
-A diversity constraint — "no more than 2 items from the same creator" — is not post-processing logic in your API layer. It is a query parameter the database enforces after scoring.
+That declaration tells the database everything it needs. When a view event arrives, the database maintains windowed counts, computes velocity, and applies exponential decay — all at write time, all O(1). You never compute `trending_score = views / (age_hours + 2)^1.8` in application code. You never update a stale float field on a cron schedule. The database does this natively, and it does it correctly.
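+
+One way to see why write-time decay can be O(1): keep a single running score per signal and rescale it on each event. The recurrence below is a sketch under assumptions, not a confirmed description of tidalDB internals. `S` is the stored score, `t_last` the timestamp of the previous event, `t` the timestamp of the incoming event, `w` its weight, and `h` the declared half-life.
+
+```latex
+S_{\text{new}} = S_{\text{old}} \cdot 2^{-(t - t_{\text{last}})/h} + w
+\qquad
+S(t_{\text{query}}) = S_{\text{new}} \cdot 2^{-(t_{\text{query}} - t)/h}
+```
+
+With `h` = 7 days, a stored score of 8 that receives no events for 14 days decays to `8 * 2^(-2) = 2`, and a unit-weight event arriving at that moment stores `2 + 1 = 3`. Each write touches one number and one timestamp, no matter how many events came before.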
-Once you see it this way, the 6-system stack looks like what it is: scar tissue from forcing the wrong abstraction.
+Negative signals — skips, hides, blocks — are the same type. A skip is not the absence of a like. It is data with its own decay rate and its own weight in the scoring function.
-## What tidalDB is
+**Ranking Profiles** are named, versioned scoring functions declared in schema. They reference signals, relationship weights, recency curves, and diversity rules. You swap profiles at query time by name — no redeploy, no recompile. This is how you A/B test ranking: two profiles, one query parameter.
-A single-node-first, embeddable Rust database designed specifically for personalized content ranking. One process. One query interface. One operational model.
+**Sessions** capture agent context. A session binds a user, an agent identity, and a short-lived memory lane. Agents append structured signals (preference hints, reward scores, tool metadata) that decay aggressively. Policies live in schema: what an agent can read, how often it may write, and how long data persists.
-The core primitives:
-
-- **Entities** — Items, Users, Creators. Each with metadata, an embedding slot, and an attached signal ledger.
-- **Signals** — Typed, timestamped event streams with native decay, velocity, and windowed aggregation. You declare a `view` signal with a 7-day half-life. The database does the rest.
-- **Ranking Profiles** — Named, versioned scoring functions that live in the database. Reference signals, relationships, recency curves, and diversity rules. Swap at query time by name.
-- **One query** — Candidate retrieval, filtering, personalized ranking, and diversity enforcement in a single operation.
-
-The query that currently takes 6 systems to produce:
+**The query** brings it together. Candidate retrieval, filtering, personalized ranking, and diversity enforcement in a single operation:
```
RETRIEVE items
FOR USER @user_id
+FOR SESSION @session_id
CONTEXT feed
USING PROFILE for_you
FILTER unseen, unblocked, format:video, duration:short
@@ -49,33 +56,58 @@ DIVERSITY max_per_creator:2, format_mix:true
LIMIT 50
```
+One call. No network hops between subsystems. No merging results from five data sources. The database handles retrieval strategy (ANN, BM25, graph walk, full scan), applies hard filters, scores candidates against live signal state, enforces diversity constraints, and returns a ranked list. The agent gets the list along with a session snapshot (top signals, reward velocity, last tool it used) so it can explain its answer.
+
## The feedback loop
-When a user views, likes, skips, or hides content, the signal is written directly to the database. The item's signal ledger updates. The user's preference vector shifts. The relationship weight between user and creator adjusts. All atomically, all in the same write transaction.
+This is the part that makes the architecture honest.
-The next ranking query — even 100ms later — reflects the updated state.
+When a user likes an item, the database atomically updates the item's signal ledger, the user's preference vector, and the user-to-creator relationship weight. All in the same write transaction. The next ranking query — even 100ms later — reflects the updated state.
-No Kafka consumer to lag. No feature store sync to schedule. No cache to invalidate. The write path and the read path are one system.
+```rust
+db.signal(Signal {
+ kind: "like",
+ item: "item_abc",
+ user: "user_123",
+ session: Some("session_xyz"),
+ timestamp: Utc::now(),
+ weight: 1.0,
+ metadata: Some(json!({ "agent": "assistant", "tool": "planner" })),
+})?;
+```
-## What we're building first
+There is no event bus between the engagement and the ranking update. No consumer lag. No cache to invalidate. The write path and the read path are one system. A user who skips three items in a row sees the fourth query adjust — not after a batch pipeline runs, not after a feature store syncs. Now.
-tidalDB is in active development. We're building in Rust, starting single-node, and working toward the first public release. The roadmap:
+## Where we are deliberately narrow
-1. **Storage foundation** — WAL, entity store, signal ledger with forward-decay scoring
-2. **Query engine** — The RETRIEVE/SEARCH/SUGGEST operations with filtering and ranking
-3. **Vector and text search** — HNSW via USearch, BM25 via Tantivy, hybrid fusion with RRF
-4. **The full query surface** — All sort modes, all filters, diversity enforcement, pagination
+If your primary problem is operating a large, general search serving platform, systems like Vespa are excellent and mature.
-We're building in public. Every architectural decision, every benchmark result, every trade-off gets documented here.
+Our wedge is narrower and opinionated:
+
+- Optimize for the personalization loop, not broad search platform parity.
+- Make negative feedback intent explicit and immediate:
+ `skip_for_now` (soft), `not_for_me` (preference), `low_quality` (quality), `hide/mute/block` (hard excludes).
+- Treat "next refresh reflects feedback" as a hard product promise, not a best effort.
+- Keep the first deployment embeddable and in-process for low-latency iteration.
+
+## Where the build stands
+
+tidalDB is early. I want to be direct about what exists today and what does not.
+
+**Built:** Schema system with entity, signal, and profile definitions. Write-ahead log with segment rotation, checksummed records, BLAKE3 deduplication, and crash recovery. Storage engine backed by fjall with trait abstraction, key encoding, and batch writes. Signal ledger with forward-decay scoring, hot-path state, and warm-path persistence.
+
+**Next:** Query engine — the RETRIEVE/SEARCH/SUGGEST operations with the execution pipeline described above. Then session-aware APIs, agent policies, vector search (USearch), text search (Tantivy), and hybrid fusion. Then the full query surface with all sort modes and diversity enforcement.
+
+The foundation is Rust, single-node, embeddable. The storage layer is designed for horizontal scaling later — key encoding and storage isolation are partition-ready — but single-node correctness comes first. This is how we differentiate from Vespa, Milvus, or any search-first system: tidalDB embeds inside your agent runtime, exposes a declarative query+session API, and guarantees every signal the agent writes is visible on the next read without a distributed hop.
+
+We are building in public. The code is on [GitHub](https://github.com/orchard9/tidalDB). Every architectural decision gets documented.
## Why open source
-The personalized content ranking problem is universal. Every content platform needs it. Making the solution proprietary would limit adoption to teams willing to vendor-lock on a database. That's not the goal.
+The personalized content ranking problem is universal. Every content platform needs it. The solution should be a tool you embed in your process and point at your data — not a vendor you depend on for a query you could run locally.
-The goal is a tool that an engineering team can embed in their process, point at their data, and get correct ranking in one query. Open source, MIT licensed, embeddable.
-
-If you're operating a 6-system stack for content ranking and wondering why it has to be this hard — it doesn't. That's why we're building tidalDB.
+MIT licensed. No asterisks.
---
-Follow the build on [GitHub](https://github.com/orchard9/tidalDB) or read the next post when it drops.
+*If you want the full diagnosis of why the 6-system stack exists and where correctness fails between the seams, read [Every content platform builds the same 6 systems from scratch](/blog/every-platform-builds-the-same-6-systems).*
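The post's claim that exponential decay is maintained at write time in O(1) can be illustrated with a running decayed counter. This is a minimal sketch under stated assumptions: `DecayedScore`, its fields, and the seconds-based timestamps are hypothetical names, and it uses the plain running-decay formulation, which may differ from the forward-decay representation the signal ledger actually implements.

```rust
/// Hypothetical running decayed score. Not tidalDB's ledger type.
/// Assumes `half_life_secs` is strictly positive.
#[derive(Debug, Clone, Copy)]
pub struct DecayedScore {
    value: f64,
    last_update_secs: f64,
    half_life_secs: f64,
}

impl DecayedScore {
    pub fn new(half_life_secs: f64, now_secs: f64) -> Self {
        Self {
            value: 0.0,
            last_update_secs: now_secs,
            half_life_secs,
        }
    }

    /// Decay multiplier for a non-negative elapsed time: 0.5^(elapsed / half_life).
    fn decay_factor(&self, elapsed_secs: f64) -> f64 {
        0.5_f64.powf(elapsed_secs.max(0.0) / self.half_life_secs)
    }

    /// Record one event. Constant work regardless of how many events came before.
    pub fn record(&mut self, weight: f64, event_secs: f64) {
        let elapsed = event_secs - self.last_update_secs;
        if elapsed >= 0.0 {
            // In-order event: age the stored value, then add the new weight.
            self.value = self.value * self.decay_factor(elapsed) + weight;
            self.last_update_secs = event_secs;
        } else {
            // Late event: age the event's own weight up to the stored reference time.
            self.value += weight * self.decay_factor(-elapsed);
        }
    }

    /// Read the score as of `now_secs` without mutating state.
    /// Reads earlier than the last update are clamped to the stored value.
    pub fn read(&self, now_secs: f64) -> f64 {
        self.value * self.decay_factor(now_secs - self.last_update_secs)
    }
}
```

With a 7-day half-life (`604_800.0` seconds), a single event of weight 1.0 reads as 0.5 one week later. Each write touches one float and one timestamp, which is the property the post relies on when it says no cron job recomputes a stale score field.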
diff --git a/site/next.config.ts b/site/next.config.ts
index a7d4cbc..6e492b5 100644
--- a/site/next.config.ts
+++ b/site/next.config.ts
@@ -3,6 +3,9 @@ import type { NextConfig } from "next";
const nextConfig: NextConfig = {
output: "export",
images: { unoptimized: true },
+ turbopack: {
+ root: __dirname,
+ },
};
export default nextConfig;
diff --git a/site/src/app/blog/page.tsx b/site/src/app/blog/page.tsx
index 5aab232..c78f765 100644
--- a/site/src/app/blog/page.tsx
+++ b/site/src/app/blog/page.tsx
@@ -10,11 +10,12 @@ export default function BlogIndex() {
Blog
- Building in public.
+ Building the agent memory substrate.
Architecture decisions, engineering insights, and progress updates as
- we build tidalDB.
+ we turn tidalDB into the personalization layer agents can read and
+ write in real time.
diff --git a/site/src/app/layout.tsx b/site/src/app/layout.tsx
index 5be2360..5e165c1 100644
--- a/site/src/app/layout.tsx
+++ b/site/src/app/layout.tsx
@@ -33,6 +33,7 @@ export default function RootLayout({
{children}
diff --git a/tidal/Cargo.lock b/tidal/Cargo.lock
index 680ffa3..c6638e0 100644
--- a/tidal/Cargo.lock
+++ b/tidal/Cargo.lock
@@ -1015,6 +1015,7 @@ dependencies = [
"blake3",
"criterion",
"crossbeam",
+ "dashmap",
"fjall",
"proptest",
"tempfile",
diff --git a/tidal/Cargo.toml b/tidal/Cargo.toml
index 70bc5b5..ac9fc8b 100644
--- a/tidal/Cargo.toml
+++ b/tidal/Cargo.toml
@@ -6,10 +6,16 @@ rust-version = "1.91"
description = "Embeddable database for personalized content ranking"
license = "MIT"
+[features]
+test-utils = ["dep:tempfile"]
+metrics = [] # hand-rolled HTTP, no new crate deps
+
[dependencies]
blake3 = "1"
crossbeam = "0.8"
+dashmap = "6"
fjall = "3"
+tempfile = { version = "3", optional = true }
tracing = "0.1"
[dev-dependencies]
@@ -28,6 +34,14 @@ cast_possible_truncation = "allow"
module_name_repetitions = "allow"
unwrap_used = "deny"
+[[test]]
+name = "sandboxed_storage"
+required-features = ["test-utils"]
+
+[[test]]
+name = "metrics_integration"
+required-features = ["metrics"]
+
[[bench]]
name = "signals"
harness = false
diff --git a/tidal/benches/signals.rs b/tidal/benches/signals.rs
index 0ed327d..986cc0e 100644
--- a/tidal/benches/signals.rs
+++ b/tidal/benches/signals.rs
@@ -1,5 +1,6 @@
use criterion::{Criterion, criterion_group, criterion_main};
+#[allow(clippy::missing_const_for_fn)]
fn signal_benchmarks(_c: &mut Criterion) {
// Placeholder — benchmarks added as signal system is implemented.
}
diff --git a/tidal/benches/storage.rs b/tidal/benches/storage.rs
index a959bd4..d4c694b 100644
--- a/tidal/benches/storage.rs
+++ b/tidal/benches/storage.rs
@@ -1,3 +1,5 @@
+#![allow(clippy::unwrap_used)]
+
use criterion::{BatchSize, Criterion, criterion_group, criterion_main};
use tidaldb::schema::{EntityId, EntityKind};
use tidaldb::storage::{
diff --git a/tidal/build.rs b/tidal/build.rs
new file mode 100644
index 0000000..719b6c0
--- /dev/null
+++ b/tidal/build.rs
@@ -0,0 +1,7 @@
+fn main() {
+ // Expose GIT_HASH env var from CI, or "dev" for local builds.
+ let hash = std::env::var("GIT_HASH").unwrap_or_else(|_| "dev".to_string());
+ println!("cargo:rustc-env=TIDALDB_BUILD_HASH={hash}");
+ // Rerun only if the env var changes.
+ println!("cargo:rerun-if-env-changed=GIT_HASH");
+}
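The build script above exports `TIDALDB_BUILD_HASH` to the compiler, and `metrics.rs` later reads a `crate::BUILD_HASH` constant whose definition is not shown in this part of the diff. One plausible way to wire the two together is sketched below. The constant's exact definition is an assumption, and `option_env!` is used here only so the snippet also compiles outside the crate, where the build script has not run.

```rust
/// Hypothetical definition of the build-hash constant.
/// Inside the crate, `env!("TIDALDB_BUILD_HASH")` would also work because
/// build.rs always sets the variable; `option_env!` keeps this sketch standalone.
pub const BUILD_HASH: &str = match option_env!("TIDALDB_BUILD_HASH") {
    Some(hash) => hash,
    None => "dev",
};
```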
diff --git a/tidal/proptest-regressions/signals/warm.txt b/tidal/proptest-regressions/signals/warm.txt
new file mode 100644
index 0000000..cd8e5c1
--- /dev/null
+++ b/tidal/proptest-regressions/signals/warm.txt
@@ -0,0 +1,7 @@
+# Seeds for failure cases proptest has generated in the past. It is
+# automatically read and these particular cases re-run before any
+# novel cases are generated.
+#
+# It is recommended to check this file in to source control so that
+# everyone who runs the test benefits from these saved cases.
+cc b64b3481845f116d827e7d5b43295853693428b0e3346481c1b53c39e712553e # shrinks to event_times_secs = [3485, 7080, 3481, 3481], query_time_secs = 3600
diff --git a/tidal/src/db/builder.rs b/tidal/src/db/builder.rs
new file mode 100644
index 0000000..6368ef3
--- /dev/null
+++ b/tidal/src/db/builder.rs
@@ -0,0 +1,368 @@
+use std::path::{Path, PathBuf};
+use std::sync::Arc;
+
+use super::TidalDb;
+use super::config::{Config, ConfigError, StorageMode};
+use super::metrics::MetricsState;
+use super::paths::Paths;
+
+/// Fluent builder for constructing a [`TidalDb`] instance.
+///
+/// # Examples
+///
+/// ```rust,no_run
+/// use tidaldb::TidalDb;
+///
+/// // Ephemeral (in-memory) database for tests:
+/// let db = TidalDb::builder().ephemeral().open().unwrap();
+///
+/// // Persistent database with explicit data directory:
+/// let db = TidalDb::builder()
+/// .with_data_dir("/var/lib/tidaldb")
+/// .open()
+/// .unwrap();
+/// ```
+#[derive(Debug)]
+pub struct TidalDbBuilder {
+ config: Config,
+ /// Address for the optional metrics HTTP server (e.g. "127.0.0.1:9090").
+ /// Only used when the `metrics` feature is enabled.
+ #[allow(dead_code)]
+ metrics_addr: Option<String>,
+}
+
+impl TidalDbBuilder {
+ /// Create a new builder with default (ephemeral) configuration.
+ #[must_use]
+ pub fn new() -> Self {
+ Self {
+ config: Config::default(),
+ metrics_addr: None,
+ }
+ }
+
+ /// Switch to ephemeral (in-memory) mode, clearing any directory paths.
+ ///
+ /// This is the default mode. Calling this is only necessary to reset
+ /// a builder that was previously configured for persistent mode.
+ #[must_use]
+ pub fn ephemeral(mut self) -> Self {
+ self.config.mode = StorageMode::Ephemeral;
+ self.config.data_dir = None;
+ self.config.wal_dir = None;
+ self.config.cache_dir = None;
+ self
+ }
+
+ /// Switch to persistent mode with the given data directory.
+ ///
+ /// The directory must exist and be writable at the time [`open`](Self::open)
+ /// is called. WAL and cache directories default to subdirectories of
+ /// `data_dir` unless explicitly overridden.
+ #[must_use]
+ pub fn with_data_dir(mut self, dir: impl AsRef<Path>) -> Self {
+ self.config.mode = StorageMode::Persistent;
+ self.config.data_dir = Some(dir.as_ref().to_path_buf());
+ self
+ }
+
+ /// Override the WAL directory (defaults to `{data_dir}/wal`).
+ #[must_use]
+ pub fn wal_dir(mut self, dir: impl AsRef<Path>) -> Self {
+ self.config.wal_dir = Some(dir.as_ref().to_path_buf());
+ self
+ }
+
+ /// Override the cache directory (defaults to `{data_dir}/cache`).
+ #[must_use]
+ pub fn cache_dir(mut self, dir: impl AsRef<Path>) -> Self {
+ self.config.cache_dir = Some(dir.as_ref().to_path_buf());
+ self
+ }
+
+ /// Eagerly validate the configuration.
+ ///
+ /// Checks:
+ /// - Persistent mode requires `data_dir`
+ /// - All specified directories must exist and be writable
+ ///
+ /// # Errors
+ ///
+ /// Returns [`ConfigError`] describing the first validation failure.
+ pub fn validate(&self) -> Result<(), ConfigError> {
+ if self.config.mode == StorageMode::Persistent && self.config.data_dir.is_none() {
+ return Err(ConfigError::MissingDataDir);
+ }
+
+ // Validate each specified directory exists and is writable.
+ let dirs_to_check: Vec<&PathBuf> = [
+ self.config.data_dir.as_ref(),
+ self.config.wal_dir.as_ref(),
+ self.config.cache_dir.as_ref(),
+ ]
+ .into_iter()
+ .flatten()
+ .collect();
+
+ for dir in dirs_to_check {
+ if !dir.exists() {
+ return Err(ConfigError::DirectoryNotFound { path: dir.clone() });
+ }
+ // Approximate writability from the metadata `readonly` flag.
+ // This inspects permission bits only; it does not prove that this
+ // process can create files here. A more robust check would attempt
+ // to create a temp file, but metadata suffices for config validation.
+ if dir
+ .metadata()
+ .map(|m| m.permissions().readonly())
+ .unwrap_or(true)
+ {
+ return Err(ConfigError::NotWritable { path: dir.clone() });
+ }
+ }
+
+ Ok(())
+ }
+
+ /// Enable the metrics HTTP server on the given address.
+ ///
+ /// Only available when the `metrics` feature is enabled. When the
+ /// feature is disabled, this method does not exist and the builder
+ /// compiles without it.
+ ///
+ /// # Examples
+ ///
+ /// ```rust,no_run
+ /// # #[cfg(feature = "metrics")]
+ /// let db = tidaldb::TidalDb::builder()
+ /// .ephemeral()
+ /// .enable_metrics("127.0.0.1:9090")
+ /// .open()
+ /// .unwrap();
+ /// ```
+ #[cfg(feature = "metrics")]
+ #[must_use]
+ pub fn enable_metrics(mut self, addr: impl Into<String>) -> Self {
+ self.metrics_addr = Some(addr.into());
+ self
+ }
+
+ /// Resolve default directory paths using [`Paths`] for persistent mode.
+ ///
+ /// When a `data_dir` is set and `wal_dir` or `cache_dir` are not
+ /// explicitly overridden, this fills them in from [`Paths`]. This
+ /// makes `Paths` the single source of truth for directory layout --
+ /// the builder and the CLI both derive defaults the same way.
+ fn resolve_defaults(&mut self) {
+ if let Some(ref data_dir) = self.config.data_dir {
+ let paths = Paths::new(data_dir);
+ if self.config.wal_dir.is_none() {
+ self.config.wal_dir = Some(paths.wal_dir());
+ }
+ if self.config.cache_dir.is_none() {
+ self.config.cache_dir = Some(paths.cache_dir());
+ }
+ }
+ }
+
+ /// Validate and open a [`TidalDb`] instance.
+ ///
+ /// Calls [`validate`](Self::validate), then resolves default directory
+ /// paths via [`Paths`], and constructs the database handle.
+ ///
+ /// Validation checks only user-specified directories. Resolved defaults
+ /// (e.g., `{data_dir}/wal`) are populated after validation -- the storage
+ /// engine will create them via [`Paths::ensure_all`] during initialization.
+ ///
+ /// # Errors
+ ///
+ /// Returns [`crate::LumenError`] if validation fails or initialization
+ /// encounters an error.
+ #[tracing::instrument(skip(self), fields(mode = ?self.config.mode))]
+ pub fn open(mut self) -> crate::Result<TidalDb> {
+ self.validate()?;
+ self.resolve_defaults();
+
+ let metrics = Arc::new(MetricsState::new());
+
+ #[cfg(feature = "metrics")]
+ let metrics_handle = if let Some(ref addr) = self.metrics_addr {
+ let handle =
+ super::http::MetricsHandle::start(addr, Arc::clone(&metrics)).map_err(|e| {
+ crate::LumenError::Internal(format!("metrics server failed to start: {e}"))
+ })?;
+ Some(handle)
+ } else {
+ None
+ };
+
+ Ok(TidalDb::from_config(
+ self.config,
+ metrics,
+ #[cfg(feature = "metrics")]
+ metrics_handle,
+ ))
+ }
+}
+
+impl Default for TidalDbBuilder {
+ fn default() -> Self {
+ Self::new()
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn builder_ephemeral_succeeds() {
+ let db = TidalDb::builder().ephemeral().open();
+ assert!(db.is_ok());
+ }
+
+ #[test]
+ fn builder_default_is_ephemeral() {
+ let db = TidalDb::builder().open();
+ assert!(db.is_ok());
+ }
+
+ #[test]
+ fn builder_persistent_requires_data_dir() {
+ // Construct a persistent-mode builder without calling with_data_dir
+ // by manually setting mode.
+ let builder = TidalDbBuilder {
+ config: Config {
+ mode: StorageMode::Persistent,
+ data_dir: None,
+ wal_dir: None,
+ cache_dir: None,
+ },
+ metrics_addr: None,
+ };
+ let result = builder.validate();
+ assert!(result.is_err());
+ let err = result.expect_err("should fail");
+ assert!(
+ matches!(err, ConfigError::MissingDataDir),
+ "expected MissingDataDir, got: {err}"
+ );
+ }
+
+ #[test]
+ fn builder_persistent_missing_dir() {
+ let result = TidalDb::builder()
+ .with_data_dir("/nonexistent/path/that/does/not/exist")
+ .open();
+ assert!(result.is_err());
+ let err_msg = result.expect_err("should fail").to_string();
+ assert!(
+ err_msg.contains("does not exist"),
+ "expected DirectoryNotFound, got: {err_msg}"
+ );
+ }
+
+ #[test]
+ fn builder_persistent_existing_dir() {
+ let tmp = tempfile::tempdir().expect("failed to create tempdir");
+ let result = TidalDb::builder().with_data_dir(tmp.path()).open();
+ assert!(result.is_ok(), "open with valid tempdir should succeed");
+ }
+
+ #[test]
+ fn health_check_ok() {
+ let db = TidalDb::builder().ephemeral().open().expect("open failed");
+ assert!(db.health_check().is_ok());
+ }
+
+ #[test]
+ fn close_ok() {
+ let db = TidalDb::builder().ephemeral().open().expect("open failed");
+ assert!(db.close().is_ok());
+ }
+
+ #[test]
+ fn builder_with_wal_and_cache_dir() {
+ let tmp = tempfile::tempdir().expect("failed to create tempdir");
+ let wal = tmp.path().join("wal");
+ let cache = tmp.path().join("cache");
+ std::fs::create_dir_all(&wal).expect("mkdir wal");
+ std::fs::create_dir_all(&cache).expect("mkdir cache");
+
+ let result = TidalDb::builder()
+ .with_data_dir(tmp.path())
+ .wal_dir(&wal)
+ .cache_dir(&cache)
+ .open();
+ assert!(
+ result.is_ok(),
+ "open with explicit wal/cache dirs should succeed"
+ );
+ }
+
+ #[test]
+ fn builder_ephemeral_resets_dirs() {
+ let builder = TidalDb::builder()
+ .with_data_dir("/some/path")
+ .wal_dir("/some/wal")
+ .cache_dir("/some/cache")
+ .ephemeral();
+
+ assert_eq!(builder.config.mode, StorageMode::Ephemeral);
+ assert!(builder.config.data_dir.is_none());
+ assert!(builder.config.wal_dir.is_none());
+ assert!(builder.config.cache_dir.is_none());
+ }
+
+ #[test]
+ fn builder_wal_dir_nonexistent() {
+ let tmp = tempfile::tempdir().expect("failed to create tempdir");
+ let result = TidalDb::builder()
+ .with_data_dir(tmp.path())
+ .wal_dir("/nonexistent/wal")
+ .open();
+ assert!(result.is_err());
+ let err_msg = result.expect_err("should fail").to_string();
+ assert!(err_msg.contains("does not exist"));
+ }
+
+ #[test]
+ fn resolve_defaults_sets_wal_and_cache() {
+ let tmp = tempfile::tempdir().expect("failed to create tempdir");
+ let mut builder = TidalDb::builder().with_data_dir(tmp.path());
+ assert!(builder.config.wal_dir.is_none());
+ assert!(builder.config.cache_dir.is_none());
+
+ builder.resolve_defaults();
+
+ let paths = super::Paths::new(tmp.path());
+ assert_eq!(builder.config.wal_dir.as_ref(), Some(&paths.wal_dir()));
+ assert_eq!(builder.config.cache_dir.as_ref(), Some(&paths.cache_dir()));
+ }
+
+ #[test]
+ fn resolve_defaults_preserves_explicit_overrides() {
+ let tmp = tempfile::tempdir().expect("failed to create tempdir");
+ let custom_wal = tmp.path().join("custom_wal");
+ let custom_cache = tmp.path().join("custom_cache");
+
+ let mut builder = TidalDb::builder()
+ .with_data_dir(tmp.path())
+ .wal_dir(&custom_wal)
+ .cache_dir(&custom_cache);
+
+ builder.resolve_defaults();
+
+ assert_eq!(builder.config.wal_dir.as_ref(), Some(&custom_wal));
+ assert_eq!(builder.config.cache_dir.as_ref(), Some(&custom_cache));
+ }
+
+ #[test]
+ fn resolve_defaults_noop_for_ephemeral() {
+ let mut builder = TidalDb::builder().ephemeral();
+ builder.resolve_defaults();
+
+ assert!(builder.config.wal_dir.is_none());
+ assert!(builder.config.cache_dir.is_none());
+ }
+}
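The comment in `validate` notes that a more robust writability check would attempt to create a temp file. A minimal sketch of such a probe follows. `probe_writable` and the probe file name are hypothetical and not part of this patch; only standard-library calls are used.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical helper: confirm the process can create and delete a file in `dir`.
pub fn probe_writable(dir: &Path) -> io::Result<()> {
    // Include the process id so concurrent processes do not collide on the name.
    let probe = dir.join(format!(".tidaldb-write-probe-{}", std::process::id()));
    // `create_new` fails if the file already exists, so nothing is ever clobbered.
    OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&probe)?;
    // Clean up; a failure here is also reported to the caller.
    fs::remove_file(&probe)
}
```

Mapping an `Err` from this helper to `ConfigError::NotWritable` would catch cases the permission-bit check misses, such as a directory owned by another user, at the cost of touching the filesystem during validation.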
diff --git a/tidal/src/db/config.rs b/tidal/src/db/config.rs
new file mode 100644
index 0000000..2e314ee
--- /dev/null
+++ b/tidal/src/db/config.rs
@@ -0,0 +1,129 @@
+use std::fmt;
+use std::path::PathBuf;
+
+/// How tidalDB stores data.
+///
+/// `Ephemeral` keeps everything in memory -- ideal for tests and short-lived
+/// processes. `Persistent` writes to an LSM-tree on disk (fjall).
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub enum StorageMode {
+ /// In-memory only. No filesystem access. Data is lost on drop.
+ Ephemeral,
+ /// Durable storage backed by fjall (LSM-tree). Requires `data_dir`.
+ Persistent,
+}
+
+/// Configuration for a tidalDB instance.
+///
+/// Constructed either directly or via [`super::TidalDbBuilder`].
+///
+/// # Defaults
+///
+/// The default configuration is ephemeral (in-memory) with no directory paths.
+/// Persistent mode requires at least `data_dir` to be set.
+#[derive(Debug, Clone)]
+pub struct Config {
+ /// Storage backend selection.
+ pub mode: StorageMode,
+ /// Root directory for persistent data. Required when `mode` is `Persistent`.
+ pub data_dir: Option<PathBuf>,
+ /// Override for the WAL directory. Defaults to `{data_dir}/wal`.
+ pub wal_dir: Option<PathBuf>,
+ /// Override for the cache directory. Defaults to `{data_dir}/cache`.
+ pub cache_dir: Option<PathBuf>,
+}
+
+impl Default for Config {
+ fn default() -> Self {
+ Self {
+ mode: StorageMode::Ephemeral,
+ data_dir: None,
+ wal_dir: None,
+ cache_dir: None,
+ }
+ }
+}
+
+/// Errors that arise during configuration validation.
+///
+/// These are always caller errors -- the configuration is invalid and must
+/// be corrected before a tidalDB instance can be opened.
+#[derive(Debug)]
+pub enum ConfigError {
+ /// Persistent mode was selected but no data directory was provided.
+ MissingDataDir,
+ /// A directory path was specified but does not exist on the filesystem.
+ DirectoryNotFound { path: PathBuf },
+ /// A directory exists but the process does not have write permission.
+ NotWritable { path: PathBuf },
+}
+
+impl fmt::Display for ConfigError {
+ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+ match self {
+ Self::MissingDataDir => f.write_str("persistent mode requires a data directory"),
+ Self::DirectoryNotFound { path } => {
+ write!(f, "directory does not exist: {}", path.display())
+ }
+ Self::NotWritable { path } => {
+ write!(f, "directory is not writable: {}", path.display())
+ }
+ }
+ }
+}
+
+impl std::error::Error for ConfigError {}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn default_config_is_ephemeral() {
+ let cfg = Config::default();
+ assert_eq!(cfg.mode, StorageMode::Ephemeral);
+ assert!(cfg.data_dir.is_none());
+ assert!(cfg.wal_dir.is_none());
+ assert!(cfg.cache_dir.is_none());
+ }
+
+ #[test]
+ fn config_error_display_missing_data_dir() {
+ let e = ConfigError::MissingDataDir;
+ assert_eq!(e.to_string(), "persistent mode requires a data directory");
+ }
+
+ #[test]
+ fn config_error_display_directory_not_found() {
+ let e = ConfigError::DirectoryNotFound {
+ path: PathBuf::from("/nonexistent"),
+ };
+ assert!(e.to_string().contains("/nonexistent"));
+ assert!(e.to_string().contains("does not exist"));
+ }
+
+ #[test]
+ fn config_error_display_not_writable() {
+ let e = ConfigError::NotWritable {
+ path: PathBuf::from("/readonly"),
+ };
+ assert!(e.to_string().contains("/readonly"));
+ assert!(e.to_string().contains("not writable"));
+ }
+
+ #[test]
+ fn storage_mode_debug() {
+ // Ensure Debug is derived and produces readable output.
+ let s = format!("{:?}", StorageMode::Ephemeral);
+ assert_eq!(s, "Ephemeral");
+ let s = format!("{:?}", StorageMode::Persistent);
+ assert_eq!(s, "Persistent");
+ }
+
+ #[test]
+ fn config_debug() {
+ let cfg = Config::default();
+ let s = format!("{cfg:?}");
+ assert!(s.contains("Ephemeral"));
+ }
+}
diff --git a/tidal/src/db/http.rs b/tidal/src/db/http.rs
new file mode 100644
index 0000000..8882c87
--- /dev/null
+++ b/tidal/src/db/http.rs
@@ -0,0 +1,152 @@
+//! Optional HTTP metrics server for tidalDB.
+//!
+//! Disabled by default. Enable with the `metrics` feature flag.
+//!
+//! Serves two endpoints on a background `std::thread`:
+//! - `GET /healthz` -- JSON health check
+//! - `GET /metrics` -- Prometheus text exposition format
+//!
+//! The server reads from [`MetricsState`] which is Arc-shared with [`TidalDb`].
+//! The server thread exits cleanly when [`MetricsHandle::stop`] is called.
+
+use std::io::{BufRead, BufReader, Write};
+use std::net::{TcpListener, TcpStream};
+use std::sync::Arc;
+use std::sync::atomic::{AtomicBool, Ordering};
+use std::thread;
+
+use super::metrics::MetricsState;
+
+/// Handle to the background metrics HTTP server.
+///
+/// The server thread runs until [`stop`](Self::stop) is called or this handle is dropped.
+pub struct MetricsHandle {
+ shutdown: Arc<AtomicBool>,
+ thread: Option<thread::JoinHandle<()>>,
+ /// The actual bound address (useful when port 0 was requested).
+ pub addr: std::net::SocketAddr,
+}
+
+impl MetricsHandle {
+ /// Bind to the given address and start the background server thread.
+ ///
+ /// # Errors
+ ///
+ /// Returns `std::io::Error` if the address cannot be bound.
+ pub fn start(addr: &str, state: Arc<MetricsState>) -> std::io::Result<Self> {
+ let listener = TcpListener::bind(addr)?;
+ let actual_addr = listener.local_addr()?;
+ let shutdown = Arc::new(AtomicBool::new(false));
+ let shutdown_clone = Arc::clone(&shutdown);
+
+ // listener, state, and shutdown_clone are moved into the thread --
+ // they must be owned. serve_loop takes them by value intentionally.
+ let handle = thread::Builder::new()
+ .name("tidaldb-metrics-http".into())
+ .spawn(move || serve_loop(listener, state, shutdown_clone))
+ .map_err(std::io::Error::other)?;
+
+ Ok(Self {
+ shutdown,
+ thread: Some(handle),
+ addr: actual_addr,
+ })
+ }
+
+ /// Stop the metrics server. Blocks until the server thread exits.
+ pub fn stop(&mut self) {
+ self.shutdown.store(true, Ordering::Release);
+ // Connect to ourselves to unblock the accept() call.
+ if let Ok(mut conn) = TcpStream::connect(self.addr) {
+ let _ = conn.write_all(b"");
+ }
+ if let Some(t) = self.thread.take() {
+ let _ = t.join();
+ }
+ }
+}
+
+impl Drop for MetricsHandle {
+ fn drop(&mut self) {
+ self.stop();
+ }
+}
+
+// These parameters are passed by value intentionally: they are moved into the
+// spawned thread closure. Taking references would require `'static` lifetimes
+// which aren't available from the caller's stack.
+#[allow(clippy::needless_pass_by_value)]
+fn serve_loop(listener: TcpListener, state: Arc<MetricsState>, shutdown: Arc<AtomicBool>) {
+ let timeout = std::time::Duration::from_millis(50);
+ // Use non-blocking + sleep to periodically check the shutdown flag
+ // without blocking indefinitely in accept().
+ if let Err(e) = listener.set_nonblocking(true) {
+ tracing::warn!(error = %e, "failed to set non-blocking mode on metrics listener");
+ }
+
+ loop {
+ if shutdown.load(Ordering::Acquire) {
+ break;
+ }
+ match listener.accept() {
+ Ok((stream, _)) => {
+ if shutdown.load(Ordering::Acquire) {
+ break;
+ }
+ handle_connection(stream, &state);
+ }
+ Err(ref e) if e.kind() == std::io::ErrorKind::WouldBlock => {
+ thread::sleep(timeout);
+ }
+ Err(_) => break,
+ }
+ }
+}
+
+fn handle_connection(mut stream: TcpStream, state: &MetricsState) {
+ // Set a read timeout to avoid hanging on slow/malicious clients.
+ let _ = stream.set_read_timeout(Some(std::time::Duration::from_secs(5)));
+
+ let mut reader = BufReader::new(&stream);
+ let mut request_line = String::new();
+ if reader.read_line(&mut request_line).is_err() {
+ return;
+ }
+ // Drain remaining headers to avoid broken pipe.
+ let mut header = String::new();
+ loop {
+ header.clear();
+ match reader.read_line(&mut header) {
+ Ok(0) | Err(_) => break,
+ Ok(n) if n <= 2 => break, // empty line = end of headers
+ _ => {}
+ }
+ }
+
+ let path = extract_path(&request_line);
+ let (status, content_type, body) = match path {
+ "/healthz" => ("200 OK", "application/json", state.render_healthz()),
+ "/metrics" => (
+ "200 OK",
+ "text/plain; version=0.0.4; charset=utf-8",
+ state.render_prometheus(),
+ ),
+ _ => (
+ "404 Not Found",
+ "application/json",
+ r#"{"error":"not found"}"#.to_string(),
+ ),
+ };
+
+ let response = format!(
+ "HTTP/1.1 {status}\r\nContent-Type: {content_type}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
+ body.len()
+ );
+ let _ = stream.write_all(response.as_bytes());
+}
+
+fn extract_path(request_line: &str) -> &str {
+ // "GET /path HTTP/1.1"
+ let parts: Vec<&str> = request_line.split_whitespace().collect();
+ if parts.len() >= 2 { parts[1] } else { "/" }
+}
diff --git a/tidal/src/db/metrics.rs b/tidal/src/db/metrics.rs
new file mode 100644
index 0000000..65f9189
--- /dev/null
+++ b/tidal/src/db/metrics.rs
@@ -0,0 +1,145 @@
+//! Runtime metrics for tidalDB.
+//!
+//! [`MetricsState`] is an `Arc`-shared bag of atomics that `TidalDb` updates
+//! on every operation. The metrics HTTP server (when the `metrics` feature
+//! is enabled) reads from this shared state to serve Prometheus text format.
+//!
+//! Adding a new counter in future milestones is:
+//! 1. Add an `AtomicU64` field to `MetricsState`
+//! 2. Increment it in the relevant `TidalDb` method
+//! 3. Add one line to `MetricsState::render_prometheus`
+
+use std::sync::atomic::{AtomicBool, Ordering};
+use std::time::Instant;
+
+/// Shared runtime metrics for a `TidalDb` instance.
+///
+/// Intended to be shared behind an `Arc` (the struct itself is not `Clone`). Thread-safe.
+pub struct MetricsState {
+ /// Time the database was opened.
+ pub(crate) opened_at: Instant,
+ /// Whether the database is currently healthy.
+ pub(crate) health_ok: AtomicBool,
+}
+
+impl MetricsState {
+ pub(crate) fn new() -> Self {
+ Self {
+ opened_at: Instant::now(),
+ health_ok: AtomicBool::new(true),
+ }
+ }
+
+ /// Uptime in fractional seconds since the database was opened.
+ #[must_use]
+ pub fn uptime_seconds(&self) -> f64 {
+ self.opened_at.elapsed().as_secs_f64()
+ }
+
+ /// Whether the database reports healthy (1.0) or degraded (0.0).
+ #[must_use]
+ pub fn health_ok_value(&self) -> f64 {
+ if self.health_ok.load(Ordering::Relaxed) {
+ 1.0
+ } else {
+ 0.0
+ }
+ }
+
+ /// Render Prometheus text exposition format for all metrics.
+ ///
+ /// Each metric is emitted as a `# HELP` line, a `# TYPE` line, and one sample line.
+ #[must_use]
+ pub fn render_prometheus(&self) -> String {
+ let uptime = self.uptime_seconds();
+ let health = self.health_ok_value();
+ let version = env!("CARGO_PKG_VERSION");
+ let build_hash = crate::BUILD_HASH;
+
+ format!(
+ "# HELP tidaldb_uptime_seconds Seconds since database opened.\n\
+ # TYPE tidaldb_uptime_seconds gauge\n\
+ tidaldb_uptime_seconds{{partition_id=\"0\"}} {uptime:.3}\n\n\
+ # HELP tidaldb_health_ok Whether the database is healthy. 1 = ok, 0 = degraded.\n\
+ # TYPE tidaldb_health_ok gauge\n\
+ tidaldb_health_ok{{partition_id=\"0\"}} {health}\n\n\
+ # HELP tidaldb_info Build and version information.\n\
+ # TYPE tidaldb_info gauge\n\
+ tidaldb_info{{version=\"{version}\",build_hash=\"{build_hash}\",partition_id=\"0\"}} 1\n"
+ )
+ }
+
+ /// Render JSON for /healthz.
+ #[must_use]
+ pub fn render_healthz(&self) -> String {
+ let uptime = self.uptime_seconds();
+ let status = if self.health_ok.load(Ordering::Relaxed) {
+ "ok"
+ } else {
+ "degraded"
+ };
+ let version = env!("CARGO_PKG_VERSION");
+ let build_hash = crate::BUILD_HASH;
+ format!(
+ r#"{{"status":"{status}","uptime_seconds":{uptime:.3},"version":"{version}","build_hash":"{build_hash}"}}"#
+ )
+ }
+}
+
+#[cfg(test)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn new_creates_healthy_state() {
+ let state = MetricsState::new();
+ assert!(state.health_ok.load(Ordering::Relaxed));
+ }
+
+ #[test]
+ fn uptime_is_non_negative() {
+ let state = MetricsState::new();
+ assert!(state.uptime_seconds() >= 0.0);
+ }
+
+ #[test]
+ fn health_ok_value_returns_one_when_healthy() {
+ let state = MetricsState::new();
+ assert!((state.health_ok_value() - 1.0).abs() < f64::EPSILON);
+ }
+
+ #[test]
+ fn health_ok_value_returns_zero_when_degraded() {
+ let state = MetricsState::new();
+ state.health_ok.store(false, Ordering::Relaxed);
+ assert!(state.health_ok_value().abs() < f64::EPSILON);
+ }
+
+ #[test]
+ fn render_prometheus_contains_expected_metrics() {
+ let state = MetricsState::new();
+ let output = state.render_prometheus();
+ assert!(output.contains("tidaldb_uptime_seconds"));
+ assert!(output.contains("tidaldb_health_ok"));
+ assert!(output.contains("tidaldb_info"));
+ assert!(output.contains("partition_id=\"0\""));
+ }
+
+ #[test]
+ fn render_healthz_contains_expected_fields() {
+ let state = MetricsState::new();
+ let output = state.render_healthz();
+ assert!(output.contains("\"status\":\"ok\""));
+ assert!(output.contains("\"uptime_seconds\":"));
+ assert!(output.contains("\"version\":"));
+ assert!(output.contains("\"build_hash\":"));
+ }
+
+ #[test]
+ fn render_healthz_degraded() {
+ let state = MetricsState::new();
+ state.health_ok.store(false, Ordering::Relaxed);
+ let output = state.render_healthz();
+ assert!(output.contains("\"status\":\"degraded\""));
+ }
+}
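
The renderer above writes each gauge as a `# HELP` line, a `# TYPE` line, and one labeled sample. A minimal sketch of that shape as a reusable function follows; `render_gauge` is a hypothetical helper and not part of this change, and it assumes label values need no escaping (the Prometheus format requires escaping backslashes, double quotes, and newlines in label values).

```rust
/// Render a single gauge in Prometheus text exposition format.
///
/// Hypothetical helper for illustration only. Label values are written
/// verbatim, so callers must not pass values containing `"`, `\`, or newlines.
fn render_gauge(name: &str, help: &str, labels: &[(&str, &str)], value: f64) -> String {
    // Build `key="value"` pairs joined by commas.
    let label_str = labels
        .iter()
        .map(|(k, v)| format!("{k}=\"{v}\""))
        .collect::<Vec<_>>()
        .join(",");
    // `{{` and `}}` are literal braces around the label set.
    format!("# HELP {name} {help}\n# TYPE {name} gauge\n{name}{{{label_str}}} {value}\n")
}
```

A helper like this could keep the three metric blocks consistent as more gauges are added, at the cost of one small allocation per label set.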
diff --git a/tidal/src/db/mod.rs b/tidal/src/db/mod.rs
new file mode 100644
index 0000000..0dc3d1f
--- /dev/null
+++ b/tidal/src/db/mod.rs
@@ -0,0 +1,210 @@
+//! The public entry point for tidalDB.
+//!
+//! This module provides [`TidalDb`] (the database handle) and
+//! [`TidalDbBuilder`] (the fluent construction API). All interaction
+//! with tidalDB starts here.
+//!
+//! # Quick Start
+//!
+//! ```rust,no_run
+//! use tidaldb::TidalDb;
+//!
+//! // In-memory database for tests:
+//! let db = TidalDb::builder().ephemeral().open().unwrap();
+//! assert!(db.health_check().is_ok());
+//! ```
+
+pub mod builder;
+pub mod config;
+#[cfg(feature = "metrics")]
+pub mod http;
+pub mod metrics;
+pub mod paths;
+#[cfg(any(test, feature = "test-utils"))]
+pub mod temp;
+
+pub use builder::TidalDbBuilder;
+pub use config::{Config, ConfigError, StorageMode};
+pub use metrics::MetricsState;
+pub use paths::Paths;
+#[cfg(any(test, feature = "test-utils"))]
+pub use temp::TempTidalHome;
+
+use std::sync::Arc;
+use std::sync::atomic::{AtomicBool, Ordering};
+
+/// A tidalDB database instance.
+///
+/// Created via [`TidalDb::builder()`]. At M0 this is a thin handle that
+/// validates configuration and proves the builder API works. Future
+/// milestones will wire in the storage engine, signal ledger, and query
+/// executor behind this facade.
+///
+/// # Shutdown
+///
+/// Call [`close`](Self::close) for explicit shutdown. If dropped without
+/// calling `close`, the [`Drop`] implementation will run cleanup and log
+/// any errors via `tracing::error!`.
+pub struct TidalDb {
+ config: Config,
+ /// Whether `close()` has been called. Prevents double-shutdown.
+ closed: AtomicBool,
+ /// Runtime metrics shared with the optional HTTP server.
+ metrics: Arc<MetricsState>,
+ /// Handle to the metrics HTTP server thread (metrics feature only).
+ #[cfg(feature = "metrics")]
+ metrics_handle: Option<http::MetricsHandle>,
+}
+
+impl TidalDb {
+ /// Returns a new [`TidalDbBuilder`] with default (ephemeral) configuration.
+ #[must_use]
+ pub fn builder() -> TidalDbBuilder {
+ TidalDbBuilder::new()
+ }
+
+ /// Construct a `TidalDb` from a validated configuration.
+ ///
+ /// This is `pub(crate)` -- external callers use the builder.
+ #[allow(clippy::missing_const_for_fn)] // Arc field prevents const in practice
+ pub(crate) fn from_config(
+ config: Config,
+ metrics: Arc<MetricsState>,
+ #[cfg(feature = "metrics")] metrics_handle: Option<http::MetricsHandle>,
+ ) -> Self {
+ Self {
+ config,
+ closed: AtomicBool::new(false),
+ metrics,
+ #[cfg(feature = "metrics")]
+ metrics_handle,
+ }
+ }
+
+ /// Returns a reference to the shared metrics state.
+ #[must_use]
+ #[allow(clippy::missing_const_for_fn)] // Arc field prevents const in practice
+ pub fn metrics(&self) -> &Arc<MetricsState> {
+ &self.metrics
+ }
+
+ /// Returns the bound address of the metrics HTTP server, if running.
+ ///
+ /// Useful when port 0 was requested to discover the OS-assigned port.
+ /// Returns `None` if the `metrics` feature is disabled or if
+ /// `enable_metrics` was not called on the builder.
+ #[must_use]
+ #[allow(clippy::missing_const_for_fn)] // cfg-gated body prevents const
+ pub fn metrics_addr(&self) -> Option<std::net::SocketAddr> {
+ #[cfg(feature = "metrics")]
+ {
+ self.metrics_handle.as_ref().map(|h| h.addr)
+ }
+ #[cfg(not(feature = "metrics"))]
+ {
+ None
+ }
+ }
+
+ /// Returns `Ok(())` if the database is initialized and operational.
+ ///
+ /// At M0 this simply confirms the handle was constructed successfully.
+ /// Future milestones will verify storage engine connectivity, WAL
+ /// integrity, and index health.
+ ///
+ /// # Errors
+ ///
+ /// Returns an error if the database has been closed or an internal
+ /// check fails.
+ #[tracing::instrument(skip(self))]
+ pub fn health_check(&self) -> crate::Result<()> {
+ if self.closed.load(Ordering::Acquire) {
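+ // Ordering::Release pairs with Acquire loads of `health_ok`, so
+ // such readers see the degraded state after we mark it here. The
+ // renderers in `MetricsState` load with Relaxed and only need
+ // the flag's value.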
+ self.metrics.health_ok.store(false, Ordering::Release);
+ return Err(crate::LumenError::Internal(
+ "database is closed".to_string(),
+ ));
+ }
+ Ok(())
+ }
+
+ /// Cleanly shut down the database.
+ ///
+ /// At M0 this is a no-op beyond marking the instance as closed.
+ /// Future milestones will drain the WAL, flush the storage engine,
+ /// and persist index state.
+ ///
+ /// # Errors
+ ///
+ /// Returns an error if shutdown encounters a failure (e.g., WAL flush
+ /// fails in future milestones).
+ #[tracing::instrument(skip(self))]
+ pub fn close(self) -> crate::Result<()> {
+ self.shutdown_inner()
+ }
+
+ /// Internal shutdown logic shared by `close()` and `Drop`.
+ ///
+ /// Returns `Result` even though M0 is infallible -- future milestones
+ /// add WAL drain and storage flush which can fail.
+ #[allow(clippy::unnecessary_wraps)]
+ fn shutdown_inner(&self) -> crate::Result<()> {
+ // Swap from false to true. If it was already true, we already shut down.
+ if self
+ .closed
+ .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
+ .is_err()
+ {
+ // Already closed -- idempotent, not an error.
+ return Ok(());
+ }
+
+ tracing::debug!(mode = ?self.config.mode, "tidaldb shutting down");
+
+ // Mark health as degraded so the metrics endpoint reflects shutdown.
+ self.metrics.health_ok.store(false, Ordering::Release);
+
+ // The metrics HTTP server is not stopped here: stopping the handle
+ // needs `&mut self`, and this method only has `&self`. `Drop` stops
+ // it instead, and `close(self)` drops the instance on return, so
+ // both shutdown paths reach `handle.stop()`.
+
+ // M0: nothing to flush. Future milestones add WAL drain, storage
+ // engine flush, and index persistence here.
+
+ Ok(())
+ }
+}
+
+impl Drop for TidalDb {
+ fn drop(&mut self) {
+ // Stop metrics HTTP server if still running.
+ #[cfg(feature = "metrics")]
+ if let Some(ref mut handle) = self.metrics_handle {
+ handle.stop();
+ }
+
+ if let Err(e) = self.shutdown_inner() {
+ tracing::error!(error = %e, "error during tidaldb shutdown");
+ }
+ }
+}
+
+impl std::fmt::Debug for TidalDb {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ f.debug_struct("TidalDb")
+ .field("mode", &self.config.mode)
+ .field("closed", &self.closed.load(Ordering::Relaxed))
+ .finish_non_exhaustive()
+ }
+}
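
The idempotent shutdown in `shutdown_inner` relies on a single `compare_exchange` from `false` to `true`: only the caller that wins the swap runs cleanup, and every later caller returns early. A minimal, self-contained sketch of that guard follows; `Handle` and its `cleanups` counter are hypothetical stand-ins, not types from this change.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical stand-in for a handle whose shutdown must run at most once.
struct Handle {
    closed: AtomicBool,
    /// Counts how many times cleanup actually ran (for demonstration).
    cleanups: AtomicUsize,
}

impl Handle {
    fn new() -> Self {
        Self {
            closed: AtomicBool::new(false),
            cleanups: AtomicUsize::new(0),
        }
    }

    /// Returns `true` if this call performed the shutdown, `false` if
    /// the handle was already closed.
    fn shutdown(&self) -> bool {
        // Swap false -> true. Failure means another caller already won.
        if self
            .closed
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_err()
        {
            return false;
        }
        // Only the winning caller reaches this point.
        self.cleanups.fetch_add(1, Ordering::Relaxed);
        true
    }
}
```

Calling `shutdown` twice on the same `Handle` runs the cleanup once; the second call reports `false` rather than an error, mirroring the "idempotent, not an error" choice in the code above.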
diff --git a/tidal/src/db/paths.rs b/tidal/src/db/paths.rs
new file mode 100644
index 0000000..2cb31ed
--- /dev/null
+++ b/tidal/src/db/paths.rs
@@ -0,0 +1,177 @@
+//! Resolved filesystem paths for a single tidalDB instance.
+//!
+//! [`Paths`] is the single source of truth for directory layout. Both the
+//! [`TidalDbBuilder`](super::TidalDbBuilder) and the future CLI use it to
+//! resolve subdirectory locations from a base path.
+
+use std::path::{Path, PathBuf};
+
+/// Resolved filesystem paths for a single tidalDB instance.
+///
+/// All paths derive from a single base directory. Use [`Paths::new`] to
+/// construct from a base path. Directories are NOT created automatically --
+/// call [`Paths::ensure_all`] to create them.
+///
+/// # Directory Layout
+///
+/// | Directory | Purpose |
+/// |-------------------|---------------------------------------------------|
+/// | `{base}/wal` | Write-ahead log segments |
+/// | `{base}/items` | fjall keyspace for item entities |
+/// | `{base}/users` | fjall keyspace for user entities |
+/// | `{base}/creators` | fjall keyspace for creator entities |
+/// | `{base}/cache` | Materialized views and secondary indexes (future) |
+pub struct Paths {
+ base: PathBuf,
+}
+
+impl Paths {
+ /// Create a new `Paths` from the given base directory.
+ ///
+ /// No filesystem operations are performed. Call [`ensure_all`](Self::ensure_all)
+ /// to create the directories.
+ #[must_use]
+ pub fn new(base: impl Into<PathBuf>) -> Self {
+ Self { base: base.into() }
+ }
+
+ /// The base directory from which all subdirectories are derived.
+ #[must_use]
+ pub fn base(&self) -> &Path {
+ &self.base
+ }
+
+ /// Path to the write-ahead log directory: `{base}/wal`.
+ #[must_use]
+ pub fn wal_dir(&self) -> PathBuf {
+ self.base.join("wal")
+ }
+
+ /// Path to the item entities directory: `{base}/items`.
+ #[must_use]
+ pub fn items_dir(&self) -> PathBuf {
+ self.base.join("items")
+ }
+
+ /// Path to the user entities directory: `{base}/users`.
+ #[must_use]
+ pub fn users_dir(&self) -> PathBuf {
+ self.base.join("users")
+ }
+
+ /// Path to the creator entities directory: `{base}/creators`.
+ #[must_use]
+ pub fn creators_dir(&self) -> PathBuf {
+ self.base.join("creators")
+ }
+
+ /// Path to the cache directory: `{base}/cache`.
+ #[must_use]
+ pub fn cache_dir(&self) -> PathBuf {
+ self.base.join("cache")
+ }
+
+ /// Create all subdirectories under the base path.
+ ///
+ /// Idempotent: does not fail if directories already exist. Uses
+ /// `std::fs::create_dir_all` so intermediate directories are also created.
+ ///
+ /// # Errors
+ ///
+ /// Returns `std::io::Error` if directory creation fails (e.g., permission
+ /// denied, disk full).
+ pub fn ensure_all(&self) -> Result<(), std::io::Error> {
+ let dirs = [
+ self.wal_dir(),
+ self.items_dir(),
+ self.users_dir(),
+ self.creators_dir(),
+ self.cache_dir(),
+ ];
+ for dir in &dirs {
+ std::fs::create_dir_all(dir)?;
+ }
+ Ok(())
+ }
+}
+
+impl std::fmt::Debug for Paths {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ f.debug_struct("Paths")
+ .field("base", &self.base)
+ .field("wal", &self.wal_dir())
+ .field("items", &self.items_dir())
+ .field("users", &self.users_dir())
+ .field("creators", &self.creators_dir())
+ .field("cache", &self.cache_dir())
+ .finish()
+ }
+}
+
+#[cfg(test)]
+#[allow(clippy::unwrap_used)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn paths_derive_from_base() {
+ let paths = Paths::new("/data/tidaldb");
+ assert_eq!(paths.base(), Path::new("/data/tidaldb"));
+ assert_eq!(paths.wal_dir(), PathBuf::from("/data/tidaldb/wal"));
+ assert_eq!(paths.items_dir(), PathBuf::from("/data/tidaldb/items"));
+ assert_eq!(paths.users_dir(), PathBuf::from("/data/tidaldb/users"));
+ assert_eq!(
+ paths.creators_dir(),
+ PathBuf::from("/data/tidaldb/creators")
+ );
+ assert_eq!(paths.cache_dir(), PathBuf::from("/data/tidaldb/cache"));
+ }
+
+ #[test]
+ fn paths_from_pathbuf() {
+ let base = PathBuf::from("/tmp/test");
+ let paths = Paths::new(base.clone());
+ assert_eq!(paths.base(), base.as_path());
+ }
+
+ #[test]
+ fn paths_debug_includes_all_dirs() {
+ let paths = Paths::new("/data/tidaldb");
+ let debug = format!("{paths:?}");
+ assert!(debug.contains("base"), "Debug should show base");
+ assert!(debug.contains("wal"), "Debug should show wal");
+ assert!(debug.contains("items"), "Debug should show items");
+ assert!(debug.contains("users"), "Debug should show users");
+ assert!(debug.contains("creators"), "Debug should show creators");
+ assert!(debug.contains("cache"), "Debug should show cache");
+ }
+
+ #[test]
+ fn paths_ensure_all_creates_directories() {
+ let tmp = tempfile::tempdir().unwrap();
+ let paths = Paths::new(tmp.path());
+ paths.ensure_all().unwrap();
+
+ assert!(paths.wal_dir().is_dir());
+ assert!(paths.items_dir().is_dir());
+ assert!(paths.users_dir().is_dir());
+ assert!(paths.creators_dir().is_dir());
+ assert!(paths.cache_dir().is_dir());
+ }
+
+ #[test]
+ fn paths_ensure_all_is_idempotent() {
+ let tmp = tempfile::tempdir().unwrap();
+ let paths = Paths::new(tmp.path());
+ paths.ensure_all().unwrap();
+ // Second call should not error.
+ paths.ensure_all().unwrap();
+ }
+
+ #[test]
+ fn paths_trailing_slash_handled() {
+ let paths = Paths::new("/data/tidaldb/");
+ // PathBuf::join handles trailing slashes correctly.
+ assert_eq!(paths.wal_dir(), PathBuf::from("/data/tidaldb/wal"));
+ }
+}
diff --git a/tidal/src/db/temp.rs b/tidal/src/db/temp.rs
new file mode 100644
index 0000000..0109c9a
--- /dev/null
+++ b/tidal/src/db/temp.rs
@@ -0,0 +1,186 @@
+//! Temporary tidalDB data directory for tests.
+//!
+//! Provides [`TempTidalHome`], a uniquely-named temporary directory that
+//! is cleaned up on drop (unless `preserve` is set). This ensures test
+//! isolation: every test gets its own filesystem sandbox.
+
+use std::path::{Path, PathBuf};
+
+use super::paths::Paths;
+
+/// A temporary tidalDB data directory for use in tests.
+///
+/// Automatically cleaned up when dropped, unless `preserve` is set.
+/// Each instance gets a unique directory under the OS temp root,
+/// ensuring test isolation.
+///
+/// # Examples
+///
+/// ```rust,no_run
+/// # use tidaldb::db::temp::TempTidalHome;
+/// let home = TempTidalHome::new().unwrap();
+/// let paths = home.paths();
+/// paths.ensure_all().unwrap();
+/// // ... run test ...
+/// // `home` dropped here -> directory removed
+/// ```
+pub struct TempTidalHome {
+ base: PathBuf,
+ preserve: bool,
+}
+
+impl TempTidalHome {
+ /// Create a new temporary directory with a unique name.
+ ///
+ /// The directory is created under the OS temp root with prefix `tidaldb-`.
+ /// It will be deleted when this value is dropped.
+ ///
+ /// # Errors
+ ///
+ /// Returns `std::io::Error` if the temp directory cannot be created.
+ pub fn new() -> Result<Self, std::io::Error> {
+ let dir = tempfile::Builder::new()
+ .prefix("tidaldb-")
+ .tempdir()?
+ .keep();
+ Ok(Self {
+ base: dir,
+ preserve: false,
+ })
+ }
+
+ /// Create a temporary directory that is NOT deleted on drop.
+ ///
+ /// Useful for debugging: the directory survives so you can inspect
+ /// its contents after a test failure.
+ ///
+ /// # Errors
+ ///
+ /// Returns `std::io::Error` if the temp directory cannot be created.
+ pub fn with_preserve() -> Result<Self, std::io::Error> {
+ let dir = tempfile::Builder::new()
+ .prefix("tidaldb-")
+ .tempdir()?
+ .keep();
+ Ok(Self {
+ base: dir,
+ preserve: true,
+ })
+ }
+
+ /// The root path of this temporary directory.
+ #[must_use]
+ pub fn path(&self) -> &Path {
+ &self.base
+ }
+
+ /// Construct a [`Paths`] rooted at this temporary directory.
+ #[must_use]
+ pub fn paths(&self) -> Paths {
+ Paths::new(self.base.clone())
+ }
+}
+
+impl Drop for TempTidalHome {
+ fn drop(&mut self) {
+ if !self.preserve
+ && let Err(e) = std::fs::remove_dir_all(&self.base)
+ {
+ tracing::warn!(
+ path = %self.base.display(),
+ error = %e,
+ "failed to clean up TempTidalHome"
+ );
+ }
+ }
+}
+
+impl std::fmt::Debug for TempTidalHome {
+ fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+ f.debug_struct("TempTidalHome")
+ .field("base", &self.base)
+ .field("preserve", &self.preserve)
+ .finish()
+ }
+}
+
+#[cfg(test)]
+#[allow(clippy::unwrap_used)]
+mod tests {
+ use super::*;
+
+ #[test]
+ fn new_creates_directory() {
+ let home = TempTidalHome::new().unwrap();
+ assert!(home.path().exists());
+ assert!(home.path().is_dir());
+ }
+
+ #[test]
+ fn path_contains_tidaldb_prefix() {
+ let home = TempTidalHome::new().unwrap();
+ let name = home
+ .path()
+ .file_name()
+ .unwrap()
+ .to_string_lossy()
+ .to_string();
+ assert!(
+ name.starts_with("tidaldb-"),
+ "expected tidaldb- prefix, got: {name}"
+ );
+ }
+
+ #[test]
+ fn paths_returns_paths_rooted_at_base() {
+ let home = TempTidalHome::new().unwrap();
+ let paths = home.paths();
+ assert_eq!(paths.base(), home.path());
+ }
+
+ #[test]
+ fn drop_removes_directory() {
+ let path = {
+ let home = TempTidalHome::new().unwrap();
+ let p = home.path().to_owned();
+ // Write a file so the dir is non-empty
+ std::fs::write(p.join("testfile"), b"data").unwrap();
+ assert!(p.exists());
+ p
+ // home dropped here
+ };
+ assert!(!path.exists(), "directory should be removed after drop");
+ }
+
+ #[test]
+ fn with_preserve_keeps_directory() {
+ let path = {
+ let home = TempTidalHome::with_preserve().unwrap();
+ let p = home.path().to_owned();
+ std::fs::write(p.join("testfile"), b"data").unwrap();
+ p
+ // home dropped here — but preserve=true
+ };
+ assert!(
+ path.exists(),
+ "directory should still exist after drop with preserve"
+ );
+ // Manual cleanup since preserve=true
+ std::fs::remove_dir_all(&path).unwrap();
+ }
+
+ #[test]
+ fn two_homes_have_different_paths() {
+ let home1 = TempTidalHome::new().unwrap();
+ let home2 = TempTidalHome::new().unwrap();
+ assert_ne!(home1.path(), home2.path());
+ }
+
+ #[test]
+ fn debug_output() {
+ let home = TempTidalHome::new().unwrap();
+ let debug = format!("{home:?}");
+ assert!(debug.contains("TempTidalHome"));
+ assert!(debug.contains("preserve"));
+ }
+}
diff --git a/tidal/src/lib.rs b/tidal/src/lib.rs
index ea86b38..0ec7676 100644
--- a/tidal/src/lib.rs
+++ b/tidal/src/lib.rs
@@ -1,3 +1,4 @@
+pub mod db;
pub mod query;
pub mod ranking;
pub mod schema;
@@ -5,6 +6,20 @@ pub mod signals;
pub mod storage;
pub mod wal;
+/// Build hash compiled in from the `TIDALDB_BUILD_HASH` environment variable.
+///
+/// Falls back to `"dev"` if `TIDALDB_BUILD_HASH` is unset at compile time or
+/// `build.rs` is not invoked.
+pub const BUILD_HASH: &str = match option_env!("TIDALDB_BUILD_HASH") {
+ Some(v) => v,
+ None => "dev",
+};
+
+#[cfg(any(test, feature = "test-utils"))]
+pub use db::TempTidalHome;
+#[cfg(feature = "metrics")]
+pub use db::http::MetricsHandle;
+pub use db::metrics::MetricsState;
+pub use db::{Config, ConfigError, Paths, StorageMode, TidalDb, TidalDbBuilder};
pub use schema::LumenError;
/// Crate-wide result type. All public API methods return `Result`.
diff --git a/tidal/src/schema/error.rs b/tidal/src/schema/error.rs
index de73354..8857452 100644
--- a/tidal/src/schema/error.rs
+++ b/tidal/src/schema/error.rs
@@ -1,6 +1,7 @@
use std::fmt;
use super::{EntityId, EntityKind};
+use crate::db::ConfigError;
/// Top-level error type. Every public API method returns `Result`.
#[derive(Debug)]
@@ -15,6 +16,8 @@ pub enum LumenError {
Durability(DurabilityError),
/// Query malformed. Parse error with details.
Query(QueryError),
+ /// Configuration error. Caller supplied invalid config.
+ Config(ConfigError),
/// Internal invariant violated. This is a bug in Lumen.
Internal(String),
}
@@ -27,6 +30,7 @@ impl fmt::Display for LumenError {
Self::Schema(e) => write!(f, "{e}"),
Self::Durability(e) => write!(f, "durability error: {e}"),
Self::Query(e) => write!(f, "query error: {e}"),
+ Self::Config(e) => write!(f, "config error: {e}"),
Self::Internal(msg) => write!(f, "internal error: {msg}"),
}
}
@@ -39,6 +43,7 @@ impl std::error::Error for LumenError {
Self::Schema(e) => Some(e),
Self::Durability(e) => Some(e),
Self::Query(e) => Some(e),
+ Self::Config(e) => Some(e),
Self::NotFound { .. } | Self::Internal(_) => None,
}
}
@@ -68,6 +73,12 @@ impl From for LumenError {
}
}
+impl From<ConfigError> for LumenError {
+ fn from(e: ConfigError) -> Self {
+ Self::Config(e)
+ }
+}
+
/// Schema validation errors.
///
/// `Eq` is manually implemented because f64 fields (from `Duration::as_secs_f64()`)
@@ -91,6 +102,8 @@ pub enum SchemaError {
signal_name: String,
},
NoSignalsDefined,
+ /// Signal type name not found in schema at runtime.
+ UnknownSignalType(String),
}
impl Eq for SchemaError {}
@@ -135,6 +148,9 @@ impl fmt::Display for SchemaError {
)
}
Self::NoSignalsDefined => f.write_str("schema must define at least one signal"),
+ Self::UnknownSignalType(name) => {
+ write!(f, "unknown signal type: '{name}'")
+ }
}
}
}
diff --git a/tidal/src/schema/score.rs b/tidal/src/schema/score.rs
index 9703d8a..10ff5db 100644
--- a/tidal/src/schema/score.rs
+++ b/tidal/src/schema/score.rs
@@ -60,6 +60,7 @@ impl fmt::Debug for Score {
}
#[cfg(test)]
+#[allow(clippy::unwrap_used, clippy::float_cmp)]
mod tests {
use super::*;
@@ -103,14 +104,14 @@ mod tests {
#[test]
fn display_format() {
- let s = Score::new(3.14).unwrap();
- assert_eq!(s.to_string(), "3.140000");
+ let s = Score::new(std::f64::consts::PI).unwrap();
+ assert_eq!(s.to_string(), "3.141593");
}
#[test]
fn debug_format() {
- let s = Score::new(3.14).unwrap();
- assert_eq!(format!("{s:?}"), "Score(3.140000)");
+ let s = Score::new(std::f64::consts::PI).unwrap();
+ assert_eq!(format!("{s:?}"), "Score(3.141593)");
}
#[test]
diff --git a/tidal/src/schema/signal.rs b/tidal/src/schema/signal.rs
index 06dc3e4..1a402d4 100644
--- a/tidal/src/schema/signal.rs
+++ b/tidal/src/schema/signal.rs
@@ -253,6 +253,7 @@ impl<'a> IntoIterator for &'a WindowSet {
}
#[cfg(test)]
+#[allow(clippy::unwrap_used, clippy::float_cmp)]
mod tests {
use super::*;
diff --git a/tidal/src/schema/timestamp.rs b/tidal/src/schema/timestamp.rs
index 8ceff0b..52f0c67 100644
--- a/tidal/src/schema/timestamp.rs
+++ b/tidal/src/schema/timestamp.rs
@@ -76,6 +76,7 @@ impl fmt::Debug for Timestamp {
}
#[cfg(test)]
+#[allow(clippy::float_cmp)]
mod tests {
use super::*;
diff --git a/tidal/src/schema/validation.rs b/tidal/src/schema/validation.rs
index 3a50794..f15cc4b 100644
--- a/tidal/src/schema/validation.rs
+++ b/tidal/src/schema/validation.rs
@@ -243,7 +243,7 @@ fn is_valid_signal_name(name: &str) -> bool {
}
#[cfg(test)]
-#[allow(unused_must_use)]
+#[allow(unused_must_use, clippy::unwrap_used)]
mod tests {
use super::*;
diff --git a/tidal/src/signals/checkpoint.rs b/tidal/src/signals/checkpoint.rs
new file mode 100644
index 0000000..705afe6
--- /dev/null
+++ b/tidal/src/signals/checkpoint.rs
@@ -0,0 +1,856 @@
+//! Checkpoint and restore for the `SignalLedger`.
+//!
+//! # Checkpoint
+//!
+//! `SignalLedger::checkpoint()` serializes all in-memory signal state to the
+//! `StorageEngine` as a single atomic `WriteBatch`. No partial checkpoints are
+//! possible: either the whole ledger is written or nothing is.
+//!
+//! # Restore
+//!
+//! `SignalLedger::restore()` scans the storage, filters for `Tag::Sig` keys,
+//! deserializes each entry, and populates the `DashMap`. Returns the checkpoint
+//! metadata (for WAL replay) or `None` if no checkpoint exists (first boot).
+//!
+//! # Binary format
+//!
+//! Each entry serializes as a 983-byte fixed-length record.
+//! The checkpoint metadata serializes as a 17-byte record at a well-known key.
+//! All payload values use little-endian byte order; storage keys use big-endian
+//! (the existing `encode_key` convention). A version byte at offset 0 enables
+//! future backward-compatible format changes.
+
+use crate::schema::{EntityId, LumenError};
+use crate::storage::{StorageEngine, Tag, WriteBatch, encode_key, parse_key};
+
+use super::SignalTypeId;
+use super::hot::HotSignalState;
+use super::ledger::{EntitySignalEntry, SignalLedger};
+use super::warm::{BucketedCounter, BucketedCounterSnapshot, HOUR_BUCKETS, MINUTE_BUCKETS};
+
+// ── Constants ─────────────────────────────────────────────────────────────────
+
+const VERSION: u8 = 0x01;
+const ENTRY_SIZE: usize = 983;
+const META_SIZE: usize = 17;
+const META_SUFFIX: &[u8] = b"meta";
+
+/// Bit 0 of `flags` field: velocity tracking is enabled for this signal.
+const FLAG_VELOCITY_ENABLED: u16 = 0x0001;
+
+// ── CheckpointMeta ────────────────────────────────────────────────────────────
+
+/// Checkpoint sequence metadata stored alongside the signal state.
+///
+/// Used by the WAL replay mechanism to know where to start replaying.
+/// Events with `wal_sequence > checkpoint.wal_sequence` must be replayed
+/// after `restore()` to bring the ledger's state fully up to date.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub struct CheckpointMeta {
+ /// Nanosecond timestamp when the checkpoint was taken.
+ pub checkpoint_time_ns: u64,
+ /// WAL sequence number at checkpoint time.
+ pub wal_sequence: u64,
+}
+
+// ── Serialization ─────────────────────────────────────────────────────────────
+
+/// Serialize an `EntitySignalEntry` to a 983-byte buffer.
+///
+/// # Binary layout (all payload values little-endian)
+///
+/// ```text
+/// Offset Size Field
+/// 0 1 version (0x01)
+/// 1 8 entity_id (u64 LE)
+/// 9 2 signal_type_id (u16 LE)
+/// 11 2 flags (u16 LE) — bit 0: velocity_enabled
+/// 13 8 last_update_ns (u64 LE)
+/// 21 8 decay_score_0 (f64 bits LE)
+/// 29 8 decay_score_1 (f64 bits LE)
+/// 37 8 decay_score_2 (f64 bits LE)
+/// 45 1 current_minute (u8)
+/// 46 1 current_hour (u8)
+/// 47 8 all_time_count (u64 LE)
+/// 55 8 last_minute_rotation_ns (u64 LE)
+/// 63 8 last_hour_rotation_ns (u64 LE)
+/// 71 240 minute_buckets (60 × u32 LE)
+/// 311 672 hour_buckets (168 × u32 LE)
+/// Total: 983 bytes
+/// ```
+#[must_use]
+pub fn serialize_entry(
+ entity_id: EntityId,
+ signal_type_id: SignalTypeId,
+ entry: &EntitySignalEntry,
+) -> Vec<u8> {
+ let mut buf = Vec::with_capacity(ENTRY_SIZE);
+
+ // [0] version
+ buf.push(VERSION);
+
+ // [1..9] entity_id LE
+ buf.extend_from_slice(&entity_id.as_u64().to_le_bytes());
+
+ // [9..11] signal_type_id LE
+ buf.extend_from_slice(&signal_type_id.as_u16().to_le_bytes());
+
+ // [11..13] flags LE — derived from hot-tier immutable fields
+ let flags: u16 = if entry.hot.velocity_enabled() {
+ FLAG_VELOCITY_ENABLED
+ } else {
+ 0
+ };
+ buf.extend_from_slice(&flags.to_le_bytes());
+
+ // [13..21] last_update_ns LE
+ buf.extend_from_slice(&entry.hot.last_update_ns().to_le_bytes());
+
+ // [21..45] three decay scores as f64 bits LE
+ for i in 0..3 {
+ buf.extend_from_slice(&entry.hot.stored_score(i).to_bits().to_le_bytes());
+ }
+
+ // Snapshot warm tier (atomic reads of all bucket state)
+ let snap = entry.warm.snapshot();
+
+ // [45] current_minute (u8)
+ buf.push(snap.current_minute);
+
+ // [46] current_hour (u8)
+ buf.push(snap.current_hour);
+
+ // [47..55] all_time_count LE
+ buf.extend_from_slice(&snap.all_time_count.to_le_bytes());
+
+ // [55..63] last_minute_rotation_ns LE
+ buf.extend_from_slice(&snap.last_minute_rotation_ns.to_le_bytes());
+
+ // [63..71] last_hour_rotation_ns LE
+ buf.extend_from_slice(&snap.last_hour_rotation_ns.to_le_bytes());
+
+ // [71..311] minute_buckets (60 × u32 LE = 240 bytes)
+ for &bucket in &snap.minute_buckets {
+ buf.extend_from_slice(&bucket.to_le_bytes());
+ }
+
+ // [311..983] hour_buckets (168 × u32 LE = 672 bytes)
+ for &bucket in &snap.hour_buckets {
+ buf.extend_from_slice(&bucket.to_le_bytes());
+ }
+
+ debug_assert_eq!(buf.len(), ENTRY_SIZE, "serialize_entry produced wrong size");
+ buf
+}
+
+/// Deserialize an `EntitySignalEntry` from bytes.
+///
+/// Returns `(entity_id, signal_type_id, entry)` on success.
+///
+/// # Errors
+///
+/// Returns `Err` if:
+/// - The slice is not exactly `ENTRY_SIZE` (983) bytes
+/// - The version byte is not `0x01`
+/// - Any sub-slice conversion fails due to offset math errors
+pub fn deserialize_entry(
+ bytes: &[u8],
+) -> Result<(EntityId, SignalTypeId, EntitySignalEntry), String> {
+ if bytes.len() != ENTRY_SIZE {
+ return Err(format!("expected {ENTRY_SIZE} bytes, got {}", bytes.len()));
+ }
+
+ // [0] version check
+ if bytes[0] != VERSION {
+ return Err(format!(
+ "unknown checkpoint version 0x{:02x}, expected 0x{:02x}",
+ bytes[0], VERSION
+ ));
+ }
+
+ // [1..9] entity_id LE
+ let entity_id_val = u64::from_le_bytes(
+ bytes[1..9]
+ .try_into()
+ .map_err(|_| "offset math error at entity_id [1..9]".to_string())?,
+ );
+ let entity_id = EntityId::new(entity_id_val);
+
+ // [9..11] signal_type_id LE
+ let signal_type_id_val = u16::from_le_bytes(
+ bytes[9..11]
+ .try_into()
+ .map_err(|_| "offset math error at signal_type_id [9..11]".to_string())?,
+ );
+ let signal_type_id = SignalTypeId::new(signal_type_id_val);
+
+ // [11..13] flags LE
+ let flags = u16::from_le_bytes(
+ bytes[11..13]
+ .try_into()
+ .map_err(|_| "offset math error at flags [11..13]".to_string())?,
+ );
+ let velocity_enabled = (flags & FLAG_VELOCITY_ENABLED) != 0;
+
+ // [13..21] last_update_ns LE
+ let last_update_ns = u64::from_le_bytes(
+ bytes[13..21]
+ .try_into()
+ .map_err(|_| "offset math error at last_update_ns [13..21]".to_string())?,
+ );
+
+ // [21..45] three decay scores as f64 bits LE
+ let score_0 = f64::from_bits(u64::from_le_bytes(
+ bytes[21..29]
+ .try_into()
+ .map_err(|_| "offset math error at score_0 [21..29]".to_string())?,
+ ));
+ let score_1 = f64::from_bits(u64::from_le_bytes(
+ bytes[29..37]
+ .try_into()
+ .map_err(|_| "offset math error at score_1 [29..37]".to_string())?,
+ ));
+ let score_2 = f64::from_bits(u64::from_le_bytes(
+ bytes[37..45]
+ .try_into()
+ .map_err(|_| "offset math error at score_2 [37..45]".to_string())?,
+ ));
+
+ // [45] current_minute (u8)
+ let current_minute = bytes[45];
+
+ // [46] current_hour (u8)
+ let current_hour = bytes[46];
+
+ // [47..55] all_time_count LE
+ let all_time_count = u64::from_le_bytes(
+ bytes[47..55]
+ .try_into()
+ .map_err(|_| "offset math error at all_time_count [47..55]".to_string())?,
+ );
+
+ // [55..63] last_minute_rotation_ns LE
+ let last_minute_rotation_ns = u64::from_le_bytes(
+ bytes[55..63]
+ .try_into()
+ .map_err(|_| "offset math error at last_minute_rotation_ns [55..63]".to_string())?,
+ );
+
+ // [63..71] last_hour_rotation_ns LE
+ let last_hour_rotation_ns = u64::from_le_bytes(
+ bytes[63..71]
+ .try_into()
+ .map_err(|_| "offset math error at last_hour_rotation_ns [63..71]".to_string())?,
+ );
+
+ // [71..311] minute_buckets (60 × u32 LE)
+ let mut minute_buckets = [0u32; MINUTE_BUCKETS];
+ for (i, bucket) in minute_buckets.iter_mut().enumerate() {
+ let off = 71 + i * 4;
+ *bucket = u32::from_le_bytes(bytes[off..off + 4].try_into().map_err(|_| {
+ format!(
+ "offset math error at minute_bucket[{i}] [{off}..{}]",
+ off + 4
+ )
+ })?);
+ }
+
+ // [311..983] hour_buckets (168 × u32 LE)
+ let mut hour_buckets = [0u32; HOUR_BUCKETS];
+ for (i, bucket) in hour_buckets.iter_mut().enumerate() {
+ let off = 311 + i * 4;
+ *bucket =
+ u32::from_le_bytes(bytes[off..off + 4].try_into().map_err(|_| {
+ format!("offset math error at hour_bucket[{i}] [{off}..{}]", off + 4)
+ })?);
+ }
+
+ // Reconstruct hot tier
+ let hot = HotSignalState::with_flags(entity_id_val, signal_type_id_val, velocity_enabled);
+ hot.restore(last_update_ns, &[score_0, score_1, score_2]);
+
+ // Reconstruct warm tier from snapshot
+ let warm = BucketedCounter::new();
+ warm.restore(&BucketedCounterSnapshot {
+ minute_buckets,
+ hour_buckets,
+ current_minute,
+ current_hour,
+ all_time_count,
+ last_minute_rotation_ns,
+ last_hour_rotation_ns,
+ });
+
+ Ok((entity_id, signal_type_id, EntitySignalEntry { hot, warm }))
+}
+
+/// Serialize `CheckpointMeta` to a 17-byte buffer.
+///
+/// Format: `[version: 1][checkpoint_time_ns: 8 LE][wal_sequence: 8 LE]`
+#[must_use]
+pub fn serialize_meta(meta: &CheckpointMeta) -> Vec<u8> {
+ let mut buf = Vec::with_capacity(META_SIZE);
+ buf.push(VERSION);
+ buf.extend_from_slice(&meta.checkpoint_time_ns.to_le_bytes());
+ buf.extend_from_slice(&meta.wal_sequence.to_le_bytes());
+ debug_assert_eq!(buf.len(), META_SIZE);
+ buf
+}
+
+/// Deserialize `CheckpointMeta` from bytes.
+///
+/// # Errors
+///
+/// Returns `Err` if the slice is not `META_SIZE` bytes, the version byte
+/// is unknown, or any sub-slice conversion fails.
+pub fn deserialize_meta(bytes: &[u8]) -> Result<CheckpointMeta, String> {
+ if bytes.len() != META_SIZE {
+ return Err(format!(
+ "expected {META_SIZE} meta bytes, got {}",
+ bytes.len()
+ ));
+ }
+ if bytes[0] != VERSION {
+ return Err(format!(
+ "unknown checkpoint meta version 0x{:02x}, expected 0x{:02x}",
+ bytes[0], VERSION
+ ));
+ }
+ let checkpoint_time_ns = u64::from_le_bytes(
+ bytes[1..9]
+ .try_into()
+ .map_err(|_| "offset math error at checkpoint_time_ns [1..9]".to_string())?,
+ );
+ let wal_sequence = u64::from_le_bytes(
+ bytes[9..17]
+ .try_into()
+ .map_err(|_| "offset math error at wal_sequence [9..17]".to_string())?,
+ );
+ Ok(CheckpointMeta {
+ checkpoint_time_ns,
+ wal_sequence,
+ })
+}
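
Review note: the 17-byte meta layout (`[version: 1][checkpoint_time_ns: 8 LE][wal_sequence: 8 LE]`) can be checked in isolation with a round trip. The sketch below is a standalone approximation, not the crate's code: `CheckpointMeta` is a stand-in struct, the value of `VERSION` is assumed, and `read_u64_le` is a hypothetical helper introduced only for this illustration.

```rust
// Stand-ins for the crate's types and constants.
// META_SIZE follows the documented 17-byte layout; VERSION's value is an assumption.
const VERSION: u8 = 0x01;
const META_SIZE: usize = 17;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CheckpointMeta {
    checkpoint_time_ns: u64,
    wal_sequence: u64,
}

/// Encode as [version: 1][checkpoint_time_ns: 8 LE][wal_sequence: 8 LE].
fn serialize_meta(meta: &CheckpointMeta) -> Vec<u8> {
    let mut buf = Vec::with_capacity(META_SIZE);
    buf.push(VERSION);
    buf.extend_from_slice(&meta.checkpoint_time_ns.to_le_bytes());
    buf.extend_from_slice(&meta.wal_sequence.to_le_bytes());
    buf
}

/// Hypothetical helper: read a little-endian u64 starting at `start`.
/// The caller must have verified that `start + 8 <= bytes.len()`.
fn read_u64_le(bytes: &[u8], start: usize) -> u64 {
    let mut arr = [0u8; 8];
    arr.copy_from_slice(&bytes[start..start + 8]);
    u64::from_le_bytes(arr)
}

/// Decode the layout above, rejecting wrong lengths and unknown versions.
fn deserialize_meta(bytes: &[u8]) -> Result<CheckpointMeta, String> {
    if bytes.len() != META_SIZE {
        return Err(format!("expected {} meta bytes, got {}", META_SIZE, bytes.len()));
    }
    if bytes[0] != VERSION {
        return Err(format!("unknown checkpoint meta version 0x{:02x}", bytes[0]));
    }
    Ok(CheckpointMeta {
        checkpoint_time_ns: read_u64_le(bytes, 1),
        wal_sequence: read_u64_le(bytes, 9),
    })
}
```

Serializing a value and passing the bytes back through `deserialize_meta` should return the original struct, while a 16-byte slice or a buffer with a different first byte should return `Err`. Checking the length first is what lets the helper copy fixed 8-byte windows without a fallible conversion.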
+
+// ── SignalLedger impl ─────────────────────────────────────────────────────────
+
+impl SignalLedger {
+ /// Write all in-memory signal state to the storage engine atomically.
+ ///
+ /// Iterates the `DashMap` and serializes each entry into a `WriteBatch`.
+ /// The checkpoint metadata is stored at a well-known key:
+ /// `encode_key(EntityId::new(0), Tag::Sig, b"meta")`.
+ ///
+ /// # Errors
+ ///
+ /// Returns `LumenError::Storage` if the `WriteBatch` commit or `flush` fails.
+ ///
+ /// # Concurrency
+ ///
+ /// This method iterates `DashMap` shards without a global lock. Entries
+ /// written concurrently to already-snapshotted shards will be absent from
+ /// the checkpoint. The caller must supply `meta.wal_sequence` equal to the
+ /// WAL tail at checkpoint start; restore must replay from that sequence to
+ /// recover any missing entries.
+ pub fn checkpoint(
+ &self,
+ storage: &dyn StorageEngine,
+ meta: CheckpointMeta,
+ ) -> crate::Result<()> {
+ let mut batch = WriteBatch::new();
+
+ // Write checkpoint metadata at the well-known meta key.
+ let meta_key = encode_key(EntityId::new(0), Tag::Sig, META_SUFFIX);
+ batch.put(meta_key, serialize_meta(&meta));
+
+ // Write all entity-signal entries.
+ for entry_ref in self.entries() {
+ let &(entity_id, signal_type_id) = entry_ref.key();
+ let entry = entry_ref.value();
+ // Entry key suffix is the signal_type_id as 2 big-endian bytes,
+ // so it is exactly 2 bytes — never collides with b"meta" (4 bytes).
+ let suffix = signal_type_id.as_u16().to_be_bytes();
+ let key = encode_key(entity_id, Tag::Sig, &suffix);
+ let value = serialize_entry(entity_id, signal_type_id, entry);
+ batch.put(key, value);
+ }
+
+ storage.write_batch(batch)?;
+ storage.flush()?;
+ Ok(())
+ }
+
+ /// Restore in-memory signal state from the storage engine.
+ ///
+ /// Scans all keys, filters for `Tag::Sig` entries (excluding the meta key),
+ /// deserializes each entry, and inserts it into the `DashMap`.
+ ///
+ /// Returns `Some(CheckpointMeta)` if a checkpoint exists, or `None` on
+ /// first boot (empty storage).
+ ///
+ /// # Errors
+ ///
+ /// - `LumenError::Storage` on I/O failure
+ /// - `LumenError::Internal` on deserialization failure (corrupt checkpoint)
+ pub fn restore(&self, storage: &dyn StorageEngine) -> crate::Result<Option<CheckpointMeta>> {