feat: M0p1 runtime skeleton, M0p2 tooling & diagnostics, m1p4 signal ledger

## M0p1 — Embeddable Runtime Skeleton (329 tests)

- TidalDb with builder(), health_check(), close(), and Drop-based cleanup
- TidalDbBuilder fluent API: ephemeral(), with_data_dir(), wal_dir(), cache_dir()
- Config, StorageMode, ConfigError types; Config(ConfigError) variant on LumenError
- Paths: single source of truth for directory layout (wal, items, users, creators, cache)
- TempTidalHome: test isolation helper gated behind #[cfg(test)] / test-utils feature
- 8 integration tests: tests/sandboxed_storage.rs

## M0p2 — Tooling & Diagnostics (349 tests)

- Workspace root Cargo.toml (members: ["tidal", "tidalctl"])
- tidal/build.rs: BUILD_HASH from GIT_HASH with option_env!() fallback to "dev"
- MetricsState: always-compiled Arc-shared atomics (uptime, health_ok)
- MetricsHandle (metrics feature): hand-rolled TcpListener HTTP, zero new deps
- GET /healthz → {"status":"ok","uptime_secs":N}
- GET /metrics → Prometheus text (tidaldb_uptime_seconds, health_ok, info)
- TidalDbBuilder.enable_metrics(addr) starts background metrics thread
- tidalctl binary: status + paths commands, manual std::env::args() parsing
- 7 metrics integration tests, 9 tidalctl CLI tests

## m1p4 Signal Ledger (in-progress)

- SignalLedger: DashMap<(EntityId, SignalTypeId), EntitySignalEntry>, WAL-first writes
- HotSignalState: #[repr(C, align(64))], lock-free CAS decay, out-of-order handling
- BucketedCounter: 60 per-minute + 168 per-hour circular buffers, trigger-based rotation
- CheckpointMeta + serialize/restore: 983-byte fixed records, atomic WriteBatch
- Property tests: running score matches analytical to 1e-6, decay monotonic, non-negative
- Proptest regression: signals/warm.txt

## Documentation and planning

- ROADMAP: m0p1 COMPLETE (329), m0p2 COMPLETE (349), product track milestones
- PRODUCT_ROADMAP: P0-P4 product milestone track (personal briefing beachhead)
- Milestone planning docs: milestone-0 (phases 1-3), milestone-p (phases 1-5)
- docs/research/tidaldb_tooling_and_diagnostics.md
- ARCHITECTURE.md, CLAUDE.md, VISION.md updates

## Site

- Blog: every-platform-builds-the-same-6-systems.mdx (new)
- Blog: why-tidaldb.mdx (updated)
- next.config.ts, layout.tsx, blog/page.tsx updates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 20:32:00 -07:00
commit 4f076c927d
parent 29400d48db
87 changed files with 8779 additions and 122 deletions
--- a/.claude/skills/write-blog/skill.md
+++ b/.claude/skills/write-blog/skill.md
@@ -8,47 +8,75 @@ agent: tidal-storyteller
 Write and publish blog posts for tidalDB using the **tidal-storyteller** agent.
 ## Content Strategy Reference
 **Read `docs/content-strategy.md` before writing any post.** It maps every blog post idea to a specific roadmap phase, names the thesis, identifies the source material, and specifies when the post is ready to publish.
 The content strategy defines 16-20 posts across the full roadmap. Do not invent posts outside this plan without first checking whether the strategy already covers the topic. If it does, follow the strategy's guidance for that post. If the strategy has a gap, propose adding to it -- do not write an orphan post.
 ### Determining What to Write Now
 1. Check the **Current Status** section in `docs/planning/ROADMAP.md` to identify which phases are complete
 2. Cross-reference with the **Reference: Roadmap to Post Mapping** table in `docs/content-strategy.md`
 3. The post is ready to write when its roadmap phase has passed UAT and benchmark numbers exist
 4. Exception: Post 1 ("Every content platform builds the same 6 systems from scratch") can be written any time -- it depends on the problem, not the implementation
 ### Current Queue (update as phases complete)
 As of m1p3 complete, m1p4 next:
 | Priority | Post | Status |
 |----------|------|--------|
 | 1 | Post 1: "Every content platform builds the same 6 systems from scratch" | Ready -- no code dependency |
 | 2 | Post 3: "What three databases taught us before we wrote a line of code" | Ready -- m1p1-m1p3 complete, thoughts.md is source |
 | 3 | "Why we chose fjall over RocksDB (for now)" | Ready -- m1p3 complete |
 | 4 | Post 2: "Running decay scores are O(1)" | Blocked on m1p4 benchmarks |
 | 5 | Post 4: "Signals wrote 100ms ago. The query sees them now." | Blocked on m1p5 / M1 UAT |
 ## When to Use
- After completing a roadmap phase or milestone
+- After completing a roadmap phase or milestone -- check the content strategy for which post maps to that phase
- When an architectural decision deserves a public narrative
+- When an architectural decision deserves a public narrative -- check if the strategy already has an ADR planned for it
- When a benchmark result tells a compelling story
+- When a benchmark result tells a compelling story -- the strategy specifies which posts need benchmark data
 - For "building in public" devlog entries
 - When announcing a release, feature, or open-source milestone
 ## Context to Load
-Before writing, the agent must read:
+Before writing, the agent must read -- in this order:
-1. **Relevant source files** — the code that was written or changed
+
-2. **Git log** — `git log --oneline` for the period covered
+1. **`docs/content-strategy.md`** -- find the post in the strategy, read its thesis, source material, and publication criteria
-3. **Research docs** — `docs/research/` for technical backing
+2. **The source material named in the strategy** -- the specific docs, research files, and task docs listed for that post
-4. **Previous blog posts** — maintain voice consistency across posts
+3. **Relevant source files** -- the actual Rust code that was written or changed
-5. **VISION.md** — for tonal calibration (match its conviction)
+4. **Git log** -- `git log --oneline` for the period covered
-6. **thoughts.md** — for the deeper "why" behind architectural patterns
+5. **Previous blog posts** -- `site/` blog content directory for voice consistency
 6. **VISION.md** -- for tonal calibration (match its conviction)
 7. **thoughts.md** -- for the deeper "why" behind architectural patterns
 ## Blog Post Types
 ### Architecture Decision Record (ADR)
 **When:** A major architectural choice was made and the reasoning is worth sharing.
 **Strategy posts:** Post 3 (three databases), Post 7 (ranking profiles), Post 9 (negative signals), Post 12 (Tantivy), Post 16 (graceful degradation), plus the "anytime" ADRs (Why not SQL, Why fjall, USearch not from scratch).
 **Structure:**
 1. The problem in one sentence
 2. What we considered (2-3 options, honestly assessed)
-3. What we chose and why — the specific evidence
+3. What we chose and why -- the specific evidence
 4. Code showing the result
-5. What we'd watch for (risks, trade-offs acknowledged)
+5. What we would watch for (risks, trade-offs acknowledged)
 **Title pattern:** Thesis statement, not label.
- "Running decay scores are O(1) — here's the math" not "Signal System Architecture"
+- "Running decay scores are O(1) -- here is the math" not "Signal System Architecture"
 - "Why we chose fjall over RocksDB (for now)" not "Storage Engine Decision"
 ### Devlog / Progress Update
 **When:** A phase or milestone was completed.
 **Strategy posts:** Post 4 (M1 complete), Post 5 (M2 complete), Post 13 (M5 complete).
 **Structure:**
 1. What we set out to build (the goal, in one sentence)
 2. The hardest part (the interesting engineering, not a changelog)
 3. What surprised us (the insight the reader takes away)
 4. Code showing the key breakthrough
-5. What's next (one sentence, not a roadmap dump)
+5. What is next (one sentence, not a roadmap dump)
 **Title pattern:** The insight, not the timeframe.
 - "10M signals, 4 microseconds" not "Phase 2 Complete"
@@ -56,6 +84,7 @@ Before writing, the agent must read:
 ### Technical Deep Dive
 **When:** A specific technique deserves its own focused explanation.
 **Strategy posts:** Post 2 (decay math), Post 6 (diversity), Post 8 (feedback loop), Post 10 (cold start), Post 11 (hybrid search), Post 14 (cohort trending), Post 15 (crash recovery).
 **Structure:**
 1. The problem this solves (relatable, concrete)
 2. Why the obvious approach fails (with numbers)
@@ -67,13 +96,25 @@ Before writing, the agent must read:
 - "Forward decay eliminates 99% of read-time computation" not "How We Handle Decay"
 - "Diversity enforcement in 3 microseconds" not "Our Ranking System"
 ### Vision / Problem Statement
 **When:** Defining the problem space before or independent of implementation.
 **Strategy posts:** Post 1 (the 6-system stack).
 **Structure:**
 1. The problem, made visceral -- name the systems, name the failure modes
 2. Why it exists (historical accident, not intentional design)
 3. The thesis: what should be true instead
 4. The one-query vision (end with the destination, not a product pitch)
 **Title pattern:** The indictment.
 - "Every content platform builds the same 6 systems from scratch"
 ### Announcement
 **When:** A release, open-source milestone, or public launch.
 **Structure:**
 1. What it is (one sentence)
 2. What you can do with it (3-5 bullet points with code)
 3. Install/quickstart command (prominent, copy-pasteable)
-4. What's different about this (the thesis — why this exists)
+4. What is different about this (the thesis -- why this exists)
 5. Links: GitHub, docs, community
 ## Writing Standards
@@ -81,7 +122,7 @@ Before writing, the agent must read:
 ### Voice
 - Active voice. Short sentences. Concrete nouns.
 - First person plural ("we") for team decisions, second person ("you") for reader actions
- Technical precision without jargon — say "O(1) per write" not "blazingly fast"
+- Technical precision without jargon -- say "O(1) per write" not "blazingly fast"
 - Humor only when it lands naturally. Never forced.
 ### Structure
@@ -92,11 +133,14 @@ Before writing, the agent must read:
 - 800-1500 words for devlogs, 1500-3000 for deep dives
 ### Code Examples
- Must be real — from the actual codebase or a working reproduction
+- Must be real -- from the actual codebase or a working reproduction
 - Must be copy-pasteable
 - Include enough context to understand without reading the whole post
 - Syntax highlighted with the site's muted dark palette
- Annotated with comments only where the code isn't self-evident
+- Annotated with comments only where the code is not self-evident
 ### Audience Calibration
 The reader is an engineer who has built or maintains a recommendation/discovery system. They know what Kafka consumer lag feels like. They know why ranking pipeline cache invalidation bugs never get root-caused. They will recognize the 6-system stack because they operate it. Do not explain what Elasticsearch is. Do not explain what a vector database does. Start from shared pain.
 ### Frontmatter
 ```yaml
@@ -111,21 +155,31 @@ tags: ["signals", "architecture", "rust"]
 ## Workflow
-1. **Gather context** — read source files, git log, research docs, previous posts
+1. **Check the strategy** -- read `docs/content-strategy.md`, find the post, confirm the phase is complete
-2. **Find the headline** — the one insight worth sharing. Write it as a thesis.
+2. **Gather context** -- read the source material listed in the strategy entry for this post
-3. **Write the draft** — narrative first, code second
+3. **Find the headline** -- the strategy provides a working title. Sharpen it. If it does not work as a tweet, rewrite it.
-4. **Cut in half** — remove every sentence that doesn't earn its place
+4. **Write the draft** -- narrative first, code second
-5. **Add code** — working examples that show the key insight
+5. **Verify code** -- every example must compile and run against the current codebase
-6. **Read aloud** — if you stumble, rewrite
+6. **Cut in half** -- remove every sentence that does not earn its place
-7. **Write as MDX** — save to the blog content directory with proper frontmatter
+7. **Read aloud** -- if you stumble, rewrite
 8. **Write as MDX** -- save to the blog content directory with proper frontmatter
 ## Quality Checks
 - [ ] Post matches a specific entry in `docs/content-strategy.md`
 - [ ] The roadmap phase for this post is complete (code shipped, not planned)
 - [ ] Title works as a standalone tweet
 - [ ] First paragraph earns the reader's second paragraph
- [ ] Every code example is correct and copy-pasteable
+- [ ] Every code example is correct, copy-pasteable, and from the shipped codebase
- [ ] No marketing language ("leverage," "seamless," "robust," "empower")
+- [ ] Benchmark numbers come from actual `criterion` runs, not estimates
 - [ ] No marketing language ("leverage," "seamless," "robust," "empower," "excited to announce")
 - [ ] Under 3000 words (deep dives) or 1500 words (devlogs)
 - [ ] Ends with something the reader remembers tomorrow
 - [ ] Frontmatter is complete (title, date, author, description, tags)
 - [ ] Would a CTO forward this to their team? If not, rewrite.
 ## After Publishing
 1. Update the **Current Queue** table in this skill to reflect the new state
 2. If the post revealed a new insight worth a follow-up, propose adding it to `docs/content-strategy.md`
 3. Do not write the next post until there is something true to say about it
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -18,7 +18,7 @@ tidalDB treats ranking as a primitive. Signals, decay, velocity, user preference
 ## Domain Model
-Five first-class entity types:
+Six first-class entity types:
 | Type | What it represents |
 |------|--------------------|
@@ -27,8 +27,9 @@ Five first-class entity types:
 | **Creator** | An author — has attributes, an embedding slot, a signal ledger |
 | **Relationship** | A weighted, directional edge between any two entities (follows, blocks, interaction weight) |
 | **Cohort** | A named, live predicate over user attributes (e.g. `age_range ∈ {18-24} AND locale = en-US`) |
 | **Session / Agent Context** | A short-lived, agent-scoped memory surface binding a user, agent identity, and session metadata (tools, reward hints, policy) |
-Five schema-level primitives:
+Six schema-level primitives:
 | Primitive | What it captures |
 |-----------|-----------------|
@@ -36,6 +37,7 @@ Five schema-level primitives:
 | **Ranking Profile** | A named, versioned scoring function: candidate retrieval strategy, boosts, penalties, quality gates, diversity rules, exploration budget |
 | **Relationship** | Weighted edges: follows, blocks, interaction strength — used as ranking inputs |
 | **Cohort** | Live predicate membership — enables cohort-scoped signal aggregation and trending |
 | **Session** | Agent-scoped conversational context: short-lived signals, reward hints, policy tags, decay curves |
 | **Filter** | Composable predicates over entity attributes, signal values, and relationship state |
 ---
@@ -51,7 +53,9 @@ storage/     ← depends on schema; knows nothing about signals or ranking
  ↑
 signals/     ← depends on storage; knows nothing about queries or ranking
  ↑
-query/       ← depends on storage + signals; orchestrates execution
+agent/       ← depends on signals; manages sessions, policy, agent APIs
  ↑
 query/       ← depends on agent + signals; orchestrates execution
  ↑
 ranking/     ← depends on signals; invoked by the query executor
 ```
@@ -79,6 +83,14 @@ Signal ingestion and aggregation. Owns:
 - **Aggregation** — windowed counters (SWAG-based), velocity computation
 - **Materialization** — background worker that writes pre-computed aggregates to O(1) lookup keys
 ### `agent/`
 Session and policy management for agents. Owns:
 - **Session store** — lifecycle of `(user, agent, session_id)` plus short-lived signals
 - **Session materializers** — aggressive-decay aggregates agents can query (`last_5m_reward`, “tools used”, etc.)
 - **Policy enforcement** — per-agent read/write rules, rate limiting, and isolation guardrails
 - **API surface** — typed commands (`start_session`, `append_signal`, `close_session`) used by query/ranking layers
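
The typed commands named above could be modeled roughly as follows. This is a minimal sketch, not the `agent/` module itself: `SessionKey`, `AgentCommand`, `AgentError`, and `SessionStore` are hypothetical names, the store is a plain in-memory map, and policy enforcement and decay are left out.

```rust
use std::collections::HashMap;

/// Hypothetical session identity: the `(user, agent, session_id)` triple.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SessionKey {
    pub user: u64,
    pub agent: u64,
    pub session_id: u64,
}

/// Hypothetical typed commands mirroring `start_session`,
/// `append_signal`, and `close_session`.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentCommand {
    StartSession(SessionKey),
    AppendSignal { key: SessionKey, signal: String, value: f64 },
    CloseSession(SessionKey),
}

#[derive(Debug, PartialEq)]
pub enum AgentError {
    UnknownSession,
    AlreadyOpen,
}

/// In-memory stand-in for the session store (assumption: no persistence).
#[derive(Default)]
pub struct SessionStore {
    open: HashMap<SessionKey, Vec<(String, f64)>>,
}

impl SessionStore {
    /// Apply one command; returns the number of signals held by the session.
    pub fn apply(&mut self, cmd: AgentCommand) -> Result<usize, AgentError> {
        match cmd {
            AgentCommand::StartSession(key) => {
                if self.open.contains_key(&key) {
                    return Err(AgentError::AlreadyOpen);
                }
                self.open.insert(key, Vec::new());
                Ok(0)
            }
            AgentCommand::AppendSignal { key, signal, value } => {
                // Signals may only be appended to a session that is open.
                let signals = self.open.get_mut(&key).ok_or(AgentError::UnknownSession)?;
                signals.push((signal, value));
                Ok(signals.len())
            }
            AgentCommand::CloseSession(key) => self
                .open
                .remove(&key)
                .map(|signals| signals.len())
                .ok_or(AgentError::UnknownSession),
        }
    }
}
```

Routing every operation through one enum keeps the surface small enough that per-agent policy checks could sit in a single `match`, which is one possible reading of "typed commands" here.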
 ### `query/`
 Query parsing and execution. Owns:
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -110,3 +110,5 @@ The pre-commit hook runs automatically on staged files:
 - **site/ (Next.js):** `eslint` (if node_modules installed)
 All cargo commands use `--manifest-path tidal/Cargo.toml` since the Rust project is not at repo root.
 **Tests must be fast.** Slow or hanging tests are bugs — diagnose root cause, then remove, fix, or refactor; never leave them hanging.
--- a/Cargo.lock
+++ b/Cargo.lock
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -0,0 +1,8 @@
 [workspace]
 members = ["tidal", "tidalctl"]
 resolver = "2"
 [workspace.package]
 edition = "2024"
 rust-version = "1.91"
 license = "MIT"
--- a/VISION.md
+++ b/VISION.md
@@ -20,6 +20,7 @@ A database purpose-built for personalized content delivery should model the worl
 - Content has metadata, embeddings, and signals. Signals are not fields — they are typed, timestamped streams with native decay, velocity, and windowed aggregation semantics.
 - Users have preferences, histories, and relationships. These are not rows — they are living profiles that update continuously as events arrive.
 - Agents mediate most interactions. They retrieve context, elicit preferences, and publish structured feedback (reward, tool usage, confidence) as first-class signals. The system must let them read and write memory instantly.
 - A query is not "give me items matching these filters sorted by this field." It is "given this user, this context, and this surface — what should they see, in what order, subject to these constraints?"
 - Filters, sort modes, and diversity rules are first-class query citizens — not application logic bolted on top.
 - Engagement is not application logic that happens to write back into the database. It is a first-class write path that closes the feedback loop natively.
@@ -46,11 +47,14 @@ It is strongly opinionated. It does not try to be a general-purpose database. It
 **Cohorts** are named predicates over user attributes — demographic, behavioral, and interest-based segments. A cohort is not a static list of users — it is a live query over user state. "US users aged 18-24 who engage with jazz content" is a cohort. The database maintains per-cohort signal aggregation so that trending, rising, and quality signals can be scoped to any cohort at query time. This enables the three-layer trending model: global trending, cohort-scoped trending, and search within cohort-scoped trending.
 **Sessions / Agent Context** capture in-flight conversations and tool use. They bind a user, an agent, and a session identifier to short-lived signals (preference hints, rewards, critiques) with aggressive decay. Sessions can be forked, merged, and policy-limited so an agent only sees what it is allowed to remember.
 **The Query** is a single operation that encapsulates candidate retrieval, filtering, ranking, and diversity enforcement:
 ```
 RETRIEVE items
 FOR USER @user_id
 FOR SESSION @session_id
 CONTEXT feed
 USING PROFILE for_you
 FILTER unseen, unblocked, format:video, duration:short
@@ -76,6 +80,7 @@ Search within cohort-scoped trending:
 ```
 SEARCH items
 QUERY "piano tutorial"
 FOR SESSION @session_id
 WITHIN TRENDING
 COHORT locale:US, age:18-24, interest:jazz
 WINDOW 24h
@@ -155,7 +160,7 @@ Every one of these surfaces is driven by the same underlying query primitives. T
 ### The Feedback Loop
-When a user engages with content — views, likes, skips, hides — that event is written to the database as a signal. The database updates the item's signal ledger, the user's implicit preference profile, and the relationship weight between the user and the creator — automatically, as part of the write transaction. The next ranking query reflects this immediately. There is no Kafka consumer to lag, no feature store sync to schedule, no cache to invalidate.
+When a user engages with content — directly or via an agent — that event is written to the database as a signal. The agent can attach structured metadata (reward, confidence, tool invocation) in the same write. The database updates the item's signal ledger, the user's implicit preference profile, the relationship weight, and the session-scoped memory — automatically, as part of the write transaction. The next ranking or grounding query reflects this immediately. There is no Kafka consumer to lag, no feature store sync to schedule, no cache to invalidate.
 Negative signals are equal citizens. A skip, a hide, a block, a "not interested," a downvote — these update the system with the same immediacy and precision as a like or a completion.
@@ -189,6 +194,8 @@ It is not trying to solve moderation, payments, authentication, or content deliv
 **Cohorts are live queries, not static lists.** A cohort is a predicate over user attributes — demographics, interests, behavioral segments. Users flow in and out of cohorts as their attributes change. Signal aggregation runs per-cohort so trending and quality signals reflect what's happening within any audience segment.
 **Agents own managed contexts.** Sessions scope short-lived memory, rewards, and tool usage. Agents can only read/write within their sessions, and policy guards live in schema, not ad-hoc middleware.
 **Correctness over cleverness.** Ranking is already approximate by nature. The database does not need to be more clever than the signals it has. It needs to be fast, consistent, and operationally simple.
 ## Who This Is For
--- a/docs/content-strategy.md
+++ b/docs/content-strategy.md
@@ -0,0 +1,304 @@
 # Content Strategy
 Blog posts mapped to the tidalDB roadmap. Each entry identifies the moment worth writing about, the thesis that makes it shareable, and the type of post it demands.
 The audience is engineers who have built or are currently maintaining recommendation and discovery systems -- the people running the 6-system stack this database replaces. They know what Kafka lag feels like at 3am. They know why cache invalidation bugs in the ranking pipeline are the ones that never get root-caused. They will smell marketing language from the first sentence. Respect that.
 ---
 ## Publishing Principles
 **Write when something is true, not when something is scheduled.** A blog post published the day a milestone passes its UAT is credible. A blog post published before the code works is fiction.
 **One insight per post.** The reader should leave with a single idea they did not have before. If the post contains two insights, it is two posts.
 **Code proves claims.** Every technical assertion is backed by a code example or a benchmark number from the actual codebase. Not a prototype. Not a plan. The shipped code.
 **The title is the thesis.** If the title does not work as a standalone sentence that makes an engineer stop scrolling, the post is not ready.
 ---
 ## Content Calendar
 ### Pre-Implementation (Now)
 These posts can be written before the engine is feature-complete. They draw on the vision, architecture research, and the problem space -- not on shipped code.
 #### Post 1: "Every content platform builds the same 6 systems from scratch"
 - **Type:** Vision / Problem Statement
 - **Thesis:** The Elasticsearch + Redis + Kafka + feature store + vector DB + ranking service stack is not an architecture. It is scar tissue. The seams between these systems are where correctness dies.
 - **Source material:** VISION.md, thoughts.md (Part VI)
 - **When to publish:** Any time. This post defines the problem and does not depend on implementation progress.
 - **Why it matters:** This is the foundational narrative. Every subsequent post assumes the reader understands this problem. It also serves as the litmus test for whether the audience cares -- if this post does not resonate, the subsequent ones will not either.
 - **Structure:** Problem statement. The 6 systems named and indicted. The seams enumerated (stale signals, ETL lag, cache invalidation, operational burden). The thesis: ranking is not a feature, it is a primitive. End with the one-query vision, not with a product pitch.
 ---
 ### Milestone 1: Signal Engine
 M1 proves that temporal signals with O(1) decay, velocity, and windowed aggregation work as a database primitive. This is the most technically interesting milestone for blog content because the math is elegant and the performance numbers are dramatic.
 #### Post 2: "Running decay scores are O(1) -- here is the math"
 - **Type:** Technical Deep Dive
 - **Roadmap phase:** m1p4 (Signal Ledger) completion
 - **Thesis:** The forward-decay formula `S(t) = S(t_prev) * exp(-lambda * dt) + weight` eliminates raw-event scanning at query time. Three `exp()` calls on write, one on read. 15 nanoseconds per entity. Every platform computing `trending_score = views / (age + 2)^1.8` in application code is doing O(N) work that should be O(1).
 - **Source material:** docs/research/tidaldb_signal_ledger.md, ARCHITECTURE.md (Signal System section), m1p4 task docs
 - **When to publish:** After m1p4 passes UAT with benchmark numbers in hand.
 - **Code to include:** The `EntitySignalState` struct. The forward-decay write path. The out-of-order event correction. Benchmark output showing 200-entity scoring pass under 5 microseconds.
 - **Why it matters:** This is the post that demonstrates tidalDB is not vaporware. The math is verifiable. The benchmarks are reproducible. Engineers who have implemented trending scores in Redis will immediately understand the value.
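
The forward-decay update in the thesis can be sketched in a few lines. This is an illustrative reduction that assumes a single decay rate and `f64` timestamps in seconds. `DecayState`, `record`, `score_at`, and `analytical_score` are hypothetical names, not the `EntitySignalState` layout or the shipped write path. Late events are handled by decaying their weight forward to the last update time, which is one plausible reading of "out-of-order event correction".

```rust
/// Hypothetical running-score state for one (entity, signal) pair.
#[derive(Debug, Clone, Copy)]
pub struct DecayState {
    /// Decayed score as of `last_ts`.
    pub score: f64,
    /// Timestamp (seconds) of the newest event applied so far.
    pub last_ts: f64,
}

impl DecayState {
    pub fn new() -> Self {
        Self { score: 0.0, last_ts: 0.0 }
    }

    /// O(1) write: S(t) = S(t_prev) * exp(-lambda * dt) + weight.
    pub fn record(&mut self, ts: f64, weight: f64, lambda: f64) {
        if ts >= self.last_ts {
            let dt = ts - self.last_ts;
            self.score = self.score * (-lambda * dt).exp() + weight;
            self.last_ts = ts;
        } else {
            // Out-of-order event: age its weight up to `last_ts`
            // instead of rewinding the running score.
            let age = self.last_ts - ts;
            self.score += weight * (-lambda * age).exp();
        }
    }

    /// O(1) read: decay the stored score from `last_ts` to `now`.
    pub fn score_at(&self, now: f64, lambda: f64) -> f64 {
        let dt = (now - self.last_ts).max(0.0);
        self.score * (-lambda * dt).exp()
    }
}

/// O(N) reference: sum every event's weight decayed to `now`.
pub fn analytical_score(events: &[(f64, f64)], now: f64, lambda: f64) -> f64 {
    events
        .iter()
        .filter(|(ts, _)| *ts <= now)
        .map(|(ts, w)| w * (-lambda * (now - ts)).exp())
        .sum()
}
```

Comparing `score_at` against `analytical_score` over the same events, including one delivered late, is the same shape of check as the property test that matches the running score to the analytical value.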
 #### Post 3: "What three databases taught us before we wrote a line of code"
 - **Type:** Architecture Decision Record
 - **Roadmap phase:** m1p1-m1p3 completion (the foundation phases)
 - **Thesis:** We studied Engram (cognitive memory), Citadel (append-only logging), and StemeDB (knowledge graph) -- three purpose-built databases in the same codebase -- and stole their best patterns. WAL-first durability from Citadel. Cache-line aligned hot structs from Engram. Subject-prefix key encoding from StemeDB. Background materialization from StemeDB. Here is what converged and what the gaps taught us.
 - **Source material:** thoughts.md (all six parts), CODING_GUIDELINES.md
 - **When to publish:** After m1p3 (Storage Engine) is complete. The patterns referenced are already implemented.
 - **Code to include:** Key encoding format. Cache-line aligned struct. Group commit writer. Side-by-side comparison of the pattern in the source database and in tidalDB.
 - **Why it matters:** Engineers respect builders who study prior art. This post establishes technical credibility and shows the architectural foundation is grounded in real patterns, not invented from scratch.
 #### Post 4: "Signals wrote 100ms ago. The query sees them now."
 - **Type:** Devlog / Milestone Announcement
 - **Roadmap phase:** m1p5 (Entity CRUD and Signal Write API) -- M1 complete
 - **Thesis:** Milestone 1 is done. A developer can open a tidalDB instance, define signal types with decay rates and windows, write 10,000 engagement events, and read back decay-correct scores that match analytical computation to 6 decimal places. Including after a crash. The UAT scenario passes.
 - **Source material:** The m1p5 integration test, benchmark results, git log for the M1 period
 - **When to publish:** The day M1 UAT passes.
 - **Code to include:** The full UAT scenario (or a clean excerpt). `TidalDB::open()` with schema. Signal write. Decay score read. Before/after crash recovery.
 - **Why it matters:** This is the first "it works" post. It converts skeptics from "interesting idea" to "this is real." The UAT code is the proof.
 ---
 ### Milestone 2: Ranked Retrieval
 M2 proves that a single query can retrieve, filter, score, and enforce diversity over live signals. This is where tidalDB stops being a signal engine and starts being a database.
 #### Post 5: "One query. Six systems. Under 50 milliseconds."
 - **Type:** Technical Deep Dive / Announcement
 - **Roadmap phase:** m2p5 (RETRIEVE Query Executor) -- M2 complete
 - **Thesis:** `RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25` executes in under 50ms on 10K items. It retrieves candidates via ANN, filters by metadata, scores using live decay signals and velocity, enforces diversity, and returns a ranked list. That is what Elasticsearch + Redis + a ranking service produce. It is one query here.
 - **Source material:** m2p5 integration test, benchmark results, the dependency DAG showing how all M2 phases compose
 - **When to publish:** After M2 UAT passes.
 - **Code to include:** The RETRIEVE query. The ranked result with signal snapshots. The trending profile definition. A before/after signal burst showing the ranking change.
 - **Why it matters:** This is the money post. The one-query thesis is no longer a vision document -- it is a benchmark. Engineers who operate the 6-system stack will immediately understand what this eliminates.
 #### Post 6: "Diversity enforcement in 3 microseconds"
 - **Type:** Technical Deep Dive
 - **Roadmap phase:** m2p4 (Diversity Enforcement)
 - **Thesis:** "No more than 2 items per creator" does not belong in your API layer. It belongs in the query. tidalDB enforces diversity as a post-scoring reordering pass -- it does not reduce result count. The greedy selection algorithm runs in under 3 microseconds for 200 candidates.
 - **Source material:** m2p4 task docs, VISION.md (diversity section), benchmark results
 - **When to publish:** After m2p4 is complete.
 - **Code to include:** The DiversitySpec. The greedy selector. A concrete example showing reordering (creator A dominates pre-diversity, balanced post-diversity). Benchmark numbers.
 - **Why it matters:** Every team building a feed implements diversity in the API layer. Showing that it belongs in the database -- and costs 3 microseconds -- is a strong differentiator. This is the kind of post that gets shared in Slack channels.
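
A greedy per-creator cap that reorders without dropping results might look like the sketch below. `Candidate` and `diversify` are hypothetical names, and demoting over-cap items to the tail (rather than some other placement) is an assumption; the shipped `DiversitySpec` and selector may differ.

```rust
use std::collections::HashMap;

/// Hypothetical scored candidate coming out of the ranking stage.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub item_id: u64,
    pub creator_id: u64,
    pub score: f64,
}

/// Greedy pass over candidates already sorted by descending score.
/// Items within the per-creator cap keep their relative order; items
/// over the cap are moved to the tail, so the result count is unchanged.
pub fn diversify(scored: Vec<Candidate>, max_per_creator: usize) -> Vec<Candidate> {
    let mut taken: HashMap<u64, usize> = HashMap::new();
    let mut selected = Vec::with_capacity(scored.len());
    let mut deferred = Vec::new();

    for candidate in scored {
        let count = taken.entry(candidate.creator_id).or_insert(0);
        if *count < max_per_creator {
            *count += 1;
            selected.push(candidate);
        } else {
            deferred.push(candidate);
        }
    }

    // Over-cap items follow the diversified prefix in score order.
    selected.extend(deferred);
    selected
}
```

With a cap of 1 and three top-scored items from the same creator, the second and third drop behind the first items from other creators, and a `LIMIT` applied afterwards sees the diversified prefix.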
 #### Post 7: "Ranking profiles are data, not code"
 - **Type:** Architecture Decision Record
 - **Roadmap phase:** m2p3 (Ranking Profile Engine)
 - **Thesis:** Changing how content is ranked should not require a code change, a deployment, or a restart. tidalDB treats ranking profiles as versioned schema declarations. Define a profile. Name it. Swap it at query time. A/B test two profiles by name. The database executes the entire pipeline.
 - **Source material:** m2p3 task docs, API.md (ranking profiles section), VISION.md
 - **When to publish:** After m2p3 is complete.
 - **Code to include:** A `trending` profile definition. A `for_you` profile definition. The same RETRIEVE query with two different profile names producing different orderings. The profile versioning API.
 - **Why it matters:** This reframes ranking as a database concern. Engineers who maintain ranking services as separate microservices will recognize the operational simplification.
 ---
 ### Milestone 3: Personalized Ranking
 M3 is where the feedback loop closes. Signal writes update the user's preference vector, the creator's interaction weight, and the item's signal ledger -- atomically, in one write. The "For You" query works.
 #### Post 8: "The feedback loop that closes in one write"
 - **Type:** Technical Deep Dive
 - **Roadmap phase:** m3p2 (Feedback Loop) completion
 - **Thesis:** When a user likes an item, the database atomically updates: the item's like count, the user-to-creator interaction weight, and the user's preference vector (shifted toward the item's embedding). One `db.signal("like", ...)` call. No Kafka consumer to lag. No feature store to sync. No cache to invalidate. The next ranking query -- even 100ms later -- reflects the change.
 - **Source material:** m3p2 task docs, ARCHITECTURE.md (Write Path section), SEQUENCE.md
 - **When to publish:** After m3p2 passes UAT.
 - **Code to include:** The signal write. The 10-step atomic update path. A before/after query showing the preference shift. The property test that proves hidden items and blocked creators never surface.
 - **Why it matters:** The closed feedback loop is the core architectural thesis of tidalDB. This post proves it works. It is the strongest argument against the 6-system stack, because the stack's primary failure mode is feedback lag.
 #### Post 9: "Negative signals are equal citizens"
 - **Type:** Architecture Decision Record
 - **Roadmap phase:** m3p2 (Feedback Loop)
 - **Thesis:** A skip is not the absence of a like. It is data. tidalDB treats negative signals -- skips, hides, blocks, "not interested" -- with the same precision and immediacy as positive signals. A skip within 3 seconds is a strong quality signal. A hide creates a permanent exclusion. A block removes all of a creator's content from all future queries. These are not afterthoughts. They are first-class signal types with their own decay rates, velocity, and ranking weight.
 - **Source material:** VISION.md (negative signals section), USE_CASES.md (UC-01 feedback), m3p2 task docs
 - **When to publish:** After m3p2 is complete. Can be bundled with or separated from Post 8.
 - **Code to include:** Signal type definitions for skip, hide, block. The penalty clause in a ranking profile. The property test: 10,000 random signal sequences never produce a result where a hidden item or blocked creator appears.
 - **Why it matters:** Most recommendation systems handle negative feedback as an afterthought -- a manual "not interested" button that writes to a separate blocklist. tidalDB's approach is architecturally different and engineers building these systems will recognize the improvement immediately.
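The hard-exclusion half of this thesis (a hide or block must never leak) can be sketched as a filter applied before scoring order is decided. The `Exclusions` and `Candidate` types below are hypothetical illustrations, not tidalDB types, and the sketch ignores decay and ranking weight entirely.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    item_id: u64,
    creator_id: u64,
    score: f64,
}

/// Per-user exclusion state written by `hide` and `block` signals.
#[derive(Default)]
struct Exclusions {
    hidden_items: HashSet<u64>,
    blocked_creators: HashSet<u64>,
}

impl Exclusions {
    fn hide(&mut self, item_id: u64) {
        self.hidden_items.insert(item_id);
    }

    fn block(&mut self, creator_id: u64) {
        self.blocked_creators.insert(creator_id);
    }

    fn allows(&self, c: &Candidate) -> bool {
        !self.hidden_items.contains(&c.item_id)
            && !self.blocked_creators.contains(&c.creator_id)
    }
}

/// Drop excluded candidates first, then order the rest by score, best first.
fn rank(candidates: Vec<Candidate>, ex: &Exclusions) -> Vec<Candidate> {
    let mut kept: Vec<Candidate> = candidates.into_iter().filter(|c| ex.allows(c)).collect();
    kept.sort_by(|a, b| b.score.total_cmp(&a.score));
    kept
}

fn main() {
    let mut ex = Exclusions::default();
    ex.hide(2);
    ex.block(10);

    let candidates = vec![
        Candidate { item_id: 1, creator_id: 10, score: 0.9 },
        Candidate { item_id: 2, creator_id: 20, score: 0.8 },
        Candidate { item_id: 4, creator_id: 30, score: 0.95 },
    ];
    for c in rank(candidates, &ex) {
        println!("item {} by creator {} ({})", c.item_id, c.creator_id, c.score);
    }
}
```

Because exclusion is a filter rather than a score penalty, no score, however high, can bring a hidden item back. That is the invariant the property test in this post is meant to check.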
 #### Post 10: "Cold start without application logic"
 - **Type:** Technical Deep Dive
 - **Roadmap phase:** m3p3 (Personalized Ranking Profiles)
 - **Thesis:** New items with no signals get an exploration budget. New users with no history get a sensible default from population-level signals. The application does not manage either. The exploration rate decays as signals accumulate. This is declared per ranking profile, not implemented in application code.
 - **Source material:** m3p3 task docs, VISION.md (cold start section)
 - **When to publish:** After m3p3 is complete.
 - **Code to include:** The exploration budget in a profile definition. A new item appearing in a for_you feed despite having zero signals. The decay of exploration as signals arrive.
 - **Why it matters:** Cold start is the problem everyone hacks around and no one solves cleanly. Showing a database-native solution is a strong differentiator.
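To make "the exploration rate decays as signals accumulate" concrete, here is one possible decay shape. The formula, the `budget` and `half_point` parameters, and the constants are assumptions chosen for illustration; the source does not specify how tidalDB computes the exploration budget.

```rust
/// Hypothetical exploration bonus: starts at `budget` for an item with no
/// signals and halves once the item has collected `half_point` signals.
fn exploration_bonus(budget: f64, half_point: f64, signal_count: u64) -> f64 {
    budget * half_point / (half_point + signal_count as f64)
}

/// Final score = signal-derived score + exploration bonus.
/// The constants 0.2 and 50.0 are illustrative, not tuned values.
fn score(signal_score: f64, signal_count: u64) -> f64 {
    signal_score + exploration_bonus(0.2, 50.0, signal_count)
}

fn main() {
    for n in [0_u64, 10, 50, 500] {
        println!("signals = {n:>3}, bonus = {:.4}", exploration_bonus(0.2, 50.0, n));
    }
    // A brand-new item can outrank an established item with a modest score.
    println!("new item: {:.3}", score(0.0, 0));
    println!("established item: {:.3}", score(0.15, 5000));
}
```

In this sketch a zero-signal item enters the feed on the bonus alone, and the bonus fades without any application-side bookkeeping.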
 ---
 ### Milestone 4: Hybrid Search
 M4 merges full-text search with semantic similarity and signal-ranked results. Search and retrieval become the same system.
 #### Post 11: "Search and ranking are the same system"
 - **Type:** Technical Deep Dive / Announcement
 - **Roadmap phase:** m4p3 (SEARCH Query Executor) -- M4 complete
 - **Thesis:** `SEARCH items QUERY "jazz piano" VECTOR [embedding] FOR USER @user_42 USING PROFILE search LIMIT 20` combines BM25 text relevance, semantic vector similarity, and user personalization in one ranked list. The fusion uses Reciprocal Rank Fusion. Personalization re-ranks within the relevant set -- an irrelevant result never surfaces because the user likes the creator. This is one query. It replaces Elasticsearch + a vector DB + a ranking service.
 - **Source material:** m4p3 integration test, docs/research/tantivy.md, ARCHITECTURE.md (Text Search, Hybrid Fusion)
 - **When to publish:** After M4 UAT passes.
 - **Code to include:** The SEARCH query. The RRF formula. A comparison: the same query with BM25 only, ANN only, and fused. The personalization overlay changing result order for two different users.
 - **Why it matters:** Search is the most complex surface and the one engineers know best. Showing that text search, semantic search, and ranking collapse into one query is the most concrete demonstration of the 6-to-1 thesis.
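Reciprocal Rank Fusion scores each document as the sum over ranked lists of `1 / (k + rank)`, where `rank` is the document's 1-based position in that list. The sketch below fuses a BM25 ordering with an ANN ordering. The function name `rrf_fuse` is hypothetical, and `k = 60` is a commonly used constant rather than a value taken from the source.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids with Reciprocal Rank Fusion.
/// Returns (doc_id, fused_score) pairs, best first; ties break on lower id.
fn rrf_fuse(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64; // ranks are 1-based
            *scores.entry(*doc).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}

fn main() {
    let bm25 = vec![1, 2, 3]; // text relevance order
    let ann = vec![3, 1, 4]; // vector similarity order
    for (doc, score) in rrf_fuse(&[bm25, ann], 60.0) {
        println!("doc {doc}: {score:.5}");
    }
}
```

Documents that appear in both lists accumulate two terms, so they rise above documents that only one retriever found. RRF uses ranks only, which avoids having to normalize BM25 scores against vector distances.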
 #### Post 12: "Tantivy as a derived index, not a source of truth"
 - **Type:** Architecture Decision Record
 - **Roadmap phase:** m4p1 (Tantivy Integration)
 - **Thesis:** The entity store is the source of truth. Tantivy is a materialized view. If the index is corrupted, it can be rebuilt from the entity store. Crash recovery replays from a stored sequence number. Consistency is DB-primary, not two-phase commit. This is simpler, deterministic, and the right model for an embedded database.
 - **Source material:** docs/research/tantivy.md, m4p1 task docs, ARCHITECTURE.md
 - **When to publish:** After m4p1 is complete.
 - **Code to include:** The outbox pattern. The crash recovery sequence number. The background indexer. The consistency model.
 - **Why it matters:** This is a useful architectural pattern beyond tidalDB. Engineers building systems with derived indexes will find this directly applicable.
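The DB-primary model can be sketched as an index that remembers the last sequence number it applied and catches up from an ordered change log. Everything here is a simplified assumption: a `BTreeMap` stands in for Tantivy, a slice stands in for the outbox, and the names `Op`, `DerivedIndex`, and `catch_up` are hypothetical.

```rust
use std::collections::BTreeMap;

/// A change recorded by the source of truth.
#[derive(Debug, Clone)]
enum Op {
    Upsert { id: u64, text: String },
    Delete { id: u64 },
}

/// Stand-in for the derived text index plus its stored sequence number.
#[derive(Debug, Default, PartialEq)]
struct DerivedIndex {
    docs: BTreeMap<u64, String>,
    applied_seq: u64,
}

impl DerivedIndex {
    /// Apply every log entry newer than `applied_seq`; older entries are skipped,
    /// so calling this twice with the same log is harmless.
    fn catch_up(&mut self, log: &[(u64, Op)]) -> usize {
        let mut applied = 0;
        for (seq, op) in log {
            if *seq <= self.applied_seq {
                continue;
            }
            match op {
                Op::Upsert { id, text } => {
                    self.docs.insert(*id, text.clone());
                }
                Op::Delete { id } => {
                    self.docs.remove(id);
                }
            }
            self.applied_seq = *seq;
            applied += 1;
        }
        applied
    }
}

fn main() {
    let log = vec![
        (1, Op::Upsert { id: 7, text: "jazz piano".to_string() }),
        (2, Op::Upsert { id: 8, text: "modal jazz".to_string() }),
        (3, Op::Delete { id: 7 }),
    ];

    // The indexer processed two entries, then the process crashed.
    let mut index = DerivedIndex::default();
    index.catch_up(&log[..2]);

    // On restart, replay resumes from the stored sequence number.
    let replayed = index.catch_up(&log);
    println!("replayed {replayed} entry, index = {:?}", index.docs);
}
```

A corrupted index is handled the same way: start from an empty `DerivedIndex` and replay the whole log.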
 ---
 ### Milestone 5: Full Surface Coverage
 M5 completes all 14 use cases. The content here shifts from "how does the engine work" to "what can you build with it."
 #### Post 13: "14 use cases, one query engine"
 - **Type:** Devlog / Announcement
 - **Roadmap phase:** M5 complete
 - **Thesis:** For You feeds, trending, search, following, related content, notifications, hidden gems, controversial, live content, creator discovery, user library, cohort-scoped trending -- every surface a content platform needs, driven by the same query primitives. The application specifies profiles, filters, and context. The database executes ranking.
 - **Source material:** USE_CASES.md, M5 UAT results
 - **When to publish:** After M5 UAT passes.
 - **Code to include:** A curated selection of 4-5 queries spanning different surfaces (for_you, trending, search, hidden_gems, cohort_trending). Each with a brief setup and result.
 - **Why it matters:** This is the completeness post. It demonstrates that the database is not a toy or a prototype -- it handles the full surface area of a real content platform.
 #### Post 14: "Cohort-scoped trending: what is hot for people like you"
 - **Type:** Technical Deep Dive
 - **Roadmap phase:** M5, likely Phase 3 (Social Graph and Collaborative Filtering)
 - **Thesis:** "What's trending" means different things to different audiences. A 22-year-old in Tokyo and a 45-year-old in Texas see different trending pages -- not because of personalization (individual preference), but because different content is genuinely trending within their respective audience segments. tidalDB maintains per-cohort signal aggregation using RoaringBitmaps for fast, near-constant-time membership testing and sparse fan-out for storage efficiency.
 - **Source material:** USE_CASES.md (UC-15), ARCHITECTURE.md (Cohort-scoped aggregation), API.md (Cohort Definitions)
 - **When to publish:** After cohort-scoped trending passes integration tests.
 - **Code to include:** Cohort definition. Three-layer query (global trending, cohort trending, search within cohort trending). The fan-out write path. Storage cost analysis.
 - **Why it matters:** Cohort-scoped trending is a differentiator. Most systems compute trending globally. Slicing by audience segment is a product feature that usually requires a separate analytics pipeline. tidalDB does it natively.
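The fan-out write path can be pictured as follows: each signal increments a global counter plus one counter per cohort the user belongs to, and only (cohort, item) pairs that actually received signals take up storage. This is a sketch under assumptions: `HashSet` stands in for the RoaringBitmap membership sets, the counters are plain integers with no decay or windows, and all type and method names are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Default)]
struct CohortTrending {
    members: HashMap<&'static str, HashSet<u64>>, // cohort -> user ids
    global: HashMap<u64, u64>,                    // item -> signal count
    per_cohort: HashMap<(&'static str, u64), u64>, // sparse: only touched pairs exist
}

impl CohortTrending {
    fn add_member(&mut self, cohort: &'static str, user: u64) {
        self.members.entry(cohort).or_default().insert(user);
    }

    /// Write path: one signal fans out to the global counter and to every
    /// cohort that contains the user.
    fn record(&mut self, user: u64, item: u64) {
        *self.global.entry(item).or_insert(0) += 1;
        for (cohort, users) in &self.members {
            if users.contains(&user) {
                *self.per_cohort.entry((*cohort, item)).or_insert(0) += 1;
            }
        }
    }

    /// Read path: top items for one cohort, highest count first.
    fn trending(&self, cohort: &str, limit: usize) -> Vec<(u64, u64)> {
        let mut rows: Vec<(u64, u64)> = self
            .per_cohort
            .iter()
            .filter(|((c, _), _)| *c == cohort)
            .map(|((_, item), count)| (*item, *count))
            .collect();
        rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        rows.truncate(limit);
        rows
    }
}

fn main() {
    let mut t = CohortTrending::default();
    t.add_member("tokyo_18_24", 1);
    t.add_member("tokyo_18_24", 2);
    t.add_member("texas_40_49", 3);

    t.record(1, 100);
    t.record(2, 100);
    t.record(3, 200);
    t.record(1, 200);

    println!("tokyo: {:?}", t.trending("tokyo_18_24", 1));
    println!("texas: {:?}", t.trending("texas_40_49", 1));
    println!("stored cohort counters: {}", t.per_cohort.len());
}
```

Both items have the same global count in this example, yet each cohort sees a different top item, which is the distinction the thesis draws between cohort trending and personalization.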
 ---
 ### Milestone 6: Production Hardening
 M6 is about trust. The content shifts from "what it does" to "why you can trust it."
 #### Post 15: "Kill it at any point. It comes back correct."
 - **Type:** Technical Deep Dive
 - **Roadmap phase:** m6p1 (Crash Recovery Hardening)
 - **Thesis:** We injected faults at every write-path stage. Recovery time is under 30 seconds at 1M items. WAL replay produces state identical to pre-crash. No phantom items, no lost signals, no inconsistent aggregates. The WAL is the source of truth. Everything else is derived state that can be rebuilt.
 - **Source material:** m6p1 test results, fault injection methodology
 - **When to publish:** After m6p1 passes.
 - **Code to include:** The crash simulation test. Recovery time measurements. The WAL checkpoint and replay sequence.
 - **Why it matters:** Trust is the precondition for adoption. Engineers will not embed a database they cannot crash-test. This post is the trust credential.
 #### Post 16: "Graceful degradation: less precise, never wrong"
 - **Type:** Architecture Decision Record
 - **Roadmap phase:** m6p2 (Graceful Degradation)
 - **Thesis:** Under 3x overload, tidalDB does not return errors. It reduces candidate set size, uses coarser aggregates, skips diversity enforcement, and serves from materialized cache -- in that order. Results are less precise but never incorrect. The degradation order is documented and configurable.
 - **Source material:** m6p2 task docs, ARCHITECTURE.md (Graceful degradation)
 - **When to publish:** After m6p2 is complete.
 - **Code to include:** The degradation cascade. Load test results at 1x, 2x, 3x. Latency distribution at each level.
 - **Why it matters:** This is how production systems should behave. Engineers who have been paged for "ranking service returned 500" will appreciate a system that degrades gracefully instead.
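The cascade named in the thesis can be modeled as an ordered enum where each level keeps every earlier shortcut. The load thresholds and candidate limits below are illustrative assumptions, and the `Degradation`, `QueryPlan`, `level_for`, and `plan_for` names are hypothetical; only the order of the four steps comes from the source.

```rust
/// Degradation steps in the order the thesis lists them.
/// Derived `Ord` follows declaration order, so later variants are "more degraded".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Degradation {
    Full,
    ReducedCandidates,
    CoarserAggregates,
    SkipDiversity,
    MaterializedCache,
}

/// Map a load factor (1.0 = rated capacity) to a level.
/// Thresholds are assumptions for this sketch, not documented values.
fn level_for(load: f64) -> Degradation {
    if load <= 1.0 {
        Degradation::Full
    } else if load <= 1.5 {
        Degradation::ReducedCandidates
    } else if load <= 2.0 {
        Degradation::CoarserAggregates
    } else if load <= 2.5 {
        Degradation::SkipDiversity
    } else {
        Degradation::MaterializedCache
    }
}

#[derive(Debug, PartialEq)]
struct QueryPlan {
    candidate_limit: usize,
    coarse_aggregates: bool,
    enforce_diversity: bool,
    from_cache: bool,
}

/// Each level applies its own shortcut plus all shortcuts before it.
fn plan_for(level: Degradation) -> QueryPlan {
    QueryPlan {
        candidate_limit: if level >= Degradation::ReducedCandidates { 250 } else { 1000 },
        coarse_aggregates: level >= Degradation::CoarserAggregates,
        enforce_diversity: level < Degradation::SkipDiversity,
        from_cache: level >= Degradation::MaterializedCache,
    }
}

fn main() {
    for load in [1.0, 2.0, 3.0] {
        let level = level_for(load);
        println!("{load}x -> {level:?} -> {:?}", plan_for(level));
    }
}
```

Every plan still produces a ranked result, so callers never see an error path in this model, only less precise output.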
 ---
 ### Ongoing / Anytime
 These posts are not tied to specific milestones. They can be written whenever the insight is clear.
 #### "Why not SQL"
 - **Type:** Architecture Decision Record
 - **Thesis:** The custom query language exists because SQL cannot express ranking semantics without losing optimization opportunities. `FOR USER` means "load this user's preference vector and relationship graph." `USING PROFILE` means "apply this named scoring function." `DIVERSITY` means "enforce post-ranking constraints." These are not WHERE clauses.
 - **Source material:** thoughts.md (Part II.4), VISION.md (query examples), API.md
 - **When to publish:** Any time after M1. Best paired with M2 when the RETRIEVE query is functional.
 #### "Why we chose fjall over RocksDB (for now)"
 - **Type:** Architecture Decision Record
 - **Thesis:** Pure Rust, `#![forbid(unsafe_code)]`, fast compile times, trait-abstracted for swap. fjall is not the fastest LSM-tree. It is the right one for an embeddable database built by a small team that values correctness over raw throughput, with a trait boundary that makes the decision reversible.
 - **Source material:** thoughts.md (Part V.9), m1p3 task docs, CODING_GUIDELINES.md
 - **When to publish:** After m1p3 is complete (already shipped). This post is ready now.
 #### "USearch, not from scratch"
 - **Type:** Architecture Decision Record
 - **Thesis:** Correct, high-performance, concurrent HNSW with SIMD distance computation is 6-12 months of dedicated work. We are not a vector database company. USearch runs in ScyllaDB, ClickHouse, and DuckDB. The FFI boundary is thin. Build what differentiates you. Borrow what does not.
 - **Source material:** docs/research/ann_for_tidaldb.md, m2p1 task docs, ARCHITECTURE.md (Vector Index)
 - **When to publish:** After m2p1 (USearch integration) is complete.
 ---
 ## Post Cadence
 | Milestone | Posts | Approximate Pace |
 |-----------|-------|-----------------|
 | Pre-implementation | 1 | Publish when ready |
 | M1 (Signal Engine) | 2-3 | One per phase completion |
 | M2 (Ranked Retrieval) | 3 | One per major phase |
 | M3 (Personalized Ranking) | 2-3 | One per key insight |
 | M4 (Hybrid Search) | 2 | One per major phase |
 | M5 (Full Coverage) | 2 | At milestone boundaries |
 | M6 (Production Hardening) | 2 | At milestone boundaries |
 | Ongoing / ADRs | 2-3 | When the decision is fresh |
 **Target: 16-19 posts across the full roadmap.** Not more. Each one earns its place.
 ---
 ## What Not to Write
 - Progress updates that are changelogs. ("We merged 47 PRs this month.") Nobody cares.
 - Posts that announce intent without shipped code. ("We plan to build...") Ship first.
 - Posts with titles that are labels. ("Q1 Update," "Phase 3 Complete.") The title is the thesis.
 - Posts that explain what a concept is without showing why the reader should care. ("Windowed aggregation is...") Start with the problem.
 - Posts that use "we're excited to announce." You are not excited. You are precise.
 ---
 ## Reference: Roadmap to Post Mapping
 | Roadmap Phase | Post # | Title (Working) |
 |---------------|--------|-----------------|
 | Pre-implementation | 1 | Every content platform builds the same 6 systems from scratch |
 | m1p1-m1p3 (Foundation) | 3 | What three databases taught us before we wrote a line of code |
 | m1p4 (Signal Ledger) | 2 | Running decay scores are O(1) -- here is the math |
 | m1p5 (M1 Complete) | 4 | Signals wrote 100ms ago. The query sees them now. |
 | m2p3 (Ranking Profiles) | 7 | Ranking profiles are data, not code |
 | m2p4 (Diversity) | 6 | Diversity enforcement in 3 microseconds |
 | m2p5 (M2 Complete) | 5 | One query. Six systems. Under 50 milliseconds. |
 | m3p2 (Feedback Loop) | 8, 9 | The feedback loop that closes in one write / Negative signals are equal citizens |
 | m3p3 (Personalized Profiles) | 10 | Cold start without application logic |
 | m4p1 (Tantivy) | 12 | Tantivy as a derived index, not a source of truth |
 | m4p3 (M4 Complete) | 11 | Search and ranking are the same system |
 | M5 Complete | 13, 14 | 14 use cases, one query engine / Cohort-scoped trending |
 | m6p1 (Crash Recovery) | 15 | Kill it at any point. It comes back correct. |
 | m6p2 (Graceful Degradation) | 16 | Graceful degradation: less precise, never wrong |
 | Any time | -- | Why not SQL / Why fjall / USearch, not from scratch |
 ---
 ## Immediate Next Actions
 1. **Write Post 1** ("Every content platform builds the same 6 systems from scratch") -- this can be published now. It establishes the problem and the audience. It does not depend on shipped code.
 2. **Write Post 3** ("What three databases taught us") -- m1p1 through m1p3 are complete. The source material (thoughts.md) is rich. The code exists.
 3. **Prepare Post 2 outline** ("Running decay scores are O(1)") -- the research doc exists, the math is decided, but the implementation is not yet shipped (m1p4 is next). Write the outline. Wait for the benchmarks.
--- a/docs/personal-briefing-beachhead.md
+++ b/docs/personal-briefing-beachhead.md
@@ -0,0 +1,259 @@
 # Use Case: Personal Briefing Feed (Knowledge Workers + Consumers)
 **Date:** 2026-02-21  
 **Author:** @tidal-visionary  
 **Status:** Proposed beachhead
 ---
 ## 1. One-Line Definition
 A daily and in-the-moment briefing feed that ranks what matters most for a person right now, adapts immediately to lightweight feedback, and explains why each item is shown.
 ---
 ## 2. The User (Not Developers)
 ### Primary Persona
 **Information-overloaded decision maker**
 - Works in product, strategy, operations, media, investing, policy, or as a highly engaged consumer.
 - Consumes content to make decisions, not just to be entertained.
 - Feels overwhelmed by tabs, newsletters, feeds, podcasts, and chatbots.
 - Wants control over what appears (`more`, `less`, `hide`, `mute`) without heavy setup.
 - Expects trust signals and source quality, not clickbait recirculation.
 ### Secondary Persona
 **Curious consumer with intent**
 - Follows multiple topics (career, health, finance, AI, hobbies).
 - Wants a short, high-value briefing instead of infinite scroll.
 - Will use the product if value appears on day 1 with minimal configuration.
 ---
 ## 3. User Job To Be Done
 ### Functional Job
 "Help me understand what matters now in my domains without spending an hour hunting."
 ### Emotional Job
 "Make me feel informed and in control, not behind and overwhelmed."
 ### Social Job
 "Help me sound current and prepared in meetings and conversations."
 ---
 ## 4. Two-Sentence Hook
 Every morning, get a briefing ranked for what actually matters to you right now, with clear "why this" reasons and no endless scrolling.  
 Tap `more`, `less`, or `hide` once, and the next refresh immediately adapts so your feed gets smarter in minutes, not weeks.
 ---
 ## 5. End-to-End User Experience
 ### 5.1 Day-0 to Day-1
 1. User picks 5-10 interests, desired depth, and hard excludes.
 2. User chooses time budget (`5 min`, `10 min`, `20 min`) and preferred formats.
 3. Product delivers first `Today Brief` with 10-20 ranked items.
 4. Each card shows a short explanation:
   - "Trending in your cohort"
   - "Matches your finance + AI priority"
   - "New source for exploration"
 5. User gives 3-5 feedback actions (`more`, `less`, `hide topic`, `mute source`, `save`).
 6. Feed refresh shows immediate adaptation.
 ### 5.2 Daily Loop
 1. Morning brief arrives via app/email/push.
 2. User scans top cards quickly.
 3. User opens selected cards for deeper summary/source context.
 4. User gives lightweight feedback.
 5. System updates ranking immediately for next retrieval.
 6. Midday and evening updates are optional and scoped by time budget.
 ### 5.3 Session-Aware Interaction
 1. User asks: "Only show items relevant to this week's strategy memo."
 2. Session context constrains ranking while preserving global user profile.
 3. Session expires or is closed; long-term profile remains stable.
 ---
 ## 6. Pressure Test (From the User Point of View)
 ### 6.1 Test Method
 Pressure test assumes a skeptical user comparing against current habits:
 - Existing feeds (X, LinkedIn, YouTube, Reddit, news apps)
 - Newsletters and podcasts
 - General AI assistants
 The product only wins if the user can feel practical value within 1-3 sessions.
 ### 6.2 Core User Questions and Required Answers
 | User Question | User Risk | Product Must Prove |
 |---|---|---|
 | "Why not just use my current feeds?" | No reason to switch | Better relevance + less noise + explicit control in one place |
 | "Will this take too much setup?" | Onboarding drop-off | First useful briefing in < 3 minutes |
 | "Can I trust this?" | Wrong or low-quality items | Clear source quality signals + transparent "why this" |
 | "Will it trap me in a bubble?" | Repetitive narrow feed | Enforced diversity + exploration budget |
 | "If I say less of this, does it actually change?" | Learned helplessness | Visible adaptation on next refresh |
 | "Does it respect my time?" | Fatigue and churn | Time-budget mode with 5-minute high-value brief |
 | "Can I safely use this for work decisions?" | Reputation risk | Freshness guarantees, quality gates, and easy source verification |
 | "Is my data private?" | Trust barrier | Explicit controls for retention, session scope, and deletion |
 ### 6.3 Failure Modes That Kill Adoption
 | Failure Mode | User Reaction | Severity |
 |---|---|---|
 | Feed still noisy after 2 days | "This is another feed app." | Critical |
 | Feedback actions appear ignored | "I have no control." | Critical |
 | Explanations are generic or fake | "This is smoke and mirrors." | High |
 | One source dominates repeatedly | "Biased and boring." | High |
 | Important updates are stale | "I cannot rely on this." | High |
 | Too many notifications | "Annoying, uninstall." | Medium |
 | Onboarding asks too much | "Not worth it." | Medium |
 ### 6.4 Pass Criteria (User-Perceived)
 1. User can identify at least 3 briefing items as "actually useful" in the first session.
 2. User sees obvious feed adaptation after 3 feedback actions.
 3. User can explain why each top card appeared using the provided reason labels.
 4. User does not feel forced to scroll endlessly; time-budget mode feels real.
 5. User returns on Day 2 without a reminder from support or onboarding prompts.
 ---
 ## 7. Why This Beachhead Fits tidalDB
 This use case directly exercises tidalDB primitives without requiring a broad horizontal platform story on day 1:
 - **Signals:** real-time positive and negative feedback.
 - **Ranking profiles:** configurable briefing logic by context and time budget.
 - **Diversity:** hard constraints to avoid source/topic over-concentration.
 - **Cohorts:** "trending for people like me" layer.
 - **Sessions:** short-lived task context (memo prep, market scan, exam prep).
 - **Closed feedback loop:** next retrieval reflects feedback immediately.
 This is materially different from search-first systems and generic chat assistants because ranking quality improves through continuous, explicit user control.
 ---
 ## 8. Product Requirements (User-First)
 ### 8.1 Must-Have V1 Capabilities
 1. `Today Brief` ranking with clear reasons.
 2. Lightweight controls: `more`, `less`, `hide topic`, `mute source`, `save`.
 3. Immediate feedback reflection on next refresh.
 4. Time-budget view (`5/10/20` minute mode).
 5. Diversity constraints for source and topic spread.
 6. Baseline cohort view: "trending for people like you."
 7. Source transparency and one-tap source access.
 ### 8.2 Should-Have V1.5 Capabilities
 1. Session-scoped task mode ("for this meeting only").
 2. Morning/midday/evening briefing cadence controls.
 3. Digest email + mobile app parity.
 4. Credibility filters (verified sources, quality thresholds).
 ### 8.3 Explicit Non-Goals for Beachhead
 1. Building a developer platform first.
 2. Full social graph product.
 3. Monetization optimization surfaces.
 4. End-to-end enterprise admin suite in V1.
 ---
 ## 9. User-Facing Metrics
 ### 9.1 Activation
 1. First briefing completion rate.
 2. Median time to first "useful item saved."
 3. Feedback action rate in first session.
 ### 9.2 Retention
 1. D1, D7, D30 return rate.
 2. Average sessions per active day.
 3. Percentage of users using briefing at least 4 days per week.
 ### 9.3 Quality
 1. "Useful item rate" per session.
 2. Repeated-unwanted-item rate after negative feedback.
 3. Diversity score by source/topic in top 10 results.
 4. Freshness score for time-sensitive domains.
 ### 9.4 Trust
 1. Explanation usefulness rating.
 2. Source credibility acceptance rate.
 3. Reported "feed felt biased/repetitive" rate.
 ---
 ## 10. Launch Plan (Beachhead Scope)
 ### Phase A: Concierge Pilot (20-50 users)
 1. Target one segment: strategy/product/analyst professionals.
 2. Run daily brief with strong manual QA on source quality and reasons.
 3. Capture explicit user interview feedback after each week.
 ### Phase B: Productized Beta (200-500 users)
 1. Self-serve onboarding under 3 minutes.
 2. Reliable immediate feedback loop.
 3. Basic cohort view and time-budget mode.
 ### Phase C: Scaled Consumer Entry
 1. Multi-domain templates (finance, tech, health, creator economy).
 2. Push/email cadence personalization.
 3. Quality and trust controls as default UX, not advanced settings.
 ---
 ## 11. Strategic Risks and Mitigations
 | Risk | Impact | Mitigation |
 |---|---|---|
 | Trying to solve too many surfaces at once | Slow execution, weak product feel | Ship one briefing surface first |
 | Over-personalization creates bubble | Reduced discovery trust | Enforce exploration/diversity budgets |
 | Weak source quality gates | Credibility collapse | Add quality floor and transparent sourcing |
 | Slow adaptation to user feedback | Perceived irrelevance | Prioritize immediate write-to-read reflection |
 | Too much AI summary, not enough evidence | Trust erosion | Keep source links and quote-backed rationale visible |
 ---
 ## 12. Kill Criteria (Be Honest Early)
 Stop or pivot if any of the following remain true after two iteration cycles:
 1. D7 retention remains below the acceptable threshold for the target segment.
 2. Users do not perceive adaptation after direct feedback actions.
 3. "Useful item rate" fails to outperform a simple baseline feed.
 4. User interviews repeatedly describe the product as "another noisy feed."
 ---
 ## 13. Decision
 This is the right forward-looking beachhead for tidalDB if the goal is knowledge workers and consumers rather than developers.
 It is narrow enough to ship, painful enough to matter, and aligned with tidalDB's actual architectural advantage: real-time, feedback-aware ranking with explicit user control and transparent reasons.
--- a/docs/planning/PRODUCT_ROADMAP.md
+++ b/docs/planning/PRODUCT_ROADMAP.md
@@ -0,0 +1,217 @@
 # Product Roadmap: Personal Briefing Feed
 **Date:** 2026-02-21  
 **Owner:** Product track (knowledge workers + consumers)  
 **Status:** Draft for execution
 ---
 ## Vision
 Ship a daily briefing product that helps users identify what matters now, adapt the feed instantly with lightweight feedback, and trust the results enough to return habitually.
 ## Product Thesis
 People do not need another infinite feed. They need a controllable, high-signal briefing that respects time, explains relevance, and improves immediately from feedback.
 ---
 ## Milestone Summary
 | # | Name | Proves | Depends On |
 |---|------|--------|------------|
 | P0 | Beachhead Validation | Users care enough to return for a daily briefing | M0 + partial M1 |
 | P1 | Concierge Alpha | Briefing usefulness and immediate adaptation in a narrow segment | M1 + partial M2 |
 | PG1 | Personalization Core Done (Blocking Gate) | Core personalization loop is correct, immediate, and meaningfully better than baseline | P1 + M1/M2/M3 core slices |
 | P2 | Productized Beta | Self-serve onboarding, trust UX, and repeatability without manual curation | M2 + partial M3 |
 | P3 | Public Launch | Reliability, quality floor, and trust controls at public volume | M3 + M5 core + M6 partial |
 | P4 | Scale + Revenue Fit | Sustainable growth and monetization without quality collapse | M6 + M7 |
 ---
 ## Current Status
 | Phase | Status |
 |-------|--------|
 | p0: Beachhead Validation | NOT STARTED |
 | p1: Concierge Alpha | NOT STARTED |
 | pg1: Personalization Core Done gate | NOT STARTED |
 | p2: Productized Beta | NOT STARTED |
 | p3: Public Launch | NOT STARTED |
 | p4: Scale + Revenue Fit | NOT STARTED |
 **Current phase:** p0 (Beachhead Validation)
 ---
 ## Milestone P0: Beachhead Validation
 ### Milestone Thesis
 Validate that a personal briefing feed solves a painful daily job for target users and creates early repeat usage.
 ### Done When
 - 20-50 users recruited in target segment.
 - Daily briefing prototype used for 2+ weeks.
 - Median user gives at least one feedback action per session.
 - Interview evidence confirms value over baseline feeds.
 - D2 and D7 retention meet the agreed threshold.
 ### Phase Plan
 - Phase 1: Target Segment and Research Ops
 - Phase 2: Concierge Briefing Loop
 - Phase 3: Signal and Retention Evaluation
 ---
 ## Milestone P1: Concierge Alpha
 ### Milestone Thesis
 Deliver a high-value daily `Today Brief` with transparent reasons and immediate feedback reflection for a narrow cohort.
 ### Done When
 - Ranked briefing cards with source links and reason labels are live.
 - `more/less/hide/mute/save` controls are available and used.
 - Next refresh visibly reflects negative feedback.
 - Time-budget mode (`5/10/20`) is used by pilot users.
 - Weekly active usage indicates repeated utility.
 ### Phase Plan
 - Phase 1: Briefing UX and Reason Labels
 - Phase 2: Feedback Controls and Real-Time Adaptation
 - Phase 3: Quality and Diversity Guardrails
 ---
 ## Milestone P2: Productized Beta
 ### Milestone Thesis
 Turn concierge alpha into a self-serve product with stable onboarding and trust-preserving defaults.
 ### Done When
 - Onboarding can be completed in under 3 minutes.
 - Cohort layer (`trending for people like you`) is available.
 - Explanations are clear and consistent per card.
 - D7 retention and useful-item rate beat baseline.
 - Manual curation is no longer required for normal operation.
 ### Blocking Prerequisite
 P2 cannot start until **PG1 Personalization Core Done** passes.
 ### Phase Plan
 - Phase 1: Self-Serve Onboarding and Profile Bootstrapping
 - Phase 2: Cohort and Context Views
 - Phase 3: Trust Controls and Preference Persistence
 ---
 ## Milestone P3: Public Launch
 ### Milestone Thesis
 Launch publicly with reliability and trust controls suitable for broader consumer and knowledge-worker usage.
 ### Done When
 - Reliability and latency SLOs for briefing generation are met.
 - Freshness, duplicate suppression, and source quality floor are enforced.
 - Notification cadence controls prevent fatigue.
 - Product support and incident playbook are active.
 ### Phase Plan
 - Phase 1: Reliability and SLO Enforcement
 - Phase 2: Operational Quality Controls
 - Phase 3: Launch Readiness and Support Ops
 ---
 ## Milestone P4: Scale + Revenue Fit
 ### Milestone Thesis
 Scale the product and validate monetization while preserving user trust and briefing quality.
 ### Done When
 - Monetization model validated.
 - Revenue and quality metrics monitored together.
 - Retention remains stable as volume increases.
 - Next segment expansion is backed by product data.
 ### Phase Plan
 - Phase 1: Monetization Experiments
 - Phase 2: Quality-Protecting Growth
 - Phase 3: Expansion Strategy
 ---
 ## Milestone PG1: Personalization Core Done (Blocking Gate)
 ### Milestone Thesis
 Before expanding product breadth, prove the personalization loop is technically correct, immediately responsive, and materially useful to users.
 ### Must Pass (All Required)
 - **Hard negatives are exact:** `hide/mute/block` items and creators never leak after write, restart, or replay.
 - **Immediate adaptation is real:** `more/less/skip/save` actions change next-refresh ranking within target latency budget.
 - **Replay correctness:** user personalization state (seen-state, preference shifts, relationship weights) rebuilds deterministically from checkpoint + WAL replay.
 - **Quality uplift vs baseline:** useful-item rate and repeated-unwanted-item rate beat a non-personalized baseline feed.
 - **Diversity with personalization:** relevance holds while source/topic domination stays within guardrails.
 ### Why This Gate Exists
 Without this gate, roadmap execution drifts into breadth (more surfaces, more modes) before the core personalization promise is trustworthy.
 ---
 ## Product Metrics
 ### Activation
 - First briefing completion rate
 - Time to first useful item saved
 - Feedback action rate in first session
 ### Retention
 - D1/D7/D30 retention
 - Sessions per active user per week
 - Percentage of users with 4+ briefing days per week
 ### Quality and Trust
 - Useful-item rate per briefing
 - Repeated-unwanted-item rate after negative feedback
 - Diversity score in top 10 items
 - Explanation usefulness rating
 ---
 ## Dependencies on Engine Track
 | Product Capability | Primary Engine Dependencies |
 |--------------------|-----------------------------|
 | Real-time adaptation from feedback | M1 Signal Engine, M3 Feedback Loop |
 | Cohort briefing layer | M3 User model, M6 cohort surface support |
 | Search-within-briefing | M5 Hybrid Search |
 | Public reliability | M7 Production Hardening |
 ---
 ## Planning References
 - Use case: `docs/personal-briefing-beachhead.md`
 - Engine roadmap: `docs/planning/ROADMAP.md`
 - Product milestone execution: `docs/planning/milestone-p/`
--- a/docs/planning/ROADMAP.md
+++ b/docs/planning/ROADMAP.md
@@ -14,12 +14,30 @@ A single embeddable database can replace the 6-system content ranking stack by t
 | # | Name | Proves | Enables |
 |---|------|--------|---------|
 | M0 | Embeddable Runtime | tidalDB can run in-process with zero-config defaults and tooling | Cuts proof-of-concept friction, enables internal dogfooding |
 | M1 | Signal Engine | Signals are a database primitive with O(1) decay, not application math | UC-03 (partial), UC-06 (partial), UC-14 (partial) |
 | M2 | Ranked Retrieval | A single query retrieves, scores, and ranks content using live signals | UC-03, UC-04, UC-06, UC-08, UC-13, UC-14 |
 | M3 | Personalized Ranking | User context shapes retrieval and ranking -- the "For You" query works | UC-01, UC-05, UC-07, UC-09 (partial) |
-| M4 | Hybrid Search | Text + semantic + signal-ranked search in one query | UC-02, UC-10, UC-11 |
+| M4 | Agent Memory | Agents can create sessions, write signals, and enforce policy inside tidalDB | Agent-mediated personalization, RLHF loops, conversational memory |
-| M5 | Full Surface Coverage | Every use case, every sort mode, every filter, every feedback loop | UC-01 through UC-14 complete |
+| M5 | Hybrid Search | Text + semantic + signal-ranked search in one query | UC-02, UC-10, UC-11 |
-| M6 | Production Hardening | Crash safety, graceful degradation, operational readiness | All UCs at production quality |
+| M6 | Full Surface Coverage | Every use case, every sort mode, every filter, every feedback loop | UC-01 through UC-14 complete |
 | M7 | Production Hardening | Crash safety, graceful degradation, operational readiness | All UCs at production quality |
 ### Product Milestone Summary (New)
 The roadmap now has two tracks:
 - **Engine Track (M0-M7):** proves tidalDB capabilities.
 - **Product Track (P0-P4):** proves end-user value for the beachhead product.
 | # | Name | Proves | Depends On |
 |---|------|--------|------------|
 | P0 | Beachhead Validation | Knowledge workers and consumers care about a personal briefing feed enough to use it repeatedly | M0 (embedding/runtime), partial M1 |
 | P1 | Concierge Alpha | Daily "Today Brief" with explicit feedback controls creates Day-2 retention in a small cohort | M1 complete, partial M2 |
 | PG1 | Personalization Core Done (Blocking Gate) | Personalization loop is correct, immediate, and measurably better than baseline | P1 + M1/M2/M3 core slices |
 | P2 | Productized Beta | Self-serve onboarding + real-time adaptation + explanation UX works without manual curation | M2 complete, partial M3 |
 | P3 | Public Launch | The product is reliable, useful, and trusted at real user volume | M3 + M5 core, M6 partial |
 | P4 | Scale + Revenue Fit | Sustainable retention and monetization without quality collapse | M6 + M7 |
 ---
@ -27,13 +45,22 @@ A single embeddable database can replace the 6-system content ranking stack by t
 | Phase | Status | Tests |
 |-------|--------|-------|
 | **m0p1: Embeddable Runtime Skeleton** | COMPLETE | 329 passing (293 unit + 36 integration + 3 doc) |
 | **m0p2: Tooling & Diagnostics** | COMPLETE | 349 passing (+7 metrics unit + 7 metrics integration + 9 tidalctl CLI) |
 | m0p3: Samples & Docs | NOT STARTED | -- |
 | **m1p1: Core Type System and Schema** | COMPLETE | 77 passing |
 | **m1p2: Write-Ahead Log** | COMPLETE | passing (unit + integration) |
 | **m1p3: Storage Engine Trait and fjall Backend** | COMPLETE | 140 passing (128 unit + 12 integration) |
 | m1p4: Signal Ledger | NOT STARTED | -- |
 | m1p5: Entity CRUD and Signal Write API | NOT STARTED | -- |
 | P0: Beachhead Validation | NOT STARTED | -- |
 | P1: Concierge Alpha | NOT STARTED | -- |
 | PG1: Personalization Core Done gate | NOT STARTED | -- |
 | P2: Productized Beta | NOT STARTED | -- |
 | P3: Public Launch | NOT STARTED | -- |
 | P4: Scale + Revenue Fit | NOT STARTED | -- |
-**Current phase:** m1p4 (Signal Ledger) is next. m1p2 and m1p3 are complete, unblocking m1p4.
+**Current phase:** m1p4 (Signal Ledger). m0p1 and m0p2 are complete; m1p2 and m1p3 unblock m1p4.
 **Lessons learned:**
 - m1p3 keyspaces are organized per `EntityKind` ("items", "users", "creators"), not by data category. The `Tag` enum in key encoding provides the data-category namespace within each entity-kind keyspace.
@ -42,6 +69,146 @@ A single embeddable database can replace the 6-system content ranking stack by t
 ---
 ## Product Track: Personal Briefing Feed (Knowledge Workers + Consumers)
 This track defines the milestones for the **actual product experience** (not only the database engine).  
 Use case reference: `docs/personal-briefing-beachhead.md`.
 Dedicated roadmap: `docs/planning/PRODUCT_ROADMAP.md`.
 ### P0: Beachhead Validation -- "Do users care enough to return?"
 **Milestone Thesis**
 Validate that a personal briefing feed solves a painful daily job for users and drives repeat use.
 **Acceptance Criteria**
 - [ ] Recruit 20-50 target users (knowledge workers + high-intent consumers).
 - [ ] Run daily briefing prototype (can include manual source QA).
 - [ ] At least one meaningful feedback action per session for the median user (`more`, `less`, `hide`, `mute`, `save`).
 - [ ] User interviews confirm value vs baseline feeds ("less noise", "more useful", "saves time").
 - [ ] D2 retention reaches agreed threshold for target segment.
 ### P1: Concierge Alpha -- "High-value daily brief for a narrow cohort"
 **Milestone Thesis**
 Deliver a reliable daily `Today Brief` experience with immediate visible adaptation after user feedback.
 **Acceptance Criteria**
 - [ ] App surface: ranked brief, reason labels, source links, save/feedback controls.
 - [ ] Feedback loop: next refresh reflects `less/hide/mute` actions immediately.
 - [ ] Time-budget mode (`5/10/20` min) is available and used.
 - [ ] Diversity constraints prevent source/topic domination in top results.
 - [ ] Weekly active usage demonstrates repeated utility.
 ### P2: Productized Beta -- "Self-serve and repeatable without handholding"
 **Milestone Thesis**
 Turn the alpha into a self-serve product with stable onboarding, trust UX, and measurable quality.
 **Acceptance Criteria**
 - [ ] Self-serve onboarding completed in under 3 minutes.
 - [ ] "Why this" explanations are present and understandable on every briefing card.
 - [ ] Cohort layer available ("trending for people like you").
 - [ ] Trust controls available (source transparency, mute/hide persistence).
 - [ ] D7 retention and "useful item rate" exceed baseline comparison feed.
 - [ ] **PG1 Personalization Core Done gate has passed.**
 ### P3: Public Launch -- "Trusted at real volume"
 **Milestone Thesis**
 Launch publicly with reliability, quality, and trust guardrails suitable for broad use.
 **Acceptance Criteria**
 - [ ] Reliability and latency SLOs defined and met for briefing generation.
 - [ ] Quality floor enforced (freshness, source quality, duplicate suppression).
 - [ ] Notification cadence controls prevent spam.
 - [ ] Core support and incident process in place for user-facing regressions.
 ### P4: Scale + Revenue Fit -- "Sustainable business without degrading quality"
 **Milestone Thesis**
 Prove the product can grow and monetize while preserving user trust and briefing quality.
 **Acceptance Criteria**
 - [ ] Monetization model validated (subscription, team plan, or equivalent).
 - [ ] Revenue metrics tracked alongside quality metrics (no quality-revenue trade-off regressions).
 - [ ] Retention and engagement remain stable as volume increases.
 - [ ] Product roadmap for next segment expansion is data-backed.
 ### PG1: Personalization Core Done (Blocking Gate)
 **Milestone Thesis**
 Before product breadth expansion, the core personalization loop must be provably correct and immediately responsive.
 **Acceptance Criteria**
 - [ ] Hard negatives (`hide/mute/block`) never leak after write, restart, or replay.
 - [ ] Explicit feedback (`more/less/skip/save`) changes next-refresh ranking within target latency.
 - [ ] User personalization state rebuilds deterministically from checkpoint + WAL replay.
 - [ ] Useful-item rate and repeated-unwanted-item rate outperform a non-personalized baseline.
 - [ ] Diversity guardrails hold while maintaining personalization quality.
 ---
 ## Milestone 0: Embeddable Runtime -- "Runs in your process in minutes"
 ### Milestone Thesis
 Before we prove any ranking math, developers must be able to embed tidalDB inside an existing service with zero operational prep. M0 delivers the runtime glue (an ergonomic builder API, deterministic storage layout, a tiny admin CLI, and living examples) so the very first experience is `cargo add tidaldb`, `TidalDb::builder().ephemeral().open()`, and a passing smoke test.
 ### Phases
 #### Phase 1: Embeddable Runtime Skeleton
 **Delivers:** A cohesive `Config`/`Builder` API for single-process use, with in-memory and filesystem-backed defaults, sandboxed data directories, and graceful shutdown hooks developers can call from tests or application drop handlers.
 - Builder exposes `ephemeral()` / `single_process()` shortcuts and eagerly validates directories.
 - Shutdown hooks drain WAL writer threads and surface errors.
 - Temp-directory helper guarantees deterministic cleanup (used in doctests).
 #### Phase 2: Tooling & Diagnostics
 **Delivers:** `tidalctl` (a minimal CLI) for inspecting embedded instances, plus a lightweight metrics surface (Prometheus text or JSON) tagged with the same IDs future distributed deployments will use.
 - `tidalctl status --path <dir>` returns JSON with WAL seq, config hash, uptime.
 - Metrics endpoint optional (disabled by default) exposes `/metrics` and `/healthz`.
 - Tooling reuses the same path helpers from Phase 1.
 #### Phase 3: Samples & Docs
 **Delivers:** Quick-start samples (For You POC + integration tests) compiled as doctests, and reference snippets for embedding tidalDB inside Axum/Actix or a CLI app. Keeps DX in lockstep with the runtime.
 - Quickstart example + doctest run under CI (`cargo test --doc --examples`).
 - Axum/Actix embedding examples include graceful shutdown + metrics wiring.
 - CONTRIBUTING updated with “run samples” checklist.
 ### UAT Scenario
 ```
 Given:
  // in tests/lib.rs
  let db = TidalDb::builder()
      .ephemeral()
      .with_temp_dir()
      .open()
      .unwrap();
 When:
  db.health_check();           // ok
  tidalctl status --path <dir> // prints WAL, storage, signal counts
  cargo test --doc             // quick-start snippet compiles & runs
 Then:
  - Builder defaults require zero manual config
  - CLI connects to the same files used by the embedded process
  - Samples stay in sync (failing doctest fails CI)
 ```
 ---
 ## Milestone 1: Signal Engine -- "Signals are a database primitive"
 ### Milestone Thesis
@ -586,13 +753,13 @@ Then:
 ### Deferred to Later Milestones
- **SEARCH query with personalization** -- deferred to M4; M3 proves personalized RETRIEVE
+- **SEARCH query with personalization** -- deferred to M5; M3 proves personalized RETRIEVE
- **Tantivy integration** -- deferred to M4
+- **Tantivy integration** -- deferred to M5
- **People/creator search (UC-10)** -- deferred to M4
+- **People/creator search (UC-10)** -- deferred to M5
- **Social graph traversal for trending ("trending among my follows")** -- deferred to M5; requires graph query capabilities beyond simple follows filter
+- **Social graph traversal for trending ("trending among my follows")** -- deferred to M6; requires graph query capabilities beyond simple follows filter
- **Collaborative filtering** -- basic co-engagement signals used in `related` profile; full matrix-factorization-style CF deferred to M5
+- **Collaborative filtering** -- basic co-engagement signals used in `related` profile; full matrix-factorization-style CF deferred to M6
- **User-created collections/boards (UC-09.4)** -- deferred to M5
+- **User-created collections/boards (UC-09.4)** -- deferred to M6
- **Live content status tracking (UC-12)** -- deferred to M5
+- **Live content status tracking (UC-12)** -- deferred to M6
 ### Integration Test
@ -635,7 +802,54 @@ The full "For You" query works: `RETRIEVE items FOR USER @user_id USING PROFILE
 ---
-## Milestone 4: Hybrid Search -- "Text + semantic + signals in one query"
+## Milestone 4: Agent Memory -- "Agents own the personalization substrate"
 ### Milestone Thesis
 Agents mediate the user interaction: they ground LLM responses, collect preferences, and emit feedback. This milestone proves a developer can embed tidalDB alongside an agent runtime, create sessions, append structured feedback signals (reward, tool usage, critiques), enforce per-agent policy, and query session memory in milliseconds.
 ### Phases
 #### Phase 1: Session Schema & Lifecycle
 **Delivers:** `SessionId`, `AgentId`, and `AgentPolicy` types in schema plus builder flags (`with_sessions(true)`). APIs to `start_session`, `append_session_metadata`, `close_session`. WAL entries tagged with agent metadata and CLI output listing active sessions.
 #### Phase 2: Session Materializers & Short-Lived Aggregates
 **Delivers:** `SessionMaterializer` (minute-scale decay buckets for reward/pref hints, tool usage counters) registered via the existing materializer trait. Query APIs `session_view(session_id)` and `session_velocity(session_id, signal_type)` with <5µs read latency. Integration tests proving hot path throughput at 50k updates/sec.
 #### Phase 3: Policy & Safety Layer
 **Delivers:** Declarative schema-bound policies (allowed signal types, max QPS, storage TTL). Enforcement in the signal write path (reject or queue). Audit log per agent (accessible via CLI/metrics) plus rate-limiters to isolate noisy agents.
 #### Phase 4: Agent-Facing APIs & Explanations
 **Delivers:** `retrieve_for_session` / `search_for_session` endpoints returning ranked items plus a `session_snapshot` (top signals, reasons, reward velocity). Agent-friendly error codes, documentation, and samples (user → agent → tidalDB). Session data plumbed into ranking profiles via new `SessionContext`.
 ### UAT Scenario
 ```
 Given:
  An agent opens session S for user @u123 with metadata {tool:"planner"}
  Policy allows signals preference_hint and reward; forbids raw_log
 When:
  1. Agent writes preference_hint ("more jazz today")
  2. Agent writes reward(+0.8) after delivering an answer
  3. Agent executes RETRIEVE ... FOR USER @u123 FOR SESSION @S USING PROFILE for_you LIMIT 10
  4. Agent receives ranked items and session_snapshot (reward_velocity, last_tool)
  5. Agent attempts to write raw_log → rejected with policy violation
  6. Session closes; CLI shows duration, writes, rejections
 Then:
  - Session aggregates reflect preference/reward immediately
  - Policy enforcement blocks disallowed write with audit trail
  - After closure, querying session S returns archived snapshot with final signals
 ```
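Step 5 of the scenario (a disallowed write rejected with a policy violation) could look roughly like the sketch below. `AgentPolicy` is named in Phase 1, but its fields, the `PolicyViolation` type, and the `allowing` / `check_write` methods are assumptions made for illustration only, not the planned API.

```rust
use std::collections::HashSet;

/// Hypothetical shape for the `AgentPolicy` type named in Phase 1.
/// Only the "allowed signal types" rule from Phase 3 is modelled here.
#[derive(Debug, Clone)]
pub struct AgentPolicy {
    allowed_signals: HashSet<String>,
}

/// Hypothetical error returned when a write is rejected by policy.
#[derive(Debug, PartialEq, Eq)]
pub struct PolicyViolation {
    pub signal_type: String,
}

impl AgentPolicy {
    /// Build a policy that allows exactly the listed signal types.
    pub fn allowing(signal_types: &[&str]) -> Self {
        Self {
            allowed_signals: signal_types.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Check a write before it reaches the signal write path.
    pub fn check_write(&self, signal_type: &str) -> Result<(), PolicyViolation> {
        if self.allowed_signals.contains(signal_type) {
            Ok(())
        } else {
            Err(PolicyViolation {
                signal_type: signal_type.to_string(),
            })
        }
    }
}
```

With `AgentPolicy::allowing(&["preference_hint", "reward"])`, a `reward` write passes the check and a `raw_log` write returns a `PolicyViolation`, which is the point where an audit entry would be recorded.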
 ---
 ## Milestone 5: Hybrid Search -- "Text + semantic + signals in one query"
 ### Milestone Thesis
@ -762,7 +976,7 @@ A developer can execute SEARCH queries that combine full-text BM25 relevance wit
 ---
-## Milestone 5: Full Surface Coverage -- "Every use case works"
+## Milestone 6: Full Surface Coverage -- "Every use case works"
 ### Milestone Thesis
@ -851,8 +1065,8 @@ Then:
 ### Deferred to Later Milestones
 - **Signal rollups (hourly/daily materialization)** -- built if 100K-item benchmarks show bucketed counters exceeding the latency budget for 30d+ windows
- **Multi-vector user interest clustering (PinnerSage)** -- deferred to M6 or beyond; single preference vector serves through M5
+- **Multi-vector user interest clustering (PinnerSage)** -- deferred to M7 or beyond; single preference vector serves through M6
- **ACORN-1 two-hop expansion for very selective filters** -- deferred to M6; USearch predicate callback sufficient through M5
+- **ACORN-1 two-hop expansion for very selective filters** -- deferred to M7; USearch predicate callback sufficient through M6
 ### Done When
@ -860,7 +1074,7 @@ All 14 use cases pass their UAT scenarios as defined in USE_CASES.md. All 25+ so
 ---
-## Milestone 6: Production Hardening -- "Ready for real workloads"
+## Milestone 7: Production Hardening -- "Ready for real workloads"
 ### Milestone Thesis
@ -899,13 +1113,13 @@ Then:
 ### Phases
-(Phases for M6 are provisional -- detailed decomposition happens after M5 ships.)
+(Phases for M7 are provisional -- detailed decomposition happens after M6 ships.)
 #### Phase 1: Crash Recovery Hardening
 **Delivers:** Comprehensive crash recovery testing and hardening. Fault injection at every write-path stage. Recovery time targets. WAL compaction and checkpoint optimization.
-**Depends On:** M5 complete
+**Depends On:** M6 complete
 **Complexity:** XL
 #### Phase 2: Graceful Degradation Under Load
@ -929,7 +1143,7 @@ Then:
 **Depends On:** Phase 1
 **Complexity:** M
-### Deferred (Post-M6 / Future)
+### Deferred (Post-M7 / Future)
 - **Horizontal distribution** -- the single-node architecture scales vertically first; distribution is a separate product decision
 - **Multi-tenancy** -- per-tenant isolation within a single tidalDB instance
@ -946,22 +1160,22 @@ tidalDB operates correctly at 1M items under sustained concurrent read/write loa
 ## Use Case Coverage Progression
-| UC | Description | M1 | M2 | M3 | M4 | M5 | M6 |
+| UC | Description | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
-|----|-------------|----|----|----|----|----|----|
+|----|-------------|----|----|----|----|----|----|----|
-| UC-01 | For You Feed | - | - | **Full** | Full | Full | Full |
+| UC-01 | For You Feed | - | - | **Full** | Full | Full | Full | Full |
-| UC-02 | Search | - | - | - | **Core** | **Full** | Full |
+| UC-02 | Search | - | - | - | - | **Core** | **Full** | Full |
-| UC-03 | Trending/Rising | Signals | **Full** | Full | Full | Full | Full |
+| UC-03 | Trending/Rising | Signals | **Full** | Full | Full | Full | Full | Full |
-| UC-04 | Following Feed | - | Partial | **Full** | Full | Full | Full |
+| UC-04 | Following Feed | - | Partial | **Full** | Full | Full | Full | Full |
-| UC-05 | Related/Up Next | - | - | **Core** | Core | **Full** | Full |
+| UC-05 | Related/Up Next | - | - | **Core** | Core | Core | **Full** | Full |
-| UC-06 | Browse/Category | Signals | **Core** | Core | Core | **Full** | Full |
+| UC-06 | Browse/Category | Signals | **Core** | Core | Core | Core | **Full** | Full |
-| UC-07 | Notifications | - | - | **Core** | Core | **Full** | Full |
+| UC-07 | Notifications | - | - | **Core** | Core | Core | **Full** | Full |
-| UC-08 | Creator Profile | - | **Core** | Core | Core | **Full** | Full |
+| UC-08 | Creator Profile | - | **Core** | Core | Core | Core | **Full** | Full |
-| UC-09 | User Library | - | - | Partial | Partial | **Full** | Full |
+| UC-09 | User Library | - | - | Partial | Partial | Partial | **Full** | Full |
-| UC-10 | People Search | - | - | - | **Core** | **Full** | Full |
+| UC-10 | People Search | - | - | - | - | **Core** | **Full** | Full |
-| UC-11 | Visual/Semantic | - | - | - | Partial | **Full** | Full |
+| UC-11 | Visual/Semantic | - | - | - | - | Partial | **Full** | Full |
-| UC-12 | Live Content | - | - | - | - | **Full** | Full |
+| UC-12 | Live Content | - | - | - | - | - | **Full** | Full |
-| UC-13 | Hidden Gems | - | **Full** | Full | Full | Full | Full |
+| UC-13 | Hidden Gems | - | **Full** | Full | Full | Full | Full | Full |
-| UC-14 | Controversial/Hot | Signals | **Full** | Full | Full | Full | Full |
+| UC-14 | Controversial/Hot | Signals | **Full** | Full | Full | Full | Full | Full |
 Legend:
 - `-` = Not addressed
--- a/docs/planning/milestone-0/phase-1/OVERVIEW.md
+++ b/docs/planning/milestone-0/phase-1/OVERVIEW.md
@ -0,0 +1,15 @@
 # Milestone 0 · Phase 1 — Embeddable Runtime Skeleton
 **Objective:** ship an ergonomic, zero-config builder so engineers can spin up a tidalDB instance inside tests or services without touching the signal stack yet.
 **Success criteria**
 - `TidalDb::builder()` exposes `ephemeral()` and `single_process(data_dir)` shortcuts with sensible defaults.
 - `Config` validates eagerly (missing dirs, bad permissions) so failures happen before WAL threads spin up.
 - Builder registers shutdown hooks (`Drop` + explicit `close()`), returning errors if background workers fail to drain.
 - Temp-directory helper guarantees deterministic cleanup (used by doctests + integration tests).
 **Dependencies:** none — runs before any signal-specific work.
 **Blocked by:** n/a
 **Unblocks:** M0 Phase 2 (CLI needs stable layout), all later milestones (test harnesses use the builder).
--- a/docs/planning/milestone-0/phase-1/task-01-builder-and-config.md
+++ b/docs/planning/milestone-0/phase-1/task-01-builder-and-config.md
@ -0,0 +1,13 @@
 # Task 01 — Builder + Config API
 **Goal:** expose a fluent API that hides all the knobs required to open a single-process tidalDB instance.
 ## Deliverables
 - `TidalDbBuilder` with methods: `ephemeral()`, `with_data_dir(Path)`, `cache_dir(Path)`, `wal_dir(Path)`, `validate()` and `open()`.
 - `Config` struct that can be serialized (for CLI) and implements `Default` tuned for embeddable use (single WAL segment, no background threads yet).
 - Unit tests proving builder errors when directories are missing/unwritable.
 ## Acceptance Criteria
 - `cargo test builder` passes on macOS + Linux runners.
 - Builder docs contain runnable snippet used in Phase 3 doctests.
 - No public API exposes fjall/USearch internals; consumers only see `Config` + `TidalDbBuilder`.
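A minimal sketch of the eager-validation idea follows. The method names `ephemeral()`, `with_data_dir()`, and `validate()` come from the deliverables above; the field layout, the `ConfigError` variants, and the choice to have `validate()` return a `Config` are assumptions for this example, and `open()` is omitted.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StorageMode {
    Ephemeral,
    Persistent,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Config {
    pub mode: StorageMode,
    pub data_dir: Option<PathBuf>,
}

/// Hypothetical error variants; the real `ConfigError` may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    /// Persistent mode was requested without a data directory.
    MissingDataDir,
    /// The data directory is missing or is not a directory.
    NotADirectory(PathBuf),
}

#[derive(Debug, Default)]
pub struct TidalDbBuilder {
    ephemeral: bool,
    data_dir: Option<PathBuf>,
}

impl TidalDbBuilder {
    pub fn ephemeral(mut self) -> Self {
        self.ephemeral = true;
        self
    }

    pub fn with_data_dir(mut self, dir: impl AsRef<Path>) -> Self {
        self.data_dir = Some(dir.as_ref().to_path_buf());
        self
    }

    /// Validate before any background work starts, so failures surface early.
    pub fn validate(self) -> Result<Config, ConfigError> {
        if self.ephemeral {
            return Ok(Config { mode: StorageMode::Ephemeral, data_dir: None });
        }
        let dir = self.data_dir.ok_or(ConfigError::MissingDataDir)?;
        if !dir.is_dir() {
            return Err(ConfigError::NotADirectory(dir));
        }
        Ok(Config { mode: StorageMode::Persistent, data_dir: Some(dir) })
    }
}
```

In this sketch `ephemeral()` takes precedence over any data directory, which is one possible resolution of conflicting builder calls rather than a documented rule.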
--- a/docs/planning/milestone-0/phase-1/task-02-sandboxed-storage.md
+++ b/docs/planning/milestone-0/phase-1/task-02-sandboxed-storage.md
@ -0,0 +1,13 @@
 # Task 02 — Sandboxed Storage Layout
 **Goal:** deterministic filesystem layout for embedded instances so tooling/cleanup is reliable.
 ## Deliverables
 - `Paths` helper that derives `{base}/wal`, `{base}/items`, `{base}/users`, etc., and ensures directories exist with correct perms.
 - Temp-dir helper (`TempTidalHome`) for tests; implements Drop to delete directories unless `preserve=true`.
 - Documentation table describing folder purpose for future CLI use.
 ## Acceptance Criteria
 - Integration test proves two builders pointing to different `TempTidalHome` roots never collide.
 - Cleanup confirmed via test that drops the helper and asserts directories removed.
 - Paths helper reused by upcoming CLI (Phase 2) — single source of truth.
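The Drop-based cleanup described for `TempTidalHome` can be sketched with the standard library alone. The naming scheme (process id plus a counter), the `path()` accessor, and the builder-style `preserve()` method are assumptions for illustration; only the type name and the `preserve` flag come from the deliverables above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

// Per-process counter so two helpers in the same test binary never share a root.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

pub struct TempTidalHome {
    root: PathBuf,
    preserve: bool,
}

impl TempTidalHome {
    /// Create a fresh, uniquely named directory under the system temp dir.
    pub fn new() -> io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let name = format!("tidaldb-test-{}-{}", std::process::id(), id);
        let root = std::env::temp_dir().join(name);
        // `create_dir` fails if the directory already exists, so a stale
        // leftover is reported instead of being silently reused.
        fs::create_dir(&root)?;
        Ok(Self { root, preserve: false })
    }

    pub fn path(&self) -> &Path {
        &self.root
    }

    /// Keep the directory on drop (useful when debugging a failing test).
    pub fn preserve(mut self, preserve: bool) -> Self {
        self.preserve = preserve;
        self
    }
}

impl Drop for TempTidalHome {
    fn drop(&mut self) {
        if !self.preserve {
            // Best-effort cleanup: errors cannot be returned from `drop`.
            let _ = fs::remove_dir_all(&self.root);
        }
    }
}
```

Two helpers created in the same process get distinct roots, and dropping one removes only its own tree, which is the property the first two acceptance criteria test for.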
--- a/docs/planning/milestone-0/phase-2/OVERVIEW.md
+++ b/docs/planning/milestone-0/phase-2/OVERVIEW.md
@ -0,0 +1,12 @@
 # Milestone 0 · Phase 2 — Tooling & Diagnostics
 **Objective:** give developers minimal introspection tooling that works even when tidalDB is embedded. This phase adds `tidalctl` plus a metrics endpoint so later milestones can reuse the same plumbing.
 **Success criteria**
 - `tidalctl status --path <dir>` prints build info, WAL seq, open segments, and builder config snapshot.
 - Metrics exporter (text or JSON) exposes uptime, WAL queue depth, and build hash; tag everything with `partition_id=0` to future-proof multi-node rollouts.
 - Tooling uses the same `Paths` helper from Phase 1 — no duplicated layout logic.
 **Dependencies:** Phase 1 (stable directories + config serialization).
 **Unblocks:** future performance debugging + automated tests that assert health via CLI before/after workloads.
--- a/docs/planning/milestone-0/phase-2/SCOPING.md
+++ b/docs/planning/milestone-0/phase-2/SCOPING.md
@ -0,0 +1,443 @@
 # Milestone 0, Phase 2: Tooling & Diagnostics -- Scoping Decisions
 **Date:** 2026-02-20
 **Author:** @tidal-visionary (Spencer Kimball)
 **Status:** APPROVED -- ready for implementation
 ---
 ## Context
 m0p1 (Embeddable Runtime Skeleton) is complete. m1p1-p3 (Type System, WAL, Storage Engine) are also complete. The codebase has:
 - `TidalDb` as a thin handle: holds `Config`, has `health_check()`, `close()`, `Drop`
 - A full WAL implementation (`WalHandle`, `SegmentWriter`, `CheckpointManager`) that writes segment files (`wal-{seq:020}.seg`) and checkpoint metadata (`checkpoint.meta`) to disk
 - No `db.signal()` yet in the public API (deferred to m1p5)
 - No WAL writes from the `TidalDb` public API -- the WAL is implemented but not wired to the `TidalDb` facade
 - `Config` has no serde derive -- it is a plain struct with no serialization
 - Single crate `tidal/`, no workspace
 The task documents in `phase-2/` were written before m1p2 and m1p3 shipped. They assumed WAL writes would be accessible from the public API. They are not. This scoping document corrects the task definitions to match reality.
 ---
 ## 1. tidalctl Scope at M0
 ### What tidalctl Can Do
 tidalctl is a **cold inspector**. There is no live process to connect to. The CLI reads files from disk and reports what it finds. This is the correct model for an embeddable database -- there is no server process listening on a port. The inspector reads the same files the embedded library writes.
 ### Commands
 #### `tidalctl status --path <dir>`
 Reads the tidalDB home directory and prints a JSON report:
 ```json
 {
  "version": "0.1.0",
  "build_hash": "29400d4",
  "status": "ok",
  "storage_mode": "persistent",
  "wal": {
    "segments": 3,
    "first_seq": 1,
    "last_segment_seq": 201,
    "checkpoint_seq": 150,
    "checkpoint_ts": "2026-02-20T14:30:00Z",
    "wal_dir_bytes": 49152
  },
  "dirs": {
    "base": "/var/lib/tidaldb",
    "wal": "/var/lib/tidaldb/wal",
    "items": "/var/lib/tidaldb/items",
    "users": "/var/lib/tidaldb/users",
    "creators": "/var/lib/tidaldb/creators",
    "cache": "/var/lib/tidaldb/cache"
  }
 }
 ```
 **How each field is computed:**
 | Field | Source | Notes |
 |-------|--------|-------|
 | `version` | Compiled into binary via `env!("CARGO_PKG_VERSION")` | Always available |
 | `build_hash` | Compiled via `option_env!("GIT_HASH")` or build script | Falls back to `"unknown"` |
 | `status` | `"ok"` if dir exists, has wal subdir, and at least one segment | `"empty"` if no WAL segments, `"error"` if dir missing |
 | `storage_mode` | Inferred: if WAL dir exists with segments, `"persistent"` | No way to know `ephemeral` from disk -- ephemeral leaves no trace |
 | `wal.segments` | `segment::list_segments(&wal_dir)?.len()` | Already implemented in `tidal/src/wal/segment.rs` |
 | `wal.first_seq` | First element of `list_segments()` result | 0 if empty |
 | `wal.last_segment_seq` | Last element of `list_segments()` result | 0 if empty |
 | `wal.checkpoint_seq` | `CheckpointManager::read(&wal_dir)?` | null if no checkpoint file |
 | `wal.checkpoint_ts` | Same -- the `ts` field, formatted as ISO 8601 | null if no checkpoint |
 | `wal.wal_dir_bytes` | Sum of file sizes in WAL dir | Filesystem stat |
 | `dirs.*` | `Paths::new(base)` expanded | Existence checked per dir |
 **No config file is written.** tidalctl does not need `TidalDb::open()` to write a `.tidaldb.json` config snapshot. The CLI reports what it can observe on the filesystem. The config is a runtime concept -- it exists in memory while the process runs and is not persisted. This is correct for M0. If future milestones need a config file for operational tooling, that is a separate decision.
 **No live process query.** tidalctl reads disk. It does not connect to a running process. No Unix socket, no HTTP, no PID file. This is the right model for an embeddable library.
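The `status` rule from the field table can be sketched as a pure filesystem check. The code below is a stand-in, not the real `segment::list_segments` from `tidal/src/wal/segment.rs`: the function names are hypothetical, and treating a missing `wal` subdirectory as `"empty"` is an assumption, since the table only pins down the three outcomes in general terms.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Parse a `wal-{seq:020}.seg` file name into its sequence number.
pub fn parse_segment_seq(name: &str) -> Option<u64> {
    let digits = name.strip_prefix("wal-")?.strip_suffix(".seg")?;
    if digits.len() == 20 && digits.bytes().all(|b| b.is_ascii_digit()) {
        digits.parse().ok()
    } else {
        None
    }
}

/// List segment sequence numbers in ascending order, ignoring other files
/// (for example `checkpoint.meta`).
pub fn list_segment_seqs(wal_dir: &Path) -> io::Result<Vec<u64>> {
    let mut seqs = Vec::new();
    for entry in fs::read_dir(wal_dir)? {
        let entry = entry?;
        if let Some(seq) = entry.file_name().to_str().and_then(parse_segment_seq) {
            seqs.push(seq);
        }
    }
    seqs.sort_unstable();
    Ok(seqs)
}

/// "error" if the home dir is missing or unreadable, "empty" if there are
/// no WAL segments, "ok" otherwise.
pub fn status(base: &Path) -> &'static str {
    if !base.is_dir() {
        return "error";
    }
    let wal_dir = base.join("wal");
    if !wal_dir.is_dir() {
        return "empty"; // assumption: no wal subdir means no segments
    }
    match list_segment_seqs(&wal_dir) {
        Ok(seqs) if !seqs.is_empty() => "ok",
        Ok(_) => "empty",
        Err(_) => "error",
    }
}
```

Under this sketch, `wal.first_seq` and `wal.last_segment_seq` are the first and last elements of the sorted list returned by `list_segment_seqs`.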
 #### `tidalctl paths --path <dir>`
 Prints the resolved directory layout:
 ```json
 {
  "base": "/var/lib/tidaldb",
  "wal": "/var/lib/tidaldb/wal",
  "items": "/var/lib/tidaldb/items",
  "users": "/var/lib/tidaldb/users",
  "creators": "/var/lib/tidaldb/creators",
  "cache": "/var/lib/tidaldb/cache",
  "exists": {
    "base": true,
    "wal": true,
    "items": true,
    "users": false,
    "creators": false,
    "cache": false
  }
 }
 ```
 This uses `Paths::new(dir)` -- the same path helper from m0p1. No duplication.
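For orientation, a layout helper of this kind might look like the sketch below. The six directory names match the JSON above; the struct fields and the `existence()` method are assumptions for this example and may not match the real `Paths` type from m0p1.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch of the layout helper: every path derives from `base`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Paths {
    pub base: PathBuf,
    pub wal: PathBuf,
    pub items: PathBuf,
    pub users: PathBuf,
    pub creators: PathBuf,
    pub cache: PathBuf,
}

impl Paths {
    pub fn new(base: impl AsRef<Path>) -> Self {
        let base = base.as_ref().to_path_buf();
        Self {
            wal: base.join("wal"),
            items: base.join("items"),
            users: base.join("users"),
            creators: base.join("creators"),
            cache: base.join("cache"),
            base,
        }
    }

    /// `(name, path, exists)` triples in the order used by the JSON report.
    pub fn existence(&self) -> Vec<(&'static str, &Path, bool)> {
        let dirs: [(&'static str, &PathBuf); 6] = [
            ("base", &self.base),
            ("wal", &self.wal),
            ("items", &self.items),
            ("users", &self.users),
            ("creators", &self.creators),
            ("cache", &self.cache),
        ];
        dirs.iter()
            .map(|&(name, path)| (name, path.as_path(), path.is_dir()))
            .collect()
    }
}
```

Because the CLI and the library would both call `Paths::new`, a change to the layout happens in one place and the `exists` map is computed from the same derived paths.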
 #### Common Flags
 - `--path <dir>` (required): the tidalDB home directory
 - `--pretty` (optional): pretty-print JSON output (default: compact)
 - `--format json|text` (optional, default `json`): `text` prints human-friendly tabular output
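The three common flags are small enough to parse by hand. The sketch below uses only the standard library so it stays self-contained; it is not the `clap` setup chosen later in this document, and `CommonFlags`, `Format`, and `parse_flags` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Format {
    Json,
    Text,
}

#[derive(Debug, PartialEq, Eq)]
pub struct CommonFlags {
    pub path: String,
    pub pretty: bool,
    pub format: Format,
}

/// Parse the flags that follow the subcommand, e.g. `["--path", "/tmp/db"]`.
pub fn parse_flags(args: &[&str]) -> Result<CommonFlags, String> {
    let mut path = None;
    let mut pretty = false;
    let mut format = Format::Json; // documented default
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        match *arg {
            "--path" => {
                let value = iter.next().ok_or("--path requires a value")?;
                path = Some(value.to_string());
            }
            "--pretty" => pretty = true,
            "--format" => {
                format = match iter.next().copied() {
                    Some("json") => Format::Json,
                    Some("text") => Format::Text,
                    other => return Err(format!("invalid --format value: {:?}", other)),
                };
            }
            other => return Err(format!("unknown flag: {}", other)),
        }
    }
    // `--path` is the only required flag.
    let path = path.ok_or_else(|| "--path is required".to_string())?;
    Ok(CommonFlags { path, pretty, format })
}
```

A caller could collect `std::env::args().skip(2)` into a `Vec<String>`, borrow the elements as `&str`, and pass the slice to `parse_flags`.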
 ### What tidalctl Does NOT Do at M0
 - No `tidalctl init` (creating a fresh tidalDB home) -- the library creates dirs on open
 - No `tidalctl repair` (WAL repair) -- crash recovery is automatic in `WalHandle::open()`
 - No `tidalctl compact` (storage compaction) -- no compaction exists yet
 - No `tidalctl dump` (WAL event dump) -- useful but not needed for the m0p2 UAT
 - No live process communication of any kind
 ---
 ## 2. Metrics Scope at M0
 ### The Problem with the Original Task
 The original task says: "Integration test hits `/metrics` and asserts counters increment when WAL appends."
 At M0, the `TidalDb` public API has no WAL write path. `WalHandle::append()` exists but is not wired to `TidalDb`. There are no signal writes from the public API. A test that asserts "counters increment when WAL appends" cannot be written without either (a) using WAL internals directly or (b) waiting for m1p5.
 ### Corrected Scope
 The metrics surface at M0 serves one purpose: **prove the plumbing works so later milestones can add counters without redesigning the metrics layer.** The counters themselves are scaffolding. The architecture is the deliverable.
 ### Endpoints
 #### `GET /healthz`
 ```json
 {
  "status": "ok",
  "uptime_seconds": 127.3,
  "version": "0.1.0",
  "build_hash": "29400d4"
 }
 ```
 #### `GET /metrics`
 Prometheus text exposition format:
 ```
 # HELP tidaldb_uptime_seconds Seconds since database opened.
 # TYPE tidaldb_uptime_seconds gauge
 tidaldb_uptime_seconds{partition_id="0"} 127.3
 # HELP tidaldb_health_ok Whether the database is healthy. 1 = ok, 0 = degraded.
 # TYPE tidaldb_health_ok gauge
 tidaldb_health_ok{partition_id="0"} 1
 # HELP tidaldb_info Build and version information.
 # TYPE tidaldb_info gauge
 tidaldb_info{version="0.1.0",build_hash="29400d4",partition_id="0"} 1
 ```
 ### Exact Counters at M0
 | Metric | Type | Source | Note |
 |---------|------|--------|------|
 | `tidaldb_uptime_seconds` | Gauge | `Instant::now() - opened_at` | Computed on read |
 | `tidaldb_health_ok` | Gauge | `health_check().is_ok() as u8` | 1 or 0 |
 | `tidaldb_info` | Gauge (info-pattern) | Build constants | Static, always 1 |
 That is the complete set. Three metrics. No WAL counters, no signal counters, no storage counters. Those arrive in m1p5 when the WAL is wired to the public API.
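Rendering these three gauges needs nothing beyond shared atomics and string formatting. The sketch below assumes a `MetricsState` holding the open time, a health flag, and build constants; the field names and the `render_prometheus` method are illustrative, not the actual implementation.

```rust
use std::fmt::Write;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Instant;

/// Hypothetical shared state read by the metrics endpoint.
pub struct MetricsState {
    opened_at: Instant,
    health_ok: AtomicBool,
    version: &'static str,
    build_hash: &'static str,
}

impl MetricsState {
    pub fn new(version: &'static str, build_hash: &'static str) -> Self {
        Self {
            opened_at: Instant::now(),
            health_ok: AtomicBool::new(true),
            version,
            build_hash,
        }
    }

    pub fn set_health(&self, ok: bool) {
        self.health_ok.store(ok, Ordering::Relaxed);
    }

    /// Render the three M0 gauges in Prometheus text exposition format.
    pub fn render_prometheus(&self) -> String {
        let uptime = self.opened_at.elapsed().as_secs_f64(); // computed on read
        let health = u8::from(self.health_ok.load(Ordering::Relaxed));
        let mut out = String::new();
        // Writing into a String cannot fail, so the results are ignored.
        let _ = writeln!(out, "# HELP tidaldb_uptime_seconds Seconds since database opened.");
        let _ = writeln!(out, "# TYPE tidaldb_uptime_seconds gauge");
        let _ = writeln!(out, "tidaldb_uptime_seconds{{partition_id=\"0\"}} {:.1}", uptime);
        let _ = writeln!(out, "# HELP tidaldb_health_ok Whether the database is healthy. 1 = ok, 0 = degraded.");
        let _ = writeln!(out, "# TYPE tidaldb_health_ok gauge");
        let _ = writeln!(out, "tidaldb_health_ok{{partition_id=\"0\"}} {}", health);
        let _ = writeln!(out, "# HELP tidaldb_info Build and version information.");
        let _ = writeln!(out, "# TYPE tidaldb_info gauge");
        let _ = writeln!(
            out,
            "tidaldb_info{{version=\"{}\",build_hash=\"{}\",partition_id=\"0\"}} 1",
            self.version, self.build_hash
        );
        out
    }
}
```

Adding a deferred metric later would mean one more atomic field and three more `writeln!` calls, which is the sense in which the plumbing is the deliverable.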
 ### What the Integration Test Verifies
 The integration test at M0 verifies:
 1. `TidalDb::builder().ephemeral().enable_metrics("127.0.0.1:0").open()` succeeds (port 0 = OS assigns)
 2. `GET /healthz` returns 200 with `status: "ok"` and `uptime_seconds > 0`
 3. `GET /metrics` returns 200 with valid Prometheus text format
 4. `tidaldb_uptime_seconds` increases between two reads separated by a sleep
 5. `tidaldb_health_ok` is 1
 6. `db.close()` stops the metrics server cleanly (no leaked threads, no port still bound)
 No WAL assertions. No signal assertions. The test proves the HTTP server starts, serves correct responses, and shuts down cleanly.
 ### What Is Deferred
 | Counter | Deferred To | Why |
 |---------|-------------|-----|
 | `tidaldb_wal_seq` | m1p5 | WAL not wired to public API yet |
 | `tidaldb_wal_segments` | m1p5 | Same |
 | `tidaldb_wal_bytes_total` | m1p5 | Same |
 | `tidaldb_signal_writes_total` | m1p5 | `db.signal()` does not exist yet |
 | `tidaldb_signal_read_latency` | m1p5 | Signal reads do not exist yet |
 | `tidaldb_query_latency` | m2p5 | Query executor does not exist yet |
 | `tidaldb_query_count` | m2p5 | Same |
 ---
 ## 3. HTTP Approach: Sync (Option A)
 **Chosen: (a) Sync HTTP via `tiny_http` in a background thread.**
 Rationale:
 1. **Minimal deps is an explicit tidalDB requirement.** Tokio is 200+ transitive dependencies. `tiny_http` is 5. For an embeddable library, dependency weight matters -- every dep is a compile-time cost and an audit surface for every user.
 2. **The metrics endpoint does ~2 requests per scrape interval.** This is not a high-throughput server. A single-threaded sync HTTP listener on a background thread handles thousands of req/s. Prometheus scrapes every 15-30s. `tiny_http` handles this with zero contention.
 3. **No Tokio runtime conflict.** If the host application uses Tokio (likely for an Axum/Actix service), embedding a second Tokio runtime inside tidalDB creates footguns: nested `block_on`, unexpected thread pools, panic behavior. A background `std::thread` with sync HTTP avoids all of this.
 4. **The "Future implementor" spec is wrong for M0.** The original task assumed tidalDB would share the host's async runtime. That is a leaky abstraction. An embeddable library should not assume or require any particular async runtime. A background thread with sync HTTP is the correct primitive.
 5. **A feature flag for choosing the transport is premature.** Option (c) with feature flags adds compile-time complexity for a surface that serves 3 metrics. Ship sync now. If M7 (Production Hardening) needs async HTTP for high-frequency scraping, add it then. The internal `MetricsRegistry` / counter abstraction is the same either way -- only the HTTP transport changes. (The opt-in `metrics` feature described under Dependency Addition gates the whole endpoint and is a separate concern.)
 ### Implementation Shape
 ```rust
 // Builder API
 let db = TidalDb::builder()
    .ephemeral()
    .enable_metrics("127.0.0.1:9090")  // Starts background thread
    .open()?;
 // Internal: spawns std::thread with tiny_http::Server
 // Thread reads from Arc<MetricsState> (uptime, health_ok, build_info)
 // Thread exits cleanly when TidalDb::close() sets a shutdown flag
 ```
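 The thread lifecycle described in the comments above can be sketched with the standard library alone. This is a hedged illustration, not the tidalDB implementation: it swaps `tiny_http` for a bare `std::net::TcpListener` so the block has no dependencies, and `ServerState`, `spawn_metrics`, `respond`, and `http_get` are hypothetical names introduced only for this example.
 ```rust
 use std::io::{Read, Write};
 use std::net::{SocketAddr, TcpListener, TcpStream};
 use std::sync::atomic::{AtomicBool, Ordering};
 use std::sync::Arc;
 use std::thread::{self, JoinHandle};
 use std::time::{Duration, Instant};
 // Hypothetical shared state; the field names are assumptions of this sketch.
 pub struct ServerState {
     opened_at: Instant,
     shutdown: AtomicBool,
 }
 // Binds `addr` and serves /healthz on a background thread until the shutdown
 // flag is set. Returns the bound address, which matters when the port is 0.
 pub fn spawn_metrics(addr: &str, state: Arc<ServerState>) -> std::io::Result<(SocketAddr, JoinHandle<()>)> {
     let listener = TcpListener::bind(addr)?;
     listener.set_nonblocking(true)?; // lets the loop poll the shutdown flag
     let local = listener.local_addr()?;
     let handle = thread::spawn(move || {
         while !state.shutdown.load(Ordering::Acquire) {
             match listener.accept() {
                 Ok((stream, _)) => { let _ = respond(stream, &state); }
                 // WouldBlock (no pending connection) or a transient error: back off briefly.
                 Err(_) => thread::sleep(Duration::from_millis(10)),
             }
         }
         // The listener is dropped here, which closes the port.
     });
     Ok((local, handle))
 }
 // Reads one request and writes a minimal HTTP/1.1 response, then closes.
 fn respond(mut stream: TcpStream, state: &ServerState) -> std::io::Result<()> {
     stream.set_nonblocking(false)?; // accepted sockets may inherit non-blocking mode
     stream.set_read_timeout(Some(Duration::from_secs(1)))?;
     let mut buf = [0u8; 1024];
     let n = stream.read(&mut buf)?;
     let request = String::from_utf8_lossy(&buf[..n]);
     let (status, body) = if request.starts_with("GET /healthz ") {
         let uptime = state.opened_at.elapsed().as_secs_f64();
         ("200 OK", format!("{{\"status\":\"ok\",\"uptime_seconds\":{uptime:.3}}}"))
     } else {
         ("404 Not Found", String::from("{\"error\":\"not found\"}"))
     };
     write!(stream, "HTTP/1.1 {status}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}", body.len())
 }
 // Tiny blocking client used to exercise the endpoint.
 fn http_get(addr: SocketAddr, path: &str) -> std::io::Result<String> {
     let mut stream = TcpStream::connect(addr)?;
     write!(stream, "GET {path} HTTP/1.1\r\nHost: localhost\r\n\r\n")?;
     let mut out = String::new();
     stream.read_to_string(&mut out)?;
     Ok(out)
 }
 fn main() -> std::io::Result<()> {
     let state = Arc::new(ServerState { opened_at: Instant::now(), shutdown: AtomicBool::new(false) });
     let (addr, handle) = spawn_metrics("127.0.0.1:0", Arc::clone(&state))?;
     println!("{}", http_get(addr, "/healthz")?);
     state.shutdown.store(true, Ordering::Release); // what close() would do
     handle.join().expect("metrics thread panicked");
     Ok(())
 }
 ```
 Polling a non-blocking listener keeps shutdown simple at the cost of a short sleep per idle iteration, which is acceptable at Prometheus scrape rates. Joining the thread after setting the flag gives `close()` a deterministic point at which the port is released.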
 ### Dependency Addition
 ```toml
 # In tidal/Cargo.toml, behind a feature flag:
 [features]
 metrics = ["dep:tiny_http"]
 [dependencies]
 tiny_http = { version = "0.12", optional = true }
 ```
 The `metrics` feature is opt-in. Users who do not need the HTTP endpoint pay zero compile cost. The `MetricsState` struct (atomic counters) exists unconditionally -- only the HTTP server is gated.
 ---
 ## 4. Workspace Structure: Workspace with Separate Binary Crate
 **Confirmed: workspace layout.**
 ### Structure
 ```
 tidalDB/
  Cargo.toml              # [workspace] members = ["tidal", "tidalctl"]
  tidal/
    Cargo.toml            # [package] name = "tidaldb" (the library)
    src/
  tidalctl/
    Cargo.toml            # [package] name = "tidalctl" (the binary)
    src/
      main.rs
 ```
 ### Why Workspace, Not `[[bin]]`
 1. **Separate dependency trees.** tidalctl needs `clap` for argument parsing. The tidaldb library should not carry `clap` as a dependency -- embeddable libraries do not parse CLI arguments. A `[[bin]]` inside `tidal/` would either make `clap` unconditional or require a feature flag, both of which pollute the library.
 2. **Independent versioning path.** tidalctl may be versioned independently of tidaldb. The CLI is a companion tool, not part of the library API surface.
 3. **`cargo install tidalctl` works naturally.** Users install the CLI separately from embedding the library. As its own binary crate in the workspace, tidalctl can be installed with `cargo install --path tidalctl` without pulling CLI concerns into the library.
 4. **Shared dependencies via workspace.** `tidalctl` depends on `tidaldb` (for `Paths`, `WalConfig`, segment parsing, checkpoint reading). Workspace members share one `Cargo.lock` and one `target/` directory, so both crates resolve the same dependency versions and reuse the same compiled artifacts.
 ### tidalctl Dependencies
 ```toml
 [package]
 name = "tidalctl"
 version = "0.1.0"
 edition = "2024"
 [dependencies]
 tidaldb = { path = "../tidal" }
 clap = { version = "4", features = ["derive"] }
 serde = { version = "1", features = ["derive"] }
 serde_json = "1"
 ```
 ### What This Means for Pre-Commit Hooks and CI
 The root `Cargo.toml` becomes the workspace root. All `cargo` commands (`fmt`, `clippy`, `test`) need to run from the workspace root, or with `--workspace` (`--all` for `cargo fmt`). The pre-commit hook currently uses `--manifest-path tidal/Cargo.toml` -- this must be updated to use the workspace root.
 ---
 ## 5. Deferred Items
 ### Explicitly NOT in m0p2
 | Item | Why Deferred | Arrives In |
 |------|-------------|------------|
 | Config serialization to disk (`.tidaldb.json`) | tidalctl inspects filesystem artifacts, not config files. Config is a runtime concept. | Revisit in M7 if operational tooling needs it |
 | `tidalctl init` command | Library creates dirs on open. A separate init command is redundant. | Possibly never |
 | `tidalctl repair` command | Crash recovery is automatic in `WalHandle::open()`. Manual repair is a production concern. | M7 |
 | `tidalctl dump` (WAL event dump) | Useful for debugging but not required for m0p2 UAT | M1 or M2 when developers need to debug signal event streams |
 | WAL counters in metrics | WAL not wired to public API yet | m1p5 |
 | Signal counters in metrics | `db.signal()` does not exist yet | m1p5 |
 | Query counters in metrics | Query executor does not exist yet | m2p5 |
 | Async HTTP for metrics | Sync HTTP is sufficient for Prometheus scraping | M7 if needed |
 | `tidalctl` connecting to live process | Embeddable library has no server process | Possibly never |
 | Serde on `Config` | tidalctl does not read a config file. Config serde is needed only if we write a config file, which is deferred. | When needed |
 ---
 ## 6. Acceptance Criteria
 ### Task 1: tidalctl CLI
 - [ ] **AC-1:** `tidalctl status --path <dir>` against a directory with WAL segments and checkpoint outputs valid JSON containing `version`, `wal.segments`, `wal.checkpoint_seq`, and `dirs.base`
 - [ ] **AC-2:** `tidalctl status --path <dir>` against an empty directory (no WAL, no segments) outputs JSON with `status: "empty"` and `wal.segments: 0`
 - [ ] **AC-3:** `tidalctl status --path /nonexistent` exits with non-zero status and prints a JSON error object to stderr
 - [ ] **AC-4:** `tidalctl paths --path <dir>` outputs JSON with all six directory paths and existence flags matching actual filesystem state
 - [ ] **AC-5:** `--pretty` flag produces indented JSON; absence produces compact JSON
 - [ ] **AC-6:** `cargo test -p tidalctl` passes with tests for: valid home, empty home, missing home, pretty flag, paths command
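 As a rough illustration of the output shape AC-4 asks for, the sketch below reports six directories with existence flags and renders compact JSON by hand. It is not the `Paths` type from the tidaldb crate: `dir_report`, `to_json`, and `esc` are hypothetical helpers, the subdirectory names come from the commit summary, and the `<base>/<name>` layout is an assumption.
 ```rust
 use std::fs;
 use std::path::{Path, PathBuf};
 // Subdirectory names as listed in the commit summary; treating each one as
 // `<base>/<name>` is an assumption of this sketch.
 const SUBDIRS: [&str; 5] = ["wal", "items", "users", "creators", "cache"];
 // Returns (name, path, exists) for the base directory plus five subdirectories.
 fn dir_report(base: &Path) -> Vec<(String, PathBuf, bool)> {
     let mut rows = vec![("base".to_string(), base.to_path_buf(), base.is_dir())];
     for name in SUBDIRS {
         let p = base.join(name);
         let exists = p.is_dir();
         rows.push((name.to_string(), p, exists));
     }
     rows
 }
 // Minimal JSON string escaping: backslashes and double quotes only.
 fn esc(s: &str) -> String {
     s.replace('\\', "\\\\").replace('"', "\\\"")
 }
 // Renders {"dirs":{...},"exists":{...}} without any serialization crate.
 fn to_json(rows: &[(String, PathBuf, bool)]) -> String {
     let dirs: Vec<String> = rows.iter()
         .map(|(n, p, _)| format!("\"{}\":\"{}\"", n, esc(&p.to_string_lossy())))
         .collect();
     let exists: Vec<String> = rows.iter()
         .map(|(n, _, e)| format!("\"{n}\":{e}"))
         .collect();
     format!("{{\"dirs\":{{{}}},\"exists\":{{{}}}}}", dirs.join(","), exists.join(","))
 }
 fn main() -> std::io::Result<()> {
     // Create a throwaway home with only the wal directory present.
     let base = std::env::temp_dir().join(format!("tidal-sketch-{}", std::process::id()));
     fs::create_dir_all(base.join("wal"))?;
     println!("{}", to_json(&dir_report(&base)));
     fs::remove_dir_all(&base)
 }
 ```
 Checking existence at report time, rather than caching it, keeps the flags aligned with the actual filesystem state, which is what AC-4 verifies.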
 ### Task 2: Metrics Surface
 - [ ] **AC-7:** `TidalDb::builder().ephemeral().enable_metrics("127.0.0.1:0").open()` starts a background HTTP thread bound to an OS-assigned port
 - [ ] **AC-8:** `GET /healthz` returns HTTP 200 with JSON containing `status: "ok"` and `uptime_seconds > 0`
 - [ ] **AC-9:** `GET /metrics` returns HTTP 200 with valid Prometheus text format containing `tidaldb_uptime_seconds`, `tidaldb_health_ok`, and `tidaldb_info`
 - [ ] **AC-10:** `tidaldb_uptime_seconds` increases monotonically between reads (verified by sleeping 100ms between two fetches)
 - [ ] **AC-11:** `TidalDb::close()` stops the metrics HTTP thread; subsequent connection attempts to the port are refused
 - [ ] **AC-12:** Building `tidaldb` without the `metrics` feature flag compiles successfully with no `tiny_http` dependency; `enable_metrics()` is either absent or fails to compile with an error that guides the user to enable the feature
 ---
 ## 7. UAT Scenario
 ### Given
 ```
 A developer has:
  - Built the workspace: `cargo build --workspace`
  - Created a persistent tidalDB instance that wrote WAL segments:
      let home = TempTidalHome::new()?;
      let paths = home.paths();
      paths.ensure_all()?;
      let wal_config = WalConfig { dir: home.path().to_path_buf(), ..Default::default() };
      let (wal, _) = WalHandle::open(wal_config)?;
      wal.append(event_1)?;
      wal.append(event_2)?;
      wal.checkpoint(2)?;
      wal.shutdown()?;
  - Opened a TidalDb with metrics enabled:
      let db = TidalDb::builder()
          .ephemeral()
          .enable_metrics("127.0.0.1:0")
          .open()?;
 ```
 ### When
 ```
 1. Run: tidalctl status --path <home.path()>
 2. Run: tidalctl paths --path <home.path()>
 3. HTTP GET /healthz on the metrics port
 4. HTTP GET /metrics on the metrics port
 5. Sleep 200ms
 6. HTTP GET /metrics again
 7. db.close()
 8. Attempt HTTP GET /healthz on the metrics port
 ```
 ### Then
 ```
 Step 1: JSON output with wal.segments >= 1, wal.checkpoint_seq == 2,
        status == "ok", version matches Cargo.toml
 Step 2: JSON output with dirs.wal == "<home>/wal", exists.wal == true
 Step 3: HTTP 200, body contains "status":"ok", uptime_seconds > 0
 Step 4: HTTP 200, body contains tidaldb_uptime_seconds,
        tidaldb_health_ok 1, tidaldb_info{version="0.1.0"...} 1
 Step 5: (sleep)
 Step 6: tidaldb_uptime_seconds > value from step 4
 Step 7: close() returns Ok(())
 Step 8: Connection refused (metrics server stopped)
 ```
 ### Pass/Fail Gate
 m0p2 is **done** when:
 - `cargo test -p tidalctl` passes
 - `cargo test -p tidaldb --features metrics` passes (metrics integration tests)
 - `cargo build --workspace` succeeds and `cargo clippy --workspace -- -D warnings` reports no warnings
 - All 12 acceptance criteria above are verified by automated tests
 - tidalctl uses `Paths` from the tidaldb crate (no duplicated layout logic)
 ---
 ## Implementation Notes
 ### Build Hash
 Use a build script (`tidal/build.rs`), or have CI set a `GIT_HASH` environment variable and read it with `option_env!("GIT_HASH")`. For local builds, fall back to `"dev"`. Both tidalctl and the metrics endpoint use the same constant.
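 A minimal sketch of the environment-variable variant, assuming the constant is named `BUILD_HASH` (the name used in the commit summary). `option_env!` is evaluated at compile time, so the fallback costs nothing at runtime.
 ```rust
 // `Some(..)` when GIT_HASH is set in the build environment (for example by
 // CI), `None` for local builds, in which case the constant is "dev".
 pub const BUILD_HASH: &str = match option_env!("GIT_HASH") {
     Some(hash) => hash,
     None => "dev",
 };
 fn main() {
     println!("build hash: {BUILD_HASH}");
 }
 ```
 A `match` is used rather than a method call so the expression stays valid in a `const` context.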
 ### Metrics State Sharing
 ```rust
 pub(crate) struct MetricsState {
    opened_at: Instant,
    health_ok: AtomicBool,
    // Future milestones add: wal_seq: AtomicU64, signal_writes: AtomicU64, etc.
 }
 ```
 This struct is `Arc`-shared between `TidalDb` and the metrics HTTP thread. Adding new counters in future milestones is a one-line addition to this struct plus a one-line addition to the Prometheus renderer. The plumbing is paid for once in m0p2.
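 To make the "one-line addition to the Prometheus renderer" concrete, here is a hedged sketch of such a renderer for the three m0p2 metrics. The function name `render_prometheus`, its parameters, the help strings, and the `build_hash` label are assumptions; only the metric names and the struct fields come from this document.
 ```rust
 use std::fmt::Write;
 use std::sync::atomic::{AtomicBool, Ordering};
 use std::time::Instant;
 // Mirrors the struct shown above; visibility is widened only for the sketch.
 pub struct MetricsState {
     opened_at: Instant,
     health_ok: AtomicBool,
 }
 // Renders the metrics in Prometheus text exposition format.
 // `version` and `build_hash` are label values supplied by the caller.
 pub fn render_prometheus(state: &MetricsState, version: &str, build_hash: &str) -> String {
     let mut out = String::new();
     let uptime = state.opened_at.elapsed().as_secs_f64();
     let healthy = u8::from(state.health_ok.load(Ordering::Relaxed));
     // Writing into a String cannot fail, so the results are discarded.
     let _ = writeln!(out, "# HELP tidaldb_uptime_seconds Seconds since the database was opened.");
     let _ = writeln!(out, "# TYPE tidaldb_uptime_seconds gauge");
     let _ = writeln!(out, "tidaldb_uptime_seconds {uptime:.3}");
     let _ = writeln!(out, "# HELP tidaldb_health_ok 1 if the last health check passed, else 0.");
     let _ = writeln!(out, "# TYPE tidaldb_health_ok gauge");
     let _ = writeln!(out, "tidaldb_health_ok {healthy}");
     let _ = writeln!(out, "# HELP tidaldb_info Build information.");
     let _ = writeln!(out, "# TYPE tidaldb_info gauge");
     let _ = writeln!(out, "tidaldb_info{{version=\"{version}\",build_hash=\"{build_hash}\"}} 1");
     // A future counter would add one struct field and one block like the above.
     out
 }
 fn main() {
     let state = MetricsState { opened_at: Instant::now(), health_ok: AtomicBool::new(true) };
     print!("{}", render_prometheus(&state, "0.1.0", "dev"));
 }
 ```
 Because the renderer only reads atomics and an `Instant`, it can run on the metrics thread without taking any lock held by the database.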
 ### tidalctl WAL Inspection
 tidalctl depends on `tidaldb` as a library. It calls:
 - `tidaldb::db::Paths::new(dir)` for path resolution
 - `tidaldb::wal::segment::list_segments(&wal_dir)` for segment enumeration
 - `tidaldb::wal::checkpoint::CheckpointManager::read(&wal_dir)` for checkpoint state
 These are all `pub` functions already. No new internal APIs need to be exposed. The WAL module's public surface is sufficient.
 ### Complexity Estimates
 | Task | Complexity | Rationale |
 |------|-----------|-----------|
 | Workspace setup (root Cargo.toml, pre-commit hook update) | S | Mechanical, no design decisions |
 | tidalctl CLI (clap, status, paths) | M | Two commands, JSON output, error handling, tests |
 | Metrics surface (tiny_http, feature flag, MetricsState, endpoints) | M | Background thread lifecycle, Prometheus format, integration test |
 | Build hash plumbing | S | Build script or env var, shared constant |
 **Total phase complexity: M** (two M tasks + two S tasks, all independent after workspace setup)
--- a/docs/planning/milestone-0/phase-2/task-01-tidalctl.md
+++ b/docs/planning/milestone-0/phase-2/task-01-tidalctl.md
@ -0,0 +1,13 @@
 # Task 01 — `tidalctl` CLI
 **Goal:** ship a tiny, dependency-light binary that inspects an embedded tidalDB home.
 ## Deliverables
 - `tidalctl status --path <dir>` and `tidalctl paths --path <dir>` commands.
 - Shared crate for serialization so CLI can read builder `Config` dumps atomically.
 - Documentation on how to vendor the CLI into host repos (cargo install or binary download).
 ## Acceptance Criteria
 - Works against both temp homes and long-lived dirs.
 - Output is JSON by default with `--pretty` option.
 - `cargo test -p tidalctl` exercises happy/error paths.
--- a/docs/planning/milestone-0/phase-2/task-02-metrics-surface.md
+++ b/docs/planning/milestone-0/phase-2/task-02-metrics-surface.md
@ -0,0 +1,13 @@
 # Task 02 — Metrics / Health Surface
 **Goal:** expose runtime stats (uptime, WAL lag, open handles) without requiring full observability stack.
 ## Deliverables
 - Optional HTTP endpoint (disabled by default) serving `/metrics` (Prometheus text) and `/healthz` (JSON).
 - Builder flag `enable_metrics(addr)` + `with_metrics(false)` toggle for tests.
 - Metrics counters tagged with `process`, `partition_id`, `build_hash` for future distributed rollouts.
 ## Acceptance Criteria
 - Endpoint can run on the same Tokio runtime as host service (returns `Future` implementor).
 - Minimal deps (hyper/tiny_http) to keep embeddable footprint light.
 - Integration test hits `/metrics` and asserts counters increment when WAL appends.
--- a/docs/planning/milestone-0/phase-3/OVERVIEW.md
+++ b/docs/planning/milestone-0/phase-3/OVERVIEW.md
@ -0,0 +1,12 @@
 # Milestone 0 · Phase 3 — Samples & Docs
 **Objective:** provide living examples so every change to the builder/tooling is exercised in CI. This phase adds doctest-backed quickstarts plus integration tests wiring tidalDB into typical Rust stacks.
 **Success criteria**
 - Quick-start snippet (For You microdemo) lives in `/examples` and is referenced from docs/site; the example is built via `cargo test --examples` and the docs copy is compiled via `cargo test --doc`.
 - Axum/Actix embedding examples show graceful shutdown and metrics wiring.
 - CONTRIBUTING section updated with “run the samples” checklist.
 **Dependencies:** Phase 1 + 2 (uses builder + CLI output).
 **Unblocks:** Developer onboarding, future blog posts, automated regression tests.
--- a/docs/planning/milestone-0/phase-3/task-01-quickstart-and-doctests.md
+++ b/docs/planning/milestone-0/phase-3/task-01-quickstart-and-doctests.md
@ -0,0 +1,12 @@
 # Task 01 — Quickstart + Doctests
 **Goal:** demonstrate embedding tidalDB in <20 LOC and keep it compiling forever.
 ## Deliverables
 - `examples/quickstart.rs` referenced from README + VISION.
 - Corresponding snippet in docs that uses Rust fenced code block with `no_run` + doctest harness.
 - CI hook (`cargo test --doc` plus `cargo test --examples`, since Cargo does not allow `--doc` to be combined with other target selectors) wired into main workflow.
 ## Acceptance Criteria
 - Quickstart writes a single signal and reads it back once M1 lands, but for now asserts builder + health_check.
 - Fails CI if builder API changes without updating docs.
--- a/docs/planning/milestone-0/phase-3/task-02-embedding-guides.md
+++ b/docs/planning/milestone-0/phase-3/task-02-embedding-guides.md
@ -0,0 +1,14 @@
 # Task 02 — Embedding Guides (Axum/Actix/CLI)
 **Goal:** show how to wire tidalDB into real hosts so customers don’t guess.
 ## Deliverables
 - Axum guide: builder in `State`, graceful shutdown via `with_graceful_shutdown`.
 - Actix guide: `Data<TidalDb>` example + metric endpoint mounting.
 - CLI embedding: minimal binary using builder + tidalctl for smoke test.
 - Docs page (mdx) referencing all three.
 ## Acceptance Criteria
 - Each guide lives under `/examples` with README instructions.
 - Guides double as integration tests (run during CI).
 - Includes “cleanup” guidance referencing TempTidalHome helper.
--- a/docs/planning/milestone-p/OVERVIEW.md
+++ b/docs/planning/milestone-p/OVERVIEW.md
@ -0,0 +1,20 @@
 # Milestone P Overview
 This directory contains the execution plan for the product track defined in `docs/planning/PRODUCT_ROADMAP.md`.
 ## Phase Map
 - `phase-1/` -> P0 Beachhead Validation
 - `phase-2/` -> P1 Concierge Alpha
 - `PG1 gate` -> Personalization Core Done (must pass before P2)
 - `phase-3/` -> P2 Productized Beta
 - `phase-4/` -> P3 Public Launch
 - `phase-5/` -> P4 Scale + Revenue Fit
 ## Usage
 Each phase follows the same pattern:
 1. Read `OVERVIEW.md` for objective, success criteria, and task dependency.
 2. Execute tasks in numbered order.
 3. Report completion back to `PRODUCT_ROADMAP.md` and `ROADMAP.md` status tables.
--- a/docs/planning/milestone-p/phase-1/OVERVIEW.md
+++ b/docs/planning/milestone-p/phase-1/OVERVIEW.md
@ -0,0 +1,25 @@
 # Milestone P · Phase 1 — P0 Beachhead Validation
 **Objective:** prove that the personal briefing concept is valuable enough for repeated use among target users.
 ## Success Criteria
 - 20-50 target users onboarded.
 - Two-week concierge pilot completed.
 - Median user performs >= 1 explicit feedback action per session.
 - D2 and D7 retention measured and reviewed.
 - Qualitative interviews confirm stronger perceived value vs baseline feed habits.
 **Dependencies:** M0 complete, M1 partial (signal write/read path sufficient for prototype adaptation).
 **Blocked by:** final target segment and recruitment criteria.
 **Unblocks:** P1 Concierge Alpha build.
 ## Task Index
 | # | Task | Delivers | Depends On | Complexity |
 |---|------|----------|------------|------------|
 | 01 | Target Segment and Recruitment | persona definition, recruitment script, candidate pool | None | S |
 | 02 | Concierge Pilot Loop | daily briefing workflow, manual QA process, interview cadence | Task 01 | M |
 | 03 | Validation Readout | retention and interview analysis, go/no-go decision | Task 02 | S |
--- a/docs/planning/milestone-p/phase-1/task-01-target-segment-and-recruitment.md
+++ b/docs/planning/milestone-p/phase-1/task-01-target-segment-and-recruitment.md
@ -0,0 +1,16 @@
 # P0 Task 01: Target Segment and Recruitment
 ## Deliverable
 A clear target segment definition and a recruited pilot cohort of 20-50 users for the beachhead validation phase.
 ## Acceptance Criteria
 - [ ] Primary segment defined (knowledge-worker profile + consumer profile).
 - [ ] Inclusion/exclusion criteria documented.
 - [ ] Recruitment script and consent language prepared.
 - [ ] 20-50 users recruited and scheduled.
 ## Depends On
 - None
--- a/docs/planning/milestone-p/phase-1/task-02-concierge-pilot-loop.md
+++ b/docs/planning/milestone-p/phase-1/task-02-concierge-pilot-loop.md
@ -0,0 +1,16 @@
 # P0 Task 02: Concierge Pilot Loop
 ## Deliverable
 A two-week concierge pilot that sends daily briefings, captures feedback, and records perceived usefulness.
 ## Acceptance Criteria
 - [ ] Daily briefing sent to each participant.
 - [ ] Feedback controls captured (`more`, `less`, `hide`, `mute`, `save`).
 - [ ] Weekly interviews executed per participant sample.
 - [ ] Pilot operations documented (timing, QA, issue handling).
 ## Depends On
 - Task 01
--- a/docs/planning/milestone-p/phase-1/task-03-validation-readout.md
+++ b/docs/planning/milestone-p/phase-1/task-03-validation-readout.md
@ -0,0 +1,16 @@
 # P0 Task 03: Validation Readout
 ## Deliverable
 A go/no-go decision memo with quantitative and qualitative outcomes from the pilot.
 ## Acceptance Criteria
 - [ ] D2 and D7 retention reported.
 - [ ] Feedback action rate reported.
 - [ ] Useful-item perception findings summarized.
 - [ ] Clear recommendation: proceed to P1, iterate, or pivot.
 ## Depends On
 - Task 02
--- a/docs/planning/milestone-p/phase-2/OVERVIEW.md
+++ b/docs/planning/milestone-p/phase-2/OVERVIEW.md
@ -0,0 +1,25 @@
 # Milestone P · Phase 2 — P1 Concierge Alpha
 **Objective:** deliver a narrow, high-quality `Today Brief` experience that users find useful and controllable every day.
 ## Success Criteria
 - Daily ranked brief is live for pilot cohort.
 - Reason labels and source links present for each card.
 - Immediate adaptation after negative feedback visible on next refresh.
 - Time-budget mode (`5/10/20`) in active usage.
 - Weekly active usage confirms repeat behavior.
 **Dependencies:** P0 completed, M1 complete, M2 partial.
 **Blocked by:** validated user segment from P0.
 **Unblocks:** P2 Productized Beta.
 ## Task Index
 | # | Task | Delivers | Depends On | Complexity |
 |---|------|----------|------------|------------|
 | 01 | Briefing UX and Reason Labels | card UI spec, reasons taxonomy, source exposure rules | None | M |
 | 02 | Feedback Loop UX | controls and immediate-reflection behavior | Task 01 | M |
 | 03 | Quality and Diversity Baseline | quality gates and diversity constraints in top results | Task 02 | M |
--- a/docs/planning/milestone-p/phase-2/task-01-briefing-ux-and-reason-labels.md
+++ b/docs/planning/milestone-p/phase-2/task-01-briefing-ux-and-reason-labels.md
@ -0,0 +1,15 @@
 # P1 Task 01: Briefing UX and Reason Labels
 ## Deliverable
 Core daily briefing experience with clear reason labels and transparent source access.
 ## Acceptance Criteria
 - [ ] Briefing cards defined with title, summary, source, timestamp, and reason label.
 - [ ] Reason taxonomy documented and constrained.
 - [ ] Source link behavior consistent across cards.
 ## Depends On
 - None
--- a/docs/planning/milestone-p/phase-2/task-02-feedback-loop-ux.md
+++ b/docs/planning/milestone-p/phase-2/task-02-feedback-loop-ux.md
@ -0,0 +1,15 @@
 # P1 Task 02: Feedback Loop UX
 ## Deliverable
 User controls for `more/less/hide/mute/save` and immediate next-refresh adaptation behavior.
 ## Acceptance Criteria
 - [ ] Feedback control semantics documented.
 - [ ] Next-refresh behavior specified for each control.
 - [ ] Negative-feedback persistence rules documented.
 ## Depends On
 - Task 01
--- a/docs/planning/milestone-p/phase-2/task-03-quality-and-diversity-baseline.md
+++ b/docs/planning/milestone-p/phase-2/task-03-quality-and-diversity-baseline.md
@ -0,0 +1,15 @@
 # P1 Task 03: Quality and Diversity Baseline
 ## Deliverable
 Default quality floor and diversity guardrails for top-ranked briefing items.
 ## Acceptance Criteria
 - [ ] Duplicate suppression policy defined.
 - [ ] Source/topic diversity thresholds defined for top 10 items.
 - [ ] Freshness floor defined for time-sensitive domains.
 ## Depends On
 - Task 02
--- a/docs/planning/milestone-p/phase-3/OVERVIEW.md
+++ b/docs/planning/milestone-p/phase-3/OVERVIEW.md
@ -0,0 +1,24 @@
 # Milestone P · Phase 3 — P2 Productized Beta
 **Objective:** make the product self-serve and reliable enough to run without manual curation.
 ## Success Criteria
 - Onboarding under 3 minutes for most users.
 - Cohort layer available and understandable.
 - Explanation and trust controls present by default.
 - D7 retention and useful-item rate exceed baseline comparison feed.
 **Dependencies:** P1 completed, **PG1 Personalization Core Done gate passed**, M2 complete, M3 partial.
 **Blocked by:** stable alpha behavior and instrumented metrics.
 **Unblocks:** P3 Public Launch.
 ## Task Index
 | # | Task | Delivers | Depends On | Complexity |
 |---|------|----------|------------|------------|
 | 01 | Self-Serve Onboarding | onboarding flow, defaults, profile bootstrap | None | M |
 | 02 | Cohort and Context Views | cohort-trending surface and session context mode | Task 01 | M |
 | 03 | Trust Controls and Persistence | mute/hide persistence, quality visibility, transparency UX | Task 02 | M |
--- a/docs/planning/milestone-p/phase-3/task-01-self-serve-onboarding.md
+++ b/docs/planning/milestone-p/phase-3/task-01-self-serve-onboarding.md
@ -0,0 +1,15 @@
 # P2 Task 01: Self-Serve Onboarding
 ## Deliverable
 An onboarding flow that consistently produces a usable briefing in under three minutes.
 ## Acceptance Criteria
 - [ ] Minimal step flow defined and instrumented.
 - [ ] Default profile bootstrapping rules documented.
 - [ ] Onboarding completion and drop-off metrics tracked.
 ## Depends On
 - None
--- a/docs/planning/milestone-p/phase-3/task-02-cohort-and-context-views.md
+++ b/docs/planning/milestone-p/phase-3/task-02-cohort-and-context-views.md
@ -0,0 +1,15 @@
 # P2 Task 02: Cohort and Context Views
 ## Deliverable
 Product surfaces for `trending for people like you` and optional session-scoped context mode.
 ## Acceptance Criteria
 - [ ] Cohort view behavior and labels defined.
 - [ ] Session context mode behavior defined.
 - [ ] Switching between views preserves user trust and clarity.
 ## Depends On
 - Task 01
--- a/docs/planning/milestone-p/phase-3/task-03-trust-controls-and-persistence.md
+++ b/docs/planning/milestone-p/phase-3/task-03-trust-controls-and-persistence.md
@ -0,0 +1,15 @@
 # P2 Task 03: Trust Controls and Persistence
 ## Deliverable
 Persistent user controls and transparency defaults for trustworthy daily use.
 ## Acceptance Criteria
 - [ ] Mute/hide preferences persist across sessions.
 - [ ] Explanation visibility defaults defined.
 - [ ] Source transparency and quality indicators visible in briefing UI.
 ## Depends On
 - Task 02
--- a/docs/planning/milestone-p/phase-4/OVERVIEW.md
+++ b/docs/planning/milestone-p/phase-4/OVERVIEW.md
@ -0,0 +1,24 @@
 # Milestone P · Phase 4 — P3 Public Launch
 **Objective:** launch the product publicly with reliability, quality floor, and support readiness.
 ## Success Criteria
 - Briefing generation reliability and latency SLOs met.
 - Quality controls enforced (freshness, duplicates, source floor).
 - Notification cadence controls protect user attention.
 - Support and incident response workflow operating.
 **Dependencies:** P2 completed, M3 + M5 core, M6 partial.
 **Blocked by:** beta KPIs and operational readiness.
 **Unblocks:** P4 Scale + Revenue Fit.
 ## Task Index
 | # | Task | Delivers | Depends On | Complexity |
 |---|------|----------|------------|------------|
 | 01 | Reliability and SLOs | launch SLOs, monitoring, error budgets | None | M |
 | 02 | Quality Operations | quality floor checks, regression dashboards, alerting | Task 01 | M |
 | 03 | Launch and Support Playbook | launch checklist, incident roles, communication templates | Task 02 | S |
--- a/docs/planning/milestone-p/phase-4/task-01-reliability-and-slos.md
+++ b/docs/planning/milestone-p/phase-4/task-01-reliability-and-slos.md
@ -0,0 +1,15 @@
 # P3 Task 01: Reliability and SLOs
 ## Deliverable
 A launch reliability baseline with concrete SLOs and monitoring coverage.
 ## Acceptance Criteria
 - [ ] Briefing latency and availability SLOs documented.
 - [ ] Key failure modes mapped to alerts.
 - [ ] Error budget policy defined.
 ## Depends On
 - None
--- a/docs/planning/milestone-p/phase-4/task-02-quality-operations.md
+++ b/docs/planning/milestone-p/phase-4/task-02-quality-operations.md
@ -0,0 +1,15 @@
 # P3 Task 02: Quality Operations
 ## Deliverable
 Operational quality controls to prevent stale, duplicate, or low-credibility briefing output.
 ## Acceptance Criteria
 - [ ] Quality floor rules enforced.
 - [ ] Regression dashboard for useful-item rate and repeated-unwanted-item rate.
 - [ ] On-call quality triage workflow defined.
 ## Depends On
 - Task 01
--- a/docs/planning/milestone-p/phase-4/task-03-launch-and-support-playbook.md
+++ b/docs/planning/milestone-p/phase-4/task-03-launch-and-support-playbook.md
@ -0,0 +1,15 @@
 # P3 Task 03: Launch and Support Playbook
 ## Deliverable
 A launch playbook covering rollout, support triage, and user communication.
 ## Acceptance Criteria
 - [ ] Rollout stages documented.
 - [ ] Support triage categories and ownership defined.
 - [ ] Incident communication templates prepared.
 ## Depends On
 - Task 02
--- a/docs/planning/milestone-p/phase-5/OVERVIEW.md
+++ b/docs/planning/milestone-p/phase-5/OVERVIEW.md
@ -0,0 +1,24 @@
 # Milestone P · Phase 5 — P4 Scale + Revenue Fit
 **Objective:** prove product economics and growth quality at larger scale.
 ## Success Criteria
 - Monetization model validated.
 - Revenue metrics tracked alongside quality metrics.
 - Retention stable as user volume grows.
 - Next target segment chosen with evidence.
 **Dependencies:** P3 completed, M6 + M7.
 **Blocked by:** public launch performance and data quality.
 **Unblocks:** next product line or market expansion.
 ## Task Index
 | # | Task | Delivers | Depends On | Complexity |
 |---|------|----------|------------|------------|
 | 01 | Monetization Experiments | pricing/tests, conversion funnel instrumentation | None | M |
 | 02 | Quality-Safe Growth | guardrails that prevent revenue-over-quality regressions | Task 01 | M |
 | 03 | Segment Expansion Plan | data-backed expansion strategy and milestone proposal | Task 02 | S |
--- a/docs/planning/milestone-p/phase-5/task-01-monetization-experiments.md
+++ b/docs/planning/milestone-p/phase-5/task-01-monetization-experiments.md
@ -0,0 +1,15 @@
 # P4 Task 01: Monetization Experiments
 ## Deliverable
 A validated monetization path with measurable conversion and retention impact.
 ## Acceptance Criteria
 - [ ] Pricing hypotheses and experiment plan documented.
 - [ ] Conversion funnel instrumented.
 - [ ] Early conversion and churn impact assessed.
 ## Depends On
 - None
--- a/docs/planning/milestone-p/phase-5/task-02-quality-safe-growth.md
+++ b/docs/planning/milestone-p/phase-5/task-02-quality-safe-growth.md
@ -0,0 +1,15 @@
 # P4 Task 02: Quality-Safe Growth
 ## Deliverable
 Growth guardrails that prevent quality degradation under monetization pressure.
 ## Acceptance Criteria
 - [ ] Revenue and quality KPI review cadence defined.
 - [ ] Regression triggers and rollback policy defined.
 - [ ] User trust metrics included in growth decisions.
 ## Depends On
 - Task 01
--- a/docs/planning/milestone-p/phase-5/task-03-segment-expansion-plan.md
+++ b/docs/planning/milestone-p/phase-5/task-03-segment-expansion-plan.md
@ -0,0 +1,15 @@
 # P4 Task 03: Segment Expansion Plan
 ## Deliverable
 A data-backed proposal for the next target segment and roadmap extension.
 ## Acceptance Criteria
 - [ ] Expansion segment candidate analysis completed.
 - [ ] Required product/engine capabilities identified.
 - [ ] Proposed milestone sequence drafted.
 ## Depends On
 - Task 02
--- a/docs/research/tidaldb_tooling_and_diagnostics.md
+++ b/docs/research/tidaldb_tooling_and_diagnostics.md
@ -0,0 +1,696 @@
 # Research: CLI Framework and Embedded HTTP for m0p2 Tooling & Diagnostics
 ## Question
 What is the minimum-viable set of dependencies and design patterns for:
 1. A `tidalctl` CLI binary (2 subcommands, 1 required arg, 1 optional flag, JSON output)
 2. An optional embedded HTTP endpoint (`/healthz` JSON, `/metrics` Prometheus text format)
 3. Prometheus text format output for 5-10 counters/gauges
 4. Config serialization for CLI-to-library communication
 ## tidalDB Context
 tidalDB is an embeddable, single-node-first Rust database. The dependency philosophy from CODING_GUIDELINES.md is explicit: "Every dependency must justify its existence against 'could we write this in 200 lines?'" The library crate has `#![forbid(unsafe_code)]` at crate level. MSRV is 1.91 (Rust 2024 edition).
 **m0p2 scope is narrow:**
 - `tidalctl status --path <dir>` and `tidalctl paths --path <dir>` -- two subcommands, one required flag (`--path`), one optional flag (`--pretty`), JSON output
 - `/healthz` returning JSON health status
 - `/metrics` returning Prometheus text format with ~5-10 metrics (uptime, WAL sequence, queue depth, build hash)
 - The HTTP endpoint is feature-gated (`metrics` feature), disabled by default
 - Expected concurrent connections to the metrics endpoint: <10 (dev/ops tooling only)
 **Existing dependency context (from Cargo.lock):** `criterion` (dev-dependency) already pulls in `clap 4.5.60`, `serde 1.0.228`, `serde_json 1.0.149`, and `serde_derive 1.0.228`. These are compiled in every `cargo test` and `cargo bench` invocation today. `serde`/`serde_json` are also listed as approved dependencies in CODING_GUIDELINES.md (line 296).
 ---
 ## Question 1: CLI Argument Parsing for `tidalctl`
 ### Approaches Surveyed
 #### Approach 1: `clap` 4.x (derive API)
 **How it works:** Declarative derive macros on structs generate a full argument parser with help text, error messages, completions, and subcommand routing. The derive API maps directly from struct fields to CLI flags.
 **Used by:** TiKV (`tikv-ctl`), Meilisearch, SurrealDB, Vector, Nushell, ripgrep, bat, fd. The dominant choice in the Rust CLI ecosystem. Criterion (already a tidalDB dev-dep) uses clap 4 internally.
 **Evidence:**
 - argparse-rosetta-rs benchmarks (2024): 3s full debug build, 392ms incremental. 654 KiB release binary overhead (full features) or 427 KiB (minimal features).
 - MSRV: 1.74. Compatible with tidalDB's 1.91.
 - Rain's Rust CLI Recommendations: "use clap unless you have a really simple application."
 **Strengths:**
 - Auto-generated `--help` with subcommand tree, argument descriptions, and defaults.
 - Compile-time validation of argument structure via derive macros.
 - Shell completions via `clap_complete`.
 - Already in Cargo.lock via criterion -- zero additional compile-time cost in dev builds.
 **Weaknesses:**
 - 654 KiB binary overhead (full) / 427 KiB (minimal) added to the `tidalctl` release binary.
 - Proc-macro dependency chain (syn, quote, proc-macro2) -- though these are already compiled for criterion.
 - Overkill for 2 subcommands.
 #### Approach 2: `argh` 0.1.13 (Google's derive parser)
 **How it works:** Derive-based parser optimized for code size, designed for Google Fuchsia's CLI conventions. Similar derive API to clap but with a smaller binary footprint.
 **Used by:** Google Fuchsia tooling. Limited adoption outside Google's ecosystem.
 **Evidence:**
 - argparse-rosetta-rs benchmarks: 3s full debug build (same as clap due to proc-macro overhead), 203ms incremental. 38 KiB binary overhead.
 - MSRV: not explicitly declared. Uses 2018 edition. Last release ~12 months ago.
 - License: BSD-3-Clause. "This is not an officially supported Google product."
 **Strengths:**
 - Much smaller binary overhead than clap (38 KiB vs 427-654 KiB).
 - Derive-based API similar to clap.
 **Weaknesses:**
 - Not in Cargo.lock -- adds a new dependency tree.
 - Fuchsia-specific conventions (not standard Unix `--flag=value` in all cases).
 - Lower community adoption; maintenance uncertain (not officially supported by Google).
 - No shell completions.
 - 3s initial compile (proc-macro overhead same as clap).
 #### Approach 3: `pico-args` 0.5.0
 **How it works:** Manual argument extraction via method calls. No derive, no proc-macros, no help generation. Parse arguments by calling `opt_value_from_str("--path")`, `contains("--pretty")`, and `subcommand()`.
 **Used by:** RazrFalcon's suite of tools (resvg, usvg, svgcleaner). Popular in the "small tool" Rust ecosystem. 11M+ total downloads on crates.io.
 **Evidence:**
 - argparse-rosetta-rs benchmarks: 384ms full debug build, 185ms incremental. 23 KiB binary overhead.
 - Zero dependencies. Zero proc-macros. 666 lines of code.
 - MSRV: 1.32. Compatible with any Rust version.
 - License: MIT.
 - No unsafe code (`#![forbid(unsafe_code)]`).
 **Strengths:**
 - Negligible compile-time and binary size impact.
 - Zero dependencies -- no transitive risk.
 - API is simple enough for 2 subcommands.
 - Matches tidalDB's dependency philosophy perfectly.
 **Weaknesses:**
 - No auto-generated `--help`. Must be hand-written (10-15 lines for this CLI).
 - No derive -- argument parsing is imperative code.
 - Subcommand routing is manual string matching.
 - Error messages are less polished than clap.
 #### Approach 4: `lexopt` 0.3.1
 **How it works:** Low-level lexer that yields tokens (`Short`, `Long`, `Value`). The application matches on tokens in a loop. One file, zero dependencies, zero macros.
 **Used by:** cargo (as `clap_lex` which is derived from lexopt's design), uutils.
 **Evidence:**
 - argparse-rosetta-rs benchmarks: 385ms full debug build, 184ms incremental. 34 KiB binary overhead.
 - Zero dependencies. MSRV 1.31. License: MIT/Apache-2.0.
 **Strengths:**
 - Handles `OsString` correctly (important for path arguments).
 - Slightly more structured than raw `std::env::args()`.
 **Weaknesses:**
 - More boilerplate than pico-args for the same result.
 - No subcommand abstraction -- everything is a token loop.
 - Slightly larger binary overhead than pico-args for less ergonomic API.
 #### Approach 5: Manual (`std::env::args()`)
 **How it works:** Read `std::env::args()` into a `Vec<String>`, match on the first positional argument for the subcommand, iterate remaining args for flags.
 **Used by:** Many internal tools. SQLite's CLI is hand-rolled in C (not using getopt). DuckDB's CLI is based on SQLite's hand-rolled parser.
 **Evidence:**
 - Zero dependencies, zero binary overhead, zero compile time addition.
 - For 2 subcommands + 2 flags, this is approximately 60-80 lines of Rust, including help text and error messages.
 **Strengths:**
 - Absolute minimum footprint.
 - No dependency to maintain, audit, or version-pin.
 - Complete control over error messages.
 **Weaknesses:**
 - Must handle edge cases manually: `--path=<dir>` vs `--path <dir>`, `--` separator, unknown flags.
 - No help generation.
 - More code to maintain than pico-args for equivalent behavior.
 - Easy to introduce subtle parsing bugs (e.g., `--path` at end of args without value).
 ### Comparison
 | Criterion | clap 4.x | argh 0.1.13 | pico-args 0.5.0 | lexopt 0.3.1 | Manual |
 |---|---|---|---|---|---|
 | Full debug build | 3s | 3s | 384ms | 385ms | 0ms |
 | Incremental build | 392ms | 203ms | 185ms | 184ms | 0ms |
 | Binary overhead (release) | 427-654 KiB | 38 KiB | 23 KiB | 34 KiB | 0 KiB |
 | Dependencies | ~10 transitive | ~3 (proc-macro) | 0 | 0 | 0 |
 | Auto `--help` | Yes | Yes | No | No | No |
 | Subcommand support | Native | Native | Manual matching | Manual matching | Manual matching |
 | Proc-macros | Yes (derive) | Yes (derive) | No | No | No |
 | `#![forbid(unsafe_code)]` | No (clap uses unsafe) | Unknown | Yes | Yes | Yes |
 | MSRV | 1.74 | Not declared (2018 ed.) | 1.32 | 1.31 | N/A |
 | Already in Cargo.lock | Yes (via criterion) | No | No | No | N/A |
 | License | MIT/Apache-2.0 | BSD-3-Clause | MIT | MIT/Apache-2.0 | N/A |
 | Lines of code (user-side) | ~25 (derive struct) | ~25 (derive struct) | ~40 (imperative) | ~50 (token loop) | ~60-80 |
 ### Recommendation: Manual `std::env::args()` for `tidalctl`
 **The case is clear when you look at the actual scope.** `tidalctl` has 2 subcommands, 1 required flag, and 1 optional flag. This is a 60-line match statement, not a parser configuration problem.
 The key arguments:
 1. **The CODING_GUIDELINES.md test:** "Could we write this in 200 lines?" -- Yes, in about 60 lines, including help text and error messages. Under that test, no dependency is justified at this scope.
 2. **`tidalctl` is a separate binary crate, not the library.** It will have its own `Cargo.toml`. Even though clap is in the workspace Cargo.lock via criterion, `tidalctl`'s release build would need to compile clap into the binary, adding 427+ KiB. The CLI binary should be small -- the `status` command reads a config file and prints JSON; it should not be a 1+ MiB binary.
 3. **The "escape hatch" argument favors manual.** If `tidalctl` grows to 5+ subcommands (e.g., `tidalctl compact`, `tidalctl backup`, `tidalctl schema`), switching from manual to pico-args or clap is a straightforward refactor. The reverse migration (clap to manual) is harder because derive macros become load-bearing.
 4. **Production precedent:** SQLite and DuckDB both use hand-rolled CLI parsers. For embedded database tooling with few commands, this is the norm, not the exception.
 **If the team prefers a library:** pico-args 0.5.0 is the right choice. Zero dependencies, 23 KiB overhead, `#![forbid(unsafe_code)]`, and the API is natural for this use case. Pin to `pico-args = "0.5"`.
 **Do not use clap for `tidalctl` at this scope.** It is the right tool for a CLI with 10+ subcommands and complex argument validation. It is overkill for 2 subcommands and would add 427 KiB to a binary that should be 100-200 KiB total.
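 The manual approach recommended above can be sketched as follows. This is a minimal illustration, not the `tidalctl` implementation: the `status` and `paths` subcommand names come from this document, while the `Cli` and `Command` types, the `parse_args` function, and the choice of `--path` as the required flag and `--pretty` as the optional one are assumptions made for the example. It accepts both `--path <dir>` and `--path=<dir>` and rejects unknown flags; it does not handle the `--` separator.
 ```rust
 use std::path::PathBuf;
 /// Hypothetical subcommand set; the names follow this document.
 #[derive(Debug, PartialEq)]
 enum Command {
    Status,
    Paths,
 }
 /// Hypothetical parsed command line; flag names are assumptions.
 #[derive(Debug, PartialEq)]
 struct Cli {
    command: Command,
    path: PathBuf,
    pretty: bool,
 }
 /// Parses the arguments that follow the program name.
 fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Cli, String> {
    let mut args = args.into_iter();
    let command = match args.next().as_deref() {
        Some("status") => Command::Status,
        Some("paths") => Command::Paths,
        Some(other) => return Err(format!("unknown subcommand: {other}")),
        None => return Err("missing subcommand (expected `status` or `paths`)".to_string()),
    };
    let mut path = None;
    let mut pretty = false;
    while let Some(arg) = args.next() {
        if arg == "--pretty" {
            pretty = true;
        } else if arg == "--path" {
            // `--path <dir>` form: the value is the next argument.
            let value = args.next().ok_or("--path requires a value")?;
            path = Some(PathBuf::from(value));
        } else if let Some(value) = arg.strip_prefix("--path=") {
            // `--path=<dir>` form: the value follows the equals sign.
            path = Some(PathBuf::from(value));
        } else {
            return Err(format!("unknown argument: {arg}"));
        }
    }
    let path = path.ok_or("missing required flag --path")?;
    Ok(Cli { command, path, pretty })
 }
 ```
 In a binary, `main` would call `parse_args(std::env::args().skip(1))`, print the error and a short usage string on `Err`, and exit with a non-zero status.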
 ---
 ## Question 2: Sync Embedded HTTP for Metrics Endpoint
 ### Design Tension
 The m0p2 task document says: "Endpoint can run on the same Tokio runtime as host service (returns `Future` implementor)." But the research question notes: "Needs to work without Tokio as a hard dependency." These are in tension.
 **Resolution:** The metrics endpoint should be designed as a synchronous server running on a background `std::thread`. When a host application has Tokio, it can use `tokio::task::spawn_blocking` to start the sync server from its runtime. The API should return a handle that owns a `std::thread::JoinHandle<()>`, not a `Future`. This is simpler, avoids a Tokio dependency, and is compatible with both async and sync host applications.
 A future `metrics-tokio` feature flag could add a `Future`-returning wrapper, but m0p2 does not need it.
 ### Approaches Surveyed
 #### Approach 1: `tiny_http` 0.12.0
 **How it works:** Synchronous HTTP server using `std::net::TcpListener` internally with a thread pool. Handles HTTP/1.1 parsing, keep-alive, chunked transfer, content encoding. You call `server.recv()` in a loop and respond synchronously.
 **Used by:** devserver, nickel (legacy), numerous internal tools. 1.1K GitHub stars, 395 downstream crates.
 **Evidence:**
 - Version 0.12.0, released October 2022. Edition 2018. MSRV 1.57.
 - Core dependencies: `ascii`, `chunked_transfer`, `httpdate` -- minimal tree (~5 crates without TLS).
 - Size: 120 KB crate, ~2.5K source lines.
 - License: MIT/Apache-2.0.
 - No TLS needed for localhost metrics (disable all `ssl-*` features).
 - Uses some `unsafe` internally (HTTP parsing optimizations).
 **Strengths:**
 - Fully synchronous -- no Tokio dependency.
 - Handles HTTP edge cases (keep-alive, chunked, pipelining) correctly.
 - Mature, battle-tested for low-traffic use cases.
 - Simple API: `server.recv()` -> `Request` -> `request.respond(Response)`.
 **Weaknesses:**
 - Last release October 2022 -- 3+ years old. Active maintenance is uncertain.
 - Internal thread pool adds complexity tidalDB does not need for 2 endpoints.
 - Pulls in `ascii` and `chunked_transfer` crates -- small but nonzero dependency surface.
 - Uses `unsafe` internally, which cannot be audited as easily as a hand-rolled solution.
 - MSRV 1.57 is fine, but edition 2018 is dated.
 #### Approach 2: `rouille` 0.6.2
 **How it works:** Macro-based synchronous web framework built on top of `tiny_http`. Adds routing macros, form parsing, and session handling.
 **Used by:** Small Rust web projects. 1.1K GitHub stars.
 **Evidence:**
 - Built on `tiny_http` -- inherits its HTTP handling.
 - Adds significant API surface (routing macros, sessions, forms) that tidalDB does not need.
 - Last commit activity has slowed.
 - License: MIT/Apache-2.0.
 **Strengths:**
 - Routing macros reduce boilerplate for multi-endpoint servers.
 **Weaknesses:**
 - Wrapper around `tiny_http` -- adds dependency on top of dependency.
 - Routing macros are unnecessary for 2 endpoints.
 - Maintenance status unclear.
 - Fails the "200 lines" test -- we are adding a framework when we need 2 `if` branches.
 #### Approach 3: Hand-rolled (`std::net::TcpListener`)
 **How it works:** Bind a `TcpListener`, accept connections in a loop on a background thread, parse the HTTP request line (just the method and path), write a raw HTTP response. For 2 endpoints with static-ish content, this is ~80-120 lines.
 **Used by:** The Rust Book's web server tutorial uses this exact pattern. Prometheus client libraries in other languages often use minimal HTTP for the `/metrics` endpoint. SQLite does not embed an HTTP server, but the pattern is standard for database diagnostics (e.g., RocksDB statistics are often exposed via a hand-rolled HTTP endpoint in embedding applications).
 **Evidence:**
 - Zero dependencies. Zero binary overhead.
 - The Rust standard library's `TcpListener` + `BufReader` handles everything needed for HTTP/1.1 request parsing at this scale.
 - For `/healthz` and `/metrics` with <10 concurrent connections, HTTP keep-alive and chunked transfer are unnecessary -- `Connection: close` on every response is acceptable.
 **Strengths:**
 - Zero dependencies -- maximally embeddable.
 - Audit surface is 80-120 lines of code that the team wrote and understands.
 - No `unsafe` (stays within `#![forbid(unsafe_code)]`).
 - Thread model is explicit: one `std::thread::spawn` with a loop, one `TcpListener`.
 - Trivially testable: connect with `std::net::TcpStream` in integration tests.
 **Weaknesses:**
 - Must handle HTTP parsing manually. But for this scope: read the first line, split on spaces, match path. Malformed requests get a 400 response. This is ~20 lines.
 - No keep-alive, no chunked transfer, no content encoding. Acceptable for dev/ops metrics endpoint at <10 connections.
 - If requirements grow (TLS, WebSocket, many endpoints), must migrate to a real server. But m0p2 has 2 endpoints.
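 The "read the first line, split on spaces, match path" step described above could look roughly like this. It is a sketch under assumptions: the `Route` enum and the `route_request_line` and `http_response` functions are hypothetical names, and treating non-`GET` methods as a separate outcome is a choice made for the example rather than something this document specifies.
 ```rust
 /// Hypothetical outcome of inspecting the first line of an HTTP request.
 #[derive(Debug, PartialEq)]
 enum Route {
    Healthz,
    Metrics,
    NotFound,
    MethodNotAllowed,
    BadRequest,
 }
 /// Parses a request line such as `GET /metrics HTTP/1.1`.
 fn route_request_line(line: &str) -> Route {
    let mut parts = line.split_whitespace();
    // Exactly three tokens are expected: method, path, version.
    let (method, path, version) = match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some(m), Some(p), Some(v), None) => (m, p, v),
        _ => return Route::BadRequest,
    };
    if !version.starts_with("HTTP/") {
        return Route::BadRequest;
    }
    if method != "GET" {
        return Route::MethodNotAllowed;
    }
    // Ignore any query string when matching the path.
    match path.split('?').next().unwrap_or(path) {
        "/healthz" => Route::Healthz,
        "/metrics" => Route::Metrics,
        _ => Route::NotFound,
    }
 }
 /// Formats a complete response that asks the client to close the connection.
 fn http_response(status: &str, content_type: &str, body: &str) -> String {
    format!(
        "HTTP/1.1 {status}\r\nContent-Type: {content_type}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
 }
 ```
 Keeping the routing in a pure function means the parsing rules can be unit-tested without opening a socket.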
 #### Approach 4: `axum` + Tokio (async)
 **How it works:** Full async web framework built on `hyper` and `tokio`. Tower middleware ecosystem, type-safe extractors, Router-based routing.
 **Used by:** Most production Rust web services. The ecosystem standard for async HTTP.
 **Evidence:**
 - Pulls in `tokio`, `hyper`, `tower`, `http`, and dozens of transitive dependencies.
 - Binary size impact: 1-3 MiB.
 - Compile time: 10-20s for a clean build.
 **Strengths:**
 - Production-grade HTTP handling.
 - Seamless integration if the host application already runs Tokio.
 **Weaknesses:**
 - **Fundamentally incompatible with tidalDB's embeddable philosophy.** Adding Tokio as a dependency means every embedder must link Tokio, even if they never enable metrics. Feature-gating mitigates this, but the `metrics` feature would still pull in the entire async runtime.
 - Massive dependency tree for 2 endpoints.
 - Does not pass the "200 lines" test by orders of magnitude.
 #### Approach 5: `warp` (async, Tokio-based)
 Same category as axum. Pulls Tokio. Same disqualification for the same reasons.
 ### Comparison
 | Criterion | tiny_http 0.12 | rouille 0.6 | Hand-rolled | axum + Tokio |
 |---|---|---|---|---|
 | Async? | No (sync) | No (sync) | No (sync) | Yes |
 | Dependencies | ~5 crates | ~8 crates (via tiny_http) | 0 | ~50+ crates |
 | Binary size impact | ~50-80 KiB | ~80-120 KiB | 0 KiB | 1-3 MiB |
 | Compile time impact | ~1-2s | ~2-3s | 0s | 10-20s |
 | HTTP correctness | Full HTTP/1.1 | Full HTTP/1.1 | Minimal (sufficient) | Full HTTP/1.1 + HTTP/2 |
 | `#![forbid(unsafe_code)]` | No (internal unsafe) | No | Yes | No |
 | MSRV | 1.57 | Unknown | N/A (std only) | ~1.70+ |
 | Maintenance | Last release Oct 2022 | Uncertain | N/A (owned code) | Active |
 | License | MIT/Apache-2.0 | MIT/Apache-2.0 | N/A | MIT |
 | Shutdown coordination | `server.unblock()` | `server.unblock()` | `AtomicBool` flag | `tokio::sync::oneshot` |
 | Concurrent connections | Thread pool | Thread pool | Sequential (acceptable) | Async (unlimited) |
 ### Recommendation: Hand-rolled `std::net::TcpListener`
 **For 2 endpoints serving <10 concurrent connections in a dev/ops context, a hand-rolled HTTP listener is the correct choice.**
 The arguments:
 1. **The "200 lines" test is decisive.** The entire metrics HTTP server -- binding, accept loop, request parsing, routing, response formatting, graceful shutdown -- fits in ~100-120 lines of safe Rust. No dependency justifies its existence here.
 2. **Zero dependency cost.** The `metrics` feature flag should add only tidalDB's own code, not a third-party HTTP server. An embedder who enables `metrics` should not be surprised by new transitive dependencies.
 3. **`#![forbid(unsafe_code)]` compatibility.** tiny_http uses unsafe internally. A hand-rolled solution stays within tidalDB's safety guarantees.
 4. **Shutdown is trivial with an `AtomicBool`.** The background thread checks `shutdown.load(Ordering::Acquire)` on each accept iteration. Because `accept()` blocks, the simplest option is `TcpListener::set_nonblocking(true)` with a 200ms poll interval. Alternatives are a connect-to-self call that unblocks `accept()`, or `SO_REUSEADDR` plus `shutdown` on a cloned socket. Note that `std::net::TcpListener` has no built-in accept timeout.
 5. **The "escape hatch" works both directions.** If m0p2 grows beyond 2 endpoints or needs TLS, migrating to tiny_http or axum is straightforward -- the endpoint handler functions remain the same, only the server harness changes.
 **API design:**
 ```rust
 /// Start the metrics HTTP server on a background thread.
 ///
 /// Returns a handle that stops the server when dropped.
 pub fn start_metrics_server(addr: std::net::SocketAddr, db: Arc<TidalDb>) -> MetricsHandle;
 pub struct MetricsHandle {
    shutdown: Arc<AtomicBool>,
    thread: Option<std::thread::JoinHandle<()>>,
 }
 impl Drop for MetricsHandle {
    fn drop(&mut self) {
        self.shutdown.store(true, Ordering::Release);
        if let Some(handle) = self.thread.take() {
            let _ = handle.join();
        }
    }
 }
 ```
 **Tokio compatibility:** An embedder running Tokio can wrap this in `tokio::task::spawn_blocking(|| start_metrics_server(...))`. No tidalDB code needs to know about Tokio.
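 To make the shutdown mechanics concrete, here is a minimal sketch of the polling accept loop behind such a handle. It is an illustration under assumptions, not the tidalDB implementation: `PollingServer` and `start_polling_server` are hypothetical names, the server takes no database reference, it answers only `/healthz`, and the 200ms poll interval and 2s read timeout are example values.
 ```rust
 use std::io::{BufRead, BufReader, Write};
 use std::net::{SocketAddr, TcpListener};
 use std::sync::atomic::{AtomicBool, Ordering};
 use std::sync::Arc;
 use std::thread::{self, JoinHandle};
 use std::time::Duration;
 /// Hypothetical handle with the same shape as `MetricsHandle` above.
 pub struct PollingServer {
    pub local_addr: SocketAddr,
    shutdown: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
 }
 pub fn start_polling_server(addr: SocketAddr) -> std::io::Result<PollingServer> {
    let listener = TcpListener::bind(addr)?;
    listener.set_nonblocking(true)?;
    let local_addr = listener.local_addr()?;
    let shutdown = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&shutdown);
    let worker = thread::spawn(move || {
        while !flag.load(Ordering::Acquire) {
            let stream = match listener.accept() {
                Ok((stream, _peer)) => stream,
                // Usually `WouldBlock` (nothing pending). This sketch treats every
                // accept error the same way: sleep, then re-check the flag.
                Err(_) => {
                    thread::sleep(Duration::from_millis(200));
                    continue;
                }
            };
            // Accepted sockets can inherit non-blocking mode on some platforms.
            let _ = stream.set_nonblocking(false);
            let _ = stream.set_read_timeout(Some(Duration::from_secs(2)));
            let mut reader = BufReader::new(stream);
            let mut request_line = String::new();
            if reader.read_line(&mut request_line).is_err() {
                continue;
            }
            // Drain headers up to the blank line so the connection closes cleanly.
            let mut header = String::new();
            while matches!(reader.read_line(&mut header), Ok(n) if n > 2) {
                header.clear();
            }
            let (status, body) = if request_line.starts_with("GET /healthz ") {
                ("200 OK", "{\"status\":\"ok\"}")
            } else {
                ("404 Not Found", "not found")
            };
            let _ = write!(
                reader.get_mut(),
                "HTTP/1.1 {status}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
                body.len()
            );
        }
    });
    Ok(PollingServer { local_addr, shutdown, thread: Some(worker) })
 }
 impl Drop for PollingServer {
    fn drop(&mut self) {
        self.shutdown.store(true, Ordering::Release);
        if let Some(handle) = self.thread.take() {
            let _ = handle.join();
        }
    }
 }
 ```
 Binding to port 0 and reading `local_addr` back gives tests an ephemeral port. Dropping the handle blocks for at most one poll interval plus any request in flight.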
 ---
 ## Question 3: Prometheus Text Format
 ### Format Specification
 The Prometheus text exposition format (version 0.0.4) is line-oriented, UTF-8 encoded, with `\n` line endings:
 ```
 # HELP <metric_name> <docstring>
 # TYPE <metric_name> <counter|gauge|histogram|summary|untyped>
 <metric_name>{<label_name>="<label_value>",...} <value> [<timestamp>]
 ```
 Rules:
 - `# HELP` and `# TYPE` must appear before the first sample for a metric.
 - Only one `# HELP` and one `# TYPE` per metric name.
 - If `# TYPE` is omitted, metric defaults to `untyped`.
 - Label values must escape backslash as `\\`, double quote as `\"`, and line feed as `\n` (a backslash followed by the letter `n`).
 - Values are Go `ParseFloat` format: integers, floats, `NaN`, `+Inf`, `-Inf`.
 - Timestamp is optional (milliseconds since epoch). Prometheus will use scrape time if omitted.
 - Content-Type: `text/plain; version=0.0.4; charset=utf-8`.
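 The label escaping rule above is small enough to show in full. The function name `escape_label_value` is an assumption for this sketch; the three escapes are the ones the text format requires for label values.
 ```rust
 /// Escapes a label value for the Prometheus text exposition format.
 fn escape_label_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for ch in value.chars() {
        match ch {
            // Backslash becomes two backslashes.
            '\\' => out.push_str("\\\\"),
            // Double quote becomes backslash + quote.
            '"' => out.push_str("\\\""),
            // Line feed becomes backslash + the letter n.
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
 }
 ```
 Values that contain none of these characters, such as `0` or `0.1.0`, pass through unchanged.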
 ### Example for tidalDB's metrics
 ```
 # HELP tidaldb_uptime_seconds Seconds since the database was opened.
 # TYPE tidaldb_uptime_seconds gauge
 tidaldb_uptime_seconds{partition_id="0"} 3723.5
 # HELP tidaldb_wal_sequence Current WAL sequence number.
 # TYPE tidaldb_wal_sequence counter
 tidaldb_wal_sequence{partition_id="0"} 148293
 # HELP tidaldb_wal_queue_depth Number of WAL entries pending flush.
 # TYPE tidaldb_wal_queue_depth gauge
 tidaldb_wal_queue_depth{partition_id="0"} 12
 # HELP tidaldb_build_info Build metadata. Value is always 1.
 # TYPE tidaldb_build_info gauge
 tidaldb_build_info{version="0.1.0",build_hash="abc123",partition_id="0"} 1
 # HELP tidaldb_open_segments Number of open WAL segments.
 # TYPE tidaldb_open_segments gauge
 tidaldb_open_segments{partition_id="0"} 3
 ```
 ### Approaches Surveyed
 #### Approach 1: `prometheus` crate (tikv/rust-prometheus) 0.13.x
 **How it works:** Registry-based. Create `Counter`, `Gauge`, `Histogram` objects, register them with a `Registry`, call `TextEncoder::encode()` to produce the exposition format.
 **Used by:** TiKV, Linkerd, numerous Rust services. The de facto standard.
 **Evidence:**
 - Well-maintained (tikv organization). License: Apache-2.0.
 - Pulls in `protobuf` (for optional protobuf format), `lazy_static`, `parking_lot`, `memchr`.
 - Forces string allocations during metric collection (Collector trait limitation).
 - Binary size: ~100-200 KiB.
 - MSRV: 1.56.
 **Strengths:**
 - Battle-tested encoding. Guaranteed format correctness.
 - Histogram and summary support built-in.
 **Weaknesses:**
 - Significant dependency tree for 5 counters/gauges.
 - `protobuf` dependency is unnecessary for text-only exposition.
 - Allocation-heavy collector API (documented ~40% slower than prometheus-client).
 - Overkill: we need `writeln!` for 5 metrics, not a registry system.
 #### Approach 2: `prometheus-client` crate 0.22.x
 **How it works:** OpenMetrics-compatible. Type-safe labels via Rust type system (not string pairs). Visitor-based encoding (no allocations).
 **Used by:** This is the official Prometheus Rust client, maintained under the Prometheus organization and recommended for new projects.
 **Evidence:**
 - Prometheus organization maintained. License: Apache-2.0.
 - No unsafe code.
 - ~40% faster encoding than tikv/rust-prometheus due to visitor pattern.
 - Smaller dependency footprint than tikv version.
 **Strengths:**
 - Type-safe labels catch errors at compile time.
 - No allocation during encoding.
 - Official Prometheus project.
 **Weaknesses:**
 - Still a registry-based abstraction layer for 5 metrics.
 - Adds dependency tree that is not justified for the scope.
 #### Approach 3: Hand-written format
 **How it works:** Use `write!` / `writeln!` to a `String` or `Vec<u8>`, following the format spec directly. For 5 counters/gauges with static names and 1-2 labels, this is a function that reads metric values and formats them.
 **Evidence:**
 - The format is trivially simple for counters and gauges. The complete formatting logic for 5 metrics is ~30-40 lines.
 - No histograms or summaries needed at m0p2 scope.
 - Validation: the output must match `# HELP`, `# TYPE`, then metric lines. A unit test can assert the format parses correctly (or simply check line structure).
 **Strengths:**
 - Zero dependencies.
 - Complete control over output format.
 - Trivially auditable -- the format spec is 1 page.
 - No registry overhead, no trait objects, no allocations beyond the output buffer.
 **Weaknesses:**
 - Must follow the spec precisely. If a label value contains `"` or `\n`, it must be escaped. For tidalDB's labels (`partition_id="0"`, `version="0.1.0"`), these are compile-time string literals -- no escaping needed.
 - If tidalDB grows to 50+ metrics with histograms, a library becomes justified. But at 5-10 counters/gauges, it is not.
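 The "check line structure" validation mentioned under Evidence could be sketched as below. This is a hypothetical helper (`check_exposition` is not an existing tidalDB function) and it is deliberately stricter than the format itself: it requires a `# TYPE` line before every sample, although the format lets `# TYPE` be omitted, and it assumes samples carry no timestamp, as in the example output above.
 ```rust
 use std::collections::HashSet;
 /// Checks that each metric has at most one `# HELP` and one `# TYPE` line,
 /// that every sample follows its `# TYPE` line, and that the value is a float.
 fn check_exposition(text: &str) -> Result<(), String> {
    let mut helped = HashSet::new();
    let mut typed = HashSet::new();
    for (idx, line) in text.lines().enumerate() {
        let n = idx + 1;
        if line.is_empty() {
            continue;
        }
        if let Some(rest) = line.strip_prefix("# HELP ") {
            let name = rest.split(' ').next().unwrap_or("");
            if !helped.insert(name) {
                return Err(format!("line {n}: duplicate HELP for {name}"));
            }
        } else if let Some(rest) = line.strip_prefix("# TYPE ") {
            let mut parts = rest.split(' ');
            let name = parts.next().unwrap_or("");
            let kind = parts.next().unwrap_or("");
            if !["counter", "gauge", "histogram", "summary", "untyped"].contains(&kind) {
                return Err(format!("line {n}: unknown type {kind:?}"));
            }
            if !typed.insert(name) {
                return Err(format!("line {n}: duplicate TYPE for {name}"));
            }
        } else if line.starts_with('#') {
            // Other comment lines are ignored.
            continue;
        } else {
            // Sample line: the metric name ends at `{` or at the first space.
            let name = line.split(|c| c == '{' || c == ' ').next().unwrap_or("");
            if !typed.contains(name) {
                return Err(format!("line {n}: sample for {name} before TYPE"));
            }
            // Without a timestamp, the value is the last space-separated token.
            let value = line.rsplit(' ').next().unwrap_or("");
            if value.parse::<f64>().is_err() {
                return Err(format!("line {n}: bad value {value:?}"));
            }
        }
    }
    Ok(())
 }
 ```
 A unit test can feed the rendered `/metrics` body through a check like this, which catches ordering mistakes without adding a parser dependency.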
 ### Comparison
 | Criterion | prometheus (tikv) | prometheus-client | Hand-written |
 |---|---|---|---|
 | Dependencies | ~8 (incl. protobuf) | ~3 | 0 |
 | Binary size | ~100-200 KiB | ~50-100 KiB | 0 KiB |
 | Histogram support | Yes | Yes | No (not needed) |
 | Allocation during encode | Yes (Collector trait) | No (visitor pattern) | No (write! to buffer) |
 | Format correctness | Guaranteed | Guaranteed | Unit-tested |
 | Lines of code (user-side) | ~30 (register + encode) | ~30 (register + encode) | ~40 (format directly) |
 | `#![forbid(unsafe_code)]` | Unknown | Yes | Yes |
 ### Recommendation: Hand-written Prometheus text format
 For 5-10 counters and gauges with known-safe label values, hand-writing the exposition format is the clear choice. The implementation is approximately 40 lines:
 ```rust
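 use std::fmt::Write;
 /// Point-in-time metric values. The field names follow the calls below;
 /// the field types are an assumption.
 pub struct MetricsSnapshot {
    pub uptime_secs: f64,
    pub wal_sequence: u64,
 }
 pub fn render_prometheus_metrics(metrics: &MetricsSnapshot) -> String {
    let mut out = String::with_capacity(1024);
    write_gauge(&mut out, "tidaldb_uptime_seconds",
        "Seconds since the database was opened.",
        &[("partition_id", "0")], metrics.uptime_secs);
    // Sample values are floats in the text format; a u64 is exact in f64 up to 2^53.
    write_counter(&mut out, "tidaldb_wal_sequence",
        "Current WAL sequence number.",
        &[("partition_id", "0")], metrics.wal_sequence as f64);
    // ... more metrics
    out
 }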
 fn write_gauge(out: &mut String, name: &str, help: &str,
               labels: &[(&str, &str)], value: f64) {
    let _ = writeln!(out, "# HELP {name} {help}");
    let _ = writeln!(out, "# TYPE {name} gauge");
    write_sample(out, name, labels, value);
 }
 fn write_counter(out: &mut String, name: &str, help: &str,
                 labels: &[(&str, &str)], value: f64) {
    let _ = writeln!(out, "# HELP {name} {help}");
    let _ = writeln!(out, "# TYPE {name} counter");
    write_sample(out, name, labels, value);
 }
 fn write_sample(out: &mut String, name: &str,
                labels: &[(&str, &str)], value: f64) {
    let _ = write!(out, "{name}{{");
    for (i, (k, v)) in labels.iter().enumerate() {
        if i > 0 { let _ = write!(out, ","); }
        let _ = write!(out, "{k}=\"{v}\"");
    }
    let _ = writeln!(out, "}} {value}");
 }
 ```
 **When to migrate:** If tidalDB needs histograms (e.g., query latency distributions) or 50+ metrics, adopt `prometheus-client` (the official Prometheus crate, not tikv's). Pin to `prometheus-client = "0.22"`. But that is a post-m0p2 decision.
 ---
 ## Question 4: Serde for Config Serialization
 ### Current State
 `Config` is a 4-field struct (`mode: StorageMode`, `data_dir: Option<PathBuf>`, `wal_dir: Option<PathBuf>`, `cache_dir: Option<PathBuf>`). It currently has no serialization support. The CLI needs to read a serialized config snapshot from disk.
 ### Approaches Surveyed
 #### Approach 1: `serde` + `serde_json` (feature-gated on library crate)
 **How it works:** Add `#[derive(Serialize, Deserialize)]` to `Config` and `StorageMode` behind a `serde` feature flag. The CLI binary depends on the library with the `serde` feature enabled. `serde_json` handles the JSON encoding.
 **Evidence:**
 - `serde` (1.0.228) and `serde_json` (1.0.149) are already in `Cargo.lock` via criterion.
 - CODING_GUIDELINES.md line 296 explicitly approves serde/serde_json: "serialization (at API boundaries only, not in hot paths)."
 - Best practice from Rust API Guidelines and community consensus: library crates should feature-gate serde behind an optional `serde` feature.
 - Binary size: serde_json adds ~70-100 KiB to release binaries. serde_derive's proc-macro adds ~5-10s to initial compile, but is already compiled for criterion.
 - fjall (tidalDB's storage engine) does not use serde -- adding it to tidalDB does not create a circular dependency or conflict.
 **Strengths:**
 - Industry standard. Every Rust developer knows serde.
 - Already approved in CODING_GUIDELINES.md.
 - Already compiled in dev builds (via criterion).
 - Feature-gated: embedders who do not need serialization pay zero cost.
 - Config is at an API boundary (CLI reads library's config), exactly where serde belongs.
 **Weaknesses:**
 - serde_derive adds proc-macro compile time. Mitigated by: already compiled for criterion.
 - Monomorphization can bloat binary. Mitigated by: Config is a small struct with 4 fields; the generated code is minimal.
 #### Approach 2: `miniserde`
 **How it works:** Lightweight alternative to serde that uses trait objects instead of monomorphization. ~12x less code than serde + serde_derive + serde_json combined.
 **Evidence:**
 - JSON-only. No format plugins.
 - No error messages on deserialization failure.
 - Does not support enums with data (only C-style enums). `StorageMode` is C-style, so this works.
 - Does not support `#[serde(rename)]` or most serde attributes.
 - Limited type support (no tuple structs, no enums with variant data).
 **Strengths:**
 - Smaller binary size than serde.
 - Faster compile time (no proc-macro overhead comparable to serde_derive).
 **Weaknesses:**
 - serde is already compiled in the workspace. miniserde adds a *new* dependency tree rather than reusing what exists.
 - No error messages -- if the CLI reads a corrupt config file, it gets an opaque error with no indication of what went wrong.
 - Would become a migration tax later when tidalDB needs serde for other types (e.g., schema definitions, ranking profiles).
 #### Approach 3: Hand-written JSON serialization
 **How it works:** Implement `Display` for `Config` that writes JSON manually, and a `from_json_str` function that parses it. For a 4-field struct, this is ~50-80 lines.
 **Evidence:**
 - Zero dependencies.
 - But: manual JSON parsing is error-prone. Escaping, nested objects, null handling, and whitespace tolerance all need implementation.
 - tidalDB will need JSON serialization in multiple places beyond Config (API responses, query results, schema export). Implementing a JSON parser from scratch to avoid an already-approved dependency is false economy.
 **Strengths:**
 - Zero dependency cost.
 **Weaknesses:**
 - JSON parsing is not a 200-line problem if done correctly. Escaping, unicode, nested structures, error reporting -- this is exactly what serde_json solves.
 - Creates maintenance burden that serde eliminates.
 - CODING_GUIDELINES.md already approved serde for this exact use case.
 ### Comparison
 | Criterion | serde + serde_json | miniserde | Hand-written |
 |---|---|---|---|
 | Already in Cargo.lock | Yes (via criterion) | No | N/A |
 | Approved in CODING_GUIDELINES | Yes (explicitly) | No | N/A |
 | Error messages on parse failure | Yes (detailed) | None | Custom |
 | Enum support | Full | C-style only | Custom |
 | Future reuse in tidalDB | High (schema, API, query results) | Low | Low |
 | Binary size overhead | ~70-100 KiB | ~30-50 KiB | 0 KiB |
 | Compile time overhead | 0s (already compiled) | New compilation | 0s |
 | Correctness risk | None (battle-tested) | Low | Medium (hand-rolled parser) |
 ### Recommendation: `serde` + `serde_json`, feature-gated
 **This is the one dependency question where the answer is unambiguously "use the library."**
 1. **Already approved.** CODING_GUIDELINES.md says: "serde / serde_json -- serialization (at API boundaries only, not in hot paths)." Config serialization for CLI communication is the textbook API boundary use case.
 2. **Already compiled.** Both crates are in Cargo.lock via criterion. Adding them as optional dependencies of the main crate adds zero compile time for developers who are already running tests and benchmarks.
 3. **Future-proof.** tidalDB will need JSON serialization for: config export, schema definitions, query result formatting, API responses, ranking profile serialization. Every one of these will use serde. Starting with Config establishes the pattern.
 4. **Feature-gate it.** The library crate adds:
 ```toml
 [dependencies]
 serde = { version = "1", features = ["derive"], optional = true }
 serde_json = { version = "1", optional = true }
 [features]
 serde = ["dep:serde", "dep:serde_json"]
 ```
 And on the struct:
 ```rust
 #[derive(Debug, Clone)]
 #[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
 pub struct Config {
    pub mode: StorageMode,
    pub data_dir: Option<PathBuf>,
    pub wal_dir: Option<PathBuf>,
    pub cache_dir: Option<PathBuf>,
 }
 ```
 Embedders who do not need serialization pay nothing. The `tidalctl` binary crate depends on `tidaldb = { path = "../tidal", features = ["serde"] }`.
 ---
 ## Open Questions
 1. **Config file format and location.** m0p2 task-01 says the CLI reads a "Config dump." Where does the running database write this? Likely `{data_dir}/config.json` written atomically during `TidalDb::open()`. The exact path should be a `Paths` method (e.g., `paths.config_file()`). This is an implementation decision for the engineer, not a research question.
 2. **Metrics collection mechanism.** The hand-rolled metrics HTTP server needs to read metrics from the database. What is the interface? Options: (a) `TidalDb` exposes a `pub fn metrics_snapshot(&self) -> MetricsSnapshot` method; (b) a shared `Arc<AtomicU64>` counter registry. Option (a) is simpler and keeps the metrics code behind the public API. The engineer should decide based on what metrics are available at m0p2 (uptime and build info are trivial; WAL sequence requires WAL to be wired up).
 3. **Graceful shutdown of the HTTP listener.** `std::net::TcpListener::accept()` blocks. To unblock it for shutdown, three options: (a) `set_nonblocking(true)` with a polling loop (simple, slight CPU waste); (b) connect-to-self to unblock accept (clever, no CPU waste); (c) use `SO_REUSEADDR` + `shutdown` on a cloned socket. Option (a) with a 200ms sleep is the simplest and sufficient for a diagnostics endpoint. Benchmark the CPU overhead if concerned -- it will be negligible for a 200ms poll.
 4. **When to add `clap`.** If `tidalctl` grows beyond 5 subcommands or needs dynamic completions, switch to `clap`. The migration from manual to clap is a single-commit refactor: define a derive struct matching the existing `match` arms. Document this as the escape hatch in the `tidalctl` crate README.
 5. **When to add `prometheus-client`.** If tidalDB needs histograms (query latency distributions, signal write latency distributions) or grows to 50+ metrics, adopt `prometheus-client = "0.22"`. The hand-written format functions are then replaced by metrics registered on a `Registry`. Document the threshold.
 6. **Integration testing the HTTP endpoint.** The test should `start_metrics_server` on an ephemeral port, `GET /metrics` with `std::net::TcpStream`, and assert the response contains expected metric lines. This is straightforward with the hand-rolled approach and does not require an HTTP client library -- raw TCP + string matching is sufficient.
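 For open question 1, "written atomically" usually means writing to a temporary file and renaming it over the destination. The sketch below shows that pattern with the standard library only. The function name `write_config_atomically` and the `config.json.tmp` temporary name are assumptions, and the caller is assumed to supply already-serialized JSON.
 ```rust
 use std::fs::{self, File};
 use std::io::Write;
 use std::path::Path;
 /// Writes `contents` to `dir/config.json` through a temporary file and a rename.
 fn write_config_atomically(dir: &Path, contents: &str) -> std::io::Result<()> {
    let final_path = dir.join("config.json");
    let tmp_path = dir.join("config.json.tmp");
    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(contents.as_bytes())?;
    // Flush file data to disk before the rename makes it visible.
    tmp.sync_all()?;
    drop(tmp);
    // `rename` replaces an existing destination; both paths share a directory,
    // so they are on the same filesystem.
    fs::rename(&tmp_path, &final_path)
 }
 ```
 A reader such as `tidalctl status` then sees either the previous complete file or the new complete file, never a partially written one. Full crash durability would also need the parent directory synced, which this sketch leaves out.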
 ---
 ## Summary of Recommendations
 | Component | Recommendation | Justification |
 |---|---|---|
 | CLI argument parsing | Manual `std::env::args()` | 2 subcommands, 60 lines. "200 lines" test passes. Upgrade path to pico-args/clap exists. |
 | HTTP metrics server | Hand-rolled `std::net::TcpListener` | 2 endpoints, <10 connections. ~100 lines of safe Rust. Zero dependencies. |
 | Prometheus text format | Hand-written `write!` formatting | 5-10 counters/gauges. ~40 lines. Format spec is trivial for this scope. |
 | Config serialization | `serde` + `serde_json`, feature-gated | Already approved, already compiled, future-proof. Feature-gate as `serde`. |
 **Total new dependencies for m0p2:** One optional dependency pair (`serde` + `serde_json`) that is already in Cargo.lock and already approved. Everything else is standard library code.
 **Estimated code footprint for m0p2 tooling:**
 - `tidalctl` binary: ~150-200 lines (arg parsing + config reading + JSON output)
 - Metrics HTTP server: ~100-120 lines (listener + routing + response)
 - Prometheus formatter: ~40-50 lines (metric rendering)
 - Config serde derives: ~5 lines (derive attributes + feature gate)
 ---
 ## Sources
 ### CLI Argument Parsing
 - [Rain's Rust CLI Recommendations: Picking an Argument Parser](https://rust-cli-recommendations.sunshowers.io/cli-parser.html)
 - [argparse-rosetta-rs: Benchmark data for Rust argument parsers](https://github.com/rosetta-rs/argparse-rosetta-rs) -- compile time, binary size, parse time comparisons
 - [pico-args: Ultra simple CLI arguments parser](https://github.com/RazrFalcon/pico-args) -- 666 lines, zero deps, `#![forbid(unsafe_code)]`
 - [lexopt: Minimalist pedantic command line parser](https://github.com/blyxxyz/lexopt) -- MSRV 1.31, zero deps
 - [clap: Full featured CLI parser](https://docs.rs/clap/latest/clap/) -- MSRV 1.74, derive API
 - [argh: Google's derive-based parser](https://github.com/google/argh) -- BSD-3-Clause, Fuchsia conventions
 - [Rust CLI argument parsing libraries comparison (jpab.uk)](https://www.jpab.uk/blog/review-rust-cli-flag-parsers/)
 ### HTTP Servers
 - [tiny-http: Low level HTTP server library in Rust](https://github.com/tiny-http/tiny-http) -- v0.12.0, MSRV 1.57, 1.1K stars
 - [Rust Book: Building a Multithreaded Web Server](https://doc.rust-lang.org/book/ch21-02-multithreaded.html) -- std::net::TcpListener pattern
 - [rouille: Synchronous micro-framework on crates.io](https://crates.io/crates/rouille)
 - [Is there any popular synchronous HTTP crate? (Rust Forum)](https://users.rust-lang.org/t/is-there-any-popular-synchronous-http-crate/108111)
 ### Prometheus Text Format
 - [Prometheus Exposition Formats (official specification)](https://prometheus.io/docs/instrumenting/exposition_formats/) -- format version 0.0.4
 - [tikv/rust-prometheus: Instrumentation library](https://github.com/tikv/rust-prometheus) -- Collector trait, string allocation issue
 - [prometheus/client_rust: Official Prometheus Rust client](https://github.com/prometheus/client_rust) -- visitor pattern, no unsafe, ~40% faster encoding
 - [OpenMetrics specification](https://prometheus.io/docs/specs/om/open_metrics_spec/)
 ### Serialization
 - [Serde: Serialization framework for Rust](https://serde.rs/) -- feature flags documentation
 - [Serde use within a library -- best practices (Rust Forum)](https://users.rust-lang.org/t/serde-use-within-a-library-best-practices/111059) -- feature-gating consensus
 - [miniserde: Data structure serialization library](https://docs.rs/miniserde) -- 12x less code than serde, JSON-only, limited type support
 - [Rust serialization benchmarks](https://github.com/djkoloski/rust_serialization_benchmark)
 - [Rust API Guidelines: Feature flag naming for serde](https://github.com/rust-lang/api-guidelines/discussions/180)
--- a/site/content/blog/every-platform-builds-the-same-6-systems.mdx
+++ b/site/content/blog/every-platform-builds-the-same-6-systems.mdx
@ -0,0 +1,145 @@
 ---
 title: "Every content platform builds the same 6 systems from scratch"
 date: "2026-02-20"
 author: "tidalDB"
 description: "The Elasticsearch + Redis + Kafka + feature store + vector DB + ranking service stack is not an architecture. It is scar tissue. Here is why."
 tags: ["architecture", "vision", "recommendation-systems"]
 ---
 You have operated this system. You may be operating it right now.
 Elasticsearch for retrieval. Redis for hot signals. Kafka for event ingestion. A feature store for user profiles. A vector database for semantic similarity. A ranking service that stitches all five together into a sorted list and hopes the data is consistent by the time it arrives.
 Six systems. Six deployment targets. Six failure modes. Six sets of credentials, backup strategies, scaling characteristics, and on-call runbooks. All of them maintained by your team, all of them in service of one question: *given this user, right now, what should they see?*
 This post is about why the stack exists, why it persists, and what should be true instead.
 ## The six systems, named
 They show up in the same order at every company. The specifics vary -- Solr instead of Elasticsearch, Memcached instead of Redis, Pulsar instead of Kafka -- but the shape is identical.
 **System 1: The search index.** Elasticsearch, Solr, or Typesense. Ingests your content catalog, tokenizes text, builds an inverted index, returns results ranked by BM25. It handles keyword search well. It handles everything else poorly. You will spend months teaching it to sort by "trending" using a score field you update from outside, on a schedule, that is stale before the update finishes.
 **System 2: The cache layer.** Redis or Memcached. Holds the hot data that the search index cannot serve fast enough -- trending scores, view counts, precomputed ranking features. You will write a cache invalidation layer. It will have bugs. The bugs will manifest as users seeing content that should have been suppressed, or not seeing content that should have surfaced. These bugs will be intermittent, hard to reproduce, and never fully resolved.
 **System 3: The event bus.** Kafka, Pulsar, or Kinesis. Ingests engagement signals -- views, likes, skips, shares -- and routes them to consumers that update every other system. The consumers will lag. Not always. Not predictably. But at 2am on a Saturday when a piece of content goes viral, the lag between "user liked this" and "the ranking query reflects it" will stretch from milliseconds to seconds to minutes. Your users will notice.
 **System 4: The feature store.** Feast, Tecton, or a homegrown Redis-backed key-value store. Holds user profiles, engagement histories, computed features. Exists because the ranking service needs user context at query time and cannot afford to compute it on the fly. The feature store introduces its own consistency problem: the features it serves are snapshots. By the time they reach the ranker, the user may have liked three more items and blocked a creator. The features do not know this.
 **System 5: The vector database.** Pinecone, Weaviate, Qdrant, Milvus, or pgvector bolted onto PostgreSQL. Holds content embeddings for semantic similarity search. Takes a user preference vector or a query embedding, returns the nearest neighbors. The problem: it knows nothing about signals, recency, relationships, or diversity. It returns semantically similar content. Whether that content is trending, stale, hidden by the user, or from a blocked creator -- not its concern.
 **System 6: The ranking service.** The application you wrote. A microservice that calls systems 1 through 5 in sequence, merges their outputs, applies scoring logic, enforces diversity rules, handles edge cases, and returns a sorted list. This is the system that has the most bugs, the most latency, and the most institutional knowledge locked in the heads of two engineers who are not allowed to go on vacation at the same time.
 Six systems. None of them were built for the ranking problem. All of them are pressed into service because there is no single system that was.
 ## Where correctness dies
 The failure modes are not in the systems themselves. Redis is fast. Kafka is durable. Elasticsearch is a competent search engine. The failure modes live in the seams between them.
 **Stale signals.** A user likes an item. The event enters Kafka. A consumer processes it and updates Redis. Another consumer updates the feature store. A third updates Elasticsearch's score field. Each update happens at a different time. Between the first update and the last, the ranking service is reading a mix of old and new state. The feed the user sees is computed from data that contradicts itself.
 This is not a theoretical concern. It is Tuesday.
 **Cache invalidation.** The trending score in Redis says an item is hot. The engagement data in the feature store says it is not -- the initial burst of views came from a bot network and the quality signals collapsed an hour ago. The cache TTL has not expired. The item remains in the trending feed for another 14 minutes. Fourteen minutes is an eternity in a content platform. Thousands of users see a recommendation the system already knows is wrong.
 **ETL lag.** The feature store runs a batch pipeline every 15 minutes to recompute user preference vectors. A user blocks a creator at minute 1. For the next 14 minutes, the blocked creator's content still appears in the user's feed. Not because the system is broken. Because the architecture is designed around batched state synchronization, and batched state synchronization is, by definition, eventually wrong.
 **The feedback gap.** A user skips three items in a row from the same creator. The skip events enter Kafka. They will eventually update the user's preference vector in the feature store and the creator's penalty score in Redis. Eventually. In the meantime, the ranking service is still using the stale preference vector and the stale creator score. It recommends a fourth item from the same creator. The user taps "Not interested." A fifth item appears. The user closes the app.
 This is not a bug in any one system. It is the architecture working exactly as designed. The architecture is the bug.
 **Agents make the seams worse.** When you add an LLM-mediated agent to the loop, the agent needs to ground its answer in fresh memory and emit feedback (preference hints, critiques, reward). In the 6-system stack those feedback signals live in a scratchpad or a sidecar vector store. None of the six systems know about them, which means the agent is reasoning over a different world than the ranking service. Latency compounds; correctness dies even faster.
 ## How we got here
 The 6-system stack is not the product of deliberate design. It is an accretion. Understanding how it forms explains why it persists.
 **Phase 1: Search.** The platform launches with a content catalog and a search bar. The team picks Elasticsearch because it handles full-text search. This is a reasonable decision. Elasticsearch is good at search.
 **Phase 2: Ranking.** Users want more than search. They want a feed -- personalized, sorted by relevance, refreshed on every visit. Elasticsearch can sort by a score field, so the team adds a `ranking_score` field and updates it with a cron job. The cron job reads engagement data from the application database, computes a formula, and writes the result to Elasticsearch. This works for six months.
 **Phase 3: Speed.** The ranking formula needs real-time signal data -- view counts, like counts, trending velocity. The application database cannot serve these at the read frequency the ranking service demands. The team adds Redis as a hot cache. Now the ranking formula reads from Redis instead of the application database. Engagement data flows into Redis via application writes. This works, but cache invalidation becomes a recurring source of bugs.
 **Phase 4: Scale.** The platform grows. Engagement events arrive at thousands per second. Writing directly to Redis and Elasticsearch from the application path introduces latency on every user action. The team adds Kafka as a buffer. Events flow into Kafka, and consumers asynchronously update Redis and Elasticsearch. The system is now eventually consistent. "Eventually" is doing a lot of work in that sentence.
 **Phase 5: Personalization.** Users want personalized results, not just globally popular content. Personalization requires per-user feature vectors -- engagement history, topic affinity, creator preferences. These features are too expensive to compute at query time. The team adds a feature store that batch-computes user vectors and serves them as key-value lookups. The feature store is always stale by the duration of its batch interval.
 **Phase 6: Semantic search.** Users expect "find me something like this" to work. Keyword matching cannot do this. The team adds a vector database for embedding-based similarity search. The vector database knows nothing about engagement, recency, or user context. The ranking service must call it separately and merge its results with the keyword results, the cached signals from Redis, and the user features from the feature store.
 Each step is individually rational. The result is collectively irrational. A distributed system with six sources of truth, six consistency models, and one ranking service trying to produce a coherent answer from all of them.
 ## The root cause
 The stack exists because existing databases were not built with ranking in mind. This is not a criticism -- PostgreSQL, Elasticsearch, and Redis were built to solve different problems, and they solve them well. But when you ask a search engine to be a ranking engine, you inherit the wrong abstraction.
 A search engine models data as documents with fields. You search for documents matching a query. You sort by a field. The field is a static value that you update from outside.
 But ranking is not a static value. A "trending score" is a velocity -- the rate of change of engagement signals over a time window. It changes every second. An "engagement decay score" is a function of time since the last signal event. It changes continuously, without any new data arriving. A "personalized relevance score" is a function of the user's preference vector, the item's embedding, the user's relationship to the creator, the item's signal history, and the diversity of the current result set. It is different for every user, every query, every moment.
 None of these are fields. They are computations that depend on temporal state, user context, and signal dynamics. Forcing them into a field-update model is what creates the 6-system stack. You need Redis because the search engine cannot compute these values fast enough. You need Kafka because updating them synchronously is too slow. You need a feature store because user context is too expensive to derive at query time. You need a vector database because semantic similarity is a different index structure entirely.
 The seams are not incidental. They are structural. They exist because the foundational abstraction -- data as documents with static fields -- does not fit the problem.
 ## What should be true
 A database that understands ranking as a primitive would not need the stack. Here is what it would look like.
 **Signals are a schema-level type.** A "view" signal is not a counter you increment in Redis and hope stays consistent. It is a typed, timestamped event stream declared in the database schema, with a decay rate, a set of time windows, and velocity computation -- all maintained by the database. You write the event. The database handles aggregation, windowing, and decay. When you query for "trending," the database reads signal velocity directly. No external cache. No stale scores.
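 To make "the database handles decay" concrete, here is a minimal sketch of an exponentially decayed counter with a declared half-life. It is an illustration under stated assumptions, not tidalDB's implementation: `DecayedScore` and its methods are hypothetical names, timestamps are plain seconds, and velocity and windowing are left out.
 ```rust
 /// Hypothetical decayed counter: one stored value plus the time it refers to.
 #[derive(Debug, Clone, Copy)]
 pub struct DecayedScore {
     value: f64,
     last_update_secs: f64,
     half_life_secs: f64,
 }

 impl DecayedScore {
     pub fn new(half_life_secs: f64) -> Self {
         Self { value: 0.0, last_update_secs: 0.0, half_life_secs }
     }

     /// Decay factor for an elapsed interval: 0.5^(elapsed / half_life).
     fn factor(&self, elapsed_secs: f64) -> f64 {
         0.5_f64.powf(elapsed_secs / self.half_life_secs)
     }

     /// Record an event of `weight` at `now_secs` in O(1).
     pub fn record(&mut self, now_secs: f64, weight: f64) {
         if now_secs >= self.last_update_secs {
             // In-order event: decay the stored value forward, then add the weight.
             let elapsed = now_secs - self.last_update_secs;
             self.value = self.value * self.factor(elapsed) + weight;
             self.last_update_secs = now_secs;
         } else {
             // Out-of-order event: decay its weight up to the stored timestamp.
             let age = self.last_update_secs - now_secs;
             self.value += weight * self.factor(age);
         }
     }

     /// Read the score at `now_secs` without writing anything.
     pub fn value_at(&self, now_secs: f64) -> f64 {
         let elapsed = (now_secs - self.last_update_secs).max(0.0);
         self.value * self.factor(elapsed)
     }
 }
 ```
 With a 7-day half-life, one event recorded at time zero reads as 0.5 a week later even though nothing was written in between, which is the property a field-update model cannot express.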
 **User context is a database-managed state.** The user's preference vector is not a row in a feature store updated every 15 minutes. It is a living embedding that the database shifts every time the user engages with content. A like shifts it toward the item's embedding. A skip shifts it away. The next query reflects this. Not in 15 minutes. Now.
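 One way to picture the shifting preference vector is a small interpolation step followed by renormalization. The update rule below is an assumption chosen for illustration (the post does not specify one), and `shift_preference` and `cosine` are hypothetical helper names.
 ```rust
 /// Hypothetical update rule: move `pref` a fraction `rate` of the way toward `item`
 /// (positive rate, e.g. a like) or away from it (negative rate, e.g. a skip),
 /// then rescale to unit length so cosine similarity stays comparable.
 pub fn shift_preference(pref: &mut [f32], item: &[f32], rate: f32) {
     assert_eq!(pref.len(), item.len(), "embedding dimensions must match");
     for (p, x) in pref.iter_mut().zip(item) {
         *p += rate * (*x - *p);
     }
     let norm = pref.iter().map(|v| v * v).sum::<f32>().sqrt();
     if norm > 0.0 {
         for p in pref.iter_mut() {
             *p /= norm;
         }
     }
 }

 /// Cosine similarity, used here only to observe the effect of an update.
 pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
     let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
     let na = a.iter().map(|v| v * v).sum::<f32>().sqrt();
     let nb = b.iter().map(|v| v * v).sum::<f32>().sqrt();
     if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
 }
 ```
 Calling `shift_preference` with a positive rate raises the cosine similarity between the user and the liked item; a negative rate lowers it. Because the step is O(d) in the embedding dimension, it is cheap enough to run inside the write that records the signal.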
 **The write path and the read path are one system.** When a user likes an item, the database atomically updates the item's signal ledger, the user's preference vector, and the user-to-creator relationship weight. No event bus between the engagement and the ranking update. No consumer lag. No eventual consistency. The write *is* the ranking update.
 **Negative signals are equal citizens.** A skip is not the absence of a like. It is data. A hide is a permanent exclusion. A block removes all of a creator's content from all future queries. These are not afterthought filter operations applied in the ranking service. They are first-class signal types with their own decay rates, their own velocity, and their own weight in the scoring function.
 **Diversity is a query constraint.** "No more than 2 items per creator" is not a post-processing step in your API layer. It is a parameter the database enforces after scoring, as part of the query execution pipeline. The application specifies the constraint. The database enforces it. The result set is reordered, not reduced.
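 A per-creator cap applied after scoring can be sketched as a single greedy pass. This is one possible reading of "reordered, not reduced": items over the cap are deferred to the tail instead of being dropped. The `Scored` struct and `enforce_max_per_creator` function are hypothetical names, not tidalDB's API.
 ```rust
 use std::collections::HashMap;

 /// Hypothetical scored candidate as it leaves the scoring stage.
 #[derive(Debug, Clone, PartialEq)]
 pub struct Scored {
     pub item_id: u64,
     pub creator_id: u64,
     pub score: f64,
 }

 /// Sort by score descending, then keep at most `max_per_creator` items per creator
 /// in the head of the list. Items over the cap move to the tail in score order,
 /// so the output has the same length as the input.
 pub fn enforce_max_per_creator(mut ranked: Vec<Scored>, max_per_creator: usize) -> Vec<Scored> {
     ranked.sort_by(|a, b| b.score.total_cmp(&a.score));
     let mut seen: HashMap<u64, usize> = HashMap::new();
     let mut kept = Vec::with_capacity(ranked.len());
     let mut deferred = Vec::new();
     for candidate in ranked {
         let count = seen.entry(candidate.creator_id).or_insert(0);
         if *count < max_per_creator {
             *count += 1;
             kept.push(candidate);
         } else {
             deferred.push(candidate);
         }
     }
     kept.extend(deferred);
     kept
 }
 ```
 With three items from one creator scored 0.9, 0.8, and 0.7 and a fourth item from another creator scored 0.6, a cap of 2 yields the order 0.9, 0.8, 0.6, 0.7.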
 **All sort modes are native.** Trending, hot, rising, controversial, hidden gems, top-this-week, shuffle -- these are not formulas your application computes and passes to the database as a sort key. They are built-in sort modes the database executes natively, using signal velocity, windowed aggregation, and decay functions it already maintains.
 This is not a fantasy. Every one of these properties follows from a single architectural decision: model signals, decay, velocity, and user context as database primitives, not as application logic distributed across six systems.
 ## One question, one query
 The 6-system stack exists to answer one question: given this user, right now, what should they see?
 That question should be one query.
 Not six network calls. Not a ranking service that merges five data sources and hopes they agree. Not a system where "consistency" means "consistent within each subsystem, inconsistent across all of them."
 One query that retrieves candidates, applies filters, scores using live signals and user context, enforces diversity, and returns a ranked list. One query where the data is never stale because the write path and the read path share a storage model. One query where a signal written 100 milliseconds ago is reflected in the result.
 ```
 RETRIEVE items
 FOR USER @user_id
 CONTEXT feed
 USING PROFILE for_you
 FILTER unseen, unblocked, format:video, duration:short
 DIVERSITY max_per_creator:2, format_mix:true
 LIMIT 50
 ```
 That is what six systems currently produce. It should be one query that an agent can issue, write feedback against, and trust to be correct on the next round.
 The database that treats ranking as a primitive -- not as an afterthought bolted on top of a search engine, not as a formula computed in a microservice, not as a cache warmed from a batch pipeline -- does not need the stack. It replaces it.
 ## A fair read of the existing systems
 To be clear: these systems are good at what they were designed to do.
 - **Search indexes (Elasticsearch, Solr, Typesense):** excellent full-text retrieval, BM25 relevance, and query/filter infrastructure.
 - **Caches (Redis, Memcached):** excellent low-latency read/write paths for hot counters and precomputed features.
 - **Event buses (Kafka, Pulsar, Kinesis):** excellent durable, high-throughput event transport and decoupled consumer architectures.
 - **Feature stores (Feast, Tecton, homegrown):** excellent offline/online feature serving patterns for ML pipelines.
 - **Vector databases (Pinecone, Weaviate, Qdrant, Milvus, pgvector):** excellent nearest-neighbor retrieval over embeddings with metadata filtering.
 - **Ranking services (custom microservices):** excellent place to encode product-specific ranking logic when no single system owns the full problem.
 - **Integrated retrieval/ranking platforms (for example, Vespa):** excellent end-to-end search and ranking infrastructure when teams can operate larger specialized serving systems.
 **What makes tidalDB different (one line):** it treats signals, user context, ranking, diversity, and feedback writes as one atomic database system instead of six synchronized subsystems.
 **Where we are intentionally focused:** personalized content loops where feedback intent is explicit -- `skip_for_now` (soft), `not_for_me` (preference), `low_quality` (quality), `hide/mute/block` (hard excludes) -- and the next ranked result updates immediately. Generic search infrastructure breadth is not the goal.
 Every content platform builds the same 6 systems because no database was built for this problem. The stack is not an architecture. It is scar tissue from the absence of one.
 ---
 *tidalDB is an open-source, embeddable Rust database for personalized content ranking. Follow the build on [GitHub](https://github.com/orchard9/tidalDB).*
--- a/site/content/blog/why-tidaldb.mdx
+++ b/site/content/blog/why-tidaldb.mdx
@ -2,46 +2,53 @@
 title: "Why we're building tidalDB"
 date: "2026-02-20"
 author: "Jordan Washburn"
-description: "Every content platform builds the same 6-system stack from scratch. We're replacing it with one database."
+description: "tidalDB is a single-process Rust database for personalized content ranking. Here is what it does and how it works."
 tags: ["vision", "architecture"]
 ---
-Every platform that serves personalized content — a media library, a social feed, a marketplace, a content discovery surface — eventually builds the same distributed system from scratch.
+tidalDB is a database that answers one question: given this user, right now, what should they see?
-Elasticsearch for retrieval. Redis for hot signals. Kafka for event ingestion. A feature store for user profiles. A vector database for semantic search. A ranking service that tries to stitch all of the above together into a single ordered list.
+Agents now sit between users and many content surfaces, so session memory matters. The core focus, though, is personalized content ranking. tidalDB is not trying to out-feature every search platform. It is a database where signals, decay, negative feedback, and diversity are schema-level primitives, and where ranking updates immediately after user actions.
-We've built this stack. We've operated it. We've watched the seams between systems become the place where correctness dies — stale signals in Redis that don't match Elasticsearch, Kafka consumers that lag by seconds when they should lag by zero, cache invalidation bugs that surface as "why did the user see that item again?"
+I wrote separately about [why every content platform ends up operating six systems](/blog/every-platform-builds-the-same-6-systems) to answer that question. This post is about what we are building instead.
-The root cause is clear: none of these systems were built for the ranking problem. They treat it as an afterthought. A sort clause. A float field. A bolt-on scoring function.
+## The primitives
-## The observation
+tidalDB has five core concepts. Everything else follows from them.
-Ranking is not a feature. It is a primitive.
+**Entities** are Items, Users, and Creators. Each carries metadata, an embedding slot, and a signal ledger. You define them in schema with typed fields — text fields are full-text indexed, keyword fields are filterable, embeddings are ANN-indexed. The database owns the indexes.
-A signal that decays over time is not a field you update with a cron job. It is a type the database understands — with a half-life declared in schema and a decayed value computed at query time.
+**Signals** are typed, timestamped event streams with decay and velocity built in. You declare a signal type once:
-A "trending" sort is not a formula your application computes and stores in a column. It is a built-in sort mode that reads signal velocity natively.
+```rust
 db.define_signal(SignalDef {
    name: "view",
    target: EntityKind::Item,
    decay: Decay::Exponential { half_life: Duration::days(7) },
    windows: vec![
        Window::hours(1),
        Window::hours(24),
        Window::days(7),
        Window::all_time(),
    ],
    velocity: true,
 })?;
 ```
-A diversity constraint — "no more than 2 items from the same creator" — is not post-processing logic in your API layer. It is a query parameter the database enforces after scoring.
+That declaration tells the database everything it needs. When a view event arrives, the database maintains windowed counts, computes velocity, and applies exponential decay — all at write time, all O(1). You never compute `trending_score = views / (age_hours + 2)^1.8` in application code. You never update a stale float field on a cron schedule. The database does this natively, and it does it correctly.
-Once you see it this way, the 6-system stack looks like what it is: scar tissue from forcing the wrong abstraction.
+Negative signals — skips, hides, blocks — are the same type. A skip is not the absence of a like. It is data with its own decay rate and its own weight in the scoring function.
-## What tidalDB is
+**Ranking Profiles** are named, versioned scoring functions declared in schema. They reference signals, relationship weights, recency curves, and diversity rules. You swap profiles at query time by name — no redeploy, no recompile. This is how you A/B test ranking: two profiles, one query parameter.
-A single-node-first, embeddable Rust database designed specifically for personalized content ranking. One process. One query interface. One operational model.
+**Sessions** capture agent context. A session binds a user, an agent identity, and a short-lived memory lane. Agents append structured signals (preference hints, reward scores, tool metadata) that decay aggressively. Policies live in schema: what an agent can read, how often it may write, and how long data persists.
-The core primitives:
+**The query** brings it together. Candidate retrieval, filtering, personalized ranking, and diversity enforcement in a single operation:
 - **Entities** — Items, Users, Creators. Each with metadata, an embedding slot, and an attached signal ledger.
 - **Signals** — Typed, timestamped event streams with native decay, velocity, and windowed aggregation. You declare a `view` signal with a 7-day half-life. The database does the rest.
 - **Ranking Profiles** — Named, versioned scoring functions that live in the database. Reference signals, relationships, recency curves, and diversity rules. Swap at query time by name.
 - **One query** — Candidate retrieval, filtering, personalized ranking, and diversity enforcement in a single operation.
 The query that currently takes 6 systems to produce:
 ```
 RETRIEVE items
 FOR USER @user_id
 FOR SESSION @session_id
 CONTEXT feed
 USING PROFILE for_you
 FILTER unseen, unblocked, format:video, duration:short
@ -49,33 +56,58 @@ DIVERSITY max_per_creator:2, format_mix:true
 LIMIT 50
 ```
 One call. No network hops between subsystems. No merging results from five data sources. The database handles retrieval strategy (ANN, BM25, graph walk, full scan), applies hard filters, scores candidates against live signal state, enforces diversity constraints, and returns a ranked list. The agent gets the list along with a session snapshot (top signals, reward velocity, last tool it used) so it can explain its answer.
 ## The feedback loop
-When a user views, likes, skips, or hides content, the signal is written directly to the database. The item's signal ledger updates. The user's preference vector shifts. The relationship weight between user and creator adjusts. All atomically, all in the same write transaction.
+This is the part that makes the architecture honest.
-The next ranking query — even 100ms later — reflects the updated state.
+When a user likes an item, the database atomically updates the item's signal ledger, the user's preference vector, and the user-to-creator relationship weight. All in the same write transaction. The next ranking query — even 100ms later — reflects the updated state.
-No Kafka consumer to lag. No feature store sync to schedule. No cache to invalidate. The write path and the read path are one system.
+```rust
 db.signal(Signal {
    kind: "like",
    item: "item_abc",
    user: "user_123",
    session: Some("session_xyz"),
    timestamp: Utc::now(),
    weight: 1.0,
    metadata: Some(json!({ "agent": "assistant", "tool": "planner" })),
 })?;
 ```
-## What we're building first
+There is no event bus between the engagement and the ranking update. No consumer lag. No cache to invalidate. The write path and the read path are one system. A user who skips three items in a row sees the next query adjust: not after a batch pipeline runs, not after a feature store syncs. Now.
-tidalDB is in active development. We're building in Rust, starting single-node, and working toward the first public release. The roadmap:
+## Where we are deliberately narrow
-1. **Storage foundation** — WAL, entity store, signal ledger with forward-decay scoring
+If your primary problem is operating a large, general search serving platform, systems like Vespa are excellent and mature.
 2. **Query engine** — The RETRIEVE/SEARCH/SUGGEST operations with filtering and ranking
 3. **Vector and text search** — HNSW via USearch, BM25 via Tantivy, hybrid fusion with RRF
 4. **The full query surface** — All sort modes, all filters, diversity enforcement, pagination
-We're building in public. Every architectural decision, every benchmark result, every trade-off gets documented here.
+Our wedge is narrower and opinionated:
 - Optimize for the personalization loop, not broad search platform parity.
 - Make negative feedback intent explicit and immediate:
  `skip_for_now` (soft), `not_for_me` (preference), `low_quality` (quality), `hide/mute/block` (hard excludes).
 - Treat "next refresh reflects feedback" as a hard product promise, not a best effort.
 - Keep the first deployment embeddable and in-process for low-latency iteration.
 ## Where the build stands
 tidalDB is early. I want to be direct about what exists today and what does not.
 **Built:** Schema system with entity, signal, and profile definitions. Write-ahead log with segment rotation, checksummed records, BLAKE3 deduplication, and crash recovery. Storage engine backed by fjall with trait abstraction, key encoding, and batch writes. Signal ledger with forward-decay scoring, hot-path state, and warm-path persistence.
 **Next:** Query engine — the RETRIEVE/SEARCH/SUGGEST operations with the execution pipeline described above. Then session-aware APIs, agent policies, vector search (USearch), text search (Tantivy), and hybrid fusion. Then the full query surface with all sort modes and diversity enforcement.
 The foundation is Rust, single-node, embeddable. The storage layer is designed for horizontal scaling later — key encoding and storage isolation are partition-ready — but single-node correctness comes first. This is how we differentiate from Vespa, Milvus, or any search-first system: tidalDB embeds inside your agent runtime, exposes a declarative query+session API, and guarantees every signal the agent writes is visible on the next read without a distributed hop.
 We are building in public. The code is on [GitHub](https://github.com/orchard9/tidalDB). Every architectural decision gets documented.
 ## Why open source
-The personalized content ranking problem is universal. Every content platform needs it. Making the solution proprietary would limit adoption to teams willing to vendor-lock on a database. That's not the goal.
+The personalized content ranking problem is universal. Every content platform needs it. The solution should be a tool you embed in your process and point at your data — not a vendor you depend on for a query you could run locally.
-The goal is a tool that an engineering team can embed in their process, point at their data, and get correct ranking in one query. Open source, MIT licensed, embeddable.
+MIT licensed. No asterisks.
 If you're operating a 6-system stack for content ranking and wondering why it has to be this hard — it doesn't. That's why we're building tidalDB.
 ---
-Follow the build on [GitHub](https://github.com/orchard9/tidalDB) or read the next post when it drops.
+*If you want the full diagnosis of why the 6-system stack exists and where correctness fails between the seams, read [Every content platform builds the same 6 systems from scratch](/blog/every-platform-builds-the-same-6-systems).*
--- a/site/next.config.ts
+++ b/site/next.config.ts
@ -3,6 +3,9 @@ import type { NextConfig } from "next";
 const nextConfig: NextConfig = {
  output: "export",
  images: { unoptimized: true },
  turbopack: {
    root: __dirname,
  },
 };
 export default nextConfig;
--- a/site/src/app/blog/page.tsx
+++ b/site/src/app/blog/page.tsx
@ -10,11 +10,12 @@ export default function BlogIndex() {
          Blog
        </p>
        <h1 className="font-serif text-4xl font-bold leading-tight md:text-5xl">
-          Building in public.
+          Building the agent memory substrate.
        </h1>
        <p className="mt-4 text-lg text-muted">
          Architecture decisions, engineering insights, and progress updates as
-          we build tidalDB.
+          we turn tidalDB into the personalization layer agents can read and
          write in real time.
        </p>
        <div className="mt-16 space-y-12">
--- a/site/src/app/layout.tsx
+++ b/site/src/app/layout.tsx
@ -33,6 +33,7 @@ export default function RootLayout({
    <html lang="en" className="dark">
      <body
        className={`${inter.variable} ${lora.variable} ${jetbrainsMono.variable} antialiased bg-background text-foreground`}
        suppressHydrationWarning
      >
        <Nav />
        <main>{children}</main>
--- a/tidal/Cargo.lock
+++ b/tidal/Cargo.lock
@ -1015,6 +1015,7 @@ dependencies = [
 "blake3",
 "criterion",
 "crossbeam",
 "dashmap",
 "fjall",
 "proptest",
 "tempfile",
--- a/tidal/Cargo.toml
+++ b/tidal/Cargo.toml
@ -6,10 +6,16 @@ rust-version = "1.91"
 description = "Embeddable database for personalized content ranking"
 license = "MIT"
 [features]
 test-utils = ["dep:tempfile"]
 metrics = []   # hand-rolled HTTP, no new crate deps
 [dependencies]
 blake3 = "1"
 crossbeam = "0.8"
 dashmap = "6"
 fjall = "3"
 tempfile = { version = "3", optional = true }
 tracing = "0.1"
 [dev-dependencies]
@@ -28,6 +34,14 @@ cast_possible_truncation = "allow"
 module_name_repetitions = "allow"
 unwrap_used = "deny"
 [[test]]
 name = "sandboxed_storage"
 required-features = ["test-utils"]
 [[test]]
 name = "metrics_integration"
 required-features = ["metrics"]
 [[bench]]
 name = "signals"
 harness = false
--- a/tidal/benches/signals.rs
+++ b/tidal/benches/signals.rs
@@ -1,5 +1,6 @@
 use criterion::{Criterion, criterion_group, criterion_main};
 #[allow(clippy::missing_const_for_fn)]
 fn signal_benchmarks(_c: &mut Criterion) {
    // Placeholder — benchmarks added as signal system is implemented.
 }
--- a/tidal/benches/storage.rs
+++ b/tidal/benches/storage.rs
@@ -1,3 +1,5 @@
 #![allow(clippy::unwrap_used)]
 use criterion::{BatchSize, Criterion, criterion_group, criterion_main};
 use tidaldb::schema::{EntityId, EntityKind};
 use tidaldb::storage::{
--- a/tidal/build.rs
+++ b/tidal/build.rs
@@ -0,0 +1,7 @@
 fn main() {
    // Expose GIT_HASH env var from CI, or "dev" for local builds.
    let hash = std::env::var("GIT_HASH").unwrap_or_else(|_| "dev".to_string());
    println!("cargo:rustc-env=TIDALDB_BUILD_HASH={hash}");
    // Rerun only if the env var changes.
    println!("cargo:rerun-if-env-changed=GIT_HASH");
 }
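The build script above only exports `TIDALDB_BUILD_HASH`; the definition of `crate::BUILD_HASH` that consumes it is not part of this section. A minimal sketch of one way the crate could read it, assuming an `option_env!` lookup with a `"dev"` fallback as the commit message describes (`version_string` is a hypothetical helper added only for illustration):

```rust
/// Build identifier resolved at compile time.
///
/// `option_env!` is expanded by the compiler, so the value comes from the
/// `cargo:rustc-env=TIDALDB_BUILD_HASH=...` line printed by the build script.
/// The `None` arm covers compilation without that script (an assumption of
/// this sketch, not something the diff shows).
pub const BUILD_HASH: &str = match option_env!("TIDALDB_BUILD_HASH") {
    Some(hash) => hash,
    None => "dev",
};

/// Hypothetical helper: joins a package version and the build hash,
/// e.g. "0.1.0+dev".
pub fn version_string(pkg_version: &str) -> String {
    format!("{pkg_version}+{BUILD_HASH}")
}
```

Because the lookup happens at compile time, changing `GIT_HASH` only takes effect after the build script reruns, which is what the `cargo:rerun-if-env-changed=GIT_HASH` line arranges.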
--- a/tidal/proptest-regressions/signals/warm.txt
+++ b/tidal/proptest-regressions/signals/warm.txt
@@ -0,0 +1,7 @@
 # Seeds for failure cases proptest has generated in the past. It is
 # automatically read and these particular cases re-run before any
 # novel cases are generated.
 #
 # It is recommended to check this file in to source control so that
 # everyone who runs the test benefits from these saved cases.
 cc b64b3481845f116d827e7d5b43295853693428b0e3346481c1b53c39e712553e # shrinks to event_times_secs = [3485, 7080, 3481, 3481], query_time_secs = 3600
--- a/tidal/src/db/builder.rs
+++ b/tidal/src/db/builder.rs
@@ -0,0 +1,368 @@
 use std::path::{Path, PathBuf};
 use std::sync::Arc;
 use super::TidalDb;
 use super::config::{Config, ConfigError, StorageMode};
 use super::metrics::MetricsState;
 use super::paths::Paths;
 /// Fluent builder for constructing a [`TidalDb`] instance.
 ///
 /// # Examples
 ///
 /// ```rust,no_run
 /// use tidaldb::TidalDb;
 ///
 /// // Ephemeral (in-memory) database for tests:
 /// let db = TidalDb::builder().ephemeral().open().unwrap();
 ///
 /// // Persistent database with explicit data directory:
 /// let db = TidalDb::builder()
 ///     .with_data_dir("/var/lib/tidaldb")
 ///     .open()
 ///     .unwrap();
 /// ```
 #[derive(Debug)]
 pub struct TidalDbBuilder {
    config: Config,
    /// Address for the optional metrics HTTP server (e.g. "127.0.0.1:9090").
    /// Only used when the `metrics` feature is enabled.
    #[allow(dead_code)]
    metrics_addr: Option<String>,
 }
 impl TidalDbBuilder {
    /// Create a new builder with default (ephemeral) configuration.
    #[must_use]
    pub fn new() -> Self {
        Self {
            config: Config::default(),
            metrics_addr: None,
        }
    }
    /// Switch to ephemeral (in-memory) mode, clearing any directory paths.
    ///
    /// This is the default mode. Calling this is only necessary to reset
    /// a builder that was previously configured for persistent mode.
    #[must_use]
    pub fn ephemeral(mut self) -> Self {
        self.config.mode = StorageMode::Ephemeral;
        self.config.data_dir = None;
        self.config.wal_dir = None;
        self.config.cache_dir = None;
        self
    }
    /// Switch to persistent mode with the given data directory.
    ///
    /// The directory must exist and be writable at the time [`open`](Self::open)
    /// is called. WAL and cache directories default to subdirectories of
    /// `data_dir` unless explicitly overridden.
    #[must_use]
    pub fn with_data_dir(mut self, dir: impl AsRef<Path>) -> Self {
        self.config.mode = StorageMode::Persistent;
        self.config.data_dir = Some(dir.as_ref().to_path_buf());
        self
    }
    /// Override the WAL directory (defaults to `{data_dir}/wal`).
    #[must_use]
    pub fn wal_dir(mut self, dir: impl AsRef<Path>) -> Self {
        self.config.wal_dir = Some(dir.as_ref().to_path_buf());
        self
    }
    /// Override the cache directory (defaults to `{data_dir}/cache`).
    #[must_use]
    pub fn cache_dir(mut self, dir: impl AsRef<Path>) -> Self {
        self.config.cache_dir = Some(dir.as_ref().to_path_buf());
        self
    }
    /// Eagerly validate the configuration.
    ///
    /// Checks:
    /// - Persistent mode requires `data_dir`
    /// - All specified directories must exist and be writable
    ///
    /// # Errors
    ///
    /// Returns [`ConfigError`] describing the first validation failure.
    pub fn validate(&self) -> Result<(), ConfigError> {
        if self.config.mode == StorageMode::Persistent && self.config.data_dir.is_none() {
            return Err(ConfigError::MissingDataDir);
        }
        // Validate each specified directory exists and is writable.
        let dirs_to_check: Vec<&PathBuf> = [
            self.config.data_dir.as_ref(),
            self.config.wal_dir.as_ref(),
            self.config.cache_dir.as_ref(),
        ]
        .into_iter()
        .flatten()
        .collect();
        for dir in dirs_to_check {
            if !dir.exists() {
                return Err(ConfigError::DirectoryNotFound { path: dir.clone() });
            }
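            // Check writability by querying metadata permissions.
            // This inspects only the `readonly` permission flag; it does
            // not consult ownership or ACLs, so it cannot prove that the
            // current process can create files here. Attempting to create
            // a temp file would be more robust, but the flag check is
            // treated as sufficient for configuration validation.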
            if dir
                .metadata()
                .map(|m| m.permissions().readonly())
                .unwrap_or(true)
            {
                return Err(ConfigError::NotWritable { path: dir.clone() });
            }
        }
        Ok(())
    }
    /// Enable the metrics HTTP server on the given address.
    ///
    /// Only available when the `metrics` feature is enabled. When the
    /// feature is disabled, this method does not exist and the builder
    /// compiles without it.
    ///
    /// # Examples
    ///
    /// ```rust,no_run
    /// # #[cfg(feature = "metrics")]
    /// let db = tidaldb::TidalDb::builder()
    ///     .ephemeral()
    ///     .enable_metrics("127.0.0.1:9090")
    ///     .open()
    ///     .unwrap();
    /// ```
    #[cfg(feature = "metrics")]
    #[must_use]
    pub fn enable_metrics(mut self, addr: impl Into<String>) -> Self {
        self.metrics_addr = Some(addr.into());
        self
    }
    /// Resolve default directory paths using [`Paths`] for persistent mode.
    ///
    /// When a `data_dir` is set and `wal_dir` or `cache_dir` are not
    /// explicitly overridden, this fills them in from [`Paths`]. This
    /// makes `Paths` the single source of truth for directory layout --
    /// the builder and the CLI both derive defaults the same way.
    fn resolve_defaults(&mut self) {
        if let Some(ref data_dir) = self.config.data_dir {
            let paths = Paths::new(data_dir);
            if self.config.wal_dir.is_none() {
                self.config.wal_dir = Some(paths.wal_dir());
            }
            if self.config.cache_dir.is_none() {
                self.config.cache_dir = Some(paths.cache_dir());
            }
        }
    }
    /// Validate and open a [`TidalDb`] instance.
    ///
    /// Calls [`validate`](Self::validate), then resolves default directory
    /// paths via [`Paths`], and constructs the database handle.
    ///
    /// Validation checks only user-specified directories. Resolved defaults
    /// (e.g., `{data_dir}/wal`) are populated after validation -- the storage
    /// engine will create them via [`Paths::ensure_all`] during initialization.
    ///
    /// # Errors
    ///
    /// Returns [`crate::LumenError`] if validation fails or initialization
    /// encounters an error.
    #[tracing::instrument(skip(self), fields(mode = ?self.config.mode))]
    pub fn open(mut self) -> crate::Result<TidalDb> {
        self.validate()?;
        self.resolve_defaults();
        let metrics = Arc::new(MetricsState::new());
        #[cfg(feature = "metrics")]
        let metrics_handle = if let Some(ref addr) = self.metrics_addr {
            let handle =
                super::http::MetricsHandle::start(addr, Arc::clone(&metrics)).map_err(|e| {
                    crate::LumenError::Internal(format!("metrics server failed to start: {e}"))
                })?;
            Some(handle)
        } else {
            None
        };
        Ok(TidalDb::from_config(
            self.config,
            metrics,
            #[cfg(feature = "metrics")]
            metrics_handle,
        ))
    }
 }
 impl Default for TidalDbBuilder {
    fn default() -> Self {
        Self::new()
    }
 }
 #[cfg(test)]
 mod tests {
    use super::*;
    #[test]
    fn builder_ephemeral_succeeds() {
        let db = TidalDb::builder().ephemeral().open();
        assert!(db.is_ok());
    }
    #[test]
    fn builder_default_is_ephemeral() {
        let db = TidalDb::builder().open();
        assert!(db.is_ok());
    }
    #[test]
    fn builder_persistent_requires_data_dir() {
        // Construct a persistent-mode builder without calling with_data_dir
        // by manually setting mode.
        let builder = TidalDbBuilder {
            config: Config {
                mode: StorageMode::Persistent,
                data_dir: None,
                wal_dir: None,
                cache_dir: None,
            },
            metrics_addr: None,
        };
        let result = builder.validate();
        assert!(result.is_err());
        let err = result.expect_err("should fail");
        assert!(
            matches!(err, ConfigError::MissingDataDir),
            "expected MissingDataDir, got: {err}"
        );
    }
    #[test]
    fn builder_persistent_missing_dir() {
        let result = TidalDb::builder()
            .with_data_dir("/nonexistent/path/that/does/not/exist")
            .open();
        assert!(result.is_err());
        let err_msg = result.expect_err("should fail").to_string();
        assert!(
            err_msg.contains("does not exist"),
            "expected DirectoryNotFound, got: {err_msg}"
        );
    }
    #[test]
    fn builder_persistent_existing_dir() {
        let tmp = tempfile::tempdir().expect("failed to create tempdir");
        let result = TidalDb::builder().with_data_dir(tmp.path()).open();
        assert!(result.is_ok(), "open with valid tempdir should succeed");
    }
    #[test]
    fn health_check_ok() {
        let db = TidalDb::builder().ephemeral().open().expect("open failed");
        assert!(db.health_check().is_ok());
    }
    #[test]
    fn close_ok() {
        let db = TidalDb::builder().ephemeral().open().expect("open failed");
        assert!(db.close().is_ok());
    }
    #[test]
    fn builder_with_wal_and_cache_dir() {
        let tmp = tempfile::tempdir().expect("failed to create tempdir");
        let wal = tmp.path().join("wal");
        let cache = tmp.path().join("cache");
        std::fs::create_dir_all(&wal).expect("mkdir wal");
        std::fs::create_dir_all(&cache).expect("mkdir cache");
        let result = TidalDb::builder()
            .with_data_dir(tmp.path())
            .wal_dir(&wal)
            .cache_dir(&cache)
            .open();
        assert!(
            result.is_ok(),
            "open with explicit wal/cache dirs should succeed"
        );
    }
    #[test]
    fn builder_ephemeral_resets_dirs() {
        let builder = TidalDb::builder()
            .with_data_dir("/some/path")
            .wal_dir("/some/wal")
            .cache_dir("/some/cache")
            .ephemeral();
        assert_eq!(builder.config.mode, StorageMode::Ephemeral);
        assert!(builder.config.data_dir.is_none());
        assert!(builder.config.wal_dir.is_none());
        assert!(builder.config.cache_dir.is_none());
    }
    #[test]
    fn builder_wal_dir_nonexistent() {
        let tmp = tempfile::tempdir().expect("failed to create tempdir");
        let result = TidalDb::builder()
            .with_data_dir(tmp.path())
            .wal_dir("/nonexistent/wal")
            .open();
        assert!(result.is_err());
        let err_msg = result.expect_err("should fail").to_string();
        assert!(err_msg.contains("does not exist"));
    }
    #[test]
    fn resolve_defaults_sets_wal_and_cache() {
        let tmp = tempfile::tempdir().expect("failed to create tempdir");
        let mut builder = TidalDb::builder().with_data_dir(tmp.path());
        assert!(builder.config.wal_dir.is_none());
        assert!(builder.config.cache_dir.is_none());
        builder.resolve_defaults();
        let paths = super::Paths::new(tmp.path());
        assert_eq!(builder.config.wal_dir.as_ref(), Some(&paths.wal_dir()));
        assert_eq!(builder.config.cache_dir.as_ref(), Some(&paths.cache_dir()));
    }
    #[test]
    fn resolve_defaults_preserves_explicit_overrides() {
        let tmp = tempfile::tempdir().expect("failed to create tempdir");
        let custom_wal = tmp.path().join("custom_wal");
        let custom_cache = tmp.path().join("custom_cache");
        let mut builder = TidalDb::builder()
            .with_data_dir(tmp.path())
            .wal_dir(&custom_wal)
            .cache_dir(&custom_cache);
        builder.resolve_defaults();
        assert_eq!(builder.config.wal_dir.as_ref(), Some(&custom_wal));
        assert_eq!(builder.config.cache_dir.as_ref(), Some(&custom_cache));
    }
    #[test]
    fn resolve_defaults_noop_for_ephemeral() {
        let mut builder = TidalDb::builder().ephemeral();
        builder.resolve_defaults();
        assert!(builder.config.wal_dir.is_none());
        assert!(builder.config.cache_dir.is_none());
    }
 }
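The comment in `validate()` notes that a more robust writability check would attempt to create a temp file. A minimal sketch of such a probe, using only the standard library; `probe_writable` and the probe file name are hypothetical and not part of this diff:

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: checks writability by creating and then deleting
/// a uniquely named probe file inside `dir`.
pub fn probe_writable(dir: &Path) -> io::Result<()> {
    // Process id plus a nanosecond timestamp keeps the name unlikely to
    // collide with a concurrent probe.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    let probe = dir.join(format!(".tidaldb-write-probe-{}-{nanos}", process::id()));

    // `create_new` fails if the file already exists, so the probe never
    // truncates or overwrites an existing file. The handle is dropped at
    // the end of this statement.
    OpenOptions::new().write(true).create_new(true).open(&probe)?;

    // Remove the probe again; a failure here is reported to the caller.
    fs::remove_file(&probe)
}
```

A probe like this exercises the same permission checks the storage engine will hit later, at the cost of a small amount of filesystem I/O during validation. Mapping the `io::Error` onto `ConfigError::NotWritable` would be up to the caller.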
--- a/tidal/src/db/config.rs
+++ b/tidal/src/db/config.rs
@@ -0,0 +1,129 @@
 use std::fmt;
 use std::path::PathBuf;
 /// How tidalDB stores data.
 ///
 /// `Ephemeral` keeps everything in memory -- ideal for tests and short-lived
 /// processes. `Persistent` writes to an LSM-tree on disk (fjall).
 #[derive(Debug, Clone, PartialEq, Eq)]
 pub enum StorageMode {
    /// In-memory only. No filesystem access. Data is lost on drop.
    Ephemeral,
    /// Durable storage backed by fjall (LSM-tree). Requires `data_dir`.
    Persistent,
 }
 /// Configuration for a tidalDB instance.
 ///
 /// Constructed either directly or via [`super::TidalDbBuilder`].
 ///
 /// # Defaults
 ///
 /// The default configuration is ephemeral (in-memory) with no directory paths.
 /// Persistent mode requires at least `data_dir` to be set.
 #[derive(Debug, Clone)]
 pub struct Config {
    /// Storage backend selection.
    pub mode: StorageMode,
    /// Root directory for persistent data. Required when `mode` is `Persistent`.
    pub data_dir: Option<PathBuf>,
    /// Override for the WAL directory. Defaults to `{data_dir}/wal`.
    pub wal_dir: Option<PathBuf>,
    /// Override for the cache directory. Defaults to `{data_dir}/cache`.
    pub cache_dir: Option<PathBuf>,
 }
 impl Default for Config {
    fn default() -> Self {
        Self {
            mode: StorageMode::Ephemeral,
            data_dir: None,
            wal_dir: None,
            cache_dir: None,
        }
    }
 }
 /// Errors that arise during configuration validation.
 ///
 /// These are always caller errors -- the configuration is invalid and must
 /// be corrected before a tidalDB instance can be opened.
 #[derive(Debug)]
 pub enum ConfigError {
    /// Persistent mode was selected but no data directory was provided.
    MissingDataDir,
    /// A directory path was specified but does not exist on the filesystem.
    DirectoryNotFound { path: PathBuf },
    /// A directory exists but the process does not have write permission.
    NotWritable { path: PathBuf },
 }
 impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MissingDataDir => f.write_str("persistent mode requires a data directory"),
            Self::DirectoryNotFound { path } => {
                write!(f, "directory does not exist: {}", path.display())
            }
            Self::NotWritable { path } => {
                write!(f, "directory is not writable: {}", path.display())
            }
        }
    }
 }
 impl std::error::Error for ConfigError {}
 #[cfg(test)]
 mod tests {
    use super::*;
    #[test]
    fn default_config_is_ephemeral() {
        let cfg = Config::default();
        assert_eq!(cfg.mode, StorageMode::Ephemeral);
        assert!(cfg.data_dir.is_none());
        assert!(cfg.wal_dir.is_none());
        assert!(cfg.cache_dir.is_none());
    }
    #[test]
    fn config_error_display_missing_data_dir() {
        let e = ConfigError::MissingDataDir;
        assert_eq!(e.to_string(), "persistent mode requires a data directory");
    }
    #[test]
    fn config_error_display_directory_not_found() {
        let e = ConfigError::DirectoryNotFound {
            path: PathBuf::from("/nonexistent"),
        };
        assert!(e.to_string().contains("/nonexistent"));
        assert!(e.to_string().contains("does not exist"));
    }
    #[test]
    fn config_error_display_not_writable() {
        let e = ConfigError::NotWritable {
            path: PathBuf::from("/readonly"),
        };
        assert!(e.to_string().contains("/readonly"));
        assert!(e.to_string().contains("not writable"));
    }
    #[test]
    fn storage_mode_debug() {
        // Ensure Debug is derived and produces readable output.
        let s = format!("{:?}", StorageMode::Ephemeral);
        assert_eq!(s, "Ephemeral");
        let s = format!("{:?}", StorageMode::Persistent);
        assert_eq!(s, "Persistent");
    }
    #[test]
    fn config_debug() {
        let cfg = Config::default();
        let s = format!("{cfg:?}");
        assert!(s.contains("Ephemeral"));
    }
 }
--- a/tidal/src/db/http.rs
+++ b/tidal/src/db/http.rs
@@ -0,0 +1,152 @@
 //! Optional HTTP metrics server for tidalDB.
 //!
 //! Disabled by default. Enable with the `metrics` feature flag.
 //!
 //! Serves two endpoints on a background `std::thread`:
 //! - `GET /healthz` -- JSON health check
 //! - `GET /metrics` -- Prometheus text exposition format
 //!
 //! The server reads from [`MetricsState`] which is Arc-shared with [`TidalDb`].
 //! The server thread exits cleanly when [`MetricsHandle::stop`] is called.
 use std::io::{BufRead, BufReader, Write};
 use std::net::{TcpListener, TcpStream};
 use std::sync::Arc;
 use std::sync::atomic::{AtomicBool, Ordering};
 use std::thread;
 use super::metrics::MetricsState;
 /// Handle to the background metrics HTTP server.
 ///
 /// The server thread runs until [`stop`](Self::stop) is called or this handle is dropped.
 pub struct MetricsHandle {
    shutdown: Arc<AtomicBool>,
    thread: Option<thread::JoinHandle<()>>,
    /// The actual bound address (useful when port 0 was requested).
    pub addr: std::net::SocketAddr,
 }
 impl MetricsHandle {
    /// Bind to the given address and start the background server thread.
    ///
    /// # Errors
    ///
    /// Returns `std::io::Error` if the address cannot be bound.
    pub fn start(addr: &str, state: Arc<MetricsState>) -> std::io::Result<Self> {
        let listener = TcpListener::bind(addr)?;
        let actual_addr = listener.local_addr()?;
        let shutdown = Arc::new(AtomicBool::new(false));
        let shutdown_clone = Arc::clone(&shutdown);
        // listener, state, and shutdown_clone are moved into the thread --
        // they must be owned. serve_loop takes them by value intentionally.
        let handle = thread::Builder::new()
            .name("tidaldb-metrics-http".into())
            .spawn(move || serve_loop(listener, state, shutdown_clone))
            .map_err(std::io::Error::other)?;
        Ok(Self {
            shutdown,
            thread: Some(handle),
            addr: actual_addr,
        })
    }
    /// Stop the metrics server. Blocks until the server thread exits.
    pub fn stop(&mut self) {
        self.shutdown.store(true, Ordering::Release);
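        // Connect to ourselves to wake the accept loop. This matters when
        // `set_nonblocking` failed and accept() is blocking; otherwise the
        // loop sees the shutdown flag within one 50 ms poll interval.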
        if let Ok(mut conn) = TcpStream::connect(self.addr) {
            let _ = conn.write_all(b"");
        }
        if let Some(t) = self.thread.take() {
            let _ = t.join();
        }
    }
 }
 impl Drop for MetricsHandle {
    fn drop(&mut self) {
        self.stop();
    }
 }
 // These parameters are passed by value intentionally: they are moved into the
 // spawned thread closure. Taking references would require `'static` lifetimes
 // which aren't available from the caller's stack.
 #[allow(clippy::needless_pass_by_value)]
 fn serve_loop(listener: TcpListener, state: Arc<MetricsState>, shutdown: Arc<AtomicBool>) {
    let timeout = std::time::Duration::from_millis(50);
    // Use non-blocking + sleep to periodically check the shutdown flag
    // without blocking indefinitely in accept().
    if let Err(e) = listener.set_nonblocking(true) {
        tracing::warn!(error = %e, "failed to set non-blocking mode on metrics listener");
    }
    loop {
        if shutdown.load(Ordering::Acquire) {
            break;
        }
        match listener.accept() {
            Ok((stream, _)) => {
                if shutdown.load(Ordering::Acquire) {
                    break;
                }
                handle_connection(stream, &state);
            }
            Err(ref e) if e.kind() == std::io::ErrorKind::WouldBlock => {
                thread::sleep(timeout);
            }
            Err(_) => break,
        }
    }
 }
 fn handle_connection(mut stream: TcpStream, state: &MetricsState) {
    // Set a read timeout to avoid hanging on slow/malicious clients.
    let _ = stream.set_read_timeout(Some(std::time::Duration::from_secs(5)));
    let mut reader = BufReader::new(&stream);
    let mut request_line = String::new();
    if reader.read_line(&mut request_line).is_err() {
        return;
    }
    // Drain remaining headers to avoid broken pipe.
    let mut header = String::new();
    loop {
        header.clear();
        match reader.read_line(&mut header) {
            Ok(0) | Err(_) => break,
            Ok(n) if n <= 2 => break, // empty line = end of headers
            _ => {}
        }
    }
    let path = extract_path(&request_line);
    let (status, content_type, body) = match path {
        "/healthz" => ("200 OK", "application/json", state.render_healthz()),
        "/metrics" => (
            "200 OK",
            "text/plain; version=0.0.4; charset=utf-8",
            state.render_prometheus(),
        ),
        _ => (
            "404 Not Found",
            "application/json",
            r#"{"error":"not found"}"#.to_string(),
        ),
    };
    let response = format!(
        "HTTP/1.1 {status}\r\nContent-Type: {content_type}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    );
    let _ = stream.write_all(response.as_bytes());
 }
 fn extract_path(request_line: &str) -> &str {
    // "GET /path HTTP/1.1"
    let parts: Vec<&str> = request_line.split_whitespace().collect();
    if parts.len() >= 2 { parts[1] } else { "/" }
 }
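The request handling in `handle_connection` can be exercised without opening a socket if the reader and writer are generic. The sketch below is a simplified stand-in rather than the crate's handler: `respond` and `respond_to` are hypothetical names, the routing table covers only `/healthz`, and the response body is a fixed string.

```rust
use std::io::{BufRead, Write};

/// Extracts the request target from a line such as "GET /path HTTP/1.1".
/// Falls back to "/" when the line has fewer than two tokens.
fn extract_path(request_line: &str) -> &str {
    request_line.split_whitespace().nth(1).unwrap_or("/")
}

/// Reads one request line from `reader` and writes a complete HTTP/1.1
/// response to `writer`. Headers after the request line are ignored here.
pub fn respond<R: BufRead, W: Write>(mut reader: R, mut writer: W) -> std::io::Result<()> {
    let mut request_line = String::new();
    reader.read_line(&mut request_line)?;

    let (status, body) = match extract_path(&request_line) {
        "/healthz" => ("200 OK", r#"{"status":"ok"}"#),
        _ => ("404 Not Found", r#"{"error":"not found"}"#),
    };
    write!(
        writer,
        "HTTP/1.1 {status}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Feeds raw request text through `respond` and returns the response text.
/// An I/O error yields an empty string.
pub fn respond_to(raw_request: &str) -> String {
    let mut out = Vec::new();
    if respond(raw_request.as_bytes(), &mut out).is_err() {
        return String::new();
    }
    String::from_utf8_lossy(&out).into_owned()
}
```

`&[u8]` implements `BufRead` and `Vec<u8>` implements `Write`, so the same function body works against a `TcpStream` in production and against in-memory buffers in unit tests.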
--- a/tidal/src/db/metrics.rs
+++ b/tidal/src/db/metrics.rs
@@ -0,0 +1,145 @@
 //! Runtime metrics for tidalDB.
 //!
 //! [`MetricsState`] is an `Arc`-shared bag of atomics that `TidalDb` updates
 //! on every operation. The metrics HTTP server (when the `metrics` feature
 //! is enabled) reads from this shared state to serve Prometheus text format.
 //!
 //! Adding a new counter in future milestones is:
 //! 1. Add an `AtomicU64` field to `MetricsState`
 //! 2. Increment it in the relevant `TidalDb` method
 //! 3. Add one line to `MetricsState::render_prometheus`
 use std::sync::atomic::{AtomicBool, Ordering};
 use std::time::Instant;
 /// Shared runtime metrics for a `TidalDb` instance.
 ///
 /// Shared behind an `Arc` (cloning the `Arc` is cheap). Thread-safe.
 pub struct MetricsState {
    /// Time the database was opened.
    pub(crate) opened_at: Instant,
    /// Whether the database is currently healthy.
    pub(crate) health_ok: AtomicBool,
 }
 impl MetricsState {
    pub(crate) fn new() -> Self {
        Self {
            opened_at: Instant::now(),
            health_ok: AtomicBool::new(true),
        }
    }
    /// Uptime in fractional seconds since the database was opened.
    #[must_use]
    pub fn uptime_seconds(&self) -> f64 {
        self.opened_at.elapsed().as_secs_f64()
    }
    /// Whether the database reports healthy (1.0) or degraded (0.0).
    #[must_use]
    pub fn health_ok_value(&self) -> f64 {
        if self.health_ok.load(Ordering::Relaxed) {
            1.0
        } else {
            0.0
        }
    }
    /// Render Prometheus text exposition format for all metrics.
    ///
    /// Format: <https://prometheus.io/docs/instrumenting/exposition_formats/>
    #[must_use]
    pub fn render_prometheus(&self) -> String {
        let uptime = self.uptime_seconds();
        let health = self.health_ok_value();
        let version = env!("CARGO_PKG_VERSION");
        let build_hash = crate::BUILD_HASH;
        format!(
            "# HELP tidaldb_uptime_seconds Seconds since database opened.\n\
             # TYPE tidaldb_uptime_seconds gauge\n\
             tidaldb_uptime_seconds{{partition_id=\"0\"}} {uptime:.3}\n\n\
             # HELP tidaldb_health_ok Whether the database is healthy. 1 = ok, 0 = degraded.\n\
             # TYPE tidaldb_health_ok gauge\n\
             tidaldb_health_ok{{partition_id=\"0\"}} {health}\n\n\
             # HELP tidaldb_info Build and version information.\n\
             # TYPE tidaldb_info gauge\n\
             tidaldb_info{{version=\"{version}\",build_hash=\"{build_hash}\",partition_id=\"0\"}} 1\n"
        )
    }
    /// Render JSON for /healthz.
    #[must_use]
    pub fn render_healthz(&self) -> String {
        let uptime = self.uptime_seconds();
        let status = if self.health_ok.load(Ordering::Relaxed) {
            "ok"
        } else {
            "degraded"
        };
        let version = env!("CARGO_PKG_VERSION");
        let build_hash = crate::BUILD_HASH;
        format!(
            r#"{{"status":"{status}","uptime_seconds":{uptime:.3},"version":"{version}","build_hash":"{build_hash}"}}"#
        )
    }
 }
 #[cfg(test)]
 mod tests {
    use super::*;
    #[test]
    fn new_creates_healthy_state() {
        let state = MetricsState::new();
        assert!(state.health_ok.load(Ordering::Relaxed));
    }
    #[test]
    fn uptime_is_non_negative() {
        let state = MetricsState::new();
        assert!(state.uptime_seconds() >= 0.0);
    }
    #[test]
    fn health_ok_value_returns_one_when_healthy() {
        let state = MetricsState::new();
        assert!((state.health_ok_value() - 1.0).abs() < f64::EPSILON);
    }
    #[test]
    fn health_ok_value_returns_zero_when_degraded() {
        let state = MetricsState::new();
        state.health_ok.store(false, Ordering::Relaxed);
        assert!(state.health_ok_value().abs() < f64::EPSILON);
    }
    #[test]
    fn render_prometheus_contains_expected_metrics() {
        let state = MetricsState::new();
        let output = state.render_prometheus();
        assert!(output.contains("tidaldb_uptime_seconds"));
        assert!(output.contains("tidaldb_health_ok"));
        assert!(output.contains("tidaldb_info"));
        assert!(output.contains("partition_id=\"0\""));
    }
    #[test]
    fn render_healthz_contains_expected_fields() {
        let state = MetricsState::new();
        let output = state.render_healthz();
        assert!(output.contains("\"status\":\"ok\""));
        assert!(output.contains("\"uptime_seconds\":"));
        assert!(output.contains("\"version\":"));
        assert!(output.contains("\"build_hash\":"));
    }
    #[test]
    fn render_healthz_degraded() {
        let state = MetricsState::new();
        state.health_ok.store(false, Ordering::Relaxed);
        let output = state.render_healthz();
        assert!(output.contains("\"status\":\"degraded\""));
    }
 }
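The module docs for `metrics.rs` list three steps for adding a counter. A minimal sketch of those steps on a stand-in type; `SketchMetrics`, `record_query`, and the metric name `tidaldb_queries_total` are hypothetical and do not appear in this diff:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Stand-in for `MetricsState` carrying one extra counter.
#[derive(Default)]
pub struct SketchMetrics {
    // Step 1: a new atomic field.
    queries_total: AtomicU64,
}

impl SketchMetrics {
    /// Step 2: called from the operation being counted.
    pub fn record_query(&self) {
        // Relaxed suffices here because the counter is a standalone
        // statistic and does not guard any other memory.
        self.queries_total.fetch_add(1, Ordering::Relaxed);
    }

    /// Step 3: one more sample in the exposition output.
    pub fn render_prometheus(&self) -> String {
        let total = self.queries_total.load(Ordering::Relaxed);
        format!(
            "# HELP tidaldb_queries_total Queries served since open.\n\
             # TYPE tidaldb_queries_total counter\n\
             tidaldb_queries_total{{partition_id=\"0\"}} {total}\n"
        )
    }
}
```

The sample is declared as a `counter` rather than a `gauge` because it only ever increases, which matches the Prometheus convention for `_total` metrics.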
--- a/tidal/src/db/mod.rs
+++ b/tidal/src/db/mod.rs
@@ -0,0 +1,210 @@
 //! The public entry point for tidalDB.
 //!
 //! This module provides [`TidalDb`] (the database handle) and
 //! [`TidalDbBuilder`] (the fluent construction API). All interaction
 //! with tidalDB starts here.
 //!
 //! # Quick Start
 //!
 //! ```rust,no_run
 //! use tidaldb::TidalDb;
 //!
 //! // In-memory database for tests:
 //! let db = TidalDb::builder().ephemeral().open().unwrap();
 //! assert!(db.health_check().is_ok());
 //! ```
 pub mod builder;
 pub mod config;
 #[cfg(feature = "metrics")]
 pub mod http;
 pub mod metrics;
 pub mod paths;
 #[cfg(any(test, feature = "test-utils"))]
 pub mod temp;
 pub use builder::TidalDbBuilder;
 pub use config::{Config, ConfigError, StorageMode};
 pub use metrics::MetricsState;
 pub use paths::Paths;
 #[cfg(any(test, feature = "test-utils"))]
 pub use temp::TempTidalHome;
 use std::sync::Arc;
 use std::sync::atomic::{AtomicBool, Ordering};
 /// A tidalDB database instance.
 ///
 /// Created via [`TidalDb::builder()`]. At M0 this is a thin handle that
 /// validates configuration and proves the builder API works. Future
 /// milestones will wire in the storage engine, signal ledger, and query
 /// executor behind this facade.
 ///
 /// # Shutdown
 ///
 /// Call [`close`](Self::close) for explicit shutdown. If dropped without
 /// calling `close`, the [`Drop`] implementation will run cleanup and log
 /// any errors via `tracing::error!`.
 pub struct TidalDb {
    config: Config,
    /// Whether `close()` has been called. Prevents double-shutdown.
    closed: AtomicBool,
    /// Runtime metrics shared with the optional HTTP server.
    metrics: Arc<MetricsState>,
    /// Handle to the metrics HTTP server thread (metrics feature only).
    #[cfg(feature = "metrics")]
    metrics_handle: Option<http::MetricsHandle>,
 }
 impl TidalDb {
    /// Returns a new [`TidalDbBuilder`] with default (ephemeral) configuration.
    #[must_use]
    pub fn builder() -> TidalDbBuilder {
        TidalDbBuilder::new()
    }
    /// Construct a `TidalDb` from a validated configuration.
    ///
    /// This is `pub(crate)` -- external callers use the builder.
    #[allow(clippy::missing_const_for_fn)] // Arc field prevents const in practice
    pub(crate) fn from_config(
        config: Config,
        metrics: Arc<MetricsState>,
        #[cfg(feature = "metrics")] metrics_handle: Option<http::MetricsHandle>,
    ) -> Self {
        Self {
            config,
            closed: AtomicBool::new(false),
            metrics,
            #[cfg(feature = "metrics")]
            metrics_handle,
        }
    }
    /// Returns a reference to the shared metrics state.
    #[must_use]
    #[allow(clippy::missing_const_for_fn)] // Arc field prevents const in practice
    pub fn metrics(&self) -> &Arc<MetricsState> {
        &self.metrics
    }
    /// Returns the bound address of the metrics HTTP server, if running.
    ///
    /// Useful when port 0 was requested to discover the OS-assigned port.
    /// Returns `None` if the `metrics` feature is disabled or if
    /// `enable_metrics` was not called on the builder.
    #[must_use]
    #[allow(clippy::missing_const_for_fn)] // cfg-gated body prevents const
    pub fn metrics_addr(&self) -> Option<std::net::SocketAddr> {
        #[cfg(feature = "metrics")]
        {
            self.metrics_handle.as_ref().map(|h| h.addr)
        }
        #[cfg(not(feature = "metrics"))]
        {
            None
        }
    }
    /// Returns `Ok(())` if the database is initialized and operational.
    ///
    /// At M0 this simply confirms the handle was constructed successfully.
    /// Future milestones will verify storage engine connectivity, WAL
    /// integrity, and index health.
    ///
    /// # Errors
    ///
    /// Returns an error if the database has been closed or an internal
    /// check fails.
    #[tracing::instrument(skip(self))]
    pub fn health_check(&self) -> crate::Result<()> {
        if self.closed.load(Ordering::Acquire) {
             // Ordering::Release: pairs with Acquire loads of `health_ok`
             // (e.g. by the metrics endpoint) so readers observe the
             // degraded state once it is marked here.
            self.metrics.health_ok.store(false, Ordering::Release);
            return Err(crate::LumenError::Internal(
                "database is closed".to_string(),
            ));
        }
        Ok(())
    }
    /// Cleanly shut down the database.
    ///
    /// At M0 this is a no-op beyond marking the instance as closed.
    /// Future milestones will drain the WAL, flush the storage engine,
    /// and persist index state.
    ///
    /// # Errors
    ///
    /// Returns an error if shutdown encounters a failure (e.g., WAL flush
    /// fails in future milestones).
    #[tracing::instrument(skip(self))]
    pub fn close(self) -> crate::Result<()> {
        self.shutdown_inner()
    }
    /// Internal shutdown logic shared by `close()` and `Drop`.
    ///
    /// Returns `Result` even though M0 is infallible -- future milestones
    /// add WAL drain and storage flush which can fail.
    #[allow(clippy::unnecessary_wraps)]
    fn shutdown_inner(&self) -> crate::Result<()> {
        // Swap from false to true. If it was already true, we already shut down.
        if self
            .closed
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .is_err()
        {
            // Already closed -- idempotent, not an error.
            return Ok(());
        }
        tracing::debug!(mode = ?self.config.mode, "tidaldb shutting down");
        // Mark health as degraded so the metrics endpoint reflects shutdown.
        self.metrics.health_ok.store(false, Ordering::Release);
         // The metrics HTTP server is not stopped here: stopping it needs
         // `&mut` access to `metrics_handle`, and this method only has `&self`.
         // `Drop::drop` stops the handle before calling `shutdown_inner`, and
         // `close(self)` consumes the value, so `Drop` always runs afterwards.
        // M0: nothing to flush. Future milestones add WAL drain, storage
        // engine flush, and index persistence here.
        Ok(())
    }
 }
 impl Drop for TidalDb {
    fn drop(&mut self) {
        // Stop metrics HTTP server if still running.
        #[cfg(feature = "metrics")]
        if let Some(ref mut handle) = self.metrics_handle {
            handle.stop();
        }
        if let Err(e) = self.shutdown_inner() {
            tracing::error!(error = %e, "error during tidaldb shutdown");
        }
    }
 }
 impl std::fmt::Debug for TidalDb {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("TidalDb")
            .field("mode", &self.config.mode)
            .field("closed", &self.closed.load(Ordering::Relaxed))
            .finish_non_exhaustive()
    }
 }
--- a/tidal/src/db/paths.rs
+++ b/tidal/src/db/paths.rs
@@ -0,0 +1,177 @@
 //! Resolved filesystem paths for a single tidalDB instance.
 //!
 //! [`Paths`] is the single source of truth for directory layout. Both the
 //! [`TidalDbBuilder`](super::TidalDbBuilder) and the future CLI use it to
 //! resolve subdirectory locations from a base path.
 use std::path::{Path, PathBuf};
 /// Resolved filesystem paths for a single tidalDB instance.
 ///
 /// All paths derive from a single base directory. Use [`Paths::new`] to
 /// construct from a base path. Directories are NOT created automatically --
 /// call [`Paths::ensure_all`] to create them.
 ///
 /// # Directory Layout
 ///
 /// | Directory         | Purpose                                           |
 /// |-------------------|---------------------------------------------------|
 /// | `{base}/wal`      | Write-ahead log segments                          |
 /// | `{base}/items`    | fjall keyspace for item entities                  |
 /// | `{base}/users`    | fjall keyspace for user entities                  |
 /// | `{base}/creators` | fjall keyspace for creator entities               |
 /// | `{base}/cache`    | Materialized views and secondary indexes (future) |
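 ///
 /// # Examples
 ///
 /// A minimal sketch of the intended call order. The `tidaldb::Paths` import
 /// assumes the crate-root re-export, and `/data/tidaldb` is only a
 /// placeholder base directory:
 ///
 /// ```rust,no_run
 /// use std::path::PathBuf;
 /// use tidaldb::Paths;
 ///
 /// let paths = Paths::new("/data/tidaldb");
 /// // Accessors only join path components; nothing touches the filesystem yet.
 /// assert_eq!(paths.wal_dir(), PathBuf::from("/data/tidaldb/wal"));
 /// // Create wal/, items/, users/, creators/, and cache/ under the base.
 /// paths.ensure_all()?;
 /// # Ok::<(), std::io::Error>(())
 /// ```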
 pub struct Paths {
    base: PathBuf,
 }
 impl Paths {
    /// Create a new `Paths` from the given base directory.
    ///
    /// No filesystem operations are performed. Call [`ensure_all`](Self::ensure_all)
    /// to create the directories.
    #[must_use]
    pub fn new(base: impl Into<PathBuf>) -> Self {
        Self { base: base.into() }
    }
    /// The base directory from which all subdirectories are derived.
    #[must_use]
    pub fn base(&self) -> &Path {
        &self.base
    }
    /// Path to the write-ahead log directory: `{base}/wal`.
    #[must_use]
    pub fn wal_dir(&self) -> PathBuf {
        self.base.join("wal")
    }
    /// Path to the item entities directory: `{base}/items`.
    #[must_use]
    pub fn items_dir(&self) -> PathBuf {
        self.base.join("items")
    }
    /// Path to the user entities directory: `{base}/users`.
    #[must_use]
    pub fn users_dir(&self) -> PathBuf {
        self.base.join("users")
    }
    /// Path to the creator entities directory: `{base}/creators`.
    #[must_use]
    pub fn creators_dir(&self) -> PathBuf {
        self.base.join("creators")
    }
    /// Path to the cache directory: `{base}/cache`.
    #[must_use]
    pub fn cache_dir(&self) -> PathBuf {
        self.base.join("cache")
    }
    /// Create all subdirectories under the base path.
    ///
    /// Idempotent: does not fail if directories already exist. Uses
    /// `std::fs::create_dir_all` so intermediate directories are also created.
    ///
    /// # Errors
    ///
    /// Returns `std::io::Error` if directory creation fails (e.g., permission
    /// denied, disk full).
    pub fn ensure_all(&self) -> Result<(), std::io::Error> {
        let dirs = [
            self.wal_dir(),
            self.items_dir(),
            self.users_dir(),
            self.creators_dir(),
            self.cache_dir(),
        ];
        for dir in &dirs {
            std::fs::create_dir_all(dir)?;
        }
        Ok(())
    }
 }
 impl std::fmt::Debug for Paths {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("Paths")
            .field("base", &self.base)
            .field("wal", &self.wal_dir())
            .field("items", &self.items_dir())
            .field("users", &self.users_dir())
            .field("creators", &self.creators_dir())
            .field("cache", &self.cache_dir())
            .finish()
    }
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use super::*;
    #[test]
    fn paths_derive_from_base() {
        let paths = Paths::new("/data/tidaldb");
        assert_eq!(paths.base(), Path::new("/data/tidaldb"));
        assert_eq!(paths.wal_dir(), PathBuf::from("/data/tidaldb/wal"));
        assert_eq!(paths.items_dir(), PathBuf::from("/data/tidaldb/items"));
        assert_eq!(paths.users_dir(), PathBuf::from("/data/tidaldb/users"));
        assert_eq!(
            paths.creators_dir(),
            PathBuf::from("/data/tidaldb/creators")
        );
        assert_eq!(paths.cache_dir(), PathBuf::from("/data/tidaldb/cache"));
    }
    #[test]
    fn paths_from_pathbuf() {
        let base = PathBuf::from("/tmp/test");
        let paths = Paths::new(base.clone());
        assert_eq!(paths.base(), base.as_path());
    }
    #[test]
    fn paths_debug_includes_all_dirs() {
        let paths = Paths::new("/data/tidaldb");
        let debug = format!("{paths:?}");
        assert!(debug.contains("base"), "Debug should show base");
        assert!(debug.contains("wal"), "Debug should show wal");
        assert!(debug.contains("items"), "Debug should show items");
        assert!(debug.contains("users"), "Debug should show users");
        assert!(debug.contains("creators"), "Debug should show creators");
        assert!(debug.contains("cache"), "Debug should show cache");
    }
    #[test]
    fn paths_ensure_all_creates_directories() {
        let tmp = tempfile::tempdir().unwrap();
        let paths = Paths::new(tmp.path());
        paths.ensure_all().unwrap();
        assert!(paths.wal_dir().is_dir());
        assert!(paths.items_dir().is_dir());
        assert!(paths.users_dir().is_dir());
        assert!(paths.creators_dir().is_dir());
        assert!(paths.cache_dir().is_dir());
    }
    #[test]
    fn paths_ensure_all_is_idempotent() {
        let tmp = tempfile::tempdir().unwrap();
        let paths = Paths::new(tmp.path());
        paths.ensure_all().unwrap();
        // Second call should not error.
        paths.ensure_all().unwrap();
    }
    #[test]
    fn paths_trailing_slash_handled() {
        let paths = Paths::new("/data/tidaldb/");
        // PathBuf::join handles trailing slashes correctly.
        assert_eq!(paths.wal_dir(), PathBuf::from("/data/tidaldb/wal"));
    }
 }
--- a/tidal/src/db/temp.rs
+++ b/tidal/src/db/temp.rs
@@ -0,0 +1,186 @@
 //! Temporary tidalDB data directory for tests.
 //!
 //! Provides [`TempTidalHome`], a uniquely-named temporary directory that
 //! is cleaned up on drop (unless `preserve` is set). This ensures test
 //! isolation: every test gets its own filesystem sandbox.
 use std::path::{Path, PathBuf};
 use super::paths::Paths;
 /// A temporary tidalDB data directory for use in tests.
 ///
 /// Automatically cleaned up when dropped, unless `preserve` is set.
 /// Each instance gets a unique directory under the OS temp root,
 /// ensuring test isolation.
 ///
 /// # Examples
 ///
 /// ```rust,no_run
 /// # use tidaldb::db::temp::TempTidalHome;
 /// let home = TempTidalHome::new().unwrap();
 /// let paths = home.paths();
 /// paths.ensure_all().unwrap();
 /// // ... run test ...
 /// // `home` dropped here -> directory removed
 /// ```
 pub struct TempTidalHome {
    base: PathBuf,
    preserve: bool,
 }
 impl TempTidalHome {
    /// Create a new temporary directory with a unique name.
    ///
    /// The directory is created under the OS temp root with prefix `tidaldb-`.
    /// It will be deleted when this value is dropped.
    ///
    /// # Errors
    ///
    /// Returns `std::io::Error` if the temp directory cannot be created.
    pub fn new() -> Result<Self, std::io::Error> {
        let dir = tempfile::Builder::new()
            .prefix("tidaldb-")
            .tempdir()?
            .keep();
        Ok(Self {
            base: dir,
            preserve: false,
        })
    }
    /// Create a temporary directory that is NOT deleted on drop.
    ///
    /// Useful for debugging: the directory survives so you can inspect
    /// its contents after a test failure.
    ///
    /// # Errors
    ///
    /// Returns `std::io::Error` if the temp directory cannot be created.
    pub fn with_preserve() -> Result<Self, std::io::Error> {
        let dir = tempfile::Builder::new()
            .prefix("tidaldb-")
            .tempdir()?
            .keep();
        Ok(Self {
            base: dir,
            preserve: true,
        })
    }
    /// The root path of this temporary directory.
    #[must_use]
    pub fn path(&self) -> &Path {
        &self.base
    }
    /// Construct a [`Paths`] rooted at this temporary directory.
    #[must_use]
    pub fn paths(&self) -> Paths {
        Paths::new(self.base.clone())
    }
 }
 impl Drop for TempTidalHome {
    fn drop(&mut self) {
        if !self.preserve
            && let Err(e) = std::fs::remove_dir_all(&self.base)
        {
            tracing::warn!(
                path = %self.base.display(),
                error = %e,
                "failed to clean up TempTidalHome"
            );
        }
    }
 }
 impl std::fmt::Debug for TempTidalHome {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("TempTidalHome")
            .field("base", &self.base)
            .field("preserve", &self.preserve)
            .finish()
    }
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use super::*;
    #[test]
    fn new_creates_directory() {
        let home = TempTidalHome::new().unwrap();
        assert!(home.path().exists());
        assert!(home.path().is_dir());
    }
    #[test]
    fn path_contains_tidaldb_prefix() {
        let home = TempTidalHome::new().unwrap();
        let name = home
            .path()
            .file_name()
            .unwrap()
            .to_string_lossy()
            .to_string();
        assert!(
            name.starts_with("tidaldb-"),
            "expected tidaldb- prefix, got: {name}"
        );
    }
    #[test]
    fn paths_returns_paths_rooted_at_base() {
        let home = TempTidalHome::new().unwrap();
        let paths = home.paths();
        assert_eq!(paths.base(), home.path());
    }
    #[test]
    fn drop_removes_directory() {
        let path = {
            let home = TempTidalHome::new().unwrap();
            let p = home.path().to_owned();
            // Write a file so the dir is non-empty
            std::fs::write(p.join("testfile"), b"data").unwrap();
            assert!(p.exists());
            p
            // home dropped here
        };
        assert!(!path.exists(), "directory should be removed after drop");
    }
    #[test]
    fn with_preserve_keeps_directory() {
        let path = {
            let home = TempTidalHome::with_preserve().unwrap();
            let p = home.path().to_owned();
            std::fs::write(p.join("testfile"), b"data").unwrap();
            p
            // home dropped here — but preserve=true
        };
        assert!(
            path.exists(),
            "directory should still exist after drop with preserve"
        );
        // Manual cleanup since preserve=true
        std::fs::remove_dir_all(&path).unwrap();
    }
    #[test]
    fn two_homes_have_different_paths() {
        let home1 = TempTidalHome::new().unwrap();
        let home2 = TempTidalHome::new().unwrap();
        assert_ne!(home1.path(), home2.path());
    }
    #[test]
    fn debug_output() {
        let home = TempTidalHome::new().unwrap();
        let debug = format!("{home:?}");
        assert!(debug.contains("TempTidalHome"));
        assert!(debug.contains("preserve"));
    }
 }
--- a/tidal/src/lib.rs
+++ b/tidal/src/lib.rs
@@ -1,3 +1,4 @@
 pub mod db;
 pub mod query;
 pub mod ranking;
 pub mod schema;
@@ -5,6 +6,20 @@ pub mod signals;
 pub mod storage;
 pub mod wal;
 /// Build hash compiled in from the `TIDALDB_BUILD_HASH` compile-time
 /// environment variable, which `build.rs` derives from `GIT_HASH`.
 ///
 /// Falls back to `"dev"` if `GIT_HASH` is unset or `build.rs` is not invoked.
 pub const BUILD_HASH: &str = match option_env!("TIDALDB_BUILD_HASH") {
    Some(v) => v,
    None => "dev",
 };
 #[cfg(any(test, feature = "test-utils"))]
 pub use db::TempTidalHome;
 #[cfg(feature = "metrics")]
 pub use db::http::MetricsHandle;
 pub use db::metrics::MetricsState;
 pub use db::{Config, ConfigError, Paths, StorageMode, TidalDb, TidalDbBuilder};
 pub use schema::LumenError;
 /// Crate-wide result type. All public API methods return `Result<T, LumenError>`.
--- a/tidal/src/schema/error.rs
+++ b/tidal/src/schema/error.rs
@@ -1,6 +1,7 @@
 use std::fmt;
 use super::{EntityId, EntityKind};
 use crate::db::ConfigError;
 /// Top-level error type. Every public API method returns `Result<T, LumenError>`.
 #[derive(Debug)]
@@ -15,6 +16,8 @@ pub enum LumenError {
    Durability(DurabilityError),
    /// Query malformed. Parse error with details.
    Query(QueryError),
    /// Configuration error. Caller supplied invalid config.
    Config(ConfigError),
    /// Internal invariant violated. This is a bug in Lumen.
    Internal(String),
 }
@@ -27,6 +30,7 @@ impl fmt::Display for LumenError {
            Self::Schema(e) => write!(f, "{e}"),
            Self::Durability(e) => write!(f, "durability error: {e}"),
            Self::Query(e) => write!(f, "query error: {e}"),
            Self::Config(e) => write!(f, "config error: {e}"),
            Self::Internal(msg) => write!(f, "internal error: {msg}"),
        }
    }
@@ -39,6 +43,7 @@ impl std::error::Error for LumenError {
            Self::Schema(e) => Some(e),
            Self::Durability(e) => Some(e),
            Self::Query(e) => Some(e),
            Self::Config(e) => Some(e),
            Self::NotFound { .. } | Self::Internal(_) => None,
        }
    }
@@ -68,6 +73,12 @@ impl From<QueryError> for LumenError {
    }
 }
 impl From<ConfigError> for LumenError {
    fn from(e: ConfigError) -> Self {
        Self::Config(e)
    }
 }
 /// Schema validation errors.
 ///
 /// `Eq` is manually implemented because f64 fields (from `Duration::as_secs_f64()`)
@@ -91,6 +102,8 @@ pub enum SchemaError {
        signal_name: String,
    },
    NoSignalsDefined,
    /// Signal type name not found in schema at runtime.
    UnknownSignalType(String),
 }
 impl Eq for SchemaError {}
@@ -135,6 +148,9 @@ impl fmt::Display for SchemaError {
                )
            }
            Self::NoSignalsDefined => f.write_str("schema must define at least one signal"),
            Self::UnknownSignalType(name) => {
                write!(f, "unknown signal type: '{name}'")
            }
        }
    }
 }
--- a/tidal/src/schema/score.rs
+++ b/tidal/src/schema/score.rs
@@ -60,6 +60,7 @@ impl fmt::Debug for Score {
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used, clippy::float_cmp)]
 mod tests {
    use super::*;
@@ -103,14 +104,14 @@ mod tests {
    #[test]
    fn display_format() {
-        let s = Score::new(3.14).unwrap();
+        let s = Score::new(std::f64::consts::PI).unwrap();
-        assert_eq!(s.to_string(), "3.140000");
+        assert_eq!(s.to_string(), "3.141593");
    }
    #[test]
    fn debug_format() {
-        let s = Score::new(3.14).unwrap();
+        let s = Score::new(std::f64::consts::PI).unwrap();
-        assert_eq!(format!("{s:?}"), "Score(3.140000)");
+        assert_eq!(format!("{s:?}"), "Score(3.141593)");
    }
    #[test]
--- a/tidal/src/schema/signal.rs
+++ b/tidal/src/schema/signal.rs
@@ -253,6 +253,7 @@ impl<'a> IntoIterator for &'a WindowSet {
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used, clippy::float_cmp)]
 mod tests {
    use super::*;
--- a/tidal/src/schema/timestamp.rs
+++ b/tidal/src/schema/timestamp.rs
@@ -76,6 +76,7 @@ impl fmt::Debug for Timestamp {
 }
 #[cfg(test)]
 #[allow(clippy::float_cmp)]
 mod tests {
    use super::*;
--- a/tidal/src/schema/validation.rs
+++ b/tidal/src/schema/validation.rs
@@ -243,7 +243,7 @@ fn is_valid_signal_name(name: &str) -> bool {
 }
 #[cfg(test)]
-#[allow(unused_must_use)]
+#[allow(unused_must_use, clippy::unwrap_used)]
 mod tests {
    use super::*;
--- a/tidal/src/signals/checkpoint.rs
+++ b/tidal/src/signals/checkpoint.rs
@@ -0,0 +1,856 @@
 //! Checkpoint and restore for the `SignalLedger`.
 //!
 //! # Checkpoint
 //!
 //! `SignalLedger::checkpoint()` serializes all in-memory signal state to the
 //! `StorageEngine` as a single atomic `WriteBatch`. No partial checkpoints are
 //! possible: either the whole ledger is written or nothing is.
 //!
 //! # Restore
 //!
 //! `SignalLedger::restore()` scans the storage, filters for `Tag::Sig` keys,
 //! deserializes each entry, and populates the `DashMap`. Returns the checkpoint
 //! metadata (for WAL replay) or `None` if no checkpoint exists (first boot).
 //!
 //! # Binary format
 //!
 //! Each entry serializes as a 983-byte fixed-length record.
 //! The checkpoint metadata serializes as a 17-byte record at a well-known key.
 //! All payload values use little-endian byte order; storage keys use big-endian
 //! (the existing `encode_key` convention). A version byte at offset 0 enables
 //! future backward-compatible format changes.
 use crate::schema::{EntityId, LumenError};
 use crate::storage::{StorageEngine, Tag, WriteBatch, encode_key, parse_key};
 use super::SignalTypeId;
 use super::hot::HotSignalState;
 use super::ledger::{EntitySignalEntry, SignalLedger};
 use super::warm::{BucketedCounter, BucketedCounterSnapshot, HOUR_BUCKETS, MINUTE_BUCKETS};
 // ── Constants ─────────────────────────────────────────────────────────────────
 const VERSION: u8 = 0x01;
 const ENTRY_SIZE: usize = 983;
 const META_SIZE: usize = 17;
 const META_SUFFIX: &[u8] = b"meta";
 /// Bit 0 of `flags` field: velocity tracking is enabled for this signal.
 const FLAG_VELOCITY_ENABLED: u16 = 0x0001;
 // ── CheckpointMeta ────────────────────────────────────────────────────────────
 /// Checkpoint sequence metadata stored alongside the signal state.
 ///
 /// Used by the WAL replay mechanism to know where to start replaying.
 /// Events with `wal_sequence > checkpoint.wal_sequence` must be replayed
 /// after `restore()` to bring the ledger's state fully up to date.
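 ///
 /// # Examples
 ///
 /// A sketch of the replay rule only. Plain `u64` sequence numbers stand in
 /// for WAL events here, since the real event type is not part of this
 /// module; the variable names are hypothetical:
 ///
 /// ```rust
 /// let checkpoint_wal_sequence: u64 = 41;
 /// let wal_sequences = [40u64, 41, 42, 43];
 /// // Only events strictly after the checkpointed sequence are replayed.
 /// let to_replay: Vec<u64> = wal_sequences
 ///     .iter()
 ///     .copied()
 ///     .filter(|&seq| seq > checkpoint_wal_sequence)
 ///     .collect();
 /// assert_eq!(to_replay, vec![42, 43]);
 /// ```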
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub struct CheckpointMeta {
    /// Nanosecond timestamp when the checkpoint was taken.
    pub checkpoint_time_ns: u64,
    /// WAL sequence number at checkpoint time.
    pub wal_sequence: u64,
 }
 // ── Serialization ─────────────────────────────────────────────────────────────
 /// Serialize an `EntitySignalEntry` to a 983-byte buffer.
 ///
 /// # Binary layout (all payload values little-endian)
 ///
 /// ```text
 /// Offset  Size  Field
 ///      0     1  version (0x01)
 ///      1     8  entity_id (u64 LE)
 ///      9     2  signal_type_id (u16 LE)
 ///     11     2  flags (u16 LE) — bit 0: velocity_enabled
 ///     13     8  last_update_ns (u64 LE)
 ///     21     8  decay_score_0 (f64 bits LE)
 ///     29     8  decay_score_1 (f64 bits LE)
 ///     37     8  decay_score_2 (f64 bits LE)
 ///     45     1  current_minute (u8)
 ///     46     1  current_hour (u8)
 ///     47     8  all_time_count (u64 LE)
 ///     55     8  last_minute_rotation_ns (u64 LE)
 ///     63     8  last_hour_rotation_ns (u64 LE)
 ///     71   240  minute_buckets (60 × u32 LE)
 ///    311   672  hour_buckets (168 × u32 LE)
 /// Total: 983 bytes
 /// ```
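 ///
 /// # Examples
 ///
 /// The offsets in the table can be re-derived from the field sizes. This is
 /// a standalone arithmetic sketch; the local names are illustrative and are
 /// not items of this module:
 ///
 /// ```rust
 /// // Fixed header: version, entity_id, signal_type_id, flags,
 /// // last_update_ns, and three f64 decay scores.
 /// let header = 1 + 8 + 2 + 2 + 8 + 3 * 8;
 /// assert_eq!(header, 45);
 /// // Warm-tier scalars: current_minute, current_hour, all_time_count,
 /// // last_minute_rotation_ns, last_hour_rotation_ns.
 /// let minute_buckets_off = header + 1 + 1 + 8 + 8 + 8;
 /// assert_eq!(minute_buckets_off, 71);
 /// // 60 minute buckets and 168 hour buckets, each a 4-byte u32.
 /// let hour_buckets_off = minute_buckets_off + 60 * 4;
 /// assert_eq!(hour_buckets_off, 311);
 /// assert_eq!(hour_buckets_off + 168 * 4, 983);
 /// ```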
 #[must_use]
 pub fn serialize_entry(
    entity_id: EntityId,
    signal_type_id: SignalTypeId,
    entry: &EntitySignalEntry,
 ) -> Vec<u8> {
    let mut buf = Vec::with_capacity(ENTRY_SIZE);
    // [0] version
    buf.push(VERSION);
    // [1..9] entity_id LE
    buf.extend_from_slice(&entity_id.as_u64().to_le_bytes());
    // [9..11] signal_type_id LE
    buf.extend_from_slice(&signal_type_id.as_u16().to_le_bytes());
    // [11..13] flags LE — derived from hot-tier immutable fields
    let flags: u16 = if entry.hot.velocity_enabled() {
        FLAG_VELOCITY_ENABLED
    } else {
        0
    };
    buf.extend_from_slice(&flags.to_le_bytes());
    // [13..21] last_update_ns LE
    buf.extend_from_slice(&entry.hot.last_update_ns().to_le_bytes());
    // [21..45] three decay scores as f64 bits LE
    for i in 0..3 {
        buf.extend_from_slice(&entry.hot.stored_score(i).to_bits().to_le_bytes());
    }
    // Snapshot warm tier (atomic reads of all bucket state)
    let snap = entry.warm.snapshot();
    // [45] current_minute (u8)
    buf.push(snap.current_minute);
    // [46] current_hour (u8)
    buf.push(snap.current_hour);
    // [47..55] all_time_count LE
    buf.extend_from_slice(&snap.all_time_count.to_le_bytes());
    // [55..63] last_minute_rotation_ns LE
    buf.extend_from_slice(&snap.last_minute_rotation_ns.to_le_bytes());
    // [63..71] last_hour_rotation_ns LE
    buf.extend_from_slice(&snap.last_hour_rotation_ns.to_le_bytes());
    // [71..311] minute_buckets (60 × u32 LE = 240 bytes)
    for &bucket in &snap.minute_buckets {
        buf.extend_from_slice(&bucket.to_le_bytes());
    }
    // [311..983] hour_buckets (168 × u32 LE = 672 bytes)
    for &bucket in &snap.hour_buckets {
        buf.extend_from_slice(&bucket.to_le_bytes());
    }
    debug_assert_eq!(buf.len(), ENTRY_SIZE, "serialize_entry produced wrong size");
    buf
 }
 /// Deserialize an `EntitySignalEntry` from bytes.
 ///
 /// Returns `(entity_id, signal_type_id, entry)` on success.
 ///
 /// # Errors
 ///
 /// Returns `Err` if:
 /// - The slice is not exactly `ENTRY_SIZE` (983) bytes
 /// - The version byte is not `0x01`
 /// - Any sub-slice conversion fails due to offset math errors
 pub fn deserialize_entry(
    bytes: &[u8],
 ) -> Result<(EntityId, SignalTypeId, EntitySignalEntry), String> {
    if bytes.len() != ENTRY_SIZE {
        return Err(format!("expected {ENTRY_SIZE} bytes, got {}", bytes.len()));
    }
    // [0] version check
    if bytes[0] != VERSION {
        return Err(format!(
            "unknown checkpoint version 0x{:02x}, expected 0x{:02x}",
            bytes[0], VERSION
        ));
    }
    // [1..9] entity_id LE
    let entity_id_val = u64::from_le_bytes(
        bytes[1..9]
            .try_into()
            .map_err(|_| "offset math error at entity_id [1..9]".to_string())?,
    );
    let entity_id = EntityId::new(entity_id_val);
    // [9..11] signal_type_id LE
    let signal_type_id_val = u16::from_le_bytes(
        bytes[9..11]
            .try_into()
            .map_err(|_| "offset math error at signal_type_id [9..11]".to_string())?,
    );
    let signal_type_id = SignalTypeId::new(signal_type_id_val);
    // [11..13] flags LE
    let flags = u16::from_le_bytes(
        bytes[11..13]
            .try_into()
            .map_err(|_| "offset math error at flags [11..13]".to_string())?,
    );
    let velocity_enabled = (flags & FLAG_VELOCITY_ENABLED) != 0;
    // [13..21] last_update_ns LE
    let last_update_ns = u64::from_le_bytes(
        bytes[13..21]
            .try_into()
            .map_err(|_| "offset math error at last_update_ns [13..21]".to_string())?,
    );
    // [21..45] three decay scores as f64 bits LE
    let score_0 = f64::from_bits(u64::from_le_bytes(
        bytes[21..29]
            .try_into()
            .map_err(|_| "offset math error at score_0 [21..29]".to_string())?,
    ));
    let score_1 = f64::from_bits(u64::from_le_bytes(
        bytes[29..37]
            .try_into()
            .map_err(|_| "offset math error at score_1 [29..37]".to_string())?,
    ));
    let score_2 = f64::from_bits(u64::from_le_bytes(
        bytes[37..45]
            .try_into()
            .map_err(|_| "offset math error at score_2 [37..45]".to_string())?,
    ));
    // [45] current_minute (u8)
    let current_minute = bytes[45];
    // [46] current_hour (u8)
    let current_hour = bytes[46];
    // [47..55] all_time_count LE
    let all_time_count = u64::from_le_bytes(
        bytes[47..55]
            .try_into()
            .map_err(|_| "offset math error at all_time_count [47..55]".to_string())?,
    );
    // [55..63] last_minute_rotation_ns LE
    let last_minute_rotation_ns = u64::from_le_bytes(
        bytes[55..63]
            .try_into()
            .map_err(|_| "offset math error at last_minute_rotation_ns [55..63]".to_string())?,
    );
    // [63..71] last_hour_rotation_ns LE
    let last_hour_rotation_ns = u64::from_le_bytes(
        bytes[63..71]
            .try_into()
            .map_err(|_| "offset math error at last_hour_rotation_ns [63..71]".to_string())?,
    );
    // [71..311] minute_buckets (60 × u32 LE)
    let mut minute_buckets = [0u32; MINUTE_BUCKETS];
    for (i, bucket) in minute_buckets.iter_mut().enumerate() {
        let off = 71 + i * 4;
        *bucket = u32::from_le_bytes(bytes[off..off + 4].try_into().map_err(|_| {
            format!(
                "offset math error at minute_bucket[{i}] [{off}..{}]",
                off + 4
            )
        })?);
    }
    // [311..983] hour_buckets (168 × u32 LE)
    let mut hour_buckets = [0u32; HOUR_BUCKETS];
    for (i, bucket) in hour_buckets.iter_mut().enumerate() {
        let off = 311 + i * 4;
        *bucket =
            u32::from_le_bytes(bytes[off..off + 4].try_into().map_err(|_| {
                format!("offset math error at hour_bucket[{i}] [{off}..{}]", off + 4)
            })?);
    }
    // Reconstruct hot tier
    let hot = HotSignalState::with_flags(entity_id_val, signal_type_id_val, velocity_enabled);
    hot.restore(last_update_ns, &[score_0, score_1, score_2]);
    // Reconstruct warm tier from snapshot
    let warm = BucketedCounter::new();
    warm.restore(&BucketedCounterSnapshot {
        minute_buckets,
        hour_buckets,
        current_minute,
        current_hour,
        all_time_count,
        last_minute_rotation_ns,
        last_hour_rotation_ns,
    });
    Ok((entity_id, signal_type_id, EntitySignalEntry { hot, warm }))
 }
 /// Serialize `CheckpointMeta` to a 17-byte buffer.
 ///
 /// Format: `[version: 1][checkpoint_time_ns: 8 LE][wal_sequence: 8 LE]`
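 ///
 /// # Examples
 ///
 /// A round-trip sketch. The `tidaldb::signals::checkpoint` import path is an
 /// assumption about module visibility, so the block is marked `ignore`
 /// rather than compiled as a doctest:
 ///
 /// ```rust,ignore
 /// use tidaldb::signals::checkpoint::{CheckpointMeta, deserialize_meta, serialize_meta};
 ///
 /// let meta = CheckpointMeta { checkpoint_time_ns: 1_000, wal_sequence: 42 };
 /// let bytes = serialize_meta(&meta);
 /// // 1 version byte + two little-endian u64 fields.
 /// assert_eq!(bytes.len(), 17);
 /// assert_eq!(deserialize_meta(&bytes), Ok(meta));
 /// ```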
 #[must_use]
 pub fn serialize_meta(meta: &CheckpointMeta) -> Vec<u8> {
    let mut buf = Vec::with_capacity(META_SIZE);
    buf.push(VERSION);
    buf.extend_from_slice(&meta.checkpoint_time_ns.to_le_bytes());
    buf.extend_from_slice(&meta.wal_sequence.to_le_bytes());
    debug_assert_eq!(buf.len(), META_SIZE);
    buf
 }
 /// Deserialize `CheckpointMeta` from bytes.
 ///
 /// # Errors
 ///
 /// Returns `Err` if the slice is not `META_SIZE` bytes, the version byte
 /// is unknown, or any sub-slice conversion fails.
 pub fn deserialize_meta(bytes: &[u8]) -> Result<CheckpointMeta, String> {
    if bytes.len() != META_SIZE {
        return Err(format!(
            "expected {META_SIZE} meta bytes, got {}",
            bytes.len()
        ));
    }
    if bytes[0] != VERSION {
        return Err(format!(
            "unknown checkpoint meta version 0x{:02x}, expected 0x{:02x}",
            bytes[0], VERSION
        ));
    }
    let checkpoint_time_ns = u64::from_le_bytes(
        bytes[1..9]
            .try_into()
            .map_err(|_| "offset math error at checkpoint_time_ns [1..9]".to_string())?,
    );
    let wal_sequence = u64::from_le_bytes(
        bytes[9..17]
            .try_into()
            .map_err(|_| "offset math error at wal_sequence [9..17]".to_string())?,
    );
    Ok(CheckpointMeta {
        checkpoint_time_ns,
        wal_sequence,
    })
 }
 // ── SignalLedger impl ─────────────────────────────────────────────────────────
 impl SignalLedger {
    /// Write all in-memory signal state to the storage engine atomically.
    ///
    /// Iterates the `DashMap` and serializes each entry into a `WriteBatch`.
    /// The checkpoint metadata is stored at a well-known key:
    /// `encode_key(EntityId::new(0), Tag::Sig, b"meta")`.
    ///
    /// # Errors
    ///
    /// Returns `LumenError::Storage` if the `WriteBatch` commit or `flush` fails.
    ///
    /// # Concurrency
    ///
    /// This method iterates `DashMap` shards without a global lock. Entries
    /// written concurrently to already-snapshotted shards will be absent from
    /// the checkpoint. The caller must supply `meta.wal_sequence` equal to the
    /// WAL tail at checkpoint start; restore must replay from that sequence to
    /// recover any missing entries.
    pub fn checkpoint(
        &self,
        storage: &dyn StorageEngine,
        meta: CheckpointMeta,
    ) -> crate::Result<()> {
        let mut batch = WriteBatch::new();
        // Write checkpoint metadata at the well-known meta key.
        let meta_key = encode_key(EntityId::new(0), Tag::Sig, META_SUFFIX);
        batch.put(meta_key, serialize_meta(&meta));
        // Write all entity-signal entries.
        for entry_ref in self.entries() {
            let &(entity_id, signal_type_id) = entry_ref.key();
            let entry = entry_ref.value();
            // Entry key suffix is the signal_type_id as 2 big-endian bytes,
            // so it is exactly 2 bytes — never collides with b"meta" (4 bytes).
            let suffix = signal_type_id.as_u16().to_be_bytes();
            let key = encode_key(entity_id, Tag::Sig, &suffix);
            let value = serialize_entry(entity_id, signal_type_id, entry);
            batch.put(key, value);
        }
        storage.write_batch(batch)?;
        storage.flush()?;
        Ok(())
    }
    /// Restore in-memory signal state from the storage engine.
    ///
    /// Scans all keys, filters for `Tag::Sig` entries (excluding the meta key),
    /// deserializes each entry, and inserts it into the `DashMap`.
    ///
    /// Returns `Some(CheckpointMeta)` if a checkpoint exists, or `None` on
    /// first boot (empty storage).
    ///
    /// # Errors
    ///
    /// - `LumenError::Storage` on I/O failure
    /// - `LumenError::Internal` on deserialization failure (corrupt checkpoint)
    pub fn restore(&self, storage: &dyn StorageEngine) -> crate::Result<Option<CheckpointMeta>> {
        // Read checkpoint metadata first.
        let meta_key = encode_key(EntityId::new(0), Tag::Sig, META_SUFFIX);
        let meta = match storage.get(&meta_key)? {
            None => None,
            Some(meta_bytes) => Some(
                deserialize_meta(&meta_bytes)
                    .map_err(|e| LumenError::Internal(format!("corrupt checkpoint meta: {e}")))?,
            ),
        };
        // Scan all keys; keep only Tag::Sig entries, skipping the meta key.
        // TECH DEBT: scan_prefix(&[]) iterates the entire keyspace. This is safe
        // today (signals are the only key namespace), but must be replaced with a
        // Tag::Sig-scoped scan (e.g. `scan_tag(Tag::Sig)`) once M1P5 adds entity,
        // index, and embedding key namespaces to avoid iterating unrelated data.
        for item in storage.scan_prefix(&[]) {
            let (key, value) = item?;
            if let Some((entity_id, Tag::Sig, suffix)) = parse_key(&key) {
                // Skip the meta key (entity_id=0, suffix=b"meta").
                if entity_id == EntityId::new(0) && suffix == META_SUFFIX {
                    continue;
                }
                let (eid, stid, entry) = deserialize_entry(&value)
                    .map_err(|e| LumenError::Internal(format!("corrupt checkpoint entry: {e}")))?;
                self.entries.insert((eid, stid), entry);
            }
        }
        Ok(meta)
    }
    /// Return the number of entries currently in the `DashMap`.
    ///
    /// Used for diagnostics and testing.
    #[must_use]
    pub fn entry_count(&self) -> usize {
        self.entries.len()
    }
 }
 // ── Tests ─────────────────────────────────────────────────────────────────────
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use std::time::Duration;
    use super::*;
    use crate::schema::{DecaySpec, EntityKind, SchemaBuilder, Timestamp, Window};
    use crate::signals::ledger::NoopWalWriter;
    use crate::storage::InMemoryBackend;
    fn test_schema() -> crate::schema::Schema {
        let mut builder = SchemaBuilder::new();
        let _ = builder
            .signal(
                "view",
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(7 * 24 * 3600),
                },
            )
            .windows(&[Window::OneHour, Window::AllTime])
            .velocity(true)
            .add();
        builder.build().expect("valid test schema")
    }
    // ── Serialization unit tests ───────────────────────────────────────────────
    #[test]
    fn serialize_entry_version_byte() {
        let entry = EntitySignalEntry {
            hot: HotSignalState::new(1, 0),
            warm: BucketedCounter::new(),
        };
        let bytes = serialize_entry(EntityId::new(1), SignalTypeId::new(0), &entry);
        assert_eq!(bytes[0], 0x01, "version byte should be 0x01");
    }
    #[test]
    fn serialize_entry_correct_length() {
        let entry = EntitySignalEntry {
            hot: HotSignalState::new(42, 3),
            warm: BucketedCounter::new(),
        };
        let bytes = serialize_entry(EntityId::new(42), SignalTypeId::new(3), &entry);
        assert_eq!(bytes.len(), ENTRY_SIZE);
    }
    #[test]
    fn deserialize_entry_rejects_wrong_version() {
        let bytes = vec![0x00u8; ENTRY_SIZE];
        assert!(deserialize_entry(&bytes).is_err());
    }
    #[test]
    fn deserialize_entry_rejects_truncated_data() {
        let result = deserialize_entry(&[0x01, 0x00]);
        assert!(result.is_err());
    }
    #[test]
    fn deserialize_entry_rejects_wrong_length() {
        let bytes = vec![0x01u8; ENTRY_SIZE - 1];
        assert!(deserialize_entry(&bytes).is_err());
    }
    #[test]
    fn serialize_deserialize_entry_roundtrip() {
        let entity_id = EntityId::new(99);
        let signal_type_id = SignalTypeId::new(2);
        let hot = HotSignalState::with_flags(99, 2, true);
        hot.restore(1_000_000_000_000, &[3.125, 2.71, 1.41]);
        let warm = BucketedCounter::with_start_time(1_000_000_000_000);
        warm.increment(1_000_000_000_000);
        warm.increment(1_000_000_000_001);
        let entry = EntitySignalEntry { hot, warm };
        let bytes = serialize_entry(entity_id, signal_type_id, &entry);
        assert_eq!(bytes.len(), ENTRY_SIZE);
        let (eid, stid, restored) = deserialize_entry(&bytes).expect("deserialize ok");
        assert_eq!(eid, entity_id);
        assert_eq!(stid, signal_type_id);
        assert!((restored.hot.stored_score(0) - 3.125).abs() < 1e-15);
        assert!((restored.hot.stored_score(1) - 2.71).abs() < 1e-15);
        assert!((restored.hot.stored_score(2) - 1.41).abs() < 1e-15);
        assert_eq!(restored.hot.last_update_ns(), 1_000_000_000_000);
        assert!(restored.hot.velocity_enabled());
        assert_eq!(restored.warm.all_time_count(), 2);
    }
    // ── Meta serialization tests ───────────────────────────────────────────────
    #[test]
    fn serialize_meta_correct_length() {
        let meta = CheckpointMeta {
            checkpoint_time_ns: 123_456,
            wal_sequence: 78,
        };
        let bytes = serialize_meta(&meta);
        assert_eq!(bytes.len(), META_SIZE);
        assert_eq!(bytes[0], 0x01);
    }
    #[test]
    fn deserialize_meta_roundtrip() {
        let meta = CheckpointMeta {
            checkpoint_time_ns: 1_700_000_000_000_000_000,
            wal_sequence: 42_000,
        };
        let bytes = serialize_meta(&meta);
        let restored = deserialize_meta(&bytes).expect("ok");
        assert_eq!(restored, meta);
    }
    #[test]
    fn deserialize_meta_rejects_wrong_version() {
        let mut bytes = serialize_meta(&CheckpointMeta {
            checkpoint_time_ns: 1,
            wal_sequence: 1,
        });
        bytes[0] = 0xFF;
        assert!(deserialize_meta(&bytes).is_err());
    }
    #[test]
    fn deserialize_meta_rejects_truncated() {
        assert!(deserialize_meta(&[0x01, 0x00]).is_err());
    }
    // ── Checkpoint/restore integration tests ──────────────────────────────────
    #[test]
    fn checkpoint_to_empty_storage() {
        let schema = test_schema();
        let ledger = SignalLedger::new(schema, Box::new(NoopWalWriter));
        let now = Timestamp::now();
        for i in 0..10u64 {
            ledger
                .record_signal("view", EntityId::new(i + 1), 1.0, now)
                .expect("record");
        }
        let storage = InMemoryBackend::new();
        let meta = CheckpointMeta {
            checkpoint_time_ns: now.as_nanos(),
            wal_sequence: 100,
        };
        ledger.checkpoint(&storage, meta).expect("checkpoint");
        // Expect meta key + 10 entry keys = 11 total keys.
        let all_keys: Vec<_> = storage
            .scan_prefix(&[])
            .collect::<Result<_, _>>()
            .expect("scan ok");
        assert_eq!(
            all_keys.len(),
            11,
            "expected 11 keys, got {}",
            all_keys.len()
        );
    }
    #[test]
    fn restore_from_empty_storage() {
        let schema = test_schema();
        let ledger = SignalLedger::new(schema, Box::new(NoopWalWriter));
        let storage = InMemoryBackend::new();
        let meta = ledger.restore(&storage).expect("restore ok");
        assert!(meta.is_none(), "empty storage should return None");
        assert_eq!(ledger.entry_count(), 0);
    }
    #[test]
    fn restore_preserves_decay_scores() {
        let schema = test_schema();
        let ledger = SignalLedger::new(schema.clone(), Box::new(NoopWalWriter));
        let ts1 = Timestamp::from_nanos(1_000_000_000_000);
        let ts2 = Timestamp::from_nanos(1_001_000_000_000);
        ledger
            .record_signal("view", EntityId::new(42), 5.0, ts1)
            .expect("record 1");
        ledger
            .record_signal("view", EntityId::new(42), 3.0, ts2)
            .expect("record 2");
        let storage = InMemoryBackend::new();
        let meta = CheckpointMeta {
            checkpoint_time_ns: 1_002_000_000_000,
            wal_sequence: 50,
        };
        ledger.checkpoint(&storage, meta).expect("checkpoint");
        let ledger2 = SignalLedger::new(schema, Box::new(NoopWalWriter));
        let restored_meta = ledger2
            .restore(&storage)
            .expect("restore")
            .expect("some meta");
        assert_eq!(restored_meta.wal_sequence, 50);
        let score_orig = ledger
            .read_decay_score(EntityId::new(42), "view", 0)
            .expect("ok");
        let score_rest = ledger2
            .read_decay_score(EntityId::new(42), "view", 0)
            .expect("ok");
        assert!(score_orig.is_some());
        assert!(score_rest.is_some());
    }
    #[test]
    fn restore_preserves_windowed_counts() {
        let schema = test_schema();
        let ledger = SignalLedger::new(schema.clone(), Box::new(NoopWalWriter));
        let base_ns = 1_000_000_000_000u64;
        for i in 0..100u64 {
            let ts = Timestamp::from_nanos(base_ns + i * 100_000_000);
            ledger
                .record_signal("view", EntityId::new(1), 1.0, ts)
                .expect("record");
        }
        let storage = InMemoryBackend::new();
        let meta = CheckpointMeta {
            checkpoint_time_ns: base_ns + 10_000_000_000,
            wal_sequence: 0,
        };
        ledger.checkpoint(&storage, meta).expect("checkpoint");
        let ledger2 = SignalLedger::new(schema, Box::new(NoopWalWriter));
        ledger2.restore(&storage).expect("restore");
        let count_orig = ledger
            .read_windowed_count(EntityId::new(1), "view", Window::AllTime)
            .expect("ok");
        let count_rest = ledger2
            .read_windowed_count(EntityId::new(1), "view", Window::AllTime)
            .expect("ok");
        assert_eq!(count_orig, count_rest);
        assert_eq!(count_rest, 100);
    }
    #[test]
    fn checkpoint_overwrites_previous() {
        let schema = test_schema();
        let ledger = SignalLedger::new(schema.clone(), Box::new(NoopWalWriter));
        let storage = InMemoryBackend::new();
        let ts = Timestamp::now();
        // First checkpoint: 5 entities.
        for i in 0..5u64 {
            ledger
                .record_signal("view", EntityId::new(i + 1), 1.0, ts)
                .expect("record");
        }
        ledger
            .checkpoint(
                &storage,
                CheckpointMeta {
                    checkpoint_time_ns: 1,
                    wal_sequence: 10,
                },
            )
            .expect("checkpoint 1");
        // Add 3 more entities, then second checkpoint: 8 entities total.
        for i in 5..8u64 {
            ledger
                .record_signal("view", EntityId::new(i + 1), 1.0, ts)
                .expect("record");
        }
        ledger
            .checkpoint(
                &storage,
                CheckpointMeta {
                    checkpoint_time_ns: 2,
                    wal_sequence: 20,
                },
            )
            .expect("checkpoint 2");
        let ledger2 = SignalLedger::new(schema, Box::new(NoopWalWriter));
        let restored_meta = ledger2
            .restore(&storage)
            .expect("restore")
            .expect("some meta");
        assert_eq!(restored_meta.wal_sequence, 20);
        assert_eq!(ledger2.entry_count(), 8);
    }
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod proptests {
    use std::time::Duration;
    use proptest::prelude::*;
    use super::*;
    use crate::schema::{DecaySpec, EntityKind, SchemaBuilder, Timestamp, Window};
    use crate::signals::ledger::NoopWalWriter;
    use crate::storage::InMemoryBackend;
    fn test_schema() -> crate::schema::Schema {
        let mut builder = SchemaBuilder::new();
        let _ = builder
            .signal(
                "view",
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(7 * 24 * 3600),
                },
            )
            .windows(&[Window::AllTime])
            .velocity(false)
            .add();
        builder.build().expect("valid schema")
    }
    // Meta serialization roundtrip for arbitrary (sampled) u64 pairs.
    proptest! {
        #[test]
        fn serialize_deserialize_meta_roundtrip(
            checkpoint_time_ns: u64,
            wal_sequence: u64,
        ) {
            let meta = CheckpointMeta { checkpoint_time_ns, wal_sequence };
            let bytes = serialize_meta(&meta);
            let restored = deserialize_meta(&bytes).unwrap();
            prop_assert_eq!(restored, meta);
        }
    }
    // Entry serialization roundtrip for arbitrary hot-tier state.
    proptest! {
        #[test]
        fn serialize_deserialize_entry_roundtrip(
            entity_id_val in 1u64..1_000_000,
            signal_type_id_val in 0u16..64,
            score_0 in 0.0f64..1e12,
            score_1 in 0.0f64..1e12,
            score_2 in 0.0f64..1e12,
            last_update in 0u64..2_000_000_000_000u64,
            all_time in 0u64..1_000_000,
        ) {
            let entity_id = EntityId::new(entity_id_val);
            let signal_type_id = SignalTypeId::new(signal_type_id_val);
            let hot = HotSignalState::new(entity_id_val, signal_type_id_val);
            hot.restore(last_update, &[score_0, score_1, score_2]);
            let warm = BucketedCounter::new();
            warm.increment_by(all_time as u32, 0);
            let entry = EntitySignalEntry { hot, warm };
            let bytes = serialize_entry(entity_id, signal_type_id, &entry);
            let (eid, stid, restored) = deserialize_entry(&bytes).unwrap();
            prop_assert_eq!(eid, entity_id);
            prop_assert_eq!(stid, signal_type_id);
            prop_assert!((restored.hot.stored_score(0) - score_0).abs() < 1e-15);
            prop_assert!((restored.hot.stored_score(1) - score_1).abs() < 1e-15);
            prop_assert!((restored.hot.stored_score(2) - score_2).abs() < 1e-15);
            prop_assert_eq!(restored.hot.last_update_ns(), last_update);
        }
    }
    // Full checkpoint-restore roundtrip.
    proptest! {
        #[test]
        fn checkpoint_restore_roundtrip(
            entity_count in 1usize..50,
            signals_per_entity in 1usize..20,
        ) {
            let schema = test_schema();
            let ledger = SignalLedger::new(schema.clone(), Box::new(NoopWalWriter));
            let base_ns = 1_000_000_000_000u64;
            for entity in 0..entity_count as u64 {
                for i in 0..signals_per_entity {
                    let ts = Timestamp::from_nanos(base_ns + (i as u64) * 1_000_000_000);
                    ledger
                        .record_signal("view", EntityId::new(entity + 1), 1.0, ts)
                        .unwrap();
                }
            }
            let storage = InMemoryBackend::new();
            let meta = CheckpointMeta { checkpoint_time_ns: base_ns, wal_sequence: 42 };
            ledger.checkpoint(&storage, meta).unwrap();
            let ledger2 = SignalLedger::new(schema, Box::new(NoopWalWriter));
            let restored_meta = ledger2.restore(&storage).unwrap();
            prop_assert_eq!(restored_meta, Some(meta));
            prop_assert_eq!(ledger2.entry_count(), ledger.entry_count());
            for entity in 0..entity_count as u64 {
                let eid = EntityId::new(entity + 1);
                let orig_count = ledger
                    .read_windowed_count(eid, "view", Window::AllTime)
                    .unwrap();
                let rest_count = ledger2
                    .read_windowed_count(eid, "view", Window::AllTime)
                    .unwrap();
                prop_assert_eq!(orig_count, rest_count, "entity {}: all-time count mismatch", entity);
            }
        }
    }
 }
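Note on the checkpoint meta record: the offsets in `deserialize_meta` above imply a 17-byte layout, with one version byte followed by `checkpoint_time_ns` at `[1..9]` and `wal_sequence` at `[9..17]`, both little-endian. The following is a minimal standalone sketch of that layout, not the crate's code. `META_SIZE = 17` and the error strings are assumptions inferred from the offsets; `VERSION = 0x01` is taken from the unit tests.

```rust
// Hypothetical stand-ins that mirror the names in the diff.
const VERSION: u8 = 0x01;
const META_SIZE: usize = 17; // assumed: 1 version byte + 8 + 8

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CheckpointMeta {
    checkpoint_time_ns: u64,
    wal_sequence: u64,
}

/// Encode as: [version][checkpoint_time_ns LE][wal_sequence LE].
fn serialize_meta(meta: &CheckpointMeta) -> Vec<u8> {
    let mut out = Vec::with_capacity(META_SIZE);
    out.push(VERSION);
    out.extend_from_slice(&meta.checkpoint_time_ns.to_le_bytes());
    out.extend_from_slice(&meta.wal_sequence.to_le_bytes());
    out
}

/// Decode, rejecting wrong-length input and unknown version bytes.
fn deserialize_meta(bytes: &[u8]) -> Result<CheckpointMeta, String> {
    if bytes.len() != META_SIZE {
        return Err(format!("expected {} bytes, got {}", META_SIZE, bytes.len()));
    }
    if bytes[0] != VERSION {
        return Err(format!("unknown checkpoint meta version 0x{:02x}", bytes[0]));
    }
    // Length was checked above, so these fixed-size copies cannot panic.
    let mut time = [0u8; 8];
    time.copy_from_slice(&bytes[1..9]);
    let mut seq = [0u8; 8];
    seq.copy_from_slice(&bytes[9..17]);
    Ok(CheckpointMeta {
        checkpoint_time_ns: u64::from_le_bytes(time),
        wal_sequence: u64::from_le_bytes(seq),
    })
}
```

Checking the length before indexing is what lets the slice copies stay panic-free, which matches the intent of the truncated-input tests in the diff.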
--- a/tidal/src/signals/hot.rs
+++ b/tidal/src/signals/hot.rs
@@ -197,6 +197,14 @@ impl HotSignalState {
    ///
    /// Out-of-bounds `decay_rate_idx` is saturated to `MAX_DECAY_RATES - 1` to
    /// avoid panicking on the hot path.
    ///
    /// # Concurrency
    ///
    /// Reads `last_update_ns` and the score without a lock. The score may
    /// reflect a slightly different timestamp than `query_time_ns` if a concurrent
    /// `on_signal` is in flight. This is intentional — approximate reads are
    /// acceptable for ranking. Callers must not rely on the score being exactly
    /// consistent with `query_time_ns`.
    #[must_use]
    #[allow(clippy::cast_precision_loss)]
    pub fn current_score(&self, decay_rate_idx: usize, query_time_ns: u64, lambda: f64) -> f64 {
@@ -267,6 +275,12 @@ impl fmt::Debug for HotSignalState {
 }
 #[cfg(test)]
 #[allow(
    clippy::unwrap_used,
    clippy::float_cmp,
    clippy::cast_sign_loss,
    clippy::cast_precision_loss
 )]
 mod tests {
    use super::*;
@@ -331,7 +345,7 @@ mod tests {
        let score = state.current_score(0, t1, lambda);
        // score = 1.0 * exp(-lambda * 1.0) + 1.0
-        let expected = 1.0_f64 * (-lambda * 1.0).exp() + 1.0;
+        let expected = 1.0_f64.mul_add((-lambda * 1.0).exp(), 1.0);
        assert!(
            (score - expected).abs() < 1e-10,
            "score={score}, expected={expected}"
@@ -352,8 +366,7 @@ mod tests {
        state.on_signal(1.0, t_early, &[lambda]);
        // Query at t=10s -- should match analytical result
-        let analytical = 1.0 * (-lambda * 0.0).exp() // event at t=10, age=0
-                       + 1.0 * (-lambda * 5.0).exp(); // event at t=5, age=5s
+        let analytical = 1.0f64.mul_add((-lambda * 0.0).exp(), 1.0 * (-lambda * 5.0).exp());
        let actual = state.current_score(0, t_late, lambda);
        assert!(
            (actual - analytical).abs() < 1e-10,
@@ -429,6 +442,7 @@ mod tests {
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used, clippy::cast_precision_loss)]
 mod proptests {
    use super::*;
    use proptest::prelude::*;
@@ -458,7 +472,7 @@ mod proptests {
            lambda in 1e-7f64..1e-3,
        ) {
            // Sort events by time for in-order processing
-            let mut sorted_events = events.clone();
+            let mut sorted_events = events;
            sorted_events.sort_by_key(|e| e.1);
            let query_time_ns = sorted_events.last().expect("events non-empty").1 + 1_000_000_000; // +1 second
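Note on the decay tests: the `expected` and `analytical` values in the hunks above compare a running score against a closed-form sum. The following is a minimal standalone sketch of that relationship for in-order events only; it is not `HotSignalState` itself and omits the CAS loop and out-of-order handling. The function names, the use of seconds as `f64`, and the `lambda = ln(2) / half_life` conversion are assumptions for illustration.

```rust
/// Assumed conversion from a half-life (seconds) to a decay constant.
fn lambda_from_half_life(half_life_secs: f64) -> f64 {
    std::f64::consts::LN_2 / half_life_secs
}

/// Running score: decay the stored value to each event time, add the new
/// weight, then decay once more to the query time.
/// `events` are `(weight, time_secs)` pairs sorted by non-negative time.
fn running_score(events: &[(f64, f64)], query_time_secs: f64, lambda: f64) -> f64 {
    let mut score = 0.0_f64;
    let mut last_t = 0.0_f64;
    for &(weight, t) in events {
        // score * exp(-lambda * dt) + weight, fused as in the diff's tests.
        score = score.mul_add((-lambda * (t - last_t)).exp(), weight);
        last_t = t;
    }
    score * (-lambda * (query_time_secs - last_t)).exp()
}

/// Closed form: each event contributes weight * exp(-lambda * age).
fn analytical_score(events: &[(f64, f64)], query_time_secs: f64, lambda: f64) -> f64 {
    events
        .iter()
        .map(|&(weight, t)| weight * (-lambda * (query_time_secs - t)).exp())
        .sum()
}
```

The two agree because decay factors compose multiplicatively, so only one score and one timestamp need to be stored per entry instead of the full event history.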
--- a/tidal/src/signals/ledger.rs
+++ b/tidal/src/signals/ledger.rs
@@ -0,0 +1,593 @@
 //! Signal ledger: top-level coordinator for hot-tier decay scores and
 //! warm-tier bucketed counters across all active entities.
 //!
 //! `SignalLedger` is the single entry point for signal state management.
 //! It owns a `DashMap<(EntityId, SignalTypeId), EntitySignalEntry>` that
 //! provides concurrent access to per-entity signal state.
 //!
 //! # WAL integration
 //!
 //! Every `record_signal()` call first appends the event to the WAL via the
 //! `WalWriter` trait. Only after the WAL confirms durability does the ledger
 //! update in-memory state. This ensures signals survive crashes.
 //!
 //! # Concurrency
 //!
 //! Multiple threads can write signals to different entities simultaneously.
 //! Writers whose keys hash to the same `DashMap` shard contend on that shard's
 //! lock, which `record_signal()` holds until it drops the entry guard. The state
 //! update itself (CAS on hot tier, atomic increment on warm tier) uses atomics.
 use std::collections::HashMap;
 use std::fmt;
 use dashmap::DashMap;
 use crate::schema::{DecayModel, Schema};
 use crate::schema::{EntityId, LumenError, SchemaError, Timestamp, Window};
 use super::SignalTypeId;
 use super::hot::HotSignalState;
 use super::warm::BucketedCounter;
 // ── WAL boundary ─────────────────────────────────────────────────────────────
 /// Trait boundary for WAL integration.
 ///
 /// `m1p2` provides the real implementation. `m1p4` tests use `NoopWalWriter`.
 /// The `SignalLedger` calls `append_signal()` before updating in-memory state,
 /// ensuring WAL-first durability semantics.
 pub trait WalWriter: Send + Sync {
    /// Append a signal event to the WAL.
    ///
    /// Returns `Ok(())` once the event is durably committed. The ledger
    /// updates in-memory state only after this returns `Ok(())`.
    ///
    /// # Errors
    ///
    /// Returns `LumenError::Durability` if the WAL write fails.
    fn append_signal(
        &self,
        signal_type_id: SignalTypeId,
        entity_id: EntityId,
        weight: f64,
        timestamp: Timestamp,
    ) -> crate::Result<()>;
 }
 /// No-op WAL writer for testing. Always succeeds without writing anything.
 pub struct NoopWalWriter;
 impl WalWriter for NoopWalWriter {
    fn append_signal(
        &self,
        _signal_type_id: SignalTypeId,
        _entity_id: EntityId,
        _weight: f64,
        _timestamp: Timestamp,
    ) -> crate::Result<()> {
        Ok(())
    }
 }
 impl fmt::Debug for NoopWalWriter {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("NoopWalWriter")
    }
 }
 // ── Entry ─────────────────────────────────────────────────────────────────────
 /// Combined hot-tier and warm-tier state for one entity-signal pair.
 pub struct EntitySignalEntry {
    /// Running exponentially-decayed score (hot tier).
    pub hot: HotSignalState,
    /// Bucketed windowed event counters (warm tier).
    pub warm: BucketedCounter,
 }
 impl fmt::Debug for EntitySignalEntry {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("EntitySignalEntry")
            .field("hot", &self.hot)
            .field("warm", &self.warm)
            .finish()
    }
 }
 // ── Ledger ────────────────────────────────────────────────────────────────────
 /// The signal ledger: coordinates hot and warm tiers for all active entities.
 ///
 /// This is the single entry point for signal state management. `TidalDB`
 /// holds a `SignalLedger` and delegates all signal operations to it.
 pub struct SignalLedger {
    /// Per-(entity, `signal_type`) state. Sharded for concurrent access.
    pub(crate) entries: DashMap<(EntityId, SignalTypeId), EntitySignalEntry>,
    /// WAL writer for durability.
    wal: Box<dyn WalWriter>,
    /// Schema for signal type lookup and lambda retrieval.
    schema: Schema,
    /// Signal name → `SignalTypeId` mapping (built at construction, immutable).
    signal_name_to_id: HashMap<String, SignalTypeId>,
    /// `SignalTypeId` → lambda array (cached from schema, immutable after construction).
    signal_lambdas: HashMap<SignalTypeId, Vec<f64>>,
 }
 impl SignalLedger {
    /// Construct a new ledger from a validated schema and WAL writer.
    ///
    /// Signal types are sorted by name (byte-wise `str` order) and assigned
    /// sequential `SignalTypeId` values (0, 1, 2, …).
    #[must_use]
    pub fn new(schema: Schema, wal: Box<dyn WalWriter>) -> Self {
        let mut signal_list: Vec<&crate::schema::SignalTypeDef> = schema.signals().collect();
        signal_list.sort_by_key(|s| s.name());
        let mut signal_name_to_id = HashMap::with_capacity(signal_list.len());
        let mut signal_lambdas = HashMap::with_capacity(signal_list.len());
        for (idx, sig) in signal_list.iter().enumerate() {
            let type_id = SignalTypeId::new(idx as u16);
            signal_name_to_id.insert(sig.name().to_owned(), type_id);
            let lambdas = match sig.decay() {
                DecayModel::Exponential { lambda, .. } => vec![*lambda],
                DecayModel::Linear { .. } | DecayModel::Permanent => vec![],
            };
            signal_lambdas.insert(type_id, lambdas);
        }
        Self {
            entries: DashMap::with_shard_amount(16),
            wal,
            schema,
            signal_name_to_id,
            signal_lambdas,
        }
    }
    /// Record a signal event for an entity.
    ///
    /// Steps:
    /// 1. Resolve signal type name → `SignalTypeId`
    /// 2. Append event to WAL (WAL-first)
    /// 3. Get or create `EntitySignalEntry` in `DashMap`
    /// 4. Update hot-tier decay score
    /// 5. Update warm-tier windowed counters
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type_name` is not defined
    /// - `LumenError::Durability` if WAL write fails
    ///
    /// # Durability contract
    ///
    /// The WAL is written **before** in-memory state is updated. If
    /// `wal.append_signal()` returns `Err`, the in-memory state is NOT updated
    /// and the error is returned to the caller. Implementors of `WalWriter`
    /// MUST guarantee persistence before returning `Ok(())`.
    pub fn record_signal(
        &self,
        signal_type_name: &str,
        entity_id: EntityId,
        weight: f64,
        timestamp: Timestamp,
    ) -> crate::Result<()> {
        let type_id = self.resolve_signal_type(signal_type_name)?;
        // WAL-first: durability before in-memory update.
        self.wal
            .append_signal(type_id, entity_id, weight, timestamp)?;
        let lambdas = self
            .signal_lambdas
            .get(&type_id)
            .map_or_else(<&[f64]>::default, Vec::as_slice);
        let ts_ns = timestamp.as_nanos();
        let entry = self
            .entries
            .entry((entity_id, type_id))
            .or_insert_with(|| EntitySignalEntry {
                hot: HotSignalState::new(entity_id.as_u64(), type_id.as_u16()),
                warm: BucketedCounter::with_start_time(ts_ns),
            });
        entry.hot.on_signal(weight, ts_ns, lambdas);
        entry.warm.increment(ts_ns);
        // Explicitly drop the DashMap entry ref to release the shard lock before
        // returning (satisfies clippy::significant_drop_tightening).
        drop(entry);
        Ok(())
    }
    /// Read the current decay score for an entity-signal pair.
    ///
    /// Applies lazy decay from the stored timestamp to the current wall-clock time.
    /// Returns `None` if no signals have been recorded for this entity-signal pair.
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type_name` is not defined
    pub fn read_decay_score(
        &self,
        entity_id: EntityId,
        signal_type_name: &str,
        decay_rate_idx: usize,
    ) -> crate::Result<Option<f64>> {
        let type_id = self.resolve_signal_type(signal_type_name)?;
        let now_ns = Timestamp::now().as_nanos();
        let lambda = self
            .signal_lambdas
            .get(&type_id)
            .and_then(|v| v.get(decay_rate_idx))
            .copied()
            .unwrap_or(0.0);
        match self.entries.get(&(entity_id, type_id)) {
            None => Ok(None),
            Some(entry) => Ok(Some(entry.hot.current_score(
                decay_rate_idx,
                now_ns,
                lambda,
            ))),
        }
    }
    /// Read the windowed event count for an entity-signal pair.
    ///
    /// Returns 0 if no signals have been recorded for this entity-signal pair.
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type_name` is not defined
    pub fn read_windowed_count(
        &self,
        entity_id: EntityId,
        signal_type_name: &str,
        window: Window,
    ) -> crate::Result<u64> {
        let type_id = self.resolve_signal_type(signal_type_name)?;
        match self.entries.get(&(entity_id, type_id)) {
            None => Ok(0),
            Some(entry) => Ok(entry.warm.windowed_count(window)),
        }
    }
    /// Read the velocity (events per second) for an entity-signal-window.
    ///
    /// Velocity = `windowed_count / window_duration_seconds`.
    /// `AllTime` returns 0.0 (velocity is undefined for unbounded windows).
    /// Returns 0.0 if no signals have been recorded for this entity-signal pair.
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type_name` is not defined
    pub fn read_velocity(
        &self,
        entity_id: EntityId,
        signal_type_name: &str,
        window: Window,
    ) -> crate::Result<f64> {
        let count = self.read_windowed_count(entity_id, signal_type_name, window)?;
        let duration_secs = window.duration_secs_f64();
        if duration_secs.is_infinite() {
            // AllTime window — velocity is undefined.
            return Ok(0.0);
        }
        #[allow(clippy::cast_precision_loss)]
        Ok(count as f64 / duration_secs)
    }
    /// Resolve a signal type name to its `SignalTypeId`.
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if the name is not defined in this ledger's schema
    pub fn resolve_signal_type(&self, name: &str) -> crate::Result<SignalTypeId> {
        self.signal_name_to_id
            .get(name)
            .copied()
            .ok_or_else(|| LumenError::Schema(SchemaError::UnknownSignalType(name.to_owned())))
    }
    /// Reference to the `DashMap` for checkpoint iteration.
    pub(crate) const fn entries(&self) -> &DashMap<(EntityId, SignalTypeId), EntitySignalEntry> {
        &self.entries
    }
    /// The schema this ledger was constructed with.
    #[must_use]
    pub const fn schema(&self) -> &Schema {
        &self.schema
    }
 }
 impl fmt::Debug for SignalLedger {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("SignalLedger")
            .field("entry_count", &self.entries.len())
            .field("signal_count", &self.schema.signal_count())
            .finish_non_exhaustive()
    }
 }
 // ── Tests ─────────────────────────────────────────────────────────────────────
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use std::time::Duration;
    use super::*;
    use crate::schema::{DecaySpec, EntityKind, SchemaBuilder, Window};
    fn test_schema() -> Schema {
        let mut builder = SchemaBuilder::new();
        let _ = builder
            .signal(
                "view",
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(7 * 24 * 3600),
                },
            )
            .windows(&[
                Window::OneHour,
                Window::TwentyFourHours,
                Window::SevenDays,
                Window::AllTime,
            ])
            .velocity(true)
            .add();
        builder.build().expect("valid test schema")
    }
    fn test_schema_multi() -> Schema {
        let mut builder = SchemaBuilder::new();
        let _ = builder
            .signal(
                "like",
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(14 * 24 * 3600),
                },
            )
            .windows(&[Window::AllTime])
            .velocity(false)
            .add();
        let _ = builder
            .signal(
                "skip",
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(24 * 3600),
                },
            )
            .windows(&[Window::AllTime])
            .velocity(false)
            .add();
        let _ = builder
            .signal(
                "view",
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(7 * 24 * 3600),
                },
            )
            .windows(&[Window::AllTime])
            .velocity(false)
            .add();
        builder.build().expect("valid multi-signal test schema")
    }
    #[test]
    fn ledger_record_and_read() {
        let schema = test_schema();
        let ledger = SignalLedger::new(schema, Box::new(NoopWalWriter));
        let entity_id = EntityId::new(42);
        let now = Timestamp::now();
        ledger
            .record_signal("view", entity_id, 1.0, now)
            .expect("record");
        let score = ledger
            .read_decay_score(entity_id, "view", 0)
            .expect("score");
        assert!(score.is_some());
        assert!(score.expect("some") > 0.0);
        let count = ledger
            .read_windowed_count(entity_id, "view", Window::OneHour)
            .expect("count");
        assert_eq!(count, 1);
        let all_time = ledger
            .read_windowed_count(entity_id, "view", Window::AllTime)
            .expect("all_time");
        assert_eq!(all_time, 1);
    }
    #[test]
    fn ledger_unknown_signal_type_returns_error() {
        let schema = test_schema();
        let ledger = SignalLedger::new(schema, Box::new(NoopWalWriter));
        let result = ledger.record_signal("nonexistent", EntityId::new(1), 1.0, Timestamp::now());
        assert!(result.is_err());
    }
    #[test]
    fn ledger_read_nonexistent_entity_returns_none() {
        let schema = test_schema();
        let ledger = SignalLedger::new(schema, Box::new(NoopWalWriter));
        let score = ledger
            .read_decay_score(EntityId::new(999), "view", 0)
            .expect("ok");
        assert!(score.is_none());
        let count = ledger
            .read_windowed_count(EntityId::new(999), "view", Window::OneHour)
            .expect("ok");
        assert_eq!(count, 0);
        let velocity = ledger
            .read_velocity(EntityId::new(999), "view", Window::OneHour)
            .expect("ok");
        assert!((velocity - 0.0).abs() < 1e-15);
    }
    #[test]
    fn ledger_velocity_all_time_is_zero() {
        let schema = test_schema();
        let ledger = SignalLedger::new(schema, Box::new(NoopWalWriter));
        let entity_id = EntityId::new(1);
        ledger
            .record_signal("view", entity_id, 1.0, Timestamp::now())
            .expect("record");
        let velocity = ledger
            .read_velocity(entity_id, "view", Window::AllTime)
            .expect("ok");
        assert!(
            (velocity - 0.0).abs() < 1e-15,
            "all-time velocity should be 0.0"
        );
    }
    #[test]
    fn ledger_multiple_signal_types() {
        let schema = test_schema_multi();
        let ledger = SignalLedger::new(schema, Box::new(NoopWalWriter));
        let entity_id = EntityId::new(1);
        let now = Timestamp::now();
        ledger
            .record_signal("view", entity_id, 1.0, now)
            .expect("record view");
        ledger
            .record_signal("like", entity_id, 1.0, now)
            .expect("record like");
        let view_count = ledger
            .read_windowed_count(entity_id, "view", Window::AllTime)
            .expect("ok");
        let like_count = ledger
            .read_windowed_count(entity_id, "like", Window::AllTime)
            .expect("ok");
        let skip_count = ledger
            .read_windowed_count(entity_id, "skip", Window::AllTime)
            .expect("ok");
        assert_eq!(view_count, 1);
        assert_eq!(like_count, 1);
        assert_eq!(skip_count, 0);
    }
    #[test]
    fn ledger_multiple_entities() {
        let schema = test_schema();
        let ledger = SignalLedger::new(schema, Box::new(NoopWalWriter));
        let now = Timestamp::now();
        ledger
            .record_signal("view", EntityId::new(1), 1.0, now)
            .expect("1");
        ledger
            .record_signal("view", EntityId::new(2), 1.0, now)
            .expect("2a");
        ledger
            .record_signal("view", EntityId::new(2), 1.0, now)
            .expect("2b");
        let count1 = ledger
            .read_windowed_count(EntityId::new(1), "view", Window::AllTime)
            .expect("ok");
        let count2 = ledger
            .read_windowed_count(EntityId::new(2), "view", Window::AllTime)
            .expect("ok");
        assert_eq!(count1, 1);
        assert_eq!(count2, 2);
    }
    #[test]
    fn ledger_is_send_and_sync() {
        fn assert_send_sync<T: Send + Sync>() {}
        assert_send_sync::<SignalLedger>();
    }
    #[test]
    fn signal_type_id_newtype() {
        let id = SignalTypeId::new(5);
        assert_eq!(id.as_u16(), 5);
        assert_eq!(id.to_string(), "5");
        assert_eq!(id, SignalTypeId::new(5));
        assert_ne!(id, SignalTypeId::new(6));
    }
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used, clippy::cast_precision_loss)]
 mod proptests {
    use std::time::Duration;
    use proptest::prelude::*;
    use super::*;
    use crate::schema::{DecaySpec, EntityKind, SchemaBuilder, Window};
    fn test_schema() -> Schema {
        let mut builder = SchemaBuilder::new();
        let _ = builder
            .signal(
                "view",
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(7 * 24 * 3600),
                },
            )
            .windows(&[Window::AllTime])
            .velocity(true)
            .add();
        builder.build().expect("valid schema")
    }
    // Velocity equals windowed_count / window duration (checked here for the 1h window).
    proptest! {
        #[test]
        fn velocity_equals_count_over_duration(
            event_count in 1u64..1000,
        ) {
            let schema = test_schema();
            let ledger = SignalLedger::new(schema, Box::new(NoopWalWriter));
            let entity_id = EntityId::new(1);
            // All events in the current minute (within 1h window).
            let now = Timestamp::now();
            for i in 0..event_count {
                let ts = Timestamp::from_nanos(now.as_nanos() + i * 1_000_000);
                ledger.record_signal("view", entity_id, 1.0, ts).expect("record");
            }
            let count_1h = ledger
                .read_windowed_count(entity_id, "view", Window::OneHour)
                .expect("ok");
            let velocity_1h = ledger
                .read_velocity(entity_id, "view", Window::OneHour)
                .expect("ok");
            let expected_velocity = count_1h as f64 / Window::OneHour.duration_secs_f64();
            prop_assert!(
                (velocity_1h - expected_velocity).abs() < 1e-15,
                "velocity={velocity_1h}, expected={expected_velocity}"
            );
        }
    }
 }
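Review note: the ledger tests above pin down two properties of velocity. For a bounded window it equals the windowed count divided by the window length in seconds, and for the all-time window it is `0.0`. The sketch below restates that contract in isolation. It is not the ledger's implementation: the `velocity` helper and its `Option<f64>` parameter are hypothetical names chosen for illustration.

```rust
/// Hypothetical helper mirroring what the tests assert; not the ledger's code.
/// `window_secs` is `None` for an unbounded (all-time) window.
fn velocity(count: u64, window_secs: Option<f64>) -> f64 {
    match window_secs {
        // Bounded window: events per second over the window length.
        Some(secs) if secs > 0.0 => count as f64 / secs,
        // Unbounded or degenerate window: no meaningful rate, report 0.0.
        _ => 0.0,
    }
}

fn main() {
    let one_hour_secs = 3600.0;
    println!("1h velocity:       {}", velocity(7200, Some(one_hour_secs)));
    println!("all-time velocity: {}", velocity(7200, None));
}
```

Returning `0.0` rather than an error for the unbounded case matches the behaviour `ledger_velocity_all_time_is_zero` expects.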
--- a/tidal/src/signals/mod.rs
+++ b/tidal/src/signals/mod.rs
@ -1,3 +1,35 @@
 pub mod checkpoint;
 pub mod hot;
 pub mod ledger;
 pub mod warm;
 pub use hot::{HotSignalState, MAX_DECAY_RATES};
 pub use ledger::{EntitySignalEntry, NoopWalWriter, SignalLedger, WalWriter};
 pub use warm::{BucketedCounter, BucketedCounterSnapshot, HOUR_BUCKETS, MINUTE_BUCKETS};
 use std::fmt;
 /// A signal type index within the schema. Assigned by `SignalLedger` at construction.
 ///
 /// Maximum 64 signal types per entity kind (fits in a `u16`).
 /// The value 0 is valid and used for the first registered signal type.
 #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
 pub struct SignalTypeId(u16);
 impl SignalTypeId {
    #[must_use]
    pub const fn new(id: u16) -> Self {
        Self(id)
    }
    #[must_use]
    pub const fn as_u16(self) -> u16 {
        self.0
    }
 }
 impl fmt::Display for SignalTypeId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
 }
--- a/tidal/src/signals/warm.rs
+++ b/tidal/src/signals/warm.rs
@ -0,0 +1,571 @@
 //! Warm-tier bucketed event counter for windowed signal aggregation.
 //!
 //! `BucketedCounter` maintains per-minute and per-hour bucketed event counts
 //! for efficient windowed aggregation queries (1h, 24h, 7d). The design uses
 //! circular buffers with trigger-based rotation — rotation is performed inline
 //! on signal writes when enough time has elapsed, requiring no background thread.
 //!
 //! # Window query costs
 //!
 //! | Window | Operation             | Atomics           |
 //! |--------|-----------------------|-------------------|
 //! | 1h     | sum 60 minute buckets | 60 relaxed loads  |
 //! | 24h    | sum 24 hour buckets   | 24 relaxed loads  |
 //! | 7d     | sum 168 hour buckets  | 168 relaxed loads |
 //! | all    | single counter read   | 1 relaxed load    |
 //!
 //! # Ordering rationale
 //!
 //! Bucket reads/writes use `Ordering::Relaxed` because windowed counts are
 //! inherently approximate — a query at time T may see data from T ± 60s due
 //! to scheduling. The ranking system does not require exact counts.
 //!
 //! `current_minute` and `current_hour` are stored with `Release` and loaded
 //! with `Acquire`, so a reader that observes an advanced index also observes
 //! the zeroing (or overwrite) of that slot, which happens before the store.
 use std::fmt;
 use std::sync::atomic::{AtomicU8, AtomicU32, AtomicU64, Ordering};
 use crate::schema::Window;
 /// Number of per-minute bucket slots (covers 1 hour).
 pub const MINUTE_BUCKETS: usize = 60;
 /// Number of per-hour bucket slots (covers 7 days).
 pub const HOUR_BUCKETS: usize = 168;
 /// Nanoseconds per minute.
 const NS_PER_MIN: u64 = 60_000_000_000;
 /// Nanoseconds per hour.
 const NS_PER_HOUR: u64 = 3_600_000_000_000;
 /// Warm-tier bucketed event counter for a single signal type on a single entity.
 ///
 /// Supports simultaneous windowed count queries across 1h, 24h, 7d, and
 /// all-time windows by summing appropriate time-bucketed counters.
 ///
 /// # Design
 ///
 /// Per-minute buckets cover the last 60 minutes. Per-hour buckets cover the
 /// last 168 hours (7 days). The all-time counter is unbounded.
 ///
 /// Bucket rotation is trigger-based, checked on each `increment()` call.
 /// No background thread is required.
 pub struct BucketedCounter {
    /// Per-minute event count buckets. Circular buffer of 60 slots.
    minute_buckets: [AtomicU32; MINUTE_BUCKETS],
    /// Per-hour event count buckets. Circular buffer of 168 slots.
    hour_buckets: [AtomicU32; HOUR_BUCKETS],
    /// Index of the current minute bucket (`0..=59`).
    current_minute: AtomicU8,
    /// Index of the current hour bucket (`0..=167`).
    current_hour: AtomicU8,
    /// All-time total event count.
    all_time_count: AtomicU64,
    /// Nanosecond timestamp of the last minute rotation.
    last_minute_rotation_ns: AtomicU64,
    /// Nanosecond timestamp of the last hour rotation.
    last_hour_rotation_ns: AtomicU64,
 }
 impl BucketedCounter {
    /// Construct a new counter with all buckets zeroed and rotation timestamps at 0.
    #[must_use]
    pub fn new() -> Self {
        Self {
            minute_buckets: std::array::from_fn(|_| AtomicU32::new(0)),
            hour_buckets: std::array::from_fn(|_| AtomicU32::new(0)),
            current_minute: AtomicU8::new(0),
            current_hour: AtomicU8::new(0),
            all_time_count: AtomicU64::new(0),
            last_minute_rotation_ns: AtomicU64::new(0),
            last_hour_rotation_ns: AtomicU64::new(0),
        }
    }
    /// Construct with initial rotation timestamps set to `now_ns`.
    ///
    /// Use this when the start time is known — it prevents spurious rotations
    /// on the first increment.
    #[must_use]
    pub fn with_start_time(now_ns: u64) -> Self {
        let counter = Self::new();
        counter
            .last_minute_rotation_ns
            .store(now_ns, Ordering::Relaxed);
        counter
            .last_hour_rotation_ns
            .store(now_ns, Ordering::Relaxed);
        counter
    }
    /// Increment the current minute bucket and all-time counter by 1.
    ///
    /// Triggers minute and/or hour rotation if enough time has elapsed
    /// since the last rotation (trigger-based, no background thread).
    pub fn increment(&self, now_ns: u64) {
        self.maybe_rotate(now_ns);
        let idx = self.current_minute.load(Ordering::Acquire) as usize;
        self.minute_buckets[idx].fetch_add(1, Ordering::Relaxed);
        self.all_time_count.fetch_add(1, Ordering::Relaxed);
    }
    /// Increment by a count other than 1 (for batch replay).
    pub fn increment_by(&self, count: u32, now_ns: u64) {
        self.maybe_rotate(now_ns);
        let idx = self.current_minute.load(Ordering::Acquire) as usize;
        self.minute_buckets[idx].fetch_add(count, Ordering::Relaxed);
        self.all_time_count
            .fetch_add(u64::from(count), Ordering::Relaxed);
    }
    /// Query the windowed event count for a given window.
    ///
    /// | Window            | Source                     |
    /// |-------------------|----------------------------|
    /// | `OneHour`         | sum last 60 minute buckets |
    /// | `TwentyFourHours` | sum last 24 hour buckets   |
    /// | `SevenDays`       | sum last 168 hour buckets  |
    /// | `ThirtyDays`      | not supported in M1 → 0    |
    /// | `AllTime`         | single atomic read         |
    #[must_use]
    pub fn windowed_count(&self, window: Window) -> u64 {
        match window {
            Window::OneHour => self.sum_last_n_minutes(MINUTE_BUCKETS),
            Window::TwentyFourHours => self.sum_last_n_hours(24),
            Window::SevenDays => self.sum_last_n_hours(HOUR_BUCKETS),
            Window::ThirtyDays => {
                tracing::warn!("ThirtyDays window not supported in M1; returning 0");
                0
            }
            Window::AllTime => self.all_time_count.load(Ordering::Relaxed),
        }
    }
    /// Read the all-time total event count.
    #[must_use]
    pub fn all_time_count(&self) -> u64 {
        self.all_time_count.load(Ordering::Relaxed)
    }
    /// Read the count in the current minute bucket only.
    ///
    /// Used for fine-grained velocity computation within the current minute.
    #[must_use]
    pub fn current_minute_count(&self) -> u32 {
        let idx = self.current_minute.load(Ordering::Acquire) as usize;
        self.minute_buckets[idx].load(Ordering::Relaxed)
    }
    /// Rotate the minute pointer: zero the next slot and advance `current_minute`.
    ///
    /// Returns the count from the expired bucket (the slot that was zeroed).
    /// Used internally and by the checkpoint restore path.
    pub fn rotate_minute(&self) -> u32 {
        let current = self.current_minute.load(Ordering::Acquire) as usize;
        let next = (current + 1) % MINUTE_BUCKETS;
        // Atomically zero the next slot and retrieve its old value.
        let expired = self.minute_buckets[next].swap(0, Ordering::Relaxed);
        self.current_minute.store(next as u8, Ordering::Release);
        expired
    }
    /// Rotate the hour pointer: store `minute_aggregate` in the next hour slot
    /// and advance `current_hour`.
    ///
    /// `minute_aggregate` is the sum of the last 60 minute buckets, provided
    /// by the caller. The next slot is overwritten (no need to zero first).
    pub fn rotate_hour(&self, minute_aggregate: u32) {
        let current = self.current_hour.load(Ordering::Acquire) as usize;
        let next = (current + 1) % HOUR_BUCKETS;
        self.hour_buckets[next].store(minute_aggregate, Ordering::Relaxed);
        self.current_hour.store(next as u8, Ordering::Release);
    }
    /// Snapshot all state for checkpoint serialization.
    #[must_use]
    pub fn snapshot(&self) -> BucketedCounterSnapshot {
        BucketedCounterSnapshot {
            minute_buckets: std::array::from_fn(|i| self.minute_buckets[i].load(Ordering::Relaxed)),
            hour_buckets: std::array::from_fn(|i| self.hour_buckets[i].load(Ordering::Relaxed)),
            current_minute: self.current_minute.load(Ordering::Acquire),
            current_hour: self.current_hour.load(Ordering::Acquire),
            all_time_count: self.all_time_count.load(Ordering::Relaxed),
            last_minute_rotation_ns: self.last_minute_rotation_ns.load(Ordering::Relaxed),
            last_hour_rotation_ns: self.last_hour_rotation_ns.load(Ordering::Relaxed),
        }
    }
    /// Restore from a checkpoint snapshot.
    ///
    /// # Bounds clamping
    ///
    /// `current_minute` and `current_hour` are clamped to valid array bounds
    /// before being stored. A corrupted or maliciously crafted snapshot with
    /// `current_minute` ≥ 60 would otherwise make `increment`, `increment_by`,
    /// and `current_minute_count` index `minute_buckets` out of bounds and
    /// panic. `current_hour` only ever indexes `hour_buckets` modulo
    /// `HOUR_BUCKETS`, but it is clamped as well so the stored index is always
    /// in range. Clamping makes restoration safe regardless of snapshot origin.
    pub fn restore(&self, snapshot: &BucketedCounterSnapshot) {
        for (i, &v) in snapshot.minute_buckets.iter().enumerate() {
            self.minute_buckets[i].store(v, Ordering::Relaxed);
        }
        for (i, &v) in snapshot.hour_buckets.iter().enumerate() {
            self.hour_buckets[i].store(v, Ordering::Relaxed);
        }
        // Clamp to valid bounds: an out-of-range `current_minute` from a corrupt
        // snapshot would panic on the next `increment` (direct array index).
        let current_minute = snapshot.current_minute.min((MINUTE_BUCKETS - 1) as u8);
        let current_hour = snapshot.current_hour.min((HOUR_BUCKETS - 1) as u8);
        self.current_minute.store(current_minute, Ordering::Release);
        self.current_hour.store(current_hour, Ordering::Release);
        self.all_time_count
            .store(snapshot.all_time_count, Ordering::Relaxed);
        self.last_minute_rotation_ns
            .store(snapshot.last_minute_rotation_ns, Ordering::Relaxed);
        self.last_hour_rotation_ns
            .store(snapshot.last_hour_rotation_ns, Ordering::Relaxed);
    }
    // ── Internal helpers ────────────────────────────────────────────────────────
    /// Check whether minute (and hour) rotation is needed based on `now_ns`,
    /// and perform it inline if so.
    ///
    /// Uses CAS on `last_minute_rotation_ns` so that, among callers racing on
    /// the same elapsed interval, only the CAS winner performs the rotation.
    fn maybe_rotate(&self, now_ns: u64) {
        let last_min = self.last_minute_rotation_ns.load(Ordering::Relaxed);
        if now_ns < last_min.saturating_add(NS_PER_MIN) {
            return;
        }
        let elapsed = now_ns.saturating_sub(last_min);
        let minutes_elapsed = (elapsed / NS_PER_MIN) as usize;
        let new_last_min =
            last_min.saturating_add((minutes_elapsed as u64).saturating_mul(NS_PER_MIN));
        // CAS to claim the rotation; only one thread proceeds.
        if self
            .last_minute_rotation_ns
            .compare_exchange(last_min, new_last_min, Ordering::AcqRel, Ordering::Relaxed)
            .is_err()
        {
            return;
        }
        // Perform minute rotations, capped at one full cycle: after
        // MINUTE_BUCKETS steps every slot is already zero, so more is redundant.
        let steps = minutes_elapsed.min(MINUTE_BUCKETS);
        for _ in 0..steps {
            self.rotate_minute();
        }
        // Check for hour rotation.
        let last_hour = self.last_hour_rotation_ns.load(Ordering::Relaxed);
        if now_ns < last_hour.saturating_add(NS_PER_HOUR) {
            return;
        }
        let h_elapsed = now_ns.saturating_sub(last_hour);
        let hours_elapsed = (h_elapsed / NS_PER_HOUR) as usize;
        let new_last_hour =
            last_hour.saturating_add((hours_elapsed as u64).saturating_mul(NS_PER_HOUR));
        if self
            .last_hour_rotation_ns
            .compare_exchange(
                last_hour,
                new_last_hour,
                Ordering::AcqRel,
                Ordering::Relaxed,
            )
            .is_err()
        {
            return;
        }
        // Compute the current hour aggregate from minute buckets.
        let agg: u32 = self
            .minute_buckets
            .iter()
            .map(|b| b.load(Ordering::Relaxed))
            .fold(0u32, u32::saturating_add);
        let h_steps = hours_elapsed.min(HOUR_BUCKETS);
        for i in 0..h_steps {
            // Only the first rotation carries real data; the rest are empty hours.
            let bucket_val = if i == 0 { agg } else { 0 };
            self.rotate_hour(bucket_val);
        }
    }
    fn sum_last_n_minutes(&self, n: usize) -> u64 {
        let current = self.current_minute.load(Ordering::Acquire) as usize;
        (0..n)
            .map(|i| {
                let idx = (current + MINUTE_BUCKETS - i) % MINUTE_BUCKETS;
                u64::from(self.minute_buckets[idx].load(Ordering::Relaxed))
            })
            .sum()
    }
    fn sum_last_n_hours(&self, n: usize) -> u64 {
        let current = self.current_hour.load(Ordering::Acquire) as usize;
        (0..n)
            .map(|i| {
                let idx = (current + HOUR_BUCKETS - i) % HOUR_BUCKETS;
                u64::from(self.hour_buckets[idx].load(Ordering::Relaxed))
            })
            .sum()
    }
 }
 impl Default for BucketedCounter {
    fn default() -> Self {
        Self::new()
    }
 }
 impl fmt::Debug for BucketedCounter {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("BucketedCounter")
            .field("all_time_count", &self.all_time_count())
            .field(
                "current_minute",
                &self.current_minute.load(Ordering::Relaxed),
            )
            .field("current_hour", &self.current_hour.load(Ordering::Relaxed))
            .field("windowed_1h", &self.windowed_count(Window::OneHour))
            .finish_non_exhaustive()
    }
 }
 /// Serializable snapshot of a `BucketedCounter`.
 ///
 /// Used for checkpoint/restore. All fields are plain integers.
 #[derive(Debug, Clone, PartialEq, Eq)]
 pub struct BucketedCounterSnapshot {
    pub minute_buckets: [u32; MINUTE_BUCKETS],
    pub hour_buckets: [u32; HOUR_BUCKETS],
    pub current_minute: u8,
    pub current_hour: u8,
    pub all_time_count: u64,
    pub last_minute_rotation_ns: u64,
    pub last_hour_rotation_ns: u64,
 }
 // ── Tests ────────────────────────────────────────────────────────────────────
 #[cfg(test)]
 mod tests {
    use super::*;
    #[test]
    fn new_counter_is_zeroed() {
        let counter = BucketedCounter::new();
        assert_eq!(counter.all_time_count(), 0);
        assert_eq!(counter.windowed_count(Window::OneHour), 0);
        assert_eq!(counter.windowed_count(Window::TwentyFourHours), 0);
        assert_eq!(counter.windowed_count(Window::SevenDays), 0);
        assert_eq!(counter.windowed_count(Window::AllTime), 0);
    }
    #[test]
    fn single_increment() {
        let counter = BucketedCounter::with_start_time(0);
        counter.increment(1_000_000_000); // 1 second
        assert_eq!(counter.all_time_count(), 1);
        assert_eq!(counter.windowed_count(Window::OneHour), 1);
        assert_eq!(counter.windowed_count(Window::AllTime), 1);
    }
    #[test]
    fn multiple_increments_same_minute() {
        let counter = BucketedCounter::with_start_time(0);
        for i in 0..100 {
            counter.increment(i * 100_000_000); // every 100ms for 10 seconds
        }
        assert_eq!(counter.all_time_count(), 100);
        assert_eq!(counter.windowed_count(Window::OneHour), 100);
    }
    #[test]
    fn minute_rotation_zeros_next_bucket() {
        let counter = BucketedCounter::with_start_time(0);
        // Fill minute 0 with 10 events
        for i in 0..10 {
            counter.increment(i * 1_000_000_000);
        }
        assert_eq!(counter.windowed_count(Window::OneHour), 10);
        // Advance past minute boundary (61 seconds)
        counter.increment(61_000_000_000);
        assert_eq!(counter.all_time_count(), 11);
        // The 1h window should include both minutes
        let count_1h = counter.windowed_count(Window::OneHour);
        assert_eq!(count_1h, 11);
    }
    #[test]
    fn events_outside_1h_window_not_counted() {
        let counter = BucketedCounter::with_start_time(0);
        // Add an event at t=0 (will rotate out)
        counter.increment(0);
        // Advance time past 1 hour with many rotations
        for minute in 1..=70 {
            let t_ns = minute * 60_000_000_000_u64;
            counter.increment(t_ns);
        }
        // The 1h window should contain the last 60 events, not all 71
        let count_1h = counter.windowed_count(Window::OneHour);
        assert!(count_1h <= 61, "1h count was {count_1h}, expected <= 61");
        assert_eq!(counter.all_time_count(), 71);
    }
    #[test]
    fn hour_rotation_aggregates_minutes() {
        let counter = BucketedCounter::with_start_time(0);
        // Simulate 2 hours of events: 5 per minute
        for minute in 0..120_u64 {
            let base_ns = minute * 60_000_000_000;
            for j in 0..5_u64 {
                counter.increment(base_ns + j * 1_000_000_000);
            }
        }
        assert_eq!(counter.all_time_count(), 600);
        // 24h window should include events (only 2 hours elapsed)
        let count_24h = counter.windowed_count(Window::TwentyFourHours);
        assert!(count_24h > 0, "24h window should have events");
    }
    #[test]
    fn all_time_window_reads_atomic_counter() {
        let counter = BucketedCounter::with_start_time(0);
        for i in 0..1000_u64 {
            counter.increment(i * 1_000_000);
        }
        assert_eq!(counter.windowed_count(Window::AllTime), 1000);
    }
    #[test]
    fn thirty_day_window_returns_zero() {
        let counter = BucketedCounter::with_start_time(0);
        counter.increment(1_000_000_000);
        // ThirtyDays not supported in M1
        assert_eq!(counter.windowed_count(Window::ThirtyDays), 0);
    }
    #[test]
    fn snapshot_and_restore_roundtrip() {
        let counter = BucketedCounter::with_start_time(0);
        for i in 0..50_u64 {
            counter.increment(i * 2_000_000_000); // every 2 seconds
        }
        let snapshot = counter.snapshot();
        let restored = BucketedCounter::new();
        restored.restore(&snapshot);
        assert_eq!(restored.all_time_count(), counter.all_time_count());
        assert_eq!(
            restored.windowed_count(Window::OneHour),
            counter.windowed_count(Window::OneHour)
        );
        assert_eq!(
            restored.windowed_count(Window::AllTime),
            counter.windowed_count(Window::AllTime)
        );
    }
    #[test]
    fn increment_by_adds_multiple() {
        let counter = BucketedCounter::with_start_time(0);
        counter.increment_by(42, 1_000_000_000);
        assert_eq!(counter.all_time_count(), 42);
        assert_eq!(counter.windowed_count(Window::OneHour), 42);
    }
 }
 #[cfg(test)]
 mod proptests {
    use super::*;
    use proptest::prelude::*;
    // P3: Windowed count equals event count in window (1h window).
    //
    // Events are constrained to a 3000s (50 minute) span to avoid the design
    // limitation where a gap of ≥ 3600s triggers a full 60-step rotation that
    // clears all minute-bucket data. With timestamps below 3000s, fewer than 50
    // single-bucket rotations fire from the initial timestamp, so no bucket
    // holding event data is zeroed. All events thus fall within the 1h window
    // and expected = total event count.
    proptest! {
        #[test]
        fn windowed_count_1h_matches_events(
            event_times_secs in prop::collection::vec(0u64..3000, 1..200),
        ) {
            // Sort events — trigger-based rotation requires time-ordered insertion.
            let mut sorted_times = event_times_secs;
            sorted_times.sort_unstable();
            let counter = BucketedCounter::with_start_time(0);
            for &t_secs in &sorted_times {
                counter.increment(t_secs * 1_000_000_000);
            }
            // All events are within 3000s (fewer than 50 minute-bucket rotations),
            // so all must appear in the 1h windowed count.
            let expected = sorted_times.len() as u64;
            let actual = counter.windowed_count(Window::OneHour);
            // Allow ±2 for bucket-boundary effects.
            prop_assert!(
                actual.abs_diff(expected) <= 2,
                "actual={actual}, expected={expected}"
            );
        }
    }
    // All-time count equals total event count.
    proptest! {
        #[test]
        fn all_time_count_matches_total(
            event_count in 0u64..10_000,
        ) {
            let counter = BucketedCounter::with_start_time(0);
            for i in 0..event_count {
                let t_ns = i * 1_000_000;
                counter.increment(t_ns);
            }
            prop_assert_eq!(counter.all_time_count(), event_count);
        }
    }
    // Circular buffer wrapping: all-time count survives full rotation.
    proptest! {
        #[test]
        fn minute_rotation_preserves_total(
            events_per_minute in prop::collection::vec(0u32..100, 60..120),
        ) {
            let counter = BucketedCounter::with_start_time(0);
            let mut total = 0u64;
            for (minute_idx, &count) in events_per_minute.iter().enumerate() {
                let base_ns = (minute_idx as u64) * 60_000_000_000;
                for j in 0..count {
                    let t_ns = base_ns + u64::from(j) * 1_000_000;
                    counter.increment(t_ns);
                    total += 1;
                }
            }
            prop_assert_eq!(counter.all_time_count(), total);
        }
    }
 }
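Review note: the trigger-based rotation in `BucketedCounter` can be hard to picture from the atomics alone. The following is a minimal single-threaded sketch of the minute tier only, not the code in this patch. `MiniRing` and its methods are hypothetical names, plain integers stand in for the atomics (so there is no CAS), the hour tier is omitted, and calls are assumed to arrive in time order, as the property test above also assumes.

```rust
/// Hypothetical, single-threaded stand-in for the minute tier of `BucketedCounter`.
const SLOTS: usize = 60;
const NS_PER_MIN: u64 = 60_000_000_000;

struct MiniRing {
    buckets: [u32; SLOTS],
    current: usize,
    last_rotation_ns: u64,
    all_time: u64,
}

impl MiniRing {
    fn new(start_ns: u64) -> Self {
        Self { buckets: [0; SLOTS], current: 0, last_rotation_ns: start_ns, all_time: 0 }
    }

    /// Record one event, rotating first if whole minutes have elapsed.
    fn increment(&mut self, now_ns: u64) {
        let minutes = (now_ns.saturating_sub(self.last_rotation_ns) / NS_PER_MIN) as usize;
        // One full cycle already zeroes every slot, so cap the work there.
        for _ in 0..minutes.min(SLOTS) {
            self.current = (self.current + 1) % SLOTS;
            self.buckets[self.current] = 0; // expire the oldest slot
        }
        self.last_rotation_ns += minutes as u64 * NS_PER_MIN;
        self.buckets[self.current] += 1;
        self.all_time += 1;
    }

    /// Sum of every slot: the events of the last 60 minutes.
    fn count_1h(&self) -> u64 {
        self.buckets.iter().map(|&b| u64::from(b)).sum()
    }
}

fn main() {
    let mut ring = MiniRing::new(0);
    // One event per minute for minutes 0..=70 (71 events in total).
    for minute in 0..=70u64 {
        ring.increment(minute * NS_PER_MIN);
    }
    println!("1h = {}, all-time = {}", ring.count_1h(), ring.all_time);
}
```

In this sketch the oldest eleven events have been overwritten by the time the loop ends, which is the same scenario `events_outside_1h_window_not_counted` exercises: the all-time total keeps growing while the 1h sum is bounded by what the 60 slots can hold.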
--- a/tidal/src/storage/error.rs
+++ b/tidal/src/storage/error.rs
@ -43,6 +43,7 @@ impl From<std::io::Error> for StorageError {
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use super::*;
@ -76,7 +77,7 @@ mod tests {
    #[test]
    fn from_io_error() {
-        let io_err = std::io::Error::new(std::io::ErrorKind::Other, "disk full");
+        let io_err = std::io::Error::other("disk full");
        let storage_err: StorageError = io_err.into();
        assert!(matches!(storage_err, StorageError::Io(_)));
    }
@ -84,7 +85,7 @@ mod tests {
    #[test]
    fn source_io() {
        use std::error::Error;
-        let e = StorageError::Io(std::io::Error::new(std::io::ErrorKind::Other, "test"));
+        let e = StorageError::Io(std::io::Error::other("test"));
        assert!(e.source().is_some());
    }
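Review note: the two hunks above swap `std::io::Error::new(std::io::ErrorKind::Other, ..)` for `std::io::Error::other(..)`, a standard-library shorthand available since Rust 1.74. A small standalone check, separate from this crate's tests, that the two spellings produce errors with the same kind and message:

```rust
use std::io::{Error, ErrorKind};

fn main() {
    // Older spelling, still valid.
    let old = Error::new(ErrorKind::Other, "disk full");
    // Shorthand form used by this patch.
    let new = Error::other("disk full");

    // Same kind and same display text, so `From<io::Error>` conversions
    // and `source()` chains downstream behave identically.
    assert_eq!(old.kind(), new.kind());
    assert_eq!(old.to_string(), new.to_string());
    println!("{:?}: {}", new.kind(), new);
}
```

The change is therefore behaviour-preserving for the `from_io_error` and `source_io` tests, provided the crate's minimum supported Rust version is at least 1.74.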
--- a/tidal/src/storage/fjall.rs
+++ b/tidal/src/storage/fjall.rs
@ -239,6 +239,7 @@ impl std::fmt::Debug for FjallAtomicBatch {
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use super::*;
--- a/tidal/src/storage/keys.rs
+++ b/tidal/src/storage/keys.rs
@ -119,6 +119,7 @@ pub fn parse_key(key: &[u8]) -> Option<(EntityId, Tag, &[u8])> {
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use super::*;
@ -236,7 +237,7 @@ mod tests {
        let tags = [Tag::Evt, Tag::Sig, Tag::Meta, Tag::Rel, Tag::Mv, Tag::Idx];
        let bytes: Vec<u8> = tags.iter().map(|t| t.as_byte()).collect();
        let mut deduped = bytes.clone();
-        deduped.sort();
+        deduped.sort_unstable();
        deduped.dedup();
        assert_eq!(bytes.len(), deduped.len(), "tag bytes must be unique");
    }
--- a/tidal/src/storage/memory.rs
+++ b/tidal/src/storage/memory.rs
@ -103,6 +103,7 @@ impl StorageEngine for InMemoryBackend {
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use super::*;
--- a/tidal/src/wal/dedup.rs
+++ b/tidal/src/wal/dedup.rs
@ -66,6 +66,7 @@ impl DedupWindow {
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used, clippy::cast_precision_loss)]
 mod tests {
    use super::*;
--- a/tidal/src/wal/error.rs
+++ b/tidal/src/wal/error.rs
@ -49,6 +49,7 @@ impl From<std::io::Error> for WalError {
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use super::*;
@ -92,7 +93,7 @@ mod tests {
    #[test]
    fn from_io_error() {
-        let io_err = std::io::Error::new(std::io::ErrorKind::Other, "disk full");
+        let io_err = std::io::Error::other("disk full");
        let wal_err: WalError = io_err.into();
        assert!(matches!(wal_err, WalError::Io(_)));
    }
@ -100,7 +101,7 @@ mod tests {
    #[test]
    fn source_io() {
        use std::error::Error;
-        let e = WalError::Io(std::io::Error::new(std::io::ErrorKind::Other, "test"));
+        let e = WalError::Io(std::io::Error::other("test"));
        assert!(e.source().is_some());
    }
--- a/tidal/src/wal/format.rs
+++ b/tidal/src/wal/format.rs
@ -290,6 +290,7 @@ pub fn event_content_hash(event: &EventRecord) -> u128 {
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use super::*;
@ -502,7 +503,7 @@ mod tests {
                    return Ok(());
                }
                let actual_offset = payload_start + (corrupt_offset % payload_size);
-                let mut corrupted = encoded.clone();
+                let mut corrupted = encoded;
                corrupted[actual_offset] ^= 0xFF;
                let result = decode_batch(&corrupted);
                prop_assert!(result.is_err());
--- a/tidal/src/wal/mod.rs
+++ b/tidal/src/wal/mod.rs
@ -310,6 +310,7 @@ impl Drop for WalHandle {
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use super::*;
@ -352,7 +353,8 @@ mod tests {
        let (handle, _) = WalHandle::open(config).expect("open should succeed");
        let seq = handle.append(make_event(1)).expect("append should succeed");
-        assert!(seq > 0 || seq == 0); // first seq is based on recovery
+        // `seq` is a u64, so every value is valid; just confirm `append` returned one.
        let _ = seq;
        handle.shutdown().expect("shutdown should succeed");
    }
--- a/tidal/src/wal/reader.rs
+++ b/tidal/src/wal/reader.rs
@ -170,6 +170,7 @@ fn scan_segment(path: &Path) -> Result<Vec<(BatchHeader, Vec<EventRecord>)>, Wal
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used)]
 mod tests {
    use super::super::format::{EventRecord, encode_batch};
    use super::*;
@ -309,7 +310,7 @@ mod tests {
    fn recover_empty_segment_file() {
        let dir = tempfile::tempdir().expect("tempdir creation should succeed");
        let seg_name = super::super::segment::segment_filename(1);
-        fs::write(dir.path().join(seg_name), &[]).expect("write should succeed");
+        fs::write(dir.path().join(seg_name), []).expect("write should succeed");
        let result = recover(dir.path()).expect("recover should succeed");
        assert!(result.events.is_empty());
--- a/tidal/src/wal/writer.rs
+++ b/tidal/src/wal/writer.rs
@ -278,6 +278,7 @@ pub fn run_writer(
 }
 #[cfg(test)]
 #[allow(clippy::unwrap_used, clippy::similar_names)]
 mod tests {
    use super::*;
    use crossbeam::channel::bounded;
--- a/tidal/tests/metrics_integration.rs
+++ b/tidal/tests/metrics_integration.rs
@ -0,0 +1,168 @@
 //! Integration tests for the metrics HTTP server.
 //!
 //! These tests require `--features metrics`.
 #![allow(clippy::unwrap_used)]
 use std::io::{Read, Write};
 use std::net::TcpStream;
 use std::time::Duration;
 use tidaldb::TidalDb;
 /// Make an HTTP GET request to the given address and path.
 /// Returns `(status_code, body)`; the status is 0 if the status line cannot be parsed.
 fn http_get(addr: std::net::SocketAddr, path: &str) -> (u16, String) {
    let mut stream = TcpStream::connect(addr).expect("connect");
    write!(
        stream,
        "GET {path} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
    )
    .unwrap();
    stream.flush().unwrap();
    let mut response = String::new();
    stream.read_to_string(&mut response).unwrap();
    // Parse status line: "HTTP/1.1 200 OK"
    let status: u16 = response
        .lines()
        .next()
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|s| s.parse().ok())
        .unwrap_or(0);
    let body = response
        .splitn(2, "\r\n\r\n")
        .nth(1)
        .unwrap_or("")
        .to_string();
    (status, body)
 }
 fn open_with_metrics() -> (std::net::SocketAddr, TidalDb) {
    let db = TidalDb::builder()
        .ephemeral()
        .enable_metrics("127.0.0.1:0")
        .open()
        .expect("open should succeed");
    let addr = db
        .metrics_addr()
        .expect("metrics_addr should be Some when metrics enabled");
    (addr, db)
 }
 #[test]
 fn metrics_server_starts_on_port_zero() {
    let (addr, db) = open_with_metrics();
    assert_ne!(addr.port(), 0, "OS should have assigned a real port");
    db.close().expect("close should succeed");
 }
 #[test]
 fn healthz_returns_200_json() {
    let (addr, _db) = open_with_metrics();
    let (status, body) = http_get(addr, "/healthz");
    assert_eq!(status, 200);
    assert!(body.contains("\"status\":\"ok\""), "body: {body}");
    assert!(
        body.contains("\"uptime_seconds\":"),
        "body should have uptime: {body}"
    );
    assert!(
        body.contains("\"version\":"),
        "body should have version: {body}"
    );
    assert!(
        body.contains("\"build_hash\":"),
        "body should have build_hash: {body}"
    );
 }
 #[test]
 fn metrics_returns_200_prometheus() {
    let (addr, _db) = open_with_metrics();
    let (status, body) = http_get(addr, "/metrics");
    assert_eq!(status, 200);
    assert!(
        body.contains("tidaldb_uptime_seconds"),
        "missing uptime metric in body: {body}"
    );
    assert!(
        body.contains("tidaldb_health_ok"),
        "missing health_ok metric in body: {body}"
    );
    assert!(
        body.contains("tidaldb_info"),
        "missing info metric in body: {body}"
    );
 }
 #[test]
 fn uptime_increases_over_time() {
    let (addr, _db) = open_with_metrics();
    let (_, body1) = http_get(addr, "/metrics");
    std::thread::sleep(Duration::from_millis(150));
    let (_, body2) = http_get(addr, "/metrics");
    let uptime1 = extract_uptime(&body1);
    let uptime2 = extract_uptime(&body2);
    assert!(
        uptime2 > uptime1,
        "uptime should increase: {uptime1} -> {uptime2}"
    );
 }
 #[test]
 fn health_ok_value_is_one() {
    let (addr, _db) = open_with_metrics();
    let (_, body) = http_get(addr, "/metrics");
    let health_line = body
        .lines()
        .find(|l| l.starts_with("tidaldb_health_ok{"))
        .expect("should have health_ok metric");
    assert!(
        health_line.ends_with(" 1"),
        "health should be 1: {health_line}"
    );
 }
 #[test]
 fn not_found_returns_404() {
    let (addr, _db) = open_with_metrics();
    let (status, _) = http_get(addr, "/nonexistent");
    assert_eq!(status, 404);
 }
 #[test]
 fn close_stops_server() {
    let (addr, db) = open_with_metrics();
    // Server should respond before close
    let (status, _) = http_get(addr, "/healthz");
    assert_eq!(status, 200);
    db.close().expect("close should succeed");
    // Give the thread a moment to shut down
    std::thread::sleep(Duration::from_millis(200));
    // After close, connection should be refused
    let result = TcpStream::connect_timeout(&addr, Duration::from_millis(200));
    assert!(result.is_err(), "connection should be refused after close");
 }
 // -----------------------------------------------------------------------
 // Helpers
 // -----------------------------------------------------------------------
 fn extract_uptime(prometheus_body: &str) -> f64 {
    for line in prometheus_body.lines() {
        if line.starts_with("tidaldb_uptime_seconds{") {
            let val = line.split_whitespace().last().unwrap_or("0");
            return val.parse().unwrap_or(0.0);
        }
    }
    0.0
 }
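The status and body extraction in `http_get` can be exercised without a live server. The following is a minimal sketch, assuming a hypothetical helper named `parse_response` (not part of this change) that takes the raw response text and uses `str::split_once` for the header/body split:

```rust
/// Split a raw HTTP/1.1 response into `(status_code, body)`.
/// Hypothetical helper for illustration; not part of the tidaldb test suite.
fn parse_response(response: &str) -> (u16, &str) {
    // Status line looks like "HTTP/1.1 200 OK"; the code is the second token.
    // Falls back to 0 when the line is missing or malformed.
    let status = response
        .lines()
        .next()
        .and_then(|l| l.split_whitespace().nth(1))
        .and_then(|s| s.parse().ok())
        .unwrap_or(0);
    // Headers and body are separated by the first blank line (CRLF CRLF).
    let body = response
        .split_once("\r\n\r\n")
        .map_or("", |(_, body)| body);
    (status, body)
}
```

Keeping the parsing separate from the socket I/O would let the fallback cases (no status line, no body separator) be tested directly.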
--- a/tidal/tests/sandboxed_storage.rs
+++ b/tidal/tests/sandboxed_storage.rs
@ -0,0 +1,173 @@
 //! Integration tests for the sandboxed storage layout.
 //!
 //! Validates that [`Paths`] and [`TempTidalHome`] provide correct,
 //! isolated filesystem sandboxes for tidalDB instances.
 use tidaldb::TidalDb;
 use tidaldb::db::temp::TempTidalHome;
 // =============================================================================
 // TempTidalHome isolation
 // =============================================================================
 #[test]
 fn two_temp_homes_have_non_overlapping_paths() {
    let home1 = TempTidalHome::new().unwrap();
    let home2 = TempTidalHome::new().unwrap();
    assert_ne!(
        home1.path(),
        home2.path(),
        "two TempTidalHome instances must have different base paths"
    );
    // Verify neither is a prefix of the other (true isolation).
    assert!(
        !home1.path().starts_with(home2.path()),
        "home1 should not be a subdirectory of home2"
    );
    assert!(
        !home2.path().starts_with(home1.path()),
        "home2 should not be a subdirectory of home1"
    );
 }
 // =============================================================================
 // Drop removes directory
 // =============================================================================
 #[test]
 fn drop_removes_directory_with_contents() {
    let path = {
        let home = TempTidalHome::new().unwrap();
        let p = home.path().to_owned();
        // Create a file inside so the directory is non-empty.
        std::fs::write(p.join("sentinel.txt"), b"test data").unwrap();
        assert!(p.exists(), "directory should exist before drop");
        p
        // home dropped here
    };
    assert!(!path.exists(), "directory should be removed after drop");
 }
 // =============================================================================
 // with_preserve keeps directory
 // =============================================================================
 #[test]
 fn with_preserve_keeps_directory_after_drop() {
    let path = {
        let home = TempTidalHome::with_preserve().unwrap();
        let p = home.path().to_owned();
        std::fs::write(p.join("sentinel.txt"), b"test data").unwrap();
        p
        // home dropped here, but preserve=true
    };
    assert!(
        path.exists(),
        "directory should still exist after drop with preserve"
    );
    // Manual cleanup.
    std::fs::remove_dir_all(&path).unwrap();
 }
 // =============================================================================
 // Paths::ensure_all creates expected subdirectories
 // =============================================================================
 #[test]
 fn ensure_all_creates_all_subdirectories() {
    let home = TempTidalHome::new().unwrap();
    let paths = home.paths();
    // Before ensure_all, subdirectories should not exist.
    assert!(!paths.wal_dir().exists());
    assert!(!paths.items_dir().exists());
    assert!(!paths.users_dir().exists());
    assert!(!paths.creators_dir().exists());
    assert!(!paths.cache_dir().exists());
    paths.ensure_all().unwrap();
    // After ensure_all, all subdirectories should exist.
    assert!(paths.wal_dir().is_dir(), "wal dir should exist");
    assert!(paths.items_dir().is_dir(), "items dir should exist");
    assert!(paths.users_dir().is_dir(), "users dir should exist");
    assert!(paths.creators_dir().is_dir(), "creators dir should exist");
    assert!(paths.cache_dir().is_dir(), "cache dir should exist");
 }
 // =============================================================================
 // Two builders with different TempTidalHome roots never collide
 // =============================================================================
 #[test]
 fn two_builders_with_different_homes_are_isolated() {
    let home1 = TempTidalHome::new().unwrap();
    let home2 = TempTidalHome::new().unwrap();
    let paths1 = home1.paths();
    let paths2 = home2.paths();
    paths1.ensure_all().unwrap();
    paths2.ensure_all().unwrap();
    // Write a file into home1's wal directory.
    let sentinel1 = paths1.wal_dir().join("segment-001");
    std::fs::write(&sentinel1, b"home1 wal data").unwrap();
    // Write a file into home2's wal directory.
    let sentinel2 = paths2.wal_dir().join("segment-001");
    std::fs::write(&sentinel2, b"home2 wal data").unwrap();
    // Verify each sees only its own data.
    let data1 = std::fs::read(&sentinel1).unwrap();
    let data2 = std::fs::read(&sentinel2).unwrap();
    assert_eq!(data1, b"home1 wal data");
    assert_eq!(data2, b"home2 wal data");
    // The payloads differ, so neither write leaked into the other home.
    assert_ne!(data1, data2);
 }
 // =============================================================================
 // Builder integrates with Paths via TempTidalHome
 // =============================================================================
 #[test]
 fn builder_with_temp_home_opens_successfully() {
    let home = TempTidalHome::new().unwrap();
    // The builder validates data_dir exists (TempTidalHome creates it).
    // Resolved defaults (wal_dir, cache_dir) are populated after validation
    // so they don't need to pre-exist.
    let db = TidalDb::builder().with_data_dir(home.path()).open();
    assert!(db.is_ok(), "builder with TempTidalHome should succeed");
 }
 #[test]
 fn ensure_all_is_idempotent() {
    let home = TempTidalHome::new().unwrap();
    let paths = home.paths();
    paths.ensure_all().unwrap();
    // Write a file to verify it survives a second ensure_all.
    std::fs::write(paths.wal_dir().join("segment-001"), b"data").unwrap();
    paths.ensure_all().unwrap();
    // File should still exist.
    assert!(paths.wal_dir().join("segment-001").exists());
 }
 #[test]
 fn paths_base_matches_temp_home_path() {
    let home = TempTidalHome::new().unwrap();
    let paths = home.paths();
    assert_eq!(paths.base(), home.path());
 }
--- a/tidal/tests/storage.rs
+++ b/tidal/tests/storage.rs
@ -1,3 +1,5 @@
 #![allow(clippy::unwrap_used)]
 use tidaldb::schema::EntityId;
 use tidaldb::storage::{
    FjallStorage, InMemoryBackend, StorageEngine, StorageError, Tag, WriteBatch, encode_key,
--- a/tidalctl/Cargo.toml
+++ b/tidalctl/Cargo.toml
@ -0,0 +1,28 @@
 [package]
 name = "tidalctl"
 version = "0.1.0"
 edition.workspace = true
 rust-version.workspace = true
 description = "Command-line inspector for embedded tidalDB instances"
 license.workspace = true
 [lints.rust]
 unsafe_code = "forbid"
 [lints.clippy]
 all = { level = "deny", priority = -1 }
 pedantic = { level = "warn", priority = -1 }
 nursery = { level = "warn", priority = -1 }
 cast_possible_truncation = "allow"
 module_name_repetitions = "allow"
 unwrap_used = "deny"
 [dependencies]
 tidaldb = { path = "../tidal" }
 serde_json = "1"
 [dev-dependencies]
 tidaldb = { path = "../tidal", features = ["test-utils"] }
 [[test]]
 name = "cli"
--- a/tidalctl/src/main.rs
+++ b/tidalctl/src/main.rs
@ -0,0 +1,294 @@
 //! tidalctl -- command-line inspector for embedded tidalDB instances.
 //!
 //! Reports WAL state, checkpoint status, and directory layout by reading a
 //! tidalDB data directory directly, without opening a database instance or
 //! taking any locks.
 //!
 //! Usage:
 //!   tidalctl <command> --path <dir> [--pretty]
 //!
 //! Commands:
 //!   status    Report WAL state, checkpoint, and directory layout
 //!   paths     Report resolved directory paths and existence
 #![forbid(unsafe_code)]
 use std::path::PathBuf;
 use std::process;
 fn main() {
    let args: Vec<String> = std::env::args().collect();
    match run(&args) {
        Ok(output) => {
            println!("{output}");
        }
        Err(e) => {
            eprintln!("{}", e.to_json());
            process::exit(1);
        }
    }
 }
 // ---------------------------------------------------------------------------
 // CLI parsing
 // ---------------------------------------------------------------------------
 struct CliArgs {
    command: Command,
    path: PathBuf,
    pretty: bool,
 }
 enum Command {
    Status,
    Paths,
 }
 struct CliError {
    message: String,
 }
 impl CliError {
    fn new(message: impl Into<String>) -> Self {
        Self {
            message: message.into(),
        }
    }
    fn to_json(&self) -> String {
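        // Hand-rolled JSON to avoid serde in the error path. Newlines are
        // escaped as well, because the usage text is multi-line and a raw
        // newline inside a JSON string is invalid.
        let escaped = self
            .message
            .replace('\\', "\\\\")
            .replace('"', "\\\"")
            .replace('\n', "\\n");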
        format!(r#"{{"error":"{escaped}"}}"#)
    }
 }
 fn parse_args(args: &[String]) -> Result<CliArgs, CliError> {
    if args.len() < 2 {
        return Err(CliError::new(usage()));
    }
    let command = match args[1].as_str() {
        "status" => Command::Status,
        "paths" => Command::Paths,
        "--help" | "-h" | "help" => return Err(CliError::new(usage())),
        other => {
            return Err(CliError::new(format!(
                "unknown command: '{other}'\n{}",
                usage()
            )));
        }
    };
    let mut path: Option<PathBuf> = None;
    let mut pretty = false;
    let mut i = 2;
    while i < args.len() {
        match args[i].as_str() {
            "--path" => {
                i += 1;
                if i >= args.len() {
                    return Err(CliError::new("--path requires a value"));
                }
                path = Some(PathBuf::from(&args[i]));
            }
            "--pretty" => {
                pretty = true;
            }
            other => {
                return Err(CliError::new(format!("unknown flag: '{other}'")));
            }
        }
        i += 1;
    }
    let path = path.ok_or_else(|| CliError::new("--path is required"))?;
    Ok(CliArgs {
        command,
        path,
        pretty,
    })
 }
 fn usage() -> String {
    "Usage: tidalctl <command> --path <dir> [--pretty]\n\n\
     Commands:\n  \
       status    Report WAL state, checkpoint, and directory layout\n  \
       paths     Report resolved directory paths and existence"
        .to_string()
 }
 // ---------------------------------------------------------------------------
 // Command dispatch
 // ---------------------------------------------------------------------------
 fn run(args: &[String]) -> Result<String, CliError> {
    let cli = parse_args(args)?;
    match cli.command {
        Command::Status => cmd_status(&cli.path, cli.pretty),
        Command::Paths => cmd_paths(&cli.path, cli.pretty),
    }
 }
 // ---------------------------------------------------------------------------
 // status command
 // ---------------------------------------------------------------------------
 fn cmd_status(base: &std::path::Path, pretty: bool) -> Result<String, CliError> {
    let paths = tidaldb::Paths::new(base);
    let wal_dir = paths.wal_dir();
    let version = env!("CARGO_PKG_VERSION");
    let build_hash = tidaldb::BUILD_HASH;
    // Determine WAL state
    let wal_result = gather_wal_state(&wal_dir);
    let (status, wal_json) = match wal_result {
        Ok(wal) if wal.segments > 0 => ("ok", wal.to_json()),
        Ok(wal) => ("empty", wal.to_json()),
        Err(e) => ("error", format!(r#"{{"error":"{}"}}"#, json_escape(&e))),
    };
    let dirs_json = dirs_json(&paths);
    let json = format!(
        r#"{{"version":"{version}","build_hash":"{build_hash}","status":"{status}","wal":{wal_json},"dirs":{dirs_json}}}"#
    );
    format_json(&json, pretty)
 }
 struct WalState {
    segments: usize,
    first_seq: u64,
    last_segment_seq: u64,
    checkpoint_seq: u64,
    checkpoint_ts: u64,
    wal_dir_bytes: u64,
 }
 impl WalState {
    fn to_json(&self) -> String {
        format!(
            r#"{{"segments":{},"first_seq":{},"last_segment_seq":{},"checkpoint_seq":{},"checkpoint_ts":{},"wal_dir_bytes":{}}}"#,
            self.segments,
            self.first_seq,
            self.last_segment_seq,
            self.checkpoint_seq,
            self.checkpoint_ts,
            self.wal_dir_bytes,
        )
    }
 }
 fn gather_wal_state(wal_dir: &std::path::Path) -> Result<WalState, String> {
    if !wal_dir.exists() {
        return Ok(WalState {
            segments: 0,
            first_seq: 0,
            last_segment_seq: 0,
            checkpoint_seq: 0,
            checkpoint_ts: 0,
            wal_dir_bytes: 0,
        });
    }
    let segments = tidaldb::wal::segment::list_segments(wal_dir).map_err(|e| format!("{e}"))?;
    let first_seq = segments.first().map_or(0, |(seq, _)| *seq);
    let last_segment_seq = segments.last().map_or(0, |(seq, _)| *seq);
    let (checkpoint_seq, checkpoint_ts) =
        match tidaldb::wal::checkpoint::CheckpointManager::read(wal_dir) {
            Ok(Some((seq, ts))) => (seq, ts),
            Ok(None) => (0, 0),
            Err(e) => return Err(format!("checkpoint read error: {e}")),
        };
    let wal_dir_bytes = dir_size(wal_dir).map_err(|e| format!("{e}"))?;
    Ok(WalState {
        segments: segments.len(),
        first_seq,
        last_segment_seq,
        checkpoint_seq,
        checkpoint_ts,
        wal_dir_bytes,
    })
 }
 fn dir_size(dir: &std::path::Path) -> Result<u64, std::io::Error> {
    let mut total = 0u64;
    for entry in std::fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
 }
 // ---------------------------------------------------------------------------
 // paths command
 // ---------------------------------------------------------------------------
 fn cmd_paths(base: &std::path::Path, pretty: bool) -> Result<String, CliError> {
    let paths = tidaldb::Paths::new(base);
    let base_str = json_escape(&paths.base().display().to_string());
    let wal_str = json_escape(&paths.wal_dir().display().to_string());
    let items_str = json_escape(&paths.items_dir().display().to_string());
    let users_str = json_escape(&paths.users_dir().display().to_string());
    let creators_str = json_escape(&paths.creators_dir().display().to_string());
    let cache_str = json_escape(&paths.cache_dir().display().to_string());
    let json = format!(
        r#"{{"base":"{base_str}","wal":"{wal_str}","items":"{items_str}","users":"{users_str}","creators":"{creators_str}","cache":"{cache_str}","exists":{{"base":{},"wal":{},"items":{},"users":{},"creators":{},"cache":{}}}}}"#,
        paths.base().exists(),
        paths.wal_dir().exists(),
        paths.items_dir().exists(),
        paths.users_dir().exists(),
        paths.creators_dir().exists(),
        paths.cache_dir().exists(),
    );
    format_json(&json, pretty)
 }
 // ---------------------------------------------------------------------------
 // JSON helpers
 // ---------------------------------------------------------------------------
 fn dirs_json(paths: &tidaldb::Paths) -> String {
    let base_str = json_escape(&paths.base().display().to_string());
    let wal_str = json_escape(&paths.wal_dir().display().to_string());
    let items_str = json_escape(&paths.items_dir().display().to_string());
    let users_str = json_escape(&paths.users_dir().display().to_string());
    let creators_str = json_escape(&paths.creators_dir().display().to_string());
    let cache_str = json_escape(&paths.cache_dir().display().to_string());
    format!(
        r#"{{"base":"{base_str}","wal":"{wal_str}","items":"{items_str}","users":"{users_str}","creators":"{creators_str}","cache":"{cache_str}"}}"#
    )
 }
 fn json_escape(s: &str) -> String {
    s.replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
        .replace('\r', "\\r")
        .replace('\t', "\\t")
 }
 /// Format a compact JSON string. When `pretty` is true, use `serde_json`
 /// to re-parse and pretty-print.
 fn format_json(json: &str, pretty: bool) -> Result<String, CliError> {
    if pretty {
        let value: serde_json::Value = serde_json::from_str(json)
            .map_err(|e| CliError::new(format!("internal json error: {e}")))?;
        serde_json::to_string_pretty(&value)
            .map_err(|e| CliError::new(format!("internal json error: {e}")))
    } else {
        Ok(json.to_string())
    }
 }
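Hand-rolled JSON like the output built above relies on every interpolated string being escaped correctly. The following is a minimal sketch of string escaping per RFC 8259, covering quotes, backslashes, and all control characters below U+0020. The name `json_escape_full` is hypothetical and not part of tidalctl:

```rust
use std::fmt::Write;

/// Escape `s` for embedding inside a JSON string literal.
/// Hypothetical helper for illustration; not part of tidalctl.
fn json_escape_full(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Any other control character must use the \uXXXX form.
            c if u32::from(c) < 0x20 => {
                // Writing into a String cannot fail, so the result is ignored.
                let _ = write!(out, "\\u{:04x}", u32::from(c));
            }
            c => out.push(c),
        }
    }
    out
}
```

A single pass over the characters avoids the intermediate allocations of chained `replace` calls and makes it harder to forget a control character.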
--- a/tidalctl/tests/cli.rs
+++ b/tidalctl/tests/cli.rs
@ -0,0 +1,225 @@
 //! Integration tests for the tidalctl binary.
 //!
 //! Tests run the actual binary as a subprocess via `std::process::Command`.
 #![forbid(unsafe_code)]
 #![allow(clippy::unwrap_used)]
 use std::process::Command;
 use tidaldb::db::temp::TempTidalHome;
 use tidaldb::wal::{SignalEvent, WalConfig, WalHandle};
 fn tidalctl_bin() -> Command {
    Command::new(env!("CARGO_BIN_EXE_tidalctl"))
 }
 /// Create a `TempTidalHome` with WAL segments and a checkpoint.
 /// Returns the home so it stays alive (not dropped) for the test.
 fn home_with_wal_data() -> TempTidalHome {
    let home = TempTidalHome::new().expect("create temp home");
    let paths = home.paths();
    paths.ensure_all().expect("create subdirs");
    let config = WalConfig {
        dir: home.path().to_path_buf(),
        ..WalConfig::default()
    };
    let (handle, _replayed) = WalHandle::open(config).expect("open WAL");
    for i in 1..=5 {
        let event = SignalEvent {
            entity_id: i,
            signal_type: 1,
            weight: 1.0,
            timestamp_nanos: i * 1_000_000_000,
        };
        let _seq = handle.append(event).expect("append event");
    }
    handle.checkpoint(3).expect("checkpoint");
    handle.shutdown().expect("shutdown WAL");
    home
 }
 #[test]
 fn status_with_wal_segments_exits_0() {
    let home = home_with_wal_data();
    let output = tidalctl_bin()
        .args(["status", "--path", home.path().to_str().unwrap()])
        .output()
        .expect("run tidalctl");
    assert!(
        output.status.success(),
        "exit code: {}, stderr: {}",
        output.status,
        String::from_utf8_lossy(&output.stderr)
    );
    let stdout = String::from_utf8(output.stdout).expect("utf8");
    let json: serde_json::Value = serde_json::from_str(&stdout).expect("parse JSON");
    assert_eq!(json["status"], "ok");
    assert!(
        json["wal"]["segments"].as_u64().unwrap_or(0) > 0,
        "should have segments: {stdout}"
    );
    assert!(
        json["wal"]["checkpoint_seq"].as_u64().unwrap_or(0) > 0,
        "should have checkpoint: {stdout}"
    );
    assert!(json["version"].is_string());
    assert!(json["build_hash"].is_string());
    assert!(json["dirs"]["base"].is_string());
 }
 #[test]
 fn status_empty_dir_shows_empty() {
    let home = TempTidalHome::new().expect("create temp home");
    let paths = home.paths();
    paths.ensure_all().expect("create subdirs");
    let output = tidalctl_bin()
        .args(["status", "--path", home.path().to_str().unwrap()])
        .output()
        .expect("run tidalctl");
    assert!(output.status.success());
    let stdout = String::from_utf8(output.stdout).expect("utf8");
    let json: serde_json::Value = serde_json::from_str(&stdout).expect("parse JSON");
    assert_eq!(json["status"], "empty");
 }
 #[test]
 fn status_nonexistent_path_exits_0_with_empty() {
    // When the WAL dir doesn't exist, gather_wal_state returns segments=0
    // which maps to "empty" status. The base dir may or may not exist.
    let output = tidalctl_bin()
        .args(["status", "--path", "/nonexistent/path/that/does/not/exist"])
        .output()
        .expect("run tidalctl");
    let stdout = String::from_utf8(output.stdout).expect("utf8");
    // This should still be valid JSON regardless of status
    if output.status.success() {
        let json: serde_json::Value = serde_json::from_str(&stdout).expect("parse JSON");
        assert!(
            json["status"] == "empty" || json["status"] == "error",
            "status should be empty or error: {stdout}"
        );
    } else {
        let stderr = String::from_utf8(output.stderr).expect("utf8");
        let json: serde_json::Value = serde_json::from_str(&stderr).expect("parse error JSON");
        assert!(json["error"].is_string());
    }
 }
 #[test]
 fn paths_exits_0_with_all_dirs() {
    let home = TempTidalHome::new().expect("create temp home");
    let paths = home.paths();
    paths.ensure_all().expect("create subdirs");
    let output = tidalctl_bin()
        .args(["paths", "--path", home.path().to_str().unwrap()])
        .output()
        .expect("run tidalctl");
    assert!(output.status.success());
    let stdout = String::from_utf8(output.stdout).expect("utf8");
    let json: serde_json::Value = serde_json::from_str(&stdout).expect("parse JSON");
    // Check all six directory paths are present
    assert!(json["base"].is_string(), "missing base: {stdout}");
    assert!(json["wal"].is_string(), "missing wal: {stdout}");
    assert!(json["items"].is_string(), "missing items: {stdout}");
    assert!(json["users"].is_string(), "missing users: {stdout}");
    assert!(json["creators"].is_string(), "missing creators: {stdout}");
    assert!(json["cache"].is_string(), "missing cache: {stdout}");
    // Check exists map
    assert!(json["exists"]["base"].is_boolean());
    assert!(json["exists"]["wal"].is_boolean());
    assert_eq!(json["exists"]["base"], true);
    assert_eq!(json["exists"]["wal"], true);
 }
 #[test]
 fn paths_nonexistent_dir_shows_false_exists() {
    let output = tidalctl_bin()
        .args(["paths", "--path", "/nonexistent/does/not/exist"])
        .output()
        .expect("run tidalctl");
    assert!(output.status.success());
    let stdout = String::from_utf8(output.stdout).expect("utf8");
    let json: serde_json::Value = serde_json::from_str(&stdout).expect("parse JSON");
    assert_eq!(json["exists"]["base"], false);
    assert_eq!(json["exists"]["wal"], false);
 }
 #[test]
 fn pretty_flag_produces_indented_json() {
    let home = TempTidalHome::new().expect("create temp home");
    let output = tidalctl_bin()
        .args(["paths", "--path", home.path().to_str().unwrap(), "--pretty"])
        .output()
        .expect("run tidalctl");
    assert!(output.status.success());
    let stdout = String::from_utf8(output.stdout).expect("utf8");
    // Pretty-printed JSON should have newlines and indentation
    assert!(
        stdout.contains('\n'),
        "pretty output should have newlines: {stdout}"
    );
    assert!(
        stdout.contains("  "),
        "pretty output should have indentation: {stdout}"
    );
    // Should still be valid JSON
    let _json: serde_json::Value = serde_json::from_str(&stdout).expect("parse JSON");
 }
 #[test]
 fn bad_command_exits_1() {
    let output = tidalctl_bin()
        .args(["nonexistent_command", "--path", "/tmp"])
        .output()
        .expect("run tidalctl");
    assert!(!output.status.success(), "should exit 1 for bad command");
    let stderr = String::from_utf8(output.stderr).expect("utf8");
    assert!(
        stderr.contains("error"),
        "stderr should contain error: {stderr}"
    );
 }
 #[test]
 fn no_args_exits_1() {
    let output = tidalctl_bin().output().expect("run tidalctl");
    assert!(!output.status.success(), "should exit 1 with no args");
 }
 #[test]
 fn missing_path_flag_exits_1() {
    let output = tidalctl_bin()
        .args(["status"])
        .output()
        .expect("run tidalctl");
    assert!(!output.status.success(), "should exit 1 without --path");
    let stderr = String::from_utf8(output.stderr).expect("utf8");
    assert!(
        stderr.contains("error"),
        "stderr should contain error: {stderr}"
    );
 }