tidaldb/docs/planning/milestone-0/phase-2/SCOPING.md
jordan 4f076c927d feat: M0p1 runtime skeleton, M0p2 tooling & diagnostics, m1p4 signal ledger
## M0p1 — Embeddable Runtime Skeleton (329 tests)
- TidalDb with builder(), health_check(), close(), and Drop-based cleanup
- TidalDbBuilder fluent API: ephemeral(), with_data_dir(), wal_dir(), cache_dir()
- Config, StorageMode, ConfigError types; Config(ConfigError) variant on LumenError
- Paths: single source of truth for directory layout (wal, items, users, creators, cache)
- TempTidalHome: test isolation helper gated behind #[cfg(test)] / test-utils feature
- 8 integration tests: tests/sandboxed_storage.rs

## M0p2 — Tooling & Diagnostics (349 tests)
- Workspace root Cargo.toml (members: ["tidal", "tidalctl"])
- tidal/build.rs: BUILD_HASH from GIT_HASH with option_env!() fallback to "dev"
- MetricsState: always-compiled Arc-shared atomics (uptime, health_ok)
- MetricsHandle (metrics feature): hand-rolled TcpListener HTTP, zero new deps
  - GET /healthz → {"status":"ok","uptime_secs":N}
  - GET /metrics → Prometheus text (tidaldb_uptime_seconds, health_ok, info)
- TidalDbBuilder.enable_metrics(addr) starts background metrics thread
- tidalctl binary: status + paths commands, manual std::env::args() parsing
- 7 metrics integration tests, 9 tidalctl CLI tests

## m1p4 Signal Ledger (in-progress)
- SignalLedger: DashMap<(EntityId, SignalTypeId), EntitySignalEntry>, WAL-first writes
- HotSignalState: #[repr(C, align(64))], lock-free CAS decay, out-of-order handling
- BucketedCounter: 60 per-minute + 168 per-hour circular buffers, trigger-based rotation
- CheckpointMeta + serialize/restore: 983-byte fixed records, atomic WriteBatch
- Property tests: running score matches analytical to 1e-6, decay monotonic, non-negative
- Proptest regression: signals/warm.txt

## Documentation and planning
- ROADMAP: m0p1 COMPLETE (329), m0p2 COMPLETE (349), product track milestones
- PRODUCT_ROADMAP: P0-P4 product milestone track (personal briefing beachhead)
- Milestone planning docs: milestone-0 (phases 1-3), milestone-p (phases 1-5)
- docs/research/tidaldb_tooling_and_diagnostics.md
- ARCHITECTURE.md, CLAUDE.md, VISION.md updates

## Site
- Blog: every-platform-builds-the-same-6-systems.mdx (new)
- Blog: why-tidaldb.mdx (updated)
- next.config.ts, layout.tsx, blog/page.tsx updates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 20:32:00 -07:00


Milestone 0, Phase 2: Tooling & Diagnostics -- Scoping Decisions

Date: 2026-02-20
Author: @tidal-visionary (Spencer Kimball)
Status: APPROVED -- ready for implementation


Context

m0p1 (Embeddable Runtime Skeleton) is complete. m1p1-p3 (Type System, WAL, Storage Engine) are also complete. The codebase has:

  • TidalDb as a thin handle: holds Config, has health_check(), close(), Drop
  • A full WAL implementation (WalHandle, SegmentWriter, CheckpointManager) that writes segment files (wal-{seq:020}.seg) and checkpoint metadata (checkpoint.meta) to disk
  • No db.signal() yet in the public API (deferred to m1p5)
  • No WAL writes from the TidalDb public API -- the WAL is implemented but not wired to the TidalDb facade
  • Config has no serde derive -- it is a plain struct with no serialization
  • Single crate tidal/, no workspace

The task documents in phase-2/ were written before m1p2 and m1p3 shipped. They assumed WAL writes would be accessible from the public API. They are not. This scoping document corrects the task definitions to match reality.


1. tidalctl Scope at M0

What tidalctl Can Do

tidalctl is a cold inspector. There is no live process to connect to. The CLI reads files from disk and reports what it finds. This is the correct model for an embeddable database -- there is no server process listening on a port. The inspector reads the same files the embedded library writes.

Commands

tidalctl status --path <dir>

Reads the tidalDB home directory and prints a JSON report:

{
  "version": "0.1.0",
  "build_hash": "29400d4",
  "status": "ok",
  "storage_mode": "persistent",
  "wal": {
    "segments": 3,
    "first_seq": 1,
    "last_segment_seq": 201,
    "checkpoint_seq": 150,
    "checkpoint_ts": "2026-02-20T14:30:00Z",
    "wal_dir_bytes": 49152
  },
  "dirs": {
    "base": "/var/lib/tidaldb",
    "wal": "/var/lib/tidaldb/wal",
    "items": "/var/lib/tidaldb/items",
    "users": "/var/lib/tidaldb/users",
    "creators": "/var/lib/tidaldb/creators",
    "cache": "/var/lib/tidaldb/cache"
  }
}

How each field is computed:

| Field | Source | Notes |
|---|---|---|
| version | Compiled into binary via env!("CARGO_PKG_VERSION") | Always available |
| build_hash | Compiled via option_env!("GIT_HASH") or build script | Falls back to "dev" |
| status | "ok" if dir exists, has wal subdir, and at least one segment | "empty" if no WAL segments, "error" if dir missing |
| storage_mode | Inferred: "persistent" if WAL dir exists with segments | No way to know ephemeral from disk -- ephemeral leaves no trace |
| wal.segments | segment::list_segments(&wal_dir)?.len() | Already implemented in tidal/src/wal/segment.rs |
| wal.first_seq | First element of list_segments() result | 0 if empty |
| wal.last_segment_seq | Last element of list_segments() result | 0 if empty |
| wal.checkpoint_seq | CheckpointManager::read(&wal_dir)? | null if no checkpoint file |
| wal.checkpoint_ts | Same call; the ts field, formatted as ISO 8601 | null if no checkpoint |
| wal.wal_dir_bytes | Sum of file sizes in WAL dir | Filesystem stat |
| dirs.* | Paths::new(base) expanded | Existence checked per dir |
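A minimal sketch of how the status, wal.segments, wal.first_seq / wal.last_segment_seq, and wal.wal_dir_bytes rows could be derived with only std::fs. It assumes segment files follow the wal-{seq:020}.seg naming described under Context. parse_segment_seq, list_segment_seqs, wal_dir_bytes, and classify_status are hypothetical stand-ins: tidalctl itself would call tidaldb::wal::segment::list_segments rather than re-parse file names.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical parser for segment names of the form `wal-{seq:020}.seg`.
fn parse_segment_seq(file_name: &str) -> Option<u64> {
    let digits = file_name.strip_prefix("wal-")?.strip_suffix(".seg")?;
    if digits.len() != 20 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None; // skips checkpoint.meta and any stray files
    }
    digits.parse().ok()
}

/// Sorted segment sequence numbers in `wal_dir`; empty if the dir is missing.
fn list_segment_seqs(wal_dir: &Path) -> io::Result<Vec<u64>> {
    let mut seqs = Vec::new();
    if !wal_dir.is_dir() {
        return Ok(seqs);
    }
    for entry in fs::read_dir(wal_dir)? {
        let name = entry?.file_name();
        if let Some(seq) = name.to_str().and_then(parse_segment_seq) {
            seqs.push(seq);
        }
    }
    seqs.sort_unstable();
    Ok(seqs)
}

/// Sum of regular-file sizes directly inside `wal_dir` (segments + checkpoint.meta).
fn wal_dir_bytes(wal_dir: &Path) -> io::Result<u64> {
    if !wal_dir.is_dir() {
        return Ok(0);
    }
    let mut total = 0;
    for entry in fs::read_dir(wal_dir)? {
        let meta = entry?.metadata()?;
        if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}

/// Mirrors the status rules in the table above.
fn classify_status(home: &Path) -> io::Result<&'static str> {
    if !home.is_dir() {
        return Ok("error");
    }
    let seqs = list_segment_seqs(&home.join("wal"))?;
    Ok(if seqs.is_empty() { "empty" } else { "ok" })
}
```

wal.first_seq and wal.last_segment_seq then fall out as seqs.first() and seqs.last(), defaulting to 0 when the list is empty.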

No config file is written. tidalctl does not need TidalDb::open() to write a .tidaldb.json config snapshot. The CLI reports what it can observe on the filesystem. The config is a runtime concept -- it exists in memory while the process runs and is not persisted. This is correct for M0. If future milestones need a config file for operational tooling, that is a separate decision.

No live process query. tidalctl reads disk. It does not connect to a running process. No Unix socket, no HTTP, no PID file. This is the right model for an embeddable library.

tidalctl paths --path <dir>

Prints the resolved directory layout:

{
  "base": "/var/lib/tidaldb",
  "wal": "/var/lib/tidaldb/wal",
  "items": "/var/lib/tidaldb/items",
  "users": "/var/lib/tidaldb/users",
  "creators": "/var/lib/tidaldb/creators",
  "cache": "/var/lib/tidaldb/cache",
  "exists": {
    "base": true,
    "wal": true,
    "items": true,
    "users": false,
    "creators": false,
    "cache": false
  }
}

This uses Paths::new(dir) -- the same path helper from m0p1. No duplication.
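The existence flags amount to one directory check per resolved path. A minimal sketch, assuming the five subdirectory names shown in the JSON above; layout and existence_report are hypothetical helpers that re-derive the paths only so the example is self-contained, whereas the real command takes them from Paths::new(dir).

```rust
use std::path::{Path, PathBuf};

/// Subdirectory names copied from the `paths` example output (assumed).
const SUBDIRS: [&str; 5] = ["wal", "items", "users", "creators", "cache"];

/// (name, path) pairs for base plus each subdirectory, in output order.
fn layout(base: &Path) -> Vec<(&'static str, PathBuf)> {
    let mut entries = vec![("base", base.to_path_buf())];
    entries.extend(SUBDIRS.iter().map(|name| (*name, base.join(*name))));
    entries
}

/// Existence flag per entry; `is_dir` is false for missing paths and plain files.
fn existence_report(base: &Path) -> Vec<(&'static str, bool)> {
    layout(base)
        .into_iter()
        .map(|(name, path)| (name, path.is_dir()))
        .collect()
}
```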

Common Flags

  • --path <dir> (required): the tidalDB home directory
  • --pretty (optional): pretty-print JSON output (default: compact)
  • --format json|text (optional, default json): text prints human-friendly tabular output
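The scoping picks clap for these flags. To keep the example dependency-free, this sketch parses the same grammar by hand from an iterator of arguments; CliArgs, OutputFormat, and parse_args are hypothetical names, and the error strings are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum OutputFormat {
    Json,
    Text,
}

#[derive(Debug, PartialEq)]
struct CliArgs {
    command: String,
    path: String,
    pretty: bool,
    format: OutputFormat,
}

/// Parses `<status|paths> --path <dir> [--pretty] [--format json|text]`.
/// `args` excludes the program name, e.g. `std::env::args().skip(1)`.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<CliArgs, String> {
    let mut it = args.into_iter();
    let command = it.next().ok_or("missing command (status | paths)")?;
    if command != "status" && command != "paths" {
        return Err(format!("unknown command: {command}"));
    }
    let (mut path, mut pretty, mut output_format) = (None, false, OutputFormat::Json);
    while let Some(flag) = it.next() {
        match flag.as_str() {
            "--path" => path = Some(it.next().ok_or("--path requires a value")?),
            "--pretty" => pretty = true,
            "--format" => {
                output_format = match it.next().as_deref() {
                    Some("json") => OutputFormat::Json,
                    Some("text") => OutputFormat::Text,
                    other => return Err(format!("--format expects json|text, got {other:?}")),
                }
            }
            other => return Err(format!("unknown flag: {other}")),
        }
    }
    let path = path.ok_or("--path <dir> is required")?;
    Ok(CliArgs { command, path, pretty, format: output_format })
}
```

A main function would print any Err to stderr and exit non-zero, matching the error behavior required for a missing home directory.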

What tidalctl Does NOT Do at M0

  • No tidalctl init (creating a fresh tidalDB home) -- the library creates dirs on open
  • No tidalctl repair (WAL repair) -- crash recovery is automatic in WalHandle::open()
  • No tidalctl compact (storage compaction) -- no compaction exists yet
  • No tidalctl dump (WAL event dump) -- useful but not needed for the m0p2 UAT
  • No live process communication of any kind

2. Metrics Scope at M0

The Problem with the Original Task

The original task says: "Integration test hits /metrics and asserts counters increment when WAL appends."

At M0, the TidalDb public API has no WAL write path. WalHandle::append() exists but is not wired to TidalDb. There are no signal writes from the public API. A test that asserts "counters increment when WAL appends" cannot be written without either (a) using WAL internals directly or (b) waiting for m1p5.

Corrected Scope

The metrics surface at M0 serves one purpose: prove the plumbing works so later milestones can add counters without redesigning the metrics layer. The counters themselves are scaffolding. The architecture is the deliverable.

Endpoints

GET /healthz

{
  "status": "ok",
  "uptime_seconds": 127.3,
  "version": "0.1.0",
  "build_hash": "29400d4"
}
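Because this body has a fixed shape, it can be rendered without serde, in keeping with the zero-new-dependency goal for the library. A hypothetical helper, assuming the caller passes opened_at.elapsed() as the uptime; reporting "degraded" when unhealthy is an assumption borrowed from the tidaldb_health_ok help text, since the unhealthy /healthz body is not specified here.

```rust
use std::time::Duration;

/// Renders the /healthz body. `version` and `build_hash` are assumed to need
/// no JSON escaping (true for semver strings and hex git hashes).
fn healthz_body(uptime: Duration, healthy: bool, version: &str, build_hash: &str) -> String {
    // Full f64 precision, so a freshly opened database still reports > 0.
    let uptime = uptime.as_secs_f64();
    let status = if healthy { "ok" } else { "degraded" };
    format!(
        r#"{{"status":"{status}","uptime_seconds":{uptime},"version":"{version}","build_hash":"{build_hash}"}}"#
    )
}
```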

GET /metrics

Prometheus text exposition format:

# HELP tidaldb_uptime_seconds Seconds since database opened.
# TYPE tidaldb_uptime_seconds gauge
tidaldb_uptime_seconds{partition_id="0"} 127.3

# HELP tidaldb_health_ok Whether the database is healthy. 1 = ok, 0 = degraded.
# TYPE tidaldb_health_ok gauge
tidaldb_health_ok{partition_id="0"} 1

# HELP tidaldb_info Build and version information.
# TYPE tidaldb_info gauge
tidaldb_info{version="0.1.0",build_hash="29400d4",partition_id="0"} 1

Exact Counters at M0

| Counter | Type | Source | Note |
|---|---|---|---|
| tidaldb_uptime_seconds | Gauge | Instant::now() - opened_at | Computed on read |
| tidaldb_health_ok | Gauge | health_check().is_ok() as u8 | 1 or 0 |
| tidaldb_info | Gauge (info-pattern) | Build constants | Static, always 1 |
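A minimal renderer for the three gauges, producing the exposition text shown above from the sources in this table. The function name and signature are hypothetical; a real implementation would read the values from the shared MetricsState. The partition_id="0" label is hard-coded, as in the example output.

```rust
use std::fmt::Write;
use std::time::Duration;

/// Renders the M0 gauges in Prometheus text exposition format.
/// Label values are written verbatim; a production renderer must escape
/// backslashes, double quotes, and newlines in them.
fn render_metrics(uptime: Duration, healthy: bool, version: &str, build_hash: &str) -> String {
    let mut out = String::new();
    // Writing into a String cannot fail, so unwrap() never panics here.
    writeln!(out, "# HELP tidaldb_uptime_seconds Seconds since database opened.").unwrap();
    writeln!(out, "# TYPE tidaldb_uptime_seconds gauge").unwrap();
    writeln!(out, "tidaldb_uptime_seconds{{partition_id=\"0\"}} {}", uptime.as_secs_f64()).unwrap();
    writeln!(out, "# HELP tidaldb_health_ok Whether the database is healthy. 1 = ok, 0 = degraded.").unwrap();
    writeln!(out, "# TYPE tidaldb_health_ok gauge").unwrap();
    writeln!(out, "tidaldb_health_ok{{partition_id=\"0\"}} {}", u8::from(healthy)).unwrap();
    writeln!(out, "# HELP tidaldb_info Build and version information.").unwrap();
    writeln!(out, "# TYPE tidaldb_info gauge").unwrap();
    writeln!(
        out,
        "tidaldb_info{{version=\"{version}\",build_hash=\"{build_hash}\",partition_id=\"0\"}} 1"
    )
    .unwrap();
    out
}
```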

That is the complete set. Three metrics. No WAL counters, no signal counters, no storage counters. Those arrive in m1p5 when the WAL is wired to the public API.

What the Integration Test Verifies

The integration test at M0 verifies:

  1. TidalDb::builder().ephemeral().enable_metrics("127.0.0.1:0").open() succeeds (port 0 = OS assigns)
  2. GET /healthz returns 200 with status: "ok" and uptime_seconds > 0
  3. GET /metrics returns 200 with valid Prometheus text format
  4. tidaldb_uptime_seconds increases between two reads separated by a sleep
  5. tidaldb_health_ok is 1
  6. db.close() stops the metrics server cleanly (no leaked threads, no port still bound)

No WAL assertions. No signal assertions. The test proves the HTTP server starts, serves correct responses, and shuts down cleanly.

What Is Deferred

| Counter | Deferred To | Why |
|---|---|---|
| tidaldb_wal_seq | m1p5 | WAL not wired to public API yet |
| tidaldb_wal_segments | m1p5 | Same |
| tidaldb_wal_bytes_total | m1p5 | Same |
| tidaldb_signal_writes_total | m1p5 | db.signal() does not exist yet |
| tidaldb_signal_read_latency | m1p5 | Signal reads do not exist yet |
| tidaldb_query_latency | m2p5 | Query executor does not exist yet |
| tidaldb_query_count | m2p5 | Same |

3. HTTP Approach: Sync (Option A)

Chosen: (a) Sync HTTP via tiny_http in a background thread.

Rationale:

  1. Minimal dependencies are an explicit tidalDB requirement. An async HTTP stack built on Tokio pulls in dozens of transitive dependencies; tiny_http pulls in a handful. For an embeddable library, dependency weight matters -- every dependency is a compile-time cost and an audit surface for every user.

  2. The metrics endpoint serves roughly two requests per scrape interval. This is not a high-throughput server. A single-threaded sync HTTP listener on a background thread can handle thousands of requests per second, and Prometheus scrapes every 15-30s, so tiny_http handles this load with zero contention.

  3. No Tokio runtime conflict. If the host application uses Tokio (likely for an Axum/Actix service), embedding a second Tokio runtime inside tidalDB creates footguns: nested block_on calls (blocking on a runtime from inside a running Tokio runtime panics), unexpected extra thread pools, and surprising panic behavior. A background std::thread with sync HTTP avoids all of this.

  4. The "Future implementor" spec is wrong for M0. The original task assumed tidalDB would share the host's async runtime. That is a leaky abstraction. An embeddable library should not assume or require any particular async runtime. A background thread with sync HTTP is the correct primitive.

  5. A transport-selection feature flag is premature. Option (c), which selects the HTTP transport via feature flags, adds compile-time complexity for a surface that serves 3 metrics. Ship sync now. If M7 (Production Hardening) needs async HTTP for high-frequency scraping, add it then. The internal MetricsRegistry / counter abstraction is the same either way -- only the HTTP transport changes.

Implementation Shape

// Builder API
let db = TidalDb::builder()
    .ephemeral()
    .enable_metrics("127.0.0.1:9090")  // Starts background thread
    .open()?;

// Internal: spawns std::thread with tiny_http::Server
// Thread reads from Arc<MetricsState> (uptime, health_ok, build_info)
// Thread exits cleanly when TidalDb::close() sets a shutdown flag
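The lifecycle in this shape (bind, serve from a background thread, stop on close()) can be sketched with only the standard library. This sketch substitutes std::net::TcpListener for tiny_http so it is self-contained; MetricsServer, start, stop, and serve are hypothetical names, and the response bodies are placeholders. The detail worth noting is that a thread blocked in accept() never observes a shutdown flag on its own, so stop() sets the flag and then opens one throwaway connection to wake the listener before joining.

```rust
use std::io::{Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::JoinHandle;

/// Hypothetical background metrics server (std::net stands in for tiny_http).
struct MetricsServer {
    addr: SocketAddr,
    shutdown: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}

impl MetricsServer {
    fn start(bind: &str) -> std::io::Result<Self> {
        let listener = TcpListener::bind(bind)?;
        // With "127.0.0.1:0" this reports the OS-assigned port.
        let addr = listener.local_addr()?;
        let shutdown = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&shutdown);
        let thread = std::thread::spawn(move || {
            for stream in listener.incoming() {
                if flag.load(Ordering::SeqCst) {
                    break; // woken by stop(); the listener drops here, releasing the port
                }
                if let Ok(stream) = stream {
                    let _ = serve(stream); // one request per connection
                }
            }
        });
        Ok(Self { addr, shutdown, thread: Some(thread) })
    }

    /// What TidalDb::close() would call: set the flag, wake accept(), join.
    fn stop(&mut self) {
        if let Some(handle) = self.thread.take() {
            self.shutdown.store(true, Ordering::SeqCst);
            let _ = TcpStream::connect(self.addr); // unblocks the pending accept()
            let _ = handle.join();
        }
    }
}

impl Drop for MetricsServer {
    fn drop(&mut self) {
        self.stop(); // mirrors TidalDb's Drop-based cleanup; no-op after stop()
    }
}

fn serve(mut stream: TcpStream) -> std::io::Result<()> {
    // Sketch only: assumes the request line arrives in a single read.
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf)?;
    let request = String::from_utf8_lossy(&buf[..n]);
    let path = request.split_whitespace().nth(1).unwrap_or("/");
    let (status, body) = match path {
        "/healthz" => ("200 OK", r#"{"status":"ok"}"#),
        "/metrics" => ("200 OK", "tidaldb_health_ok{partition_id=\"0\"} 1\n"),
        _ => ("404 Not Found", ""),
    };
    write!(
        stream,
        "HTTP/1.1 {status}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}
```

Because the listener is dropped when the thread exits, a connection attempt after stop() is refused, which is the behavior the close() acceptance criterion checks.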

Dependency Addition

# In tidal/Cargo.toml, behind a feature flag:
[features]
metrics = ["dep:tiny_http"]

[dependencies]
tiny_http = { version = "0.12", optional = true }

The metrics feature is opt-in. Users who do not need the HTTP endpoint pay zero compile cost. The MetricsState struct (atomic counters) exists unconditionally -- only the HTTP server is gated.
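A sketch of the gating pattern on the builder side, assuming a cargo feature named metrics as in the snippet above. TidalDbBuilder here is a pared-down hypothetical stand-in, not the real builder. Without the feature, both the field and enable_metrics() are compiled out, so a call to enable_metrics() fails with rustc's ordinary "no method named" error (the "absent" variant of AC-12).

```rust
/// Hypothetical, pared-down builder used only to illustrate the cfg pattern.
#[derive(Default)]
struct TidalDbBuilder {
    #[cfg(feature = "metrics")]
    metrics_addr: Option<String>,
}

impl TidalDbBuilder {
    /// Exists only when built with `--features metrics`.
    #[cfg(feature = "metrics")]
    fn enable_metrics(mut self, addr: &str) -> Self {
        self.metrics_addr = Some(addr.to_string());
        self
    }

    #[cfg(feature = "metrics")]
    fn metrics_enabled(&self) -> bool {
        self.metrics_addr.is_some()
    }

    /// Without the feature there is no HTTP server to start.
    #[cfg(not(feature = "metrics"))]
    fn metrics_enabled(&self) -> bool {
        false
    }
}
```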


4. Workspace Structure: Workspace with Separate Binary Crate

Confirmed: workspace layout.

Structure

tidalDB/
  Cargo.toml              # [workspace] members = ["tidal", "tidalctl"]
  tidal/
    Cargo.toml            # [package] name = "tidaldb" (the library)
    src/
  tidalctl/
    Cargo.toml            # [package] name = "tidalctl" (the binary)
    src/
      main.rs

Why Workspace, Not [[bin]]

  1. Separate dependency trees. tidalctl needs clap for argument parsing. The tidaldb library should not carry clap as a dependency -- embeddable libraries do not parse CLI arguments. A [[bin]] inside tidal/ would either make clap unconditional or require a feature flag, both of which pollute the library.

  2. Independent versioning path. tidalctl may version independently from tidaldb. The CLI is a companion tool, not part of the library API surface.

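  3. cargo install works naturally. Users install the CLI separately from embedding the library. Because tidalctl is its own binary crate (src/main.rs, no [[bin]] table needed), cargo install --path tidalctl installs exactly that binary, and cargo install tidalctl behaves the same way once the crate is published.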

  4. Shared dependencies via workspace. tidalctl depends on tidaldb (for Paths, WalConfig, segment parsing, checkpoint reading). Workspace members share one Cargo.lock and one target/ directory, so tidaldb and common dependencies are resolved once and built into shared artifacts.

tidalctl Dependencies

[package]
name = "tidalctl"
version = "0.1.0"
edition = "2024"

[dependencies]
tidaldb = { path = "../tidal" }
clap = { version = "4", features = ["derive"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
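None of these dependencies is a date/time crate, so rendering checkpoint_ts as ISO 8601 (per the status table) needs a small formatter. A sketch assuming the checkpoint stores its timestamp as whole seconds since the Unix epoch (the actual unit in CheckpointManager's metadata is not specified in this document). It uses Howard Hinnant's civil_from_days conversion for the proleptic Gregorian calendar; both function names are hypothetical.

```rust
/// Converts days since 1970-01-01 to (year, month, day), proleptic Gregorian.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097;
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year from Mar 1
    let mp = (5 * doy + 2) / 153; // month index counted from March, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + i64::from(month <= 2); // Jan/Feb belong to the next year
    (year, month, day)
}

/// Formats Unix seconds as a UTC ISO 8601 / RFC 3339 timestamp.
fn format_iso8601(unix_secs: i64) -> String {
    let days = unix_secs.div_euclid(86_400);
    let secs_of_day = unix_secs.rem_euclid(86_400);
    let (y, m, d) = civil_from_days(days);
    format!(
        "{y:04}-{m:02}-{d:02}T{:02}:{:02}:{:02}Z",
        secs_of_day / 3_600,
        secs_of_day % 3_600 / 60,
        secs_of_day % 60
    )
}
```

For example, format_iso8601(1_771_597_800) yields the "2026-02-20T14:30:00Z" shown in the status example.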

What This Means for Pre-Commit Hooks and CI

The root Cargo.toml becomes the workspace root. All cargo commands (fmt, clippy, test) need to run from the workspace root or with --workspace. The pre-commit hook currently uses --manifest-path tidal/Cargo.toml -- this must be updated to use the workspace root.


5. Deferred Items

Explicitly NOT in m0p2

| Item | Why Deferred | Arrives In |
|---|---|---|
| Config serialization to disk (.tidaldb.json) | tidalctl inspects filesystem artifacts, not config files. Config is a runtime concept. | Revisit in M7 if operational tooling needs it |
| tidalctl init command | Library creates dirs on open. A separate init command is redundant. | Possibly never |
| tidalctl repair command | Crash recovery is automatic in WalHandle::open(). Manual repair is a production concern. | M7 |
| tidalctl dump (WAL event dump) | Useful for debugging but not required for m0p2 UAT | M1 or M2, when developers need to debug signal event streams |
| WAL counters in metrics | WAL not wired to public API yet | m1p5 |
| Signal counters in metrics | db.signal() does not exist yet | m1p5 |
| Query counters in metrics | Query executor does not exist yet | m2p5 |
| Async HTTP for metrics | Sync HTTP is sufficient for Prometheus scraping | M7 if needed |
| tidalctl connecting to live process | Embeddable library has no server process | Possibly never |
| Serde on Config | tidalctl does not read a config file. Config serde is needed only if we write a config file, which is deferred. | When needed |

6. Acceptance Criteria

Task 1: tidalctl CLI

  • AC-1: tidalctl status --path <dir> against a directory with WAL segments and checkpoint outputs valid JSON containing version, wal.segments, wal.checkpoint_seq, and dirs.base
  • AC-2: tidalctl status --path <dir> against an empty directory (no WAL, no segments) outputs JSON with status: "empty" and wal.segments: 0
  • AC-3: tidalctl status --path /nonexistent exits with non-zero status and prints a JSON error object to stderr
  • AC-4: tidalctl paths --path <dir> outputs JSON with all six directory paths and existence flags matching actual filesystem state
  • AC-5: --pretty flag produces indented JSON; absence produces compact JSON
  • AC-6: cargo test -p tidalctl passes with tests for: valid home, empty home, missing home, pretty flag, paths command

Task 2: Metrics Surface

  • AC-7: TidalDb::builder().ephemeral().enable_metrics("127.0.0.1:0").open() starts a background HTTP thread bound to an OS-assigned port
  • AC-8: GET /healthz returns HTTP 200 with JSON containing status: "ok" and uptime_seconds > 0
  • AC-9: GET /metrics returns HTTP 200 with valid Prometheus text format containing tidaldb_uptime_seconds, tidaldb_health_ok, and tidaldb_info
  • AC-10: tidaldb_uptime_seconds increases monotonically between reads (verified by sleeping 100ms between two fetches)
  • AC-11: TidalDb::close() stops the metrics HTTP thread; subsequent connection attempts to the port are refused
  • AC-12: Building tidaldb without the metrics feature compiles successfully and pulls in no tiny_http dependency; the enable_metrics() method is either absent or, when called, fails to compile with a message directing the user to enable the feature

7. UAT Scenario

Given

A developer has:
  - Built the workspace: `cargo build --workspace`
  - Created a persistent tidalDB instance that wrote WAL segments:
      let home = TempTidalHome::new()?;
      let paths = home.paths();
      paths.ensure_all()?;
      let wal_config = WalConfig { dir: home.path().join("wal"), ..Default::default() };
      let (wal, _) = WalHandle::open(wal_config)?;
      wal.append(event_1)?;
      wal.append(event_2)?;
      wal.checkpoint(2)?;
      wal.shutdown()?;
  - Opened a TidalDb with metrics enabled:
      let db = TidalDb::builder()
          .ephemeral()
          .enable_metrics("127.0.0.1:0")
          .open()?;

When

1. Run: tidalctl status --path <home.path()>
2. Run: tidalctl paths --path <home.path()>
3. HTTP GET /healthz on the metrics port
4. HTTP GET /metrics on the metrics port
5. Sleep 200ms
6. HTTP GET /metrics again
7. db.close()
8. Attempt HTTP GET /healthz on the metrics port

Then

Step 1: JSON output with wal.segments >= 1, wal.checkpoint_seq == 2,
        status == "ok", version matches Cargo.toml

Step 2: JSON output with dirs.wal == "<home>/wal", exists.wal == true

Step 3: HTTP 200, body contains "status":"ok", uptime_seconds > 0

Step 4: HTTP 200, body contains tidaldb_uptime_seconds,
        tidaldb_health_ok 1, tidaldb_info{version="0.1.0"...} 1

Step 5: (sleep)

Step 6: tidaldb_uptime_seconds > value from step 4

Step 7: close() returns Ok(())

Step 8: Connection refused (metrics server stopped)

Pass/Fail Gate

m0p2 is done when:

  • cargo test -p tidalctl passes
  • cargo test -p tidaldb --features metrics passes (metrics integration tests)
  • cargo build --workspace succeeds, and cargo clippy --workspace -- -D warnings reports no warnings
  • All 12 acceptance criteria above are verified by automated tests
  • tidalctl uses Paths from the tidaldb crate (no duplicated layout logic)

Implementation Notes

Build Hash

Use a build script (tidal/build.rs) or option_env!("GIT_HASH") set by CI. For local builds, fall back to "dev". Both tidalctl and the metrics endpoint use the same constant.
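One way to give both consumers the same constant with the "dev" fallback is a const resolved at compile time. A sketch; the name BUILD_HASH follows the M0p2 commit notes, but where the constant lives (for example a tidaldb re-export that tidalctl reads) is an assumption.

```rust
/// Git hash injected at compile time, e.g.
/// `GIT_HASH=$(git rev-parse --short HEAD) cargo build`,
/// or "dev" when the variable is unset, as in most local builds.
pub const BUILD_HASH: &str = match option_env!("GIT_HASH") {
    Some(hash) => hash,
    None => "dev",
};
```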

Metrics State Sharing

use std::sync::atomic::AtomicBool;
use std::time::Instant;

pub(crate) struct MetricsState {
    opened_at: Instant,
    health_ok: AtomicBool,
    // Future milestones add: wal_seq: AtomicU64, signal_writes: AtomicU64, etc.
}

This struct is Arc-shared between TidalDb and the metrics HTTP thread. Adding new counters in future milestones is a one-line addition to this struct plus a one-line addition to the Prometheus renderer. The plumbing is paid for once in m0p2.
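A sketch of that sharing, restating the struct with hypothetical accessors: the database side updates health_ok, and the metrics thread reads the same Arc without locking. Relaxed ordering is assumed to be sufficient because each gauge is read independently and no other memory is published through it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::time::Instant;

struct MetricsState {
    opened_at: Instant,
    health_ok: AtomicBool,
}

impl MetricsState {
    fn new() -> Arc<Self> {
        Arc::new(Self { opened_at: Instant::now(), health_ok: AtomicBool::new(true) })
    }

    /// Computed on read; Instant is monotonic, so this never decreases.
    fn uptime_seconds(&self) -> f64 {
        self.opened_at.elapsed().as_secs_f64()
    }

    /// Called from the database side, e.g. after health_check().
    fn set_health(&self, ok: bool) {
        self.health_ok.store(ok, Ordering::Relaxed);
    }

    /// 1 or 0, ready for the tidaldb_health_ok gauge.
    fn health_gauge(&self) -> u8 {
        u8::from(self.health_ok.load(Ordering::Relaxed))
    }
}
```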

tidalctl WAL Inspection

tidalctl depends on tidaldb as a library. It calls:

  • tidaldb::db::Paths::new(dir) for path resolution
  • tidaldb::wal::segment::list_segments(&wal_dir) for segment enumeration
  • tidaldb::wal::checkpoint::CheckpointManager::read(&wal_dir) for checkpoint state

These are all pub functions already. No new internal APIs need to be exposed. The WAL module's public surface is sufficient.

Complexity Estimates

| Task | Complexity | Rationale |
|---|---|---|
| Workspace setup (root Cargo.toml, pre-commit hook update) | S | Mechanical, no design decisions |
| tidalctl CLI (clap, status, paths) | M | Two commands, JSON output, error handling, tests |
| Metrics surface (tiny_http, feature flag, MetricsState, endpoints) | M | Background thread lifecycle, Prometheus format, integration test |
| Build hash plumbing | S | Build script or env var, shared constant |

Total phase complexity: M (two M tasks + two S tasks, all independent after workspace setup)