tidaldb/docs/planning/milestone-7/phase-4/task-05-tidalctl-diagnostics.md

# Task 05: `tidalctl diagnostics` Command

## Delivers

A `diagnostics` subcommand for `tidalctl` that reads the database's metrics state and persistent storage to print a human-readable health summary. Operators use this to triage production issues without attaching a debugger or parsing Prometheus output.

## Complexity: M

## Dependencies

- task-02 complete (signal + WAL metrics must be wired)
- task-03 complete (index health metrics must be wired)
- task-04 complete (session + cohort + degradation metrics must be wired)
- Existing `tidalctl` binary with `status` and `paths` subcommands (m0p2)
- `tidal/src/db/metrics.rs` -- `MetricsState` with all m7p4 metrics

## Technical Design

### 1. Add `diagnostics` subcommand to tidalctl

In the `tidalctl` binary (manual arg parsing), add a new match arm:

```rust
"diagnostics" => {
    let path = parse_path_flag(&args)?;
    run_diagnostics(&path, pretty)?;
}
```
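
For reference, manual flag parsing of this kind can be sketched as below. This is a minimal illustration, not the existing `tidalctl` code: the body of `parse_path_flag` shown here and the `parse_pretty_flag` helper are hypothetical, and the `String` error type is an assumption made to keep the sketch self-contained.

```rust
use std::path::PathBuf;

/// Hypothetical stand-in for the existing `parse_path_flag` helper:
/// returns the value that follows `--path`, or an error message.
fn parse_path_flag(args: &[String]) -> Result<PathBuf, String> {
    let pos = args
        .iter()
        .position(|a| a == "--path")
        .ok_or_else(|| "missing required flag: --path <dir>".to_string())?;
    args.get(pos + 1)
        .map(PathBuf::from)
        .ok_or_else(|| "--path requires a value".to_string())
}

/// Hypothetical helper: `--pretty` is a bare switch, so its presence
/// anywhere in the argument list selects human-readable output.
fn parse_pretty_flag(args: &[String]) -> bool {
    args.iter().any(|a| a == "--pretty")
}
```

With helpers like these, the `pretty` value passed to `run_diagnostics` would be computed once before the `match` on the subcommand name.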

### 2. Diagnostics data collection

The diagnostics command opens the database in read-only inspection mode. It does NOT start a full `TidalDb` instance. Instead, it reads:

1. **Config**: from `{data_dir}/config.json` (existing `tidalctl status` path)
2. **WAL state**: scan `{wal_dir}/` for segment files, compute total size and count
3. **Checkpoint age**: read `{wal_dir}/checkpoint` file, parse `CheckpointMeta`, compute age from `checkpoint_time_ns`
4. **Signal ledger size**: estimate the entry count from the checkpoint file size (approximate; each entity-signal entry is ~983 bytes in the m1p4 format)
5. **Tantivy index**: if `{data_dir}/text_index/` exists, open read-only, count segments and docs
6. **USearch index**: if `{data_dir}/vectors/` exists, report directory size
7. **Session count**: count entries in session journal (`{wal_dir}/session_journal.bin`)
8. **Collection count**: scan `{data_dir}/items/` for `Tag::Collection` keys
9. **Cohort count**: scan `{data_dir}/items/` for cohort-related keys

For items 5-9, if the directory or file does not exist, report "not available" rather than erroring.
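
The file-based collection steps above can be sketched with the standard library alone. The function names (`scan_dir`, `estimate_signal_entries`, `checkpoint_age_seconds`) are hypothetical, and the sketch counts every regular file in a directory; a real WAL scan would presumably filter by the segment naming scheme, which this document does not specify.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Approximate on-disk size of one entity-signal entry (the ~983-byte figure above).
const APPROX_ENTRY_BYTES: u64 = 983;

/// Counts regular files directly inside `dir` and sums their sizes.
/// Returns `Ok(None)` when the directory does not exist, so the caller
/// can print "not available" instead of failing.
fn scan_dir(dir: &Path) -> io::Result<Option<(u64, u64)>> {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };
    let (mut count, mut bytes) = (0u64, 0u64);
    for entry in entries {
        let meta = entry?.metadata()?;
        if meta.is_file() {
            count += 1;
            bytes += meta.len();
        }
    }
    Ok(Some((count, bytes)))
}

/// Rough signal-ledger entry count derived from the checkpoint file size.
fn estimate_signal_entries(checkpoint_bytes: u64) -> u64 {
    checkpoint_bytes / APPROX_ENTRY_BYTES
}

/// Checkpoint age in whole seconds; clamps to 0 if the clock moved backwards.
fn checkpoint_age_seconds(now_ns: u64, checkpoint_time_ns: u64) -> u64 {
    now_ns.saturating_sub(checkpoint_time_ns) / 1_000_000_000
}
```

Returning `Option` for a missing directory keeps the "not available" decision in the output layer, while other I/O errors still propagate to the caller.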

### 3. Diagnostics output format

```
tidalDB Diagnostics
===================
Version:        0.7.0 (build: abc123)
Data dir:       /var/lib/tidaldb/data
Storage mode:   durable

WAL
---
Segments:       12
Total size:     48.3 MB
Lag (uncompacted): 12.1 MB

Checkpoint
----------
Last checkpoint: 2026-02-23 14:30:12 UTC (47s ago)
WAL sequence:    148293

Signal Ledger
-------------
Estimated entries: ~152,000

Text Index (Tantivy)
--------------------
Segments:       4
Indexed docs:   98,412

Vector Index (USearch)
----------------------
Directory size: 256.7 MB

Sessions
--------
Active:         3
Closed (total): 1,247
Auto-closed:    12

Degradation
-----------
Level:          0 (healthy)

Collections:    8
Cohorts:        3
```
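
The figures in the sample output can be produced by small formatting helpers like the following. This is a sketch under two assumptions: the helper names are hypothetical, and "MB" means 1024 × 1024 bytes, which is what the sample values imply (50659328 bytes rendered as 48.3 MB).

```rust
/// Formats a byte count as "N.N MB", taking 1 MB = 1024 * 1024 bytes.
fn format_mb(bytes: u64) -> String {
    format!("{:.1} MB", bytes as f64 / (1024.0 * 1024.0))
}

/// Inserts thousands separators: 98412 -> "98,412".
fn format_count(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::with_capacity(digits.len() + digits.len() / 3);
    for (i, ch) in digits.chars().enumerate() {
        // Emit a comma whenever the number of remaining digits is a multiple of three.
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(',');
        }
        out.push(ch);
    }
    out
}

/// Renders an optional value, falling back to "not available".
fn or_not_available(value: Option<String>) -> String {
    value.unwrap_or_else(|| "not available".to_string())
}
```

For example, `or_not_available(segments.map(format_count))` would print either a formatted count or "not available" for a missing text index.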

When `--pretty` is NOT set, output machine-readable JSON:

```json
{
  "version": "0.7.0",
  "build_hash": "abc123",
  "wal_segments": 12,
  "wal_total_bytes": 50659328,
  "wal_lag_bytes": 12689408,
  "checkpoint_age_seconds": 47,
  "checkpoint_wal_sequence": 148293,
  "signal_estimated_entries": 152000,
  "tantivy_segments": 4,
  "tantivy_indexed_docs": 98412,
  "usearch_directory_bytes": 269156352,
  "sessions_active": 3,
  "sessions_closed_total": 1247,
  "sessions_auto_closed_total": 12,
  "degradation_level": 0,
  "collection_count": 8,
  "cohort_count": 3
}
```
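
This task does not specify how a "not available" subsystem appears in JSON mode. One option, shown here purely as an assumption, is to keep the key and emit `null`, so consumers always see a stable set of fields. The sketch avoids external crates; `json_member` and `json_object` are hypothetical names, and a real implementation might use `serde_json` instead.

```rust
/// Renders one `"key": value` JSON member; a missing value becomes `null`.
/// Assumption: the task does not define the JSON form of "not available".
fn json_member(key: &str, value: Option<u64>) -> String {
    match value {
        Some(v) => format!("\"{key}\": {v}"),
        None => format!("\"{key}\": null"),
    }
}

/// Joins members into a flat JSON object. Keys are fixed identifiers,
/// so no string escaping is performed here.
fn json_object(members: &[(&str, Option<u64>)]) -> String {
    let body: Vec<String> = members.iter().map(|(k, v)| json_member(k, *v)).collect();
    format!("{{{}}}", body.join(", "))
}
```

String-valued fields such as `version` would need proper escaping and are deliberately left out of this sketch.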

### 4. Exit codes

| Code | Meaning |
|---|---|
| 0 | Diagnostics completed successfully |
| 1 | Data directory does not exist or is not readable |
| 2 | WAL directory missing or corrupt (partial output still printed) |
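
One way to keep the table above in a single place is an outcome enum that converts into `std::process::ExitCode`. The type and variant names below are hypothetical; only the numeric codes come from the table.

```rust
use std::process::ExitCode;

/// Outcome of a diagnostics run, mirroring the exit-code table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DiagnosticsOutcome {
    Success,
    DataDirUnreadable,
    WalMissingOrCorrupt,
}

impl DiagnosticsOutcome {
    /// Numeric process exit code for this outcome.
    fn code(self) -> u8 {
        match self {
            DiagnosticsOutcome::Success => 0,
            DiagnosticsOutcome::DataDirUnreadable => 1,
            DiagnosticsOutcome::WalMissingOrCorrupt => 2,
        }
    }
}

impl From<DiagnosticsOutcome> for ExitCode {
    fn from(outcome: DiagnosticsOutcome) -> Self {
        ExitCode::from(outcome.code())
    }
}
```

Because code 2 still prints partial output, `run_diagnostics` would emit whatever it collected first and only then return `WalMissingOrCorrupt` for `main` to convert.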

### 5. No TidalDb instance required

The diagnostics command reads files directly. It does NOT call `TidalDb::builder().open()`. This means it can run against a database that is currently open by another process (read-only file access) or against a database that failed to start (helping debug startup failures).

One alternative: if a running `TidalDb` has the metrics HTTP server enabled, `tidalctl diagnostics` could fetch `/metrics` and format that output instead. Implement the file-based approach as the primary path; the HTTP-based approach is a future enhancement.

## Acceptance Criteria

- [ ] `tidalctl diagnostics --path <dir> --pretty` prints human-readable health summary
- [ ] `tidalctl diagnostics --path <dir>` (without `--pretty`) prints machine-readable JSON
- [ ] Output includes: WAL segment count, WAL total size, WAL lag, checkpoint age, checkpoint sequence, estimated signal entries, Tantivy segment count, Tantivy indexed docs, USearch directory size, active sessions, closed sessions, auto-closed sessions, degradation level, collection count, cohort count
- [ ] Missing subsystems (no text index, no vectors) show "not available" rather than error
- [ ] Works against a database currently open by another process (read-only access)
- [ ] Exit code 0 on success, 1 on missing data dir, 2 on WAL issues
- [ ] `cargo clippy -- -D warnings` and `cargo fmt --check` pass

## Test Strategy

```rust
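// CLI integration tests (each runs the binary as a subprocess).
// `make_test_db_with_items` and `tidalctl_binary_path` are helpers
// assumed to be provided by the test harness.
use std::process::Command;
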
#[test]
fn diagnostics_json_output_valid() {
    let db = make_test_db_with_items(10);
    let data_dir = db.paths().data_dir().to_path_buf();
    db.close().unwrap();

    let output = Command::new(tidalctl_binary_path())
        .args(["diagnostics", "--path", data_dir.to_str().unwrap()])
        .output()
        .unwrap();
    assert!(output.status.success());

    let json: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap();
    assert!(json["version"].is_string());
    assert!(json["wal_segments"].is_number());
    assert!(json["checkpoint_age_seconds"].is_number());
}

#[test]
fn diagnostics_pretty_output_readable() {
    let db = make_test_db_with_items(10);
    let data_dir = db.paths().data_dir().to_path_buf();
    db.close().unwrap();

    let output = Command::new(tidalctl_binary_path())
        .args(["diagnostics", "--path", data_dir.to_str().unwrap(), "--pretty"])
        .output()
        .unwrap();
    assert!(output.status.success());

    let stdout = String::from_utf8_lossy(&output.stdout);
    assert!(stdout.contains("tidalDB Diagnostics"));
    assert!(stdout.contains("WAL"));
    assert!(stdout.contains("Checkpoint"));
}

#[test]
fn diagnostics_missing_dir_exits_1() {
    let output = Command::new(tidalctl_binary_path())
        .args(["diagnostics", "--path", "/nonexistent/path"])
        .output()
        .unwrap();
    assert_eq!(output.status.code(), Some(1));
}
```