tidaldb/docs/planning/milestone-7/phase-4/task-05-tidalctl-diagnostics.md
2026-02-23 22:41:16 -07:00

Task 05: tidalctl diagnostics Command

Delivers

A diagnostics subcommand for tidalctl that reads the database's metrics state and persistent storage to print a human-readable health summary. Operators use this to triage production issues without attaching a debugger or parsing Prometheus output.

Complexity: M

Dependencies

  • task-02 complete (signal + WAL metrics must be wired)
  • task-03 complete (index health metrics must be wired)
  • task-04 complete (session + cohort + degradation metrics must be wired)
  • Existing tidalctl binary with status and paths subcommands (m0p2)
  • tidal/src/db/metrics.rs -- MetricsState with all m7p4 metrics

Technical Design

1. Add diagnostics subcommand to tidalctl

In the tidalctl binary (manual arg parsing), add a new match arm:

"diagnostics" => {
    let path = parse_path_flag(&args)?;
    run_diagnostics(&path, pretty)?;
}
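
The flag handling behind this match arm could be sketched as follows. The signature of parse_path_flag is an assumption (the real helper already exists in tidalctl and may differ), and parse_pretty_flag is a hypothetical name for whatever produces the pretty value used above.

```rust
use std::path::PathBuf;

/// Assumed shape of the existing helper: returns the value that follows
/// `--path`, or an error message if the flag or its value is missing.
fn parse_path_flag(args: &[String]) -> Result<PathBuf, String> {
    let pos = args
        .iter()
        .position(|arg| arg == "--path")
        .ok_or_else(|| "missing required flag: --path <dir>".to_string())?;
    args.get(pos + 1)
        .map(|value| PathBuf::from(value.as_str()))
        .ok_or_else(|| "--path requires a value".to_string())
}

/// Hypothetical helper: true when `--pretty` appears anywhere in the arguments.
fn parse_pretty_flag(args: &[String]) -> bool {
    args.iter().any(|arg| arg == "--pretty")
}
```

Returning an error for a missing value, rather than panicking, keeps the failure on the same `?` path that the match arm already uses.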

2. Diagnostics data collection

The diagnostics command opens the database in read-only inspection mode. It does NOT start a full TidalDb instance. Instead, it reads:

  1. Config: from {data_dir}/config.json (existing tidalctl status path)
  2. WAL state: scan {wal_dir}/ for segment files, compute total size and count
  3. Checkpoint age: read {wal_dir}/checkpoint file, parse CheckpointMeta, compute age from checkpoint_time_ns
  4. Signal ledger size: estimate the entry count from the checkpoint file size (approximate; each entity-signal entry is ~983 bytes in the m1p4 format)
  5. Tantivy index: if {data_dir}/text_index/ exists, open read-only, count segments and docs
  6. USearch index: if {data_dir}/vectors/ exists, report directory size
  7. Session count: count entries in session journal ({wal_dir}/session_journal.bin)
  8. Collection count: scan {data_dir}/items/ for Tag::Collection keys
  9. Cohort count: scan {data_dir}/items/ for cohort-related keys

For items 5-9, if the directory or file does not exist, report "not available" rather than erroring.
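
As an illustration of the "not available" rule, here is a minimal sketch for item 6 (directory size). The function names dir_size and render_optional are hypothetical, and recursing into subdirectories is an assumption about how the vectors directory is laid out.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Total size in bytes of all regular files under `dir`, recursing into
/// subdirectories. Returns Ok(None) when `dir` does not exist, so the caller
/// can report "not available" instead of failing.
fn dir_size(dir: &Path) -> io::Result<Option<u64>> {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e),
    };
    let mut total = 0u64;
    for entry in entries {
        let entry = entry?;
        let meta = entry.metadata()?;
        if meta.is_dir() {
            // A subdirectory removed mid-scan contributes nothing.
            total += dir_size(&entry.path())?.unwrap_or(0);
        } else if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(Some(total))
}

/// Renders an optional metric for the human-readable summary.
fn render_optional(value: Option<u64>) -> String {
    match value {
        Some(v) => v.to_string(),
        None => "not available".to_string(),
    }
}
```

Only a missing directory maps to None; other I/O errors (for example, permission denied) still propagate so they are not silently reported as an absent subsystem.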

3. Diagnostics output format

tidalDB Diagnostics
===================
Version:        0.7.0 (build: abc123)
Data dir:       /var/lib/tidaldb/data
Storage mode:   durable

WAL
---
Segments:       12
Total size:     48.3 MB
Lag (uncompacted): 12.1 MB

Checkpoint
----------
Last checkpoint: 2026-02-23 14:30:12 UTC (47s ago)
WAL sequence:    148293

Signal Ledger
-------------
Estimated entries: ~152,000

Text Index (Tantivy)
--------------------
Segments:       4
Indexed docs:   98,412

Vector Index (USearch)
----------------------
Directory size: 256.7 MB

Sessions
--------
Active:         3
Closed (total): 1,247
Auto-closed:    12

Degradation
-----------
Level:          0 (healthy)

Collections:    8
Cohorts:        3

The summary above is printed when --pretty is set. When --pretty is NOT set, the command prints machine-readable JSON instead:

{
  "version": "0.7.0",
  "build_hash": "abc123",
  "wal_segments": 12,
  "wal_total_bytes": 50659328,
  "wal_lag_bytes": 12689408,
  "checkpoint_age_seconds": 47,
  "checkpoint_wal_sequence": 148293,
  "signal_estimated_entries": 152000,
  "tantivy_segments": 4,
  "tantivy_indexed_docs": 98412,
  "usearch_directory_bytes": 269156352,
  "sessions_active": 3,
  "sessions_closed_total": 1247,
  "sessions_auto_closed_total": 12,
  "degradation_level": 0,
  "collection_count": 8,
  "cohort_count": 3
}
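
The sample values in the two formats are consistent if the pretty output uses binary multiples (50659328 bytes / 1,048,576 = 48.3). A minimal sketch of the two derived values follows; format_size and checkpoint_age_seconds are hypothetical names, and treating checkpoint_time_ns as nanoseconds since the Unix epoch is an assumption.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Formats a byte count with one decimal place using binary multiples
/// (1 MB = 1,048,576 bytes), matching the sample output in this section.
fn format_size(bytes: u64) -> String {
    const KB: f64 = 1024.0;
    const MB: f64 = KB * 1024.0;
    const GB: f64 = MB * 1024.0;
    let b = bytes as f64;
    if b >= GB {
        format!("{:.1} GB", b / GB)
    } else if b >= MB {
        format!("{:.1} MB", b / MB)
    } else if b >= KB {
        format!("{:.1} KB", b / KB)
    } else {
        format!("{} B", bytes)
    }
}

/// Whole seconds between the checkpoint timestamp and `now`.
/// Assumes `checkpoint_time_ns` counts nanoseconds since the Unix epoch.
/// Clamps to 0 if the checkpoint appears to be in the future (clock skew).
fn checkpoint_age_seconds(checkpoint_time_ns: u64, now: SystemTime) -> u64 {
    let checkpoint = UNIX_EPOCH + Duration::from_nanos(checkpoint_time_ns);
    now.duration_since(checkpoint)
        .map(|elapsed| elapsed.as_secs())
        .unwrap_or(0)
}
```

Passing `now` as a parameter instead of calling SystemTime::now() inside the function keeps the age calculation deterministic in tests.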

4. Exit codes

Code  Meaning
0     Diagnostics completed successfully
1     Data directory does not exist or is not readable
2     WAL directory missing or corrupt (partial output still printed)
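
One way to keep these codes in a single place is to map an error type to them. The sketch below is illustrative only: DiagnosticsError and its variants are hypothetical names, not types that exist in tidalctl.

```rust
use std::process::ExitCode;

/// Hypothetical error type for the diagnostics command.
#[derive(Debug)]
enum DiagnosticsError {
    /// Data directory does not exist or is not readable.
    DataDirUnavailable(String),
    /// WAL directory is missing or corrupt; partial output was already printed.
    WalUnavailable(String),
}

/// Numeric exit status for a diagnostics result, per the table above.
fn exit_code_value(result: &Result<(), DiagnosticsError>) -> u8 {
    match result {
        Ok(()) => 0,
        Err(DiagnosticsError::DataDirUnavailable(_)) => 1,
        Err(DiagnosticsError::WalUnavailable(_)) => 2,
    }
}

/// Converts the result into a value that `fn main() -> ExitCode` can return.
fn to_exit_code(result: &Result<(), DiagnosticsError>) -> ExitCode {
    ExitCode::from(exit_code_value(result))
}
```

Because code 2 still prints partial output, the WAL error would be recorded while the remaining sections are collected and only returned at the end, rather than aborting at the first WAL failure.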

5. No TidalDb instance required

The diagnostics command reads files directly. It does NOT call TidalDb::builder().open(). This means it can run against a database that is currently open by another process (read-only file access) or against a database that failed to start (helping debug startup failures).

A possible alternative: if a running TidalDb has the metrics HTTP server enabled, tidalctl diagnostics could fetch /metrics and format that output instead of reading files. Implement the file-based approach as the primary path; the HTTP-based approach is a future enhancement.

Acceptance Criteria

  • tidalctl diagnostics --path <dir> --pretty prints human-readable health summary
  • tidalctl diagnostics --path <dir> (without --pretty) prints machine-readable JSON
  • Output includes: WAL segment count, WAL total size, WAL lag, checkpoint age, checkpoint sequence, estimated signal entries, Tantivy segment count, Tantivy indexed docs, USearch directory size, active sessions, closed sessions, auto-closed sessions, degradation level, collection count, cohort count
  • Missing subsystems (no text index, no vectors) show "not available" rather than error
  • Works against a database currently open by another process (read-only access)
  • Exit code 0 on success, 1 on missing data dir, 2 on WAL issues
  • cargo clippy -- -D warnings and cargo fmt --check pass

Test Strategy

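// CLI integration tests (each runs the tidalctl binary as a subprocess).
// make_test_db_with_items and tidalctl_binary_path are test helpers.
use std::process::Command;

#[test]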
fn diagnostics_json_output_valid() {
    let db = make_test_db_with_items(10);
    let data_dir = db.paths().data_dir().to_path_buf();
    db.close().unwrap();

    let output = Command::new(tidalctl_binary_path())
        .args(["diagnostics", "--path", data_dir.to_str().unwrap()])
        .output()
        .unwrap();
    assert!(output.status.success());

    let json: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap();
    assert!(json["version"].is_string());
    assert!(json["wal_segments"].is_number());
    assert!(json["checkpoint_age_seconds"].is_number());
}

#[test]
fn diagnostics_pretty_output_readable() {
    let db = make_test_db_with_items(10);
    let data_dir = db.paths().data_dir().to_path_buf();
    db.close().unwrap();

    let output = Command::new(tidalctl_binary_path())
        .args(["diagnostics", "--path", data_dir.to_str().unwrap(), "--pretty"])
        .output()
        .unwrap();
    assert!(output.status.success());

    let stdout = String::from_utf8_lossy(&output.stdout);
    assert!(stdout.contains("tidalDB Diagnostics"));
    assert!(stdout.contains("WAL"));
    assert!(stdout.contains("Checkpoint"));
}

#[test]
fn diagnostics_missing_dir_exits_1() {
    let output = Command::new(tidalctl_binary_path())
        .args(["diagnostics", "--path", "/nonexistent/path"])
        .output()
        .unwrap();
    assert_eq!(output.status.code(), Some(1));
}