# Task 05: `tidalctl diagnostics` Command

## Delivers

A `diagnostics` subcommand for `tidalctl` that reads the database's metrics state and persistent storage to print a human-readable health summary. Operators use this to triage production issues without attaching a debugger or parsing Prometheus output.

## Complexity: M

## Dependencies

- task-02 complete (signal + WAL metrics must be wired)
- task-03 complete (index health metrics must be wired)
- task-04 complete (session + cohort + degradation metrics must be wired)
- Existing `tidalctl` binary with `status` and `paths` subcommands (m0p2)
- `tidal/src/db/metrics.rs` -- `MetricsState` with all m7p4 metrics

## Technical Design

### 1. Add `diagnostics` subcommand to tidalctl

In the `tidalctl` binary (manual arg parsing), add a new match arm:

```rust
"diagnostics" => {
    // `--path <dir>` selects the data directory to inspect.
    let path = parse_path_flag(&args)?;
    // `pretty` is true when `--pretty` was passed; otherwise JSON is emitted.
    run_diagnostics(&path, pretty)?;
}
```

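
The match arm above relies on `parse_path_flag` and a `pretty` value that the snippet does not define. The following is a minimal sketch of how manual parsing of those two flags might look. The signatures, the `String` error type, and the `parse_pretty_flag` name are assumptions for illustration; the existing `tidalctl` helpers may differ.

```rust
use std::path::PathBuf;

/// Hypothetical helper: returns the value that follows `--path`.
fn parse_path_flag(args: &[String]) -> Result<PathBuf, String> {
    let idx = args
        .iter()
        .position(|a| a == "--path")
        .ok_or_else(|| "missing required flag: --path <dir>".to_string())?;
    args.get(idx + 1)
        // Reject a following flag such as `--pretty` being taken as the path.
        .filter(|v| !v.starts_with("--"))
        .map(PathBuf::from)
        .ok_or_else(|| "--path requires a directory argument".to_string())
}

/// Hypothetical helper: true when `--pretty` appears anywhere in the arguments.
fn parse_pretty_flag(args: &[String]) -> bool {
    args.iter().any(|a| a == "--pretty")
}
```

With these in place, `pretty` in the match arm would be `parse_pretty_flag(&args)`, computed once before the `match`.
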
### 2. Diagnostics data collection

The diagnostics command opens the database in read-only inspection mode. It does NOT start a full `TidalDb` instance. Instead, it reads:

1. **Config**: from `{data_dir}/config.json` (existing `tidalctl status` path)
2. **WAL state**: scan `{wal_dir}/` for segment files, compute total size and count
3. **Checkpoint age**: read `{wal_dir}/checkpoint` file, parse `CheckpointMeta`, compute age from `checkpoint_time_ns`
4. **Signal ledger size**: estimate the entry count from the checkpoint file size (approximate; each entity-signal entry is ~983 bytes in the m1p4 format)
5. **Tantivy index**: if `{data_dir}/text_index/` exists, open read-only, count segments and docs
6. **USearch index**: if `{data_dir}/vectors/` exists, report directory size
7. **Session count**: count entries in session journal (`{wal_dir}/session_journal.bin`)
8. **Collection count**: scan `{data_dir}/items/` for `Tag::Collection` keys
9. **Cohort count**: scan `{data_dir}/items/` for cohort-related keys

For items 5-9, if the directory or file does not exist, report "not available" rather than erroring.

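
Items 2 to 4 of the list above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actual implementation: the `.wal` file suffix, the interpretation of `checkpoint_time_ns` as nanoseconds since the Unix epoch, and all type and function names here are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

/// Segment count and combined size of the WAL directory.
#[derive(Debug, PartialEq)]
struct WalSummary {
    segments: u64,
    total_bytes: u64,
}

/// Counts regular files in `wal_dir` whose name ends in `.wal`.
/// The `.wal` suffix is an assumption; substitute the real segment naming.
fn scan_wal(wal_dir: &Path) -> io::Result<WalSummary> {
    let mut summary = WalSummary { segments: 0, total_bytes: 0 };
    for entry in fs::read_dir(wal_dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        let is_segment = entry.file_name().to_string_lossy().ends_with(".wal");
        if meta.is_file() && is_segment {
            summary.segments += 1;
            summary.total_bytes += meta.len();
        }
    }
    Ok(summary)
}

/// Age in whole seconds of a checkpoint taken at `checkpoint_time_ns`
/// (assumed: nanoseconds since the Unix epoch). Returns 0 if the
/// checkpoint timestamp lies in the future relative to `now`.
fn checkpoint_age_seconds(checkpoint_time_ns: u64, now: SystemTime) -> u64 {
    let now_ns = now
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);
    (now_ns.saturating_sub(checkpoint_time_ns as u128) / 1_000_000_000) as u64
}

/// Rough signal-ledger entry count derived from the checkpoint file size.
fn estimate_signal_entries(checkpoint_file_bytes: u64) -> u64 {
    const APPROX_ENTRY_BYTES: u64 = 983; // per-entry size quoted in item 4
    checkpoint_file_bytes / APPROX_ENTRY_BYTES
}
```

Passing `now` as a parameter keeps the age calculation deterministic in tests; the CLI would call it with `SystemTime::now()`.
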
### 3. Diagnostics output format

```
tidalDB Diagnostics
===================
Version: 0.7.0 (build: abc123)
Data dir: /var/lib/tidaldb/data
Storage mode: durable

WAL
---
Segments: 12
Total size: 48.3 MB
Lag (uncompacted): 12.1 MB

Checkpoint
----------
Last checkpoint: 2026-02-23 14:30:12 UTC (47s ago)
WAL sequence: 148293

Signal Ledger
-------------
Estimated entries: ~152,000

Text Index (Tantivy)
--------------------
Segments: 4
Indexed docs: 98,412

Vector Index (USearch)
----------------------
Directory size: 256.7 MB

Sessions
--------
Active: 3
Closed (total): 1,247
Auto-closed: 12

Degradation
-----------
Level: 0 (healthy)

Collections: 8
Cohorts: 3
```

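
The sizes in the sample above line up with the byte counts in the JSON sample when divided by 1024 × 1024 (for example, 50,659,328 bytes gives 48.3), so the "MB" label appears to denote 1024-based units. A small formatter along these lines would reproduce those figures; the function name and the set of units are assumptions for illustration.

```rust
/// Renders a byte count with one decimal place using 1024-based units.
/// Counts below 1024 are printed as whole bytes.
fn format_bytes(bytes: u64) -> String {
    const UNITS: [&str; 4] = ["B", "KB", "MB", "GB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    // Step up one unit at a time until the value fits or units run out.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} B", bytes)
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}
```
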
When `--pretty` is NOT set, output machine-readable JSON:

```json
{
  "version": "0.7.0",
  "build_hash": "abc123",
  "wal_segments": 12,
  "wal_total_bytes": 50659328,
  "wal_lag_bytes": 12689408,
  "checkpoint_age_seconds": 47,
  "checkpoint_wal_sequence": 148293,
  "signal_estimated_entries": 152000,
  "tantivy_segments": 4,
  "tantivy_indexed_docs": 98412,
  "usearch_directory_bytes": 269156352,
  "sessions_active": 3,
  "sessions_closed_total": 1247,
  "sessions_auto_closed_total": 12,
  "degradation_level": 0,
  "collection_count": 8,
  "cohort_count": 3
}
```

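
One way to carry "not available" subsystems through both output modes is to model optional fields as `Option<u64>`. The sketch below covers only a subset of the fields and renders JSON by hand to stay dependency-free; a real implementation would more likely derive `serde::Serialize`. Emitting `null` for a missing subsystem in JSON mode is an assumption, since the spec only defines the "not available" wording, and all names here are hypothetical.

```rust
/// Subset of the report; `None` marks a subsystem that is absent on disk.
struct DiagnosticsReport {
    wal_segments: u64,
    wal_total_bytes: u64,
    tantivy_segments: Option<u64>,
    usearch_directory_bytes: Option<u64>,
}

/// JSON literal for an optional number (`null` when unavailable; an assumption).
fn json_number(value: Option<u64>) -> String {
    value.map_or_else(|| "null".to_string(), |v| v.to_string())
}

/// Human-readable form, using the "not available" wording from the spec.
fn pretty_number(value: Option<u64>) -> String {
    value.map_or_else(|| "not available".to_string(), |v| v.to_string())
}

impl DiagnosticsReport {
    /// Compact JSON object containing the four sketched fields.
    fn to_json(&self) -> String {
        format!(
            "{{\"wal_segments\":{},\"wal_total_bytes\":{},\"tantivy_segments\":{},\"usearch_directory_bytes\":{}}}",
            self.wal_segments,
            self.wal_total_bytes,
            json_number(self.tantivy_segments),
            json_number(self.usearch_directory_bytes),
        )
    }
}
```

Keeping the absent case in the type, rather than substituting 0, lets consumers of the JSON distinguish "no vector index" from "empty vector index".
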
### 4. Exit codes

| Code | Meaning |
|---|---|
| 0 | Diagnostics completed successfully |
| 1 | Data directory does not exist or is not readable |
| 2 | WAL directory missing or corrupt (partial output still printed) |

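
The table above maps naturally onto a small enum that `main` converts into a process exit code. This is a sketch only; the `DiagnosticsOutcome` type and its variant names are hypothetical.

```rust
use std::process::ExitCode;

/// Possible results of a diagnostics run, one per row of the exit-code table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DiagnosticsOutcome {
    /// All sections were collected.
    Success,
    /// The data directory does not exist or cannot be read.
    DataDirUnreadable,
    /// The WAL directory is missing or corrupt; other sections were still printed.
    WalProblem,
}

impl DiagnosticsOutcome {
    /// Numeric exit code as listed in the table.
    fn code(self) -> u8 {
        match self {
            Self::Success => 0,
            Self::DataDirUnreadable => 1,
            Self::WalProblem => 2,
        }
    }
}

impl From<DiagnosticsOutcome> for ExitCode {
    fn from(outcome: DiagnosticsOutcome) -> Self {
        ExitCode::from(outcome.code())
    }
}
```

Because code 2 still prints partial output, the WAL check would record `WalProblem` and let the remaining sections run, rather than returning early.
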
### 5. No TidalDb instance required

The diagnostics command reads files directly. It does NOT call `TidalDb::builder().open()`. This means it can run against a database that is currently open by another process (read-only file access) or against a database that failed to start (helping debug startup failures).

The one exception: if a running `TidalDb` has the metrics HTTP server enabled, `tidalctl diagnostics` could alternatively fetch `/metrics` and format the output. Implement the file-based approach as the primary path; the HTTP-based approach is a future enhancement.

## Acceptance Criteria

- [ ] `tidalctl diagnostics --path <dir> --pretty` prints human-readable health summary
- [ ] `tidalctl diagnostics --path <dir>` (without `--pretty`) prints machine-readable JSON
- [ ] Output includes: WAL segment count, WAL total size, WAL lag, checkpoint age, checkpoint sequence, estimated signal entries, Tantivy segment count, Tantivy indexed docs, USearch directory size, active sessions, closed sessions, auto-closed sessions, degradation level, collection count, cohort count
- [ ] Missing subsystems (no text index, no vectors) show "not available" rather than error
- [ ] Works against a database currently open by another process (read-only access)
- [ ] Exit code 0 on success, 1 on missing data dir, 2 on WAL issues
- [ ] `cargo clippy -D warnings` and `cargo fmt --check` pass

## Test Strategy

```rust
use std::process::Command;

// CLI integration tests: each one runs the `tidalctl` binary as a subprocess.

#[test]
fn diagnostics_json_output_valid() {
    let db = make_test_db_with_items(10);
    let data_dir = db.paths().data_dir().to_path_buf();
    // Close the database before invoking the CLI.
    db.close().unwrap();

    let output = Command::new(tidalctl_binary_path())
        .args(["diagnostics", "--path", data_dir.to_str().unwrap()])
        .output()
        .unwrap();
    assert!(output.status.success());

    // Without `--pretty`, stdout must parse as JSON.
    let json: serde_json::Value = serde_json::from_slice(&output.stdout).unwrap();
    assert!(json["version"].is_string());
    assert!(json["wal_segments"].is_number());
    assert!(json["checkpoint_age_seconds"].is_number());
}

#[test]
fn diagnostics_pretty_output_readable() {
    let db = make_test_db_with_items(10);
    let data_dir = db.paths().data_dir().to_path_buf();
    db.close().unwrap();

    let output = Command::new(tidalctl_binary_path())
        .args(["diagnostics", "--path", data_dir.to_str().unwrap(), "--pretty"])
        .output()
        .unwrap();
    assert!(output.status.success());

    // With `--pretty`, stdout contains the section headings of the text report.
    let stdout = String::from_utf8_lossy(&output.stdout);
    assert!(stdout.contains("tidalDB Diagnostics"));
    assert!(stdout.contains("WAL"));
    assert!(stdout.contains("Checkpoint"));
}

#[test]
fn diagnostics_missing_dir_exits_1() {
    let output = Command::new(tidalctl_binary_path())
        .args(["diagnostics", "--path", "/nonexistent/path"])
        .output()
        .unwrap();
    // A missing data directory maps to exit code 1.
    assert_eq!(output.status.code(), Some(1));
}
```