tidaldb/docs/planning/milestone-7/phase-4/task-07-metrics-feature-flag.md
2026-02-23 22:41:16 -07:00

# Task 07: `metrics` Feature Flag + Zero-Overhead Verification
## Delivers
Verification that all metrics instrumentation (atomic counters, histogram, HTTP endpoint, Prometheus rendering) is gated behind `#[cfg(feature = "metrics")]` and adds no runtime overhead when the feature is disabled. Adds a CI-enforceable check that the crate compiles and its tests pass with `--no-default-features`; confirming that no metrics symbols remain in the release binary is a documented manual step.
## Complexity: S
## Dependencies
- task-01 complete (QueryStats is NOT gated -- it has value without Prometheus)
- task-02, task-03, task-04 complete (all atomic counters must exist before we verify gating)
- `tidal/Cargo.toml` -- `metrics` feature definition
## Technical Design
### 1. Audit all `#[cfg(feature = "metrics")]` gates
Every `AtomicU64` counter, the `LatencyHistogram`, and the HTTP server module must be gated. The audit covers:
| Location | Gated item |
|---|---|
| `db/metrics.rs` | `wal_lag_bytes`, `wal_compacted_segments_total`, `last_checkpoint_ns`, `signal_hot_entries`, `signal_writes_total`, `signal_write_latency`, `tantivy_segment_count`, `tantivy_indexed_docs`, `usearch_index_size_bytes`, `usearch_vector_count`, `bitmap_index_cardinality`, `active_sessions`, `closed_sessions_total`, `session_auto_closed_total`, `rate_limited_total`, `degradation_level` |
| `db/http.rs` | Entire module already gated: `#[cfg(feature = "metrics")] pub mod http;` |
| `db/signals.rs` | `Instant::now()` + `observe()` call in `signal()` |
| `db/sessions.rs` | `fetch_add`/`fetch_sub` calls on session counters |
| `db/mod.rs` | `metrics_handle` field, `refresh_index_metrics()` call |
| `lib.rs` | `pub use db::http::MetricsHandle` re-export |
### 2. Verify `QueryStats` is NOT gated
`QueryStats` is always available. It carries per-query telemetry that embedders use for their own logging and debugging, independent of Prometheus. The `stats` field on `Results` and `SearchResults` is always populated.
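As a rough illustration of this always-on path, the sketch below times a query unconditionally and attaches the result to the returned value. Only `QueryStats::new` and `profile_name` come from this task; the `total_us` field, the `Results` layout, and `run_query` are hypothetical stand-ins, not the crate's actual API.

```rust
use std::time::Instant;

/// Hypothetical stand-in for the crate's `QueryStats`.
/// Only `profile_name` is named in this task; `total_us` is assumed.
#[derive(Debug)]
pub struct QueryStats {
    pub profile_name: String,
    pub total_us: u64,
}

impl QueryStats {
    pub fn new(profile_name: String) -> Self {
        Self { profile_name, total_us: 0 }
    }
}

/// Hypothetical result type: `stats` is a plain field with no `#[cfg]`.
pub struct Results {
    pub hits: Vec<u64>,
    pub stats: QueryStats,
}

/// Toy query: keeps even ids. The timer is deliberately NOT feature-gated,
/// so `stats` is populated in every build configuration.
pub fn run_query(profile: &str, candidates: &[u64]) -> Results {
    let start = Instant::now();
    let hits: Vec<u64> = candidates.iter().copied().filter(|id| id % 2 == 0).collect();
    let mut stats = QueryStats::new(profile.to_owned());
    stats.total_us = start.elapsed().as_micros() as u64;
    Results { hits, stats }
}
```

Because nothing here mentions the `metrics` feature, the same code compiles and behaves identically with and without it.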
### 3. Verify `MetricsState` base fields are NOT gated
`MetricsState.opened_at` and `MetricsState.health_ok` exist regardless of the `metrics` feature, as do the `render_prometheus()` and `render_healthz()` methods, which render the base metrics. Only the additional counters introduced in milestone 7, phase 4 (m7p4) are gated.
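A minimal sketch of this split, under stated assumptions: the field types (`Instant`, `AtomicBool`), the metric names `tidaldb_uptime_seconds` and `tidaldb_health_ok`, and the rendering format are guesses for illustration, and only one gated counter is shown. The real `db/metrics.rs` may differ.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
#[cfg(feature = "metrics")]
use std::sync::atomic::AtomicU64;
use std::time::Instant;

pub struct MetricsState {
    // Base fields: compiled in every configuration.
    opened_at: Instant,
    health_ok: AtomicBool, // assumed type
    // Additional counter: present only with the `metrics` feature.
    #[cfg(feature = "metrics")]
    pub signal_writes_total: AtomicU64,
}

impl MetricsState {
    pub fn new() -> Self {
        Self {
            opened_at: Instant::now(),
            health_ok: AtomicBool::new(true),
            // Field initializers accept `#[cfg]` just like field declarations.
            #[cfg(feature = "metrics")]
            signal_writes_total: AtomicU64::new(0),
        }
    }

    pub fn uptime_seconds(&self) -> f64 {
        self.opened_at.elapsed().as_secs_f64()
    }

    pub fn health_ok_value(&self) -> f64 {
        if self.health_ok.load(Ordering::Relaxed) { 1.0 } else { 0.0 }
    }

    /// Always renders the base metrics; gated counters are appended
    /// only when the feature is enabled.
    pub fn render_prometheus(&self) -> String {
        let mut out = String::new();
        out.push_str(&format!("tidaldb_uptime_seconds {}\n", self.uptime_seconds()));
        out.push_str(&format!("tidaldb_health_ok {}\n", self.health_ok_value()));
        #[cfg(feature = "metrics")]
        {
            let writes = self.signal_writes_total.load(Ordering::Relaxed);
            out.push_str(&format!("tidaldb_signal_writes_total {}\n", writes));
        }
        out
    }
}
```

Gating the `AtomicU64` import together with the field keeps the feature-off build free of unused-import warnings.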
### 4. Zero-overhead verification
Add a compile check to CI, and manually confirm that the release binary contains no metrics symbols:
```bash
# Verify the crate compiles without the metrics feature
cargo check --manifest-path tidal/Cargo.toml --no-default-features
# Verify no metrics-related symbols appear in the binary
cargo build --manifest-path tidal/Cargo.toml --release --no-default-features
# Inspect with nm or objdump for tidaldb_wal_lag_bytes etc.
# (This is a manual verification step documented in the task, not automated)
```
### 5. Conditional compilation pattern
Every instrumentation site follows this pattern:
```rust
// GOOD: zero overhead when feature is off
#[cfg(feature = "metrics")]
let start = Instant::now(); // the timer itself is gated
// ... work ...
#[cfg(feature = "metrics")]
{
    let elapsed_us = start.elapsed().as_micros() as u64;
    self.metrics.signal_writes_total.fetch_add(1, Ordering::Relaxed);
    self.metrics.signal_write_latency.observe(elapsed_us);
}

// BAD: timing overhead even when feature is off
let start = Instant::now(); // <-- this runs unconditionally (and is unused when the feature is off)
// ... work ...
#[cfg(feature = "metrics")]
{
    self.metrics.signal_writes_total.fetch_add(1, Ordering::Relaxed);
}
```
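The `Instant::now()` call must also be gated with `#[cfg(feature = "metrics")]`. It has to sit before the work being timed, so it carries its own attribute rather than living in the same block as the counter updates. On hot paths (signal writes, query execution), even `Instant::now()` is measurable overhead (~20ns on macOS).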
Exception: `QueryStats` timing uses `Instant::now()` unconditionally because `QueryStats` is always populated. This is acceptable because query execution is not the write-path hot loop.
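For reference, a self-contained version of the gated write-path pattern might look like the following. `SignalStore` is a hypothetical stand-in for the real type in `db/signals.rs`, and the latency "histogram" is simplified to a running sum of microseconds; neither is the crate's actual implementation.

```rust
#[cfg(feature = "metrics")]
use std::sync::atomic::{AtomicU64, Ordering};
#[cfg(feature = "metrics")]
use std::time::Instant;

/// Hypothetical signal store used only to illustrate the gating pattern.
#[derive(Default)]
pub struct SignalStore {
    values: Vec<f64>,
    #[cfg(feature = "metrics")]
    signal_writes_total: AtomicU64,
    // Simplified stand-in for `LatencyHistogram`: total observed microseconds.
    #[cfg(feature = "metrics")]
    signal_write_latency_us: AtomicU64,
}

impl SignalStore {
    pub fn signal(&mut self, value: f64) {
        // The timer is declared before the work and carries its own gate,
        // so no clock read happens when the feature is off.
        #[cfg(feature = "metrics")]
        let start = Instant::now();

        self.values.push(value);

        #[cfg(feature = "metrics")]
        {
            let elapsed_us = start.elapsed().as_micros() as u64;
            self.signal_writes_total.fetch_add(1, Ordering::Relaxed);
            self.signal_write_latency_us.fetch_add(elapsed_us, Ordering::Relaxed);
        }
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }
}
```

With the feature disabled, the struct reduces to its `Vec<f64>` and `signal()` reduces to a single `push`, which is the property the audit is meant to confirm at each instrumentation site.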
### 6. Feature flag definition
Verify `tidal/Cargo.toml` has:
```toml
[features]
default = ["metrics"]
metrics = []
```
The `metrics` feature is enabled by default so that out-of-the-box usage includes observability. Embedders who need zero overhead opt out with `default-features = false`.
## Acceptance Criteria
- [ ] All m7p4 atomic counters gated behind `#[cfg(feature = "metrics")]`
- [ ] `LatencyHistogram` struct and all its call sites gated
- [ ] HTTP metrics server module gated (already done, verify)
- [ ] `Instant::now()` for metric timing is inside `#[cfg]` blocks on write-path code
- [ ] `QueryStats` is NOT gated (always available)
- [ ] `MetricsState.opened_at` and `MetricsState.health_ok` are NOT gated
- [ ] `cargo check --manifest-path tidal/Cargo.toml --no-default-features` succeeds
- [ ] `cargo test --manifest-path tidal/Cargo.toml --no-default-features --lib` passes
- [ ] No dead code warnings when `metrics` feature is disabled
- [ ] `cargo clippy -- -D warnings` passes with and without `metrics` feature
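The dead-code criterion is easiest to meet when everything reachable only from metrics code (imports, fields, private helpers) carries the same gate as its call sites. The sketch below shows one way this could look for session counters; `Sessions`, `record_open`, and `record_close` are hypothetical names for illustration, not the contents of `db/sessions.rs`.

```rust
#[cfg(feature = "metrics")]
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical session table used only to illustrate the gating.
pub struct Sessions {
    ids: Vec<u64>,
    #[cfg(feature = "metrics")]
    active_sessions: AtomicU64,
    #[cfg(feature = "metrics")]
    closed_sessions_total: AtomicU64,
}

impl Sessions {
    pub fn new() -> Self {
        Self {
            ids: Vec::new(),
            #[cfg(feature = "metrics")]
            active_sessions: AtomicU64::new(0),
            #[cfg(feature = "metrics")]
            closed_sessions_total: AtomicU64::new(0),
        }
    }

    pub fn open(&mut self, id: u64) {
        self.ids.push(id);
        #[cfg(feature = "metrics")]
        {
            self.record_open();
        }
    }

    /// Returns true if a session with this id was open.
    pub fn close(&mut self, id: u64) -> bool {
        let before = self.ids.len();
        self.ids.retain(|s| *s != id);
        let closed = self.ids.len() != before;
        #[cfg(feature = "metrics")]
        {
            if closed {
                self.record_close();
            }
        }
        closed
    }

    pub fn open_count(&self) -> usize {
        self.ids.len()
    }

    // Helpers are gated too; left ungated they would be flagged as
    // dead code (and fail to compile) when the feature is off.
    #[cfg(feature = "metrics")]
    fn record_open(&self) {
        self.active_sessions.fetch_add(1, Ordering::Relaxed);
    }

    #[cfg(feature = "metrics")]
    fn record_close(&self) {
        self.active_sessions.fetch_sub(1, Ordering::Relaxed);
        self.closed_sessions_total.fetch_add(1, Ordering::Relaxed);
    }
}
```

Note that `closed` is still used as the return value, so the feature-off build has no unused-variable warning either.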
## Test Strategy
```rust
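// These tests run in both feature configurations via the CI matrix.
// `QueryStats` and `MetricsState` are assumed to be in scope (e.g. via `use super::*;`).
#[cfg(feature = "metrics")]
use std::sync::atomic::Ordering; // gated so the import is not unused when the feature is off

#[test]
fn query_stats_always_available() {
    // QueryStats is not feature-gated
    let stats = QueryStats::new("test".to_owned());
    assert_eq!(stats.profile_name, "test");
}

#[test]
fn metrics_state_base_always_available() {
    // Base fields and their accessors exist in every configuration
    let state = MetricsState::new();
    assert!(state.uptime_seconds() >= 0.0);
    assert!((state.health_ok_value() - 1.0).abs() < f64::EPSILON);
}

#[cfg(feature = "metrics")]
#[test]
fn metrics_feature_counters_exist() {
    // Compiled only when the feature is on, since the counter is gated
    let state = MetricsState::new();
    state.signal_writes_total.fetch_add(1, Ordering::Relaxed);
    assert_eq!(state.signal_writes_total.load(Ordering::Relaxed), 1);
}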
```
CI verification (add to pre-commit or CI script):
```bash
# Verify no-default-features compilation and tests.
# Rely on cargo's exit status: matching "error" in the output also hits crate names such as thiserror.
cargo check --manifest-path tidal/Cargo.toml --no-default-features || exit 1
cargo test --manifest-path tidal/Cargo.toml --no-default-features --lib || exit 1
```