tidaldb/docs/planning/milestone-7/phase-4/task-07-metrics-feature-flag.md
2026-02-23 22:41:16 -07:00

Task 07: metrics Feature Flag + Zero-Overhead Verification

Delivers

Verification that all metrics instrumentation (atomic counters, histogram, HTTP endpoint, Prometheus rendering) is gated behind #[cfg(feature = "metrics")] and compiles to zero overhead when the feature is disabled. Adds a CI-enforceable check that the crate builds and its tests pass with --no-default-features; confirming that no metrics code remains in the release binary is a manual inspection step.

Complexity: S

Dependencies

  • task-01 complete (QueryStats is NOT gated -- it has value without Prometheus)
  • task-02, task-03, task-04 complete (all atomic counters must exist before we verify gating)
  • tidal/Cargo.toml -- metrics feature definition

Technical Design

1. Audit all #[cfg(feature = "metrics")] gates

Every AtomicU64 counter, the LatencyHistogram, and the HTTP server module must be gated. The audit covers:

| Location | Gated item |
|---|---|
| db/metrics.rs | wal_lag_bytes, wal_compacted_segments_total, last_checkpoint_ns, signal_hot_entries, signal_writes_total, signal_write_latency, tantivy_segment_count, tantivy_indexed_docs, usearch_index_size_bytes, usearch_vector_count, bitmap_index_cardinality, active_sessions, closed_sessions_total, session_auto_closed_total, rate_limited_total, degradation_level |
| db/http.rs | Entire module already gated: #[cfg(feature = "metrics")] pub mod http; |
| db/signals.rs | Instant::now() + observe() call in signal() |
| db/sessions.rs | fetch_add/fetch_sub calls on session counters |
| db/mod.rs | metrics_handle field, refresh_index_metrics() call |
| lib.rs | pub use db::http::MetricsHandle re-export |
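
The audit rows reduce to three kinds of gate: on the import, on the field, and on each call site. The following is a minimal sketch of that shape, not the actual code in db/sessions.rs; `SessionMetrics`, `open_session`, and `close_session` are hypothetical names used only for illustration.

```rust
#[cfg(feature = "metrics")]
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the session counters in db/metrics.rs.
#[derive(Default)]
pub struct SessionMetrics {
    // Gated fields: absent from the struct layout when `metrics` is off.
    #[cfg(feature = "metrics")]
    pub active_sessions: AtomicU64,
    #[cfg(feature = "metrics")]
    pub closed_sessions_total: AtomicU64,
}

/// Hypothetical open path; the counter update is the only gated statement.
#[cfg_attr(not(feature = "metrics"), allow(unused_variables))]
pub fn open_session(metrics: &SessionMetrics) {
    // ... session setup would go here ...
    #[cfg(feature = "metrics")]
    {
        metrics.active_sessions.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical close path mirroring `open_session`.
#[cfg_attr(not(feature = "metrics"), allow(unused_variables))]
pub fn close_session(metrics: &SessionMetrics) {
    // ... session teardown would go here ...
    #[cfg(feature = "metrics")]
    {
        metrics.active_sessions.fetch_sub(1, Ordering::Relaxed);
        metrics.closed_sessions_total.fetch_add(1, Ordering::Relaxed);
    }
}
```

Gating the import alongside the fields keeps the feature-off build free of unused-import warnings, which matters for the clippy criterion later in this task.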

2. Verify QueryStats is NOT gated

QueryStats is always available. It carries per-query telemetry that embedders use for their own logging and debugging, independent of Prometheus. The stats field on Results and SearchResults is always populated.

3. Verify MetricsState base fields are NOT gated

MetricsState.opened_at and MetricsState.health_ok exist regardless of the metrics feature. The render_prometheus() and render_healthz() methods exist regardless (they render the base metrics). Only the additional m7p4 counters are gated.
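
A rough sketch of how base and gated fields might coexist in one struct is given below. The field types, the `tidaldb_health_ok` metric name, and the body of `render_prometheus()` are assumptions made for illustration; only the field names and `tidaldb_wal_lag_bytes` appear in this task.

```rust
use std::fmt::Write as _;
use std::sync::atomic::{AtomicBool, Ordering};
#[cfg(feature = "metrics")]
use std::sync::atomic::AtomicU64;
use std::time::Instant;

/// Sketch only: field types are assumed, not taken from the real struct.
pub struct MetricsState {
    // Base fields: present in every build.
    pub opened_at: Instant,
    pub health_ok: AtomicBool,
    // m7p4 counter: present only with the `metrics` feature.
    #[cfg(feature = "metrics")]
    pub wal_lag_bytes: AtomicU64,
}

impl MetricsState {
    pub fn new() -> Self {
        Self {
            opened_at: Instant::now(),
            health_ok: AtomicBool::new(true),
            #[cfg(feature = "metrics")]
            wal_lag_bytes: AtomicU64::new(0),
        }
    }

    /// Always compiled; gated counters add lines only when the feature is on.
    pub fn render_prometheus(&self) -> String {
        let mut out = String::new();
        let healthy = u8::from(self.health_ok.load(Ordering::Relaxed));
        // `tidaldb_health_ok` is a hypothetical metric name.
        let _ = writeln!(out, "tidaldb_health_ok {healthy}");
        #[cfg(feature = "metrics")]
        {
            let lag = self.wal_lag_bytes.load(Ordering::Relaxed);
            let _ = writeln!(out, "tidaldb_wal_lag_bytes {lag}");
        }
        out
    }
}

impl Default for MetricsState {
    fn default() -> Self {
        Self::new()
    }
}
```

With this layout the method signature is identical in both configurations, so callers need no gates of their own.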

4. Zero-overhead verification

Add a compile-time check to CI:

# Verify the crate compiles without the metrics feature
cargo check --manifest-path tidal/Cargo.toml --no-default-features

# Verify no metrics-related symbols appear in the binary
cargo build --manifest-path tidal/Cargo.toml --release --no-default-features
# Inspect with nm or objdump for metrics symbols. Metric names such as
# tidaldb_wal_lag_bytes are more likely to show up as string data, so also check with strings.
# (This is a manual verification step documented in the task, not automated)
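
Part of the manual inspection could, in principle, be turned into a compile-time assertion. The sketch below is one possible approach and not something this task prescribes: it assumes a hypothetical `GatedCounters` struct that holds only gated fields and asserts that the struct is zero-sized in the feature-off build.

```rust
/// Hypothetical container holding only feature-gated counters.
pub struct GatedCounters {
    #[cfg(feature = "metrics")]
    pub signal_writes_total: std::sync::atomic::AtomicU64,
    #[cfg(feature = "metrics")]
    pub rate_limited_total: std::sync::atomic::AtomicU64,
}

// Evaluated during compilation of the feature-off build: if an ungated
// field is ever added to the struct, `cargo check --no-default-features`
// fails at this line.
#[cfg(not(feature = "metrics"))]
const _: () = assert!(std::mem::size_of::<GatedCounters>() == 0);
```

This only covers struct layout. It says nothing about gated call sites or string data, so the binary inspection remains useful.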

5. Conditional compilation pattern

Every instrumentation site follows this pattern:

// GOOD: zero overhead when feature is off
#[cfg(feature = "metrics")]
let start = Instant::now(); // compiled out together with the block below
// ... work ...
#[cfg(feature = "metrics")]
{
    let elapsed_us = start.elapsed().as_micros() as u64;
    self.metrics.signal_writes_total.fetch_add(1, Ordering::Relaxed);
    self.metrics.signal_write_latency.observe(elapsed_us);
}

// BAD: timing overhead even when feature is off
let start = Instant::now(); // <-- this runs unconditionally
// ... work ...
#[cfg(feature = "metrics")]
{
    self.metrics.signal_writes_total.fetch_add(1, Ordering::Relaxed);
}

The Instant::now() call must sit behind the same #[cfg(feature = "metrics")] gate as the counter updates. On hot paths (signal writes, query execution), even Instant::now() adds measurable overhead (~20ns on macOS).

Exception: QueryStats timing uses Instant::now() unconditionally because QueryStats is always populated. This is acceptable because query execution is not the write-path hot loop.
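
One way to keep the gating rule from being repeated at every call site is to wrap the clock read in a small helper that has no fields when the feature is off. This is a sketch under that assumption; `MetricTimer` and `timed` are hypothetical names and are not part of the existing codebase.

```rust
#[cfg(feature = "metrics")]
use std::time::Instant;

/// Hypothetical helper: stores a start time only in `metrics` builds.
pub struct MetricTimer {
    #[cfg(feature = "metrics")]
    start: Instant,
}

impl MetricTimer {
    /// Reads the clock only when `metrics` is enabled.
    #[inline]
    pub fn start() -> Self {
        Self {
            #[cfg(feature = "metrics")]
            start: Instant::now(),
        }
    }

    /// Elapsed microseconds in `metrics` builds.
    #[cfg(feature = "metrics")]
    #[inline]
    pub fn elapsed_us(&self) -> Option<u64> {
        Some(self.start.elapsed().as_micros() as u64)
    }

    /// Feature-off variant: no clock read, always `None`.
    #[cfg(not(feature = "metrics"))]
    #[inline]
    pub fn elapsed_us(&self) -> Option<u64> {
        None
    }
}

/// Runs `work`; in `metrics` builds, passes the elapsed time to `record`.
pub fn timed<T>(work: impl FnOnce() -> T, record: impl FnOnce(u64)) -> T {
    let timer = MetricTimer::start();
    let out = work();
    if let Some(us) = timer.elapsed_us() {
        record(us);
    }
    out
}
```

A call such as `timed(|| do_write(), |us| histogram.observe(us))` would then need no #[cfg] of its own, although whether the optimizer removes the dead `record` branch in every case is worth confirming with the binary inspection from section 4.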

6. Feature flag definition

Verify tidal/Cargo.toml has:

[features]
default = ["metrics"]
metrics = []

The metrics feature is enabled by default so that out-of-the-box usage includes observability. Embedders who need zero overhead opt out with default-features = false.

Acceptance Criteria

  • All m7p4 atomic counters gated behind #[cfg(feature = "metrics")]
  • LatencyHistogram struct and all its call sites gated
  • HTTP metrics server module gated (already done, verify)
  • Instant::now() for metric timing is inside #[cfg] blocks on write-path code
  • QueryStats is NOT gated (always available)
  • MetricsState.opened_at and MetricsState.health_ok are NOT gated
  • cargo check --manifest-path tidal/Cargo.toml --no-default-features succeeds
  • cargo test --manifest-path tidal/Cargo.toml --no-default-features --lib passes
  • No dead_code warnings when the metrics feature is disabled
  • cargo clippy -- -D warnings passes with and without the metrics feature
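
The dead_code and clippy criteria usually come down to private helpers that only metrics code calls. A sketch of two common ways to handle them follows; `bucket_index`, `micros_to_seconds`, `record_latency`, and the power-of-two bucket layout are all hypothetical.

```rust
// Called only from gated code, so it carries the same gate. Left ungated,
// it would trigger `dead_code` in the feature-off build.
#[cfg(feature = "metrics")]
fn bucket_index(elapsed_us: u64) -> usize {
    // Power-of-two buckets capped at index 15 (assumed layout).
    (64 - elapsed_us.leading_zeros() as usize).min(15)
}

// Kept in both builds; the lint is relaxed only in the configuration
// where the item may genuinely be unused.
#[cfg_attr(not(feature = "metrics"), allow(dead_code))]
fn micros_to_seconds(elapsed_us: u64) -> f64 {
    elapsed_us as f64 / 1_000_000.0
}

/// Returns the bucket that was updated, or `None` when metrics are off.
#[cfg(feature = "metrics")]
pub fn record_latency(elapsed_us: u64) -> Option<usize> {
    Some(bucket_index(elapsed_us))
}

/// Feature-off variant with the same signature, so callers need no gates.
#[cfg(not(feature = "metrics"))]
pub fn record_latency(_elapsed_us: u64) -> Option<usize> {
    None
}
```

Gating the helper is generally the stricter choice, since a scoped allow can hide code that has become dead for unrelated reasons.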

Test Strategy

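// The first two tests run in both feature configurations via the CI matrix;
// the third runs only when the metrics feature is enabled.
// Gated import: only the metrics-only test uses Ordering.
#[cfg(feature = "metrics")]
use std::sync::atomic::Ordering;
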
#[test]
fn query_stats_always_available() {
    // QueryStats is not feature-gated
    let stats = QueryStats::new("test".to_owned());
    assert_eq!(stats.profile_name, "test");
}

#[test]
fn metrics_state_base_always_available() {
    let state = MetricsState::new();
    assert!(state.uptime_seconds() >= 0.0);
    assert!((state.health_ok_value() - 1.0).abs() < f64::EPSILON);
}

#[cfg(feature = "metrics")]
#[test]
fn metrics_feature_counters_exist() {
    let state = MetricsState::new();
    state.signal_writes_total.fetch_add(1, Ordering::Relaxed);
    assert_eq!(state.signal_writes_total.load(Ordering::Relaxed), 1);
}

CI verification (add to pre-commit or CI script):

# Verify no-default-features compilation.
# Rely on cargo's exit status: grepping the output for "error" also matches
# crate names such as thiserror in the build log.
cargo check --manifest-path tidal/Cargo.toml --no-default-features || exit 1
cargo test --manifest-path tidal/Cargo.toml --no-default-features --lib || exit 1