Task 03: Index Health Metrics
Delivers
Prometheus gauges reporting the health of every secondary index: Tantivy segment count and indexed document count, USearch vector count and estimated byte size, and total bitmap index cardinality. These metrics let operators detect stale derived indexes, growing segment fragmentation, and index size anomalies before they affect query latency.
Complexity: M
Dependencies
- task-01 complete (establishes instrumentation pattern)
- `tidal/src/db/metrics.rs` -- `MetricsState` to extend
- `tidal/src/text/index.rs` -- `TextIndex` wraps Tantivy `Index` and `IndexReader`
- `tidal/src/storage/vector/registry.rs` -- `EmbeddingSlotRegistry` owns USearch indexes
- `tidal/src/storage/indexes/bitmap.rs` -- `BitmapIndex` with `RoaringBitmap` values
Technical Design
1. Add atomic gauges to MetricsState
In tidal/src/db/metrics.rs:
```rust
pub struct MetricsState {
    // ... existing + task-02 fields ...

    // ── Index health metrics (m7p4) ────────────────────────────────────
    /// Number of Tantivy segments for the items text index.
    #[cfg(feature = "metrics")]
    pub(crate) tantivy_segment_count: AtomicU64,
    /// Number of documents indexed in the items text index.
    #[cfg(feature = "metrics")]
    pub(crate) tantivy_indexed_docs: AtomicU64,
    /// Estimated byte size of the USearch indexes across all embedding slots
    /// (lower bound: raw vector storage only).
    #[cfg(feature = "metrics")]
    pub(crate) usearch_index_size_bytes: AtomicU64,
    /// Number of vectors stored in the USearch indexes.
    #[cfg(feature = "metrics")]
    pub(crate) usearch_vector_count: AtomicU64,
    /// Total cardinality across all bitmap index entries (category + format + creator + tag).
    #[cfg(feature = "metrics")]
    pub(crate) bitmap_index_cardinality: AtomicU64,
}
```
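Because each gauge is an independent `AtomicU64`, `Ordering::Relaxed` is enough: a scrape only needs each value to be one that some refresh stored, not a consistent snapshot across gauges. A minimal stdlib-only sketch of that writer/reader split, using a hypothetical `IndexGauges` struct with two of the fields in place of the real `MetricsState`:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the index-health fields of `MetricsState`.
#[derive(Default)]
pub struct IndexGauges {
    pub tantivy_segment_count: AtomicU64,
    pub tantivy_indexed_docs: AtomicU64,
}

impl IndexGauges {
    /// Writer side: called from the periodic refresh. Each store is independent,
    /// so a concurrent reader may see a new segment count with an old doc count.
    pub fn set_text_stats(&self, segments: usize, docs: u64) {
        self.tantivy_segment_count
            .store(segments as u64, Ordering::Relaxed);
        self.tantivy_indexed_docs.store(docs, Ordering::Relaxed);
    }

    /// Reader side: called when rendering `/metrics`.
    pub fn text_stats(&self) -> (u64, u64) {
        (
            self.tantivy_segment_count.load(Ordering::Relaxed),
            self.tantivy_indexed_docs.load(Ordering::Relaxed),
        )
    }
}
```

For monitoring gauges refreshed every few seconds, the occasional mismatched pair a render might observe is harmless, and `Relaxed` avoids any fencing cost.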
2. Expose index introspection methods
TextIndex
Add a method to `TextIndex` that returns the segment count and the indexed document count:
```rust
impl TextIndex {
    /// Return the number of segments and total indexed documents.
    ///
    /// Reads from the current `IndexReader` snapshot, so documents committed
    /// after the last reader reload are not counted yet. Thread-safe.
    #[must_use]
    pub fn index_stats(&self) -> (usize, u64) {
        let searcher = self.reader.searcher();
        let segment_count = searcher.segment_readers().len();
        // `num_docs()` is a per-segment u32 (alive docs only); widen before summing.
        let doc_count: u64 = searcher
            .segment_readers()
            .iter()
            .map(|r| u64::from(r.num_docs()))
            .sum();
        (segment_count, doc_count)
    }
}
```
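Tantivy's `SegmentReader::num_docs()` returns a `u32` counting alive documents, which is why `index_stats` widens each value with `u64::from` before summing. The same fold, sketched without a Tantivy dependency over a plain slice of per-segment counts (`index_stats_from_segments` is a hypothetical name):

```rust
/// Hypothetical helper mirroring `TextIndex::index_stats` over raw per-segment
/// doc counts (each a u32, as Tantivy reports them).
pub fn index_stats_from_segments(segment_doc_counts: &[u32]) -> (usize, u64) {
    let segment_count = segment_doc_counts.len();
    // Widen each u32 to u64 first, so the total cannot overflow u32.
    let doc_count: u64 = segment_doc_counts.iter().map(|&n| u64::from(n)).sum();
    (segment_count, doc_count)
}
```

An empty index yields `(0, 0)`; two segments of `u32::MAX` and `1` documents yield a total one past `u32::MAX`, which a `u32` accumulator could not represent.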
EmbeddingSlotRegistry
Add a method to report total vector count and estimated byte size:
```rust
impl EmbeddingSlotRegistry {
    /// Return the total vector count and estimated byte size across all slots.
    #[must_use]
    pub fn index_stats(&self) -> (u64, u64) {
        let mut total_vectors: u64 = 0;
        let mut total_bytes: u64 = 0;
        for slot in self.slots.values() {
            let count = slot.index.size() as u64;
            // Lower-bound estimate: count * dimensions * size_of(f16).
            // Covers raw vector storage only, not HNSW graph overhead.
            let dim = slot.dimensions as u64;
            let bytes = count * dim * 2; // f16 = 2 bytes
            total_vectors += count;
            total_bytes += bytes;
        }
        (total_vectors, total_bytes)
    }
}
```
If USearch provides a `serialized_length()` method, prefer it over the estimate. The estimate is a lower bound: it counts only the raw f16 vector payload and excludes HNSW graph overhead.
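To make the lower-bound estimate concrete: with f16 storage, one slot holding 1,000,000 vectors of 768 dimensions contributes 1,000,000 × 768 × 2 = 1,536,000,000 bytes (about 1.43 GiB) before any HNSW graph overhead. A sketch of the per-slot arithmetic with checked multiplication, using hypothetical helpers `estimate_vector_bytes` and `estimate_total_bytes` and taking the element width as a parameter (the design above assumes f16, i.e. 2 bytes):

```rust
/// Hypothetical helper: lower-bound byte estimate for one embedding slot.
/// Counts only raw vector payload; returns None if the product overflows u64.
pub fn estimate_vector_bytes(count: u64, dimensions: u64, bytes_per_scalar: u64) -> Option<u64> {
    count.checked_mul(dimensions)?.checked_mul(bytes_per_scalar)
}

/// Sum the estimate over `(count, dimensions)` pairs, one per slot.
/// Any per-slot overflow, or overflow of the total, saturates at u64::MAX.
pub fn estimate_total_bytes(slots: &[(u64, u64)], bytes_per_scalar: u64) -> u64 {
    slots
        .iter()
        .map(|&(count, dim)| {
            estimate_vector_bytes(count, dim, bytes_per_scalar).unwrap_or(u64::MAX)
        })
        .fold(0u64, u64::saturating_add)
}
```

Saturating instead of panicking keeps a metrics refresh from ever aborting the checkpoint thread, at the cost of a pinned value in a case that real index sizes will not reach.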
BitmapIndex
Add a method to report total cardinality:
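```rust
impl BitmapIndex {
    /// Sum of bitmap cardinalities across all entries.
    ///
    /// An entity ID that appears under several values (e.g. two tags) is
    /// counted once per value, so this is not a distinct-entity count.
    #[must_use]
    pub fn total_cardinality(&self) -> u64 {
        // `RoaringBitmap::len()` returns u64.
        self.entries.iter().map(|e| e.value().len()).sum()
    }
}
```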
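Because `total_cardinality()` sums per-value bitmap sizes, it grows with the number of (value, entity) pairs rather than with the number of distinct entities. The sketch below uses `BTreeMap<String, BTreeSet<u32>>` as a stdlib stand-in for the `RoaringBitmap`-backed entries (all names here are hypothetical) to contrast the two counts:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stdlib stand-in for a bitmap index: value -> set of entity IDs.
pub type ValueIndex = BTreeMap<String, BTreeSet<u32>>;

/// Record that `id` has attribute `value`.
pub fn insert(index: &mut ValueIndex, value: &str, id: u32) {
    index.entry(value.to_string()).or_default().insert(id);
}

/// What `total_cardinality()` reports: the sum of per-value set sizes.
pub fn total_cardinality(index: &ValueIndex) -> u64 {
    index.values().map(|ids| ids.len() as u64).sum()
}

/// For comparison: the number of distinct entities across all values.
pub fn distinct_entities(index: &ValueIndex) -> u64 {
    let all: BTreeSet<u32> = index.values().flat_map(|ids| ids.iter().copied()).collect();
    all.len() as u64
}
```

With `jazz = {1, 2}` and `rock = {1, 3}`, the summed cardinality is 4 while only 3 distinct entities exist; the gauge therefore tracks index size, not catalog size.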
3. Periodic metrics refresh
In the checkpoint thread or a dedicated metrics-refresh interval (reuse the pattern from task-02), collect index stats:
```rust
#[cfg(feature = "metrics")]
use std::sync::atomic::Ordering;

#[cfg(feature = "metrics")]
fn refresh_index_metrics(db: &TidalDb) {
    // Tantivy
    if let Some(text_index) = &db.text_index {
        let (segments, docs) = text_index.index_stats();
        db.metrics.tantivy_segment_count.store(segments as u64, Ordering::Relaxed);
        db.metrics.tantivy_indexed_docs.store(docs, Ordering::Relaxed);
    }
    // USearch (a poisoned lock skips this refresh; the gauges keep their last values)
    if let Ok(registry) = db.embedding_registry.read() {
        let (vectors, bytes) = registry.index_stats();
        db.metrics.usearch_vector_count.store(vectors, Ordering::Relaxed);
        db.metrics.usearch_index_size_bytes.store(bytes, Ordering::Relaxed);
    }
    // Bitmap indexes
    let cardinality = db.category_index.total_cardinality()
        + db.format_index.total_cardinality()
        + db.creator_index.total_cardinality()
        + db.tag_index.total_cardinality();
    db.metrics.bitmap_index_cardinality.store(cardinality, Ordering::Relaxed);
}
```
Call this function every 10 seconds from the checkpoint thread's periodic loop. Index stats are not on the hot path, so 10-second staleness is acceptable for monitoring.
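One way to drive the 10-second refresh while still shutting down promptly is to wait on a stop channel with a timeout instead of sleeping. A sketch under the assumption that the loop owns a refresh closure (in TidalDB that closure would wrap `refresh_index_metrics(&db)`; `spawn_refresh_loop` itself is hypothetical):

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical refresh loop: calls `refresh` once per `interval` until a stop
/// message arrives or the returned sender is dropped.
pub fn spawn_refresh_loop<F>(
    interval: Duration,
    mut refresh: F,
) -> (mpsc::Sender<()>, JoinHandle<()>)
where
    F: FnMut() + Send + 'static,
{
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || loop {
        match stop_rx.recv_timeout(interval) {
            // Interval elapsed with no stop signal: refresh the gauges.
            Err(RecvTimeoutError::Timeout) => refresh(),
            // Explicit stop, or every sender dropped: exit the loop.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
        }
    });
    (stop_tx, handle)
}
```

Since `recv_timeout` measures from the end of the previous refresh, the effective period is the interval plus the refresh time; that drift is irrelevant at 10-second granularity.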
4. Render in Prometheus format
Extend MetricsState::render_prometheus():
```rust
// Tantivy
write_gauge(
    &mut out,
    "tidaldb_tantivy_segment_count",
    "Number of Tantivy index segments",
    self.tantivy_segment_count.load(Ordering::Relaxed) as f64,
);
write_gauge(
    &mut out,
    "tidaldb_tantivy_indexed_docs",
    "Number of documents indexed in Tantivy",
    self.tantivy_indexed_docs.load(Ordering::Relaxed) as f64,
);
// USearch
write_gauge(
    &mut out,
    "tidaldb_usearch_index_size_bytes",
    "Estimated byte size of USearch vector indexes",
    self.usearch_index_size_bytes.load(Ordering::Relaxed) as f64,
);
write_gauge(
    &mut out,
    "tidaldb_usearch_vector_count",
    "Number of vectors stored in USearch indexes",
    self.usearch_vector_count.load(Ordering::Relaxed) as f64,
);
// Bitmap
write_gauge(
    &mut out,
    "tidaldb_bitmap_index_cardinality",
    "Total entity IDs across all bitmap indexes",
    self.bitmap_index_cardinality.load(Ordering::Relaxed) as f64,
);
```
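The body of `write_gauge` is not shown here. A minimal sketch of a compatible helper, assuming it emits the standard Prometheus text exposition format (a `# HELP` line, a `# TYPE` line, then the unlabeled sample), might look like this:

```rust
use std::fmt::Write;

/// Hypothetical `write_gauge` helper: appends one gauge in the Prometheus
/// text exposition format.
pub fn write_gauge(out: &mut String, name: &str, help: &str, value: f64) {
    // Writing into a String cannot fail, so the fmt::Result is discarded.
    let _ = writeln!(out, "# HELP {name} {help}");
    let _ = writeln!(out, "# TYPE {name} gauge");
    // f64's Display prints whole numbers without a trailing ".0" (12.0 -> "12").
    let _ = writeln!(out, "{name} {value}");
}
```

Help strings containing backslashes or newlines would need escaping under the exposition format; the five help strings used here contain neither.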
5. Metric names (string literals)
| Metric name | Type | Description |
|---|---|---|
| `tidaldb_tantivy_segment_count` | gauge | Number of Tantivy index segments |
| `tidaldb_tantivy_indexed_docs` | gauge | Number of documents indexed in Tantivy |
| `tidaldb_usearch_index_size_bytes` | gauge | Estimated byte size of USearch vector indexes |
| `tidaldb_usearch_vector_count` | gauge | Number of vectors stored in USearch indexes |
| `tidaldb_bitmap_index_cardinality` | gauge | Total entity IDs across all bitmap indexes |
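All five names stay within the Prometheus metric-name grammar `[a-zA-Z_:][a-zA-Z0-9_:]*`, and the byte gauge carries the conventional `_bytes` unit suffix. Keeping the literals in one constant (hypothetical `INDEX_HEALTH_METRICS`) lets rendering and tests share them; a dependency-free validity check might look like:

```rust
/// Hypothetical constant listing the five index-health metric names.
pub const INDEX_HEALTH_METRICS: [&str; 5] = [
    "tidaldb_tantivy_segment_count",
    "tidaldb_tantivy_indexed_docs",
    "tidaldb_usearch_index_size_bytes",
    "tidaldb_usearch_vector_count",
    "tidaldb_bitmap_index_cardinality",
];

/// Check a name against `[a-zA-Z_:][a-zA-Z0-9_:]*` without a regex crate.
pub fn is_valid_metric_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // First character: ASCII letter, underscore, or colon.
        Some(c) if c.is_ascii_alphabetic() || c == '_' || c == ':' => {}
        // Empty names and anything else (e.g. a leading digit) are invalid.
        _ => return false,
    }
    // Remaining characters may also be ASCII digits.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == ':')
}
```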
Acceptance Criteria
- `TextIndex::index_stats()` returns `(segment_count, doc_count)` correctly
- `EmbeddingSlotRegistry::index_stats()` returns `(vector_count, byte_size)`
- `BitmapIndex::total_cardinality()` sums across all entries
- `MetricsState` extended with 5 atomic gauges, all `#[cfg(feature = "metrics")]`
- Metrics refreshed periodically (every 10 seconds in checkpoint thread)
- `/metrics` endpoint renders all 5 new metrics in valid Prometheus format
- Metrics reflect actual index state after writes (verified in integration test)
- `cargo clippy -- -D warnings` and `cargo fmt --check` pass
Test Strategy
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn text_index_stats_empty() {
        let fields = vec![TextFieldDef {
            key: "title".into(),
            field_type: TextFieldType::Text,
        }];
        let idx = TextIndex::ephemeral(&fields).unwrap();
        let (segments, docs) = idx.index_stats();
        assert_eq!(docs, 0);
        // Tantivy may report 0 or 1 segments for an empty index
        assert!(segments <= 1);
    }

    #[test]
    fn bitmap_total_cardinality_empty() {
        let idx = BitmapIndex::new("test");
        assert_eq!(idx.total_cardinality(), 0);
    }

    #[test]
    fn bitmap_total_cardinality_after_inserts() {
        let idx = BitmapIndex::new("test");
        idx.insert("jazz", 1);
        idx.insert("jazz", 2);
        idx.insert("rock", 3);
        assert_eq!(idx.total_cardinality(), 3);
    }
}
```
Integration test:
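```rust
#[cfg(feature = "metrics")]
#[test]
fn index_metrics_reflect_writes() {
    use std::collections::HashMap;
    use std::sync::atomic::Ordering;

    let db = make_test_db_with_text_schema();
    // Write items with metadata (only `title` and `category` are set)
    for i in 0..10 {
        db.write_item_with_metadata(
            EntityId::new(i),
            &HashMap::from([
                ("title".to_string(), format!("Item {i}")),
                ("category".to_string(), "jazz".to_string()),
            ]),
        )
        .unwrap();
    }
    db.flush_text_index().unwrap();
    // Refresh explicitly rather than waiting for the 10-second checkpoint tick
    refresh_index_metrics(&db);

    let metrics = db.metrics();
    assert_eq!(metrics.tantivy_indexed_docs.load(Ordering::Relaxed), 10);
    assert_eq!(metrics.bitmap_index_cardinality.load(Ordering::Relaxed), 10);

    let prom = metrics.render_prometheus();
    assert!(prom.contains("tidaldb_tantivy_indexed_docs"));
    assert!(prom.contains("tidaldb_bitmap_index_cardinality"));
}
```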