# HTTP SLI Metrics Completion Guide

## Status: Layer 3 (HTTP SLI Metrics) - 5% Complete

**Completed:**
- ✅ Pattern established in `handlers/vote.rs` (reference implementation)
- ✅ Helper script created at `scripts/add_http_metrics.sh`

**Remaining:** the 27 handlers listed below need the same pattern applied (22 if the Aphoria feature is disabled)

## Reference Pattern (from vote.rs)

```rust
pub async fn handler_function(
    State(state): State,
    // ... other parameters
) -> Result<(StatusCode, Json)> {
    // 1. Start timing + increment request counter
    let start = std::time::Instant::now();
    metrics::counter!("stemedb_http_requests_total",
        "method" => "POST",
        "path" => "/v1/endpoint"
    ).increment(1);

    // 2. Handler logic (unchanged)
    // ...

    // 3. Capture result
    let result = Ok((StatusCode::OK, Json(response)));

    // 4. Track duration with status
    let status = match &result {
        Ok((s, _)) => s.as_u16(),
        Err(_) => 500,
    };
    // Label values must be owned or 'static, so pass the String itself
    // rather than a borrowed `as_str()` of a temporary.
    metrics::histogram!("stemedb_http_request_duration_seconds",
        "method" => "POST",
        "path" => "/v1/endpoint",
        "status" => status.to_string()
    ).record(start.elapsed().as_secs_f64());

    result
}
```

## Handlers Requiring Metrics

### Write Endpoints
- [ ] `handlers/supersession.rs::supersede` (POST /v1/supersede)
- [ ] `handlers/epoch.rs::create_epoch` (POST /v1/epoch)
- [ ] `handlers/source.rs::store_source` (POST /v1/source)

### Admin Endpoints
- [ ] `handlers/admin.rs::decay_trust_ranks` (POST /v1/admin/decay_trust_ranks)
- [ ] `handlers/escalation.rs::resolve_escalation` (POST /v1/admin/escalation/resolve)
- [ ] `handlers/gold_standard.rs::create_gold_standard` (POST /v1/gold_standard)
- [ ] `handlers/gold_standard.rs::remove_gold_standard` (DELETE /v1/gold_standard)
- [ ] `handlers/gold_standard.rs::verify_agent` (POST /v1/gold_standard/verify)
- [ ] `handlers/quarantine.rs::approve_quarantine` (POST /v1/admin/quarantine/approve)
- [ ] `handlers/quarantine.rs::reject_quarantine` (POST /v1/admin/quarantine/reject)
- [ ] `handlers/circuit_breaker.rs::reset_circuit` (POST /v1/admin/circuit_breaker/reset)
- [ ] `handlers/api_keys.rs::create_api_key` (POST /v1/admin/api_keys)
- [ ] `handlers/api_keys.rs::revoke_api_key` (DELETE /v1/admin/api_keys)
- [ ] `handlers/api_keys.rs::rotate_api_key` (POST /v1/admin/api_keys/rotate)
- [ ] `handlers/api_keys.rs::update_api_key` (PATCH /v1/admin/api_keys)

### Read Endpoints
- [ ] `handlers/audit.rs::list_audits` (GET /v1/audit)
- [ ] `handlers/audit.rs::get_audit` (GET /v1/audit/{id})
- [ ] `handlers/source.rs::get_provenance` (GET /v1/source/provenance)
- [ ] `handlers/concepts.rs::resolve_alias` (GET /v1/concepts/alias)
- [ ] `handlers/concepts.rs::list_aliases` (GET /v1/concepts/aliases)
- [ ] `handlers/concepts.rs::suggest_aliases` (GET /v1/concepts/suggest)
- [ ] `handlers/concepts.rs::parse_concept_path` (GET /v1/concepts/parse)

### Aphoria Endpoints (if feature enabled)
- [ ] `handlers/aphoria/policy.rs::bless` (POST /v1/aphoria/policy/bless)
- [ ] `handlers/aphoria/policy.rs::export_policy` (GET /v1/aphoria/policy/export)
- [ ] `handlers/aphoria/policy.rs::import_policy` (POST /v1/aphoria/policy/import)
- [ ] `handlers/aphoria/scan.rs::scan` (POST /v1/aphoria/scan)
- [ ] `handlers/aphoria/report.rs::push_observations` (POST /v1/aphoria/report)

## Completion Steps

1. **For each handler:**
   - Add `let start = std::time::Instant::now();` at function start
   - Add the `metrics::counter!` increment immediately after the timer starts
   - Bind the return value to a variable (`let result = Ok(...)`)
   - Extract the status and record the histogram before returning
   - Return `result`

2. **Verification:**
   ```bash
   # After making changes
   cargo build --workspace
   cargo run --bin stemedb-api &

   # Trigger endpoint
   curl -X POST http://localhost:18180/v1/vote -d '...'

   # Check metrics
   curl http://localhost:18180/metrics | grep stemedb_http_request_duration_seconds
   curl http://localhost:18180/metrics | grep stemedb_http_requests_total
   ```

3. **Estimated time:** ~2-3 hours for all remaining handlers

## Metrics Added

Once complete, these metrics will be available:
- `stemedb_http_requests_total{method,path}` (counter) - Total request count per endpoint
- `stemedb_http_request_duration_seconds{method,path,status}` (histogram) - Request latency distribution

## Next Steps After Completion

After Layer 3 is complete:
1. Verify all metrics appear in the `/metrics` endpoint
2. Create Grafana dashboards (Layer 5)
3. Configure Prometheus alerts (Layer 6)
4. Set up PagerDuty/Slack integration (Layer 7)
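
## Appendix: Recording Durations on Early Return (sketch)

The reference pattern records the histogram only when execution reaches the end of the handler. If the handler logic returns early, for example through `?`, that request is counted but its duration is never observed. One possible way to close this gap is a drop guard that records when it goes out of scope. The sketch below is not taken from `vote.rs`: `RequestTimer`, `Observation`, `parse_weight`, and `handle` are hypothetical names, the handler is synchronous for brevity, and an in-memory `Vec` stands in for `metrics::histogram!` so that the example depends only on the standard library.

```rust
use std::cell::RefCell;
use std::num::ParseIntError;
use std::time::Instant;

/// One recorded observation: (method, path, status, seconds).
/// Hypothetical stand-in for a histogram sample with labels.
type Observation = (&'static str, &'static str, u16, f64);

/// Hypothetical guard that records the elapsed time when dropped,
/// so early returns are measured as well as the normal path.
struct RequestTimer<'a> {
    start: Instant,
    method: &'static str,
    path: &'static str,
    /// Stays at 500 unless the handler reports another status.
    status: u16,
    /// Assumption: in a real handler this would be a metrics::histogram! call.
    sink: &'a RefCell<Vec<Observation>>,
}

impl<'a> RequestTimer<'a> {
    fn new(
        method: &'static str,
        path: &'static str,
        sink: &'a RefCell<Vec<Observation>>,
    ) -> Self {
        Self { start: Instant::now(), method, path, status: 500, sink }
    }

    fn set_status(&mut self, status: u16) {
        self.status = status;
    }
}

impl Drop for RequestTimer<'_> {
    fn drop(&mut self) {
        // Runs on every exit path, including an early return through `?`.
        let seconds = self.start.elapsed().as_secs_f64();
        self.sink
            .borrow_mut()
            .push((self.method, self.path, self.status, seconds));
    }
}

/// Hypothetical piece of handler logic that can fail.
fn parse_weight(raw: &str) -> Result<u32, ParseIntError> {
    raw.trim().parse()
}

/// Hypothetical handler body using the guard instead of manual steps 3 and 4.
fn handle(
    raw: &str,
    sink: &RefCell<Vec<Observation>>,
) -> Result<(u16, u32), ParseIntError> {
    let mut timer = RequestTimer::new("POST", "/v1/endpoint", sink);
    // On error the function returns here and the guard records status 500.
    let weight = parse_weight(raw)?;
    timer.set_status(200);
    Ok((200, weight))
}
```

Calling `handle("42", &sink)` and then `handle("not a number", &sink)` leaves two observations in `sink`, the first labelled 200 and the second 500. Whether a guard like this is preferable to the explicit four-step pattern is a judgment call: it removes the need to bind `result` in every handler, at the cost of hiding the recording step inside `Drop`.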