Query Audit Trail
Quick Ref: Every query is logged with provenance for incident investigation.
Status: ✅ Implemented (Phase 2)
The Problem
At 3am, production is broken: an agent deployed the wrong config. The SRE needs to know: What did the agent query? What result did it get? Which assertions contributed to that result?
Postgres query logs show SQL, not semantic meaning.
The Solution
Every query to the Episteme API is automatically logged with full provenance:
```rust
/// One audited query. Stored at `AUD:{query_id}` in the KV store.
pub struct QueryAudit {
    pub query_id: QueryId,                // Content-addressed hash of params + timestamp
    pub agent_id: Option<[u8; 32]>,       // From the X-Agent-Id header; None if absent
    pub timestamp: u64,                   // Unix time, seconds
    pub params: QueryParams,              // Subject, predicate, lifecycle, epoch, lens
    pub result_hash: Option<Hash>,        // Hash of the winning assertion, if any
    pub result_confidence: f32,
    pub contributing_assertions: Vec<ContributingAssertion>,
}

/// One assertion considered while answering the query.
pub struct ContributingAssertion {
    pub assertion_hash: Hash,
    pub weight: f32,                      // Influence on the result (1.0 for the winner)
    pub source_hash: Hash,                // Original evidence
    pub lifecycle: LifecycleStage,
}
```
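The `query_id` is a content-addressed hash of the params plus the timestamp, which makes IDs deterministic. A minimal sketch of one way to derive such an ID follows. It uses a hand-rolled 64-bit FNV-1a only so the example has no dependencies; the `Params` struct, the field encoding, and the choice of hash are all assumptions, not StemeDB's actual implementation (a real content address would normally use a cryptographic hash with a wider output).

```rust
/// Hypothetical, simplified stand-in for `QueryParams`.
pub struct Params {
    pub subject: String,
    pub predicate: String,
    pub lifecycle: Option<String>,
    pub lens: Option<String>,
}

/// 64-bit FNV-1a: a dependency-free stand-in for the real content hash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    hash
}

/// Derives a deterministic query ID from the params and the timestamp.
pub fn query_id(params: &Params, timestamp: u64) -> u64 {
    let mut buf = Vec::new();
    for field in [
        Some(params.subject.as_str()),
        Some(params.predicate.as_str()),
        params.lifecycle.as_deref(),
        params.lens.as_deref(),
    ] {
        match field {
            // Presence tag + length prefix keep the byte encoding unambiguous.
            Some(s) => {
                buf.push(1);
                buf.extend_from_slice(&(s.len() as u64).to_be_bytes());
                buf.extend_from_slice(s.as_bytes());
            }
            None => buf.push(0),
        }
    }
    buf.extend_from_slice(&timestamp.to_be_bytes());
    fnv1a(&buf)
}
```

Length-prefixing each field matters: without it, `subject = "ab", predicate = "c"` and `subject = "a", predicate = "bc"` would hash identical bytes and collide.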
Storage Layout
| Key Pattern | Value | Purpose |
|---|---|---|
| `AUD:{query_id}` | Serialized `QueryAudit` | Individual audit records |
| `AUDA:{agent_id}:{timestamp}:{query_id}` | Empty | Agent index for temporal queries |
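The `AUDA:` index exists so that "all audits by this agent in this time window" becomes one ordered range scan instead of a read over every `AUD:` record. The sketch below uses a `BTreeMap` as a stand-in for an ordered KV store. It assumes the agent ID is hex-encoded and the timestamp is zero-padded to 20 digits so lexicographic key order matches numeric time order; the real key encoding (which may well be binary) and the helper names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Primary record key: `AUD:{query_id}`.
pub fn audit_key(query_id: &str) -> String {
    format!("AUD:{query_id}")
}

/// Agent index key: `AUDA:{agent_id}:{timestamp}:{query_id}`.
/// Zero-padding to 20 digits (u64::MAX has 20) keeps string order == time order.
pub fn agent_index_key(agent_hex: &str, timestamp: u64, query_id: &str) -> String {
    format!("AUDA:{agent_hex}:{timestamp:020}:{query_id}")
}

/// Returns query IDs for `agent_hex` with `from <= timestamp <= to`,
/// in time order, using a single range scan over the index.
pub fn audits_for_agent(
    index: &BTreeMap<String, ()>,
    agent_hex: &str,
    from: u64,
    to: u64,
) -> Vec<String> {
    if from > to {
        return Vec::new(); // BTreeMap::range panics if start > end
    }
    let start = format!("AUDA:{agent_hex}:{from:020}:");
    // ';' sorts right after ':' in ASCII, so this bound includes every key at `to`.
    let end = format!("AUDA:{agent_hex}:{to:020};");
    index
        .range(start..end)
        .filter_map(|(key, _)| key.rsplit(':').next().map(str::to_owned))
        .collect()
}
```

Each hit yields a `query_id`, which is then a point lookup at `AUD:{query_id}` for the full record.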
API
List Query Audits
```
# List recent audits
GET /v1/audit/queries?limit=100

# Filter by agent and time range (from/to are Unix timestamps in seconds)
GET /v1/audit/queries?agent_id=<hex-encoded-pubkey>&from=1704067200&to=1704153600
```
Get Specific Audit
```
# Full reasoning trace for a single query
GET /v1/audit/query/{query_id}
```
Including Agent ID in Queries
To associate queries with an agent, include the X-Agent-Id header:
```bash
curl -H "X-Agent-Id: <hex-encoded-32-byte-pubkey>" \
  "http://localhost:18180/v1/query?subject=Tesla&predicate=revenue"
```
Response Format
```json
{
  "query_id": "a7f3a2b9c1d4e5f6...",
  "agent_id": "01020304...",
  "timestamp": 1704153600,
  "params": {
    "subject": "auth/jwt",
    "predicate": "signing_algorithm",
    "lifecycle": "Approved",
    "lens": "Authority"
  },
  "result_hash": "c1d2e3f4a5b6...",
  "result_confidence": 0.87,
  "contributing_assertions": [
    {
      "assertion_hash": "c1d2e3f4a5b6...",
      "weight": 1.0,
      "source_hash": "d2e3f4a5b6c7...",
      "lifecycle": "Approved"
    },
    {
      "assertion_hash": "e3f4a5b6c7d8...",
      "weight": 0.0,
      "source_hash": "f4a5b6c7d8e9...",
      "lifecycle": "Proposed"
    }
  ]
}
```
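During an incident, the first questions are usually which assertion won and what evidence each candidate came from. A sketch of that walk over an audit record, using plain structs that mirror the relevant JSON fields (hashes kept as hex strings; JSON parsing is left out to stay dependency-free). The struct and function names are hypothetical, and treating weight 1.0 as the winner follows the `weight` field comment.

```rust
/// Hypothetical mirror of one `contributing_assertions` entry.
pub struct Contributing {
    pub assertion_hash: String,
    pub weight: f32,
    pub source_hash: String,
}

/// Hypothetical mirror of the audit fields used here.
pub struct Audit {
    pub result_hash: Option<String>,
    pub contributing: Vec<Contributing>,
}

/// Returns (assertion_hash, source_hash) pairs, most influential first.
pub fn trace(audit: &Audit) -> Vec<(&str, &str)> {
    let mut items: Vec<&Contributing> = audit.contributing.iter().collect();
    // Descending by weight; total_cmp gives a total order even with NaN.
    items.sort_by(|a, b| b.weight.total_cmp(&a.weight));
    items
        .into_iter()
        .map(|c| (c.assertion_hash.as_str(), c.source_hash.as_str()))
        .collect()
}

/// Sanity check: the weight-1.0 assertion should be the one `result_hash` names.
pub fn winner_matches_result(audit: &Audit) -> bool {
    let winner = audit.contributing.iter().find(|c| c.weight == 1.0);
    match (winner, &audit.result_hash) {
        (Some(w), Some(r)) => &w.assertion_hash == r,
        (None, None) => true, // no result, no winner
        _ => false,
    }
}
```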
Implementation Details
- Query ID Generation: Content-addressed hash of params + timestamp for deterministic IDs
- Fire-and-Forget: Audit logging doesn't block the query response; failures are logged but don't fail queries
- Agent Index: Enables lookups by agent and time range via a prefix scan over `AUDA:{agent_id}:` keys, without reading every audit record
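The fire-and-forget behavior can be sketched with a channel and a background writer thread: the query path only enqueues the audit and returns, and the writer logs storage errors instead of propagating them. `std::sync::mpsc`, the closure-based `store`, and the trimmed `AuditRecord` are stand-ins chosen for this sketch; whether StemeDB uses a channel, an async task, or something else is not stated here.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical, trimmed-down audit record.
pub struct AuditRecord {
    pub query_id: String,
}

/// Spawns a writer that drains audits and persists each with `store`.
/// Errors are logged and counted; they never reach the query path.
pub fn spawn_audit_writer<F>(mut store: F) -> (Sender<AuditRecord>, JoinHandle<usize>)
where
    F: FnMut(&AuditRecord) -> Result<(), String> + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<AuditRecord>();
    let handle = thread::spawn(move || {
        let mut failures = 0;
        // The loop ends once every Sender has been dropped.
        for audit in rx {
            if let Err(e) = store(&audit) {
                failures += 1;
                eprintln!("audit write failed for {}: {e}", audit.query_id);
            }
        }
        failures
    });
    (tx, handle)
}

/// Query path: compute the answer, enqueue the audit, return immediately.
pub fn answer_query(audits: &Sender<AuditRecord>, query_id: &str) -> String {
    let answer = format!("result for {query_id}"); // stand-in for real evaluation
    // send() on an unbounded channel never blocks; if the writer is gone,
    // the error is ignored so the query still succeeds.
    let _ = audits.send(AuditRecord { query_id: query_id.to_string() });
    answer
}
```

An unbounded channel keeps the query path non-blocking at the cost of memory if storage stalls; a bounded channel with `try_send` would drop audits under pressure instead.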
Latency Requirements (from user research)
| Query Type | Target Latency |
|---|---|
| Point query (current) | < 100ms |
| Time-travel query | < 500ms |
| Audit trace | < 2s |
| Full provenance chain | < 5s |
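One way to keep these targets visible is to encode them as budgets that tests or benchmarks check against. The `QueryKind` enum and `timed` helper below are hypothetical; only the numbers come from the table.

```rust
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug)]
pub enum QueryKind {
    Point,
    TimeTravel,
    AuditTrace,
    ProvenanceChain,
}

impl QueryKind {
    /// Target latencies from the table.
    pub fn budget(self) -> Duration {
        match self {
            QueryKind::Point => Duration::from_millis(100),
            QueryKind::TimeTravel => Duration::from_millis(500),
            QueryKind::AuditTrace => Duration::from_secs(2),
            QueryKind::ProvenanceChain => Duration::from_secs(5),
        }
    }
}

/// Runs `f` and reports whether it finished strictly under the target
/// (the table's targets are strict bounds, e.g. "< 100ms").
pub fn timed<T>(kind: QueryKind, f: impl FnOnce() -> T) -> (T, bool) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed() < kind.budget())
}
```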
Crates
- Types: `stemedb_core::types::{QueryAudit, QueryParams, ContributingAssertion}`
- Storage: `stemedb_storage::{AuditStore, GenericAuditStore}`
- API Handlers: `stemedb_api::handlers::audit::{list_audits, get_audit}`
Origin
This feature emerged from SRE perspective interviews (see .claude/agents/perspective-oncall-sre.md). Core need: "I need to trace from agent decision → query → assertions in under 10 minutes."