# Query Audit Trail

> **Quick Ref:** Every query is logged with provenance for incident investigation
> **Status:** ✅ Implemented (Phase 2)

## The Problem

At 3am, production is broken. An agent deployed the wrong config. The SRE needs to know: What did the agent query? What result did it get? Which assertions contributed?

Postgres query logs show SQL, not semantic meaning.

## The Solution

Every query to the Episteme API is automatically logged with full provenance:

```rust
/// Stored at `AUD:{query_id}` in the KV store.
pub struct QueryAudit {
    pub query_id: QueryId,           // Content-addressed hash
    pub agent_id: Option<[u8; 32]>,  // From X-Agent-Id header
    pub timestamp: u64,
    pub params: QueryParams,         // Subject, predicate, lifecycle, epoch, lens
    pub result_hash: Option<Hash>,   // Hash of winning assertion
    pub result_confidence: f32,
    pub contributing_assertions: Vec<ContributingAssertion>,
}

pub struct ContributingAssertion {
    pub assertion_hash: Hash,
    pub weight: f32,        // How much it influenced result (1.0 for winner)
    pub source_hash: Hash,  // Original evidence
    pub lifecycle: LifecycleStage,
}
```

## Storage Layout

| Key Pattern | Value | Purpose |
|-------------|-------|---------|
| `AUD:{query_id}` | Serialized QueryAudit | Individual audit records |
| `AUDA:{agent_id}:{timestamp}:{query_id}` | Empty | Agent index for temporal queries |
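The two key patterns above could be built roughly as follows. This is a minimal sketch, not the actual `stemedb_storage` encoding: the helper names are hypothetical, `QueryId` is assumed to be 32 bytes, IDs are assumed to be hex-encoded, and the timestamp is assumed to be zero-padded to 20 digits so that lexicographic key order matches chronological order.

```rust
/// Hex-encode bytes in lowercase (helper for this sketch only).
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// `AUD:{query_id}`: key of the serialized audit record.
/// Assumption: the query ID is a 32-byte hash.
fn audit_key(query_id: &[u8; 32]) -> String {
    format!("AUD:{}", to_hex(query_id))
}

/// `AUDA:{agent_id}:{timestamp}:{query_id}`: index key stored with an empty value.
/// Assumption: zero-padding to 20 digits (the width of u64::MAX) keeps
/// keys for one agent sorted by time.
fn agent_index_key(agent_id: &[u8; 32], timestamp: u64, query_id: &[u8; 32]) -> String {
    format!(
        "AUDA:{}:{:020}:{}",
        to_hex(agent_id),
        timestamp,
        to_hex(query_id)
    )
}
```

Without fixed-width timestamps, a key for timestamp `1000` would sort before one for `999`, which would break time-range scans over the index.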

## API

### List Query Audits

```bash
# List recent audits
curl "http://localhost:3000/v1/audit/queries?limit=100"

# Filter by agent and time range
curl "http://localhost:3000/v1/audit/queries?agent_id=<hex-encoded-pubkey>&from=1704067200&to=1704153600"
```

### Get Specific Audit

```bash
# Full reasoning trace for a single query
curl "http://localhost:3000/v1/audit/query/<query_id>"
```

### Including Agent ID in Queries

To associate queries with an agent, include the `X-Agent-Id` header:

```bash
curl -H "X-Agent-Id: <hex-encoded-32-byte-pubkey>" \
     "http://localhost:3000/v1/query?subject=Tesla&predicate=revenue"
```
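On the server side, the header value has to become the `Option<[u8; 32]>` stored in `QueryAudit.agent_id`. The sketch below shows one way to decode it using only the standard library. The function name `parse_agent_id` is hypothetical, and whether the real handler rejects a malformed header or simply records `None` is not stated here; this version returns `None`.

```rust
/// Decode a hex-encoded 32-byte agent ID as carried in `X-Agent-Id`.
/// Returns `None` if the value is not exactly 64 hex characters.
/// (Hypothetical helper; not the actual stemedb_api handler code.)
fn parse_agent_id(header: &str) -> Option<[u8; 32]> {
    let s = header.trim();
    // Checking for hex digits up front also guarantees ASCII, so the
    // byte-offset slicing below cannot split a multi-byte character.
    if s.len() != 64 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&s[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}
```

The explicit hex-digit check matters because `u8::from_str_radix` accepts a leading `+`, so a pair such as `+1` would otherwise decode successfully.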

## Response Format

```json
{
  "query_id": "a7f3a2b9c1d4e5f6...",
  "agent_id": "01020304...",
  "timestamp": 1704153600,
  "params": {
    "subject": "auth/jwt",
    "predicate": "signing_algorithm",
    "lifecycle": "Approved",
    "lens": "Authority"
  },
  "result_hash": "b8c9d0e1f2a3...",
  "result_confidence": 0.87,
  "contributing_assertions": [
    {
      "assertion_hash": "c1d2e3f4a5b6...",
      "weight": 1.0,
      "source_hash": "d2e3f4a5b6c7...",
      "lifecycle": "Approved"
    },
    {
      "assertion_hash": "e3f4a5b6c7d8...",
      "weight": 0.0,
      "source_hash": "f4a5b6c7d8e9...",
      "lifecycle": "Proposed"
    }
  ]
}
```
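During an investigation, a useful first step is to order the contributing assertions by weight so the winner (weight `1.0`) comes first and the losing candidates follow. A minimal sketch, using a simplified stand-in for `ContributingAssertion` in which hashes and lifecycle are plain strings (an assumption made for illustration, not the real types):

```rust
// Simplified stand-in for the documented type; fields are strings here
// only to keep the sketch self-contained.
#[derive(Debug, Clone, PartialEq)]
struct ContributingAssertion {
    assertion_hash: String,
    weight: f32,
    source_hash: String,
    lifecycle: String,
}

/// Return references to the assertions ordered by descending weight,
/// so the winning assertion is first.
fn rank_by_weight(items: &[ContributingAssertion]) -> Vec<&ContributingAssertion> {
    let mut ranked: Vec<&ContributingAssertion> = items.iter().collect();
    // total_cmp gives a total order on f32, so sorting cannot panic on NaN.
    ranked.sort_by(|a, b| b.weight.total_cmp(&a.weight));
    ranked
}
```

Applied to the response above, the `Approved` assertion with weight `1.0` ranks ahead of the `Proposed` one with weight `0.0`.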

## Implementation Details

- **Query ID Generation:** Content-addressed hash of the query params plus the timestamp, so the same inputs always produce the same ID
- **Fire-and-Forget:** Audit logging doesn't block the query response; failures are logged but don't fail queries
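- **Agent Index:** Lookups by agent + time range are a single prefix/range scan over `AUDA:{agent_id}:` keys instead of a pass over every audit record (assuming timestamps are encoded so key order matches time order); cost scales with the number of matching entries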
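The agent index lookup can be illustrated with an ordered map standing in for the KV store. This is a sketch under stated assumptions, not the `AuditStore` implementation: the function name is hypothetical, and index keys are assumed to have the form `AUDA:{agent_hex}:{timestamp zero-padded to 20 digits}:{query_hex}`.

```rust
use std::collections::BTreeMap;

/// Return the query IDs logged for one agent with `from <= timestamp <= to`.
/// `index` stands in for the KV store; values are empty, as in the layout table.
fn queries_for_agent(
    index: &BTreeMap<String, ()>,
    agent_hex: &str,
    from: u64,
    to: u64,
) -> Vec<String> {
    // BTreeMap::range panics if start > end, so reject inverted ranges first.
    if from > to {
        return Vec::new();
    }
    let start = format!("AUDA:{}:{:020}:", agent_hex, from);
    // ';' is the ASCII character right after ':', so this exclusive upper
    // bound still includes every key whose timestamp equals `to`.
    let end = format!("AUDA:{}:{:020};", agent_hex, to);
    index
        .range(start..end)
        // The query ID is the last ':'-separated segment of the key.
        .filter_map(|(key, _)| key.rsplit(':').next().map(str::to_string))
        .collect()
}
```

The scan seeks to the first matching key and then walks only the entries in range, so the work is proportional to the number of results plus the seek, regardless of how many other agents have audit records.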

## Latency Requirements (from user research)

| Query Type | Target Latency |
|------------|---------------|
| Point query (current) | < 100ms |
| Time-travel query | < 500ms |
| Audit trace | < 2s |
| Full provenance chain | < 5s |

## Crates

- **Types:** `stemedb_core::types::{QueryAudit, QueryParams, ContributingAssertion}`
- **Storage:** `stemedb_storage::{AuditStore, GenericAuditStore}`
- **API Handlers:** `stemedb_api::handlers::audit::{list_audits, get_audit}`

## Origin

This feature emerged from SRE perspective interviews (see `.claude/agents/perspective-oncall-sre.md`). Core need: "I need to trace from agent decision → query → assertions in under 10 minutes."