stemedb/ai-lookup/features/query-audit.md
jordan 1ce4004807 feat: Complete Phase 2 (The Cortex) - query, lens, and API layers
This commit adds the read path (Cortex) to complement the write path (Spine):

## Crates
- stemedb-api: HTTP API with axum + utoipa OpenAPI
  - /v1/assert, /v1/query, /v1/epoch, /v1/skeptic, /v1/trace, /v1/audit
  - Metered endpoints with quota enforcement
  - Ed25519 signature verification
- stemedb-lens: Truth resolution lenses
  - RecencyLens, ConsensusLens, ConfidenceLens
  - VoteAwareConsensusLens (Ballot Box pattern)
  - TrustAwareAuthorityLens (The Hive pattern)
  - SkepticLens (conflict analysis)
  - EpochAwareLens (paradigm-safe queries)
- stemedb-query: Query engine with materialized views

## Storage Extensions
- VoteStore: Vote aggregation with cached counts
- TrustRankStore: Agent reputation with decay
- AuditStore: Query audit trail
- IndexStore: SP/P/S index structures
- SupersessionStore: Epoch supersession chains

## SDKs
- sdk/go/steme: Go HTTP client with Ed25519 signing
- sdk/go/adk: ADK-Go tools for AI agents

## Documentation
- Updated CLAUDE.md, architecture.md, roadmap.md
- New ai-lookup entries for all services
- Use case docs for consumer health intelligence
- Arena roadmap for simulation advancement

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 13:22:44 -07:00

127 lines
3.6 KiB
Markdown

# Query Audit Trail
> **Quick Ref:** Every query is logged with provenance for incident investigation
> **Status:** ✅ Implemented (Phase 2)
## The Problem
At 3am, production is broken. An agent deployed wrong config. The SRE needs to know: What did the agent query? What result did it get? What assertions contributed?
Postgres query logs show SQL, not semantic meaning.
## The Solution
Every query to the Episteme API is automatically logged with full provenance:
```rust
/// Stored at `AUD:{query_id}` in the KV store.
pub struct QueryAudit {
pub query_id: QueryId, // Content-addressed hash
pub agent_id: Option<[u8; 32]>, // From X-Agent-Id header
pub timestamp: u64,
pub params: QueryParams, // Subject, predicate, lifecycle, epoch, lens
pub result_hash: Option<Hash>, // Hash of winning assertion
pub result_confidence: f32,
pub contributing_assertions: Vec<ContributingAssertion>,
}
pub struct ContributingAssertion {
pub assertion_hash: Hash,
pub weight: f32, // How much it influenced result (1.0 for winner)
pub source_hash: Hash, // Original evidence
pub lifecycle: LifecycleStage,
}
```
## Storage Layout
| Key Pattern | Value | Purpose |
|-------------|-------|---------|
| `AUD:{query_id}` | Serialized QueryAudit | Individual audit records |
| `AUDA:{agent_id}:{timestamp}:{query_id}` | Empty | Agent index for temporal queries |
## API
### List Query Audits
```bash
# List recent audits
GET /v1/audit/queries?limit=100
# Filter by agent
GET /v1/audit/queries?agent_id=<hex-encoded-pubkey>&from=1704067200&to=1704153600
```
### Get Specific Audit
```bash
# Full reasoning trace for a single query
GET /v1/audit/query/{query_id}
```
### Including Agent ID in Queries
To associate queries with an agent, include the `X-Agent-Id` header:
```bash
curl -H "X-Agent-Id: <hex-encoded-32-byte-pubkey>" \
"http://localhost:3000/v1/query?subject=Tesla&predicate=revenue"
```
## Response Format
```json
{
"query_id": "a7f3a2b9c1d4e5f6...",
"agent_id": "01020304...",
"timestamp": 1704153600,
"params": {
"subject": "auth/jwt",
"predicate": "signing_algorithm",
"lifecycle": "Approved",
"lens": "Authority"
},
"result_hash": "b8c9d0e1f2a3...",
"result_confidence": 0.87,
"contributing_assertions": [
{
"assertion_hash": "c1d2e3f4a5b6...",
"weight": 1.0,
"source_hash": "d2e3f4a5b6c7...",
"lifecycle": "Approved"
},
{
"assertion_hash": "e3f4a5b6c7d8...",
"weight": 0.0,
"source_hash": "f4a5b6c7d8e9...",
"lifecycle": "Proposed"
}
]
}
```
## Implementation Details
- **Query ID Generation:** Content-addressed hash of params + timestamp for deterministic IDs
- **Fire-and-Forget:** Audit logging doesn't block the query response; failures are logged but don't fail queries
- **Agent Index:** Enables O(1) lookups by agent + time range via prefix scan
## Latency Requirements (from user research)
| Query Type | Target Latency |
|------------|---------------|
| Point query (current) | < 100ms |
| Time-travel query | < 500ms |
| Audit trace | < 2s |
| Full provenance chain | < 5s |
## Crates
- **Types:** `stemedb_core::types::{QueryAudit, QueryParams, ContributingAssertion}`
- **Storage:** `stemedb_storage::{AuditStore, GenericAuditStore}`
- **API Handlers:** `stemedb_api::handlers::audit::{list_audits, get_audit}`
## Origin
This feature emerged from SRE perspective interviews (see `.claude/agents/perspective-oncall-sre.md`). Core need: "I need to trace from agent decision query assertions in under 10 minutes."