Phase 1 delivers the complete durability and storage layer:
- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
and multi-cycle durability
New crates: stemedb-wal, stemedb-storage, stemedb-ingest
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# Query Audit Trail

> **Quick Ref:** Every query is logged with provenance for incident investigation

## The Problem

At 3am, production is broken. An agent deployed the wrong config. The SRE needs to know: What did the agent query? What result did it get? Which assertions contributed?

Postgres query logs show SQL, not semantic meaning.

## The Solution

```rust
struct QueryAudit {
    pub query_id: Hash,
    pub agent_id: AgentId,
    pub timestamp: u64,
    pub subject: EntityId,
    pub predicate: RelationId,
    pub lens: LensType,
    pub lifecycle_filter: Option<LifecycleStage>,
    pub result_hash: Hash,
    pub result_confidence: f32,
    pub contributing_assertions: Vec<ContributingAssertion>,
}

struct ContributingAssertion {
    pub assertion_hash: Hash,
    pub weight: f32,        // How much it influenced result
    pub source_hash: Hash,  // Original evidence
}
```

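During an investigation, the first question about a record like this is usually which assertion mattered most. The sketch below ranks contributing assertions by `weight`. It is illustrative only: the `Hash` alias is a placeholder for the project's real type, and `rank_by_weight` and `sample_assertions` are hypothetical helpers, not part of the documented API.

```rust
// Placeholder for the project's real `Hash` type, which is defined elsewhere.
type Hash = [u8; 32];

// Local copy of the struct so the sketch compiles on its own.
#[derive(Debug, Clone, PartialEq)]
struct ContributingAssertion {
    assertion_hash: Hash,
    weight: f32,
    source_hash: Hash,
}

/// Returns the assertions ordered from most to least influential.
/// (Hypothetical helper, not part of the documented API.)
fn rank_by_weight(assertions: &[ContributingAssertion]) -> Vec<&ContributingAssertion> {
    let mut ranked: Vec<&ContributingAssertion> = assertions.iter().collect();
    // `total_cmp` is a total order on f32, so a NaN weight cannot panic the sort.
    ranked.sort_by(|a, b| b.weight.total_cmp(&a.weight));
    ranked
}

/// Made-up sample data for illustration.
fn sample_assertions() -> Vec<ContributingAssertion> {
    vec![
        ContributingAssertion { assertion_hash: [1; 32], weight: 0.2, source_hash: [10; 32] },
        ContributingAssertion { assertion_hash: [2; 32], weight: 0.9, source_hash: [20; 32] },
        ContributingAssertion { assertion_hash: [3; 32], weight: 0.5, source_hash: [30; 32] },
    ]
}
```

Calling `rank_by_weight(&sample_assertions())` puts the assertion with weight 0.9 first, so an SRE can follow its `source_hash` to the original evidence.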
## API

```bash
# What queries did this agent run?
GET /audit/queries?agent=deployment-agent&from=2024-01-15T20:00:00Z

# Trace command for incident investigation
episteme trace --agent deployment-agent \
  --time "6 hours ago" \
  --subject "auth/*"
```

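The filtering implied by these two calls can be sketched in a few lines of Rust. This is a sketch under stated assumptions, not the actual implementation: `AuditRecord` is a simplified stand-in for `QueryAudit`, timestamps are assumed to be Unix seconds, and a trailing `*` in the subject pattern is assumed to mean "any suffix". The names `subject_matches` and `trace` are hypothetical.

```rust
// Simplified stand-in for `QueryAudit`, holding only the fields used for filtering.
struct AuditRecord {
    agent_id: String,
    timestamp: u64, // assumed to be Unix seconds
    subject: String,
}

/// Matches a subject against a pattern. A trailing `*` is treated as
/// "any suffix" (an assumption about the CLI's glob semantics).
fn subject_matches(pattern: &str, subject: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => subject.starts_with(prefix),
        None => subject == pattern,
    }
}

/// Returns the records for `agent` at or after `from` whose subject
/// matches `subject_pattern`.
fn trace<'a>(
    log: &'a [AuditRecord],
    agent: &str,
    from: u64,
    subject_pattern: &str,
) -> Vec<&'a AuditRecord> {
    log.iter()
        .filter(|r| {
            r.agent_id == agent
                && r.timestamp >= from
                && subject_matches(subject_pattern, &r.subject)
        })
        .collect()
}
```

A real service would presumably use an index on agent and time rather than a linear scan. The scan is only meant to make the query semantics concrete.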
## Response Format

```json
{
  "query_id": "q_7f3a2b...",
  "timestamp": "2024-01-15T21:03:47Z",
  "subject": "auth/jwt",
  "predicate": "signing_algorithm",
  "lens": "authority",
  "lifecycle_filter": null,
  "result": {
    "value": "ES256",
    "confidence": 0.87
  },
  "contributing_assertions": [
    {
      "hash": "rfc_2024_001...",
      "lifecycle": "Proposed",
      "weight": 0.9,
      "source": "security-rfc-2024.md"
    }
  ]
}
```

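In the response above, the result rests mostly on an assertion whose lifecycle is `Proposed`. One way to surface that during triage is to filter contributing assertions by lifecycle stage and weight, as in this minimal sketch. `AssertionSummary` mirrors the JSON fields shown above. The helper `heavy_contributors_in_stage` and the 0.5 threshold in the usage note are hypothetical.

```rust
// Mirrors one entry of `contributing_assertions` in the JSON response.
struct AssertionSummary {
    hash: String,
    lifecycle: String,
    weight: f32,
    source: String,
}

/// Returns the assertions in the given lifecycle stage whose weight is at
/// least `min_weight`. (Hypothetical helper for incident triage.)
fn heavy_contributors_in_stage<'a>(
    assertions: &'a [AssertionSummary],
    stage: &str,
    min_weight: f32,
) -> Vec<&'a AssertionSummary> {
    assertions
        .iter()
        .filter(|a| a.lifecycle == stage && a.weight >= min_weight)
        .collect()
}
```

With the example response, `heavy_contributors_in_stage(&assertions, "Proposed", 0.5)` returns the entry sourced from `security-rfc-2024.md`, which points the investigation at a proposal that carried a weight of 0.9.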
## Latency Requirements (from user research)

| Query Type | Target Latency |
|------------|----------------|
| Point query (current) | < 100ms |
| Time-travel query | < 500ms |
| Audit trace | < 2s |
| Full provenance chain | < 5s |

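These targets can be encoded as a small lookup so that a benchmark or health check can compare observed latencies against them. The sketch below is an assumption about how that might look. `QueryType`, `target_latency`, and `within_target` are hypothetical names, and only the durations come from the table.

```rust
use std::time::Duration;

// One variant per row of the latency table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QueryType {
    PointQuery,
    TimeTravel,
    AuditTrace,
    FullProvenance,
}

/// Target latency for each query type, taken from the table above.
fn target_latency(kind: QueryType) -> Duration {
    match kind {
        QueryType::PointQuery => Duration::from_millis(100),
        QueryType::TimeTravel => Duration::from_millis(500),
        QueryType::AuditTrace => Duration::from_secs(2),
        QueryType::FullProvenance => Duration::from_secs(5),
    }
}

/// True when the observed latency is strictly under the target,
/// matching the `<` in the table.
fn within_target(kind: QueryType, observed: Duration) -> bool {
    observed < target_latency(kind)
}
```

For example, `within_target(QueryType::AuditTrace, Duration::from_millis(1500))` is true, while a point query that takes exactly 100 ms misses its target.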
## Origin

This feature emerged from SRE perspective interviews (see `.claude/agents/perspective-oncall-sre.md`). Core need: "I need to trace from agent decision → query → assertions in under 10 minutes."