jordan 3cfaa1e1d3 feat: Complete Phase 1 (The Spine) - storage foundation

Phase 1 delivers the complete durability and storage layer:

- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
  fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
  aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
  and multi-cycle durability

New crates: stemedb-wal, stemedb-storage, stemedb-ingest

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-31 14:15:34 -07:00

2.0 KiB

Raw Blame History

Query Audit Trail

Quick Ref: Every query is logged with provenance for incident investigation

The Problem

At 3am, production is broken. An agent deployed wrong config. The SRE needs to know: What did the agent query? What result did it get? What assertions contributed?

Postgres query logs show SQL, not semantic meaning.

The Solution

struct QueryAudit {
    pub query_id: Hash,
    pub agent_id: AgentId,
    pub timestamp: u64,
    pub subject: EntityId,
    pub predicate: RelationId,
    pub lens: LensType,
    pub lifecycle_filter: Option<LifecycleStage>,
    pub result_hash: Hash,
    pub result_confidence: f32,
    pub contributing_assertions: Vec<ContributingAssertion>,
}

struct ContributingAssertion {
    pub assertion_hash: Hash,
    pub weight: f32,        // How much it influenced result
    pub source_hash: Hash,  // Original evidence
}

API

# What queries did this agent run?
GET /audit/queries?agent=deployment-agent&from=2024-01-15T20:00:00Z

# Trace command for incident investigation
episteme trace --agent deployment-agent \
    --time "6 hours ago" \
    --subject "auth/*"

Response Format

{
  "query_id": "q_7f3a2b...",
  "timestamp": "2024-01-15T21:03:47Z",
  "subject": "auth/jwt",
  "predicate": "signing_algorithm",
  "lens": "authority",
  "lifecycle_filter": null,
  "result": {
    "value": "ES256",
    "confidence": 0.87
  },
  "contributing_assertions": [
    {
      "hash": "rfc_2024_001...",
      "lifecycle": "Proposed",
      "weight": 0.9,
      "source": "security-rfc-2024.md"
    }
  ]
}

Latency Requirements (from user research)

Query Type	Target Latency
Point query (current)	< 100ms
Time-travel query	< 500ms
Audit trace	< 2s
Full provenance chain	< 5s

Origin

This feature emerged from SRE perspective interviews (see .claude/agents/perspective-oncall-sre.md). Core need: "I need to trace from agent decision → query → assertions in under 10 minutes."

2.0 KiB Raw Blame History