Phase 1 delivers the complete durability and storage layer:
- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
and multi-cycle durability
New crates: stemedb-wal, stemedb-storage, stemedb-ingest
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
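The WAL bullet above names three mechanisms:

- per-record checksums;
- an fsync before a write counts as durable;
- a seek to end-of-file on reopen, so new appends never overwrite old records.

The sketch below shows one way these can fit together using only the Rust standard library. It is not the stemedb-wal implementation. The following are all assumptions:

- the record layout: 8-byte length, 8-byte checksum, then the payload zero-padded to a multiple of 8;
- the function names;
- the FNV-1a checksum, used here only as a std-only stand-in for BLAKE3.

The padding keeps every record at an 8-byte file offset. That is one plausible reading of "8-byte aligned record headers".

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical header: 8-byte payload length + 8-byte checksum.
const HEADER_LEN: usize = 16;

/// FNV-1a (64-bit): a std-only stand-in for the BLAKE3 checksum the WAL uses.
fn checksum(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Round `n` up to the next multiple of 8 so each record starts 8-byte aligned.
fn pad8(n: usize) -> usize {
    (n + 7) & !7
}

/// Read a little-endian u64 at `at`, or None if fewer than 8 bytes remain.
fn read_u64(bytes: &[u8], at: usize) -> Option<u64> {
    let slice = bytes.get(at..at.checked_add(8)?)?;
    let mut arr = [0u8; 8];
    arr.copy_from_slice(slice);
    Some(u64::from_le_bytes(arr))
}

/// Open (or create) the journal and seek to EOF so appends never overwrite.
fn open_journal(path: &Path) -> io::Result<File> {
    let mut file = OpenOptions::new().read(true).write(true).create(true).open(path)?;
    file.seek(SeekFrom::End(0))?;
    Ok(file)
}

/// Append one record and fsync before returning, so Ok(()) means durable.
fn append(file: &mut File, payload: &[u8]) -> io::Result<()> {
    let mut buf = Vec::with_capacity(HEADER_LEN + pad8(payload.len()));
    buf.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    buf.extend_from_slice(&checksum(payload).to_le_bytes());
    buf.extend_from_slice(payload);
    buf.resize(HEADER_LEN + pad8(payload.len()), 0); // zero padding
    file.write_all(&buf)?;
    file.sync_all() // flush data and metadata to stable storage
}

/// Replay every intact record; stop at the first torn or corrupt one.
fn recover(path: &Path) -> io::Result<Vec<Vec<u8>>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    let mut records = Vec::new();
    let mut pos = 0usize;
    loop {
        let (len, sum) = match (read_u64(&bytes, pos), read_u64(&bytes, pos + 8)) {
            (Some(len), Some(sum)) => (len as usize, sum),
            _ => break, // not enough bytes left for a full header
        };
        let start = pos + HEADER_LEN;
        let end = match start.checked_add(len) {
            Some(end) => end,
            None => break,
        };
        match bytes.get(start..end) {
            Some(payload) if checksum(payload) == sum => records.push(payload.to_vec()),
            _ => break, // torn write or corrupted payload: ignore the tail
        }
        pos = start + pad8(len);
    }
    Ok(records)
}
```

An explicit seek mirrors the commit's wording. Opening with `.append(true)` instead would make every write land at EOF. On recovery, replay stops at the first record whose header is incomplete or whose checksum fails. A torn final write is therefore ignored rather than surfaced as data.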
StemeDB (Episteme) Project Context
Project Overview
StemeDB (Episteme) is a probabilistic, log-structured, content-addressed knowledge graph database designed as the "Cortex" for autonomous AI research agents. Unlike traditional databases that enforce a single mutable state, StemeDB preserves immutable history and resolves conflicting assertions at read-time using "Lenses."
It serves as the "Git for Truth," allowing agents to:
- Assert facts with cryptographic signatures and confidence scores.
- Vote on assertions to build consensus without lock contention.
- Fork reality to simulate "what-if" scenarios (Overlay Graphs).
- Resolve truth dynamically via lenses like Consensus, Authority, or Recency.
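Because conflicting assertions are all kept, "resolving truth" becomes a query-time choice of ranking rule. Below is a minimal sketch of that idea. It assumes a hypothetical `Assertion` record with `votes`, `confidence`, and `timestamp` fields, and a hypothetical `Lens` enum whose variants borrow the names above.

The scoring rules are illustrative guesses:

- Consensus: highest net votes wins.
- Authority: highest confidence among a trusted-agent list wins.
- Recency: latest timestamp wins.

The architecture plans lenses as WASM-based filters, not a Rust enum.

```rust
/// Hypothetical assertion: one agent's claim, kept forever once written.
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub value: String,
    pub agent: String,
    pub confidence: f64,
    pub timestamp: u64,
    pub votes: i64,
}

/// Hypothetical lenses named after the ones in the overview.
pub enum Lens<'a> {
    /// Highest net vote count wins.
    Consensus,
    /// Only trusted agents count; among them, highest confidence wins.
    Authority(&'a [&'a str]),
    /// Most recent assertion wins.
    Recency,
}

/// Pick a winner at read time without mutating or deleting any assertion.
pub fn resolve<'a>(assertions: &'a [Assertion], lens: &Lens) -> Option<&'a Assertion> {
    match lens {
        Lens::Consensus => assertions.iter().max_by_key(|a| a.votes),
        Lens::Authority(trusted) => assertions
            .iter()
            .filter(|a| trusted.contains(&a.agent.as_str()))
            .max_by(|a, b| a.confidence.total_cmp(&b.confidence)),
        Lens::Recency => assertions.iter().max_by_key(|a| a.timestamp),
    }
}
```

The same slice of assertions can yield a different winner under each lens. That is the property that lets the database avoid committing to a single mutable state.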
Tech Stack
- Language: Rust (2024 edition)
- Durability: stemedb-wal (Quarantine Pattern with fs2, blake3 checksums)
- Storage: stemedb-storage (sled embedded KV, abstracted via KVStore trait)
- Serialization: rkyv (Zero-copy deserialization for high performance)
- Ingestion: stemedb-ingest (Async background worker bridging WAL and Store)
- Simulation: stemedb-sim (Agent-based modeling to verify system behavior)
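The storage entries above, together with the Phase 1 notes, describe three things: sled hidden behind a `KVStore` trait, `scan_prefix` for range queries, and content-addressed keys of the form H:{hash}, V:{hash}, and E:{hash}. The sketch below imitates that shape with an in-memory `BTreeMap`, so it runs without sled.

Apart from the name `scan_prefix`, the following are hypothetical:

- the trait methods;
- the `MemStore` type;
- the `content_key` helper;
- the lowercase-hex encoding of the hash.

The single-letter prefixes are treated only as opaque namespaces.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical storage abstraction in the spirit of the `KVStore` trait.
pub trait KVStore {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    /// All (key, value) pairs whose key starts with `prefix`, in key order.
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
}

/// In-memory stand-in for the sled-backed store.
#[derive(Default)]
pub struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.map.insert(key, value);
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Seek to the first key >= prefix, then walk forward while it still matches.
        self.map
            .range::<[u8], _>((Bound::Included(prefix), Bound::Unbounded))
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Hypothetical helper: build a content-addressed key such as `H:ab01`.
pub fn content_key(kind: char, hash: &[u8]) -> Vec<u8> {
    let mut key = format!("{}:", kind);
    for byte in hash {
        key.push_str(&format!("{:02x}", byte));
    }
    key.into_bytes()
}
```

Keys are kept in sorted order, so every key sharing a prefix sits in one contiguous range. A prefix scan is then a seek plus a short forward walk. sled also keeps keys in lexicographic order, so the same reasoning carries over to the real backend.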
Architecture
The system follows a "Spine -> Lattice -> Cortex" architecture:
- The Spine (Durability):
  - Write-Ahead Log (WAL): Append-only log with strict fsync guarantees.
  - Ingestor: Background task that tails the WAL and indexes data.
  - KV Store: Persistent storage for assertions and indexes.
- The Lattice (Connectivity) - In Progress:
  - Ballot Box: High-velocity vote stream.
  - Materialized Views: Pre-computed truth states.
- The Cortex (Reasoning) - Planned:
  - Lenses: WASM-based filters for truth resolution.
  - SMT: Sparse Merkle Trees for efficient branching.
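The Spine's flow runs in three steps: append to the WAL, return once the record is durable, then let the Ingestor copy records into the KV store in the background. The sketch below models that with a thread and a `std::sync::mpsc` channel standing in for tailing the journal file. The real ingestor is async and reads the WAL directly.

The following are assumptions made to keep the example std-only:

- the `WalRecord` and `IngestState` types;
- the offset checkpoint;
- `DefaultHasher` as the content hash. Its output is not stable across Rust releases, so a persistent store would want a cryptographic hash such as BLAKE3.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};
use std::sync::mpsc::{self, Receiver};
use std::thread::{self, JoinHandle};

/// Hypothetical WAL record as seen by the ingestor: log offset plus payload.
pub struct WalRecord {
    pub offset: u64,
    pub payload: Vec<u8>,
}

/// State the worker returns once the WAL side closes the channel.
pub struct IngestState {
    /// Content-addressed index: hash of payload -> payload.
    pub store: BTreeMap<u64, Vec<u8>>,
    /// Highest WAL offset applied; a restart would resume after it.
    pub checkpoint: Option<u64>,
}

/// Stand-in content hash (DefaultHasher is not stable across Rust releases).
fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Spawn a background worker that drains records until every sender is dropped.
pub fn spawn_ingestor(rx: Receiver<WalRecord>) -> JoinHandle<IngestState> {
    thread::spawn(move || {
        let mut state = IngestState { store: BTreeMap::new(), checkpoint: None };
        for rec in rx {
            // Skip anything at or below the checkpoint so replays are idempotent.
            if state.checkpoint.map_or(false, |c| rec.offset <= c) {
                continue;
            }
            state.store.insert(content_hash(&rec.payload), rec.payload);
            state.checkpoint = Some(rec.offset);
        }
        state
    })
}
```

Skipping offsets at or below the checkpoint makes replay after a crash idempotent. Keying by content hash means a payload seen twice is stored once.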
Key Files & Directories
- stemedb/crates/
  - stemedb-core/: Core data structures (Assertion, Vote, Epoch) and types.
  - stemedb-wal/: Durability primitives (Journal, FsyncGuard, Record).
  - stemedb-storage/: Storage engine abstraction and sled implementation.
  - stemedb-ingest/: Async ingestion pipeline logic.
  - stemedb-sim/: "The Arena" simulation for end-to-end verification.
- architecture.md: Detailed system design and data flow.
- roadmap.md: Phased implementation plan and status.
- usage.md: Rust API usage guide and vision for agent interaction.
- Makefile: Build and quality automation.
Building and Running
The project uses a Makefile for common tasks:
- Build: make build (Compiles the workspace)
- Test: make test (Runs unit tests across all crates)
- Quality Check: make quality (Runs fmt, strict clippy linting, duplication checks, and tests)
- Run Simulation: cargo run -p stemedb-sim (Executes the spine verification simulation)
- Format: make fmt (Auto-formats code)
Development Conventions
- Strict Quality:
  - make quality must pass before committing.
  - No unwrap() or expect() in production code (enforced by clippy).
  - Zero warnings allowed.
  - Missing documentation is a hard error.
- Testing: Every crate must have unit tests. The stemedb-sim crate serves as the integration test suite.
- Architecture: Follow the "Defensive by Default" philosophy. Durability > Speed > Features.
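In practice, the "no `unwrap()` or `expect()`" rule means every fallible step returns a `Result` and propagates it with `?`. A minimal sketch follows. The `HeaderError` type, `parse_len`, and `MAX_RECORD` are hypothetical. `clippy::unwrap_used` and `clippy::expect_used` are real Clippy restriction lints. Whether StemeDB enables them per item as shown, crate-wide, or through a workspace `[lints]` table is an assumption.

```rust
use std::fmt;

/// Errors a header parser reports instead of panicking.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    /// Fewer bytes were available than the header needs.
    Truncated { needed: usize, got: usize },
    /// The declared length exceeds the hypothetical maximum.
    TooLarge(u64),
}

impl fmt::Display for HeaderError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            HeaderError::Truncated { needed, got } => {
                write!(f, "header truncated: needed {} bytes, got {}", needed, got)
            }
            HeaderError::TooLarge(len) => write!(f, "record length {} too large", len),
        }
    }
}

impl std::error::Error for HeaderError {}

/// Hypothetical upper bound on a single record's payload length.
pub const MAX_RECORD: u64 = 1 << 20;

/// Read a little-endian u64 length prefix without any panicking call.
#[deny(clippy::unwrap_used, clippy::expect_used)]
pub fn parse_len(bytes: &[u8]) -> Result<u64, HeaderError> {
    let head = bytes
        .get(..8)
        .ok_or(HeaderError::Truncated { needed: 8, got: bytes.len() })?;
    let mut arr = [0u8; 8];
    arr.copy_from_slice(head);
    let len = u64::from_le_bytes(arr);
    if len > MAX_RECORD {
        return Err(HeaderError::TooLarge(len));
    }
    Ok(len)
}
```

Every public item carries a doc comment. A missing-documentation lint set to deny would require this.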