# StemeDB (Episteme) Project Context

## Project Overview

**StemeDB (Episteme)** is a probabilistic, log-structured, content-addressed knowledge graph database designed as the "Cortex" for autonomous AI research agents. Unlike traditional databases that enforce a single mutable state, StemeDB preserves immutable history and resolves conflicting assertions at read-time using "Lenses."

It serves as the "Git for Truth," allowing agents to:

* **Assert** facts with cryptographic signatures and confidence scores.
* **Vote** on assertions to build consensus without lock contention.
* **Fork** reality to simulate "what-if" scenarios (Overlay Graphs).
* **Resolve** truth dynamically via lenses like Consensus, Authority, or Recency.

## Tech Stack

* **Language:** Rust (2024 edition)
* **Durability:** `stemedb-wal` (Quarantine Pattern with `fs2`, `blake3` checksums)
* **Storage:** `stemedb-storage` (Hybrid Store: `fjall` LSM-tree for writes, `redb` B-tree for reads)
* **Serialization:** `rkyv` (Zero-copy deserialization for high performance)
* **Ingestion:** `stemedb-ingest` (Async background worker bridging WAL and Store)
* **Simulation:** `stemedb-sim` (Agent-based modeling to verify system behavior)

## Architecture

The system follows a "Spine -> Lattice -> Cortex" architecture:

1. **The Spine (Durability):**
    * **Write-Ahead Log (WAL):** Append-only log with strict `fsync` guarantees.
    * **Ingestor:** Background task that tails the WAL and indexes data.
    * **KV Store:** Persistent storage for assertions and indexes.
2. **The Lattice (Connectivity) - *Implemented*:**
    * **Ballot Box:** High-velocity vote stream.
    * **Materialized Views:** Pre-computed truth states.
3. **The Cortex (Reasoning) - *Implemented*:**
    * **Lenses:** WASM-based filters for truth resolution (Consensus, Authority, Recency, etc.).
    * **SMT:** Sparse Merkle Trees for efficient branching.

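
The read-time resolution performed by lenses can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the StemeDB implementation: the real lenses are WASM-based, and the `Assertion` fields, the `Lens` trait, and its `resolve` method are hypothetical names chosen for this example (the actual types live in `stemedb-core` and `stemedb-lens` and may differ).

```rust
/// Hypothetical, simplified assertion record (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    /// What the assertion is about.
    pub subject: String,
    /// The asserted value.
    pub value: String,
    /// Logical time at which the assertion was made.
    pub timestamp: u64,
    /// Number of votes the assertion has received.
    pub votes: u32,
}

/// Hypothetical lens interface: choose one winner among conflicting
/// assertions without mutating or deleting any of them.
pub trait Lens {
    /// Returns the preferred assertion, or `None` if there are no candidates.
    fn resolve<'a>(&self, candidates: &'a [Assertion]) -> Option<&'a Assertion>;
}

/// Prefers the most recently made assertion.
pub struct Recency;

/// Prefers the assertion with the most votes.
pub struct Consensus;

impl Lens for Recency {
    fn resolve<'a>(&self, candidates: &'a [Assertion]) -> Option<&'a Assertion> {
        candidates.iter().max_by_key(|a| a.timestamp)
    }
}

impl Lens for Consensus {
    fn resolve<'a>(&self, candidates: &'a [Assertion]) -> Option<&'a Assertion> {
        candidates.iter().max_by_key(|a| a.votes)
    }
}
```

Given two conflicting assertions about the same subject, `Recency` returns the one with the larger `timestamp` and `Consensus` returns the one with more `votes`. Both read from the same immutable slice, so switching lenses changes the answer without rewriting history.
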
## Key Files & Directories

* `stemedb/`
    * `crates/`
        * `stemedb-core/`: Core data structures (`Assertion`, `Vote`, `Epoch`) and types.
        * `stemedb-wal/`: Durability primitives (`Journal`, `FsyncGuard`, `Record`).
        * `stemedb-storage/`: Storage engine abstraction and Hybrid Store implementation.
        * `stemedb-ingest/`: Async ingestion pipeline logic.
        * `stemedb-lens/`: Truth Lenses (`Recency`, `Consensus`, `Authority`, `Skeptic`).
        * `stemedb-sim/`: "The Arena" simulation for end-to-end verification.
    * `architecture.md`: Detailed system design and data flow.
    * `roadmap.md`: Phased implementation plan and status.
    * `docs/sdk/go-usage-guide.md`: Go SDK usage guide and patterns.
    * `Makefile`: Build and quality automation.

## Building and Running

The project uses a `Makefile` for common tasks:

* **Build:** `make build` (Compiles the workspace)
* **Test:** `make test` (Runs unit tests across all crates)
* **Quality Check:** `make quality` (Runs fmt, strict clippy linting, duplication checks, and tests)
* **Run Simulation:** `cargo run -p stemedb-sim` (Executes the spine verification simulation)
* **Format:** `make fmt` (Auto-formats code)

## Development Conventions

* **Strict Quality:** `make quality` must pass before committing.
    * No `unwrap()` or `expect()` in production code (enforced by clippy).
    * Zero warnings allowed.
    * Missing documentation is a hard error.
* **Testing:** Every crate must have unit tests. The `stemedb-sim` crate serves as the integration test suite.
* **Architecture:** Follow the "Defensive by Default" philosophy. Durability > Speed > Features.
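
In practice, the ban on `unwrap()` and `expect()` means failures are propagated as typed errors. The sketch below shows one way to write such code. `ConfidenceError` and `parse_confidence` are hypothetical names invented for this example and are not part of the StemeDB crates. Clippy provides the `clippy::unwrap_used` and `clippy::expect_used` lints for this kind of rule; that the project uses exactly those lints is an assumption.

```rust
use std::fmt;
use std::num::ParseFloatError;

/// Hypothetical error type for parsing a confidence score (illustration only).
#[derive(Debug, PartialEq)]
pub enum ConfidenceError {
    /// The input could not be parsed as a floating-point number.
    NotANumber(ParseFloatError),
    /// The number parsed but lies outside the closed range 0.0..=1.0.
    OutOfRange(f64),
}

impl fmt::Display for ConfidenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfidenceError::NotANumber(e) => write!(f, "confidence is not a number: {e}"),
            ConfidenceError::OutOfRange(v) => write!(f, "confidence {v} is outside 0.0..=1.0"),
        }
    }
}

impl std::error::Error for ConfidenceError {}

impl From<ParseFloatError> for ConfidenceError {
    fn from(e: ParseFloatError) -> Self {
        ConfidenceError::NotANumber(e)
    }
}

/// Parses a confidence score without panicking.
///
/// Parse failures are converted through `From` by the `?` operator, and
/// range violations are reported as `OutOfRange` instead of being unwrapped.
pub fn parse_confidence(raw: &str) -> Result<f64, ConfidenceError> {
    let value: f64 = raw.trim().parse()?;
    if (0.0..=1.0).contains(&value) {
        Ok(value)
    } else {
        Err(ConfidenceError::OutOfRange(value))
    }
}
```

Callers decide how to handle each failure case, and every public item carries a doc comment, which is what a "missing documentation is a hard error" policy requires.
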