stemedb/GEMINI.md
jordan 3cfaa1e1d3 feat: Complete Phase 1 (The Spine) - storage foundation
Phase 1 delivers the complete durability and storage layer:

- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
  fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
  aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
  and multi-cycle durability

New crates: stemedb-wal, stemedb-storage, stemedb-ingest

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 14:15:34 -07:00

StemeDB (Episteme) Project Context

Project Overview

StemeDB (Episteme) is a probabilistic, log-structured, content-addressed knowledge graph database designed as the "Cortex" for autonomous AI research agents. Unlike traditional databases that enforce a single mutable state, StemeDB preserves immutable history and resolves conflicting assertions at read time using "Lenses."

It serves as the "Git for Truth," allowing agents to:

  • Assert facts with cryptographic signatures and confidence scores.
  • Vote on assertions to build consensus without lock contention.
  • Fork reality to simulate "what-if" scenarios (Overlay Graphs).
  • Resolve truth dynamically via lenses like Consensus, Authority, or Recency.
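A lens can be thought of as a pure function from stored candidates (plus votes) to a single winner, leaving history untouched. The sketch below is illustrative only: `Candidate`, `Vote`, and the `Lens` trait are hypothetical names, not the stemedb-core API, and the planned lenses are WASM-based, so the real interface will differ. Here Consensus is simplified to a raw vote count and Authority to a per-author trust weight.

```rust
use std::collections::HashMap;

/// Hypothetical candidate value for one (subject, predicate) slot.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub object: String,
    pub author: String,
    pub epoch: u64,
}

/// Hypothetical vote: an agent backing one candidate object.
#[derive(Debug, Clone)]
pub struct Vote {
    pub voter: String,
    pub object: String,
}

/// Picks a winner from conflicting candidates at read time; stored data is never mutated.
pub trait Lens {
    fn resolve<'a>(&self, candidates: &'a [Candidate], votes: &[Vote]) -> Option<&'a Candidate>;
}

/// Recency: the candidate with the newest epoch wins.
pub struct Recency;

impl Lens for Recency {
    fn resolve<'a>(&self, candidates: &'a [Candidate], _votes: &[Vote]) -> Option<&'a Candidate> {
        candidates.iter().max_by_key(|c| c.epoch)
    }
}

/// Authority: the candidate whose author has the highest trust weight wins.
pub struct Authority {
    pub trust: HashMap<String, f64>,
}

impl Lens for Authority {
    fn resolve<'a>(&self, candidates: &'a [Candidate], _votes: &[Vote]) -> Option<&'a Candidate> {
        // Authors missing from the trust table get weight 0.0.
        let weight = |c: &Candidate| self.trust.get(&c.author).copied().unwrap_or(0.0);
        candidates.iter().max_by(|a, b| weight(*a).total_cmp(&weight(*b)))
    }
}

/// Consensus (simplified to a raw vote count): the most-voted candidate wins.
pub struct Consensus;

impl Lens for Consensus {
    fn resolve<'a>(&self, candidates: &'a [Candidate], votes: &[Vote]) -> Option<&'a Candidate> {
        let tally = |c: &Candidate| votes.iter().filter(|v| v.object == c.object).count();
        candidates.iter().max_by_key(|c| tally(*c))
    }
}
```

Because lenses only read, two agents can resolve the same history differently without coordinating with each other.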

Tech Stack

  • Language: Rust (2024 edition)
  • Durability: stemedb-wal (Quarantine Pattern, built on the fs2 crate, with BLAKE3 checksums)
  • Storage: stemedb-storage (sled embedded KV, abstracted via KVStore trait)
  • Serialization: rkyv (Zero-copy deserialization for high performance)
  • Ingestion: stemedb-ingest (Async background worker bridging the WAL and the KV store)
  • Simulation: stemedb-sim (Agent-based modeling to verify system behavior)
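Abstracting storage behind a `KVStore` trait lets the sled backend be swapped or mocked. Below is one hypothetical shape for such a trait with an in-memory `BTreeMap` implementation. Only the `scan_prefix` name comes from the Phase 1 commit; `put`, `get`, and every signature are assumptions, and a disk-backed version would more likely return `Result` so I/O errors can surface.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Hypothetical storage abstraction; the real stemedb-storage trait may differ.
pub trait KVStore {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    /// All (key, value) pairs whose key starts with `prefix`, in ascending key order.
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
}

/// In-memory backend, e.g. for unit tests; a sled-backed type would implement the same trait.
#[derive(Default)]
pub struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Keys are sorted, so start at `prefix` and stop at the first key outside it.
        self.map
            .range::<[u8], _>((Bound::Included(prefix), Bound::Unbounded))
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}
```

Ordered keys are what make prefix scans cheap: a range query that stops at the first non-matching key, rather than a full scan.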

Architecture

The system follows a "Spine -> Lattice -> Cortex" architecture:

  1. The Spine (Durability):

    • Write-Ahead Log (WAL): Append-only log with strict fsync guarantees.
    • Ingestor: Background task that tails the WAL and indexes data.
    • KV Store: Persistent storage for assertions and indexes.
  2. The Lattice (Connectivity) - In Progress:

    • Ballot Box: High-velocity vote stream.
    • Materialized Views: Pre-computed truth states.
  3. The Cortex (Reasoning) - Planned:

    • Lenses: WASM-based filters for truth resolution.
    • SMT: Sparse Merkle Trees for efficient branching.
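The Spine's contract (append, fsync, then acknowledge) and its crash recovery can be illustrated with a std-only sketch. The frame layout `[len: u32 LE][checksum: u64 LE][payload]` is an assumption, not the stemedb-wal format, and FNV-1a replaces BLAKE3 only so the example compiles without external crates. It catches torn writes in this demo but is not a cryptographic checksum. Recovery replays frames until the first one that is truncated or fails its checksum.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Write};
use std::path::Path;

/// FNV-1a 64-bit. STAND-IN for BLAKE3; not collision-resistant.
fn checksum(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Appends one framed record and returns only after fsync (the durability point).
pub fn append(path: &Path, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "record too large"))?;
    // Append mode positions every write at end-of-file, even after a reopen.
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    let mut frame = Vec::with_capacity(12 + payload.len());
    frame.extend_from_slice(&len.to_le_bytes());
    frame.extend_from_slice(&checksum(payload).to_le_bytes());
    frame.extend_from_slice(payload);
    file.write_all(&frame)?;
    file.sync_all()
}

/// Replays intact records, stopping at the first torn or corrupt frame.
pub fn recover(path: &Path) -> io::Result<Vec<Vec<u8>>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    let mut records = Vec::new();
    let mut pos = 0;
    // `get` returns None when fewer than 12 header bytes remain.
    while let Some(header) = bytes.get(pos..pos + 12) {
        let len = u32::from_le_bytes([header[0], header[1], header[2], header[3]]) as usize;
        let mut sum = [0u8; 8];
        sum.copy_from_slice(&header[4..12]);
        // A header promising more bytes than exist means a crash mid-write.
        let Some(payload) = bytes.get(pos + 12..pos + 12 + len) else {
            break;
        };
        if checksum(payload) != u64::from_le_bytes(sum) {
            break;
        }
        records.push(payload.to_vec());
        pos += 12 + len;
    }
    Ok(records)
}
```

A production WAL would also truncate the torn tail before appending again and fsync the parent directory after creating the file, so the new directory entry itself is durable.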

Key Files & Directories

  • stemedb/
    • crates/
      • stemedb-core/: Core data structures (Assertion, Vote, Epoch) and types.
      • stemedb-wal/: Durability primitives (Journal, FsyncGuard, Record).
      • stemedb-storage/: Storage engine abstraction and sled implementation.
      • stemedb-ingest/: Async ingestion pipeline logic.
      • stemedb-sim/: "The Arena" simulation for end-to-end verification.
    • architecture.md: Detailed system design and data flow.
    • roadmap.md: Phased implementation plan and status.
    • usage.md: Rust API usage guide and vision for agent interaction.
    • Makefile: Build and quality automation.
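The content-addressed, append-only model behind stemedb-core's `Assertion` can be pictured with a minimal sketch. The three-field `Assertion`, the `History` type, and `content_key` are hypothetical, and std's `DefaultHasher` stands in for BLAKE3 only to avoid external crates. It is not collision-resistant and must not be used for real content addressing. Identical assertions collapse to one key, while conflicting ones coexist until a lens resolves them.

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical assertion; the real type also carries signatures, confidence, and an epoch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Assertion {
    pub subject: String,
    pub predicate: String,
    pub object: String,
}

/// Content key. STAND-IN ONLY: DefaultHasher is not cryptographic; StemeDB uses BLAKE3.
fn content_key(a: &Assertion) -> String {
    let mut hasher = DefaultHasher::new();
    a.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Append-only history keyed by content: entries are never overwritten or removed.
#[derive(Default)]
pub struct History {
    by_key: BTreeMap<String, Assertion>,
}

impl History {
    /// Records an assertion and returns its key; identical content yields the same key.
    pub fn assert(&mut self, a: Assertion) -> String {
        let key = content_key(&a);
        // `or_insert` keeps the existing entry, so re-asserting is idempotent.
        self.by_key.entry(key.clone()).or_insert(a);
        key
    }

    /// Every stored object for (subject, predicate); conflicts coexist.
    pub fn candidates(&self, subject: &str, predicate: &str) -> Vec<&Assertion> {
        self.by_key
            .values()
            .filter(|a| a.subject == subject && a.predicate == predicate)
            .collect()
    }

    /// Number of distinct assertions stored.
    pub fn len(&self) -> usize {
        self.by_key.len()
    }

    /// True when nothing has been asserted yet.
    pub fn is_empty(&self) -> bool {
        self.by_key.is_empty()
    }
}
```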

Building and Running

The project uses a Makefile for common tasks:

  • Build: make build (Compiles the workspace)
  • Test: make test (Runs unit tests across all crates)
  • Quality Check: make quality (Runs fmt, strict clippy linting, duplication checks, and tests)
  • Run Simulation: cargo run -p stemedb-sim (Executes the spine verification simulation)
  • Format: make fmt (Auto-formats code)

Development Conventions

  • Strict Quality: make quality must pass before committing.
    • No unwrap() or expect() in production code (enforced by clippy).
    • Zero warnings allowed.
    • Missing documentation is a hard error.
  • Testing: Every crate must have unit tests. The stemedb-sim crate serves as the integration test suite.
  • Architecture: Follow the "Defensive by Default" philosophy. Durability > Speed > Features.
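The "no unwrap() or expect()" rule can be expressed locally with clippy's `unwrap_used` and `expect_used` restriction lints; how `make quality` actually configures them is not shown here. The sketch below uses a hypothetical `Confidence` newtype and `CoreError` enum (not taken from stemedb-core) to validate input and propagate failures with `?` instead of panicking.

```rust
/// Hypothetical core-types module; not the real stemedb-core API.
mod core_types {
    // Any unwrap()/expect() in this module becomes a hard error under clippy.
    #![deny(clippy::unwrap_used, clippy::expect_used)]

    use std::fmt;

    /// Errors returned instead of panicking.
    #[derive(Debug, PartialEq)]
    pub enum CoreError {
        /// Confidence outside [0, 1], or NaN.
        ConfidenceOutOfRange(f64),
    }

    impl fmt::Display for CoreError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                CoreError::ConfidenceOutOfRange(v) => write!(f, "confidence {v} is outside [0, 1]"),
            }
        }
    }

    impl std::error::Error for CoreError {}

    /// A confidence score guaranteed to lie in [0, 1].
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Confidence(f64);

    impl Confidence {
        /// Validates the raw value; NaN fails the range check and is rejected.
        pub fn new(value: f64) -> Result<Self, CoreError> {
            if (0.0..=1.0).contains(&value) {
                Ok(Self(value))
            } else {
                Err(CoreError::ConfidenceOutOfRange(value))
            }
        }

        /// The underlying score.
        pub fn get(self) -> f64 {
            self.0
        }
    }

    /// Mean of raw scores; the first invalid score is propagated with `?`.
    pub fn mean_confidence(raw: &[f64]) -> Result<Option<Confidence>, CoreError> {
        if raw.is_empty() {
            return Ok(None);
        }
        let mut total = 0.0;
        for &v in raw {
            total += Confidence::new(v)?.get();
        }
        Confidence::new(total / raw.len() as f64).map(Some)
    }
}
```

Making invalid states unrepresentable at construction time keeps callers from needing `unwrap()` later, which fits the "Defensive by Default" ordering of durability before speed.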