stemedb/GEMINI.md
jml 9bfa626203 docs: reorganize documentation structure for clarity
Major documentation restructure to improve discoverability and reduce duplication.

## Changes

**Deleted (Archived/Consolidated)**:
- Removed duplicate getting started guides
- Archived outdated planning documents
- Consolidated corpus and configuration docs
- Removed obsolete vision/spec files (superseded by vision.md)
- Cleaned up scrapyard and old PDFs

**New Structure**:
- docs/about/ - Project overview and introduction
- docs/guides/ - User guides (moved from root)
- docs/specs/ - Technical specifications
- docs/sdk/ - SDK documentation (Go)
- docs/references/ - API references
- docs/archive/ - Archived historical docs
- applications/aphoria/docs/advanced/ - Advanced topics
- applications/aphoria/docs/reference/ - CLI reference
- applications/aphoria/docs/archive/ - Archived aphoria docs

**Updated**:
- README.md - New root README with clear navigation
- CONTRIBUTING.md - Contribution guidelines
- CLAUDE.md - Updated paths to new structure
- roadmap.md - Added recent completions

## Files Changed
- 57 files changed
- 1,977 insertions(+)
- 961 deletions(-)

**Net change**: +1,016 lines (added CONTRIBUTING.md, README.md, reorganized content)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 07:33:40 +00:00

StemeDB (Episteme) Project Context

Project Overview

StemeDB (Episteme) is a probabilistic, log-structured, content-addressed knowledge graph database designed as the "Cortex" for autonomous AI research agents. Unlike traditional databases that enforce a single mutable state, StemeDB preserves immutable history and resolves conflicting assertions at read-time using "Lenses."

It serves as the "Git for Truth," allowing agents to:

  • Assert facts with cryptographic signatures and confidence scores.
  • Vote on assertions to build consensus without lock contention.
  • Fork reality to simulate "what-if" scenarios (Overlay Graphs).
  • Resolve truth dynamically via lenses like Consensus, Authority, or Recency.
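
As a rough illustration of read-time resolution, the sketch below keeps every conflicting assertion and lets the caller pick a lens when querying. The `Assertion` fields, the `Lens` enum, and the `resolve` function are hypothetical simplifications for this example (signatures are omitted, and votes are reduced to a counter); the real types live in stemedb-core and may look quite different.

```rust
/// Hypothetical, simplified assertion. Not the stemedb-core definition.
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub subject: String,
    pub value: String,
    pub agent: String,
    pub confidence: f64,
    pub timestamp: u64,
    pub votes: u32,
}

/// Read-time resolution strategies named in the overview (assumed shapes).
pub enum Lens {
    /// Prefer the assertion with the most votes.
    Consensus,
    /// Prefer assertions from trusted agents, most trusted first.
    Authority(Vec<String>),
    /// Prefer the assertion with the latest timestamp.
    Recency,
}

/// Picks one assertion about `subject` without mutating or deleting the rest.
pub fn resolve<'a>(log: &'a [Assertion], subject: &str, lens: &Lens) -> Option<&'a Assertion> {
    let candidates = log.iter().filter(|a| a.subject == subject);
    match lens {
        Lens::Consensus => candidates.max_by_key(|a| a.votes),
        Lens::Recency => candidates.max_by_key(|a| a.timestamp),
        Lens::Authority(ranked) => candidates
            // Keep only assertions from ranked agents, paired with their rank.
            .filter_map(|a| ranked.iter().position(|r| *r == a.agent).map(|rank| (rank, a)))
            // A lower rank index means a more trusted agent.
            .min_by_key(|(rank, _)| *rank)
            .map(|(_, a)| a),
    }
}
```

Because the log itself is never rewritten, two readers can apply different lenses to the same history and legitimately get different answers.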

Tech Stack

  • Language: Rust (2024 edition)
  • Durability: stemedb-wal (Quarantine Pattern with fs2, blake3 checksums)
  • Storage: stemedb-storage (sled embedded KV, abstracted via KVStore trait)
  • Serialization: rkyv (Zero-copy deserialization for high performance)
  • Ingestion: stemedb-ingest (Async background worker bridging WAL and Store)
  • Simulation: stemedb-sim (Agent-based modeling to verify system behavior)
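
To show why a storage trait is useful here, the following sketch defines a small key-value abstraction and an in-memory implementation that could stand in for sled during tests. The method names (`put`, `get`, `scan_prefix`) and the `MemStore` type are assumptions made for illustration; the actual KVStore trait in stemedb-storage may expose a different surface.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of a key-value abstraction. The real KVStore trait may differ.
pub trait KVStore {
    /// Stores `value` under `key`, replacing any previous value.
    fn put(&mut self, key: &[u8], value: &[u8]);
    /// Returns a copy of the value stored under `key`, if any.
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    /// Returns all entries whose key starts with `prefix`, in key order.
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
}

/// In-memory backend backed by an ordered map (illustrative only).
#[derive(Default)]
pub struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Keys are sorted, so matching entries form one contiguous run
        // starting at the first key >= prefix.
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Code written against the trait works with any backend.
pub fn count_with_prefix<S: KVStore>(store: &S, prefix: &[u8]) -> usize {
    store.scan_prefix(prefix).len()
}
```

Callers that depend only on the trait, like `count_with_prefix`, do not need to change if the backing engine is swapped.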

Architecture

The system follows a "Spine -> Lattice -> Cortex" architecture:

  1. The Spine (Durability):

    • Write-Ahead Log (WAL): Append-only log with strict fsync guarantees.
    • Ingestor: Background task that tails the WAL and indexes data.
    • KV Store: Persistent storage for assertions and indexes.
  2. The Lattice (Connectivity) - In Progress:

    • Ballot Box: High-velocity vote stream.
    • Materialized Views: Pre-computed truth states.
  3. The Cortex (Reasoning) - Planned:

    • Lenses: WASM-based filters for truth resolution.
    • SMT: Sparse Merkle Trees for efficient branching.
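
The Spine's data flow (append to the WAL, then let the Ingestor tail it and populate the store) can be sketched with in-memory stand-ins. Everything below is hypothetical: `Wal`, `Ingestor`, and their methods are invented for this example, there is no real fsync or checksum, and a `BTreeMap` stands in for the KV store.

```rust
use std::collections::BTreeMap;

/// Append-only log stand-in. Records are never modified once written.
#[derive(Default)]
pub struct Wal {
    records: Vec<Vec<u8>>,
}

impl Wal {
    /// Appends a record and returns its sequence number.
    /// A durable WAL would fsync before acknowledging; this stand-in does not.
    pub fn append(&mut self, payload: &[u8]) -> u64 {
        self.records.push(payload.to_vec());
        (self.records.len() - 1) as u64
    }

    /// Returns every record with a sequence number >= `from`.
    pub fn read_from(&self, from: u64) -> &[Vec<u8>] {
        let start = (from as usize).min(self.records.len());
        &self.records[start..]
    }
}

/// Tails the WAL and copies records it has not yet seen into the store.
#[derive(Default)]
pub struct Ingestor {
    next_seq: u64,
}

impl Ingestor {
    /// Ingests all pending records and returns how many were indexed.
    pub fn poll(&mut self, wal: &Wal, store: &mut BTreeMap<u64, Vec<u8>>) -> usize {
        let pending = wal.read_from(self.next_seq);
        for (offset, record) in pending.iter().enumerate() {
            store.insert(self.next_seq + offset as u64, record.clone());
        }
        // Remember the position so the next poll resumes where this one stopped.
        self.next_seq += pending.len() as u64;
        pending.len()
    }
}
```

Writers only touch the log, and the Ingestor tracks its own position, so indexing can lag behind or restart without losing acknowledged writes in this model.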

Key Files & Directories

  • stemedb/
    • crates/
      • stemedb-core/: Core data structures (Assertion, Vote, Epoch) and types.
      • stemedb-wal/: Durability primitives (Journal, FsyncGuard, Record).
      • stemedb-storage/: Storage engine abstraction and sled implementation.
      • stemedb-ingest/: Async ingestion pipeline logic.
      • stemedb-sim/: "The Arena" simulation for end-to-end verification.
    • architecture.md: Detailed system design and data flow.
    • roadmap.md: Phased implementation plan and status.
    • docs/sdk/go-usage-guide.md: Go SDK usage guide and patterns.
    • Makefile: Build and quality automation.

Building and Running

The project uses a Makefile for common tasks:

  • Build: make build (Compiles the workspace)
  • Test: make test (Runs unit tests across all crates)
  • Quality Check: make quality (Runs fmt, strict clippy linting, duplication checks, and tests)
  • Run Simulation: cargo run -p stemedb-sim (Executes the spine verification simulation)
  • Format: make fmt (Auto-formats code)

Development Conventions

  • Strict Quality: make quality must pass before committing.
    • No unwrap() or expect() in production code (enforced by clippy).
    • Zero warnings allowed.
    • Missing documentation is a hard error.
  • Testing: Every crate must have unit tests. The stemedb-sim crate serves as the integration test suite.
  • Architecture: Follow the "Defensive by Default" philosophy. When goals conflict, prioritize durability over speed, and speed over features.
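
One way to satisfy the no-unwrap()/expect() rule is to return every failure to the caller with `?`. The sketch below is illustrative only: `ConfigError` and `parse_flush_interval_ms` are hypothetical names, and the `#[deny(...)]` attribute shows how the clippy lints `unwrap_used` and `expect_used` could be enforced locally. How the project actually configures clippy is not shown in this document.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for this example.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    /// The named setting was not provided.
    Missing(&'static str),
    /// The setting was present but not a valid unsigned integer.
    Invalid(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Missing(name) => write!(f, "missing setting: {name}"),
            ConfigError::Invalid(err) => write!(f, "invalid number: {err}"),
        }
    }
}

impl std::error::Error for ConfigError {}

impl From<ParseIntError> for ConfigError {
    fn from(err: ParseIntError) -> Self {
        ConfigError::Invalid(err)
    }
}

/// Parses a millisecond interval; no code path can panic on bad input.
#[deny(clippy::unwrap_used, clippy::expect_used)]
pub fn parse_flush_interval_ms(raw: Option<&str>) -> Result<u64, ConfigError> {
    // `ok_or` turns a missing value into an error instead of panicking.
    let text = raw.ok_or(ConfigError::Missing("flush_interval_ms"))?;
    // `?` converts ParseIntError into ConfigError through the From impl.
    let value = text.trim().parse::<u64>()?;
    Ok(value)
}
```

The caller decides how to react to `ConfigError`, which fits the durability-first stance better than aborting the process on malformed input.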