stemedb/GEMINI.md
jordan 02ecac9a07 fix: merge upstream 10 commits, fix DashMap deadlock, deterministic sim ingestion
Merged 10 upstream commits (MemTable, read-your-writes tests, feed endpoint,
security hardening, signed assertions, source registry, dashboard enhancements)
and fixed all test failures across the full workspace (2656/2656 passing).

Key fixes:
- fix(cluster): DashMap deadlock in swim.rs suspect_node/fail_node/alive_node
  - DashMap::get_mut RefMut + iter() on same map = non-reentrant write lock deadlock
  - Fix: extract clone in scoped block to drop RefMut before calling update_node_gauges()
  - 6 previously-hanging SWIM tests now pass in <2s
- fix(sim): replace background-task+polling ingestion with synchronous process_pending()
  - smoke_high_volume_simulation was CPU-starved under 2656 parallel tests
  - Removed ingestor.start() + wait_until_ingested() pattern throughout sim
  - All arena functions now call ingestor.process_pending() directly (deterministic)
- fix(test): v2 signature helper used wrong hash (rkyv vs canonical compute_content_hash_v2)
- fix(test): quota test signed "test" but v1 requires "subject:predicate" format
- fix(test): http_validation now accepts 400 for valid-format-but-invalid-crypto hex
- fix(test): scale_adaptive micro tier assertions updated (auto_promote upstream change)
- config: add nextest.toml with slow-timeout for background-task-tests group
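
The locking fix in the first item can be illustrated with std types. This is a minimal sketch, not the code in swim.rs: a `Mutex<HashMap>` stands in for the DashMap lock, and `Membership`, `NodeState`, and the gauge logic are hypothetical. Only the names `suspect_node` and `update_node_gauges` come from the notes above.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical node state for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum NodeState {
    Alive,
    Suspect,
}

/// Stand-in for the membership table; std Mutex is non-reentrant too.
pub struct Membership {
    pub nodes: Mutex<HashMap<String, NodeState>>,
}

impl Membership {
    /// Takes the lock itself (like iter() on the map) and counts suspects.
    pub fn update_node_gauges(&self) -> usize {
        let nodes = self.nodes.lock().unwrap_or_else(|e| e.into_inner());
        nodes.values().filter(|s| **s == NodeState::Suspect).count()
    }

    /// Marks a node suspect, then refreshes gauges without holding the lock.
    pub fn suspect_node(&self, id: &str) -> Option<(NodeState, usize)> {
        let updated = {
            let mut nodes = self.nodes.lock().unwrap_or_else(|e| e.into_inner());
            let state = nodes.get_mut(id)?;
            *state = NodeState::Suspect;
            *state // copy the value out of the guarded entry
        }; // guard dropped here, before the map is locked again
        let suspects = self.update_node_gauges();
        Some((updated, suspects))
    }
}
```

Calling `update_node_gauges()` inside the scoped block, while the guard is still alive, would try to take the same lock twice on one thread and hang.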

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 20:27:32 -07:00


StemeDB (Episteme) Project Context

Project Overview

StemeDB (Episteme) is a probabilistic, log-structured, content-addressed knowledge graph database designed as the "Cortex" for autonomous AI research agents. Unlike traditional databases that enforce a single mutable state, StemeDB preserves immutable history and resolves conflicting assertions at read time using "Lenses."

It serves as the "Git for Truth," allowing agents to:

  • Assert facts with cryptographic signatures and confidence scores.
  • Vote on assertions to build consensus without lock contention.
  • Fork reality to simulate "what-if" scenarios (Overlay Graphs).
  • Resolve truth dynamically via lenses like Consensus, Authority, or Recency.
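
Read-time resolution through lenses can be sketched as follows. This is an illustration under stated assumptions, not the stemedb-lens API: the `Assertion` fields, the `Lens` trait, and the `read` helper are hypothetical, and "Consensus" is reduced here to "most votes wins."

```rust
/// Hypothetical, simplified assertion; the real type lives in stemedb-core.
#[derive(Clone, Debug, PartialEq)]
pub struct Assertion {
    pub subject: String,
    pub predicate: String,
    pub value: String,
    pub timestamp: u64,
    pub votes: u32,
}

/// A lens picks one winner among conflicting assertions at read time.
pub trait Lens {
    fn resolve<'a>(&self, candidates: &[&'a Assertion]) -> Option<&'a Assertion>;
}

/// Newest assertion wins.
pub struct Recency;
/// Assertion with the most votes wins (assumed simplification).
pub struct Consensus;

impl Lens for Recency {
    fn resolve<'a>(&self, candidates: &[&'a Assertion]) -> Option<&'a Assertion> {
        candidates.iter().copied().max_by_key(|a| a.timestamp)
    }
}

impl Lens for Consensus {
    fn resolve<'a>(&self, candidates: &[&'a Assertion]) -> Option<&'a Assertion> {
        candidates.iter().copied().max_by_key(|a| a.votes)
    }
}

/// Filters the immutable log for one (subject, predicate) and applies a lens.
/// Nothing is overwritten; every assertion stays in the log.
pub fn read<'a>(
    log: &'a [Assertion],
    subject: &str,
    predicate: &str,
    lens: &dyn Lens,
) -> Option<&'a Assertion> {
    let candidates: Vec<&'a Assertion> = log
        .iter()
        .filter(|a| a.subject == subject && a.predicate == predicate)
        .collect();
    lens.resolve(&candidates)
}
```

The same log can yield different answers depending on the lens passed to `read`, which is the point of deferring conflict resolution to read time.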

Tech Stack

  • Language: Rust (2024 edition)
  • Durability: stemedb-wal (Quarantine Pattern with fs2, blake3 checksums)
  • Storage: stemedb-storage (Hybrid Store: fjall LSM-tree for writes, redb B-tree for reads)
  • Serialization: rkyv (Zero-copy deserialization for high performance)
  • Ingestion: stemedb-ingest (Async background worker bridging WAL and Store)
  • Simulation: stemedb-sim (Agent-based modeling to verify system behavior)

Architecture

The system follows a "Spine -> Lattice -> Cortex" architecture:

  1. The Spine (Durability):

    • Write-Ahead Log (WAL): Append-only log with strict fsync guarantees.
    • Ingestor: Background task that tails the WAL and indexes data.
    • KV Store: Persistent storage for assertions and indexes.
  2. The Lattice (Connectivity) - Implemented:

    • Ballot Box: High-velocity vote stream.
    • Materialized Views: Pre-computed truth states.
  3. The Cortex (Reasoning) - Implemented:

    • Lenses: WASM-based filters for truth resolution (Consensus, Authority, Recency, etc.).
    • SMT: Sparse Merkle Trees for efficient branching.
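
The Spine's write path (append to the WAL, fsync, then index) can be sketched with the standard library alone. This is a rough illustration, not the stemedb-wal or stemedb-ingest code: the record framing, the `Wal` and `Ingestor` types, and keying the store by byte offset are all assumptions. The method name `process_pending` matches the one used in the project, but its signature here is invented.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Append-only log; each record is a 4-byte little-endian length plus payload.
pub struct Wal {
    file: File,
}

impl Wal {
    pub fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { file })
    }

    /// Appends one record and fsyncs before reporting success.
    pub fn append(&mut self, payload: &[u8]) -> io::Result<()> {
        let len = u32::try_from(payload.len())
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "record too large"))?;
        let mut record = Vec::with_capacity(4 + payload.len());
        record.extend_from_slice(&len.to_le_bytes());
        record.extend_from_slice(payload);
        self.file.write_all(&record)?;
        self.file.sync_all()
    }
}

/// Remembers how far into the WAL it has read; store is keyed by record offset.
#[derive(Default)]
pub struct Ingestor {
    offset: u64,
    pub store: HashMap<u64, Vec<u8>>,
}

impl Ingestor {
    /// Synchronously indexes every complete record written since the last call.
    pub fn process_pending(&mut self, path: &Path) -> io::Result<usize> {
        let mut file = File::open(path)?;
        file.seek(SeekFrom::Start(self.offset))?;
        let mut buf = Vec::new();
        file.read_to_end(&mut buf)?;

        let (mut pos, mut count) = (0usize, 0usize);
        while pos + 4 <= buf.len() {
            let len = u32::from_le_bytes([buf[pos], buf[pos + 1], buf[pos + 2], buf[pos + 3]]) as usize;
            if pos + 4 + len > buf.len() {
                break; // incomplete tail record; retry on the next call
            }
            let key = self.offset + pos as u64;
            self.store.insert(key, buf[pos + 4..pos + 4 + len].to_vec());
            pos += 4 + len;
            count += 1;
        }
        self.offset += pos as u64;
        Ok(count)
    }
}
```

Draining the log with a direct call keeps tests deterministic, since no background task or polling loop has to be scheduled before the data becomes visible.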

Key Files & Directories

  • stemedb/
    • crates/
      • stemedb-core/: Core data structures (Assertion, Vote, Epoch) and types.
      • stemedb-wal/: Durability primitives (Journal, FsyncGuard, Record).
      • stemedb-storage/: Storage engine abstraction and Hybrid Store implementation.
      • stemedb-ingest/: Async ingestion pipeline logic.
      • stemedb-lens/: Truth Lenses (Recency, Consensus, Authority, Skeptic).
      • stemedb-sim/: "The Arena" simulation for end-to-end verification.
    • architecture.md: Detailed system design and data flow.
    • roadmap.md: Phased implementation plan and status.
    • docs/sdk/go-usage-guide.md: Go SDK usage guide and patterns.
    • Makefile: Build and quality automation.

Building and Running

The project uses a Makefile for common tasks:

  • Build: make build (Compiles the workspace)
  • Test: make test (Runs unit tests across all crates)
  • Quality Check: make quality (Runs fmt, strict clippy linting, duplication checks, and tests)
  • Run Simulation: cargo run -p stemedb-sim (Executes the spine verification simulation)
  • Format: make fmt (Auto-formats code)

Development Conventions

  • Strict Quality: make quality must pass before committing.
    • No unwrap() or expect() in production code (enforced by clippy).
    • Zero warnings allowed.
    • Missing documentation is a hard error.
  • Testing: Every crate must have unit tests. The stemedb-sim crate serves as the integration test suite.
  • Architecture: Follow the "Defensive by Default" philosophy. When goals conflict, prioritize in this order: Durability > Speed > Features.
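
The "no unwrap() or expect()" rule can be sketched as follows. This is a small example, not code from the workspace: `split_key` and `KeyFormatError` are hypothetical names. `clippy::unwrap_used` and `clippy::expect_used` are existing Clippy lints, but whether the project enables them through attributes, Cargo configuration, or Makefile flags is an assumption.

```rust
use std::fmt;

/// Error returned when a key is not in "subject:predicate" form.
#[derive(Debug, PartialEq)]
pub struct KeyFormatError(pub String);

impl fmt::Display for KeyFormatError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "expected \"subject:predicate\", got {:?}", self.0)
    }
}

impl std::error::Error for KeyFormatError {}

/// Splits a "subject:predicate" key, returning an error instead of panicking.
///
/// A crate would more typically set these lints once at its root, together
/// with `missing_docs`, rather than per function as done here.
#[deny(clippy::unwrap_used, clippy::expect_used)]
pub fn split_key(key: &str) -> Result<(&str, &str), KeyFormatError> {
    key.split_once(':')
        .filter(|(subject, predicate)| !subject.is_empty() && !predicate.is_empty())
        .ok_or_else(|| KeyFormatError(key.to_string()))
}
```

Callers propagate the failure with `?` or handle it explicitly, so malformed input surfaces as a typed error rather than a panic.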