stemedb/GEMINI.md
jml 9bfa626203 docs: reorganize documentation structure for clarity
Major documentation restructure to improve discoverability and reduce duplication.

## Changes

**Deleted (Archived/Consolidated)**:
- Removed duplicate getting started guides
- Archived outdated planning documents
- Consolidated corpus and configuration docs
- Removed obsolete vision/spec files (superseded by vision.md)
- Cleaned up scrapyard and old PDFs

**New Structure**:
- docs/about/ - Project overview and introduction
- docs/guides/ - User guides (moved from root)
- docs/specs/ - Technical specifications
- docs/sdk/ - SDK documentation (Go)
- docs/references/ - API references
- docs/archive/ - Archived historical docs
- applications/aphoria/docs/advanced/ - Advanced topics
- applications/aphoria/docs/reference/ - CLI reference
- applications/aphoria/docs/archive/ - Archived aphoria docs

**Updated**:
- README.md - New root README with clear navigation
- CONTRIBUTING.md - Contribution guidelines
- CLAUDE.md - Updated paths to new structure
- roadmap.md - Added recent completions

## Files Changed
- 57 files changed
- 1,977 insertions(+)
- 961 deletions(-)

**Net change**: +1,016 lines (added CONTRIBUTING.md, README.md, reorganized content)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 07:33:40 +00:00

# StemeDB (Episteme) Project Context
## Project Overview
**StemeDB (Episteme)** is a probabilistic, log-structured, content-addressed knowledge graph database designed as the "Cortex" for autonomous AI research agents. Unlike traditional databases that enforce a single mutable state, StemeDB preserves immutable history and resolves conflicting assertions at read-time using "Lenses."
It serves as the "Git for Truth," allowing agents to:
* **Assert** facts with cryptographic signatures and confidence scores.
* **Vote** on assertions to build consensus without lock contention.
* **Fork** reality to simulate "what-if" scenarios (Overlay Graphs).
* **Resolve** truth dynamically via lenses like Consensus, Authority, or Recency.
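
As a rough illustration of read-time resolution, the sketch below keeps every assertion in an immutable log and lets a lens pick a winner only when queried. The `Assertion` fields, the `Lens` enum, and the `resolve` function are hypothetical simplifications for this example and are not the types defined in `stemedb-core`; "consensus" is approximated here as the highest summed confidence.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified assertion (not the real `stemedb-core` type).
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub subject: String,
    pub value: String,
    pub agent: String,
    pub confidence: f64,
    pub timestamp: u64,
}

/// Two of the resolution strategies named above, in simplified form.
pub enum Lens {
    /// Pick the value with the highest summed confidence.
    Consensus,
    /// Pick the most recently asserted value.
    Recency,
}

/// Resolve the current value for `subject` without mutating the log.
pub fn resolve<'a>(log: &'a [Assertion], subject: &str, lens: &Lens) -> Option<&'a str> {
    let candidates = log.iter().filter(|a| a.subject == subject);
    match lens {
        Lens::Recency => candidates
            .max_by_key(|a| a.timestamp)
            .map(|a| a.value.as_str()),
        Lens::Consensus => {
            // Sum confidence per distinct value, then take the largest total.
            let mut totals: HashMap<&str, f64> = HashMap::new();
            for a in candidates {
                *totals.entry(a.value.as_str()).or_insert(0.0) += a.confidence;
            }
            totals
                .into_iter()
                .max_by(|x, y| x.1.total_cmp(&y.1))
                .map(|(value, _)| value)
        }
    }
}
```

Because conflicting assertions stay in the log, the same data can yield different answers under different lenses, which is the behavior the overview describes.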
## Tech Stack
* **Language:** Rust (2024 edition)
* **Durability:** `stemedb-wal` (Quarantine Pattern with `fs2`, `blake3` checksums)
* **Storage:** `stemedb-storage` (`sled` embedded KV, abstracted via `KVStore` trait)
* **Serialization:** `rkyv` (Zero-copy deserialization for high performance)
* **Ingestion:** `stemedb-ingest` (Async background worker bridging WAL and Store)
* **Simulation:** `stemedb-sim` (Agent-based modeling to verify system behavior)
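
To show why a trait boundary around the storage engine is useful, here is a minimal sketch of a `KVStore`-style abstraction with an in-memory backend. The method names (`put`, `get`, `scan_prefix`), the `MemStore` type, the `index_assertion` helper, and the `assertion/` key prefix are all assumptions made for illustration; the actual trait in `stemedb-storage` may look quite different.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of a `KVStore`-style trait; the real method set may differ.
pub trait KVStore {
    /// Insert or overwrite a value.
    fn put(&mut self, key: &[u8], value: &[u8]);
    /// Fetch a copy of the value stored under `key`, if any.
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    /// Return all entries whose key starts with `prefix`, in key order.
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
}

/// In-memory backend standing in for the `sled` implementation.
#[derive(Default)]
pub struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Keys are ordered, so start at the prefix and stop at the first mismatch.
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

/// Code written against the trait works with any backend.
pub fn index_assertion<S: KVStore>(store: &mut S, id: &str, payload: &[u8]) {
    let key = format!("assertion/{id}");
    store.put(key.as_bytes(), payload);
}
```

Callers such as an ingestion worker would depend only on the trait, so tests can use an in-memory store while production uses the embedded one.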
## Architecture
The system follows a "Spine -> Lattice -> Cortex" architecture:
1. **The Spine (Durability):**
    * **Write-Ahead Log (WAL):** Append-only log with strict `fsync` guarantees.
    * **Ingestor:** Background task that tails the WAL and indexes data.
    * **KV Store:** Persistent storage for assertions and indexes.
2. **The Lattice (Connectivity) - *In Progress*:**
    * **Ballot Box:** High-velocity vote stream.
    * **Materialized Views:** Pre-computed truth states.
3. **The Cortex (Reasoning) - *Planned*:**
    * **Lenses:** WASM-based filters for truth resolution.
    * **SMT:** Sparse Merkle Trees for efficient branching.
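
The Spine's write path can be sketched with the standard library alone: append a checksummed record, `fsync`, and let a reader replay only the records that verify. The record layout, the `append` and `replay` functions, and the FNV-1a checksum are assumptions for this example; the real `stemedb-wal` crate uses `blake3` checksums and `fs2`, and its API is not shown in this document.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Write};
use std::path::Path;

/// FNV-1a hash, used here only as a stand-in for the real `blake3` checksum.
fn checksum(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in data {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Append one record as [len: u32 LE][checksum: u64 LE][payload], then fsync.
pub fn append(path: &Path, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "record too large"));
    }
    let len = payload.len() as u32;
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(&len.to_le_bytes())?;
    file.write_all(&checksum(payload).to_le_bytes())?;
    file.write_all(payload)?;
    // The write is only acknowledged once the data has reached stable storage.
    file.sync_all()
}

/// Replay the log, stopping at the first torn or corrupt record.
pub fn replay(path: &Path) -> io::Result<Vec<Vec<u8>>> {
    let mut bytes = Vec::new();
    File::open(path)?.read_to_end(&mut bytes)?;
    let mut records = Vec::new();
    let mut pos = 0usize;
    while bytes.len() - pos >= 12 {
        let mut len_buf = [0u8; 4];
        len_buf.copy_from_slice(&bytes[pos..pos + 4]);
        let len = u32::from_le_bytes(len_buf) as usize;
        let mut sum_buf = [0u8; 8];
        sum_buf.copy_from_slice(&bytes[pos + 4..pos + 12]);
        let expected = u64::from_le_bytes(sum_buf);
        let start = pos + 12;
        let end = match start.checked_add(len) {
            Some(e) if e <= bytes.len() => e,
            _ => break, // header promises more bytes than were written
        };
        let payload = &bytes[start..end];
        if checksum(payload) != expected {
            break; // corrupt record: do not index anything past this point
        }
        records.push(payload.to_vec());
        pos = end;
    }
    Ok(records)
}
```

An ingestor in this model would call something like `replay` (or tail the file incrementally) and write each verified record into the KV store, so a crash mid-write never produces a partially indexed assertion.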
## Key Files & Directories
* `stemedb/`
    * `crates/`
        * `stemedb-core/`: Core data structures (`Assertion`, `Vote`, `Epoch`) and types.
        * `stemedb-wal/`: Durability primitives (`Journal`, `FsyncGuard`, `Record`).
        * `stemedb-storage/`: Storage engine abstraction and `sled` implementation.
        * `stemedb-ingest/`: Async ingestion pipeline logic.
        * `stemedb-sim/`: "The Arena" simulation for end-to-end verification.
    * `architecture.md`: Detailed system design and data flow.
    * `roadmap.md`: Phased implementation plan and status.
    * `docs/sdk/go-usage-guide.md`: Go SDK usage guide and patterns.
    * `Makefile`: Build and quality automation.
## Building and Running
The project uses a `Makefile` for common tasks:
* **Build:** `make build` (Compiles the workspace)
* **Test:** `make test` (Runs unit tests across all crates)
* **Quality Check:** `make quality` (Runs fmt, strict clippy linting, duplication checks, and tests)
* **Run Simulation:** `cargo run -p stemedb-sim` (Executes the spine verification simulation)
* **Format:** `make fmt` (Auto-formats code)
## Development Conventions
* **Strict Quality:** `make quality` must pass before committing.
    * No `unwrap()` or `expect()` in production code (enforced by clippy).
    * Zero warnings allowed.
    * Missing documentation is a hard error.
* **Testing:** Every crate must have unit tests. The `stemedb-sim` crate serves as the integration test suite.
* **Architecture:** Follow the "Defensive by Default" philosophy. Durability > Speed > Features.
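
A small sketch of what the no-`unwrap()` rule looks like in practice: fallible steps return a typed error and propagate it with `?`. The `ConfidenceError` type and `parse_confidence` function are hypothetical, and the specific clippy lints named in the attribute (`clippy::unwrap_used`, `clippy::expect_used`) are an assumption about how the rule might be enforced; the project's actual lint configuration is not shown here.

```rust
use std::fmt;

/// Error for a confidence score that cannot be accepted (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ConfidenceError {
    /// The input was not a number.
    NotANumber(String),
    /// The number fell outside `0.0..=1.0`.
    OutOfRange(f64),
}

impl fmt::Display for ConfidenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfidenceError::NotANumber(raw) => write!(f, "not a number: {raw:?}"),
            ConfidenceError::OutOfRange(v) => write!(f, "confidence {v} is outside 0.0..=1.0"),
        }
    }
}

impl std::error::Error for ConfidenceError {}

/// Parse a confidence score, reporting failures instead of panicking.
#[deny(clippy::unwrap_used, clippy::expect_used)]
pub fn parse_confidence(raw: &str) -> Result<f64, ConfidenceError> {
    // `?` propagates the error to the caller; no `unwrap()` needed.
    let value: f64 = raw
        .trim()
        .parse()
        .map_err(|_| ConfidenceError::NotANumber(raw.to_string()))?;
    if (0.0..=1.0).contains(&value) {
        Ok(value)
    } else {
        Err(ConfidenceError::OutOfRange(value))
    }
}
```

Every public item above carries a doc comment, which is the kind of code that would also pass a missing-documentation check.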