jordan 3cfaa1e1d3 feat: Complete Phase 1 (The Spine) - storage foundation
Phase 1 delivers the complete durability and storage layer:

- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
  fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
  aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
  and multi-cycle durability

New crates: stemedb-wal, stemedb-storage, stemedb-ingest

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 14:15:34 -07:00

Episteme (StemeDB) Roadmap

Goal: Build the "Git for Truth" substrate for autonomous AI research.
Current Phase: Phase 1 (The Spine) complete; Phase 2 (The Lattice) up next.


📅 High-Level Timeline

| Phase | Codename    | Focus               | Key Deliverable                  |
|-------|-------------|---------------------|----------------------------------|
| 1     | The Spine   | Storage & Safety    | Append-only WAL + KV Store       |
| 2     | The Lattice | Indexing & Async    | Materialized Views + Ballot Box  |
| 3     | The Cortex  | Branching & Vectors | SMT Backend + Semantic Search    |
| 4     | The Hive    | Trust & Learning    | Dojo + TrustRank                 |

🛠 Detailed Milestones

Phase 1: The Spine (Foundation)

Goal: Securely ingest assertions and persist them without data loss.

  • Project Scaffold: Initialize Rust workspace, set up linting/CI (clippy, fmt).
  • Assertion Schema: Define the Assertion struct with rkyv serialization.
    • Add dependencies: rkyv, blake3, ed25519-dalek, image_hasher.
    • Define Assertion struct (Subject, Predicate, Object, Confidence, SourceHash).
    • Multi-Sig Expansion: Implement the SignatureEntry struct and a signatures: Vec<SignatureEntry> field.
    • Visual Expansion: Add visual_hash: Option<pHash> field for image provenance.
    • Test serialization round-trips.
  • Ballot Schema: Define the Vote struct for multi-agent consensus.
    • Add Vote struct: assertion_hash, agent_id, weight, signature.
    • Test serialization round-trips.
  • Paradigm Schema (Epochs): Define the Epoch and SupersessionType structs.
    • Add epoch: Option<EpochId> to Assertion.
    • Implement Epoch struct with supersedes and SupersessionType.
    • Test serialization round-trips.
  • WAL Integration: Implement the Quarantine Pattern for write-ahead logging.
    • Create stemedb-wal crate.
    • Port FsyncGuard and Record logic from established durability patterns.
    • Implement the Record format with BLAKE3 checksums and record headers.
    • Verify fsync behavior with tests.
  • Storage Engine: Implement the KVStore trait using sled (embedded KV).
    • Add sled dependency.
    • Define KVStore trait (put, get, delete, scan_prefix, flush).
    • Implement SledStore wrapper.
  • Basic Ingestor: Background worker that tails WAL and writes to KV.
    • Implement async loop reading from WAL.
    • Write deserialized assertions, votes, and epochs to sled.
    • Content-addressed keys using BLAKE3 hash (H:{hash}, V:{hash}, E:{hash}).
    • Subject adjacency index (S:{subject}).
  • Verification: Write tests proving crash recovery (write -> crash -> restart -> read).
    • WAL-level recovery tests (6 tests in stemedb-wal/src/recovery.rs).
    • Full pipeline recovery tests (4 tests in stemedb-ingest/src/worker.rs).
    • Bug fix: The journal now seeks to end of file after reopening an existing WAL file, so new records are appended after existing ones.
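
The WAL items above (checksummed records, fsync, crash recovery, and seek-to-end on reopen) can be illustrated with a small std-only sketch. It is not the stemedb-wal code: the 8-byte header layout [len: u32][checksum: u32] and the names encode_record, recover, prepare_for_append, and append are hypothetical. Because BLAKE3 is not in the standard library, a 32-bit FNV-1a hash stands in as the checksum; it catches accidental corruption but is not collision-resistant. Recovery keeps the longest prefix of intact records and stops at the first torn or corrupted one.

```rust
use std::fs::File;
use std::io::{self, Seek, SeekFrom, Write};

/// Hypothetical header: payload length (u32 LE) + checksum (u32 LE) = 8 bytes.
const HEADER_LEN: usize = 8;

/// Placeholder checksum (32-bit FNV-1a). The roadmap specifies BLAKE3;
/// FNV is used here only because it needs no external crate.
fn checksum(data: &[u8]) -> u32 {
    let mut h: u32 = 0x811c_9dc5;
    for &b in data {
        h ^= b as u32;
        h = h.wrapping_mul(0x0100_0193);
    }
    h
}

fn read_u32(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Frame one payload as header + payload.
fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(HEADER_LEN + payload.len());
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(&checksum(payload).to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

/// Replay a log image: return every intact record plus the byte offset where
/// the intact prefix ends. Scanning stops at the first torn or corrupted
/// record, since nothing after it can be trusted.
fn recover(log: &[u8]) -> (Vec<Vec<u8>>, usize) {
    let mut records = Vec::new();
    let mut pos = 0;
    while log.len() - pos >= HEADER_LEN {
        let len = read_u32(log, pos) as usize;
        let sum = read_u32(log, pos + 4);
        let start = pos + HEADER_LEN;
        if log.len() - start < len {
            break; // torn write: the crash cut the payload short
        }
        let payload = &log[start..start + len];
        if checksum(payload) != sum {
            break; // corrupted bytes: checksum mismatch
        }
        records.push(payload.to_vec());
        pos = start + len;
    }
    (records, pos)
}

/// Drop any torn tail, then position at EOF so new records follow the last
/// intact one instead of overwriting the file from offset 0.
fn prepare_for_append(file: &mut File, valid_end: u64) -> io::Result<()> {
    file.set_len(valid_end)?;
    file.seek(SeekFrom::End(0))?;
    Ok(())
}

/// Append one record and fsync before reporting success.
fn append(file: &mut File, payload: &[u8]) -> io::Result<()> {
    file.write_all(&encode_record(payload))?;
    file.sync_all()
}
```

Truncating before seeking matters in this design: if new records were appended after a torn tail, a later recovery scan would stop at the garbage and never reach them. Whether stemedb-wal truncates or handles the tail differently is not stated here.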
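
The storage items (a KVStore trait with put, get, delete, scan_prefix, and flush, plus the H:, V:, E:, and S: key patterns) are sketched below with an in-memory BTreeMap standing in for sled. Because the map is ordered, a prefix scan becomes a single range scan. The trait signatures are simplified guesses with no error type; MemStore, content_key, and subject_key are hypothetical names; and the hash bytes are taken as given rather than computed with BLAKE3. Reading H as assertions, V as votes, and E as epochs follows the order in which the roadmap lists them and is an assumption.

```rust
use std::collections::BTreeMap;

/// Simplified, hypothetical KVStore trait (a real one would likely return Results).
trait KVStore {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn delete(&mut self, key: &[u8]);
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
    fn flush(&mut self);
}

/// In-memory stand-in for a sled-backed store.
#[derive(Default)]
struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn delete(&mut self, key: &[u8]) {
        self.map.remove(key);
    }
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Keys are sorted, so all matches form one contiguous run starting at `prefix`.
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
    fn flush(&mut self) {
        // Nothing to persist in memory; a sled-backed store would flush to disk here.
    }
}

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Content-addressed key, e.g. `H:{hash}` (assumed: H = assertion, V = vote, E = epoch).
fn content_key(kind: char, hash: &[u8]) -> Vec<u8> {
    format!("{}:{}", kind, hex(hash)).into_bytes()
}

/// Hypothetical adjacency key: appending the assertion hash to `S:{subject}`
/// lets one prefix scan list every assertion about a subject.
fn subject_key(subject: &str, hash: &[u8]) -> Vec<u8> {
    format!("S:{}:{}", subject, hex(hash)).into_bytes()
}
```

One caveat of this key shape: a subject that itself contains ":" could collide with the prefix of a longer subject, so a production scheme would need escaping or length-prefixed components.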

Phase 2: The Lattice (Connectivity)

Goal: Query data with sub-millisecond latency using Materialized Views.

  • The Ballot Box: Implement high-velocity vote ingestion.
    • VoteStore trait and implementation.
  • Materializer: Background worker that enables O(1) read performance.
    • Aggregates votes and TrustRank.
    • Updates MV:{Subject}:{Predicate} with the winning Assertion.
  • The Meter: Implement Economic Throttling (TAN).
    • Middleware to track Token/Compute cost per Job.
    • Reject requests whose cost exceeds their Value of Information.
  • Agent Wallet: Key management sidecar.
    • Securely hold private keys.
    • Auto-sign outgoing Assertions/Votes.
  • API Surface: axum HTTP server.
    • POST /assert -> Accepts JSON, writes to WAL, returns JobID.
    • POST /vote -> High-throughput endpoint.
    • GET /query -> Accepts Subject/Predicate/Lens, returns resolved Assertion.
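
One plausible reading of the Materializer, sketched in std-only Rust, sums the votes for each assertion, scales each vote by the voting agent's TrustRank, and caches the winner under MV:{Subject}:{Predicate}. The roadmap does not say how votes and TrustRank combine, so the product weight * trust, the default trust of 1.0 for unknown agents, and the smaller-hash tie-break are all assumptions, as are the names materialize and mv_key.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory vote; fields follow the Phase 1 Vote schema
/// (signature omitted because this sketch does not verify it).
struct Vote {
    assertion_hash: String,
    agent_id: String,
    weight: f64,
}

/// Return the assertion hash with the highest trust-weighted vote total.
/// Assumption: effective weight = vote.weight * trust[agent]; unknown agents get 1.0.
fn materialize(votes: &[Vote], trust: &HashMap<String, f64>) -> Option<String> {
    let mut totals: HashMap<&str, f64> = HashMap::new();
    for v in votes {
        let t = trust.get(&v.agent_id).copied().unwrap_or(1.0);
        *totals.entry(v.assertion_hash.as_str()).or_insert(0.0) += v.weight * t;
    }
    totals
        .into_iter()
        // Highest score wins; equal scores prefer the smaller hash so the result is deterministic.
        .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(hash, _)| hash.to_string())
}

/// Key under which the winner is cached, following the MV:{Subject}:{Predicate} pattern.
fn mv_key(subject: &str, predicate: &str) -> String {
    format!("MV:{}:{}", subject, predicate)
}
```

Because the winner is computed in the background and stored under a single key, GET /query can answer with one point lookup instead of re-tallying votes on every read.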
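
The Meter's rule (reject a job's request once its cost would exceed the job's Value of Information) reduces to a per-job ledger check. In this sketch cost and value share one integer unit, the caller supplies the value, and a rejected request is not charged. Meter, Decision, and charge are hypothetical names, and the roadmap does not specify how Value of Information is estimated.

```rust
use std::collections::HashMap;

/// Hypothetical per-job cost ledger; costs and values are assumed to share one unit (e.g. tokens).
#[derive(Default)]
struct Meter {
    spent: HashMap<u64, u64>, // job id -> cost charged so far
}

#[derive(Debug, PartialEq)]
enum Decision {
    Admit,
    Reject { spent: u64, requested: u64, value: u64 },
}

impl Meter {
    /// Admit the request only if the job's cumulative cost stays within its value.
    fn charge(&mut self, job: u64, cost: u64, value_of_information: u64) -> Decision {
        let spent = self.spent.entry(job).or_insert(0);
        if spent.saturating_add(cost) > value_of_information {
            // Rejected requests are not added to the ledger.
            return Decision::Reject { spent: *spent, requested: cost, value: value_of_information };
        }
        *spent += cost;
        Decision::Admit
    }
}
```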

Phase 3: The Cortex (Reasoning)

Goal: Enable semantic search and "What If" scenarios.

  • Sparse Merkle Backend: Implement SMT for O(1) branch creation.
  • Branching Core: Implement Overlay Graph logic.
  • Vector Storage: Integrate hnsw-rs or lance.
  • Semantic Search: Implement k-NN query support.
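
The overlay idea behind cheap branching can be shown with a copy-on-write layer. A branch stores only its own writes and deletions (tombstones) and falls through to a shared, read-only base for everything else, so creating a branch copies nothing. This HashMap-based sketch uses hypothetical names; it does not model the Sparse Merkle Tree backend or how StemeDB represents graph edges.

```rust
use std::collections::HashMap;

/// Hypothetical overlay branch: reads check the branch's own writes first,
/// then fall through to the shared base. Creating a branch copies nothing.
struct Branch<'a> {
    base: &'a HashMap<String, String>,
    // Some(v) = value written on this branch; None = deleted on this branch (tombstone).
    overlay: HashMap<String, Option<String>>,
}

impl<'a> Branch<'a> {
    fn new(base: &'a HashMap<String, String>) -> Self {
        Branch { base, overlay: HashMap::new() }
    }
    fn put(&mut self, key: &str, value: &str) {
        self.overlay.insert(key.to_string(), Some(value.to_string()));
    }
    fn delete(&mut self, key: &str) {
        self.overlay.insert(key.to_string(), None);
    }
    fn get(&self, key: &str) -> Option<&str> {
        match self.overlay.get(key) {
            Some(Some(v)) => Some(v.as_str()),
            Some(None) => None, // tombstone hides the base value
            None => self.base.get(key).map(|v| v.as_str()),
        }
    }
}
```

A "What If" scenario then amounts to writing hypothetical assertions into a branch while the base stays untouched for every other reader.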
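
For semantic search, an exact brute-force k-NN over cosine similarity is a useful baseline for checking the recall of an approximate index such as hnsw-rs or lance. The sketch scans every vector (O(n·d) per query for n vectors of dimension d) and does not use either library's API; knn and cosine are hypothetical helpers.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exact k-NN by linear scan: returns (id, similarity), most similar first.
fn knn<'a>(query: &[f32], items: &'a [(String, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = items
        .iter()
        .map(|(id, v)| (id.as_str(), cosine(query, v)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order even for NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```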

Phase 4: The Hive (Trust & Scale)

Goal: Turn the database into a training engine.

  • The Dojo: Training Data Pipeline.
    • Post-Mortem Exporter: Query Lens::Skeptic failures -> Negative Samples.
    • Golden Path Generator: Merge events -> Positive Samples.
  • TrustRank Engine: Background "Gardener" process.
    • Implement Back-Propagation logic for agent reputation.
    • Confidence Half-Life: Implement decay calculation engine.
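
The roadmap names a Confidence Half-Life but does not give the curve. Assuming plain exponential decay, confidence after age t with half-life h is c(t) = c0 * 2^(-t/h), so it halves once per half-life. A minimal sketch; the function name and the choice of seconds as the unit are assumptions:

```rust
/// Exponential decay: confidence halves every `half_life_secs` seconds.
/// Assumption: c(t) = c0 * 2^(-t / h) is one standard reading of "half-life".
fn decayed_confidence(initial: f64, age_secs: f64, half_life_secs: f64) -> f64 {
    assert!(half_life_secs > 0.0, "half-life must be positive");
    let age = age_secs.max(0.0); // clock skew must not inflate confidence
    initial * 0.5_f64.powf(age / half_life_secs)
}
```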

🚦 Tracking

Active Tasks

  • Phase 1 Complete! Ready to start Phase 2 (The Lattice).

Blockers

  • None.