Phase 1 delivers the complete durability and storage layer:
- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
and multi-cycle durability
New crates: stemedb-wal, stemedb-storage, stemedb-ingest
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Episteme (StemeDB) Roadmap
Goal: Build the "Git for Truth" substrate for autonomous AI research.
Current Phase: Phase 1 (The Spine)
📅 High-Level Timeline
| Phase | Codename | Focus | Key Deliverable |
|---|---|---|---|
| 1 | The Spine | Storage & Safety | Append-only WAL + KV Store |
| 2 | The Lattice | Indexing & Async | Materialized Views + Ballot Box |
| 3 | The Cortex | Branching & Vectors | SMT Backend + Semantic Search |
| 4 | The Hive | Trust & Learning | Dojo + TrustRank |
🛠 Detailed Milestones
Phase 1: The Spine (Foundation)
Goal: Securely ingest assertions and persist them without data loss.
- Project Scaffold: Initialize Rust workspace, set up linting/CI (clippy, fmt).
- Assertion Schema: Define the `Assertion` struct with `rkyv` serialization.
  - Add dependencies: `rkyv`, `blake3`, `ed25519-dalek`, `image_hasher`.
  - Define the `Assertion` struct (Subject, Predicate, Object, Confidence, SourceHash).
  - Multi-Sig Expansion: Implement the `SignatureEntry` struct and a `signatures: Vec<SignatureEntry>` field.
  - Visual Expansion: Add a `visual_hash: Option<pHash>` field for image provenance.
  - Test serialization round-trips.
- Ballot Schema: Define the `Vote` struct for multi-agent consensus.
  - Add the `Vote` struct: `assertion_hash`, `agent_id`, `weight`, `signature`.
  - Test serialization round-trips.
- Paradigm Schema (Epochs): Define the `Epoch` and `SupersessionType` structs.
  - Add `epoch: Option<EpochId>` to `Assertion`.
  - Implement the `Epoch` struct with `supersedes` and `SupersessionType`.
  - Test serialization round-trips.
- WAL Integration: Implement the Quarantine Pattern for write-ahead logging.
  - Create the `stemedb-wal` crate.
  - Port `FsyncGuard` and `Record` logic from established durability patterns.
  - Implement the Record format with BLAKE3 checksums and headers.
  - Verify `fsync` behavior with tests.
- Storage Engine: Implement the `Store` trait using `sled` (embedded KV).
  - Add the `sled` dependency.
  - Define the `KVStore` trait (`put`, `get`, `delete`, `scan_prefix`, `flush`).
  - Implement the `SledStore` wrapper.
- Basic Ingestor: Background worker that tails the WAL and writes to KV.
  - Implement an async loop reading from the WAL.
  - Write deserialized assertions, votes, and epochs to `sled`.
  - Content-addressed keys using the BLAKE3 hash (`H:{hash}`, `V:{hash}`, `E:{hash}`).
  - Subject adjacency index (`S:{subject}`).
- Verification: Write tests proving crash recovery (write -> crash -> restart -> read).
  - WAL-level recovery tests (6 tests in `stemedb-wal/src/recovery.rs`).
  - Full pipeline recovery tests (4 tests in `stemedb-ingest/src/worker.rs`).
  - Bug fix: The journal now seeks to end-of-file after reopening an existing WAL file.
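For the Ballot Schema milestone, here is a minimal std-only sketch of a serialization round-trip test. It assumes 32-byte assertion hashes and agent IDs and a 64-byte signature, which are the sizes of a BLAKE3 digest, an ed25519 public key, and an ed25519 signature. It also swaps `rkyv` for a hand-written fixed-width encoding so the example compiles without dependencies. `VOTE_LEN`, `encode`, `decode`, and `fixed` are hypothetical names.

```rust
/// 32-byte content hash (the size of a BLAKE3 digest).
pub type Hash = [u8; 32];

/// Sketch of the ballot record. Field sizes are assumptions based on
/// ed25519 public-key and signature sizes.
#[derive(Debug, Clone, PartialEq)]
pub struct Vote {
    pub assertion_hash: Hash,
    pub agent_id: [u8; 32],
    pub weight: f32,
    pub signature: [u8; 64],
}

/// Encoded size: 32 (hash) + 32 (agent) + 4 (weight) + 64 (signature).
pub const VOTE_LEN: usize = 132;

/// Copies an exactly-sized slice into a fixed array (N is inferred by the caller).
fn fixed<const N: usize>(bytes: &[u8]) -> [u8; N] {
    let mut out = [0u8; N];
    out.copy_from_slice(bytes);
    out
}

impl Vote {
    /// Writes each field at a fixed offset; the weight is little-endian.
    pub fn encode(&self) -> [u8; VOTE_LEN] {
        let mut out = [0u8; VOTE_LEN];
        out[0..32].copy_from_slice(&self.assertion_hash);
        out[32..64].copy_from_slice(&self.agent_id);
        out[64..68].copy_from_slice(&self.weight.to_le_bytes());
        out[68..132].copy_from_slice(&self.signature);
        out
    }

    /// Reads the fields back; returns `None` if the buffer has the wrong length.
    pub fn decode(buf: &[u8]) -> Option<Vote> {
        if buf.len() != VOTE_LEN {
            return None;
        }
        Some(Vote {
            assertion_hash: fixed(&buf[0..32]),
            agent_id: fixed(&buf[32..64]),
            weight: f32::from_le_bytes(fixed(&buf[64..68])),
            signature: fixed(&buf[68..132]),
        })
    }
}
```

A fixed-width layout keeps the round-trip property easy to assert. In the real crate, the same test would exercise the `Archive`, `Serialize`, and `Deserialize` impls derived from `rkyv`.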
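For the WAL Integration milestone, the record format (a header carrying a length and checksum in front of each payload) can be sketched as follows. This is an illustration under stated assumptions, not the `stemedb-wal` format:

- The 16-byte header layout and the names `encode_record`, `decode_record`, and `DecodeError` are hypothetical.
- Records are zero-padded to 8-byte boundaries, echoing the commit's 8-byte aligned headers.
- FNV-1a stands in for BLAKE3 only so the block needs no crates. FNV-1a is not a cryptographic hash.

```rust
/// FNV-1a (64-bit). A dependency-free stand-in for BLAKE3 in this sketch
/// only; unlike BLAKE3 it is not a cryptographic hash.
fn checksum(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Header: payload length (u32) + reserved padding (u32) + checksum (u64).
/// 16 bytes keeps each payload 8-byte aligned when records start aligned.
pub const HEADER_LEN: usize = 16;

/// Frames `payload` and zero-pads the record to a multiple of 8 bytes.
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len() + 7);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&0u32.to_le_bytes());
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out.extend_from_slice(payload);
    while out.len() % 8 != 0 {
        out.push(0);
    }
    out
}

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    /// Header or payload cut short, e.g. by a crash mid-write.
    Truncated,
    /// All bytes present but the checksum does not match.
    BadChecksum,
}

/// Decodes the record at the start of `buf`, returning the payload and the
/// number of bytes consumed (header + payload + padding).
pub fn decode_record(buf: &[u8]) -> Result<(&[u8], usize), DecodeError> {
    if buf.len() < HEADER_LEN {
        return Err(DecodeError::Truncated);
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    let mut sum_bytes = [0u8; 8];
    sum_bytes.copy_from_slice(&buf[8..16]);
    let expected = u64::from_le_bytes(sum_bytes);
    let end = HEADER_LEN + len;
    if buf.len() < end {
        return Err(DecodeError::Truncated);
    }
    let payload = &buf[HEADER_LEN..end];
    if checksum(payload) != expected {
        return Err(DecodeError::BadChecksum);
    }
    let padded = (end + 7) / 8 * 8;
    Ok((payload, padded.min(buf.len())))
}
```

The two error cases call for different handling during recovery. A `Truncated` record at the tail of the file is the expected result of a crash mid-write and can be discarded. A `BadChecksum` record indicates corruption of bytes that were fully written.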
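For the Storage Engine milestone, here is a minimal in-memory sketch of the `KVStore` trait, useful for unit tests that should not touch disk. The method signatures (infallible, byte-slice keys) are assumptions. A `SledStore` would return `sled`'s `Result` types and could delegate `scan_prefix` to sled's own `Tree::scan_prefix`. `MemStore` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Minimal version of the `KVStore` trait; signatures are assumptions.
pub trait KVStore {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn delete(&mut self, key: &[u8]);
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
    fn flush(&mut self);
}

/// In-memory stand-in for `SledStore`. `BTreeMap` keeps keys sorted, so a
/// prefix scan is a range scan that starts at the prefix and stops at the
/// first key that no longer matches it.
#[derive(Default)]
pub struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn delete(&mut self, key: &[u8]) {
        self.map.remove(key);
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }

    fn flush(&mut self) {
        // Nothing to persist in memory; a sled-backed store would flush to disk here.
    }
}
```

Sorted keys are what make the `H:`, `V:`, `E:`, and `S:` prefixes cheap to enumerate. Each prefix occupies one contiguous key range.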
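The Basic Ingestor's content-addressed keys have a useful property for crash recovery. Replaying a WAL entry that was already applied rewrites the same key with the same bytes, so replay is idempotent. The sketch below illustrates this under several assumptions:

- The `Entry` enum, the `Kv` struct, and the `ingest`/`ingest_all` functions are hypothetical.
- The `S:{subject}` index is modeled as a set of assertion hashes, because the roadmap does not specify its value layout.
- The loop is synchronous rather than async.
- A 64-bit FNV-1a hash stands in for BLAKE3.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical decoded WAL entry; `bytes` is the serialized record.
pub enum Entry {
    Assertion { subject: String, bytes: Vec<u8> },
    Vote { bytes: Vec<u8> },
    Epoch { bytes: Vec<u8> },
}

/// Hypothetical view of the KV layout.
#[derive(Default)]
pub struct Kv {
    /// `H:{hash}`, `V:{hash}`, `E:{hash}` -> serialized record.
    pub records: BTreeMap<String, Vec<u8>>,
    /// `S:{subject}` -> hashes of assertions about that subject (assumed layout).
    pub subjects: BTreeMap<String, BTreeSet<String>>,
}

/// Hex digest of a record. FNV-1a (64-bit) stands in for BLAKE3 here.
fn content_hash(data: &[u8]) -> String {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    format!("{:016x}", h)
}

/// Applies one WAL entry. Keys depend only on content, so re-applying an
/// entry after a crash overwrites identical data and leaves the store unchanged.
pub fn ingest(kv: &mut Kv, entry: &Entry) {
    let (prefix, bytes) = match entry {
        Entry::Assertion { bytes, .. } => ("H", bytes),
        Entry::Vote { bytes } => ("V", bytes),
        Entry::Epoch { bytes } => ("E", bytes),
    };
    let hash = content_hash(bytes);
    if let Entry::Assertion { subject, .. } = entry {
        kv.subjects
            .entry(format!("S:{}", subject))
            .or_default()
            .insert(hash.clone());
    }
    kv.records.insert(format!("{}:{}", prefix, hash), bytes.clone());
}

/// Applies decoded entries in WAL order (synchronous stand-in for the async tail loop).
pub fn ingest_all(kv: &mut Kv, entries: &[Entry]) {
    for e in entries {
        ingest(kv, e);
    }
}
```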
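The reopen bug fixed under Verification is easy to reproduce with `std::fs`. A file opened for writing without `append` or an explicit seek starts at offset 0, so the first append after a restart overwrites old records. Below is a minimal sketch of the fixed open path plus a replay that tolerates a torn tail. It assumes a bare length-prefixed format, with checksums omitted to keep the focus on the reopen path. `open_journal`, `append`, and `replay` are hypothetical names.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Opens (or creates) the journal with the cursor at end-of-file. Without the
/// seek, the cursor starts at offset 0 and the first append after a restart
/// overwrites existing records.
pub fn open_journal(path: &Path) -> io::Result<File> {
    let mut file = OpenOptions::new().read(true).write(true).create(true).open(path)?;
    file.seek(SeekFrom::End(0))?;
    Ok(file)
}

/// Appends one length-prefixed record and fsyncs before returning, so an
/// acknowledged write survives a crash.
pub fn append(file: &mut File, payload: &[u8]) -> io::Result<()> {
    file.write_all(&(payload.len() as u32).to_le_bytes())?;
    file.write_all(payload)?;
    file.sync_all()
}

/// Reads every complete record from the start of the file. A torn record at
/// the tail (crash mid-write) is skipped rather than treated as an error.
pub fn replay(path: &Path) -> io::Result<Vec<Vec<u8>>> {
    let mut buf = Vec::new();
    File::open(path)?.read_to_end(&mut buf)?;
    let mut records = Vec::new();
    let mut pos = 0;
    while pos + 4 <= buf.len() {
        let len = u32::from_le_bytes([buf[pos], buf[pos + 1], buf[pos + 2], buf[pos + 3]]) as usize;
        let start = pos + 4;
        if start + len > buf.len() {
            break;
        }
        records.push(buf[start..start + len].to_vec());
        pos = start + len;
    }
    Ok(records)
}
```

A production journal would also truncate a torn tail before appending again. This sketch only skips it on read.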
Phase 2: The Lattice (Connectivity)
Goal: Query data with sub-millisecond latency using Materialized Views.
- The Ballot Box: Implement high-velocity vote ingestion.
  - `VoteStore` trait and implementation.
- Materializer: Background worker for O(1) read performance.
  - Aggregates Votes + TrustRank.
  - Updates `MV:{Subject}:{Predicate}` with the winning Assertion.
- The Meter: Implement Economic Throttling (TAN).
  - Middleware to track Token/Compute cost per Job.
  - Reject requests whose cost exceeds their Value of Information.
- Agent Wallet: Key management sidecar.
  - Securely hold private keys.
  - Auto-sign outgoing Assertions/Votes.
- API Surface: `axum` HTTP server.
  - `POST /assert` -> Accepts JSON, writes to WAL, returns `JobID`.
  - `POST /vote` -> High-throughput endpoint.
  - `GET /query` -> Accepts Subject/Predicate/Lens, returns resolved Assertion.
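The Materializer's job can be sketched as a weighted tally: aggregate votes with TrustRank, then store the winner under `MV:{Subject}:{Predicate}`. The scoring rule here, the sum of `weight * trust(agent)` with unknown agents scored at zero, is an assumption for illustration. The roadmap does not define how votes and TrustRank combine. `Ballot`, `winner`, and `materialize` are hypothetical names, and a `HashMap` stands in for the KV store.

```rust
use std::collections::HashMap;

/// Hypothetical minimal vote: assertion hash, voting agent, vote weight.
pub struct Ballot {
    pub assertion: String,
    pub agent: String,
    pub weight: f64,
}

/// Scores each assertion as the sum of `weight * trust[agent]` (agents with no
/// TrustRank score count as 0) and returns the top-scoring assertion hash.
/// Equal scores are broken by the lexicographically smaller hash.
pub fn winner(ballots: &[Ballot], trust: &HashMap<String, f64>) -> Option<String> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for b in ballots {
        let t = trust.get(&b.agent).copied().unwrap_or(0.0);
        *scores.entry(b.assertion.as_str()).or_insert(0.0) += b.weight * t;
    }
    scores
        .into_iter()
        .max_by(|(ha, sa), (hb, sb)| sa.total_cmp(sb).then_with(|| hb.cmp(ha)))
        .map(|(hash, _)| hash.to_string())
}

/// Writes the winner into the materialized view so a read is one key lookup.
pub fn materialize(
    mv: &mut HashMap<String, String>,
    subject: &str,
    predicate: &str,
    ballots: &[Ballot],
    trust: &HashMap<String, f64>,
) {
    if let Some(w) = winner(ballots, trust) {
        mv.insert(format!("MV:{}:{}", subject, predicate), w);
    }
}
```

Because the winner is computed in the background, `GET /query` only needs a single lookup of the `MV:` key. That is where the O(1) read cost comes from.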
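The Meter's rejection rule can be sketched as a per-job budget check. Everything here is an assumption:

- Cost and Value of Information are abstract integer units (for example, tokens).
- A request is rejected when the job's cumulative cost would exceed its Value of Information.
- Rejected requests are not charged.
- `Meter`, `charge`, `spent`, and `Decision` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum Decision {
    Accept,
    /// `cost` is the cumulative cost the request would have reached.
    Reject { cost: u64, value: u64 },
}

/// Hypothetical per-job meter that accumulates cost across requests.
#[derive(Default)]
pub struct Meter {
    spent: HashMap<String, u64>,
}

impl Meter {
    /// Charges `cost` to `job` unless the job's total would exceed its value.
    pub fn charge(&mut self, job: &str, cost: u64, value_of_information: u64) -> Decision {
        let spent = self.spent.entry(job.to_string()).or_insert(0);
        let total = spent.saturating_add(cost);
        if total > value_of_information {
            return Decision::Reject { cost: total, value: value_of_information };
        }
        *spent = total;
        Decision::Accept
    }

    /// Total cost accepted so far for `job`.
    pub fn spent(&self, job: &str) -> u64 {
        self.spent.get(job).copied().unwrap_or(0)
    }
}
```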
Phase 3: The Cortex (Reasoning)
Goal: Enable semantic search and "What If" scenarios.
- Sparse Merkle Backend: Implement SMT for O(1) branch creation.
- Branching Core: Implement Overlay Graph logic.
- Vector Storage: Integrate `hnsw-rs` or `lance`.
- Semantic Search: Implement k-NN query support.
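The Branching Core's overlay idea can be sketched with reference-counted layers. A branch holds only its own writes and deletions and falls through to its parent for everything else, so forking copies a pointer rather than data. This is a plain overlay map, not a sparse Merkle tree; an SMT would additionally give each branch a root hash over its contents. `Branch`, `fork`, and the tombstone encoding are hypothetical.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// One layer of an overlay graph: local edits over an immutable parent.
pub struct Branch {
    parent: Option<Rc<Branch>>,
    /// `Some(value)` = written on this branch; `None` = deleted on this branch.
    local: HashMap<String, Option<String>>,
}

impl Branch {
    /// An empty root layer (the "main" timeline).
    pub fn new() -> Branch {
        Branch { parent: None, local: HashMap::new() }
    }

    /// O(1) fork: the child shares the parent through an `Rc` instead of copying it.
    pub fn fork(parent: &Rc<Branch>) -> Branch {
        Branch { parent: Some(Rc::clone(parent)), local: HashMap::new() }
    }

    pub fn put(&mut self, key: &str, value: &str) {
        self.local.insert(key.to_string(), Some(value.to_string()));
    }

    /// Records a tombstone so the parent's value is hidden on this branch only.
    pub fn delete(&mut self, key: &str) {
        self.local.insert(key.to_string(), None);
    }

    /// Checks this layer first, then falls through to ancestors.
    pub fn get(&self, key: &str) -> Option<String> {
        match self.local.get(key) {
            Some(edit) => edit.clone(),
            None => self.parent.as_ref().and_then(|p| p.get(key)),
        }
    }
}
```

Only branch creation is O(1) here. A lookup costs one map probe per layer, so read cost grows with branch depth.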
Phase 4: The Hive (Trust & Scale)
Goal: Turn the database into a training engine.
- The Dojo: Training Data Pipeline.
  - Post-Mortem Exporter: Query `Lens::Skeptic` failures -> Negative Samples.
  - Golden Path Generator: Merge events -> Positive Samples.
- TrustRank Engine: Background "Gardener" process.
  - Implement Back-Propagation logic for agent reputation.
- Confidence Half-Life: Implement decay calculation engine.
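The roadmap names the Confidence Half-Life engine but not its formula. Assuming it means exponential decay, confidence after age `t` with half-life `h` is `c(t) = c0 * 2^(-t/h)`: halved after one half-life, quartered after two. Below is a minimal sketch of that formula. `decayed_confidence` and the use of seconds as the time unit are hypothetical.

```rust
/// Exponential decay: `c(t) = c0 * 2^(-t / half_life)`.
/// Times are in seconds (an assumption); `half_life_secs` must be positive.
pub fn decayed_confidence(c0: f64, age_secs: f64, half_life_secs: f64) -> f64 {
    assert!(half_life_secs > 0.0, "half-life must be positive");
    c0 * (-age_secs / half_life_secs).exp2()
}
```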
🚦 Tracking
Active Tasks
- Phase 1 Complete! Ready to start Phase 2 (The Lattice).
Blockers
- None.