stemedb/roadmap.md
jordan 3cfaa1e1d3 feat: Complete Phase 1 (The Spine) - storage foundation
Phase 1 delivers the complete durability and storage layer:

- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
  fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
  aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
  and multi-cycle durability

New crates: stemedb-wal, stemedb-storage, stemedb-ingest

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 14:15:34 -07:00

# Episteme (StemeDB) Roadmap
> **Goal:** Build the "Git for Truth" substrate for autonomous AI research.
> **Current Phase:** Phase 1 (The Spine) complete; Phase 2 (The Lattice) is next.
---
## 📅 High-Level Timeline
| Phase | Codename | Focus | Key Deliverable |
| :--- | :--- | :--- | :--- |
| **1** | **The Spine** | Storage & Safety | Append-only WAL + KV Store |
| **2** | **The Lattice** | Indexing & Async | Materialized Views + Ballot Box |
| **3** | **The Cortex** | Branching & Vectors | SMT Backend + Semantic Search |
| **4** | **The Hive** | Trust & Learning | Dojo + TrustRank |
---
## 🛠 Detailed Milestones
### Phase 1: The Spine (Foundation)
*Goal: Securely ingest assertions and persist them without data loss.*
- [x] **Project Scaffold**: Initialize Rust workspace, set up linting/CI (clippy, fmt).
- [x] **Assertion Schema**: Define the `Assertion` struct with `rkyv` serialization.
    - [x] Add dependencies: `rkyv`, `blake3`, `ed25519-dalek`, `image_hasher`.
    - [x] Define `Assertion` struct (Subject, Predicate, Object, Confidence, SourceHash).
    - [x] **Multi-Sig Expansion**: Implement `SignatureEntry` struct and `signatures: Vec<SignatureEntry>` field.
    - [x] **Visual Expansion**: Add `visual_hash: Option<pHash>` field for image provenance.
    - [x] Test serialization round-trips.
- [x] **Ballot Schema**: Define the `Vote` struct for multi-agent consensus.
    - [x] Add `Vote` struct: `assertion_hash`, `agent_id`, `weight`, `signature`.
    - [x] Test serialization round-trips.
- [x] **Paradigm Schema (Epochs)**: Define the `Epoch` and `SupersessionType` structs.
    - [x] Add `epoch: Option<EpochId>` to `Assertion`.
    - [x] Implement `Epoch` struct with `supersedes` and `SupersessionType`.
    - [x] Test serialization round-trips.
- [x] **WAL Integration**: Implement the Quarantine Pattern for write-ahead logging.
    - [x] Create `stemedb-wal` crate.
    - [x] Port `FsyncGuard` and `Record` logic from established durability patterns.
    - [x] Implement Record format with BLAKE3 checksums and Headers.
    - [x] Verify `fsync` behavior with tests.
- [x] **Storage Engine**: Implement the `KVStore` trait using `sled` (embedded KV).
    - [x] Add `sled` dependency.
    - [x] Define `KVStore` trait (put, get, delete, scan_prefix, flush).
    - [x] Implement `SledStore` wrapper.
- [x] **Basic Ingestor**: Background worker that tails WAL and writes to KV.
    - [x] Implement async loop reading from WAL.
    - [x] Write deserialized assertions, votes, and epochs to `sled`.
    - [x] Content-addressed keys using BLAKE3 hash (`H:{hash}`, `V:{hash}`, `E:{hash}`).
    - [x] Subject adjacency index (`S:{subject}`).
- [x] **Verification**: Write tests proving crash recovery (write -> crash -> restart -> read).
    - [x] WAL-level recovery tests (6 tests in `stemedb-wal/src/recovery.rs`).
    - [x] Full pipeline recovery tests (4 tests in `stemedb-ingest/src/worker.rs`).
    - [x] Bug fix: Journal now seeks to end after reopening existing WAL file.
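
To make the Storage Engine and content-addressed key items concrete, here is a minimal sketch of what a `KVStore` trait with `put`, `get`, `delete`, `scan_prefix`, and `flush` might look like. Only the method names come from the checklist above; the signatures, the `MemStore` type (an in-memory stand-in for `SledStore`), and the `content_key` helper are assumptions made for illustration, and the hash is passed in as a ready-made hex string rather than computed with BLAKE3.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of the `KVStore` trait. The method names are taken from
/// the roadmap; the signatures are assumptions.
pub trait KVStore {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    /// Returns true if the key existed.
    fn delete(&mut self, key: &[u8]) -> bool;
    /// Returns all entries whose key starts with `prefix`, in key order.
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
    fn flush(&mut self);
}

/// In-memory stand-in for `SledStore` (not the real implementation).
#[derive(Default)]
pub struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn delete(&mut self, key: &[u8]) -> bool {
        self.map.remove(key).is_some()
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Keys are ordered, so start at the prefix and stop at the first
        // key that no longer matches it.
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }

    fn flush(&mut self) {
        // Nothing to persist for an in-memory map.
    }
}

/// Builds a key in the `H:{hash}` / `V:{hash}` / `E:{hash}` pattern.
/// `hash_hex` is assumed to be a hex-encoded BLAKE3 digest computed elsewhere.
pub fn content_key(kind: char, hash_hex: &str) -> Vec<u8> {
    format!("{kind}:{hash_hex}").into_bytes()
}
```

With keys laid out this way, `scan_prefix(b"H:")` walks every stored assertion without touching votes or epochs, which is the kind of range query the prefix scheme is meant to support.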
### Phase 2: The Lattice (Connectivity)
*Goal: Query data with sub-millisecond latency using Materialized Views.*
- [ ] **The Ballot Box**: Implement high-velocity vote ingestion.
    - [ ] `VoteStore` trait and implementation.
- [ ] **Materializer**: Background worker for O(1) Read Performance.
    - [ ] Aggregates Votes + TrustRank.
    - [ ] Updates `MV:{Subject}:{Predicate}` with the winning Assertion.
- [ ] **The Meter**: Implement Economic Throttling (TAN).
    - [ ] Middleware to track Token/Compute cost per Job.
    - [ ] Reject requests exceeding `Value of Information`.
- [ ] **Agent Wallet**: Key management sidecar.
    - [ ] Securely hold private keys.
    - [ ] Auto-sign outgoing Assertions/Votes.
- [ ] **API Surface**: `axum` HTTP server.
    - [ ] `POST /assert` -> Accepts JSON, writes to WAL, returns `JobID`.
    - [ ] `POST /vote` -> High-throughput endpoint.
    - [ ] `GET /query` -> Accepts Subject/Predicate/Lens, returns resolved Assertion.
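
One possible reading of the Materializer item is sketched below: sum each assertion's votes, scale every vote by its agent's trust score, and store the winner under an `MV:{Subject}:{Predicate}` key. The scoring rule (`weight * trust`), the default trust of `1.0` for unknown agents, the tie-break, and the function names are all assumptions; the roadmap does not specify them. The `signature` field of `Vote` is omitted here on the assumption that signatures are verified before aggregation.

```rust
use std::collections::HashMap;

/// Simplified `Vote`; field names follow the roadmap, types are assumptions.
pub struct Vote {
    pub assertion_hash: String,
    pub agent_id: String,
    pub weight: f64,
}

/// Returns the assertion hash with the highest trust-weighted score,
/// or `None` if there are no votes.
pub fn winning_assertion(votes: &[Vote], trust: &HashMap<String, f64>) -> Option<String> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for vote in votes {
        // Assumption: agents without a TrustRank entry count at face value.
        let t = trust.get(&vote.agent_id).copied().unwrap_or(1.0);
        *scores.entry(vote.assertion_hash.as_str()).or_insert(0.0) += vote.weight * t;
    }
    scores
        .into_iter()
        // Highest score wins; ties go to the lexicographically smaller hash
        // so the result does not depend on map iteration order.
        .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(hash, _)| hash.to_string())
}

/// Builds the materialized-view key for a subject/predicate pair.
pub fn mv_key(subject: &str, predicate: &str) -> String {
    format!("MV:{subject}:{predicate}")
}
```

Precomputing the winner at write time is what would let `GET /query` resolve with a single key lookup instead of re-aggregating votes on every read.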
### Phase 3: The Cortex (Reasoning)
*Goal: Enable semantic search and "What If" scenarios.*
- [ ] **Sparse Merkle Backend**: Implement a Sparse Merkle Tree (SMT) for O(1) branch creation.
- [ ] **Branching Core**: Implement Overlay Graph logic.
- [ ] **Vector Storage**: Integrate `hnsw-rs` or `lance`.
- [ ] **Semantic Search**: Implement k-NN query support.
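
As a reference point for the Semantic Search item, the following sketch shows exact k-NN by brute force over cosine similarity. It does not use `hnsw-rs` or `lance`; an approximate index would be expected to return roughly the same neighbours faster. The function names and the `(id, vector)` representation are hypothetical.

```rust
/// Cosine similarity between two vectors; returns 0.0 if either has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the `k` items most similar to `query`, best match first.
/// Scans every item, so the cost grows linearly with the collection size.
pub fn knn(query: &[f32], items: &[(String, Vec<f32>)], k: usize) -> Vec<(String, f32)> {
    let mut scored: Vec<(String, f32)> = items
        .iter()
        .map(|(id, vector)| (id.clone(), cosine(query, vector)))
        .collect();
    // Sort by similarity, highest first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A brute-force scan like this is also a convenient oracle for testing whichever approximate index is eventually integrated.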
### Phase 4: The Hive (Trust & Scale)
*Goal: Turn the database into a training engine.*
- [ ] **The Dojo**: Training Data Pipeline.
    - [ ] **Post-Mortem Exporter**: Query `Lens::Skeptic` failures -> Negative Samples.
    - [ ] **Golden Path Generator**: Merge events -> Positive Samples.
- [ ] **TrustRank Engine**: Background "Gardener" process.
    - [ ] Implement Back-Propagation logic for agent reputation.
- [ ] **Confidence Half-Life**: Implement decay calculation engine.
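
The roadmap does not define the decay function for Confidence Half-Life. A common choice, assumed here, is exponential decay: confidence halves every `half_life_secs`, so `c(t) = c0 * 0.5^(t / half_life)`. The function name and the use of seconds are hypothetical.

```rust
/// Exponentially decays `initial` confidence so that it halves every
/// `half_life_secs`. This formula is an assumption, not the project's
/// specified model.
pub fn decayed_confidence(initial: f64, age_secs: f64, half_life_secs: f64) -> f64 {
    assert!(half_life_secs > 0.0, "half-life must be positive");
    initial * 0.5f64.powf(age_secs / half_life_secs)
}
```

Because the decayed value depends only on the stored confidence and the assertion's age, it could be computed at read time rather than rewritten in storage, though either design fits the checklist item.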
---
## 🚦 Tracking
### Active Tasks
* **Phase 1 Complete!** Ready to start Phase 2 (The Lattice).
### Blockers
* None.