# Episteme (StemeDB) Roadmap

> **Goal:** Build the "Git for Truth" substrate for autonomous AI research.
> **Current Phase:** Phase 1 (The Spine)

---

## 📅 High-Level Timeline

| Phase | Codename | Focus | Key Deliverable |
| :--- | :--- | :--- | :--- |
| **1** | **The Spine** | Storage & Safety | Append-only WAL + KV Store |
| **2** | **The Lattice** | Indexing & Async | Materialized Views + Ballot Box |
| **3** | **The Cortex** | Branching & Vectors | SMT Backend + Semantic Search |
| **4** | **The Hive** | Trust & Learning | Dojo + TrustRank |

---

## 🛠 Detailed Milestones

### Phase 1: The Spine (Foundation)

*Goal: Securely ingest assertions and persist them without data loss.*

- [x] **Project Scaffold**: Initialize Rust workspace, set up linting/CI (clippy, fmt).
- [x] **Assertion Schema**: Define the `Assertion` struct with `rkyv` serialization.
  - [x] Add dependencies: `rkyv`, `blake3`, `ed25519-dalek`, `image_hasher`.
  - [x] Define `Assertion` struct (Subject, Predicate, Object, Confidence, SourceHash).
  - [x] **Multi-Sig Expansion**: Implement `SignatureEntry` struct and `signatures: Vec` field.
  - [x] **Visual Expansion**: Add `visual_hash: Option` field for image provenance.
  - [x] Test serialization round-trips.
- [x] **Ballot Schema**: Define the `Vote` struct for multi-agent consensus.
  - [x] Add `Vote` struct: `assertion_hash`, `agent_id`, `weight`, `signature`.
  - [x] Test serialization round-trips.
- [x] **Paradigm Schema (Epochs)**: Define the `Epoch` and `SupersessionType` structs.
  - [x] Add `epoch: Option` to `Assertion`.
  - [x] Implement `Epoch` struct with `supersedes` and `SupersessionType`.
  - [x] Test serialization round-trips.
- [x] **WAL Integration**: Implement the Quarantine Pattern for write-ahead logging.
  - [x] Create `stemedb-wal` crate.
  - [x] Port `FsyncGuard` and `Record` logic from established durability patterns.
  - [x] Implement Record format with BLAKE3 checksums and Headers.
  - [x] Verify `fsync` behavior with tests.
- [x] **Storage Engine**: Implement the `Store` trait using `sled` (embedded KV).
  - [x] Add `sled` dependency.
  - [x] Define `KVStore` trait (put, get, delete, scan_prefix, flush).
  - [x] Implement `SledStore` wrapper.
- [x] **Basic Ingestor**: Background worker that tails WAL and writes to KV.
  - [x] Implement async loop reading from WAL.
  - [x] Write deserialized assertions, votes, and epochs to `sled`.
  - [x] Content-addressed keys using BLAKE3 hash (`H:{hash}`, `V:{hash}`, `E:{hash}`).
  - [x] Subject adjacency index (`S:{subject}`).
- [x] **Verification**: Write tests proving crash recovery (write -> crash -> restart -> read).
  - [x] WAL-level recovery tests (6 tests in `stemedb-wal/src/recovery.rs`).
  - [x] Full pipeline recovery tests (4 tests in `stemedb-ingest/src/worker.rs`).
  - [x] Bug fix: Journal now seeks to end after reopening existing WAL file.

### Phase 2: The Lattice (Connectivity)

*Goal: Query data with sub-millisecond latency using Materialized Views.*

- [ ] **The Ballot Box**: Implement high-velocity vote ingestion.
  - [ ] `VoteStore` trait and implementation.
- [ ] **Materializer**: Background worker for O(1) Read Performance.
  - [ ] Aggregates Votes + TrustRank.
  - [ ] Updates `MV:{Subject}:{Predicate}` with the winning Assertion.
- [ ] **The Meter**: Implement Economic Throttling (TAN).
  - [ ] Middleware to track Token/Compute cost per Job.
  - [ ] Reject requests exceeding `Value of Information`.
- [ ] **Agent Wallet**: Key management sidecar.
  - [ ] Securely hold private keys.
  - [ ] Auto-sign outgoing Assertions/Votes.
- [ ] **API Surface**: `axum` HTTP server.
  - [ ] `POST /assert` -> Accepts JSON, writes to WAL, returns `JobID`.
  - [ ] `POST /vote` -> High-throughput endpoint.
  - [ ] `GET /query` -> Accepts Subject/Predicate/Lens, returns resolved Assertion.

### Phase 3: The Cortex (Reasoning)

*Goal: Enable semantic search and "What If" scenarios.*

- [ ] **Sparse Merkle Backend**: Implement SMT for O(1) branch creation.
- [ ] **Branching Core**: Implement Overlay Graph logic.
- [ ] **Vector Storage**: Integrate `hnsw-rs` or `lance`.
- [ ] **Semantic Search**: Implement k-NN query support.

### Phase 4: The Hive (Trust & Scale)

*Goal: Turn the database into a training engine.*

- [ ] **The Dojo**: Training Data Pipeline.
  - [ ] **Post-Mortem Exporter**: Query `Lens::Skeptic` failures -> Negative Samples.
  - [ ] **Golden Path Generator**: Merge events -> Positive Samples.
- [ ] **TrustRank Engine**: Background "Gardener" process.
  - [ ] Implement Back-Propagation logic for agent reputation.
- [ ] **Confidence Half-Life**: Implement decay calculation engine.

---

## 🚦 Tracking

### Active Tasks

* **Phase 1 Complete!** Ready to start Phase 2 (The Lattice).

### Blockers

* None.
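
---

## 🧩 Sketch: Phase 1 Storage Layout

The Phase 1 storage items name a `KVStore` trait with five operations (put, get, delete, scan_prefix, flush) and a prefixed key layout (`H:{hash}`, `V:{hash}`, `E:{hash}`, `S:{subject}`). The following is a minimal sketch of how those pieces might fit together, not the project's actual code. Everything beyond the operation names and key prefixes is an assumption: the method signatures, the `StoreError` type, the `MemStore` in-memory stand-in (used here in place of `SledStore` so the example needs only the standard library), and the key helper functions. Hashes are passed in as hex strings; computing them with BLAKE3 is left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical error type; the roadmap does not specify one.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    Backend(String),
}

/// The five operations the roadmap lists for `KVStore`.
/// The signatures are assumptions made for this sketch.
pub trait KVStore {
    fn put(&mut self, key: &[u8], value: &[u8]) -> Result<(), StoreError>;
    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StoreError>;
    fn delete(&mut self, key: &[u8]) -> Result<(), StoreError>;
    fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>, StoreError>;
    fn flush(&mut self) -> Result<(), StoreError>;
}

/// In-memory stand-in for `SledStore` (hypothetical). Keys are kept in
/// sorted order, so a prefix scan is a bounded range walk.
#[derive(Default)]
pub struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) -> Result<(), StoreError> {
        self.map.insert(key.to_vec(), value.to_vec());
        Ok(())
    }

    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StoreError> {
        Ok(self.map.get(key).cloned())
    }

    fn delete(&mut self, key: &[u8]) -> Result<(), StoreError> {
        self.map.remove(key);
        Ok(())
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Result<Vec<(Vec<u8>, Vec<u8>)>, StoreError> {
        // Start at the first key >= prefix and stop once keys no longer match.
        Ok(self
            .map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect())
    }

    fn flush(&mut self) -> Result<(), StoreError> {
        // Nothing to persist for an in-memory map.
        Ok(())
    }
}

/// Content-addressed assertion key: `H:{hash}`.
pub fn assertion_key(hash_hex: &str) -> Vec<u8> {
    format!("H:{}", hash_hex).into_bytes()
}

/// Content-addressed vote key: `V:{hash}`.
pub fn vote_key(hash_hex: &str) -> Vec<u8> {
    format!("V:{}", hash_hex).into_bytes()
}

/// Content-addressed epoch key: `E:{hash}`.
pub fn epoch_key(hash_hex: &str) -> Vec<u8> {
    format!("E:{}", hash_hex).into_bytes()
}

/// Subject adjacency index key: `S:{subject}`.
pub fn subject_key(subject: &str) -> Vec<u8> {
    format!("S:{}", subject).into_bytes()
}
```

With this layout, writing a record under `assertion_key(...)` and then calling `scan_prefix(b"H:")` would return only assertions, because votes and epochs live under their own prefixes. The value format stored under `S:{subject}` is not specified in the roadmap, so it is deliberately left out of the sketch.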