Phase 1 delivers the complete durability and storage layer:

- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
  fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
  aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
  and multi-cycle durability

New crates: stemedb-wal, stemedb-storage, stemedb-ingest

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
# Episteme (StemeDB) Roadmap

> **Goal:** Build the "Git for Truth" substrate for autonomous AI research.
> **Current Phase:** Phase 1 (The Spine)

---

## 📅 High-Level Timeline

| Phase | Codename | Focus | Key Deliverable |
| :--- | :--- | :--- | :--- |
| **1** | **The Spine** | Storage & Safety | Append-only WAL + KV Store |
| **2** | **The Lattice** | Indexing & Async | Materialized Views + Ballot Box |
| **3** | **The Cortex** | Branching & Vectors | SMT Backend + Semantic Search |
| **4** | **The Hive** | Trust & Learning | Dojo + TrustRank |

---

## 🛠 Detailed Milestones

### Phase 1: The Spine (Foundation)

*Goal: Securely ingest assertions and persist them without data loss.*

- [x] **Project Scaffold**: Initialize Rust workspace, set up linting/CI (clippy, fmt).
- [x] **Assertion Schema**: Define the `Assertion` struct with `rkyv` serialization.
  - [x] Add dependencies: `rkyv`, `blake3`, `ed25519-dalek`, `image_hasher`.
  - [x] Define `Assertion` struct (Subject, Predicate, Object, Confidence, SourceHash).
  - [x] **Multi-Sig Expansion**: Implement `SignatureEntry` struct and `signatures: Vec<SignatureEntry>` field.
  - [x] **Visual Expansion**: Add `visual_hash: Option<pHash>` field for image provenance.
  - [x] Test serialization round-trips.
- [x] **Ballot Schema**: Define the `Vote` struct for multi-agent consensus.
  - [x] Add `Vote` struct: `assertion_hash`, `agent_id`, `weight`, `signature`.
  - [x] Test serialization round-trips.
- [x] **Paradigm Schema (Epochs)**: Define the `Epoch` and `SupersessionType` structs.
  - [x] Add `epoch: Option<EpochId>` to `Assertion`.
  - [x] Implement `Epoch` struct with `supersedes` and `SupersessionType`.
  - [x] Test serialization round-trips.
- [x] **WAL Integration**: Implement the Quarantine Pattern for write-ahead logging.
  - [x] Create `stemedb-wal` crate.
  - [x] Port `FsyncGuard` and `Record` logic from established durability patterns.
  - [x] Implement Record format with BLAKE3 checksums and Headers.
  - [x] Verify `fsync` behavior with tests.
- [x] **Storage Engine**: Implement the `Store` trait using `sled` (embedded KV).
  - [x] Add `sled` dependency.
  - [x] Define `KVStore` trait (put, get, delete, scan_prefix, flush).
  - [x] Implement `SledStore` wrapper.
- [x] **Basic Ingestor**: Background worker that tails WAL and writes to KV.
  - [x] Implement async loop reading from WAL.
  - [x] Write deserialized assertions, votes, and epochs to `sled`.
  - [x] Content-addressed keys using BLAKE3 hash (`H:{hash}`, `V:{hash}`, `E:{hash}`).
  - [x] Subject adjacency index (`S:{subject}`).
- [x] **Verification**: Write tests proving crash recovery (write -> crash -> restart -> read).
  - [x] WAL-level recovery tests (6 tests in `stemedb-wal/src/recovery.rs`).
  - [x] Full pipeline recovery tests (4 tests in `stemedb-ingest/src/worker.rs`).
  - [x] Bug fix: Journal now seeks to end after reopening existing WAL file.

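
The WAL items in the list above (checksummed records with headers, and recovery after a crash) can be sketched roughly as follows. This is a minimal, std-only illustration and not the `stemedb-wal` implementation: the header layout, the names `encode_record` and `recover`, and the FNV-1a checksum (a stand-in for BLAKE3 so that the example needs no external crates) are all assumptions. It works on an in-memory buffer, so `fsync` is not shown.

```rust
/// Hypothetical record layout, chosen for this sketch only:
/// [payload_len: u32 LE][reserved: u32][checksum: u64 LE][payload][zero padding]
const HEADER_LEN: usize = 16;

/// FNV-1a, used here only as a std-only stand-in for BLAKE3.
fn checksum(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Round `n` up to the next multiple of 8.
fn align8(n: usize) -> usize {
    (n + 7) & !7
}

/// Append one framed record to an in-memory log.
fn encode_record(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&0u32.to_le_bytes()); // reserved
    log.extend_from_slice(&checksum(payload).to_le_bytes());
    log.extend_from_slice(payload);
    let padded = align8(log.len());
    log.resize(padded, 0); // pad so the next header starts 8-byte aligned
}

fn read_u32(bytes: &[u8]) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[..4]);
    u32::from_le_bytes(buf)
}

fn read_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(buf)
}

/// Scan from the start and return the valid payloads plus the offset at which
/// appending may resume. Anything past that offset is treated as a torn tail.
fn recover(log: &[u8]) -> (Vec<&[u8]>, usize) {
    let mut records = Vec::new();
    let mut offset = 0;
    while offset + HEADER_LEN <= log.len() {
        let len = read_u32(&log[offset..]) as usize;
        let expected = read_u64(&log[offset + 8..]);
        let start = offset + HEADER_LEN;
        if len > log.len() - start {
            break; // payload was only partially written
        }
        let end = start + len;
        let next = align8(end);
        if next > log.len() || checksum(&log[start..end]) != expected {
            break; // padding missing or payload corrupted
        }
        records.push(&log[start..end]);
        offset = next;
    }
    (records, offset)
}
```

In a file-backed version of this sketch, the writer would truncate the file to the returned offset and continue appending from there, so a torn record is never followed by valid data.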
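
The `KVStore` operations and the content-addressed key patterns from the Phase 1 list can be illustrated with a small in-memory stand-in. This is a sketch under assumptions rather than the `stemedb-storage` code: `MemStore`, `content_key`, and the method signatures are hypothetical, a `BTreeMap` replaces `sled`, the hex encoding of the hash is a guess, and the 32-byte digest is passed in rather than computed with BLAKE3 so that the example needs only the standard library.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of the storage trait. The real signatures may differ
/// (for example, they would likely return `Result`).
trait KVStore {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn delete(&mut self, key: &[u8]);
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
    fn flush(&mut self);
}

/// In-memory stand-in for a sled-backed store.
#[derive(Default)]
struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn delete(&mut self, key: &[u8]) {
        self.map.remove(key);
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Keys are ordered, so all matches form one contiguous run
        // that begins at the first key >= `prefix`.
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }

    fn flush(&mut self) {
        // Nothing to persist for an in-memory map.
    }
}

/// Build a key such as `H:{hash}` from a one-letter tag and a 32-byte digest.
fn content_key(tag: char, digest: &[u8; 32]) -> String {
    let hex: String = digest.iter().map(|b| format!("{:02x}", b)).collect();
    format!("{}:{}", tag, hex)
}
```

With keys laid out this way, `scan_prefix(b"H:")` returns only the `H:` entries, because the `V:` and `E:` keys sort into separate ranges.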
### Phase 2: The Lattice (Connectivity)

*Goal: Query data with sub-millisecond latency using Materialized Views.*

- [ ] **The Ballot Box**: Implement high-velocity vote ingestion.
  - [ ] `VoteStore` trait and implementation.
- [ ] **Materializer**: Background worker for O(1) Read Performance.
  - [ ] Aggregates Votes + TrustRank.
  - [ ] Updates `MV:{Subject}:{Predicate}` with the winning Assertion.
- [ ] **The Meter**: Implement Economic Throttling (TAN).
  - [ ] Middleware to track Token/Compute cost per Job.
  - [ ] Reject requests exceeding `Value of Information`.
- [ ] **Agent Wallet**: Key management sidecar.
  - [ ] Securely hold private keys.
  - [ ] Auto-sign outgoing Assertions/Votes.
- [ ] **API Surface**: `axum` HTTP server.
  - [ ] `POST /assert` -> Accepts JSON, writes to WAL, returns `JobID`.
  - [ ] `POST /vote` -> High-throughput endpoint.
  - [ ] `GET /query` -> Accepts Subject/Predicate/Lens, returns resolved Assertion.

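
The Materializer item in the Phase 2 list is still unimplemented, so the following is only one possible reading of "aggregates Votes + TrustRank" and "updates `MV:{Subject}:{Predicate}`". The scoring rule (sum of vote weight times the voter's trust score), the simplified `Vote` type, and the names `mv_key` and `resolve_winner` are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical vote. The roadmap's `Vote` also carries a signature.
struct Vote {
    assertion_hash: String,
    agent_id: String,
    weight: f64,
}

/// Key under which the winning assertion for a (subject, predicate) pair is stored.
fn mv_key(subject: &str, predicate: &str) -> String {
    format!("MV:{}:{}", subject, predicate)
}

/// Pick the assertion with the highest trust-weighted vote total.
/// Agents missing from `trust` contribute nothing (an assumption of this sketch).
/// Ties are broken by comparing hashes so the result is deterministic.
fn resolve_winner(votes: &[Vote], trust: &HashMap<String, f64>) -> Option<String> {
    let mut totals: HashMap<&str, f64> = HashMap::new();
    for vote in votes {
        let rank = trust.get(&vote.agent_id).copied().unwrap_or(0.0);
        *totals.entry(vote.assertion_hash.as_str()).or_insert(0.0) += vote.weight * rank;
    }
    totals
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| a.0.cmp(b.0)))
        .map(|(hash, _)| hash.to_string())
}
```

A background worker could store the returned hash under `mv_key(subject, predicate)`, so that a query becomes a single key lookup instead of a vote aggregation at read time.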
### Phase 3: The Cortex (Reasoning)

*Goal: Enable semantic search and "What If" scenarios.*

- [ ] **Sparse Merkle Backend**: Implement SMT for O(1) branch creation.
- [ ] **Branching Core**: Implement Overlay Graph logic.
- [ ] **Vector Storage**: Integrate `hnsw-rs` or `lance`.
- [ ] **Semantic Search**: Implement k-NN query support.

### Phase 4: The Hive (Trust & Scale)

*Goal: Turn the database into a training engine.*

- [ ] **The Dojo**: Training Data Pipeline.
  - [ ] **Post-Mortem Exporter**: Query `Lens::Skeptic` failures -> Negative Samples.
  - [ ] **Golden Path Generator**: Merge events -> Positive Samples.
- [ ] **TrustRank Engine**: Background "Gardener" process.
  - [ ] Implement Back-Propagation logic for agent reputation.
- [ ] **Confidence Half-Life**: Implement decay calculation engine.

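
The Confidence Half-Life item does not specify a formula. Assuming the usual meaning of a half-life, one plausible form is exponential decay, `confidence(t) = initial * 0.5^(t / half_life)`. The function name, parameter names, and the choice of seconds as the unit below are hypothetical.

```rust
/// Exponential decay: the confidence halves every `half_life_secs` seconds.
/// This formula is an assumption of the sketch, not a documented StemeDB rule.
fn decayed_confidence(initial: f64, age_secs: f64, half_life_secs: f64) -> f64 {
    initial * 0.5_f64.powf(age_secs / half_life_secs)
}
```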
---

## 🚦 Tracking

### Active Tasks

* **Phase 1 Complete!** Ready to start Phase 2 (The Lattice).

### Blockers

* None.