# Episteme (StemeDB) Architecture

> **Design Philosophy:** Immutable History, Probabilistic Resolution.
> **Status:** Draft Spec v0.1

## 1. System Overview

Episteme is a **Log-Structured, Content-Addressed Knowledge Graph**.
Unlike traditional databases that mutate state in place, Episteme appends **Assertions** to an immutable ledger (Merkle DAG). State is resolved at read time via **Lenses**.

### High-Level Data Flow

```ascii
[Writer Agent]      [Reader Agent]
      │                   ▲
      │ (1) Sign &        │ (5) Deterministic Answer
      │     Propose       │     (Confidence: 0.92)
      ▼                   │
┌────────────┐      ┌────────────┐
│ Ingestion  │      │ Resolution │
│  Gateway   │      │   Engine   │
└─────┬──────┘      └─────┬──────┘
      │ (2) Append        │ (4) Apply Lens (Filter/Rank)
      │     to WAL        │
      ▼                   │
┌────────────┐      ┌────────────┐
│ Quarantine │      │  Indexing  │
│  Journal   │──────►  Service   │
└────────────┘ (3)  └────────────┘
 (Durability)       (Graph/Vector)
```

---

## 2. Core Data Structures

### 2.1. The Atomic Unit: `Assertion`

Everything in Episteme is an Assertion. There are no "Tables."

```rust
// The immutable payload (content-addressed by hash)
struct Assertion {
    // 1. The Triple (the fact)
    pub subject: EntityId,     // "Tesla_Inc"
    pub predicate: RelationId, // "has_revenue"
    pub object: Value,         // Variant: Float(10.5B), String("Musk"), Ref(EntityId)

    // 2. The Lineage (the chain)
    pub parent_hash: Option<Hash>, // Set when modifying a previous claim (forking)
    pub source_hash: Hash,         // Evidence pointer (PDF/log hash)

    // 3. The Meta-Cognition (the weight)
    pub agent_id: PublicKey,      // Ed25519 public key of the signing agent
    pub confidence: f32,          // 0.0 - 1.0 (subjective certainty)
    pub timestamp: u64,           // Wall-clock time
    pub vector: Option<Vec<f32>>, // Semantic embedding (for fuzzy recall)
}
```

### 2.2. The Storage Layout (LSM Tree)

We use a key-value store (e.g., `sled` or `RocksDB`) to persist the DAG.

| Key | Value | Purpose |
| :--- | :--- | :--- |
| `H:{Hash}` | `Serialized<Assertion>` | Main content store |
| `S:{Subject}` | `List<Hash>` | Subject-to-Claims Index |
| `SP:{Subject}:{Predicate}` | `List<Hash>` | Exact Triple Index |
| `A:{AgentID}` | `ReputationScore` | TrustRank storage |

---

## 3. The Write Path (The Spine)

Episteme follows the **Quarantine Pattern** for durability.

1. **Receive:** Agent submits a signed `Assertion`.
2. **Verify:** Check signature validity and structure.
3. **Journal:** Write to `episteme-wal` (append-only file, fsync on every write).
4. **Acknowledge:** Return `202 Accepted` to the Agent with the new `Hash`.
5. **Index (Async):** A background worker tails the WAL and, for each entry:
    * Deserializes the Assertion.
    * Updates the `H:{Hash}` store.
    * Appends `Hash` to the `S:{Subject}` adjacency list and to the `SP:{Subject}:{Predicate}` list that the read path queries.
    * Updates the HNSW vector index (if a vector is present).

---

## 4. The Read Path (The Cortex)

Reading is where Episteme differs most from a conventional database: a read is a **compute operation**, not a lookup of stored state.

**Query:** `GET(Subject="Tesla", Predicate="Revenue", Lens="Consensus")`

1. **Gather:** Look up `SP:Tesla:Revenue` to get the list of candidate Hashes: `[H1, H2, H3, H4]`.
2. **Hydrate:** Fetch the full Assertion for each Hash.
3. **Resolve (The Lens):** Pass the candidates through the Lens pipeline.

### The Lens Pipeline (Rust Trait)

```rust
trait Lens {
    fn resolve(&self, candidates: Vec<Assertion>, context: Context) -> LensResult;
}

// Example: Consensus Lens Logic
// 1. Group candidates by Object value (clustering).
// 2. Sum the TrustRank of Agents in each cluster.
// 3. Return the cluster with highest weighted mass.
```

---

## 5. Advanced Mechanics

### 5.1. Forking Reality (Branching)

Branching is handled via **Overlay Graphs**.

* A `Branch` is simply a lightweight index (Map) of `Hash -> Assertion`.
* **Write to Branch:** Assertions are stored in the Branch's ephemeral index, not the Global DAG.
* **Read from Branch:** The Query Engine checks the Branch index *first*, then falls back to Global (Overlay pattern).
* **Merge:** Commit the Branch's unique assertions to the Global WAL.

### 5.2. TrustRank (Reputation)

A background worker (`episteme-gardener`) runs periodically and:

1. Identifies "Settled Facts" (Assertions with >99% consensus sustained over a time window T).
2. Rewards Agents who claimed these facts *early*.
3. Penalizes Agents who claimed the opposite.
4. Updates the `A:{AgentID}` reputation scores.

---

## 6. Implementation Roadmap

### Phase 1: The Skeleton (MVP)

* [ ] Reuse `quarantine-journal` pattern for WAL.
* [ ] Implement `Assertion` struct and serialization (`rkyv`).
* [ ] Basic `sled` storage backend.
* [ ] Single Lens: `Recency` (last-writer-wins logic).

### Phase 2: The Graph

* [ ] Implement `Subject -> Hash` indexing.
* [ ] Implement `Consensus` Lens (simple voting).
* [ ] Basic HTTP API (`POST /assert`, `GET /query`).

### Phase 3: The Cortex

* [ ] Branching support (Context/Session IDs).
* [ ] Vector search integration (`lanms` or `hnsw-rs`).
* [ ] TrustRank basics.

---

## 7. Technology Stack

* **Language:** Rust (2024 edition)
* **WAL:** `quarantine-journal` (local crate or pattern)
* **KV Store:** `sled` (embedded, pure Rust) or `rocksdb` binding.
* **Serialization:** `rkyv` (zero-copy deserialization).
* **API:** `axum` + `tower`.
* **Hashing:** `blake3` (fast, secure).
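
To make the Consensus Lens logic from Section 4 concrete, here is a minimal sketch using only the Rust standard library. It is an illustration under stated assumptions, not the spec's implementation. `Candidate`, `Resolved`, and `resolve_consensus` are hypothetical names, and `object` and `agent_id` are plain strings standing in for `Value` and `PublicKey`. The sketch also assumes that TrustRank arrives as a map from agent to score, and that the answer's confidence is the winning cluster's share of the total trust mass. It counts each agent once per cluster and breaks ties by object value so the result stays deterministic.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, trimmed-down view of an `Assertion`: only the fields this lens reads.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub object: String,   // stand-in for the spec's `Value`
    pub agent_id: String, // stand-in for the spec's `PublicKey`
}

/// Hypothetical result shape; the spec does not define `LensResult`.
#[derive(Debug, Clone, PartialEq)]
pub struct Resolved {
    pub object: String,
    pub confidence: f32, // assumed: winning cluster's share of total trust mass
}

/// Groups candidates by object, sums the TrustRank of the distinct agents in
/// each group, and returns the group with the highest total.
pub fn resolve_consensus(
    candidates: &[Candidate],
    trust: &HashMap<String, f32>,
) -> Option<Resolved> {
    let mut seen: HashSet<(&str, &str)> = HashSet::new();
    let mut mass: HashMap<&str, f32> = HashMap::new();

    for c in candidates {
        // Count each agent at most once per cluster, so repeating the same
        // claim does not inflate its weight.
        if !seen.insert((c.object.as_str(), c.agent_id.as_str())) {
            continue;
        }
        // Agents without a reputation entry contribute nothing (assumption).
        let rank = trust.get(&c.agent_id).copied().unwrap_or(0.0);
        *mass.entry(c.object.as_str()).or_insert(0.0) += rank;
    }

    let total: f32 = mass.values().sum();
    if total <= 0.0 {
        return None; // no candidates, or no trusted agent behind any of them
    }

    // Highest mass wins; on a tie the lexicographically smaller object wins,
    // which keeps the answer independent of HashMap iteration order.
    let (object, best) = mass
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| b.0.cmp(a.0)))?;

    Some(Resolved {
        object: object.to_string(),
        confidence: best / total,
    })
}
```

Under these assumptions, the cluster with the most accumulated trust wins even when a single more reputable agent disagrees, and an empty candidate list resolves to `None` instead of a guess. A real Lens would take hydrated `Assertion`s and a `Context`, as in the trait in Section 4.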