# Episteme (StemeDB) Architecture

> **Design Philosophy:** Immutable History, Probabilistic Resolution, Materialized Speed.
> **Status:** Draft Spec v1.0

## 1. System Overview

Episteme is a **Log-Structured, Content-Addressed Knowledge Graph**. Unlike traditional databases that mutate state in place, Episteme appends **Assertions** to an immutable ledger (Merkle DAG). State is resolved at read time through **Lenses**.

Resolving conflicts on every read costs O(N) in the number of candidate assertions and votes. To avoid that latency, Episteme maintains a **Materialized View** layer that precomputes the "Current Truth" for standard lenses.

### High-Level Data Flow

```ascii
[Writer Agent]          [Reader Agent]
      │                       ▲
      │ (1) Sign &            │ (6) Sub-millisecond Answer
      │     Propose           │     (Pre-computed)
      ▼                       │
┌────────────┐          ┌────────────┐
│ Ingestion  │          │ Resolution │
│  Gateway   │          │   Engine   │
└─────┬──────┘          └─────┬──────┘
      │ (2) Append            │ (5) Read Materialized View
      │     to Ballot         │     OR Apply Custom Lens
      ▼                       │
┌────────────┐          ┌────────────┐
│ Quarantine │          │  Indexing  │
│  Journal   │─────────►│  Service   │
└─────┬──────┘   (3)    └─────┬──────┘
      │                       │ (4) Compaction & Materialization
      ▼                       ▼
┌────────────┐          ┌────────────┐
│ Job Manager│          │Materialized│
└────────────┘          │   Views    │
 (TAN Meter)            └────────────┘
```

---

## 2. Core Data Structures

### 2.1. The Atomic Unit: `Assertion` (The Candidate)

Assertions are proposals of truth. They are immutable.

```rust
struct Assertion {
    pub subject: EntityId,
    pub predicate: RelationId,
    pub object: Value,
    pub epoch: Option<Epoch>, // `Epoch` is the struct named in the Phase 1 roadmap
    pub agent_id: PublicKey,  // The Proposer
    pub timestamp: u64,
    // ... lineage and vector fields ...
}
```

### 2.2. The Ballot Box: `Vote` (The High-Velocity Stream)

To prevent lock contention on Assertions, Agents write **Votes** to a separate high-velocity log.

```rust
struct Vote {
    pub assertion_hash: Hash, // What are we voting on?
    pub agent_id: PublicKey,  // Who is voting?
    pub weight: f32,          // 0.0 - 1.0 (Confidence)
    pub signature: Signature, // Cryptographic proof
    pub timestamp: u64,
}
```

### 2.3. The Storage Layout (LSM Tree)

| Key | Value | Purpose |
| :--- | :--- | :--- |
| `H:{Hash}` | `Assertion` | Immutable Content Store |
| `V:{Hash}` | `List<Vote>` | The Ballot Box (Append-only) |
| `MV:{Subject}:{Predicate}` | `Assertion` | **Materialized View** (The "Winner") |
| `S:{Subject}` | `List` | Adjacency Index |

---

## 3. The Write Path (The Ballot Box)

1. **Ingest:** Agents submit `Assertions` or `Votes`.
2. **Journal:** Each submission is written to `episteme-wal`.
3. **Ballot Box:** Votes are appended to the `V:{Hash}` stream.
4. **Compactor (Async):** A background worker aggregates Votes + TrustRank to update the `MV:{Subject}:{Predicate}` key.
    * This ensures that read queries (`GET /query`) are O(1) lookups on the Materialized View, not O(N) calculations.

---

## 4. The Read Path (The Cortex)

**Fast Path (Standard Lenses):**

* Query: `GET /query?lens=Consensus`
* Action: `GET MV:{Subject}:{Predicate}`
* Cost: **O(1)** in the number of votes (a single key lookup). Low latency.

**Slow Path (Custom/Skeptic Lenses):**

* Query: `GET /query?lens=Skeptic`
* Action: Gather all candidates + votes, compute variance on the fly.
* Cost: **O(N)**. High latency, used for analysis/debugging.

### Standard Lenses

* **Consensus:** Highest cluster density.
* **Authority:** Filter by Reputation.
* **Recency:** Last Writer Wins.
* **Skeptic:** Returns variance/conflict metrics.
* **EpochAware:** Validates against current paradigm.
* **Constraints:** Returns all `must_use`/`forbidden` assertions for a context. Acts as a "Pre-Flight Check" to solve the Optimization Conflict.

---

## 5. The Meter (Economic Safety)

To prevent infinite loops, the Job Manager enforces **Temporal Advantage Normalization (TAN)**.

* **Budgeting:** Every Job must declare a `max_cost`.
* **Throttling:** Forking Reality or Deep Recursion is rejected if `current_cost + projected_cost > max_cost`.

---

## 6. The Simulator (Mid-Training Pipeline)

The system continuously exports data to train the next generation of Agents.

* **Negative Samples:** High-confidence assertions that were later superseded (Failures).
* **Golden Paths:** Branches that successfully merged to Main (Successes).
* **Format:** Exported as HuggingFace-compatible datasets for LoRA fine-tuning.

---

## 7. Implementation Roadmap

### Phase 1: The Spine (Foundation)

* [ ] Reuse `quarantine-journal` pattern for WAL.
* [ ] Implement `Assertion`, `Epoch`, and **`Vote`** structs.
* [ ] Basic `sled` storage backend.

### Phase 2: The Lattice (Connectivity)

* [ ] **The Ballot Box**: Implement separate Vote storage stream.
* [ ] **Materializer**: Implement background worker to maintain `MV` keys.
* [ ] **The Meter**: Implement Budget/TAN middleware in Job Manager.
* [ ] **Agent Wallet**: Sidecar for key management/signing.

### Phase 3: The Cortex (Reasoning)

* [ ] SMT Backend & Branching.
* [ ] Vector Search.
* [ ] **Lens: Constraints**: Implement the pre-flight check logic.

### Phase 4: The Hive (Learning)

* [ ] **The Simulator**: Log exporter pipeline.
* [ ] TrustRank Learning Loop.
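
To make two of the mechanisms above concrete, here is a minimal sketch of the Compactor's winner selection (Section 3, step 4) and the Meter's budget rule (Section 5). It is illustrative only and rests on several assumptions: `Hash` and `AgentId` are hypothetical integer stand-ins for the spec's `Hash` and `PublicKey`, the `Vote` struct is reduced to the fields needed for scoring, and the way TrustRank combines with vote weight (a simple multiplier, defaulting to 1.0 for unknown agents) is a guess, since the spec does not define it. The function names `materialize_winner` and `within_budget` are likewise invented for this example.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

// Hypothetical stand-ins: the spec does not define `Hash` or `PublicKey` here.
type Hash = u64;
type AgentId = u32;

/// Reduced form of the spec's `Vote` (signature and timestamp omitted).
#[derive(Debug, Clone, PartialEq)]
struct Vote {
    assertion_hash: Hash,
    agent_id: AgentId,
    weight: f32, // 0.0 - 1.0 (confidence)
}

/// Pick the assertion to store under `MV:{Subject}:{Predicate}`.
///
/// Assumed scoring rule: score = sum of `weight * trust(agent)`.
/// Agents missing from `trust` count as 1.0. Ties go to the smaller
/// hash so the result does not depend on map iteration order.
fn materialize_winner(
    candidates: &[Hash],
    votes: &[Vote],
    trust: &HashMap<AgentId, f32>,
) -> Option<Hash> {
    let mut scores: HashMap<Hash, f32> =
        candidates.iter().map(|h| (*h, 0.0)).collect();

    for v in votes {
        // Skip malformed weights (NaN or infinite) rather than poison a score.
        if !v.weight.is_finite() {
            continue;
        }
        // Votes for assertions outside this key's candidate set are ignored.
        if let Some(score) = scores.get_mut(&v.assertion_hash) {
            let t = trust.get(&v.agent_id).copied().unwrap_or(1.0);
            *score += v.weight.clamp(0.0, 1.0) * t;
        }
    }

    scores
        .into_iter()
        .max_by(|a, b| {
            a.1.partial_cmp(&b.1)
                .unwrap_or(Ordering::Equal)
                .then_with(|| b.0.cmp(&a.0)) // smaller hash wins ties
        })
        .map(|(hash, _)| hash)
}

/// Budget rule from Section 5: a job step is rejected when
/// `current_cost + projected_cost > max_cost`. Cost units are assumed
/// to be plain integers; arithmetic overflow is treated as over budget.
fn within_budget(current_cost: u64, projected_cost: u64, max_cost: u64) -> bool {
    current_cost
        .checked_add(projected_cost)
        .map_or(false, |total| total <= max_cost)
}
```

In this sketch the aggregation runs once per key in the background, which is what lets the fast read path stay a single lookup; a production Compactor would also need to verify signatures and read candidates from the `V:{Hash}` stream, both of which are left out here.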