# Episteme (StemeDB) Architecture

> **Design Philosophy:** Immutable History, Probabilistic Resolution, Materialized Speed.
> **Status:** Draft Spec v1.1

## 1. System Overview

Episteme is a **Log-Structured, Content-Addressed Knowledge Graph**. Unlike traditional databases that mutate state in place, Episteme appends **Assertions** to an immutable ledger (Merkle DAG). State resolution happens via **Lenses**.

> **Caveat:** Aphoria's scan observations flow through this append-only path today. Aphoria's authored claims (`AuthoredClaim`) do not; they are stored in a mutable TOML file (`.aphoria/claims.toml`) and bypass the WAL/Merkle DAG entirely. Routing claims through StemeDB as proper Assertions is a planned gap closure.

To solve the O(N) read latency of conflict resolution, Episteme employs a **Materialized View** layer that pre-calculates the "Current Truth" for standard lenses.

### High-Level Data Flow

```ascii
[Writer Agent]      [Reader Agent]
      │                   ▲
      │ (1) Sign &        │ (6) Sub-millisecond Answer
      │     Propose       │     (Pre-computed)
      ▼                   │
┌────────────┐      ┌────────────┐
│ Ingestion  │      │ Resolution │
│  Gateway   │      │   Engine   │
└─────┬──────┘      └─────┬──────┘
      │ (2) Append        │ (5) Apply Lens + Trust Pack
      │     to Ballot     │     (BitSet Filter)
      ▼                   │
┌────────────┐      ┌────────────┐
│ Quarantine │      │  Indexing  │
│  Journal   │──────►  Service   │
└─────┬──────┘ (3)  └─────┬──────┘
      │                   │ (4) Compaction & Materialization
      ▼                   ▼
┌────────────┐      ┌────────────┐
│ Job Manager│      │Materialized│
└────────────┘      │   Views    │
 (TAN Meter)        └────────────┘
```

---

## 2. Core Data Structures

### 2.1. The Atomic Unit: `Assertion` (The Candidate)

Assertions are proposals of truth. They are immutable.

```rust
struct Assertion {
    pub subject: EntityId,
    pub predicate: RelationId,
    pub object: Value,
    pub epoch: Option<Epoch>, // Type parameter assumed from the `Epoch` struct in Phase 1
    pub agent_id: PublicKey,  // The Proposer
    pub timestamp: u64,
    // ... lineage and vector fields ...
}
```

### 2.2. The Ballot Box: `Vote` (The High-Velocity Stream)

To prevent lock contention on Assertions, Agents write **Votes** to a separate high-velocity log.

```rust
struct Vote {
    pub assertion_hash: Hash, // What are we voting on?
    pub agent_id: PublicKey,  // Who is voting?
    pub weight: f32,          // 0.0 - 1.0 (Confidence)
    pub signature: Signature, // Cryptographic proof
    pub timestamp: u64,
}
```

### 2.3. The Trust Pack (The Overlay)

A curated list of trusted agents, used to filter consensus efficiently.

```rust
struct TrustPack {
    pub id: PackId,
    pub name: String,
    pub maintainer: PublicKey,
    pub agents: BitSet, // BloomFilter or RoaringBitmap for fast intersection
}
```

### 2.4. The Storage Layout (LSM Tree)

| Key | Value | Purpose |
| :--- | :--- | :--- |
| `H:{Hash}` | `Assertion` | Immutable Content Store |
| `V:{Hash}` | `List<Vote>` | The Ballot Box (Append-only) |
| `MV:{Subject}:{Predicate}` | `Assertion` | **Materialized View** (The "Winner") |
| `TP:{PackID}` | `TrustPack` | Curation Lists |
| `S:{Subject}` | `List` | Adjacency Index |

---

## 3. The Write Path (The Ballot Box)

1. **Ingest:** Agents submit `Assertions` or `Votes`.
2. **Journal:** Written to `episteme-wal`.
3. **Ballot Box:** Votes are appended to the `V:{Hash}` stream.
4. **Compactor (Async):** A background worker aggregates Votes + TrustRank to update the `MV` key.

---

## 4. The Read Path (The Cortex)

**Fast Path (Standard Lenses):**

* Query: `GET /query?lens=Consensus`
* Action: `GET MV:{Subject}:{Predicate}`
* Cost: **O(1)**. Low latency.

**Trusted Path (Trust Packs):**

* Query: `GET /query?lens=Authority&trust_pack=Science_Pack`
* Action:
    1. Fetch Candidate Assertions.
    2. Fetch Votes.
    3. **Filter:** Intersect Votes with `TrustPack.agents` (BitSet operation).
    4. Sum weights of remaining votes.
* Cost: **O(1)** (if Materialized per Pack) or **O(M)** when computed on demand, where M is the number of votes fetched.

### Standard Lenses

* **Consensus:** Highest cluster density.
* **Authority:** Filter by **Trust Pack**.
* **Recency:** Last Writer Wins.
* **EpochAware:** Validates against current paradigm.
* **Constraints:** (New) Returns all `must_use`/`forbidden` assertions for a context. Acts as a "Pre-Flight Check."

---

## 5. The Meter (Economic Safety)

To prevent infinite loops, the Job Manager enforces **Temporal Advantage Normalization (TAN)**.

* **Budgeting:** Every Job must declare a `max_cost`.
* **Throttling:** Forking Reality or Deep Recursion is rejected if `current_cost + projected_cost > max_cost`.

---

## 6. The Simulator (Mid-Training Pipeline)

The system continuously exports data to train the next generation of Agents.

* **Negative Samples:** High-confidence assertions that were later superseded (Failures).
* **Golden Paths:** Branches that successfully merged to Main (Successes).
* **Format:** Exported as HuggingFace-compatible datasets for LoRA fine-tuning.

---

## 7. Implementation Roadmap

### Phase 1: The Spine (Foundation)

* [ ] Reuse `quarantine-journal` pattern for WAL.
* [ ] Implement `Assertion`, `Epoch`, and **`Vote`** structs.
* [ ] Basic `sled` storage backend.

### Phase 2: The Lattice (Connectivity)

* [ ] **The Ballot Box**: Implement separate Vote storage stream.
* [ ] **Materializer**: Implement background worker to maintain `MV` keys.
* [ ] **Trust Packs**: Implement BitSet/BloomFilter logic for agent sets.
* [ ] **The Meter**: Implement Budget/TAN middleware in Job Manager.
* [ ] **Agent Wallet**: Sidecar for key management/signing.

### Phase 3: The Cortex (Reasoning)

* [ ] SMT Backend & Branching.
* [ ] Vector Search.
* [ ] **Lens: Constraints**: Implement the pre-flight check logic.

### Phase 4: The Hive (Learning)

* [ ] **The Simulator**: Log exporter pipeline.
* [ ] **Trust Marketplace**: API for publishing/subscribing to Trust Packs.
* [ ] **The Super Curator**: Implement "Judge" agent with Visual Anchoring.
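
---

## Appendix: Trusted Path and Meter Sketch

The Trusted Path in Section 4 (filter votes by Trust Pack, then sum the remaining weights) and the throttling rule in Section 5 can be illustrated with a minimal sketch. This is not the Episteme implementation. The type aliases `AssertionHash` and `AgentId`, the functions `tally_trusted`, `resolve_winner`, and `within_budget`, the `u64` cost unit, and the lower-hash tie-break are all hypothetical choices made for illustration. A `HashSet` stands in for the BitSet named in Section 2.3, and signatures and timestamps are omitted.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins: the spec's `Hash` and `PublicKey` would be
// cryptographic values, and `TrustPack.agents` a BitSet or RoaringBitmap.
type AssertionHash = u64;
type AgentId = u32;

#[derive(Debug, Clone, PartialEq)]
struct Vote {
    assertion_hash: AssertionHash, // What are we voting on?
    agent_id: AgentId,             // Who is voting?
    weight: f32,                   // 0.0 - 1.0 (Confidence)
}

struct TrustPack {
    agents: HashSet<AgentId>, // Stand-in for the BitSet in Section 2.3
}

/// Steps 3 and 4 of the Trusted Path: keep only votes cast by agents in the
/// pack, then sum the remaining weights per candidate assertion.
fn tally_trusted(votes: &[Vote], pack: &TrustPack) -> HashMap<AssertionHash, f32> {
    let mut totals = HashMap::new();
    for v in votes.iter().filter(|v| pack.agents.contains(&v.agent_id)) {
        *totals.entry(v.assertion_hash).or_insert(0.0) += v.weight;
    }
    totals
}

/// Picks the candidate with the highest trusted weight.
/// Assumed tie-break (not in the spec): the lower hash wins.
fn resolve_winner(totals: &HashMap<AssertionHash, f32>) -> Option<AssertionHash> {
    totals
        .iter()
        .max_by(|a, b| a.1.total_cmp(b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(hash, _)| *hash)
}

/// Section 5 throttling rule: work is rejected when
/// `current_cost + projected_cost > max_cost`, so it is allowed otherwise.
/// `saturating_add` keeps an overflowing sum from wrapping under the budget.
fn within_budget(current_cost: u64, projected_cost: u64, max_cost: u64) -> bool {
    current_cost.saturating_add(projected_cost) <= max_cost
}
```

Calling `tally_trusted` visits each fetched vote once, which matches the O(M) on-demand cost noted for the Trusted Path; a per-pack materialized view would instead store the result of `resolve_winner` under its own key so that reads stay O(1).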