# Episteme: The Probabilistic Knowledge Lattice

> **Internal Codename:** StemeDB
> **Category:** Infrastructure / Database
> **Role:** The Cortex (Reasoning & Truth)

## 1. The Manifesto: "Git for Truth"

We are building the shared, long-term memory for autonomous research agents.

Current databases (Postgres, Neo4j, Vector DBs) suffer from **The Tower of Babel** problem: they store *Data*, not *Evidence*. They are deterministic, stateless, and brittle. If an Agent writes `Revenue = $10M` and another writes `Revenue = $12M`, one must overwrite the other. History is lost. Truth is flattened.

**Episteme** rejects the idea of a single, static "database state." Instead, it models knowledge as a **Probabilistic Lattice of Assertions**.

* We do not store "Facts."
    * We store "Claims."
* We do not "Update" records.
    * We "Append" new evidence.
* We do not query "The Truth."
    * We query through "Lenses" (Consensus, Recency, Authority).

## 2. The Core Data Model: The Hyper-Edge

The atomic unit of Episteme is not a Row, Document, or Embedding. It is the **Signed Assertion**.

```rust
struct Assertion {
    // The Proposition (The "What")
    subject: EntityId,      // e.g., "Tesla_Inc"
    predicate: RelationId,  // "has_annual_revenue"
    object: Value,          // e.g., "$96.7B"

    // The Meta-Cognition (The "Why")
    confidence: f32,        // 0.0 to 1.0 (Agent's subjective certainty)
    source_hash: Hash,      // Content-addressed link to source (PDF, URL, Log)
    agent_id: PublicKey,    // Who made this claim? (Cryptographic signature)
    timestamp: u64,         // When?

    // The Semantic Vector (The "Meaning")
    vector: Vec<f32>,       // Embedding for semantic navigation
}
```

### 2.1. Non-Destructive Writes

Episteme is an **Append-Only Merkle DAG**.

* **Conflict is a Feature:** If Agent A claims X, and Agent B claims Y, the database holds *both* realities simultaneously.
* **Traceability:** Every assertion links back to its parent (if it modifies/refutes a previous claim) and its source (evidence).

## 3. The Query Engine: "Truth Lenses"

Because the database holds conflicting realities, "Reading" is a compute-heavy operation. You cannot just `GET key`. You must apply a **Lens**.

A **Lens** is a compiled WASM filter that resolves the probability field into a concrete answer at Read Time.

### Standard Lenses

1. **Lens::Consensus:** "Return the value with the highest cluster density across all agents." (Democratic Truth)
2. **Lens::Authority:** "Return values signed by Agents with `Reputation > 900`." (Expert Truth)
3. **Lens::Recency:** "Return the latest assertion, ignoring history." (News)
4. **Lens::Skeptic:** "Return the *variance* between claims." (Finds controversy/ambiguity)

## 4. Features for the AI Scientist

### 4.1. "Forking Reality" (Branching)

Agents need to simulate futures ("What if inflation hits 5%?"). Episteme supports **Copy-on-Write Branching**.

* An Agent creates a `Scenario Branch`.
* It inserts hypothetical assertions (`Inflation = 5%`).
* It queries for 2nd-order effects.
* The Main Branch remains unpolluted.

### 4.2. TrustRank (Reputation Markets)

We implement a recursive PageRank-style algorithm for **Source Credibility**.

1. **Validation:** If an Agent's claim is later verified by Ground Truth (e.g., an earnings call), their Reputation Score (`R`) increases.
2. **Back-Propagation:** High-`R` agents confer weight to the sources they cite.
3. **Decay:** Claims from low-`R` agents fade faster from the "Hot" tier.

## 5. Architecture: The Rust Stack

Episteme follows the **"Defensive by Default"** best practices.

### Tier 1: The Spine (Durability)

* **Component:** `episteme-wal` (Implementing the Quarantine Journal pattern)
* **Role:** Raw, serialized append-only log. Ensures we never lose a claim.
* **Format:** Binary `Record` with BLAKE3 checksums.

### Tier 2: The Lattice (Graph/Index)

* **Component:** `episteme-core`
* **Role:** The Hot/Warm memory.
* **Hot Tier:** `DashMap` of active contradiction clusters.
* **Warm Tier:** `sled` (embedded log-structured B+ tree) for the Merkle DAG + `hnsw` for vector search.

### Tier 3: The Cortex (Compute)

* **Component:** `episteme-lens`
* **Role:** The WASM runtime for executing Lenses.
* **Function:** Collapses the probabilistic graph into deterministic answers for the client.

## 6. The Ecosystem Triad

Episteme completes the Intelligence Stack:

| System | Biological Analogy | Function | Question Answered |
| :--- | :--- | :--- | :--- |
| **LogDB** | **The Spine** | Immutable Event Log | "What happened?" |
| **AssociativeDB** | **The Hippocampus** | Associative Memory | "What is this like?" |
| **Episteme** | **The Cortex** | Structured Reasoning | "Is this true?" |
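
To make the "Is this true?" question concrete, here is a minimal sketch of how the four Standard Lenses from Section 3 might resolve competing claims about a single subject and predicate. It uses plain Rust functions rather than compiled WASM filters. The `Claim` type, the function names, the reputation map, and the choice to approximate "cluster density" by summing confidence over exactly equal values are all assumptions made for illustration, not the Episteme API.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for `Assertion`: one numeric claim
/// about a single (subject, predicate) pair.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub agent_id: u64,   // stand-in for the PublicKey
    pub value: f64,      // stand-in for `Value` (numeric only here)
    pub confidence: f32, // 0.0 to 1.0
    pub timestamp: u64,
}

/// Lens::Recency: value of the newest claim, ignoring history.
pub fn recency(claims: &[Claim]) -> Option<f64> {
    claims.iter().max_by_key(|c| c.timestamp).map(|c| c.value)
}

/// Lens::Consensus: value with the largest total confidence behind it.
/// Assumption: "cluster density" is approximated by exact-value grouping.
pub fn consensus(claims: &[Claim]) -> Option<f64> {
    let mut weight: HashMap<u64, f32> = HashMap::new();
    for c in claims {
        // f64 is not hashable, so group on its bit pattern.
        *weight.entry(c.value.to_bits()).or_insert(0.0) += c.confidence;
    }
    weight
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(bits, _)| f64::from_bits(bits))
}

/// Lens::Authority: values from agents whose reputation exceeds `min`.
/// Agents missing from the map are treated as reputation 0.
pub fn authority(claims: &[Claim], reputation: &HashMap<u64, u32>, min: u32) -> Vec<f64> {
    claims
        .iter()
        .filter(|c| reputation.get(&c.agent_id).copied().unwrap_or(0) > min)
        .map(|c| c.value)
        .collect()
}

/// Lens::Skeptic: population variance of the claimed values.
pub fn skeptic(claims: &[Claim]) -> Option<f64> {
    if claims.is_empty() {
        return None;
    }
    let n = claims.len() as f64;
    let mean = claims.iter().map(|c| c.value).sum::<f64>() / n;
    Some(claims.iter().map(|c| (c.value - mean).powi(2)).sum::<f64>() / n)
}
```

Applied to the two revenue claims from Section 1 (values `10.0` and `12.0`), `recency` returns whichever was written later, while `skeptic` returns `1.0`, showing that the claims disagree without discarding either one. A real Lens would also need a tie-breaking rule and a tolerance-based clustering step for near-equal values, both of which this sketch leaves out.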