# Episteme: Git for Truth

> **Internal Codename:** StemeDB
> **Category:** Infrastructure / Probabilistic Knowledge Database
> **Role:** The shared memory for AI research agents that disagree

## The Problem: Databases Force False Certainty

Current databases (Postgres, Neo4j, Vector DBs) suffer from **The Tower of Babel** problem: they store _Data_, not _Evidence_. They are deterministic, stateless, and brittle.

When multiple agents observe the world and report different things, traditional databases force you to:

- **Pick a winner** (losing the disagreement)
- **Version-table chaos** (complexity explodes)
- **Application logic everywhere** (authority weighting, decay, cascades)

**Real example:** A woman researching Semaglutide found her doctor saying "well-tolerated" while Reddit flagged gastroparesis months before the FDA added the warning. She had no way to weigh these sources structurally. The Reddit signal was right. The system failed her.

## The Solution: Store Claims, Resolve at Read Time

Episteme rejects the idea of a single, static "database state." Instead, it models knowledge as a **Probabilistic Marketplace**:

- **Assertions are immutable.** Every claim is signed, timestamped, and preserved forever.

  > **Status: PARTIALLY IMPLEMENTED.** StemeDB Assertions follow this model. Aphoria's `AuthoredClaim` entries are currently stored in a mutable TOML file (`.aphoria/claims.toml`), not yet routed through the append-only DAG. Bridging claims into StemeDB as Assertions is tracked in the gap closure plan.

- **Contradictions coexist.** The database holds disagreement without forcing resolution.
- **Lenses resolve at query time.** Different readers can apply different resolution strategies.
- **Source authority is structural.** A regulatory filing outweighs a Reddit post by design.

## The Four Pillars

Every use case must demonstrate at least one pillar. If Postgres could do it, it's not a compelling use case.


| Pillar                        | What It Enables                                    | Postgres Gap                                   |
| ----------------------------- | -------------------------------------------------- | ---------------------------------------------- |
| **First-Class Contradiction** | Hold conflicting facts without forcing resolution  | Must pick one value or version-table chaos     |
| **Invalidation Cascades**     | Retracted evidence flags all downstream decisions  | Recursive CTEs don't scale, app logic drifts   |
| **Multi-Signature Consensus** | Weighted trust via cryptographic co-signatures     | Join tables have no cryptographic proof        |
| **Semantic Decay**            | Old data fades from hot path but remains auditable | Manual WHERE clauses, inconsistent decay rates |

## The Core Data Model: The Signed Assertion

The atomic unit is not a Row, Document, or Embedding. It is the **Signed Assertion**:

```rust
struct Assertion {
    // The Proposition (What is being claimed)
    subject: EntityId,           // "semaglutide", "Tesla_Inc"
    predicate: RelationId,       // "has_side_effect", "annual_revenue"
    object: ObjectValue,         // "gastroparesis", "$96.7B"

    // The Lineage (Why we believe it)
    source_hash: Hash,           // Content-addressed link to source document
    source_class: SourceClass,   // Authority tier (0=Regulatory...5=Anecdotal)
    source_metadata: Option<…>,  // Rich provenance (journal, DOI, etc.)
    visual_hash: Option<…>,      // Perceptual hash for image provenance
    epoch: Option<…>,            // Paradigm context ("covid-2020", "gaap-2024")

    // The Meta-Cognition (Who said it, how confident)
    signatures: Vec<…>,          // Ed25519 cryptographic proofs
    confidence: f32,             // 0.0-1.0 subjective certainty
    timestamp: u64,              // When created
    lifecycle: LifecycleStage,   // Proposed → Approved → Deprecated

    // The Semantic (Meaning for similarity search)
    vector: Option<…>,           // Embedding for k-NN queries
}
```

> **Current state:** Aphoria uses a separate `AuthoredClaim` struct with fields like `concept_path`, `predicate`, `value`, `comparison`, `invariant`, and `consequence`.
> A bridge function (`authored_claim_to_assertion()`) exists to convert between the two representations but is not yet used in the primary claim storage path. Claims are currently persisted in `.aphoria/claims.toml`, not as `Assertion` entries in the DAG. Routing claims through StemeDB is planned.

## The Source Class Hierarchy

Every assertion has a source class that structurally affects resolution weight and decay:

| Tier | Class             | Example               | Decay Half-Life | Authority Weight |
| ---- | ----------------- | --------------------- | --------------- | ---------------- |
| 0    | **Regulatory**    | FDA label, SEC filing | Never           | 1.0              |
| 1    | **Clinical**      | Peer-reviewed RCTs    | 2 years         | 0.9              |
| 2    | **Observational** | Real-world evidence   | 1 year          | 0.7              |
| 3    | **Expert**        | Physician guidelines  | 6 months        | 0.5              |
| 4    | **Community**     | Patient registries    | 3 months        | 0.2              |
| 5    | **Anecdotal**     | Reddit posts, social  | 30 days         | 0.1              |

A million Tier-5 anecdotal assertions cannot outvote a single Tier-0 regulatory assertion. But the million anecdotes can signal "something is happening here" via cluster escalation.

## The Query Engine: Truth Lenses

Reading applies a **Lens** to collapse the probabilistic field into a concrete answer. Materialized Views ensure sub-millisecond latency for common patterns.

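
As a rough illustration of how a lens might turn the source-class table into a per-assertion score, the sketch below multiplies authority weight, confidence, and an exponential half-life decay. The scoring formula, the function names (`authority_weight`, `half_life_secs`, `effective_weight`), and the day counts used for "months" and "years" are assumptions made for this example; only the weights and half-lives come from the table.

```rust
/// Source tiers from the hierarchy table (0 = Regulatory ... 5 = Anecdotal).
#[derive(Clone, Copy, Debug, PartialEq)]
enum SourceClass {
    Regulatory,
    Clinical,
    Observational,
    Expert,
    Community,
    Anecdotal,
}

const DAY_SECS: f64 = 86_400.0;

impl SourceClass {
    /// Authority weight column from the table.
    fn authority_weight(self) -> f64 {
        match self {
            SourceClass::Regulatory => 1.0,
            SourceClass::Clinical => 0.9,
            SourceClass::Observational => 0.7,
            SourceClass::Expert => 0.5,
            SourceClass::Community => 0.2,
            SourceClass::Anecdotal => 0.1,
        }
    }

    /// Decay half-life in seconds; `None` means the class never decays.
    /// Assumption: 1 year = 365 days and 1 month = 30 days.
    fn half_life_secs(self) -> Option<f64> {
        match self {
            SourceClass::Regulatory => None,
            SourceClass::Clinical => Some(730.0 * DAY_SECS),
            SourceClass::Observational => Some(365.0 * DAY_SECS),
            SourceClass::Expert => Some(180.0 * DAY_SECS),
            SourceClass::Community => Some(90.0 * DAY_SECS),
            SourceClass::Anecdotal => Some(30.0 * DAY_SECS),
        }
    }
}

/// Hypothetical scoring rule: authority x confidence x 0.5^(age / half-life).
fn effective_weight(class: SourceClass, confidence: f64, age_secs: f64) -> f64 {
    let decay = match class.half_life_secs() {
        Some(half_life) => 0.5_f64.powf(age_secs / half_life),
        None => 1.0, // "Never" in the table: no decay applied
    };
    class.authority_weight() * confidence * decay
}

fn main() {
    let fda = effective_weight(SourceClass::Regulatory, 1.0, 365.0 * DAY_SECS);
    let reddit = effective_weight(SourceClass::Anecdotal, 1.0, 30.0 * DAY_SECS);
    println!("regulatory after 1 year: {fda:.3}");
    println!("anecdotal after 30 days: {reddit:.3}");
}
```

This scores a single assertion only. How scores combine across assertions is a separate question: a plain sum of these weights would let enough anecdotes outweigh one regulatory assertion, so the guarantee stated above must come from the resolution rule itself, which this sketch does not model.
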
### Resolution Lenses (Pick a Winner)

| Lens           | Behavior                                 |
| -------------- | ---------------------------------------- |
| **Recency**    | Last writer wins                         |
| **Consensus**  | Highest cluster density of object values |
| **Authority**  | Filter by TrustRank reputation           |
| **Vote-Aware** | Weight by Ballot Box votes               |
| **EpochAware** | Filter out superseded paradigms          |

### Analysis Lenses (Surface Disagreement)

| Lens                  | Behavior                                                |
| --------------------- | ------------------------------------------------------- |
| **Skeptic**           | Return all claims with conflict score and weight shares |
| **Layered Consensus** | Per-source-class resolution (tier-by-tier visibility)   |
| **Constraints**       | Pre-flight check for must_use/forbidden predicates      |

The **Skeptic** and **Layered Consensus** lenses are key differentiators: they answer "where do sources agree and disagree?" rather than hiding the variance.

## Key Capabilities

### Time-Travel Queries

"What was the known risk profile when I started Semaglutide in June 2023?"

```http
GET /query?subject=semaglutide&predicate=side_effects&as_of=1687000000
```

The append-only DAG preserves every historical state. Time travel is a hash lookup, not a reconstruction.

### Semantic Decay

Confidence decays based on source class. Old Reddit posts fade; regulatory filings persist:

```http
GET /query?subject=semaglutide&predicate=efficacy&source_class_decay=true
```

### Conflict Analysis

Instead of "here is the answer," show "here is the shape of disagreement":

```http
GET /skeptic?subject=semaglutide&predicate=gastroparesis_risk
```

Returns: which tiers agree, which disagree, emerging signals without clinical evidence.

### Query Audit Trail

Every query is logged with full provenance. "Why did you believe that?"
is answerable:

```http
GET /audit/query/{query_id}
```

## The Ballot Box: High-Velocity Consensus

To avoid write contention on assertions, agents vote separately:

```rust
struct Vote {
    assertion_hash: Hash,
    agent_id: PublicKey,
    weight: f32,
    signature: Signature,
}
```

Votes are append-only. A background Materializer aggregates votes to update O(1) read views.

## Trust Packs: The Curator Economy

Users subscribe to "Trust Packs" (curated lists of trusted agents) to filter reality:

- _"The Skeptical Cardio Pack"_ filters out low-quality cardiac studies
- _"Mayo Clinic Curated"_ only shows assertions from verified Mayo researchers

Trust Packs are BitSet overlays that filter the Consensus Lens efficiently.

## The Meter: Economics of Reasoning

Deep Research is computationally expensive. Episteme enforces token-bucket quotas:

- **Assert:** 10 tokens
- **Vote:** 1 token
- **Query:** 5 + lens complexity tokens
- **Default:** 10,000 tokens/agent/hour

## Architecture: The Biological Stack

| Layer           | Crate                           | Role                                     |
| --------------- | ------------------------------- | ---------------------------------------- |
| **The Spine**   | `stemedb-wal`                   | Append-only WAL for durability           |
| **The Lattice** | `stemedb-storage`               | KV store, indexes, vector/visual indices |
| **The Cortex**  | `stemedb-query`, `stemedb-lens` | Query engine, Lenses, Materializer       |
| **The Surface** | `stemedb-api`                   | HTTP API with OpenAPI docs               |

The biological metaphor:

- **Spine:** Raw persistence. Never loses a claim.
- **Lattice:** Connectivity. O(1) lookups via compound indexes.
- **Cortex:** Reasoning. Collapse probability into answers.

## Future Vision

### Forking Reality (Planned)

Agents simulate futures without polluting the main branch via **Copy-on-Write Branching** using Sparse Merkle Trees.

### The Super Curator (Planned)

A specialized swarm of reviewer agents that audits high-variance facts and escalates emerging signals.

### The Simulator (Planned)

A pipeline that converts high-confidence failure logs into synthetic training trajectories.

## The Git Analogy

| Git Concept | Episteme Equivalent                      |
| ----------- | ---------------------------------------- |
| Commit      | Assertion (immutable, content-addressed) |
| Branch      | Epoch (paradigm context)                 |
| Merge       | Lens resolution                          |
| Revert      | Epoch supersession cascade               |
| Blame       | Signature/agent audit trail              |
| History     | Append-only DAG preserved forever        |

## When to Use Episteme

**Use Episteme when:**

- Multiple sources report conflicting information
- You need to weight sources by authority, not just timestamp
- You need to surface disagreement, not hide it
- Guidance changes and you need to notify prior consumers
- You need to audit "why did you believe that?"
- You need historical snapshots ("what was true on this date?")

**Use Postgres when:**

- You have a single source of truth
- Data never conflicts
- Temporal validity doesn't matter
- Consensus has already been reached by humans

For everything else: **Episteme is the database.**