# Episteme: Git for Truth

> **Internal Codename:** StemeDB
> **Category:** Infrastructure / Probabilistic Knowledge Database
> **Role:** The shared memory for AI research agents that disagree

## The Problem: Databases Force False Certainty

Current databases (Postgres, Neo4j, vector DBs) suffer from the **Tower of Babel** problem: they store _Data_, not _Evidence_. They are deterministic and brittle: each field holds a single value, and who asserted it, and how confidently, is left to the application.

When multiple agents observe the world and report different things, traditional databases force you to:

- **Pick a winner** (losing the disagreement)
- **Maintain version tables** (complexity explodes)
- **Scatter application logic everywhere** (authority weighting, decay, cascades)

**Real example:** A woman researching Semaglutide found her doctor saying "well-tolerated" while Reddit flagged gastroparesis months before the FDA added the warning. She had no way to weigh these sources structurally. The Reddit signal was right. The system failed her.

## The Solution: Store Claims, Resolve at Read Time

Episteme rejects the idea of a single, static "database state." Instead, it models knowledge as a **Probabilistic Marketplace**:

- **Assertions are immutable.** Every claim is signed, timestamped, and preserved forever.
  > **Status: PARTIALLY IMPLEMENTED.** StemeDB Assertions follow this model. Aphoria's `AuthoredClaim` entries are currently stored in a mutable TOML file (`.aphoria/claims.toml`), not yet routed through the append-only DAG. Bridging claims into StemeDB as Assertions is tracked in the gap closure plan.
- **Contradictions coexist.** The database holds disagreement without forcing resolution.
- **Lenses resolve at query time.** Different readers can apply different resolution strategies.
- **Source authority is structural.** A regulatory filing outweighs a Reddit post by design.
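
The first three bullets can be sketched in a few lines. This is a minimal illustration, not the StemeDB implementation: `Claim`, `ClaimLog`, and `recency_lens` are hypothetical names, and the claim is stripped down to four fields.

```rust
/// Hypothetical, simplified claim. The real `Assertion` also carries
/// signatures, provenance, and confidence.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub timestamp: u64,
}

/// Append-only log: claims can be added but never updated or removed.
#[derive(Default)]
pub struct ClaimLog {
    claims: Vec<Claim>,
}

impl ClaimLog {
    pub fn append(&mut self, claim: Claim) {
        self.claims.push(claim);
    }

    /// All claims about one (subject, predicate), contradictions included.
    pub fn about(&self, subject: &str, predicate: &str) -> Vec<&Claim> {
        self.claims
            .iter()
            .filter(|c| c.subject == subject && c.predicate == predicate)
            .collect()
    }

    /// Resolution happens at read time, using whichever lens the reader supplies.
    pub fn resolve<'a, F>(&'a self, subject: &str, predicate: &str, lens: F) -> Option<&'a Claim>
    where
        F: Fn(&[&'a Claim]) -> Option<&'a Claim>,
    {
        let candidates = self.about(subject, predicate);
        lens(candidates.as_slice())
    }
}

/// One possible lens: the most recent claim wins.
pub fn recency_lens<'a>(claims: &[&'a Claim]) -> Option<&'a Claim> {
    claims.iter().copied().max_by_key(|c| c.timestamp)
}
```

Because the lens is an argument to the read rather than a property of the store, two readers can call `resolve` on the same log with different lenses and get different answers without either one overwriting the other.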

## The Four Pillars

Every use case must demonstrate at least one pillar. If Postgres could do it, it's not a compelling use case.

| Pillar                        | What It Enables                                    | Postgres Gap                                   |
| ----------------------------- | -------------------------------------------------- | ---------------------------------------------- |
| **First-Class Contradiction** | Hold conflicting facts without forcing resolution  | Must pick one value or version-table chaos     |
| **Invalidation Cascades**     | Retracted evidence flags all downstream decisions  | Recursive CTEs don't scale, app logic drifts   |
| **Multi-Signature Consensus** | Weighted trust via cryptographic co-signatures     | Join tables have no cryptographic proof        |
| **Semantic Decay**            | Old data fades from hot path but remains auditable | Manual WHERE clauses, inconsistent decay rates |
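
To make the Invalidation Cascades pillar concrete, here is a rough sketch of the traversal involved. It assumes a hypothetical in-memory index (`Dependents`) from each assertion id to the ids that cite it; how StemeDB actually stores these edges is not described here.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical dependency index: for each assertion id, the ids of
/// assertions or decisions that cite it as evidence.
pub type Dependents = HashMap<u64, Vec<u64>>;

/// Breadth-first walk that returns everything downstream of `retracted`.
/// The `flagged` set doubles as a visited set, so the walk terminates
/// even if the graph contains cycles.
pub fn invalidation_cascade(dependents: &Dependents, retracted: u64) -> HashSet<u64> {
    let mut flagged = HashSet::new();
    let mut queue = VecDeque::from([retracted]);
    while let Some(id) = queue.pop_front() {
        // A missing key simply means nothing depends on `id`.
        for &child in dependents.get(&id).into_iter().flatten() {
            if flagged.insert(child) {
                queue.push_back(child);
            }
        }
    }
    flagged
}
```

Each edge is visited at most once, so the cost is linear in the size of the affected subgraph rather than in the size of the whole store.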

## The Core Data Model: The Signed Assertion

The atomic unit is not a Row, Document, or Embedding. It is the **Signed Assertion**:

```rust
struct Assertion {
    // The Proposition (What is being claimed)
    subject: EntityId,           // "semaglutide", "Tesla_Inc"
    predicate: RelationId,       // "has_side_effect", "annual_revenue"
    object: ObjectValue,         // "gastroparesis", "$96.7B"

    // The Lineage (Why we believe it)
    source_hash: Hash,           // Content-addressed link to source document
    source_class: SourceClass,   // Authority tier (0=Regulatory...5=Anecdotal)
    source_metadata: Option<JSON>, // Rich provenance (journal, DOI, etc.)
    visual_hash: Option<PHash>,  // Perceptual hash for image provenance
    epoch: Option<EpochId>,      // Paradigm context ("covid-2020", "gaap-2024")

    // The Meta-Cognition (Who said it, how confident)
    signatures: Vec<SignatureEntry>,  // Ed25519 cryptographic proofs
    confidence: f32,                   // 0.0-1.0 subjective certainty
    timestamp: u64,                    // When created
    lifecycle: LifecycleStage,         // Proposed → Approved → Deprecated

    // The Semantic (Meaning for similarity search)
    vector: Option<Vec<f32>>,    // Embedding for k-NN queries
}
```

> **Current state:** Aphoria uses a separate `AuthoredClaim` struct with fields like `concept_path`, `predicate`, `value`, `comparison`, `invariant`, and `consequence`. A bridge function (`authored_claim_to_assertion()`) exists to convert between the two representations but is not yet used in the primary claim storage path. Claims are currently persisted in `.aphoria/claims.toml`, not as `Assertion` entries in the DAG. Routing claims through StemeDB is planned.

## The Source Class Hierarchy

Every assertion has a source class that structurally affects resolution weight and decay:

| Tier | Class             | Example               | Decay Half-Life | Authority Weight |
| ---- | ----------------- | --------------------- | --------------- | ---------------- |
| 0    | **Regulatory**    | FDA label, SEC filing | Never           | 1.0              |
| 1    | **Clinical**      | Peer-reviewed RCTs    | 2 years         | 0.9              |
| 2    | **Observational** | Real-world evidence   | 1 year          | 0.7              |
| 3    | **Expert**        | Physician guidelines  | 6 months        | 0.5              |
| 4    | **Community**     | Patient registries    | 3 months        | 0.2              |
| 5    | **Anecdotal**     | Reddit posts, social  | 30 days         | 0.1              |

A million Tier-5 anecdotal assertions cannot outvote a single Tier-0 regulatory assertion. But the million anecdotes can signal "something is happening here" via cluster escalation.
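
The exact resolution rule is not spelled out above. One way to get the stated property is strict tier precedence: only the most authoritative tier present gets a vote, and lower tiers are consulted solely for escalation. The sketch below assumes that rule; `TieredClaim`, `resolve_by_tier`, `needs_escalation`, and the `threshold` parameter are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical minimal claim: source tier (0 = Regulatory ... 5 = Anecdotal)
/// and the claimed value.
pub struct TieredClaim {
    pub tier: u8,
    pub object: String,
}

/// Pick the most common value within the most authoritative tier present.
/// Lower tiers never get a vote, so volume alone cannot override authority.
pub fn resolve_by_tier(claims: &[TieredClaim]) -> Option<&str> {
    let best_tier = claims.iter().map(|c| c.tier).min()?;
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for c in claims.iter().filter(|c| c.tier == best_tier) {
        *counts.entry(c.object.as_str()).or_insert(0) += 1;
    }
    counts.into_iter().max_by_key(|&(_, n)| n).map(|(value, _)| value)
}

/// Flag the winner for review when enough claims disagree with it.
/// `threshold` is an illustrative parameter, not a documented setting.
pub fn needs_escalation(claims: &[TieredClaim], winner: &str, threshold: usize) -> bool {
    claims.iter().filter(|c| c.object != winner).count() >= threshold
}
```

Note that simply summing the Authority Weight column would not give this behaviour, since eleven Tier-5 claims at 0.1 each would already exceed a single Tier-0 claim at 1.0.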

## The Query Engine: Truth Lenses

Reading applies a **Lens** to collapse the probabilistic field into a concrete answer. Materialized Views ensure sub-millisecond latency for common patterns.

### Resolution Lenses (Pick a Winner)

| Lens           | Behavior                                 |
| -------------- | ---------------------------------------- |
| **Recency**    | Last writer wins                         |
| **Consensus**  | Highest cluster density of object values |
| **Authority**  | Filter by TrustRank reputation           |
| **Vote-Aware** | Weight by Ballot Box votes               |
| **EpochAware** | Filter out superseded paradigms          |

### Analysis Lenses (Surface Disagreement)

| Lens                  | Behavior                                                |
| --------------------- | ------------------------------------------------------- |
| **Skeptic**           | Return all claims with conflict score and weight shares |
| **Layered Consensus** | Per-source-class resolution (tier-by-tier visibility)   |
| **Constraints**       | Pre-flight check for must_use/forbidden predicates      |

The **Skeptic** and **Layered Consensus** lenses are key differentiators: they answer "where do sources agree and disagree?" rather than hiding the variance.
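
As an illustration of what a Skeptic-style result might contain, the sketch below computes weight shares per distinct value. The conflict score formula (1.0 minus the largest share) is an assumption made for this example, and `WeightedClaim`, `SkepticReport`, and `skeptic` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical input: a claimed value and the weight its source carries.
pub struct WeightedClaim {
    pub object: String,
    pub weight: f64,
}

/// Report produced instead of a single winner.
pub struct SkepticReport {
    /// Share of total weight behind each distinct value (shares sum to 1.0).
    pub shares: BTreeMap<String, f64>,
    /// Assumed definition: 1.0 minus the largest share; 0.0 means full agreement.
    pub conflict_score: f64,
}

/// Returns `None` when there is no positive weight to distribute.
pub fn skeptic(claims: &[WeightedClaim]) -> Option<SkepticReport> {
    let total: f64 = claims.iter().map(|c| c.weight).sum();
    if total <= 0.0 {
        return None;
    }
    let mut shares: BTreeMap<String, f64> = BTreeMap::new();
    for c in claims {
        *shares.entry(c.object.clone()).or_insert(0.0) += c.weight / total;
    }
    let top = shares.values().cloned().fold(0.0_f64, f64::max);
    Some(SkepticReport { shares, conflict_score: 1.0 - top })
}
```

The caller gets every value with its share, so a minority position stays visible even when one value clearly dominates.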

## Key Capabilities

### Time-Travel Queries

"What was the known risk profile when I started Semaglutide in June 2023?"

```http
GET /query?subject=semaglutide&predicate=side_effects&as_of=1687000000
```

The append-only DAG preserves every historical state. Time travel is a hash lookup, not a reconstruction.
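
Semantically, an `as_of` query sees only what had been asserted at or before the cut-off. The sketch below shows that behaviour with a `BTreeMap` keyed by timestamp standing in for the DAG; it is a simplification for illustration, and `History` is a hypothetical type.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the DAG: claimed values for one
/// (subject, predicate), keyed by creation timestamp in Unix seconds.
pub struct History {
    by_time: BTreeMap<u64, Vec<String>>,
}

impl History {
    pub fn new() -> Self {
        Self { by_time: BTreeMap::new() }
    }

    /// Append-only: existing entries are never rewritten.
    pub fn record(&mut self, timestamp: u64, object: &str) {
        self.by_time.entry(timestamp).or_default().push(object.to_string());
    }

    /// Everything that was known at `as_of`, oldest first.
    pub fn as_of(&self, as_of: u64) -> Vec<&str> {
        self.by_time
            .range(..=as_of)
            .flat_map(|(_, objects)| objects.iter().map(String::as_str))
            .collect()
    }
}
```

Since nothing is ever overwritten, answering a historical query never requires undoing later writes; it only requires ignoring them.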

### Semantic Decay

Confidence decays based on source class. Old Reddit posts fade; regulatory filings persist:

```http
GET /query?subject=semaglutide&predicate=efficacy&source_class_decay=true
```
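
A minimal sketch of half-life decay, using the half-lives from the Source Class Hierarchy table. It assumes the standard exponential half-life formula and approximates month and year lengths in days; `half_life_days` and `decayed_confidence` are hypothetical helper names.

```rust
/// Half-life in days per source tier, taken from the Source Class Hierarchy
/// table. Month and year lengths are approximated; `None` means the tier
/// never decays. Tiers above 5 are treated as Anecdotal in this sketch.
pub fn half_life_days(tier: u8) -> Option<f64> {
    match tier {
        0 => None,         // Regulatory: never
        1 => Some(730.0),  // Clinical: 2 years
        2 => Some(365.0),  // Observational: 1 year
        3 => Some(182.5),  // Expert: 6 months
        4 => Some(91.25),  // Community: 3 months
        _ => Some(30.0),   // Anecdotal: 30 days
    }
}

/// Standard half-life decay: confidence halves every `half_life` days.
pub fn decayed_confidence(confidence: f64, tier: u8, age_days: f64) -> f64 {
    match half_life_days(tier) {
        None => confidence,
        Some(half_life) => confidence * 0.5_f64.powf(age_days / half_life),
    }
}
```

Under these assumptions a Reddit post asserted with confidence 0.8 counts as 0.4 after 30 days and 0.2 after 60, while a regulatory filing keeps its original confidence indefinitely.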

### Conflict Analysis

Instead of "here is the answer," show "here is the shape of disagreement":

```http
GET /skeptic?subject=semaglutide&predicate=gastroparesis_risk
```

Returns which tiers agree, which disagree, and which emerging signals still lack clinical evidence.

### Query Audit Trail

Every query is logged with full provenance. "Why did you believe that?" is answerable:

```http
GET /audit/query/{query_id}
```

## The Ballot Box: High-Velocity Consensus

To avoid write contention on assertions, agents vote separately:

```rust
struct Vote {
    assertion_hash: Hash,
    agent_id: PublicKey,
    weight: f32,
    signature: Signature,
}
```

Votes are append-only. A background Materializer aggregates them into read views that serve lookups in O(1).
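
The split between an append-only vote log and an aggregated read view could look roughly like this. It is a sketch under simplifying assumptions: hashes and keys are shortened to integers, signatures are omitted, and `SimpleVote` and `TallyView` are hypothetical names.

```rust
use std::collections::HashMap;

/// Simplified vote: hashes and keys are shortened to integers and the
/// signature is omitted, purely for illustration.
pub struct SimpleVote {
    pub assertion_hash: u64,
    pub agent_id: u64,
    pub weight: f32,
}

/// Read view: assertion hash -> total vote weight.
#[derive(Default)]
pub struct TallyView {
    totals: HashMap<u64, f32>,
    applied: usize, // number of log entries already folded into `totals`
}

impl TallyView {
    /// Fold in only the votes appended since the previous pass.
    /// Relies on the log being append-only (it never shrinks).
    pub fn materialize(&mut self, vote_log: &[SimpleVote]) {
        for vote in &vote_log[self.applied..] {
            *self.totals.entry(vote.assertion_hash).or_insert(0.0) += vote.weight;
        }
        self.applied = vote_log.len();
    }

    /// A single hash-map lookup per read.
    pub fn total(&self, assertion_hash: u64) -> f32 {
        self.totals.get(&assertion_hash).copied().unwrap_or(0.0)
    }
}
```

Writers only ever append to the log, and readers only ever touch the view, so neither blocks on the assertion itself.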

## Trust Packs: The Curator Economy

Users subscribe to "Trust Packs" (curated lists of trusted agents) to filter reality:

- _"The Skeptical Cardio Pack"_ filters out low-quality cardiac studies
- _"Mayo Clinic Curated"_ only shows assertions from verified Mayo researchers

Trust Packs are BitSet overlays that filter the Consensus Lens efficiently.
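
A bitset overlay can be sketched with plain `u64` words. This assumes each agent has been assigned a dense integer index, which is not stated above; `TrustPack` and its methods are hypothetical.

```rust
/// Hypothetical bitset: bit `i` is set when the agent with dense index `i`
/// is trusted by this pack.
pub struct TrustPack {
    words: Vec<u64>,
}

impl TrustPack {
    pub fn from_agents(agent_indexes: &[usize]) -> Self {
        let mut words: Vec<u64> = Vec::new();
        for &i in agent_indexes {
            let word = i / 64;
            if word >= words.len() {
                words.resize(word + 1, 0);
            }
            words[word] |= 1u64 << (i % 64);
        }
        Self { words }
    }

    /// Membership test: one index computation and one mask.
    pub fn contains(&self, agent_index: usize) -> bool {
        self.words
            .get(agent_index / 64)
            .map_or(false, |w| (w & (1u64 << (agent_index % 64))) != 0)
    }

    /// Overlay two packs: an agent must be in both to pass.
    pub fn intersect(&self, other: &TrustPack) -> TrustPack {
        let words = self.words.iter().zip(&other.words).map(|(a, b)| a & b).collect();
        TrustPack { words }
    }
}
```

Filtering a candidate set by a pack then costs one `contains` check per assertion signer, and combining packs is a word-wise AND.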

## The Meter: Economics of Reasoning

Deep Research is computationally expensive. Episteme enforces token-bucket quotas:

- **Assert:** 10 tokens
- **Vote:** 1 token
- **Query:** 5 tokens plus lens complexity
- **Default budget:** 10,000 tokens/agent/hour
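
A token bucket with these costs might be sketched as follows. Continuous refill and a bucket capacity equal to the hourly quota are assumptions of this example, and `Op` and `Meter` are hypothetical names.

```rust
/// Operation costs from the list above.
pub enum Op {
    Assert,
    Vote,
    Query { lens_complexity: u32 },
}

impl Op {
    pub fn cost(&self) -> f64 {
        match self {
            Op::Assert => 10.0,
            Op::Vote => 1.0,
            Op::Query { lens_complexity } => 5.0 + f64::from(*lens_complexity),
        }
    }
}

/// Per-agent token bucket. Time is passed in explicitly (in seconds) so the
/// sketch stays deterministic; a real meter would read a clock.
pub struct Meter {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: u64,
}

impl Meter {
    /// Default quota: 10,000 tokens per hour, assumed to refill continuously.
    pub fn hourly_default(now: u64) -> Self {
        Self {
            capacity: 10_000.0,
            tokens: 10_000.0,
            refill_per_sec: 10_000.0 / 3600.0,
            last_refill: now,
        }
    }

    /// Refills for the elapsed time, then deducts the cost and returns true
    /// if the agent can afford `op`. Returns false without deducting otherwise.
    pub fn try_spend(&mut self, op: &Op, now: u64) -> bool {
        let elapsed = now.saturating_sub(self.last_refill) as f64;
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = self.last_refill.max(now);
        let cost = op.cost();
        if self.tokens >= cost {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }
}
```

With the default budget an agent can make 1,000 assertions in a burst, after which further operations are refused until the bucket refills.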

## Architecture: The Biological Stack

| Layer           | Crate                           | Role                                     |
| --------------- | ------------------------------- | ---------------------------------------- |
| **The Spine**   | `stemedb-wal`                   | Append-only WAL for durability           |
| **The Lattice** | `stemedb-storage`               | KV store, indexes, vector/visual indices |
| **The Cortex**  | `stemedb-query`, `stemedb-lens` | Query engine, Lenses, Materializer       |
| **The Surface** | `stemedb-api`                   | HTTP API with OpenAPI docs               |

The biological metaphor:

- **Spine:** Raw persistence. Never loses a claim.
- **Lattice:** Connectivity. O(1) lookups via compound indexes.
- **Cortex:** Reasoning. Collapses probability into answers.
- **Surface:** Interface. Exposes the stack over HTTP.

## Future Vision

### Forking Reality (Planned)

Agents simulate futures without polluting the main branch via **Copy-on-Write Branching** using Sparse Merkle Trees.

### The Super Curator (Planned)

A specialized swarm of reviewer agents that audits high-variance facts and escalates emerging signals.

### The Simulator (Planned)

A pipeline that converts high-confidence failure logs into synthetic training trajectories.

## The Git Analogy

| Git Concept | Episteme Equivalent                      |
| ----------- | ---------------------------------------- |
| Commit      | Assertion (immutable, content-addressed) |
| Branch      | Epoch (paradigm context)                 |
| Merge       | Lens resolution                          |
| Revert      | Epoch supersession cascade               |
| Blame       | Signature/agent audit trail              |
| History     | Append-only DAG preserved forever        |

## When to Use Episteme

**Use Episteme when:**

- Multiple sources report conflicting information
- You need to weight sources by authority, not just timestamp
- You need to surface disagreement, not hide it
- Guidance changes and you need to notify prior consumers
- You need to audit "why did you believe that?"
- You need historical snapshots ("what was true on this date?")

**Use Postgres when:**

- You have a single source of truth
- Data never conflicts
- Temporal validity doesn't matter
- Consensus has already been reached by humans

For everything else: **Episteme is the database.**