
# Episteme: The Probabilistic Knowledge Lattice
> **Internal Codename:** StemeDB
> **Category:** Infrastructure / Database
> **Role:** The Cortex (Reasoning & Truth)

## 1. The Manifesto: "Git for Truth"

We are building the shared, long-term memory for autonomous research agents.

Current databases (Postgres, Neo4j, vector DBs) suffer from **The Tower of Babel** problem: they store *Data*, not *Evidence*. They keep one current value per record and no provenance, which makes them brittle. If one Agent writes `Revenue = $10M` and another writes `Revenue = $12M`, one value must overwrite the other. History is lost. Truth is flattened.

**Episteme** rejects the idea of a single, static "database state." Instead, it models knowledge as a **Probabilistic Lattice of Assertions**.
*   We do not store "Facts."
*   We store "Claims."
*   We do not "Update" records.
*   We "Append" new evidence.
*   We do not query "The Truth."
*   We query through "Lenses" (Consensus, Recency, Authority).

## 2. The Core Data Model: The Hyper-Edge

The atomic unit of Episteme is not a Row, Document, or Embedding. It is the **Signed Assertion**.

```rust
struct Assertion {
    // The Proposition (The "What")
    subject: EntityId,       // e.g., "Tesla_Inc"
    predicate: RelationId,   // "has_annual_revenue"
    object: Value,           // e.g., "$96.7B"

    // The Meta-Cognition (The "Why")
    confidence: f32,         // 0.0 to 1.0 (Agent's subjective certainty)
    source_hash: Hash,       // Content-addressed link to source (PDF, URL, Log)
    agent_id: PublicKey,     // Who made this claim? (Public key that verifies the signature)
    timestamp: u64,          // When?

    // The Semantic Vector (The "Meaning")
    vector: Vec<f32>,        // Embedding for semantic navigation
}
```
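The `confidence` field is documented as lying between 0.0 and 1.0, but a bare `f32` cannot enforce that. One way to guard the invariant, as a minimal sketch: the `Confidence` newtype below is a hypothetical helper and not part of the struct above.

```rust
/// Hypothetical newtype that only ever holds a value in [0.0, 1.0].
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct Confidence(f32);

impl Confidence {
    /// Returns `None` for out-of-range values and for NaN.
    /// NaN is rejected because every comparison against NaN is false.
    fn new(value: f32) -> Option<Self> {
        if (0.0..=1.0).contains(&value) {
            Some(Confidence(value))
        } else {
            None
        }
    }

    /// Returns the wrapped value.
    fn get(self) -> f32 {
        self.0
    }
}
```

With a type like this, an out-of-range confidence is rejected at construction time rather than discovered at read time.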

### 2.1. Non-Destructive Writes
Episteme is an **Append-Only Merkle DAG**.
*   **Conflict is a Feature:** If Agent A claims X, and Agent B claims Y, the database holds *both* realities simultaneously.
*   **Traceability:** Every assertion links back to its parent assertion (when it modifies or refutes a previous claim) and to its source evidence (via `source_hash`).
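The two properties above can be sketched with a tiny in-memory log. This is an illustration under stated assumptions, not the storage engine: `Claim` and `ClaimLog` are hypothetical names, the claim is a simplified stand-in for `Assertion`, and `DefaultHasher` stands in for a cryptographic content hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified claim; not the full `Assertion` struct.
#[derive(Debug, Clone, PartialEq, Hash)]
struct Claim {
    subject: String,
    predicate: String,
    object: String,
    agent: String,
    /// Id of the claim this one modifies or refutes, if any.
    parent: Option<u64>,
}

/// Append-only log: claims are pushed, never updated or removed.
#[derive(Default)]
struct ClaimLog {
    entries: Vec<(u64, Claim)>,
}

impl ClaimLog {
    /// Appends a claim and returns its content-derived id.
    /// The parent id is part of the hashed content, so each id
    /// commits to the chain of claims behind it (Merkle-style).
    /// Assumption: `DefaultHasher` is a placeholder, not a secure hash.
    fn append(&mut self, claim: Claim) -> u64 {
        let mut hasher = DefaultHasher::new();
        claim.hash(&mut hasher);
        let id = hasher.finish();
        self.entries.push((id, claim));
        id
    }

    /// Returns every claim about (subject, predicate), conflicts included.
    fn claims_about(&self, subject: &str, predicate: &str) -> Vec<&Claim> {
        self.entries
            .iter()
            .map(|(_, claim)| claim)
            .filter(|c| c.subject == subject && c.predicate == predicate)
            .collect()
    }
}
```

Appending `$10M` from one agent and `$12M` from another leaves both entries in the log; nothing is overwritten, and the second claim can point at the first through `parent`.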

## 3. The Query Engine: "Truth Lenses"

Because the database holds conflicting realities, "Reading" is a compute-heavy operation. You cannot just `GET key`. You must apply a **Lens**.

A **Lens** is a compiled WASM filter that resolves the set of conflicting assertions into a concrete answer at read time.

### Standard Lenses
1.  **Lens::Consensus:** "Return the value with the highest cluster density across all agents." (Democratic Truth)
2.  **Lens::Authority:** "Return values signed by Agents with `Reputation > 900`." (Expert Truth)
3.  **Lens::Recency:** "Return the latest assertion, ignoring history." (News)
4.  **Lens::Skeptic:** "Return the *variance* between claims." (Finds controversy/ambiguity)
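The four lenses can be sketched as plain Rust functions over numeric claims. This is a simplification under stated assumptions: real lenses are described above as WASM filters, `NumericClaim` is a hypothetical type, "cluster density" is approximated by counting claims within a tolerance, and the reputation scale is taken as given.

```rust
/// Hypothetical numeric claim used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct NumericClaim {
    value: f64,
    reputation: u32, // reputation of the signing agent (scale assumed)
    timestamp: u64,
}

/// Consensus: the value supported by the most claims within `tolerance`.
/// On ties, the later claim in the slice wins.
fn consensus(claims: &[NumericClaim], tolerance: f64) -> Option<f64> {
    claims
        .iter()
        .map(|c| {
            let support = claims
                .iter()
                .filter(|o| (o.value - c.value).abs() <= tolerance)
                .count();
            (support, c.value)
        })
        .max_by_key(|pair| pair.0)
        .map(|pair| pair.1)
}

/// Authority: values signed by agents above a reputation threshold.
fn authority(claims: &[NumericClaim], min_reputation: u32) -> Vec<f64> {
    claims
        .iter()
        .filter(|c| c.reputation > min_reputation)
        .map(|c| c.value)
        .collect()
}

/// Recency: the value of the claim with the largest timestamp.
fn recency(claims: &[NumericClaim]) -> Option<f64> {
    claims.iter().max_by_key(|c| c.timestamp).map(|c| c.value)
}

/// Skeptic: population variance of the claimed values.
fn skeptic(claims: &[NumericClaim]) -> Option<f64> {
    if claims.is_empty() {
        return None;
    }
    let n = claims.len() as f64;
    let mean = claims.iter().map(|c| c.value).sum::<f64>() / n;
    Some(claims.iter().map(|c| (c.value - mean).powi(2)).sum::<f64>() / n)
}
```

The same three claims can yield different answers depending on the lens, which is the point: the store keeps every claim, and the reader chooses the resolution policy.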

## 4. Features for the AI Scientist

### 4.1. "Forking Reality" (Branching)
Agents need to simulate futures ("What if inflation hits 5%?"). Episteme supports **Copy-on-Write Branching**.
*   An Agent creates a `Scenario Branch`.
*   It inserts hypothetical assertions (`Inflation = 5%`).
*   It queries for 2nd-order effects.
*   The Main Branch remains unpolluted.
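The branching steps above can be approximated with an overlay: a fork copies nothing, reads fall through to the parent, and writes land only in the branch-local layer. A minimal sketch, assuming a hypothetical `Branch` type keyed by plain strings:

```rust
use std::collections::HashMap;

/// Hypothetical branch: local writes layered over an optional parent.
struct Branch<'a> {
    parent: Option<&'a Branch<'a>>,
    local: HashMap<String, Vec<String>>,
}

impl<'a> Branch<'a> {
    /// Creates a root branch with no parent.
    fn root() -> Self {
        Branch { parent: None, local: HashMap::new() }
    }

    /// Forks a child branch. No data is copied; the child borrows the parent,
    /// so the parent cannot be mutated while the child is alive.
    fn fork(&self) -> Branch<'_> {
        Branch { parent: Some(self), local: HashMap::new() }
    }

    /// Appends a claim to this branch only.
    fn assert_claim(&mut self, key: &str, value: &str) {
        self.local
            .entry(key.to_string())
            .or_default()
            .push(value.to_string());
    }

    /// Returns inherited claims first, then claims local to this branch.
    fn claims(&self, key: &str) -> Vec<&str> {
        let mut out = match self.parent {
            Some(parent) => parent.claims(key),
            None => Vec::new(),
        };
        if let Some(values) = self.local.get(key) {
            out.extend(values.iter().map(|v| v.as_str()));
        }
        out
    }
}
```

A scenario branch that asserts `Inflation = 5%` sees both the inherited and the hypothetical claim, while the main branch still sees only its own.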

### 4.2. TrustRank (Reputation Markets)
We implement a recursive PageRank-style algorithm for **Source Credibility**.
1.  **Validation:** If an Agent's claim is later verified by Ground Truth (e.g., an earnings call), their Reputation Score (`R`) increases.
2.  **Back-Propagation:** High-`R` agents confer weight to the sources they cite.
3.  **Decay:** Claims from low-`R` agents fade faster from the "Hot" tier.
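The three mechanisms above can be sketched in a single pass. This is deliberately simpler than a recursive PageRank-style computation, and every detail is an assumption made for illustration: the `Reputation` type, the [0, 1] score scale, the 0.5 starting score, and the linear update and decay rules.

```rust
use std::collections::HashMap;

/// Hypothetical reputation table; scores are assumed to lie in [0.0, 1.0].
#[derive(Default)]
struct Reputation {
    scores: HashMap<String, f64>,
}

impl Reputation {
    /// Assumed starting score for agents with no history.
    const INITIAL: f64 = 0.5;

    /// Current score of an agent.
    fn score(&self, agent: &str) -> f64 {
        self.scores.get(agent).copied().unwrap_or(Self::INITIAL)
    }

    /// Validation: move the score toward 1.0 if the claim was verified,
    /// toward 0.0 if it was refuted. `rate` is the step size in (0, 1].
    fn record_outcome(&mut self, agent: &str, verified: bool, rate: f64) {
        let score = self
            .scores
            .entry(agent.to_string())
            .or_insert(Self::INITIAL);
        let target = if verified { 1.0 } else { 0.0 };
        *score += rate * (target - *score);
    }

    /// Back-propagation: a source's weight is the summed reputation
    /// of the agents that cite it.
    fn source_weight(&self, citing_agents: &[&str]) -> f64 {
        citing_agents.iter().map(|agent| self.score(agent)).sum()
    }

    /// Decay: time a claim stays in the hot tier, scaled by reputation,
    /// so claims from low-reputation agents expire sooner.
    fn hot_ttl_secs(&self, agent: &str, base_ttl_secs: f64) -> f64 {
        base_ttl_secs * self.score(agent)
    }
}
```

A full implementation would iterate the back-propagation step until scores converge; the sketch stops after one pass to keep the three rules visible.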

## 5. Architecture: The Rust Stack

Episteme follows **"Defensive by Default"** practices.

### Tier 1: The Spine (Durability)
*   **Component:** `episteme-wal` (Implementing the Quarantine Journal pattern)
*   **Role:** Raw, serialized append-only log. Ensures we never lose a claim.
*   **Format:** Binary `Record` with BLAKE3 checksums.
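A checksummed binary record can be framed as length, checksum, then payload. The sketch below is not the `episteme-wal` format: the frame layout and function names are assumptions, and FNV-1a is swapped in for BLAKE3 so the example needs only the standard library.

```rust
/// FNV-1a 64-bit hash. Stand-in for BLAKE3; not collision-resistant.
fn checksum(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in bytes {
        hash ^= *byte as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Size of the assumed header: 4-byte length plus 8-byte checksum.
const HEADER_LEN: usize = 12;

/// Encodes a payload as [len: u32 LE][checksum: u64 LE][payload].
/// Assumes the payload is shorter than 4 GiB.
fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(HEADER_LEN + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    frame.extend_from_slice(&checksum(payload).to_le_bytes());
    frame.extend_from_slice(payload);
    frame
}

/// Decodes one frame, returning `None` if it is truncated or corrupted.
fn decode_record(frame: &[u8]) -> Option<&[u8]> {
    if frame.len() < HEADER_LEN {
        return None;
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&frame[0..4]);
    let mut sum_bytes = [0u8; 8];
    sum_bytes.copy_from_slice(&frame[4..12]);

    let len = u32::from_le_bytes(len_bytes) as usize;
    let expected = u64::from_le_bytes(sum_bytes);
    let end = HEADER_LEN.checked_add(len)?;
    let payload = frame.get(HEADER_LEN..end)?;

    if checksum(payload) == expected {
        Some(payload)
    } else {
        None
    }
}
```

On replay, a frame that fails to decode marks the point where the log was torn or corrupted, so everything before it can still be trusted.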

### Tier 2: The Lattice (Graph/Index)
*   **Component:** `episteme-core`
*   **Role:** The Hot/Warm memory.
*   **Hot Tier:** `DashMap` of active contradiction clusters.
*   **Warm Tier:** `sled` (embedded log-structured key-value store) for the Merkle DAG + `hnsw` for vector search.

### Tier 3: The Cortex (Compute)
*   **Component:** `episteme-lens`
*   **Role:** The WASM runtime for executing Lenses.
*   **Function:** Collapses the probabilistic graph into deterministic answers for the client.

## 6. The Ecosystem Triad

Episteme completes the Intelligence Stack:

| System | Biological Analogy | Function | Question Answered |
| :--- | :--- | :--- | :--- |
| **LogDB** | **The Spine** | Immutable Event Log | "What happened?" |
| **AssociativeDB** | **The Hippocampus** | Associative Memory | "What is this like?" |
| **Episteme** | **The Cortex** | Structured Reasoning | "Is this true?" |