
Episteme: The Probabilistic Knowledge Lattice

Internal Codename: StemeDB
Category: Infrastructure / Database
Role: The Cortex (Reasoning & Truth)

1. The Manifesto: "Git for Truth"

We are building the shared, long-term memory for autonomous research agents.

Current databases (Postgres, Neo4j, vector DBs) suffer from the Tower of Babel problem: they store Data, not Evidence. They are deterministic, keep only the latest state, and are brittle. If one Agent writes Revenue = $10M and another writes Revenue = $12M, one write must overwrite the other. History is lost. Truth is flattened.

Episteme rejects the idea of a single, static "database state." Instead, it models knowledge as a Probabilistic Lattice of Assertions.

  • We do not store "Facts."
  • We store "Claims."
  • We do not "Update" records.
  • We "Append" new evidence.
  • We do not query "The Truth."
  • We query through "Lenses" (Consensus, Recency, Authority).

2. The Core Data Model: The Hyper-Edge

The atomic unit of Episteme is not a Row, Document, or Embedding. It is the Signed Assertion.

struct Assertion {
    // The Proposition (The "What")
    subject: EntityId,       // e.g., "Tesla_Inc"
    predicate: RelationId,   // "has_annual_revenue"
    object: Value,           // e.g., "$96.7B"

    // The Meta-Cognition (The "Why")
    confidence: f32,         // 0.0 to 1.0 (Agent's subjective certainty)
    source_hash: Hash,       // Content-addressed link to source (PDF, URL, Log)
    agent_id: PublicKey,     // Who made this claim? (Public key of the signing agent)
    timestamp: u64,          // When?
    
    // The Semantic Vector (The "Meaning")
    vector: Vec<f32>,        // Embedding for semantic navigation
}
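
The struct above leaves its field types undefined. The following is a minimal sketch of one way to make it compile, not the project's actual definition. The type aliases, the `Value` enum, and the `Assertion::new` constructor are all assumptions introduced for illustration. The constructor enforces the documented 0.0 to 1.0 range for `confidence`.

```rust
// Placeholder types: hypothetical stand-ins, not defined by the source document.
pub type EntityId = String;
pub type RelationId = String;
pub type Hash = [u8; 32];
pub type PublicKey = [u8; 32];

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Text(String),
    Number(f64),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub subject: EntityId,
    pub predicate: RelationId,
    pub object: Value,
    pub confidence: f32,
    pub source_hash: Hash,
    pub agent_id: PublicKey,
    pub timestamp: u64,
    pub vector: Vec<f32>,
}

impl Assertion {
    /// Builds an assertion, rejecting confidence values outside 0.0..=1.0
    /// (NaN is rejected as well, because it is not contained in the range).
    pub fn new(
        subject: &str,
        predicate: &str,
        object: Value,
        confidence: f32,
        source_hash: Hash,
        agent_id: PublicKey,
        timestamp: u64,
        vector: Vec<f32>,
    ) -> Result<Self, String> {
        if !(0.0..=1.0).contains(&confidence) {
            return Err(format!("confidence {} is outside 0.0..=1.0", confidence));
        }
        Ok(Assertion {
            subject: subject.to_string(),
            predicate: predicate.to_string(),
            object,
            confidence,
            source_hash,
            agent_id,
            timestamp,
            vector,
        })
    }
}
```

Validating at construction time keeps malformed confidence values out of the log, which matters in an append-only system where a bad record cannot be edited later.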

2.1. Non-Destructive Writes

Episteme is an Append-Only Merkle DAG.

  • Conflict is a Feature: If Agent A claims X, and Agent B claims Y, the database holds both realities simultaneously.
  • Traceability: Every assertion links back to its parent (if it modifies/refutes a previous claim) and its source (evidence).
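
The two properties above can be sketched with an in-memory log. This is an illustration under stated assumptions, not the planned storage engine. `Claim`, `AppendOnlyLog`, and the `parent` field are hypothetical names, and `DefaultHasher` from the standard library stands in for the cryptographic hash a real Merkle DAG would need.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Identifier derived from a claim's content. DefaultHasher is NOT
/// cryptographic; it is only a dependency-free stand-in for this sketch.
pub type ClaimId = u64;

#[derive(Debug, Clone, Hash, PartialEq)]
pub struct Claim {
    pub agent: String,
    pub subject: String,
    pub predicate: String,
    pub object: String,
    /// The claim this one modifies or refutes, if any (hypothetical field).
    pub parent: Option<ClaimId>,
}

#[derive(Default)]
pub struct AppendOnlyLog {
    entries: Vec<(ClaimId, Claim)>,
}

impl AppendOnlyLog {
    /// Appends a claim and returns its content-derived id.
    /// Existing entries are never modified or removed.
    pub fn append(&mut self, claim: Claim) -> ClaimId {
        let mut hasher = DefaultHasher::new();
        // The parent id is part of the hashed content, so an id
        // transitively commits to the chain of claims behind it.
        claim.hash(&mut hasher);
        let id = hasher.finish();
        self.entries.push((id, claim));
        id
    }

    /// Returns every claim recorded for a subject/predicate pair, in append order.
    pub fn claims_about(&self, subject: &str, predicate: &str) -> Vec<&Claim> {
        self.entries
            .iter()
            .map(|(_, c)| c)
            .filter(|c| c.subject == subject && c.predicate == predicate)
            .collect()
    }
}
```

Appending a $10M revenue claim from one agent and a $12M claim from another leaves both retrievable through `claims_about`, with the second pointing at the first through `parent`.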

3. The Query Engine: "Truth Lenses"

Because the database holds conflicting realities, "Reading" is a compute-heavy operation. You cannot just GET key. You must apply a Lens.

A Lens is a compiled WASM filter that resolves the probability field into a concrete answer at Read Time.

Standard Lenses

  1. Lens::Consensus: "Return the value with the highest cluster density across all agents." (Democratic Truth)
  2. Lens::Authority: "Return values signed by Agents with Reputation > 900." (Expert Truth)
  3. Lens::Recency: "Return the latest assertion, ignoring history." (News)
  4. Lens::Skeptic: "Return the variance between claims." (Finds controversy/ambiguity)
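
As a rough illustration of how the four standard lenses differ, here they are as plain Rust functions rather than compiled WASM filters. Everything in this sketch is an assumption: the simplified `Claim` type, numeric values, and the reading of "cluster density" as the most frequently asserted exact value. A real Consensus lens would presumably cluster near-equal values.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub value: i64,      // claimed value, e.g. revenue in $M (assumed numeric)
    pub reputation: u32, // reputation of the signing agent
    pub timestamp: u64,  // when the claim was made
}

/// Consensus: the most frequently asserted value; ties go to the smaller value.
pub fn consensus(claims: &[Claim]) -> Option<i64> {
    let mut counts: HashMap<i64, usize> = HashMap::new();
    for c in claims {
        *counts.entry(c.value).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(&a.0)))
        .map(|(value, _)| value)
}

/// Authority: values from agents whose reputation exceeds the threshold.
pub fn authority(claims: &[Claim], min_reputation: u32) -> Vec<i64> {
    claims
        .iter()
        .filter(|c| c.reputation > min_reputation)
        .map(|c| c.value)
        .collect()
}

/// Recency: the value of the claim with the largest timestamp.
pub fn recency(claims: &[Claim]) -> Option<i64> {
    claims.iter().max_by_key(|c| c.timestamp).map(|c| c.value)
}

/// Skeptic: population variance of the claimed values.
pub fn skeptic(claims: &[Claim]) -> Option<f64> {
    if claims.is_empty() {
        return None;
    }
    let n = claims.len() as f64;
    let mean = claims.iter().map(|c| c.value as f64).sum::<f64>() / n;
    let sum_sq: f64 = claims.iter().map(|c| (c.value as f64 - mean).powi(2)).sum();
    Some(sum_sq / n)
}
```

The same set of claims can yield different answers under each lens, which is why the choice of lens is left to the reader at query time.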

4. Features for the AI Scientist

4.1. "Forking Reality" (Branching)

Agents need to simulate futures ("What if inflation hits 5%?"). Episteme supports Copy-on-Write Branching.

  • An Agent creates a Scenario Branch.
  • It inserts hypothetical assertions (Inflation = 5%).
  • It queries for 2nd-order effects.
  • The Main Branch remains unpolluted.
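
One way to get this behavior is an overlay: a branch keeps a shared reference to its parent and stores only its own additions. The sketch below assumes that approach. The `Branch` type and its methods are hypothetical, and because the store is append-only, nothing is ever copied.

```rust
use std::collections::HashMap;
use std::rc::Rc;

pub struct Branch {
    /// Shared parent; it is never mutated through a child branch.
    parent: Option<Rc<Branch>>,
    /// Assertions added on this branch only, keyed by a name such as "Inflation".
    local: HashMap<String, Vec<String>>,
}

impl Branch {
    pub fn root() -> Self {
        Branch { parent: None, local: HashMap::new() }
    }

    /// Forks a scenario branch. The child only holds a reference to the parent.
    pub fn fork(parent: &Rc<Branch>) -> Self {
        Branch { parent: Some(Rc::clone(parent)), local: HashMap::new() }
    }

    /// Appends an assertion to this branch's local overlay.
    pub fn insert(&mut self, key: &str, value: &str) {
        self.local
            .entry(key.to_string())
            .or_insert_with(Vec::new)
            .push(value.to_string());
    }

    /// All values visible from this branch: inherited ones first, then local ones.
    pub fn query(&self, key: &str) -> Vec<String> {
        let mut out = match &self.parent {
            Some(p) => p.query(key),
            None => Vec::new(),
        };
        if let Some(values) = self.local.get(key) {
            out.extend(values.iter().cloned());
        }
        out
    }
}
```

A scenario branch that inserts Inflation = 5% sees both the inherited and the hypothetical value, while a query against the main branch returns only what was there before the fork.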

4.2. TrustRank (Reputation Markets)

We implement a recursive PageRank-style algorithm for Source Credibility.

  1. Validation: If an Agent's claim is later verified by Ground Truth (e.g., an earnings call), that Agent's Reputation Score (R) increases.
  2. Back-Propagation: High-R agents confer weight to the sources they cite.
  3. Decay: Claims from low-R agents fade faster from the "Hot" tier.
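
The three steps above can be sketched as a single, non-recursive pass. A PageRank-style algorithm would iterate something like this until scores converge. The names (`TrustLedger`, `hot_tier_weight`), the additive reward, and the half-life formula are all assumptions made for illustration; the document does not specify them.

```rust
use std::collections::HashMap;

pub struct TrustLedger {
    reputation: HashMap<String, f64>, // agent -> R
    citations: Vec<(String, String)>, // (agent, source) pairs
}

impl TrustLedger {
    pub fn new() -> Self {
        TrustLedger { reputation: HashMap::new(), citations: Vec::new() }
    }

    /// Records that `agent` cited `source` as evidence.
    pub fn cite(&mut self, agent: &str, source: &str) {
        self.citations.push((agent.to_string(), source.to_string()));
    }

    /// Validation: a claim confirmed by ground truth raises the agent's R.
    pub fn record_verified(&mut self, agent: &str, reward: f64) {
        *self.reputation.entry(agent.to_string()).or_insert(0.0) += reward;
    }

    /// Current R for an agent; unknown agents start at 0.0.
    pub fn reputation(&self, agent: &str) -> f64 {
        self.reputation.get(agent).cloned().unwrap_or(0.0)
    }

    /// Back-propagation: a source's weight is the summed R of the agents citing it.
    pub fn source_weight(&self, source: &str) -> f64 {
        self.citations
            .iter()
            .filter(|(_, s)| s.as_str() == source)
            .map(|(agent, _)| self.reputation(agent))
            .sum()
    }
}

/// Decay: exponential decay whose half-life grows with R, so claims from
/// low-R agents lose Hot-tier weight faster (assumed formula).
pub fn hot_tier_weight(reputation: f64, age: f64, base_half_life: f64) -> f64 {
    let half_life = base_half_life * (1.0 + reputation);
    0.5f64.powf(age / half_life)
}
```

With a base half-life of 10 time units, a claim from an agent with R = 0 retains half its weight after 10 units, while a claim from an agent with R = 3 retains noticeably more.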

5. Architecture: The Rust Stack

Episteme follows "Defensive by Default" best practices.

Tier 1: The Spine (Durability)

  • Component: episteme-wal (implements the Quarantine Journal pattern)
  • Role: Raw, serialized append-only log. Ensures we never lose a claim.
  • Format: Binary Record with BLAKE3 checksums.
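
A checksummed binary record can be sketched as a length-prefixed frame. The layout below (`[len: u32 LE][checksum: u64 LE][payload]`) is an assumption, not the episteme-wal format, and it does not attempt the Quarantine Journal pattern. To keep the example dependency-free, a 64-bit FNV-1a hash stands in for BLAKE3; a real implementation would use the BLAKE3 digest instead.

```rust
/// FNV-1a 64-bit hash, used only as a dependency-free stand-in for BLAKE3.
fn checksum(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Frames a payload as [len: u32 LE][checksum: u64 LE][payload].
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(12 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Reads one record from the front of `buf`. Returns the payload and the
/// number of bytes consumed, or None if the record is truncated or its
/// checksum does not match.
pub fn decode_record(buf: &[u8]) -> Option<(&[u8], usize)> {
    if buf.len() < 12 {
        return None;
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&buf[0..4]);
    let len = u32::from_le_bytes(len_bytes) as usize;

    let mut sum_bytes = [0u8; 8];
    sum_bytes.copy_from_slice(&buf[4..12]);
    let expected = u64::from_le_bytes(sum_bytes);

    let end = 12usize.checked_add(len)?;
    let payload = buf.get(12..end)?;
    if checksum(payload) == expected {
        Some((payload, end))
    } else {
        None
    }
}
```

On replay, a reader would call `decode_record` repeatedly, advancing by the returned byte count and stopping at the first record that fails to decode, which is typically a torn write at the tail of the log.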

Tier 2: The Lattice (Graph/Index)

  • Component: episteme-core
  • Role: The Hot/Warm memory.
  • Hot Tier: DashMap of active contradiction clusters.
  • Warm Tier: sled (an embedded, log-structured key-value store) for the Merkle DAG + hnsw for vector search.
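
The document does not define a "contradiction cluster". One plausible reading, assumed here, is the set of distinct objects claimed for the same (subject, predicate) pair. The sketch uses a standard `HashMap` in place of DashMap so that it has no dependencies. `HotTier` and its methods are hypothetical names.

```rust
use std::collections::{BTreeSet, HashMap};

/// Key of a cluster: the (subject, predicate) pair that competing claims share.
pub type ClusterKey = (String, String);

#[derive(Default)]
pub struct HotTier {
    /// Distinct objects claimed for each key.
    clusters: HashMap<ClusterKey, BTreeSet<String>>,
}

impl HotTier {
    /// Records a claimed object under its (subject, predicate) key.
    pub fn record(&mut self, subject: &str, predicate: &str, object: &str) {
        self.clusters
            .entry((subject.to_string(), predicate.to_string()))
            .or_insert_with(BTreeSet::new)
            .insert(object.to_string());
    }

    /// Keys with more than one distinct claimed object, sorted for stable output.
    pub fn contradictions(&self) -> Vec<&ClusterKey> {
        let mut keys: Vec<&ClusterKey> = self
            .clusters
            .iter()
            .filter(|(_, objects)| objects.len() > 1)
            .map(|(key, _)| key)
            .collect();
        keys.sort();
        keys
    }
}
```

Repeated identical claims collapse into one set entry, so only genuine disagreement marks a key as contradictory.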

Tier 3: The Cortex (Compute)

  • Component: episteme-lens
  • Role: The WASM runtime for executing Lenses.
  • Function: Collapses the probabilistic graph into deterministic answers for the client.

6. The Ecosystem Triad

Episteme completes the Intelligence Stack:

| System | Biological Analogy | Function | Question Answered |
| --- | --- | --- | --- |
| LogDB | The Spine | Immutable Event Log | "What happened?" |
| AssociativeDB | The Hippocampus | Associative Memory | "What is this like?" |
| Episteme | The Cortex | Structured Reasoning | "Is this true?" |