Episteme: The Probabilistic Knowledge Lattice
- Internal Codename: StemeDB
- Category: Infrastructure / Database
- Role: The Cortex (Reasoning & Truth)
1. The Manifesto: "Git for Truth"
We are building the shared, long-term memory for autonomous research agents.
Current databases (Postgres, Neo4j, Vector DBs) suffer from the Tower of Babel problem: they store Data, not Evidence. They are deterministic, stateless, and brittle. If one Agent writes `Revenue = $10M` and another writes `Revenue = $12M`, one must overwrite the other. History is lost. Truth is flattened.
Episteme rejects the idea of a single, static "database state." Instead, it models knowledge as a Probabilistic Lattice of Assertions.
- We do not store "Facts."
- We store "Claims."
- We do not "Update" records.
- We "Append" new evidence.
- We do not query "The Truth."
- We query through "Lenses" (Consensus, Recency, Authority).
2. The Core Data Model: The Hyper-Edge
The atomic unit of Episteme is not a Row, Document, or Embedding. It is the Signed Assertion.
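```rust
// Placeholder aliases so the struct compiles on its own (assumed for
// illustration; the real crate may define these types differently).
type EntityId = String;
type RelationId = String;
type Value = String;
type Hash = [u8; 32];
type PublicKey = [u8; 32];

struct Assertion {
    // The Proposition (The "What")
    subject: EntityId,     // e.g., "Tesla_Inc"
    predicate: RelationId, // e.g., "has_annual_revenue"
    object: Value,         // e.g., "$96.7B"

    // The Meta-Cognition (The "Why")
    confidence: f32,       // 0.0 to 1.0 (the Agent's subjective certainty)
    source_hash: Hash,     // content-addressed link to the source (PDF, URL, log)
    agent_id: PublicKey,   // who made this claim (cryptographic signature)
    timestamp: u64,        // when the claim was made

    // The Semantic Vector (The "Meaning")
    vector: Vec<f32>,      // embedding for semantic navigation
}
```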
2.1. Non-Destructive Writes
Episteme is an Append-Only Merkle DAG.
- Conflict is a Feature: If Agent A claims X, and Agent B claims Y, the database holds both realities simultaneously.
- Traceability: Every assertion links back to its parent (if it modifies/refutes a previous claim) and its source (evidence).
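To make the append-only model concrete, here is a minimal sketch that assumes a simplified `Claim` type (not the `Assertion` struct above). It uses `std`'s `DefaultHasher` as a non-cryptographic stand-in for a real content hash. Each claim's ID is a hash over its fields, including the ID of the parent it refutes. The store exposes `append` but no update or delete, so both conflicting revenue figures survive. `Log`, `claim`, `content_id`, and `claims_about` are illustrative names, and the company and values are made up.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Content address of a claim. `DefaultHasher` is a non-cryptographic stand-in;
/// a real Merkle DAG would use a cryptographic hash such as BLAKE3.
type ClaimId = u64;

#[derive(Debug, Clone, Hash)]
struct Claim {
    subject: String,
    predicate: String,
    object: String,
    agent: String,
    parent: Option<ClaimId>, // the claim this one modifies or refutes, if any
}

/// Hashes every field, including the parent link, so an ID commits to its ancestry.
fn content_id(claim: &Claim) -> ClaimId {
    let mut hasher = DefaultHasher::new();
    claim.hash(&mut hasher);
    hasher.finish()
}

fn claim(subject: &str, predicate: &str, object: &str, agent: &str, parent: Option<ClaimId>) -> Claim {
    Claim {
        subject: subject.into(),
        predicate: predicate.into(),
        object: object.into(),
        agent: agent.into(),
        parent,
    }
}

/// Append-only store: there is deliberately no update or delete method.
#[derive(Default)]
struct Log {
    entries: Vec<(ClaimId, Claim)>,
}

impl Log {
    fn append(&mut self, claim: Claim) -> ClaimId {
        let id = content_id(&claim);
        self.entries.push((id, claim));
        id
    }

    /// Every claim about (subject, predicate); conflicting values coexist.
    fn claims_about(&self, subject: &str, predicate: &str) -> Vec<&Claim> {
        self.entries
            .iter()
            .map(|(_, c)| c)
            .filter(|c| c.subject == subject && c.predicate == predicate)
            .collect()
    }
}

fn main() {
    let mut log = Log::default();
    let a = log.append(claim("Acme_Inc", "has_annual_revenue", "$10M", "agent_a", None));
    // Agent B refutes A by appending a child claim; A stays in the log.
    log.append(claim("Acme_Inc", "has_annual_revenue", "$12M", "agent_b", Some(a)));
    for c in log.claims_about("Acme_Inc", "has_annual_revenue") {
        println!("{} says {} (parent: {:?})", c.agent, c.object, c.parent);
    }
}
```

Because the parent ID is part of what gets hashed, rewriting an old claim would change its ID and orphan every child that points to it. That tamper-evidence is the property a Merkle DAG relies on.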
3. The Query Engine: "Truth Lenses"
Because the database holds conflicting realities, "Reading" is a compute-heavy operation. You cannot simply `GET` a key; you must apply a Lens.
A Lens is a compiled WASM filter that resolves the probability field into a concrete answer at Read Time.
Standard Lenses
- Lens::Consensus: "Return the value with the highest cluster density across all agents." (Democratic Truth)
- Lens::Authority: "Return values signed by Agents with `Reputation > 900`." (Expert Truth)
- Lens::Recency: "Return the latest assertion, ignoring history." (News)
- Lens::Skeptic: "Return the variance between claims." (Finds controversy/ambiguity)
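The four standard lenses can be sketched as plain Rust over an in-memory slice of claims. This illustrates only the read-time resolution logic, not the WASM packaging described above. It assumes numeric claim values (so that variance is defined), a reputation score attached to each claim, and the threshold of 900 from the Authority lens. Consensus here counts exact matches, a simplification of cluster density; near-equal values would need a real clustering step. `Claim`, `Answer`, and `apply` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// A claim reduced to what the lenses need. The numeric value (e.g. revenue
/// in $M) and the reputation scale are assumptions for this sketch.
#[derive(Debug, Clone, Copy)]
struct Claim {
    value: i64,
    agent_reputation: u32,
    timestamp: u64,
}

#[derive(Debug)]
enum Lens {
    Consensus,
    Authority { min_reputation: u32 },
    Recency,
    Skeptic,
}

#[derive(Debug, PartialEq)]
enum Answer {
    Value(i64),
    Values(Vec<i64>),
    Variance(f64),
}

/// Resolves a set of conflicting claims into one answer at read time.
fn apply(lens: &Lens, claims: &[Claim]) -> Option<Answer> {
    if claims.is_empty() {
        return None;
    }
    match lens {
        // Most frequently claimed value. Ties go to the larger value (BTreeMap order).
        Lens::Consensus => {
            let mut counts: BTreeMap<i64, usize> = BTreeMap::new();
            for c in claims {
                *counts.entry(c.value).or_insert(0) += 1;
            }
            counts.into_iter().max_by_key(|&(_, n)| n).map(|(v, _)| Answer::Value(v))
        }
        // Every value signed by an agent above the reputation threshold.
        Lens::Authority { min_reputation } => {
            let values: Vec<i64> = claims
                .iter()
                .filter(|c| c.agent_reputation > *min_reputation)
                .map(|c| c.value)
                .collect();
            if values.is_empty() { None } else { Some(Answer::Values(values)) }
        }
        // The single newest claim; older history is ignored.
        Lens::Recency => claims.iter().max_by_key(|c| c.timestamp).map(|c| Answer::Value(c.value)),
        // Population variance of the claimed values: 0.0 means no disagreement.
        Lens::Skeptic => {
            let n = claims.len() as f64;
            let mean = claims.iter().map(|c| c.value as f64).sum::<f64>() / n;
            let var = claims.iter().map(|c| (c.value as f64 - mean).powi(2)).sum::<f64>() / n;
            Some(Answer::Variance(var))
        }
    }
}

fn main() {
    let claims = [
        Claim { value: 10, agent_reputation: 950, timestamp: 3 },
        Claim { value: 12, agent_reputation: 400, timestamp: 1 },
        Claim { value: 12, agent_reputation: 920, timestamp: 2 },
    ];
    for lens in [Lens::Consensus, Lens::Authority { min_reputation: 900 }, Lens::Recency, Lens::Skeptic] {
        println!("{:?} -> {:?}", lens, apply(&lens, &claims));
    }
}
```

On the same three claims, the lenses disagree by design: Consensus and Recency return different values, which is exactly the information a single overwritten row would have lost.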
4. Features for the AI Scientist
4.1. "Forking Reality" (Branching)
Agents need to simulate futures ("What if inflation hits 5%?"). Episteme supports Copy-on-Write Branching.
- An Agent creates a `Scenario Branch`.
- It inserts hypothetical assertions (`Inflation = 5%`).
- It queries for 2nd-order effects.
- The Main Branch remains unpolluted.
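A minimal sketch of the branching idea, assuming branches are append-only overlays. Forking stores an `Arc` pointer to the parent instead of copying its claims, and reads on the scenario combine inherited and local claims. This is the overlay form of copy-on-write: nothing is copied at fork time, and writes land only in the child. `Branch`, `fork`, and the inflation figures are hypothetical.

```rust
use std::sync::Arc;

/// An append-only overlay on a frozen, shared parent snapshot.
#[derive(Default)]
struct Branch {
    parent: Option<Arc<Branch>>,
    local: Vec<(String, String, String)>, // (subject, predicate, object) written on this branch
}

/// Forking clones an `Arc` pointer, not the parent's data.
fn fork(parent: &Arc<Branch>) -> Branch {
    Branch { parent: Some(Arc::clone(parent)), local: Vec::new() }
}

impl Branch {
    fn append(&mut self, subject: &str, predicate: &str, object: &str) {
        self.local.push((subject.into(), predicate.into(), object.into()));
    }

    /// Claims visible on this branch: inherited ones first, then local ones.
    fn claims(&self, subject: &str, predicate: &str) -> Vec<&str> {
        let mut out = match &self.parent {
            Some(p) => p.claims(subject, predicate),
            None => Vec::new(),
        };
        out.extend(
            self.local
                .iter()
                .filter(|(s, p, _)| s == subject && p == predicate)
                .map(|(_, _, o)| o.as_str()),
        );
        out
    }
}

fn main() {
    let mut main_branch = Branch::default();
    main_branch.append("US", "inflation_rate", "3%");
    let main_branch = Arc::new(main_branch); // frozen once shared

    let mut scenario = fork(&main_branch);
    scenario.append("US", "inflation_rate", "5%"); // hypothetical assertion

    println!("scenario: {:?}", scenario.claims("US", "inflation_rate"));
    println!("main:     {:?}", main_branch.claims("US", "inflation_rate"));
}
```

Sharing the parent through `Arc` makes it immutable while any scenario holds it, which is one simple way to guarantee the main branch stays unpolluted.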
4.2. TrustRank (Reputation Markets)
We implement a recursive PageRank-style algorithm for Source Credibility.
- Validation: If an Agent's claim is later verified by Ground Truth (e.g., an earnings call), their Reputation Score (`R`) increases.
- Back-Propagation: High-`R` agents confer weight to the sources they cite.
- Decay: Claims from low-`R` agents fade faster from the "Hot" tier.
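One way to realize this is personalized PageRank, sketched here under stated assumptions. Agents and sources are nodes in one graph, and `edges[i]` lists what node `i` cites. Validation history enters as the `prior` teleport vector, so agents with verified claims keep receiving more weight, which they then pass to their sources. The decay rule is a separate assumption: a hot-tier weight that halves over a half-life proportional to the agent's reputation, normalized to [0, 1]. The function names, the graph, and the damping factor of 0.85 (the value commonly used for PageRank) are illustrative.

```rust
/// Personalized PageRank by power iteration. `edges[i]` lists the nodes that node
/// `i` cites; `prior` (summing to 1.0) encodes validation history.
fn trust_rank(edges: &[Vec<usize>], prior: &[f64], damping: f64, iters: usize) -> Vec<f64> {
    let mut rank = prior.to_vec();
    for _ in 0..iters {
        // Teleport mass goes to nodes in proportion to their validation prior.
        let mut next: Vec<f64> = prior.iter().map(|p| (1.0 - damping) * p).collect();
        for (i, outs) in edges.iter().enumerate() {
            if outs.is_empty() {
                // Dangling node (cites nothing): redistribute its weight by the prior.
                for (j, p) in prior.iter().enumerate() {
                    next[j] += damping * rank[i] * p;
                }
            } else {
                // High-rank nodes pass more weight to whatever they cite.
                let share = damping * rank[i] / outs.len() as f64;
                for &j in outs {
                    next[j] += share;
                }
            }
        }
        rank = next;
    }
    rank
}

/// Hot-tier weight that halves every `half_life` seconds. The half-life shrinks
/// with the agent's reputation (assumed normalized to [0, 1]), so low-R claims fade faster.
fn hot_weight(age_secs: f64, reputation: f64, base_half_life_secs: f64) -> f64 {
    let half_life = base_half_life_secs * reputation.clamp(0.01, 1.0);
    0.5_f64.powf(age_secs / half_life)
}

fn main() {
    // Nodes: 0 = agent_a, 1 = agent_b, 2 = source_x, 3 = source_y (hypothetical).
    let edges = vec![vec![2], vec![2, 3], vec![], vec![]];
    let prior = [0.4, 0.4, 0.1, 0.1];
    let rank = trust_rank(&edges, &prior, 0.85, 50);
    println!("ranks: {:?}", rank);
    println!("weight after 1h at R = 0.2: {:.3}", hot_weight(3600.0, 0.2, 3600.0));
}
```

The update conserves total rank (it stays equal to the prior's sum), so scores remain comparable across iterations and can be read as shares of trust.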
5. Architecture: The Rust Stack
Episteme follows the "Defensive by Default" best practices.
Tier 1: The Spine (Durability)
- Component: `episteme-wal` (implementing the Quarantine Journal pattern)
- Role: Raw, serialized append-only log. Ensures we never lose a claim.
- Format: Binary `Record` with BLAKE3 checksums.
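A sketch of record framing and replay verification, under two assumptions. First, it assumes a layout of a 4-byte length, an 8-byte checksum, and the payload. Second, it assumes the Quarantine Journal pattern means records that fail verification are set aside rather than applied. To stay dependency-free it uses 64-bit FNV-1a in place of BLAKE3; in the design described here, BLAKE3 (for example via the `blake3` crate) would take that role. `encode`, `replay`, and `HEADER` are hypothetical names.

```rust
/// 64-bit FNV-1a, a simple non-cryptographic stand-in for the BLAKE3 checksum.
fn checksum(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

const HEADER: usize = 12; // u32 payload length + u64 checksum, little-endian

/// Appends one framed record: [len][checksum][payload].
fn encode(payload: &[u8], out: &mut Vec<u8>) {
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out.extend_from_slice(payload);
}

/// Replays a log. Verified records are returned for applying, records whose
/// checksum fails are quarantined, and a truncated tail (torn write) ends replay.
fn replay(mut buf: &[u8]) -> (Vec<Vec<u8>>, Vec<Vec<u8>>) {
    let (mut good, mut quarantined) = (Vec::new(), Vec::new());
    while buf.len() >= HEADER {
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(&buf[0..4]);
        let mut sum_bytes = [0u8; 8];
        sum_bytes.copy_from_slice(&buf[4..HEADER]);
        let len = u32::from_le_bytes(len_bytes) as usize;
        if buf.len() < HEADER + len {
            break;
        }
        let payload = buf[HEADER..HEADER + len].to_vec();
        if checksum(&payload) == u64::from_le_bytes(sum_bytes) {
            good.push(payload);
        } else {
            quarantined.push(payload);
        }
        buf = &buf[HEADER + len..];
    }
    (good, quarantined)
}

fn main() {
    let mut log = Vec::new();
    encode(b"claim-1", &mut log);
    encode(b"claim-2", &mut log);
    log[HEADER + 7 + HEADER] ^= 0xFF; // flip a byte inside claim-2's payload
    encode(b"claim-3", &mut log);
    log.extend_from_slice(&[5, 0, 0]); // torn partial header at the tail
    let (good, quarantined) = replay(&log);
    println!("applied {} records, quarantined {}", good.len(), quarantined.len());
}
```

In this sketch the length field is not covered by the checksum, so a corrupted length can misframe every record after it. Covering the header and syncing the file to disk (e.g. `File::sync_all`) before acknowledging a write are the usual next steps toward the "never lose a claim" guarantee.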
Tier 2: The Lattice (Graph/Index)
- Component: `episteme-core`
- Role: The Hot/Warm memory.
- Hot Tier: `DashMap` of active contradiction clusters.
- Warm Tier: `sled` (an embedded, log-structured key-value store) for the Merkle DAG, plus `hnsw` for vector search.
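The Hot Tier's bookkeeping can be sketched with the standard library alone. The sketch assumes a contradiction cluster is the set of claims sharing a (subject, predicate) key, and that a cluster is contradictory when it holds more than one distinct value. `RwLock<HashMap<..>>` stands in for `DashMap`, which offers a similar concurrent map with sharded locking. `HotTier`, `record`, and `contradictions` are hypothetical names.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::RwLock;

type Key = (String, String); // (subject, predicate)

/// Stand-in for a `DashMap<Key, Cluster>`: one RwLock around a HashMap.
/// DashMap shards its locks to reduce contention; this sketch does not.
#[derive(Default)]
struct HotTier {
    // For each key: claimed value -> number of claims supporting it.
    clusters: RwLock<HashMap<Key, BTreeMap<String, usize>>>,
}

impl HotTier {
    /// Takes `&self`, so many writer threads can share one HotTier.
    fn record(&self, subject: &str, predicate: &str, object: &str) {
        let mut map = self.clusters.write().unwrap();
        let cluster = map.entry((subject.to_string(), predicate.to_string())).or_default();
        *cluster.entry(object.to_string()).or_insert(0) += 1;
    }

    /// Keys whose claims disagree (more than one distinct value), sorted.
    fn contradictions(&self) -> Vec<Key> {
        let map = self.clusters.read().unwrap();
        let mut keys: Vec<Key> = map
            .iter()
            .filter(|(_, values)| values.len() > 1)
            .map(|(k, _)| k.clone())
            .collect();
        keys.sort();
        keys
    }
}

fn main() {
    let hot = HotTier::default();
    std::thread::scope(|s| {
        s.spawn(|| hot.record("Acme_Inc", "has_annual_revenue", "$10M"));
        s.spawn(|| hot.record("Acme_Inc", "has_annual_revenue", "$12M"));
        s.spawn(|| hot.record("Acme_Inc", "ceo", "J. Doe"));
    });
    println!("contradicted keys: {:?}", hot.contradictions());
}
```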
Tier 3: The Cortex (Compute)
- Component: `episteme-lens`
- Role: The WASM runtime for executing Lenses.
- Function: Collapses the probabilistic graph into deterministic answers for the client.
6. The Ecosystem Triad
Episteme completes the Intelligence Stack:
| System | Biological Analogy | Function | Question Answered |
|---|---|---|---|
| LogDB | The Spine | Immutable Event Log | "What happened?" |
| AssociativeDB | The Hippocampus | Associative Memory | "What is this like?" |
| Episteme | The Cortex | Structured Reasoning | "Is this true?" |