# Episteme: The Probabilistic Knowledge Lattice
> **Internal Codename:** StemeDB
> **Category:** Infrastructure / Database
> **Role:** The Cortex (Reasoning & Truth)

## 1. The Manifesto: "Git for Truth"

We are building the shared, long-term memory for autonomous research agents.

Current databases (Postgres, Neo4j, Vector DBs) suffer from **The Tower of Babel** problem: they store *Data*, not *Evidence*. They are deterministic and brittle, and they keep only the latest state. If one Agent writes `Revenue = $10M` and another writes `Revenue = $12M`, one must overwrite the other. History is lost. Truth is flattened.

**Episteme** rejects the idea of a single, static "database state." Instead, it models knowledge as a **Probabilistic Lattice of Assertions**.
* We do not store "Facts."
* We store "Claims."
* We do not "Update" records.
* We "Append" new evidence.
* We do not query "The Truth."
* We query through "Lenses" (Consensus, Recency, Authority).

## 2. The Core Data Model: The Hyper-Edge

The atomic unit of Episteme is not a Row, Document, or Embedding. It is the **Signed Assertion**.

```rust
struct Assertion {
    // The Proposition (The "What")
    subject: EntityId,     // e.g., "Tesla_Inc"
    predicate: RelationId, // "has_annual_revenue"
    object: Value,         // e.g., "$96.7B"

    // The Meta-Cognition (The "Why")
    confidence: f32,     // 0.0 to 1.0 (Agent's subjective certainty)
    source_hash: Hash,   // Content-addressed link to source (PDF, URL, Log)
    agent_id: PublicKey, // Who made this claim? (Public key that verifies the agent's signature)
    timestamp: u64,      // When?

    // The Semantic Vector (The "Meaning")
    vector: Vec<f32>, // Embedding for semantic navigation
}
```
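
To make the struct concrete, here is a minimal sketch that fills in the undefined types with simple stand-ins. Everything beyond the field names is an assumption for illustration: `EntityId` and `RelationId` as strings, `Hash` and `PublicKey` as 32-byte arrays, the `Value` enum, and the `validate` and `revenue_claim` helpers are all hypothetical and not taken from the Episteme codebase.

```rust
// Hypothetical stand-in types; the real definitions are not given in this document.
type EntityId = String;
type RelationId = String;
type Hash = [u8; 32];
type PublicKey = [u8; 32];

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Text(String),
    Number(f64),
}

#[derive(Debug, Clone)]
struct Assertion {
    subject: EntityId,
    predicate: RelationId,
    object: Value,
    confidence: f32,
    source_hash: Hash,
    agent_id: PublicKey,
    timestamp: u64,
    vector: Vec<f32>,
}

impl Assertion {
    /// Rejects confidences outside the documented 0.0 to 1.0 range (NaN included).
    fn validate(&self) -> Result<(), String> {
        if (0.0..=1.0).contains(&self.confidence) {
            Ok(())
        } else {
            Err(format!("confidence {} is outside 0.0..=1.0", self.confidence))
        }
    }
}

/// Builds a revenue claim (in millions) with a dummy source hash and agent key.
fn revenue_claim(agent: u8, millions: f64, confidence: f32, timestamp: u64) -> Assertion {
    Assertion {
        subject: "Tesla_Inc".to_string(),
        predicate: "has_annual_revenue".to_string(),
        object: Value::Number(millions),
        confidence,
        source_hash: [0u8; 32], // placeholder, not a real content hash
        agent_id: [agent; 32],  // placeholder, not a real public key
        timestamp,
        vector: Vec::new(),
    }
}
```

Two agents calling `revenue_claim` with different amounts produce two independent `Assertion` values; nothing in the type forces one to replace the other.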

### 2.1. Non-Destructive Writes
Episteme is an **Append-Only Merkle DAG**.
* **Conflict is a Feature:** If Agent A claims X, and Agent B claims Y, the database holds *both* realities simultaneously.
* **Traceability:** Every assertion links back to its parent (if it modifies/refutes a previous claim) and its source (evidence).
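
The two properties above can be sketched with a small content-addressed store. This is an illustration under stated assumptions, not the Episteme implementation: `Node`, `Dag`, `append`, and `lineage` are hypothetical names, claims are plain strings, and the standard library's `DefaultHasher` stands in for a cryptographic hash, which a real Merkle DAG would require.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Content address. DefaultHasher is NOT cryptographic; it is only a
/// stand-in so that the sketch needs no external crates.
type NodeId = u64;

#[derive(Debug, Clone, Hash)]
struct Node {
    claim: String,          // e.g. "Revenue = $10M"
    agent: String,          // who asserted it
    parent: Option<NodeId>, // the claim this one modifies or refutes, if any
}

#[derive(Default)]
struct Dag {
    nodes: HashMap<NodeId, Node>,
}

impl Dag {
    /// Appends a node and returns its content address. Existing nodes are
    /// never updated or removed; identical content maps to the same id.
    fn append(&mut self, claim: &str, agent: &str, parent: Option<NodeId>) -> NodeId {
        let node = Node {
            claim: claim.to_string(),
            agent: agent.to_string(),
            parent,
        };
        // The id covers the parent id too, so each id commits to its whole ancestry.
        let mut hasher = DefaultHasher::new();
        node.hash(&mut hasher);
        let id = hasher.finish();
        self.nodes.entry(id).or_insert(node);
        id
    }

    /// Walks parent links from `id` back to the root claim, newest first.
    fn lineage(&self, id: NodeId) -> Vec<&str> {
        let mut out = Vec::new();
        let mut cursor = Some(id);
        while let Some(current) = cursor {
            match self.nodes.get(&current) {
                Some(node) => {
                    out.push(node.claim.as_str());
                    cursor = node.parent;
                }
                None => break,
            }
        }
        out
    }
}
```

Appending a refutation leaves the original node in place, so both claims stay readable and the refuting claim can be traced back to what it disputes.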

## 3. The Query Engine: "Truth Lenses"

Because the database holds conflicting realities, "Reading" is a compute-heavy operation. You cannot just `GET key`. You must apply a **Lens**.

A **Lens** is a compiled WASM filter that resolves the probability field into a concrete answer at Read Time.

### Standard Lenses
1. **Lens::Consensus:** "Return the value with the highest cluster density across all agents." (Democratic Truth)
2. **Lens::Authority:** "Return values signed by Agents with `Reputation > 900`." (Expert Truth)
3. **Lens::Recency:** "Return the latest assertion, ignoring history." (News)
4. **Lens::Skeptic:** "Return the *variance* between claims." (Finds controversy/ambiguity)
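
The four lenses can be approximated as plain functions over a slice of competing claims. This is a simplified sketch, not the WASM-based design: the `Claim` struct and function names are hypothetical, values are reduced to numbers, "cluster density" is approximated by exact-match counting, and the Skeptic lens uses population variance.

```rust
#[derive(Debug, Clone, Copy)]
struct Claim {
    value: f64,      // the asserted object, simplified to a number
    reputation: u32, // hypothetical reputation score of the signing agent
    timestamp: u64,  // when the claim was made
}

/// Consensus: the value backed by the most claims. Ties go to the value
/// that appears first in the slice.
fn consensus(claims: &[Claim]) -> Option<f64> {
    let mut best: Option<(usize, f64)> = None;
    for c in claims {
        let support = claims.iter().filter(|o| o.value == c.value).count();
        let replace = match best {
            Some((count, _)) => support > count,
            None => true,
        };
        if replace {
            best = Some((support, c.value));
        }
    }
    best.map(|(_, value)| value)
}

/// Authority: every value signed by an agent above the reputation threshold.
fn authority(claims: &[Claim], min_reputation: u32) -> Vec<f64> {
    claims
        .iter()
        .filter(|c| c.reputation > min_reputation)
        .map(|c| c.value)
        .collect()
}

/// Recency: the value of the claim with the largest timestamp.
fn recency(claims: &[Claim]) -> Option<f64> {
    claims.iter().max_by_key(|c| c.timestamp).map(|c| c.value)
}

/// Skeptic: population variance of the claimed values.
fn skeptic(claims: &[Claim]) -> Option<f64> {
    if claims.is_empty() {
        return None;
    }
    let n = claims.len() as f64;
    let mean = claims.iter().map(|c| c.value).sum::<f64>() / n;
    let squared: f64 = claims.iter().map(|c| (c.value - mean).powi(2)).sum();
    Some(squared / n)
}
```

The same set of claims can yield different answers depending on the lens: if two reputable agents claim 10 and a newer, low-reputation agent claims 12, `consensus` and `authority` return 10 while `recency` returns 12.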

## 4. Features for the AI Scientist

### 4.1. "Forking Reality" (Branching)
Agents need to simulate futures ("What if inflation hits 5%?"). Episteme supports **Copy-on-Write Branching**.
* An Agent creates a `Scenario Branch`.
* It inserts hypothetical assertions (`Inflation = 5%`).
* It queries for 2nd-order effects.
* The Main Branch remains unpolluted.
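
One way to get this isolation without copying history is structural sharing: a fork freezes the parent's writes into an immutable shared layer, and each branch then writes only to its own private buffer. The sketch below assumes this layered approach as a stand-in for the copy-on-write mechanism named above; `Layer`, `Branch`, `insert`, `fork`, and `visible` are hypothetical names, and assertions are plain strings.

```rust
use std::rc::Rc;

/// An immutable, shared layer of assertions plus a link to the layer beneath it.
struct Layer {
    parent: Option<Rc<Layer>>,
    assertions: Vec<String>,
}

/// A branch is a private write buffer on top of a chain of frozen layers.
struct Branch {
    base: Option<Rc<Layer>>,
    pending: Vec<String>,
}

impl Branch {
    fn new() -> Self {
        Branch { base: None, pending: Vec::new() }
    }

    /// Appends an assertion that only this branch can see.
    fn insert(&mut self, claim: &str) {
        self.pending.push(claim.to_string());
    }

    /// Freezes this branch's pending writes into a shared layer and returns
    /// a child branch on top of it. No existing assertion is copied.
    fn fork(&mut self) -> Branch {
        if !self.pending.is_empty() {
            let frozen = Layer {
                parent: self.base.take(),
                assertions: std::mem::take(&mut self.pending),
            };
            self.base = Some(Rc::new(frozen));
        }
        Branch { base: self.base.clone(), pending: Vec::new() }
    }

    /// All assertions visible from this branch, oldest first.
    fn visible(&self) -> Vec<String> {
        let mut layers = Vec::new();
        let mut cursor = self.base.as_ref();
        while let Some(layer) = cursor {
            layers.push(layer);
            cursor = layer.parent.as_ref();
        }
        let mut out = Vec::new();
        for layer in layers.into_iter().rev() {
            out.extend(layer.assertions.iter().cloned());
        }
        out.extend(self.pending.iter().cloned());
        out
    }
}
```

A scenario branch forked from the main branch sees everything asserted before the fork plus its own hypotheticals, while later writes on either side stay invisible to the other.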

### 4.2. TrustRank (Reputation Markets)
We implement a recursive PageRank-style algorithm for **Source Credibility**.
1. **Validation:** If an Agent's claim is later verified by Ground Truth (e.g., an earnings call), their Reputation Score (`R`) increases.
2. **Back-Propagation:** High-`R` agents confer weight to the sources they cite.
3. **Decay:** Claims from low-`R` agents fade faster from the "Hot" tier.
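
The three rules can be illustrated with a single, non-recursive pass; a full PageRank-style version would iterate until the scores stabilize. Everything numeric here is an assumption: the document does not specify the reward size, the weighting formula, or the decay curve, so `REWARD`, `BASE_HALF_LIFE_SECS`, and the function names are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical tuning constants; the document does not specify these values.
const REWARD: f64 = 10.0; // reputation gained per verified claim
const BASE_HALF_LIFE_SECS: f64 = 3600.0; // hot-tier half-life at reputation 1.0

struct Agent {
    reputation: f64,
    cites: Vec<String>, // identifiers of the sources this agent has cited
}

/// 1. Validation: a claim confirmed by ground truth raises the agent's score.
fn record_verified_claim(agent: &mut Agent) {
    agent.reputation += REWARD;
}

/// 2. Back-propagation: a source's weight is the summed reputation of the
/// agents citing it, so high-R agents contribute more.
fn source_weights(agents: &[Agent]) -> HashMap<String, f64> {
    let mut weights = HashMap::new();
    for agent in agents {
        for source in &agent.cites {
            *weights.entry(source.clone()).or_insert(0.0) += agent.reputation;
        }
    }
    weights
}

/// 3. Decay: a claim's hot-tier score halves once per half-life, and the
/// half-life grows with reputation, so low-R claims fade faster.
fn hot_score(reputation: f64, age_secs: f64) -> f64 {
    let half_life = BASE_HALF_LIFE_SECS * reputation.max(1.0);
    0.5f64.powf(age_secs / half_life)
}
```

The linear reward and exponential decay are only the simplest choices that satisfy the stated rules; any monotone alternatives would fit the description equally well.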

## 5. Architecture: The Rust Stack

Episteme follows the **"Defensive by Default"** best practices.

### Tier 1: The Spine (Durability)
* **Component:** `episteme-wal` (Implementing the Quarantine Journal pattern)
* **Role:** Raw, serialized append-only log. Ensures we never lose a claim.
* **Format:** Binary `Record` with BLAKE3 checksums.
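
A checksummed record format of this kind can be sketched as a length-prefixed frame. The layout below is an assumption for illustration, not the actual `Record` format, and it deliberately swaps in a 64-bit FNV-1a checksum because BLAKE3 requires an external crate; `checksum`, `encode_record`, and `decode_record` are hypothetical names.

```rust
/// Stand-in checksum (FNV-1a, 64-bit). The design calls for BLAKE3; this
/// substitute only keeps the sketch free of external crates.
fn checksum(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for b in bytes {
        hash ^= *b as u64;
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Frames a payload as: [len: u32 LE][payload][checksum: u64 LE].
fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + payload.len() + 8);
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out
}

/// Decodes one record from the front of `buf`. Returns the payload and the
/// number of bytes consumed, or None if the record is truncated or corrupt.
fn decode_record(buf: &[u8]) -> Option<(&[u8], usize)> {
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(buf.get(..4)?);
    let len = u32::from_le_bytes(len_bytes) as usize;

    let payload_end = 4usize.checked_add(len)?;
    let record_end = payload_end.checked_add(8)?;
    let payload = buf.get(4..payload_end)?;

    let mut sum_bytes = [0u8; 8];
    sum_bytes.copy_from_slice(buf.get(payload_end..record_end)?);
    if u64::from_le_bytes(sum_bytes) != checksum(payload) {
        return None;
    }
    Some((payload, record_end))
}
```

On replay, a reader would call `decode_record` repeatedly and stop at the first `None`, which marks either a torn final write or corruption.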

### Tier 2: The Lattice (Graph/Index)
* **Component:** `episteme-core`
* **Role:** The Hot/Warm memory.
* **Hot Tier:** `DashMap` of active contradiction clusters.
* **Warm Tier:** `sled` (embedded, log-structured key-value store) for the Merkle DAG + `hnsw` for vector search.
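
The Hot Tier's "contradiction clusters" are not defined further here. One plausible reading, sketched below, groups the distinct objects claimed for each `(subject, predicate)` pair and treats any group with more than one value as an active contradiction. A standard `HashMap` stands in for `DashMap` to avoid external crates, and `HotTier`, `observe`, and `contradictions` are hypothetical names.

```rust
use std::collections::HashMap;

/// (subject, predicate) pair identifying what a cluster of claims is about.
type Key = (String, String);

#[derive(Default)]
struct HotTier {
    /// Distinct objects claimed for each key. HashMap is a single-threaded
    /// stand-in for the concurrent DashMap named in the design.
    clusters: HashMap<Key, Vec<String>>,
}

impl HotTier {
    /// Records a claimed object, keeping each distinct value once.
    fn observe(&mut self, subject: &str, predicate: &str, object: &str) {
        let values = self
            .clusters
            .entry((subject.to_string(), predicate.to_string()))
            .or_default();
        if !values.iter().any(|v| v.as_str() == object) {
            values.push(object.to_string());
        }
    }

    /// Keys whose cluster holds more than one distinct value, in sorted order.
    fn contradictions(&self) -> Vec<&Key> {
        let mut keys: Vec<&Key> = self
            .clusters
            .iter()
            .filter(|(_, values)| values.len() > 1)
            .map(|(key, _)| key)
            .collect();
        keys.sort();
        keys
    }
}
```

Repeating an identical claim does not create a contradiction; only a second, different object for the same key does.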

### Tier 3: The Cortex (Compute)
* **Component:** `episteme-lens`
* **Role:** The WASM runtime for executing Lenses.
* **Function:** Collapses the probabilistic graph into deterministic answers for the client.

## 6. The Ecosystem Triad

Episteme completes the Intelligence Stack:

| System | Biological Analogy | Function | Question Answered |
| :--- | :--- | :--- | :--- |
| **LogDB** | **The Spine** | Immutable Event Log | "What happened?" |
| **AssociativeDB** | **The Hippocampus** | Associative Memory | "What is this like?" |
| **Episteme** | **The Cortex** | Structured Reasoning | "Is this true?" |