# Episteme: The Probabilistic Knowledge Lattice

> **Internal Codename:** StemeDB

> **Category:** Infrastructure / Database

> **Role:** The Cortex (Reasoning & Truth)

## 1. The Manifesto: "A Marketplace of Truth"

We are building the shared, long-term memory for autonomous research agents.

Current databases (Postgres, Neo4j, vector databases) suffer from **The Tower of Babel** problem: they store *Data*, not *Evidence*. They are deterministic, keep only a single overwritable state, and are brittle.

**Episteme** rejects the idea of a single, static "database state." Instead, it models knowledge as a **Probabilistic Marketplace**.

* **Democracy:** Truth is established through high-velocity consensus (voting), not through overwrite privileges alone.
* **Federation:** "Waze for Deep Research." Free users contribute to a Global Lattice; paid users get private silos.
* **Economics:** Reasoning has a cost. The system enforces efficiency via "The Meter."
* **Curation:** We are not the Ministry of Truth; we are the App Store for Trust. Users subscribe to "Trust Packs" (e.g., "Mayo Clinic", "Rust Experts") to filter reality.

## 2. The Core Data Model: The Hyper-Edge

The atomic unit of Episteme is not a Row, Document, or Embedding. It is the **Signed Assertion**.

```rust
// EntityId, RelationId, Value, Hash, PublicKey, and EpochId are assumed to be defined elsewhere.
struct Assertion {
    // The Proposition (The "What")
    subject: EntityId,         // e.g., "Tesla_Inc"
    predicate: RelationId,     // e.g., "has_annual_revenue"
    object: Value,             // e.g., "$96.7B"

    // The Meta-Cognition (The "Why")
    confidence: f32,           // 0.0 to 1.0 (the agent's subjective certainty)
    source_hash: Hash,         // Content-addressed link to the source (PDF, URL, log)
    visual_hash: Option<Hash>, // pHash for visual anchoring against web drift
    agent_id: PublicKey,       // Who made this claim? (the signing agent's public key)
    timestamp: u64,            // When?

    // The Semantic Vector (The "Meaning")
    vector: Vec<f32>,          // Embedding for semantic navigation

    // The Paradigm (The "Context")
    epoch: Option<EpochId>,    // e.g., "covid-guidelines-2020", "gaap-2024"
}
```

## 3. The Query Engine: "Truth Lenses"

Reading is a compute-heavy operation. You must apply a **Lens** to collapse the probabilistic field into a concrete answer. To ensure sub-millisecond latency, Episteme uses **Materialized Views** to pre-calculate the results of standard lenses.

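A minimal sketch of the materialized-view idea, assuming a simplified claim type and a Recency-style result as the thing being pre-calculated. `Claim`, `MaterializedView`, `on_write`, and `read` are hypothetical names, not Episteme's actual API. Each write updates the cached answer for its (subject, predicate) key, so a read is one hash lookup rather than a scan over every assertion.

```rust
use std::collections::HashMap;

// Hypothetical sketch: names are illustrative, not the Episteme API.
#[derive(Clone, Debug, PartialEq)]
pub struct Claim {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub timestamp: u64,
}

/// Caches the latest claim per (subject, predicate), i.e. a precomputed Recency answer.
#[derive(Default)]
pub struct MaterializedView {
    latest: HashMap<(String, String), Claim>,
}

impl MaterializedView {
    /// Write path: replace the cached claim only if the incoming one is newer.
    pub fn on_write(&mut self, claim: Claim) {
        let key = (claim.subject.clone(), claim.predicate.clone());
        let is_newer = self
            .latest
            .get(&key)
            .map_or(true, |current| claim.timestamp > current.timestamp);
        if is_newer {
            self.latest.insert(key, claim);
        }
    }

    /// Read path: a single hash lookup, no scan over history.
    pub fn read(&self, subject: &str, predicate: &str) -> Option<&Claim> {
        self.latest.get(&(subject.to_string(), predicate.to_string()))
    }
}
```

The trade-off is a little extra work on every write in exchange for cheap reads, which fits a system where reading is the expensive path.
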
### Standard Lenses

1. **Lens::Consensus:** Returns the value with the highest cluster density (weighted by Trust Pack).
2. **Lens::Authority:** Filters by subscribed **Trust Packs** (e.g., "Show me reality according to The Financial Times").
3. **Lens::Recency:** Returns the latest assertion, ignoring history.
4. **Lens::EpochAware:** Validates assertions against the *current* paradigm, filtering out assertions from superseded epochs.
5. **Lens::Skeptic:** Returns the *variance* between claims (identifies high-conflict, unstable truth).

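Three of the lenses above can be sketched over a simplified claim type. This is an illustration under stated assumptions, not Episteme's lens implementation: the `Lens` trait and `Claim` type are hypothetical, Consensus reads "cluster density" as the summed trust weight behind each distinct value, and Skeptic reports the population variance of numeric values as its conflict score.

```rust
use std::collections::HashMap;

// Hypothetical sketch: one agent's numeric value for a fixed (subject, predicate).
#[derive(Clone, Debug)]
pub struct Claim {
    pub agent: &'static str,
    pub value: i64,
    pub timestamp: u64,
}

/// Hypothetical lens interface: collapse many claims into one answer.
pub trait Lens {
    type Output;
    fn resolve(&self, claims: &[Claim]) -> Option<Self::Output>;
}

/// Lens::Recency: the latest claim wins; history is ignored.
pub struct Recency;

impl Lens for Recency {
    type Output = i64;
    fn resolve(&self, claims: &[Claim]) -> Option<i64> {
        claims.iter().max_by_key(|c| c.timestamp).map(|c| c.value)
    }
}

/// Lens::Consensus: the value with the largest summed trust weight.
/// Agents missing from `trust` (outside the subscribed Trust Pack) count as 0.
/// Ties are broken arbitrarily in this sketch.
pub struct Consensus {
    pub trust: HashMap<&'static str, f64>,
}

impl Lens for Consensus {
    type Output = i64;
    fn resolve(&self, claims: &[Claim]) -> Option<i64> {
        let mut support: HashMap<i64, f64> = HashMap::new();
        for c in claims {
            let weight = self.trust.get(c.agent).copied().unwrap_or(0.0);
            *support.entry(c.value).or_insert(0.0) += weight;
        }
        support
            .into_iter()
            .filter(|&(_, w)| w > 0.0)
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(value, _)| value)
    }
}

/// Lens::Skeptic: population variance of the claimed values (higher = more conflict).
pub struct Skeptic;

impl Lens for Skeptic {
    type Output = f64;
    fn resolve(&self, claims: &[Claim]) -> Option<f64> {
        if claims.is_empty() {
            return None;
        }
        let n = claims.len() as f64;
        let mean = claims.iter().map(|c| c.value as f64).sum::<f64>() / n;
        let variance = claims
            .iter()
            .map(|c| (c.value as f64 - mean).powi(2))
            .sum::<f64>()
            / n;
        Some(variance)
    }
}
```

Because each lens is just a function from claims to an answer, any of them could be precomputed per (subject, predicate) by a background materializer.
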
## 4. Features for the Agentive Team

### 4.1. "Forking Reality" (Branching)

Agents need to simulate futures without polluting the main branch. Episteme supports **Copy-on-Write Branching** via Sparse Merkle Trees.

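The sketch below shows only the copy-on-write semantics of forking, not a Sparse Merkle Tree: a hypothetical `Branch` shares its parent through an `Rc` and keeps its own writes in a private overlay, so a fork copies nothing and never mutates the parent. A Sparse Merkle Tree adds content-addressed hashing on top of this kind of structural sharing, so that branches can also be compared by root hash.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Hypothetical sketch: a branch = shared read-only parent + private overlay of writes.
#[derive(Default)]
pub struct Branch {
    parent: Option<Rc<Branch>>,
    overlay: HashMap<String, String>,
}

impl Branch {
    /// Fork without copying: the child only holds a reference to the parent.
    pub fn fork(parent: &Rc<Branch>) -> Branch {
        Branch {
            parent: Some(Rc::clone(parent)),
            overlay: HashMap::new(),
        }
    }

    /// Writes go to this branch's overlay only; the parent is never touched.
    pub fn put(&mut self, key: &str, value: &str) {
        self.overlay.insert(key.to_string(), value.to_string());
    }

    /// Reads check the local overlay first, then fall back through the ancestors.
    pub fn get(&self, key: &str) -> Option<&str> {
        match self.overlay.get(key) {
            Some(value) => Some(value.as_str()),
            None => self.parent.as_ref().and_then(|p| p.get(key)),
        }
    }
}
```
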
### 4.2. The Ballot Box: High-Velocity Consensus

To avoid write contention, Episteme separates the "Candidate" (Assertion) from the "Votes" (Signatures).

* Agents write **Votes** to a high-speed append-only log ("The Ballot Box").
* A background process aggregates these votes to update the Materialized View.

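A minimal sketch of the Ballot Box split, assuming in-memory, unsigned votes with no persistence. `BallotBox`, `Vote`, and `AssertionId` are hypothetical names. Casting a vote is an append to a log; a separate `aggregate` step, standing in for the background process, folds only the votes it has not seen yet into cached per-assertion counts that readers use.

```rust
use std::collections::HashMap;

// Hypothetical sketch: names are illustrative, not the Episteme API.
pub type AssertionId = u64;

#[derive(Clone, Copy, Debug)]
pub struct Vote {
    pub assertion: AssertionId,
    pub agent: u32,
    pub up: bool,
}

#[derive(Default)]
pub struct BallotBox {
    log: Vec<Vote>,                          // append-only; never rewritten
    cursor: usize,                           // index of the first unaggregated vote
    tally: HashMap<AssertionId, (u64, u64)>, // cached (up, down) counts for readers
}

impl BallotBox {
    /// Write path: casting a vote only appends to the log; the assertion is never touched.
    pub fn cast(&mut self, vote: Vote) {
        self.log.push(vote);
    }

    /// Background step: fold new votes into the cached tally; returns how many were applied.
    pub fn aggregate(&mut self) -> usize {
        let new_votes = &self.log[self.cursor..];
        for v in new_votes {
            let entry = self.tally.entry(v.assertion).or_insert((0, 0));
            if v.up {
                entry.0 += 1;
            } else {
                entry.1 += 1;
            }
        }
        let applied = new_votes.len();
        self.cursor = self.log.len();
        applied
    }

    /// Read path: cached counts, which may lag the log until the next aggregate().
    pub fn counts(&self, id: AssertionId) -> (u64, u64) {
        self.tally.get(&id).copied().unwrap_or((0, 0))
    }
}
```

Readers see counts that lag the log until the next `aggregate()`; that bounded staleness is what keeps voters from contending on the assertion. The sketch does not deduplicate repeat votes from the same agent.
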
### 4.3. The Hive: Learning & Trust

* **Trust Packs (The Curator Economy):** Users can publish and subscribe to lists of trusted agents.
    * *Example:* "The Skeptical Cardio Pack" filters out low-quality studies.
    * *Mechanism:* A BitSet overlay that filters the Consensus Lens efficiently.
* **The Simulator (Mid-Training):** A pipeline that converts high-confidence failure logs into **Synthetic Trajectories**.
* **The Super Curator (Judicial Branch):** A specialized swarm of "Reviewer Agents" that audits high-variance facts.

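A minimal sketch of the BitSet overlay, assuming each agent has already been mapped to a dense integer index (that mapping from `PublicKey` is not shown). `TrustPack` and `filter_trusted` are hypothetical names. A pack is a bit vector over agent indices, subscribing to several packs is a bitwise OR, and checking a claim's author is a single bit test.

```rust
// Hypothetical sketch: a Trust Pack as a bit set over dense agent indices.
#[derive(Clone, Debug, PartialEq)]
pub struct TrustPack {
    words: Vec<u64>,
}

impl TrustPack {
    pub fn with_capacity(agents: usize) -> Self {
        TrustPack { words: vec![0; (agents + 63) / 64] }
    }

    /// Panics if `agent` is beyond the capacity (rounded up to a multiple of 64).
    pub fn insert(&mut self, agent: usize) {
        self.words[agent / 64] |= 1u64 << (agent % 64);
    }

    pub fn contains(&self, agent: usize) -> bool {
        self.words
            .get(agent / 64)
            .map_or(false, |w| w & (1u64 << (agent % 64)) != 0)
    }

    /// Subscribing to several packs = union of their bit sets.
    pub fn union(&self, other: &TrustPack) -> TrustPack {
        let len = self.words.len().max(other.words.len());
        let words = (0..len)
            .map(|i| {
                self.words.get(i).copied().unwrap_or(0) | other.words.get(i).copied().unwrap_or(0)
            })
            .collect();
        TrustPack { words }
    }
}

/// Keep only claims whose author is in the subscribed packs, before Consensus runs.
pub fn filter_trusted<'a, T>(claims: &'a [(usize, T)], pack: &TrustPack) -> Vec<&'a T> {
    claims
        .iter()
        .filter(|(agent, _)| pack.contains(*agent))
        .map(|(_, claim)| claim)
        .collect()
}
```
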
### 4.4. The Meter: Economics of Reasoning

Deep Research is computationally expensive. Episteme enforces **Temporal Advantage Normalization (TAN)**.

* **Budgeting:** Every Job carries a budget (tokens/dollars).
* **Throttling:** The system rejects "Fork Reality" requests if the projected cost exceeds the Value of Information.

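The Budgeting and Throttling rules can be sketched as a pre-flight check on each fork request. This assumes cost and Value of Information are already estimated in the same unit (dollars here); the document does not specify how they are estimated or how TAN normalizes them, so `Budget`, `ForkRequest`, `Decision`, and `admit_fork` are hypothetical.

```rust
// Hypothetical sketch: names and units are illustrative, not the Episteme API.
#[derive(Debug, PartialEq)]
pub enum Decision {
    Admit,
    RejectOverBudget,
    RejectLowValue,
}

/// Per-job budget, tracked in one currency (dollars in this sketch).
pub struct Budget {
    pub remaining: f64,
}

/// A request to fork reality, carrying the caller's estimates.
pub struct ForkRequest {
    pub projected_cost: f64,
    pub value_of_information: f64,
}

impl Budget {
    /// Reject if the job cannot afford the fork, or if the fork costs more than it is worth.
    /// On admission, the projected cost is reserved against the remaining budget.
    pub fn admit_fork(&mut self, req: &ForkRequest) -> Decision {
        if req.projected_cost > self.remaining {
            Decision::RejectOverBudget
        } else if req.projected_cost > req.value_of_information {
            Decision::RejectLowValue
        } else {
            self.remaining -= req.projected_cost;
            Decision::Admit
        }
    }
}
```
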
## 5. Architecture: The Rust Stack

Episteme follows **"Defensive by Default"** practices.

### Tier 1: The Spine (Durability)

* **Component:** `episteme-wal` (Quarantine Pattern)
* **Role:** Raw, serialized append-only log. Ensures we never lose a claim.

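A minimal sketch of an append-only log with crash-tolerant replay. The record framing (4-byte length, 4-byte checksum, payload) and the FNV-1a checksum are assumptions chosen to keep the example dependency-free; neither is taken from `episteme-wal`, and the Quarantine Pattern is not modeled. On replay, a torn or corrupt record ends the scan instead of being returned as a claim.

```rust
use std::io::{self, Write};

// Hypothetical framing for this sketch: [len: u32 LE][checksum: u32 LE][payload bytes].
const HEADER: usize = 8;

/// FNV-1a (32-bit). Used only to keep the sketch dependency-free; it is not a CRC.
fn fnv1a(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811c_9dc5;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

/// Append one framed record. A file-backed log would also call File::sync_data before acknowledging.
pub fn append<W: Write>(log: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "record too large"));
    }
    log.write_all(&(payload.len() as u32).to_le_bytes())?;
    log.write_all(&fnv1a(payload).to_le_bytes())?;
    log.write_all(payload)?;
    log.flush()
}

/// Return every intact record in order, stopping at the first torn or corrupt one.
pub fn replay(mut bytes: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    while bytes.len() >= HEADER {
        let len = read_u32(&bytes[0..4]) as usize;
        let checksum = read_u32(&bytes[4..8]);
        let Some(payload) = bytes.get(HEADER..HEADER + len) else {
            break; // torn write: header present, payload incomplete
        };
        if fnv1a(payload) != checksum {
            break; // corrupt record: stop replay here
        }
        records.push(payload.to_vec());
        bytes = &bytes[HEADER + len..];
    }
    records
}
```

Stopping at the first bad record is deliberately conservative: if a length field is corrupt, the boundaries of every later record are unknown.
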
### Tier 2: The Lattice (Graph/Index)

* **Component:** `episteme-core` (Hot/Warm memory)
* **Warm Tier:** `sled` (embedded key-value store with B+ tree-style reads and LSM-like write performance) for the Merkle DAG, plus `hnsw` for vector search.
* **Ballot Box:** High-velocity stream for vote ingestion.

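For the `hnsw` side of the Warm Tier, the query an HNSW index approximates is top-k nearest-neighbour search. The brute-force version below is a hypothetical helper, not the `hnsw` crate's API, and assumes cosine similarity as the metric; it scans every vector, so it is mainly useful as a reference answer, for example when measuring an index's recall.

```rust
// Hypothetical sketch: exact search that an approximate HNSW index stands in for.

/// Cosine similarity; returns 0.0 if either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exact top-k: score every stored vector, sort by descending similarity, keep k ids.
/// Costs O(n * d) per query, which is the scan an HNSW index avoids.
pub fn top_k(query: &[f32], corpus: &[(u64, Vec<f32>)], k: usize) -> Vec<u64> {
    let mut scored: Vec<(f32, u64)> = corpus
        .iter()
        .map(|(id, v)| (cosine(query, v), *id))
        .collect();
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}
```
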
### Tier 3: The Cortex (Compute)

* **Component:** `episteme-lens`
* **Role:** The WASM runtime for executing Lenses and resolving probabilistic state.
* **Materializer:** Background worker maintaining O(1) read views.

## 6. The Ecosystem Triad

| System | Biological Analogy | Function | Question Answered |
| :--- | :--- | :--- | :--- |
| **LogDB** | **The Spine** | Immutable Event Log | "What happened?" |
| **AssociativeDB** | **The Hippocampus** | Associative Memory | "What is this like?" |
| **Episteme** | **The Cortex** | Structured Reasoning | "Is this true?" |