stemedb/vision.md
jordan 1ce4004807 feat: Complete Phase 2 (The Cortex) - query, lens, and API layers
This commit adds the read path (Cortex) to complement the write path (Spine):

## Crates
- stemedb-api: HTTP API with axum + utoipa OpenAPI
  - /v1/assert, /v1/query, /v1/epoch, /v1/skeptic, /v1/trace, /v1/audit
  - Metered endpoints with quota enforcement
  - Ed25519 signature verification
- stemedb-lens: Truth resolution lenses
  - RecencyLens, ConsensusLens, ConfidenceLens
  - VoteAwareConsensusLens (Ballot Box pattern)
  - TrustAwareAuthorityLens (The Hive pattern)
  - SkepticLens (conflict analysis)
  - EpochAwareLens (paradigm-safe queries)
- stemedb-query: Query engine with materialized views

## Storage Extensions
- VoteStore: Vote aggregation with cached counts
- TrustRankStore: Agent reputation with decay
- AuditStore: Query audit trail
- IndexStore: SP/P/S index structures
- SupersessionStore: Epoch supersession chains

## SDKs
- sdk/go/steme: Go HTTP client with Ed25519 signing
- sdk/go/adk: ADK-Go tools for AI agents

## Documentation
- Updated CLAUDE.md, architecture.md, roadmap.md
- New ai-lookup entries for all services
- Use case docs for consumer health intelligence
- Arena roadmap for simulation advancement

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 13:22:44 -07:00


# Episteme: The Probabilistic Knowledge Lattice
> **Internal Codename:** StemeDB
>
> **Category:** Infrastructure / Database
>
> **Role:** The Cortex (Reasoning & Truth)
## 1. The Manifesto: "A Marketplace of Truth"
We are building the shared, long-term memory for autonomous research agents.
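Current databases (Postgres, Neo4j, vector DBs) suffer from **The Tower of Babel** problem: they store *Data*, not *Evidence*. A row records a value but not who claimed it, from which source, or with what confidence, so conflicting claims overwrite one another instead of coexisting. They are deterministic and brittle.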
**Episteme** rejects the idea of a single, static "database state." Instead, it models knowledge as a **Probabilistic Marketplace**.
* **Democracy:** Truth is established through high-velocity consensus (voting), not by whoever holds overwrite privileges.
* **Federation:** "Waze for Deep Research." Free users contribute to a Global Lattice; paid users get private silos.
* **Economics:** Reasoning has a cost. The system enforces efficiency via "The Meter."
* **Curation:** We are not the Ministry of Truth. We are the App Store for Trust. Users subscribe to "Trust Packs" (e.g., "Mayo Clinic", "Rust Experts") to filter reality.
## 2. The Core Data Model: The Hyper-Edge
The atomic unit of Episteme is not a Row, Document, or Embedding. It is the **Signed Assertion**.
```rust
// Placeholder type aliases so the struct compiles on its own; the concrete types are illustrative.
type EntityId = String;
type RelationId = String;
type Value = String;
type Hash = [u8; 32];
type PublicKey = [u8; 32];
type EpochId = String;

struct Assertion {
    // The Proposition (the "What")
    subject: EntityId,         // e.g., "Tesla_Inc"
    predicate: RelationId,     // e.g., "has_annual_revenue"
    object: Value,             // e.g., "$96.7B"

    // The Meta-Cognition (the "Why")
    confidence: f32,           // 0.0 to 1.0 (the agent's subjective certainty)
    source_hash: Hash,         // Content-addressed link to the source (PDF, URL, log)
    visual_hash: Option<Hash>, // Perceptual hash (pHash) for visual anchoring against web drift
    agent_id: PublicKey,       // Who made this claim? (Cryptographic multi-sig)
    timestamp: u64,            // When?

    // The Semantic Vector (the "Meaning")
    vector: Vec<f32>,          // Embedding for semantic navigation

    // The Paradigm (the "Context")
    epoch: Option<EpochId>,    // e.g., "covid-guidelines-2020", "gaap-2024"
}
```
## 3. The Query Engine: "Truth Lenses"
Reading is a compute-heavy operation. You must apply a **Lens** to collapse the probabilistic field into a concrete answer. To ensure sub-millisecond latency, Episteme uses **Materialized Views** to pre-calculate the results of standard lenses.
### Standard Lenses
1. **Lens::Consensus:** Returns the value with the highest cluster density (weighted by the subscribed Trust Packs).
2. **Lens::Authority:** Filters by subscribed **Trust Packs** (e.g., "Show me reality according to The Financial Times").
3. **Lens::Recency:** Returns the latest assertion, ignoring history.
4. **Lens::EpochAware:** Validates assertions against the *current* paradigm, filtering out assertions from superseded epochs.
5. **Lens::Skeptic:** Returns the *variance* between claims (identifies high-conflict/unstable truth).
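To make three of these lenses concrete, here is a minimal Rust sketch of Recency, Consensus, and Skeptic over a simplified claim type. It assumes the claims for one (subject, predicate) pair have already been fetched, that object values are numeric (e.g., revenue in millions of dollars), and that Trust Pack weighting can be expressed as a per-agent weight function. `Claim`, `recency`, `consensus`, and `skeptic` are hypothetical names, not the `stemedb-lens` API, and "variance" is read here as confidence-weighted numeric variance.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified claim about one fixed (subject, predicate) pair.
#[derive(Debug, Clone)]
pub struct Claim {
    pub agent: u32,      // dense agent index standing in for the agent's public key
    pub value: i64,      // assumption: numeric object, e.g. revenue in $ millions
    pub confidence: f32, // 0.0 to 1.0
    pub timestamp: u64,
}

/// Lens::Recency: the value of the latest claim, ignoring history.
pub fn recency(claims: &[Claim]) -> Option<i64> {
    claims.iter().max_by_key(|c| c.timestamp).map(|c| c.value)
}

/// Lens::Consensus: the value carrying the most total weight, where each claim
/// contributes `confidence * trust(agent)`. Pass `|_| 1.0` for an unweighted vote.
pub fn consensus(claims: &[Claim], trust: impl Fn(u32) -> f32) -> Option<i64> {
    let mut weight: HashMap<i64, f32> = HashMap::new();
    for c in claims {
        *weight.entry(c.value).or_insert(0.0) += c.confidence * trust(c.agent);
    }
    weight
        .into_iter()
        .filter(|&(_, w)| w > 0.0)
        // Highest weight wins; ties go to the smaller value so the result is deterministic.
        .max_by(|a, b| a.1.total_cmp(&b.1).then(b.0.cmp(&a.0)))
        .map(|(value, _)| value)
}

/// Lens::Skeptic: confidence-weighted variance of the claimed values.
/// Zero means every claim agrees; large values flag unstable, high-conflict facts.
pub fn skeptic(claims: &[Claim]) -> Option<f64> {
    let total: f64 = claims.iter().map(|c| c.confidence as f64).sum();
    if total == 0.0 {
        return None;
    }
    let mean = claims
        .iter()
        .map(|c| c.confidence as f64 * c.value as f64)
        .sum::<f64>()
        / total;
    let var = claims
        .iter()
        .map(|c| c.confidence as f64 * (c.value as f64 - mean).powi(2))
        .sum::<f64>()
        / total;
    Some(var)
}
```

A Trust Pack plugs into `consensus` as the weight function, returning 0.0 for agents outside the pack, which is how the same claim set can resolve to different answers under different lenses.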
## 4. Features for the Agentive Team
### 4.1. "Forking Reality" (Branching)
Agents need to simulate futures without polluting the main branch. Episteme supports **Copy-on-Write Branching** via Sparse Merkle Trees.
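The copy-on-write idea can be illustrated with a simpler structure than a Sparse Merkle Tree. The sketch below is a hypothetical overlay chain: each branch stores only its own writes and holds a reference-counted pointer to a frozen parent snapshot, so forking copies nothing and writes never leak into the parent. A Sparse Merkle Tree gets the same behavior by sharing unchanged subtrees between versions and additionally yields a root hash per branch; that part is not modeled here. `Branch`, `root`, `fork`, `put`, and `get` are illustrative names.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical copy-on-write branch. A branch records only its own writes and
/// points at a frozen parent snapshot, so forking is O(1) and never copies state.
pub struct Branch {
    parent: Option<Rc<Branch>>,
    writes: HashMap<String, String>,
}

impl Branch {
    /// A new, empty trunk ("main") branch.
    pub fn root() -> Branch {
        Branch { parent: None, writes: HashMap::new() }
    }

    /// Fork a child from a frozen snapshot; once wrapped in `Rc`, the snapshot is immutable.
    pub fn fork(snapshot: &Rc<Branch>) -> Branch {
        Branch { parent: Some(Rc::clone(snapshot)), writes: HashMap::new() }
    }

    /// Writes land only in this branch's delta, never in any ancestor.
    pub fn put(&mut self, key: &str, value: &str) {
        self.writes.insert(key.to_string(), value.to_string());
    }

    /// Read this branch's delta first, then fall back through the ancestor chain.
    pub fn get(&self, key: &str) -> Option<&str> {
        let mut node = self;
        loop {
            if let Some(v) = node.writes.get(key) {
                return Some(v.as_str());
            }
            match &node.parent {
                Some(parent) => node = &**parent,
                None => return None,
            }
        }
    }
}
```

Reads cost grows with fork depth in this version; `Rc` also keeps it single-threaded, so concurrent agents would need `Arc` instead.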
### 4.2. The Ballot Box: High-Velocity Consensus
To avoid write contention, Episteme separates the "Candidate" (Assertion) from the "Votes" (Signatures).
* Agents write **Votes** to a high-speed append-only log ("The Ballot Box").
* A background process aggregates these votes to update the Materialized View.
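A minimal, single-threaded sketch of this split, assuming votes are simple approve/reject records keyed by a numeric assertion ID (the `Vote` and `BallotBox` types are hypothetical). Casting a vote is a plain append; a separate `aggregate` pass folds only the not-yet-applied suffix of the log into cached counts, which is the state a materialized view would read. Signature checks and de-duplication of repeat votes from the same agent are omitted.

```rust
use std::collections::HashMap;

/// Hypothetical identifier for a candidate assertion.
pub type AssertionId = u64;

/// One agent's vote on a candidate (the signature is omitted in this sketch).
#[derive(Debug, Clone, Copy)]
pub struct Vote {
    pub assertion: AssertionId,
    pub agent: u32,
    pub approve: bool,
}

/// Append-only vote log plus a tally that a background pass folds forward.
#[derive(Default)]
pub struct BallotBox {
    log: Vec<Vote>,                          // append-only; entries are never rewritten
    applied: usize,                          // prefix of `log` already folded into `tally`
    tally: HashMap<AssertionId, (u32, u32)>, // (approvals, rejections)
}

impl BallotBox {
    /// Write path: an O(1) append that never touches the candidate assertion.
    pub fn cast(&mut self, vote: Vote) {
        self.log.push(vote);
    }

    /// Aggregation pass: fold in only the votes appended since the previous pass.
    /// Returns how many votes were applied.
    pub fn aggregate(&mut self) -> usize {
        let pending = &self.log[self.applied..];
        for v in pending {
            let counts = self.tally.entry(v.assertion).or_insert((0, 0));
            if v.approve {
                counts.0 += 1;
            } else {
                counts.1 += 1;
            }
        }
        let n = pending.len();
        self.applied = self.log.len();
        n
    }

    /// Read path: the tally as of the last aggregation pass.
    pub fn counts(&self, id: AssertionId) -> (u32, u32) {
        self.tally.get(&id).copied().unwrap_or((0, 0))
    }
}
```

Readers see counts that lag the log by at most one aggregation pass, which is the trade this pattern makes to keep the write path contention-free.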
### 4.3. The Hive: Learning & Trust
* **Trust Packs (The Curator Economy):** Users can publish and subscribe to lists of trusted agents.
  * *Example:* "The Skeptical Cardio Pack" filters out low-quality studies.
  * *Mechanism:* A BitSet overlay that efficiently filters the Consensus Lens.
* **The Simulator (Mid-Training):** A pipeline that converts high-confidence failure logs into **Synthetic Trajectories**.
* **The Super Curator (Judicial Branch):** A specialized swarm of "Reviewer Agents" that audits high-variance facts.
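The BitSet overlay behind Trust Packs can be sketched as one bit per agent, assuming agents are assigned dense integer indices (an assumption; the document identifies agents by public key). Membership is a shift-and-mask, subscribing to several packs is a bitwise OR, and filtering happens before the Consensus Lens runs. `TrustPack` and `filter_by_pack` are hypothetical names.

```rust
/// Hypothetical Trust Pack: a bitset over dense agent indices (bit i set = agent i trusted).
#[derive(Debug, Clone)]
pub struct TrustPack {
    words: Vec<u64>,
}

impl TrustPack {
    /// An empty pack with room for `num_agents` agents.
    pub fn new(num_agents: usize) -> Self {
        TrustPack { words: vec![0; (num_agents + 63) / 64] }
    }

    /// Mark an agent as trusted. Panics if `agent` is outside the pack's capacity.
    pub fn trust(&mut self, agent: usize) {
        self.words[agent / 64] |= 1u64 << (agent % 64);
    }

    /// O(1) membership test; agents beyond the pack's capacity are untrusted.
    pub fn contains(&self, agent: usize) -> bool {
        self.words
            .get(agent / 64)
            .map_or(false, |&w| w & (1u64 << (agent % 64)) != 0)
    }

    /// Subscribing to several packs: an agent is trusted if any pack trusts it.
    pub fn union(&self, other: &TrustPack) -> TrustPack {
        let len = self.words.len().max(other.words.len());
        let words = (0..len)
            .map(|i| self.words.get(i).copied().unwrap_or(0) | other.words.get(i).copied().unwrap_or(0))
            .collect();
        TrustPack { words }
    }
}

/// Keep only the items authored by agents the pack trusts, before a lens runs.
pub fn filter_by_pack<'a, T>(
    items: &'a [T],
    pack: &TrustPack,
    agent_of: impl Fn(&T) -> usize,
) -> Vec<&'a T> {
    items.iter().filter(|item| pack.contains(agent_of(*item))).collect()
}
```

Because the filter is a word-level bit test, combining or swapping packs per query is cheap compared with re-ranking every claim.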
### 4.4. The Meter: Economics of Reasoning
Deep Research is computationally expensive. Episteme enforces **Temporal Advantage Normalization (TAN)**.
* **Budgeting:** Every Job carries a budget (tokens/dollars).
* **Throttling:** The system rejects "Fork Reality" requests if the projected cost exceeds the Value of Information.
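The budgeting and throttling rules can be sketched as an admission check. This is a hypothetical illustration that assumes cost, budget, and Value of Information are all expressed in one integer unit (e.g., tokens); how Episteme estimates the Value of Information, and how TAN normalizes cost over time, is not specified here, so neither is modeled. `Admission`, `Job`, and `admit_fork` are illustrative names.

```rust
/// Outcome of a hypothetical admission check for a "Fork Reality" request.
#[derive(Debug, PartialEq, Eq)]
pub enum Admission {
    Approved,
    RejectedLowValue,   // projected cost exceeds the estimated Value of Information
    RejectedOverBudget, // worthwhile, but the job cannot afford it
}

/// A job's remaining budget, in one abstract unit (e.g. tokens or cents).
pub struct Job {
    pub budget_remaining: u64,
}

/// Admit a fork only if it is worth its cost and affordable; charge the budget on approval.
pub fn admit_fork(job: &mut Job, projected_cost: u64, value_of_information: u64) -> Admission {
    if projected_cost > value_of_information {
        return Admission::RejectedLowValue;
    }
    if projected_cost > job.budget_remaining {
        return Admission::RejectedOverBudget;
    }
    job.budget_remaining -= projected_cost;
    Admission::Approved
}
```

Rejected requests leave the budget untouched, so an agent can retry a cheaper or more valuable fork within the same job.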
## 5. Architecture: The Rust Stack
Episteme follows the **"Defensive by Default"** best practices.
### Tier 1: The Spine (Durability)
* **Component:** `episteme-wal` (Quarantine Pattern)
* **Role:** Raw, serialized append-only log. Ensures we never lose a claim.
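As an illustration of what "never lose a claim" asks of an append-only log, here is a hypothetical record framing: each record carries a length prefix and a checksum, so replay after a crash can detect a torn or corrupted tail and stop there instead of returning garbage. The frame layout and the FNV-1a checksum are assumptions for this sketch, and how the Quarantine Pattern handles the bad tail is not modeled. `append` writes to any `std::io::Write`, so a `Vec<u8>` can stand in for a file.

```rust
use std::io::{self, Write};

/// FNV-1a (64-bit): a simple, non-cryptographic checksum for framing records.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

const HEADER: usize = 12; // 4-byte length + 8-byte checksum

/// Append one record framed as [len: u32 LE][checksum: u64 LE][payload].
pub fn append<W: Write>(log: &mut W, payload: &[u8]) -> io::Result<()> {
    log.write_all(&(payload.len() as u32).to_le_bytes())?;
    log.write_all(&fnv1a(payload).to_le_bytes())?;
    log.write_all(payload)?;
    // A file-backed log would also call File::sync_all() before acknowledging the write.
    log.flush()
}

/// Replay every intact record in order, stopping at the first torn or corrupt frame.
pub fn replay(mut bytes: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    while bytes.len() >= HEADER {
        let mut len_buf = [0u8; 4];
        len_buf.copy_from_slice(&bytes[0..4]);
        let mut sum_buf = [0u8; 8];
        sum_buf.copy_from_slice(&bytes[4..HEADER]);
        let len = u32::from_le_bytes(len_buf) as usize;
        let payload = match bytes.get(HEADER..HEADER + len) {
            Some(p) => p,
            None => break, // torn write: the header promises more bytes than exist
        };
        if fnv1a(payload) != u64::from_le_bytes(sum_buf) {
            break; // corrupt frame
        }
        records.push(payload.to_vec());
        bytes = &bytes[HEADER + len..];
    }
    records
}
```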
### Tier 2: The Lattice (Graph/Index)
* **Component:** `episteme-core` (Hot/Warm memory)
* **Warm Tier:** `sled` (embedded, log-structured key-value store) for the Merkle DAG + `hnsw` for vector search.
* **Ballot Box:** High-velocity stream for vote ingestion.
### Tier 3: The Cortex (Compute)
* **Component:** `episteme-lens`
* **Role:** The WASM runtime for executing Lenses and resolving probabilistic state.
* **Materializer:** Background worker maintaining O(1) read views.
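A materialized view turns lens evaluation into a lookup. The sketch below maintains a hypothetical Recency view incrementally: the background worker calls `apply` for each newly ingested assertion, and the read path is a single hash lookup keyed by (subject, predicate). `RecencyView` is an illustrative name; views for Consensus or other lenses would need richer per-key state (e.g., running weight totals) than the single latest value kept here.

```rust
use std::collections::HashMap;

/// Hypothetical materialized view for the Recency lens: (subject, predicate) -> latest value.
#[derive(Default)]
pub struct RecencyView {
    latest: HashMap<(String, String), (u64, String)>, // key -> (timestamp, value)
}

impl RecencyView {
    /// Called by the background materializer for each newly ingested assertion.
    /// Out-of-order arrivals with an older (or equal) timestamp leave the view unchanged.
    pub fn apply(&mut self, subject: &str, predicate: &str, value: &str, timestamp: u64) {
        let key = (subject.to_string(), predicate.to_string());
        let is_newer = self.latest.get(&key).map_or(true, |(ts, _)| timestamp > *ts);
        if is_newer {
            self.latest.insert(key, (timestamp, value.to_string()));
        }
    }

    /// Read path: one hash lookup (expected O(1)); no lens is evaluated at query time.
    pub fn read(&self, subject: &str, predicate: &str) -> Option<&str> {
        self.latest
            .get(&(subject.to_string(), predicate.to_string()))
            .map(|(_, value)| value.as_str())
    }
}
```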
## 6. The Ecosystem Triad
| System | Biological Analogy | Function | Question Answered |
| :--- | :--- | :--- | :--- |
| **LogDB** | **The Spine** | Immutable Event Log | "What happened?" |
| **AssociativeDB** | **The Hippocampus** | Associative Memory | "What is this like?" |
| **Episteme** | **The Cortex** | Structured Reasoning | "Is this true?" |