stemedb/architecture.md
jordan a776744889 Initial project setup with Claude Code monorepo structure
- Rust workspace with stemedb-core crate
- Full .claude/ configuration (agents, skills, commands, guides)
- ai-lookup/ for token-efficient fact storage
- Quality gates: clippy, fmt, jscpd duplication detection
- Pre-commit hook with 5-phase quality checks
- CLAUDE.md router and CODING_GUIDELINES.md standards

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 10:56:26 -07:00


# Episteme (StemeDB) Architecture
> **Design Philosophy:** Immutable History, Probabilistic Resolution.
> **Status:** Draft Spec v0.1
## 1. System Overview
Episteme is a **Log-Structured, Content-Addressed Knowledge Graph**. Unlike traditional databases that mutate state in place, Episteme appends **Assertions** to an immutable ledger (Merkle DAG). State resolution happens at read-time via **Lenses**.
### High-Level Data Flow
```ascii
[Writer Agent]                    [Reader Agent]
      │                                 ▲
      │ (1) Sign &                      │ (5) Deterministic Answer
      │     Propose                     │     (Confidence: 0.92)
      ▼                                 │
┌────────────┐                    ┌────────────┐
│ Ingestion  │                    │ Resolution │
│  Gateway   │                    │   Engine   │
└─────┬──────┘                    └─────┬──────┘
      │ (2) Append                      │ (4) Apply Lens (Filter/Rank)
      │     to WAL                      │
      ▼                                 │
┌────────────┐                    ┌────────────┐
│ Quarantine │                    │  Indexing  │
│  Journal   │────────(3)────────►│  Service   │
└────────────┘                    └────────────┘
 (Durability)                     (Graph/Vector)
```
---
## 2. Core Data Structures
### 2.1. The Atomic Unit: `Assertion`
Everything in Episteme is an Assertion. There are no "Tables."
```rust
// The immutable payload (content-addressed by hash)
struct Assertion {
    // 1. The Triple (The Fact)
    pub subject: EntityId,     // "Tesla_Inc"
    pub predicate: RelationId, // "has_revenue"
    pub object: Value,         // Variant: Float(10.5e9), String("Musk"), Ref(EntityId)

    // 2. The Lineage (The Chain)
    pub parent_hash: Option<Hash>, // Set when amending a previous claim (forking)
    pub source_hash: Hash,         // Evidence pointer (PDF/log hash)

    // 3. The Meta-Cognition (The Weight)
    pub agent_id: PublicKey,      // Ed25519 public key of the asserting agent
    pub confidence: f32,          // 0.0 to 1.0 (subjective certainty)
    pub timestamp: u64,           // Wall-clock time
    pub vector: Option<Vec<f32>>, // Semantic embedding (for fuzzy recall)
}
```
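To make "content-addressed" concrete, here is a minimal sketch of deriving an identifier from an assertion's bytes. Everything in it is an assumption for illustration: `MiniAssertion`, `canonical_bytes`, and `content_id` are hypothetical names, the struct keeps only a few of the fields above, and the standard library's `DefaultHasher` stands in for `blake3` so the example needs no dependencies. `DefaultHasher` is not cryptographic and its output is not guaranteed stable across Rust releases, so it is unsuitable for a real ledger.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical, simplified stand-in for `Assertion` (triple plus confidence only).
pub struct MiniAssertion {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub confidence: f32,
}

/// Canonical encoding: each string is length-prefixed so that field
/// boundaries are unambiguous, e.g. ("ab", "c") and ("a", "bc") differ.
pub fn canonical_bytes(a: &MiniAssertion) -> Vec<u8> {
    let mut out = Vec::new();
    for field in [&a.subject, &a.predicate, &a.object] {
        out.extend_from_slice(&(field.len() as u64).to_le_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    // Hash the exact bit pattern of the float, not a rounded rendering of it.
    out.extend_from_slice(&a.confidence.to_bits().to_le_bytes());
    out
}

/// Content identifier: a hash of the canonical bytes.
/// ASSUMPTION: `DefaultHasher` is only a dependency-free placeholder for `blake3`.
pub fn content_id(a: &MiniAssertion) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(&canonical_bytes(a));
    hasher.finish()
}
```

Two assertions with identical fields produce the same identifier, which is what lets the ledger deduplicate by key. The length prefixes matter because plain concatenation would let distinct triples encode to the same bytes.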
### 2.2. The Storage Layout (LSM Tree)
We use a Key-Value store (e.g., `sled` or `RocksDB`) to persist the DAG.
| Key | Value | Purpose |
| :--- | :--- | :--- |
| `H:{Hash}` | `Serialized<Assertion>` | Main content store |
| `S:{Subject}` | `List<Hash>` | Subject-to-Claims Index |
| `SP:{Subject}:{Predicate}` | `List<Hash>` | Exact Triple Index |
| `A:{AgentID}` | `ReputationScore` | TrustRank storage |
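
The key scheme in the table can be sketched as a handful of formatting helpers. The function names are hypothetical, and hex-encoding the hash and agent ID is an assumption; the spec does not fix an encoding.

```rust
/// Hex-encode raw bytes (ASSUMPTION: the spec does not say how hashes are rendered in keys).
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// `H:{Hash}`: main content store.
pub fn content_key(hash: &[u8]) -> String {
    format!("H:{}", to_hex(hash))
}

/// `S:{Subject}`: subject-to-claims index.
pub fn subject_key(subject: &str) -> String {
    format!("S:{subject}")
}

/// `SP:{Subject}:{Predicate}`: exact triple index.
pub fn subject_predicate_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}

/// `A:{AgentID}`: TrustRank storage.
pub fn agent_key(agent_id: &[u8]) -> String {
    format!("A:{}", to_hex(agent_id))
}
```

One consequence of this layout: a subject or predicate that itself contains `:` would make `SP:` keys ambiguous, so identifiers would need to be escaped or restricted.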
---
## 3. The Write Path (The Spine)
Episteme follows the **Quarantine Pattern** for durability.
1. **Receive:** Agent submits a signed `Assertion`.
2. **Verify:** Check signature validity and structure.
3. **Journal:** Write to `episteme-wal` (append-only file, fsynced immediately).
4. **Acknowledge:** Return `202 Accepted` to Agent with the new `Hash`.
5. **Index (Async):** A background worker tails the WAL and, for each entry:
    * Deserializes the Assertion.
    * Updates the `H:{Hash}` store.
    * Appends `Hash` to the `S:{Subject}` and `SP:{Subject}:{Predicate}` index lists.
    * Updates the HNSW vector index (if a vector is present).
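
The ordering of these steps has a visible consequence: an assertion is acknowledged before it is indexed, so a read issued right after the `202 Accepted` may not see it yet. The sketch below models that with in-memory collections. It is an illustration under stated assumptions, not the `quarantine-journal` API: `Store`, `Record`, `submit`, and `index_pending` are hypothetical names, a `Vec` stands in for the fsynced WAL file, `u64` stands in for the hash type, and signature verification and the vector index are omitted.

```rust
use std::collections::HashMap;

pub type Hash = u64; // ASSUMPTION: placeholder for the real content hash

#[derive(Clone, Debug, PartialEq)]
pub struct Record {
    pub hash: Hash,
    pub subject: String,
    pub predicate: String,
}

#[derive(Default)]
pub struct Store {
    wal: Vec<Record>,                                           // append-only journal
    indexed_upto: usize,                                        // indexer's position in the WAL
    content: HashMap<Hash, Record>,                             // H:{Hash}
    by_subject: HashMap<String, Vec<Hash>>,                     // S:{Subject}
    by_subject_predicate: HashMap<(String, String), Vec<Hash>>, // SP:{Subject}:{Predicate}
}

impl Store {
    /// Steps 3 and 4: journal the record, then acknowledge with its hash.
    pub fn submit(&mut self, rec: Record) -> Hash {
        let hash = rec.hash;
        self.wal.push(rec);
        hash
    }

    /// Step 5: tail the WAL from the last indexed position.
    /// Returns how many WAL entries were processed.
    pub fn index_pending(&mut self) -> usize {
        let pending = self.wal[self.indexed_upto..].to_vec();
        for rec in &pending {
            // Content addressing makes replays idempotent: skip known hashes.
            if self.content.contains_key(&rec.hash) {
                continue;
            }
            self.by_subject.entry(rec.subject.clone()).or_default().push(rec.hash);
            self.by_subject_predicate
                .entry((rec.subject.clone(), rec.predicate.clone()))
                .or_default()
                .push(rec.hash);
            self.content.insert(rec.hash, rec.clone());
        }
        self.indexed_upto = self.wal.len();
        pending.len()
    }

    pub fn hashes_for_subject(&self, subject: &str) -> Vec<Hash> {
        self.by_subject.get(subject).cloned().unwrap_or_default()
    }

    pub fn lookup(&self, subject: &str, predicate: &str) -> Vec<Hash> {
        self.by_subject_predicate
            .get(&(subject.to_string(), predicate.to_string()))
            .cloned()
            .unwrap_or_default()
    }
}
```

Calling `lookup` between `submit` and `index_pending` returns an empty list, which is the eventual-consistency window that the `202 Accepted` response implies.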
---
## 4. The Read Path (The Cortex)
Reading is where Episteme differs most from conventional databases: a read is a **compute operation**, not a simple lookup.
**Query:** `GET(Subject="Tesla_Inc", Predicate="has_revenue", Lens="Consensus")`
1. **Gather:** Look up `SP:Tesla_Inc:has_revenue` to get the list of candidate Hashes: `[H1, H2, H3, H4]`.
2. **Hydrate:** Fetch full Assertions for each Hash.
3. **Resolve (The Lens):** Pass candidates through the Lens pipeline.
### The Lens Pipeline (Rust Trait)
```rust
trait Lens {
    fn resolve(&self, candidates: Vec<Assertion>, context: Context) -> LensResult;
}

// Example: Consensus Lens Logic
// 1. Group candidates by Object value (clustering).
// 2. Sum the TrustRank of Agents in each cluster.
// 3. Return the cluster with highest weighted mass.
```
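The three commented steps of the Consensus Lens can be sketched as a free function over simplified types. This is one possible reading, not the spec's `Lens` implementation: `Claim` and `consensus` are hypothetical names, objects are compared as strings, unknown agents are assumed to carry zero trust, and ties are broken by picking the lexicographically smallest object so the result is deterministic.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical reduced view of an assertion: what was claimed, and by whom.
#[derive(Clone, Debug, PartialEq)]
pub struct Claim {
    pub object: String,
    pub agent: String,
}

/// Returns the winning object and its share of the total trust mass,
/// or `None` when there are no candidates.
pub fn consensus(candidates: &[Claim], trust: &HashMap<String, f64>) -> Option<(String, f64)> {
    let mut mass: HashMap<&str, f64> = HashMap::new();
    let mut seen: HashSet<(&str, &str)> = HashSet::new();
    for c in candidates {
        // Count each agent once per cluster, even if it repeated the claim.
        if !seen.insert((c.object.as_str(), c.agent.as_str())) {
            continue;
        }
        // 1 + 2: cluster by object value and sum the agents' trust.
        let weight = trust.get(&c.agent).copied().unwrap_or(0.0);
        *mass.entry(c.object.as_str()).or_insert(0.0) += weight;
    }
    let total: f64 = mass.values().sum();
    // 3: pick the heaviest cluster; ties go to the smallest object string.
    let (winner, best) = mass
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| b.0.cmp(a.0)))?;
    let share = if total > 0.0 { best / total } else { 0.0 };
    Some((winner.to_string(), share))
}
```

The explicit tie-break matters because `HashMap` iteration order is unspecified; without it, two equally weighted clusters could yield different answers across runs, which would undermine the "Deterministic Answer" in the data-flow diagram.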
---
## 5. Advanced Mechanics
### 5.1. Forking Reality (Branching)
Branching is handled via **Overlay Graphs**.
* A `Branch` is simply a lightweight index (Map) of `Hash -> Assertion`.
* **Write to Branch:** Assertions are stored in the Branch's ephemeral index, not the Global DAG.
* **Read from Branch:** The Query Engine checks the Branch index *first*, then falls back to Global (Overlay pattern).
* **Merge:** Commit the Branch's unique assertions to the Global WAL.
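
The overlay pattern above can be sketched with two maps. The names `Graph`, `Branch`, `read`, and `merge_into` are hypothetical, assertions are reduced to strings, and `u64` stands in for the hash type. As a simplification, merge inserts directly into the global map here, whereas the spec commits through the Global WAL.

```rust
use std::collections::HashMap;

pub type Hash = u64; // ASSUMPTION: placeholder for the real content hash

/// The global DAG, reduced to a hash-to-assertion map.
#[derive(Default)]
pub struct Graph {
    pub assertions: HashMap<Hash, String>,
}

/// A branch is only an overlay index; it never copies the global graph.
#[derive(Default)]
pub struct Branch {
    pub overlay: HashMap<Hash, String>,
}

impl Branch {
    /// Write to Branch: stored in the ephemeral overlay only.
    pub fn write(&mut self, hash: Hash, assertion: String) {
        self.overlay.insert(hash, assertion);
    }

    /// Read from Branch: overlay first, then fall back to global.
    pub fn read<'a>(&'a self, global: &'a Graph, hash: Hash) -> Option<&'a String> {
        self.overlay.get(&hash).or_else(|| global.assertions.get(&hash))
    }

    /// Merge: commit assertions the global graph does not already hold.
    /// Returns how many were committed.
    pub fn merge_into(self, global: &mut Graph) -> usize {
        let mut committed = 0;
        for (hash, assertion) in self.overlay {
            if !global.assertions.contains_key(&hash) {
                global.assertions.insert(hash, assertion);
                committed += 1;
            }
        }
        committed
    }
}
```

Because creating a branch allocates only an empty map, branching is cheap regardless of the size of the global graph, and discarding a branch is just dropping the overlay.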
### 5.2. TrustRank (Reputation)
A background worker (`episteme-gardener`) runs periodically and:
1. Identifies "Settled Facts" (Assertions with >99% consensus sustained over a time window T).
2. Rewards Agents who claimed these facts *early*.
3. Punishes Agents who claimed the opposite.
4. Updates `A:{AgentID}` reputation scores.
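
One way the reward and penalty steps could look is sketched below. The spec does not define the scoring formula, so every number and name here is an assumption: `Vote` and `apply_settlement` are hypothetical, scores live in `[0.0, 1.0]` with new agents starting at `0.5`, the reward scales linearly with how early the agent made the claim, and the penalty is a flat `step`.

```rust
use std::collections::HashMap;

/// Hypothetical record of one agent's position on a fact that later settled.
pub struct Vote {
    pub agent: String,
    pub agreed: bool,    // true if the agent claimed the settled value
    pub claimed_at: u64, // when the agent made its claim
}

/// Adjust reputation scores once a fact is settled at `settled_at`.
/// ASSUMPTION: linear earliness bonus and flat penalty, both bounded by `step`.
pub fn apply_settlement(
    scores: &mut HashMap<String, f64>,
    votes: &[Vote],
    settled_at: u64,
    step: f64,
) {
    let earliest = votes.iter().map(|v| v.claimed_at).min().unwrap_or(settled_at);
    // Avoid dividing by zero when every claim arrived at settlement time.
    let window = settled_at.saturating_sub(earliest).max(1) as f64;
    for v in votes {
        let score = scores.entry(v.agent.clone()).or_insert(0.5);
        let delta = if v.agreed {
            // Earlier claims earn a larger fraction of `step`.
            let earliness = settled_at.saturating_sub(v.claimed_at) as f64 / window;
            step * earliness
        } else {
            -step
        };
        *score = (*score + delta).clamp(0.0, 1.0);
    }
}
```

With `step = 0.1` and settlement at `t = 100`, an agent who agreed at `t = 0` gains more than one who agreed at `t = 50`, and a dissenting agent loses the full step.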
---
## 6. Implementation Roadmap
### Phase 1: The Skeleton (MVP)
* [ ] Reuse `quarantine-journal` pattern for WAL.
* [ ] Implement `Assertion` struct and serialization (`rkyv`).
* [ ] Basic `sled` storage backend.
* [ ] Single Lens: `Recency` (last-writer-wins logic).
### Phase 2: The Graph
* [ ] Implement `Subject -> Hash` indexing.
* [ ] Implement `Consensus` Lens (Simple voting).
* [ ] Basic HTTP API (`POST /assert`, `GET /query`).
### Phase 3: The Cortex
* [ ] Branching support (Context/Session IDs).
* [ ] Vector search integration (`lanms` or `hnsw-rs`).
* [ ] TrustRank basics.
---
## 7. Technology Stack
* **Language:** Rust (2024 edition)
* **WAL:** `quarantine-journal` (Local crate or pattern)
* **KV Store:** `sled` (Embedded, pure Rust) or `rocksdb` binding.
* **Serialization:** `rkyv` (Zero-copy deserialization).
* **API:** `axum` + `tower`.
* **Hashing:** `blake3` (Fast, secure).