# Episteme (StemeDB) Architecture

> **Design Philosophy:** Immutable History, Probabilistic Resolution.
> **Status:** Draft Spec v0.1

## 1. System Overview

Episteme is a **Log-Structured, Content-Addressed Knowledge Graph**. Unlike traditional databases that mutate state in place, Episteme appends **Assertions** to an immutable ledger (a Merkle DAG). State is resolved at read time via **Lenses**.

### High-Level Data Flow

```ascii
[Writer Agent]      [Reader Agent]
      │                   ▲
      │ (1) Sign &        │ (5) Deterministic Answer
      │     Propose       │     (Confidence: 0.92)
      ▼                   │
┌────────────┐      ┌────────────┐
│ Ingestion  │      │ Resolution │
│  Gateway   │      │   Engine   │
└─────┬──────┘      └─────┬──────┘
      │ (2) Append        │ (4) Apply Lens (Filter/Rank)
      │     to WAL        │
      ▼                   │
┌────────────┐      ┌────────────┐
│ Quarantine │      │  Indexing  │
│  Journal   │──────►  Service   │
└────────────┘ (3)  └────────────┘
 (Durability)       (Graph/Vector)
```

---

## 2. Core Data Structures

### 2.1. The Atomic Unit: `Assertion`
Everything in Episteme is an Assertion. There are no "Tables."

```rust
// The immutable payload (content-addressed by hash)
pub struct Assertion {
    // 1. The Triple (The Fact)
    pub subject: EntityId,     // "Tesla_Inc"
    pub predicate: RelationId, // "has_revenue"
    pub object: Value,         // Variant: Float(10.5B), String("Musk"), Ref(EntityId)

    // 2. The Lineage (The Chain)
    pub parent_hash: Option<Hash>, // Set when modifying a previous claim (Forking)
    pub source_hash: Hash,         // Evidence pointer (PDF/Log hash)

    // 3. The Meta-Cognition (The Weight)
    pub agent_id: PublicKey,       // Ed25519 public key of the signing agent
    pub confidence: f32,           // 0.0 - 1.0 (Subjective certainty)
    pub timestamp: u64,            // Wall-clock time
    pub vector: Option<Vec<f32>>,  // Semantic embedding (for fuzzy recall)
}
```
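
To make "content-addressed" concrete, the sketch below derives an address from an assertion's fields alone, so identical payloads always map to the same hash, and a correction becomes a new assertion pointing at its parent. It is a minimal sketch under assumptions: `SimpleAssertion`, `content_hash`, and `fork` are hypothetical names, fields are string-typed stand-ins, `confidence` and `vector` are left out because `f32` does not implement `Hash`, and std's `DefaultHasher` stands in for `blake3` only so the example has no dependencies. `DefaultHasher` is not cryptographically secure.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for the spec's `Assertion` (assumption: string fields,
/// u64 hashes; `confidence`/`vector` omitted because f32 is not `Hash`).
#[derive(Hash, Clone, Debug)]
struct SimpleAssertion {
    subject: String,
    predicate: String,
    object: String,
    parent_hash: Option<u64>,
    source_hash: u64,
    agent_id: String,
    timestamp: u64,
}

/// Hypothetical helper: the address is a pure function of the payload.
/// NOTE: `DefaultHasher` is a non-cryptographic placeholder for `blake3`.
fn content_hash(a: &SimpleAssertion) -> u64 {
    let mut hasher = DefaultHasher::new();
    a.hash(&mut hasher);
    hasher.finish()
}

/// A correction never mutates the original: it is a new assertion whose
/// `parent_hash` points at the claim it forks from.
fn fork(parent: &SimpleAssertion, new_object: &str, timestamp: u64) -> SimpleAssertion {
    SimpleAssertion {
        object: new_object.to_string(),
        parent_hash: Some(content_hash(parent)),
        timestamp,
        ..parent.clone()
    }
}
```

Because the parent's hash is part of the child's payload, the chain of `parent_hash` links forms the Merkle DAG: altering any ancestor would change every descendant's address.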

### 2.2. The Storage Layout (LSM Tree)
We use a Key-Value store (e.g., `sled` or `RocksDB`) to persist the DAG.

| Key | Value | Purpose |
| :--- | :--- | :--- |
| `H:{Hash}` | `Serialized<Assertion>` | Main content store |
| `S:{Subject}` | `List<Hash>` | Subject-to-Claims Index |
| `SP:{Subject}:{Predicate}` | `List<Hash>` | Exact Triple Index |
| `A:{AgentID}` | `ReputationScore` | TrustRank storage |
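
The key prefixes in the table can be sketched as plain string builders over an ordered map. This is an illustration only: `KvIndex` and the helper names are hypothetical, a `BTreeMap` stands in for `sled`/`RocksDB`, and hashes are shown as short strings.

```rust
use std::collections::BTreeMap;

// Key builders mirroring the table's prefixes (hypothetical helper names).
fn content_key(hash: &str) -> String {
    format!("H:{hash}")
}
fn subject_key(subject: &str) -> String {
    format!("S:{subject}")
}
fn triple_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}

/// In-memory stand-in for the KV store, holding only the two index families.
#[derive(Default)]
struct KvIndex {
    lists: BTreeMap<String, Vec<String>>,
}

impl KvIndex {
    /// Register one assertion hash under both index prefixes.
    fn index(&mut self, subject: &str, predicate: &str, hash: &str) {
        self.lists.entry(subject_key(subject)).or_default().push(hash.to_string());
        self.lists
            .entry(triple_key(subject, predicate))
            .or_default()
            .push(hash.to_string());
    }

    /// Ordered keys make "all predicates of a subject" a prefix scan.
    fn triple_keys_of(&self, subject: &str) -> Vec<&String> {
        let prefix = format!("SP:{subject}:");
        self.lists
            .range(prefix.clone()..)
            .take_while(|(k, _)| k.starts_with(&prefix))
            .map(|(k, _)| k)
            .collect()
    }
}
```

One design point the sketch surfaces: with `:` as the separator, a subject or predicate that itself contains `:` would make keys ambiguous, so identifiers would need escaping or a fixed-width encoding.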

---

## 3. The Write Path (The Spine)

Episteme follows the **Quarantine Pattern** for durability.

1. **Receive:** Agent submits a signed `Assertion`.
2. **Verify:** Check signature validity and structure.
3. **Journal:** Append to `episteme-wal` (append-only file, fsynced immediately).
4. **Acknowledge:** Return `202 Accepted` to the Agent with the new `Hash`.
5. **Index (Async):** A background worker tails the WAL:
    * Deserializes the Assertion.
    * Updates the `H:{Hash}` store.
    * Appends `Hash` to the `S:{Subject}` adjacency list and the `SP:{Subject}:{Predicate}` index used by the read path.
    * Updates the HNSW vector index (if a vector is present).
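
The five steps above can be sketched as two functions: a synchronous `submit` that verifies and journals, and an indexer that tails the journal afterwards. This is a minimal in-memory sketch under assumptions: `Submission`, `Node`, and their fields are hypothetical, the signature check is a boolean stand-in, the WAL is a `Vec` rather than an fsynced file, and only the `H:` and `S:` keys are maintained.

```rust
use std::collections::HashMap;

/// Simplified signed submission (assumption: real payloads would be
/// serialized assertions carrying an Ed25519 signature).
#[derive(Clone, Debug, PartialEq)]
struct Submission {
    hash: String,       // content address of the payload
    subject: String,
    payload: Vec<u8>,
    signature_ok: bool, // stand-in for a real signature check
}

#[derive(Default)]
struct Node {
    wal: Vec<Submission>,                     // append-only journal (in memory here)
    indexed_upto: usize,                      // how far the indexer has tailed the WAL
    store: HashMap<String, Vec<u8>>,          // H:{Hash}
    by_subject: HashMap<String, Vec<String>>, // S:{Subject}
}

impl Node {
    /// Steps 1-4: verify, journal, then acknowledge with the hash.
    fn submit(&mut self, s: Submission) -> Result<String, &'static str> {
        if !s.signature_ok || s.subject.is_empty() {
            return Err("rejected: invalid signature or structure");
        }
        let hash = s.hash.clone();
        self.wal.push(s); // a real WAL would fsync before acknowledging
        Ok(hash)          // corresponds to `202 Accepted`
    }

    /// Step 5: index everything journaled since the last run.
    fn run_indexer(&mut self) -> usize {
        let pending = self.wal[self.indexed_upto..].to_vec();
        for s in &pending {
            self.store.insert(format!("H:{}", s.hash), s.payload.clone());
            self.by_subject
                .entry(format!("S:{}", s.subject))
                .or_default()
                .push(s.hash.clone());
        }
        self.indexed_upto = self.wal.len();
        pending.len()
    }
}
```

Since the acknowledgement is returned before indexing, a read issued immediately after `202 Accepted` may not yet see the new assertion; durability is immediate, visibility is eventual.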

---

## 4. The Read Path (The Cortex)

Reading is where Episteme departs most from conventional databases: a Read is a **Compute Operation**, not a simple lookup.

**Query:** `GET(Subject="Tesla", Predicate="Revenue", Lens="Consensus")`

1. **Gather:** Look up `SP:Tesla:Revenue` to get the list of candidate Hashes: `[H1, H2, H3, H4]`.
2. **Hydrate:** Fetch the full Assertion for each Hash.
3. **Resolve (The Lens):** Pass the candidates through the Lens pipeline.
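
The Gather and Hydrate steps amount to two key lookups, sketched below. The `Store` type and its fields are hypothetical, `HashMap`s stand in for the KV store, and skipping hashes that have no content entry is an assumption about how a lagging indexer might be tolerated, not something the spec states.

```rust
use std::collections::HashMap;

/// Trimmed-down assertion for illustration (assumption: string fields).
#[derive(Clone, Debug, PartialEq)]
struct Assertion {
    subject: String,
    predicate: String,
    object: String,
}

struct Store {
    sp_index: HashMap<String, Vec<String>>, // SP:{Subject}:{Predicate} -> hashes
    content: HashMap<String, Assertion>,    // H:{Hash} -> assertion
}

impl Store {
    /// Step 1 (Gather): candidate hashes for an exact (subject, predicate) pair.
    fn gather(&self, subject: &str, predicate: &str) -> &[String] {
        self.sp_index
            .get(&format!("SP:{subject}:{predicate}"))
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }

    /// Step 2 (Hydrate): fetch full assertions, skipping hashes whose
    /// content is not (yet) in the store.
    fn hydrate(&self, hashes: &[String]) -> Vec<Assertion> {
        hashes
            .iter()
            .filter_map(|h| self.content.get(&format!("H:{h}")).cloned())
            .collect()
    }
}
```

The hydrated `Vec<Assertion>` is what a Lens receives as its `candidates`.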

### The Lens Pipeline (Rust Trait)

```rust
trait Lens {
    fn resolve(&self, candidates: Vec<Assertion>, context: Context) -> LensResult;
}

// Example: Consensus Lens Logic
// 1. Group candidates by Object value (clustering).
// 2. Sum the TrustRank of Agents in each cluster.
// 3. Return the cluster with highest weighted mass.
```
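
The three commented steps of the Consensus Lens can be written out as a free function. This is a sketch under assumptions rather than the spec's implementation: `Candidate`, `Resolved`, and `default_trust` are hypothetical, objects are compared as exact strings instead of being clustered, each agent counts once per value, and the reported confidence is simply the winning cluster's share of the total trust.

```rust
use std::collections::{HashMap, HashSet};

/// Only the fields the consensus logic needs (assumed shape).
struct Candidate {
    object: String,
    agent_id: String,
}

/// Hypothetical result: the winning value and its share of total trust.
#[derive(Debug, PartialEq)]
struct Resolved {
    value: String,
    confidence: f64,
}

/// `trust` maps agent id -> TrustRank; unknown agents get `default_trust`.
fn consensus(
    candidates: &[Candidate],
    trust: &HashMap<String, f64>,
    default_trust: f64,
) -> Option<Resolved> {
    // Steps 1 and 2: group by object value and sum TrustRank per cluster,
    // counting each (value, agent) pair only once.
    let mut seen: HashSet<(&str, &str)> = HashSet::new();
    let mut mass: HashMap<&str, f64> = HashMap::new();
    for c in candidates {
        if !seen.insert((c.object.as_str(), c.agent_id.as_str())) {
            continue;
        }
        let weight = trust.get(&c.agent_id).copied().unwrap_or(default_trust);
        *mass.entry(c.object.as_str()).or_insert(0.0) += weight;
    }
    let total: f64 = mass.values().sum();
    if total <= 0.0 {
        return None;
    }
    // Step 3: heaviest cluster wins; ties go to the lexicographically
    // smallest value so the answer does not depend on map iteration order.
    let (value, weight) = mass
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| b.0.cmp(a.0)))?;
    Some(Resolved { value: value.to_string(), confidence: weight / total })
}
```

With trust scores of 0.75 for one agent and 0.25 for each of two others, the single trusted agent's value wins (mass 0.75 against 0.5) with a confidence of 0.6. The explicit tie-break is what keeps the answer deterministic when two clusters carry equal mass.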

---

## 5. Advanced Mechanics

### 5.1. Forking Reality (Branching)
Branching is handled via **Overlay Graphs**.
* A `Branch` is simply a lightweight index (Map) of `Hash -> Assertion`.
* **Write to Branch:** Assertions are stored in the Branch's ephemeral index, not the Global DAG.
* **Read from Branch:** The Query Engine checks the Branch index *first*, then falls back to Global (Overlay pattern).
* **Merge:** Commit the Branch's unique assertions to the Global WAL.
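
The overlay pattern described above can be sketched with two maps. All names here (`Global`, `Branch`, `HashId`) are hypothetical, payloads are plain strings, and `merge` inserts into the global map directly, whereas in the design the indexer would do that after tailing the WAL.

```rust
use std::collections::HashMap;

type HashId = String;
type Payload = String; // stand-in for a serialized Assertion

#[derive(Default)]
struct Global {
    dag: HashMap<HashId, Payload>,
    wal: Vec<(HashId, Payload)>,
}

/// A branch is only an ephemeral overlay; it never touches `Global` on write.
#[derive(Default)]
struct Branch {
    overlay: HashMap<HashId, Payload>,
}

impl Branch {
    fn write(&mut self, hash: HashId, payload: Payload) {
        self.overlay.insert(hash, payload);
    }

    /// Overlay read: check the branch first, then fall back to global.
    fn read<'a>(&'a self, global: &'a Global, hash: &str) -> Option<&'a Payload> {
        self.overlay.get(hash).or_else(|| global.dag.get(hash))
    }

    /// Merge: commit only the assertions the global DAG does not already hold.
    fn merge(self, global: &mut Global) -> usize {
        let mut committed = 0;
        for (hash, payload) in self.overlay {
            if !global.dag.contains_key(&hash) {
                global.wal.push((hash.clone(), payload.clone()));
                global.dag.insert(hash, payload); // simplification: indexer's job
                committed += 1;
            }
        }
        committed
    }
}
```

Dropping a `Branch` without calling `merge` discards the what-if assertions at no cost to the global ledger, which is what makes branches cheap to create per session.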

### 5.2. TrustRank (Reputation)
A background worker (`episteme-gardener`) runs periodically and:
1. Identifies "Settled Facts" (Assertions that have held >99% consensus for a time window `T`).
2. Rewards Agents who claimed these facts *early*.
3. Punishes Agents who claimed the opposite.
4. Updates the `A:{AgentID}` reputation scores.
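
One way the reward and punishment steps might be scored is sketched below. The spec does not define a formula, so everything numeric here is an assumption: the `REWARD` and `PENALTY` constants, the starting score of 0.5, the linear earliness factor, and the clamp to `[0.0, 1.0]`. `Claim` and `update_reputation` are hypothetical names.

```rust
use std::collections::HashMap;

/// One agent's claim about a fact that has since settled (assumed shape).
struct Claim {
    agent_id: String,
    agreed: bool,   // true if the claim matches the settled value
    timestamp: u64, // when the claim was made
}

/// Hypothetical scoring rule: agreeing agents gain `REWARD` scaled by how
/// early they claimed within [first_seen, settled_at]; disagreeing agents
/// lose a flat `PENALTY`. Scores stay within [0.0, 1.0].
fn update_reputation(
    scores: &mut HashMap<String, f64>,
    claims: &[Claim],
    first_seen: u64,
    settled_at: u64,
) {
    const REWARD: f64 = 0.10;  // assumed value
    const PENALTY: f64 = 0.20; // assumed value
    let window = settled_at.saturating_sub(first_seen).max(1) as f64;
    for c in claims {
        let score = scores.entry(c.agent_id.clone()).or_insert(0.5);
        if c.agreed {
            let age = c.timestamp.saturating_sub(first_seen) as f64;
            let earliness = (1.0 - age / window).clamp(0.0, 1.0);
            *score += REWARD * earliness;
        } else {
            *score -= PENALTY;
        }
        *score = (*score).clamp(0.0, 1.0);
    }
}
```

Under these assumed constants, an agent that claimed the settled value at the very start of a 100-tick window moves from 0.5 to 0.6, one that claimed it halfway moves to 0.55, and one that claimed the opposite drops to 0.3.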

---

## 6. Implementation Roadmap

### Phase 1: The Skeleton (MVP)
* [ ] Reuse `quarantine-journal` pattern for WAL.
* [ ] Implement `Assertion` struct and serialization (`rkyv`).
* [ ] Basic `sled` storage backend.
* [ ] Single Lens: `Recency` (Last writer wins logic).

### Phase 2: The Graph
* [ ] Implement `Subject -> Hash` indexing.
* [ ] Implement `Consensus` Lens (Simple voting).
* [ ] Basic HTTP API (`POST /assert`, `GET /query`).

### Phase 3: The Cortex
* [ ] Branching support (Context/Session IDs).
* [ ] Vector search integration (`lanms` or `hnsw-rs`).
* [ ] TrustRank basics.

---

## 7. Technology Stack

* **Language:** Rust (2024 edition)
* **WAL:** `quarantine-journal` (Local crate or pattern)
* **KV Store:** `sled` (Embedded, pure Rust) or `rocksdb` binding.
* **Serialization:** `rkyv` (Zero-copy deserialization).
* **API:** `axum` + `tower`.
* **Hashing:** `blake3` (Fast, secure).