Episteme (StemeDB) Architecture
Design Philosophy: Immutable History, Probabilistic Resolution.
Status: Draft Spec v0.1
1. System Overview
Episteme is a Log-Structured, Content-Addressed Knowledge Graph. Unlike traditional databases that mutate state in place, Episteme appends Assertions to an immutable ledger (Merkle DAG). State resolution happens at read-time via Lenses.
High-Level Data Flow
[Writer Agent]                [Reader Agent]
      │                             ▲
      │ (1) Sign &                  │ (5) Deterministic Answer
      │     Propose                 │     (Confidence: 0.92)
      ▼                             │
┌────────────┐                ┌────────────┐
│ Ingestion  │                │ Resolution │
│  Gateway   │                │   Engine   │
└─────┬──────┘                └─────┬──────┘
      │ (2) Append                  │ (4) Apply Lens (Filter/Rank)
      │     to WAL                  │
      ▼                             │
┌────────────┐                ┌────────────┐
│ Quarantine │                │  Indexing  │
│  Journal   │───────────────►│  Service   │
└────────────┘      (3)       └────────────┘
 (Durability)                 (Graph/Vector)
2. Core Data Structures
2.1. The Atomic Unit: Assertion
Everything in Episteme is an Assertion. There are no "Tables."
// The immutable payload (content-addressed by hash)
struct Assertion {
    // 1. The Triple (The Fact)
    pub subject: EntityId,         // "Tesla_Inc"
    pub predicate: RelationId,     // "has_revenue"
    pub object: Value,             // Variant: Float(10.5B), String("Musk"), Ref(EntityId)

    // 2. The Lineage (The Chain)
    pub parent_hash: Option<Hash>, // Set when modifying a previous claim (forking)
    pub source_hash: Hash,         // Evidence pointer (PDF/log hash)

    // 3. The Meta-Cognition (The Weight)
    pub agent_id: PublicKey,       // Ed25519 public key of the signing agent
    pub confidence: f32,           // 0.0 - 1.0 (subjective certainty)
    pub timestamp: u64,            // Wall-clock time
    pub vector: Option<Vec<f32>>,  // Semantic embedding (for fuzzy recall)
}
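Content addressing means an Assertion's key is derived from its own contents, so a "modification" is a new Assertion that points back at its parent rather than an in-place update. The following is a minimal sketch under stated assumptions: `MiniAssertion`, `content_hash`, and `fork` are hypothetical names, the fields are simplified to strings, and the standard library's `DefaultHasher` stands in for `blake3` only to keep the example dependency-free (it is not cryptographic).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical, trimmed-down assertion: only the fields needed to show hashing.
#[derive(Hash, Clone, Debug, PartialEq)]
pub struct MiniAssertion {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub parent_hash: Option<u64>,
    pub timestamp: u64,
}

// Stand-in content hash. DefaultHasher is NOT cryptographic; a real
// implementation would hash the serialized bytes with blake3 instead.
pub fn content_hash(a: &MiniAssertion) -> u64 {
    let mut hasher = DefaultHasher::new();
    a.hash(&mut hasher);
    hasher.finish()
}

// Forking never mutates: it builds a new assertion whose parent_hash
// points at the assertion it supersedes.
pub fn fork(parent: &MiniAssertion, new_object: &str, timestamp: u64) -> MiniAssertion {
    MiniAssertion {
        object: new_object.to_string(),
        parent_hash: Some(content_hash(parent)),
        timestamp,
        ..parent.clone()
    }
}
```

Because the parent's hash is part of the child's hashed contents, the chain of forks forms the Merkle DAG described in the overview.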
2.2. The Storage Layout (LSM Tree)
We use a Key-Value store (e.g., sled or RocksDB) to persist the DAG.
| Key | Value | Purpose |
|---|---|---|
| `H:{Hash}` | `Serialized<Assertion>` | Main content store |
| `S:{Subject}` | `List<Hash>` | Subject-to-Claims Index |
| `SP:{Subject}:{Predicate}` | `List<Hash>` | Exact Triple Index |
| `A:{AgentID}` | `ReputationScore` | TrustRank storage |
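The key prefixes in the table can be sketched as small key-builder functions over an ordered map. This is an illustration only: `MemIndex` and its methods are hypothetical, a `BTreeMap` stands in for the `sled` or RocksDB store, and hashes are plain strings.

```rust
use std::collections::BTreeMap;

// Key builders mirroring the prefixes in the table.
pub fn content_key(hash: &str) -> String {
    format!("H:{hash}")
}
pub fn subject_key(subject: &str) -> String {
    format!("S:{subject}")
}
pub fn triple_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}

// Hypothetical in-memory stand-in for the KV store's list-valued indexes.
#[derive(Default)]
pub struct MemIndex {
    lists: BTreeMap<String, Vec<String>>,
}

impl MemIndex {
    // Record a hash under both the subject index and the exact triple index.
    pub fn index(&mut self, subject: &str, predicate: &str, hash: &str) {
        for key in [subject_key(subject), triple_key(subject, predicate)] {
            self.lists.entry(key).or_default().push(hash.to_string());
        }
    }

    // Raw lookup of the hash list stored under a key (empty if absent).
    pub fn get(&self, key: &str) -> &[String] {
        match self.lists.get(key) {
            Some(list) => list.as_slice(),
            None => &[],
        }
    }

    // Candidate hashes for an exact (subject, predicate) lookup.
    pub fn candidates(&self, subject: &str, predicate: &str) -> &[String] {
        self.get(&triple_key(subject, predicate))
    }
}
```

One caveat of this naive encoding: a subject or predicate that itself contains `:` would make keys ambiguous, so a real layout would need escaping or length-prefixed components.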
3. The Write Path (The Spine)
Episteme follows the Quarantine Pattern for durability.
- Receive: Agent submits a signed `Assertion`.
- Verify: Check signature validity and structure.
- Journal: Write to `episteme-wal` (append-only file, immediate fsync).
- Acknowledge: Return `202 Accepted` to the Agent with the new `Hash`.
- Index (Async): A background worker tails the WAL:
  - Deserializes the Assertion.
  - Updates the `H:{Hash}` store.
  - Appends `Hash` to the `S:{Subject}` adjacency list.
  - Updates the HNSW vector index (if a vector is present).
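The steps above can be sketched in memory to show the key property of the Quarantine Pattern: a write is acknowledged as soon as it is journaled, and only becomes queryable once the indexer catches up. Everything here is an assumption for illustration: `Submission`, `Spine`, and their methods are hypothetical, signature verification is reduced to a boolean flag, and a `Vec` stands in for the fsynced journal file.

```rust
use std::collections::HashMap;

// Hypothetical, simplified submission.
#[derive(Clone, Debug)]
pub struct Submission {
    pub hash: String,       // content hash, assumed precomputed by the caller
    pub subject: String,
    pub payload: String,    // stand-in for the serialized Assertion
    pub signature_ok: bool, // stand-in for real Ed25519 verification
}

#[derive(Default)]
pub struct Spine {
    wal: Vec<Submission>,                     // stand-in for the append-only journal
    indexed_upto: usize,                      // how far the indexer has tailed the WAL
    content: HashMap<String, String>,         // H:{Hash}
    by_subject: HashMap<String, Vec<String>>, // S:{Subject}
}

impl Spine {
    // Receive, Verify, Journal, Acknowledge. Nothing is indexed here.
    pub fn submit(&mut self, s: Submission) -> Result<String, &'static str> {
        if !s.signature_ok || s.subject.is_empty() {
            return Err("rejected: invalid signature or structure");
        }
        let hash = s.hash.clone();
        self.wal.push(s); // a real journal would fsync before returning
        Ok(hash) // corresponds to the 202 Accepted response
    }

    // Index (Async): process WAL entries not yet indexed; returns how many.
    pub fn run_indexer(&mut self) -> usize {
        let start = self.indexed_upto;
        for s in &self.wal[start..] {
            self.content.insert(s.hash.clone(), s.payload.clone());
            self.by_subject
                .entry(s.subject.clone())
                .or_default()
                .push(s.hash.clone());
        }
        self.indexed_upto = self.wal.len();
        self.indexed_upto - start
    }

    // Hashes currently visible to readers for a subject.
    pub fn claims_for(&self, subject: &str) -> Vec<String> {
        self.by_subject.get(subject).cloned().unwrap_or_default()
    }
}
```

In this sketch, calling `claims_for` between `submit` and `run_indexer` returns nothing for the new write, which is the eventual-consistency window that the `202 Accepted` status signals to the caller.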
4. The Read Path (The Cortex)
Reading is where Episteme departs from conventional databases: a Read is a Compute Operation, not a lookup of stored state.
Query: GET(Subject="Tesla", Predicate="Revenue", Lens="Consensus")
- Gather: Look up `SP:Tesla:Revenue` to get the list of candidate Hashes: `[H1, H2, H3, H4]`.
- Hydrate: Fetch the full Assertion for each Hash.
- Resolve (The Lens): Pass the candidates through the Lens pipeline.
The Lens Pipeline (Rust Trait)
trait Lens {
    fn resolve(&self, candidates: Vec<Assertion>, context: Context) -> LensResult;
}
// Example: Consensus Lens Logic
// 1. Group candidates by Object value (clustering).
// 2. Sum the TrustRank of Agents in each cluster.
// 3. Return the cluster with highest weighted mass.
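The three commented steps of the Consensus Lens can be sketched as follows. This is an assumption-heavy illustration rather than the spec's trait: `Claim`, `Resolved`, and `ConsensusLens` are hypothetical, the signature takes a slice and a trust map instead of `Vec<Assertion>` and `Context`, object values are plain strings, and reporting the winning cluster's share of total trust as the confidence is a choice made here, not something the spec defines.

```rust
use std::collections::HashMap;

// Hypothetical, simplified candidate: object as a string, agent by name.
#[derive(Clone, Debug)]
pub struct Claim {
    pub object: String,
    pub agent: String,
}

#[derive(Debug, PartialEq)]
pub struct Resolved {
    pub object: String,
    pub confidence: f64, // winning cluster's share of the total trust mass
}

pub trait Lens {
    fn resolve(&self, candidates: &[Claim], trust: &HashMap<String, f64>) -> Option<Resolved>;
}

pub struct ConsensusLens;

impl Lens for ConsensusLens {
    fn resolve(&self, candidates: &[Claim], trust: &HashMap<String, f64>) -> Option<Resolved> {
        // Steps 1 and 2: group by object value and sum each cluster's TrustRank.
        let mut mass: HashMap<&str, f64> = HashMap::new();
        for c in candidates {
            // Unknown agents contribute zero weight (an assumption of this sketch).
            let weight = trust.get(&c.agent).copied().unwrap_or(0.0);
            *mass.entry(c.object.as_str()).or_insert(0.0) += weight;
        }
        let total: f64 = mass.values().sum();
        if total <= 0.0 {
            return None; // no candidates, or no trusted agent among them
        }
        // Step 3: pick the heaviest cluster. Ties go to the lexicographically
        // smaller object so the answer does not depend on map iteration order.
        let (object, weight) = mass
            .into_iter()
            .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| b.0.cmp(a.0)))?;
        Some(Resolved {
            object: object.to_string(),
            confidence: weight / total,
        })
    }
}
```

The explicit tie-break matters because the data-flow diagram promises a deterministic answer, and hash-map iteration order alone would not guarantee one.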
5. Advanced Mechanics
5.1. Forking Reality (Branching)
Branching is handled via Overlay Graphs.
- A `Branch` is simply a lightweight index (Map) of `Hash -> Assertion`.
- Write to Branch: Assertions are stored in the Branch's ephemeral index, not the Global DAG.
- Read from Branch: The Query Engine checks the Branch index first, then falls back to Global (Overlay pattern).
- Merge: Commit the Branch's unique assertions to the Global WAL.
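The overlay behaviour described above can be sketched with two maps. The names `Global` and `Branch` and their methods are hypothetical, assertions are plain strings keyed by string hashes, and the merge writes straight into the global map where the spec would append to the Global WAL.

```rust
use std::collections::HashMap;

// Hypothetical global store: hash -> serialized assertion.
#[derive(Default)]
pub struct Global {
    assertions: HashMap<String, String>,
}

impl Global {
    pub fn insert(&mut self, hash: &str, assertion: &str) {
        self.assertions.insert(hash.to_string(), assertion.to_string());
    }
    pub fn get(&self, hash: &str) -> Option<&str> {
        self.assertions.get(hash).map(String::as_str)
    }
}

// A branch is an ephemeral overlay on top of the global store.
#[derive(Default)]
pub struct Branch {
    overlay: HashMap<String, String>,
}

impl Branch {
    // Write to Branch: stays in the overlay, the global store is untouched.
    pub fn write(&mut self, hash: &str, assertion: &str) {
        self.overlay.insert(hash.to_string(), assertion.to_string());
    }

    // Read from Branch: check the overlay first, then fall back to Global.
    pub fn read<'a>(&'a self, global: &'a Global, hash: &str) -> Option<&'a str> {
        self.overlay
            .get(hash)
            .map(String::as_str)
            .or_else(|| global.get(hash))
    }

    // Merge: commit only assertions Global does not already hold.
    // Consumes the branch and returns the number of assertions committed.
    pub fn merge_into(self, global: &mut Global) -> usize {
        let mut committed = 0;
        for (hash, assertion) in self.overlay {
            if !global.assertions.contains_key(&hash) {
                global.assertions.insert(hash, assertion);
                committed += 1;
            }
        }
        committed
    }
}
```

Discarding a branch in this model is just dropping the overlay map, which is what makes branches cheap to create and abandon.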
5.2. TrustRank (Reputation)
A background worker (`episteme-gardener`) runs periodically:
- Identifies "Settled Facts" (Assertions with >99% consensus over T time).
- Rewards Agents who claimed these facts early.
- Punishes Agents who claimed the opposite.
- Updates `A:{AgentID}` reputation scores.
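One pass of such a worker might look like the sketch below. The spec does not define the update rule, so everything specific here is an assumption: `Vote`, `settled_value`, and `update_reputation` are hypothetical names, the `REWARD` and `PENALTY` constants and the 0.5 starting score are invented for illustration, consensus is measured by raw vote share, and the time window T and the bonus for claiming early are left out.

```rust
use std::collections::HashMap;

// Hypothetical tuning constants; the spec does not specify these values.
const REWARD: f64 = 0.05;
const PENALTY: f64 = 0.10;
const INITIAL_SCORE: f64 = 0.5;

pub struct Vote {
    pub agent: String,
    pub object: String,
}

// Returns the settled object if one value holds more than `threshold` of the
// votes. Assumes threshold > 0.5, so at most one value can qualify.
pub fn settled_value(votes: &[Vote], threshold: f64) -> Option<&str> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for v in votes {
        *counts.entry(v.object.as_str()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .find(|(_, n)| (*n as f64) / (votes.len() as f64) > threshold)
        .map(|(object, _)| object)
}

// Reward agents who backed the settled value and penalize those who did not.
// Scores are clamped to [0.0, 1.0]. Does nothing if the fact is not settled.
pub fn update_reputation(votes: &[Vote], scores: &mut HashMap<String, f64>, threshold: f64) {
    let Some(winner) = settled_value(votes, threshold) else {
        return;
    };
    for v in votes {
        let score = scores.entry(v.agent.clone()).or_insert(INITIAL_SCORE);
        let delta = if v.object == winner { REWARD } else { -PENALTY };
        *score = (*score + delta).clamp(0.0, 1.0);
    }
}
```

Making the penalty larger than the reward is one way to make contradicting settled facts costly; whether that asymmetry is desirable is a tuning question the spec leaves open.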
6. Implementation Roadmap
Phase 1: The Skeleton (MVP)
- Reuse the `quarantine-journal` pattern for the WAL.
- Implement the `Assertion` struct and serialization (`rkyv`).
- Basic `sled` storage backend.
- Single Lens: `Recency` (last-writer-wins logic).
Phase 2: The Graph
- Implement `Subject -> Hash` indexing.
- Implement the `Consensus` Lens (simple voting).
- Basic HTTP API (`POST /assert`, `GET /query`).
Phase 3: The Cortex
- Branching support (Context/Session IDs).
- Vector search integration (`lanms` or `hnsw-rs`).
- TrustRank basics.
7. Technology Stack
- Language: Rust (2024 edition)
- WAL: `quarantine-journal` (local crate or pattern)
- KV Store: `sled` (embedded, pure Rust) or a `rocksdb` binding.
- Serialization: `rkyv` (zero-copy deserialization).
- API: `axum` + `tower`.
- Hashing: `blake3` (fast, secure).