stemedb/architecture.md
jordan a776744889 Initial project setup with Claude Code monorepo structure
2026-01-31 10:56:26 -07:00

Episteme (StemeDB) Architecture

Design Philosophy: Immutable History, Probabilistic Resolution.
Status: Draft Spec v0.1

1. System Overview

Episteme is a Log-Structured, Content-Addressed Knowledge Graph. Unlike traditional databases that mutate state in place, Episteme appends Assertions to an immutable ledger (Merkle DAG). State resolution happens at read-time via Lenses.

High-Level Data Flow

[Writer Agent]      [Reader Agent]
      │                   ▲
      │ (1) Sign &        │ (5) Deterministic Answer
      │     Propose       │     (Confidence: 0.92)
      ▼                   │
┌────────────┐      ┌────────────┐
│  Ingestion │      │ Resolution │
│  Gateway   │      │ Engine     │
└─────┬──────┘      └─────┬──────┘
      │ (2) Append        │ (4) Apply Lens (Filter/Rank)
      │     to WAL        │
      ▼                   │
┌────────────┐      ┌────────────┐
│ Quarantine │      │ Indexing   │
│ Journal    │──────► Service    │
└────────────┘ (3)  └────────────┘
   (Durability)     (Graph/Vector)

2. Core Data Structures

2.1. The Atomic Unit: Assertion

Everything in Episteme is an Assertion. There are no "Tables."

// The immutable payload (Content-Addressed by Hash)
struct Assertion {
    // 1. The Triple (The Fact)
    pub subject: EntityId,       // "Tesla_Inc"
    pub predicate: RelationId,   // "has_revenue"
    pub object: Value,           // Variant: Float(10.5B), String("Musk"), Ref(EntityId)

    // 2. The Lineage (The Chain)
    pub parent_hash: Option<Hash>, // If modifying a previous claim (Forking)
    pub source_hash: Hash,         // Evidence pointer (PDF/Log hash)

    // 3. The Meta-Cognition (The Weight)
    pub agent_id: PublicKey,     // Ed25519 public key of the asserting agent
    pub confidence: f32,         // 0.0 - 1.0 (Subjective certainty)
    pub timestamp: u64,          // Wall clock time
    pub vector: Option<Vec<f32>>,// Semantic embedding (for fuzzy recall)
}
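Content addressing means an Assertion's key is derived from its own bytes, and a fork records the hash of the claim it revises. The following is a minimal sketch of that idea, not the project's implementation. It assumes simplified field types (strings for IDs, `u64` for hashes) and uses the standard library's `DefaultHasher` purely as a stand-in for blake3. `content_hash` and `fork` are hypothetical helper names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified stand-ins for the spec's types (assumptions for this sketch).
type EntityId = String;
type RelationId = String;

#[derive(Clone, Debug, PartialEq)]
enum Value {
    Float(f64),
    Text(String),
    Ref(EntityId),
}

#[derive(Clone, Debug, PartialEq)]
struct Assertion {
    subject: EntityId,
    predicate: RelationId,
    object: Value,
    parent_hash: Option<u64>, // u64 stands in for a 32-byte blake3 digest
    confidence: f32,
    timestamp: u64,
}

/// Derives a content address from every field of the assertion.
/// Floats are hashed via their bit patterns because f32/f64 do not implement `Hash`.
fn content_hash(a: &Assertion) -> u64 {
    let mut h = DefaultHasher::new();
    a.subject.hash(&mut h);
    a.predicate.hash(&mut h);
    match &a.object {
        Value::Float(x) => { 0u8.hash(&mut h); x.to_bits().hash(&mut h); }
        Value::Text(s) => { 1u8.hash(&mut h); s.hash(&mut h); }
        Value::Ref(e) => { 2u8.hash(&mut h); e.hash(&mut h); }
    }
    a.parent_hash.hash(&mut h);
    a.confidence.to_bits().hash(&mut h);
    a.timestamp.hash(&mut h);
    h.finish()
}

/// Builds a fork: a new assertion that points at the claim it revises.
fn fork(parent: &Assertion, new_object: Value, timestamp: u64) -> Assertion {
    Assertion {
        object: new_object,
        parent_hash: Some(content_hash(parent)),
        timestamp,
        ..parent.clone()
    }
}
```

`DefaultHasher` output is not guaranteed to be stable across Rust releases, so it is unsuitable for persisted keys. A real store would hash the serialized bytes with a cryptographic hash such as blake3.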

2.2. The Storage Layout (LSM Tree)

We use a Key-Value store (e.g., sled or RocksDB) to persist the DAG.

| Key | Value | Purpose |
|---|---|---|
| H:{Hash} | Serialized<Assertion> | Main content store |
| S:{Subject} | List<Hash> | Subject-to-Claims Index |
| SP:{Subject}:{Predicate} | List<Hash> | Exact Triple Index |
| A:{AgentID} | ReputationScore | TrustRank storage |
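To make the key layout concrete, here is a small sketch of key builders for the four prefixes, plus an in-memory `BTreeMap` standing in for the index side of the KV store. The function names and the `IndexStore` type are hypothetical. The spec names sled or RocksDB as the real backend.

```rust
use std::collections::BTreeMap;

// Key builders for the four prefixes in the storage layout.
fn content_key(hash_hex: &str) -> String { format!("H:{hash_hex}") }
fn subject_key(subject: &str) -> String { format!("S:{subject}") }
fn triple_key(subject: &str, predicate: &str) -> String { format!("SP:{subject}:{predicate}") }
fn agent_key(agent_id: &str) -> String { format!("A:{agent_id}") }

/// In-memory stand-in for the index portion of the KV store (an assumption).
#[derive(Default)]
struct IndexStore {
    lists: BTreeMap<String, Vec<String>>,
}

impl IndexStore {
    /// Appends `hash_hex` to both the subject index and the exact-triple index.
    fn index(&mut self, subject: &str, predicate: &str, hash_hex: &str) {
        for key in [subject_key(subject), triple_key(subject, predicate)] {
            self.lists.entry(key).or_default().push(hash_hex.to_string());
        }
    }

    /// Returns the hash list stored under `key`, or an empty slice if absent.
    fn get(&self, key: &str) -> &[String] {
        self.lists.get(key).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

With string keys like these, a subject or predicate that itself contains `:` would make `SP:` keys ambiguous, so a real encoding would need escaping or length-prefixed components.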

3. The Write Path (The Spine)

Episteme follows the Quarantine Pattern for durability.

  1. Receive: Agent submits a signed Assertion.
  2. Verify: Check signature validity and structure.
  3. Journal: Write to episteme-wal (Append-only file, fsync immediate).
  4. Acknowledge: Return 202 Accepted to Agent with the new Hash.
  5. Index (Async): A background worker tails the WAL:
    • Deserializes the Assertion.
    • Updates the H:{Hash} store.
    • Appends Hash to the S:{Subject} adjacency list and the SP:{Subject}:{Predicate} index.
    • Updates HNSW vector index (if vector present).

4. The Read Path (The Cortex)

Reading is where Episteme differs most from conventional databases: a Read is a Compute Operation, not a lookup of stored state.

Query: GET(Subject="Tesla_Inc", Predicate="has_revenue", Lens="Consensus")

  1. Gather: Look up SP:Tesla_Inc:has_revenue. Get list of candidate Hashes: [H1, H2, H3, H4].
  2. Hydrate: Fetch full Assertions for each Hash.
  3. Resolve (The Lens): Pass candidates through the Lens pipeline.
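The Gather and Hydrate steps above amount to an index lookup followed by a batch fetch. A minimal sketch, assuming in-memory maps in place of the KV store and `u64` hashes; `Store`, `gather`, and `hydrate` are hypothetical names, and silently skipping hashes that are not yet in the content store is a choice made for this sketch, not something the spec prescribes.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Assertion {
    subject: String,
    predicate: String,
    object: String, // simplified: the spec's Value collapsed to a string
}

#[derive(Default)]
struct Store {
    content: HashMap<u64, Assertion>,   // H:{Hash}
    triples: HashMap<String, Vec<u64>>, // SP:{Subject}:{Predicate}
}

impl Store {
    /// Stores an assertion and records its hash in the exact-triple index.
    fn insert(&mut self, hash: u64, a: Assertion) {
        let key = format!("SP:{}:{}", a.subject, a.predicate);
        self.triples.entry(key).or_default().push(hash);
        self.content.insert(hash, a);
    }

    /// Step 1 (Gather): candidate hashes for an exact subject/predicate pair.
    fn gather(&self, subject: &str, predicate: &str) -> Vec<u64> {
        self.triples
            .get(&format!("SP:{subject}:{predicate}"))
            .cloned()
            .unwrap_or_default()
    }

    /// Step 2 (Hydrate): fetch full assertions, skipping hashes with no stored content.
    fn hydrate(&self, hashes: &[u64]) -> Vec<Assertion> {
        hashes.iter().filter_map(|h| self.content.get(h).cloned()).collect()
    }
}
```

The hydrated `Vec<Assertion>` is what would be handed to a Lens in step 3.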

The Lens Pipeline (Rust Trait)

trait Lens {
    fn resolve(&self, candidates: Vec<Assertion>, context: Context) -> LensResult;
}

// Example: Consensus Lens Logic
// 1. Group candidates by Object value (clustering).
// 2. Sum the TrustRank of Agents in each cluster.
// 3. Return the cluster with highest weighted mass.
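The three commented steps can be turned into a small working Lens. The sketch below is one possible reading, not the project's implementation: `Context` is assumed to carry TrustRank scores keyed by agent, `LensResult` is assumed to hold the winning object and its share of total trust mass, and the Assertion is reduced to the two fields the lens needs.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
struct Assertion {
    object: String,   // simplified: the spec's Value collapsed to a string
    agent_id: String, // simplified: stands in for the agent's public key
}

/// Hypothetical context: TrustRank scores keyed by agent.
struct Context {
    trust: HashMap<String, f64>,
}

/// Hypothetical result shape: winning object plus its share of total trust mass.
#[derive(Debug, PartialEq)]
struct LensResult {
    object: Option<String>,
    confidence: f64,
}

trait Lens {
    fn resolve(&self, candidates: Vec<Assertion>, context: Context) -> LensResult;
}

struct ConsensusLens;

impl Lens for ConsensusLens {
    fn resolve(&self, candidates: Vec<Assertion>, context: Context) -> LensResult {
        // Steps 1 and 2: group by object value and sum the TrustRank per group.
        let mut mass: HashMap<String, f64> = HashMap::new();
        for a in &candidates {
            let weight = context.trust.get(&a.agent_id).copied().unwrap_or(0.0);
            *mass.entry(a.object.clone()).or_insert(0.0) += weight;
        }
        let total: f64 = mass.values().sum();
        // Step 3: pick the heaviest cluster; ties go to the smaller object string
        // so the answer does not depend on HashMap iteration order.
        let best = mass
            .into_iter()
            .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| b.0.cmp(&a.0)));
        match best {
            Some((object, m)) if total > 0.0 => {
                LensResult { object: Some(object), confidence: m / total }
            }
            _ => LensResult { object: None, confidence: 0.0 },
        }
    }
}
```

Two simplifications are worth noting: objects are clustered by exact equality, and an agent that repeats the same claim is counted once per Assertion. A production lens would likely deduplicate per agent and cluster near-equal numeric values.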

5. Advanced Mechanics

5.1. Forking Reality (Branching)

Branching is handled via Overlay Graphs.

  • A Branch is simply a lightweight index (Map) of Hash -> Assertion.
  • Write to Branch: Assertions are stored in the Branch's ephemeral index, not the Global DAG.
  • Read from Branch: The Query Engine checks the Branch index first, then falls back to Global (Overlay pattern).
  • Merge: Commit the Branch's unique assertions to the Global WAL.
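The overlay pattern described above can be sketched with two maps: reads consult the Branch first and fall back to Global, and a merge commits only the Assertions that Global does not already hold. All type and method names here are hypothetical, hashes are `u64` for brevity, and the Global store is updated directly on merge rather than through the asynchronous indexer.

```rust
use std::collections::HashMap;

type HashId = u64; // stand-in for the spec's Hash type

#[derive(Clone, Debug, PartialEq)]
struct Assertion {
    subject: String,
    object: String,
}

#[derive(Default)]
struct Global {
    wal: Vec<(HashId, Assertion)>,
    store: HashMap<HashId, Assertion>,
}

#[derive(Default)]
struct Branch {
    overlay: HashMap<HashId, Assertion>, // ephemeral index, invisible to Global readers
}

impl Branch {
    /// Write to Branch: the assertion stays out of the Global DAG.
    fn write(&mut self, hash: HashId, a: Assertion) {
        self.overlay.insert(hash, a);
    }

    /// Read from Branch: check the overlay first, then fall back to Global.
    fn read<'a>(&'a self, global: &'a Global, hash: HashId) -> Option<&'a Assertion> {
        self.overlay.get(&hash).or_else(|| global.store.get(&hash))
    }

    /// Merge: commit assertions unique to this branch to the Global WAL.
    /// Returns the number of assertions committed.
    fn merge(self, global: &mut Global) -> usize {
        let mut committed = 0;
        for (hash, a) in self.overlay {
            if !global.store.contains_key(&hash) {
                global.wal.push((hash, a.clone()));
                global.store.insert(hash, a);
                committed += 1;
            }
        }
        committed
    }
}
```

Because Assertions are content-addressed, a merge cannot conflict at the storage level: the same hash always denotes the same content, so merging reduces to a set union and any disagreement is left for the Lens to resolve at read time.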

5.2. TrustRank (Reputation)

Background worker (episteme-gardener) runs periodically:

  1. Identifies "Settled Facts" (Assertions that have held >99% consensus for a period T).
  2. Rewards Agents who claimed these facts early.
  3. Punishes Agents who claimed the opposite.
  4. Updates A:{AgentID} reputation scores.
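A single gardener pass over one settled fact might look like the sketch below. The spec does not define the scoring rule, so everything numeric here is an assumption: scores live in [0.0, 1.0], unknown agents start at 0.5, and `GardenerConfig` holds hypothetical reward and penalty sizes. The sketch also ignores how early an agent made its claim, which the spec intends to reward.

```rust
use std::collections::HashMap;

/// One agent's claim about a fact that has since settled.
struct Claim {
    agent_id: String,
    object: String,
}

/// Hypothetical tuning knobs; the spec does not define reward sizes.
struct GardenerConfig {
    reward: f64,
    penalty: f64,
}

/// Rewards agents whose claim matches the settled value and penalizes the rest.
/// Scores are clamped to [0.0, 1.0]; unseen agents start at 0.5 (an assumption).
fn apply_settled_fact(
    scores: &mut HashMap<String, f64>,
    claims: &[Claim],
    settled_object: &str,
    cfg: &GardenerConfig,
) {
    for c in claims {
        let score = scores.entry(c.agent_id.clone()).or_insert(0.5);
        let delta = if c.object == settled_object { cfg.reward } else { -cfg.penalty };
        *score = (*score + delta).clamp(0.0, 1.0);
    }
}
```

Since the Consensus Lens weights votes by these scores and the gardener derives the scores from consensus, the two form a feedback loop, and the reward and penalty sizes would need tuning to keep early majorities from entrenching themselves.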

6. Implementation Roadmap

Phase 1: The Skeleton (MVP)

  • Reuse quarantine-journal pattern for WAL.
  • Implement Assertion struct and serialization (rkyv).
  • Basic sled storage backend.
  • Single Lens: Recency (last-writer-wins logic).
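The Recency lens planned for this phase is the simplest resolution rule: return the candidate with the newest timestamp. A minimal sketch, written as a free function rather than a `Lens` implementation to keep it short; the two-field `Assertion` and the name `resolve_recency` are assumptions.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Assertion {
    object: String,
    timestamp: u64, // wall-clock time, as in the spec's Assertion
}

/// Last-writer-wins: returns the candidate with the greatest timestamp.
/// `max_by_key` returns the last maximal element, so on a timestamp tie the
/// later entry in `candidates` wins.
fn resolve_recency(candidates: &[Assertion]) -> Option<&Assertion> {
    candidates.iter().max_by_key(|a| a.timestamp)
}
```

Because the timestamp is agent-supplied wall-clock time, this rule trusts each writer's clock. A skewed or dishonest clock can win the resolution, which is one reason the later phases add trust-weighted lenses.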

Phase 2: The Graph

  • Implement Subject -> Hash indexing.
  • Implement Consensus Lens (Simple voting).
  • Basic HTTP API (POST /assert, GET /query).

Phase 3: The Cortex

  • Branching support (Context/Session IDs).
  • Vector search integration (lanms or hnsw-rs).
  • TrustRank basics.

7. Technology Stack

  • Language: Rust (2024 edition)
  • WAL: quarantine-journal (Local crate or pattern)
  • KV Store: sled (Embedded, pure Rust) or rocksdb binding.
  • Serialization: rkyv (Zero-copy deserialization).
  • API: axum + tower.
  • Hashing: blake3 (Fast, secure).