stemedb/architecture.md
jordan 422e2d4416 feat(aphoria): wire claims through StemeDB — Gap Closure Phase 1
Claims now flow through StemeDB's append-only knowledge graph instead of
mutable TOML files. This resolves all 6 critical claim-bypass code paths:

- Bridge: lossless AuthoredClaim ↔ Assertion round-trip (comparison, status, lifecycle mapping)
- LocalEpisteme: ingest_authored_claim() and fetch_authored_claims() with AUTHORED_CLAIM predicate index
- EpistemeClaimStore: ClaimStore trait backed by StemeDB (append-only delete via deprecation)
- CLI handlers: all claim commands read/write through StemeDB
- Scanner: loads claims from StemeDB with auto-migration fallback to TOML
- Export: new `aphoria claims export` serializes StemeDB claims to TOML/JSON

Also cleans up dead code (EpistemeConfig.url), renames ingest_claims→ingest_observations,
fixes ClaimFilter.authority_tier type, adds Draft variant to ClaimStatus, and fixes
pre-existing clippy warnings (too_many_arguments, filter_next→rfind).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 02:02:51 -07:00

Episteme (StemeDB) Architecture

Design Philosophy: Immutable History, Probabilistic Resolution, Materialized Speed.

Status: Draft Spec v1.1

1. System Overview

Episteme is a Log-Structured, Content-Addressed Knowledge Graph. Unlike traditional databases that mutate state in place, Episteme appends Assertions to an immutable ledger (Merkle DAG). State resolution happens via Lenses.

Caveat: Aphoria's scan observations flow through this append-only path today. Aphoria's authored claims (AuthoredClaim) do not: they are stored in a mutable TOML file (.aphoria/claims.toml) and bypass the WAL/Merkle DAG entirely. Routing claims through StemeDB as proper Assertions is a planned gap closure.

To avoid paying the O(N) cost of conflict resolution on every read, Episteme employs a Materialized View layer that pre-calculates the "Current Truth" for standard lenses.

High-Level Data Flow

[Writer Agent]      [Reader Agent]
      │                   ▲
      │ (1) Sign &        │ (6) Sub-millisecond Answer
      │     Propose       │     (Pre-computed)
      ▼                   │
┌────────────┐      ┌────────────┐
│  Ingestion │      │ Resolution │
│  Gateway   │      │ Engine     │
└─────┬──────┘      └─────┬──────┘
      │ (2) Append        │ (5) Apply Lens + Trust Pack
      │     to Ballot     │     (BitSet Filter)
      ▼                   │
┌────────────┐      ┌────────────┐
│ Quarantine │      │ Indexing   │
│ Journal    │──────► Service    │
└─────┬──────┘ (3)  └─────┬──────┘
      │                   │ (4) Compaction & Materialization
      ▼                   ▼
┌────────────┐      ┌────────────┐
│ Job Manager│      │Materialized│
└────────────┘      │ Views      │
 (TAN Meter)        └────────────┘

2. Core Data Structures

2.1. The Atomic Unit: Assertion (The Candidate)

Assertions are proposals of truth. They are immutable.

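struct Assertion {
    pub subject: EntityId,       // What the claim is about
    pub predicate: RelationId,   // The relation being asserted
    pub object: Value,           // The asserted value
    pub epoch: Option<EpochId>,  // Paradigm the claim is scoped to, if any
    pub agent_id: PublicKey,     // The Proposer
    pub timestamp: u64,
    // ... lineage and vector fields ...
}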
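
Because Assertions are immutable and the store is content-addressed, the storage key can be derived from the assertion's own fields, which makes appends idempotent. A minimal sketch of that idea follows. The field types are simplified stand-ins (String, u64, [u8; 32]) rather than the real EntityId, Value, or PublicKey definitions, ContentStore is a hypothetical name, and the standard library's DefaultHasher stands in for a collision-resistant hash, which a real Merkle DAG would require.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Simplified stand-in for the spec's Assertion (types are assumptions).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Assertion {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub epoch: Option<u64>,
    pub agent_id: [u8; 32],
    pub timestamp: u64,
}

/// Content address of an assertion.
/// NOTE: DefaultHasher is not cryptographic; it is a placeholder only.
pub fn content_hash(a: &Assertion) -> u64 {
    let mut hasher = DefaultHasher::new();
    a.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical append-only content store: entries are never overwritten.
#[derive(Default)]
pub struct ContentStore {
    entries: BTreeMap<u64, Assertion>,
}

impl ContentStore {
    /// Stores the assertion if it is not already present and returns its address.
    pub fn append(&mut self, a: Assertion) -> u64 {
        let key = content_hash(&a);
        self.entries.entry(key).or_insert(a);
        key
    }

    pub fn get(&self, key: u64) -> Option<&Assertion> {
        self.entries.get(&key)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Appending the same assertion twice yields the same key and a single stored entry; changing any field yields a different key, so a "correction" is always a new entry rather than an in-place edit.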

2.2. The Ballot Box: Vote (The High-Velocity Stream)

To prevent lock contention on Assertions, Agents write Votes to a separate high-velocity log.

struct Vote {
    pub assertion_hash: Hash,    // What are we voting on?
    pub agent_id: PublicKey,     // Who is voting?
    pub weight: f32,             // 0.0 - 1.0 (Confidence)
    pub signature: Signature,    // Cryptographic proof
    pub timestamp: u64,
}
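
To illustrate why a separate vote log avoids touching Assertions, here is a rough sketch of an append-only ballot box with a simple tally. It assumes integer stand-ins for Hash and PublicKey, omits signature verification entirely, and the BallotBox type and its cast/tally methods are hypothetical names, not part of the spec.

```rust
use std::collections::HashMap;

pub type Hash = u64;    // stand-in for a real digest type (assumption)
pub type AgentId = u32; // stand-in for PublicKey (assumption)

#[derive(Clone, Debug)]
pub struct Vote {
    pub assertion_hash: Hash,
    pub agent_id: AgentId,
    pub weight: f32,
    pub timestamp: u64,
}

/// Hypothetical in-memory ballot box: one append-only vote list per assertion.
#[derive(Default)]
pub struct BallotBox {
    votes: HashMap<Hash, Vec<Vote>>,
}

impl BallotBox {
    /// Appends a vote. Weights outside 0.0..=1.0 (including NaN) are rejected.
    pub fn cast(&mut self, vote: Vote) -> Result<(), String> {
        if !(0.0..=1.0).contains(&vote.weight) {
            return Err(format!("weight {} is outside 0.0..=1.0", vote.weight));
        }
        self.votes.entry(vote.assertion_hash).or_default().push(vote);
        Ok(())
    }

    /// Sum of all vote weights recorded for an assertion (0.0 if none).
    pub fn tally(&self, assertion_hash: Hash) -> f32 {
        self.votes
            .get(&assertion_hash)
            .map_or(0.0, |vs| vs.iter().map(|v| v.weight).sum::<f32>())
    }
}
```

Voting only ever pushes onto a list keyed by the assertion's hash, so the assertion record itself is never rewritten or locked.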

2.3. The Trust Pack (The Overlay)

A curated list of trusted agents, used to filter consensus efficiently.

struct TrustPack {
    pub id: PackId,
    pub name: String,
    pub maintainer: PublicKey,
    pub agents: BitSet, // BloomFilter or RoaringBitmap for fast intersection
}
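
The "fast intersection" that the agents field is meant to support can be sketched with a plain word-based bitset. This assumes every agent has been assigned a dense integer index (the spec does not say how PublicKeys map to bits), and AgentSet is a hypothetical stand-in for whichever BitSet type is eventually chosen.

```rust
/// Hypothetical growable bitset over dense agent indices.
#[derive(Clone, Debug, Default)]
pub struct AgentSet {
    words: Vec<u64>,
}

impl AgentSet {
    /// Marks the agent with the given index as a member.
    pub fn insert(&mut self, idx: usize) {
        let word = idx / 64;
        if word >= self.words.len() {
            self.words.resize(word + 1, 0);
        }
        self.words[word] |= 1u64 << (idx % 64);
    }

    /// True if the agent with the given index is a member.
    pub fn contains(&self, idx: usize) -> bool {
        self.words
            .get(idx / 64)
            .map_or(false, |w| (*w >> (idx % 64)) & 1 == 1)
    }

    /// Word-wise AND: the agents present in both sets.
    pub fn intersection(&self, other: &AgentSet) -> AgentSet {
        let words = self
            .words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| *a & *b)
            .collect();
        AgentSet { words }
    }

    /// Number of members.
    pub fn count(&self) -> u32 {
        self.words.iter().map(|w| w.count_ones()).sum()
    }
}
```

Intersecting the set of voters with a pack's agents costs one AND per 64 agents. A Bloom filter would be smaller but admits false positives, so an exact structure such as this one or a Roaring bitmap is the safer fit where an untrusted vote must never be counted.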

2.4. The Storage Layout (LSM Tree)

| Key | Value | Purpose |
|---|---|---|
| H:{Hash} | Assertion | Immutable Content Store |
| V:{Hash} | List<Vote> | The Ballot Box (Append-only) |
| MV:{Subject}:{Predicate} | Assertion | Materialized View (The "Winner") |
| TP:{PackID} | TrustPack | Curation Lists |
| S:{Subject} | List<Hash> | Adjacency Index |
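
The key prefixes above could be built by a small encoder like the following. This is only a sketch: the Key enum is a hypothetical helper, and it assumes hashes are rendered as lowercase hex and that subject, predicate, and pack identifiers never contain ':', neither of which the spec states.

```rust
/// Hypothetical typed view of the key space in the layout table.
pub enum Key<'a> {
    Assertion(&'a [u8]),
    Votes(&'a [u8]),
    MaterializedView { subject: &'a str, predicate: &'a str },
    TrustPack(&'a str),
    Adjacency(&'a str),
}

/// Lowercase hex rendering of a hash (an assumed encoding).
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

impl Key<'_> {
    /// Renders the key with the prefix used in the storage layout.
    pub fn encode(&self) -> String {
        match self {
            Key::Assertion(hash) => format!("H:{}", hex(hash)),
            Key::Votes(hash) => format!("V:{}", hex(hash)),
            Key::MaterializedView { subject, predicate } => {
                format!("MV:{}:{}", subject, predicate)
            }
            Key::TrustPack(pack_id) => format!("TP:{}", pack_id),
            Key::Adjacency(subject) => format!("S:{}", subject),
        }
    }
}
```

Keeping each record type under its own prefix means an ordered store can scan one category (for example, every MV entry for a subject) with a single prefix range.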

3. The Write Path (The Ballot Box)

  1. Ingest: Agents submit Assertions or Votes.
  2. Journal: Written to episteme-wal.
  3. Ballot Box: Votes are appended to the V:{Hash} stream.
  4. Compactor (Async): A background worker aggregates Votes + TrustRank to update the MV key.
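
The four steps above can be sketched in memory as follows. This is an illustration under several assumptions: the WAL step is omitted, TrustRank is ignored so the score is a plain sum of vote weights, compaction is invoked explicitly instead of running in a background worker, and the Store type with its method names is hypothetical.

```rust
use std::collections::HashMap;

pub type Hash = u64; // stand-in for a real digest type (assumption)

#[derive(Default)]
pub struct Store {
    /// (subject, predicate) -> candidate assertion hashes.
    candidates: HashMap<(String, String), Vec<Hash>>,
    /// V:{Hash}: appended vote weights.
    votes: HashMap<Hash, Vec<f32>>,
    /// MV:{Subject}:{Predicate}: the current winner.
    mv: HashMap<(String, String), Hash>,
}

impl Store {
    /// Step 1 for Assertions: register a new candidate (idempotent).
    pub fn ingest_assertion(&mut self, subject: &str, predicate: &str, hash: Hash) {
        let list = self
            .candidates
            .entry((subject.to_string(), predicate.to_string()))
            .or_default();
        if !list.contains(&hash) {
            list.push(hash);
        }
    }

    /// Steps 1 and 3 for Votes: append only, no assertion is touched.
    pub fn ingest_vote(&mut self, hash: Hash, weight: f32) {
        self.votes.entry(hash).or_default().push(weight);
    }

    fn score(&self, hash: Hash) -> f32 {
        self.votes.get(&hash).map_or(0.0, |ws| ws.iter().sum::<f32>())
    }

    /// Step 4: one compaction pass that recomputes every MV entry.
    pub fn compact(&mut self) {
        let mut winners = Vec::new();
        for (key, hashes) in &self.candidates {
            let best = hashes
                .iter()
                .copied()
                .max_by(|a, b| self.score(*a).total_cmp(&self.score(*b)));
            if let Some(hash) = best {
                winners.push((key.clone(), hash));
            }
        }
        for (key, hash) in winners {
            self.mv.insert(key, hash);
        }
    }

    /// Reads the materialized winner, which may lag behind recent votes.
    pub fn winner(&self, subject: &str, predicate: &str) -> Option<Hash> {
        self.mv
            .get(&(subject.to_string(), predicate.to_string()))
            .copied()
    }
}
```

Writers only append, and the MV entry changes only when compact runs, so a reader may briefly observe a stale winner. That staleness is the price of keeping the write path free of read-modify-write cycles.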

4. The Read Path (The Cortex)

Fast Path (Standard Lenses):

  • Query: GET /query?lens=Consensus
  • Action: GET MV:{Subject}:{Predicate}
  • Cost: O(1). Low latency.

Trusted Path (Trust Packs):

  • Query: GET /query?lens=Authority&trust_pack=Science_Pack
  • Action:
    1. Fetch Candidate Assertions.
    2. Fetch Votes.
    3. Filter: Intersect Votes with TrustPack.agents (BitSet operation).
    4. Sum weights of remaining votes.
  • Cost: O(1) if the result is materialized per Trust Pack; otherwise O(M) in the number of votes M, computed at query time.
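
The unmaterialized Trusted Path can be sketched as a single pass over the fetched votes. The sketch assumes integer stand-ins for hashes and agent ids, uses a std HashSet in place of the TrustPack BitSet for brevity, and resolve_authority is a hypothetical function name.

```rust
use std::collections::{HashMap, HashSet};

pub struct Vote {
    pub assertion_hash: u64, // stand-in for Hash (assumption)
    pub agent: u32,          // stand-in for PublicKey (assumption)
    pub weight: f32,
}

/// Returns the candidate with the highest sum of trusted vote weights,
/// together with that sum, or None if there are no candidates.
pub fn resolve_authority(
    candidates: &[u64],
    votes: &[Vote],
    trusted: &HashSet<u32>,
) -> Option<(u64, f32)> {
    // Steps 1 and 2: candidates and votes are passed in already fetched.
    let mut scores: HashMap<u64, f32> = candidates.iter().map(|&c| (c, 0.0)).collect();
    for v in votes {
        // Step 3: drop votes from agents outside the trust pack.
        if !trusted.contains(&v.agent) {
            continue;
        }
        // Step 4: sum the weights of the remaining votes per candidate.
        if let Some(score) = scores.get_mut(&v.assertion_hash) {
            *score += v.weight;
        }
    }
    candidates
        .iter()
        .map(|c| (*c, scores[c]))
        .max_by(|a, b| a.1.total_cmp(&b.1))
}
```

The same votes can produce different winners under different packs, which is why a per-pack materialized view is needed to get this path down to O(1).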

Standard Lenses

  • Consensus: Highest cluster density.
  • Authority: Filter by Trust Pack.
  • Recency: Last Writer Wins.
  • EpochAware: Validates against current paradigm.
  • Constraints: (New) Returns all must_use/forbidden assertions for a context. Acts as a "Pre-Flight Check."
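
Two of the simpler lenses can be sketched directly. Both functions below are hypothetical illustrations: recency implements Last Writer Wins by timestamp, and constraints assumes that must_use/forbidden rules are encoded as predicates on a subject that names the context, which the spec does not confirm.

```rust
// Simplified stand-in for the spec's Assertion (types are assumptions).
#[derive(Clone, Debug, PartialEq)]
pub struct Assertion {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub timestamp: u64,
}

/// Recency lens: the candidate with the latest timestamp wins.
/// On equal timestamps, the later element of the slice is returned.
pub fn recency(candidates: &[Assertion]) -> Option<&Assertion> {
    candidates.iter().max_by_key(|a| a.timestamp)
}

/// Constraints lens: every must_use/forbidden assertion for a context.
/// Assumes the context is the assertion's subject (an assumption).
pub fn constraints<'a>(all: &'a [Assertion], context: &str) -> Vec<&'a Assertion> {
    all.iter()
        .filter(|a| a.subject == context)
        .filter(|a| a.predicate == "must_use" || a.predicate == "forbidden")
        .collect()
}
```

Unlike the other lenses, constraints returns every match instead of a single winner, which is what lets an agent use it as a pre-flight check before acting.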

5. The Meter (Economic Safety)

To prevent infinite loops, the Job Manager enforces Temporal Advantage Normalization (TAN).

  • Budgeting: Every Job must declare a max_cost.
  • Throttling: Forking Reality or Deep Recursion is rejected if current_cost + projected_cost > max_cost.
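
The budgeting rule can be sketched as a small guard that every costly operation must pass through. Cost units are left abstract (a u64 here, which is an assumption), and the Job and MeterError types are hypothetical names; only the inequality comes from the rule above.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum MeterError {
    BudgetExceeded { current: u64, projected: u64, max: u64 },
}

/// Hypothetical metered job with a declared budget.
pub struct Job {
    max_cost: u64,
    current_cost: u64,
}

impl Job {
    /// Every job declares its max_cost up front.
    pub fn new(max_cost: u64) -> Self {
        Job { max_cost, current_cost: 0 }
    }

    /// Admits an operation only if current_cost + projected_cost <= max_cost.
    /// On success the cost is recorded and the remaining budget is returned.
    pub fn charge(&mut self, projected_cost: u64) -> Result<u64, MeterError> {
        // saturating_add keeps a huge projection from wrapping past the check.
        let total = self.current_cost.saturating_add(projected_cost);
        if total > self.max_cost {
            return Err(MeterError::BudgetExceeded {
                current: self.current_cost,
                projected: projected_cost,
                max: self.max_cost,
            });
        }
        self.current_cost = total;
        Ok(self.max_cost - total)
    }
}
```

A rejected charge leaves current_cost unchanged, so a job that is refused a deep recursion can still spend its remaining budget on cheaper work.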

6. The Simulator (Mid-Training Pipeline)

The system continuously exports data to train the next generation of Agents.

  • Negative Samples: High-confidence assertions that were later superseded (Failures).
  • Golden Paths: Branches that successfully merged to Main (Successes).
  • Format: Exported as HuggingFace-compatible datasets for LoRA fine-tuning.

7. Implementation Roadmap

Phase 1: The Spine (Foundation)

  • Reuse quarantine-journal pattern for WAL.
  • Implement Assertion, Epoch, and Vote structs.
  • Basic sled storage backend.

Phase 2: The Lattice (Connectivity)

  • The Ballot Box: Implement separate Vote storage stream.
  • Materializer: Implement background worker to maintain MV keys.
  • Trust Packs: Implement BitSet/BloomFilter logic for agent sets.
  • The Meter: Implement Budget/TAN middleware in Job Manager.
  • Agent Wallet: Sidecar for key management/signing.

Phase 3: The Cortex (Reasoning)

  • SMT Backend & Branching.
  • Vector Search.
  • Constraints Lens: Implement the pre-flight check logic.

Phase 4: The Hive (Learning)

  • The Simulator: Log exporter pipeline.
  • Trust Marketplace: API for publishing/subscribing to Trust Packs.
  • The Super Curator: Implement "Judge" agent with Visual Anchoring.