stemedb/architecture.md
jordan 1ce4004807 feat: Complete Phase 2 (The Cortex) - query, lens, and API layers
This commit adds the read path (Cortex) to complement the write path (Spine):

## Crates
- stemedb-api: HTTP API with axum + utoipa OpenAPI
  - /v1/assert, /v1/query, /v1/epoch, /v1/skeptic, /v1/trace, /v1/audit
  - Metered endpoints with quota enforcement
  - Ed25519 signature verification
- stemedb-lens: Truth resolution lenses
  - RecencyLens, ConsensusLens, ConfidenceLens
  - VoteAwareConsensusLens (Ballot Box pattern)
  - TrustAwareAuthorityLens (The Hive pattern)
  - SkepticLens (conflict analysis)
  - EpochAwareLens (paradigm-safe queries)
- stemedb-query: Query engine with materialized views

## Storage Extensions
- VoteStore: Vote aggregation with cached counts
- TrustRankStore: Agent reputation with decay
- AuditStore: Query audit trail
- IndexStore: SP/P/S index structures
- SupersessionStore: Epoch supersession chains

## SDKs
- sdk/go/steme: Go HTTP client with Ed25519 signing
- sdk/go/adk: ADK-Go tools for AI agents

## Documentation
- Updated CLAUDE.md, architecture.md, roadmap.md
- New ai-lookup entries for all services
- Use case docs for consumer health intelligence
- Arena roadmap for simulation advancement

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 13:22:44 -07:00


Episteme (StemeDB) Architecture

Design Philosophy: Immutable History, Probabilistic Resolution, Materialized Speed.

Status: Draft Spec v1.1

1. System Overview

Episteme is a Log-Structured, Content-Addressed Knowledge Graph. Unlike traditional databases that mutate state in place, Episteme appends Assertions to an immutable ledger (Merkle DAG). Current state is derived rather than overwritten: Lenses resolve competing Assertions into an answer.

Resolving conflicts at read time costs O(N) in the number of competing Assertions and Votes. To keep that cost off the read path, Episteme employs a Materialized View layer that precomputes the "Current Truth" for standard lenses.

High-Level Data Flow

[Writer Agent]      [Reader Agent]
      │                   ▲
      │ (1) Sign &        │ (6) Sub-millisecond Answer
      │     Propose       │     (Pre-computed)
      ▼                   │
┌────────────┐      ┌────────────┐
│  Ingestion │      │ Resolution │
│  Gateway   │      │ Engine     │
└─────┬──────┘      └─────┬──────┘
      │ (2) Append        │ (5) Apply Lens + Trust Pack
      │     to Ballot     │     (BitSet Filter)
      ▼                   │
┌────────────┐      ┌────────────┐
│ Quarantine │      │ Indexing   │
│ Journal    │──────► Service    │
└─────┬──────┘ (3)  └─────┬──────┘
      │                   │ (4) Compaction & Materialization
      ▼                   ▼
┌────────────┐      ┌────────────┐
│ Job Manager│      │Materialized│
└────────────┘      │ Views      │
 (TAN Meter)        └────────────┘

2. Core Data Structures

2.1. The Atomic Unit: Assertion (The Candidate)

Assertions are proposals of truth. They are immutable.

struct Assertion {
    pub subject: EntityId,
    pub predicate: RelationId,
    pub object: Value,
    pub epoch: Option<EpochId>,
    pub agent_id: PublicKey,     // The Proposer
    pub timestamp: u64,
    // ... lineage and vector fields ...
}
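
A minimal sketch of how immutability and content addressing could fit together, under stated assumptions. `SimpleAssertion`, `content_hash`, and `Ledger` are hypothetical names, plain strings stand in for `EntityId`, `RelationId`, and `Value`, and the standard library's `DefaultHasher` stands in for the cryptographic hash a real Merkle DAG would need. DAG links between entries are omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Simplified assertion (hypothetical): strings replace the typed fields.
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
pub struct SimpleAssertion {
    pub subject: String,
    pub predicate: String,
    pub object: String,
}

/// Content address of an assertion.
/// ASSUMPTION: DefaultHasher is not cryptographic; it only keeps the sketch
/// free of external crates.
pub fn content_hash(a: &SimpleAssertion) -> u64 {
    let mut hasher = DefaultHasher::new();
    a.hash(&mut hasher);
    hasher.finish()
}

/// Append-only store: entries are added, never updated or removed.
#[derive(Default)]
pub struct Ledger {
    entries: HashMap<u64, SimpleAssertion>,
    order: Vec<u64>, // content addresses in append order (the log)
}

impl Ledger {
    /// Appends an assertion and returns its content address.
    /// Appending identical content again is a no-op returning the same address.
    pub fn append(&mut self, a: SimpleAssertion) -> u64 {
        let key = content_hash(&a);
        if !self.entries.contains_key(&key) {
            self.entries.insert(key, a);
            self.order.push(key);
        }
        key
    }

    /// Looks up an assertion by content address.
    pub fn get(&self, key: u64) -> Option<&SimpleAssertion> {
        self.entries.get(&key)
    }

    /// Number of distinct assertions appended so far.
    pub fn len(&self) -> usize {
        self.order.len()
    }
}
```

Because the key is derived from the content, there is no update operation to expose: a changed object yields a different address and therefore a new entry.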

2.2. The Ballot Box: Vote (The High-Velocity Stream)

To prevent lock contention on Assertions, Agents write Votes to a separate high-velocity log.

struct Vote {
    pub assertion_hash: Hash,    // What are we voting on?
    pub agent_id: PublicKey,     // Who is voting?
    pub weight: f32,             // 0.0 - 1.0 (Confidence)
    pub signature: Signature,    // Cryptographic proof
    pub timestamp: u64,
}
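
The separation between Assertions and Votes might look like the following sketch. `SimpleVote` and `BallotBox` are hypothetical names, `u64` and `u32` stand in for `Hash` and `PublicKey`, and signature verification is left out. Votes only ever append to a per-assertion stream, so casting a vote never touches the assertion itself.

```rust
use std::collections::HashMap;

/// Simplified vote (hypothetical): u64 replaces Hash, u32 replaces PublicKey.
/// The signature field is omitted from this sketch.
#[derive(Clone, Debug, PartialEq)]
pub struct SimpleVote {
    pub assertion_hash: u64,
    pub agent_id: u32,
    pub weight: f32,
    pub timestamp: u64,
}

/// One append-only vote stream per assertion hash.
#[derive(Default)]
pub struct BallotBox {
    streams: HashMap<u64, Vec<SimpleVote>>,
}

impl BallotBox {
    /// Appends a vote. Weights outside 0.0..=1.0 (including NaN) are rejected.
    pub fn cast(&mut self, vote: SimpleVote) -> Result<(), String> {
        if !(0.0..=1.0).contains(&vote.weight) {
            return Err(format!("weight {} is outside 0.0..=1.0", vote.weight));
        }
        self.streams
            .entry(vote.assertion_hash)
            .or_default()
            .push(vote);
        Ok(())
    }

    /// All votes recorded for an assertion, in arrival order.
    pub fn votes_for(&self, assertion_hash: u64) -> &[SimpleVote] {
        self.streams
            .get(&assertion_hash)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }

    /// Unfiltered sum of vote weights for an assertion.
    pub fn total_weight(&self, assertion_hash: u64) -> f32 {
        self.votes_for(assertion_hash).iter().map(|v| v.weight).sum()
    }
}
```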

2.3. The Trust Pack (The Overlay)

A curated list of trusted agents, used to filter consensus efficiently.

struct TrustPack {
    pub id: PackId,
    pub name: String,
    pub maintainer: PublicKey,
    pub agents: BitSet, // BloomFilter or RoaringBitmap for fast intersection
}
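
To illustrate why a bitset makes the trust filter cheap, here is a small dependency-free sketch. `AgentBitSet` is a hypothetical type, and it assumes each agent's public key has already been mapped to a dense integer index elsewhere. It is an exact set; a Bloom filter would admit false positives, which may or may not be acceptable for trust decisions.

```rust
/// Growable bitset over dense agent indices (hypothetical helper type).
/// ASSUMPTION: public keys are mapped to small integer indices elsewhere.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct AgentBitSet {
    words: Vec<u64>,
}

impl AgentBitSet {
    /// Marks an agent index as a member, growing the storage if needed.
    pub fn insert(&mut self, agent: usize) {
        let (word, bit) = (agent / 64, agent % 64);
        if word >= self.words.len() {
            self.words.resize(word + 1, 0);
        }
        self.words[word] |= 1u64 << bit;
    }

    /// Returns true if the agent index is a member.
    pub fn contains(&self, agent: usize) -> bool {
        let (word, bit) = (agent / 64, agent % 64);
        self.words
            .get(word)
            .map_or(false, |w| (*w & (1u64 << bit)) != 0)
    }

    /// Number of agents present in both sets:
    /// one AND and one popcount per 64 agents.
    pub fn intersection_len(&self, other: &AgentBitSet) -> u32 {
        self.words
            .iter()
            .zip(other.words.iter())
            .map(|(a, b)| (*a & *b).count_ones())
            .sum()
    }
}
```

With a pack containing agents 3 and 70 and a voter set containing 3, 5, 70, and 200, `intersection_len` returns 2.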

2.4. The Storage Layout (LSM Tree)

| Key | Value | Purpose |
|---|---|---|
| H:{Hash} | Assertion | Immutable Content Store |
| V:{Hash} | List<Vote> | The Ballot Box (Append-only) |
| MV:{Subject}:{Predicate} | Assertion | Materialized View (The "Winner") |
| TP:{PackID} | TrustPack | Curation Lists |
| S:{Subject} | List<Hash> | Adjacency Index |
3. The Write Path (The Ballot Box)

  1. Ingest: Agents submit Assertions or Votes.
  2. Journal: Each submission is written to episteme-wal.
  3. Ballot Box: Votes are appended to the V:{Hash} stream; Assertions are stored under H:{Hash}.
  4. Compactor (Async): A background worker aggregates Votes + TrustRank to update the MV key.
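
Step 4 could be sketched as a single compaction pass like the one below. The source says only that Votes and TrustRank are aggregated, so the scoring rule here (vote weight multiplied by the voter's trust rank, summed per assertion) is an assumption, as are the names `Candidate`, `CastVote`, and `compact`. Agents without a trust rank count as 0.0 in this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified inputs to the compactor.
pub struct Candidate {
    pub hash: u64,
    pub subject: String,
    pub predicate: String,
}

pub struct CastVote {
    pub assertion_hash: u64,
    pub agent_id: u32,
    pub weight: f32,
}

/// One compaction pass. Returns the winning assertion hash for each
/// (subject, predicate) pair, i.e. the value to store under the MV key.
/// ASSUMPTION: score = sum over votes of (weight * voter trust rank).
pub fn compact(
    candidates: &[Candidate],
    votes: &[CastVote],
    trust_rank: &HashMap<u32, f32>,
) -> HashMap<(String, String), u64> {
    // Aggregate a score per assertion.
    let mut scores: HashMap<u64, f32> = HashMap::new();
    for v in votes {
        let rank = trust_rank.get(&v.agent_id).copied().unwrap_or(0.0);
        *scores.entry(v.assertion_hash).or_insert(0.0) += v.weight * rank;
    }

    // Keep the best-scoring candidate per key; earlier candidates win ties.
    let mut best: HashMap<(String, String), (u64, f32)> = HashMap::new();
    for c in candidates {
        let score = scores.get(&c.hash).copied().unwrap_or(0.0);
        let key = (c.subject.clone(), c.predicate.clone());
        let current = best.get(&key).map(|&(_, s)| s);
        if current.map_or(true, |s| score > s) {
            best.insert(key, (c.hash, score));
        }
    }

    best.into_iter().map(|(k, (hash, _))| (k, hash)).collect()
}
```

Because the pass runs in the background, a reader only ever sees the last published winner; how stale that may be depends on how often the compactor runs, which the spec does not state.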

4. The Read Path (The Cortex)

Fast Path (Standard Lenses):

  • Query: GET /query?lens=Consensus
  • Action: GET MV:{Subject}:{Predicate}
  • Cost: O(1). A single key lookup; no Votes are read at query time.

Trusted Path (Trust Packs):

  • Query: GET /query?lens=Authority&trust_pack=Science_Pack
  • Action:
    1. Fetch Candidate Assertions.
    2. Fetch Votes.
    3. Filter: Intersect Votes with TrustPack.agents (BitSet operation).
    4. Sum weights of remaining votes.
  • Cost: O(1) if the result is materialized per Pack; otherwise O(M), where M is the number of Votes on the candidate Assertions.
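
The four steps of the Trusted Path can be sketched as follows. `PackVote` and `resolve_with_pack` are hypothetical names, and a `HashSet<u32>` stands in for `TrustPack.agents` to keep the example short. Steps 1 and 2 (fetching) are represented by the two slice arguments.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified vote (hypothetical): u64 replaces Hash, u32 replaces PublicKey.
pub struct PackVote {
    pub assertion_hash: u64,
    pub agent_id: u32,
    pub weight: f32,
}

/// Resolves the winning candidate using only votes from trusted agents.
/// Returns the winning hash and its trusted weight, or None if there are
/// no candidates. Runs in O(M) for M votes.
pub fn resolve_with_pack(
    candidates: &[u64],
    votes: &[PackVote],
    trusted_agents: &HashSet<u32>,
) -> Option<(u64, f32)> {
    // Step 3: drop votes from agents outside the pack.
    // Step 4: sum the weights of the remaining votes per assertion.
    let mut totals: HashMap<u64, f32> = HashMap::new();
    for v in votes.iter().filter(|v| trusted_agents.contains(&v.agent_id)) {
        *totals.entry(v.assertion_hash).or_insert(0.0) += v.weight;
    }

    // Pick the candidate with the highest trusted weight; first one wins ties.
    let mut best: Option<(u64, f32)> = None;
    for &hash in candidates {
        let total = totals.get(&hash).copied().unwrap_or(0.0);
        if best.map_or(true, |(_, t)| total > t) {
            best = Some((hash, total));
        }
    }
    best
}
```

The same votes can produce different winners under different packs, which is why a per-pack materialized view is needed to get back to O(1).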

Standard Lenses

  • Consensus: Highest cluster density.
  • Authority: Filter by Trust Pack.
  • Recency: Last Writer Wins.
  • EpochAware: Validates against current paradigm.
  • Constraints: (New) Returns all must_use/forbidden assertions for a context. Acts as a "Pre-Flight Check."
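
As an illustration of what a lens is, the sketch below models one as a function from competing candidates to a single winner and implements the Recency rule. The `Lens` trait, its `resolve` method, and the `Claim` type are assumptions made for this example; the other lenses would need vote and trust data that this sketch leaves out.

```rust
/// Simplified candidate (hypothetical): only the fields Recency needs.
#[derive(Debug, PartialEq)]
pub struct Claim {
    pub object: String,
    pub timestamp: u64,
}

/// A lens picks at most one winner among competing claims
/// for the same subject and predicate.
pub trait Lens {
    fn resolve<'a>(&self, candidates: &'a [Claim]) -> Option<&'a Claim>;
}

/// Last Writer Wins: the claim with the greatest timestamp.
pub struct RecencyLens;

impl Lens for RecencyLens {
    fn resolve<'a>(&self, candidates: &'a [Claim]) -> Option<&'a Claim> {
        // max_by_key returns the last maximal element, so among equal
        // timestamps the claim that appears later in the slice wins.
        candidates.iter().max_by_key(|c| c.timestamp)
    }
}
```

Keeping lenses behind one interface lets the same stored candidates answer differently depending on which lens the query names.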

5. The Meter (Economic Safety)

To prevent infinite loops, the Job Manager enforces Temporal Advantage Normalization (TAN).

  • Budgeting: Every Job must declare a max_cost.
  • Throttling: Forking Reality and Deep Recursion requests are rejected if current_cost + projected_cost > max_cost.
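
The budget rule above might be enforced with a check like this one. `Job`, `BudgetExceeded`, and `charge` are hypothetical names, and costs are modeled as plain `u64` units because the spec does not say what a cost unit is. The sketch covers only the budget comparison, not how TAN derives `projected_cost`.

```rust
/// A metered job with a declared budget (hypothetical type).
#[derive(Debug)]
pub struct Job {
    pub max_cost: u64,
    pub current_cost: u64,
}

/// Returned when an operation would push a job over its budget.
#[derive(Debug, PartialEq, Eq)]
pub struct BudgetExceeded {
    pub requested: u64,
    pub remaining: u64,
}

impl Job {
    /// Every job must declare its max_cost up front.
    pub fn new(max_cost: u64) -> Self {
        Job { max_cost, current_cost: 0 }
    }

    /// Charges `projected_cost`, or rejects the operation when
    /// current_cost + projected_cost > max_cost.
    /// Comparing against the remaining budget avoids integer overflow.
    pub fn charge(&mut self, projected_cost: u64) -> Result<(), BudgetExceeded> {
        let remaining = self.max_cost.saturating_sub(self.current_cost);
        if projected_cost > remaining {
            return Err(BudgetExceeded { requested: projected_cost, remaining });
        }
        self.current_cost += projected_cost;
        Ok(())
    }
}
```

A rejected charge leaves `current_cost` unchanged, so a job can still spend its remaining budget on a cheaper operation.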

6. The Simulator (Mid-Training Pipeline)

The system continuously exports data to train the next generation of Agents.

  • Negative Samples: High-confidence assertions that were later superseded (Failures).
  • Golden Paths: Branches that successfully merged to Main (Successes).
  • Format: Exported as Hugging Face-compatible datasets for LoRA fine-tuning.

7. Implementation Roadmap

Phase 1: The Spine (Foundation)

  • Reuse quarantine-journal pattern for WAL.
  • Implement Assertion, Epoch, and Vote structs.
  • Basic sled storage backend.

Phase 2: The Lattice (Connectivity)

  • The Ballot Box: Implement separate Vote storage stream.
  • Materializer: Implement background worker to maintain MV keys.
  • Trust Packs: Implement BitSet/BloomFilter logic for agent sets.
  • The Meter: Implement Budget/TAN middleware in Job Manager.
  • Agent Wallet: Sidecar for key management/signing.

Phase 3: The Cortex (Reasoning)

  • SMT Backend & Branching.
  • Vector Search.
  • Constraints Lens: Implement the pre-flight check logic.

Phase 4: The Hive (Learning)

  • The Simulator: Log exporter pipeline.
  • Trust Marketplace: API for publishing/subscribing to Trust Packs.
  • The Super Curator: Implement "Judge" agent with Visual Anchoring.