Episteme (StemeDB) Architecture
Design Philosophy: Immutable History, Probabilistic Resolution, Materialized Speed. Status: Implementation v1.0
1. System Overview
Episteme is a Log-Structured, Content-Addressed Knowledge Graph. Unlike traditional databases that mutate state in place, Episteme appends Assertions to an immutable ledger (Merkle DAG). State resolution happens via Lenses.
Caveat: Aphoria's scan observations flow through this append-only path today. Aphoria's authored claims (`AuthoredClaim`) do not: they are stored in a mutable TOML file (`.aphoria/claims.toml`) and bypass the WAL/Merkle DAG entirely. Routing claims through StemeDB as proper Assertions is a planned gap closure.
To solve the O(N) read latency of conflict resolution, Episteme employs a Materialized View layer that pre-calculates the "Current Truth" for standard lenses.
High-Level Data Flow
[Writer Agent]                  [Reader Agent]
      │                               ▲
      │ (1) Sign &                    │ (6) Sub-millisecond Answer
      │     Propose                   │     (Pre-computed)
      ▼                               │
┌────────────┐                  ┌────────────┐
│ Ingestion  │                  │ Resolution │
│ Gateway    │                  │ Engine     │
└─────┬──────┘                  └─────┬──────┘
      │ (2) Append                    │ (5) Apply Lens + Trust Pack
      │     to Ballot                 │     (BitSet Filter)
      ▼                               │
┌────────────┐                  ┌────────────┐
│ Quarantine │                  │ Indexing   │
│ Journal    │─────────────────►│ Service    │
└─────┬──────┘       (3)        └─────┬──────┘
      │                               │ (4) Compaction & Materialization
      ▼                               ▼
┌────────────┐                  ┌─────────────┐
│ Job Manager│                  │ Materialized│
└────────────┘                  │ Views       │
 (TAN Meter)                    └─────────────┘
2. Core Data Structures
2.1. The Atomic Unit: Assertion (The Candidate)
Assertions are proposals of truth. They are immutable.
struct Assertion {
    pub subject: EntityId,
    pub predicate: RelationId,
    pub object: Value,
    pub epoch: Option<EpochId>,
    pub agent_id: PublicKey, // The Proposer
    pub timestamp: u64,
    // ... lineage and vector fields ...
}
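Because Assertions are immutable and the store is content-addressed, the storage key can be derived from the assertion's own content. The following is a minimal sketch of that idea, not the project's implementation: `SketchAssertion`, `ContentStore`, and `content_hash` are hypothetical names, the field types are simplified to strings, and 64-bit FNV-1a stands in for whatever cryptographic hash and canonical encoding the real system uses.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `Assertion` (field types are assumptions).
#[derive(Clone, Debug, PartialEq)]
struct SketchAssertion {
    subject: String,
    predicate: String,
    object: String,
    agent_id: String,
    timestamp: u64,
}

/// 64-bit FNV-1a, used only as a deterministic placeholder.
/// A real content store would use a cryptographic hash.
fn fnv1a(bytes: &[u8], mut state: u64) -> u64 {
    for b in bytes {
        state ^= *b as u64;
        state = state.wrapping_mul(0x0000_0100_0000_01b3);
    }
    state
}

/// Hash every field, length-prefixed so ("ab", "c") and ("a", "bc") differ.
fn content_hash(a: &SketchAssertion) -> u64 {
    let mut h = 0xcbf2_9ce4_8422_2325_u64; // FNV offset basis
    for field in [&a.subject, &a.predicate, &a.object, &a.agent_id] {
        h = fnv1a(&(field.len() as u64).to_le_bytes(), h);
        h = fnv1a(field.as_bytes(), h);
    }
    fnv1a(&a.timestamp.to_le_bytes(), h)
}

/// Append-only store: entries are inserted under their hash and never mutated.
#[derive(Default)]
struct ContentStore {
    by_hash: HashMap<u64, SketchAssertion>,
}

impl ContentStore {
    /// Returns the content hash; re-inserting identical content is a no-op.
    fn put(&mut self, a: SketchAssertion) -> u64 {
        let h = content_hash(&a);
        self.by_hash.entry(h).or_insert(a);
        h
    }

    fn get(&self, h: u64) -> Option<&SketchAssertion> {
        self.by_hash.get(&h)
    }
}
```

Submitting the same assertion twice yields the same key and a single stored entry, which is what makes the ledger safe to replay.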
2.2. The Ballot Box: Vote (The High-Velocity Stream)
To prevent lock contention on Assertions, Agents write Votes to a separate high-velocity log.
struct Vote {
    pub assertion_hash: Hash, // What are we voting on?
    pub agent_id: PublicKey,  // Who is voting?
    pub weight: f32,          // 0.0 - 1.0 (Confidence)
    pub signature: Signature, // Cryptographic proof
    pub timestamp: u64,
}
2.3. The Trust Pack (The Overlay)
A curated list of trusted agents, used to filter consensus efficiently.
struct TrustPack {
    pub id: PackId,
    pub name: String,
    pub maintainer: PublicKey,
    pub agents: BitSet, // BloomFilter or RoaringBitmap for fast intersection
}
2.4. The Storage Layout (Hybrid Store)
Episteme uses a Hybrid Storage architecture to balance write throughput and read latency:
- Fjall (LSM-Tree): Used for write-heavy, append-only data (Assertions, Votes, WAL).
- Redb (B-Tree): Used for read-heavy, random-access data (Indexes, Materialized Views).
| Key | Value | Purpose | Backend |
|---|---|---|---|
| `H:{Hash}` | `Assertion` | Immutable Content Store | Fjall |
| `V:{Hash}` | `List<Vote>` | The Ballot Box (Append-only) | Fjall |
| `MV:{Subject}:{Predicate}` | `Assertion` | Materialized View (The "Winner") | Redb |
| `TP:{PackID}` | `TrustPack` | Curation Lists | Redb |
| `S:{Subject}` | `List<Hash>` | Adjacency Index | Redb |
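The key layout above can be read as a small routing table: each key prefix implies a backend. The sketch below illustrates that mapping under stated assumptions. The `Key` and `Backend` types are hypothetical, hashes and ids are shown as plain strings, and no Fjall or Redb API is called, since the real byte encodings and storage calls are not specified here.

```rust
/// Storage backends named in the layout table.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend {
    Fjall,
    Redb,
}

/// Logical keys from the layout table (string ids are an assumption).
enum Key<'a> {
    Assertion { hash: &'a str },
    Votes { hash: &'a str },
    MaterializedView { subject: &'a str, predicate: &'a str },
    TrustPack { pack_id: &'a str },
    Adjacency { subject: &'a str },
}

impl Key<'_> {
    /// Render the key using the prefixes from the table.
    fn encode(&self) -> String {
        match self {
            Key::Assertion { hash } => format!("H:{}", hash),
            Key::Votes { hash } => format!("V:{}", hash),
            Key::MaterializedView { subject, predicate } => {
                format!("MV:{}:{}", subject, predicate)
            }
            Key::TrustPack { pack_id } => format!("TP:{}", pack_id),
            Key::Adjacency { subject } => format!("S:{}", subject),
        }
    }

    /// Append-only data goes to the LSM-tree; read-heavy data to the B-tree.
    fn backend(&self) -> Backend {
        match self {
            Key::Assertion { .. } | Key::Votes { .. } => Backend::Fjall,
            Key::MaterializedView { .. }
            | Key::TrustPack { .. }
            | Key::Adjacency { .. } => Backend::Redb,
        }
    }
}
```

With string ids, a subject that itself contains `:` would make the `MV:` key ambiguous, so a real encoding would need fixed-width or escaped components.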
3. The Write Path (The Ballot Box)
- Ingest: Agents submit `Assertions` or `Votes`.
- Journal: Written to `episteme-wal` (Quarantine Pattern).
- Ballot Box: Votes are appended to the `V:{Hash}` stream.
- Compactor (Async): A background worker aggregates Votes + TrustRank to update the `MV` key.
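The compactor step can be sketched as a pure function from candidates and votes to a materialized view. This is an illustration only: the document does not say how Votes and TrustRank are combined, so the sketch assumes a vote's contribution is `weight * trust_rank`, that agents without a rank contribute nothing, and that ties keep the earlier candidate. `Candidate`, `compact`, and the integer hashes and agent ids are hypothetical.

```rust
use std::collections::HashMap;

/// A candidate assertion, reduced to what the compactor needs.
struct Candidate {
    hash: u64,
    subject: String,
    predicate: String,
}

/// A vote, reduced to what the compactor needs.
struct Vote {
    assertion_hash: u64,
    agent: u32,
    weight: f32,
}

/// Returns the winning assertion hash per (subject, predicate).
fn compact(
    candidates: &[Candidate],
    votes: &[Vote],
    trust_rank: &HashMap<u32, f32>,
) -> HashMap<(String, String), u64> {
    // 1. Score each assertion: sum of weight * trust rank (assumed formula).
    let mut score: HashMap<u64, f32> = HashMap::new();
    for v in votes {
        let rank = trust_rank.get(&v.agent).copied().unwrap_or(0.0);
        *score.entry(v.assertion_hash).or_insert(0.0) += v.weight * rank;
    }

    // 2. Keep the best-scoring candidate for each (subject, predicate).
    let mut best: HashMap<(String, String), (u64, f32)> = HashMap::new();
    for c in candidates {
        let cand_score = score.get(&c.hash).copied().unwrap_or(0.0);
        let key = (c.subject.clone(), c.predicate.clone());
        let current = best.get(&key).map(|&(_, s)| s);
        // Strictly greater: on a tie the earlier candidate stays.
        if current.map_or(true, |cur| cand_score > cur) {
            best.insert(key, (c.hash, cand_score));
        }
    }

    best.into_iter().map(|(k, (h, _))| (k, h)).collect()
}
```

The returned map plays the role of the `MV` keys: once it is written, a read for a given subject and predicate is a single lookup rather than a pass over all votes.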
4. The Read Path (The Cortex)
Fast Path (Standard Lenses):
- Query: `GET /query?lens=Consensus`
- Action: `GET MV:{Subject}:{Predicate}`
- Cost: O(1). Low latency.
Trusted Path (Trust Packs):
- Query: `GET /query?lens=Authority&trust_pack=Science_Pack`
- Action:
  - Fetch Candidate Assertions.
  - Fetch Votes.
  - Filter: Intersect Votes with `TrustPack.agents` (BitSet operation).
  - Sum weights of remaining votes.
- Cost: O(1) (if Materialized per Pack) or O(M) (Fast calculation).
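The filter-and-sum step of the trusted path can be sketched as follows. The sketch assumes each agent maps to a dense integer index so that pack membership fits in a bitset. The document does not describe that mapping, and the `BitSet` here is a hand-rolled stand-in, not a RoaringBitmap or any library type. `trusted_weight` and `agent_index` are hypothetical names.

```rust
/// Minimal fixed-capacity bitset over dense agent indices.
struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    fn with_capacity(bits: usize) -> Self {
        BitSet { words: vec![0; (bits + 63) / 64] }
    }

    /// Panics if `i` is beyond the capacity given at construction.
    fn insert(&mut self, i: usize) {
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    /// Out-of-range indices are simply reported as absent.
    fn contains(&self, i: usize) -> bool {
        self.words
            .get(i / 64)
            .map_or(false, |w| ((*w >> (i % 64)) & 1) == 1)
    }
}

struct Vote {
    assertion_hash: u64,
    agent_index: usize, // assumed dense index for the voting agent
    weight: f32,
}

/// Sum the weights of votes on one assertion cast by trusted agents only.
fn trusted_weight(votes: &[Vote], assertion_hash: u64, trusted: &BitSet) -> f32 {
    votes
        .iter()
        .filter(|v| v.assertion_hash == assertion_hash)
        .filter(|v| trusted.contains(v.agent_index))
        .map(|v| v.weight)
        .sum()
}
```

Testing each vote against the bitset is equivalent to intersecting the set of voters with `TrustPack.agents`; the cost is one pass over the votes with a constant-time membership check per vote.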
Standard Lenses (Implemented)
- Consensus: Highest cluster density (Vote-aware).
- Authority: Filter by Trust Pack and TrustRank.
- Recency: Last Writer Wins (Hybrid Logical Clock).
- EpochAware: Validates against current paradigm.
- Skeptic: Surfaces conflicts and divergence.
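To make the lens idea concrete, here is a minimal sketch of two of the lenses applied to the same candidate set. It is not the `stemedb-lens` API: the `Lens`, `Candidate`, and `Resolution` types are hypothetical, Recency is simplified to a plain integer timestamp rather than a Hybrid Logical Clock, and the Skeptic behaviour (report a conflict whenever candidates disagree on the object) is an assumed reading of "surfaces conflicts".

```rust
use std::collections::BTreeSet;

#[derive(Clone, Debug, PartialEq)]
struct Candidate {
    object: String,
    timestamp: u64, // stand-in for a Hybrid Logical Clock value
}

#[derive(Debug, PartialEq)]
enum Resolution {
    Winner(Candidate),
    /// Distinct competing objects, in sorted order.
    Conflict(Vec<String>),
    Empty,
}

enum Lens {
    Recency,
    Skeptic,
}

impl Lens {
    fn resolve(&self, candidates: &[Candidate]) -> Resolution {
        match self {
            // Last writer wins: pick the candidate with the largest timestamp.
            Lens::Recency => match candidates.iter().max_by_key(|c| c.timestamp) {
                Some(c) => Resolution::Winner(c.clone()),
                None => Resolution::Empty,
            },
            // Report disagreement instead of choosing a side.
            Lens::Skeptic => {
                let distinct: BTreeSet<&str> =
                    candidates.iter().map(|c| c.object.as_str()).collect();
                match distinct.len() {
                    0 => Resolution::Empty,
                    1 => Resolution::Winner(candidates[0].clone()),
                    _ => Resolution::Conflict(
                        distinct.into_iter().map(String::from).collect(),
                    ),
                }
            }
        }
    }
}
```

The same immutable candidates yield different answers depending on the lens, which is the point of resolving at read time rather than mutating state on write.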
5. The Meter (Economic Safety)
To prevent infinite loops, the Job Manager enforces Temporal Advantage Normalization (TAN).
- Budgeting: Every Job must declare a `max_cost`.
- Throttling: Forking Reality or Deep Recursion is rejected if `current_cost + projected_cost > max_cost`.
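The throttling rule is a single comparison, sketched below. `JobBudget`, `try_charge`, and `MeterError` are hypothetical names, and costs are modelled as abstract `u64` units because the document does not define the unit of cost.

```rust
#[derive(Debug, PartialEq)]
enum MeterError {
    BudgetExceeded { needed: u64, max_cost: u64 },
}

struct JobBudget {
    max_cost: u64,
    current_cost: u64,
}

impl JobBudget {
    fn new(max_cost: u64) -> Self {
        JobBudget { max_cost, current_cost: 0 }
    }

    /// Charges `projected_cost` if it fits and returns the remaining budget.
    /// Otherwise rejects the operation and leaves the budget unchanged.
    fn try_charge(&mut self, projected_cost: u64) -> Result<u64, MeterError> {
        // saturating_add avoids wrap-around on very large projections.
        let needed = self.current_cost.saturating_add(projected_cost);
        if needed > self.max_cost {
            return Err(MeterError::BudgetExceeded {
                needed,
                max_cost: self.max_cost,
            });
        }
        self.current_cost = needed;
        Ok(self.max_cost - needed)
    }
}
```

A job that spends exactly its `max_cost` is still accepted; only a charge that would push the total past the limit is rejected, matching the strict `>` in the rule above.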
6. The Simulator (Mid-Training Pipeline)
The system continuously exports data to train the next generation of Agents.
- Negative Samples: High-confidence assertions that were later superseded (Failures).
- Golden Paths: Branches that successfully merged to Main (Successes).
- Format: Exported as HuggingFace-compatible datasets for LoRA fine-tuning.
7. Implementation Roadmap
Phase 1: The Spine (Foundation)
- Reuse `quarantine-journal` pattern for WAL (`stemedb-wal`).
- Implement `Assertion`, `Epoch`, and `Vote` structs (`stemedb-core`).
- Hybrid Storage backend (`stemedb-storage`).
Phase 2: The Lattice (Connectivity)
- The Ballot Box: Separate Vote storage stream.
- Materializer: Background worker to maintain `MV` keys.
- Trust Packs: Agent sets for filtering.
- The Meter: Implement Budget/TAN middleware in Job Manager.
- Agent Wallet: Sidecar for key management/signing.
Phase 3: The Cortex (Reasoning)
- Lenses: `Recency`, `Consensus`, `Authority`, `Skeptic` implemented (`stemedb-lens`).
- SMT Backend & Branching.
- Vector Search.
- Lens: Constraints: Implement the pre-flight check logic.
Phase 4: The Hive (Learning)
- The Simulator: Log exporter pipeline.
- Trust Marketplace: API for publishing/subscribing to Trust Packs.
- The Super Curator: Implement "Judge" agent with Visual Anchoring.