# Episteme (StemeDB) Architecture

> **Design Philosophy:** Immutable History, Probabilistic Resolution, Materialized Speed.
> **Status:** Implementation v1.0

## 1. System Overview

Episteme is a **Log-Structured, Content-Addressed Knowledge Graph**. Unlike traditional databases that mutate state in place, Episteme appends **Assertions** to an immutable ledger (Merkle DAG). State is resolved through **Lenses**.

> **Caveat:** Aphoria's scan observations flow through this append-only path today. Aphoria's authored claims (`AuthoredClaim`) do not: they are stored in a mutable TOML file (`.aphoria/claims.toml`) and bypass the WAL/Merkle DAG entirely. Routing claims through StemeDB as proper Assertions is a planned gap closure.

To avoid paying the O(N) cost of conflict resolution on every read, Episteme employs a **Materialized View** layer that pre-calculates the "Current Truth" for standard lenses.

### High-Level Data Flow

```ascii
[Writer Agent]       [Reader Agent]
      │                    ▲
      │ (1) Sign &         │ (6) Sub-millisecond Answer
      │     Propose        │     (Pre-computed)
      ▼                    │
┌────────────┐       ┌────────────┐
│ Ingestion  │       │ Resolution │
│  Gateway   │       │   Engine   │
└─────┬──────┘       └─────┬──────┘
      │ (2) Append         │ (5) Apply Lens + Trust Pack
      │     to Ballot      │     (BitSet Filter)
      ▼                    │
┌────────────┐       ┌────────────┐
│ Quarantine │       │  Indexing  │
│  Journal   │──────►│  Service   │
└─────┬──────┘  (3)  └─────┬──────┘
      │                    │ (4) Compaction & Materialization
      ▼                    ▼
┌────────────┐       ┌────────────┐
│ Job Manager│       │Materialized│
└────────────┘       │   Views    │
 (TAN Meter)         └────────────┘
```

---

## 2. Core Data Structures

### 2.1. The Atomic Unit: `Assertion` (The Candidate)
Assertions are proposals of truth. They are immutable.

```rust
struct Assertion {
    pub subject: EntityId,
    pub predicate: RelationId,
    pub object: Value,
    pub epoch: Option<EpochId>,
    pub agent_id: PublicKey, // The Proposer
    pub timestamp: u64,
    // ... lineage and vector fields ...
}
```

### 2.2. The Ballot Box: `Vote` (The High-Velocity Stream)
To prevent lock contention on Assertions, Agents write **Votes** to a separate high-velocity log.

```rust
struct Vote {
    pub assertion_hash: Hash, // What are we voting on?
    pub agent_id: PublicKey,  // Who is voting?
    pub weight: f32,          // 0.0 - 1.0 (Confidence)
    pub signature: Signature, // Cryptographic proof
    pub timestamp: u64,
}
```

### 2.3. The Trust Pack (The Overlay)
A curated list of trusted agents, used to filter consensus efficiently.

```rust
struct TrustPack {
    pub id: PackId,
    pub name: String,
    pub maintainer: PublicKey,
    pub agents: BitSet, // BloomFilter or RoaringBitmap for fast intersection
}
```

### 2.4. The Storage Layout (Hybrid Store)

Episteme uses a **Hybrid Storage** architecture to balance write throughput and read latency:
* **Fjall (LSM-Tree):** Used for write-heavy, append-only data (Assertions, Votes, WAL).
* **Redb (B-Tree):** Used for read-heavy, random-access data (Indexes, Materialized Views).

| Key | Value | Purpose | Backend |
| :--- | :--- | :--- | :--- |
| `H:{Hash}` | `Assertion` | Immutable Content Store | Fjall |
| `V:{Hash}` | `List<Vote>` | The Ballot Box (Append-only) | Fjall |
| `MV:{Subject}:{Predicate}` | `Assertion` | **Materialized View** (The "Winner") | Redb |
| `TP:{PackID}` | `TrustPack` | Curation Lists | Redb |
| `S:{Subject}` | `List<Hash>` | Adjacency Index | Redb |

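
As a rough illustration of the key layout in the table above, the sketch below formats prefixed keys and routes them to a backend by prefix. The `Backend` enum and every function name here are hypothetical, and keys are modeled as plain strings; the real crates may well encode keys as bytes.

```rust
/// Which store a key belongs to, mirroring the "Backend" column of the table.
#[derive(Debug, PartialEq)]
enum Backend {
    Fjall, // LSM-tree: write-heavy, append-only data
    Redb,  // B-tree: read-heavy, random-access data
}

/// Hypothetical helper: key of an Assertion in the content store.
fn assertion_key(hash_hex: &str) -> String {
    format!("H:{}", hash_hex)
}

/// Hypothetical helper: key of the vote stream for one Assertion.
fn votes_key(hash_hex: &str) -> String {
    format!("V:{}", hash_hex)
}

/// Hypothetical helper: key of the pre-computed winner for a slot.
fn materialized_view_key(subject: &str, predicate: &str) -> String {
    format!("MV:{}:{}", subject, predicate)
}

/// Route a key to its backend by inspecting the prefix before the first ':'.
/// Unknown prefixes yield `None`.
fn backend_for(key: &str) -> Option<Backend> {
    match key.split(':').next()? {
        "H" | "V" => Some(Backend::Fjall),
        "MV" | "TP" | "S" => Some(Backend::Redb),
        _ => None,
    }
}
```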
---

## 3. The Write Path (The Ballot Box)

1. **Ingest:** Agents submit `Assertions` or `Votes`.
2. **Journal:** Each submission is written to `episteme-wal` (Quarantine Pattern).
3. **Ballot Box:** Votes are appended to the `V:{Hash}` stream, separate from the Assertions stored under `H:{Hash}`.
4. **Compactor (Async):** A background worker aggregates Votes + TrustRank to update the `MV` key.

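
The Compactor step can be pictured as a fold over the ballot box. The following is a minimal in-memory sketch, not the actual worker: it assumes TrustRank is a per-agent `f32` multiplier, that a candidate's score is the sum of `weight * trust_rank`, and that `HashMap` stands in for the Fjall and Redb stores. The name `materialize`, the type aliases, and the trimmed-down `Vote` are all hypothetical.

```rust
use std::collections::HashMap;

type Hash = u64;    // stand-in for a content hash
type AgentId = u32; // stand-in for a PublicKey

/// Trimmed-down vote: only the fields this sketch needs.
struct Vote {
    agent_id: AgentId,
    weight: f32, // 0.0 - 1.0
}

/// Pick the winning assertion hash for one (subject, predicate) slot.
/// Score = sum of weight * trust_rank(agent); agents without a rank count as 0.
fn materialize(
    candidates: &[Hash],
    ballot_box: &HashMap<Hash, Vec<Vote>>,
    trust_rank: &HashMap<AgentId, f32>,
) -> Option<Hash> {
    let mut best: Option<(Hash, f32)> = None;
    for &hash in candidates {
        // A candidate with no recorded votes scores 0.0.
        let score = ballot_box.get(&hash).map_or(0.0, |votes| {
            votes
                .iter()
                .map(|v| v.weight * trust_rank.get(&v.agent_id).copied().unwrap_or(0.0))
                .sum::<f32>()
        });
        // Strictly-greater comparison keeps the earliest candidate on ties.
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((hash, score));
        }
    }
    best.map(|(hash, _)| hash)
}
```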
---

## 4. The Read Path (The Cortex)

**Fast Path (Standard Lenses):**
* Query: `GET /query?lens=Consensus`
* Action: `GET MV:{Subject}:{Predicate}`
* Cost: **O(1)** key lookup against the Materialized View.

**Trusted Path (Trust Packs):**
* Query: `GET /query?lens=Authority&trust_pack=Science_Pack`
* Action:
    1. Fetch Candidate Assertions.
    2. Fetch Votes.
    3. **Filter:** Intersect Votes with `TrustPack.agents` (BitSet operation).
    4. Sum weights of remaining votes.
* Cost: **O(1)** if the view is materialized per pack; otherwise **O(M)**, where M is the number of votes scanned.

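
Steps 3 and 4 of the Trusted Path can be sketched with a small hand-rolled bitset. This is only an illustration under stated assumptions: each agent's `PublicKey` is presumed to have been mapped to a dense integer index elsewhere, and the `BitSet` below is a toy stand-in rather than the bitmap type the codebase uses. The function name `trusted_score` is hypothetical.

```rust
/// Toy fixed-size bitset over dense agent indices.
struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    /// Allocate room for `bits` membership flags, all initially unset.
    fn with_capacity(bits: usize) -> Self {
        BitSet { words: vec![0; (bits + 63) / 64] }
    }

    /// Mark an agent as trusted. Panics if `idx` is beyond the capacity.
    fn insert(&mut self, idx: usize) {
        self.words[idx / 64] |= 1u64 << (idx % 64);
    }

    /// Membership test; indices beyond the capacity are simply not members.
    fn contains(&self, idx: usize) -> bool {
        let mask = 1u64 << (idx % 64);
        self.words.get(idx / 64).map_or(false, |w| (*w & mask) != 0)
    }
}

/// Trimmed-down vote keyed by a dense agent index instead of a PublicKey.
struct Vote {
    agent_idx: usize,
    weight: f32, // 0.0 - 1.0
}

/// Keep only votes cast by agents in the pack, then sum their weights.
fn trusted_score(votes: &[Vote], pack_agents: &BitSet) -> f32 {
    votes
        .iter()
        .filter(|v| pack_agents.contains(v.agent_idx))
        .map(|v| v.weight)
        .sum()
}
```

In this sketch each membership test is a constant-time word lookup, so scoring one candidate is linear in the number of its votes.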
### Standard Lenses (Implemented)
* **Consensus:** Highest cluster density (Vote-aware).
* **Authority:** Filter by **Trust Pack** and **TrustRank**.
* **Recency:** Last Writer Wins (Hybrid Logical Clock).
* **EpochAware:** Validates against current paradigm.
* **Skeptic:** Surfaces conflicts and divergence.

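
To make the Recency lens concrete, here is a minimal Last Writer Wins sketch. It assumes a Hybrid Logical Clock stamp modeled as a wall-clock component plus a logical counter; the `Hlc` and `Candidate` types and the `recency_lens` function are hypothetical, and the `Assertion` struct in this document exposes only a plain `timestamp: u64`.

```rust
/// Hybrid Logical Clock stamp. Field order matters: the derived `Ord`
/// compares `wall_ms` first and falls back to `counter` on equal ticks.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Hlc {
    wall_ms: u64,
    counter: u32,
}

/// A competing value for one (subject, predicate) slot.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    object: String, // stand-in for `Value`
    stamp: Hlc,
}

/// Last Writer Wins: return the candidate with the greatest stamp,
/// or `None` when there are no candidates.
fn recency_lens(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().max_by_key(|c| c.stamp)
}
```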
---

## 5. The Meter (Economic Safety)

To prevent infinite loops, the Job Manager enforces **Temporal Advantage Normalization (TAN)**.
* **Budgeting:** Every Job must declare a `max_cost`.
* **Throttling:** Forking Reality or Deep Recursion is rejected if `current_cost + projected_cost > max_cost`.

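
The throttling rule above reduces to a single comparison. The sketch below shows one way it could look, assuming costs are abstract `u64` units; the `Meter` and `MeterError` types and the `charge` method are hypothetical and not taken from the Job Manager.

```rust
#[derive(Debug, PartialEq)]
enum MeterError {
    /// The operation would push the Job past its declared budget.
    BudgetExceeded { needed: u64, max_cost: u64 },
}

/// Tracks the spend of one Job against its declared `max_cost`.
struct Meter {
    max_cost: u64,
    current_cost: u64,
}

impl Meter {
    fn new(max_cost: u64) -> Self {
        Meter { max_cost, current_cost: 0 }
    }

    /// Reject the operation if `current_cost + projected_cost > max_cost`;
    /// otherwise record the spend. `saturating_add` keeps a huge projected
    /// cost from wrapping around and slipping under the budget.
    fn charge(&mut self, projected_cost: u64) -> Result<(), MeterError> {
        let needed = self.current_cost.saturating_add(projected_cost);
        if needed > self.max_cost {
            return Err(MeterError::BudgetExceeded { needed, max_cost: self.max_cost });
        }
        self.current_cost = needed;
        Ok(())
    }
}
```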
---

## 6. The Simulator (Mid-Training Pipeline)

The system continuously exports data to train the next generation of Agents.
* **Negative Samples:** High-confidence assertions that were later superseded (Failures).
* **Golden Paths:** Branches that successfully merged to Main (Successes).
* **Format:** Exported as HuggingFace-compatible datasets for LoRA fine-tuning.

---

## 7. Implementation Roadmap

### Phase 1: The Spine (Foundation)
* [x] Reuse `quarantine-journal` pattern for WAL (`stemedb-wal`).
* [x] Implement `Assertion`, `Epoch`, and **`Vote`** structs (`stemedb-core`).
* [x] Hybrid Storage backend (`stemedb-storage`).

### Phase 2: The Lattice (Connectivity)
* [x] **The Ballot Box**: Separate Vote storage stream.
* [x] **Materializer**: Background worker to maintain `MV` keys.
* [x] **Trust Packs**: Agent sets for filtering.
* [ ] **The Meter**: Implement Budget/TAN middleware in Job Manager.
* [ ] **Agent Wallet**: Sidecar for key management/signing.

### Phase 3: The Cortex (Reasoning)
* [x] **Lenses**: `Recency`, `Consensus`, `Authority`, `Skeptic` implemented (`stemedb-lens`).
* [ ] SMT Backend & Branching.
* [ ] Vector Search.
* [ ] **Lens: Constraints**: Implement the pre-flight check logic.

### Phase 4: The Hive (Learning)
* [ ] **The Simulator**: Log exporter pipeline.
* [ ] **Trust Marketplace**: API for publishing/subscribing to Trust Packs.
* [ ] **The Super Curator**: Implement "Judge" agent with Visual Anchoring.