stemedb/architecture.md
jordan 422e2d4416 feat(aphoria): wire claims through StemeDB — Gap Closure Phase 1
Claims now flow through StemeDB's append-only knowledge graph instead of
mutable TOML files. This resolves all 6 critical claim-bypass code paths:

- Bridge: lossless AuthoredClaim ↔ Assertion round-trip (comparison, status, lifecycle mapping)
- LocalEpisteme: ingest_authored_claim() and fetch_authored_claims() with AUTHORED_CLAIM predicate index
- EpistemeClaimStore: ClaimStore trait backed by StemeDB (append-only delete via deprecation)
- CLI handlers: all claim commands read/write through StemeDB
- Scanner: loads claims from StemeDB with auto-migration fallback to TOML
- Export: new `aphoria claims export` serializes StemeDB claims to TOML/JSON

Also cleans up dead code (EpistemeConfig.url), renames ingest_claims→ingest_observations,
fixes ClaimFilter.authority_tier type, adds Draft variant to ClaimStatus, and fixes
pre-existing clippy warnings (too_many_arguments, filter_next→rfind).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 02:02:51 -07:00


# Episteme (StemeDB) Architecture
> **Design Philosophy:** Immutable History, Probabilistic Resolution, Materialized Speed.
>
> **Status:** Draft Spec v1.1
## 1. System Overview
Episteme is a **Log-Structured, Content-Addressed Knowledge Graph**. Unlike traditional databases that mutate state in place, Episteme appends **Assertions** to an immutable ledger (Merkle DAG). State resolution happens via **Lenses**.

> **Caveat:** Aphoria's scan observations flow through this append-only path today. Aphoria's authored claims (`AuthoredClaim`) do not: they are stored in a mutable TOML file (`.aphoria/claims.toml`) and bypass the WAL/Merkle DAG entirely. Routing claims through StemeDB as proper Assertions is a planned gap closure.

To solve the O(N) read latency of conflict resolution, Episteme employs a **Materialized View** layer that pre-calculates the "Current Truth" for standard lenses.
### High-Level Data Flow
```ascii
[Writer Agent]                     [Reader Agent]
      │                                  ▲
      │ (1) Sign &                       │ (6) Sub-millisecond Answer
      │     Propose                      │     (Pre-computed)
      ▼                                  │
┌────────────┐                     ┌────────────┐
│ Ingestion  │                     │ Resolution │
│  Gateway   │                     │   Engine   │
└─────┬──────┘                     └─────┬──────┘
      │ (2) Append                       │ (5) Apply Lens + Trust Pack
      │     to Ballot                    │     (BitSet Filter)
      ▼                                  │
┌────────────┐                     ┌────────────┐
│ Quarantine │                     │  Indexing  │
│  Journal   │────────────────────►│  Service   │
└─────┬──────┘         (3)         └─────┬──────┘
      │                                  │ (4) Compaction & Materialization
      ▼                                  ▼
┌────────────┐                     ┌──────────────┐
│ Job Manager│                     │ Materialized │
└────────────┘                     │    Views     │
 (TAN Meter)                       └──────────────┘
```
---
## 2. Core Data Structures
### 2.1. The Atomic Unit: `Assertion` (The Candidate)
Assertions are proposals of truth. They are immutable.
```rust
struct Assertion {
    pub subject: EntityId,
    pub predicate: RelationId,
    pub object: Value,
    pub epoch: Option<EpochId>,
    pub agent_id: PublicKey, // The Proposer
    pub timestamp: u64,
    // ... lineage and vector fields ...
}
```
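
As a rough illustration of what "content-addressed" means for an immutable `Assertion`, the sketch below derives a key from the fields themselves, so identical content always maps to the same hash and any change yields a new one. Everything here is an assumption for illustration: the fields are simplified to strings, the length-prefixed encoding is hypothetical, and FNV-1a is only a stand-in for whatever cryptographic hash the real system uses.

```rust
/// FNV-1a (64-bit): a simple NON-cryptographic hash, used here purely as a
/// stand-in. A real content store would use a cryptographic hash instead.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x100000001b3); // FNV prime
    }
    h
}

/// Simplified stand-in for the spec's `Assertion` (string fields are an assumption).
pub struct Assertion {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub timestamp: u64,
}

impl Assertion {
    /// Hashes a length-prefixed encoding of the fields, so that
    /// ("ab", "c") and ("a", "bc") do not produce the same byte stream.
    pub fn content_hash(&self) -> u64 {
        let mut buf = Vec::new();
        for field in [&self.subject, &self.predicate, &self.object] {
            buf.extend_from_slice(&(field.len() as u64).to_le_bytes());
            buf.extend_from_slice(field.as_bytes());
        }
        buf.extend_from_slice(&self.timestamp.to_le_bytes());
        fnv1a(&buf)
    }
}
```

Because the key is derived from the content, "editing" an assertion is not possible in this model; a changed field produces a different hash and therefore a new entry.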
### 2.2. The Ballot Box: `Vote` (The High-Velocity Stream)
To prevent lock contention on Assertions, Agents write **Votes** to a separate high-velocity log.
```rust
struct Vote {
    pub assertion_hash: Hash, // What are we voting on?
    pub agent_id: PublicKey,  // Who is voting?
    pub weight: f32,          // 0.0 - 1.0 (Confidence)
    pub signature: Signature, // Cryptographic proof
    pub timestamp: u64,
}
```
### 2.3. The Trust Pack (The Overlay)
A curated list of trusted agents, used to filter consensus efficiently.
```rust
struct TrustPack {
    pub id: PackId,
    pub name: String,
    pub maintainer: PublicKey,
    pub agents: BitSet, // BloomFilter or RoaringBitmap for fast intersection
}
```
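
To make the "fast intersection" idea concrete, here is a minimal bitset sketch. It assumes each agent has already been mapped to a dense integer index (that mapping is not specified in this document), and `AgentBitSet` is a hypothetical name, not the project's `BitSet` type. A production system would more likely use a compressed bitmap library.

```rust
/// Hypothetical fixed-word bitset; bit `i` is set when agent index `i` is a member.
#[derive(Default)]
pub struct AgentBitSet {
    words: Vec<u64>,
}

impl AgentBitSet {
    /// Marks the agent with dense index `idx` as a member, growing storage as needed.
    pub fn insert(&mut self, idx: usize) {
        let (word, bit) = (idx / 64, idx % 64);
        if word >= self.words.len() {
            self.words.resize(word + 1, 0);
        }
        self.words[word] |= 1u64 << bit;
    }

    /// Returns true if `idx` is a member; out-of-range indices are simply absent.
    pub fn contains(&self, idx: usize) -> bool {
        let (word, bit) = (idx / 64, idx % 64);
        self.words
            .get(word)
            .map_or(false, |w| (*w & (1u64 << bit)) != 0)
    }

    /// Counts members present in both sets using one AND + popcount per word.
    pub fn intersection_count(&self, other: &Self) -> u32 {
        self.words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| (a & b).count_ones())
            .sum()
    }
}
```

The intersection touches 64 agents per machine word, which is the property that makes a bitmap attractive for filtering votes against a pack.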
### 2.4. The Storage Layout (LSM Tree)
| Key | Value | Purpose |
| :--- | :--- | :--- |
| `H:{Hash}` | `Assertion` | Immutable Content Store |
| `V:{Hash}` | `List<Vote>` | The Ballot Box (Append-only) |
| `MV:{Subject}:{Predicate}` | `Assertion` | **Materialized View** (The "Winner") |
| `TP:{PackID}` | `TrustPack` | Curation Lists |
| `S:{Subject}` | `List<Hash>` | Adjacency Index |
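
The key prefixes in the table can be captured as small builder functions. This is a sketch only: the function names are hypothetical, and encoding keys as plain strings is an assumption (a real backend might use binary keys, and would need to escape any `:` inside a subject or predicate).

```rust
/// `H:{Hash}` -> immutable content store entry.
pub fn content_key(hash_hex: &str) -> String {
    format!("H:{hash_hex}")
}

/// `V:{Hash}` -> append-only list of votes for one assertion.
pub fn votes_key(hash_hex: &str) -> String {
    format!("V:{hash_hex}")
}

/// `MV:{Subject}:{Predicate}` -> the current materialized "winner".
pub fn materialized_key(subject: &str, predicate: &str) -> String {
    format!("MV:{subject}:{predicate}")
}

/// `TP:{PackID}` -> a trust pack.
pub fn trust_pack_key(pack_id: &str) -> String {
    format!("TP:{pack_id}")
}

/// `S:{Subject}` -> adjacency index (hashes of assertions about a subject).
pub fn adjacency_key(subject: &str) -> String {
    format!("S:{subject}")
}
```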
---
## 3. The Write Path (The Ballot Box)
1. **Ingest:** Agents submit `Assertions` or `Votes`.
2. **Journal:** Written to `episteme-wal`.
3. **Ballot Box:** Votes are appended to the `V:{Hash}` stream.
4. **Compactor (Async):** A background worker aggregates Votes + TrustRank to update the `MV` key.
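
The compaction step (4) could be sketched roughly as follows. The scoring rule is an assumption, since this document does not define how Votes and TrustRank are combined: here each vote's weight is multiplied by a per-agent rank, agents without a rank get a neutral 1.0, and the highest total wins. `pick_winner` and the simplified `Vote` are hypothetical names for illustration.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the spec's `Hash` and `PublicKey` (assumptions).
type Hash = u64;
type AgentId = u32;

#[derive(Debug, Clone, PartialEq)]
pub struct Vote {
    pub assertion_hash: Hash,
    pub agent_id: AgentId,
    pub weight: f32,
}

/// Sums `weight * trust_rank[agent]` per candidate and returns the hash with
/// the highest score. The result is what a compactor would write to the `MV` key.
pub fn pick_winner(votes: &[Vote], trust_rank: &HashMap<AgentId, f32>) -> Option<Hash> {
    let mut scores: HashMap<Hash, f32> = HashMap::new();
    for v in votes {
        // Unknown agents get a neutral multiplier (an assumption).
        let rank = trust_rank.get(&v.agent_id).copied().unwrap_or(1.0);
        *scores.entry(v.assertion_hash).or_insert(0.0) += v.weight * rank;
    }
    scores
        .into_iter()
        // Ties go to the smaller hash so the outcome is deterministic.
        .max_by(|a, b| a.1.total_cmp(&b.1).then(b.0.cmp(&a.0)))
        .map(|(hash, _)| hash)
}
```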
---
## 4. The Read Path (The Cortex)
**Fast Path (Standard Lenses):**

* Query: `GET /query?lens=Consensus`
* Action: `GET MV:{Subject}:{Predicate}`
* Cost: **O(1)**, a single key lookup against the materialized view.

**Trusted Path (Trust Packs):**

* Query: `GET /query?lens=Authority&trust_pack=Science_Pack`
* Action:
    1. Fetch Candidate Assertions.
    2. Fetch Votes.
    3. **Filter:** Intersect Votes with `TrustPack.agents` (BitSet operation).
    4. Sum weights of remaining votes.
* Cost: **O(1)** (if materialized per pack) or **O(M)** otherwise, where M is the number of votes scanned.
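
The filter-and-sum steps of the Trusted Path might look like the sketch below. It is a simplification under stated assumptions: a `HashSet` stands in for the `BitSet`, agent identifiers are plain integers rather than public keys, and `trusted_weight` is a hypothetical helper name.

```rust
use std::collections::HashSet;

// Stand-in for `PublicKey` (assumption for illustration).
type AgentId = u32;

pub struct Vote {
    pub agent_id: AgentId,
    pub weight: f32,
}

pub struct TrustPack {
    /// A `HashSet` replaces the spec's `BitSet` to keep the example short.
    pub agents: HashSet<AgentId>,
}

/// Keeps only votes cast by agents in the pack, then sums their weights.
/// This is the O(M) path: one membership check per vote.
pub fn trusted_weight(votes: &[Vote], pack: &TrustPack) -> f32 {
    votes
        .iter()
        .filter(|v| pack.agents.contains(&v.agent_id))
        .map(|v| v.weight)
        .sum()
}
```

Running this once per candidate assertion and comparing the totals gives the pack-scoped winner; materializing that result per pack is what would bring the read back to O(1).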
### Standard Lenses
* **Consensus:** Highest cluster density.
* **Authority:** Filter by **Trust Pack**.
* **Recency:** Last Writer Wins.
* **EpochAware:** Validates against current paradigm.
* **Constraints:** (New) Returns all `must_use`/`forbidden` assertions for a context. Acts as a "Pre-Flight Check."
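
Of the lenses above, Recency is the simplest to illustrate. The sketch below assumes "Last Writer Wins" means "largest `timestamp`", with the larger hash as an arbitrary but deterministic tie-break; `Candidate` and `resolve_recency` are hypothetical names, not part of the spec.

```rust
/// Minimal stand-in for an assertion competing for one (subject, predicate) slot.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub hash: u64,
    pub timestamp: u64,
}

/// Recency lens: the candidate with the latest timestamp wins.
/// Equal timestamps are resolved by the larger hash (an assumption).
pub fn resolve_recency(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().max_by_key(|c| (c.timestamp, c.hash))
}
```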
---
## 5. The Meter (Economic Safety)
To prevent infinite loops, the Job Manager enforces **Temporal Advantage Normalization (TAN)**.
* **Budgeting:** Every Job must declare a `max_cost`.
* **Throttling:** Forking Reality or Deep Recursion is rejected if `current_cost + projected_cost > max_cost`.
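
The throttling rule above reduces to a single comparison before any work is admitted. A minimal sketch, assuming costs are unsigned integer units and using hypothetical `Job` and `MeterError` types:

```rust
#[derive(Debug, PartialEq)]
pub enum MeterError {
    /// The operation would push the job past its declared budget.
    BudgetExceeded { needed: u64, max_cost: u64 },
}

pub struct Job {
    pub max_cost: u64,
    pub current_cost: u64,
}

impl Job {
    /// Admits an operation only if `current_cost + projected_cost <= max_cost`.
    /// On rejection the job's accumulated cost is left unchanged.
    pub fn charge(&mut self, projected_cost: u64) -> Result<(), MeterError> {
        // Saturating add avoids overflow on adversarially large projections.
        let needed = self.current_cost.saturating_add(projected_cost);
        if needed > self.max_cost {
            return Err(MeterError::BudgetExceeded {
                needed,
                max_cost: self.max_cost,
            });
        }
        self.current_cost = needed;
        Ok(())
    }
}
```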
---
## 6. The Simulator (Mid-Training Pipeline)
The system continuously exports data to train the next generation of Agents.
* **Negative Samples:** High-confidence assertions that were later superseded (Failures).
* **Golden Paths:** Branches that successfully merged to Main (Successes).
* **Format:** Exported as HuggingFace-compatible datasets for LoRA fine-tuning.
---
## 7. Implementation Roadmap
### Phase 1: The Spine (Foundation)
* [ ] Reuse `quarantine-journal` pattern for WAL.
* [ ] Implement `Assertion`, `Epoch`, and **`Vote`** structs.
* [ ] Basic `sled` storage backend.
### Phase 2: The Lattice (Connectivity)
* [ ] **The Ballot Box**: Implement separate Vote storage stream.
* [ ] **Materializer**: Implement background worker to maintain `MV` keys.
* [ ] **Trust Packs**: Implement BitSet/BloomFilter logic for agent sets.
* [ ] **The Meter**: Implement Budget/TAN middleware in Job Manager.
* [ ] **Agent Wallet**: Sidecar for key management/signing.
### Phase 3: The Cortex (Reasoning)
* [ ] SMT Backend & Branching.
* [ ] Vector Search.
* [ ] **Lens: Constraints**: Implement the pre-flight check logic.
### Phase 4: The Hive (Learning)
* [ ] **The Simulator**: Log exporter pipeline.
* [ ] **Trust Marketplace**: API for publishing/subscribing to Trust Packs.
* [ ] **The Super Curator**: Implement "Judge" agent with Visual Anchoring.