# Episteme (StemeDB) Architecture

> **Design Philosophy:** Immutable History, Probabilistic Resolution, Materialized Speed.
> **Status:** Draft Spec v1.1

## 1. System Overview

Episteme is a **Log-Structured, Content-Addressed Knowledge Graph**. Unlike traditional databases that mutate state in place, Episteme appends **Assertions** to an immutable ledger (Merkle DAG). State resolution happens via **Lenses**.

To solve the O(N) read latency of conflict resolution, Episteme employs a **Materialized View** layer that pre-calculates the "Current Truth" for standard lenses.

### High-Level Data Flow

```ascii
[Writer Agent]                  [Reader Agent]
      │                               ▲
      │ (1) Sign &                    │ (6) Sub-millisecond Answer
      │     Propose                   │     (Pre-computed)
      ▼                               │
┌────────────┐                  ┌────────────┐
│ Ingestion  │                  │ Resolution │
│  Gateway   │                  │   Engine   │
└─────┬──────┘                  └─────┬──────┘
      │ (2) Append                    │ (5) Apply Lens + Trust Pack
      │     to Ballot                 │     (BitSet Filter)
      ▼                               │
┌────────────┐                  ┌────────────┐
│ Quarantine │                  │  Indexing  │
│  Journal   │─────────────────►│  Service   │
└─────┬──────┘       (3)        └─────┬──────┘
      │                               │ (4) Compaction & Materialization
      ▼                               ▼
┌────────────┐                  ┌─────────────┐
│ Job Manager│                  │ Materialized│
└────────────┘                  │    Views    │
 (TAN Meter)                    └─────────────┘
```

---

## 2. Core Data Structures

### 2.1. The Atomic Unit: `Assertion` (The Candidate)
Assertions are proposals of truth. They are immutable.

```rust
struct Assertion {
    pub subject: EntityId,
    pub predicate: RelationId,
    pub object: Value,
    pub epoch: Option<EpochId>,
    pub agent_id: PublicKey, // The Proposer
    pub timestamp: u64,
    // ... lineage and vector fields ...
}
```
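
Because assertions are immutable and content-addressed, their storage key can be derived from their fields. The following is a minimal sketch of that idea, not the project's hashing scheme: the field types are simplified to strings and integers, `content_address` is a hypothetical helper, and `DefaultHasher` is a non-cryptographic 64-bit placeholder for whatever cryptographic hash the real Merkle DAG uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Simplified stand-in for the `Assertion` struct above (types are assumptions).
#[derive(Clone, Hash)]
struct Assertion {
    subject: String,
    predicate: String,
    object: String,
    epoch: Option<u64>,
    agent_id: [u8; 32],
    timestamp: u64,
}

/// Derives an address from the assertion's content.
/// Placeholder only: a real system would use a cryptographic hash
/// over a canonical serialization, not `DefaultHasher`.
fn content_address(a: &Assertion) -> u64 {
    let mut hasher = DefaultHasher::new();
    a.hash(&mut hasher);
    hasher.finish()
}
```

Identical content yields the same address, so re-submitting an assertion is idempotent, while any change to a field produces a new entry rather than an in-place update.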

### 2.2. The Ballot Box: `Vote` (The High-Velocity Stream)
To prevent lock contention on Assertions, Agents write **Votes** to a separate high-velocity log.

```rust
struct Vote {
    pub assertion_hash: Hash, // What are we voting on?
    pub agent_id: PublicKey,  // Who is voting?
    pub weight: f32,          // 0.0 - 1.0 (Confidence)
    pub signature: Signature, // Cryptographic proof
    pub timestamp: u64,
}
```
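
A minimal in-memory sketch of an append-only vote log follows. It assumes 32-byte hashes and public keys; the `BallotBox` type and its methods are hypothetical, and signature verification is left out.

```rust
use std::collections::HashMap;

// Illustrative stand-ins; the real types are not specified in this document.
type Hash = [u8; 32];
type PublicKey = [u8; 32];

#[derive(Debug, Clone, PartialEq)]
struct Vote {
    assertion_hash: Hash,
    agent_id: PublicKey,
    weight: f32, // 0.0 - 1.0 (Confidence)
    timestamp: u64,
    // `signature` omitted: verification is out of scope for this sketch.
}

/// Hypothetical ballot box: one append-only vote log per assertion hash.
#[derive(Default)]
struct BallotBox {
    streams: HashMap<Hash, Vec<Vote>>,
}

impl BallotBox {
    /// Appends a vote. Weights outside 0.0..=1.0 (including NaN) are rejected.
    /// Earlier votes and the assertion itself are never modified.
    fn append(&mut self, vote: Vote) -> Result<(), String> {
        if !(0.0..=1.0).contains(&vote.weight) {
            return Err(format!("weight {} out of range", vote.weight));
        }
        self.streams.entry(vote.assertion_hash).or_default().push(vote);
        Ok(())
    }

    /// Returns the votes recorded for an assertion, oldest first.
    fn votes_for(&self, hash: &Hash) -> &[Vote] {
        self.streams.get(hash).map(Vec::as_slice).unwrap_or(&[])
    }
}
```

Voting only ever pushes onto a per-assertion log, so writers never need to touch the assertion record.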

### 2.3. The Trust Pack (The Overlay)
A curated list of trusted agents, used to filter consensus efficiently.

```rust
struct TrustPack {
    pub id: PackId,
    pub name: String,
    pub maintainer: PublicKey,
    pub agents: BitSet, // BloomFilter or RoaringBitmap for fast intersection
}
```
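
To illustrate why a bitset makes the agent filter cheap, here is a small hand-rolled `BitSet` over dense agent indices. It is a sketch standing in for a RoaringBitmap, and it assumes each `PublicKey` has already been mapped to a small integer index elsewhere; that mapping is not described in this document.

```rust
/// Growable bitset over dense agent indices (illustrative only).
#[derive(Debug, Clone, Default)]
struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    /// Marks `index` as a member, growing the backing storage if needed.
    fn insert(&mut self, index: usize) {
        let word = index / 64;
        if word >= self.words.len() {
            self.words.resize(word + 1, 0);
        }
        self.words[word] |= 1u64 << (index % 64);
    }

    /// Returns true if `index` is a member.
    fn contains(&self, index: usize) -> bool {
        self.words
            .get(index / 64)
            .map_or(false, |w| w & (1u64 << (index % 64)) != 0)
    }

    /// Word-wise AND: the agents present in both sets.
    fn intersection(&self, other: &BitSet) -> BitSet {
        let words = self.words.iter().zip(&other.words).map(|(a, b)| a & b).collect();
        BitSet { words }
    }

    /// Number of members.
    fn len(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}
```

Intersecting the set of voters with `TrustPack.agents` then costs one AND per 64 agents. A Bloom filter would trade this exactness for space, since it can report false positives.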

### 2.4. The Storage Layout (LSM Tree)

| Key | Value | Purpose |
| :--- | :--- | :--- |
| `H:{Hash}` | `Assertion` | Immutable Content Store |
| `V:{Hash}` | `List<Vote>` | The Ballot Box (Append-only) |
| `MV:{Subject}:{Predicate}` | `Assertion` | **Materialized View** (The "Winner") |
| `TP:{PackID}` | `TrustPack` | Curation Lists |
| `S:{Subject}` | `List<Hash>` | Adjacency Index |
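
The key prefixes in the table can be sketched as plain string builders over a sorted map. This is an assumption-laden illustration: the function names are hypothetical, identifiers are passed as strings, a `BTreeMap` stands in for the LSM tree, and the real byte encoding (including how a `:` inside a subject would be escaped) is not specified here.

```rust
use std::collections::BTreeMap;

// Hypothetical key builders, one per prefix in the table.
fn content_key(hash_hex: &str) -> String {
    format!("H:{}", hash_hex)
}

fn votes_key(hash_hex: &str) -> String {
    format!("V:{}", hash_hex)
}

fn materialized_key(subject: &str, predicate: &str) -> String {
    format!("MV:{}:{}", subject, predicate)
}

fn trust_pack_key(pack_id: &str) -> String {
    format!("TP:{}", pack_id)
}

fn subject_index_key(subject: &str) -> String {
    format!("S:{}", subject)
}

/// Returns all keys starting with `prefix`, in sorted order.
/// Sorted storage makes this a contiguous range scan.
fn scan_prefix<'a>(store: &'a BTreeMap<String, String>, prefix: &str) -> Vec<&'a str> {
    store
        .range(prefix.to_string()..)
        .take_while(|(k, _)| k.starts_with(prefix))
        .map(|(k, _)| k.as_str())
        .collect()
}
```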

---

## 3. The Write Path (The Ballot Box)

1. **Ingest:** Agents submit `Assertions` or `Votes`.
2. **Journal:** Written to `episteme-wal`.
3. **Ballot Box:** Votes are appended to the `V:{Hash}` stream.
4. **Compactor (Async):** A background worker aggregates Votes + TrustRank to update the `MV` key.
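
The compaction step in item 4 can be sketched as a pure function that picks the winner for one `{Subject}:{Predicate}` pair. How Votes and TrustRank are actually combined is not specified in this document; the sketch assumes each vote's weight is multiplied by the voter's rank, that unranked agents get a caller-supplied `default_rank`, and that ties go to the earliest candidate. The integer `Hash` and `AgentId` types are stand-ins.

```rust
use std::collections::HashMap;

type Hash = u64;    // stand-in for a content hash
type AgentId = u32; // stand-in for a PublicKey

struct Vote {
    assertion_hash: Hash,
    agent_id: AgentId,
    weight: f32,
}

/// Returns the candidate with the highest trust-weighted vote total,
/// or None if there are no candidates. Votes for non-candidates are ignored.
fn compact(
    candidates: &[Hash],
    votes: &[Vote],
    trust_rank: &HashMap<AgentId, f32>,
    default_rank: f32,
) -> Option<Hash> {
    let mut scores: HashMap<Hash, f32> = candidates.iter().map(|h| (*h, 0.0)).collect();
    for v in votes {
        if let Some(score) = scores.get_mut(&v.assertion_hash) {
            let rank = trust_rank.get(&v.agent_id).copied().unwrap_or(default_rank);
            *score += v.weight * rank; // assumed combination rule
        }
    }
    // Walk candidates in input order so ties resolve deterministically.
    let mut best: Option<(Hash, f32)> = None;
    for h in candidates {
        let s = scores[h];
        if best.map_or(true, |(_, b)| s > b) {
            best = Some((*h, s));
        }
    }
    best.map(|(h, _)| h)
}
```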

---

## 4. The Read Path (The Cortex)

**Fast Path (Standard Lenses):**
* Query: `GET /query?lens=Consensus`
* Action: `GET MV:{Subject}:{Predicate}`
* Cost: **O(1)**, a single key lookup.

**Trusted Path (Trust Packs):**
* Query: `GET /query?lens=Authority&trust_pack=Science_Pack`
* Action:
    1. Fetch Candidate Assertions.
    2. Fetch Votes.
    3. **Filter:** Intersect Votes with `TrustPack.agents` (BitSet operation).
    4. Sum weights of remaining votes.
* Cost: **O(1)** if the view is materialized per pack; otherwise **O(M)**, computed on the fly over the M fetched votes.
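
Steps 3 and 4 of the trusted path amount to a filter followed by a sum. A minimal sketch, assuming integer stand-ins for hashes and agent IDs and using a `HashSet` in place of the pack's BitSet (`trusted_scores` is a hypothetical name):

```rust
use std::collections::{HashMap, HashSet};

type Hash = u64;    // stand-in for a content hash
type AgentId = u32; // stand-in for a PublicKey

struct Vote {
    assertion_hash: Hash,
    agent_id: AgentId,
    weight: f32,
}

/// Keeps only votes cast by agents in the pack, then sums the
/// remaining weights per assertion. One pass over the votes.
fn trusted_scores(votes: &[Vote], pack_agents: &HashSet<AgentId>) -> HashMap<Hash, f32> {
    let mut scores = HashMap::new();
    for v in votes.iter().filter(|v| pack_agents.contains(&v.agent_id)) {
        *scores.entry(v.assertion_hash).or_insert(0.0) += v.weight;
    }
    scores
}
```

An assertion whose votes all come from agents outside the pack does not appear in the result at all, which is how a pack can override a popular but untrusted answer.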

### Standard Lenses
* **Consensus:** Highest cluster density.
* **Authority:** Filter by **Trust Pack**.
* **Recency:** Last Writer Wins.
* **EpochAware:** Validates against current paradigm.
* **Constraints:** (New) Returns all `must_use`/`forbidden` assertions for a context. Acts as a "Pre-Flight Check."
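
One way to picture a lens is as a strategy that picks a winner from the same set of candidates. The sketch below is illustrative only: the `Lens` trait, the `Candidate` type, and `HighestWeightLens` are hypothetical. `RecencyLens` follows the Last Writer Wins rule above, while `HighestWeightLens` is a deliberately simple comparison point and not the cluster-density rule used by Consensus.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    object: String,
    timestamp: u64,
    vote_weight: f32, // pre-summed vote weight (assumption)
}

/// A lens resolves competing candidates to at most one answer.
trait Lens {
    fn resolve<'a>(&self, candidates: &'a [Candidate]) -> Option<&'a Candidate>;
}

/// Last Writer Wins: the candidate with the newest timestamp.
struct RecencyLens;

impl Lens for RecencyLens {
    fn resolve<'a>(&self, candidates: &'a [Candidate]) -> Option<&'a Candidate> {
        candidates.iter().max_by_key(|c| c.timestamp)
    }
}

/// Hypothetical lens: the candidate with the largest vote weight.
struct HighestWeightLens;

impl Lens for HighestWeightLens {
    fn resolve<'a>(&self, candidates: &'a [Candidate]) -> Option<&'a Candidate> {
        candidates
            .iter()
            .max_by(|a, b| a.vote_weight.total_cmp(&b.vote_weight))
    }
}
```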

---

## 5. The Meter (Economic Safety)

To prevent infinite loops, the Job Manager enforces **Temporal Advantage Normalization (TAN)**.
* **Budgeting:** Every Job must declare a `max_cost`.
* **Throttling:** Forking Reality or Deep Recursion is rejected if `current_cost + projected_cost > max_cost`.
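
The throttling rule can be sketched as a budget check that runs before an operation is admitted. The `Job` and `MeterError` types are hypothetical, and cost is treated as an abstract integer unit because the document does not say what TAN measures it in.

```rust
#[derive(Debug, PartialEq)]
enum MeterError {
    BudgetExceeded { needed: u64, max_cost: u64 },
}

struct Job {
    max_cost: u64,     // declared up front by the job
    current_cost: u64, // spent so far
}

impl Job {
    fn new(max_cost: u64) -> Self {
        Job { max_cost, current_cost: 0 }
    }

    /// Admits an operation only if `current_cost + projected_cost <= max_cost`.
    /// A rejected operation is not charged. Saturating addition avoids overflow.
    fn charge(&mut self, projected_cost: u64) -> Result<(), MeterError> {
        let needed = self.current_cost.saturating_add(projected_cost);
        if needed > self.max_cost {
            return Err(MeterError::BudgetExceeded { needed, max_cost: self.max_cost });
        }
        self.current_cost = needed;
        Ok(())
    }
}
```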

---

## 6. The Simulator (Mid-Training Pipeline)

The system continuously exports data to train the next generation of Agents.
* **Negative Samples:** High-confidence assertions that were later superseded (Failures).
* **Golden Paths:** Branches that successfully merged to Main (Successes).
* **Format:** Exported as HuggingFace-compatible datasets for LoRA fine-tuning.

---

## 7. Implementation Roadmap

### Phase 1: The Spine (Foundation)
* [ ] Reuse `quarantine-journal` pattern for WAL.
* [ ] Implement `Assertion`, `Epoch`, and **`Vote`** structs.
* [ ] Basic `sled` storage backend.

### Phase 2: The Lattice (Connectivity)
* [ ] **The Ballot Box**: Implement separate Vote storage stream.
* [ ] **Materializer**: Implement background worker to maintain `MV` keys.
* [ ] **Trust Packs**: Implement BitSet/BloomFilter logic for agent sets.
* [ ] **The Meter**: Implement Budget/TAN middleware in Job Manager.
* [ ] **Agent Wallet**: Sidecar for key management/signing.

### Phase 3: The Cortex (Reasoning)
* [ ] SMT Backend & Branching.
* [ ] Vector Search.
* [ ] **Lens: Constraints**: Implement the pre-flight check logic.

### Phase 4: The Hive (Learning)
* [ ] **The Simulator**: Log exporter pipeline.
* [ ] **Trust Marketplace**: API for publishing/subscribing to Trust Packs.
* [ ] **The Super Curator**: Implement "Judge" agent with Visual Anchoring.