Episteme (StemeDB) Roadmap Archive
Purpose: Historical record of completed phases. For current work, see roadmap.md.

Last Updated: 2026-02-08
Completed Phases Summary
| Phase | Codename | Status | Summary |
|---|---|---|---|
| 1 | The Spine | ✅ Complete | Storage & Safety — WAL + KV Store |
| 2 | The Lattice | ✅ Complete | Indexing & Async — MVs + Ballot Box |
| 2.5 | Hardening | ✅ Complete | MV staleness, epoch behavior, lens cleanup |
| 3 | The Pilot | ✅ Complete | Vertical Integration — Pharma Ingestion |
| 4 | The Hive | ✅ Complete | Trust & Learning — TrustRank, metadata indexing |
| 5 | The Forge | ✅ Complete | Foundation Hardening — redb/fjall, WAL, indices |
| 6 | The Mesh | ✅ Complete | Distributed Writes — CRDT, Raft, clustering |
| 7 | The Shield | ✅ Complete | Trust at Scale — EigenTrust, PoW, quarantine |
| 8A | Chaos | ✅ Complete | Partition testing, Jepsen-style verification |
| MVP | Consumer Health | ✅ Complete | Real FDA data → conflicts detected → demo |
| Pilot 1-3 | Pilot Prep (Partial) | ✅ Complete | Dashboard, demo data, impact analysis, load testing |
| Pilot 4 | Production Hardening | ✅ Complete | API auth, backup/restore, Prometheus metrics |
| Aphoria A1 | Observations vs Claims | ✅ Complete | Type system: Observation + AuthoredClaim, bridge tiers |
| Aphoria A2 | Authoring Workflow | ✅ Complete | claims create/list/explain/update/supersede/deprecate |
| Aphoria A3 | Verification Engine | ✅ Complete | verify.rs, verify run/map, pre-commit hook, self-audit |
| Aphoria A4 | Corpus as Assertions | ✅ Complete | RFC/OWASP assertions, authority lens, trust packs |
Phase 1: The Spine (Foundation) ✅
Goal: Securely ingest assertions and persist them without data loss.
- Project Scaffold: Initialize Rust workspace, set up linting/CI (clippy, fmt).
- Assertion Schema: Define the `Assertion` struct with `rkyv` serialization.
  - Add dependencies: `rkyv`, `blake3`, `ed25519-dalek`, `image_hasher`.
  - Define `Assertion` struct (Subject, Predicate, Object, Confidence, SourceHash).
  - Multi-Sig Expansion: Implement `SignatureEntry` struct and `signatures: Vec<SignatureEntry>` field.
  - Visual Expansion: Add `visual_hash: Option<pHash>` field for image provenance.
  - Test serialization round-trips.
- Ballot Schema: Define the `Vote` struct for multi-agent consensus.
  - Add `Vote` struct: `assertion_hash`, `agent_id`, `weight`, `signature`.
  - Test serialization round-trips.
- Paradigm Schema (Epochs): Define the `Epoch` and `SupersessionType` structs.
  - Add `epoch: Option<EpochId>` to `Assertion`.
  - Implement `Epoch` struct with `supersedes` and `SupersessionType`.
  - Test serialization round-trips.
- WAL Integration: Implement the Quarantine Pattern for write-ahead logging.
  - Create `stemedb-wal` crate.
  - Port `FsyncGuard` and `Record` logic from established durability patterns.
  - Implement Record format with BLAKE3 checksums and Headers.
  - Verify `fsync` behavior with tests.
- Storage Engine: Implement the `Store` trait using `sled` (embedded KV).
  - Add `sled` dependency.
  - Define `KVStore` trait (put, get, delete, scan_prefix, flush).
  - Implement `SledStore` wrapper.
- Basic Ingestor: Background worker that tails WAL and writes to KV.
  - Implement async loop reading from WAL.
  - Write deserialized assertions, votes, and epochs to `sled`.
  - Ed25519 signature verification during ingestion.
  - Maintains S: and SP: indexes on ingest.
  - Persistent cursor/checkpoint (resumes from `__CURSOR__:ingest` in KV store).
- Verification: Crash recovery tests (write -> crash -> restart -> read).
  - Single and multi-record crash recovery.
  - Multiple crash cycles tested.
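The `KVStore` operations listed above (put, get, delete, scan_prefix, flush) can be pictured with a small in-memory stand-in. This is a minimal sketch, not the `stemedb-storage` code: the method signatures and the `MemStore` type are assumptions, since the roadmap names only the five operations.

```rust
use std::collections::BTreeMap;

/// Hypothetical signatures: the roadmap names the five operations,
/// not their argument or return types.
pub trait KVStore {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn delete(&mut self, key: &[u8]) -> bool;
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
    fn flush(&mut self);
}

/// In-memory stand-in for `SledStore`, for illustration only.
#[derive(Default)]
pub struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }

    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    /// Returns true if the key existed.
    fn delete(&mut self, key: &[u8]) -> bool {
        self.map.remove(key).is_some()
    }

    /// Keys are ordered, so a prefix scan is a range scan that stops
    /// at the first key that no longer starts with the prefix.
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }

    /// Nothing to persist in memory; a disk-backed store would sync here.
    fn flush(&mut self) {}
}
```

An ordered map makes `scan_prefix` cheap, which is the property the `S:` and `SP:` index keys rely on.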
Phase 2: The Lattice (Connectivity) ✅
Goal: Query data with sub-millisecond latency using Materialized Views.
- Lifecycle Schema: Add `LifecycleStage` to Assertion.
  - Define enum: `Proposed`, `UnderReview`, `Approved`, `Deprecated`, `Rejected`.
  - Update `Assertion` struct and serialization tests.
- The Ballot Box: Implement high-velocity vote ingestion.
  - `VoteStore` trait and implementation.
  - `VoteAwareConsensusLens` for real vote-based resolution.
- Index Infrastructure: Compound indexes for O(1) queries.
  - `IndexStore` trait with S: and SP: indexes.
  - `QueryEngine` smart routing (SP -> S -> scan).
- Materializer: Background worker for O(1) Read Performance.
  - `MaterializedView` type in `stemedb-core`.
  - `Materializer` worker in `stemedb-query` with `step()` and `run()`.
  - Aggregates Votes via `VoteAwareConsensusLens` (or any `AsyncLens`).
  - Updates `MV:{Subject}:{Predicate}` with the winning Assertion + metadata.
  - Event-driven mode via `run_notified()` with `tokio::sync::Notify`.
  - Fast-path MV lookup in `QueryEngine::try_fast_path()`.
- The Meter: Implement Economic Throttling (TAN).
  - `QuotaStore` trait and `GenericQuotaStore` implementation.
  - Token Bucket algorithm with per-agent per-hour quotas.
  - `MeterLayer` tower middleware for request cost tracking.
  - Cost model: Assert=10, Vote=1, Query=5+lens, +1/KB payload.
  - `GET /v1/meter/quota` endpoint to check remaining quota.
  - `POST /v1/meter/quota/limit` admin endpoint to set custom limits.
- API Surface:
  - `axum` HTTP server with OpenAPI (`utoipa`).
  - `POST /v1/assert` -> Accepts JSON, writes to WAL.
  - `POST /v1/vote` -> High-throughput vote endpoint.
  - `POST /v1/epoch` -> Create epoch with optional supersession.
  - `GET /v1/query` -> Subject/Predicate/Lens/Lifecycle/Epoch filtering.
  - `GET /v1/health` -> Health check with assertion count.
  - `GET /swagger-ui` -> Interactive API docs.
  - 5 lens types available: Recency, Consensus, Authority, VoteAwareConsensus, TrustAwareAuthority.
- Query Audit: Log every read with provenance.
  - Define `QueryAudit` struct: query_id, agent_id, timestamp, params, result_hash, contributing_assertions.
  - Storage at `AUD:{query_id}` with agent index at `AUDA:{agent_id}:{timestamp}:{query_id}`.
  - `GET /v1/audit/queries` -> Returns history of agent decisions.
  - `GET /v1/audit/query/{id}` -> Full reasoning trace for a single query.
  - Auto-logging on every query via `X-Agent-Id` header.
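The Meter's cost model and token bucket can be sketched as follows. This is a rough illustration, not the `MeterLayer` implementation: the `Op` and `TokenBucket` types, rounding payload size up to whole KB, continuous refill, and passing the clock in as seconds are all assumptions made here.

```rust
/// Request kinds from the cost model. `lens_cost` is the lens surcharge,
/// whose actual values the roadmap does not list.
pub enum Op {
    Assert,
    Vote,
    Query { lens_cost: u64 },
}

/// Assert=10, Vote=1, Query=5+lens, plus 1 per KB of payload.
/// Assumption: partial kilobytes are rounded up.
pub fn request_cost(op: &Op, payload_bytes: u64) -> u64 {
    let base = match op {
        Op::Assert => 10,
        Op::Vote => 1,
        Op::Query { lens_cost } => 5 + *lens_cost,
    };
    base + (payload_bytes + 1023) / 1024
}

/// Per-agent bucket holding at most `capacity` tokens and refilling
/// evenly over one hour.
pub struct TokenBucket {
    capacity: u64,
    tokens: f64,
    last_refill_secs: u64,
}

impl TokenBucket {
    pub fn new(capacity_per_hour: u64, now_secs: u64) -> Self {
        TokenBucket {
            capacity: capacity_per_hour,
            tokens: capacity_per_hour as f64,
            last_refill_secs: now_secs,
        }
    }

    /// Refills for the elapsed time, then spends `cost` tokens if available.
    pub fn try_consume(&mut self, cost: u64, now_secs: u64) -> bool {
        let elapsed = now_secs.saturating_sub(self.last_refill_secs);
        let refill = elapsed as f64 * self.capacity as f64 / 3600.0;
        self.tokens = (self.tokens + refill).min(self.capacity as f64);
        self.last_refill_secs = self.last_refill_secs.max(now_secs);
        if self.tokens >= cost as f64 {
            self.tokens -= cost as f64;
            true
        } else {
            false
        }
    }
}
```

Taking the timestamp as a parameter keeps the bucket deterministic under test; a middleware would supply the wall clock.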
Phase 2.5: Hardening ✅
Goal: Close the gaps between "built" and "works right."
- 2.1 MV Staleness Detection: `max_stale` parameter on queries.
- 2.2 AuthorityLens -> ConfidenceLens Rename: Eliminated misleading name.
- 2.3 EpochAwareLens: Epoch supersession runtime behavior with cycle detection.
- 2.4 Visual Hash Query Support: Hamming distance queries on `visual_hash`.
- 2.5 Vector Field: `vector: Option<Vec<f32>>` stored on assertions.
- 2.6 E2E Integration Test: Full pipeline validation (Write -> Materialize -> Read).
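For item 2.4, a Hamming distance query over perceptual hashes reduces to XOR plus a population count. The sketch below assumes 64-bit hashes and a linear scan; the function names are illustrative and not taken from the codebase.

```rust
/// Number of differing bits between two 64-bit perceptual hashes.
pub fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Linear scan returning every stored hash within `max_dist` bits of `query`.
pub fn within_distance(stored: &[u64], query: u64, max_dist: u32) -> Vec<u64> {
    stored
        .iter()
        .copied()
        .filter(|h| hamming(*h, query) <= max_dist)
        .collect()
}
```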
Phase 3: The Pilot (BioTech/Pharma) ✅
Goal: Prove value in the "High-Liability" beachhead.
3A. Schema Expansion
- 3A.1 Source-Class Field: 6-tier `SourceClass` enum (Regulatory → Anecdotal).
- 3A.2 Conflict Score on Resolution: Normalized variance-based conflict metric.
- 3A.3 Rich Source Metadata: `source_metadata: Option<Vec<u8>>` for JSON provenance.
3B. Time & Decay
- 3B.1 Time-Travel Engine: `as_of` parameter for historical queries.
- 3B.2 Semantic Decay: Confidence half-life with tier-specific rates.
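Semantic decay with a half-life can be written as confidence × 0.5^(age / half-life). The sketch below is illustrative only: the roadmap says rates are tier-specific but does not list them, so the half-life values in `half_life_days` are invented placeholders.

```rust
/// Hypothetical half-lives in days per source tier
/// (0 = regulatory, 5 = anecdotal). Placeholder values only.
pub fn half_life_days(tier: u8) -> f64 {
    match tier {
        0 => 3650.0,
        1 => 1825.0,
        2 => 730.0,
        3 => 365.0,
        4 => 180.0,
        _ => 90.0,
    }
}

/// Confidence after exponential decay: halves once per half-life.
pub fn effective_confidence(confidence: f64, age_days: f64, tier: u8) -> f64 {
    let half_lives = age_days.max(0.0) / half_life_days(tier);
    confidence * 0.5_f64.powf(half_lives)
}
```

With these placeholder rates, a 180-day-old tier 5 claim has passed two half-lives and keeps a quarter of its stated confidence, while a fresh tier 0 claim is unchanged.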
3C. New Lenses
- 3C.1 Skeptic Lens: Surface disagreement via Shannon entropy conflict scoring.
- 3C.2 Layered Consensus Lens: Per-source-class consensus with tier visibility.
- 3C.3 Constraints Lens: Pre-flight check for must_use/forbidden/prefer.
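One way to read "Shannon entropy conflict scoring" for the Skeptic Lens is normalized entropy over the weight behind each competing value. This is a sketch of that idea under stated assumptions; the normalization by log2 of the number of distinct values is a choice made here, and the formula `SkepticLens` actually uses is not given in this archive.

```rust
/// Normalized Shannon entropy of the support behind each distinct value.
/// Returns 0.0 when everyone agrees and 1.0 when support is split evenly.
/// `weights` holds one non-negative weight per distinct asserted value.
pub fn conflict_score(weights: &[f64]) -> f64 {
    let positive: Vec<f64> = weights.iter().copied().filter(|w| *w > 0.0).collect();
    if positive.len() < 2 {
        return 0.0;
    }
    let total: f64 = positive.iter().sum();
    let entropy: f64 = positive
        .iter()
        .map(|w| {
            let p = w / total;
            -p * p.log2()
        })
        .sum();
    // Dividing by the maximum possible entropy maps the score into [0, 1].
    entropy / (positive.len() as f64).log2()
}
```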
3D. Epoch Enhancement
- 3D.1 Epoch Cascade Logic: O(1) supersession lookup via pre-computed markers.
3E. Similarity Search
- 3E.1 Vector Search: HNSW-based semantic k-NN queries.
- 3E.2 Visual Hash Index: BK-tree for O(log N) visual similarity.
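The BK-tree named in 3E.2 indexes hashes by their distance to a parent node, so a search can skip subtrees ruled out by the triangle inequality. The following is a minimal sketch assuming 64-bit hashes and Hamming distance; the type and method names are hypothetical and persistence is omitted.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

struct Node {
    hash: u64,
    /// Children keyed by their distance to this node.
    children: HashMap<u32, Node>,
}

impl Node {
    fn new(hash: u64) -> Self {
        Node { hash, children: HashMap::new() }
    }
}

fn insert_below(node: &mut Node, hash: u64) {
    let d = hamming(node.hash, hash);
    if d == 0 {
        return; // already indexed
    }
    match node.children.entry(d) {
        Entry::Occupied(mut e) => insert_below(e.get_mut(), hash),
        Entry::Vacant(e) => {
            e.insert(Node::new(hash));
        }
    }
}

#[derive(Default)]
pub struct BkTree {
    root: Option<Node>,
}

impl BkTree {
    pub fn insert(&mut self, hash: u64) {
        if self.root.is_none() {
            self.root = Some(Node::new(hash));
            return;
        }
        if let Some(root) = self.root.as_mut() {
            insert_below(root, hash);
        }
    }

    /// All indexed hashes within `max_dist` bits of `query`, sorted ascending.
    pub fn within(&self, query: u64, max_dist: u32) -> Vec<u64> {
        let mut out = Vec::new();
        let mut stack: Vec<&Node> = self.root.iter().collect();
        while let Some(node) = stack.pop() {
            let d = hamming(node.hash, query);
            if d <= max_dist {
                out.push(node.hash);
            }
            // Only subtrees whose edge distance lies in [d - max, d + max]
            // can contain a match.
            let lo = d.saturating_sub(max_dist);
            let hi = d.saturating_add(max_dist);
            for (edge, child) in &node.children {
                if *edge >= lo && *edge <= hi {
                    stack.push(child);
                }
            }
        }
        out.sort_unstable();
        out
    }
}
```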
3F. Provenance
- 3F.1 Source Document Storage: Content-addressed source storage with `GET /v1/provenance/{hash}`.
3G. API Cleanup
- 3G.1 Document epoch supersession: Updated docs for `POST /v1/epoch` with `supersedes` field.
Phase 4: The Hive (Trust & Scale) ✅
Goal: Change tracking, metadata indexing, and training pipeline primitives.
- TrustRank Engine: Per-agent reputation with decay and learning loop.
- 4.1 "Since" Parameter: MV changelog at `MVC:` keys with `changes_since` in responses.
- 4.2 Source Metadata Indexing: Indexed fields (journal, doi, platform, study_design) at `SMV:`.
- 4.3 Batch TrustRank Decay API: `POST /v1/admin/decay-trust-ranks`.
- 4.4 Vote Provenance Witness: `source_url` and `observed_context` on votes.
- 4.5 Conflict Score Filtering: `min_conflict_score`/`max_conflict_score` on queries.
- 4.6 Escalation Triggers: `EscalationPolicy` fires events on high-conflict assertions.
- 4.7 Gold Standard Verification: Admin-verified assertions for agent testing.
Phase 5: The Forge (Foundation Hardening) ✅
Goal: Replace abandoned dependencies, fix WAL gaps, persist indices.
5A. Storage Engine Replacement
- 5A.1 Replace sled with redb + fjall: HybridStore with prefix-based routing.
- 5A.2 Key Layout Redesign: Subject-prefix keys for range sharding readiness.
5B. WAL Hardening
- 5B.1 CRC32C Checksums: Hardware-accelerated torn write detection.
- 5B.2 Crash Recovery Implementation: Sequential scan with truncation.
- 5B.3 Group Commit: Batch fsync for throughput.
- 5B.4 Log Rotation: Segment management with safe deletion.
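Items 5B.1 and 5B.2 fit together: each record carries a CRC32C of its payload, and recovery scans from the start and stops at the first record that is short or fails its checksum. The sketch below uses a bitwise software CRC32C in place of the hardware-accelerated one, and the frame layout (`len`, `crc`, payload) is an assumption rather than the `stemedb-wal` format.

```rust
/// Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 == 1 { (crc >> 1) ^ 0x82F6_3B78 } else { crc >> 1 };
        }
    }
    !crc
}

/// Hypothetical frame: [len: u32 LE][crc32c(payload): u32 LE][payload].
pub fn append_record(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&crc32c(payload).to_le_bytes());
    log.extend_from_slice(payload);
}

fn read_u32(log: &[u8], pos: usize) -> u32 {
    u32::from_le_bytes([log[pos], log[pos + 1], log[pos + 2], log[pos + 3]])
}

/// Sequential scan. Returns the intact payloads and the byte length of the
/// valid prefix; the caller would truncate the file to that length.
pub fn recover(log: &[u8]) -> (Vec<Vec<u8>>, usize) {
    let mut records = Vec::new();
    let mut pos = 0;
    while log.len() - pos >= 8 {
        let len = read_u32(log, pos) as usize;
        let crc = read_u32(log, pos + 4);
        let start = pos + 8;
        let end = match start.checked_add(len) {
            Some(e) if e <= log.len() => e,
            _ => break, // torn write: payload shorter than the header claims
        };
        if crc32c(&log[start..end]) != crc {
            break; // corrupted payload
        }
        records.push(log[start..end].to_vec());
        pos = end;
    }
    (records, pos)
}
```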
5C. Index Persistence
- 5C.1 Persistent Vector Index: Hot/cold HNSW with checkpoint files.
- 5C.2 Persistent Visual Index: BK-tree snapshots with CRC32C verification.
5D. Concept Hierarchy
- 5D.1 ConceptPath Type: Scheme-qualified subject identifiers.
- 5D.2 Source Scheme Registry: Scheme → default source tier mapping.
- 5D.3 Alias Store: Cross-scheme entity resolution with cycle detection.
- 5D.4 Hierarchical Query: Prefix-based subject queries.
- 5D.5 Alias Resolution in Queries: `GET /v1/concepts/resolve?path=...`.
- 5D.6 Source Class Inference: Tier inference from scheme.
- 5D.7 Concept API Endpoints: Full CRUD for aliases and hierarchy.
- 5D.8 Battery Tests: 15 tests across Battery 8 and 9.
Phase 6: The Mesh (Distributed Writes) ✅
Goal: Multi-node cluster with CRDT replication and Raft coordination.
6A. CRDT Foundation
- 6A.1 Integrate CRDT Crate: G-Set for assertions, G-Counter for votes.
- 6A.2 Hybrid Logical Clocks: HLC timestamps for causal ordering.
- 6A.3 Merkle Tree Over Assertions: BLAKE3-based diff detection.
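The two CRDTs in 6A.1 have simple merge rules: a G-Set merges by set union and a G-Counter merges by taking the per-node maximum. The sketch below is a hand-rolled illustration, not the integrated CRDT crate; keying assertions by a hash string and nodes by name is an assumption.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Grow-only set of assertion hashes. Merge is set union.
#[derive(Clone, Default, PartialEq, Debug)]
pub struct GSet {
    items: BTreeSet<String>,
}

impl GSet {
    pub fn insert(&mut self, hash: &str) {
        self.items.insert(hash.to_string());
    }

    pub fn merge(&mut self, other: &GSet) {
        self.items.extend(other.items.iter().cloned());
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}

/// Grow-only counter: each node increments only its own slot.
/// Merge takes the per-node maximum, so replaying a merge is harmless.
#[derive(Clone, Default, PartialEq, Debug)]
pub struct GCounter {
    per_node: BTreeMap<String, u64>,
}

impl GCounter {
    pub fn increment(&mut self, node: &str, by: u64) {
        *self.per_node.entry(node.to_string()).or_insert(0) += by;
    }

    pub fn merge(&mut self, other: &GCounter) {
        for (node, &count) in &other.per_node {
            let slot = self.per_node.entry(node.clone()).or_insert(0);
            *slot = (*slot).max(count);
        }
    }

    pub fn value(&self) -> u64 {
        self.per_node.values().sum()
    }
}
```

Because both merges are commutative, associative, and idempotent, replicas converge regardless of delivery order or duplication.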
6B. Two-Node Replication (PoC)
- 6B.1 RPC Layer: tonic gRPC with SyncClient and SyncServiceHandler.
- 6B.2 Gossip Broadcast: Configurable fanout with rate limiting.
- 6B.3 Merkle Anti-Entropy Sync: Background convergence worker.
- 6B.4 Integration Test: 8 tests validating replication primitives.
6C. Multi-Node Cluster
- 6C.1 Cluster Membership (SWIM Gossip): Node discovery and failure detection.
- 6C.2 Subject-Prefix Range Sharding: BLAKE3 + jump hash routing.
- 6C.4 Gateway: Stateless request routing with health and status endpoints.
- 6C.5 Integration Tests: 82 tests covering membership, sharding, gateway.
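Jump consistent hashing, named in 6C.2, maps a 64-bit key to a shard so that adding a shard moves only the keys that land on the new one. In the sketch below the standard library's `DefaultHasher` stands in for BLAKE3 to keep the example dependency-free, and `shard_for_prefix` is a hypothetical helper, not the cluster's router.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Jump consistent hash: returns a shard index in [0, num_shards).
pub fn jump_hash(mut key: u64, num_shards: u32) -> u32 {
    assert!(num_shards > 0, "at least one shard is required");
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while j < num_shards as i64 {
        b = j;
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        j = ((b + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as u32
}

/// Routes a subject prefix to a shard. Stand-in hash: the real system is
/// described as using BLAKE3, and DefaultHasher's output may differ between
/// Rust releases, so this is unsuitable for persisted routing.
pub fn shard_for_prefix(prefix: &str, num_shards: u32) -> u32 {
    let mut hasher = DefaultHasher::new();
    prefix.hash(&mut hasher);
    jump_hash(hasher.finish(), num_shards)
}
```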
Consistency Guarantees
| Property | Guarantee | Mechanism |
|---|---|---|
| Convergence | Eventually consistent | G-Set merge (CRDT) |
| Causality | Supersessions ordered | HLC timestamps |
| Partition Tolerance | Writes never blocked | Any node accepts via CRDT |
| Availability | Reads/writes always succeed | Every node is master for CRDTs |
| Durability | WAL + fsync per node | Existing WAL infra |
| Conflict Resolution | Deterministic | Lens algorithms |
Phase 7: The Shield (Trust at Scale) ✅
Goal: Defend against spam, Sybil attacks, and knowledge poisoning.
7A. Admission Control
- 7A.1 Proof-of-Work Admission: BLAKE3 hashcash with graduated difficulty.
- 7A.2 Graduated Trust Tiers: 5 tiers (Untrusted → Authority) with quota multipliers.
7B. EigenTrust
- 7B.1 Trust Graph Store: Direct trust relationships at `TG:` keys.
- 7B.2 EigenTrust Computation: Power iteration with Sybil resistance.
- 7B.3 Domain-Specific Trust: Per-predicate-namespace reputation.
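The power iteration in 7B.2 can be sketched with the textbook EigenTrust update, t ← (1 − α)·Cᵀt + α·p, where C is the row-normalized local trust matrix and p is the pre-trusted distribution. The dense matrix, fixed iteration count, and parameter names below are assumptions for illustration, not the production trust graph.

```rust
/// `local[i][j]` is the direct trust agent i places in agent j (non-negative).
/// `pretrusted` must sum to 1. `alpha` is the weight pulled back toward the
/// pre-trusted agents each round, which is what limits Sybil clusters.
pub fn eigentrust(
    local: &[Vec<f64>],
    pretrusted: &[f64],
    alpha: f64,
    iterations: usize,
) -> Vec<f64> {
    let n = local.len();
    // Row-normalize; agents that trust nobody defer to the pre-trusted set.
    let normalized: Vec<Vec<f64>> = local
        .iter()
        .map(|row| {
            let sum: f64 = row.iter().sum();
            if sum > 0.0 {
                row.iter().map(|v| v / sum).collect::<Vec<f64>>()
            } else {
                pretrusted.to_vec()
            }
        })
        .collect();

    let mut trust = pretrusted.to_vec();
    for _ in 0..iterations {
        let mut next = vec![0.0; n];
        for i in 0..n {
            for j in 0..n {
                next[j] += normalized[i][j] * trust[i];
            }
        }
        for j in 0..n {
            next[j] = (1.0 - alpha) * next[j] + alpha * pretrusted[j];
        }
        trust = next;
    }
    trust
}
```

A cluster of agents that only vouch for each other receives no trust unless some trusted agent points into it, since all trust mass originates from the pre-trusted set.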
7C. Content Defense
- 7C.1 MinHash Deduplication: LSH bucketing with 0.9 Jaccard threshold.
- 7C.2 Content Quality Scoring: Entropy, length, structure heuristics.
- 7C.3 Quarantine Store: Time-ordered suspicious assertions with admin review.
7D. Circuit Breakers
- 7D.1 Per-Agent Circuit Breakers: Closed → Open → HalfOpen state machine.
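The Closed → Open → HalfOpen state machine in 7D.1 can be sketched as below. The failure threshold, cooldown, and the rule that any failure in HalfOpen reopens the breaker are conventional choices assumed here; the archive does not record the actual policy.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Closed,
    Open,
    HalfOpen,
}

pub struct CircuitBreaker {
    state: State,
    failures: u32,
    failure_threshold: u32,
    cooldown_secs: u64,
    opened_at: u64,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown_secs: u64) -> Self {
        CircuitBreaker {
            state: State::Closed,
            failures: 0,
            failure_threshold,
            cooldown_secs,
            opened_at: 0,
        }
    }

    pub fn state(&self) -> State {
        self.state
    }

    /// Whether a request from this agent may proceed at `now_secs`.
    /// An Open breaker moves to HalfOpen once the cooldown has elapsed.
    pub fn allow(&mut self, now_secs: u64) -> bool {
        if self.state == State::Open
            && now_secs.saturating_sub(self.opened_at) >= self.cooldown_secs
        {
            self.state = State::HalfOpen;
        }
        self.state != State::Open
    }

    /// A success closes the breaker and clears the failure count.
    pub fn record_success(&mut self) {
        self.failures = 0;
        self.state = State::Closed;
    }

    /// A failure while probing, or reaching the threshold, opens the breaker.
    pub fn record_failure(&mut self, now_secs: u64) {
        self.failures += 1;
        if self.state == State::HalfOpen || self.failures >= self.failure_threshold {
            self.state = State::Open;
            self.opened_at = now_secs;
        }
    }
}
```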
Phase 8A: Chaos Testing ✅
- 8A.1 Partition Testing: 5-node cluster, network partitions, cascading failures.
- 8A.2 Jepsen-Style Consistency Testing: CRDT properties, clock skew, concurrent writes.
Consumer Health MVP ✅
"Can Episteme demonstrate value that's impossible with Postgres?"
Definition of Done (All Complete)
| Checkpoint | Description |
|---|---|
| Real Data Flows | FDA drug labels for 3+ GLP-1 drugs ingested as signed assertions |
| Conflicts Detected | SkepticLens shows conflict_score > 0.5 when sources disagree |
| Source Hierarchy Works | Tier 0 (FDA) outweighs 100x Tier 5 (anecdotal) volume |
| Time Travel Works | as_of=2024-01-01 returns historical snapshot |
| Decay Works | 6-month-old Reddit claim has lower effective confidence than fresh FDA |
| UAT Passes | Consumer Health scenarios documented and verified |
| Self-Serve Demo | CLI tool lets anyone explore without code |
| Documentation | "Adding a Domain" guide enables new verticals |
MVP Workstream (Weeks 1-6)
- Week 1: Domain definitions, SubjectBuilder, pharma schema
- Week 2: FDA extractor, claim-to-assertion signing
- Week 3: Ingest FDA claims, mock conflicts, SkepticLens demo
- Week 4: UAT scenarios documented and verified
- Week 5: `steme-pharma` CLI for self-serve exploration
- Week 6: Polish, reusable patterns, documentation
Enterprise Pilot Preparation (Partial) ✅
Completed: Pilot-1, Pilot-2, Pilot-3, P4.1. Remaining: P4.2-P4.4, P5.1-P5.4 (still in roadmap.md)
Pilot-1: Demo Dashboard (Complete)
Deliverable: React admin dashboard that makes the API visual
- P1.1 Dashboard Scaffold: Next.js + shadcn/ui project setup (`applications/stemedb-dashboard/`)
- P1.2 Skeptic Query Visualization: Contradictions with conflict scores, tier badges, expandable claims
- P1.3 Layered Consensus View: Per-tier breakdown with cross-tier conflict visualization
- P1.4 Quarantine Admin Panel: Pending queue, approve/reject, filter by reason, metrics
- P1.5 Circuit Breaker Status: Blocked agents, state badges (OPEN/HALF_OPEN/CLOSED), manual reset
- P1.6 Audit Trail Browser: Recent queries, drilldown, filter by agent/time, export JSON/CSV
Pilot-2: Demo Data Seeder (Complete)
Deliverable: Pre-signed realistic demo data using Go SDK
- P2.1 Demo Keypair Management: 5 demo agents (FDA, PubMed, ClinicalTrials, Reddit, Internal) with deterministic keys
- P2.2 Conflict Scenarios: 3 drugs (semaglutide, tirzepatide, liraglutide), 150+ assertions, real FDA content
- P2.3 Retractable Sources: CARDIOVASC_MEGA_TRIAL with 110 cascade assertions across 5 agents
- P2.4 Historical Data: Lifecycle evolution (Proposed → Approved → Deprecated), 17 historical assertions
Pilot-3: Impact Analysis (Complete)
Deliverable: Automatic cascade when source is retracted
- P3.1 Impact Analysis Endpoint: `GET /v1/sources/{hash}/impact`, quarantine with preview, restore, 17 tests
- P3.2 Cascade Flagging: Query-time source status enrichment, `exclude_quarantined_sources` filter, CSV/JSON export
- P3.3 Impact Dashboard Widget: Sources page, quarantine dialog with impact preview, impact ripple animation
Pilot-4: Production Hardening (Partial)
- P4.1 Load Testing: Go-based load tester, 10K assertions, 1K writes/sec, 100 concurrent readers, markdown reports
5 Amazement Moments (Status at Archive)
| # | Moment | Status |
|---|---|---|
| 1 | Contradictions visible with confidence scores | ✅ Complete |
| 2 | Cascade invalidation when source retracted | ✅ Complete |
| 3 | Full FDA-ready audit trail | ✅ Complete |
| 4 | Point-in-time queries + decay | ✅ API ready (no timeline UI) |
| 5 | Malicious agent blocked by circuit breaker | ✅ Complete |
Key Architectural Decisions (Historical)
- sled → redb/fjall: sled abandoned. HybridStore routes by key prefix.
- Raft log = WAL: Eliminated duplicate WAL following TiKV v5.4 pattern.
- CRDT for data, Raft for coordination: Assertions are G-Set CRDT.
- Subject-prefix ranges: Co-locate all data for a subject on one shard.
- HLC over TrueTime: Works on commodity hardware.
- AP model: Writes never blocked during partitions.
Research Documents
- docs/research/wal-crash-recovery-research.md — WAL patterns from CockroachDB, TiKV, FoundationDB, SQLite.
- docs/research/distributed-write-path.md — Spanner/CockroachDB-style distributed writes adapted for append-only model.
Crates (as of archive date)
| Crate | Purpose |
|---|---|
| `stemedb-core` | Assertion, LifecycleStage, MaterializedView, types, signing utilities |
| `stemedb-wal` | Write-ahead log with crash recovery |
| `stemedb-storage` | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore, SimilarityIndex |
| `stemedb-ingest` | Ingestion pipeline, signature verification, ContentDefenseLayer |
| `stemedb-query` | Query engine, Materializer for O(1) MV reads |
| `stemedb-lens` | Lenses (Recency, Consensus, Authority, Skeptic, Layered, etc.) |
| `stemedb-api` | HTTP API with axum + utoipa OpenAPI docs |
| `stemedb-sim` | Simulation for testing the pipeline |
| `stemedb-merkle` | BLAKE3 Merkle tree for diff detection |
| `stemedb-rpc` | gRPC services for node-to-node communication |
| `stemedb-sync` | Merkle sync, gossip broadcast, anti-entropy |
| `stemedb-cluster` | Cluster membership (SWIM), sharding, gateway |
| `stemedb-ontology` | Domain definitions (Pharma), subject builders, medical extractors |
| `stemedb-chaos` | Chaos testing infrastructure |
Pilot-4: Production Hardening ✅
- P4.2 API Authentication: API key middleware (`X-API-Key`), BLAKE3-hashed keys, 3 roles (admin/write/read), 5 CRUD endpoints, bootstrap via `STEMEDB_ROOT_API_KEY`
- P4.3 Backup/Restore: `scripts/backup-stemedb.sh` + `scripts/restore-stemedb.sh`, WAL magic verify, rename-not-delete safety
- P4.4 Prometheus Metrics: `/metrics` endpoint, `assertions_total`, `queries_total`, `query_latency_seconds`, `quarantine_pending`, Grafana dashboard template
Aphoria A1: Distinguish Observations from Claims ✅
Goal: Type system reflects the real difference. No more pretending grep results are claims.
- A1.1 Rename ExtractedClaim to Observation: Updated across all 42 extractors, bridge, scanner, CLI
- A1.2 Create Claim Type: `AuthoredClaim` in `types/authored_claim.rs` with provenance/invariant/consequence/authority/evidence/status/supersedes. `ClaimStore` trait + `TomlClaimStore`. `ClaimsFile` TOML persistence in `.aphoria/claims.toml`
- A1.3 Update Bridge Tier Mapping: Observations → Tier 4 (Community), authored claims get tier from `authority_tier` field via `authored_claim_to_assertion()`
- A1.4 Claim File Format: `.aphoria/claims.toml` with `[[claim]]` TOML arrays, human-readable, version-controllable
Aphoria A2: Build the Authoring Workflow ✅
Goal: The skill — not the scanner — is the primary interface for creating claims.
- A2.1 Claim Authoring Command: `aphoria claims create` with all fields, authority tier validation
- A2.2 Claim Listing: `aphoria claims list` with `--category`, `--status`, `--format json`
- A2.3 Claims Explained Generator: `aphoria claims explain` groups by category with provenance/invariant/consequence
- A2.4 Enhance Aphoria Skill: `.claude/skills/aphoria-claims/SKILL.md` for diff review, pattern table, authority tier guide
- A2.5 Claim Lifecycle: `update`, `supersede` (with parent pointer), `deprecate` (with reason)
Aphoria A3: Pair Extractors with Claims ✅
Goal: Extractors verify claims, not generate them. The audit finds real conflicts.
- A3.1 Verification Engine: `ComparisonMode` (Equals/NotEquals/Present/Absent), `verify.rs` with tail-path matching, 4 verdicts (Pass/Conflict/Missing/Unclaimed), 11 unit tests
- A3.2 Verify Command: `aphoria verify run|map`, `--exit-code` (0=pass, 1=missing, 2=conflicts, 3=error), `--claim` and `--category` filters
- A3.3 Verify Report Formatters: `verify_table.rs` + `verify_json.rs`
- A3.4 Pre-Commit Hook: `aphoria verify run --changed-only --exit-code` using `walk_staged_files()`
- A3.5 Self-Audit Extractors: `self_audit.rs` (unwrap count, bridge tier, parent_hash, lifecycle), opt-in, 5+3 tests
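The comparison modes, verdicts, and exit codes from A3.1 and A3.2 suggest logic along the following lines. This is a guess at the semantics, not the contents of `verify.rs`: how each mode treats a missing observation, and the rule that conflicts outrank missing observations when choosing the exit code, are assumptions. Exit code 3 (error) and tail-path matching are not modelled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ComparisonMode {
    Equals,
    NotEquals,
    Present,
    Absent,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Conflict,
    Missing,
    /// An observation with no matching claim; produced by the caller
    /// when pairing observations with claims, not by `verify`.
    Unclaimed,
}

/// Compares one authored claim against the observation found for it, if any.
pub fn verify(mode: ComparisonMode, expected: &str, observed: Option<&str>) -> Verdict {
    match (mode, observed) {
        (ComparisonMode::Present, Some(_)) => Verdict::Pass,
        (ComparisonMode::Absent, None) => Verdict::Pass,
        (ComparisonMode::Absent, Some(_)) => Verdict::Conflict,
        (_, None) => Verdict::Missing,
        (ComparisonMode::Equals, Some(v)) => {
            if v == expected { Verdict::Pass } else { Verdict::Conflict }
        }
        (ComparisonMode::NotEquals, Some(v)) => {
            if v != expected { Verdict::Pass } else { Verdict::Conflict }
        }
    }
}

/// Maps verdicts to the documented exit codes: 0=pass, 1=missing, 2=conflicts.
/// Assumption: Unclaimed observations do not fail the run.
pub fn exit_code(verdicts: &[Verdict]) -> i32 {
    if verdicts.contains(&Verdict::Conflict) {
        2
    } else if verdicts.contains(&Verdict::Missing) {
        1
    } else {
        0
    }
}
```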
Aphoria A4: Make the Corpus First-Class ✅
Goal: RFC/OWASP knowledge lives in Episteme as real assertions, not hardcoded data.
- A4.1 Import RFC Corpus: Tier 0/1 assertions with section references, source hash = content hash, `create_authoritative_assertion_with_metadata()` helper
- A4.2 Import OWASP Corpus: OWASP → Tier 0/1 assertions with CWE references as metadata
- A4.3 Lens-Based Conflict Resolution: `AphoriaAuthorityLens` implementing `stemedb_lens::Lens`, `TierBreakdown` in conflict results
- A4.4 Trust Packs as Claim Bundles: `aphoria corpus export-pack`, `trust-pack list/install`, `export_claims_as_policy()` bridges claims → Trust Packs