jml 3b5f88b4f0 feat(aphoria): implement claims architecture (A1-A5) with verify engine, corpus, coverage, and explain
Complete Aphoria claims system overhaul:
- A1: Rename ExtractedClaim to Observation (extractors produce observations, not claims)
- A2: Add AuthoredClaim with full provenance, invariants, and authority tiers
- A3: Verify engine comparing observations against authored claims, CLI + formatters
- A4: Corpus as first-class assertions with predicate indexing, authority lens, trust packs
- A5: Coverage analysis, explain/docs generation, self-audit extractor, claim suggester skill

Also includes: 42 extractors updated for Observation type, verifiable_predicates trait,
conflict detection with comparison modes, claims TOML persistence, Grafana dashboard,
backup/restore scripts, and comprehensive test coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 09:11:47 +00:00


Episteme (StemeDB) Roadmap Archive

Purpose: Historical record of completed phases. For current work, see roadmap.md. Last Updated: 2026-02-08


Completed Phases Summary

| Phase | Codename | Status | Completion |
|---|---|---|---|
| 1 | The Spine | Complete | Storage & Safety: WAL + KV Store |
| 2 | The Lattice | Complete | Indexing & Async: MVs + Ballot Box |
| 2.5 | Hardening | Complete | MV staleness, epoch behavior, lens cleanup |
| 3 | The Pilot | Complete | Vertical Integration: Pharma Ingestion |
| 4 | The Hive | Complete | Trust & Learning: TrustRank, metadata indexing |
| 5 | The Forge | Complete | Foundation Hardening: redb/fjall, WAL, indices |
| 6 | The Mesh | Complete | Distributed Writes: CRDT, Raft, clustering |
| 7 | The Shield | Complete | Trust at Scale: EigenTrust, PoW, quarantine |
| 8A | Chaos | Complete | Partition testing, Jepsen-style verification |
| MVP | Consumer Health | Complete | Real FDA data → conflicts detected → demo |
| Pilot 1-3 | Pilot Prep (Partial) | Complete | Dashboard, demo data, impact analysis, load testing |
| Pilot 4 | Production Hardening | Complete | API auth, backup/restore, Prometheus metrics |
| Aphoria A1 | Observations vs Claims | Complete | Type system: Observation + AuthoredClaim, bridge tiers |
| Aphoria A2 | Authoring Workflow | Complete | claims create/list/explain/update/supersede/deprecate |
| Aphoria A3 | Verification Engine | Complete | verify.rs, verify run/map, pre-commit hook, self-audit |
| Aphoria A4 | Corpus as Assertions | Complete | RFC/OWASP assertions, authority lens, trust packs |

Phase 1: The Spine (Foundation)

Goal: Securely ingest assertions and persist them without data loss.

  • Project Scaffold: Initialize Rust workspace, set up linting/CI (clippy, fmt).
  • Assertion Schema: Define the Assertion struct with rkyv serialization.
    • Add dependencies: rkyv, blake3, ed25519-dalek, image_hasher.
    • Define Assertion struct (Subject, Predicate, Object, Confidence, SourceHash).
    • Multi-Sig Expansion: Implement SignatureEntry struct and signatures: Vec<SignatureEntry> field.
    • Visual Expansion: Add visual_hash: Option<pHash> field for image provenance.
    • Test serialization round-trips.
  • Ballot Schema: Define the Vote struct for multi-agent consensus.
    • Add Vote struct: assertion_hash, agent_id, weight, signature.
    • Test serialization round-trips.
  • Paradigm Schema (Epochs): Define the Epoch and SupersessionType structs.
    • Add epoch: Option<EpochId> to Assertion.
    • Implement Epoch struct with supersedes and SupersessionType.
    • Test serialization round-trips.
  • WAL Integration: Implement the Quarantine Pattern for write-ahead logging.
    • Create stemedb-wal crate.
    • Port FsyncGuard and Record logic from established durability patterns.
    • Implement Record format with BLAKE3 checksums and headers.
    • Verify fsync behavior with tests.
  • Storage Engine: Implement the Store trait using sled (embedded KV).
    • Add sled dependency.
    • Define KVStore trait (put, get, delete, scan_prefix, flush).
    • Implement SledStore wrapper.
  • Basic Ingestor: Background worker that tails WAL and writes to KV.
    • Implement async loop reading from WAL.
    • Write deserialized assertions, votes, and epochs to sled.
    • Ed25519 signature verification during ingestion.
    • Maintains S: and SP: indexes on ingest.
    • Persistent cursor/checkpoint (resumes from __CURSOR__:ingest in KV store).
  • Verification: Crash recovery tests (write -> crash -> restart -> read).
    • Single and multi-record crash recovery.
    • Multiple crash cycles tested.
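
The KVStore operations listed above (put, get, delete, scan_prefix, flush) can be illustrated with a minimal in-memory sketch. The signatures, the `MemStore` type, and the `S:{subject}:{hash}` / `SP:{subject}:{predicate}:{hash}` key shapes are assumptions for illustration, not the actual stemedb-storage API. A sorted map is used because a prefix scan over S: and SP: keys reduces to a range scan.

```rust
use std::collections::BTreeMap;

/// Hypothetical shape of the KVStore trait; the real signatures may differ
/// (error types, ownership, async).
pub trait KVStore {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn delete(&mut self, key: &[u8]);
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
    fn flush(&mut self);
}

/// In-memory stand-in for SledStore: BTreeMap keeps keys sorted.
#[derive(Default)]
pub struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn delete(&mut self, key: &[u8]) {
        self.map.remove(key);
    }
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Start at the first key >= prefix and stop at the first key without it.
        self.map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
    fn flush(&mut self) {
        // Nothing to persist in memory; a disk-backed store would fsync here.
    }
}

/// Hypothetical index key builders for the S: and SP: indexes.
fn s_key(subject: &str, hash: &str) -> Vec<u8> {
    format!("S:{subject}:{hash}").into_bytes()
}

fn sp_key(subject: &str, predicate: &str, hash: &str) -> Vec<u8> {
    format!("SP:{subject}:{predicate}:{hash}").into_bytes()
}
```

Ending each scan prefix with the `:` separator keeps a scan for one subject from also matching a longer subject that shares its leading characters.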

Phase 2: The Lattice (Connectivity)

Goal: Query data with sub-millisecond latency using Materialized Views.

  • Lifecycle Schema: Add LifecycleStage to Assertion.
    • Define enum: Proposed, UnderReview, Approved, Deprecated, Rejected.
    • Update Assertion struct and serialization tests.
  • The Ballot Box: Implement high-velocity vote ingestion.
    • VoteStore trait and implementation.
    • VoteAwareConsensusLens for real vote-based resolution.
  • Index Infrastructure: Compound indexes for O(1) queries.
    • IndexStore trait with S: and SP: indexes.
    • QueryEngine smart routing (SP -> S -> scan).
  • Materializer: Background worker for O(1) Read Performance.
    • MaterializedView type in stemedb-core.
    • Materializer worker in stemedb-query with step() and run().
    • Aggregates votes via VoteAwareConsensusLens (or any AsyncLens).
    • Updates MV:{Subject}:{Predicate} with the winning Assertion + metadata.
    • Event-driven mode via run_notified() with tokio::sync::Notify.
    • Fast-path MV lookup in QueryEngine::try_fast_path().
  • The Meter: Implement Economic Throttling (TAN).
    • QuotaStore trait and GenericQuotaStore implementation.
    • Token Bucket algorithm with per-agent per-hour quotas.
    • MeterLayer tower middleware for request cost tracking.
    • Cost model: Assert=10, Vote=1, Query=5+lens, +1/KB payload.
    • GET /v1/meter/quota endpoint to check remaining quota.
    • POST /v1/meter/quota/limit admin endpoint to set custom limits.
  • API Surface: axum HTTP server with OpenAPI (utoipa).
    • POST /v1/assert -> Accepts JSON, writes to WAL.
    • POST /v1/vote -> High-throughput vote endpoint.
    • POST /v1/epoch -> Create epoch with optional supersession.
    • GET /v1/query -> Subject/Predicate/Lens/Lifecycle/Epoch filtering.
    • GET /v1/health -> Health check with assertion count.
    • GET /swagger-ui -> Interactive API docs.
    • 5 lens types available: Recency, Consensus, Authority, VoteAwareConsensus, TrustAwareAuthority.
  • Query Audit: Log every read with provenance.
    • Define QueryAudit struct: query_id, agent_id, timestamp, params, result_hash, contributing_assertions.
    • Storage at AUD:{query_id} with agent index at AUDA:{agent_id}:{timestamp}:{query_id}.
    • GET /v1/audit/queries -> Returns history of agent decisions.
    • GET /v1/audit/query/{id} -> Full reasoning trace for a single query.
    • Auto-logging on every query via X-Agent-Id header.
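
The Meter's cost model and token bucket can be sketched as follows. The per-request costs (Assert=10, Vote=1, Query=5+lens, +1/KB payload) come from the list above; the `lens_cost` parameter, rounding a partial kilobyte up, and refilling an hourly quota continuously per second are assumptions, and none of these names are the actual QuotaStore or MeterLayer API.

```rust
/// Request kinds from the cost model. `lens_cost` is hypothetical because
/// the roadmap only says "5+lens".
pub enum Request {
    Assert,
    Vote,
    Query { lens_cost: u64 },
}

/// Assert=10, Vote=1, Query=5+lens, plus 1 per KB (assumed: partial KB rounds up).
pub fn request_cost(req: &Request, payload_bytes: u64) -> u64 {
    let base = match req {
        Request::Assert => 10,
        Request::Vote => 1,
        Request::Query { lens_cost } => 5 + *lens_cost,
    };
    base + (payload_bytes + 1023) / 1024
}

/// Per-agent token bucket holding one hour's quota.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    pub fn per_hour(quota_per_hour: u64, now_secs: f64) -> Self {
        let cap = quota_per_hour as f64;
        Self { capacity: cap, tokens: cap, refill_per_sec: cap / 3600.0, last_refill_secs: now_secs }
    }

    /// Refill for the elapsed time, then try to spend `cost` tokens.
    /// Returns false when the agent should be throttled.
    pub fn try_consume(&mut self, cost: u64, now_secs: f64) -> bool {
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill_secs = now_secs;
        if self.tokens >= cost as f64 {
            self.tokens -= cost as f64;
            true
        } else {
            false
        }
    }
}
```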

Phase 2.5: Hardening

Goal: Close the gaps between "built" and "works right."

  • 2.1 MV Staleness Detection: max_stale parameter on queries.
  • 2.2 AuthorityLens -> ConfidenceLens Rename: Eliminated misleading name.
  • 2.3 EpochAwareLens: Epoch supersession runtime behavior with cycle detection.
  • 2.4 Visual Hash Query Support: Hamming distance queries on visual_hash.
  • 2.5 Vector Field: vector: Option<Vec<f32>> stored on assertions.
  • 2.6 E2E Integration Test: Full pipeline validation (Write -> Materialize -> Read).

Phase 3: The Pilot (BioTech/Pharma)

Goal: Prove value in the "High-Liability" beachhead.

3A. Schema Expansion

  • 3A.1 Source-Class Field: 6-tier SourceClass enum (Regulatory → Anecdotal).
  • 3A.2 Conflict Score on Resolution: Normalized variance-based conflict metric.
  • 3A.3 Rich Source Metadata: source_metadata: Option<Vec<u8>> for JSON provenance.
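
One way to read "normalized variance-based conflict metric" is sketched below, under a stated assumption: each competing assertion is treated as a 1/0 indicator of agreement with the resolved winner, weighted by its confidence. The weighted variance of that indicator is p(1 - p), which peaks at 0.25 on an even split, so dividing by 0.25 maps the score into [0, 1]. The function name and input shape are hypothetical.

```rust
/// Each entry is (agrees_with_winner, weight); the weight is typically
/// the assertion's confidence.
pub fn conflict_score(votes: &[(bool, f64)]) -> f64 {
    let total: f64 = votes.iter().map(|&(_, w)| w).sum();
    if total <= 0.0 {
        return 0.0;
    }
    let agree: f64 = votes.iter().filter(|&&(a, _)| a).map(|&(_, w)| w).sum();
    let p = agree / total;
    // Weighted variance of a 0/1 indicator is p(1 - p); its maximum is 0.25.
    p * (1.0 - p) / 0.25
}
```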

3B. Time & Decay

  • 3B.1 Time-Travel Engine: as_of parameter for historical queries.
  • 3B.2 Semantic Decay: Confidence half-life with tier-specific rates.
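
Semantic decay with a confidence half-life can be sketched as effective = confidence × 0.5^(age / half_life). The per-tier half-life values below are illustrative placeholders, since the actual tier-specific rates are not recorded here; tier numbering follows this document's usage of Tier 0 for regulatory sources and Tier 5 for anecdotal ones.

```rust
/// Placeholder half-lives in days per source tier (NOT StemeDB's real rates).
pub fn half_life_days(tier: u8) -> f64 {
    match tier {
        0 => 3650.0, // regulatory: decays slowly
        1 => 1825.0,
        2 => 730.0,
        3 => 365.0,
        4 => 180.0,
        _ => 90.0, // anecdotal: decays fastest
    }
}

/// Exponential half-life decay: confidence halves every `half_life_days(tier)` days.
pub fn effective_confidence(confidence: f64, tier: u8, age_days: f64) -> f64 {
    confidence * 0.5f64.powf(age_days / half_life_days(tier))
}
```

Under these placeholder rates, a 180-day-old Tier 5 claim at confidence 0.9 drops to about 0.225, below a fresh Tier 0 claim at the same starting confidence.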

3C. New Lenses

  • 3C.1 Skeptic Lens: Surface disagreement via Shannon entropy conflict scoring.
  • 3C.2 Layered Consensus Lens: Per-source-class consensus with tier visibility.
  • 3C.3 Constraints Lens: Pre-flight check for must_use/forbidden/prefer.
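
The Skeptic Lens idea of scoring disagreement with Shannon entropy can be sketched as below. It assumes each assertion contributes its weight (for example, its confidence) to the distinct object value it asserts, and that entropy is normalized by log2 of the number of distinct values, so a unanimous answer scores 0.0 and an even split scores 1.0. The function name and weighting scheme are assumptions.

```rust
use std::collections::HashMap;

/// Normalized Shannon entropy of the weight distribution over distinct values.
pub fn entropy_conflict(claims: &[(&str, f64)]) -> f64 {
    let mut weights: HashMap<&str, f64> = HashMap::new();
    for &(value, w) in claims {
        *weights.entry(value).or_insert(0.0) += w;
    }
    let k = weights.len();
    let total: f64 = weights.values().sum();
    if k < 2 || total <= 0.0 {
        return 0.0; // a single value (or no weight) means no disagreement
    }
    let h: f64 = weights
        .values()
        .filter(|&&w| w > 0.0)
        .map(|&w| {
            let p = w / total;
            -p * p.log2()
        })
        .sum();
    h / (k as f64).log2()
}
```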

3D. Epoch Enhancement

  • 3D.1 Epoch Cascade Logic: O(1) supersession lookup via pre-computed markers.
  • 3E.1 Vector Search: HNSW-based semantic k-NN queries.
  • 3E.2 Visual Hash Index: BK-tree for O(log N) visual similarity.
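
A BK-tree over 64-bit perceptual hashes with Hamming distance might look like the following minimal sketch. It assumes `visual_hash` values fit in a `u64` and that exact duplicates are stored once; `BkTree` and `find_within` are hypothetical names. Children are keyed by their distance to the parent, and the triangle inequality lets a query skip any subtree whose edge distance differs from the query distance by more than the search radius.

```rust
/// Hamming distance between two 64-bit perceptual hashes.
pub fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

struct Node {
    hash: u64,
    children: Vec<(u32, Node)>, // (distance to this node, child)
}

impl Node {
    fn leaf(hash: u64) -> Self {
        Node { hash, children: Vec::new() }
    }
}

#[derive(Default)]
pub struct BkTree {
    root: Option<Node>,
}

impl BkTree {
    pub fn insert(&mut self, hash: u64) {
        if self.root.is_none() {
            self.root = Some(Node::leaf(hash));
            return;
        }
        let mut node = self.root.as_mut().unwrap();
        loop {
            let d = hamming(node.hash, hash);
            if d == 0 {
                return; // already present
            }
            let pos = node.children.iter().position(|(edge, _)| *edge == d);
            match pos {
                Some(i) => node = &mut node.children[i].1,
                None => {
                    node.children.push((d, Node::leaf(hash)));
                    return;
                }
            }
        }
    }

    /// All stored hashes within `max_dist` bits of `query`.
    pub fn find_within(&self, query: u64, max_dist: u32) -> Vec<u64> {
        let mut out = Vec::new();
        let mut stack: Vec<&Node> = self.root.iter().collect();
        while let Some(node) = stack.pop() {
            let d = hamming(node.hash, query);
            if d <= max_dist {
                out.push(node.hash);
            }
            // Only edges in [d - max_dist, d + max_dist] can lead to matches.
            for (edge, child) in &node.children {
                if *edge + max_dist >= d && *edge <= d + max_dist {
                    stack.push(child);
                }
            }
        }
        out
    }
}
```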

3F. Provenance

  • 3F.1 Source Document Storage: Content-addressed source storage with GET /v1/provenance/{hash}.

3G. API Cleanup

  • 3G.1 Document epoch supersession: Updated docs for POST /v1/epoch with supersedes field.

Phase 4: The Hive (Trust & Scale)

Goal: Change tracking, metadata indexing, and training pipeline primitives.

  • TrustRank Engine: Per-agent reputation with decay and learning loop.
  • 4.1 "Since" Parameter: MV changelog at MVC: keys with changes_since in responses.
  • 4.2 Source Metadata Indexing: Indexed fields (journal, doi, platform, study_design) at SMV:.
  • 4.3 Batch TrustRank Decay API: POST /v1/admin/decay-trust-ranks.
  • 4.4 Vote Provenance Witness: source_url and observed_context on votes.
  • 4.5 Conflict Score Filtering: min_conflict_score/max_conflict_score on queries.
  • 4.6 Escalation Triggers: EscalationPolicy fires events on high-conflict assertions.
  • 4.7 Gold Standard Verification: Admin-verified assertions for agent testing.

Phase 5: The Forge (Foundation Hardening)

Goal: Replace abandoned dependencies, fix WAL gaps, persist indices.

5A. Storage Engine Replacement

  • 5A.1 Replace sled with redb + fjall: HybridStore with prefix-based routing.
  • 5A.2 Key Layout Redesign: Subject-prefix keys for range sharding readiness.

5B. WAL Hardening

  • 5B.1 CRC32C Checksums: Hardware-accelerated torn write detection.
  • 5B.2 Crash Recovery Implementation: Sequential scan with truncation.
  • 5B.3 Group Commit: Batch fsync for throughput.
  • 5B.4 Log Rotation: Segment management with safe deletion.
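
Checksummed records and truncating recovery can be illustrated together. This sketch assumes a hypothetical frame layout of `[len: u32 LE][crc32c: u32 LE][payload]` and uses a bitwise software CRC32C (Castagnoli polynomial, reflected form 0x82F63B78), whereas the WAL itself uses hardware acceleration. Recovery scans sequentially and stops at the first short or corrupt frame; the returned offset is where the log would be truncated.

```rust
/// Bitwise (table-free) CRC32C using the reflected Castagnoli polynomial.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Hypothetical frame: [len: u32 LE][crc32c(payload): u32 LE][payload].
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(8 + payload.len());
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(&crc32c(payload).to_le_bytes());
    buf.extend_from_slice(payload);
    buf
}

fn read_u32(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Sequential scan: returns every intact payload plus the offset at which
/// the log should be truncated (the end of the last good frame).
pub fn recover(log: &[u8]) -> (Vec<Vec<u8>>, usize) {
    let mut records = Vec::new();
    let mut pos = 0;
    while pos + 8 <= log.len() {
        let len = read_u32(log, pos) as usize;
        let crc = read_u32(log, pos + 4);
        let end = pos + 8 + len;
        if end > log.len() {
            break; // torn write: header survived, payload did not
        }
        let payload = &log[pos + 8..end];
        if crc32c(payload) != crc {
            break; // corrupt payload: checksum mismatch
        }
        records.push(payload.to_vec());
        pos = end;
    }
    (records, pos)
}
```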

5C. Index Persistence

  • 5C.1 Persistent Vector Index: Hot/cold HNSW with checkpoint files.
  • 5C.2 Persistent Visual Index: BK-tree snapshots with CRC32C verification.

5D. Concept Hierarchy

  • 5D.1 ConceptPath Type: Scheme-qualified subject identifiers.
  • 5D.2 Source Scheme Registry: Scheme → default source tier mapping.
  • 5D.3 Alias Store: Cross-scheme entity resolution with cycle detection.
  • 5D.4 Hierarchical Query: Prefix-based subject queries.
  • 5D.5 Alias Resolution in Queries: GET /v1/concepts/resolve?path=....
  • 5D.6 Source Class Inference: Tier inference from scheme.
  • 5D.7 Concept API Endpoints: Full CRUD for aliases and hierarchy.
  • 5D.8 Battery Tests: 15 tests across Battery 8 and 9.
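
Alias resolution with cycle detection can be sketched as pointer chasing with a visited set. Plain strings stand in for ConceptPath, scheme prefixes such as `brand:` and `inn:` are hypothetical, and `resolve_alias` is not the actual Alias Store API.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    /// The path at which a previously visited alias was reached again.
    Cycle(String),
}

/// Follow alias links until reaching a path with no outgoing alias.
pub fn resolve_alias(aliases: &HashMap<String, String>, path: &str) -> Result<String, ResolveError> {
    let mut seen = HashSet::new();
    let mut current = path.to_string();
    while let Some(next) = aliases.get(&current) {
        // insert() returns false if this path was already visited: a cycle.
        if !seen.insert(current.clone()) {
            return Err(ResolveError::Cycle(current));
        }
        current = next.clone();
    }
    Ok(current)
}
```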

Phase 6: The Mesh (Distributed Writes)

Goal: Multi-node cluster with CRDT replication and Raft coordination.

6A. CRDT Foundation

  • 6A.1 Integrate CRDT Crate: G-Set for assertions, G-Counter for votes.
  • 6A.2 Hybrid Logical Clocks: HLC timestamps for causal ordering.
  • 6A.3 Merkle Tree Over Assertions: BLAKE3-based diff detection.
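
The merge rules behind a G-Set for assertions and a G-Counter for votes fit in a few lines. This is an illustrative sketch, not the CRDT crate StemeDB integrated: assertion hashes are plain strings and node IDs are hypothetical. Both merges are commutative, associative, and idempotent, which is what lets replicas converge regardless of delivery order.

```rust
use std::collections::{BTreeSet, HashMap};

/// Grow-only set of assertion hashes: merge is set union.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GSet {
    items: BTreeSet<String>,
}

impl GSet {
    pub fn add(&mut self, hash: &str) {
        self.items.insert(hash.to_string());
    }
    pub fn contains(&self, hash: &str) -> bool {
        self.items.contains(hash)
    }
    pub fn merge(&mut self, other: &GSet) {
        self.items.extend(other.items.iter().cloned());
    }
}

/// Grow-only counter: one monotonically increasing slot per node.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GCounter {
    slots: HashMap<String, u64>,
}

impl GCounter {
    /// A node only ever increments its own slot.
    pub fn increment(&mut self, node: &str, by: u64) {
        *self.slots.entry(node.to_string()).or_insert(0) += by;
    }
    /// Per-node maximum, so re-delivered state is harmless.
    pub fn merge(&mut self, other: &GCounter) {
        for (node, &n) in &other.slots {
            let slot = self.slots.entry(node.clone()).or_insert(0);
            *slot = (*slot).max(n);
        }
    }
    /// Total = sum over all node slots.
    pub fn value(&self) -> u64 {
        self.slots.values().sum()
    }
}
```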

6B. Two-Node Replication (PoC)

  • 6B.1 RPC Layer: tonic gRPC with SyncClient and SyncServiceHandler.
  • 6B.2 Gossip Broadcast: Configurable fanout with rate limiting.
  • 6B.3 Merkle Anti-Entropy Sync: Background convergence worker.
  • 6B.4 Integration Test: 8 tests validating replication primitives.

6C. Multi-Node Cluster

  • 6C.1 Cluster Membership (SWIM Gossip): Node discovery and failure detection.
  • 6C.2 Subject-Prefix Range Sharding: BLAKE3 + jump hash routing.
  • 6C.4 Gateway: Stateless request routing with health and status endpoints.
  • 6C.5 Integration Tests: 82 tests covering membership, sharding, gateway.
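
Subject-prefix routing with jump consistent hashing (the Lamping and Veach algorithm) is sketched below. BLAKE3 is replaced with 64-bit FNV-1a purely to keep the example dependency-free, and `shard_for` is a hypothetical name. When the shard count grows from n to n + 1, a key either keeps its shard or moves to the new shard n, which keeps rebalancing cheap.

```rust
/// Jump consistent hash: maps a 64-bit key to a bucket in [0, num_buckets).
pub fn jump_consistent_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "need at least one shard");
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while j < i64::from(num_buckets) {
        b = j;
        key = key.wrapping_mul(2_862_933_555_777_941_757).wrapping_add(1);
        j = ((b + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as u32
}

/// Stand-in for BLAKE3: 64-bit FNV-1a (deterministic, std-only).
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0100_0000_01b3);
    }
    h
}

/// Route a subject prefix to a shard.
pub fn shard_for(subject_prefix: &str, shards: u32) -> u32 {
    jump_consistent_hash(fnv1a64(subject_prefix.as_bytes()), shards)
}
```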

Consistency Guarantees

| Property | Guarantee | Mechanism |
|---|---|---|
| Convergence | Eventually consistent | G-Set merge (CRDT) |
| Causality | Supersessions ordered | HLC timestamps |
| Partition Tolerance | Writes never blocked | Any node accepts via CRDT |
| Availability | Reads/writes always succeed | Every node is master for CRDTs |
| Durability | WAL + fsync per node | Existing WAL infra |
| Conflict Resolution | Deterministic | Lens algorithms |
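
A minimal hybrid logical clock following the standard HLC update rules (the wall component is the max of local, remote, and physical time; a logical counter breaks ties) shows how supersessions stay causally ordered even when a node's physical clock lags. Field names and the API are assumptions, and physical time is passed in explicitly as a `u64` to keep the sketch deterministic.

```rust
/// HLC timestamp: derived Ord compares wall time first, then the logical counter.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hlc {
    pub wall: u64,
    pub logical: u32,
}

pub struct HlcClock {
    last: Hlc,
}

impl HlcClock {
    pub fn new() -> Self {
        Self { last: Hlc { wall: 0, logical: 0 } }
    }

    /// Timestamp a local event (e.g. a write) given the node's physical time.
    pub fn now(&mut self, physical: u64) -> Hlc {
        let wall = self.last.wall.max(physical);
        let logical = if wall == self.last.wall { self.last.logical + 1 } else { 0 };
        self.last = Hlc { wall, logical };
        self.last
    }

    /// Advance the clock on receiving a remote timestamp.
    pub fn observe(&mut self, remote: Hlc, physical: u64) -> Hlc {
        let prev = self.last;
        let wall = prev.wall.max(remote.wall).max(physical);
        let logical = if wall == prev.wall && wall == remote.wall {
            prev.logical.max(remote.logical) + 1
        } else if wall == prev.wall {
            prev.logical + 1
        } else if wall == remote.wall {
            remote.logical + 1
        } else {
            0 // physical time moved past both: counter resets
        };
        self.last = Hlc { wall, logical };
        self.last
    }
}
```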

Phase 7: The Shield (Trust at Scale)

Goal: Defend against spam, Sybil attacks, and knowledge poisoning.

7A. Admission Control

  • 7A.1 Proof-of-Work Admission: BLAKE3 hashcash with graduated difficulty.
  • 7A.2 Graduated Trust Tiers: 5 tiers (Untrusted → Authority) with quota multipliers.

7B. EigenTrust

  • 7B.1 Trust Graph Store: Direct trust relationships at TG: keys.
  • 7B.2 EigenTrust Computation: Power iteration with Sybil resistance.
  • 7B.3 Domain-Specific Trust: Per-predicate-namespace reputation.
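
EigenTrust's power iteration can be sketched as t ← (1 − α)·Cᵀt + α·p, where C is the row-normalized direct-trust matrix (the TG: relationships) and p is a pre-trusted distribution. The dense matrix layout, α, the iteration count, and letting peers with no outgoing trust defer to p are assumptions. Because trust mass only flows along trust edges and is re-seeded from p, a clique of Sybils that only vouch for each other gains nothing unless an honest or pre-trusted peer vouches for it.

```rust
/// `local[i][j]`: non-negative direct trust peer i places in peer j.
/// `pretrusted`: prior distribution p (sums to 1). `alpha`: weight on p.
pub fn eigentrust(local: &[Vec<f64>], pretrusted: &[f64], alpha: f64, iters: usize) -> Vec<f64> {
    let n = local.len();
    // Row-normalize local trust into C; a peer that trusts no one defers to p.
    let c: Vec<Vec<f64>> = local
        .iter()
        .map(|row| {
            let sum: f64 = row.iter().sum();
            if sum > 0.0 {
                row.iter().map(|v| v / sum).collect()
            } else {
                pretrusted.to_vec()
            }
        })
        .collect();
    let mut t = pretrusted.to_vec();
    for _ in 0..iters {
        let mut next = vec![0.0; n];
        for i in 0..n {
            for j in 0..n {
                // (Cᵀ t)_j = Σ_i c_ij · t_i
                next[j] += c[i][j] * t[i];
            }
        }
        for j in 0..n {
            next[j] = (1.0 - alpha) * next[j] + alpha * pretrusted[j];
        }
        t = next;
    }
    t
}
```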

7C. Content Defense

  • 7C.1 MinHash Deduplication: LSH bucketing with 0.9 Jaccard threshold.
  • 7C.2 Content Quality Scoring: Entropy, length, structure heuristics.
  • 7C.3 Quarantine Store: Time-ordered suspicious assertions with admin review.
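
MinHash deduplication with LSH bucketing can be sketched with the standard library alone. The 0.9 Jaccard threshold comes from the roadmap; word 3-shingles, 128 seeded `DefaultHasher` functions, and 8-row bands are assumptions chosen for illustration. Assertions whose band keys collide become candidate pairs, and only those are compared against the threshold.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

fn hash_with_seed(seed: u64, item: &str) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    item.hash(&mut h);
    h.finish()
}

/// Word 3-shingles of a text.
pub fn shingles(text: &str) -> HashSet<String> {
    let words: Vec<&str> = text.split_whitespace().collect();
    words.windows(3).map(|w| w.join(" ")).collect()
}

/// MinHash signature: for each of `k` seeded hash functions, keep the minimum hash.
pub fn signature(shingles: &HashSet<String>, k: u64) -> Vec<u64> {
    (0..k)
        .map(|seed| shingles.iter().map(|s| hash_with_seed(seed, s)).min().unwrap_or(u64::MAX))
        .collect()
}

/// Estimated Jaccard similarity = fraction of matching signature slots.
pub fn estimate_jaccard(a: &[u64], b: &[u64]) -> f64 {
    let same = a.iter().zip(b).filter(|(x, y)| x == y).count();
    same as f64 / a.len() as f64
}

/// Near-duplicate test against the 0.9 Jaccard threshold.
pub fn is_near_duplicate(a: &[u64], b: &[u64]) -> bool {
    estimate_jaccard(a, b) >= 0.9
}

/// LSH banding: one bucket key per band of `rows_per_band` signature slots.
pub fn band_keys(sig: &[u64], rows_per_band: usize) -> Vec<u64> {
    sig.chunks(rows_per_band)
        .enumerate()
        .map(|(band, rows)| {
            let mut h = DefaultHasher::new();
            band.hash(&mut h);
            rows.hash(&mut h);
            h.finish()
        })
        .collect()
}
```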

7D. Circuit Breakers

  • 7D.1 Per-Agent Circuit Breakers: Closed → Open → HalfOpen state machine.
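
The Closed → Open → HalfOpen cycle of a per-agent breaker can be sketched as a small state machine. The failure threshold, cooldown, integer-second clock, and the rule that any failure while HalfOpen reopens the breaker are assumptions; HalfOpen here simply admits traffic until the next recorded outcome decides the state.

```rust
/// Breaker states; `since` records when the breaker opened.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum State {
    Closed,
    Open { since: u64 },
    HalfOpen,
}

pub struct CircuitBreaker {
    state: State,
    consecutive_failures: u32,
    failure_threshold: u32,
    cooldown_secs: u64,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown_secs: u64) -> Self {
        Self { state: State::Closed, consecutive_failures: 0, failure_threshold, cooldown_secs }
    }

    pub fn state(&self) -> State {
        self.state
    }

    /// Admission check for a request from this agent at time `now` (seconds).
    pub fn allow(&mut self, now: u64) -> bool {
        match self.state {
            State::Closed | State::HalfOpen => true,
            State::Open { since } if now >= since + self.cooldown_secs => {
                // Cooldown elapsed: let requests probe whether the agent behaves.
                self.state = State::HalfOpen;
                true
            }
            State::Open { .. } => false,
        }
    }

    pub fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = State::Closed;
    }

    pub fn record_failure(&mut self, now: u64) {
        self.consecutive_failures += 1;
        if self.state == State::HalfOpen || self.consecutive_failures >= self.failure_threshold {
            self.state = State::Open { since: now };
        }
    }
}
```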

Phase 8A: Chaos Testing

  • 8A.1 Partition Testing: 5-node cluster, network partitions, cascading failures.
  • 8A.2 Jepsen-Style Consistency Testing: CRDT properties, clock skew, concurrent writes.

Consumer Health MVP

"Can Episteme demonstrate value that's impossible with Postgres?"

Definition of Done (All Complete)

| Checkpoint | Description |
|---|---|
| Real Data Flows | FDA drug labels for 3+ GLP-1 drugs ingested as signed assertions |
| Conflicts Detected | SkepticLens shows conflict_score > 0.5 when sources disagree |
| Source Hierarchy Works | Tier 0 (FDA) outweighs 100x Tier 5 (anecdotal) volume |
| Time Travel Works | as_of=2024-01-01 returns historical snapshot |
| Decay Works | 6-month-old Reddit claim has lower effective confidence than fresh FDA |
| UAT Passes | Consumer Health scenarios documented and verified |
| Self-Serve Demo | CLI tool lets anyone explore without code |
| Documentation | "Adding a Domain" guide enables new verticals |

MVP Workstream (Weeks 1-6)

  • Week 1: Domain definitions, SubjectBuilder, pharma schema
  • Week 2: FDA extractor, claim-to-assertion signing
  • Week 3: Ingest FDA claims, mock conflicts, SkepticLens demo
  • Week 4: UAT scenarios documented and verified
  • Week 5: steme-pharma CLI for self-serve exploration
  • Week 6: Polish, reusable patterns, documentation

Enterprise Pilot Preparation (Partial)

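Completed: Pilot-1, Pilot-2, Pilot-3, P4.1. Remaining when this section was archived: P4.2-P4.4 and P5.1-P5.4 (tracked in roadmap.md). P4.2-P4.4 have since been completed and are recorded under Pilot-4: Production Hardening below.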

Pilot-1: Demo Dashboard (Complete)

Deliverable: React admin dashboard that makes the API visual

  • P1.1 Dashboard Scaffold: Next.js + shadcn/ui project setup (applications/stemedb-dashboard/)
  • P1.2 Skeptic Query Visualization: Contradictions with conflict scores, tier badges, expandable claims
  • P1.3 Layered Consensus View: Per-tier breakdown with cross-tier conflict visualization
  • P1.4 Quarantine Admin Panel: Pending queue, approve/reject, filter by reason, metrics
  • P1.5 Circuit Breaker Status: Blocked agents, state badges (OPEN/HALF_OPEN/CLOSED), manual reset
  • P1.6 Audit Trail Browser: Recent queries, drilldown, filter by agent/time, export JSON/CSV

Pilot-2: Demo Data Seeder (Complete)

Deliverable: Pre-signed realistic demo data using Go SDK

  • P2.1 Demo Keypair Management: 5 demo agents (FDA, PubMed, ClinicalTrials, Reddit, Internal) with deterministic keys
  • P2.2 Conflict Scenarios: 3 drugs (semaglutide, tirzepatide, liraglutide), 150+ assertions, real FDA content
  • P2.3 Retractable Sources: CARDIOVASC_MEGA_TRIAL with 110 cascade assertions across 5 agents
  • P2.4 Historical Data: Lifecycle evolution (Proposed → Approved → Deprecated), 17 historical assertions

Pilot-3: Impact Analysis (Complete)

Deliverable: Automatic cascade when source is retracted

  • P3.1 Impact Analysis Endpoint: GET /v1/sources/{hash}/impact, quarantine with preview, restore, 17 tests
  • P3.2 Cascade Flagging: Query-time source status enrichment, exclude_quarantined_sources filter, CSV/JSON export
  • P3.3 Impact Dashboard Widget: Sources page, quarantine dialog with impact preview, impact ripple animation

Pilot-4: Production Hardening (Partial)

  • P4.1 Load Testing: Go-based load tester, 10K assertions, 1K writes/sec, 100 concurrent readers, markdown reports

5 Amazement Moments (Status at Archive)

| # | Moment | Status |
|---|---|---|
| 1 | Contradictions visible with confidence scores | Complete |
| 2 | Cascade invalidation when source retracted | Complete |
| 3 | Full FDA-ready audit trail | Complete |
| 4 | Point-in-time queries + decay | API ready (no timeline UI) |
| 5 | Malicious agent blocked by circuit breaker | Complete |

Key Architectural Decisions (Historical)

  • sled → redb/fjall: sled abandoned. HybridStore routes by key prefix.
  • Raft log = WAL: Eliminated duplicate WAL following TiKV v5.4 pattern.
  • CRDT for data, Raft for coordination: Assertions are G-Set CRDT.
  • Subject-prefix ranges: Co-locate all data for a subject on one shard.
  • HLC over TrueTime: Works on commodity hardware.
  • AP model: Writes never blocked during partitions.

Research Documents


Crates (as of archive date)

| Crate | Purpose |
|---|---|
| stemedb-core | Assertion, LifecycleStage, MaterializedView, types, signing utilities |
| stemedb-wal | Write-ahead log with crash recovery |
| stemedb-storage | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore, SimilarityIndex |
| stemedb-ingest | Ingestion pipeline, signature verification, ContentDefenseLayer |
| stemedb-query | Query engine, Materializer for O(1) MV reads |
| stemedb-lens | Lenses (Recency, Consensus, Authority, Skeptic, Layered, etc.) |
| stemedb-api | HTTP API with axum + utoipa OpenAPI docs |
| stemedb-sim | Simulation for testing the pipeline |
| stemedb-merkle | BLAKE3 Merkle tree for diff detection |
| stemedb-rpc | gRPC services for node-to-node communication |
| stemedb-sync | Merkle sync, gossip broadcast, anti-entropy |
| stemedb-cluster | Cluster membership (SWIM), sharding, gateway |
| stemedb-ontology | Domain definitions (Pharma), subject builders, medical extractors |
| stemedb-chaos | Chaos testing infrastructure |

Pilot-4: Production Hardening (Remaining Items, Complete)

  • P4.2 API Authentication: API key middleware (X-API-Key), BLAKE3-hashed keys, 3 roles (admin/write/read), 5 CRUD endpoints, bootstrap via STEMEDB_ROOT_API_KEY
  • P4.3 Backup/Restore: scripts/backup-stemedb.sh + scripts/restore-stemedb.sh, WAL magic verify, rename-not-delete safety
  • P4.4 Prometheus Metrics: /metrics endpoint, assertions_total, queries_total, query_latency_seconds, quarantine_pending, Grafana dashboard template

Aphoria A1: Distinguish Observations from Claims

Goal: Type system reflects the real difference. No more pretending grep results are claims.

  • A1.1 Rename ExtractedClaim to Observation: Updated across all 42 extractors, bridge, scanner, CLI
  • A1.2 Create Claim Type: AuthoredClaim in types/authored_claim.rs with provenance/invariant/consequence/authority/evidence/status/supersedes. ClaimStore trait + TomlClaimStore. ClaimsFile TOML persistence in .aphoria/claims.toml
  • A1.3 Update Bridge Tier Mapping: Observations → Tier 4 (Community), authored claims get tier from authority_tier field via authored_claim_to_assertion()
  • A1.4 Claim File Format: .aphoria/claims.toml with [[claim]] TOML arrays, human-readable, version-controllable

Aphoria A2: Build the Authoring Workflow

Goal: The skill — not the scanner — is the primary interface for creating claims.

  • A2.1 Claim Authoring Command: aphoria claims create with all fields, authority tier validation
  • A2.2 Claim Listing: aphoria claims list with --category, --status, --format json
  • A2.3 Claims Explained Generator: aphoria claims explain groups by category with provenance/invariant/consequence
  • A2.4 Enhance Aphoria Skill: .claude/skills/aphoria-claims/SKILL.md for diff review, pattern table, authority tier guide
  • A2.5 Claim Lifecycle: update, supersede (with parent pointer), deprecate (with reason)

Aphoria A3: Pair Extractors with Claims

Goal: Extractors verify claims, not generate them. The audit finds real conflicts.

  • A3.1 Verification Engine: ComparisonMode (Equals/NotEquals/Present/Absent), verify.rs with tail-path matching, 4 verdicts (Pass/Conflict/Missing/Unclaimed), 11 unit tests
  • A3.2 Verify Command: aphoria verify run|map, --exit-code (0=pass, 1=missing, 2=conflicts, 3=error), --claim and --category filters
  • A3.3 Verify Report Formatters: verify_table.rs + verify_json.rs
  • A3.4 Pre-Commit Hook: aphoria verify run --changed-only --exit-code using walk_staged_files()
  • A3.5 Self-Audit Extractors: self_audit.rs (unwrap count, bridge tier, parent_hash, lifecycle), opt-in, 5+3 tests
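
The verification flow can be sketched as follows: match each authored claim to an observation by tail path, apply its ComparisonMode, and fold the verdicts into an exit code. The struct shapes, the dotted-path matching rule, and giving conflicts precedence over missing claims when choosing between exit codes 2 and 1 are assumptions; exit code 3 (error) is not modeled.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ComparisonMode { Equals, NotEquals, Present, Absent }

#[derive(Debug, PartialEq)]
pub enum Verdict { Pass, Conflict, Missing, Unclaimed }

/// Simplified stand-ins for AuthoredClaim and Observation.
pub struct Claim<'a> {
    pub path: &'a str,
    pub mode: ComparisonMode,
    pub expected: &'a str,
}

pub struct Observation<'a> {
    pub path: &'a str,
    pub value: &'a str,
}

/// Tail-path match on dotted segments: "server.tls.min_version" matches "tls.min_version".
fn tail_matches(observed: &str, claimed: &str) -> bool {
    observed == claimed || observed.ends_with(&format!(".{claimed}"))
}

pub fn verify_claim(claim: &Claim, observations: &[Observation]) -> Verdict {
    let hit = observations.iter().find(|o| tail_matches(o.path, claim.path));
    match (claim.mode, hit) {
        (ComparisonMode::Absent, None) => Verdict::Pass,
        (ComparisonMode::Absent, Some(_)) => Verdict::Conflict,
        (_, None) => Verdict::Missing,
        (ComparisonMode::Present, Some(_)) => Verdict::Pass,
        (ComparisonMode::Equals, Some(o)) if o.value == claim.expected => Verdict::Pass,
        (ComparisonMode::NotEquals, Some(o)) if o.value != claim.expected => Verdict::Pass,
        _ => Verdict::Conflict,
    }
}

/// Observations that no claim covers are reported as Unclaimed.
pub fn unclaimed<'a>(claims: &[Claim], observations: &'a [Observation<'a>]) -> Vec<(&'a str, Verdict)> {
    observations
        .iter()
        .filter(|o| !claims.iter().any(|c| tail_matches(o.path, c.path)))
        .map(|o| (o.path, Verdict::Unclaimed))
        .collect()
}

/// 2 if any conflict, else 1 if any missing, else 0.
pub fn exit_code(verdicts: &[Verdict]) -> i32 {
    if verdicts.contains(&Verdict::Conflict) {
        2
    } else if verdicts.contains(&Verdict::Missing) {
        1
    } else {
        0
    }
}
```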

Aphoria A4: Make the Corpus First-Class

Goal: RFC/OWASP knowledge lives in Episteme as real assertions, not hardcoded data.

  • A4.1 Import RFC Corpus: Tier 0/1 assertions with section references, source hash = content hash, create_authoritative_assertion_with_metadata() helper
  • A4.2 Import OWASP Corpus: OWASP → Tier 0/1 assertions with CWE references as metadata
  • A4.3 Lens-Based Conflict Resolution: AphoriaAuthorityLens implementing stemedb_lens::Lens, TierBreakdown in conflict results
  • A4.4 Trust Packs as Claim Bundles: aphoria corpus export-pack, trust-pack list/install, export_claims_as_policy() bridges claims → Trust Packs
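
Authority-based resolution with a tier breakdown can be sketched without the stemedb_lens::Lens trait, whose exact signature is not recorded here. The rule used below (the lowest-numbered tier present wins, and summed confidence picks the value within that tier) is an assumption, chosen because it reproduces the Consumer Health check that one Tier 0 assertion outweighs 100 Tier 5 ones. `Candidate`, `TierBreakdown`, and `resolve_by_authority` are illustrative names.

```rust
use std::collections::BTreeMap;

/// A competing assertion: object value, source tier (0 = most authoritative), confidence.
pub struct Candidate<'a> {
    pub value: &'a str,
    pub tier: u8,
    pub confidence: f64,
}

/// Supporting-assertion counts per tier, then per value.
pub type TierBreakdown<'a> = BTreeMap<u8, BTreeMap<&'a str, usize>>;

pub fn resolve_by_authority<'a>(cands: &[Candidate<'a>]) -> Option<(&'a str, TierBreakdown<'a>)> {
    let mut breakdown: TierBreakdown<'a> = BTreeMap::new();
    for c in cands {
        *breakdown.entry(c.tier).or_default().entry(c.value).or_insert(0) += 1;
    }
    // BTreeMap keys are sorted, so the first key is the most authoritative tier present.
    let best_tier = *breakdown.keys().next()?;
    let mut scores: BTreeMap<&'a str, f64> = BTreeMap::new();
    for c in cands.iter().filter(|c| c.tier == best_tier) {
        *scores.entry(c.value).or_insert(0.0) += c.confidence;
    }
    let (winner, _) = scores.into_iter().max_by(|a, b| a.1.total_cmp(&b.1))?;
    Some((winner, breakdown))
}
```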