jml 3b5f88b4f0 feat(aphoria): implement claims architecture (A1-A5) with verify engine, corpus, coverage, and explain

Complete Aphoria claims system overhaul:
- A1: Rename ExtractedClaim to Observation (extractors produce observations, not claims)
- A2: Add AuthoredClaim with full provenance, invariants, and authority tiers
- A3: Verify engine comparing observations against authored claims, CLI + formatters
- A4: Corpus as first-class assertions with predicate indexing, authority lens, trust packs
- A5: Coverage analysis, explain/docs generation, self-audit extractor, claim suggester skill

Also includes: 42 extractors updated for Observation type, verifiable_predicates trait,
conflict detection with comparison modes, claims TOML persistence, Grafana dashboard,
backup/restore scripts, and comprehensive test coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-08 09:11:47 +00:00


Episteme (StemeDB) Roadmap Archive

Purpose: Historical record of completed phases. For current work, see roadmap.md.
Last Updated: 2026-02-08

Completed Phases Summary

Phase	Codename	Status	Completion
1	The Spine	✅ Complete	Storage & Safety — WAL + KV Store
2	The Lattice	✅ Complete	Indexing & Async — MVs + Ballot Box
2.5	Hardening	✅ Complete	MV staleness, epoch behavior, lens cleanup
3	The Pilot	✅ Complete	Vertical Integration — Pharma Ingestion
4	The Hive	✅ Complete	Trust & Learning — TrustRank, metadata indexing
5	The Forge	✅ Complete	Foundation Hardening — redb/fjall, WAL, indices
6	The Mesh	✅ Complete	Distributed Writes — CRDT, Raft, clustering
7	The Shield	✅ Complete	Trust at Scale — EigenTrust, PoW, quarantine
8A	Chaos	✅ Complete	Partition testing, Jepsen-style verification
MVP	Consumer Health	✅ Complete	Real FDA data → conflicts detected → demo
Pilot 1-3	Pilot Prep (Partial)	✅ Complete	Dashboard, demo data, impact analysis, load testing
Pilot 4	Production Hardening	✅ Complete	API auth, backup/restore, Prometheus metrics
Aphoria A1	Observations vs Claims	✅ Complete	Type system: Observation + AuthoredClaim, bridge tiers
Aphoria A2	Authoring Workflow	✅ Complete	claims create/list/explain/update/supersede/deprecate
Aphoria A3	Verification Engine	✅ Complete	verify.rs, verify run/map, pre-commit hook, self-audit
Aphoria A4	Corpus as Assertions	✅ Complete	RFC/OWASP assertions, authority lens, trust packs

Phase 1: The Spine (Foundation) ✅

Goal: Securely ingest assertions and persist them without data loss.

Project Scaffold: Initialize Rust workspace, set up linting/CI (clippy, fmt).
Assertion Schema: Define the Assertion struct with rkyv serialization.
- Add dependencies: rkyv, blake3, ed25519-dalek, image_hasher.
- Define Assertion struct (Subject, Predicate, Object, Confidence, SourceHash).
- Multi-Sig Expansion: Implement SignatureEntry struct and signatures: Vec<SignatureEntry> field.
- Visual Expansion: Add visual_hash: Option<pHash> field for image provenance.
- Test serialization round-trips.
Ballot Schema: Define the Vote struct for multi-agent consensus.
- Add Vote struct: assertion_hash, agent_id, weight, signature.
- Test serialization round-trips.
Paradigm Schema (Epochs): Define the Epoch and SupersessionType structs.
- Add epoch: Option<EpochId> to Assertion.
- Implement Epoch struct with supersedes and SupersessionType.
- Test serialization round-trips.
WAL Integration: Implement the Quarantine Pattern for write-ahead logging.
- Create stemedb-wal crate.
- Port FsyncGuard and Record logic from established durability patterns.
- Implement Record format with BLAKE3 checksums and Headers.
- Verify fsync behavior with tests.
Storage Engine: Implement the Store trait using sled (embedded KV).
- Add sled dependency.
- Define KVStore trait (put, get, delete, scan_prefix, flush).
- Implement SledStore wrapper.
Basic Ingestor: Background worker that tails WAL and writes to KV.
- Implement async loop reading from WAL.
- Write deserialized assertions, votes, and epochs to sled.
- Ed25519 signature verification during ingestion.
- Maintains S: and SP: indexes on ingest.
- Persistent cursor/checkpoint (resumes from __CURSOR__:ingest in KV store).
Verification: Crash recovery tests (write -> crash -> restart -> read).
- Single and multi-record crash recovery.
- Multiple crash cycles tested.
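
The ingest path above can be sketched with an in-memory store. This is a minimal illustration, not the project's code: only the method names (put, get, delete, scan_prefix, flush) and the `__CURSOR__:ingest` key come from the roadmap. The trait signatures, the `MemStore` type, the `A:` record key, and the exact layout of the `S:` and `SP:` keys are assumptions.

```rust
use std::collections::BTreeMap;

/// Method names come from the roadmap; the signatures are assumptions.
trait KVStore {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn delete(&mut self, key: &[u8]);
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
    fn flush(&mut self);
}

/// Hypothetical in-memory stand-in for the sled-backed store.
#[derive(Default)]
struct MemStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KVStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn delete(&mut self, key: &[u8]) {
        self.map.remove(key);
    }
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        // Keys are ordered, so every key sharing the prefix is contiguous.
        self.map
            .range::<Vec<u8>, _>(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
    fn flush(&mut self) {} // nothing to sync for an in-memory map
}

/// Hypothetical ingest step: record, subject index, subject+predicate index, cursor.
fn ingest(
    store: &mut impl KVStore,
    wal_offset: u64,
    subject: &str,
    predicate: &str,
    hash: &str,
    body: &[u8],
) {
    store.put(format!("A:{}", hash).as_bytes(), body); // assumed record key
    store.put(format!("S:{}:{}", subject, hash).as_bytes(), hash.as_bytes());
    store.put(
        format!("SP:{}:{}:{}", subject, predicate, hash).as_bytes(),
        hash.as_bytes(),
    );
    // Checkpoint last: a crash before this write replays the record on restart.
    store.put(b"__CURSOR__:ingest", &wal_offset.to_be_bytes());
}

/// Reads the persisted cursor, if one has been written.
fn cursor(store: &impl KVStore) -> Option<u64> {
    let raw = store.get(b"__CURSOR__:ingest")?;
    if raw.len() != 8 {
        return None;
    }
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&raw);
    Some(u64::from_be_bytes(bytes))
}

fn main() {
    let mut store = MemStore::default();
    ingest(&mut store, 1, "semaglutide", "dose", "h1", b"record-1");
    ingest(&mut store, 2, "semaglutide", "half_life", "h2", b"record-2");
    println!("subject hits: {}", store.scan_prefix(b"S:semaglutide:").len());
    println!("cursor: {:?}", cursor(&store));
}
```

Writing the cursor after the indexes means replay after a crash may rewrite the same keys, which is harmless because every write here is an idempotent put.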

Phase 2: The Lattice (Connectivity) ✅

Goal: Query data with sub-millisecond latency using Materialized Views.

Lifecycle Schema: Add LifecycleStage to Assertion.
- Define enum: Proposed, UnderReview, Approved, Deprecated, Rejected.
- Update Assertion struct and serialization tests.
The Ballot Box: Implement high-velocity vote ingestion.
- VoteStore trait and implementation.
- VoteAwareConsensusLens for real vote-based resolution.
Index Infrastructure: Compound indexes for O(1) queries.
- IndexStore trait with S: and SP: indexes.
- QueryEngine smart routing (SP -> S -> scan).
Materializer: Background worker for O(1) Read Performance.
- MaterializedView type in stemedb-core.
- Materializer worker in stemedb-query with step() and run().
- Aggregates Votes via VoteAwareConsensusLens (or any AsyncLens).
- Updates MV:{Subject}:{Predicate} with the winning Assertion + metadata.
- Event-driven mode via run_notified() with tokio::sync::Notify.
- Fast-path MV lookup in QueryEngine::try_fast_path().
The Meter: Implement Economic Throttling (TAN).
- QuotaStore trait and GenericQuotaStore implementation.
- Token Bucket algorithm with per-agent per-hour quotas.
- MeterLayer tower middleware for request cost tracking.
- Cost model: Assert=10, Vote=1, Query=5+lens, +1/KB payload.
- GET /v1/meter/quota endpoint to check remaining quota.
- POST /v1/meter/quota/limit admin endpoint to set custom limits.
API Surface: axum HTTP server with OpenAPI (utoipa).
- POST /v1/assert -> Accepts JSON, writes to WAL.
- POST /v1/vote -> High-throughput vote endpoint.
- POST /v1/epoch -> Create epoch with optional supersession.
- GET /v1/query -> Subject/Predicate/Lens/Lifecycle/Epoch filtering.
- GET /v1/health -> Health check with assertion count.
- GET /swagger-ui -> Interactive API docs.
- 5 lens types available: Recency, Consensus, Authority, VoteAwareConsensus, TrustAwareAuthority.
Query Audit: Log every read with provenance.
- Define QueryAudit struct: query_id, agent_id, timestamp, params, result_hash, contributing_assertions.
- Storage at AUD:{query_id} with agent index at AUDA:{agent_id}:{timestamp}:{query_id}.
- GET /v1/audit/queries -> Returns history of agent decisions.
- GET /v1/audit/query/{id} -> Full reasoning trace for a single query.
- Auto-logging on every query via X-Agent-Id header.
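
The Meter's cost model and token bucket can be sketched as follows. The base costs (Assert=10, Vote=1, Query=5+lens, +1/KB) are taken from the list above. Everything else is an assumption made for illustration: the `Op` and `TokenBucket` types, rounding a partial kilobyte up, and refilling continuously at quota/3600 tokens per second.

```rust
/// Operations priced by the meter. `lens_cost` is a hypothetical per-lens surcharge.
enum Op {
    Assert,
    Vote,
    Query { lens_cost: u64 },
}

/// Assert=10, Vote=1, Query=5+lens, plus 1 per KB of payload.
/// Assumption: a partial kilobyte is rounded up.
fn request_cost(op: &Op, payload_bytes: u64) -> u64 {
    let base = match op {
        Op::Assert => 10,
        Op::Vote => 1,
        Op::Query { lens_cost } => 5 + *lens_cost,
    };
    base + (payload_bytes + 1023) / 1024
}

/// Per-agent bucket that refills to `capacity` over one hour.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    last_refill_secs: u64,
}

impl TokenBucket {
    fn new(hourly_quota: u64, now_secs: u64) -> Self {
        TokenBucket {
            capacity: hourly_quota as f64,
            tokens: hourly_quota as f64,
            last_refill_secs: now_secs,
        }
    }

    /// Refill for the elapsed time, then spend `cost` if enough tokens remain.
    fn try_spend(&mut self, now_secs: u64, cost: u64) -> bool {
        let elapsed = now_secs.saturating_sub(self.last_refill_secs) as f64;
        let refill = elapsed * self.capacity / 3600.0;
        self.tokens = (self.tokens + refill).min(self.capacity);
        self.last_refill_secs = self.last_refill_secs.max(now_secs);
        if self.tokens >= cost as f64 {
            self.tokens -= cost as f64;
            true
        } else {
            false
        }
    }
}

fn main() {
    let cost = request_cost(&Op::Query { lens_cost: 2 }, 3000);
    let mut bucket = TokenBucket::new(100, 0);
    println!("cost={} allowed={}", cost, bucket.try_spend(0, cost));
}
```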

Phase 2.5: Hardening ✅

Goal: Close the gaps between "built" and "works right."

2.1 MV Staleness Detection: max_stale parameter on queries.
2.2 AuthorityLens -> ConfidenceLens Rename: Eliminated misleading name.
2.3 EpochAwareLens: Epoch supersession runtime behavior with cycle detection.
2.4 Visual Hash Query Support: Hamming distance queries on visual_hash.
2.5 Vector Field: vector: Option<Vec<f32>> stored on assertions.
2.6 E2E Integration Test: Full pipeline validation (Write -> Materialize -> Read).
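
Item 2.3 mentions cycle detection during epoch supersession. One way to do that is to walk the supersession chain while remembering visited epochs, as in this hedged sketch. The `EpochId` alias, the tuple representation of an epoch, and both function names are hypothetical, and the sketch assumes each epoch is superseded by at most one successor.

```rust
use std::collections::{HashMap, HashSet};

type EpochId = u32; // assumed representation

/// Builds an "old -> new" map from (epoch, supersedes) pairs.
fn build_index(epochs: &[(EpochId, Option<EpochId>)]) -> HashMap<EpochId, EpochId> {
    let mut superseded_by = HashMap::new();
    for &(id, supersedes) in epochs {
        if let Some(old) = supersedes {
            superseded_by.insert(old, id);
        }
    }
    superseded_by
}

/// Follows supersession links from `start` to the newest epoch.
/// Returns None when the chain loops back on itself.
fn resolve_current(
    superseded_by: &HashMap<EpochId, EpochId>,
    start: EpochId,
) -> Option<EpochId> {
    let mut seen = HashSet::new();
    let mut current = start;
    while let Some(&next) = superseded_by.get(&current) {
        if !seen.insert(current) {
            return None; // already visited: cycle
        }
        current = next;
    }
    Some(current)
}

fn main() {
    let index = build_index(&[(1, None), (2, Some(1)), (3, Some(2))]);
    println!("{:?}", resolve_current(&index, 1));
}
```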

Phase 3: The Pilot (BioTech/Pharma) ✅

Goal: Prove value in the "High-Liability" beachhead.

3A. Schema Expansion

3A.1 Source-Class Field: 6-tier SourceClass enum (Regulatory → Anecdotal).
3A.2 Conflict Score on Resolution: Normalized variance-based conflict metric.
3A.3 Rich Source Metadata: source_metadata: Option<Vec<u8>> for JSON provenance.

3B. Time & Decay

3B.1 Time-Travel Engine: as_of parameter for historical queries.
3B.2 Semantic Decay: Confidence half-life with tier-specific rates.
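
Half-life decay as described in 3B.2 is commonly written as c_eff = c * 0.5^(age / half_life). The sketch below applies that formula. The per-tier half-lives are invented placeholder values and the function names are hypothetical; the roadmap states only that the rates are tier-specific.

```rust
/// Placeholder half-lives in days per source tier (0 = Regulatory ... 5 = Anecdotal).
/// These numbers are illustrative assumptions, not the project's configuration.
fn half_life_days(tier: u8) -> f64 {
    match tier {
        0 => 3650.0,
        1 => 1825.0,
        2 => 730.0,
        3 => 365.0,
        4 => 180.0,
        _ => 90.0,
    }
}

/// Confidence halves once per elapsed half-life.
fn effective_confidence(confidence: f64, age_days: f64, tier: u8) -> f64 {
    confidence * 0.5f64.powf(age_days / half_life_days(tier))
}

fn main() {
    let fresh_regulatory = effective_confidence(0.9, 0.0, 0);
    let old_anecdotal = effective_confidence(0.9, 180.0, 5);
    println!("regulatory={:.3} anecdotal={:.3}", fresh_regulatory, old_anecdotal);
}
```

With these placeholder rates, a 180-day-old Tier 5 assertion has passed two half-lives and keeps a quarter of its stated confidence, while a fresh Tier 0 assertion is unchanged.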

3C. New Lenses

3C.1 Skeptic Lens: Surface disagreement via Shannon entropy conflict scoring.
3C.2 Layered Consensus Lens: Per-source-class consensus with tier visibility.
3C.3 Constraints Lens: Pre-flight check for must_use/forbidden/prefer.
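
A Shannon-entropy conflict score of the kind named in 3C.1 could look like the sketch below. It is an illustration under stated assumptions: each assertion counts once (no confidence or tier weighting), and the entropy is normalized by log2 of the number of distinct values so the score lands in [0, 1]. The function name is hypothetical.

```rust
use std::collections::HashMap;

/// Normalized Shannon entropy over the distinct object values asserted
/// for one (subject, predicate). 0.0 means full agreement, 1.0 an even split.
fn conflict_score(values: &[&str]) -> f64 {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for v in values {
        *counts.entry(*v).or_insert(0) += 1;
    }
    if counts.len() < 2 {
        return 0.0; // zero or one distinct value: nothing to disagree about
    }
    let total = values.len() as f64;
    let entropy: f64 = counts
        .values()
        .map(|&c| {
            let p = c as f64 / total;
            -p * p.log2()
        })
        .sum();
    entropy / (counts.len() as f64).log2()
}

fn main() {
    println!("{:.3}", conflict_score(&["2.4mg", "2.4mg", "2.4mg", "1.0mg"]));
}
```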

3D. Epoch Enhancement

3D.1 Epoch Cascade Logic: O(1) supersession lookup via pre-computed markers.

3E. Similarity Search

3E.1 Vector Search: HNSW-based semantic k-NN queries.
3E.2 Visual Hash Index: BK-tree for O(log N) visual similarity.
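
To illustrate 3E.2, here is a small BK-tree over 64-bit hashes using Hamming distance. Treating a visual hash as a `u64` is an assumption, and the `Node` and `BkTree` types are hypothetical. The pruning step relies on the triangle inequality: a child stored at edge distance e can only contain matches when e lies within `max_dist` of the query's distance to the current node.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Number of differing bits between two hashes.
fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

struct Node {
    hash: u64,
    children: HashMap<u32, Node>, // keyed by distance to this node
}

impl Node {
    fn new(hash: u64) -> Self {
        Node { hash, children: HashMap::new() }
    }

    fn insert(&mut self, hash: u64) {
        let d = hamming(self.hash, hash);
        if d == 0 {
            return; // already present
        }
        match self.children.entry(d) {
            Entry::Occupied(mut e) => e.get_mut().insert(hash),
            Entry::Vacant(e) => {
                e.insert(Node::new(hash));
            }
        }
    }

    fn search(&self, query: u64, max_dist: u32, out: &mut Vec<u64>) {
        let d = hamming(self.hash, query);
        if d <= max_dist {
            out.push(self.hash);
        }
        // Only subtrees whose edge distance is within max_dist of d can match.
        let lo = d.saturating_sub(max_dist);
        let hi = d + max_dist;
        for (&edge, child) in &self.children {
            if edge >= lo && edge <= hi {
                child.search(query, max_dist, out);
            }
        }
    }
}

#[derive(Default)]
struct BkTree {
    root: Option<Node>,
}

impl BkTree {
    fn insert(&mut self, hash: u64) {
        if self.root.is_none() {
            self.root = Some(Node::new(hash));
            return;
        }
        if let Some(root) = self.root.as_mut() {
            root.insert(hash);
        }
    }

    /// Returns all stored hashes within `max_dist` bits of `query`, sorted.
    fn within(&self, query: u64, max_dist: u32) -> Vec<u64> {
        let mut out = Vec::new();
        if let Some(root) = &self.root {
            root.search(query, max_dist, &mut out);
        }
        out.sort();
        out
    }
}

fn main() {
    let mut tree = BkTree::default();
    for h in [0b0000u64, 0b0001, 0b0011, 0b1111, 0xFF00].iter() {
        tree.insert(*h);
    }
    println!("{:?}", tree.within(0b0111, 1));
}
```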

3F. Provenance

3F.1 Source Document Storage: Content-addressed source storage with GET /v1/provenance/{hash}.

3G. API Cleanup

3G.1 Document epoch supersession: Updated docs for POST /v1/epoch with supersedes field.

Phase 4: The Hive (Trust & Scale) ✅

Goal: Change tracking, metadata indexing, and training pipeline primitives.

TrustRank Engine: Per-agent reputation with decay and learning loop.
4.1 "Since" Parameter: MV changelog at MVC: keys with changes_since in responses.
4.2 Source Metadata Indexing: Indexed fields (journal, doi, platform, study_design) at SMV:.
4.3 Batch TrustRank Decay API: POST /v1/admin/decay-trust-ranks.
4.4 Vote Provenance Witness: source_url and observed_context on votes.
4.5 Conflict Score Filtering: min_conflict_score/max_conflict_score on queries.
4.6 Escalation Triggers: EscalationPolicy fires events on high-conflict assertions.
4.7 Gold Standard Verification: Admin-verified assertions for agent testing.

Phase 5: The Forge (Foundation Hardening) ✅

Goal: Replace abandoned dependencies, fix WAL gaps, persist indices.

5A. Storage Engine Replacement

5A.1 Replace sled with redb + fjall: HybridStore with prefix-based routing.
5A.2 Key Layout Redesign: Subject-prefix keys for range sharding readiness.

5B. WAL Hardening

5B.1 CRC32C Checksums: Hardware-accelerated torn write detection.
5B.2 Crash Recovery Implementation: Sequential scan with truncation.
5B.3 Group Commit: Batch fsync for throughput.
5B.4 Log Rotation: Segment management with safe deletion.
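
Items 5B.1 and 5B.2 combine into a simple recovery loop: scan records in order, stop at the first one that is incomplete or fails its checksum, and truncate the log there. The sketch below shows that idea over an in-memory buffer. The record layout (4-byte length, 4-byte CRC32C, payload) and the function names are assumptions, and the checksum is a plain bitwise CRC32C rather than the hardware-accelerated variant the roadmap refers to.

```rust
/// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 == 1 {
                (crc >> 1) ^ 0x82F6_3B78
            } else {
                crc >> 1
            };
        }
    }
    !crc
}

/// Assumed record layout: [len: u32 LE][crc32c: u32 LE][payload].
fn append(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&crc32c(payload).to_le_bytes());
    log.extend_from_slice(payload);
}

/// Sequentially scans the log, returns the valid records, and truncates
/// the buffer at the first torn or corrupt record.
fn recover(log: &mut Vec<u8>) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    let mut pos = 0usize;
    while pos + 8 <= log.len() {
        let len =
            u32::from_le_bytes([log[pos], log[pos + 1], log[pos + 2], log[pos + 3]]) as usize;
        let stored =
            u32::from_le_bytes([log[pos + 4], log[pos + 5], log[pos + 6], log[pos + 7]]);
        let start = pos + 8;
        let end = match start.checked_add(len) {
            Some(e) if e <= log.len() => e,
            _ => break, // payload extends past end of log: torn write
        };
        if crc32c(&log[start..end]) != stored {
            break; // checksum mismatch: corrupt record
        }
        records.push(log[start..end].to_vec());
        pos = end;
    }
    log.truncate(pos);
    records
}

fn main() {
    let mut log = Vec::new();
    append(&mut log, b"first");
    append(&mut log, b"second");
    let cut = log.len() - 3;
    log.truncate(cut); // simulate a crash mid-write
    let records = recover(&mut log);
    println!("recovered {} record(s), log is {} bytes", records.len(), log.len());
}
```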

5C. Index Persistence

5C.1 Persistent Vector Index: Hot/cold HNSW with checkpoint files.
5C.2 Persistent Visual Index: BK-tree snapshots with CRC32C verification.

5D. Concept Hierarchy

5D.1 ConceptPath Type: Scheme-qualified subject identifiers.
5D.2 Source Scheme Registry: Scheme → default source tier mapping.
5D.3 Alias Store: Cross-scheme entity resolution with cycle detection.
5D.4 Hierarchical Query: Prefix-based subject queries.
5D.5 Alias Resolution in Queries: GET /v1/concepts/resolve?path=....
5D.6 Source Class Inference: Tier inference from scheme.
5D.7 Concept API Endpoints: Full CRUD for aliases and hierarchy.
5D.8 Battery Tests: 15 tests across Battery 8 and 9.

Phase 6: The Mesh (Distributed Writes) ✅

Goal: Multi-node cluster with CRDT replication and Raft coordination.

6A. CRDT Foundation

6A.1 Integrate CRDT Crate: G-Set for assertions, G-Counter for votes.
6A.2 Hybrid Logical Clocks: HLC timestamps for causal ordering.
6A.3 Merkle Tree Over Assertions: BLAKE3-based diff detection.
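
The two CRDTs named in 6A.1 have small merge rules: a G-Set merges by set union and a G-Counter merges by taking the per-node maximum. The following sketch shows both with hypothetical `GSet` and `GCounter` types. It is not the integrated CRDT crate's API, which the roadmap does not name.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Grow-only set of assertion hashes.
#[derive(Clone, Default, PartialEq, Debug)]
struct GSet {
    items: BTreeSet<String>,
}

impl GSet {
    fn add(&mut self, hash: &str) {
        self.items.insert(hash.to_string());
    }
    /// Union: commutative, associative, idempotent.
    fn merge(&mut self, other: &GSet) {
        self.items.extend(other.items.iter().cloned());
    }
}

/// Grow-only counter with one slot per node.
#[derive(Clone, Default, PartialEq, Debug)]
struct GCounter {
    per_node: BTreeMap<String, u64>,
}

impl GCounter {
    /// A node only ever increments its own slot.
    fn increment(&mut self, node: &str, by: u64) {
        *self.per_node.entry(node.to_string()).or_insert(0) += by;
    }
    fn value(&self) -> u64 {
        self.per_node.values().sum()
    }
    /// Per-node maximum, so replaying a merge never double-counts.
    fn merge(&mut self, other: &GCounter) {
        for (node, &count) in &other.per_node {
            let slot = self.per_node.entry(node.clone()).or_insert(0);
            *slot = (*slot).max(count);
        }
    }
}

fn main() {
    let (mut a, mut b) = (GCounter::default(), GCounter::default());
    a.increment("node-a", 3);
    b.increment("node-b", 2);
    a.merge(&b);
    a.merge(&b); // repeated merge is a no-op
    println!("votes = {}", a.value());
}
```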

6B. Two-Node Replication (PoC)

6B.1 RPC Layer: tonic gRPC with SyncClient and SyncServiceHandler.
6B.2 Gossip Broadcast: Configurable fanout with rate limiting.
6B.3 Merkle Anti-Entropy Sync: Background convergence worker.
6B.4 Integration Test: 8 tests validating replication primitives.

6C. Multi-Node Cluster

6C.1 Cluster Membership (SWIM Gossip): Node discovery and failure detection.
6C.2 Subject-Prefix Range Sharding: BLAKE3 + jump hash routing.
6C.4 Gateway: Stateless request routing with health and status endpoints.
6C.5 Integration Tests: 82 tests covering membership, sharding, gateway.
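
The routing in 6C.2 pairs a hash of the subject prefix with jump consistent hashing. The sketch below implements the standard jump hash loop. Because BLAKE3 is not in the standard library, `DefaultHasher` stands in for it here, and `shard_for` is a hypothetical helper; how a subject prefix is actually derived is not specified in this archive.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Jump consistent hash: maps a 64-bit key to a bucket in [0, num_buckets).
/// Growing from n to n+1 buckets only moves keys into the new bucket.
fn jump_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "need at least one bucket");
    let mut bucket: i64 = -1;
    let mut next: i64 = 0;
    while next < num_buckets as i64 {
        bucket = next;
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        let scale = (1u64 << 31) as f64 / ((key >> 33) + 1) as f64;
        next = ((bucket + 1) as f64 * scale) as i64;
    }
    bucket as u32
}

/// Hypothetical helper: hash the subject prefix, then pick a shard.
/// DefaultHasher is a stand-in for the BLAKE3 hash named in the roadmap.
fn shard_for(subject_prefix: &str, shards: u32) -> u32 {
    let mut hasher = DefaultHasher::new();
    subject_prefix.hash(&mut hasher);
    jump_hash(hasher.finish(), shards)
}

fn main() {
    println!("shard = {}", shard_for("drug:semaglutide", 16));
}
```

Jump hashing needs no stored ring or lookup table, so a stateless gateway can compute the owning shard from the key and the shard count alone.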

Consistency Guarantees

Property	Guarantee	Mechanism
Convergence	Eventually consistent	G-Set merge (CRDT)
Causality	Supersessions ordered	HLC timestamps
Partition Tolerance	Writes never blocked	Any node accepts via CRDT
Availability	Reads/writes always succeed	Every node is master for CRDTs
Durability	WAL + fsync per node	Existing WAL infra
Conflict Resolution	Deterministic	Lens algorithms

Phase 7: The Shield (Trust at Scale) ✅

Goal: Defend against spam, Sybil attacks, and knowledge poisoning.

7A. Admission Control

7A.1 Proof-of-Work Admission: BLAKE3 hashcash with graduated difficulty.
7A.2 Graduated Trust Tiers: 5 tiers (Untrusted → Authority) with quota multipliers.

7B. EigenTrust

7B.1 Trust Graph Store: Direct trust relationships at TG: keys.
7B.2 EigenTrust Computation: Power iteration with Sybil resistance.
7B.3 Domain-Specific Trust: Per-predicate-namespace reputation.
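
Power iteration for EigenTrust (7B.2) repeatedly applies t' = (1 - alpha) * C^T * t + alpha * p, where C is the row-normalized local trust matrix and p is the distribution over pre-trusted agents. The sketch below is a dense-matrix illustration of that update. The function name, the dense representation, and the choice to let agents with no outgoing trust defer to p are assumptions.

```rust
/// `local[i][j]` is how much agent i directly trusts agent j.
/// `pretrusted` must sum to 1.0; `alpha` is the weight given to it each round.
fn eigentrust(
    local: &[Vec<f64>],
    pretrusted: &[f64],
    alpha: f64,
    iterations: usize,
) -> Vec<f64> {
    let n = local.len();
    // Row-normalize; an agent that trusts nobody defers to the pre-trusted set.
    let c: Vec<Vec<f64>> = local
        .iter()
        .map(|row| {
            let sum: f64 = row.iter().sum();
            if sum > 0.0 {
                row.iter().map(|v| v / sum).collect::<Vec<f64>>()
            } else {
                pretrusted.to_vec()
            }
        })
        .collect();

    let mut t = pretrusted.to_vec();
    for _ in 0..iterations {
        let mut next = vec![0.0; n];
        for i in 0..n {
            for j in 0..n {
                next[j] += c[i][j] * t[i]; // trust flows from i to j
            }
        }
        for j in 0..n {
            next[j] = (1.0 - alpha) * next[j] + alpha * pretrusted[j];
        }
        t = next;
    }
    t
}

fn main() {
    // Agents 0 and 1 trust each other; agents 2 and 3 form an isolated Sybil pair.
    let local = vec![
        vec![0.0, 1.0, 0.0, 0.0],
        vec![1.0, 0.0, 0.0, 0.0],
        vec![0.0, 0.0, 0.0, 1.0],
        vec![0.0, 0.0, 1.0, 0.0],
    ];
    let pretrusted = vec![1.0, 0.0, 0.0, 0.0];
    println!("{:?}", eigentrust(&local, &pretrusted, 0.15, 100));
}
```

In this example the Sybil pair vouches only for itself, so no trust that originates from the pre-trusted agent ever reaches it and both scores stay at zero.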

7C. Content Defense

7C.1 MinHash Deduplication: LSH bucketing with 0.9 Jaccard threshold.
7C.2 Content Quality Scoring: Entropy, length, structure heuristics.
7C.3 Quarantine Store: Time-ordered suspicious assertions with admin review.

7D. Circuit Breakers

7D.1 Per-Agent Circuit Breakers: Closed → Open → HalfOpen state machine.

Phase 8A: Chaos Testing ✅

8A.1 Partition Testing: 5-node cluster, network partitions, cascading failures.
8A.2 Jepsen-Style Consistency Testing: CRDT properties, clock skew, concurrent writes.

Consumer Health MVP ✅

"Can Episteme demonstrate value that's impossible with Postgres?"

Definition of Done (All Complete)

Checkpoint	Description
Real Data Flows	FDA drug labels for 3+ GLP-1 drugs ingested as signed assertions
Conflicts Detected	SkepticLens shows `conflict_score > 0.5` when sources disagree
Source Hierarchy Works	Tier 0 (FDA) outweighs 100x Tier 5 (anecdotal) volume
Time Travel Works	`as_of=2024-01-01` returns historical snapshot
Decay Works	6-month-old Reddit claim has lower effective confidence than fresh FDA
UAT Passes	Consumer Health scenarios documented and verified
Self-Serve Demo	CLI tool lets anyone explore without code
Documentation	"Adding a Domain" guide enables new verticals

MVP Workstream (Weeks 1-6)

Week 1: Domain definitions, SubjectBuilder, pharma schema
Week 2: FDA extractor, claim-to-assertion signing
Week 3: Ingest FDA claims, mock conflicts, SkepticLens demo
Week 4: UAT scenarios documented and verified
Week 5: steme-pharma CLI for self-serve exploration
Week 6: Polish, reusable patterns, documentation

Enterprise Pilot Preparation (Partial) ✅

Completed at the time of this snapshot: Pilot-1, Pilot-2, Pilot-3, P4.1. P4.2-P4.4 were finished later (see Pilot-4: Production Hardening below); P5.1-P5.4 remain in roadmap.md.

Pilot-1: Demo Dashboard (Complete)

Deliverable: React admin dashboard that makes the API visual

P1.1 Dashboard Scaffold: Next.js + shadcn/ui project setup (applications/stemedb-dashboard/)
P1.2 Skeptic Query Visualization: Contradictions with conflict scores, tier badges, expandable claims
P1.3 Layered Consensus View: Per-tier breakdown with cross-tier conflict visualization
P1.4 Quarantine Admin Panel: Pending queue, approve/reject, filter by reason, metrics
P1.5 Circuit Breaker Status: Blocked agents, state badges (OPEN/HALF_OPEN/CLOSED), manual reset
P1.6 Audit Trail Browser: Recent queries, drilldown, filter by agent/time, export JSON/CSV

Pilot-2: Demo Data Seeder (Complete)

Deliverable: Pre-signed realistic demo data using Go SDK

P2.1 Demo Keypair Management: 5 demo agents (FDA, PubMed, ClinicalTrials, Reddit, Internal) with deterministic keys
P2.2 Conflict Scenarios: 3 drugs (semaglutide, tirzepatide, liraglutide), 150+ assertions, real FDA content
P2.3 Retractable Sources: CARDIOVASC_MEGA_TRIAL with 110 cascade assertions across 5 agents
P2.4 Historical Data: Lifecycle evolution (Proposed → Approved → Deprecated), 17 historical assertions

Pilot-3: Impact Analysis (Complete)

Deliverable: Automatic cascade when source is retracted

P3.1 Impact Analysis Endpoint: GET /v1/sources/{hash}/impact, quarantine with preview, restore, 17 tests
P3.2 Cascade Flagging: Query-time source status enrichment, exclude_quarantined_sources filter, CSV/JSON export
P3.3 Impact Dashboard Widget: Sources page, quarantine dialog with impact preview, impact ripple animation

Pilot-4: Production Hardening (Partial)

P4.1 Load Testing: Go-based load tester, 10K assertions, 1K writes/sec, 100 concurrent readers, markdown reports

5 Amazement Moments (Status at Archive)

#	Moment	Status
1	Contradictions visible with confidence scores	✅ Complete
2	Cascade invalidation when source retracted	✅ Complete
3	Full FDA-ready audit trail	✅ Complete
4	Point-in-time queries + decay	✅ API ready (no timeline UI)
5	Malicious agent blocked by circuit breaker	✅ Complete

Key Architectural Decisions (Historical)

sled → redb/fjall: sled abandoned. HybridStore routes by key prefix.
Raft log = WAL: Eliminated duplicate WAL following TiKV v5.4 pattern.
CRDT for data, Raft for coordination: Assertions are G-Set CRDT.
Subject-prefix ranges: Co-locate all data for a subject on one shard.
HLC over TrueTime: Works on commodity hardware.
AP model: Writes never blocked during partitions.

Research Documents

docs/research/wal-crash-recovery-research.md — WAL patterns from CockroachDB, TiKV, FoundationDB, SQLite.
docs/research/distributed-write-path.md — Spanner/CockroachDB-style distributed writes adapted for append-only model.

Crates (as of archive date)

Crate	Purpose
`stemedb-core`	Assertion, LifecycleStage, MaterializedView, types, signing utilities
`stemedb-wal`	Write-ahead log with crash recovery
`stemedb-storage`	KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore, SimilarityIndex
`stemedb-ingest`	Ingestion pipeline, signature verification, ContentDefenseLayer
`stemedb-query`	Query engine, Materializer for O(1) MV reads
`stemedb-lens`	Lenses (Recency, Consensus, Authority, Skeptic, Layered, etc.)
`stemedb-api`	HTTP API with axum + utoipa OpenAPI docs
`stemedb-sim`	Simulation for testing the pipeline
`stemedb-merkle`	BLAKE3 Merkle tree for diff detection
`stemedb-rpc`	gRPC services for node-to-node communication
`stemedb-sync`	Merkle sync, gossip broadcast, anti-entropy
`stemedb-cluster`	Cluster membership (SWIM), sharding, gateway
`stemedb-ontology`	Domain definitions (Pharma), subject builders, medical extractors
`stemedb-chaos`	Chaos testing infrastructure

Pilot-4: Production Hardening ✅

P4.2 API Authentication: API key middleware (X-API-Key), BLAKE3-hashed keys, 3 roles (admin/write/read), 5 CRUD endpoints, bootstrap via STEMEDB_ROOT_API_KEY
P4.3 Backup/Restore: scripts/backup-stemedb.sh + scripts/restore-stemedb.sh, WAL magic verify, rename-not-delete safety
P4.4 Prometheus Metrics: /metrics endpoint, assertions_total, queries_total, query_latency_seconds, quarantine_pending, Grafana dashboard template

Aphoria A1: Distinguish Observations from Claims ✅

Goal: Type system reflects the real difference. No more pretending grep results are claims.

A1.1 Rename ExtractedClaim to Observation: Updated across all 42 extractors, bridge, scanner, CLI
A1.2 Create Claim Type: AuthoredClaim in types/authored_claim.rs with provenance/invariant/consequence/authority/evidence/status/supersedes. ClaimStore trait + TomlClaimStore. ClaimsFile TOML persistence in .aphoria/claims.toml
A1.3 Update Bridge Tier Mapping: Observations → Tier 4 (Community), authored claims get tier from authority_tier field via authored_claim_to_assertion()
A1.4 Claim File Format: .aphoria/claims.toml with [[claim]] TOML arrays, human-readable, version-controllable

Aphoria A2: Build the Authoring Workflow ✅

Goal: The skill — not the scanner — is the primary interface for creating claims.

A2.1 Claim Authoring Command: aphoria claims create with all fields, authority tier validation
A2.2 Claim Listing: aphoria claims list with --category, --status, --format json
A2.3 Claims Explained Generator: aphoria claims explain groups by category with provenance/invariant/consequence
A2.4 Enhance Aphoria Skill: .claude/skills/aphoria-claims/SKILL.md for diff review, pattern table, authority tier guide
A2.5 Claim Lifecycle: update, supersede (with parent pointer), deprecate (with reason)

Aphoria A3: Pair Extractors with Claims ✅

Goal: Extractors verify claims, not generate them. The audit finds real conflicts.

A3.1 Verification Engine: ComparisonMode (Equals/NotEquals/Present/Absent), verify.rs with tail-path matching, 4 verdicts (Pass/Conflict/Missing/Unclaimed), 11 unit tests
A3.2 Verify Command: aphoria verify run|map, --exit-code (0=pass, 1=missing, 2=conflicts, 3=error), --claim and --category filters
A3.3 Verify Report Formatters: verify_table.rs + verify_json.rs
A3.4 Pre-Commit Hook: aphoria verify run --changed-only --exit-code using walk_staged_files()
A3.5 Self-Audit Extractors: self_audit.rs (unwrap count, bridge tier, parent_hash, lifecycle), opt-in, 5+3 tests
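
The verification flow in A3.1 and A3.2 can be sketched as a comparison between authored claims and observations. The `ComparisonMode` variants, the four verdicts, and the exit codes 0, 1, and 2 come from the list above. Everything else is an assumption for illustration: the struct fields, `/`-separated paths, the reading of "tail-path matching" as a suffix match on path segments, how each mode maps to a verdict, and letting conflicts take precedence over missing claims. Exit code 3 (error) is not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ComparisonMode { Equals, NotEquals, Present, Absent }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict { Pass, Conflict, Missing, Unclaimed }

struct Claim { path: String, mode: ComparisonMode, expected: String }
struct Observation { path: String, value: String }

/// Assumed tail-path rule: the claim's segments must be the trailing
/// segments of the observation's path.
fn tail_match(observed: &str, claimed: &str) -> bool {
    let o: Vec<&str> = observed.split('/').collect();
    let c: Vec<&str> = claimed.split('/').collect();
    o.ends_with(&c)
}

fn verify_claim(claim: &Claim, observations: &[Observation]) -> Verdict {
    let matched: Vec<&Observation> = observations
        .iter()
        .filter(|o| tail_match(&o.path, &claim.path))
        .collect();
    let none = matched.is_empty();
    match claim.mode {
        ComparisonMode::Absent => if none { Verdict::Pass } else { Verdict::Conflict },
        _ if none => Verdict::Missing, // nothing observed to check against
        ComparisonMode::Present => Verdict::Pass,
        ComparisonMode::Equals => {
            if matched.iter().all(|o| o.value == claim.expected) {
                Verdict::Pass
            } else {
                Verdict::Conflict
            }
        }
        ComparisonMode::NotEquals => {
            if matched.iter().any(|o| o.value == claim.expected) {
                Verdict::Conflict
            } else {
                Verdict::Pass
            }
        }
    }
}

/// An observation no claim covers is reported as Unclaimed.
fn observation_verdict(obs: &Observation, claims: &[Claim]) -> Option<Verdict> {
    if claims.iter().any(|c| tail_match(&obs.path, &c.path)) {
        None
    } else {
        Some(Verdict::Unclaimed)
    }
}

/// 0 = pass, 1 = missing, 2 = conflicts. Unclaimed does not fail the run (assumed).
fn exit_code(verdicts: &[Verdict]) -> i32 {
    if verdicts.contains(&Verdict::Conflict) {
        2
    } else if verdicts.contains(&Verdict::Missing) {
        1
    } else {
        0
    }
}

fn main() {
    let observations = vec![Observation {
        path: "src/config/tls/min_version".into(),
        value: "1.0".into(),
    }];
    let claim = Claim {
        path: "tls/min_version".into(),
        mode: ComparisonMode::Equals,
        expected: "1.2".into(),
    };
    let verdict = verify_claim(&claim, &observations);
    println!("{:?} -> exit {}", verdict, exit_code(&[verdict]));
}
```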

Aphoria A4: Make the Corpus First-Class ✅

Goal: RFC/OWASP knowledge lives in Episteme as real assertions, not hardcoded data.

A4.1 Import RFC Corpus: Tier 0/1 assertions with section references, source hash = content hash, create_authoritative_assertion_with_metadata() helper
A4.2 Import OWASP Corpus: OWASP → Tier 0/1 assertions with CWE references as metadata
A4.3 Lens-Based Conflict Resolution: AphoriaAuthorityLens implementing stemedb_lens::Lens, TierBreakdown in conflict results
A4.4 Trust Packs as Claim Bundles: aphoria corpus export-pack, trust-pack list/install, export_claims_as_policy() bridges claims → Trust Packs
