jml 3b5f88b4f0 feat(aphoria): implement claims architecture (A1-A5) with verify engine, corpus, coverage, and explain
Complete Aphoria claims system overhaul:
- A1: Rename ExtractedClaim to Observation (extractors produce observations, not claims)
- A2: Add AuthoredClaim with full provenance, invariants, and authority tiers
- A3: Verify engine comparing observations against authored claims, CLI + formatters
- A4: Corpus as first-class assertions with predicate indexing, authority lens, trust packs
- A5: Coverage analysis, explain/docs generation, self-audit extractor, claim suggester skill

Also includes: 42 extractors updated for Observation type, verifiable_predicates trait,
conflict detection with comparison modes, claims TOML persistence, Grafana dashboard,
backup/restore scripts, and comprehensive test coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 09:11:47 +00:00

Episteme (StemeDB) Roadmap

Goal: Build the "Git for Truth" substrate for autonomous AI research.
Current Focus: A5.3 Claim Suggester validation + Pilot 5 Operational Readiness
Target Vertical: BioTech/Pharma ("The Living Review") + Code Truth (Aphoria)
Endgame: Distributed multi-writer cluster for millions of concurrent agents

Infrastructure Status: Phases 1-7 complete | Phase 8A (Chaos) complete | Pilot 1-4 complete
Aphoria Status: A1-A4 complete (observations/claims/verify/corpus) | A5 flywheel 3/4 done

Archive: For completed phases 1-8A, Pilots 1-4, and Aphoria A1-A4, see roadmap-archive.md


Current Status

| Phase | Status | Summary |
|---|---|---|
| 1-7, 8A | Complete | Core infra, cluster, trust, chaos testing |
| MVP, Pilot 1-4 | Complete | Consumer Health demo, dashboard, API auth, metrics |
| Aphoria A1-A4 | Complete | Observations/claims/verify/corpus/authority lens |
| Aphoria A5 | 🎯 In Progress | Flywheel: 3/4 done, A5.3 suggest skill needs validation |
| Pilot 5 | Planned | Operational readiness: runbooks, ref arch, demo validation |
| 8B-C | Planned | Distributed observability, geo-distribution |
| 9 | Planned | Disaster recovery, compliance, storage management |

🎯 Aphoria: From Scanner to Knowledge Graph Client (CURRENT)

Goal: Transform Aphoria from "grep with Episteme vocabulary" into a real knowledge graph client that authors, stores, and audits claims with provenance and lineage.
Vision Document: applications/aphoria/docs/vision-gaps.md
Validation: Maxwell scan (67 observations, 0 noise) + hand-written claims-explained.md

Completed Phases (A1-A4 + P4 — see roadmap-archive.md for details)

| Phase | What It Delivered |
|---|---|
| A1 | Observation vs AuthoredClaim types, bridge tier mapping, .aphoria/claims.toml format |
| A2 | aphoria claims create/list/explain/update/supersede/deprecate, aphoria-claims skill |
| A3 | verify.rs engine (Pass/Conflict/Missing/Unclaimed), aphoria verify run/map, pre-commit hook, self-audit |
| A4 | RFC/OWASP as Episteme assertions, AphoriaAuthorityLens, Trust Pack export/install |
| P4 | API auth (3 roles), backup/restore scripts, Prometheus metrics + Grafana dashboard |
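
The A3 row lists four verify outcomes. The sketch below shows one way observations and authored claims could be sorted into them. It assumes both are keyed by a (subject, predicate) pair and compared by exact string equality, whereas the real engine supports several comparison modes. `Verdict`, `Key`, `key`, and `verify` are hypothetical names for illustration, not the verify.rs API.

```rust
use std::collections::BTreeMap;

/// Hypothetical outcome type mirroring the four verify statuses.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Claim and observation agree.
    Pass,
    /// Both exist, but the observed value differs from the claimed one.
    Conflict { expected: String, observed: String },
    /// A claim exists, but nothing was observed for it.
    Missing,
    /// An observation exists that no claim covers.
    Unclaimed,
}

/// Assumed matching key: (subject, predicate).
type Key = (String, String);

fn key(subject: &str, predicate: &str) -> Key {
    (subject.to_string(), predicate.to_string())
}

/// Exact-equality comparison only (assumption for the sketch).
fn verify(
    claims: &BTreeMap<Key, String>,
    observations: &BTreeMap<Key, String>,
) -> Vec<(Key, Verdict)> {
    let mut out = Vec::new();
    // First pass: every claim gets Pass, Conflict, or Missing.
    for (k, expected) in claims {
        let verdict = match observations.get(k) {
            Some(observed) if observed == expected => Verdict::Pass,
            Some(observed) => Verdict::Conflict {
                expected: expected.clone(),
                observed: observed.clone(),
            },
            None => Verdict::Missing,
        };
        out.push((k.clone(), verdict));
    }
    // Second pass: observations that no claim mentions.
    for k in observations.keys().filter(|k| !claims.contains_key(*k)) {
        out.push((k.clone(), Verdict::Unclaimed));
    }
    out
}

fn main() {
    let mut claims = BTreeMap::new();
    claims.insert(key("api", "tls_min_version"), "1.2".to_string());
    claims.insert(key("api", "jwt_alg"), "EdDSA".to_string());
    claims.insert(key("db", "pool_size"), "16".to_string());

    let mut observations = BTreeMap::new();
    observations.insert(key("api", "tls_min_version"), "1.2".to_string());
    observations.insert(key("api", "jwt_alg"), "HS256".to_string());
    observations.insert(key("cache", "ttl_secs"), "60".to_string());

    for ((subject, predicate), verdict) in verify(&claims, &observations) {
        println!("{subject}.{predicate}: {verdict:?}");
    }
}
```

The two passes keep the claim-driven statuses (Pass, Conflict, Missing) separate from the observation-driven Unclaimed status.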

Phase A5: The Flywheel

Goal: The system gets smarter with use. Each claim makes the next claim easier.
Details: vision-gaps.md, §5 (claims-explained.md as the product)
Research: a5-flywheel-skill-design.md (validates the "skill calls CLI" hypothesis)
Key Insight: LLM reasoning over CLI JSON output replaces ML training. The flywheel is prompt engineering, not machine learning.

  • A5.1 Claim Coverage Metrics: Per-module claim density and gap reporting
    • coverage.rs: CoverageReport, ModuleCoverage, CoverageSummary types
    • compute_coverage() uses verify_claims() as source of truth for claim-observation matching
    • Per-module: observation count, claim count, claimed/unclaimed, missing claims, density
    • aphoria coverage CLI: table, JSON, markdown formats, --sort-by (name/density/unclaimed/observations)
    • Coverage gaps section: modules with observations but no claims
    • 8 unit tests including deprecated claim exclusion
  • A5.2 Auto-Generated Documentation: aphoria docs generate + aphoria claims explain
    • aphoria docs generate CLI command with --output and --format (markdown/json)
    • claims_explain.rs: groups by category, includes provenance/invariant/consequence/evidence per claim
    • explain.rs: reads .aphoria/claims.toml, renders via render_claims_markdown()
    • Provenance chains preserved (supersedes references)
  • A5.3 Claim Suggester Skill: LLM-powered pattern recognition via "skill calls CLI"
    • New skill: .claude/skills/aphoria-suggest/SKILL.md (3 modes: cold start / foundation / flywheel)
    • Workflow defined: claims list → verify run --show-unclaimed → reason by analogy → suggest
    • Few-shot learning: existing claims as gold-standard examples for style matching
    • Chain-of-thought: reasoning template before each suggestion
    • Cold start bootstrap: reads README/CLAUDE.md/tests/ADRs when 0 claims
    • Context tiers: local → semantic → summary → global (subagent)
    • Quality gates: non-trivial, not type-enforced, has consequence, not duplicate
    • VG-022 CLOSED: verifiable_predicates() on Extractor trait; 10 extractors declare predicates; verify map shows extractor→claim coverage
    • Dogfood claims: 10 total claims in .aphoria/claims.toml (3 arch + 7 security) covering all ComparisonModes
    • Validate: Run skill against Aphoria's own codebase (dogfood)
    • Validate: Run skill against an external project (cold start test)
    • Iterate: Refine prompt based on suggestion quality from validation
  • A5.4 Onboarding Mode: aphoria explain for new team members
    • explain.rs: generate_explanation() reads claims, renders narrative
    • aphoria explain CLI with --output and --format (markdown/json)
    • Shows claim inventory grouped by category with provenance
    • Empty project handling: directs to aphoria claims create
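
A5.1 rolls observations and claims up per module and sorts the result by name, density, unclaimed count, or observations. The sketch below shows that rollup under two assumptions:

- "density" means claimed observations divided by total observations (the roadmap does not define it);
- `claimed` already excludes deprecated claims.

`ModuleCoverage` borrows its name from coverage.rs, but its fields, `SortBy`, and `sort_modules` are hypothetical.

```rust
use std::cmp::Ordering;

/// Hypothetical per-module rollup; fields are illustrative, not coverage.rs's.
#[derive(Debug, Clone)]
struct ModuleCoverage {
    module: String,
    observations: usize,
    /// Observations matched by a non-deprecated claim (assumed <= observations).
    claimed: usize,
}

impl ModuleCoverage {
    fn unclaimed(&self) -> usize {
        self.observations.saturating_sub(self.claimed)
    }

    /// Assumed definition: share of observations covered by a claim (0.0 if none).
    fn density(&self) -> f64 {
        if self.observations == 0 {
            0.0
        } else {
            self.claimed as f64 / self.observations as f64
        }
    }

    /// Observations but zero claims: belongs in the coverage-gaps section.
    fn is_gap(&self) -> bool {
        self.observations > 0 && self.claimed == 0
    }
}

/// Mirrors the --sort-by choices: name/density/unclaimed/observations.
#[derive(Clone, Copy)]
enum SortBy {
    Name,
    Density,
    Unclaimed,
    Observations,
}

fn sort_modules(modules: &mut [ModuleCoverage], by: SortBy) {
    modules.sort_by(|a, b| match by {
        SortBy::Name => a.module.cmp(&b.module),
        // Ascending, so the least-covered modules come first.
        SortBy::Density => a.density().partial_cmp(&b.density()).unwrap_or(Ordering::Equal),
        // Descending: biggest backlog first.
        SortBy::Unclaimed => b.unclaimed().cmp(&a.unclaimed()),
        SortBy::Observations => b.observations.cmp(&a.observations),
    });
}

fn main() {
    let mut modules = vec![
        ModuleCoverage { module: "verify".into(), observations: 12, claimed: 9 },
        ModuleCoverage { module: "extractors".into(), observations: 40, claimed: 10 },
        ModuleCoverage { module: "corpus".into(), observations: 5, claimed: 0 },
    ];
    sort_modules(&mut modules, SortBy::Density);
    for m in &modules {
        let gap = if m.is_gap() { "  <- gap" } else { "" };
        println!("{:<12} {:>5.2} unclaimed={}{}", m.module, m.density(), m.unclaimed(), gap);
    }
}
```

`sort_by` is stable, so modules that tie on the chosen key keep their previous relative order.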
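
VG-022 adds `verifiable_predicates()` to the Extractor trait so that `verify map` can show which extractors back which claims. The sketch below assumes two things:

- the method has a default that declares no predicates, so extractors must opt in;
- a claim is identified by its predicate string.

The trimmed trait, both extractor structs, and `verify_map` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical, trimmed-down Extractor trait; the real one has more methods.
trait Extractor {
    fn name(&self) -> &'static str;

    /// Predicates this extractor can produce observations for.
    /// Default: none, so an extractor must opt in to claim verification.
    fn verifiable_predicates(&self) -> &'static [&'static str] {
        &[]
    }
}

struct TlsConfigExtractor;

impl Extractor for TlsConfigExtractor {
    fn name(&self) -> &'static str {
        "tls_config"
    }
    fn verifiable_predicates(&self) -> &'static [&'static str] {
        &["tls_min_version", "tls_cipher_suites"]
    }
}

/// Relies on the default, so it cannot back any claim.
struct TodoExtractor;

impl Extractor for TodoExtractor {
    fn name(&self) -> &'static str {
        "todo_comments"
    }
}

/// For each claimed predicate, list the extractors able to verify it.
/// An empty list means the claim can never be checked automatically.
fn verify_map<'a>(
    extractors: &[&dyn Extractor],
    claimed_predicates: &[&'a str],
) -> BTreeMap<&'a str, Vec<&'static str>> {
    let mut map = BTreeMap::new();
    for &pred in claimed_predicates {
        let backers: Vec<&'static str> = extractors
            .iter()
            .filter(|e| e.verifiable_predicates().contains(&pred))
            .map(|e| e.name())
            .collect();
        map.insert(pred, backers);
    }
    map
}

fn main() {
    let tls = TlsConfigExtractor;
    let todo = TodoExtractor;
    let extractors: Vec<&dyn Extractor> = vec![&tls, &todo];
    let map = verify_map(&extractors, &["tls_min_version", "jwt_alg"]);
    for (predicate, backers) in &map {
        if backers.is_empty() {
            println!("{predicate}: no extractor can verify this claim");
        } else {
            println!("{predicate}: {}", backers.join(", "));
        }
    }
}
```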

Pilot 5: Operational Readiness

Goal: Complete production readiness for enterprise pilot demo.
Context: Pilot 1-4 complete (see archive).

  • P5.1 Operational Runbooks: Common procedures documented

    • "Server won't start" troubleshooting
    • "High query latency" investigation
    • "Quarantine queue overflow" handling
    • "Circuit breaker stuck open" resolution
    • "Restore from backup" step-by-step
  • P5.2 Reference Architecture: Deployment guide

    • Single-node pilot deployment diagram
    • Network requirements (ports, firewall rules)
    • Reverse proxy configuration (nginx/envoy with TLS)
    • Resource sizing guide (CPU, memory, disk)
  • P5.3 Pilot Success Criteria Document: Definition of done

    • Sub-second query latency at 10K assertions: measured
    • Successful conflict detection on known contradictory studies: demonstrated
    • Complete audit trail export for mock regulatory review: tested
    • Source retraction workflow: exercised
  • P5.4 Executive Demo Script Validation: End-to-end rehearsal

    • Run through amazement-demo-2.md with real dashboard
    • Time each segment (target: 20 minutes total)
    • Record demo video for async sharing
    • All 5 Aha Moments demonstrable with real data

Phase 8B-C: Production Observability (Planned)

Blocked by: Pilot Prep (need real production deployment first)

8B. Observability

  • 8B.1 Distributed Metrics: Per-node, per-range, per-agent metrics.
  • 8B.2 Admin Dashboard: Cluster health visibility.

8C. Production Hardening

  • 8C.1 Snapshot/Restore: Fast replica bootstrap.
  • 8C.2 Backpressure: Don't overwhelm slow nodes.
  • 8C.3 Geo-Distribution: Multi-region deployment.

Phase 9: The Bunker (Disaster Planning)

Goal: Survive the worst. Backup, restore, recover from corruption, comply with regulations.

9A. Backup & Cold Storage

  • 9A.1 Full Cluster Backup: Point-in-time snapshot to S3/GCS.
  • 9A.2 Point-in-Time Recovery (PITR): Restore to any HLC timestamp.
  • 9A.3 Backup Verification: Weekly automated restore tests.
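
9A.2 restores to an HLC timestamp. The sketch below assumes the textbook hybrid logical clock:

- a timestamp is a wall-clock millisecond component plus a logical counter, ordered lexicographically;
- recovery replays only log entries stamped at or before the target.

`HlcTimestamp`, `HybridClock`, and `restore_to` are hypothetical. Physical time is passed in explicitly so the example is deterministic.

```rust
/// Wall-clock millis plus a logical counter that breaks ties.
/// Derived Ord compares `wall_ms` first, then `logical`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct HlcTimestamp {
    wall_ms: u64,
    logical: u32,
}

struct HybridClock {
    last: HlcTimestamp,
}

impl HybridClock {
    fn new() -> Self {
        HybridClock { last: HlcTimestamp { wall_ms: 0, logical: 0 } }
    }

    /// Local or send event.
    fn tick(&mut self, physical_ms: u64) -> HlcTimestamp {
        let wall = self.last.wall_ms.max(physical_ms);
        let logical = if wall == self.last.wall_ms { self.last.logical + 1 } else { 0 };
        self.last = HlcTimestamp { wall_ms: wall, logical };
        self.last
    }

    /// Receive event: merge a remote timestamp so causal order is preserved
    /// even when this node's wall clock lags behind the sender's.
    fn observe(&mut self, remote: HlcTimestamp, physical_ms: u64) -> HlcTimestamp {
        let wall = self.last.wall_ms.max(remote.wall_ms).max(physical_ms);
        let logical = if wall == self.last.wall_ms && wall == remote.wall_ms {
            self.last.logical.max(remote.logical) + 1
        } else if wall == self.last.wall_ms {
            self.last.logical + 1
        } else if wall == remote.wall_ms {
            remote.logical + 1
        } else {
            0
        };
        self.last = HlcTimestamp { wall_ms: wall, logical };
        self.last
    }
}

/// PITR sketch: keep only log entries stamped at or before `target`.
fn restore_to<T: Clone>(log: &[(HlcTimestamp, T)], target: HlcTimestamp) -> Vec<T> {
    log.iter()
        .filter(|(ts, _)| *ts <= target)
        .map(|(_, v)| v.clone())
        .collect()
}

fn main() {
    let mut node_a = HybridClock::new();
    let mut node_b = HybridClock::new();
    let t1 = node_a.tick(1_000);
    let t2 = node_a.tick(1_000); // same millisecond: the logical counter breaks the tie
    let t3 = node_b.observe(t2, 990); // node_b's clock lags, yet t3 still orders after t2
    let log = vec![(t1, "assert x"), (t2, "assert y"), (t3, "assert z")];
    println!("{:?}", restore_to(&log, t2));
}
```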

9B. Data Corruption & Rollback

  • 9B.1 Corruption Detection: Deep validation before accepting gossip.
  • 9B.2 Assertion Tombstones: "Delete" in an append-only world.
  • 9B.3 Cluster Rollback: Batch tombstone generation for time ranges.
  • 9B.4 Fork Recovery: Heal split-brain after extended partition.

9C. Compliance

  • 9C.1 GDPR Right to Erasure: Cryptographic erasure via per-agent keys.
  • 9C.2 Data Retention Policies: Per-subject/predicate retention rules.
  • 9C.3 Audit Trail for Compliance: Immutable admin action log.
  • 9C.4 SOC 2 Type II Certification: External audit and certification.
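
9B.2 and 9B.3 describe deletion in an append-only store. The sketch below assumes a single ordered log of entries with integer ids and timestamps, and models deletion in three parts:

- a delete appends a tombstone naming its target;
- reads skip tombstoned ids;
- a time-range rollback is simply a batch of tombstones.

`Entry`, `live_assertions`, and `tombstones_for_range` are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical append-only log entry. Nothing is rewritten in place:
/// "deleting" an assertion means appending a Tombstone that names it.
#[derive(Debug, Clone, PartialEq)]
enum Entry {
    Assert { id: u64, ts: u64, value: String },
    Tombstone { target: u64 },
}

/// Live view: every assertion not named by any tombstone, in log order.
fn live_assertions(log: &[Entry]) -> Vec<(u64, &str)> {
    let dead: HashSet<u64> = log
        .iter()
        .filter_map(|e| match e {
            Entry::Tombstone { target } => Some(*target),
            _ => None,
        })
        .collect();
    log.iter()
        .filter_map(|e| match e {
            Entry::Assert { id, value, .. } if !dead.contains(id) => Some((*id, value.as_str())),
            _ => None,
        })
        .collect()
}

/// Batch rollback: one tombstone per assertion written in [from, to).
fn tombstones_for_range(log: &[Entry], from: u64, to: u64) -> Vec<Entry> {
    log.iter()
        .filter_map(|e| match e {
            Entry::Assert { id, ts, .. } if (from..to).contains(ts) => {
                Some(Entry::Tombstone { target: *id })
            }
            _ => None,
        })
        .collect()
}

fn main() {
    let mut log = vec![
        Entry::Assert { id: 1, ts: 10, value: "a".into() },
        Entry::Assert { id: 2, ts: 20, value: "b".into() },
        Entry::Assert { id: 3, ts: 30, value: "c".into() },
    ];
    let rollback = tombstones_for_range(&log, 15, 35);
    log.extend(rollback);
    println!("log entries: {}, live: {:?}", log.len(), live_assertions(&log));
}
```

Nothing is ever removed from the log, so the audit trail survives the rollback. Reclaiming the space is left to compaction (9D.1).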

9D. Storage Management

  • 9D.1 Compaction: Reclaim space from tombstoned data.
  • 9D.2 Tiered Storage: Hot/warm/cold based on access patterns.
  • 9D.3 Storage Quotas: Per-agent and cluster-wide limits.
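
9D.2 places data in hot, warm, or cold tiers based on access patterns. The simplest form of such a policy, sketched here, looks only at time since last access. The thresholds, `Tier`, and `TierPolicy` are hypothetical placeholders, and a real policy would likely also weigh access frequency.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Hot,
    Warm,
    Cold,
}

/// Hypothetical recency-only policy; thresholds are placeholders, not tuned values.
struct TierPolicy {
    hot_within: Duration,
    warm_within: Duration,
}

impl TierPolicy {
    /// Classify data by how long ago it was last read.
    fn classify(&self, since_last_access: Duration) -> Tier {
        if since_last_access < self.hot_within {
            Tier::Hot
        } else if since_last_access < self.warm_within {
            Tier::Warm
        } else {
            Tier::Cold
        }
    }
}

fn main() {
    const DAY: u64 = 24 * 60 * 60;
    let policy = TierPolicy {
        hot_within: Duration::from_secs(DAY),
        warm_within: Duration::from_secs(30 * DAY),
    };
    for days in [0, 7, 90] {
        let tier = policy.classify(Duration::from_secs(days * DAY));
        println!("last read {days} days ago -> {tier:?}");
    }
}
```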

9E. Incident Response

  • 9E.1 Alerting & Escalation: PagerDuty/Slack integration.
  • 9E.2 Operational Runbooks: Documented procedures for common failures.
  • 9E.3 Chaos Engineering: Monthly "game days" with controlled failures.

9F. Security Hardening

  • 9F.1 TLS Everywhere: mTLS for node-to-node traffic.
  • 9F.2 Encryption at Rest: WAL and KV store encryption.
  • 9F.3 Node Authentication: Ed25519 keypair identity, signed cluster join.

Architecture Overview

Write Path (Spine):           Read Path (Cortex):
[Agent] -> [Ingestion]        [Agent] <- [Lens Engine]
              |                              |
              v                              |
         [WAL/Fsync]                  [Index Lookup]
              |                              |
              v                              |
         [KV Store] <------------------------+
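
The write path in the diagram makes the WAL the durability point: a write is appended and fsynced before it reaches the KV store. The toy sketch below shows that ordering using only std. It assumes a one-record-per-line `key<TAB>value` format and uses a `BTreeMap` as a stand-in for the KV store. `Spine` and its methods are hypothetical and are not the stemedb-wal API.

```rust
use std::collections::BTreeMap;
use std::fs::{self, File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::{Path, PathBuf};

/// Toy write path: durable WAL append first, KV update second.
/// Hypothetical type; keys and values must not contain tabs or newlines.
struct Spine {
    wal: File,
    kv: BTreeMap<String, String>, // stand-in for the KV store
}

impl Spine {
    /// Open (or create) the WAL and rebuild the KV view by replaying it.
    fn open(path: &Path) -> io::Result<Spine> {
        let mut kv = BTreeMap::new();
        if path.exists() {
            for line in BufReader::new(File::open(path)?).lines() {
                let line = line?;
                if let Some((k, v)) = line.split_once('\t') {
                    kv.insert(k.to_string(), v.to_string()); // later records win
                }
            }
        }
        let wal = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Spine { wal, kv })
    }

    /// Append and fsync before touching the KV store, so any write that
    /// returned Ok can be recovered by replay.
    fn put(&mut self, key: &str, value: &str) -> io::Result<()> {
        writeln!(self.wal, "{key}\t{value}")?;
        self.wal.sync_all()?;
        self.kv.insert(key.to_string(), value.to_string());
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let path: PathBuf = std::env::temp_dir().join("spine-sketch.wal");
    let _ = fs::remove_file(&path); // start from an empty log
    {
        let mut spine = Spine::open(&path)?;
        spine.put("claim:42", "asserted")?;
        spine.put("claim:42", "superseded")?;
    } // dropping `spine` discards the in-memory view, as a crash would
    let recovered = Spine::open(&path)?;
    println!("{:?}", recovered.kv.get("claim:42"));
    fs::remove_file(&path)
}
```

A production WAL would typically add record checksums and torn-write handling, and would fsync the parent directory after creating the file. The sketch omits all three.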

Port Scheme (181XX)

| Offset | Service | Default | Env Var |
|---|---|---|---|
| +0 | HTTP API | 18180 | STEMEDB_BIND_ADDR |
| +1 | Cluster Gateway | 18181 | STEMEDB_NODE_API_ADDR |
| +2 | Cluster RPC | 18182 | STEMEDB_NODE_RPC_ADDR |
| +3 | SWIM Gossip | 18183 | via SwimConfig |
| +4 | Metrics | 18184 | (reserved) |
| +5 | Admin | 18185 | (reserved) |
| +6 | Latent Signal | 18186 | |
| +7 | Community App | 18187 | |
| +8 | Admin Dashboard | 18188 | |
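
Every default port in the table is 18180 plus the service's offset, and three services accept an environment override. The sketch below shows that resolution logic, assuming the `*_ADDR` variables hold a full `host:port` string. The `Service` enum, `BASE_PORT`, and `resolve_addr` are hypothetical. The fallback host is supplied by the caller because the roadmap does not state a default bind host.

```rust
use std::env;

/// Base of the 181XX scheme; every service adds a fixed offset.
const BASE_PORT: u16 = 18180;

/// Services from the port table (hypothetical enum, not a StemeDB type).
#[derive(Debug, Clone, Copy)]
enum Service {
    HttpApi,
    ClusterGateway,
    ClusterRpc,
    SwimGossip,
    Metrics,
    Admin,
    LatentSignal,
    CommunityApp,
    AdminDashboard,
}

impl Service {
    fn offset(self) -> u16 {
        match self {
            Service::HttpApi => 0,
            Service::ClusterGateway => 1,
            Service::ClusterRpc => 2,
            Service::SwimGossip => 3,
            Service::Metrics => 4,
            Service::Admin => 5,
            Service::LatentSignal => 6,
            Service::CommunityApp => 7,
            Service::AdminDashboard => 8,
        }
    }

    fn default_port(self) -> u16 {
        BASE_PORT + self.offset()
    }

    /// Env override per the table; SWIM is configured via SwimConfig instead.
    fn env_var(self) -> Option<&'static str> {
        match self {
            Service::HttpApi => Some("STEMEDB_BIND_ADDR"),
            Service::ClusterGateway => Some("STEMEDB_NODE_API_ADDR"),
            Service::ClusterRpc => Some("STEMEDB_NODE_RPC_ADDR"),
            _ => None,
        }
    }

    /// Use the env override if it is set, else `host:default_port`.
    fn resolve_addr(self, host: &str) -> String {
        self.env_var()
            .and_then(|name| env::var(name).ok())
            .unwrap_or_else(|| format!("{host}:{}", self.default_port()))
    }
}

fn main() {
    let all = [
        Service::HttpApi, Service::ClusterGateway, Service::ClusterRpc,
        Service::SwimGossip, Service::Metrics, Service::Admin,
        Service::LatentSignal, Service::CommunityApp, Service::AdminDashboard,
    ];
    for s in all {
        println!("{:?} -> {}", s, s.resolve_addr("127.0.0.1"));
    }
}
```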

Crates

| Crate | Purpose | Status |
|---|---|---|
| stemedb-core | Assertion, LifecycleStage, MaterializedView, types, signing | |
| stemedb-wal | Write-ahead log with crash recovery | |
| stemedb-storage | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore | |
| stemedb-ingest | Ingestion pipeline, signature verification, ContentDefenseLayer | |
| stemedb-query | Query engine, Materializer for O(1) MV reads | |
| stemedb-lens | Lenses (Recency, Consensus, Authority, Skeptic, Layered, etc.) | |
| stemedb-api | HTTP API with axum + utoipa OpenAPI docs | |
| stemedb-sim | Simulation for testing the pipeline | |
| stemedb-merkle | BLAKE3 Merkle tree for diff detection | |
| stemedb-rpc | gRPC services for node-to-node communication | |
| stemedb-sync | Merkle sync, gossip broadcast, anti-entropy | |
| stemedb-cluster | Cluster membership (SWIM), sharding, gateway | |
| stemedb-ontology | Domain definitions (Pharma), subject builders, medical extractors | |
| stemedb-chaos | Chaos testing infrastructure | |
| stemedb-dashboard | Admin dashboard (React/Next.js) (7 panels) | |
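
stemedb-lens resolves competing assertions through lenses. The sketch below covers two of them, Recency and Consensus. It assumes the candidates are already grouped by subject and predicate and carry a plain `u64` timestamp. The `Lens` trait and `Assertion` struct are hypothetical simplifications, not the crate's types.

```rust
use std::collections::HashMap;

/// Hypothetical, stripped-down assertion with only the fields these lenses read.
#[derive(Debug, Clone)]
struct Assertion {
    value: String,
    ts: u64, // stand-in for an HLC timestamp
}

/// Hypothetical lens interface: choose one winner among competing assertions
/// about the same subject and predicate.
trait Lens {
    fn resolve<'a>(&self, candidates: &'a [Assertion]) -> Option<&'a Assertion>;
}

/// Recency: the newest assertion wins.
struct RecencyLens;

impl Lens for RecencyLens {
    fn resolve<'a>(&self, candidates: &'a [Assertion]) -> Option<&'a Assertion> {
        candidates.iter().max_by_key(|a| a.ts)
    }
}

/// Consensus: the most frequently asserted value wins; ties on count go to
/// the most recent assertion.
struct ConsensusLens;

impl Lens for ConsensusLens {
    fn resolve<'a>(&self, candidates: &'a [Assertion]) -> Option<&'a Assertion> {
        let mut counts: HashMap<&str, usize> = HashMap::new();
        for a in candidates {
            *counts.entry(a.value.as_str()).or_insert(0) += 1;
        }
        candidates
            .iter()
            .max_by_key(|a| (counts[a.value.as_str()], a.ts))
    }
}

fn main() {
    let candidates = vec![
        Assertion { value: "effective".into(), ts: 1 },
        Assertion { value: "effective".into(), ts: 2 },
        Assertion { value: "ineffective".into(), ts: 3 },
    ];
    let lenses: [(&str, &dyn Lens); 2] = [("recency", &RecencyLens), ("consensus", &ConsensusLens)];
    for (name, lens) in lenses {
        println!("{name}: {:?}", lens.resolve(&candidates).map(|a| &a.value));
    }
}
```

Putting every lens behind one trait is what would let a layered lens chain them, for example consensus first with recency as a fallback.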

Applications

| App | Purpose | Status |
|---|---|---|
| aphoria | Code-level truth linter: 42 extractors, claims, verify, coverage | 🎯 A5 flywheel |
| disputed | Controversy explorer | Planned |

SDKs

| SDK | Purpose | Status |
|---|---|---|
| sdk/go/steme | Go HTTP client with Ed25519 signing and fluent builders | |
| sdk/go/adk | ADK-Go tools and callbacks for AI agents | |

Quick Reference

# Build
cargo build --workspace

# Test
cargo test --workspace

# Lint (must pass before commit)
cargo clippy --workspace -- -D warnings
cargo fmt --check

# Run API server
cargo run --bin stemedb-api

# Run Aphoria scan
cargo run --bin aphoria -- scan /path/to/project --show-observations

# Run demo script
./scripts/demo-consumer-health.sh