# Episteme (StemeDB) Roadmap

> **Goal:** Build the "Git for Truth" substrate for autonomous AI research.
> **Current Focus:** A5.3 Claim Suggester validation + Pilot 5 Operational Readiness
> **Target Vertical:** BioTech/Pharma ("The Living Review") + Code Truth (Aphoria)
> **Endgame:** Distributed multi-writer cluster for millions of concurrent agents
>
> **Infrastructure Status:** Phases 1-7 complete | Phase 8A (Chaos) complete | Pilot 1-4 complete
> **Aphoria Status:** A1-A4 complete (observations/claims/verify/corpus) | A5 flywheel 3/4 done
>
> **Archive:** For completed phases 1-8A + Pilot 1-3, see [roadmap-archive.md](./roadmap-archive.md)

---

## Current Status

| Phase | Status | Summary |
|-------|--------|---------|
| **1-7, 8A** | ✅ Complete | Core infra, cluster, trust, chaos testing |
| **MVP, Pilot 1-4** | ✅ Complete | Consumer Health demo, dashboard, API auth, metrics |
| **Aphoria A1-A4** | ✅ Complete | Observations/claims/verify/corpus/authority lens |
| **Aphoria A5** | 🎯 In Progress | Flywheel: 3/4 done, A5.3 suggest skill needs validation |
| **Pilot 5** | Planned | Operational readiness: runbooks, ref arch, demo validation |
| **8B-C** | Planned | Distributed observability, geo-distribution |
| **9** | Planned | Disaster recovery, compliance, storage management |

---

## 🎯 Aphoria: From Scanner to Knowledge Graph Client (CURRENT)

> **Goal:** Transform Aphoria from "grep with Episteme vocabulary" into a real knowledge graph client that authors, stores, and audits claims with provenance and lineage.
> **Vision Document:** [applications/aphoria/docs/vision-gaps.md](./applications/aphoria/docs/vision-gaps.md)
> **Validation:** Maxwell scan (67 observations, 0 noise) + hand-written [claims-explained.md](./claims-explained.md)

### Completed Phases (A1-A4 + P4; see [roadmap-archive.md](./roadmap-archive.md) for details)

| Phase | What It Delivered |
|-------|-------------------|
| **A1** | `Observation` vs `AuthoredClaim` types, bridge tier mapping, `.aphoria/claims.toml` format |
| **A2** | `aphoria claims create/list/explain/update/supersede/deprecate`, `aphoria-claims` skill |
| **A3** | `verify.rs` engine (Pass/Conflict/Missing/Unclaimed), `aphoria verify run/map`, pre-commit hook, self-audit |
| **A4** | RFC/OWASP as Episteme assertions, `AphoriaAuthorityLens`, Trust Pack export/install |
| **P4** | API auth (3 roles), backup/restore scripts, Prometheus metrics + Grafana dashboard |

### Phase A5: The Flywheel

> **Goal:** The system gets smarter with use. Each claim makes the next claim easier.
> **Details:** [vision-gaps.md, §5](./applications/aphoria/docs/vision-gaps.md#5-the-claims-explainedmd-pattern-should-be-the-product) (claims-explained.md as the product)
> **Research:** [a5-flywheel-skill-design.md](./research-requests/a5-flywheel-skill-design.md) (validates "skill calls CLI" hypothesis)
> **Key Insight:** LLM reasoning over CLI JSON output replaces ML training. The flywheel is prompt engineering, not machine learning.
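
The key insight above, that LLM reasoning over CLI output stands in for ML training, can be illustrated with a minimal few-shot prompt-assembly sketch. It assumes the claims and unclaimed observations have already been parsed from CLI JSON output; the struct names, fields, and prompt wording are hypothetical and are not Aphoria's actual schema or the `aphoria-suggest` skill prompt.

```rust
/// Hypothetical, simplified view of an authored claim (not Aphoria's real schema).
#[derive(Debug, Clone)]
pub struct ClaimExample {
    pub id: String,
    pub category: String,
    pub statement: String,
}

/// Hypothetical, simplified view of an observation that no claim covers yet.
#[derive(Debug, Clone)]
pub struct UnclaimedObservation {
    pub module: String,
    pub summary: String,
}

/// Assemble a few-shot prompt: existing claims serve as style examples,
/// and unclaimed observations are the items the model is asked to reason about.
pub fn build_suggestion_prompt(
    claims: &[ClaimExample],
    unclaimed: &[UnclaimedObservation],
) -> String {
    let mut prompt = String::new();

    if claims.is_empty() {
        // Cold start: there are no examples to imitate.
        prompt.push_str("No existing claims (cold start).\n");
    } else {
        prompt.push_str("Existing claims (match their style):\n");
        for c in claims {
            prompt.push_str(&format!("- [{}] ({}) {}\n", c.id, c.category, c.statement));
        }
    }

    prompt.push_str("\nUnclaimed observations:\n");
    for o in unclaimed {
        prompt.push_str(&format!("- {}: {}\n", o.module, o.summary));
    }

    // Ask for reasoning before each suggestion (chain-of-thought style).
    prompt.push_str(
        "\nFor each observation, explain your reasoning, then suggest a claim or skip it.\n",
    );
    prompt
}
```

A caller would pass the parsed results of `aphoria claims list` and `aphoria verify run --show-unclaimed` and hand the returned string to the model. An empty claims slice produces the cold-start header, so no retraining step is involved when the claim set grows: the prompt simply gains more examples.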

- [x] **A5.1 Claim Coverage Metrics**: Per-module claim density and gap reporting
  - [x] `coverage.rs`: `CoverageReport`, `ModuleCoverage`, `CoverageSummary` types
  - [x] `compute_coverage()` uses `verify_claims()` as source of truth for claim-observation matching
  - [x] Per-module: observation count, claim count, claimed/unclaimed, missing claims, density
  - [x] `aphoria coverage` CLI: table, JSON, markdown formats, `--sort-by` (name/density/unclaimed/observations)
  - [x] Coverage gaps section: modules with observations but no claims
  - [x] 8 unit tests including deprecated claim exclusion
- [x] **A5.2 Auto-Generated Documentation**: `aphoria docs generate` + `aphoria claims explain`
  - [x] `aphoria docs generate` CLI command with `--output` and `--format` (markdown/json)
  - [x] `claims_explain.rs`: groups by category, includes provenance/invariant/consequence/evidence per claim
  - [x] `explain.rs`: reads `.aphoria/claims.toml`, renders via `render_claims_markdown()`
  - [x] Provenance chains preserved (supersedes references)
- [ ] **A5.3 Claim Suggester Skill**: LLM-powered pattern recognition via "skill calls CLI"
  - [x] New skill: `.claude/skills/aphoria-suggest/SKILL.md` (3 modes: cold start / foundation / flywheel)
  - [x] Workflow defined: `claims list` → `verify run --show-unclaimed` → reason by analogy → suggest
  - [x] Few-shot learning: existing claims as gold-standard examples for style matching
  - [x] Chain-of-thought: reasoning template before each suggestion
  - [x] Cold start bootstrap: reads README/CLAUDE.md/tests/ADRs when 0 claims
  - [x] Context tiers: local → semantic → summary → global (subagent)
  - [x] Quality gates: non-trivial, not type-enforced, has consequence, not duplicate
  - [x] **VG-022 CLOSED**: `verifiable_predicates()` on Extractor trait; 10 extractors declare predicates; `verify map` shows extractor→claim coverage
  - [x] **Dogfood claims**: 10 total claims in `.aphoria/claims.toml` (3 arch + 7 security) covering all ComparisonModes
  - [ ] **Validate**: Run skill against Aphoria's own codebase (dogfood)
  - [ ] **Validate**: Run skill against an external project (cold start test)
  - [ ] **Iterate**: Refine prompt based on suggestion quality from validation
- [x] **A5.4 Onboarding Mode**: `aphoria explain` for new team members
  - [x] `explain.rs`: `generate_explanation()` reads claims, renders narrative
  - [x] `aphoria explain` CLI with `--output` and `--format` (markdown/json)
  - [x] Shows claim inventory grouped by category with provenance
  - [x] Empty project handling: directs to `aphoria claims create`

---

## Pilot 5: Operational Readiness

> **Goal:** Complete production readiness for enterprise pilot demo.
> **Context:** Pilot 1-4 complete (see [archive](./roadmap-archive.md)).

- [ ] **P5.1 Operational Runbooks**: Common procedures documented
  - [ ] "Server won't start" troubleshooting
  - [ ] "High query latency" investigation
  - [ ] "Quarantine queue overflow" handling
  - [ ] "Circuit breaker stuck open" resolution
  - [ ] "Restore from backup" step-by-step
- [ ] **P5.2 Reference Architecture**: Deployment guide
  - [ ] Single-node pilot deployment diagram
  - [ ] Network requirements (ports, firewall rules)
  - [ ] Reverse proxy configuration (nginx/envoy with TLS)
  - [ ] Resource sizing guide (CPU, memory, disk)
- [ ] **P5.3 Pilot Success Criteria Document**: Definition of done
  - [ ] Sub-second query latency at 10K assertions: measured
  - [ ] Successful conflict detection on known contradictory studies: demonstrated
  - [ ] Complete audit trail export for mock regulatory review: tested
  - [ ] Source retraction workflow: exercised
- [ ] **P5.4 Executive Demo Script Validation**: End-to-end rehearsal
  - [ ] Run through `amazement-demo-2.md` with real dashboard
  - [ ] Time each segment (target: 20 minutes total)
  - [ ] Record demo video for async sharing
  - [ ] All 5 Aha Moments demonstrable with real data

---

## Phase 8B-C: Production Observability (Planned)

> **Blocked by:** Pilot Prep (need real production deployment first)

### 8B. Observability

- [ ] **8B.1 Distributed Metrics**: Per-node, per-range, per-agent metrics.
- [ ] **8B.2 Admin Dashboard**: Cluster health visibility.

### 8C. Production Hardening

- [ ] **8C.1 Snapshot/Restore**: Fast replica bootstrap.
- [ ] **8C.2 Backpressure**: Don't overwhelm slow nodes.
- [ ] **8C.3 Geo-Distribution**: Multi-region deployment.

---

## Phase 9: The Bunker (Disaster Planning)

> **Goal:** Survive the worst. Backup, restore, recover from corruption, comply with regulations.

### 9A. Backup & Cold Storage

- [ ] **9A.1 Full Cluster Backup**: Point-in-time snapshot to S3/GCS.
- [ ] **9A.2 Point-in-Time Recovery (PITR)**: Restore to any HLC timestamp.
- [ ] **9A.3 Backup Verification**: Weekly automated restore tests.

### 9B. Data Corruption & Rollback

- [ ] **9B.1 Corruption Detection**: Deep validation before accepting gossip.
- [ ] **9B.2 Assertion Tombstones**: "Delete" in an append-only world.
- [ ] **9B.3 Cluster Rollback**: Batch tombstone generation for time ranges.
- [ ] **9B.4 Fork Recovery**: Heal split-brain after extended partition.

### 9C. Compliance & Legal

- [ ] **9C.1 GDPR Right to Erasure**: Cryptographic erasure via per-agent keys.
- [ ] **9C.2 Data Retention Policies**: Per-subject/predicate retention rules.
- [ ] **9C.3 Audit Trail for Compliance**: Immutable admin action log.
- [ ] **9C.4 SOC 2 Type II Certification**: External audit and certification.

### 9D. Storage Management

- [ ] **9D.1 Compaction**: Reclaim space from tombstoned data.
- [ ] **9D.2 Tiered Storage**: Hot/warm/cold based on access patterns.
- [ ] **9D.3 Storage Quotas**: Per-agent and cluster-wide limits.

### 9E. Incident Response

- [ ] **9E.1 Alerting & Escalation**: PagerDuty/Slack integration.
- [ ] **9E.2 Operational Runbooks**: Documented procedures for common failures.
- [ ] **9E.3 Chaos Engineering**: Monthly "game days" with controlled failures.

### 9F. Security Hardening

- [ ] **9F.1 TLS Everywhere**: mTLS for node-to-node traffic.
- [ ] **9F.2 Encryption at Rest**: WAL and KV store encryption.
- [ ] **9F.3 Node Authentication**: Ed25519 keypair identity, signed cluster join.

---

## Architecture Overview

```
Write Path (Spine):                Read Path (Cortex):
[Agent] -> [Ingestion]             [Agent] <- [Lens Engine]
                |                                   |
                v                                   |
           [WAL/Fsync]                       [Index Lookup]
                |                                   |
                v                                   |
           [KV Store] <-----------------------------+
```

## Port Scheme (181XX)

| Offset | Service | Default | Env Var |
|--------|---------|---------|---------|
| +0 | HTTP API | 18180 | `STEMEDB_BIND_ADDR` |
| +1 | Cluster Gateway | 18181 | `STEMEDB_NODE_API_ADDR` |
| +2 | Cluster RPC | 18182 | `STEMEDB_NODE_RPC_ADDR` |
| +3 | SWIM Gossip | 18183 | via `SwimConfig` |
| +4 | Metrics | 18184 | (reserved) |
| +5 | Admin | 18185 | (reserved) |
| +6 | Latent Signal | 18186 | - |
| +7 | Community App | 18187 | - |
| +8 | Admin Dashboard | 18188 | - |

## Crates

| Crate | Purpose | Status |
|-------|---------|--------|
| `stemedb-core` | Assertion, LifecycleStage, MaterializedView, types, signing | ✅ |
| `stemedb-wal` | Write-ahead log with crash recovery | ✅ |
| `stemedb-storage` | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore | ✅ |
| `stemedb-ingest` | Ingestion pipeline, signature verification, ContentDefenseLayer | ✅ |
| `stemedb-query` | Query engine, Materializer for O(1) MV reads | ✅ |
| `stemedb-lens` | Lenses (Recency, Consensus, Authority, Skeptic, Layered, etc.) | ✅ |
| `stemedb-api` | HTTP API with axum + utoipa OpenAPI docs | ✅ |
| `stemedb-sim` | Simulation for testing the pipeline | ✅ |
| `stemedb-merkle` | BLAKE3 Merkle tree for diff detection | ✅ |
| `stemedb-rpc` | gRPC services for node-to-node communication | ✅ |
| `stemedb-sync` | Merkle sync, gossip broadcast, anti-entropy | ✅ |
| `stemedb-cluster` | Cluster membership (SWIM), sharding, gateway | ✅ |
| `stemedb-ontology` | Domain definitions (Pharma), subject builders, medical extractors | ✅ |
| `stemedb-chaos` | Chaos testing infrastructure | ✅ |
| `stemedb-dashboard` | Admin dashboard (React/Next.js) | ✅ (7 panels) |

## Applications

| App | Purpose | Status |
|-----|---------|--------|
| `aphoria` | Code-level truth linter: 42 extractors, claims, verify, coverage | 🎯 A5 flywheel |
| `disputed` | Controversy explorer | Planned |

## SDKs

| SDK | Purpose | Status |
|-----|---------|--------|
| `sdk/go/steme` | Go HTTP client with Ed25519 signing and fluent builders | ✅ |
| `sdk/go/adk` | ADK-Go tools and callbacks for AI agents | ✅ |

---

## Quick Reference

```bash
# Build
cargo build --workspace

# Test
cargo test --workspace

# Lint (must pass before commit)
cargo clippy --workspace -- -D warnings
cargo fmt --check

# Run API server
cargo run --bin stemedb-api

# Run Aphoria scan
cargo run --bin aphoria -- scan /path/to/project --show-observations

# Run demo script
./scripts/demo-consumer-health.sh
```

---

## Arena: Simulation Roadmap

> **Goal:** Incrementally evolve the simulator from Spine validation to a full Agent-Based Modeling environment.
> **Philosophy:** Make it run. Then add. Verify at every step.
> **Alignment:** Tracks main roadmap phases; exercises features as they land.

### Current State

The simulator (`stemedb-sim`) validates the full system through Arena 0-4:

**Completed Arenas:**

- ✅ **Arena 0**: Test infrastructure with assertions and CI integration
- ✅ **Arena 1**: Query path via QueryEngine, Recency lens, lifecycle filtering, query audit
- ✅ **Arena 2**: Voting & VoteAwareConsensus, troll resistance
- ✅ **Arena 2.5**: Hardening (race conditions, API tests, crash recovery, input validation)
- ✅ **Arena 3**: Materialized Views, fast-path verification, MV freshness
- ✅ **Arena 4**: Agent personas (Scientist, Troll, Believer with differentiated strategies)

**What's Tested:**

- WAL durability, rkyv serialization, Ed25519 signatures
- Ingestor pipeline (WAL → KV async flow)
- QueryEngine with multiple lenses
- Lifecycle filtering, voting, consensus
- Query audit trail, materialized views
- Strategy-driven agent behaviors

**What's Not Yet Tested:**

- ❌ TrustRank (Arena 5)
- ❌ Concurrent agents at scale (Arena 6)
- ❌ Time-travel queries (Arena 7)
- ❌ Skeptic lens & conflict scores (Arena 8)

### Upcoming Arena Phases

**Arena 5: TrustRank Integration** (Next)

- Initialize TrustRank for agents
- Reputation adjustment after votes
- TrustAwareAuthorityLens verification
- Troll reputation decay over time

**Arena 6: Concurrent Agents**

- Tokio task per agent
- Scale to 100 agents, then 1000
- Contention metrics and bottleneck identification

**Arena 7: Time-Travel & Epochs**

- Time-travel query verification
- Epoch creation and supersession
- Epoch cascade validation

**Arena 8: Skeptic & Conflict**

- High/low conflict scenarios
- Skeptic lens surfacing outliers
- Conflict score accuracy

**Arena 9: Full Gameplay Loop**

- Ground truth injection
- Complete 5-tick scenario
- Extended 1000-tick run
- Emergence validation

### Alignment with Use Cases

| Use Case | Arena Phase |
|----------|-------------|
| **Agile Agent Team** | |
| Lifecycle filtering | Arena 1.3 |
| Query audit trail | Arena 1.4 |
| Time-travel debugging | Arena 7.1 |
| Expert weighting | Arena 5.3 |
| **Financial Due Diligence** | |
| Conflict detection | Arena 8.1, 8.3 |
| Epoch cascades | Arena 7.2, 7.3 |

**Run command:** `cargo run --bin stemedb-sim`

**Test suite:** `cargo test -p stemedb-sim`

---

## Related Documents

- [CLAUDE.md](./CLAUDE.md): AI assistant instructions and project rules
- [roadmap-archive.md](./roadmap-archive.md): Completed phases 1-8A + Pilot 1-3
- [applications/aphoria/docs/vision-gaps.md](./applications/aphoria/docs/vision-gaps.md): Aphoria vision gap analysis
- [claims-explained.md](./claims-explained.md): Hand-written Maxwell claims (the gold standard)
- [docs/demo/pilot/amazement-demo.md](./docs/demo/pilot/amazement-demo.md): Technical demo script
- [docs/demo/pilot/amazement-demo-2.md](./docs/demo/pilot/amazement-demo-2.md): Executive demo script
- [uat/production-readiness/README.md](./uat/production-readiness/README.md): Production verification checklist