stemedb/roadmap.md
jordan c02b0370d7 docs: align demo script with roadmap + add SOC 2 certification task
- Fix reference customer answer in amazement-demo-2 (remove placeholder)
- Add Pilot Delivery Milestones section linking demo capabilities to roadmap tasks
- Add SOC 2 Type II certification task (9C.4) with Q3 2026 target
- Add "real data not mockups" success criterion to P5.4 demo validation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 19:00:43 -07:00

# Episteme (StemeDB) Roadmap
> **Goal:** Build the "Git for Truth" substrate for autonomous AI research.
> **Current Focus:** Enterprise Pilot Preparation
> **Target Vertical:** BioTech/Pharma ("The Living Review")
> **Endgame:** Distributed multi-writer cluster for millions of concurrent agents
>
> **Infrastructure Status:** Phases 1-7 complete ✅ | Phase 8A (Chaos) complete ✅
> **Pilot Status:** Consumer Health MVP complete ✅ | Enterprise Demo in progress
>
> **Archive:** For completed phases 1-7, see [roadmap-archive.md](./roadmap-archive.md)
---
## Current Status
| Phase | Status | Summary |
|-------|--------|---------|
| **1-7** | ✅ Complete | Core infrastructure, distributed cluster, trust & safety |
| **8A** | ✅ Complete | Chaos testing, Jepsen-style verification |
| **MVP** | ✅ Complete | Consumer Health demo with real FDA data |
| **Pilot Prep** | 🎯 In Progress | Dashboard, impact analysis, production hardening |
| **8B-C** | Planned | Observability, geo-distribution |
| **9** | Planned | Disaster recovery, compliance, storage management |
---
## 🎯 Phase: Enterprise Pilot Preparation (CURRENT)
> **Goal:** Make the pilot bulletproof. Amaze enterprise decision makers.
> **Timeline:** 5 weeks
> **Success Criteria:** Dr. Sarah Chen (skeptical VP of Data Infrastructure) fights her CFO for budget
### The 5 Amazement Moments We Must Deliver
| # | Moment | Current State | Gap |
|---|--------|---------------|-----|
| 1 | Contradictions visible with confidence scores | ✅ Complete | None (dashboard scaffold + Skeptic Query UI done) |
| 2 | Cascade invalidation when source retracted | ⚠️ Manual | No automatic cascade (P3.2) |
| 3 | Full FDA-ready audit trail | ✅ API ready | Dashboard scaffold ready, UI pending (P1.6) |
| 4 | Point-in-time queries + decay | ✅ API ready | No timeline UI |
| 5 | Malicious agent blocked by circuit breaker | ✅ API ready | Dashboard scaffold ready, UI pending (P1.5) |
### Pilot-1: Demo Dashboard (Week 1-2)
> **Deliverable:** React admin dashboard that makes the API visual
- [x] **P1.1 Dashboard Scaffold**: Next.js + shadcn/ui project setup ✅
  - [x] Project structure: `applications/stemedb-dashboard/`
  - [x] API client for StemeDB endpoints (`src/lib/api/client.ts`)
  - [x] Authentication scaffold (API key header)
  - [x] Dark mode (default), responsive layout with collapsible sidebar
  - [x] shadcn/ui components: button, card, badge, input, separator, tabs
  - [x] Live API status indicator (polls /health every 30s)
  - [x] Port 18188, builds and runs successfully
- [x] **P1.2 Skeptic Query Visualization**: Show contradictions graphically ✅
  - [x] Query builder: subject, predicate inputs
  - [x] Conflict score gauge (0.0-1.0 with color coding)
  - [x] Claims table with weight bars, source tier badges
  - [x] "CONTESTED" / "AGREED" / "UNANIMOUS" status badges
  - [x] Expandable claim rows with source details, agents, provenance hashes
  - [x] Loading skeleton, empty state, error state with retry
- [ ] **P1.3 Layered Consensus View**: Per-tier breakdown
  - [ ] Tier accordion showing each source class
  - [ ] Within-tier conflict score
  - [ ] Cross-tier conflict visualization
  - [ ] "Overall winner" highlight with confidence
- [ ] **P1.4 Quarantine Admin Panel**: Content defense visibility
  - [ ] Pending queue with reason, timestamp, quality score
  - [ ] Approve/Reject buttons with confirmation
  - [ ] Filter by reason (duplicate, spam, untrusted high-confidence)
  - [ ] Metrics: pending count, approved/rejected today
- [ ] **P1.5 Circuit Breaker Status**: Trust & safety dashboard
  - [ ] Blocked agents list with failure count, retry time
  - [ ] State badges: OPEN (red), HALF_OPEN (yellow), CLOSED (green)
  - [ ] Manual reset button for admin override
  - [ ] Historical trip events
- [ ] **P1.6 Audit Trail Browser**: Query provenance explorer
  - [ ] Recent queries list with agent, timestamp, subject
  - [ ] Drilldown: contributing assertions, weights, winner
  - [ ] Filter by agent, time range, subject
  - [ ] Export to JSON/CSV
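
The three breaker states that P1.5 surfaces as badges follow the usual circuit-breaker state machine. The sketch below is illustrative only and is not taken from the StemeDB codebase: `CircuitBreaker`, its failure threshold, and its cool-down are hypothetical names and values, and only the state names and badge colors come from the list above.

```rust
use std::time::{Duration, Instant};

/// Breaker states, named after the P1.5 status badges.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical per-agent breaker; threshold and cool-down are assumed knobs.
#[derive(Debug)]
pub struct CircuitBreaker {
    state: BreakerState,
    failure_count: u32,
    failure_threshold: u32,
    cool_down: Duration,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cool_down: Duration) -> Self {
        Self {
            state: BreakerState::Closed,
            failure_count: 0,
            failure_threshold,
            cool_down,
            opened_at: None,
        }
    }

    /// State as of `now`: an OPEN breaker moves to HALF_OPEN once the cool-down has elapsed.
    pub fn state_at(&mut self, now: Instant) -> BreakerState {
        if let (BreakerState::Open, Some(opened)) = (self.state, self.opened_at) {
            if now.duration_since(opened) >= self.cool_down {
                self.state = BreakerState::HalfOpen;
            }
        }
        self.state
    }

    /// Record a failed request from the agent.
    pub fn record_failure(&mut self, now: Instant) {
        match self.state_at(now) {
            BreakerState::Open => {} // already blocked, nothing to count
            BreakerState::HalfOpen => self.trip(now), // the probe request failed
            BreakerState::Closed => {
                self.failure_count += 1;
                if self.failure_count >= self.failure_threshold {
                    self.trip(now);
                }
            }
        }
    }

    /// Record a successful request; a successful HALF_OPEN probe closes the breaker,
    /// and a success while CLOSED clears the consecutive-failure count.
    pub fn record_success(&mut self, now: Instant) {
        if self.state_at(now) != BreakerState::Open {
            self.reset();
        }
    }

    /// Manual admin override: force the breaker back to CLOSED.
    pub fn reset(&mut self) {
        self.state = BreakerState::Closed;
        self.failure_count = 0;
        self.opened_at = None;
    }

    fn trip(&mut self, now: Instant) {
        self.state = BreakerState::Open;
        self.opened_at = Some(now);
    }
}

/// Badge label and color for the dashboard, as listed in P1.5.
pub fn badge(state: BreakerState) -> (&'static str, &'static str) {
    match state {
        BreakerState::Open => ("OPEN", "red"),
        BreakerState::HalfOpen => ("HALF_OPEN", "yellow"),
        BreakerState::Closed => ("CLOSED", "green"),
    }
}

fn main() {
    let t0 = Instant::now();
    let mut breaker = CircuitBreaker::new(3, Duration::from_secs(30));
    for _ in 0..3 {
        breaker.record_failure(t0);
    }
    println!("{:?}", badge(breaker.state_at(t0)));
}
```

Passing `now` explicitly keeps the state machine deterministic, which makes the "retry time" column easy to test without sleeping.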
### Pilot-2: Demo Data Seeder (Week 2)
> **Deliverable:** Pre-signed realistic demo data using Go SDK
- [ ] **P2.1 Demo Keypair Management**: Reproducible demo keys
  - [ ] 5 demo agents with known keys (FDA_AGENT, CLINICAL_AGENT, REDDIT_AGENT, etc.)
  - [ ] Keys stored in `demo/keys/` (gitignored for real deploys)
  - [ ] Go SDK script: `cmd/demo-seed/main.go`
- [ ] **P2.2 Conflict Scenarios**: Pre-built disagreements
  - [ ] Gastroparesis risk: FDA (0.2%) vs Reddit (high)
  - [ ] Cardiovascular benefit: Trial A vs Trial B (conflicting results)
  - [ ] Nausea rate: Label vs real-world evidence gap
  - [ ] 500+ total assertions across 10 subjects
- [ ] **P2.3 Retractable Sources**: Set up cascade demo
  - [ ] One "landmark study" source that will be retracted
  - [ ] 50+ assertions citing this source
  - [ ] Script to mark source as quarantined
  - [ ] Script to show impact
- [ ] **P2.4 Historical Data**: Time-travel demo data
  - [ ] Assertions with timestamps spanning 12 months
  - [ ] Knowledge state that changed (FDA label update scenario)
  - [ ] Before/after demonstration
### Pilot-3: Impact Analysis (Week 3)
> **Deliverable:** Automatic cascade when source is retracted
- [ ] **P3.1 Impact Analysis Endpoint**: `GET /v1/sources/{hash}/impact`
  - [ ] Returns all assertions citing this source
  - [ ] Returns count of queries that used those assertions
  - [ ] Returns list of affected agents
  - [ ] Implementation in `stemedb-api/src/handlers/source_registry/`
- [ ] **P3.2 Cascade Flagging**: Automatic downstream impact
  - [ ] When source status → quarantined, flag citing assertions
  - [ ] New field on assertion index: `source_status: Option<SourceStatus>`
  - [ ] Queries can filter by `exclude_quarantined_sources=true`
  - [ ] Alternative: `SourceAwareLens` that checks source status at resolution
- [ ] **P3.3 Impact Dashboard Widget**: Visualize the cascade
  - [ ] Source status change UI (Active → Quarantined)
  - [ ] Animated "impact ripple" showing affected count
  - [ ] List of impacted queries with timestamp
  - [ ] "Remediation status" tracking
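
The cascade in P3.1 and P3.2 can be sketched as an in-memory index. This is a minimal illustration under stated assumptions, not the planned implementation: `AssertionIndex`, `IndexedAssertion`, and `ImpactReport` are hypothetical types, the source hashes are placeholders, and the per-query usage count from P3.1 is left out because it depends on the audit trail. Only `source_status: Option<SourceStatus>` and `exclude_quarantined_sources` are taken from the roadmap.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceStatus {
    Active,
    Quarantined,
}

/// Hypothetical index entry; only `source_status` is named in the roadmap.
#[derive(Debug, Clone)]
pub struct IndexedAssertion {
    pub id: u64,
    pub source_hash: String,
    pub agent: String,
    /// `None` means no status recorded yet (an assumption about why the field is optional).
    pub source_status: Option<SourceStatus>,
}

/// Subset of what P3.1 describes: citing assertions and affected agents.
#[derive(Debug, PartialEq, Eq)]
pub struct ImpactReport {
    pub assertion_ids: Vec<u64>,
    pub affected_agents: Vec<String>,
}

#[derive(Default)]
pub struct AssertionIndex {
    entries: Vec<IndexedAssertion>,
}

impl AssertionIndex {
    pub fn insert(&mut self, id: u64, source_hash: &str, agent: &str) {
        self.entries.push(IndexedAssertion {
            id,
            source_hash: source_hash.to_string(),
            agent: agent.to_string(),
            source_status: None,
        });
    }

    /// Cascade step: stamp the new status on every assertion citing the source.
    /// Returns how many assertions were flagged.
    pub fn set_source_status(&mut self, source_hash: &str, status: SourceStatus) -> usize {
        let mut flagged = 0;
        for entry in self.entries.iter_mut().filter(|e| e.source_hash == source_hash) {
            entry.source_status = Some(status);
            flagged += 1;
        }
        flagged
    }

    /// Query-side filter mirroring `exclude_quarantined_sources=true`.
    pub fn query(&self, exclude_quarantined_sources: bool) -> Vec<&IndexedAssertion> {
        self.entries
            .iter()
            .filter(|e| {
                !(exclude_quarantined_sources
                    && e.source_status == Some(SourceStatus::Quarantined))
            })
            .collect()
    }

    /// Impact report: ids of citing assertions plus the sorted, de-duplicated agents.
    pub fn impact(&self, source_hash: &str) -> ImpactReport {
        let citing: Vec<&IndexedAssertion> = self
            .entries
            .iter()
            .filter(|e| e.source_hash == source_hash)
            .collect();
        let agents: BTreeSet<String> = citing.iter().map(|e| e.agent.clone()).collect();
        ImpactReport {
            assertion_ids: citing.iter().map(|e| e.id).collect(),
            affected_agents: agents.into_iter().collect(),
        }
    }
}

fn main() {
    let mut index = AssertionIndex::default();
    index.insert(1, "src-a", "FDA_AGENT");
    index.insert(2, "src-a", "CLINICAL_AGENT");
    index.insert(3, "src-b", "REDDIT_AGENT");
    let flagged = index.set_source_status("src-a", SourceStatus::Quarantined);
    println!("flagged {} assertions: {:?}", flagged, index.impact("src-a"));
}
```

Stamping the status at write time keeps reads cheap. The `SourceAwareLens` alternative listed in P3.2 would instead look the status up at resolution time, trading read cost for never holding a stale flag.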
### Pilot-4: Production Hardening (Week 4)
> **Deliverable:** Load testing, authentication, backup documentation
- [ ] **P4.1 Load Testing**: Prove performance claims
  - [ ] k6 or wrk load test scripts
  - [ ] Benchmark: 10K assertions baseline latency
  - [ ] Benchmark: 1K writes/sec sustained for 1 hour
  - [ ] Benchmark: 100 concurrent readers, no degradation
  - [ ] Document results in `uat/production-readiness/`
- [ ] **P4.2 API Authentication**: Basic security for pilot
  - [ ] API key middleware (`X-API-Key` header)
  - [ ] Per-key rate limiting (separate from per-agent quota)
  - [ ] Admin keys vs read-only keys
  - [ ] Key management: `POST /v1/admin/api-keys`
- [ ] **P4.3 Backup/Restore Documentation**: DR story
  - [ ] Document WAL-based recovery procedure
  - [ ] Script: `scripts/backup-stemedb.sh` (snapshot + WAL archive)
  - [ ] Script: `scripts/restore-stemedb.sh` (restore from backup)
  - [ ] Test restore procedure, document in UAT
- [ ] **P4.4 Prometheus Metrics**: Observability baseline
  - [ ] `GET /metrics` endpoint in Prometheus text format
  - [ ] Key metrics: `assertions_total`, `queries_total`, `query_latency_seconds`
  - [ ] Trust metrics: `quarantine_pending`, `circuit_breakers_open`
  - [ ] Basic Grafana dashboard template
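
As a rough illustration of what the P4.4 `/metrics` body could look like, the sketch below renders counters and gauges in the Prometheus text exposition format using only the standard library. `Metric`, `MetricKind`, and `render` are hypothetical helpers, the help strings and values are made up, and `query_latency_seconds` is omitted because a histogram needs bucket lines. The `metrics` and `metrics-exporter-prometheus` crates named under 8B.1 would normally produce this output.

```rust
use std::fmt::Write;

/// Metric types covered by this sketch (histograms are intentionally left out).
pub enum MetricKind {
    Counter,
    Gauge,
}

/// Hypothetical metric sample; names below come from P4.4, values are invented.
pub struct Metric {
    pub name: &'static str,
    pub help: &'static str,
    pub kind: MetricKind,
    pub value: f64,
}

/// Render samples as Prometheus text exposition: HELP line, TYPE line, then the sample.
pub fn render(metrics: &[Metric]) -> String {
    let mut out = String::new();
    for m in metrics {
        let kind = match m.kind {
            MetricKind::Counter => "counter",
            MetricKind::Gauge => "gauge",
        };
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(out, "# HELP {} {}", m.name, m.help).unwrap();
        writeln!(out, "# TYPE {} {}", m.name, kind).unwrap();
        writeln!(out, "{} {}", m.name, m.value).unwrap();
    }
    out
}

fn main() {
    let body = render(&[
        Metric {
            name: "assertions_total",
            help: "Assertions ingested.",
            kind: MetricKind::Counter,
            value: 500.0,
        },
        Metric {
            name: "quarantine_pending",
            help: "Assertions awaiting review.",
            kind: MetricKind::Gauge,
            value: 3.0,
        },
        Metric {
            name: "circuit_breakers_open",
            help: "Agents currently blocked.",
            kind: MetricKind::Gauge,
            value: 1.0,
        },
    ]);
    print!("{}", body);
}
```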
### Pilot-5: Operational Readiness (Week 5)
> **Deliverable:** Runbooks, monitoring, reference architecture
- [ ] **P5.1 Operational Runbooks**: Common procedures documented
  - [ ] "Server won't start" troubleshooting
  - [ ] "High query latency" investigation
  - [ ] "Quarantine queue overflow" handling
  - [ ] "Circuit breaker stuck open" resolution
  - [ ] "Restore from backup" step-by-step
- [ ] **P5.2 Reference Architecture**: Deployment guide
  - [ ] Single-node pilot deployment diagram
  - [ ] Network requirements (ports, firewall rules)
  - [ ] Reverse proxy configuration (nginx/envoy with TLS)
  - [ ] Resource sizing guide (CPU, memory, disk)
- [ ] **P5.3 Pilot Success Criteria Document**: Definition of done
  - [ ] Sub-second query latency at 10K assertions: measured
  - [ ] Successful conflict detection on known contradictory studies: demonstrated
  - [ ] Complete audit trail export for mock regulatory review: tested
  - [ ] Source retraction workflow: exercised
- [ ] **P5.4 Executive Demo Script Validation**: End-to-end rehearsal
  - [ ] Run through `amazement-demo-2.md` with real dashboard
  - [ ] Time each segment (target: 20 minutes total)
  - [ ] Anticipate and document answers to 10 tough questions
  - [ ] Record demo video for async sharing
  - [ ] All 5 Amazement Moments demonstrable with real data (not mockups)
### Pilot Prep Deliverables Summary
| Week | Deliverable | Owner | Acceptance Criteria |
|------|-------------|-------|---------------------|
| 1-2 | `stemedb-dashboard` | Frontend | 6 functional panels, connects to API |
| 2 | `demo-seed` | SDK | 500+ signed assertions, retractable sources |
| 3 | Impact Analysis | Backend | `/v1/sources/{hash}/impact` returns cascade |
| 4 | Load Test Results | QA | 10K assertions, 1K writes/sec documented |
| 4 | API Authentication | Backend | API keys work, rate limiting functional |
| 4 | Backup/Restore | Ops | Documented and tested procedure |
| 4 | Metrics Endpoint | Backend | `/metrics` returns Prometheus format |
| 5 | Runbooks | Ops | 5 runbooks in `docs/runbooks/` |
| 5 | Reference Architecture | Docs | Deployment guide complete |
| 5 | Demo Rehearsal | All | 20-minute demo runs smoothly |
---
## Phase 8B-C: Production Observability (Planned)
> **Blocked by:** Pilot Prep (needs a real production deployment first)
### 8B. Observability
- [ ] **8B.1 Distributed Metrics**: Per-node, per-range, per-agent metrics.
  - `sync_lag_seconds{peer}`, `merkle_diff_size{peer}`, `convergence_latency_p99`
  - `assertions_total{node}`, `writes_per_second{node}`
  - Crate: `metrics` + `metrics-exporter-prometheus`
- [ ] **8B.2 Admin Dashboard**: Cluster health visibility.
  - `GET /v1/admin/cluster` → node list, range assignments, leader locations
  - `GET /v1/admin/ranges` → range sizes, split/merge history
  - `POST /v1/admin/sync` → force anti-entropy sync
### 8C. Production Hardening
- [ ] **8C.1 Snapshot/Restore**: Fast replica bootstrap.
  - Serialize full node state as snapshot
  - New nodes join by restoring snapshot + replaying recent WAL
- [ ] **8C.2 Backpressure**: Don't overwhelm slow nodes.
  - Track per-peer sync queue depth
  - Throttle gossip to slow peers
- [ ] **8C.3 Geo-Distribution**: Multi-region deployment.
  - Regional clusters with CRDT federation
  - Locality-aware reads
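
One way to read 8C.2 is as a per-peer throttle driven by queue depth. The sketch below is an assumption-laden illustration rather than a design decision: `GossipThrottle`, the high watermark, and the interval bounds are hypothetical, and the linear scaling rule is only one of several reasonable policies.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical throttle: tracks per-peer sync queue depth and stretches the
/// gossip interval for peers whose queue exceeds a high watermark.
pub struct GossipThrottle {
    queue_depth: HashMap<String, usize>,
    high_watermark: usize,
    base_interval: Duration,
    max_interval: Duration,
}

impl GossipThrottle {
    pub fn new(high_watermark: usize, base_interval: Duration, max_interval: Duration) -> Self {
        Self {
            queue_depth: HashMap::new(),
            high_watermark: high_watermark.max(1), // avoid dividing by zero below
            base_interval,
            max_interval,
        }
    }

    /// Messages queued for a peer but not yet acknowledged.
    pub fn enqueued(&mut self, peer: &str, count: usize) {
        *self.queue_depth.entry(peer.to_string()).or_insert(0) += count;
    }

    /// The peer acknowledged `count` messages.
    pub fn acked(&mut self, peer: &str, count: usize) {
        if let Some(depth) = self.queue_depth.get_mut(peer) {
            *depth = depth.saturating_sub(count);
        }
    }

    pub fn depth(&self, peer: &str) -> usize {
        self.queue_depth.get(peer).copied().unwrap_or(0)
    }

    /// Base interval for healthy peers; otherwise scaled by whole multiples of
    /// the watermark and capped at `max_interval`.
    pub fn gossip_interval(&self, peer: &str) -> Duration {
        let depth = self.depth(peer);
        if depth <= self.high_watermark {
            return self.base_interval;
        }
        let factor = (depth / self.high_watermark).min(u32::MAX as usize) as u32;
        self.base_interval.saturating_mul(factor).min(self.max_interval)
    }
}

fn main() {
    let mut throttle =
        GossipThrottle::new(100, Duration::from_millis(100), Duration::from_secs(2));
    throttle.enqueued("node-b", 500);
    println!("node-b interval: {:?}", throttle.gossip_interval("node-b"));
}
```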
---
## Phase 9: The Bunker (Disaster Planning)
> **Goal:** Survive the worst. Backup, restore, recover from corruption, comply with regulations.
### 9A. Backup & Cold Storage
- [ ] **9A.1 Full Cluster Backup**: Point-in-time snapshot to S3/GCS.
- [ ] **9A.2 Point-in-Time Recovery (PITR)**: Restore to any HLC timestamp.
- [ ] **9A.3 Backup Verification**: Weekly automated restore tests.
### 9B. Data Corruption & Rollback
- [ ] **9B.1 Corruption Detection**: Deep validation before accepting gossip.
- [ ] **9B.2 Assertion Tombstones**: "Delete" in an append-only world.
- [ ] **9B.3 Cluster Rollback**: Batch tombstone generation for time ranges.
- [ ] **9B.4 Fork Recovery**: Heal split-brain after extended partition.
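
The tombstone idea behind 9B.2 and 9B.3 can be sketched with a toy append-only log: deleting appends a marker rather than removing data, and reads skip anything a marker points at. `AppendOnlyLog` and `Record` are hypothetical, and plain `u64` timestamps stand in for the HLC timestamps mentioned in 9A.2.

```rust
use std::collections::HashSet;

/// Hypothetical log record: either an assertion or a tombstone pointing at one.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Record {
    Assertion { id: u64, timestamp: u64 },
    Tombstone { target_id: u64 },
}

#[derive(Default)]
pub struct AppendOnlyLog {
    records: Vec<Record>,
}

impl AppendOnlyLog {
    pub fn append_assertion(&mut self, id: u64, timestamp: u64) {
        self.records.push(Record::Assertion { id, timestamp });
    }

    /// Total records, tombstones included; this number never shrinks.
    pub fn record_count(&self) -> usize {
        self.records.len()
    }

    fn tombstoned(&self) -> HashSet<u64> {
        self.records
            .iter()
            .filter_map(|r| match r {
                Record::Tombstone { target_id } => Some(*target_id),
                Record::Assertion { .. } => None,
            })
            .collect()
    }

    /// Ids of assertions that no tombstone points at, in log order.
    pub fn live_ids(&self) -> Vec<u64> {
        let dead = self.tombstoned();
        self.records
            .iter()
            .filter_map(|r| match r {
                Record::Assertion { id, .. } if !dead.contains(id) => Some(*id),
                _ => None,
            })
            .collect()
    }

    /// Batch rollback: append one tombstone per live assertion whose timestamp
    /// falls in `from..=to`. Returns the number of tombstones written.
    pub fn rollback_range(&mut self, from: u64, to: u64) -> usize {
        let dead = self.tombstoned();
        let targets: Vec<u64> = self
            .records
            .iter()
            .filter_map(|r| match r {
                Record::Assertion { id, timestamp }
                    if (from..=to).contains(timestamp) && !dead.contains(id) =>
                {
                    Some(*id)
                }
                _ => None,
            })
            .collect();
        for &target_id in &targets {
            self.records.push(Record::Tombstone { target_id });
        }
        targets.len()
    }
}

fn main() {
    let mut log = AppendOnlyLog::default();
    log.append_assertion(1, 10);
    log.append_assertion(2, 20);
    log.append_assertion(3, 30);
    let written = log.rollback_range(15, 30);
    println!("tombstones written: {}, live: {:?}", written, log.live_ids());
}
```

Because nothing is removed, space is only reclaimed later by compaction (9D.1).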
### 9C. Compliance & Legal
- [ ] **9C.1 GDPR Right to Erasure**: Cryptographic erasure via per-agent keys.
- [ ] **9C.2 Data Retention Policies**: Per-subject/predicate retention rules.
- [ ] **9C.3 Audit Trail for Compliance**: Immutable admin action log.
- [ ] **9C.4 SOC 2 Type II Certification**: External audit and certification.
  - Gap assessment and remediation
  - Evidence collection automation
  - Auditor engagement
  - Target: Q3 2026
### 9D. Storage Management
- [ ] **9D.1 Compaction**: Reclaim space from tombstoned data.
- [ ] **9D.2 Tiered Storage**: Hot/warm/cold based on access patterns.
- [ ] **9D.3 Storage Quotas**: Per-agent and cluster-wide limits.
### 9E. Incident Response
- [ ] **9E.1 Alerting & Escalation**: PagerDuty/Slack integration.
- [ ] **9E.2 Operational Runbooks**: Documented procedures for common failures.
- [ ] **9E.3 Chaos Engineering**: Monthly "game days" with controlled failures.
### 9F. Security Hardening
- [ ] **9F.1 TLS Everywhere**: mTLS for node-to-node traffic.
- [ ] **9F.2 Encryption at Rest**: WAL and KV store encryption.
- [ ] **9F.3 Node Authentication**: Ed25519 keypair identity, signed cluster join.
---
## Architecture Overview
```
Write Path (Spine):              Read Path (Cortex):
[Agent] -> [Ingestion]           [Agent] <- [Lens Engine]
                |                                 |
                v                                 |
           [WAL/Fsync]                     [Index Lookup]
                |                                 |
                v                                 |
           [KV Store] <---------------------------+
```
## Port Scheme (181XX)
| Offset | Service | Default | Env Var |
|--------|---------|---------|---------|
| +0 | HTTP API | 18180 | `STEMEDB_BIND_ADDR` |
| +1 | Cluster Gateway | 18181 | `STEMEDB_NODE_API_ADDR` |
| +2 | Cluster RPC | 18182 | `STEMEDB_NODE_RPC_ADDR` |
| +3 | SWIM Gossip | 18183 | via `SwimConfig` |
| +4 | Metrics | 18184 | (reserved) |
| +5 | Admin | 18185 | (reserved) |
| +6 | Latent Signal | 18186 | — |
| +7 | Community App | 18187 | — |
| +8 | Admin Dashboard | 18188 | — |
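
The table reads as a base port plus a per-service offset. A minimal sketch of that arithmetic follows; the `Service` enum and `default_port` are hypothetical names for illustration, while the offsets and the 18180 base are taken from the table.

```rust
/// Base of the 181XX scheme (offset +0 is the HTTP API).
pub const BASE_PORT: u16 = 18180;

/// Services in the port table; each discriminant is that service's offset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Service {
    HttpApi = 0,
    ClusterGateway = 1,
    ClusterRpc = 2,
    SwimGossip = 3,
    Metrics = 4,
    Admin = 5,
    LatentSignal = 6,
    CommunityApp = 7,
    AdminDashboard = 8,
}

/// Default port for a service: base plus offset.
pub fn default_port(service: Service) -> u16 {
    BASE_PORT + service as u16
}

fn main() {
    for service in [Service::HttpApi, Service::SwimGossip, Service::AdminDashboard] {
        println!("{:?} -> {}", service, default_port(service));
    }
}
```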
## Crates
| Crate | Purpose | Status |
|-------|---------|--------|
| `stemedb-core` | Assertion, LifecycleStage, MaterializedView, types, signing | ✅ |
| `stemedb-wal` | Write-ahead log with crash recovery | ✅ |
| `stemedb-storage` | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore | ✅ |
| `stemedb-ingest` | Ingestion pipeline, signature verification, ContentDefenseLayer | ✅ |
| `stemedb-query` | Query engine, Materializer for O(1) MV reads | ✅ |
| `stemedb-lens` | Lenses (Recency, Consensus, Authority, Skeptic, Layered, etc.) | ✅ |
| `stemedb-api` | HTTP API with axum + utoipa OpenAPI docs | ✅ |
| `stemedb-sim` | Simulation for testing the pipeline | ✅ |
| `stemedb-merkle` | BLAKE3 Merkle tree for diff detection | ✅ |
| `stemedb-rpc` | gRPC services for node-to-node communication | ✅ |
| `stemedb-sync` | Merkle sync, gossip broadcast, anti-entropy | ✅ |
| `stemedb-cluster` | Cluster membership (SWIM), sharding, gateway | ✅ |
| `stemedb-ontology` | Domain definitions (Pharma), subject builders, medical extractors | ✅ |
| `stemedb-chaos` | Chaos testing infrastructure | ✅ |
| `stemedb-dashboard` | Admin dashboard (React/Next.js app in `applications/stemedb-dashboard/`, not a Rust crate) | 🎯 In Progress (scaffold complete) |
## SDKs
| SDK | Purpose | Status |
|-----|---------|--------|
| `sdk/go/steme` | Go HTTP client with Ed25519 signing and fluent builders | ✅ |
| `sdk/go/adk` | ADK-Go tools and callbacks for AI agents | ✅ |
## Specialized Agents
| Domain | Agent | When to use |
|--------|-------|-------------|
| **Product Vision** | `episteme-product-visionary` | Use cases, "why not Postgres?", product-market fit |
| **Pilot Prep** | `enterprise-skeptic-buyer` | Pressure-test demos, find gaps, prepare for tough questions |
| General Rust | `primary-developer` | Feature implementation, refactoring |
| Code Quality | `rust-quality-engineer` | Reviews, test coverage, clippy |
| Storage | `storage-engine-architect` | WAL, LSM, crash recovery |
| Graph Engine | `rust-graph-engine-architect` | Lock-free structures, cache optimization |
| Defensive | `defensive-systems-architect` | Rate limiting, circuit breakers, hostile input |
| Distributed | `distributed-systems-engineer` | CRDT replication, Raft coordination, Merkle sync |
| Lenses | `stemedb-lens-architect` | Query resolution, ranking algorithms |
| Planning | `stemedb-planner` | Milestone planning, roadmap |
---
## Quick Reference
```bash
# Build
cargo build --workspace
# Test
cargo test --workspace
# Lint (must pass before commit)
cargo clippy --workspace -- -D warnings
cargo fmt --check
# Run API server
cargo run --bin stemedb-api
# Run demo script
./scripts/demo-consumer-health.sh
```
---
## Related Documents
- [CLAUDE.md](./CLAUDE.md) — AI assistant instructions and project rules
- [roadmap-archive.md](./roadmap-archive.md) — Completed phases 1-7 detail
- [docs/demo/pilot/amazement-demo.md](./docs/demo/pilot/amazement-demo.md) — Technical demo script
- [docs/demo/pilot/amazement-demo-2.md](./docs/demo/pilot/amazement-demo-2.md) — Executive demo script
- [uat/production-readiness/README.md](./uat/production-readiness/README.md) — Production verification checklist
- [.claude/agents/enterprise-skeptic-buyer.md](./.claude/agents/enterprise-skeptic-buyer.md) — Dr. Sarah Chen persona