Episteme (StemeDB) Roadmap

Goal: Build the "Git for Truth" substrate for autonomous AI research.
Current Focus: Enterprise Pilot Preparation
Target Vertical: BioTech/Pharma ("The Living Review")
Endgame: Distributed multi-writer cluster for millions of concurrent agents

Infrastructure Status: Phases 1-7 complete | Phase 8A (Chaos) complete
Pilot Status: Consumer Health MVP complete | Enterprise Demo in progress

Archive: For completed phases 1-7, see roadmap-archive.md


Current Status

| Phase | Status | Summary |
|-------|--------|---------|
| 1-7 | Complete | Core infrastructure, distributed cluster, trust & safety |
| 8A | Complete | Chaos testing, Jepsen-style verification |
| MVP | Complete | Consumer Health demo with real FDA data |
| Pilot Prep | 🎯 In Progress | Dashboard, impact analysis, production hardening |
| 8B-C | Planned | Observability, geo-distribution |
| 9 | Planned | Disaster recovery, compliance, storage management |

🎯 Phase: Enterprise Pilot Preparation (CURRENT)

Goal: Make the pilot bulletproof. Amaze enterprise decision makers.
Timeline: 5 weeks
Success Criteria: Dr. Sarah Chen (skeptical VP of Data Infrastructure) fights her CFO for budget

The 5 Amazement Moments We Must Deliver

| # | Moment | Current State | Gap |
|---|--------|---------------|-----|
| 1 | Contradictions visible with confidence scores | Complete | Dashboard scaffold + Skeptic Query UI |
| 2 | Cascade invalidation when source retracted | ⚠️ Manual | No automatic cascade |
| 3 | Full FDA-ready audit trail | API ready | Dashboard scaffold ready, UI pending (P1.6) |
| 4 | Point-in-time queries + decay | API ready | No timeline UI |
| 5 | Malicious agent blocked by circuit breaker | API ready | Dashboard scaffold ready, UI pending (P1.5) |
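
Moment 1 depends on turning a set of weighted claims into a single conflict score and a status badge. The roadmap does not define the formula, so the following is only a sketch under stated assumptions: the score is taken to be the share of total weight that does not back the leading value, and the 0.3 cut-off between AGREED and CONTESTED is invented for illustration. The function names `conflict_score` and `status` are hypothetical.

```rust
use std::collections::HashMap;

/// Badge names taken from the P1.2 task list.
#[derive(Debug, PartialEq, Eq)]
pub enum Status {
    Unanimous,
    Agreed,
    Contested,
}

/// ASSUMPTION: conflict = share of total weight not behind the leading value.
/// Each claim is a (value, weight) pair; weights are expected to be non-negative.
pub fn conflict_score(claims: &[(&str, f64)]) -> f64 {
    let mut by_value: HashMap<&str, f64> = HashMap::new();
    for (value, weight) in claims {
        *by_value.entry(*value).or_insert(0.0) += *weight;
    }
    let total: f64 = by_value.values().sum();
    if total <= 0.0 {
        return 0.0; // no evidence, nothing to disagree about
    }
    let leader = by_value.values().cloned().fold(0.0_f64, f64::max);
    1.0 - leader / total
}

/// ASSUMPTION: the 0.3 threshold is illustrative, not a StemeDB constant.
pub fn status(score: f64) -> Status {
    if score == 0.0 {
        Status::Unanimous
    } else if score < 0.3 {
        Status::Agreed
    } else {
        Status::Contested
    }
}
```

With this definition, two equally weighted claims that disagree score 0.5 and would be shown as CONTESTED, while claims that all carry the same value score 0.0.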

Pilot-1: Demo Dashboard (Week 1-2)

Deliverable: React admin dashboard that makes the API visual

  • P1.1 Dashboard Scaffold: Next.js + shadcn/ui project setup

    • Project structure: applications/stemedb-dashboard/
    • API client for StemeDB endpoints (src/lib/api/client.ts)
    • Authentication scaffold (API key header)
    • Dark mode (default), responsive layout with collapsible sidebar
    • shadcn/ui components: button, card, badge, input, separator, tabs
    • Live API status indicator (polls /health every 30s)
    • Port 18188, builds and runs successfully
  • P1.2 Skeptic Query Visualization: Show contradictions graphically

    • Query builder: subject, predicate inputs
    • Conflict score gauge (0.0-1.0 with color coding)
    • Claims table with weight bars, source tier badges
    • "CONTESTED" / "AGREED" / "UNANIMOUS" status badges
    • Expandable claim rows with source details, agents, provenance hashes
    • Loading skeleton, empty state, error state with retry
  • P1.3 Layered Consensus View: Per-tier breakdown

    • Tier accordion showing each source class
    • Within-tier conflict score
    • Cross-tier conflict visualization
    • "Overall winner" highlight with confidence
  • P1.4 Quarantine Admin Panel: Content defense visibility

    • Pending queue with reason, timestamp, quality score
    • Approve/Reject buttons with confirmation
    • Filter by reason (duplicate, spam, untrusted high-confidence)
    • Metrics: pending count, approved/rejected today
  • P1.5 Circuit Breaker Status: Trust & safety dashboard

    • Blocked agents list with failure count, retry time
    • State badges: OPEN (red), HALF_OPEN (yellow), CLOSED (green)
    • Manual reset button for admin override
    • Historical trip events
  • P1.6 Audit Trail Browser: Query provenance explorer

    • Recent queries list with agent, timestamp, subject
    • Drilldown: contributing assertions, weights, winner
    • Filter by agent, time range, subject
    • Export to JSON/CSV
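
The three state badges in P1.5 correspond to the usual circuit breaker state machine. As a rough illustration of what the panel would be displaying, here is a minimal sketch of that pattern. It is not the StemeDB implementation: the `CircuitBreaker` type, its fields, and the threshold and retry values are all assumptions.

```rust
use std::time::{Duration, Instant};

/// States named after the badges in P1.5.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakerState {
    Closed,   // green: requests flow normally
    Open,     // red: agent is blocked
    HalfOpen, // yellow: one trial request is allowed
}

/// Hypothetical per-agent breaker; thresholds are illustrative.
pub struct CircuitBreaker {
    state: BreakerState,
    failures: u32,
    failure_threshold: u32,
    retry_after: Duration,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, retry_after: Duration) -> Self {
        Self {
            state: BreakerState::Closed,
            failures: 0,
            failure_threshold,
            retry_after,
            opened_at: None,
        }
    }

    /// Returns true if a request from this agent may proceed at `now`.
    pub fn allow(&mut self, now: Instant) -> bool {
        match self.state {
            BreakerState::Closed | BreakerState::HalfOpen => true,
            BreakerState::Open => {
                let opened = self.opened_at.expect("an open breaker records its trip time");
                if now.saturating_duration_since(opened) >= self.retry_after {
                    // Retry window elapsed: let a trial request through.
                    self.state = BreakerState::HalfOpen;
                    true
                } else {
                    false
                }
            }
        }
    }

    /// A success closes the breaker; this also models the manual admin reset.
    pub fn record_success(&mut self) {
        self.failures = 0;
        self.state = BreakerState::Closed;
        self.opened_at = None;
    }

    /// A failure during a trial, or too many failures overall, trips the breaker.
    pub fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        if self.state == BreakerState::HalfOpen || self.failures >= self.failure_threshold {
            self.state = BreakerState::Open;
            self.opened_at = Some(now);
        }
    }

    pub fn state(&self) -> BreakerState {
        self.state
    }
}
```

The "failure count" and "retry time" columns in the blocked agents list would map onto `failures` and `opened_at + retry_after` in a structure like this one.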

Pilot-2: Demo Data Seeder (Week 2)

Deliverable: Pre-signed realistic demo data using Go SDK

  • P2.1 Demo Keypair Management: Reproducible demo keys

    • 5 demo agents with known keys (FDA_AGENT, CLINICAL_AGENT, REDDIT_AGENT, etc.)
    • Keys stored in demo/keys/ (gitignored for real deploys)
    • Go SDK script: cmd/demo-seed/main.go
  • P2.2 Conflict Scenarios: Pre-built disagreements

    • Gastroparesis risk: FDA (0.2%) vs Reddit (high)
    • Cardiovascular benefit: Trial A vs Trial B (conflicting results)
    • Nausea rate: Label vs real-world evidence gap
    • 500+ total assertions across 10 subjects
  • P2.3 Retractable Sources: Set up cascade demo

    • One "landmark study" source that will be retracted
    • 50+ assertions citing this source
    • Script to mark source as quarantined
    • Script to show impact
  • P2.4 Historical Data: Time-travel demo data

    • Assertions with timestamps spanning 12 months
    • Knowledge state that changed (FDA label update scenario)
    • Before/after demonstration
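
The before/after demonstration in P2.4 amounts to resolving the same subject and predicate as of two different timestamps. The sketch below shows one simple way to do that over an append-only list, picking the newest assertion at or before the requested time. The `Assertion` struct here is a simplified stand-in (the real type lives in stemedb-core), and the millisecond timestamp, the field names, and `resolve_as_of` are assumptions.

```rust
/// Hypothetical, simplified assertion record for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub subject: String,
    pub predicate: String,
    pub value: String,
    /// ASSUMPTION: milliseconds since the Unix epoch.
    pub timestamp_ms: u64,
}

/// Returns the newest assertion for (subject, predicate) recorded at or
/// before `as_of_ms`, or None if nothing was known yet at that time.
pub fn resolve_as_of<'a>(
    log: &'a [Assertion],
    subject: &str,
    predicate: &str,
    as_of_ms: u64,
) -> Option<&'a Assertion> {
    log.iter()
        .filter(|a| a.subject == subject && a.predicate == predicate)
        .filter(|a| a.timestamp_ms <= as_of_ms)
        .max_by_key(|a| a.timestamp_ms)
}
```

Querying with a timestamp before a label update returns the old value, and querying after it returns the new one, which is the whole of the time-travel demo in miniature. A production lens would also weigh source tier and decay rather than recency alone.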

Pilot-3: Impact Analysis (Week 3)

Deliverable: Automatic cascade when source is retracted

  • P3.1 Impact Analysis Endpoint: GET /v1/sources/{hash}/impact

    • Returns all assertions citing this source
    • Returns count of queries that used those assertions
    • Returns list of affected agents
    • Implementation in stemedb-api/src/handlers/source_registry/
  • P3.2 Cascade Flagging: Automatic downstream impact

    • When source status → quarantined, flag citing assertions
    • New field on assertion index: source_status: Option<SourceStatus>
    • Queries can filter by exclude_quarantined_sources=true
    • Alternative: SourceAwareLens that checks source status at resolution
  • P3.3 Impact Dashboard Widget: Visualize the cascade

    • Source status change UI (Active → Quarantined)
    • Animated "impact ripple" showing affected count
    • List of impacted queries with timestamp
    • "Remediation status" tracking
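
To make the P3.1 and P3.2 tasks concrete, the following sketch flags every assertion that cites a quarantined source and returns a small impact summary. Only the `source_status: Option<SourceStatus>` field and the `exclude_quarantined_sources` filter come from the roadmap. The `IndexedAssertion` and `Impact` types, their other fields, and both function names are assumptions, and a real implementation would work against the assertion index rather than a slice.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceStatus {
    Active,
    Quarantined,
}

/// Hypothetical index entry; only `source_status` is named in the roadmap.
#[derive(Debug, Clone)]
pub struct IndexedAssertion {
    pub id: u64,
    pub agent: String,
    pub source_hash: String,
    pub source_status: Option<SourceStatus>,
}

/// Summary shaped after the P3.1 bullets (field names are assumptions).
#[derive(Debug, PartialEq)]
pub struct Impact {
    pub assertion_ids: Vec<u64>,
    pub affected_agents: BTreeSet<String>,
}

/// Flags every assertion citing `source_hash` and reports what was touched.
pub fn quarantine_source(index: &mut [IndexedAssertion], source_hash: &str) -> Impact {
    let mut impact = Impact {
        assertion_ids: Vec::new(),
        affected_agents: BTreeSet::new(),
    };
    for a in index.iter_mut().filter(|a| a.source_hash == source_hash) {
        a.source_status = Some(SourceStatus::Quarantined);
        impact.assertion_ids.push(a.id);
        impact.affected_agents.insert(a.agent.clone());
    }
    impact
}

/// Query-side filter mirroring `exclude_quarantined_sources=true`.
pub fn visible(index: &[IndexedAssertion], exclude_quarantined_sources: bool) -> Vec<&IndexedAssertion> {
    index
        .iter()
        .filter(|a| {
            !(exclude_quarantined_sources && a.source_status == Some(SourceStatus::Quarantined))
        })
        .collect()
}
```

This is the eager variant, where the flag is written at retraction time. The SourceAwareLens alternative mentioned in P3.2 would instead look up the source status when a query is resolved, trading write-time work for read-time work.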

Pilot-4: Production Hardening (Week 4)

Deliverable: Load testing, authentication, backup documentation

  • P4.1 Load Testing: Prove performance claims

    • k6 or wrk load test scripts
    • Benchmark: 10K assertions baseline latency
    • Benchmark: 1K writes/sec sustained for 1 hour
    • Benchmark: 100 concurrent readers, no degradation
    • Document results in uat/production-readiness/
  • P4.2 API Authentication: Basic security for pilot

    • API key middleware (X-API-Key header)
    • Per-key rate limiting (separate from per-agent quota)
    • Admin keys vs read-only keys
    • Key management: POST /v1/admin/api-keys
  • P4.3 Backup/Restore Documentation: DR story

    • Document WAL-based recovery procedure
    • Script: scripts/backup-stemedb.sh (snapshot + WAL archive)
    • Script: scripts/restore-stemedb.sh (restore from backup)
    • Test restore procedure, document in UAT
  • P4.4 Prometheus Metrics: Observability baseline

    • GET /metrics endpoint in Prometheus text exposition format
    • Key metrics: assertions_total, queries_total, query_latency_seconds
    • Trust metrics: quarantine_pending, circuit_breakers_open
    • Basic Grafana dashboard template
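
For P4.4, the response body of /metrics is plain text with a HELP line, a TYPE line, and a sample line per metric. The sketch below hand-rolls that output for unlabelled counters and gauges, using metric names from the list above. It is only an illustration of the wire format: the `Metric` struct, the help strings, and `render` are assumptions, and a real service would more likely use an exporter crate.

```rust
use std::fmt::Write;

pub enum MetricKind {
    Counter,
    Gauge,
}

/// Hypothetical in-memory metric with no labels.
pub struct Metric {
    pub name: &'static str,
    pub help: &'static str,
    pub kind: MetricKind,
    pub value: f64,
}

/// Renders metrics as HELP, TYPE, and sample lines, one metric after another.
pub fn render(metrics: &[Metric]) -> String {
    let mut out = String::new();
    for m in metrics {
        let kind = match m.kind {
            MetricKind::Counter => "counter",
            MetricKind::Gauge => "gauge",
        };
        // Writing into a String cannot fail, so unwrap is safe here.
        writeln!(out, "# HELP {} {}", m.name, m.help).unwrap();
        writeln!(out, "# TYPE {} {}", m.name, kind).unwrap();
        writeln!(out, "{} {}", m.name, m.value).unwrap();
    }
    out
}
```

Counters such as assertions_total only ever increase, while gauges such as quarantine_pending can move in both directions, which is why the two kinds are declared separately.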

Pilot-5: Operational Readiness (Week 5)

Deliverable: Runbooks, monitoring, reference architecture

  • P5.1 Operational Runbooks: Common procedures documented

    • "Server won't start" troubleshooting
    • "High query latency" investigation
    • "Quarantine queue overflow" handling
    • "Circuit breaker stuck open" resolution
    • "Restore from backup" step-by-step
  • P5.2 Reference Architecture: Deployment guide

    • Single-node pilot deployment diagram
    • Network requirements (ports, firewall rules)
    • Reverse proxy configuration (nginx/envoy with TLS)
    • Resource sizing guide (CPU, memory, disk)
  • P5.3 Pilot Success Criteria Document: Definition of done

    • Sub-second query latency at 10K assertions: measured
    • Successful conflict detection on known contradictory studies: demonstrated
    • Complete audit trail export for mock regulatory review: tested
    • Source retraction workflow: exercised
  • P5.4 Executive Demo Script Validation: End-to-end rehearsal

    • Run through amazement-demo-2.md with real dashboard
    • Time each segment (target: 20 minutes total)
    • Anticipate and document answers to 10 tough questions
    • Record demo video for async sharing
    • All 5 Amazement Moments demonstrable with real data (not mockups)

Pilot Prep Deliverables Summary

| Week | Deliverable | Owner | Acceptance Criteria |
|------|-------------|-------|---------------------|
| 1-2 | stemedb-dashboard | Frontend | 6 functional panels, connects to API |
| 2 | demo-seed | SDK | 500+ signed assertions, retractable sources |
| 3 | Impact Analysis | Backend | /v1/sources/{hash}/impact returns cascade |
| 4 | Load Test Results | QA | 10K assertions, 1K writes/sec documented |
| 4 | API Authentication | Backend | API keys work, rate limiting functional |
| 4 | Backup/Restore | Ops | Documented and tested procedure |
| 4 | Metrics Endpoint | Backend | /metrics returns Prometheus format |
| 5 | Runbooks | Ops | 5 runbooks in docs/runbooks/ |
| 5 | Reference Architecture | Docs | Deployment guide complete |
| 5 | Demo Rehearsal | All | 20-minute demo runs smoothly |

Phase 8B-C: Production Observability (Planned)

Blocked by: Pilot Prep (need real production deployment first)

8B. Observability

  • 8B.1 Distributed Metrics: Per-node, per-range, per-agent metrics.

    • sync_lag_seconds{peer}, merkle_diff_size{peer}, convergence_latency_p99
    • assertions_total{node}, writes_per_second{node}
    • Crate: metrics + metrics-exporter-prometheus
  • 8B.2 Admin Dashboard: Cluster health visibility.

    • GET /v1/admin/cluster → node list, range assignments, leader locations
    • GET /v1/admin/ranges → range sizes, split/merge history
    • POST /v1/admin/sync → force anti-entropy sync

8C. Production Hardening

  • 8C.1 Snapshot/Restore: Fast replica bootstrap.

    • Serialize full node state as snapshot
    • New nodes join by restoring snapshot + replaying recent WAL
  • 8C.2 Backpressure: Don't overwhelm slow nodes.

    • Track per-peer sync queue depth
    • Throttle gossip to slow peers
  • 8C.3 Geo-Distribution: Multi-region deployment.

    • Regional clusters with CRDT federation
    • Locality-aware reads
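
The backpressure item in 8C.2 can be pictured as a per-peer counter of unacknowledged sync messages with a cap. The following is a minimal sketch of that idea only. The `Backpressure` type, the string peer identifiers, and the fixed `max_depth` policy are assumptions rather than the planned design.

```rust
use std::collections::HashMap;

/// Hypothetical tracker of in-flight sync messages per peer.
pub struct Backpressure {
    depth: HashMap<String, usize>,
    max_depth: usize,
}

impl Backpressure {
    pub fn new(max_depth: usize) -> Self {
        Self {
            depth: HashMap::new(),
            max_depth,
        }
    }

    /// Call when a message is queued for `peer`.
    pub fn enqueued(&mut self, peer: &str) {
        *self.depth.entry(peer.to_string()).or_insert(0) += 1;
    }

    /// Call when `peer` acknowledges a message.
    pub fn acked(&mut self, peer: &str) {
        if let Some(d) = self.depth.get_mut(peer) {
            *d = d.saturating_sub(1);
        }
    }

    /// Gossip to `peer` is throttled once its queue reaches the cap.
    pub fn should_send(&self, peer: &str) -> bool {
        self.depth.get(peer).copied().unwrap_or(0) < self.max_depth
    }
}
```

A slow peer stops receiving new gossip as soon as its queue fills, while other peers are unaffected. Anti-entropy sync would later fill in whatever the throttled peer missed.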

Phase 9: The Bunker (Disaster Planning)

Goal: Survive the worst. Backup, restore, recover from corruption, comply with regulations.

9A. Backup & Cold Storage

  • 9A.1 Full Cluster Backup: Point-in-time snapshot to S3/GCS.
  • 9A.2 Point-in-Time Recovery (PITR): Restore to any HLC timestamp.
  • 9A.3 Backup Verification: Weekly automated restore tests.

9B. Data Corruption & Rollback

  • 9B.1 Corruption Detection: Deep validation before accepting gossip.
  • 9B.2 Assertion Tombstones: "Delete" in an append-only world.
  • 9B.3 Cluster Rollback: Batch tombstone generation for time ranges.
  • 9B.4 Fork Recovery: Heal split-brain after extended partition.

9C. Compliance

  • 9C.1 GDPR Right to Erasure: Cryptographic erasure via per-agent keys.
  • 9C.2 Data Retention Policies: Per-subject/predicate retention rules.
  • 9C.3 Audit Trail for Compliance: Immutable admin action log.
  • 9C.4 SOC 2 Type II Certification: External audit and certification.
    • Gap assessment and remediation
    • Evidence collection automation
    • Auditor engagement
    • Target: Q3 2026

9D. Storage Management

  • 9D.1 Compaction: Reclaim space from tombstoned data.
  • 9D.2 Tiered Storage: Hot/warm/cold based on access patterns.
  • 9D.3 Storage Quotas: Per-agent and cluster-wide limits.

9E. Incident Response

  • 9E.1 Alerting & Escalation: PagerDuty/Slack integration.
  • 9E.2 Operational Runbooks: Documented procedures for common failures.
  • 9E.3 Chaos Engineering: Monthly "game days" with controlled failures.

9F. Security Hardening

  • 9F.1 TLS Everywhere: mTLS for node-to-node traffic.
  • 9F.2 Encryption at Rest: WAL and KV store encryption.
  • 9F.3 Node Authentication: Ed25519 keypair identity, signed cluster join.

Architecture Overview

Write Path (Spine):           Read Path (Cortex):
[Agent] -> [Ingestion]        [Agent] <- [Lens Engine]
              |                              |
              v                              |
         [WAL/Fsync]                  [Index Lookup]
              |                              |
              v                              |
         [KV Store] <------------------------+
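
The ordering on the write path is what matters: a record is appended to the log and fsynced before it becomes visible in the KV store. The sketch below illustrates that ordering with a toy `Spine` type backed by a file and an in-memory map. The type, its tab-separated record format, and the method names are assumptions, and the real stemedb-wal and stemedb-storage crates are certainly more involved.

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Toy write path: append-only log file plus in-memory KV map.
pub struct Spine {
    wal: File,
    kv: HashMap<String, String>,
}

impl Spine {
    pub fn open(wal_path: &Path) -> io::Result<Self> {
        let wal = OpenOptions::new().create(true).append(true).open(wal_path)?;
        Ok(Self {
            wal,
            kv: HashMap::new(),
        })
    }

    pub fn write(&mut self, key: &str, value: &str) -> io::Result<()> {
        // 1. Append the record to the log (ASSUMPTION: one tab-separated line).
        writeln!(self.wal, "{}\t{}", key, value)?;
        // 2. Force it to disk before acknowledging the write.
        self.wal.sync_all()?;
        // 3. Only then make it visible to readers.
        self.kv.insert(key.to_string(), value.to_string());
        Ok(())
    }

    pub fn get(&self, key: &str) -> Option<&str> {
        self.kv.get(key).map(String::as_str)
    }
}
```

Because step 3 never runs unless step 2 succeeded, anything a reader can see is already durable, and the map can be rebuilt after a crash by replaying the log.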

Port Scheme (181XX)

| Offset | Service | Default | Env Var |
|--------|---------|---------|---------|
| +0 | HTTP API | 18180 | STEMEDB_BIND_ADDR |
| +1 | Cluster Gateway | 18181 | STEMEDB_NODE_API_ADDR |
| +2 | Cluster RPC | 18182 | STEMEDB_NODE_RPC_ADDR |
| +3 | SWIM Gossip | 18183 | via SwimConfig |
| +4 | Metrics | 18184 | (reserved) |
| +5 | Admin | 18185 | (reserved) |
| +6 | Latent Signal | 18186 | |
| +7 | Community App | 18187 | |
| +8 | Admin Dashboard | 18188 | |
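
Every default in the port scheme is the base port 18180 plus a fixed offset, so each one can be derived rather than hard-coded. A small sketch of that arithmetic follows. The `Service` enum and `port_for` are hypothetical names, and nothing here implies the codebase computes ports this way.

```rust
/// Services from the port scheme table; discriminants are the offsets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u16)]
pub enum Service {
    HttpApi = 0,
    ClusterGateway = 1,
    ClusterRpc = 2,
    SwimGossip = 3,
    Metrics = 4,
    Admin = 5,
    LatentSignal = 6,
    CommunityApp = 7,
    AdminDashboard = 8,
}

/// Default base of the 181XX scheme.
pub const DEFAULT_BASE_PORT: u16 = 18180;

/// Returns base + offset, or None if the sum would exceed the u16 port range.
pub fn port_for(base: u16, service: Service) -> Option<u16> {
    base.checked_add(service as u16)
}
```

For example, `port_for(DEFAULT_BASE_PORT, Service::AdminDashboard)` yields `Some(18188)`, matching the dashboard port in P1.1. Running a second instance on one host would then only require choosing a different base.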

Crates

| Crate | Purpose | Status |
|-------|---------|--------|
| stemedb-core | Assertion, LifecycleStage, MaterializedView, types, signing | |
| stemedb-wal | Write-ahead log with crash recovery | |
| stemedb-storage | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore | |
| stemedb-ingest | Ingestion pipeline, signature verification, ContentDefenseLayer | |
| stemedb-query | Query engine, Materializer for O(1) MV reads | |
| stemedb-lens | Lenses (Recency, Consensus, Authority, Skeptic, Layered, etc.) | |
| stemedb-api | HTTP API with axum + utoipa OpenAPI docs | |
| stemedb-sim | Simulation for testing the pipeline | |
| stemedb-merkle | BLAKE3 Merkle tree for diff detection | |
| stemedb-rpc | gRPC services for node-to-node communication | |
| stemedb-sync | Merkle sync, gossip broadcast, anti-entropy | |
| stemedb-cluster | Cluster membership (SWIM), sharding, gateway | |
| stemedb-ontology | Domain definitions (Pharma), subject builders, medical extractors | |
| stemedb-chaos | Chaos testing infrastructure | |
| stemedb-dashboard | Admin dashboard (React/Next.js) | 🎯 In Progress (scaffold complete) |

SDKs

| SDK | Purpose | Status |
|-----|---------|--------|
| sdk/go/steme | Go HTTP client with Ed25519 signing and fluent builders | |
| sdk/go/adk | ADK-Go tools and callbacks for AI agents | |

Specialized Agents

| Domain | Agent | When to use |
|--------|-------|-------------|
| Product Vision | episteme-product-visionary | Use cases, "why not Postgres?", product-market fit |
| Pilot Prep | enterprise-skeptic-buyer | Pressure-test demos, find gaps, prepare for tough questions |
| General Rust | primary-developer | Feature implementation, refactoring |
| Code Quality | rust-quality-engineer | Reviews, test coverage, clippy |
| Storage | storage-engine-architect | WAL, LSM, crash recovery |
| Graph Engine | rust-graph-engine-architect | Lock-free structures, cache optimization |
| Defensive | defensive-systems-architect | Rate limiting, circuit breakers, hostile input |
| Distributed | distributed-systems-engineer | CRDT replication, Raft coordination, Merkle sync |
| Lenses | stemedb-lens-architect | Query resolution, ranking algorithms |
| Planning | stemedb-planner | Milestone planning, roadmap |

Quick Reference

# Build
cargo build --workspace

# Test
cargo test --workspace

# Lint (must pass before commit)
cargo clippy --workspace -- -D warnings
cargo fmt --check

# Run API server
cargo run --bin stemedb-api

# Run demo script
./scripts/demo-consumer-health.sh