Episteme (StemeDB) Roadmap
Goal: Build the "Git for Truth" substrate for autonomous AI research.
Current Focus: Enterprise Pilot Preparation
Target Vertical: BioTech/Pharma ("The Living Review")
Endgame: Distributed multi-writer cluster for millions of concurrent agents
Infrastructure Status: Phases 1-7 complete ✅ | Phase 8A (Chaos) complete ✅
Pilot Status: Consumer Health MVP complete ✅ | Enterprise Demo in progress
Archive: For completed phases 1-7, see roadmap-archive.md
Current Status
| Phase | Status | Summary |
|---|---|---|
| 1-7 | ✅ Complete | Core infrastructure, distributed cluster, trust & safety |
| 8A | ✅ Complete | Chaos testing, Jepsen-style verification |
| MVP | ✅ Complete | Consumer Health demo with real FDA data |
| Pilot Prep | 🎯 In Progress | Dashboard, impact analysis, production hardening |
| 8B-C | Planned | Observability, geo-distribution |
| 9 | Planned | Disaster recovery, compliance, storage management |
🎯 Phase: Enterprise Pilot Preparation (CURRENT)
Goal: Make the pilot bulletproof. Amaze enterprise decision makers.
Timeline: 5 weeks
Success Criteria: Dr. Sarah Chen (skeptical VP of Data Infrastructure) fights her CFO for budget
The 5 Amazement Moments We Must Deliver
| # | Moment | Current State | Gap |
|---|---|---|---|
| 1 | Contradictions visible with confidence scores | ✅ Complete | Dashboard scaffold + Skeptic Query UI ✅ |
| 2 | Cascade invalidation when source retracted | ✅ Complete | Full UI: Sources page + impact dialog (P3.1-3.3) ✅ |
| 3 | Full FDA-ready audit trail | ✅ Complete | Audit Trail Browser (P1.6) ✅ |
| 4 | Point-in-time queries + decay | ✅ API ready | No timeline UI |
| 5 | Malicious agent blocked by circuit breaker | ✅ Complete | Circuit Breaker Status (P1.5) ✅ |
Pilot-1: Demo Dashboard (Week 1-2)
Deliverable: React admin dashboard that makes the API visual
- P1.1 Dashboard Scaffold: Next.js + shadcn/ui project setup ✅
  - Project structure: applications/stemedb-dashboard/
  - API client for StemeDB endpoints (src/lib/api/client.ts)
  - Authentication scaffold (API key header)
  - Dark mode (default), responsive layout with collapsible sidebar
  - shadcn/ui components: button, card, badge, input, separator, tabs
  - Live API status indicator (polls /health every 30s)
  - Port 18188, builds and runs successfully
- P1.2 Skeptic Query Visualization: Show contradictions graphically ✅
  - Query builder: subject, predicate inputs
  - Conflict score gauge (0.0-1.0 with color coding)
  - Claims table with weight bars, source tier badges
  - "CONTESTED" / "AGREED" / "UNANIMOUS" status badges
  - Expandable claim rows with source details, agents, provenance hashes
  - Loading skeleton, empty state, error state with retry
- P1.3 Layered Consensus View: Per-tier breakdown ✅
  - Tier accordion showing each source class (T0→T5, empty tiers hidden)
  - Within-tier conflict score (compact gauge in accordion header)
  - Cross-tier conflict visualization (full gauge with stats)
  - Extended ConflictGauge with variant prop for reuse
- P1.4 Quarantine Admin Panel: Content defense visibility ✅
  - Pending queue with reason, timestamp, quality score
    - quarantine-panel.tsx, quarantine-list.tsx, quarantine-row.tsx
  - Approve/Reject buttons with confirmation
    - ConfirmationDialog with restore/delete actions
  - Filter by reason (duplicate, spam, untrusted high-confidence)
    - quarantine-filters.tsx with dropdown selector
  - Metrics: pending count, approved/rejected today
    - quarantine-metrics.tsx with MetricCard grid
- P1.5 Circuit Breaker Status: Trust & safety dashboard ✅
  - Blocked agents list with failure count, retry time
    - circuit-list.tsx, circuit-card.tsx with full details
  - State badges: OPEN (red), HALF_OPEN (yellow), CLOSED (green)
    - state-badge.tsx with color-coded variants
  - Manual reset button for admin override
    - circuit-panel.tsx - handleReset calls API
  - Summary with state counts
    - circuit-summary.tsx replaces historical events (more useful)
  - Auto-refresh every 10 seconds
- P1.6 Audit Trail Browser: Query provenance explorer ✅
  - Recent queries list with agent, timestamp, subject
    - audit-list.tsx, audit-row.tsx with pagination
  - Drilldown: contributing assertions, weights, winner
    - Expandable row details in audit-row.tsx
  - Filter by agent, time range, subject
    - audit-filters.tsx with 1h/24h/7d/30d/all options
  - Export to JSON/CSV
    - audit-export.tsx with proper escaping
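P1.6 calls for CSV export "with proper escaping". The dashboard implements this in TypeScript (audit-export.tsx). The Go sketch below only illustrates what that escaping has to handle, namely fields that contain commas or quotes. The AuditRow type and its field names are assumptions made for the example, not the dashboard's actual types.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// AuditRow is a hypothetical shape for one audit-trail entry; the real
// dashboard types are not shown in this roadmap.
type AuditRow struct {
	Agent     string
	Timestamp string
	Subject   string
}

// ExportCSV renders rows as CSV. encoding/csv quotes any field that contains
// a comma, quote, or newline, and doubles embedded quotes.
func ExportCSV(rows []AuditRow) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write([]string{"agent", "timestamp", "subject"}); err != nil {
		return "", err
	}
	for _, r := range rows {
		if err := w.Write([]string{r.Agent, r.Timestamp, r.Subject}); err != nil {
			return "", err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := ExportCSV([]AuditRow{
		// Illustrative values only; the subject contains both a comma and quotes.
		{Agent: "fda:drug-label-ingestor", Timestamp: "2025-01-01T00:00:00Z", Subject: `semaglutide, "Wegovy"`},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Delegating to a CSV writer rather than joining strings by hand is the main design point: a subject that contains a comma stays in one column when the file is opened in a spreadsheet.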
Pilot-2: Demo Data Seeder (Week 2)
Deliverable: Pre-signed realistic demo data using Go SDK
Status: All complete ✅
- P2.1 Demo Keypair Management: Reproducible demo keys ✅
  - 5 demo agents with realistic naming convention:
    - fda:drug-label-ingestor (Tier 0 - Regulatory)
    - pubmed:abstract-indexer (Tier 1 - Clinical)
    - clinicaltrials:study-importer (Tier 1 - Clinical)
    - reddit:health-discussion-scraper (Tier 5 - Anecdotal)
    - internal:clinical-ops-reviewer (Tier 3 - Expert)
  - Keys stored in demo/keys/ with README documenting each agent's role/scope
    - demo/keys/agents.json with seeds, public keys, tiers, descriptions
    - demo/keys/README.md with full documentation
    - demo/keys/keygen.go for deterministic regeneration
  - Go SDK script: cmd/demo-seed/main.go
    - Loads keys from agents.json
    - Creates 260+ assertions with realistic data
  - One-command setup: ./scripts/run-demo.sh (start DB → seed → open dashboard)
    - Build detection, health check, auto-cleanup on exit
    - --clean flag for fresh start, --no-open to skip browser
- P2.2 Conflict Scenarios: Pre-built disagreements with real data ✅
  - 3 drugs: semaglutide (45), tirzepatide (38), liraglutide (32) assertions
  - 150+ assertions total using real FDA label excerpts
  - ClinicalTrials.gov summaries (STEP, SURMOUNT, SELECT, LEADER trials)
  - Killer conflicts: Weight loss (FDA 14.9% vs STEP UP 20.7% vs Reddit variable), Gastroparesis (FDA 0.2% vs UBC 3x risk)
  - 4 genuine conflicts per drug (weight loss, nausea, gastroparesis, CV benefit)
  - Source registry with 30+ deterministic hash sources across T0-T5 tiers
- P2.3 Retractable Sources: Set up cascade demo ✅
  - New CARDIOVASC_MEGA_TRIAL source in sources.go (landmark multi-drug CV outcomes study)
  - 110 assertions citing this source across 8 categories (visceral cascade effect)
    - Primary/Secondary CV Outcomes (30), Biomarkers (15), Subgroup Analyses (20)
    - Expert Guidelines (15), Real-World Evidence (15), Comparative Efficacy (10), Community (5)
  - 5 agents represented: T0 (FDA), T1 (ClinicalTrials), T2 (PubMed), T3 (Internal), T5 (Reddit)
  - printCascadeDemoCommands() outputs curl commands for demo flow
  - Demo documentation updated in amazement-demo.md
  - Note: API endpoints (P3.1) complete ✅, live demo ready
- P2.4 Historical Data: Time-travel via lifecycle evolution ✅
  - Approach: Use lifecycle states (Proposed → Approved → Deprecated), not fake timestamps
  - Each lifecycle transition auditable with real timestamps (signature timestamps)
  - Demo scenario: Wegovy CV indication change (pre-March 2024 vs post-SELECT)
  - 8 historical scenarios: CV indication, SELECT trial evolution, ADA guidelines, Tirzepatide expansion
  - 17 historical assertions showing lifecycle progression
  - Demo commands for as_of queries
Pilot-3: Impact Analysis (Week 3)
Deliverable: Automatic cascade when source is retracted
Critical: This unblocks P2.3 (retractable sources demo data)
- P3.1 Impact Analysis Endpoint: GET /v1/sources/{hash}/impact ✅
  - Returns all assertions citing this source (verified: 110 assertions for CARDIOVASC_MEGA_TRIAL)
  - Returns count of queries that used those assertions
  - Returns list of affected agents/recommendations (verified: 4 agents)
  - Implementation in stemedb-api/src/handlers/source_registry/handlers.rs:237-439
  - POST /v1/sources/{hash}/quarantine with preview mode (preview=true shows impact without changes)
  - Preview response: "This will affect X assertions and Y agent recommendations"
  - Undo capability: POST /v1/sources/{hash}/restore (verified: restores 110 assertions)
  - 17 unit/integration tests passing
- P3.2 Cascade Flagging: Automatic downstream impact ✅
  - When source status → quarantined, flag citing assertions
    - Implemented query-time lookup (not index mutation) to preserve append-only immutability
    - SourceStatusEnricher service batch-lookups source statuses from SourceRegistry
    - SourceWarningDto attached to assertions with warning_type, message, source_label, status_updated_at
  - Originally planned as a new field on the assertion index; replaced by query-time enrichment (preserves immutability)
  - Queries can filter by exclude_quarantined_sources=true
    - Added to QueryParams in dto/query_params.rs
    - Post-retrieval filter applied after query execution
  - Define query behavior: quarantined sources show with warning (not silently omitted)
    - source_warning field added to AssertionResponse and ClaimSummaryDto
    - Skeptic endpoint enriches claims with warnings
  - Export affected items list for regulatory documentation (CSV/JSON)
    - GET /v1/sources/{hash}/impact/export?format=csv|json
    - Returns ImpactExportRow with assertion_hash, subject, predicate, agent_id, timestamp, lifecycle, confidence
    - CSV includes proper escaping, JSON returns array of objects
- P3.3 Impact Dashboard Widget: Visualize the cascade ✅
  - Source status change UI (Active → Quarantined)
    - components/sources/status-badge.tsx with color-coded badges
    - components/sources/tier-badge.tsx with T0-T5 labels
  - Confirmation dialog: "This will affect 234 downstream assertions and 12 recommendations"
    - components/sources/quarantine-dialog.tsx with impact preview
    - Warning box shows exact affected counts from API
  - Choice: "Quarantine immediately" or "Review affected items first"
    - Dual action buttons in dialog
    - "Review first" opens ImpactDetailPanel with full assertion list
  - Animated "impact ripple" showing affected count
    - components/sources/impact-ripple.tsx with Tailwind animate-ping
    - Triggers on dialog open, counts pulse with amber styling
  - List of impacted queries with timestamp
    - components/sources/impact-detail-panel.tsx shows affected assertions table
    - Affected agents shown as chips
  - "Remediation status" tracking
    - Source status visible in list, metrics show quarantined count
    - components/sources/sources-metrics.tsx with Active/Deprecated/Quarantined counts
  - Audit trail: WHO retracted, WHEN, and WHY
    - RestoreDialog and QuarantineDialog capture reason field
    - Export to CSV/JSON for regulatory documentation
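The key design decision in P3.2 is that quarantining a source never rewrites stored assertions. The warning is attached when a query runs, and exclude_quarantined_sources is a filter applied to the result. The real implementation lives in the Rust API (SourceStatusEnricher). The Go sketch below only illustrates the pattern, and every type and function name in it is hypothetical.

```go
package main

import "fmt"

// SourceStatus is a simplified stand-in for the source registry's status.
type SourceStatus string

const (
	Active      SourceStatus = "active"
	Quarantined SourceStatus = "quarantined"
)

// Assertion is a hypothetical, trimmed-down query result row.
type Assertion struct {
	Hash       string
	SourceHash string
	Warning    string // empty when the source is in good standing
}

// Enrich looks up each assertion's source status at query time. Quarantined
// sources get a warning, or are dropped when excludeQuarantined is set.
// The stored slice is never modified: the loop variable is a copy.
func Enrich(stored []Assertion, statuses map[string]SourceStatus, excludeQuarantined bool) []Assertion {
	out := make([]Assertion, 0, len(stored))
	for _, a := range stored {
		if statuses[a.SourceHash] == Quarantined {
			if excludeQuarantined {
				continue
			}
			a.Warning = "source quarantined"
		}
		out = append(out, a)
	}
	return out
}

func main() {
	stored := []Assertion{
		{Hash: "a1", SourceHash: "src-ok"},
		{Hash: "a2", SourceHash: "src-retracted"},
	}
	statuses := map[string]SourceStatus{"src-ok": Active, "src-retracted": Quarantined}

	for _, a := range Enrich(stored, statuses, false) {
		fmt.Printf("%s warning=%q\n", a.Hash, a.Warning)
	}
	fmt.Println("with filter:", len(Enrich(stored, statuses, true)), "result(s)")
}
```

Restoring a source then needs no data migration: the next query reads the updated status and the warning disappears.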
Pilot-4: Production Hardening (Week 4)
Deliverable: Load testing, authentication, backup documentation
- P4.1 Load Testing: Prove performance claims ✅
  - Go-based load tester with native Ed25519 signing (cmd/load-test/)
  - Benchmark: 10K assertions baseline latency (p99 < 200ms target)
  - Benchmark: 1K writes/sec sustained for configurable duration
  - Benchmark: 100 concurrent readers, <2x degradation target
  - Markdown report generator with pass/fail status
  - One-command runner: ./scripts/run-load-test.sh
  - Results saved to uat/production-readiness/results/
- P4.2 API Authentication: Basic security for pilot
  - API key middleware (X-API-Key header)
  - Per-key rate limiting (separate from per-agent quota)
  - Admin keys vs read-only keys
  - Key management: POST /v1/admin/api-keys
- P4.3 Backup/Restore Documentation: DR story
  - Document WAL-based recovery procedure
  - Script: scripts/backup-stemedb.sh (snapshot + WAL archive)
  - Script: scripts/restore-stemedb.sh (restore from backup)
  - Test restore procedure, document in UAT
- P4.4 Prometheus Metrics: Observability baseline
  - GET /metrics endpoint with prometheus format
  - Key metrics: assertions_total, queries_total, query_latency_seconds
  - Trust metrics: quarantine_pending, circuit_breakers_open
  - Basic Grafana dashboard template
Pilot-5: Operational Readiness (Week 5)
Deliverable: Runbooks, monitoring, reference architecture
- P5.1 Operational Runbooks: Common procedures documented
  - "Server won't start" troubleshooting
  - "High query latency" investigation
  - "Quarantine queue overflow" handling
  - "Circuit breaker stuck open" resolution
  - "Restore from backup" step-by-step
- P5.2 Reference Architecture: Deployment guide
  - Single-node pilot deployment diagram
  - Network requirements (ports, firewall rules)
  - Reverse proxy configuration (nginx/envoy with TLS)
  - Resource sizing guide (CPU, memory, disk)
- P5.3 Pilot Success Criteria Document: Definition of done
  - Sub-second query latency at 10K assertions: measured
  - Successful conflict detection on known contradictory studies: demonstrated
  - Complete audit trail export for mock regulatory review: tested
  - Source retraction workflow: exercised
- P5.4 Executive Demo Script Validation: End-to-end rehearsal
  - Run through amazement-demo-2.md with real dashboard
  - Time each segment (target: 20 minutes total)
  - Record demo video for async sharing (backup if live demo fails)
  - All 5 Aha Moments demonstrable with real data (not mockups)
  - Enterprise Skeptic Questions (must have documented answers):
    - What's the data ingestion latency? (FDA update → queryable)
    - What happens when agents disagree on interpretation?
    - Can I export an audit report for regulators? (PDF/CSV)
    - What's the failure mode if service goes down mid-demo?
    - How do I verify demo data is representative of my real data?
    - If I retract a source, what happens to queries that would have used it?
Pilot Prep Deliverables Summary
| Week | Deliverable | Owner | Acceptance Criteria |
|---|---|---|---|
| 1-2 | stemedb-dashboard | Frontend | ✅ 6 functional panels, connects to API (P1.1-P1.6) |
| 2 | demo-seed (P2.1-P2.4) | SDK | ✅ 260+ assertions, 3 drugs, real FDA content, lifecycle history, cascade data |
| 3 | Impact Analysis (P3.1) | Backend | ✅ /v1/sources/{hash}/impact + quarantine/restore endpoints |
| 3 | Cascade Flagging (P3.2) | Backend | ✅ Source warnings, exclude filter, impact export |
| 3 | Impact Dashboard (P3.3) | Frontend | ✅ Sources page, quarantine dialog, impact ripple, export |
| 3 | demo-seed (P2.3) | SDK | ✅ Retractable source with 110 cascade assertions |
| 4 | Load Test Results | QA | ✅ cmd/load-test/ + scripts/run-load-test.sh |
| 4 | API Authentication | Backend | API keys work, rate limiting functional |
| 4 | Backup/Restore | Ops | Documented and tested procedure |
| 4 | Metrics Endpoint | Backend | /metrics returns Prometheus format |
| 5 | Runbooks | Ops | 5 runbooks in docs/runbooks/ |
| 5 | Reference Architecture | Docs | Deployment guide complete |
| 5 | Demo Rehearsal | All | 20-minute demo runs smoothly |
| 5 | One-Command Demo | Ops | ✅ ./scripts/run-demo.sh works (P2.1) |
Demo Data Quality Checklist (from Enterprise Skeptic Review)
- Real FDA label excerpts (public domain) - not synthetic ✅
- ClinicalTrials.gov summaries for plausibility ✅
- Agent names map to real-world roles (fda:drug-label-ingestor) - P2.1 ✅
- Conflicts are genuine (not "100% vs 0%" manufactured disagreements) - P2.2 ✅
- Cascade demo shows 100+ affected items (visceral impact) - 110 assertions ✅
- Export capability for regulatory documentation (CSV/JSON) - P3.2 ✅
- Recovery story: what happens if demo breaks mid-presentation?
Phase 8B-C: Production Observability (Planned)
Blocked by: Pilot Prep (need real production deployment first)
8B. Observability
- 8B.1 Distributed Metrics: Per-node, per-range, per-agent metrics.
  - sync_lag_seconds{peer}, merkle_diff_size{peer}, convergence_latency_p99
  - assertions_total{node}, writes_per_second{node}
  - Crate: metrics + metrics-exporter-prometheus
- 8B.2 Admin Dashboard: Cluster health visibility.
  - GET /v1/admin/cluster → node list, range assignments, leader locations
  - GET /v1/admin/ranges → range sizes, split/merge history
  - POST /v1/admin/sync → force anti-entropy sync
8C. Production Hardening
- 8C.1 Snapshot/Restore: Fast replica bootstrap.
  - Serialize full node state as snapshot
  - New nodes join by restoring snapshot + replaying recent WAL
- 8C.2 Backpressure: Don't overwhelm slow nodes.
  - Track per-peer sync queue depth
  - Throttle gossip to slow peers
- 8C.3 Geo-Distribution: Multi-region deployment.
  - Regional clusters with CRDT federation
  - Locality-aware reads
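8C.2 is planned work and the roadmap gives only its two goals: track per-peer sync queue depth, and throttle gossip to slow peers. One simple way to combine them is a bounded in-flight counter per peer, sketched below in Go. The PeerQueues type, its methods, and the fixed depth limit are hypothetical and do not describe the eventual Rust design.

```go
package main

import (
	"fmt"
	"sync"
)

// PeerQueues counts unacknowledged gossip messages per peer.
type PeerQueues struct {
	mu       sync.Mutex
	maxDepth int
	depth    map[string]int
}

// NewPeerQueues creates a tracker that allows maxDepth in-flight messages per peer.
func NewPeerQueues(maxDepth int) *PeerQueues {
	return &PeerQueues{maxDepth: maxDepth, depth: make(map[string]int)}
}

// TryEnqueue reserves a slot for one message to peer. It returns false when
// the peer already has maxDepth unacknowledged messages, so the caller can
// skip that peer for the current gossip round.
func (q *PeerQueues) TryEnqueue(peer string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.depth[peer] >= q.maxDepth {
		return false
	}
	q.depth[peer]++
	return true
}

// Ack releases one slot after the peer confirms receipt.
func (q *PeerQueues) Ack(peer string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.depth[peer] > 0 {
		q.depth[peer]--
	}
}

func main() {
	q := NewPeerQueues(2)
	fmt.Println(q.TryEnqueue("node-b")) // slot 1 of 2
	fmt.Println(q.TryEnqueue("node-b")) // slot 2 of 2
	fmt.Println(q.TryEnqueue("node-b")) // refused: peer is behind
	q.Ack("node-b")
	fmt.Println(q.TryEnqueue("node-b")) // accepted again after an ack
}
```

Skipped messages are not lost in a design like this, provided anti-entropy sync later reconciles whatever a slow peer missed.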
Phase 9: The Bunker (Disaster Planning)
Goal: Survive the worst. Backup, restore, recover from corruption, comply with regulations.
9A. Backup & Cold Storage
- 9A.1 Full Cluster Backup: Point-in-time snapshot to S3/GCS.
- 9A.2 Point-in-Time Recovery (PITR): Restore to any HLC timestamp.
- 9A.3 Backup Verification: Weekly automated restore tests.
9B. Data Corruption & Rollback
- 9B.1 Corruption Detection: Deep validation before accepting gossip.
- 9B.2 Assertion Tombstones: "Delete" in an append-only world.
- 9B.3 Cluster Rollback: Batch tombstone generation for time ranges.
- 9B.4 Fork Recovery: Heal split-brain after extended partition.
9C. Compliance & Legal
- 9C.1 GDPR Right to Erasure: Cryptographic erasure via per-agent keys.
- 9C.2 Data Retention Policies: Per-subject/predicate retention rules.
- 9C.3 Audit Trail for Compliance: Immutable admin action log.
- 9C.4 SOC 2 Type II Certification: External audit and certification.
  - Gap assessment and remediation
  - Evidence collection automation
  - Auditor engagement
  - Target: Q3 2026
9D. Storage Management
- 9D.1 Compaction: Reclaim space from tombstoned data.
- 9D.2 Tiered Storage: Hot/warm/cold based on access patterns.
- 9D.3 Storage Quotas: Per-agent and cluster-wide limits.
9E. Incident Response
- 9E.1 Alerting & Escalation: PagerDuty/Slack integration.
- 9E.2 Operational Runbooks: Documented procedures for common failures.
- 9E.3 Chaos Engineering: Monthly "game days" with controlled failures.
9F. Security Hardening
- 9F.1 TLS Everywhere: mTLS for node-to-node traffic.
- 9F.2 Encryption at Rest: WAL and KV store encryption.
- 9F.3 Node Authentication: Ed25519 keypair identity, signed cluster join.
Architecture Overview
Write Path (Spine):        Read Path (Cortex):
[Agent] -> [Ingestion]     [Agent] <- [Lens Engine]
                |                          |
                v                          |
          [WAL/Fsync]                [Index Lookup]
                |                          |
                v                          |
          [KV Store] <---------------------+
Port Scheme (181XX)
| Offset | Service | Default | Env Var |
|---|---|---|---|
| +0 | HTTP API | 18180 | STEMEDB_BIND_ADDR |
| +1 | Cluster Gateway | 18181 | STEMEDB_NODE_API_ADDR |
| +2 | Cluster RPC | 18182 | STEMEDB_NODE_RPC_ADDR |
| +3 | SWIM Gossip | 18183 | via SwimConfig |
| +4 | Metrics | 18184 | (reserved) |
| +5 | Admin | 18185 | (reserved) |
| +6 | Latent Signal | 18186 | — |
| +7 | Community App | 18187 | — |
| +8 | Admin Dashboard | 18188 | — |
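Every default in the table is the base port 18180 plus the service's offset. The short Go sketch below makes that rule explicit. The offsets are taken from the table, while the service keys and the PortFor helper are hypothetical names chosen for the example.

```go
package main

import "fmt"

// BasePort is the first port of the 181XX scheme (the HTTP API default).
const BasePort = 18180

// offsets mirrors the Offset column of the table above; the string keys are
// labels invented for this sketch.
var offsets = map[string]int{
	"http-api":        0,
	"cluster-gateway": 1,
	"cluster-rpc":     2,
	"swim-gossip":     3,
	"metrics":         4,
	"admin":           5,
	"latent-signal":   6,
	"community-app":   7,
	"admin-dashboard": 8,
}

// PortFor returns base+offset for a known service, or false if the service
// is not part of the scheme.
func PortFor(base int, service string) (int, bool) {
	off, ok := offsets[service]
	if !ok {
		return 0, false
	}
	return base + off, true
}

func main() {
	for _, s := range []string{"http-api", "swim-gossip", "admin-dashboard"} {
		p, _ := PortFor(BasePort, s)
		fmt.Printf("%-16s %d\n", s, p)
	}
}
```

Keeping the base as a parameter means a second local instance can run on another block (for example 18280 and up) without its services colliding with the first.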
Crates
| Crate | Purpose | Status |
|---|---|---|
| stemedb-core | Assertion, LifecycleStage, MaterializedView, types, signing | ✅ |
| stemedb-wal | Write-ahead log with crash recovery | ✅ |
| stemedb-storage | KVStore, VoteStore, IndexStore, TrustRankStore, QuarantineStore | ✅ |
| stemedb-ingest | Ingestion pipeline, signature verification, ContentDefenseLayer | ✅ |
| stemedb-query | Query engine, Materializer for O(1) MV reads | ✅ |
| stemedb-lens | Lenses (Recency, Consensus, Authority, Skeptic, Layered, etc.) | ✅ |
| stemedb-api | HTTP API with axum + utoipa OpenAPI docs | ✅ |
| stemedb-sim | Simulation for testing the pipeline | ✅ |
| stemedb-merkle | BLAKE3 Merkle tree for diff detection | ✅ |
| stemedb-rpc | gRPC services for node-to-node communication | ✅ |
| stemedb-sync | Merkle sync, gossip broadcast, anti-entropy | ✅ |
| stemedb-cluster | Cluster membership (SWIM), sharding, gateway | ✅ |
| stemedb-ontology | Domain definitions (Pharma), subject builders, medical extractors | ✅ |
| stemedb-chaos | Chaos testing infrastructure | ✅ |
| stemedb-dashboard | Admin dashboard (React/Next.js) | 🎯 In Progress (7 panels complete) |
SDKs
| SDK | Purpose | Status |
|---|---|---|
| sdk/go/steme | Go HTTP client with Ed25519 signing and fluent builders | ✅ |
| sdk/go/adk | ADK-Go tools and callbacks for AI agents | ✅ |
Specialized Agents
| Domain | Agent | When to use |
|---|---|---|
| Product Vision | episteme-product-visionary | Use cases, "why not Postgres?", product-market fit |
| Pilot Prep | enterprise-skeptic-buyer | Pressure-test demos, find gaps, prepare for tough questions |
| General Rust | primary-developer | Feature implementation, refactoring |
| Code Quality | rust-quality-engineer | Reviews, test coverage, clippy |
| Storage | storage-engine-architect | WAL, LSM, crash recovery |
| Graph Engine | rust-graph-engine-architect | Lock-free structures, cache optimization |
| Defensive | defensive-systems-architect | Rate limiting, circuit breakers, hostile input |
| Distributed | distributed-systems-engineer | CRDT replication, Raft coordination, Merkle sync |
| Lenses | stemedb-lens-architect | Query resolution, ranking algorithms |
| Planning | stemedb-planner | Milestone planning, roadmap |
Quick Reference
# Build
cargo build --workspace
# Test
cargo test --workspace
# Lint (must pass before commit)
cargo clippy --workspace -- -D warnings
cargo fmt --check
# Run API server
cargo run --bin stemedb-api
# Run demo script
./scripts/demo-consumer-health.sh
Related Documents
- CLAUDE.md — AI assistant instructions and project rules
- roadmap-archive.md — Completed phases 1-7 detail
- docs/demo/pilot/amazement-demo.md — Technical demo script
- docs/demo/pilot/amazement-demo-2.md — Executive demo script
- uat/production-readiness/README.md — Production verification checklist
- .claude/agents/enterprise-skeptic-buyer.md — Dr. Sarah Chen persona