# Pilot Success Criteria

**Definition of "done" for StemeDB pilot deployments**

This document defines the acceptance criteria for validating a StemeDB pilot before promoting to production. All "Must Pass" criteria are ship blockers.

---

## Overview

| Section | Must Pass | Should Pass | Nice to Have | Total |
|---------|-----------|-------------|--------------|-------|
| **[1. Performance](#1-performance-requirements)** | 3 | 2 | 1 | 6 |
| **[2. Functional](#2-functional-requirements)** | 4 | 2 | 1 | 7 |
| **[3. Operational](#3-operational-requirements)** | 3 | 2 | 1 | 6 |
| **[4. Demo Validation](#4-demo-validation-5-amazement-moments)** | 5 | 0 | 0 | 5 |
| **[5. Acceptance](#5-acceptance-criteria)** | - | - | - | - |
| **Total** | **15** | **6** | **3** | **24** |

**Pass threshold:** All 15 "Must Pass" + 4/6 "Should Pass" = **19/24 minimum**

---

## 1. Performance Requirements

### Must Pass

#### 1.1 Sub-Second Query Latency (p99 <1s)

**Requirement:** p99 query latency <1 second at 10K assertions baseline.

**Test Procedure:**

```bash
# Load 10K assertions
./scripts/load-test-data.sh --count 10000

# Run query load test (100 queries/sec for 5 minutes)
./scripts/query-load-test.sh \
  --rate 100 \
  --duration 300 \
  --endpoint /v1/query \
  --lens recency

# Extract p99 latency
curl http://localhost:18180/metrics | grep 'stemedb_query_latency_seconds{quantile="0.99"}'
```

**Expected Result:**

```
stemedb_query_latency_seconds{quantile="0.99"} 0.987  # <1.0 ✅
```

**Acceptance:**
- ✅ Pass: p99 <1000ms
- ⚠️ Warning: p99 1000-1500ms (acceptable with explanation)
- ❌ Fail: p99 >1500ms

---

#### 1.2 Sustained Ingest Rate (1K assertions/sec, 5 minutes)

**Requirement:** Handle 1,000 assertions/sec sustained for 5 minutes with p99 latency <200ms.

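
The sustained rate in this requirement can be cross-checked with simple arithmetic: sample the ingest counter before and after the run and divide the difference by the elapsed time (1,000/sec over 300 seconds should add roughly 300,000 assertions). The sketch below assumes the `stemedb_assertions_total` counter that appears elsewhere in this document; `avg_rate` is a hypothetical helper, not part of the StemeDB scripts, and the sample values are made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: average ingest rate from two counter samples.
# Usage: avg_rate <count_before> <count_after> <elapsed_seconds>
avg_rate() {
  awk -v before="$1" -v after="$2" -v secs="$3" 'BEGIN {
    if (secs <= 0) { print "elapsed seconds must be positive" > "/dev/stderr"; exit 1 }
    printf "%.1f\n", (after - before) / secs
  }'
}

# One way to take a sample (assumes the counter is exposed without labels):
#   curl -s http://localhost:18180/metrics | awk '$1 == "stemedb_assertions_total" { print $2 }'

# Example with made-up samples taken 300 seconds apart.
avg_rate 10234 310234 300   # prints 1000.0
```

A result of 1000.0 or higher corresponds to the Pass band; the p99 latency bound still has to be checked separately from the latency metric.
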
**Test Procedure:**

```bash
# Run ingest load test
./scripts/ingest-load-test.sh \
  --rate 1000 \
  --duration 300

# Monitor metrics
curl http://localhost:18180/metrics | grep -E '(ingest_rate|wal_fsync_latency)'
```

**Expected Result:**

```
# Ingest rate maintained
rate(stemedb_assertions_total[1m]) ~= 1000

# WAL fsync latency <200ms
stemedb_wal_fsync_latency_seconds{quantile="0.99"} 0.189  # <0.2 ✅
```

**Acceptance:**
- ✅ Pass: 1K/sec sustained, p99 <200ms, no errors
- ⚠️ Warning: 800-1000/sec OR p99 200-300ms
- ❌ Fail: <800/sec OR p99 >300ms OR errors >1%

---

#### 1.3 Conflict Detection (Score >0.5 on contradictions)

**Requirement:** ConflictLens assigns conflict_score >0.5 when assertions contradict.

**Test Procedure:**

```bash
# Submit contradictory assertions.
# Comments are kept outside the JSON bodies: JSON does not allow them.

# First assertion: adverse event rate of 0.2%
curl -X POST http://localhost:18180/v1/assert \
  -d '{
    "concept_path": "drug/aspirin/safety",
    "predicate": "adverse_event_rate",
    "value": 0.002,
    "confidence": 0.95,
    "agent_id": "fda-clinical-trial"
  }'

# Second assertion: adverse event rate of 12% (contradicts the first)
curl -X POST http://localhost:18180/v1/assert \
  -d '{
    "concept_path": "drug/aspirin/safety",
    "predicate": "adverse_event_rate",
    "value": 0.12,
    "confidence": 0.7,
    "agent_id": "anecdotal-reports"
  }'

# Query with ConflictLens
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "drug/aspirin/safety",
    "lens": "conflict"
  }' | jq '.conflict_score'
```

**Expected Result:** (full response body; the `jq` filter above prints only `conflict_score`)

```json
{
  "conflict_score": 0.87,
  "assertions": [
    {"value": 0.002, "confidence": 0.95, "agent": "fda-clinical-trial"},
    {"value": 0.12, "confidence": 0.7, "agent": "anecdotal-reports"}
  ]
}
```

A `conflict_score` of 0.87 is >0.5 ✅ (high conflict detected).

**Acceptance:**
- ✅ Pass: conflict_score >0.5 for contradictory values
- ❌ Fail: conflict_score ≤0.5

---

### Should Pass

#### 1.4 Concurrent Query Capacity (100 readers, <2x degradation)

**Requirement:** Support 100 concurrent readers with <2x latency degradation vs baseline.

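
The degradation factor is the mean latency under load divided by the baseline mean. As a minimal sketch, the hypothetical helper below (`classify_degradation` is not part of the StemeDB scripts) computes that ratio and maps it onto this criterion's bands: below 2x passes, 2x to 3x is a warning, above 3x fails.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify latency degradation under concurrent load.
# Usage: classify_degradation <baseline_mean_ms> <loaded_mean_ms>
classify_degradation() {
  awk -v base="$1" -v load="$2" 'BEGIN {
    if (base <= 0) { print "baseline must be positive" > "/dev/stderr"; exit 1 }
    ratio = load / base
    if (ratio < 2)       verdict = "PASS"   # <2x degradation
    else if (ratio <= 3) verdict = "WARN"   # 2-3x degradation
    else                 verdict = "FAIL"   # >3x degradation
    printf "%.2fx %s\n", ratio, verdict
  }'
}

# Example using the illustrative numbers from this criterion (50ms vs 85ms).
classify_degradation 50 85   # prints 1.70x PASS
```

Feeding it the two mean latencies reported by `ab` avoids hand-computing the ratio for each run.
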
**Test Procedure:**

```bash
# Measure baseline (1 concurrent reader)
# -T sets the Content-Type for the POST body supplied with -p
ab -n 1000 -c 1 -p query.json -T application/json http://localhost:18180/v1/query
# Note: mean latency (e.g., 50ms)

# Measure under load (100 concurrent readers)
ab -n 10000 -c 100 -p query.json -T application/json http://localhost:18180/v1/query
# Note: mean latency (e.g., 85ms)

# Calculate degradation
echo "scale=2; 85 / 50" | bc
# = 1.7x (acceptable)
```

**Expected Result:**
- Baseline: 50ms mean
- Under load: <100ms mean (<2x degradation)

**Acceptance:**
- ✅ Pass: <2x degradation
- ⚠️ Warning: 2-3x degradation
- ❌ Fail: >3x degradation

---

#### 1.5 Replication Lag <1s (Cluster Only)

**Requirement:** Three-node cluster maintains replication lag <1 second.

**Test Procedure:**

```bash
# Submit assertion to Node 1
curl -X POST http://node1:18180/v1/assert -d '{...}'

# Wait 1 second
sleep 1

# Query from Node 2 (different node)
curl -X POST http://node2:18180/v1/query -d '{...}'
# Should return the assertion

# Check replication lag metric
curl http://node1:18180/metrics | grep replication_lag_seconds
```

**Expected Result:**

```
replication_lag_seconds{node="node1"} 0.234  # <1.0 ✅
replication_lag_seconds{node="node2"} 0.456  # <1.0 ✅
replication_lag_seconds{node="node3"} 0.123  # <1.0 ✅
```

**Acceptance:**
- ✅ Pass: All nodes <1s
- ⚠️ Warning: Any node 1-5s
- ❌ Fail: Any node >5s

---

### Nice to Have

#### 1.6 Dashboard Load Time <2s

**Requirement:** StemeDB dashboard loads in <2 seconds.

**Test Procedure:**

```bash
# Measure page load time
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:18188/

# Or use browser DevTools Network tab
# Load: http://localhost:18188/
# Check: DOMContentLoaded time
```

**Expected Result:**
- DOMContentLoaded: <2000ms

**Acceptance:**
- ✅ Pass: <2s
- ⚠️ Warning: 2-5s
- ❌ Fail: >5s

---

## 2. Functional Requirements

### Must Pass

#### 2.1 Complete Audit Trail (Export 100 assertions with signatures)

**Requirement:** Export 100 assertions with full provenance chain and verify Ed25519 signatures.

**Test Procedure:**

```bash
# Query 100 assertions
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "drug/*",
    "lens": "recency",
    "limit": 100
  }' > assertions.json

# Verify each signature
jq -r '.assertions[] | .signature' assertions.json | while read -r sig; do
  # Extract public key, message, signature
  # Verify Ed25519 signature
  # (placeholder: this loop only echoes; plug in the actual verification step here)
  echo "Verifying $sig..."
done

# Check provenance and agent_id fields
jq '.assertions[]
    | select(.provenance == null or .provenance == ""
             or .agent_id == null or .agent_id == "")' assertions.json
# Should return empty (all have provenance and agent_id)
```

**Expected Result:**
- 100 assertions exported
- All have non-empty `provenance` field
- All have non-empty `agent_id` field
- All signatures verify successfully

**Acceptance:**
- ✅ Pass: 100/100 valid signatures + provenance
- ❌ Fail: Any missing provenance or invalid signature

---

#### 2.2 Source Retraction Cascade

**Requirement:** Retracting a source cascades to 110+ dependent assertions.

**Test Procedure:**

```bash
# Submit source + 110 dependent assertions
./scripts/seed-retraction-test-data.sh

# Retract source
curl -X POST http://localhost:18180/v1/retract \
  -d '{
    "concept_path": "source/CARDIOVASC_MEGA_TRIAL",
    "reason": "study_retracted_fabricated_data",
    "cascade": true
  }'

# Count retracted assertions.
# The select is wrapped in [...] so that `length` counts the matching
# assertions instead of returning the key count of each object.
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "drug/*/cardiovascular_risk",
    "lens": "recency",
    "include_retracted": true
  }' | jq '[.assertions[] | select(.lifecycle_stage == "RETRACTED")] | length'
```

**Expected Result:**

```
111  # Source + 110 dependents (≥110 ✅)
```

**Acceptance:**
- ✅ Pass: ≥110 assertions retracted
- ❌ Fail: <110 assertions retracted

---

#### 2.3 Multi-Lens Resolution

**Requirement:** RecencyLens, ConsensusLens, and AuthorityLens return different winners for the same query.

**Test Procedure:**

```bash
# Submit 3 assertions (different agents, times, confidence)
curl -X POST http://localhost:18180/v1/assert -d '{
  "concept_path": "drug/aspirin/dosage",
  "predicate": "recommended_mg",
  "value": 81,
  "confidence": 0.95,
  "agent_id": "fda-guidelines",
  "timestamp": "2024-01-01T00:00:00Z"
}'

curl -X POST http://localhost:18180/v1/assert -d '{
  "concept_path": "drug/aspirin/dosage",
  "predicate": "recommended_mg",
  "value": 100,
  "confidence": 0.7,
  "agent_id": "mayo-clinic",
  "timestamp": "2025-06-01T00:00:00Z"
}'

curl -X POST http://localhost:18180/v1/assert -d '{
  "concept_path": "drug/aspirin/dosage",
  "predicate": "recommended_mg",
  "value": 325,
  "confidence": 0.6,
  "agent_id": "patient-forum",
  "timestamp": "2025-12-01T00:00:00Z"
}'

# Query with each lens
curl -X POST http://localhost:18180/v1/query \
  -d '{"concept_path": "drug/aspirin/dosage", "lens": "recency"}' \
  | jq '.assertions[0].value'
# Expected: 325 (most recent)

curl -X POST http://localhost:18180/v1/query \
  -d '{"concept_path": "drug/aspirin/dosage", "lens": "authority"}' \
  | jq '.assertions[0].value'
# Expected: 81 (highest confidence from FDA)

curl -X POST http://localhost:18180/v1/query \
  -d '{"concept_path": "drug/aspirin/dosage", "lens": "consensus"}' \
  | jq '.assertions[0].value'
# Expected: 100 (middle value, balances recency + authority)
```

**Expected Result:**
- RecencyLens returns: 325 (latest timestamp)
- AuthorityLens returns: 81 (FDA, highest confidence)
- ConsensusLens returns: 100 (middle value)

**All 3 lenses return different winners ✅**

**Acceptance:**
- ✅ Pass: 3 different winners across lenses
- ❌ Fail: Two or more lenses return the same winner (indicates lens resolution is not working)

---

#### 2.4 Health Endpoint Returns 200

**Requirement:** `/v1/health` returns 200 with valid JSON.

**Test Procedure:**

```bash
curl -i http://localhost:18180/v1/health
```

**Expected Result:**

```
HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 12345,
  "assertion_count": 10234
}
```

**Acceptance:**
- ✅ Pass: 200 status + valid JSON
- ❌ Fail: Non-200 status OR malformed JSON

---

### Should Pass

#### 2.5 Query with Complex Lens (AuthorityLens with deep chain)

**Requirement:** AuthorityLens resolves assertions with trust chain depth ≥3.

**Test Procedure:**

```bash
# Submit assertions with trust chain:
# Agent A → Agent B → Agent C → Agent D (depth 3)
./scripts/seed-trust-chain.sh --depth 3

# Query with AuthorityLens
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "research/deep_chain",
    "lens": "authority"
  }' | jq '.trust_chain_depth'
```

**Expected Result:**

```
3  # Depth ≥3 ✅
```

**Acceptance:**
- ✅ Pass: Depth ≥3
- ❌ Fail: Depth <3

---

#### 2.6 Time-Travel Query (2023 vs 2025 comparison)

**Requirement:** Query returns different results for different timestamps.

**Test Procedure:**

```bash
# Query as of 2023
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "drug/aspirin/dosage",
    "lens": "recency",
    "as_of": "2023-01-01T00:00:00Z"
  }' | jq '.assertions[0].value'
# Expected: 81 (old guideline)

# Query as of 2025
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "drug/aspirin/dosage",
    "lens": "recency",
    "as_of": "2025-12-31T23:59:59Z"
  }' | jq '.assertions[0].value'
# Expected: 325 (updated guideline)
```

**Expected Result:**
- 2023: 81
- 2025: 325
- **Different values ✅**

**Acceptance:**
- ✅ Pass: Different values for different timestamps
- ❌ Fail: Same value (time-travel not working)

---

### Nice to Have

#### 2.7 Swagger UI Accessible

**Requirement:** OpenAPI docs accessible at `/swagger-ui`.

**Test Procedure:**

```bash
curl -I http://localhost:18180/swagger-ui/
```

**Expected Result:**

```
HTTP/1.1 200 OK
Content-Type: text/html
```

**Acceptance:**
- ✅ Pass: 200 status
- ⚠️ Warning: 404 (acceptable if documented)

---

## 3. Operational Requirements

### Must Pass

#### 3.1 Backup/Restore Roundtrip

**Requirement:** Load 10K assertions → backup → restore → verify count matches.

**Test Procedure:**

```bash
# Load 10K assertions
./scripts/load-test-data.sh --count 10000

# Check count
ORIGINAL_COUNT=$(curl -s http://localhost:18180/v1/health | jq '.assertion_count')
echo "Original count: $ORIGINAL_COUNT"

# Backup
sudo ./scripts/backup-stemedb.sh
BACKUP_DIR=$(ls -dt backups/stemedb-backup-* | head -1)

# Stop server
sudo systemctl stop stemedb-api

# Restore (quoted so a path containing spaces is passed as one argument)
sudo ./scripts/restore-stemedb.sh "$BACKUP_DIR"

# Start server
sudo systemctl start stemedb-api

# Wait for startup
sleep 10

# Check count
RESTORED_COUNT=$(curl -s http://localhost:18180/v1/health | jq '.assertion_count')
echo "Restored count: $RESTORED_COUNT"

# Verify match
[ "$ORIGINAL_COUNT" -eq "$RESTORED_COUNT" ] && echo "✅ Pass" || echo "❌ Fail"
```

**Expected Result:**

```
Original count: 10234
Restored count: 10234
✅ Pass
```

**Acceptance:**
- ✅ Pass: Counts match exactly
- ❌ Fail: Counts differ

---

#### 3.2 Node Failure Recovery (Three-Node Cluster)

**Requirement:** Kill Node 2 → queries continue → node recovers → re-replicates <5 min.

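
The five-minute recovery bound can also be checked by polling, which records how long recovery actually took instead of waiting a fixed interval. The following is a sketch under stated assumptions: `wait_until`, `lag_ok`, and `node2_caught_up` are hypothetical helpers, and the metric name and `node2:18180` address are taken from the examples in this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  local timeout_s interval_s deadline
  timeout_s=$1
  interval_s=$2
  shift 2
  deadline=$(( $(date +%s) + timeout_s ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1   # timed out before the command succeeded
    fi
    sleep "$interval_s"
  done
}

# Hypothetical check: reads Prometheus text on stdin and succeeds only if at
# least one replication_lag_seconds sample is present and every one is <1.0.
lag_ok() {
  awk '
    /^replication_lag_seconds/ { seen = 1; if ($NF + 0 >= 1.0) bad = 1 }
    END { exit !(seen && !bad) }
  '
}

# Assumed endpoint; fails if the node is unreachable or still lagging.
node2_caught_up() {
  curl -sf http://node2:18180/metrics | lag_ok
}

# Intended use after restarting Node 2 (5-minute budget, poll every 5 seconds):
#   wait_until 300 5 node2_caught_up && echo "✅ Pass" || echo "❌ Fail"
```

Polling keeps the same pass condition (lag <1s within 5 minutes) but returns as soon as the node has caught up.
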
**Test Procedure:**

```bash
# Kill Node 2
ssh node2 "sudo systemctl stop stemedb-api"

# Verify cluster detects failure
curl http://node1:18181/cluster/members | jq '.members[] | select(.id=="node2") | .status'
# Expected: "DOWN"

# Submit query to Node 1 (should succeed)
curl -X POST http://node1:18180/v1/query -d '{...}'
# Expected: 200 OK

# Restart Node 2
ssh node2 "sudo systemctl start stemedb-api"

# Wait for re-replication
sleep 300  # 5 minutes

# Check replication lag
curl http://node2:18180/metrics | grep replication_lag_seconds
# Expected: <1.0
```

**Expected Result:**
- Node 2 failure detected within 30s
- Queries continue to succeed on Node 1 & 3
- Node 2 recovers and re-replicates within 5 minutes
- Final replication lag <1s

**Acceptance:**
- ✅ Pass: All criteria met
- ❌ Fail: Queries failed OR recovery >5 min

---

#### 3.3 Rolling Restart (Three-Node Cluster, Zero Downtime)

**Requirement:** Restart nodes one-by-one during load test → 100% success rate.

**Test Procedure:**

```bash
# Start load test (background)
./scripts/query-load-test.sh --rate 10 --duration 600 &
LOAD_PID=$!

# Wait 60s for baseline
sleep 60

# Restart Node 1
ssh node1 "sudo systemctl restart stemedb-api"
sleep 60

# Restart Node 2
ssh node2 "sudo systemctl restart stemedb-api"
sleep 60

# Restart Node 3
ssh node3 "sudo systemctl restart stemedb-api"
sleep 60

# Wait for load test to complete
wait $LOAD_PID

# Check success rate
grep "Success rate" load-test-results.log
```

**Expected Result:**

```
Success rate: 100.0% (6000/6000 requests succeeded)
```

**Acceptance:**
- ✅ Pass: 100% success rate
- ⚠️ Warning: 98-99.9% success rate
- ❌ Fail: <98% success rate

---

### Should Pass

#### 3.4 Metrics Exposed (Prometheus Format)

**Requirement:** `/metrics` endpoint returns Prometheus-format metrics.

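
"Valid Prometheus format" can be given a rough mechanical check rather than judged by eye. The sketch below is a shape check only, not a full parser: `looks_like_prometheus` is a hypothetical helper that skips `#` comment lines and requires every remaining line to look like `name{labels} value [timestamp]`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough shape check for Prometheus text exposition
# format read from stdin. Succeeds only if at least one sample line is
# present and no line fails the pattern. Label values containing "}" or
# spaces are not handled by this simplified pattern.
looks_like_prometheus() {
  awk '
    /^#/ || /^[ \t]*$/ { next }   # skip HELP/TYPE comments and blank lines
    /^[a-zA-Z_:][a-zA-Z0-9_:]*([{][^}]*[}])?[ \t]+[^ \t]+([ \t]+-?[0-9]+)?$/ { samples++; next }
    { bad++ }
    END { exit !(samples > 0 && bad == 0) }
  '
}

# Intended use against the endpoint from this criterion:
#   curl -sf http://localhost:18180/metrics | looks_like_prometheus \
#     && echo "✅ Pass" || echo "❌ Fail"
```

If the Prometheus tooling is installed, piping the same output into `promtool check metrics` is a more thorough lint than this pattern match.
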
**Test Procedure:**

```bash
curl http://localhost:18180/metrics | head -20
```

**Expected Result:**

```
# HELP stemedb_assertions_total Total assertions ingested
# TYPE stemedb_assertions_total counter
stemedb_assertions_total 10234

# HELP stemedb_query_latency_seconds Query latency histogram
# TYPE stemedb_query_latency_seconds histogram
stemedb_query_latency_seconds_bucket{le="0.005"} 1234
...
```

**Acceptance:**
- ✅ Pass: Valid Prometheus format
- ❌ Fail: Invalid format OR endpoint unreachable

---

#### 3.5 Grafana Dashboard Loads

**Requirement:** Grafana dashboard displays StemeDB metrics without errors.

**Test Procedure:**
1. Open http://localhost:3000 (Grafana)
2. Navigate to "StemeDB Overview" dashboard
3. Check all panels load without errors

**Expected Result:**
- All panels display data
- No "No data" or "Error" messages

**Acceptance:**
- ✅ Pass: All panels load
- ⚠️ Warning: 1-2 panels missing data
- ❌ Fail: >2 panels missing data

---

### Nice to Have

#### 3.6 Backup Automation (Cron Job Running)

**Requirement:** Daily backup cron job configured and executed.

**Test Procedure:**

```bash
# Check cron job exists
sudo crontab -l | grep backup-stemedb
# Expected:
# 0 2 * * * /usr/local/bin/backup-stemedb.sh >> /var/log/stemedb-backup.log 2>&1

# Check last backup
ls -lt backups/ | head -3
# Expected: Backup from last 24 hours
```

**Acceptance:**
- ✅ Pass: Cron job exists + recent backup
- ⚠️ Warning: Cron job exists but no recent backup
- ❌ Fail: No cron job

---

## 4. Demo Validation: 5 Amazement Moments

**All 5 moments must be demonstrable without errors.**

### Moment 1: Conflicting Claims (FDA 0.2% vs Anecdotal 12%)

**Setup:**

```bash
./scripts/demo-moment-1-conflicting-claims.sh
```

**Demo Script:**
1. Show 2 assertions: FDA (0.2%) vs Anecdotal (12%)
2. Query with ConflictLens → Shows conflict_score: 0.87
3. Query with AuthorityLens → Returns FDA value (higher confidence)
4. 
**Amazement:** "Same data, different answers based on lens choice"

**Acceptance:**
- ✅ Pass: ConflictLens detects conflict, AuthorityLens picks FDA
- ❌ Fail: Lenses don't differentiate

---

### Moment 2: Source Retraction Cascade (110 Assertions Flagged)

**Setup:**

```bash
./scripts/demo-moment-2-retraction.sh
```

**Demo Script:**
1. Show study with 110 dependent drug safety assertions
2. Retract study: `POST /v1/retract` with `cascade: true`
3. Query retracted assertions → 111 total (study + dependents)
4. **Amazement:** "One retraction cascades to 110+ assertions automatically"

**Acceptance:**
- ✅ Pass: ≥110 assertions retracted (111 expected: study + 110 dependents)
- ❌ Fail: <110 assertions retracted

---

### Moment 3: Audit Trail (Provenance Chain to Source)

**Setup:**

```bash
./scripts/demo-moment-3-audit-trail.sh
```

**Demo Script:**
1. Query assertion: "Drug X has adverse event rate 5%"
2. Show provenance: "Clinical trial ABC, 2024-06-15"
3. Trace to source: "Trial ABC run by Pharma Corp, funded by..."
4. Verify signature: Ed25519 signature valid
5. **Amazement:** "Full audit trail from claim to original source"

**Acceptance:**
- ✅ Pass: Provenance chain complete, signature valid
- ❌ Fail: Missing provenance OR invalid signature

---

### Moment 4: Time-Travel (Query 2023 vs 2025 Guidelines)

**Setup:**

```bash
./scripts/demo-moment-4-time-travel.sh
```

**Demo Script:**
1. Query aspirin dosage as of 2023 → Returns 81mg
2. Query same as of 2025 → Returns 325mg
3. Show timeline of changes (3 updates over 2 years)
4. **Amazement:** "See how medical guidelines evolved over time"

**Acceptance:**
- ✅ Pass: Different values for different timestamps
- ❌ Fail: Same value (time-travel not working)

---

### Moment 5: Lens-Based Resolution (3 Lenses → 3 Winners)

**Setup:**

```bash
./scripts/demo-moment-5-lens-resolution.sh
```

**Demo Script:**
1. Show 5 conflicting assertions for "recommended dosage"
2. Query with RecencyLens → Returns latest assertion
3. Query with ConsensusLens → Returns middle value
4. 
Query with AuthorityLens → Returns highest confidence assertion
5. **Amazement:** "Same query, 3 different answers - you choose resolution strategy"

**Acceptance:**
- ✅ Pass: 3 lenses return 3 different winners
- ❌ Fail: Lenses return same winner

---

## 5. Acceptance Criteria

### Must Pass (Ship Blockers)

**All 15 "Must Pass" criteria must be met:**

- [ ] 1.1 Query latency p99 <1s
- [ ] 1.2 Sustained ingest 1K/sec
- [ ] 1.3 Conflict detection >0.5
- [ ] 2.1 Audit trail complete
- [ ] 2.2 Retraction cascade ≥110
- [ ] 2.3 Multi-lens resolution
- [ ] 2.4 Health endpoint 200 OK
- [ ] 3.1 Backup/restore roundtrip
- [ ] 3.2 Node failure recovery (cluster)
- [ ] 3.3 Rolling restart (cluster)
- [ ] 4.1 Moment 1: Conflicting claims
- [ ] 4.2 Moment 2: Retraction cascade
- [ ] 4.3 Moment 3: Audit trail
- [ ] 4.4 Moment 4: Time-travel
- [ ] 4.5 Moment 5: Lens resolution

### Should Pass (Recommended)

**At least 4/6 "Should Pass" required:**

- [ ] 1.4 Concurrent query capacity
- [ ] 1.5 Replication lag <1s (cluster)
- [ ] 2.5 Complex lens (deep chain)
- [ ] 2.6 Time-travel query
- [ ] 3.4 Metrics exposed
- [ ] 3.5 Grafana dashboard

### Nice to Have (Optional)

**Not required for pilot approval:**

- [ ] 1.6 Dashboard load time <2s
- [ ] 2.7 Swagger UI accessible
- [ ] 3.6 Backup automation (cron)

---

## Validation Report Template

**Copy this template to document pilot validation results:**

```markdown
# StemeDB Pilot Validation Report

**Date:** YYYY-MM-DD
**Deployment:** [Single-node / Three-node cluster]
**Instance Type:** [AWS t3.large / etc.]
**Assertions:** [Count]
**Evaluator:** [Name]

## Results Summary

| Category | Must Pass | Should Pass | Nice to Have | Total |
|----------|-----------|-------------|--------------|-------|
| Performance | [X/3] | [X/2] | [X/1] | [X/6] |
| Functional | [X/4] | [X/2] | [X/1] | [X/7] |
| Operational | [X/3] | [X/2] | [X/1] | [X/6] |
| Demo | [X/5] | [0/0] | [0/0] | [X/5] |
| **Total** | **[X/15]** | **[X/6]** | **[X/3]** | **[X/24]** |

**Pass Threshold:** 15/15 Must Pass + 4/6 Should Pass = 19/24 minimum

**Actual Score:** [X/24]

**Status:** [✅ PASS / ❌ FAIL]

## Detailed Results

[Paste test results for each criterion]

## Blockers (if any)

[List any "Must Pass" failures]

## Recommendations

[Next steps for production deployment]

## Sign-Off

- [ ] Engineering Lead: ___________________ Date: ___________
- [ ] Operations Lead: ___________________ Date: ___________
- [ ] Product Lead: ___________________ Date: ___________
```

---

## Related Documentation

- [Production Readiness UAT](../../uat/production-readiness/README.md) - Pre-validation testing
- [Operations Hub](./README.md) - Operational documentation
- [Reference Architectures](./reference-architecture/) - Deployment models
- [Runbooks](./runbooks/) - Troubleshooting procedures

---

**Last Updated:** 2026-02-11