# Pilot Success Criteria

**Definition of "done" for StemeDB pilot deployments**

This document defines the acceptance criteria for validating a StemeDB pilot before promoting to production. All "Must Pass" criteria are ship blockers.

---

## Overview

| Section | Must Pass | Should Pass | Nice to Have | Total |
|---------|-----------|-------------|--------------|-------|
| **[1. Performance](#1-performance-requirements)** | 3 | 2 | 1 | 6 |
| **[2. Functional](#2-functional-requirements)** | 4 | 2 | 1 | 7 |
| **[3. Operational](#3-operational-requirements)** | 3 | 2 | 1 | 6 |
| **[4. Demo Validation](#4-demo-validation-5-amazement-moments)** | 5 | 0 | 0 | 5 |
| **[5. Acceptance](#5-acceptance-criteria)** | - | - | - | - |
| **Total** | **15** | **6** | **3** | **24** |

**Pass threshold:** All 15 "Must Pass" + 4/6 "Should Pass" = **19/24 minimum**

---
## 1. Performance Requirements

### Must Pass

#### 1.1 Sub-Second Query Latency (p99 <1s)

**Requirement:** p99 query latency <1 second at 10K assertions baseline.

**Test Procedure:**
```bash
# Load 10K assertions
./scripts/load-test-data.sh --count 10000

# Run query load test (100 queries/sec for 5 minutes)
./scripts/query-load-test.sh \
  --rate 100 \
  --duration 300 \
  --endpoint /v1/query \
  --lens recency

# Extract p99 latency
curl http://localhost:18180/metrics | grep 'stemedb_query_latency_seconds{quantile="0.99"}'
```

**Expected Result:**
```
stemedb_query_latency_seconds{quantile="0.99"} 0.987 # <1.0 ✅
```

**Acceptance:**
- ✅ Pass: p99 <1000ms
- ⚠️ Warning: p99 1000-1500ms (acceptable with explanation)
- ❌ Fail: p99 >1500ms

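
Criterion 3.4 shows `stemedb_query_latency_seconds` exposed as a histogram, and a Prometheus histogram publishes cumulative `_bucket{le="..."}` series rather than a `quantile` label. If the `grep` above returns nothing for that reason, p99 can be estimated from the buckets instead. The sketch below is one way to do that; `p99_upper_bound` is a hypothetical helper (not part of StemeDB) that assumes buckets arrive in ascending `le` order with no other labels, and it reports the upper bound of the bucket holding the 99th percentile rather than an interpolated value. It would be fed with something like `curl -s http://localhost:18180/metrics | p99_upper_bound stemedb_query_latency_seconds`.

```bash
# Hypothetical helper: read Prometheus text exposition on stdin and print the
# upper bound (le) of the first bucket that holds at least 99% of observations.
p99_upper_bound() {
  awk -v metric="$1" '
    BEGIN { n = 0; total = 0 }
    # Keep only lines such as: <metric>_bucket{le="0.5"} 98
    index($0, metric "_bucket{") == 1 {
      le = $0
      sub(/.*le="/, "", le)   # drop everything up to the le value
      sub(/".*/, "", le)      # drop everything after it
      bound[n] = le
      count[n] = $NF + 0      # cumulative count, forced to a number
      n++
      if (le == "+Inf") total = $NF + 0
    }
    END {
      if (total == 0) exit 1  # no observations or metric not found
      for (i = 0; i < n; i++)
        if (count[i] >= 0.99 * total) { print bound[i]; exit 0 }
    }'
}
```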
---
#### 1.2 Sustained Ingest Rate (1K assertions/sec, 5 minutes)

**Requirement:** Handle 1,000 assertions/sec sustained for 5 minutes with p99 latency <200ms.

**Test Procedure:**
```bash
# Run ingest load test
./scripts/ingest-load-test.sh \
  --rate 1000 \
  --duration 300

# Monitor metrics
curl http://localhost:18180/metrics | grep -E '(ingest_rate|wal_fsync_latency)'
```

**Expected Result:**
```
# Ingest rate maintained
rate(stemedb_assertions_total[1m]) ~= 1000

# WAL fsync latency <200ms
stemedb_wal_fsync_latency_seconds{quantile="0.99"} 0.189 # <0.2 ✅
```

**Acceptance:**
- ✅ Pass: 1K/sec sustained, p99 <200ms, no errors
- ⚠️ Warning: 800-1000/sec OR p99 200-300ms
- ❌ Fail: <800/sec OR p99 >300ms OR errors >1%

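
The three acceptance bands above combine several measurements with OR conditions, which is easy to misread by hand. A minimal sketch of the same rule follows; `classify_ingest` is a hypothetical helper, and treating a non-zero error rate of up to 1% as a warning is an assumption, since the criteria do not say how that case is scored. Arguments are the sustained rate (assertions/sec), p99 latency (seconds), and error rate (percent), for example `classify_ingest 1000 0.189 0`.

```bash
# Hypothetical helper: apply the 1.2 acceptance bands.
# Usage: classify_ingest <rate_per_sec> <p99_seconds> <error_percent>
classify_ingest() {
  awk -v rate="$1" -v p99="$2" -v err="$3" 'BEGIN {
    # Any single fail condition fails the criterion.
    if (rate < 800 || p99 > 0.3 || err > 1)         print "fail"
    # Pass needs every pass condition at once.
    else if (rate >= 1000 && p99 < 0.2 && err == 0) print "pass"
    # Everything in between (including 0-1% errors, an assumption).
    else                                            print "warning"
  }'
}
```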
---
#### 1.3 Conflict Detection (Score >0.5 on contradictions)

**Requirement:** ConflictLens assigns conflict_score >0.5 when assertions contradict.

**Test Procedure:**
```bash
# Submit contradictory assertions
# First assertion: adverse event rate of 0.2%
curl -X POST http://localhost:18180/v1/assert \
  -d '{
    "concept_path": "drug/aspirin/safety",
    "predicate": "adverse_event_rate",
    "value": 0.002,
    "confidence": 0.95,
    "agent_id": "fda-clinical-trial"
  }'

# Second assertion: adverse event rate of 12% (contradicts the first)
curl -X POST http://localhost:18180/v1/assert \
  -d '{
    "concept_path": "drug/aspirin/safety",
    "predicate": "adverse_event_rate",
    "value": 0.12,
    "confidence": 0.7,
    "agent_id": "anecdotal-reports"
  }'

# Query with ConflictLens
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "drug/aspirin/safety",
    "lens": "conflict"
  }' | jq '.conflict_score'
```

**Expected Result:**

The full query response looks like the following; the `jq` filter above prints only the `conflict_score` value (0.87, which is >0.5 ✅, high conflict detected).
```json
{
  "conflict_score": 0.87,
  "assertions": [
    {"value": 0.002, "confidence": 0.95, "agent": "fda-clinical-trial"},
    {"value": 0.12, "confidence": 0.7, "agent": "anecdotal-reports"}
  ]
}
```

**Acceptance:**
- ✅ Pass: conflict_score >0.5 for contradictory values
- ❌ Fail: conflict_score ≤0.5
---
### Should Pass

#### 1.4 Concurrent Query Capacity (100 readers, <2x degradation)

**Requirement:** Support 100 concurrent readers with <2x latency degradation vs baseline.

**Test Procedure:**
```bash
# Measure baseline (1 concurrent reader)
ab -n 1000 -c 1 -p query.json -T application/json http://localhost:18180/v1/query
# Note: mean latency (e.g., 50ms)

# Measure under load (100 concurrent readers)
ab -n 10000 -c 100 -p query.json -T application/json http://localhost:18180/v1/query
# Note: mean latency (e.g., 85ms)

# Calculate degradation
echo "scale=2; 85 / 50" | bc # = 1.7x (acceptable)
```

**Expected Result:**
- Baseline: 50ms mean
- Under load: <100ms mean (<2x degradation)

**Acceptance:**
- ✅ Pass: <2x degradation
- ⚠️ Warning: 2-3x degradation
- ❌ Fail: >3x degradation

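
The degradation calculation and the acceptance bands above can be folded into one step. The following is a sketch only; `degradation_verdict` is a hypothetical helper that takes the baseline and under-load mean latencies in milliseconds (as noted from the two `ab` runs) and prints the ratio with its verdict, for example `degradation_verdict 50 85`.

```bash
# Hypothetical helper: ratio of loaded to baseline latency, plus the 1.4 verdict.
# Usage: degradation_verdict <baseline_ms> <loaded_ms>
degradation_verdict() {
  awk -v base="$1" -v loaded="$2" 'BEGIN {
    if (base <= 0) exit 1            # a ratio needs a positive baseline
    ratio = loaded / base
    if (ratio < 2)       verdict = "pass"
    else if (ratio <= 3) verdict = "warning"
    else                 verdict = "fail"
    printf "%.2fx %s\n", ratio, verdict
  }'
}
```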
---
#### 1.5 Replication Lag <1s (Cluster Only)

**Requirement:** Three-node cluster maintains replication lag <1 second.

**Test Procedure:**
```bash
# Submit assertion to Node 1
curl -X POST http://node1:18180/v1/assert -d '{...}'

# Wait 1 second
sleep 1

# Query from Node 2 (different node)
curl -X POST http://node2:18180/v1/query -d '{...}'
# Should return the assertion

# Check replication lag metric
curl http://node1:18180/metrics | grep replication_lag_seconds
```

**Expected Result:**
```
replication_lag_seconds{node="node1"} 0.234 # <1.0 ✅
replication_lag_seconds{node="node2"} 0.456 # <1.0 ✅
replication_lag_seconds{node="node3"} 0.123 # <1.0 ✅
```

**Acceptance:**
- ✅ Pass: All nodes <1s
- ⚠️ Warning: Any node 1-5s
- ❌ Fail: Any node >5s
---
### Nice to Have

#### 1.6 Dashboard Load Time <2s

**Requirement:** StemeDB dashboard loads in <2 seconds.

**Test Procedure:**
```bash
# Measure page load time
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:18188/

# Or use browser DevTools Network tab
# Load: http://localhost:18188/
# Check: DOMContentLoaded time
```

**Expected Result:**
- DOMContentLoaded: <2000ms

**Acceptance:**
- ✅ Pass: <2s
- ⚠️ Warning: 2-5s
- ❌ Fail: >5s
---
## 2. Functional Requirements

### Must Pass

#### 2.1 Complete Audit Trail (Export 100 assertions with signatures)

**Requirement:** Export 100 assertions with full provenance chain and verify Ed25519 signatures.

**Test Procedure:**
```bash
# Query 100 assertions
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "drug/*",
    "lens": "recency",
    "limit": 100
  }' > assertions.json

# Verify each signature
jq -r '.assertions[] | .signature' assertions.json | while read -r sig; do
  # Placeholder: extract the public key, message, and signature,
  # then verify the Ed25519 signature
  echo "Verifying $sig..."
done

# Check provenance fields
jq '.assertions[] | select(.provenance == null or .provenance == "")' assertions.json
# Should return empty (all have provenance)
```

**Expected Result:**
- 100 assertions exported
- All have non-empty `provenance` field
- All have non-empty `agent_id` field
- All signatures verify successfully

**Acceptance:**
- ✅ Pass: 100/100 valid signatures + provenance
- ❌ Fail: Any missing provenance or invalid signature
---
#### 2.2 Source Retraction Cascade

**Requirement:** Retracting a source cascades to 110+ dependent assertions.

**Test Procedure:**
```bash
# Submit source + 110 dependent assertions
./scripts/seed-retraction-test-data.sh

# Retract source
curl -X POST http://localhost:18180/v1/retract \
  -d '{
    "concept_path": "source/CARDIOVASC_MEGA_TRIAL",
    "reason": "study_retracted_fabricated_data",
    "cascade": true
  }'

# Count retracted assertions (wrap the matches in an array so that
# `length` counts assertions rather than the keys of each object)
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "drug/*/cardiovascular_risk",
    "lens": "recency",
    "include_retracted": true
  }' | jq '[.assertions[] | select(.lifecycle_stage == "RETRACTED")] | length'
```

**Expected Result:**
```
111 # Source + 110 dependents (≥110 ✅)
```

**Acceptance:**
- ✅ Pass: ≥110 assertions retracted
- ❌ Fail: <110 assertions retracted
---
#### 2.3 Multi-Lens Resolution

**Requirement:** RecencyLens, ConsensusLens, and AuthorityLens return different winners for the same query.

**Test Procedure:**
```bash
# Submit 3 assertions (different agents, times, confidence)
curl -X POST http://localhost:18180/v1/assert -d '{
  "concept_path": "drug/aspirin/dosage",
  "predicate": "recommended_mg",
  "value": 81,
  "confidence": 0.95,
  "agent_id": "fda-guidelines",
  "timestamp": "2024-01-01T00:00:00Z"
}'

curl -X POST http://localhost:18180/v1/assert -d '{
  "concept_path": "drug/aspirin/dosage",
  "predicate": "recommended_mg",
  "value": 100,
  "confidence": 0.7,
  "agent_id": "mayo-clinic",
  "timestamp": "2025-06-01T00:00:00Z"
}'

curl -X POST http://localhost:18180/v1/assert -d '{
  "concept_path": "drug/aspirin/dosage",
  "predicate": "recommended_mg",
  "value": 325,
  "confidence": 0.6,
  "agent_id": "patient-forum",
  "timestamp": "2025-12-01T00:00:00Z"
}'

# Query with each lens
curl -X POST http://localhost:18180/v1/query \
  -d '{"concept_path": "drug/aspirin/dosage", "lens": "recency"}' \
  | jq '.assertions[0].value'
# Expected: 325 (most recent)

curl -X POST http://localhost:18180/v1/query \
  -d '{"concept_path": "drug/aspirin/dosage", "lens": "authority"}' \
  | jq '.assertions[0].value'
# Expected: 81 (highest confidence from FDA)

curl -X POST http://localhost:18180/v1/query \
  -d '{"concept_path": "drug/aspirin/dosage", "lens": "consensus"}' \
  | jq '.assertions[0].value'
# Expected: 100 (middle value, balances recency + authority)
```

**Expected Result:**
- RecencyLens returns: 325 (latest timestamp)
- AuthorityLens returns: 81 (FDA, highest confidence)
- ConsensusLens returns: 100 (middle value)

**All 3 lenses return different winners ✅**

**Acceptance:**
- ✅ Pass: 3 different winners across lenses
- ❌ Fail: Fewer than 3 distinct winners (indicates a lens is not working)

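
Comparing the three winners by eye works for a demo, but a scripted check is easier to repeat. The sketch below assumes the three `jq` outputs from the procedure above have been captured into shell variables; `all_distinct` is a hypothetical helper that succeeds only when no two of its arguments are equal, so `all_distinct "$recency" "$authority" "$consensus" && echo "✅ Pass" || echo "❌ Fail"` would report the verdict.

```bash
# Hypothetical helper: exit 0 only if every argument is different from the others.
all_distinct() {
  local total=$# unique
  # One value per line, de-duplicated, then counted (tr strips wc padding).
  unique=$(printf '%s\n' "$@" | sort -u | wc -l | tr -d ' ')
  [ "$unique" -eq "$total" ]
}
```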
---
#### 2.4 Health Endpoint Returns 200

**Requirement:** `/v1/health` returns 200 with valid JSON.

**Test Procedure:**
```bash
curl -i http://localhost:18180/v1/health
```

**Expected Result:**
```
HTTP/1.1 200 OK
Content-Type: application/json

{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 12345,
  "assertion_count": 10234
}
```

**Acceptance:**
- ✅ Pass: 200 status + valid JSON
- ❌ Fail: Non-200 status OR malformed JSON
---
### Should Pass

#### 2.5 Query with Complex Lens (AuthorityLens with deep chain)

**Requirement:** AuthorityLens resolves assertions with trust chain depth ≥3.

**Test Procedure:**
```bash
# Submit assertions with trust chain:
# Agent A → Agent B → Agent C → Agent D (depth 3)

./scripts/seed-trust-chain.sh --depth 3

# Query with AuthorityLens
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "research/deep_chain",
    "lens": "authority"
  }' | jq '.trust_chain_depth'
```

**Expected Result:**
```
3 # Depth ≥3 ✅
```

**Acceptance:**
- ✅ Pass: Depth ≥3
- ❌ Fail: Depth <3
---
#### 2.6 Time-Travel Query (2023 vs 2025 comparison)

**Requirement:** Query returns different results for different timestamps.

The seed data must include an assertion dated on or before 2023-01-01; the assertions submitted in 2.3 are all dated 2024 or later.

**Test Procedure:**
```bash
# Query as of 2023
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "drug/aspirin/dosage",
    "lens": "recency",
    "as_of": "2023-01-01T00:00:00Z"
  }' | jq '.assertions[0].value'
# Expected: 81 (old guideline)

# Query as of 2025
curl -X POST http://localhost:18180/v1/query \
  -d '{
    "concept_path": "drug/aspirin/dosage",
    "lens": "recency",
    "as_of": "2025-12-31T23:59:59Z"
  }' | jq '.assertions[0].value'
# Expected: 325 (updated guideline)
```

**Expected Result:**
- 2023: 81
- 2025: 325
- **Different values ✅**

**Acceptance:**
- ✅ Pass: Different values for different timestamps
- ❌ Fail: Same value (time-travel not working)
---
### Nice to Have

#### 2.7 Swagger UI Accessible

**Requirement:** OpenAPI docs accessible at `/swagger-ui`.

**Test Procedure:**
```bash
curl -I http://localhost:18180/swagger-ui/
```

**Expected Result:**
```
HTTP/1.1 200 OK
Content-Type: text/html
```

**Acceptance:**
- ✅ Pass: 200 status
- ⚠️ Warning: 404 (acceptable if documented)
---
## 3. Operational Requirements

### Must Pass

#### 3.1 Backup/Restore Roundtrip

**Requirement:** Load 10K assertions → backup → restore → verify count matches.

**Test Procedure:**
```bash
# Load 10K assertions
./scripts/load-test-data.sh --count 10000

# Check count
ORIGINAL_COUNT=$(curl -s http://localhost:18180/v1/health | jq '.assertion_count')
echo "Original count: $ORIGINAL_COUNT"

# Backup
sudo ./scripts/backup-stemedb.sh
BACKUP_DIR=$(ls -dt backups/stemedb-backup-* | head -1)

# Stop server
sudo systemctl stop stemedb-api

# Restore (quoted so a path containing spaces is passed as one argument)
sudo ./scripts/restore-stemedb.sh "$BACKUP_DIR"

# Start server
sudo systemctl start stemedb-api

# Wait for startup
sleep 10

# Check count
RESTORED_COUNT=$(curl -s http://localhost:18180/v1/health | jq '.assertion_count')
echo "Restored count: $RESTORED_COUNT"

# Verify match
[ "$ORIGINAL_COUNT" -eq "$RESTORED_COUNT" ] && echo "✅ Pass" || echo "❌ Fail"
```

**Expected Result:**
```
Original count: 10234
Restored count: 10234
✅ Pass
```

**Acceptance:**
- ✅ Pass: Counts match exactly
- ❌ Fail: Counts differ

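
The fixed `sleep 10` in the procedure above can report a false failure if the server takes longer to start after a restore. One alternative, sketched here under the assumption that `/v1/health` only answers successfully once the server is ready, is to poll until a probe command succeeds. `wait_until` is a hypothetical helper; it could replace the sleep with `wait_until 60 curl -sf -o /dev/null http://localhost:18180/v1/health`.

```bash
# Hypothetical helper: re-run a command once per second until it succeeds.
# Usage: wait_until <timeout_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if the timeout is reached first.
wait_until() {
  local timeout="$1"
  shift
  local waited=0
  until "$@"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}
```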
---
#### 3.2 Node Failure Recovery (Three-Node Cluster)

**Requirement:** Kill Node 2 → queries continue → node recovers → re-replicates <5 min.

**Test Procedure:**
```bash
# Kill Node 2
ssh node2 "sudo systemctl stop stemedb-api"

# Verify cluster detects failure
curl http://node1:18181/cluster/members | jq '.members[] | select(.id=="node2") | .status'
# Expected: "DOWN"

# Submit query to Node 1 (should succeed)
curl -X POST http://node1:18180/v1/query -d '{...}'
# Expected: 200 OK

# Restart Node 2
ssh node2 "sudo systemctl start stemedb-api"

# Wait for re-replication
sleep 300 # 5 minutes

# Check replication lag
curl http://node2:18180/metrics | grep replication_lag_seconds
# Expected: <1.0
```

**Expected Result:**
- Node 2 failure detected within 30s
- Queries continue to succeed on Node 1 & 3
- Node 2 recovers and re-replicates within 5 minutes
- Final replication lag <1s

**Acceptance:**
- ✅ Pass: All criteria met
- ❌ Fail: Queries failed OR recovery >5 min
---
#### 3.3 Rolling Restart (Three-Node Cluster, Zero Downtime)

**Requirement:** Restart nodes one-by-one during load test → 100% success rate.

**Test Procedure:**
```bash
# Start load test (background)
./scripts/query-load-test.sh --rate 10 --duration 600 &
LOAD_PID=$!

# Wait 60s for baseline
sleep 60

# Restart Node 1
ssh node1 "sudo systemctl restart stemedb-api"
sleep 60

# Restart Node 2
ssh node2 "sudo systemctl restart stemedb-api"
sleep 60

# Restart Node 3
ssh node3 "sudo systemctl restart stemedb-api"
sleep 60

# Wait for load test to complete
wait $LOAD_PID

# Check success rate
grep "Success rate" load-test-results.log
```

**Expected Result:**
```
Success rate: 100.0% (6000/6000 requests succeeded)
```

**Acceptance:**
- ✅ Pass: 100% success rate
- ⚠️ Warning: 98% to <100% success rate
- ❌ Fail: <98% success rate

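
A percentage rounded to one decimal place can hide a handful of failures (5999/6000 prints as 100.0%), so the verdict is safer computed from the raw counts. The sketch below assumes the log line has exactly the shape shown in the expected result; `classify_success_rate` is a hypothetical helper that reads that line on stdin, as in `grep "Success rate" load-test-results.log | classify_success_rate`.

```bash
# Hypothetical helper: classify a rolling-restart run from the request counts in
# a line like "Success rate: 100.0% (6000/6000 requests succeeded)" on stdin.
classify_success_rate() {
  # Pull out "<succeeded> <total>" from the parenthesised counts.
  sed -n 's|.*(\([0-9][0-9]*\)/\([0-9][0-9]*\) requests succeeded).*|\1 \2|p' |
    awk 'NR == 1 {
      if ($2 == 0)                  print "fail"     # no requests were sent
      else if ($1 == $2)            print "pass"     # every request succeeded
      else if ($1 * 100 >= 98 * $2) print "warning"  # at least 98%, below 100%
      else                          print "fail"
    }'
}
```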
---
### Should Pass

#### 3.4 Metrics Exposed (Prometheus Format)

**Requirement:** `/metrics` endpoint returns Prometheus-format metrics.

**Test Procedure:**
```bash
curl http://localhost:18180/metrics | head -20
```

**Expected Result:**
```
# HELP stemedb_assertions_total Total assertions ingested
# TYPE stemedb_assertions_total counter
stemedb_assertions_total 10234

# HELP stemedb_query_latency_seconds Query latency histogram
# TYPE stemedb_query_latency_seconds histogram
stemedb_query_latency_seconds_bucket{le="0.005"} 1234
...
```

**Acceptance:**
- ✅ Pass: Valid Prometheus format
- ❌ Fail: Invalid format OR endpoint unreachable
---
#### 3.5 Grafana Dashboard Loads

**Requirement:** Grafana dashboard displays StemeDB metrics without errors.

**Test Procedure:**
1. Open http://localhost:3000 (Grafana)
2. Navigate to "StemeDB Overview" dashboard
3. Check all panels load without errors

**Expected Result:**
- All panels display data
- No "No data" or "Error" messages

**Acceptance:**
- ✅ Pass: All panels load
- ⚠️ Warning: 1-2 panels missing data
- ❌ Fail: >2 panels missing data
---
### Nice to Have

#### 3.6 Backup Automation (Cron Job Running)

**Requirement:** Daily backup cron job configured and executed.

**Test Procedure:**
```bash
# Check cron job exists
sudo crontab -l | grep backup-stemedb

# Expected:
# 0 2 * * * /usr/local/bin/backup-stemedb.sh >> /var/log/stemedb-backup.log 2>&1

# Check last backup
ls -lt backups/ | head -3

# Expected: Backup from last 24 hours
```

**Acceptance:**
- ✅ Pass: Cron job exists + recent backup
- ⚠️ Warning: Cron job exists but no recent backup
- ❌ Fail: No cron job

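
Reading `ls -lt` output for a timestamp is a manual step; the "recent backup" half of the acceptance rule can also be checked mechanically. This sketch assumes backups are entries named `stemedb-backup-*` directly under the backup directory (the pattern used in 3.1) and that modification time reflects when the backup was taken. `has_recent_backup` is a hypothetical helper, called as `has_recent_backup backups && echo "recent backup found"`.

```bash
# Hypothetical helper: succeed if <dir> holds a stemedb-backup-* entry that was
# modified within the last 24 hours.
has_recent_backup() {
  local dir="$1"
  [ -n "$(find "$dir" -mindepth 1 -maxdepth 1 -name 'stemedb-backup-*' -mtime -1 2>/dev/null | head -n 1)" ]
}
```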
---
## 4. Demo Validation: 5 Amazement Moments

**All 5 moments must be demonstrable without errors.**

### Moment 1: Conflicting Claims (FDA 0.2% vs Anecdotal 12%)

**Setup:**
```bash
./scripts/demo-moment-1-conflicting-claims.sh
```

**Demo Script:**
1. Show 2 assertions: FDA (0.2%) vs Anecdotal (12%)
2. Query with ConflictLens → Shows conflict_score: 0.87
3. Query with AuthorityLens → Returns FDA value (higher confidence)
4. **Amazement:** "Same data, different answers based on lens choice"

**Acceptance:**
- ✅ Pass: ConflictLens detects conflict, AuthorityLens picks FDA
- ❌ Fail: Lenses don't differentiate
---
### Moment 2: Source Retraction Cascade (110 Assertions Flagged)

**Setup:**
```bash
./scripts/demo-moment-2-retraction.sh
```

**Demo Script:**
1. Show study with 110 dependent drug safety assertions
2. Retract study: `POST /v1/retract` with `cascade: true`
3. Query retracted assertions → 111 total (study + dependents)
4. **Amazement:** "One retraction cascades to 110+ assertions automatically"

**Acceptance:**
- ✅ Pass: ≥110 assertions retracted (111 expected: study + 110 dependents)
- ❌ Fail: <110 assertions retracted
---
### Moment 3: Audit Trail (Provenance Chain to Source)

**Setup:**
```bash
./scripts/demo-moment-3-audit-trail.sh
```

**Demo Script:**
1. Query assertion: "Drug X has adverse event rate 5%"
2. Show provenance: "Clinical trial ABC, 2024-06-15"
3. Trace to source: "Trial ABC run by Pharma Corp, funded by..."
4. Verify signature: Ed25519 signature valid
5. **Amazement:** "Full audit trail from claim to original source"

**Acceptance:**
- ✅ Pass: Provenance chain complete, signature valid
- ❌ Fail: Missing provenance OR invalid signature
---
### Moment 4: Time-Travel (Query 2023 vs 2025 Guidelines)

**Setup:**
```bash
./scripts/demo-moment-4-time-travel.sh
```

**Demo Script:**
1. Query aspirin dosage as of 2023 → Returns 81mg
2. Query same as of 2025 → Returns 325mg
3. Show timeline of changes (3 updates over 2 years)
4. **Amazement:** "See how medical guidelines evolved over time"

**Acceptance:**
- ✅ Pass: Different values for different timestamps
- ❌ Fail: Same value (time-travel not working)
---
### Moment 5: Lens-Based Resolution (3 Lenses → 3 Winners)

**Setup:**
```bash
./scripts/demo-moment-5-lens-resolution.sh
```

**Demo Script:**
1. Show 5 conflicting assertions for "recommended dosage"
2. Query with RecencyLens → Returns latest assertion
3. Query with ConsensusLens → Returns middle value
4. Query with AuthorityLens → Returns highest confidence assertion
5. **Amazement:** "Same query, 3 different answers - you choose resolution strategy"

**Acceptance:**
- ✅ Pass: 3 lenses return 3 different winners
- ❌ Fail: Lenses return same winner
---
## 5. Acceptance Criteria

### Must Pass (Ship Blockers)

**All 15 "Must Pass" criteria must be met:**

- [ ] 1.1 Query latency p99 <1s
- [ ] 1.2 Sustained ingest 1K/sec
- [ ] 1.3 Conflict detection >0.5
- [ ] 2.1 Audit trail complete
- [ ] 2.2 Retraction cascade ≥110
- [ ] 2.3 Multi-lens resolution
- [ ] 2.4 Health endpoint 200 OK
- [ ] 3.1 Backup/restore roundtrip
- [ ] 3.2 Node failure recovery (cluster)
- [ ] 3.3 Rolling restart (cluster)
- [ ] 4.1 Moment 1: Conflicting claims
- [ ] 4.2 Moment 2: Retraction cascade
- [ ] 4.3 Moment 3: Audit trail
- [ ] 4.4 Moment 4: Time-travel
- [ ] 4.5 Moment 5: Lens resolution

### Should Pass (Recommended)

**At least 4/6 "Should Pass" required:**

- [ ] 1.4 Concurrent query capacity
- [ ] 1.5 Replication lag <1s (cluster)
- [ ] 2.5 Complex lens (deep chain)
- [ ] 2.6 Time-travel query
- [ ] 3.4 Metrics exposed
- [ ] 3.5 Grafana dashboard

### Nice to Have (Optional)

**Not required for pilot approval:**

- [ ] 1.6 Dashboard load time <2s
- [ ] 2.7 Swagger UI accessible
- [ ] 3.6 Backup automation (cron)

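
The overall rule is two separate conditions rather than a single score: a raw total of 19/24 is not sufficient on its own, because 14 Must Pass plus 6 Should Pass adds up to 20 and still fails. A minimal sketch of the decision follows; `pilot_verdict` is a hypothetical helper that takes the number of Must Pass and Should Pass criteria met, for example `pilot_verdict 15 4`.

```bash
# Hypothetical helper: overall pilot verdict from the two counts that matter.
# Usage: pilot_verdict <must_pass_met> <should_pass_met>
pilot_verdict() {
  local must="$1" should="$2"
  # All 15 Must Pass criteria AND at least 4 of the 6 Should Pass criteria.
  if [ "$must" -eq 15 ] && [ "$should" -ge 4 ]; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}
```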
---
## Validation Report Template

**Copy this template to document pilot validation results:**

```markdown
# StemeDB Pilot Validation Report

**Date:** YYYY-MM-DD
**Deployment:** [Single-node / Three-node cluster]
**Instance Type:** [AWS t3.large / etc.]
**Assertions:** [Count]
**Evaluator:** [Name]

## Results Summary

| Category | Must Pass | Should Pass | Nice to Have | Total |
|----------|-----------|-------------|--------------|-------|
| Performance | [X/3] | [X/2] | [X/1] | [X/6] |
| Functional | [X/4] | [X/2] | [X/1] | [X/7] |
| Operational | [X/3] | [X/2] | [X/1] | [X/6] |
| Demo | [X/5] | [0/0] | [0/0] | [X/5] |
| **Total** | **[X/15]** | **[X/6]** | **[X/3]** | **[X/24]** |

**Pass Threshold:** 15/15 Must Pass + 4/6 Should Pass = 19/24 minimum
**Actual Score:** [X/24]
**Status:** [✅ PASS / ❌ FAIL]

## Detailed Results

[Paste test results for each criterion]

## Blockers (if any)

[List any "Must Pass" failures]

## Recommendations

[Next steps for production deployment]

## Sign-Off

- [ ] Engineering Lead: ___________________ Date: ___________
- [ ] Operations Lead: ___________________ Date: ___________
- [ ] Product Lead: ___________________ Date: ___________
```
---
## Related Documentation

- [Production Readiness UAT](../../uat/production-readiness/README.md) - Pre-validation testing
- [Operations Hub](./README.md) - Operational documentation
- [Reference Architectures](./reference-architecture/) - Deployment models
- [Runbooks](./runbooks/) - Troubleshooting procedures
---
**Last Updated:** 2026-02-11