Pilot Success Criteria
Definition of "done" for StemeDB pilot deployments
This document defines the acceptance criteria for validating a StemeDB pilot before promoting to production. All "Must Pass" criteria are ship blockers.
Overview
| Section | Must Pass | Should Pass | Nice to Have | Total |
|---|---|---|---|---|
| 1. Performance | 3 | 2 | 1 | 6 |
| 2. Functional | 4 | 2 | 1 | 7 |
| 3. Operational | 3 | 2 | 1 | 6 |
| 4. Demo Validation | 5 | 0 | 0 | 5 |
| 5. Acceptance | - | - | - | - |
| Total | 15 | 6 | 3 | 24 |
Pass threshold: All 15 "Must Pass" + 4/6 "Should Pass" = 19/24 minimum
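The threshold arithmetic can be sketched as a small shell check. This is only an illustration: `pilot_verdict` is a hypothetical helper, not part of the StemeDB scripts, and it assumes the evaluator has already counted how many criteria passed in each tier.

```bash
# Hypothetical helper: derive the pilot verdict from the number of passed criteria.
# Thresholds are taken from the overview: all 15 Must Pass and at least 4 of 6 Should Pass.
pilot_verdict() {
  local must_passed=$1 should_passed=$2
  if [ "$must_passed" -ge 15 ] && [ "$should_passed" -ge 4 ]; then
    echo "PASS"
  else
    echo "FAIL"
  fi
}

pilot_verdict 15 4   # prints PASS
pilot_verdict 14 6   # prints FAIL (any Must Pass failure is a ship blocker)
```

Nice to Have results are deliberately ignored, since they do not affect pilot approval.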
1. Performance Requirements
Must Pass
1.1 Sub-Second Query Latency (p99 <1s)
Requirement: p99 query latency <1 second at 10K assertions baseline.
Test Procedure:
# Load 10K assertions
./scripts/load-test-data.sh --count 10000
# Run query load test (100 queries/sec for 5 minutes)
./scripts/query-load-test.sh \
--rate 100 \
--duration 300 \
--endpoint /v1/query \
--lens recency
# Extract p99 latency
curl http://localhost:18180/metrics | grep 'stemedb_query_latency_seconds{quantile="0.99"}'
Expected Result:
stemedb_query_latency_seconds{quantile="0.99"} 0.987 # <1.0 ✅
Acceptance:
- ✅ Pass: p99 <1000ms
- ⚠️ Warning: p99 1000-1500ms (acceptable with explanation)
- ❌ Fail: p99 >1500ms
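To avoid reading the bands off by eye, the metric line can be piped through a small classifier. The sketch below is an assumption-laden example: `classify_query_p99` is a hypothetical name, and it assumes each input line has the form `name{labels} value` with no spaces inside the label set.

```bash
# Hypothetical helper: classify a p99 query latency sample (seconds) against the 1.1 bands.
# Reads Prometheus text-format lines on stdin and prints one verdict per line.
classify_query_p99() {
  awk '{
    p99 = $2 + 0                       # second field is the sample value in seconds
    if (p99 < 1.0)        print "pass"
    else if (p99 <= 1.5)  print "warning"
    else                  print "fail"
  }'
}

echo 'stemedb_query_latency_seconds{quantile="0.99"} 0.987' | classify_query_p99   # prints pass
```

In a real run, the output of the `curl ... | grep ...` command from the test procedure would be piped into the function instead of the `echo`.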
1.2 Sustained Ingest Rate (1K assertions/sec, 5 minutes)
Requirement: Handle 1,000 assertions/sec sustained for 5 minutes with p99 latency <200ms.
Test Procedure:
# Run ingest load test
./scripts/ingest-load-test.sh \
--rate 1000 \
--duration 300
# Monitor metrics
curl http://localhost:18180/metrics | grep -E '(assertions_total|wal_fsync_latency)'
Expected Result:
# Ingest rate maintained
rate(stemedb_assertions_total[1m]) ~= 1000
# WAL fsync latency <200ms
stemedb_wal_fsync_latency_seconds{quantile="0.99"} 0.189 # <0.2 ✅
Acceptance:
- ✅ Pass: 1K/sec sustained, p99 <200ms, no errors
- ⚠️ Warning: 800-1000/sec OR p99 200-300ms
- ❌ Fail: <800/sec OR p99 >300ms OR errors >1%
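The three bands combine rate, latency, and error rate, so a sketch of the decision logic may help. `classify_ingest` is a hypothetical helper. The criteria above do not say how to treat an error rate between 0% and 1%; this sketch assumes it is a warning.

```bash
# Hypothetical helper: classify an ingest run against the 1.2 bands.
# Arguments: sustained rate (assertions/sec), p99 latency (ms), error rate (percent).
classify_ingest() {
  awk -v rate="$1" -v p99="$2" -v err="$3" 'BEGIN {
    # Fail conditions are checked first because any one of them is decisive.
    if (rate < 800 || p99 > 300 || err > 1)          print "fail"
    else if (rate >= 1000 && p99 < 200 && err == 0)  print "pass"
    else                                             print "warning"   # assumption: 0-1% errors land here
  }'
}

classify_ingest 1000 189 0   # prints pass
classify_ingest 900 189 0    # prints warning
```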
1.3 Conflict Detection (Score >0.5 on contradictions)
Requirement: ConflictLens assigns conflict_score >0.5 when assertions contradict.
Test Procedure:
# Submit contradictory assertions
# (percentages are noted here because JSON bodies cannot contain comments)
# First assertion: adverse event rate 0.2%
curl -X POST http://localhost:18180/v1/assert \
-d '{
"concept_path": "drug/aspirin/safety",
"predicate": "adverse_event_rate",
"value": 0.002,
"confidence": 0.95,
"agent_id": "fda-clinical-trial"
}'
# Second assertion: adverse event rate 12% (contradicts the first)
curl -X POST http://localhost:18180/v1/assert \
-d '{
"concept_path": "drug/aspirin/safety",
"predicate": "adverse_event_rate",
"value": 0.12,
"confidence": 0.7,
"agent_id": "anecdotal-reports"
}'
# Query with ConflictLens
curl -X POST http://localhost:18180/v1/query \
-d '{
"concept_path": "drug/aspirin/safety",
"lens": "conflict"
}' | jq '.conflict_score'
Expected Result:
{
"conflict_score": 0.87, # >0.5 ✅ (high conflict detected)
"assertions": [
{"value": 0.002, "confidence": 0.95, "agent": "fda-clinical-trial"},
{"value": 0.12, "confidence": 0.7, "agent": "anecdotal-reports"}
]
}
Acceptance:
- ✅ Pass: conflict_score >0.5 for contradictory values
- ❌ Fail: conflict_score ≤0.5
Should Pass
1.4 Concurrent Query Capacity (100 readers, <2x degradation)
Requirement: Support 100 concurrent readers with <2x latency degradation vs baseline.
Test Procedure:
# Measure baseline (1 concurrent reader)
ab -n 1000 -c 1 -p query.json http://localhost:18180/v1/query
# Note: mean latency (e.g., 50ms)
# Measure under load (100 concurrent readers)
ab -n 10000 -c 100 -p query.json http://localhost:18180/v1/query
# Note: mean latency (e.g., 85ms)
# Calculate degradation
echo "scale=2; 85 / 50" | bc # = 1.7x (acceptable)
Expected Result:
- Baseline: 50ms mean
- Under load: <100ms mean (<2x degradation)
Acceptance:
- ✅ Pass: <2x degradation
- ⚠️ Warning: 2-3x degradation
- ❌ Fail: >3x degradation
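The ratio and its verdict can be computed in one step rather than with `bc` and a manual comparison. The following is a minimal sketch; `degradation_verdict` is a hypothetical helper, and the two mean latencies are assumed to be copied from the `ab` output by hand.

```bash
# Hypothetical helper: compute latency degradation under load and classify it
# against the 1.4 bands. Arguments: baseline mean (ms), mean under load (ms).
degradation_verdict() {
  awk -v base="$1" -v load="$2" 'BEGIN {
    if (base <= 0) { print "invalid baseline"; exit 1 }
    ratio = load / base
    if (ratio < 2)        verdict = "pass"
    else if (ratio <= 3)  verdict = "warning"
    else                  verdict = "fail"
    printf "%.2fx %s\n", ratio, verdict
  }'
}

degradation_verdict 50 85   # prints 1.70x pass
```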
1.5 Replication Lag <1s (Cluster Only)
Requirement: Three-node cluster maintains replication lag <1 second.
Test Procedure:
# Submit assertion to Node 1
curl -X POST http://node1:18180/v1/assert -d '{...}'
# Wait 1 second
sleep 1
# Query from Node 2 (different node)
curl -X POST http://node2:18180/v1/query -d '{...}'
# Should return the assertion
# Check replication lag metric
curl http://node1:18180/metrics | grep replication_lag_seconds
Expected Result:
replication_lag_seconds{node="node1"} 0.234 # <1.0 ✅
replication_lag_seconds{node="node2"} 0.456 # <1.0 ✅
replication_lag_seconds{node="node3"} 0.123 # <1.0 ✅
Acceptance:
- ✅ Pass: All nodes <1s
- ⚠️ Warning: Any node 1-5s
- ❌ Fail: Any node >5s
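Because the verdict depends on the slowest node, it can be useful to reduce the metric lines to a single worst-case value. This sketch assumes the metric lines look like the expected result above (`replication_lag_seconds{node="..."} value`); `worst_replication_lag` is a hypothetical name.

```bash
# Hypothetical helper: find the largest replication lag among the metric lines on stdin
# and classify it against the 1.5 bands.
worst_replication_lag() {
  awk '/^replication_lag_seconds/ {
         if ($2 + 0 > max) max = $2 + 0   # keep the slowest node seen so far
         seen = 1
       }
       END {
         if (!seen) { print "no samples"; exit 1 }
         if (max < 1)        verdict = "pass"
         else if (max <= 5)  verdict = "warning"
         else                verdict = "fail"
         printf "%.3f %s\n", max, verdict
       }'
}

printf '%s\n' \
  'replication_lag_seconds{node="node1"} 0.234' \
  'replication_lag_seconds{node="node2"} 0.456' \
  'replication_lag_seconds{node="node3"} 0.123' | worst_replication_lag   # prints 0.456 pass
```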
Nice to Have
1.6 Dashboard Load Time <2s
Requirement: StemeDB dashboard loads in <2 seconds.
Test Procedure:
# Measure page load time
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:18188/
# Or use browser DevTools Network tab
# Load: http://localhost:18188/
# Check: DOMContentLoaded time
Expected Result:
- DOMContentLoaded: <2000ms
Acceptance:
- ✅ Pass: <2s
- ⚠️ Warning: 2-5s
- ❌ Fail: >5s
2. Functional Requirements
Must Pass
2.1 Complete Audit Trail (Export 100 assertions with signatures)
Requirement: Export 100 assertions with full provenance chain and verify Ed25519 signatures.
Test Procedure:
# Query 100 assertions
curl -X POST http://localhost:18180/v1/query \
-d '{
"concept_path": "drug/*",
"lens": "recency",
"limit": 100
}' > assertions.json
# Verify each signature
jq -r '.assertions[].signature' assertions.json | while read -r sig; do
# Placeholder: extract the public key and signed message for this assertion,
# then verify the Ed25519 signature with your tooling of choice
echo "Verifying $sig..."
done
# Check provenance fields
cat assertions.json | jq '.assertions[] | select(.provenance == null or .provenance == "")'
# Should return empty (all have provenance)
Expected Result:
- 100 assertions exported
- All have non-empty provenance field
- All have non-empty agent_id field
- All signatures verify successfully
Acceptance:
- ✅ Pass: 100/100 valid signatures + provenance
- ❌ Fail: Any missing provenance or invalid signature
2.2 Source Retraction Cascade
Requirement: Retracting source cascades to 110+ dependent assertions.
Test Procedure:
# Submit source + 110 dependent assertions
./scripts/seed-retraction-test-data.sh
# Retract source
curl -X POST http://localhost:18180/v1/retract \
-d '{
"concept_path": "source/CARDIOVASC_MEGA_TRIAL",
"reason": "study_retracted_fabricated_data",
"cascade": true
}'
# Query retracted assertions
curl -X POST http://localhost:18180/v1/query \
-d '{
"concept_path": "drug/*/cardiovascular_risk",
"lens": "recency",
"include_retracted": true
}' | jq '[.assertions[] | select(.lifecycle_stage == "RETRACTED")] | length'
Expected Result:
111 # Source + 110 dependents (≥110 ✅)
Acceptance:
- ✅ Pass: ≥110 assertions retracted
- ❌ Fail: <110 assertions retracted
2.3 Multi-Lens Resolution
Requirement: RecencyLens, ConsensusLens, and AuthorityLens return different winners for same query.
Test Procedure:
# Submit 3 assertions (different agents, times, confidence)
curl -X POST http://localhost:18180/v1/assert -d '{
"concept_path": "drug/aspirin/dosage",
"predicate": "recommended_mg",
"value": 81,
"confidence": 0.95,
"agent_id": "fda-guidelines",
"timestamp": "2024-01-01T00:00:00Z"
}'
curl -X POST http://localhost:18180/v1/assert -d '{
"concept_path": "drug/aspirin/dosage",
"predicate": "recommended_mg",
"value": 100,
"confidence": 0.7,
"agent_id": "mayo-clinic",
"timestamp": "2025-06-01T00:00:00Z"
}'
curl -X POST http://localhost:18180/v1/assert -d '{
"concept_path": "drug/aspirin/dosage",
"predicate": "recommended_mg",
"value": 325,
"confidence": 0.6,
"agent_id": "patient-forum",
"timestamp": "2025-12-01T00:00:00Z"
}'
# Query with each lens
curl -X POST http://localhost:18180/v1/query \
-d '{"concept_path": "drug/aspirin/dosage", "lens": "recency"}' \
| jq '.assertions[0].value'
# Expected: 325 (most recent)
curl -X POST http://localhost:18180/v1/query \
-d '{"concept_path": "drug/aspirin/dosage", "lens": "authority"}' \
| jq '.assertions[0].value'
# Expected: 81 (highest confidence from FDA)
curl -X POST http://localhost:18180/v1/query \
-d '{"concept_path": "drug/aspirin/dosage", "lens": "consensus"}' \
| jq '.assertions[0].value'
# Expected: 100 (middle value, balances recency + authority)
Expected Result:
- RecencyLens returns: 325 (latest timestamp)
- AuthorityLens returns: 81 (FDA, highest confidence)
- ConsensusLens returns: 100 (middle value)
All 3 lenses return different winners ✅
Acceptance:
- ✅ Pass: 3 different winners across lenses
- ❌ Fail: Same winner for all lenses (indicates lens not working)
2.4 Health Endpoint Returns 200
Requirement: /v1/health returns 200 with valid JSON.
Test Procedure:
curl -i http://localhost:18180/v1/health
Expected Result:
HTTP/1.1 200 OK
Content-Type: application/json
{
"status": "healthy",
"version": "0.1.0",
"uptime_seconds": 12345,
"assertion_count": 10234
}
Acceptance:
- ✅ Pass: 200 status + valid JSON
- ❌ Fail: Non-200 status OR malformed JSON
Should Pass
2.5 Query with Complex Lens (AuthorityLens with deep chain)
Requirement: AuthorityLens resolves assertions with trust chain depth ≥3.
Test Procedure:
# Submit assertions with trust chain:
# Agent A → Agent B → Agent C → Agent D (depth 3)
./scripts/seed-trust-chain.sh --depth 3
# Query with AuthorityLens
curl -X POST http://localhost:18180/v1/query \
-d '{
"concept_path": "research/deep_chain",
"lens": "authority"
}' | jq '.trust_chain_depth'
Expected Result:
3 # Depth ≥3 ✅
Acceptance:
- ✅ Pass: Depth ≥3
- ❌ Fail: Depth <3
2.6 Time-Travel Query (2023 vs 2025 comparison)
Requirement: Query returns different results for different timestamps.
Test Procedure:
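# Query as of 2023 (requires seed data dated on or before 2023-01-01; the assertions submitted in 2.3 start at 2024-01-01)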
curl -X POST http://localhost:18180/v1/query \
-d '{
"concept_path": "drug/aspirin/dosage",
"lens": "recency",
"as_of": "2023-01-01T00:00:00Z"
}' | jq '.assertions[0].value'
# Expected: 81 (old guideline)
# Query as of 2025
curl -X POST http://localhost:18180/v1/query \
-d '{
"concept_path": "drug/aspirin/dosage",
"lens": "recency",
"as_of": "2025-12-31T23:59:59Z"
}' | jq '.assertions[0].value'
# Expected: 325 (updated guideline)
Expected Result:
- 2023: 81
- 2025: 325
- Different values ✅
Acceptance:
- ✅ Pass: Different values for different timestamps
- ❌ Fail: Same value (time-travel not working)
Nice to Have
2.7 Swagger UI Accessible
Requirement: OpenAPI docs accessible at /swagger-ui.
Test Procedure:
curl -I http://localhost:18180/swagger-ui/
Expected Result:
HTTP/1.1 200 OK
Content-Type: text/html
Acceptance:
- ✅ Pass: 200 status
- ⚠️ Warning: 404 (acceptable if documented)
3. Operational Requirements
Must Pass
3.1 Backup/Restore Roundtrip
Requirement: Load 10K assertions → backup → restore → verify count matches.
Test Procedure:
# Load 10K assertions
./scripts/load-test-data.sh --count 10000
# Check count
ORIGINAL_COUNT=$(curl -s http://localhost:18180/v1/health | jq '.assertion_count')
echo "Original count: $ORIGINAL_COUNT"
# Backup
sudo ./scripts/backup-stemedb.sh
BACKUP_DIR=$(ls -dt backups/stemedb-backup-* | head -1)
# Stop server
sudo systemctl stop stemedb-api
# Restore
sudo ./scripts/restore-stemedb.sh "$BACKUP_DIR"
# Start server
sudo systemctl start stemedb-api
# Wait for startup
sleep 10
# Check count
RESTORED_COUNT=$(curl -s http://localhost:18180/v1/health | jq '.assertion_count')
echo "Restored count: $RESTORED_COUNT"
# Verify match
[ "$ORIGINAL_COUNT" -eq "$RESTORED_COUNT" ] && echo "✅ Pass" || echo "❌ Fail"
Expected Result:
Original count: 10234
Restored count: 10234
✅ Pass
Acceptance:
- ✅ Pass: Counts match exactly
- ❌ Fail: Counts differ
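The fixed `sleep 10` in the procedure can be fragile if the restored server takes longer to start. One alternative is to poll until the health endpoint answers. The sketch below uses a hypothetical `wait_until` helper; the 60-second timeout in the usage comment is an arbitrary assumption, not a documented StemeDB value.

```bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) is reached. Returns non-zero on timeout.
wait_until() {
  local timeout=$1; shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Possible usage in place of "sleep 10" (curl -f makes HTTP errors fail the command):
#   wait_until 60 curl -sf -o /dev/null http://localhost:18180/v1/health
```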
3.2 Node Failure Recovery (Three-Node Cluster)
Requirement: Kill Node 2 → queries continue → node recovers → re-replicates <5 min.
Test Procedure:
# Kill Node 2
ssh node2 "sudo systemctl stop stemedb-api"
# Verify cluster detects failure
curl http://node1:18181/cluster/members | jq '.members[] | select(.id=="node2") | .status'
# Expected: "DOWN"
# Submit query to Node 1 (should succeed)
curl -X POST http://node1:18180/v1/query -d '{...}'
# Expected: 200 OK
# Restart Node 2
ssh node2 "sudo systemctl start stemedb-api"
# Wait for re-replication
sleep 300 # 5 minutes
# Check replication lag
curl http://node2:18180/metrics | grep replication_lag_seconds
# Expected: <1.0
Expected Result:
- Node 2 failure detected within 30s
- Queries continue to succeed on Node 1 & 3
- Node 2 recovers and re-replicates within 5 minutes
- Final replication lag <1s
Acceptance:
- ✅ Pass: All criteria met
- ❌ Fail: Queries failed OR recovery >5 min
3.3 Rolling Restart (Three-Node Cluster, Zero Downtime)
Requirement: Restart nodes one-by-one during load test → 100% success rate.
Test Procedure:
# Start load test (background)
./scripts/query-load-test.sh --rate 10 --duration 600 &
LOAD_PID=$!
# Wait 60s for baseline
sleep 60
# Restart Node 1
ssh node1 "sudo systemctl restart stemedb-api"
sleep 60
# Restart Node 2
ssh node2 "sudo systemctl restart stemedb-api"
sleep 60
# Restart Node 3
ssh node3 "sudo systemctl restart stemedb-api"
sleep 60
# Wait for load test to complete
wait $LOAD_PID
# Check success rate
grep "Success rate" load-test-results.log
Expected Result:
Success rate: 100.0% (6000/6000 requests succeeded)
Acceptance:
- ✅ Pass: 100% success rate
- ⚠️ Warning: 98% to <100% success rate
- ❌ Fail: <98% success rate
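Classifying from the raw request counts avoids a rounding trap: a run with a handful of failures can still display as 100.0%. The sketch below assumes the log line has exactly the format shown in the expected result; `classify_success_rate` is a hypothetical helper.

```bash
# Hypothetical helper: read a "Success rate" log line on stdin and classify it
# against the 3.3 bands using the integer counts in parentheses.
classify_success_rate() {
  sed -n 's/.*(\([0-9][0-9]*\)\/\([0-9][0-9]*\) requests succeeded).*/\1 \2/p' |
  awk 'NR == 1 {
         ok = $1; total = $2
         if (total == 0)                   print "fail"
         else if (ok == total)             print "pass"
         else if (ok * 100 >= total * 98)  print "warning"
         else                              print "fail"
       }
       END { if (NR == 0) print "no data" }'
}

echo 'Success rate: 100.0% (6000/6000 requests succeeded)' | classify_success_rate   # prints pass
```

In a real run the input would come from `grep "Success rate" load-test-results.log`.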
Should Pass
3.4 Metrics Exposed (Prometheus Format)
Requirement: /metrics endpoint returns Prometheus-format metrics.
Test Procedure:
curl http://localhost:18180/metrics | head -20
Expected Result:
# HELP stemedb_assertions_total Total assertions ingested
# TYPE stemedb_assertions_total counter
stemedb_assertions_total 10234
# HELP stemedb_query_latency_seconds Query latency histogram
# TYPE stemedb_query_latency_seconds histogram
stemedb_query_latency_seconds_bucket{le="0.005"} 1234
...
Acceptance:
- ✅ Pass: Valid Prometheus format
- ❌ Fail: Invalid format OR endpoint unreachable
3.5 Grafana Dashboard Loads
Requirement: Grafana dashboard displays StemeDB metrics without errors.
Test Procedure:
- Open http://localhost:3000 (Grafana)
- Navigate to "StemeDB Overview" dashboard
- Check all panels load without errors
Expected Result:
- All panels display data
- No "No data" or "Error" messages
Acceptance:
- ✅ Pass: All panels load
- ⚠️ Warning: 1-2 panels missing data
- ❌ Fail: >2 panels missing data
Nice to Have
3.6 Backup Automation (Cron Job Running)
Requirement: Daily backup cron job configured and executed.
Test Procedure:
# Check cron job exists
sudo crontab -l | grep backup-stemedb
# Expected:
# 0 2 * * * /usr/local/bin/backup-stemedb.sh >> /var/log/stemedb-backup.log 2>&1
# Check last backup
ls -lt backups/ | head -3
# Expected: Backup from last 24 hours
Acceptance:
- ✅ Pass: Cron job exists + recent backup
- ⚠️ Warning: Cron job exists but no recent backup
- ❌ Fail: No cron job
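The "recent backup" half of this check can be automated instead of inspecting `ls` output. The sketch below assumes backups are named `stemedb-backup-*` (as in 3.1) and relies on `find -maxdepth` and `-mmin`, which are available in GNU and BSD find but are not strictly POSIX; `has_recent_backup` is a hypothetical helper.

```bash
# Hypothetical helper: succeed if the directory holds a stemedb-backup-* entry
# modified within the last 24 hours. Argument: backup directory (default: backups).
has_recent_backup() {
  local dir=${1:-backups}
  # -mmin -1440 matches entries modified less than 1440 minutes (24 hours) ago.
  find "$dir" -maxdepth 1 -name 'stemedb-backup-*' -mmin -1440 2>/dev/null | grep -q .
}

if has_recent_backup backups; then
  echo "recent backup found"
else
  echo "no backup from the last 24 hours"
fi
```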
4. Demo Validation: 5 Amazement Moments
All 5 moments must be demonstrable without errors.
Moment 1: Conflicting Claims (FDA 0.2% vs Anecdotal 12%)
Setup:
./scripts/demo-moment-1-conflicting-claims.sh
Demo Script:
- Show 2 assertions: FDA (0.2%) vs Anecdotal (12%)
- Query with ConflictLens → Shows conflict_score: 0.87
- Query with AuthorityLens → Returns FDA value (higher confidence)
- Amazement: "Same data, different answers based on lens choice"
Acceptance:
- ✅ Pass: ConflictLens detects conflict, AuthorityLens picks FDA
- ❌ Fail: Lenses don't differentiate
Moment 2: Source Retraction Cascade (110 Assertions Flagged)
Setup:
./scripts/demo-moment-2-retraction.sh
Demo Script:
- Show study with 110 dependent drug safety assertions
- Retract study: POST /v1/retract with cascade: true
- Query retracted assertions → 111 total (study + dependents)
- Amazement: "One retraction cascades to 110+ assertions automatically"
Acceptance:
- ✅ Pass: ≥110 assertions retracted (111 expected: study + 110 dependents)
- ❌ Fail: <110 assertions retracted
Moment 3: Audit Trail (Provenance Chain to Source)
Setup:
./scripts/demo-moment-3-audit-trail.sh
Demo Script:
- Query assertion: "Drug X has adverse event rate 5%"
- Show provenance: "Clinical trial ABC, 2024-06-15"
- Trace to source: "Trial ABC run by Pharma Corp, funded by..."
- Verify signature: Ed25519 signature valid
- Amazement: "Full audit trail from claim to original source"
Acceptance:
- ✅ Pass: Provenance chain complete, signature valid
- ❌ Fail: Missing provenance OR invalid signature
Moment 4: Time-Travel (Query 2023 vs 2025 Guidelines)
Setup:
./scripts/demo-moment-4-time-travel.sh
Demo Script:
- Query aspirin dosage as of 2023 → Returns 81mg
- Query same as of 2025 → Returns 325mg
- Show timeline of changes (3 updates over 2 years)
- Amazement: "See how medical guidelines evolved over time"
Acceptance:
- ✅ Pass: Different values for different timestamps
- ❌ Fail: Same value (time-travel not working)
Moment 5: Lens-Based Resolution (3 Lenses → 3 Winners)
Setup:
./scripts/demo-moment-5-lens-resolution.sh
Demo Script:
- Show 5 conflicting assertions for "recommended dosage"
- Query with RecencyLens → Returns latest assertion
- Query with ConsensusLens → Returns middle value
- Query with AuthorityLens → Returns highest confidence assertion
- Amazement: "Same query, 3 different answers - you choose resolution strategy"
Acceptance:
- ✅ Pass: 3 lenses return 3 different winners
- ❌ Fail: Lenses return same winner
5. Acceptance Criteria
Must Pass (Ship Blockers)
All 15 "Must Pass" criteria must be met:
- 1.1 Query latency p99 <1s
- 1.2 Sustained ingest 1K/sec
- 1.3 Conflict detection >0.5
- 2.1 Audit trail complete
- 2.2 Retraction cascade ≥110
- 2.3 Multi-lens resolution
- 2.4 Health endpoint 200 OK
- 3.1 Backup/restore roundtrip
- 3.2 Node failure recovery (cluster)
- 3.3 Rolling restart (cluster)
- 4.1 Moment 1: Conflicting claims
- 4.2 Moment 2: Retraction cascade
- 4.3 Moment 3: Audit trail
- 4.4 Moment 4: Time-travel
- 4.5 Moment 5: Lens resolution
Should Pass (Recommended)
At least 4/6 "Should Pass" required:
- 1.4 Concurrent query capacity
- 1.5 Replication lag <1s (cluster)
- 2.5 Complex lens (deep chain)
- 2.6 Time-travel query
- 3.4 Metrics exposed
- 3.5 Grafana dashboard
Nice to Have (Optional)
Not required for pilot approval:
- 1.6 Dashboard load time <2s
- 2.7 Swagger UI accessible
- 3.6 Backup automation (cron)
Validation Report Template
Copy this template to document pilot validation results:
# StemeDB Pilot Validation Report
**Date:** YYYY-MM-DD
**Deployment:** [Single-node / Three-node cluster]
**Instance Type:** [AWS t3.large / etc.]
**Assertions:** [Count]
**Evaluator:** [Name]
## Results Summary
| Category | Must Pass | Should Pass | Nice to Have | Total |
|----------|-----------|-------------|--------------|-------|
| Performance | [X/3] | [X/2] | [X/1] | [X/6] |
| Functional | [X/4] | [X/2] | [X/1] | [X/7] |
| Operational | [X/3] | [X/2] | [X/1] | [X/6] |
| Demo | [X/5] | [0/0] | [0/0] | [X/5] |
| **Total** | **[X/15]** | **[X/6]** | **[X/3]** | **[X/24]** |
**Pass Threshold:** 15/15 Must Pass + 4/6 Should Pass = 19/24 minimum
**Actual Score:** [X/24]
**Status:** [✅ PASS / ❌ FAIL]
## Detailed Results
[Paste test results for each criterion]
## Blockers (if any)
[List any "Must Pass" failures]
## Recommendations
[Next steps for production deployment]
## Sign-Off
- [ ] Engineering Lead: ___________________ Date: ___________
- [ ] Operations Lead: ___________________ Date: ___________
- [ ] Product Lead: ___________________ Date: ___________
Related Documentation
- Production Readiness UAT - Pre-validation testing
- Operations Hub - Operational documentation
- Reference Architectures - Deployment models
- Runbooks - Troubleshooting procedures
Last Updated: 2026-02-11