# Arena Roadmap: The Simulation

> **Goal:** Incrementally evolve the simulator from Spine validation to a full Agent-Based Modeling environment.
> **Philosophy:** Make it run. Then add. Verify at every step.
> **Alignment:** Tracks main `roadmap.md` phases; exercises features as they land.

---

## Current State (Baseline)

The simulator (`stemedb-sim`) currently validates **Phase 1: The Spine**:

| Component | Status | What It Proves |
|-----------|--------|----------------|
| WAL Durability | ✓ Works | Writes persist |
| rkyv Serialization | ✓ Works | Roundtrip correctness |
| Ed25519 Signatures | ✓ Works | Sign on write, verify on read |
| Ingestor Pipeline | ✓ Works | WAL → KV async flow |
| Agent Identity | ✓ Works | Keypair generation |

**Run command:** `cargo run --bin stemedb-sim`

**What's NOW tested (Arena 1-3):**
- ✅ Queries via QueryEngine
- ✅ Lens resolution (Recency, VoteAwareConsensus)
- ✅ Lifecycle filtering
- ✅ Voting & consensus
- ✅ Query audit trail
- ✅ Materialized Views (Arena 3)
- ✅ Fast-path MV reads
- ✅ MV freshness under load

**What's NOT yet tested:**
- ❌ HTTP API layer (Arena 2.5.2)
- ❌ Concurrent agents (Arena 6)
- ❌ TrustRank (Arena 5)
- ❌ Time-travel queries (Arena 7)
- ❌ Crash recovery (Arena 2.5.3)
- ❌ Input validation (Arena 2.5.4)

---

## Arena Phases

### Arena 0: Make It Verifiable ✅ COMPLETE

*Goal: Add assertions for what the simulation proves. Currently it prints logs; we need programmatic success/failure.*

**Why first:** Without assertions, we don't know if later changes break things.

- [x] **0.1 Return Result from main()**: Change from print-and-exit to structured outcome.
  - [x] Define `SimulationResult { assertions_written: u64, assertions_verified: u64, errors: Vec }`.
  - [x] Library function `run_simulation(config) -> Result`.
  - [x] Print summary at end, exit 0 on success, exit 1 on failure, exit 2 on setup error.
- [x] **0.2 Integration Test Wrapper**: Make the sim runnable as a test.
  - [x] Add `crates/stemedb-sim/tests/smoke.rs` with 6 integration tests.
  - [x] Assert on `SimulationResult` fields.
  - [x] Run in CI via `cargo test -p stemedb-sim`.

**Exit Criteria:** `cargo test -p stemedb-sim` passes ✅

---

### Arena 1: Query Path (Exercises Phase 2 Features) ✅ COMPLETE

*Goal: Extend simulation to read via QueryEngine, not direct KV access.*

**Depends on:** Phase 2 complete (Query Engine, Lenses)
**Aligns with:** `roadmap.md` Phase 2 "The Lattice"

- [x] **1.1 Add Query Engine to Simulation**
  - [x] Import `stemedb-query` crate.
  - [x] After ingestion wait, query each assertion via `QueryEngine::execute()`.
  - [x] Verify result matches what was written.
- [x] **1.2 Exercise Recency Lens**
  - [x] Write 2 assertions for same subject+predicate with different timestamps.
  - [x] Query with `lens=Recency`.
  - [x] Verify most recent wins.
- [x] **1.3 Exercise Lifecycle Filtering**
  - [x] Write one `Proposed` and one `Approved` assertion for same fact.
  - [x] Query with `lifecycle=Approved`.
  - [x] Verify only Approved returned.
  - [x] **Use Case Alignment:** This is the JWT signing algorithm bug from Agile Agent Team.
- [x] **1.4 Query Audit Verification**
  - [x] Set `X-Agent-Id` header (or equivalent in direct API call).
  - [x] After queries, call `AuditStore::get_audits_for_agent()`.
  - [x] Verify audit trail exists with correct contributing assertions.
  - [x] **Use Case Alignment:** "What did the deployment agent query?"

**Exit Criteria:** Simulation writes, ingests, queries, and verifies all three scenarios. ✅

---

### Arena 2: Voting & Consensus (Exercises Phase 2 VoteStore) ✅ COMPLETE

*Goal: Simulate agents voting on assertions and resolving via VoteAwareConsensusLens.*

**Depends on:** Arena 1 complete, Phase 2 VoteStore
**Aligns with:** `roadmap.md` Phase 2 "The Ballot Box"

- [x] **2.1 Add Vote Creation to Agents**
  - [x] `Agent::vote(assertion_hash, weight)` method.
  - [x] Votes stored directly via VoteStore (bypasses WAL for now - see note below).
- [x] **2.2 Conflicting Assertions with Votes**
  - [x] Scientist_Alpha asserts "Protein_X binds Receptor_Y" (confidence 0.8).
  - [x] Scientist_Beta asserts "Protein_X binds Receptor_Z" (confidence 0.8).
  - [x] Alpha votes for own assertion (weight 1.0).
  - [x] Beta votes for own assertion (weight 1.0).
  - [x] Third agent (Believer) votes for Alpha's assertion.
  - [x] Query with `lens=VoteAwareConsensus`.
  - [x] Verify Alpha's assertion wins (2 votes vs 1).
- [x] **2.3 Troll Vote Resistance**
  - [x] Troll creates low-confidence assertion contradicting consensus.
  - [x] Troll votes for own assertion.
  - [x] Verify high-vote assertions still win.

**Exit Criteria:** Vote-based consensus correctly resolves conflicts. ✅

---

### Arena 2.5: Hardening (Critical Gap Remediation) 🔴 PRIORITY

*Goal: Fix critical gaps discovered during Arena 0-2 review before adding more features.*

**Depends on:** Arena 2 complete
**Blocks:** Arena 3+ (don't add features on a shaky foundation)
**Rationale:** Gap analysis revealed 58/100 production readiness score with 3 critical blockers.

- [ ] **2.5.1 Fix Vote Cache Race Condition** (P0 - CRITICAL)
  - [ ] VoteStore `put_vote()` uses read-modify-write on VC:/VW: keys
  - [ ] Two concurrent calls can lose updates (final count = N instead of N+1)
  - [ ] Solution: Add atomic increment or compare-and-swap operation
  - [ ] Add test: 100 concurrent `put_vote()` calls, verify final count
  - [ ] **File:** `crates/stemedb-storage/src/vote_store.rs:181-206`
- [ ] **2.5.2 Add API Integration Tests** (P0 - CRITICAL)
  - [ ] Create `crates/stemedb-api/tests/http_integration.rs`
  - [ ] Test `POST /assertions` - create assertion via HTTP
  - [ ] Test `POST /votes` - submit vote via HTTP
  - [ ] Test `GET /query` - query with lens parameter
  - [ ] Test error responses (400 Bad Request, 500 Internal Error)
  - [ ] Test rate limiting via QuotaStore middleware
  - [ ] **Gap:** Entire HTTP layer is currently untested
- [ ] **2.5.3 Add Crash Recovery Test** (P0 - CRITICAL)
  - [ ] Write assertions to WAL
  - [ ] Kill IngestWorker mid-step (simulate crash)
  - [ ] Restart IngestWorker with same WAL + KV store
  - [ ] Verify: cursor resumes correctly, no duplicate ingestion
  - [ ] Verify: all pre-crash data is recoverable
  - [ ] **Validates:** Durability claims in architecture.md
- [ ] **2.5.4 Add Input Validation** (P1 - HIGH)
  - [ ] Max subject length: 1024 characters
  - [ ] Max predicate length: 1024 characters
  - [ ] Confidence range: 0.0 to 1.0, reject NaN/Inf
  - [ ] Vote weight: non-negative, reject NaN/Inf
  - [ ] Timestamp: reject values > current time + 1 hour (clock skew protection)
  - [ ] Add validation in `IngestWorker::ingest_assertion()` and `ingest_vote()`
  - [ ] **File:** `crates/stemedb-ingest/src/worker.rs`
- [ ] **2.5.5 Replace Sleep Timers with Ingestion Sync** (P1 - HIGH)
  - [ ] Current: `tokio::time::sleep(Duration::from_millis(500))` (flaky)
  - [ ] Add: `wait_for_ingestion(store, expected_count, timeout)` helper
  - [ ] Poll store until expected assertions exist or timeout
  - [ ] Replace all hardcoded sleeps in simulation
  - [ ] **Benefit:** Faster tests, deterministic behavior
- [ ] **2.5.6 Fix Defensive Error Handling** (P2 - MEDIUM)
  - [ ] `vote_aware_consensus.rs:99-102`: Returns `[0u8; 32]` on serialization failure
  - [ ] Change to propagate error or skip candidate with warning
  - [ ] `worker.rs:161-173`: Ambiguous EOF handling treats all I/O errors as "no data"
  - [ ] Distinguish true EOF from transient errors

**Exit Criteria:**
- [ ] Vote cache is atomic (concurrent test passes)
- [ ] API layer has integration tests (POST/GET work via HTTP)
- [ ] Crash recovery is verified (no data loss on restart)
- [ ] Input validation rejects malformed data
- [ ] No hardcoded sleep timers in simulation
- [ ] Production readiness score: 75+ (up from 58)

**Estimated Effort:** 5-6 days with 1 engineer

---

### Arena 3: Materialized Views (Exercises Phase 2 Materializer) ✅ COMPLETE

*Goal: Verify fast-path MV reads work under simulation load.*

**Depends on:** Arena 2 complete, Phase 2 Materializer
**Aligns with:** `roadmap.md` Phase 2 "Materializer"

- [x] **3.1 Materializer Integration**
  - [x] Spin up Materializer alongside Ingestor.
  - [x] Wire `Notify` between IngestWorker and Materializer.
  - [x] After ingestion, verify MV keys exist in store.
- [x] **3.2 Fast-Path Verification**
  - [x] Query via QueryEngine with subject+predicate.
  - [x] Log whether fast-path or slow-path was used (add debug output).
  - [x] Verify MV winner matches slow-path result.
- [x] **3.3 MV Freshness Under Load**
  - [x] Write 10 assertions in rapid succession.
  - [x] Wait for materialization.
  - [x] Verify MV reflects latest state.
  - [x] **Aligns with:** Phase 2.5 "MV Staleness Detection"

**Exit Criteria:** Fast-path queries return correct results under load. ✅

---

### Arena 4: Agent Personas (First Strategy Differentiation)

*Goal: Agents behave differently based on persona. No longer uniform.*

**Depends on:** Arena 3 complete
**Aligns with:** Vision document "The Players"

- [ ] **4.1 AgentStrategy Trait**
  ```rust
  trait AgentStrategy {
      fn decide_action(&self, state: &WorldState) -> AgentAction;
      fn base_confidence(&self) -> f32;
      fn name(&self) -> &'static str;
  }

  enum AgentAction {
      Assert { subject: String, predicate: String, object: ObjectValue },
      Vote { assertion_hash: Hash, weight: f32 },
      Query { subject: String, predicate: String },
      Skip,
  }
  ```
- [ ] **4.2 Scientist Strategy**
  - [ ] High base confidence (0.85-0.95).
  - [ ] Reads from ground truth, asserts facts.
  - [ ] Votes for assertions that match ground truth.
- [ ] **4.3 Troll Strategy**
  - [ ] Low base confidence (0.3-0.5).
  - [ ] Contradicts existing assertions.
  - [ ] Votes against high-reputation assertions.
- [ ] **4.4 Believer Strategy**
  - [ ] Medium base confidence.
  - [ ] Queries for consensus, votes for winners.
  - [ ] Amplifies existing consensus.
- [ ] **4.5 Strategy-Driven Tick Loop**
  - [ ] Each tick: iterate agents, call `decide_action()`, execute.
  - [ ] Track per-strategy metrics (assertions created, votes cast).

**Exit Criteria:** Different agent types produce different behaviors in logs.

---

### Arena 5: TrustRank Integration (Exercises Phase 4 Foundation)

*Goal: Reputation updates based on agent behavior.*

**Depends on:** Arena 4 complete, TrustRank implemented
**Aligns with:** `roadmap.md` Phase 4 "TrustRank Engine"

- [ ] **5.1 Initialize TrustRank for Agents**
  - [ ] Each agent starts with base TrustRank (e.g., 0.5).
  - [ ] Store in TrustRankStore at simulation start.
- [ ] **5.2 Reputation Adjustment After Votes**
  - [ ] When an assertion gains votes, increase author's TrustRank.
  - [ ] When an assertion is contradicted by consensus, decrease author's TrustRank.
  - [ ] Use `TrustRankStore::record_outcome()`.
- [ ] **5.3 TrustAwareAuthorityLens Verification**
  - [ ] Two assertions from different agents, same confidence.
  - [ ] Agent with higher TrustRank should win via `TrustAwareAuthorityLens`.
  - [ ] **Use Case Alignment:** "Expert vs. junior weighting" from Agile Agent Team.
- [ ] **5.4 Troll Reputation Decay**
  - [ ] After 100 ticks, verify Troll's TrustRank has decreased.
  - [ ] Verify Scientist's TrustRank has increased.
  - [ ] **Success Criteria:** "Trust clusters form naturally without hardcoded rules."

**Exit Criteria:** TrustRank diverges based on behavior; Troll reputation tanks.

---

### Arena 6: Concurrent Agents (Performance Validation)

*Goal: Move from sequential to parallel agent execution.*

**Depends on:** Arena 5 complete
**Aligns with:** Vision "1000 concurrent agents without locking"

- [ ] **6.1 Tokio Task Per Agent**
  - [ ] Wrap each agent's tick in `tokio::spawn()`.
  - [ ] Use `Arc>` for WAL access (already in place).
  - [ ] Run 10 agents concurrently.
- [ ] **6.2 Scale to 100 Agents**
  - [ ] Parameterize agent count.
  - [ ] Run with 100 agents for 50 ticks.
  - [ ] Verify no deadlocks, no data corruption.
- [ ] **6.3 Contention Metrics**
  - [ ] Add timing around WAL lock acquisition.
  - [ ] Log P50/P99 latencies.
  - [ ] Identify bottlenecks if any.
- [ ] **6.4 Target: 1000 Agents**
  - [ ] Run with 1000 agents (stretch goal).
  - [ ] May require connection pooling or batching.
  - [ ] Document findings.

**Exit Criteria:** 100 agents run concurrently without errors.

---

### Arena 7: Time-Travel & Epochs (Exercises Phase 3 Features)

*Goal: Validate temporal queries and epoch supersession.*

**Depends on:** Arena 6 complete, Phase 3 Time-Travel + EpochAwareLens
**Aligns with:** `roadmap.md` Phase 3 "Time-Travel Engine", Phase 2.5 "EpochAwareLens"

- [ ] **7.1 Time-Travel Query Verification**
  - [ ] At tick 50, record timestamp T1.
  - [ ] At tick 100, write a new assertion superseding an old one.
  - [ ] Query with `as_of=T1`.
  - [ ] Verify result reflects tick-50 state, not tick-100 state.
  - [ ] **Use Case Alignment:** "What was the state of knowledge at 9pm?"
- [ ] **7.2 Epoch Creation and Supersession**
  - [ ] Create epoch "v1" at tick 0.
  - [ ] Create epoch "v2" superseding "v1" at tick 50.
  - [ ] Assertions referencing "v1" should be filtered by EpochAwareLens.
  - [ ] **Use Case Alignment:** "Security team migrates from RS256 to ES256."
- [ ] **7.3 Epoch Cascade Verification**
  - [ ] Chain: v3 supersedes v2 supersedes v1.
  - [ ] Query with EpochAwareLens.
  - [ ] Only v3 assertions visible.

**Exit Criteria:** Historical queries and epoch filtering work correctly.

---

### Arena 8: Skeptic & Conflict (Exercises Phase 3 Lenses)

*Goal: Surface disagreement, measure consensus.*

**Depends on:** Arena 7 complete, Phase 3 Skeptic Lens + Conflict Score
**Aligns with:** `roadmap.md` Phase 3C "Skeptic Lens", Phase 3A.2 "Conflict Score"

- [ ] **8.1 High-Conflict Scenario**
  - [ ] 3 Scientist agents assert conflicting values for same fact.
  - [ ] Each votes for own assertion.
  - [ ] Query with `lens=Skeptic`.
  - [ ] Verify `conflict_score` is high (> 0.5).
- [ ] **8.2 Low-Conflict Scenario**
  - [ ] 3 Scientists assert same value (agreement).
  - [ ] Query with `lens=Skeptic`.
  - [ ] Verify `conflict_score` is low (< 0.2).
- [ ] **8.3 Skeptic Surfaces Outlier**
  - [ ] Consensus is A, one dissenter says B.
  - [ ] Skeptic lens returns B (the controversial position).
  - [ ] **Use Case Alignment:** Financial Due Diligence "disagreement is the information."

**Exit Criteria:** Conflict score accurately reflects disagreement.

---

### Arena 9: Full Gameplay Loop (The Vision)

*Goal: Run the complete vision scenario end-to-end.*

**Depends on:** Arena 8 complete, all Phase 3 features
**Aligns with:** `simulation-vision.md` "The Gameplay Loop"

- [ ] **9.1 Ground Truth Injection**
  - [ ] Load ground truth from YAML config.
  - [ ] Scientists read ground truth, assert facts.
- [ ] **9.2 The 5-Tick Scenario**
  - [ ] Tick 1: Scientist asserts "Protein_X binds Receptor_Y".
  - [ ] Tick 2: Troll forks with "Protein_X binds Nothing".
  - [ ] Tick 3: Believer queries, votes for Scientist.
  - [ ] Tick 4: TrustRank updates (Scientist up, Troll down).
  - [ ] Tick 5: Verify consensus via lens.
- [ ] **9.3 Extended Run (1000 Ticks)**
  - [ ] Run full scenario for 1000 ticks.
  - [ ] Track metrics:
    - `truth_convergence`: % of facts matching ground truth.
    - `reputation_distribution`: Scientist vs Troll ranks.
    - `fork_depth_max`: Deepest contradiction chain.
- [ ] **9.4 Success Criteria Verification**
  - [ ] ✓ Truth survives: High-reputation assertions outlive spam.
  - [ ] ✓ Lenses work: Consensus lens filters Troll noise.
  - [ ] ✓ Performance: 1000 ticks complete in < 30 seconds.
  - [ ] ✓ Emergence: Trust clusters form naturally.

**Exit Criteria:** All 4 success criteria from vision document pass.

---

## Alignment with Main Roadmap

| Arena Phase | Exercises Roadmap Phase | Key Features Validated |
|-------------|------------------------|------------------------|
| Arena 0 ✅ | - | Test infrastructure |
| Arena 1 ✅ | Phase 2 | QueryEngine, Lenses, Lifecycle, Query Audit |
| Arena 2 ✅ | Phase 2 | VoteStore, VoteAwareConsensusLens |
| **Arena 2.5** | **- (Hardening)** | **Race conditions, API tests, crash recovery, input validation** |
| Arena 3 ✅ | Phase 2 | Materializer, Fast-Path MV, MV Freshness |
| Arena 4 | - | Agent differentiation (simulator-only) |
| Arena 5 | Phase 4 | TrustRank, TrustAwareAuthorityLens |
| Arena 6 | Phase 4 | Concurrency, Performance |
| Arena 7 | Phase 2.5 + Phase 3 | Time-Travel, Epochs, EpochAwareLens |
| Arena 8 | Phase 3 | Skeptic Lens, Conflict Score |
| Arena 9 | All | Full integration |

---

## Alignment with Use Cases

| Use Case | Arena Phase That Validates It |
|----------|-------------------------------|
| **Agile Agent Team** | |
| - Lifecycle filtering | Arena 1.3 |
| - Query audit trail | Arena 1.4 |
| - Time-travel debugging | Arena 7.1 |
| - Expert weighting | Arena 5.3 |
| - Persistent learning | Arena 5.4 (TrustRank) |
| **Financial Due Diligence** | |
| - Conflict detection | Arena 8.1, 8.3 |
| - Time-travel | Arena 7.1 |
| - Epoch cascades | Arena 7.2, 7.3 |
| **Consumer Health** | |
| - Source-class hierarchy | Phase 3 dependency (not in Arena yet) |
| - Layered consensus | Phase 3 dependency |

---

## Development Cadence

| Week | Focus | Deliverable |
|------|-------|-------------|
| 1 | Arena 0 | CI-runnable simulation ✅ |
| 2 | Arena 1 | Query path verified ✅ |
| 3 | Arena 2 | Voting verified ✅ |
| **4** | **Arena 2.5** | **Hardening: race fix, API tests, crash recovery** |
| 5 | Arena 3 | Materializer + MVs verified |
| 6 | Arena 4 | Agent personas differentiated |
| 7-8 | Arena 5-6 | TrustRank + concurrency |
| 9-10 | Arena 7-8 | Time-travel + Skeptic |
| 11-12 | Arena 9 | Full gameplay loop |

---

## Metrics to Track

Once Arena 6+ is complete, export these to logs (and eventually Prometheus):

| Metric | Description | Success Target |
|--------|-------------|----------------|
| `truth_convergence` | % of facts matching ground truth | > 95% |
| `troll_reputation` | Troll agent TrustRank at end | < 0.2 |
| `scientist_reputation` | Scientist agent TrustRank at end | > 0.8 |
| `fork_depth_max` | Deepest contradiction chain | < 10 |
| `p99_write_latency_ms` | Write path latency | < 10ms |
| `p99_query_latency_ms` | Query path latency | < 50ms |
| `concurrent_agents` | Max concurrent agents without errors | 1000 |

---

## Non-Goals (Kept Simple)

These are explicitly out of scope for the Arena:

- **Prometheus/Grafana integration** - Logs suffice for Phase 3.
- **YAML scenario config** - Hardcoded scenarios are fine until Arena 9.
- **Full chaos injection (network partitions, node kills)** - Basic crash recovery in 2.5; advanced chaos deferred to Phase 4+.
- **External agent frameworks (ADK-Go)** - Simulator uses Rust agents.

**Note:** HTTP API testing was previously a non-goal but is now addressed in Arena 2.5.2 due to critical gap discovery.

---

## Next Step

Arena 0, 1, and 2 are complete.
**Before proceeding to Arena 3**, complete **Arena 2.5: Hardening**.

### Priority Order for Arena 2.5:

1. **2.5.1 Fix Vote Cache Race** (Day 1) - Critical, can cause data corruption
2. **2.5.2 API Integration Tests** (Day 2-3) - Critical, HTTP layer is untested
3. **2.5.3 Crash Recovery Test** (Day 4) - Critical, validates durability claims
4. **2.5.4 Input Validation** (Day 5) - High, defensive hardening
5. **2.5.5 Replace Sleep Timers** (Day 5) - High, faster/deterministic tests
6. **2.5.6 Fix Defensive Errors** (Day 6) - Medium, better error handling

```bash
# Verify Arena 0 + 1 + 2 still work:
cargo test -p stemedb-sim

# Binary also works:
cargo run --bin stemedb-sim
```

After Arena 2.5 is complete (production readiness ≥ 75), proceed to **Arena 3.1**: Materializer Integration.
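
For priority item 5 (2.5.5), the poll-until-ready idea behind `wait_for_ingestion` can be sketched as below. This is a minimal illustration under stated assumptions, not the project's implementation: it takes a hypothetical counting closure in place of the real `store` argument, uses a blocking `std::thread::sleep` rather than tokio, and the 5 ms poll interval and the `Result<u64, u64>` return type are arbitrary choices for the example.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `current_count` until it reports at least `expected` items
/// or `timeout` elapses.
///
/// Assumption: `current_count` is a hypothetical stand-in for "ask the
/// store how many assertions it holds"; the real store API is not shown
/// in this roadmap.
///
/// Returns `Ok(observed)` once the target is reached, or
/// `Err(last_observed)` if the deadline passes first.
fn wait_for_ingestion<F>(
    mut current_count: F,
    expected: u64,
    timeout: Duration,
) -> Result<u64, u64>
where
    F: FnMut() -> u64,
{
    let deadline = Instant::now() + timeout;
    // Arbitrary interval for the sketch; tune to the ingest cadence.
    let poll_interval = Duration::from_millis(5);

    loop {
        let seen = current_count();
        if seen >= expected {
            // Ingestion caught up: return immediately instead of
            // sleeping for a fixed 500 ms.
            return Ok(seen);
        }
        if Instant::now() >= deadline {
            // Give up and report how far ingestion got.
            return Err(seen);
        }
        thread::sleep(poll_interval);
    }
}
```

A caller would pass a closure wrapping whatever count lookup the store exposes and treat `Err(n)` as a test failure that reports how many assertions were actually ingested. Because the helper returns as soon as the count is reached, the common case finishes well under the old fixed sleep, and the timeout only bounds the failure case.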