stemedb/arena-roadmap.md
jordan 1ce4004807 feat: Complete Phase 2 (The Cortex) - query, lens, and API layers
This commit adds the read path (Cortex) to complement the write path (Spine):

## Crates
- stemedb-api: HTTP API with axum + utoipa OpenAPI
  - /v1/assert, /v1/query, /v1/epoch, /v1/skeptic, /v1/trace, /v1/audit
  - Metered endpoints with quota enforcement
  - Ed25519 signature verification
- stemedb-lens: Truth resolution lenses
  - RecencyLens, ConsensusLens, ConfidenceLens
  - VoteAwareConsensusLens (Ballot Box pattern)
  - TrustAwareAuthorityLens (The Hive pattern)
  - SkepticLens (conflict analysis)
  - EpochAwareLens (paradigm-safe queries)
- stemedb-query: Query engine with materialized views

## Storage Extensions
- VoteStore: Vote aggregation with cached counts
- TrustRankStore: Agent reputation with decay
- AuditStore: Query audit trail
- IndexStore: SP/P/S index structures
- SupersessionStore: Epoch supersession chains

## SDKs
- sdk/go/steme: Go HTTP client with Ed25519 signing
- sdk/go/adk: ADK-Go tools for AI agents

## Documentation
- Updated CLAUDE.md, architecture.md, roadmap.md
- New ai-lookup entries for all services
- Use case docs for consumer health intelligence
- Arena roadmap for simulation advancement

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 13:22:44 -07:00


Arena Roadmap: The Simulation

Goal: Incrementally evolve the simulator from Spine validation to a full Agent-Based Modeling environment.
Philosophy: Make it run. Then add. Verify at every step.
Alignment: Tracks main roadmap.md phases; exercises features as they land.


Current State (Baseline)

The simulator (stemedb-sim) currently validates Phase 1: The Spine:

| Component | Status | What It Proves |
|---|---|---|
| WAL Durability | ✓ Works | Writes persist |
| rkyv Serialization | ✓ Works | Roundtrip correctness |
| Ed25519 Signatures | ✓ Works | Sign on write, verify on read |
| Ingestor Pipeline | ✓ Works | WAL → KV async flow |
| Agent Identity | ✓ Works | Keypair generation |

Run command: cargo run --bin stemedb-sim

What's NOW tested (Arena 1-2):

  • Queries via QueryEngine
  • Lens resolution (Recency, VoteAwareConsensus)
  • Lifecycle filtering
  • Voting & consensus
  • Query audit trail

What's NOT yet tested:

  • HTTP API layer (Arena 2.5.2)
  • Concurrent agents (Arena 6)
  • TrustRank (Arena 5)
  • Materialized Views (Arena 3)
  • Time-travel queries (Arena 7)
  • Crash recovery (Arena 2.5.3)
  • Input validation (Arena 2.5.4)

Arena Phases

Arena 0: Make It Verifiable COMPLETE

Goal: Add assertions for what the simulation proves. Currently it prints logs; we need programmatic success/failure.

Why first: Without assertions, we don't know if later changes break things.

  • 0.1 Return Result from main(): Change from print-and-exit to structured outcome.

    • Define SimulationResult { assertions_written: u64, assertions_verified: u64, errors: Vec<SimulationError> }.
    • Library function run_simulation(config) -> Result<SimulationResult, SimulationSetupError>.
    • Print summary at end, exit 0 on success, exit 1 on failure, exit 2 on setup error.
  • 0.2 Integration Test Wrapper: Make the sim runnable as a test.

    • Add crates/stemedb-sim/tests/smoke.rs with 6 integration tests.
    • Assert on SimulationResult fields.
    • Run in CI via cargo test -p stemedb-sim.

Exit Criteria: cargo test -p stemedb-sim passes
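
The outcome type and exit codes from 0.1 can be sketched as follows. This is a minimal illustration, not the simulator's actual code: the field names come from the list above, while the contents of SimulationError and SimulationSetupError, the is_success rule, and the exit_code helper are assumptions.

```rust
/// A single verification failure (the `message` field is an assumption).
#[derive(Debug)]
pub struct SimulationError {
    pub message: String,
}

/// A failure that prevented the simulation from starting (assumed shape).
#[derive(Debug)]
pub struct SimulationSetupError {
    pub message: String,
}

/// Structured outcome; field names follow item 0.1.
#[derive(Debug, Default)]
pub struct SimulationResult {
    pub assertions_written: u64,
    pub assertions_verified: u64,
    pub errors: Vec<SimulationError>,
}

impl SimulationResult {
    /// Assumed success rule: no errors, and everything written was verified.
    pub fn is_success(&self) -> bool {
        self.errors.is_empty() && self.assertions_written == self.assertions_verified
    }
}

/// Maps an outcome to the exit codes in 0.1:
/// 0 on success, 1 on failure, 2 on setup error.
pub fn exit_code(outcome: &Result<SimulationResult, SimulationSetupError>) -> i32 {
    match outcome {
        Ok(result) if result.is_success() => 0,
        Ok(_) => 1,
        Err(_) => 2,
    }
}
```

A binary's main() could pass the value returned by run_simulation(config) to exit_code and hand the result to std::process::exit, while integration tests assert on the SimulationResult fields directly.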


Arena 1: Query Path (Exercises Phase 2 Features) COMPLETE

Goal: Extend simulation to read via QueryEngine, not direct KV access.

Depends on: Phase 2 complete (Query Engine, Lenses)
Aligns with: roadmap.md Phase 2 "The Lattice"

  • 1.1 Add Query Engine to Simulation

    • Import stemedb-query crate.
    • After ingestion wait, query each assertion via QueryEngine::execute().
    • Verify result matches what was written.
  • 1.2 Exercise Recency Lens

    • Write 2 assertions for same subject+predicate with different timestamps.
    • Query with lens=Recency.
    • Verify most recent wins.
  • 1.3 Exercise Lifecycle Filtering

    • Write one Proposed and one Approved assertion for same fact.
    • Query with lifecycle=Approved.
    • Verify only Approved returned.
    • Use Case Alignment: This is the JWT signing algorithm bug from Agile Agent Team.
  • 1.4 Query Audit Verification

    • Set X-Agent-Id header (or equivalent in direct API call).
    • After queries, call AuditStore::get_audits_for_agent().
    • Verify audit trail exists with correct contributing assertions.
    • Use Case Alignment: "What did the deployment agent query?"

Exit Criteria: Simulation writes, ingests, queries, and verifies all three scenarios.
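
What 1.2 and 1.3 are meant to verify can be sketched with a small in-memory resolver. This is not the stemedb-lens implementation: the Candidate type, its fields, and resolve_recency are hypothetical names used only to illustrate "most recent wins" combined with a lifecycle filter.

```rust
/// Lifecycle states named in 1.3.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lifecycle {
    Proposed,
    Approved,
}

/// Hypothetical, simplified view of a stored assertion.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub timestamp_ms: u64,
    pub lifecycle: Lifecycle,
}

/// Returns the most recent candidate for subject+predicate, optionally
/// restricted to a single lifecycle state.
pub fn resolve_recency<'a>(
    candidates: &'a [Candidate],
    subject: &str,
    predicate: &str,
    lifecycle: Option<Lifecycle>,
) -> Option<&'a Candidate> {
    candidates
        .iter()
        .filter(|c| c.subject == subject && c.predicate == predicate)
        // With no lifecycle filter, every candidate passes.
        .filter(|c| lifecycle.map_or(true, |wanted| c.lifecycle == wanted))
        .max_by_key(|c| c.timestamp_ms)
}
```

With a newer Proposed assertion and an older Approved one for the same fact, an unfiltered query returns the Proposed value, while a query with lifecycle=Approved returns the older Approved value. That difference is the behavior 1.3 checks.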


Arena 2: Voting & Consensus (Exercises Phase 2 VoteStore) COMPLETE

Goal: Simulate agents voting on assertions and resolving via VoteAwareConsensusLens.

Depends on: Arena 1 complete, Phase 2 VoteStore
Aligns with: roadmap.md Phase 2 "The Ballot Box"

  • 2.1 Add Vote Creation to Agents

    • Agent::vote(assertion_hash, weight) method.
    • Votes stored directly via VoteStore (bypasses WAL for now - see note below).
  • 2.2 Conflicting Assertions with Votes

    • Scientist_Alpha asserts "Protein_X binds Receptor_Y" (confidence 0.8).
    • Scientist_Beta asserts "Protein_X binds Receptor_Z" (confidence 0.8).
    • Alpha votes for own assertion (weight 1.0).
    • Beta votes for own assertion (weight 1.0).
    • Third agent (Believer) votes for Alpha's assertion.
    • Query with lens=VoteAwareConsensus.
    • Verify Alpha's assertion wins (2 votes vs 1).
  • 2.3 Troll Vote Resistance

    • Troll creates low-confidence assertion contradicting consensus.
    • Troll votes for own assertion.
    • Verify high-vote assertions still win.

Exit Criteria: Vote-based consensus correctly resolves conflicts.
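
The tally behind scenario 2.2 can be sketched as a weighted sum per assertion. The Vote struct, the string ids, and the tie-break rule are assumptions made for illustration; the real VoteAwareConsensusLens may weigh or break ties differently.

```rust
use std::collections::HashMap;

/// Hypothetical vote record: which assertion it supports and with what weight.
pub struct Vote<'a> {
    pub assertion: &'a str,
    pub weight: f32,
}

/// Sums vote weights per assertion and returns the id with the highest total.
/// Ties go to the lexicographically smaller id so the result is deterministic.
pub fn vote_winner<'a>(votes: &[Vote<'a>]) -> Option<&'a str> {
    let mut totals: HashMap<&'a str, f32> = HashMap::new();
    for vote in votes {
        *totals.entry(vote.assertion).or_insert(0.0) += vote.weight;
    }
    totals
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1).then_with(|| b.0.cmp(a.0)))
        .map(|(id, _)| id)
}
```

Feeding it Alpha's and the Believer's votes for one assertion and Beta's vote for the other yields Alpha's assertion, matching the "2 votes vs 1" expectation in 2.2.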


Arena 2.5: Hardening (Critical Gap Remediation) 🔴 PRIORITY

Goal: Fix critical gaps discovered during Arena 0-2 review before adding more features.

Depends on: Arena 2 complete
Blocks: Arena 3+ (don't add features on a shaky foundation)
Rationale: Gap analysis revealed a production readiness score of 58/100 with 3 critical blockers.

  • 2.5.1 Fix Vote Cache Race Condition (P0 - CRITICAL)

    • VoteStore put_vote() uses read-modify-write on VC:/VW: keys
    • Two concurrent calls can lose updates (final count = N+1 instead of N+2)
    • Solution: Add atomic increment or compare-and-swap operation
    • Add test: 100 concurrent put_vote() calls, verify final count
    • File: crates/stemedb-storage/src/vote_store.rs:181-206
  • 2.5.2 Add API Integration Tests (P0 - CRITICAL)

    • Create crates/stemedb-api/tests/http_integration.rs
    • Test POST /assertions - create assertion via HTTP
    • Test POST /votes - submit vote via HTTP
    • Test GET /query - query with lens parameter
    • Test error responses (400 Bad Request, 500 Internal Server Error)
    • Test rate limiting via QuotaStore middleware
    • Gap: Entire HTTP layer is currently untested
  • 2.5.3 Add Crash Recovery Test (P0 - CRITICAL)

    • Write assertions to WAL
    • Kill IngestWorker mid-step (simulate crash)
    • Restart IngestWorker with same WAL + KV store
    • Verify: cursor resumes correctly, no duplicate ingestion
    • Verify: all pre-crash data is recoverable
    • Validates: Durability claims in architecture.md
  • 2.5.4 Add Input Validation (P1 - HIGH)

    • Max subject length: 1024 characters
    • Max predicate length: 1024 characters
    • Confidence range: 0.0 to 1.0, reject NaN/Inf
    • Vote weight: non-negative, reject NaN/Inf
    • Timestamp: reject values > current time + 1 hour (clock skew protection)
    • Add validation in IngestWorker::ingest_assertion() and ingest_vote()
    • File: crates/stemedb-ingest/src/worker.rs
  • 2.5.5 Replace Sleep Timers with Ingestion Sync (P1 - HIGH)

    • Current: tokio::time::sleep(Duration::from_millis(500)) (flaky)
    • Add: wait_for_ingestion(store, expected_count, timeout) helper
    • Poll store until expected assertions exist or timeout
    • Replace all hardcoded sleeps in simulation
    • Benefit: Faster tests, deterministic behavior
  • 2.5.6 Fix Defensive Error Handling (P2 - MEDIUM)

    • vote_aware_consensus.rs:99-102: Returns [0u8; 32] on serialization failure
    • Change to propagate error or skip candidate with warning
    • worker.rs:161-173: Ambiguous EOF handling treats all I/O errors as "no data"
    • Distinguish true EOF from transient errors
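
The lost-update problem in 2.5.1 and the proposed concurrent test can be illustrated with an in-memory counter. This sketch assumes nothing about the real VoteStore or its KV backend: VoteCounter and concurrent_votes are hypothetical, and std threads stand in for whatever runtime the store uses. It only shows why a single atomic increment cannot lose updates where a separate read followed by a write can.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Stand-in for the cached vote count behind a VC: key (hypothetical type).
#[derive(Default)]
pub struct VoteCounter {
    count: AtomicU64,
}

impl VoteCounter {
    /// Atomic increment: the read and the write are one indivisible step,
    /// so two concurrent callers can never overwrite each other's update.
    pub fn record_vote(&self) -> u64 {
        self.count.fetch_add(1, Ordering::SeqCst) + 1
    }

    pub fn count(&self) -> u64 {
        self.count.load(Ordering::SeqCst)
    }
}

/// Spawns `n` threads that each record one vote and returns the final count.
pub fn concurrent_votes(n: usize) -> u64 {
    let counter = Arc::new(VoteCounter::default());
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                counter.record_vote();
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("vote thread panicked");
    }
    counter.count()
}
```

For a persistent KV store the equivalent would be a compare-and-swap loop or a store-level merge operation; which of those the backend supports is not stated here.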

Exit Criteria:

  • Vote cache is atomic (concurrent test passes)
  • API layer has integration tests (POST/GET work via HTTP)
  • Crash recovery is verified (no data loss on restart)
  • Input validation rejects malformed data
  • No hardcoded sleep timers in simulation
  • Production readiness score: 75+ (up from 58)
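
The limits listed in 2.5.4 translate into checks like the following. The limits themselves come from the list above; the function names, the ValidationError enum, and the use of SystemTime for timestamps are assumptions, and the real checks would live in IngestWorker.

```rust
use std::time::{Duration, SystemTime};

/// Limits taken from 2.5.4.
pub const MAX_FIELD_CHARS: usize = 1024;
pub const MAX_CLOCK_SKEW: Duration = Duration::from_secs(3600);

/// Hypothetical error type for rejected input.
#[derive(Debug, PartialEq)]
pub enum ValidationError {
    SubjectTooLong,
    PredicateTooLong,
    BadConfidence,
    TimestampInFuture,
}

/// Validates one assertion; `now` is passed in so tests stay deterministic.
pub fn validate_assertion(
    subject: &str,
    predicate: &str,
    confidence: f32,
    timestamp: SystemTime,
    now: SystemTime,
) -> Result<(), ValidationError> {
    if subject.chars().count() > MAX_FIELD_CHARS {
        return Err(ValidationError::SubjectTooLong);
    }
    if predicate.chars().count() > MAX_FIELD_CHARS {
        return Err(ValidationError::PredicateTooLong);
    }
    // NaN fails both comparisons and infinities fall outside the range,
    // so this single check rejects NaN, +Inf, and -Inf as well.
    if !(confidence >= 0.0 && confidence <= 1.0) {
        return Err(ValidationError::BadConfidence);
    }
    if timestamp > now + MAX_CLOCK_SKEW {
        return Err(ValidationError::TimestampInFuture);
    }
    Ok(())
}

/// Vote weights must be finite and non-negative.
pub fn validate_vote_weight(weight: f32) -> bool {
    weight.is_finite() && weight >= 0.0
}
```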

Estimated Effort: 5-6 days with 1 engineer
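
The wait_for_ingestion helper from 2.5.5 can be sketched as a bounded polling loop. This version is synchronous and uses std::thread::sleep so it stays dependency-free; the simulator itself is async, so a real helper would presumably use the async equivalents. The closure-based signature and the 5 ms poll interval are assumptions.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `current_count` until it reports at least `expected` items or
/// `timeout` elapses. Returns Ok(count) on success and Err(last_seen)
/// on timeout.
pub fn wait_for_ingestion<F>(
    mut current_count: F,
    expected: usize,
    timeout: Duration,
) -> Result<usize, usize>
where
    F: FnMut() -> usize,
{
    let deadline = Instant::now() + timeout;
    loop {
        let seen = current_count();
        if seen >= expected {
            return Ok(seen);
        }
        if Instant::now() >= deadline {
            return Err(seen);
        }
        // Short poll interval (assumed value): returns as soon as ingestion
        // catches up instead of always paying a fixed 500 ms sleep.
        thread::sleep(Duration::from_millis(5));
    }
}
```

Compared with a fixed sleep, the test finishes as soon as the expected count is visible and fails with the last observed count when ingestion stalls.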


Arena 3: Materialized Views (Exercises Phase 2 Materializer)

Goal: Verify fast-path MV reads work under simulation load.

Depends on: Arena 2.5 complete, Phase 2 Materializer
Aligns with: roadmap.md Phase 2 "Materializer"

  • 3.1 Materializer Integration

    • Spin up Materializer alongside Ingestor.
    • Wire Notify between IngestWorker and Materializer.
    • After ingestion, verify MV keys exist in store.
  • 3.2 Fast-Path Verification

    • Query via QueryEngine with subject+predicate.
    • Log whether fast-path or slow-path was used (add debug output).
    • Verify MV winner matches slow-path result.
  • 3.3 MV Freshness Under Load

    • Write 10 assertions in rapid succession.
    • Wait for materialization.
    • Verify MV reflects latest state.
    • Aligns with: Phase 2.5 "MV Staleness Detection"

Exit Criteria: Fast-path queries return correct results under load.
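
The fast-path check in 3.2 amounts to comparing a cached winner against a full scan. The sketch below is a toy model under stated assumptions: Store, Row, Path, and the recency-based winner rule are hypothetical and say nothing about how the real Materializer keys or refreshes its views.

```rust
use std::collections::HashMap;

/// Which read path served a query.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Path {
    Fast,
    Slow,
}

/// Hypothetical stored assertion.
pub struct Row {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub timestamp: u64,
}

/// Toy store: an append-only log plus materialized winners per key.
#[derive(Default)]
pub struct Store {
    log: Vec<Row>,
    views: HashMap<(String, String), String>,
}

impl Store {
    pub fn write(&mut self, subject: &str, predicate: &str, object: &str, timestamp: u64) {
        self.log.push(Row {
            subject: subject.to_string(),
            predicate: predicate.to_string(),
            object: object.to_string(),
            timestamp,
        });
    }

    /// Slow path: scan the log and pick the most recent match.
    pub fn slow_path(&self, subject: &str, predicate: &str) -> Option<String> {
        self.log
            .iter()
            .filter(|r| r.subject == subject && r.predicate == predicate)
            .max_by_key(|r| r.timestamp)
            .map(|r| r.object.clone())
    }

    /// Recomputes and caches the winner for one key.
    pub fn materialize(&mut self, subject: &str, predicate: &str) {
        if let Some(winner) = self.slow_path(subject, predicate) {
            self.views.insert((subject.to_string(), predicate.to_string()), winner);
        }
    }

    /// Serves from the view when present, otherwise falls back to the scan.
    pub fn query(&self, subject: &str, predicate: &str) -> (Option<String>, Path) {
        let key = (subject.to_string(), predicate.to_string());
        match self.views.get(&key) {
            Some(winner) => (Some(winner.clone()), Path::Fast),
            None => (self.slow_path(subject, predicate), Path::Slow),
        }
    }
}
```

In this model a write after materialize() leaves the view stale until the key is materialized again, which is the situation 3.3 probes.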


Arena 4: Agent Personas (First Strategy Differentiation)

Goal: Agents behave differently based on persona. No longer uniform.

Depends on: Arena 3 complete
Aligns with: Vision document "The Players"

  • 4.1 AgentStrategy Trait

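    /// Decision policy for one simulated agent persona.
    trait AgentStrategy {
        /// Chooses the agent's next action from the current world state.
        fn decide_action(&self, state: &WorldState) -> AgentAction;
        /// Confidence attached to this persona's assertions.
        fn base_confidence(&self) -> f32;
        /// Persona label used in logs and per-strategy metrics.
        fn name(&self) -> &'static str;
    }
    
    /// What an agent can do in a single tick.
    enum AgentAction {
        Assert { subject: String, predicate: String, object: ObjectValue },
        Vote { assertion_hash: Hash, weight: f32 },
        Query { subject: String, predicate: String },
        Skip,
    }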
    
  • 4.2 Scientist Strategy

    • High base confidence (0.85-0.95).
    • Reads from ground truth, asserts facts.
    • Votes for assertions that match ground truth.
  • 4.3 Troll Strategy

    • Low base confidence (0.3-0.5).
    • Contradicts existing assertions.
    • Votes against high-reputation assertions.
  • 4.4 Believer Strategy

    • Medium base confidence.
    • Queries for consensus, votes for winners.
    • Amplifies existing consensus.
  • 4.5 Strategy-Driven Tick Loop

    • Each tick: iterate agents, call decide_action(), execute.
    • Track per-strategy metrics (assertions created, votes cast).

Exit Criteria: Different agent types produce different behaviors in logs.
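
A compilable sketch of two personas and the tick loop from 4.5 follows. The trait and the AgentAction enum repeat 4.1; everything else is an assumption for illustration: Hash, ObjectValue, and WorldState are placeholder definitions, the fixed confidences are single values picked from the ranges in 4.2 and 4.3, and the Troll only contradicts (it casts no votes here).

```rust
// Placeholder types: assumptions standing in for the real definitions.
pub type Hash = [u8; 32];

#[derive(Debug, Clone, PartialEq)]
pub enum ObjectValue {
    Text(String),
}

/// Hypothetical view of the world an agent sees on each tick.
pub struct WorldState {
    /// Current consensus winner, if any: (hash, subject, predicate, object).
    pub consensus: Option<(Hash, String, String, ObjectValue)>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum AgentAction {
    Assert { subject: String, predicate: String, object: ObjectValue },
    Vote { assertion_hash: Hash, weight: f32 },
    Query { subject: String, predicate: String },
    Skip,
}

pub trait AgentStrategy {
    fn decide_action(&self, state: &WorldState) -> AgentAction;
    fn base_confidence(&self) -> f32;
    fn name(&self) -> &'static str;
}

/// Asserts its ground-truth fact, or votes for the consensus once it matches.
pub struct Scientist {
    pub subject: String,
    pub predicate: String,
    pub truth: ObjectValue,
}

impl AgentStrategy for Scientist {
    fn decide_action(&self, state: &WorldState) -> AgentAction {
        match &state.consensus {
            Some((hash, s, p, o))
                if s == &self.subject && p == &self.predicate && o == &self.truth =>
            {
                AgentAction::Vote { assertion_hash: *hash, weight: 1.0 }
            }
            _ => AgentAction::Assert {
                subject: self.subject.clone(),
                predicate: self.predicate.clone(),
                object: self.truth.clone(),
            },
        }
    }
    fn base_confidence(&self) -> f32 { 0.9 }
    fn name(&self) -> &'static str { "Scientist" }
}

/// Contradicts whatever the current consensus says; skips when there is none.
pub struct Troll;

impl AgentStrategy for Troll {
    fn decide_action(&self, state: &WorldState) -> AgentAction {
        match &state.consensus {
            Some((_, s, p, _)) => AgentAction::Assert {
                subject: s.clone(),
                predicate: p.clone(),
                object: ObjectValue::Text("Nothing".to_string()),
            },
            None => AgentAction::Skip,
        }
    }
    fn base_confidence(&self) -> f32 { 0.4 }
    fn name(&self) -> &'static str { "Troll" }
}

/// One tick: ask every agent for its action, in order.
pub fn tick(
    agents: &[Box<dyn AgentStrategy>],
    state: &WorldState,
) -> Vec<(&'static str, AgentAction)> {
    agents.iter().map(|a| (a.name(), a.decide_action(state))).collect()
}
```

Pairing each action with the persona name gives the per-strategy metrics in 4.5 something to group by.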


Arena 5: TrustRank Integration (Exercises Phase 4 Foundation)

Goal: Reputation updates based on agent behavior.

Depends on: Arena 4 complete, TrustRank implemented
Aligns with: roadmap.md Phase 4 "TrustRank Engine"

  • 5.1 Initialize TrustRank for Agents

    • Each agent starts with base TrustRank (e.g., 0.5).
    • Store in TrustRankStore at simulation start.
  • 5.2 Reputation Adjustment After Votes

    • When an assertion gains votes, increase author's TrustRank.
    • When an assertion is contradicted by consensus, decrease author's TrustRank.
    • Use TrustRankStore::record_outcome().
  • 5.3 TrustAwareAuthorityLens Verification

    • Two assertions from different agents, same confidence.
    • Agent with higher TrustRank should win via TrustAwareAuthorityLens.
    • Use Case Alignment: "Expert vs. junior weighting" from Agile Agent Team.
  • 5.4 Troll Reputation Decay

    • After 100 ticks, verify Troll's TrustRank has decreased.
    • Verify Scientist's TrustRank has increased.
    • Success Criteria: "Trust clusters form naturally without hardcoded rules."

Exit Criteria: TrustRank diverges based on behavior; Troll reputation tanks.
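
The divergence expected in 5.4 can be illustrated with a simple update rule. This is not the TrustRankStore implementation: the TrustRanks type, the record_outcome signature, and the exponential step toward 0.0 or 1.0 are all assumptions, chosen because the rule keeps ranks inside [0, 1] without explicit clamping.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory reputation table.
pub struct TrustRanks {
    ranks: HashMap<String, f32>,
    /// Fraction of the remaining distance covered per outcome, in [0, 1].
    step: f32,
}

impl TrustRanks {
    /// Starting rank for unseen agents (the 0.5 example from 5.1).
    pub const BASE: f32 = 0.5;

    pub fn new(step: f32) -> Self {
        Self { ranks: HashMap::new(), step }
    }

    pub fn rank(&self, agent: &str) -> f32 {
        self.ranks.get(agent).copied().unwrap_or(Self::BASE)
    }

    /// Moves the rank toward 1.0 when the agent's assertion was upheld by
    /// consensus and toward 0.0 when it was contradicted.
    pub fn record_outcome(&mut self, agent: &str, upheld: bool) -> f32 {
        let current = self.rank(agent);
        let target = if upheld { 1.0 } else { 0.0 };
        let next = current + self.step * (target - current);
        self.ranks.insert(agent.to_string(), next);
        next
    }
}
```

With a step of 0.05, 100 contradicted outcomes take a rank from 0.5 to roughly 0.003, and 100 upheld outcomes take it to roughly 0.997, so the Troll and Scientist end up far apart.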


Arena 6: Concurrent Agents (Performance Validation)

Goal: Move from sequential to parallel agent execution.

Depends on: Arena 5 complete
Aligns with: Vision "1000 concurrent agents without locking"

  • 6.1 Tokio Task Per Agent

    • Wrap each agent's tick in tokio::spawn().
    • Use Arc<Mutex<Journal>> for WAL access (already in place).
    • Run 10 agents concurrently.
  • 6.2 Scale to 100 Agents

    • Parameterize agent count.
    • Run with 100 agents for 50 ticks.
    • Verify no deadlocks, no data corruption.
  • 6.3 Contention Metrics

    • Add timing around WAL lock acquisition.
    • Log P50/P99 latencies.
    • Identify bottlenecks if any.
  • 6.4 Target: 1000 Agents

    • Run with 1000 agents (stretch goal).
    • May require connection pooling or batching.
    • Document findings.

Exit Criteria: 100 agents run concurrently without errors.
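
The contention measurement in 6.3 can be sketched with std threads and a mutex-guarded journal. Note the substitutions: std::thread::spawn stands in for tokio::spawn, a Vec<u64> stands in for the Journal, and percentile uses the nearest-rank method. All function names are hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Nearest-rank percentile over an unsorted sample; `p` is in (0, 100].
pub fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// Runs `agents` threads that each append `writes` entries to a shared
/// journal. Returns the final journal length and every lock-wait sample.
pub fn run_agents(agents: usize, writes: usize) -> (usize, Vec<Duration>) {
    let journal = Arc::new(Mutex::new(Vec::<u64>::new()));
    let handles: Vec<_> = (0..agents)
        .map(|id| {
            let journal = Arc::clone(&journal);
            thread::spawn(move || {
                let mut waits = Vec::with_capacity(writes);
                for _ in 0..writes {
                    let start = Instant::now();
                    let mut guard = journal.lock().expect("journal lock poisoned");
                    // Time spent waiting for the lock, not holding it.
                    waits.push(start.elapsed());
                    guard.push(id as u64);
                }
                waits
            })
        })
        .collect();
    let mut all_waits = Vec::new();
    for handle in handles {
        all_waits.extend(handle.join().expect("agent thread panicked"));
    }
    let len = journal.lock().expect("journal lock poisoned").len();
    (len, all_waits)
}
```

Calling percentile(&waits, 50.0) and percentile(&waits, 99.0) on the returned samples gives the P50/P99 figures to log; a journal length equal to agents × writes confirms that no write was lost.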


Arena 7: Time-Travel & Epochs (Exercises Phase 3 Features)

Goal: Validate temporal queries and epoch supersession.

Depends on: Arena 6 complete, Phase 3 Time-Travel + EpochAwareLens
Aligns with: roadmap.md Phase 3 "Time-Travel Engine", Phase 2.5 "EpochAwareLens"

  • 7.1 Time-Travel Query Verification

    • At tick 50, record timestamp T1.
    • At tick 100, write a new assertion superseding an old one.
    • Query with as_of=T1.
    • Verify result reflects tick-50 state, not tick-100 state.
    • Use Case Alignment: "What was the state of knowledge at 9pm?"
  • 7.2 Epoch Creation and Supersession

    • Create epoch "v1" at tick 0.
    • Create epoch "v2" superseding "v1" at tick 50.
    • Assertions referencing "v1" should be filtered by EpochAwareLens.
    • Use Case Alignment: "Security team migrates from RS256 to ES256."
  • 7.3 Epoch Cascade Verification

    • Chain: v3 supersedes v2 supersedes v1.
    • Query with EpochAwareLens.
    • Only v3 assertions visible.

Exit Criteria: Historical queries and epoch filtering work correctly.
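
Both checks in this phase reduce to small filters, sketched here over in-memory data. The Versioned type, the superseded_by map, and all three function names are assumptions; the real as_of handling and EpochAwareLens may be structured differently.

```rust
use std::collections::HashMap;

/// Hypothetical assertion with a write time and an epoch label.
pub struct Versioned {
    pub object: String,
    pub written_at: u64,
    pub epoch: String,
}

/// Time travel: the latest assertion written at or before `as_of`.
pub fn query_as_of(history: &[Versioned], as_of: u64) -> Option<&Versioned> {
    history
        .iter()
        .filter(|v| v.written_at <= as_of)
        .max_by_key(|v| v.written_at)
}

/// Follows `superseded_by` links from `epoch` to the newest epoch in its
/// chain. The loop is bounded by the map size, so a cyclic map terminates.
pub fn current_epoch<'a>(
    superseded_by: &'a HashMap<String, String>,
    epoch: &'a str,
) -> &'a str {
    let mut current = epoch;
    for _ in 0..superseded_by.len() {
        match superseded_by.get(current) {
            Some(next) => current = next.as_str(),
            None => break,
        }
    }
    current
}

/// Epoch filter: keeps only assertions whose epoch has not been superseded.
pub fn visible_in_current_epoch<'a>(
    history: &'a [Versioned],
    superseded_by: &'a HashMap<String, String>,
) -> Vec<&'a Versioned> {
    history
        .iter()
        .filter(|v| current_epoch(superseded_by, &v.epoch) == v.epoch.as_str())
        .collect()
}
```

For the chain in 7.3 (v3 supersedes v2 supersedes v1), the map holds v1 → v2 and v2 → v3, so only assertions tagged v3 survive the filter.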


Arena 8: Skeptic & Conflict (Exercises Phase 3 Lenses)

Goal: Surface disagreement, measure consensus.

Depends on: Arena 7 complete, Phase 3 Skeptic Lens + Conflict Score
Aligns with: roadmap.md Phase 3C "Skeptic Lens", Phase 3A.2 "Conflict Score"

  • 8.1 High-Conflict Scenario

    • 3 Scientist agents assert conflicting values for same fact.
    • Each votes for own assertion.
    • Query with lens=Skeptic.
    • Verify conflict_score is high (> 0.5).
  • 8.2 Low-Conflict Scenario

    • 3 Scientists assert same value (agreement).
    • Query with lens=Skeptic.
    • Verify conflict_score is low (< 0.2).
  • 8.3 Skeptic Surfaces Outlier

    • Consensus is A, one dissenter says B.
    • Skeptic lens returns B (the controversial position).
    • Use Case Alignment: Financial Due Diligence "disagreement is the information."

Exit Criteria: Conflict score accurately reflects disagreement.
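
The thresholds in 8.1 and 8.2 can be checked against one possible definition of conflict_score. The formula below (one minus the leading value's share of total vote weight) is an assumption chosen for illustration, not the definition used by the Skeptic lens.

```rust
use std::collections::HashMap;

/// Hypothetical conflict score: 1 - (largest share of total vote weight).
/// 0.0 means full agreement; values near 1.0 mean weight is spread thinly
/// across many competing values.
pub fn conflict_score(votes: &[(&str, f32)]) -> f32 {
    let mut totals: HashMap<&str, f32> = HashMap::new();
    for &(value, weight) in votes {
        *totals.entry(value).or_insert(0.0) += weight;
    }
    let total: f32 = totals.values().sum();
    if total <= 0.0 {
        // No votes: treat as no measurable conflict.
        return 0.0;
    }
    let top = totals.values().copied().fold(0.0_f32, f32::max);
    1.0 - top / total
}
```

Under this definition three scientists each backing a different value score about 0.67 (above the 0.5 threshold), while three scientists backing the same value score 0.0 (below 0.2).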


Arena 9: Full Gameplay Loop (The Vision)

Goal: Run the complete vision scenario end-to-end.

Depends on: Arena 8 complete, all Phase 3 features
Aligns with: simulation-vision.md "The Gameplay Loop"

  • 9.1 Ground Truth Injection

    • Load ground truth from YAML config.
    • Scientists read ground truth, assert facts.
  • 9.2 The 5-Tick Scenario

    • Tick 1: Scientist asserts "Protein_X binds Receptor_Y".
    • Tick 2: Troll forks with "Protein_X binds Nothing".
    • Tick 3: Believer queries, votes for Scientist.
    • Tick 4: TrustRank updates (Scientist up, Troll down).
    • Tick 5: Verify consensus via lens.
  • 9.3 Extended Run (1000 Ticks)

    • Run full scenario for 1000 ticks.
    • Track metrics:
      • truth_convergence: % of facts matching ground truth.
      • reputation_distribution: Scientist vs Troll ranks.
      • fork_depth_max: Deepest contradiction chain.
  • 9.4 Success Criteria Verification

    • ✓ Truth survives: High-reputation assertions outlive spam.
    • ✓ Lenses work: Consensus lens filters Troll noise.
    • ✓ Performance: 1000 ticks complete in < 30 seconds.
    • ✓ Emergence: Trust clusters form naturally.

Exit Criteria: All 4 success criteria from vision document pass.
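
The truth_convergence metric from 9.3 can be computed as below. The function signature and the choice to report 100% for an empty ground truth are assumptions; keys would be whatever identifies a fact (for example a subject and predicate pair).

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Percentage of ground-truth facts whose resolved value matches.
/// Facts missing from `resolved` count as mismatches.
pub fn truth_convergence<K, V>(
    ground_truth: &HashMap<K, V>,
    resolved: &HashMap<K, V>,
) -> f64
where
    K: Eq + Hash,
    V: PartialEq,
{
    if ground_truth.is_empty() {
        // Nothing to get wrong (assumed convention).
        return 100.0;
    }
    let matching = ground_truth
        .iter()
        .filter(|(key, value)| resolved.get(*key) == Some(*value))
        .count();
    100.0 * matching as f64 / ground_truth.len() as f64
}
```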


Alignment with Main Roadmap

| Arena Phase | Exercises Roadmap Phase | Key Features Validated |
|---|---|---|
| Arena 0 | - | Test infrastructure |
| Arena 1 | Phase 2 | QueryEngine, Lenses, Lifecycle, Query Audit |
| Arena 2 | Phase 2 | VoteStore, VoteAwareConsensusLens |
| Arena 2.5 | - (Hardening) | Race conditions, API tests, crash recovery, input validation |
| Arena 3 | Phase 2 | Materializer, Fast-Path MV |
| Arena 4 | - | Agent differentiation (simulator-only) |
| Arena 5 | Phase 4 | TrustRank, TrustAwareAuthorityLens |
| Arena 6 | Phase 4 | Concurrency, Performance |
| Arena 7 | Phase 2.5 + Phase 3 | Time-Travel, Epochs, EpochAwareLens |
| Arena 8 | Phase 3 | Skeptic Lens, Conflict Score |
| Arena 9 | All | Full integration |

Alignment with Use Cases

| Use Case | Arena Phase That Validates It |
|---|---|
| Agile Agent Team | |
| - Lifecycle filtering | Arena 1.3 |
| - Query audit trail | Arena 1.4 |
| - Time-travel debugging | Arena 7.1 |
| - Expert weighting | Arena 5.3 |
| - Persistent learning | Arena 5.4 (TrustRank) |
| Financial Due Diligence | |
| - Conflict detection | Arena 8.1, 8.3 |
| - Time-travel | Arena 7.1 |
| - Epoch cascades | Arena 7.2, 7.3 |
| Consumer Health | |
| - Source-class hierarchy | Phase 3 dependency (not in Arena yet) |
| - Layered consensus | Phase 3 dependency |

Development Cadence

| Week | Focus | Deliverable |
|---|---|---|
| 1 | Arena 0 | CI-runnable simulation |
| 2 | Arena 1 | Query path verified |
| 3 | Arena 2 | Voting verified |
| 4 | Arena 2.5 | Hardening: race fix, API tests, crash recovery |
| 5 | Arena 3 | Materializer + MVs verified |
| 6 | Arena 4 | Agent personas differentiated |
| 7-8 | Arena 5-6 | TrustRank + concurrency |
| 9-10 | Arena 7-8 | Time-travel + Skeptic |
| 11-12 | Arena 9 | Full gameplay loop |

Metrics to Track

Once Arena 6+ is complete, export these to logs (and eventually Prometheus):

| Metric | Description | Success Target |
|---|---|---|
| truth_convergence | % of facts matching ground truth | > 95% |
| troll_reputation | Troll agent TrustRank at end | < 0.2 |
| scientist_reputation | Scientist agent TrustRank at end | > 0.8 |
| fork_depth_max | Deepest contradiction chain | < 10 |
| p99_write_latency_ms | Write path latency | < 10ms |
| p99_query_latency_ms | Query path latency | < 50ms |
| concurrent_agents | Max concurrent agents without errors | 1000 |

Non-Goals (Kept Simple)

These are explicitly out of scope for the Arena:

  • Prometheus/Grafana integration - Logs suffice for Phase 3.
  • YAML scenario config - Hardcoded scenarios are fine until Arena 9.
  • Full chaos injection (network partitions, node kills) - Basic crash recovery in 2.5; advanced chaos deferred to Phase 4+.
  • External agent frameworks (ADK-Go) - Simulator uses Rust agents.

Note: HTTP API testing was previously a non-goal but is now addressed in Arena 2.5.2 due to critical gap discovery.


Next Step

Arena 0, 1, and 2 are complete. Before proceeding to Arena 3, complete Arena 2.5: Hardening.

Priority Order for Arena 2.5:

  1. 2.5.1 Fix Vote Cache Race (Day 1) — Critical, can cause data corruption
  2. 2.5.2 API Integration Tests (Day 2-3) — Critical, HTTP layer is untested
  3. 2.5.3 Crash Recovery Test (Day 4) — Critical, validates durability claims
  4. 2.5.4 Input Validation (Day 5) — High, defensive hardening
  5. 2.5.5 Replace Sleep Timers (Day 5) — High, faster/deterministic tests
  6. 2.5.6 Fix Defensive Errors (Day 6) — Medium, better error handling

    # Verify Arena 0 + 1 + 2 still work:
    cargo test -p stemedb-sim

    # Binary also works:
    cargo run --bin stemedb-sim

After Arena 2.5 is complete (production readiness ≥ 75), proceed to Arena 3.1: Materializer Integration.