Add CRC32C checksums to WAL record format (v2), implement crash recovery with automatic truncation of corrupt records, add feature-gated group commit buffer for batched fsync under concurrent load, and implement log rotation via segment files with global offset addressing. Key changes: - Record format v2: [len:u32][crc32c:u32][blake3:32][payload:N] - recover_file() scans and truncates corrupt tail records - GroupCommitBuffer batches fsync via MPSC channel (tokio feature gate) - SegmentManager with binary search resolution and cursor-based cleanup - Journal::read() auto-refreshes segments on miss for writer/reader split - Split recovery.rs and key_codec.rs into directory modules for 500-line max Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
731 lines
22 KiB
Markdown
731 lines
22 KiB
Markdown
# Arena Simulation: Critical Gap Analysis
|
|
|
|
**Analyst:** BurntSushi (Rust Quality Engineer)
|
|
**Date:** 2026-02-01
|
|
**Subject:** Production Readiness Assessment of stemedb-sim Arena Tests
|
|
|
|
---
|
|
|
|
## Executive Summary
|
|
|
|
The Arena simulation tests fundamental write→read paths through the system, but exhibits **significant gaps** between simulation coverage and real-world production scenarios. While the simulation successfully validates basic pipeline connectivity, it provides a **false sense of security** regarding system robustness.
|
|
|
|
**Production Readiness Score: 58/100**
|
|
|
|
| Category | Score | Status |
|
|
|----------|-------|--------|
|
|
| Basic Integration Coverage | 85/100 | ✅ Good |
|
|
| Failure Mode Testing | 30/100 | ❌ Critical Gap |
|
|
| Performance Validation | 20/100 | ❌ Critical Gap |
|
|
| API Integration | 0/100 | ❌ Not Tested |
|
|
| Concurrency Testing | 15/100 | ❌ Critical Gap |
|
|
|
|
---
|
|
|
|
## What's Actually Being Tested
|
|
|
|
### Arena 0: Spine (WAL → Ingestor → KVStore)
|
|
|
|
**Code Path Exercised:**
|
|
```
|
|
Agent.sign_assertion() → write_assertion_to_wal() → Journal.append()
|
|
→ IngestWorker.step() → IngestWorker.ingest_assertion()
|
|
→ HybridStore.put() → IndexStore.add_to_indexes()
|
|
```
|
|
|
|
**What Works:**
|
|
- ✅ Basic serialization round-trip (`serialize_assertion` → `deserialize`)
|
|
- ✅ Ed25519 signature verification in `IngestWorker.verify_assertion_signatures()`
|
|
- ✅ Content-addressed storage (`H:{blake3_hash}`)
|
|
- ✅ Subject index updates (`S:{subject}`)
|
|
- ✅ Compound index updates (`SP:{subject}:{predicate}`)
|
|
|
|
**Confidence Rating: 90%** — The simulation accurately tests the happy path for assertion ingestion.
|
|
|
|
---
|
|
|
|
### Arena 1: Cortex (QueryEngine → Lenses)
|
|
|
|
**Code Path Exercised:**
|
|
```
|
|
QueryEngine.execute() → fetch_by_subject_predicate()
|
|
→ RecencyLens.resolve() / VoteAwareConsensusLens.resolve_async()
|
|
→ Resolution with winner selection
|
|
```
|
|
|
|
**What Works:**
|
|
- ✅ Query execution via compound index (`SP:{subject}:{predicate}`)
|
|
- ✅ Subject-only queries (`S:{subject}`)
|
|
- ✅ Lifecycle filtering in `Query.matches()`
|
|
- ✅ RecencyLens timestamp comparison
|
|
- ✅ Audit trail storage via `GenericAuditStore.put_audit()`
|
|
|
|
**Confidence Rating: 85%** — Query path is tested, but lacks edge cases.
|
|
|
|
---
|
|
|
|
### Arena 2: Voting & Consensus
|
|
|
|
**Code Path Exercised:**
|
|
```
|
|
Agent.vote() → write_vote_to_wal() → IngestWorker.ingest_vote()
|
|
→ VoteStore.put_vote() → VoteStore cache updates (VC:{hash}, VW:{hash})
|
|
→ VoteAwareConsensusLens.resolve_async() → Aggregate weight ranking
|
|
```
|
|
|
|
**What Works:**
|
|
- ✅ Vote serialization/deserialization
|
|
- ✅ Vote storage at `V:{assertion_hash}:{vote_hash}`
|
|
- ✅ Vote count cache increments (`VC:{assertion_hash}`)
|
|
- ✅ Aggregate weight cache updates (`VW:{assertion_hash}`)
|
|
- ✅ Winner selection based on highest aggregate weight
|
|
- ✅ Tiebreaker by timestamp
|
|
|
|
**Confidence Rating: 80%** — Vote mechanics work, but lacks adversarial testing.
|
|
|
|
---
|
|
|
|
## Critical Gaps: What's NOT Being Tested
|
|
|
|
### 1. Failure Modes (Severity: CRITICAL)
|
|
|
|
**Missing:**
|
|
- ❌ WAL corruption mid-write (partial record)
|
|
- ❌ Sled crashes during `put()` (fsync failure)
|
|
- ❌ Network partition (if distributed)
|
|
- ❌ OOM conditions during large scans
|
|
- ❌ Disk full during ingestion
|
|
- ❌ Malformed rkyv archives (corrupted length fields)
|
|
- ❌ Hash collisions (theoretical but untested)
|
|
|
|
**Why This Matters:**
|
|
The simulation calls `.expect()` on all storage operations, hiding errors:
|
|
```rust
|
|
// lib.rs:466
|
|
match write_assertion_to_wal(&journal, &assertion).await {
|
|
Ok(raw_bytes) => { /* success path */ }
|
|
Err(e) => {
|
|
result.errors.push(SimulationError { /* logged but not replayed */ });
|
|
}
|
|
}
|
|
```
|
|
|
|
**Problem:** Errors are collected but the simulation continues. No test verifies recovery behavior after failure.
|
|
|
|
**Recommendation:**
|
|
Create `arena-3-chaos.rs` that:
|
|
- Kills IngestWorker mid-step, restarts, verifies cursor recovery
|
|
- Corrupts WAL records, verifies quarantine behavior
|
|
- Fills disk to 100%, verifies graceful degradation
|
|
|
|
**Confidence Rating: 10%** — Failure mode coverage is nearly absent.
|
|
|
|
---
|
|
|
|
### 2. Concurrency & Race Conditions (Severity: CRITICAL)
|
|
|
|
**Missing:**
|
|
- ❌ Multiple IngestWorkers reading same WAL (race on cursor)
|
|
- ❌ Concurrent queries while ingestion is active
|
|
- ❌ Vote cache race conditions (VC/VW updates not atomic)
|
|
- ❌ Index corruption from concurrent writes
|
|
- ❌ Deadlock detection between Journal mutex and Store mutex
|
|
|
|
**Why This Matters:**
|
|
The simulation is **strictly sequential**:
|
|
```rust
|
|
// lib.rs:458 - Arena 0 tick loop is single-threaded
|
|
for tick in 0..config.tick_count {
|
|
let agent = &agents[tick % agents.len()];
|
|
let assertion = agent.sign_assertion(...);
|
|
match write_assertion_to_wal(&journal, &assertion).await { /* ... */ }
|
|
}
|
|
```
|
|
|
|
**Then waits 500ms:**
|
|
```rust
|
|
// lib.rs:486
|
|
tokio::time::sleep(std::time::Duration::from_millis(config.ingestion_wait_ms)).await;
|
|
```
|
|
|
|
**Problem:** No concurrent writes, no overlapping ingestion, no parallel queries.
|
|
|
|
**Real-World Scenario:**
|
|
- 1000 agents writing assertions simultaneously
|
|
- 10 queries executing while votes are being ingested
|
|
- Materializer updating MVs while QueryEngine reads them
|
|
|
|
**Recommendation:**
|
|
```rust
|
|
// Spawn 100 concurrent writers
|
|
let handles: Vec<_> = (0..100).map(|i| {
|
|
let journal = journal.clone();
|
|
tokio::spawn(async move {
|
|
for _ in 0..10 {
|
|
write_assertion_to_wal(&journal, &assertion).await?;
|
|
}
|
|
})
|
|
}).collect();
|
|
|
|
futures::future::join_all(handles).await;
|
|
```
|
|
|
|
**Confidence Rating: 15%** — Concurrency testing is effectively zero.
|
|
|
|
---
|
|
|
|
### 3. Performance Validation (Severity: HIGH)
|
|
|
|
**Missing:**
|
|
- ❌ Throughput measurement (assertions/sec, votes/sec)
|
|
- ❌ Latency percentiles (p50, p99, p99.9)
|
|
- ❌ Memory allocation profiling
|
|
- ❌ Index scan performance under load
|
|
- ❌ VoteStore cache hit rate
|
|
- ❌ Materialized view staleness impact
|
|
|
|
**Why This Matters:**
|
|
The simulation uses **hardcoded sleep timers**:
|
|
```rust
|
|
// lib.rs:486
|
|
tokio::time::sleep(std::time::Duration::from_millis(config.ingestion_wait_ms)).await;
|
|
```
|
|
|
|
**Problem:** No verification that 500ms is sufficient. If ingestion lags, tests pass but assertions are missing.
|
|
|
|
**Actual Check:**
|
|
```rust
|
|
// lib.rs:378
|
|
assert!(result.is_success(), "Simulation should succeed: {:?}", result.errors);
|
|
assert_eq!(result.assertions_written, 14);
|
|
assert_eq!(result.assertions_verified, 10); // ⚠️ Only 10 of 14 verified!
|
|
```
|
|
|
|
**Wait, what?** The test writes 14 assertions but only verifies 10. The 4 extras are from Arena 2 (vote consensus/troll resistance), which are verified through lens resolution, not direct verification loop.
|
|
|
|
**BUT:** This is fragile. If ingestion is slow, `assertions_verified` could be 0 and the test would fail for the wrong reason.
|
|
|
|
**Recommendation:**
|
|
- Add `IngestWorker.wait_until_offset(target)` to block until ingestion catches up
|
|
- Measure ingestion lag: `current_offset - wal_tail_offset`
|
|
- Fail if lag exceeds threshold
|
|
|
|
**Confidence Rating: 20%** — Performance is not measured, only assumed.
|
|
|
|
---
|
|
|
|
### 4. API Integration (Severity: CRITICAL)
|
|
|
|
**Missing:**
|
|
- ❌ HTTP API endpoints (`stemedb-api` crate is completely untested)
|
|
- ❌ JSON serialization/deserialization
|
|
- ❌ OpenAPI spec validation
|
|
- ❌ Rate limiting enforcement
|
|
- ❌ Authentication/authorization
|
|
- ❌ Quota enforcement
|
|
|
|
**Why This Matters:**
|
|
The simulation tests **internal Rust APIs**:
|
|
```rust
|
|
// lib.rs:335
|
|
async fn write_assertion_to_wal(journal: &Arc<Mutex<Journal>>, assertion: &Assertion)
|
|
-> Result<Vec<u8>, String>
|
|
```
|
|
|
|
**Real clients will use HTTP:**
|
|
```bash
|
|
curl -X POST http://localhost:8080/assertions \
|
|
-H "Content-Type: application/json" \
|
|
-d '{"subject": "Tesla", "predicate": "revenue", "object": 96.7}'
|
|
```
|
|
|
|
**Gap:** No test verifies:
|
|
- JSON schema compatibility
|
|
- HTTP error responses (400, 500)
|
|
- Content-Type negotiation
|
|
- CORS headers
|
|
- Request timeout handling
|
|
|
|
**Recommendation:**
|
|
Create `tests/api_integration.rs`:
|
|
```rust
|
|
#[tokio::test]
|
|
async fn test_post_assertion_via_http() {
|
|
let app = stemedb_api::create_app(store).await;
|
|
let response = app
|
|
.oneshot(Request::builder()
|
|
.method("POST")
|
|
.uri("/assertions")
|
|
.header("content-type", "application/json")
|
|
.body(Body::from(r#"{"subject":"Tesla","predicate":"revenue"}"#))
|
|
.unwrap())
|
|
.await
|
|
.unwrap();
|
|
|
|
assert_eq!(response.status(), StatusCode::CREATED);
|
|
}
|
|
```
|
|
|
|
**Confidence Rating: 0%** — API layer is completely untested.
|
|
|
|
---
|
|
|
|
### 5. Edge Cases & Adversarial Inputs (Severity: HIGH)
|
|
|
|
**Missing:**
|
|
- ❌ Subject/predicate strings with UTF-8 edge cases (e.g., emoji, RTL text, null bytes)
|
|
- ❌ Extremely long subject strings (DoS via memory exhaustion)
|
|
- ❌ Malicious signatures (wrong length, invalid curve points)
|
|
- ❌ Votes with negative weights (should be rejected)
|
|
- ❌ Assertions with `confidence > 1.0` or `< 0.0`
|
|
- ❌ Circular parent_hash references
|
|
- ❌ Timestamp overflow (year 2038 problem)
|
|
|
|
**Why This Matters:**
|
|
The simulation creates **well-formed data**:
|
|
```rust
|
|
// lib.rs:459
|
|
let agent = &agents[tick % agents.len()];
|
|
let assertion = agent.sign_assertion(
|
|
&format!("Entity_{}", tick), // ⚠️ Always ASCII alphanumeric
|
|
"has_property",
|
|
ObjectValue::Text(format!("Value_{}", tick)),
|
|
);
|
|
```
|
|
|
|
**Real-World Inputs:**
|
|
- `subject: "🚗 Tesla\x00Inc"` (emoji + null byte)
|
|
- `subject: "a".repeat(1_000_000)` (1MB string, DoS)
|
|
- `weight: -999.9` (negative vote weight)
|
|
- `confidence: f32::NAN` (NaN value)
|
|
|
|
**Current Code:**
|
|
```rust
|
|
// worker.rs:256 - Signature verification (GOOD)
|
|
if assertion.signatures.is_empty() {
|
|
return Err(IngestError::InvalidSignature("must have at least one signature".to_string()));
|
|
}
|
|
```
|
|
|
|
**But NO validation for:**
|
|
```rust
|
|
if assertion.subject.len() > MAX_SUBJECT_LEN {
|
|
return Err(IngestError::InvalidInput("subject too long"));
|
|
}
|
|
if assertion.confidence < 0.0 || assertion.confidence > 1.0 {
|
|
return Err(IngestError::InvalidInput("confidence out of range"));
|
|
}
|
|
```
|
|
|
|
**Recommendation:**
|
|
Add defensive validation in `IngestWorker::ingest_assertion()`:
|
|
```rust
|
|
const MAX_SUBJECT_LEN: usize = 1024;
|
|
const MAX_PREDICATE_LEN: usize = 1024;
|
|
|
|
fn validate_assertion(a: &Assertion) -> Result<()> {
|
|
if a.subject.len() > MAX_SUBJECT_LEN {
|
|
return Err(IngestError::InvalidInput("subject too long".into()));
|
|
}
|
|
if a.confidence < 0.0 || a.confidence > 1.0 || a.confidence.is_nan() {
|
|
return Err(IngestError::InvalidInput("invalid confidence".into()));
|
|
}
|
|
// ... more checks
|
|
Ok(())
|
|
}
|
|
```
|
|
|
|
**Confidence Rating: 25%** — Edge case coverage is minimal.
|
|
|
|
---
|
|
|
|
### 6. Test Isolation (Severity: MEDIUM)
|
|
|
|
**Current Approach:**
|
|
```rust
|
|
// lib.rs:401-405
|
|
let temp_wal_dir = tempfile::tempdir().map_err(...)?;
|
|
let temp_db_dir = tempfile::tempdir().map_err(...)?;
|
|
```
|
|
|
|
**Problem:** Tests are isolated, but **not hermetic**. The simulation:
|
|
- Uses real filesystem I/O
|
|
- Uses real Ed25519 crypto (slow)
|
|
- Uses real sleep timers (flaky)
|
|
|
|
**Impact on CI:**
|
|
- Tests are slow (500ms+ sleep per scenario)
|
|
- Tests can be flaky if disk is slow
|
|
- Parallel test execution may interfere (disk I/O contention)
|
|
|
|
**Recommendation:**
|
|
- Mock `KVStore` trait for unit tests (use `HashMap` backend)
|
|
- Mock `Journal` for ingestion tests
|
|
- Use `tokio::time::pause()` to control time advancement
|
|
|
|
**Confidence Rating: 70%** — Isolation is good, but not hermetic.
|
|
|
|
---
|
|
|
|
## Specific File:Line Issues
|
|
|
|
### `/Users/jordanwashburn/Workspace/orchard9/stemedb/crates/stemedb-sim/src/lib.rs`
|
|
|
|
#### Line 97-111: `SimulationResult::is_success()` Logic Flaw
|
|
```rust
|
|
pub fn is_success(&self) -> bool {
|
|
self.errors.is_empty()
|
|
&& self.recency_test_passed
|
|
&& self.lifecycle_test_passed
|
|
&& self.audit_test_passed
|
|
&& self.vote_consensus_test_passed
|
|
&& self.troll_resistance_test_passed
|
|
}
|
|
```
|
|
|
|
**Issue:** No check for `assertions_written == assertions_verified`.
|
|
|
|
**Why:** Comment claims "Arena 2 tests write additional test assertions that are verified through the VoteAwareConsensusLens rather than the main verification loop."
|
|
|
|
**Problem:** This is confusing and fragile. If ingestion fails silently, `assertions_verified` could be 0 but the test passes because `errors.is_empty()`.
|
|
|
|
**Recommendation:**
|
|
```rust
|
|
pub fn is_success(&self) -> bool {
|
|
let base_assertions_verified = self.assertions_verified >= (self.tick_count as u64);
|
|
base_assertions_verified
|
|
&& self.errors.is_empty()
|
|
&& /* ... rest of checks */
|
|
}
|
|
```
|
|
|
|
**Severity:** MEDIUM
|
|
|
|
---
|
|
|
|
#### Line 279-281: Signature Message Format Mismatch Risk
|
|
```rust
|
|
// lib.rs:281
|
|
let message = format!("{}:{}", subject, predicate);
|
|
let signature: Signature = self.signing_key.sign(message.as_bytes());
|
|
```
|
|
|
|
**vs. worker.rs:268:**
|
|
```rust
|
|
// worker.rs:268
|
|
let message = format!("{}:{}", assertion.subject, assertion.predicate);
|
|
```
|
|
|
|
**Issue:** The signed message format is **implicitly coupled** across files. If one changes (e.g., adding `|{timestamp}`), signatures break.
|
|
|
|
**Recommendation:**
|
|
Extract to `stemedb-core/src/types.rs`:
|
|
```rust
|
|
impl Assertion {
|
|
pub fn signing_message(&self) -> String {
|
|
format!("{}:{}", self.subject, self.predicate)
|
|
}
|
|
}
|
|
```
|
|
|
|
**Severity:** MEDIUM
|
|
|
|
---
|
|
|
|
#### Line 486: Hardcoded Sleep Timer
|
|
```rust
|
|
tokio::time::sleep(std::time::Duration::from_millis(config.ingestion_wait_ms)).await;
|
|
```
|
|
|
|
**Issue:** No verification that ingestion actually completed. If system is slow, test fails with "assertion not found" instead of "ingestion timeout".
|
|
|
|
**Recommendation:**
|
|
```rust
|
|
async fn wait_for_ingestion(journal: &Journal, store: &impl KVStore, expected_count: usize) -> Result<(), String> {
|
|
let deadline = tokio::time::Instant::now() + Duration::from_secs(5);
|
|
loop {
|
|
let count = store.scan_prefix(b"H:").await?.len();
|
|
if count >= expected_count {
|
|
return Ok(());
|
|
}
|
|
if tokio::time::Instant::now() > deadline {
|
|
return Err(format!("Timeout: only {}/{} assertions ingested", count, expected_count));
|
|
}
|
|
tokio::time::sleep(Duration::from_millis(10)).await;
|
|
}
|
|
}
|
|
```
|
|
|
|
**Severity:** MEDIUM
|
|
|
|
---
|
|
|
|
#### Line 696, 820, 1073, 1260: Duplicate `tokio::time::sleep()` Calls
|
|
**Pattern:**
|
|
```rust
|
|
tokio::time::sleep(std::time::Duration::from_millis(300)).await;
|
|
```
|
|
|
|
**Issue:** Magic number `300` repeated 4+ times. No justification for why 300ms is correct.
|
|
|
|
**Recommendation:**
|
|
```rust
|
|
const INGESTION_SETTLE_TIME_MS: u64 = 300;
|
|
// ... then use:
|
|
tokio::time::sleep(Duration::from_millis(INGESTION_SETTLE_TIME_MS)).await;
|
|
```
|
|
|
|
**Severity:** LOW
|
|
|
|
---
|
|
|
|
### `/Users/jordanwashburn/Workspace/orchard9/stemedb/crates/stemedb-ingest/src/worker.rs`
|
|
|
|
#### Line 158-174: EOF Handling Ambiguity
|
|
```rust
|
|
let record = {
|
|
let journal = self.journal.lock().await;
|
|
match journal.read(self.current_offset) {
|
|
Ok(record) => record,
|
|
Err(stemedb_wal::QuarantineError::Io { .. }) => {
|
|
return Ok(0); // ⚠️ Assuming EOF
|
|
}
|
|
Err(stemedb_wal::QuarantineError::IoGeneric(e))
|
|
if e.kind() == std::io::ErrorKind::UnexpectedEof => {
|
|
return Ok(0); // ⚠️ Assuming EOF
|
|
}
|
|
Err(e) => return Err(IngestError::Wal(e)),
|
|
}
|
|
};
|
|
```
|
|
|
|
**Issue:** Both `QuarantineError::Io` and `UnexpectedEof` return `Ok(0)`, which means "no new data". But `QuarantineError::Io` could also mean:
|
|
- Disk read error
|
|
- Network mount failure
|
|
- Permission denied
|
|
|
|
**Problem:** Transient I/O errors are silently ignored, treated as EOF.
|
|
|
|
**Recommendation:**
|
|
```rust
|
|
Err(stemedb_wal::QuarantineError::Io { source, .. }) => {
|
|
if source.kind() == std::io::ErrorKind::UnexpectedEof {
|
|
return Ok(0); // True EOF
|
|
}
|
|
warn!(error = %source, "Transient I/O error, retrying");
|
|
return Ok(0); // Retry on next tick
|
|
}
|
|
```
|
|
|
|
**Severity:** HIGH
|
|
|
|
---
|
|
|
|
#### Line 191-192: Vote Count Cache Race Condition
|
|
```rust
|
|
// worker.rs:181-192 (in put_vote)
|
|
let current_count = match self.store.get(&count_key).await? {
|
|
Some(bytes) if bytes.len() == 8 => {
|
|
let arr: [u8; 8] = bytes.try_into().map_err(...)?;
|
|
u64::from_le_bytes(arr)
|
|
}
|
|
_ => 0,
|
|
};
|
|
let new_count = current_count.saturating_add(1);
|
|
self.store.put(&count_key, &new_count.to_le_bytes()).await?;
|
|
```
|
|
|
|
**Issue:** This is a **read-modify-write** race. If two IngestWorkers call `put_vote` concurrently:
|
|
1. Worker A reads `count=5`, Worker B reads `count=5`
|
|
2. Worker A writes `count=6`, Worker B writes `count=6`
|
|
3. Final count is 6, but should be 7 (lost update)
|
|
|
|
**Simulation doesn't test this** because IngestWorker runs sequentially.
|
|
|
|
**Recommendation:**
|
|
Use a CAS (compare-and-swap) operation or atomic counter:
|
|
```rust
|
|
// Requires adding atomic counter support to KVStore trait
|
|
let new_count = self.store.increment_counter(&count_key, 1).await?;
|
|
```
|
|
|
|
**Severity:** CRITICAL (if multiple ingestors are ever deployed)
|
|
|
|
---
|
|
|
|
### `/Users/jordanwashburn/Workspace/orchard9/stemedb/crates/stemedb-storage/src/vote_store.rs`
|
|
|
|
#### Line 183-192: Same Race Condition as Above
|
|
**Same code, same issue.** VoteStore is used by IngestWorker, so this is the root cause.
|
|
|
|
**Severity:** CRITICAL
|
|
|
|
---
|
|
|
|
### `/Users/jordanwashburn/Workspace/orchard9/stemedb/crates/stemedb-lens/src/vote_aware_consensus.rs`
|
|
|
|
#### Line 99-102: Defensive Empty Hash on Serialization Failure
|
|
```rust
|
|
fn compute_assertion_hash(assertion: &Assertion) -> Hash {
|
|
let bytes = match stemedb_core::serde::serialize(assertion) {
|
|
Ok(b) => b,
|
|
Err(_) => return [0u8; 32], // ⚠️ Silent failure
|
|
};
|
|
*blake3::hash(&bytes).as_bytes()
|
|
}
|
|
```
|
|
|
|
**Issue:** If serialization fails, returns `[0u8; 32]` (all zeros). This could cause:
|
|
- All failures to hash to the same value
|
|
- Votes to be looked up at wrong key
|
|
- Silent data corruption
|
|
|
|
**Recommendation:**
|
|
```rust
|
|
fn compute_assertion_hash(assertion: &Assertion) -> Result<Hash, String> {
|
|
let bytes = stemedb_core::serde::serialize(assertion)
|
|
.map_err(|e| format!("Failed to serialize assertion: {}", e))?;
|
|
Ok(*blake3::hash(&bytes).as_bytes())
|
|
}
|
|
```
|
|
|
|
Then propagate error up:
|
|
```rust
|
|
// line 134
|
|
let assertion_hash = match Self::compute_assertion_hash(assertion) {
|
|
Ok(h) => h,
|
|
Err(e) => {
|
|
debug!(error = %e, "Failed to hash assertion, skipping");
|
|
continue;
|
|
}
|
|
};
|
|
```
|
|
|
|
**Severity:** HIGH
|
|
|
|
---
|
|
|
|
### `/Users/jordanwashburn/Workspace/orchard9/stemedb/crates/stemedb-query/src/engine.rs`
|
|
|
|
#### Line 117-159: Staleness Check Has Time Skew Risk
|
|
```rust
|
|
if let Some(max_stale) = query.max_stale {
|
|
let now = SystemTime::now().duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0);
|
|
let age = now.saturating_sub(view.materialized_at);
|
|
|
|
if age > max_stale {
|
|
debug!("Materialized view is stale, falling back to slow path");
|
|
return Ok(None);
|
|
}
|
|
}
|
|
```
|
|
|
|
**Issue:** If server clock is set backwards, `now < view.materialized_at`, and `saturating_sub` returns 0, making all MVs appear fresh.
|
|
|
|
**Recommendation:**
|
|
```rust
|
|
let age = match now.checked_sub(view.materialized_at) {
|
|
Some(age) => age,
|
|
None => {
|
|
warn!(now, materialized_at = view.materialized_at,
|
|
"Clock skew detected: MV timestamp is in the future, falling back");
|
|
return Ok(None); // Reject future-dated MVs
|
|
}
|
|
};
|
|
```
|
|
|
|
**Severity:** MEDIUM
|
|
|
|
---
|
|
|
|
## Summary of Gaps by Severity
|
|
|
|
| Severity | Count | Examples |
|
|
|----------|-------|----------|
|
|
| CRITICAL | 4 | Vote cache race, API untested, failure mode gaps, concurrency gaps |
|
|
| HIGH | 3 | EOF handling, defensive hash, edge case coverage |
|
|
| MEDIUM | 4 | Time skew, verification logic, signature coupling, hardcoded sleeps |
|
|
| LOW | 1 | Magic numbers |
|
|
|
|
---
|
|
|
|
## Recommendations by Priority
|
|
|
|
### Immediate (Before Production)
|
|
|
|
1. **Fix vote cache race condition** (worker.rs:191, vote_store.rs:183)
|
|
- Use atomic counters or CAS operations
|
|
- Add concurrency test: 100 agents voting simultaneously
|
|
|
|
2. **Add API integration tests** (missing entirely)
|
|
- Create `stemedb-api/tests/http_endpoints.rs`
|
|
- Test JSON serialization, error codes, rate limiting
|
|
|
|
3. **Add failure mode tests** (missing entirely)
|
|
- Crash recovery: kill IngestWorker mid-step, verify cursor recovery
|
|
- WAL corruption: inject malformed record, verify quarantine
|
|
- Disk full: verify graceful degradation
|
|
|
|
### Short-Term (Next Sprint)
|
|
|
|
4. **Add concurrency stress test**
|
|
- 1000 agents writing assertions concurrently
|
|
- 100 queries running while ingestion is active
|
|
- Measure throughput and p99 latency
|
|
|
|
5. **Add input validation**
|
|
- Max length checks on subject/predicate
|
|
- Confidence range validation (0.0 to 1.0)
|
|
- Weight validation (no negatives)
|
|
|
|
6. **Fix defensive error handling**
|
|
- Replace `[0u8; 32]` fallback with proper error propagation (vote_aware_consensus.rs:102)
|
|
- Distinguish EOF from transient I/O errors (worker.rs:163)
|
|
|
|
### Long-Term (Next Quarter)
|
|
|
|
7. **Add performance benchmarks**
|
|
- Measure assertions/sec throughput
|
|
- Measure p50/p99/p99.9 latency
|
|
- Measure memory allocation per operation
|
|
|
|
8. **Add property-based tests**
|
|
- Use `proptest` to generate random assertions
|
|
- Verify invariants: all writes are eventually readable, votes always increase counts
|
|
|
|
9. **Add observability**
|
|
- Export metrics (assertions ingested, queries executed, cache hit rate)
|
|
- Add distributed tracing (OpenTelemetry)
|
|
|
|
---
|
|
|
|
## Conclusion
|
|
|
|
The Arena simulation successfully validates **basic integration** of the write and read paths. However, it provides a **false sense of confidence** because it:
|
|
|
|
1. **Only tests happy paths** — no failures, no corruption, no adversarial inputs
|
|
2. **Only tests sequential execution** — no concurrency, no race conditions
|
|
3. **Only tests internal APIs** — HTTP layer is completely untested
|
|
4. **Only tests well-formed data** — no edge cases, no DoS attempts
|
|
5. **Uses hardcoded timers** — no verification of actual ingestion completion
|
|
|
|
**Production Readiness: 58/100**
|
|
|
|
To reach production readiness (90+), the system needs:
|
|
- Failure mode testing (crash recovery, corruption handling)
|
|
- Concurrency testing (race conditions, deadlocks)
|
|
- API integration testing (HTTP, JSON, error codes)
|
|
- Input validation (length limits, range checks)
|
|
- Performance benchmarks (throughput, latency)
|
|
|
|
**Critical Blockers:**
|
|
1. Fix vote cache race condition (CRITICAL)
|
|
2. Add API integration tests (CRITICAL)
|
|
3. Add failure mode tests (CRITICAL)
|
|
|
|
**Estimated Effort to 90/100:** 3-4 weeks with 1 engineer focused on testing.
|
|
|
|
---
|
|
|
|
**BurntSushi's Verdict:** Ship the current codebase to a **closed beta** or **internal staging** environment, but **not to production**. The vote cache race alone could cause silent data corruption under load. Fix the critical issues first, then open to external users.
|