# Chaos Testing (Phase 8A)

The `stemedb-chaos` crate provides infrastructure for testing Episteme distributed clusters under failure conditions.

## Overview

Chaos testing verifies that Episteme clusters:

- Continue accepting writes during network partitions
- Converge correctly after partition heals
- Handle node failures and recovery
- Maintain CRDT invariants under all conditions
- Handle clock skew correctly with HLC timestamps

## Components

### Test Harness

| Component | Purpose |
|-----------|---------|
| `ChaosNode` | Simulated cluster node with fault injection support |
| `TestCluster` | Manages N ChaosNodes with shared fault controllers |

### Fault Injection

| Controller | Capabilities |
|------------|--------------|
| `NetworkController` | Partitions, latency, message drops |
| `ClockController` | Clock skew injection for HLC testing |

### CRDT Property Verification

| Function | Verifies |
|----------|----------|
| `verify_commutativity()` | `merge(A, B) = merge(B, A)` |
| `verify_associativity()` | `(A merge B) merge C = A merge (B merge C)` |
| `verify_idempotence()` | `merge(A, A) = A` |

## Running Chaos Tests

```bash
# All chaos tests
cargo test -p stemedb-chaos

# Partition tests only
cargo test -p stemedb-chaos --test partition_tests

# Consistency tests only
cargo test -p stemedb-chaos --test consistency_tests

# Unit tests only
cargo test -p stemedb-chaos --lib
```

## Test Categories

### Partition Tests (8 tests)

| Test | Scenario |
|------|----------|
| `test_5_node_kill_2_convergence` | 5-node cluster survives 2 node failures |
| `test_partition_between_groups_convergence` | [0,1,2] vs [3,4] partition and heal |
| `test_message_reordering_convergence` | 100 writes in random order converge |
| `test_message_duplication_idempotent` | Repeated syncs don't create duplicates |
| `test_cascading_failure_recovery` | Sequential node failures and recovery |
| `test_swim_suspicion_not_false_positive` | Slow node marked Suspect, then Alive |
| `test_asymmetric_partition` | One-way partition (0→1 works, 1→0 blocked) |
| `test_write_availability_during_partition` | All nodes can write when fully partitioned |

### Consistency Tests (11 tests)

| Test | Scenario |
|------|----------|
| `test_crdt_eventual_consistency` | 1000 concurrent writes across 5 nodes |
| `test_crdt_commutativity` | Different merge orders produce same result |
| `test_crdt_associativity` | Merge grouping doesn't affect result |
| `test_crdt_idempotence` | Syncing same data repeatedly is stable |
| `test_hlc_handles_clock_skew` | ±5 second skew still converges |
| `test_hlc_monotonic_under_partition` | HLC remains monotonic during partition |
| `test_supersession_ordering_with_clock_skew` | HLC ordering with 2s skew |
| `test_concurrent_writes_same_subject_under_partition` | Both writes survive (append-only) |
| `test_large_merkle_diff_eventual_convergence` | 1500 vs 500 assertions converge |
| `test_all_crdt_properties` | Property-based verification |
| `test_eventual_consistency_property` | Eventual consistency verification |

## Example Usage

### Basic Cluster Test

```rust
use stemedb_chaos::TestCluster;

#[tokio::test]
async fn test_basic_convergence() {
    let mut cluster = TestCluster::spawn(3).await.expect("spawn");

    // Write to node 0
    cluster.get_node_mut(0)
        .write_assertion("subject", "pred", 1000)
        .await.expect("write");

    // Sync all nodes
    cluster.sync_all().await.expect("sync");

    // Verify convergence
    cluster.assert_converged();
}
```

### Partition Testing

```rust
use stemedb_chaos::TestCluster;

#[tokio::test]
async fn test_partition() {
    let mut cluster = TestCluster::spawn(4).await.expect("spawn");

    // Create partition: [0,1] vs [2,3]
    cluster.network().partition(&[0, 1], &[2, 3]);

    // Write to both sides
    cluster.get_node_mut(0).write_assertion("a", "pred", 1000).await.expect("write");
    cluster.get_node_mut(2).write_assertion("b", "pred", 2000).await.expect("write");

    // Heal and sync
    cluster.network().heal();
    cluster.sync_all().await.expect("sync");

    // Both writes survive
    cluster.assert_converged();
    assert_eq!(cluster.get_node(0).assertion_count(), 2);
}
```

### Clock Skew Testing

```rust
use stemedb_chaos::TestCluster;

#[tokio::test]
async fn test_clock_skew() {
    let mut cluster = TestCluster::spawn(2).await.expect("spawn");

    // Inject +5 second skew on node 0
    cluster.clock().inject_skew(0, 5000);

    // Verify skew is detected
    assert!(cluster.clock().has_significant_skew(0, 1));

    // Write with skewed timestamps
    cluster.get_node_mut(0).write_assertion("skewed", "pred", 1000).await.expect("write");

    // Cluster still converges
    cluster.sync_all().await.expect("sync");
    cluster.assert_converged();
}
```

## Architecture

```
TestCluster
├── nodes: Vec<ChaosNode>
├── network: Arc<NetworkController>
└── clock: Arc<ClockController>

ChaosNode
├── crdt_store: CrdtAssertionStore
├── merkle_tree: MerkleTree
├── hash_to_data: HashMap
├── hlc: SkewedHlc (respects ClockController)
└── alive: bool (kill/revive simulation)

NetworkController
├── partitions: DashMap<(from, to), bool>
├── latencies: DashMap<(from, to), Duration>
└── drop_rates: DashMap<(from, to), f64>

ClockController
├── node_offsets: DashMap
└── global_offset_ms: AtomicI64
```

## Design Decisions

### Channel-Based vs iptables/tc

**Chosen: Channel-based interception**

- Aligns with existing `SimNode` pattern in `partition_tolerance.rs`
- Deterministic and CI-friendly (no elevated privileges)
- Production code stays unchanged
- Real network tests can be added later as optional e2e suite

### Sync Semantics

- `sync_from()` on ChaosNode checks partition state before syncing
- `sync_all()` on TestCluster does full mesh sync respecting partitions
- Content-addressed storage ensures idempotent merges

## Metrics

The controllers track:

- `messages_dropped`: Total messages dropped (partition + drop rate)
- `messages_delayed`: Total messages delayed (latency)
- `partition_events`: Number of partition operations

```rust
let summary = cluster.summary();
println!("Dropped: {}", summary.messages_dropped);
println!("Delayed: {}", summary.messages_delayed);
println!("Max skew: {}ms", summary.max_clock_skew_ms);
```

## Related Documentation

- [Architecture](../../architecture.md) - Overall system design
- [Distributed Write Path](../../docs/research/distributed-write-path.md) - CRDT replication
- [Phase 6 UAT](./phase6-uat.md) - Cluster coordination tests
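
As a companion to the CRDT Property Verification table, the three merge laws can be illustrated on a toy grow-only set. This is a minimal sketch, not the crate's code: `GSet`, `merge`, `gset`, and the `check_*` helpers are hypothetical names invented for illustration, and the real `verify_commutativity()`, `verify_associativity()`, and `verify_idempotence()` may take different arguments and operate on `CrdtAssertionStore` rather than a plain set.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a replicated assertion store: a grow-only set
/// of content hashes. This is NOT the crate's `CrdtAssertionStore`.
type GSet = BTreeSet<u64>;

/// Merge is plain set union, which is why the three laws below hold.
fn merge(a: &GSet, b: &GSet) -> GSet {
    a.union(b).copied().collect()
}

/// merge(A, B) = merge(B, A)
fn check_commutative(a: &GSet, b: &GSet) -> bool {
    merge(a, b) == merge(b, a)
}

/// (A merge B) merge C = A merge (B merge C)
fn check_associative(a: &GSet, b: &GSet, c: &GSet) -> bool {
    merge(&merge(a, b), c) == merge(a, &merge(b, c))
}

/// merge(A, A) = A
fn check_idempotent(a: &GSet) -> bool {
    merge(a, a) == *a
}

/// Convenience constructor for small example sets (illustrative only).
fn gset(items: &[u64]) -> GSet {
    items.iter().copied().collect()
}
```

Calling `check_commutative(&gset(&[1, 2]), &gset(&[2, 3]))` returns `true`, and the same holds for the other two checks on any inputs, since set union satisfies all three laws. A property-based test would feed randomly generated sets through the same checks instead of hand-picked ones.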