tidaldb/docs/planning/milestone-8/phase-6/task-04-performance-and-ci.md
jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs
2026-02-24 13:17:19 -07:00


Task 04: Performance Assertions + CI Integration

Delivers

Performance assertions added to m8_uat.rs that verify three budgets: cross-region replication latency < 2s at p99, failover < 10s, and reconciliation overhead < 100ms. A CI configuration that runs the M8 tests on every PR touching replication code, without flakiness. A benchmark in tidal/benches/replication.rs that measures sustained throughput against a 25K signals/sec target.

Complexity: S

Dependencies

  • Task 03 (UAT scenario tests)

Technical Design

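// tidal/tests/m8_uat.rs (additions)

// Standard-library imports used below (skip any already present from Task 03).
use std::sync::Arc;
use std::time::{Duration, Instant};
// SimulatedCluster, ShardCrash, NetworkPartition, RegionId, ShardId, EntityId,
// Timestamp and three_region_config() are assumed to be in scope already from
// the Task 03 scenario tests.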

/// Performance: cross-region replication latency < 2s p99.
///
/// Measures the latency from WAL write on leader to applied on follower.
/// Uses InProcessTransport (no real network). Asserts p99 < 2s.
#[tokio::test]
async fn perf_replication_latency_p99() {
    let cluster = SimulatedCluster::build(three_region_config()).await;

    let mut latencies_ns: Vec<u64> = Vec::with_capacity(1000);

    for i in 0u64..1000 {
        let item = EntityId::new(i);
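        // NOTE: inside tests/, `crate` is the test crate itself, so this assumes
        // m8_uat.rs declares its own `util` module exposing `now_ns()`.
        let before_ns = crate::util::now_ns();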

        cluster.write_signal("view", item, 1.0);

        // Wait until eu-west follower has applied this specific event.
        cluster.await_event_applied(RegionId(1), before_ns, Duration::from_secs(3)).await;

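        let after_ns = crate::util::now_ns();
        // saturating_sub avoids an underflow panic if the clock behind now_ns() steps backwards.
        latencies_ns.push(after_ns.saturating_sub(before_ns));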
    }

    latencies_ns.sort_unstable();
    // Integer arithmetic avoids float rounding; with 1000 samples this selects index 990.
    let p99_ns = latencies_ns[latencies_ns.len() * 99 / 100];
    let p99_ms = p99_ns / 1_000_000;

    assert!(
        p99_ms < 2000,
        "replication latency p99 = {}ms, must be < 2000ms (in-process transport overhead)",
        p99_ms
    );

    println!("Replication latency: p50={}ms p99={}ms",
        latencies_ns[latencies_ns.len() / 2] / 1_000_000,
        p99_ms,
    );
}

/// Performance: failover completes in < 10 seconds.
#[tokio::test]
async fn perf_failover_under_10s() {
    let cluster = Arc::new(SimulatedCluster::build(three_region_config()).await);

    let start = Instant::now();
    let _crash = ShardCrash::crash(ShardId(0), cluster.clone(), false).await;

    while !cluster.has_leader() {
        tokio::time::sleep(Duration::from_millis(50)).await;
        assert!(
            start.elapsed() < Duration::from_secs(10),
            "failover must complete within 10 seconds"
        );
    }

    let elapsed = start.elapsed();
    println!("Failover completed in {}ms", elapsed.as_millis());
    assert!(elapsed < Duration::from_secs(10));
}

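/// Performance: reconciliation overhead < 100ms for 10K events per side (20K total).
///
/// The measured interval covers reconcile_all() plus the wait for full convergence.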
#[tokio::test]
async fn perf_reconciliation_overhead() {
    let cluster = SimulatedCluster::build(three_region_config()).await;

    // Inject partition.
    let partition = NetworkPartition::symmetric(
        RegionId(0), RegionId(2), cluster.transport_factory()
    );

    // Write 10K events on each side.
    for i in 0..10_000u64 {
        cluster.write_signal("view", EntityId::new(i), 1.0);
        cluster.node(RegionId(2)).db
            .signal("view", EntityId::new(i + 10_000), 1.0, Timestamp::now())
            .unwrap();
    }

    drop(partition); // Heal.

    let reconcile_start = Instant::now();
    cluster.reconcile_all().await;
    cluster.await_full_convergence(Duration::from_secs(10)).await;
    let reconcile_elapsed = reconcile_start.elapsed();

    println!("Reconciliation of 20K events took {}ms", reconcile_elapsed.as_millis());
    assert!(
        reconcile_elapsed < Duration::from_millis(100),
        "reconciliation overhead must be < 100ms for 20K total events (got {}ms)",
        reconcile_elapsed.as_millis()
    );
}
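The p99 lookup in perf_replication_latency_p99 indexes the sorted vector directly. A minimal sketch of how that step could be factored into a reusable helper is shown below. The function name `percentile_sorted` is hypothetical and not part of the plan. It uses the nearest-rank definition, so for 1000 samples it picks the 990th smallest value, one position earlier than the inline index used in the test.

```rust
/// Nearest-rank percentile over an already-sorted slice.
///
/// `pct` is a whole-number percentile in 1..=100. Returns `None` for an
/// empty slice or an out-of-range percentile, so callers never index
/// out of bounds.
fn percentile_sorted(sorted: &[u64], pct: usize) -> Option<u64> {
    if sorted.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    // Nearest-rank: rank = ceil(pct / 100 * n), 1-based, computed with
    // integer arithmetic to avoid float rounding.
    let rank = (sorted.len() * pct + 99) / 100;
    Some(sorted[rank - 1])
}
```

In the latency test this would be called after `sort_unstable()`, for example `percentile_sorted(&latencies_ns, 99)`, and the same helper could serve the p50 printout.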

// tidal/benches/replication.rs

//! Replication throughput benchmark: sustained 25K signals/sec across 3 regions.

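use criterion::{criterion_group, criterion_main, Criterion, Throughput};
use std::time::Duration;
// SimulatedCluster, EntityId and three_region_config() come from the crate's
// test utilities; their import paths are not specified in this plan.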

fn bench_signal_throughput(c: &mut Criterion) {
    let rt = tokio::runtime::Runtime::new().unwrap();
    let cluster = rt.block_on(SimulatedCluster::build(three_region_config()));

    let mut group = c.benchmark_group("replication");
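    // Each iteration writes 25K signals, so criterion reports elements/sec directly;
    // the 25K signals/sec target holds when one iteration takes at most 1 second.
    group.throughput(Throughput::Elements(25_000));
    group.bench_function("25k_signals_per_sec", |b| {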
        b.iter(|| {
            rt.block_on(async {
                for i in 0..25_000u64 {
                    cluster.write_signal("view", EntityId::new(i % 10_000), 1.0);
                }
                cluster.await_full_convergence(Duration::from_secs(5)).await;
            });
        });
    });
    group.finish();
}

criterion_group!(benches, bench_signal_throughput);
criterion_main!(benches);
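The benchmark above only reports throughput; it does not check it against the 25K signals/sec target. The arithmetic for such a check is small enough to sketch. The helper names `signals_per_sec` and `meets_target` are hypothetical, and the sketch assumes the caller has measured the wall-clock time for one batch itself.

```rust
use std::time::Duration;

/// Signals per second implied by processing `count` signals in `elapsed`.
fn signals_per_sec(count: u64, elapsed: Duration) -> f64 {
    count as f64 / elapsed.as_secs_f64()
}

/// True when the measured rate meets or exceeds `target_per_sec`.
/// A zero-length measurement is rejected rather than treated as infinite.
fn meets_target(count: u64, elapsed: Duration, target_per_sec: f64) -> bool {
    elapsed > Duration::ZERO && signals_per_sec(count, elapsed) >= target_per_sec
}
```

With a batch of 25,000 signals, any iteration time of one second or less satisfies the target; an iteration that takes 1.25 seconds corresponds to 20,000 signals/sec and does not.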

CI Configuration

# .github/workflows/m8-tests.yml (or equivalent in the project's CI)

name: M8 Replication Tests

on:
  pull_request:
    paths:
      - 'tidal/src/replication/**'
      - 'tidal/src/testing/**'
      - 'tidal/tests/m8*'

jobs:
  m8-unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo test --manifest-path tidal/Cargo.toml --lib --features test-utils

  m8-integration:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - run: cargo test --manifest-path tidal/Cargo.toml --test m8_uat --features test-utils
      - run: cargo test --manifest-path tidal/Cargo.toml --test m8p2_replication --features test-utils
      - run: cargo test --manifest-path tidal/Cargo.toml --test m8p3_crdt --features test-utils
      - run: cargo test --manifest-path tidal/Cargo.toml --test m8p4_session --features test-utils
      - run: cargo test --manifest-path tidal/Cargo.toml --test m8p5_multitenancy --features test-utils

  clippy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          components: clippy, rustfmt
      - run: cargo clippy --manifest-path tidal/Cargo.toml --features test-utils -- -D warnings
      - run: cargo fmt --manifest-path tidal/Cargo.toml --check

Acceptance Criteria

  • perf_replication_latency_p99: 1000-sample p99 replication latency < 2000ms with InProcessTransport; prints p50 and p99
  • perf_failover_under_10s: leader election + follower promotion completes within 10 seconds; timing printed
  • perf_reconciliation_overhead: reconciliation of 20K total events (10K per side) completes in < 100ms; timing printed
  • benches/replication.rs: 25K signals/sec benchmark runs without panic; throughput number printed by criterion
  • CI configuration: M8 integration tests run on PRs that touch tidal/src/replication/**, tidal/src/testing/** or tidal/tests/m8*; integration job timeout = 5 minutes
  • No flaky tests: run cargo test --test m8_uat 5 times in a row; all runs pass (expected to be deterministic because InProcessTransport involves no real network)
  • Total CI job runtime (all M8 integration tests) < 3 minutes
  • cargo clippy -- -D warnings and cargo fmt --check pass
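
The failover criterion is enforced by a poll-with-deadline loop. A minimal, blocking sketch of that pattern using only the standard library follows. The name `poll_until` is hypothetical; the async tests in this task would sleep with tokio::time::sleep instead of std::thread::sleep.

```rust
use std::time::{Duration, Instant};

/// Polls `cond` every `interval` until it returns true or `deadline` elapses.
///
/// Returns the elapsed time on success, or `None` if the deadline passed
/// first. The condition is always checked at least once.
fn poll_until<F: FnMut() -> bool>(
    mut cond: F,
    interval: Duration,
    deadline: Duration,
) -> Option<Duration> {
    let start = Instant::now();
    loop {
        if cond() {
            return Some(start.elapsed());
        }
        if start.elapsed() >= deadline {
            return None;
        }
        std::thread::sleep(interval);
    }
}
```

Returning the elapsed time lets the caller both print the measurement and assert on the budget in one place, instead of repeating the deadline inside and after the loop.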