stemedb/docs/consistency-model.md

StemeDB Consistency Model

This document describes the distributed consistency guarantees provided by StemeDB, the mechanisms that enforce them, and what is explicitly not guaranteed.

Six Core Properties

| Property | Guarantee | Mechanism | Test Evidence |
|---|---|---|---|
| Eventual Convergence | All replicas converge to identical state | CRDT merge + Anti-entropy sync | stemedb-sync/tests/convergence.rs |
| Causal Ordering | Operations respect happens-before | HLC timestamps + HlcRecencyLens | stemedb-lens/src/hlc_recency.rs |
| Partition Tolerance | Writes succeed during network partitions | Leaderless replication | stemedb-cluster/tests/partition_tolerance.rs |
| Availability | Reads/writes succeed if any replica is up | Any-replica acceptance | stemedb-cluster/tests/availability.rs |
| Durability | Committed writes survive crashes | WAL with fsync | stemedb-wal/src/lib.rs |
| Conflict Resolution | Deterministic winner selection | Lens-based resolution | stemedb-lens/src/*.rs |

What IS Guaranteed

1. Eventual Convergence

All nodes eventually contain the same set of assertions. After network partitions heal and anti-entropy sync completes, every replica has identical data.

Mechanism:

  • CRDT (Conflict-free Replicated Data Type) stores for assertions and votes
  • Merkle tree-based diff detection for efficient sync
  • Anti-entropy worker periodically syncs with peers

Timing:

  • Convergence typically occurs within seconds of partition healing
  • Configurable anti_entropy_interval (default: 5 seconds)
  • Metrics available via AntiEntropyWorker::avg_convergence_duration_ms()
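To illustrate why CRDT merging leads to convergence, here is a minimal sketch using a grow-only set, the simplest CRDT. The `AssertionSet` type and `converge_after_partition` function are hypothetical and are not taken from the StemeDB codebase; the real stores hold full assertions and votes and use Merkle diffs rather than whole-set exchange.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a replica's assertion store: a grow-only set
/// (G-Set). Real stores would hold full assertions, not just string IDs.
#[derive(Debug, Clone, PartialEq, Eq, Default)]
pub struct AssertionSet {
    ids: BTreeSet<String>,
}

impl AssertionSet {
    /// Record a locally accepted assertion.
    pub fn insert(&mut self, id: &str) {
        self.ids.insert(id.to_string());
    }

    /// Merge is set union: commutative, associative, and idempotent,
    /// so replicas converge regardless of sync order or repetition.
    pub fn merge(&mut self, other: &AssertionSet) {
        self.ids.extend(other.ids.iter().cloned());
    }
}

/// Build two replicas that diverged during a partition, then sync both ways.
pub fn converge_after_partition() -> (AssertionSet, AssertionSet) {
    let mut a = AssertionSet::default();
    let mut b = AssertionSet::default();
    a.insert("a1");
    a.insert("shared");
    b.insert("b1");
    b.insert("shared");

    let snapshot_a = a.clone();
    a.merge(&b); // A pulls from B
    b.merge(&snapshot_a); // B pulls from A's pre-merge state
    (a, b)
}
```

Because union is idempotent, repeating a sync round after convergence changes nothing, which is what lets a periodic anti-entropy worker run safely on a fixed interval.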

2. Causal Ordering

Operations are ordered consistently with the happens-before relation: if assertion A causally precedes assertion B, any node that has B also has A.

Mechanism:

  • Hybrid Logical Clock (HLC) timestamps on every assertion
  • HLC propagates through anti-entropy sync
  • HlcRecencyLens resolves "most recent" deterministically using HLC, not wall clock

Key insight: Wall clocks can drift between nodes. HLC combines physical time with logical ordering to provide a total order even when clocks disagree.
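The following sketch shows the standard HLC update rules for local events and for receiving a remote timestamp. The `Hlc` struct, its field names, and the `tick`/`observe` methods are assumptions made for illustration; the actual StemeDB types may differ.

```rust
/// Hypothetical HLC state. Derived `Ord` compares `physical` first,
/// then `logical`, which matches HLC ordering.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hlc {
    pub physical: u64, // largest wall-clock reading seen so far
    pub logical: u32,  // counter for events sharing the same physical value
}

impl Hlc {
    /// Local event or send. `wall` is this node's current wall-clock reading.
    pub fn tick(&mut self, wall: u64) -> Hlc {
        if wall > self.physical {
            self.physical = wall;
            self.logical = 0;
        } else {
            // Wall clock has not advanced (or went backwards): bump the counter.
            self.logical += 1;
        }
        *self
    }

    /// Receive: fold a remote timestamp into the local clock.
    pub fn observe(&mut self, remote: Hlc, wall: u64) -> Hlc {
        let max_physical = wall.max(self.physical).max(remote.physical);
        self.logical = if max_physical == self.physical && max_physical == remote.physical {
            self.logical.max(remote.logical) + 1
        } else if max_physical == self.physical {
            self.logical + 1
        } else if max_physical == remote.physical {
            remote.logical + 1
        } else {
            0 // the local wall clock is strictly ahead of both
        };
        self.physical = max_physical;
        *self
    }
}
```

With these rules, a node whose wall clock lags behind still assigns a timestamp greater than any timestamp it has observed, so causally later events always compare greater.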

3. Partition Tolerance

Writes continue on both sides of a network partition. No data is lost - both partitions' writes survive and merge after healing.

Mechanism:

  • Leaderless replication: any replica accepts writes
  • Append-only storage: writes never overwrite one another, so conflicting assertions coexist
  • Lens resolution at read time, not write time

4. High Availability

If any replica for a shard is reachable, reads and writes succeed. There is no single point of failure.

Mechanism:

  • Multiple replicas per shard (configurable replication factor)
  • Writes accepted by any replica
  • Reads served by any replica with current data

5. Durability

Once a write is acknowledged, it survives process crashes and restarts.

Mechanism:

  • Write-ahead log (WAL) with fsync
  • Assertion data written to durable storage before acknowledgment
  • Crash recovery replays WAL entries that had not yet been applied to the store
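As a rough sketch of the append-then-fsync pattern, the code below writes a length-prefixed record and calls `sync_all` before returning, then replays complete records on restart. The function names and the record framing are assumptions for illustration only and do not describe the actual `stemedb-wal` format.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Append one length-prefixed record and fsync before returning, so the
/// caller may only acknowledge the write once it is on durable storage.
pub fn append_durable(path: &Path, payload: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(&(payload.len() as u32).to_le_bytes())?;
    file.write_all(payload)?;
    file.sync_all()?; // flush file data and metadata to disk
    Ok(())
}

/// Replay every complete record; a torn write at the tail is ignored.
pub fn replay(path: &Path) -> io::Result<Vec<Vec<u8>>> {
    let bytes = fs::read(path)?;
    let mut records = Vec::new();
    let mut pos = 0;
    while pos + 4 <= bytes.len() {
        let len = u32::from_le_bytes([
            bytes[pos],
            bytes[pos + 1],
            bytes[pos + 2],
            bytes[pos + 3],
        ]) as usize;
        pos += 4;
        if pos + len > bytes.len() {
            break; // incomplete record from a crash mid-write
        }
        records.push(bytes[pos..pos + len].to_vec());
        pos += len;
    }
    Ok(records)
}
```

The key design point is ordering: the acknowledgment is sent only after `sync_all` returns, which is what makes "acknowledged" imply "survives a crash".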

6. Deterministic Conflict Resolution

When multiple assertions exist for the same subject+predicate, all nodes resolve to the same winner.

Mechanism:

  • Lenses provide resolution strategies:
    • HlcRecencyLens: Latest HLC timestamp wins (total order)
    • ConsensusLens: Most common value wins
    • ConfidenceLens: Highest confidence wins
    • TrustAwareAuthorityLens: Weighted by source reputation
  • Tiebreaker: source_hash provides deterministic ordering when primary criteria match
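A minimal sketch of two resolution strategies follows. The `Assertion` struct and both functions are hypothetical simplifications and not the real lens API; in particular, breaking consensus ties by value ordering is an assumption made here so the example stays deterministic.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified assertion for one subject+predicate.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Assertion {
    pub value: String,
    pub hlc: (u64, u32), // (physical, logical)
    pub source_hash: u64,
}

/// Recency-style resolution: the highest HLC wins, and `source_hash`
/// breaks ties so every node picks the same winner.
pub fn resolve_by_recency(candidates: &[Assertion]) -> Option<&Assertion> {
    candidates.iter().max_by_key(|a| (a.hlc, a.source_hash))
}

/// Consensus-style resolution: the most common value wins. Ties are broken
/// by value ordering (an assumption of this sketch).
pub fn resolve_by_consensus(candidates: &[Assertion]) -> Option<&str> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for a in candidates {
        *counts.entry(a.value.as_str()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by_key(|&(value, count)| (count, value))
        .map(|(value, _)| value)
}
```

Both functions depend only on the set of candidates, not on their arrival order, which is the property that lets replicas with identical data agree on the winner without coordinating.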

What is NOT Guaranteed

1. Linearizability

StemeDB is not linearizable. A write on node A is not immediately visible on node B.

Why: Linearizability requires replicas to coordinate synchronously on every operation, which cannot coexist with availability during a network partition.

Workaround: Use HLC timestamps to establish order. If your use case requires seeing your own writes immediately, read from the node you wrote to.

2. Read-Your-Writes (Cross-Node)

After writing to node A, a read from node B may not see the write immediately.

Why: Anti-entropy sync is asynchronous to optimize for availability.

Workaround:

  • Sticky sessions (always read from the node you wrote to)
  • Wait for anti-entropy sync to complete (typically <10 seconds)
  • Use gossip for faster propagation of new writes

3. Snapshot Isolation

Concurrent reads may see different subsets of data.

Why: There is no global transaction coordinator.

Workaround: For consistent snapshots, use epoch-aware lenses that filter to a specific epoch.

4. Strong Consistency

There is no guarantee that all nodes see operations in the same order at the same time.

Why: This would require cross-node coordination on every operation, which, per the CAP theorem, means giving up availability during network partitions.

Clock Skew Handling

HLC Design

HLC timestamps combine:

  • Physical time: NTP64 format (64-bit fixed point with 32 bits of seconds and 32 bits of fractional seconds, measured from the Unix epoch)
  • Logical counter: Disambiguates events with same physical time
  • Node ID: Breaks ties when counter and time match
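The three components can be modeled as a struct whose comparison order follows the list above. This is a sketch with hypothetical field names and widths; the physical component is treated as an opaque integer here.

```rust
/// Hypothetical timestamp layout. Field order matters: derived `Ord`
/// compares fields top to bottom, giving the order
/// physical time, then logical counter, then node ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct HlcTimestamp {
    pub physical: u64, // encoded physical time
    pub counter: u32,  // disambiguates events with the same physical time
    pub node_id: u16,  // final tiebreaker between nodes
}

/// Return the most recent timestamp, if any.
pub fn latest(timestamps: &[HlcTimestamp]) -> Option<HlcTimestamp> {
    timestamps.iter().copied().max()
}
```

Because node IDs are unique, two distinct events never compare equal, which is what turns the HLC ordering into a total order.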

Skew Detection

The system detects clock skew exceeding 500ms:

  • detect_clock_skew() compares local and remote HLC timestamps
  • clock_skew_events metric tracks skew occurrences
  • Warning logged when skew exceeds threshold
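A skew check along these lines could look like the sketch below. The name `detect_clock_skew` and the `clock_skew_events` metric come from this document, but the signature, the millisecond units, and the `SkewMonitor` wrapper are assumptions.

```rust
/// Threshold from this document: skew above 500 ms is reported.
pub const SKEW_THRESHOLD_MS: u64 = 500;

/// Hypothetical signature: compares the physical components (in milliseconds)
/// of a local and a remote timestamp. Returns the skew only when it exceeds
/// the threshold.
pub fn detect_clock_skew(local_ms: u64, remote_ms: u64) -> Option<u64> {
    let skew = local_ms.abs_diff(remote_ms);
    if skew > SKEW_THRESHOLD_MS {
        Some(skew)
    } else {
        None
    }
}

/// Hypothetical wrapper that counts skew events and logs a warning.
#[derive(Debug, Default)]
pub struct SkewMonitor {
    pub clock_skew_events: u64,
}

impl SkewMonitor {
    pub fn observe(&mut self, local_ms: u64, remote_ms: u64) -> Option<u64> {
        let skew = detect_clock_skew(local_ms, remote_ms);
        if let Some(ms) = skew {
            self.clock_skew_events += 1;
            eprintln!(
                "warning: clock skew of {} ms exceeds {} ms",
                ms, SKEW_THRESHOLD_MS
            );
        }
        skew
    }
}
```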

Recommendations

  1. Use NTP: All nodes should synchronize clocks via NTP
  2. Monitor skew: Track clock_skew_events metric
  3. Tolerate drift: HLC handles moderate skew (below roughly one second) gracefully
  4. Investigate large skew: Skew > 1 second may indicate NTP misconfiguration

Recovery Scenarios

Partition Heal

  1. Anti-entropy detects divergent Merkle roots
  2. Diff computed to find missing assertions
  3. Missing assertions fetched and merged via CRDT
  4. Local HLC updated from remote timestamps
  5. Convergence achieved when roots match

Metric: avg_convergence_duration_ms() tracks time from divergence detection to convergence.
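The heal sequence can be sketched in a simplified form. Here a single digest over the sorted assertion IDs stands in for the Merkle root, and a plain set difference stands in for the tree-based diff; the function names are hypothetical, and the HLC update in step 4 is omitted.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Stand-in for a Merkle root: one digest over the sorted assertion IDs.
/// A real Merkle tree would let peers narrow the diff to a subtree.
pub fn root_digest(store: &BTreeSet<String>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for id in store {
        id.hash(&mut hasher);
    }
    hasher.finish()
}

/// One pull-based sync round. Returns how many assertions were fetched.
pub fn sync_from_peer(local: &mut BTreeSet<String>, peer: &BTreeSet<String>) -> usize {
    // Step 1: matching roots mean there is nothing to do.
    if root_digest(local) == root_digest(peer) {
        return 0;
    }
    // Step 2: compute which assertions the local replica is missing.
    let missing: Vec<String> = peer.difference(&*local).cloned().collect();
    let fetched = missing.len();
    // Step 3: merge them in (set union, as in a grow-only CRDT).
    local.extend(missing);
    fetched
}
```

Convergence (step 5) is reached once both sides have pulled from each other and their digests match; a further round then returns immediately at step 1.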

Node Crash

  1. On restart, WAL is replayed
  2. Entries not yet applied to the store are re-applied
  3. Merkle tree rebuilt from stored assertions
  4. Anti-entropy resumes syncing with peers

Corrupt WAL

  1. Corrupted entries detected via checksum
  2. Valid entries up to corruption point recovered
  3. Node syncs missing data from peers via anti-entropy
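Steps 1 and 2 can be sketched as a scan that stops at the first entry whose checksum does not match. The frame layout and the toy checksum below are assumptions for illustration; a production WAL would more likely use CRC32 or a similar code.

```rust
/// Toy rolling checksum, used only to keep the sketch dependency-free.
fn checksum(payload: &[u8]) -> u32 {
    payload
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

/// Read a little-endian u32 at `at`. The caller guarantees 4 bytes exist.
fn read_u32(log: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([log[at], log[at + 1], log[at + 2], log[at + 3]])
}

/// Hypothetical frame: [len: u32 LE][checksum: u32 LE][payload].
pub fn encode_entry(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Recover the valid prefix of the log: stop at the first truncated entry
/// or the first entry whose checksum does not match its payload.
pub fn recover_valid_prefix(log: &[u8]) -> Vec<Vec<u8>> {
    let mut entries = Vec::new();
    let mut pos = 0;
    while pos + 8 <= log.len() {
        let len = read_u32(log, pos) as usize;
        let expected = read_u32(log, pos + 4);
        let start = pos + 8;
        let end = match start.checked_add(len) {
            Some(e) if e <= log.len() => e,
            _ => break, // truncated or nonsensical length
        };
        let payload = &log[start..end];
        if checksum(payload) != expected {
            break; // corruption point: later entries cannot be trusted
        }
        entries.push(payload.to_vec());
        pos = end;
    }
    entries
}
```

Everything after the corruption point is discarded locally, which is why step 3 relies on anti-entropy to fetch the missing assertions from peers.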

Testing Evidence

All consistency properties are verified by automated tests:

| Test File | Property Tested |
|---|---|
| crates/stemedb-sync/tests/convergence.rs | Two-node convergence, overlapping data, lens determinism, merge commutativity |
| crates/stemedb-cluster/tests/partition_tolerance.rs | Write success during partition, post-partition convergence, concurrent writes |
| crates/stemedb-cluster/tests/availability.rs | Read/write on any replica, node failure isolation, quorum availability |
| crates/stemedb-lens/src/hlc_recency.rs | HLC ordering, clock skew scenarios, deterministic tiebreakers |

Run all consistency tests:

```bash
cargo test -p stemedb-sync --test convergence
cargo test -p stemedb-cluster --test partition_tolerance
cargo test -p stemedb-cluster --test availability
cargo test -p stemedb-lens -- hlc_recency
```

Metrics Reference

| Metric | Location | Description |
|---|---|---|
| sync_cycles | AntiEntropyWorker | Completed sync cycles |
| sync_failures | AntiEntropyWorker | Failed sync attempts |
| assertions_synced | AntiEntropyWorker | Total assertions merged |
| hlc_updates | AntiEntropyWorker | Times local HLC advanced from remote |
| clock_skew_events | AntiEntropyWorker | Times skew exceeded 500ms |
| convergence_count() | AntiEntropyWorker | Number of convergence events |
| avg_convergence_duration_ms() | AntiEntropyWorker | Average time to converge |

See Also