StemeDB Consistency Model
This document describes the distributed consistency guarantees provided by StemeDB, the mechanisms that enforce them, and what is explicitly not guaranteed.
Six Core Properties
| Property | Guarantee | Mechanism | Test Evidence |
|---|---|---|---|
| Eventual Convergence | All replicas converge to identical state | CRDT merge + Anti-entropy sync | stemedb-sync/tests/convergence.rs |
| Causal Ordering | Operations respect happens-before | HLC timestamps + HlcRecencyLens | stemedb-lens/src/hlc_recency.rs |
| Partition Tolerance | Writes succeed during network partitions | Leaderless replication | stemedb-cluster/tests/partition_tolerance.rs |
| Availability | Reads/writes succeed if any replica is up | Any-replica acceptance | stemedb-cluster/tests/availability.rs |
| Durability | Committed writes survive crashes | WAL with fsync | stemedb-wal/src/lib.rs |
| Conflict Resolution | Deterministic winner selection | Lens-based resolution | stemedb-lens/src/*.rs |
What IS Guaranteed
1. Eventual Convergence
All nodes eventually contain the same set of assertions. After network partitions heal and anti-entropy sync completes, every replica has identical data.
Mechanism:
- CRDT (Conflict-free Replicated Data Type) stores for assertions and votes
- Merkle tree-based diff detection for efficient sync
- Anti-entropy worker periodically syncs with peers
Timing:
- Convergence typically occurs within seconds of partition healing
- Configurable anti_entropy_interval (default: 5 seconds)
- Metrics available via AntiEntropyWorker::avg_convergence_duration_ms()
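The convergence argument rests on the merge operation being commutative, associative, and idempotent. A minimal sketch, assuming the assertion store behaves like a grow-only set; the `AssertionSet` type and the `u64` assertion IDs are hypothetical and are not StemeDB's actual types:

```rust
use std::collections::BTreeSet;

/// Hypothetical grow-only set of assertion IDs (illustration only).
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct AssertionSet {
    ids: BTreeSet<u64>,
}

impl AssertionSet {
    /// Build a set from a slice of assertion IDs.
    pub fn from_ids(ids: &[u64]) -> Self {
        Self { ids: ids.iter().copied().collect() }
    }

    /// Merge is set union, so it is commutative, associative, and idempotent.
    /// Replicas that exchange state in any order end up identical.
    pub fn merge(&mut self, other: &AssertionSet) {
        self.ids.extend(other.ids.iter().copied());
    }

    /// Number of distinct assertions held.
    pub fn len(&self) -> usize {
        self.ids.len()
    }
}
```

Merging replica A into B and B into A gives the same set, and merging a replica with its own snapshot changes nothing. This is why repeated or reordered anti-entropy rounds are harmless.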
2. Causal Ordering
Operations that happen-before other operations are ordered correctly. If assertion A causally precedes assertion B, any node that has B also has A.
Mechanism:
- Hybrid Logical Clock (HLC) timestamps on every assertion
- HLC propagates through anti-entropy sync
- HlcRecencyLens resolves "most recent" deterministically using HLC, not wall clock
Key insight: Wall clocks can drift between nodes. HLC combines physical time with logical ordering to provide a total order even when clocks disagree.
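To make the key insight concrete, here is a small sketch of the commonly described HLC update rule. It is not StemeDB's implementation: the `Hlc` struct, its field names, and the use of plain milliseconds for physical time are assumptions made for illustration.

```rust
/// Hypothetical HLC timestamp. Field order drives the derived ordering:
/// physical time first, then logical counter, then node ID.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hlc {
    pub physical_ms: u64,
    pub logical: u32,
    pub node_id: u16,
}

impl Hlc {
    pub fn new(node_id: u16) -> Self {
        Hlc { physical_ms: 0, logical: 0, node_id }
    }

    /// Timestamp a local event, given this node's wall-clock reading.
    pub fn tick(&mut self, wall_ms: u64) -> Hlc {
        if wall_ms > self.physical_ms {
            self.physical_ms = wall_ms;
            self.logical = 0;
        } else {
            // Wall clock did not advance (or went backwards): bump the counter.
            self.logical += 1;
        }
        *self
    }

    /// Advance the local clock past a timestamp received from a peer.
    pub fn observe(&mut self, remote: Hlc, wall_ms: u64) -> Hlc {
        let max_phys = wall_ms.max(self.physical_ms).max(remote.physical_ms);
        self.logical = if max_phys == self.physical_ms && max_phys == remote.physical_ms {
            self.logical.max(remote.logical) + 1
        } else if max_phys == self.physical_ms {
            self.logical + 1
        } else if max_phys == remote.physical_ms {
            remote.logical + 1
        } else {
            0
        };
        self.physical_ms = max_phys;
        *self
    }
}
```

If node 1 stamps an event at wall time 1000 and node 2, whose wall clock reads only 900, then observes it, node 2's next timestamp still sorts after node 1's. The receiver's slow clock does not reorder causally related events.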
3. Partition Tolerance
Writes continue on both sides of a network partition. No data is lost - both partitions' writes survive and merge after healing.
Mechanism:
- Leaderless replication: any replica accepts writes
- Append-only storage: writes never conflict (coexist)
- Lens resolution at read time, not write time
4. High Availability
If any replica for a shard is reachable, reads and writes succeed. There is no single point of failure.
Mechanism:
- Multiple replicas per shard (configurable replication factor)
- Writes accepted by any replica
- Reads served by any replica with current data
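From a client's point of view, "any reachable replica" can be sketched as a simple failover loop. The function name, the replica names, and the `send` callback are all hypothetical; the real client and transport APIs are not described in this document.

```rust
/// Try each replica in turn and succeed as soon as one accepts the write.
/// `send` is a hypothetical transport callback that returns Err when a
/// replica is unreachable. Returns the name of the replica that accepted.
pub fn write_to_any_replica<F>(replicas: &[&str], mut send: F) -> Result<String, String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut last_err = String::from("no replicas configured");
    for &replica in replicas {
        match send(replica) {
            Ok(()) => return Ok(replica.to_string()),
            Err(e) => last_err = e, // remember the failure, try the next one
        }
    }
    // Only reached when every replica failed.
    Err(last_err)
}
```

The write fails only when every replica in the list is unreachable, which matches the guarantee stated above.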
5. Durability
Once a write is acknowledged, it survives process crashes and restarts.
Mechanism:
- Write-ahead log (WAL) with fsync
- Assertion data written to durable storage before acknowledgment
- Crash recovery replays uncommitted WAL entries
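The "fsync before acknowledgment" rule can be sketched with the Rust standard library alone. This is an illustration under assumptions, not the code in stemedb-wal: the function name and the length-prefixed framing are hypothetical, and the file is reopened on every call only for brevity.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append one record and return only after it has been flushed to disk.
/// A caller would acknowledge the write after this returns Ok.
pub fn append_durable(path: &Path, record: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // Length prefix lets a reader find record boundaries during replay.
    file.write_all(&(record.len() as u32).to_le_bytes())?;
    file.write_all(record)?;
    // sync_all asks the OS to flush file data and metadata to the device.
    file.sync_all()
}
```

Without the `sync_all` call the data may sit in the OS page cache, so a crash after acknowledgment could lose it.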
6. Deterministic Conflict Resolution
When multiple assertions exist for the same subject+predicate, all nodes resolve to the same winner.
Mechanism:
- Lenses provide resolution strategies:
  - HlcRecencyLens: Latest HLC timestamp wins (total order)
  - ConsensusLens: Most common value wins
  - ConfidenceLens: Highest confidence wins
  - TrustAwareAuthorityLens: Weighted by source reputation
- Tiebreaker: source_hash provides deterministic ordering when primary criteria match
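A recency-style resolution with a source_hash tiebreaker might look like the sketch below. The `Assertion` struct and `resolve_by_recency` function are hypothetical stand-ins, not the actual lens API, and the sketch assumes each source has a distinct source_hash.

```rust
/// Hypothetical assertion record (illustration only).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Assertion {
    pub value: String,
    /// (physical time, logical counter) taken from the HLC timestamp.
    pub hlc: (u64, u32),
    pub source_hash: u64,
}

/// Pick the assertion with the latest HLC; break ties on source_hash.
/// The result does not depend on the order of `candidates`, provided
/// no two candidates share both HLC and source_hash.
pub fn resolve_by_recency(candidates: &[Assertion]) -> Option<&Assertion> {
    candidates.iter().max_by_key(|a| (a.hlc, a.source_hash))
}
```

Because the comparison key is a pure function of the stored data, two nodes holding the same assertions in different arrival order pick the same winner.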
What is NOT Guaranteed
1. Linearizability
StemeDB is not linearizable. A write on node A is not immediately visible on node B.
Why: Linearizability requires synchronous replication, which conflicts with partition tolerance and availability.
Workaround: Use HLC timestamps to establish order. If your use case requires seeing your own writes immediately, read from the node you wrote to.
2. Read-Your-Writes (Cross-Node)
After writing to node A, a read from node B may not see the write immediately.
Why: Anti-entropy sync is asynchronous to optimize for availability.
Workaround:
- Sticky sessions (always read from the node you wrote to)
- Wait for anti-entropy sync to complete (typically <10 seconds)
- Use gossip for faster propagation of new writes
3. Snapshot Isolation
Concurrent reads may see different subsets of data.
Why: There is no global transaction coordinator.
Workaround: For consistent snapshots, use epoch-aware lenses that filter to a specific epoch.
4. Strong Consistency
There is no guarantee that all nodes see operations in the same order at the same time.
Why: This would require synchronous coordination between nodes. Under the CAP theorem, a system that stays available during network partitions cannot also provide strong consistency.
Clock Skew Handling
HLC Design
HLC timestamps combine:
- Physical time: NTP64 format (32-bit seconds plus a 32-bit fraction of a second, measured from the Unix epoch)
- Logical counter: Disambiguates events with same physical time
- Node ID: Breaks ties when counter and time match
Skew Detection
The system detects clock skew exceeding 500ms:
- detect_clock_skew() compares local and remote HLC timestamps
- clock_skew_events metric tracks skew occurrences
- Warning logged when skew exceeds threshold
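The threshold check itself is simple. The sketch below is a hypothetical stand-in for detect_clock_skew(), whose real signature is not given here; it assumes the physical components of both timestamps are available as milliseconds.

```rust
/// Threshold from the text above: 500 ms.
pub const SKEW_THRESHOLD_MS: u64 = 500;

/// Returns the absolute skew in milliseconds if it exceeds the threshold,
/// otherwise None. A caller could log a warning and increment a counter
/// whenever this returns Some.
pub fn detect_clock_skew_sketch(local_ms: u64, remote_ms: u64) -> Option<u64> {
    let skew = local_ms.abs_diff(remote_ms);
    (skew > SKEW_THRESHOLD_MS).then_some(skew)
}
```

The check is symmetric, so it fires whether the remote clock is ahead of or behind the local one.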
Recommendations
- Use NTP: All nodes should synchronize clocks via NTP
- Monitor skew: Track clock_skew_events metric
- Tolerate drift: HLC handles moderate skew (< seconds) gracefully
- Investigate large skew: Skew > 1 second may indicate NTP misconfiguration
Recovery Scenarios
Partition Heal
- Anti-entropy detects divergent Merkle roots
- Diff computed to find missing assertions
- Missing assertions fetched and merged via CRDT
- Local HLC updated from remote timestamps
- Convergence achieved when roots match
Metric: avg_convergence_duration_ms() tracks time from divergence detection to convergence.
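The heal steps can be sketched as one sync round between two stores. This is a simplification under stated assumptions: a single hash over the sorted IDs stands in for the Merkle root, a full set difference stands in for the tree-based diff, and the function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Stand-in for a Merkle root: one hash over the sorted assertion IDs.
pub fn root_digest(store: &BTreeSet<u64>) -> u64 {
    let mut h = DefaultHasher::new();
    for id in store {
        id.hash(&mut h);
    }
    h.finish()
}

/// One sync round. Returns how many assertions were pulled from the peer.
pub fn sync_from_peer(local: &mut BTreeSet<u64>, peer: &BTreeSet<u64>) -> usize {
    // Step 1: compare roots; identical roots mean nothing to do.
    if root_digest(local) == root_digest(peer) {
        return 0;
    }
    // Step 2: compute which assertions the local store is missing.
    let missing: Vec<u64> = peer.difference(local).copied().collect();
    // Step 3: merge them (set union, as in a grow-only CRDT).
    local.extend(missing.iter().copied());
    missing.len()
}
```

After each side has run one round against the other's pre-partition state, both digests match and a further round is a no-op. A real Merkle tree avoids transferring the whole ID set by descending only into subtrees whose hashes differ.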
Node Crash
- On restart, WAL is replayed
- Uncommitted entries are re-applied
- Merkle tree rebuilt from stored assertions
- Anti-entropy resumes syncing with peers
Corrupt WAL
- Corrupted entries detected via checksum
- Valid entries up to corruption point recovered
- Node syncs missing data from peers via anti-entropy
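Recovering the valid prefix of a damaged log can be sketched as follows. The record framing and function names are assumptions for illustration, and FNV-1a is used only as a dependency-free stand-in for whatever checksum the WAL actually uses.

```rust
use std::convert::TryInto;

/// FNV-1a (32-bit), a stand-in checksum chosen to avoid external crates.
fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0x811c_9dc5u32, |h, &b| (h ^ b as u32).wrapping_mul(0x0100_0193))
}

/// Frame a record as [len: u32 LE][checksum: u32 LE][payload].
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Return every valid record before the first corrupt or truncated one.
pub fn recover_valid_prefix(log: &[u8]) -> Vec<Vec<u8>> {
    let mut records = Vec::new();
    let mut pos = 0;
    while log.len() - pos >= 8 {
        let len = u32::from_le_bytes(log[pos..pos + 4].try_into().unwrap()) as usize;
        let sum = u32::from_le_bytes(log[pos + 4..pos + 8].try_into().unwrap());
        let start = pos + 8;
        let end = match start.checked_add(len) {
            Some(e) if e <= log.len() => e,
            _ => break, // truncated record: stop
        };
        if checksum(&log[start..end]) != sum {
            break; // checksum mismatch: corruption point reached
        }
        records.push(log[start..end].to_vec());
        pos = end;
    }
    records
}
```

Everything after the first bad record is discarded locally. In the scenario above, those entries are then restored from peers through anti-entropy.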
Testing Evidence
All consistency properties are verified by automated tests:
| Test File | Property Tested |
|---|---|
| crates/stemedb-sync/tests/convergence.rs | Two-node convergence, overlapping data, lens determinism, merge commutativity |
| crates/stemedb-cluster/tests/partition_tolerance.rs | Write success during partition, post-partition convergence, concurrent writes |
| crates/stemedb-cluster/tests/availability.rs | Read/write on any replica, node failure isolation, quorum availability |
| crates/stemedb-lens/src/hlc_recency.rs | HLC ordering, clock skew scenarios, deterministic tiebreakers |
Run all consistency tests:
cargo test -p stemedb-sync --test convergence
cargo test -p stemedb-cluster --test partition_tolerance
cargo test -p stemedb-cluster --test availability
cargo test -p stemedb-lens -- hlc_recency
Metrics Reference
| Metric | Location | Description |
|---|---|---|
| sync_cycles | AntiEntropyWorker | Completed sync cycles |
| sync_failures | AntiEntropyWorker | Failed sync attempts |
| assertions_synced | AntiEntropyWorker | Total assertions merged |
| hlc_updates | AntiEntropyWorker | Times local HLC advanced from remote |
| clock_skew_events | AntiEntropyWorker | Times skew exceeded 500ms |
| convergence_count() | AntiEntropyWorker | Number of convergence events |
| avg_convergence_duration_ms() | AntiEntropyWorker | Average time to converge |