stemedb/crates/stemedb-merkle/README.md
jordan 2b0923f20e feat: Distributed replication foundation (Phase 6A) - HLC, Merkle trees, CRDT stores, sync protocol
- Add Hybrid Logical Clock (HLC) for causality tracking across nodes
- Implement Merkle tree for efficient diff/sync with BLAKE3 hashing
- Add CRDT-aware stores for assertions and votes with vector clocks
- Create stemedb-sync crate with anti-entropy and gossip protocols
- Add stemedb-rpc crate with gRPC sync service (proto definitions)
- Implement SupersessionChain for tracking assertion lifecycles
- Add Aphoria application for code analysis/reporting
- Add battery11 replication test scaffolding
- Fix .gitignore to exclude nested target directories

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 19:31:54 -07:00

130 lines
4.1 KiB
Markdown

# stemedb-merkle
BLAKE3-based Merkle tree for append-only assertion diff detection in StemeDB.
## Overview
This crate provides an efficient Merkle tree implementation optimized for StemeDB's append-only assertion store. The primary use case is **incremental sync between distributed nodes** - quickly identifying which assertions differ between local and remote stores.
## Design Principles
- **Append-Only**: Trees grow monotonically with O(log N) insert performance
- **Content-Addressed**: Uses BLAKE3 for cryptographic hash verification
- **Efficient Diff**: O(log N) comparison to identify divergent subtrees
- **Zero-Copy Serialization**: Uses rkyv for fast persistence and network transfer
- **No unwrap/expect**: All operations return `Result` for defensive error handling
## Architecture
The tree is a binary Merkle tree where:
- **Leaves** contain assertion hashes (BLAKE3 digests)
- **Internal nodes** contain `BLAKE3(left_hash || right_hash)`
- **Root hash** represents the entire assertion set
```
root (BLAKE3(h12 || h34))
/ \
h12 (BLAKE3(h1 || h2)) h34 (BLAKE3(h3 || h4))
/ \ / \
h1 h2 h3 h4
| | | |
a1 a2 a3 a4 (assertion hashes)
```
## Example Usage
### Basic Tree Operations
```rust
use stemedb_merkle::MerkleTree;
// Create a tree and insert assertion hashes
let mut tree = MerkleTree::new();
tree.insert([1u8; 32]).expect("insert");
tree.insert([2u8; 32]).expect("insert");
tree.insert([3u8; 32]).expect("insert");
// Get root hash (O(1) - cached)
let root = tree.root().expect("root");
// Check tree size
assert_eq!(tree.len(), 3);
```
### Incremental Sync (Fast Diff)
```rust
use stemedb_merkle::{MerkleTree, DiffResult, roots_equal};
let mut local = MerkleTree::new();
local.insert([1u8; 32]).expect("insert");
local.insert([2u8; 32]).expect("insert");
let mut remote = MerkleTree::new();
remote.insert([1u8; 32]).expect("insert");
remote.insert([2u8; 32]).expect("insert");
remote.insert([3u8; 32]).expect("insert"); // New assertion
remote.insert([4u8; 32]).expect("insert"); // New assertion
// Quick check: do we need to sync? (O(1))
if !roots_equal(&local, &remote) {
// Find what remote has that local doesn't (O(N))
let diff = DiffResult::diff(&local, &remote);
println!("Need to fetch {} assertions", diff.len());
// Request missing assertions: [3, 4]
for hash in diff.missing_hashes {
// fetch_assertion(hash)...
}
}
```
### Persistence (Crash Recovery)
```rust
use stemedb_merkle::{MerkleTree, serialize::{serialize_tree, deserialize_tree}};
let mut tree = MerkleTree::new();
tree.insert([1u8; 32]).expect("insert");
tree.insert([2u8; 32]).expect("insert");
// Serialize to disk
let bytes = serialize_tree(&tree).expect("serialize");
std::fs::write("merkle_tree.bin", &bytes).expect("write");
// Restore after crash
let bytes = std::fs::read("merkle_tree.bin").expect("read");
let recovered = deserialize_tree(&bytes).expect("deserialize");
assert_eq!(tree.root(), recovered.root());
```
## Performance Characteristics
| Operation | Complexity | Notes |
|-----------|------------|-------|
| Insert | O(log N) | Recompute path from leaf to root |
| Root | O(1) | Cached after each insert |
| Diff | O(N) | Set-based comparison of leaves |
| Serialize | O(N) | Write all leaves to bytes |
| Deserialize | O(N log N) | Rebuild tree from leaves |
## Future Optimizations
For very large trees (millions of assertions), we plan to implement:
- **Subtree-based diff**: Skip identical subtrees by comparing intermediate hashes
- Reduces diff from O(N) to O(diff_size * log N)
- **Incremental serialization**: Only persist changes since last checkpoint
- **Range queries**: Find assertions inserted between timestamps
## Integration with StemeDB
This crate will be used by:
1. **Write-ahead log (WAL)**: Build Merkle tree as assertions are appended
2. **Replication**: Exchange root hashes to detect drift, then sync missing data
3. **Checkpointing**: Persist tree state for fast bootstrap after restart
See `docs/research/distributed-write-path.md` for architecture details.