- Add Hybrid Logical Clock (HLC) for causality tracking across nodes - Implement Merkle tree for efficient diff/sync with BLAKE3 hashing - Add CRDT-aware stores for assertions and votes with vector clocks - Create stemedb-sync crate with anti-entropy and gossip protocols - Add stemedb-rpc crate with gRPC sync service (proto definitions) - Implement SupersessionChain for tracking assertion lifecycles - Add Aphoria application for code analysis/reporting - Add battery11 replication test scaffolding - Fix .gitignore to exclude nested target directories Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
130 lines
4.1 KiB
Markdown
130 lines
4.1 KiB
Markdown
# stemedb-merkle
|
|
|
|
BLAKE3-based Merkle tree for append-only assertion diff detection in StemeDB.
|
|
|
|
## Overview
|
|
|
|
This crate provides an efficient Merkle tree implementation optimized for StemeDB's append-only assertion store. The primary use case is **incremental sync between distributed nodes** - quickly identifying which assertions differ between local and remote stores.
|
|
|
|
## Design Principles
|
|
|
|
- **Append-Only**: Trees grow monotonically with O(log N) insert performance
|
|
- **Content-Addressed**: Uses BLAKE3 for cryptographic hash verification
|
|
- **Efficient Diff**: O(log N) comparison to identify divergent subtrees
|
|
- **Zero-Copy Serialization**: Uses rkyv for fast persistence and network transfer
|
|
- **No unwrap/expect**: All operations return `Result` for defensive error handling
|
|
|
|
## Architecture
|
|
|
|
The tree is a binary Merkle tree where:
|
|
- **Leaves** contain assertion hashes (BLAKE3 digests)
|
|
- **Internal nodes** contain `BLAKE3(left_hash || right_hash)`
|
|
- **Root hash** represents the entire assertion set
|
|
|
|
```
|
|
root (BLAKE3(h12 || h34))
|
|
/ \
|
|
h12 (BLAKE3(h1 || h2)) h34 (BLAKE3(h3 || h4))
|
|
/ \ / \
|
|
h1 h2 h3 h4
|
|
| | | |
|
|
a1 a2 a3 a4 (assertion hashes)
|
|
```
|
|
|
|
## Example Usage
|
|
|
|
### Basic Tree Operations
|
|
|
|
```rust
|
|
use stemedb_merkle::MerkleTree;
|
|
|
|
// Create a tree and insert assertion hashes
|
|
let mut tree = MerkleTree::new();
|
|
tree.insert([1u8; 32]).expect("insert");
|
|
tree.insert([2u8; 32]).expect("insert");
|
|
tree.insert([3u8; 32]).expect("insert");
|
|
|
|
// Get root hash (O(1) - cached)
|
|
let root = tree.root().expect("root");
|
|
|
|
// Check tree size
|
|
assert_eq!(tree.len(), 3);
|
|
```
|
|
|
|
### Incremental Sync (Fast Diff)
|
|
|
|
```rust
|
|
use stemedb_merkle::{MerkleTree, DiffResult, roots_equal};
|
|
|
|
let mut local = MerkleTree::new();
|
|
local.insert([1u8; 32]).expect("insert");
|
|
local.insert([2u8; 32]).expect("insert");
|
|
|
|
let mut remote = MerkleTree::new();
|
|
remote.insert([1u8; 32]).expect("insert");
|
|
remote.insert([2u8; 32]).expect("insert");
|
|
remote.insert([3u8; 32]).expect("insert"); // New assertion
|
|
remote.insert([4u8; 32]).expect("insert"); // New assertion
|
|
|
|
// Quick check: do we need to sync? (O(1))
|
|
if !roots_equal(&local, &remote) {
|
|
// Find what remote has that local doesn't (O(N))
|
|
let diff = DiffResult::diff(&local, &remote);
|
|
|
|
println!("Need to fetch {} assertions", diff.len());
|
|
// Request missing assertions: [3, 4]
|
|
for hash in diff.missing_hashes {
|
|
// fetch_assertion(hash)...
|
|
}
|
|
}
|
|
```
|
|
|
|
### Persistence (Crash Recovery)
|
|
|
|
```rust
|
|
use stemedb_merkle::{MerkleTree, serialize::{serialize_tree, deserialize_tree}};
|
|
|
|
let mut tree = MerkleTree::new();
|
|
tree.insert([1u8; 32]).expect("insert");
|
|
tree.insert([2u8; 32]).expect("insert");
|
|
|
|
// Serialize to disk
|
|
let bytes = serialize_tree(&tree).expect("serialize");
|
|
std::fs::write("merkle_tree.bin", &bytes).expect("write");
|
|
|
|
// Restore after crash
|
|
let bytes = std::fs::read("merkle_tree.bin").expect("read");
|
|
let recovered = deserialize_tree(&bytes).expect("deserialize");
|
|
|
|
assert_eq!(tree.root(), recovered.root());
|
|
```
|
|
|
|
## Performance Characteristics
|
|
|
|
| Operation | Complexity | Notes |
|
|
|-----------|------------|-------|
|
|
| Insert | O(log N) | Recompute path from leaf to root |
|
|
| Root | O(1) | Cached after each insert |
|
|
| Diff | O(N) | Set-based comparison of leaves |
|
|
| Serialize | O(N) | Write all leaves to bytes |
|
|
| Deserialize | O(N log N) | Rebuild tree from leaves |
|
|
|
|
## Future Optimizations
|
|
|
|
For very large trees (millions of assertions), we plan to implement:
|
|
|
|
- **Subtree-based diff**: Skip identical subtrees by comparing intermediate hashes
|
|
- Reduces diff from O(N) to O(diff_size * log N)
|
|
- **Incremental serialization**: Only persist changes since last checkpoint
|
|
- **Range queries**: Find assertions inserted between timestamps
|
|
|
|
## Integration with StemeDB
|
|
|
|
This crate will be used by:
|
|
|
|
1. **Write-ahead log (WAL)**: Build Merkle tree as assertions are appended
|
|
2. **Replication**: Exchange root hashes to detect drift, then sync missing data
|
|
3. **Checkpointing**: Persist tree state for fast bootstrap after restart
|
|
|
|
See `docs/research/distributed-write-path.md` for architecture details.
|