# The Ballot Box

**Last Updated:** 2026-01-31
**Confidence:** High
**Status:** Implemented

## Summary

The Ballot Box is Episteme's high-velocity vote ingestion system. It separates votes from assertions to enable thousands of agents to vote simultaneously without lock contention.

**Key Facts:**
- Votes are append-only (immutable)
- Content-addressed by BLAKE3 hash
- O(1) vote counts via cached counters
- O(1) aggregate weights for Materializer
- Decoupled from Assertion mutations

**File Pointer:** `crates/stemedb-storage/src/vote_store.rs`

## Storage Layout

| Key Pattern | Value | Purpose |
|-------------|-------|---------|
| `V:{assertion_hash}:{vote_hash}` | Serialized `Vote` | Individual votes |
| `VC:{assertion_hash}` | `u64` (LE bytes) | Vote count cache |
| `VW:{assertion_hash}` | `f32` (LE bytes) | Aggregate weight cache |

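
The key patterns and little-endian cache values in the table above can be illustrated with a few standard-library helpers. This is a minimal sketch, not the crate's key codec: the function names are hypothetical, and hex-encoding the 32-byte hashes inside the key is an assumption made here for readability.

```rust
/// 32-byte hash, matching the `[u8; 32]` values used elsewhere in this document.
type Hash = [u8; 32];

/// Hex-encode bytes (assumption: the real key codec may use raw bytes instead).
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Key for an individual vote: `V:{assertion_hash}:{vote_hash}`.
fn vote_key(assertion: &Hash, vote: &Hash) -> String {
    format!("V:{}:{}", hex(assertion), hex(vote))
}

/// Key for the vote count cache: `VC:{assertion_hash}`.
fn count_key(assertion: &Hash) -> String {
    format!("VC:{}", hex(assertion))
}

/// Key for the aggregate weight cache: `VW:{assertion_hash}`.
fn weight_key(assertion: &Hash) -> String {
    format!("VW:{}", hex(assertion))
}

/// Cache values are stored as little-endian bytes.
fn encode_count(n: u64) -> [u8; 8] {
    n.to_le_bytes()
}

fn decode_count(bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(bytes)
}

fn encode_weight(w: f32) -> [u8; 4] {
    w.to_le_bytes()
}

fn decode_weight(bytes: [u8; 4]) -> f32 {
    f32::from_le_bytes(bytes)
}
```

Because every vote key for an assertion shares the `V:{assertion_hash}:` prefix, a layout like this would let a prefix scan collect all votes for one assertion, while the two cache keys are single point lookups.
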
## VoteStore Trait

```rust
#[async_trait]
pub trait VoteStore: Send + Sync {
    /// Store a vote and return its content-addressed hash
    async fn put_vote(&self, vote: &Vote) -> Result<Hash>;

    /// Get a specific vote by hash
    async fn get_vote(&self, assertion_hash: &Hash, vote_hash: &Hash) -> Result<Option<Vote>>;

    /// Get all votes for an assertion (O(n))
    async fn get_votes_for_assertion(&self, assertion_hash: &Hash) -> Result<Vec<Vote>>;

    /// Get vote count (O(1) via cache)
    async fn get_vote_count(&self, assertion_hash: &Hash) -> Result<u64>;

    /// Get aggregate weight (O(1) via cache)
    async fn get_aggregate_weight(&self, assertion_hash: &Hash) -> Result<f32>;

    /// Check if any votes exist
    async fn has_votes(&self, assertion_hash: &Hash) -> Result<bool>;
}
```

## Usage Example

```rust
use stemedb_storage::{HybridStore, GenericVoteStore, VoteStore};
use stemedb_core::types::Vote;

// Create vote store backed by HybridStore (fjall + redb)
let kv_store = HybridStore::open("./data")?;
let vote_store = GenericVoteStore::new(kv_store);

// High-velocity vote ingestion
// (`sig_bytes` and `now` are assumed to be provided by the caller)
let vote = Vote {
    assertion_hash: [1u8; 32],
    agent_id: [2u8; 32],
    weight: 0.85,
    signature: sig_bytes,
    timestamp: now,
};
let vote_hash = vote_store.put_vote(&vote).await?;

// O(1) aggregation queries (for Materializer)
let count = vote_store.get_vote_count(&vote.assertion_hash).await?;
let total_weight = vote_store.get_aggregate_weight(&vote.assertion_hash).await?;
```

## Design Rationale

### Why Separate Votes from Assertions?

A traditional schema would store votes as a counter column on the assertion row or in a join table:

```sql
-- Naive approach: votes as assertion metadata
UPDATE assertions SET vote_count = vote_count + 1 WHERE hash = ?;
```

**Problems:**
1. Lock contention when many agents vote on the same assertion
2. Lost history (can't see who voted when)
3. Violates append-only semantics

**Ballot Box Solution:**
1. Votes are separate, immutable records
2. Each vote is content-addressed
3. Caches enable O(1) aggregation
4. Full audit trail preserved

### Cache Update Strategy

When `put_vote()` is called:
1. Serialize vote with rkyv
2. Compute BLAKE3 hash (content address)
3. Store at `V:{assertion_hash}:{vote_hash}`
4. Increment `VC:{assertion_hash}` counter
5. Add weight to `VW:{assertion_hash}` sum

The caches are updated atomically with the vote write, ensuring consistency.

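
The five steps above can be sketched against an in-memory map. This is a minimal illustration, not the crate's implementation: `MemStore`, its methods, and the simplified `Vote` fields are hypothetical, a `|`-joined string stands in for rkyv serialization, `DefaultHasher` stands in for BLAKE3 (it is not cryptographic), and exclusive `&mut self` access stands in for whatever atomic write mechanism the real store uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::Hasher;

/// Simplified vote; strings stand in for the 32-byte hashes (assumption).
struct Vote {
    assertion: String,
    agent: String,
    weight: f32,
}

/// Hypothetical in-memory stand-in for the key-value store.
#[derive(Default)]
struct MemStore {
    kv: BTreeMap<String, Vec<u8>>,
}

impl MemStore {
    fn put_vote(&mut self, vote: &Vote) -> String {
        // 1. Serialize the vote (stand-in for rkyv).
        let bytes = format!("{}|{}|{}", vote.assertion, vote.agent, vote.weight).into_bytes();

        // 2. Compute the content address (stand-in for BLAKE3).
        let mut hasher = DefaultHasher::new();
        hasher.write(&bytes);
        let vote_hash = format!("{:016x}", hasher.finish());

        // 3. Store the vote at V:{assertion_hash}:{vote_hash}.
        self.kv.insert(format!("V:{}:{}", vote.assertion, vote_hash), bytes);

        // 4. Increment the VC:{assertion_hash} counter (u64, LE bytes).
        let count = self.vote_count(&vote.assertion) + 1;
        self.kv.insert(format!("VC:{}", vote.assertion), count.to_le_bytes().to_vec());

        // 5. Add the weight to the VW:{assertion_hash} sum (f32, LE bytes).
        let total = self.aggregate_weight(&vote.assertion) + vote.weight;
        self.kv.insert(format!("VW:{}", vote.assertion), total.to_le_bytes().to_vec());

        vote_hash
    }

    /// Single point lookup of the cached counter; 0 if no votes exist.
    fn vote_count(&self, assertion: &str) -> u64 {
        match self.kv.get(&format!("VC:{}", assertion)) {
            Some(bytes) => {
                let mut buf = [0u8; 8];
                buf.copy_from_slice(bytes);
                u64::from_le_bytes(buf)
            }
            None => 0,
        }
    }

    /// Single point lookup of the cached weight sum; 0.0 if no votes exist.
    fn aggregate_weight(&self, assertion: &str) -> f32 {
        match self.kv.get(&format!("VW:{}", assertion)) {
            Some(bytes) => {
                let mut buf = [0u8; 4];
                buf.copy_from_slice(bytes);
                f32::from_le_bytes(buf)
            }
            None => 0.0,
        }
    }
}
```

In this sketch, reading a count or aggregate weight never touches the individual `V:` records, which is what keeps those queries independent of the number of votes.
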
### Duplicate Vote Handling

The VoteStore does NOT prevent duplicate votes; it stores whatever is submitted. Duplicate detection is a higher-level concern (e.g., "one vote per agent per assertion") that should be enforced at the API layer.

This design choice keeps the storage layer simple and lets policy be defined elsewhere.

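
As an illustration of where such a policy could live, the sketch below shows a guard that an API layer might consult before calling `put_vote()`. The `OneVotePerAgent` type and its `admit` method are hypothetical and not part of the VoteStore; an in-memory set is used only to keep the example self-contained.

```rust
use std::collections::HashSet;

/// 32-byte hash, matching the `[u8; 32]` values used elsewhere in this document.
type Hash = [u8; 32];

/// Hypothetical API-layer guard enforcing "one vote per agent per assertion".
#[derive(Default)]
struct OneVotePerAgent {
    /// Pairs of (assertion_hash, agent_id) that have already voted.
    seen: HashSet<(Hash, Hash)>,
}

impl OneVotePerAgent {
    /// Returns `true` if this is the agent's first vote on the assertion,
    /// meaning the vote may be forwarded to the store.
    fn admit(&mut self, assertion_hash: Hash, agent_id: Hash) -> bool {
        // `HashSet::insert` returns false when the pair was already present.
        self.seen.insert((assertion_hash, agent_id))
    }
}
```

A real deployment would likely need this state to survive restarts, which the in-memory set here does not attempt.
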
## Integration with Materializer

The Materializer (Phase 2) will use the VoteStore to update Materialized Views:

```rust
// Materializer pseudocode
for assertion in new_assertions {
    let votes = vote_store.get_votes_for_assertion(&assertion.hash).await?;
    let weighted_score = calculate_consensus(votes, trustrank);

    if should_update_mv(weighted_score, current_mv) {
        store.put(
            format!("MV:{}:{}", assertion.subject, assertion.predicate),
            assertion.serialize()
        ).await?;
    }
}
```

## Related Topics

- [Storage](./storage.md)
- [Assertion](./assertion.md)
- [Architecture](../../../architecture.md)
- [Vision - The Ballot Box](../../../vision.md#4-2-the-ballot-box-high-velocity-consensus)