Add CRC32C checksums to WAL record format (v2), implement crash recovery with automatic truncation of corrupt records, add feature-gated group commit buffer for batched fsync under concurrent load, and implement log rotation via segment files with global offset addressing.

Key changes:
- Record format v2: [len:u32][crc32c:u32][blake3:32][payload:N]
- recover_file() scans and truncates corrupt tail records
- GroupCommitBuffer batches fsync via MPSC channel (tokio feature gate)
- SegmentManager with binary search resolution and cursor-based cleanup
- Journal::read() auto-refreshes segments on miss for writer/reader split
- Split recovery.rs and key_codec.rs into directory modules for 500-line max

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Storage
Last Updated: 2026-01-31
Confidence: High
Summary
Episteme uses a log-structured, content-addressed storage model. Writes append to the WAL and are indexed asynchronously. Reads query the indexes and apply Lenses.
Key Facts:
- Append-only (never mutate)
- WAL for durability (fsync on write)
- KV store: HybridStore (fjall for writes, redb for reads)
- Content-addressed by BLAKE3 hash
File Pointers:
- crates/stemedb-storage/src/traits.rs - KVStore trait
- crates/stemedb-storage/src/hybrid_backend.rs - HybridStore (routes to fjall or redb)
- crates/stemedb-storage/src/fjall_backend.rs - FjallStore (write-heavy keys)
- crates/stemedb-storage/src/redb_backend.rs - RedbStore (read-heavy keys)
- crates/stemedb-storage/src/serde_helpers.rs - Storage-layer serialize/deserialize helpers
- crates/stemedb-storage/src/vote_store.rs - VoteStore (Ballot Box)
- crates/stemedb-storage/src/index_store.rs - IndexStore (S: and SP: indexes)
- crates/stemedb-storage/src/trust_rank_store.rs - TrustRankStore (TR:)
KV Layout
| Key Pattern | Value | Purpose |
|---|---|---|
| H:{Hash} | Assertion (serialized) | Main content store |
| V:{assertion_hash}:{vote_hash} | Vote (serialized) | Ballot Box votes |
| VC:{assertion_hash} | u64 (LE bytes) | Vote count cache |
| VW:{assertion_hash} | f32 (LE bytes) | Aggregate weight cache |
| E:{epoch_id} | Epoch (serialized) | Paradigm definitions |
| S:{Subject} | Vec<Hash> (rkyv) | Subject index (IndexStore) |
| SP:{Subject}:{Predicate} | Vec<Hash> (rkyv) | Compound index (IndexStore) |
| TR:{AgentId} | TrustRank (rkyv) | Agent reputation (TrustRankStore) |
| MV:{Subject}:{Predicate} | MaterializedView (rkyv) | Pre-computed winner (Materializer) |
| __CURSOR__:ingest | u64 (LE bytes) | Ingestion WAL offset checkpoint |
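To make the key patterns concrete, here is a minimal sketch of how such keys and the little-endian counter values could be built. The function names are hypothetical and not taken from the crate. How hashes, subjects, and predicates are rendered inside a key (hex, raw bytes, escaping of `:`) is an assumption: the sketch treats every component as a plain string.

```rust
// Hypothetical key builders following the patterns in the KV layout table.
// Assumption: every key component is already a printable string.

fn assertion_key(hash: &str) -> String {
    format!("H:{hash}")
}

fn vote_key(assertion_hash: &str, vote_hash: &str) -> String {
    format!("V:{assertion_hash}:{vote_hash}")
}

fn vote_count_key(assertion_hash: &str) -> String {
    format!("VC:{assertion_hash}")
}

fn compound_index_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}

/// Encode a vote count as the table describes: a u64 in little-endian bytes.
fn encode_count(count: u64) -> [u8; 8] {
    count.to_le_bytes()
}

/// Decode a vote count; returns None unless the value is exactly 8 bytes.
fn decode_count(bytes: &[u8]) -> Option<u64> {
    if bytes.len() != 8 {
        return None;
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    Some(u64::from_le_bytes(buf))
}
```

One thing the sketch does not handle: if a subject or predicate can itself contain `:`, plain concatenation makes SP: keys ambiguous, so a real codec would need escaping or length prefixes.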
Serialization
stemedb-core (shared types)
For core types, use the canonical module:
use stemedb_core::serde::{serialize, deserialize};
let bytes = serialize(&my_value)?;
let value: MyType = deserialize(&bytes)?;
File: crates/stemedb-core/src/serde.rs
Raw AllocSerializer usage is prohibited in production code (enforced via CLAUDE.md).
stemedb-storage (store implementations)
In storage modules, use the storage-layer helpers that map to StorageError:
use crate::serde_helpers::{serialize, deserialize};
let bytes = serialize(&my_value)?; // Returns Result<Vec<u8>, StorageError>
let value: MyType = deserialize(&bytes)?;
File: crates/stemedb-storage/src/serde_helpers.rs
This provides unified error handling across all store implementations (VoteStore, IndexStore, TrustRankStore, AuditStore, TrustPackStore, QuotaStore).
Write Path
1. Agent submits signed Assertion
2. Validate signature
3. Append to WAL (fsync)
4. Return 202 Accepted with Hash
5. Background: tail WAL -> update indexes
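Step 3 relies on the WAL record framing. The v2 layout listed in the commit notes, [len:u32][crc32c:u32][blake3:32][payload:N], together with tail truncation on recovery, could look roughly like the sketch below. Several details are assumptions, because the source does not state them: integers are little-endian, `len` counts payload bytes only, and the CRC covers only the payload. The BLAKE3 hash is passed in by the caller to keep the example dependency-free, and `valid_prefix_len` is a hypothetical in-memory stand-in for what `recover_file()` does against a real file.

```rust
/// Header size: len (4) + crc32c (4) + blake3 hash (32).
const HEADER_LEN: usize = 4 + 4 + 32;

/// Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
fn crc32c(data: &[u8]) -> u32 {
    let mut crc: u32 = !0;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, else zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Frame one record. Assumption: little-endian fields, CRC over payload only.
fn encode_record(hash: &[u8; 32], payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32c(payload).to_le_bytes());
    out.extend_from_slice(hash);
    out.extend_from_slice(payload);
    out
}

fn read_u32_le(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Scan records from the start and return the byte length of the valid prefix.
/// Anything past the returned offset is a torn or corrupt tail to truncate.
fn valid_prefix_len(log: &[u8]) -> usize {
    let mut pos = 0;
    while log.len() - pos >= HEADER_LEN {
        let len = read_u32_le(log, pos) as usize;
        let stored_crc = read_u32_le(log, pos + 4);
        let start = pos + HEADER_LEN;
        let end = match start.checked_add(len) {
            Some(end) if end <= log.len() => end,
            _ => break, // payload runs past the end of the log: torn write
        };
        if crc32c(&log[start..end]) != stored_crc {
            break; // checksum mismatch: corrupt record
        }
        pos = end;
    }
    pos
}
```

On startup, a recovery routine along these lines would truncate the file to the returned offset before accepting new appends, so a later record is never written after a corrupt one.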
Read Path
1. Query: GET(Subject, Predicate, Lens)
2. Lookup: SP:{Subject}:{Predicate} -> [Hash...]
3. Hydrate: Load assertions from H:{Hash}
4. Resolve: Apply Lens
5. Return: Deterministic answer
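The read steps above can be sketched with in-memory maps standing in for the KV store. Everything named here is hypothetical: the `Assertion` fields, the `Store` struct, and the `latest_wins` Lens are illustrations only, and the real Lens interface may look quite different.

```rust
use std::collections::HashMap;

// Hypothetical, simplified assertion; the real type carries more fields.
#[derive(Clone, Debug, PartialEq)]
struct Assertion {
    value: String,
    timestamp: u64,
}

// In-memory stand-in for the KV store, keyed as in the KV layout.
struct Store {
    index: HashMap<String, Vec<String>>, // SP:{Subject}:{Predicate} -> hashes
    content: HashMap<String, Assertion>, // H:{Hash} -> assertion
}

/// Lookup, hydrate, then resolve by applying the caller-supplied Lens.
fn get<L>(store: &Store, subject: &str, predicate: &str, lens: L) -> Option<Assertion>
where
    L: Fn(&[Assertion]) -> Option<Assertion>,
{
    // Lookup: compound index -> list of hashes
    let hashes = store.index.get(&format!("SP:{subject}:{predicate}"))?;
    // Hydrate: load each assertion, skipping hashes with no stored content
    let candidates: Vec<Assertion> = hashes
        .iter()
        .filter_map(|h| store.content.get(&format!("H:{h}")).cloned())
        .collect();
    // Resolve: the Lens picks the answer
    lens(&candidates)
}

/// Example Lens: newest timestamp wins, ties broken by value so the
/// result does not depend on the order of the candidates.
fn latest_wins(candidates: &[Assertion]) -> Option<Assertion> {
    candidates
        .iter()
        .max_by_key(|a| (a.timestamp, a.value.clone()))
        .cloned()
}
```

Passing the Lens as a plain function keeps resolution separate from storage: the same hydrated candidates can yield different answers under different Lenses, while any single Lens stays deterministic for a given set of candidates.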
Related Topics
- Assertion
- Ballot Box - High-velocity vote storage
- Architecture