# Storage

**Last Updated:** 2026-01-31
**Confidence:** High

## Summary

Episteme uses a Log-Structured, Content-Addressed storage model. Writes append to the write-ahead log (WAL) and are then indexed asynchronously. Reads query the indexes and apply Lenses.

**Key Facts:**
- Append-only (records are never mutated)
- WAL for durability (fsync on write)
- KV store: HybridStore (fjall for writes, redb for reads)
- Content-addressed by BLAKE3 hash

**File Pointers:**
- `crates/stemedb-storage/src/traits.rs` - KVStore trait
- `crates/stemedb-storage/src/hybrid_backend.rs` - HybridStore (routes to fjall or redb)
- `crates/stemedb-storage/src/fjall_backend.rs` - FjallStore (write-heavy keys)
- `crates/stemedb-storage/src/redb_backend.rs` - RedbStore (read-heavy keys)
- `crates/stemedb-storage/src/serde_helpers.rs` - Storage-layer serialize/deserialize helpers
- `crates/stemedb-storage/src/vote_store.rs` - VoteStore (Ballot Box)
- `crates/stemedb-storage/src/index_store.rs` - IndexStore (S: and SP: indexes)
- `crates/stemedb-storage/src/trust_rank_store.rs` - TrustRankStore (TR:)

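
The routing idea behind HybridStore can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `KvStore` trait is a simplified stand-in for the real `KVStore` trait in `traits.rs`, `MemStore` replaces the fjall and redb backends with an in-memory map, and treating the vote prefixes (`V:`, `VC:`, `VW:`) as the write-heavy keys is a guess, not the documented rule.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal KV interface; the real `KVStore` trait may differ.
trait KvStore {
    fn put(&mut self, key: &str, value: &[u8]);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory stand-in for a real backend such as FjallStore or RedbStore.
#[derive(Default)]
struct MemStore {
    map: BTreeMap<String, Vec<u8>>,
}

impl KvStore for MemStore {
    fn put(&mut self, key: &str, value: &[u8]) {
        self.map.insert(key.to_string(), value.to_vec());
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
}

/// Assumed classification: vote-related prefixes are write-heavy,
/// every other key is treated as read-heavy.
fn is_write_heavy(key: &str) -> bool {
    ["V:", "VC:", "VW:"].iter().any(|prefix| key.starts_with(*prefix))
}

/// Routes each key to one of two backends based on its prefix.
struct HybridRouter<W: KvStore, R: KvStore> {
    write_heavy: W,
    read_heavy: R,
}

impl<W: KvStore, R: KvStore> KvStore for HybridRouter<W, R> {
    fn put(&mut self, key: &str, value: &[u8]) {
        if is_write_heavy(key) {
            self.write_heavy.put(key, value);
        } else {
            self.read_heavy.put(key, value);
        }
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        if is_write_heavy(key) {
            self.write_heavy.get(key)
        } else {
            self.read_heavy.get(key)
        }
    }
}
```

Because the router implements the same trait as the backends, callers in this sketch do not need to know which backend holds a given key.
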
## KV Layout

| Key Pattern | Value | Purpose |
|-------------|-------|---------|
| `H:{Hash}` | `Assertion` (serialized) | Main content store |
| `V:{assertion_hash}:{vote_hash}` | `Vote` (serialized) | Ballot Box votes |
| `VC:{assertion_hash}` | `u64` (LE bytes) | Vote count cache |
| `VW:{assertion_hash}` | `f32` (LE bytes) | Aggregate weight cache |
| `E:{epoch_id}` | `Epoch` (serialized) | Paradigm definitions |
| `S:{Subject}` | `Vec<Hash>` (rkyv) | Subject index (IndexStore) |
| `SP:{Subject}:{Predicate}` | `Vec<Hash>` (rkyv) | Compound index (IndexStore) |
| `TR:{AgentId}` | `TrustRank` (rkyv) | Agent reputation (TrustRankStore) |
| `MV:{Subject}:{Predicate}` | `MaterializedView` (rkyv) | Pre-computed winner (Materializer) |
| `__CURSOR__:ingest` | `u64` (LE bytes) | Ingestion WAL offset checkpoint |

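
As a rough illustration of the table, the sketch below builds a few of the key patterns and encodes the little-endian values. The helper names are hypothetical and not taken from the codebase. How hashes are rendered inside keys (hex text here) and how a `:` inside a subject or predicate is escaped are assumptions this page does not settle.

```rust
use std::convert::TryInto;

/// Key under which the ingestion WAL offset checkpoint is stored.
const INGEST_CURSOR_KEY: &str = "__CURSOR__:ingest";

/// Builds the compound index key `SP:{Subject}:{Predicate}`.
fn sp_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}

/// Builds the vote count cache key `VC:{assertion_hash}`.
/// Assumes the hash is already rendered as text (for example hex).
fn vote_count_key(assertion_hash: &str) -> String {
    format!("VC:{assertion_hash}")
}

/// Encodes a counter or WAL offset as 8 little-endian bytes.
fn encode_u64_le(value: u64) -> [u8; 8] {
    value.to_le_bytes()
}

/// Decodes 8 little-endian bytes; returns `None` for any other length.
fn decode_u64_le(bytes: &[u8]) -> Option<u64> {
    let array: [u8; 8] = bytes.try_into().ok()?;
    Some(u64::from_le_bytes(array))
}

/// Encodes an aggregate weight as 4 little-endian bytes.
fn encode_f32_le(value: f32) -> [u8; 4] {
    value.to_le_bytes()
}

/// Decodes 4 little-endian bytes; returns `None` for any other length.
fn decode_f32_le(bytes: &[u8]) -> Option<f32> {
    let array: [u8; 4] = bytes.try_into().ok()?;
    Some(f32::from_le_bytes(array))
}
```

Returning `None` on a length mismatch keeps a truncated or corrupted cache value from being silently read as a valid number.
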
## Serialization

### stemedb-core (shared types)

For core types, use the canonical module:

```rust
use stemedb_core::serde::{serialize, deserialize};

// `my_value` and `MyType` are placeholders for any core type.
let bytes = serialize(&my_value)?;
// The target type is taken from the annotation.
let value: MyType = deserialize(&bytes)?;
```

**File:** `crates/stemedb-core/src/serde.rs`

Raw `AllocSerializer` usage is prohibited in production code (enforced via CLAUDE.md).

### stemedb-storage (store implementations)

In storage modules, use the storage-layer helpers that map to `StorageError`:

```rust
use crate::serde_helpers::{serialize, deserialize};

let bytes = serialize(&my_value)?; // Returns Result<Vec<u8>, StorageError>
let value: MyType = deserialize(&bytes)?;
```

**File:** `crates/stemedb-storage/src/serde_helpers.rs`

These helpers give all store implementations (VoteStore, IndexStore, TrustRankStore, AuditStore, TrustPackStore, QuotaStore) unified error handling.

## Write Path

```
1. Agent submits signed Assertion
2. Validate signature
3. Append to WAL (fsync)
4. Return 202 Accepted with Hash
5. Background: tail WAL -> update indexes
```

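
Steps 3 and 5 above can be sketched with the standard library alone. This is a minimal illustration, not the project's WAL: the `[len: u32 LE][payload]` framing, the function names, and the single-writer assumption are all hypothetical, and signature validation and hashing are left out because they need external crates.

```rust
use std::convert::TryFrom;
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Opens (or creates) the log file in append mode.
fn open_wal(path: &Path) -> io::Result<File> {
    OpenOptions::new().create(true).append(true).open(path)
}

/// Step 3: appends one length-prefixed record and fsyncs before returning.
/// Returns the byte offset at which the record starts (single writer assumed).
fn append_record(wal: &mut File, payload: &[u8]) -> io::Result<u64> {
    let offset = wal.metadata()?.len();
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "record too large"))?;
    wal.write_all(&len.to_le_bytes())?;
    wal.write_all(payload)?;
    // Durability point: the caller may acknowledge only after this succeeds.
    wal.sync_all()?;
    Ok(offset)
}

/// Fills `buf` completely; returns Ok(false) if the file ends first.
fn read_full(file: &mut File, buf: &mut [u8]) -> io::Result<bool> {
    match file.read_exact(buf) {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

/// Step 5: reads every complete record at or after `cursor`.
/// Returns the payloads and the offset to checkpoint as the new cursor.
fn read_from(path: &Path, cursor: u64) -> io::Result<(Vec<Vec<u8>>, u64)> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(cursor))?;
    let mut records = Vec::new();
    let mut next = cursor;
    loop {
        let mut len_buf = [0u8; 4];
        if !read_full(&mut file, &mut len_buf)? {
            break; // no further record header
        }
        let mut payload = vec![0u8; u32::from_le_bytes(len_buf) as usize];
        if !read_full(&mut file, &mut payload)? {
            break; // partial tail record: leave the cursor before it
        }
        next += 4 + payload.len() as u64;
        records.push(payload);
    }
    Ok((records, next))
}
```

In this sketch a background indexer would call `read_from` with the value stored under `__CURSOR__:ingest`, update the indexes from the returned payloads, and then persist the returned offset as the new checkpoint.
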
## Read Path

```
1. Query: GET(Subject, Predicate, Lens)
2. Lookup: SP:{Subject}:{Predicate} -> [Hash...]
3. Hydrate: Load assertions from H:{Hash}
4. Resolve: Apply Lens
5. Return: Deterministic answer
```

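
The five read steps can be traced in a small in-memory sketch. All types here are hypothetical: `Assertion` has made-up fields, `Store` replaces the KV backends with hash maps, and `LatestWins` is an invented Lens used only to show where resolution happens. The real Lens semantics are not described on this page.

```rust
use std::collections::HashMap;

/// Assumed representation of a 32-byte BLAKE3 digest.
type Hash = [u8; 32];

/// Hypothetical assertion shape, reduced to what the example needs.
#[derive(Clone, Debug, PartialEq)]
struct Assertion {
    object: String,
    timestamp: u64,
}

/// Hypothetical Lens interface: picks one winner from the candidates.
trait Lens {
    fn resolve<'a>(&self, candidates: &'a [Assertion]) -> Option<&'a Assertion>;
}

/// Invented example Lens: newest timestamp wins, ties broken by object text
/// so the result does not depend on candidate order.
struct LatestWins;

impl Lens for LatestWins {
    fn resolve<'a>(&self, candidates: &'a [Assertion]) -> Option<&'a Assertion> {
        candidates.iter().max_by(|a, b| {
            a.timestamp
                .cmp(&b.timestamp)
                .then_with(|| a.object.cmp(&b.object))
        })
    }
}

/// In-memory stand-in for the `SP:` index and the `H:` content store.
#[derive(Default)]
struct Store {
    sp_index: HashMap<String, Vec<Hash>>,
    assertions: HashMap<Hash, Assertion>,
}

/// Steps 1 to 5: query, lookup, hydrate, resolve, return.
fn get(store: &Store, subject: &str, predicate: &str, lens: &dyn Lens) -> Option<Assertion> {
    // 2. Lookup: SP:{Subject}:{Predicate} -> [Hash...]
    let key = format!("SP:{subject}:{predicate}");
    let hashes = store.sp_index.get(&key)?;
    // 3. Hydrate: load each assertion, skipping hashes with no stored content.
    let candidates: Vec<Assertion> = hashes
        .iter()
        .filter_map(|hash| store.assertions.get(hash).cloned())
        .collect();
    // 4 and 5. Resolve with the Lens and return the winner, if any.
    lens.resolve(&candidates).cloned()
}
```

Passing the Lens as a parameter keeps stored data untouched: two callers can read the same assertions and get different answers by supplying different Lenses.
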
## Related Topics

- [Assertion](./assertion.md)
- [Ballot Box](./ballot-box.md) - High-velocity vote storage
- [Architecture](../../../architecture.md)