Storage
Last Updated: 2026-01-31 Confidence: High
Summary
Episteme uses a log-structured, content-addressed storage model. Writes are appended to the WAL and indexed asynchronously; reads query the indexes and apply Lenses to resolve an answer.
Key Facts:
- Append-only (never mutate)
- WAL for durability (fsync on write)
- KV store: HybridStore (fjall for writes, redb for reads)
- Content-addressed by BLAKE3 hash
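The combination of append-only writes and content addressing can be illustrated with a small in-memory store. This is a sketch, not the Episteme implementation. `ContentStore` and `content_hash` are hypothetical names. std's `DefaultHasher` stands in for BLAKE3 only to keep the example dependency-free; it is 64-bit and not collision-resistant.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash. Episteme uses BLAKE3; DefaultHasher (64-bit,
/// not collision-resistant) is used here only to avoid dependencies.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical append-only, content-addressed store.
#[derive(Default)]
struct ContentStore {
    blobs: BTreeMap<u64, Vec<u8>>,
}

impl ContentStore {
    /// Stores `bytes` under their content hash and returns the hash.
    /// Existing entries are never overwritten, so repeated writes are no-ops.
    fn put(&mut self, bytes: &[u8]) -> u64 {
        let key = content_hash(bytes);
        self.blobs.entry(key).or_insert_with(|| bytes.to_vec());
        key
    }

    fn get(&self, key: u64) -> Option<&[u8]> {
        self.blobs.get(&key).map(|v| v.as_slice())
    }

    fn len(&self) -> usize {
        self.blobs.len()
    }
}
```

Because the key is derived from the content, re-submitting an identical assertion returns the same hash and leaves the store unchanged. This is what makes append-only writes idempotent.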
File Pointers:
- crates/stemedb-storage/src/traits.rs - KVStore trait
- crates/stemedb-storage/src/key_codec.rs - Centralized key encoding (40+ builders, subject validation, extraction)
- crates/stemedb-storage/src/hybrid_backend.rs - HybridStore (routes to fjall or redb)
- crates/stemedb-storage/src/fjall_backend.rs - FjallStore (write-heavy keys)
- crates/stemedb-storage/src/redb_backend.rs - RedbStore (read-heavy keys)
- crates/stemedb-storage/src/serde_helpers.rs - Storage-layer serialize/deserialize helpers
- crates/stemedb-storage/src/vote_store.rs - VoteStore (Ballot Box)
- crates/stemedb-storage/src/index_store.rs - IndexStore (S: and SP: indexes)
- crates/stemedb-storage/src/trust_rank_store.rs - TrustRankStore (TR:)
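Key routing in a hybrid backend can be sketched with in-memory maps standing in for fjall and redb. Several pieces here are assumptions for illustration: the `KvStore` trait, `MemStore`, and the rule that vote keys (V, VC, VW) count as write-heavy. The real KVStore trait and the routing rules in hybrid_backend.rs may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal KV interface (not the real KVStore trait).
trait KvStore {
    fn put(&mut self, key: &[u8], value: &[u8]);
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

/// In-memory stand-in for an on-disk backend.
struct MemStore {
    name: &'static str,
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KvStore for MemStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.map.insert(key.to_vec(), value.to_vec());
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
}

/// Tag after the first \x00, up to the next ':'
/// ("V" in "alice\x00V:...", "TRUST" in "\x00TRUST:...").
fn tag(key: &[u8]) -> &[u8] {
    let start = key.iter().position(|&b| b == 0).map_or(0, |i| i + 1);
    let rest = &key[start..];
    let end = rest.iter().position(|&b| b == b':').unwrap_or(rest.len());
    &rest[..end]
}

/// ASSUMPTION: vote keys are the write-heavy ones.
fn is_write_heavy(key: &[u8]) -> bool {
    matches!(tag(key), b"V" | b"VC" | b"VW")
}

struct HybridStore {
    write_heavy: MemStore, // stands in for FjallStore
    read_heavy: MemStore,  // stands in for RedbStore
}

impl HybridStore {
    fn new() -> Self {
        HybridStore {
            write_heavy: MemStore { name: "fjall", map: BTreeMap::new() },
            read_heavy: MemStore { name: "redb", map: BTreeMap::new() },
        }
    }

    /// Name of the backend a key is routed to.
    fn backend_for(&self, key: &[u8]) -> &'static str {
        if is_write_heavy(key) { self.write_heavy.name } else { self.read_heavy.name }
    }
}

impl KvStore for HybridStore {
    fn put(&mut self, key: &[u8], value: &[u8]) {
        if is_write_heavy(key) {
            self.write_heavy.put(key, value)
        } else {
            self.read_heavy.put(key, value)
        }
    }
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        if is_write_heavy(key) { self.write_heavy.get(key) } else { self.read_heavy.get(key) }
    }
}
```

Routing on the key tag keeps callers unaware of which engine holds a key. Every read and write goes through one interface.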
KV Layout
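All keys are built by a centralized key_codec module (crates/stemedb-storage/src/key_codec.rs). Subject-scoped keys use a {subject}\x00 prefix, so all keys for one subject are contiguous (co-located). Global keys start with \x00. Because \x00 is the lowest byte value, global keys sort before every subject-scoped key.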
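A minimal sketch of the key scheme follows. It assumes keys are plain byte concatenations of subject, \x00, tag, ':', and the remainder. `subject_key`, `global_key`, and `subject_prefix` are hypothetical helpers, not the key_codec API. The NUL and empty-subject checks are this sketch's guess at what subject validation must guarantee.

```rust
/// Hypothetical subject-scoped key builder: {subject}\x00{tag}:{rest}.
fn subject_key(subject: &str, tag: &str, rest: &str) -> Vec<u8> {
    // A NUL inside the subject would make the separator ambiguous, and an
    // empty subject would collide with the global (\x00-prefixed) namespace.
    assert!(!subject.is_empty(), "subject must not be empty");
    assert!(!subject.as_bytes().contains(&0), "subject must not contain NUL");
    let mut key = subject_prefix(subject);
    key.extend_from_slice(tag.as_bytes());
    key.push(b':');
    key.extend_from_slice(rest.as_bytes());
    key
}

/// Hypothetical global key builder: \x00{tag}:{rest}.
fn global_key(tag: &str, rest: &str) -> Vec<u8> {
    let mut key = vec![0x00];
    key.extend_from_slice(tag.as_bytes());
    key.push(b':');
    key.extend_from_slice(rest.as_bytes());
    key
}

/// Prefix shared by every key of one subject; a range scan over it
/// returns all of that subject's entries together.
fn subject_prefix(subject: &str) -> Vec<u8> {
    let mut prefix = subject.as_bytes().to_vec();
    prefix.push(0x00);
    prefix
}
```

Byte-wise ordering does the rest. Any key starting with \x00 sorts ahead of a key starting with a subject's first byte, and including the \x00 in the subject prefix stops "alice" from matching "alice2".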
Subject-Prefixed Keys (co-located per subject)
| Key Pattern | Value | Purpose |
|---|---|---|
| {subject}\x00H:{hash} | Assertion (serialized) | Main content store |
| {subject}\x00S:{hash_list} | Vec<Hash> (rkyv) | Subject index (IndexStore) |
| {subject}\x00SP:{predicate} | Vec<Hash> (rkyv) | Compound index (IndexStore) |
| {subject}\x00MV:{predicate} | MaterializedView (rkyv) | Pre-computed winner (Materializer) |
| {subject}\x00V:{hash}:{vh} | Vote (serialized) | Ballot Box votes |
| {subject}\x00VC:{hash} | u64 (LE bytes) | Vote count cache |
| {subject}\x00VW:{hash} | f32 (LE bytes) | Aggregate weight cache |
| {subject}\x00GS:{predicate} | GoldStandard (rkyv) | Gold standard entries |
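The VC and VW caches store raw little-endian numbers rather than rkyv values. Encoding and decoding them needs only the standard library. This sketch assumes a cache value is exactly 8 bytes (u64) or 4 bytes (f32); the function names are hypothetical.

```rust
/// VC value: vote count as 8 little-endian bytes.
fn encode_vote_count(count: u64) -> [u8; 8] {
    count.to_le_bytes()
}

fn decode_vote_count(bytes: &[u8]) -> Option<u64> {
    if bytes.len() != 8 {
        return None; // wrong length: treat as a corrupt cache entry
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    Some(u64::from_le_bytes(buf))
}

/// VW value: aggregate weight as 4 little-endian bytes.
fn encode_weight(weight: f32) -> [u8; 4] {
    weight.to_le_bytes()
}

fn decode_weight(bytes: &[u8]) -> Option<f32> {
    if bytes.len() != 4 {
        return None;
    }
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    Some(f32::from_le_bytes(buf))
}

/// Read-modify-write for a new vote: a missing cache counts as zero.
/// Returns None if the stored value is malformed or the count would overflow.
fn increment_vote_count(current: Option<&[u8]>) -> Option<[u8; 8]> {
    let count = match current {
        Some(bytes) => decode_vote_count(bytes)?,
        None => 0,
    };
    count.checked_add(1).map(encode_vote_count)
}
```

Fixed-width little-endian values can be read without deserialization overhead, which suits counters that change on every vote.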
Global Keys (sort first via \x00 prefix)
| Key Pattern | Value | Purpose |
|---|---|---|
| \x00TRUST:{agent_id} | TrustRank (rkyv) | Agent reputation (TrustRankStore) |
| \x00QUOTA:{agent_id}:{window} | Quota record | Per-agent, per-window quota |
| \x00QLIMIT:{agent_id} | Quota limit | Per-agent quota limit |
| \x00E:{epoch_id} | Epoch (serialized) | Paradigm definitions |
| \x00SUPERSEDED:{epoch_id} | Supersession marker | O(1) epoch supersession lookup |
| \x00SUP:{hash} | Supersession record | Supersession data |
| \x00AUD:{query_id} | QueryAudit (rkyv) | Query audit trail |
| \x00ESC:{ts}:{id} | EscalationEvent (rkyv) | Escalation events |
| \x00TP:{pack_id} | TrustPack (rkyv) | Trust packs |
| \x00META:{key} | Varies | System metadata (e.g., cursor) |
| \x00HASH_SUBJECT:{hash} | Subject string | Reverse lookup: hash → subject |
| \x00SUBJECTS:{subject} | Marker | Known subjects index |
| \x00GS_LIST:{subj}:{pred} | Listing data | Gold standard listing |
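Composite global keys such as \x00QUOTA:{agent_id}:{window} can be listed with an ordered prefix scan. The sketch below uses a `BTreeMap` as a stand-in for the ordered KV store. It assumes the window is a u64 written as zero-padded decimal, so byte order matches numeric order; the real encoding may differ. `quota_key` and `quota_windows` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical builder for \x00QUOTA:{agent_id}:{window}.
/// ASSUMPTION: the window is a u64 written as 20-digit zero-padded decimal,
/// so lexicographic byte order matches numeric order.
fn quota_key(agent_id: &str, window: u64) -> Vec<u8> {
    format!("\x00QUOTA:{}:{:020}", agent_id, window).into_bytes()
}

/// Lists (window, value) pairs for one agent via an ordered prefix scan.
/// The trailing ':' in the prefix stops "agent1" from matching "agent10".
fn quota_windows(store: &BTreeMap<Vec<u8>, u64>, agent_id: &str) -> Vec<(u64, u64)> {
    let prefix = format!("\x00QUOTA:{}:", agent_id).into_bytes();
    store
        .range(prefix.clone()..)
        .take_while(|(key, _)| key.starts_with(&prefix))
        .filter_map(|(key, value)| {
            let digits = std::str::from_utf8(&key[prefix.len()..]).ok()?;
            let window: u64 = digits.parse().ok()?;
            Some((window, *value))
        })
        .collect()
}
```

The scan starts at the first key greater than or equal to the prefix and stops at the first key that no longer matches. Its cost therefore scales with one agent's entries, not with the whole keyspace.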
Serialization
stemedb-core (shared types)
For core types, use the canonical module:
use stemedb_core::serde::{serialize, deserialize};
let bytes = serialize(&my_value)?;
let value: MyType = deserialize(&bytes)?;
File: crates/stemedb-core/src/serde.rs
Using rkyv's AllocSerializer directly is prohibited in production code (a rule enforced via CLAUDE.md).
stemedb-storage (store implementations)
In storage modules, use the storage-layer helpers that map to StorageError:
use crate::serde_helpers::{serialize, deserialize};
let bytes = serialize(&my_value)?; // Returns Result<Vec<u8>, StorageError>
let value: MyType = deserialize(&bytes)?;
File: crates/stemedb-storage/src/serde_helpers.rs
This provides unified error handling across all store implementations (VoteStore, IndexStore, TrustRankStore, AuditStore, TrustPackStore, QuotaStore).
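The error-mapping pattern is shown below with hypothetical stand-ins:
- `CoreSerdeError` for the core crate's error
- a two-variant `StorageError`
- a u64 little-endian codec in place of rkyv

A single `From` impl lets each storage-layer wrapper use `?`, so every store surfaces the same error type.

```rust
use std::collections::HashMap;
use std::fmt;

/// Stand-in for the core crate's serialization error (hypothetical).
#[derive(Debug)]
struct CoreSerdeError(String);

/// Hypothetical two-variant subset of StorageError.
#[derive(Debug)]
enum StorageError {
    Serialization(String),
    NotFound(Vec<u8>),
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StorageError::Serialization(msg) => write!(f, "serialization error: {}", msg),
            StorageError::NotFound(key) => write!(f, "key not found: {:?}", key),
        }
    }
}

impl std::error::Error for StorageError {}

/// The single conversion point from core errors to storage errors.
impl From<CoreSerdeError> for StorageError {
    fn from(err: CoreSerdeError) -> Self {
        StorageError::Serialization(err.0)
    }
}

// Stand-in core codec (a u64 as 8 LE bytes); the real one uses rkyv.
fn core_serialize(value: &u64) -> Result<Vec<u8>, CoreSerdeError> {
    Ok(value.to_le_bytes().to_vec())
}

fn core_deserialize(bytes: &[u8]) -> Result<u64, CoreSerdeError> {
    if bytes.len() != 8 {
        return Err(CoreSerdeError(format!("expected 8 bytes, got {}", bytes.len())));
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    Ok(u64::from_le_bytes(buf))
}

/// Storage-layer helpers: same shape as the core ones, but return StorageError.
fn serialize(value: &u64) -> Result<Vec<u8>, StorageError> {
    Ok(core_serialize(value)?)
}

fn deserialize(bytes: &[u8]) -> Result<u64, StorageError> {
    Ok(core_deserialize(bytes)?)
}

/// Example store method: lookup and decode failures share one error type.
fn load_count(map: &HashMap<Vec<u8>, Vec<u8>>, key: &[u8]) -> Result<u64, StorageError> {
    let bytes = map.get(key).ok_or_else(|| StorageError::NotFound(key.to_vec()))?;
    deserialize(bytes)
}
```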
Write Path
1. Agent submits signed Assertion
2. Validate signature
3. Append to WAL (fsync)
4. Return 202 Accepted with Hash
5. Background: tail WAL -> update indexes
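Steps 3 and 5 can be sketched with the standard library. The sketch assumes a WAL framed as a u32 little-endian length followed by the payload; the actual WAL format is not described here. `wal_append` fsyncs before returning, so the 202 in step 4 is only sent for durable data. `wal_tail` reads from a byte-offset cursor and stops at a partially written record.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Appends one length-prefixed record and fsyncs before returning.
/// ASSUMPTION: framing is a u32 LE length followed by the payload.
fn wal_append(path: &Path, payload: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(&(payload.len() as u32).to_le_bytes())?;
    file.write_all(payload)?;
    // Flushes file data and metadata to stable storage. A newly created WAL
    // file may also need its parent directory fsynced (omitted here).
    file.sync_all()
}

/// Reads every complete record after `cursor` (a byte offset) and returns
/// them with the advanced cursor. A background indexer would persist the
/// cursor (e.g. under a \x00META: key) after updating indexes.
fn wal_tail(path: &Path, cursor: u64) -> io::Result<(Vec<Vec<u8>>, u64)> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(cursor))?;
    let mut buf = Vec::new();
    file.read_to_end(&mut buf)?;

    let mut records = Vec::new();
    let mut pos = 0usize;
    while pos + 4 <= buf.len() {
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(&buf[pos..pos + 4]);
        let len = u32::from_le_bytes(len_bytes) as usize;
        if pos + 4 + len > buf.len() {
            break; // partially written record: leave it for the next tail
        }
        records.push(buf[pos + 4..pos + 4 + len].to_vec());
        pos += 4 + len;
    }
    Ok((records, cursor + pos as u64))
}
```

Because the cursor only advances past complete records, a crash between appending and indexing costs nothing. The indexer re-reads from the last persisted cursor.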
Read Path
1. Query: GET(Subject, Predicate, Lens)
2. Lookup: {subject}\x00SP:{predicate} -> [Hash...]
3. Hydrate: Load assertions from {subject}\x00H:{hash}
4. Resolve: Apply Lens
5. Return: Deterministic answer
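The read path can be sketched over in-memory maps keyed like the layout above. The example simplifies in several hypothetical ways:
- `Assertion` is heavily simplified.
- The 4-byte hash is a placeholder for a BLAKE3 hash.
- `latest_lens` is an illustrative Lens, not one defined by Episteme: the latest timestamp wins, and ties go to the lowest hash.

Breaking ties on the hash is one way to make step 5 deterministic regardless of index order.

```rust
use std::collections::HashMap;

/// Heavily simplified assertion (hypothetical fields).
#[derive(Clone, Debug, PartialEq)]
struct Assertion {
    hash: [u8; 4], // placeholder for a 32-byte BLAKE3 hash
    object: String,
    timestamp: u64,
}

/// Illustrative Lens: latest timestamp wins; ties go to the lowest hash.
fn latest_lens(candidates: &[Assertion]) -> Option<&Assertion> {
    candidates.iter().max_by(|a, b| {
        a.timestamp
            .cmp(&b.timestamp)
            .then_with(|| b.hash.cmp(&a.hash)) // reversed: lower hash ranks higher
    })
}

/// In-memory stand-in for the KV store, keyed like the layout above.
struct Snapshot {
    sp_index: HashMap<String, Vec<[u8; 4]>>, // "{subject}\x00SP:{predicate}" -> hashes
    content: HashMap<String, Assertion>,     // "{subject}\x00H:{hash}" -> assertion
}

fn hex(hash: &[u8]) -> String {
    hash.iter().map(|b| format!("{:02x}", b)).collect()
}

/// GET(subject, predicate, lens): lookup, hydrate, resolve.
fn get(
    snap: &Snapshot,
    subject: &str,
    predicate: &str,
    lens: fn(&[Assertion]) -> Option<&Assertion>,
) -> Option<Assertion> {
    // Lookup: the compound index yields candidate hashes.
    let hashes = snap.sp_index.get(&format!("{}\x00SP:{}", subject, predicate))?;
    // Hydrate: load each assertion by hash (missing entries are skipped).
    let candidates: Vec<Assertion> = hashes
        .iter()
        .filter_map(|h| snap.content.get(&format!("{}\x00H:{}", subject, hex(h))).cloned())
        .collect();
    // Resolve: the lens picks one answer.
    lens(&candidates).cloned()
}
```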
Related Topics
- Assertion
- Ballot Box - High-velocity vote storage
- Architecture