Storage
Last Updated: 2026-02-19
Confidence: High
Summary
Episteme uses a Log-Structured, Content-Addressed storage model. Writes append to the WAL and are indexed asynchronously. Reads query the indexes and apply Lenses.
Key Facts:
- Append-only (never mutate)
- WAL for durability (fsync on write)
- KV store: HybridStore (fjall for writes, redb for reads)
- Content-addressed by BLAKE3 hash
File Pointers:
- crates/stemedb-storage/src/traits.rs - KVStore trait
- crates/stemedb-storage/src/key_codec.rs - Centralized key encoding (40+ builders, subject validation, extraction)
- crates/stemedb-storage/src/hybrid_backend.rs - HybridStore (routes to fjall or redb)
- crates/stemedb-storage/src/fjall_backend.rs - FjallStore (write-heavy keys)
- crates/stemedb-storage/src/redb_backend.rs - RedbStore (read-heavy keys)
- crates/stemedb-storage/src/serde_helpers.rs - Storage-layer serialize/deserialize helpers
- crates/stemedb-storage/src/vote_store.rs - VoteStore (Ballot Box)
- crates/stemedb-storage/src/index_store.rs - IndexStore (S: and SP: indexes)
- crates/stemedb-storage/src/trust_rank_store.rs - TrustRankStore (TR:)
KV Layout
All keys are built by a centralized key_codec module (crates/stemedb-storage/src/key_codec.rs). Subject-scoped keys use a {subject}\x00 prefix so that all keys for one subject are co-located; global keys use a bare \x00 prefix so that they sort first.
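To make the prefix scheme concrete, here is a minimal sketch of two key builders. The names `subject_key` and `global_key`, the string-typed suffix, and the validation rule (reject empty subjects and subjects containing the separator byte) are assumptions for illustration; the real builders and validation live in key_codec.rs and may differ.

```rust
/// Separator byte placed between the subject and the rest of the key.
const SEP: u8 = 0x00;

/// Builds a subject-scoped key of the form `{subject}\x00{tag}:{suffix}`.
/// Hypothetical helper, not the actual key_codec API.
fn subject_key(subject: &str, tag: &str, suffix: &str) -> Option<Vec<u8>> {
    // Assumed validation: an empty subject or one containing the separator
    // would make the prefix ambiguous, so reject both.
    if subject.is_empty() || subject.as_bytes().contains(&SEP) {
        return None;
    }
    let mut key = Vec::with_capacity(subject.len() + tag.len() + suffix.len() + 2);
    key.extend_from_slice(subject.as_bytes());
    key.push(SEP);
    key.extend_from_slice(tag.as_bytes());
    key.push(b':');
    key.extend_from_slice(suffix.as_bytes());
    Some(key)
}

/// Builds a global key of the form `\x00{tag}:{suffix}`.
/// Hypothetical helper, not the actual key_codec API.
fn global_key(tag: &str, suffix: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(tag.len() + suffix.len() + 2);
    key.push(SEP);
    key.extend_from_slice(tag.as_bytes());
    key.push(b':');
    key.extend_from_slice(suffix.as_bytes());
    key
}
```

Because byte strings compare lexicographically and valid subjects are non-empty and free of the separator, every global key starts with a byte that is smaller than the first byte of any subject-scoped key, so global keys sort first.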
Subject-Prefixed Keys (co-located per subject)
| Key Pattern | Value | Purpose |
|---|---|---|
| {subject}\x00H:{hash} | Assertion (serialized) | Main content store |
| {subject}\x00S:{hash_list} | Vec<Hash> (rkyv) | Subject index (IndexStore) |
| {subject}\x00SP:{predicate} | Vec<Hash> (rkyv) | Compound index (IndexStore) |
| {subject}\x00MV:{predicate} | MaterializedView (rkyv) | Pre-computed winner (Materializer) |
| {subject}\x00V:{hash}:{vh} | Vote (serialized) | Ballot Box votes |
| {subject}\x00VC:{hash} | u64 (LE bytes) | Vote count cache |
| {subject}\x00VW:{hash} | f32 (LE bytes) | Aggregate weight cache |
| {subject}\x00GS:{predicate} | GoldStandard (rkyv) | Gold standard entries |
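The VC and VW rows store plain little-endian numbers rather than rkyv archives. A minimal sketch of that encoding, using only the standard library, could look like this; the function names are hypothetical and not taken from VoteStore.

```rust
use std::convert::TryFrom;

/// Encodes a vote count as 8 little-endian bytes (VC value).
fn encode_vote_count(count: u64) -> [u8; 8] {
    count.to_le_bytes()
}

/// Decodes a vote count; returns None if the value is not exactly 8 bytes.
fn decode_vote_count(bytes: &[u8]) -> Option<u64> {
    let arr = <[u8; 8]>::try_from(bytes).ok()?;
    Some(u64::from_le_bytes(arr))
}

/// Encodes an aggregate weight as 4 little-endian bytes (VW value).
fn encode_weight(weight: f32) -> [u8; 4] {
    weight.to_le_bytes()
}

/// Decodes an aggregate weight; returns None if the value is not exactly 4 bytes.
fn decode_weight(bytes: &[u8]) -> Option<f32> {
    let arr = <[u8; 4]>::try_from(bytes).ok()?;
    Some(f32::from_le_bytes(arr))
}
```

Fixed-width values like these can be read without a deserializer, which suits counters that are updated and read often.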
Global Keys (sort first via \x00 prefix)
| Key Pattern | Value | Purpose |
|---|---|---|
| \x00TRUST:{agent_id} | TrustRank (rkyv) | Agent reputation (TrustRankStore) |
| \x00QUOTA:{agent_id}:{window} | Quota record | Per-agent per-window quota |
| \x00QLIMIT:{agent_id} | Quota limit | Per-agent quota limit |
| \x00E:{epoch_id} | Epoch (serialized) | Paradigm definitions |
| \x00SUPERSEDED:{epoch_id} | Supersession marker | O(1) epoch supersession lookup |
| \x00SUP:{hash} | Supersession record | Supersession data |
| \x00AUD:{query_id} | QueryAudit (rkyv) | Query audit trail |
| \x00ESC:{ts}:{id} | EscalationEvent (rkyv) | Escalation events |
| \x00TP:{pack_id} | TrustPack (rkyv) | Trust packs |
| \x00META:{key} | Varies | System metadata (e.g., cursor) |
| \x00HASH_SUBJECT:{hash} | Subject string | Reverse lookup: hash → subject |
| \x00SUBJECTS:{subject} | Marker | Known subjects index |
| \x00GS_LIST:{subj}:{pred} | Listing data | Gold standard listing |
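The practical payoff of this layout is that everything about one subject can be fetched with a single ordered prefix scan. The sketch below uses a `BTreeMap` as an in-memory stand-in for the ordered KV store; `scan_subject` is a hypothetical helper, and the real KVStore trait may expose prefix iteration differently.

```rust
use std::collections::BTreeMap;

/// Returns every entry whose key starts with `{subject}\x00`, in key order.
/// `store` is an in-memory stand-in for an ordered KV backend.
fn scan_subject<'a>(
    store: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    subject: &str,
) -> Vec<(&'a Vec<u8>, &'a Vec<u8>)> {
    let mut prefix = subject.as_bytes().to_vec();
    prefix.push(0x00);
    store
        // Seek to the first key >= prefix ...
        .range(prefix.clone()..)
        // ... and stop as soon as a key no longer carries the prefix.
        .take_while(|(k, _)| k.starts_with(&prefix))
        .collect()
}
```

The trailing \x00 in the prefix matters: it keeps a scan for "alice" from also matching keys that belong to a subject such as "alicia".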
Serialization
stemedb-core (shared types)
For core types, use the canonical module:
use stemedb_core::serde::{serialize, deserialize};
let bytes = serialize(&my_value)?;
let value: MyType = deserialize(&bytes)?;
File: crates/stemedb-core/src/serde.rs
Raw AllocSerializer usage is prohibited in production code (enforced via CLAUDE.md).
stemedb-storage (store implementations)
In storage modules, use the storage-layer helpers that map to StorageError:
use crate::serde_helpers::{serialize, deserialize};
let bytes = serialize(&my_value)?; // Returns Result<Vec<u8>, StorageError>
let value: MyType = deserialize(&bytes)?;
File: crates/stemedb-storage/src/serde_helpers.rs
This provides unified error handling across all store implementations (VoteStore, IndexStore, TrustRankStore, AuditStore, TrustPackStore, QuotaStore).
For types with schema evolution (rkyv compat), use the dedicated compat functions:
use crate::serde_helpers::deserialize_source_record_compat;
let record: SourceRecord = deserialize_source_record_compat(&bytes)?;
Available compat deserializers: deserialize_source_record_compat (SourceRecord). For assertions, use stemedb_core::serde::deserialize_assertion_compat directly.
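The general idea behind a compat deserializer is "try the current layout, fall back to the legacy layout, and fill new fields with defaults". The sketch below shows only that control flow on a toy byte format invented for this example: 8 bytes of id for the legacy layout, plus a presence flag and UTF-8 content for the current one. It does not use rkyv, `ToySourceRecord` and its `id` field are hypothetical, and the real deserialize_source_record_compat may work differently.

```rust
/// Toy record used only in this sketch; not the real SourceRecord.
#[derive(Debug, PartialEq)]
struct ToySourceRecord {
    id: u64,
    content: Option<String>,
}

fn read_id(bytes: &[u8]) -> u64 {
    let mut id = [0u8; 8];
    id.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(id)
}

/// Current toy layout: id (8 bytes LE) + flag (0 or 1) + optional UTF-8 content.
fn decode_current(bytes: &[u8]) -> Option<ToySourceRecord> {
    if bytes.len() < 9 {
        return None;
    }
    let content = match bytes[8] {
        0 if bytes.len() == 9 => None,
        1 => Some(String::from_utf8(bytes[9..].to_vec()).ok()?),
        _ => return None,
    };
    Some(ToySourceRecord { id: read_id(bytes), content })
}

/// Legacy toy layout: id (8 bytes LE) only; `content` did not exist yet.
fn decode_legacy(bytes: &[u8]) -> Option<ToySourceRecord> {
    if bytes.len() != 8 {
        return None;
    }
    Some(ToySourceRecord { id: read_id(bytes), content: None })
}

/// Tries the current layout first, then falls back to the legacy one.
fn deserialize_compat(bytes: &[u8]) -> Option<ToySourceRecord> {
    decode_current(bytes).or_else(|| decode_legacy(bytes))
}
```

Records written before the new field existed decode with `content: None`, so callers only ever handle the current type.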
Write Path
1. Agent submits signed Assertion
2. Validate signature
3. Append to WAL (fsync)
4. Return 202 Accepted with Hash
5. Background: tail WAL -> update indexes
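The steps above can be sketched in memory as follows. Everything here is a simplification under stated assumptions: `Store`, `submit`, `index_pending`, and the `signature_valid` flag are hypothetical names, a `Vec` stands in for the fsynced WAL, the standard library's `DefaultHasher` stands in for BLAKE3, and indexing is triggered by an explicit call rather than a background task.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

struct Assertion {
    subject: String,
    predicate: String,
    object: String,
    signature_valid: bool, // stand-in for real signature verification
}

impl Assertion {
    fn new(subject: &str, predicate: &str, object: &str, signature_valid: bool) -> Self {
        Assertion {
            subject: subject.to_string(),
            predicate: predicate.to_string(),
            object: object.to_string(),
            signature_valid,
        }
    }
}

/// Stand-in content hash; the real system uses BLAKE3.
fn content_hash(a: &Assertion) -> u64 {
    let mut h = DefaultHasher::new();
    (&a.subject, &a.predicate, &a.object).hash(&mut h);
    h.finish()
}

struct Store {
    wal: Vec<(u64, Assertion)>,            // append-only log
    cursor: usize,                         // next WAL entry to index
    sp_index: BTreeMap<Vec<u8>, Vec<u64>>, // {subject}\x00SP:{predicate} -> hashes
}

impl Store {
    fn new() -> Self {
        Store { wal: Vec::new(), cursor: 0, sp_index: BTreeMap::new() }
    }

    /// Steps 1-4: validate, append to the WAL, return the hash.
    fn submit(&mut self, a: Assertion) -> Result<u64, &'static str> {
        if !a.signature_valid {
            return Err("invalid signature");
        }
        let hash = content_hash(&a);
        self.wal.push((hash, a)); // real system: append, then fsync
        Ok(hash) // corresponds to 202 Accepted
    }

    /// Step 5: index WAL entries that have not been indexed yet.
    fn index_pending(&mut self) -> usize {
        let mut indexed = 0;
        while self.cursor < self.wal.len() {
            let (hash, key) = {
                let (hash, a) = &self.wal[self.cursor];
                let mut key = a.subject.as_bytes().to_vec();
                key.push(0x00);
                key.extend_from_slice(b"SP:");
                key.extend_from_slice(a.predicate.as_bytes());
                (*hash, key)
            };
            self.sp_index.entry(key).or_default().push(hash);
            self.cursor += 1;
            indexed += 1;
        }
        indexed
    }
}
```

Note that a freshly submitted assertion is durable but not yet visible to index lookups until the indexer has caught up, which is the trade-off behind returning 202 rather than 200.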
Read Path
1. Query: GET(Subject, Predicate, Lens)
2. Lookup: {subject}\x00SP:{predicate} -> [Hash...]
3. Hydrate: Load assertions from {subject}\x00H:{hash}
4. Resolve: Apply Lens
5. Return: Deterministic answer
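The read steps above can be sketched the same way. This is an in-memory illustration, not the actual query engine: `Store`, `key`, and `latest_wins` are hypothetical, hashes are shortened to `u64` and written as decimal text in keys, and "latest timestamp wins, ties broken by hash" is just one possible Lens chosen to show how a deterministic answer falls out.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
struct Assertion {
    hash: u64,
    object: String,
    timestamp: u64,
}

struct Store {
    sp_index: BTreeMap<Vec<u8>, Vec<u64>>, // {subject}\x00SP:{predicate} -> hashes
    content: BTreeMap<Vec<u8>, Assertion>, // {subject}\x00H:{hash} -> assertion
}

/// Builds `{subject}\x00{tag}:{suffix}` (hypothetical helper).
fn key(subject: &str, tag: &str, suffix: &str) -> Vec<u8> {
    let mut k = subject.as_bytes().to_vec();
    k.push(0x00);
    k.extend_from_slice(tag.as_bytes());
    k.push(b':');
    k.extend_from_slice(suffix.as_bytes());
    k
}

/// Example Lens: newest assertion wins; the hash breaks timestamp ties
/// so the result does not depend on index order.
fn latest_wins(candidates: &[Assertion]) -> Option<&Assertion> {
    candidates.iter().max_by_key(|a| (a.timestamp, a.hash))
}

impl Store {
    fn get<L>(&self, subject: &str, predicate: &str, lens: L) -> Option<Assertion>
    where
        L: Fn(&[Assertion]) -> Option<&Assertion>,
    {
        // Lookup: compound index -> list of hashes.
        let hashes = self.sp_index.get(&key(subject, "SP", predicate))?;
        // Hydrate: load each assertion from the content store.
        let hydrated: Vec<Assertion> = hashes
            .iter()
            .filter_map(|h| self.content.get(&key(subject, "H", &h.to_string())))
            .cloned()
            .collect();
        // Resolve: let the Lens pick the answer.
        lens(hydrated.as_slice()).cloned()
    }
}
```

Passing the Lens as a parameter keeps the stored data neutral: the same assertions can yield different answers under different Lenses without rewriting anything.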
Related Topics
- Assertion
- Ballot Box - High-velocity vote storage
- Architecture