stemedb/ai-lookup/services/storage.md
jordan ad07a75d0a feat: add source content to source registry, signed assertions, feed endpoint, dashboard enhancements
- Add `content: Option<String>` to SourceRecord with rkyv schema evolution
  (LegacySourceRecord compat deserializer for backward compatibility)
- Add MAX_SOURCE_CONTENT_LEN (1MB) limit with API validation
- Strip content from list responses, include in single-source GET
- Update Go SDK RegisterSourceRequest with Content field
- FCM pipeline extracts PDF text via pdftotext and passes to registration
- Dashboard impact panel fetches and displays source content with expand/collapse
- Add feed endpoint, dashboard feed panel, and signed assertion support
- Update data-structures.md, API docs, and storage docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:54:27 -07:00

Storage

Last Updated: 2026-02-19 | Confidence: High

Summary

Episteme uses a log-structured, content-addressed storage model. Writes are appended to the WAL first and indexed asynchronously afterwards. Reads query the indexes and then apply Lenses to resolve an answer.

Key Facts:

  • Append-only (never mutate)
  • WAL for durability (fsync on write)
  • KV store: HybridStore (routes write-heavy keys to fjall and read-heavy keys to redb)
  • Content-addressed by BLAKE3 hash
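The append-only and content-addressed properties above can be illustrated with a minimal in-memory sketch. Everything here is hypothetical: `ContentStore` is not a type from the codebase, and `DefaultHasher` stands in for BLAKE3 only so that the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash. The real store uses BLAKE3; `DefaultHasher` is an
/// assumption made only to keep this sketch dependency-free.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory store: entries are inserted once and never mutated.
#[derive(Default)]
struct ContentStore {
    entries: HashMap<u64, Vec<u8>>,
}

impl ContentStore {
    /// Stores `bytes` under its own hash and returns that hash.
    /// Re-submitting identical bytes leaves the store unchanged.
    fn put(&mut self, bytes: &[u8]) -> u64 {
        let hash = content_hash(bytes);
        self.entries.entry(hash).or_insert_with(|| bytes.to_vec());
        hash
    }

    /// Looks up content by its hash.
    fn get(&self, hash: u64) -> Option<&[u8]> {
        self.entries.get(&hash).map(|v| v.as_slice())
    }

    /// Number of distinct entries stored.
    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Because the key is derived from the value, writing the same bytes twice yields the same key, which is what makes an append-only log safe to replay.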

File Pointers:

  • crates/stemedb-storage/src/traits.rs - KVStore trait
  • crates/stemedb-storage/src/key_codec.rs - Centralized key encoding (40+ builders, subject validation, extraction)
  • crates/stemedb-storage/src/hybrid_backend.rs - HybridStore (routes to fjall or redb)
  • crates/stemedb-storage/src/fjall_backend.rs - FjallStore (write-heavy keys)
  • crates/stemedb-storage/src/redb_backend.rs - RedbStore (read-heavy keys)
  • crates/stemedb-storage/src/serde_helpers.rs - Storage-layer serialize/deserialize helpers
  • crates/stemedb-storage/src/vote_store.rs - VoteStore (Ballot Box)
  • crates/stemedb-storage/src/index_store.rs - IndexStore (S: and SP: indexes)
  • crates/stemedb-storage/src/trust_rank_store.rs - TrustRankStore (TR:)

KV Layout

All keys use a centralized key_codec module (crates/stemedb-storage/src/key_codec.rs). Subject-scoped keys use {subject}\x00 prefix for co-location; global keys use \x00 prefix to sort first.
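As a rough sketch of this scheme, the functions below build subject-scoped and global keys and recover the subject from a key. The function names, the string-typed suffix, and the specific validation rule (non-empty, no embedded `\x00`) are assumptions for illustration. The actual builders and checks live in `key_codec.rs` and may differ.

```rust
/// Separator between the subject prefix and the rest of the key.
const SEP: u8 = 0x00;

/// Builds a subject-scoped key: `{subject}\x00{tag}:{suffix}`.
/// Rejecting empty subjects and subjects containing `\x00` is an assumed
/// validation rule, not necessarily the one used by `key_codec`.
fn subject_key(subject: &str, tag: &str, suffix: &str) -> Result<Vec<u8>, String> {
    if subject.is_empty() || subject.as_bytes().contains(&SEP) {
        return Err(format!("invalid subject: {:?}", subject));
    }
    let mut key = Vec::with_capacity(subject.len() + tag.len() + suffix.len() + 2);
    key.extend_from_slice(subject.as_bytes());
    key.push(SEP);
    key.extend_from_slice(tag.as_bytes());
    key.push(b':');
    key.extend_from_slice(suffix.as_bytes());
    Ok(key)
}

/// Builds a global key: `\x00{tag}:{suffix}`.
/// The leading zero byte makes it sort before every subject-scoped key.
fn global_key(tag: &str, suffix: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(tag.len() + suffix.len() + 2);
    key.push(SEP);
    key.extend_from_slice(tag.as_bytes());
    key.push(b':');
    key.extend_from_slice(suffix.as_bytes());
    key
}

/// Extracts the subject from a key. Returns `None` for global keys,
/// which start with the separator and therefore have no subject.
fn extract_subject(key: &[u8]) -> Option<&str> {
    let pos = key.iter().position(|&b| b == SEP)?;
    if pos == 0 {
        return None;
    }
    std::str::from_utf8(&key[..pos]).ok()
}
```

Funnelling every key through one module means the separator and tag conventions are defined in a single place, so the sort-order guarantees cannot drift between stores.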

Subject-Prefixed Keys (co-located per subject)

| Key Pattern | Value | Purpose |
| --- | --- | --- |
| {subject}\x00H:{hash} | Assertion (serialized) | Main content store |
| {subject}\x00S:{hash_list} | Vec<Hash> (rkyv) | Subject index (IndexStore) |
| {subject}\x00SP:{predicate} | Vec<Hash> (rkyv) | Compound index (IndexStore) |
| {subject}\x00MV:{predicate} | MaterializedView (rkyv) | Pre-computed winner (Materializer) |
| {subject}\x00V:{hash}:{vh} | Vote (serialized) | Ballot Box votes |
| {subject}\x00VC:{hash} | u64 (LE bytes) | Vote count cache |
| {subject}\x00VW:{hash} | f32 (LE bytes) | Aggregate weight cache |
| {subject}\x00GS:{predicate} | GoldStandard (rkyv) | Gold standard entries |
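The VC: and VW: values are stored as raw little-endian bytes rather than rkyv archives. A minimal sketch of that encoding using only the standard library is shown here. The function names are hypothetical and do not come from the codebase.

```rust
use std::convert::TryInto;

/// Encodes a vote count as 8 little-endian bytes (the VC: value format).
fn encode_vote_count(count: u64) -> [u8; 8] {
    count.to_le_bytes()
}

/// Decodes a vote count; returns `None` if the slice is not exactly 8 bytes.
fn decode_vote_count(bytes: &[u8]) -> Option<u64> {
    let arr: [u8; 8] = bytes.try_into().ok()?;
    Some(u64::from_le_bytes(arr))
}

/// Encodes an aggregate weight as 4 little-endian bytes (the VW: value format).
fn encode_weight(weight: f32) -> [u8; 4] {
    weight.to_le_bytes()
}

/// Decodes an aggregate weight; returns `None` if the slice is not exactly 4 bytes.
fn decode_weight(bytes: &[u8]) -> Option<f32> {
    let arr: [u8; 4] = bytes.try_into().ok()?;
    Some(f32::from_le_bytes(arr))
}
```

Checking the length on decode turns a corrupted or truncated cache entry into a recoverable `None` instead of a panic.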

Global Keys (sort first via \x00 prefix)

| Key Pattern | Value | Purpose |
| --- | --- | --- |
| \x00TRUST:{agent_id} | TrustRank (rkyv) | Agent reputation (TrustRankStore) |
| \x00QUOTA:{agent_id}:{window} | Quota record | Per-agent per-window quota |
| \x00QLIMIT:{agent_id} | Quota limit | Per-agent quota limit |
| \x00E:{epoch_id} | Epoch (serialized) | Paradigm definitions |
| \x00SUPERSEDED:{epoch_id} | Supersession marker | O(1) epoch supersession lookup |
| \x00SUP:{hash} | Supersession record | Supersession data |
| \x00AUD:{query_id} | QueryAudit (rkyv) | Query audit trail |
| \x00ESC:{ts}:{id} | EscalationEvent (rkyv) | Escalation events |
| \x00TP:{pack_id} | TrustPack (rkyv) | Trust packs |
| \x00META:{key} | Varies | System metadata (e.g., cursor) |
| \x00HASH_SUBJECT:{hash} | Subject string | Reverse lookup: hash → subject |
| \x00SUBJECTS:{subject} | Marker | Known subjects index |
| \x00GS_LIST:{subj}:{pred} | Listing data | Gold standard listing |
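To see why the `\x00` separator matters for ordering, the sketch below uses a `BTreeMap` as a stand-in for a sorted KV store. The map and the `keys_for_subject` helper are illustrative assumptions, not the HybridStore API.

```rust
use std::collections::BTreeMap;

/// Returns every key stored under `{subject}\x00`, in sorted order.
/// Including the separator in the scan prefix keeps a subject such as
/// "earth" from also matching "earthquake".
fn keys_for_subject<'a>(
    store: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    subject: &str,
) -> Vec<&'a [u8]> {
    let mut prefix = subject.as_bytes().to_vec();
    prefix.push(0x00);
    store
        .range(prefix.clone()..) // seek to the first key >= prefix
        .take_while(|(k, _)| k.starts_with(&prefix)) // stop at the first non-match
        .map(|(k, _)| k.as_slice())
        .collect()
}
```

In a byte-ordered store, all keys for one subject form a single contiguous range, and global keys cluster at the very start of the keyspace because they begin with the zero byte.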

Serialization

stemedb-core (shared types)

For core types, use the canonical module:

use stemedb_core::serde::{serialize, deserialize};

let bytes = serialize(&my_value)?;
let value: MyType = deserialize(&bytes)?;

File: crates/stemedb-core/src/serde.rs

Raw AllocSerializer usage is prohibited in production code (enforced via CLAUDE.md).

stemedb-storage (store implementations)

In storage modules, use the storage-layer helpers that map to StorageError:

use crate::serde_helpers::{serialize, deserialize};

let bytes = serialize(&my_value)?;  // Returns Result<Vec<u8>, StorageError>
let value: MyType = deserialize(&bytes)?;

File: crates/stemedb-storage/src/serde_helpers.rs

This provides unified error handling across all store implementations (VoteStore, IndexStore, TrustRankStore, AuditStore, TrustPackStore, QuotaStore).

For types with schema evolution (rkyv compat), use the dedicated compat functions:

use crate::serde_helpers::deserialize_source_record_compat;

let record: SourceRecord = deserialize_source_record_compat(&bytes)?;

Available compat deserializers: deserialize_source_record_compat (SourceRecord). For assertions, use stemedb_core::serde::deserialize_assertion_compat directly.
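One common way to structure a compat deserializer is to try the current layout first and fall back to the legacy one, filling the new field with `None`. The sketch below shows that pattern with a toy text encoding in place of rkyv. The `url` field, the `v2` marker, and all function names are hypothetical, and the real `deserialize_source_record_compat` may work differently.

```rust
/// Hypothetical pre-evolution layout: no `content` field.
#[derive(Debug, PartialEq)]
struct LegacySourceRecord {
    url: String,
}

/// Current layout with the optional `content` field. `url` is illustrative only.
#[derive(Debug, PartialEq)]
struct SourceRecord {
    url: String,
    content: Option<String>,
}

impl From<LegacySourceRecord> for SourceRecord {
    /// Upgrades a legacy record; the field it never had becomes `None`.
    fn from(old: LegacySourceRecord) -> Self {
        SourceRecord { url: old.url, content: None }
    }
}

// Toy text encoding standing in for the rkyv archives (assumption):
//   current: "v2\n{url}\n{content}"
//   legacy:  "{url}"

fn decode_current(bytes: &[u8]) -> Option<SourceRecord> {
    let text = std::str::from_utf8(bytes).ok()?;
    let rest = text.strip_prefix("v2\n")?;
    let (url, content) = rest.split_once('\n')?;
    Some(SourceRecord {
        url: url.to_string(),
        content: Some(content.to_string()),
    })
}

fn decode_legacy(bytes: &[u8]) -> Option<LegacySourceRecord> {
    let text = std::str::from_utf8(bytes).ok()?;
    if text.is_empty() || text.contains('\n') {
        return None;
    }
    Some(LegacySourceRecord { url: text.to_string() })
}

/// Tries the current layout, then falls back to the legacy layout.
fn decode_compat(bytes: &[u8]) -> Option<SourceRecord> {
    decode_current(bytes).or_else(|| decode_legacy(bytes).map(SourceRecord::from))
}
```

Callers only ever see `SourceRecord`, so records written before the schema change remain readable without a migration pass.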

Write Path

1. Agent submits signed Assertion
2. Validate signature
3. Append to WAL (fsync)
4. Return 202 Accepted with Hash
5. Background: tail WAL -> update indexes
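The steps above can be sketched with the standard library alone. This is a simplified illustration under several assumptions: `SignedAssertion`, `WriteError`, `DurableLog`, the length-prefixed record framing, and the `DefaultHasher` stand-in for BLAKE3 are all hypothetical, and signature checking is reduced to a boolean flag.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::{Hash, Hasher};
use std::io::{self, Write};

/// Hypothetical stand-in for a signed assertion.
struct SignedAssertion {
    payload: Vec<u8>,
    signature_ok: bool, // placeholder for real signature verification
}

#[derive(Debug)]
enum WriteError {
    BadSignature,
    Io(io::Error),
}

impl From<io::Error> for WriteError {
    fn from(e: io::Error) -> Self {
        WriteError::Io(e)
    }
}

/// A log that can be flushed to stable storage.
trait DurableLog: Write {
    fn sync(&mut self) -> io::Result<()>;
}

impl DurableLog for File {
    fn sync(&mut self) -> io::Result<()> {
        self.sync_all() // fsync
    }
}

/// In-memory log for tests; there is nothing to flush.
impl DurableLog for Vec<u8> {
    fn sync(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Stand-in for the BLAKE3 content hash.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Steps 1-4: validate, append a length-prefixed record, sync, return the hash.
fn submit<L: DurableLog>(wal: &mut L, assertion: &SignedAssertion) -> Result<u64, WriteError> {
    if !assertion.signature_ok {
        return Err(WriteError::BadSignature);
    }
    let len = assertion.payload.len() as u32;
    wal.write_all(&len.to_le_bytes())?;
    wal.write_all(&assertion.payload)?;
    wal.sync()?; // durability before acknowledging the write
    Ok(content_hash(&assertion.payload))
}

/// Step 5 (simplified): reads complete records starting at `cursor` and
/// returns them together with the new cursor position.
fn tail_wal(wal: &[u8], mut cursor: usize) -> (Vec<Vec<u8>>, usize) {
    let mut records = Vec::new();
    while cursor + 4 <= wal.len() {
        let mut len_bytes = [0u8; 4];
        len_bytes.copy_from_slice(&wal[cursor..cursor + 4]);
        let end = cursor + 4 + u32::from_le_bytes(len_bytes) as usize;
        if end > wal.len() {
            break; // partial record: wait for more data
        }
        records.push(wal[cursor + 4..end].to_vec());
        cursor = end;
    }
    (records, cursor)
}
```

Acknowledging only after the sync is what lets indexing happen later: if the process stops between steps 4 and 5, the indexer can resume from its last saved cursor and replay the remaining records.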

Read Path

1. Query: GET(Subject, Predicate, Lens)
2. Lookup: {subject}\x00SP:{predicate} -> [Hash...]
3. Hydrate: Load assertions from {subject}\x00H:{hash}
4. Resolve: Apply Lens
5. Return: Deterministic answer
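A minimal sketch of these read steps follows, with two `BTreeMap`s standing in for the KV store. The `Assertion` fields, the `Lens` type, and the `latest_wins` rule are hypothetical: the document does not define how a Lens resolves candidates, so this only shows the lookup, hydrate, resolve shape.

```rust
use std::collections::BTreeMap;

/// Hypothetical assertion; only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Assertion {
    hash: u64, // stand-in for a BLAKE3 hash
    object: String,
    timestamp: u64,
}

/// A lens picks one winner from the candidates (modeled as a plain function).
type Lens = fn(&[Assertion]) -> Option<Assertion>;

/// Example lens: newest wins; ties are broken by hash so the result
/// does not depend on candidate order.
fn latest_wins(candidates: &[Assertion]) -> Option<Assertion> {
    candidates.iter().max_by_key(|a| (a.timestamp, a.hash)).cloned()
}

struct Store {
    index: BTreeMap<Vec<u8>, Vec<u64>>,    // {subject}\x00SP:{predicate} -> hashes
    content: BTreeMap<Vec<u8>, Assertion>, // {subject}\x00H:{hash} -> assertion
}

fn sp_key(subject: &str, predicate: &str) -> Vec<u8> {
    format!("{}\x00SP:{}", subject, predicate).into_bytes()
}

fn h_key(subject: &str, hash: u64) -> Vec<u8> {
    format!("{}\x00H:{:016x}", subject, hash).into_bytes()
}

impl Store {
    fn get(&self, subject: &str, predicate: &str, lens: Lens) -> Option<Assertion> {
        // Step 2: look up the compound index.
        let hashes = self.index.get(&sp_key(subject, predicate))?;
        // Step 3: hydrate each hash into its assertion.
        let candidates: Vec<Assertion> = hashes
            .iter()
            .filter_map(|h| self.content.get(&h_key(subject, *h)).cloned())
            .collect();
        // Steps 4-5: resolve through the lens and return the winner.
        lens(&candidates)
    }
}
```

Since the stored assertions never change, the same index contents and the same lens always produce the same answer, which is the determinism the final step refers to.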