Phase 1 delivers the complete durability and storage layer:
- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
and multi-cycle durability
New crates: stemedb-wal, stemedb-storage, stemedb-ingest
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
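The WAL bullets above describe checksummed append-only records, crash recovery, resuming at EOF on reopen, and 8-byte-aligned headers. The minimal in-memory sketch below illustrates how those pieces fit together. It is not the `stemedb-wal` code, and several choices are assumptions:

- The record layout (8-byte length, 8-byte checksum, payload zero-padded to an 8-byte boundary) is assumed.
- FNV-1a stands in for BLAKE3 so the example needs no external crates.
- A `Vec<u8>` stands in for the file.

Recovery replays from the start and stops at the first truncated or corrupt record. The offset it returns is where a reopened writer would resume appending; that offset equals EOF when the tail is intact.

```rust
/// Assumed header layout: 8-byte LE payload length + 8-byte LE checksum.
const HEADER_LEN: usize = 16;

/// FNV-1a (64-bit): a dependency-free stand-in for BLAKE3, NOT collision resistant.
fn checksum(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Round `n` up to a multiple of 8 so every header starts 8-byte aligned.
fn align8(n: usize) -> usize {
    (n + 7) & !7
}

/// Read a little-endian u64; callers guarantee `at + 8 <= bytes.len()`.
fn read_u64(bytes: &[u8], at: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[at..at + 8]);
    u64::from_le_bytes(buf)
}

/// Append one framed, zero-padded record and return its starting offset.
/// A real WAL would write this to a file and fsync before acknowledging.
fn append(log: &mut Vec<u8>, payload: &[u8]) -> usize {
    let offset = log.len();
    log.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    log.extend_from_slice(&checksum(payload).to_le_bytes());
    log.extend_from_slice(payload);
    log.resize(offset + HEADER_LEN + align8(payload.len()), 0);
    offset
}

/// Replay the log: return every intact payload plus the offset just past the
/// last valid record. A torn or corrupt tail ends the replay; a reopened
/// writer would truncate there and resume appending.
fn recover(log: &[u8]) -> (Vec<Vec<u8>>, usize) {
    let mut records = Vec::new();
    let mut pos = 0;
    while pos + HEADER_LEN <= log.len() {
        let len = read_u64(log, pos) as usize;
        let expected = read_u64(log, pos + 8);
        let start = pos + HEADER_LEN;
        // Torn write: the declared payload (plus padding) runs past the end.
        if len > log.len() - start || start + align8(len) > log.len() {
            break;
        }
        let payload = &log[start..start + len];
        if checksum(payload) != expected {
            break; // corrupt record
        }
        records.push(payload.to_vec());
        pos = start + align8(len);
    }
    (records, pos)
}
```

Padding each payload keeps every header on an 8-byte boundary. That is the property zero-copy readers such as rkyv depend on.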
| name | description |
|---|---|
| stemedb-core | Core guidelines for the Episteme database engine. Use when working on storage, DAG, or assertions. |
StemeDB Core Guidelines
Identity
You are building the Spine of Episteme. This is the storage engine that persists the Merkle DAG.
Principles
- Append-Only: We never mutate an existing Assertion. We only append new ones.
- Content-Addressed: The ID of an assertion is its Hash (BLAKE3).
- Defensive: Use `quarantine-journal` patterns (WAL, fsync).
- Typed: Use strong types (`EntityId`, `RelationId`, `Hash`), not Strings.
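The Append-Only and Content-Addressed principles reinforce each other. If an assertion's ID is the hash of its bytes, re-inserting identical content is a no-op, and changed content necessarily gets a new ID. The sketch below assumes a hypothetical `AppendOnlyStore` backed by a `BTreeMap`. It uses std's `DefaultHasher` purely as a stand-in for BLAKE3: that hasher is neither cryptographic nor guaranteed stable across Rust releases, so it must not be used for real IDs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::Hasher;

/// Stand-in ID type: a 64-bit std hash instead of a 32-byte BLAKE3 digest.
type Hash = u64;

/// Stand-in for BLAKE3, for illustration only.
fn content_hash(bytes: &[u8]) -> Hash {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Hypothetical append-only store: each entry is keyed by the hash of its
/// own bytes, and nothing is ever overwritten or deleted.
#[derive(Default)]
struct AppendOnlyStore {
    entries: BTreeMap<Hash, Vec<u8>>,
}

impl AppendOnlyStore {
    /// Store `bytes` under their content hash and return the hash.
    /// Re-inserting identical bytes is a no-op (same content, same ID).
    fn put(&mut self, bytes: &[u8]) -> Hash {
        let id = content_hash(bytes);
        self.entries.entry(id).or_insert_with(|| bytes.to_vec());
        id
    }

    fn get(&self, id: Hash) -> Option<&[u8]> {
        self.entries.get(&id).map(Vec::as_slice)
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A revised fact is appended as a new entry, with its `parent_hash` pointing at the old one, instead of replacing the old bytes. The earlier version therefore stays addressable by its original hash.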
Data Structures
Assertion (sync with crates/stemedb-core/src/types.rs)
```rust
pub struct Assertion {
    // The Fact
    pub subject: EntityId,     // "Tesla_Inc"
    pub predicate: RelationId, // "has_revenue"
    pub object: ObjectValue,   // Text/Number/Boolean/Reference

    // The Lineage
    pub parent_hash: Option<Hash>,  // Link to previous version
    pub source_hash: Hash,          // Evidence pointer
    pub visual_hash: Option<PHash>, // pHash for image provenance

    // Meta-Cognition
    pub signatures: Vec<SignatureEntry>, // Multi-sig support
    pub confidence: f32,                 // 0.0 to 1.0
    pub timestamp: u64,                  // Unix epoch
    pub vector: Option<Vec<f32>>,        // Semantic embedding
}

pub struct SignatureEntry {
    pub agent_id: [u8; 32],  // Ed25519 public key
    pub signature: [u8; 64], // Ed25519 signature
    pub timestamp: u64,      // When signed
}

pub enum ObjectValue {
    Text(String),
    Number(f64),
    Boolean(bool),
    Reference(EntityId),
}
```
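Each `Assertion` records an optional `parent_hash`, so the version history of a fact forms a chain that can be walked backwards from the newest hash. The sketch below makes two substitutions:

- A pared-down, hypothetical `Version` type replaces the full struct.
- A `HashMap` replaces the KV store.

The field name mirrors the struct above, but the traversal is an illustration, not code from `stemedb-core`.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for the real `Hash` type (assumed here to be 32 raw bytes).
type Hash = [u8; 32];

/// Hypothetical, pared-down stand-in for `Assertion`: lineage link plus a value.
struct Version {
    parent_hash: Option<Hash>,
    value: f64,
}

/// Walk from `head` back through `parent_hash` links, newest first.
/// Stops at the root, at a dangling parent pointer, or on a repeated hash.
fn lineage(store: &HashMap<Hash, Version>, head: Hash) -> Vec<Hash> {
    let mut chain = Vec::new();
    let mut seen = HashSet::new();
    let mut current = Some(head);
    while let Some(hash) = current {
        if !seen.insert(hash) {
            break; // cycle guard: should not occur with real content hashes
        }
        match store.get(&hash) {
            Some(version) => {
                chain.push(hash);
                current = version.parent_hash;
            }
            None => break, // dangling parent pointer
        }
    }
    chain
}
```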
Storage Layout (KV)
- `H:{Hash}` -> `Assertion` (main store)
- `S:{Subject}` -> `Vec<Hash>` (index)
- `SP:{Subject}:{Predicate}` -> `Vec<Hash>` (index)
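This key layout relies on lexicographic ordering. All `SP:{Subject}:{Predicate}` keys for one subject share the prefix `SP:{Subject}:`, so a prefix scan returns them together; the commit message names sled's `scan_prefix` for this. The sketch below uses hypothetical helper functions to build the keys and emulates the scan over a std `BTreeMap` with `range` plus `take_while`.

Two assumptions are worth flagging:

- Keys are encoded as plain strings here; the real crate may use raw hash bytes.
- A subject containing `:` would need escaping or length-prefixing to keep prefixes unambiguous.

```rust
use std::collections::BTreeMap;

// Hypothetical key builders following the documented layout.
fn assertion_key(hash: &str) -> String {
    format!("H:{hash}")
}
fn subject_key(subject: &str) -> String {
    format!("S:{subject}")
}
fn subject_predicate_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}

/// Emulate a prefix scan on an ordered map: start at `prefix` and stop at the
/// first key that no longer starts with it.
fn scan_prefix<'a, V>(map: &'a BTreeMap<String, V>, prefix: &str) -> Vec<(&'a str, &'a V)> {
    map.range(prefix.to_string()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(key, value)| (key.as_str(), value))
        .collect()
}
```

Because the map is ordered, the scan touches only the matching keys plus one extra key, which is what makes `SP:` a cheap subject-and-predicate index.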
Do
- Use `rkyv` for zero-copy deserialization.
- Use `thiserror` for library errors.
- Validate signatures on Ingest.
- Instrument public methods with `#[instrument]` for observability.
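"Validate signatures on Ingest" and "Use `thiserror`" combine naturally into a validation gate. The gate rejects an assertion with a typed error before it reaches storage. The sketch below is hypothetical throughout (`IngestError`, `validate_for_ingest`, and the verifier closure signature are all illustrative):

- So that it compiles with std alone, it hand-writes the `Display` and `Error` impls that `#[derive(thiserror::Error)]` would otherwise generate.
- The actual Ed25519 check is a caller-supplied closure; a real implementation would delegate to an Ed25519 crate.
- It assumes at least one signature is required.
- It enforces the documented `0.0 to 1.0` confidence range.

```rust
use std::error::Error;
use std::fmt;

/// Mirrors the `SignatureEntry` fields documented above (timestamp omitted).
struct SignatureEntry {
    agent_id: [u8; 32],
    signature: [u8; 64],
}

/// Hypothetical ingest error; `thiserror` would derive the impls below.
#[derive(Debug, PartialEq)]
enum IngestError {
    Unsigned,
    BadSignature { index: usize },
    ConfidenceOutOfRange(f32),
}

impl fmt::Display for IngestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IngestError::Unsigned => write!(f, "assertion has no signatures"),
            IngestError::BadSignature { index } => write!(f, "signature {index} failed verification"),
            IngestError::ConfidenceOutOfRange(c) => write!(f, "confidence {c} outside 0.0..=1.0"),
        }
    }
}

impl Error for IngestError {}

/// Reject an assertion unless it is signed, every signature verifies against
/// `message` (its canonical bytes), and confidence lies in [0.0, 1.0].
/// `verify(public_key, message, signature)` is a hypothetical hook for Ed25519.
fn validate_for_ingest<F>(
    message: &[u8],
    signatures: &[SignatureEntry],
    confidence: f32,
    verify: F,
) -> Result<(), IngestError>
where
    F: Fn(&[u8; 32], &[u8], &[u8; 64]) -> bool,
{
    if signatures.is_empty() {
        return Err(IngestError::Unsigned);
    }
    for (index, entry) in signatures.iter().enumerate() {
        if !verify(&entry.agent_id, message, &entry.signature) {
            return Err(IngestError::BadSignature { index });
        }
    }
    // `contains` is false for NaN, so NaN confidence is rejected too.
    if !(0.0_f32..=1.0).contains(&confidence) {
        return Err(IngestError::ConfidenceOutOfRange(confidence));
    }
    Ok(())
}
```

Returning the index of the failing signature keeps the multi-sig case debuggable without exposing key material in the error message.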
Tracing Pattern
All public methods in WAL, storage, and ingestion MUST have tracing spans:
```rust
use tracing::{debug, info, instrument};

#[instrument(skip(self, payload), fields(payload_len = payload.len()))]
pub fn append(&mut self, payload: Vec<u8>) -> Result<u64> {
    // ... implementation ...
    debug!(offset, "Record appended");
    Ok(offset)
}
```
Guidelines:
- Use `skip(self)` to avoid noisy output.
- Use `skip(payload)` or `skip(value)` for large data.
- Add `fields(key_len = ..., value_len = ...)` for size visibility.
- Use `debug!` for routine operations, `info!` for lifecycle events, and `warn!` for recoverable issues.
Do Not
- Use `unwrap()` in core logic.
- Store large blobs in Assertions (store pointers/hashes instead).
- Add new types without updating the `ai-lookup/services/` documentation.
- Add public methods without `#[instrument]` in the WAL/storage/ingest crates.
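The usual alternative to `unwrap()` in core logic is to turn each failure point into a typed error and propagate it with `?`. The sketch below parses an `H:{Hash}` key back into raw bytes without a single `unwrap`, so malformed keys become an `Err` instead of a panic. Two things here are assumptions: that the hash is stored as 64 hex characters, and the `parse_assertion_key` and `KeyError` names, which are hypothetical.

```rust
/// Hypothetical error for malformed `H:{Hash}` keys.
#[derive(Debug, PartialEq)]
enum KeyError {
    WrongPrefix,
    WrongLength(usize),
    BadHex(usize), // byte position of the offending character
}

/// Decode one ASCII hex digit; non-hex (including non-ASCII) bytes yield None.
fn hex_val(c: u8) -> Option<u8> {
    (c as char).to_digit(16).map(|d| d as u8)
}

/// Parse `H:` followed by 64 hex characters (assumed encoding) into 32 bytes.
fn parse_assertion_key(key: &str) -> Result<[u8; 32], KeyError> {
    let hex = key.strip_prefix("H:").ok_or(KeyError::WrongPrefix)?;
    if hex.len() != 64 {
        return Err(KeyError::WrongLength(hex.len()));
    }
    let bytes = hex.as_bytes(); // length checked above, so indexing cannot panic
    let mut out = [0u8; 32];
    for (i, out_byte) in out.iter_mut().enumerate() {
        let hi = hex_val(bytes[2 * i]).ok_or(KeyError::BadHex(2 * i))?;
        let lo = hex_val(bytes[2 * i + 1]).ok_or(KeyError::BadHex(2 * i + 1))?;
        *out_byte = (hi << 4) | lo;
    }
    Ok(out)
}
```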
Documentation Sync
When modifying core types:
- Update this skill's Data Structures section to match the actual code.
- Add or update the entry in `ai-lookup/services/assertion.md` or `ai-lookup/services/storage.md`.
- Update `ai-lookup/index.md` if adding new concepts.