stemedb/.claude/skills/stemedb-core/SKILL.md
jordan 3cfaa1e1d3 feat: Complete Phase 1 (The Spine) - storage foundation
Phase 1 delivers the complete durability and storage layer:

- WAL with crash recovery: Append-only journal with BLAKE3 checksums,
  fsync guarantees, and proper seek-to-EOF on reopen
- Storage engine: sled-backed KVStore with scan_prefix for range queries
- Content-addressed storage: H:{hash}, V:{hash}, E:{hash} key patterns
- Ingestor: Background worker tailing WAL, writing to KV with 8-byte
  aligned record headers for rkyv zero-copy deserialization
- Comprehensive tests: 31 tests covering crash recovery, round-trips,
  and multi-cycle durability

New crates: stemedb-wal, stemedb-storage, stemedb-ingest

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 14:15:34 -07:00

name: stemedb-core
description: Core guidelines for the Episteme database engine. Use when working on storage, DAG, or assertions.

StemeDB Core Guidelines

Identity

You are building the Spine of Episteme. This is the storage engine that persists the Merkle DAG.

Principles

  • Append-Only: We never mutate an existing Assertion. We only append new ones.
  • Content-Addressed: The ID of an Assertion is the BLAKE3 hash of its content.
  • Defensive: Use quarantine-journal patterns (WAL, fsync).
  • Typed: Use strong types (EntityId, RelationId, Hash), not Strings.
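
The append-only and content-addressed principles can be sketched together with the standard library alone. This is a minimal illustration, not the engine's code: `ContentId`, `Record`, and `AppendOnlyStore` are hypothetical names, the record is reduced to raw bytes plus a parent link, and `DefaultHasher` stands in for BLAKE3 only so the sketch compiles without external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a 32-byte BLAKE3 digest (assumption: the real `Hash` type differs).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ContentId(u64);

/// Hypothetical, simplified assertion: opaque bytes plus a link to the previous version.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub parent: Option<ContentId>,
    pub body: Vec<u8>,
}

/// Derives the ID from the record's content, so equal content yields an equal ID.
fn content_id(record: &Record) -> ContentId {
    let mut hasher = DefaultHasher::new(); // NOT BLAKE3; illustration only
    record.parent.map(|p| p.0).hash(&mut hasher);
    record.body.hash(&mut hasher);
    ContentId(hasher.finish())
}

#[derive(Default)]
pub struct AppendOnlyStore {
    records: BTreeMap<ContentId, Record>,
}

impl AppendOnlyStore {
    /// Stores the record under its content ID. An existing entry is never overwritten.
    pub fn append(&mut self, record: Record) -> ContentId {
        let id = content_id(&record);
        self.records.entry(id).or_insert(record);
        id
    }

    pub fn get(&self, id: ContentId) -> Option<&Record> {
        self.records.get(&id)
    }

    pub fn len(&self) -> usize {
        self.records.len()
    }
}
```

In this sketch, "updating" a fact means appending a new record whose `parent` points at the old ID; the old record stays readable, and appending identical content twice returns the same ID without adding an entry.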

Data Structures

Assertion (sync with crates/stemedb-core/src/types.rs)

pub struct Assertion {
    // The Fact
    pub subject: EntityId,        // "Tesla_Inc"
    pub predicate: RelationId,    // "has_revenue"
    pub object: ObjectValue,      // Text/Number/Boolean/Reference

    // The Lineage
    pub parent_hash: Option<Hash>,  // Link to previous version
    pub source_hash: Hash,          // Evidence pointer
    pub visual_hash: Option<PHash>, // pHash for image provenance

    // Meta-Cognition
    pub signatures: Vec<SignatureEntry>,  // Multi-sig support
    pub confidence: f32,                  // 0.0 to 1.0
    pub timestamp: u64,                   // Unix epoch
    pub vector: Option<Vec<f32>>,         // Semantic embedding
}

pub struct SignatureEntry {
    pub agent_id: [u8; 32],   // Ed25519 Public Key
    pub signature: [u8; 64],  // Ed25519 Signature
    pub timestamp: u64,       // When signed
}

pub enum ObjectValue {
    Text(String),
    Number(f64),
    Boolean(bool),
    Reference(EntityId),
}
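
The `confidence` field is documented as lying between 0.0 and 1.0, but a bare `f32` does not enforce that. One way to make the range part of the type, in the spirit of the Typed principle, is a small newtype. `Confidence` and `ConfidenceOutOfRange` are hypothetical names and are not taken from `types.rs`; the error is hand-implemented here only to keep the sketch free of external crates, whereas the crate itself would presumably derive it with thiserror.

```rust
use std::fmt;

/// Hypothetical wrapper guaranteeing a value in the closed range 0.0..=1.0.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Confidence(f32);

/// Returned when the input is outside 0.0..=1.0 or is NaN.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ConfidenceOutOfRange(pub f32);

impl fmt::Display for ConfidenceOutOfRange {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "confidence {} is outside 0.0..=1.0", self.0)
    }
}

impl std::error::Error for ConfidenceOutOfRange {}

impl Confidence {
    /// Validates once at construction; NaN fails the range check and is rejected.
    pub fn new(value: f32) -> Result<Self, ConfidenceOutOfRange> {
        if (0.0..=1.0).contains(&value) {
            Ok(Self(value))
        } else {
            Err(ConfidenceOutOfRange(value))
        }
    }

    pub fn get(self) -> f32 {
        self.0
    }
}
```

With this shape, validation happens where the value enters the system (for example on ingest), and code that receives a `Confidence` does not need to re-check the range.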

Storage Layout (KV)

  • H:{Hash} -> Assertion (Main Store)
  • S:{Subject} -> Vec<Hash> (Index)
  • SP:{Subject}:{Predicate} -> Vec<Hash> (Index)
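
The key patterns above could be built and queried roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, hashes are represented as hex strings, and a `BTreeMap` stands in for the sled-backed store so the example runs with the standard library only. The local `scan_prefix` function imitates a prefix scan over ordered keys and is not the storage crate's API.

```rust
use std::collections::BTreeMap;

/// Main-store key: `H:{hash}` (hash rendered as hex is an assumption).
pub fn assertion_key(hash_hex: &str) -> String {
    format!("H:{hash_hex}")
}

/// Subject index key: `S:{subject}`.
pub fn subject_key(subject: &str) -> String {
    format!("S:{subject}")
}

/// Subject/predicate index key: `SP:{subject}:{predicate}`.
pub fn subject_predicate_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}

/// Yields every entry whose key starts with `prefix`, in key order.
/// Works because an ordered map keeps all keys sharing a prefix contiguous.
pub fn scan_prefix<'a>(
    kv: &'a BTreeMap<String, Vec<String>>,
    prefix: &'a str,
) -> impl Iterator<Item = (&'a String, &'a Vec<String>)> + 'a {
    kv.range(prefix.to_string()..)
        .take_while(move |(key, _)| key.starts_with(prefix))
}
```

Scanning with the prefix `SP:Tesla_Inc:` (trailing colon included) returns the index entries for that subject across all predicates, without also matching a different subject such as `Tesla_Inc_EU`.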

Do

  • Use rkyv for zero-copy deserialization.
  • Use thiserror for library errors.
  • Validate signatures on Ingest.
  • Instrument public methods with #[instrument] for observability.

Tracing Pattern

All public methods in WAL, storage, and ingestion MUST have tracing spans:

use tracing::{debug, info, instrument};

#[instrument(skip(self, payload), fields(payload_len = payload.len()))]
pub fn append(&mut self, payload: Vec<u8>) -> Result<u64> {
    // ... implementation ...
    debug!(offset, "Record appended");
    Ok(offset)
}

Guidelines:

  • Use skip(self) so the span does not record the whole struct as a field
  • Use skip(payload) or skip(value) for large data
  • Add fields(key_len = ..., value_len = ...) for size visibility
  • Use debug! for routine operations, info! for lifecycle events, warn! for recoverable issues

Do Not

  • Use unwrap() in core logic.
  • Store large blobs in Assertions (store pointers or hashes instead).
  • Add new types without updating ai-lookup/services/ documentation.
  • Add public methods without #[instrument] in WAL/storage/ingest crates.
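
As an illustration of the first rule, the sketch below decodes a 32-byte agent ID from untrusted input and returns an error instead of calling `unwrap()`. `parse_agent_id` and `ParseError` are hypothetical names chosen for this example; the error type is written by hand to keep the block dependency-free, while the guidelines above call for thiserror in library code.

```rust
use std::fmt;

/// Hypothetical error for malformed input.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    TooShort { expected: usize, actual: usize },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::TooShort { expected, actual } => {
                write!(f, "need {expected} bytes, got {actual}")
            }
        }
    }
}

impl std::error::Error for ParseError {}

/// Copies the first 32 bytes of `bytes` into an agent ID.
/// Short input is reported to the caller rather than causing a panic.
pub fn parse_agent_id(bytes: &[u8]) -> Result<[u8; 32], ParseError> {
    let head = bytes.get(..32).ok_or(ParseError::TooShort {
        expected: 32,
        actual: bytes.len(),
    })?;
    let mut id = [0u8; 32];
    id.copy_from_slice(head); // lengths match: `head` is exactly 32 bytes
    Ok(id)
}
```

The caller decides what a short record means (skip it, quarantine it, abort the ingest), which is not possible if the core function panics.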

Documentation Sync

When modifying core types:

  1. Update this skill's Data Structures section to match actual code
  2. Add or update the entry in ai-lookup/services/assertion.md or ai-lookup/services/storage.md
  3. Update ai-lookup/index.md if adding new concepts