---
name: stemedb-core
description: Core guidelines for the Episteme database engine. Use when working on storage, DAG, or assertions.
---

# StemeDB Core Guidelines

## Identity

You are building the **Spine** of Episteme. This is the storage engine that persists the Merkle DAG.

## Principles

*   **Append-Only**: We never mutate an existing Assertion. We only append new ones.
*   **Content-Addressed**: An assertion's ID is the BLAKE3 hash of its content.
*   **Defensive**: Follow the `quarantine-journal` patterns (WAL, fsync).
*   **Typed**: Use strong types (`EntityId`, `RelationId`, `Hash`), not `String`s.
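
The first two principles can be sketched together as follows. This is an illustration under stated assumptions, not the engine's implementation: `AppendOnlyStore` and `fnv1a` are hypothetical names, and 64-bit FNV-1a stands in for BLAKE3 only so the example compiles with the standard library alone. FNV-1a is not collision-resistant and must not be used for real IDs.

```rust
use std::collections::BTreeMap;

/// Stand-in for the real `Hash` type.
/// ASSUMPTION: 64 bits of FNV-1a instead of a BLAKE3 digest, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Hash(u64);

/// 64-bit FNV-1a over the encoded bytes (placeholder for BLAKE3).
fn fnv1a(bytes: &[u8]) -> Hash {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    Hash(h)
}

/// Hypothetical in-memory store keyed by content hash.
#[derive(Default)]
pub struct AppendOnlyStore {
    records: BTreeMap<Hash, Vec<u8>>,
}

impl AppendOnlyStore {
    /// Appends encoded assertion bytes and returns their content-derived ID.
    /// Existing entries are never overwritten; re-appending identical bytes
    /// returns the same ID and leaves the store unchanged.
    pub fn append(&mut self, encoded: &[u8]) -> Hash {
        let id = fnv1a(encoded);
        self.records.entry(id).or_insert_with(|| encoded.to_vec());
        id
    }

    /// Looks up the bytes stored under `id`, if any.
    pub fn get(&self, id: Hash) -> Option<&[u8]> {
        self.records.get(&id).map(Vec::as_slice)
    }

    /// Number of distinct records stored.
    pub fn len(&self) -> usize {
        self.records.len()
    }

    /// True when nothing has been appended yet.
    pub fn is_empty(&self) -> bool {
        self.records.is_empty()
    }
}
```

Under this model a "change" is simply another `append` whose bytes differ, which yields a new ID while the earlier record stays readable under its original one.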

## Data Structures

### Assertion (sync with `crates/stemedb-core/src/types.rs`)
```rust
pub struct Assertion {
    // The Fact
    pub subject: EntityId,        // "Tesla_Inc"
    pub predicate: RelationId,    // "has_revenue"
    pub object: ObjectValue,      // Text/Number/Boolean/Reference

    // The Lineage
    pub parent_hash: Option<Hash>,  // Link to previous version
    pub source_hash: Hash,          // Evidence pointer
    pub visual_hash: Option<PHash>, // pHash for image provenance

    // Meta-Cognition
    pub signatures: Vec<SignatureEntry>,  // Multi-sig support
    pub confidence: f32,                  // 0.0 to 1.0
    pub timestamp: u64,                   // Unix epoch
    pub vector: Option<Vec<f32>>,         // Semantic embedding
}

pub struct SignatureEntry {
    pub agent_id: [u8; 32],   // Ed25519 Public Key
    pub signature: [u8; 64],  // Ed25519 Signature
    pub timestamp: u64,       // When signed
}

pub enum ObjectValue {
    Text(String),
    Number(f64),
    Boolean(bool),
    Reference(EntityId),
}
```
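
Because `parent_hash` links each assertion to its previous version, the history of a fact can be recovered by following those links. The sketch below assumes a simplified 32-byte `Hash` alias and a hypothetical `Node` that models only the lineage field; neither is taken from `types.rs`.

```rust
use std::collections::HashMap;

/// ASSUMPTION: a plain 32-byte array stands in for the real `Hash` type.
type Hash = [u8; 32];

/// Hypothetical reduced view of `Assertion`: only the lineage link is modeled.
struct Node {
    parent_hash: Option<Hash>,
}

/// Returns the version chain starting at `head`, newest first.
/// Stops when a root (no parent) is reached or a parent is missing from the store.
fn lineage(store: &HashMap<Hash, Node>, head: Hash) -> Vec<Hash> {
    let mut chain = Vec::new();
    let mut cursor = Some(head);
    // A valid chain cannot be longer than the store, so this bound also
    // guards against accidental cycles in malformed data.
    while chain.len() < store.len() {
        let Some(hash) = cursor else { break };
        let Some(node) = store.get(&hash) else { break };
        chain.push(hash);
        cursor = node.parent_hash;
    }
    chain
}
```

Calling `lineage(&store, latest)` on a three-version chain returns the three hashes from newest to oldest; an unknown head yields an empty vector.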

## Storage Layout (KV)

*   `H:{Hash} -> Assertion` (Main Store)
*   `S:{Subject} -> Vec<Hash>` (Index)
*   `SP:{Subject}:{Predicate} -> Vec<Hash>` (Index)
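
The key layout above could be built and maintained roughly like this. Everything here is a sketch: `Kv` is a hypothetical in-memory stand-in for the real KV backend, the helper names are invented, hashes are assumed to be rendered as lowercase hex in keys (the layout does not specify an encoding), and plain `&str` is used for subjects and predicates only for brevity where real code would take `EntityId` and `RelationId`.

```rust
use std::collections::BTreeMap;

/// ASSUMPTION: hashes appear in keys as lowercase hex.
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// `H:{Hash}` key for the main store.
fn main_key(hash: &[u8; 32]) -> String {
    format!("H:{}", hex(hash))
}

/// `S:{Subject}` key for the subject index.
fn subject_key(subject: &str) -> String {
    format!("S:{subject}")
}

/// `SP:{Subject}:{Predicate}` key for the subject/predicate index.
fn subject_predicate_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}

/// Hypothetical in-memory stand-in for the KV store.
#[derive(Default)]
struct Kv {
    main: BTreeMap<String, Vec<u8>>,
    index: BTreeMap<String, Vec<[u8; 32]>>,
}

impl Kv {
    /// Stores the encoded assertion and records its hash in both indexes.
    fn put(&mut self, hash: [u8; 32], subject: &str, predicate: &str, encoded: Vec<u8>) {
        // Never overwrite: an existing entry under the same hash is kept as-is.
        self.main.entry(main_key(&hash)).or_insert(encoded);
        for key in [subject_key(subject), subject_predicate_key(subject, predicate)] {
            let hashes = self.index.entry(key).or_default();
            if !hashes.contains(&hash) {
                hashes.push(hash);
            }
        }
    }
}
```

Each write touches one main-store key and two index keys, so a lookup by subject alone or by subject and predicate is a single index read followed by `H:` fetches.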

## Do
*   Use `rkyv` for zero-copy deserialization.
*   Use `thiserror` for library errors.
*   Validate signatures on Ingest.
*   **Instrument public methods** with `#[instrument]` for observability.
*   **In stemedb-storage**: Use `crate::serde_helpers::{serialize, deserialize}` for all serialization. This provides unified error mapping to `StorageError::Serialization`.

## Tracing Pattern

All public methods in the WAL, storage, and ingestion crates MUST have tracing spans:

```rust
use tracing::{debug, instrument};

// `Result` is assumed to be the crate's own single-parameter alias.
#[instrument(skip(self, payload), fields(payload_len = payload.len()))]
pub fn append(&mut self, payload: Vec<u8>) -> Result<u64> {
    // ... implementation (computes `offset`) ...
    debug!(offset, "Record appended");
    Ok(offset)
}
```

Guidelines:
- Use `skip(self)` so the receiver's `Debug` output is not recorded in every span
- Use `skip(payload)` or `skip(value)` for large data
- Add `fields(key_len = ..., value_len = ...)` to keep sizes visible for skipped arguments
- Use `debug!` for routine operations, `info!` for lifecycle events, and `warn!` for recoverable issues

## Do Not
*   Use `unwrap()` in core logic.
*   Store large blobs in Assertions (store pointers/hashes instead).
*   Add new types without updating `ai-lookup/services/` documentation.
*   Add public methods without `#[instrument]` in WAL/storage/ingest crates.
*   **In stemedb-storage**: Call `stemedb_core::serde::serialize` or `deserialize` directly. Always use `crate::serde_helpers` instead.
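
As an illustration of the "no `unwrap()`" rule combined with typed library errors, the sketch below decodes a field from a record and propagates failure with `?`. The record layout (a little-endian `u64` timestamp in the first 8 bytes), `DecodeError`, and `read_timestamp` are all hypothetical. In the real crates the error type would presumably be derived with `thiserror`; the `Display` and `Error` impls are written by hand here so the example needs only the standard library.

```rust
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    TooShort { expected: usize, actual: usize },
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DecodeError::TooShort { expected, actual } => {
                write!(f, "record too short: expected {expected} bytes, got {actual}")
            }
        }
    }
}

impl std::error::Error for DecodeError {}

/// Reads a timestamp from the front of a record.
/// ASSUMPTION: the first 8 bytes hold a little-endian u64.
pub fn read_timestamp(record: &[u8]) -> Result<u64, DecodeError> {
    // `get` returns `None` on short input; convert that into a typed error
    // instead of indexing (which would panic) or calling `unwrap()`.
    let head = record.get(..8).ok_or(DecodeError::TooShort {
        expected: 8,
        actual: record.len(),
    })?;
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(head); // lengths match by construction
    Ok(u64::from_le_bytes(bytes))
}
```

A truncated record then surfaces as `Err(DecodeError::TooShort { .. })` that callers can match on, rather than as a panic inside core logic.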

## Documentation Sync

When modifying core types:
1. Update this skill's Data Structures section to match the actual code.
2. Add or update the entry in `ai-lookup/services/assertion.md` or `ai-lookup/services/storage.md`.
3. Update `ai-lookup/index.md` if adding new concepts.