# Task 01: StorageEngine Trait and Key Encoding

## Context

**Milestone:** 1 -- Signal Engine
**Phase:** m1p3 -- Storage Engine Trait and fjall Backend
**Status:** COMPLETE
**Depends On:** m1p1 (`EntityId`, `EntityKind`)
**Blocks:** Task 02 (FjallBackend), Task 03 (InMemoryBackend)
**Complexity:** M

## Objective

Define the `StorageEngine` trait that abstracts all persistent entity state access, the key encoding scheme that colocates entity data for efficient prefix scans, and the supporting types (`WriteBatch`, `BatchOp`, `PrefixIterator`, `StorageError`).

This is the boundary that keeps the rest of tidalDB storage-engine-agnostic. The WAL (m1p2) is the signal event source of truth; the storage engine is where derived entity state (metadata, signal checkpoints, indexes) lives. Every higher module (signal ledger, entity API, query engine) talks to a `StorageEngine`, never to fjall directly.

## Requirements

- `StorageEngine` is a `Send + Sync` object-safe trait
- Operations: `get(&[u8]) -> Result<Option<Vec<u8>>>`, `put(&[u8], &[u8]) -> Result<()>`, `delete(&[u8]) -> Result<()>`, `scan_prefix(&[u8]) -> PrefixIterator<'_>`, `write_batch(WriteBatch) -> Result<()>`, `flush() -> Result<()>`
- Key encoding: `[entity_id: 8 bytes BE][0x00][Tag: 1 byte][suffix: variable]`
  - 8-byte big-endian entity ID: byte-lexicographic order matches numeric order
  - `0x00` NUL separator between entity ID and tag
  - 1-byte `Tag` discriminant for data category within the keyspace
- `Tag` enum: `Evt`=0x01 (raw events), `Sig`=0x02 (signal state), `Meta`=0x03 (entity metadata), `Rel`=0x04 (relationships), `Mv`=0x05 (materialized views), `Idx`=0x06 (inverted index)
- `entity_prefix(entity_id)` returns 9 bytes, `[entity_id: 8 BE][0x00]`, and scans all tags for one entity
- `entity_tag_prefix(entity_id, tag)` returns 10 bytes, `[entity_id: 8 BE][0x00][tag: 1]`, and scans one tag for one entity
- `encode_key(entity_id, tag, suffix)` and `parse_key(key)` roundtrip correctly for all inputs
- `WriteBatch` collects `Put`
and `Delete` operations; `write_batch()` applies them atomically
- `PrefixIterator<'_>` is a type alias for `Box<dyn Iterator<Item = Result<(Vec<u8>, Vec<u8>), StorageError>> + '_>`
- `StorageError` integrates with `LumenError::Storage`

## Technical Design

### Key Encoding

```
[entity_id: u64 BE, 8 bytes][NUL: 0x00, 1 byte][Tag: u8, 1 byte][suffix: 0..N bytes]

Total prefix for entity scan: 9 bytes
Total prefix for tag scan:    10 bytes
```

**Why big-endian for entity IDs?** Byte-lexicographic order of the 8-byte encoding must match numeric order of the u64 value. Big-endian achieves this: `EntityId(1)` → `[0,0,0,0,0,0,0,1]`, `EntityId(256)` → `[0,0,0,0,0,0,1,0]`. Little-endian would break the ordering: `256` would encode as `[0,1,0,0,0,0,0,0]`, which sorts before `1` encoded as `[1,0,0,0,0,0,0,0]`.

**Why NUL separator?** Prevents a variable-length entity ID prefix from colliding with suffixes. With fixed 8-byte IDs the separator is redundant but is kept for consistency with the subject-prefix pattern from `thoughts.md` and for future extensibility.

### Public API

```rust
// === keys.rs ===

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
#[repr(u8)]
pub enum Tag {
    Evt  = 0x01, // raw event records (signal WAL overflow/cold tier)
    Sig  = 0x02, // signal state checkpoints
    Meta = 0x03, // entity metadata (title, category, created_at, ...)
    Rel  = 0x04, // relationship edges (follows, blocks, interaction weights)
    Mv   = 0x05, // materialized views (pre-computed aggregates)
    Idx  = 0x06, // inverted index entries
}

/// Build a full key: [entity_id: 8 BE][0x00][tag: 1][suffix]
pub fn encode_key(entity_id: EntityId, tag: Tag, suffix: &[u8]) -> Vec<u8>;

/// Parse a key back into (entity_id, tag, suffix).
/// Returns Err on keys too short to contain entity_id + separator + tag.
pub fn parse_key(key: &[u8]) -> Result<(EntityId, Tag, &[u8]), StorageError>;

/// Prefix for all keys belonging to one entity: [entity_id: 8 BE][0x00]
pub fn entity_prefix(entity_id: EntityId) -> [u8; 9];

/// Prefix for one tag of one entity: [entity_id: 8 BE][0x00][tag: 1]
pub fn entity_tag_prefix(entity_id: EntityId, tag: Tag) -> [u8; 10];
```

```rust
// === batch.rs ===

#[derive(Debug, Clone)]
pub enum BatchOp {
    Put { key: Vec<u8>, value: Vec<u8> },
    Delete { key: Vec<u8> },
}

#[derive(Debug, Default, Clone)]
pub struct WriteBatch {
    ops: Vec<BatchOp>,
}

impl WriteBatch {
    pub fn new() -> Self;
    pub fn put(&mut self, key: Vec<u8>, value: Vec<u8>) -> &mut Self;
    pub fn delete(&mut self, key: Vec<u8>) -> &mut Self;
    pub fn ops(&self) -> &[BatchOp];
    pub fn is_empty(&self) -> bool;
    pub fn len(&self) -> usize;
}
```

```rust
// === iterator.rs ===

/// Boxed prefix scan iterator yielding (key, value) pairs.
pub type PrefixIterator<'a> =
    Box<dyn Iterator<Item = Result<(Vec<u8>, Vec<u8>), StorageError>> + 'a>;
```

```rust
// === error.rs ===

#[derive(Debug, thiserror::Error)]
pub enum StorageError {
    #[error("I/O error: {0}")]
    Io(#[from] std::io::Error),

    #[error("storage backend error: {0}")]
    Backend(String),

    #[error("key parse error: {0}")]
    KeyParse(String),

    #[error("engine closed")]
    Closed,
}
```

## Test Strategy

### Property Tests (proptest)

```rust
// encode_key / parse_key roundtrip for all tags and suffixes
proptest! {
    #[test]
    fn key_roundtrip(
        id: u64,
        tag in prop_oneof![
            Just(Tag::Evt), Just(Tag::Sig), Just(Tag::Meta),
            Just(Tag::Rel), Just(Tag::Mv), Just(Tag::Idx),
        ],
        suffix in prop::collection::vec(any::<u8>(), 0..64),
    ) {
        let entity_id = EntityId::new(id);
        let key = encode_key(entity_id, tag, &suffix);
        let (parsed_id, parsed_tag, parsed_suffix) = parse_key(&key).unwrap();
        prop_assert_eq!(parsed_id, entity_id);
        prop_assert_eq!(parsed_tag, tag);
        prop_assert_eq!(parsed_suffix, suffix.as_slice());
    }
}

// Byte-lexicographic order of encoded keys matches numeric order of entity IDs
proptest! {
    #[test]
    fn key_ordering_matches_entity_id_ordering(a: u64, b: u64) {
        let key_a = encode_key(EntityId::new(a), Tag::Meta, b"");
        let key_b = encode_key(EntityId::new(b), Tag::Meta, b"");
        prop_assert_eq!(
            key_a.cmp(&key_b),
            a.cmp(&b),
            "key ordering must match entity ID ordering"
        );
    }
}

// entity_prefix is a prefix of every key for that entity
proptest! {
    #[test]
    fn entity_prefix_is_prefix_of_all_entity_keys(id: u64) {
        let entity_id = EntityId::new(id);
        let prefix = entity_prefix(entity_id);
        // Cover all six tags so the "every key" claim is actually exercised.
        for tag in [Tag::Evt, Tag::Sig, Tag::Meta, Tag::Rel, Tag::Mv, Tag::Idx] {
            let key = encode_key(entity_id, tag, b"suffix");
            prop_assert!(key.starts_with(&prefix));
        }
    }
}

// entity_tag_prefix is a prefix of every key for that entity and tag
proptest! {
    #[test]
    fn entity_tag_prefix_is_precise(
        id: u64,
        suffix in prop::collection::vec(any::<u8>(), 0..32),
    ) {
        let entity_id = EntityId::new(id);
        let prefix = entity_tag_prefix(entity_id, Tag::Sig);
        let key = encode_key(entity_id, Tag::Sig, &suffix);
        prop_assert!(key.starts_with(&prefix));

        // Tag::Meta key does NOT start with Tag::Sig prefix
        let other_key = encode_key(entity_id, Tag::Meta, &suffix);
        prop_assert!(!other_key.starts_with(&prefix));
    }
}
```

### Unit Tests

```rust
#[test]
fn tag_byte_values() {
    assert_eq!(Tag::Evt as u8, 0x01);
    assert_eq!(Tag::Sig as u8, 0x02);
    assert_eq!(Tag::Meta as u8, 0x03);
    assert_eq!(Tag::Rel as u8, 0x04);
    assert_eq!(Tag::Mv as u8, 0x05);
    assert_eq!(Tag::Idx as u8, 0x06);
}

#[test]
fn entity_prefix_length() {
    let prefix = entity_prefix(EntityId::new(1));
    assert_eq!(prefix.len(), 9);
}

#[test]
fn entity_tag_prefix_length() {
    let prefix = entity_tag_prefix(EntityId::new(1), Tag::Meta);
    assert_eq!(prefix.len(), 10);
}

#[test]
fn parse_key_rejects_short_input() {
    assert!(parse_key(b"").is_err());
    assert!(parse_key(&[0u8; 8]).is_err()); // missing NUL + tag
    assert!(parse_key(&[0u8; 9]).is_err()); // missing tag
}

#[test]
fn write_batch_ops_order_preserved() {
    let mut batch = WriteBatch::new();
    batch.put(b"k1".to_vec(), b"v1".to_vec());
    batch.delete(b"k2".to_vec());
    batch.put(b"k3".to_vec(), b"v3".to_vec());
    assert_eq!(batch.len(), 3);
    assert!(matches!(batch.ops()[0], BatchOp::Put { .. }));
    assert!(matches!(batch.ops()[1], BatchOp::Delete { .. }));
    assert!(matches!(batch.ops()[2], BatchOp::Put { .. }));
}
```

## Acceptance Criteria

- [x] `encode_key` / `parse_key` roundtrip correctly for all 6 `Tag` variants and arbitrary suffixes (property tested)
- [x] Byte-lexicographic ordering of encoded keys matches numeric ordering of `EntityId` (property tested)
- [x] `entity_prefix` is 9 bytes and is a prefix of every key for that entity (property tested)
- [x] `entity_tag_prefix` is 10 bytes and is a prefix of only keys with the matching entity+tag (property tested)
- [x] `parse_key` returns `StorageError::KeyParse` for inputs shorter than 10 bytes
- [x] `WriteBatch` preserves insertion order of operations
- [x] `StorageEngine` trait is object-safe (`dyn StorageEngine` compiles)
- [x] `StorageEngine: Send + Sync`, enforced by the trait bound
- [x] `cargo clippy -D warnings` passes

## Research References

- [thoughts.md](../../../../thoughts.md): Part V.12 (subject-prefix keys: `[entity_id][NUL][TAG][suffix]`, rationale for co-location, entity-scoped prefix scans)
- [CODING_GUIDELINES.md](../../../../CODING_GUIDELINES.md): Section 2 (key encoding: big-endian for byte-lexicographic ordering, NUL separator convention)

## Implementation Notes

- `Tag` uses `#[repr(u8)]` for direct byte encoding. A `TryFrom<u8>` impl with a catch-all arm that returns `StorageError::KeyParse` allows forward-compatible decoding of unknown future tag values.
- `PrefixIterator<'_>` is a type alias (not a newtype) to avoid boxing overhead in callers that know the concrete iterator type at compile time. The `'_` lifetime ties the iterator to the backend's lifetime.
- `StorageError` uses `thiserror` (already in `Cargo.toml`) for `Display` and `Error` implementations.
- Do NOT add `serde` to the storage error types.
Error propagation uses `From` impls, not serialization.
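
To make the key layout and the `Tag` decoding note concrete, here is a minimal sketch of how the `keys.rs` functions could be implemented. It is an illustration under stated assumptions, not the project's actual code: `EntityId` is a stand-in newtype over `u64` (the real type comes from m1p1), `KeyParseError` is a simplified placeholder for `StorageError::KeyParse`, and the separator check inside `parse_key` is an extra validation step that this document does not require.

```rust
use std::convert::TryFrom;

/// Stand-in for the real `EntityId` from m1p1 (assumed to wrap a `u64`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EntityId(u64);

impl EntityId {
    pub fn new(raw: u64) -> Self { Self(raw) }
    pub fn as_u64(self) -> u64 { self.0 }
}

/// Simplified placeholder for `StorageError::KeyParse`.
#[derive(Debug, PartialEq, Eq)]
pub struct KeyParseError(String);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Tag { Evt = 0x01, Sig = 0x02, Meta = 0x03, Rel = 0x04, Mv = 0x05, Idx = 0x06 }

impl TryFrom<u8> for Tag {
    type Error = KeyParseError;

    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            0x01 => Ok(Tag::Evt),
            0x02 => Ok(Tag::Sig),
            0x03 => Ok(Tag::Meta),
            0x04 => Ok(Tag::Rel),
            0x05 => Ok(Tag::Mv),
            0x06 => Ok(Tag::Idx),
            // Catch-all: unknown tag bytes become a parse error, not a panic.
            other => Err(KeyParseError(format!("unknown tag byte 0x{:02x}", other))),
        }
    }
}

const SEPARATOR: u8 = 0x00;
const HEADER_LEN: usize = 10; // 8-byte id + 1-byte NUL + 1-byte tag

pub fn entity_prefix(id: EntityId) -> [u8; 9] {
    let mut out = [0u8; 9];
    out[..8].copy_from_slice(&id.as_u64().to_be_bytes());
    out[8] = SEPARATOR;
    out
}

pub fn entity_tag_prefix(id: EntityId, tag: Tag) -> [u8; 10] {
    let mut out = [0u8; 10];
    out[..9].copy_from_slice(&entity_prefix(id));
    out[9] = tag as u8;
    out
}

pub fn encode_key(id: EntityId, tag: Tag, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(HEADER_LEN + suffix.len());
    key.extend_from_slice(&entity_tag_prefix(id, tag));
    key.extend_from_slice(suffix);
    key
}

pub fn parse_key(key: &[u8]) -> Result<(EntityId, Tag, &[u8]), KeyParseError> {
    if key.len() < HEADER_LEN {
        return Err(KeyParseError(format!("key too short: {} bytes", key.len())));
    }
    // Assumption: a non-NUL byte at offset 8 is treated as a malformed key.
    if key[8] != SEPARATOR {
        return Err(KeyParseError("missing NUL separator".to_string()));
    }
    let mut id_bytes = [0u8; 8];
    id_bytes.copy_from_slice(&key[..8]);
    let tag = Tag::try_from(key[9])?;
    Ok((EntityId::new(u64::from_be_bytes(id_bytes)), tag, &key[HEADER_LEN..]))
}
```

With this sketch, `encode_key(EntityId::new(256), Tag::Sig, b"ck")` yields the bytes `[0,0,0,0,0,0,1,0, 0x00, 0x02, b'c', b'k']`, and `parse_key` on that key returns the original ID, tag, and suffix. Building `encode_key` on top of `entity_tag_prefix` keeps the prefix and full-key layouts from drifting apart.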