Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers, half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet the acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Milestone 1, Phase 3: Storage Engine Trait and fjall Backend
Status: COMPLETE (140 tests passing: 128 unit + 12 integration)
Phase Deliverable
The `StorageEngine` trait abstraction and two implementations: `FjallBackend` (fjall 3 LSM-tree) for production and `InMemoryBackend` (`BTreeMap` + `RwLock`) for deterministic testing. Key encoding follows the subject-prefix pattern with a `Tag` discriminant. `FjallStorage` coordinates three keyspaces, one per entity kind. `FjallAtomicBatch` provides cross-keyspace atomic writes.
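A minimal sketch of the trait together with an in-memory backend in the spirit of `InMemoryBackend`. The operation names come from the acceptance criteria; the exact signatures, the `LockPoisoned` error variant, and the shapes of `WriteBatch`, `BatchOp`, and `PrefixIterator` are assumptions for illustration, not the actual tidalDB code.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Assumed error type; the real `StorageError` has its own variants.
#[derive(Debug)]
pub enum StorageError {
    LockPoisoned,
}

/// One write inside a batch (assumed shape of `BatchOp`).
pub enum BatchOp {
    Put { key: Vec<u8>, value: Vec<u8> },
    Delete { key: Vec<u8> },
}

/// Ordered operations applied together (assumed shape of `WriteBatch`).
#[derive(Default)]
pub struct WriteBatch {
    pub ops: Vec<BatchOp>,
}

impl WriteBatch {
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.ops.push(BatchOp::Put { key: key.to_vec(), value: value.to_vec() });
    }
    pub fn delete(&mut self, key: &[u8]) {
        self.ops.push(BatchOp::Delete { key: key.to_vec() });
    }
}

/// Boxed iterator over `(key, value)` pairs that share a prefix.
pub type PrefixIterator<'a> = Box<dyn Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a>;

/// Operation set from the acceptance criteria; signatures are assumed.
pub trait StorageEngine {
    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError>;
    fn put(&self, key: &[u8], value: &[u8]) -> Result<(), StorageError>;
    fn delete(&self, key: &[u8]) -> Result<(), StorageError>;
    fn scan_prefix(&self, prefix: &[u8]) -> Result<PrefixIterator<'_>, StorageError>;
    fn write_batch(&self, batch: WriteBatch) -> Result<(), StorageError>;
    fn flush(&self) -> Result<(), StorageError>;
}

/// Sorted map behind a reader-writer lock.
#[derive(Default)]
pub struct InMemoryBackend {
    map: RwLock<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl StorageEngine for InMemoryBackend {
    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError> {
        let map = self.map.read().map_err(|_| StorageError::LockPoisoned)?;
        Ok(map.get(key).cloned())
    }

    fn put(&self, key: &[u8], value: &[u8]) -> Result<(), StorageError> {
        let mut map = self.map.write().map_err(|_| StorageError::LockPoisoned)?;
        map.insert(key.to_vec(), value.to_vec());
        Ok(())
    }

    fn delete(&self, key: &[u8]) -> Result<(), StorageError> {
        let mut map = self.map.write().map_err(|_| StorageError::LockPoisoned)?;
        map.remove(key);
        Ok(())
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Result<PrefixIterator<'_>, StorageError> {
        let map = self.map.read().map_err(|_| StorageError::LockPoisoned)?;
        // Keys are sorted, so all matches form one contiguous run starting at
        // the first key >= prefix. Collect them so the read lock is released.
        let hits: Vec<(Vec<u8>, Vec<u8>)> = map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect();
        Ok(Box::new(hits.into_iter()))
    }

    fn write_batch(&self, batch: WriteBatch) -> Result<(), StorageError> {
        // One write lock for the whole batch: readers see all of it or none of it.
        let mut map = self.map.write().map_err(|_| StorageError::LockPoisoned)?;
        for op in batch.ops {
            match op {
                BatchOp::Put { key, value } => {
                    map.insert(key, value);
                }
                BatchOp::Delete { key } => {
                    map.remove(&key);
                }
            }
        }
        Ok(())
    }

    fn flush(&self) -> Result<(), StorageError> {
        // Nothing to persist for an in-memory map.
        Ok(())
    }
}
```

Because `BTreeMap` iterates in byte-lexicographic key order, a prefix scan is a single range walk that stops at the first non-matching key, which is what makes the in-memory backend a deterministic stand-in for an LSM-tree during tests.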
This phase delivers the durable entity store, where metadata, signal checkpoints, and index data live. It is separate from the WAL (m1p2): the WAL is the source of truth for signal events, while the storage engine persists derived entity state.
Acceptance Criteria
- `StorageEngine` trait with `get`, `put`, `delete`, `scan_prefix`, `write_batch`, `flush` operations
- Key encoding: `[entity_id: 8 bytes BE][0x00][Tag: 1 byte][suffix...]` with `Tag` enum (`Evt`=0x01, `Sig`=0x02, `Meta`=0x03, `Rel`=0x04, `Mv`=0x05, `Idx`=0x06)
- `encode_key`, `parse_key` roundtrip correctly for all tag variants and arbitrary suffixes
- `entity_prefix` (9 bytes) and `entity_tag_prefix` (10 bytes) for scoped prefix scans
- Byte-lexicographic key ordering matches numeric entity ID ordering (property tested)
- `FjallBackend` wraps a single fjall `Keyspace` and implements `StorageEngine`
- `FjallStorage` owns a fjall `Database` with three keyspaces: "items", "users", "creators" (one per `EntityKind`)
- `FjallStorage::backend(EntityKind)` routes to the correct keyspace backend
- Entity kind isolation: the same key written to different entity kinds does not collide
- `FjallAtomicBatch` provides cross-keyspace atomic writes via `fjall::OwnedWriteBatch`
- Data persists across close and reopen (`flush_all` + reopen test)
- `InMemoryBackend` uses `BTreeMap` + `RwLock` for deterministic, sorted, concurrent testing
- `WriteBatch` and `BatchOp` types for atomic multi-operation writes
- `PrefixIterator` type alias for boxed prefix scan iterators
- Property tests with proptest: encode/parse roundtrip, prefix ordering, prefix containment
- Criterion benchmarks passing
- `cargo fmt` clean, `cargo clippy -D warnings` clean, all 140 tests pass (128 unit + 12 integration)
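The key layout from the criteria above can be sketched as follows. `EntityId` is modelled as a bare `u64` newtype and the `Tag` byte values are taken from the criteria; rejecting a non-zero separator byte or an unknown tag in `parse_key` is an assumption, and the real `keys.rs` may differ in signatures and error handling.

```rust
/// Stand-in for m1p1's `EntityId` newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct EntityId(pub u64);

/// Data-category discriminant; byte values from the acceptance criteria.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Tag {
    Evt = 0x01,
    Sig = 0x02,
    Meta = 0x03,
    Rel = 0x04,
    Mv = 0x05,
    Idx = 0x06,
}

impl Tag {
    /// Inverse of `tag as u8`; `None` for bytes that are not a known tag.
    pub fn from_byte(b: u8) -> Option<Tag> {
        match b {
            0x01 => Some(Tag::Evt),
            0x02 => Some(Tag::Sig),
            0x03 => Some(Tag::Meta),
            0x04 => Some(Tag::Rel),
            0x05 => Some(Tag::Mv),
            0x06 => Some(Tag::Idx),
            _ => None,
        }
    }
}

const SEPARATOR: u8 = 0x00;

/// `[entity_id: 8 bytes BE][0x00]`: covers every key belonging to one entity.
pub fn entity_prefix(id: EntityId) -> [u8; 9] {
    let mut prefix = [0u8; 9];
    prefix[..8].copy_from_slice(&id.0.to_be_bytes());
    prefix[8] = SEPARATOR;
    prefix
}

/// `[entity_id][0x00][tag]`: one entity's keys for a single data category.
pub fn entity_tag_prefix(id: EntityId, tag: Tag) -> [u8; 10] {
    let mut prefix = [0u8; 10];
    prefix[..9].copy_from_slice(&entity_prefix(id));
    prefix[9] = tag as u8;
    prefix
}

/// Full key: the 10-byte entity/tag prefix followed by an arbitrary suffix.
pub fn encode_key(id: EntityId, tag: Tag, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(10 + suffix.len());
    key.extend_from_slice(&entity_tag_prefix(id, tag));
    key.extend_from_slice(suffix);
    key
}

/// Splits a key back into its parts; `None` if it is too short or malformed.
pub fn parse_key(key: &[u8]) -> Option<(EntityId, Tag, &[u8])> {
    if key.len() < 10 || key[8] != SEPARATOR {
        return None;
    }
    let mut id_bytes = [0u8; 8];
    id_bytes.copy_from_slice(&key[..8]);
    let tag = Tag::from_byte(key[9])?;
    Some((EntityId(u64::from_be_bytes(id_bytes)), tag, &key[10..]))
}
```

Because the entity ID is written big-endian, comparing two keys byte by byte compares their IDs numerically first: 255 encodes as `00 00 00 00 00 00 00 FF` and 256 as `00 00 00 00 00 00 01 00`, so every key for entity 255 sorts before any key for entity 256. Little-endian encoding would break this, which is why the ordering property is worth testing.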
Dependencies
- Requires: m1p1 (types: `EntityId`, `EntityKind`, used in key encoding; `StorageError` references the `LumenError` error hierarchy)
- Blocks: m1p4 (Signal Ledger checkpoints via `StorageEngine`), m1p5 (Entity CRUD via `StorageEngine`)
Research References
- thoughts.md: Part V.9 (hybrid storage: fjall for entity metadata, WAL for signals), Part V.12 (subject-prefix keys: `[entity_id][NUL][TAG][suffix]` for co-location and prefix scan efficiency)
- CODING_GUIDELINES.md: Section 2 (key encoding: big-endian entity IDs for lexicographic ordering), Section 10 (fjall as the primary storage backend)
Task Index
| # | Task | Delivers | Depends On | Complexity | Status |
|---|---|---|---|---|---|
| 01 | StorageEngine Trait and Key Encoding | `StorageEngine`, `Tag`, `encode_key`, `parse_key`, `entity_prefix`, `entity_tag_prefix`, `WriteBatch`, `BatchOp`, `PrefixIterator`, `StorageError` | None | M | COMPLETE |
| 02 | FjallBackend | `FjallBackend`, `FjallStorage`, `FjallAtomicBatch`, persistence tests | Task 01 | M | COMPLETE |
| 03 | InMemoryBackend | `InMemoryBackend`, property tests, benchmarks | Task 01 | S | COMPLETE |
Task Dependency DAG
```
Task 01: StorageEngine Trait + Key Encoding
                     |
       +---------------------------+
       |                           |
       v                           v
Task 02: FjallBackend       Task 03: InMemoryBackend
```
Tasks 02 and 03 are fully parallelizable after Task 01's trait and key encoding are defined.
File Layout
```
tidal/src/
  storage/
    mod.rs       -- pub use re-exports
    engine.rs    -- Task 01: StorageEngine trait
    keys.rs      -- Task 01: Tag, encode_key, parse_key, entity_prefix, entity_tag_prefix
    batch.rs     -- Task 01: WriteBatch, BatchOp
    iterator.rs  -- Task 01: PrefixIterator type alias
    error.rs     -- Task 01: StorageError
    fjall.rs     -- Task 02: FjallBackend, FjallStorage, FjallAtomicBatch
    memory.rs    -- Task 03: InMemoryBackend
  lib.rs         -- pub mod storage (already present)
```
Lessons Learned
- Keyspaces are per `EntityKind`, not per data category. The `Tag` enum provides the data-category namespace within each entity-kind keyspace, so `FjallStorage` has exactly three keyspaces: "items", "users", "creators". A `Tag::Meta` key in the "items" keyspace is distinct from a `Tag::Meta` key in the "users" keyspace.
- MSRV bumped to 1.91 for fjall 3 compatibility, documented in `tidal/Cargo.toml`.
- The `LumenError` name is a legacy artifact from the predecessor project (Engram/Lumen). It will be renamed to `TidalError` when convenient; this does not block m1p3 progress.
- `FjallAtomicBatch` provides cross-keyspace atomicity via `fjall::OwnedWriteBatch`. This is the mechanism for m1p4 checkpoint writes that touch multiple entity kinds atomically.
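The first and last lessons (one keyspace per entity kind, and atomic writes across those keyspaces) can be modelled without fjall. The sketch below is hypothetical throughout: `KindKeyspaces` stands in for a fjall `Database`, `CrossKindBatch` for `FjallAtomicBatch`, and the `EntityKind` variant names are assumed. It does not use or imitate the `fjall::OwnedWriteBatch` API; a single `Mutex` over three `BTreeMap`s provides the all-or-nothing behaviour instead, and the empty-key rule exists only to exercise the failure path.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Hypothetical stand-in for m1p1's `EntityKind`; variant names are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EntityKind {
    Item = 0,
    User = 1,
    Creator = 2,
}

impl EntityKind {
    /// Keyspace names taken from the plan.
    pub fn keyspace_name(self) -> &'static str {
        match self {
            EntityKind::Item => "items",
            EntityKind::User => "users",
            EntityKind::Creator => "creators",
        }
    }
}

/// One map per entity kind, all guarded by a single lock.
#[derive(Default)]
pub struct KindKeyspaces {
    maps: Mutex<[BTreeMap<Vec<u8>, Vec<u8>>; 3]>,
}

impl KindKeyspaces {
    /// Reads `key` from the keyspace owned by `kind` only.
    pub fn get(&self, kind: EntityKind, key: &[u8]) -> Option<Vec<u8>> {
        let maps = self.maps.lock().ok()?;
        maps[kind as usize].get(key).cloned()
    }
}

enum Op {
    Put(EntityKind, Vec<u8>, Vec<u8>),
    Delete(EntityKind, Vec<u8>),
}

/// Stages writes that may target different keyspaces.
#[derive(Default)]
pub struct CrossKindBatch {
    ops: Vec<Op>,
}

impl CrossKindBatch {
    pub fn put(&mut self, kind: EntityKind, key: &[u8], value: &[u8]) -> &mut Self {
        self.ops.push(Op::Put(kind, key.to_vec(), value.to_vec()));
        self
    }

    pub fn delete(&mut self, kind: EntityKind, key: &[u8]) -> &mut Self {
        self.ops.push(Op::Delete(kind, key.to_vec()));
        self
    }

    /// Applies every staged op, or none of them if validation fails.
    pub fn commit(self, db: &KindKeyspaces) -> Result<(), String> {
        // Validate up front so a bad op cannot leave the batch half-applied.
        let has_empty_key = self.ops.iter().any(|op| match op {
            Op::Put(_, key, _) | Op::Delete(_, key) => key.is_empty(),
        });
        if has_empty_key {
            return Err("empty key in batch".to_string());
        }
        // Hold the lock across the whole apply loop so no reader sees a partial batch.
        let mut maps = db.maps.lock().map_err(|_| "lock poisoned".to_string())?;
        for op in self.ops {
            match op {
                Op::Put(kind, key, value) => {
                    maps[kind as usize].insert(key, value);
                }
                Op::Delete(kind, key) => {
                    maps[kind as usize].remove(&key);
                }
            }
        }
        Ok(())
    }
}
```

Writing the same key under two entity kinds yields two independent entries, and a batch that fails validation leaves every keyspace untouched, including the ones its earlier ops targeted.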