Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers, half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10,000 events/run (was ≤500) to meet acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# Milestone 1, Phase 3: Storage Engine Trait and fjall Backend

## Status: COMPLETE (140 tests passing: 128 unit + 12 integration)

## Phase Deliverable

The `StorageEngine` trait abstraction and two implementations: `FjallBackend` (fjall 3 LSM-tree) for production and `InMemoryBackend` (BTreeMap + RwLock) for deterministic testing. Key encoding follows the subject-prefix pattern with a `Tag` discriminant. `FjallStorage` coordinates three keyspaces, one per entity kind. `FjallAtomicBatch` provides cross-keyspace atomic writes.

This phase delivers the durable entity store, where metadata, signal checkpoints, and index data live. It is separate from the WAL (m1p2): the WAL is the source of truth for signal events, while the storage engine persists derived entity state.
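The trait and the in-memory backend described above can be sketched roughly as follows. This is a minimal std-only illustration, not the code in `tidal/src/storage/`: the exact method signatures, the `StorageError::LockPoisoned` variant, the `WriteBatch::put`/`delete` helpers, and the shape of `PrefixIterator` (an owned, boxed iterator over `(key, value)` pairs) are all assumptions. It is meant to show two ideas: a sorted `BTreeMap` turns a prefix scan into a range scan that stops at the first non-matching key, and holding the `RwLock` write guard for a whole batch makes `write_batch` atomic with respect to every other reader and writer.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the real `StorageError`.
#[derive(Debug)]
pub enum StorageError {
    LockPoisoned,
}

/// Assumed shape of `PrefixIterator`: a boxed iterator of owned (key, value) pairs.
pub type PrefixIterator = Box<dyn Iterator<Item = (Vec<u8>, Vec<u8>)> + Send>;

/// One operation inside a `WriteBatch`.
pub enum BatchOp {
    Put { key: Vec<u8>, value: Vec<u8> },
    Delete { key: Vec<u8> },
}

/// Ordered list of operations applied atomically by `write_batch`.
#[derive(Default)]
pub struct WriteBatch {
    pub ops: Vec<BatchOp>,
}

impl WriteBatch {
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.ops.push(BatchOp::Put { key: key.to_vec(), value: value.to_vec() });
    }
    pub fn delete(&mut self, key: &[u8]) {
        self.ops.push(BatchOp::Delete { key: key.to_vec() });
    }
}

/// The six operations named in the acceptance criteria (signatures assumed).
pub trait StorageEngine: Send + Sync {
    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError>;
    fn put(&self, key: &[u8], value: &[u8]) -> Result<(), StorageError>;
    fn delete(&self, key: &[u8]) -> Result<(), StorageError>;
    fn scan_prefix(&self, prefix: &[u8]) -> Result<PrefixIterator, StorageError>;
    fn write_batch(&self, batch: WriteBatch) -> Result<(), StorageError>;
    fn flush(&self) -> Result<(), StorageError>;
}

/// Sorted map behind a reader-writer lock: deterministic iteration order,
/// safe to share across threads.
#[derive(Default)]
pub struct InMemoryBackend {
    map: RwLock<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl StorageEngine for InMemoryBackend {
    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError> {
        let map = self.map.read().map_err(|_| StorageError::LockPoisoned)?;
        Ok(map.get(key).cloned())
    }
    fn put(&self, key: &[u8], value: &[u8]) -> Result<(), StorageError> {
        let mut map = self.map.write().map_err(|_| StorageError::LockPoisoned)?;
        map.insert(key.to_vec(), value.to_vec());
        Ok(())
    }
    fn delete(&self, key: &[u8]) -> Result<(), StorageError> {
        let mut map = self.map.write().map_err(|_| StorageError::LockPoisoned)?;
        map.remove(key);
        Ok(())
    }
    fn scan_prefix(&self, prefix: &[u8]) -> Result<PrefixIterator, StorageError> {
        let map = self.map.read().map_err(|_| StorageError::LockPoisoned)?;
        // Keys are sorted, so every key starting with `prefix` sits in one
        // contiguous run beginning at the first key >= `prefix`.
        let hits: Vec<(Vec<u8>, Vec<u8>)> = map
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect();
        Ok(Box::new(hits.into_iter()))
    }
    fn write_batch(&self, batch: WriteBatch) -> Result<(), StorageError> {
        // One write guard for the whole batch: no reader can observe a
        // partially applied batch.
        let mut map = self.map.write().map_err(|_| StorageError::LockPoisoned)?;
        for op in batch.ops {
            match op {
                BatchOp::Put { key, value } => { map.insert(key, value); }
                BatchOp::Delete { key } => { map.remove(&key); }
            }
        }
        Ok(())
    }
    fn flush(&self) -> Result<(), StorageError> {
        Ok(()) // Nothing to persist for an in-memory map.
    }
}
```

Collecting scan results into a `Vec` keeps the sketch short and avoids holding the read lock while the caller iterates; a production backend would more likely stream from the underlying engine.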

## Acceptance Criteria

- [x] `StorageEngine` trait with `get`, `put`, `delete`, `scan_prefix`, `write_batch`, `flush` operations
- [x] Key encoding: `[entity_id: 8 bytes BE][0x00][Tag: 1 byte][suffix...]` with `Tag` enum (`Evt`=0x01, `Sig`=0x02, `Meta`=0x03, `Rel`=0x04, `Mv`=0x05, `Idx`=0x06)
- [x] `encode_key` and `parse_key` round-trip correctly for all tag variants and arbitrary suffixes
- [x] `entity_prefix` (9 bytes) and `entity_tag_prefix` (10 bytes) for scoped prefix scans
- [x] Byte-lexicographic key ordering matches numeric entity ID ordering (property tested)
- [x] `FjallBackend` wraps a single fjall `Keyspace`, implements `StorageEngine`
- [x] `FjallStorage` owns a fjall `Database` with three keyspaces: "items", "users", "creators" (one per `EntityKind`)
- [x] `FjallStorage::backend(EntityKind)` routes to the correct keyspace backend
- [x] Entity kind isolation: same key written to different entity kinds does not collide
- [x] `FjallAtomicBatch` provides cross-keyspace atomic writes via `fjall::OwnedWriteBatch`
- [x] Data persists across close and reopen (`flush_all` + reopen test)
- [x] `InMemoryBackend` uses `BTreeMap` + `RwLock` for deterministic, sorted, concurrent testing
- [x] `WriteBatch` and `BatchOp` types for atomic multi-operation writes
- [x] `PrefixIterator` type alias for boxed prefix scan iterators
- [x] Property tests with proptest: encode/parse round-trip, prefix ordering, prefix containment
- [x] Criterion benchmarks passing
- [x] `cargo fmt` clean, `cargo clippy -D warnings` clean, all 140 tests pass (128 unit + 12 integration)
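A minimal sketch of the key layout in these criteria, assuming `EntityId` is a plain `u64` newtype and that `parse_key` returns `None` on malformed input (both assumptions; the real signatures in `keys.rs` may differ, and `Tag::from_byte` is a hypothetical helper). Big-endian encoding is what makes the ordering criterion hold: comparing big-endian bytes lexicographically gives the same result as comparing the unsigned integers, whereas little-endian would sort entity 256 (`00 01 ...`) before entity 1 (`01 00 ...`).

```rust
/// Data-category discriminant stored in byte 9 of every key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Tag {
    Evt = 0x01,
    Sig = 0x02,
    Meta = 0x03,
    Rel = 0x04,
    Mv = 0x05,
    Idx = 0x06,
}

impl Tag {
    /// Hypothetical inverse of `tag as u8`; unknown bytes yield `None`.
    pub fn from_byte(b: u8) -> Option<Tag> {
        match b {
            0x01 => Some(Tag::Evt),
            0x02 => Some(Tag::Sig),
            0x03 => Some(Tag::Meta),
            0x04 => Some(Tag::Rel),
            0x05 => Some(Tag::Mv),
            0x06 => Some(Tag::Idx),
            _ => None,
        }
    }
}

/// Stand-in for the m1p1 `EntityId` newtype.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct EntityId(pub u64);

/// NUL separator between the entity ID and the tag.
const SEP: u8 = 0x00;

/// `[entity_id: 8 bytes BE][0x00]`: matches every key of one entity (9 bytes).
pub fn entity_prefix(id: EntityId) -> [u8; 9] {
    let mut p = [0u8; 9];
    p[..8].copy_from_slice(&id.0.to_be_bytes());
    p[8] = SEP;
    p
}

/// `[entity_id][0x00][tag]`: matches one data category of one entity (10 bytes).
pub fn entity_tag_prefix(id: EntityId, tag: Tag) -> [u8; 10] {
    let mut p = [0u8; 10];
    p[..9].copy_from_slice(&entity_prefix(id));
    p[9] = tag as u8;
    p
}

/// Full key: the 10-byte entity/tag prefix followed by an arbitrary suffix.
pub fn encode_key(id: EntityId, tag: Tag, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(10 + suffix.len());
    key.extend_from_slice(&entity_tag_prefix(id, tag));
    key.extend_from_slice(suffix);
    key
}

/// Splits a key back into its parts; `None` if it is shorter than 10 bytes,
/// lacks the separator, or carries an unknown tag.
pub fn parse_key(key: &[u8]) -> Option<(EntityId, Tag, &[u8])> {
    if key.len() < 10 || key[8] != SEP {
        return None;
    }
    let mut id_bytes = [0u8; 8];
    id_bytes.copy_from_slice(&key[..8]);
    let tag = Tag::from_byte(key[9])?;
    Some((EntityId(u64::from_be_bytes(id_bytes)), tag, &key[10..]))
}
```

Because every key for an entity shares `entity_prefix`, a prefix scan with that 9-byte value returns all of the entity's data, and the 10-byte `entity_tag_prefix` narrows it to one category (for example, only `Tag::Sig` checkpoints).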

## Dependencies

- **Requires:** m1p1 (types `EntityId` and `EntityKind`, used in key encoding; `StorageError` is part of the `LumenError` error hierarchy)
- **Blocks:** m1p4 (Signal Ledger checkpoints via `StorageEngine`), m1p5 (Entity CRUD via `StorageEngine`)

## Research References

- [thoughts.md](../../../../thoughts.md), Part V.9 (hybrid storage: fjall for entity metadata, WAL for signals) and Part V.12 (subject-prefix keys: `[entity_id][NUL][TAG][suffix]` for co-location and prefix-scan efficiency)
- [CODING_GUIDELINES.md](../../../../CODING_GUIDELINES.md), Section 2 (key encoding: big-endian entity IDs for lexicographic ordering) and Section 10 (fjall as the primary storage backend)

## Task Index

| # | Task | Delivers | Depends On | Complexity | Status |
|---|------|----------|------------|------------|--------|
| 01 | StorageEngine Trait and Key Encoding | `StorageEngine`, `Tag`, `encode_key`, `parse_key`, `entity_prefix`, `entity_tag_prefix`, `WriteBatch`, `BatchOp`, `PrefixIterator`, `StorageError` | None | M | COMPLETE |
| 02 | FjallBackend | `FjallBackend`, `FjallStorage`, `FjallAtomicBatch`, persistence tests | Task 01 | M | COMPLETE |
| 03 | InMemoryBackend | `InMemoryBackend`, property tests, benchmarks | Task 01 | S | COMPLETE |

## Task Dependency DAG

```
Task 01: StorageEngine Trait + Key Encoding
        |
        +---------------------------+
        |                           |
        v                           v
Task 02: FjallBackend      Task 03: InMemoryBackend
```

Tasks 02 and 03 are fully parallelizable after Task 01's trait and key encoding are defined.

## File Layout

```
tidal/src/
  storage/
    mod.rs       -- pub use re-exports
    engine.rs    -- Task 01: StorageEngine trait
    keys.rs      -- Task 01: Tag, encode_key, parse_key, entity_prefix, entity_tag_prefix
    batch.rs     -- Task 01: WriteBatch, BatchOp
    iterator.rs  -- Task 01: PrefixIterator type alias
    error.rs     -- Task 01: StorageError
    fjall.rs     -- Task 02: FjallBackend, FjallStorage, FjallAtomicBatch
    memory.rs    -- Task 03: InMemoryBackend
  lib.rs         -- pub mod storage (already present)
```

## Lessons Learned

1. **Keyspaces are per `EntityKind`**, not per data category. Within each entity-kind keyspace, the `Tag` enum provides the data-category namespace. `FjallStorage` therefore has three keyspaces: "items", "users", and "creators". A `Tag::Meta` key in the "items" keyspace is distinct from a `Tag::Meta` key with the same bytes in the "users" keyspace.

2. **MSRV bumped to 1.91** for fjall 3 compatibility. Documented in `tidal/Cargo.toml`.

3. **`LumenError` name** is a legacy artifact from the predecessor project (Engram/Lumen). It will be renamed to `TidalError` when convenient; the rename does not block m1p3.

4. **`FjallAtomicBatch`** provides cross-keyspace atomicity via `fjall::OwnedWriteBatch`. This is the mechanism for m1p4 checkpoint writes that touch multiple entity kinds atomically.
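The per-kind keyspace routing and cross-keyspace batch from lessons 1 and 4 can be modeled without fjall. The sketch below is a hypothetical in-memory analogue: `MultiKeyspaceStore`, `AtomicBatch`, `keyspace_name`, and the `EntityKind` variant names are invented for illustration, and the code does not use or reproduce fjall's API. It shows isolation (identical key bytes under two kinds never collide) and all-or-nothing visibility (one write lock spans every keyspace a batch touches). It does not model crash durability, which the real backend gets from fjall's write batch.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::RwLock;

/// Hypothetical entity kinds; the real `EntityKind` variants may be named differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntityKind {
    Item,
    User,
    Creator,
}

impl EntityKind {
    /// Keyspace name per kind, matching the three keyspaces listed in this doc.
    pub fn keyspace_name(self) -> &'static str {
        match self {
            EntityKind::Item => "items",
            EntityKind::User => "users",
            EntityKind::Creator => "creators",
        }
    }
}

type Keyspace = BTreeMap<Vec<u8>, Vec<u8>>;

/// Pending operations; a `None` value means delete.
#[derive(Default)]
pub struct AtomicBatch {
    ops: Vec<(EntityKind, Vec<u8>, Option<Vec<u8>>)>,
}

impl AtomicBatch {
    pub fn put(&mut self, kind: EntityKind, key: &[u8], value: &[u8]) {
        self.ops.push((kind, key.to_vec(), Some(value.to_vec())));
    }
    pub fn delete(&mut self, kind: EntityKind, key: &[u8]) {
        self.ops.push((kind, key.to_vec(), None));
    }
}

/// In-memory analogue of `FjallStorage`: one sorted keyspace per entity kind,
/// all guarded by a single lock.
#[derive(Default)]
pub struct MultiKeyspaceStore {
    keyspaces: RwLock<HashMap<&'static str, Keyspace>>,
}

impl MultiKeyspaceStore {
    // `unwrap` on the lock keeps the sketch short; it panics only if another
    // thread panicked while holding the lock.
    pub fn put(&self, kind: EntityKind, key: &[u8], value: &[u8]) {
        let mut spaces = self.keyspaces.write().unwrap();
        spaces.entry(kind.keyspace_name()).or_default().insert(key.to_vec(), value.to_vec());
    }
    pub fn get(&self, kind: EntityKind, key: &[u8]) -> Option<Vec<u8>> {
        let spaces = self.keyspaces.read().unwrap();
        spaces.get(&kind.keyspace_name()).and_then(|space| space.get(key).cloned())
    }
    /// Applies every op under one write guard, so readers see all of the
    /// batch or none of it, whichever keyspaces it touches.
    pub fn commit(&self, batch: AtomicBatch) {
        let mut spaces = self.keyspaces.write().unwrap();
        for (kind, key, value) in batch.ops {
            let space = spaces.entry(kind.keyspace_name()).or_default();
            match value {
                Some(v) => { space.insert(key, v); }
                None => { space.remove(&key); }
            }
        }
    }
}
```

Routing by kind before touching any key is what lets the `Tag` layout stay identical across keyspaces: the keyspace, not the key bytes, decides which entity kind a record belongs to.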