tidaldb/docs/planning/milestone-1/phase-3/OVERVIEW.md
jordan 29400d48db feat: implement Milestone 1 phases 1-3 — schema, WAL, and storage layer
Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers,
  half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet
  acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture
overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 16:43:24 -07:00


Milestone 1, Phase 3: Storage Engine Trait and fjall Backend

Status: COMPLETE (140 tests passing: 128 unit + 12 integration)

Phase Deliverable

The StorageEngine trait abstraction and two implementations: FjallBackend (fjall 3 LSM-tree) for production and InMemoryBackend (BTreeMap + RwLock) for deterministic testing. Key encoding follows the subject-prefix pattern with a Tag discriminant. FjallStorage coordinates three keyspaces, one per entity kind. FjallAtomicBatch provides cross-keyspace atomic writes.

This phase delivers the durable entity store, where metadata, signal checkpoints, and index data live. It is separate from the WAL (m1p2): the WAL is the source of truth for signal events, while the storage engine persists derived entity state.
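As a rough illustration of the abstraction described above, the following is a minimal sketch of a StorageEngine-style trait with a BTreeMap + RwLock backend. The method signatures, the field layout of WriteBatch and BatchOp, and the absence of error handling are all assumptions made for brevity; the actual trait presumably returns results carrying StorageError and may differ in other ways.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// One operation inside a batch (hypothetical shape).
pub enum BatchOp {
    Put { key: Vec<u8>, value: Vec<u8> },
    Delete { key: Vec<u8> },
}

/// Ordered list of operations applied together (hypothetical shape).
#[derive(Default)]
pub struct WriteBatch {
    pub ops: Vec<BatchOp>,
}

/// Boxed iterator over (key, value) pairs that share a prefix.
pub type PrefixIterator<'a> = Box<dyn Iterator<Item = (Vec<u8>, Vec<u8>)> + 'a>;

/// Simplified trait: error handling is omitted in this sketch.
pub trait StorageEngine {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&self, key: &[u8], value: &[u8]);
    fn delete(&self, key: &[u8]);
    fn scan_prefix(&self, prefix: &[u8]) -> PrefixIterator<'_>;
    fn write_batch(&self, batch: WriteBatch);
    fn flush(&self);
}

/// Sorted, lock-protected map: iteration order is deterministic.
#[derive(Default)]
pub struct InMemoryBackend {
    map: RwLock<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl StorageEngine for InMemoryBackend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.map.read().unwrap().get(key).cloned()
    }

    fn put(&self, key: &[u8], value: &[u8]) {
        self.map.write().unwrap().insert(key.to_vec(), value.to_vec());
    }

    fn delete(&self, key: &[u8]) {
        self.map.write().unwrap().remove(key);
    }

    fn scan_prefix(&self, prefix: &[u8]) -> PrefixIterator<'_> {
        // Copy matching rows out so the read lock is released before iteration.
        let guard = self.map.read().unwrap();
        let rows: Vec<(Vec<u8>, Vec<u8>)> = guard
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect();
        Box::new(rows.into_iter())
    }

    fn write_batch(&self, batch: WriteBatch) {
        // A single write lock covers the whole batch, so readers never
        // observe a partially applied batch.
        let mut guard = self.map.write().unwrap();
        for op in batch.ops {
            match op {
                BatchOp::Put { key, value } => {
                    guard.insert(key, value);
                }
                BatchOp::Delete { key } => {
                    guard.remove(&key);
                }
            }
        }
    }

    fn flush(&self) {
        // Nothing to persist for an in-memory map.
    }
}
```

In this sketch the prefix scan starts the range at the prefix itself and stops at the first key that no longer matches, which relies on the map being sorted byte-lexicographically.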

Acceptance Criteria

  • StorageEngine trait with get, put, delete, scan_prefix, write_batch, flush operations
  • Key encoding: [entity_id: 8 bytes BE][0x00][Tag: 1 byte][suffix...] with Tag enum (Evt=0x01, Sig=0x02, Meta=0x03, Rel=0x04, Mv=0x05, Idx=0x06)
  • encode_key, parse_key roundtrip correctly for all tag variants and arbitrary suffixes
  • entity_prefix (9 bytes) and entity_tag_prefix (10 bytes) for scoped prefix scans
  • Byte-lexicographic key ordering matches numeric entity ID ordering (property tested)
  • FjallBackend wraps a single fjall Keyspace and implements StorageEngine
  • FjallStorage owns a fjall Database with three keyspaces: "items", "users", "creators" (one per EntityKind)
  • FjallStorage::backend(EntityKind) routes to the correct keyspace backend
  • Entity kind isolation: same key written to different entity kinds does not collide
  • FjallAtomicBatch provides cross-keyspace atomic writes via fjall::OwnedWriteBatch
  • Data persists across close and reopen (flush_all + reopen test)
  • InMemoryBackend uses BTreeMap + RwLock for deterministic, sorted, concurrent testing
  • WriteBatch and BatchOp types for atomic multi-operation writes
  • PrefixIterator type alias for boxed prefix scan iterators
  • Property tests with proptest: encode/parse roundtrip, prefix ordering, prefix containment
  • Criterion benchmarks build and run without errors
  • cargo fmt clean, cargo clippy -D warnings clean, all 140 tests pass (128 unit + 12 integration)
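The key layout in the criteria above can be sketched as follows. The Tag values and the 9-byte and 10-byte prefix lengths come from this document; the function signatures, the use of a bare u64 in place of the EntityId newtype, and the Option return from parse_key are assumptions made for illustration.

```rust
/// Data-category discriminant stored at byte offset 9 of every key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Tag {
    Evt = 0x01,
    Sig = 0x02,
    Meta = 0x03,
    Rel = 0x04,
    Mv = 0x05,
    Idx = 0x06,
}

impl Tag {
    /// Hypothetical helper: map a raw byte back to a Tag.
    pub fn from_byte(b: u8) -> Option<Tag> {
        match b {
            0x01 => Some(Tag::Evt),
            0x02 => Some(Tag::Sig),
            0x03 => Some(Tag::Meta),
            0x04 => Some(Tag::Rel),
            0x05 => Some(Tag::Mv),
            0x06 => Some(Tag::Idx),
            _ => None,
        }
    }
}

/// [entity_id: 8 bytes BE][0x00] = 9 bytes; matches every key of one entity.
pub fn entity_prefix(entity_id: u64) -> [u8; 9] {
    let mut out = [0u8; 9]; // byte 8 stays 0x00 as the separator
    out[..8].copy_from_slice(&entity_id.to_be_bytes());
    out
}

/// [entity_id][0x00][tag] = 10 bytes; matches one data category of one entity.
pub fn entity_tag_prefix(entity_id: u64, tag: Tag) -> [u8; 10] {
    let mut out = [0u8; 10];
    out[..9].copy_from_slice(&entity_prefix(entity_id));
    out[9] = tag as u8;
    out
}

/// Full key: 10-byte prefix followed by an arbitrary suffix.
pub fn encode_key(entity_id: u64, tag: Tag, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(10 + suffix.len());
    key.extend_from_slice(&entity_tag_prefix(entity_id, tag));
    key.extend_from_slice(suffix);
    key
}

/// Inverse of encode_key; returns None for malformed input.
pub fn parse_key(key: &[u8]) -> Option<(u64, Tag, &[u8])> {
    if key.len() < 10 || key[8] != 0x00 {
        return None;
    }
    let mut id = [0u8; 8];
    id.copy_from_slice(&key[..8]);
    let tag = Tag::from_byte(key[9])?;
    Some((u64::from_be_bytes(id), tag, &key[10..]))
}
```

Because the entity ID leads the key, all data for one entity is contiguous in sorted order, and a scan over entity_tag_prefix narrows that range to a single Tag.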

Dependencies

  • Requires: m1p1 (types: EntityId, EntityKind, both used in key encoding; StorageError references the LumenError hierarchy)
  • Blocks: m1p4 (Signal Ledger checkpoints via StorageEngine), m1p5 (Entity CRUD via StorageEngine)

Research References

  • thoughts.md — Part V.9 (hybrid storage: fjall for entity metadata, WAL for signals), Part V.12 (subject-prefix keys: [entity_id][NUL][TAG][suffix] for co-location and prefix scan efficiency)
  • CODING_GUIDELINES.md — Section 2 (key encoding: big-endian entity IDs for lexicographic ordering), Section 10 (fjall as the primary storage backend)
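The reason for big-endian entity IDs can be shown with a small comparison. This is an illustrative sketch, and the helper names be_cmp and le_cmp are hypothetical: comparing the big-endian bytes of two u64 values gives the same result as comparing the numbers, while little-endian bytes do not.

```rust
use std::cmp::Ordering;

/// Compare two IDs by their big-endian byte encoding.
pub fn be_cmp(a: u64, b: u64) -> Ordering {
    a.to_be_bytes().cmp(&b.to_be_bytes())
}

/// Compare two IDs by their little-endian byte encoding.
pub fn le_cmp(a: u64, b: u64) -> Ordering {
    a.to_le_bytes().cmp(&b.to_le_bytes())
}
```

For example, 255 and 256 compare as Less under be_cmp, matching numeric order, but as Greater under le_cmp, because the least significant byte (0xFF versus 0x00) is compared first.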

Task Index

| # | Task | Delivers | Depends On | Complexity | Status |
|---|------|----------|------------|------------|--------|
| 01 | StorageEngine Trait and Key Encoding | StorageEngine, Tag, encode_key, parse_key, entity_prefix, entity_tag_prefix, WriteBatch, BatchOp, PrefixIterator, StorageError | None | M | COMPLETE |
| 02 | FjallBackend | FjallBackend, FjallStorage, FjallAtomicBatch, persistence tests | Task 01 | M | COMPLETE |
| 03 | InMemoryBackend | InMemoryBackend, property tests, benchmarks | Task 01 | S | COMPLETE |

Task Dependency DAG

Task 01: StorageEngine Trait + Key Encoding
    |
    +---------------------------+
    |                           |
    v                           v
Task 02: FjallBackend       Task 03: InMemoryBackend

Tasks 02 and 03 are fully parallelizable after Task 01's trait and key encoding are defined.

File Layout

tidal/src/
  storage/
    mod.rs          -- pub use re-exports
    engine.rs       -- Task 01: StorageEngine trait
    keys.rs         -- Task 01: Tag, encode_key, parse_key, entity_prefix, entity_tag_prefix
    batch.rs        -- Task 01: WriteBatch, BatchOp
    iterator.rs     -- Task 01: PrefixIterator type alias
    error.rs        -- Task 01: StorageError
    fjall.rs        -- Task 02: FjallBackend, FjallStorage, FjallAtomicBatch
    memory.rs       -- Task 03: InMemoryBackend
  lib.rs            -- pub mod storage (already present)

Lessons Learned

  1. Keyspaces are per EntityKind, not per data category. The Tag enum provides the data-category namespace within each entity-kind keyspace. FjallStorage therefore has three keyspaces: "items", "users", "creators". A Tag::Meta key in the "items" keyspace is distinct from the same Tag::Meta key in the "users" keyspace.

  2. MSRV bumped to 1.91 for fjall 3 compatibility. Documented in tidal/Cargo.toml.

  3. The LumenError name is a legacy artifact from the predecessor project (Engram/Lumen). It will be renamed to TidalError when convenient; the rename does not block m1p3 progress.

  4. FjallAtomicBatch provides cross-keyspace atomicity via fjall::OwnedWriteBatch. This is the mechanism for m1p4 checkpoint writes that touch multiple entity kinds atomically.
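Lessons 1 and 4 can be modelled without fjall. The sketch below is an in-memory stand-in, not the fjall API: ToyStorage, write_atomic, and the EntityKind variant names are hypothetical, and atomicity here means only that readers never see a partially applied batch, not crash safety. Only the keyspace names "items", "users", and "creators" are taken from this document.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Variant names are assumed; the document only names the keyspaces.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum EntityKind {
    Item,
    User,
    Creator,
}

impl EntityKind {
    /// Keyspace name for each kind, as listed in the acceptance criteria.
    pub fn keyspace_name(self) -> &'static str {
        match self {
            EntityKind::Item => "items",
            EntityKind::User => "users",
            EntityKind::Creator => "creators",
        }
    }
}

type Keyspace = BTreeMap<Vec<u8>, Vec<u8>>;

/// In-memory stand-in: one sorted map per entity kind behind one lock.
#[derive(Default)]
pub struct ToyStorage {
    keyspaces: RwLock<BTreeMap<EntityKind, Keyspace>>,
}

impl ToyStorage {
    pub fn put(&self, kind: EntityKind, key: &[u8], value: &[u8]) {
        let mut guard = self.keyspaces.write().unwrap();
        guard.entry(kind).or_default().insert(key.to_vec(), value.to_vec());
    }

    pub fn get(&self, kind: EntityKind, key: &[u8]) -> Option<Vec<u8>> {
        let guard = self.keyspaces.read().unwrap();
        guard.get(&kind).and_then(|ks| ks.get(key).cloned())
    }

    /// Apply puts to several keyspaces under a single write lock.
    pub fn write_atomic(&self, ops: &[(EntityKind, Vec<u8>, Vec<u8>)]) {
        let mut guard = self.keyspaces.write().unwrap();
        for (kind, key, value) in ops {
            guard.entry(*kind).or_default().insert(key.clone(), value.clone());
        }
    }
}
```

Writing the same key under EntityKind::Item and EntityKind::User yields two independent entries, which is the isolation property the acceptance criteria call for.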