Milestone 1, Phase 5: Entity CRUD and Signal Write API
Phase Deliverable
The public API surface for Milestone 1: TidalDB::open(), TidalDB::shutdown(), entity metadata write/read, and the signal() method that writes through the WAL and updates in-memory state. This is the interface the M1 UAT scenario tests against -- the first thing a developer touches when they embed tidalDB.
m1p5 is the integration layer. It does not introduce new algorithms or data structures. It composes m1p1 (schema types), m1p2 (WAL), m1p3 (storage engine), and m1p4 (signal ledger) into a single struct that presents a clean, ergonomic API.
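As a rough illustration of the "single struct" shape described above, the following sketch wraps one stand-in component behind a lock and exposes item read/write on `&self`. Everything here is hypothetical: `ToyDb`, its field, and the method signatures are placeholders for illustration, not the actual tidalDB types, and the real struct would own the WAL, storage engine, and signal ledger instead of a bare map.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the integration struct. The real TidalDB
/// composes the m1p1-m1p4 components; this toy holds only a metadata map.
pub struct ToyDb {
    items: RwLock<BTreeMap<u64, Vec<u8>>>,
}

impl ToyDb {
    /// Assumed constructor; the real open() takes a config and returns a Result.
    pub fn open() -> Self {
        ToyDb { items: RwLock::new(BTreeMap::new()) }
    }

    /// Stores metadata as an opaque byte blob keyed by entity id.
    pub fn write_item(&self, id: u64, metadata: &[u8]) {
        self.items
            .write()
            .expect("lock poisoned")
            .insert(id, metadata.to_vec());
    }

    /// Returns a copy of the stored blob, or None if the id is unknown.
    pub fn read_item(&self, id: u64) -> Option<Vec<u8>> {
        self.items.read().expect("lock poisoned").get(&id).cloned()
    }
}

/// Compiles only if T is Send + Sync, so the bound is checked at build time.
fn assert_send_sync<T: Send + Sync>() {}

/// Calling this proves (at compile time) that ToyDb can be shared via Arc.
pub fn check_thread_safety() {
    assert_send_sync::<ToyDb>();
}
```

Because every method takes `&self` and the interior state sits behind an `RwLock`, callers can wrap the value in `Arc` and clone the handle into worker threads without further synchronization.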
Acceptance Criteria
- TidalDB::open(config) opens storage, creates signal ledger, restores from checkpoint + WAL replay, returns Result<TidalDB>
- TidalDB::shutdown() checkpoints all in-memory state, syncs WAL, closes storage cleanly
- db.write_item(id, metadata) stores entity metadata via StorageEngine::put with Tag::Meta
- db.read_item(id) retrieves entity metadata
- db.signal(signal_type, entity_id, weight, timestamp) atomically: appends to WAL, updates decay scores, updates windowed counters
- db.read_decay_score(entity_id, signal_type, decay_rate_idx, query_time) returns current decayed score
- db.read_windowed_count(entity_id, signal_type, window) returns count within window
- db.read_velocity(entity_id, signal_type, window) returns count / window_duration
- Full M1 UAT scenario passes as an integration test
- TidalDB is Send + Sync -- safe to share across threads behind Arc
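The three read criteria above can be illustrated with plain functions over a list of events. This is a minimal sketch under stated assumptions, not the ledger's implementation: the `Event` type and all function names are hypothetical, the decay model is assumed to be exponential with lambda = ln(2) / half_life, and the window is assumed to be the half-open interval (now - window, now]. Only the velocity formula (count / window_duration) is taken directly from the criteria.

```rust
/// One recorded signal event. Hypothetical representation for illustration.
#[derive(Clone, Copy, Debug)]
pub struct Event {
    pub ts_secs: f64,
    pub weight: f64,
}

/// Decay constant from a half-life: lambda = ln(2) / half_life (assumed model).
pub fn lambda_from_half_life(half_life_secs: f64) -> f64 {
    std::f64::consts::LN_2 / half_life_secs
}

/// Sum of weight * exp(-lambda * age) over events at or before `query_time`.
pub fn decay_score(events: &[Event], lambda: f64, query_time: f64) -> f64 {
    events
        .iter()
        .filter(|e| e.ts_secs <= query_time)
        .map(|e| e.weight * (-lambda * (query_time - e.ts_secs)).exp())
        .sum()
}

/// Number of events with a timestamp in (now - window, now].
pub fn windowed_count(events: &[Event], window_secs: f64, now: f64) -> u64 {
    events
        .iter()
        .filter(|e| e.ts_secs > now - window_secs && e.ts_secs <= now)
        .count() as u64
}

/// Velocity as stated in the criteria: count / window_duration.
pub fn velocity(count: u64, window_secs: f64) -> f64 {
    count as f64 / window_secs
}
```

With a one-hour half-life, an event of weight 1 that is exactly one hour old contributes 0.5 to the score, which makes the functions easy to sanity-check by hand.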
Dependencies
- Requires: m1p1 (types), m1p2 (WAL), m1p3 (storage engine), m1p4 (signal ledger)
- Blocks: Milestone 2 (ranked retrieval)
Research References
- CODING_GUIDELINES.md -- Section 9 (public API surface), Section 7 (error handling)
- API.md -- Initialization, write path, lifecycle
Spec References
- docs/specs/00-architecture-overview.md -- Section 2 (system diagram: write path and read path separation), Section 8 (code module map showing lib.rs as TidalDB struct and public API)
- docs/specs/03-signal-system.md -- Section 8 (signal write path: WAL append -> hot-tier update -> warm-tier update -> return)
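The write-path ordering cited from 03-signal-system.md (WAL append -> hot-tier update -> warm-tier update -> return) might look roughly like the sketch below. All names here (`ToyLedger`, `SignalEvent`, the field layout) are hypothetical, and two things are assumptions rather than facts from the specs: that the hot tier holds a running exponentially decayed score, and that the warm tier holds raw timestamps for windowed counts. The WAL is a plain `Vec` standing in for the durable log.

```rust
use std::collections::HashMap;

/// Hypothetical event record; the real WAL entry format differs.
#[derive(Clone, Debug, PartialEq)]
pub struct SignalEvent {
    pub signal_type: String,
    pub entity_id: u64,
    pub weight: f64,
    pub timestamp_secs: u64,
}

type Key = (u64, String);

pub struct ToyLedger {
    lambda: f64,
    pub wal: Vec<SignalEvent>,         // stand-in for the durable log
    hot: HashMap<Key, (f64, u64)>,     // (running score, time of last update)
    warm: HashMap<Key, Vec<u64>>,      // raw timestamps for windowed counts
}

impl ToyLedger {
    pub fn new(half_life_secs: f64) -> Self {
        ToyLedger {
            lambda: std::f64::consts::LN_2 / half_life_secs,
            wal: Vec::new(),
            hot: HashMap::new(),
            warm: HashMap::new(),
        }
    }

    pub fn signal(&mut self, ev: SignalEvent) -> Result<(), String> {
        // Validate before touching the log so rejected events leave no trace.
        if !ev.weight.is_finite() {
            return Err("weight must be finite".to_string());
        }
        // Step 1: WAL append comes first, so in-memory state is recoverable.
        self.wal.push(ev.clone());
        // Step 2: hot tier. Decay the running score to the newer timestamp.
        let lambda = self.lambda;
        let key: Key = (ev.entity_id, ev.signal_type.clone());
        let (score, last) = self
            .hot
            .entry(key.clone())
            .or_insert((0.0, ev.timestamp_secs));
        if ev.timestamp_secs >= *last {
            let dt = (ev.timestamp_secs - *last) as f64;
            *score = *score * (-lambda * dt).exp() + ev.weight;
            *last = ev.timestamp_secs;
        } else {
            // Late arrival: decay the new weight forward to `last` instead.
            let dt = (*last - ev.timestamp_secs) as f64;
            *score += ev.weight * (-lambda * dt).exp();
        }
        // Step 3: warm tier. Record the timestamp for windowed counting.
        self.warm.entry(key).or_default().push(ev.timestamp_secs);
        Ok(())
    }

    /// Running score decayed from its last update to `query_time_secs`.
    pub fn read_decay_score(&self, entity_id: u64, signal_type: &str, query_time_secs: u64) -> f64 {
        match self.hot.get(&(entity_id, signal_type.to_string())) {
            Some(&(score, last)) => {
                let dt = query_time_secs.saturating_sub(last) as f64;
                score * (-self.lambda * dt).exp()
            }
            None => 0.0,
        }
    }

    /// Number of events with a timestamp in (now - window, now].
    pub fn read_windowed_count(&self, entity_id: u64, signal_type: &str, window_secs: u64, now_secs: u64) -> usize {
        self.warm
            .get(&(entity_id, signal_type.to_string()))
            .map(|ts| ts.iter().filter(|&&t| t <= now_secs && t + window_secs > now_secs).count())
            .unwrap_or(0)
    }
}
```

Appending to the log before mutating the tiers means that a crash between steps loses no acknowledged event, because replaying the log rebuilds both tiers.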
Task Index
| # | Task | Delivers | Depends On | Complexity |
|---|---|---|---|---|
| 01 | TidalDB Core | TidalDB struct, Config, open(), shutdown(), entity metadata CRUD | None | M |
| 02 | Signal Write and Read API | db.signal(), db.read_decay_score(), db.read_windowed_count(), db.read_velocity() | Task 01 | S |
| 03 | Integration Test and UAT | Full M1 UAT scenario as integration test, multi-threaded safety test | Task 01, Task 02 | S |
Task Dependency DAG
Task 01: TidalDB Core (struct, open, shutdown, entity CRUD)
|
v
Task 02: Signal Write and Read API (signal, read_decay_score, etc.)
|
v
Task 03: Integration Test and UAT (full M1 scenario, multi-threaded test)
Linear dependency chain. Each task builds directly on the previous.
File Layout
tidal/src/
  lib.rs -- TidalDB struct, Config, public API, re-exports (MODIFIED)
  signals/
    mod.rs -- (unchanged from m1p4)
    hot.rs -- (unchanged)
    warm.rs -- (unchanged)
    ledger.rs -- (unchanged)
    checkpoint.rs -- (unchanged)
  storage/ -- (unchanged from m1p3)
  schema/ -- (unchanged from m1p1)
  wal/mod.rs -- (m1p2, provides WalWriter impl)
  query/mod.rs -- empty (Milestone 2)
  ranking/mod.rs -- empty (Milestone 2)
tidal/tests/
  m1_uat.rs -- Task 03: Full M1 UAT integration test
Open Questions
- String IDs vs numeric IDs in public API -- API.md uses string IDs ("item_abc"). Internal types use EntityId(u64). For M1, the public API accepts EntityId directly (the internal type). String-to-u64 mapping is an M2 concern when the query language parser is built. This simplifies M1 without limiting future API evolution.
- Entity metadata format -- M1 stores entity metadata as opaque bytes. The application serializes metadata to bytes before calling write_item. Structured metadata fields (title, category, etc.) are an M2 concern when metadata indexes are built. For M1, metadata is a &[u8] blob stored at Tag::Meta.
- WAL integration -- m1p5 connects the WAL (m1p2) to the signal ledger (m1p4) through the WalWriter trait. The TidalDB::open() sequence is: open storage -> restore signal ledger from checkpoint -> replay WAL from checkpoint sequence -> ready. If m1p2 is not complete when m1p5 starts, the NoopWalWriter is used for testing, and WAL integration is added when m1p2 delivers.
- User and creator entities -- M1 only supports Item entities. Users and creators are deferred to M3. TidalDB exposes write_item/read_item but not write_user/write_creator. The underlying FjallStorage already has keyspaces for all three entity kinds.
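The open sequence described under WAL integration (restore from checkpoint, then replay the WAL from the checkpoint sequence) can be sketched as a pure function. This is an illustration under assumptions, not the m1p2/m1p4 code: `Checkpoint`, `WalEntry`, and `recover` are hypothetical names, the ledger state is reduced to a per-entity event count, and sequence numbers are assumed to start at 1 so that `last_seq == 0` means "nothing applied yet".

```rust
use std::collections::BTreeMap;

/// Hypothetical checkpoint: state plus the last WAL sequence it includes.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Checkpoint {
    pub last_seq: u64,
    pub counts: BTreeMap<u64, u64>, // entity id -> number of events applied
}

/// Hypothetical WAL entry, reduced to a sequence number and an entity id.
#[derive(Clone, Debug)]
pub struct WalEntry {
    pub seq: u64,
    pub entity_id: u64,
}

/// Restore from the checkpoint (or empty state), then apply only the WAL
/// entries whose sequence number is newer than the checkpoint.
pub fn recover(checkpoint: Option<Checkpoint>, wal: &[WalEntry]) -> Checkpoint {
    let mut state = checkpoint.unwrap_or_default();
    for entry in wal {
        if entry.seq <= state.last_seq {
            continue; // already reflected in the checkpoint; skip to stay idempotent
        }
        *state.counts.entry(entry.entity_id).or_insert(0) += 1;
        state.last_seq = entry.seq;
    }
    state
}
```

Skipping entries at or below the checkpoint sequence makes recovery idempotent: running it twice over the same log yields the same state, which is the property a crash-recovery test would check.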