Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**

- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers, half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**

- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**

- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Milestone 1, Phase 5: Entity CRUD and Signal Write API

## Phase Deliverable

The public API surface for Milestone 1: `TidalDB::open()`, `TidalDB::shutdown()`, entity metadata write/read, and the `signal()` method that writes through the WAL and updates in-memory state. This is the interface the M1 UAT scenario tests against -- the first thing a developer touches when they embed tidalDB.

m1p5 is the integration layer. It does not introduce new algorithms or data structures. It composes m1p1 (schema types), m1p2 (WAL), m1p3 (storage engine), and m1p4 (signal ledger) into a single struct that presents a clean, ergonomic API.

## Acceptance Criteria

- [ ] `TidalDB::open(config)` opens storage, creates signal ledger, restores from checkpoint + WAL replay, returns `Result<TidalDB>`
- [ ] `TidalDB::shutdown()` checkpoints all in-memory state, syncs WAL, closes storage cleanly
- [ ] `db.write_item(id, metadata)` stores entity metadata via `StorageEngine::put` with `Tag::Meta`
- [ ] `db.read_item(id)` retrieves entity metadata
- [ ] `db.signal(signal_type, entity_id, weight, timestamp)` atomically: appends to WAL, updates decay scores, updates windowed counters
- [ ] `db.read_decay_score(entity_id, signal_type, decay_rate_idx, query_time)` returns current decayed score
- [ ] `db.read_windowed_count(entity_id, signal_type, window)` returns count within window
- [ ] `db.read_velocity(entity_id, signal_type, window)` returns count / window_duration
- [ ] Full M1 UAT scenario passes as an integration test
- [ ] `TidalDB` is `Send + Sync` -- safe to share across threads behind `Arc`
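
The signal write and read criteria above can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: `SketchDb` and `SignalState` are hypothetical stand-ins (no WAL, no storage engine), a single decay rate replaces `decay_rate_idx`, the windowed reads take an explicit `now_secs` argument, timestamps are seconds as `f64` and arrive in non-decreasing order, and decay uses the common exponential form `score * exp(-λ * Δt)` with `λ = ln 2 / half_life`. The real m1p4 ledger may differ on each point.

```rust
use std::collections::HashMap;

/// Mirrors the `EntityId` newtype from m1p1 (a `u64` wrapper).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct EntityId(pub u64);

/// Hypothetical per-(entity, signal type) state: a lazily decayed score
/// plus raw event timestamps used for windowed counts.
#[derive(Default)]
struct SignalState {
    score: f64,
    last_update_secs: f64,
    event_times_secs: Vec<f64>,
}

/// Hypothetical in-memory stand-in for `TidalDB`.
pub struct SketchDb {
    lambda: f64, // assumed: ln(2) / half_life
    state: HashMap<(EntityId, String), SignalState>,
}

impl SketchDb {
    pub fn new(half_life_secs: f64) -> Self {
        Self { lambda: std::f64::consts::LN_2 / half_life_secs, state: HashMap::new() }
    }

    /// Decays the stored score forward to `timestamp_secs`, then adds `weight`.
    pub fn signal(&mut self, signal_type: &str, id: EntityId, weight: f64, timestamp_secs: f64) {
        let lambda = self.lambda;
        let s = self.state.entry((id, signal_type.to_string())).or_default();
        let dt = (timestamp_secs - s.last_update_secs).max(0.0);
        s.score = s.score * (-lambda * dt).exp() + weight;
        s.last_update_secs = s.last_update_secs.max(timestamp_secs);
        s.event_times_secs.push(timestamp_secs);
    }

    /// Returns the score decayed to `query_time_secs`, or 0.0 for unknown keys.
    pub fn read_decay_score(&self, id: EntityId, signal_type: &str, query_time_secs: f64) -> f64 {
        match self.state.get(&(id, signal_type.to_string())) {
            Some(s) => {
                let dt = (query_time_secs - s.last_update_secs).max(0.0);
                s.score * (-self.lambda * dt).exp()
            }
            None => 0.0,
        }
    }

    /// Counts events with timestamps in `(now_secs - window_secs, now_secs]`.
    pub fn read_windowed_count(&self, id: EntityId, signal_type: &str, window_secs: f64, now_secs: f64) -> usize {
        self.state
            .get(&(id, signal_type.to_string()))
            .map(|s| {
                s.event_times_secs
                    .iter()
                    .filter(|&&t| t > now_secs - window_secs && t <= now_secs)
                    .count()
            })
            .unwrap_or(0)
    }

    /// Velocity as defined in the criteria: count / window_duration.
    pub fn read_velocity(&self, id: EntityId, signal_type: &str, window_secs: f64, now_secs: f64) -> f64 {
        self.read_windowed_count(id, signal_type, window_secs, now_secs) as f64 / window_secs
    }
}

fn main() {
    let mut db = SketchDb::new(3600.0); // one-hour half-life
    let item = EntityId(42);
    db.signal("view", item, 1.0, 0.0);
    db.signal("view", item, 1.0, 3600.0);
    // One half-life after the first event it contributes 0.5, plus 1.0 for the second.
    println!("score    = {:.3}", db.read_decay_score(item, "view", 3600.0));
    println!("count    = {}", db.read_windowed_count(item, "view", 1800.0, 3600.0));
    println!("velocity = {:.6}", db.read_velocity(item, "view", 1800.0, 3600.0));
}
```

Storing the score together with its last update time lets reads apply decay lazily at `query_time`, so no background job needs to touch idle entities. Whether the real hot tier works this way is not stated in this document.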

## Dependencies

- **Requires:** m1p1 (types), m1p2 (WAL), m1p3 (storage engine), m1p4 (signal ledger)
- **Blocks:** Milestone 2 (ranked retrieval)

## Research References

- [CODING_GUIDELINES.md](../../../../CODING_GUIDELINES.md) -- Section 9 (public API surface), Section 7 (error handling)
- [API.md](../../../../API.md) -- Initialization, write path, lifecycle

## Spec References

- [docs/specs/00-architecture-overview.md](../../../specs/00-architecture-overview.md) -- Section 2 (system diagram: write path and read path separation), Section 8 (code module map showing `lib.rs` as TidalDB struct and public API)
- [docs/specs/03-signal-system.md](../../../specs/03-signal-system.md) -- Section 8 (signal write path: WAL append -> hot-tier update -> warm-tier update -> return)

## Task Index

| # | Task | Delivers | Depends On | Complexity |
|---|------|----------|------------|------------|
| 01 | TidalDB Core | `TidalDB` struct, `Config`, `open()`, `shutdown()`, entity metadata CRUD | None | M |
| 02 | Signal Write and Read API | `db.signal()`, `db.read_decay_score()`, `db.read_windowed_count()`, `db.read_velocity()` | Task 01 | S |
| 03 | Integration Test and UAT | Full M1 UAT scenario as integration test, multi-threaded safety test | Task 01, Task 02 | S |
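
The multi-threaded safety test in Task 03 could pair a compile-time `Send + Sync` assertion with a small concurrent-write check. The sketch below is one possible shape, not the planned test: `SketchDb` is a hypothetical stand-in that keeps a per-entity counter behind an `RwLock`, and `assert_send_sync` and `run_concurrent_writes` are illustrative helper names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for `TidalDB`: interior mutability behind a lock,
/// so methods take `&self` and the value can be shared through `Arc`.
pub struct SketchDb {
    counts: RwLock<HashMap<u64, u64>>,
}

impl SketchDb {
    pub fn new() -> Self {
        Self { counts: RwLock::new(HashMap::new()) }
    }

    /// Records one signal for `entity_id`.
    pub fn signal(&self, entity_id: u64) {
        let mut counts = self.counts.write().expect("lock poisoned");
        *counts.entry(entity_id).or_insert(0) += 1;
    }

    /// Returns how many signals have been recorded for `entity_id`.
    pub fn count(&self, entity_id: u64) -> u64 {
        let counts = self.counts.read().expect("lock poisoned");
        counts.get(&entity_id).copied().unwrap_or(0)
    }
}

/// Compiles only if `T` is `Send + Sync`; calling it does nothing at runtime.
pub fn assert_send_sync<T: Send + Sync>() {}

/// Spawns `threads` writers that each record `per_thread` signals for
/// entity 1, then returns the total count observed after all have joined.
pub fn run_concurrent_writes(threads: usize, per_thread: usize) -> u64 {
    let db = Arc::new(SketchDb::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let db = Arc::clone(&db);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    db.signal(1);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("writer thread panicked");
    }
    db.count(1)
}

fn main() {
    assert_send_sync::<SketchDb>();
    let total = run_concurrent_writes(4, 1000);
    println!("total signals recorded: {total}");
}
```

The generic-bound assertion turns the `Send + Sync` criterion into a build failure rather than a runtime check, so a field that breaks thread safety is caught as soon as the test crate compiles.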

## Task Dependency DAG

```
Task 01: TidalDB Core (struct, open, shutdown, entity CRUD)
    |
    v
Task 02: Signal Write and Read API (signal, read_decay_score, etc.)
    |
    v
Task 03: Integration Test and UAT (full M1 scenario, multi-threaded test)
```

Linear dependency chain. Each task builds directly on the previous.

## File Layout

```
tidal/src/
  lib.rs              -- TidalDB struct, Config, public API, re-exports (MODIFIED)
  signals/
    mod.rs            -- (unchanged from m1p4)
    hot.rs            -- (unchanged)
    warm.rs           -- (unchanged)
    ledger.rs         -- (unchanged)
    checkpoint.rs     -- (unchanged)
  storage/            -- (unchanged from m1p3)
  schema/             -- (unchanged from m1p1)
  wal/mod.rs          -- (m1p2, provides WalWriter impl)
  query/mod.rs        -- empty (Milestone 2)
  ranking/mod.rs      -- empty (Milestone 2)
tidal/tests/
  m1_uat.rs           -- Task 03: Full M1 UAT integration test
```

## Open Questions

1. **String IDs vs numeric IDs in public API** -- API.md uses string IDs (`"item_abc"`). Internal types use `EntityId(u64)`. For M1, the public API accepts `EntityId` directly (the internal type). String-to-u64 mapping is an M2 concern when the query language parser is built. This simplifies M1 without limiting future API evolution.

2. **Entity metadata format** -- M1 stores entity metadata as opaque bytes. The application serializes metadata to bytes before calling `write_item`. Structured metadata fields (title, category, etc.) are an M2 concern when metadata indexes are built. For M1, metadata is a `&[u8]` blob stored at `Tag::Meta`.

3. **WAL integration** -- m1p5 connects the WAL (m1p2) to the signal ledger (m1p4) through the `WalWriter` trait. The `TidalDB::open()` sequence is: open storage -> restore signal ledger from checkpoint -> replay WAL from checkpoint sequence -> ready. If m1p2 is not complete when m1p5 starts, the `NoopWalWriter` is used for testing, and WAL integration is added when m1p2 delivers.

4. **User and creator entities** -- M1 only supports Item entities. Users and creators are deferred to M3. `TidalDB` exposes `write_item` / `read_item` but not `write_user` / `write_creator`. The underlying `FjallStorage` already has keyspaces for all three entity kinds.
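
The open sequence and the `NoopWalWriter` fallback from question 3 can be sketched as follows. The trait shape, `Event`, `Ledger`, and `SketchDb` are all hypothetical, since the real `WalWriter` signature from m1p2 is not shown in this document. The sketch also assumes the checkpoint records the last applied sequence number and that replay covers only entries strictly newer than it. Opening storage (step 1) is omitted.

```rust
/// Hypothetical WAL event; the real entry format is defined in m1p2.
#[derive(Clone, Debug, PartialEq)]
pub struct Event {
    pub seq: u64,
    pub entity_id: u64,
    pub weight: f64,
}

/// Assumed shape of the `WalWriter` trait; the real signature may differ.
pub trait WalWriter {
    fn append(&mut self, event: &Event) -> Result<(), String>;
}

/// Test double that accepts every event and persists nothing.
pub struct NoopWalWriter;

impl WalWriter for NoopWalWriter {
    fn append(&mut self, _event: &Event) -> Result<(), String> {
        Ok(())
    }
}

/// Hypothetical signal ledger state that a checkpoint would capture.
#[derive(Debug, Default, PartialEq)]
pub struct Ledger {
    pub applied_seq: u64,
    pub total_weight: f64,
}

impl Ledger {
    fn apply(&mut self, event: &Event) {
        self.total_weight += event.weight;
        self.applied_seq = event.seq;
    }
}

/// Hypothetical stand-in for `TidalDB`, generic over the WAL writer.
pub struct SketchDb<W: WalWriter> {
    ledger: Ledger,
    wal: W,
}

impl<W: WalWriter> SketchDb<W> {
    /// Restores the ledger from the checkpoint, then replays only the WAL
    /// entries whose sequence number is newer than the checkpoint's.
    pub fn open(checkpoint: Ledger, wal_entries: &[Event], wal: W) -> Self {
        let mut ledger = checkpoint;
        let checkpoint_seq = ledger.applied_seq;
        for event in wal_entries.iter().filter(|e| e.seq > checkpoint_seq) {
            ledger.apply(event);
        }
        Self { ledger, wal }
    }

    /// Appends to the WAL first, then updates in-memory state.
    pub fn signal(&mut self, event: Event) -> Result<(), String> {
        self.wal.append(&event)?;
        self.ledger.apply(&event);
        Ok(())
    }

    pub fn ledger(&self) -> &Ledger {
        &self.ledger
    }
}

fn main() -> Result<(), String> {
    let checkpoint = Ledger { applied_seq: 2, total_weight: 2.0 };
    let wal_entries: Vec<Event> = (1..=4)
        .map(|seq| Event { seq, entity_id: 7, weight: 1.0 })
        .collect();
    // Entries 1 and 2 are already reflected in the checkpoint; 3 and 4 replay.
    let mut db = SketchDb::open(checkpoint, &wal_entries, NoopWalWriter);
    db.signal(Event { seq: 5, entity_id: 7, weight: 1.0 })?;
    println!("{:?}", db.ledger());
    Ok(())
}
```

Making the database generic over the writer keeps the test double and the real WAL interchangeable at the type level. A boxed trait object would work equally well if the concrete writer should stay out of the public type.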