# Milestone 1, Phase 4: Signal Ledger -- Decay Scores and Windowed Aggregation
## Phase Deliverable
The in-memory per-entity signal state: running exponential decay scores with O(1) update and O(1) read, bucketed windowed counters for 1h/24h/7d aggregate queries, raw velocity computation, and checkpoint/restore for crash recovery. This is the core temporal engine that makes signals a database primitive instead of application math.
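
The O(1) update and read rest on a standard property of exponential decay: the whole event history folds into a single number. As a sketch, write `w_i` and `t_i` for the weight and timestamp of event `i` (notation assumed here, not taken from the spec) and assume in-order arrival `t_1 <= ... <= t_n`:

```latex
\begin{aligned}
S(t_n) &= \sum_{i=1}^{n} w_i \, e^{-\lambda (t_n - t_i)} \\
       &= w_n + e^{-\lambda (t_n - t_{n-1})} \sum_{i=1}^{n-1} w_i \, e^{-\lambda (t_{n-1} - t_i)} \\
       &= S(t_{n-1}) \, e^{-\lambda \, \Delta t} + w_n, \qquad \Delta t = t_n - t_{n-1} \\[4pt]
S(t)   &= S(t_n) \, e^{-\lambda (t - t_n)} \quad \text{for } t \ge t_n \text{ with no newer events}
\end{aligned}
```

The third line is the constant-time update; the last line is the constant-time read, which needs only the stored score and its timestamp.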
## Acceptance Criteria
- [ ] `HotSignalState` is `#[repr(C, align(64))]` -- one L1 cache line per signal type per entity
- [ ] Running decay formula `S(t) = S(t_prev) * exp(-lambda * dt) + weight`, where `dt = t - t_prev`, is mathematically exact: verified against the analytical brute-force computation to 6 decimal places across 10,000 random event sequences (property test P2)
- [ ] Out-of-order events are handled correctly: when `t_event < last_update`, the weight is pre-decayed (`score += weight * exp(-lambda * (last_update - t_event))`) and `last_update` is left unchanged, so the timestamp never regresses
- [ ] Decay scores decrease monotonically when no new events arrive (property test P1)
- [ ] Decay scores are always non-negative (invariant INV-SIG-3)
- [ ] Windowed counts use `BucketedCounter` with 60 per-minute buckets (covering 1h) and 168 per-hour buckets (covering 7d), supporting 1h/24h/7d windows via bucket summation
- [ ] Velocity = `windowed_count / window_duration_seconds` -- raw velocity for all configured windows
- [ ] `SignalLedger` coordinates hot and warm tiers with `DashMap<(EntityId, SignalTypeId), _>` for concurrent access
- [ ] State checkpointed to `StorageEngine` via `Tag::Sig`; restore from checkpoint reconstructs exact state
- [ ] Property tests P1-P4 pass: monotonic decrease, analytical match, windowed count correctness, out-of-order commutativity
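
The decay criteria above can be illustrated with a minimal single-threaded sketch. It is not the spec's `HotSignalState`: the `DecayScore` struct, the use of `f64` seconds instead of `Timestamp`, and the `brute_force` helper are assumptions made for illustration only.

```rust
/// Hypothetical single-threaded stand-in for the decay part of `HotSignalState`.
#[derive(Debug, Clone, Copy, Default)]
pub struct DecayScore {
    score: f64,
    last_update: f64, // seconds; the real type would use `Timestamp`
}

impl DecayScore {
    /// Fold one event into the running score in O(1).
    pub fn on_signal(&mut self, lambda: f64, t_event: f64, weight: f64) {
        if t_event >= self.last_update {
            // In-order: decay the old score forward to the event time, then add.
            let dt = t_event - self.last_update;
            self.score = self.score * (-lambda * dt).exp() + weight;
            self.last_update = t_event;
        } else {
            // Out-of-order: pre-decay the weight up to `last_update`.
            // `last_update` itself never moves backwards.
            self.score += weight * (-lambda * (self.last_update - t_event)).exp();
        }
    }

    /// Lazy read: decay the stored score to `now` without mutating state.
    pub fn current_score(&self, lambda: f64, now: f64) -> f64 {
        let dt = (now - self.last_update).max(0.0);
        self.score * (-lambda * dt).exp()
    }
}

/// Brute-force reference: sum every event's individually decayed weight.
/// Assumes every event timestamp is at or before `now`.
pub fn brute_force(lambda: f64, events: &[(f64, f64)], now: f64) -> f64 {
    events
        .iter()
        .map(|&(t, w)| w * (-lambda * (now - t)).exp())
        .sum()
}
```

Feeding the same `(timestamp, weight)` pairs in any order should give the same `current_score`, up to floating-point rounding, which is the property that P2 and the out-of-order commutativity test are meant to check.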
## Dependencies
- **Requires:** m1p1 (types: `EntityId`, `Timestamp`, `DecayModel`, `Window`, `WindowSet`, `SignalTypeDef`), m1p2 (WAL: `WalEvent` type for replay interface -- m1p4 defines the `WalWriter` trait but does NOT implement WAL; the trait is a dependency boundary), m1p3 (storage: `StorageEngine` trait, `Tag::Sig`, key encoding for checkpoint persistence)
- **Blocks:** m1p5 (Entity CRUD and Signal Write API)
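
One way the `WalWriter` dependency boundary could look is sketched below. Everything beyond the trait name is an assumption: the `WalEvent` fields are a hypothetical stand-in for the m1p2 type, the `append` signature and `io::Result` error type are guesses, and the log-then-apply order in `record` is illustrative only.

```rust
use std::io;

/// Hypothetical stand-in for m1p2's `WalEvent`; the real fields live in m1p2.
#[derive(Debug, Clone, PartialEq)]
pub struct WalEvent {
    pub entity_id: u64,
    pub signal_type_id: u16,
    pub timestamp_ms: u64,
    pub weight: f64,
}

/// The boundary: the ledger only needs "append one event".
pub trait WalWriter: Send + Sync {
    fn append(&self, event: &WalEvent) -> io::Result<()>;
}

/// No-op writer so the ledger can be tested without a real WAL.
pub struct NoopWal;

impl WalWriter for NoopWal {
    fn append(&self, _event: &WalEvent) -> io::Result<()> {
        Ok(())
    }
}

/// Skeleton ledger that owns its WAL behind the trait object.
pub struct SignalLedger {
    wal: Box<dyn WalWriter>,
    applied: u64,
}

impl SignalLedger {
    pub fn new(wal: Box<dyn WalWriter>) -> Self {
        Self { wal, applied: 0 }
    }

    /// Assumed ordering: append to the WAL first, then update in-memory state.
    pub fn record(&mut self, event: &WalEvent) -> io::Result<()> {
        self.wal.append(event)?;
        self.applied += 1; // placeholder for the hot and warm tier updates
        Ok(())
    }

    /// Number of events applied so far.
    pub fn applied(&self) -> u64 {
        self.applied
    }
}
```

Because the ledger sees only `dyn WalWriter`, swapping `NoopWal` for a real writer would not change any ledger code.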
## Research References
- [docs/research/tidaldb_signal_ledger.md](../../../research/tidaldb_signal_ledger.md) -- three-tier architecture, running-score formula proof, BucketedCounter design, EntityState struct (~128 bytes), performance estimates (~36ns write, ~15ns read), Scotty stream-slicing approach
- [thoughts.md](../../../../thoughts.md) -- Part V.5 (quarantine-first signal ingestion), Part V.6 (group commit), Part V.14 (cache-line alignment for hot-path structs)
## Spec References
- [docs/specs/03-signal-system.md](../../../specs/03-signal-system.md) -- HotSignalState layout (Section 3), decay computation (Section 4), velocity computation (Section 5), windowed aggregation (Section 6), write path (Section 8), invariants INV-SIG-1 through INV-SIG-5, INV-CON-1 through INV-CON-3, property tests P1-P4, performance targets (Section 12)
- [docs/specs/00-architecture-overview.md](../../../specs/00-architecture-overview.md) -- Materializer trait (`on_event`, `checkpoint`, `restore`), signal write walkthrough (Section 5), code module map showing `signal/hot.rs`, `signal/warm.rs`
## Task Index
| # | Task | Delivers | Depends On | Complexity |
|---|------|----------|------------|------------|
| 01 | Hot-Tier Signal State | `HotSignalState`, atomic decay score CAS, out-of-order handling, lazy read-time decay | None | L |
| 02 | Warm-Tier Bucketed Counters | `BucketedCounter`, per-minute/per-hour buckets, windowed count queries, all-time counter | None | M |
| 03 | Signal Ledger and Velocity | `SignalLedger` coordinating hot+warm, DashMap concurrent access, velocity computation, `WalWriter` trait boundary | Task 01, Task 02 | L |
| 04 | Checkpoint and Restore | Serialization of hot+warm state to `StorageEngine`, restore from checkpoint, integration with key encoding | Task 03 | M |

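
For Task 02 and the velocity part of Task 03, the following is a rough sketch of a bucketed counter built from two ring buffers. The `Ring` helper, the lazy slot-recycling scheme, and all method signatures are assumptions for illustration; the research doc's actual `BucketedCounter` design may differ. Counts are aligned to bucket boundaries, so a window is approximate to within one bucket width.

```rust
/// Ring of fixed-width time buckets. Each slot remembers which absolute
/// bucket index it holds, so stale slots are recycled lazily on write.
pub struct Ring<const N: usize> {
    width_secs: u64,
    slots: [(Option<u64>, u32); N], // (absolute bucket index, count)
}

impl<const N: usize> Ring<N> {
    pub fn new(width_secs: u64) -> Self {
        Self { width_secs, slots: [(None, 0); N] }
    }

    pub fn record(&mut self, t_secs: u64) {
        let idx = t_secs / self.width_secs;
        let slot = &mut self.slots[(idx % (N as u64)) as usize];
        let held = slot.0;
        match held {
            Some(h) if h == idx => slot.1 += 1,
            Some(h) if h > idx => {} // event is older than the ring's span: drop
            _ => *slot = (Some(idx), 1), // empty or stale slot: recycle it
        }
    }

    /// Sum the `n` most recent buckets, ending at the bucket containing `now_secs`.
    pub fn count_last(&self, now_secs: u64, n: u64) -> u64 {
        let now_idx = now_secs / self.width_secs;
        self.slots
            .iter()
            .filter_map(|&(held, c)| {
                let idx = held?;
                (idx <= now_idx && now_idx - idx < n).then(|| u64::from(c))
            })
            .sum()
    }
}

pub struct BucketedCounter {
    minutes: Ring<60>, // serves the 1h window
    hours: Ring<168>,  // serves the 24h and 7d windows
    all_time: u64,
}

impl BucketedCounter {
    pub fn new() -> Self {
        Self { minutes: Ring::new(60), hours: Ring::new(3600), all_time: 0 }
    }

    pub fn record(&mut self, t_secs: u64) {
        self.minutes.record(t_secs);
        self.hours.record(t_secs);
        self.all_time += 1;
    }

    pub fn windowed_count(&self, now_secs: u64, window_secs: u64) -> u64 {
        if window_secs <= 3600 {
            self.minutes.count_last(now_secs, window_secs / 60)
        } else {
            self.hours.count_last(now_secs, window_secs / 3600)
        }
    }

    /// Raw velocity: windowed count divided by the window length in seconds.
    pub fn velocity(&self, now_secs: u64, window_secs: u64) -> f64 {
        self.windowed_count(now_secs, window_secs) as f64 / window_secs as f64
    }

    pub fn all_time_count(&self) -> u64 {
        self.all_time
    }
}
```

Storing the absolute bucket index in each slot avoids a separate sweep to expire old buckets: a stale slot is simply ignored on read and overwritten on the next write that maps to it.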
## Task Dependency DAG
```
Task 01: Hot-Tier Signal State    Task 02: Warm-Tier Bucketed Counters
         |                                   |
         +-----------------------------------+
                           |
                           v
          Task 03: Signal Ledger and Velocity
                           |
                           v
            Task 04: Checkpoint and Restore
```
Tasks 01 and 02 are fully parallelizable -- they share no types or state. Task 03 composes them. Task 04 adds persistence.
## File Layout
```
tidal/src/
  signals/
    mod.rs         -- pub use re-exports, SignalTypeId newtype
    hot.rs         -- Task 01: HotSignalState, on_signal, current_score
    warm.rs        -- Task 02: BucketedCounter, windowed_count, all_time_count
    ledger.rs      -- Task 03: SignalLedger, WalWriter trait, velocity
    checkpoint.rs  -- Task 04: checkpoint, restore, serialization
  lib.rs           -- (unchanged, already declares pub mod signals)
```
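
For the serialization that `checkpoint.rs` is meant to hold, one possible approach is to store the raw `f64` bit pattern so that restore is bit-exact. The `DecayCheckpoint` record, its 16-byte big-endian layout, and the field names below are hypothetical; the real format and the `Tag::Sig` key encoding are defined by the spec and m1p3.

```rust
/// Encoded size of one record: 8 bytes of score bits + 8 bytes of timestamp.
pub const ENCODED_LEN: usize = 16;

/// Hypothetical checkpoint record for one (entity, signal type) pair.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DecayCheckpoint {
    pub score: f64,
    pub last_update_ms: u64,
}

impl DecayCheckpoint {
    /// Serialize as big-endian score bits followed by the big-endian timestamp.
    pub fn encode(&self) -> [u8; ENCODED_LEN] {
        let mut buf = [0u8; ENCODED_LEN];
        buf[..8].copy_from_slice(&self.score.to_bits().to_be_bytes());
        buf[8..].copy_from_slice(&self.last_update_ms.to_be_bytes());
        buf
    }

    /// Restore a record; returns `None` if the value has the wrong length.
    pub fn decode(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != ENCODED_LEN {
            return None;
        }
        let mut score_bits = [0u8; 8];
        let mut ts = [0u8; 8];
        score_bits.copy_from_slice(&bytes[..8]);
        ts.copy_from_slice(&bytes[8..]);
        Some(Self {
            score: f64::from_bits(u64::from_be_bytes(score_bits)),
            last_update_ms: u64::from_be_bytes(ts),
        })
    }
}
```

Round-tripping through `to_bits` and `from_bits` avoids any decimal formatting, so the restored score compares equal to the checkpointed one for finite values.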
## Open Questions
1. **`unsafe_code` and `#[repr(C, align(64))]`** -- The crate uses `#![forbid(unsafe_code)]`. `#[repr(C, align(64))]` itself does not require `unsafe` -- it is a layout attribute on a safe struct. Atomic operations (`AtomicU64`) are safe Rust. No `unsafe` is needed for m1p4. Confirmed: the spec's `HotSignalState` uses `AtomicU64` for f64 bit patterns via `f64::from_bits`/`f64::to_bits`, which are safe functions.

2. **`DashMap` dependency** -- `dashmap` crate needs to be added to `Cargo.toml`. It is a well-maintained, production-quality concurrent hash map with sharded locks. Alternatives (`crossbeam::SkipList`, manual sharded `RwLock<HashMap>`) are less ergonomic. The crossbeam dependency already exists. Decision: use `dashmap`.

3. **WAL trait boundary** -- m1p4 defines a `WalWriter` trait with a single method (`append`) that m1p2 will implement. For m1p4 testing, a no-op `WalWriter` is used. This allows m1p4 to be built and tested independently of m1p2, while establishing the correct dependency boundary. The `SignalLedger` takes a `Box<dyn WalWriter>` at construction.

4. **`SignalTypeId` representation** -- The spec uses `u16` for `signal_type_id`. Since the maximum is 64 signal types per entity kind, `u16` is generous but matches the spec. Introduce a `SignalTypeId(u16)` newtype in `signals/mod.rs`, assigned by the schema at registration time.

5. **Three decay scores vs one** -- The spec allocates space for 3 decay rates per signal type (for signals participating in multiple ranking profiles with different half-lives). For M1, only the primary decay rate (index 0) is used. The other two slots are zeroed. This matches the spec layout without requiring multi-profile support.
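
Questions 1 and 5 can be made concrete with a small sketch that uses only safe Rust. The field list is a guess rather than the spec's layout, and `add_to_primary` is a hypothetical helper. It covers only the score slot; keeping the score and its timestamp consistent under concurrent writers needs more care than shown here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical layout; the authoritative field list is in the spec.
#[repr(C, align(64))]
pub struct HotSignalState {
    /// f64 bit patterns; only index 0 is used in M1, the rest stay zeroed.
    decay_scores: [AtomicU64; 3],
    last_update_ms: AtomicU64,
}

impl HotSignalState {
    pub fn new() -> Self {
        Self {
            // The all-zero bit pattern is 0.0 as an f64.
            decay_scores: [AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0)],
            last_update_ms: AtomicU64::new(0),
        }
    }

    /// Add `delta` to the primary score with a compare-and-swap loop.
    /// Returns the value this call stored.
    pub fn add_to_primary(&self, delta: f64) -> f64 {
        let slot = &self.decay_scores[0];
        let mut current = slot.load(Ordering::Relaxed);
        loop {
            let next = f64::from_bits(current) + delta;
            match slot.compare_exchange_weak(
                current,
                next.to_bits(),
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return next,
                // Another writer won the race (or the CAS failed spuriously):
                // retry from the value actually observed.
                Err(observed) => current = observed,
            }
        }
    }

    pub fn primary(&self) -> f64 {
        f64::from_bits(self.decay_scores[0].load(Ordering::Acquire))
    }

    pub fn last_update_ms(&self) -> u64 {
        self.last_update_ms.load(Ordering::Acquire)
    }
}
```

With these four 8-byte fields the struct occupies 32 bytes of payload, and `align(64)` pads it to a single 64-byte cache line.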