# Task 02: Signal Write and Read API

## Context

**Milestone:** 1 -- Signal Engine
**Phase:** m1p5 -- Entity CRUD and Signal Write API
**Depends On:** Task 01 (TidalDB Core)
**Blocks:** Task 03 (Integration Test and UAT)
**Complexity:** S

## Objective

Expose the signal write and read operations on the `TidalDB` struct: `signal()`, `read_decay_score()`, `read_windowed_count()`, `read_velocity()`. These are thin wrappers around the `SignalLedger` methods, providing the ergonomic public API that the M1 UAT scenario tests against.

This task is intentionally small. All the complexity lives in m1p4 (signal ledger). This task connects that complexity to the public API surface with proper error handling and documentation.

## Requirements

- `db.signal(signal_type, entity_id, weight, timestamp)` delegates to `signal_ledger.record_signal()`
- `db.read_decay_score(entity_id, signal_type, decay_rate_idx, query_time)` delegates to `signal_ledger.read_decay_score()`
- `db.read_windowed_count(entity_id, signal_type, window)` delegates to `signal_ledger.read_windowed_count()`
- `db.read_velocity(entity_id, signal_type, window)` delegates to `signal_ledger.read_velocity()`
- All methods take `&self` (no mutable access)
- Error types are the standard `LumenError` variants
- Methods are documented with examples
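
Taking `&self` on the write path implies interior mutability somewhere below the API. The following is a minimal sketch of one way that can work, not the actual `SignalLedger` layout (which this task does not specify). The `Counters` type and its methods are hypothetical and use only `RwLock` and `AtomicU64` from the standard library.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Hypothetical stand-in for ledger state; not the real `SignalLedger`.
#[derive(Default)]
pub struct Counters {
    /// Map from entity id to an atomically updated event count.
    pub counts: RwLock<HashMap<u64, AtomicU64>>,
}

impl Counters {
    /// Records one event through a shared reference.
    pub fn record(&self, entity_id: u64) {
        {
            // Fast path: the entity exists, so a read lock plus an atomic add suffices.
            let map = self.counts.read().unwrap();
            if let Some(counter) = map.get(&entity_id) {
                counter.fetch_add(1, Ordering::Relaxed);
                return;
            }
        } // The read guard is dropped here, before the write lock is taken.

        // Slow path: insert the entity if it is still missing, then increment.
        let mut map = self.counts.write().unwrap();
        map.entry(entity_id)
            .or_insert_with(|| AtomicU64::new(0))
            .fetch_add(1, Ordering::Relaxed);
    }

    /// Returns the count for an entity, or 0 if it was never recorded.
    pub fn get(&self, entity_id: u64) -> u64 {
        self.counts
            .read()
            .unwrap()
            .get(&entity_id)
            .map_or(0, |counter| counter.load(Ordering::Relaxed))
    }
}
```

Because both `record` and `get` take `&self`, a value of this type can be shared across threads behind an `Arc` without an outer `Mutex`.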
## Technical Design

### Module Structure

No new files. Methods are added to the `TidalDB` impl block in `lib.rs`.

### Public API

```rust
// === lib.rs (additions to TidalDB impl) ===

impl TidalDB {
    /// Write a signal event.
    ///
    /// Records an engagement event (view, like, skip, etc.) targeting an item.
    /// The signal is:
    /// 1. Appended to the WAL (once m1p2 is integrated)
    /// 2. Applied to the hot-tier running decay scores (O(1) update)
    /// 3. Applied to the warm-tier bucketed counters (atomic increment)
    ///
    /// The next read query reflects the updated state immediately.
    ///
    /// # Arguments
    ///
    /// - `signal_type`: Name of the signal (must match a schema-defined signal)
    /// - `entity_id`: The target item's ID
    /// - `weight`: Signal weight (typically 1.0; 0.0-1.0 for completion ratio)
    /// - `timestamp`: Event timestamp (use `Timestamp::now()` for current time)
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type` is not defined in the schema
    /// - `LumenError::Durability` if the WAL write fails (when WAL is active)
    ///
    /// # Example
    ///
    /// ```ignore
    /// db.signal("view", EntityId::new(42), 1.0, Timestamp::now())?;
    /// db.signal("completion", EntityId::new(42), 0.94, Timestamp::now())?;
    /// ```
    pub fn signal(
        &self,
        signal_type: &str,
        entity_id: EntityId,
        weight: f64,
        timestamp: Timestamp,
    ) -> Result<()>;

    /// Read the current decay score for a signal on an entity.
    ///
    /// Returns the running exponential decay score at `query_time`. The score
    /// accounts for all previously recorded signals, each decayed by
    /// `exp(-lambda * age)` where `age` is the time since the event.
    ///
    /// Returns `None` if no signals of this type have been recorded for this
    /// entity.
    ///
    /// # Arguments
    ///
    /// - `entity_id`: The target item's ID
    /// - `signal_type`: Name of the signal
    /// - `decay_rate_idx`: Index of the decay rate (0 for primary, 1-2 for secondary)
    /// - `query_time`: The time at which to evaluate the score
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type` is not defined
    ///
    /// # Example
    ///
    /// ```ignore
    /// let score = db.read_decay_score(EntityId::new(42), "view", 0, Timestamp::now())?;
    /// if let Some(s) = score {
    ///     println!("view decay score: {s:.6}");
    /// }
    /// ```
    pub fn read_decay_score(
        &self,
        entity_id: EntityId,
        signal_type: &str,
        decay_rate_idx: usize,
        query_time: Timestamp,
    ) -> Result<Option<f64>>;

    /// Read the windowed event count for a signal on an entity.
    ///
    /// Returns the number of signal events recorded within the specified
    /// time window. Uses the warm-tier bucketed counters for O(bucket_count)
    /// evaluation.
    ///
    /// Returns 0 if no signals of this type have been recorded for this entity.
    ///
    /// # Arguments
    ///
    /// - `entity_id`: The target item's ID
    /// - `signal_type`: Name of the signal
    /// - `window`: The time window to query (OneHour, TwentyFourHours, etc.)
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type` is not defined
    ///
    /// # Example
    ///
    /// ```ignore
    /// let count = db.read_windowed_count(EntityId::new(42), "view", Window::TwentyFourHours)?;
    /// println!("views in last 24h: {count}");
    /// ```
    pub fn read_windowed_count(
        &self,
        entity_id: EntityId,
        signal_type: &str,
        window: Window,
    ) -> Result<u64>;

    /// Read the velocity (events per second) for a signal on an entity.
    ///
    /// Velocity = `windowed_count / window_duration_seconds`.
    /// Returns 0.0 for the AllTime window (velocity is undefined for
    /// unbounded windows) and for entities with no signal history.
    ///
    /// # Arguments
    ///
    /// - `entity_id`: The target item's ID
    /// - `signal_type`: Name of the signal
    /// - `window`: The time window for velocity computation
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type` is not defined
    ///
    /// # Example
    ///
    /// ```ignore
    /// let velocity = db.read_velocity(EntityId::new(42), "view", Window::OneHour)?;
    /// println!("view velocity: {velocity:.4} events/sec");
    /// ```
    pub fn read_velocity(
        &self,
        entity_id: EntityId,
        signal_type: &str,
        window: Window,
    ) -> Result<f64>;
}
```
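The doc comments above describe the windowed count as a sum over warm-tier buckets and velocity as that count divided by the window length. A minimal sketch of the arithmetic follows. It assumes fixed one-minute buckets, and the `SketchWindow` enum, `BUCKET_SECS`, and both functions are hypothetical. The real bucket layout and `Window` type come from m1p4 and may differ.

```rust
/// Hypothetical window enum for illustration; mirrors the variants named in the docs.
#[derive(Clone, Copy)]
pub enum SketchWindow {
    OneHour,
    TwentyFourHours,
    AllTime,
}

impl SketchWindow {
    /// Window length in seconds; `None` for the unbounded window.
    pub fn duration_secs(self) -> Option<u64> {
        match self {
            SketchWindow::OneHour => Some(3_600),
            SketchWindow::TwentyFourHours => Some(86_400),
            SketchWindow::AllTime => None,
        }
    }
}

/// Assumed bucket width (one minute); the real width is not specified in this task.
const BUCKET_SECS: u64 = 60;

/// Sums the buckets that overlap the window ending at `now_secs`.
/// `buckets` holds `(bucket_start_secs, count)` pairs; cost is O(bucket_count).
pub fn windowed_count(buckets: &[(u64, u64)], window: SketchWindow, now_secs: u64) -> u64 {
    // An unbounded window has no lower cutoff.
    let cutoff = match window.duration_secs() {
        Some(d) => now_secs.saturating_sub(d),
        None => 0,
    };
    buckets
        .iter()
        .filter(|(start, _)| *start + BUCKET_SECS > cutoff && *start <= now_secs)
        .map(|(_, count)| *count)
        .sum()
}

/// Events per second over the window; 0.0 for the unbounded window.
pub fn velocity(buckets: &[(u64, u64)], window: SketchWindow, now_secs: u64) -> f64 {
    match window.duration_secs() {
        Some(d) => windowed_count(buckets, window, now_secs) as f64 / d as f64,
        None => 0.0,
    }
}
```

With 100 events inside the last hour, `velocity` for `OneHour` is `100.0 / 3600.0`, which is the value the `signal_and_read_velocity` unit test expects from the real API.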
### Internal Design

Each method is a thin delegation to the `SignalLedger`:

```rust
pub fn signal(
    &self,
    signal_type: &str,
    entity_id: EntityId,
    weight: f64,
    timestamp: Timestamp,
) -> Result<()> {
    self.signal_ledger.record_signal(signal_type, entity_id, weight, timestamp)
}

pub fn read_decay_score(
    &self,
    entity_id: EntityId,
    signal_type: &str,
    decay_rate_idx: usize,
    query_time: Timestamp,
) -> Result<Option<f64>> {
    self.signal_ledger.read_decay_score(entity_id, signal_type, decay_rate_idx, query_time)
}

pub fn read_windowed_count(
    &self,
    entity_id: EntityId,
    signal_type: &str,
    window: Window,
) -> Result<u64> {
    self.signal_ledger.read_windowed_count(entity_id, signal_type, window)
}

pub fn read_velocity(
    &self,
    entity_id: EntityId,
    signal_type: &str,
    window: Window,
) -> Result<f64> {
    self.signal_ledger.read_velocity(entity_id, signal_type, window)
}
```
The `read_decay_score` method needs the `query_time` parameter because the `SignalLedger` applies lazy decay: `stored_score * exp(-lambda * (query_time - last_update))`. The caller provides the query time for deterministic behavior. In production, this is `Timestamp::now()`.

Note: the `SignalLedger::read_decay_score` signature from m1p4 Task 03 returns `Result<Option<f64>>` and takes a query time. If the Task 03 signature does not include `query_time`, it must be updated. The `HotSignalState::current_score` method requires `query_time_ns` and `lambda` -- the ledger should thread the query time through.
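
The lazy-decay read described above can be sketched as follows. This is an illustration under stated assumptions, not the m1p4 implementation. The `HotState` struct, its fields, and the `apply` method are hypothetical, and `lambda_from_half_life` uses the standard relation `lambda = ln(2) / half_life`, which this task does not spell out.

```rust
/// Hypothetical hot-tier state for one (entity, signal, decay rate) triple.
/// The real `HotSignalState` is defined in m1p4 and may differ.
pub struct HotState {
    /// Score as of `last_update_ns`.
    pub stored_score: f64,
    /// Timestamp of the most recent write, in nanoseconds.
    pub last_update_ns: u64,
}

/// Decay constant per second for a given half-life (standard exponential decay).
pub fn lambda_from_half_life(half_life_secs: f64) -> f64 {
    std::f64::consts::LN_2 / half_life_secs
}

impl HotState {
    /// Read path: decays the stored score to `query_time_ns` without mutating state.
    /// A query time earlier than the last update is treated as zero age.
    pub fn current_score(&self, query_time_ns: u64, lambda: f64) -> f64 {
        let age_secs = query_time_ns.saturating_sub(self.last_update_ns) as f64 / 1e9;
        self.stored_score * (-lambda * age_secs).exp()
    }

    /// Write path: decays to the event time, then adds the new weight (O(1)).
    /// Simplification: an out-of-order event is added without extra decay.
    pub fn apply(&mut self, weight: f64, timestamp_ns: u64, lambda: f64) {
        self.stored_score = self.current_score(timestamp_ns, lambda) + weight;
        self.last_update_ns = self.last_update_ns.max(timestamp_ns);
    }
}
```

Since `current_score` depends only on its arguments and the stored state, passing the same `query_time_ns` twice yields the same score. That property is what lets the unit tests assert exact values.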
### Error Handling

All errors are delegated to the `SignalLedger` and propagated as `LumenError`. No new error handling in this task.
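
Delegation here means the wrapper adds no `match` or `map_err`, so the ledger's `Result` is returned unchanged. A minimal sketch of that shape follows. `SketchError`, `SketchResult`, `Ledger`, and `Db` are hypothetical stand-ins, and the variant payloads of the real `LumenError` are assumed rather than taken from Phase 1.

```rust
/// Hypothetical error type standing in for `LumenError`; payloads are assumed.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Schema(String),
    Durability(String),
}

pub type SketchResult<T> = std::result::Result<T, SketchError>;

/// Hypothetical ledger that only knows which signal types exist.
pub struct Ledger {
    pub known: Vec<&'static str>,
}

impl Ledger {
    /// Validates the signal type; the count itself is stubbed to 0.
    pub fn read_windowed_count(&self, signal_type: &str) -> SketchResult<u64> {
        if self.known.iter().any(|k| *k == signal_type) {
            Ok(0)
        } else {
            Err(SketchError::Schema(format!(
                "unknown signal type: {signal_type}"
            )))
        }
    }
}

/// Hypothetical database facade that owns the ledger.
pub struct Db {
    pub ledger: Ledger,
}

impl Db {
    /// Thin wrapper: the ledger's result, including any error, passes through as is.
    pub fn read_windowed_count(&self, signal_type: &str) -> SketchResult<u64> {
        self.ledger.read_windowed_count(signal_type)
    }
}
```

A caller can then match on the variant directly, for example `matches!(result, Err(SketchError::Schema(_)))`, which is stricter than the `is_err()` checks used in the unit tests.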
## Test Strategy

### Unit Tests

```rust
#[test]
fn signal_and_read_decay_score() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let entity = EntityId::new(42);
    let now = Timestamp::now();

    db.signal("view", entity, 1.0, now).unwrap();

    let score = db.read_decay_score(entity, "view", 0, now).unwrap();
    assert!(score.is_some());
    let s = score.unwrap();
    assert!((s - 1.0).abs() < 1e-6, "score should be ~1.0 immediately after write, got {s}");

    db.shutdown().unwrap();
}

#[test]
fn signal_and_read_windowed_count() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let entity = EntityId::new(1);
    let now = Timestamp::now();

    for _ in 0..10 {
        db.signal("view", entity, 1.0, now).unwrap();
    }

    let count = db.read_windowed_count(entity, "view", Window::OneHour).unwrap();
    assert_eq!(count, 10);

    let all_time = db.read_windowed_count(entity, "view", Window::AllTime).unwrap();
    assert_eq!(all_time, 10);

    db.shutdown().unwrap();
}

#[test]
fn signal_and_read_velocity() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let entity = EntityId::new(1);
    let now = Timestamp::now();

    for _ in 0..100 {
        db.signal("view", entity, 1.0, now).unwrap();
    }

    let velocity = db.read_velocity(entity, "view", Window::OneHour).unwrap();
    let expected = 100.0 / Window::OneHour.duration_secs_f64();
    assert!(
        (velocity - expected).abs() < 1e-10,
        "velocity={velocity}, expected={expected}"
    );

    // AllTime velocity is 0
    let v_all = db.read_velocity(entity, "view", Window::AllTime).unwrap();
    assert!((v_all).abs() < 1e-15);

    db.shutdown().unwrap();
}

#[test]
fn signal_unknown_type_returns_error() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let result = db.signal("nonexistent", EntityId::new(1), 1.0, Timestamp::now());
    assert!(result.is_err());

    db.shutdown().unwrap();
}

#[test]
fn read_score_unknown_type_returns_error() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let result = db.read_decay_score(EntityId::new(1), "nonexistent", 0, Timestamp::now());
    assert!(result.is_err());

    db.shutdown().unwrap();
}

#[test]
fn read_score_no_signals_returns_none() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let score = db.read_decay_score(EntityId::new(999), "view", 0, Timestamp::now()).unwrap();
    assert!(score.is_none());

    db.shutdown().unwrap();
}

#[test]
fn signal_reflects_immediately() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let entity = EntityId::new(42);
    let t1 = Timestamp::now();

    // Write first signal
    db.signal("view", entity, 1.0, t1).unwrap();
    let score1 = db.read_decay_score(entity, "view", 0, t1).unwrap().unwrap();

    // Write second signal
    let t2 = Timestamp::from_nanos(t1.as_nanos() + 1_000_000); // +1ms
    db.signal("view", entity, 1.0, t2).unwrap();
    let score2 = db.read_decay_score(entity, "view", 0, t2).unwrap().unwrap();

    assert!(score2 > score1, "score should increase after new signal");

    let count = db.read_windowed_count(entity, "view", Window::AllTime).unwrap();
    assert_eq!(count, 2);

    db.shutdown().unwrap();
}

#[test]
fn multiple_signal_types_independent() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let entity = EntityId::new(1);
    let now = Timestamp::now();

    db.signal("view", entity, 1.0, now).unwrap();
    db.signal("like", entity, 1.0, now).unwrap();

    let view_count = db.read_windowed_count(entity, "view", Window::AllTime).unwrap();
    let like_count = db.read_windowed_count(entity, "like", Window::AllTime).unwrap();
    let skip_count = db.read_windowed_count(entity, "skip", Window::AllTime).unwrap();

    assert_eq!(view_count, 1);
    assert_eq!(like_count, 1);
    assert_eq!(skip_count, 0);

    db.shutdown().unwrap();
}

#[test]
fn signals_survive_close_reopen() {
    let dir = TempDir::new().unwrap();
    let now = Timestamp::now();

    // Write signals, shutdown
    {
        let db = TidalDB::open(test_config(&dir)).unwrap();
        for i in 0..50 {
            let ts = Timestamp::from_nanos(now.as_nanos() + i * 1_000_000);
            db.signal("view", EntityId::new(42), 1.0, ts).unwrap();
        }
        db.shutdown().unwrap();
    }

    // Reopen and verify
    {
        let db = TidalDB::open(test_config(&dir)).unwrap();

        let count = db.read_windowed_count(EntityId::new(42), "view", Window::AllTime).unwrap();
        assert_eq!(count, 50, "all 50 signals should survive restart");

        let score = db.read_decay_score(EntityId::new(42), "view", 0, Timestamp::now()).unwrap();
        assert!(score.is_some());
        assert!(score.unwrap() > 0.0);

        db.shutdown().unwrap();
    }
}
```
## Acceptance Criteria

- [ ] `db.signal()` writes a signal event and updates decay scores + windowed counters
- [ ] `db.read_decay_score()` returns lazy-decayed score at query time
- [ ] `db.read_windowed_count()` returns bucketed count for the given window
- [ ] `db.read_velocity()` returns events per second for the given window
- [ ] Unknown signal type returns `LumenError::Schema` on all methods
- [ ] Signals are reflected immediately in subsequent reads
- [ ] Signal state survives close and reopen (via checkpoint/restore)
- [ ] Multiple signal types per entity are independent
- [ ] No `unsafe` code
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All tests pass

## Research References

- [API.md](../../../../API.md) -- Writing Signals section (`db.signal(Signal { ... })`)

## Spec References

- [docs/specs/03-signal-system.md](../../../specs/03-signal-system.md) -- Section 8 (signal write path), Section 4 (decay read), Section 5 (velocity), Section 12 (performance targets)
- [docs/specs/00-architecture-overview.md](../../../specs/00-architecture-overview.md) -- Section 5 (signal write walkthrough)

## Implementation Notes

- This task is deliberately simple -- it is a thin API layer. If the `SignalLedger` from m1p4 is correctly implemented, these methods are one-liners.
- The `query_time: Timestamp` parameter on `read_decay_score` is important for testing determinism. In production, callers pass `Timestamp::now()`. In tests, callers pass a known timestamp so assertions are deterministic.
- Do NOT add `signal_batch()` or bulk signal write API. That is an M2+ optimization.
- Do NOT add `read_all_signals(entity_id)` snapshot API. That is an M2 concern for the response `SignalSnapshot` struct.