tidaldb/docs/planning/milestone-1/phase-5/task-02-signal-write-and-read-api.md
jordan 29400d48db feat: implement Milestone 1 phases 1-3 — schema, WAL, and storage layer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 16:43:24 -07:00


Task 02: Signal Write and Read API

Context

  • Milestone: 1 -- Signal Engine
  • Phase: m1p5 -- Entity CRUD and Signal Write API
  • Depends On: Task 01 (TidalDB Core)
  • Blocks: Task 03 (Integration Test and UAT)
  • Complexity: S

Objective

Expose the signal write and read operations on the TidalDB struct: signal(), read_decay_score(), read_windowed_count(), read_velocity(). These are thin wrappers around the SignalLedger methods, providing the ergonomic public API that the M1 UAT scenario tests against.

This task is intentionally small. All the complexity lives in m1p4 (signal ledger). This task connects that complexity to the public API surface with proper error handling and documentation.

Requirements

  • db.signal(signal_type, entity_id, weight, timestamp) delegates to signal_ledger.record_signal()
  • db.read_decay_score(entity_id, signal_type, decay_rate_idx, query_time) delegates to signal_ledger.read_decay_score()
  • db.read_windowed_count(entity_id, signal_type, window) delegates to signal_ledger.read_windowed_count()
  • db.read_velocity(entity_id, signal_type, window) delegates to signal_ledger.read_velocity()
  • All methods take &self (no mutable access)
  • Error types are the standard LumenError variants
  • Methods are documented with examples
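
The `&self` requirement implies that writes go through interior mutability rather than `&mut` access. The following is a minimal sketch of that pattern, not the tidalDB implementation: `Counter` and `record_concurrently` are hypothetical names, and the atomic counter only stands in for the "atomic increment" the warm tier is described as using.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a warm-tier counter; not the tidalDB type.
struct Counter {
    events: AtomicU64,
}

impl Counter {
    fn new() -> Self {
        Self { events: AtomicU64::new(0) }
    }

    /// Takes `&self` yet mutates: the atomic provides interior mutability.
    fn record(&self) {
        self.events.fetch_add(1, Ordering::Relaxed);
    }

    fn count(&self) -> u64 {
        self.events.load(Ordering::Relaxed)
    }
}

/// Spawns `threads` writers that each record `per_thread` events,
/// then returns the total observed after all writers have joined.
fn record_concurrently(threads: usize, per_thread: usize) -> u64 {
    let counter = Arc::new(Counter::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    c.record();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.count()
}
```

Because every method takes a shared reference, a single handle can be cloned behind an `Arc` and written from several threads without an outer lock.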

Technical Design

Module Structure

No new files. Methods are added to the TidalDB impl block in lib.rs.

Public API

// === lib.rs (additions to TidalDB impl) ===

impl TidalDB {
    /// Write a signal event.
    ///
    /// Records an engagement event (view, like, skip, etc.) targeting an item.
    /// The signal is:
    /// 1. Appended to the WAL (once m1p2 is integrated)
    /// 2. Applied to the hot-tier running decay scores (O(1) update)
    /// 3. Applied to the warm-tier bucketed counters (atomic increment)
    ///
    /// The next read query reflects the updated state immediately.
    ///
    /// # Arguments
    ///
    /// - `signal_type`: Name of the signal (must match a schema-defined signal)
    /// - `entity_id`: The target item's ID
    /// - `weight`: Signal weight (typically 1.0; 0.0-1.0 for completion ratio)
    /// - `timestamp`: Event timestamp (use `Timestamp::now()` for current time)
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type` is not defined in the schema
    /// - `LumenError::Durability` if the WAL write fails (when WAL is active)
    ///
    /// # Example
    ///
    /// ```ignore
    /// db.signal("view", EntityId::new(42), 1.0, Timestamp::now())?;
    /// db.signal("completion", EntityId::new(42), 0.94, Timestamp::now())?;
    /// ```
    pub fn signal(
        &self,
        signal_type: &str,
        entity_id: EntityId,
        weight: f64,
        timestamp: Timestamp,
    ) -> Result<()>;

    /// Read the current decay score for a signal on an entity.
    ///
    /// Returns the running exponential decay score at `query_time`. The score
    /// accounts for all previously recorded signals, each decayed by
    /// `exp(-lambda * age)` where `age` is the time since the event.
    ///
    /// Returns `None` if no signals of this type have been recorded for this
    /// entity.
    ///
    /// # Arguments
    ///
    /// - `entity_id`: The target item's ID
    /// - `signal_type`: Name of the signal
    /// - `decay_rate_idx`: Index of the decay rate (0 for primary, 1-2 for secondary)
    /// - `query_time`: The time at which to evaluate the score
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type` is not defined
    ///
    /// # Example
    ///
    /// ```ignore
    /// let score = db.read_decay_score(EntityId::new(42), "view", 0, Timestamp::now())?;
    /// if let Some(s) = score {
    ///     println!("view decay score: {s:.6}");
    /// }
    /// ```
    pub fn read_decay_score(
        &self,
        entity_id: EntityId,
        signal_type: &str,
        decay_rate_idx: usize,
        query_time: Timestamp,
    ) -> Result<Option<f64>>;

    /// Read the windowed event count for a signal on an entity.
    ///
    /// Returns the number of signal events recorded within the specified
    /// time window. Uses the warm-tier bucketed counters for O(bucket_count)
    /// evaluation.
    ///
    /// Returns 0 if no signals of this type have been recorded for this entity.
    ///
    /// # Arguments
    ///
    /// - `entity_id`: The target item's ID
    /// - `signal_type`: Name of the signal
    /// - `window`: The time window to query (OneHour, TwentyFourHours, etc.)
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type` is not defined
    ///
    /// # Example
    ///
    /// ```ignore
    /// let count = db.read_windowed_count(EntityId::new(42), "view", Window::TwentyFourHours)?;
    /// println!("views in last 24h: {count}");
    /// ```
    pub fn read_windowed_count(
        &self,
        entity_id: EntityId,
        signal_type: &str,
        window: Window,
    ) -> Result<u64>;

    /// Read the velocity (events per second) for a signal on an entity.
    ///
    /// Velocity = `windowed_count / window_duration_seconds`.
    /// Returns 0.0 for the AllTime window (velocity is undefined for
    /// unbounded windows) and for entities with no signal history.
    ///
    /// # Arguments
    ///
    /// - `entity_id`: The target item's ID
    /// - `signal_type`: Name of the signal
    /// - `window`: The time window for velocity computation
    ///
    /// # Errors
    ///
    /// - `LumenError::Schema` if `signal_type` is not defined
    ///
    /// # Example
    ///
    /// ```ignore
    /// let velocity = db.read_velocity(EntityId::new(42), "view", Window::OneHour)?;
    /// println!("view velocity: {velocity:.4} events/sec");
    /// ```
    pub fn read_velocity(
        &self,
        entity_id: EntityId,
        signal_type: &str,
        window: Window,
    ) -> Result<f64>;
}
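
The velocity rule documented above (count divided by window length, with 0.0 for the unbounded window) can be sketched in isolation. This is an illustration under assumptions: `SketchWindow` and its `duration_secs` helper are hypothetical, and the real `Window` type may expose its duration differently.

```rust
/// Hypothetical window enum for illustration; the real `Window` lives in tidalDB.
#[derive(Clone, Copy)]
enum SketchWindow {
    OneHour,
    TwentyFourHours,
    AllTime,
}

impl SketchWindow {
    /// Duration in seconds, or `None` for the unbounded window.
    fn duration_secs(self) -> Option<f64> {
        match self {
            SketchWindow::OneHour => Some(3_600.0),
            SketchWindow::TwentyFourHours => Some(86_400.0),
            SketchWindow::AllTime => None,
        }
    }
}

/// Events per second over the window; 0.0 when the window is unbounded.
fn velocity(count: u64, window: SketchWindow) -> f64 {
    match window.duration_secs() {
        Some(secs) => count as f64 / secs,
        None => 0.0,
    }
}
```

With these assumptions, 100 views in a one-hour window give 100 / 3600, or roughly 0.0278 events per second, while the same count over `AllTime` reports 0.0.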

Internal Design

Each method is a thin delegation to the SignalLedger:

pub fn signal(
    &self,
    signal_type: &str,
    entity_id: EntityId,
    weight: f64,
    timestamp: Timestamp,
) -> Result<()> {
    self.signal_ledger.record_signal(signal_type, entity_id, weight, timestamp)
}

pub fn read_decay_score(
    &self,
    entity_id: EntityId,
    signal_type: &str,
    decay_rate_idx: usize,
    query_time: Timestamp,
) -> Result<Option<f64>> {
    self.signal_ledger.read_decay_score(entity_id, signal_type, decay_rate_idx, query_time)
}

pub fn read_windowed_count(
    &self,
    entity_id: EntityId,
    signal_type: &str,
    window: Window,
) -> Result<u64> {
    self.signal_ledger.read_windowed_count(entity_id, signal_type, window)
}

pub fn read_velocity(
    &self,
    entity_id: EntityId,
    signal_type: &str,
    window: Window,
) -> Result<f64> {
    self.signal_ledger.read_velocity(entity_id, signal_type, window)
}

The read_decay_score method needs the query_time parameter because the SignalLedger applies lazy decay: stored_score * exp(-lambda * (query_time - last_update)). The caller provides the query time for deterministic behavior. In production, this is Timestamp::now().

Note: this design assumes that SignalLedger::read_decay_score from m1p4 Task 03 returns Result<Option<f64>> and accepts a query time. If the Task 03 signature does not include query_time, it must be updated. HotSignalState::current_score requires both query_time_ns and lambda, so the ledger should thread the query time through to it.
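
To make the lazy-decay formula concrete, here is a minimal sketch of a running score that is decayed only when touched. It is an assumption-laden illustration, not the m1p4 code: `DecayState` and `lambda_per_ns` are hypothetical names, lambda is derived from a half-life as ln(2) / half_life, and out-of-order events are handled only by clamping negative ages to zero.

```rust
/// Hypothetical hot-tier state for one (entity, signal, decay rate) triple.
struct DecayState {
    score: f64,
    last_update_ns: u64,
}

/// Decay constant per nanosecond for a half-life given in seconds.
fn lambda_per_ns(half_life_secs: f64) -> f64 {
    std::f64::consts::LN_2 / (half_life_secs * 1e9)
}

impl DecayState {
    /// O(1) write: decay the stored score to `ts_ns`, then add the weight.
    fn record(&mut self, weight: f64, ts_ns: u64, lambda: f64) {
        self.score = self.current_score(ts_ns, lambda) + weight;
        self.last_update_ns = self.last_update_ns.max(ts_ns);
    }

    /// Lazy read: stored_score * exp(-lambda * (query_time - last_update)).
    /// A query time earlier than the last update is treated as age zero.
    fn current_score(&self, query_ns: u64, lambda: f64) -> f64 {
        let age_ns = query_ns.saturating_sub(self.last_update_ns) as f64;
        self.score * (-lambda * age_ns).exp()
    }
}
```

Passing the query time explicitly is what makes the result reproducible: a read exactly one half-life after a single weight-1.0 write yields 0.5 regardless of the wall clock.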

Error Handling

All errors are delegated to the SignalLedger and propagated as LumenError. No new error handling in this task.
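
The delegation described above relies on `?` plus `From` conversions to lift subsystem errors into the top-level error type. A minimal sketch of that mechanism follows. All names here (`SketchError`, `SchemaError`, `Ledger`, `Db`) are hypothetical stand-ins, and only the two variants this task mentions are modeled.

```rust
/// Hypothetical subsystem error; stands in for a schema lookup failure.
#[derive(Debug)]
struct SchemaError(String);

/// Hypothetical top-level error with only the two variants this task names.
#[derive(Debug)]
enum SketchError {
    Schema(SchemaError),
    Durability(std::io::Error),
}

impl From<SchemaError> for SketchError {
    fn from(e: SchemaError) -> Self {
        SketchError::Schema(e)
    }
}

impl From<std::io::Error> for SketchError {
    fn from(e: std::io::Error) -> Self {
        SketchError::Durability(e)
    }
}

/// Hypothetical ledger that only validates the signal name.
struct Ledger {
    known: Vec<&'static str>,
}

impl Ledger {
    fn record_signal(&self, signal_type: &str) -> Result<(), SchemaError> {
        if self.known.iter().any(|k| *k == signal_type) {
            Ok(())
        } else {
            Err(SchemaError(format!("unknown signal type: {signal_type}")))
        }
    }
}

struct Db {
    ledger: Ledger,
}

impl Db {
    /// Thin wrapper: `?` converts `SchemaError` into `SketchError` via `From`.
    fn signal(&self, signal_type: &str) -> Result<(), SketchError> {
        self.ledger.record_signal(signal_type)?;
        Ok(())
    }
}
```

If the ledger already returns the top-level error type, as the one-line delegations in this task suggest, the wrapper needs no conversion at all and the `?` can be dropped.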

Test Strategy

Unit Tests

#[test]
fn signal_and_read_decay_score() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let entity = EntityId::new(42);
    let now = Timestamp::now();

    db.signal("view", entity, 1.0, now).unwrap();

    let score = db.read_decay_score(entity, "view", 0, now).unwrap();
    assert!(score.is_some());
    let s = score.unwrap();
    assert!((s - 1.0).abs() < 1e-6, "score should be ~1.0 immediately after write, got {s}");

    db.shutdown().unwrap();
}

#[test]
fn signal_and_read_windowed_count() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let entity = EntityId::new(1);
    let now = Timestamp::now();

    for _ in 0..10 {
        db.signal("view", entity, 1.0, now).unwrap();
    }

    let count = db.read_windowed_count(entity, "view", Window::OneHour).unwrap();
    assert_eq!(count, 10);

    let all_time = db.read_windowed_count(entity, "view", Window::AllTime).unwrap();
    assert_eq!(all_time, 10);

    db.shutdown().unwrap();
}

#[test]
fn signal_and_read_velocity() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let entity = EntityId::new(1);
    let now = Timestamp::now();

    for _ in 0..100 {
        db.signal("view", entity, 1.0, now).unwrap();
    }

    let velocity = db.read_velocity(entity, "view", Window::OneHour).unwrap();
    let expected = 100.0 / Window::OneHour.duration_secs_f64();
    assert!(
        (velocity - expected).abs() < 1e-10,
        "velocity={velocity}, expected={expected}"
    );

    // AllTime velocity is 0
    let v_all = db.read_velocity(entity, "view", Window::AllTime).unwrap();
    assert!(v_all.abs() < 1e-15);

    db.shutdown().unwrap();
}

#[test]
fn signal_unknown_type_returns_error() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let result = db.signal("nonexistent", EntityId::new(1), 1.0, Timestamp::now());
    assert!(result.is_err());

    db.shutdown().unwrap();
}

#[test]
fn read_score_unknown_type_returns_error() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let result = db.read_decay_score(EntityId::new(1), "nonexistent", 0, Timestamp::now());
    assert!(result.is_err());

    db.shutdown().unwrap();
}

#[test]
fn read_score_no_signals_returns_none() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let score = db.read_decay_score(EntityId::new(999), "view", 0, Timestamp::now()).unwrap();
    assert!(score.is_none());

    db.shutdown().unwrap();
}

#[test]
fn signal_reflects_immediately() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let entity = EntityId::new(42);
    let t1 = Timestamp::now();

    // Write first signal
    db.signal("view", entity, 1.0, t1).unwrap();
    let score1 = db.read_decay_score(entity, "view", 0, t1).unwrap().unwrap();

    // Write second signal
    let t2 = Timestamp::from_nanos(t1.as_nanos() + 1_000_000); // +1ms
    db.signal("view", entity, 1.0, t2).unwrap();
    let score2 = db.read_decay_score(entity, "view", 0, t2).unwrap().unwrap();

    assert!(score2 > score1, "score should increase after new signal");

    let count = db.read_windowed_count(entity, "view", Window::AllTime).unwrap();
    assert_eq!(count, 2);

    db.shutdown().unwrap();
}

#[test]
fn multiple_signal_types_independent() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let entity = EntityId::new(1);
    let now = Timestamp::now();

    db.signal("view", entity, 1.0, now).unwrap();
    db.signal("like", entity, 1.0, now).unwrap();

    let view_count = db.read_windowed_count(entity, "view", Window::AllTime).unwrap();
    let like_count = db.read_windowed_count(entity, "like", Window::AllTime).unwrap();
    let skip_count = db.read_windowed_count(entity, "skip", Window::AllTime).unwrap();

    assert_eq!(view_count, 1);
    assert_eq!(like_count, 1);
    assert_eq!(skip_count, 0);

    db.shutdown().unwrap();
}

#[test]
fn signals_survive_close_reopen() {
    let dir = TempDir::new().unwrap();
    let now = Timestamp::now();

    // Write signals, shutdown
    {
        let db = TidalDB::open(test_config(&dir)).unwrap();
        for i in 0..50 {
            let ts = Timestamp::from_nanos(now.as_nanos() + i * 1_000_000);
            db.signal("view", EntityId::new(42), 1.0, ts).unwrap();
        }
        db.shutdown().unwrap();
    }

    // Reopen and verify
    {
        let db = TidalDB::open(test_config(&dir)).unwrap();

        let count = db.read_windowed_count(EntityId::new(42), "view", Window::AllTime).unwrap();
        assert_eq!(count, 50, "all 50 signals should survive restart");

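        // Query at a known time after the last write (now + 49 ms) so the age is never negative.
        let query_time = Timestamp::from_nanos(now.as_nanos() + 50 * 1_000_000);
        let score = db.read_decay_score(EntityId::new(42), "view", 0, query_time).unwrap();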
        assert!(score.is_some());
        assert!(score.unwrap() > 0.0);

        db.shutdown().unwrap();
    }
}

Acceptance Criteria

  • db.signal() writes a signal event and updates decay scores + windowed counters
  • db.read_decay_score() returns lazy-decayed score at query time
  • db.read_windowed_count() returns bucketed count for the given window
  • db.read_velocity() returns events per second for the given window
  • Unknown signal type returns LumenError::Schema on all methods
  • Signals are reflected immediately in subsequent reads
  • Signal state survives close and reopen (via checkpoint/restore)
  • Multiple signal types per entity are independent
  • No unsafe code
  • cargo clippy -- -D warnings passes
  • All tests pass

Research References

  • API.md -- Writing Signals section (db.signal(Signal { ... }))

Spec References

Implementation Notes

  • This task is deliberately simple -- it is a thin API layer. If the SignalLedger from m1p4 is correctly implemented, these methods are one-liners.
  • The query_time: Timestamp parameter on read_decay_score is important for testing determinism. In production, callers pass Timestamp::now(). In tests, callers pass a known timestamp so assertions are deterministic.
  • Do NOT add signal_batch() or any other bulk signal write API. That is an M2+ optimization.
  • Do NOT add a read_all_signals(entity_id) snapshot API. That belongs to M2, alongside the response SignalSnapshot struct.