---
title: "Signals wrote 100ms ago. The query sees them now."
date: "2026-02-21"
author: "Jordan Washburn"
description: "Open a tidalDB instance, define signal types with decay and windows, write 10,000 events, read back scores that match analytical computation to under 0.1% relative error. Including after a crash."
tags: ["signals", "durability", "rust"]
---
tidalDB can open, accept a schema, write 10,000 engagement signals, and read back decay-correct scores that match brute-force analytical computation. Including after you kill the process and restart it.
This post is about what we built, what was harder than expected, and what we learned.
## What we set out to build
The goal was to answer a single question: do temporal signals with O(1) decay, velocity, and windowed aggregation work as a database primitive?
Answering it took five pieces: schema types, a write-ahead log, a storage engine, a signal ledger with cache-line aligned hot structs and bucketed warm-tier counters, and finally the public API that wires it all together behind `TidalDb::builder()`.
The UAT acceptance criteria: open a database, define three signal types (view, like, skip), write 100 items with metadata, write 10,000 signal events spread across 7 days, verify every decay score against an analytical reference implementation, verify windowed counts, verify immediate signal visibility, close the database, reopen it, and verify the recovered state matches.
The test passes. Here it is.
## The UAT test
```rust
let schema = build_schema();
let db = TidalDb::builder()
    .ephemeral()
    .with_schema(schema)
    .open()?;

// Write 100 items.
for i in 0..100_u64 {
    db.write_item(EntityId::new(i), &metadata(i))?;
}

// Generate 10,000 signal events spread over the past 7 days.
let now = Timestamp::now();
let seven_days_ns: u64 = 7 * 24 * 3_600_000_000_000;
let signal_types = ["view", "like", "skip"];
for i in 0..10_000_u64 {
    let entity_id = EntityId::new(i % 100);
    let sig = signal_types[(i % 3) as usize];
    let ts = Timestamp::from_nanos(
        now.as_nanos()
            .saturating_sub(seven_days_ns)
            .saturating_add(i * (seven_days_ns / 10_000)),
    );
    db.signal(sig, entity_id, 1.0, ts)?;
}

// Read the decay score. It matches the brute-force analytical reference to < 0.1% relative error.
let score = db.read_decay_score(EntityId::new(42), "view", 0)?;

// Write a new signal and read again. The new event is immediately visible.
let score_before = db.read_decay_score(EntityId::new(42), "view", 0)?.unwrap_or(0.0);
db.signal("view", EntityId::new(42), 1.0, Timestamp::now())?;
let score_after = db.read_decay_score(EntityId::new(42), "view", 0)?.unwrap_or(0.0);
// score_after > score_before: the signal is reflected without any delay.
```
The full test checks each score against `analytical_decay`, a brute-force reference. It iterates every event, computes `weight * exp(-lambda * dt)` for each one (with `lambda = ln(2) / half_life` and `dt` the age of the event), and sums the results. The running accumulator in tidalDB produces the same answer without scanning a single raw event.
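To make the comparison concrete, here is a minimal sketch of both sides. It is not tidalDB's code: the `(timestamp, weight)` event layout, the `DecayAccumulator` type, and its method names are assumptions for illustration, and timestamps are plain seconds rather than `Timestamp` values.

```rust
/// Brute-force reference: visit every event and sum weight * exp(-lambda * dt).
/// Events are (timestamp_secs, weight) pairs; this layout is an assumption of the sketch.
fn analytical_decay(events: &[(f64, f64)], now: f64, half_life: f64) -> f64 {
    let lambda = std::f64::consts::LN_2 / half_life;
    events
        .iter()
        .map(|&(ts, weight)| weight * (-lambda * (now - ts)).exp())
        .sum()
}

/// Hypothetical running accumulator: one decayed sum plus the time it refers to.
struct DecayAccumulator {
    value: f64,
    as_of: f64,
    lambda: f64,
}

impl DecayAccumulator {
    fn new(half_life: f64) -> Self {
        Self {
            value: 0.0,
            as_of: 0.0,
            lambda: std::f64::consts::LN_2 / half_life,
        }
    }

    /// O(1) write: no raw event is stored.
    fn record(&mut self, ts: f64, weight: f64) {
        if ts >= self.as_of {
            // Decay the existing sum forward to the event time, then add the new weight.
            self.value = self.value * (-self.lambda * (ts - self.as_of)).exp() + weight;
            self.as_of = ts;
        } else {
            // Late event: decay its weight forward to the accumulator's reference time.
            self.value += weight * (-self.lambda * (self.as_of - ts)).exp();
        }
    }

    /// O(1) read: one exp() and one multiply.
    fn read(&self, now: f64) -> f64 {
        self.value * (-self.lambda * (now - self.as_of)).exp()
    }
}
```

Feeding the same events to both and comparing `read(now)` against `analytical_decay(..)` is the shape of the UAT check: the two should agree up to floating-point rounding.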
## The hard part was not the math
The forward-decay formula is elegant and well-understood. Post 2 covered the derivation. What we did not anticipate was how much work the *wiring* would demand.
The signal ledger needs to write to the WAL before updating its in-memory state. But the WAL is owned by `TidalDb`, and the ledger is a standalone component that does not know about the database layer. If the ledger holds a reference to the WAL, the layers are coupled. If it does not, signals are not durable.
The solution is a trait boundary and a bridge.
The `signals` module defines a `WalWriter` trait. The ledger accepts a `Box<dyn WalWriter>` at construction. It calls `append_signal()` on every write, but never knows what is on the other side. In tests, that is a `NoopWalWriter`. In production, it is a `WalHandleWriter` that forwards events to the live WAL via a channel.
```rust
pub struct WalHandleWriter {
    sender: WalSender,
    last_seq: Arc<AtomicU64>,
}

impl WalWriter for WalHandleWriter {
    fn append_signal(
        &self,
        signal_type_id: SignalTypeId,
        entity_id: EntityId,
        weight: f64,
        timestamp: Timestamp,
    ) -> crate::Result<()> {
        let event = SignalEvent {
            entity_id: entity_id.as_u64(),
            signal_type: u8::try_from(signal_type_id.as_u16())?,
            weight: weight as f32,
            timestamp_nanos: timestamp.as_nanos(),
        };
        let seq = self.sender.append(event)?;
        // CAS loop to track highest committed WAL sequence
        // ...
        Ok(())
    }
}
```
The `WalSender` is a cloneable, `Send + Sync` wrapper around the channel sender. It separates the ability to append from the ownership of the WAL handle. The `TidalDb` owns the `WalHandle` (for shutdown). The ledger owns a `WalSender` (for writes). Neither needs a reference to the other.
This is three types, one trait, and one channel. It took longer to get right than the decay math.
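The shape of that boundary can be sketched with standard-library pieces. Everything here is a simplified stand-in rather than tidalDB's real definitions: the trait signature is trimmed to three plain arguments, `ChannelWalWriter` plays the role of `WalHandleWriter`, `std::sync::mpsc` stands in for the WAL channel, and the `Ledger` tracks a single total instead of per-entity state.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Simplified WAL record (assumed layout; the weight is narrowed to f32 on the wire).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Event {
    entity_id: u64,
    weight: f32,
    timestamp_nanos: u64,
}

/// The boundary: the ledger only ever sees this trait.
trait WalWriter: Send {
    fn append_signal(&self, entity_id: u64, weight: f64, timestamp_nanos: u64) -> Result<(), String>;
}

/// Test double: accepts every write and persists nothing.
struct NoopWalWriter;

impl WalWriter for NoopWalWriter {
    fn append_signal(&self, _: u64, _: f64, _: u64) -> Result<(), String> {
        Ok(())
    }
}

/// Bridge: forwards events over a channel to whoever owns the WAL.
struct ChannelWalWriter {
    sender: Sender<Event>,
}

impl WalWriter for ChannelWalWriter {
    fn append_signal(&self, entity_id: u64, weight: f64, timestamp_nanos: u64) -> Result<(), String> {
        let event = Event { entity_id, weight: weight as f32, timestamp_nanos };
        self.sender.send(event).map_err(|e| e.to_string())
    }
}

/// Toy ledger: knows nothing about the database layer.
struct Ledger {
    wal: Box<dyn WalWriter>,
    total_weight: f64,
}

impl Ledger {
    fn new(wal: Box<dyn WalWriter>) -> Self {
        Self { wal, total_weight: 0.0 }
    }

    fn signal(&mut self, entity_id: u64, weight: f64, timestamp_nanos: u64) -> Result<(), String> {
        // WAL first: if the append fails, in-memory state is left untouched.
        self.wal.append_signal(entity_id, weight, timestamp_nanos)?;
        self.total_weight += weight;
        Ok(())
    }
}

/// Wires a ledger to a channel; the receiver stands in for the WAL owner.
fn channel_ledger() -> (Ledger, Receiver<Event>) {
    let (sender, receiver) = channel();
    (Ledger::new(Box::new(ChannelWalWriter { sender })), receiver)
}
```

The ordering inside `signal` is the point of the sketch: the append happens before the in-memory update, so a failed append never leaves a signal that exists in memory but not in the log.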
## Crash recovery in a few lines
The persistent-mode test writes 100 signals, reads the decay score, closes the database, reopens it, and asserts the recovered score matches within 0.1%:
```rust
let entity = EntityId::new(42);

// Session 1: write signals, read score, close.
let score_before = {
    let db = TidalDb::builder()
        .with_data_dir("/var/lib/tidaldb")
        .with_schema(build_schema())
        .open()?;
    for i in 0..100_u64 {
        let ts = Timestamp::from_nanos(/* spread over 7 days */);
        db.signal("view", entity, 1.0, ts)?;
    }
    let score = db.read_decay_score(entity, "view", 0)?.unwrap();
    db.close()?;
    score
};

// Session 2: reopen the same data directory, verify state survived.
{
    let db = TidalDb::builder()
        .with_data_dir("/var/lib/tidaldb")
        .with_schema(build_schema())
        .open()?;
    let score_after = db.read_decay_score(entity, "view", 0)?.unwrap();
    // score_after ≈ score_before, under 0.1% relative deviation.
    // The checkpoint + WAL replay path restores exact in-memory state.
}
```
The recovery path has three stages. On shutdown, the database checkpoints all in-memory signal state -- every `HotSignalState` and every `BucketedCounter` -- as a single write batch to durable storage, tagged with the highest WAL sequence number. On reopen, the ledger restores from the checkpoint. Then the WAL replays any events that arrived after the checkpoint was taken. The result is the same in-memory state as before the close, minus the natural decay of a few milliseconds of elapsed time.
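A toy model of those stages, under loud assumptions: `WalRecord`, `Checkpoint`, and this `Ledger` are hypothetical types, the state is a plain per-entity sum with decay left out, and sequence numbers start at 1 so that `last_seq == 0` means "nothing applied yet".

```rust
use std::collections::HashMap;

/// One WAL record: a sequence number plus the signal it carries (simplified).
#[derive(Clone, Copy)]
struct WalRecord {
    seq: u64,
    entity_id: u64,
    weight: f64,
}

/// Snapshot of in-memory state, tagged with the highest WAL sequence it includes.
#[derive(Clone, Default)]
struct Checkpoint {
    last_seq: u64,
    totals: HashMap<u64, f64>,
}

#[derive(Default)]
struct Ledger {
    last_seq: u64,
    totals: HashMap<u64, f64>,
}

impl Ledger {
    /// Apply one record to in-memory state and remember its sequence number.
    fn apply(&mut self, rec: WalRecord) {
        *self.totals.entry(rec.entity_id).or_insert(0.0) += rec.weight;
        self.last_seq = rec.seq;
    }

    /// Stage 1: capture all state in one snapshot, tagged with the WAL position.
    fn checkpoint(&self) -> Checkpoint {
        Checkpoint { last_seq: self.last_seq, totals: self.totals.clone() }
    }

    /// Stages 2 and 3: restore the snapshot, then replay only the newer records.
    fn recover(checkpoint: &Checkpoint, wal: &[WalRecord]) -> Self {
        let mut ledger = Ledger {
            last_seq: checkpoint.last_seq,
            totals: checkpoint.totals.clone(),
        };
        for rec in wal.iter().filter(|r| r.seq > checkpoint.last_seq) {
            ledger.apply(*rec);
        }
        ledger
    }
}
```

The sequence tag is what makes replay safe to run against the whole log: records the checkpoint already covers are skipped, so nothing is counted twice.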
The 0.1% tolerance is conservative. With a 7-day half-life, one second of elapsed time changes the score by approximately 0.00012%, and the two sessions open within milliseconds of each other. Natural decay therefore uses almost none of the tolerance. A checkpoint-or-restore bug would have to shift the score by less than 0.1% to hide inside it. Anything larger fails the assertion.
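The one-second figure follows directly from the half-life. A small helper makes the arithmetic checkable; `decay_fraction` is a name invented for this sketch, not a tidalDB function.

```rust
/// Fraction of a score lost to exponential decay over `elapsed_secs`,
/// for a signal with the given half-life.
fn decay_fraction(half_life_secs: f64, elapsed_secs: f64) -> f64 {
    let lambda = std::f64::consts::LN_2 / half_life_secs;
    // 1 - exp(-lambda * dt), computed with exp_m1 to stay accurate for tiny arguments.
    -(-lambda * elapsed_secs).exp_m1()
}
```

With a half-life of `7 * 24 * 3600` seconds, `lambda` is about `1.146e-6` per second, so `decay_fraction(half_life, 1.0)` comes out near `1.146e-6`, which is the roughly 0.00012% quoted above.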
## What surprised us
The `f64` to `f32` precision boundary.
Signal weights are `f64` in memory -- the hot tier retains full double precision. But the WAL stores weights as `f32` to keep event records compact. For a weight like `1.0`, which is exactly representable in both formats, that truncation changes nothing. What caught us was the *replay path*.
When a process crashes and restarts, the ledger rebuilds from the WAL. Every event weight passes through the `f64 -> f32 -> f64` round-trip. The analytical reference in the test uses the original `f64` weights. The replayed ledger uses the `f32`-truncated weights. For 10,000 events with weight `1.0`, there is no difference. But for arbitrary weights -- `0.7`, `3.14159`, anything that is not exactly representable in `f32` -- the accumulated rounding error across thousands of events is measurable.
The UAT tolerance is set to `1e-3` relative error, not because the math is approximate (it is exact to floating-point precision) but because the WAL wire format introduces a quantization boundary. This is a deliberate trade-off: compact WAL records in exchange for a relative error budget of one part in a thousand after crash recovery. For ranking scores, a 0.1% deviation is negligible. For an accounting ledger, it would be unacceptable. We are not an accounting ledger.
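The round trip is easy to reproduce in isolation. This sketch models only the narrowing cast, not the WAL itself, and both function names are made up for illustration.

```rust
/// Model of the wire format: a weight is narrowed to f32 on write
/// and widened back to f64 on replay.
fn wal_round_trip(weight: f64) -> f64 {
    (weight as f32) as f64
}

/// Relative error a single non-zero weight picks up from the round trip.
fn round_trip_rel_error(weight: f64) -> f64 {
    ((wal_round_trip(weight) - weight) / weight).abs()
}
```

A weight of `1.0` survives unchanged, while `0.7` and `3.14159` come back slightly different. For values in the normal `f32` range, each weight's relative error is bounded by half of `f32::EPSILON`. How those per-event errors add up in a replayed score depends on the mix of weights and timestamps in the workload.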
## The public API
This is what a developer sees today:
```rust
use std::time::Duration;
use tidaldb::TidalDb;
use tidaldb::schema::{DecaySpec, EntityId, EntityKind, SchemaBuilder, Timestamp, Window};

// Define signal types in schema
let mut builder = SchemaBuilder::new();
let _ = builder
    .signal(
        "view",
        EntityKind::Item,
        DecaySpec::Exponential { half_life: Duration::from_secs(7 * 24 * 3600) },
    )
    .windows(&[Window::OneHour, Window::TwentyFourHours])
    .velocity(false)
    .add();
let schema = builder.build()?;

// Open a database
let db = TidalDb::builder()
    .with_data_dir("/var/lib/tidaldb")
    .with_schema(schema)
    .open()?;

// Write an item
db.write_item(EntityId::new(42), &metadata)?;

// Record a signal
db.signal("view", EntityId::new(42), 1.0, Timestamp::now())?;

// Read the decay score -- reflects the signal immediately
let score = db.read_decay_score(EntityId::new(42), "view", 0)?;

// Read windowed count
let views_last_hour = db.read_windowed_count(EntityId::new(42), "view", Window::OneHour)?;
```
No Redis. No Kafka. No cron job recomputing scores. The signal write and the score read share a process, a memory space, and a signal ledger. The score is never stale because it is never cached. It is computed -- one `exp()` and one multiply -- at the moment you ask.
## What is next
Next: ranked retrieval. A single `RETRIEVE` query that takes candidate entities, scores them against a named ranking profile using live decay signals and velocity, enforces diversity constraints, and returns a ranked list. The signal engine now has something to feed.
---
*tidalDB is an open-source, embeddable Rust database for personalized content ranking. The integration tests referenced in this post are at [tidal/tests/signal_api.rs](https://github.com/orchard9/tidalDB/blob/main/tidal/tests/signal_api.rs). Follow the build on [GitHub](https://github.com/orchard9/tidalDB).*