# Task 05: Signal Rollup Evaluation (Conditional)

## Delivers

A benchmark of windowed count queries at 1M items, used to estimate the cost of a 30-day window query. If the projected 30-day p99 exceeds 50ms, implement hourly rollups following the TimescaleDB continuous aggregate pattern described in the research doc. If p99 is within budget, document the finding and defer rollups.

## Complexity

L

## Dependencies

- task-01 complete (1M-item TidalDb with signal data)
- `docs/research/tidaldb_signal_ledger.md` (rollup architecture, SWAG, BucketedCounter design)

## Technical Design

### 1. Current 30-day window behavior

The `BucketedCounter` currently returns 0 for `Window::ThirtyDays`:

```rust
Window::ThirtyDays => {
    tracing::warn!("ThirtyDays window not supported in M1; returning 0");
    0
}
```

The warm tier has 168 hour buckets (7 days). A 30-day window requires one of the following:

- **Option A:** Extend hour buckets from 168 to 720 (30 days x 24 hours). This adds 2,208 bytes per entry (552 extra 4-byte `AtomicU32` buckets). At 1M items x 10 signals (10M entries), that is ~22 GB of extra RAM. Unacceptable.
- **Option B:** Hourly rollups on disk. A query merges the 168 hot hour buckets with disk-stored rollups for days 8-30. Cost is one prefix scan per entity-signal pair, reading up to 552 rollup keys.
- **Option C:** Daily rollups on disk only. Coarser granularity (1-day resolution for days 8-30) but simpler. Cost is one prefix scan per entity-signal pair, reading up to 23 rollup keys.

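The sizing figures for the three options can be reproduced with a few constants. This is a back-of-envelope sketch: the constant and function names are illustrative (not part of `tidaldb`), and the Option B/C key counts are worst cases that assume every hour or day in the 8-30 day range has activity.

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicU32;

/// Entity-signal pairs at the benchmark scale (1M items x 10 signals).
const ENTRIES: u64 = 1_000_000 * 10;
/// Hour buckets currently held in the warm tier (7 days).
const HOT_HOURS: u64 = 168;
/// Hours in a 30-day window.
const MONTH_HOURS: u64 = 30 * 24;

/// Option A: extra RAM if every entry grows from 168 to 720 `AtomicU32` buckets.
fn option_a_extra_bytes() -> u64 {
    let bucket_bytes = size_of::<AtomicU32>() as u64; // 4 bytes
    (MONTH_HOURS - HOT_HOURS) * bucket_bytes * ENTRIES
}

/// Option B: worst-case hourly rollup keys read per entity-signal query.
fn option_b_keys_per_query() -> u64 {
    MONTH_HOURS - HOT_HOURS
}

/// Option C: worst-case daily rollup keys read per entity-signal query.
fn option_c_keys_per_query() -> u64 {
    (MONTH_HOURS - HOT_HOURS) / 24
}
```

`option_a_extra_bytes()` evaluates to 22,080,000,000 bytes (~22 GB), matching Option A, while Options B and C read at most 552 and 23 keys per query respectively.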
### 2. Benchmark: 30-day windowed count query cost

Because `Window::ThirtyDays` currently returns 0, first measure the baseline cost of summing hour buckets on the existing 7-day path, then extrapolate:

```rust
#![allow(clippy::unwrap_used, clippy::cast_precision_loss)]

use criterion::{Criterion, black_box, criterion_group, criterion_main};
use std::time::Duration;
use tidaldb::schema::{DecaySpec, EntityId, EntityKind, SchemaBuilder, Timestamp, Window};
use tidaldb::signals::{NoopWalWriter, SignalLedger, SignalTypeId};

/// Benchmark: windowed count query for 200 entities at various window sizes.
/// This establishes whether the bucket-summing approach is viable at 30d scale.
fn bench_windowed_count_scaling(c: &mut Criterion) {
    let mut group = c.benchmark_group("windowed_count_scaling");

    // Build a schema with one exponentially decaying "view" signal.
    let mut builder = SchemaBuilder::new();
    let _ = builder
        .signal(
            "view",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(7 * 24 * 3600),
            },
        )
        .windows(&[
            Window::OneHour,
            Window::TwentyFourHours,
            Window::SevenDays,
            Window::AllTime,
        ])
        .velocity(false)
        .add();
    let schema = builder.build().unwrap();
    let ledger = SignalLedger::new(schema, Box::new(NoopWalWriter));

    // Pre-populate: 1M entities, 10 signals each, spread evenly over 7 days.
    let now_ns = Timestamp::now().as_nanos();
    let seven_days_ns = 7 * 24 * 3600 * 1_000_000_000u64;
    for entity in 0..1_000_000u64 {
        for sig_idx in 0..10u64 {
            let ts_ns = now_ns - (sig_idx * seven_days_ns / 10);
            let ts = Timestamp::from_nanos(ts_ns);
            ledger.record_signal("view", EntityId::new(entity), 1.0, ts).unwrap();
        }
    }

    let type_id = ledger.resolve_signal_type("view").unwrap();

    // Benchmark: read 200 entities' windowed counts (the ranking query hot path).
    let entity_ids: Vec<EntityId> = (0..200u64).map(EntityId::new).collect();

    for window in &[Window::OneHour, Window::TwentyFourHours, Window::SevenDays, Window::AllTime] {
        group.bench_function(format!("{window:?}_200_entities"), |b| {
            b.iter(|| {
                let mut total = 0u64;
                for &entity_id in black_box(&entity_ids) {
                    if let Some(entry) = ledger.entries().get(&(entity_id, type_id)) {
                        total += entry.warm.windowed_count(black_box(*window));
                    }
                }
                black_box(total)
            });
        });
    }

    group.finish();
}

// Register the benchmark; without these macros the bench binary has no entry point.
criterion_group!(benches, bench_windowed_count_scaling);
criterion_main!(benches);
```

Criterion reports mean and median estimates with confidence intervals, not p99, so the p99 required by the acceptance criteria has to be computed from separately recorded per-iteration latencies. Querying entities `0..200` on every iteration also keeps the same counters in cache; sampling IDs across the full 1M range gives a more realistic view of the cache misses discussed in section 4.

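One way to get a p99 is to time each 200-entity query individually and take a nearest-rank percentile over the samples. This is a minimal sketch using only `std`; the `query` closure stands in for the inner loop of the Criterion benchmark, and the function names are illustrative.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile (p in 0.0..=100.0) over recorded latencies.
/// Sorts the samples in place; returns None for an empty slice.
fn percentile(samples: &mut [Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_unstable();
    // Nearest rank: smallest sample with at least p% of samples at or below it.
    let rank = (p * samples.len() as f64 / 100.0).ceil() as usize;
    let idx = rank.clamp(1, samples.len()) - 1;
    Some(samples[idx])
}

/// Run `query` `iterations` times, timing each call separately.
fn time_each<F: FnMut() -> u64>(iterations: usize, mut query: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        // black_box keeps the optimizer from discarding the query result.
        std::hint::black_box(query());
        samples.push(start.elapsed());
    }
    samples
}
```

For example, `percentile(&mut time_each(10_000, run_query), 99.0)`, where `run_query` is a hypothetical closure wrapping the 200-entity loop from the benchmark.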
### 3. 30-day rollup architecture (if needed)

Following the research doc's three-tier hybrid design:

```
Query for 30d count:
  = sum(168 hot hour buckets)             // hours 0-167 (days 1-7), in memory
  + sum(hourly rollups for days 8-30)     // hours 168-719, disk read
```

#### Rollup key schema

```
Key:   [entity_id: 8B BE][Tag::HourlyRollup][signal_type_id: 2B BE][hour_bucket: 4B BE]
Value: [count: 4B LE u32]
```

`hour_bucket` is the number of hours since the Unix epoch. Because every key field is big-endian, lexicographic byte order matches numeric order, so all rollups for one entity-signal pair are contiguous and sorted by hour. This time-ordered layout is what makes range scans efficient.

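A sketch of the key layout, assuming the tag is encoded as a single byte; the schema gives neither its width nor its value, so `HOURLY_ROLLUP_TAG = 0x10` is a placeholder and both functions are illustrative. It shows that big-endian hour suffixes sort in time order under byte-wise comparison, which is the order an ordered key-value store iterates in.

```rust
/// Placeholder tag byte; the real `Tag::HourlyRollup` encoding is not specified.
const HOURLY_ROLLUP_TAG: u8 = 0x10;

/// Encode [entity_id: 8B BE][tag: 1B][signal_type_id: 2B BE][hour_bucket: 4B BE].
fn encode_rollup_key(entity_id: u64, signal_type_id: u16, hour_bucket: u32) -> Vec<u8> {
    let mut key = Vec::with_capacity(15);
    key.extend_from_slice(&entity_id.to_be_bytes());
    key.push(HOURLY_ROLLUP_TAG);
    key.extend_from_slice(&signal_type_id.to_be_bytes());
    key.extend_from_slice(&hour_bucket.to_be_bytes());
    key
}

/// Decode the trailing big-endian hour bucket from a 15-byte rollup key.
fn decode_hour_bucket(key: &[u8]) -> Option<u32> {
    if key.len() != 15 {
        return None;
    }
    let bytes: [u8; 4] = key[11..15].try_into().ok()?;
    Some(u32::from_be_bytes(bytes))
}
```

With little-endian hours, 255 (`FF 00 00 00`) would sort after 256 (`00 01 00 00`) and break time-ordered scans; the value can stay little-endian because it is never compared.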
#### Background rollup writer

```rust
// Uses crate-internal items: SignalLedger, Window, StorageEngine, Tag, encode_key.

/// Materializes hourly rollups from the in-memory `BucketedCounter`.
///
/// Called once per hour by the checkpoint background thread.
/// For each entity-signal pair, reads the current hour's aggregate
/// from the minute buckets and writes it to the rollup storage.
/// If `Window::OneHour` is a trailing 60-minute window, the aggregate only lines
/// up with `current_hour_bucket` when the call runs close to the hour boundary.
/// Assuming `put` overwrites, re-running for the same hour is idempotent.
pub fn materialize_hourly_rollups(
    ledger: &SignalLedger,
    storage: &dyn StorageEngine,
    current_hour_bucket: u32,
) -> crate::Result<usize> {
    let mut written = 0;

    for entry in ledger.entries().iter() {
        let (entity_id, signal_type_id) = *entry.key();
        // Saturate rather than wrap if a count ever exceeds u32::MAX.
        let hour_agg: u32 = entry
            .value()
            .warm
            .windowed_count(Window::OneHour)
            .try_into()
            .unwrap_or(u32::MAX);

        if hour_agg == 0 {
            continue; // skip entities with no activity this hour
        }

        // Key suffix: [signal_type_id: 2B BE][hour_bucket: 4B BE]
        let mut suffix = [0u8; 6];
        suffix[..2].copy_from_slice(&signal_type_id.as_u16().to_be_bytes());
        suffix[2..6].copy_from_slice(&current_hour_bucket.to_be_bytes());

        let key = encode_key(entity_id, Tag::HourlyRollup, &suffix);
        let value = hour_agg.to_le_bytes().to_vec();
        storage.put(&key, &value)?;
        written += 1;
    }

    Ok(written)
}
```

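The writer takes the hour bucket as a parameter and leaves the arithmetic to the caller. A minimal sketch of that arithmetic, assuming the checkpoint thread fires shortly after each hour boundary and stamps the hour that just closed; all three functions are hypothetical helpers.

```rust
/// Nanoseconds in one hour.
const NANOS_PER_HOUR: u64 = 3_600_000_000_000;

/// Hours since the Unix epoch containing `ts_nanos`.
fn hour_bucket_from_nanos(ts_nanos: u64) -> u32 {
    // u32 hours covers roughly 490,000 years, so the cast does not truncate in practice.
    (ts_nanos / NANOS_PER_HOUR) as u32
}

/// The most recently completed hour at `now_nanos`: the bucket a checkpoint
/// thread firing just after the top of the hour would materialize.
fn completed_hour_bucket(now_nanos: u64) -> u32 {
    hour_bucket_from_nanos(now_nanos).saturating_sub(1)
}

/// Time remaining until the next hour boundary, for scheduling the next run.
fn nanos_until_next_hour(now_nanos: u64) -> u64 {
    NANOS_PER_HOUR - (now_nanos % NANOS_PER_HOUR)
}
```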
#### 30-day query implementation

```rust
impl SignalLedger {
    /// Read the 30-day windowed count by merging hot buckets with disk rollups.
    ///
    /// Returns the sum of:
    /// 1. All 168 hour buckets in the warm tier (hours 0-167, days 1-7)
    /// 2. Hourly rollups from disk for hours 168-719 (days 8-30)
    ///
    /// `entity_tag_prefix_with_signal` builds `[entity_id][Tag::HourlyRollup][signal_type_id]`
    /// and `extract_hour_bucket` decodes the trailing 4-byte BE hour from a rollup key.
    pub fn read_30d_windowed_count(
        &self,
        entity_id: EntityId,
        signal_type_name: &str,
        storage: &dyn StorageEngine,
    ) -> crate::Result<u64> {
        let type_id = self.resolve_signal_type(signal_type_name)?;

        // Part 1: warm tier (last 7 days)
        let warm_count = match self.entries.get(&(entity_id, type_id)) {
            Some(entry) => entry.warm.windowed_count(Window::SevenDays),
            None => 0,
        };

        // Part 2: disk rollups for hours in [now - 720, now - 168), i.e. days 8-30.
        // Whether the warm tier's 168 buckets include the current partial hour decides
        // whether these bounds leave a one-hour gap or overlap; verify against the
        // BucketedCounter layout.
        let now_hours = (Timestamp::now().as_nanos() / 3_600_000_000_000) as u32;
        let start_hour = now_hours.saturating_sub(720); // 30 days ago (inclusive)
        let end_hour = now_hours.saturating_sub(168); // 7 days ago (exclusive)

        // Prefix scan over all rollups for this entity-signal pair, filtered to
        // [start_hour, end_hour). Hours inside the warm window are skipped so they
        // are not counted twice. With 30-day GC the prefix holds at most ~720 keys.
        let mut disk_count = 0u64;
        let prefix = entity_tag_prefix_with_signal(entity_id, Tag::HourlyRollup, type_id);
        for entry in storage.scan_prefix(&prefix) {
            let (key, value) = entry?;
            let Some(hour_bucket) = extract_hour_bucket(&key) else {
                continue;
            };
            if !(start_hour..end_hour).contains(&hour_bucket) {
                continue;
            }
            // Values are a 4-byte LE u32; malformed (short) values are skipped.
            if let Some(b) = value.get(..4) {
                disk_count += u64::from(u32::from_le_bytes([b[0], b[1], b[2], b[3]]));
            }
        }

        Ok(warm_count + disk_count)
    }
}
```

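Because the key layout is big-endian, the prefix-scan-and-filter loop could in principle become a bounded range scan that never touches keys outside `[start_hour, end_hour)`. A self-contained model of that idea, using `std::collections::BTreeMap` as a stand-in for the storage engine's ordered key space; the real `StorageEngine` range API is not shown in this doc, so these functions are illustrative only.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for rollup storage: (entity, signal, hour) -> count.
/// Tuple keys sort field by field, mirroring the big-endian byte layout.
type RollupStore = BTreeMap<(u64, u16, u32), u32>;

/// Sum rollups for one entity-signal pair over hours in [start_hour, end_hour).
fn sum_rollups(store: &RollupStore, entity: u64, signal: u16, start_hour: u32, end_hour: u32) -> u64 {
    // BTreeMap::range panics if start > end, so guard empty/inverted ranges.
    if start_hour >= end_hour {
        return 0;
    }
    store
        .range((entity, signal, start_hour)..(entity, signal, end_hour))
        .map(|(_, &count)| u64::from(count))
        .sum()
}

/// 30-day count = warm-tier 7-day count + rollups for hours [now - 720, now - 168).
fn thirty_day_count(store: &RollupStore, warm_7d: u64, entity: u64, signal: u16, now_hour: u32) -> u64 {
    let start = now_hour.saturating_sub(720);
    let end = now_hour.saturating_sub(168);
    warm_7d + sum_rollups(store, entity, signal, start, end)
}
```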
### 4. Decision framework

| Measured 7d query p99 (200 entities) | Decision |
|--------------------------------------|----------|
| < 10ms | A 30d query over 720 buckets would cost ~4.3x as much (720/168), i.e. under ~43ms. Within budget on latency alone: defer rollups and extend `HOUR_BUCKETS` to 720, after weighing the ~22 GB memory cost noted under Option A. |
| 10ms - 50ms | Extending to 720 buckets would exceed the budget. Implement hourly rollups for days 8-30. |
| > 50ms | The current 7d implementation is already slow. Investigate the root cause before adding rollups. |

The 7-day path sums 168 `AtomicU32` values with `Relaxed` loads. At ~2ns per atomic load, the theoretical cost is 168 x 2ns x 200 entities ≈ 67µs. If the measured cost is significantly higher, cache misses on the `BucketedCounter` arrays are the likely culprit.

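The table and the latency estimate can be encoded as a small helper for the benchmark report. A sketch only: the `Decision` enum and function names are hypothetical, the thresholds come from the table, and `project_30d` assumes query cost scales linearly with bucket count (720/168).

```rust
use std::time::Duration;

/// Hypothetical outcome labels mirroring the decision table.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    ExtendBuckets,     // p99 < 10ms
    HourlyRollups,     // 10ms <= p99 <= 50ms
    InvestigateSlow7d, // p99 > 50ms
}

/// Linear projection of a 7-day (168-bucket) cost to 30 days (720 buckets).
fn project_30d(p99_7d: Duration) -> Duration {
    p99_7d * 720 / 168
}

fn decide(p99_7d: Duration) -> Decision {
    if p99_7d < Duration::from_millis(10) {
        Decision::ExtendBuckets
    } else if p99_7d <= Duration::from_millis(50) {
        Decision::HourlyRollups
    } else {
        Decision::InvestigateSlow7d
    }
}

/// Theoretical 7-day cost: 168 loads x ~2ns x `entities`.
fn theoretical_7d_cost(entities: u32) -> Duration {
    Duration::from_nanos(2) * 168 * entities
}
```

`theoretical_7d_cost(200)` gives 67.2µs. The linear projection ignores cache effects, which matter most when the measurement lands far above that figure.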
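### 5. Retention management

Hourly rollups accumulate at up to 24 keys per entity-signal pair per day, so without cleanup the key count grows without bound. Even capped at 30 days, the worst case (every pair active every hour) is 1M items x 10 signals x 30 days x 24 hours = 7.2B keys; because the writer skips zero-count hours, the real count depends on how sparse activity is. Apply a 30-day TTL, and add daily compaction if the steady-state key count is still excessive:

```rust
/// Clean up rollup keys older than `cutoff_hour_bucket` (30 days before now).
pub fn gc_old_rollups(
    storage: &dyn StorageEngine,
    cutoff_hour_bucket: u32,
) -> crate::Result<usize> {
    // Delete every HourlyRollup key whose hour_bucket < cutoff_hour_bucket and
    // return the number deleted. This runs daily as part of the maintenance cycle.
    // Rollup keys are grouped by entity, not by tag, so the scan must visit each
    // entity-signal prefix (or the whole keyspace) rather than one contiguous range.
    todo!("rollup GC")
}
```
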
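The GC rule itself is simple to model. A self-contained sketch, again using a `BTreeMap` keyed by `(entity, signal, hour)` as a stand-in for rollup storage; `cutoff_hour` and `gc_rollups_before` are illustrative names. It computes the cutoff as 720 hours before now and removes everything older, which reproduces the day 1-40 scenario in the GC correctness test.

```rust
use std::collections::BTreeMap;

/// In-memory stand-in for rollup storage: (entity, signal, hour) -> count.
type RollupStore = BTreeMap<(u64, u16, u32), u32>;

/// Retention window in hours (30 days).
const RETENTION_HOURS: u32 = 30 * 24;

/// Oldest hour bucket that survives GC at `now_hour`.
fn cutoff_hour(now_hour: u32) -> u32 {
    now_hour.saturating_sub(RETENTION_HOURS)
}

/// Remove rollups with hour < cutoff; returns how many were removed.
fn gc_rollups_before(store: &mut RollupStore, cutoff: u32) -> usize {
    let before = store.len();
    // Keys sort entity-first, so old hours are scattered across the map;
    // a full pass with `retain` is the simplest correct approach.
    store.retain(|&(_, _, hour), _| hour >= cutoff);
    before - store.len()
}
```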
## Acceptance Criteria

- [ ] 7-day windowed count benchmarked at 1M items x 200-entity query: p99 measured and documented
- [ ] 30-day decision made and documented: rollups needed vs. bucket extension vs. defer
- [ ] If rollups needed: `materialize_hourly_rollups` implemented and tested
- [ ] If rollups needed: `read_30d_windowed_count` implemented, merging hot + disk
- [ ] If rollups needed: retention GC implemented for 30-day TTL on rollup keys
- [ ] If rollups not needed: document the finding with measured numbers
- [ ] `Window::ThirtyDays` returns a real value (not 0) after this task

## Test Strategy

1. **Benchmark (always):** Criterion bench for windowed count at 1M items across all window sizes. This is the decision-making measurement.

2. **Correctness (if rollups implemented):**

   ```rust
   #[test]
   fn thirty_day_count_merges_hot_and_disk() {
       let db = build_test_db_with_rollups(); // test fixture, to be written
       // Write signals spanning 30 days
       // Force rollup materialization
       // Read 30d count
       // Verify it equals sum of all written signals
   }
   ```

3. **Property test (if rollups implemented):**

   ```rust
   use proptest::prelude::*;

   proptest! {
       #[test]
       fn thirty_day_count_equals_sum_of_parts(
           hot_count in 0u64..10_000,
           disk_count in 0u64..100_000,
       ) {
           // Verify that read_30d_windowed_count returns hot + disk
       }
   }
   ```

4. **GC correctness (if rollups implemented):**

   ```rust
   #[test]
   fn rollup_gc_removes_only_old_keys() {
       // Write rollup keys for days 1-40
       // GC with 30-day cutoff
       // Verify keys for days 1-10 deleted, days 11-40 retained
   }
   ```