tidaldb/docs/planning/milestone-1/phase-3/task-03-in-memory-backend.md
jordan 29400d48db feat: implement Milestone 1 phases 1-3 — schema, WAL, and storage layer
Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers,
  half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet
  acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture
overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 16:43:24 -07:00

203 lines
7.5 KiB
Markdown

# Task 03: InMemoryBackend
## Context
**Milestone:** 1 -- Signal Engine
**Phase:** m1p3 -- Storage Engine Trait and fjall Backend
**Status:** COMPLETE
**Depends On:** Task 01 (`StorageEngine` trait, `WriteBatch`, `StorageError`)
**Blocks:** None (parallel with Task 02)
**Complexity:** S
## Objective
Implement `InMemoryBackend` — a `BTreeMap`-backed, `RwLock`-protected implementation of `StorageEngine` for use in unit tests and property tests. It is deterministic (no OS interaction), fast (no disk I/O), and sorted (BTreeMap preserves lexicographic key order, matching fjall's behavior).
Every test in m1p3, m1p4, and m1p5 that does not specifically test fjall behavior uses `InMemoryBackend`. This makes the test suite run fast and reproducible across platforms.
## Requirements
- `InMemoryBackend` wraps `Arc<RwLock<BTreeMap<Vec<u8>, Vec<u8>>>>`
- `get`, `put`, `delete` acquire appropriate locks (read for `get`, write for others)
- `scan_prefix` acquires a read lock and returns an iterator over matching keys in sorted order
- `write_batch` acquires a write lock and applies all operations atomically within the lock
- `flush` is a no-op (in-memory, nothing to flush)
- `Clone` is implemented (cheap: clones the `Arc`, shares the underlying map)
- State is NOT persistent — data is lost when the backend is dropped
- `Send + Sync` (enforced by `Arc<RwLock<...>>`)
## Technical Design
### Public API
```rust
// === memory.rs ===
/// In-memory storage backend for deterministic testing.
///
/// Uses a `BTreeMap` to match fjall's lexicographic key ordering.
/// Shared via `Arc<RwLock>` for `Send + Sync + Clone`.
#[derive(Debug, Clone, Default)]
pub struct InMemoryBackend {
map: Arc<RwLock<BTreeMap<Vec<u8>, Vec<u8>>>>,
}
impl InMemoryBackend {
pub fn new() -> Self;
}
impl StorageEngine for InMemoryBackend {
fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError>;
fn put(&self, key: &[u8], value: &[u8]) -> Result<(), StorageError>;
fn delete(&self, key: &[u8]) -> Result<(), StorageError>;
fn scan_prefix(&self, prefix: &[u8]) -> PrefixIterator<'_>;
fn write_batch(&self, batch: WriteBatch) -> Result<(), StorageError>;
fn flush(&self) -> Result<(), StorageError>;
}
```
### scan_prefix Design
`BTreeMap::range` accepts a range of `Vec<u8>` keys. To scan all keys with a given prefix, use:
```rust
use std::ops::Bound::*;
let prefix = prefix.to_vec();
let end = next_prefix(&prefix); // increment last non-0xFF byte
let range = map.range(Included(prefix.clone())..end_bound);
```
Where `next_prefix` returns the lexicographic successor of the prefix (or unbounded if the prefix is all 0xFF bytes). This matches fjall's behavior for prefix scans.
**Lifetime challenge:** `scan_prefix` returns `PrefixIterator<'_>` which must hold the `RwLockReadGuard`. One approach: collect into a `Vec` and return an owned iterator. This avoids lifetime issues at the cost of one allocation. Since `InMemoryBackend` is only used in tests, this is acceptable.
## Test Strategy
### Unit Tests
```rust
#[test]
fn in_memory_get_put_delete() {
let backend = InMemoryBackend::new();
backend.put(b"k1", b"v1").unwrap();
assert_eq!(backend.get(b"k1").unwrap(), Some(b"v1".to_vec()));
backend.delete(b"k1").unwrap();
assert_eq!(backend.get(b"k1").unwrap(), None);
}
#[test]
fn in_memory_get_missing_returns_none() {
let backend = InMemoryBackend::new();
assert_eq!(backend.get(b"missing").unwrap(), None);
}
#[test]
fn in_memory_scan_prefix_returns_sorted() {
let backend = InMemoryBackend::new();
backend.put(b"prefix_c", b"vc").unwrap();
backend.put(b"prefix_a", b"va").unwrap();
backend.put(b"prefix_b", b"vb").unwrap();
backend.put(b"other_key", b"vo").unwrap();
let results: Vec<_> = backend.scan_prefix(b"prefix_")
.collect::<Result<Vec<_>, _>>().unwrap();
assert_eq!(results.len(), 3);
assert_eq!(results[0].0, b"prefix_a");
assert_eq!(results[1].0, b"prefix_b");
assert_eq!(results[2].0, b"prefix_c");
}
#[test]
fn in_memory_scan_empty_prefix_returns_all() {
let backend = InMemoryBackend::new();
backend.put(b"a", b"1").unwrap();
backend.put(b"b", b"2").unwrap();
let results: Vec<_> = backend.scan_prefix(b"").collect::<Result<Vec<_>, _>>().unwrap();
assert_eq!(results.len(), 2);
}
#[test]
fn in_memory_write_batch_atomic() {
let backend = InMemoryBackend::new();
backend.put(b"existing", b"old").unwrap();
let mut batch = WriteBatch::new();
batch.put(b"k1".to_vec(), b"v1".to_vec());
batch.put(b"k2".to_vec(), b"v2".to_vec());
batch.delete(b"existing".to_vec());
backend.write_batch(batch).unwrap();
assert_eq!(backend.get(b"k1").unwrap(), Some(b"v1".to_vec()));
assert_eq!(backend.get(b"k2").unwrap(), Some(b"v2".to_vec()));
assert_eq!(backend.get(b"existing").unwrap(), None);
}
#[test]
fn in_memory_clone_shares_state() {
let b1 = InMemoryBackend::new();
let b2 = b1.clone();
b1.put(b"shared", b"value").unwrap();
assert_eq!(b2.get(b"shared").unwrap(), Some(b"value".to_vec()));
}
#[test]
fn in_memory_flush_is_noop() {
let backend = InMemoryBackend::new();
backend.put(b"k", b"v").unwrap();
backend.flush().unwrap(); // must not panic or error
assert_eq!(backend.get(b"k").unwrap(), Some(b"v".to_vec()));
}
```
### Property Tests (proptest)
```rust
// InMemoryBackend scan_prefix ordering matches BTreeMap ordering
proptest! {
#[test]
fn scan_prefix_lexicographic_order(
keys in prop::collection::vec(prop::collection::vec(any::<u8>(), 1..8), 1..20),
prefix in prop::collection::vec(any::<u8>(), 0..4),
) {
let backend = InMemoryBackend::new();
for key in &keys {
backend.put(key, b"v").unwrap();
}
let results: Vec<Vec<u8>> = backend.scan_prefix(&prefix)
.collect::<Result<Vec<_>, _>>().unwrap()
.into_iter().map(|(k, _)| k).collect();
// All results start with prefix
for k in &results {
prop_assert!(k.starts_with(&prefix));
}
// Results are sorted
for window in results.windows(2) {
prop_assert!(window[0] <= window[1]);
}
}
}
```
## Acceptance Criteria
- [x] `InMemoryBackend` implements all `StorageEngine` methods
- [x] `scan_prefix` returns keys in lexicographic order (BTreeMap guarantees)
- [x] `scan_prefix` returns only keys that start with the given prefix
- [x] `write_batch` applies all operations atomically (single write lock hold)
- [x] `flush` is a no-op (returns `Ok(())`)
- [x] `Clone` shares the underlying `BTreeMap` via `Arc<RwLock<...>>`
- [x] `InMemoryBackend: Send + Sync` (enforced by `Arc<RwLock>`)
## Research References
- [CODING_GUIDELINES.md](../../../../CODING_GUIDELINES.md) — Section 2 (key encoding requirements: lexicographic ordering must match numeric ordering — validated via `InMemoryBackend` property tests)
## Implementation Notes
- `BTreeMap` iterates in lexicographic key order by default. This matches fjall's LSM-tree ordering, making `InMemoryBackend` a faithful test double.
- The `scan_prefix` implementation collects into a `Vec` before returning to avoid holding the `RwLockReadGuard` across the `PrefixIterator` lifetime. This is acceptable because `InMemoryBackend` is only used in tests, not on the hot path.
- Do NOT implement persistence. If a test needs persistence, it should use `FjallStorage`. The `InMemoryBackend` is explicitly non-persistent.
- `Default` is derived so that `InMemoryBackend::default()` works for ergonomic test setup.