Implements the foundation of tidalDB's data pipeline: **Phase 1 – Schema primitives** - EntityId newtype (u64, big-endian ordering) - SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows - SchemaBuilder with full constraint validation (duplicates, identifiers, half-life, windows, velocity) - LumenError wrapping all subsystems with required From impls **Phase 2 – Write-Ahead Log** - Length-prefixed, BLAKE3-protected entry format - Group-commit writer (batch up to 100 events / 10 ms) - Double-buffered content-hash deduplication - Checkpoint, truncation, and crash-recovery with full replay - Integration, property, and UAT tests (incl. 5,500-event deterministic UAT) - Proptest coverage scaled to 10 000 events/run (was ≤500) to meet acceptance criterion; cases reduced 100→10 to keep runtime comparable **Phase 3 – Storage engine** - StorageEngine trait (get/put/delete/scan/batch/flush) - Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers - InMemoryBackend (BTreeMap + RwLock) - FjallStorage with three isolated keyspaces and atomic batch helper - Property tests for key ordering and round-trip correctness Also adds planning docs for phases 4-5, research docs, architecture overview, and roadmap updates. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
7.5 KiB
Task 03: InMemoryBackend
Context
Milestone: 1 -- Signal Engine
Phase: m1p3 -- Storage Engine Trait and fjall Backend
Status: COMPLETE
Depends On: Task 01 (StorageEngine trait, WriteBatch, StorageError)
Blocks: None (parallel with Task 02)
Complexity: S
Objective
Implement InMemoryBackend — a BTreeMap-backed, RwLock-protected implementation of StorageEngine for use in unit tests and property tests. It is deterministic (no OS interaction), fast (no disk I/O), and sorted (BTreeMap preserves lexicographic key order, matching fjall's behavior).
Every test in m1p3, m1p4, and m1p5 that does not specifically test fjall behavior uses InMemoryBackend. This makes the test suite run fast and reproducible across platforms.
Requirements
InMemoryBackendwrapsArc<RwLock<BTreeMap<Vec<u8>, Vec<u8>>>>get,put,deleteacquire appropriate locks (read forget, write for others)scan_prefixacquires a read lock and returns an iterator over matching keys in sorted orderwrite_batchacquires a write lock and applies all operations atomically within the lockflushis a no-op (in-memory, nothing to flush)Cloneis implemented (cheap: clones theArc, shares the underlying map)- State is NOT persistent — data is lost when the backend is dropped
Send + Sync(enforced byArc<RwLock<...>>)
Technical Design
Public API
// === memory.rs ===
/// In-memory storage backend for deterministic testing.
///
/// Uses a `BTreeMap` to match fjall's lexicographic key ordering.
/// Shared via `Arc<RwLock>` for `Send + Sync + Clone`.
#[derive(Debug, Clone, Default)]
pub struct InMemoryBackend {
map: Arc<RwLock<BTreeMap<Vec<u8>, Vec<u8>>>>,
}
impl InMemoryBackend {
pub fn new() -> Self;
}
impl StorageEngine for InMemoryBackend {
fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError>;
fn put(&self, key: &[u8], value: &[u8]) -> Result<(), StorageError>;
fn delete(&self, key: &[u8]) -> Result<(), StorageError>;
fn scan_prefix(&self, prefix: &[u8]) -> PrefixIterator<'_>;
fn write_batch(&self, batch: WriteBatch) -> Result<(), StorageError>;
fn flush(&self) -> Result<(), StorageError>;
}
scan_prefix Design
BTreeMap::range accepts a range of Vec<u8> keys. To scan all keys with a given prefix, use:
use std::ops::Bound::*;
let prefix = prefix.to_vec();
let end = next_prefix(&prefix); // increment last non-0xFF byte
let range = map.range(Included(prefix.clone())..end_bound);
Where next_prefix returns the lexicographic successor of the prefix (or unbounded if the prefix is all 0xFF bytes). This matches fjall's behavior for prefix scans.
Lifetime challenge: scan_prefix returns PrefixIterator<'_> which must hold the RwLockReadGuard. One approach: collect into a Vec and return an owned iterator. This avoids lifetime issues at the cost of one allocation. Since InMemoryBackend is only used in tests, this is acceptable.
Test Strategy
Unit Tests
#[test]
fn in_memory_get_put_delete() {
let backend = InMemoryBackend::new();
backend.put(b"k1", b"v1").unwrap();
assert_eq!(backend.get(b"k1").unwrap(), Some(b"v1".to_vec()));
backend.delete(b"k1").unwrap();
assert_eq!(backend.get(b"k1").unwrap(), None);
}
#[test]
fn in_memory_get_missing_returns_none() {
let backend = InMemoryBackend::new();
assert_eq!(backend.get(b"missing").unwrap(), None);
}
#[test]
fn in_memory_scan_prefix_returns_sorted() {
let backend = InMemoryBackend::new();
backend.put(b"prefix_c", b"vc").unwrap();
backend.put(b"prefix_a", b"va").unwrap();
backend.put(b"prefix_b", b"vb").unwrap();
backend.put(b"other_key", b"vo").unwrap();
let results: Vec<_> = backend.scan_prefix(b"prefix_")
.collect::<Result<Vec<_>, _>>().unwrap();
assert_eq!(results.len(), 3);
assert_eq!(results[0].0, b"prefix_a");
assert_eq!(results[1].0, b"prefix_b");
assert_eq!(results[2].0, b"prefix_c");
}
#[test]
fn in_memory_scan_empty_prefix_returns_all() {
let backend = InMemoryBackend::new();
backend.put(b"a", b"1").unwrap();
backend.put(b"b", b"2").unwrap();
let results: Vec<_> = backend.scan_prefix(b"").collect::<Result<Vec<_>, _>>().unwrap();
assert_eq!(results.len(), 2);
}
#[test]
fn in_memory_write_batch_atomic() {
let backend = InMemoryBackend::new();
backend.put(b"existing", b"old").unwrap();
let mut batch = WriteBatch::new();
batch.put(b"k1".to_vec(), b"v1".to_vec());
batch.put(b"k2".to_vec(), b"v2".to_vec());
batch.delete(b"existing".to_vec());
backend.write_batch(batch).unwrap();
assert_eq!(backend.get(b"k1").unwrap(), Some(b"v1".to_vec()));
assert_eq!(backend.get(b"k2").unwrap(), Some(b"v2".to_vec()));
assert_eq!(backend.get(b"existing").unwrap(), None);
}
#[test]
fn in_memory_clone_shares_state() {
let b1 = InMemoryBackend::new();
let b2 = b1.clone();
b1.put(b"shared", b"value").unwrap();
assert_eq!(b2.get(b"shared").unwrap(), Some(b"value".to_vec()));
}
#[test]
fn in_memory_flush_is_noop() {
let backend = InMemoryBackend::new();
backend.put(b"k", b"v").unwrap();
backend.flush().unwrap(); // must not panic or error
assert_eq!(backend.get(b"k").unwrap(), Some(b"v".to_vec()));
}
Property Tests (proptest)
// InMemoryBackend scan_prefix ordering matches BTreeMap ordering
proptest! {
#[test]
fn scan_prefix_lexicographic_order(
keys in prop::collection::vec(prop::collection::vec(any::<u8>(), 1..8), 1..20),
prefix in prop::collection::vec(any::<u8>(), 0..4),
) {
let backend = InMemoryBackend::new();
for key in &keys {
backend.put(key, b"v").unwrap();
}
let results: Vec<Vec<u8>> = backend.scan_prefix(&prefix)
.collect::<Result<Vec<_>, _>>().unwrap()
.into_iter().map(|(k, _)| k).collect();
// All results start with prefix
for k in &results {
prop_assert!(k.starts_with(&prefix));
}
// Results are sorted
for window in results.windows(2) {
prop_assert!(window[0] <= window[1]);
}
}
}
Acceptance Criteria
InMemoryBackendimplements allStorageEnginemethodsscan_prefixreturns keys in lexicographic order (BTreeMap guarantees)scan_prefixreturns only keys that start with the given prefixwrite_batchapplies all operations atomically (single write lock hold)flushis a no-op (returnsOk(()))Cloneshares the underlyingBTreeMapviaArc<RwLock<...>>InMemoryBackend: Send + Sync(enforced byArc<RwLock>)
Research References
- CODING_GUIDELINES.md — Section 2 (key encoding requirements: lexicographic ordering must match numeric ordering — validated via
InMemoryBackendproperty tests)
Implementation Notes
BTreeMapiterates in lexicographic key order by default. This matches fjall's LSM-tree ordering, makingInMemoryBackenda faithful test double.- The
scan_prefiximplementation collects into aVecbefore returning to avoid holding theRwLockReadGuardacross thePrefixIteratorlifetime. This is acceptable becauseInMemoryBackendis only used in tests, not on the hot path. - Do NOT implement persistence. If a test needs persistence, it should use
FjallStorage. TheInMemoryBackendis explicitly non-persistent. Defaultis derived so thatInMemoryBackend::default()works for ergonomic test setup.