Implements the foundation of tidalDB's data pipeline: **Phase 1 – Schema primitives** - EntityId newtype (u64, big-endian ordering) - SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows - SchemaBuilder with full constraint validation (duplicates, identifiers, half-life, windows, velocity) - LumenError wrapping all subsystems with required From impls **Phase 2 – Write-Ahead Log** - Length-prefixed, BLAKE3-protected entry format - Group-commit writer (batch up to 100 events / 10 ms) - Double-buffered content-hash deduplication - Checkpoint, truncation, and crash-recovery with full replay - Integration, property, and UAT tests (incl. 5,500-event deterministic UAT) - Proptest coverage scaled to 10 000 events/run (was ≤500) to meet acceptance criterion; cases reduced 100→10 to keep runtime comparable **Phase 3 – Storage engine** - StorageEngine trait (get/put/delete/scan/batch/flush) - Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers - InMemoryBackend (BTreeMap + RwLock) - FjallStorage with three isolated keyspaces and atomic batch helper - Property tests for key ordering and round-trip correctness Also adds planning docs for phases 4-5, research docs, architecture overview, and roadmap updates. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
203 lines
7.5 KiB
Markdown
203 lines
7.5 KiB
Markdown
# Task 03: InMemoryBackend
|
|
|
|
## Context
|
|
|
|
**Milestone:** 1 -- Signal Engine
|
|
**Phase:** m1p3 -- Storage Engine Trait and fjall Backend
|
|
**Status:** COMPLETE
|
|
**Depends On:** Task 01 (`StorageEngine` trait, `WriteBatch`, `StorageError`)
|
|
**Blocks:** None (parallel with Task 02)
|
|
**Complexity:** S
|
|
|
|
## Objective
|
|
|
|
Implement `InMemoryBackend` — a `BTreeMap`-backed, `RwLock`-protected implementation of `StorageEngine` for use in unit tests and property tests. It is deterministic (no OS interaction), fast (no disk I/O), and sorted (BTreeMap preserves lexicographic key order, matching fjall's behavior).
|
|
|
|
Every test in m1p3, m1p4, and m1p5 that does not specifically test fjall behavior uses `InMemoryBackend`. This makes the test suite run fast and reproducible across platforms.
|
|
|
|
## Requirements
|
|
|
|
- `InMemoryBackend` wraps `Arc<RwLock<BTreeMap<Vec<u8>, Vec<u8>>>>`
|
|
- `get`, `put`, `delete` acquire appropriate locks (read for `get`, write for others)
|
|
- `scan_prefix` acquires a read lock and returns an iterator over matching keys in sorted order
|
|
- `write_batch` acquires a write lock and applies all operations atomically within the lock
|
|
- `flush` is a no-op (in-memory, nothing to flush)
|
|
- `Clone` is implemented (cheap: clones the `Arc`, shares the underlying map)
|
|
- State is NOT persistent — data is lost when the backend is dropped
|
|
- `Send + Sync` (enforced by `Arc<RwLock<...>>`)
|
|
|
|
## Technical Design
|
|
|
|
### Public API
|
|
|
|
```rust
|
|
// === memory.rs ===
|
|
|
|
/// In-memory storage backend for deterministic testing.
|
|
///
|
|
/// Uses a `BTreeMap` to match fjall's lexicographic key ordering.
|
|
/// Shared via `Arc<RwLock>` for `Send + Sync + Clone`.
|
|
#[derive(Debug, Clone, Default)]
|
|
pub struct InMemoryBackend {
|
|
map: Arc<RwLock<BTreeMap<Vec<u8>, Vec<u8>>>>,
|
|
}
|
|
|
|
impl InMemoryBackend {
|
|
pub fn new() -> Self;
|
|
}
|
|
|
|
impl StorageEngine for InMemoryBackend {
|
|
fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError>;
|
|
fn put(&self, key: &[u8], value: &[u8]) -> Result<(), StorageError>;
|
|
fn delete(&self, key: &[u8]) -> Result<(), StorageError>;
|
|
fn scan_prefix(&self, prefix: &[u8]) -> PrefixIterator<'_>;
|
|
fn write_batch(&self, batch: WriteBatch) -> Result<(), StorageError>;
|
|
fn flush(&self) -> Result<(), StorageError>;
|
|
}
|
|
```
|
|
|
|
### scan_prefix Design
|
|
|
|
`BTreeMap::range` accepts a range of `Vec<u8>` keys. To scan all keys with a given prefix, use:
|
|
|
|
```rust
|
|
use std::ops::Bound::*;
|
|
let prefix = prefix.to_vec();
|
|
let end = next_prefix(&prefix); // increment last non-0xFF byte
|
|
let range = map.range(Included(prefix.clone())..end_bound);
|
|
```
|
|
|
|
Where `next_prefix` returns the lexicographic successor of the prefix (or unbounded if the prefix is all 0xFF bytes). This matches fjall's behavior for prefix scans.
|
|
|
|
**Lifetime challenge:** `scan_prefix` returns `PrefixIterator<'_>` which must hold the `RwLockReadGuard`. One approach: collect into a `Vec` and return an owned iterator. This avoids lifetime issues at the cost of one allocation. Since `InMemoryBackend` is only used in tests, this is acceptable.
|
|
|
|
## Test Strategy
|
|
|
|
### Unit Tests
|
|
|
|
```rust
|
|
#[test]
|
|
fn in_memory_get_put_delete() {
|
|
let backend = InMemoryBackend::new();
|
|
backend.put(b"k1", b"v1").unwrap();
|
|
assert_eq!(backend.get(b"k1").unwrap(), Some(b"v1".to_vec()));
|
|
backend.delete(b"k1").unwrap();
|
|
assert_eq!(backend.get(b"k1").unwrap(), None);
|
|
}
|
|
|
|
#[test]
|
|
fn in_memory_get_missing_returns_none() {
|
|
let backend = InMemoryBackend::new();
|
|
assert_eq!(backend.get(b"missing").unwrap(), None);
|
|
}
|
|
|
|
#[test]
|
|
fn in_memory_scan_prefix_returns_sorted() {
|
|
let backend = InMemoryBackend::new();
|
|
backend.put(b"prefix_c", b"vc").unwrap();
|
|
backend.put(b"prefix_a", b"va").unwrap();
|
|
backend.put(b"prefix_b", b"vb").unwrap();
|
|
backend.put(b"other_key", b"vo").unwrap();
|
|
|
|
let results: Vec<_> = backend.scan_prefix(b"prefix_")
|
|
.collect::<Result<Vec<_>, _>>().unwrap();
|
|
assert_eq!(results.len(), 3);
|
|
assert_eq!(results[0].0, b"prefix_a");
|
|
assert_eq!(results[1].0, b"prefix_b");
|
|
assert_eq!(results[2].0, b"prefix_c");
|
|
}
|
|
|
|
#[test]
|
|
fn in_memory_scan_empty_prefix_returns_all() {
|
|
let backend = InMemoryBackend::new();
|
|
backend.put(b"a", b"1").unwrap();
|
|
backend.put(b"b", b"2").unwrap();
|
|
let results: Vec<_> = backend.scan_prefix(b"").collect::<Result<Vec<_>, _>>().unwrap();
|
|
assert_eq!(results.len(), 2);
|
|
}
|
|
|
|
#[test]
|
|
fn in_memory_write_batch_atomic() {
|
|
let backend = InMemoryBackend::new();
|
|
backend.put(b"existing", b"old").unwrap();
|
|
|
|
let mut batch = WriteBatch::new();
|
|
batch.put(b"k1".to_vec(), b"v1".to_vec());
|
|
batch.put(b"k2".to_vec(), b"v2".to_vec());
|
|
batch.delete(b"existing".to_vec());
|
|
backend.write_batch(batch).unwrap();
|
|
|
|
assert_eq!(backend.get(b"k1").unwrap(), Some(b"v1".to_vec()));
|
|
assert_eq!(backend.get(b"k2").unwrap(), Some(b"v2".to_vec()));
|
|
assert_eq!(backend.get(b"existing").unwrap(), None);
|
|
}
|
|
|
|
#[test]
|
|
fn in_memory_clone_shares_state() {
|
|
let b1 = InMemoryBackend::new();
|
|
let b2 = b1.clone();
|
|
|
|
b1.put(b"shared", b"value").unwrap();
|
|
assert_eq!(b2.get(b"shared").unwrap(), Some(b"value".to_vec()));
|
|
}
|
|
|
|
#[test]
|
|
fn in_memory_flush_is_noop() {
|
|
let backend = InMemoryBackend::new();
|
|
backend.put(b"k", b"v").unwrap();
|
|
backend.flush().unwrap(); // must not panic or error
|
|
assert_eq!(backend.get(b"k").unwrap(), Some(b"v".to_vec()));
|
|
}
|
|
```
|
|
|
|
### Property Tests (proptest)
|
|
|
|
```rust
|
|
// InMemoryBackend scan_prefix ordering matches BTreeMap ordering
|
|
proptest! {
|
|
#[test]
|
|
fn scan_prefix_lexicographic_order(
|
|
keys in prop::collection::vec(prop::collection::vec(any::<u8>(), 1..8), 1..20),
|
|
prefix in prop::collection::vec(any::<u8>(), 0..4),
|
|
) {
|
|
let backend = InMemoryBackend::new();
|
|
for key in &keys {
|
|
backend.put(key, b"v").unwrap();
|
|
}
|
|
let results: Vec<Vec<u8>> = backend.scan_prefix(&prefix)
|
|
.collect::<Result<Vec<_>, _>>().unwrap()
|
|
.into_iter().map(|(k, _)| k).collect();
|
|
|
|
// All results start with prefix
|
|
for k in &results {
|
|
prop_assert!(k.starts_with(&prefix));
|
|
}
|
|
// Results are sorted
|
|
for window in results.windows(2) {
|
|
prop_assert!(window[0] <= window[1]);
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
## Acceptance Criteria
|
|
|
|
- [x] `InMemoryBackend` implements all `StorageEngine` methods
|
|
- [x] `scan_prefix` returns keys in lexicographic order (BTreeMap guarantees)
|
|
- [x] `scan_prefix` returns only keys that start with the given prefix
|
|
- [x] `write_batch` applies all operations atomically (single write lock hold)
|
|
- [x] `flush` is a no-op (returns `Ok(())`)
|
|
- [x] `Clone` shares the underlying `BTreeMap` via `Arc<RwLock<...>>`
|
|
- [x] `InMemoryBackend: Send + Sync` (enforced by `Arc<RwLock>`)
|
|
|
|
## Research References
|
|
|
|
- [CODING_GUIDELINES.md](../../../../CODING_GUIDELINES.md) — Section 2 (key encoding requirements: lexicographic ordering must match numeric ordering — validated via `InMemoryBackend` property tests)
|
|
|
|
## Implementation Notes
|
|
|
|
- `BTreeMap` iterates in lexicographic key order by default. This matches fjall's LSM-tree ordering, making `InMemoryBackend` a faithful test double.
|
|
- The `scan_prefix` implementation collects into a `Vec` before returning to avoid holding the `RwLockReadGuard` across the `PrefixIterator` lifetime. This is acceptable because `InMemoryBackend` is only used in tests, not on the hot path.
|
|
- Do NOT implement persistence. If a test needs persistence, it should use `FjallStorage`. The `InMemoryBackend` is explicitly non-persistent.
|
|
- `Default` is derived so that `InMemoryBackend::default()` works for ergonomic test setup.
|