tidaldb/docs/planning/milestone-1/phase-3/task-03-in-memory-backend.md
jordan 29400d48db feat: implement Milestone 1 phases 1-3 — schema, WAL, and storage layer
Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers,
  half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet
  acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture
overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 16:43:24 -07:00

7.5 KiB

Task 03: InMemoryBackend

Context

Milestone: 1 -- Signal Engine Phase: m1p3 -- Storage Engine Trait and fjall Backend Status: COMPLETE Depends On: Task 01 (StorageEngine trait, WriteBatch, StorageError) Blocks: None (parallel with Task 02) Complexity: S

Objective

Implement InMemoryBackend — a BTreeMap-backed, RwLock-protected implementation of StorageEngine for use in unit tests and property tests. It is deterministic (no OS interaction), fast (no disk I/O), and sorted (BTreeMap preserves lexicographic key order, matching fjall's behavior).

Every test in m1p3, m1p4, and m1p5 that does not specifically test fjall behavior uses InMemoryBackend. This makes the test suite run fast and reproducible across platforms.

Requirements

  • InMemoryBackend wraps Arc<RwLock<BTreeMap<Vec<u8>, Vec<u8>>>>
  • get, put, delete acquire appropriate locks (read for get, write for others)
  • scan_prefix acquires a read lock and returns an iterator over matching keys in sorted order
  • write_batch acquires a write lock and applies all operations atomically within the lock
  • flush is a no-op (in-memory, nothing to flush)
  • Clone is implemented (cheap: clones the Arc, shares the underlying map)
  • State is NOT persistent — data is lost when the backend is dropped
  • Send + Sync (enforced by Arc<RwLock<...>>)

Technical Design

Public API

// === memory.rs ===

/// In-memory storage backend for deterministic testing.
///
/// Uses a `BTreeMap` to match fjall's lexicographic key ordering.
/// Shared via `Arc<RwLock>` for `Send + Sync + Clone`.
#[derive(Debug, Clone, Default)]
pub struct InMemoryBackend {
    map: Arc<RwLock<BTreeMap<Vec<u8>, Vec<u8>>>>,
}

impl InMemoryBackend {
    pub fn new() -> Self;
}

impl StorageEngine for InMemoryBackend {
    fn get(&self, key: &[u8]) -> Result<Option<Vec<u8>>, StorageError>;
    fn put(&self, key: &[u8], value: &[u8]) -> Result<(), StorageError>;
    fn delete(&self, key: &[u8]) -> Result<(), StorageError>;
    fn scan_prefix(&self, prefix: &[u8]) -> PrefixIterator<'_>;
    fn write_batch(&self, batch: WriteBatch) -> Result<(), StorageError>;
    fn flush(&self) -> Result<(), StorageError>;
}

scan_prefix Design

BTreeMap::range accepts a range of Vec<u8> keys. To scan all keys with a given prefix, use:

use std::ops::Bound::*;
let prefix = prefix.to_vec();
let end = next_prefix(&prefix); // increment last non-0xFF byte
let range = map.range(Included(prefix.clone())..end_bound);

Where next_prefix returns the lexicographic successor of the prefix (or unbounded if the prefix is all 0xFF bytes). This matches fjall's behavior for prefix scans.

Lifetime challenge: scan_prefix returns PrefixIterator<'_> which must hold the RwLockReadGuard. One approach: collect into a Vec and return an owned iterator. This avoids lifetime issues at the cost of one allocation. Since InMemoryBackend is only used in tests, this is acceptable.

Test Strategy

Unit Tests

#[test]
fn in_memory_get_put_delete() {
    let backend = InMemoryBackend::new();
    backend.put(b"k1", b"v1").unwrap();
    assert_eq!(backend.get(b"k1").unwrap(), Some(b"v1".to_vec()));
    backend.delete(b"k1").unwrap();
    assert_eq!(backend.get(b"k1").unwrap(), None);
}

#[test]
fn in_memory_get_missing_returns_none() {
    let backend = InMemoryBackend::new();
    assert_eq!(backend.get(b"missing").unwrap(), None);
}

#[test]
fn in_memory_scan_prefix_returns_sorted() {
    let backend = InMemoryBackend::new();
    backend.put(b"prefix_c", b"vc").unwrap();
    backend.put(b"prefix_a", b"va").unwrap();
    backend.put(b"prefix_b", b"vb").unwrap();
    backend.put(b"other_key", b"vo").unwrap();

    let results: Vec<_> = backend.scan_prefix(b"prefix_")
        .collect::<Result<Vec<_>, _>>().unwrap();
    assert_eq!(results.len(), 3);
    assert_eq!(results[0].0, b"prefix_a");
    assert_eq!(results[1].0, b"prefix_b");
    assert_eq!(results[2].0, b"prefix_c");
}

#[test]
fn in_memory_scan_empty_prefix_returns_all() {
    let backend = InMemoryBackend::new();
    backend.put(b"a", b"1").unwrap();
    backend.put(b"b", b"2").unwrap();
    let results: Vec<_> = backend.scan_prefix(b"").collect::<Result<Vec<_>, _>>().unwrap();
    assert_eq!(results.len(), 2);
}

#[test]
fn in_memory_write_batch_atomic() {
    let backend = InMemoryBackend::new();
    backend.put(b"existing", b"old").unwrap();

    let mut batch = WriteBatch::new();
    batch.put(b"k1".to_vec(), b"v1".to_vec());
    batch.put(b"k2".to_vec(), b"v2".to_vec());
    batch.delete(b"existing".to_vec());
    backend.write_batch(batch).unwrap();

    assert_eq!(backend.get(b"k1").unwrap(), Some(b"v1".to_vec()));
    assert_eq!(backend.get(b"k2").unwrap(), Some(b"v2".to_vec()));
    assert_eq!(backend.get(b"existing").unwrap(), None);
}

#[test]
fn in_memory_clone_shares_state() {
    let b1 = InMemoryBackend::new();
    let b2 = b1.clone();

    b1.put(b"shared", b"value").unwrap();
    assert_eq!(b2.get(b"shared").unwrap(), Some(b"value".to_vec()));
}

#[test]
fn in_memory_flush_is_noop() {
    let backend = InMemoryBackend::new();
    backend.put(b"k", b"v").unwrap();
    backend.flush().unwrap(); // must not panic or error
    assert_eq!(backend.get(b"k").unwrap(), Some(b"v".to_vec()));
}

Property Tests (proptest)

// InMemoryBackend scan_prefix ordering matches BTreeMap ordering
proptest! {
    #[test]
    fn scan_prefix_lexicographic_order(
        keys in prop::collection::vec(prop::collection::vec(any::<u8>(), 1..8), 1..20),
        prefix in prop::collection::vec(any::<u8>(), 0..4),
    ) {
        let backend = InMemoryBackend::new();
        for key in &keys {
            backend.put(key, b"v").unwrap();
        }
        let results: Vec<Vec<u8>> = backend.scan_prefix(&prefix)
            .collect::<Result<Vec<_>, _>>().unwrap()
            .into_iter().map(|(k, _)| k).collect();

        // All results start with prefix
        for k in &results {
            prop_assert!(k.starts_with(&prefix));
        }
        // Results are sorted
        for window in results.windows(2) {
            prop_assert!(window[0] <= window[1]);
        }
    }
}

Acceptance Criteria

  • InMemoryBackend implements all StorageEngine methods
  • scan_prefix returns keys in lexicographic order (BTreeMap guarantees)
  • scan_prefix returns only keys that start with the given prefix
  • write_batch applies all operations atomically (single write lock hold)
  • flush is a no-op (returns Ok(()))
  • Clone shares the underlying BTreeMap via Arc<RwLock<...>>
  • InMemoryBackend: Send + Sync (enforced by Arc<RwLock>)

Research References

  • CODING_GUIDELINES.md — Section 2 (key encoding requirements: lexicographic ordering must match numeric ordering — validated via InMemoryBackend property tests)

Implementation Notes

  • BTreeMap iterates in lexicographic key order by default. This matches fjall's LSM-tree ordering, making InMemoryBackend a faithful test double.
  • The scan_prefix implementation collects into a Vec before returning to avoid holding the RwLockReadGuard across the PrefixIterator lifetime. This is acceptable because InMemoryBackend is only used in tests, not on the hot path.
  • Do NOT implement persistence. If a test needs persistence, it should use FjallStorage. The InMemoryBackend is explicitly non-persistent.
  • Default is derived so that InMemoryBackend::default() works for ergonomic test setup.