# Task 02: B-tree Range Indexes

## Context

**Milestone:** 2 -- Ranked Retrieval
**Phase:** m2p2 -- Metadata Indexes and Filter Engine
**Depends On:** None (uses types from m1p1 but no m2p2 tasks)
**Blocks:** Task 03 (Composable Filter Engine)
**Complexity:** S

## Objective

Deliver `RangeIndex<V>`, a sorted in-memory index for range queries over numeric and timestamp fields. The query `FILTER duration_min:5m, created_within:7d` resolves to range lookups that return `RoaringBitmap` sets: "all entities with duration >= 300 seconds" and "all entities with created_at >= (now - 7 days)". These bitmaps are intersected with categorical bitmaps from Task 01 and fed into the filter engine (Task 03).

Range indexes complement bitmap indexes. Bitmap indexes handle exact-match categorical predicates (`category:jazz`); range indexes handle ordered numeric predicates (`duration >= 300`, `created_at >= 2026-02-13T00:00:00`). Together they cover the full filter predicate space defined in Spec 08 Section 7.3: `Eq`, `Any`, `Range`, `Min`, `Max`, `CreatedWithin`, `CreatedAfter`, `CreatedBefore`.

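The following is a rough, std-only sketch of how `FILTER duration_min:5m, created_within:7d` could resolve. It assumes that `5m` is parsed to 300 seconds and that `7d` becomes `now - 7 * 86_400_000_000_000` nanoseconds, which is the same arithmetic as the `timestamp_index_u64` unit test. `BTreeSet<u32>` stands in for `RoaringBitmap` so the example needs no crates. `union_in`, `IdSet`, and the sample data are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::ops::Bound;

// Hypothetical stand-in for RoaringBitmap so the sketch needs only std.
type IdSet = BTreeSet<u32>;

/// Union of all entity-ID sets whose key lies within the bounds.
fn union_in<V: Ord>(tree: &BTreeMap<V, IdSet>, lo: Bound<&V>, hi: Bound<&V>) -> IdSet {
    tree.range((lo, hi)).flat_map(|(_, ids)| ids.iter().copied()).collect()
}

fn main() {
    const DAY_NS: u64 = 86_400_000_000_000;
    let now_ns: u64 = 1_708_000_000_000_000_000;

    let mut duration: BTreeMap<u32, IdSet> = BTreeMap::new(); // seconds -> IDs
    let mut created_at: BTreeMap<u64, IdSet> = BTreeMap::new(); // nanoseconds -> IDs

    // (entity_id, duration_secs, age_days): hypothetical sample data
    for (id, secs, age_days) in [(1u32, 60u32, 1u64), (2, 300, 2), (3, 600, 10), (4, 1800, 3)] {
        duration.entry(secs).or_default().insert(id);
        created_at.entry(now_ns - age_days * DAY_NS).or_default().insert(id);
    }

    // duration_min:5m => duration >= 300
    let long_enough = union_in(&duration, Bound::Included(&300), Bound::Unbounded);
    // created_within:7d => created_at >= now - 7 days
    let cutoff = now_ns - 7 * DAY_NS;
    let recent = union_in(&created_at, Bound::Included(&cutoff), Bound::Unbounded);

    let matches: IdSet = long_enough.intersection(&recent).copied().collect();
    println!("{matches:?}"); // {2, 4}
}
```

In the real index, the final step would be a `RoaringBitmap` intersection (`&`) instead of `BTreeSet::intersection`. The shape of the lookup stays the same.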
## Requirements

- `RangeIndex<V: Ord + Clone>` struct backed by `BTreeMap<V, RoaringBitmap>`
- Key is the attribute value; value is the set of entity IDs with exactly that attribute value
- `insert(entity_id, value: V)` -- adds the entity to the bitmap for that value
- `delete(entity_id, value: &V)` -- removes the entity from the bitmap for that value
- `range(lo: Bound<&V>, hi: Bound<&V>) -> RoaringBitmap` -- union of all bitmaps whose keys fall within the bounds (each bound is `Included`, `Excluded`, or `Unbounded`)
- `gt(threshold: &V) -> RoaringBitmap` -- union of all bitmaps with keys > threshold
- `gte(threshold: &V) -> RoaringBitmap` -- union of all bitmaps with keys >= threshold
- `lt(threshold: &V) -> RoaringBitmap` -- union of all bitmaps with keys < threshold
- `lte(threshold: &V) -> RoaringBitmap` -- union of all bitmaps with keys <= threshold
- `selectivity(lo: Bound<&V>, hi: Bound<&V>, total: u64) -> f64` -- fraction of `total` entities whose value falls within the bounds
- `total_count() -> u64` -- total distinct entity IDs indexed
- Concrete instantiations: `RangeIndex<u64>` for timestamps (nanoseconds), `RangeIndex<u32>` for durations (seconds)
- Persistence: serialize/deserialize each value bitmap to/from the storage engine via `Tag::Idx`
- `Send + Sync`

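The `Send + Sync` requirement can be checked at compile time with a generic helper that only type-checks when its type parameter is `Send + Sync`. The sketch below applies it to a hypothetical `RangeIndexShape` that mirrors the planned fields, with `BTreeSet<u32>` in place of `RoaringBitmap` so it needs only std. Inside the crate, the calls would name `RangeIndex<u32>` and `RangeIndex<u64>` directly.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::RwLock;

// Hypothetical stand-in mirroring the planned field layout of RangeIndex<V>.
#[allow(dead_code)]
struct RangeIndexShape<V: Ord + Clone> {
    field_name: String,
    tree: RwLock<BTreeMap<V, BTreeSet<u32>>>,
}

// Compiles only if T is Send + Sync; the call does nothing at runtime.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<RangeIndexShape<u32>>();
    assert_send_sync::<RangeIndexShape<u64>>();
    println!("both instantiations are Send + Sync");
}
```

`RwLock<T>` is `Sync` only when `T: Send + Sync`, so the check fails to compile if a non-thread-safe field type is introduced later.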
## Technical Design

### Module Structure

```
tidal/src/storage/
  indexes/
    range.rs   -- RangeIndex<V> (this task)
```

### Public API

```rust
// === storage/indexes/range.rs ===

use roaring::RoaringBitmap;
use std::collections::BTreeMap;
use std::ops::Bound;
use std::sync::RwLock;

use super::IndexError;

/// A B-tree backed range index for a single ordered numeric field.
///
/// Maps attribute values to `RoaringBitmap` sets of entity IDs.
/// The B-tree ordering enables efficient range queries: "all entities
/// with duration >= 300" unions the bitmaps of every key >= 300 present
/// in the tree.
///
/// # Design
///
/// Unlike the `BitmapIndex` (which uses a `HashMap` for exact-match),
/// this index uses a `BTreeMap` to exploit key ordering for range
/// scans. The `range()` method iterates from `lo` to `hi` in the
/// tree and unions the bitmaps. At 10K entities with ~100 distinct
/// duration values, this is ~100 bitmap unions -- well under 1ms.
///
/// # Concurrency
///
/// Same model as `BitmapIndex`: `RwLock` for read/write separation.
pub struct RangeIndex<V: Ord + Clone> {
    field_name: String,
    tree: RwLock<BTreeMap<V, RoaringBitmap>>,
}

impl<V: Ord + Clone> RangeIndex<V> {
    /// Create a new, empty range index for the given field.
    pub fn new(field_name: impl Into<String>) -> Self;

    /// The field name this index covers.
    pub fn field_name(&self) -> &str;

    /// Add an entity with the given attribute value.
    ///
    /// If the entity already exists at a DIFFERENT value, the caller
    /// must call `delete(entity_id, &old_value)` first. The range index
    /// does not track previous values per entity.
    pub fn insert(&self, entity_id: u32, value: V);

    /// Remove an entity from the bitmap at the given value.
    ///
    /// Returns `true` if the entity was present and removed. If the
    /// bitmap becomes empty, its entry is removed from the tree.
    pub fn delete(&self, entity_id: u32, value: &V) -> bool;

    /// Range query: return the union of all bitmaps whose keys fall
    /// within `(lo, hi)`, where each bound is `Included`, `Excluded`,
    /// or `Unbounded`.
    ///
    /// Uses `BTreeMap::range()` to iterate matching entries and
    /// unions their bitmaps.
    pub fn range(&self, lo: Bound<&V>, hi: Bound<&V>) -> RoaringBitmap;

    /// Greater-than query: return entities with value > threshold.
    pub fn gt(&self, threshold: &V) -> RoaringBitmap;

    /// Greater-than-or-equal query: return entities with value >= threshold.
    pub fn gte(&self, threshold: &V) -> RoaringBitmap;

    /// Less-than query: return entities with value < threshold.
    pub fn lt(&self, threshold: &V) -> RoaringBitmap;

    /// Less-than-or-equal query: return entities with value <= threshold.
    pub fn lte(&self, threshold: &V) -> RoaringBitmap;

    /// Fraction of `total` entities matching a range query.
    ///
    /// Computed as `|union of bitmaps for keys in range| / total`. The
    /// union (not the sum of per-key cardinalities) is used so an entity
    /// indexed under several values is counted once. This is exact, not
    /// estimated, because we materialize the actual union. At M2 scale
    /// (10K entities, ~100 distinct values) this is cheap; at M7 scale,
    /// consider sampling.
    ///
    /// Returns a value in [0.0, 1.0]. Returns 0.0 if `total` is 0.
    pub fn selectivity(&self, lo: Bound<&V>, hi: Bound<&V>, total: u64) -> f64;

    /// Total number of distinct entity IDs indexed.
    pub fn total_count(&self) -> u64;

    /// Number of distinct attribute values in the tree.
    pub fn distinct_values(&self) -> usize;

    /// Whether the index is empty.
    pub fn is_empty(&self) -> bool;
}
```

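Here is a minimal sketch of the write path and the two counters, assuming the `RwLock<BTreeMap<..>>` layout above. It uses a hypothetical `RangeIndexSketch` type with `BTreeSet<u32>` in place of `RoaringBitmap`. The sketch shows `delete()` dropping an emptied entry. It also shows the caller-side delete-then-insert sequence for when an entity's value changes.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::RwLock;

/// Hypothetical std-only stand-in for RangeIndex<V>: same RwLock<BTreeMap<..>>
/// layout, with BTreeSet<u32> in place of RoaringBitmap.
struct RangeIndexSketch<V: Ord + Clone> {
    tree: RwLock<BTreeMap<V, BTreeSet<u32>>>,
}

impl<V: Ord + Clone> RangeIndexSketch<V> {
    fn new() -> Self {
        Self { tree: RwLock::new(BTreeMap::new()) }
    }

    /// Add `entity_id` under `value`, creating the entry if it does not exist yet.
    fn insert(&self, entity_id: u32, value: V) {
        let mut tree = self.tree.write().expect("lock poisoned");
        tree.entry(value).or_default().insert(entity_id);
    }

    /// Remove `entity_id` from `value`'s set; drop the entry once it is empty.
    fn delete(&self, entity_id: u32, value: &V) -> bool {
        let mut tree = self.tree.write().expect("lock poisoned");
        let Some(ids) = tree.get_mut(value) else {
            return false;
        };
        let removed = ids.remove(&entity_id);
        if ids.is_empty() {
            tree.remove(value);
        }
        removed
    }

    /// Distinct entity IDs across all values: an entity stored under two values counts once.
    fn total_count(&self) -> u64 {
        let tree = self.tree.read().expect("lock poisoned");
        let all: BTreeSet<u32> = tree.values().flatten().copied().collect();
        all.len() as u64
    }

    /// Number of distinct attribute values (tree entries).
    fn distinct_values(&self) -> usize {
        self.tree.read().expect("lock poisoned").len()
    }
}

fn main() {
    let index: RangeIndexSketch<u32> = RangeIndexSketch::new();
    index.insert(1, 300);
    index.insert(2, 300);

    // Caller-side value change for entity 1: delete the old value, then insert the new one.
    index.delete(1, &300);
    index.insert(1, 600);
    println!("values={} entities={}", index.distinct_values(), index.total_count()); // values=2 entities=2

    // Removing the last entity at 300 drops that entry entirely.
    index.delete(2, &300);
    println!("values={} entities={}", index.distinct_values(), index.total_count()); // values=1 entities=1
}
```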
### Persistence API

```rust
/// Persistence methods for `RangeIndex<u64>` (timestamps) and `RangeIndex<u32>` (durations).
///
/// These are implemented on the concrete types, not the generic, because
/// serialization requires knowing the byte width of V.
impl RangeIndex<u64> {
    /// Serialize all bitmaps to storage engine key-value pairs.
    ///
    /// Key format: `encode_key(EntityId(0), Tag::Idx, suffix)`
    /// where suffix = `b"RNG:" + field_name + b":" + value_be_bytes`.
    pub fn serialize_to_kv_pairs(&self) -> Result<Vec<(Vec<u8>, Vec<u8>)>, IndexError>;

    /// Deserialize from storage engine key-value pairs.
    pub fn load_from_kv_pairs(
        field_name: impl Into<String>,
        pairs: impl Iterator<Item = (Vec<u8>, Vec<u8>)>,
    ) -> Result<Self, IndexError>;
}

impl RangeIndex<u32> {
    /// Serialize all bitmaps to storage engine key-value pairs.
    pub fn serialize_to_kv_pairs(&self) -> Result<Vec<(Vec<u8>, Vec<u8>)>, IndexError>;

    /// Deserialize from storage engine key-value pairs.
    pub fn load_from_kv_pairs(
        field_name: impl Into<String>,
        pairs: impl Iterator<Item = (Vec<u8>, Vec<u8>)>,
    ) -> Result<Self, IndexError>;
}
```

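The following is a minimal sketch of building and parsing the documented suffix `b"RNG:" + field_name + b":" + value_be_bytes` for a `u32` value. It covers only the suffix. The full key comes from `encode_key(EntityId(0), Tag::Idx, suffix)`, which is not reproduced here. The helper names `range_suffix_u32` and `parse_range_suffix_u32` are hypothetical.

```rust
/// Build the index-key suffix: b"RNG:" + field_name + b":" + value as 4 BE bytes.
fn range_suffix_u32(field_name: &str, value: u32) -> Vec<u8> {
    let mut suffix = Vec::with_capacity(4 + field_name.len() + 1 + 4);
    suffix.extend_from_slice(b"RNG:");
    suffix.extend_from_slice(field_name.as_bytes());
    suffix.push(b':');
    suffix.extend_from_slice(&value.to_be_bytes());
    suffix
}

/// Recover the value from a suffix written for `field_name`;
/// None if the prefix, field name, or value width does not match.
fn parse_range_suffix_u32(field_name: &str, suffix: &[u8]) -> Option<u32> {
    let rest = suffix.strip_prefix(b"RNG:")?;
    let rest = rest.strip_prefix(field_name.as_bytes())?;
    let value_bytes = rest.strip_prefix(b":")?;
    let arr: [u8; 4] = value_bytes.try_into().ok()?;
    Some(u32::from_be_bytes(arr))
}

fn main() {
    let suffix = range_suffix_u32("duration", 300);
    assert_eq!(suffix, b"RNG:duration:\x00\x00\x01\x2C".to_vec()); // 300 = 0x0000012C
    assert_eq!(parse_range_suffix_u32("duration", &suffix), Some(300));
    // A suffix written for a different field does not parse as "duration".
    assert_eq!(parse_range_suffix_u32("duration", &range_suffix_u32("created", 300)), None);
}
```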
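### Internal Design

**BTreeMap iteration for range queries:**

```rust
fn range(&self, lo: Bound<&V>, hi: Bound<&V>) -> RoaringBitmap {
    // BTreeMap::range panics if start > end, or if start == end with both
    // bounds Excluded. An inverted filter (e.g. min > max) returns empty.
    let inverted = match (lo, hi) {
        (Bound::Included(a), Bound::Included(b)) => a > b,
        (Bound::Included(a) | Bound::Excluded(a), Bound::Included(b) | Bound::Excluded(b)) => a >= b,
        _ => false,
    };
    if inverted {
        return RoaringBitmap::new();
    }
    let tree = self.tree.read().expect("lock poisoned");
    let mut result = RoaringBitmap::new();
    for (_key, bitmap) in tree.range((lo, hi)) {
        result |= bitmap;
    }
    result
}
```

This leverages `BTreeMap::range()`, which returns an iterator over entries whose keys fall within the specified bounds. The bounds use `std::ops::Bound` (`Included`, `Excluded`, `Unbounded`) for flexible range specification.
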
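The comparison helpers can all delegate to a single `range()` call with the appropriate bounds. This std-only sketch uses free functions over a bare `BTreeMap` with `BTreeSet<u32>` in place of `RoaringBitmap`. The names mirror the planned API but are illustrative. Because `BTreeMap::range()` panics on inverted bounds, the sketch also short-circuits them.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::ops::Bound;

type Tree<V> = BTreeMap<V, BTreeSet<u32>>;

/// Union of all ID sets with keys inside (lo, hi); inverted bounds yield an
/// empty set instead of triggering BTreeMap::range's panic.
fn range<V: Ord>(tree: &Tree<V>, lo: Bound<&V>, hi: Bound<&V>) -> BTreeSet<u32> {
    let inverted = match (lo, hi) {
        (Bound::Included(a), Bound::Included(b)) => a > b,
        (Bound::Included(a) | Bound::Excluded(a), Bound::Included(b) | Bound::Excluded(b)) => a >= b,
        _ => false,
    };
    if inverted {
        return BTreeSet::new();
    }
    tree.range((lo, hi)).flat_map(|(_, ids)| ids.iter().copied()).collect()
}

// Each comparison is one range() call with a fixed bound shape.
fn gt<V: Ord>(tree: &Tree<V>, t: &V) -> BTreeSet<u32> { range(tree, Bound::Excluded(t), Bound::Unbounded) }
fn gte<V: Ord>(tree: &Tree<V>, t: &V) -> BTreeSet<u32> { range(tree, Bound::Included(t), Bound::Unbounded) }
fn lt<V: Ord>(tree: &Tree<V>, t: &V) -> BTreeSet<u32> { range(tree, Bound::Unbounded, Bound::Excluded(t)) }
fn lte<V: Ord>(tree: &Tree<V>, t: &V) -> BTreeSet<u32> { range(tree, Bound::Unbounded, Bound::Included(t)) }

fn main() {
    let mut tree: Tree<u32> = BTreeMap::new();
    for (id, secs) in [(1, 60), (2, 300), (3, 600)] {
        tree.entry(secs).or_default().insert(id);
    }
    assert_eq!(gt(&tree, &300), BTreeSet::from([3]));
    assert_eq!(gte(&tree, &300), BTreeSet::from([2, 3]));
    assert_eq!(lt(&tree, &300), BTreeSet::from([1]));
    assert_eq!(lte(&tree, &300), BTreeSet::from([1, 2]));
    // min > max: empty result rather than a panic
    assert!(range(&tree, Bound::Included(&600), Bound::Included(&60)).is_empty());
}
```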
**Selectivity computation:**

```rust
fn selectivity(&self, lo: Bound<&V>, hi: Bound<&V>, total: u64) -> f64 {
    if total == 0 {
        return 0.0;
    }
    // Cardinality of the union, so an entity indexed under several values
    // in the range is counted once.
    let range_bitmap = self.range(lo, hi);
    // Clamp in case the caller passes a `total` smaller than the match count.
    (range_bitmap.len() as f64 / total as f64).min(1.0)
}
```

This is exact (not estimated) because we compute the actual union of bitmaps in the range. At M2 scale, this is fast enough. If M7 shows it is a bottleneck, approximate by sampling a fixed number of values in the range and extrapolating.

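Dividing the union's cardinality gives a different result from dividing the sum of per-key cardinalities once an entity is indexed under several in-range values. The index permits this because it does not track per-entity values. The following is a small std-only comparison with hypothetical data and `BTreeSet<u32>` standing in for `RoaringBitmap`.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::ops::Bound;

type Tree = BTreeMap<u32, BTreeSet<u32>>;

/// Naive variant: sums per-key cardinalities, so an entity stored under two
/// in-range values is counted twice.
fn summed_selectivity(tree: &Tree, lo: Bound<&u32>, hi: Bound<&u32>, total: u64) -> f64 {
    let summed: usize = tree.range((lo, hi)).map(|(_, ids)| ids.len()).sum();
    summed as f64 / total as f64
}

/// Union-based selectivity, clamped to [0.0, 1.0]; 0.0 when total is 0.
fn union_selectivity(tree: &Tree, lo: Bound<&u32>, hi: Bound<&u32>, total: u64) -> f64 {
    if total == 0 {
        return 0.0;
    }
    let union: BTreeSet<u32> = tree.range((lo, hi)).flat_map(|(_, ids)| ids.iter().copied()).collect();
    (union.len() as f64 / total as f64).min(1.0)
}

fn main() {
    let mut tree = Tree::new();
    // Entities 1 and 2 each appear under two values; 3 distinct entities in total.
    for (id, value) in [(1, 300), (1, 600), (2, 300), (2, 600), (3, 1800)] {
        tree.entry(value).or_default().insert(id);
    }
    let (lo, hi) = (Bound::Included(&300), Bound::Included(&600));
    println!("summed: {:.3}", summed_selectivity(&tree, lo, hi, 3)); // summed: 1.333
    println!("union:  {:.3}", union_selectivity(&tree, lo, hi, 3)); // union:  0.667
}
```

With 3 distinct entities, the per-key sum reports 4/3, which falls outside [0.0, 1.0]. The union reports 2/3.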
**Persistence key encoding for ranges:**

```
Key: encode_key(EntityId(0), Tag::Idx, b"RNG:created_at:\x17\xB4\x08\xF3\x5C\x5E\x00\x00")
                ^                        ^              ^
                INDEX_ROOT_ID            "RNG:" prefix  value (8 BE bytes)
```

The example value is `1_708_000_000_000_000_000` ns (the timestamp used in the unit tests), encoded as 8 big-endian bytes. Values are stored in big-endian byte order in the key suffix so that lexicographic key ordering matches numeric value ordering. As a result, a prefix scan over `RNG:{field_name}:` in the storage engine returns values in sorted order.

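The big-endian claim can be checked directly. If encoded byte arrays are sorted lexicographically, the way an ordered key-value store compares keys, numeric order is recovered for `to_be_bytes` but not for `to_le_bytes`. `order_by_bytes` is a hypothetical helper.

```rust
/// Sort values by their encoded bytes and return them in that byte order.
fn order_by_bytes(values: &[u64], big_endian: bool) -> Vec<u64> {
    let mut encoded: Vec<[u8; 8]> = values
        .iter()
        .map(|v| if big_endian { v.to_be_bytes() } else { v.to_le_bytes() })
        .collect();
    encoded.sort_unstable(); // [u8; 8] compares lexicographically, like storage-engine keys
    encoded
        .iter()
        .map(|b| if big_endian { u64::from_be_bytes(*b) } else { u64::from_le_bytes(*b) })
        .collect()
}

fn main() {
    let values: [u64; 4] = [256, 1, 1_708_000_000_000_000_000, 255];
    // Big-endian byte order matches numeric order.
    println!("{:?}", order_by_bytes(&values, true)); // [1, 255, 256, 1708000000000000000]
    // Little-endian does not: 256 encodes as [0, 1, 0, ...] and sorts before 1 = [1, 0, ...].
    println!("{:?}", order_by_bytes(&values, false)); // [1708000000000000000, 256, 1, 255]
}
```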
### Error Handling

- `insert()` and `delete()` are infallible (in-memory only).
- `range()` and selectivity methods are infallible.
- Persistence methods return `Result<_, IndexError>`.

## Test Strategy

### Property Tests

```rust
use proptest::prelude::*;
use roaring::RoaringBitmap;
use std::ops::Bound;

use super::RangeIndex; // assumes these tests live in a test module inside range.rs

// Range query returns exactly the entities with values in [lo, hi].
proptest! {
    #[test]
    fn range_query_correctness(
        entries in prop::collection::vec(
            (0u32..10_000, 0u32..1000), // (entity_id, value)
            1..200,
        ),
        lo in 0u32..500,
        hi in 500u32..1000,
    ) {
        let index: RangeIndex<u32> = RangeIndex::new("test_field");
        for &(id, value) in &entries {
            index.insert(id, value);
        }

        let result = index.range(
            Bound::Included(&lo),
            Bound::Included(&hi),
        );

        // Expected: every entity with at least one value in [lo, hi]. Entity IDs
        // can appear several times in `entries` with different values, some inside
        // the range and some outside; the bitmap union includes the entity if ANY
        // of its values fall in range.
        let expected: RoaringBitmap = entries
            .iter()
            .filter(|&&(_, value)| (lo..=hi).contains(&value))
            .map(|&(id, _)| id)
            .collect();

        // Equality checks both directions: nothing missing and nothing extra.
        prop_assert_eq!(result, expected);
    }
}

// Selectivity is in [0.0, 1.0].
proptest! {
    #[test]
    fn selectivity_in_unit_range(
        entries in prop::collection::vec(
            (0u32..10_000, 0u32..1000),
            1..200,
        ),
        lo in 0u32..500,
        hi in 500u32..1000,
    ) {
        let index: RangeIndex<u32> = RangeIndex::new("test_field");
        for &(id, value) in &entries {
            index.insert(id, value);
        }
        let total = index.total_count();
        let sel = index.selectivity(
            Bound::Included(&lo),
            Bound::Included(&hi),
            total,
        );
        prop_assert!(sel >= 0.0, "selectivity was {sel}");
        prop_assert!(sel <= 1.0, "selectivity was {sel}");
    }
}

// Insert-delete roundtrip: deleted entities do not appear in range queries.
proptest! {
    #[test]
    fn insert_delete_roundtrip(
        entries in prop::collection::vec(
            (0u32..1_000, 0u32..100),
            1..100,
        ),
    ) {
        let index: RangeIndex<u32> = RangeIndex::new("test_field");
        for &(id, value) in &entries {
            index.insert(id, value);
        }

        // Delete all entries
        for &(id, value) in &entries {
            index.delete(id, &value);
        }

        // Full range should be empty, and no empty value entries should remain
        let result = index.range(Bound::Unbounded, Bound::Unbounded);
        prop_assert!(result.is_empty(), "expected empty after deleting all");
        prop_assert_eq!(index.distinct_values(), 0);
    }
}

// Serialize-deserialize roundtrip (u32 durations).
proptest! {
    #[test]
    fn serialize_roundtrip_u32(
        entries in prop::collection::vec(
            (0u32..10_000, 0u32..1000),
            1..100,
        ),
    ) {
        let index: RangeIndex<u32> = RangeIndex::new("duration");
        for &(id, value) in &entries {
            index.insert(id, value);
        }

        let kv_pairs = index.serialize_to_kv_pairs().unwrap();
        let restored = RangeIndex::<u32>::load_from_kv_pairs(
            "duration",
            kv_pairs.into_iter(),
        ).unwrap();

        // Full range query and per-value structure should match
        let orig = index.range(Bound::Unbounded, Bound::Unbounded);
        let rest = restored.range(Bound::Unbounded, Bound::Unbounded);
        prop_assert_eq!(orig, rest);
        prop_assert_eq!(index.distinct_values(), restored.distinct_values());
    }
}

// Serialize-deserialize roundtrip (u64 timestamps).
proptest! {
    #[test]
    fn serialize_roundtrip_u64(
        entries in prop::collection::vec(
            (0u32..10_000, any::<u64>()),
            1..100,
        ),
    ) {
        let index: RangeIndex<u64> = RangeIndex::new("created_at");
        for &(id, value) in &entries {
            index.insert(id, value);
        }

        let kv_pairs = index.serialize_to_kv_pairs().unwrap();
        let restored = RangeIndex::<u64>::load_from_kv_pairs(
            "created_at",
            kv_pairs.into_iter(),
        ).unwrap();

        let orig = index.range(Bound::Unbounded, Bound::Unbounded);
        let rest = restored.range(Bound::Unbounded, Bound::Unbounded);
        prop_assert_eq!(orig, rest);
        prop_assert_eq!(index.distinct_values(), restored.distinct_values());
    }
}
```

### Unit Tests

```rust
use std::ops::Bound;

use super::RangeIndex; // assumes these tests live in a test module inside range.rs

#[test]
fn new_index_is_empty() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    assert!(index.is_empty());
    assert_eq!(index.total_count(), 0);
    assert_eq!(index.distinct_values(), 0);
}

#[test]
fn insert_and_range_query() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    index.insert(1, 60); // 1 minute
    index.insert(2, 300); // 5 minutes
    index.insert(3, 600); // 10 minutes
    index.insert(4, 1800); // 30 minutes

    // Range [300, 600] should return entities 2 and 3
    let result = index.range(
        Bound::Included(&300),
        Bound::Included(&600),
    );
    assert_eq!(result.len(), 2);
    assert!(result.contains(2));
    assert!(result.contains(3));
    assert!(!result.contains(1));
    assert!(!result.contains(4));
}

#[test]
fn gte_query() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    index.insert(1, 60);
    index.insert(2, 300);
    index.insert(3, 600);

    let result = index.gte(&300);
    assert_eq!(result.len(), 2);
    assert!(result.contains(2));
    assert!(result.contains(3));
}

#[test]
fn gt_query() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    index.insert(1, 60);
    index.insert(2, 300);
    index.insert(3, 600);

    let result = index.gt(&300);
    assert_eq!(result.len(), 1);
    assert!(result.contains(3));
}

#[test]
fn lt_and_lte_queries() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    index.insert(1, 60);
    index.insert(2, 300);
    index.insert(3, 600);

    let lt = index.lt(&300);
    assert_eq!(lt.len(), 1);
    assert!(lt.contains(1));

    let lte = index.lte(&300);
    assert_eq!(lte.len(), 2);
    assert!(lte.contains(1));
    assert!(lte.contains(2));
}

#[test]
fn unbounded_range_returns_all() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    index.insert(1, 60);
    index.insert(2, 300);
    index.insert(3, 600);

    let all = index.range(Bound::Unbounded, Bound::Unbounded);
    assert_eq!(all.len(), 3);
}

#[test]
fn empty_range_returns_empty_bitmap() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    index.insert(1, 60);
    index.insert(2, 300);

    // Range [400, 500] has no entities
    let result = index.range(
        Bound::Included(&400),
        Bound::Included(&500),
    );
    assert!(result.is_empty());
}

#[test]
fn selectivity_full_range() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    index.insert(1, 60);
    index.insert(2, 300);
    index.insert(3, 600);

    let sel = index.selectivity(
        Bound::Unbounded,
        Bound::Unbounded,
        3,
    );
    assert!((sel - 1.0).abs() < f64::EPSILON);
}

#[test]
fn selectivity_partial_range() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    index.insert(1, 60);
    index.insert(2, 300);
    index.insert(3, 600);
    index.insert(4, 1800);

    let sel = index.selectivity(
        Bound::Included(&300),
        Bound::Included(&600),
        4,
    );
    assert!((sel - 0.5).abs() < f64::EPSILON); // 2 of 4
}

#[test]
fn selectivity_zero_total() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    let sel = index.selectivity(
        Bound::Unbounded,
        Bound::Unbounded,
        0,
    );
    assert!((sel - 0.0).abs() < f64::EPSILON);
}

#[test]
fn delete_cleans_up_empty_entries() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    index.insert(1, 300);
    index.delete(1, &300);

    assert_eq!(index.total_count(), 0);
    assert_eq!(index.distinct_values(), 0);
}

#[test]
fn multiple_entities_same_value() {
    let index: RangeIndex<u32> = RangeIndex::new("duration");
    index.insert(1, 300);
    index.insert(2, 300);
    index.insert(3, 300);

    let result = index.gte(&300);
    assert_eq!(result.len(), 3);
    assert_eq!(index.distinct_values(), 1); // one value: 300
}

#[test]
fn timestamp_index_u64() {
    let index: RangeIndex<u64> = RangeIndex::new("created_at");
    let now_ns: u64 = 1_708_000_000_000_000_000; // some timestamp
    let one_day_ago = now_ns - 86_400_000_000_000; // 24h in nanos
    let seven_days_ago = now_ns - 7 * 86_400_000_000_000;

    index.insert(1, now_ns);
    index.insert(2, one_day_ago);
    index.insert(3, seven_days_ago);
    index.insert(4, seven_days_ago - 1); // older than 7 days

    // "created_within:7d" = created_at >= seven_days_ago
    let recent = index.gte(&seven_days_ago);
    assert_eq!(recent.len(), 3); // entities 1, 2, 3
    assert!(!recent.contains(4));
}
```

## Acceptance Criteria

- [ ] `RangeIndex<V>` backed by `BTreeMap<V, RoaringBitmap>` with `RwLock` for concurrent access
- [ ] `insert(entity_id, value)` adds the entity to the bitmap for that exact value
- [ ] `delete(entity_id, value)` removes the entity from the bitmap; cleans up empty entries
- [ ] `range(lo, hi)` returns the union bitmap of all entries whose keys fall within the bounds, using `BTreeMap::range()`
- [ ] `gt()`, `gte()`, `lt()`, `lte()` convenience methods implemented via `range()` with appropriate bounds
- [ ] `selectivity(lo, hi, total)` returns the fraction of entities in range; always in [0.0, 1.0]
- [ ] `total_count()` returns distinct entity IDs across all values (no double-counting)
- [ ] Concrete instantiations work: `RangeIndex<u64>` for timestamps, `RangeIndex<u32>` for durations
- [ ] Persistence: `serialize_to_kv_pairs()` / `load_from_kv_pairs()` roundtrip for `RangeIndex<u64>` and `RangeIndex<u32>` (property tested)
- [ ] Key encoding uses `encode_key(EntityId(0), Tag::Idx, b"RNG:{field_name}:{value_be_bytes}")` with BE ordering
- [ ] Range query returns exactly the entities whose values fall within the bounds (property tested)
- [ ] `RangeIndex<V>` is `Send + Sync`
- [ ] No `unsafe` code
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All property tests and unit tests pass

## Research References

- [docs/research/ann_for_tidaldb.md](../../../research/ann_for_tidaldb.md) -- Selectivity estimation from sorted index statistics (used for range predicates alongside bitmap cardinality for keyword predicates)

## Spec References

- [docs/specs/07-vector-retrieval.md](../../../specs/07-vector-retrieval.md) -- Section 3 (selectivity estimation: "numeric range: estimate from sorted index statistics")
- [docs/specs/08-query-engine.md](../../../specs/08-query-engine.md) -- Section 7.3 (`Filter::Range`, `Filter::Min`, `Filter::Max`, `Filter::CreatedWithin`, `Filter::CreatedAfter`, `Filter::CreatedBefore`)

## Implementation Notes

- `BTreeMap::range()` accepts any `R: RangeBounds<T>` (with `K: Borrow<T>`), which includes the tuple `(Bound<&K>, Bound<&K>)`. The `Bound` type from `std::ops` provides `Included`, `Excluded`, and `Unbounded`, which map directly to filter predicates: `Min` = `(Included(threshold), Unbounded)`, `Max` = `(Unbounded, Included(threshold))`, `Range` = `(Included(lo), Included(hi))`. `BTreeMap::range()` panics if the start bound is greater than the end bound, or if both bounds are `Excluded` at the same key. An inverted `Range` (lo > hi) must therefore yield an empty bitmap instead of being passed through.
- Empty bitmaps should be removed from the `BTreeMap` after `delete()` so dead entries do not accumulate. Check `bitmap.is_empty()` after removal.
- `RangeIndex<V>` requires `V: Ord + Clone`; `u32` and `u64` both satisfy these bounds. If future work adds `f64` range indexes (e.g., for latitude/longitude), note that `f64` does NOT implement `Ord` (because of NaN). Use `ordered_float::OrderedFloat<f64>` or a newtype wrapper. This is deferred -- M2 only needs integer types.
- The `RangeIndex` does not track which entity has which value. If an entity's value changes (e.g., duration updated), the caller must `delete(id, &old_value)` then `insert(id, new_value)`. The caller (entity write path) is responsible for this; the index is a pure mapping structure.
- For persistence, values are encoded as big-endian bytes in the key suffix so that lexicographic key ordering in the storage engine matches numeric value ordering. This leaves room for a future optimization: scanning index keys from the storage engine in sorted order without loading them all into memory.
- Do NOT implement approximate selectivity estimation (histogram-based, reservoir sampling) in this task. The exact computation via bitmap union is fast enough at M2 scale. If M7 benchmarks show the full union is too slow for selectivity estimation, add histograms as an optimization.

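The following sketches the predicate-to-bounds mapping from the first note. It uses a hypothetical `Predicate` enum over `u64` values, which is not Spec 08's `Filter` type. `CreatedWithin` follows the `created_at >= now - window` convention of the `timestamp_index_u64` unit test. `CreatedAfter` and `CreatedBefore` are left out because their inclusivity is not specified here. `BTreeSet<u32>` again stands in for `RoaringBitmap`.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::ops::Bound;

/// Hypothetical predicate shapes over a u64 field (not Spec 08's Filter enum).
enum Predicate {
    /// value >= threshold
    Min(u64),
    /// value <= threshold
    Max(u64),
    /// lo <= value <= hi
    Range(u64, u64),
    /// created_at >= now - window (window in nanoseconds)
    CreatedWithin(u64),
}

/// Translate a predicate into owned (lo, hi) bounds on the indexed value.
fn to_bounds(p: &Predicate, now_ns: u64) -> (Bound<u64>, Bound<u64>) {
    match *p {
        Predicate::Min(t) => (Bound::Included(t), Bound::Unbounded),
        Predicate::Max(t) => (Bound::Unbounded, Bound::Included(t)),
        Predicate::Range(lo, hi) => (Bound::Included(lo), Bound::Included(hi)),
        Predicate::CreatedWithin(window) => {
            (Bound::Included(now_ns.saturating_sub(window)), Bound::Unbounded)
        }
    }
}

/// Evaluate a predicate against a value -> entity-ID tree.
fn eval(tree: &BTreeMap<u64, BTreeSet<u32>>, p: &Predicate, now_ns: u64) -> BTreeSet<u32> {
    let (lo, hi) = to_bounds(p, now_ns);
    // An inverted Range (lo > hi) would make BTreeMap::range panic; treat it as empty.
    if matches!((&lo, &hi), (Bound::Included(a), Bound::Included(b)) if a > b) {
        return BTreeSet::new();
    }
    tree.range((lo.as_ref(), hi.as_ref()))
        .flat_map(|(_, ids)| ids.iter().copied())
        .collect()
}

fn main() {
    const DAY_NS: u64 = 86_400_000_000_000;
    let now_ns: u64 = 1_708_000_000_000_000_000;

    // created_at (ns) -> entity IDs, for entities created 0, 1, 7 and 8 days ago
    let mut created_at: BTreeMap<u64, BTreeSet<u32>> = BTreeMap::new();
    for (id, age_days) in [(1u32, 0u64), (2, 1), (3, 7), (4, 8)] {
        created_at.entry(now_ns - age_days * DAY_NS).or_default().insert(id);
    }

    let recent = eval(&created_at, &Predicate::CreatedWithin(7 * DAY_NS), now_ns);
    let old = eval(&created_at, &Predicate::Max(now_ns - 7 * DAY_NS - 1), now_ns);
    let last_day = eval(&created_at, &Predicate::Min(now_ns - DAY_NS), now_ns);
    let inverted = eval(&created_at, &Predicate::Range(now_ns, 0), now_ns);
    println!("{recent:?} {old:?} {last_day:?} {}", inverted.is_empty());
    // {1, 2, 3} {4} {1, 2} true
}
```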