tidaldb/docs/planning/milestone-1/phase-1/OVERVIEW.md
jordan 29400d48db feat: implement Milestone 1 phases 1-3 — schema, WAL, and storage layer
Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers,
  half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet
  acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture
overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 16:43:24 -07:00


Milestone 1, Phase 1: Core Type System and Schema

Phase Deliverable

The foundational type system (entity IDs, signal type definitions, decay rate declarations, window specifications, and the error types that every subsequent module depends on), plus the schema module that validates and stores signal and entity definitions.

Acceptance Criteria

  • EntityId is a u64 newtype with Display, Hash, Eq, Ord
  • SignalTypeDef declaration captures: name, decay model (exponential/linear/permanent), half-life duration, enabled windows (1h/24h/7d/30d/all_time), velocity enabled flag
  • DecayModel::Exponential stores pre-computed lambda derived from half-life: lambda = ln(2) / half_life_seconds
  • LumenError enum covers Storage, NotFound, Schema, Durability, Query, Internal variants per CODING_GUIDELINES.md
  • Schema validation rejects: duplicate signal names, zero/negative half-life, empty window list on non-permanent signals, velocity without windows
  • All hot-path numeric types use the precision specified in research (f64 for decay scores, u64 for timestamps in nanoseconds)
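
The lambda and validation criteria above can be sketched as follows. This is a minimal illustration, not the shipped implementation: `SignalSpec`, `validate`, `decay_factor`, the `SchemaError` variant names, the `Window` variant names, and the choice to pass half-life as `f64` seconds (so that zero and negative values are representable) are all assumptions made for the example.

```rust
use std::collections::HashSet;

/// The five fixed windows (1h/24h/7d/30d/all_time); variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Window { OneHour, OneDay, SevenDays, ThirtyDays, AllTime }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DecayModel {
    /// Stores the pre-computed rate, not the half-life.
    Exponential { lambda: f64 },
    /// Field shape is an assumption; the source only names the variant.
    Linear { half_life_secs: f64 },
    Permanent,
}

impl DecayModel {
    /// lambda = ln(2) / half_life_seconds, computed once at declaration time.
    pub fn exponential(half_life_secs: f64) -> Self {
        DecayModel::Exponential { lambda: std::f64::consts::LN_2 / half_life_secs }
    }
}

/// Weight of a contribution that is `age_secs` old under exponential decay.
pub fn decay_factor(lambda: f64, age_secs: f64) -> f64 {
    (-lambda * age_secs).exp()
}

/// Hypothetical pre-validation declaration; `None` half-life means permanent.
pub struct SignalSpec {
    pub name: String,
    pub half_life_secs: Option<f64>,
    pub windows: Vec<Window>,
    pub velocity: bool,
}

#[derive(Debug, PartialEq)]
pub enum SchemaError {
    DuplicateSignal(String),
    InvalidHalfLife(String),
    EmptyWindows(String),
    VelocityWithoutWindows(String),
}

/// Applies the four rejection rules from the acceptance criteria, in order.
pub fn validate(specs: &[SignalSpec]) -> Result<(), SchemaError> {
    let mut seen: HashSet<&str> = HashSet::new();
    for s in specs {
        // Rule 1: signal names must be unique.
        if !seen.insert(s.name.as_str()) {
            return Err(SchemaError::DuplicateSignal(s.name.clone()));
        }
        // Rule 2: half-life must be strictly positive (also rejects NaN and infinity).
        if let Some(h) = s.half_life_secs {
            if !(h > 0.0 && h.is_finite()) {
                return Err(SchemaError::InvalidHalfLife(s.name.clone()));
            }
        }
        if s.windows.is_empty() {
            // Rule 3: only permanent signals may omit windows.
            if s.half_life_secs.is_some() {
                return Err(SchemaError::EmptyWindows(s.name.clone()));
            }
            // Rule 4: velocity is computed over windows, so it needs at least one.
            if s.velocity {
                return Err(SchemaError::VelocityWithoutWindows(s.name.clone()));
            }
        }
    }
    Ok(())
}
```

With a one-hour half-life, `decay_factor` evaluates to 0.5 at an age of 3600 seconds, which is a quick sanity check on the pre-computed lambda.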

Dependencies

  • Requires: Nothing -- this is the root of the dependency DAG
  • Blocks: m1p2 (WAL), m1p3 (Storage/fjall), and transitively all subsequent phases

Research References

Spec References

Task Index

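| #  | Task                              | Delivers                                              | Depends On       | Complexity |
|----|-----------------------------------|-------------------------------------------------------|------------------|------------|
| 01 | Core Identity and Temporal Types  | EntityId, EntityKind, Timestamp, Score                | None             | S          |
| 02 | Signal Type Definitions           | SignalTypeDef, DecayModel, DecaySpec, Window, WindowSet | Task 01        | S          |
| 03 | Error Types and Schema Validation | LumenError, SchemaError, Schema, SchemaBuilder        | Task 01, Task 02 | S          |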

Task Dependency DAG

Task 01: Core Identity Types
    |
    v
Task 02: Signal Type Definitions  (uses EntityKind from Task 01)
    |
    v
Task 03: Error Types + Schema Validation  (uses EntityId, SignalTypeDef, DecayModel, Window)

Tasks 01 and 02 are technically parallelizable if EntityKind is extracted first, but at complexity S each, sequential execution is fine.

File Layout

tidal/src/
  lib.rs              -- pub mod declarations, Result<T> alias, re-exports
  schema/
    mod.rs            -- pub use re-exports from submodules
    entity.rs         -- Task 01: EntityId, EntityKind
    timestamp.rs      -- Task 01: Timestamp newtype
    score.rs          -- Task 01: Score newtype (finite f64 with Ord)
    signal.rs         -- Task 02: SignalTypeDef, DecayModel, Window, WindowSet
    error.rs          -- Task 03: LumenError, SchemaError, sub-error stubs
    validation.rs     -- Task 03: Schema, SchemaBuilder, DecaySpec, SignalBuilder
  signals/mod.rs      -- empty (m1p4)
  storage/mod.rs      -- empty (m1p3)
  query/mod.rs        -- empty (Milestone 2)
  ranking/mod.rs      -- empty (Milestone 2)
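
The layout describes score.rs as a "finite f64 with Ord". One way that could look is sketched below, assuming a checked constructor (`Score::new`) and an accessor (`get`), neither of which is named in this document. Rejecting NaN and infinities at construction is what makes a total order sound; normalizing `-0.0` to `0.0` keeps the derived equality consistent with `f64::total_cmp`.

```rust
use std::cmp::Ordering;

/// A score that is guaranteed finite, so it can implement Eq and Ord.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Score(f64);

impl Score {
    /// Returns None for NaN and +/- infinity (constructor name is an assumption).
    pub fn new(value: f64) -> Option<Self> {
        if !value.is_finite() {
            return None;
        }
        // Collapse -0.0 into 0.0 so that `==` and `cmp` always agree.
        Some(Score(if value == 0.0 { 0.0 } else { value }))
    }

    pub fn get(self) -> f64 {
        self.0
    }
}

// Sound because NaN can never be stored.
impl Eq for Score {}

impl PartialOrd for Score {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Score {
    fn cmp(&self, other: &Self) -> Ordering {
        // total_cmp gives a total order over f64; with NaN and -0.0 excluded
        // it matches ordinary numeric comparison.
        self.0.total_cmp(&other.0)
    }
}
```

A type like this can be used directly as a `BTreeMap` key or passed to `sort()` without a custom comparator.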

Open Questions

  1. String vs u64 entity IDs in public API -- API.md uses string IDs ("item_abc"), while internal types use u64. Resolution: EntityId is u64 internally. String-to-u64 mapping is an m1p5 concern, to be addressed when the public Lumen API is built. m1p1 defines only the internal type.
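
For orientation only, here is one shape the deferred string-to-u64 mapping could take: an in-memory interner that hands out sequential IDs. Nothing in this phase commits to this design; `IdInterner`, `intern`, and `resolve` are hypothetical names, and a real m1p5 implementation might instead hash the string or persist the mapping.

```rust
use std::collections::HashMap;

/// Internal entity identifier, as defined in m1p1: a u64 newtype.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct EntityId(pub u64);

/// Hypothetical mapping between public string IDs and internal EntityIds.
#[derive(Default)]
pub struct IdInterner {
    by_name: HashMap<String, EntityId>,
    names: Vec<String>,
}

impl IdInterner {
    /// Returns the existing ID for `name`, or assigns the next sequential one.
    pub fn intern(&mut self, name: &str) -> EntityId {
        if let Some(&id) = self.by_name.get(name) {
            return id;
        }
        let id = EntityId(self.names.len() as u64);
        self.names.push(name.to_owned());
        self.by_name.insert(name.to_owned(), id);
        id
    }

    /// Reverse lookup from internal ID back to the public string.
    pub fn resolve(&self, id: EntityId) -> Option<&str> {
        usize::try_from(id.0)
            .ok()
            .and_then(|i| self.names.get(i))
            .map(String::as_str)
    }
}
```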

  2. EntityId uniqueness scope -- globally unique or per-EntityKind? Resolution: entity IDs are scoped per-EntityKind by storage namespace, and different column families isolate the namespaces. Signal names, by contrast, are globally unique (no item.view vs user.view).

  3. Custom windows -- Window::Custom(Duration) is deferred. The five fixed variants cover every sort mode and ranking profile in the spec, and adding custom windows would require dynamic bucket allocation. Revisit if M5 benchmarks demand it.
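
A closed set of five variants keeps the window type small and cheap to copy. The sketch below shows that idea, assuming variant names (`OneHour`, `OneDay`, and so on), a `duration` helper, and a `BTreeSet`-backed `WindowSet` for the deduplicated, sorted behaviour; none of these details are specified in this document.

```rust
use std::collections::BTreeSet;
use std::time::Duration;

/// The five fixed windows: 1h, 24h, 7d, 30d, all_time.
/// Declaration order doubles as sort order via the derived Ord.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub enum Window { OneHour, OneDay, SevenDays, ThirtyDays, AllTime }

impl Window {
    /// Length of the window; None for AllTime, which is unbounded.
    pub fn duration(self) -> Option<Duration> {
        const HOUR: u64 = 3600;
        match self {
            Window::OneHour => Some(Duration::from_secs(HOUR)),
            Window::OneDay => Some(Duration::from_secs(24 * HOUR)),
            Window::SevenDays => Some(Duration::from_secs(7 * 24 * HOUR)),
            Window::ThirtyDays => Some(Duration::from_secs(30 * 24 * HOUR)),
            Window::AllTime => None,
        }
    }
}

/// Deduplicated, sorted collection of enabled windows.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct WindowSet(BTreeSet<Window>);

impl FromIterator<Window> for WindowSet {
    fn from_iter<I: IntoIterator<Item = Window>>(iter: I) -> Self {
        // BTreeSet drops duplicates and keeps elements in Ord order.
        WindowSet(iter.into_iter().collect())
    }
}

impl WindowSet {
    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }

    /// Iterates windows from shortest to longest.
    pub fn iter(&self) -> impl Iterator<Item = Window> + '_ {
        self.0.iter().copied()
    }
}
```

A `Custom(Duration)` variant would break the fixed upper bound on set size that this representation relies on, which is consistent with the decision to defer it.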