Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers, half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Milestone 1, Phase 1: Core Type System and Schema
Phase Deliverable
The foundational type system (entity IDs, signal type definitions, decay rate declarations, window specifications, and the error types that every subsequent module depends on), plus the schema module that validates and stores signal and entity definitions.
Acceptance Criteria
- `EntityId` is a `u64` newtype with `Display`, `Hash`, `Eq`, `Ord`
- `SignalTypeDef` declaration captures: name, decay model (exponential/linear/permanent), half-life duration, enabled windows (1h/24h/7d/30d/all_time), velocity enabled flag
- `DecayModel::Exponential` stores pre-computed lambda derived from half-life: `lambda = ln(2) / half_life_seconds`
- `LumenError` enum covers Storage, NotFound, Schema, Durability, Query, Internal variants per CODING_GUIDELINES.md
- Schema validation rejects: duplicate signal names, zero/negative half-life, empty window list on non-permanent signals, velocity without windows
- All hot-path numeric types use the precision specified in research (f64 for decay scores, u64 for timestamps in nanoseconds)
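The following is a minimal sketch of how the decay and validation criteria above might fit together. It is not the planned implementation:

- `DecayModel::exponential`, `validate`, and the `SchemaError` variant names are hypothetical.
- The planned split across `signal.rs`, `error.rs`, and `validation.rs` is collapsed into one block.
- `Linear` decay is left out because its parameters are not specified here.
- Validation stops at the first violation.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Hypothetical window enum covering the five fixed windows.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Window { Hour1, Hour24, Day7, Day30, AllTime }

/// Sketch only: `Linear` is omitted because its parameters are unspecified.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DecayModel {
    /// `lambda` is precomputed once, so the hot path only evaluates exp(-lambda * dt).
    Exponential { lambda: f64 },
    Permanent,
}

/// Hypothetical error variants for the validation rules.
#[derive(Debug, Clone, PartialEq)]
pub enum SchemaError {
    DuplicateSignal(String),
    NonPositiveHalfLife,
    EmptyWindows(String),
    VelocityWithoutWindows(String),
}

impl DecayModel {
    /// lambda = ln(2) / half_life_seconds. std's `Duration` cannot be negative,
    /// so zero is the only invalid half-life left to reject here.
    pub fn exponential(half_life: Duration) -> Result<Self, SchemaError> {
        let secs = half_life.as_secs_f64();
        if secs <= 0.0 {
            return Err(SchemaError::NonPositiveHalfLife);
        }
        Ok(DecayModel::Exponential { lambda: std::f64::consts::LN_2 / secs })
    }
}

pub struct SignalTypeDef {
    pub name: String,
    pub decay: DecayModel,
    pub windows: Vec<Window>,
    pub velocity: bool,
}

/// Applies the acceptance-criteria rules, returning the first violation found.
pub fn validate(defs: &[SignalTypeDef]) -> Result<(), SchemaError> {
    let mut seen = HashSet::new();
    for def in defs {
        // HashSet::insert returns false if the name was already present.
        if !seen.insert(def.name.as_str()) {
            return Err(SchemaError::DuplicateSignal(def.name.clone()));
        }
        if def.windows.is_empty() {
            if def.velocity {
                return Err(SchemaError::VelocityWithoutWindows(def.name.clone()));
            }
            if !matches!(def.decay, DecayModel::Permanent) {
                return Err(SchemaError::EmptyWindows(def.name.clone()));
            }
        }
    }
    Ok(())
}
```

Precomputing `lambda` means that after one half-life, `exp(-lambda * half_life_seconds)` is exactly `exp(-ln 2) = 0.5`, with no division on the scoring path.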
Dependencies
- Requires: Nothing -- this is the root of the dependency DAG
- Blocks: m1p2 (WAL), m1p3 (Storage/fjall), and transitively all subsequent phases
Research References
- docs/research/tidaldb_signal_ledger.md -- decay formula, EntityState struct, running-score approach
- docs/research/phase1_1_type_system.md -- newtype patterns, Duration handling, error hierarchy, schema validation, f64 precision analysis, Window enum design
- CODING_GUIDELINES.md -- error handling (section 7), module boundaries (section 9), dependencies (section 10)
- thoughts.md -- Part V.12 (subject-prefix keys), Part II.1 (WAL convergence)
Spec References
- docs/specs/03-signal-system.md -- signal type declaration, decay types and lambda precomputation, window definitions, signal ledger architecture
- docs/specs/11-schema.md -- schema definition API, type system, validation rules, schema versioning
- docs/specs/02-entity-model.md -- EntityKind (Item/User/Creator), entity ID encoding, storage representation
- docs/specs/01-storage-engine.md -- key encoding scheme using big-endian EntityId and Timestamp
- docs/specs/00-architecture-overview.md -- system architecture, code module map showing schema/ layout
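The storage spec keys rows by big-endian `EntityId`. The sketch below shows why this matters. It assumes hypothetical `to_key_bytes`/`from_key_bytes` helpers on the newtype. Big-endian bytes compare lexicographically in the same order as the underlying `u64`, so a sorted key-value store iterates entities in numeric order.

```rust
use std::fmt;

/// Sketch of the EntityId newtype; derived `Ord` compares the inner u64 numerically.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct EntityId(pub u64);

impl fmt::Display for EntityId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl EntityId {
    /// Hypothetical helper: big-endian bytes, so byte-wise order matches numeric order.
    pub fn to_key_bytes(self) -> [u8; 8] {
        self.0.to_be_bytes()
    }

    /// Hypothetical inverse of `to_key_bytes`.
    pub fn from_key_bytes(bytes: [u8; 8]) -> Self {
        EntityId(u64::from_be_bytes(bytes))
    }
}
```

Little-endian encoding would break this property. The little-endian bytes of 255 begin with `0xFF`, while those of 256 begin with `0x00`, so 256 would sort before 255.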
Task Index
| # | Task | Delivers | Depends On | Complexity |
|---|---|---|---|---|
| 01 | Core Identity and Temporal Types | EntityId, EntityKind, Timestamp, Score | None | S |
| 02 | Signal Type Definitions | SignalTypeDef, DecayModel, DecaySpec, Window, WindowSet | Task 01 | S |
| 03 | Error Types and Schema Validation | LumenError, SchemaError, Schema, SchemaBuilder | Task 01, Task 02 | S |
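Task 03's error hierarchy could look roughly like the sketch below. The six `LumenError` variants come from the acceptance criteria. The `String` payloads, the single `SchemaError` variant, and the `check_unique`/`register` helpers are placeholders. The `From<SchemaError>` impl is what lets schema code use `?` inside functions that return the crate-wide `Result<T>` alias.

```rust
use std::fmt;

/// Placeholder for the schema sub-error; the real variants are planned for error.rs.
#[derive(Debug, Clone, PartialEq)]
pub enum SchemaError {
    DuplicateSignal(String),
}

/// Top-level error with the six variants named in the acceptance criteria.
/// The String payloads stand in for the planned sub-error stubs.
#[derive(Debug)]
pub enum LumenError {
    Storage(String),
    NotFound(String),
    Schema(SchemaError),
    Durability(String),
    Query(String),
    Internal(String),
}

impl fmt::Display for LumenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LumenError::Storage(m) => write!(f, "storage error: {}", m),
            LumenError::NotFound(m) => write!(f, "not found: {}", m),
            LumenError::Schema(e) => write!(f, "schema error: {:?}", e),
            LumenError::Durability(m) => write!(f, "durability error: {}", m),
            LumenError::Query(m) => write!(f, "query error: {}", m),
            LumenError::Internal(m) => write!(f, "internal error: {}", m),
        }
    }
}

impl std::error::Error for LumenError {}

/// Lets `?` lift a SchemaError into a LumenError.
impl From<SchemaError> for LumenError {
    fn from(e: SchemaError) -> Self {
        LumenError::Schema(e)
    }
}

/// Crate-wide alias of the kind lib.rs is planned to define.
pub type Result<T> = std::result::Result<T, LumenError>;

/// Hypothetical schema-level check returning the narrow error type.
fn check_unique(name: &str, existing: &[&str]) -> std::result::Result<(), SchemaError> {
    if existing.contains(&name) {
        return Err(SchemaError::DuplicateSignal(name.to_string()));
    }
    Ok(())
}

/// Hypothetical entry point: `?` converts SchemaError through the From impl.
pub fn register(name: &str, existing: &[&str]) -> Result<()> {
    check_unique(name, existing)?;
    Ok(())
}
```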
Task Dependency DAG
Task 01: Core Identity Types
|
v
Task 02: Signal Type Definitions (uses EntityKind from Task 01)
|
v
Task 03: Error Types + Schema Validation (uses EntityId, SignalTypeDef, DecayModel, Window)
Tasks 01 and 02 are technically parallelizable if EntityKind is extracted first, but at complexity S each, sequential execution is fine.
File Layout
tidal/src/
lib.rs -- pub mod declarations, Result<T> alias, re-exports
schema/
mod.rs -- pub use re-exports from submodules
entity.rs -- Task 01: EntityId, EntityKind
timestamp.rs -- Task 01: Timestamp newtype
score.rs -- Task 01: Score newtype (finite f64 with Ord)
signal.rs -- Task 02: SignalTypeDef, DecayModel, Window, WindowSet
error.rs -- Task 03: LumenError, SchemaError, sub-error stubs
validation.rs -- Task 03: Schema, SchemaBuilder, DecaySpec, SignalBuilder
signals/mod.rs -- empty (m1p4)
storage/mod.rs -- empty (m1p3)
query/mod.rs -- empty (Milestone 2)
ranking/mod.rs -- empty (Milestone 2)
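`score.rs` is described as a finite `f64` with `Ord`. The sketch below shows one way to get there. It assumes construction goes through a hypothetical `Score::new` that returns `None` for NaN and infinities. With non-finite values excluded and `-0.0` normalized to `0.0`, `f64::total_cmp` yields a total order that agrees with `==`.

```rust
use std::cmp::Ordering;

/// Hypothetical Score newtype: holds only finite f64 values, with -0.0 folded into 0.0.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Score(f64);

impl Score {
    /// Returns None for NaN and +/- infinity.
    pub fn new(value: f64) -> Option<Self> {
        if !value.is_finite() {
            return None;
        }
        // Normalize -0.0 so that `==` and `cmp` agree on zero.
        let value = if value == 0.0 { 0.0 } else { value };
        Some(Score(value))
    }

    pub fn get(self) -> f64 {
        self.0
    }
}

// Sound because NaN can never be stored, so `==` is reflexive.
impl Eq for Score {}

impl PartialOrd for Score {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Score {
    fn cmp(&self, other: &Self) -> Ordering {
        // For finite values without -0.0, total_cmp matches numeric order.
        self.0.total_cmp(&other.0)
    }
}
```

Because `Score` implements `Ord`, a `Vec<Score>` can be passed to `sort()` directly, with no `partial_cmp(...).unwrap()` on the ranking path.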
Open Questions
- String vs u64 entity IDs in public API -- API.md uses string IDs (`"item_abc"`), while internal types use `u64`. Resolution: `EntityId` is a `u64` internally. String-to-`u64` mapping is an m1p5 concern, addressed when the public `Lumen` API is built; m1p1 defines only the internal type.
- EntityId uniqueness scope -- globally unique or per-`EntityKind`? Resolution: signal names are globally unique (no `item.view` vs `user.view`), while entity IDs are scoped per-`EntityKind` by storage namespace, with different column families isolating the namespaces.
- Custom windows -- `Window::Custom(Duration)` is deferred. The five fixed variants cover every sort mode and ranking profile in the spec, and adding custom windows would require dynamic bucket allocation. Revisit if M5 benchmarks demand it.
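The sketch below illustrates the five fixed windows. It makes two assumptions:

- The variant names are `Hour1`/`Hour24`/`Day7`/`Day30`/`AllTime`.
- A hypothetical `WindowSet` stores windows sorted and deduplicated, as the Phase 1 commit notes describe.

Deriving `Ord` on the enum orders windows by declaration, shortest first.

```rust
use std::time::Duration;

/// Hypothetical fixed windows; derived Ord follows declaration order (shortest first).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Window {
    Hour1,
    Hour24,
    Day7,
    Day30,
    AllTime,
}

impl Window {
    /// Span covered by the window; `AllTime` is unbounded.
    pub fn duration(self) -> Option<Duration> {
        const HOUR: u64 = 3600;
        match self {
            Window::Hour1 => Some(Duration::from_secs(HOUR)),
            Window::Hour24 => Some(Duration::from_secs(24 * HOUR)),
            Window::Day7 => Some(Duration::from_secs(7 * 24 * HOUR)),
            Window::Day30 => Some(Duration::from_secs(30 * 24 * HOUR)),
            Window::AllTime => None,
        }
    }
}

/// Hypothetical WindowSet: windows kept sorted and deduplicated.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WindowSet(Vec<Window>);

impl WindowSet {
    pub fn new(windows: impl IntoIterator<Item = Window>) -> Self {
        let mut v: Vec<Window> = windows.into_iter().collect();
        v.sort();
        v.dedup(); // removes adjacent duplicates, which sorting has grouped together
        WindowSet(v)
    }

    pub fn as_slice(&self) -> &[Window] {
        &self.0
    }

    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }
}
```

A closed enum keeps each signal's bucket layout fixed at schema time. A `Custom(Duration)` variant would give up that property.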