jordan 29400d48db feat: implement Milestone 1 phases 1-3 — schema, WAL, and storage layer

Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers,
  half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls
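
As a rough illustration of the "pre-computed decay λ, deduped/sorted windows" item, the sketch below derives λ from a half-life via λ = ln 2 / half-life and normalizes the window list once at definition time. The struct `SignalTypeSketch`, its fields, and the functions `define_signal` and `decay_factor` are hypothetical names, not the types added by this commit, and the validation shown is only a small assumed subset of what SchemaBuilder checks.

```rust
use std::time::Duration;

/// Hypothetical stand-in for `SignalTypeDefinition`; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct SignalTypeSketch {
    pub name: String,
    /// Decay constant in 1/seconds, derived once from the half-life.
    pub lambda: f64,
    /// Aggregation windows, sorted ascending with duplicates removed.
    pub windows: Vec<Duration>,
}

/// Validates the inputs, then pre-computes lambda and normalizes the windows.
pub fn define_signal(
    name: &str,
    half_life: Duration,
    mut windows: Vec<Duration>,
) -> Result<SignalTypeSketch, String> {
    if name.is_empty() {
        return Err("signal name must not be empty".into());
    }
    if half_life.is_zero() {
        return Err("half-life must be positive".into());
    }
    if windows.iter().any(|w| w.is_zero()) {
        return Err("windows must be positive".into());
    }
    windows.sort();
    windows.dedup();
    // After one half-life the weight is exp(-lambda * half_life) = 1/2.
    let lambda = std::f64::consts::LN_2 / half_life.as_secs_f64();
    Ok(SignalTypeSketch { name: name.to_string(), lambda, windows })
}

/// Weight of an event of the given age under exponential decay.
pub fn decay_factor(def: &SignalTypeSketch, age: Duration) -> f64 {
    (-def.lambda * age.as_secs_f64()).exp()
}
```

Computing λ once at schema time keeps the per-event work to a single multiply and `exp` call.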

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10,000 events/run (was ≤500) to meet
  acceptance criterion; cases reduced 100→10 to keep runtime comparable
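
The group-commit thresholds above (up to 100 events or 10 ms) can be sketched as a small batching state machine. This is a minimal illustration under assumptions: `GroupCommitSketch`, `push`, and `tick` are hypothetical names, time is passed in as milliseconds so the logic stays deterministic, and the actual write and fsync of each returned batch is left out.

```rust
/// Thresholds taken from the commit message: flush at 100 events or 10 ms.
pub const MAX_BATCH_EVENTS: usize = 100;
pub const MAX_BATCH_DELAY_MS: u64 = 10;

/// Hypothetical batcher: buffers events and hands back a batch when one of
/// the two thresholds is reached. It is not the writer from this commit.
pub struct GroupCommitSketch<T> {
    pending: Vec<T>,
    first_arrival_ms: Option<u64>,
}

impl<T> GroupCommitSketch<T> {
    pub fn new() -> Self {
        Self { pending: Vec::new(), first_arrival_ms: None }
    }

    /// Buffers an event seen at `now_ms`; returns a batch once the size
    /// threshold is reached.
    pub fn push(&mut self, now_ms: u64, event: T) -> Option<Vec<T>> {
        if self.pending.is_empty() {
            self.first_arrival_ms = Some(now_ms);
        }
        self.pending.push(event);
        if self.pending.len() >= MAX_BATCH_EVENTS {
            self.take()
        } else {
            None
        }
    }

    /// Timer tick; returns a batch once the oldest buffered event has
    /// waited at least `MAX_BATCH_DELAY_MS`.
    pub fn tick(&mut self, now_ms: u64) -> Option<Vec<T>> {
        match self.first_arrival_ms {
            Some(start) if now_ms.saturating_sub(start) >= MAX_BATCH_DELAY_MS => self.take(),
            _ => None,
        }
    }

    /// Drains the buffer and resets the delay clock.
    fn take(&mut self) -> Option<Vec<T>> {
        self.first_arrival_ms = None;
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}
```

A caller would write each returned batch to the log and sync once per batch, which is what amortizes the sync cost across events.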

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness
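
To show why the key layout `[EntityId][0x00][Tag][suffix]` pairs well with big-endian IDs, here is a minimal sketch. The function names (`encode_key`, `entity_prefix`, `decode_key`) and the single-byte tag are assumptions for illustration; the commit's actual helpers may differ.

```rust
/// Hypothetical newtype mirroring the commit's `EntityId` (u64).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct EntityId(pub u64);

pub const SEPARATOR: u8 = 0x00;

/// Builds a key as [8-byte big-endian id][0x00][tag][suffix].
/// A one-byte tag is an assumption made for this sketch.
pub fn encode_key(entity: EntityId, tag: u8, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(10 + suffix.len());
    // Big-endian bytes sort lexicographically in the same order as the integers.
    key.extend_from_slice(&entity.0.to_be_bytes());
    key.push(SEPARATOR);
    key.push(tag);
    key.extend_from_slice(suffix);
    key
}

/// Prefix shared by every key of one entity, usable as a scan bound.
pub fn entity_prefix(entity: EntityId) -> Vec<u8> {
    let mut prefix = Vec::with_capacity(9);
    prefix.extend_from_slice(&entity.0.to_be_bytes());
    prefix.push(SEPARATOR);
    prefix
}

/// Splits a key back into its parts; returns None if the layout does not match.
pub fn decode_key(key: &[u8]) -> Option<(EntityId, u8, &[u8])> {
    if key.len() < 10 || key[8] != SEPARATOR {
        return None;
    }
    let mut id_bytes = [0u8; 8];
    id_bytes.copy_from_slice(&key[..8]);
    Some((EntityId(u64::from_be_bytes(id_bytes)), key[9], &key[10..]))
}
```

With little-endian bytes, entity 256 would sort before entity 1, so range scans over the byte-ordered keyspace would no longer follow numeric ID order.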

Also adds planning docs for phases 4-5, research docs, architecture
overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-20 16:43:24 -07:00


Jon Gjengset: I don't ship what I wouldn't trust at 3am during a production incident. Pay attention to what the user says and follow it. Do not make them repeat themselves.

tidalDB

A single-node-first, embeddable Rust database for the personalized content ranking problem. Replaces the 6-system stack (Elasticsearch + Redis + Kafka + feature store + vector DB + ranking service) with a single process, single query interface, and single operational model.

Status: Milestone 1 in progress. Phases 1-3 (schema, WAL, storage layer) are implemented; phases 4-5 are planned.

Find Your Guide

| If you need to... | Read this |
| --- | --- |
| Understand the vision | VISION.md |
| See use cases and surfaces | USE_CASES.md |
| See sequence diagrams | SEQUENCE.md |
| Understand the system architecture | ARCHITECTURE.md |
| Look up domain concepts | ai-lookup/index.md |
| Follow coding standards | CODING_GUIDELINES.md |
| See the API spec | API.md |
| Read architectural lessons | thoughts.md |
| Read technical research | docs/research/ |

Agents

| Agent | Identity | Use when |
| --- | --- | --- |
| @tidal-engineer | Jon Gjengset | Implementing features, designing storage internals, building the signal system, debugging correctness issues |
| @tidal-visionary | Spencer Kimball | Planning roadmaps, defining milestones, scoping phases, making build-vs-defer decisions |
| @tidal-researcher | Andy Pavlo | Investigating best practices, surveying prior art, evaluating libraries, producing research documents |
| @tidal-storyteller | (none) | Building the marketing site, writing blog posts, crafting public-facing copy |

Skills

Phase Lifecycle

| Step | Skill | Use when |
| --- | --- | --- |
| 1. Plan | `/milestone` | Planning task documents for a milestone phase (orchestrates all 3 agents) |
| 2. Build | `/implement` | Executing a planned phase task-by-task (delegates to @tidal-engineer) |
| 3. Review | `/review` | Reviewing completed phase against spec and coding standards (delegates to @tidal-engineer) |
| 4. Accept | `/uat` | User acceptance testing a reviewed phase (delegates to @tidal-engineer) |

Other Skills

| Skill | Use when |
| --- | --- |
| `/tidal-deliver-task` | End-to-end feature delivery orchestrating all 4 agents (scope -> research -> build -> review -> accept) |
| `/tidal-verify-completion-to-spec` | Joint spec verification from all 3 agent lenses in parallel (product fit, research grounding, implementation correctness); use any time, not just after /implement |
| `/develop` | Quick implementation work outside the milestone lifecycle |
| `/research [topic]` | Investigating best practices, evaluating approaches (delegates to @tidal-researcher) |
| `/roadmap` | Building or updating the milestone roadmap (delegates to @tidal-visionary) |
| `/build-site` | Creating or iterating on the marketing site |
| `/write-blog` | Writing blog posts about progress or architecture |

Core Domain Model

Entities: Items (content), Users, Creators — each with metadata, embedding slot, signal ledger
Signals: Typed, timestamped event streams with native decay, velocity, and windowed aggregation
Relationships: Weighted, directional edges between entities (follows, blocks, interactions)
Ranking Profiles: Named, versioned scoring functions declared in schema
Query: Single operation combining retrieval, filtering, ranking, and diversity enforcement
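
To make the Signals entry more concrete, the sketch below shows one way the three native operations could be computed over a single entity's event timestamps. Everything here is an assumption for illustration: `SignalLedgerSketch` and its methods are hypothetical names, timestamps are plain seconds as `f64`, and "velocity" is taken to mean events per second over a window, which the spec may define differently.

```rust
use std::f64::consts::LN_2;

/// Hypothetical in-memory ledger for one signal type on one entity.
/// Timestamps are seconds, kept in ascending order.
pub struct SignalLedgerSketch {
    timestamps: Vec<f64>,
}

impl SignalLedgerSketch {
    pub fn new() -> Self {
        Self { timestamps: Vec::new() }
    }

    /// Records an event, keeping the timestamps sorted.
    pub fn record(&mut self, ts: f64) {
        let idx = self.timestamps.partition_point(|&t| t <= ts);
        self.timestamps.insert(idx, ts);
    }

    /// Windowed aggregation: number of events in (now - window, now].
    pub fn window_count(&self, now: f64, window: f64) -> usize {
        let start = self.timestamps.partition_point(|&t| t <= now - window);
        let end = self.timestamps.partition_point(|&t| t <= now);
        end - start
    }

    /// Velocity, assumed here to be events per second over the window.
    pub fn velocity(&self, now: f64, window: f64) -> f64 {
        self.window_count(now, window) as f64 / window
    }

    /// Decay: each past event contributes exp(-lambda * age),
    /// with lambda = ln 2 / half_life.
    pub fn decayed_count(&self, now: f64, half_life: f64) -> f64 {
        let lambda = LN_2 / half_life;
        self.timestamps
            .iter()
            .filter(|&&t| t <= now)
            .map(|&t| (-lambda * (now - t)).exp())
            .sum()
    }
}
```

Keeping the timestamps sorted lets both window bounds be found by binary search instead of a full scan.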

Ports

Dev servers use port range 59520–59529 (e.g. site/ on 59520).

Critical Rules

Scope: This is NOT a general-purpose database. Every decision serves one question: "given a user and a context, what content should they see, in what order?"
Embeddings: The database retrieves and ranks over vectors. It does NOT generate them.
Signals are primitives: Decay, velocity, and windowed aggregation are native — not application logic.
Single-node first: Embeddable. Scales vertically before horizontally.
Language: Rust.

Repository Structure

.                    # Top-level docs and configuration
├── CLAUDE.md        # This file — project instructions
├── VISION.md        # Product vision and thesis
├── USE_CASES.md     # 14 use cases, all discovery surfaces
├── SEQUENCE.md      # Data flow sequence diagrams
├── CODING_GUIDELINES.md  # Engineering standards
├── API.md           # API specification
├── thoughts.md      # Architectural lessons from sister projects
├── ai-lookup/       # Domain concept reference
├── docs/            # Research and documentation
│   └── research/    # Deep technical research docs
├── .claude/         # Claude Code configuration
│   ├── agents/      # Agent definitions
│   └── skills/      # Skill definitions
├── tidal/           # Rust database engine
│   ├── Cargo.toml
│   ├── src/
│   │   ├── storage/ # Entity store, signal ledger, inverted index, HNSW
│   │   ├── query/   # Query parser, planner, executor
│   │   ├── ranking/ # Profile engine, signal scoring, diversity enforcement
│   │   ├── signals/ # Signal types, decay, velocity, windowed aggregation
│   │   └── schema/  # Schema definition, validation, migrations
│   ├── benches/     # Performance benchmarks
│   └── tests/       # Integration and property tests
└── site/            # Public marketing site (Next.js)

Pre-commit Hooks

The pre-commit hook runs automatically on staged files:

tidal/ (Rust): cargo fmt (auto-fix + re-stage), cargo clippy -- -D warnings, cargo test --lib
site/ (Next.js): eslint (if node_modules installed)

All cargo commands use --manifest-path tidal/Cargo.toml since the Rust project is not at repo root.
