jordan 29400d48db feat: implement Milestone 1 phases 1-3 — schema, WAL, and storage layer
Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers,
  half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls
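
Phase 1's pre-computed decay λ and deduplicated, sorted windows can be sketched as below. This is a minimal illustration under stated assumptions, not the tidalDB code. It assumes standard exponential decay with λ = ln 2 / half-life. The `SchemaError` enum (standing in for `LumenError`), the identifier rule, the field names, and the `signal(...)` builder method are all hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical error type; the real crate wraps errors in `LumenError`.
#[derive(Debug, PartialEq)]
pub enum SchemaError {
    InvalidIdentifier(String),
    InvalidHalfLife(String),
    InvalidWindows(String),
    DuplicateSignal(String),
}

/// Hypothetical signal type: the decay constant is computed once, at build time.
#[derive(Debug)]
pub struct SignalTypeDefinition {
    pub name: String,
    pub half_life_secs: f64,
    /// lambda = ln(2) / half-life, so a decayed value halves every half-life.
    pub lambda: f64,
    /// Aggregation windows in seconds, sorted ascending, duplicates removed.
    pub windows_secs: Vec<u64>,
}

#[derive(Default)]
pub struct SchemaBuilder {
    signals: Vec<SignalTypeDefinition>,
    names: HashSet<String>,
}

/// Assumed identifier rule: ASCII letter first, then letters, digits, or '_'.
fn is_valid_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic())
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

impl SchemaBuilder {
    pub fn signal(
        mut self,
        name: &str,
        half_life_secs: f64,
        mut windows_secs: Vec<u64>,
    ) -> Result<Self, SchemaError> {
        if !is_valid_identifier(name) {
            return Err(SchemaError::InvalidIdentifier(name.to_string()));
        }
        // Rejects zero, negative, NaN, and infinite half-lives.
        if !(half_life_secs.is_finite() && half_life_secs > 0.0) {
            return Err(SchemaError::InvalidHalfLife(name.to_string()));
        }
        if windows_secs.is_empty() || windows_secs.contains(&0) {
            return Err(SchemaError::InvalidWindows(name.to_string()));
        }
        if !self.names.insert(name.to_string()) {
            return Err(SchemaError::DuplicateSignal(name.to_string()));
        }
        windows_secs.sort_unstable();
        windows_secs.dedup();
        self.signals.push(SignalTypeDefinition {
            name: name.to_string(),
            half_life_secs,
            lambda: std::f64::consts::LN_2 / half_life_secs,
            windows_secs,
        });
        Ok(self)
    }

    pub fn build(self) -> Vec<SignalTypeDefinition> {
        self.signals
    }
}
```

Because λ is derived once here, scoring code can multiply by a stored constant instead of recomputing a logarithm on every read.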

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet
  acceptance criterion; cases reduced 100→10 to keep runtime comparable
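
Phase 2 uses a length-prefixed, checksummed entry format with crash-recovery replay. That approach can be sketched as follows, under several assumptions. The byte layout is assumed: a u32 little-endian length, then the payload, then an 8-byte checksum. BLAKE3 is swapped for FNV-1a only so the example compiles with `std` alone. `encode_entry` and `replay` are hypothetical names, and tidalDB's real framing may differ.

```rust
use std::convert::{TryFrom, TryInto};

/// FNV-1a (64-bit), swapped in for BLAKE3 only so this sketch needs just `std`.
/// It catches accidental corruption but is NOT a cryptographic hash.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Assumed frame layout: [len: u32 LE][payload][checksum: u64 LE of payload].
fn encode_entry(payload: &[u8], out: &mut Vec<u8>) {
    let len = u32::try_from(payload.len()).expect("entry exceeds u32::MAX bytes");
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(payload);
    out.extend_from_slice(&fnv1a64(payload).to_le_bytes());
}

/// Replays complete, checksum-valid frames in order and stops at the first
/// torn or corrupt frame, which is what a crash mid-append tends to leave.
fn replay(mut log: &[u8]) -> Vec<Vec<u8>> {
    let mut entries = Vec::new();
    while log.len() >= 4 {
        let len = u32::from_le_bytes(log[..4].try_into().unwrap()) as usize;
        let frame_len = match len.checked_add(4 + 8) {
            Some(n) => n,
            None => break,
        };
        if log.len() < frame_len {
            break; // torn tail: header written, body or checksum missing
        }
        let payload = &log[4..4 + len];
        let stored = u64::from_le_bytes(log[4 + len..frame_len].try_into().unwrap());
        if stored != fnv1a64(payload) {
            break; // corrupt frame: nothing after it is trusted
        }
        entries.push(payload.to_vec());
        log = &log[frame_len..];
    }
    entries
}
```

Stopping at the first bad frame treats everything after a torn write as never committed. This only holds if entries are acknowledged after the frame is durable.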

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness
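
The Phase 3 key layout `[EntityId][0x00][Tag][suffix]` depends on Phase 1's big-endian `EntityId`. Big-endian bytes compare lexicographically in the same order as the numeric IDs, so a prefix scan over one entity returns its keys contiguously. The sketch below assumes an 8-byte ID and a 1-byte tag. `encode_key`, `entity_prefix`, and `scan_prefix` are hypothetical names, and the backend only mimics the BTreeMap + RwLock shape of `InMemoryBackend`.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Hypothetical newtype mirroring Phase 1's `EntityId(u64)`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct EntityId(pub u64);

/// Assumed layout: [id: 8 bytes big-endian][0x00][tag: 1 byte][suffix].
/// Big-endian makes byte-wise key order equal numeric EntityId order.
pub fn encode_key(id: EntityId, tag: u8, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(10 + suffix.len());
    key.extend_from_slice(&id.0.to_be_bytes());
    key.push(0x00);
    key.push(tag);
    key.extend_from_slice(suffix);
    key
}

/// Prefix shared by every key belonging to one entity.
pub fn entity_prefix(id: EntityId) -> Vec<u8> {
    let mut prefix = id.0.to_be_bytes().to_vec();
    prefix.push(0x00);
    prefix
}

/// Minimal ordered map behind a lock, in the spirit of `InMemoryBackend`.
#[derive(Default)]
pub struct InMemoryBackend {
    map: RwLock<BTreeMap<Vec<u8>, Vec<u8>>>,
}

impl InMemoryBackend {
    pub fn put(&self, key: Vec<u8>, value: Vec<u8>) {
        self.map.write().unwrap().insert(key, value);
    }

    /// Ordered scan: seek to the prefix, then stop at the first non-matching key.
    pub fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        let map = self.map.read().unwrap();
        map.range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}
```

With little-endian IDs, entity 256 (`00 01 ...`) would sort before entity 1 (`01 00 ...`), and a prefix scan could no longer stop at the first non-matching key.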

Also adds planning docs for phases 4-5, research docs, architecture
overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 16:43:24 -07:00



> **Jon Gjengset:** I don't ship what I wouldn't trust at 3am during a production incident.
> Pay attention to what the user says and follow it. Do not make them repeat themselves.
# tidalDB
A single-node-first, embeddable Rust database for the **personalized content ranking problem**. Replaces the 6-system stack (Elasticsearch + Redis + Kafka + feature store + vector DB + ranking service) with a single process, single query interface, and single operational model.

**Status:** Vision and specification phase. No implementation yet.
## Find Your Guide
| If you need to... | Read this |
|-------------------|-----------|
| **Understand the vision** | [VISION.md](VISION.md) |
| **See use cases and surfaces** | [USE_CASES.md](USE_CASES.md) |
| **See sequence diagrams** | [SEQUENCE.md](SEQUENCE.md) |
| **Understand the system architecture** | [ARCHITECTURE.md](ARCHITECTURE.md) |
| **Look up domain concepts** | [ai-lookup/index.md](ai-lookup/index.md) |
| **Follow coding standards** | [CODING_GUIDELINES.md](CODING_GUIDELINES.md) |
| **See the API spec** | [API.md](API.md) |
| **Read architectural lessons** | [thoughts.md](thoughts.md) |
| **Read technical research** | [docs/research/](docs/research/) |
## Agents
| Agent | Identity | Use when |
|-------|----------|----------|
| **@tidal-engineer** | Jon Gjengset | Implementing features, designing storage internals, building the signal system, debugging correctness issues |
| **@tidal-visionary** | Spencer Kimball | Planning roadmaps, defining milestones, scoping phases, making build-vs-defer decisions |
| **@tidal-researcher** | Andy Pavlo | Investigating best practices, surveying prior art, evaluating libraries, producing research documents |
| **@tidal-storyteller** | — | Building the marketing site, writing blog posts, crafting public-facing copy |
## Skills
### Phase Lifecycle
| Step | Skill | Use when |
|------|-------|----------|
| 1. Plan | `/milestone` | Planning task documents for a milestone phase (orchestrates @tidal-visionary, @tidal-researcher, and @tidal-engineer) |
| 2. Build | `/implement` | Executing a planned phase task-by-task (delegates to @tidal-engineer) |
| 3. Review | `/review` | Reviewing completed phase against spec and coding standards (delegates to @tidal-engineer) |
| 4. Accept | `/uat` | User acceptance testing a reviewed phase (delegates to @tidal-engineer) |
### Other Skills
| Skill | Use when |
|-------|----------|
| `/tidal-deliver-task` | End-to-end feature delivery orchestrating all 4 agents (scope -> research -> build -> review -> accept) |
| `/tidal-verify-completion-to-spec` | Joint spec verification from all 3 agent lenses in parallel (product fit, research grounding, implementation correctness) — use any time, not just after /implement |
| `/develop` | Quick implementation work outside the milestone lifecycle |
| `/research [topic]` | Investigating best practices, evaluating approaches (delegates to @tidal-researcher) |
| `/roadmap` | Building or updating the milestone roadmap (delegates to @tidal-visionary) |
| `/build-site` | Creating or iterating on the marketing site |
| `/write-blog` | Writing blog posts about progress or architecture |
## Core Domain Model
- **Entities:** Items (content), Users, Creators — each with metadata, embedding slot, signal ledger
- **Signals:** Typed, timestamped event streams with native decay, velocity, and windowed aggregation
- **Relationships:** Weighted, directional edges between entities (follows, blocks, interactions)
- **Ranking Profiles:** Named, versioned scoring functions declared in schema
- **Query:** Single operation combining retrieval, filtering, ranking, and diversity enforcement
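
The **Query** bullet describes one operation that filters, ranks, and enforces diversity. A minimal sketch of that pipeline over already-retrieved candidates follows. It assumes retrieval and ranking-profile scoring happened upstream. `Candidate`, `rank`, the block-list filter, and the per-creator cap are hypothetical stand-ins, not tidalDB's API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical candidate produced by retrieval.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub item_id: u64,
    pub creator_id: u64,
    /// Assumed to be the output of a ranking profile, computed upstream.
    pub score: f64,
}

/// Filter -> rank -> diversity -> top-k in a single pass over the candidates.
/// `blocked` models a "blocks" relationship; `max_per_creator` is a simple diversity rule.
pub fn rank(
    candidates: Vec<Candidate>,
    blocked: &HashSet<u64>,
    max_per_creator: usize,
    k: usize,
) -> Vec<u64> {
    // Filtering: drop items from creators the user has blocked.
    let mut kept: Vec<Candidate> = candidates
        .into_iter()
        .filter(|c| !blocked.contains(&c.creator_id))
        .collect();
    // Ranking: highest score first; total_cmp gives a total order over f64.
    kept.sort_by(|a, b| b.score.total_cmp(&a.score));
    // Diversity: cap how many results any single creator contributes.
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut out = Vec::with_capacity(k);
    for c in kept {
        if out.len() == k {
            break;
        }
        let n = per_creator.entry(c.creator_id).or_insert(0);
        if *n < max_per_creator {
            *n += 1;
            out.push(c.item_id);
        }
    }
    out
}
```

Because `sort_by` is stable, candidates with equal scores keep their retrieval order.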
## Ports
Dev servers use port range **59520-59529** (e.g. `site/` on 59520).
## Critical Rules
- **Scope:** This is NOT a general-purpose database. Every decision serves one question: "given a user and a context, what content should they see, in what order?"
- **Embeddings:** The database retrieves and ranks over vectors. It does NOT generate them.
- **Signals are primitives:** Decay, velocity, and windowed aggregation are native — not application logic.
- **Single-node first:** Embeddable. Scales vertically before horizontally.
- **Language:** Rust.
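
"Signals are primitives" means the engine itself computes decay, windowed aggregation, and velocity. A minimal sketch of those three operations over one in-memory ledger follows, assuming exponential decay with λ = ln 2 / half-life. `SignalLedger` and its methods are hypothetical. The velocity definition used here, the change between the current and previous window per second, is one plausible choice, not a documented tidalDB formula.

```rust
/// Hypothetical ledger for one (entity, signal type) pair; timestamps in seconds.
pub struct SignalLedger {
    events: Vec<(u64, f64)>, // (timestamp_secs, weight)
    lambda: f64,             // decay constant: ln 2 / half-life
}

impl SignalLedger {
    pub fn new(half_life_secs: f64) -> Self {
        SignalLedger {
            events: Vec::new(),
            lambda: std::f64::consts::LN_2 / half_life_secs,
        }
    }

    pub fn record(&mut self, ts: u64, weight: f64) {
        self.events.push((ts, weight));
    }

    /// Decayed sum: sum of w * e^(-lambda * (now - ts)) over events at or before `now`.
    pub fn decayed(&self, now: u64) -> f64 {
        self.events
            .iter()
            .filter(|&&(ts, _)| ts <= now)
            .map(|&(ts, w)| w * (-self.lambda * (now - ts) as f64).exp())
            .sum()
    }

    /// Sum of weights in the half-open window (now - window_secs, now].
    pub fn windowed(&self, now: u64, window_secs: u64) -> f64 {
        let start = now.saturating_sub(window_secs);
        self.events
            .iter()
            .filter(|&&(ts, _)| ts > start && ts <= now)
            .map(|&(_, w)| w)
            .sum()
    }

    /// Assumed velocity: (current window - previous window) / window length.
    pub fn velocity(&self, now: u64, window_secs: u64) -> f64 {
        let current = self.windowed(now, window_secs);
        let previous = self.windowed(now.saturating_sub(window_secs), window_secs);
        (current - previous) / window_secs as f64
    }
}
```

A production ledger would likely maintain incremental aggregates rather than rescan every event. The linear scan here only keeps the arithmetic visible.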
## Repository Structure
```
.                          # Top-level docs and configuration
├── CLAUDE.md              # This file: project instructions
├── VISION.md              # Product vision and thesis
├── USE_CASES.md           # 14 use cases, all discovery surfaces
├── SEQUENCE.md            # Data flow sequence diagrams
├── ARCHITECTURE.md        # System architecture overview
├── CODING_GUIDELINES.md   # Engineering standards
├── API.md                 # API specification
├── thoughts.md            # Architectural lessons from sister projects
├── ai-lookup/             # Domain concept reference
├── docs/                  # Research and documentation
│   └── research/          # Deep technical research docs
├── .claude/               # Claude Code configuration
│   ├── agents/            # Agent definitions
│   └── skills/            # Skill definitions
├── tidal/                 # Rust database engine
│   ├── Cargo.toml
│   ├── src/
│   │   ├── storage/       # Entity store, signal ledger, inverted index, HNSW
│   │   ├── query/         # Query parser, planner, executor
│   │   ├── ranking/       # Profile engine, signal scoring, diversity enforcement
│   │   ├── signals/       # Signal types, decay, velocity, windowed aggregation
│   │   └── schema/        # Schema definition, validation, migrations
│   ├── benches/           # Performance benchmarks
│   └── tests/             # Integration and property tests
└── site/                  # Public marketing site (Next.js)
```
## Pre-commit Hooks
The pre-commit hook runs automatically on staged files:
- **tidal/ (Rust):** `cargo fmt` (auto-fix + re-stage), `cargo clippy -- -D warnings`, `cargo test --lib`
- **site/ (Next.js):** `eslint` (if `node_modules` is installed)

All cargo commands use `--manifest-path tidal/Cargo.toml` because the Rust project is not at the repo root.
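
For reproducing the Rust checks by hand from the repo root, the sketch below spells out the full argument lists. It only builds `std::process::Command` values and does not run them. `hook_commands` and `render` are hypothetical helpers, and the real hook is presumably a shell script. Note that `-D warnings` is a lint flag, so it has to follow `--` rather than be passed to cargo directly.

```rust
use std::process::Command;

/// Manifest path relative to the repo root; the Rust crate lives in `tidal/`.
const MANIFEST: &str = "tidal/Cargo.toml";

/// Builds (without running) the three cargo invocations the hook performs.
pub fn hook_commands() -> Vec<Command> {
    let mut fmt = Command::new("cargo");
    fmt.args(["fmt", "--manifest-path", MANIFEST]);

    let mut clippy = Command::new("cargo");
    // Everything after `--` goes to clippy/rustc, which is where `-D warnings` belongs.
    clippy.args(["clippy", "--manifest-path", MANIFEST, "--", "-D", "warnings"]);

    let mut test = Command::new("cargo");
    test.args(["test", "--manifest-path", MANIFEST, "--lib"]);

    vec![fmt, clippy, test]
}

/// Renders a command as one space-separated string, e.g. for logging.
pub fn render(cmd: &Command) -> String {
    let mut parts = vec![cmd.get_program().to_string_lossy().into_owned()];
    parts.extend(cmd.get_args().map(|a| a.to_string_lossy().into_owned()));
    parts.join(" ")
}
```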