Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers, half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

113 lines
5.9 KiB
Markdown
> **Jon Gjengset:** I don't ship what I wouldn't trust at 3am during a production incident.
> Pay attention to what the user says and follow it. Do not make them repeat themselves.

# tidalDB

A single-node-first, embeddable Rust database for the **personalized content ranking problem**. Replaces the 6-system stack (Elasticsearch + Redis + Kafka + feature store + vector DB + ranking service) with a single process, single query interface, and single operational model.

**Status:** Vision and specification phase. No implementation yet.

## Find Your Guide

| If you need to... | Read this |
|-------------------|-----------|
| **Understand the vision** | [VISION.md](VISION.md) |
| **See use cases and surfaces** | [USE_CASES.md](USE_CASES.md) |
| **See sequence diagrams** | [SEQUENCE.md](SEQUENCE.md) |
| **Understand the system architecture** | [ARCHITECTURE.md](ARCHITECTURE.md) |
| **Look up domain concepts** | [ai-lookup/index.md](ai-lookup/index.md) |
| **Follow coding standards** | [CODING_GUIDELINES.md](CODING_GUIDELINES.md) |
| **See the API spec** | [API.md](API.md) |
| **Read architectural lessons** | [thoughts.md](thoughts.md) |
| **Read technical research** | [docs/research/](docs/research/) |

## Agents

| Agent | Identity | Use when |
|-------|----------|----------|
| **@tidal-engineer** | Jon Gjengset | Implementing features, designing storage internals, building the signal system, debugging correctness issues |
| **@tidal-visionary** | Spencer Kimball | Planning roadmaps, defining milestones, scoping phases, making build-vs-defer decisions |
| **@tidal-researcher** | Andy Pavlo | Investigating best practices, surveying prior art, evaluating libraries, producing research documents |
| **@tidal-storyteller** | (none) | Building the marketing site, writing blog posts, crafting public-facing copy |

## Skills

### Phase Lifecycle

| Step | Skill | Use when |
|------|-------|----------|
| 1. Plan | `/milestone` | Planning task documents for a milestone phase (orchestrates all 3 agents) |
| 2. Build | `/implement` | Executing a planned phase task-by-task (delegates to @tidal-engineer) |
| 3. Review | `/review` | Reviewing completed phase against spec and coding standards (delegates to @tidal-engineer) |
| 4. Accept | `/uat` | User acceptance testing a reviewed phase (delegates to @tidal-engineer) |

### Other Skills

| Skill | Use when |
|-------|----------|
| `/tidal-deliver-task` | End-to-end feature delivery orchestrating all 4 agents (scope -> research -> build -> review -> accept) |
| `/tidal-verify-completion-to-spec` | Joint spec verification from all 3 agent lenses in parallel (product fit, research grounding, implementation correctness); use any time, not just after `/implement` |
| `/develop` | Quick implementation work outside the milestone lifecycle |
| `/research [topic]` | Investigating best practices, evaluating approaches (delegates to @tidal-researcher) |
| `/roadmap` | Building or updating the milestone roadmap (delegates to @tidal-visionary) |
| `/build-site` | Creating or iterating on the marketing site |
| `/write-blog` | Writing blog posts about progress or architecture |

## Core Domain Model

- **Entities:** Items (content), Users, Creators; each has metadata, an embedding slot, and a signal ledger
- **Signals:** Typed, timestamped event streams with native decay, velocity, and windowed aggregation
- **Relationships:** Weighted, directional edges between entities (follows, blocks, interactions)
- **Ranking Profiles:** Named, versioned scoring functions declared in schema
- **Query:** Single operation combining retrieval, filtering, ranking, and diversity enforcement

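
To make "native decay" and "windowed aggregation" concrete, here is a minimal sketch of how a signal score could be computed from timestamped events. It assumes exponential decay parameterized by a half-life, with λ = ln(2) / half-life computed once at construction. `DecayingSignal`, `score`, and `window_count` are hypothetical names chosen for illustration and are not tidalDB's API.

```rust
/// Hypothetical signal definition; names are illustrative, not tidalDB's API.
#[derive(Debug, Clone, Copy)]
pub struct DecayingSignal {
    /// Decay constant λ = ln(2) / half_life, computed once up front.
    lambda: f64,
}

impl DecayingSignal {
    /// Returns `None` for a non-positive or non-finite half-life.
    pub fn new(half_life_secs: f64) -> Option<Self> {
        if half_life_secs.is_finite() && half_life_secs > 0.0 {
            Some(Self { lambda: std::f64::consts::LN_2 / half_life_secs })
        } else {
            None
        }
    }

    /// Decayed sum of `(timestamp_secs, weight)` events as seen at `now_secs`.
    /// Events timestamped after `now_secs` are ignored.
    pub fn score(&self, events: &[(f64, f64)], now_secs: f64) -> f64 {
        events
            .iter()
            .filter(|(ts, _)| *ts <= now_secs)
            .map(|(ts, w)| w * (-self.lambda * (now_secs - ts)).exp())
            .sum()
    }

    /// Number of events in the trailing window `(now - window, now]`.
    pub fn window_count(events: &[(f64, f64)], now_secs: f64, window_secs: f64) -> usize {
        events
            .iter()
            .filter(|(ts, _)| *ts <= now_secs && *ts > now_secs - window_secs)
            .count()
    }
}
```

With a one-hour half-life, an event of weight 1.0 contributes 0.5 one hour later. A velocity figure could then be derived by dividing a window count by the window length, though the exact definition is left to the specification.
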
## Ports

Dev servers use port range **59520–59529** (e.g. `site/` on 59520).

## Critical Rules

- **Scope:** This is NOT a general-purpose database. Every decision serves one question: "given a user and a context, what content should they see, in what order?"
- **Embeddings:** The database retrieves and ranks over vectors. It does NOT generate them.
- **Signals are primitives:** Decay, velocity, and windowed aggregation are native, not application logic.
- **Single-node first:** Embeddable. Scales vertically before horizontally.
- **Language:** Rust.

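
The embeddings rule can be illustrated with a small sketch: the caller supplies every vector, and the database side only compares and orders them. The brute-force cosine scan below stands in for whatever index the engine ends up using. `cosine` and `rank_by_similarity` are hypothetical helpers for illustration, not part of any specified API.

```rust
/// Cosine similarity of two equal-length vectors.
/// Returns `None` on a length mismatch or when either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        None
    } else {
        Some(dot / (norm_a * norm_b))
    }
}

/// Orders item ids by descending similarity to a caller-supplied query vector.
/// Items whose embedding cannot be compared with the query are skipped.
pub fn rank_by_similarity(query: &[f32], items: &[(u64, Vec<f32>)]) -> Vec<u64> {
    let mut scored: Vec<(u64, f32)> = items
        .iter()
        .filter_map(|(id, emb)| cosine(query, emb).map(|s| (*id, s)))
        .collect();
    // Highest similarity first; `total_cmp` gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().map(|(id, _)| id).collect()
}
```

Nothing here produces an embedding: both the query vector and the item vectors arrive from outside, which is the boundary the rule draws.
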
## Repository Structure

```
.                           # Top-level docs and configuration
├── CLAUDE.md               # This file: project instructions
├── VISION.md               # Product vision and thesis
├── USE_CASES.md            # 14 use cases, all discovery surfaces
├── SEQUENCE.md             # Data flow sequence diagrams
├── CODING_GUIDELINES.md    # Engineering standards
├── API.md                  # API specification
├── thoughts.md             # Architectural lessons from sister projects
├── ai-lookup/              # Domain concept reference
├── docs/                   # Research and documentation
│   └── research/           # Deep technical research docs
├── .claude/                # Claude Code configuration
│   ├── agents/             # Agent definitions
│   └── skills/             # Skill definitions
├── tidal/                  # Rust database engine
│   ├── Cargo.toml
│   ├── src/
│   │   ├── storage/        # Entity store, signal ledger, inverted index, HNSW
│   │   ├── query/          # Query parser, planner, executor
│   │   ├── ranking/        # Profile engine, signal scoring, diversity enforcement
│   │   ├── signals/        # Signal types, decay, velocity, windowed aggregation
│   │   └── schema/         # Schema definition, validation, migrations
│   ├── benches/            # Performance benchmarks
│   └── tests/              # Integration and property tests
└── site/                   # Public marketing site (Next.js)
```

## Pre-commit Hooks

The pre-commit hook runs automatically on staged files:
- **tidal/ (Rust):** `cargo fmt` (auto-fix + re-stage), `cargo clippy -- -D warnings`, `cargo test --lib`
- **site/ (Next.js):** `eslint` (if `node_modules` is installed)

All cargo commands use `--manifest-path tidal/Cargo.toml` since the Rust project is not at repo root.