# Task 01: TidalDB Core

## Context

**Milestone:** 1 -- Signal Engine
**Phase:** m1p5 -- Entity CRUD and Signal Write API
**Depends On:** m1p1 (types), m1p3 (storage), m1p4 (signal ledger)
**Blocks:** Task 02 (Signal Write and Read API), Task 03 (Integration Test)
**Complexity:** M

## Objective

Deliver the `TidalDB` struct -- the single entry point for all database operations. This struct owns the storage engine, the signal ledger, and (when m1p2 ships) the WAL. It provides `open()` to initialize the database, `shutdown()` to cleanly close it, and entity metadata CRUD for items.

`TidalDB` is the struct that a developer imports and uses. It must be `Send + Sync` so it can be wrapped in `Arc` and shared across threads. Its API must be clean, ergonomic, and unsurprising -- this is the first thing a user touches.

## Requirements

- `TidalDB` struct owns: `FjallStorage`, `SignalLedger`, (optionally) WAL writer
- `Config` struct: `data_dir: PathBuf`, `schema: Schema`
- `TidalDB::open(config)` initializes storage, creates signal ledger, restores from checkpoint
- `TidalDB::shutdown()` checkpoints signal state, flushes storage, drops resources
- `db.write_item(entity_id, metadata)` stores metadata bytes at `Tag::Meta` in the items keyspace
- `db.read_item(entity_id)` retrieves metadata bytes from `Tag::Meta`
- `db.delete_item(entity_id)` removes metadata
- `TidalDB` is `Send + Sync`
- No `unsafe` code

## Technical Design

### Module Structure

```
tidal/src/
  lib.rs    -- TidalDB, Config, public API
```

### Public API

```rust
// === lib.rs (replacing current content) ===

pub mod query;
pub mod ranking;
pub mod schema;
pub mod signals;
pub mod storage;
pub mod wal;

pub use schema::LumenError;

/// Crate-wide result type. All public API methods return `Result<T>`.
pub type Result<T> = std::result::Result<T, LumenError>;

use std::path::PathBuf;
use std::sync::Arc;

use schema::{EntityId, Schema, Timestamp, Window};
use signals::ledger::{NoopWalWriter, SignalLedger};
use storage::{FjallStorage, Tag, encode_key};

/// Configuration for opening a TidalDB instance.
#[derive(Debug, Clone)]
pub struct Config {
    /// Path to the data directory. Created if it does not exist.
    pub data_dir: PathBuf,
    /// Schema defining signal types and their configurations.
    pub schema: Schema,
}

/// The TidalDB database instance.
///
/// This is the single entry point for all database operations in Milestone 1:
/// entity metadata CRUD and signal write/read.
///
/// # Thread Safety
///
/// `TidalDB` is `Send + Sync`. Share it across threads via `Arc<TidalDB>`.
/// All methods take `&self` -- no mutable access required.
///
/// # Lifecycle
///
/// ```ignore
/// let db = TidalDB::open(config)?;
/// // ... use the database ...
/// db.shutdown()?;
/// ```
///
/// Dropping `TidalDB` without calling `shutdown()` will attempt a best-effort
/// flush but may lose the most recent checkpoint. Always call `shutdown()`
/// for clean termination.
pub struct TidalDB {
    /// The fjall-backed storage engine with per-EntityKind keyspaces.
    storage: FjallStorage,
    /// The in-memory signal ledger (hot + warm tiers).
    signal_ledger: SignalLedger,
    /// The schema (owned, immutable after construction).
    schema: Schema,
}

// Compile-time assertion that TidalDB is Send + Sync.
const _: () = {
    fn assert_send_sync<T: Send + Sync>() {}
    // This will fail at compile time if TidalDB is not Send + Sync.
    // The function is never called; the type check is sufficient.
    let _ = assert_send_sync::<TidalDB>;
};

impl TidalDB {
    /// Open a TidalDB instance.
    ///
    /// Creates the data directory if it does not exist. Opens the fjall
    /// storage engine. Creates the signal ledger. Restores in-memory state
    /// from the most recent checkpoint (if one exists).
    ///
    /// # Errors
    ///
    /// - `LumenError::Storage` if the data directory cannot be created or opened
    /// - `LumenError::Internal` if checkpoint restoration fails (corrupt data)
    pub fn open(config: Config) -> Result<Self>;

    /// Cleanly shut down the database.
    ///
    /// 1. Checkpoints all signal ledger state to storage
    /// 2. Flushes all storage buffers to disk
    /// 3. Drops internal resources
    ///
    /// # Errors
    ///
    /// - `LumenError::Storage` if checkpoint or flush fails
    pub fn shutdown(&self) -> Result<()>;

    /// Write item metadata.
    ///
    /// Stores the metadata bytes at `Tag::Meta` in the items keyspace.
    /// If an item with this ID already exists, its metadata is overwritten.
    ///
    /// # Arguments
    ///
    /// - `entity_id`: The item's unique identifier
    /// - `metadata`: Opaque metadata bytes (application-serialized)
    ///
    /// # Errors
    ///
    /// - `LumenError::Storage` on I/O failure
    pub fn write_item(&self, entity_id: EntityId, metadata: &[u8]) -> Result<()>;

    /// Read item metadata.
    ///
    /// Returns the metadata bytes stored at `Tag::Meta`, or `None` if the
    /// item does not exist.
    ///
    /// # Errors
    ///
    /// - `LumenError::Storage` on I/O failure
    pub fn read_item(&self, entity_id: EntityId) -> Result<Option<Vec<u8>>>;

    /// Delete item metadata.
    ///
    /// Removes the metadata entry. Does not affect signal state (signals
    /// for this entity remain in the ledger until eviction).
    ///
    /// # Errors
    ///
    /// - `LumenError::Storage` on I/O failure
    pub fn delete_item(&self, entity_id: EntityId) -> Result<()>;

    /// Check if an item exists in storage.
    pub fn item_exists(&self, entity_id: EntityId) -> Result<bool>;

    /// Get a reference to the schema.
    pub fn schema(&self) -> &Schema;

    /// Access the signal ledger (for Task 02 to build signal API on top).
    pub(crate) fn signal_ledger(&self) -> &SignalLedger;

    /// Access the storage (for direct storage operations in testing).
    #[cfg(test)]
    pub(crate) fn storage(&self) -> &FjallStorage;
}
```

### Internal Design

**Open sequence:**

```rust
pub fn open(config: Config) -> Result<Self> {
    // 1. Create data directory if needed
    std::fs::create_dir_all(&config.data_dir)
        .map_err(|e| LumenError::Storage(StorageError::Io(e.to_string())))?;

    // 2. Open fjall storage
    let storage = FjallStorage::open(&config.data_dir)?;

    // 3. Create signal ledger with NoopWalWriter
    //    (m1p2 will replace this with the real WAL writer)
    let signal_ledger = SignalLedger::new(
        config.schema.clone(),
        Box::new(NoopWalWriter),
    );

    // 4. Restore from checkpoint (items keyspace)
    let items_backend = storage.backend(EntityKind::Item);
    let checkpoint_meta = signal_ledger.restore(items_backend)?;

    if let Some(meta) = checkpoint_meta {
        tracing::info!(
            checkpoint_time_ns = meta.checkpoint_time_ns,
            wal_sequence = meta.wal_sequence,
            entries = signal_ledger.entry_count(),
            "restored signal ledger from checkpoint"
        );
    } else {
        tracing::info!("no checkpoint found, starting with empty signal state");
    }

    // 5. TODO: WAL replay from checkpoint sequence (m1p2)

    Ok(Self {
        storage,
        signal_ledger,
        schema: config.schema,
    })
}
```

**Shutdown sequence:**

```rust
pub fn shutdown(&self) -> Result<()> {
    // 1. Checkpoint signal state
    let meta = CheckpointMeta {
        checkpoint_time_ns: Timestamp::now().as_nanos(),
        wal_sequence: 0, // TODO: get from WAL in m1p2
    };
    let items_backend = self.storage.backend(EntityKind::Item);
    self.signal_ledger.checkpoint(items_backend, meta)?;

    // 2. Flush all storage
    self.storage.flush_all()?;

    tracing::info!(
        entries = self.signal_ledger.entry_count(),
        "TidalDB shutdown complete"
    );

    Ok(())
}
```

**Entity metadata storage:**

Item metadata is stored in the items keyspace with `Tag::Meta` and an empty suffix:

```rust
pub fn write_item(&self, entity_id: EntityId, metadata: &[u8]) -> Result<()> {
    let key = encode_key(entity_id, Tag::Meta, &[]);
    let backend = self.storage.backend(EntityKind::Item);
    backend.put(&key, metadata)?;
    Ok(())
}

pub fn read_item(&self, entity_id: EntityId) -> Result<Option<Vec<u8>>> {
    let key = encode_key(entity_id, Tag::Meta, &[]);
    let backend = self.storage.backend(EntityKind::Item);
    Ok(backend.get(&key)?)
}
```

**FjallStorage integration:**

The existing `FjallStorage` (m1p3) provides `backend(EntityKind) -> &FjallBackend`. For M1, all signal state is checkpointed to the items keyspace because all M1 signals target items. The signal ledger's `checkpoint()` and `restore()` methods receive the items backend.

### Error Handling

- Directory creation failure: mapped to `LumenError::Storage` with a descriptive message.
- Storage open failure: `FjallStorage::open` returns `StorageError`, which converts to `LumenError::Storage` via the existing `From` impl.
- Checkpoint restore failure: `LumenError::Internal` for corrupt data.
- Entity CRUD failures: `LumenError::Storage` for I/O errors.
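
The error mapping described above can be sketched in isolation. This is a minimal illustration, not the crate's actual definitions: `StorageError` and `LumenError` here are hypothetical stand-ins with only the variants this task mentions, and `storage_get`, `read_item`, and `create_data_dir` are illustrative free functions. It shows why storage calls can use a bare `?` (the `From` impl does the conversion) while directory creation needs an explicit `map_err`.

```rust
use std::path::Path;

// Hypothetical stand-ins for the crate's error types. The real definitions
// live in the `storage` and `schema` modules and may carry more variants.
#[derive(Debug, PartialEq)]
pub enum StorageError {
    Io(String),
}

#[derive(Debug, PartialEq)]
pub enum LumenError {
    Storage(StorageError),
    Internal(String),
}

// The conversion that lets `?` turn a `StorageError` into a `LumenError`.
impl From<StorageError> for LumenError {
    fn from(e: StorageError) -> Self {
        LumenError::Storage(e)
    }
}

pub type Result<T> = std::result::Result<T, LumenError>;

// Stand-in for a storage-layer call that fails with `StorageError`.
fn storage_get(fail: bool) -> std::result::Result<Option<Vec<u8>>, StorageError> {
    if fail {
        Err(StorageError::Io("disk unavailable".to_string()))
    } else {
        Ok(Some(b"meta".to_vec()))
    }
}

// `?` applies `From<StorageError>`, so no explicit `map_err` is needed.
pub fn read_item(fail: bool) -> Result<Option<Vec<u8>>> {
    Ok(storage_get(fail)?)
}

// `std::io::Error` has no `From` impl in this sketch, so it is mapped by hand,
// mirroring step 1 of the open sequence.
pub fn create_data_dir(path: &Path) -> Result<()> {
    std::fs::create_dir_all(path)
        .map_err(|e| LumenError::Storage(StorageError::Io(e.to_string())))
}
```

With this shape, callers only ever match on `LumenError`, and the storage layer's error type stays an implementation detail of the variant payload.
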
## Test Strategy

### Unit Tests

```rust
use std::time::Duration;
use tempfile::TempDir;

fn test_config(dir: &TempDir) -> Config {
    let mut builder = SchemaBuilder::new();
    builder
        .signal(
            "view",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(7 * 24 * 3600),
            },
        )
        .windows(&[Window::OneHour, Window::TwentyFourHours, Window::SevenDays])
        .velocity(true)
        .add();
    builder
        .signal(
            "like",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(14 * 24 * 3600),
            },
        )
        .windows(&[Window::TwentyFourHours, Window::SevenDays, Window::AllTime])
        .velocity(true)
        .add();
    builder
        .signal(
            "skip",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(24 * 3600),
            },
        )
        .windows(&[Window::OneHour, Window::TwentyFourHours])
        .velocity(false)
        .add();

    Config {
        data_dir: dir.path().to_owned(),
        schema: builder.build().unwrap(),
    }
}

#[test]
fn open_creates_data_directory() {
    let dir = TempDir::new().unwrap();
    let sub = dir.path().join("subdir");
    // Reuse the test schema; only the data directory differs.
    let config = Config {
        data_dir: sub.clone(),
        schema: test_config(&dir).schema,
    };
    let db = TidalDB::open(config).unwrap();
    assert!(sub.exists());
    db.shutdown().unwrap();
}

#[test]
fn open_and_shutdown_clean() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();
    db.shutdown().unwrap();
}

#[test]
fn write_and_read_item() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let id = EntityId::new(42);
    let meta = b"test metadata bytes";
    db.write_item(id, meta).unwrap();

    let read = db.read_item(id).unwrap();
    assert_eq!(read.as_deref(), Some(meta.as_slice()));

    db.shutdown().unwrap();
}

#[test]
fn read_nonexistent_item_returns_none() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let read = db.read_item(EntityId::new(999)).unwrap();
    assert!(read.is_none());

    db.shutdown().unwrap();
}

#[test]
fn delete_item() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let id = EntityId::new(1);
    db.write_item(id, b"data").unwrap();
    assert!(db.item_exists(id).unwrap());

    db.delete_item(id).unwrap();
    assert!(!db.item_exists(id).unwrap());

    db.shutdown().unwrap();
}

#[test]
fn write_item_overwrites() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    let id = EntityId::new(1);
    db.write_item(id, b"v1").unwrap();
    db.write_item(id, b"v2").unwrap();

    let read = db.read_item(id).unwrap().unwrap();
    assert_eq!(&read, b"v2");

    db.shutdown().unwrap();
}

#[test]
fn items_persist_across_close_reopen() {
    let dir = TempDir::new().unwrap();

    // Write
    {
        let db = TidalDB::open(test_config(&dir)).unwrap();
        db.write_item(EntityId::new(1), b"persistent").unwrap();
        db.shutdown().unwrap();
    }

    // Reopen and read
    {
        let db = TidalDB::open(test_config(&dir)).unwrap();
        let read = db.read_item(EntityId::new(1)).unwrap();
        assert_eq!(read.as_deref(), Some(b"persistent".as_slice()));
        db.shutdown().unwrap();
    }
}

#[test]
fn schema_accessible_from_db() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    assert_eq!(db.schema().signal_count(), 3);
    assert!(db.schema().signal("view").is_some());
    assert!(db.schema().signal("like").is_some());
    assert!(db.schema().signal("skip").is_some());

    db.shutdown().unwrap();
}

#[test]
fn tidaldb_is_send_and_sync() {
    fn assert_send_sync<T: Send + Sync>() {}
    assert_send_sync::<TidalDB>();
}

#[test]
fn multiple_items_independent() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();

    for i in 0..100 {
        db.write_item(EntityId::new(i), format!("item_{i}").as_bytes()).unwrap();
    }

    for i in 0..100 {
        let read = db.read_item(EntityId::new(i)).unwrap().unwrap();
        assert_eq!(read, format!("item_{i}").as_bytes());
    }

    db.shutdown().unwrap();
}
```

## Acceptance Criteria

- [ ] `TidalDB::open(config)` creates data directory, opens storage, creates signal ledger, restores from checkpoint
- [ ] `TidalDB::shutdown()` checkpoints signal state, flushes storage
- [ ] `db.write_item(id, metadata)` stores bytes at `Tag::Meta` in items keyspace
- [ ] `db.read_item(id)` returns stored bytes or `None`
- [ ] `db.delete_item(id)` removes metadata entry
- [ ] `db.item_exists(id)` returns `true`/`false`
- [ ] Items persist across close and reopen
- [ ] `TidalDB` is `Send + Sync` (compile-time assertion)
- [ ] Schema accessible via `db.schema()`
- [ ] No `unsafe` code
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All tests pass

## Research References

- [API.md](../../../../API.md) -- Initialization section (`TidalDB::open(Config)`), lifecycle section (`db.shutdown()`)
- [CODING_GUIDELINES.md](../../../../CODING_GUIDELINES.md) -- Section 9 (public API: ergonomic, minimal, hard to misuse)

## Spec References

- [docs/specs/00-architecture-overview.md](../../../specs/00-architecture-overview.md) -- Section 2 (system diagram), Section 8 (code module map: `lib.rs` as TidalDB struct)

## Implementation Notes

- `lib.rs` currently declares module stubs and re-exports. This task replaces the file content with the `TidalDB` struct while preserving all existing module declarations and re-exports.
- `FjallStorage::open()` is the existing method from m1p3. It opens or creates the fjall database at the given path with three keyspaces.
- `FjallStorage::flush_all()` is the existing method that flushes all keyspaces.
- The `Drop` impl for `TidalDB` should attempt a best-effort checkpoint. Use `tracing::error!` if it fails -- do not panic in Drop.
- For M1, the WAL is represented by `NoopWalWriter`. When m1p2 ships, `TidalDB::open` will construct the real WAL and pass it to `SignalLedger::new`. The public API does not change.
- Do NOT add `write_user` or `write_creator` methods. Those are M3 concerns. The underlying storage supports them via `storage.backend(EntityKind::User)`, but the public API intentionally omits them.
- Do NOT add configuration for `memory_budget`, `signal_durability`, or `background_threads` (from API.md). Those are M2+ concerns. M1 Config is minimal: just `data_dir` and `schema`.
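
One possible shape for the best-effort `Drop` behavior from the notes above is sketched here. Everything in it is an assumption made for illustration: `Db` is a hypothetical stand-in for `TidalDB`, the `shut_down` flag and the checkpoint counter are not part of the spec, and `eprintln!` stands in for `tracing::error!` so the block has no external dependencies. The point is only that `Drop` reuses the shutdown path, skips it when `shutdown()` already ran, and logs instead of panicking on failure.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for `TidalDB`; only shutdown bookkeeping is modeled.
pub struct Db {
    /// Set by the first `shutdown()` call so later calls become no-ops.
    shut_down: AtomicBool,
    /// Counts successful checkpoints so the behavior can be observed.
    checkpoints: Arc<AtomicUsize>,
    /// When true, the simulated checkpoint fails.
    fail_checkpoint: bool,
}

impl Db {
    pub fn new(checkpoints: Arc<AtomicUsize>, fail_checkpoint: bool) -> Self {
        Db {
            shut_down: AtomicBool::new(false),
            checkpoints,
            fail_checkpoint,
        }
    }

    /// Simulated checkpoint: either fails or bumps the counter.
    fn checkpoint(&self) -> Result<(), String> {
        if self.fail_checkpoint {
            return Err("checkpoint failed".to_string());
        }
        self.checkpoints.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }

    /// Takes `&self`, like the spec's `shutdown`. `swap` returns the previous
    /// flag value, so only the first caller runs the checkpoint. A failed
    /// attempt is not retried in this sketch.
    pub fn shutdown(&self) -> Result<(), String> {
        if self.shut_down.swap(true, Ordering::SeqCst) {
            return Ok(());
        }
        self.checkpoint()
    }
}

impl Drop for Db {
    fn drop(&mut self) {
        // Best effort: report the failure and continue. Never panic here.
        if let Err(e) = self.shutdown() {
            eprintln!("checkpoint during drop failed: {e}");
        }
    }
}
```

Routing `Drop` through `shutdown()` keeps a single code path for checkpointing; whether a failed explicit shutdown should be retried on drop is a design decision this sketch leaves open.
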