Task 01: TidalDB Core
Context
Milestone: 1 -- Signal Engine
Phase: m1p5 -- Entity CRUD and Signal Write API
Depends On: m1p1 (types), m1p3 (storage), m1p4 (signal ledger)
Blocks: Task 02 (Signal Write and Read API), Task 03 (Integration Test)
Complexity: M
Objective
Deliver the TidalDB struct -- the single entry point for all database operations. This struct owns the storage engine, the signal ledger, and (when m1p2 ships) the WAL. It provides open() to initialize the database, shutdown() to cleanly close it, and entity metadata CRUD for items.
TidalDB is the struct that a developer imports and uses. It must be Send + Sync so it can be wrapped in Arc and shared across threads. Its API must be clean, ergonomic, and unsurprising -- this is the first thing a user touches.
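To illustrate why a &self-only API matters for sharing, here is a minimal sketch using a hypothetical stand-in type, SharedStore, rather than the real TidalDB. The interior RwLock over a BTreeMap and the helper write_from_threads are assumptions made only so the example compiles on its own; the real struct delegates to FjallStorage and SignalLedger.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for TidalDB: every method takes `&self`,
/// so one instance can sit behind an `Arc` without an outer `Mutex`.
#[derive(Default)]
pub struct SharedStore {
    items: RwLock<BTreeMap<u64, Vec<u8>>>,
}

impl SharedStore {
    pub fn write_item(&self, id: u64, metadata: &[u8]) {
        self.items.write().unwrap().insert(id, metadata.to_vec());
    }

    pub fn read_item(&self, id: u64) -> Option<Vec<u8>> {
        self.items.read().unwrap().get(&id).cloned()
    }
}

/// Spawns `writers` threads that each write one item through a shared handle.
pub fn write_from_threads(store: Arc<SharedStore>, writers: u64) {
    let handles: Vec<_> = (0..writers)
        .map(|id| {
            // Each thread gets its own clone of the Arc, not of the store.
            let store = Arc::clone(&store);
            thread::spawn(move || store.write_item(id, format!("item_{id}").as_bytes()))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

Because thread::spawn requires its closure to be Send, the code above only compiles when Arc<SharedStore> is Send, which in turn requires SharedStore to be Send + Sync. The same constraint is what the compile-time assertion on TidalDB is meant to guarantee.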
Requirements
- TidalDB struct owns: FjallStorage, SignalLedger, (optionally) WAL writer
- Config struct: data_dir: PathBuf, schema: Schema
- TidalDB::open(config) initializes storage, creates signal ledger, restores from checkpoint
- TidalDB::shutdown() checkpoints signal state, flushes storage, drops resources
- db.write_item(entity_id, metadata) stores metadata bytes at Tag::Meta in the items keyspace
- db.read_item(entity_id) retrieves metadata bytes from Tag::Meta
- db.delete_item(entity_id) removes metadata
- TidalDB is Send + Sync
- No unsafe code
Technical Design
Module Structure
tidal/src/
lib.rs -- TidalDB, Config, public API
Public API
// === lib.rs (replacing current content) ===
pub mod query;
pub mod ranking;
pub mod schema;
pub mod signals;
pub mod storage;
pub mod wal;
pub use schema::LumenError;
/// Crate-wide result type. All public API methods return `Result<T, LumenError>`.
pub type Result<T> = std::result::Result<T, LumenError>;
use std::path::PathBuf;
use std::sync::Arc;
use schema::{EntityId, Schema, Timestamp, Window};
use signals::ledger::{NoopWalWriter, SignalLedger};
use storage::{FjallStorage, Tag, encode_key};
/// Configuration for opening a TidalDB instance.
#[derive(Debug, Clone)]
pub struct Config {
    /// Path to the data directory. Created if it does not exist.
    pub data_dir: PathBuf,
    /// Schema defining signal types and their configurations.
    pub schema: Schema,
}
/// The TidalDB database instance.
///
/// This is the single entry point for all database operations in Milestone 1:
/// entity metadata CRUD and signal write/read.
///
/// # Thread Safety
///
/// `TidalDB` is `Send + Sync`. Share it across threads via `Arc<TidalDB>`.
/// All methods take `&self` -- no mutable access required.
///
/// # Lifecycle
///
/// ```ignore
/// let db = TidalDB::open(config)?;
/// // ... use the database ...
/// db.shutdown()?;
/// ```
///
/// Dropping `TidalDB` without calling `shutdown()` triggers only a best-effort
/// checkpoint: a failure is logged rather than returned, so recent signal
/// state may be lost. Always call `shutdown()` for clean termination.
pub struct TidalDB {
    /// The fjall-backed storage engine with per-EntityKind keyspaces.
    storage: FjallStorage,
    /// The in-memory signal ledger (hot + warm tiers).
    signal_ledger: SignalLedger,
    /// The schema (owned, immutable after construction).
    schema: Schema,
}
// Compile-time assertion that TidalDB is Send + Sync.
const _: () = {
    fn assert_send_sync<T: Send + Sync>() {}
    // This will fail at compile time if TidalDB is not Send + Sync.
    // The function is never called; the type check is sufficient.
    let _ = assert_send_sync::<TidalDB>;
};
impl TidalDB {
    /// Open a TidalDB instance.
    ///
    /// Creates the data directory if it does not exist. Opens the fjall
    /// storage engine. Creates the signal ledger. Restores in-memory state
    /// from the most recent checkpoint (if one exists).
    ///
    /// # Errors
    ///
    /// - `LumenError::Storage` if the data directory cannot be created or opened
    /// - `LumenError::Internal` if checkpoint restoration fails (corrupt data)
    pub fn open(config: Config) -> Result<Self>;

    /// Cleanly shut down the database.
    ///
    /// 1. Checkpoints all signal ledger state to storage
    /// 2. Flushes all storage buffers to disk
    /// 3. Drops internal resources
    ///
    /// # Errors
    ///
    /// - `LumenError::Storage` if checkpoint or flush fails
    pub fn shutdown(&self) -> Result<()>;

    /// Write item metadata.
    ///
    /// Stores the metadata bytes at `Tag::Meta` in the items keyspace.
    /// If an item with this ID already exists, its metadata is overwritten.
    ///
    /// # Arguments
    ///
    /// - `entity_id`: The item's unique identifier
    /// - `metadata`: Opaque metadata bytes (application-serialized)
    ///
    /// # Errors
    ///
    /// - `LumenError::Storage` on I/O failure
    pub fn write_item(&self, entity_id: EntityId, metadata: &[u8]) -> Result<()>;

    /// Read item metadata.
    ///
    /// Returns the metadata bytes stored at `Tag::Meta`, or `None` if the
    /// item does not exist.
    ///
    /// # Errors
    ///
    /// - `LumenError::Storage` on I/O failure
    pub fn read_item(&self, entity_id: EntityId) -> Result<Option<Vec<u8>>>;

    /// Delete item metadata.
    ///
    /// Removes the metadata entry. Does not affect signal state (signals
    /// for this entity remain in the ledger until eviction).
    ///
    /// # Errors
    ///
    /// - `LumenError::Storage` on I/O failure
    pub fn delete_item(&self, entity_id: EntityId) -> Result<()>;

    /// Check if an item exists in storage.
    pub fn item_exists(&self, entity_id: EntityId) -> Result<bool>;

    /// Get a reference to the schema.
    pub fn schema(&self) -> &Schema;

    /// Access the signal ledger (for Task 02 to build signal API on top).
    pub(crate) fn signal_ledger(&self) -> &SignalLedger;

    /// Access the storage (for direct storage operations in testing).
    #[cfg(test)]
    pub(crate) fn storage(&self) -> &FjallStorage;
}
Internal Design
Open sequence:
pub fn open(config: Config) -> Result<Self> {
    // 1. Create data directory if needed
    std::fs::create_dir_all(&config.data_dir)
        .map_err(|e| LumenError::Storage(StorageError::Io(e.to_string())))?;

    // 2. Open fjall storage
    let storage = FjallStorage::open(&config.data_dir)?;

    // 3. Create signal ledger with NoopWalWriter
    //    (m1p2 will replace this with the real WAL writer)
    let signal_ledger = SignalLedger::new(
        config.schema.clone(),
        Box::new(NoopWalWriter),
    );

    // 4. Restore from checkpoint (items keyspace)
    let items_backend = storage.backend(EntityKind::Item);
    let checkpoint_meta = signal_ledger.restore(items_backend)?;
    if let Some(meta) = checkpoint_meta {
        tracing::info!(
            checkpoint_time_ns = meta.checkpoint_time_ns,
            wal_sequence = meta.wal_sequence,
            entries = signal_ledger.entry_count(),
            "restored signal ledger from checkpoint"
        );
    } else {
        tracing::info!("no checkpoint found, starting with empty signal state");
    }

    // 5. TODO: WAL replay from checkpoint sequence (m1p2)

    Ok(Self {
        storage,
        signal_ledger,
        schema: config.schema,
    })
}
Shutdown sequence:
pub fn shutdown(&self) -> Result<()> {
    // 1. Checkpoint signal state
    let meta = CheckpointMeta {
        checkpoint_time_ns: Timestamp::now().as_nanos(),
        wal_sequence: 0, // TODO: get from WAL in m1p2
    };
    let items_backend = self.storage.backend(EntityKind::Item);
    self.signal_ledger.checkpoint(items_backend, meta)?;

    // 2. Flush all storage
    self.storage.flush_all()?;

    tracing::info!(
        entries = self.signal_ledger.entry_count(),
        "tidalDB shutdown complete"
    );
    Ok(())
}
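The open and shutdown sequences depend on one contract: restore() yields nothing for a fresh database and the last written metadata otherwise, while corrupt data surfaces as an error. A minimal sketch of that contract follows. The 16-byte header layout, the reserved key CHECKPOINT_KEY, the u64 field types, and the BTreeMap standing in for the items backend are all assumptions for illustration; the real SignalLedger checkpoint format is not specified in this task.

```rust
use std::collections::BTreeMap;

/// Mirrors the two fields used in the sequences above; `u64` is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CheckpointMeta {
    pub checkpoint_time_ns: u64,
    pub wal_sequence: u64,
}

/// Hypothetical reserved key for the checkpoint header.
const CHECKPOINT_KEY: &[u8] = b"\xffcheckpoint_meta";

/// Writes the header as two big-endian u64 values (hypothetical layout).
pub fn checkpoint(backend: &mut BTreeMap<Vec<u8>, Vec<u8>>, meta: CheckpointMeta) {
    let mut value = Vec::with_capacity(16);
    value.extend_from_slice(&meta.checkpoint_time_ns.to_be_bytes());
    value.extend_from_slice(&meta.wal_sequence.to_be_bytes());
    backend.insert(CHECKPOINT_KEY.to_vec(), value);
}

/// Returns `Ok(None)` when no checkpoint exists and `Err` when the header is corrupt.
pub fn restore(backend: &BTreeMap<Vec<u8>, Vec<u8>>) -> Result<Option<CheckpointMeta>, String> {
    let value = match backend.get(CHECKPOINT_KEY) {
        Some(value) => value,
        None => return Ok(None), // fresh database: start with empty signal state
    };
    if value.len() != 16 {
        return Err(format!("corrupt checkpoint header: {} bytes", value.len()));
    }
    let mut time = [0u8; 8];
    let mut seq = [0u8; 8];
    time.copy_from_slice(&value[..8]);
    seq.copy_from_slice(&value[8..]);
    Ok(Some(CheckpointMeta {
        checkpoint_time_ns: u64::from_be_bytes(time),
        wal_sequence: u64::from_be_bytes(seq),
    }))
}
```

Keeping "no checkpoint" (Ok(None)) distinct from "corrupt checkpoint" (Err) is what lets open() log a fresh start at info level while still failing loudly on damaged data.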
Entity metadata storage:
Item metadata is stored in the items keyspace with Tag::Meta and an empty suffix:
pub fn write_item(&self, entity_id: EntityId, metadata: &[u8]) -> Result<()> {
    let key = encode_key(entity_id, Tag::Meta, &[]);
    let backend = self.storage.backend(EntityKind::Item);
    backend.put(&key, metadata)?;
    Ok(())
}

pub fn read_item(&self, entity_id: EntityId) -> Result<Option<Vec<u8>>> {
    let key = encode_key(entity_id, Tag::Meta, &[]);
    let backend = self.storage.backend(EntityKind::Item);
    Ok(backend.get(&key)?)
}
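The remaining two operations, delete_item and item_exists, can follow the same key-then-backend pattern. The sketch below is self-contained and makes several assumptions: the key layout is taken to be [EntityId as big-endian u64][0x00][tag][suffix], the tag byte TAG_META is a made-up value, and a BTreeMap-backed Items type stands in for the items keyspace (so it uses &mut self, unlike the real &self API). The real encode_key and Tag from m1p3 may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical tag byte for metadata; the real `Tag::Meta` value may differ.
pub const TAG_META: u8 = 0x01;

/// Assumed key layout: [entity id, big-endian u64][0x00][tag][suffix].
pub fn encode_key(entity_id: u64, tag: u8, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(8 + 2 + suffix.len());
    key.extend_from_slice(&entity_id.to_be_bytes());
    key.push(0x00);
    key.push(tag);
    key.extend_from_slice(suffix);
    key
}

/// In-memory stand-in for the items keyspace.
#[derive(Default)]
pub struct Items {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl Items {
    pub fn write_item(&mut self, id: u64, metadata: &[u8]) {
        self.map.insert(encode_key(id, TAG_META, &[]), metadata.to_vec());
    }

    /// Deleting a missing item is treated as a no-op here (an assumption).
    pub fn delete_item(&mut self, id: u64) {
        self.map.remove(&encode_key(id, TAG_META, &[]));
    }

    /// An item "exists" when a metadata entry is present under its key.
    pub fn item_exists(&self, id: u64) -> bool {
        self.map.contains_key(&encode_key(id, TAG_META, &[]))
    }
}
```

Big-endian encoding keeps byte-wise key order aligned with numeric ID order, so a prefix scan over one entity's keys stays contiguous.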
FjallStorage integration:
The existing FjallStorage (m1p3) provides backend(EntityKind) -> &FjallBackend. For M1, all signal state is checkpointed to the items keyspace because all M1 signals target items. The signal ledger's checkpoint() and restore() methods receive the items backend.
Error Handling
- Directory creation failure: mapped to LumenError::Storage with a descriptive message.
- Storage open failure: FjallStorage::open returns StorageError, which converts to LumenError::Storage via the existing From impl.
- Checkpoint restore failure: LumenError::Internal for corrupt data.
- Entity CRUD failures: LumenError::Storage for I/O errors.
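The two conversion paths in the list above (a hand-written map_err for std::io::Error, and the ? operator going through a From impl for StorageError) can be sketched as follows. The enum shapes here are hypothetical stand-ins; only the variant names Storage, Internal, and Io appear in this task. The helper open_storage is a made-up placeholder for FjallStorage::open.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical shapes; the real enums live in the schema and storage modules.
#[derive(Debug)]
pub enum StorageError {
    Io(String),
}

#[derive(Debug)]
pub enum LumenError {
    Storage(StorageError),
    Internal(String),
}

impl From<StorageError> for LumenError {
    fn from(err: StorageError) -> Self {
        LumenError::Storage(err)
    }
}

/// Placeholder for a storage call that fails with the storage-level error.
fn open_storage(path: &Path) -> Result<(), StorageError> {
    if path.is_dir() {
        Ok(())
    } else {
        Err(StorageError::Io(format!("{} is not a directory", path.display())))
    }
}

pub fn open_data_dir(path: &Path) -> Result<(), LumenError> {
    // Directory creation: std::io::Error is mapped by hand, keeping its message.
    fs::create_dir_all(path)
        .map_err(|e| LumenError::Storage(StorageError::Io(e.to_string())))?;
    // Storage open: `?` converts StorageError through the From impl above.
    open_storage(path)?;
    Ok(())
}
```

Either path ends in the same LumenError::Storage variant, so callers match on one variant for every I/O-related failure.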
Test Strategy
Unit Tests
use std::time::Duration;

use tempfile::TempDir;
fn test_config(dir: &TempDir) -> Config {
    let mut builder = SchemaBuilder::new();
    builder
        .signal(
            "view",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(7 * 24 * 3600),
            },
        )
        .windows(&[Window::OneHour, Window::TwentyFourHours, Window::SevenDays])
        .velocity(true)
        .add();
    builder
        .signal(
            "like",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(14 * 24 * 3600),
            },
        )
        .windows(&[Window::TwentyFourHours, Window::SevenDays, Window::AllTime])
        .velocity(true)
        .add();
    builder
        .signal(
            "skip",
            EntityKind::Item,
            DecaySpec::Exponential {
                half_life: Duration::from_secs(24 * 3600),
            },
        )
        .windows(&[Window::OneHour, Window::TwentyFourHours])
        .velocity(false)
        .add();
    Config {
        data_dir: dir.path().to_owned(),
        schema: builder.build().unwrap(),
    }
}
#[test]
fn open_creates_data_directory() {
    let dir = TempDir::new().unwrap();
    let sub = dir.path().join("subdir");
    let config = Config {
        data_dir: sub.clone(),
        // Reuse the schema built by test_config; only data_dir differs here.
        schema: test_config(&dir).schema,
    };
    let db = TidalDB::open(config).unwrap();
    assert!(sub.exists());
    db.shutdown().unwrap();
}
#[test]
fn open_and_shutdown_clean() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();
    db.shutdown().unwrap();
}

#[test]
fn write_and_read_item() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();
    let id = EntityId::new(42);
    let meta = b"test metadata bytes";
    db.write_item(id, meta).unwrap();
    let read = db.read_item(id).unwrap();
    assert_eq!(read.as_deref(), Some(meta.as_slice()));
    db.shutdown().unwrap();
}

#[test]
fn read_nonexistent_item_returns_none() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();
    let read = db.read_item(EntityId::new(999)).unwrap();
    assert!(read.is_none());
    db.shutdown().unwrap();
}

#[test]
fn delete_item() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();
    let id = EntityId::new(1);
    db.write_item(id, b"data").unwrap();
    assert!(db.item_exists(id).unwrap());
    db.delete_item(id).unwrap();
    assert!(!db.item_exists(id).unwrap());
    db.shutdown().unwrap();
}

#[test]
fn write_item_overwrites() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();
    let id = EntityId::new(1);
    db.write_item(id, b"v1").unwrap();
    db.write_item(id, b"v2").unwrap();
    let read = db.read_item(id).unwrap().unwrap();
    assert_eq!(&read, b"v2");
    db.shutdown().unwrap();
}

#[test]
fn items_persist_across_close_reopen() {
    let dir = TempDir::new().unwrap();
    // Write
    {
        let db = TidalDB::open(test_config(&dir)).unwrap();
        db.write_item(EntityId::new(1), b"persistent").unwrap();
        db.shutdown().unwrap();
    }
    // Reopen and read
    {
        let db = TidalDB::open(test_config(&dir)).unwrap();
        let read = db.read_item(EntityId::new(1)).unwrap();
        assert_eq!(read.as_deref(), Some(b"persistent".as_slice()));
        db.shutdown().unwrap();
    }
}

#[test]
fn schema_accessible_from_db() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();
    assert_eq!(db.schema().signal_count(), 3);
    assert!(db.schema().signal("view").is_some());
    assert!(db.schema().signal("like").is_some());
    assert!(db.schema().signal("skip").is_some());
    db.shutdown().unwrap();
}

#[test]
fn tidaldb_is_send_and_sync() {
    fn assert_send_sync<T: Send + Sync>() {}
    assert_send_sync::<TidalDB>();
}

#[test]
fn multiple_items_independent() {
    let dir = TempDir::new().unwrap();
    let db = TidalDB::open(test_config(&dir)).unwrap();
    for i in 0..100 {
        db.write_item(EntityId::new(i), format!("item_{i}").as_bytes()).unwrap();
    }
    for i in 0..100 {
        let read = db.read_item(EntityId::new(i)).unwrap().unwrap();
        assert_eq!(read, format!("item_{i}").as_bytes());
    }
    db.shutdown().unwrap();
}
Acceptance Criteria
- TidalDB::open(config) creates data directory, opens storage, creates signal ledger, restores from checkpoint
- TidalDB::shutdown() checkpoints signal state, flushes storage
- db.write_item(id, metadata) stores bytes at Tag::Meta in items keyspace
- db.read_item(id) returns stored bytes or None
- db.delete_item(id) removes metadata entry
- db.item_exists(id) returns true/false
- Items persist across close and reopen
- TidalDB is Send + Sync (compile-time assertion)
- Schema accessible via db.schema()
- No unsafe code
- cargo clippy -- -D warnings passes
- All tests pass
Research References
- API.md -- Initialization section (TidalDB::open(Config)), lifecycle section (db.shutdown())
- CODING_GUIDELINES.md -- Section 9 (public API: ergonomic, minimal, hard to misuse)
Spec References
- docs/specs/00-architecture-overview.md -- Section 2 (system diagram), Section 8 (code module map: lib.rs as TidalDB struct)
Implementation Notes
- lib.rs currently declares module stubs and re-exports. This task replaces the file content with the TidalDB struct while preserving all existing module declarations and re-exports.
- FjallStorage::open() is the existing method from m1p3. It opens or creates the fjall database at the given path with three keyspaces.
- FjallStorage::flush_all() is the existing method that flushes all keyspaces.
- The Drop impl for TidalDB should attempt a best-effort checkpoint. Use tracing::error! if it fails -- do not panic in Drop.
- For M1, the WAL is represented by NoopWalWriter. When m1p2 ships, TidalDB::open will construct the real WAL and pass it to SignalLedger::new. The public API does not change.
- Do NOT add write_user or write_creator methods. Those are M3 concerns. The underlying storage supports them via storage.backend(EntityKind::User), but the public API intentionally omits them.
- Do NOT add configuration for memory_budget, signal_durability, or background_threads (from API.md). Those are M2+ concerns. M1 Config is minimal: just data_dir and schema.
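The Drop note above can be sketched with a hypothetical stand-in type, Db, since the real checkpoint path needs the storage and ledger. The shut_down flag that skips a second checkpoint after a clean shutdown is an assumption, not something this task specifies, and eprintln! stands in for tracing::error! so the example needs no external crates.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for TidalDB with a checkpoint that can fail.
pub struct Db {
    /// Set once `shutdown()` has succeeded, so `Drop` can skip its checkpoint.
    shut_down: AtomicBool,
    /// Counts checkpoint attempts; shared so callers can observe it after drop.
    checkpoints: Arc<AtomicUsize>,
    /// Test knob: make every checkpoint fail.
    fail_checkpoint: bool,
}

impl Db {
    pub fn new(checkpoints: Arc<AtomicUsize>, fail_checkpoint: bool) -> Self {
        Self { shut_down: AtomicBool::new(false), checkpoints, fail_checkpoint }
    }

    fn checkpoint(&self) -> Result<(), String> {
        self.checkpoints.fetch_add(1, Ordering::SeqCst);
        if self.fail_checkpoint {
            Err("checkpoint failed".to_string())
        } else {
            Ok(())
        }
    }

    pub fn shutdown(&self) -> Result<(), String> {
        self.checkpoint()?;
        self.shut_down.store(true, Ordering::SeqCst);
        Ok(())
    }
}

impl Drop for Db {
    fn drop(&mut self) {
        if self.shut_down.load(Ordering::SeqCst) {
            return; // a clean shutdown already checkpointed
        }
        // Best effort: report the failure, never panic inside Drop.
        if let Err(err) = self.checkpoint() {
            eprintln!("checkpoint on drop failed: {err}"); // tracing::error! in the real crate
        }
    }
}
```

Panicking inside Drop while another panic is already unwinding aborts the process, which is why the failure is only logged here.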