tidaldb/docs/research/enterprise_readiness_risks.md
2026-02-23 22:41:16 -07:00

Research: Enterprise Readiness Risks -- fjall Backup API and Schema Fingerprinting

Risk 1: fjall v3 Backup/Snapshot API

Question

Does fjall 3.x expose a safe backup/snapshot API that tidalDB can use to implement TidalDb::create_backup(dest: &Path) -> Result<BackupInfo> while the database is live?

TidalDB Context

tidalDB uses fjall 3.0.2 (fjall = "3" in Cargo.toml) as its durable storage engine. The FjallStorage struct (at /tidal/src/storage/fjall.rs) owns a single fjall::Database with three keyspaces: items, users, creators. A backup must also capture:

  • WAL segments ({data_dir}/wal/)
  • Tantivy text indexes ({data_dir}/text_index/, {data_dir}/creator_text_index/)
  • USearch vector indexes (stored as .idx files via VectorIndex::save())
  • Signal ledger checkpoints (serialized into fjall under Tag::Sig = 0x02)
  • Co-engagement, cohort, collection, session data (all in fjall under their respective tags)

The backup must be consistent: a restored backup should produce the same query results as the source at the point in time the backup was taken.

Survey of fjall 3.0.2 API Surface

fjall::Database public methods (complete list from docs.rs):

| Method | Purpose | Backup Relevance |
|---|---|---|
| snapshot() | Cross-keyspace MVCC read snapshot | Read consistency only; does NOT produce files |
| persist(PersistMode) | Flushes active journal to disk | Required pre-backup for durability |
| batch() | Atomic cross-keyspace write batch | Not relevant |
| keyspace(name, opts) | Open/create a keyspace | Not relevant |
| disk_space() | Total bytes on disk | Informational only |
| journal_count() | Number of journal files | Informational only |
| list_keyspace_names() | Enumerate keyspaces | Useful for backup enumeration |

fjall::Keyspace relevant methods:

| Method | Purpose | Backup Relevance |
|---|---|---|
| path() | Returns the LSM-tree's filesystem path | Needed to locate files to copy |
| rotate_memtable_and_wait() | Flushes memtable to SST, blocks until done | Critical pre-backup step |
| disk_space() | Keyspace bytes on disk | Informational |

fjall::Snapshot:

  • Implements the Readable trait (get, iter, range, prefix, etc.)
  • This is a logical MVCC snapshot for consistent reads -- it does NOT produce a physical file-level snapshot
  • Cannot be used for file-level backup

Has snapshot/backup API: NO

fjall 3.0.2 does not expose a backup_to(), checkpoint(), or export() method. This is tracked as GitHub issue #52: "Backup using Checkpointing", which remains open and blocked as of December 2024.

The planned API (not yet implemented):

Database::backup_to<P: AsRef<Path>>(&self, path: P) -> crate::Result<()>
TxDatabase::backup_to<P: AsRef<Path>>(&self, path: P) -> crate::Result<()>

The blocker is issue #70 -- an "unopened keyspace locking" mechanism needed for safe online backup.

Comparison with Other Embedded Databases

| Database | Backup API | Online? | Hard Links? | Notes |
|---|---|---|---|---|
| RocksDB | Checkpoint::CreateCheckpoint() | Yes | Yes (same FS) | Hard-links SSTs, copies MANIFEST. Consistent across column families. Production-proven at scale. |
| SQLite | sqlite3_backup_init/step/finish | Yes | No (page copy) | Incremental page-by-page copy while source remains writable. |
| LMDB | mdb_env_copy2() | Yes | No (page copy) | Copy-on-write B-tree makes consistent snapshots trivial. |
| DuckDB | EXPORT DATABASE / COPY | Semi | No | SQL-level export; not a byte-level checkpoint. |
| fjall 3.0.2 | None | N/A | N/A | Issue #52 open. Maintainer recommends cp -R offline. |

Safe Backup Procedure for fjall 3.0.2

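Given the absence of a backup API, three approaches were considered: quiesce and file copy (Approach A), a snapshot-based logical export (Approach B), and a hard-link refinement of Approach A.

Approach A: Quiesce + File Copy

This is the approach the fjall maintainer explicitly recommends for offline backup. Adapted for tidalDB's multi-engine architecture: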

1. Pause writes (set an AtomicBool flag that makes signal/entity writes return Err(Backpressure))
2. Flush all in-flight data:
   a. Flush text syncers (item + creator) via flush_tx channel -- blocks until Tantivy commits
   b. Checkpoint signal ledger + cohort ledger + co-engagement to fjall
   c. For each keyspace: call rotate_memtable_and_wait() to flush memtables to SSTs
   d. Call db.persist(PersistMode::SyncAll) to fsync all journal data
   e. Write WAL checkpoint marker
3. Copy the entire data_dir recursively to dest:
   a. {data_dir}/items/    -> {dest}/items/     (fjall SSTs + journals)
   b. {data_dir}/users/    -> {dest}/users/     (fjall SSTs + journals)
   c. {data_dir}/creators/ -> {dest}/creators/  (fjall SSTs + journals)
   d. {data_dir}/wal/      -> {dest}/wal/       (tidalDB WAL segments)
   e. {data_dir}/text_index/         -> {dest}/text_index/
   f. {data_dir}/creator_text_index/ -> {dest}/creator_text_index/
   g. {data_dir}/cache/    -> {dest}/cache/     (if present)
4. Resume writes (clear the AtomicBool flag)
5. Return BackupInfo { path, size_bytes, timestamp, wal_sequence }
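
Steps 1 and 4 (pause and resume) can be sketched with a small gate around an AtomicBool. This is a minimal illustration, not tidalDB's actual code: WriteGate, PauseGuard and the unit-struct Backpressure are hypothetical names. Using a guard means writes resume on every exit path, including an early return from a failed flush or copy.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Error handed back to writers while a backup holds the gate (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct Backpressure;

/// Gate consulted by the write paths before accepting a write.
#[derive(Default)]
pub struct WriteGate {
    paused: AtomicBool,
}

/// Clears the pause flag when dropped, so writes resume on every exit path.
pub struct PauseGuard<'a> {
    gate: &'a WriteGate,
}

impl WriteGate {
    /// Called at the top of each signal/entity write path.
    pub fn check(&self) -> Result<(), Backpressure> {
        if self.paused.load(Ordering::Acquire) {
            Err(Backpressure)
        } else {
            Ok(())
        }
    }

    /// Pauses writes. Returns None if another backup already holds the gate.
    pub fn pause(&self) -> Option<PauseGuard<'_>> {
        self.paused
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| PauseGuard { gate: self })
    }
}

impl Drop for PauseGuard<'_> {
    fn drop(&mut self) {
        self.gate.paused.store(false, Ordering::Release);
    }
}
```

The flag only rejects new writes. A write that passed check() just before the pause is not drained by this sketch, so the flush in step 2 still has to wait for in-flight work.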

Write pause duration estimate: The flush operations (steps 2a-2d) are I/O-bound. For a database with 10M entities and active signal writes:

  • Text syncer flush: ~100ms (channel round-trip + Tantivy commit)
  • Signal checkpoint: ~50ms (serialize DashMap entries to fjall)
  • rotate_memtable_and_wait per keyspace: ~50ms each (3 keyspaces = ~150ms)
  • persist(SyncAll): ~10ms (fsync)
  • File copy: proportional to data size; 1GB at 500MB/s = ~2s

Total estimated write pause: 300ms flush + copy time. For a 1GB database, roughly 2-3 seconds.
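
The arithmetic behind that estimate, written out as a small calculator. The constants are the per-step figures listed above (estimates, not measurements), and the function names are illustrative.

```rust
/// Flush-phase estimates from the list above, in milliseconds.
const TEXT_SYNCER_FLUSH_MS: u64 = 100;
const SIGNAL_CHECKPOINT_MS: u64 = 50;
const ROTATE_MEMTABLE_MS_PER_KEYSPACE: u64 = 50;
const PERSIST_SYNC_ALL_MS: u64 = 10;

/// Estimated time spent flushing before the copy starts.
pub fn flush_pause_ms(keyspaces: u64) -> u64 {
    TEXT_SYNCER_FLUSH_MS
        + SIGNAL_CHECKPOINT_MS
        + ROTATE_MEMTABLE_MS_PER_KEYSPACE * keyspaces
        + PERSIST_SYNC_ALL_MS
}

/// Estimated copy time for `data_bytes` at a sustained `bytes_per_sec`.
pub fn copy_ms(data_bytes: u64, bytes_per_sec: u64) -> u64 {
    // Widen to u128 so the multiplication cannot overflow; avoid division by zero.
    let ms = u128::from(data_bytes) * 1000 / u128::from(bytes_per_sec.max(1));
    ms as u64
}

/// Total estimated write pause: flush phase plus file copy.
pub fn total_pause_ms(keyspaces: u64, data_bytes: u64, bytes_per_sec: u64) -> u64 {
    flush_pause_ms(keyspaces) + copy_ms(data_bytes, bytes_per_sec)
}
```

With three keyspaces the flush phase sums to 310 ms. Copying 1 GB (10^9 bytes) at 500 MB/s adds 2000 ms, for about 2.3 seconds in total.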

Approach B: Snapshot-Consistent Logical Export

Use Database::snapshot() for a consistent logical view, then iterate and write to a new fjall database:

1. Take snapshot = db.snapshot()
2. For each keyspace, iterate snapshot and write to a new Database at dest
3. Separately copy WAL, Tantivy indexes, vector indexes

Problems with this approach:

  • Does not capture WAL/Tantivy/vector files consistently with the fjall snapshot
  • Much slower than file copy (must deserialize/reserialize every KV pair)
  • No way to snapshot Tantivy or USearch indexes concurrently with the fjall snapshot
  • The logical export would need to reconstruct the exact on-disk format fjall expects

Verdict: Approach B is not viable. The cross-engine consistency problem (fjall + Tantivy + USearch) makes logical export impractical.

A refinement of Approach A for same-filesystem backups:

1. Quiesce + flush (same as Approach A steps 1-2)
2. For fjall SST files: hard-link instead of copy (SSTs are immutable after flush)
3. For journal files, WAL, Tantivy, USearch: copy (these are mutable)
4. Resume writes

This mirrors RocksDB's Checkpoint approach. However, it requires:

  • Enumerating fjall's internal file structure (SSTs vs journals vs metadata)
  • Understanding which files are immutable after rotate_memtable_and_wait()
  • This is fragile without fjall's cooperation (internal layout may change between versions)

Verdict: Too fragile without fjall API support. Wait for issue #52 resolution, then adopt hard-link optimization.
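
For reference, the per-file step of the hard-link variant is simple. The fragile part is deciding which files are immutable. A minimal sketch, assuming the caller supplies that classification (the `immutable` flag stands in for knowledge of fjall's file layout that tidalDB does not currently have):

```rust
use std::fs;
use std::io;
use std::path::Path;

/// How a file ended up in the backup directory.
#[derive(Debug, PartialEq, Eq)]
pub enum Placement {
    HardLinked,
    Copied,
}

/// Hard-links `src` to `dst` when the caller asserts the file is immutable,
/// and falls back to a byte copy otherwise (or when linking fails, for
/// example because `dst` is on a different filesystem).
pub fn link_or_copy(src: &Path, dst: &Path, immutable: bool) -> io::Result<Placement> {
    if immutable && fs::hard_link(src, dst).is_ok() {
        return Ok(Placement::HardLinked);
    }
    fs::copy(src, dst)?;
    Ok(Placement::Copied)
}
```

A hard-linked file shares its bytes with the source. If fjall ever rewrote such a file in place, the backup would change with it, which is why this optimization depends on an immutability guarantee from fjall itself.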

Recommendation for create_backup() Implementation

Use Approach A: Quiesce + File Copy.

pub fn create_backup(&self, dest: &Path) -> Result<BackupInfo> {
    // 1. Pause writes via AtomicBool
    // 2. Flush all engines (text, signal, fjall, WAL)
    // 3. fs_extra::dir::copy(data_dir, dest, &CopyOptions::new())
    // 4. Resume writes
    // 5. Return metadata
}
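
If pulling in fs_extra is undesirable, step 3 can also be written against the standard library alone. The following is a sketch of that alternative, not the planned implementation. copy_dir_recursive is a hypothetical helper, and the byte count it returns could feed BackupInfo::size_bytes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copies the contents of `src` into `dst` (created if missing)
/// and returns the total number of file bytes copied.
pub fn copy_dir_recursive(src: &Path, dst: &Path) -> io::Result<u64> {
    fs::create_dir_all(dst)?;
    let mut total = 0u64;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            total += copy_dir_recursive(&entry.path(), &target)?;
        } else if file_type.is_file() {
            // fs::copy returns the number of bytes written.
            total += fs::copy(entry.path(), &target)?;
        }
        // Symlinks and special files are skipped in this sketch.
    }
    Ok(total)
}
```

The sketch does not fsync the destination. A production version would likely sync the copied files and directories before reporting success.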

Key implementation notes:

  • rotate_memtable_and_wait() is public but annotated "NOTE: Used in tests" in fjall source. It is the correct pre-backup call -- it ensures all in-memory data is flushed to SSTs. The annotation reflects that most users do not need to call it directly, not that it is unsafe.
  • persist(PersistMode::SyncAll) must follow to ensure journal data reaches disk.
  • No per-record serialization is involved: the write pause is the flush time plus a raw file copy, so for a given data size it is bounded by disk throughput.
  • Future: when fjall ships issue #52 (Database::backup_to()), replace the file copy with the native API for hard-link support and reduced pause duration.

Open Questions

  1. rotate_memtable_and_wait() stability: This method is public in fjall 3.0.2 but undocumented on docs.rs. It appears in the keyspace source as pub fn rotate_memtable_and_wait. tidalDB already calls it in FjallBackend::flush(). Risk: it could be renamed or removed in a minor fjall release. Mitigation: pin fjall version; the method is already in tidalDB's dependency surface.

  2. Tantivy backup safety: Tantivy indexes are append-only segment files plus a meta.json. Copying after a commit() (via flush_tx) should be safe, but this needs a test that verifies a copied Tantivy index opens correctly.

  3. USearch backup safety: USearch .idx files are written atomically by VectorIndex::save(). If a backup races with a save, the file could be truncated. The quiesce step prevents this, but we should add a file size/checksum validation on the backup side.

  4. Incremental backup: File copy is O(data_size) every time. For large databases, incremental backup (only copy changed SSTs) would reduce pause duration. This requires tracking file checksums or modification times. Defer to post-MVP.


Risk 2: Schema Fingerprint Migration Risk

Question

Can tidalDB safely add schema fingerprint persistence at open() time without breaking existing databases that were opened before the feature existed?

TidalDB Context

The Schema struct (/tidal/src/schema/validation/mod.rs) contains:

  • signals: HashMap<String, SignalTypeDef> -- signal names, decay params, windows, velocity config
  • embedding_slots: Vec<EmbeddingSlotDef> -- vector dimension config
  • text_fields: Vec<TextFieldDef> -- BM25 field config
  • creator_text_fields: Vec<TextFieldDef> -- creator search fields
  • policies: HashMap<String, AgentPolicy> -- session rate limiting

The fingerprint would hash signal names + decay parameters (the fields that affect storage layout and signal score interpretation). If an application opens a database with a different schema than was used to create it, signal scores become meaningless (wrong decay rates applied to stored data) and WAL replay produces incorrect results.

Currently there is no guard against this. open_with_schema() at /tidal/src/db/open.rs accepts any schema and proceeds.

Proposed Behavior

open() time:
  1. Compute fingerprint = hash(sorted signal names + decay params)
  2. Read stored fingerprint from fjall (e.g., well-known key in items keyspace)
  3. Match:
     a. No stored fingerprint -> bootstrap: write fingerprint, succeed
     b. Stored fingerprint == computed -> succeed
     c. Stored fingerprint != computed -> return TidalError::SchemaMismatch
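
The three-way decision in step 3 is a pure function of the stored and computed values, which makes it easy to test in isolation. A minimal sketch with illustrative type names (FingerprintCheck and SchemaMismatch here are not tidalDB's actual types):

```rust
pub type Fingerprint = [u8; 32];

/// Successful outcomes of the open-time check.
#[derive(Debug, PartialEq, Eq)]
pub enum FingerprintCheck {
    /// No stored fingerprint: the caller should persist `computed` and continue.
    Bootstrap,
    /// Stored fingerprint equals the computed one.
    Match,
}

/// Returned when the stored fingerprint differs from the computed one.
#[derive(Debug, PartialEq, Eq)]
pub struct SchemaMismatch {
    pub stored: Fingerprint,
    pub computed: Fingerprint,
}

/// Cases 3a, 3b and 3c from the proposed behavior.
pub fn check_fingerprint(
    stored: Option<Fingerprint>,
    computed: Fingerprint,
) -> Result<FingerprintCheck, SchemaMismatch> {
    match stored {
        None => Ok(FingerprintCheck::Bootstrap),
        Some(s) if s == computed => Ok(FingerprintCheck::Match),
        Some(s) => Err(SchemaMismatch { stored: s, computed }),
    }
}
```

Keeping the decision separate from the fjall read and write means the bootstrap rule can be unit-tested without opening a database.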

Analysis of Bootstrap Logic

Case 1: Brand-new database (first open ever)

No stored fingerprint. Write it. Succeed. This is correct -- there is no prior data to conflict with.

Case 2: Existing database, first open after feature addition

This is the migration risk. The database has data written with schema S1. The application opens with schema S2 (which may or may not equal S1). No stored fingerprint exists.

If S1 == S2 (common case): Bootstrap writes the fingerprint. All subsequent opens validate correctly. No problem.

If S1 != S2 (the dangerous case the feature is supposed to prevent): Bootstrap writes the WRONG fingerprint (S2's, not S1's). The data was written with S1, but the fingerprint now says S2. The database is silently corrupted -- not by the fingerprint feature, but by the schema mismatch that already existed before the feature was added.

Verdict: The bootstrap case cannot distinguish "first open with this schema" from "schema was changed." This is inherent -- without a stored fingerprint, there is no ground truth to compare against. The bootstrap behavior is correct and safe because:

  1. If the schema matches, writing the fingerprint is harmless and enables future protection.
  2. If the schema does not match, the data was already corrupted before this feature existed. The fingerprint does not make it worse -- it just fails to detect the pre-existing problem.
  3. The alternative (refusing to open when no fingerprint exists) would break every existing database on the first upgrade. That is worse.

Case 3: Subsequent opens with matching schema

Stored fingerprint matches computed fingerprint. Succeed. This is the steady-state happy path.

Case 4: Subsequent opens with mismatched schema

Stored fingerprint does not match. Return TidalError::SchemaMismatch. This is the feature's purpose -- preventing silent corruption.

Edge Cases

Edge Case 1: Schema additions (adding new signal types)

Adding a new signal type (e.g., "share") changes the fingerprint. The open would fail with SchemaMismatch. This is correct behavior -- the application must decide whether the existing data is compatible with the new schema. Options:

  • Force open: A builder method like .allow_schema_migration() could skip the check and overwrite the fingerprint. The application takes responsibility.
  • Migration tool: A CLI command that validates compatibility and updates the fingerprint.

For tidalDB's workload, adding a signal type is backward-compatible (existing data is unaffected; the new signal starts empty). But removing or changing a signal type is NOT backward-compatible (existing scores become meaningless). The fingerprint feature intentionally blocks both; the migration tool should validate the specific change.
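
The distinction a migration tool would need to draw can be sketched as a comparison of the old and new signal sets. This is an assumption-laden illustration: each signal is reduced to a name mapped to its half-life in nanoseconds, whereas the real SignalTypeDef carries more fields, and classify_change is a hypothetical function.

```rust
use std::collections::BTreeMap;

/// Result of comparing an old schema's signals against a new one.
#[derive(Debug, PartialEq, Eq)]
pub enum SchemaChange {
    Identical,
    /// Only new signals were added; existing data is unaffected.
    AdditiveOnly,
    /// A signal was removed or its parameters changed.
    Breaking,
}

/// Maps are signal name -> half-life in nanoseconds (simplified stand-in).
pub fn classify_change(
    old: &BTreeMap<String, u128>,
    new: &BTreeMap<String, u128>,
) -> SchemaChange {
    // Every old signal must still exist with identical parameters.
    for (name, old_half_life) in old {
        match new.get(name) {
            Some(h) if h == old_half_life => {}
            _ => return SchemaChange::Breaking,
        }
    }
    // All old signals survived unchanged, so any size difference is additions.
    if new.len() > old.len() {
        SchemaChange::AdditiveOnly
    } else {
        SchemaChange::Identical
    }
}
```

Under this rule, adding "share" to an existing schema is AdditiveOnly, while dropping a signal or changing a half-life is Breaking.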

Edge Case 2: HashMap iteration order

Schema.signals is a HashMap<String, SignalTypeDef>. HashMap iteration order is non-deterministic. The fingerprint hash MUST sort signals by name before hashing, or the same schema will produce different fingerprints across runs.

Implementation requirement: Sort signal names alphabetically, then hash (name, decay_model, windows, velocity_enabled) tuples in order.

Edge Case 3: Floating-point decay parameters

Decay lambda is computed from half_life as ln(2) / half_life_secs. Floating-point equality is not reflexive for NaN, but lambda is always a valid positive f64. However, hashing f64 directly is problematic (f64 does not implement Hash).

Solution: Hash the half_life duration in nanoseconds (a u128), not the computed lambda. This avoids floating-point comparison issues entirely and hashes the user's declared intent, not a derived value.
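
Edge Cases 2 and 3 together suggest the following shape for the hashing loop: sort the names, then feed integer encodings into the hash. This is a dependency-free sketch under stated assumptions. SignalDef is a trimmed-down stand-in for SignalTypeDef, and 64-bit FNV-1a is used only so the example needs no external crate; the proposal is a 32-byte BLAKE3 or SHA-256 digest.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical, trimmed-down stand-in for tidalDB's SignalTypeDef.
pub struct SignalDef {
    pub half_life: Duration,
    pub velocity_enabled: bool,
}

const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

/// FNV-1a over `bytes`, continuing from `state`.
fn fnv1a(state: u64, bytes: &[u8]) -> u64 {
    bytes
        .iter()
        .fold(state, |h, b| (h ^ u64::from(*b)).wrapping_mul(FNV_PRIME))
}

/// Order-independent fingerprint of the signal definitions.
pub fn fingerprint(signals: &HashMap<String, SignalDef>) -> u64 {
    // Sort names so HashMap iteration order cannot affect the result.
    let mut names: Vec<&String> = signals.keys().collect();
    names.sort();

    let mut h = FNV_OFFSET;
    for name in names {
        let def = &signals[name];
        // Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        h = fnv1a(h, &(name.len() as u64).to_le_bytes());
        h = fnv1a(h, name.as_bytes());
        // Hash the declared half-life as integer nanoseconds, never the f64 lambda.
        h = fnv1a(h, &def.half_life.as_nanos().to_le_bytes());
        h = fnv1a(h, &[u8::from(def.velocity_enabled)]);
    }
    h
}
```

Fixed little-endian encodings keep the fingerprint stable across platforms. std's DefaultHasher would not be a good substitute here, since its algorithm is not guaranteed to stay the same between Rust releases.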

Edge Case 4: Ephemeral mode

Ephemeral databases have no durable storage. Fingerprint persistence is meaningless. Skip the check entirely for StorageMode::Ephemeral.

Edge Case 5: Concurrent opens

If two processes open the same data directory simultaneously (which tidalDB does not currently support, but fjall does not prevent), they could race on the fingerprint write. This is not a new problem -- concurrent opens without coordination are already unsafe.

Edge Case 6: Schema fingerprint storage location

The fingerprint should be stored at a well-known key in the items keyspace, using a dedicated tag or a sentinel entity ID. Options:

  • Option A: Sentinel entity ID 0 with Tag::Meta -- [0x00..00][0x00][0x03]["schema_fingerprint"]

    • Pro: Uses existing key encoding; entity ID 0 is reserved (real entities start at 1+)
    • Con: Occupies the entity ID 0 namespace
  • Option B: New Tag::SchemaFingerprint = 0x0D -- [0x00..00][0x00][0x0D]

    • Pro: Clean separation; easy to locate via prefix scan
    • Con: New tag value (minor, well-understood extension)

Recommendation: Option B. A dedicated tag is cleaner and avoids ambiguity about entity ID 0.

Production Precedent

| System | Schema Versioning Approach | Bootstrap Behavior |
|---|---|---|
| DuckDB | Storage format version in file header | Refuses to open if version mismatch; provides EXPORT DATABASE migration path |
| SQLite | user_version pragma (application-managed) | Application sets version; no built-in schema hash |
| RocksDB | No schema concept (KV store) | N/A |
| MongoDB | schemaVersion field in documents | Application-level; "Schema Versioning Pattern" adds version per document |
| Flyway/Liquibase | Migration history table | First run creates history table (bootstrap); subsequent runs compare |

The "first run writes, subsequent runs compare" pattern is standard across migration frameworks. The bootstrap-then-validate approach is well-established.

Recommendation

Implement the bootstrap logic as proposed. It is safe and follows production precedent.

Implementation checklist:

  1. Add Tag::SchemaFingerprint = 0x0D to /tidal/src/storage/keys.rs
  2. Implement Schema::fingerprint() -> [u8; 32] that:
    • Sorts signal names alphabetically
    • For each signal: hashes (name, decay_type, half_life_nanos, windows_sorted, velocity_enabled)
    • Uses BLAKE3 or SHA-256 (BLAKE3 preferred for speed; already in the Rust ecosystem)
  3. In open_with_schema() (persistent mode only):
    • Read key [0x00..00][0x00][0x0D] from items keyspace
    • If absent: write fingerprint, log "schema fingerprint initialized", succeed
    • If present and matches: succeed
    • If present and mismatches: return TidalError::SchemaMismatch { stored: hex, computed: hex }
  4. Add SchemaMismatch variant to TidalError
  5. Skip entirely for StorageMode::Ephemeral
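
Items 3 to 5 of the checklist can be wired together roughly as follows. This is a sketch rather than the real open path: an in-memory HashMap stands in for the fjall items keyspace, FINGERPRINT_KEY is a placeholder for the [0x00..00][0x00][0x0D] key, and the enum definitions are simplified versions of tidalDB's types.

```rust
use std::collections::HashMap;
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageMode {
    Ephemeral,
    Persistent,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TidalError {
    SchemaMismatch { stored: String, computed: String },
}

impl fmt::Display for TidalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TidalError::SchemaMismatch { stored, computed } => {
                write!(f, "schema mismatch: stored {stored}, computed {computed}")
            }
        }
    }
}

/// Placeholder key; the proposed on-disk key is [0x00..00][0x00][0x0D].
const FINGERPRINT_KEY: &[u8] = b"schema_fingerprint";

/// Lowercase hex encoding for error messages.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Bootstraps the fingerprint on first open and validates it afterwards.
pub fn verify_or_bootstrap(
    mode: StorageMode,
    store: &mut HashMap<Vec<u8>, Vec<u8>>,
    computed: &[u8; 32],
) -> Result<(), TidalError> {
    // Ephemeral databases have nothing durable to validate against.
    if mode == StorageMode::Ephemeral {
        return Ok(());
    }
    match store.get(FINGERPRINT_KEY).cloned() {
        None => {
            // A real implementation would also log "schema fingerprint initialized".
            store.insert(FINGERPRINT_KEY.to_vec(), computed.to_vec());
            Ok(())
        }
        Some(stored) if stored.as_slice() == computed.as_slice() => Ok(()),
        Some(stored) => Err(TidalError::SchemaMismatch {
            stored: to_hex(&stored),
            computed: to_hex(computed),
        }),
    }
}
```

Carrying both fingerprints as hex in the error lets an operator see which schema the database was created with, without opening it.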

Open Questions

  1. What fields to include in the fingerprint? Signal names + decay params are critical because they affect score interpretation. Should embedding slot dimensions and text field definitions also be included? Adding a new text field is backward-compatible, but changing dimensions is not. Recommendation: include signal fields + embedding dimensions. Exclude text fields and policies (additive changes to these are safe).

  2. Force-open escape hatch: Should TidalDbBuilder expose .allow_schema_migration() from day one? This is useful for development but dangerous in production. Recommendation: defer it until the first user needs it; when it is added, log a WARN-level message every time it is used.

  3. Migration tool: A future tidalctl schema migrate command should compare old and new schemas, validate that the change is backward-compatible (additions only, no decay parameter changes), and update the fingerprint. This is post-MVP.


Summary

fjall backup: Use quiesce + file copy

fjall 3.0.2 has no backup API (issue #52 open and blocked). The safe procedure is: pause writes, flush all engines (rotate_memtable_and_wait + persist(SyncAll) + Tantivy flush + WAL checkpoint), copy the entire data directory, resume writes. Estimated write pause: 300ms + file copy time. When fjall ships its backup API, switch to it for hard-link support.

Schema fingerprint: Safe to implement with bootstrap logic

The "no fingerprint -> write and succeed" bootstrap is correct and follows production precedent (the first-run bootstrap used by Flyway and Liquibase). It cannot detect schema mismatches that predate the feature, but this is inherent -- the feature prevents future mismatches, not past ones. Key implementation details: sort signals before hashing, hash half_life nanos (not lambda), use a dedicated Tag::SchemaFingerprint, skip for ephemeral mode.

Sources