tidaldb/docs/research/enterprise_readiness_risks.md

# Research: Enterprise Readiness Risks -- fjall Backup API and Schema Fingerprinting

## Risk 1: fjall v3 Backup/Snapshot API

### Question

Does fjall 3.x expose a safe backup/snapshot API that tidalDB can use to implement `TidalDb::create_backup(dest: &Path) -> Result<BackupInfo>` while the database is live?

### TidalDB Context

tidalDB uses fjall 3.0.2 (`fjall = "3"` in Cargo.toml) as its durable storage engine. The `FjallStorage` struct (at `/tidal/src/storage/fjall.rs`) owns a single `fjall::Database` with three keyspaces: items, users, creators. A backup must also capture:

- **WAL segments** (`{data_dir}/wal/`)
- **Tantivy text indexes** (`{data_dir}/text_index/`, `{data_dir}/creator_text_index/`)
- **USearch vector indexes** (stored as `.idx` files via `VectorIndex::save()`)
- **Signal ledger checkpoints** (serialized into fjall under `Tag::Sig = 0x02`)
- **Co-engagement, cohort, collection, session data** (all in fjall under their respective tags)

The backup must be consistent: a restored backup should produce the same query results as the source at the point in time the backup was taken.

### Survey of fjall 3.0.2 API Surface

**`fjall::Database` public methods (complete list from docs.rs):**

| Method | Purpose | Backup Relevance |
|--------|---------|------------------|
| `snapshot()` | Cross-keyspace MVCC read snapshot | Read consistency only; does NOT produce files |
| `persist(PersistMode)` | Flushes active journal to disk | Required pre-backup for durability |
| `batch()` | Atomic cross-keyspace write batch | Not relevant |
| `keyspace(name, opts)` | Open/create a keyspace | Not relevant |
| `disk_space()` | Total bytes on disk | Informational only |
| `journal_count()` | Number of journal files | Informational only |
| `list_keyspace_names()` | Enumerate keyspaces | Useful for backup enumeration |

**`fjall::Keyspace` relevant methods:**

| Method | Purpose | Backup Relevance |
|--------|---------|------------------|
| `path()` | Returns the LSM-tree's filesystem path | Needed to locate files to copy |
| `rotate_memtable_and_wait()` | Flushes memtable to SST, blocks until done | Critical pre-backup step |
| `disk_space()` | Keyspace bytes on disk | Informational |

**`fjall::Snapshot`:**
- Implements the `Readable` trait (get, iter, range, prefix, etc.)
- This is a logical MVCC snapshot for consistent reads -- it does NOT produce a physical file-level snapshot
- Cannot be used for file-level backup

### Has snapshot/backup API: NO

fjall 3.0.2 does **not** expose a `backup_to()`, `checkpoint()`, or `export()` method. This is tracked as [GitHub issue #52: "Backup using Checkpointing"](https://github.com/fjall-rs/fjall/issues/52), which remains **open and blocked** as of December 2024.

The planned API (not yet implemented):
```rust
Database::backup_to<P: AsRef<Path>>(&self, path: P) -> crate::Result<()>
TxDatabase::backup_to<P: AsRef<Path>>(&self, path: P) -> crate::Result<()>
```

The blocker is [issue #70](https://github.com/fjall-rs/fjall/issues/70) -- an "unopened keyspace locking" mechanism needed for safe online backup.

### Comparison with Other Embedded Databases

| Database | Backup API | Online? | Hard Links? | Notes |
|----------|-----------|---------|-------------|-------|
| **RocksDB** | `Checkpoint::CreateCheckpoint()` | Yes | Yes (same FS) | Hard-links SSTs, copies MANIFEST. Consistent across column families. Production-proven at scale. |
| **SQLite** | `sqlite3_backup_init/step/finish` | Yes | No (page copy) | Incremental page-by-page copy while source remains writable. |
| **LMDB** | `mdb_env_copy2()` | Yes | No (page copy) | Copy-on-write B-tree makes consistent snapshots trivial. |
| **DuckDB** | `EXPORT DATABASE` / `COPY` | Semi | No | SQL-level export; not a byte-level checkpoint. |
| **fjall 3.0.2** | None | N/A | N/A | Issue #52 open. Maintainer recommends `cp -R` offline. |

### Safe Backup Procedure for fjall 3.0.2

Given the absence of a backup API, there are three candidate approaches, of which only the first is recommended today:

#### Approach A: Quiesce + File Copy (Recommended)

This is the approach the fjall maintainer explicitly recommends for offline backup. Adapted for tidalDB's multi-engine architecture:

```
1. Pause writes (set an AtomicBool flag that makes signal/entity writes return Err(Backpressure))
2. Flush all in-flight data:
   a. Flush text syncers (item + creator) via flush_tx channel -- blocks until Tantivy commits
   b. Checkpoint signal ledger + cohort ledger + co-engagement to fjall
   c. For each keyspace: call rotate_memtable_and_wait() to flush memtables to SSTs
   d. Call db.persist(PersistMode::SyncAll) to fsync all journal data
   e. Write WAL checkpoint marker
3. Copy the entire data_dir recursively to dest:
   a. {data_dir}/items/    -> {dest}/items/     (fjall SSTs + journals)
   b. {data_dir}/users/    -> {dest}/users/     (fjall SSTs + journals)
   c. {data_dir}/creators/ -> {dest}/creators/  (fjall SSTs + journals)
   d. {data_dir}/wal/      -> {dest}/wal/       (tidalDB WAL segments)
   e. {data_dir}/text_index/         -> {dest}/text_index/
   f. {data_dir}/creator_text_index/ -> {dest}/creator_text_index/
   g. {data_dir}/cache/    -> {dest}/cache/     (if present)
4. Resume writes (clear the AtomicBool flag)
5. Return BackupInfo { path, size_bytes, timestamp, wal_sequence }
```
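Steps 1 and 4 can be sketched as a small gate with an RAII guard, so that writes resume even if a flush or copy step returns early with an error. This is a minimal sketch under stated assumptions: `WriteGate`, `PauseGuard`, and the `Backpressure` unit struct are hypothetical names, not existing tidalDB types.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Error handed back to writers while a backup holds the gate (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct Backpressure;

/// Gate consulted by every write path before it touches storage.
#[derive(Default)]
pub struct WriteGate {
    paused: AtomicBool,
}

/// RAII guard: writes resume when this is dropped, including on early return.
pub struct PauseGuard<'a> {
    gate: &'a WriteGate,
}

impl WriteGate {
    /// Called at the top of signal/entity write paths.
    pub fn check(&self) -> Result<(), Backpressure> {
        if self.paused.load(Ordering::Acquire) {
            Err(Backpressure)
        } else {
            Ok(())
        }
    }

    /// Pauses writes. Returns `None` if another backup already holds the gate.
    pub fn pause(&self) -> Option<PauseGuard<'_>> {
        self.paused
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| PauseGuard { gate: self })
    }
}

impl Drop for PauseGuard<'_> {
    fn drop(&mut self) {
        // Step 4: clear the flag so writers stop seeing Backpressure.
        self.gate.paused.store(false, Ordering::Release);
    }
}
```

A flag alone does not wait for writers that passed `check()` just before the pause; draining those in-flight writes is outside this sketch and would need a separate mechanism.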

**Write pause duration estimate:** The flush operations (steps 2a-2d) and the file copy (step 3) are I/O-bound. For a database with 10M entities and active signal writes:
- Text syncer flush: ~100ms (channel round-trip + Tantivy commit)
- Signal checkpoint: ~50ms (serialize DashMap entries to fjall)
- rotate_memtable_and_wait per keyspace: ~50ms each (3 keyspaces = ~150ms)
- persist(SyncAll): ~10ms (fsync)
- File copy: proportional to data size; 1GB at 500MB/s = ~2s

**Total estimated write pause: roughly 300ms of flush work (100 + 50 + 150 + 10 = 310ms) plus copy time.** For a 1GB database copied at 500MB/s, that is roughly 2-3 seconds.
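The arithmetic behind that estimate can be written down as a small function, which makes it easy to plug in other database sizes or disk speeds. The constants are the rough per-step figures from the list above, not measurements, and the function name and parameters are illustrative only.

```rust
/// Flush-phase estimates from the list above, in milliseconds (not measured values).
const TEXT_SYNCER_FLUSH_MS: u64 = 100;
const SIGNAL_CHECKPOINT_MS: u64 = 50;
const ROTATE_MEMTABLE_MS_PER_KEYSPACE: u64 = 50;
const PERSIST_SYNC_ALL_MS: u64 = 10;

/// Estimated write pause in milliseconds for a quiesce + full file copy backup.
pub fn estimated_pause_ms(keyspaces: u64, data_bytes: u64, copy_bytes_per_sec: u64) -> u64 {
    let flush_ms = TEXT_SYNCER_FLUSH_MS
        + SIGNAL_CHECKPOINT_MS
        + ROTATE_MEMTABLE_MS_PER_KEYSPACE * keyspaces
        + PERSIST_SYNC_ALL_MS;
    // Copy time scales linearly with on-disk size; guard against a zero throughput.
    let copy_ms = data_bytes.saturating_mul(1000) / copy_bytes_per_sec.max(1);
    flush_ms + copy_ms
}
```

With three keyspaces, 1GB taken as 10^9 bytes, and 500MB/s copy throughput, this gives 310ms + 2000ms = 2310ms.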

#### Approach B: Snapshot-Consistent Logical Export

Use `Database::snapshot()` for a consistent logical view, then iterate and write to a new fjall database:

```
1. Take snapshot = db.snapshot()
2. For each keyspace, iterate snapshot and write to a new Database at dest
3. Separately copy WAL, Tantivy indexes, vector indexes
```

**Problems with this approach:**
- Does not capture WAL/Tantivy/vector files consistently with the fjall snapshot
- Much slower than file copy (must deserialize/reserialize every KV pair)
- No way to snapshot Tantivy or USearch indexes concurrently with the fjall snapshot
- The logical export would need to reconstruct the exact on-disk format fjall expects

**Verdict: Approach B is not viable.** The cross-engine consistency problem (fjall + Tantivy + USearch) makes logical export impractical.

#### Approach C: Hard-Link Optimization (Same Filesystem)

A refinement of Approach A for same-filesystem backups:

```
1. Quiesce + flush (same as Approach A steps 1-2)
2. For fjall SST files: hard-link instead of copy (SSTs are immutable after flush)
3. For journal files, WAL, Tantivy, USearch: copy (these are mutable)
4. Resume writes
```

This mirrors RocksDB's Checkpoint approach. However, it requires:
- Enumerating fjall's internal file structure (SSTs vs journals vs metadata)
- Understanding which files are immutable after `rotate_memtable_and_wait()`

Both are fragile without fjall's cooperation, since the internal layout may change between versions.

**Verdict: Too fragile without fjall API support.** Wait for issue #52 resolution, then adopt hard-link optimization.
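For reference, the per-file step of Approach C might look like the sketch below once something authoritative can say which files are immutable. `link_or_copy` and the `immutable` flag are hypothetical; deciding that flag correctly is exactly the part that needs fjall's support. Only `std::fs::hard_link` and `std::fs::copy` are used.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// How a file ended up at the destination.
#[derive(Debug, PartialEq, Eq)]
pub enum Placed {
    HardLinked,
    Copied,
}

/// Hard-links `src` to `dst` when the caller asserts the file is immutable,
/// falling back to a byte copy if linking fails (for example, when `dst` is
/// on a different filesystem). Mutable files are always copied.
pub fn link_or_copy(src: &Path, dst: &Path, immutable: bool) -> io::Result<Placed> {
    if immutable && fs::hard_link(src, dst).is_ok() {
        return Ok(Placed::HardLinked);
    }
    fs::copy(src, dst)?;
    Ok(Placed::Copied)
}
```

The fallback keeps the backup correct on filesystems without hard-link support, at the cost of losing the speedup.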

### Recommendation for `create_backup()` Implementation

**Use Approach A: Quiesce + File Copy.**

```rust
pub fn create_backup(&self, dest: &Path) -> Result<BackupInfo> {
    // 1. Pause writes via AtomicBool
    // 2. Flush all engines (text, signal, fjall, WAL)
    // 3. fs_extra::dir::copy(data_dir, dest, &CopyOptions::new())
    // 4. Resume writes
    // 5. Return metadata
    todo!("skeleton only; the numbered steps above are not implemented here")
}
```

Key implementation notes:
- `rotate_memtable_and_wait()` is public but annotated "NOTE: Used in tests" in fjall source. It is the correct pre-backup call -- it ensures all in-memory data is flushed to SSTs. The annotation reflects that most users do not need to call it directly, not that it is unsafe.
- `persist(PersistMode::SyncAll)` must follow to ensure journal data reaches disk.
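- The write pause is dominated by file-copy I/O, so it scales with on-disk size and disk throughput. No per-record serialization is involved, which is what keeps it cheaper than a logical export.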
- Future: when fjall ships issue #52 (`Database::backup_to()`), replace the file copy with the native API for hard-link support and reduced pause duration.
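If pulling in `fs_extra` for step 3 is undesirable, the copy can be done with the standard library alone. The following is a minimal sketch, not tidalDB code: `copy_dir_recursive` is a hypothetical helper, and it skips symlinks and other special files.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copies `src` into `dst` and returns the number of file bytes copied.
/// Symlinks and other special files are skipped in this sketch.
pub fn copy_dir_recursive(src: &Path, dst: &Path) -> io::Result<u64> {
    fs::create_dir_all(dst)?;
    let mut bytes = 0u64;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let file_type = entry.file_type()?;
        let target = dst.join(entry.file_name());
        if file_type.is_dir() {
            // Descend into subdirectories such as wal/ or text_index/.
            bytes += copy_dir_recursive(&entry.path(), &target)?;
        } else if file_type.is_file() {
            // fs::copy returns the number of bytes written.
            bytes += fs::copy(entry.path(), &target)?;
        }
    }
    Ok(bytes)
}
```

The returned byte count could feed the `size_bytes` field of `BackupInfo`. A durable backup would likely also need to fsync the copied files and directories before reporting success, which this sketch does not do.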

### Open Questions

1. **rotate_memtable_and_wait() stability:** This method is public in fjall 3.0.2 but undocumented on docs.rs. It appears in the keyspace source as `pub fn rotate_memtable_and_wait`. tidalDB already calls it in `FjallBackend::flush()`. Risk: it could be renamed or removed in a minor fjall release. Mitigation: pin fjall version; the method is already in tidalDB's dependency surface.

2. **Tantivy backup safety:** Tantivy indexes are append-only segment files plus a `meta.json`. Copying after a `commit()` (via flush_tx) should be safe, but this needs a test that verifies a copied Tantivy index opens correctly.

3. **USearch backup safety:** USearch `.idx` files are written atomically by `VectorIndex::save()`. If a backup races with a save, the file could be truncated. The quiesce step prevents this, but we should add a file size/checksum validation on the backup side.

4. **Incremental backup:** File copy is O(data_size) every time. For large databases, incremental backup (only copy changed SSTs) would reduce pause duration. This requires tracking file checksums or modification times. Defer to post-MVP.
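The size/checksum validation mentioned in question 3 could be as simple as the sketch below. It uses 64-bit FNV-1a only because it needs no dependencies; that choice is an assumption for illustration, and a production check would more likely use a stronger hash. `file_checksum` and `backup_matches` are hypothetical names.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Returns `(length_in_bytes, fnv1a_64)` for a file, streamed in 64 KiB chunks.
pub fn file_checksum(path: &Path) -> io::Result<(u64, u64)> {
    let mut file = File::open(path)?;
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    let mut len = 0u64;
    let mut buf = vec![0u8; 64 * 1024];
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break;
        }
        for &b in &buf[..n] {
            hash ^= u64::from(b);
            hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
        }
        len += n as u64;
    }
    Ok((len, hash))
}

/// True when the backup copy has the same length and checksum as the source.
pub fn backup_matches(src: &Path, copy: &Path) -> io::Result<bool> {
    Ok(file_checksum(src)? == file_checksum(copy)?)
}
```

Running this while writes are still paused compares the copy against a source that cannot change underneath it; a truncated copy fails the length comparison immediately.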

---

## Risk 2: Schema Fingerprint Migration Risk

### Question

Can tidalDB safely add schema fingerprint persistence at `open()` time without breaking existing databases that were opened before the feature existed?

### TidalDB Context

The `Schema` struct (`/tidal/src/schema/validation/mod.rs`) contains:
- `signals: HashMap<String, SignalTypeDef>` -- signal names, decay params, windows, velocity config
- `embedding_slots: Vec<EmbeddingSlotDef>` -- vector dimension config
- `text_fields: Vec<TextFieldDef>` -- BM25 field config
- `creator_text_fields: Vec<TextFieldDef>` -- creator search fields
- `policies: HashMap<String, AgentPolicy>` -- session rate limiting

The fingerprint would hash signal names + decay parameters (the fields that affect storage layout and signal score interpretation). If an application opens a database with a different schema than was used to create it, signal scores become meaningless (wrong decay rates applied to stored data) and WAL replay produces incorrect results.

Currently there is no guard against this. `open_with_schema()` at `/tidal/src/db/open.rs` accepts any schema and proceeds.

### Proposed Behavior

```
open() time:
  1. Compute fingerprint = hash(sorted signal names + decay params)
  2. Read stored fingerprint from fjall (e.g., well-known key in items keyspace)
  3. Match:
     a. No stored fingerprint -> bootstrap: write fingerprint, succeed
     b. Stored fingerprint == computed -> succeed
     c. Stored fingerprint != computed -> return TidalError::SchemaMismatch
```

### Analysis of Bootstrap Logic

#### Case 1: Brand-new database (first open ever)

No stored fingerprint. Write it. Succeed. This is correct -- there is no prior data to conflict with.

#### Case 2: Existing database, first open after feature addition

This is the migration risk. The database has data written with schema S1. The application opens with schema S2 (which may or may not equal S1). No stored fingerprint exists.

**If S1 == S2 (common case):** Bootstrap writes the fingerprint. All subsequent opens validate correctly. No problem.

**If S1 != S2 (the dangerous case the feature is supposed to prevent):** Bootstrap writes the WRONG fingerprint (S2's, not S1's). The data was written with S1, but the fingerprint now says S2. The database is silently corrupted -- not by the fingerprint feature, but by the schema mismatch that already existed before the feature was added.

**Verdict:** The bootstrap case cannot distinguish "first open with this schema" from "schema was changed." This is inherent -- without a stored fingerprint, there is no ground truth to compare against. The bootstrap behavior is **correct and safe** because:

1. If the schema matches, writing the fingerprint is harmless and enables future protection.
2. If the schema does not match, the data was already corrupted before this feature existed. The fingerprint does not make it worse -- it just fails to detect the pre-existing problem.
3. The alternative (refusing to open when no fingerprint exists) would break every existing database on the first upgrade. That is worse.

#### Case 3: Subsequent opens with matching schema

Stored fingerprint matches computed fingerprint. Succeed. This is the steady-state happy path.

#### Case 4: Subsequent opens with mismatched schema

Stored fingerprint does not match. Return `TidalError::SchemaMismatch`. This is the feature's purpose -- preventing silent corruption.

### Edge Cases

#### Edge Case 1: Schema additions (adding new signal types)

Adding a new signal type (e.g., `"share"`) changes the fingerprint. The open would fail with `SchemaMismatch`. This is **correct behavior** -- the application must decide whether the existing data is compatible with the new schema. Options:

- **Force open:** A builder method like `.allow_schema_migration()` could skip the check and overwrite the fingerprint. The application takes responsibility.
- **Migration tool:** A CLI command that validates compatibility and updates the fingerprint.

For tidalDB's workload, adding a signal type is backward-compatible (existing data is unaffected; the new signal starts empty). But removing or changing a signal type is NOT backward-compatible (existing scores become meaningless). The fingerprint feature intentionally blocks both; the migration tool should validate the specific change.
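The distinction between additive and breaking changes is what a migration tool would have to check. A rough sketch follows, using a deliberately reduced model in which a signal is just a name and a half-life; `Signals`, `SchemaChange`, and `classify_change` are hypothetical and ignore windows and velocity config.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Hypothetical, reduced view of the schema: signal name -> half-life.
pub type Signals = BTreeMap<String, Duration>;

#[derive(Debug, PartialEq, Eq)]
pub enum SchemaChange {
    Unchanged,
    /// Only new signals were added; existing data keeps its meaning.
    Additive(Vec<String>),
    /// Signals were removed or had their decay changed; stored scores are unsafe.
    Breaking(Vec<String>),
}

pub fn classify_change(old: &Signals, new: &Signals) -> SchemaChange {
    // Any old signal that is missing or has a different half-life is breaking.
    let breaking: Vec<String> = old
        .iter()
        .filter(|(name, half_life)| new.get(*name) != Some(*half_life))
        .map(|(name, _)| name.clone())
        .collect();
    if !breaking.is_empty() {
        return SchemaChange::Breaking(breaking);
    }
    // Otherwise, whatever is new-only is a pure addition.
    let added: Vec<String> = new
        .keys()
        .filter(|name| !old.contains_key(*name))
        .cloned()
        .collect();
    if added.is_empty() {
        SchemaChange::Unchanged
    } else {
        SchemaChange::Additive(added)
    }
}
```

Only the `Additive` outcome would be a candidate for an automatic fingerprint update; `Breaking` should still surface as `SchemaMismatch`.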

#### Edge Case 2: HashMap iteration order

`Schema.signals` is a `HashMap<String, SignalTypeDef>`. HashMap iteration order is non-deterministic. The fingerprint hash MUST sort signals by name before hashing, or the same schema will produce different fingerprints across runs.

**Implementation requirement:** Sort signal names alphabetically, then hash `(name, decay_model, windows, velocity_enabled)` tuples in order.

#### Edge Case 3: Floating-point decay parameters

Decay lambda is computed from `half_life` as `ln(2) / half_life_secs`. NaN is not a concern here, because lambda is always a valid positive f64, but hashing `f64` directly is still problematic: `f64` does not implement `Hash`.

**Solution:** Hash the `half_life` duration in nanoseconds (a `u128`), not the computed lambda. This avoids floating-point comparison issues entirely and hashes the user's declared intent, not a derived value.
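Edge cases 2 and 3 together suggest a fingerprint routine along these lines. This is a sketch under stated assumptions: `SignalDef` is a reduced, hypothetical stand-in for `SignalTypeDef`, and FNV-1a (64-bit) is used only as a dependency-free placeholder for the BLAKE3 or SHA-256 hash the design calls for.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical, reduced stand-in for `SignalTypeDef`.
pub struct SignalDef {
    pub half_life: Duration,
    pub velocity_enabled: bool,
}

/// One FNV-1a pass over `bytes`; a placeholder for a cryptographic hash.
fn fnv1a(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

pub fn fingerprint(signals: &HashMap<String, SignalDef>) -> u64 {
    // Sort by name so HashMap iteration order cannot affect the result.
    let mut names: Vec<&String> = signals.keys().collect();
    names.sort();

    let mut hash = 0xcbf2_9ce4_8422_2325u64;
    for name in names {
        let def = &signals[name];
        // Length-prefix the name so adjacent fields cannot run together.
        hash = fnv1a(hash, &(name.len() as u64).to_le_bytes());
        hash = fnv1a(hash, name.as_bytes());
        // Hash the declared half-life as integer nanoseconds, never the f64 lambda.
        hash = fnv1a(hash, &def.half_life.as_nanos().to_le_bytes());
        hash = fnv1a(hash, &[u8::from(def.velocity_enabled)]);
    }
    hash
}
```

Two maps built with the same signals in different insertion order produce the same value, while changing any half-life changes it. The fixed little-endian encoding keeps the result stable across platforms.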

#### Edge Case 4: Ephemeral mode

Ephemeral databases have no durable storage. Fingerprint persistence is meaningless. Skip the check entirely for `StorageMode::Ephemeral`.

#### Edge Case 5: Concurrent opens

If two processes open the same data directory simultaneously (which tidalDB does not currently support, but fjall does not prevent), they could race on the fingerprint write. This is not a new problem -- concurrent opens without coordination are already unsafe.

#### Edge Case 6: Schema fingerprint storage location

The fingerprint should be stored at a well-known key in the items keyspace, using a dedicated tag or a sentinel entity ID. Options:

- **Option A: Sentinel entity ID 0 with Tag::Meta** -- `[0x00..00][0x00][0x03]["schema_fingerprint"]`
  - Pro: Uses existing key encoding; entity ID 0 is reserved (real entities start at 1+)
  - Con: Occupies the entity ID 0 namespace

- **Option B: New Tag::SchemaFingerprint = 0x0D** -- `[0x00..00][0x00][0x0D]`
  - Pro: Clean separation; easy to locate via prefix scan
  - Con: New tag value (minor, well-understood extension)

**Recommendation:** Option B. A dedicated tag is cleaner and avoids ambiguity about entity ID 0.

### Production Precedent

| System | Schema Versioning Approach | Bootstrap Behavior |
|--------|---------------------------|-------------------|
| **DuckDB** | Storage format version in file header | Refuses to open if version mismatch; provides `EXPORT DATABASE` migration path |
| **SQLite** | `user_version` pragma (application-managed) | Application sets version; no built-in schema hash |
| **RocksDB** | No schema concept (KV store) | N/A |
| **MongoDB** | `schemaVersion` field in documents | Application-level; "Schema Versioning Pattern" adds version per document |
| **Flyway/Liquibase** | Migration history table | First run creates history table (bootstrap); subsequent runs compare |

The "first run writes, subsequent runs compare" pattern is standard across migration frameworks. The bootstrap-then-validate approach is well-established.

### Recommendation

**Implement the bootstrap logic as proposed.** It is safe and follows production precedent.

Implementation checklist:

1. Add `Tag::SchemaFingerprint = 0x0D` to `/tidal/src/storage/keys.rs`
2. Implement `Schema::fingerprint() -> [u8; 32]` that:
   - Sorts signal names alphabetically
   - For each signal: hashes `(name, decay_type, half_life_nanos, windows_sorted, velocity_enabled)`
   - Uses BLAKE3 or SHA-256 (BLAKE3 preferred for speed; available as the `blake3` crate)
3. In `open_with_schema()` (persistent mode only):
   - Read key `[0x00..00][0x00][0x0D]` from items keyspace
   - If absent: write fingerprint, log "schema fingerprint initialized", succeed
   - If present and matches: succeed
   - If present and mismatches: return `TidalError::SchemaMismatch { stored: hex, computed: hex }`
4. Add `SchemaMismatch` variant to `TidalError`
5. Skip entirely for `StorageMode::Ephemeral`
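The decision logic in items 3 to 5 can be isolated from storage so it is easy to unit test. The sketch below assumes the caller has already read the stored fingerprint (if any) and computed the current one. The types are reduced, hypothetical stand-ins for the tidalDB ones named in the checklist, and `FingerprintAction` is introduced purely for illustration.

```rust
/// Hypothetical reduced stand-in for tidalDB's storage mode.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum StorageMode {
    Persistent,
    Ephemeral,
}

/// Hypothetical reduced stand-in for the proposed error variant.
#[derive(Debug, PartialEq, Eq)]
pub enum TidalError {
    SchemaMismatch { stored: String, computed: String },
}

/// What the open path should do after the check succeeds.
#[derive(Debug, PartialEq, Eq)]
pub enum FingerprintAction {
    Skip,       // ephemeral: nothing is persisted
    Initialize, // bootstrap: caller writes the computed fingerprint
    Verified,   // stored fingerprint matches
}

/// Lowercase hex encoding for error messages.
fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

pub fn check_fingerprint(
    mode: StorageMode,
    stored: Option<[u8; 32]>,
    computed: [u8; 32],
) -> Result<FingerprintAction, TidalError> {
    if mode == StorageMode::Ephemeral {
        return Ok(FingerprintAction::Skip);
    }
    match stored {
        None => Ok(FingerprintAction::Initialize),
        Some(s) if s == computed => Ok(FingerprintAction::Verified),
        Some(s) => Err(TidalError::SchemaMismatch {
            stored: hex(&s),
            computed: hex(&computed),
        }),
    }
}
```

Keeping the write out of this function means the bootstrap case stays explicit at the call site, where the "schema fingerprint initialized" log line belongs.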

### Open Questions

1. **What fields to include in the fingerprint?** Signal names + decay params are critical because they affect score interpretation. Should embedding slot dimensions and text field definitions also be included? Adding a new text field is backward-compatible, but changing dimensions is not. Recommendation: include signal fields + embedding dimensions. Exclude text fields and policies (additive changes to these are safe).

2. **Force-open escape hatch:** Should `TidalDbBuilder` expose `.allow_schema_migration()` from day one? This is useful for development but dangerous in production. Recommendation: defer it until the first user needs it; when it is added, log a WARN-level message every time it is used.

3. **Migration tool:** A future `tidalctl schema migrate` command should compare old and new schemas, validate that the change is backward-compatible (additions only, no decay parameter changes), and update the fingerprint. This is post-MVP.

---

## Summary

### fjall backup: Use quiesce + file copy

fjall 3.0.2 has **no backup API** ([issue #52](https://github.com/fjall-rs/fjall/issues/52) open and blocked). The safe procedure is: pause writes, flush all engines (`rotate_memtable_and_wait` + `persist(SyncAll)` + Tantivy flush + WAL checkpoint), copy the entire data directory, resume writes. Estimated write pause: 300ms + file copy time. When fjall ships its backup API, switch to it for hard-link support.

### Schema fingerprint: Safe to implement with bootstrap logic

The "no fingerprint -> write and succeed" bootstrap is correct and follows production precedent (the first-run behavior of Flyway and Liquibase). It cannot detect schema mismatches that predate the feature, but this is inherent -- the feature prevents future mismatches, not past ones. Key implementation details: sort signals before hashing, hash half_life nanos (not lambda), use a dedicated `Tag::SchemaFingerprint`, skip for ephemeral mode.

## Sources

- [fjall docs.rs -- Database struct](https://docs.rs/fjall/latest/fjall/struct.Database.html)
- [fjall docs.rs -- Keyspace struct](https://docs.rs/fjall/latest/fjall/struct.Keyspace.html)
- [fjall docs.rs -- Snapshot struct](https://docs.rs/fjall/latest/fjall/struct.Snapshot.html)
- [fjall docs.rs -- PersistMode enum](https://docs.rs/fjall/latest/fjall/enum.PersistMode.html)
- [fjall GitHub issue #52: Backup using Checkpointing](https://github.com/fjall-rs/fjall/issues/52) -- open, blocked
- [fjall keyspace source: rotate_memtable_and_wait](https://github.com/fjall-rs/fjall/blob/main/src/keyspace/mod.rs) -- public, annotated "NOTE: Used in tests"
- [fjall 3.0 release blog post](https://fjall-rs.github.io/post/fjall-3/) -- confirms checkpoint is "looking ahead," not shipped
- [RocksDB Checkpoints wiki](https://github.com/facebook/rocksdb/wiki/Checkpoints) -- hard-link SSTs, copy MANIFEST, consistent cross-CF
- [RocksDB Checkpoint blog post, 2015](https://rocksdb.org/blog/2015/11/10/use-checkpoints-for-efficient-snapshots.html)
- [SQLite Online Backup API](https://sqlite.org/backup.html) -- sqlite3_backup_init/step/finish
- [DuckDB Storage Versions](https://duckdb.org/docs/stable/internals/storage) -- version in file header, refuses mismatched opens
- [MongoDB Schema Versioning Pattern](https://www.mongodb.com/blog/post/building-with-patterns-the-schema-versioning-pattern)
- tidalDB source: `/tidal/src/storage/fjall.rs` -- FjallStorage, FjallBackend, flush_all()
- tidalDB source: `/tidal/src/db/mod.rs` -- TidalDb struct, shutdown_inner(), data surface
- tidalDB source: `/tidal/src/db/open.rs` -- open_with_schema(), the integration point for fingerprint check
- tidalDB source: `/tidal/src/db/paths.rs` -- directory layout: wal, items, users, creators, cache
- tidalDB source: `/tidal/src/schema/validation/mod.rs` -- Schema struct, signals HashMap
- tidalDB source: `/tidal/src/storage/keys.rs` -- Tag enum, key encoding format