tidaldb/docs/ops/capacity-planning.md

# Capacity Planning

This document provides RAM, disk, and startup time estimates for tidalDB deployments. Use these tables to provision hardware before going to production.

All estimates assume a single-node deployment with default configuration (30-second checkpoint interval, f16 vector quantization, DashMap-based hot tier).

---

## RAM Capacity

tidalDB is an in-memory-first database. USearch HNSW indexes, the signal ledger hot tier, and Tantivy reader segments all reside in RAM during operation. There is no swap tolerance for USearch -- if the process is swapped, ANN query latency degrades from microseconds to seconds.

| Items | Embedding Dims | USearch RAM | Signal Ledger RAM (10 signals) | Tantivy RAM | Total Estimate |
|------:|---------------:|------------:|-------------------------------:|------------:|---------------:|
| 100K  | 128D           | ~26 MB      | ~110 MB                        | ~50 MB      | ~200 MB        |
| 100K  | 768D           | ~154 MB     | ~110 MB                        | ~50 MB      | ~320 MB        |
| 100K  | 1536D          | ~307 MB     | ~110 MB                        | ~50 MB      | ~470 MB        |
| 1M    | 128D           | ~256 MB     | ~1.1 GB                        | ~200 MB     | ~1.6 GB        |
| 1M    | 768D           | ~1.5 GB     | ~1.1 GB                        | ~200 MB     | ~2.8 GB        |
| 1M    | 1536D          | ~3.1 GB     | ~1.1 GB                        | ~200 MB     | ~4.4 GB        |
| 10M   | 128D           | ~2.6 GB     | ~11 GB                         | ~500 MB     | ~14 GB         |
| 10M   | 768D           | ~15 GB      | ~11 GB                         | ~500 MB     | ~27 GB         |
| 10M   | 1536D          | ~31 GB      | ~11 GB                         | ~500 MB     | ~43 GB         |

### Formulas

**USearch HNSW index:**

```
items * dims * 2 bytes (f16 quantization) * 1.2 (HNSW graph overhead)
```

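The 20% graph overhead accounts for HNSW neighbor lists (M=16 default, two layers). Actual overhead varies with M and ef_construction parameters. The USearch RAM column in the table above matches raw f16 vector storage (`items * dims * 2 bytes`); multiply it by 1.2 to include the graph overhead.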
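
As a minimal sketch of the formula above, the helper below computes the USearch estimate with and without the graph factor. The function name and parameters are illustrative and not part of tidalDB; sizes use decimal units (1 GB = 1e9 bytes).

```python
def usearch_ram_bytes(items: int, dims: int,
                      bytes_per_dim: int = 2,
                      graph_overhead: float = 1.2) -> float:
    """Estimate USearch HNSW RAM: f16 vectors times a graph overhead factor."""
    return items * dims * bytes_per_dim * graph_overhead


# Example: 1M items at 768 dimensions.
raw = usearch_ram_bytes(1_000_000, 768, graph_overhead=1.0)  # vectors only
with_graph = usearch_ram_bytes(1_000_000, 768)               # vectors + 20%
print(f"raw vectors: {raw / 1e9:.2f} GB, with graph: {with_graph / 1e9:.2f} GB")
```

Passing `graph_overhead=1.0` isolates raw vector storage, which is useful when comparing against a measured index size.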

**Signal ledger hot tier:**

```
items * signal_count * ~1,088 bytes/entry
```

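Each `(entity_id, signal_type_id)` entry in the DashMap holds the running decay score, windowed counters (BucketedCounter with minute and hour buckets), and velocity state; the figure also includes the DashMap per-shard overhead. The 1,088 bytes/entry figure was measured in the m7p3 scale benchmarks. Note that the Signal Ledger column in the table above works out to roughly 1,100 bytes per item at 10 signals (about 110 bytes per entry), a factor of 10 below what this per-entry formula yields; when in doubt, size against the formula.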

The signal ledger has a memory budget of 5M entries (`DEFAULT_MAX_SIGNAL_ENTRIES`). When exceeded, the checkpoint thread evicts cold entries (oldest `last_update` timestamp). If your workload has more than 5M active `(entity, signal_type)` pairs, cold entries will be served from fjall checkpoints (slower, but correct).
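
The interaction between the per-entry formula and the 5M-entry budget can be sketched as follows. This assumes eviction holds the hot tier at or near the budget in steady state; the text only says eviction runs on the checkpoint thread, so the resident figure may be exceeded briefly between checkpoints. The function name is hypothetical.

```python
DEFAULT_MAX_SIGNAL_ENTRIES = 5_000_000  # hot-tier budget named in the text
BYTES_PER_ENTRY = 1_088                 # measured per-entry figure quoted above


def signal_ledger_ram(items: int, signal_count: int,
                      max_entries: int = DEFAULT_MAX_SIGNAL_ENTRIES):
    """Return (uncapped_bytes, resident_bytes, evictable_entries).

    Assumption: after eviction, at most `max_entries` entries stay in RAM.
    """
    entries = items * signal_count
    resident = min(entries, max_entries)
    return (entries * BYTES_PER_ENTRY,
            resident * BYTES_PER_ENTRY,
            entries - resident)


uncapped, resident, evictable = signal_ledger_ram(1_000_000, 10)
print(f"uncapped: {uncapped / 1e9:.2f} GB, "
      f"resident: {resident / 1e9:.2f} GB, "
      f"entries served from checkpoints: {evictable:,}")
```

A non-zero third value indicates that some `(entity, signal_type)` pairs will be read from fjall checkpoints rather than the hot tier.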

**Tantivy text index:**

Tantivy's RAM usage depends on the number of indexed documents, average document length, and the number of open reader segments. The estimates above assume short metadata fields (title + description, ~200 bytes average). Long-form content indexing will increase RAM proportionally.

### Notes

- Signal ledger RAM is for the in-memory hot tier only. The WAL and fjall checkpoints add disk usage, not RAM.
- The "10 signals" column assumes 10 distinct signal types per entity. Scale linearly for more signal types.
- USearch RAM is the dominant cost at high dimensionality. If you use 1536D embeddings (e.g., OpenAI text-embedding-3-small), plan for USearch to consume 70%+ of total RAM at 10M items.

---

## Disk Capacity

Disk usage comes from three sources: fjall LSM-tree storage (metadata, relationships, signal checkpoints), WAL segments (append-only signal event log), and Tantivy/USearch index files.

| Items | Metadata Size   | Signal Events/Day | Disk/Day (WAL) | Fjall (90 days) | Total (90 days) |
|------:|:----------------|------------------:|----------------:|----------------:|----------------:|
| 100K  | small (256B avg) | 50K               | ~2 MB           | ~1 GB           | ~1.2 GB         |
| 1M    | small            | 500K              | ~20 MB          | ~10 GB          | ~11.8 GB        |
| 10M   | small            | 5M                | ~200 MB         | ~100 GB         | ~118 GB         |

### Formulas

**WAL daily growth:**

```
signal_events_per_day * ~40 bytes/event
```

Each WAL entry contains a 4-byte magic, 8-byte sequence, 1-byte event type, 8-byte entity ID, 2-byte signal type ID, 8-byte timestamp, 8-byte weight (f64), and 32-byte BLAKE3 checksum. These fields sum to 71 bytes (39 bytes before the checksum), so treat ~40 bytes/event as a lower bound when sizing. WAL segments are compacted after each successful checkpoint (every 30 seconds), so WAL disk usage represents only the uncompacted tail, not cumulative growth.
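
To make the per-event arithmetic concrete, the sketch below sums the field widths listed above and sizes daily WAL growth at both the nominal ~40 bytes/event and the field sum. The dictionary keys are illustrative labels, not tidalDB identifiers, and no claim is made about the actual on-disk layout or padding.

```python
# Field widths in bytes, as listed in the paragraph above.
WAL_FIELD_BYTES = {
    "magic": 4,
    "sequence": 8,
    "event_type": 1,
    "entity_id": 8,
    "signal_type_id": 2,
    "timestamp": 8,
    "weight_f64": 8,
    "blake3_checksum": 32,
}


def wal_daily_bytes(events_per_day: int, bytes_per_event: int = 40) -> int:
    """Daily WAL append volume before compaction removes old segments."""
    return events_per_day * bytes_per_event


entry_bytes = sum(WAL_FIELD_BYTES.values())
nominal = wal_daily_bytes(500_000)
from_fields = wal_daily_bytes(500_000, entry_bytes)
print(f"entry: {entry_bytes} B, nominal/day: {nominal / 1e6:.1f} MB, "
      f"field-sum/day: {from_fields / 1e6:.1f} MB")
```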

**Fjall storage:**

```
items * metadata_avg_bytes * 1.5 (LSM space amplification)
```

The 1.5x amplification factor accounts for LSM-tree space amplification (multiple sorted runs before compaction merges them). Actual amplification depends on the compaction strategy and write pattern. Signal checkpoints are also stored in fjall -- add ~100 bytes per active `(entity, signal_type)` pair for the serialized checkpoint data.
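
A minimal sketch of the two static terms described above follows. It assumes the 1.5x factor applies to metadata only and that checkpoint data is added at a flat ~100 bytes per pair; it does not attempt to reproduce the "Fjall (90 days)" table column, whose growth assumptions are not spelled out in this section. The function name is hypothetical.

```python
def fjall_bytes(items: int, metadata_avg_bytes: int,
                active_signal_pairs: int = 0,
                space_amplification: float = 1.5,
                checkpoint_bytes_per_pair: int = 100) -> float:
    """Static fjall footprint: amplified metadata plus signal checkpoints."""
    metadata = items * metadata_avg_bytes * space_amplification
    checkpoints = active_signal_pairs * checkpoint_bytes_per_pair
    return metadata + checkpoints


# Example: 1M items with 256-byte metadata and 10 signal types each.
total = fjall_bytes(1_000_000, 256, active_signal_pairs=10_000_000)
print(f"metadata + checkpoints: {total / 1e9:.2f} GB")
```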

**Tantivy and USearch on disk:**

- Tantivy: roughly 1.5-2x the raw text size after indexing (inverted index + postings + term dictionary).
- USearch: saved index files are approximately the same size as the in-memory representation (items * dims * 2 bytes + graph metadata).
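
The two rules of thumb above can be combined into a rough on-disk estimate for index files. This is a sketch with hypothetical names; the USearch term covers vector data only and leaves out graph metadata, which the text does not quantify.

```python
def index_disk_bytes(items: int, avg_text_bytes: int, dims: int):
    """Return (tantivy_low, tantivy_high, usearch_vectors) in bytes."""
    raw_text = items * avg_text_bytes
    tantivy_low = raw_text * 1.5    # lower end of the 1.5-2x range
    tantivy_high = raw_text * 2.0   # upper end of the 1.5-2x range
    usearch_vectors = items * dims * 2  # f16 vectors, graph metadata excluded
    return tantivy_low, tantivy_high, usearch_vectors


low, high, vectors = index_disk_bytes(1_000_000, 200, 768)
print(f"Tantivy: {low / 1e6:.0f}-{high / 1e6:.0f} MB, "
      f"USearch: {vectors / 1e9:.2f} GB")
```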

### WAL Compaction

WAL segments older than the last successful checkpoint are automatically deleted by the checkpoint thread (every 30 seconds). Under normal operation, WAL disk usage stays bounded at roughly `signal_rate * 40 bytes * 30 seconds`, where `signal_rate` is in events per second. Monitor `tidaldb_wal_lag_bytes` -- if it grows without bound, checkpointing may be failing (check `tidaldb_checkpoint_failures_total`).
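
The bound above lends itself to a simple sanity check against the observed `tidaldb_wal_lag_bytes` value. The sketch below is illustrative only: the function names and the `slack` multiplier are assumptions, not tidalDB settings, and the threshold should be tuned to your own write pattern.

```python
def wal_tail_bound_bytes(events_per_sec: float,
                         bytes_per_event: int = 40,
                         checkpoint_interval_s: int = 30) -> float:
    """Expected uncompacted WAL tail under normal checkpointing."""
    return events_per_sec * bytes_per_event * checkpoint_interval_s


def wal_lag_suspicious(lag_bytes: float, events_per_sec: float,
                       slack: float = 3.0) -> bool:
    """True if observed lag exceeds `slack` times the expected bound.

    `slack` is an arbitrary tolerance chosen for this example.
    """
    return lag_bytes > slack * wal_tail_bound_bytes(events_per_sec)


# Example: 1,000 signal events per second.
print(wal_tail_bound_bytes(1_000))            # expected tail in bytes
print(wal_lag_suspicious(5_000_000, 1_000))   # lag well above the bound
```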

---

## Startup Time

Startup involves: opening fjall keyspaces, restoring the signal ledger from checkpoint, replaying WAL events since the last checkpoint, rebuilding in-memory indexes (bitmap, range, universe, creator-items, collections, suggestions), and loading USearch vector indexes.

| Items | Vectors | Typical Startup |
|------:|--------:|:----------------|
| 100K  | 100K    | ~2-5 sec        |
| 1M    | 1M      | ~15-45 sec      |
| 10M   | 10M     | ~3-8 min        |

### Dominant Costs

1. **USearch index load** is the dominant startup cost at 1M+ vectors. USearch rebuilds the HNSW graph from its serialized format. Progress is logged every 10K vectors.

2. **Signal ledger restore** reads the checkpoint from fjall (a single prefix scan of `Tag::Sig` keys), then replays any WAL events with sequence numbers higher than the checkpoint's `wal_sequence`. Time is proportional to the number of active signal entries + unreplayed WAL events.

3. **Entity state rebuild** scans the items and users keyspaces to reconstruct creator-items bitmaps, relationship indexes (follows, blocks, hides), and interaction weights. Progress is logged every 10K items.

4. **Suggestion index rebuild** scans all item metadata for "title" fields and indexes terms for autocomplete. This is a sequential scan -- fast for 100K items, noticeable at 10M.

5. **Collection index rebuild** reconstructs collection membership bitmaps from fjall.

### Notes

- Startup time is I/O-bound, not CPU-bound. Fast NVMe storage reduces startup time significantly compared to spinning disk.
- WAL replay time depends on how many signals were written since the last checkpoint (at most ~30 seconds of writes under normal operation).
- Tantivy indexes are opened directly from disk (memory-mapped) and do not require a rebuild step.

---

## Recommended Provisioning

**General rule:** provision 2x the estimated RAM for headroom.

| Scale    | Recommended RAM | Recommended Disk | CPU Cores |
|:---------|:----------------|:-----------------|:----------|
| 100K items, 128D | 512 MB   | 5 GB SSD         | 2         |
| 100K items, 768D | 1 GB     | 5 GB SSD         | 2         |
| 1M items, 128D   | 4 GB     | 25 GB SSD        | 4         |
| 1M items, 768D   | 8 GB     | 25 GB SSD        | 4         |
| 10M items, 128D  | 32 GB    | 250 GB NVMe      | 8         |
| 10M items, 768D  | 64 GB    | 250 GB NVMe      | 8         |
| 10M items, 1536D | 96 GB    | 250 GB NVMe      | 16        |
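
The RAM column above is consistent with doubling the estimate and rounding up to a common machine size. The sketch below illustrates that reading; the size ladder is a hypothetical list chosen for the example, not a tidalDB requirement.

```python
import bisect

# Hypothetical ladder of available RAM sizes, in MB (decimal).
RAM_LADDER_MB = [512, 1_000, 2_000, 4_000, 8_000, 16_000,
                 32_000, 64_000, 96_000, 128_000]


def recommended_ram_mb(estimate_mb: float, headroom: float = 2.0) -> int:
    """Apply the headroom factor, then round up to the next ladder size."""
    target = estimate_mb * headroom
    i = bisect.bisect_left(RAM_LADDER_MB, target)
    if i == len(RAM_LADDER_MB):
        raise ValueError(f"{target:.0f} MB exceeds the largest ladder size")
    return RAM_LADDER_MB[i]


# Estimates taken from the RAM Capacity table (Total Estimate column).
for label, estimate in [("100K/128D", 200), ("1M/768D", 2_800), ("10M/1536D", 43_000)]:
    print(label, recommended_ram_mb(estimate), "MB")
```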

### Why 2x headroom?

- Signal ledger entries grow as new `(entity, signal_type)` pairs are written. The hot tier can hold up to 5M entries before eviction kicks in.
- Tantivy segment merges temporarily double the index size during merge operations.
- USearch does not support incremental resize -- if you approach capacity, you need enough free RAM to hold both the old and new index during a potential rebuild.
- The Rust allocator (jemalloc or system) has its own fragmentation overhead.

### Swap

Do not configure swap for production tidalDB instances. USearch HNSW traversal accesses memory in a random-access pattern that defeats page-level caching. A single swapped page in the HNSW graph can turn a 50-microsecond ANN query into a 50-millisecond one that waits on a disk seek.
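
On Linux, one way to confirm that a running process has nothing swapped out is to read the `VmSwap` field from `/proc/<pid>/status`. The sketch below is a generic Linux check, not a tidalDB tool; the helper names are illustrative.

```python
import re


def vm_swap_kb(status_text: str):
    """Extract the VmSwap value (in kB) from /proc/<pid>/status content.

    Returns None if the field is absent (for example, on kernel threads).
    """
    match = re.search(r"^VmSwap:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vm_swap_kb(pid: int):
    """Read VmSwap for a live process; Linux only."""
    with open(f"/proc/{pid}/status") as f:
        return vm_swap_kb(f.read())


sample = "Name:\ttidaldb\nVmRSS:\t 2048000 kB\nVmSwap:\t       0 kB\n"
print(vm_swap_kb(sample))
```

Any non-zero value for the tidalDB process suggests that part of its working set has been paged out.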

### Disk Type

SSD is strongly recommended for all deployments. NVMe is recommended at 10M+ items. The WAL uses synchronous `fsync` on every segment rotation, and fjall's journal uses `persist(SyncAll)` during checkpoint. Spinning disk latency on these operations directly impacts signal write throughput.