tidaldb/docs/ops/capacity-planning.md
2026-02-23 22:41:16 -07:00

Capacity Planning

This document provides RAM, disk, and startup time estimates for tidalDB deployments. Use these tables to provision hardware before going to production.

All estimates assume a single-node deployment with default configuration (30-second checkpoint interval, f16 vector quantization, DashMap-based hot tier).


RAM Capacity

tidalDB is an in-memory-first database. USearch HNSW indexes, the signal ledger hot tier, and Tantivy reader segments all reside in RAM during operation. There is no swap tolerance for USearch -- if the process is swapped, ANN query latency degrades from microseconds to seconds.

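| Items | Embedding Dims | USearch RAM | Signal Ledger RAM (10 signals) | Tantivy RAM | Total Estimate |
|-------|----------------|-------------|--------------------------------|-------------|----------------|
| 100K  | 128D           | ~26 MB      | ~110 MB                        | ~50 MB      | ~200 MB        |
| 100K  | 768D           | ~154 MB     | ~110 MB                        | ~50 MB      | ~320 MB        |
| 100K  | 1536D          | ~307 MB     | ~110 MB                        | ~50 MB      | ~470 MB        |
| 1M    | 128D           | ~256 MB     | ~1.1 GB                        | ~200 MB     | ~1.6 GB        |
| 1M    | 768D           | ~1.5 GB     | ~1.1 GB                        | ~200 MB     | ~2.8 GB        |
| 1M    | 1536D          | ~3.1 GB     | ~1.1 GB                        | ~200 MB     | ~4.4 GB        |
| 10M   | 128D           | ~2.6 GB     | ~11 GB                         | ~500 MB     | ~14 GB         |
| 10M   | 768D           | ~15 GB      | ~11 GB                         | ~500 MB     | ~27 GB         |
| 10M   | 1536D          | ~31 GB      | ~11 GB                         | ~500 MB     | ~43 GB         |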

Formulas

USearch HNSW index:

items * dims * 2 bytes (f16 quantization) * 1.2 (HNSW graph overhead)

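The 20% graph overhead accounts for HNSW neighbor lists (M=16 default, two layers). Actual overhead varies with the M and ef_construction parameters. Note that the USearch RAM column in the table above equals items * dims * 2 bytes (raw f16 vectors only); apply the 1.2 factor on top of those figures to include the graph.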
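
The formula can be turned into a small calculator. This is a minimal sketch, not part of tidalDB: the function name and parameter names are made up for illustration, and the defaults simply restate the f16 width and the 1.2 overhead factor from the text.

```python
def usearch_ram_bytes(items: int, dims: int,
                      bytes_per_dim: int = 2,
                      graph_overhead: float = 1.2) -> int:
    """Estimate USearch HNSW RAM as raw vector bytes times graph overhead."""
    raw_vectors = items * dims * bytes_per_dim  # f16 = 2 bytes per dimension
    return round(raw_vectors * graph_overhead)


if __name__ == "__main__":
    # Print the estimate in decimal MB for a few sizes from the table.
    for items, dims in [(100_000, 128), (1_000_000, 768), (10_000_000, 1536)]:
        est = usearch_ram_bytes(items, dims)
        print(f"{items:>10,} items x {dims}D: {est / 1e6:,.0f} MB")
```

Passing graph_overhead=1.0 gives the raw vector size alone, which is useful for seeing how much of the estimate is graph structure.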

Signal ledger hot tier:

items * signal_count * ~1,088 bytes/entry

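Each (entity_id, signal_type_id) entry in the DashMap holds the running decay score, windowed counters (BucketedCounter with minute and hour buckets), velocity state, and the DashMap per-shard overhead. The 1,088 bytes/entry figure was measured in the m7p3 scale benchmarks. The Signal Ledger RAM column in the table above equals items * ~1,088 bytes, so it matches this formula only if 1,088 bytes is read as the cost per entity across all 10 signal types (about 109 bytes per entry).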

The signal ledger has a memory budget of 5M entries (DEFAULT_MAX_SIGNAL_ENTRIES). When exceeded, the checkpoint thread evicts cold entries (oldest last_update timestamp). If your workload has more than 5M active (entity, signal_type) pairs, cold entries will be served from fjall checkpoints (slower, but correct).
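
The interaction between the formula and the entry budget can be sketched as follows. This is an illustration under simplifying assumptions: every (entity, signal_type) pair is treated as active, eviction is assumed to hold the hot tier at exactly the budget, and the function name is hypothetical. The per-entry size is a parameter so it can be replaced with a figure measured on your own workload.

```python
DEFAULT_MAX_SIGNAL_ENTRIES = 5_000_000  # budget named in the text
BYTES_PER_ENTRY = 1_088                 # benchmark figure quoted in the text


def ledger_hot_tier(items: int, signal_count: int,
                    bytes_per_entry: int = BYTES_PER_ENTRY,
                    max_entries: int = DEFAULT_MAX_SIGNAL_ENTRIES) -> dict:
    """Split active pairs into hot (in RAM) and cold (served from checkpoints)."""
    active = items * signal_count      # assumes every pair is active
    hot = min(active, max_entries)     # assumes eviction caps at the budget
    return {
        "hot_entries": hot,
        "hot_bytes": hot * bytes_per_entry,
        "cold_entries": active - hot,
    }


if __name__ == "__main__":
    print(ledger_hot_tier(1_000_000, 10))
```

A non-zero cold_entries value indicates a workload where some reads will fall through to fjall checkpoints.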

Tantivy text index:

Tantivy's RAM usage depends on the number of indexed documents, average document length, and the number of open reader segments. The estimates above assume short metadata fields (title + description, ~200 bytes average). Long-form content indexing will increase RAM proportionally.

Notes

  • Signal ledger RAM is for the in-memory hot tier only. The WAL and fjall checkpoints add disk usage, not RAM.
  • The "10 signals" column assumes 10 distinct signal types per entity. Scale linearly for more signal types.
  • USearch RAM is the dominant cost at high dimensionality. If you use 1536D embeddings (e.g., OpenAI text-embedding-3-large), plan for USearch to consume 70%+ of total RAM at 10M items.

Disk Capacity

Disk usage comes from three sources: fjall LSM-tree storage (metadata, relationships, signal checkpoints), WAL segments (append-only signal event log), and Tantivy/USearch index files.

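| Items | Metadata Size    | Signal Events/Day | Disk/Day (WAL) | Fjall (90 days) | Total (90 days) |
|-------|------------------|-------------------|----------------|-----------------|-----------------|
| 100K  | small (256B avg) | 50K               | ~2 MB          | ~1 GB           | ~1.2 GB         |
| 1M    | small            | 500K              | ~20 MB         | ~10 GB          | ~11.8 GB        |
| 10M   | small            | 5M                | ~200 MB        | ~100 GB         | ~118 GB         |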

Formulas

WAL daily growth:

signal_events_per_day * ~40 bytes/event

Each WAL entry contains: 4-byte magic, 8-byte sequence, 1-byte event type, 8-byte entity ID, 2-byte signal type ID, 8-byte timestamp, 8-byte weight (f64), 32-byte BLAKE3 checksum. WAL segments are compacted after each successful checkpoint (every 30 seconds), so WAL disk usage represents only the uncompacted tail, not cumulative growth.
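
Summing the field widths listed above is a quick sanity check on the per-event estimate. The sketch below uses Python's struct module to do the arithmetic; the little-endian, unpadded packing and the unsigned integer types are assumptions made for illustration, not a statement of the actual on-disk encoding. The fields before the checksum total 39 bytes, which is close to the ~40 bytes/event figure; counting the 32-byte checksum on every entry gives 71 bytes.

```python
import struct

# Field order follows the list in the text. "<" means little-endian with no
# padding (an assumption for this sketch).
#   4s = magic, Q = sequence, B = event type, Q = entity ID,
#   H  = signal type ID, Q = timestamp, d = weight (f64)
WAL_BODY = struct.Struct("<4sQBQHQd")

# Same fields followed by a 32-byte BLAKE3 checksum.
WAL_ENTRY = struct.Struct("<4sQBQHQd32s")

if __name__ == "__main__":
    print("body bytes:", WAL_BODY.size)
    print("entry bytes with checksum:", WAL_ENTRY.size)
```

If your sizing needs to be conservative, budgeting 71 bytes per event instead of 40 roughly doubles the WAL figures.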

Fjall storage:

items * metadata_avg_bytes * 1.5 (LSM write amplification)

The 1.5x amplification factor accounts for LSM-tree space amplification (multiple sorted runs before compaction merges them). Actual amplification depends on the compaction strategy and write pattern. Signal checkpoints are also stored in fjall -- add ~100 bytes per active (entity, signal_type) pair for the serialized checkpoint data.
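
As a rough sketch, the two terms described above can be combined in one function. The function and parameter names are hypothetical, and the result covers only item metadata and signal checkpoints, not relationship data or any other fjall contents.

```python
def fjall_bytes(items: int,
                metadata_avg_bytes: int = 256,
                active_pairs: int = 0,
                amplification: float = 1.5,
                checkpoint_bytes_per_pair: int = 100) -> int:
    """Estimate fjall disk usage for item metadata plus signal checkpoints."""
    # Metadata is scaled by the LSM space amplification factor.
    metadata = items * metadata_avg_bytes * amplification
    # Checkpoint data is added as a flat per-pair cost, as in the text.
    checkpoints = active_pairs * checkpoint_bytes_per_pair
    return round(metadata + checkpoints)


if __name__ == "__main__":
    # 100K items with 10 signal types each (1M active pairs).
    print(fjall_bytes(100_000, active_pairs=1_000_000))
```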

Tantivy and USearch on disk:

  • Tantivy: roughly 1.5-2x the raw text size after indexing (inverted index + postings + term dictionary).
  • USearch: saved index files are approximately the same size as the in-memory representation (items * dims * 2 bytes + graph metadata).

WAL Compaction

WAL segments older than the last successful checkpoint are automatically deleted by the checkpoint thread (every 30 seconds). Under normal operation, WAL disk usage stays bounded at roughly signal_rate * 40 bytes * 30 seconds, where signal_rate is in events per second. Monitor tidaldb_wal_lag_bytes: if it keeps growing, checkpointing may be failing (check tidaldb_checkpoint_failures_total).
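
The daily-growth formula and the steady-state bound can be compared side by side. This is a minimal sketch with hypothetical function names; the 40-byte event size and 30-second interval are the defaults quoted in the text.

```python
SECONDS_PER_DAY = 86_400


def wal_daily_bytes(events_per_day: int, bytes_per_event: int = 40) -> int:
    """Bytes appended to the WAL per day, before any segments are deleted."""
    return events_per_day * bytes_per_event


def wal_tail_bytes(events_per_second: float,
                   bytes_per_event: int = 40,
                   checkpoint_interval_s: int = 30) -> float:
    """Approximate WAL bytes on disk between two successful checkpoints."""
    return events_per_second * bytes_per_event * checkpoint_interval_s


if __name__ == "__main__":
    per_day = 5_000_000  # the 10M-item row of the disk table
    print("appended per day:", wal_daily_bytes(per_day))
    print("steady-state tail:", round(wal_tail_bytes(per_day / SECONDS_PER_DAY)))
```

The gap between the two numbers is what a healthy checkpoint thread reclaims, so a lag metric that approaches the daily figure is a strong hint that checkpoints are not completing.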


Startup Time

Startup involves: opening fjall keyspaces, restoring the signal ledger from checkpoint, replaying WAL events since the last checkpoint, rebuilding in-memory indexes (bitmap, range, universe, creator-items, collections, suggestions), and loading USearch vector indexes.

| Items | Vectors | Typical Startup |
|-------|---------|-----------------|
| 100K  | 100K    | ~2-5 sec        |
| 1M    | 1M      | ~15-45 sec      |
| 10M   | 10M     | ~3-8 min        |

Dominant Costs

  1. USearch index load is the dominant startup cost at 1M+ vectors. USearch rebuilds the HNSW graph from its serialized format. Progress is logged every 10K vectors.

  2. Signal ledger restore reads the checkpoint from fjall (a single prefix scan of Tag::Sig keys), then replays any WAL events with sequence numbers higher than the checkpoint's wal_sequence. Time is proportional to the number of active signal entries plus the number of unreplayed WAL events.

  3. Entity state rebuild scans the items and users keyspaces to reconstruct creator-items bitmaps, relationship indexes (follows, blocks, hides), and interaction weights. Progress is logged every 10K items.

  4. Suggestion index rebuild scans all item metadata for "title" fields and indexes terms for autocomplete. This is a sequential scan -- fast for 100K items, noticeable at 10M.

  5. Collection index rebuild reconstructs collection membership bitmaps from fjall.
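
The restore-then-replay logic in step 2 can be sketched as follows. This is a simplified model, not tidalDB's implementation: the WalEvent type and restore_ledger function are hypothetical, and a plain running sum stands in for the real decay scoring and windowed counters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalEvent:
    sequence: int
    entity_id: int
    signal_type_id: int
    weight: float


def restore_ledger(checkpoint_entries, checkpoint_wal_sequence, wal_events):
    """Load checkpointed scores, then apply only events newer than the checkpoint."""
    ledger = dict(checkpoint_entries)  # (entity_id, signal_type_id) -> score
    replayed = 0
    for ev in sorted(wal_events, key=lambda e: e.sequence):
        if ev.sequence <= checkpoint_wal_sequence:
            continue  # already reflected in the checkpoint
        key = (ev.entity_id, ev.signal_type_id)
        # Simplification: accumulate weights instead of applying decay.
        ledger[key] = ledger.get(key, 0.0) + ev.weight
        replayed += 1
    return ledger, replayed


if __name__ == "__main__":
    checkpoint = {(1, 1): 2.0}
    events = [WalEvent(9, 1, 1, 1.0), WalEvent(11, 1, 1, 0.5), WalEvent(12, 2, 1, 1.0)]
    print(restore_ledger(checkpoint, 10, events))
```

The work done is one pass over the checkpoint entries plus one pass over the WAL tail, which matches the proportionality stated in step 2.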

Notes

  • Startup time is I/O-bound, not CPU-bound. Fast NVMe storage reduces startup time significantly compared to spinning disk.
  • WAL replay time depends on how many signals were written since the last checkpoint (at most ~30 seconds of writes under normal operation).
  • Tantivy indexes are opened directly from disk (memory-mapped) and do not require a rebuild step.

General rule: provision 2x the estimated RAM for headroom.

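| Scale             | Recommended RAM | Recommended Disk | CPU Cores |
|-------------------|-----------------|------------------|-----------|
| 100K items, 128D  | 512 MB          | 5 GB SSD         | 2         |
| 100K items, 768D  | 1 GB            | 5 GB SSD         | 2         |
| 1M items, 128D    | 4 GB            | 25 GB SSD        | 4         |
| 1M items, 768D    | 8 GB            | 25 GB SSD        | 4         |
| 10M items, 128D   | 32 GB           | 250 GB NVMe      | 8         |
| 10M items, 768D   | 64 GB           | 250 GB NVMe      | 8         |
| 10M items, 1536D  | 96 GB           | 250 GB NVMe      | 16        |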
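
The 2x rule can be applied mechanically: double the estimate, then pick the smallest machine size that fits. The sketch below assumes a hypothetical menu of RAM sizes; substitute the sizes your provider or hardware vendor actually offers. With this particular menu, the totals from the RAM table map onto the Recommended RAM column above.

```python
# Hypothetical menu of available RAM sizes in GB (an assumption for this sketch).
RAM_SIZES_GB = [0.5, 1, 2, 4, 8, 16, 32, 64, 96, 128]


def recommended_ram_gb(estimated_gb: float,
                       headroom: float = 2.0,
                       sizes=RAM_SIZES_GB) -> float:
    """Return the smallest available size that covers estimate * headroom."""
    needed = estimated_gb * headroom
    for size in sorted(sizes):
        if size >= needed:
            return size
    raise ValueError(f"no available size covers {needed:.1f} GB")


if __name__ == "__main__":
    # Total estimates taken from the RAM Capacity table.
    for label, est in [("1M items, 768D", 2.8), ("10M items, 1536D", 43)]:
        print(label, "->", recommended_ram_gb(est), "GB")
```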

Why 2x headroom?

  • The number of signal ledger entries grows as new (entity, signal_type) pairs are written. The hot tier can hold up to 5M entries before eviction kicks in.
  • Tantivy segment merges temporarily double the index size during merge operations.
  • USearch does not support incremental resize -- if you approach capacity, you need enough free RAM to hold both the old and new index during a potential rebuild.
  • The Rust allocator (jemalloc or system) has its own fragmentation overhead.

Swap

Do not configure swap for production tidalDB instances. USearch HNSW traversal accesses memory in a random-access pattern that defeats page-level caching. A single swapped-out page in the HNSW graph can turn a 50-microsecond ANN query into one that waits on a 50-millisecond disk read.
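
On Linux hosts, one way to confirm that no swap is configured is to read the SwapTotal field of /proc/meminfo. The helper below is a small sketch for a deployment check, not something shipped with tidalDB; the function names are hypothetical, and the parser is kept separate from the file read so it can be tested on any platform.

```python
def swap_total_kb(meminfo_text: str) -> int:
    """Parse the SwapTotal value (in kB) from /proc/meminfo contents."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line format: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    raise ValueError("SwapTotal not found in meminfo text")


def host_has_swap(path: str = "/proc/meminfo") -> bool:
    """Return True if the Linux host reports any configured swap."""
    with open(path) as f:
        return swap_total_kb(f.read()) > 0
```

Calling host_has_swap() in a pre-start check and refusing to start (or logging a warning) when it returns True is one possible way to enforce the recommendation.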

Disk Type

SSD is strongly recommended for all deployments. NVMe is recommended at 10M+ items. The WAL uses synchronous fsync on every segment rotation, and fjall's journal uses persist(SyncAll) during checkpoint. Spinning disk latency on these operations directly impacts signal write throughput.