tidalDB Cluster Runbook
This runbook describes how to operate the simulated multi-region tidalDB
cluster that ships with tidal-server. The cluster reuses the
SimulatedCluster fabric — it runs multiple in-process nodes, replays the
real WAL + CRDT reconciliation paths, and exposes a single HTTP surface
for microservices.
Important limitations
- Cluster mode currently replicates global signals only. user_id/creator_id contexts are rejected so followers stay consistent with the leader’s WAL stream.
- All metadata and embedding writes are broadcast to every region up front. There is no separate replication log for items yet.
Prerequisites
- Rust toolchain ≥ 1.91 if running directly.
- Docker 25+ if running via container.
- Port 9500 available (default cluster listener).
1. Launch the cluster locally
cargo run -p tidal-server -- \
cluster \
--listen 127.0.0.1:9500 \
--schema tidal-server/config/default-schema.yaml \
--topology tidal-server/config/default-cluster.yaml
The default topology spins up three regions (us-east, eu-west,
ap-south) with us-east as leader.
2. Launch via Docker
# Build the image once
docker build -f docker/cluster/Dockerfile -t tidal-cluster .
# Run (press Ctrl+C to stop)
docker run --rm -p 9500:9500 tidal-cluster
To supply custom schema/topology files:
docker run --rm -p 9500:9500 \
-v $PWD/configs/my-schema.yaml:/srv/schema.yaml \
-v $PWD/configs/my-topology.yaml:/srv/topology.yaml \
tidal-cluster \
tidal-server cluster \
--listen 0.0.0.0:9500 \
--schema /srv/schema.yaml \
--topology /srv/topology.yaml
3. Core API calls
All routes are JSON unless noted.
Health
curl http://localhost:9500/health
Returns overall status and item count on the leader.
Register items & embeddings
curl -X POST http://localhost:9500/items \
-H 'Content-Type: application/json' \
-d '{ "entity_id": 1, "metadata": { "title": "Jazz Piano", "category": "music" } }'
curl -X POST http://localhost:9500/embeddings \
-H 'Content-Type: application/json' \
-d '{ "entity_id": 1, "values": [0.1, 0.2, 0.3, 0.4] }'
Record signals (cluster mode = global only)
curl -X POST http://localhost:9500/signals \
-H 'Content-Type: application/json' \
-d '{ "entity_id": 1, "signal": "view", "weight": 1.0 }'
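Because cluster mode accepts global signals only, a client may want to catch scoped payloads before they reach the server. The following is a minimal Python sketch of such a guard. It assumes the per-user context would travel as user_id or creator_id keys in the JSON body; the runbook does not show the rejected request shape, so that is a guess. ensure_global_signal is a hypothetical helper, not part of tidal-server.

```python
# Assumed key names: the runbook only states that user_id/creator_id
# contexts are rejected in cluster mode, not how they appear in the body.
SCOPED_KEYS = frozenset({"user_id", "creator_id"})


def ensure_global_signal(payload):
    """Return payload unchanged, or raise if it carries a scoped context."""
    scoped = sorted(SCOPED_KEYS.intersection(payload))
    if scoped:
        raise ValueError(
            "cluster mode accepts global signals only; remove: " + ", ".join(scoped)
        )
    return payload


# Same body as the curl example above.
body = ensure_global_signal({"entity_id": 1, "signal": "view", "weight": 1.0})
```

Failing early on the client keeps the error close to the code that built the payload, though the server remains the authority on what it rejects.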
Retrieve and search
curl "http://localhost:9500/feed?user_id=42&profile=trending&limit=10"
curl "http://localhost:9500/search?query=jazz%20piano&limit=5"
# Target a specific region (followers may lag during partitions)
curl "http://localhost:9500/feed?profile=trending&region=eu-west"
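When these read URLs are assembled in code rather than typed by hand, letting a library encode the query string avoids escaping mistakes. A small Python sketch follows, using only the standard library. read_url is a hypothetical helper name, and the parameter names are the ones that appear in the curl examples above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:9500"  # default listener from this runbook


def read_url(path, base_url=BASE_URL, **params):
    """Build a GET URL, skipping parameters whose value is None."""
    present = {key: value for key, value in params.items() if value is not None}
    # quote_via=quote encodes spaces as %20, matching the curl examples.
    query = urlencode(present, quote_via=quote)
    return f"{base_url}{path}?{query}" if query else f"{base_url}{path}"


print(read_url("/search", query="jazz piano", limit=5))
# http://localhost:9500/search?query=jazz%20piano&limit=5
print(read_url("/feed", profile="trending", region="eu-west"))
# http://localhost:9500/feed?profile=trending&region=eu-west
```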
4. Cluster operations
Check cluster status
curl http://localhost:9500/cluster/status | jq
Sample response:
{
"leader": "us-east",
"relay_log_len": 125,
"regions": [
{ "name": "us-east", "applied_events": 125, "lag_events": 0, "partitioned": false },
{ "name": "eu-west", "applied_events": 125, "lag_events": 0, "partitioned": false },
{ "name": "ap-south", "applied_events": 124, "lag_events": 1, "partitioned": false }
]
}
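For scripted checks, the status document can be reduced to the regions that need attention. The sketch below is in Python and relies only on the field names visible in the sample response; lagging_regions is a hypothetical helper, and the real response may carry more fields than are shown here.

```python
import json


def lagging_regions(status):
    """Map region name to lag_events for regions that are behind or partitioned."""
    return {
        region["name"]: region["lag_events"]
        for region in status["regions"]
        if region["lag_events"] > 0 or region["partitioned"]
    }


# The sample response from this runbook, used as offline input.
SAMPLE = json.loads("""
{
  "leader": "us-east",
  "relay_log_len": 125,
  "regions": [
    { "name": "us-east", "applied_events": 125, "lag_events": 0, "partitioned": false },
    { "name": "eu-west", "applied_events": 125, "lag_events": 0, "partitioned": false },
    { "name": "ap-south", "applied_events": 124, "lag_events": 1, "partitioned": false }
  ]
}
""")

print(lagging_regions(SAMPLE))  # {'ap-south': 1}
```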
Promote a new leader
curl -X POST http://localhost:9500/cluster/promote \
-H 'Content-Type: application/json' \
-d '{ "region": "eu-west" }'
/cluster/status will now report eu-west as leader. New writes are routed
there and replayed to the other regions.
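The promote-then-verify step can be sketched in Python as two small pieces: one builds the request and one checks the leader field of a parsed status document. promote_request and leader_is are hypothetical helper names. The endpoint, body, and leader field are taken from the examples in this runbook, and nothing is sent until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9500"  # default listener from this runbook


def promote_request(region, base_url=BASE_URL):
    """Build (but do not send) the POST /cluster/promote request."""
    return urllib.request.Request(
        f"{base_url}/cluster/promote",
        data=json.dumps({"region": region}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def leader_is(status, region):
    """True when a parsed /cluster/status document names region as leader."""
    return status.get("leader") == region


request = promote_request("eu-west")
print(request.get_method(), request.full_url)
# POST http://localhost:9500/cluster/promote
```

Keeping the request construction separate from the network call makes the failover step easy to dry-run before a maintenance window.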
Simulate a partition & heal
# Isolate ap-south (writes will skip this follower)
curl -X POST http://localhost:9500/cluster/partition \
-H 'Content-Type: application/json' \
-d '{ "region": "ap-south" }'
# Heal the partition (missed batches are replayed automatically)
curl -X POST http://localhost:9500/cluster/heal \
-H 'Content-Type: application/json' \
-d '{ "region": "ap-south" }'
Monitor /cluster/status to confirm lag drops back to zero after healing.
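The monitoring step can be automated with a short polling loop. The following Python sketch assumes the status fields shown in the sample response (regions, name, lag_events, partitioned); wait_for_zero_lag, fetch_status, and the attempt and delay defaults are illustrative choices, not part of tidal-server.

```python
import json
import time
import urllib.request

STATUS_URL = "http://localhost:9500/cluster/status"  # default listener


def fetch_status(url=STATUS_URL):
    """GET the cluster status and parse the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_for_zero_lag(region, fetch=fetch_status, attempts=30, delay=1.0,
                      sleep=time.sleep):
    """Poll until region reports lag_events == 0 and is not partitioned.

    Returns the number of polls used; raises TimeoutError if the region
    has not caught up after the given number of attempts.
    """
    for attempt in range(1, attempts + 1):
        regions = {entry["name"]: entry for entry in fetch()["regions"]}
        if region not in regions:
            raise KeyError(f"unknown region: {region}")
        entry = regions[region]
        if entry["lag_events"] == 0 and not entry["partitioned"]:
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"{region} still lagging after {attempts} polls")
```

After healing ap-south, calling wait_for_zero_lag("ap-south") against a running cluster blocks until the follower has caught up. The fetch and sleep parameters are injectable so the loop can be tested without a server.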
5. Runbook checklist
- Startup: launch tidal-server cluster … (or Docker). Confirm the log line listening on http://….
- Baseline health: GET /health and GET /cluster/status return 200.
- Seed data: POST /items, /embeddings, /signals for initial items.
- Traffic: microservices call /signals, /feed, /search. Add the region query param to pin to a follower for canary reads.
- Failover: to move traffic during maintenance, POST /cluster/promote to the target region. Verify status before proceeding.
- Partition drill: POST /cluster/partition to isolate a follower, observe lag, then POST /cluster/heal.
- Shutdown: send SIGINT (Ctrl+C) or stop the container. The server logs shutdown signal received and exits cleanly.
Refer to docs/planning/ROADMAP.md for the underlying distributed
fabric guarantees and property tests.