# tidalDB Cluster Runbook

This runbook describes how to operate the simulated multi-region tidalDB cluster
that ships with `tidal-server`. The cluster reuses the `SimulatedCluster` fabric:
it runs multiple in-process nodes, replays the real WAL + CRDT reconciliation
paths, and exposes a single HTTP surface for microservices.

> **Important limitations**
>
> - Cluster mode currently replicates global signals only. `user_id` /
>   `creator_id` contexts are rejected so followers stay consistent with the
>   leader’s WAL stream.
> - All metadata and embedding writes are broadcast to every region up front.
>   There is no separate replication log for items yet.

## Prerequisites

- Rust toolchain ≥ 1.91 if running directly.
- Docker 25+ if running via container.
- Port 9500 available (default cluster listener).

## 1. Launch the cluster locally

```bash
cargo run -p tidal-server -- \
  cluster \
  --listen 127.0.0.1:9500 \
  --schema tidal-server/config/default-schema.yaml \
  --topology tidal-server/config/default-cluster.yaml
```

The default topology spins up three regions (`us-east`, `eu-west`, `ap-south`)
with `us-east` as leader.

## 2. Launch via Docker

```bash
# Build the image once
docker build -f docker/cluster/Dockerfile -t tidal-cluster .

# Run (press Ctrl+C to stop)
docker run --rm -p 9500:9500 tidal-cluster
```

To supply custom schema/topology files:

```bash
docker run --rm -p 9500:9500 \
  -v $PWD/configs/my-schema.yaml:/srv/schema.yaml \
  -v $PWD/configs/my-topology.yaml:/srv/topology.yaml \
  tidal-cluster \
  tidal-server cluster \
    --listen 0.0.0.0:9500 \
    --schema /srv/schema.yaml \
    --topology /srv/topology.yaml
```

## 3. Core API calls

All routes are JSON unless noted.

### Health

```bash
curl http://localhost:9500/health
```

Returns overall status and item count on the leader.
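
When startup is scripted, it can help to wait for `/health` to respond before seeding data or sending traffic. The following is a minimal sketch, not part of `tidal-server`: the function name `wait_for_health`, its arguments, and the retry policy are assumptions made for illustration. It checks only the HTTP status, since the exact shape of the `/health` body is not specified here.

```bash
# Hypothetical helper: poll GET /health until it returns a success status
# or the attempts run out.
# Usage: wait_for_health [base_url] [attempts] [delay_seconds]
wait_for_health() {
  local base_url="${1:-http://localhost:9500}"
  local attempts="${2:-30}"
  local delay="${3:-1}"
  local i=1

  while [ "${i}" -le "${attempts}" ]; do
    # -f makes curl exit non-zero on HTTP errors (>= 400); -s hides progress output.
    if curl -fs -o /dev/null "${base_url}/health"; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "${delay}"
    i=$((i + 1))
  done

  echo "no healthy response after ${attempts} attempt(s)" >&2
  return 1
}
```

For example, `wait_for_health http://localhost:9500 60 1` waits up to roughly a minute and returns non-zero if the listener never comes up.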

### Register items & embeddings

```bash
curl -X POST http://localhost:9500/items \
  -H 'Content-Type: application/json' \
  -d '{
        "entity_id": 1,
        "metadata": { "title": "Jazz Piano", "category": "music" }
      }'

curl -X POST http://localhost:9500/embeddings \
  -H 'Content-Type: application/json' \
  -d '{ "entity_id": 1, "values": [0.1, 0.2, 0.3, 0.4] }'
```

### Record signals (cluster mode = global only)

```bash
curl -X POST http://localhost:9500/signals \
  -H 'Content-Type: application/json' \
  -d '{ "entity_id": 1, "signal": "view", "weight": 1.0 }'
```

### Retrieve and search

```bash
curl "http://localhost:9500/feed?user_id=42&profile=trending&limit=10"
curl "http://localhost:9500/search?query=jazz%20piano&limit=5"

# Target a specific region (followers may lag during partitions)
curl "http://localhost:9500/feed?profile=trending&region=eu-west"
```

## 4. Cluster operations

### Check cluster status

```bash
curl http://localhost:9500/cluster/status | jq
```

Sample response:

```json
{
  "leader": "us-east",
  "relay_log_len": 125,
  "regions": [
    { "name": "us-east", "applied_events": 125, "lag_events": 0, "partitioned": false },
    { "name": "eu-west", "applied_events": 125, "lag_events": 0, "partitioned": false },
    { "name": "ap-south", "applied_events": 124, "lag_events": 1, "partitioned": false }
  ]
}
```

### Promote a new leader

```bash
curl -X POST http://localhost:9500/cluster/promote \
  -H 'Content-Type: application/json' \
  -d '{ "region": "eu-west" }'
```

`/cluster/status` will now report `eu-west` as leader. New writes are routed
there and replayed to the other regions.
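
The promote-then-verify step can be combined into one call. The sketch below is illustrative only: `promote_and_verify` is a hypothetical helper, it assumes `jq` is installed, and it relies on the status body carrying the `leader` field shown in the sample response above. The response body of `/cluster/promote` is not documented here, so it is discarded.

```bash
# Hypothetical helper: promote a region, then confirm that
# /cluster/status reports it as leader.
# Usage: promote_and_verify <region> [base_url]
promote_and_verify() {
  local region="$1"
  local base_url="${2:-http://localhost:9500}"
  local leader

  # Build the JSON body with jq so the region name is quoted safely.
  curl -fsS -X POST "${base_url}/cluster/promote" \
    -H 'Content-Type: application/json' \
    -d "$(jq -nc --arg region "${region}" '{region: $region}')" \
    > /dev/null || return 1

  # Read back the leader name from the status endpoint.
  leader="$(curl -fsS "${base_url}/cluster/status" | jq -r '.leader')"

  if [ "${leader}" = "${region}" ]; then
    echo "leader is now ${leader}"
    return 0
  fi
  echo "expected leader ${region}, got ${leader:-nothing}" >&2
  return 1
}
```

Calling `promote_and_verify eu-west` returns non-zero if the promotion request fails or the status endpoint still names a different leader, which makes it usable as a gate in a maintenance script.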

### Simulate a partition & heal

```bash
# Isolate ap-south (writes will skip this follower)
curl -X POST http://localhost:9500/cluster/partition \
  -H 'Content-Type: application/json' \
  -d '{ "region": "ap-south" }'

# Heal the partition (missed batches are replayed automatically)
curl -X POST http://localhost:9500/cluster/heal \
  -H 'Content-Type: application/json' \
  -d '{ "region": "ap-south" }'
```

Monitor `/cluster/status` to confirm lag drops back to zero after healing.

## 5. Runbook checklist

1. **Startup**: launch `tidal-server cluster …` (or Docker). Confirm log line
   `listening on http://…`.
2. **Baseline health**: `GET /health` and `GET /cluster/status` return `200`.
3. **Seed data**: `POST /items`, `/embeddings`, `/signals` for initial items.
4. **Traffic**: microservices call `/signals`, `/feed`, `/search`. Add the
   `region` query param to pin to a follower for canary reads.
5. **Failover**: to move traffic during maintenance, `POST /cluster/promote`
   to the target region. Verify status before proceeding.
6. **Partition drill**: `POST /cluster/partition` to isolate a follower,
   observe lag, then `POST /cluster/heal`.
7. **Shutdown**: send SIGINT (Ctrl+C) or stop the container. The server logs
   `shutdown signal received` and exits cleanly.

Refer to `docs/planning/ROADMAP.md` for the underlying distributed fabric
guarantees and property tests.
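
For the partition drill in the checklist, the "monitor until lag drops to zero" step can be automated after `POST /cluster/heal`. This is a rough sketch under stated assumptions: `wait_for_zero_lag` is a hypothetical helper, `jq` must be installed, and the status body is assumed to match the sample response (a `regions` array with `name` and `lag_events` fields).

```bash
# Hypothetical helper: poll /cluster/status until the named region
# reports lag_events == 0, or give up after a number of attempts.
# Usage: wait_for_zero_lag <region> [base_url] [attempts] [delay_seconds]
wait_for_zero_lag() {
  local region="$1"
  local base_url="${2:-http://localhost:9500}"
  local attempts="${3:-30}"
  local delay="${4:-1}"
  local i=1
  local lag

  while [ "${i}" -le "${attempts}" ]; do
    # Pick the lag_events value for the region whose name matches.
    lag="$(curl -fsS "${base_url}/cluster/status" \
      | jq -r --arg r "${region}" \
          '.regions[] | select(.name == $r) | .lag_events')"

    if [ "${lag}" = "0" ]; then
      echo "${region} caught up"
      return 0
    fi

    echo "${region} lag_events=${lag:-unknown}"
    sleep "${delay}"
    i=$((i + 1))
  done

  return 1
}
```

Run it right after the heal request, for example `wait_for_zero_lag ap-south`. A non-zero exit means the follower had not caught up within the allotted attempts.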