jordan eca7765e8d fix: heal_region re-delivers missed WAL batches so partitioned followers converge immediately after heal

- Extract redeliver_missed(tx, db, log) helper into cluster_transport.rs
- heal_region now removes partition then immediately ships any missed
  batch-log entries to the healed follower's channel
- await_convergence refactored to call the same helper (no logic change)
- tidal-server: reload_text_index before search in cluster mode
- tidal-server: write_signal returns Result instead of panicking on unknown signal
- tidal-server: leader shows lag_events=0 (writes directly, no receiver thread)
- tidal-server: fix cluster mode error propagation (ServerError::from)
- docs/runbooks/cluster.md: add full cluster operations runbook
- docker/: add Dockerfile for containerised cluster deployment
- README.md: add tidal-server HTTP API getting-started section
- Split oversized source files per CODING_GUIDELINES §9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-25 11:57:01 -07:00


iknowyou — Dev Setup

Infrastructure

Local Personalization Engine (tidalDB-backed)

Run the personalization engine server locally (default bind: 127.0.0.1:7777):

cargo run -p iknowyou-engine --bin server --features synap-aux

Environment variables:

IKY_ENGINE_BIND (default 127.0.0.1:7777)
IKY_ENGINE_DATA_DIR (default temp dir iknowyou_engine_data)
IKY_ENGINE_URL (used by Next.js API route; default http://127.0.0.1:7777)
SYNAP_URL / SYNAP_API_KEY (optional; enables auxiliary memory writes only)

Health check:

curl http://127.0.0.1:7777/healthz
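The health check can also be scripted, for example to wait for the engine before starting the Next.js app. The following is a minimal sketch using only the Python standard library. It assumes /healthz answers with a 2xx status when the engine is ready (this document only shows the curl call), and the helper names `engine_url`, `http_ok`, and `wait_for_engine` are hypothetical.

```python
import os
import time
import urllib.error
import urllib.request

DEFAULT_ENGINE_URL = "http://127.0.0.1:7777"  # default listed for IKY_ENGINE_URL


def engine_url(env=None):
    """Resolve the engine base URL from IKY_ENGINE_URL, falling back to the default."""
    env = os.environ if env is None else env
    return env.get("IKY_ENGINE_URL", DEFAULT_ENGINE_URL).rstrip("/")


def http_ok(url, timeout=2.0):
    """Return True if a GET on url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_for_engine(base_url, attempts=10, delay=0.5, probe=http_ok):
    """Poll /healthz until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe(base_url + "/healthz"):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("engine ready:", wait_for_engine(engine_url()))
```

The `probe` parameter is injectable so the polling logic can be exercised without a running server.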

The chat route (app/api/chat/route.ts) now writes observer-driven personalization feedback to this service via /v1/feedback and /v1/sessions/*. Synap remains an optional auxiliary memory.

GPU Server


Host: msd5685.mjhst.com
SSH: ssh ubuntu@msd5685.mjhst.com
GPU: NVIDIA RTX 6000 Ada Generation (48 GB VRAM)
RAM: 94 GB
CPUs: 20
Disk: 243 GB (172 GB free)
OS: Ubuntu 22.04, kernel 5.15.0-161
CUDA: 13.0 (nvcc 13.0.88)
Driver: 535.288.01
Public IP: 208.122.213.81

vLLM + Qwen3-8B

Model: Qwen/Qwen3-8B (BF16, ~15.3 GB on GPU)

API: OpenAI-compatible at http://msd5685.mjhst.com:8000/v1

Service: systemd unit vllm.service — starts on boot, restarts on failure.

# Check status
ssh ubuntu@msd5685.mjhst.com "sudo systemctl status vllm"

# View logs
ssh ubuntu@msd5685.mjhst.com "sudo journalctl -u vllm -f"

# Restart
ssh ubuntu@msd5685.mjhst.com "sudo systemctl restart vllm"

Config: /etc/systemd/system/vllm.service

[Service]
ExecStart=/home/ubuntu/vllm-env/bin/vllm serve Qwen/Qwen3-8B \
  --host 0.0.0.0 \
  --port 8000 \
  --reasoning-parser qwen3 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.85

Python env: /home/ubuntu/vllm-env (Python 3.10, vLLM 0.15.1)
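To make the --gpu-memory-utilization 0.85 and --max-model-len 32768 flags in the unit file concrete, here is a rough back-of-the-envelope calculation. It treats the 48 GB and ~15.3 GB figures as the same unit and ignores activation and CUDA graph overhead, so the real KV cache capacity will be lower. The attention shape (36 layers, 8 KV heads, head dimension 128) is an assumption about Qwen3-8B that is not stated in this document; check the model's config.json before relying on it.

```python
GPU_VRAM_GB = 48.0          # from the GPU server details
GPU_MEM_UTILIZATION = 0.85  # --gpu-memory-utilization
MODEL_WEIGHTS_GB = 15.3     # Qwen3-8B in BF16, as stated above
MAX_MODEL_LEN = 32768       # --max-model-len

# Assumed Qwen3-8B attention shape (NOT from this document; verify in config.json).
NUM_LAYERS = 36
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # BF16


def budget_gb():
    """Memory vLLM is allowed to claim on the GPU."""
    return GPU_VRAM_GB * GPU_MEM_UTILIZATION


def kv_headroom_gb():
    """What is left for KV cache and overhead after loading the weights."""
    return budget_gb() - MODEL_WEIGHTS_GB


def kv_bytes_per_token():
    """Keys plus values, across all layers, for a single token."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


def max_cached_tokens():
    """Upper bound on tokens that fit in the headroom (overhead ignored)."""
    return int(kv_headroom_gb() * 1e9 // kv_bytes_per_token())


if __name__ == "__main__":
    print(f"budget: {budget_gb():.1f} GB, headroom: {kv_headroom_gb():.1f} GB")
    print(f"KV cache per token: {kv_bytes_per_token() / 1024:.0f} KiB")
    print(f"full-length sequences that fit: {max_cached_tokens() // MAX_MODEL_LEN}")
```

Under these assumptions the budget is about 40.8 GB, leaving roughly 25.5 GB after the weights, which is enough for several concurrent full-length sequences.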

Using the API

Chat completion

curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "max_tokens": 512
  }'
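The same request can be sent from Python without extra dependencies. This is a minimal sketch using the standard library; the helper names are hypothetical. The response is assumed to follow the OpenAI chat completion shape (`choices[0].message.content`). Because the server runs with --reasoning-parser qwen3, the reasoning trace may arrive in a separate message field; its name varies between vLLM versions, so the sketch checks both `reasoning_content` and `reasoning`.

```python
import json
import urllib.request

BASE_URL = "http://msd5685.mjhst.com:8000/v1"
MODEL = "Qwen/Qwen3-8B"


def build_chat_payload(user_text, system_text="You are a helpful assistant.",
                       temperature=0.7, top_p=0.8, max_tokens=512):
    """Build the same JSON body as the curl example above."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }


def post_json(url, payload, timeout=60.0):
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def extract_reply(response):
    """Return (answer, reasoning); reasoning is None when the field is absent."""
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content") or message.get("reasoning")
    return message.get("content"), reasoning


if __name__ == "__main__":
    reply = post_json(BASE_URL + "/chat/completions", build_chat_payload("Hello"))
    answer, reasoning = extract_reply(reply)
    print(answer)
```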

Thinking mode

Qwen3 lets you toggle thinking with /think or /no_think in the user message, or with enable_thinking in chat_template_kwargs:

# Thinking enabled (default — model reasons in <think> blocks before answering)
curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "What is 23 * 47?"}],
    "temperature": 0.6,
    "top_p": 0.95
  }'

# Thinking disabled (faster, no reasoning trace)
curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "What is 23 * 47?"}],
    "temperature": 0.7,
    "top_p": 0.8,
    "chat_template_kwargs": {"enable_thinking": false}
  }'

Recommended sampling:

Thinking mode: temperature=0.6, top_p=0.95, top_k=20
Non-thinking mode: temperature=0.7, top_p=0.8, top_k=20
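One way to keep the recommended sampling settings and the thinking toggle in sync is to derive both from a single flag. The sketch below is illustrative only; `SAMPLING_PRESETS` and `build_payload` are hypothetical names. It passes `top_k` as a top-level request field, which vLLM accepts as an extra sampling parameter but which is not part of the standard OpenAI schema.

```python
MODEL = "Qwen/Qwen3-8B"

# Recommended sampling values from the list above, keyed by "thinking enabled?".
SAMPLING_PRESETS = {
    True: {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    False: {"temperature": 0.7, "top_p": 0.8, "top_k": 20},
}


def build_payload(prompt, thinking=True):
    """Build a chat completion body with matching sampling and thinking settings."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        **SAMPLING_PRESETS[thinking],
    }
    if not thinking:
        # Thinking is the default, so the flag is only sent to turn it off.
        payload["chat_template_kwargs"] = {"enable_thinking": False}
    return payload


if __name__ == "__main__":
    print(build_payload("What is 23 * 47?", thinking=False))
```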

Structured output (for Observer)

curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Extract sentiment from: I love this idea!"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "sentiment",
        "schema": {
          "type": "object",
          "properties": {
            "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
            "confidence": {"type": "number"}
          },
          "required": ["sentiment", "confidence"]
        }
      }
    },
    "chat_template_kwargs": {"enable_thinking": false}
  }'
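With a json_schema response format, the structured result comes back as a JSON string in the message content, so the caller still has to parse it. A defensive sketch for the sentiment schema above follows. The function name `parse_sentiment` is hypothetical, and the checks are hand-written rather than a full JSON Schema validation.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_sentiment(content):
    """Parse the message content and check it against the sentiment schema."""
    data = json.loads(content)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    sentiment = data.get("sentiment")
    confidence = data.get("confidence")
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError(f"confidence is not a number: {confidence!r}")
    return sentiment, float(confidence)


if __name__ == "__main__":
    print(parse_sentiment('{"sentiment": "positive", "confidence": 0.97}'))
```

Validating on the client side is cheap and guards against truncated output, for example when a response hits the token limit before the JSON object is closed.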

Streaming

curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Tell me a short story."}],
    "stream": true,
    "temperature": 0.7
  }'
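With "stream": true the server answers with server-sent events: lines of the form `data: {...}` carrying incremental deltas, ending with `data: [DONE]`. The sketch below shows one way to reassemble the text from such lines. The function name is hypothetical, and the sketch only collects `delta.content`; reasoning deltas, which the reasoning parser may emit in a separate field, are ignored.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (str or bytes)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


if __name__ == "__main__":
    sample = [
        b'data: {"choices": [{"delta": {"role": "assistant"}}]}\n',
        b"\n",
        b'data: {"choices": [{"delta": {"content": "Once "}}]}\n',
        b'data: {"choices": [{"delta": {"content": "upon a time"}}]}\n',
        b"data: [DONE]\n",
    ]
    print("".join(iter_stream_text(sample)))
```

An HTTP response object from `urllib.request.urlopen` can be passed directly as `lines`, since iterating over it yields the body line by line as bytes.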

Check model status

curl http://msd5685.mjhst.com:8000/v1/models
curl http://msd5685.mjhst.com:8000/health
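For scripted checks, the /v1/models response can be inspected to confirm that the expected model is actually being served. The sketch below assumes the OpenAI-style list shape (`{"data": [{"id": ...}, ...]}`); the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://msd5685.mjhst.com:8000"
EXPECTED_MODEL = "Qwen/Qwen3-8B"


def fetch_models(base_url=BASE_URL, timeout=5.0):
    """GET /v1/models and decode the JSON body."""
    with urllib.request.urlopen(base_url + "/v1/models", timeout=timeout) as resp:
        return json.load(resp)


def served_model_ids(models_response):
    """Extract the model ids from a /v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def is_model_served(models_response, model_id=EXPECTED_MODEL):
    """True if model_id appears in the list of served models."""
    return model_id in served_model_ids(models_response)


if __name__ == "__main__":
    print("model served:", is_model_served(fetch_models()))
```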

NVIDIA Driver Notes

On first setup the server had a driver version mismatch: the loaded kernel module was 535.274 while the userspace libraries were 535.288. It was fixed by unloading the old modules and loading the new ones:

# Unload old modules
sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia
# Reload with new version
sudo modprobe nvidia && sudo modprobe nvidia_uvm

After a reboot, the DKMS-built 535.288 module loads automatically. If nvidia-smi ever shows "Driver/library version mismatch" again, either reboot or run the rmmod/modprobe sequence above.
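The mismatch can be detected from a script before it surfaces as a vLLM startup failure. The sketch below looks for the error string quoted above in the nvidia-smi output and reads the loaded kernel module version from /proc/driver/nvidia/version. The exact layout of that file is an assumption here (a line starting with "NVRM version" that contains a version such as 535.288.01), and the helper names are hypothetical.

```python
import re
import subprocess

MISMATCH_MARKER = "Driver/library version mismatch"


def has_version_mismatch(nvidia_smi_output):
    """True if nvidia-smi reported the driver/library mismatch error."""
    return MISMATCH_MARKER in nvidia_smi_output


def kernel_module_version(proc_text):
    """Pull the version out of the NVRM line of /proc/driver/nvidia/version."""
    for line in proc_text.splitlines():
        if line.startswith("NVRM version"):
            match = re.search(r"\b(\d{3}\.\d+(?:\.\d+)?)\b", line)
            return match.group(1) if match else None
    return None


def run_nvidia_smi():
    """Run nvidia-smi and return stdout and stderr combined."""
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


if __name__ == "__main__":
    if has_version_mismatch(run_nvidia_smi()):
        with open("/proc/driver/nvidia/version") as fh:
            print("loaded kernel module:", kernel_module_version(fh.read()))
        print("reboot, or run the rmmod/modprobe sequence")
```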

Topology

Local machine (macOS)
  │
  │  SSH tunnel or direct HTTP
  │
  ▼
msd5685.mjhst.com (Ubuntu 22.04)
  │
  ├── vLLM (systemd, port 8000)
  │     └── Qwen/Qwen3-8B (BF16, 48GB RTX 6000 Ada)
  │
  └── [future] iknowyou server (port TBD)
        └── embedded tidalDB

For local development, use an SSH tunnel to reach the API:

ssh -L 8000:localhost:8000 ubuntu@msd5685.mjhst.com
# Then: curl http://localhost:8000/v1/models

Alternatively, call it directly at http://msd5685.mjhst.com:8000 (port 8000 must be open in the firewall).
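Client code can choose between the two access paths automatically by checking whether the tunnel port is listening locally. This is a minimal sketch; `port_open` and `pick_base_url` are hypothetical helpers. It assumes that anything listening on local port 8000 is the SSH tunnel, which will not hold if another service uses that port.

```python
import socket

REMOTE_HOST = "msd5685.mjhst.com"
VLLM_PORT = 8000


def port_open(host, port, timeout=1.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_base_url(tunnel_port=VLLM_PORT, probe=port_open):
    """Prefer the local SSH tunnel, fall back to the direct address."""
    if probe("localhost", tunnel_port):
        return f"http://localhost:{tunnel_port}/v1"
    return f"http://{REMOTE_HOST}:{VLLM_PORT}/v1"


if __name__ == "__main__":
    print("using", pick_base_url())
```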
