# iknowyou — Dev Setup

## Infrastructure

### Local Personalization Engine (tidalDB-backed)

Run the personalization engine server locally (default bind: `127.0.0.1:7777`):

```bash
cargo run -p iknowyou-engine --bin server --features synap-aux
```

Environment variables:

- `IKY_ENGINE_BIND` (default `127.0.0.1:7777`)
- `IKY_ENGINE_DATA_DIR` (default temp dir `iknowyou_engine_data`)
- `IKY_ENGINE_URL` (used by Next.js API route; default `http://127.0.0.1:7777`)
- `SYNAP_URL` / `SYNAP_API_KEY` (optional; enables auxiliary memory writes only)

Health check:

```bash
curl http://127.0.0.1:7777/healthz
```

The `app/api/chat/route.ts` path now writes observer-driven personalization feedback to this service (`/v1/feedback`, `/v1/sessions/*`) while Synap remains optional auxiliary memory.
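For scripts that need to wait for the engine before sending traffic, the health check can be wrapped in a few lines of Python. This is a minimal sketch, not part of the repo: `engine_url` and `engine_is_healthy` are hypothetical helper names, and the only assumptions are the `IKY_ENGINE_URL` default and the `/healthz` path documented above.

```python
import os
import urllib.request

# Documented default for IKY_ENGINE_URL.
DEFAULT_ENGINE_URL = "http://127.0.0.1:7777"


def engine_url(env=None):
    """Return the engine base URL from IKY_ENGINE_URL, or the documented default."""
    env = os.environ if env is None else env
    return env.get("IKY_ENGINE_URL", DEFAULT_ENGINE_URL).rstrip("/")


def engine_is_healthy(base_url, timeout=2.0):
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts derive from OSError.
        return False


if __name__ == "__main__":
    url = engine_url()
    print(url, "healthy" if engine_is_healthy(url) else "unreachable")
```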

### GPU Server

| Property | Value |
|---|---|
| **Host** | `msd5685.mjhst.com` |
| **SSH** | `ssh ubuntu@msd5685.mjhst.com` |
| **GPU** | NVIDIA RTX 6000 Ada Generation (48 GB VRAM) |
| **RAM** | 94 GB |
| **CPUs** | 20 |
| **Disk** | 243 GB (172 GB free) |
| **OS** | Ubuntu 22.04, kernel 5.15.0-161 |
| **CUDA** | 13.0 (nvcc 13.0.88) |
| **Driver** | 535.288.01 |
| **Public IP** | 208.122.213.81 |

### vLLM + Qwen3-8B

**Model:** `Qwen/Qwen3-8B` (BF16, ~15.3 GB on GPU)

**API:** OpenAI-compatible at `http://msd5685.mjhst.com:8000/v1`

**Service:** systemd unit `vllm.service` (starts on boot, restarts on failure).

```bash
# Check status
ssh ubuntu@msd5685.mjhst.com "sudo systemctl status vllm"

# View logs
ssh ubuntu@msd5685.mjhst.com "sudo journalctl -u vllm -f"

# Restart
ssh ubuntu@msd5685.mjhst.com "sudo systemctl restart vllm"
```

**Config:** `/etc/systemd/system/vllm.service`

```ini
[Service]
ExecStart=/home/ubuntu/vllm-env/bin/vllm serve Qwen/Qwen3-8B \
    --host 0.0.0.0 \
    --port 8000 \
    --reasoning-parser qwen3 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.85
```

**Python env:** `/home/ubuntu/vllm-env` (Python 3.10, vLLM 0.15.1)

## Using the API

### Chat completion

```bash
curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "max_tokens": 512
  }'
```
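The same request can be issued from Python with only the standard library. The sketch below mirrors the curl call above; `build_chat_request` and `chat` are hypothetical helper names, and the response is read on the assumption that it follows the OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

VLLM_BASE_URL = "http://msd5685.mjhst.com:8000/v1"


def build_chat_request(user_text, system_text="You are a helpful assistant."):
    """Build the same JSON body as the curl example."""
    return {
        "model": "Qwen/Qwen3-8B",
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
        "temperature": 0.7,
        "top_p": 0.8,
        "max_tokens": 512,
    }


def chat(user_text, base_url=VLLM_BASE_URL, timeout=60.0):
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumes the OpenAI-compatible response layout.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Hello"))
```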

### Thinking mode

Qwen3 lets you toggle thinking with `/think` and `/no_think` in the user message, or per request via `chat_template_kwargs`:

```bash
# Thinking enabled (default; the model reasons in <think> blocks before answering)
curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "What is 23 * 47?"}],
    "temperature": 0.6,
    "top_p": 0.95
  }'

# Thinking disabled (faster, no reasoning trace)
curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "What is 23 * 47?"}],
    "temperature": 0.7,
    "top_p": 0.8,
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```

**Recommended sampling:**
- Thinking mode: `temperature=0.6, top_p=0.95, top_k=20`
- Non-thinking mode: `temperature=0.7, top_p=0.8, top_k=20`
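To keep the toggle and the sampling settings in sync, a client can derive both from a single flag. The sketch below is one way to do that; `build_request` is a hypothetical helper, and sending `top_k` as a top-level request field is an assumption about the vLLM server, since the curl examples above do not exercise it.

```python
MODEL = "Qwen/Qwen3-8B"

# Recommended sampling settings from the list above, keyed by thinking on/off.
SAMPLING = {
    True: {"temperature": 0.6, "top_p": 0.95, "top_k": 20},
    False: {"temperature": 0.7, "top_p": 0.8, "top_k": 20},
}


def build_request(prompt, thinking=True):
    """Return a chat-completions body with mode-appropriate sampling."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # Assumption: the server accepts top_k alongside temperature and top_p.
        **SAMPLING[thinking],
    }
    if not thinking:
        # Thinking is on by default, so only the "off" case needs the kwarg.
        body["chat_template_kwargs"] = {"enable_thinking": False}
    return body
```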

### Structured output (for Observer)

```bash
curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Extract sentiment from: I love this idea!"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "sentiment",
        "schema": {
          "type": "object",
          "properties": {
            "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
            "confidence": {"type": "number"}
          },
          "required": ["sentiment", "confidence"]
        }
      }
    },
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```
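On the consuming side, the schema-constrained reply arrives as a JSON string in the message content and still has to be decoded. A minimal sketch of that step follows, assuming the OpenAI-compatible response layout; `parse_sentiment` is a hypothetical helper and re-checks the two required fields defensively instead of trusting the server.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_sentiment(response):
    """Decode a chat-completions response into (sentiment, confidence)."""
    # Assumes the JSON document is carried in choices[0].message.content.
    content = response["choices"][0]["message"]["content"]
    data = json.loads(content)

    sentiment = data["sentiment"]
    confidence = data["confidence"]
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError(f"confidence is not a number: {confidence!r}")
    return sentiment, float(confidence)
```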

### Streaming

```bash
curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Tell me a short story."}],
    "stream": true,
    "temperature": 0.7
  }'
```
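With `"stream": true` the server replies with server-sent events: one `data: {...}` line per chunk and a final `data: [DONE]`. The sketch below shows how a client might reassemble the text; `iter_stream_text` is a hypothetical helper, and the chunk layout (`choices[].delta.content`) is assumed to follow the OpenAI streaming format.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines (bytes or str)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

An HTTP response object returned by `urllib.request.urlopen` can be passed in directly, since iterating over it yields one line of bytes at a time.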

### Check model status

```bash
curl http://msd5685.mjhst.com:8000/v1/models
curl http://msd5685.mjhst.com:8000/health
```
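A startup script can use the `/v1/models` reply to confirm that the expected model is actually being served. This is a sketch under the assumption that the reply follows the OpenAI list shape (`{"data": [{"id": ...}]}`); `served_model_ids`, `is_model_served` and `fetch_models` are hypothetical names.

```python
import json
import urllib.request


def served_model_ids(models_response):
    """Extract model IDs from a decoded /v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def is_model_served(models_response, model_id="Qwen/Qwen3-8B"):
    """Return True if model_id appears in the /v1/models listing."""
    return model_id in served_model_ids(models_response)


def fetch_models(base_url="http://msd5685.mjhst.com:8000", timeout=5.0):
    """GET <base_url>/v1/models and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(is_model_served(fetch_models()))
```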

## NVIDIA Driver Notes

On first setup the server had a driver version mismatch: the loaded kernel module was 535.274 while the userspace libraries were 535.288. It was fixed by unloading the stale modules and loading the new ones:

```bash
# Unload old modules
sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia
# Reload with new version
sudo modprobe nvidia && sudo modprobe nvidia_uvm
```

After a reboot, the DKMS-built 535.288 module loads automatically. If `nvidia-smi` ever shows "Driver/library version mismatch" again, either reboot or run the rmmod/modprobe sequence above.
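The mismatch is easy to detect automatically, for example from a monitoring script on the GPU host. The sketch below only looks for the error string quoted above in the output of `nvidia-smi`; `has_version_mismatch` and `check_driver` are hypothetical helpers, and the status strings they return are illustrative.

```python
import subprocess

MISMATCH_MARKER = "Driver/library version mismatch"


def has_version_mismatch(output):
    """Return True if nvidia-smi output contains the mismatch error."""
    return MISMATCH_MARKER in output


def check_driver():
    """Run nvidia-smi and return a short status string."""
    try:
        proc = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30
        )
    except FileNotFoundError:
        return "nvidia-smi not found"
    except subprocess.TimeoutExpired:
        return "nvidia-smi timed out"
    # Check both streams so it does not matter where the error is printed.
    if has_version_mismatch(proc.stdout + proc.stderr):
        return "mismatch: reboot or reload the nvidia modules"
    return "ok" if proc.returncode == 0 else f"nvidia-smi failed (exit {proc.returncode})"


if __name__ == "__main__":
    print(check_driver())
```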

## Topology

```
Local machine (macOS)
    │
    │  SSH tunnel or direct HTTP
    │
    ▼
msd5685.mjhst.com (Ubuntu 22.04)
    │
    ├── vLLM (systemd, port 8000)
    │     └── Qwen/Qwen3-8B (BF16, 48GB RTX 6000 Ada)
    │
    └── [future] iknowyou server (port TBD)
          └── embedded tidalDB
```

For local development, use an SSH tunnel to reach the API:

```bash
ssh -L 8000:localhost:8000 ubuntu@msd5685.mjhst.com
# Then: curl http://localhost:8000/v1/models
```

Alternatively, call it directly at `http://msd5685.mjhst.com:8000`; this requires port 8000 to be open in the firewall.
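Client code that should work with either route can probe both and use whichever answers first. The following is a rough sketch that only tests TCP reachability, so it does not prove the vLLM API itself is healthy; `port_open` and `pick_base_url` are hypothetical helpers, and preferring the tunnel over the direct URL is an arbitrary choice.

```python
import socket
from urllib.parse import urlsplit

# Tunnel first, then the direct URL from this document.
CANDIDATES = ("http://localhost:8000", "http://msd5685.mjhst.com:8000")


def port_open(url, timeout=1.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def pick_base_url(candidates=CANDIDATES, timeout=1.0):
    """Return the first reachable base URL, or None if none respond."""
    for url in candidates:
        if port_open(url, timeout):
            return url
    return None


if __name__ == "__main__":
    print(pick_base_url())
```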