
# iknowyou — Dev Setup
## Infrastructure
### GPU Server
| | |
|---|---|
| **Host** | `msd5685.mjhst.com` |
| **SSH** | `ssh ubuntu@msd5685.mjhst.com` |
| **GPU** | NVIDIA RTX 6000 Ada Generation (48 GB VRAM) |
| **RAM** | 94 GB |
| **CPUs** | 20 |
| **Disk** | 243 GB (172 GB free) |
| **OS** | Ubuntu 22.04, kernel 5.15.0-161 |
| **CUDA** | 13.0 (nvcc 13.0.88) |
| **Driver** | 535.288.01 |
| **Public IP** | 208.122.213.81 |
### vLLM + Qwen3-8B
**Model:** `Qwen/Qwen3-8B` (BF16, ~15.3 GB on GPU)
**API:** OpenAI-compatible at `http://msd5685.mjhst.com:8000/v1`
**Service:** systemd unit `vllm.service` — starts on boot, restarts on failure.
```bash
# Check status
ssh ubuntu@msd5685.mjhst.com "sudo systemctl status vllm"
# View logs
ssh ubuntu@msd5685.mjhst.com "sudo journalctl -u vllm -f"
# Restart
ssh ubuntu@msd5685.mjhst.com "sudo systemctl restart vllm"
```
**Config:** `/etc/systemd/system/vllm.service`
```ini
[Service]
ExecStart=/home/ubuntu/vllm-env/bin/vllm serve Qwen/Qwen3-8B \
    --host 0.0.0.0 \
    --port 8000 \
    --reasoning-parser qwen3 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.85
```
**Python env:** `/home/ubuntu/vllm-env` (Python 3.10, vLLM 0.15.1)
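
The flags above imply a rough memory budget, which the following sketch works through. It assumes the 48 GB and ~15.3 GB figures from this document are in the same units, and it uses layer and head counts that are assumed for Qwen3-8B (check them against the model's `config.json`). vLLM also reserves memory for activations and other overhead, so treat the result as an upper bound rather than a measurement.

```python
def kv_cache_budget_gb(vram_gb: float = 48.0,
                       utilization: float = 0.85,
                       weights_gb: float = 15.3) -> float:
    """Memory left for the KV cache after weights, within vLLM's share of VRAM."""
    return vram_gb * utilization - weights_gb


def kv_bytes_per_token(layers: int = 36,      # assumed Qwen3-8B value
                       kv_heads: int = 8,     # assumed Qwen3-8B value
                       head_dim: int = 128,   # assumed Qwen3-8B value
                       dtype_bytes: int = 2   # BF16
                       ) -> int:
    """Bytes of KV cache per token: one key and one value vector per layer."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes


def max_cached_tokens(budget_gb: float, bytes_per_token: int) -> int:
    """Upper bound on tokens that fit in the budget (1 GB taken as 1e9 bytes)."""
    return int(budget_gb * 1e9 // bytes_per_token)
```

Dividing `max_cached_tokens(kv_cache_budget_gb(), kv_bytes_per_token())` by 32768 gives a rough ceiling on how many full-length sequences could be cached at once.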
## Using the API
### Chat completion
```bash
curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello"}
    ],
    "temperature": 0.7,
    "top_p": 0.8,
    "max_tokens": 512
  }'
```
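
For calling the same endpoint from application code, here is a minimal sketch using only the Python standard library. The helper names `build_chat_request` and `chat` are hypothetical; the URL, model name, and payload fields are the ones used in the curl call above.

```python
import json
import urllib.request

# Base URL and model name are taken from this document.
BASE_URL = "http://msd5685.mjhst.com:8000/v1"
MODEL = "Qwen/Qwen3-8B"


def build_chat_request(messages, base_url=BASE_URL, **params):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {"model": MODEL, "messages": messages, **params}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, timeout=60, **params):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(messages, **params)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

A call such as `chat([{"role": "user", "content": "Hello"}], temperature=0.7, top_p=0.8, max_tokens=512)` mirrors the curl example. Building the request separately from sending it keeps the payload easy to inspect without a live server.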
### Thinking mode
Qwen3 reasons before answering by default. Toggle this per turn with `/think` or `/no_think` in the user message, or per request with `chat_template_kwargs`:
```bash
# Thinking enabled (default: the model reasons in <think> blocks before answering)
curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "What is 23 * 47?"}],
    "temperature": 0.6,
    "top_p": 0.95
  }'

# Thinking disabled (faster, no reasoning trace)
curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "What is 23 * 47?"}],
    "temperature": 0.7,
    "top_p": 0.8,
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```
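**Recommended sampling** (`top_k` is a vLLM extension to the OpenAI schema; pass it as a top-level field in the request body):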
- Thinking mode: `temperature=0.6, top_p=0.95, top_k=20`
- Non-thinking mode: `temperature=0.7, top_p=0.8, top_k=20`
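
The two presets and the thinking switch can be bundled into one helper, sketched below. The function names are hypothetical. The field that carries the reasoning trace is an assumption: with `--reasoning-parser qwen3` the trace is expected in a separate message field, whose name I believe has varied between vLLM releases, so the sketch tries two names and then falls back to an inline `<think>` block.

```python
import re

# Presets copied from the "Recommended sampling" list above.
THINKING = {"temperature": 0.6, "top_p": 0.95, "top_k": 20}
NON_THINKING = {"temperature": 0.7, "top_p": 0.8, "top_k": 20}

_THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def request_params(thinking: bool) -> dict:
    """Return sampling parameters plus the thinking switch for one request."""
    params = dict(THINKING if thinking else NON_THINKING)
    params["chat_template_kwargs"] = {"enable_thinking": thinking}
    return params


def split_reasoning(message: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat completion message dict."""
    content = message.get("content") or ""
    # Assumption: the parser moves the trace into its own field.
    # Both candidate field names are guesses to verify on the live server.
    reasoning = message.get("reasoning_content") or message.get("reasoning")
    if reasoning:
        return reasoning.strip(), content.strip()
    # Fallback: the trace is still inline as a <think>...</think> block.
    match = _THINK_BLOCK.search(content)
    if match:
        answer = _THINK_BLOCK.sub("", content, count=1)
        return match.group(1).strip(), answer.strip()
    return "", content.strip()
```

Merging `request_params(False)` into a request body reproduces the non-thinking curl call above, with `top_k` included.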
### Structured output (for Observer)
```bash
curl http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Extract sentiment from: I love this idea!"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "sentiment",
        "schema": {
          "type": "object",
          "properties": {
            "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
            "confidence": {"type": "number"}
          },
          "required": ["sentiment", "confidence"]
        }
      }
    },
    "chat_template_kwargs": {"enable_thinking": false}
  }'
```
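
Even with a schema in the request, a consumer such as the Observer may want to check the reply before trusting it. The sketch below parses the standard chat completion response shape and re-checks the two fields from the schema above. The function name `parse_sentiment` is hypothetical.

```python
import json

# Enum values copied from the schema in the request above.
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_sentiment(response: dict) -> dict:
    """Extract and validate the JSON object from a chat completion response."""
    content = response["choices"][0]["message"]["content"]
    data = json.loads(content)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    sentiment = data.get("sentiment")
    confidence = data.get("confidence")
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError(f"confidence is not a number: {confidence!r}")
    return {"sentiment": sentiment, "confidence": float(confidence)}
```

All failure modes surface as `ValueError`, so a caller can retry or skip the item with one `except` clause.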
### Streaming
```bash
# -N disables curl's output buffering so chunks print as they arrive
curl -N http://msd5685.mjhst.com:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B",
    "messages": [{"role": "user", "content": "Tell me a short story."}],
    "stream": true,
    "temperature": 0.7
  }'
```
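
With `"stream": true` the server replies with server-sent events: each chunk arrives on a `data: {...}` line and the stream ends with `data: [DONE]`. A minimal sketch of consuming that format follows. The function name is hypothetical, and it assumes the usual OpenAI-style chunk shape with text under `choices[].delta.content`.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style SSE stream, line by line."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

The response object returned by `urllib.request.urlopen` iterates over lines as bytes, so it can be passed to `iter_stream_text` directly and each fragment printed as it arrives.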
### Check model status
```bash
curl http://msd5685.mjhst.com:8000/v1/models
curl http://msd5685.mjhst.com:8000/health
```
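
The two checks above can be combined into a readiness probe, for example to gate a test run on the model being loaded. This is a sketch with hypothetical function names; it assumes `/v1/models` returns the usual OpenAI-style list with model entries under `data`.

```python
import json
import urllib.request

BASE = "http://msd5685.mjhst.com:8000"  # host from this document


def served_model_ids(models_response: dict) -> list[str]:
    """Return the model ids listed in a /v1/models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def is_ready(base: str = BASE, model: str = "Qwen/Qwen3-8B",
             timeout: float = 5.0) -> bool:
    """True if /health answers 200 and the expected model is being served."""
    try:
        with urllib.request.urlopen(f"{base}/health", timeout=timeout) as r:
            if r.status != 200:
                return False
        with urllib.request.urlopen(f"{base}/v1/models", timeout=timeout) as r:
            return model in served_model_ids(json.load(r))
    except (OSError, ValueError):
        # OSError covers connection errors and timeouts; ValueError covers bad JSON.
        return False
```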
## NVIDIA Driver Notes
On first setup the server had a driver version mismatch: the loaded kernel module was 535.274 while the userspace libraries were 535.288. The fix was to reload the kernel modules:
```bash
# Unload old modules
sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia
# Reload with new version
sudo modprobe nvidia && sudo modprobe nvidia_uvm
```
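After a reboot, the DKMS-built 535.288 module loads automatically. If `nvidia-smi` ever shows "Driver/library version mismatch" again, either reboot or run the rmmod/modprobe sequence above. `rmmod` fails while a module is in use, so stop anything holding the GPU first (for example `sudo systemctl stop vllm`) and restart it afterwards.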
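
To catch the mismatch before vLLM fails to start, a monitoring script could look for that message in the `nvidia-smi` output. The sketch below is one way to do it; the function names are hypothetical, and it assumes `nvidia-smi` is on the `PATH` of the machine where it runs.

```python
import subprocess

# Error text quoted in the paragraph above.
MISMATCH_MARKER = "Driver/library version mismatch"


def has_version_mismatch(nvidia_smi_output: str) -> bool:
    """True if the captured nvidia-smi output reports a driver mismatch."""
    return MISMATCH_MARKER in nvidia_smi_output


def check_driver() -> bool:
    """Run nvidia-smi locally and report whether it shows a mismatch.

    Raises FileNotFoundError if nvidia-smi is not installed.
    """
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    # The message may land on stdout or stderr, so inspect both.
    return has_version_mismatch(result.stdout + result.stderr)
```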
## Topology
```
Local machine (macOS)
    │  SSH tunnel or direct HTTP
msd5685.mjhst.com (Ubuntu 22.04)
    ├── vLLM (systemd, port 8000)
    │     └── Qwen/Qwen3-8B (BF16, 48GB RTX 6000 Ada)
    └── [future] iknowyou server (port TBD)
          └── embedded tidalDB
```
For local development, use an SSH tunnel to reach the API:
```bash
ssh -L 8000:localhost:8000 ubuntu@msd5685.mjhst.com
# Then: curl http://localhost:8000/v1/models
```
Alternatively, call the API directly at `http://msd5685.mjhst.com:8000`. This only works if port 8000 is open in the server's firewall.
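
Client code can support both routes by probing the tunnel first and falling back to the direct address. The following is a sketch only: the helper names are hypothetical, and it assumes that whatever answers on `localhost:8000` with a 200 from `/health` is the tunnelled vLLM server.

```python
import urllib.request
from typing import Callable, Iterable, Optional

# Tunnel endpoint first, then the direct address from this document.
CANDIDATES = ("http://localhost:8000", "http://msd5685.mjhst.com:8000")


def health_ok(base_url: str, timeout: float = 3.0) -> bool:
    """True if GET <base_url>/health returns HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as r:
            return r.status == 200
    except OSError:  # connection refused, timeout, DNS failure, HTTP error
        return False


def pick_base_url(candidates: Iterable[str] = CANDIDATES,
                  probe: Callable[[str], bool] = health_ok) -> Optional[str]:
    """Return the first candidate that passes the probe, or None."""
    for url in candidates:
        if probe(url):
            return url
    return None
```

Passing the probe in as a parameter lets the selection logic be exercised without network access.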