# Network Requirements

**Network configuration for StemeDB deployments**

---

## Port Scheme (181XX)

StemeDB uses ports in the `181XX` range for all services:

| Port | Protocol | Service | Purpose | Expose To |
|------|----------|---------|---------|-----------|
| **18180** | TCP/HTTP | API Server | Queries, ingest, metrics | Clients (via reverse proxy) |
| **18181** | TCP/HTTP | Cluster Gateway | Cluster coordination, admin endpoints | Internal network only |
| **18182** | TCP/gRPC | Cluster RPC | Assertion replication | Cluster nodes only |
| **18183** | UDP | SWIM Gossip | Membership, failure detection | Cluster nodes only |
| 18184 | TCP/HTTP | (Reserved for future metrics) | - | - |
| 18185 | TCP/HTTP | (Reserved for future admin) | - | - |
| 18186-18189 | - | (Reserved for applications) | - | - |

---

## Firewall Rules

### Single-Node Deployment

**Allow inbound:**
- Port 18180 from load balancer/reverse proxy (or internal network)
- Port 22 (SSH) from bastion host

**Block:**
- Port 18180 from public internet (use reverse proxy)
- Ports 18181-18183 (not used in single-node)

**AWS Security Group:**

```bash
# Allow API from load balancer
aws ec2 authorize-security-group-ingress \
  --group-id sg-stemedb \
  --source-group sg-load-balancer \
  --protocol tcp \
  --port 18180

# Allow SSH from bastion
aws ec2 authorize-security-group-ingress \
  --group-id sg-stemedb \
  --source-group sg-bastion \
  --protocol tcp \
  --port 22
```

**iptables:**

```bash
# Allow API from internal network only
sudo iptables -A INPUT -p tcp -s 10.0.0.0/8 --dport 18180 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 18180 -j DROP

# Save rules (the redirect must run as root, so pipe through sudo tee)
sudo iptables-save | sudo tee /etc/iptables/rules.v4 > /dev/null
```

---

### Three-Node Cluster

**Allow inbound:**
- Port 18180 from load balancer (API traffic)
- Ports 18181-18183 from cluster nodes (inter-node)
- Port 22 (SSH) from bastion host

**Block:**
- Ports 18180-18183 from public internet
- Port 18181 from outside internal network (admin endpoint security)

**AWS Security
Group:**

```bash
# Allow API from load balancer
aws ec2 authorize-security-group-ingress \
  --group-id sg-stemedb \
  --source-group sg-load-balancer \
  --protocol tcp \
  --port 18180

# Allow cluster communication (node ↔ node)
aws ec2 authorize-security-group-ingress \
  --group-id sg-stemedb \
  --source-group sg-stemedb \
  --protocol tcp \
  --port 18181-18182

# Allow SWIM gossip (UDP)
aws ec2 authorize-security-group-ingress \
  --group-id sg-stemedb \
  --source-group sg-stemedb \
  --protocol udp \
  --port 18183

# Allow SSH from bastion
aws ec2 authorize-security-group-ingress \
  --group-id sg-stemedb \
  --source-group sg-bastion \
  --protocol tcp \
  --port 22
```

**iptables (on each node):**

```bash
# Allow API from load balancer
sudo iptables -A INPUT -p tcp -s 10.0.1.10 --dport 18180 -j ACCEPT

# Allow cluster traffic from other nodes
sudo iptables -A INPUT -p tcp -s 10.0.1.51 --dport 18181:18182 -j ACCEPT
sudo iptables -A INPUT -p tcp -s 10.0.1.52 --dport 18181:18182 -j ACCEPT
sudo iptables -A INPUT -p tcp -s 10.0.1.53 --dport 18181:18182 -j ACCEPT

# Allow SWIM gossip
sudo iptables -A INPUT -p udp -s 10.0.1.0/24 --dport 18183 -j ACCEPT

# Drop all other traffic to the StemeDB port range (TCP and UDP)
sudo iptables -A INPUT -p tcp --dport 18180:18189 -j DROP
sudo iptables -A INPUT -p udp --dport 18180:18189 -j DROP
```

---

## TLS Configuration

### Requirements

- **Minimum TLS version:** 1.3
- **Certificate validity:** ≤90 days (automate renewal)
- **Key algorithm:** RSA 2048-bit or ECDSA P-256
- **Termination:** At reverse proxy (recommended) or at StemeDB API

### Let's Encrypt Automation

**Certbot with nginx:**

```bash
# Install certbot
sudo apt install certbot python3-certbot-nginx

# Obtain certificate
sudo certbot --nginx -d stemedb.example.com

# Auto-renewal (cron)
sudo crontab -e
# Add: 0 3 * * * certbot renew --quiet && systemctl reload nginx
```

**Manual certificate (for testing):**

```bash
# Generate self-signed (NOT for production)
sudo openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /etc/stemedb/tls/key.pem \
  -out /etc/stemedb/tls/cert.pem \
  -days 365 \
  -subj "/CN=stemedb.local"

# Set permissions
sudo chmod 600 /etc/stemedb/tls/key.pem
sudo chmod 644 /etc/stemedb/tls/cert.pem
```

### TLS at Reverse Proxy (Recommended)

**Nginx example:**

```nginx
server {
    listen 443 ssl http2;
    server_name stemedb.example.com;

    ssl_certificate /etc/letsencrypt/live/stemedb.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/stemedb.example.com/privkey.pem;

    ssl_protocols TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_prefer_server_ciphers on;

    location / {
        proxy_pass http://stemedb_cluster;
    }
}
```

**See:** [Nginx Config](../../deployment/nginx/stemedb.conf) for complete example.

---

## DNS Configuration

### Single-Node

**Simple A record:**

```
stemedb.example.com.  300  IN  A  10.0.1.50
```

**Failover:** Manual. If the server fails, repoint the A record at a healthy server.

### Three-Node Cluster

**Option 1: Load balancer with CNAME**

```
stemedb.example.com.     300  IN  CNAME  stemedb-lb.example.com.
stemedb-lb.example.com.  60   IN  A      10.0.1.10

node1.example.com.       300  IN  A      10.0.1.51
node2.example.com.       300  IN  A      10.0.1.52
node3.example.com.       300  IN  A      10.0.1.53
```

**Option 2: Multiple A records (DNS round-robin)**

```
stemedb.example.com.  60  IN  A  10.0.1.51
stemedb.example.com.  60  IN  A  10.0.1.52
stemedb.example.com.  60  IN  A  10.0.1.53
```

⚠️ **Note:** DNS round-robin doesn't detect failed nodes. Use a load balancer instead.

### Internal DNS (Private Network)

**For cluster communication:**

```
# Private hosted zone: cluster.local
node1.cluster.local.  300  IN  A  10.0.1.51
node2.cluster.local.  300  IN  A  10.0.1.52
node3.cluster.local.  300  IN  A  10.0.1.53
```

---

## Latency Requirements

### Single-Node

- **Client → Server:** <100ms (typical internet)
- **No inter-node requirements**

### Three-Node Cluster

- **Client → Load Balancer:** <100ms
- **Load Balancer → Node:** <10ms (same region)
- **Node ↔ Node:** **<5ms (CRITICAL)**

**Why <5ms inter-node?**
- SWIM gossip requires fast responses
- Replication lag increases with latency
- Merkle sync performance degrades

**Test latency:**

```bash
# From node1 to node2
ping -c 100 node2.cluster.local

# Expected:
# rtt min/avg/max/mdev = 0.5/1.2/3.5/0.8 ms

# If avg >5ms → Nodes too far apart (different regions?)
```

**Deployment recommendations:**
- ✅ Same availability zone: <1ms typical
- ⚠️ Same region, different AZs: 1-5ms (acceptable)
- ❌ Different regions: >10ms (not supported)

---

## Bandwidth Requirements

### Single-Node

- **Ingest:** ~1 KB per assertion → 100 assertions/sec = 100 KB/sec = 0.8 Mbps
- **Queries:** ~5 KB per query → 100 queries/sec = 500 KB/sec = 4 Mbps
- **Total:** ~5 Mbps typical, 10 Mbps recommended

### Three-Node Cluster

**Per node:**
- **Client traffic:** Same as single-node (~5 Mbps)
- **Replication traffic:** ~1 MB per 1K assertions (about 1 KB each); provision 1 Gbps for high-throughput workloads

**Total cluster:**
- **Client traffic:** 15 Mbps (3× single-node)
- **Replication traffic:** ~10 Mbps typical, 100 Mbps burst

**Recommended:**
- **Public bandwidth:** 100 Mbps per node
- **Private bandwidth:** 1 Gbps per node (10 Gbps for production)

---

## Load Balancer Configuration

### Health Checks

**HTTP health check configuration:**

```
Endpoint: /v1/health
Method: GET
Interval: 5 seconds
Timeout: 3 seconds
Healthy threshold: 2
Unhealthy threshold: 3
```

**Expected response:**

```json
{
  "status": "healthy",
  "version": "0.1.0",
  "uptime_seconds": 12345
}
```

**Mark unhealthy if:**
- HTTP status != 200
- Response time >3 seconds
- `status` field != "healthy"

### Load Balancing Algorithm

**Recommended:** Round-robin
- Simple
- Evenly distributes load
- No sticky
sessions needed (CRDTs handle conflicts)

**Not recommended:** Least connections
- Can cause hotspots
- Unnecessary complexity

### Session Affinity

**Not required** - StemeDB uses CRDTs, so queries can hit any node

---

## Security Considerations

### Admin Endpoints

⚠️ **CRITICAL:** Admin endpoints have NO authentication in Pilot 5

**Endpoints to restrict:**
- `/v1/admin/quarantine` - Manage quarantine queue
- `/v1/admin/circuit_breakers` - Ban/unban agents
- `/v1/admin/indexes/rebuild` - Trigger index rebuild
- `/v1/admin/compact` - Trigger compaction

**Restriction methods:**

**Option 1: Firewall (recommended)**

```bash
# Block /v1/admin/ from public
# iptables example:
sudo iptables -A INPUT -p tcp --dport 18180 -m string --string "/v1/admin/" --algo bm -j DROP

# Or in nginx (inside the server block):
#   location /v1/admin/ {
#       deny all;
#       return 403;
#   }
```

**Option 2: VPN-only access**
- Require VPN connection to reach port 18181 (cluster gateway)
- Use `/v1/admin/` endpoints via cluster gateway only

**Option 3: IP allowlist**

```nginx
# Nginx example
location /v1/admin/ {
    allow 10.0.0.0/8;  # Internal network
    deny all;
}
```

### Metrics Endpoint

**`/metrics` endpoint exposes sensitive information:**
- Assertion counts
- Query patterns
- Agent IDs
- Performance data

**Restriction:**

```nginx
# Allow only from monitoring systems
location /metrics {
    allow 10.0.1.100;  # Prometheus server
    deny all;
}
```

---

## Network Topology Examples

### Single-Node with Reverse Proxy

```
Internet
   │
   ▼
[Nginx/Envoy] (TLS termination, port 443)
   │
   ▼
[StemeDB API] (port 18180, HTTP)
   │
   ▼
[Data] (/data/wal, /data/db)
```

### Three-Node Cluster

```
Internet
   │
   ▼
[Load Balancer] (TLS, port 443)
   │
   ├─────────┬─────────┐
   ▼         ▼         ▼
[Node 1]  [Node 2]  [Node 3]  (port 18180, HTTP)
   │         │         │
   └─────────┴─────────┘
   (ports 18182-18183, replication)
```

**See:** [diagrams/network-topology.txt](./diagrams/network-topology.txt) for ASCII diagram.
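
**Checking a topology against the port scheme:**

The topologies above imply which ports each deployment type should answer on. The following is a minimal sketch, not part of StemeDB, that lists those ports and optionally probes the TCP ones on a host. The function names (`expected_ports`, `probe_tcp`, `probe_host`) are hypothetical helpers, and the probe assumes `nc` (netcat) is installed.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; port numbers come from the port table.

# Print the ports a deployment type is expected to use.
expected_ports() {
    case "$1" in
        single)  echo "18180/tcp" ;;
        cluster) echo "18180/tcp 18181/tcp 18182/tcp 18183/udp" ;;
        *)       echo "unknown deployment type: $1" >&2; return 1 ;;
    esac
}

# Probe one TCP port with netcat (assumes `nc` is installed), 2-second timeout.
probe_tcp() {
    if nc -z -w 2 "$1" "$2" 2>/dev/null; then
        echo "open   $1:$2/tcp"
    else
        echo "closed $1:$2/tcp"
    fi
}

# Probe every expected TCP port on one host.
# UDP 18183 is skipped here; use tcpdump on the node to confirm gossip traffic.
probe_host() {
    for entry in $(expected_ports "$2"); do
        case "$entry" in
            */tcp) probe_tcp "$1" "${entry%/tcp}" ;;
        esac
    done
}

# Always print the expected cluster ports; probe only when a host argument is given.
expected_ports cluster
if [ -n "${1:-}" ]; then
    probe_host "$1" cluster
fi
```

Saved under a name such as `check-ports.sh` (hypothetical), it could be run from another cluster node as `sh check-ports.sh node2.cluster.local`. A port reported as closed from inside the cluster points at a firewall rule or a stopped service; a port reported as open from the public internet points at a missing block rule.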

---

## Troubleshooting

### Connection Refused

**Symptom:** `curl: (7) Failed to connect to localhost port 18180: Connection refused`

**Diagnosis:**

```bash
# Check if port is listening
sudo lsof -i :18180
# Should show: stemedb-api

# Check firewall
sudo iptables -L -n | grep 18180

# Check service status
sudo systemctl status stemedb-api
```

**Resolution:** See [Server Won't Start Runbook](../../runbooks/server-wont-start.md)

### High Latency Between Nodes

**Symptom:** `replication_lag_seconds` >5

**Diagnosis:**

```bash
# Test inter-node latency
ping -c 100 node2
# If avg >5ms → Network issue

# Check bandwidth (start the server side first with `iperf3 -s` on node2)
iperf3 -c node2
# Should show >100 Mbps
```

**Resolution:** See [High Query Latency Runbook](../../runbooks/high-query-latency.md#1-replication-lag)

### SWIM Gossip Not Working

**Symptom:** Nodes not discovering each other

**Diagnosis:**

```bash
# Check UDP port 18183 (replace eth0 with the node's cluster-facing interface)
sudo tcpdump -i eth0 udp port 18183
# Should show periodic SWIM messages

# Check firewall (UDP!)
sudo iptables -L -n | grep 18183
```

**Resolution:** Open UDP port 18183 between cluster nodes

---

## Related Documentation

- [Single-Node Architecture](./single-node-pilot.md) - Network for single-node
- [Three-Node Cluster](./three-node-cluster.md) - Network for cluster
- [Deployment Examples](../../deployment/) - Nginx and Envoy configs
- [Add Node Runbook](../../runbooks/add-node.md) - Cluster network setup

---

**Last Updated:** 2026-02-11