# Single-Node Pilot Architecture

**Target:** Proof of concept, friendly pilot, development environments

**⚠️ NOT RECOMMENDED FOR PRODUCTION** - Single point of failure, manual recovery required

---

## Overview

The single-node architecture is the simplest StemeDB deployment: one server running `stemedb-api` with local storage. It is suitable for early pilots, development, and demonstrations where availability is not critical.

```
[See: diagrams/single-node.txt for ASCII diagram]
```

---

## Target Specifications

| Metric | Value |
|--------|-------|
| **Assertions** | <10,000 |
| **Queries/sec** | <100 |
| **Concurrent users** | <50 |
| **Availability** | Best effort (single point of failure) |
| **RTO** | 2 hours (manual restore) |
| **RPO** | 24 hours (daily backup) |

---

## Hardware Requirements

### Minimum (Pilot <5K assertions)

- **CPU:** 2 vCPUs
- **RAM:** 4GB
- **Disk:** 50GB SSD (30GB WAL + 20GB DB)
- **Network:** 100 Mbps

**Example instances:**
- AWS: `t3.medium` (2 vCPU, 4GB)
- GCP: `e2-medium` (2 vCPU, 4GB)
- Azure: `Standard_B2s` (2 vCPU, 4GB)

### Recommended (Pilot <10K assertions)

- **CPU:** 4 vCPUs
- **RAM:** 8GB
- **Disk:** 100GB SSD (50GB WAL + 50GB DB)
- **Network:** 1 Gbps

**Example instances** (closest 8GB sizes; each provides 2 vCPUs, below the 4 vCPU target):
- AWS: `t3.large` (2 vCPU, 8GB)
- GCP: `n2-standard-2` (2 vCPU, 8GB)
- Azure: `Standard_D2s_v3` (2 vCPU, 8GB)

**See:** [Resource Sizing Guide](./resource-sizing.md) for calculations.
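Before provisioning, it can help to confirm that a host actually meets the Minimum tier listed under Hardware Requirements. The following is a minimal sketch, not part of StemeDB: the function name `check_minimum` and the 10% RAM tolerance are illustrative assumptions, and only the three thresholds are taken from this page.

```bash
#!/usr/bin/env bash
# Preflight sketch: compare a host against the "Minimum" pilot tier.
# Thresholds come from the Hardware Requirements section; everything
# else (function name, RAM tolerance) is a hypothetical illustration.

MIN_VCPUS=2
MIN_RAM_MB=4096
MIN_DISK_GB=50

# check_minimum <vcpus> <ram_mb> <disk_gb>
# Prints one line per shortfall and returns non-zero if any check fails.
check_minimum() {
  local vcpus=$1 ram_mb=$2 disk_gb=$3 rc=0
  if [ "$vcpus" -lt "$MIN_VCPUS" ]; then
    echo "FAIL cpu: ${vcpus} vCPUs < ${MIN_VCPUS}"; rc=1
  fi
  # Assumed 10% slack: a nominal 4GB instance reports slightly less RAM.
  if [ "$ram_mb" -lt $(( MIN_RAM_MB * 90 / 100 )) ]; then
    echo "FAIL ram: ${ram_mb} MB < ~${MIN_RAM_MB} MB"; rc=1
  fi
  if [ "$disk_gb" -lt "$MIN_DISK_GB" ]; then
    echo "FAIL disk: ${disk_gb} GB < ${MIN_DISK_GB} GB"; rc=1
  fi
  [ "$rc" -eq 0 ] && echo "OK: host meets the minimum pilot tier"
  return "$rc"
}

# Example invocation on a live Linux host (GNU coreutils and procps assumed;
# adjust the path if the data volume is not mounted at /data):
#   check_minimum "$(nproc)" \
#     "$(free -m | awk '/^Mem:/ {print $2}')" \
#     "$(df -BG --output=size /data | tail -n 1 | tr -dc '0-9')"
```

Keeping the comparison in a function that takes plain numbers makes it easy to run against candidate instance sizes on paper, for example `check_minimum 2 8192 100`, before any server exists.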
---

## Architecture Diagram

**Component layout:**

```
┌─────────────────────────────────────────────────────┐
│                   StemeDB Server                    │
│  ┌───────────────────────────────────────────────┐  │
│  │  stemedb-api (Port 18180)                     │  │
│  │  ┌─────────────┐    ┌──────────────┐          │  │
│  │  │ HTTP Router │───▶│ Ingest       │          │  │
│  │  │ (Axum)      │    │ Pipeline     │          │  │
│  │  └─────────────┘    └──────┬───────┘          │  │
│  │                            │                  │  │
│  │  ┌──────────────────┐      ▼                  │  │
│  │  │ Query Engine     │  ┌────────────┐         │  │
│  │  │ (Lenses)         │  │    WAL     │         │  │
│  │  └────────┬─────────┘  └────────────┘         │  │
│  │           │              /data/wal/           │  │
│  │           ▼                                   │  │
│  │  ┌──────────────────┐                         │  │
│  │  │ HybridStore      │                         │  │
│  │  │  • KV Store      │                         │  │
│  │  │  • Indexes       │                         │  │
│  │  └──────────────────┘                         │  │
│  │       /data/db/                               │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
         ▲                              │
         │                              ▼
    ┌─────────┐               ┌──────────────────┐
    │ Clients │               │ Backups (daily)  │
    │ (Agents,│               │ /backups/        │
    │  Dash)  │               │ (rsync-based)    │
    └─────────┘               └──────────────────┘
```

---

## Deployment Steps

### Prerequisites

- [ ] Ubuntu 22.04 or RHEL 9 server
- [ ] `stemedb-api` binary installed
- [ ] systemd service configured
- [ ] Firewall rules applied

### Step 1: Install StemeDB

```bash
# Download binary (replace with your release URL); -f fails on HTTP errors
sudo curl -fL https://github.com/yourorg/stemedb/releases/download/v0.1.0/stemedb-api -o /usr/local/bin/stemedb-api
sudo chmod +x /usr/local/bin/stemedb-api

# Verify installation
stemedb-api --version
# Expected: stemedb-api 0.1.0
```

### Step 2: Create Data Directories

```bash
# Create directories
sudo mkdir -p /data/{wal,db}
sudo mkdir -p /backups

# Create stemedb user
sudo useradd -r -s /bin/false stemedb

# Set permissions
sudo chown -R stemedb:stemedb /data
sudo chown -R stemedb:stemedb /backups
sudo chmod 755 /data/{wal,db}
```

### Step 3: Configure Environment

```bash
# Create config directory and file
sudo mkdir -p /etc/stemedb
sudo tee /etc/stemedb/config.env <> /var/log/stemedb-backup.log 2>&1

# Test backup
sudo /usr/local/bin/backup-stemedb.sh
ls -lh /backups/
```

**Estimated
deployment time:** 1-2 hours

---

## Network Configuration

### Ports

| Port | Protocol | Purpose | Expose To |
|------|----------|---------|-----------|
| **18180** | TCP/HTTP | API queries, ingest | Clients (via reverse proxy) |
| **18180** | TCP/HTTP | Metrics endpoint (`/metrics`) | Internal monitoring |

### Firewall Rules

**AWS Security Group:**

```bash
# Allow HTTP from load balancer only
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --source-group sg-lb \
  --protocol tcp \
  --port 18180

# Allow SSH from bastion
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --source-group sg-bastion \
  --protocol tcp \
  --port 22
```

**iptables:**

```bash
# Allow HTTP from internal network only
sudo iptables -A INPUT -p tcp -s 10.0.0.0/8 --dport 18180 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 18180 -j DROP

# Persist rules (tee runs as root; a plain "sudo ... >" redirect would not)
sudo iptables-save | sudo tee /etc/iptables/rules.v4 > /dev/null
```

**See:** [Network Requirements](./network-requirements.md) for full details.

---

## Monitoring

### Prometheus

**Scrape configuration:**

```yaml
# /etc/prometheus/prometheus.yml
scrape_configs:
  - job_name: 'stemedb'
    static_configs:
      - targets: ['localhost:18180']
    metrics_path: '/metrics'
    scrape_interval: 15s
```

### Key Metrics to Monitor

```promql
# Query latency (should be <200ms p99)
stemedb_query_latency_seconds{quantile="0.99"}

# Ingest rate (assertions/sec)
rate(stemedb_assertions_total[1m])

# WAL fsync latency (should be <10ms)
stemedb_wal_fsync_latency_seconds

# Free disk space on /data (alert at 80% used)
node_filesystem_avail_bytes{mountpoint="/data"}

# Memory usage
process_resident_memory_bytes
```

### Grafana Dashboard

**See:** Example dashboard in the `docker-compose/pilot-with-monitoring.yml` stack.

**Key panels:**
- Query latency (p50, p95, p99)
- Ingest rate (assertions/sec)
- Disk usage (WAL, DB, total)
- Error rate (4xx, 5xx responses)

---

## Failure Scenarios

### Server Failure

**Impact:** Complete outage; all queries and writes fail

**Recovery:**
1. Provision new server
2. Restore from backup (see [Restore Runbook](../../runbooks/restore-from-backup.md))
3. Update DNS to point to new server
4. Validate with test queries

**Estimated RTO:** 2 hours (manual)
**Data loss:** Up to 24 hours (with daily backups)

### Disk Failure

**Impact:** Data loss; server won't start

**Recovery:**
1. Replace disk
2. Restore from backup
3. Restart server

**Estimated RTO:** 2 hours
**Data loss:** Up to 24 hours

### Process Crash (OOM, segfault)

**Impact:** Temporary outage; systemd restarts the process automatically

**Recovery:**
- Automatic (systemd restart after 5s)
- WAL replay recovers in-flight data

**Estimated RTO:** 10-30 seconds
**Data loss:** None (WAL preserves writes)

---

## Limitations

**Single-node architecture has these limitations:**

1. **No High Availability:**
   - Server failure = complete outage
   - No automatic failover
   - Manual recovery required

2. **No Horizontal Scaling:**
   - Single CPU/RAM/disk bottleneck
   - Can't add capacity by adding nodes

3. **Manual Recovery:**
   - Restoring from backup is a manual process
   - Downtime of 1-2 hours is typical

4. **Limited Throughput:**
   - ~100 queries/sec typical
   - ~100 assertions/sec write capacity

5. **Data Loss Risk:**
   - Daily backups = up to 24 hours of data loss
   - No real-time replication

**For production deployments, use [Three-Node Cluster](./three-node-cluster.md) instead.**

---

## When to Migrate

**Migrate to three-node cluster when:**

- [ ] Assertion count approaching 10,000
- [ ] Query latency p99 >500ms sustained
- [ ] Availability requirements tighten (need <5min RTO)
- [ ] Pilot validated, moving to production
- [ ] Compliance requires redundancy

**Migration procedure:** [Add Node Runbook](../../runbooks/add-node.md#1-bootstrap-3-node-cluster)

---

## Cost Estimate

**AWS example (t3.large, us-east-1):**

| Resource | Monthly Cost |
|----------|--------------|
| Compute (t3.large) | $60 |
| Storage (100GB SSD) | $10 |
| Backup (500GB S3) | $12 |
| Data transfer | $5 |
| **Total** | **~$87/month** |

**GCP example (n2-standard-2, us-central1):**

| Resource | Monthly Cost |
|----------|--------------|
| Compute (n2-standard-2) | $65 |
| Storage (100GB SSD) | $17 |
| Backup (500GB Cloud Storage) | $10 |
| **Total** | **~$92/month** |

---

## Related Documentation

- [Three-Node Cluster](./three-node-cluster.md) - Production architecture
- [Resource Sizing](./resource-sizing.md) - Hardware calculations
- [Network Requirements](./network-requirements.md) - Firewall rules
- [Pilot Success Criteria](../../pilot-success-criteria.md) - Validation checklist
- [Deployment Example](../../deployment/docker-compose/pilot-with-monitoring.yml) - Docker Compose stack

---

**Last Updated:** 2026-02-11