# StemeDB Systemd Units

Systemd service and timer units for automated StemeDB operations.

## Installation

### 1. Copy Units to System Directory

```bash
sudo cp docs/operations/deployment/systemd/stemedb-*.{service,timer} /etc/systemd/system/
```

### 2. Copy Backup Script

```bash
sudo cp scripts/backup-stemedb.sh /usr/local/bin/
sudo chmod +x /usr/local/bin/backup-stemedb.sh
```

### 3. Create Configuration File

Create `/etc/default/stemedb-backup`:

```bash
# AWS S3 Configuration
AWS_REGION=us-east-1
AWS_S3_BUCKET=stemedb-backups-prod

# AWS credentials: use IAM instance profile (preferred) or specify below
# AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
# AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Backup Configuration
BACKUP_OUTPUT_DIR=/var/backups/stemedb
BACKUP_RETENTION=30d

# StemeDB Data Directories
STEMEDB_WAL_DIR=/var/lib/stemedb/wal
STEMEDB_DB_DIR=/var/lib/stemedb/db
```

**Security Note:** Use IAM instance profiles instead of credentials in config file when possible.

### 4. Create Backup Directory

```bash
sudo mkdir -p /var/backups/stemedb
sudo chown stemedb:stemedb /var/backups/stemedb
```

### 5. Enable and Start Timers

```bash
# Reload systemd configuration
sudo systemctl daemon-reload

# Enable backup timer (starts on boot)
sudo systemctl enable stemedb-backup.timer

# Start backup timer immediately
sudo systemctl start stemedb-backup.timer

# Enable verification timer
sudo systemctl enable stemedb-verify-backup.timer
sudo systemctl start stemedb-verify-backup.timer

# Enable WAL archival timer
sudo systemctl enable stemedb-archive-wal.timer
sudo systemctl start stemedb-archive-wal.timer
```

## Verification

### Check Timer Status

```bash
# List all StemeDB timers
systemctl list-timers 'stemedb-*'

# Expected output:
# NEXT                        LEFT          LAST PASSED UNIT                        ACTIVATES
# Wed 2026-02-12 06:00:00 UTC 3h 45min left n/a  n/a    stemedb-backup.timer        stemedb-backup.service
# Sun 2026-02-16 03:00:00 UTC 3d 23h left   n/a  n/a    stemedb-verify-backup.timer stemedb-verify-backup.service
# Wed 2026-02-12 02:30:00 UTC 15min left    n/a  n/a    stemedb-archive-wal.timer   stemedb-archive-wal.service
```

### Check Service Status

```bash
# View backup service status
sudo systemctl status stemedb-backup.service

# View recent logs
sudo journalctl -u stemedb-backup.service -n 50

# Follow logs in real-time
sudo journalctl -u stemedb-backup.service -f
```

### Manual Trigger

```bash
# Trigger backup manually (without waiting for timer)
sudo systemctl start stemedb-backup.service

# Watch progress
sudo journalctl -u stemedb-backup.service -f
```

## Units Reference

### stemedb-backup.timer

- **Schedule:** Every 6 hours (00:00, 06:00, 12:00, 18:00 UTC)
- **Persistent:** Runs on boot if missed
- **Randomized Delay:** 0-5 minutes to avoid thundering herd

### stemedb-backup.service

- **What it does:**
  - Backs up WAL and DB directories
  - Enforces retention policy (default: 30 days)
  - Uploads to S3 (if `--upload-s3` flag enabled)
  - Writes Prometheus metrics
- **Timeout:** 1 hour
- **Retries:** 3 attempts with 5-minute backoff

### stemedb-verify-backup.timer

- **Schedule:** Weekly on Sunday at 03:00 UTC
- **Persistent:** Yes

### stemedb-verify-backup.service

- **What it does:**
  - Validates latest backup checksums
  - Checks magic bytes, CRC32C, BLAKE3
  - Writes verification status to metrics
- **Timeout:** 30 minutes

### stemedb-archive-wal.timer

- **Schedule:** Every 15 minutes
- **Persistent:** Yes

### stemedb-archive-wal.service

- **What it does:**
  - Ships WAL segments to S3
  - Tracks archival state
  - Achieves RPO=15min
- **Timeout:** 10 minutes

## Monitoring

All services write metrics to `/var/lib/node_exporter/textfile_collector/stemedb_backup.prom` for Prometheus scraping.

**Key metrics:**

- `stemedb_backup_age_seconds` - Time since last successful backup
- `stemedb_backup_last_success_timestamp` - Unix timestamp of last backup
- `stemedb_backup_verification_status` - 1 = verified, 0 = failed/pending
- `stemedb_wal_archival_lag_seconds` - Delay between WAL creation and S3 upload

See `docs/operations/deployment/prometheus/backup-alerts.yml` for alert rules.

## Troubleshooting

### Timer Not Running

```bash
# Check if timer is enabled
systemctl is-enabled stemedb-backup.timer

# Check timer status
systemctl status stemedb-backup.timer

# View timer logs
journalctl -u stemedb-backup.timer
```

### Service Failing

```bash
# View service logs
sudo journalctl -u stemedb-backup.service -n 100

# Common issues:
# - Permission denied: check user/group in service file
# - AWS credentials: verify /etc/default/stemedb-backup or IAM role
# - Disk full: check df -h /var/backups/stemedb
```

### S3 Upload Failing

```bash
# Test AWS credentials
sudo -u stemedb aws s3 ls s3://stemedb-backups-prod/

# Check bucket permissions
aws s3api get-bucket-policy --bucket stemedb-backups-prod

# Verify service has AWS environment variables
sudo systemctl show stemedb-backup.service --property=Environment
```

## Maintenance

### Update Timer Schedule

Edit `/etc/systemd/system/stemedb-backup.timer`, change `OnCalendar`, then:

```bash
sudo systemctl daemon-reload
sudo systemctl restart stemedb-backup.timer
```

### Change Retention Policy

Edit `/etc/default/stemedb-backup`, change `BACKUP_RETENTION`, then:

```bash
# No restart needed - takes effect on next backup
```

### Disable Backups Temporarily

```bash
# Stop timer (prevents new backups)
sudo systemctl stop stemedb-backup.timer

# Re-enable later
sudo systemctl start stemedb-backup.timer
```

## Related Documentation

- [Backup Script Reference](../../../../scripts/backup-stemedb.sh)
- [Restore Runbook](../../runbooks/restore-from-backup.md)
- [Disaster Recovery](../../runbooks/disaster-recovery.md)
- [Prometheus Alerts](../prometheus/backup-alerts.yml)
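
## Example: Checking Backup Freshness

The metrics listed under Monitoring can also be checked by hand, for example while troubleshooting without access to Prometheus. The sketch below reads `stemedb_backup_last_success_timestamp` from a textfile-collector file and reports whether the last backup is recent enough. The function names `read_metric` and `backup_is_fresh` are hypothetical helpers, not part of StemeDB, and the sketch assumes each metric is written as a plain `name value` line without labels; adjust the parsing if the actual file differs.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the label-free "name value" line format
# are assumptions, not taken from the StemeDB backup script.

# Print the value of a label-free metric from a Prometheus text-format file.
# Returns non-zero if the metric is absent or the file cannot be read.
# Usage: read_metric <metric_name> <prom_file>
read_metric() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$2"
}

# Return 0 if the last successful backup is at most <max_age_seconds> old,
# 1 if it is older, and 2 if the timestamp metric could not be read.
# Usage: backup_is_fresh <prom_file> <max_age_seconds> [now_epoch]
backup_is_fresh() {
    local file=$1 max_age=$2 now=${3:-$(date +%s)}
    local ts
    ts=$(read_metric stemedb_backup_last_success_timestamp "$file") || return 2
    ts=${ts%%.*}   # drop any fractional part so shell arithmetic works
    [ $(( now - ts )) -le "$max_age" ]
}
```

With the functions sourced, a check against the metrics path from the Monitoring section might look like `backup_is_fresh /var/lib/node_exporter/textfile_collector/stemedb_backup.prom 25200 && echo fresh`. The 25200-second (7-hour) threshold is an illustrative choice that leaves some slack over the 6-hour backup schedule; it is not a value defined by the alert rules.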