# Certificate Expiring Soon ## Severity: CRITICAL ## Alert Rule **Alert:** `CertificateExpiringSoon` **Trigger:** TLS certificate expires within 7 days **Duration:** 1h ## Symptom - Alert fires: "TLS certificate expires in X days" - Metrics show `stemedb_tls_cert_expiry_seconds < 604800` (7 days) - Logs contain certificate expiry warnings - `openssl` commands show approaching expiration date ## Impact **User Impact (if cert expires):** - All HTTPS/TLS connections fail immediately - API becomes unreachable for external clients - Dashboard shows "Certificate Invalid" errors - Inter-node cluster communication fails (if using mTLS) **Business Impact:** - Complete service outage for external users - SLA breach - Customer trust erosion (security warnings in browsers) ## Investigation Steps ### 1. Check Certificate Expiration ```bash # Check certificate expiry date echo | openssl s_client -servername stemedb.example.com \ -connect localhost:18180 2>/dev/null | \ openssl x509 -noout -dates # notBefore=Jan 1 00:00:00 2025 GMT # notAfter=Apr 1 23:59:59 2026 GMT # Days until expiry echo | openssl s_client -servername stemedb.example.com \ -connect localhost:18180 2>/dev/null | \ openssl x509 -noout -checkend $((7 * 86400)) ``` ### 2. Check Certificate Details ```bash # View full certificate openssl s_client -servername stemedb.example.com \ -connect localhost:18180 /dev/null | \ openssl x509 -text -noout | grep -A 3 "Subject:\|Issuer:\|Validity" ``` ### 3. Check Certificate Source ```bash # Check if using Let's Encrypt cat /etc/stemedb/tls/cert.pem | openssl x509 -noout -issuer # issuer=C = US, O = Let's Encrypt, CN = R3 # Check certbot renewal status (if using Let's Encrypt) certbot certificates | grep -A 10 stemedb.example.com ``` ### 4. Check Renewal Automation ```bash # Check certbot timer (systemd) systemctl status certbot.timer # Check cron jobs crontab -l | grep certbot # Check recent renewal attempts journalctl -u certbot --since "7 days ago" | grep -i "renew" ``` ## Resolution ### If Using Let's Encrypt **1. Attempt manual renewal:** ```bash # Dry run first certbot renew --dry-run --cert-name stemedb.example.com # If successful, perform actual renewal certbot renew --cert-name stemedb.example.com --force-renewal ``` **2. Reload certificate in stemedb-api:** ```bash # Option A: Graceful reload (no downtime) systemctl reload stemedb-api # Option B: Restart (brief downtime) systemctl restart stemedb-api ``` **3. Verify new certificate:** ```bash echo | openssl s_client -servername stemedb.example.com \ -connect localhost:18180 2>/dev/null | \ openssl x509 -noout -dates | grep notAfter ``` ### If Using Custom CA **1. Generate new certificate signing request (CSR):** ```bash # Generate new private key openssl genrsa -out /etc/stemedb/tls/new-key.pem 4096 # Generate CSR openssl req -new -key /etc/stemedb/tls/new-key.pem \ -out /tmp/stemedb.csr \ -subj "/C=US/ST=CA/O=StemeDB/CN=stemedb.example.com" ``` **2. Submit CSR to CA:** ```bash # Send CSR to CA for signing # (Process varies by CA - follow CA-specific procedures) cat /tmp/stemedb.csr | mail -s "Certificate Renewal Request" ca@example.com ``` **3. After receiving signed certificate, install:** ```bash # Backup old certificate cp /etc/stemedb/tls/cert.pem /etc/stemedb/tls/cert.pem.old.$(date +%Y%m%d) cp /etc/stemedb/tls/key.pem /etc/stemedb/tls/key.pem.old.$(date +%Y%m%d) # Install new certificate mv /tmp/new-cert.pem /etc/stemedb/tls/cert.pem mv /etc/stemedb/tls/new-key.pem /etc/stemedb/tls/key.pem # Set correct permissions chmod 600 /etc/stemedb/tls/key.pem chmod 644 /etc/stemedb/tls/cert.pem chown stemedb:stemedb /etc/stemedb/tls/*.pem ``` **4. Reload service:** ```bash systemctl reload stemedb-api # Verify service accepted new cert journalctl -u stemedb-api --since "1 min ago" | grep -i "tls\|certificate" ``` ### If Renewal Fails **1. Check common failure reasons:** ```bash # DNS validation issues (Let's Encrypt) dig _acme-challenge.stemedb.example.com TXT # HTTP validation issues curl -v http://stemedb.example.com/.well-known/acme-challenge/test # Rate limits certbot renew --dry-run 2>&1 | grep -i "rate limit" ``` **2. Switch to DNS validation (if HTTP fails):** ```bash certbot certonly --manual --preferred-challenges dns \ -d stemedb.example.com \ --email ops@example.com ``` **3. Use staging CA to test (doesn't count against rate limits):** ```bash certbot renew --cert-name stemedb.example.com \ --server https://acme-staging-v02.api.letsencrypt.org/directory \ --dry-run ``` ### If Certificate Already Expired **1. Generate temporary self-signed certificate:** ```bash openssl req -x509 -nodes -days 30 -newkey rsa:4096 \ -keyout /etc/stemedb/tls/temp-key.pem \ -out /etc/stemedb/tls/temp-cert.pem \ -subj "/CN=stemedb.example.com" ``` **2. Install temporary cert:** ```bash mv /etc/stemedb/tls/cert.pem /etc/stemedb/tls/cert.pem.expired cp /etc/stemedb/tls/temp-cert.pem /etc/stemedb/tls/cert.pem cp /etc/stemedb/tls/temp-key.pem /etc/stemedb/tls/key.pem systemctl reload stemedb-api ``` **3. Fix renewal and replace with valid cert:** Follow renewal steps above, then replace temporary cert. ## Prevention ### Automated Renewal **1. Enable certbot timer (Let's Encrypt):** ```bash # Enable automatic renewal systemctl enable certbot.timer systemctl start certbot.timer # Verify timer is active systemctl list-timers | grep certbot ``` **2. Configure deploy hook:** Create `/etc/letsencrypt/renewal-hooks/deploy/reload-stemedb.sh`: ```bash #!/bin/bash systemctl reload stemedb-api journalctl -u stemedb-api -n 5 | grep -i "certificate reloaded" || \ echo "WARNING: Certificate reload may have failed" ``` Make executable: ```bash chmod +x /etc/letsencrypt/renewal-hooks/deploy/reload-stemedb.sh ``` **3. Test renewal automation:** ```bash # Dry run triggers deploy hook certbot renew --dry-run ``` ### Monitoring **1. Alert at 30 days (warning) and 7 days (critical):** ```yaml # Prometheus alert - alert: CertificateExpiringWarning expr: stemedb_tls_cert_expiry_seconds < (30 * 86400) annotations: summary: "TLS certificate expires in 30 days" - alert: CertificateExpiringSoon expr: stemedb_tls_cert_expiry_seconds < (7 * 86400) annotations: summary: "TLS certificate expires in 7 days - RENEW NOW" ``` **2. Export certificate expiry metric:** Ensure `/metrics` endpoint includes: ``` stemedb_tls_cert_expiry_seconds{domain="stemedb.example.com"} 2592000 ``` **3. Set up external monitoring:** ```bash # Monitor from outside (catches firewall issues) # Cron job on monitoring server: 0 */6 * * * /usr/local/bin/check-cert.sh stemedb.example.com ``` ### Operational Best Practices **1. Renew at 60 days (Let's Encrypt expires at 90):** Edit `/etc/letsencrypt/renewal/stemedb.example.com.conf`: ```ini renew_before_expiry = 30 days ``` **2. Document certificate renewal procedures:** Maintain runbook with: - CA contact information - DNS/domain registrar access - Escalation path if renewal fails **3. Test renewal quarterly:** ```bash # Quarterly manual test certbot renew --cert-name stemedb.example.com --force-renewal --dry-run ``` ## Escalation **Escalate immediately if:** - Certificate expires in <48 hours and renewal failing - CA rate limits prevent renewal - DNS validation requires domain registrar access (not available) - Certificate already expired and affecting production **Escalation path:** 1. **Primary on-call:** Infrastructure SRE 2. **Secondary:** Security engineer (CA coordination) 3. **Final escalation:** VP Engineering + Legal (CA contract issues) ## References - **Dashboard:** [StemeDB TLS Health](http://grafana.example.com/d/stemedb-tls) - **Related alerts:** `TLSHandshakeFailures`, `ClientAuthenticationErrors` - **Metrics:** - `stemedb_tls_cert_expiry_seconds` (days until expiry) - `stemedb_tls_handshake_errors_total` (TLS failures) - **Docs:** - Let's Encrypt: https://letsencrypt.org/docs/ - Certbot renewal: https://eff-certbot.readthedocs.io/en/stable/using.html#renewal