# Alertmanager configuration for Slack integration
#
# This configuration sends StemeDB alerts to Slack channels by severity.
# Merge this with your existing alertmanager.yml or pagerduty-config.yml.

receivers:
  # Critical alerts -> #stemedb-alerts-critical (high visibility)
  - name: 'slack-critical'
    slack_configs:
      # Set api_url to the Slack webhook URL for this channel (see the setup guide below)
      - api_url: ''
        channel: '#stemedb-alerts-critical'
        username: 'StemeDB Alerts'
        icon_emoji: ':rotating_light:'
        title: ':fire: StemeDB CRITICAL Alert'
        # Use the annotation shared by all grouped alerts; ranging over .Alerts
        # here would concatenate one URL per alert into a broken link.
        title_link: '{{ .CommonAnnotations.dashboard }}'
        text: |
          {{ range .Alerts }}
          *Alert:* {{ .Labels.alertname }}
          *Severity:* {{ .Labels.severity }}
          *Component:* {{ .Labels.component }}
          *Instance:* {{ .Labels.instance }}

          {{ .Annotations.summary }}

          *Description:* {{ .Annotations.description }}
          *Impact:* {{ .Annotations.impact }}
          *Action Required:* {{ .Annotations.action }}

          <{{ .Annotations.runbook }}|View Runbook> | <{{ .Annotations.dashboard }}|View Dashboard>
          {{ end }}
        color: 'danger'
        send_resolved: true

  # Warning alerts -> #stemedb-alerts-warning (medium visibility)
  - name: 'slack-warning'
    slack_configs:
      - api_url: ''
        channel: '#stemedb-alerts-warning'
        username: 'StemeDB Alerts'
        icon_emoji: ':warning:'
        title: ':warning: StemeDB Warning Alert'
        title_link: '{{ .CommonAnnotations.dashboard }}'
        text: |
          {{ range .Alerts }}
          *Alert:* {{ .Labels.alertname }}
          *Component:* {{ .Labels.component }}
          *Instance:* {{ .Labels.instance }}

          {{ .Annotations.summary }}

          *Description:* {{ .Annotations.description }}

          <{{ .Annotations.runbook }}|View Runbook>
          {{ end }}
        color: 'warning'
        send_resolved: true

  # Info alerts -> #stemedb-alerts-info (low visibility, audit trail)
  - name: 'slack-info'
    slack_configs:
      - api_url: ''
        channel: '#stemedb-alerts-info'
        username: 'StemeDB Alerts'
        icon_emoji: ':information_source:'
        title: 'StemeDB Info'
        text: |
          {{ range .Alerts }}
          {{ .Annotations.summary }}
          {{ .Annotations.description }}
          <{{ .Annotations.runbook }}|Details>
          {{ end }}
        color: 'good'
        send_resolved: false

# Slack Integration Setup Guide

## 1. Create Slack App

1. Go to https://api.slack.com/apps
2. Click **Create New App** → **From scratch**
3. Name: `StemeDB Alerts`
4. Select your workspace

## 2. Enable Incoming Webhooks

1. In your app → **Incoming Webhooks**
2. Toggle **Activate Incoming Webhooks** to ON
3. Click **Add New Webhook to Workspace**
4. Select channel (e.g., `#stemedb-alerts-critical`)
5. Click **Allow**
6. Copy webhook URL (starts with `https://hooks.slack.com/services/...`)
7. Repeat for warning and info channels

## 3. Configure Alertmanager

Replace each empty `api_url` value with the webhook URL for that receiver's channel:

```yaml
api_url: ''
```

Becomes:

```yaml
api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXX'
```

## 4. Test Integration

```bash
# Send test message directly to Slack
curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
  -H 'Content-Type: application/json' \
  -d '{
    "text": "Test alert from StemeDB monitoring setup",
    "username": "StemeDB Alerts",
    "icon_emoji": ":rotating_light:"
  }'
```

## 5. Recommended Channel Structure

Create three Slack channels:

| Channel | Purpose | Members | Notifications |
|---------|---------|---------|---------------|
| `#stemedb-alerts-critical` | Critical alerts requiring immediate action | On-call engineers, managers | @channel |
| `#stemedb-alerts-warning` | Warning alerts for investigation | Engineering team | @here |
| `#stemedb-alerts-info` | Info alerts for audit trail | Engineering team (optional) | None |

## 6. Channel Topics

Set channel topics with useful links:

```
#stemedb-alerts-critical
🔴 Critical StemeDB alerts | On-call: @oncall-engineer | Runbooks: https://docs/runbooks | Dashboards: https://grafana/stemedb
```

```
#stemedb-alerts-warning
🟡 StemeDB warning alerts | Escalate to #stemedb-alerts-critical if critical | Runbooks: https://docs/runbooks
```

```
#stemedb-alerts-info
ℹ️ StemeDB informational alerts | No action required | Mute this channel if too noisy
```

## 7. Slack Workflow Integration (Advanced)

For automated incident response, create Slack workflows:

### Critical Alert Workflow

Triggered by: Message posted to `#stemedb-alerts-critical` with "CRITICAL"

Steps:

1. **Create incident channel** (`#incident-YYYY-MM-DD-HH-MM`)
2. **Add participants** (@oncall-engineer, @manager, @sre-lead)
3. **Post incident template** with runbook links
4. **Start Zoom call** for coordination
5. **Create PagerDuty incident** if not auto-created

### Resolution Workflow

Triggered by: Reaction `:white_check_mark:` on critical alert

Steps:

1. **Mark incident as resolved** in PagerDuty
2. **Post resolution message** in incident channel
3. **Request post-mortem** (create template doc)
4. **Archive incident channel** after 7 days

## Troubleshooting

### Messages not appearing in Slack

1. **Verify webhook URL:**
   ```bash
   curl -X POST https://hooks.slack.com/services/YOUR/WEBHOOK/URL \
     -H 'Content-Type: application/json' \
     -d '{"text":"test"}'
   ```

2. **Check Alertmanager logs:**
   ```bash
   journalctl -u alertmanager -f | grep slack
   ```

3. **Verify app permissions:**
   - App must have `incoming-webhook` scope
   - App must be installed in workspace

### Alert formatting broken

- Slack renders its own `mrkdwn` syntax, not standard Markdown. Go template expressions such as `{{ .Labels.alertname }}` are expanded by Alertmanager before the message reaches Slack.
- Test formatting with https://api.slack.com/docs/messages/builder
- Use `\n` for line breaks, `*bold*`, `_italic_`, `` `code` ``

### Too many notifications

- Mute `#stemedb-alerts-info` channel (low priority)
- Increase `group_interval` in Alertmanager (batch more alerts)
- Add inhibition rules to suppress related alerts

### Alerts not resolving

- Set `send_resolved: true` in the Slack config (the info receiver above sets it to `false`, which is also Alertmanager's default for Slack)
- Check in Prometheus that the alert expression has stopped matching; the `for` duration only delays firing, it does not delay resolution

## Best Practices

1. **Channel naming**: Use consistent prefix (`stemedb-alerts-*`)
2. **Color coding**: Critical=red (`danger`), Warning=orange (`warning`), Info=green (`good`), matching the `color` values in the config above
3. **Actionable messages**: Include runbook links and next steps
4. **Mention on-call**: Use `@oncall-engineer` handle in critical channel
5. **Archive old channels**: Auto-archive incident channels after 7 days
6. **Review periodically**: Check alert volume, tune thresholds
7. **Test regularly**: Send test alerts monthly to verify routing

## Example Alert Flow

```
┌─────────────────────────────────────────────────────────────┐
│ Prometheus fires "WALDiskNearlyFull" alert                  │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Alertmanager routes to 'slack-critical' receiver            │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Message posted to #stemedb-alerts-critical                  │
│ "🔥 WAL disk usage >90% on prod-node-1"                     │
│ + Runbook link + Dashboard link                             │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ On-call engineer clicks runbook                             │
│ Follows steps: Check disk, run cleanup, increase size       │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Disk usage drops to 75%                                     │
│ Prometheus marks alert as resolved                          │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│ Alertmanager sends resolved notification to Slack           │
│ "✅ WAL disk usage now 75% on prod-node-1"                  │
└─────────────────────────────────────────────────────────────┘
```
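
The routing step in this flow depends on a `route` block that maps each alert's `severity` label to one of the three Slack receivers, and this guide's configuration defines only the receivers. The fragment below is a minimal sketch of what that block might look like, not the project's actual routing. It assumes alerts carry `severity` values of `critical`, `warning`, and `info`, that unmatched alerts should fall back to `slack-info`, and that Alertmanager is version 0.22 or later (for the `matchers` syntax). The `group_by` labels, the timing values, and the inhibition rule are illustrative choices to tune for your environment.

```yaml
route:
  # Fallback receiver for alerts without a recognized severity (assumption)
  receiver: 'slack-info'
  # Batch alerts that share these labels into one Slack message
  group_by: ['alertname', 'component']
  group_wait: 30s       # wait before sending the first notification for a new group
  group_interval: 5m    # raise this to batch more alerts and cut notification volume
  repeat_interval: 4h   # how often to re-send while an alert is still firing
  routes:
    - matchers:
        - severity="critical"
      receiver: 'slack-critical'
    - matchers:
        - severity="warning"
      receiver: 'slack-warning'
    - matchers:
        - severity="info"
      receiver: 'slack-info'

inhibit_rules:
  # Suppress a warning while a critical alert is firing for the same
  # component and instance (illustrative rule, adjust the labels as needed)
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: ['component', 'instance']
```

Merge these top-level keys into the same file as the `receivers` list. If `amtool` is installed, `amtool check-config alertmanager.yml` can validate the merged file before Alertmanager is reloaded.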