# Slow WAL Fsync

## Severity: WARNING

## Alert Rule

**Alert:** `WALFsyncSlow`
**Trigger:** WAL fsync p99 latency > 100ms
**Duration:** 10m

## Symptom

- Metrics show `stemedb_wal_fsync_duration_seconds{quantile="0.99"} > 0.1`
- API write latency increasing (p99 > 200ms)
- Logs may show "slow fsync" warnings
- Ingestion throughput degrading

## Impact

**User Impact:**
- Slower API responses for write operations
- Reduced ingestion throughput (assertions/sec)
- Client timeouts if latency exceeds configured limits

**System Impact:**
- Write pipeline backpressure
- Increased memory usage (buffered writes)
- Risk of WAL segment rotation delays

## Investigation Steps

### 1. Check Fsync Latency Metrics

```bash
# Current p50, p90, p99 latency
curl -s http://localhost:18180/metrics | grep wal_fsync_duration_seconds

# Expected output:
# stemedb_wal_fsync_duration_seconds{quantile="0.5"} 0.001
# stemedb_wal_fsync_duration_seconds{quantile="0.9"} 0.01
# stemedb_wal_fsync_duration_seconds{quantile="0.99"} 0.15  # ← HIGH
```

### 2. Check Disk I/O Utilization

```bash
# Disk stats: 10 samples at 2-second intervals
iostat -x 2 10

# Look for:
# - High %util on WAL partition (>80% sustained)
# - High await (>50ms indicates congestion)
```

### 3. Check for Competing I/O

```bash
# Processes doing disk I/O
iotop -o -b -n 5

# Look for other processes writing to the same disk
```

### 4. Check Disk Write Cache

```bash
# Check the drive's write-cache setting. A disabled cache makes every fsync slower;
# with the cache enabled, durability depends on the drive honoring flush commands.
hdparm -W /dev/sda

# write-caching = 1 (on)
```

### 5. Test Raw Disk Performance

```bash
# Benchmark write + flush performance on the WAL disk.
# conv=fsync flushes only test.dat before dd exits, so unrelated dirty data
# on other filesystems does not skew the timing (a bare `sync` flushes everything).
cd /var/lib/stemedb/wal
time dd if=/dev/zero of=test.dat bs=4k count=10000 conv=fsync
rm test.dat

# Expected: <5 seconds on SSD, <15 seconds on spinning disk
```

## Resolution

### If Disk I/O is Saturated

**1. Identify competing workload:**

```bash
# Top I/O consumers
iotop -o -b -n 1 | head -20
```

**2. Reduce competing I/O:**

```bash
# Pause non-critical I/O (backups, log compression, etc.)
systemctl stop backup.service
systemctl stop log-archiver.timer
```

**3. Monitor improvement:**

```bash
watch -n 5 'curl -s http://localhost:18180/metrics | grep wal_fsync_duration'
```

### If Disk is Slow (Hardware Issue)

**1. Check SMART status:**

```bash
smartctl -a /dev/sda | grep -E "(Seek_Error|Reallocated_Sector)"
```

**2. If disk is failing, prepare for migration:**

```bash
# Mark node for draining
curl -X POST http://localhost:18180/v1/admin/node/drain

# Schedule maintenance window for disk replacement
```

**3. Temporarily reduce write rate:**

```bash
# Apply rate limit to reduce I/O pressure
curl -X POST http://localhost:18180/v1/admin/rate-limit \
  -d '{"max_writes_per_sec": 500}'
```

### If Filesystem is Misconfigured

**1. Check mount options:**

```bash
mount | grep /var/lib/stemedb/wal
```

**Expected:** `data=ordered` or `data=writeback` (not `data=journal`, which is slower)

**2. If using wrong mount options, remount:**

```bash
# Edit /etc/fstab so the WAL entry reads:
#   /dev/sdb1 /var/lib/stemedb/wal ext4 data=ordered,noatime 0 2

# Remount (requires downtime)
systemctl stop stemedb-api
umount /var/lib/stemedb/wal
mount /var/lib/stemedb/wal
systemctl start stemedb-api
```

### If Group Commit Not Optimal

**1. Tune group commit settings:**

Edit `/etc/stemedb/api.toml`:

```toml
[wal]
group_commit_max_wait_ms = 10     # Increase batching window
group_commit_max_bytes = 1048576  # 1MB batches
```

**2. Restart service:**

```bash
systemctl restart stemedb-api
```

**3. Monitor fsync frequency:**

```bash
# Fsync count should grow more slowly with larger batches
curl -s http://localhost:18180/metrics | grep wal_fsync_total
```

### If Cloud Provider Throttling

**1. Check for IOPS throttling (AWS EBS example):**

```bash
# CloudWatch metrics
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS \
  --metric-name VolumeQueueLength \
  --dimensions Name=VolumeId,Value=vol-abc123 \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
  --period 300 \
  --statistics Average
```

**2. Increase provisioned IOPS:**

```bash
# Modify EBS volume (AWS example)
aws ec2 modify-volume --volume-id vol-abc123 \
  --iops 3000 --volume-type gp3
```

**3. Wait for optimization to complete:**

```bash
watch aws ec2 describe-volumes-modifications \
  --volume-ids vol-abc123 \
  --query 'VolumesModifications[0].ModificationState'
```

## Prevention

### Monitoring

**1. Alert on sustained high latency:**

```yaml
- alert: WALFsyncDegrading
  expr: stemedb_wal_fsync_duration_seconds{quantile="0.99"} > 0.05
  for: 15m
  annotations:
    summary: "WAL fsync p99 latency degrading (>50ms)"
```

**2. Monitor disk queue depth:**

```yaml
# node_disk_io_time_weighted_seconds_total is a counter; its rate approximates
# the average I/O queue depth over the window.
- alert: DiskQueueDepthHigh
  expr: rate(node_disk_io_time_weighted_seconds_total[5m]) > 100
  for: 10m
  annotations:
    summary: "Disk queue depth indicates congestion"
```

### Capacity Planning

**1. Use dedicated disk for WAL:**
- NVMe SSD with capacitor-backed cache
- Separate physical disk from KV store
- Provisioned IOPS (cloud deployments)

**2. Benchmark before production:**

```bash
# Test fsync performance under load
fio --name=fsync-test --rw=write --bs=4k --size=1G \
  --fsync=1 --numjobs=4 --runtime=60 \
  --filename=/var/lib/stemedb/wal/test.dat

# Remove the benchmark file afterwards
rm /var/lib/stemedb/wal/test.dat
```

Expected: p99 latency <10ms on NVMe, <50ms on SATA SSD.

**3. Right-size provisioned IOPS (cloud):**

```
IOPS needed = (writes_per_sec * 1.5)  # 1.5x for overhead

Example:
- 1000 writes/sec → 1500 IOPS minimum
- Use 3000 IOPS for headroom (2x)
```

### Operational Best Practices

**1. Regular disk health checks:**

```bash
# Weekly SMART check
smartctl -a /dev/sda | grep -E "(PASSED|FAILED)"

# Alert on pending sectors (column 10 is the raw attribute value)
smartctl -a /dev/sda | awk '/Current_Pending_Sector/ {if($10>0) print "WARNING: Pending sectors detected"}'
```

**2. Monitor filesystem age:**

```bash
# Check filesystem age (ext4)
tune2fs -l /dev/sdb1 | grep "Filesystem created"

# Consider reformatting if >2 years old (fragmentation)
```

**3. Test I/O performance quarterly:**

```bash
# Benchmark and compare to baseline
fio --name=seq-write --rw=write --bs=1M --size=10G \
  --filename=/var/lib/stemedb/wal/bench.dat \
  --output-format=json > /tmp/fio-$(date +%Y%m%d).json

# Remove the benchmark file afterwards
rm /var/lib/stemedb/wal/bench.dat
```

## Escalation

**Escalate if:**
- Fsync latency exceeds 200ms for >30 minutes
- Disk errors appear in logs (hardware failure)
- Tuning and optimization have no effect
- Cloud provider throttling cannot be resolved

**Escalation path:**
1. **Primary on-call:** Storage SRE
2. **Secondary:** Infrastructure engineer
3. **Final escalation:** Cloud vendor TAM (if cloud-related)

## References

- **Dashboard:** [StemeDB WAL Performance](http://grafana.example.com/d/stemedb-wal)
- **Related alerts:** `WALFsyncFailure`, `HighStorageErrorRate`, `DiskUtilizationHigh`
- **Metrics:**
  - `stemedb_wal_fsync_duration_seconds` (latency distribution)
  - `stemedb_wal_fsync_total` (fsync count)
  - `node_disk_io_time_weighted_seconds_total` (disk queue time)
- **Runbooks:** `wal-fsync-failure.md`, `disk-full.md`
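
The latency thresholds used throughout this runbook (50ms degrading, 100ms alert, 200ms escalation) can be checked in one step during triage. The following is a minimal sketch, not part of StemeDB: the function name `fsync_p99_status` and the status labels are hypothetical, and it assumes the metrics endpoint emits the p99 line exactly in the form shown in Investigation step 1. It evaluates a single sample only, so the `for:` durations of the alert rules still need to be judged from the dashboard.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper (illustration only, not shipped with StemeDB).
# Reads Prometheus text-format metrics on stdin and classifies the WAL fsync
# p99 sample against this runbook's thresholds, in seconds:
#   > 0.05 degrading, > 0.1 alert, > 0.2 escalate.
fsync_p99_status() {
  awk '
    # Match the p99 sample by literal prefix to avoid regex brace portability issues.
    index($0, "stemedb_wal_fsync_duration_seconds{quantile=\"0.99\"}") == 1 {
      val = $2      # keep the original text for printing
      found = 1
    }
    END {
      if (!found)              { print "UNKNOWN: p99 sample not found"; exit }
      if (val + 0 > 0.2)       { print "ESCALATE: p99=" val "s" }
      else if (val + 0 > 0.1)  { print "ALERT: p99=" val "s" }
      else if (val + 0 > 0.05) { print "DEGRADING: p99=" val "s" }
      else                     { print "OK: p99=" val "s" }
    }
  '
}
```

Usage, assuming the metrics endpoint from this runbook: `curl -s http://localhost:18180/metrics | fsync_p99_status`. The value is compared numerically but printed as received, so the output matches what the endpoint reported.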