Write Ahead Log
Disk Watermark
txt
flood stage disk watermark [95%] exceeded on ...
Launch Params
bash
--storage.tsdb.retention.time=7d
--storage.tsdb.retention.size=50GB
--storage.tsdb.wal-compression
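- The WAL directory is not trimmed by --storage.tsdb.retention.size; size-based retention only deletes persisted blocks.
- The WAL is periodically compacted into blocks automatically.
- A restart forces the WAL data to be compacted.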
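Because the WAL sits outside size-based retention, it can help to see how much of the data directory it occupies. The following is a minimal sketch, assuming the data directory (the value of --storage.tsdb.path) contains a wal/ subdirectory; the function names are illustrative and not part of Prometheus.

```bash
#!/usr/bin/env bash
# Report how much of a Prometheus data directory is taken by the WAL.
# Assumption: the directory passed in is the TSDB path and holds a wal/ subdirectory.

wal_usage_kb() {
  local wal_dir="$1/wal"
  if [ -d "$wal_dir" ]; then
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
    du -sk "$wal_dir" | awk '{print $1}'
  else
    echo 0
  fi
}

total_usage_kb() {
  du -sk "$1" | awk '{print $1}'
}

report_wal_share() {
  local data_dir="$1"
  echo "wal=$(wal_usage_kb "$data_dir")KB total=$(total_usage_kb "$data_dir")KB"
}
```

For example, `report_wal_share /prometheus` prints both figures on one line, which makes it easy to spot a WAL that keeps growing while the blocks stay within the configured size.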
--storage.tsdb.wal-compression
- This flag was introduced in 2.11.0 and enabled by default in 2.20.0. Note that once enabled, downgrading Prometheus to a version below 2.11.0 will require deleting the WAL.
Debug
Check WAL logs
bash
kubectl logs xxx | grep -i wal
Check WAL compact logs
bash
kubectl logs xxx | grep -i compact
Safely Clean
K8s
bash
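## scale to 0 so that nothing writes to the WAL
kubectl scale deployment prometheus --replicas=0
## enter a container that mounts the same data volume
## (with replicas=0 the Prometheus pod is gone, so xxx must be a separate helper pod)
kubectl exec -it xxx -- /bin/sh
## delete the WAL; this loses some data (samples not yet compacted into a block)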
rm -rf /data/wal/*
## scale back
kubectl scale deployment prometheus --replicas=1
systemd
bash
## 1. Gracefully stop Prometheus
systemctl stop prometheus # if managed by systemd
## 2. Wait and confirm the process has fully stopped
ps aux | grep '[p]rometheus'
## 3. Back up the current WAL directory (optional)
cp -r /prometheus/wal /prometheus/wal.backup
## 4. Clean the WAL directory
rm -rf /prometheus/wal/*
## 5. Restart Prometheus
systemctl start prometheus
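The steps above can be wrapped in a small guard so the WAL is never deleted while the process is still alive. This is a sketch only: clean_wal, its arguments, and the .backup naming are hypothetical, and it assumes pgrep is available on the host.

```bash
#!/usr/bin/env bash
# Guarded WAL cleanup: refuse to delete while a process with the given name runs.
# Assumption: clean_wal and its parameters are illustrative, not a Prometheus tool.

clean_wal() {
  local data_dir="$1" proc_name="${2:-prometheus}"
  local wal_dir="$data_dir/wal"

  # -x matches the process name exactly.
  if pgrep -x "$proc_name" >/dev/null 2>&1; then
    echo "refusing: $proc_name is still running" >&2
    return 1
  fi
  if [ ! -d "$wal_dir" ]; then
    echo "refusing: $wal_dir does not exist" >&2
    return 1
  fi

  # Optional backup, mirroring step 3.
  cp -r "$wal_dir" "$wal_dir.backup"
  # ${wal_dir:?} aborts if the variable is empty, so this cannot expand to "rm -rf /*".
  rm -rf "${wal_dir:?}"/*
}
```

A typical call would be `systemctl stop prometheus && clean_wal /prometheus && systemctl start prometheus`; if the guard returns non-zero, the restart is skipped and nothing is deleted.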
NFS Problem
WARNING
Non-POSIX compliant filesystems are not supported for Prometheus' local storage as unrecoverable corruptions may happen. NFS filesystems (including AWS's EFS) are not supported. NFS could be POSIX-compliant, but most implementations are not. It is strongly recommended to use a local filesystem for reliability.
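To check whether a data directory sits on NFS before pointing Prometheus at it, something like the following may be enough. It is a minimal sketch assuming GNU coreutils stat on Linux; fs_type and is_nfs are illustrative names.

```bash
#!/usr/bin/env bash
# Print the filesystem type behind a directory and flag NFS.
# Assumption: GNU coreutils stat (Linux); the function names are illustrative.

fs_type() {
  # -f queries the filesystem; %T prints its type in human-readable form (e.g. xfs, nfs, tmpfs).
  stat -f -c %T "$1"
}

is_nfs() {
  case "$(fs_type "$1")" in
    nfs*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_nfs /prometheus && echo "data directory is on NFS"` prints the message only when the directory is backed by an NFS mount.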