
Write Ahead Log

Disk Watermark

txt
flood stage disk watermark [95%] exceeded on ...
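
A message like the one above is reported once a disk crosses a usage threshold. As a rough way to check the same condition by hand, the sketch below reads the used-space percentage of a path with `df -P` and compares it with a limit. The function names, the `/prometheus` path in the example, and the strict "greater than" comparison are assumptions for illustration.

```shell
# disk_used_pct PATH: print the used-space percentage (integer, no % sign)
# of the filesystem that holds PATH, using the POSIX output format of df.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# exceeds_watermark USED LIMIT: succeed when USED is strictly above LIMIT.
exceeds_watermark() {
  [ "$1" -gt "$2" ]
}

# Example: warn when the (assumed) data path is above 95% usage.
# pct=$(disk_used_pct /prometheus)
# exceeds_watermark "$pct" 95 && echo "disk usage ${pct}% is above 95%"
```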

Launch Params

bash
--storage.tsdb.retention.time=7d
--storage.tsdb.retention.size=50GB
--storage.tsdb.wal-compression
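  • The WAL directory is not trimmed by storage.tsdb.retention.size; size-based retention only deletes persistent blocks
  • WAL data is periodically and automatically compacted into blocks
  • A restart forces the WAL data to be compacted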
  • --storage.tsdb.wal-compression
    • This flag was introduced in 2.11.0 and enabled by default in 2.20.0. Note that once enabled, downgrading Prometheus to a version below 2.11.0 will require deleting the WAL.
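
Because the WAL is not trimmed by size-based retention, it can help to see how much of the data directory the WAL occupies. A minimal sketch, assuming the TSDB data directory contains a `wal` subdirectory (as in the paths used later in these notes); `dir_bytes` and `wal_report` are hypothetical helper names.

```shell
# dir_bytes DIR: print the sum of file sizes (in bytes) under DIR,
# or 0 if DIR does not exist.
dir_bytes() {
  [ -d "$1" ] || { echo 0; return 0; }
  find "$1" -type f -exec wc -c {} \; | awk '{ s += $1 } END { print s + 0 }'
}

# wal_report DATA_DIR: print the WAL size next to the total size
# of the data directory.
wal_report() {
  echo "wal=$(dir_bytes "$1/wal") total=$(dir_bytes "$1")"
}

# Example (path is an assumption): wal_report /prometheus
```

If the `wal` figure dominates the total, lowering the retention size alone is unlikely to free the space.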

Debug

Check WAL logs

bash
kubectl logs xxx | grep -i wal

Check WAL compaction logs

bash
kubectl logs xxx | grep -i compact
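
The two greps above can be combined into one filter so that WAL and compaction messages appear together in their original order. A small sketch; `filter_tsdb_logs` is a hypothetical helper name and `xxx` remains the placeholder pod name used above.

```shell
# filter_tsdb_logs: read log lines on stdin and keep those that mention
# "wal" or "compact", case-insensitively.
filter_tsdb_logs() {
  grep -iE 'wal|compact'
}

# Example: kubectl logs xxx | filter_tsdb_logs
```

The pattern is deliberately loose, so it also matches unrelated words that contain "wal"; tighten it if the output is noisy.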

Safely Clean

K8s

bash
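## Scale down to 0 so Prometheus stops writing to the WAL
kubectl scale deployment prometheus --replicas=0

## Enter a container that mounts the data volume
## (with 0 replicas the Prometheus pod is gone, so xxx must be another pod that mounts the same volume)
kubectl exec -it xxx -- /bin/sh

## Delete the WAL; samples not yet compacted into a block are lost
rm -rf /data/wal/*

## Scale back up
kubectl scale deployment prometheus --replicas=1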

systemd

bash
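## 1. Stop Prometheus gracefully
systemctl stop prometheus   # if managed by systemd

## 2. Confirm the process has fully stopped (no output means it is gone)
ps aux | grep '[p]rometheus'

## 3. Back up the current WAL directory (optional)
cp -r /prometheus/wal /prometheus/wal.backup

## 4. Clear the WAL directory
rm -rf /prometheus/wal/*

## 5. Start Prometheus again
systemctl start prometheus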
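
The manual steps above can be wrapped in a small guard that refuses to run while Prometheus is alive and moves the WAL contents aside instead of deleting them outright. This is a sketch under assumptions: `safe_clean_wal` is a hypothetical helper, the process is assumed to be named exactly `prometheus`, and `find` is assumed to support `-mindepth`/`-maxdepth` (GNU and BSD find do).

```shell
# safe_clean_wal DATA_DIR: move everything inside DATA_DIR/wal into a
# timestamped backup directory and print the backup path.
safe_clean_wal() {
  wal_dir="$1/wal"
  if [ ! -d "$wal_dir" ]; then
    echo "no WAL directory at $wal_dir" >&2
    return 1
  fi
  # Refuse to touch the WAL while a process named "prometheus" is running.
  if pgrep -x prometheus >/dev/null 2>&1; then
    echo "prometheus is still running; stop it first" >&2
    return 2
  fi
  backup="$1/wal.backup.$(date +%Y%m%d%H%M%S)"
  mkdir "$backup" || return 3
  # Move the contents only, so the wal directory keeps its owner and mode.
  find "$wal_dir" -mindepth 1 -maxdepth 1 -exec mv {} "$backup"/ \; || return 3
  echo "$backup"
}

# Example (path taken from the steps above): safe_clean_wal /prometheus
```

The backup still occupies disk space, so it would need to be removed once Prometheus has started cleanly. If `pgrep` is not installed the running-process check is silently skipped.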

NFS Problem

WARNING

Non-POSIX compliant filesystems are not supported for Prometheus' local storage as unrecoverable corruptions may happen. NFS filesystems (including AWS's EFS) are not supported. NFS could be POSIX-compliant, but most implementations are not. It is strongly recommended to use a local filesystem for reliability.
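
Given the warning above, it can be worth confirming which filesystem backs the data directory before relying on it. A minimal sketch, assuming GNU `df` (the `-T` option is not part of POSIX) and treating any type that starts with `nfs` as NFS; `fs_type_of` and `is_nfs_type` are hypothetical helper names.

```shell
# fs_type_of PATH: print the type of the filesystem that holds PATH (GNU df).
fs_type_of() {
  df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# is_nfs_type TYPE: succeed when TYPE names an NFS variant such as nfs or nfs4.
is_nfs_type() {
  case "$1" in
    nfs*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Example (path is an assumption):
# is_nfs_type "$(fs_type_of /prometheus)" && echo "data directory is on NFS"
```

On Linux an EFS mount typically shows up with an NFS type as well, so the same check should flag it.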
