Advanced Search

Search Results

41 total results found

Filter Order

Fluent Bit Troubleshooting

Wrong
  [FILTER]
      Name    modify
      Match   kube.*
      Rename  message log
      Rename  msg     log

  [FILTER]
      Name    grep
      Match   kube.*
      Exclude $log ^(GET /healthz|/ready|/health|/live|/ping)$

  [FILTER]
      Name    kube...

troubleshooting
filter
fluent-bit
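
Fluent Bit applies filters in the order they are declared, so a grep that matches on $log only sees that field after the modify filter that renames it has run. A minimal sketch of the ordering principle (not the page's exact fix; the regex is illustrative):

  [FILTER]
      Name    modify
      Match   kube.*
      Rename  message log

  [FILTER]
      Name    grep
      Match   kube.*
      Exclude $log ^GET /healthz$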

Kube Tag Prefix

Fluent Bit Troubleshooting

Error
  incoming record tag (kube.containers) is shorter than kube_tag_prefix value (kube.var.log.containers.), skip filter
Reproduce
  Setting Tag to kube.foobar triggers the error.
  [INPUT]
      Name tail
      Path /var/log/containers/*.log
      multi...

troubleshooting
fluent-bit
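
The kubernetes filter's Kube_Tag_Prefix must line up with the tag the tail input produces. A minimal sketch of a matching pair, assuming the default /var/log/containers path:

  [INPUT]
      Name              tail
      Tag               kube.*
      Path              /var/log/containers/*.log

  [FILTER]
      Name              kubernetes
      Match             kube.*
      Kube_Tag_Prefix   kube.var.log.containers.

With Tag kube.*, the file path is expanded into the tag, so records arrive as kube.var.log.containers.<file> and the prefix strips cleanly.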

Too Many Open Files

Fluent Bit Troubleshooting

https://github.com/fluent/fluent-bit/issues/1777 @aderuwe It might be more related to inotify mechanism - as by default (e.g. when using the default Docker image) this mechanism is used for tailing files in in_tail plugin: https://linux.die.net/man/2/inotify_...

troubleshooting
tail
ulimit
inotify
fluent-bit
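
Since in_tail relies on inotify by default, both the open-file ulimit and the kernel inotify limits can be the bottleneck. A sketch for inspecting and raising them on the node (the values are illustrative):

  ulimit -n
  sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches
  sysctl -w fs.inotify.max_user_instances=8192
  sysctl -w fs.inotify.max_user_watches=1048576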

Large Metrics

Istio Troubleshooting

Sometimes the metrics exposed by Istio can be very large, which may cause vmagent to fail when scraping them. There are two solutions: increase the vmagent scrape size limit, or reduce the metrics size.

troubleshooting
metrics
istio
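
For the first option, vmagent caps scrape response size with the -promscrape.maxScrapeSize flag; a sketch (the value is illustrative):

  vmagent -promscrape.maxScrapeSize=64MB ...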

OTEL Jaeger Exporter

Jaeger Troubleshooting

Background
  The OTel Collector removed the native jaeger exporter (port 14250) and uses the new otlp/jaeger exporter instead. Jaeger supports OTLP since v1.35.0.
Connect Error
  The OTel Collector cannot connect to the Jaeger collector.
  2025-02-19T08:56:49.561Z info i...

troubleshooting
otel
jaeger
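
Since Jaeger v1.35 accepts OTLP natively, the collector side is just an otlp exporter pointed at Jaeger's OTLP gRPC port (4317). A minimal sketch (the service name is illustrative):

  exporters:
    otlp/jaeger:
      endpoint: jaeger-collector:4317
      tls:
        insecure: true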

Ephemeral Storage

K8s Troubleshooting

Errors
  Warning  FailedScheduling  21m (x3 over 22m)  default-scheduler  0/18 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 6 Insuffici...

ephemeral
k8s
storage
troubleshooting
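
Ephemeral-storage is a schedulable resource like cpu and memory, so pods can declare what they need and the scheduler will avoid placing them on nodes without room. A minimal sketch (the values are illustrative):

  resources:
    requests:
      ephemeral-storage: "1Gi"
    limits:
      ephemeral-storage: "2Gi"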

Horizontal Pod Autoscaler Version Not Found

K8s Troubleshooting

Errors
  helm upgrade foo foo.tgz -f values.yaml
  Error: UPGRADE FAILED: resource mapping not found for name: "foo" namespace: "" from "": no matches for kind "HorizontalPodAutoscaler" in version "autoscaling/v2beta1"
  ensure CRDs are installed first
Key Errors...

troubleshooting
v2beta1
HorizontalPodAutoscaler
helm
autoscaling
hpa
k8s
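
autoscaling/v2beta1 was removed in Kubernetes 1.25; autoscaling/v2 has been GA since 1.23. A minimal sketch of the same HPA on the current API (names and numbers are illustrative):

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: foo
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: foo
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 80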

ImageGCFailed

K8s Troubleshooting

Errors
  Source Log
  Events:
    Type     Reason         Age                       From        Message
    ----     ------         ----                      ----        -------
    Warning  ImageGCFailed  4m14s (x15704 over 133d)  kubelet, b...

k8s
troubleshooting
image
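
Image garbage collection runs between the kubelet's high and low disk-usage thresholds, so repeated ImageGCFailed events often mean the node can never get below them. A sketch of the relevant KubeletConfiguration fields (defaults shown):

  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  imageGCHighThresholdPercent: 85
  imageGCLowThresholdPercent: 80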

Image Pull Stuck

K8s Troubleshooting

Kube Event Log
  Stuck with the following log.
  Normal  Pulling  30m  kubelet  Pulling image "registry.example.com/test/code-go:release-22971"
Get Problem Node
  kubectl get po -o wide | grep foo
  kubectl get node -o wide
Describe Node Status
  Check node event...

troubleshooting
image
k8s
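
When a pull sits in Pulling with no error event, pulling the same image by hand on the node usually surfaces the real failure. A sketch, assuming a containerd node (use docker pull on Docker nodes):

  crictl pull registry.example.com/test/code-go:release-22971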

Kubernetes CIDR

K8s Troubleshooting

Background
  The CIDR of a k8s cluster installed by Sealos is 100.64.xxx.xxx, which breaks Keycloak (what exactly breaks?). SonyFlake's isPrivateIPv4 does not handle 100.xxx. 100.64.0.0/10 is the "shared address space" defined by RFC 6598, not an RFC 1918 private range.
Basic Concepts
  CIDR (Classless Inter-Domain Routing): A method for representing IP address...

troubleshooting
sonyflake
sealos
cidr
k8s
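
A quick way to see the distinction: Go's net.IP.IsPrivate only covers the RFC 1918 (and RFC 4193) ranges, so a 100.64.0.0/10 pod IP fails an RFC 1918-style check even though it is never globally routable. A minimal sketch (not SonyFlake's actual code):

  package main

  import (
      "fmt"
      "net"
  )

  func main() {
      ip := net.ParseIP("100.64.1.2")
      // false: 100.64.0.0/10 is not an RFC 1918 private range
      fmt.Println(ip.IsPrivate())
      // true: it falls in the RFC 6598 shared address space (CGNAT)
      _, cgnat, _ := net.ParseCIDR("100.64.0.0/10")
      fmt.Println(cgnat.Contains(ip))
  }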

kube-state-metrics

K8s Troubleshooting

Errors
  kube-state-metrics
  E0624 16:46:27.590502 1 metrics_handler.go:227] "Failed to write metrics"
  E0624 16:47:27.573964 1 metrics_handler.go:227] "Failed to write metrics" err="failed to write help text: write tcp 100.81.9.63:8080->100.81.9.62:...

troubleshooting
kube-state-metrics
k8s
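
A "failed to write ... write tcp" error typically means the scraper closed the connection before kube-state-metrics finished responding. One mitigation, assuming the scraper is timing out, is to raise its scrape timeout; a Prometheus-style sketch (values illustrative):

  scrape_configs:
    - job_name: kube-state-metrics
      scrape_timeout: 30s
      static_configs:
        - targets: ["kube-state-metrics:8080"]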

kubeadm Join Cluster Error

K8s Troubleshooting

Background
  Installed a k8s cluster with one-ansible from the master machine. The root cause turned out to be that the domain kubernetes.example.local was unreachable; after adding a hosts entry on the new node and re-running kubeadm join, it worked.
Error Log
  kubeadm join kubernetes.example.local:6443 --token molqef.zcb8h6z6i59wxdvf --discovery-token-ca-cert-hash sha256:c146d7fb88ecc7e41389d7...

troubleshooting
kubeadm
k8s
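
The hosts-entry fix from the summary, as a sketch (the IP is illustrative; use the actual API-server address):

  echo "10.0.0.10 kubernetes.example.local" >> /etc/hosts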

NFS Mount Failed

K8s Troubleshooting

Errors
  Source
  Events:
    Type     Reason       Age                      From     Message
    ----     ------       ----                     ----     -------
    Warning  FailedMount  9m39s (x19102 over 26d)  kubelet  MountVolume.SetUp failed for volume "pvc-510bc02...

pv
troubleshooting
mount
k8s
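
When SetUp fails repeatedly, mounting the export by hand from the affected node separates NFS problems from Kubernetes ones. A sketch (server and path are illustrative):

  showmount -e nfs-server.example.com
  mount -t nfs nfs-server.example.com:/export/path /mnt/test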

Rancher PVC Pending

K8s Troubleshooting

Root cause: when local path storage dynamically creates its helper pod, it fails to pull the helper pod image.
Errors
  PVC Pending
    code-gitaly-data-code-gitaly-0-2-0   Pending   local-path   216d
  Describe pvc.
    Type:    Normal
    Reason:  ExternalProvisioning
    Age:     4m32s (x15836 over 2d18h)
    From:    ...

troubleshooting
pvc
rancher
k8s
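
If the default helper image is unreachable from the cluster, one option is to point the provisioner at a registry it can reach. A sketch of the helperPod.yaml section inside the local-path-config ConfigMap (the mirror name is illustrative, and the exact layout depends on the local-path-provisioner version):

  apiVersion: v1
  kind: Pod
  metadata:
    name: helper-pod
  spec:
    containers:
      - name: helper-pod
        image: mirror.example.com/library/busybox:1.36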

Helm Secret Number

K8s Troubleshooting

Rancher loads the Helm secrets of every release version and can hang as a result; limit how many versions helm retains.

secret
rancher
k8s
helm
troubleshooting
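
helm's --history-max flag caps how many release versions (and therefore release secrets) are kept; a sketch:

  helm upgrade foo foo.tgz --history-max 5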

Database Metrics Doesn't Exist

OpenTelemetry Troubleshooting

Errors
  Key errors
  {
    "kind": "exporter",
    "data_type": "logs",
    "name": "clickhouse",
    "error": "PrepareContext:code: 81, message: Database metrics doesn't exist",
    "interval": "14.932404217s"
  }
  Source
  2025-01-08T03:43:01.128Z info internal/retr...

troubleshooting
exporter
clickhouse
opentelemetry
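
ClickHouse error code 81 is UNKNOWN_DATABASE, so the database named in the exporter config simply doesn't exist yet. One fix, assuming the exporter isn't allowed to create its own schema, is to create it by hand:

  CREATE DATABASE IF NOT EXISTS metrics;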

OTEL Collector Installation

OpenTelemetry Troubleshooting

Environment
  Kubernetes: Require 1.24+, Actual 1.18.3
  Opentelemetry Collector Chart: 0.109.0
  Opentelemetry Collector Contrib: 0.113.0
Internal Traffic Policy
  Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: error vali...

troubleshooting
installation
opentelemetry
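
Service.spec.internalTrafficPolicy only exists from Kubernetes 1.21 on, so a 1.18 cluster fails validation on chart-rendered manifests that set it; the durable fix is meeting the chart's 1.24+ requirement. To confirm the offending field, render the chart locally (release and chart names are illustrative):

  helm template otel open-telemetry/opentelemetry-collector --version 0.109.0 | grep -n internalTrafficPolicy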

Cluster Not Ready

Elasticsearch Troubleshooting

Warning  Unhealthy  33s (x6045 over 14h)  kubelet  Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=yellow&timeout=3s")
Cluster is not yet ready (request params: "wait_for_status=yellow&tim...

elasticsearch
troubleshooting
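
When the cluster sits below yellow, the allocation-explain API usually says which shard is stuck and why. A sketch (run against the ES service; the port may differ):

  curl -s "http://localhost:9200/_cluster/allocation/explain?pretty"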

Disk Usage

Elasticsearch Troubleshooting

Errors
  [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block]
  {
    "type": "server",
    "timestamp": "2025-02-11T19:04:17,088Z",
    "level": "WARN",
    "component": "o.e.c.r.a.DiskThresholdMonitor",
    "cluster.na...

troubleshooting
nfs
disk
elasticsearch
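
Once disk usage is brought back below the flood-stage watermark, the read-only block has to be cleared explicitly (ES 7.4+ also auto-releases it when usage drops). A sketch:

  curl -s -X PUT "http://localhost:9200/_all/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index.blocks.read_only_allow_delete": null}'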

Shard Recovery

Elasticsearch Troubleshooting

Check State
  curl http://localhost:9200/_cluster/health?pretty
  {
    "cluster_name" : "elasticsearch",
    "status" : "yellow",
    "timed_out" : false,
    "number_of_nodes" : 1,
    "number_of_data_nodes" : 1,
    "active_primary_shards" : 1191,
    "active_shards" : 119...

troubleshooting
shard
elasticsearch
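
On a single-node cluster, yellow usually just means replica shards have no second node to land on. A sketch of the common fix, dropping replicas for all indices:

  curl -s -X PUT "http://localhost:9200/_all/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index.number_of_replicas": 0}'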