Advanced Search

Search Results

41 total results found

Filter Order

Fluent Bit Troubleshooting

Wrong
  [FILTER]
      Name    modify
      Match   kube.*
      Rename  message log
      Rename  msg     log

  [FILTER]
      Name    grep
      Match   kube.*
      Exclude $log ^(GET /healthz|/ready|/health|/live|/ping)$

  [FILTER]
      Name    kube...

troubleshooting
filter
fluent-bit
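
Fluent Bit applies filters in the order they are declared, so a grep that matches on $log only sees that field after the modify filter that renames it has run. A minimal sketch of the ordering principle (not the page's exact fix; the regex is illustrative):

  [FILTER]
      Name    modify
      Match   kube.*
      Rename  message log

  [FILTER]
      Name    grep
      Match   kube.*
      Exclude $log ^GET /healthz$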

Kube Tag Prefix

Fluent Bit Troubleshooting

Error
  incoming record tag (kube.containers) is shorter than kube_tag_prefix value (kube.var.log.containers.), skip filter
Reproduce
  Setting Tag to kube.foobar triggers the error.
  [INPUT]
      Name tail
      Path /var/log/containers/*.log
      multi...

troubleshooting
fluent-bit
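
The kubernetes filter's Kube_Tag_Prefix must line up with the tag the tail input produces. A minimal sketch of a matching pair, assuming the default /var/log/containers path:

  [INPUT]
      Name              tail
      Tag               kube.*
      Path              /var/log/containers/*.log

  [FILTER]
      Name              kubernetes
      Match             kube.*
      Kube_Tag_Prefix   kube.var.log.containers.

With Tag kube.*, the file path is expanded into the tag, so records arrive as kube.var.log.containers.<file> and the prefix strips cleanly.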

Too Many Open Files

Fluent Bit Troubleshooting

https://github.com/fluent/fluent-bit/issues/1777 @aderuwe It might be more related to inotify mechanism - as by default (e.g. when using the default Docker image) this mechanism is used for tailing files in in_tail plugin: https://linux.die.net/man/2/inotify_...

troubleshooting
tail
ulimit
inotify
fluent-bit
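
Since in_tail relies on inotify by default, both the open-file ulimit and the kernel inotify limits can be the bottleneck. A sketch for inspecting and raising them on the node (the values are illustrative):

  ulimit -n
  sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches
  sysctl -w fs.inotify.max_user_instances=8192
  sysctl -w fs.inotify.max_user_watches=1048576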

Large Metrics

Istio Troubleshooting

Sometimes the metrics exposed by Istio can be very large, which may cause vmagent to fail when scraping them. There are two solutions: increase the vmagent scrape size limit, or reduce the metrics size.

troubleshooting
metrics
istio
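
For the first option, vmagent caps scrape response size with the -promscrape.maxScrapeSize flag; a sketch (the value is illustrative):

  vmagent -promscrape.maxScrapeSize=64MB ...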

OTEL Jaeger Exporter

Jaeger Troubleshooting

Background
  The OTel Collector removed the native jaeger exporter (port 14250) and uses the new otlp/jaeger exporter instead. Jaeger supports OTLP since v1.35.0.
Connect Error
  The OTel Collector cannot connect to the Jaeger collector.
  2025-02-19T08:56:49.561Z info i...

troubleshooting
otel
jaeger
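
Since Jaeger v1.35 accepts OTLP natively, the collector side is just an otlp exporter pointed at Jaeger's OTLP gRPC port (4317). A minimal sketch (the service name is illustrative):

  exporters:
    otlp/jaeger:
      endpoint: jaeger-collector:4317
      tls:
        insecure: true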

Ephemeral Storage

K8s Troubleshooting

Errors
  Warning  FailedScheduling  21m (x3 over 22m)  default-scheduler  0/18 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/disk-pressure: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 6 Insuffici...

ephemeral
k8s
storage
troubleshooting
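
Ephemeral-storage is a schedulable resource like cpu and memory, so pods can declare what they need and the scheduler will avoid placing them on nodes without room. A minimal sketch (the values are illustrative):

  resources:
    requests:
      ephemeral-storage: "1Gi"
    limits:
      ephemeral-storage: "2Gi"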

Horizontal Pod Autoscaler Version Not Found

K8s Troubleshooting

Errors
  helm upgrade foo foo.tgz -f values.yaml
  Error: UPGRADE FAILED: resource mapping not found for name: "foo" namespace: "" from "": no matches for kind "HorizontalPodAutoscaler" in version "autoscaling/v2beta1"
  ensure CRDs are installed first
Key Errors...

troubleshooting
v2beta1
HorizontalPodAutoscaler
helm
autoscaling
hpa
k8s
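
autoscaling/v2beta1 was removed in Kubernetes 1.25; autoscaling/v2 has been GA since 1.23. A minimal sketch of the same HPA on the current API (names and numbers are illustrative):

  apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    name: foo
  spec:
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: foo
    minReplicas: 2
    maxReplicas: 10
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 80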

ImageGCFailed

K8s Troubleshooting

Errors
  Source Log
  Events:
    Type     Reason         Age                       From        Message
    ----     ------         ----                      ----        -------
    Warning  ImageGCFailed  4m14s (x15704 over 133d)  kubelet, b...

k8s
troubleshooting
image
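
Image garbage collection runs between the kubelet's high and low disk-usage thresholds, so repeated ImageGCFailed events often mean the node can never get below them. A sketch of the relevant KubeletConfiguration fields (defaults shown):

  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  imageGCHighThresholdPercent: 85
  imageGCLowThresholdPercent: 80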

Image Pull Stuck

K8s Troubleshooting

Kube Event Log
  Stuck with the following log.
  Normal  Pulling  30m  kubelet  Pulling image "registry.example.com/test/code-go:release-22971"
Get Problem Node
  kubectl get po -o wide | grep foo
  kubectl get node -o wide
Describe Node Status
  Check node event...

troubleshooting
image
k8s
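
When a pull sits in Pulling with no error event, pulling the same image by hand on the node usually surfaces the real failure. A sketch, assuming a containerd node (use docker pull on Docker nodes):

  crictl pull registry.example.com/test/code-go:release-22971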

Kubernetes CIDR

K8s Troubleshooting

Background
  The CIDR of a k8s cluster installed by Sealos is 100.64.xxx.xxx, which breaks Keycloak (what exactly breaks?). SonyFlake's isPrivateIPv4 does not handle 100.xxx. 100.64.0.0/10 is the "shared address space" defined by RFC 6598, not an RFC 1918 private range.
Basic Concepts
  CIDR (Classless Inter-Domain Routing): A method for representing IP address...

troubleshooting
sonyflake
sealos
cidr
k8s
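
A quick way to see the distinction: Go's net.IP.IsPrivate only covers the RFC 1918 (and RFC 4193) ranges, so a 100.64.0.0/10 pod IP fails an RFC 1918-style check even though it is never globally routable. A minimal sketch (not SonyFlake's actual code):

  package main

  import (
      "fmt"
      "net"
  )

  func main() {
      ip := net.ParseIP("100.64.1.2")
      // false: 100.64.0.0/10 is not an RFC 1918 private range
      fmt.Println(ip.IsPrivate())
      // true: it falls in the RFC 6598 shared address space (CGNAT)
      _, cgnat, _ := net.ParseCIDR("100.64.0.0/10")
      fmt.Println(cgnat.Contains(ip))
  }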

kube-state-metrics

K8s Troubleshooting

Errors
  kube-state-metrics
  E0624 16:46:27.590502 1 metrics_handler.go:227] "Failed to write metrics"
  E0624 16:47:27.573964 1 metrics_handler.go:227] "Failed to write metrics" err="failed to write help text: write tcp 100.81.9.63:8080->100.81.9.62:...

troubleshooting
kube-state-metrics
k8s
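
A "failed to write ... write tcp" error typically means the scraper closed the connection before kube-state-metrics finished responding. One mitigation, assuming the scraper is timing out, is to raise its scrape timeout; a Prometheus-style sketch (values illustrative):

  scrape_configs:
    - job_name: kube-state-metrics
      scrape_timeout: 30s
      static_configs:
        - targets: ["kube-state-metrics:8080"]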

kubeadm Join Cluster Error

K8s Troubleshooting

Background
  Installed a k8s cluster with one-ansible from the master machine. The root cause turned out to be that the domain kubernetes.example.local was unreachable; after adding a hosts entry on the new node and re-running kubeadm join, it worked.
Error Log
  kubeadm join kubernetes.example.local:6443 --token molqef.zcb8h6z6i59wxdvf --discovery-token-ca-cert-hash sha256:c146d7fb88ecc7e41389d7...

troubleshooting
kubeadm
k8s
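
The hosts-entry fix from the summary, as a sketch (the IP is illustrative; use the actual API-server address):

  echo "10.0.0.10 kubernetes.example.local" >> /etc/hosts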

NFS Mount Failed

K8s Troubleshooting

Errors
  Source
  Events:
    Type     Reason       Age                      From     Message
    ----     ------       ----                     ----     -------
    Warning  FailedMount  9m39s (x19102 over 26d)  kubelet  MountVolume.SetUp failed for volume "pvc-510bc02...

pv
troubleshooting
mount
k8s
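
When SetUp fails repeatedly, mounting the export by hand from the affected node separates NFS problems from Kubernetes ones. A sketch (server and path are illustrative):

  showmount -e nfs-server.example.com
  mount -t nfs nfs-server.example.com:/export/path /mnt/test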

Rancher PVC Pending

K8s Troubleshooting

Root cause: when local path storage dynamically creates its helper pod, it fails to pull the helper pod image.
Errors
  PVC Pending
    code-gitaly-data-code-gitaly-0-2-0   Pending   local-path   216d
  Describe pvc.
    Type:    Normal
    Reason:  ExternalProvisioning
    Age:     4m32s (x15836 over 2d18h)
    From:    ...

troubleshooting
pvc
rancher
k8s
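
If the default helper image is unreachable from the cluster, one option is to point the provisioner at a registry it can reach. A sketch of the helperPod.yaml section inside the local-path-config ConfigMap (the mirror name is illustrative, and the exact layout depends on the local-path-provisioner version):

  apiVersion: v1
  kind: Pod
  metadata:
    name: helper-pod
  spec:
    containers:
      - name: helper-pod
        image: mirror.example.com/library/busybox:1.36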

Helm Secret Number

K8s Troubleshooting

Rancher loads the Helm secrets of every release version and can hang as a result; limit how many versions helm retains.

secret
rancher
k8s
helm
troubleshooting
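
helm's --history-max flag caps how many release versions (and therefore release secrets) are kept; a sketch:

  helm upgrade foo foo.tgz --history-max 5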

Database Metrics Doesn't Exist

OpenTelemetry Troubleshooting

Errors
  Key errors
  {
    "kind": "exporter",
    "data_type": "logs",
    "name": "clickhouse",
    "error": "PrepareContext:code: 81, message: Database metrics doesn't exist",
    "interval": "14.932404217s"
  }
  Source
  2025-01-08T03:43:01.128Z info internal/retr...

troubleshooting
exporter
clickhouse
opentelemetry
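
ClickHouse error code 81 is UNKNOWN_DATABASE, so the database named in the exporter config simply doesn't exist yet. One fix, assuming the exporter isn't allowed to create its own schema, is to create it by hand:

  CREATE DATABASE IF NOT EXISTS metrics;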

OTEL Collector Installation

OpenTelemetry Troubleshooting

Environment
  Kubernetes: Require 1.24+, Actual 1.18.3
  Opentelemetry Collector Chart: 0.109.0
  Opentelemetry Collector Contrib: 0.113.0
Internal Traffic Policy
  Error: INSTALLATION FAILED: unable to build kubernetes objects from release manifest: error vali...

troubleshooting
installation
opentelemetry
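
Service.spec.internalTrafficPolicy only exists from Kubernetes 1.21 on, so a 1.18 cluster fails validation on chart-rendered manifests that set it; the durable fix is meeting the chart's 1.24+ requirement. To confirm the offending field, render the chart locally (release and chart names are illustrative):

  helm template otel open-telemetry/opentelemetry-collector --version 0.109.0 | grep -n internalTrafficPolicy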

Cluster Not Ready

Elasticsearch Troubleshooting

Warning  Unhealthy  33s (x6045 over 14h)  kubelet  Readiness probe failed: Waiting for elasticsearch cluster to become ready (request params: "wait_for_status=yellow&timeout=3s")
Cluster is not yet ready (request params: "wait_for_status=yellow&tim...

elasticsearch
troubleshooting
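
When the cluster sits below yellow, the allocation-explain API usually says which shard is stuck and why. A sketch (run against the ES service; the port may differ):

  curl -s "http://localhost:9200/_cluster/allocation/explain?pretty"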

Disk Usage

Elasticsearch Troubleshooting

Errors
  [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block]
  {
    "type": "server",
    "timestamp": "2025-02-11T19:04:17,088Z",
    "level": "WARN",
    "component": "o.e.c.r.a.DiskThresholdMonitor",
    "cluster.na...

troubleshooting
nfs
disk
elasticsearch
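
Once disk usage is brought back below the flood-stage watermark, the read-only block has to be cleared explicitly (ES 7.4+ also auto-releases it when usage drops). A sketch:

  curl -s -X PUT "http://localhost:9200/_all/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index.blocks.read_only_allow_delete": null}'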

Shard Recovery

Elasticsearch Troubleshooting

Check State
  curl http://localhost:9200/_cluster/health?pretty
  {
    "cluster_name" : "elasticsearch",
    "status" : "yellow",
    "timed_out" : false,
    "number_of_nodes" : 1,
    "number_of_data_nodes" : 1,
    "active_primary_shards" : 1191,
    "active_shards" : 119...

troubleshooting
shard
elasticsearch
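
On a single-node cluster, yellow usually just means replica shards have no second node to land on. A sketch of the common fix, dropping replicas for all indices:

  curl -s -X PUT "http://localhost:9200/_all/_settings" \
    -H 'Content-Type: application/json' \
    -d '{"index.number_of_replicas": 0}'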