What is Kubernetes CrashLoopBackOff? And how to fix it

3 years ago

新, 韵


Kubernetes CrashLoopBackOff, an illustrated representation. A Pod is in a loop. It tries to run, but it fails, so it goes to a Failed state. It waits a bit to help you debug, then it tries to run again. If the issue is not fixed, the loop continues and it fails again.


How do you detect a CrashLoopBackOff in your cluster?

Get information about your Kubernetes pods:

$ kubectl get pods
NAME                     READY     STATUS             RESTARTS   AGE
flask-7996469c47-d7zl2   1/1       Running            1          77d
flask-7996469c47-tdr2n   1/1       Running            0          77d
nginx-5796d5bc7c-2jdr5   0/1       CrashLoopBackOff   2          1m
nginx-5796d5bc7c-xsl6p   0/1       CrashLoopBackOff   2          1m

The RESTARTS column shows one or more restarts.
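
The check above can be automated. The following is a minimal sketch, assuming the plain-text table format shown in the listing; the function name `crashlooping_pods` and the `SAMPLE` constant are hypothetical and only used for illustration.

```python
# Sketch: find pods in CrashLoopBackOff from the text output of `kubectl get pods`.
# Assumes the default table layout (NAME, READY, STATUS, RESTARTS, AGE).

SAMPLE = """\
NAME                     READY     STATUS             RESTARTS   AGE
flask-7996469c47-d7zl2   1/1       Running            1          77d
flask-7996469c47-tdr2n   1/1       Running            0          77d
nginx-5796d5bc7c-2jdr5   0/1       CrashLoopBackOff   2          1m
nginx-5796d5bc7c-xsl6p   0/1       CrashLoopBackOff   2          1m
"""


def crashlooping_pods(kubectl_output: str) -> list[tuple[str, int]]:
    """Return (pod name, restart count) for each pod whose STATUS is CrashLoopBackOff."""
    lines = kubectl_output.strip().splitlines()
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    restarts_col = header.index("RESTARTS")

    found = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= restarts_col:
            continue  # skip short or malformed rows
        if fields[status_col] == "CrashLoopBackOff":
            # Only the first token of RESTARTS is read, so a trailing
            # "(15s ago)" suffix printed by some kubectl versions is ignored.
            found.append((fields[name_col], int(fields[restarts_col])))
    return found


if __name__ == "__main__":
    for name, restarts in crashlooping_pods(SAMPLE):
        print(f"{name}: {restarts} restarts")
```

In practice the input would come from running `kubectl get pods` rather than from a constant.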

A timeline of a CrashLoopBackOff. Every time it fails, the BackoffTime and the Restart Count are increased.
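
To make the growing back-off concrete, here is a small sketch of the delay sequence. It assumes the behavior described in the Kubernetes documentation (an exponential delay of 10s, 20s, 40s, and so on, capped at five minutes); the function name and default values are illustrative, and the exact timings may differ between Kubernetes versions and configurations.

```python
# Sketch: the back-off delay the kubelet waits before each restart attempt.
# Assumed defaults: 10 s initial delay, doubling each time, capped at 300 s.


def backoff_delays(restarts: int, initial: int = 10, cap: int = 300) -> list[int]:
    """Return the delay in seconds before each of the first `restarts` restarts."""
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)  # double the delay, but never exceed the cap
    return delays


if __name__ == "__main__":
    print(backoff_delays(7))
```

This is why a pod that keeps failing appears to restart less and less often the longer the problem persists.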

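What are the common causes of a CrashLoopBackOff error?

Common conditions:

Misconfiguration: for example, a typo in a configuration file.

A resource is not available: for example, a PersistentVolume that is not mounted.

Wrong command-line arguments: either missing or incorrect.

Bugs and exceptions: these are specific to your application and can be anything.

The container tried to bind a port that is already in use.
The memory limits are too low, so the container was killed for running out of memory (Out Of Memory).

Liveness probe errors: the probe keeps failing, so the Pod is not reported as ready and the container is restarted.

A read-only filesystem, or a lack of permissions in general.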

How to debug, troubleshoot and fix a CrashLoopBackOff state

Check the pod logs.

Check the events.

Check the deployment.

Check the pod description: kubectl describe pod

$ kubectl describe pod the-pod-name
Name:         the-pod-name
Namespace:    default
Priority:     0
…
State:          Waiting
  Reason:       CrashLoopBackOff
Last State:     Terminated
  Reason:       Error
…
Warning  BackOff                1m (x5 over 1m)   kubelet, ip-10-0-9-132.us-east-2.compute.internal  Back-off restarting failed container
…

The reason for the last termination was "Error".
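
The same fields can be read programmatically from the JSON form of a pod (for example, the output of `kubectl get pod the-pod-name -o json`). The sketch below assumes the standard `status.containerStatuses` layout of the Pod API; the `SAMPLE_POD` document, the container name and the exit code in it are made up for illustration.

```python
import json

# Hypothetical, trimmed-down pod document; a real one has many more fields.
SAMPLE_POD = """
{
  "status": {
    "containerStatuses": [
      {
        "name": "nginx",
        "restartCount": 2,
        "state": {"waiting": {"reason": "CrashLoopBackOff"}},
        "lastState": {"terminated": {"reason": "Error", "exitCode": 1}}
      }
    ]
  }
}
"""


def summarize_containers(pod_json: str) -> list[dict]:
    """Extract the current waiting reason and the last termination per container."""
    pod = json.loads(pod_json)
    summary = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting") or {}
        terminated = status.get("lastState", {}).get("terminated") or {}
        summary.append(
            {
                "container": status.get("name"),
                "restarts": status.get("restartCount", 0),
                "waiting_reason": waiting.get("reason"),
                "last_termination_reason": terminated.get("reason"),
                "last_exit_code": terminated.get("exitCode"),
            }
        )
    return summary


if __name__ == "__main__":
    for entry in summarize_containers(SAMPLE_POD):
        print(entry)
```

A container with a waiting reason of CrashLoopBackOff and a last termination reason of Error matches the describe output shown earlier.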

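The kubectl describe pod output also lists the events:

…

…
Warning  BackOff                1m (x5 over 1m)   kubelet, ip-10-0-9-132.us-east-2.compute.internal  Back-off restarting failed container
…

The message "Back-off restarting failed container" means that the container keeps failing and is being restarted after a back-off delay.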

Check the logs: kubectl logs

kubectl logs mypod

kubectl logs mypod -c mycontainer
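
When a container is crash-looping, the current instance may not have written anything yet, so the logs of the previous instance are often more useful; `kubectl logs` has a `--previous` flag for that. The sketch below builds these commands from Python. The helper names `logs_command` and `fetch_logs` are hypothetical, and `fetch_logs` assumes `kubectl` is installed and configured for the cluster.

```python
import subprocess
from typing import Optional


def logs_command(pod: str, container: Optional[str] = None, previous: bool = False) -> list[str]:
    """Build the kubectl logs command line for a pod, optionally for one container."""
    cmd = ["kubectl", "logs", pod]
    if container:
        cmd += ["-c", container]  # select a container in a multi-container pod
    if previous:
        cmd.append("--previous")  # logs of the previous (crashed) container instance
    return cmd


def fetch_logs(pod: str, container: Optional[str] = None, previous: bool = False) -> str:
    """Run kubectl and return the log text; raises CalledProcessError on failure."""
    result = subprocess.run(
        logs_command(pod, container, previous),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call fetch_logs() against a real cluster.
    print(" ".join(logs_command("mypod", "mycontainer", previous=True)))
```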

Check the events: kubectl get events

kubectl get events

kubectl get events --field-selector involvedObject.name=mypod
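
The event list can also be filtered after the fact. This is a rough sketch that works on the JSON form of the events (for example, `kubectl get events -o json`) and keeps only the BackOff events of one pod. It assumes the standard Event fields `involvedObject.name`, `reason`, `message` and `count`; the `SAMPLE_EVENTS` data is invented for illustration.

```python
import json

# Hypothetical, trimmed-down event list for illustration only.
SAMPLE_EVENTS = """
{
  "items": [
    {"involvedObject": {"kind": "Pod", "name": "mypod"},
     "type": "Warning", "reason": "BackOff", "count": 5,
     "message": "Back-off restarting failed container"},
    {"involvedObject": {"kind": "Pod", "name": "mypod"},
     "type": "Normal", "reason": "Pulled", "count": 3,
     "message": "Container image already present on machine"},
    {"involvedObject": {"kind": "Pod", "name": "otherpod"},
     "type": "Warning", "reason": "BackOff", "count": 2,
     "message": "Back-off restarting failed container"}
  ]
}
"""


def backoff_events(events_json: str, pod_name: str) -> list[tuple[int, str]]:
    """Return (count, message) for every BackOff event that involves the given pod."""
    events = json.loads(events_json).get("items", [])
    matches = []
    for event in events:
        involved = event.get("involvedObject", {})
        if involved.get("name") == pod_name and event.get("reason") == "BackOff":
            matches.append((event.get("count", 1), event.get("message", "")))
    return matches


if __name__ == "__main__":
    for count, message in backoff_events(SAMPLE_EVENTS, "mypod"):
        print(f"x{count}: {message}")
```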

The same events also appear in the kubectl describe pod output.

Check the deployment: kubectl describe deployment

kubectl describe deployment mydeployment

Putting it all together

Debugging a CrashLoopBackOff. It shows three terminals with the relationship between several debug commands.

How to detect CrashLoopBackOff in Prometheus

Prometheus for cloud monitoring.

Pod status from kube-state-metrics

kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1

PromQL example of CrashLoopBackOff detection based on pod status waiting.

rate(kube_pod_container_status_restarts_total[5m]) > 0

PromQL example of CrashLoopBackOff detection based on restart rate
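
Either query can be run from a script through the Prometheus HTTP API (`/api/v1/query`). The sketch below uses only the Python standard library. The helper names and the `SAMPLE_RESPONSE` payload are illustrative assumptions, and the Prometheus base URL has to be supplied by the caller.

```python
import json
import urllib.parse
import urllib.request

QUERY = 'kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"} == 1'

# Hypothetical response in the shape returned by the instant-query endpoint.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"namespace": "default", "pod": "nginx-5796d5bc7c-2jdr5", "container": "nginx"},
                "value": [1700000000, "1"],
            }
        ],
    },
}


def query_url(base_url: str, query: str) -> str:
    """Build the instant-query URL for a Prometheus server."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": query})


def crashlooping_containers(response: dict) -> list[tuple[str, str, str]]:
    """Return sorted (namespace, pod, container) tuples from a query response."""
    if response.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {response}")
    found = []
    for sample in response["data"]["result"]:
        labels = sample["metric"]
        found.append((labels.get("namespace", ""), labels.get("pod", ""), labels.get("container", "")))
    return sorted(found)


def run_query(base_url: str, query: str = QUERY) -> dict:
    """Send the query to a live Prometheus server and decode the JSON reply."""
    with urllib.request.urlopen(query_url(base_url, query), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(crashlooping_containers(SAMPLE_RESPONSE))
```

Against a real server, `crashlooping_containers(run_query(base_url))` would list the affected containers.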

Note:

Correlation between restarts and CrashLoopBackOff. Not all restarts are caused by a CrashLoopBackOff.

A restart should occur at the end of each CrashLoopBackOff cycle (1), but restarts that are unrelated to CrashLoopBackOff can also occur (2).

- alert: CrashLoopBackOffAlert
  expr: rate(kube_pod_container_status_restarts_total[5m]) > 0
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: Pod is in CrashLoopBackOff state
    description: Pod {{ $labels.pod }} in {{ $labels.namespace }} has a container {{ $labels.container }} which is causing a CrashLoopBackOff

Conclusion

Speed up CrashLoopBackOff debugging with Sysdig Monitor.

How to debug a CrashLoopBackOff with Sysdig Monitor Advisor