Learning Kubernetes – Part 5: Health Checks

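Table of Contents

    1. Review of the Previous Part
    2. What Is the Health Check Feature?
    3. Types of Health Checks
    4. Handlers
    5. Hands-on
    6. Terminology
    7. Afterword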

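Review of the Previous Part

Part 4 of Learning Kubernetes covered manifest files and controllers. I tried to describe some characteristics of the YAML format and my understanding of the controller types. I also got some troubleshooting experience along the way, which felt good.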

What Is the Health Check Feature?

[Figure: 03.PNG]

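Types of Health Checks

This part looks at two types of probe that the kubelet uses for health checks.

Liveness Probe

Checks whether the container is still working, for example whether the application has stopped responding. When the probe fails, the kubelet kills the container and restarts it.
If the manifest does not define a liveness probe, no check is performed and the container is never killed for being unhealthy.

Readiness Probe

Checks whether the application in the container is ready to accept requests. For example, if the application is too heavily loaded during initial startup to answer requests, or does not respond at all, requests are not forwarded to it.

If the manifest does not define a readiness probe, no check is performed and requests continue to be forwarded.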
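
The difference between the two probes comes down to what happens on failure. The sketch below is a toy decision table in Python, not kubelet code; the function name `reaction` and the returned strings are made up for illustration.

```python
from typing import Optional


def reaction(kind: str, passed: Optional[bool]) -> str:
    """Toy decision table for probe results (illustrative names only).

    kind   -- "liveness" or "readiness"
    passed -- True/False for a probe result, None if the manifest defines no probe
    """
    if kind not in ("liveness", "readiness"):
        raise ValueError(f"unknown probe kind: {kind}")
    # A missing probe is never evaluated, so it behaves like a passing one.
    if passed is None or passed:
        return "keep running" if kind == "liveness" else "forward requests"
    # Failure: liveness leads to a restart, readiness only stops traffic.
    return "restart container" if kind == "liveness" else "stop forwarding requests"


print(reaction("liveness", False))
print(reaction("readiness", False))
print(reaction("readiness", None))
```

In a real cluster the restart after a liveness failure is also subject to the Pod's restartPolicy, which this sketch leaves out.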

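Handlers

A traditional load balancer typically health-checks its backends over HTTP. In the same way, the containers in a Pod need to implement a handler that corresponds to each probe.
There are three types of handler.

    • exec
    • tcpSocket
    • httpGet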

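exec

Runs a command inside the container. The diagnosis succeeds when the exit code is 0.

livenessProbe:
 exec:
  command:
  - cat
  - /tmp/healthy
 initialDelaySeconds: 3 # grace period between container start and the first probe (seconds)
 periodSeconds: 5 # interval between checks (seconds)
 timeoutSeconds: 2 # probe timeout (seconds); default is 1
 successThreshold: 1 # minimum consecutive successes for the probe to count as successful; default is 1, and a liveness probe only accepts 1
 failureThreshold: 5 # minimum consecutive failures for the probe to count as failed; default is 3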
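
To make the "exit code 0 means success" rule concrete, here is a minimal Python sketch of an exec-style check. It runs the command on the local machine rather than inside a container, and the function name `exec_probe` and its default timeout are assumptions for illustration, not part of Kubernetes.

```python
import subprocess
import sys
from typing import Sequence


def exec_probe(command: Sequence[str], timeout_seconds: float = 1.0) -> bool:
    """Return True only if the command exits with code 0 within the timeout."""
    try:
        completed = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all.
        return False
    return completed.returncode == 0


# Stand-ins for "cat /tmp/healthy": one command that exits 0, one that exits 1.
print(exec_probe([sys.executable, "-c", "raise SystemExit(0)"], timeout_seconds=10))
print(exec_probe([sys.executable, "-c", "raise SystemExit(1)"], timeout_seconds=10))
```

With `cat /tmp/healthy` the same rule applies: `cat` exits with 0 while the file exists and with a non-zero code once it is removed.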

tcpSocket

The diagnosis succeeds if a TCP connection can be established on the specified port.

readinessProbe: 
 tcpSocket: 
  port: 80 
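
As a rough illustration of what "a connection can be established" means, the following Python sketch attempts a TCP connect with a timeout. The name `tcp_probe` and the localhost demo are assumptions for this example; the kubelet performs the real check against the Pod's IP.

```python
import socket


def tcp_probe(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Refused, unreachable, or timed out: the probe fails.
        return False


# Demo against a throwaway listener on localhost (port chosen by the OS).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
print(tcp_probe("127.0.0.1", port))  # probe while the listener is up
listener.close()
print(tcp_probe("127.0.0.1", port))  # probe after the listener is closed
```

Note that the check only proves the port accepts connections; it says nothing about whether the application behind it answers correctly.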

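httpGet

Periodically sends an HTTP GET request to the specified path and port. The diagnosis succeeds if the HTTP status code is at least 200 and below 400.

readinessProbe:
 httpGet:
  path: /healthz
  port: 3000
 initialDelaySeconds: 3   # grace period between container start and the first probe (seconds)
 periodSeconds: 5         # interval between checks (seconds)
 timeoutSeconds: 2        # probe timeout (seconds); default is 1
 successThreshold: 2      # minimum consecutive successes for the probe to count as successful; default is 1
 failureThreshold: 5      # minimum consecutive failures for the probe to count as failed; default is 3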
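
The status-code rule and the two thresholds interact, so a small model may help. The Python sketch below is a simplified model, not the kubelet's implementation: `http_status_ok` and `ReadinessTracker` are hypothetical names, and the tracker only counts consecutive results, ignoring timing.

```python
from dataclasses import dataclass
from typing import Optional


def http_status_ok(status: int) -> bool:
    """Success rule from the text: at least 200 and below 400."""
    return 200 <= status < 400


@dataclass
class ReadinessTracker:
    """Counts consecutive probe results against the two thresholds."""
    success_threshold: int = 1
    failure_threshold: int = 3
    ready: bool = False              # assumed to start as not ready
    _last: Optional[bool] = None     # result of the previous probe
    _streak: int = 0                 # length of the current run of equal results

    def observe(self, status: int) -> bool:
        ok = http_status_ok(status)
        if ok == self._last:
            self._streak += 1
        else:
            self._last, self._streak = ok, 1
        if ok and self._streak >= self.success_threshold:
            self.ready = True
        elif not ok and self._streak >= self.failure_threshold:
            self.ready = False
        return self.ready


# Same thresholds as the manifest above: 2 successes to pass, 5 failures to fail.
tracker = ReadinessTracker(success_threshold=2, failure_threshold=5)
history = [tracker.observe(code) for code in [200, 200, 403, 403, 403, 403, 403]]
print(history)
```

With these settings a single 403 does not flip the state; only the fifth consecutive failure does, which is the point of setting failureThreshold above 1.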

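Note: probes are run by the kubelet on the node where the Pod is running, and their target is the containers in that Pod. If the node itself goes down, for example because of a hardware failure, the kubelet goes down with it, so probes are not a remedy for node failures. If you lose track of where and when a failure can occur, you end up with misconceptions such as "a node failure is fine, the kubelet will handle it." (That one is a reminder to myself.)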

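Hands-on

The hands-on part confirms the behavior of the livenessProbe and the readinessProbe.
1. Create and apply a manifest file.
2. Delete the index file to cause an error.

Create and Apply the Manifest (Liveness Probe)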

apiVersion: v1
kind: Pod
metadata:
  name: liveness-check
spec:
  containers:
  - image: nginx
    name: nginx
    livenessProbe:
      httpGet:
        port: 80
        path: /
      failureThreshold: 3
      periodSeconds: 3
Apply
D:\Repository\kubernetes\vagrant-kubernetes>kubectl apply -f ./LivenessProbe_test.yml
pod/liveness-check created

Check
D:\Repository\kubernetes\vagrant-kubernetes>kubectl get po
NAME             READY   STATUS    RESTARTS   AGE
liveness-check   1/1     Running   0          71s

Detailed check
D:\Repository\kubernetes\vagrant-kubernetes>kubectl describe po liveness-check
Name:         liveness-check
Namespace:    default
Priority:     0
Node:         node2/172.16.20.13
Start Time:   Mon, 08 Feb 2021 12:55:53 +0900
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           10.244.2.27
IPs:          <none>
Containers:
  nginx:
    Container ID:   docker://fbecc20c37432146df765f26d7aa6c0428483ad7ec4c708d51c1027ef477f650
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:10b8cc432d56da8b61b070f4c7d2543a9ed17c2b23010b43af434fd40e2ca4aa
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 08 Feb 2021 12:55:58 +0900
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:80/ delay=0s timeout=1s period=3s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4vlhd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-4vlhd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4vlhd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  2m25s  default-scheduler  Successfully assigned default/liveness-check to node2
  Normal  Pulling    2m23s  kubelet            Pulling image "nginx"
  Normal  Pulled     2m19s  kubelet            Successfully pulled image "nginx"
  Normal  Created    2m19s  kubelet            Created container nginx
  Normal  Started    2m19s  kubelet            Started container nginx

The part to check is this line:

    Liveness:       http-get http://:80/ delay=0s timeout=1s period=3s #success=1 #failure=3

Delete the Index File to Cause an Error

Delete
D:\Repository\kubernetes\vagrant-kubernetes>kubectl exec liveness-check -- rm /usr/share/nginx/html/index.html

Detailed check
D:\Repository\kubernetes\vagrant-kubernetes> kubectl describe po liveness-check
Name:         liveness-check
Namespace:    default
Priority:     0
Node:         node2/172.16.20.13
Start Time:   Mon, 08 Feb 2021 12:55:53 +0900
Labels:       <none>
Annotations:  <none>
Status:       Running
IP:           10.244.2.27
IPs:          <none>
Containers:
  nginx:
    Container ID:   docker://d8de55c8ab78df6978f135551c8871aef517d42ea4b8f0e7c9015553e260e997
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:10b8cc432d56da8b61b070f4c7d2543a9ed17c2b23010b43af434fd40e2ca4aa
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 08 Feb 2021 13:02:28 +0900
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 08 Feb 2021 12:55:58 +0900
      Finished:     Mon, 08 Feb 2021 13:02:24 +0900
    Ready:          True
    Restart Count:  1
    Liveness:       http-get http://:80/ delay=0s timeout=1s period=3s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4vlhd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-4vlhd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4vlhd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  6m46s                default-scheduler  Successfully assigned default/liveness-check to node2
  Normal   Pulling    14s (x2 over 6m44s)  kubelet            Pulling image "nginx"
  Warning  Unhealthy  14s (x3 over 20s)    kubelet            Liveness probe failed: HTTP probe failed with statuscode: 403
  Normal   Killing    14s                  kubelet            Container nginx failed liveness probe, will be restarted
  Normal   Pulled     10s (x2 over 6m40s)  kubelet            Successfully pulled image "nginx"
  Normal   Created    10s (x2 over 6m40s)  kubelet            Created container nginx
  Normal   Started    10s (x2 over 6m40s)  kubelet            Started container nginx

It worked: the failure was detected correctly and the container was restarted.

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  6m46s                default-scheduler  Successfully assigned default/liveness-check to node2
  Normal   Pulling    14s (x2 over 6m44s)  kubelet            Pulling image "nginx"
  Warning  Unhealthy  14s (x3 over 20s)    kubelet            Liveness probe failed: HTTP probe failed with statuscode: 403
  Normal   Killing    14s                  kubelet            Container nginx failed liveness probe, will be restarted
  Normal   Pulled     10s (x2 over 6m40s)  kubelet            Successfully pulled image "nginx"
  Normal   Created    10s (x2 over 6m40s)  kubelet            Created container nginx
  Normal   Started    10s (x2 over 6m40s)  kubelet            Started container nginx
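
The timing in the Unhealthy event can be checked against the probe settings. The event reads "14s (x3 over 20s)": three failures, the first seen 20 seconds ago and the last 14 seconds ago, so 6 seconds apart. The sketch below is simple arithmetic under the assumption that probes fire exactly every periodSeconds, ignoring timeouts and scheduling jitter; the function name is made up for this example.

```python
def failure_span_seconds(failure_count: int, period_seconds: int) -> int:
    """Time between the first and the last of N consecutive failed probes."""
    if failure_count < 1:
        raise ValueError("failure_count must be at least 1")
    # N probes are separated by N - 1 intervals.
    return (failure_count - 1) * period_seconds


# failureThreshold=3 and periodSeconds=3 from the liveness manifest.
expected = failure_span_seconds(3, 3)
observed = 20 - 14  # first seen 20s ago, last seen 14s ago
print(expected, observed)
```

The two values agree, which matches the container being killed right after the third failed probe.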

Next is the readinessProbe.

Create and Apply the Manifests (Readiness Probe)

apiVersion: v1
kind: Pod
metadata:
  name: readiness-check
  labels:
    app: nginx
spec:
  containers:
  - image: nginx
    name: nginx
    readinessProbe:
      httpGet:
        port: 80
        path: /
      failureThreshold: 1
      periodSeconds: 1
---
# Service manifest (applied from its own file, ReadinessProbe_svc_test.yml)
apiVersion: v1
kind: Service
metadata:
  name: readiness-check-svc
spec:
  selector:
    app: nginx
  ports:
  - port: 80
Create the Pod
D:\Repository\kubernetes\vagrant-kubernetes>kubectl apply -f ./ReadinessProbe_test.yml
pod/readiness-check created

Create the Service
D:\Repository\kubernetes\vagrant-kubernetes>kubectl apply -f ./ReadinessProbe_svc_test.yml
service/readiness-check-svc created

Check
D:\Repository\kubernetes\vagrant-kubernetes>kubectl get po
NAME              READY   STATUS    RESTARTS   AGE
readiness-check   1/1     Running   0          3m40s

Details
D:\Repository\kubernetes\vagrant-kubernetes>kubectl describe po readiness-check
Name:         readiness-check
Namespace:    default
Priority:     0
Node:         node2/172.16.20.13
Start Time:   Mon, 08 Feb 2021 13:11:08 +0900
Labels:       app=nginx
Annotations:  <none>
Status:       Running
IP:           10.244.2.28
IPs:          <none>
Containers:
  nginx:
    Container ID:   docker://e780be7f11f1b628971ab205236fb73a9227e21b9812079898d446b05b0d8fe5
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:10b8cc432d56da8b61b070f4c7d2543a9ed17c2b23010b43af434fd40e2ca4aa
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 08 Feb 2021 13:11:13 +0900
    Ready:          True
    Restart Count:  0
    Readiness:      http-get http://:80/ delay=0s timeout=1s period=1s #success=1 #failure=1
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4vlhd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-4vlhd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4vlhd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  3m55s  default-scheduler  Successfully assigned default/readiness-check to node2
  Normal  Pulling    3m54s  kubelet            Pulling image "nginx"
  Normal  Pulled     3m50s  kubelet            Successfully pulled image "nginx"
  Normal  Created    3m50s  kubelet            Created container nginx
  Normal  Started    3m50s  kubelet            Started container nginx

Check the Service
D:\Repository\kubernetes\vagrant-kubernetes>kubectl get svc
NAME                  TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes            ClusterIP   10.32.0.1    <none>        443/TCP   6d17h
readiness-check-svc   ClusterIP   10.32.0.6    <none>        80/TCP    4m30s

Details
D:\Repository\kubernetes\vagrant-kubernetes>kubectl describe svc readiness-check-svc
Name:              readiness-check-svc
Namespace:         default
Labels:            <none>
Annotations:       <none>
Selector:          app=nginx
Type:              ClusterIP
IP Families:       <none>
IP:                10.32.0.6
IPs:               <none>
Port:              <unset>  80/TCP
TargetPort:        80/TCP
Endpoints:         10.244.2.28:80
Session Affinity:  None
Events:            <none>
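
The Endpoints line above lists the readiness-check Pod because its labels match the Service selector and the Pod is Ready. A rough Python model of that selection follows; the dictionary layout and the name `select_endpoints` are simplifications invented for this sketch, not the Kubernetes API objects.

```python
from typing import Dict, List


def select_endpoints(pods: List[dict], selector: Dict[str, str], port: int) -> List[str]:
    """Return "ip:port" for every Pod that matches the selector and is Ready."""
    endpoints = []
    for pod in pods:
        labels = pod.get("labels", {})
        matches = all(labels.get(key) == value for key, value in selector.items())
        if matches and pod.get("ready", False):
            endpoints.append(f"{pod['ip']}:{port}")
    return endpoints


# Values taken from the describe output in this article.
pods = [
    {"name": "readiness-check", "ip": "10.244.2.28", "labels": {"app": "nginx"}, "ready": True},
    {"name": "liveness-check", "ip": "10.244.2.27", "labels": {}, "ready": True},
]
print(select_endpoints(pods, {"app": "nginx"}, 80))
```

Setting `ready` to False for readiness-check empties the list, which is the situation a failing readiness probe produces.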

Access from another node
D:\Repository\kubernetes\vagrant-kubernetes>vagrant ssh node1
Welcome to Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-135-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Mon Feb  8 04:41:02 UTC 2021

  System load:     0.06              IP address for enp0s3:  10.0.2.15
  Usage of /:      6.3% of 38.71GB   IP address for enp0s8:  172.16.20.12
  Memory usage:    27%               IP address for enp0s9:  192.168.11.41
  Swap usage:      0%                IP address for docker0: 172.17.0.1
  Processes:       105               IP address for cni0:    10.244.1.1
  Users logged in: 0

 * Introducing self-healing high availability clusters in MicroK8s.
   Simple, hardened, Kubernetes for production, from RaspberryPi to DC.

     https://microk8s.io/high-availability

 * Canonical Livepatch is available for installation.
   - Reduce system reboots and improve kernel security. Activate at:
     https://ubuntu.com/livepatch

7 packages can be updated.
0 of these updates are security updates.
To see these additional updates run: apt list --upgradable

New release '20.04.2 LTS' available.
Run 'do-release-upgrade' to upgrade to it.


Last login: Mon Feb  8 04:36:38 2021 from 10.0.2.2
vagrant@node1:~$ wget -O - -T 1 10.32.0.6
--2021-02-08 04:41:06--  http://10.32.0.6/
Connecting to 10.32.0.6:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 612 [text/html]
Saving to: ‘STDOUT’

-                               0%[                                                  ]       0  --.-KB/s               <!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
-                             100%[=================================================>]     612  --.-KB/s    in 0s

2021-02-08 04:41:06 (113 MB/s) - written to stdout [612/612]

vagrant@node1:~$

Delete the Index File to Cause an Error

Back up (copy)
D:\Repository\kubernetes\vagrant-kubernetes>kubectl exec readiness-check -- cp -r /usr/share/nginx/html/index.html /usr/share/nginx/html/index_bk.html

Delete
D:\Repository\kubernetes\vagrant-kubernetes>kubectl exec readiness-check -- rm /usr/share/nginx/html/index.html

Check
D:\Repository\kubernetes\vagrant-kubernetes>kubectl describe po readiness-check
Name:         readiness-check
Namespace:    default
Priority:     0
Node:         node2/172.16.20.13
Start Time:   Mon, 08 Feb 2021 13:40:51 +0900
Labels:       app=nginx
Annotations:  <none>
Status:       Running
IP:           10.244.2.29
IPs:          <none>
Containers:
  nginx:
    Container ID:   docker://6162644eb97f23bfed4e6341e8d7752c736e9f21f214a0af86f24532c627533f
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:10b8cc432d56da8b61b070f4c7d2543a9ed17c2b23010b43af434fd40e2ca4aa
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 08 Feb 2021 13:40:56 +0900
    Ready:          False
    Restart Count:  0
    Readiness:      http-get http://:80/ delay=0s timeout=1s period=1s #success=1 #failure=1
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4vlhd (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-4vlhd:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4vlhd
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  4m3s               default-scheduler  Successfully assigned default/readiness-check to node2
  Normal   Pulling    4m2s               kubelet            Pulling image "nginx"
  Normal   Pulled     3m58s              kubelet            Successfully pulled image "nginx"
  Normal   Created    3m58s              kubelet            Created container nginx
  Normal   Started    3m58s              kubelet            Started container nginx
  Warning  Unhealthy  1s (x18 over 18s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 403

Connection check
vagrant@node1:~$ wget -O - -T 1 10.32.0.6
--2021-02-08 04:47:14--  http://10.32.0.6/
Connecting to 10.32.0.6:80... failed: Connection timed out.
Retrying.

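The Pod is no longer Ready, so requests through the Service are not forwarded to it and the connection fails.

Restore the index file
D:\Repository\kubernetes\vagrant-kubernetes>kubectl exec readiness-check -- cp -r /usr/share/nginx/html/index_bk.html /usr/share/nginx/html/index.html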

Try connecting from another node again.
vagrant@node1:~$ wget -O - -T 1 10.32.0.6
--2021-02-08 04:58:16--  http://10.32.0.6/
Connecting to 10.32.0.6:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 612 [text/html]
Saving to: ‘STDOUT’

-                               0%[                                                  ]       0  --.-KB/s               <!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
-                             100%[=================================================>]     612  --.-KB/s    in 0s

2021-02-08 04:58:16 (51.6 MB/s) - written to stdout [612/612]

vagrant@node1:~$

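This time I verified the behavior by confirming that the connection was actually cut off. In real operations I don't think checking by deliberately breaking connectivity is feasible; the check should be done through status information instead. After all, the system is live.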
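
One way to check through status rather than by breaking a connection is to read the Pod's Ready condition, the same value shown under Conditions in the describe output. The Python sketch below works on a Pod object already loaded as a dictionary; in practice the JSON could come from `kubectl get pod readiness-check -o json`. The helper name `pod_is_ready` and the trimmed sample document are assumptions for illustration.

```python
import json


def pod_is_ready(pod: dict) -> bool:
    """Return True if the Pod's "Ready" condition has status "True"."""
    for condition in pod.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            return condition.get("status") == "True"
    # No Ready condition reported: treat the Pod as not ready.
    return False


# Trimmed sample mirroring the Conditions seen while the readiness probe failed.
sample = json.loads("""
{
  "metadata": {"name": "readiness-check"},
  "status": {
    "conditions": [
      {"type": "Initialized", "status": "True"},
      {"type": "Ready", "status": "False"},
      {"type": "ContainersReady", "status": "False"},
      {"type": "PodScheduled", "status": "True"}
    ]
  }
}
""")
print(pod_is_ready(sample))
```

The READY column of `kubectl get po` carries the same information at a glance, so a script like this is mainly useful for automated checks.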

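Terminology

・Probe
To probe something means to investigate or examine it.

Afterword

I studied health checks, which are something you need to understand thoroughly. Because no external IP was configured, I had to log in to a node to run the access tests, which made verification feel a bit cumbersome. Next time, I may study Deployments in more detail.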

References

15 Steps to Learning Kubernetes, Starting from Docker (book)
Kubernetes official documentation
