Building Red Hat CodeReady Containers (OpenShift 4) with cluster monitoring enabled
Introduction
While learning OpenShift I did some digging, and this article captures the parts I thought were worth writing down.
"CodeReady Containers" disables the cluster monitoring stack (prometheus, alertmanager, grafana, kube-state-metrics, node-exporter) by default, but I enabled it by following the "Getting Started Guide".
I also try assigning persistent storage to it.
Environment
- macOS Big Sur v11.2.3
- CodeReady Containers v1.23.1 (OpenShift 4.7)
Procedure
I get confused every time, but the order is setup first, then config. (Remember that, self.)
During setup you are asked whether to send anonymous usage statistics; since my environment is resource-constrained and not very fast, I answered "N". (Sorry I can't contribute.)
Also, on the very first setup, images and helper executables are downloaded and installed partway through, and you are prompted for your macOS password.
% ./crc setup
CodeReady Containers is constantly improving and we would like to know more about usage (more details at https://developers.redhat.com/article/tool-data-collection)
Your preference can be changed manually if desired using 'crc config set consent-telemetry <yes/no>'
Would you like to contribute anonymous usage statistics? [y/N]: N
No worry, you can still enable telemetry manually with the command 'crc config set consent-telemetry yes'.
INFO Checking if running as non-root
INFO Checking if podman remote executable is cached
INFO Checking if admin-helper executable is cached
INFO Caching admin-helper executable
INFO Using root access: Changing ownership of /Users/haomei/.crc/bin/admin-helper-darwin
Password:
INFO Using root access: Setting suid for /Users/haomei/.crc/bin/admin-helper-darwin
INFO Checking minimum RAM requirements
INFO Checking if HyperKit is installed
INFO Setting up virtualization with HyperKit
INFO Using root access: Changing ownership of /Users/haomei/.crc/bin/hyperkit
INFO Using root access: Setting suid for /Users/haomei/.crc/bin/hyperkit
INFO Checking if crc-driver-hyperkit is installed
INFO Installing crc-machine-hyperkit
INFO Using root access: Changing ownership of /Users/haomei/.crc/bin/crc-driver-hyperkit
INFO Using root access: Setting suid for /Users/haomei/.crc/bin/crc-driver-hyperkit
INFO Checking file permissions for /etc/hosts
INFO Checking file permissions for /etc/resolver/testing
INFO Checking if CRC bundle is extracted in '$HOME/.crc'
INFO Checking if /Users/haomei/.crc/cache/crc_hyperkit_4.7.0.crcbundle exists
INFO Extracting bundle from the CRC executable
INFO Ensuring directory /Users/haomei/.crc/cache exists
INFO Extracting embedded bundle crc_hyperkit_4.7.0.crcbundle to /Users/haomei/.crc/cache
INFO Uncompressing crc_hyperkit_4.7.0.crcbundle
crc.qcow2: 9.97 GiB / 9.97 GiB [-------------------------------------------------------------------------------------------------------------------------------------------] 100.00%
Setup is complete, you can now run 'crc start' to start the OpenShift cluster
Next comes configuration. CRC defaults to 9 GiB (9216 MiB) of memory, but 14 GiB (14336 MiB) or more is recommended when cluster monitoring is enabled.
% ./crc config set memory 16384
Changes to configuration property 'memory' are only applied when the CRC instance is started.
If you already have a running CRC instance, then for this configuration change to take effect, stop the CRC instance with 'crc stop' and restart it with 'crc start'.
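As an aside, monitoring adds CPU load as well as memory pressure. The vCPU count can be raised the same way if the cluster feels sluggish; cpus is a real crc config key, but the value of 6 below is just an illustration for this machine, not a recommendation from the guide.
% ./crc config set cpus 6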
Things have become more convenient: cluster monitoring can now be enabled through the config command as well. That makes me very happy.
% ./crc config set enable-cluster-monitoring true
Successfully configured enable-cluster-monitoring to true
The list of configured values can be checked with the config view command.
% ./crc config view
- consent-telemetry : no
- enable-cluster-monitoring : true
- memory : 16384
All that remains is to start the cluster with the start command.
On the first start, specify your pull secret with the "-p" option.
Download your own "pull-secret.txt" from the Red Hat developer site beforehand and keep it handy.
% ./crc start -p ../pull-secret.txt
INFO Checking if running as non-root
INFO Checking if podman remote executable is cached
INFO Checking if admin-helper executable is cached
INFO Checking minimum RAM requirements
INFO Checking if HyperKit is installed
INFO Checking if crc-driver-hyperkit is installed
INFO Checking file permissions for /etc/hosts
INFO Checking file permissions for /etc/resolver/testing
INFO Loading bundle: crc_hyperkit_4.7.0.crcbundle ...
INFO Creating CodeReady Containers VM for OpenShift 4.7.0...
INFO CodeReady Containers VM is running
INFO Generating new SSH Key pair ...
INFO Updating authorized keys ...
INFO Copying kubeconfig file to instance dir ...
INFO Starting network time synchronization in CodeReady Containers VM
INFO Network restart not needed
INFO Check internal and public DNS query ...
INFO Check DNS query from host ...
INFO Adding user's pull secret to instance disk...
INFO Verifying validity of the kubelet certificates ...
INFO Starting OpenShift kubelet service
INFO Waiting for kube-apiserver availability... [takes around 2min]
INFO Adding user's pull secret to the cluster ...
INFO Updating cluster ID ...
INFO Enabling cluster monitoring operator...
INFO Starting OpenShift cluster ... [waiting for the cluster to stabilize]
INFO 6 operators are progressing: dns, image-registry, network, openshift-controller-manager, operator-lifecycle-manager-packageserver, service-ca
INFO 6 operators are progressing: dns, image-registry, network, openshift-controller-manager, operator-lifecycle-manager-packageserver, service-ca
INFO 6 operators are progressing: dns, image-registry, network, openshift-controller-manager, operator-lifecycle-manager-packageserver, service-ca
INFO 3 operators are progressing: image-registry, openshift-controller-manager, service-ca
INFO 4 operators are progressing: image-registry, monitoring, openshift-controller-manager, service-ca
INFO 2 operators are progressing: console, monitoring
INFO 2 operators are progressing: monitoring, openshift-controller-manager
INFO 2 operators are progressing: kube-apiserver, openshift-controller-manager
INFO 2 operators are progressing: kube-apiserver, openshift-controller-manager
INFO Operator kube-apiserver is progressing
INFO 2 operators are progressing: kube-apiserver, operator-lifecycle-manager-packageserver
INFO 3 operators are progressing: kube-apiserver, monitoring, operator-lifecycle-manager-packageserver
INFO 2 operators are progressing: kube-apiserver, monitoring
INFO All operators are available. Ensuring stability ...
INFO Operators are stable (2/3) ...
INFO Operators are stable (3/3) ...
INFO Updating kubeconfig
WARN The cluster might report a degraded or error state. This is expected since several operators have been disabled to lower the resource usage. For more information, please consult the documentation
Started the OpenShift cluster.
The server is accessible via web console at:
https://console-openshift-console.apps-crc.testing
Log in as administrator:
Username: kubeadmin
Password: T3sJD-jjueE-2BnHe-ftNBw
Log in as user:
Username: developer
Password: developer
Use the 'oc' command line interface:
$ eval $(crc oc-env)
$ oc login -u developer https://api.crc.testing:6443
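Incidentally, "-p" only needs to be passed on the first start, and crc can also store the pull secret path as a config property. Assuming your crc version supports the pull-secret-file key (it exists in recent 1.2x releases), something like this saves retyping the path:
% ./crc config set pull-secret-file ../pull-secret.txt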
Set up the shell environment so that the oc command can be used.
% eval $(./crc oc-env)
Log in as kubeadmin.
% oc login -u kubeadmin -p T3sJD-jjueE-2BnHe-ftNBw https://api.crc.testing:6443
Login successful.
You have access to 61 projects, the list has been suppressed. You can list all projects with 'oc projects'
Using project "default".
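By the way, if you lose the generated kubeadmin password, it can be printed again at any time; the console subcommand has a --credentials flag for exactly this:
% ./crc console --credentials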
OpenShift cluster monitoring is deployed in the "openshift-monitoring" project (namespace).
% oc get pods -n openshift-monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 5/5 Running 0 11m
alertmanager-main-1 5/5 Running 0 11m
alertmanager-main-2 5/5 Running 0 11m
cluster-monitoring-operator-686555c948-chw9v 2/2 Running 4 12m
grafana-6f4d96d7fd-kp2dt 2/2 Running 0 11m
kube-state-metrics-749954d685-sjslc 3/3 Running 0 11m
node-exporter-jfxw4 2/2 Running 0 11m
openshift-state-metrics-587d97bb47-tsnpd 3/3 Running 0 11m
prometheus-adapter-664dfbdf7b-frrnw 1/1 Running 0 10m
prometheus-adapter-664dfbdf7b-wkshz 1/1 Running 0 10m
prometheus-k8s-0 7/7 Running 1 11m
prometheus-k8s-1 7/7 Running 1 11m
prometheus-operator-658ccb589c-686cf 2/2 Running 1 11m
telemeter-client-599864d5f-h6xjp 3/3 Running 0 11m
thanos-querier-665b8bc578-8bj8q 5/5 Running 0 11m
thanos-querier-665b8bc578-cmfqx 5/5 Running 0 11m
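The health of the stack as a whole can also be read from its ClusterOperator resource; "monitoring" is the operator's standard name, and it should report AVAILABLE=True once the pods above settle:
% oc get clusteroperator monitoring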
Verification
To view metrics per node, run "oc adm top nodes".
% oc adm top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
crc-l6qvn-master-0 1397m 34% 10494Mi 65%
Run "oc adm top pods" to get metrics per pod.
% oc adm top pods -n openshift-monitoring
NAME CPU(cores) MEMORY(bytes)
alertmanager-main-0 2m 73Mi
alertmanager-main-1 1m 75Mi
alertmanager-main-2 2m 78Mi
cluster-monitoring-operator-686555c948-chw9v 0m 64Mi
grafana-6f4d96d7fd-kp2dt 2m 50Mi
kube-state-metrics-749954d685-sjslc 0m 72Mi
node-exporter-jfxw4 18m 26Mi
openshift-state-metrics-587d97bb47-tsnpd 0m 44Mi
prometheus-adapter-664dfbdf7b-frrnw 5m 38Mi
prometheus-adapter-664dfbdf7b-wkshz 2m 36Mi
prometheus-k8s-0 37m 828Mi
prometheus-k8s-1 41m 797Mi
prometheus-operator-658ccb589c-686cf 1m 135Mi
telemeter-client-599864d5f-h6xjp 0m 47Mi
thanos-querier-665b8bc578-8bj8q 4m 74Mi
thanos-querier-665b8bc578-cmfqx 2m 84Mi
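The collected metrics can also be queried over HTTP through the thanos-querier route deployed above. A minimal sketch, assuming the current oc session (kubeadmin here) is allowed to query; the Thanos querier exposes the standard Prometheus /api/v1/query endpoint:
% HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
% curl -sk -H "Authorization: Bearer $(oc whoami -t)" "https://${HOST}/api/v1/query?query=up"
Every healthy scrape target should come back with a value of 1.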
Check it in the web console too. Launch the web console with the crc command's console subcommand.
% ./crc console
Opening the OpenShift Web Console in the default browser...
Assigning persistent storage
In earlier versions I remember there was a dynamic provisioner serving local PVs by default, but at some point it disappeared. After some digging I found a workaround in the following wiki:
https://github.com/code-ready/crc/wiki/Dynamic-volume-provisioning
I will deploy the "local-path-provisioner" part of it.
A straight copy-and-paste works.
% oc new-project local-path-storage
% oc create serviceaccount local-path-provisioner-service-account -n local-path-storage
% oc adm policy add-scc-to-user hostaccess -z local-path-provisioner-service-account -n local-path-storage
% cat <<EOF | oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: local-path-provisioner-role
rules:
- apiGroups: [""]
  resources: ["nodes", "persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["endpoints", "persistentvolumes", "pods"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: local-path-provisioner-bind
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: local-path-provisioner-role
subjects:
- kind: ServiceAccount
  name: local-path-provisioner-service-account
  namespace: local-path-storage
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-path-provisioner
  namespace: local-path-storage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: local-path-provisioner
  template:
    metadata:
      labels:
        app: local-path-provisioner
    spec:
      serviceAccountName: local-path-provisioner-service-account
      containers:
      - name: local-path-provisioner
        image: rancher/local-path-provisioner:v0.0.12
        imagePullPolicy: IfNotPresent
        command:
        - local-path-provisioner
        - --debug
        - start
        - --config
        - /etc/config/config.json
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config/
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: config-volume
        configMap:
          name: local-path-config
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: local-path-config
  namespace: local-path-storage
data:
  config.json: |-
    {
      "nodePathMap":[
        {
          "node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",
          "paths":["/mnt/pv-data"]
        }
      ]
    }
EOF
With this, a StorageClass named "local-path" is available.
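Before touching monitoring, the provisioner can be sanity-checked with a throwaway claim; test-pvc below is a hypothetical name. Because the StorageClass uses volumeBindingMode: WaitForFirstConsumer, the claim stays Pending until a pod actually mounts it, which is expected:
% cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: local-path-storage
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi
EOF
% oc get pvc test-pvc -n local-path-storage
% oc delete pvc test-pvc -n local-path-storage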
Next, set up storage in the cluster monitoring ConfigMap.
The ConfigMap does not exist out of the box, so create it first. Its name is "cluster-monitoring-config".
% oc create configmap cluster-monitoring-config -n openshift-monitoring
configmap/cluster-monitoring-config created
Once it is created, edit it.
% oc edit configmap cluster-monitoring-config -n openshift-monitoring
In the vi editor that opens, add the "data:" line and the settings that follow it, as in the YAML below.
Prometheus's data retention period is set with "retention: 24h"; adjust it to match your storage capacity.
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: "2021-03-15T06:31:08Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "53959"
  selfLink: /api/v1/namespaces/openshift-monitoring/configmaps/cluster-monitoring-config
  uid: 755425ba-c5b7-48bd-a29f-d375cd29a694
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: local-path
          resources:
            requests:
              storage: 40Gi
      retention: 24h
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: local-path
          resources:
            requests:
              storage: 40Gi
Saving with ":wq" prints a message confirming the edit.
configmap/cluster-monitoring-config edited
When the ConfigMap is created or edited, prometheus and alertmanager are restarted according to its contents.
% oc get pod -n openshift-monitoring -w
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 5/5 Running 0 31m
alertmanager-main-1 5/5 Running 0 31m
alertmanager-main-2 5/5 Running 0 31m
cluster-monitoring-operator-686555c948-g7frt 2/2 Running 4 32m
grafana-6f4d96d7fd-6bz9x 2/2 Running 0 32m
kube-state-metrics-749954d685-2rj98 3/3 Running 0 32m
node-exporter-qf9kt 2/2 Running 0 32m
openshift-state-metrics-587d97bb47-tv9qm 3/3 Running 0 32m
prometheus-adapter-78f4dff485-b97lk 1/1 Running 0 30m
prometheus-adapter-78f4dff485-ctjr2 1/1 Running 0 30m
prometheus-k8s-0 7/7 Running 1 31m
prometheus-k8s-1 7/7 Running 1 31m
prometheus-operator-658ccb589c-zkhjs 2/2 Running 1 32m
telemeter-client-5c9f466b48-2qfw5 3/3 Running 0 32m
thanos-querier-74f6ff8cd6-bmkzg 5/5 Running 0 32m
thanos-querier-74f6ff8cd6-djm75 5/5 Running 0 32m
alertmanager-main-0 5/5 Terminating 0 30m ← alertmanager shutdown starts
alertmanager-main-1 5/5 Terminating 0 30m
:
prometheus-k8s-0 7/7 Terminating 1 30m ← prometheus shutdown starts
prometheus-k8s-1 7/7 Terminating 1 30m
:
alertmanager-main-0 5/5 Running 0 8s ← alertmanager restart complete
alertmanager-main-1 5/5 Running 0 7s
alertmanager-main-2 5/5 Running 0 7s
:
prometheus-k8s-1 7/7 Running 1 9s ← prometheus restart complete
prometheus-k8s-0 7/7 Running 1 12s
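To confirm the retention setting actually reached Prometheus, you can inspect the Prometheus custom resource managed by the cluster-monitoring-operator; "k8s" is its standard name in this namespace, though treat this as a sketch for this OpenShift version. The command below should print 24h:
% oc get prometheus k8s -n openshift-monitoring -o jsonpath='{.spec.retention}'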
Let me verify that the PVCs and PVs were assigned correctly.
% oc get pvc -n openshift-monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
alertmanager-main-db-alertmanager-main-0 Bound pvc-96694b93-ec7a-4c10-b80c-6880f3104d70 40Gi RWO local-path 17m
alertmanager-main-db-alertmanager-main-1 Bound pvc-130939d0-5a1d-4eac-8a9d-c0b3dfb5a1b1 40Gi RWO local-path 17m
alertmanager-main-db-alertmanager-main-2 Bound pvc-c27a7084-ef29-4429-8aee-1f3bd8d843f8 40Gi RWO local-path 17m
prometheus-k8s-db-prometheus-k8s-0 Bound pvc-250c335f-ec8a-43e1-8f21-af5ac8635016 40Gi RWO local-path 3m30s
prometheus-k8s-db-prometheus-k8s-1 Bound pvc-df236008-9d14-4acc-b1f4-7fd9ab9feeee 40Gi RWO local-path 3m29s
% oc get pv | grep openshift-monitoring
pvc-130939d0-5a1d-4eac-8a9d-c0b3dfb5a1b1 40Gi RWO Delete Bound openshift-monitoring/alertmanager-main-db-alertmanager-main-1 local-path 17m
pvc-250c335f-ec8a-43e1-8f21-af5ac8635016 40Gi RWO Delete Bound openshift-monitoring/prometheus-k8s-db-prometheus-k8s-0 local-path 3m57s
pvc-96694b93-ec7a-4c10-b80c-6880f3104d70 40Gi RWO Delete Bound openshift-monitoring/alertmanager-main-db-alertmanager-main-0 local-path 17m
pvc-c27a7084-ef29-4429-8aee-1f3bd8d843f8 40Gi RWO Delete Bound openshift-monitoring/alertmanager-main-db-alertmanager-main-2 local-path 17m
pvc-df236008-9d14-4acc-b1f4-7fd9ab9feeee 40Gi RWO Delete Bound openshift-monitoring/prometheus-k8s-db-prometheus-k8s-1 local-path 3m57s
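Since local-path volumes are plain directories on the node, you can also peek at where the data physically lives. A sketch using the node name from the earlier top output and the /mnt/pv-data path configured in the provisioner's ConfigMap; each bound volume should show up as a pvc-... directory:
% oc debug node/crc-l6qvn-master-0 -- chroot /host ls /mnt/pv-data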
Finally
The documents I referenced are listed below.
■ Red Hat CodeReady Containers 1.23 product documentation
https://access.redhat.com/documentation/zh-cn/red_hat_codeready_containers/1.23/html-single/getting_started_guide/index
■ "Getting Started Guide" on code-ready.github.io
https://code-ready.github.io/crc/
■ Red Hat OpenShift 4.7 product documentation – Configuring persistent storage for cluster monitoring
https://access.redhat.com/documentation/ja-jp/openshift_container_platform/4.7/html-single/monitoring/index#configuring-persistent-storage
■ "Dynamic volume provisioning" page of the CRC wiki, which I referenced this time:
https://github.com/code-ready/crc/wiki