Building Red Hat CodeReady Containers (OpenShift 4) with cluster monitoring enabled
Introduction
While learning OpenShift I did some digging, and this article captures the parts I thought were worth writing down.
"CodeReady Containers" disables the cluster monitoring stack (prometheus, alertmanager, grafana, kube-state-metrics, node-exporter) by default, but I enabled it by following the "Getting Started Guide".
I also try assigning persistent storage to it.
Environment
- macOS Big Sur v11.2.3
- CodeReady Containers v1.23.1 (OpenShift 4.7)
Procedure
I get confused every time, but the order is setup first, then config. (Remember that, self.)
During setup you are asked whether to send anonymous usage statistics; since my environment is resource-constrained and not very fast, I answered "N". (Sorry I can't contribute.)
Also, on the very first setup, images and helper executables are downloaded and installed partway through, and you are prompted for your macOS password.
% ./crc setup
CodeReady Containers is constantly improving and we would like to know more about usage (more details at https://developers.redhat.com/article/tool-data-collection)
Your preference can be changed manually if desired using 'crc config set consent-telemetry <yes/no>'
Would you like to contribute anonymous usage statistics? [y/N]: N
No worry, you can still enable telemetry manually with the command 'crc config set consent-telemetry yes'.
INFO Checking if running as non-root
INFO Checking if podman remote executable is cached
INFO Checking if admin-helper executable is cached
INFO Caching admin-helper executable
INFO Using root access: Changing ownership of /Users/haomei/.crc/bin/admin-helper-darwin
Password:
INFO Using root access: Setting suid for /Users/haomei/.crc/bin/admin-helper-darwin
INFO Checking minimum RAM requirements
INFO Checking if HyperKit is installed
INFO Setting up virtualization with HyperKit
INFO Using root access: Changing ownership of /Users/haomei/.crc/bin/hyperkit
INFO Using root access: Setting suid for /Users/haomei/.crc/bin/hyperkit
INFO Checking if crc-driver-hyperkit is installed
INFO Installing crc-machine-hyperkit
INFO Using root access: Changing ownership of /Users/haomei/.crc/bin/crc-driver-hyperkit
INFO Using root access: Setting suid for /Users/haomei/.crc/bin/crc-driver-hyperkit
INFO Checking file permissions for /etc/hosts
INFO Checking file permissions for /etc/resolver/testing
INFO Checking if CRC bundle is extracted in '$HOME/.crc'
INFO Checking if /Users/haomei/.crc/cache/crc_hyperkit_4.7.0.crcbundle exists
INFO Extracting bundle from the CRC executable
INFO Ensuring directory /Users/haomei/.crc/cache exists
INFO Extracting embedded bundle crc_hyperkit_4.7.0.crcbundle to /Users/haomei/.crc/cache
INFO Uncompressing crc_hyperkit_4.7.0.crcbundle
crc.qcow2: 9.97 GiB / 9.97 GiB [-------------------------------------------------------------------------------------------------------------------------------------------] 100.00%
Setup is complete, you can now run 'crc start' to start the OpenShift cluster
Next comes configuration. CRC defaults to 9 GiB (9216 MiB) of memory, but 14 GiB (14336 MiB) or more is recommended when cluster monitoring is enabled.
% ./crc config set memory 16384
Changes to configuration property 'memory' are only applied when the CRC instance is started.
If you already have a running CRC instance, then for this configuration change to take effect, stop the CRC instance with 'crc stop' and restart it with 'crc start'.
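As an aside, monitoring adds CPU load as well as memory pressure. The vCPU count can be raised the same way if the cluster feels sluggish; cpus is a real crc config key, but the value of 6 below is just an illustration for this machine, not a recommendation from the guide.
% ./crc config set cpus 6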
Things have become more convenient: cluster monitoring can now be enabled through the config command as well. That makes me very happy.
% ./crc config set enable-cluster-monitoring true
Successfully configured enable-cluster-monitoring to true
The list of configured values can be checked with the config view command.
% ./crc config view
- consent-telemetry : no
- enable-cluster-monitoring : true
- memory : 16384
All that remains is to start the cluster with the start command.
On the first start, specify your pull secret with the "-p" option.
Download your own "pull-secret.txt" from the Red Hat developer site beforehand and keep it handy.
% ./crc start -p ../pull-secret.txt
INFO Checking if running as non-root
INFO Checking if podman remote executable is cached
INFO Checking if admin-helper executable is cached
INFO Checking minimum RAM requirements
INFO Checking if HyperKit is installed
INFO Checking if crc-driver-hyperkit is installed
INFO Checking file permissions for /etc/hosts
INFO Checking file permissions for /etc/resolver/testing
INFO Loading bundle: crc_hyperkit_4.7.0.crcbundle ...
INFO Creating CodeReady Containers VM for OpenShift 4.7.0...
INFO CodeReady Containers VM is running
INFO Generating new SSH Key pair ...
INFO Updating authorized keys ...
INFO Copying kubeconfig file to instance dir ...
INFO Starting network time synchronization in CodeReady Containers VM
INFO Network restart not needed
INFO Check internal and public DNS query ...
INFO Check DNS query from host ...
INFO Adding user's pull secret to instance disk...
INFO Verifying validity of the kubelet certificates ...
INFO Starting OpenShift kubelet service
INFO Waiting for kube-apiserver availability... [takes around 2min]
INFO Adding user's pull secret to the cluster ...
INFO Updating cluster ID ...
INFO Enabling cluster monitoring operator...
INFO Starting OpenShift cluster ... [waiting for the cluster to stabilize]
INFO 6 operators are progressing: dns, image-registry, network, openshift-controller-manager, operator-lifecycle-manager-packageserver, service-ca
INFO 6 operators are progressing: dns, image-registry, network, openshift-controller-manager, operator-lifecycle-manager-packageserver, service-ca
INFO 6 operators are progressing: dns, image-registry, network, openshift-controller-manager, operator-lifecycle-manager-packageserver, service-ca
INFO 3 operators are progressing: image-registry, openshift-controller-manager, service-ca
INFO 4 operators are progressing: image-registry, monitoring, openshift-controller-manager, service-ca
INFO 2 operators are progressing: console, monitoring
INFO 2 operators are progressing: monitoring, openshift-controller-manager
INFO 2 operators are progressing: kube-apiserver, openshift-controller-manager
INFO 2 operators are progressing: kube-apiserver, openshift-controller-manager
INFO Operator kube-apiserver is progressing
INFO 2 operators are progressing: kube-apiserver, operator-lifecycle-manager-packageserver
INFO 3 operators are progressing: kube-apiserver, monitoring, operator-lifecycle-manager-packageserver
INFO 2 operators are progressing: kube-apiserver, monitoring
INFO All operators are available. Ensuring stability ...
INFO Operators are stable (2/3) ...
INFO Operators are stable (3/3) ...
INFO Updating kubeconfig
WARN The cluster might report a degraded or error state. This is expected since several operators have been disabled to lower the resource usage. For more information, please consult the documentation
Started the OpenShift cluster.
The server is accessible via web console at:
https://console-openshift-console.apps-crc.testing
Log in as administrator:
Username: kubeadmin
Password: T3sJD-jjueE-2BnHe-ftNBw
Log in as user:
Username: developer
Password: developer
Use the 'oc' command line interface:
$ eval $(crc oc-env)
$ oc login -u developer https://api.crc.testing:6443
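Incidentally, "-p" only needs to be passed on the first start, and crc can also store the pull secret path as a config property. Assuming your crc version supports the pull-secret-file key (it exists in recent 1.2x releases), something like this saves retyping the path:
% ./crc config set pull-secret-file ../pull-secret.txt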
Set up the shell environment so that the oc command can be used.
% eval $(./crc oc-env)
Log in as kubeadmin.
% oc login -u kubeadmin -p T3sJD-jjueE-2BnHe-ftNBw https://api.crc.testing:6443
Login successful.
You have access to 61 projects, the list has been suppressed. You can list all projects with 'oc projects'
Using project "default".
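By the way, if you lose the generated kubeadmin password, it can be printed again at any time; the console subcommand has a --credentials flag for exactly this:
% ./crc console --credentials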
OpenShift cluster monitoring is deployed in the "openshift-monitoring" project (namespace).
% oc get pods -n openshift-monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 5/5 Running 0 11m
alertmanager-main-1 5/5 Running 0 11m
alertmanager-main-2 5/5 Running 0 11m
cluster-monitoring-operator-686555c948-chw9v 2/2 Running 4 12m
grafana-6f4d96d7fd-kp2dt 2/2 Running 0 11m
kube-state-metrics-749954d685-sjslc 3/3 Running 0 11m
node-exporter-jfxw4 2/2 Running 0 11m
openshift-state-metrics-587d97bb47-tsnpd 3/3 Running 0 11m
prometheus-adapter-664dfbdf7b-frrnw 1/1 Running 0 10m
prometheus-adapter-664dfbdf7b-wkshz 1/1 Running 0 10m
prometheus-k8s-0 7/7 Running 1 11m
prometheus-k8s-1 7/7 Running 1 11m
prometheus-operator-658ccb589c-686cf 2/2 Running 1 11m
telemeter-client-599864d5f-h6xjp 3/3 Running 0 11m
thanos-querier-665b8bc578-8bj8q 5/5 Running 0 11m
thanos-querier-665b8bc578-cmfqx 5/5 Running 0 11m
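The health of the stack as a whole can also be read from its ClusterOperator resource; "monitoring" is the operator's standard name, and it should report AVAILABLE=True once the pods above settle:
% oc get clusteroperator monitoring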
Verification
To view metrics per node, run "oc adm top nodes".
% oc adm top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
crc-l6qvn-master-0 1397m 34% 10494Mi 65%
Run "oc adm top pods" to get metrics per pod.
% oc adm top pods -n openshift-monitoring
NAME CPU(cores) MEMORY(bytes)
alertmanager-main-0 2m 73Mi
alertmanager-main-1 1m 75Mi
alertmanager-main-2 2m 78Mi
cluster-monitoring-operator-686555c948-chw9v 0m 64Mi
grafana-6f4d96d7fd-kp2dt 2m 50Mi
kube-state-metrics-749954d685-sjslc 0m 72Mi
node-exporter-jfxw4 18m 26Mi
openshift-state-metrics-587d97bb47-tsnpd 0m 44Mi
prometheus-adapter-664dfbdf7b-frrnw 5m 38Mi
prometheus-adapter-664dfbdf7b-wkshz 2m 36Mi
prometheus-k8s-0 37m 828Mi
prometheus-k8s-1 41m 797Mi
prometheus-operator-658ccb589c-686cf 1m 135Mi
telemeter-client-599864d5f-h6xjp 0m 47Mi
thanos-querier-665b8bc578-8bj8q 4m 74Mi
thanos-querier-665b8bc578-cmfqx 2m 84Mi
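The collected metrics can also be queried over HTTP through the thanos-querier route deployed above. A minimal sketch, assuming the current oc session (kubeadmin here) is allowed to query; the Thanos querier exposes the standard Prometheus /api/v1/query endpoint:
% HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
% curl -sk -H "Authorization: Bearer $(oc whoami -t)" "https://${HOST}/api/v1/query?query=up"
Every healthy scrape target should come back with a value of 1.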
Check it in the web console too. Launch the web console with the crc command's console subcommand.
% ./crc console
Opening the OpenShift Web Console in the default browser...
Assigning persistent storage
In earlier versions I remember there was a dynamic provisioner serving local PVs by default, but at some point it disappeared. After some digging I found a workaround in the following wiki:
https://github.com/code-ready/crc/wiki/Dynamic-volume-provisioning
I will deploy the "local-path-provisioner" part of it.
A straight copy-and-paste works.
% oc new-project local-path-storage
% oc create serviceaccount local-path-provisioner-service-account -n local-path-storage
% oc adm policy add-scc-to-user hostaccess -z local-path-provisioner-service-account -n local-path-storage
% cat <<EOF | oc apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: local-path-provisioner-role
rules:
- apiGroups: [""]
  resources: ["nodes", "persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["endpoints", "persistentvolumes", "pods"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: local-path-provisioner-bind
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: local-path-provisioner-role
subjects:
- kind: ServiceAccount
  name: local-path-provisioner-service-account
  namespace: local-path-storage
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-path-provisioner
  namespace: local-path-storage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: local-path-provisioner
  template:
    metadata:
      labels:
        app: local-path-provisioner
    spec:
      serviceAccountName: local-path-provisioner-service-account
      containers:
      - name: local-path-provisioner
        image: rancher/local-path-provisioner:v0.0.12
        imagePullPolicy: IfNotPresent
        command:
        - local-path-provisioner
        - --debug
        - start
        - --config
        - /etc/config/config.json
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config/
        env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      volumes:
      - name: config-volume
        configMap:
          name: local-path-config
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: local-path-config
  namespace: local-path-storage
data:
  config.json: |-
    {
      "nodePathMap":[
        {
          "node":"DEFAULT_PATH_FOR_NON_LISTED_NODES",
          "paths":["/mnt/pv-data"]
        }
      ]
    }
EOF
With this, a StorageClass named "local-path" is available.
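Before touching monitoring, the provisioner can be sanity-checked with a throwaway claim; test-pvc below is a hypothetical name. Because the StorageClass uses volumeBindingMode: WaitForFirstConsumer, the claim stays Pending until a pod actually mounts it, which is expected:
% cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
  namespace: local-path-storage
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: local-path
  resources:
    requests:
      storage: 1Gi
EOF
% oc get pvc test-pvc -n local-path-storage
% oc delete pvc test-pvc -n local-path-storage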
Next, set up storage in the cluster monitoring ConfigMap.
The ConfigMap does not exist out of the box, so create it first. Its name is "cluster-monitoring-config".
% oc create configmap cluster-monitoring-config -n openshift-monitoring
configmap/cluster-monitoring-config created
Once it is created, edit it.
% oc edit configmap cluster-monitoring-config -n openshift-monitoring
In the vi editor that opens, add the "data:" line and the settings that follow it, as in the YAML below.
Prometheus's data retention period is set with "retention: 24h"; adjust it to match your storage capacity.
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
kind: ConfigMap
metadata:
  creationTimestamp: "2021-03-15T06:31:08Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "53959"
  selfLink: /api/v1/namespaces/openshift-monitoring/configmaps/cluster-monitoring-config
  uid: 755425ba-c5b7-48bd-a29f-d375cd29a694
data:
  config.yaml: |
    prometheusK8s:
      volumeClaimTemplate:
        spec:
          storageClassName: local-path
          resources:
            requests:
              storage: 40Gi
      retention: 24h
    alertmanagerMain:
      volumeClaimTemplate:
        spec:
          storageClassName: local-path
          resources:
            requests:
              storage: 40Gi
Saving with ":wq" prints a message confirming the edit.
configmap/cluster-monitoring-config edited
When the ConfigMap is created or edited, prometheus and alertmanager are restarted according to its contents.
% oc get pod -n openshift-monitoring -w
NAME READY STATUS RESTARTS AGE
alertmanager-main-0 5/5 Running 0 31m
alertmanager-main-1 5/5 Running 0 31m
alertmanager-main-2 5/5 Running 0 31m
cluster-monitoring-operator-686555c948-g7frt 2/2 Running 4 32m
grafana-6f4d96d7fd-6bz9x 2/2 Running 0 32m
kube-state-metrics-749954d685-2rj98 3/3 Running 0 32m
node-exporter-qf9kt 2/2 Running 0 32m
openshift-state-metrics-587d97bb47-tv9qm 3/3 Running 0 32m
prometheus-adapter-78f4dff485-b97lk 1/1 Running 0 30m
prometheus-adapter-78f4dff485-ctjr2 1/1 Running 0 30m
prometheus-k8s-0 7/7 Running 1 31m
prometheus-k8s-1 7/7 Running 1 31m
prometheus-operator-658ccb589c-zkhjs 2/2 Running 1 32m
telemeter-client-5c9f466b48-2qfw5 3/3 Running 0 32m
thanos-querier-74f6ff8cd6-bmkzg 5/5 Running 0 32m
thanos-querier-74f6ff8cd6-djm75 5/5 Running 0 32m
alertmanager-main-0 5/5 Terminating 0 30m ← alertmanager shutdown starts
alertmanager-main-1 5/5 Terminating 0 30m
:
prometheus-k8s-0 7/7 Terminating 1 30m ← prometheus shutdown starts
prometheus-k8s-1 7/7 Terminating 1 30m
:
alertmanager-main-0 5/5 Running 0 8s ← alertmanager restart complete
alertmanager-main-1 5/5 Running 0 7s
alertmanager-main-2 5/5 Running 0 7s
:
prometheus-k8s-1 7/7 Running 1 9s ← prometheus restart complete
prometheus-k8s-0 7/7 Running 1 12s
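To confirm the retention setting actually reached Prometheus, you can inspect the Prometheus custom resource managed by the cluster-monitoring-operator; "k8s" is its standard name in this namespace, though treat this as a sketch for this OpenShift version. The command below should print 24h:
% oc get prometheus k8s -n openshift-monitoring -o jsonpath='{.spec.retention}'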
Let me verify that the PVCs and PVs were assigned correctly.
% oc get pvc -n openshift-monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
alertmanager-main-db-alertmanager-main-0 Bound pvc-96694b93-ec7a-4c10-b80c-6880f3104d70 40Gi RWO local-path 17m
alertmanager-main-db-alertmanager-main-1 Bound pvc-130939d0-5a1d-4eac-8a9d-c0b3dfb5a1b1 40Gi RWO local-path 17m
alertmanager-main-db-alertmanager-main-2 Bound pvc-c27a7084-ef29-4429-8aee-1f3bd8d843f8 40Gi RWO local-path 17m
prometheus-k8s-db-prometheus-k8s-0 Bound pvc-250c335f-ec8a-43e1-8f21-af5ac8635016 40Gi RWO local-path 3m30s
prometheus-k8s-db-prometheus-k8s-1 Bound pvc-df236008-9d14-4acc-b1f4-7fd9ab9feeee 40Gi RWO local-path 3m29s
% oc get pv | grep openshift-monitoring
pvc-130939d0-5a1d-4eac-8a9d-c0b3dfb5a1b1 40Gi RWO Delete Bound openshift-monitoring/alertmanager-main-db-alertmanager-main-1 local-path 17m
pvc-250c335f-ec8a-43e1-8f21-af5ac8635016 40Gi RWO Delete Bound openshift-monitoring/prometheus-k8s-db-prometheus-k8s-0 local-path 3m57s
pvc-96694b93-ec7a-4c10-b80c-6880f3104d70 40Gi RWO Delete Bound openshift-monitoring/alertmanager-main-db-alertmanager-main-0 local-path 17m
pvc-c27a7084-ef29-4429-8aee-1f3bd8d843f8 40Gi RWO Delete Bound openshift-monitoring/alertmanager-main-db-alertmanager-main-2 local-path 17m
pvc-df236008-9d14-4acc-b1f4-7fd9ab9feeee 40Gi RWO Delete Bound openshift-monitoring/prometheus-k8s-db-prometheus-k8s-1 local-path 3m57s
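Since local-path volumes are plain directories on the node, you can also peek at where the data physically lives. A sketch using the node name from the earlier top output and the /mnt/pv-data path configured in the provisioner's ConfigMap; each bound volume should show up as a pvc-... directory:
% oc debug node/crc-l6qvn-master-0 -- chroot /host ls /mnt/pv-data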
Finally
The documents I referenced are listed below.
■ Red Hat CodeReady Containers 1.23 product documentation
https://access.redhat.com/documentation/zh-cn/red_hat_codeready_containers/1.23/html-single/getting_started_guide/index
■ "Getting Started Guide" on code-ready.github.io
https://code-ready.github.io/crc/
■ Red Hat OpenShift 4.7 product documentation – Configuring persistent storage for cluster monitoring
https://access.redhat.com/documentation/ja-jp/openshift_container_platform/4.7/html-single/monitoring/index#configuring-persistent-storage
■ "Dynamic volume provisioning" page of the CRC wiki, which I referenced this time:
https://github.com/code-ready/crc/wiki