Deploying a Custom Ansible Operator on Red Hat OpenShift on IBM Cloud (Manual Installation)
A Kubernetes Operator is a way to package, deploy, and manage an application on a Kubernetes or OpenShift cluster. For simply deploying an application, Helm and Kustomize are the better-known tools; the strength of an Operator is that it can also automate day-2 operations such as configuration changes and version upgrades.
As of this writing (October 2021), the Operator SDK supports development in Go, Ansible, and Helm.
In my experience, the biggest hurdle with Operators is that they are hard to develop: the machinery is complex and information is scarce. If you follow the official tutorial with the SDK it mostly works, but as soon as you want to change some behavior or rename something, or when something stops working, you get stuck. So rather than just following the tutorial, you need to understand the mechanism carefully as you develop.
If you only want a package manager, an Operator is, in my view, not worth the effort. The real payoff of an Operator is being able to publish it on OperatorHub or Red Hat Marketplace.
So, partly as a learning exercise, I will try developing an Operator under relatively realistic assumptions. I am a beginner, so mistakes are quite possible; corrections are welcome.
Incidentally, since I am not comfortable with Go, I will develop the Operator with Ansible, which I know well, and actually deploy and test it on OpenShift on IBM Cloud.
Environment
One notable point: Operator development tutorials and samples usually pull images from public registries such as quay.io and docker.io. Here, assuming enterprise use, a private registry is used instead, so ImagePullSecrets must be taken into account throughout.

Steps
Preparing the SDK
Install at least make and operator-sdk. The IBM Cloud CLI, Docker or Podman, and the OpenShift CLI are also required, but are assumed to be installed already.
$ sudo dnf -y install make
$ wget https://github.com/operator-framework/operator-sdk/releases/download/v1.12.0/operator-sdk_linux_amd64
$ sudo cp operator-sdk_linux_amd64 /usr/local/bin/operator-sdk
$ sudo chmod 755 /usr/local/bin/operator-sdk
$ operator-sdk completion bash | sudo tee /etc/bash_completion.d/operator-sdk >/dev/null
$ . /etc/bash_completion.d/operator-sdk
Creating the project
Create the project, specifying with --plugins that this is an Ansible Operator.
$ mkdir hello-ansible-operator
$ cd hello-ansible-operator/
$ operator-sdk init --plugins ansible --domain teruq.example.com
Writing kustomize manifests for you to edit...
Next: define a resource with:
$ operator-sdk create api
Let's look at the generated project structure.
$ ls -F
Dockerfile
Makefile
PROJECT
config/
molecule/
playbooks/
requirements.yml
roles/
watches.yaml
Two items deserve particular attention. config/ holds the generated manifest templates for deploying the Operator. roles/ is the core that defines what the Operator does, but it is still empty at this point.
Developing the Operator
Adding an API
Add a Hello API to the Operator; a corresponding Role template is generated as well.
$ operator-sdk create api --group example --version v1alpha1 --kind Hello --generate-role
Writing kustomize manifests for you to edit...
Files like the following are generated. The Hello API corresponds to the hello Role, which becomes the basis for describing how the application's state is managed.
$ find roles/
roles/
roles/.placeholder
roles/hello
roles/hello/defaults
roles/hello/defaults/main.yml
roles/hello/files
roles/hello/files/.placeholder
roles/hello/handlers
roles/hello/handlers/main.yml
roles/hello/meta
roles/hello/meta/main.yml
roles/hello/README.md
roles/hello/tasks
roles/hello/tasks/main.yml
roles/hello/templates
roles/hello/templates/.placeholder
roles/hello/vars
roles/hello/vars/main.yml
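The create api command also adds an entry to watches.yaml that ties the new kind to the role. The generated entry looks roughly like this (values inferred from the group, version, and kind used above):

```yaml
---
# watches.yaml entry mapping the Hello kind to the hello role
- version: v1alpha1
  group: example.teruq.example.com
  kind: Hello
  role: hello
```

Whenever a Hello resource is created or changed, the Operator runs the hello role against it.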
Editing the Role
As an example, we use the k8s module to define a Deployment and a Service for a container, with the replica count parameterized as the variable replicas.
---
# tasks file for Hello
- name: Deployment hello is defined
  community.kubernetes.k8s:
    definition:
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: "{{ ansible_operator_meta.name }}-hello"
        namespace: "{{ ansible_operator_meta.namespace }}"
      spec:
        replicas: "{{ replicas }}"
        selector:
          matchLabels:
            app: hello
        template:
          metadata:
            labels:
              app: hello
          spec:
            containers:
            - name: hello
              image: ibmcom/hello

- name: Service hello is defined
  community.kubernetes.k8s:
    definition:
      apiVersion: v1
      kind: Service
      metadata:
        name: "{{ ansible_operator_meta.name }}-hello"
        namespace: "{{ ansible_operator_meta.namespace }}"
      spec:
        ports:
        - name: 8080-tcp
          port: 8080
          protocol: TCP
          targetPort: 8080
        selector:
          app: hello
        type: ClusterIP
So that the Role can also be run standalone with plain Ansible, we give replicas a default value in defaults. In practice it is not used.
---
# defaults file for Hello
replicas: 1
Fixing the RBAC
The Role defines a Deployment and a Service. Services are not permitted by the generated RBAC by default, so add them in the following file.
- apiGroups:
  - ""
  resources:
  - secrets
  - pods
  - pods/exec
  - pods/log
  - services # added
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch
Modifying the Custom Resource Definition
Modify the Custom Resource Definition of the Hello API to declare the replicas property.
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: hellos.example.teruq.example.com
spec:
  group: example.teruq.example.com
  names:
    kind: Hello
    listKind: HelloList
    plural: hellos
    singular: hello
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: Hello is the Schema for the hellos API
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: Spec defines the desired state of Hello
            type: object
            #x-kubernetes-preserve-unknown-fields: true # removed
            properties: # added
              replicas: # added
                description: The number of replicas of the Hello deployment. # added
                type: integer # added
                default: 1 # added
            required: # added
            - replicas # added
          status:
            description: Status defines the observed state of Hello
            type: object
            x-kubernetes-preserve-unknown-fields: true
        type: object
    served: true
    storage: true
    subresources:
      status: {}
Modifying the Makefile
The Makefile has variables for the image name and tag, but the defaults are not usable as-is. They could be overridden at invocation time, e.g. make IMG=xxx, but that invites operational mistakes, so edit the Makefile directly instead.
# IMAGE_TAG_BASE defines the docker.io namespace and part of the image name for remote images.
# This variable is used to construct full image tags for bundle and catalog images.
#
# For example, running 'make bundle-build bundle-push catalog-build catalog-push' will build and push both
# example.com/sample-ansible-operator-bundle:$VERSION and example.com/sample-ansible-operator-catalog:$VERSION.
#IMAGE_TAG_BASE ?= teruq.example.com/hello-ansible-operator
IMAGE_TAG_BASE ?= jp.icr.io/teruq/hello-ansible-operator # changed
# BUNDLE_IMG defines the image:tag used for the bundle.
# You can use it as an arg. (E.g make bundle-build BUNDLE_IMG=<some-registry>/<project-name-bundle>:<tag>)
BUNDLE_IMG ?= $(IMAGE_TAG_BASE)-bundle:v$(VERSION)
# Image URL to use all building/pushing image targets
#IMG ?= controller:latest
IMG ?= $(IMAGE_TAG_BASE):$(VERSION) # changed
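One more variable worth noting in the same Makefile: the image tag is derived from VERSION, so releasing a new Operator build means bumping this line. The SDK generates it roughly as follows:

```makefile
# VERSION defines the project version for the bundle.
VERSION ?= 0.0.1
```

With the change above, make docker-build then produces jp.icr.io/teruq/hello-ansible-operator:0.0.1 without any extra arguments.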
Building the Operator image
Build the Operator image. A Dockerfile has already been generated.
FROM quay.io/operator-framework/ansible-operator:v1.12.0
COPY requirements.yml ${HOME}/requirements.yml
RUN ansible-galaxy collection install -r ${HOME}/requirements.yml \
 && chmod -R ug+rwx ${HOME}/.ansible
COPY watches.yaml ${HOME}/watches.yaml
COPY roles/ ${HOME}/roles/
COPY playbooks/ ${HOME}/playbooks/
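The requirements.yml copied here is what makes collections such as community.kubernetes available inside the Operator image. For this SDK version it looks roughly like the following (version pins omitted, as they depend on the SDK release):

```yaml
---
collections:
  - name: community.kubernetes
  - name: operator_sdk.util
```

This is why the Role above can call community.kubernetes.k8s without any further setup.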
Build the image.
$ make docker-build
docker build -t jp.icr.io/teruq/hello-ansible-operator:0.0.1 .
...
Pushing the Operator image
Push the image to IBM Cloud Container Registry (ICR). First authenticate with the IBM Cloud CLI.
$ export IBMCLOUD_API_KEY=********
$ ibmcloud login
$ ibmcloud cr login
Push the image.
$ make docker-push
docker push jp.icr.io/teruq/hello-ansible-operator:0.0.1
The push refers to repository [jp.icr.io/teruq/hello-ansible-operator]
...
0.0.1: digest: sha256:031595ea6f536daf9ae1d8b9901c4ddd69e2d7a07679e01ed189f57aef82dcd0 size: 3031
Confirm that the image is registered in ICR.
$ ibmcloud cr images | grep hello-ansible
jp.icr.io/teruq/hello-ansible-operator 0.0.1 031595ea6f53 teruq 2 minutes ago 156 MB 55 issues
Setting up the ImagePullSecret
After the Operator is deployed, pulling its image directly from ICR fails due to insufficient authority. We copy the ICR ImagePullSecret so that the controller's ServiceAccount can use it when the Operator is deployed.
First create the Operator's namespace. The namespace name can be confirmed in the following file.
# Adds namespace to all resources.
namespace: hello-ansible-operator-system
Create the namespace.
$ oc create ns hello-ansible-operator-system
namespace/hello-ansible-operator-system created
Copy the all-icr-io Secret from the default namespace.
$ ibmcloud oc cluster config -c ${CLUSTER_NAME}
$ oc login -u apikey -p ${API_KEY}
$ oc get secret all-icr-io -n default -o yaml | grep -v namespace: | oc create -n hello-ansible-operator-system -f -
secret/all-icr-io created
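The grep -v namespace: in the pipeline above strips the namespace field from the exported Secret so that oc create can recreate it in the target namespace. A minimal local illustration of the filter, using a stripped-down, hypothetical Secret manifest:

```shell
# Print a minimal Secret manifest with the namespace line filtered out
cat <<'EOF' | grep -v 'namespace:'
apiVersion: v1
kind: Secret
metadata:
  name: all-icr-io
  namespace: default
EOF
```

Every line survives except the namespace one, which is exactly why the recreated Secret lands in whatever namespace -n specifies.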
Next, change the Operator's Deployment to use the ImagePullSecret. In addition, though not required, setting the Operator's imagePullPolicy to Always makes trial and error easier: when you push a new Operator image under the same tag, it will be pulled again.
...
      containers:
      - args:
        - --leader-elect
        - --leader-election-id=sample-ansible-operator
        image: controller:latest
        imagePullPolicy: Always # added (optional)
...
      serviceAccountName: controller-manager
      terminationGracePeriodSeconds: 10
      imagePullSecrets:
      - name: all-icr-io # added
Installing the Operator
Install the Operator into the cluster.
$ make deploy
cd config/manager && /.../hello-ansible-operator/bin/kustomize edit set image controller=jp.icr.io/teruq/hello-ansible-operator:0.0.1
/.../hello-ansible-operator/bin/kustomize build config/default | kubectl apply -f -
Warning: resource namespaces/hello-ansible-operator-system is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
namespace/hello-ansible-operator-system configured
customresourcedefinition.apiextensions.k8s.io/hellos.example.teruq.example.com created
Warning: resource serviceaccounts/hello-ansible-operator-controller-manager is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
serviceaccount/hello-ansible-operator-controller-manager configured
role.rbac.authorization.k8s.io/hello-ansible-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/hello-ansible-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/hello-ansible-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/hello-ansible-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/hello-ansible-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/hello-ansible-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/hello-ansible-operator-proxy-rolebinding created
configmap/hello-ansible-operator-manager-config created
service/hello-ansible-operator-controller-manager-metrics-service created
deployment.apps/hello-ansible-operator-controller-manager created
Confirm that the Operator has started.
$ oc get pods -n hello-ansible-operator-system
NAME READY STATUS RESTARTS AGE
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z 2/2 Running 0 51s
Configuring the Custom Resource
At this point the Operator is deployed, but the application itself is not yet. For that we define a Custom Resource; edit the following file. The Role and the Custom Resource Definition set the default replica count to 1, while the Custom Resource sets it to 2. The name can also be changed to taste.
apiVersion: example.teruq.example.com/v1alpha1
kind: Hello
metadata:
  #name: hello-sample
  name: sample # changed
spec:
  # Add fields here
  replicas: 2 # added
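A detail worth knowing here: the Ansible Operator passes each field under spec to the role as an Ansible variable, converting camelCase keys to snake_case by default (replicas is already lowercase, so it arrives unchanged). The mapping is equivalent to something like this hypothetical one-liner:

```shell
# camelCase -> snake_case, mirroring how spec keys become Ansible variables
echo "containerPort" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]'
```

This prints container_port; a CR field spec.containerPort would therefore be visible inside the role as the variable container_port.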
Deploying the application
Create the Custom Resource in the namespace where the application should be deployed.
$ oc new-project qiita 2>/dev/null || oc project qiita
$ oc apply -f config/samples/example_v1alpha1_hello.yaml
hello.example.teruq.example.com/sample created
The Operator detects the creation of the Custom Resource and deploys the application.
$ oc get deploy | grep hello
sample-hello 2/2 2 2 27s
$ oc get pods | grep hello
sample-hello-c54fb8b58-bv4dl 1/1 Running 0 48s
sample-hello-c54fb8b58-ckfll 1/1 Running 0 48s
$ oc get svc | grep hello
sample-hello ClusterIP 172.21.9.127 <none> 8080/TCP 66s
If the Pods do not start, the deployment by the Operator may have failed; the controller's logs may give a hint. Below is a healthy example, in which you can see Ansible running inside the controller.
$ stern -n hello-ansible-operator-system hello-
...
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager {"level":"info","ts":1633185646.4900098,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/api/v1/namespaces/qiita/services/sample-hello","Verb":"get","APIPrefix":"api","APIGroup":"","APIVersion":"v1","Namespace":"qiita","Resource":"services","Subresource":"","Name":"sample-hello","Parts":["services","sample-hello"]}}
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager {"level":"info","ts":1633185646.8944943,"logger":"runner","msg":"Ansible-runner exited successfully","job":"7465230838818706286","name":"sample","namespace":"qiita"}
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager ----- Ansible Task Status Event StdOut (example.teruq.example.com/v1alpha1, Kind=Hello, sample/qiita) -----
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager PLAY RECAP *********************************************************************
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager localhost : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager
...
Verifying the application
Create a Route.
$ oc create route edge --service sample-hello
route.route.openshift.io/sample-hello created
$ oc get route sample-hello
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
sample-hello sample-hello-qiita.roks-public-tok-********81bb6d6d7afea007d1a8cafd-0000.jp-tok.containers.appdomain.cloud sample-hello 8080-tcp edge None
Confirm that it can be reached with curl.
$ curl https://sample-hello-qiita.roks-public-tok-********81bb6d6d7afea007d1a8cafd-0000.jp-tok.containers.appdomain.cloud
Hello World
Verifying updates to the Custom Resource
Try updating the Custom Resource, setting replicas to 3.
$ oc patch hello/sample --type merge -p '{"spec": {"replicas": 3}}'
hello.example.teruq.example.com/sample patched
Confirm that the number of Pods has increased to three.
$ oc get pods | grep hello
sample-hello-c54fb8b58-bv4dl 1/1 Running 0 4m27s
sample-hello-c54fb8b58-ckfll 1/1 Running 0 4m27s
sample-hello-c54fb8b58-hw8fg 1/1 Running 0 16s
Cleaning up
Deleting the Custom Resource also deletes the application.
$ oc delete -f config/samples/example_v1alpha1_hello.yaml
hello.example.teruq.example.com "sample" deleted
$ oc get pods | grep hello
sample-hello-c54fb8b58-bv4dl 1/1 Terminating 0 7m43s
sample-hello-c54fb8b58-ckfll 1/1 Terminating 0 7m43s
sample-hello-c54fb8b58-hw8fg 1/1 Terminating 0 3m32s
Remove the Operator.
$ make undeploy
/.../hello-ansible-operator/bin/kustomize build config/default | kubectl delete -f -
namespace "hello-ansible-operator-system" deleted
customresourcedefinition.apiextensions.k8s.io "hellos.example.teruq.example.com" deleted
serviceaccount "hello-ansible-operator-controller-manager" deleted
role.rbac.authorization.k8s.io "hello-ansible-operator-leader-election-role" deleted
clusterrole.rbac.authorization.k8s.io "hello-ansible-operator-manager-role" deleted
clusterrole.rbac.authorization.k8s.io "hello-ansible-operator-metrics-reader" deleted
clusterrole.rbac.authorization.k8s.io "hello-ansible-operator-proxy-role" deleted
rolebinding.rbac.authorization.k8s.io "hello-ansible-operator-leader-election-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "hello-ansible-operator-manager-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "hello-ansible-operator-proxy-rolebinding" deleted
configmap "hello-ansible-operator-manager-config" deleted
service "hello-ansible-operator-controller-manager-metrics-service" deleted
deployment.apps "hello-ansible-operator-controller-manager" deleted
Note that because the namespace was deleted, the manually added ImagePullSecret and the ServiceAccount were deleted along with it.
To clean up completely, do not forget the manually created Route.
$ oc delete route sample-hello -n qiita
Summary
This time we installed an Ansible Operator manually. Next time, I will try installing it via OLM (Operator Lifecycle Manager).