Deploying a Custom Ansible Operator to Red Hat OpenShift on IBM Cloud (Manual Installation)

A Kubernetes Operator is a way of packaging, deploying, and managing an application on a Kubernetes or OpenShift cluster. If all you need is to deploy an application, Helm and Kustomize are the better-known tools; the strength of an Operator is that it can also automate post-deployment management, such as configuration changes and version upgrades.

At the time of writing (October 2021), the Operator SDK supports development with Go, Ansible, and Helm.

In my view, the biggest challenge with Operators is that they are hard to develop: the machinery is complex and information is scarce. If you follow the official tutorial with the SDK, things basically work, but the moment I tried to change some behavior or rename something, or when something stopped working, I got stuck. So rather than just following the tutorial, you need to understand the mechanism carefully as you develop.

If you only use an Operator as a package manager, I personally don't think it is worth the effort. To me, the real benefit of an Operator is that it can be published on platforms such as OperatorHub or Red Hat Marketplace.

So, partly as a learning exercise, I will try developing an Operator in a reasonably realistic setting. Since I am a beginner, I may well get things wrong; I would appreciate any corrections.

Incidentally, since I am not comfortable with Go, I will build the Operator with Ansible, which I know well, and actually deploy and test it on OpenShift on IBM Cloud.

Environment

One characteristic of this setup: Operator development tutorials and samples usually pull images from public registries such as quay.io and docker.io. Here, however, I assume an enterprise use case and use a private registry instead, which means an ImagePullSecret always has to be taken into account.

(Environment diagram: image.png)

Steps

Preparing the SDK

At a minimum, make and operator-sdk must be installed. The IBM Cloud CLI, Docker or Podman, and the OpenShift CLI are also required, but I assume they are already in place.

$ sudo dnf install -y make
$ wget https://github.com/operator-framework/operator-sdk/releases/download/v1.12.0/operator-sdk_linux_amd64
$ sudo cp operator-sdk_linux_amd64 /usr/local/bin/operator-sdk
$ sudo chmod 755 /usr/local/bin/operator-sdk
$ operator-sdk completion bash | sudo tee /etc/bash_completion.d/operator-sdk >/dev/null
$ . /etc/bash_completion.d/operator-sdk
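
To confirm the CLI is on the PATH and working, you can print its version (the exact output depends on your environment):

$ operator-sdk version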

Creating the Project

Create a project, specifying with the --plugins option that this is an Ansible Operator.

$ mkdir hello-ansible-operator
$ cd hello-ansible-operator/

$ operator-sdk init --plugins ansible --domain teruq.example.com
Writing kustomize manifests for you to edit...
Next: define a resource with:
$ operator-sdk create api

Let's look at the project structure at this point.

$ ls -F
Dockerfile  
Makefile  
PROJECT  
config/  
molecule/  
playbooks/  
requirements.yml  
roles/  
watches.yaml

Two things deserve particular attention. config/ contains the generated manifest templates used to deploy the Operator. roles/ is the core piece that defines what the Operator does, but it is still empty at this stage.

Developing the Operator

Adding an API

Add a Hello API to the Operator; this also generates the corresponding role template.

$ operator-sdk create api --group example --version v1alpha1 --kind Hello --generate-role
Writing kustomize manifests for you to edit...

Files like the following are generated. The Hello API corresponds to the hello role, which becomes the basis for managing the application's state (the watches.yaml entry recording this mapping is shown after the listing).

$ find roles/
roles/
roles/.placeholder
roles/hello
roles/hello/defaults
roles/hello/defaults/main.yml
roles/hello/files
roles/hello/files/.placeholder
roles/hello/handlers
roles/hello/handlers/main.yml
roles/hello/meta
roles/hello/meta/main.yml
roles/hello/README.md
roles/hello/tasks
roles/hello/tasks/main.yml
roles/hello/templates
roles/hello/templates/.placeholder
roles/hello/vars
roles/hello/vars/main.yml
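
The mapping between the Hello API and the hello role is recorded in watches.yaml. With the group and domain used above, the generated entry should look roughly like this (the exact layout may vary slightly between SDK versions):

---
# Use the 'create api' subcommand to add watches to this file.
- version: v1alpha1
  group: example.teruq.example.com
  kind: Hello
  role: hello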

Editing the Role

As an example, define a Deployment and a Service for the container using the k8s module, and expose the replica count as a variable named replicas. Edit roles/hello/tasks/main.yml as follows.

---
# tasks file for Hello
- name: Deployment hello is defined
  community.kubernetes.k8s:
    definition:
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: "{{ ansible_operator_meta.name }}-hello"
        namespace: "{{ ansible_operator_meta.namespace }}"
      spec:
        replicas: "{{ replicas }}"
        selector:
          matchLabels:
            app: hello
        template:
          metadata:
            labels:
              app: hello
          spec:
            containers:
            - name: hello
              image: ibmcom/hello

- name: Service hello is defined
  community.kubernetes.k8s:
    definition:
      apiVersion: v1
      kind: Service
      metadata:
        name: "{{ ansible_operator_meta.name }}-hello"
        namespace: "{{ ansible_operator_meta.namespace }}"
      spec:
        ports:
        - name: 8080-tcp
          port: 8080
          protocol: TCP
          targetPort: 8080
        selector:
          app: hello
        type: ClusterIP

So that the role can also be run standalone with plain Ansible, set a default value for replicas in roles/hello/defaults/main.yml. It is not actually used by the Operator.

---
# defaults file for Hello
replicas: 1
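
The reason the default is effectively unused: at runtime the ansible-operator passes every field under the custom resource's spec to the role as Ansible variables (camelCase keys become snake_case), so the value from the CR always wins. As a rough illustration:

# In the Custom Resource:      # Variable seen inside the role:
# spec:
#   replicas: 2                #   replicas: 2  (overrides the default of 1)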

Updating the RBAC

The role defines a Deployment and a Service. Services are not allowed by the generated RBAC by default, so add them in config/rbac/role.yaml as shown below.

  - apiGroups:
      - ""
    resources:
      - secrets
      - pods
      - pods/exec
      - pods/log
      - services  # added
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch

Modifying the Custom Resource Definition

Modify the Custom Resource Definition for the Hello API and declare the replicas property.

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: hellos.example.teruq.example.com
spec:
  group: example.teruq.example.com
  names:
    kind: Hello
    listKind: HelloList
    plural: hellos
    singular: hello
  scope: Namespaced
  versions:
  - name: v1alpha1
    schema:
      openAPIV3Schema:
        description: Hello is the Schema for the hellos API
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: Spec defines the desired state of Hello
            type: object
            #x-kubernetes-preserve-unknown-fields: true                   # removed
            properties:                                                   # added
              replicas:                                                   # added
                description: The number of replicas of Hello deployment.  # added
                type: integer                                             # added
                default: 1                                                # added
            required:                                                     # added
            - replicas                                                    # added
          status:
            description: Status defines the observed state of Hello
            type: object
            x-kubernetes-preserve-unknown-fields: true
        type: object
    served: true
    storage: true
    subresources:
      status: {}
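
With x-kubernetes-preserve-unknown-fields removed and an explicit schema declared, the API server now prunes any spec field the schema does not list, and replicas falls back to 1 when omitted. In practice the spec of a Hello resource can therefore only carry:

spec:
  replicas: 2   # any other key under spec would be pruned by the API server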

Modifying the Makefile

The Makefile contains a section that sets the image name and tag, but the defaults cannot be used as they are. You could override them on each invocation, e.g. make IMG=xxx, but that invites operational mistakes, so modify the Makefile directly.

# IMAGE_TAG_BASE defines the docker.io namespace and part of the image name for remote images.
# This variable is used to construct full image tags for bundle and catalog images.
#
# For example, running 'make bundle-build bundle-push catalog-build catalog-push' will build and push both
# example.com/sample-ansible-operator-bundle:$VERSION and example.com/sample-ansible-operator-catalog:$VERSION.
#IMAGE_TAG_BASE ?= teruq.example.com/hello-ansible-operator
IMAGE_TAG_BASE ?= jp.icr.io/teruq/hello-ansible-operator  # changed

# BUNDLE_IMG defines the image:tag used for the bundle.
# You can use it as an arg. (E.g make bundle-build BUNDLE_IMG=<some-registry>/<project-name-bundle>:<tag>)
BUNDLE_IMG ?= $(IMAGE_TAG_BASE)-bundle:v$(VERSION)

# Image URL to use all building/pushing image targets
#IMG ?= controller:latest
IMG ?= $(IMAGE_TAG_BASE):$(VERSION)                       # changed

Building the Operator Image

Build the Operator image. The Dockerfile has already been generated automatically.

FROM quay.io/operator-framework/ansible-operator:v1.12.0

COPY requirements.yml ${HOME}/requirements.yml
RUN ansible-galaxy collection install -r ${HOME}/requirements.yml \
 && chmod -R ug+rwx ${HOME}/.ansible

COPY watches.yaml ${HOME}/watches.yaml
COPY roles/ ${HOME}/roles/
COPY playbooks/ ${HOME}/playbooks/
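
The role written earlier uses the community.kubernetes.k8s module, which has to be available inside this image. The scaffolded requirements.yml that the Dockerfile installs should already pull in the necessary collections; if community.kubernetes is not listed there, add it, for example:

---
collections:
  - name: community.kubernetes
  - name: operator_sdk.util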

Build the image.

$ make docker-build
docker build -t jp.icr.io/teruq/hello-ansible-operator:0.0.1 .
...

Pushing the Operator Image

Push the image to IBM Cloud Container Registry (ICR). First, authenticate with the IBM Cloud CLI.

$ export IBMCLOUD_API_KEY=********
$ ibmcloud login
$ ibmcloud cr login
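
If the target ICR namespace (teruq in this example) does not exist yet, the push will be rejected; it can be created beforehand:

$ ibmcloud cr namespace-add teruq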

Push the image.

$ make docker-push
docker push jp.icr.io/teruq/hello-ansible-operator:0.0.1
The push refers to repository [jp.icr.io/teruq/hello-ansible-operator]
...
0.0.1: digest: sha256:031595ea6f536daf9ae1d8b9901c4ddd69e2d7a07679e01ed189f57aef82dcd0 size: 3031

Confirm that the image is registered in ICR.

$ ibmcloud cr images | grep hello-ansible
jp.icr.io/teruq/hello-ansible-operator               0.0.1                              031595ea6f53   teruq      2 minutes ago   156 MB   55 issues

Setting up the ImagePullSecret

Once the Operator is deployed, pulling the image directly from ICR fails because of insufficient permissions. Copy the ICR ImagePullSecret and make sure the controller's ServiceAccount can use it when the Operator is deployed.

Create a namespace for the Operator. The namespace name can be confirmed in config/default/kustomization.yaml.

# Adds namespace to all resources.
namespace: hello-ansible-operator-system

Create the namespace.

$ oc create ns hello-ansible-operator-system
namespace/hello-ansible-operator-system created

Copy the all-icr-io secret from the default namespace.

$ ibmcloud oc cluster config -c ${CLUSTER_NAME}
$ oc login -u apikey -p ${API_KEY}
$ oc get secret all-icr-io -n default -o yaml | grep -v namespace: | oc create -n hello-ansible-operator-system -f -
secret/all-icr-io created
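
It is worth a quick check that the secret actually landed in the Operator's namespace:

$ oc get secret all-icr-io -n hello-ansible-operator-system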

Next, change the Operator's Deployment (config/manager/manager.yaml) to use the ImagePullSecret. In addition, although not required, setting the Operator's imagePullPolicy to Always is convenient during trial and error, because the image is pulled again even when the same tag is pushed repeatedly.

...
      containers:
      - args:
        - --leader-elect
        - --leader-election-id=hello-ansible-operator
        image: controller:latest
        imagePullPolicy: Always  # added (optional)
...
      serviceAccountName: controller-manager
      terminationGracePeriodSeconds: 10
      imagePullSecrets:
      - name: all-icr-io         # added

Installing the Operator

Install the Operator into the cluster.

$ make deploy
cd config/manager && /.../hello-ansible-operator/bin/kustomize edit set image controller=jp.icr.io/teruq/hello-ansible-operator:0.0.1
/.../hello-ansible-operator/bin/kustomize build config/default | kubectl apply -f -
Warning: resource namespaces/hello-ansible-operator-system is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
namespace/hello-ansible-operator-system configured
customresourcedefinition.apiextensions.k8s.io/hellos.example.teruq.example.com created
Warning: resource serviceaccounts/hello-ansible-operator-controller-manager is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
serviceaccount/hello-ansible-operator-controller-manager configured
role.rbac.authorization.k8s.io/hello-ansible-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/hello-ansible-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/hello-ansible-operator-metrics-reader created
clusterrole.rbac.authorization.k8s.io/hello-ansible-operator-proxy-role created
rolebinding.rbac.authorization.k8s.io/hello-ansible-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/hello-ansible-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/hello-ansible-operator-proxy-rolebinding created
configmap/hello-ansible-operator-manager-config created
service/hello-ansible-operator-controller-manager-metrics-service created
deployment.apps/hello-ansible-operator-controller-manager created

Confirm that the Operator has started.

$ oc get pods -n hello-ansible-operator-system
NAME                                                         READY   STATUS    RESTARTS   AGE
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z   2/2     Running   0          51s
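
At this point you can also sanity-check the earlier RBAC addition; the controller's ServiceAccount should now be allowed to create Services (this should answer "yes" if the ClusterRole change was deployed):

$ oc auth can-i create services -n default \
    --as=system:serviceaccount:hello-ansible-operator-system:hello-ansible-operator-controller-manager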

Configuring the Custom Resource

At this stage the Operator is deployed, but the actual application is not yet. To deploy it, define a Custom Resource by editing config/samples/example_v1alpha1_hello.yaml. The role and the Custom Resource Definition set the default replica count to 1, while the Custom Resource here sets it to 2. You can also change the name to whatever you like.

apiVersion: example.teruq.example.com/v1alpha1
kind: Hello
metadata:
  #name: hello-sample
  name: sample  # changed
spec:
  # Add fields here
  replicas: 2   # added

Deploying the Application

Create the Custom Resource in the namespace where you want the application deployed.

$ oc new-project qiita 2>/dev/null || oc project qiita
$ oc apply -f config/samples/example_v1alpha1_hello.yaml
hello.example.teruq.example.com/sample created

The Operator automatically detects the creation of the Custom Resource and deploys the application.

$ oc get deploy | grep hello
sample-hello     2/2     2            2           27s

$ oc get pods | grep hello
sample-hello-c54fb8b58-bv4dl      1/1     Running   0          48s
sample-hello-c54fb8b58-ckfll      1/1     Running   0          48s

$ oc get svc | grep hello
sample-hello     ClusterIP   172.21.9.127    <none>        8080/TCP   66s

If the Pods do not come up, the Operator's deployment may have failed; looking at the controller's logs may give you a hint. Below is a healthy example, where you can see Ansible running inside the controller.

$ stern -n hello-ansible-operator-system  hello-
...
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager {"level":"info","ts":1633185646.4900098,"logger":"proxy","msg":"Read object from cache","resource":{"IsResourceRequest":true,"Path":"/api/v1/namespaces/qiita/services/sample-hello","Verb":"get","APIPrefix":"api","APIGroup":"","APIVersion":"v1","Namespace":"qiita","Resource":"services","Subresource":"","Name":"sample-hello","Parts":["services","sample-hello"]}}
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager {"level":"info","ts":1633185646.8944943,"logger":"runner","msg":"Ansible-runner exited successfully","job":"7465230838818706286","name":"sample","namespace":"qiita"}
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager ----- Ansible Task Status Event StdOut (example.teruq.example.com/v1alpha1, Kind=Hello, sample/qiita) -----
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager PLAY RECAP *********************************************************************
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager localhost                  : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager
hello-ansible-operator-controller-manager-55b57dd7fb-h7h4z manager
...
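
The example above uses stern; if it is not installed, following the controller Deployment's logs directly with oc works just as well (the Ansible-running container is named manager):

$ oc logs -n hello-ansible-operator-system deploy/hello-ansible-operator-controller-manager -c manager -f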

Verifying the Application

Create a Route.

$ oc create route edge --service sample-hello
route.route.openshift.io/sample-hello created

$ oc get route sample-hello
NAME           HOST/PORT                                                                                                    PATH   SERVICES       PORT       TERMINATION   WILDCARD
sample-hello   sample-hello-qiita.roks-public-tok-********81bb6d6d7afea007d1a8cafd-0000.jp-tok.containers.appdomain.cloud          sample-hello   8080-tcp   edge          None

Confirm that it can be reached with curl.

$ curl https://sample-hello-qiita.roks-public-tok-********81bb6d6d7afea007d1a8cafd-0000.jp-tok.containers.appdomain.cloud
Hello World

Verifying Operations via the Custom Resource

Try updating the Custom Resource, setting replicas to 3.

$ oc patch hello/sample --type merge -p '{"spec": {"replicas": 3}}'
hello.example.teruq.example.com/sample patched

Confirm that the number of Pods has increased to three.

$ oc get pods | grep hello
sample-hello-c54fb8b58-bv4dl      1/1     Running   0          4m27s
sample-hello-c54fb8b58-ckfll      1/1     Running   0          4m27s
sample-hello-c54fb8b58-hw8fg      1/1     Running   0          16s

Cleaning Up

Deleting the Custom Resource also deletes the application.

$ oc delete -f config/samples/example_v1alpha1_hello.yaml
hello.example.teruq.example.com "sample" delete

$ oc get pods | grep hello
sample-hello-c54fb8b58-bv4dl      1/1     Terminating   0          7m43s
sample-hello-c54fb8b58-ckfll      1/1     Terminating   0          7m43s
sample-hello-c54fb8b58-hw8fg      1/1     Terminating   0          3m32s

Remove the Operator.

$ make undeploy
/.../hello-ansible-operator/bin/kustomize build config/default | kubectl delete -f -
namespace "hello-ansible-operator-system" deleted
customresourcedefinition.apiextensions.k8s.io "hellos.example.teruq.example.com" deleted
serviceaccount "hello-ansible-operator-controller-manager" deleted
role.rbac.authorization.k8s.io "hello-ansible-operator-leader-election-role" deleted
clusterrole.rbac.authorization.k8s.io "hello-ansible-operator-manager-role" deleted
clusterrole.rbac.authorization.k8s.io "hello-ansible-operator-metrics-reader" deleted
clusterrole.rbac.authorization.k8s.io "hello-ansible-operator-proxy-role" deleted
rolebinding.rbac.authorization.k8s.io "hello-ansible-operator-leader-election-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "hello-ansible-operator-manager-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "hello-ansible-operator-proxy-rolebinding" deleted
configmap "hello-ansible-operator-manager-config" deleted
service "hello-ansible-operator-controller-manager-metrics-service" deleted
deployment.apps "hello-ansible-operator-controller-manager" deleted

Note that because the namespace was deleted, the manually added ImagePullSecret and ServiceAccount were deleted along with it.

To clean up completely, don't forget the manually created Route.

$ oc delete route sample-hello -n qiita

Summary

This time we installed an Ansible Operator manually. Next time, I will try installing it using OLM (Operator Lifecycle Manager).
