Building Prometheus + Grafana and Elasticsearch + Fluentd + Kibana on AKS, and Alerting on Logs with Grafana (Part 1 of 2)

3 years ago

清, 宇

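Introduction

When you operate a system, resource monitoring, log monitoring, and alert notification are always necessary. The same holds when you operate Kubernetes.
When you run managed Kubernetes in the cloud, I think you will often use the monitoring and alerting services that the cloud provides. However, if you want to reuse experience from existing systems, or if you care about portability, you may prefer to set up resource monitoring, log monitoring, and alerting yourself.

So, on Kubernetes running on Microsoft Azure AKS, we will set up resource monitoring with Prometheus and Grafana and log aggregation with Elasticsearch, Fluentd, and Kibana, and use Grafana to send alert notifications for both resources and logs.

Because the full write-up is long, this article covers only building Prometheus and Grafana on AKS.
The next article covers starting Elasticsearch + Fluentd + Kibana to collect the logs of every node and pod centrally, and sending a notification from Grafana to Slack when a specific string appears in the logs.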

Part 1: https://qiita.com/nmatsui/items/6d8319f3216bd8786eb9
Part 2: https://qiita.com/nmatsui/items/ef7cf8f5c957f82d2ca1

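Verification environment

Cloud side

Component              Version
Microsoft Azure AKS    1.11.1

Client side

Component    Version
kubectl      1.11.2
azure-cli    2.0.44
helm         2.9.1

The YAML files and other details used for this verification are published on GitHub. See nmatsui/kubernetes-monitoring.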

Building the environment

Preparing Microsoft Azure AKS

Use the az command to create a resource group and start AKS. Specify a VM size that supports Premium Storage, such as the Dsv3-series.

$ az group create --name k8s --location japaneast
$ az aks create --resource-group k8s --name k8saks --node-count 3 --ssh-key-value $HOME/.ssh/azure.pub --node-vm-size Standard_D2s_v3 --kubernetes-version 1.11.1
$ az aks get-credentials --resource-group k8s --name k8saks

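Preparing Helm

With the Helm charts published by CoreOS, Prometheus and Grafana can be installed through coreos/prometheus-operator and coreos/kube-prometheus. However, on a Kubernetes cluster with RBAC enabled, Helm needs fairly high privileges to work, because it manipulates many kinds of resources internally. The minimum necessary privileges should really be worked out, but this time we simply grant the superuser role (cluster-admin).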

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

$ kubectl apply -f rbac/tiller-rbac.yaml

$ helm init --service-account tiller
$ helm repo update
$ helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/

Confirm that the Tiller pod has started.

$ kubectl get pod --namespace kube-system -l app=helm -l name=tiller
NAME                            READY     STATUS    RESTARTS   AGE
tiller-deploy-759cb9df9-fqcrv   1/1       Running   0          2m

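Installing Prometheus and Grafana

Install Prometheus and Grafana using coreos/prometheus-operator and coreos/kube-prometheus.

However, with the default settings as of 2018/08/19, the following problem occurs, so override the settings when installing. Adjust the storage size and other settings as needed.

Grafana 5.0.x is installed instead of Grafana 5.2.x, which supports alerting on Elasticsearch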

global:
  rbacEnable: true

alertmanager:
  image:
    repository: quay.io/prometheus/alertmanager
    tag: v0.15.1
  storageSpec:
    volumeClaimTemplate:
      metadata:
        name: pg-alertmanager-storage-claim
      spec:
        storageClassName: managed-premium
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 30Gi

prometheus:
  image:
    repository: quay.io/prometheus/prometheus
    tag: v2.3.2
  storageSpec:
    volumeClaimTemplate:
      metadata:
        name: pg-prometheus-storage-claim
      spec:
        storageClassName: managed-premium
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 30Gi

grafana:
  image:
    repository: grafana/grafana
    tag: 5.2.2
  auth:
    anonymous:
      enabled: "false"

See the values.yaml of each Helm chart for the default settings.
values.yaml for grafana
values.yaml for prometheus
values.yaml for alertmanager
values.yaml for kube-prometheus

Installing Prometheus Operator

Use the `--namespace` option to install `coreos/prometheus-operator` into the `monitoring` namespace.

$ helm install coreos/prometheus-operator --name pg-op --namespace monitoring

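Depending on network conditions, the watch may be disconnected with an error such as "Error: watch closed before Until timeout". Even so, the build continues on the Kubernetes cluster. If it succeeds, the prometheus-operator job should have completed and one pod should be running.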

$ kubectl get jobs --namespace monitoring -l app=prometheus-operator -l release=pg-op
NAME                                      DESIRED   SUCCESSFUL   AGE
pg-op-prometheus-operator-create-sm-job   1         1            5m

$ kubectl get pods --namespace monitoring -l app=prometheus-operator -l release=pg-op
NAME                                            READY     STATUS      RESTARTS   AGE
pg-op-prometheus-operator-688494b68f-lrcst      1/1       Running     0          5m
pg-op-prometheus-operator-create-sm-job-fnsgq   0/1       Completed   0          5m
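
The check above can also be scripted. The following is a minimal sketch and not part of the original procedure: pods_ok is a hypothetical helper that reads "kubectl get pods" output on standard input and succeeds only when every pod is either Completed or Running with all of its containers ready. It is fed the sample output shown above, so it runs without a cluster.

```shell
# pods_ok is a hypothetical helper (not part of kubectl or the original article).
# It reads "kubectl get pods" output on stdin and exits 0 only if every pod is
# Completed, or Running with all containers ready.
pods_ok() {
  awk '
    NR == 1 { next }               # skip the header row
    $3 == "Completed" { next }     # finished job pods count as healthy
    {
      split($2, ready, "/")        # READY column, e.g. 1/1
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
    }
    END { exit bad }
  '
}

# Sample input copied from the output above. Against a live cluster, pipe
# "kubectl get pods --namespace monitoring -l release=pg-op" into pods_ok.
if pods_ok <<'EOF'
NAME                                            READY     STATUS      RESTARTS   AGE
pg-op-prometheus-operator-688494b68f-lrcst      1/1       Running     0          5m
pg-op-prometheus-operator-create-sm-job-fnsgq   0/1       Completed   0          5m
EOF
then
  echo "prometheus-operator pods look healthy"
else
  echo "some pods are not ready yet"
fi
```

Treating Completed as healthy matters here because the create-sm-job pod reports 0/1 in the READY column once it has finished.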

Installing Prometheus and Grafana

Install Prometheus and Grafana into the cluster using coreos/kube-prometheus, specifying the monitoring namespace.

$ helm install coreos/kube-prometheus --name pg --namespace monitoring -f monitoring/kube-prometheus-azure.yaml

If the build succeeds, AlertManager, Prometheus, and Grafana should all start, and one node-exporter should be running on each node.

AlertManager

$ kubectl get persistentvolumeclaims --namespace monitoring -l app=alertmanager
NAME                                   STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
alertmanager-pg-db-alertmanager-pg-0   Bound     pvc-3c2ef8c2-a340-11e8-8990-caec6aa008cf   30Gi       RWO            managed-premium   5m

$ kubectl get pods -n monitoring -l app=alertmanager
NAME                READY     STATUS    RESTARTS   AGE
alertmanager-pg-0   2/2       Running   0          5m

Prometheus

$ kubectl get persistentvolumeclaims --namespace monitoring -l app=prometheus
NAME                                                     STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
prometheus-pg-prometheus-db-prometheus-pg-prometheus-0   Bound     pvc-3c7fb880-a340-11e8-8990-caec6aa008cf   30Gi       RWO            managed-premium   6m

$ kubectl get pods --namespace monitoring -l app=prometheus
NAME                         READY     STATUS    RESTARTS   AGE
prometheus-pg-prometheus-0   3/3       Running   1          6m

Grafana

$ kubectl get pods --namespace monitoring -l app=pg-grafana
NAME                          READY     STATUS    RESTARTS   AGE
pg-grafana-75cdf6b96d-njxwb   2/2       Running   0          7m

node-exporter

$ kubectl get daemonsets --namespace monitoring
NAME               DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
pg-exporter-node   3         3         3         3            3           <none>          7m

$ kubectl get pods -o wide -n monitoring -l app=pg-exporter-node
NAME                     READY     STATUS    RESTARTS   AGE       IP           NODE
pg-exporter-node-4wtfd   1/1       Running   0          8m        10.240.0.4   aks-nodepool1-14983502-2
pg-exporter-node-9mjdg   1/1       Running   0          8m        10.240.0.5   aks-nodepool1-14983502-1
pg-exporter-node-l2gnx   1/1       Running   0          8m        10.240.0.6   aks-nodepool1-14983502-0
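
To confirm the "one node-exporter per node" expectation without reading the table by eye, the pod listing can be tallied per node. This is a rough sketch under the assumption that NODE is the seventh column, as in the output above; exporters_per_node is a hypothetical helper name, and the input is the sample listing rather than a live cluster.

```shell
# exporters_per_node is a hypothetical helper. It reads
# "kubectl get pods -o wide" output on stdin and prints "<node> <pod count>"
# for each node that appears in the NODE column (assumed to be field 7).
exporters_per_node() {
  awk 'NR > 1 { count[$7]++ } END { for (node in count) print node, count[node] }' | sort
}

# Sample input copied from the output above. Against a live cluster, pipe
# "kubectl get pods -o wide -n monitoring -l app=pg-exporter-node" into it.
exporters_per_node <<'EOF'
NAME                     READY     STATUS    RESTARTS   AGE       IP           NODE
pg-exporter-node-4wtfd   1/1       Running   0          8m        10.240.0.4   aks-nodepool1-14983502-2
pg-exporter-node-9mjdg   1/1       Running   0          8m        10.240.0.5   aks-nodepool1-14983502-1
pg-exporter-node-l2gnx   1/1       Running   0          8m        10.240.0.6   aks-nodepool1-14983502-0
EOF
```

For the sample listing, each of the three nodes is printed with a count of 1. A node that runs no exporter at all does not appear in this tally, so the number of lines still has to be compared with the node count of the cluster.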

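Applying patches for Azure AKS

This should not be a problem on GCP, but as of 2018/08/19 a few patches are needed to make the setup work on Azure AKS.

Exporting kube-dns-v20 metrics

By default, kube-dns on Azure AKS does not seem to export metrics. Inject a sidecar into kube-dns so that its state is exported to Prometheus.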

spec:
  template:
    spec:
      containers:
      - name: kubedns
        env:
        - name: PROMETHEUS_PORT
          value: "10055"
      - name: sidecar
        image: k8s.gcr.io/k8s-dns-sidecar-amd64:1.14.10
        livenessProbe:
          httpGet:
            path: /metrics
            port: 10054
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
        args:
        - --v=2
        - --logtostderr
        - --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local
        - --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local
        ports:
        - containerPort: 10054
          name: metrics
          protocol: TCP
        resources:
          requests:
            memory: 20Mi
            cpu: 10m

$ kubectl patch deployment -n kube-system kube-dns-v20 --patch "$(cat monitoring/kube-dns-metrics-patch.yaml)"

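Reference: https://github.com/Azure/AKS/issues/345

Changing the port used by the kubelet exporter from https to http

On Azure AKS, exporting over https does not seem to work by default. Changing the port used to export the kubelets' state from https to http solves the problem.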

$ kubectl get servicemonitors pg-exporter-kubelets --namespace monitoring -o yaml | sed 's/https/http/' | kubectl replace -f -

Reference: https://github.com/coreos/prometheus-operator/issues/926
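
To see what the sed substitution in the command above does before running it against the cluster, it can be tried on a small fragment. This is only an illustrative sketch: the YAML below is a hypothetical excerpt, not the actual pg-exporter-kubelets ServiceMonitor, and to_http is a made-up wrapper name.

```shell
# to_http is a hypothetical wrapper around the same sed expression that the
# article pipes the ServiceMonitor through.
to_http() {
  sed 's/https/http/'
}

# Hypothetical ServiceMonitor excerpt; the field values are assumed for
# illustration and are not copied from the real resource.
to_http <<'EOF'
spec:
  endpoints:
  - interval: 15s
    port: https-metrics
    scheme: https
EOF
```

Because the expression has no g flag, sed rewrites only the first "https" on each line, and it rewrites it wherever it occurs, including inside any URL in the resource. Reviewing the output of the first half of the pipeline before piping it into kubectl replace is a cheap safeguard.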

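Deleting the Kubernetes control plane exporter

On a default Azure AKS cluster, the state of the apiserver apparently cannot be obtained from outside. Unfortunately, there seemed to be no good workaround as of 2018/08/19, so we give up and delete the Kubernetes control plane exporter.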

$ kubectl delete servicemonitor pg-exporter-kubernetes --namespace monitoring

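Reference: https://github.com/coreos/prometheus-operator/issues/1522

Deleting alerts related to the Kubernetes control plane

Because the control plane exporter was deleted, some values can no longer be obtained from Kubernetes, so the following alerts set up by coreos/kube-prometheus stay in the firing state permanently.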

alert: K8SSchedulerDown

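The exporter and alert rules for Prometheus are defined in the configmap prometheus-pg-prometheus-rulefiles, which you can inspect with the following command. You might expect that editing this configmap is the way to change them, but it is not.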

$ kubectl get configmap prometheus-pg-prometheus-rulefiles --namespace monitoring -o yaml

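This configmap is generated dynamically by coreos/prometheus-operator, so even if you edit it directly, it is forcibly reverted to the original.
The correct procedure is to modify the PrometheusRule custom resources registered by coreos/kube-prometheus.

It took me three hours to realize this...

coreos/kube-prometheus creates 10 PrometheusRule resources, which you can list with the following command. Four of them need to be modified.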

$ kubectl get prometheusrules --namespace monitoring

PrometheusRule                         Alert to delete
pg-kube-prometheus                     DeadMansSwitch
pg-exporter-kubernetes                 K8SApiserverDown
pg-exporter-kube-controller-manager    K8SControllerManagerDown
pg-exporter-kube-scheduler             K8SSchedulerDown

$ kubectl edit prometheusrule pg-kube-prometheus --namespace monitoring

       for: 10m
       labels:
         severity: warning
-    - alert: DeadMansSwitch
-      annotations:
-        description: This is a DeadMansSwitch meant to ensure that the entire Alerting
-          pipeline is functional.
-        summary: Alerting DeadMansSwitch
-      expr: vector(1)
-      labels:
-        severity: none
     - expr: process_open_fds / process_max_fds
       record: fd_utilization
     - alert: FdExhaustionClose

$ kubectl edit prometheusrule pg-exporter-kubernetes --namespace monitoring

       for: 10m
       labels:
         severity: critical
-    - alert: K8SApiserverDown
-      annotations:
-        description: No API servers are reachable or all have disappeared from service
-          discovery
-        summary: No API servers are reachable
-      expr: absent(up{job="apiserver"} == 1)
-      for: 20m
-      labels:
-        severity: critical
     - alert: K8sCertificateExpirationNotice
       annotations:
         description: Kubernetes API Certificate is expiring soon (less than 7 days)

$ kubectl edit prometheusrule pg-exporter-kube-controller-manager --namespace monitoring

 spec:
   groups:
   - name: kube-controller-manager.rules
-    rules:
-    - alert: K8SControllerManagerDown
-      annotations:
-        description: There is no running K8S controller manager. Deployments and replication
-          controllers are not making progress.
-        runbook: https://coreos.com/tectonic/docs/latest/troubleshooting/controller-recovery.html#recovering-a-controller-manager
-        summary: Controller manager is down
-      expr: absent(up{job="kube-controller-manager"} == 1)
-      for: 5m
-      labels:
-        severity: critical
+    rules: []

$ kubectl edit prometheusrule pg-exporter-kube-scheduler --namespace monitoring

       labels:
         quantile: "0.5"
       record: cluster:scheduler_binding_latency_seconds:quantile
-    - alert: K8SSchedulerDown
-      annotations:
-        description: There is no running K8S scheduler. New pods are not being assigned
-          to nodes.
-        runbook: https://coreos.com/tectonic/docs/latest/troubleshooting/controller-recovery.html#recovering-a-scheduler
-        summary: Scheduler is down
-      expr: absent(up{job="kube-scheduler"} == 1)
-      for: 5m
-      labels:
-        severity: critical
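
After the four edits, it may be worth checking that the alerts are really gone from the PrometheusRule resources. The following is a minimal sketch under stated assumptions: has_alert is a hypothetical helper, and it relies on the rules being written as "- alert: <name>" entries, as in the diffs above. The demonstration runs on an inline fragment rather than a live cluster.

```shell
# has_alert is a hypothetical helper. It reads PrometheusRule YAML on stdin and
# exits 0 if a rule entry of the form "- alert: <name>" is present.
has_alert() {
  grep -Eq "^[[:space:]]*-[[:space:]]+alert:[[:space:]]+$1[[:space:]]*\$"
}

# Assumed usage against a live cluster (not run here):
#   kubectl get prometheusrule pg-exporter-kube-scheduler --namespace monitoring -o yaml \
#     | has_alert K8SSchedulerDown && echo "still defined"

# Offline demonstration on a fragment shaped like the rules edited above.
if has_alert K8SSchedulerDown <<'EOF'
    - expr: process_open_fds / process_max_fds
      record: fd_utilization
    - alert: FdExhaustionClose
EOF
then
  echo "K8SSchedulerDown is still defined"
else
  echo "K8SSchedulerDown has been removed"
fi
```

The same helper can be called once for each row of the table of alerts to delete, pairing each PrometheusRule name with its alert name.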

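Verifying operation

The environment is finally built. Now open the web consoles of Prometheus and Grafana and check their state.

However, this time both Prometheus and Grafana are exposed as ClusterIP services, so their web consoles cannot be reached from outside AKS. Use port forwarding to access them.

Checking Prometheus

Forward port 9090 of Prometheus.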

$ kubectl port-forward $(kubectl get pod --namespace monitoring -l prometheus=kube-prometheus -l app=prometheus -o template --template "{{(index .items 0).metadata.name}}") --namespace monitoring 9090:9090

Checking Grafana

Forward port 3000 of Grafana.

$ kubectl port-forward $(kubectl get pod --namespace monitoring -l app=pg-grafana -o template --template "{{(index .items 0).metadata.name}}") --namespace monitoring 3000:3000

coreos/kube-prometheus automatically registers Prometheus as a data source, but it sometimes sets an odd URL. Check the name of the Prometheus Service and change the URL to the correct one (http://pg-prometheus:9090/ this time).

$ kubectl get services --namespace monitoring -l app=prometheus -l prometheus=pg
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
pg-prometheus   ClusterIP   10.0.105.143   <none>        9090/TCP   3h
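
The data source URL can be derived mechanically from the service listing above. This is a small sketch, assuming a ClusterIP service whose PORT(S) column starts with a plain port number such as 9090/TCP; service_urls is a hypothetical helper name, and the input is the sample listing rather than a live cluster.

```shell
# service_urls is a hypothetical helper. It reads "kubectl get services" output
# on stdin and prints an http URL built from the NAME column and the first port
# in the PORT(S) column. It assumes entries like 9090/TCP (no NodePort mapping).
service_urls() {
  awk 'NR > 1 { split($5, port, "/"); printf "http://%s:%s/\n", $1, port[1] }'
}

# Sample input copied from the output above.
service_urls <<'EOF'
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
pg-prometheus   ClusterIP   10.0.105.143   <none>        9090/TCP   3h
EOF
# prints: http://pg-prometheus:9090/
```

The short service name is enough here because Grafana runs in the same monitoring namespace as the Prometheus Service.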

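Next time

So we built Prometheus and Grafana on Azure AKS for resource monitoring, and it worked well.
Next, we plan to add Elasticsearch + Fluentd + Kibana to this AKS cluster, aggregate the logs from Kubernetes into Elasticsearch, connect Elasticsearch to Grafana, and send a notification to Slack when a specific log entry is output.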