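Prometheus Sample Configuration
Overview
- Building a k8s test environment
- Prometheus sample configuration
Setup guide contents
- Full table of contents
Environment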
- Rancher: v2.4.8
- kubernetes(Client): v1.19.1
- kubernetes(Server): v1.18.8
- kube-prometheus-stack Chart: v9.4.3
- kube-prometheus-stack App: v0.38.1
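Summary of the sample configuration
- Expose Nginx metrics
- Add Nginx to the Prometheus target list
- Create a Grafana dashboard that shows the Nginx HTTP request count
- Create an alert based on the HTTP request count, then forward the alert to an external service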
Nginx metrics configuration
- Working location: ClientPC
- ngx_http_stub_status_module configuration page
  - https://nginx.org/en/docs/http/ngx_http_stub_status_module.html#stub_status
Add the following to Nginx's nginx.conf to enable ngx_http_stub_status_module:
server {
    listen 8080;
    location /metrics {
        stub_status;
    }
}
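The stub_status page is a short plain-text report, not Prometheus format, which is why an exporter is needed in front of it. As a rough illustration of what an exporter has to do with that page, the sketch below parses the report layout described in the nginx module documentation. The function name `parse_stub_status` and the sample numbers are assumptions made for this example, not output captured from this environment.

```python
import re


def parse_stub_status(text: str) -> dict:
    """Parse the plain-text report produced by the stub_status directive."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    # The counters line holds three integers: accepts, handled, requests.
    counters = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text
    )
    if not (active and counters and states):
        raise ValueError("unexpected stub_status format")
    accepts, handled, requests = (int(v) for v in counters.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


# Illustrative sample only (values are made up for this sketch).
SAMPLE = """Active connections: 2
server accepts handled requests
 10 10 25
Reading: 0 Writing: 1 Waiting: 1
"""

status = parse_stub_status(SAMPLE)
print(status["requests"])
```

The `requests` counter is the value that later surfaces as the HTTP request count used for the dashboard and the alert.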
- nginx-prometheus-exporter page
  - https://github.com/nginxinc/nginx-prometheus-exporter
To expose the Nginx metrics to Prometheus, add nginx-prometheus-exporter to the Pod as a sidecar container.
Create the manifest (test-nginx.yaml):
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configmap
  labels:
    app: nginx
data:
  nginx.conf: |2
    user nginx;
    worker_processes 1;
    error_log /var/log/nginx/error.log warn;
    pid /var/run/nginx.pid;
    events {
        worker_connections 1024;
    }
    http {
        include /etc/nginx/mime.types;
        default_type application/octet-stream;
        log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for"';
        access_log /var/log/nginx/access.log main;
        server {
            listen 8080;
            location /metrics {
                stub_status;
            }
        }
        sendfile on;
        keepalive_timeout 65;
        include /etc/nginx/conf.d/*.conf;
    }
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  ports:
  - name: nginx-http
    protocol: TCP
    port: 80
    targetPort: 80
  - name: nginx-exporter
    protocol: TCP
    port: 9113
    targetPort: 9113
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.19.2
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-conf
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
      - name: nginx-exporter
        image: nginx/nginx-prometheus-exporter:0.8.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9113
        args:
        - -nginx.scrape-uri=http://localhost:8080/metrics
      volumes:
      - name: nginx-conf
        configMap:
          name: nginx-configmap
          items:
          - key: nginx.conf
            path: nginx.conf
- Start Nginx
$ kubectl apply -f test-nginx.yaml
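Once the Pod is running, the exporter sidecar serves the Nginx counters on port 9113 in the Prometheus text exposition format. The sketch below shows one way to read such output, assuming simple unlabeled samples. The helper `parse_metrics` and the sample text are hypothetical and not captured from this environment; `nginx_up` is assumed here as an additional exporter metric alongside `nginx_http_requests_total`.

```python
def parse_metrics(text: str) -> dict:
    """Read 'name value' samples, skipping comments and blank lines.

    Simplification: timestamps and label values containing spaces
    are not handled.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Illustrative exporter output (made up for this sketch).
SAMPLE_METRICS = """\
# HELP nginx_http_requests_total Total http requests
# TYPE nginx_http_requests_total counter
nginx_http_requests_total 25
# HELP nginx_up Status of the last metric scrape
# TYPE nginx_up gauge
nginx_up 1
"""

metrics = parse_metrics(SAMPLE_METRICS)
print(metrics["nginx_http_requests_total"])
```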
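Adding a Prometheus target
Installing the Prometheus Operator adds the ServiceMonitor CRD, which is used to register scrape targets.
- Check serviceMonitorSelector
  - The labels listed under serviceMonitorSelector.matchLabels must be added to the ServiceMonitor
  - In this environment, add release: prometheus
  - Without this label, Prometheus does not pick up the ServiceMonitor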
$ kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus -o yaml
..........
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
..........
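The matchLabels check is a plain subset test: every key/value pair in the selector must be present on the object's labels, and an empty selector such as `{}` matches everything. A minimal sketch of that rule follows; the function name `match_labels` is introduced only for this illustration.

```python
def match_labels(selector: dict, labels: dict) -> bool:
    """True when every key/value pair in the selector appears in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Selector reported by the Prometheus resource in this environment.
service_monitor_selector = {"release": "prometheus"}

labelled = {"app": "nginx", "release": "prometheus"}
unlabelled = {"app": "nginx"}

print(match_labels(service_monitor_selector, labelled))    # selected
print(match_labels(service_monitor_selector, unlabelled))  # ignored
print(match_labels({}, unlabelled))                        # empty selector
```

This is why a ServiceMonitor without the `release: prometheus` label is silently skipped rather than reported as an error.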
- Create the manifest (nginx-servicemonitor.yaml)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-servicemonitor
  namespace: monitoring
  labels:
    app: nginx
    release: prometheus # label found in serviceMonitorSelector
spec:
  endpoints:
  - port: nginx-exporter
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: nginx # label of the Service
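The ServiceMonitor ties three things together: the namespaces to search, the Service labels to match, and the name of the Service port to scrape. The sketch below models that lookup with plain dictionaries. It is a simplification (the real discovery works on the endpoints behind the Service), and `resolve_targets` and `other-svc` are hypothetical names used only here.

```python
# Simplified view of the ServiceMonitor defined in this article.
service_monitor = {
    "namespaces": ["default"],
    "match_labels": {"app": "nginx"},
    "port": "nginx-exporter",
}

services = [
    {
        "name": "nginx-svc",
        "namespace": "default",
        "labels": {"app": "nginx"},
        "ports": {"nginx-http": 80, "nginx-exporter": 9113},
    },
    {
        # Hypothetical unrelated Service, included to show filtering.
        "name": "other-svc",
        "namespace": "default",
        "labels": {"app": "other"},
        "ports": {"http": 8080},
    },
]


def resolve_targets(monitor: dict, candidates: list) -> list:
    """Return (service name, port number) pairs the monitor would select."""
    targets = []
    for svc in candidates:
        if svc["namespace"] not in monitor["namespaces"]:
            continue
        wanted = monitor["match_labels"]
        if any(svc["labels"].get(k) != v for k, v in wanted.items()):
            continue
        # The endpoint is referenced by port *name*, not by number.
        port = svc["ports"].get(monitor["port"])
        if port is not None:
            targets.append((svc["name"], port))
    return targets


print(resolve_targets(service_monitor, services))
```

If the port name in the ServiceMonitor does not match a named port on the Service, the lookup yields no target, which is a common reason for a target not appearing in Prometheus.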
- Create the ServiceMonitor
$ kubectl apply -f nginx-servicemonitor.yaml
## Verify ##
$ kubectl get servicemonitor -n monitoring
NAME AGE
..........
nginx-servicemonitor 64s
..........


Creating the Grafana dashboard

Dashboard variables:

| Name | Data source | Refresh | Query |
| --- | --- | --- | --- |
| cluster | Prometheus | On Dashboard Load | label_values(kube_pod_info, cluster) |
| namespace | Prometheus | On Dashboard Load | label_values(kube_pod_info{cluster="$cluster"}, namespace) |
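The `label_values(metric, label)` query returns the distinct values of one label across the series of a metric, optionally narrowed by label matchers such as `cluster="$cluster"`. The sketch below imitates that behavior on an in-memory list. The `label_values` function here is a stand-in written for illustration, and the series data (cluster names `dev` and `prod`) is hypothetical.

```python
def label_values(series: list, metric: str, label: str, matchers=None) -> list:
    """Distinct values of `label` on `metric`, filtered by exact matchers."""
    matchers = matchers or {}
    values = {
        s["labels"][label]
        for s in series
        if s["name"] == metric
        and label in s["labels"]
        and all(s["labels"].get(k) == v for k, v in matchers.items())
    }
    return sorted(values)


# Hypothetical kube_pod_info series from two clusters.
SERIES = [
    {"name": "kube_pod_info", "labels": {"cluster": "dev", "namespace": "default"}},
    {"name": "kube_pod_info", "labels": {"cluster": "dev", "namespace": "monitoring"}},
    {"name": "kube_pod_info", "labels": {"cluster": "prod", "namespace": "default"}},
]

clusters = label_values(SERIES, "kube_pod_info", "cluster")
dev_namespaces = label_values(
    SERIES, "kube_pod_info", "namespace", {"cluster": "dev"}
)
print(clusters)
print(dev_namespaces)
```

Chaining the variables this way means the namespace drop-down only lists namespaces that exist in the currently selected cluster.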

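Alert configuration
Installing the Prometheus Operator adds the PrometheusRule CRD, which is used to register alerting rules.
- Check ruleSelector
  - The labels listed under ruleSelector.matchLabels must be added to the PrometheusRule
  - In this environment, add app: kube-prometheus-stack and release: prometheus
  - Without these labels, Prometheus does not pick up the PrometheusRule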
$ kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus -o yaml
..........
  ruleNamespaceSelector: {}
  ruleSelector:
    matchLabels:
      app: kube-prometheus-stack
      release: prometheus
..........
- Create the manifest (nginx-http-request-rule.yaml)
Rule to create: raise an alert when the HTTP request count exceeds 20 within 1 minute.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: kube-prometheus-stack # label found in ruleSelector
    release: prometheus # label found in ruleSelector
  name: http-request-rule.rules
  namespace: monitoring
spec:
  groups:
  - name: nginx-http-request.rules
    rules:
    - alert: NginxTooManyRequest
      annotations:
        message: HTTP Request to {{ $labels.namespace }}/{{ $labels.pod }} is over 20 / 1 minute.
      expr: sum(increase(nginx_http_requests_total{job="nginx-svc", namespace=~".*"}[1m])) by (namespace, pod) > 20
      for: 1m
      labels:
        severity: critical
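The rule combines two ideas: `increase(...[1m])` measures how much the request counter grew inside a one-minute window, and `for: 1m` requires the condition to stay true for a full minute before the alert moves from pending to firing. The sketch below models both in simplified form. Prometheus's own `increase()` also extrapolates to the window edges, which is omitted here, and the 30-second evaluation spacing in the example is an assumption.

```python
def increase(samples: list) -> float:
    """Counter growth across samples; a drop is treated as a counter reset."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def alert_states(evaluations: list, threshold: float = 20.0,
                 hold_seconds: float = 60.0) -> list:
    """evaluations: (timestamp in seconds, value) pairs in time order."""
    states = []
    active_since = None
    for ts, value in evaluations:
        if value > threshold:
            if active_since is None:
                active_since = ts  # condition became true at this evaluation
            held = ts - active_since
            states.append("firing" if held >= hold_seconds else "pending")
        else:
            active_since = None
            states.append("inactive")
    return states


print(increase([100, 110, 125]))
print(alert_states([(0, 5), (30, 25), (60, 30), (90, 40), (120, 10)]))
```

In this model a short burst that exceeds 20 requests for only one or two evaluations stays in the pending state and never notifies anyone.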
- Create the PrometheusRule
$ kubectl apply -f nginx-http-request-rule.yaml
## Verify ##
$ kubectl get prometheusrule -n monitoring
NAME AGE
..........
http-request-rule.rules 27m
..........

Forwarding alerts externally
Forward the alert to Slack.

global:
  resolve_timeout: 1m
  slack_api_url: 'https://hooks.slack.com/services/xxxxx..............................'
route:
  receiver: 'slack_notifications'
  group_interval: 5m
  group_wait: 30s
  repeat_interval: 12h
  routes:
  - match:
      alertname: 'NginxTooManyRequest'
    receiver: 'nginx_request_count_error'
receivers:
- name: 'slack_notifications'
- name: 'nginx_request_count_error'
  slack_configs:
  - channel: '#it-test'
    send_resolved: true
    icon_url: https://avatars3.githubusercontent.com/u/3380462
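In this configuration the top-level route names a default receiver, and the child route overrides it for alerts whose `alertname` label is `NginxTooManyRequest`. The sketch below walks that tree for a given set of alert labels. It handles exact `match` entries only and ignores options such as `continue` and regex matching; `pick_receiver` and the `SomeOtherAlert` name are made up for this illustration.

```python
# Routing tree mirroring the configuration in this article.
ROUTE = {
    "receiver": "slack_notifications",
    "routes": [
        {
            "match": {"alertname": "NginxTooManyRequest"},
            "receiver": "nginx_request_count_error",
        },
    ],
}


def pick_receiver(route: dict, alert_labels: dict) -> str:
    """Return the receiver of the first matching child route, else the parent's."""
    for child in route.get("routes", []):
        match = child.get("match", {})
        if all(alert_labels.get(k) == v for k, v in match.items()):
            # A child without its own receiver inherits the parent's.
            inherited = dict(child)
            inherited.setdefault("receiver", route["receiver"])
            return pick_receiver(inherited, alert_labels)
    return route["receiver"]


print(pick_receiver(ROUTE, {"alertname": "NginxTooManyRequest"}))
print(pick_receiver(ROUTE, {"alertname": "SomeOtherAlert"}))
```

Because `slack_notifications` has no `slack_configs`, alerts that fall through to the default receiver are not posted anywhere; only the Nginx alert reaches the `#it-test` channel.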
- Store the configuration file as a Secret
  - Obtain the base64 value of the configuration above (saved as alertmanager.yaml)
$ kubectl create secret generic test --from-file=./alertmanager.yaml --dry-run=client -o yaml
apiVersion: v1
data:
  alertmanager.yaml: {base64 value}
kind: Secret
metadata:
  creationTimestamp: null
  name: test
- Create the manifest that updates the Secret (alertmanager-slack.yaml)
  - Use the base64 value obtained above
apiVersion: v1
kind: Secret
metadata:
  labels:
    release: prometheus
  name: alertmanager-prometheus-kube-prometheus-alertmanager
  namespace: monitoring
type: Opaque
data:
  alertmanager.yaml: {base64 value}
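The value under `data` in a Secret is simply the base64 encoding of the file contents, so it can also be produced or checked without kubectl. A minimal sketch follows; `secret_data_value` is a helper name invented for this example, and the short config string stands in for the full alertmanager.yaml.

```python
import base64


def secret_data_value(config_text: str) -> str:
    """Encode file contents the way a Secret's `data` field expects."""
    return base64.b64encode(config_text.encode("utf-8")).decode("ascii")


# Stand-in for the contents of alertmanager.yaml.
config = "route:\n  receiver: 'slack_notifications'\n"

encoded = secret_data_value(config)
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == config)
```

Decoding the value again is a quick way to confirm that the Secret holds the intended configuration before applying it. Note that base64 is an encoding, not encryption, so the Slack webhook URL remains readable to anyone who can read the Secret.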
- Update the Secret
## Check the existing Secret ##
$ kubectl get secret -n monitoring
NAME TYPE DATA AGE
..........
alertmanager-prometheus-kube-prometheus-alertmanager Opaque 1 3d10h
..........
## Update ##
$ kubectl apply -f alertmanager-slack.yaml
