Explaining the contents of the prometheus.yml file

2 years ago

韵, 科


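Introduction

Articles that cover evaluating Prometheus and taking the first steps with it are easy to find online, but few explain it file by file.
As the title says, this article explains the contents of prometheus.yml, the file you need to configure first when setting up Prometheus.

If you find any mistakes, please point them out. Thank you.

Note that this article applies to Prometheus version 2.13.1 for Linux.

Contents of the prometheus.yml file

First, let's look at the contents of prometheus.yml.
Each setting is then explained in turn, by line number.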


 1  # my global config
 2  global:
 3    scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
 4    evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
 5    # scrape_timeout is set to the global default (10s).
 6
 7  # Alertmanager configuration
 8  alerting:
 9    alertmanagers:
10    - static_configs:
11      - targets:
12        # - alertmanager:9093
13
14  # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
15  rule_files:
16    # - "first_rules.yml"
17    # - "second_rules.yml"
18
19  # A scrape configuration containing exactly one endpoint to scrape:
20  # Here it's Prometheus itself.
21  scrape_configs:
22    # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
23    - job_name: 'prometheus'
24
25      # metrics_path defaults to '/metrics'
26      # scheme defaults to 'http'.
27
28      static_configs:
29      - targets: ['localhost:9090']

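Global settings (lines 1-5)

 1  # my global config
<<Translation>> Global settings: configure things such as the metrics scrape interval.

 2  global:
 3    scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
<<Translation>> Scrape interval: sets how often data is scraped (collected). The default is every 1 minute.

 4    evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
<<Translation>> Evaluation interval (literal translation): how often Prometheus evaluates its recording and alerting rules, which is also when alert states are updated. It is not a health check of the targets.

 5    # scrape_timeout is set to the global default (10s).
<<Author's addition (quoted from the official documentation)>> [ scrape_timeout: <duration> | default = 10s ] # Sets the scrape timeout. The default is 10 seconds.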
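As a rough illustration of how the three global values relate, the sketch below sets them explicitly and then overrides two of them for a single job. The job name `slow_exporter`, its target address, and the 60s/30s values are assumptions made up for this example; they are not part of the default file.

```yaml
global:
  scrape_interval: 15s       # how often targets are scraped
  scrape_timeout: 10s        # a scrape is abandoned after this long; must not exceed scrape_interval
  evaluation_interval: 15s   # how often recording and alerting rules are evaluated

scrape_configs:
  # Hypothetical job: values set on a job take precedence over the global ones.
  - job_name: 'slow_exporter'
    scrape_interval: 60s
    scrape_timeout: 30s
    static_configs:
    - targets: ['localhost:9999']   # placeholder address, replace with a real target
```

Jobs that do not set their own interval or timeout simply inherit the values from the global block.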

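Alertmanager settings (lines 7-12)

 7  # Alertmanager configuration
    # Alertmanager settings.
<<Explanation>> Specifies the Alertmanager instances that the Prometheus server sends (pushes) alerts to.
                Parameters that control how Prometheus communicates with Alertmanager can also be set here.
                Alertmanagers can be configured statically with static_configs, or discovered dynamically
                using one of the supported service discovery mechanisms.


 8  alerting:
 9    alertmanagers:
10    - static_configs:
11      - targets:
12        # - alertmanager:9093
<<Explanation>> To push alerts to an Alertmanager, uncomment the alertmanager:9093 line above (remove the leading #). 9093 is Alertmanager's default port; adjust the host name to match your environment.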
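For reference, a minimal sketch of the alerting block with a target enabled might look like the following. It assumes an Alertmanager running on the same host on its default port 9093; the `scheme` and `timeout` lines only restate the documented defaults and could be left out.

```yaml
alerting:
  alertmanagers:
  - scheme: http          # default; use https if Alertmanager is served over TLS
    timeout: 10s          # default per-request timeout when pushing alerts
    static_configs:
    - targets:
      - 'localhost:9093'  # assumed address: Alertmanager on the same host
```

With no target listed, as in the default file, Prometheus still evaluates alerting rules but has nowhere to send the resulting alerts.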

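Alert rule settings (lines 14-17)

14  # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
<<Translation>> Load the rules once, then evaluate them periodically according to the evaluation_interval set in global.
<<Explanation>> Specifies the files that contain the rules evaluated against the metrics Prometheus has collected.
                The rules in these files are evaluated at the evaluation_interval set earlier.
                Every listed file is loaded; the order of the list does not give any rule more weight.
                Within a rule group, rules are evaluated sequentially in the order they are written.
15  rule_files:
16    # - "first_rules.yml"
17    # - "second_rules.yml"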
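To make the rule file side concrete, here is a small sketch of what a file such as first_rules.yml could contain. The group name, the recorded series name, the 5m duration, and the label and annotation values are all illustrative assumptions; only the overall groups/rules layout comes from the Prometheus 2.x rule file format.

```yaml
# Contents of a hypothetical first_rules.yml.
# prometheus.yml would load it by uncommenting:  - "first_rules.yml"
groups:
- name: example
  rules:
  # Recording rule: evaluated first, because rules in a group run in order.
  - record: job:up:sum
    expr: sum by (job) (up)

  # Alerting rule: fires once a target has been down for 5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
```

The promtool binary shipped with Prometheus can check such a file with `promtool check rules first_rules.yml` before the server is reloaded.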

Monitoring target settings (lines 19-29)

19  # A scrape configuration containing exactly one endpoint to scrape:
20  # Here it's Prometheus itself.

21  scrape_configs:
22    # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
23    - job_name: 'prometheus'
24
25      # metrics_path defaults to '/metrics'
26      # scheme defaults to 'http'.
27
28      static_configs:
29      - targets: ['localhost:9090']

<<Explanation>> targets: sets the host name or IP address, plus the port number, of each target Prometheus collects data from.
                The main targets are exporters.
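As an example of pointing Prometheus at an exporter, the sketch below adds a second job next to the default one. It assumes a node_exporter listening on its default port 9100 on the same host; the job name `node` and the `env` label are arbitrary choices for illustration.

```yaml
scrape_configs:
  # Default job: Prometheus scrapes its own metrics.
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  # Assumed extra job: a node_exporter on the same machine.
  - job_name: 'node'
    metrics_path: /metrics    # default, shown for clarity
    scheme: http              # default, shown for clarity
    static_configs:
    - targets: ['localhost:9100']
      labels:
        env: 'dev'            # extra label attached to every series from these targets
```

Every series scraped by the second job carries `job="node"`, which is what the comment on line 22 of the default file describes.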

The following references were helpful.

・"Prometheus 入門" (O'Reilly Japan)

・Prometheus official documentation: Configuration

・彼得罗斯: Understanding the causes of alert delays