Installing Prometheus Node Exporter on a Raspberry Pi Zero W fills syslog with recurring errors: how to fix it
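TL;DR
By default, the Raspberry Pi Zero W does not provide the kernel interfaces that two of the collectors read, so collector errors are logged continuously. Starting node-exporter with --no-collector.pressure --no-collector.rapl stops the errors.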
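Background
I previously installed Prometheus node-exporter on a Raspberry Pi Zero W at home and use it for resource monitoring. When I looked at the syslog in Grafana Loki, the following errors showed up at regular intervals.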
Jan 17 22:47:47 raspberry-zw prometheus-node-exporter[27206]: level=error ts=2023-01-17T13:47:47.285Z caller=collector.go:161 msg="collector failed" name=rapl duration_seconds=0.000263998 err="failed to retrieve rapl stats: no sysfs powercap / RAPL power metrics files found"
Jan 17 22:47:47 raspberry-zw prometheus-node-exporter[27206]: level=error ts=2023-01-17T13:47:47.359Z caller=collector.go:161 msg="collector failed" name=pressure duration_seconds=0.000376998 err="failed to retrieve pressure stats: psi_stats: unavailable for cpu"
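To see at a glance which collectors are failing, the name= field can be pulled out of log lines like these. The following is a minimal sketch, not something shipped with node_exporter: failed_collectors is a hypothetical helper name, and the demo input is one of the log lines from this article.

```shell
#!/bin/sh
# failed_collectors: hypothetical helper (not part of node_exporter).
# Reads log lines on stdin and prints the unique collector names that
# appear in 'msg="collector failed"' entries.
failed_collectors() {
    grep 'msg="collector failed"' \
        | sed -n 's/.* name=\([^ ]*\) .*/\1/p' \
        | sort -u
}

# Demo with one log line from this article; prints: rapl
failed_collectors <<'EOF'
Jan 17 22:47:47 raspberry-zw prometheus-node-exporter[27206]: level=error ts=2023-01-17T13:47:47.285Z caller=collector.go:161 msg="collector failed" name=rapl duration_seconds=0.000263998 err="failed to retrieve rapl stats: no sysfs powercap / RAPL power metrics files found"
EOF
```

On the Pi itself the function could be fed from the journal, for example with journalctl -u prometheus-node-exporter | failed_collectors, assuming the unit logs to journald.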
Environment
The host is a Raspberry Pi Zero W running Raspbian 11 (bullseye).
PRETTY_NAME="Raspbian GNU/Linux 11 (bullseye)"
NAME="Raspbian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
This `node-exporter` was installed with apt.
A `rapl`-related bug was fixed in version 1.1.2, but it is unrelated to the problem described here.
node_exporter, version 1.1.2+ds (branch: debian/sid, revision: 1.1.2+ds-2.1)
build user: team+pkg-go@tracker.debian.org
build date: 20210725-21:22:06
go version: go1.15.9
platform: linux/arm
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===================================-=======================-============-====================================================================
ii prometheus-node-exporter 1.1.2+ds-2.1 armhf Prometheus exporter for machine metrics
ii prometheus-node-exporter-collectors 0+git20210115.7d89f19-1 all Supplemental textfile collector scripts for Prometheus node_exporter
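What the official documentation says
The relevant passage comes from the official documentation: https://github.com/prometheus/node_exporter/blob/master/README.md
It reads as follows.
In short, the default collectors are enabled and run automatically.
To disable one, add --no-collector.<name> to the startup arguments.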
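Collectors
There is varying support for collectors on each operating system. The tables below list all existing collectors and the supported systems.
Collectors are enabled by providing a --collector.<name> flag. Collectors that are enabled by default can be disabled by providing a --no-collector.<name> flag. To enable only some specific collector(s), use --collector.disable-defaults --collector.<name> ....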
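As an illustration of the "enable only some collectors" form, here is a small sketch that assembles such a flag string. only_collectors is a hypothetical helper name, and cpu and meminfo are used purely as example collector names; this article itself only needs the --no-collector form.

```shell
#!/bin/sh
# only_collectors: hypothetical helper. Builds the flag string that turns
# off every default collector and then enables just the ones named.
only_collectors() {
    flags="--collector.disable-defaults"
    for name in "$@"; do
        flags="$flags --collector.$name"
    done
    echo "$flags"
}

# prints: --collector.disable-defaults --collector.cpu --collector.meminfo
only_collectors cpu meminfo
```

The opposite approach, keeping the defaults and switching off individual collectors with --no-collector flags, is the one used for the fix in this article because it changes as little as possible.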
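The two collectors that logged errors here are the following.
Name      Description
pressure  Exposes pressure stall statistics from /proc/pressure/.
rapl      Exposes various statistics from /sys/class/powercap.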
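Cause
The PSI and RAPL interfaces that these collectors read are simply not present on the Raspberry Pi Zero W.
(On a Raspberry Pi 4 the corresponding directories exist, so running node exporter with its defaults does not produce these errors.)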
ls: cannot access '/proc/pressure': No such file or directory
ls: cannot access '/sys/class/powercap': No such file or directory
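The same existence check can be scripted so that the flags follow from what the host actually provides. This is only a sketch under assumptions: missing_collector_flags is a hypothetical helper name, the path-to-collector mapping is taken from the README descriptions quoted earlier, and a directory merely existing is a rough proxy (it does not guarantee the collector will succeed).

```shell
#!/bin/sh
# missing_collector_flags: hypothetical helper. Prints a --no-collector flag
# for each collector whose backing directory is absent.
# $1 is an optional root prefix, so the check can be tried against a fake tree.
missing_collector_flags() {
    root="${1:-}"
    args=""
    [ -d "$root/proc/pressure" ] || args="$args --no-collector.pressure"
    [ -d "$root/sys/class/powercap" ] || args="$args --no-collector.rapl"
    # Drop the leading space, if any.
    echo "${args# }"
}

# Check the machine this script runs on; output depends on the host.
missing_collector_flags
```

On the Pi Zero W described here, both directories are missing, so both flags would be printed.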
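Fix
Following the official documentation, set --no-collector.<name> in the startup arguments.
In this setup the systemd unit file /lib/systemd/system/prometheus-node-exporter.service points to an environment file that supplies the command-line arguments,
so the change goes into that environment file.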
[Unit]
Description=Prometheus exporter for machine metrics
Documentation=https://github.com/prometheus/node_exporter
[Service]
Restart=on-failure
User=prometheus
EnvironmentFile=/etc/default/prometheus-node-exporter
ExecStart=/usr/bin/prometheus-node-exporter $ARGS
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no
[Install]
WantedBy=multi-user.target
Before:
# Set the command-line arguments to pass to the server.
# Due to shell scaping, to pass backslashes for regexes, you need to double
# them (\\d for \d). If running under systemd, you need to double them again
# (\\\\d to mean \d), and escape newlines too.
ARGS=""
# prometheus-node-exporter supports the following options:
#
# --collector.arp
# Enable the arp collector (default: enabled).
After:
# Set the command-line arguments to pass to the server.
# Due to shell scaping, to pass backslashes for regexes, you need to double
# them (\\d for \d). If running under systemd, you need to double them again
# (\\\\d to mean \d), and escape newlines too.
ARGS="--no-collector.pressure --no-collector.rapl"
# prometheus-node-exporter supports the following options:
#
# --collector.arp
# Enable the arp collector (default: enabled).
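For anyone who prefers not to edit the file by hand, the ARGS line could also be rewritten from a script. The sketch below is an assumption-laden illustration, not what the author did: set_exporter_args is a hypothetical helper name, and the demo runs against a scratch file instead of the real /etc/default/prometheus-node-exporter.

```shell
#!/bin/sh
# set_exporter_args: hypothetical helper, not shipped with the package.
# Rewrites the ARGS= line of an environment file and keeps a .bak copy.
# The new value must not contain '|', '&' or backslashes (sed would
# interpret them).
set_exporter_args() {
    file="$1"
    new_args="$2"
    cp "$file" "$file.bak" || return 1
    # Read from the backup and write the edited result over the original.
    sed "s|^ARGS=.*|ARGS=\"$new_args\"|" "$file.bak" > "$file"
}

# Demo on a scratch file rather than the real configuration.
demo=$(mktemp)
printf '%s\n' '# comment' 'ARGS=""' > "$demo"
set_exporter_args "$demo" "--no-collector.pressure --no-collector.rapl"
cat "$demo"
```

Pointing the function at the real file would require root privileges, and the service still has to be restarted afterwards.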
Applying the change
Run the following to apply the change.
$ sudo vim /etc/default/prometheus-node-exporter
$ sudo systemctl restart prometheus-node-exporter
$ sudo systemctl status prometheus-node-exporter
● prometheus-node-exporter.service - Prometheus exporter for machine metrics
Loaded: loaded (/lib/systemd/system/prometheus-node-exporter.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-01-17 23:13:22 JST; 55min ago
Docs: https://github.com/prometheus/node_exporter
Main PID: 27402 (prometheus-node)
Tasks: 7 (limit: 415)
CPU: 2min 21.207s
CGroup: /system.slice/prometheus-node-exporter.service
└─27402 /usr/bin/prometheus-node-exporter --no-collector.pressure --no-collector.rapl
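Besides watching syslog, the exporter's own metrics can confirm that no collector is failing any more, since node_exporter reports a node_scrape_collector_success value per collector. The following is a minimal sketch: failed_scrapes is a hypothetical helper name, and the demo input is hand-written sample text, not output captured from the Pi.

```shell
#!/bin/sh
# failed_scrapes: hypothetical helper. Reads Prometheus text-format metrics
# on stdin and prints collectors whose node_scrape_collector_success is 0.
failed_scrapes() {
    sed -n 's/^node_scrape_collector_success{collector="\([^"]*\)"} 0$/\1/p'
}

# Demo with hand-written sample input; prints: rapl
failed_scrapes <<'EOF'
node_scrape_collector_success{collector="cpu"} 1
node_scrape_collector_success{collector="rapl"} 0
EOF
```

Against a live exporter this could be run as curl -s http://localhost:9100/metrics | failed_scrapes, assuming curl is installed and the exporter listens on its default port 9100. An empty result would mean no enabled collector is currently failing.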
That's all.