Trying Out Prometheus, Free System Monitoring Software for Linux

清, 扬

Test environment

[testuser@testhost ~]$ uname -a
Linux testhost 4.18.0-448.el8.x86_64 #1 SMP Wed Jan 18 15:02:46 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
[testuser@testhost ~]$ cat /etc/redhat-release
CentOS Stream release 8

Prometheus is version 2.46.0 (linux-amd64 build) and node_exporter is version 1.6.1 (linux-amd64 build).

0. Overview

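Are you familiar with the term "cloud native"?
Cloud native is an approach in which software is designed from the start to run in the cloud, a step beyond simply making an application run on the cloud.
One of the organizations promoting cloud native is the Cloud Native Computing Foundation (CNCF), and Prometheus is one of the projects it manages.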

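Prometheus is open-source system monitoring software.
Features such as dynamic management of the servers being monitored make it well suited to cloud-native environments.
It is also very simple to set up: place it on a server, run a single command, and it starts right away.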

Prometheus

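As a trial, this time we install it on a single Linux server rather than in a cloud environment.
As the most basic configuration, we install the core Prometheus server and an exporter, the component that collects information, with the goal of displaying CPU usage on screen.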

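1. Downloading the software

First, download the software from the following site.

Download

I used Prometheus version 2.46.0.

Exporters differ according to the information they collect; this time we use node_exporter, which is suited to collecting machine resource metrics. Its version is 1.6.1.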

2. Installing Prometheus

Now let's begin the installation.
First, place the .tar.gz archive of the Prometheus server on the machine and extract it.

[testuser@testhost tmp]$ pwd
/tmp
[testuser@testhost tmp]$ ls -l prometheus-*.tar.gz
-rw-rw-r--. 1 testuser testuser 94876162  8月 15 12:42 prometheus-2.46.0.linux-amd64.tar.gz
[testuser@testhost tmp]$ tar xzf prometheus-2.46.0.linux-amd64.tar.gz
[testuser@testhost tmp]$ ls -ld prometheus-*
drwxr-xr-x. 4 testuser testuser      132  7月 25 22:11 prometheus-2.46.0.linux-amd64
-rw-rw-r--. 1 testuser testuser 94876162  8月 15 12:42 prometheus-2.46.0.linux-amd64.tar.gz
[testuser@testhost tmp]$

The extracted contents are as follows.

[testuser@testhost tmp]$ cd prometheus-2.46.0.linux-amd64/
[testuser@testhost prometheus-2.46.0.linux-amd64]$ pwd
/tmp/prometheus-2.46.0.linux-amd64
[testuser@testhost prometheus-2.46.0.linux-amd64]$ ls -l
合計 236276
-rw-r--r--. 1 testuser testuser     11357  7月 25 22:06 LICENSE
-rw-r--r--. 1 testuser testuser      3773  7月 25 22:06 NOTICE
drwxr-xr-x. 2 testuser testuser        38  7月 25 22:06 console_libraries
drwxr-xr-x. 2 testuser testuser       173  7月 25 22:06 consoles
-rwxr-xr-x. 1 testuser testuser 123611355  7月 25 21:34 prometheus
-rw-r--r--. 1 testuser testuser       934  7月 25 22:06 prometheus.yml
-rwxr-xr-x. 1 testuser testuser 118310964  7月 25 21:36 promtool

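Normally you would place these files in proper directories and edit the configuration file, but since this is only a trial, we simply run everything from the /tmp directory.

The command to run is shown below.
Prometheus keeps running until you terminate the process with Ctrl+C or by other means.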

[testuser@testhost prometheus-2.46.0.linux-amd64]$ ./prometheus --config.file=./prometheus.yml
ts=2023-08-15T04:10:42.048Z caller=main.go:541 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2023-08-15T04:10:42.049Z caller=main.go:585 level=info msg="Starting Prometheus Server" mode=server version="(version=2.46.0, branch=HEAD, revision=cbb69e51423565ec40f46e74f4ff2dbb3b7fb4f0)"
(omitted)
ts=2023-08-15T04:10:42.190Z caller=main.go:1011 level=info msg="Server is ready to receive web requests."
ts=2023-08-15T04:10:42.190Z caller=manager.go:1009 level=info component="rule manager" msg="Starting rule manager..."

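With Prometheus running, let's access it right away.
Open a browser and connect over HTTP to the server where Prometheus is installed.
The port number is 9090.

http://(server IP address):9090

If the Prometheus web UI appears, the installation succeeded.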
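
When no browser is at hand, the same check could be scripted. The following is a minimal sketch in Python, assuming the server's readiness endpoint /-/ready is reachable on port 9090; the function names ready_url and is_ready are made up for this illustration and are not part of Prometheus.

```python
from urllib.error import URLError
from urllib.request import urlopen


def ready_url(host: str, port: int = 9090) -> str:
    """Build the URL of the Prometheus readiness endpoint."""
    return f"http://{host}:{port}/-/ready"


def is_ready(host: str, port: int = 9090, timeout: float = 3.0) -> bool:
    """Return True if the server answers the readiness check with HTTP 200."""
    try:
        with urlopen(ready_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, a timeout, or a non-2xx answer all count as "not ready".
        return False
```

Calling is_ready("localhost") on the server itself, or is_ready with the server's IP address from another machine, should return True once the log line "Server is ready to receive web requests." has appeared.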

3. Installing the exporter

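However, this alone does not collect any information about the machine.
Collecting information requires a component called an exporter.
Transfer the node_exporter archive downloaded earlier to the server and extract it.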

[testuser@testhost tmp]$ pwd
/tmp
[testuser@testhost tmp]$ ls -l node_exporter-*.tar.gz
-rw-rw-r--. 1 testuser testuser 10368103  8月 15 12:44 node_exporter-1.6.1.linux-amd64.tar.gz
[testuser@testhost tmp]$ tar xzf node_exporter-1.6.1.linux-amd64.tar.gz
[testuser@testhost tmp]$ ls -ld node_exporter-*
drwxr-xr-x. 2 testuser testuser       56  7月 17 21:16 node_exporter-1.6.1.linux-amd64
-rw-rw-r--. 1 testuser testuser 10368103  8月 15 12:44 node_exporter-1.6.1.linux-amd64.tar.gz

The extracted contents are as follows.

[testuser@testhost ~]$ cd /tmp/node_exporter-1.6.1.linux-amd64/
[testuser@testhost node_exporter-1.6.1.linux-amd64]$ pwd
/tmp/node_exporter-1.6.1.linux-amd64
[testuser@testhost node_exporter-1.6.1.linux-amd64]$ ls -l
合計 19572
-rw-r--r--. 1 testuser testuser    11357  7月 17 21:15 LICENSE
-rw-r--r--. 1 testuser testuser      463  7月 17 21:15 NOTICE
-rwxr-xr-x. 1 testuser testuser 20025119  7月 17 21:11 node_exporter

Let's start node_exporter right away.

[testuser@testhost node_exporter-1.6.1.linux-amd64]$ ./node_exporter
ts=2023-08-15T04:18:45.379Z caller=node_exporter.go:180 level=info msg="Starting node_exporter" version="(version=1.6.1, branch=HEAD, revision=4a1b77600c1873a8233f3ffb55afcedbb63b8d84)"
ts=2023-08-15T04:18:45.379Z caller=node_exporter.go:181 level=info msg="Build context" build_context="(go=go1.20.6, platform=linux/amd64, user=root@586879db11e5, date=20230717-12:10:52, tags=netgo osusergo static_build)"
(omitted)
ts=2023-08-15T04:18:45.389Z caller=tls_config.go:274 level=info msg="Listening on" address=[::]:9100
ts=2023-08-15T04:18:45.389Z caller=tls_config.go:277 level=info msg="TLS is disabled." http2=false address=[::]:9100
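
From this point node_exporter serves its metrics as plain text over HTTP on port 9100, under the path /metrics. To give a feel for that text format, here is a rough Python sketch that parses a few sample lines. The parser is a simplification (it ignores timestamps and does not unescape label values), and the sample values are invented for illustration, not taken from the test host.

```python
import re

# One sample line: metric name, optional {labels}, then a numeric value.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative input only; these values are made up.
EXAMPLE = """\
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1200.5
node_cpu_seconds_total{cpu="0",mode="user"} 30.25
node_load1 0.5
"""

for name, labels, value in parse_metrics(EXAMPLE):
    print(name, labels, value)
```

Each line is one time series sample: a metric name, a set of labels such as cpu and mode, and the current value. Prometheus reads this same text every time it scrapes the exporter.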

However, this alone does not yet connect Prometheus and the exporter.

4. Connecting Prometheus and the exporter

I did not explain it earlier, but look at the command line used to start Prometheus:

$ ./prometheus --config.file=./prometheus.yml

The ./prometheus.yml passed here is Prometheus's main configuration file.
It is written in YAML, and its actual contents are as follows.

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

Look at the last part of this configuration file:

    static_configs:
      - targets: ["localhost:9090"]

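Changing the value of targets to the appropriate host name and port number connects the exporter to Prometheus.
Because the exporter and Prometheus are installed on the same server this time, localhost stays as it is.
The port number of node_exporter is 9100.
The entry therefore becomes: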

    static_configs:
      - targets: ["localhost:9100"]
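
For repeated setups, this one-line change could also be scripted instead of made by hand in an editor. The sketch below is one possible approach in Python, using plain string replacement rather than a YAML parser. The helper names retarget and update_config are hypothetical, and the code assumes the target line is written exactly as in the stock prometheus.yml.

```python
import shutil
from pathlib import Path


def retarget(config_text: str, old: str, new: str) -> str:
    """Replace exactly one scrape target, refusing ambiguous edits."""
    needle = f'- targets: ["{old}"]'
    count = config_text.count(needle)
    if count != 1:
        raise ValueError(f"expected exactly one {needle!r}, found {count}")
    return config_text.replace(needle, f'- targets: ["{new}"]')


def update_config(path: Path, old: str, new: str) -> Path:
    """Back up the file next to itself, then rewrite the target in place."""
    backup = path.with_name(path.name + ".bk")
    shutil.copy2(path, backup)  # copy2 also preserves timestamps and mode
    path.write_text(retarget(path.read_text(), old, new))
    return backup
```

For example, update_config(Path("prometheus.yml"), "localhost:9090", "localhost:9100") would leave the original file as prometheus.yml.bk. Refusing to edit unless exactly one match is found is a deliberate choice, so that a config with several jobs is never changed silently.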

Now stop the Prometheus process with Ctrl+C and edit the configuration file.

[testuser@testhost prometheus-2.46.0.linux-amd64]$ cp -p prometheus.yml prometheus.yml.bk
[testuser@testhost prometheus-2.46.0.linux-amd64]$ ls -l prometheus.yml*
-rw-r--r--. 1 testuser testuser 934  7月 25 22:06 prometheus.yml
-rw-r--r--. 1 testuser testuser 934  7月 25 22:06 prometheus.yml.bk
[testuser@testhost prometheus-2.46.0.linux-amd64]$ vi prometheus.yml
[testuser@testhost prometheus-2.46.0.linux-amd64]$ diff prometheus.yml prometheus.yml.bk
29c29
<       - targets: ["localhost:9100"]
---
>       - targets: ["localhost:9090"]

After saving the change, restart Prometheus.

[testuser@testhost prometheus-2.46.0.linux-amd64]$ ./prometheus --config.file=./prometheus.yml
(omitted)

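As before, connect to Prometheus in the browser and confirm that the exporter is connected correctly.
Select "Status" → "Targets".

The "Targets" page lists localhost:9100, which shows that the exporter started earlier has been detected.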
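
The same information is available from the Prometheus HTTP API under /api/v1/targets, which returns JSON. As a rough illustration, the Python sketch below pulls the scrape URL and health of each active target out of such a response. The sample payload is hand-written and heavily trimmed, and summarize_targets is a name chosen for this example.

```python
import json


def summarize_targets(payload: dict) -> list:
    """Return (scrape URL, health) for each active target in an API response."""
    if payload.get("status") != "success":
        raise ValueError("targets API did not report success")
    return [(t["scrapeUrl"], t["health"]) for t in payload["data"]["activeTargets"]]


# Hand-written sample shaped like a /api/v1/targets response (heavily trimmed).
SAMPLE = json.loads("""
{"status": "success",
 "data": {"activeTargets": [
   {"scrapeUrl": "http://localhost:9100/metrics", "health": "up",
    "labels": {"instance": "localhost:9100", "job": "prometheus"}}
 ]}}
""")

for url, health in summarize_targets(SAMPLE):
    print(url, health)
```

Note that the job label is still "prometheus", because only the target was changed in the configuration file and the job_name was left as it was.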

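Next, let's display CPU usage. Select "Graph" to return to the initial screen.

Click the button with the round icon to the right of the search form.

A list of the currently available metrics appears; this time select node_cpu_seconds_total.

Confirm that node_cpu_seconds_total has been entered in the search form, then press the "Execute" button.

Select the "Graph" tab, and the CPU information is displayed as a graph.

The graph was displayed as expected.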
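
One point worth knowing when reading this graph: node_cpu_seconds_total is a cumulative counter of the seconds each CPU has spent in each mode, so the raw lines only ever rise. A utilization figure comes from how fast the counter grows. The arithmetic can be sketched in Python as follows, assuming two readings of the idle counter for one CPU; the function name and the numbers are illustrative only.

```python
def busy_fraction(idle_start: float, idle_end: float,
                  t_start: float, t_end: float) -> float:
    """Estimate the busy share of one CPU between two idle-counter readings."""
    elapsed = t_end - t_start
    if elapsed <= 0:
        raise ValueError("the second reading must be later than the first")
    idle_delta = idle_end - idle_start
    if idle_delta < 0:
        # A counter that went backwards usually means the exporter restarted.
        raise ValueError("counter reset between readings")
    # Idle seconds per wall-clock second is the idle share; the rest is busy.
    return 1.0 - idle_delta / elapsed


# Made-up readings: the idle counter grew by 45 s over a 60 s window.
print(busy_fraction(100.0, 145.0, 0.0, 60.0))
```

In PromQL the same idea is usually expressed with the rate() function, for example something along the lines of 1 - rate(node_cpu_seconds_total{mode="idle"}[5m]), although this article stays with the raw counter.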

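5. Queries

Finally, let's modify the query slightly to narrow down what is displayed.
The syntax of Prometheus queries is described on the following page.

Querying

Based on that page, I wrote the following query.
It restricts the display to a single instance and a single CPU.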

node_cpu_seconds_total{instance="localhost:9100",cpu="0"}

After running this query, the graph shows only the series that match the filter.
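
The part in braces is a list of label matchers, and the same query can be sent to the HTTP API endpoint /api/v1/query instead of being typed into the web UI. The following Python sketch builds the selector string and the request URL. The helper names selector and query_url are invented for this example, and only exact-match (=) matchers are handled.

```python
from urllib.parse import urlencode


def _escape(value: str) -> str:
    """Escape backslashes and double quotes inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def selector(metric: str, **labels: str) -> str:
    """Build a selector with exact-match label matchers."""
    if not labels:
        return metric
    matchers = ",".join(f'{key}="{_escape(val)}"' for key, val in labels.items())
    return f"{metric}{{{matchers}}}"


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL; the query text is percent-encoded."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


q = selector("node_cpu_seconds_total", instance="localhost:9100", cpu="0")
print(q)
print(query_url("http://localhost:9090", q))
```

The base URL http://localhost:9090 assumes the script runs on the Prometheus server itself. Adding another keyword argument, such as mode="idle", would narrow the result further to a single series.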

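6. Closing notes

This time we limited ourselves to the most basic setup, but Prometheus has many other features.
For example, besides the node_exporter used here there are many other exporters, and using them lets you collect a wide variety of information.
Also, although not covered here, the Alertmanager component can send alerts to email or chat tools.

In addition, Prometheus can be integrated with external software.
In particular, the official site describes how to integrate with Grafana, which enables more advanced visualization through dashboards and the like.

Using Grafana

The documentation on the official site contains a lot of other useful information for anyone using Prometheus, so it is well worth a look.

Documentation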