Displaying GlusterFS Metrics with IBM Cloud Monitoring with Sysdig

2 years ago

韵, 科

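Goal

An exporter exists that converts various GlusterFS metrics into the Prometheus format, and it can be used to visualize those metrics. Here we display them with IBM Cloud Monitoring with Sysdig, which recently added support for the Prometheus format.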

Environment

We assume the environment described in a previous article has already been set up.

Procedure

Gluster Exporter

Building gluster-exporter

The exporter is written in Go, so I build it with the Go toolchain.

$ mkdir -p ${GOPATH}/src/github.com/gluster
$ cd ${GOPATH}/src/github.com/gluster
$ git clone https://github.com/gluster/gluster-prometheus.git
$ cd gluster-prometheus
$ ./scripts/install-reqs.sh
$ make
$ ./build/gluster-exporter --version
version   : v0.3-dev.93.git3ebaacc
go version: go1.15.14
go OS/arch: linux/amd64

Installing gluster-exporter

Copy the following two files to every GlusterFS server.

./build/gluster-exporter.service -> /etc/systemd/system/
./build/gluster-exporter -> /usr/local/sbin/

Creating the configuration file

Deploy the following file to every GlusterFS server.

[globals]
gluster-mgmt = "glusterd"
glusterd-dir = "/var/lib/glusterd"
gluster-binary-path = "gluster"
# If you want to connect to a remote gd1 host, set the variable gd1-remote-host
# However, using a remote host restrict the gluster cli to read-only commands
# The following collectors won't work in remote mode : gluster_volume_counts, gluster_volume_profile
#gd1-remote-host = "localhost"
gd2-rest-endpoint = "http://127.0.0.1:24007"
port = 9713
metrics-path = "/metrics"
log-dir = "/var/log"
log-file = "gluster-exporter.log"
log-level = "info"

[collectors.gluster_ps]
name = "gluster_ps"
sync-interval = 5
disabled = false

[collectors.gluster_brick]
name = "gluster_brick"
sync-interval = 5
disabled = false

Starting the service

Start Gluster Exporter on every GlusterFS server.

# systemctl daemon-reload
# systemctl enable --now gluster-exporter

Verifying operation

From a machine other than the GlusterFS servers, access port 9713 on each server.

$ curl -s vs-gluster-tok1:9713/metrics
$ curl -s vs-gluster-tok2:9713/metrics
$ curl -s vs-gluster-tok3:9713/metrics
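
The curl output is long, so a small script can summarize which metrics each server exposes. This is only an illustrative sketch: the helper names are hypothetical, the hostnames and port are the ones used in this article, and the `gluster_` prefix used for filtering is an assumption about how the exporter names its metrics.

```python
from urllib.request import urlopen

# Hostnames and port come from this article; adjust them for your environment.
HOSTS = ["vs-gluster-tok1", "vs-gluster-tok2", "vs-gluster-tok3"]
PORT = 9713


def metric_names(exposition_text):
    """Return the set of metric names in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is either 'name{label="v"} value' or 'name value'.
        names.add(line.split("{", 1)[0].split(None, 1)[0])
    return names


def gluster_metric_names(names):
    """Keep only names starting with gluster_ (assumed exporter prefix)."""
    return sorted(n for n in names if n.startswith("gluster_"))


def fetch_metric_names(host, port=PORT, timeout=5):
    """Fetch http://host:port/metrics and return the metric names found."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=timeout) as resp:
        return metric_names(resp.read().decode("utf-8"))
```

Calling `gluster_metric_names(fetch_metric_names(host))` for each entry in `HOSTS` gives a quick per-server list; an empty list or a connection error suggests the exporter is not running or the port is blocked.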

Sysdig

Installing the Sysdig Agent

Following the guide, install the agent on every GlusterFS server.

$ curl -sL https://ibm.biz/install-sysdig-agent | sudo bash -s -- -a ******** -c ingest.private.jp-tok.monitoring.cloud.ibm.com --collector_port 6443 --secure true -ac "sysdig_capture_enabled: false"

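Enabling Prometheus scraping

Enable Prometheus scraping in the Sysdig Agent configuration. The setting is identical on all three servers.

(Append the following)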
prometheus:
  enabled: true
  prom_service_discovery: true

The scrape configuration on each server targets only that server's own host, so that each Sysdig Agent corresponds 1:1 to one GlusterFS server.

scrape_configs:
- job_name: gluster
  static_configs:
  - targets:
    - vs-gluster-tok1:9713

scrape_configs:
- job_name: gluster
  static_configs:
  - targets:
    - vs-gluster-tok2:9713

scrape_configs:
- job_name: gluster
  static_configs:
  - targets:
    - vs-gluster-tok3:9713
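
Because the three files differ only in the target hostname, they can be generated rather than edited by hand. The sketch below is one possible way to do that; `render_scrape_config` is a hypothetical helper, and the default port and job name simply repeat the values shown above.

```python
def render_scrape_config(host, port=9713, job_name="gluster"):
    """Render a scrape configuration that targets a single host."""
    return (
        "scrape_configs:\n"
        f"- job_name: {job_name}\n"
        "  static_configs:\n"
        "  - targets:\n"
        f"    - {host}:{port}\n"
    )
```

For example, `print(render_scrape_config("vs-gluster-tok2"))` reproduces the second configuration above. Passing `socket.gethostname()` as the host would let the same script run unchanged on every server, assuming the returned name matches the one the exporter is reachable on.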

Restart the Sysdig Agent on each server.

# service dragent restart

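Checking the dashboard

The metrics appear on the Sysdig dashboard anywhere from a few minutes to about 10 minutes after scraping starts.

With Prometheus itself, you can check the results immediately after a scrape, but with Sysdig it takes quite a long time, which is rather unsatisfying.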