Building a Redundant Zabbix Server with Pacemaker/Corosync

3 years ago

逸, 科

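Contents
1. Background
2. Prerequisites
2.1. Logical configuration
2.2. Assumptions
3. Initial setup
4. Installing and configuring Zabbix
5. Setting up a replication DB user for MariaDB
6. Cluster setup
6.1. Stopping services and disabling autostart
6.2. Host authentication
6.3. Creating the cluster
6.4. Cluster property settings
7. Resource configuration
7.1. Moving SV02 to standby
7.2. MariaDB
7.3. Zabbix server
7.4. Apache
7.5. Virtual IP
7.6. Start constraint settings
8. Editing the Zabbix Agent conf file
9. Connectivity test (Zabbix-Server ⇔ Zabbix-Agent)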

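1. Background

The other day I built, as the title says, a redundant Zabbix server with Pacemaker/Corosync. Plenty of people have written up similar setups with this middleware, but I found few articles that cover the connection between the server and the agents, so I am publishing my notes here as a memo.

As a fledgling infrastructure engineer I still have a lot to learn, so please point out any mistakes in the comments.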

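2. Prerequisites

2.1. Logical configuration

2.2. Assumptions
・Both nodes are CentOS 7 guests running on ESXi.
・Each node has one NIC.
・Zabbix runs in an Active-Standby configuration.
・MariaDB runs in a Master-Slave configuration.
・Failover switches over via a virtual IP.
・The firewall and SELinux are disabled.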

3. Initial setup

Add the host names to the hosts file.
[Run on SV01/SV02]

# vi /etc/hosts
----------------------------------------------------------------
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.11.13 SV01  # host name of node 1
192.168.11.14 SV02  # host name of node 2

Install Pacemaker, Corosync, and pcs. Also start the pcsd service and enable it at boot.
[Run on SV01/SV02]

# yum -y install pacemaker corosync pcs
# systemctl start pcsd
# systemctl enable pcsd

Change the password of the cluster authentication user. The hacluster user is added automatically when the packages above are installed.
[Run on SV01/SV02]

# passwd hacluster
Changing password for user hacluster.
New password: [any password]
Retype new password: [any password]
passwd: all authentication tokens updated successfully.

4. Installing and configuring Zabbix

Set up Zabbix by following the article below.

使ってみよう Zabbix① : インストール編 ("Let's Try Zabbix ①: Installation")

5. Setting up a replication DB user for MariaDB

Create the MariaDB replication user.
[Run on SV01/SV02]

# systemctl start mariadb
# mysql -uroot
> create user 'repl'@'%' identified by '[any password]';
> create user 'repl'@'localhost' identified by '[any password]';
> grant replication slave on *.* to 'repl'@'%';
> grant process, super, replication slave, replication client, reload on *.* to 'repl'@'localhost';
> flush privileges;
> quit;

Configure MariaDB replication.
[Run on SV01]

# vi /etc/my.cnf
----------------------------------------------------------------
[mysqld]
log-bin=mariadb-bin  # added
server-id=1  # added
log-basename=SV01  # added

[Run on SV02]

# vi /etc/my.cnf
----------------------------------------------------------------
[mysqld]
log-bin=mariadb-bin  # added
server-id=2  # added
log-basename=SV02  # added
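
Replication needs the two nodes to carry different server-id values, which is why SV01 uses 1 and SV02 uses 2. As a minimal sketch for double-checking the edit, the helper below prints the value found in an option file. The name get_server_id is hypothetical and not part of MariaDB.

```shell
# Hypothetical helper: print the server-id set in a my.cnf-style file.
# Assumes the "server-id=N" (or "server_id=N") form used in this article;
# a trailing "# comment" on the same line is ignored.
get_server_id() {
    sed -n 's/^[[:space:]]*server[-_]id[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}
```

Running `get_server_id /etc/my.cnf` on each node and comparing the two results is enough to catch a copy-and-paste mistake between SV01 and SV02.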

First, copy the data of the master DB (SV01) to the slave DB (SV02).
Before copying, delete the existing data on the slave DB (SV02).
[Run on SV02]

# systemctl stop mariadb
# rm -rf /var/lib/mysql

Next, transfer the database data from the master (SV01) to the slave (SV02) over SSH.
[Run on SV01]

# systemctl stop mariadb
# tar cf - -C /var/lib mysql | ssh 192.168.11.14 tar xpf - -C /var/lib
Are you sure you want to continue connecting (yes/no)? yes
root@192.168.11.14's password: [root password of SV02]

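6. Cluster setup

6.1. Stopping services and disabling autostart

Before configuring the cluster, stop the services that will be added as resources and disable their autostart.
[Run on SV01/SV02]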

# systemctl stop zabbix-server
# systemctl stop mariadb
# systemctl stop httpd
# systemctl disable zabbix-server
# systemctl disable mariadb
# systemctl disable httpd

6.2. Host authentication

Authenticate each host in the cluster.
[Run on SV01]

# pcs cluster auth SV01 SV02
Username: hacluster
Password: [password of hacluster]
SV01: Authorized
SV02: Authorized

6.3. Creating the cluster

Create the Zabbix cluster. The name "zabbix_cluster" can be replaced with any name.
[Run on SV01]

# pcs cluster setup --name zabbix_cluster SV01 SV02

Start the cluster and enable it at boot.
[Run on SV01]

# pcs cluster start --all
# pcs cluster enable --all

6.4. Cluster property settings

Disable STONITH.
[Run on SV01]

# pcs property set stonith-enabled=false

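Change the behavior on loss of quorum. With only two nodes, losing one node also loses quorum, so the cluster is told to keep running resources anyway.
[Run on SV01]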

# pcs property set no-quorum-policy=ignore

Disable automatic failback. (The warning can be ignored.)
[Run on SV01]

# pcs resource defaults resource-stickiness=INFINITY
Warning: Defaults do not apply to resources which override them with their own defined values

Fail over immediately when an error occurs. (The warning can be ignored.)
[Run on SV01]

# pcs resource defaults migration-threshold=1
Warning: Defaults do not apply to resources which override them with their own defined values

7. Resource configuration

7.1. Moving SV02 to standby

Force the cluster node SV02 into standby.
[Run on SV01]

# pcs cluster standby SV02

7.2. MariaDB

Add MariaDB as a resource.
[Run on SV01]

# pcs resource create mariadb \
> ocf:heartbeat:mysql \
> binary=/usr/bin/mysqld_safe \
> datadir=/var/lib/mysql \
> log=/var/log/mariadb/mariadb.log \
> pid=/run/mariadb/mariadb.pid \
> replication_user=repl \
> replication_passwd=[password of the repl user] \
> op monitor interval=10s timeout=10s

Turn the MariaDB resource into a Master-Slave set.
[Run on SV01]

# pcs resource master mariadb-clone mariadb \
> master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

Make SV01 start as the master preferentially.
In terms of score, the higher the value, the higher the priority:
-INFINITY < -100 < 0 < 100 < INFINITY
[Run on SV01]

# pcs constraint location mariadb-clone prefers SV01=100
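
Pacemaker treats INFINITY as 1,000,000 internally, so the ordering above is a plain numeric comparison. The sketch below only illustrates that ordering. The names score_to_int and higher_score are hypothetical helpers, not pcs commands.

```shell
# Hypothetical helpers illustrating how constraint scores compare.
# Pacemaker defines INFINITY as 1,000,000.
score_to_int() {
    case "$1" in
        INFINITY|+INFINITY) printf '%s\n' 1000000 ;;
        -INFINITY)          printf '%s\n' -1000000 ;;
        *)                  printf '%s\n' "$1" ;;
    esac
}

# Print whichever of the two scores takes priority.
higher_score() {
    a=$(score_to_int "$1")
    b=$(score_to_int "$2")
    if [ "$a" -ge "$b" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}
```

For example, `higher_score 100 INFINITY` prints INFINITY and `higher_score -INFINITY -100` prints -100.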

7.3. Zabbix server

Add the Zabbix server as a resource.
[Run on SV01]

# pcs resource create zabbix-server \
> systemd:zabbix-server \
> op monitor interval=10s timeout=10s

7.4. Apache

Create Apache's status.conf file.
[Run on SV01/SV02]

# vi /etc/httpd/conf.d/status.conf
----------------------------------------------------------------
ExtendedStatus On

<Location /server-status>
    SetHandler server-status
    Require local
</Location>

Add Apache as a resource.
[Run on SV01]

# pcs resource create apache \
> ocf:heartbeat:apache \
> configfile=/etc/httpd/conf/httpd.conf \
> statusurl="http://localhost/server-status" \
> op monitor interval=10s timeout=10s

7.5. Virtual IP

Add the virtual IP as a resource.
[Run on SV01]

# pcs resource create vip \
> ocf:heartbeat:IPaddr2 \
> ip=192.168.11.200 \
> cidr_netmask=24 \
> nic=eno16777736 \
> op monitor interval=10s timeout=10s

7.6. Start constraint settings

First, to make the configuration easier, put the resources into a group.
[Run on SV01]

# pcs resource group add zabbix_group vip apache zabbix-server

Next, set the start constraints for each resource.
[Run on SV01]

# pcs constraint order start zabbix_group then promote mariadb-clone
# pcs constraint colocation add zabbix_group with master mariadb-clone INFINITY
# pcs constraint colocation add master mariadb-clone with zabbix_group INFINITY
# pcs constraint colocation add zabbix-server apache INFINITY
# pcs constraint colocation add zabbix-server vip INFINITY
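
The status output shown in this section lists SV02 as online and as the MariaDB slave, although 7.1 put it in standby. The article does not show the step in between, so the following is an assumption: presumably SV02 is released from standby at this point. The name release_standby is a hypothetical wrapper around the pcs cluster unstandby command.

```shell
# Hypothetical wrapper: take each named node out of standby.
# Stops at the first node for which pcs reports an error.
release_standby() {
    for node in "$@"; do
        pcs cluster unstandby "$node" || return 1
    done
}
```

On SV01 this would be called as `release_standby SV02`.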

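At this point, run the status command and confirm that output like the following is shown.
If errors such as Failed Resource Actions appear, review the configuration files and settings.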

# pcs status
Cluster name: zabbix_cluster
Stack: corosync
Current DC: SV01 (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Wed Nov 25 18:46:14 2020
Last change: Wed Nov 25 18:35:46 2020 by root via cibadmin on SV01

2 nodes configured
5 resource instances configured

Online: [ SV01 SV02 ]

Full list of resources:

 Master/Slave Set: mariadb-clone [mariadb]
     Masters: [ SV01 ]
     Slaves: [ SV02 ]
 Resource Group: zabbix_group
     vip        (ocf::heartbeat:IPaddr2):       Started SV01
     apache     (ocf::heartbeat:apache):        Started SV01
     zabbix-server      (systemd:zabbix-server):        Started SV01

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
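
Reading the status output by eye works, but the same check can be scripted. The sketch below reads `pcs status` output on standard input, flags any line containing "Failed", and reports which node holds the master role. The name check_pcs_status is hypothetical, and the parsing assumes the output layout shown above.

```shell
# Hypothetical helper: inspect `pcs status` output passed on stdin.
# Assumes the "Masters: [ NODE ]" layout printed by the pcs version in this article.
check_pcs_status() {
    status=$(cat)
    if printf '%s\n' "$status" | grep -q 'Failed'; then
        echo "NG: failed actions reported"
        return 1
    fi
    master=$(printf '%s\n' "$status" | sed -n 's/^ *Masters: \[ \(.*\) ]$/\1/p')
    if [ -z "$master" ]; then
        echo "NG: no master found"
        return 1
    fi
    echo "OK: master=$master"
}
```

It would be used as `pcs status | check_pcs_status`.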

8. Editing the Zabbix Agent conf file

[Run on SV01/SV02]
1. Edit zabbix_agentd.conf as shown below.
2. Restart the Zabbix Agent service.

# vi /etc/zabbix/zabbix_agentd.conf
----------------------------------------------------------------
No.|
98 | Server=192.168.11.13, 192.168.11.14, 192.168.11.200
114| ListenIP=0.0.0.0
139| ServerActive=192.168.11.13, 192.168.11.14, 192.168.11.200
150| # Hostname=Zabbix server
----------------------------------------------------------------
# systemctl restart zabbix-agent
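
Server lists both node addresses and the virtual IP, presumably because passive checks can arrive from whichever node is active. One way to confirm that an agent accepts connections is zabbix_get, assuming the zabbix-get package is installed on the node you test from (the article does not install it). The name agent_reachable is a hypothetical helper.

```shell
# Hypothetical helper: succeed if the agent on the given host answers agent.ping.
# zabbix_get must run from an address listed in the agent's Server= line.
agent_reachable() {
    reply=$(zabbix_get -s "$1" -k agent.ping 2>/dev/null) || return 1
    [ "$reply" = "1" ]
}
```

For example, `agent_reachable 192.168.11.14 && echo reachable` run on SV01 checks the agent on SV02.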

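9. Connectivity test (Zabbix-Server ⇔ Zabbix-Agent)

Create an item with the settings in the table below, then click [Add].

Item                | Value
Name                | CPU usage
Type                | Zabbix agent
Key                 | system.cpu.util[,idle,avg1]
Type of information | Numeric (float)
Applications        | CPU

Create a trigger with the settings in the table below, then click [Add].

Item       | Value
Name       | CPU usage over 80%, {HOST.NAME}
Severity   | High
Expression | {Template_CPU:system.cpu.util[,idle,avg1].last()}<20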
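
The item collects the idle percentage, not the usage, so the trigger named "over 80%" fires when the last idle value drops below 20. The sketch below spells out that arithmetic. The names cpu_usage_from_idle and trigger_fires are hypothetical and only mirror the expression above.

```shell
# Hypothetical helpers mirroring the trigger: usage = 100 - idle.
cpu_usage_from_idle() {
    LC_ALL=C awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - idle }'
}

# Succeeds when the trigger condition (idle < 20) holds.
trigger_fires() {
    LC_ALL=C awk -v idle="$1" 'BEGIN { exit !(idle < 20) }'
}
```

With an idle value of 15.5, `cpu_usage_from_idle 15.5` prints 84.5 and `trigger_fires 15.5` succeeds. With an idle value of exactly 20 the trigger does not fire.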

That completes the build of a redundant Zabbix server with Pacemaker/Corosync.
To add a host to monitor, install Zabbix-Agent on it and then edit zabbix_agentd.conf as described in section 8.
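
The article stops at the connectivity test. A manual failover drill is one way to see the redundancy in action; the sketch below is an assumption about how such a drill could look, not a step taken from the original setup. The name failover_drill is hypothetical, and the wait time is an arbitrary default.

```shell
# Hypothetical drill: put a node in standby, show the resulting status,
# then return the node to the cluster.
# $1 = node name, $2 = seconds to wait before checking (default 10).
failover_drill() {
    node="$1"
    pcs cluster standby "$node" || return 1
    sleep "${2:-10}"
    pcs status
    pcs cluster unstandby "$node"
}
```

Running `failover_drill SV01` should show zabbix_group and the MariaDB master on SV02 in the printed status if failover works as configured.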
