Hands-on! Making Zabbix redundant with Pacemaker + Corosync

3 years ago

科, 颖

1. Overview

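I successfully made a Zabbix server redundant using Pacemaker and Corosync.
I tried this once when I had just become an engineer, failed, and gave up. This time it worked, so I am writing it down as a memo.
Projects that use Pacemaker are rare these days, and I doubt I will run into one again.
I did this purely out of interest.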

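▼ Pacemaker
OSS that forms an HA cluster and controls resources (detaching a node and handing its resources over to the standby node).
▼ Corosync
OSS that monitors whether each node in an HA cluster is alive.
▼ Zabbix
OSS for integrated monitoring.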

2. Prerequisites

2-1. Logical configuration

2-2. Preconditions

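・There are two nodes, each running CentOS 7, built on VMware Workstation.
・Each node has one NIC.
・Zabbix is configured as Active-Standby.
・MariaDB is configured as Master-Slave.
・Switchover is done with a virtual IP.
・The firewall and SELinux are both disabled.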

3. Initial setup

The virtual machines and the initial OS configuration are assumed to be complete.
Add the hostnames to the hosts file.
[Run on SV01/SV02]

# vi /etc/hosts
----------------------------------------------------------------
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.11.13 SV01  # hostname of node 1
192.168.11.14 SV02  # hostname of node 2
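
Because the later pcs commands address the nodes by name, it can help to confirm the hosts entries before moving on. The following is a minimal sketch, not part of the original procedure; `hosts_entry_ok` is a hypothetical helper name, and it only inspects the file rather than testing real name resolution.

```shell
#!/bin/sh
# Sketch: check that a hosts-format file maps an IP address to a hostname.
# hosts_entry_ok is a hypothetical helper, not part of any package.
hosts_entry_ok() {
    # $1 = hosts file, $2 = expected IP, $3 = hostname
    awk -v ip="$2" -v name="$3" '
        { sub(/#.*/, "") }      # drop trailing comments before comparing
        $1 == ip {
            for (i = 2; i <= NF; i++) if ($i == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example (run on each node after editing /etc/hosts):
#   hosts_entry_ok /etc/hosts 192.168.11.13 SV01 && echo "SV01 ok"
#   hosts_entry_ok /etc/hosts 192.168.11.14 SV02 && echo "SV02 ok"
```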

Install Pacemaker, Corosync, and pcs on SV01/SV02.
Also start the pcsd service and enable it at boot.

# yum -y install pacemaker corosync pcs
# systemctl start pcsd
# systemctl enable pcsd

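Change the password of the user used for cluster authentication.
hacluster is a user that is added automatically when the cluster packages are installed.
[Run on SV01/SV02]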

# passwd hacluster
Changing password for user hacluster.
New password: [any password]
Retype new password: [any password]
passwd: all authentication tokens updated successfully.

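4. Zabbix installation and setup

[Run on SV01/SV02]
The initial Zabbix build is assumed to be complete, to the point where the login screen is displayed.
※ The version is Zabbix 4.0 and the database is MariaDB.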

5. Database user configuration for MariaDB replication

Create the user for MariaDB replication on SV01/SV02.

# systemctl start mariadb
# mysql -uroot
> create user 'repl'@'%' identified by '[any password]';
> create user 'repl'@'localhost' identified by '[any password]';
> grant replication slave on *.* to 'repl'@'%';
> grant process, super, replication slave, replication client, reload on *.* to 'repl'@'localhost';
> flush privileges;
> quit;

Configure MariaDB replication on SV01.

# vi /etc/my.cnf
----------------------------------------------------------------
[mysqld]
log-bin=mariadb-bin  # added
server-id=1  # added
log-basename=SV01  # added

[Run on SV02]

# vi /etc/my.cnf
----------------------------------------------------------------
[mysqld]
log-bin=mariadb-bin  # added
server-id=2  # added
log-basename=SV02  # added
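
Replication requires the two nodes to have different server-id values, so a quick comparison of the two files can catch a copy-paste mistake. This is a rough sketch under the assumption that both my.cnf files are available locally (for example, one copied over from the other node); `server_id_of` and `ids_differ` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: extract server-id from a my.cnf-style file and compare two files.
# Both function names are hypothetical, chosen for illustration only.
server_id_of() {
    # $1 = config file; prints the first server-id value found
    sed -e 's/#.*//' "$1" | awk -F= '
        { key = $1; gsub(/[ \t]/, "", key) }
        key == "server-id" || key == "server_id" {
            val = $2; gsub(/[ \t]/, "", val); print val; exit
        }'
}

ids_differ() {
    # succeeds only when both files define a server-id and the values differ
    id_a=$(server_id_of "$1")
    id_b=$(server_id_of "$2")
    [ -n "$id_a" ] && [ -n "$id_b" ] && [ "$id_a" != "$id_b" ]
}

# Example, assuming SV02's file was copied to /tmp/my.cnf.sv02 (hypothetical path):
#   ids_differ /etc/my.cnf /tmp/my.cnf.sv02 || echo "server-id conflict or missing"
```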

Migrate the database data from the Master (SV01) to the Slave (SV02).
First, delete the database data on the Slave (SV02).
[Run on SV02]

# systemctl stop mariadb
# rm -rf /var/lib/mysql

Next, transfer the database data from the Master (SV01) to the Slave (SV02) over an SSH connection.
[Run on SV01]

# systemctl stop mariadb
# tar cf - -C /var/lib mysql | ssh 192.168.11.14 tar xpf - -C /var/lib
Are you sure you want to continue connecting (yes/no)? yes
root@192.168.11.14's password:[root password of SV02]

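6. Cluster setup

6-1. Stopping services and disabling autostart

Before configuring the cluster, stop the services that will be added as resources and disable their autostart.
[Run on SV01/SV02]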

# systemctl stop zabbix-server
# systemctl stop mariadb
# systemctl stop httpd
# systemctl disable zabbix-server
# systemctl disable mariadb
# systemctl disable httpd

6-2. Host authentication

Authenticate each host in the cluster.
[Run on SV01]

# pcs cluster auth SV01 SV02
Username: hacluster
Password: [hacluster password]
SV01: Authorized
SV02: Authorized

6-3. Creating the cluster

Create the Zabbix cluster on SV01. The name "zabbix_cluster" is arbitrary.

# pcs cluster setup --name zabbix_cluster SV01 SV02

Start the cluster and enable it at boot.
[Run on SV01]

# pcs cluster start --all
# pcs cluster enable --all

6-4. Cluster property settings

Disable STONITH.
[Run on SV01]

# pcs property set stonith-enabled=false

Change the quorum policy.
[Run on SV01]

# pcs property set no-quorum-policy=ignore
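
The reason this setting matters in a two-node cluster can be shown with simple arithmetic. With plain majority voting, a cluster of n votes needs floor(n/2)+1 of them, so a lone surviving node out of two does not have quorum, and Pacemaker's default no-quorum-policy (stop) would then stop the resources instead of failing over. The sketch below only illustrates that arithmetic; the function names are hypothetical and it does not query the cluster.

```shell
#!/bin/sh
# Sketch of majority-quorum arithmetic (illustration only, no cluster access).
quorum_votes() {
    # $1 = total number of votes; prints the votes needed for a majority
    echo $(( $1 / 2 + 1 ))
}

has_quorum() {
    # $1 = total votes, $2 = votes still present
    [ "$2" -ge "$(quorum_votes "$1")" ]
}

quorum_votes 2                      # prints 2
has_quorum 2 1 || echo "a single surviving node of two has no quorum"
```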

Disable automatic failback (ignore the warning).
[Run on SV01]

# pcs resource defaults resource-stickiness=INFINITY
Warning: Defaults do not apply to resources which override them with their own defined values

Fail over immediately when an error occurs (ignore the warning).
[Run on SV01]

# pcs resource defaults migration-threshold=1
Warning: Defaults do not apply to resources which override them with their own defined values

7. Resource configuration

7-1. Putting SV02 into standby

Force the cluster node state of SV02 to Standby.
[Run on SV01]

# pcs cluster standby SV02

7-2. MariaDB

Add MariaDB as a resource.
[Run on SV01]

# pcs resource create mariadb \
> ocf:heartbeat:mysql \
> binary=/usr/bin/mysqld_safe \
> datadir=/var/lib/mysql \
> log=/var/log/mariadb/mariadb.log \
> pid=/run/mariadb/mariadb.pid \
> replication_user=repl \
> replication_passwd=[password of the repl user] \
> op monitor interval=10s timeout=10s

Configure the added MariaDB resource as a Master/Slave set.
[Run on SV01]

# pcs resource master mariadb-clone mariadb \
> master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

Give SV01 priority to start as the master.
Placement priority follows the score, from lowest to highest:
-INFINITY < -100 < 0 < 100 < INFINITY
[Run on SV01]

# pcs constraint location mariadb-clone prefers SV01=100
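
To make the score ordering concrete, the following sketch picks the node with the highest location score from a list of node=score pairs. It is a simplification: Pacemaker treats INFINITY as 1,000,000 and combines location scores with other inputs such as resource-stickiness, which this illustration ignores. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare location scores the way the ordering above describes.
# Simplified illustration; real placement also weighs stickiness and
# colocation constraints.
score_to_int() {
    case "$1" in
        INFINITY|+INFINITY) echo 1000000 ;;
        -INFINITY)          echo -1000000 ;;
        *)                  echo "$1" ;;
    esac
}

preferred_node() {
    # args: node=score pairs; prints the node with the highest score
    best_node=""
    best=-1000001
    for pair in "$@"; do
        node=${pair%%=*}
        score=$(score_to_int "${pair#*=}")
        if [ "$score" -gt "$best" ]; then
            best=$score
            best_node=$node
        fi
    done
    echo "$best_node"
}

preferred_node SV01=100 SV02=0      # prints SV01
```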

7-3. Zabbix server

Add the Zabbix server as a resource.
[Run on SV01]

# pcs resource create zabbix-server \
> systemd:zabbix-server \
> op monitor interval=10s timeout=10s

7-4. Apache

Create the Apache status.conf file on SV01/SV02.

# vi /etc/httpd/conf.d/status.conf
----------------------------------------------------------------
ExtendedStatus On

<Location /server-status>
    SetHandler server-status
    Require local
</Location>

Add Apache as a resource.
[Run on SV01]

# pcs resource create apache \
> ocf:heartbeat:apache \
> configfile=/etc/httpd/conf/httpd.conf \
> statusurl="http://localhost/server-status" \
> op monitor interval=10s timeout=10s

7-5. Virtual IP

Add the virtual IP as a resource.
[Run on SV01]

# pcs resource create vip \
> ocf:heartbeat:IPaddr2 \
> ip=192.168.11.200 \
> cidr_netmask=24 \
> nic=eno16777736 \
> op monitor interval=10s timeout=10s
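
With cidr_netmask=24 the virtual IP is expected to sit in the same /24 network as the node addresses (192.168.11.13 and 192.168.11.14 here). A small sanity check of that assumption could look like the sketch below; the helper names are hypothetical, and it only compares the first three octets, so it applies to a /24 mask only.

```shell
#!/bin/sh
# Sketch: check that two IPv4 addresses share the same /24 network.
# Only valid for a 24-bit netmask, as used in this article.
net24() {
    # $1 = dotted IPv4 address; prints the first three octets
    echo "${1%.*}"
}

same_net24() {
    [ "$(net24 "$1")" = "$(net24 "$2")" ]
}

same_net24 192.168.11.200 192.168.11.13 && echo "VIP is in the node subnet"
```

The interface name given in nic= (eno16777736 above) is specific to the author's virtual machine, so it needs to match the NIC name on your own nodes.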

7-6. Start-up constraints

First, group the resources to make them easier to configure.
[Run on SV01]

# pcs resource group add zabbix_group vip apache zabbix-server

Next, set the start-up constraints for each resource.
[Run on SV01]

# pcs constraint order start zabbix_group then promote mariadb-clone
# pcs constraint colocation add zabbix_group with master mariadb-clone INFINITY
# pcs constraint colocation add master mariadb-clone with zabbix_group INFINITY
# pcs constraint colocation add zabbix-server apache INFINITY
# pcs constraint colocation add zabbix-server vip INFINITY

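At this point, run the status command and confirm that the output looks like the following.
If errors such as Failed Resource Actions appear, recheck each configuration file and setting value.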

# pcs status
Cluster name: zabbix_cluster
Stack: corosync
Current DC: SV01 (version 1.1.23-1.el7-9acf116022) - partition with quorum
Last updated: Wed Nov 25 18:46:14 2020
Last change: Wed Nov 25 18:35:46 2020 by root via cibadmin on SV01

2 nodes configured
5 resource instances configured

Online: [ SV01 SV02 ]

Full list of resources:

 Master/Slave Set: mariadb-clone [mariadb]
     Masters: [ SV01 ]
     Slaves: [ SV02 ]
 Resource Group: zabbix_group
     vip        (ocf::heartbeat:IPaddr2):       Started SV01
     apache     (ocf::heartbeat:apache):        Started SV01
     zabbix-server      (systemd:zabbix-server):        Started SV01

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
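
The status output can also be checked from a script, which is handy when repeating a failover test. The sketch below reads `pcs status` text from standard input and extracts the current master, assuming the output layout shown above; `current_master` and `has_failed_actions` are hypothetical helper names. Note that the steps shown so far never bring SV02 back from the standby state set in 7-1, yet SV02 appears as Online here, so presumably `pcs cluster unstandby SV02` was run at some point before this output was captured.

```shell
#!/bin/sh
# Sketch: parse `pcs status` output (layout as shown above) from stdin.
current_master() {
    # prints the node(s) listed on the "Masters: [ ... ]" line
    sed -n 's/^[[:space:]]*Masters: \[ \([^]]*\) \].*/\1/p'
}

has_failed_actions() {
    # succeeds when a "Failed ..." section header is present
    grep -q '^Failed'
}

# Example failover drill (commands exist in pcs 0.9; results depend on your cluster):
#   pcs status | current_master     # expected to show SV01 before the test
#   pcs cluster standby SV01
#   pcs status | current_master     # should show SV02 if failover worked
#   pcs cluster unstandby SV01
```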

8. Editing the Zabbix Agent conf file

On SV01/SV02, edit zabbix_agentd.conf as shown below, then restart the Zabbix Agent service.

# vi /etc/zabbix/zabbix_agentd.conf
----------------------------------------------------------------
No.|
98 | Server=192.168.11.13, 192.168.11.14, 192.168.11.200
114| ListenIP=0.0.0.0
139| ServerActive=192.168.11.13, 192.168.11.14, 192.168.11.200
150| # Hostname=Zabbix server
----------------------------------------------------------------
# systemctl restart zabbix-agent
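
After editing, it is easy to verify that the virtual IP actually made it into both Server and ServerActive. The following is a minimal sketch that inspects the file only; `conf_lists_ip` is a hypothetical helper name, and it assumes each key appears on a single uncommented line as in the excerpt above.

```shell
#!/bin/sh
# Sketch: check that a comma-separated Zabbix agent parameter lists an IP.
conf_lists_ip() {
    # $1 = conf file, $2 = key (Server or ServerActive), $3 = IP address
    grep "^$2=" "$1" | tr -d ' ' | cut -d= -f2 | tr ',' '\n' | grep -qxF "$3"
}

# Example (run on each node):
#   conf_lists_ip /etc/zabbix/zabbix_agentd.conf Server 192.168.11.200 \
#       && echo "VIP present in Server"
#   conf_lists_ip /etc/zabbix/zabbix_agentd.conf ServerActive 192.168.11.200 \
#       && echo "VIP present in ServerActive"
```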