尝试使用PostgreSQL-14的复制功能

3 年 ago

雅, 悟

7 minutes

首先

由于上次已经在CentOS上安装了PostgreSQL-14，所以这次我们将尝试使用复制功能。

可以点击这里查看之前的文章。

操作系统是CentOS，版本如下。

[root@db-server-01 ~]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)

这次我们要用两台DB服务器进行同步，所以请准备两台状态相同的DB服务器。
笔者已经准备了两台与上一篇文章中服务器状态相同的服务器。

尝试使用复制功能

DB服务器#01

首先，在DB服务器#01上进行工作。

修改postgresql.conf文件

首先，从修改postgresql.conf文件开始。
在此之前，先做好备份以防万一。
注意，postgresql.conf文件默认位于/var/lib/pgsql/data/之类的位置。
※ 我故意创建了一个名为/database/的数据库集群。

[root@db-server-01 ~]# cp -p /database/data/postgresql.conf /database/data/postgresql.conf_bk
[root@db-server-01 ~]# ls -al /database/data/postgresql.conf*
-rw------- 1 postgres postgres 28782 Jul 21 03:17 /database/data/postgresql.conf
-rw------- 1 postgres postgres 28782 Jul 21 03:17 /database/data/postgresql.conf_bk

去修改postgresql.conf文件。

[root@db-server-01 ~]# vi /database/data/postgresql.conf]

#60行目
listen_addresses = '*'

#205行目
wal_level = replica

#210行目
synchronous_commit = on

# 298行目
max_wal_senders = 10

#312行目
synchronous_standby_names = '*'

暂且记录在表上。

行数修正前修正後参考60#listen_addresses = ‘localhost’listen_addresses = ‘*’https://postgresqlco.nf/doc/ja/param/listen_addresses/205#wal_level = replicawal_level = replicahttps://postgresqlco.nf/doc/ja/param/wal_level/210#synchronous_commit = onsynchronous_commit = onhttps://postgresqlco.nf/doc/ja/param/synchronous_commit/298#max_wal_senders = 10max_wal_senders = 10https://postgresqlco.nf/doc/ja/param/max_wal_senders/312#synchronous_standby_names = ”synchronous_standby_names = ‘*’https://postgresqlco.nf/doc/ja/param/synchronous_standby_names/

使用diff命令进行差异确认时，结果会像这样子。

[root@db-server-01 ~]# diff /database/data/postgresql.conf_bk /database/data/postgresql.conf
60c60
< #listen_addresses = 'localhost'               # what IP address(es) to listen on;
---
> listen_addresses = '*'               # what IP address(es) to listen on;
205c205
< #wal_level = replica                  # minimal, replica, or logical
---
> wal_level = replica                   # minimal, replica, or logical
210c210
< #synchronous_commit = on              # synchronization level;
---
> synchronous_commit = on               # synchronization level;
298c298
< #max_wal_senders = 10         # max number of walsender processes
---
> max_wal_senders = 10          # max number of walsender processes
312c312
< #synchronous_standby_names = ''       # standby servers that provide sync rep
---
> synchronous_standby_names = '*'       # standby servers that provide sync rep

修改 pg_hba.conf

接下来，我们将修改pg_hba.conf文件。
为了安全起见，先创建一个备份。

[root@db-server-01 ~]# cp -p /database/data/pg_hba.conf /database/data/pg_hba.conf_bk
[root@db-server-01 ~]# ls -al /database/data/pg_hba.conf*
-rw------- 1 postgres postgres 4789 Jul 21 03:17 /database/data/pg_hba.conf
-rw------- 1 postgres postgres 4789 Jul 21 03:17 /database/data/pg_hba.conf_bk

那么，我会进行修正。

[root@db-server-01 ~]# vi /database/data/pg_hba.conf

#最終行に下記を追加（接続許可の設定
host    replication     repl_user       192.168.255.129/32      md5
host    replication     repl_user       192.168.255.130/32      md5

192.168.255.129是DB服务器#01的IP地址。
192.168.255.130是DB服务器#02的IP地址。
请提供每台服务器的IP地址。

repl_user是允许连接的用户。
稍后将创建。

使用diff命令进行差异确认

[root@db-server-01 ~]# diff /database/data/pg_hba.conf_bk /database/data/pg_hba.conf
98a99,100
> host    replication     repl_user       192.168.255.129/32      md5
> host    replication     repl_user       192.168.255.130/32      md5

接下来，我们将创建用于复制的用户（角色）。
切换到postgres用户并执行createuser命令。

[root@db-server-01 ~]# su - postgres
[postgres@db-server-01 ~]$ createuser --replication -P repl_user
Enter password for new role:        ## パスワード一回目
Enter it again:                     ## パスワード二回目

在这里发生错误的人可以参考连接失败。

如果看起来没有问题，就重新启动 postgresql-14。

[postgres@db-server-01 ~]$ exit

[root@db-server-01 ~]# systemctl restart postgresql-14

DB服务器＃01的工作已经完成。

数据库服务器#02

首先停止 PostgreSQL-14，并删除所有文件。

[root@db-server-02 ~]# systemctl stop postgresql-14
[root@db-server-02 ~]# rm -rf /database/data/*

切换账户并执行pg_basebackup命令！

[root@db-server-02 ~]# su - postgres
[postgres@db-server-02 ~]$pg_basebackup -R -h 192.168.255.129 -U repl_user -D /database/data -P
Password:
26941/26941 kB (100%), 1/1 tablespace

在这里遇到错误的人可以参考“No route to host”错误。

参考pg_basebackup命令的选项请点击此处。

我們來檢查一下/database/data/的內容吧。

[postgres@db-server-02 ~]$ls -al /database/data/
total 204
drwx------. 20 postgres postgres   4096 Jul 21 04:41 .
drwxr-xr-x   4 postgres postgres     32 Jul 18 05:10 ..
-rw-------   1 postgres postgres    225 Jul 21 04:41 backup_label
-rw-------   1 postgres postgres 137646 Jul 21 04:41 backup_manifest
drwx------   5 postgres postgres     41 Jul 21 04:41 base
-rw-------   1 postgres postgres     30 Jul 21 04:41 current_logfiles
drwx------   2 postgres postgres   4096 Jul 21 04:41 global
drwx------   2 postgres postgres     32 Jul 21 04:41 log
drwx------   2 postgres postgres      6 Jul 21 04:41 pg_commit_ts
drwx------   2 postgres postgres      6 Jul 21 04:41 pg_dynshmem
-rw-------   1 postgres postgres   4925 Jul 21 04:41 pg_hba.conf
-rw-------   1 postgres postgres   1636 Jul 21 04:41 pg_ident.conf
drwx------   4 postgres postgres     68 Jul 21 04:41 pg_logical
drwx------   4 postgres postgres     36 Jul 21 04:41 pg_multixact
drwx------   2 postgres postgres      6 Jul 21 04:41 pg_notify
drwx------   2 postgres postgres      6 Jul 21 04:41 pg_replslot
drwx------   2 postgres postgres      6 Jul 21 04:41 pg_serial
drwx------   2 postgres postgres      6 Jul 21 04:41 pg_snapshots
drwx------   2 postgres postgres      6 Jul 21 04:41 pg_stat
drwx------   2 postgres postgres      6 Jul 21 04:41 pg_stat_tmp
drwx------   2 postgres postgres      6 Jul 21 04:41 pg_subtrans
drwx------   2 postgres postgres      6 Jul 21 04:41 pg_tblspc
drwx------   2 postgres postgres      6 Jul 21 04:41 pg_twophase
-rw-------   1 postgres postgres      3 Jul 21 04:41 PG_VERSION
drwx------   3 postgres postgres     60 Jul 21 04:41 pg_wal
drwx------   2 postgres postgres     18 Jul 21 04:41 pg_xact
-rw-------   1 postgres postgres    337 Jul 21 04:41 postgresql.auto.conf
-rw-------   1 postgres postgres  28770 Jul 21 04:41 postgresql.conf
-rw-------   1 postgres postgres      0 Jul 21 04:41 standby.signal

启动 PostgreSQL-14。

[postgres@db-server-02 ~]$ exit
[root@db-server-02 ~]# systemctl start postgresql-14

DB服务器 #01

尝试连接到DB服务器#01并验证同步状态。

[postgres@db-server-01 ~]$ psql
postgres=# select * from pg_stat_replication;
 pid  | usesysid |  usename  | application_name |   client_addr   | client_hostname | client_port | backend_start | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
------+----------+-----------+------------------+-----------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 4848 |    16384 | repl_user | walreceiver      | 192.168.255.130 |                 |       59322 | 2023-07-21 04:44:53.702684-07 || streaming | 0/3000060 | 0/3000060 | 0/3000060 | 0/3000060  |           |           |            |             1 | sync       | 2023-07-21 04:45:23.758623-07
(1 row)

如果sync_state处于sync状态，那就OK！顺便说一下，async是异步状态。

如果在”select * from pg_stat_replication;”查询语句中没有任何输出结果，那么说明处于既非同步也非异步的状态。

确认动作

我們來試試做個捲餅吧。

第一个DB服务器

postgres=# CREATE ROLE test;
CREATE ROLE
postgres=# DROP ROLE test;
DROP ROLE

数据库服务器#02

postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 repl_user | Replication                                                | {}
 test      | Cannot login                                               | {}

postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 repl_user | Replication

好像可以在同一时间完成！

错误处理

我会记下自己遇到的错误。
供参考。

连接失败

在尝试创建 repl_user 时，执行 createuser 命令时遇到以下错误…

[postgres@db-server-01 ~]$ createuser --replication -P repl_user
Enter password for new role:
Enter it again:
createuser: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?

由于服务未启动，这是原因。服务处于”不活跃（无响应）”状态。

[root@db-server-01 ~]# systemctl status postgresql-14
● postgresql-14.service - PostgreSQL 14 database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql-14.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2023-07-21 03:14:37 PDT; 26min ago
     Docs: https://www.postgresql.org/docs/14/static/
  Process: 1100 ExecStart=/usr/pgsql-14/bin/postmaster -D ${PGDATA} (code=exited, status=0/SUCCESS)
  Process: 1077 ExecStartPre=/usr/pgsql-14/bin/postgresql-14-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
 Main PID: 1100 (code=exited, status=0/SUCCESS)

Jul 21 03:09:27 db-server-01 systemd[1]: Starting PostgreSQL 14 database server...
Jul 21 03:14:37 db-server-01 systemd[1]: Stopped PostgreSQL 14 database server.
Hint: Some lines were ellipsized, use -l to show in full.

因此，我們將開始操作。

[root@db-server-01 ~]# systemctl start postgresql-14
[root@db-server-01 ~]# systemctl status postgresql-14
● postgresql-14.service - PostgreSQL 14 database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql-14.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2023-07-21 03:57:26 PDT; 7s ago
     Docs: https://www.postgresql.org/docs/14/static/
  Process: 3278 ExecStartPre=/usr/pgsql-14/bin/postgresql-14-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
 Main PID: 3284 (postmaster)
    Tasks: 8
   CGroup: /system.slice/postgresql-14.service
           tq3284 /usr/pgsql-14/bin/postmaster -D /database/data/
           tq3286 postgres: logger

Jul 21 03:57:26 db-server-01 systemd[1]: Started PostgreSQL 14 database server.
Hint: Some lines were ellipsized, use -l to show in full.

再次尝试执行createuser。

[postgres@db-server-01 ~]$ createuser --replication -P repl_user
Enter password for new role:
Enter it again:

那个…？
没有收到回应..
先按Ctrl + C键连接到数据库，然后检查角色是否已创建。

[postgres@db-server-01 ~]$ psql
postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}

原因是因为 synchronous_standby_names = ‘*’。

这个参数是用来设置同步的备机服务器的，但目前无法与任何地方进行同步，因为无法将“repl_user已创建”的信息发送到备机服务器，所以没有返回响应。

在重新开始之前，我本以为在DB服务器#01上并没有创建角色，但事实上确实有。

重新启动 PostgreSQL-14 并再次确认，可以看到角色存在。

[postgres@db-server-01 ~]$ exit
[root@db-server-01 ~]# systemctl restart postgresql-14
[root@db-server-01 ~]# su - postgres
[postgres@db-server-01 ~]$ psql
postgres=# \du
                                   List of roles
 Role name |                         Attributes                         | Member of
-----------+------------------------------------------------------------+-----------
 postgres  | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 repl_user | Replication                                                | {}

暂时做好了卷饼，没问题！

无法连接到主机

执行 pg_basebackup 命令时出现错误。。

[postgres@db-server-02 ~]$pg_basebackup -R -h 192.168.255.129 -U repl_user -D /database/data -P
pg_basebackup: error: connection to server at "192.168.255.129", port 5432 failed: No route to host
        Is the server running on that host and accepting TCP/IP connections?

192.168.255.129（DB服务器#01）的5432端口好像无法连接！看起来很生气的样子。

首先，我们尝试使用ping命令测试是否可以与192.168.255.129进行通信。
丢包率为0%，看起来没有任何问题。

[root@db-server-02 ~]$ping 192.168.255.129
PING 192.168.255.129 (192.168.255.129) 56(84) bytes of data.
64 bytes from 192.168.255.129: icmp_seq=1 ttl=64 time=0.530 ms
64 bytes from 192.168.255.129: icmp_seq=2 ttl=64 time=0.631 ms
^C
--- 192.168.255.129 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2005ms
rtt min/avg/max/mdev = 0.530/0.614/0.683/0.069 ms

那就意味着5432端口是关闭的，需要去检查一下firewalld。

[root@db-server-02 ~]$systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2023-07-21 03:09:36 PDT; 1h 18min ago
     Docs: man:firewalld(1)
 Main PID: 660 (firewalld)
    Tasks: 2
   CGroup: /system.slice/firewalld.service
           mq660 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid

由于活动状态（正在运行）无法停止此子程序。
停止在DB服务器＃01和DB服务器＃02上。
※这次是为了学习数据库而停止，但请确认停止不会有问题…..

顺便把自动启动也关闭了。

[root@db-server-02 ~]# systemctl stop firewalld
[root@db-server-02 ~]# systemctl disable firewalld
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@db-server-02 ~]# systemctl is-enabled firewalld
disabled

为了确定，确认一下 SELinux。
它已经被禁用了，看起来没问题！

[root@db-server-02 ~]# getenforce
Disabled

再次执行pg_basebackup时，一切都顺利进行。

[root@db-server-02 ~]# su - postgres
[postgres@db-server-02 ~]$pg_basebackup -R -h 192.168.255.129 -U repl_user -D /database/data -P
Password:           ## createuser の時に設定した repl_user のパスワードを入力
26941/26941 kB (100%), 1/1 tablespace

以上

最后

感谢您一直以来的观看。

很高兴我能顺利使用了复制功能。

我想尝试一下关于wal文件的部分以及备份和恢复的部分。

请务必阅读其他的文章！