Expanding a Cluster with gpexpand and Solving Common Problems

A previous post introduced gpexpand, the online expansion tool that was significantly enhanced in Greenplum 6. This post walks through a concrete example of how to use gpexpand and how to handle problems you may run into during an expansion.

Environment Preparation

This demo adds one new host to a Greenplum Database cluster that currently runs on two hosts, so we first prepare three virtual machines and make sure they can reach each other by hostname.

Next, we build a Greenplum Database cluster on the first two virtual machines: the first host runs the master, and segments are spread across both the first and the second host.

Each host needs some system parameters set; this post does not cover that part, please refer to the online documentation.

Each host also needs a few Python dependencies, because building the cluster involves some remote operations. These can be installed quickly with the scripts shipped in the Greenplum source tree. Since the virtual machines in this post run Ubuntu, we can call the script README.ubuntu.bash to install the dependencies (the Greenplum code repository also contains corresponding setup scripts for other common operating systems).
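A minimal sketch of running it on one host, assuming the Greenplum source tree has been copied to ~/gpdb (the path is only an example) and that installing system packages requires root privileges:

cd ~/gpdb
sudo bash README.ubuntu.bash   # installs the system and Python packages Greenplum needs on Ubuntu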

We can either install the binary packages downloaded from pivotal.io or build from source. This is not the focus of this post, so it is skipped here. A few points and tricks worth noting (see the sketch after the list):

  • gpssh can run the same operation on multiple remote hosts at once
  • gpseginstall installs Greenplum on each segment host
  • gpssh-exkeys sets up passwordless ssh from the master host to the segment hosts
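A rough sketch of how these utilities fit together; the hostfile name (hostfile_all) and its contents are assumptions for illustration, and gpseginstall may need additional options depending on your setup:

# hostfile_all lists every host in the cluster, one hostname per line: mdw, sdw1, sdw2
gpssh-exkeys -f hostfile_all              # set up passwordless ssh from the master to all hosts
gpseginstall -f hostfile_all -u gpadmin   # install the Greenplum binaries on every host as gpadmin
gpssh -f hostfile_all -e 'uname -a'       # run the same command on every host to verify connectivity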

You can query the system table gp_segment_configuration to inspect the cluster. The freshly built cluster looks like this:

postgres=# select * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
------+---------+------+----------------+------+--------+------+----------+---------+-----------------------------------
1 | -1 | p | p | n | u | 5432 | mdw | mdw | /home/gpadmin/test/master/gpseg-1
2 | 0 | p | p | n | u | 6000 | sdw1 | sdw1 | /home/gpadmin/test/p1/gpseg0
5 | 3 | p | p | n | u | 6000 | sdw2 | sdw2 | /home/gpadmin/test/p1/gpseg3
3 | 1 | p | p | n | u | 6001 | sdw1 | sdw1 | /home/gpadmin/test/p2/gpseg1
6 | 4 | p | p | n | u | 6001 | sdw2 | sdw2 | /home/gpadmin/test/p2/gpseg4
4 | 2 | p | p | n | u | 6002 | sdw1 | sdw1 | /home/gpadmin/test/p3/gpseg2
7 | 5 | p | p | n | u | 6002 | sdw2 | sdw2 | /home/gpadmin/test/p3/gpseg5
(7 rows)

Expanding the Cluster with gpexpand

Three Steps to Expand a Cluster

gpexpand is very easy to use: three commands are all it takes to expand a Greenplum cluster:

1. gpexpand -i inputfile
2. gpexpand
3. gpexpand -c

The first command adds the new segments to the cluster, the second redistributes the data, and the third cleans up the bookkeeping information recorded during the expansion.

Next, we walk through gpexpand with a detailed example.

Data Preparation

To demonstrate the whole expansion flow, we first create a database and tables of different kinds (AO tables, replicated tables, heap tables, and so on). The following script does this preparation:

#!/bin/bash
createdb testexpand

psql -d testexpand <<EOF
   create table t1(c1 int, c2 text) distributed by (c1);        -- heap table, hash-distributed by c1
   create table t2(c1 int, c2 date) distributed randomly;       -- heap table, randomly distributed
   create table t3(c1 boolean, c2 date) distributed replicated; -- replicated table
   create table t4(c1 int, c2 int) with(appendonly=true) distributed by (c1);                      -- row-oriented AO table
   create table t5(c1 int, c2 int) with(appendonly=true, orientation=column) distributed randomly; -- column-oriented AO table
EOF

We can use the mock-data tool to load some test data:

mockd greenplum -d testexpand -n 1024 -x -u gpadmin

Once that finishes, let's check how each table is distributed:

testexpand=# select localoid::regclass::text, * from gp_distribution_policy ;
localoid | localoid | policytype | numsegments | distkey | distclass
----------+----------+------------+-------------+---------+-----------
t1 | 16385 | p | 6 | 1 | 10027
t2 | 16391 | p | 6 | |
t3 | 16394 | r | 6 | |
t4 | 16397 | p | 6 | 1 | 10027
t5 | 16404 | p | 6 | |
(5 rows)

We can see that all the tables are distributed across only the first 6 segments (numsegments is 6).
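To see where the rows physically live, we can also group by the hidden gp_segment_id column; before the expansion the result should cover only segments 0 through 5 (output omitted here):

testexpand=# select gp_segment_id, count(*) from t1 group by gp_segment_id order by 1;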

Online Expansion Phase 1: Adding New Nodes

The first phase of gpexpand adds the new segments to the cluster. This phase is done online; online means no downtime or restart, and normal user queries are not affected (with the exception of operations that modify the catalog).

The gpexpand command-line tool adds segments based on a configuration file supplied by the user, which describes the new segments. gpexpand can also generate this configuration file for us interactively.

Typing the gpexpand command in a terminal starts the following interactive session (irrelevant prompt messages are omitted; the values typed by the user appear on their own lines):

Please refer to the Admin Guide for more information.
Would you like to initiate a new System Expansion Yy|Nn (default=N):
y
Enter a comma separated list of new hosts you want
to add to your array. Do not include interface hostnames.
Enter a blank line to only add segments to existing hosts[]:
sdw3
By default, new hosts are configured with the same number of primary segments as existing hosts. Optionally, you can increase the number of segments per host. For example, if existing hosts have two primary segments, entering a value of 2 will initialize two additional segments on existing hosts, and four segments on new hosts. In addition, mirror segments will be added for these new primary segments if mirroring is enabled.
How many new primary segments per host do you want to add? (default=0):
0
Generating configuration file…
20190423:16:38:49:028224 gpexpand:mdw:gpadmin-[INFO]:-Generating input file…
Input configuration file was written to 'gpexpand_inputfile_20190423_163849'.
Please review the file and make sure that it is correct then re-run
with: gpexpand -i gpexpand_inputfile_20190423_163849
20190423:16:38:49:028224 gpexpand:mdw:gpadmin-[INFO]:-Exiting…

The interactive answers above generate the following configuration file:

sdw3:sdw3:6000:/home/gpadmin/test/p1/gpseg6:8:6:p
sdw3:sdw3:6001:/home/gpadmin/test/p2/gpseg7:9:7:p
sdw3:sdw3:6002:/home/gpadmin/test/p3/gpseg8:10:8:p
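Reading this file against the gp_segment_configuration output shown later, the colon-separated fields appear to be:

hostname:address:port:datadir:dbid:content:preferred_role

where p marks a primary segment.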

Preparing the New Node

The new node needs the same dependencies installed, such as the Python libraries and the Greenplum binaries, and the directories referenced in the configuration file must be prepared. This is almost identical to the initial preparation; see the online documentation for details.
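For example, the parent directories referenced in the configuration file can be created on the new host from the master with gpssh (the paths follow the configuration file above; adjust them to your layout):

gpssh -h sdw3 -e 'mkdir -p /home/gpadmin/test/p1 /home/gpadmin/test/p2 /home/gpadmin/test/p3'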

Running Phase 1 of the Expansion

To demonstrate that online expansion does not disturb user queries, we start a transaction before the actual expansion. As long as it does not modify the catalog, this transaction can keep running user queries during (and after) the expansion without being affected. If the transaction had modified the catalog before phase 1 started, it would keep holding the catalog lock and the expansion could not proceed. If the transaction tries to modify the catalog (for example, with a DDL statement) after the expansion has taken the catalog lock, it fails with an error and is rolled back. Likewise, the transaction also fails if it tries to access a table whose data has already been redistributed. Both of the latter errors occur because the new segments have no information about this transaction.

testexpand=# begin;
BEGIN
testexpand=# select * from t1 limit 1;

Then we run the expansion command in a terminal. The expansion locks the catalog, and both acquiring and releasing this lock show up in the log output. If an error occurs before the lock is released, the operation can be rolled back cleanly. In our first attempt below, we press Ctrl-C to interrupt the command before the lock is released and see whether we can recover.

[gpadmin@mdw test]$ gpexpand -i gpexpand_inputfile_20190423_164336
20190423:16:45:05:028411 gpexpand:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.0.0-alpha.0+dev.16250.ge9cd4c060f build dev'
20190423:16:45:05:028411 gpexpand:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.20 (Greenplum Database 6.0.0-alpha.0+dev.16250.ge9cd4c060f build dev) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit compiled on Apr 23 2019 16:22:57 (with assert checking)'
20190423:16:45:05:028411 gpexpand:mdw:gpadmin-[INFO]:-Querying gpexpand schema for current expansion state
20190423:16:45:05:028411 gpexpand:mdw:gpadmin-[INFO]:-Heap checksum setting consistent across cluster
20190423:16:45:05:028411 gpexpand:mdw:gpadmin-[INFO]:-Syncing Greenplum Database extensions
20190423:16:45:05:028411 gpexpand:mdw:gpadmin-[INFO]:-The packages on sdw3 are consistent.
20190423:16:45:05:028411 gpexpand:mdw:gpadmin-[INFO]:-Locking catalog
20190423:16:45:05:028411 gpexpand:mdw:gpadmin-[INFO]:-Locked catalog
20190423:16:45:05:028411 gpexpand:mdw:gpadmin-[INFO]:-Creating segment template
20190423:16:45:06:028411 gpexpand:mdw:gpadmin-[INFO]:-Copying postgresql.conf from existing segment into template
20190423:16:45:07:028411 gpexpand:mdw:gpadmin-[INFO]:-Copying pg_hba.conf from existing segment into template
20190423:16:45:07:028411 gpexpand:mdw:gpadmin-[INFO]:-Creating schema tar file
20190423:16:45:07:028411 gpexpand:mdw:gpadmin-[INFO]:-Distributing template tar file to new hosts
20190423:16:45:08:028411 gpexpand:mdw:gpadmin-[INFO]:-Configuring new segments (primary)
20190423:16:45:08:028411 gpexpand:mdw:gpadmin-[INFO]:-{'sdw3': '/home/gpadmin/test/p1/gpseg6:6000:true:false:8:6::-1:,/home/gpadmin/test/p2/gpseg7:6001:true:false:9:7::-1:,/home/gpadmin/test/p3/gpseg8:6002:true:false:10:8::-1:'}
^C20190423:16:45:08:028411 gpexpand:mdw:gpadmin-[INFO]:-Shutting down gpexpand…
User Interrupted

After this abnormal exit, running gpexpand again prompts us to roll back first. We do so with gpexpand -r:

[gpadmin@mdw test]$ gpexpand -r
20190423:16:45:54:028485 gpexpand:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.0.0-alpha.0+dev.16250.ge9cd4c060f build dev'
20190423:16:45:54:028485 gpexpand:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.20 (Greenplum Database 6.0.0-alpha.0+dev.16250.ge9cd4c060f build dev) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit compiled on Apr 23 2019 16:22:57 (with assert checking)'
20190423:16:45:54:028485 gpexpand:mdw:gpadmin-[INFO]:-Rolling back building of new segments
20190423:16:45:54:028485 gpexpand:mdw:gpadmin-[INFO]:-Rolling back segment template build
20190423:16:45:55:028485 gpexpand:mdw:gpadmin-[INFO]:-Rollback complete.

With the rollback done, we run gpexpand again with the input file:

[gpadmin@mdw test]$ gpexpand -i gpexpand_inputfile_20190423_164336
20190423:16:46:09:028515 gpexpand:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.0.0-alpha.0+dev.16250.ge9cd4c060f build dev'
20190423:16:46:09:028515 gpexpand:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.20 (Greenplum Database 6.0.0-alpha.0+dev.16250.ge9cd4c060f build dev) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit compiled on Apr 23 2019 16:22:57 (with assert checking)'
20190423:16:46:09:028515 gpexpand:mdw:gpadmin-[INFO]:-Querying gpexpand schema for current expansion state
20190423:16:46:09:028515 gpexpand:mdw:gpadmin-[INFO]:-Heap checksum setting consistent across cluster
20190423:16:46:09:028515 gpexpand:mdw:gpadmin-[INFO]:-Syncing Greenplum Database extensions
20190423:16:46:10:028515 gpexpand:mdw:gpadmin-[INFO]:-The packages on sdw3 are consistent.
20190423:16:46:10:028515 gpexpand:mdw:gpadmin-[INFO]:-Locking catalog
20190423:16:46:12:028515 gpexpand:mdw:gpadmin-[INFO]:-Locked catalog
20190423:16:46:12:028515 gpexpand:mdw:gpadmin-[INFO]:-Creating segment template
20190423:16:46:16:028515 gpexpand:mdw:gpadmin-[INFO]:-Copying postgresql.conf from existing segment into template
20190423:16:46:16:028515 gpexpand:mdw:gpadmin-[INFO]:-Copying pg_hba.conf from existing segment into template
20190423:16:46:16:028515 gpexpand:mdw:gpadmin-[INFO]:-Creating schema tar file
20190423:16:46:17:028515 gpexpand:mdw:gpadmin-[INFO]:-Distributing template tar file to new hosts
20190423:16:46:18:028515 gpexpand:mdw:gpadmin-[INFO]:-Configuring new segments (primary)
20190423:16:46:18:028515 gpexpand:mdw:gpadmin-[INFO]:-{'sdw3': '/home/gpadmin/test/p1/gpseg6:6000:true:false:8:6::-1:,/home/gpadmin/test/p2/gpseg7:6001:true:false:9:7::-1:,/home/gpadmin/test/p3/gpseg8:6002:true:false:10:8::-1:'}
20190423:16:46:35:028515 gpexpand:mdw:gpadmin-[INFO]:-Cleaning up temporary template files
20190423:16:46:35:028515 gpexpand:mdw:gpadmin-[INFO]:-Cleaning up databases in new segments.
20190423:16:46:39:028515 gpexpand:mdw:gpadmin-[INFO]:-Unlocking catalog
20190423:16:46:39:028515 gpexpand:mdw:gpadmin-[INFO]:-Unlocked catalog
20190423:16:46:39:028515 gpexpand:mdw:gpadmin-[INFO]:-Creating expansion schema
20190423:16:46:39:028515 gpexpand:mdw:gpadmin-[INFO]:-Populating gpexpand.status_detail with data from database template1
20190423:16:46:39:028515 gpexpand:mdw:gpadmin-[INFO]:-Populating gpexpand.status_detail with data from database postgres
20190423:16:46:39:028515 gpexpand:mdw:gpadmin-[INFO]:-Populating gpexpand.status_detail with data from database testexpand
20190423:16:46:40:028515 gpexpand:mdw:gpadmin-[INFO]:-*
20190423:16:46:40:028515 gpexpand:mdw:gpadmin-[INFO]:-Initialization of the system expansion complete.
20190423:16:46:40:028515 gpexpand:mdw:gpadmin-[INFO]:-To begin table expansion onto the new segments
20190423:16:46:40:028515 gpexpand:mdw:gpadmin-[INFO]:-rerun gpexpand
20190423:16:46:40:028515 gpexpand:mdw:gpadmin-[INFO]:-*
20190423:16:46:40:028515 gpexpand:mdw:gpadmin-[INFO]:-Exiting…

Once this step completes, we have successfully added the new node to the system:

postgres=# select * from gp_segment_configuration ;
dbid | content | role | preferred_role | mode | status | port | hostname | address | datadir
------+---------+------+----------------+------+--------+------+----------+---------+-----------------------------------
1 | -1 | p | p | n | u | 5432 | mdw | mdw | /home/gpadmin/test/master/gpseg-1
2 | 0 | p | p | n | u | 6000 | sdw1 | sdw1 | /home/gpadmin/test/p1/gpseg0
5 | 3 | p | p | n | u | 6000 | sdw2 | sdw2 | /home/gpadmin/test/p1/gpseg3
3 | 1 | p | p | n | u | 6001 | sdw1 | sdw1 | /home/gpadmin/test/p2/gpseg1
6 | 4 | p | p | n | u | 6001 | sdw2 | sdw2 | /home/gpadmin/test/p2/gpseg4
4 | 2 | p | p | n | u | 6002 | sdw1 | sdw1 | /home/gpadmin/test/p3/gpseg2
7 | 5 | p | p | n | u | 6002 | sdw2 | sdw2 | /home/gpadmin/test/p3/gpseg5
8 | 6 | p | p | n | u | 6000 | sdw3 | sdw3 | /home/gpadmin/test/p1/gpseg6
9 | 7 | p | p | n | u | 6001 | sdw3 | sdw3 | /home/gpadmin/test/p2/gpseg7
10 | 8 | p | p | n | u | 6002 | sdw3 | sdw3 | /home/gpadmin/test/p3/gpseg8
(10 rows)

We can see that one more host (three more segments) has been added.

Going back to the earlier transaction, it can still run queries, unaffected:

testexpand=# select * from t1 limit 1;
   c1    |   c2
---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------
 8972227 | quisquam non perspiciatis in consequatur aliquam. sunt autem officiis officiis laboriosam recusandae quia et. non ullam natus dolorem incidunt fugiat sunt. rem sed quia. rerum quaerat adipisci ea sit sit voluptas ex. ea quia magnam ipsam quam. quis deleniti suscipit perferendis harum laborum quia. dolore quasi in voluptas sequi dolor aut architecto! illum nulla consequatur earum odit optio.
(1 row)
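For contrast, and purely as a hypothetical continuation of this pre-expansion transaction (we do not actually run these statements), the failure cases described earlier would look like this:

create table t_should_fail(c1 int);   -- DDL inside this old transaction is expected to error and abort it
select * from t1 limit 1;             -- likewise expected to error once t1 has been redistributed in phase 2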

Online Expansion Phase 2: Data Redistribution

After phase 1, the data is still distributed only on the original segments. Querying gp_distribution_policy again:

testexpand=# select localoid::regclass::text, * from gp_distribution_policy ;
localoid | localoid | policytype | numsegments | distkey | distclass
----------+----------+------------+-------------+---------+-----------
t1 | 16385 | p | 6 | 1 | 10027
t2 | 16391 | p | 6 | |
t3 | 16394 | r | 6 | |
t4 | 16397 | p | 6 | 1 | 10027
t5 | 16404 | p | 6 | |
(5 rows)

We can see that their numsegments is still 6. Such tables are called partially distributed tables; even without redistributing the data, inserts, deletes, selects, and updates against them work normally. A partially distributed table can still be hash-distributed, and queries on it still benefit from co-location. Nevertheless, we recommend redistributing the data as soon as possible after phase 1 of gpexpand finishes. Any table created after phase 1 is a fully distributed table (that is, its numsegments equals the number of segments in the cluster).
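As a quick check of the last point, a table created after phase 1 (t6 is purely illustrative and not part of the run shown below) should report numsegments = 9 right away:

testexpand=# create table t6(c1 int, c2 int) distributed by (c1);
testexpand=# select localoid::regclass, numsegments from gp_distribution_policy where localoid = 't6'::regclass;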

Calling gpexpand again starts the data redistribution. (Note: you can adjust the order in which tables are redistributed to suit your workload; a sketch of how to do that follows the log below.) Redistribution can run in parallel, and the -n option controls the number of concurrent processes; for example, gpexpand -n 2 uses two. (Note: redistributing a table locks that table, and reads and writes against it are blocked while its data is being moved.) Below we rerun gpexpand to redistribute the tables created above:

[gpadmin@mdw test]$ gpexpand
20190423:16:50:41:028651 gpexpand:mdw:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 6.0.0-alpha.0+dev.16250.ge9cd4c060f build dev'
20190423:16:50:41:028651 gpexpand:mdw:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 9.4.20 (Greenplum Database 6.0.0-alpha.0+dev.16250.ge9cd4c060f build dev) on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit compiled on Apr 23 2019 16:22:57 (with assert checking)'
20190423:16:50:41:028651 gpexpand:mdw:gpadmin-[INFO]:-Querying gpexpand schema for current expansion state
20190423:16:50:41:028651 gpexpand:mdw:gpadmin-[INFO]:-Expanding testexpand.public.t2
20190423:16:50:41:028651 gpexpand:mdw:gpadmin-[INFO]:-Finished expanding testexpand.public.t2
20190423:16:50:41:028651 gpexpand:mdw:gpadmin-[INFO]:-Expanding testexpand.public.t3
20190423:16:50:41:028651 gpexpand:mdw:gpadmin-[INFO]:-Finished expanding testexpand.public.t3
20190423:16:50:42:028651 gpexpand:mdw:gpadmin-[INFO]:-Expanding testexpand.public.t4
20190423:16:50:42:028651 gpexpand:mdw:gpadmin-[INFO]:-Finished expanding testexpand.public.t4
20190423:16:50:42:028651 gpexpand:mdw:gpadmin-[INFO]:-Expanding testexpand.public.t5
20190423:16:50:42:028651 gpexpand:mdw:gpadmin-[INFO]:-Finished expanding testexpand.public.t5
20190423:16:50:43:028651 gpexpand:mdw:gpadmin-[INFO]:-Expanding testexpand.public.t1
20190423:16:50:43:028651 gpexpand:mdw:gpadmin-[INFO]:-Finished expanding testexpand.public.t1
20190423:16:50:46:028651 gpexpand:mdw:gpadmin-[INFO]:-EXPANSION COMPLETED SUCCESSFULLY
20190423:16:50:46:028651 gpexpand:mdw:gpadmin-[INFO]:-Exiting…
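Had we wanted a particular table to be moved first, we could have lowered its rank in the expansion schema before rerunning gpexpand, as mentioned above. A sketch, assuming the gpexpand.status_detail table with its fq_name and rank columns:

testexpand=# update gpexpand.status_detail set rank = 1 where fq_name = 'public.t4';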

After the redistribution completes, querying gp_distribution_policy again shows that every table's numsegments is now 9.

testexpand=# select localoid::regclass::text, * from gp_distribution_policy ;
localoid | localoid | policytype | numsegments | distkey | distclass
----------+----------+------------+-------------+---------+-----------
t2 | 16391 | p | 9 | |
t3 | 16394 | r | 9 | |
t4 | 16397 | p | 9 | 1 | 10027
t5 | 16404 | p | 9 | |
t1 | 16385 | p | 9 | 1 | 10027
(5 rows)
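Repeating the earlier gp_segment_id check would now show rows on segments 6 through 8 as well (output omitted):

testexpand=# select gp_segment_id, count(*) from t1 group by gp_segment_id order by 1;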

Other Issues and Tips

Tool Restrictions During Expansion

During redistribution, gpcheckcat and gpaddpkg are not allowed to run. gpcheckcat's job is to detect potential catalog problems across the cluster and suggest repairs; there is no need to do that while an expansion is in progress, and it can be run after the expansion finishes. In addition, gpconfig and gppkg are not allowed during phase 1 of the expansion, to prevent configuration changes or packages from being applied only to the original segments and not to the new ones, which would leave files or settings inconsistent across the cluster.

Notes During Data Redistribution

  • Do not rename tables
  • Do not change a table's distribution policy
  • Do not modify a table's schema

