关于Greenplum数据库中几条普通数据库日志的思考

作者:阿福 Greenplum中文社区成员,山东辉鸿泛在电子科技有限公司技术总监

Greenplum通过Master接受外部增删改查操作,默认情况下,所有增删改查操作都会记录到$MASTER_DATA_DIRECTORY/pg_log下对应的文件中。

1.有时候查看Greenplum Master日志的时候,会发现有一些LOG信息类似如下:

2020-09-09 20:09:56.760795 CST,"gposs5","postgres",p32314,th-591984576,"10.211.55.2","61196",2020-09-09 20:09:56 CST,0,con10,cmd1,seg-1,,dx4,,sx1,"LOG","00000","statement: BEGIN",,,,,,"BEGIN",0,,"postgres.c",1590, 
2020-09-09 20:09:56.769845 CST,"gposs5","postgres",p32314,th-591984576,"10.211.55.2","61196",2020-09-09 20:09:56 CST,0,con10,cmd2,seg-1,,dx4,,sx1,"LOG","00000","statement: select 2020-09-08",,,,,,"select 2020-09-08",0,,"postgres.c",1590, 
2020-09-09 20:09:56.771312 CST,"gposs5","postgres",p32314,th-591984576,"10.211.55.2","61196",2020-09-09 20:09:56 CST,0,con11,,seg-1,,dx4,,sx1,"LOG","00000","The previous session was reset because its gang was disconnected (session id = 10). The new session id = 11",,,,,,,0,,"cdbgang.c",1710,

那么这些日志为何会产生呢?下面我们通过一段简单的python代码来复现几个场景。

2.代码部分如下:

update1_sql = "select %s"
conn1 = Tools.gp_connect()
cur1 = conn1.cursor()
cur1.execute(update1_sql % "'2020-09-08'")
row = cur1.fetchone()
cur1.close()
print(row[0])
conn1.commit()
conn1.close()

time.sleep(1000000)

此段代码为正常执行的代码,后台数据库产生的日志如下,有begin;正常执行查询语句;commit;

2020-09-09 20:30:30.898275 CST,"gposs5","postgres",p4116,th-591984576,"10.211.55.2","64416",2020-09-09 20:30:30 CST,0,con41,cmd1,seg-1,,dx26,,sx1,"LOG","00000","statement: BEGIN",,,,,,"BEGIN",0,,"postgres.c",1590,
2020-09-09 20:30:30.909131 CST,"gposs5","postgres",p4116,th-591984576,"10.211.55.2","64416",2020-09-09 20:30:30 CST,0,con41,cmd2,seg-1,,dx26,,sx1,"LOG","00000","statement: select '2020-09-08'",,,,,,"select '2020-09-08'",0,,"postgres.c",1590,
2020-09-09 20:30:30.909766 CST,"gposs5","postgres",p4116,th-591984576,"10.211.55.2","64416",2020-09-09 20:30:30 CST,0,con41,cmd4,seg-1,,dx26,,sx1,"LOG","00000","statement: COMMIT",,,,,,"COMMIT",0,,"postgres.c",1590,

错误场景一:

我们把代码中的commit部分注释掉,如下:

update1_sql = "select %s"
conn1 = Tools.gp_connect()
cur1 = conn1.cursor()
cur1.execute(update1_sql % "'2020-09-08'")
row = cur1.fetchone()
cur1.close()
print(row[0])
#conn1.commit()
conn1.close()

time.sleep(1000000)

执行上面代码,可以正常获取查询结果,但是实际上我们开启事务后并没有正常关闭,然后就直接把数据库连接关了,有点粗暴啊。此时,数据库日志如下:

2020-09-09 20:34:31.100551 CST,"gposs5","postgres",p4869,th-591984576,"10.211.55.2","65043",2020-09-09 20:34:31 CST,0,con42,cmd1,seg-1,,dx27,,sx1,"LOG","00000","statement: BEGIN",,,,,,"BEGIN",0,,"postgres.c",1590,
2020-09-09 20:34:31.116187 CST,"gposs5","postgres",p4869,th-591984576,"10.211.55.2","65043",2020-09-09 20:34:31 CST,0,con42,cmd2,seg-1,,dx27,,sx1,"LOG","00000","statement: select '2020-09-08'",,,,,,"select '2020-09-08'",0,,"postgres.c",1590,
2020-09-09 20:34:31.117199 CST,"gposs5","postgres",p4869,th-591984576,"10.211.55.2","65043",2020-09-09 20:34:31 CST,0,con43,,seg-1,,dx27,,sx1,"LOG","00000","The previous session was reset because its gang was disconnected (session id = 42). The new session id = 43",,,,,,,0,,"cdbgang.c",1710,

The previous session was reset because its gang was disconnected,数据库重置了该会话。

错误场景二:

我们把代码中的commit部分放开,把连接close的代码注释掉:

update1_sql = "select %s"
conn1 = Tools.gp_connect()
cur1 = conn1.cursor()
cur1.execute(update1_sql % "'2020-09-08'")
row = cur1.fetchone()
cur1.close()
print(row[0])
conn1.commit()
# conn1.close()

time.sleep(1000000)

此时是一个完整的事务,只是说我们没有把该连接关掉,还可以在该代码中执行一些其他的操作,但是最终,这个链接是一定要关掉的,否则会耗尽数据库资源:

2020-09-09 20:38:43.982689 CST,"gposs5","postgres",p5655,th-591984576,"10.211.55.2","49315",2020-09-09 20:38:43 CST,0,con44,cmd1,seg-1,,dx28,,sx1,"LOG","00000","statement: BEGIN",,,,,,"BEGIN",0,,"postgres.c",1590,
2020-09-09 20:38:43.996448 CST,"gposs5","postgres",p5655,th-591984576,"10.211.55.2","49315",2020-09-09 20:38:43 CST,0,con44,cmd2,seg-1,,dx28,,sx1,"LOG","00000","statement: select '2020-09-08'",,,,,,"select '2020-09-08'",0,,"postgres.c",1590,
2020-09-09 20:38:43.996787 CST,"gposs5","postgres",p5655,th-591984576,"10.211.55.2","49315",2020-09-09 20:38:43 CST,0,con44,cmd4,seg-1,,dx28,,sx1,"LOG","00000","statement: COMMIT",,,,,,"COMMIT",0,,"postgres.c",1590,

错误场景三:

这个场景比较有意思,也是最值得我们注意的一个错误场景,如下,我们把commit和close操作都注释掉:

update1_sql = "select %s"
conn1 = Tools.gp_connect()
cur1 = conn1.cursor()
cur1.execute(update1_sql % "'2020-09-08'")
row = cur1.fetchone()
cur1.close()
print(row[0])
# conn1.commit()
# conn1.close()

time.sleep(1000000)

此时,该代码也能正常执行,控制台也有正确的结果返回。后台数据库日志如下,只有begin和sql查询,没有对应的commit语句。

2020-09-09 20:42:15.271391 CST,"gposs5","postgres",p6347,th-591984576,"10.211.55.2","49861",2020-09-09 20:42:15 CST,0,con47,cmd1,seg-1,,dx32,,sx1,"LOG","00000","statement: BEGIN",,,,,,"BEGIN",0,,"postgres.c",1590,
2020-09-09 20:42:15.284936 CST,"gposs5","postgres",p6347,th-591984576,"10.211.55.2","49861",2020-09-09 20:42:15 CST,0,con47,cmd2,seg-1,,dx32,,sx1,"LOG","00000","statement: select '2020-09-08'",,,,,,"select '2020-09-08'",0,,"postgres.c",1590,

此时我们查询活动视图,发现该事务没有正常关闭:

postgres=# select * from pg_stat_activity;
 datid | datname  | procpid | sess_id | usesysid | usename |          current_query          | waiting |          query_start          |         backend_start
   | client_addr | client_port | application_name |          xact_start           | waiting_reason | rsgid | rsgname | rsgqueueduration
-------+----------+---------+---------+----------+---------+---------------------------------+---------+-------------------------------+----------------------------
---+-------------+-------------+------------------+-------------------------------+----------------+-------+---------+------------------
 12094 | postgres |    6347 |      47 |       10 | gposs5  | <IDLE> in transaction           | f       | 2020-09-09 20:42:15.284855+08 | 2020-09-09 20:42:15.268696+
08 | 10.211.55.2 |       49861 |                  | 2020-09-09 20:42:15.271253+08 |                |     0 | unknown |
 12094 | postgres |    3096 |      25 |       10 | gposs5  | select * from pg_stat_activity; | f       | 2020-09-09 20:45:39.149973+08 | 2020-09-09 20:25:17.0897+08
   |             |          -1 | psql             | 2020-09-09 20:45:39.149973+08 |                |     0 | unknown |
(2 rows)

那么这个场景就有意思了,如果你不提交,也不关闭连接,此时只能听天由命了,万一来一个DDL语句,马上就会锁表,所有DDL后面的查询都会等待锁而处于假死状态。

从上面的几个例子可以看到,在python里面,如果没有显示的提交,便会产生上面的一些提示日志,这种情况,可以扩展到任何将查询作为一个单独事务的语言或框架中,这就要求我们在平时代码编写时,养成良好的数据库提交习惯。

备注:以上问题是在gpdb5.x上面复现的,在5.x的较新版本及6.x中,存在一个idle in transaction处理参数,用于在数据库层面上处理这种未正常结束的事务。无论怎样,最好的处理方式还是正确的使用代码来访问数据库。

苑泽福,Greenplum中文社区成员,山东辉鸿泛在电子科技有限公司技术总监。从业十几年,曾供职于鼎兴达、瀚高,拥有丰富的数据库开发运维经验,近年来一直专注于Greenplum数据库,主导并完成了多个基于Greenplum的数据平台落地。目前专注于物联网产品研发,处于创业阶段。希望能用有限的时间,为Greenplum开源社区作出微薄贡献。业余时间喜欢跑步、踢球。

关注微信公众号

VMware 中国研发中心