标签
PostgreSQL , rotate table , 自动清理 , 触发器 , insert into on conflict
https://github.com/digoal/blog/blob/master/201803/20180311_06.md#%E8%83%8C%E6%99%AF 背景
时序场景,这个需求很常见:
1、自动清理过期的历史数据,并且要保证清理速度够快,不产生WAL日志。
要么DROP,要么truncate。
2、我们知道PG支持分区表,如果是通过分区的方法来实现,最好不过了,但是需要不停的创建分区,这个目前pg_pathman能够满足自动创建分区。但是自动删除分区还是不够自动。
所以怎么做呢?
https://github.com/digoal/blog/blob/master/201803/20180311_06.md#%E4%B8%80%E4%B8%AA%E5%81%9A%E6%B3%95%E6%98%AF%E8%BF%99%E6%A0%B7%E7%9A%84 一个做法是这样的:
1、固定一个周期的所有分区表(类似list分区),比如以小时为结尾的表,只需要24个。以周为单位的表,只需要7个。以分钟为单位的表,需要60个。
2、自动根据数据插入的时间,触发,并清理(truncate)旧的数据。
这种方法,写入代码是固定的不需要变更。
数据表是一次性创建的,不需要后续再增加。
https://github.com/digoal/blog/blob/master/201803/20180311_06.md#%E4%BE%8B%E5%AD%90 例子
1、创建周期内的所有明细分区表
以10分钟为单位,创建6个表,代表每个10分钟。
create table t1_0 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='0')); create table t1_1 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='1')); create table t1_2 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='2')); create table t1_3 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='3')); create table t1_4 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='4')); create table t1_5 (id int, info text, crt_time timestamp, check (substring(to_char(crt_time,'yyyymmddhh24miss'), 11, 1)='5'));
2、创建约束表,或者说调度表(因为我们不能每条记录都触发一个TRUNCATE事件)
以10分钟为周期,清除10分钟前的数据。
create table t_const(crt_time timestamp primary key);
3、创建分区表的触发器,将数据按周期截断后,写入约束表。
create or replace function tg() returns trigger as $$ declare begin insert into t_const values (to_timestamp(substring(to_char(NEW.crt_time,'yyyymmddhh24miss'), 1, 11)||'000','yyyymmddhh24miss')) on conflict (crt_time) do nothing; return null; end; $$ language plpgsql strict; create trigger tg1 after insert on t1_0 for each row execute procedure tg(); create trigger tg1 after insert on t1_1 for each row execute procedure tg(); create trigger tg1 after insert on t1_2 for each row execute procedure tg(); create trigger tg1 after insert on t1_3 for each row execute procedure tg(); create trigger tg1 after insert on t1_4 for each row execute procedure tg(); create trigger tg1 after insert on t1_5 for each row execute procedure tg();
4、创建约束表触发器,触发清除明细表分区的规则。
create or replace function tg_truncate() returns trigger as $$ declare suffix int := substring(to_char(NEW.crt_time,'yyyymmddhh24miss'), 11, 1)::int; begin set lock_timeout = '1s'; for i in 0..5 loop if i=suffix then continue; end if; if suffix=0 and i=5 then continue; end if; if i=suffix-1 then continue; end if; execute 'truncate t1_'||i; raise notice 'truncated %', 't1_'||i; end loop; return null; end; $$ language plpgsql strict;
create trigger tg1 after insert on t_const for each row execute procedure tg_truncate();
试一试
自动清除
postgres=# insert into t1_0 values (1,now(),'2018-01-02 10:00:40'); NOTICE: truncated t1_1 NOTICE: truncated t1_2 NOTICE: truncated t1_3 NOTICE: truncated t1_4 INSERT 0 1
如果后面再写入同一个区,不会触发自动清除,符合要求(仅第一条触发)。
postgres=# insert into t1_0 values (1,now(),'2018-01-02 10:00:40'); INSERT 0 1 postgres=# insert into t1_0 values (1,now(),'2018-01-02 10:01:50'); INSERT 0 1
https://github.com/digoal/blog/blob/master/201803/20180311_06.md#%E5%B0%8F%E7%BB%93 小结
使用分区,实现了数据的分区存放。(目前如果使用原生分区表的话,写入会对主表和所有子分区加共享锁,导致无法truncate。建议方法:使用pg_pathman分区,或者直接写子分区。)
《分区表锁粒度差异 - pg_pathman VS native partition table》使用TRUNCATE,使得清理数据时不会产生WAL日志。
使用触发器,实现了自动的数据清理。