Tags

PostgreSQL, real-time trajectories, IoT, IoV (Internet of Vehicles), GIS, discrete IO, sequential IO, IO amplification, aggregation, PGLZ compression/decompression, IN/OUT function overhead
## Background
In an IoV (Internet of Vehicles) scenario, a trip consists of many points, and the business needs to query the points of a trip as a whole.

But this raises a problem:

A trip contains many points, so a single query has to fetch and return many points (many records).

Because many devices (for example, vehicles) are likely to be active and uploading points at the same time, the records of one trip end up scattered across many blocks of a heap table.

As this analysis shows, the performance problem is IO amplification.
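The effect can be sketched numerically (a back-of-the-envelope model, not from the original article; the row width, trip size, and table size below are all assumptions):

```python
import math
import random

ROWS_PER_BLOCK = 140    # assumption: ~57-byte rows in an 8 KB heap block
TRIP_ROWS = 10_000      # points in one trip
TABLE_BLOCKS = 700_000  # assumption: blocks in the shared heap table

# Heap layout with many concurrently active devices: each point of a trip
# effectively lands in a random block, so reading the trip touches almost
# as many blocks as it has rows.
scattered_blocks = {random.randrange(TABLE_BLOCKS) for _ in range(TRIP_ROWS)}

# Clustered layout: the trip's rows are packed consecutively.
clustered_blocks = math.ceil(TRIP_ROWS / ROWS_PER_BLOCK)

print(len(scattered_blocks), clustered_blocks)
```

Under these assumptions a scattered trip touches on the order of 10,000 blocks, while a clustered one touches fewer than a hundred.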
How can this be optimized?

1. Store trip records physically clustered, similar to a CLUSTER operation.

2. Merge the records of a trip into a single record, stored together; that is, aggregate the many records of one trip into one row.
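Rough arithmetic for the second approach (the sizes are assumptions based on the ~57-byte row width visible in the query plans below): merging a 10,000-point trip into one record turns ~10,000 scattered block reads into a single value spanning only a few dozen consecutive pages:

```python
import math

ROW_BYTES = 57     # assumption: per-point record width
POINTS = 10_000    # points in one trip
PAGE_BYTES = 8192  # PostgreSQL block size

# Size of the merged value and the number of pages it occupies when the
# rows of one trip are aggregated into a single record.
agg_bytes = ROW_BYTES * POINTS
pages = math.ceil(agg_bytes / PAGE_BYTES)

print(agg_bytes, pages)
```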
## Example
The tests below measure the performance effect of several optimizations:

1. cluster
2. array aggregation, compressed
3. array aggregation, uncompressed
4. jsonb aggregation, compressed
5. jsonb aggregation, uncompressed
6. text aggregation, compressed
7. text aggregation, uncompressed
## 1 Initial state
```sql
create unlogged table t_sensor(id serial8 primary key, sid int8, att text, crt_time timestamp);
create index idx_t_sensor_1 on t_sensor (sid, crt_time);
```

vi test.sql

```
\set sid random(1,10000)
insert into t_sensor(sid,att,crt_time) values (:sid, md5(random()::text), now());
```

```
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 50 -j 50 -t 2000000
```
Querying one trip touches 11,227 data blocks:
```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_sensor where sid=1 order by crt_time;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t_sensor_1 on public.t_sensor  (cost=0.49..253.36 rows=11121 width=57) (actual time=0.020..47.591 rows=11189 loops=1)
   Output: id, sid, att, crt_time
   Index Cond: (t_sensor.sid = 1)
   Buffers: shared hit=3406 read=7821
 Planning Time: 0.092 ms
 Execution Time: 48.303 ms
(6 rows)

postgres=# select 119*11227*8/1024.0;
      ?column?
--------------------
 10437.601562500000
(1 row)
```
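The arithmetic above estimates the scan bandwidth this layout implies: at the ~119 trip queries per second measured in the stress test below, each touching 11,227 blocks of 8 KB, the database has to read roughly 10 GB/s:

```python
qps = 119                # trip queries per second (from the pgbench run)
blocks_per_query = 11227 # blocks touched per trip query
block_kb = 8             # PostgreSQL block size in KB

# MB scanned per second across all concurrent trip queries.
mb_per_second = qps * blocks_per_query * block_kb / 1024
print(round(mb_per_second, 1))
```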
Stress-test performance:
vi test.sql

```
\set sid random(1,10000)
select * from t_sensor where sid=:sid order by crt_time ;
```

```
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 56 -j 56 -T 120

transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 56
number of threads: 56
duration: 120 s
number of transactions actually processed: 14307
latency average = 470.175 ms
latency stddev = 43.500 ms
tps = 119.037842 (including connections establishing)
tps = 119.098221 (excluding connections establishing)
statement latencies in milliseconds:
         0.003  \set sid random(1,10000)
       470.178  select * from t_sensor where sid=:sid order by crt_time ;
```
## 2 cluster
Physically cluster the table by the (trip, time) index, so that one trip's records are stored contiguously:

```sql
postgres=# cluster t_sensor USING idx_t_sensor_1 ;
```
Querying one trip now scans only 174 blocks; IO drops sharply:
```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_sensor where sid=1 order by crt_time;
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t_sensor_1 on public.t_sensor  (cost=0.49..253.39 rows=11149 width=57) (actual time=0.018..2.212 rows=11189 loops=1)
   Output: id, sid, att, crt_time
   Index Cond: (t_sensor.sid = 1)
   Buffers: shared hit=174
 Planning Time: 0.094 ms
 Execution Time: 2.816 ms
(6 rows)
```
Stress-test results are listed in the summary at the end.
## 3 array, compressed
```sql
create unlogged table t_sensor_agg(sid int8 primary key, agg t_sensor[]);

insert into t_sensor_agg select sid,array_agg(t_sensor order by crt_time) from t_sensor group by sid;
```
Bottleneck: the array type's INPUT/OUTPUT functions, plus compression and decompression (pglz):
```
Samples: 3M of event 'cpu-clock', Event count (approx.): 559749813667
Overhead  Shared Object       Symbol
  24.59%  postgres            [.] array_out
  20.70%  postgres            [.] record_out
   6.74%  postgres            [.] pglz_decompress
   3.78%  libc-2.17.so        [.] __memcpy_ssse3_back
   2.72%  postgres            [.] pg_ltostr_zeropad
   2.34%  [kernel]            [k] run_timer_softirq
   2.23%  postgres            [.] pg_lltoa
   2.13%  postgres            [.] 0x000000000035c350
   1.76%  postgres            [.] heap_deform_tuple
   1.49%  libc-2.17.so        [.] __strlen_sse2_pminub
   1.36%  postgres            [.] appendStringInfoChar
   1.36%  [kernel]            [k] copy_user_enhanced_fast_string
   1.29%  postgres            [.] 0x000000000035c36c
   1.28%  postgres            [.] 0x000000000035c362
   1.17%  postgres            [.] FunctionCall1Coll
   0.92%  postgres            [.] hash_search_with_hash_value
   0.86%  [kernel]            [k] _raw_spin_unlock_irqrestore
   0.84%  postgres            [.] j2date
   0.82%  postgres            [.] 0x000000000035c357
   0.76%  postgres            [.] palloc
   0.76%  postgres            [.] lookup_type_cache
   0.67%  postgres            [.] 0x000000000035c360
   0.66%  postgres            [.] timestamp2tm
   0.64%  [kernel]            [k] rcu_process_callbacks
   0.64%  [kernel]            [k] __do_softirq
```
The stress-test script (querying the array table created above):

vi test.sql

```
\set sid random(1,10000)
select * from t_sensor_agg where sid=:sid ;
```
## 4 array, uncompressed
```sql
create unlogged table t_sensor_agg1(sid int8 primary key, agg t_sensor[]);

alter table t_sensor_agg1 alter column agg set storage external;

insert into t_sensor_agg1 select sid,array_agg(t_sensor order by crt_time) from t_sensor group by sid;
```
Bottleneck: the array type's INPUT/OUTPUT functions:
```
Samples: 1M of event 'cpu-clock', Event count (approx.): 310127790569
Overhead  Shared Object       Symbol
  26.06%  postgres            [.] array_out
  21.44%  postgres            [.] record_out
   4.20%  libc-2.17.so        [.] __memcpy_ssse3_back
   2.86%  [kernel]            [k] run_timer_softirq
   2.75%  postgres            [.] pg_ltostr_zeropad
   2.65%  postgres            [.] heap_deform_tuple
   2.28%  postgres            [.] pg_lltoa
   2.14%  postgres            [.] 0x000000000035c350
   1.87%  [kernel]            [k] copy_user_enhanced_fast_string
   1.52%  libc-2.17.so        [.] __strlen_sse2_pminub
   1.47%  postgres            [.] appendStringInfoChar
   1.32%  postgres            [.] 0x000000000035c36c
   1.30%  postgres            [.] 0x000000000035c362
   1.20%  postgres            [.] FunctionCall1Coll
   1.11%  postgres            [.] hash_search_with_hash_value
   0.87%  postgres            [.] j2date
   0.85%  postgres            [.] 0x000000000035c357
   0.81%  [kernel]            [k] _raw_spin_unlock_irqrestore
   0.76%  postgres            [.] lookup_type_cache
   0.75%  postgres            [.] 0x000000000046d33b
   0.74%  postgres            [.] palloc
   0.72%  [kernel]            [k] rcu_process_callbacks
   0.68%  postgres            [.] timestamp2tm
   0.68%  postgres            [.] pfree
```
## 5 jsonb, compressed
```sql
create unlogged table t_sensor_agg2(sid int8 primary key, agg jsonb);

insert into t_sensor_agg2 select sid,jsonb_agg(t_sensor order by crt_time) from t_sensor group by sid;
```
## 6 jsonb, uncompressed
```sql
create unlogged table t_sensor_agg3(sid int8 primary key, agg jsonb);

alter table t_sensor_agg3 alter column agg set storage external;

insert into t_sensor_agg3 select sid,jsonb_agg(t_sensor order by crt_time) from t_sensor group by sid;
```
## 7 text, compressed
```sql
create unlogged table t_sensor_agg4(sid int8 primary key, agg text);

insert into t_sensor_agg4 select sid,string_agg(t_sensor::text, '|' order by crt_time) from t_sensor group by sid;
```
## 8 text, uncompressed
```sql
create unlogged table t_sensor_agg5(sid int8 primary key, agg text);

alter table t_sensor_agg5 alter column agg set storage external;

insert into t_sensor_agg5 select sid,string_agg(t_sensor::text, '|' order by crt_time) from t_sensor group by sid;
```
## 9 index only scan (clustered-table-like effect)
Put every column the business queries into the index key; adjacent entries then sit in the same index pages, giving an effect similar to a clustered table. Queries use the index-only-scan method and touch the fewest blocks.

Note: a btree index entry cannot exceed roughly 1/3 of a page (the btree pages form a doubly linked list, and each page must be able to hold at least a minimum number of valid entries, hence this limit).
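As a rough figure (the real ceiling is somewhat lower once page and tuple headers are subtracted, and depends on the PostgreSQL version), a third of an 8 KB page is:

```python
PAGE_BYTES = 8192  # default PostgreSQL block size

# A btree page must be able to hold several entries, so a single index
# tuple is capped at roughly a third of a page (this estimate ignores
# page and tuple header overhead).
approx_limit = PAGE_BYTES // 3
print(approx_limit)
```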
Write the data:
```sql
create table t_sensor (id serial8 primary key, sid int8, att text, crt_time timestamp);
create index idx_t_sensor_1 on t_sensor (sid, crt_time, att, id);
```

vi test.sql

```
\set sid random(1,10000)
insert into t_sensor(sid,att,crt_time) values (:sid, md5(random()::text), now());
```
```
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 50 -j 50 -t 2000000

transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 50
number of threads: 50
number of transactions per client: 2000000
number of transactions actually processed: 100000000/100000000
latency average = 0.193 ms
latency stddev = 0.461 ms
tps = 257995.418591 (including connections establishing)
tps = 258024.212148 (excluding connections establishing)
statement latencies in milliseconds:
         0.001  \set sid random(1,10000)
         0.192  insert into t_sensor(sid,att,crt_time) values (:sid, md5(random()::text), now());
```
Generate the visibility map (autovacuum would eventually do this automatically; run it by hand to see the effect immediately):

```sql
vacuum analyze t_sensor;
```
With an index-only scan, IO drops as follows:
```
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t_sensor where sid=2 order by crt_time;
                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_t_sensor_1 on public.t_sensor  (cost=0.60..70.41 rows=9960 width=57) (actual time=0.019..2.109 rows=9978 loops=1)
   Output: id, sid, att, crt_time
   Index Cond: (t_sensor.sid = 2)
   Heap Fetches: 0
   Buffers: shared hit=235
 Planning Time: 0.090 ms
 Execution Time: 2.652 ms
(7 rows)
```
Query performance:
vi test.sql

```
\set sid random(1,10000)
select * from t_sensor where sid=:sid order by crt_time;
```

```
pgbench -M prepared -n -r -P 1 -f ./test.sql -c 28 -j 28 -T 120

transaction type: ./test.sql
scaling factor: 1
query mode: prepared
number of clients: 28
number of threads: 28
duration: 120 s
number of transactions actually processed: 283638
latency average = 11.844 ms
latency stddev = 1.931 ms
tps = 2363.410561 (including connections establishing)
tps = 2363.913145 (excluding connections establishing)
statement latencies in milliseconds:
         0.002  \set sid random(1,10000)
        11.842  select * from t_sensor where sid=:sid order by crt_time;
```
## Summary
When the target data is scattered across many blocks, causing IO amplification, clustered storage or aggregated storage solves the problem well.

After aggregation, the remaining bottleneck is the IN/OUT function interface of the aggregated type (array, jsonb, text).
| / | table size | trip-query qps |
|---|---|---|
| original (IO amplification) | 8880 MB | 119 |
| cluster (no IO amplification) | | 2057 |
| index only scan, clustered-table-like (no IO amplification) | | 2363 |
| array aggregation (compressed) | 4523 MB | 2362 |
| array aggregation (uncompressed) | 8714 MB | 2515 |
| jsonb aggregation (compressed) | 5052 MB | 3102 |
| jsonb aggregation (uncompressed) | 13 GB | 3184 |
| text aggregation (compressed) | 4969 MB | 6057 |
| text aggregation (uncompressed) | 7692 MB | 5997 |
The tests above show the overhead ordering of the IN/OUT function interfaces: text < jsonb < array (composite-type array).

For real-world optimization examples, see the articles referenced at the end, for instance:

1. Partition by time; CLUSTER old partitions to lay data out by trip, switching between A/B tables, to eliminate IO amplification.

2. Asynchronous aggregation: roll point data up by trip ID, in near real time, into an aggregated trip table.

3. Use INDEX ONLY SCAN to get the clustered-table effect with no intrusion into the business logic (for example, partition by day and build a covering index on all columns the business queries). This costs roughly double the storage (one copy in the heap, one in the index).