标簽
PostgreSQL , 并行計算
https://github.com/digoal/blog/blob/master/201812/20181218_01.md#%E8%83%8C%E6%99%AF 背景
PostgreSQL 并行計算原理、應用參考:
《PostgreSQL 多場景 沙箱實驗》https://github.com/digoal/blog/blob/master/201812/20181218_01.md#%E4%BC%98%E5%8C%96%E5%99%A8%E5%B9%B6%E8%A1%8C%E8%AE%A1%E7%AE%97%E7%9A%84%E5%B9%B6%E8%A1%8C%E5%BA%A6%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95 優化器并行計算的并行度計算方法
1、總worker程序數
postgres=# show ;
max_worker_processes
----------------------
128
(1 row)
2、所有會話,在同一時刻的QUERY,并行計算最大允許開啟的WORKER數。
max_parallel_workers
3、單條QUERY中,每個node最多允許開啟的并行計算WORKER數
postgres=# show max_parallel_workers_per_gather ;
max_parallel_workers_per_gather
---------------------------------
0
(1 row)
4、單個query, node的并行度
Min(parallel_workers(表級設定,沒有設定則,根據表大小計算得到), max_parallel_workers_per_gather)
5、表級并行度參數,預設不設定,從表大小計算。
postgres=# alter table pa set (parallel_workers =32);
ALTER TABLE
6、真實并行度算法
min (max_worker_processes - 已運作workers ,
max_parallel_workers - 其他會話目前真實啟用的并行度 ,
Min(parallel_workers(表級設定,沒有設定則,根據表大小計算得到), max_parallel_workers_per_gather)
)
https://github.com/digoal/blog/blob/master/201812/20181218_01.md#%E4%BC%98%E5%8C%96%E5%99%A8%E6%98%AF%E5%90%A6%E9%80%89%E6%8B%A9%E5%B9%B6%E8%A1%8C%E8%AE%A1%E7%AE%97 優化器是否選擇并行計算
優化器是否使用并行計算,取決于CBO,選擇成本最低的方法,并行計算成本估算,成本因子參數如下:
postgres=# show parallel_tuple_cost ;
parallel_tuple_cost
---------------------
0
(1 row)
postgres=# show parallel_setup_cost ;
parallel_setup_cost
---------------------
0
(1 row)
如果非并行計算的執行計劃成本低于并行計算的成本,則不使用并行計算。
https://github.com/digoal/blog/blob/master/201812/20181218_01.md#%E4%BC%98%E5%8C%96%E5%99%A8%E6%98%AF%E5%90%A6%E5%BF%BD%E7%95%A5%E5%B9%B6%E8%A1%8C%E8%AE%A1%E7%AE%97 優化器是否忽略并行計算
如果表掃描或索引掃描的表或索引低于設定的門檻值,這個表掃描或索引掃描則不啟用并行計算。
postgres=# show min_parallel_table_scan_size ;
min_parallel_table_scan_size
------------------------------
0
(1 row)
postgres=# show min_parallel_index_scan_size ;
min_parallel_index_scan_size
------------------------------
0
(1 row)
https://github.com/digoal/blog/blob/master/201812/20181218_01.md#%E4%BC%98%E5%8C%96%E5%99%A8%E5%BC%BA%E5%88%B6%E9%80%89%E6%8B%A9%E5%B9%B6%E8%A1%8C%E8%AE%A1%E7%AE%97%E5%8F%82%E6%95%B0 優化器強制選擇并行計算參數
#force_parallel_mode = on
https://github.com/digoal/blog/blob/master/201812/20181218_01.md#%E5%B9%B6%E8%A1%8C%E8%AE%A1%E7%AE%97%E7%9B%B8%E5%85%B3%E5%8F%82%E6%95%B0 并行計算相關參數
1、建立索引,CREATE TABLE AS,SELECT INTO 的并行度
postgres=# show max_parallel_maintenance_workers ;
max_parallel_maintenance_workers
----------------------------------
24
(1 row)
2、并行分區表JOIN
#enable_partitionwise_join = on
3、并行分區表分區聚合
#enable_partitionwise_aggregate = on
4、并行HASH計算
#enable_parallel_hash = on
5、LEADER主動擷取并行WORKER的傳回結果
parallel_leader_participation = on
6、并行APPEND(分區表),UNION ALL查詢
#enable_parallel_append = on
https://github.com/digoal/blog/blob/master/201812/20181218_01.md#%E5%BC%BA%E5%88%B6%E5%B9%B6%E8%A1%8C 強制并行
強制并行度24
1、總的可開啟的WORKER足夠大
postgres=# show max_worker_processes ;
max_worker_processes
----------------------
128
(1 row)
2、所有會話同時執行并行計算的并行度足夠大
postgres=# set max_parallel_workers=64;
SET
3、單個QUERY中并行計算NODE開啟的WORKER=24
postgres=# set max_parallel_workers_per_gather =24;
SET
4、所有表和索引掃描允許并行
postgres=# set min_parallel_table_scan_size =0;
SET
postgres=# set min_parallel_index_scan_size =0;
SET
5、并行計算優化器成本設定為0
postgres=# set parallel_tuple_cost =0;
SET
postgres=# set parallel_setup_cost =0;
SET
6、設定表級并行度為24
postgres=# alter table pa set (parallel_workers =24);
ALTER TABLE
7、效果,強制24并行。
postgres=# explain (analyze) select count(*) from pa;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=1615.89..1615.89 rows=1 width=8) (actual time=81.711..81.711 rows=1 loops=1)
-> Gather (cost=1615.83..1615.83 rows=24 width=8) (actual time=81.572..90.278 rows=25 loops=1)
Workers Planned: 24
Workers Launched: 24
-> Partial Aggregate (cost=1615.83..1615.83 rows=1 width=8) (actual time=58.411..58.411 rows=1 loops=25)
-> Parallel Seq Scan on pa (cost=0.00..712.71 rows=416667 width=0) (actual time=0.012..35.428 rows=400000 loops=25)
Planning Time: 0.449 ms
Execution Time: 90.335 ms
(8 rows)
https://github.com/digoal/blog/blob/master/201812/20181218_01.md#%E5%87%BD%E6%95%B0%E5%B9%B6%E8%A1%8C 函數并行
1、并行函數
create or replace function ftest(int) returns boolean as $$
select $1<1000;
$$ language sql strict
parallel safe;
-- parallel safe 文法
2、并行聚合函數
combinefunc
《PostgreSQL 11 preview - 多階段并行聚合array_agg, string_agg》 《PostgreSQL Oracle 相容性之 - 自定義并行聚合函數 PARALLEL_ENABLE AGGREGATE》 《PostgreSQL 10 自定義并行計算聚合函數的原理與實踐 - (含array_agg合并多個數組為單個一進制數組的例子)》