天天看點

PostgreSQL PostGIS point join polygon (by ST_xxxx) - pglz_decompress 性能優化

标簽

PostgreSQL , postgis , pglz_decompress , perf , tuning

https://github.com/digoal/blog/blob/master/201810/20181001_02.md#%E8%83%8C%E6%99%AF 背景

在空間資料中,通常會有軌迹、點、面的資料,假設有兩張表,一張為面的表,一張為點的表,使用包含 ST_xxxx(c.geom, p.geom) 來進行JOIN(例如以面為機關,聚合統計點的數量)。

PostgreSQL PostGIS point join polygon (by ST_xxxx) - pglz_decompress 性能優化

本文介紹了空間JOIN的性能分析,瓶頸分析,優化方法。

https://github.com/digoal/blog/blob/master/201810/20181001_02.md#%E5%8E%9F%E6%96%87 原文

http://blog.cleverelephant.ca/2018/09/postgis-external-storage.html

https://github.com/digoal/blog/blob/master/201810/20181001_02.md#%E4%BE%8B%E5%AD%90 例子

測試資料:

Setup

First download some polygons and some points.

Admin 0 - Countries Populated Places

Load the shapes into your database.

shp2pgsql -s 4326 -D -I ne_10m_admin_0_countries.shp countries | psql performance  
  
shp2pgsql -s 4326 -D -I ne_10m_populated_places.shp places | psql performance  
           

包含大量POINT的空間對象

SELECT count(*)   
  FROM countries   
  WHERE ST_NPoints(geom) > (8192 / 16);  
           

1、使用預設的壓縮格式時,這個空間JOIN查詢,耗時25秒。

SELECT count(*), c.name   
  FROM countries c   
  JOIN places p   
  ON ST_Intersects(c.geom, p.geom)   
  GROUP BY c.name;  
           

使用PERF或oprofile跟蹤其耗時的代碼

《PostgreSQL 代碼性能診斷之 - OProfile & Systemtap》 《PostgreSQL 源碼性能診斷(perf profiling)指南 - 珍藏級》

發現問題是解壓縮的pglz_decompress 接口造成的。

《TOAST,The Oversized-Attribute Storage Technique - 暨存儲格式main, extended, external, plain介紹》

2、将空間字段改成非壓縮格式,耗時降到4秒。

-- Change the storage type  
ALTER TABLE countries  
  ALTER COLUMN geom  
  SET STORAGE EXTERNAL;  
  
-- Force the column to rewrite  
UPDATE countries  
  SET geom = ST_SetSRID(geom, 4326);  
  
vacuum full countries;   
  
-- Re-run the query    
SELECT count(*), c.name   
  FROM countries c   
  JOIN places p   
  ON ST_Intersects(c.geom, p.geom)   
  GROUP BY c.name;  
           

https://github.com/digoal/blog/blob/master/201810/20181001_02.md#%E5%B0%8F%E7%BB%93 小結

1、代碼層面的性能瓶頸分析方法,perf.

2、PostGIS空間相關計算函數

http://postgis.net/docs/manual-dev/reference.html

3、資料庫表級存儲格式包括4種:

對于定長的字段類型,存儲格式如下:

PLAIN     
prevents either compression or out-of-line storage; furthermore it disables use of single-byte headers for varlena types. This is the only possible strategy for columns of non-TOAST-able data types.    
           

對于變長的字段類型,除了可以使用PLAIN格式,還可以使用如下存儲格式:

EXTENDED     
allows both compression and out-of-line storage.     
This is the default for most TOAST-able data types.     
Compression will be attempted first, then out-of-line storage if the row is still too big.    
    
EXTERNAL     
allows out-of-line storage but not compression.     
Use of EXTERNAL will make substring operations on wide text and bytea columns faster (at the penalty of increased storage space) because these operations are optimized to fetch only the required parts of the out-of-line value when it is not compressed.    
    
MAIN     
allows compression but not out-of-line storage.     
(Actually, out-of-line storage will still be performed for such columns, but only as a last resort when there is no other way to make the row small enough to fit on a page.)    
           

4、本文發現的瓶頸為變長字段,壓縮後,解壓縮的pglz_decompress 接口,是以将字段的存儲格式改為非壓縮格式,即提升了大量的性能。

https://github.com/digoal/blog/blob/master/201810/20181001_02.md#%E5%8F%82%E8%80%83 參考