标簽
PostgreSQL , postgis , pglz_decompress , perf , tuning
https://github.com/digoal/blog/blob/master/201810/20181001_02.md#%E8%83%8C%E6%99%AF 背景
在空間資料中,通常會有軌迹、點、面的資料,假設有兩張表,一張為面的表,一張為點的表,使用包含 ST_xxxx(c.geom, p.geom) 來進行JOIN(例如以面為機關,聚合統計點的數量)。
本文介紹了空間JOIN的性能分析,瓶頸分析,優化方法。
https://github.com/digoal/blog/blob/master/201810/20181001_02.md#%E5%8E%9F%E6%96%87 原文
http://blog.cleverelephant.ca/2018/09/postgis-external-storage.htmlhttps://github.com/digoal/blog/blob/master/201810/20181001_02.md#%E4%BE%8B%E5%AD%90 例子
測試資料:
Setup
First download some polygons and some points.
Admin 0 - Countries Populated PlacesLoad the shapes into your database.
shp2pgsql -s 4326 -D -I ne_10m_admin_0_countries.shp countries | psql performance
shp2pgsql -s 4326 -D -I ne_10m_populated_places.shp places | psql performance
包含大量POINT的空間對象
SELECT count(*)
FROM countries
WHERE ST_NPoints(geom) > (8192 / 16);
1、使用預設的壓縮格式時,這個空間JOIN查詢,耗時25秒。
SELECT count(*), c.name
FROM countries c
JOIN places p
ON ST_Intersects(c.geom, p.geom)
GROUP BY c.name;
使用PERF或oprofile跟蹤其耗時的代碼
《PostgreSQL 代碼性能診斷之 - OProfile & Systemtap》 《PostgreSQL 源碼性能診斷(perf profiling)指南 - 珍藏級》發現問題是解壓縮的pglz_decompress 接口造成的。
《TOAST,The Oversized-Attribute Storage Technique - 暨存儲格式main, extended, external, plain介紹》2、将空間字段改成非壓縮格式,耗時降到4秒。
-- Change the storage type
ALTER TABLE countries
ALTER COLUMN geom
SET STORAGE EXTERNAL;
-- Force the column to rewrite
UPDATE countries
SET geom = ST_SetSRID(geom, 4326);
vacuum full countries;
-- Re-run the query
SELECT count(*), c.name
FROM countries c
JOIN places p
ON ST_Intersects(c.geom, p.geom)
GROUP BY c.name;
https://github.com/digoal/blog/blob/master/201810/20181001_02.md#%E5%B0%8F%E7%BB%93 小結
1、代碼層面的性能瓶頸分析方法,perf.
2、PostGIS空間相關計算函數
http://postgis.net/docs/manual-dev/reference.html3、資料庫表級存儲格式包括4種:
對于定長的字段類型,存儲格式如下:
PLAIN
prevents either compression or out-of-line storage; furthermore it disables use of single-byte headers for varlena types. This is the only possible strategy for columns of non-TOAST-able data types.
對于變長的字段類型,除了可以使用PLAIN格式,還可以使用如下存儲格式:
EXTENDED
allows both compression and out-of-line storage.
This is the default for most TOAST-able data types.
Compression will be attempted first, then out-of-line storage if the row is still too big.
EXTERNAL
allows out-of-line storage but not compression.
Use of EXTERNAL will make substring operations on wide text and bytea columns faster (at the penalty of increased storage space) because these operations are optimized to fetch only the required parts of the out-of-line value when it is not compressed.
MAIN
allows compression but not out-of-line storage.
(Actually, out-of-line storage will still be performed for such columns, but only as a last resort when there is no other way to make the row small enough to fit on a page.)
4、本文發現的瓶頸為變長字段,壓縮後,解壓縮的pglz_decompress 接口,是以将字段的存儲格式改為非壓縮格式,即提升了大量的性能。