标簽
PostgreSQL , 一鍵診斷
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E8%83%8C%E6%99%AF 背景
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%80%E8%AF%8A%E6%96%AD%E9%A1%B9%E7%B1%BB%E7%9B%AE---%E8%A7%84%E6%A0%BC%E9%85%8D%E7%BD%AE 一、診斷項類目 - 規格、配置
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%80-%E8%A7%84%E6%A0%BC 一 規格
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E6%95%B0%E6%8D%AE%E5%BA%93%E5%AE%9E%E4%BE%8B%E8%A7%84%E6%A0%BC 1 資料庫執行個體規格
檢測方法
規格配置是否與實際配置相符
cpu
mem
iops
network bandwidth
disk size
告警規則
處理方法
背景知識
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%BA%8C-%E9%85%8D%E7%BD%AE 二 配置
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-roledb%E7%BA%A7%E5%AE%9A%E5%88%B6%E9%85%8D%E7%BD%AE 1 ROLE,DB級定制配置
select * from pg_db_role_setting;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E6%95%B0%E6%8D%AE%E5%BA%93%E9%9D%9E%E9%BB%98%E8%AE%A4%E9%85%8D%E7%BD%AE 2 資料庫非預設配置
select source,name,setting,unit from pg_settings where source<>'default';
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%BA%8C%E8%AF%8A%E6%96%AD%E9%A1%B9%E7%B1%BB%E7%9B%AE---%E7%8A%B6%E6%80%81%E8%AF%8A%E6%96%AD 二、診斷項類目 - 狀态診斷
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%80-%E6%A8%A1%E5%9D%97%E7%8A%B6%E6%80%81 一 子產品狀态
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-ha-%E6%A8%A1%E5%9D%97%E7%8A%B6%E6%80%81 1 HA 子產品狀态
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-agent%E7%8A%B6%E6%80%81 2 agent狀态
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-%E5%A4%87%E4%BB%BD%E6%A8%A1%E5%9D%97%E7%8A%B6%E6%80%81 3 備份子產品狀态
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#4-%E5%BD%92%E6%A1%A3%E6%97%A5%E5%BF%97%E4%B8%8A%E4%BC%A0%E6%A8%A1%E5%9D%97%E7%8A%B6%E6%80%81 4 歸檔日志上傳子產品狀态
或
pg_stat_archiver
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#5-%E5%AE%A1%E8%AE%A1%E6%97%A5%E5%BF%97%E4%B8%8A%E4%BC%A0%E6%A8%A1%E5%9D%97%E7%8A%B6%E6%80%81 5 審計日志上傳子產品狀态
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#6-%E4%BB%BB%E5%8A%A1%E6%A8%A1%E5%9D%97%E4%BB%A5%E5%8F%8A%E4%BB%BB%E5%8A%A1%E8%BF%90%E8%A1%8C%E7%8A%B6%E6%80%81 6 任務子產品以及任務運作狀态
使用者任務
備份任務
... ...
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%BA%8C-%E6%95%B0%E6%8D%AE%E5%BA%93%E7%8A%B6%E6%80%81 二 資料庫狀态
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E5%BF%83%E8%B7%B3%E7%8A%B6%E6%80%81 1 心跳狀态
資料庫讀寫是否正常
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E8%BF%9E%E6%8E%A5%E6%95%B0%E6%B4%BB%E8%B7%83%E5%89%A9%E4%BD%99 2 連接配接數(活躍、剩餘)
select current_setting('max_connections')::int - current_setting('superuser_reserved_connections')::int max_conn;
select state, count(*) from pg_stat_activity group by 1;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-%E5%AE%B9%E7%81%BE%E8%8A%82%E7%82%B9%E7%8A%B6%E6%80%81 3 容災節點狀态
1、容災節點是否存在,
2、pg_stat_replication.state 狀态值
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#4-%E5%A4%87%E5%BA%93%E7%8A%B6%E6%80%81 4 備庫狀态
1、備庫是否存在,
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%89-%E5%BB%B6%E8%BF%9F 三 延遲
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E5%AE%B9%E7%81%BE%E8%8A%82%E7%82%B9%E5%BB%B6%E8%BF%9F 1 容災節點延遲
send,write,apply 延遲
還有多少WAL SIZE有容災節點重搭風險. (wal_keep_segments - send延遲)
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E5%A4%87%E5%BA%93%E5%BB%B6%E8%BF%9F 2 備庫延遲
select usename,application_name,client_addr,state,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), sent_lsn) sent_delay,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), write_lsn) write_delay,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), flush_lsn) flush_delay,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(), replay_lsn) replay_delay
from pg_stat_replication;
還有多少WAL SIZE有備庫重搭風險. (wal_keep_segments - send延遲)
9.4使用如下SQL
select usename,application_name,client_addr,state,
pg_xlog_location_diff(pg_current_xlog_insert_location(), sent_location) sent_delay,
pg_xlog_location_diff(pg_current_xlog_insert_location(), write_location) write_delay,
pg_xlog_location_diff(pg_current_xlog_insert_location(), flush_location) flush_delay,
pg_xlog_location_diff(pg_current_xlog_insert_location(), replay_location) replay_delay
from pg_stat_replication;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-slot%E5%BB%B6%E8%BF%9F 3 SLOT延遲
風險,導緻目前資料庫節點WAL日志堆積、系統表膨脹。
檢查方法
select slot_name,plugin,slot_type,database,temporary,active,active_pid,
xmin,catalog_xmin,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(),restart_lsn) restart_delay,
pg_wal_lsn_diff(pg_current_wal_insert_lsn(),confirmed_flush_lsn) flush_delay
from pg_replication_slots;
select slot_name,plugin,slot_type,database,active,
xmin,catalog_xmin,
pg_xlog_location_diff(pg_current_xlog_insert_location(),restart_lsn) restart_delay
from pg_replication_slots;
已有風險處理方法:
1、删除SLOT, 删掉後會自動清理。(删掉slot後,下遊使用這個SLOT接收的業務程式報錯,無法繼續接收)
2、下遊盡快消費延遲的SLOT的日志
3、手動設定SLOT的位點(相當于丢棄未被接收的部分,下遊使用這個SLOT的業務程式,重新的SLOT位點繼續接收)
函數URL
https://www.postgresql.org/docs/11/static/functions-admin.html#FUNCTIONS-REPLICATIONhttps://github.com/digoal/blog/blob/master/201806/20180613_05.md#4-%E5%BD%92%E6%A1%A3%E5%BB%B6%E8%BF%9F 4 歸檔延遲
風險,導緻目前資料庫節點WAL日志堆積。
pg_wal/archive_status/.ready 的數量 乘以 wal_segment_size
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%89%E8%AF%8A%E6%96%AD%E9%A1%B9%E7%B1%BB%E7%9B%AE---%E8%B5%84%E6%BA%90%E8%AF%8A%E6%96%AD 三、診斷項類目 - 資源診斷
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%80-%E7%A1%AC%E4%BB%B6%E8%B5%84%E6%BA%90 一 硬體資源
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-cpu 1 CPU
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-iops 2 IOPS
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-mem 3 MEM
page table
rss
cache
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#4-network 4 network
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#5-disk-%E7%A9%BA%E9%97%B4%E5%89%A9%E4%BD%99 5 disk 空間剩餘
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#6-disk-inode-%E5%89%A9%E4%BD%99 6 disk inode 剩餘
df -i
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%BA%8C-%E6%95%B0%E6%8D%AE%E5%BA%93%E8%B5%84%E6%BA%90 二 資料庫資源
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E6%95%B0%E6%8D%AE%E6%96%87%E4%BB%B6%E7%A9%BA%E9%97%B4 1 資料檔案空間
已使用
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-walredo%E7%A9%BA%E9%97%B4 2 WAL(REDO)空間
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-%E5%AE%9E%E4%BE%8B%E7%BA%A7%E7%A9%BA%E9%97%B4%E5%88%86%E5%B8%83 3 執行個體級空間分布
1、庫次元空間分布
select datname, pg_size_pretty(pg_database_size(oid)) from pg_database order by pg_database_size(oid) desc limit 10;
2、表空間次元空間分布
select spcname, pg_size_pretty(pg_tablespace_size(oid)) from pg_tablespace order by pg_tablespace_size(oid) desc limit 10;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#4-%E5%BA%93%E7%BA%A7%E7%A9%BA%E9%97%B4%E5%88%86%E5%B8%83 4 庫級空間分布
1、對象次元空間分布(按pg_class.relkind歸類)
select relkind,pg_size_pretty(sum((pg_relation_size(oid)))) from pg_class group by 1 order by sum((pg_relation_size(oid))) desc;
2、表次元空間分布
select relname,pg_size_pretty(pg_relation_size(oid)) from pg_class where relkind='r' order by pg_relation_size(oid) desc limit 10;
3、索引次元空間分布
select relname,pg_size_pretty(pg_relation_size(oid)) from pg_class where relkind='i' order by pg_relation_size(oid) desc limit 10;
4、user次元空間分布
5、schema次元空間分布
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E5%9B%9B%E8%AF%8A%E6%96%AD%E9%A1%B9%E7%B1%BB%E7%9B%AE---%E6%80%A7%E8%83%BD%E8%AF%8A%E6%96%AD 四、診斷項類目 - 性能診斷
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%80-%E7%83%AD%E7%82%B9sql%E5%AF%B9%E8%B1%A1%E5%87%BD%E6%95%B0 一 熱點SQL、對象、函數
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E8%87%AA%E4%B8%8A%E4%B8%80%E6%AC%A1%E9%87%8D%E7%BD%AE%E8%AE%A1%E6%95%B0%E5%99%A8%E4%BB%A5%E6%9D%A5-top-sql%E5%A4%9A%E7%BB%B4%E5%BA%A6 1 自上一次重置計數器以來, TOP SQL(多元度)
《PostgreSQL 如何查找TOP SQL (例如IO消耗最高的SQL) (包含SQL優化内容) - 珍藏級》1、總耗時TOP 10
select userid::regrole, dbid, query from pg_stat_statements order by total_time desc limit 10;
2、IO耗時TOP 10
select userid::regrole, dbid, query from pg_stat_statements order by (blk_read_time+blk_write_time) desc limit 10;
3、性能抖動TOP 10
select userid::regrole, dbid, query from pg_stat_statements order by stddev_time desc limit 10;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E5%BD%93%E5%89%8D-top-sql 2 目前, TOP SQL
1、IO TOP SQL
結合iotop工具
2、CPU TOP SQL
結合top工具
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-%E8%87%AA%E4%B8%8A%E4%B8%80%E6%AC%A1%E9%87%8D%E7%BD%AE%E8%AE%A1%E6%95%B0%E5%99%A8%E4%BB%A5%E6%9D%A5-%E5%85%A8%E8%A1%A8%E6%89%AB%E6%8F%8F%E6%AC%A1%E6%95%B0top%E5%AF%B9%E8%B1%A1 3 自上一次重置計數器以來, 全表掃描次數TOP對象
select * from pg_stat_all_tables order by seq_scan desc limit 10;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#4-%E8%87%AA%E4%B8%8A%E4%B8%80%E6%AC%A1%E9%87%8D%E7%BD%AE%E8%AE%A1%E6%95%B0%E5%99%A8%E4%BB%A5%E6%9D%A5-%E5%85%A8%E8%A1%A8%E6%89%AB%E6%8F%8F%E8%AE%B0%E5%BD%95%E6%95%B0top%E5%AF%B9%E8%B1%A1 4 自上一次重置計數器以來, 全表掃描記錄數TOP對象
select * from pg_stat_all_tables order by seq_tup_read desc limit 10;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#5-%E7%B4%A2%E5%BC%95%E6%95%B0%E8%B6%85%E8%BF%874%E5%B9%B6%E4%B8%94size%E5%A4%A7%E4%BA%8E10mb%E7%9A%84topdml%E8%A1%A8 5 索引數超過4并且SIZE大于10MB的top(dml)表
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#6-%E8%87%AA%E4%B8%8A%E4%B8%80%E6%AC%A1%E9%87%8D%E7%BD%AE%E8%AE%A1%E6%95%B0%E5%99%A8%E4%BB%A5%E6%9D%A5-%E7%83%AD%E8%A1%A8 6 自上一次重置計數器以來, 熱表
1、dml次元
select schemaname,relname,pg_size_pretty(pg_total_relation_size(relid)),
n_tup_ins+n_tup_upd+n_tup_del+n_tup_hot_upd as iud_tuples from pg_stat_all_tables
order by iud_tuples desc limit 10;
2、查詢(IO)次元
select schemaname,relname,pg_size_pretty(pg_total_relation_size(relid)),
coalesce(heap_blks_read,0)+coalesce(toast_blks_read,0) read_blks from pg_statio_all_tables
order by read_blks desc limit 10;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#7-%E8%87%AA%E4%B8%8A%E4%B8%80%E6%AC%A1%E9%87%8D%E7%BD%AE%E8%AE%A1%E6%95%B0%E5%99%A8%E4%BB%A5%E6%9D%A5-%E7%83%AD%E7%B4%A2%E5%BC%95 7 自上一次重置計數器以來, 熱索引
1、查詢(tuple)次元
select schemaname,relname,indexrelname,pg_size_pretty(pg_total_relation_size(indexrelid)),
idx_scan,idx_tup_read,idx_tup_fetch,idx_tup_read/(case idx_scan when 0 then 1.0 else idx_scan end)::float8 as sel from pg_stat_all_indexes
order by idx_tup_read desc limit 10;
select schemaname,relname,indexrelname,pg_size_pretty(pg_total_relation_size(indexrelid)),
coalesce(idx_blks_read,0) read_blks from pg_statio_all_indexes
order by read_blks desc limit 10;
3、選擇性可能不好的索引
select schemaname,relname,indexrelname,pg_size_pretty(pg_total_relation_size(indexrelid)),
idx_scan,idx_tup_read,idx_tup_fetch,idx_tup_read/(case idx_scan when 0 then 1.0 else idx_scan end)::float8 as sel from pg_stat_all_indexes
where idx_tup_read/(case idx_scan when 0 then 1.0 else idx_scan end)::float8 > ? -- 大于門檻值,選擇性不好
order by sel desc limit 10;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#8-%E5%BD%93%E5%89%8D-%E7%B3%BB%E7%BB%9F%E7%BA%A7-profiling-%E7%83%AD%E7%82%B9 8 目前, 系統級 profiling 熱點
perf 分析
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%BA%8C-%E5%86%B7%E5%AF%B9%E8%B1%A1 二 冷對象
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E5%86%B7%E8%A1%A8-%E6%9C%80%E8%BF%91%E9%87%8D%E7%BD%AE%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF%E8%AE%A1%E6%95%B0%E5%99%A8%E5%90%8E 1 冷表, 最近(重置統計資訊計數器後)
select schemaname,relname,pg_size_pretty(pg_total_relation_size(relid)),
n_tup_ins+n_tup_upd+n_tup_del+n_tup_hot_upd as iud_tuples from pg_stat_all_tables
where n_tup_ins+n_tup_upd+n_tup_del+n_tup_hot_upd < ? -- 低于門檻值
order by iud_tuples limit 10;
select schemaname,relname,pg_size_pretty(pg_total_relation_size(relid)),
coalesce(heap_blks_read,0)+coalesce(toast_blks_read,0) read_blks from pg_statio_all_tables
where coalesce(heap_blks_read,0)+coalesce(toast_blks_read,0) < ? -- 低于門檻值
order by read_blks desc limit 10;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E5%86%B7%E7%B4%A2%E5%BC%95-%E6%9C%80%E8%BF%91%E9%87%8D%E7%BD%AE%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF%E8%AE%A1%E6%95%B0%E5%99%A8%E5%90%8E 2 冷索引, 最近(重置統計資訊計數器後)
最近(重置統計資訊計數器後)未使用的索引
select * from pg_stat_all_indexes where idx_scan=0 and schemaname not in ('pg_catalog','pg_toast');
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%89-%E6%85%A2sql 三 慢SQL
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E6%85%A2sql 1 慢SQL
1、曆史慢SQL
分析csvlog
2、目前慢SQL
select * from pg_stat_activity
where now()-query_start > ? -- 大于門檻值
order by now()-query_start desc;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E6%85%A2sql%E7%9A%84%E8%AF%A6%E7%BB%86%E6%89%A7%E8%A1%8C%E8%AE%A1%E5%88%92auto_explain 2 慢SQL的詳細執行計劃(auto_explain)
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-%E9%95%BF%E4%BA%8B%E5%8A%A1%E5%8C%85%E6%8B%AC2pc 3 長事務(包括2PC)
select * from pg_stat_activity
where now()-xact_start > ? -- 大于門檻值
order by now()-xact_start desc;
select * from pg_prepared_xacts
where now()-prepared > ? -- 大于門檻值
order by now()-prepared desc;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E5%9B%9B-%E5%9E%83%E5%9C%BE%E8%86%A8%E8%83%80 四 垃圾、膨脹
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E5%9E%83%E5%9C%BE-top 1 垃圾 TOP
select * from pg_stat_all_tables
where n_dead_tup >
order by
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E8%86%A8%E8%83%80-top 2 膨脹 TOP
1、表膨脹 TOP 5
SELECT
current_database() AS db, schemaname, tablename, reltuples::bigint AS tups, relpages::bigint AS pages, otta,
ROUND(CASE WHEN otta=0 OR sml.relpages=0 OR sml.relpages=otta THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
CASE WHEN relpages < otta THEN $$0 bytes$$::text ELSE (bs*(relpages-otta))::bigint || $$ bytes$$ END AS wastedsize,
iname, ituples::bigint AS itups, ipages::bigint AS ipages, iotta,
ROUND(CASE WHEN iotta=0 OR ipages=0 OR ipages=iotta THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes,
CASE WHEN ipages < iotta THEN $$0 bytes$$ ELSE (bs*(ipages-iotta))::bigint || $$ bytes$$ END AS wastedisize,
CASE WHEN relpages < otta THEN
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta::bigint) END
ELSE CASE WHEN ipages < iotta THEN bs*(relpages-otta::bigint)
ELSE bs*(relpages-otta::bigint + ipages-iotta::bigint) END
END AS totalwastedbytes
FROM (
SELECT
nn.nspname AS schemaname,
cc.relname AS tablename,
COALESCE(cc.reltuples,0) AS reltuples,
COALESCE(cc.relpages,0) AS relpages,
COALESCE(bs,0) AS bs,
COALESCE(CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)),0) AS otta,
COALESCE(c2.relname,$$?$$) AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
FROM
pg_class cc
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname <> $$information_schema$$
LEFT JOIN
(
SELECT
ma,bs,foo.nspname,foo.relname,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
ns.nspname, tbl.relname, hdr, ma, bs,
SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
MAX(coalesce(null_frac,0)) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename = tbl.relname
) AS nullhdr
FROM pg_attribute att
JOIN pg_class tbl ON att.attrelid = tbl.oid
JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats s ON s.schemaname=ns.nspname
AND s.tablename = tbl.relname
AND s.inherited=false
AND s.attname=att.attname,
(
SELECT
(SELECT current_setting($$block_size$$)::numeric) AS bs,
CASE WHEN SUBSTRING(SPLIT_PART(v, $$ $$, 2) FROM $$#"[0-9]+.[0-9]+#"%$$ for $$#$$)
IN ($$8.0$$,$$8.1$$,$$8.2$$) THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ $$mingw32$$ OR v ~ $$64-bit$$ THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
WHERE att.attnum > 0 AND tbl.relkind=$$r$$
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
ON cc.relname = rs.relname AND nn.nspname = rs.nspname
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml order by wastedbytes desc limit 5
2、索引膨脹 TOP 5
SELECT
current_database() AS db, schemaname, tablename, reltuples::bigint AS tups, relpages::bigint AS pages, otta,
ROUND(CASE WHEN otta=0 OR sml.relpages=0 OR sml.relpages=otta THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
CASE WHEN relpages < otta THEN $$0 bytes$$::text ELSE (bs*(relpages-otta))::bigint || $$ bytes$$ END AS wastedsize,
iname, ituples::bigint AS itups, ipages::bigint AS ipages, iotta,
ROUND(CASE WHEN iotta=0 OR ipages=0 OR ipages=iotta THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes,
CASE WHEN ipages < iotta THEN $$0 bytes$$ ELSE (bs*(ipages-iotta))::bigint || $$ bytes$$ END AS wastedisize,
CASE WHEN relpages < otta THEN
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta::bigint) END
ELSE CASE WHEN ipages < iotta THEN bs*(relpages-otta::bigint)
ELSE bs*(relpages-otta::bigint + ipages-iotta::bigint) END
END AS totalwastedbytes
FROM (
SELECT
nn.nspname AS schemaname,
cc.relname AS tablename,
COALESCE(cc.reltuples,0) AS reltuples,
COALESCE(cc.relpages,0) AS relpages,
COALESCE(bs,0) AS bs,
COALESCE(CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)),0) AS otta,
COALESCE(c2.relname,$$?$$) AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
FROM
pg_class cc
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname <> $$information_schema$$
LEFT JOIN
(
SELECT
ma,bs,foo.nspname,foo.relname,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
ns.nspname, tbl.relname, hdr, ma, bs,
SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
MAX(coalesce(null_frac,0)) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename = tbl.relname
) AS nullhdr
FROM pg_attribute att
JOIN pg_class tbl ON att.attrelid = tbl.oid
JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
LEFT JOIN pg_stats s ON s.schemaname=ns.nspname
AND s.tablename = tbl.relname
AND s.inherited=false
AND s.attname=att.attname,
(
SELECT
(SELECT current_setting($$block_size$$)::numeric) AS bs,
CASE WHEN SUBSTRING(SPLIT_PART(v, $$ $$, 2) FROM $$#"[0-9]+.[0-9]+#"%$$ for $$#$$)
IN ($$8.0$$,$$8.1$$,$$8.2$$) THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ $$mingw32$$ OR v ~ $$64-bit$$ THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
WHERE att.attnum > 0 AND tbl.relkind=$$r$$
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
ON cc.relname = rs.relname AND nn.nspname = rs.nspname
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml order by wastedibytes desc limit 5
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-%E7%B3%BB%E7%BB%9F%E8%86%A8%E8%83%80%E6%97%B6%E9%97%B4%E7%82%B9---%E5%A4%9A%E4%B9%85%E4%BB%A5%E5%89%8D%E7%9A%84%E5%9E%83%E5%9C%BE%E5%8F%AF%E4%BB%A5%E8%A2%AB%E5%9B%9E%E6%94%B6 3 系統膨脹時間點 - 多久以前的垃圾可以被回收
select least(t1,t2) from (select min(prepared) t1 from pg_prepared_xacts) tt1, (select min(xact_start) t2 from pg_stat_activity) tt2;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#4-%E8%87%AA%E5%8A%A8%E5%9E%83%E5%9C%BE%E5%9B%9E%E6%94%B6%E7%8A%B6%E6%80%81 4 自動垃圾回收狀态
1、系統級是否關閉自動垃圾回收
select current_setting('autovacuum');
2、已關閉自動垃圾回收的表
select relname, pg_size_pretty(pg_total_relation_size(oid)) from pg_class where reloptions @> array['autovacuum_enabled=off'];
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%BA%94-%E7%AD%89%E5%BE%85%E5%86%B2%E7%AA%81 五 等待、沖突
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E5%BD%93%E5%89%8D%E9%94%81%E7%AD%89%E5%BE%85 1 目前鎖等待
with
t_wait as
(
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath,
b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted
),
t_run as
(
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath,
b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
from pg_locks a,pg_stat_activity b where a.pid=b.pid and a.granted
),
t_overlap as
(
select r.* from t_wait w join t_run r on
(
r.locktype is not distinct from w.locktype and
r.database is not distinct from w.database and
r.relation is not distinct from w.relation and
r.page is not distinct from w.page and
r.tuple is not distinct from w.tuple and
r.virtualxid is not distinct from w.virtualxid and
r.transactionid is not distinct from w.transactionid and
r.classid is not distinct from w.classid and
r.objid is not distinct from w.objid and
r.objsubid is not distinct from w.objsubid and
r.pid <> w.pid
)
),
t_unionall as
(
select r.* from t_overlap r
union all
select w.* from t_wait w
)
select locktype,datname,relation::regclass,page,tuple,virtualxid,transactionid::text,classid::regclass,objid,objsubid,
string_agg(
'Pid: '||case when pid is null then 'NULL' else pid::text end||chr(10)||
'Lock_Granted: '||case when granted is null then 'NULL' else granted::text end||' , Mode: '||case when mode is null then 'NULL' else mode::text end||' , FastPath: '||case when fastpath is null then 'NULL' else fastpath::text end||' , VirtualTransaction: '||case when virtualtransaction is null then 'NULL' else virtualtransaction::text end||' , Session_State: '||case when state is null then 'NULL' else state::text end||chr(10)||
'Username: '||case when usename is null then 'NULL' else usename::text end||' , Database: '||case when datname is null then 'NULL' else datname::text end||' , Client_Addr: '||case when client_addr is null then 'NULL' else client_addr::text end||' , Client_Port: '||case when client_port is null then 'NULL' else client_port::text end||' , Application_Name: '||case when application_name is null then 'NULL' else application_name::text end||chr(10)||
'Xact_Start: '||case when xact_start is null then 'NULL' else xact_start::text end||' , Query_Start: '||case when query_start is null then 'NULL' else query_start::text end||' , Xact_Elapse: '||case when (now()-xact_start) is null then 'NULL' else (now()-xact_start)::text end||' , Query_Elapse: '||case when (now()-query_start) is null then 'NULL' else (now()-query_start)::text end||chr(10)||
'SQL (Current SQL in Transaction): '||chr(10)||
case when query is null then 'NULL' else query::text end,
chr(10)||'--------'||chr(10)
order by
( case mode
when 'INVALID' then 0
when 'AccessShareLock' then 1
when 'RowShareLock' then 2
when 'RowExclusiveLock' then 3
when 'ShareUpdateExclusiveLock' then 4
when 'ShareLock' then 5
when 'ShareRowExclusiveLock' then 6
when 'ExclusiveLock' then 7
when 'AccessExclusiveLock' then 8
else 0
end ) desc,
(case when granted then 0 else 1 end)
) as lock_conflict
from t_unionall
group by
locktype,datname,relation,page,tuple,virtualxid,transactionid::text,classid,objid,objsubid ;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E5%8E%86%E5%8F%B2%E7%AD%89%E5%BE%85%E4%BA%8B%E4%BB%B6%E7%BB%9F%E8%AE%A1 2 曆史等待事件統計
需要核心支援,PPAS已支援
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-%E5%8E%86%E5%8F%B2%E5%A4%87%E5%BA%93%E5%86%B2%E7%AA%81%E7%BB%9F%E8%AE%A1 3 曆史備庫沖突統計
備庫視圖
select * from pg_stat_database_conflicts;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E5%85%AD-%E9%A2%84%E6%B5%8B 六 預測
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-freeze-%E9%A3%8E%E6%9A%B4%E9%A2%84%E6%B5%8B 1 FREEZE 風暴預測
CPU/IO 風暴預測
《PostgreSQL Freeze 風暴預測續 - 珍藏級SQL》https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%83-%E6%A6%82%E8%B2%8C 七 概貌
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E6%95%B0%E6%8D%AE%E5%BA%93-%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF 1 資料庫 統計資訊
復原比例, 命中比例, 資料塊讀寫時間, 死鎖, 複制沖突
select * from pg_stat_database;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E6%A3%80%E6%9F%A5%E7%82%B9-bgwriter-%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF 2 檢查點, bgwriter 統計資訊
select * from pg_stat_bgwriter;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%BA%94%E8%AF%8A%E6%96%AD%E9%A1%B9%E7%B1%BB%E7%9B%AE---%E9%9A%90%E6%82%A3%E8%AF%8A%E6%96%AD 五、診斷項類目 - 隐患診斷
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%80-%E6%80%A7%E8%83%BD%E9%9A%90%E6%82%A3 一 性能隐患
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E7%BB%9F%E8%AE%A1%E4%BF%A1%E6%81%AF%E4%B8%8D%E5%87%86%E7%9A%84%E5%AF%B9%E8%B1%A1%E8%A1%A8%E7%89%A9%E5%8C%96%E8%A7%86%E5%9B%BE 1 統計資訊不準的對象(表、物化視圖)
導緻執行計劃不準,影響SQL性能
select oid::regclass as table, relpages, pg_relation_size(oid)/current_setting('block_size')::float8 as real_pages from pg_class
where relkind in ('r', 'm') -- 表和物化視圖
and pg_relation_size(oid) > 1048576 -- 大于1MB
and (pg_relation_size(oid)/current_setting('block_size')::float8 - relpages)/(pg_relation_size(oid)/current_setting('block_size')::float8) > 0.2; -- 大于 20% 偏差
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%BA%8C-%E5%81%9C%E5%BA%93%E9%9A%90%E6%82%A3 二 停庫隐患
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E5%BA%93%E5%B9%B4%E9%BE%84 1 庫年齡
大于19億,剩餘1億事務後強制停庫。檢查是否有長事務導緻FREEZE失效。
select datname,age(datfrozenxid) from pg_database where age(datfrozenxid) > 1900000000;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E8%A1%A8%E5%B9%B4%E9%BE%84 2 表年齡
select relname,age(relfrozenxid),* from pg_class where relfrozenxid<>0 and age(relfrozenxid) > 1900000000;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%89-%E4%B8%9A%E5%8A%A1%E9%80%BB%E8%BE%91%E9%9A%90%E6%82%A3 三 業務邏輯隐患
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E5%BA%8F%E5%88%97%E5%89%A9%E4%BD%99%E8%B0%83%E7%94%A8%E6%AC%A1%E6%95%B0 1 序列剩餘調用次數
剩餘調用次數低于10240000次
create or replace function f(OUT v_datname name, OUT v_role name, OUT v_nspname name, OUT v_relname name, OUT v_times_remain int8) returns setof record as $$
declare
begin
v_datname := current_database();
for v_role,v_nspname,v_relname in select rolname,nspname,relname from pg_authid t1 , pg_class t2 , pg_namespace t3 where t1.oid=t2.relowner and t2.relnamespace=t3.oid and t2.relkind='S'
LOOP
execute 'select (max_value-last_value)/increment_by from "'||v_nspname||'"."'||v_relname||'" where not is_cycled' into v_times_remain;
return next;
end loop;
end;
$$ language plpgsql;
select * from f() where v_times_remain is not null and v_times_remain < 10240000 -- 剩餘調用次數低于10240000次
order by v_times_remain limit 10;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E5%9B%9B-%E6%97%A0%E6%B3%95%E7%99%BB%E9%99%86%E9%9A%90%E6%82%A3 四 無法登陸隐患
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E7%94%A8%E6%88%B7%E8%BF%87%E6%9C%9F 1 使用者過期
select rolname from pg_roles where
rolvaliduntil - now() < interval '30 d'; -- 使用者過期時間小于30天
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%BA%94-%E6%95%B0%E6%8D%AE%E4%B8%A2%E5%A4%B1%E9%9A%90%E6%82%A3 五 資料丢失隐患
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-unlogged-table-%E5%92%8C-%E5%93%88%E5%B8%8C%E7%B4%A2%E5%BC%9510%E6%B2%A1%E6%9C%89%E9%A3%8E%E9%99%A9 1 unlogged table 和 哈希索引(10沒有風險)
select relkind,relname from pg_class where relpersistence='u';
select * from pg_indexes where indexdef ~ 'USING hash';
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E5%85%AD-%E5%85%B6%E4%BB%96%E9%9A%90%E6%82%A3 六 其他隐患
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E4%BA%8B%E4%BB%B6%E8%A7%A6%E5%8F%91%E5%99%A8 1 事件觸發器
select current_database(),rolname,proname,evtname,evtevent,evtenabled,evttags from pg_event_trigger t1,pg_proc t2,pg_authid t3 where t1.evtfoid=t2.oid and t1.evtowner=t3.oid;
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E5%85%AD%E8%AF%8A%E6%96%AD%E9%A1%B9%E7%B1%BB%E7%9B%AE---%E5%AE%89%E5%85%A8%E8%AF%8A%E6%96%AD 六、診斷項類目 - 安全診斷
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%80-%E9%93%BE%E8%B7%AF%E5%AE%89%E5%85%A8 一 鍊路安全
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E6%98%AF%E5%90%A6%E5%BC%80%E5%90%AF%E5%85%AC%E7%BD%91%E8%AE%BF%E9%97%AE%E5%85%AC%E7%BD%91%E5%9C%B0%E5%9D%80 1 是否開啟公網通路,公網位址
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-lvs%E9%98%B2%E7%81%AB%E5%A2%99 2 LVS防火牆
0.0.0.0
不太安全
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-proxy%E9%98%B2%E7%81%AB%E5%A2%99 3 PROXY防火牆
0.0.0.0
不太安全
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#4-%E6%95%B0%E6%8D%AE%E5%BA%93%E9%98%B2%E7%81%AB%E5%A2%99-pg_hbaconf 4 資料庫防火牆 pg_hba.conf
除127.0.0.1,local 以外,如果有trust認證,視為不安全
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#5-%E6%9A%B4%E5%8A%9B%E7%A0%B4%E8%A7%A3%E6%94%BB%E5%87%BBos-pg 5 暴力破解攻擊(os, pg)
grep -c "Failed password" /var/log/secure
統計錯誤代碼為28P01的報錯
28P01 invalid_password
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%BA%8C-%E6%95%B0%E6%8D%AE%E5%BA%93%E5%AE%89%E5%85%A8 二 資料庫安全
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-%E7%94%A8%E6%88%B7%E6%9D%83%E9%99%90%E5%BC%82%E5%B8%B8 1 使用者權限異常
列出(除白名單外)超級使用者
select rolname from pg_roles where rolsuper not in ('白名單');
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E6%99%AE%E9%80%9A%E7%94%A8%E6%88%B7%E8%87%AA%E5%AE%9A%E4%B9%89%E5%87%BD%E6%95%B0%E5%AE%89%E5%85%A8%E6%A3%80%E6%9F%A5 2 普通使用者自定義函數安全檢查
檢查定義為security invoker的UDF
存在安全陷阱隐患
select proname, prosrc from pg_proc where not prosecdef and proowner not in (select oid from pg_roles where rolsuper);
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-sql%E6%B3%A8%E5%85%A5 3 SQL注入
1、最近一天(時間範圍可選)ddl (drop, truncate)
取自審計日志
2、最近一天(時間範圍可選)不帶條件的DML (delete, update)
3、最近一天(時間範圍可選)影響行數超過N(可配置)的DML(update, delete)
取自審計日志(核心層支援DML輸出影響函數)
4、最近一天(時間範圍可選)傳回結果超過N(可配置)的查詢(select)
取自審計日志(核心層支援SELECT輸出影響函數)
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%83%E8%AF%8A%E6%96%AD%E9%A1%B9%E7%B1%BB%E7%9B%AE---%E8%BD%AF%E7%A1%AC%E4%BB%B6%E5%BC%82%E5%B8%B8%E8%AF%8A%E6%96%AD 七、診斷項類目 - 軟硬體異常診斷
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%B8%80-%E7%A1%AC%E4%BB%B6 一 硬體
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-ssd-%E5%89%A9%E4%BD%99%E5%AF%BF%E5%91%BD 1 SSD 剩餘壽命
硬體廠商提供指令,或者smartctl
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-%E7%A1%AC%E4%BB%B6%E9%94%99%E8%AF%AF 2 硬體錯誤
cat /var/log/mcelog
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#%E4%BA%8C-%E8%BD%AF%E4%BB%B6 二 軟體
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#1-oom 1 oom
dmesg|grep -i oom
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#2-dmesg 2 dmesg
異常資訊
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#3-core-dump 3 core dump
檢查coredump目錄,例如 sysctl -a|grep kernel.core_pattern得到
kernel.core_pattern = /corefiles/core_%e_%u_%t_%s.%p
https://github.com/digoal/blog/blob/master/201806/20180613_05.md#4-%E9%94%99%E8%AF%AF%E6%97%A5%E5%BF%97 4 錯誤日志
1、PG 錯誤日志分析
統計:
錯誤代碼、錯誤次數、錯誤描述