天天看點

從pgbench性能測試結果中發現SQL延遲的規律

偶然從網友的這篇部落格中發現pgbench的TPC-B測試結果中有一個比較奇怪的地方

http://blog.chinaunix.net/uid-20802110-id-4889543.html

-------------------------------------------

-bash-4.1$ /usr/pgsql-9.4/bin/pgbench -c 800 -t 5 -j 4  -r pgbench

starting vacuum...end.

transaction type: TPC-B (sort of)

scaling factor: 1

query mode: simple

number of clients: 800

number of threads: 4

number of transactions per client: 5

number of transactions actually processed: 4000/4000

latency average: 0.000 ms

tps = 217.188339 (including connections establishing)

tps = 243.570070 (excluding connections establishing)

statement latencies in milliseconds:

0.007152 \set nbranches 1 * :scale

0.001217 \set ntellers 10 * :scale

0.000891 \set naccounts 100000 * :scale

0.001649 \setrandom aid 1 :naccounts

0.000924 \setrandom bid 1 :nbranches

0.000974 \setrandom tid 1 :ntellers

0.002318 \setrandom delta -5000 5000

43.159543 BEGIN;

67.412356 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

19.172284 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

2483.177474 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;                          -----延遲非常明顯,可以從此入手

37.104961 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

0.466527 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

3.609726 END;

為什麼說它奇怪?因為這個大的延遲無疑應該是高并發争用導緻的。但是pgbench_tellers中有10條記錄,pgbench_branches中隻有一條記錄,按道理對pgbench_branches的争用比pgbench_tellers還厲害,為什麼pgbench_branches反而很快呢?

于是,我在自己的環境中,逐漸增加并發數再現了一下這個問題。

并發數為1

點選(此處)折疊或打開

[chenhj@node2 ~]$ pgbench -c 1 -j 1 -t 10 -r pgbench

number of clients: 1

number of threads: 1

number of transactions per client: 10

number of transactions actually processed: 10/10

tps = 701.016474 (including connections establishing)

tps = 792.832792 (excluding connections establishing)

    0.002800    \set nbranches 1 * :scale

    0.000500    \set ntellers 10 * :scale

    0.000400    \set naccounts 100000 * :scale

    0.000600    \setrandom aid 1 :naccounts

    0.000600    \setrandom bid 1 :nbranches

    0.000500    \setrandom tid 1 :ntellers

    0.000400    \setrandom delta -5000 5000

    0.052700    BEGIN;

    0.272300    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

    0.139800    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

    0.157700    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

    0.133500    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

    0.118500    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

    0.369200    END;

除了pgbench_accounts外稍微慢點以外,其它幾個表的更新時間差不多。

并發數為4

[chenhj@node2 ~]$ pgbench -c 4 -j 4 -t 10 -r pgbench

number of clients: 4

number of transactions actually processed: 40/40

tps = 1271.779219 (including connections establishing)

tps = 1460.440323 (excluding connections establishing)

0.002700    \set nbranches 1 * :scale

0.000500    \set ntellers 10 * :scale

0.000425    \set naccounts 100000 * :scale

0.000675    \setrandom aid 1 :naccounts

0.000425    \setrandom bid 1 :nbranches

0.000325    \setrandom tid 1 :ntellers

0.000600    \setrandom delta -5000 5000

0.081725    BEGIN;

0.248250    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

0.136400    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

0.436850    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

0.961600    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

0.111075    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

0.462825    END;

現在最慢的是pgbench_branches,因為4個并發連接配接都要争搶它。但是總體的tps比1并發時提高了不少。

并發數為10

[chenhj@node2 ~]$ pgbench -c 10 -j 10 -t 10 -r pgbench

number of clients: 10

number of threads: 10

number of transactions actually processed: 100/100

tps = 1318.322040 (including connections establishing)

tps = 1517.142190 (excluding connections establishing)

    0.002500    \set nbranches 1 * :scale

    0.000550    \set ntellers 10 * :scale

    0.000480    \set naccounts 100000 * :scale

    0.000570    \setrandom aid 1 :naccounts

    0.000440    \setrandom bid 1 :nbranches

    0.000350    \setrandom tid 1 :ntellers

    0.000600    \setrandom delta -5000 5000

    0.069060    BEGIN;

    0.270230    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

    0.276860    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

    1.880150    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

    2.661570    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

    0.175470    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

    0.419380    END;

現在最慢的還是pgbench_branches,但pgbench_tellers也明顯慢了起來。總體的tps和并發數為4時相比有所增加,但不明顯。(測試用的環境是VM,配置設定給VM的CPU是2核)。

并發數為20

[chenhj@node2 ~]$ pgbench -c 20 -j 10 -t 10 -r pgbench

number of clients: 20

number of transactions actually processed: 200/200

tps = 1294.808465 (including connections establishing)

tps = 1499.957626 (excluding connections establishing)

    0.002695    \set nbranches 1 * :scale

    0.000545    \set ntellers 10 * :scale

    0.000550    \set naccounts 100000 * :scale

    0.000640    \setrandom aid 1 :naccounts

    0.000450    \setrandom bid 1 :nbranches

    0.000605    \setrandom tid 1 :ntellers

    0.000505    \setrandom delta -5000 5000

    0.200460    BEGIN;

    0.386675    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

    0.238615    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

    5.735790    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

    3.869125    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

    0.116305    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

    0.516330    END;

現在最慢的變成pgbench_tellers了,總體的tps也開始下降。

并發數為100

[chenhj@node2 ~]$ pgbench -c 100 -j 10 -t 10 -r pgbench

number of clients: 100

number of transactions actually processed: 1000/1000

tps = 994.032821 (including connections establishing)

tps = 1175.266424 (excluding connections establishing)

    0.002603    \set nbranches 1 * :scale

    0.000715    \set ntellers 10 * :scale

    0.000610    \set naccounts 100000 * :scale

    0.000682    \setrandom aid 1 :naccounts

    0.000485    \setrandom bid 1 :nbranches

    0.000488    \setrandom tid 1 :ntellers

    0.000547    \setrandom delta -5000 5000

    0.364759    BEGIN;

    0.730818    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

    0.322968    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

    63.065841    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

    7.542918    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

    0.152586    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

    0.740661    END;

pgbench_tellers越發慢的明顯了。

并發數為500

[chenhj@node2 ~]$ pgbench -c 500 -j 10 -t 10 -r pgbench

number of clients: 500

number of transactions actually processed: 5000/5000

tps = 396.234567 (including connections establishing)

tps = 414.176788 (excluding connections establishing)

    0.003855    \set nbranches 1 * :scale

    0.001171    \set ntellers 10 * :scale

    0.000693    \set naccounts 100000 * :scale

    0.000980    \setrandom aid 1 :naccounts

    0.000886    \setrandom bid 1 :nbranches

    0.000596    \setrandom tid 1 :ntellers

    0.000603    \setrandom delta -5000 5000

    0.746502    BEGIN;

    7.766349    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

    1.093415    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

    1002.415717    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

    22.612090    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

    0.213186    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

    1.627372    END;

不僅對pgbench_tellers的更新慢的離譜,總的tps也下降的非常厲害,還不到峰值時的3分之1。(這也說明了,控制好到達DB的連接配接數多麼重要。)

那麼,為什麼更新pgbench_tellers比更新pgbench_branches還慢呢?稍微想了想,終于明白了。

更新pgbench_tellers的SQL語句在更新pgbench_branches的SQL語句前面執行,是以pgbench_tellers已經幫pgbench_branches擋住了絕大部分的并發。導緻實際到達pgbench_branches的并發數不是500而隻有10,是以pgbench_branches的更新很快。下面驗證一下這個假設。

把pgbench的TCP-B測試SQL寫到一個腳本裡再測

[chenhj@node2 ~]$ cat test.sql

\set nbranches 1 * :scale

\set ntellers 10 * :scale

\set naccounts 100000 * :scale

\setrandom aid 1 :naccounts

\setrandom bid 1 :nbranches

\setrandom tid 1 :ntellers

\setrandom delta -5000 5000

BEGIN;

UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

END;

[chenhj@node2 ~]$ pgbench -c 500 -j 10 -t 10 -r -f test.sql pgbench

transaction type: Custom query

tps = 386.651133 (including connections establishing)

tps = 403.524141 (excluding connections establishing)

    0.002743 \set nbranches 1 * :scale

    0.000700 \set ntellers 10 * :scale

    0.001121 \set naccounts 100000 * :scale

    0.000941 \setrandom aid 1 :naccounts

    0.000488 \setrandom bid 1 :nbranches

    0.000479 \setrandom tid 1 :ntellers

    0.000695 \setrandom delta -5000 5000

    0.669572 BEGIN;

    7.178750 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

    1.119658 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

    1010.879585 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

    23.284097 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

    0.176437 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

    1.609216 END;

測試結果和之前沒有用腳本時一緻。

把腳本中的更新pgbench_branches的SQL語句移到更新pgbench_tellers的SQL語句的前面再測

[chenhj@node2 ~]$ cat test2.sql

[chenhj@node2 ~]$ pgbench -c 500 -j 10 -t 10 -r -f test2.sql pgbench

tps = 228.028143 (including connections establishing)

tps = 233.288608 (excluding connections establishing)

    0.004425    \set nbranches 1 * :scale

    0.000828    \set ntellers 10 * :scale

    0.001644    \set naccounts 100000 * :scale

    0.000938    \setrandom aid 1 :naccounts

    0.000534    \setrandom bid 1 :nbranches

    0.000507    \setrandom tid 1 :ntellers

    0.000540    \setrandom delta -5000 5000

    1.262434    BEGIN;

    9.896378    UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

    1.454215    SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

    1880.677017    UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

    0.879294    UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

    0.291396    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

    4.006580    END;

現在最慢的SQL變成pgbench_branches的更新了,驗證了我的猜測,也算是真相大白了。但是,從測試結果中還可以發現,先更新pgbench_branches比後更新pgbench_branches相比,總體的tps又有一個大幅度的下降,從403降到233。如果反過來考慮這件事,就是,調整SQL的執行順序有時候也可以作為一種性能調優的手段。