
PostgreSQL uses a compiler extension to support int128 and speed up aggregates

PostgreSQL, int128, clang, gcc, icc

In PostgreSQL 9.4 and earlier, aggregates over int, int2, and int8 store their intermediate results as numeric so that the running value cannot overflow.

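numeric is a numeric type that PostgreSQL implements itself. It can hold very large values (presumably to serve scientific computing), but it gives up some performance in exchange.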

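To speed up aggregation, especially over large data sets, the community borrowed the int128 type provided by the compiler and uses it as the intermediate result for aggregates over int, int2, and int8.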
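The idea can be illustrated with a minimal sketch, assuming a compiler that provides `__int128` (gcc and clang do on 64-bit targets). The function name `sum_int64` and the `int128` typedef are hypothetical and are not taken from the PostgreSQL source. The sketch only shows why a 128-bit accumulator can take int8 inputs without overflowing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the compiler provides __int128 (a gcc/clang extension). */
typedef __int128 int128;

/*
 * Sum n 64-bit values into a 128-bit accumulator.
 * Each addition is a native integer operation, and the accumulator is wide
 * enough that two or more INT64_MAX inputs do not wrap around.
 */
static int128 sum_int64(const int64_t *values, size_t n)
{
    int128 acc = 0;

    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc += values[i];
    return acc;
}
```

A plain int64 accumulator would overflow as soon as two large values are added, which is the case the numeric state was guarding against.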

gcc, clang, and icc all support int128.

1. gcc

2. icc

At build time, PostgreSQL decides automatically whether to use int128, based on what the compiler supports.
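As a rough sketch of compile-time selection, the snippet below keys off `__SIZEOF_INT128__`, a macro that gcc and clang predefine when `__int128` is available. This is not PostgreSQL's actual detection logic, which is not shown in this post. The names `SKETCH_HAVE_INT128`, `sketch_int128`, and `sum_state_kind` are hypothetical.

```c
#include <assert.h>

/* gcc and clang predefine __SIZEOF_INT128__ when __int128 is usable. */
#if defined(__SIZEOF_INT128__)
#define SKETCH_HAVE_INT128 1          /* hypothetical feature macro */
typedef __int128 sketch_int128;
static_assert(sizeof(sketch_int128) == 16, "expected a 16-byte integer");
#else
#define SKETCH_HAVE_INT128 0
#endif

/* Report which aggregate state type this build would pick. */
static const char *sum_state_kind(void)
{
#if SKETCH_HAVE_INT128
    return "int128";   /* fast path: native 128-bit arithmetic */
#else
    return "numeric";  /* fallback: keep the arbitrary-precision state */
#endif
}
```

The rest of the code then only needs to test the feature macro, so builds with compilers that lack the type keep the older numeric behaviour.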

<a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8122e1437e332e156d971a0274879b0ee76e488a">https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=8122e1437e332e156d971a0274879b0ee76e488a</a>

There was recently talk about whether we should start using 128-bit integers (where available) to speed up the aggregate functions over integers which use numeric for their internal state. So I hacked together a patch for this to see what the performance gain would be.

Previous thread:

<a href="http://www.postgresql.org/message-id/[email protected]">http://www.postgresql.org/message-id/[email protected]</a>

What the patch does is switch from using numerics in the aggregate state to int128, and then convert the type from the 128-bit integer in the final function.
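The state-plus-final-function shape described above might look roughly like the following sketch for an average over int8. All names here (`AvgState`, `avg_transition`, `avg_final`) are hypothetical. The real patch converts the 128-bit value to numeric in the final function, whereas this sketch converts to `double` to stay self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the compiler provides __int128. */
typedef __int128 int128;

/* Hypothetical aggregate state: running sum in 128 bits plus a row count. */
typedef struct
{
    int128  sum;
    int64_t count;
} AvgState;

/* Transition function: called once per input row, integer arithmetic only. */
static void avg_transition(AvgState *state, int64_t value)
{
    state->sum += value;
    state->count++;
}

/* Final function: the only place where the wide integer is converted. */
static double avg_final(const AvgState *state)
{
    assert(state->count > 0);
    return (double) state->sum / (double) state->count;
}
```

The conversion cost is paid once per group rather than once per row, which is where a gain over a numeric state would come from.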

The functions where we can make use of int128 states are:

The initial benchmark results look very promising. When summing 10 million int8 I get a speedup of ~2.5x and similarly for var_samp() on 10 million int4 I see a speed up of ~3.7x. To me this indicates that it is worth the extra code. What do you say? Is this worth implementing?

The current patch still requires work. I have not written the detection of int128 support yet, and the patch needs code cleanup (for example: I used an int16_ prefix on the added functions, suggestions for better names are welcome). I also need to decide on what estimate to use for the size of that state.

The patch should work and pass make check on platforms where __int128_t is supported.

The simple benchmarks:
