
redis-server process stuck at 100% CPU

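Conclusion:

Whether this is a Redis bug remains to be confirmed. The process's actual memory usage is far below the configured maxmemory, so the loop cannot be the result of running out of memory and having to evict keys.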

Cluster role of the redis-server process at 100% CPU:

slave

Temporary workaround:

Use gdb to set the value of d.ht[0].used to 0.

Cause:

Inside dictGetRandomKey(), the branch "if (dictSize(d) == 0) return NULL;" is never taken, so dbRandomKey() never gets a NULL back and spins in an infinite loop.

Version:

Redis server v=3.2.0 sha=00000000:0 malloc=jemalloc-4.0.3 bits=64 build=9894db3ef433c070

Symptom 1: CPU at 100%

PID   USER  PR NI VIRT  RES  SHR  S %CPU  %MEM TIME+   COMMAND
25636 redis 20 0  38492 4096 1360 R 100.0 0.0  2578:10 redis-server

Symptom 2: a large number of connections in CLOSE_WAIT state:

tcp     2417      0 1.49.26.98:11382      1.49.26.98:37268      CLOSE_WAIT  -
tcp     2521      0 1.49.26.98:11382      1.49.26.98:35141      CLOSE_WAIT  -
tcp     2521      0 1.49.26.98:11382      1.49.26.98:57181      CLOSE_WAIT  -

Process status:

redis 25636 30.0 0.0 38492  4096 ? Rsl 3月23 2579:55 /data/redis/bin/redis-server *:1382 [cluster]

maxmemory configuration (1 GB):

maxmemory 1073741824

Runtime log:

25636:S 28 Mar 00:21:24.526 - 1 clients connected (0 slaves), 1312384 bytes in use
25636:S 28 Mar 00:21:29.531 - DB 0: 1 keys (1 volatile) in 8 slots HT.
25636:S 28 Mar 00:21:29.531 - 1 clients connected (0 slaves), 1312384 bytes in use
25636:S 28 Mar 00:21:32.585 - Accepted 1.118.14.7:58132

Call stacks (several samples taken from the same process):

#0  dictGenHashFunction (key=<optimized out>, len=5) at dict.c:123
#1  0x00000000004232e6 in dictFind (d=0x7f71c2a17240, key=key@entry=0x7f71c2a15001) at dict.c:499
#2  0x000000000043a00a in dbRandomKey (db=0x7f71c2a24800) at db.c:176
#3  0x000000000043a0a2 in randomkeyCommand (c=0x7f71c2aae1c0) at db.c:355
#4  0x0000000000426b95 in call (c=c@entry=0x7f71c2aae1c0, flags=flags@entry=15) at server.c:2221
#5  0x0000000000429ba7 in processCommand (c=0x7f71c2aae1c0) at server.c:2500
#6  0x0000000000436515 in processInputBuffer (c=0x7f71c2aae1c0) at networking.c:1296
#7  0x0000000000421338 in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71c2a2e050, flags=flags@entry=3) at ae.c:412
#8  0x00000000004215eb in aeMain (eventLoop=0x7f71c2a2e050) at ae.c:455
#9  0x000000000041e5df in main (argc=2, argv=0x7ffef34b2418) at server.c:4079

#0  0x00007f71c2fbc3a2 in random () from /lib64/libc.so.6
#1  0x0000000000423745 in dictGetRandomKey (d=0x7f71c2a171e0) at dict.c:646
#2  0x0000000000439fc0 in dbRandomKey (db=0x7f71c2a24800) at db.c:171

#0  0x00007f71c30e17e4 in __memcmp_sse4_1 () from /lib64/libc.so.6
#1  0x0000000000424219 in dictSdsKeyCompare (privdata=<optimized out>, key1=<optimized out>, key2=<optimized out>) at server.c:445
#2  0x000000000042331d in dictFind (d=0x7f71c2a17240, key=0x7f71c2a27e73) at dict.c:504
#3  0x0000000000439494 in getExpire (db=0x7f71c2a24800, key=0x7f71c2a27e60) at db.c:824
#4  0x0000000000439c4f in expireIfNeeded (db=0x7f71c2a24800, key=0x7f71c2a27e60) at db.c:858
#5  0x000000000043a01a in dbRandomKey (db=0x7f71c2a24800) at db.c:177
#6  0x000000000043a0a2 in randomkeyCommand (c=0x7f71c2aae1c0) at db.c:355
#7  0x0000000000426b95 in call (c=c@entry=0x7f71c2aae1c0, flags=flags@entry=15) at server.c:2221
#8  0x0000000000429ba7 in processCommand (c=0x7f71c2aae1c0) at server.c:2500
#9  0x0000000000436515 in processInputBuffer (c=0x7f71c2aae1c0) at networking.c:1296
#10 0x0000000000421338 in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71c2a2e050, flags=flags@entry=3) at ae.c:412
#11 0x00000000004215eb in aeMain (eventLoop=0x7f71c2a2e050) at ae.c:455
#12 0x000000000041e5df in main (argc=2, argv=0x7ffef34b2418) at server.c:4079

#0  dictGetRandomKey (d=<optimized out>) at dict.c:663
#1  0x0000000000439fc0 in dbRandomKey (db=0x7f71c2a24800) at db.c:171
#2  0x000000000043a0a2 in randomkeyCommand (c=0x7f71c2aae1c0) at db.c:355
#3  0x0000000000426b95 in call (c=c@entry=0x7f71c2aae1c0, flags=flags@entry=15) at server.c:2221
#4  0x0000000000429ba7 in processCommand (c=0x7f71c2aae1c0) at server.c:2500
#5  0x0000000000436515 in processInputBuffer (c=0x7f71c2aae1c0) at networking.c:1296
#6  0x0000000000421338 in aeProcessEvents (eventLoop=eventLoop@entry=0x7f71c2a2e050, flags=flags@entry=3) at ae.c:412
#7  0x00000000004215eb in aeMain (eventLoop=0x7f71c2a2e050) at ae.c:455
#8  0x000000000041e5df in main (argc=2, argv=0x7ffef34b2418) at server.c:4079

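Initial hypothesis:

maxmemory was reached and the key-eviction logic kicked in, but no key was eligible for eviction, which produced the infinite loop.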

Related code:

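Process memory (only readable once the workaround had been applied and the process had left the infinite loop, but the figures agree with what ps reported):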

# Memory
used_memory:1375320
used_memory_human:1.31M
used_memory_rss:4321280
used_memory_rss_human:4.12M
used_memory_peak:2468448
used_memory_peak_human:2.35M
total_system_memory:33453797376
total_system_memory_human:31.16G
used_memory_lua:34816
used_memory_lua_human:34.00K
maxmemory:1073741824
maxmemory_human:1.00G
maxmemory_policy:allkeys-lru
mem_fragmentation_ratio:3.14
mem_allocator:jemalloc-4.0.3
