天天看點

softlock---Kernel panic - not syncing: softlockup: hung tasks

第一次解一個softlockup的bug,FAE回複的資訊都不能很好的分析問題的根源,而且之前對softlockup的認識比較膚淺,

是以自己搗鼓這怎麼讓kernel自己産生個softlockup的warning出來。

基本原理:

每個CPU上起一個kernel thread,這個thread就用目前時間去更新一個percpu的全局時間锉;同時在tick中斷中檢查目前的時間

和這個全局的時間锉之間的內插補點是不是大于一個設定的門檻值,對于linux-2.6.27,這個門檻值設的是60s,如果60s這個核心線程還沒

有得到一次排程的機會,那就說明這個CPU上發生死鎖了。

在do_settimefoday中的clock_was_set 之前加上

preempt_disable();

while(1);   這種狀态是irq 使能了,但是搶占關閉了,沒有任何任務能夠搶占CPU17上的settimeofday這個task,也正因為如此

                  該CPU上的softlockup核心線程得不到運作,全局時間锉與目前時間內插補點超過門檻值後,在tick中斷中檢測這種情況。

然後遍一個調用settimefoday的小程式,運作,過60s後出現如下warning

softlock---Kernel panic - not syncing: softlockup: hung tasks

用同樣的方法在ARM上試

BUG: soft lockup - CPU#2 stuck for 22s! [settimeofday:485]

Modules linked in:

Pid: 485, comm:         settimeofday

CPU: 2    Not tainted  (3.3.0-rc3+ #48)

PC is at do_settimeofday+0xd8/0xe4

LR is at do_settimeofday+0xd7/0xe4

pc : [<800491e8>]    lr : [<800491e7>]    psr: 40000033  arm-non-linux-addr2line -e vmlinux800491e8 可以得到對應是哪一行代碼

sp : 86d97f30  ip : 00000000  fp : 00000000

r10: 00000000  r9 : 86d96000  r8 : 806dc144

r7 : 00000001  r6 : 00000421  r5 : 806dc140  r4 : 80711180

r3 : 806e2d9c  r2 : 00000000  r1 : 00000422  r0 : 00000000

Flags: nZcv  IRQs on  FIQs on  Mode SVC_32  ISA Thumb  Segment user

Control: 50c53c7d  Table: 66dd806a  DAC: 00000015

[<80012639>] (unwind_backtrace+0x1/0xa4) from [<8005ddf9>] (watchdog_timer_fn+0x145/0x190)

[<8005ddf9>] (watchdog_timer_fn+0x145/0x190) from [<80036f91>] (hrtimer_run_queues+0xf9/0x240)

[<80036f91>] (hrtimer_run_queues+0xf9/0x240) from [<80027a01>] (run_local_timers+0xd/0x18)

[<80027a01>] (run_local_timers+0xd/0x18) from [<80027a33>] (update_process_times+0x27/0x54)

[<80027a33>] (update_process_times+0x27/0x54) from [<8004d4a3>] (tick_periodic+0x4b/0xb0)

[<8004d4a3>] (tick_periodic+0x4b/0xb0) from [<8004d593>] (tick_handle_periodic+0x23/0x78)

[<8004d593>] (tick_handle_periodic+0x23/0x78) from [<80012153>] (twd_handler+0x23/0x2c)

[<80012153>] (twd_handler+0x23/0x2c) from [<80060a5b>] (handle_percpu_devid_irq+0x73/0xc8)

[<80060a5b>] (handle_percpu_devid_irq+0x73/0xc8) from [<8005e163>] (generic_handle_irq+0x23/0x34)

[<8005e163>] (generic_handle_irq+0x23/0x34) from [<8000df8b>] (handle_IRQ+0x4f/0x94)

[<8000df8b>] (handle_IRQ+0x4f/0x94) from [<80008431>] (gic_handle_irq+0x29/0x84)

[<80008431>] (gic_handle_irq+0x29/0x84) from [<8000d1bf>] (__irq_svc+0x3f/0x80)

Exception stack(0x86d97ee8 to 0x86d97f30)

7ee0:                   00000000 00000422 00000000 806e2d9c 80711180 806dc140

7f00: 00000421 00000001 806dc144 86d96000 00000000 00000000 00000000 86d97f30

7f20: 800491e7 800491e8 40000033 ffffffff

[<8000d1bf>] (__irq_svc+0x3f/0x80) from [<800491e8>] (do_settimeofday+0xd8/0xe4)  死鎖的位置在do_settimeofday+0xd8

[<800491e8>] (do_settimeofday+0xd8/0xe4) from [<80020ced>] (do_sys_settimeofday+0x49/0xa8)

[<80020ced>] (do_sys_settimeofday+0x49/0xa8) from [<80020d71>] (sys_settimeofday+0x25/0xb8)

[<80020d71>] (sys_settimeofday+0x25/0xb8) from [<8000d681>] (ret_fast_syscall+0x1/0x5c)

小試了下牛刀,等待FAE回複。

 =============================================================================

333473

BUG: soft lockup - CPU#0 stuck for 61s! [events/0:51] 

Modules linked in: OSLXcontext ippathmgt kmd kprn x_tables ip6_tables ip_tables ipv6 binfmt_misc 8021q 

Cpu 0 

$ 0   : 0000000000000000 ffffffff81410000 0000000000000003 0000000000000000 

$ 4   : 0000000000000010 0000000000000010 0000000000000010 a80000041fa0bdd0 

$ 8   : 0000000000000000 a80000041fbed080 0000000000000000 0000000000000001 

$12   : a80000041fa0bfe0 0000000000008c00 ffffffff813b4000 000000000000009a 

$16   : ffffffff813d0000 a8000004132faf00 000000000000000f 0000000000000001 

$20   : 0000000000000000 ffffffff8119a920 0000000000000000 fffffffffffffffe 

$24   : 0000000000000002 ffffffff81103e40 

$28   : a80000041fa08000 a80000041fa0be00 ffffffff8136e3f0 ffffffff811898cc 

Hi    : 0000000000000000 

Lo    : 0000000000000000 

epc   : ffffffff81189784 smp_call_function_mask+0x144/0x258 

    Not tainted 

ra    : ffffffff811898cc smp_call_function+0x34/0x78 

Status: 10008ce3    KX SX UX KERNEL EXL IE 

Cause : 40808000 

PrId  : 000d030b (Cavium Octeon) 

BUG: soft lockup - CPU#10 stuck for 61s! [ntpdate:3032] 

Kernel panic - not syncing: softlockup: hung tasks 

------------[ cut here ]------------ 

WARNING: at kernel/smp.c:332 smp_call_function_mask+0x250/0x258() 

Modules linked in: OSLXcontext ippathmgt kmd kprn x_tables ip6_tables ip_tables ipv6 binfmt_misc 8021q 

Call Trace: 

[<ffffffff813221f4>] dump_stack+0x8/0x34 

[<ffffffff8115f588>] warn_on_slowpath+0x58/0x90 

[<ffffffff81189890>] smp_call_function_mask+0x250/0x258 

[<ffffffff811898cc>] smp_call_function+0x34/0x78 

[<ffffffff81322428>] panic+0x98/0x220 

[<ffffffff81192c10>] softlockup_tick+0x290/0x310 

[<ffffffff8116ce84>] update_process_times+0x34/0x78 

[<ffffffff811852e8>] tick_sched_timer+0x58/0xd8 

[<ffffffff8117e398>] __run_hrtimer+0xb0/0xc8 

[<ffffffff8117f150>] hrtimer_interrupt+0x158/0x210 

[<ffffffff81138ac4>] c0_compare_interrupt+0x11c/0x148 

[<ffffffff8119334c>] _handle_IRQ_event+0x5c/0xb8 

[<ffffffff81193430>] handle_IRQ_event+0x88/0x140 

[<ffffffff811950e8>] handle_percpu_irq+0x78/0xd8 

[<ffffffff81100f28>] plat_irq_dispatch+0xc8/0x1d0 

[<ffffffff81130a00>] ret_from_irq+0x0/0x4 

[<ffffffff81189784>] smp_call_function_mask+0x144/0x258 

[<ffffffff811898cc>] smp_call_function+0x34/0x78 

[<ffffffff8116652c>] on_each_cpu+0x2c/0x88 

[<ffffffff81175410>] run_workqueue+0xc8/0x1a0 

[<ffffffff81175960>] worker_thread+0x60/0xd0 

[<ffffffff8117a404>] kthread+0x5c/0xb0 

[<ffffffff81133328>] kernel_thread_helper+0x10/0x18 

---[ end trace 1770b5013095d915 ]--- 

Rebooting in 10 seconds..<3> 

Modules linked in: OSLXcontext ippathmgt kmd kprn x_tables ip6_tables ip_tables ipv6 binfmt_misc 8021q 

Cpu 10 

$ 0   : 0000000000000000 a80000041cf3be70 0000000000000003 0000000000000000 

$ 4   : 0000000000000010 0000000000000010 0000000000000010 a80000041cf3bca0 

$ 8   : 0000000000000000 000000003b9aca00 ffffffff81410000 ffffffff81412520 

$12   : 0000000000000000 ffffffffc0000008 ffffffff81191258 fffffffff0000000 

$16   : ffffffff813d0000 a80000041feaca80 000000000000000f 0000000000000001 

$20   : 0000000000000000 ffffffff8117e610 0000000000000000 0000000000000001 

$24   : 0000000000000042 ffffffff81103e40 

$28   : a80000041cf38000 a80000041cf3bcd0 0000000010013fe0 ffffffff811898cc 

Hi    : 03d0900000000000 

Lo    : 0487ab0000000000 

epc   : ffffffff81189778 smp_call_function_mask+0x138/0x258 

    Tainted: G        W 

ra    : ffffffff811898cc smp_call_function+0x34/0x78 

Status: 10008ce3    KX SX UX KERNEL EXL IE  

Cause : 40808000 

PrId  : 000d030b (Cavium Octeon) 

Kernel panic - not syncing: softlockup: hung tasks 

------------[ cut here ]------------ 

WARNING: at kernel/smp.c:332 smp_call_function_mask+0x250/0x258() 

Modules linked in: OSLXcontext ippathmgt kmd kprn x_tables ip6_tables ip_tables ipv6 binfmt_misc 8021q 

Call Trace: 

[<ffffffff813221f4>] dump_stack+0x8/0x34 

[<ffffffff8115f588>] warn_on_slowpath+0x58/0x90 

[<ffffffff81189890>] smp_call_function_mask+0x250/0x258 

[<ffffffff811898cc>] smp_call_function+0x34/0x78 

[<ffffffff81322428>] panic+0x98/0x220 

[<ffffffff81192c10>] softlockup_tick+0x290/0x310 

[<ffffffff8116ce84>] update_process_times+0x34/0x78 

[<ffffffff811852e8>] tick_sched_timer+0x58/0xd8 

[<ffffffff8117e398>] __run_hrtimer+0xb0/0xc8 

[<ffffffff8117f150>] hrtimer_interrupt+0x158/0x210 

[<ffffffff81138ac4>] c0_compare_interrupt+0x11c/0x148 

[<ffffffff8119334c>] _handle_IRQ_event+0x5c/0xb8 

[<ffffffff81193430>] handle_IRQ_event+0x88/0x140 

[<ffffffff811950e8>] handle_percpu_irq+0x78/0xd8 

[<ffffffff81100f28>] plat_irq_dispatch+0xc8/0x1d0 

[<ffffffff81130a00>] ret_from_irq+0x0/0x4 

[<ffffffff81189778>] smp_call_function_mask+0x138/0x258 

[<ffffffff811898cc>] smp_call_function+0x34/0x78 

[<ffffffff8116652c>] on_each_cpu+0x2c/0x88 

[<ffffffff81181ad4>] do_settimeofday+0x14c/0x180 

[<ffffffff81178e0c>] SyS_clock_settime+0xec/0x108 

[<ffffffff811912d0>] compat_sys_clock_settime+0x78/0x90 

[<ffffffff81141b64>] handle_sysn32+0x44/0x84