第一次解一個softlockup的bug,FAE回複的資訊都不能很好的分析問題的根源,而且之前對softlockup的認識比較膚淺,
是以自己搗鼓這怎麼讓kernel自己産生個softlockup的warning出來。
基本原理:
每個CPU上起一個kernel thread,這個thread就用目前時間去更新一個percpu的全局時間锉;同時在tick中斷中檢查目前的時間
和這個全局的時間锉之間的內插補點是不是大于一個設定的門檻值,對于linux-2.6.27,這個門檻值設的是60s,如果60s這個核心線程還沒
有得到一次排程的機會,那就說明這個CPU上發生死鎖了。
在do_settimefoday中的clock_was_set 之前加上
preempt_disable();
while(1); 這種狀态是irq 使能了,但是搶占關閉了,沒有任何任務能夠搶占CPU17上的settimeofday這個task,也正因為如此
該CPU上的softlockup核心線程得不到運作,全局時間锉與目前時間內插補點超過門檻值後,在tick中斷中檢測這種情況。
然後遍一個調用settimefoday的小程式,運作,過60s後出現如下warning
用同樣的方法在ARM上試
BUG: soft lockup - CPU#2 stuck for 22s! [settimeofday:485]
Modules linked in:
Pid: 485, comm: settimeofday
CPU: 2 Not tainted (3.3.0-rc3+ #48)
PC is at do_settimeofday+0xd8/0xe4
LR is at do_settimeofday+0xd7/0xe4
pc : [<800491e8>] lr : [<800491e7>] psr: 40000033 arm-non-linux-addr2line -e vmlinux800491e8 可以得到對應是哪一行代碼
sp : 86d97f30 ip : 00000000 fp : 00000000
r10: 00000000 r9 : 86d96000 r8 : 806dc144
r7 : 00000001 r6 : 00000421 r5 : 806dc140 r4 : 80711180
r3 : 806e2d9c r2 : 00000000 r1 : 00000422 r0 : 00000000
Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user
Control: 50c53c7d Table: 66dd806a DAC: 00000015
[<80012639>] (unwind_backtrace+0x1/0xa4) from [<8005ddf9>] (watchdog_timer_fn+0x145/0x190)
[<8005ddf9>] (watchdog_timer_fn+0x145/0x190) from [<80036f91>] (hrtimer_run_queues+0xf9/0x240)
[<80036f91>] (hrtimer_run_queues+0xf9/0x240) from [<80027a01>] (run_local_timers+0xd/0x18)
[<80027a01>] (run_local_timers+0xd/0x18) from [<80027a33>] (update_process_times+0x27/0x54)
[<80027a33>] (update_process_times+0x27/0x54) from [<8004d4a3>] (tick_periodic+0x4b/0xb0)
[<8004d4a3>] (tick_periodic+0x4b/0xb0) from [<8004d593>] (tick_handle_periodic+0x23/0x78)
[<8004d593>] (tick_handle_periodic+0x23/0x78) from [<80012153>] (twd_handler+0x23/0x2c)
[<80012153>] (twd_handler+0x23/0x2c) from [<80060a5b>] (handle_percpu_devid_irq+0x73/0xc8)
[<80060a5b>] (handle_percpu_devid_irq+0x73/0xc8) from [<8005e163>] (generic_handle_irq+0x23/0x34)
[<8005e163>] (generic_handle_irq+0x23/0x34) from [<8000df8b>] (handle_IRQ+0x4f/0x94)
[<8000df8b>] (handle_IRQ+0x4f/0x94) from [<80008431>] (gic_handle_irq+0x29/0x84)
[<80008431>] (gic_handle_irq+0x29/0x84) from [<8000d1bf>] (__irq_svc+0x3f/0x80)
Exception stack(0x86d97ee8 to 0x86d97f30)
7ee0: 00000000 00000422 00000000 806e2d9c 80711180 806dc140
7f00: 00000421 00000001 806dc144 86d96000 00000000 00000000 00000000 86d97f30
7f20: 800491e7 800491e8 40000033 ffffffff
[<8000d1bf>] (__irq_svc+0x3f/0x80) from [<800491e8>] (do_settimeofday+0xd8/0xe4) 死鎖的位置在do_settimeofday+0xd8
[<800491e8>] (do_settimeofday+0xd8/0xe4) from [<80020ced>] (do_sys_settimeofday+0x49/0xa8)
[<80020ced>] (do_sys_settimeofday+0x49/0xa8) from [<80020d71>] (sys_settimeofday+0x25/0xb8)
[<80020d71>] (sys_settimeofday+0x25/0xb8) from [<8000d681>] (ret_fast_syscall+0x1/0x5c)
小試了下牛刀,等待FAE回複。
=============================================================================
333473
BUG: soft lockup - CPU#0 stuck for 61s! [events/0:51]
Modules linked in: OSLXcontext ippathmgt kmd kprn x_tables ip6_tables ip_tables ipv6 binfmt_misc 8021q
Cpu 0
$ 0 : 0000000000000000 ffffffff81410000 0000000000000003 0000000000000000
$ 4 : 0000000000000010 0000000000000010 0000000000000010 a80000041fa0bdd0
$ 8 : 0000000000000000 a80000041fbed080 0000000000000000 0000000000000001
$12 : a80000041fa0bfe0 0000000000008c00 ffffffff813b4000 000000000000009a
$16 : ffffffff813d0000 a8000004132faf00 000000000000000f 0000000000000001
$20 : 0000000000000000 ffffffff8119a920 0000000000000000 fffffffffffffffe
$24 : 0000000000000002 ffffffff81103e40
$28 : a80000041fa08000 a80000041fa0be00 ffffffff8136e3f0 ffffffff811898cc
Hi : 0000000000000000
Lo : 0000000000000000
epc : ffffffff81189784 smp_call_function_mask+0x144/0x258
Not tainted
ra : ffffffff811898cc smp_call_function+0x34/0x78
Status: 10008ce3 KX SX UX KERNEL EXL IE
Cause : 40808000
PrId : 000d030b (Cavium Octeon)
BUG: soft lockup - CPU#10 stuck for 61s! [ntpdate:3032]
Kernel panic - not syncing: softlockup: hung tasks
------------[ cut here ]------------
WARNING: at kernel/smp.c:332 smp_call_function_mask+0x250/0x258()
Modules linked in: OSLXcontext ippathmgt kmd kprn x_tables ip6_tables ip_tables ipv6 binfmt_misc 8021q
Call Trace:
[<ffffffff813221f4>] dump_stack+0x8/0x34
[<ffffffff8115f588>] warn_on_slowpath+0x58/0x90
[<ffffffff81189890>] smp_call_function_mask+0x250/0x258
[<ffffffff811898cc>] smp_call_function+0x34/0x78
[<ffffffff81322428>] panic+0x98/0x220
[<ffffffff81192c10>] softlockup_tick+0x290/0x310
[<ffffffff8116ce84>] update_process_times+0x34/0x78
[<ffffffff811852e8>] tick_sched_timer+0x58/0xd8
[<ffffffff8117e398>] __run_hrtimer+0xb0/0xc8
[<ffffffff8117f150>] hrtimer_interrupt+0x158/0x210
[<ffffffff81138ac4>] c0_compare_interrupt+0x11c/0x148
[<ffffffff8119334c>] _handle_IRQ_event+0x5c/0xb8
[<ffffffff81193430>] handle_IRQ_event+0x88/0x140
[<ffffffff811950e8>] handle_percpu_irq+0x78/0xd8
[<ffffffff81100f28>] plat_irq_dispatch+0xc8/0x1d0
[<ffffffff81130a00>] ret_from_irq+0x0/0x4
[<ffffffff81189784>] smp_call_function_mask+0x144/0x258
[<ffffffff811898cc>] smp_call_function+0x34/0x78
[<ffffffff8116652c>] on_each_cpu+0x2c/0x88
[<ffffffff81175410>] run_workqueue+0xc8/0x1a0
[<ffffffff81175960>] worker_thread+0x60/0xd0
[<ffffffff8117a404>] kthread+0x5c/0xb0
[<ffffffff81133328>] kernel_thread_helper+0x10/0x18
---[ end trace 1770b5013095d915 ]---
Rebooting in 10 seconds..<3>
Modules linked in: OSLXcontext ippathmgt kmd kprn x_tables ip6_tables ip_tables ipv6 binfmt_misc 8021q
Cpu 10
$ 0 : 0000000000000000 a80000041cf3be70 0000000000000003 0000000000000000
$ 4 : 0000000000000010 0000000000000010 0000000000000010 a80000041cf3bca0
$ 8 : 0000000000000000 000000003b9aca00 ffffffff81410000 ffffffff81412520
$12 : 0000000000000000 ffffffffc0000008 ffffffff81191258 fffffffff0000000
$16 : ffffffff813d0000 a80000041feaca80 000000000000000f 0000000000000001
$20 : 0000000000000000 ffffffff8117e610 0000000000000000 0000000000000001
$24 : 0000000000000042 ffffffff81103e40
$28 : a80000041cf38000 a80000041cf3bcd0 0000000010013fe0 ffffffff811898cc
Hi : 03d0900000000000
Lo : 0487ab0000000000
epc : ffffffff81189778 smp_call_function_mask+0x138/0x258
Tainted: G W
ra : ffffffff811898cc smp_call_function+0x34/0x78
Status: 10008ce3 KX SX UX KERNEL EXL IE
Cause : 40808000
PrId : 000d030b (Cavium Octeon)
Kernel panic - not syncing: softlockup: hung tasks
------------[ cut here ]------------
WARNING: at kernel/smp.c:332 smp_call_function_mask+0x250/0x258()
Modules linked in: OSLXcontext ippathmgt kmd kprn x_tables ip6_tables ip_tables ipv6 binfmt_misc 8021q
Call Trace:
[<ffffffff813221f4>] dump_stack+0x8/0x34
[<ffffffff8115f588>] warn_on_slowpath+0x58/0x90
[<ffffffff81189890>] smp_call_function_mask+0x250/0x258
[<ffffffff811898cc>] smp_call_function+0x34/0x78
[<ffffffff81322428>] panic+0x98/0x220
[<ffffffff81192c10>] softlockup_tick+0x290/0x310
[<ffffffff8116ce84>] update_process_times+0x34/0x78
[<ffffffff811852e8>] tick_sched_timer+0x58/0xd8
[<ffffffff8117e398>] __run_hrtimer+0xb0/0xc8
[<ffffffff8117f150>] hrtimer_interrupt+0x158/0x210
[<ffffffff81138ac4>] c0_compare_interrupt+0x11c/0x148
[<ffffffff8119334c>] _handle_IRQ_event+0x5c/0xb8
[<ffffffff81193430>] handle_IRQ_event+0x88/0x140
[<ffffffff811950e8>] handle_percpu_irq+0x78/0xd8
[<ffffffff81100f28>] plat_irq_dispatch+0xc8/0x1d0
[<ffffffff81130a00>] ret_from_irq+0x0/0x4
[<ffffffff81189778>] smp_call_function_mask+0x138/0x258
[<ffffffff811898cc>] smp_call_function+0x34/0x78
[<ffffffff8116652c>] on_each_cpu+0x2c/0x88
[<ffffffff81181ad4>] do_settimeofday+0x14c/0x180
[<ffffffff81178e0c>] SyS_clock_settime+0xec/0x108
[<ffffffff811912d0>] compat_sys_clock_settime+0x78/0x90
[<ffffffff81141b64>] handle_sysn32+0x44/0x84