syzbot


INFO: rcu detected stall in addrconf_dad_work

Status: upstream: reported syz repro on 2024/09/01 00:22
Bug presence: origin:lts-only
Reported-by: syzbot+54ea4e59bcdd503f3ea7@syzkaller.appspotmail.com
First crash: 35d, last: 35d
Fix commit to backport (bisect log):
tree: upstream
commit e634134180885574d1fe7aa162777ba41e7fcd5b
Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Mon May 27 15:39:54 2024 +0000

  net/sched: taprio: make q->picos_per_byte available to fill_sched_entry()

  
Bug presence (2)
Date        Name                Commit        Repro  Result
2024/09/01  linux-5.15.y (ToT)  fa93fa65db6e  C      [report] INFO: rcu detected stall in inet_rtm_newaddr
2024/09/01  upstream (ToT)      6cd90e5ea72f  C      Didn't crash
Similar bugs (11)
Kernel Title Repro Cause bisect Fix bisect Count Last Reported Patched Status
upstream INFO: rcu detected stall in addrconf_dad_work (5) net C done inconclusive 12 107d 1489d 0/28 upstream: reported C repro on 2020/09/07 15:59
upstream INFO: rcu detected stall in addrconf_dad_work (4) cgroups mm 8 1732d 1733d 0/28 closed as invalid on 2020/01/09 08:13
upstream INFO: rcu detected stall in addrconf_dad_work (3) kernel 6 1733d 1733d 0/28 closed as invalid on 2020/01/08 05:23
linux-4.14 INFO: rcu detected stall in addrconf_dad_work C done 18 1849d 1856d 1/1 fixed on 2019/12/06 10:33
upstream INFO: rcu detected stall in addrconf_dad_work (2) kernel 15 1768d 1769d 0/28 closed as invalid on 2019/12/04 14:14
upstream INFO: rcu detected stall in addrconf_dad_work C done 126 1847d 1852d 13/28 fixed on 2019/10/09 10:54
linux-4.19 INFO: rcu detected stall in addrconf_dad_work (2) C done 1 1752d 1752d 1/1 fixed on 2020/01/19 15:05
linux-4.19 INFO: rcu detected stall in addrconf_dad_work C done 19 1844d 1855d 1/1 fixed on 2019/12/07 19:18
linux-5.15 BUG: soft lockup in addrconf_dad_work 1 446d 446d 0/3 auto-obsoleted due to no activity on 2023/10/25 16:01
linux-4.19 BUG: soft lockup in addrconf_dad_work C error 55 624d 966d 0/1 upstream: reported C repro on 2022/02/13 10:05
android-5-15 BUG: soft lockup in addrconf_dad_work 3 106d 161d 0/2 auto-obsoleted due to no activity on 2024/09/19 17:30
Fix bisection attempts (1)
Created           Duration  User  Patch          Repo      Result
2024/09/20 06:18  8h32m     -     fix candidate  upstream  OK (1) job log

Sample crash report:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	1-...!: (2 ticks this GP) idle=fe1/1/0x4000000000000000 softirq=5871/5871 fqs=0 
	(detected by 0, t=10502 jiffies, g=5385, q=811)
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 3695 Comm: kworker/1:4 Not tainted 5.15.165-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:lock_is_held_type+0x4d/0x180 kernel/locking/lockdep.c:5659
Code: 5c 36 b6 03 00 0f 84 00 01 00 00 65 8b 05 0b 29 cf 75 85 c0 0f 85 f1 00 00 00 65 4c 8b 2d bb 1f cf 75 41 83 bd ec 0a 00 00 00 <0f> 85 db 00 00 00 41 89 f6 49 89 ff 48 c7 04 24 00 00 00 00 9c 8f
RSP: 0018:ffffc90000dd0c78 EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88801e5a1dc0
RDX: ffff88801e5a1dc0 RSI: 00000000ffffffff RDI: ffff88801f47f300
RBP: 00000000ffffffff R08: ffffffff888134db R09: 0000000000000003
R10: ffffffffffffffff R11: dffffc0000000001 R12: ffff88801f47f000
R13: ffff88801e5a1dc0 R14: dffffc0000000000 R15: ffff88801f47f328
FS:  0000000000000000(0000) GS:ffff8880b9100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0cc07ec761 CR3: 0000000071328000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <IRQ>
 lock_is_held include/linux/lockdep.h:287 [inline]
 advance_sched+0x13d/0x940 net/sched/sch_taprio.c:721
 __run_hrtimer kernel/time/hrtimer.c:1686 [inline]
 __hrtimer_run_queues+0x598/0xcf0 kernel/time/hrtimer.c:1750
 hrtimer_interrupt+0x392/0x980 kernel/time/hrtimer.c:1812
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1085 [inline]
 __sysvec_apic_timer_interrupt+0x139/0x470 arch/x86/kernel/apic/apic.c:1102
 sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1096
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:check_kcov_mode kernel/kcov.c:183 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x32/0x60 kernel/kcov.c:206
Code: 14 0e 82 7e 65 8b 15 15 0e 82 7e 81 e2 00 01 ff 00 74 11 81 fa 00 01 00 00 75 35 83 b9 34 16 00 00 00 74 2c 8b 91 10 16 00 00 <83> fa 02 75 21 48 8b 91 18 16 00 00 48 8b 32 48 8d 7e 01 8b 89 14
RSP: 0018:ffffc90003037898 EFLAGS: 00000246
RAX: ffffffff89152906 RBX: ffff88807df3f7e8 RCX: ffff88801e5a1dc0
RDX: 0000000000000000 RSI: ffffffff8ad8f7a0 RDI: ffffffff8ad8f760
RBP: ffffc90003037a30 R08: ffffffff8915264f R09: fffffbfff20e1e19
R10: 0000000000000000 R11: dffffc0000000001 R12: ffff88805ac745d8
R13: ffffc90003037988 R14: dffffc0000000000 R15: 0000000000000000
 __fib6_clean_all+0x376/0x480 net/ipv6/ip6_fib.c:2257
 rt_genid_bump_ipv6 include/net/net_namespace.h:460 [inline]
 addrconf_dad_completed+0x518/0xc40 net/ipv6/addrconf.c:4292
 addrconf_dad_work+0xdd0/0x1720
 process_one_work+0x8a1/0x10c0 kernel/workqueue.c:2310
 worker_thread+0xaca/0x1280 kernel/workqueue.c:2457
 kthread+0x3f6/0x4f0 kernel/kthread.c:334
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:287
 </TASK>
rcu: rcu_preempt kthread timer wakeup didn't happen for 10501 jiffies! g5385 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
rcu: 	Possible timer handling issue on cpu=1 timer-softirq=2439
rcu: rcu_preempt kthread starved for 10502 jiffies! g5385 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:I stack:27256 pid:   15 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5027 [inline]
 __schedule+0x12c4/0x45b0 kernel/sched/core.c:6373
 schedule+0x11b/0x1f0 kernel/sched/core.c:6456
 schedule_timeout+0x1b9/0x300 kernel/time/timer.c:1914
 rcu_gp_fqs_loop+0x2bf/0x1080 kernel/rcu/tree.c:1972
 rcu_gp_kthread+0xa4/0x360 kernel/rcu/tree.c:2145
 kthread+0x3f6/0x4f0 kernel/kthread.c:334
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:287
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 3695 Comm: kworker/1:4 Not tainted 5.15.165-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
Workqueue: ipv6_addrconf addrconf_dad_work
RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:331 [inline]
RIP: 0010:rcu_is_watching+0xc/0xa0 kernel/rcu/tree.c:1123
Code: ba 34 c8 08 41 f7 c4 00 02 00 00 75 b4 eb b3 e8 9a 34 c8 08 66 2e 0f 1f 84 00 00 00 00 00 41 57 41 56 53 65 ff 05 ec 56 97 7e <e8> 3f 48 c8 08 89 c3 83 f8 08 73 72 49 bf 00 00 00 00 00 fc ff df
RSP: 0018:ffffc90000dd0b20 EFLAGS: 00000083
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffffff8162adbc
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff8de952a8
RBP: ffffc90000dd0ca0 R08: dffffc0000000000 R09: fffffbfff1bd2a56
R10: 0000000000000000 R11: dffffc0000000001 R12: 1ffff920001ba170
R13: dffffc0000000000 R14: 0000000000000000 R15: 000000258044ca30
FS:  0000000000000000(0000) GS:ffff8880b9100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0cc07ec761 CR3: 0000000071328000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <IRQ>
 trace_lock_acquire include/trace/events/lock.h:13 [inline]
 lock_acquire+0xdd/0x4f0 kernel/locking/lockdep.c:5594
 rcu_lock_acquire+0x2a/0x30 include/linux/rcupdate.h:312
 rcu_read_lock include/linux/rcupdate.h:739 [inline]
 advance_sched+0x6ce/0x940 net/sched/sch_taprio.c:769
 __run_hrtimer kernel/time/hrtimer.c:1686 [inline]
 __hrtimer_run_queues+0x598/0xcf0 kernel/time/hrtimer.c:1750
 hrtimer_interrupt+0x392/0x980 kernel/time/hrtimer.c:1812
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1085 [inline]
 __sysvec_apic_timer_interrupt+0x139/0x470 arch/x86/kernel/apic/apic.c:1102
 sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1096
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x16/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:check_kcov_mode kernel/kcov.c:183 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x32/0x60 kernel/kcov.c:206
Code: 14 0e 82 7e 65 8b 15 15 0e 82 7e 81 e2 00 01 ff 00 74 11 81 fa 00 01 00 00 75 35 83 b9 34 16 00 00 00 74 2c 8b 91 10 16 00 00 <83> fa 02 75 21 48 8b 91 18 16 00 00 48 8b 32 48 8d 7e 01 8b 89 14
RSP: 0018:ffffc90003037898 EFLAGS: 00000246
RAX: ffffffff89152906 RBX: ffff88807df3f7e8 RCX: ffff88801e5a1dc0
RDX: 0000000000000000 RSI: ffffffff8ad8f7a0 RDI: ffffffff8ad8f760
RBP: ffffc90003037a30 R08: ffffffff8915264f R09: fffffbfff20e1e19
R10: 0000000000000000 R11: dffffc0000000001 R12: ffff88805ac745d8
R13: ffffc90003037988 R14: dffffc0000000000 R15: 0000000000000000
 __fib6_clean_all+0x376/0x480 net/ipv6/ip6_fib.c:2257
 rt_genid_bump_ipv6 include/net/net_namespace.h:460 [inline]
 addrconf_dad_completed+0x518/0xc40 net/ipv6/addrconf.c:4292
 addrconf_dad_work+0xdd0/0x1720
 process_one_work+0x8a1/0x10c0 kernel/workqueue.c:2310
 worker_thread+0xaca/0x1280 kernel/workqueue.c:2457
 kthread+0x3f6/0x4f0 kernel/kthread.c:334
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:287
 </TASK>

Crashes (1):
Time Kernel Commit Syzkaller Config Log Report Syz repro C repro VM info Assets Manager Title
2024/09/01 00:22 linux-5.15.y fa93fa65db6e 1eda0d14 .config console log report syz / log [disk image] [vmlinux] [kernel image] ci2-linux-5-15-kasan INFO: rcu detected stall in addrconf_dad_work
* Struck through repros no longer work on HEAD.