syzbot


INFO: rcu detected stall in bond_3ad_state_machine_handler

Status: fixed on 2023/10/12 12:48
Subsystems: net
Fix commit: 8c21ab1bae94 net/sched: fq_pie: avoid stalls in fq_pie_timer()
First crash: 300d, last: 300d

Sample crash report:
rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-.... } 2661 jiffies s: 25777 root: 0x1/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 10 Comm: kworker/u4:0 Not tainted 6.5.0-rc2-syzkaller-00601-g2303fae13064 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2023
Workqueue: bond1 bond_3ad_state_machine_handler
RIP: 0010:pie_calculate_probability+0x392/0x850 net/sched/sch_pie.c:395
Code: 73 0d e8 01 f7 2c f9 48 c7 43 20 00 00 00 00 e8 f4 f6 2c f9 4c 8b 24 24 31 ff 4d 09 ec 4c 89 e6 e8 73 f2 2c f9 4d 85 e4 75 31 <e8> d9 f6 2c f9 44 0f b6 7c 24 2f 31 ff 44 89 fe e8 f9 f1 2c f9 45
RSP: 0018:ffffc90000007bc0 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffc900178b2820 RCX: ffffffff8859874d
RDX: ffff888015e51dc0 RSI: 0000000000000100 RDI: 0000000000000007
RBP: ffff888063f86300 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000c798966b R12: 0000000000000000
R13: 0000000000000000 R14: fffffff0a3da8872 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdde2f9d988 CR3: 0000000015b72000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <NMI>
 </NMI>
 <IRQ>
 fq_pie_timer+0x1da/0x4f0 net/sched/sch_fq_pie.c:387
 call_timer_fn+0x1a0/0x580 kernel/time/timer.c:1700
 expire_timers kernel/time/timer.c:1751 [inline]
 __run_timers+0x764/0xb10 kernel/time/timer.c:2022
 run_timer_softirq+0x58/0xd0 kernel/time/timer.c:2035
 __do_softirq+0x218/0x965 kernel/softirq.c:553
 invoke_softirq kernel/softirq.c:427 [inline]
 __irq_exit_rcu kernel/softirq.c:632 [inline]
 irq_exit_rcu+0xb7/0x120 kernel/softirq.c:644
 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1109
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645
RIP: 0010:queue_delayed_work_on+0x9a/0x130 kernel/workqueue.c:1905
Code: ff 48 89 ee e8 f7 1f 31 00 48 85 ed 75 42 e8 5d 24 31 00 9c 5b 81 e3 00 02 00 00 31 ff 48 89 de e8 db 1f 31 00 48 85 db 75 71 <e8> 41 24 31 00 44 89 e8 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f
RSP: 0018:ffffc900000f7b48 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888015e51dc0 RSI: ffffffff81555a36 RDI: 0000000000000007
RBP: 0000000000000200 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000001 R14: ffff888021225800 R15: 000000000000000a
 queue_delayed_work include/linux/workqueue.h:521 [inline]
 bond_3ad_state_machine_handler+0x2696/0x58b0 drivers/net/bonding/bond_3ad.c:2395
 process_one_work+0xaa2/0x16f0 kernel/workqueue.c:2597
 worker_thread+0x687/0x1110 kernel/workqueue.c:2748
 kthread+0x33a/0x430 kernel/kthread.c:389
 ret_from_fork+0x2c/0x70 arch/x86/kernel/process.c:145
 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:296
RIP: 0000:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 </TASK>
GRED: Unable to relocate VQ 0x0 after dequeue, screwing up backlog

Crashes (1):
Time: 2023/07/27 02:30
Kernel: net-next
Commit: 2303fae13064
Syzkaller: 41fe1bae
Config: .config
Log: console log
Report: report
Syz repro: -
C repro: -
VM info: info
Assets: -
Manager: ci-upstream-net-kasan-gce
Title: INFO: rcu detected stall in bond_3ad_state_machine_handler