syzbot


INFO: rcu detected stall in br_multicast_query_expired

Status: fixed on 2022/03/08 16:11
Subsystems: kernel
[Documentation on labels]
Fix commit: f83a112bd91a net: bridge: mcast: add and enforce startup query interval minimum
First crash: 1086d, last: 1086d
Cause bisection: the issue happens on the oldest tested release (bisect log)
Crash: INFO: rcu detected stall in lock_is_held_type (log)
Repro: C syz .config
  

Sample crash report:
Total swap = 0kB
2097051 pages RAM
0 pages HighMem/MovableOnly
384517 pages reserved
0 pages cma reserved
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 	0-...!: (1 GPs behind) idle=6eb/1/0x4000000000000000 softirq=14214/14215 fqs=16 
	(t=10500 jiffies g=15505 q=102699)
rcu: rcu_preempt kthread starved for 9059 jiffies! g15505 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:28688 pid:   14 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:4972 [inline]
 context_switch kernel/sched/core.c:4972 [inline] kernel/sched/core.c:6253
 __schedule+0xa9a/0x4940 kernel/sched/core.c:6253 kernel/sched/core.c:6253
 schedule+0xd2/0x260 kernel/sched/core.c:6326 kernel/sched/core.c:6326
 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881 kernel/time/timer.c:1881
 rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955 kernel/rcu/tree.c:1955
 rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128 kernel/rcu/tree.c:2128
 kthread+0x405/0x4f0 kernel/kthread.c:327 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 arch/x86/entry/entry_64.S:295
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 PID: 8 Comm: kworker/u4:0 Not tainted 5.16.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound toggle_allocation_gate
RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline]
RIP: 0010:csd_lock_wait kernel/smp.c:440 [inline] kernel/smp.c:969
RIP: 0010:smp_call_function_many_cond+0x457/0xc20 kernel/smp.c:969 kernel/smp.c:969
Code: 4d 48 b8 00 00 00 00 00 fc ff df 4d 89 f4 4c 89 f5 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 e8 40 78 0b 00 f3 90 41 0f b6 04 24 <40> 38 c5 7c 08 84 c0 0f 85 33 06 00 00 8b 43 08 31 ff 83 e0 01 41
RSP: 0018:ffffc90000cd79f8 EFLAGS: 00000293
RAX: 0000000000000000 RBX: ffff8880b9c41d40 RCX: 0000000000000000
RDX: ffff888010e7d700 RSI: ffffffff816c2a80 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff816c2aa6 R11: 0000000000000000 R12: ffffed10173883a9
R13: 0000000000000000 R14: ffff8880b9c41d48 R15: 0000000000000001
FS:  0000000000000000(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564fa466b4c0 CR3: 000000000b88e000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 on_each_cpu_cond_mask+0x56/0xa0 kernel/smp.c:1135 kernel/smp.c:1135
 on_each_cpu include/linux/smp.h:71 [inline]
 text_poke_sync arch/x86/kernel/alternative.c:1112 [inline]
 on_each_cpu include/linux/smp.h:71 [inline] arch/x86/kernel/alternative.c:1297
 text_poke_sync arch/x86/kernel/alternative.c:1112 [inline] arch/x86/kernel/alternative.c:1297
 text_poke_bp_batch+0x1b3/0x560 arch/x86/kernel/alternative.c:1297 arch/x86/kernel/alternative.c:1297
 text_poke_flush arch/x86/kernel/alternative.c:1451 [inline]
 text_poke_flush arch/x86/kernel/alternative.c:1448 [inline]
 text_poke_flush arch/x86/kernel/alternative.c:1451 [inline] arch/x86/kernel/alternative.c:1458
 text_poke_flush arch/x86/kernel/alternative.c:1448 [inline] arch/x86/kernel/alternative.c:1458
 text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1458 arch/x86/kernel/alternative.c:1458
 arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:146 arch/x86/kernel/jump_label.c:146
 jump_label_update+0x1d5/0x430 kernel/jump_label.c:830 kernel/jump_label.c:830
 static_key_enable_cpuslocked+0x1b1/0x260 kernel/jump_label.c:177 kernel/jump_label.c:177
 static_key_enable+0x16/0x20 kernel/jump_label.c:190 kernel/jump_label.c:190
 toggle_allocation_gate mm/kfence/core.c:732 [inline]
 toggle_allocation_gate mm/kfence/core.c:732 [inline] mm/kfence/core.c:724
 toggle_allocation_gate+0x100/0x390 mm/kfence/core.c:724 mm/kfence/core.c:724
 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 kernel/workqueue.c:2298
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445 kernel/workqueue.c:2445
 kthread+0x405/0x4f0 kernel/kthread.c:327 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 arch/x86/entry/entry_64.S:295
 </TASK>
NMI backtrace for cpu 0
CPU: 0 PID: 13 Comm: ksoftirqd/0 Not tainted 5.16.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 __dump_stack lib/dump_stack.c:88 [inline] lib/dump_stack.c:106
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 lib/dump_stack.c:106
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111 lib/nmi_backtrace.c:111
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62 lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] kernel/rcu/tree_stall.h:343
 rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343 kernel/rcu/tree_stall.h:343
 print_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:711 [inline]
 rcu_pending kernel/rcu/tree.c:3878 [inline]
 print_cpu_stall kernel/rcu/tree_stall.h:627 [inline] kernel/rcu/tree.c:2597
 check_cpu_stall kernel/rcu/tree_stall.h:711 [inline] kernel/rcu/tree.c:2597
 rcu_pending kernel/rcu/tree.c:3878 [inline] kernel/rcu/tree.c:2597
 rcu_sched_clock_irq.cold+0x9d/0x746 kernel/rcu/tree.c:2597 kernel/rcu/tree.c:2597
 update_process_times+0x16d/0x200 kernel/time/timer.c:1785 kernel/time/timer.c:1785
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226 kernel/time/tick-sched.c:226
 tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428 kernel/time/tick-sched.c:1428
 __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
 __run_hrtimer kernel/time/hrtimer.c:1685 [inline] kernel/time/hrtimer.c:1749
 __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749 kernel/time/hrtimer.c:1749
 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 kernel/time/hrtimer.c:1811
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] arch/x86/kernel/apic/apic.c:1103
 __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103 arch/x86/kernel/apic/apic.c:1103
 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097 arch/x86/kernel/apic/apic.c:1097
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 arch/x86/include/asm/idtentry.h:638
RIP: 0010:unwind_next_frame+0xd0a/0x1ce0 arch/x86/kernel/unwind_orc.c:539 arch/x86/kernel/unwind_orc.c:539
Code: 28 84 c0 48 8b 4c 24 30 0f 84 21 07 00 00 48 b8 00 00 00 00 00 fc ff df 48 8b 54 24 08 4c 8b 64 24 60 48 c1 ea 03 80 3c 02 00 <0f> 85 86 0e 00 00 48 8b 14 24 4d 89 66 38 48 b8 00 00 00 00 00 fc
RSP: 0018:ffffc90000d27220 EFLAGS: 00000246
RAX: dffffc0000000000 RBX: 1ffff920001a4e4c RCX: ffffffff8e8251e7
RDX: 1ffff920001a4e66 RSI: ffffc90000d27838 RDI: ffffc90000d27838
RBP: 0000000000000001 R08: ffffffff8e8251e2 R09: 0000000000000001
R10: fffff520001a4e6a R11: 000000000008808a R12: ffffc90000d27840
R13: ffffc90000d2732d R14: ffffc90000d272f8 R15: ffffffff8e8251e6
 arch_stack_walk+0x7d/0xe0 arch/x86/kernel/stacktrace.c:25 arch/x86/kernel/stacktrace.c:25
 stack_trace_save+0x8c/0xc0 kernel/stacktrace.c:122 kernel/stacktrace.c:122
 kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38 mm/kasan/common.c:38
 kasan_set_track+0x21/0x30 mm/kasan/common.c:46 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370 mm/kasan/generic.c:370
 ____kasan_slab_free mm/kasan/common.c:366 [inline]
 ____kasan_slab_free mm/kasan/common.c:328 [inline]
 ____kasan_slab_free mm/kasan/common.c:366 [inline] mm/kasan/common.c:374
 ____kasan_slab_free mm/kasan/common.c:328 [inline] mm/kasan/common.c:374
 __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374 mm/kasan/common.c:374
 kasan_slab_free include/linux/kasan.h:235 [inline]
 slab_free_hook mm/slub.c:1723 [inline]
 kasan_slab_free include/linux/kasan.h:235 [inline] mm/slub.c:1749
 slab_free_hook mm/slub.c:1723 [inline] mm/slub.c:1749
 slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749 mm/slub.c:1749
 slab_free mm/slub.c:3513 [inline]
 slab_free mm/slub.c:3513 [inline] mm/slub.c:4561
 kfree+0xf6/0x560 mm/slub.c:4561 mm/slub.c:4561
 skb_free_head net/core/skbuff.c:655 [inline]
 skb_free_head net/core/skbuff.c:655 [inline] net/core/skbuff.c:677
 skb_release_data+0x65a/0x790 net/core/skbuff.c:677 net/core/skbuff.c:677
 skb_release_all net/core/skbuff.c:742 [inline]
 __kfree_skb net/core/skbuff.c:756 [inline]
 kfree_skb net/core/skbuff.c:774 [inline]
 skb_release_all net/core/skbuff.c:742 [inline] net/core/skbuff.c:768
 __kfree_skb net/core/skbuff.c:756 [inline] net/core/skbuff.c:768
 kfree_skb net/core/skbuff.c:774 [inline] net/core/skbuff.c:768
 kfree_skb+0x133/0x3f0 net/core/skbuff.c:768 net/core/skbuff.c:768
 enqueue_to_backlog+0x210/0x10c0 net/core/dev.c:4676 net/core/dev.c:4676
 netif_rx_internal+0x120/0x640 net/core/dev.c:4930 net/core/dev.c:4930
 netif_rx+0x7f/0x510 net/core/dev.c:4957 net/core/dev.c:4957
 __br_multicast_send_query+0x31df/0x3aa0 net/bridge/br_multicast.c:1658 net/bridge/br_multicast.c:1658
 br_multicast_send_query+0x266/0x4b0 net/bridge/br_multicast.c:1725 net/bridge/br_multicast.c:1725
 br_multicast_query_expired.constprop.0+0x168/0x1b0 net/bridge/br_multicast.c:3811 net/bridge/br_multicast.c:3811
 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1421 kernel/time/timer.c:1421
 expire_timers kernel/time/timer.c:1466 [inline]
 expire_timers kernel/time/timer.c:1466 [inline] kernel/time/timer.c:1734
 __run_timers.part.0+0x675/0xa20 kernel/time/timer.c:1734 kernel/time/timer.c:1734
 __run_timers kernel/time/timer.c:1715 [inline]
 __run_timers kernel/time/timer.c:1715 [inline] kernel/time/timer.c:1747
 run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1747 kernel/time/timer.c:1747
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558 kernel/softirq.c:558
 run_ksoftirqd kernel/softirq.c:921 [inline]
 run_ksoftirqd kernel/softirq.c:921 [inline] kernel/softirq.c:913
 run_ksoftirqd+0x2d/0x60 kernel/softirq.c:913 kernel/softirq.c:913
 smpboot_thread_fn+0x645/0x9c0 kernel/smpboot.c:164 kernel/smpboot.c:164
 kthread+0x405/0x4f0 kernel/kthread.c:327 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 arch/x86/entry/entry_64.S:295
 </TASK>
----------------
Code disassembly (best guess):
   0:	4d                   	rex.WRB
   1:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
   8:	fc ff df
   b:	4d 89 f4             	mov    %r14,%r12
   e:	4c 89 f5             	mov    %r14,%rbp
  11:	49 c1 ec 03          	shr    $0x3,%r12
  15:	83 e5 07             	and    $0x7,%ebp
  18:	49 01 c4             	add    %rax,%r12
  1b:	83 c5 03             	add    $0x3,%ebp
  1e:	e8 40 78 0b 00       	callq  0xb7863
  23:	f3 90                	pause
  25:	41 0f b6 04 24       	movzbl (%r12),%eax
* 2a:	40 38 c5             	cmp    %al,%bpl <-- trapping instruction
  2d:	7c 08                	jl     0x37
  2f:	84 c0                	test   %al,%al
  31:	0f 85 33 06 00 00    	jne    0x66a
  37:	8b 43 08             	mov    0x8(%rbx),%eax
  3a:	31 ff                	xor    %edi,%edi
  3c:	83 e0 01             	and    $0x1,%eax
  3f:	41                   	rex.B
----------------
Code disassembly (best guess):
   0:	4d                   	rex.WRB
   1:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
   8:	fc ff df
   b:	4d 89 f4             	mov    %r14,%r12
   e:	4c 89 f5             	mov    %r14,%rbp
  11:	49 c1 ec 03          	shr    $0x3,%r12
  15:	83 e5 07             	and    $0x7,%ebp
  18:	49 01 c4             	add    %rax,%r12
  1b:	83 c5 03             	add    $0x3,%ebp
  1e:	e8 40 78 0b 00       	callq  0xb7863
  23:	f3 90                	pause
  25:	41 0f b6 04 24       	movzbl (%r12),%eax
* 2a:	40 38 c5             	cmp    %al,%bpl <-- trapping instruction
  2d:	7c 08                	jl     0x37
  2f:	84 c0                	test   %al,%al
  31:	0f 85 33 06 00 00    	jne    0x66a
  37:	8b 43 08             	mov    0x8(%rbx),%eax
  3a:	31 ff                	xor    %edi,%edi
  3c:	83 e0 01             	and    $0x1,%eax
  3f:	41                   	rex.B

Crashes (2):
Time Kernel Commit Syzkaller Config Log Report Syz repro C repro VM info Assets (help?) Manager Title
2021/12/21 09:54 upstream 86085fe79e3c 62bd192b .config console log report syz C ci-upstream-kasan-gce INFO: rcu detected stall in br_multicast_query_expired
2021/12/21 10:03 net-old 75a2f3152009 62bd192b .config console log report syz C ci-upstream-net-this-kasan-gce INFO: rcu detected stall in br_multicast_query_expired
* Struck through repros no longer work on HEAD.