syzbot


INFO: rcu detected stall in gc_worker (3)

Status: upstream: reported C repro on 2022/03/20 12:02
Reported-by: syzbot+eec403943a2a2455adaa@syzkaller.appspotmail.com
First crash: 201d, last: 18d

Cause bisection: the issue happens on the oldest tested release (bisect log)
Crash: BUG: soft lockup in do_idle (log)
Repro: C syz .config
similar bugs (3):
Kernel Title Repro Cause bisect Fix bisect Count Last Reported Patched Status
upstream INFO: rcu detected stall in gc_worker (2) C unreliable 4 260d 298d 0/24 closed as invalid on 2022/02/08 10:33
upstream INFO: rcu detected stall in gc_worker 8 1265d 1350d 0/24 auto-closed as invalid on 2019/10/14 16:34
linux-4.19 INFO: rcu detected stall in gc_worker syz error 1 164d 164d 0/1 upstream: reported syz repro on 2022/04/22 15:43
Patch testing requests:
Created Duration User Patch Repo Result
2022/07/31 09:48 11m hdanton@sina.com patch https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 91265a6da44d report log
2022/07/31 08:16 11m hdanton@sina.com patch https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 91265a6da44d report log
2022/07/31 07:46 10m hdanton@sina.com patch https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 91265a6da44d report log
2022/03/21 06:31 10m hdanton@sina.com patch upstream report log
2022/03/21 03:15 6m hdanton@sina.com patch https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/ 91265a6da44d error
2022/03/21 01:20 10m hdanton@sina.com patch https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/ 91265a6da44d report log
2022/03/20 15:22 10m hdanton@sina.com patch https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/ 91265a6da44d report log

Sample crash report:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 	0-...!: (1 GPs behind) idle=a59/1/0x4000000000000000 softirq=5468/5472 fqs=5 
	(detected by 1, t=10502 jiffies, g=4825, q=140)
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 3644 Comm: kworker/0:1 Not tainted 5.17.0-rc6-next-20220303-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_power_efficient gc_worker
RIP: 0010:hlock_class kernel/locking/lockdep.c:240 [inline]
RIP: 0010:__lock_acquire+0x1460/0x56c0 kernel/locking/lockdep.c:5056
Code: 0f b7 db be 08 00 00 00 48 89 d8 48 c1 f8 06 48 8d 3c c5 80 59 01 90 e8 fe 54 67 00 48 0f a3 1d f6 ed a3 0e 0f 83 2c 06 00 00 <48> 8d 1c 5b 48 c1 e3 06 48 81 c3 a0 5d 01 90 48 8d 7b 40 48 b8 00
RSP: 0018:ffffc90000007be0 EFLAGS: 00000047
RAX: 0000000000000001 RBX: 0000000000000029 RCX: ffffffff815d6b82
RDX: fffffbfff2002b31 RSI: 0000000000000008 RDI: ffffffff90015980
RBP: ffff88801bc4a842 R08: 0000000000000000 R09: ffffffff90015987
R10: fffffbfff2002b30 R11: 0000000000000001 R12: ffff88801bc4a820
R13: ffff88801bc49d40 R14: 0000000000000001 R15: 3d0b7b89c89fff91
FS:  0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000600 CR3: 0000000076bbe000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>
 lock_acquire kernel/locking/lockdep.c:5672 [inline]
 lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5637
 __raw_spin_lock_irq include/linux/spinlock_api_smp.h:119 [inline]
 _raw_spin_lock_irq+0x32/0x50 kernel/locking/spinlock.c:170
 __run_hrtimer kernel/time/hrtimer.c:1689 [inline]
 __hrtimer_run_queues+0x243/0xe50 kernel/time/hrtimer.c:1749
 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
 __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:27 [inline]
RIP: 0010:check_kcov_mode kernel/kcov.c:166 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x60 kernel/kcov.c:200
Code: 48 89 ef 5d e9 51 f3 4a 00 5d be 03 00 00 00 e9 96 96 74 02 66 0f 1f 44 00 00 48 8b be b0 01 00 00 e8 b4 ff ff ff 31 c0 c3 90 <65> 8b 05 29 49 89 7e 89 c1 48 8b 34 24 81 e1 00 01 00 00 65 48 8b
RSP: 0018:ffffc90003ccfc30 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000200 RCX: 0000000000000000
RDX: ffff88801bc49d40 RSI: ffffffff87821df7 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff90015a17
R10: ffffffff87821ded R11: 0000000000000001 R12: dffffc0000000000
R13: 0000000000000000 R14: 0000000000040000 R15: 0000000000040000
 __seqprop_spinlock_sequence include/linux/seqlock.h:277 [inline]
 nf_conntrack_get_ht include/net/netfilter/nf_conntrack.h:331 [inline]
 gc_worker+0x24d/0x12b0 net/netfilter/nf_conntrack_core.c:1441
 process_one_work+0x996/0x1610 kernel/workqueue.c:2289
 worker_thread+0x665/0x1080 kernel/workqueue.c:2436
 kthread+0x2e9/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
INFO: NMI handler (nmi_cpu_backtrace_handler) took too long to run: 1.570 msecs
rcu: rcu_preempt kthread starved for 10492 jiffies! g4825 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:28752 pid:   16 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5043 [inline]
 __schedule+0xa94/0x4910 kernel/sched/core.c:6352
 schedule+0xd2/0x1f0 kernel/sched/core.c:6424
 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
 rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1999
 rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2172
 kthread+0x2e9/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
rcu: Stack dump where RCU GP kthread last ran:
NMI backtrace for cpu 1
CPU: 1 PID: 45 Comm: kworker/u4:2 Not tainted 5.17.0-rc6-next-20220303-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_unbound toggle_allocation_gate
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111
 nmi_trigger_cpumask_backtrace+0x1e6/0x230 lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_check_gp_kthread_starvation.cold+0x1fb/0x200 kernel/rcu/tree_stall.h:516
 print_other_cpu_stall kernel/rcu/tree_stall.h:621 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:767 [inline]
 rcu_pending kernel/rcu/tree.c:3960 [inline]
 rcu_sched_clock_irq+0x21ae/0x22a0 kernel/rcu/tree.c:2660
 update_process_times+0x16d/0x200 kernel/time/timer.c:1785
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:243
 tick_sched_timer+0xee/0x120 kernel/time/tick-sched.c:1473
 __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
 __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
 hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
 __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
 sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
 </IRQ>
 <TASK>
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:check_kcov_mode+0xf/0x40 kernel/kcov.c:166
Code: 7c 24 08 e8 b3 9b 4b 00 e9 61 fd ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc 65 8b 05 79 4d 89 7e 89 c2 81 e2 00 01 00 00 <a9> 00 01 ff 00 74 10 31 c0 85 d2 74 15 8b 96 ac 15 00 00 85 d2 74
RSP: 0018:ffffc90000b679c8 EFLAGS: 00000246
RAX: 0000000000000001 RBX: ffff8880b9c42380 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff888016df8000 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff816d99aa R11: 0000000000000000 R12: ffffed1017388471
R13: 0000000000000000 R14: ffff8880b9c42388 R15: 0000000000000001
 write_comp_data kernel/kcov.c:221 [inline]
 __sanitizer_cov_trace_const_cmp4+0x1c/0x70 kernel/kcov.c:287
 csd_lock_wait kernel/smp.c:443 [inline]
 smp_call_function_many_cond+0x50a/0xc90 kernel/smp.c:972
 on_each_cpu_cond_mask+0x56/0xa0 kernel/smp.c:1138
 on_each_cpu include/linux/smp.h:71 [inline]
 text_poke_sync arch/x86/kernel/alternative.c:1146 [inline]
 text_poke_bp_batch+0x3e9/0x6b0 arch/x86/kernel/alternative.c:1387
 text_poke_flush arch/x86/kernel/alternative.c:1504 [inline]
 text_poke_flush arch/x86/kernel/alternative.c:1501 [inline]
 text_poke_finish+0x16/0x30 arch/x86/kernel/alternative.c:1511
 arch_jump_label_transform_apply+0x13/0x20 arch/x86/kernel/jump_label.c:146
 jump_label_update+0x32f/0x410 kernel/jump_label.c:830
 static_key_enable_cpuslocked+0x1b1/0x260 kernel/jump_label.c:177
 static_key_enable+0x16/0x20 kernel/jump_label.c:190
 toggle_allocation_gate mm/kfence/core.c:735 [inline]
 toggle_allocation_gate+0x100/0x390 mm/kfence/core.c:727
 process_one_work+0x996/0x1610 kernel/workqueue.c:2289
 worker_thread+0x665/0x1080 kernel/workqueue.c:2436
 kthread+0x2e9/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
----------------
Code disassembly (best guess):
   0:	0f b7 db             	movzwl %bx,%ebx
   3:	be 08 00 00 00       	mov    $0x8,%esi
   8:	48 89 d8             	mov    %rbx,%rax
   b:	48 c1 f8 06          	sar    $0x6,%rax
   f:	48 8d 3c c5 80 59 01 	lea    -0x6ffea680(,%rax,8),%rdi
  16:	90
  17:	e8 fe 54 67 00       	callq  0x67551a
  1c:	48 0f a3 1d f6 ed a3 	bt     %rbx,0xea3edf6(%rip)        # 0xea3ee1a
  23:	0e
  24:	0f 83 2c 06 00 00    	jae    0x656
* 2a:	48 8d 1c 5b          	lea    (%rbx,%rbx,2),%rbx <-- trapping instruction
  2e:	48 c1 e3 06          	shl    $0x6,%rbx
  32:	48 81 c3 a0 5d 01 90 	add    $0xffffffff90015da0,%rbx
  39:	48 8d 7b 40          	lea    0x40(%rbx),%rdi
  3d:	48                   	rex.W
  3e:	b8                   	.byte 0xb8

Crashes (4):
Manager Time Kernel Commit Syzkaller Config Log Report Syz repro C repro VM info Title
ci-upstream-linux-next-kasan-gce-root 2022/03/16 11:54 linux-next 91265a6da44d 9e8eaa75 .config log report syz C INFO: rcu detected stall in gc_worker
ci-upstream-net-kasan-gce 2022/09/15 00:49 net-next c9ae520ac3fa b884348d .config log report info INFO: rcu detected stall in gc_worker
ci-upstream-net-kasan-gce 2022/08/28 11:33 net-next 53a406803ca5 07177916 .config log report info INFO: rcu detected stall in gc_worker
ci-upstream-net-kasan-gce 2022/08/15 15:07 net-next 7ebfc85e2cd7 8dfcaa3d .config log report info INFO: rcu detected stall in gc_worker
* Struck through repros no longer work on HEAD.