syzbot


possible deadlock in worker_thread

Status: fixed on 2023/02/24 14:17
Subsystems: rdma
[Documentation on labels]
Reported-by: syzbot+831661966588c802aae9@syzkaller.appspotmail.com
Fix commit: 081bdc9fe05b RDMA/ib_srp: Fix a deadlock bf23747ee053 loop: revert "make autoclear operation asynchronous"
First crash: 810d, last: 810d
Discussions (12)
Title Replies (including bot) Last reply
[PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues 11 (11) 2022/05/16 01:56
[PATCH] workqueue: Use private WQ for schedule_on_each_cpu() API 9 (9) 2022/03/17 22:25
[PATCH 4.19 00/34] 4.19.232-rc1 review 49 (49) 2022/03/17 10:24
[PATCH 4.9 00/29] 4.9.304-rc1 review 36 (36) 2022/03/02 21:01
[PATCH 5.16 000/164] 5.16.12-rc1 review 177 (177) 2022/03/02 07:05
[PATCH 5.15 000/139] 5.15.26-rc1 review 150 (150) 2022/03/02 07:04
[PATCH 5.10 00/80] 5.10.103-rc1 review 89 (89) 2022/03/02 07:04
[PATCH 5.4 00/53] 5.4.182-rc1 review 60 (60) 2022/03/02 07:04
[PATCH 4.14 00/31] 4.14.269-rc1 review 35 (35) 2022/03/01 19:35
[PATCH v2] workqueue: Warn flush attempt using system-wide workqueues 5 (5) 2022/02/23 22:06
[syzbot] possible deadlock in worker_thread 18 (19) 2022/02/22 18:30
[PATCH for 5.17] loop: revert "make autoclear operation asynchronous" 2 (2) 2022/02/11 12:52

Sample crash report:
======================================================
WARNING: possible circular locking dependency detected
5.17.0-rc2-syzkaller-00398-gd8ad2ce873ab #0 Not tainted
------------------------------------------------------
kworker/0:7/17139 is trying to acquire lock:
ffff888077a89938 ((wq_completion)loop1){+.+.}-{0:0}, at: flush_workqueue+0xe1/0x13a0 kernel/workqueue.c:2824

but task is already holding lock:
ffffc9000fa07db8 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}, at: process_one_work+0x8c4/0x1650 kernel/workqueue.c:2282

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #7 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}:
       process_one_work+0x91b/0x1650 kernel/workqueue.c:2283
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #6 ((wq_completion)events_long){+.+.}-{0:0}:
       flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
       srp_remove_one+0x1cf/0x320 drivers/infiniband/ulp/srp/ib_srp.c:4052
       remove_client_context+0xbe/0x110 drivers/infiniband/core/device.c:775
       disable_device+0x13b/0x270 drivers/infiniband/core/device.c:1281
       __ib_unregister_device+0x98/0x1a0 drivers/infiniband/core/device.c:1474
       ib_unregister_device_and_put+0x57/0x80 drivers/infiniband/core/device.c:1538
       nldev_dellink+0x1fb/0x300 drivers/infiniband/core/nldev.c:1747
       rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
       rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
       rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #5 (&device->unregistration_lock){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:600 [inline]
       __mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
       __ib_unregister_device+0x25/0x1a0 drivers/infiniband/core/device.c:1470
       ib_unregister_device_and_put+0x57/0x80 drivers/infiniband/core/device.c:1538
       nldev_dellink+0x1fb/0x300 drivers/infiniband/core/nldev.c:1747
       rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
       rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
       rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #4 (&rdma_nl_types[idx].sem){.+.+}-{3:3}:
       down_read+0x98/0x440 kernel/locking/rwsem.c:1461
       rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
       rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
       rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
       sock_sendmsg_nosec net/socket.c:705 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:725
       sock_no_sendpage+0xf6/0x140 net/core/sock.c:3091
       kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3492
       kernel_sendpage net/socket.c:3489 [inline]
       sock_sendpage+0xe5/0x140 net/socket.c:1007
       pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
       splice_from_pipe_feed fs/splice.c:418 [inline]
       __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
       splice_from_pipe fs/splice.c:597 [inline]
       generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
       do_splice_from fs/splice.c:767 [inline]
       do_splice+0xb7e/0x1960 fs/splice.c:1079
       __do_splice+0x134/0x250 fs/splice.c:1144
       __do_sys_splice fs/splice.c:1350 [inline]
       __se_sys_splice fs/splice.c:1332 [inline]
       __x64_sys_splice+0x198/0x250 fs/splice.c:1332
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #3 (&pipe->mutex/1){+.+.}-{3:3}:
       __mutex_lock_common kernel/locking/mutex.c:600 [inline]
       __mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
       pipe_lock_nested fs/pipe.c:82 [inline]
       pipe_lock+0x5a/0x70 fs/pipe.c:90
       iter_file_splice_write+0x15a/0xc10 fs/splice.c:635
       do_splice_from fs/splice.c:767 [inline]
       do_splice+0xb7e/0x1960 fs/splice.c:1079
       __do_splice+0x134/0x250 fs/splice.c:1144
       __do_sys_splice fs/splice.c:1350 [inline]
       __se_sys_splice fs/splice.c:1332 [inline]
       __x64_sys_splice+0x198/0x250 fs/splice.c:1332
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae

-> #2 (sb_writers#3){.+.+}-{0:0}:
       percpu_down_read include/linux/percpu-rwsem.h:51 [inline]
       __sb_start_write include/linux/fs.h:1722 [inline]
       sb_start_write include/linux/fs.h:1792 [inline]
       file_start_write include/linux/fs.h:2937 [inline]
       lo_write_bvec drivers/block/loop.c:242 [inline]
       lo_write_simple drivers/block/loop.c:265 [inline]
       do_req_filebacked drivers/block/loop.c:494 [inline]
       loop_handle_cmd drivers/block/loop.c:1853 [inline]
       loop_process_work+0x1499/0x1db0 drivers/block/loop.c:1893
       process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #1 ((work_completion)(&worker->work)){+.+.}-{0:0}:
       process_one_work+0x91b/0x1650 kernel/workqueue.c:2283
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

-> #0 ((wq_completion)loop1){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3063 [inline]
       check_prevs_add kernel/locking/lockdep.c:3186 [inline]
       validate_chain kernel/locking/lockdep.c:3801 [inline]
       __lock_acquire+0x2a2c/0x5470 kernel/locking/lockdep.c:5027
       lock_acquire kernel/locking/lockdep.c:5639 [inline]
       lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
       flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
       drain_workqueue+0x1a5/0x3c0 kernel/workqueue.c:2992
       destroy_workqueue+0x71/0x800 kernel/workqueue.c:4429
       __loop_clr_fd+0x19b/0xd80 drivers/block/loop.c:1118
       loop_rundown_workfn+0x6f/0x150 drivers/block/loop.c:1185
       process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
       worker_thread+0x657/0x1110 kernel/workqueue.c:2454
       kthread+0x2e9/0x3a0 kernel/kthread.c:377
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

other info that might help us debug this:

Chain exists of:
  (wq_completion)loop1 --> (wq_completion)events_long --> (work_completion)(&lo->rundown_work)

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&lo->rundown_work));
                               lock((wq_completion)events_long);
                               lock((work_completion)(&lo->rundown_work));
  lock((wq_completion)loop1);

 *** DEADLOCK ***

2 locks held by kworker/0:7/17139:
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:631 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:658 [inline]
 #0: ffff888010c73538 ((wq_completion)events_long){+.+.}-{0:0}, at: process_one_work+0x890/0x1650 kernel/workqueue.c:2278
 #1: ffffc9000fa07db8 ((work_completion)(&lo->rundown_work)){+.+.}-{0:0}, at: process_one_work+0x8c4/0x1650 kernel/workqueue.c:2282

stack backtrace:
CPU: 0 PID: 17139 Comm: kworker/0:7 Not tainted 5.17.0-rc2-syzkaller-00398-gd8ad2ce873ab #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events_long loop_rundown_workfn
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
 check_noncircular+0x25f/0x2e0 kernel/locking/lockdep.c:2143
 check_prev_add kernel/locking/lockdep.c:3063 [inline]
 check_prevs_add kernel/locking/lockdep.c:3186 [inline]
 validate_chain kernel/locking/lockdep.c:3801 [inline]
 __lock_acquire+0x2a2c/0x5470 kernel/locking/lockdep.c:5027
 lock_acquire kernel/locking/lockdep.c:5639 [inline]
 lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
 flush_workqueue+0x110/0x13a0 kernel/workqueue.c:2827
 drain_workqueue+0x1a5/0x3c0 kernel/workqueue.c:2992
 destroy_workqueue+0x71/0x800 kernel/workqueue.c:4429
 __loop_clr_fd+0x19b/0xd80 drivers/block/loop.c:1118
 loop_rundown_workfn+0x6f/0x150 drivers/block/loop.c:1185
 process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
 worker_thread+0x657/0x1110 kernel/workqueue.c:2454
 kthread+0x2e9/0x3a0 kernel/kthread.c:377
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>
usb 4-1: new high-speed USB device number 63 using dummy_hcd
usb 4-1: New USB device found, idVendor=0af0, idProduct=d058, bcdDevice= 0.00
usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 4-1: Product: syz
usb 4-1: Manufacturer: syz
usb 4-1: SerialNumber: syz
usb 4-1: config 0 descriptor??
usb-storage 4-1:0.0: USB Mass Storage device detected
usb 1-1: new high-speed USB device number 103 using dummy_hcd
usb 1-1: New USB device found, idVendor=0af0, idProduct=d058, bcdDevice= 0.00
usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-1: Product: syz
usb 1-1: Manufacturer: syz
usb 1-1: SerialNumber: syz
usb 1-1: config 0 descriptor??
usb-storage 1-1:0.0: USB Mass Storage device detected
usb 4-1: USB disconnect, device number 65

Crashes (1):
Time Kernel Commit Syzkaller Config Log Report Syz repro C repro VM info Assets (help?) Manager Title
2022/02/06 19:21 upstream d8ad2ce873ab a7dab638 .config console log report info ci-upstream-kasan-gce-selinux-root possible deadlock in worker_thread
* Struck through repros no longer work on HEAD.