syzbot

possible deadlock in refcount_dec_and_mutex_lock

Status: fixed on 2020/01/28 16:29
Reported-by: syzbot+b79c9afb5ab8b3173030@syzkaller.appspotmail.com
Fix commit: e83a26a49356 nbd: fix shutdown and recv work deadlock v2
First crash: 1845d, last: 1789d
Cause bisection: introduced by (bisect log):
commit 9f0f39c92e4f50189155dfb13bb5524372e40eba
Author: Mike Christie <mchristi@redhat.com>
Date: Sun Aug 4 19:10:06 2019 +0000

  nbd: fix max number of supported devs

Crash: possible deadlock in refcount_dec_and_mutex_lock (log)
Repro: C syz .config
  
Fix bisection: fixed by (bisect log):
commit e83a26a49356a3dbd4f54102abe17fc594643698
Author: Mike Christie <mchristi@redhat.com>
Date: Sun Dec 8 22:51:50 2019 +0000

  nbd: fix shutdown and recv work deadlock v2

Similar bugs (2)
Kernel     | Title                                                 | Repro | Cause bisect | Fix bisect | Count | Last  | Reported | Patched | Status
upstream   | possible deadlock in refcount_dec_and_mutex_lock nbd |       |              |            | 1     | 1888d | 1884d    | 0/28    | auto-closed as invalid on 2019/11/20 04:39
linux-4.14 | possible deadlock in refcount_dec_and_mutex_lock     | C     | done         |            | 10    | 1792d | 1864d    | 1/1     | fixed on 2020/01/25 22:00

Sample crash report:
block nbd0: shutting down sockets
block nbd0: Receive control failed (result -107)
block nbd0: shutting down sockets
block nbd0: Receive control failed (result -107)
======================================================
WARNING: possible circular locking dependency detected
4.19.89-syzkaller #0 Not tainted
------------------------------------------------------
kworker/u5:0/1204 is trying to acquire lock:
00000000bded6ee3 (&nbd->config_lock){+.+.}, at: refcount_dec_and_mutex_lock lib/refcount.c:311 [inline]
00000000bded6ee3 (&nbd->config_lock){+.+.}, at: refcount_dec_and_mutex_lock+0x52/0x80 lib/refcount.c:306

but task is already holding lock:
00000000d3fb2208 ((work_completion)(&args->work)){+.+.}, at: process_one_work+0x8b4/0x1750 kernel/workqueue.c:2128

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 ((work_completion)(&args->work)){+.+.}:
       process_one_work+0x90f/0x1750 kernel/workqueue.c:2129
       worker_thread+0x98/0xe40 kernel/workqueue.c:2296
       kthread+0x354/0x420 kernel/kthread.c:246
       ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415

-> #1 ((wq_completion)"knbd%d-recv"nbd->index){+.+.}:
       flush_workqueue+0x126/0x14b0 kernel/workqueue.c:2655
       drain_workqueue+0x1b4/0x470 kernel/workqueue.c:2820
       destroy_workqueue+0x21/0x6b0 kernel/workqueue.c:4158
       nbd_config_put+0x3cf/0x860 drivers/block/nbd.c:1161
       nbd_release+0xf6/0x140 drivers/block/nbd.c:1436
       __blkdev_put+0x4d1/0x810 fs/block_dev.c:1804
       blkdev_put+0x98/0x560 fs/block_dev.c:1866
       blkdev_close+0x8b/0xb0 fs/block_dev.c:1873
       __fput+0x2dd/0x8b0 fs/file_table.c:278
       ____fput+0x16/0x20 fs/file_table.c:309
       task_work_run+0x145/0x1c0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:193 [inline]
       exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:167
       prepare_exit_to_usermode arch/x86/entry/common.c:198 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:271 [inline]
       do_syscall_64+0x53d/0x620 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (&nbd->config_lock){+.+.}:
       lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3903
       __mutex_lock_common kernel/locking/mutex.c:925 [inline]
       __mutex_lock+0xf7/0x1300 kernel/locking/mutex.c:1072
       mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1087
       refcount_dec_and_mutex_lock lib/refcount.c:311 [inline]
       refcount_dec_and_mutex_lock+0x52/0x80 lib/refcount.c:306
       nbd_config_put+0x31/0x860 drivers/block/nbd.c:1139
       recv_work+0x19b/0x200 drivers/block/nbd.c:745
       process_one_work+0x989/0x1750 kernel/workqueue.c:2153
       worker_thread+0x98/0xe40 kernel/workqueue.c:2296
       kthread+0x354/0x420 kernel/kthread.c:246
       ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415

other info that might help us debug this:

Chain exists of:
  &nbd->config_lock --> (wq_completion)"knbd%d-recv"nbd->index --> (work_completion)(&args->work)

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock((work_completion)(&args->work));
                               lock((wq_completion)"knbd%d-recv"nbd->index);
                               lock((work_completion)(&args->work));
  lock(&nbd->config_lock);

 *** DEADLOCK ***
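
Read together, the two traces describe a lock-versus-flush inversion: nbd_config_put() destroys the "knbd%d-recv" workqueue while holding nbd->config_lock (trace #1), while recv_work(), which runs on that same workqueue, finishes by calling nbd_config_put(), which may take nbd->config_lock (trace #0). If the recv work item is the one dropping the last config reference while another task already holds config_lock and is draining the workqueue, each side waits on the other forever. The fragment below is a minimal sketch of the same pattern in module form; the demo_* names are hypothetical stand-ins for the nbd objects named in the traces, not nbd code. With lockdep enabled it draws the same circular-dependency warning even when the timing never actually deadlocks:

  /*
   * Minimal sketch of the inversion above (hypothetical demo_* names):
   * a work item takes a mutex that is also held around destruction of
   * the workqueue the item runs on.
   */
  #include <linux/module.h>
  #include <linux/mutex.h>
  #include <linux/workqueue.h>

  static DEFINE_MUTEX(demo_lock);            /* plays nbd->config_lock */
  static struct workqueue_struct *demo_wq;   /* plays "knbd%d-recv"    */
  static struct work_struct demo_work;       /* plays args->work       */

  static void demo_work_fn(struct work_struct *work)
  {
          mutex_lock(&demo_lock);   /* recv_work -> nbd_config_put path */
          mutex_unlock(&demo_lock);
  }

  static int __init demo_init(void)
  {
          demo_wq = alloc_workqueue("demo-recv", WQ_UNBOUND, 0);
          if (!demo_wq)
                  return -ENOMEM;
          INIT_WORK(&demo_work, demo_work_fn);
          queue_work(demo_wq, &demo_work);

          /*
           * nbd_config_put path: destroy the queue while holding the
           * mutex. destroy_workqueue() drains demo_wq, so if
           * demo_work_fn() is already blocked on demo_lock, this
           * waits on it forever.
           */
          mutex_lock(&demo_lock);
          destroy_workqueue(demo_wq);
          mutex_unlock(&demo_lock);
          return 0;
  }
  module_init(demo_init);
  MODULE_LICENSE("GPL");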

2 locks held by kworker/u5:0/1204:
 #0: 00000000b1fdfef6 ((wq_completion)"knbd%d-recv"nbd->index){+.+.}, at: __write_once_size include/linux/compiler.h:220 [inline]
 #0: 00000000b1fdfef6 ((wq_completion)"knbd%d-recv"nbd->index){+.+.}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: 00000000b1fdfef6 ((wq_completion)"knbd%d-recv"nbd->index){+.+.}, at: atomic64_set include/asm-generic/atomic-instrumented.h:40 [inline]
 #0: 00000000b1fdfef6 ((wq_completion)"knbd%d-recv"nbd->index){+.+.}, at: atomic_long_set include/asm-generic/atomic-long.h:59 [inline]
 #0: 00000000b1fdfef6 ((wq_completion)"knbd%d-recv"nbd->index){+.+.}, at: set_work_data kernel/workqueue.c:617 [inline]
 #0: 00000000b1fdfef6 ((wq_completion)"knbd%d-recv"nbd->index){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: 00000000b1fdfef6 ((wq_completion)"knbd%d-recv"nbd->index){+.+.}, at: process_one_work+0x87e/0x1750 kernel/workqueue.c:2124
 #1: 00000000d3fb2208 ((work_completion)(&args->work)){+.+.}, at: process_one_work+0x8b4/0x1750 kernel/workqueue.c:2128

stack backtrace:
CPU: 1 PID: 1204 Comm: kworker/u5:0 Not tainted 4.19.89-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: knbd0-recv recv_work
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x197/0x210 lib/dump_stack.c:118
 print_circular_bug.isra.0.cold+0x1cc/0x28f kernel/locking/lockdep.c:1221
 check_prev_add kernel/locking/lockdep.c:1861 [inline]
 check_prevs_add kernel/locking/lockdep.c:1974 [inline]
 validate_chain kernel/locking/lockdep.c:2415 [inline]
 __lock_acquire+0x2e19/0x49c0 kernel/locking/lockdep.c:3411
 lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:3903
 __mutex_lock_common kernel/locking/mutex.c:925 [inline]
 __mutex_lock+0xf7/0x1300 kernel/locking/mutex.c:1072
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1087
 refcount_dec_and_mutex_lock lib/refcount.c:311 [inline]
 refcount_dec_and_mutex_lock+0x52/0x80 lib/refcount.c:306
 nbd_config_put+0x31/0x860 drivers/block/nbd.c:1139
 recv_work+0x19b/0x200 drivers/block/nbd.c:745
 process_one_work+0x989/0x1750 kernel/workqueue.c:2153
 worker_thread+0x98/0xe40 kernel/workqueue.c:2296
 kthread+0x354/0x420 kernel/kthread.c:246
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415
block nbd0: shutting down sockets
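
For context on the entry point shown in both stacks: refcount_dec_and_mutex_lock() (lib/refcount.c in the trace) decrements a refcount and, only when the count would reach zero, acquires the supplied mutex and returns true with that mutex held, so whichever context drops the last reference performs the teardown under the lock. That is how recv_work(), an ordinary work item, can end up holding nbd->config_lock across destroy_workqueue() of its own queue. A condensed sketch of the put path, simplified from the drivers/block/nbd.c lines cited in the traces (not a verbatim copy):

  static void nbd_config_put(struct nbd_device *nbd)
  {
          /* Fast path: drop one of several references, no locking. */
          if (refcount_dec_and_mutex_lock(&nbd->config_refs,
                                          &nbd->config_lock)) {
                  /*
                   * Slow path: this was the last reference and the
                   * mutex is now held. Socket shutdown and config
                   * freeing happen here (omitted), then the recv
                   * workqueue is destroyed -- still under config_lock,
                   * which is the #1 edge in the dependency chain.
                   */
                  destroy_workqueue(nbd->recv_workq);  /* nbd.c:1161 */
                  mutex_unlock(&nbd->config_lock);
          }
  }

The fix commit referenced at the top of this report ("nbd: fix shutdown and recv work deadlock v2") removes that cycle in the shutdown path.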

Crashes (5):
Time             | Kernel       | Commit       | Syzkaller | Config  | Log         | Report | Syz repro | C repro | Manager
2019/12/14 16:11 | linux-4.19.y | 312017a460d5 | eef6e580  | .config | console log | report | syz       | C       | ci2-linux-4-19
2019/11/03 11:49 | linux-4.19.y | ef244c308885 | c9610487  | .config | console log | report | syz       | C       | ci2-linux-4-19
2019/12/29 08:46 | linux-4.19.y | 672481c2deff | af6b8ef8  | .config | console log | report |           |         | ci2-linux-4-19
2019/12/15 16:16 | linux-4.19.y | 312017a460d5 | eef6e580  | .config | console log | report |           |         | ci2-linux-4-19
2019/11/18 12:02 | linux-4.19.y | c555efaf1402 | d5696d51  | .config | console log | report |           |         | ci2-linux-4-19
* Struck through repros no longer work on HEAD.