syzbot


possible deadlock in smc_release

Status: upstream: reported C repro on 2024/01/29 16:17
Bug presence: origin:upstream
Labels: missing-backport
Reported-by: syzbot+edf7d3f5c58c3780ff30@syzkaller.appspotmail.com
First crash: 147d, last: 12d
Bug presence (3)
Date | Name | Commit | Repro | Result
2024/06/12 | linux-6.1.y (ToT) | 88690811da69 | C | [report] possible deadlock in smc_release
2024/01/29 | upstream (ToT) | 41bccc98fb79 | C | [report] possible deadlock in smc_release
2024/06/11 | upstream (ToT) | 83a7eefedc9b | C | Didn't crash
Similar bugs (1)
Kernel Title Repro Cause bisect Fix bisect Count Last Reported Patched Status
upstream possible deadlock in smc_release net s390 C error 26 14d 143d 0/27 upstream: reported C repro on 2024/02/02 13:26
Fix bisection attempts (3)
Created Duration User Patch Repo Result
2024/05/14 06:47 1h03m bisect fix linux-6.1.y job log (0) log
2024/04/07 11:48 1h10m bisect fix linux-6.1.y job log (0) log
2024/03/08 02:31 2h05m bisect fix linux-6.1.y job log (0) log

Sample crash report:
======================================================
WARNING: possible circular locking dependency detected
6.1.75-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor242/3541 is trying to acquire lock:
ffff888075a01450 ((work_completion)(&new_smc->smc_listen_work)){+.+.}-{0:0}, at: __flush_work+0xe5/0xad0 kernel/workqueue.c:3072

but task is already holding lock:
ffff888075a00130 (sk_lock-AF_SMC/1){+.+.}-{0:0}, at: smc_release+0x22d/0x530

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (sk_lock-AF_SMC/1){+.+.}-{0:0}:
       lock_acquire+0x1f8/0x5a0 kernel/locking/lockdep.c:5662
       lock_sock_nested+0x44/0x100 net/core/sock.c:3483
       smc_listen_out+0x113/0x3d0 net/smc/af_smc.c:1882
       process_one_work+0x8a9/0x11d0 kernel/workqueue.c:2292
       worker_thread+0xa47/0x1200 kernel/workqueue.c:2439
       kthread+0x28d/0x320 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306

-> #0 ((work_completion)(&new_smc->smc_listen_work)){+.+.}-{0:0}:
       check_prev_add kernel/locking/lockdep.c:3090 [inline]
       check_prevs_add kernel/locking/lockdep.c:3209 [inline]
       validate_chain+0x1661/0x5950 kernel/locking/lockdep.c:3825
       __lock_acquire+0x125b/0x1f80 kernel/locking/lockdep.c:5049
       lock_acquire+0x1f8/0x5a0 kernel/locking/lockdep.c:5662
       __flush_work+0xfe/0xad0 kernel/workqueue.c:3072
       __cancel_work_timer+0x519/0x6a0 kernel/workqueue.c:3163
       smc_clcsock_release+0x5e/0xe0 net/smc/smc_close.c:29
       __smc_release+0x678/0x7f0 net/smc/af_smc.c:300
       smc_close_non_accepted+0xd4/0x1e0 net/smc/af_smc.c:1816
       smc_close_cleanup_listen net/smc/smc_close.c:45 [inline]
       smc_close_active+0xa75/0xe20 net/smc/smc_close.c:225
       __smc_release+0xa0/0x7f0 net/smc/af_smc.c:276
       smc_release+0x2d5/0x530 net/smc/af_smc.c:343
       __sock_release net/socket.c:654 [inline]
       sock_close+0xcd/0x230 net/socket.c:1400
       __fput+0x3b7/0x890 fs/file_table.c:320
       task_work_run+0x246/0x300 kernel/task_work.c:179
       exit_task_work include/linux/task_work.h:38 [inline]
       do_exit+0xa73/0x26a0 kernel/exit.c:869
       do_group_exit+0x202/0x2b0 kernel/exit.c:1019
       __do_sys_exit_group kernel/exit.c:1030 [inline]
       __se_sys_exit_group kernel/exit.c:1028 [inline]
       __x64_sys_exit_group+0x3b/0x40 kernel/exit.c:1028
       do_syscall_x64 arch/x86/entry/common.c:51 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:81
       entry_SYSCALL_64_after_hwframe+0x63/0xcd

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(sk_lock-AF_SMC/1);
                               lock((work_completion)(&new_smc->smc_listen_work));
                               lock(sk_lock-AF_SMC/1);
  lock((work_completion)(&new_smc->smc_listen_work));

 *** DEADLOCK ***

2 locks held by syz-executor242/3541:
 #0: ffff888076216810 (&sb->s_type->i_mutex_key#10){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:756 [inline]
 #0: ffff888076216810 (&sb->s_type->i_mutex_key#10){+.+.}-{3:3}, at: __sock_release net/socket.c:653 [inline]
 #0: ffff888076216810 (&sb->s_type->i_mutex_key#10){+.+.}-{3:3}, at: sock_close+0x98/0x230 net/socket.c:1400
 #1: ffff888075a00130 (sk_lock-AF_SMC/1){+.+.}-{0:0}, at: smc_release+0x22d/0x530

stack backtrace:
CPU: 0 PID: 3541 Comm: syz-executor242 Not tainted 6.1.75-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106
 check_noncircular+0x2fa/0x3b0 kernel/locking/lockdep.c:2170
 check_prev_add kernel/locking/lockdep.c:3090 [inline]
 check_prevs_add kernel/locking/lockdep.c:3209 [inline]
 validate_chain+0x1661/0x5950 kernel/locking/lockdep.c:3825
 __lock_acquire+0x125b/0x1f80 kernel/locking/lockdep.c:5049
 lock_acquire+0x1f8/0x5a0 kernel/locking/lockdep.c:5662
 __flush_work+0xfe/0xad0 kernel/workqueue.c:3072
 __cancel_work_timer+0x519/0x6a0 kernel/workqueue.c:3163
 smc_clcsock_release+0x5e/0xe0 net/smc/smc_close.c:29
 __smc_release+0x678/0x7f0 net/smc/af_smc.c:300
 smc_close_non_accepted+0xd4/0x1e0 net/smc/af_smc.c:1816
 smc_close_cleanup_listen net/smc/smc_close.c:45 [inline]
 smc_close_active+0xa75/0xe20 net/smc/smc_close.c:225
 __smc_release+0xa0/0x7f0 net/smc/af_smc.c:276
 smc_release+0x2d5/0x530 net/smc/af_smc.c:343
 __sock_release net/socket.c:654 [inline]
 sock_close+0xcd/0x230 net/socket.c:1400
 __fput+0x3b7/0x890 fs/file_table.c:320
 task_work_run+0x246/0x300 kernel/task_work.c:179
 exit_task_work include/linux/task_work.h:38 [inline]
 do_exit+0xa73/0x26a0 kernel/exit.c:869
 do_group_exit+0x202/0x2b0 kernel/exit.c:1019
 __do_sys_exit_group kernel/exit.c:1030 [inline]
 __se_sys_exit_group kernel/exit.c:1028 [inline]
 __x64_sys_exit_group+0x3b/0x40 kernel/exit.c:1028
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fe063f15cf9
Code: Unable to access opcode bytes at 0x7fe063f15ccf.
RSP: 002b:00007fffa8dfcc18 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe063f15cf9
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 00007fe063f902b0 R08: ffffffffffffffb8 R09: 00007fffa8dfce38
R10: 00007fffa8dfce38 R11: 0000000000000246 R12: 00007fe063f902b0
R13: 0000000000000000 R14: 00007fe063f90d00 R15: 00007fe063ee7a
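
The circular dependency in the report above can be illustrated with a toy lock-order graph. This is only a sketch of the general idea (record "B taken while A held" edges and refuse an edge that would close a cycle). It is not the kernel's lockdep implementation, and the `LockOrderGraph` class and its method names are hypothetical. The two lock class names are copied from the report. The model treats the work item's completion as a lock, as the report does.

```python
from collections import defaultdict


class LockOrderGraph:
    """Toy lock-order validator (hypothetical, not the kernel's lockdep)."""

    def __init__(self):
        # edges[a] contains b when b was acquired while a was held (a -> b).
        self.edges = defaultdict(set)

    def _path(self, src, dst, seen=None):
        """Return a list of lock classes leading from src to dst, or None."""
        if src == dst:
            return [src]
        seen = seen if seen is not None else set()
        seen.add(src)
        for nxt in sorted(self.edges[src]):
            if nxt not in seen:
                tail = self._path(nxt, dst, seen)
                if tail:
                    return [src] + tail
        return None

    def acquire(self, held, new):
        """Record that `new` is acquired while `held` is held.

        If a chain new -> ... -> held already exists, adding held -> new
        would close a cycle: return that chain and do not record the edge.
        Otherwise record the edge and return None.
        """
        cycle = self._path(new, held)
        if cycle is None:
            self.edges[held].add(new)
        return cycle


SK_LOCK = "sk_lock-AF_SMC/1"
LISTEN_WORK = "(work_completion)(&new_smc->smc_listen_work)"

graph = LockOrderGraph()

# Chain entry #1: the worker is running smc_listen_work when
# smc_listen_out() takes the socket lock via lock_sock_nested().
first = graph.acquire(held=LISTEN_WORK, new=SK_LOCK)

# Chain entry #0: smc_release() holds the socket lock and waits for the
# work item through smc_clcsock_release() -> __cancel_work_timer() ->
# __flush_work().
second = graph.acquire(held=SK_LOCK, new=LISTEN_WORK)

print(first)   # None: first ordering seen, nothing to conflict with
print(second)  # existing chain that the new edge would turn into a cycle
```

The second call returns the existing chain from the work item to `sk_lock-AF_SMC/1`, which matches the "Possible unsafe locking scenario" table: one side holds the socket lock and waits for the work item, while the work item waits for the socket lock.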

Crashes (4):
Time | Kernel | Commit | Syzkaller | Config | Log | Report | Syz repro | C repro | VM info | Assets | Manager | Title
2024/01/29 20:22 | linux-6.1.y | 883d1a956208 | 991a98f4 | .config | console log | report | syz | C | - | [disk image] [vmlinux] [kernel image] | ci2-linux-6-1-kasan | possible deadlock in smc_release
2024/01/29 20:17 | linux-6.1.y | 883d1a956208 | 991a98f4 | .config | console log | report | syz | C | - | [disk image] [vmlinux] [kernel image] | ci2-linux-6-1-kasan-arm64 | possible deadlock in smc_release
2024/01/29 18:59 | linux-6.1.y | 883d1a956208 | 991a98f4 | .config | console log | report | - | - | info | [disk image] [vmlinux] [kernel image] | ci2-linux-6-1-kasan | possible deadlock in smc_release
2024/01/29 16:16 | linux-6.1.y | 883d1a956208 | 991a98f4 | .config | console log | report | - | - | info | [disk image] [vmlinux] [kernel image] | ci2-linux-6-1-kasan-arm64 | possible deadlock in smc_release
* Struck through repros no longer work on HEAD.