bisecting cause commit starting from e30cd812dffadc58241ae378e48728e6a161becd building syzkaller on 70b76c1d627711cc3ef109af16d6cb7429a61fe3 testing commit e30cd812dffadc58241ae378e48728e6a161becd compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 9aec5d865b457fdfaa07fedf5aedd46854e88c7b01282ae1b683c58c84aca2ca all runs: crashed: possible deadlock in mptcp_close testing release v5.14 testing commit 7d2a07b769330c34b4deabeed939325c77a7ec2f compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 8c219514f33cf5f7e995cffb6643e798eb9f0e406da2dbcbb7423680d5a3f503 all runs: OK # git bisect start e30cd812dffadc58241ae378e48728e6a161becd 7d2a07b769330c34b4deabeed939325c77a7ec2f Bisecting: 5929 revisions left to test after this (roughly 13 steps) [1b4f3dfb4792f03b139edf10124fcbeb44e608e6] Merge tag 'usb-serial-5.15-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next testing commit 1b4f3dfb4792f03b139edf10124fcbeb44e608e6 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 2a21ff9c3775a869ed875a487dbb63bb481a58752b4a2427f39fc52291f831d7 all runs: OK # git bisect good 1b4f3dfb4792f03b139edf10124fcbeb44e608e6 Bisecting: 2995 revisions left to test after this (roughly 12 steps) [83ec91697412ae64d25dcca74597ed03029aa00d] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid testing commit 83ec91697412ae64d25dcca74597ed03029aa00d compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 48dcc7d183c60f3eb9bb61e0a3d5af9e9c73b1a8128f5deef31f8ac76d49f586 run #0: boot failed: possible deadlock in blktrans_open run #1: OK run #2: OK run #3: OK run #4: OK run #5: OK run #6: OK run #7: OK run #8: OK run #9: OK # git bisect good 83ec91697412ae64d25dcca74597ed03029aa00d Bisecting: 1538 revisions left to test after this (roughly 11 steps) [a2b28235335fee2586b4bd16448fb59ed6c80eef] Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging testing commit a2b28235335fee2586b4bd16448fb59ed6c80eef compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 3fc844170f857db9edb86cd2cf9e7a12f2b440ef3c756b5c36007b0ca2866cca run #0: basic kernel testing failed: KFENCE: use-after-free in kvm_fastop_exception run #1: OK run #2: OK run #3: OK run #4: OK run #5: OK run #6: OK run #7: OK run #8: OK run #9: OK # git bisect good a2b28235335fee2586b4bd16448fb59ed6c80eef Bisecting: 799 revisions left to test after this (roughly 10 steps) [671028728083e856e9919221b109e3b2cd2ccc49] parisc: Implement __get/put_kernel_nofault() testing commit 671028728083e856e9919221b109e3b2cd2ccc49 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 9aa39f85a65e4b869af921c2e7b52656eb27d29b3252e96bc0150d257f5e8aad run #0: basic kernel testing failed: KFENCE: use-after-free in kvm_fastop_exception run #1: basic kernel testing failed: failed to copy test binary to VM: timedout ["scp" "-P" "22" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "BatchMode=yes" "-o" "IdentitiesOnly=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "/syzkaller/jobs/linux/gopath/src/github.com/google/syzkaller/bin/linux_amd64/syz-fuzzer" "root@10.128.10.45:./syz-fuzzer"] Warning: Permanently added '10.128.10.45' (ECDSA) to the list of known hosts. run #2: OK run #3: OK run #4: OK run #5: OK run #6: OK run #7: OK run #8: OK run #9: OK # git bisect good 671028728083e856e9919221b109e3b2cd2ccc49 Bisecting: 394 revisions left to test after this (roughly 9 steps) [765092e4cdaa8439b969952ec4e6de3b84241f90] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input testing commit 765092e4cdaa8439b969952ec4e6de3b84241f90 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 1aa9f73e7ae8833acaef6cfb4f193b49f38161920d5be86bab45d190bbaa2a43 all runs: OK # git bisect good 765092e4cdaa8439b969952ec4e6de3b84241f90 Bisecting: 199 revisions left to test after this (roughly 8 steps) [7bf3142625c193db2dfbd7df2176b7cd910d9e4f] Merge tag 'timers_urgent_for_v5.15_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip testing commit 7bf3142625c193db2dfbd7df2176b7cd910d9e4f compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: cb68882ce5f2fa5a810f50d0c4867be4f9306285692cbad080562c6d392f71bc all runs: OK # git bisect good 7bf3142625c193db2dfbd7df2176b7cd910d9e4f Bisecting: 99 revisions left to test after this (roughly 7 steps) [b60cee5bae733f49ba33840804c159a8e474cfda] cpufreq: vexpress: Drop unused variable testing commit b60cee5bae733f49ba33840804c159a8e474cfda compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 8cf8bea44d24572580c7ed4dcec5bd00a2b6c79985728c1803cd221a319dc90f all runs: OK # git bisect good b60cee5bae733f49ba33840804c159a8e474cfda Bisecting: 49 revisions left to test after this (roughly 6 steps) [427900d27d86b820c559037a984bd403f910860f] net: hns3: fix the timing issue of VF clearing interrupt sources testing commit 427900d27d86b820c559037a984bd403f910860f compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: a850b758cddd9cbbf39d1dec1638f57c282b7c38c6979aee4dfdffa3060372e1 all runs: OK # git bisect good 427900d27d86b820c559037a984bd403f910860f Bisecting: 24 revisions left to test after this (roughly 5 steps) [fc0c0548c1a2e676d3a928aaed70f2d4d254e395] Merge tag 'net-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net testing commit fc0c0548c1a2e676d3a928aaed70f2d4d254e395 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 0b81c4944af2f18fffaa30ca70d6d9a7cf69215ea91169041ed339a9ef1bcc15 all runs: OK # git bisect good fc0c0548c1a2e676d3a928aaed70f2d4d254e395 Bisecting: 12 revisions left to test after this (roughly 4 steps) [7237a494decfa17d0b9d0076e6cee3235719de90] enetc: Fix illegal access when reading affinity_hint testing commit 7237a494decfa17d0b9d0076e6cee3235719de90 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 29d32a8b00f1f573ac173867208d00307c16de7d93a18d4ee019c8dad3eff49a all runs: OK # git bisect good 7237a494decfa17d0b9d0076e6cee3235719de90 Bisecting: 5 revisions left to test after this (roughly 3 steps) [d614489f6bc8de28efb347ded5360896b87c9496] Merge branch 'ocelot-phylink-fixes' testing commit d614489f6bc8de28efb347ded5360896b87c9496 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 63e615a17e87e6db3baa5eeac22e2b8250b6b37e6eb733897d96e0d2c164d7f9 all runs: crashed: possible deadlock in mptcp_close # git bisect bad d614489f6bc8de28efb347ded5360896b87c9496 Bisecting: 3 revisions left to test after this (roughly 2 steps) [48e6d083b3aa006052db687fb26eeceef1d325b6] docs: net: dsa: sja1105: fix reference to sja1105.txt testing commit 48e6d083b3aa006052db687fb26eeceef1d325b6 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 29d32a8b00f1f573ac173867208d00307c16de7d93a18d4ee019c8dad3eff49a all runs: OK # git bisect good 48e6d083b3aa006052db687fb26eeceef1d325b6 Bisecting: 1 revision left to test after this (roughly 1 step) [163957c43d96c2409d9d9d2e94823f7300f6e52c] net: mscc: ocelot: remove buggy and useless write to ANA_PFC_PFC_CFG testing commit 163957c43d96c2409d9d9d2e94823f7300f6e52c compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 63e615a17e87e6db3baa5eeac22e2b8250b6b37e6eb733897d96e0d2c164d7f9 all runs: crashed: possible deadlock in mptcp_close # git bisect bad 163957c43d96c2409d9d9d2e94823f7300f6e52c Bisecting: 0 revisions left to test after this (roughly 0 steps) [2dcb96bacce36021c2f3eaae0cef607b5bb71ede] net: core: Correct the sock::sk_lock.owned lockdep annotations testing commit 2dcb96bacce36021c2f3eaae0cef607b5bb71ede compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.2 kernel signature: 63e615a17e87e6db3baa5eeac22e2b8250b6b37e6eb733897d96e0d2c164d7f9 all runs: crashed: possible deadlock in mptcp_close # git bisect bad 2dcb96bacce36021c2f3eaae0cef607b5bb71ede 2dcb96bacce36021c2f3eaae0cef607b5bb71ede is the first bad commit commit 2dcb96bacce36021c2f3eaae0cef607b5bb71ede Author: Thomas Gleixner Date: Sat Sep 18 14:42:35 2021 +0200 net: core: Correct the sock::sk_lock.owned lockdep annotations lock_sock_fast() and lock_sock_nested() contain lockdep annotations for the sock::sk_lock.owned 'mutex'. sock::sk_lock.owned is not a regular mutex. It is just lockdep wise equivalent. In fact it's an open coded trivial mutex implementation with some interesting features. sock::sk_lock.slock is a regular spinlock protecting the 'mutex' representation sock::sk_lock.owned which is a plain boolean. If 'owned' is true, then some other task holds the 'mutex', otherwise it is uncontended. As this locking construct is obviously endangered by lock ordering issues as any other locking primitive it got lockdep annotated via a dedicated dependency map sock::sk_lock.dep_map which has to be updated at the lock and unlock sites. lock_sock_nested() is a straight forward 'mutex' lock operation: might_sleep(); spin_lock_bh(sock::sk_lock.slock) while (!try_lock(sock::sk_lock.owned)) { spin_unlock_bh(sock::sk_lock.slock); wait_for_release(); spin_lock_bh(sock::sk_lock.slock); } The lockdep annotation for sock::sk_lock.owned is for unknown reasons _after_ the lock has been acquired, i.e. after the code block above and after releasing sock::sk_lock.slock, but inside the bottom halves disabled region: spin_unlock(sock::sk_lock.slock); mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_); local_bh_enable(); The placement after the unlock is obvious because otherwise the mutex_acquire() would nest into the spin lock held region. But that's from the lockdep perspective still the wrong place: 1) The mutex_acquire() is issued _after_ the successful acquisition which is pointless because in a dead lock scenario this point is never reached which means that if the deadlock is the first instance of exposing the wrong lock order lockdep does not have a chance to detect it. 2) It only works because lockdep is rather lax on the context from which the mutex_acquire() is issued. Acquiring a mutex inside a bottom halves and therefore non-preemptible region is obviously invalid, except for a trylock which is clearly not the case here. This 'works' stops working on RT enabled kernels where the bottom halves serialization is done via a local lock, which exposes this misplacement because the 'mutex' and the local lock nest the wrong way around and lockdep complains rightfully about a lock inversion. The placement is wrong since the initial commit a5b5bb9a053a ("[PATCH] lockdep: annotate sk_locks") which introduced this. Fix it by moving the mutex_acquire() in front of the actual lock acquisition, which is what the regular mutex_lock() operation does as well. lock_sock_fast() is not that straight forward. It looks at the first glance like a convoluted trylock operation: spin_lock_bh(sock::sk_lock.slock) if (!sock::sk_lock.owned) return false; while (!try_lock(sock::sk_lock.owned)) { spin_unlock_bh(sock::sk_lock.slock); wait_for_release(); spin_lock_bh(sock::sk_lock.slock); } spin_unlock(sock::sk_lock.slock); mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_); local_bh_enable(); return true; But that's not the case: lock_sock_fast() is an interesting optimization for short critical sections which can run with bottom halves disabled and sock::sk_lock.slock held. This allows to shortcut the 'mutex' operation in the non contended case by preventing other lockers to acquire sock::sk_lock.owned because they are blocked on sock::sk_lock.slock, which in turn avoids the overhead of doing the heavy processing in release_sock() including waking up wait queue waiters. In the contended case, i.e. when sock::sk_lock.owned == true the behavior is the same as lock_sock_nested(). Semantically this shortcut means, that the task acquired the 'mutex' even if it does not touch the sock::sk_lock.owned field in the non-contended case. Not telling lockdep about this shortcut acquisition is hiding potential lock ordering violations in the fast path. As a consequence the same reasoning as for the above lock_sock_nested() case vs. the placement of the lockdep annotation applies. The current placement of the lockdep annotation was just copied from the original lock_sock(), now renamed to lock_sock_nested(), implementation. Fix this by moving the mutex_acquire() in front of the actual lock acquisition and adding the corresponding mutex_release() into unlock_sock_fast(). Also document the fast path return case with a comment. Reported-by: Sebastian Siewior Signed-off-by: Thomas Gleixner Cc: netdev@vger.kernel.org Cc: "David S. Miller" Cc: Jakub Kicinski Cc: Eric Dumazet Signed-off-by: David S. Miller include/net/sock.h | 1 + net/core/sock.c | 37 +++++++++++++++++++++++-------------- 2 files changed, 24 insertions(+), 14 deletions(-) culprit signature: 63e615a17e87e6db3baa5eeac22e2b8250b6b37e6eb733897d96e0d2c164d7f9 parent signature: 29d32a8b00f1f573ac173867208d00307c16de7d93a18d4ee019c8dad3eff49a revisions tested: 16, total time: 4h26m9.374119677s (build: 1h52m46.952038379s, test: 2h31m56.478886593s) first bad commit: 2dcb96bacce36021c2f3eaae0cef607b5bb71ede net: core: Correct the sock::sk_lock.owned lockdep annotations recipients (to): ["davem@davemloft.net" "davem@davemloft.net" "kuba@kernel.org" "netdev@vger.kernel.org" "tglx@linutronix.de"] recipients (cc): ["edumazet@google.com" "fw@strlen.de" "linux-kernel@vger.kernel.org" "pabeni@redhat.com"] crash: possible deadlock in mptcp_close MPTCP: kernel_bind error, err=-98 ============================================ WARNING: possible recursive locking detected 5.15.0-rc1-syzkaller #0 Not tainted -------------------------------------------- syz-executor.2/9164 is trying to acquire lock: ffff888071863be0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x211/0x670 net/mptcp/protocol.c:2738 but task is already holding lock: ffff88807a998c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline] ffff88807a998c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x1b/0x670 net/mptcp/protocol.c:2720 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(k-sk_lock-AF_INET); lock(k-sk_lock-AF_INET); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by syz-executor.2/9164: #0: ffffffff8c2b6e90 (cb_lock){++++}-{3:3}, at: genl_rcv+0x10/0x30 net/netlink/genetlink.c:802 #1: ffffffff8c2b6f48 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline] #1: ffffffff8c2b6f48 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x315/0x4a0 net/netlink/genetlink.c:790 #2: ffff88807a998c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline] #2: ffff88807a998c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x1b/0x670 net/mptcp/protocol.c:2720 stack backtrace: CPU: 0 PID: 9164 Comm: syz-executor.2 Not tainted 5.15.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x57/0x7d lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2944 [inline] check_deadlock kernel/locking/lockdep.c:2987 [inline] validate_chain kernel/locking/lockdep.c:3776 [inline] __lock_acquire.cold+0x229/0x3ab kernel/locking/lockdep.c:5015 lock_acquire kernel/locking/lockdep.c:5625 [inline] lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590 lock_sock_fast+0x2d/0xe0 net/core/sock.c:3229 mptcp_close+0x211/0x670 net/mptcp/protocol.c:2738 inet_release+0xef/0x210 net/ipv4/af_inet.c:431 __sock_release net/socket.c:649 [inline] sock_release+0x7d/0x190 net/socket.c:677 mptcp_pm_nl_create_listen_socket+0x1f7/0x270 net/mptcp/pm_netlink.c:900 mptcp_nl_cmd_add_addr+0x2e2/0x790 net/mptcp/pm_netlink.c:1170 genl_family_rcv_msg_doit+0x1e4/0x2f0 net/netlink/genetlink.c:731 genl_family_rcv_msg net/netlink/genetlink.c:775 [inline] genl_rcv_msg+0x27a/0x4a0 net/netlink/genetlink.c:792 netlink_rcv_skb+0x118/0x370 net/netlink/af_netlink.c:2504 genl_rcv+0x1f/0x30 net/netlink/genetlink.c:803 netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline] netlink_unicast+0x42e/0x700 net/netlink/af_netlink.c:1340 netlink_sendmsg+0x704/0xbf0 net/netlink/af_netlink.c:1929 sock_sendmsg_nosec net/socket.c:704 [inline] sock_sendmsg+0xab/0xe0 net/socket.c:724 sock_no_sendpage+0xf5/0x140 net/core/sock.c:2980 kernel_sendpage.part.0+0x11e/0x240 net/socket.c:3504 kernel_sendpage net/socket.c:3501 [inline] sock_sendpage+0xc7/0x1a0 net/socket.c:1003 pipe_to_sendpage+0x245/0x410 fs/splice.c:364 splice_from_pipe_feed fs/splice.c:418 [inline] __splice_from_pipe+0x362/0x810 fs/splice.c:562 splice_from_pipe fs/splice.c:597 [inline] generic_splice_sendpage+0xba/0x120 fs/splice.c:746 do_splice_from fs/splice.c:767 [inline] direct_splice_actor+0xfb/0x1c0 fs/splice.c:936 splice_direct_to_actor+0x2dd/0x7c0 fs/splice.c:891 do_splice_direct+0x154/0x260 fs/splice.c:979 do_sendfile+0x91e/0x1120 fs/read_write.c:1249 __do_sys_sendfile64 fs/read_write.c:1314 [inline] __se_sys_sendfile64 fs/read_write.c:1300 [inline] __x64_sys_sendfile64+0x186/0x1d0 fs/read_write.c:1300 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f8fd3c23739 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f8fd339a188 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 00007f8fd3d27f80 RCX: 00007f8fd3c23739 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005 RBP: 00007f8fd3c7dcc4 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000100000002 R11: 0000000000000246 R12: 00007f8fd3d27f80 R13: 00007ffda53d2e6f R14: 00007f8fd339a300 R15: 0000000000022000