bisecting cause commit starting from 9d1bc24b52fb8c5d859f9a47084bf1179470e04c building syzkaller on 429efa16d6ca7fd282a93c614ef97612f9c9bf62 testing commit 9d1bc24b52fb8c5d859f9a47084bf1179470e04c with gcc (GCC) 8.1.0 run #0: crashed: possible deadlock in xsk_notifier run #1: crashed: possible deadlock in rtnl_lock run #2: crashed: possible deadlock in xsk_notifier run #3: crashed: possible deadlock in xsk_notifier run #4: crashed: possible deadlock in rtnl_lock run #5: crashed: possible deadlock in xsk_notifier run #6: crashed: possible deadlock in rtnl_lock run #7: crashed: possible deadlock in rtnl_lock run #8: crashed: possible deadlock in xsk_notifier run #9: crashed: possible deadlock in rtnl_lock testing release v5.1 testing commit e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd with gcc (GCC) 8.1.0 all runs: OK # git bisect start 9d1bc24b52fb8c5d859f9a47084bf1179470e04c v5.1 Bisecting: 6873 revisions left to test after this (roughly 13 steps) [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm testing commit a2d635decbfa9c1e4ae15cb05b68b2559f7f827c with gcc (GCC) 8.1.0 all runs: OK # git bisect good a2d635decbfa9c1e4ae15cb05b68b2559f7f827c Bisecting: 3693 revisions left to test after this (roughly 12 steps) [22c58fd70ca48a29505922b1563826593b08cc00] Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc testing commit 22c58fd70ca48a29505922b1563826593b08cc00 with gcc (GCC) 8.1.0 all runs: OK # git bisect good 22c58fd70ca48a29505922b1563826593b08cc00 Bisecting: 1846 revisions left to test after this (roughly 11 steps) [7fbc78e3155a0c464bd832efc07fb3c2355fe9bd] Merge tag 'for-linus-20190524' of git://git.kernel.dk/linux-block testing commit 7fbc78e3155a0c464bd832efc07fb3c2355fe9bd with gcc (GCC) 8.1.0 all runs: OK # git bisect good 7fbc78e3155a0c464bd832efc07fb3c2355fe9bd Bisecting: 921 revisions left to test after this (roughly 10 steps) [9331b6740f86163908de69f4008e434fe0c27691] Merge tag 'spdx-5.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core testing commit 9331b6740f86163908de69f4008e434fe0c27691 with gcc (GCC) 8.1.0 all runs: OK # git bisect good 9331b6740f86163908de69f4008e434fe0c27691 Bisecting: 460 revisions left to test after this (roughly 9 steps) [fdbf6326912d578a31ac4ca0933c919eadf1d54c] net/af_iucv: remove GFP_DMA restriction for HiperTransport testing commit fdbf6326912d578a31ac4ca0933c919eadf1d54c with gcc (GCC) 8.1.0 all runs: OK # git bisect good fdbf6326912d578a31ac4ca0933c919eadf1d54c Bisecting: 230 revisions left to test after this (roughly 8 steps) [0728f6c3cab107f0aab2c8ded1292dd2cc41a228] Merge tag 'drm-fixes-2019-06-21' of git://anongit.freedesktop.org/drm/drm testing commit 0728f6c3cab107f0aab2c8ded1292dd2cc41a228 with gcc (GCC) 8.1.0 all runs: OK # git bisect good 0728f6c3cab107f0aab2c8ded1292dd2cc41a228 Bisecting: 114 revisions left to test after this (roughly 7 steps) [fe2da896fd9469317ff693fb08a86d9c435e101a] Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc testing commit fe2da896fd9469317ff693fb08a86d9c435e101a with gcc (GCC) 8.1.0 run #0: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" "-P" "22" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "BatchMode=yes" "-o" "IdentitiesOnly=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "-i" "/syzkaller/jobs/linux/workdir/image/key" "/tmp/syz-executor020537289" "root@10.128.15.200:./syz-executor020537289"]: exit status 1 ssh: connect to host 10.128.15.200 port 22: Connection timed out lost connection run #1: OK run #2: OK run #3: OK run #4: OK run #5: OK run #6: OK run #7: OK run #8: OK run #9: OK # git bisect good fe2da896fd9469317ff693fb08a86d9c435e101a Bisecting: 57 revisions left to test after this (roughly 6 steps) [3c91f25c2f72ba6001775a5932857c1d2131c531] bnx2x: Prevent ptp_task to be rescheduled indefinitely testing commit 3c91f25c2f72ba6001775a5932857c1d2131c531 with gcc (GCC) 8.1.0 all runs: OK # git bisect good 3c91f25c2f72ba6001775a5932857c1d2131c531 Bisecting: 28 revisions left to test after this (roughly 5 steps) [3d26eb8ad1e9b906433903ce05f775cf038e747f] net: bridge: don't cache ether dest pointer on input testing commit 3d26eb8ad1e9b906433903ce05f775cf038e747f with gcc (GCC) 8.1.0 all runs: OK # git bisect good 3d26eb8ad1e9b906433903ce05f775cf038e747f Bisecting: 14 revisions left to test after this (roughly 4 steps) [455302d1c9ae9318660aaeb9748a01ff414c9741] xdp: fix hang while unregistering device bound to xdp socket testing commit 455302d1c9ae9318660aaeb9748a01ff414c9741 with gcc (GCC) 8.1.0 run #0: crashed: possible deadlock in xsk_notifier run #1: crashed: possible deadlock in xsk_notifier run #2: crashed: possible deadlock in rtnl_lock run #3: crashed: possible deadlock in rtnl_lock run #4: crashed: possible deadlock in xsk_notifier run #5: crashed: possible deadlock in xsk_notifier run #6: crashed: possible deadlock in rtnl_lock run #7: crashed: possible deadlock in rtnl_lock run #8: crashed: possible deadlock in rtnl_lock run #9: crashed: possible deadlock in rtnl_lock # git bisect bad 455302d1c9ae9318660aaeb9748a01ff414c9741 Bisecting: 6 revisions left to test after this (roughly 3 steps) [75672dda27bd00109a84cd975c17949ad9c45663] bpf: fix BPF_ALU32 | BPF_ARSH on BE arches testing commit 75672dda27bd00109a84cd975c17949ad9c45663 with gcc (GCC) 8.1.0 all runs: OK # git bisect good 75672dda27bd00109a84cd975c17949ad9c45663 Bisecting: 3 revisions left to test after this (roughly 2 steps) [6fa632e719eec4d1b1ebf3ddc0b2d667997b057b] bpf, x32: Fix bug with ALU64 {LSH, RSH, ARSH} BPF_K shift by 0 testing commit 6fa632e719eec4d1b1ebf3ddc0b2d667997b057b with gcc (GCC) 8.1.0 all runs: OK # git bisect good 6fa632e719eec4d1b1ebf3ddc0b2d667997b057b Bisecting: 1 revision left to test after this (roughly 1 step) [11aca65ec4db09527d3e9b6b41a0615b7da4386b] selftests: bpf: fix inlines in test_lwt_seg6local testing commit 11aca65ec4db09527d3e9b6b41a0615b7da4386b with gcc (GCC) 8.1.0 all runs: OK # git bisect good 11aca65ec4db09527d3e9b6b41a0615b7da4386b Bisecting: 0 revisions left to test after this (roughly 0 steps) [162c820ed8965bf94d2685f97388aea5aee9e258] xdp: hold device for umem regardless of zero-copy mode testing commit 162c820ed8965bf94d2685f97388aea5aee9e258 with gcc (GCC) 8.1.0 all runs: OK # git bisect good 162c820ed8965bf94d2685f97388aea5aee9e258 455302d1c9ae9318660aaeb9748a01ff414c9741 is the first bad commit commit 455302d1c9ae9318660aaeb9748a01ff414c9741 Author: Ilya Maximets Date: Fri Jun 28 11:04:07 2019 +0300 xdp: fix hang while unregistering device bound to xdp socket Device that bound to XDP socket will not have zero refcount until the userspace application will not close it. This leads to hang inside 'netdev_wait_allrefs()' if device unregistering requested: # ip link del p1 < hang on recvmsg on netlink socket > # ps -x | grep ip 5126 pts/0 D+ 0:00 ip link del p1 # journalctl -b Jun 05 07:19:16 kernel: unregister_netdevice: waiting for p1 to become free. Usage count = 1 Jun 05 07:19:27 kernel: unregister_netdevice: waiting for p1 to become free. Usage count = 1 ... Fix that by implementing NETDEV_UNREGISTER event notification handler to properly clean up all the resources and unref device. This should also allow socket killing via ss(8) utility. Fixes: 965a99098443 ("xsk: add support for bind for Rx") Signed-off-by: Ilya Maximets Acked-by: Jonathan Lemon Signed-off-by: Daniel Borkmann :040000 040000 4798b9553c0bbcae5985146d6092c97cba26dde0 c258884c3ba27b62bdf579528178d17c4806a6d9 M include :040000 040000 0de70932488b132a54376c91590b88235ba74ee0 6d588f166b1b175afea3330f5b2258f94c7f49c8 M net revisions tested: 16, total time: 4h22m36.557960776s (build: 1h29m56.318130252s, test: 2h47m40.216882581s) first bad commit: 455302d1c9ae9318660aaeb9748a01ff414c9741 xdp: fix hang while unregistering device bound to xdp socket cc: ["ast@kernel.org" "bjorn.topel@intel.com" "bpf@vger.kernel.org" "daniel@iogearbox.net" "davem@davemloft.net" "hawk@kernel.org" "i.maximets@samsung.com" "jakub.kicinski@netronome.com" "john.fastabend@gmail.com" "jonathan.lemon@gmail.com" "linux-kernel@vger.kernel.org" "magnus.karlsson@intel.com" "netdev@vger.kernel.org" "xdp-newbies@vger.kernel.org"] crash: possible deadlock in rtnl_lock ====================================================== WARNING: possible circular locking dependency detected 5.2.0-rc3+ #1 Not tainted ------------------------------------------------------ syz-executor.1/7519 is trying to acquire lock: 000000006e793160 (rtnl_mutex){+.+.}, at: rtnl_lock+0x12/0x20 net/core/rtnetlink.c:72 but task is already holding lock: 000000000d55e293 (&xs->mutex){+.+.}, at: xsk_bind+0x122/0x1180 net/xdp/xsk.c:422 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&xs->mutex){+.+.}: __mutex_lock_common kernel/locking/mutex.c:926 [inline] __mutex_lock+0xf5/0x1210 kernel/locking/mutex.c:1073 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088 xsk_notifier+0x100/0x240 net/xdp/xsk.c:730 notifier_call_chain+0x8a/0x160 kernel/notifier.c:95 __raw_notifier_call_chain kernel/notifier.c:396 [inline] raw_notifier_call_chain+0x11/0x20 kernel/notifier.c:403 call_netdevice_notifiers_info+0x28/0x60 net/core/dev.c:1749 call_netdevice_notifiers_extack net/core/dev.c:1761 [inline] call_netdevice_notifiers net/core/dev.c:1775 [inline] rollback_registered_many+0x78e/0xce0 net/core/dev.c:8206 rollback_registered+0xdc/0x190 net/core/dev.c:8248 unregister_netdevice_queue+0x186/0x240 net/core/dev.c:9295 br_dev_delete+0x127/0x190 net/bridge/br_if.c:383 br_del_bridge+0x9a/0xe0 net/bridge/br_if.c:483 br_ioctl_deviceless_stub+0x21f/0x640 net/bridge/br_ioctl.c:376 sock_ioctl+0x367/0x600 net/socket.c:1141 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0x196/0x10c0 fs/ioctl.c:696 ksys_ioctl+0x62/0x90 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x6e/0xb0 fs/ioctl.c:718 do_syscall_64+0xd0/0x530 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #1 (&net->xdp.lock){+.+.}: __mutex_lock_common kernel/locking/mutex.c:926 [inline] __mutex_lock+0xf5/0x1210 kernel/locking/mutex.c:1073 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088 xsk_notifier+0x72/0x240 net/xdp/xsk.c:726 notifier_call_chain+0x8a/0x160 kernel/notifier.c:95 __raw_notifier_call_chain kernel/notifier.c:396 [inline] raw_notifier_call_chain+0x11/0x20 kernel/notifier.c:403 call_netdevice_notifiers_info+0x28/0x60 net/core/dev.c:1749 call_netdevice_notifiers_extack net/core/dev.c:1761 [inline] call_netdevice_notifiers net/core/dev.c:1775 [inline] rollback_registered_many+0x78e/0xce0 net/core/dev.c:8206 unregister_netdevice_many+0x3e/0x1f0 net/core/dev.c:9314 ip6gre_exit_batch_net+0x3ad/0x5b0 net/ipv6/ip6_gre.c:1603 ops_exit_list.isra.5+0xd3/0x120 net/core/net_namespace.c:157 cleanup_net+0x363/0x850 net/core/net_namespace.c:553 process_one_work+0x830/0x16a0 kernel/workqueue.c:2269 worker_thread+0x85/0xb60 kernel/workqueue.c:2415 kthread+0x324/0x3e0 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 -> #0 (rtnl_mutex){+.+.}: lock_acquire+0x173/0x3d0 kernel/locking/lockdep.c:4303 __mutex_lock_common kernel/locking/mutex.c:926 [inline] __mutex_lock+0xf5/0x1210 kernel/locking/mutex.c:1073 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088 rtnl_lock+0x12/0x20 net/core/rtnetlink.c:72 xdp_umem_assign_dev+0xa5/0xb20 net/xdp/xdp_umem.c:96 xsk_bind+0x49c/0x1180 net/xdp/xsk.c:488 __sys_bind+0x1e1/0x230 net/socket.c:1653 __do_sys_bind net/socket.c:1664 [inline] __se_sys_bind net/socket.c:1662 [inline] __x64_sys_bind+0x6e/0xb0 net/socket.c:1662 do_syscall_64+0xd0/0x530 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: rtnl_mutex --> &net->xdp.lock --> &xs->mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&xs->mutex); lock(&net->xdp.lock); lock(&xs->mutex); lock(rtnl_mutex); *** DEADLOCK *** 1 lock held by syz-executor.1/7519: #0: 000000000d55e293 (&xs->mutex){+.+.}, at: xsk_bind+0x122/0x1180 net/xdp/xsk.c:422 stack backtrace: CPU: 1 PID: 7519 Comm: syz-executor.1 Not tainted 5.2.0-rc3+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x113/0x167 lib/dump_stack.c:113 print_circular_bug.cold.59+0x1bd/0x27d kernel/locking/lockdep.c:1565 check_prev_add kernel/locking/lockdep.c:2310 [inline] check_prevs_add kernel/locking/lockdep.c:2418 [inline] validate_chain kernel/locking/lockdep.c:2800 [inline] __lock_acquire+0x3853/0x55b0 kernel/locking/lockdep.c:3793 lock_acquire+0x173/0x3d0 kernel/locking/lockdep.c:4303 __mutex_lock_common kernel/locking/mutex.c:926 [inline] __mutex_lock+0xf5/0x1210 kernel/locking/mutex.c:1073 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088 rtnl_lock+0x12/0x20 net/core/rtnetlink.c:72 xdp_umem_assign_dev+0xa5/0xb20 net/xdp/xdp_umem.c:96 xsk_bind+0x49c/0x1180 net/xdp/xsk.c:488 __sys_bind+0x1e1/0x230 net/socket.c:1653 __do_sys_bind net/socket.c:1664 [inline] __se_sys_bind net/socket.c:1662 [inline] __x64_sys_bind+0x6e/0xb0 net/socket.c:1662 do_syscall_64+0xd0/0x530 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4597c9 Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ff555fe0c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000031 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004597c9 RDX: 0000000000000010 RSI: 0000000020000300 RDI: 0000000000000005 RBP: 000000000075c070 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ff555fe16d4 R13: 00000000004bfa03 R14: 00000000004d1370 R15: 00000000ffffffff