bisecting cause commit starting from 537de0c8ca2b2fd49046e06194425f56e6246148
building syzkaller on f62e1e85cf54ccfa990868a402eca32bf4513b21
testing commit 537de0c8ca2b2fd49046e06194425f56e6246148 with gcc (GCC) 8.1.0
run #0: crashed: possible deadlock in rtnl_lock
run #1: crashed: possible deadlock in xsk_notifier
run #2: crashed: possible deadlock in rtnl_lock
run #3: crashed: possible deadlock in rtnl_lock
run #4: crashed: possible deadlock in rtnl_lock
run #5: crashed: possible deadlock in rtnl_lock
run #6: crashed: possible deadlock in xsk_notifier
run #7: crashed: possible deadlock in rtnl_lock
run #8: crashed: possible deadlock in rtnl_lock
run #9: crashed: possible deadlock in rtnl_lock
testing release v5.1
testing commit e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd with gcc (GCC) 8.1.0
all runs: OK
# git bisect start 537de0c8ca2b2fd49046e06194425f56e6246148 v5.1
Bisecting: 6887 revisions left to test after this (roughly 13 steps)
[a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
testing commit a2d635decbfa9c1e4ae15cb05b68b2559f7f827c with gcc (GCC) 8.1.0
all runs: OK
# git bisect good a2d635decbfa9c1e4ae15cb05b68b2559f7f827c
Bisecting: 3707 revisions left to test after this (roughly 12 steps)
[22c58fd70ca48a29505922b1563826593b08cc00] Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
testing commit 22c58fd70ca48a29505922b1563826593b08cc00 with gcc (GCC) 8.1.0
run #0: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" "-P" "22" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "BatchMode=yes" "-o" "IdentitiesOnly=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "-i" "/syzkaller/jobs/linux/workdir/image/key" "/tmp/syz-executor617988772" "root@10.128.15.208:./syz-executor617988772"]: exit status 1
ssh: connect to host 10.128.15.208 port 22: Connection timed out
lost connection
run #1: OK
run #2: OK
run #3: OK
run #4: OK
run #5: OK
run #6: OK
run #7: OK
run #8: OK
run #9: OK
# git bisect good 22c58fd70ca48a29505922b1563826593b08cc00
Bisecting: 1848 revisions left to test after this (roughly 11 steps)
[2409207a73cc8e4aff75ceccf6fe5c3ce4d391bc] Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
testing commit 2409207a73cc8e4aff75ceccf6fe5c3ce4d391bc with gcc (GCC) 8.1.0
run #0: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" "-P" "22" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "BatchMode=yes" "-o" "IdentitiesOnly=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "-i" "/syzkaller/jobs/linux/workdir/image/key" "/tmp/syz-executor666639269" "root@10.128.10.33:./syz-executor666639269"]: exit status 1
ssh: connect to host 10.128.10.33 port 22: Connection timed out
lost connection
run #1: OK
run #2: OK
run #3: OK
run #4: OK
run #5: OK
run #6: OK
run #7: OK
run #8: OK
run #9: OK
# git bisect good 2409207a73cc8e4aff75ceccf6fe5c3ce4d391bc
Bisecting: 924 revisions left to test after this (roughly 10 steps)
[3d4645bf7a76886c70a482a1c6742bac98553f47] Merge tag 's390-5.2-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
testing commit 3d4645bf7a76886c70a482a1c6742bac98553f47 with gcc (GCC) 8.1.0
all runs: OK
# git bisect good 3d4645bf7a76886c70a482a1c6742bac98553f47
Bisecting: 463 revisions left to test after this (roughly 9 steps)
[5d1549847c76b1ffcf8e388ef4d0f229bdd1d7e8] netfilter: Fix remainder of pseudo-header protocol 0
testing commit 5d1549847c76b1ffcf8e388ef4d0f229bdd1d7e8 with gcc (GCC) 8.1.0
run #0: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" "-P" "22" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "BatchMode=yes" "-o" "IdentitiesOnly=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "-i" "/syzkaller/jobs/linux/workdir/image/key" "/tmp/syz-executor448201683" "root@10.128.10.62:./syz-executor448201683"]: exit status 1
ssh: connect to host 10.128.10.62 port 22: Connection timed out
lost connection
run #1: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" "-P" "22" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "BatchMode=yes" "-o" "IdentitiesOnly=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "-i" "/syzkaller/jobs/linux/workdir/image/key" "/tmp/syz-executor345550614" "root@10.128.0.139:./syz-executor345550614"]: exit status 1
ssh: connect to host 10.128.0.139 port 22: Connection timed out
lost connection
run #2: OK
run #3: OK
run #4: OK
run #5: OK
run #6: OK
run #7: OK
run #8: OK
run #9: OK
# git bisect good 5d1549847c76b1ffcf8e388ef4d0f229bdd1d7e8
Bisecting: 231 revisions left to test after this (roughly 8 steps)
[a4c33bbb660b89fc7f21957386fb3a0b38e43f98] Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
testing commit a4c33bbb660b89fc7f21957386fb3a0b38e43f98 with gcc (GCC) 8.1.0
run #0: boot failed: can't ssh into the instance
run #1: OK
run #2: OK
run #3: OK
run #4: OK
run #5: OK
run #6: OK
run #7: OK
run #8: OK
run #9: OK
# git bisect good a4c33bbb660b89fc7f21957386fb3a0b38e43f98
Bisecting: 117 revisions left to test after this (roughly 7 steps)
[763cf1f2d9bfc8349c5791689074c8c17edf660d] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
testing commit 763cf1f2d9bfc8349c5791689074c8c17edf660d with gcc (GCC) 8.1.0
all runs: OK
# git bisect good 763cf1f2d9bfc8349c5791689074c8c17edf660d
Bisecting: 58 revisions left to test after this (roughly 6 steps)
[c8c8218ec5af5d2598381883acbefbf604e56b5e] netrom: fix a memory leak in nr_rx_frame()
testing commit c8c8218ec5af5d2598381883acbefbf604e56b5e with gcc (GCC) 8.1.0
all runs: OK
# git bisect good c8c8218ec5af5d2598381883acbefbf604e56b5e
Bisecting: 29 revisions left to test after this (roughly 5 steps)
[0d581ba311a27762fe1a14e5db5f65d225b3d844] net: hns: add support for vlan TSO
testing commit 0d581ba311a27762fe1a14e5db5f65d225b3d844 with gcc (GCC) 8.1.0
all runs: OK
# git bisect good 0d581ba311a27762fe1a14e5db5f65d225b3d844
Bisecting: 14 revisions left to test after this (roughly 4 steps)
[c3ead2df9776ab22490d78a7f68a8ec58700e07f] Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
testing commit c3ead2df9776ab22490d78a7f68a8ec58700e07f with gcc (GCC) 8.1.0
run #0: crashed: possible deadlock in rtnl_lock
run #1: crashed: possible deadlock in xsk_notifier
run #2: crashed: possible deadlock in rtnl_lock
run #3: crashed: possible deadlock in xsk_notifier
run #4: crashed: possible deadlock in rtnl_lock
run #5: crashed: possible deadlock in xsk_notifier
run #6: crashed: possible deadlock in xsk_notifier
run #7: crashed: possible deadlock in xsk_notifier
run #8: crashed: possible deadlock in rtnl_lock
run #9: crashed: possible deadlock in rtnl_lock
# git bisect bad c3ead2df9776ab22490d78a7f68a8ec58700e07f
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[75672dda27bd00109a84cd975c17949ad9c45663] bpf: fix BPF_ALU32 | BPF_ARSH on BE arches
testing commit 75672dda27bd00109a84cd975c17949ad9c45663 with gcc (GCC) 8.1.0
all runs: OK
# git bisect good 75672dda27bd00109a84cd975c17949ad9c45663
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[ac8786c72eba67dfc8ae751a75c586289a1b9b1b] selftests: bpf: add tests for shifts by zero
testing commit ac8786c72eba67dfc8ae751a75c586289a1b9b1b with gcc (GCC) 8.1.0
run #0: basic kernel testing failed: failed to copy test binary to VM: failed to run ["scp" "-P" "22" "-F" "/dev/null" "-o" "UserKnownHostsFile=/dev/null" "-o" "BatchMode=yes" "-o" "IdentitiesOnly=yes" "-o" "StrictHostKeyChecking=no" "-o" "ConnectTimeout=10" "-i" "/syzkaller/jobs/linux/workdir/image/key" "/tmp/syz-executor307773373" "root@10.128.10.59:./syz-executor307773373"]: exit status 1
ssh: connect to host 10.128.10.59 port 22: Connection timed out
lost connection
run #1: OK
run #2: OK
run #3: OK
run #4: OK
run #5: OK
run #6: OK
run #7: OK
run #8: OK
run #9: OK
# git bisect good ac8786c72eba67dfc8ae751a75c586289a1b9b1b
Bisecting: 1 revision left to test after this (roughly 1 step)
[162c820ed8965bf94d2685f97388aea5aee9e258] xdp: hold device for umem regardless of zero-copy mode
testing commit 162c820ed8965bf94d2685f97388aea5aee9e258 with gcc (GCC) 8.1.0
all runs: OK
# git bisect good 162c820ed8965bf94d2685f97388aea5aee9e258
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[455302d1c9ae9318660aaeb9748a01ff414c9741] xdp: fix hang while unregistering device bound to xdp socket
testing commit 455302d1c9ae9318660aaeb9748a01ff414c9741 with gcc (GCC) 8.1.0
run #0: crashed: possible deadlock in rtnl_lock
run #1: crashed: possible deadlock in rtnl_lock
run #2: crashed: possible deadlock in rtnl_lock
run #3: crashed: possible deadlock in xsk_notifier
run #4: crashed: possible deadlock in xsk_notifier
run #5: crashed: possible deadlock in rtnl_lock
run #6: crashed: possible deadlock in xsk_notifier
run #7: crashed: possible deadlock in rtnl_lock
run #8: crashed: possible deadlock in rtnl_lock
run #9: crashed: possible deadlock in xsk_notifier
# git bisect bad 455302d1c9ae9318660aaeb9748a01ff414c9741
455302d1c9ae9318660aaeb9748a01ff414c9741 is the first bad commit
commit 455302d1c9ae9318660aaeb9748a01ff414c9741
Author: Ilya Maximets
Date: Fri Jun 28 11:04:07 2019 +0300

    xdp: fix hang while unregistering device bound to xdp socket

    Device that bound to XDP socket will not have zero refcount until the userspace application will not close it. This leads to hang inside 'netdev_wait_allrefs()' if device unregistering requested:

      # ip link del p1
      < hang on recvmsg on netlink socket >

      # ps -x | grep ip
      5126 pts/0 D+ 0:00 ip link del p1

      # journalctl -b

      Jun 05 07:19:16 kernel: unregister_netdevice: waiting for p1 to become free. Usage count = 1

      Jun 05 07:19:27 kernel: unregister_netdevice: waiting for p1 to become free. Usage count = 1
      ...
    Fix that by implementing NETDEV_UNREGISTER event notification handler to properly clean up all the resources and unref device.

    This should also allow socket killing via ss(8) utility.

    Fixes: 965a99098443 ("xsk: add support for bind for Rx")
    Signed-off-by: Ilya Maximets
    Acked-by: Jonathan Lemon
    Signed-off-by: Daniel Borkmann

:040000 040000 4798b9553c0bbcae5985146d6092c97cba26dde0 c258884c3ba27b62bdf579528178d17c4806a6d9 M include
:040000 040000 0de70932488b132a54376c91590b88235ba74ee0 6d588f166b1b175afea3330f5b2258f94c7f49c8 M net
revisions tested: 16, total time: 4h14m9.28949039s (build: 1h29m42.467650613s, test: 2h39m39.987725275s)
first bad commit: 455302d1c9ae9318660aaeb9748a01ff414c9741 xdp: fix hang while unregistering device bound to xdp socket
cc: ["ast@kernel.org" "bjorn.topel@intel.com" "bpf@vger.kernel.org" "daniel@iogearbox.net" "davem@davemloft.net" "hawk@kernel.org" "i.maximets@samsung.com" "jakub.kicinski@netronome.com" "john.fastabend@gmail.com" "jonathan.lemon@gmail.com" "linux-kernel@vger.kernel.org" "magnus.karlsson@intel.com" "netdev@vger.kernel.org" "xdp-newbies@vger.kernel.org"]
crash: possible deadlock in xsk_notifier
======================================================
WARNING: possible circular locking dependency detected
5.2.0-rc3+ #1 Not tainted
------------------------------------------------------
syz-executor.1/7438 is trying to acquire lock:
000000007656fa30 (&xs->mutex){+.+.}, at: xsk_notifier+0x100/0x240 net/xdp/xsk.c:730

but task is already holding lock:
0000000060077bbe (&net->xdp.lock){+.+.}, at: xsk_notifier+0x72/0x240 net/xdp/xsk.c:726

which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:

-> #2 (&net->xdp.lock){+.+.}:
       __mutex_lock_common kernel/locking/mutex.c:926 [inline]
       __mutex_lock+0xf5/0x1210 kernel/locking/mutex.c:1073
       mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
       xsk_notifier+0x72/0x240 net/xdp/xsk.c:726
       notifier_call_chain+0x8a/0x160 kernel/notifier.c:95
       __raw_notifier_call_chain kernel/notifier.c:396 [inline]
       raw_notifier_call_chain+0x11/0x20 kernel/notifier.c:403
       call_netdevice_notifiers_info+0x28/0x60 net/core/dev.c:1749
       call_netdevice_notifiers_extack net/core/dev.c:1761 [inline]
       call_netdevice_notifiers net/core/dev.c:1775 [inline]
       rollback_registered_many+0x78e/0xce0 net/core/dev.c:8206
       unregister_netdevice_many+0x3e/0x1f0 net/core/dev.c:9314
       ip6gre_exit_batch_net+0x3ad/0x5b0 net/ipv6/ip6_gre.c:1603
       ops_exit_list.isra.5+0xd3/0x120 net/core/net_namespace.c:157
       cleanup_net+0x363/0x850 net/core/net_namespace.c:553
       process_one_work+0x830/0x16a0 kernel/workqueue.c:2269
       worker_thread+0x85/0xb60 kernel/workqueue.c:2415
       kthread+0x324/0x3e0 kernel/kthread.c:255
       ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

-> #1 (rtnl_mutex){+.+.}:
       __mutex_lock_common kernel/locking/mutex.c:926 [inline]
       __mutex_lock+0xf5/0x1210 kernel/locking/mutex.c:1073
       mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
       rtnl_lock+0x12/0x20 net/core/rtnetlink.c:72
       xdp_umem_assign_dev+0xa5/0xb20 net/xdp/xdp_umem.c:96
       xsk_bind+0x49c/0x1180 net/xdp/xsk.c:488
       __sys_bind+0x1e1/0x230 net/socket.c:1653
       __do_sys_bind net/socket.c:1664 [inline]
       __se_sys_bind net/socket.c:1662 [inline]
       __x64_sys_bind+0x6e/0xb0 net/socket.c:1662
       do_syscall_64+0xd0/0x530 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

-> #0 (&xs->mutex){+.+.}:
       lock_acquire+0x173/0x3d0 kernel/locking/lockdep.c:4303
       __mutex_lock_common kernel/locking/mutex.c:926 [inline]
       __mutex_lock+0xf5/0x1210 kernel/locking/mutex.c:1073
       mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
       xsk_notifier+0x100/0x240 net/xdp/xsk.c:730
       notifier_call_chain+0x8a/0x160 kernel/notifier.c:95
       __raw_notifier_call_chain kernel/notifier.c:396 [inline]
       raw_notifier_call_chain+0x11/0x20 kernel/notifier.c:403
       call_netdevice_notifiers_info+0x28/0x60 net/core/dev.c:1749
       call_netdevice_notifiers_extack net/core/dev.c:1761 [inline]
       call_netdevice_notifiers net/core/dev.c:1775 [inline]
       rollback_registered_many+0x78e/0xce0 net/core/dev.c:8206
       rollback_registered+0xdc/0x190 net/core/dev.c:8248
       unregister_netdevice_queue+0x186/0x240 net/core/dev.c:9295
       br_dev_delete+0x127/0x190 net/bridge/br_if.c:383
       br_del_bridge+0x9a/0xe0 net/bridge/br_if.c:483
       br_ioctl_deviceless_stub+0x21f/0x640 net/bridge/br_ioctl.c:376
       sock_ioctl+0x367/0x600 net/socket.c:1141
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:509 [inline]
       do_vfs_ioctl+0x196/0x10c0 fs/ioctl.c:696
       ksys_ioctl+0x62/0x90 fs/ioctl.c:713
       __do_sys_ioctl fs/ioctl.c:720 [inline]
       __se_sys_ioctl fs/ioctl.c:718 [inline]
       __x64_sys_ioctl+0x6e/0xb0 fs/ioctl.c:718
       do_syscall_64+0xd0/0x530 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
  &xs->mutex --> rtnl_mutex --> &net->xdp.lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&net->xdp.lock);
                               lock(rtnl_mutex);
                               lock(&net->xdp.lock);
  lock(&xs->mutex);

 *** DEADLOCK ***

3 locks held by syz-executor.1/7438:
 #0: 00000000dcbf0869 (br_ioctl_mutex){+.+.}, at: sock_ioctl+0x34a/0x600 net/socket.c:1139
 #1: 00000000f673a4c0 (rtnl_mutex){+.+.}, at: rtnl_lock+0x12/0x20 net/core/rtnetlink.c:72
 #2: 0000000060077bbe (&net->xdp.lock){+.+.}, at: xsk_notifier+0x72/0x240 net/xdp/xsk.c:726

stack backtrace:
CPU: 1 PID: 7438 Comm: syz-executor.1 Not tainted 5.2.0-rc3+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x113/0x167 lib/dump_stack.c:113
 print_circular_bug.cold.59+0x1bd/0x27d kernel/locking/lockdep.c:1565
 check_prev_add kernel/locking/lockdep.c:2310 [inline]
 check_prevs_add kernel/locking/lockdep.c:2418 [inline]
 validate_chain kernel/locking/lockdep.c:2800 [inline]
 __lock_acquire+0x3853/0x55b0 kernel/locking/lockdep.c:3793
 lock_acquire+0x173/0x3d0 kernel/locking/lockdep.c:4303
 __mutex_lock_common kernel/locking/mutex.c:926 [inline]
 __mutex_lock+0xf5/0x1210 kernel/locking/mutex.c:1073
 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
 xsk_notifier+0x100/0x240 net/xdp/xsk.c:730
 notifier_call_chain+0x8a/0x160 kernel/notifier.c:95
 __raw_notifier_call_chain kernel/notifier.c:396 [inline]
 raw_notifier_call_chain+0x11/0x20 kernel/notifier.c:403
 call_netdevice_notifiers_info+0x28/0x60 net/core/dev.c:1749
 call_netdevice_notifiers_extack net/core/dev.c:1761 [inline]
 call_netdevice_notifiers net/core/dev.c:1775 [inline]
 rollback_registered_many+0x78e/0xce0 net/core/dev.c:8206
 rollback_registered+0xdc/0x190 net/core/dev.c:8248
 unregister_netdevice_queue+0x186/0x240 net/core/dev.c:9295
 br_dev_delete+0x127/0x190 net/bridge/br_if.c:383
 br_del_bridge+0x9a/0xe0 net/bridge/br_if.c:483
 br_ioctl_deviceless_stub+0x21f/0x640 net/bridge/br_ioctl.c:376
 sock_ioctl+0x367/0x600 net/socket.c:1141
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x196/0x10c0 fs/ioctl.c:696
 ksys_ioctl+0x62/0x90 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x6e/0xb0 fs/ioctl.c:718
 do_syscall_64+0xd0/0x530 arch/x86/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4597c9
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fdb66831c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00000000004597c9
RDX: 0000000020000180 RSI: 00000000000089a1 RDI: 0000000000000003
RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fdb668326d4
R13: 00000000004c50fb R14: 00000000004d9308 R15: 00000000ffffffff
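Each bisection step roughly halves the candidate range. For every "Bisecting: N revisions left to test after this (roughly K steps)" line in this log, K turns out to equal ceil(log2(N + 1)), which is the integer bit length of N. The Python sketch below checks that relationship against the numbers printed above and recounts the 16 tested revisions. `approx_steps` is a hypothetical helper fitted to this log only. It is not git's own estimator, whose formula may give different values for other inputs.

```python
# (N, K) pairs copied from each
# "Bisecting: N revisions left to test after this (roughly K steps)" line
BISECT_LINES = [
    (6887, 13), (3707, 12), (1848, 11), (924, 10), (463, 9), (231, 8), (117, 7),
    (58, 6), (29, 5), (14, 4), (7, 3), (3, 2), (1, 1), (0, 0),
]


def approx_steps(revisions_left: int) -> int:
    """Hypothetical helper: ceil(log2(revisions_left + 1)), computed exactly."""
    # For n >= 0, n.bit_length() == ceil(log2(n + 1)) with no floating point.
    return revisions_left.bit_length()


for left, printed in BISECT_LINES:
    assert approx_steps(left) == printed, (left, printed)

# One tested commit per "Bisecting:" line, plus the two initial checks
# (the crashing start commit 537de0c8 and release v5.1).
tests_run = len(BISECT_LINES) + 2
print(tests_run)  # 16, matching "revisions tested: 16"
```

The first estimate ("roughly 13 steps" after the first midpoint) plus that first test gives the 14 bisection tests actually run.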
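The lockdep report reasons about a graph of lock orderings. Whenever a task takes lock B while holding lock A, lockdep records A -> B, and it flags any new edge that would close a cycle. Lockdep works on lock classes, so every XDP socket's `xs->mutex` shares the class `&xs->mutex`. That is why acquisitions made by different tasks at different times can combine into one cycle.

The toy model below replays the three orderings from the report in Python. It is a simplified sketch, not lockdep's implementation: `LockOrderGraph` and its methods are hypothetical names, and it ignores read locks, trylocks and irq-state checks. Like lockdep, it links a new lock only to the most recently held lock.

```python
from typing import Dict, List, Optional, Set


class LockOrderGraph:
    """Hypothetical toy lock-order tracker (a simplification of lockdep's idea)."""

    def __init__(self) -> None:
        # after[a] = set of lock classes that have been taken while `a` was held
        self.after: Dict[str, Set[str]] = {}

    def _path(self, src: str, dst: str) -> Optional[List[str]]:
        """Depth-first search for an already recorded chain src -> ... -> dst."""
        stack = [[src]]
        seen = {src}
        while stack:
            path = stack.pop()
            if path[-1] == dst:
                return path
            for nxt in self.after.get(path[-1], ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(path + [nxt])
        return None

    def acquire(self, held: str, new: str) -> Optional[List[str]]:
        """Record that `new` is taken while `held` is the most recently held lock.

        If `new` already (transitively) precedes `held`, adding held -> new would
        close a cycle, so return the existing chain instead of recording the edge.
        """
        chain = self._path(new, held)
        if chain is not None:
            return chain
        self.after.setdefault(held, set()).add(new)
        return None


g = LockOrderGraph()
# "-> #1": xsk_bind() holds &xs->mutex, then xdp_umem_assign_dev() takes rtnl_mutex
assert g.acquire("&xs->mutex", "rtnl_mutex") is None
# "-> #2": xsk_notifier() runs under rtnl_mutex and takes &net->xdp.lock
assert g.acquire("rtnl_mutex", "&net->xdp.lock") is None
# "-> #0": xsk_notifier() holds &net->xdp.lock and then takes &xs->mutex
chain = g.acquire("&net->xdp.lock", "&xs->mutex")
print("Chain exists of:", " --> ".join(chain))
# prints: Chain exists of: &xs->mutex --> rtnl_mutex --> &net->xdp.lock
```

The returned chain reproduces the report's "Chain exists of" line. Breaking the cycle means removing one of the three edges. For example, avoid taking `&xs->mutex` while `&net->xdp.lock` is held, or avoid taking `rtnl_mutex` while `&xs->mutex` is held.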