bisecting cause commit starting from 4ac6d90867a4de2e12117e755dbd76e08d88697f building syzkaller on 15cea0a381c6ef9a7b4ffb2770360ce8882274c5 testing commit 4ac6d90867a4de2e12117e755dbd76e08d88697f compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: f09c64f68d39336962ee0845ca2a0800e143d4dff2c67b293f389d6f032974fb all runs: crashed: BUG: unable to handle kernel NULL pointer dereference in kiocb_done testing release v5.14 testing commit 7d2a07b769330c34b4deabeed939325c77a7ec2f compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 1bc25c310b773f41f60536ec01fad7eee81fbd08982b6b34e2bd4f1aafdef1ea all runs: OK # git bisect start 4ac6d90867a4de2e12117e755dbd76e08d88697f 7d2a07b769330c34b4deabeed939325c77a7ec2f Bisecting: 3759 revisions left to test after this (roughly 12 steps) [ba1dc7f273c73b93e0e1dd9707b239ed69eebd70] Merge tag 'char-misc-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc testing commit ba1dc7f273c73b93e0e1dd9707b239ed69eebd70 arch/x86/kernel/setup.c:916:6: error: implicit declaration of function 'acpi_mps_check' [-Werror=implicit-function-declaration] arch/x86/kernel/setup.c:1110:2: error: implicit declaration of function 'acpi_table_upgrade' [-Werror=implicit-function-declaration] arch/x86/kernel/setup.c:1112:2: error: implicit declaration of function 'acpi_boot_table_init' [-Werror=implicit-function-declaration] arch/x86/kernel/setup.c:1120:2: error: implicit declaration of function 'early_acpi_boot_init'; did you mean 'early_cpu_init'? [-Werror=implicit-function-declaration] arch/x86/kernel/setup.c:1162:2: error: implicit declaration of function 'acpi_boot_init' [-Werror=implicit-function-declaration] # git bisect skip ba1dc7f273c73b93e0e1dd9707b239ed69eebd70 Bisecting: 3759 revisions left to test after this (roughly 12 steps) [77233c2d2ec95030afcaf9fd90e4bdd6125e5c15] btrfs: zoned: allow disabling of zone auto reclaim testing commit 77233c2d2ec95030afcaf9fd90e4bdd6125e5c15 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 4a53a7b9fa43a5ea2ef6247439ac05ee824bc0c9ed27484814522f21c30f9505 all runs: OK # git bisect good 77233c2d2ec95030afcaf9fd90e4bdd6125e5c15 Bisecting: 3759 revisions left to test after this (roughly 12 steps) [999abd7a8c5dc52c5bfca71cdba98642dad5a0be] Merge existing fixes from asoc/for-5.14 testing commit 999abd7a8c5dc52c5bfca71cdba98642dad5a0be compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 5e7cbc81f27fd077634b6816e4713f79388593a68457733fe1b8ecdb01ca1eec all runs: OK # git bisect good 999abd7a8c5dc52c5bfca71cdba98642dad5a0be Bisecting: 3758 revisions left to test after this (roughly 12 steps) [556120256ecd25aacea2c7e3ad11ec6584de7252] drm/i915/guc: GuC virtual engines testing commit 556120256ecd25aacea2c7e3ad11ec6584de7252 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 6381ceba6dcccc05a8c75557c9749967f91273276f20c6d4cb8583eab1078a6a run #0: crashed: BUG: sleeping function called from invalid context in lock_sock_nested run #1: OK run #2: OK run #3: OK run #4: OK run #5: OK run #6: OK run #7: OK run #8: OK run #9: OK reproducer seems to be flaky # git bisect bad 556120256ecd25aacea2c7e3ad11ec6584de7252 Bisecting: 100 revisions left to test after this (roughly 7 steps) [f92906e220f1f130995a67817cfec7f305a55bfc] i915/gem/selftests: Assign the VM at context creation in igt_shared_ctx_exec testing commit f92906e220f1f130995a67817cfec7f305a55bfc compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 9e9886d4099ac671c44b4ea393defe29766e437789b90dbe9e18ba004446a52a all runs: OK # git bisect good f92906e220f1f130995a67817cfec7f305a55bfc Bisecting: 50 revisions left to test after this (roughly 6 steps) [0f4651359a235a702b383076fc2ccbd90d9bedb4] drm/i915: Make the kmem slab for i915_buddy_block a global testing commit 0f4651359a235a702b383076fc2ccbd90d9bedb4 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 6381ceba6dcccc05a8c75557c9749967f91273276f20c6d4cb8583eab1078a6a run #0: basic kernel testing failed: BUG: sleeping function called from invalid context in stack_depot_save run #1: crashed: BUG: sleeping function called from invalid context in lock_sock_nested run #2: OK run #3: OK run #4: OK run #5: OK run #6: OK run #7: OK run #8: OK run #9: OK run #10: OK run #11: OK run #12: OK run #13: OK run #14: OK run #15: OK run #16: OK run #17: OK run #18: OK run #19: OK # git bisect bad 0f4651359a235a702b383076fc2ccbd90d9bedb4 Bisecting: 18 revisions left to test after this (roughly 5 steps) [d6e6ac294d91563131265fdf44537aeac2984c21] Merge branch 'topic/revid_steppings' into drm-intel-gt-next testing commit d6e6ac294d91563131265fdf44537aeac2984c21 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 6381ceba6dcccc05a8c75557c9749967f91273276f20c6d4cb8583eab1078a6a run #0: basic kernel testing failed: BUG: sleeping function called from invalid context in stack_depot_save run #1: basic kernel testing failed: BUG: sleeping function called from invalid context in stack_depot_save run #2: OK run #3: OK run #4: OK run #5: OK run #6: OK run #7: OK run #8: OK run #9: OK run #10: OK run #11: OK run #12: OK run #13: OK run #14: OK run #15: OK run #16: OK run #17: OK run #18: OK run #19: OK # git bisect good d6e6ac294d91563131265fdf44537aeac2984c21 Bisecting: 9 revisions left to test after this (roughly 3 steps) [15eb083bdb561bb4862cd04cd0523e55483e877e] drm/i915: Correct the docs for intel_engine_cmd_parser testing commit 15eb083bdb561bb4862cd04cd0523e55483e877e compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 6381ceba6dcccc05a8c75557c9749967f91273276f20c6d4cb8583eab1078a6a run #0: basic kernel testing failed: BUG: sleeping function called from invalid context in stack_depot_save run #1: OK run #2: OK run #3: OK run #4: OK run #5: OK run #6: OK run #7: OK run #8: OK run #9: OK run #10: OK run #11: OK run #12: OK run #13: OK run #14: OK run #15: OK run #16: OK run #17: OK run #18: OK run #19: OK # git bisect good 15eb083bdb561bb4862cd04cd0523e55483e877e Bisecting: 4 revisions left to test after this (roughly 2 steps) [6b73a7f380a3f1a9599bc802cf78febeb77f42db] drm/i915: Make GT workaround upper bounds exclusive testing commit 6b73a7f380a3f1a9599bc802cf78febeb77f42db compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 6381ceba6dcccc05a8c75557c9749967f91273276f20c6d4cb8583eab1078a6a run #0: basic kernel testing failed: possible deadlock in fs_reclaim_acquire run #1: OK run #2: OK run #3: OK run #4: OK run #5: OK run #6: OK run #7: OK run #8: OK run #9: OK run #10: OK run #11: OK run #12: OK run #13: OK run #14: OK run #15: OK run #16: OK run #17: OK run #18: OK run #19: OK # git bisect good 6b73a7f380a3f1a9599bc802cf78febeb77f42db Bisecting: 2 revisions left to test after this (roughly 1 step) [75d3bf84dfca2fd3f83125eb68f0f55c7018d4de] drm/i915: Call i915_globals_exit() after i915_pmu_exit() testing commit 75d3bf84dfca2fd3f83125eb68f0f55c7018d4de compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 6381ceba6dcccc05a8c75557c9749967f91273276f20c6d4cb8583eab1078a6a all runs: OK # git bisect good 75d3bf84dfca2fd3f83125eb68f0f55c7018d4de Bisecting: 0 revisions left to test after this (roughly 1 step) [a04ea6ae7c6728cd834709f3477e75d4f74583da] drm/i915: Use a table for i915_init/exit (v2) testing commit a04ea6ae7c6728cd834709f3477e75d4f74583da compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 6381ceba6dcccc05a8c75557c9749967f91273276f20c6d4cb8583eab1078a6a run #0: basic kernel testing failed: BUG: sleeping function called from invalid context in stack_depot_save run #1: basic kernel testing failed: BUG: sleeping function called from invalid context in stack_depot_save run #2: crashed: BUG: sleeping function called from invalid context in lock_sock_nested run #3: OK run #4: OK run #5: OK run #6: OK run #7: OK run #8: OK run #9: OK run #10: OK run #11: OK run #12: OK run #13: OK run #14: OK run #15: OK run #16: OK run #17: OK run #18: OK run #19: OK # git bisect bad a04ea6ae7c6728cd834709f3477e75d4f74583da Bisecting: 0 revisions left to test after this (roughly 0 steps) [db484889d1ff0645e07e360d3e3ad306c0515821] drm/i915: Call i915_globals_exit() if pci_register_device() fails testing commit db484889d1ff0645e07e360d3e3ad306c0515821 compiler: gcc (GCC) 10.2.1 20210217, GNU ld (GNU Binutils for Debian) 2.35.1 kernel signature: 6381ceba6dcccc05a8c75557c9749967f91273276f20c6d4cb8583eab1078a6a all runs: OK # git bisect good db484889d1ff0645e07e360d3e3ad306c0515821 a04ea6ae7c6728cd834709f3477e75d4f74583da is the first bad commit commit a04ea6ae7c6728cd834709f3477e75d4f74583da Author: Jason Ekstrand Date: Wed Jul 21 10:23:55 2021 -0500 drm/i915: Use a table for i915_init/exit (v2) If the driver was not fully loaded, we may still have globals lying around. If we don't tear those down in i915_exit(), we'll leak a bunch of memory slabs. This can happen two ways: use_kms = false and if we've run mock selftests. In either case, we have an early exit from i915_init which happens after i915_globals_init() and we need to clean up those globals. The mock selftests case is especially sticky. The load isn't entirely a no-op. We actually do quite a bit inside those selftests including allocating a bunch of mock objects and running tests on them. Once all those tests are complete, we exit early from i915_init(). Perviously, i915_init() would return a non-zero error code on failure and a zero error code on success. In the success case, we would get to i915_exit() and check i915_pci_driver.driver.owner to detect if i915_init exited early and do nothing. In the failure case, we would fail i915_init() but there would be no opportunity to clean up globals. The most annoying part is that you don't actually notice the failure as part of the self-tests since leaking a bit of memory, while bad, doesn't result in anything observable from userspace. Instead, the next time we load the driver (usually for next IGT test), i915_globals_init() gets invoked again, we go to allocate a bunch of new memory slabs, those implicitly create debugfs entries, and debugfs warns that we're trying to create directories and files that already exist. Since this all happens as part of the next driver load, it shows up in the dmesg-warn of whatever IGT test ran after the mock selftests. While the obvious thing to do here might be to call i915_globals_exit() after selftests, that's not actually safe. The dma-buf selftests call i915_gem_prime_export which creates a file. We call dma_buf_put() on the resulting dmabuf which calls fput() on the file. However, fput() isn't immediate and gets flushed right before syscall returns. This means that all the fput()s from the selftests don't happen until right before the module load syscall used to fire off the selftests returns which is after i915_init(). If we call i915_globals_exit() in i915_init() after selftests, we end up freeing slabs out from under objects which won't get released until fput() is flushed at the end of the module load syscall. The solution here is to let i915_init() return success early and detect the early success in i915_exit() and only tear down globals and nothing else. This way the module loads successfully, regardless of the success or failure of the tests. Because we've not enumerated any PCI devices, no device nodes are created and it's entirely useless from userspace. The only thing the module does at that point is hold on to a bit of memory until we unload it and i915_exit() is called. Importantly, this means that everything from our selftests has the ability to properly flush out between i915_init() and i915_exit() because there is at least one syscall boundary in between. In order to handle all the delicate init/exit cases, we convert the whole thing to a table of init/exit pairs and track the init status in the new init_progress global. This allows us to ensure that i915_exit() always tears down exactly the things that i915_init() successfully initialized. We also allow early-exit of i915_init() without failure by an init function returning > 0. This is useful for nomodeset, and selftests. For the mock selftests, we convert them to always return 1 so we get the desired behavior of the driver always succeeding to load the driver and then properly tearing down the partially loaded driver. v2 (Tvrtko Ursulin): - Guard init_funcs[i].exit with GEM_BUG_ON(i >= ARRAY_SIZE(init_funcs)) v2 (Daniel Vetter): - Update the docstring for i915.mock_selftests Signed-off-by: Jason Ekstrand Reviewed-by: Daniel Vetter Cc: Tvrtko Ursulin Signed-off-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20210721152358.2893314-4-jason@jlekstrand.net drivers/gpu/drm/i915/i915_pci.c | 105 +++++++++++++++++-------- drivers/gpu/drm/i915/i915_perf.c | 3 +- drivers/gpu/drm/i915/i915_perf.h | 2 +- drivers/gpu/drm/i915/i915_pmu.c | 4 +- drivers/gpu/drm/i915/i915_pmu.h | 4 +- drivers/gpu/drm/i915/selftests/i915_selftest.c | 4 +- 6 files changed, 82 insertions(+), 40 deletions(-) culprit signature: 6381ceba6dcccc05a8c75557c9749967f91273276f20c6d4cb8583eab1078a6a parent signature: 6381ceba6dcccc05a8c75557c9749967f91273276f20c6d4cb8583eab1078a6a Reproducer flagged being flaky revisions tested: 13, total time: 3h33m45.180189405s (build: 1h37m5.173762799s, test: 1h54m32.452519525s) first bad commit: a04ea6ae7c6728cd834709f3477e75d4f74583da drm/i915: Use a table for i915_init/exit (v2) recipients (to): ["airlied@linux.ie" "daniel.vetter@ffwll.ch" "daniel@ffwll.ch" "dri-devel@lists.freedesktop.org" "intel-gfx@lists.freedesktop.org" "jani.nikula@linux.intel.com" "jason@jlekstrand.net" "joonas.lahtinen@linux.intel.com" "rodrigo.vivi@intel.com"] recipients (cc): ["linux-kernel@vger.kernel.org"] crash: BUG: sleeping function called from invalid context in lock_sock_nested BUG: sleeping function called from invalid context at net/core/sock.c:3100 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 8831, name: syz-executor.3 1 lock held by syz-executor.3/8831: #0: ffffffff8c40b440 (hci_sk_list.lock){++++}-{2:2}, at: hci_sock_dev_event+0x374/0x5c0 net/bluetooth/hci_sock.c:763 Preemption disabled at: [<0000000000000000>] 0x0 CPU: 0 PID: 8831 Comm: syz-executor.3 Not tainted 5.14.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x57/0x7d lib/dump_stack.c:105 ___might_sleep.cold+0x1f1/0x237 kernel/sched/core.c:9154 lock_sock_nested+0x1e/0xf0 net/core/sock.c:3100 lock_sock include/net/sock.h:1610 [inline] hci_sock_dev_event+0x3ea/0x5c0 net/bluetooth/hci_sock.c:765 hci_unregister_dev+0x29b/0xfb0 net/bluetooth/hci_core.c:4033 vhci_release+0x62/0xd0 drivers/bluetooth/hci_vhci.c:340 __fput+0x209/0x870 fs/file_table.c:280 task_work_run+0xc0/0x160 kernel/task_work.c:164 exit_task_work include/linux/task_work.h:32 [inline] do_exit+0x9fe/0x24e0 kernel/exit.c:825 do_group_exit+0xe7/0x290 kernel/exit.c:922 __do_sys_exit_group kernel/exit.c:933 [inline] __se_sys_exit_group kernel/exit.c:931 [inline] __x64_sys_exit_group+0x35/0x40 kernel/exit.c:931 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x4665f9 Code: Unable to access opcode bytes at RIP 0x4665cf. RSP: 002b:00007ffdc1e158a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 RAX: ffffffffffffffda RBX: 00007ffdc1e16068 RCX: 00000000004665f9 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000043 RBP: 0000000000000000 R08: 0000000000000025 R09: 00007ffdc1e16068 R10: 00000000ffffffff R11: 0000000000000246 R12: 00000000004bef74 R13: 0000000000000010 R14: 0000000000000000 R15: 0000000000400538 ======================================================