bisecting cause commit starting from 0e40da3efeb02ab0333d01abae20d421841db30a building syzkaller on c35ee0ea6d7ad386409ff5092d11361169886eef testing commit 0e40da3efeb02ab0333d01abae20d421841db30a with gcc (GCC) 8.1.0 all runs: crashed: BUG: MAX_STACK_TRACE_ENTRIES too low! testing release v5.0 testing commit 1c163f4c7b3f621efff9b28a47abb36f7378d783 with gcc (GCC) 8.1.0 all runs: OK # git bisect start 0e40da3efeb02ab0333d01abae20d421841db30a v5.0 Bisecting: 6240 revisions left to test after this (roughly 13 steps) [f90d64483ebd394958841f67f8794ab203b319a7] Merge tag 'usb-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb testing commit f90d64483ebd394958841f67f8794ab203b319a7 with gcc (GCC) 8.1.0 all runs: crashed: BUG: MAX_STACK_TRACE_ENTRIES too low! # git bisect bad f90d64483ebd394958841f67f8794ab203b319a7 Bisecting: 3194 revisions left to test after this (roughly 12 steps) [8feed3efa8022107bcb3432ac3ec9917e078ae70] Merge branch 'parisc-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux testing commit 8feed3efa8022107bcb3432ac3ec9917e078ae70 with gcc (GCC) 8.1.0 all runs: OK # git bisect good 8feed3efa8022107bcb3432ac3ec9917e078ae70 Bisecting: 1544 revisions left to test after this (roughly 11 steps) [8dcd175bc3d50b78413c56d5b17d4bddd77412ef] Merge branch 'akpm' (patches from Andrew) testing commit 8dcd175bc3d50b78413c56d5b17d4bddd77412ef with gcc (GCC) 8.1.0 all runs: crashed: BUG: MAX_STACK_TRACE_ENTRIES too low! # git bisect bad 8dcd175bc3d50b78413c56d5b17d4bddd77412ef Bisecting: 980 revisions left to test after this (roughly 10 steps) [aebbfafc74b8d4eac573aab47aaa2e7965295096] Merge tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc testing commit aebbfafc74b8d4eac573aab47aaa2e7965295096 with gcc (GCC) 8.1.0 all runs: crashed: BUG: MAX_STACK_TRACE_ENTRIES too low! # git bisect bad aebbfafc74b8d4eac573aab47aaa2e7965295096 Bisecting: 357 revisions left to test after this (roughly 8 steps) [3478588b5136966c80c571cf0006f08e9e5b8f04] Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip testing commit 3478588b5136966c80c571cf0006f08e9e5b8f04 with gcc (GCC) 8.1.0 all runs: crashed: BUG: MAX_STACK_TRACE_ENTRIES too low! # git bisect bad 3478588b5136966c80c571cf0006f08e9e5b8f04 Bisecting: 137 revisions left to test after this (roughly 7 steps) [b1b988a6a035212f5ea205155c49ce449beedee8] Merge branch 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip testing commit b1b988a6a035212f5ea205155c49ce449beedee8 with gcc (GCC) 8.1.0 all runs: OK # git bisect good b1b988a6a035212f5ea205155c49ce449beedee8 Bisecting: 64 revisions left to test after this (roughly 6 steps) [e7ffb4eb9a6d89678e7f62461737899f88dab64e] Merge branches 'doc.2019.01.26a', 'fixes.2019.01.26a', 'sil.2019.01.26a', 'spdx.2019.02.09a', 'srcu.2019.01.26a' and 'torture.2019.01.26a' into HEAD testing commit e7ffb4eb9a6d89678e7f62461737899f88dab64e with gcc (GCC) 8.1.0 all runs: OK # git bisect good e7ffb4eb9a6d89678e7f62461737899f88dab64e Bisecting: 32 revisions left to test after this (roughly 5 steps) [15ea86b58c71d05e0921bebcf707aa30e43e9e25] locking/lockdep: Fix reported required memory size (2/2) testing commit 15ea86b58c71d05e0921bebcf707aa30e43e9e25 with gcc (GCC) 8.1.0 all runs: OK # git bisect good 15ea86b58c71d05e0921bebcf707aa30e43e9e25 Bisecting: 16 revisions left to test after this (roughly 4 steps) [669de8bda87b92ab9a2fc663b3f5743c2ad1ae9f] kernel/workqueue: Use dynamic lockdep keys for workqueues testing commit 669de8bda87b92ab9a2fc663b3f5743c2ad1ae9f with gcc (GCC) 8.1.0 run #0: crashed: WARNING: locking bug in flush_workqueue run #1: basic kernel testing failed: timed out run #2: basic kernel testing failed: timed out run #3: basic kernel testing failed: timed out run #4: basic kernel testing failed: timed out run #5: basic kernel testing failed: timed out run #6: basic kernel testing failed: timed out run #7: basic kernel testing failed: timed out run #8: basic kernel testing failed: timed out run #9: basic kernel testing failed: timed out # git bisect bad 669de8bda87b92ab9a2fc663b3f5743c2ad1ae9f Bisecting: 7 revisions left to test after this (roughly 3 steps) [a0b0fd53e1e67639b303b15939b9c653dbe7a8c4] locking/lockdep: Free lock classes that are no longer in use testing commit a0b0fd53e1e67639b303b15939b9c653dbe7a8c4 with gcc (GCC) 8.1.0 all runs: OK # git bisect good a0b0fd53e1e67639b303b15939b9c653dbe7a8c4 Bisecting: 3 revisions left to test after this (roughly 2 steps) [de4643a77356a77bce73f64275b125b4b71a69cf] locking/lockdep: Reuse lock chains that have been freed testing commit de4643a77356a77bce73f64275b125b4b71a69cf with gcc (GCC) 8.1.0 all runs: OK # git bisect good de4643a77356a77bce73f64275b125b4b71a69cf Bisecting: 1 revision left to test after this (roughly 1 step) [4bf508621855613ca2ac782f70c3171e0e8bb011] locking/lockdep: Verify whether lock objects are small enough to be used as class keys testing commit 4bf508621855613ca2ac782f70c3171e0e8bb011 with gcc (GCC) 8.1.0 all runs: OK # git bisect good 4bf508621855613ca2ac782f70c3171e0e8bb011 Bisecting: 0 revisions left to test after this (roughly 0 steps) [108c14858b9ea224686e476c8f5ec345a0df9e27] locking/lockdep: Add support for dynamic keys testing commit 108c14858b9ea224686e476c8f5ec345a0df9e27 with gcc (GCC) 8.1.0 all runs: OK # git bisect good 108c14858b9ea224686e476c8f5ec345a0df9e27 669de8bda87b92ab9a2fc663b3f5743c2ad1ae9f is the first bad commit commit 669de8bda87b92ab9a2fc663b3f5743c2ad1ae9f Author: Bart Van Assche Date: Thu Feb 14 15:00:54 2019 -0800 kernel/workqueue: Use dynamic lockdep keys for workqueues The following commit: 87915adc3f0a ("workqueue: re-add lockdep dependencies for flushing") improved deadlock checking in the workqueue implementation. Unfortunately that patch also introduced a few false positive lockdep complaints. This patch suppresses these false positives by allocating the workqueue mutex lockdep key dynamically. An example of a false positive lockdep complaint suppressed by this patch can be found below. The root cause of the lockdep complaint shown below is that the direct I/O code can call alloc_workqueue() from inside a work item created by another alloc_workqueue() call and that both workqueues share the same lockdep key. This patch avoids that that lockdep complaint is triggered by allocating the work queue lockdep keys dynamically. In other words, this patch guarantees that a unique lockdep key is associated with each work queue mutex. ====================================================== WARNING: possible circular locking dependency detected 4.19.0-dbg+ #1 Not tainted fio/4129 is trying to acquire lock: 00000000a01cfe1a ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: flush_workqueue+0xd0/0x970 but task is already holding lock: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sb->s_type->i_mutex_key#14){+.+.}: down_write+0x3d/0x80 __generic_file_fsync+0x77/0xf0 ext4_sync_file+0x3c9/0x780 vfs_fsync_range+0x66/0x100 dio_complete+0x2f5/0x360 dio_aio_complete_work+0x1c/0x20 process_one_work+0x481/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #1 ((work_completion)(&dio->complete_work)){+.+.}: process_one_work+0x447/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #0 ((wq_completion)"dio/%s"sb->s_id){+.+.}: lock_acquire+0xc5/0x200 flush_workqueue+0xf3/0x970 drain_workqueue+0xec/0x220 destroy_workqueue+0x23/0x350 sb_init_dio_done_wq+0x6a/0x80 do_blockdev_direct_IO+0x1f33/0x4be0 __blockdev_direct_IO+0x79/0x86 ext4_direct_IO+0x5df/0xbb0 generic_file_direct_write+0x119/0x220 __generic_file_write_iter+0x131/0x2d0 ext4_file_write_iter+0x3fa/0x710 aio_write+0x235/0x330 io_submit_one+0x510/0xeb0 __x64_sys_io_submit+0x122/0x340 do_syscall_64+0x71/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: (wq_completion)"dio/%s"sb->s_id --> (work_completion)(&dio->complete_work) --> &sb->s_type->i_mutex_key#14 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sb->s_type->i_mutex_key#14); lock((work_completion)(&dio->complete_work)); lock(&sb->s_type->i_mutex_key#14); lock((wq_completion)"dio/%s"sb->s_id); *** DEADLOCK *** 1 lock held by fio/4129: #0: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710 stack backtrace: CPU: 3 PID: 4129 Comm: fio Not tainted 4.19.0-dbg+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x86/0xc5 print_circular_bug.isra.32+0x20a/0x218 __lock_acquire+0x1c68/0x1cf0 lock_acquire+0xc5/0x200 flush_workqueue+0xf3/0x970 drain_workqueue+0xec/0x220 destroy_workqueue+0x23/0x350 sb_init_dio_done_wq+0x6a/0x80 do_blockdev_direct_IO+0x1f33/0x4be0 __blockdev_direct_IO+0x79/0x86 ext4_direct_IO+0x5df/0xbb0 generic_file_direct_write+0x119/0x220 __generic_file_write_iter+0x131/0x2d0 ext4_file_write_iter+0x3fa/0x710 aio_write+0x235/0x330 io_submit_one+0x510/0xeb0 __x64_sys_io_submit+0x122/0x340 do_syscall_64+0x71/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe Signed-off-by: Bart Van Assche Signed-off-by: Peter Zijlstra (Intel) Cc: Andrew Morton Cc: Johannes Berg Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Tejun Heo Cc: Thomas Gleixner Cc: Waiman Long Cc: Will Deacon Link: https://lkml.kernel.org/r/20190214230058.196511-20-bvanassche@acm.org [ Reworked the changelog a bit. ] Signed-off-by: Ingo Molnar :040000 040000 530930fe3168a62e87f99f11636b73bda9de0499 591356bacb6c4a56275e8c7348f6b072ebdbbafc M include :040000 040000 9b254911561c5975fd2ec055dcd930371e868e45 c68d1c4a4d8242842fbb9a4216de34d52e0dde80 M kernel revisions tested: 15, total time: 4h5m51.649323326s (build: 1h41m24.739243042s, test: 2h17m27.262643841s) first bad commit: 669de8bda87b92ab9a2fc663b3f5743c2ad1ae9f kernel/workqueue: Use dynamic lockdep keys for workqueues cc: ["akpm@linux-foundation.org" "bvanassche@acm.org" "johannes.berg@intel.com" "longman@redhat.com" "mingo@kernel.org" "paulmck@linux.vnet.ibm.com" "peterz@infradead.org" "tglx@linutronix.de" "tj@kernel.org" "torvalds@linux-foundation.org" "will.deacon@arm.com"] crash: WARNING: locking bug in flush_workqueue IPv6: ADDRCONF(NETDEV_CHANGE): hsr_slave_1: link becomes ready IPv6: ADDRCONF(NETDEV_CHANGE): team0: link becomes ready 8021q: adding VLAN 0 to HW filter on device batadv0 IPv6: ADDRCONF(NETDEV_CHANGE): hsr0: link becomes ready 8021q: adding VLAN 0 to HW filter on device batadv0 WARNING: CPU: 0 PID: 9153 at kernel/locking/lockdep.c:747 arch_local_save_flags arch/x86/include/asm/paravirt.h:761 [inline] WARNING: CPU: 0 PID: 9153 at kernel/locking/lockdep.c:747 look_up_lock_class kernel/locking/lockdep.c:738 [inline] WARNING: CPU: 0 PID: 9153 at kernel/locking/lockdep.c:747 register_lock_class+0x10de/0x2220 kernel/locking/lockdep.c:1060 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 9153 Comm: syz-executor.2 Not tainted 5.0.0-rc8+ #1 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x113/0x167 lib/dump_stack.c:113 panic+0x212/0x41d kernel/panic.c:214 __warn.cold.8+0x1b/0x36 kernel/panic.c:571 report_bug+0x1a4/0x200 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:178 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271 do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973 RIP: 0010:look_up_lock_class kernel/locking/lockdep.c:747 [inline] RIP: 0010:register_lock_class+0x10de/0x2220 kernel/locking/lockdep.c:1060 Code: 4c 89 ea 4d 8b 7e c0 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 0f 85 38 10 00 00 4d 89 7d 00 e9 a2 f2 ff ff 0f 0b <0f> 0b e9 90 f1 ff ff 4c 89 f2 4c 8b 95 60 ff ff ff 48 8b 8d 58 ff kobject: 'loop1' (00000000bc47fe9f): kobject_uevent_env RSP: 0018:ffff8880815d7918 EFLAGS: 00010002 RAX: 0000000000000004 RBX: ffffffff89822e00 RCX: 1ffff110102baf2d RDX: ffff8880a817e080 RSI: 1ffffffff1394d34 RDI: ffff8880a98c93c8 RBP: ffff8880815d79f0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000003 R13: ffff8880a98c93b0 R14: 0000000000000000 R15: ffff88809b279c40 __lock_acquire+0xff/0x3a90 kernel/locking/lockdep.c:3550 lock_acquire+0x173/0x3d0 kernel/locking/lockdep.c:4178 flush_workqueue+0xf2/0x13d0 kernel/workqueue.c:2680 drain_workqueue+0x148/0x3a0 kernel/workqueue.c:2845 destroy_workqueue+0x19/0x5b0 kernel/workqueue.c:4218 ucma_close+0x23c/0x2e0 drivers/infiniband/core/ucma.c:1779 __fput+0x24c/0x800 fs/file_table.c:278 ____fput+0x9/0x10 fs/file_table.c:309 task_work_run+0x10e/0x190 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_usermode_loop+0x1a9/0x200 arch/x86/entry/common.c:166 prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline] syscall_return_slowpath arch/x86/entry/common.c:268 [inline] do_syscall_64+0x40d/0x4e0 arch/x86/entry/common.c:293 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x411fc1 Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 e4 1a 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01 RSP: 002b:00007ffcff79b350 EFLAGS: 00000293 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000005 RCX: 0000000000411fc1 RDX: 0000000000000000 RSI: 00000000007400b8 RDI: 0000000000000004 RBP: 0000000000000000 R08: 0000000000011d47 R09: 0000000000011d47 R10: 00007ffcff79b280 R11: 0000000000000293 R12: 0000000000000001 R13: 00007ffcff79b390 R14: 0000000000000000 R15: 00007ffcff79b3a0 Kernel Offset: disabled Rebooting in 86400 seconds..