syzbot


possible deadlock in pcpu_alloc_noprof

Status: upstream: reported on 2024/09/29 16:17
Subsystems: bcachefs
Reported-by: syzbot+fe63f377148a6371a9db@syzkaller.appspotmail.com
First crash: 56d, last: 13d
Discussions (1):
Title                                                         Replies (including bot)  Last reply
[syzbot] [bcachefs?] possible deadlock in pcpu_alloc_noprof   0 (1)                    2024/09/29 16:17

Sample crash report:
loop0: detected capacity change from 0 to 32768
bcachefs (loop0): starting version 1.7: mi_btree_bitmap opts=metadata_checksum=none,data_checksum=none,compression=lz4,erasure_code,degraded,no_splitbrain_check,fsck,norecovery,nojournal_transaction_names,reconstruct_alloc,nocow
bcachefs (loop0): recovering from clean shutdown, journal seq 10
bcachefs (loop0): Version upgrade required:
Version upgrade from 0.24: unwritten_extents to 1.7: mi_btree_bitmap incomplete
Doing incompatible version upgrade from 0.24: unwritten_extents to 1.13: inode_has_child_snapshots
  running recovery passes: check_allocations,check_alloc_info,check_lrus,check_btree_backpointers,check_backpointers_to_extents,check_extents_to_backpointers,check_alloc_to_lru_refs,bucket_gens_init,check_snapshot_trees,check_snapshots,check_subvols,check_subvol_children,delete_dead_snapshots,check_inodes,check_extents,check_indirect_extents,check_dirents,check_xattrs,check_root,check_unreachable_inodes,check_subvolume_structure,check_directory_structure,check_nlinks,set_fs_needs_rebalance
bcachefs (loop0): dropping and reconstructing all alloc info
bcachefs (loop0): check_topology... done
bcachefs (loop0): accounting_read... done
bcachefs (loop0): alloc_read... done
bcachefs (loop0): stripes_read... done
bcachefs (loop0): snapshots_read... done
bcachefs (loop0): check_allocations... done
bcachefs (loop0): going read-write
bcachefs (loop0): done starting filesystem
netlink: 4 bytes leftover after parsing attributes in process `syz.0.0'.
======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc6-syzkaller-00169-g906bd684e4b1 #0 Not tainted
------------------------------------------------------
syz.0.0/5328 is trying to acquire lock:
ffffffff8ea17308 (pcpu_alloc_mutex){+.+.}-{3:3}, at: pcpu_alloc_noprof+0x27f/0x16b0 mm/percpu.c:1795

but task is already holding lock:
ffff88804ff01c50 (&bc->lock){+.+.}-{3:3}, at: bch2_btree_node_mem_alloc+0x4ec/0x1340 fs/bcachefs/btree_cache.c:782

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&bc->lock){+.+.}-{3:3}:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
       __mutex_lock_common kernel/locking/mutex.c:608 [inline]
       __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
       bch2_btree_cache_scan+0x192/0xd00 fs/bcachefs/btree_cache.c:460
       do_shrink_slab+0x701/0x1160 mm/shrinker.c:437
       shrink_slab+0x1093/0x14d0 mm/shrinker.c:664
       shrink_one+0x43b/0x850 mm/vmscan.c:4824
       shrink_many mm/vmscan.c:4885 [inline]
       lru_gen_shrink_node mm/vmscan.c:4963 [inline]
       shrink_node+0x3791/0x3e20 mm/vmscan.c:5943
       kswapd_shrink_node mm/vmscan.c:6771 [inline]
       balance_pgdat mm/vmscan.c:6963 [inline]
       kswapd+0x1ca3/0x3700 mm/vmscan.c:7232
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

-> #1 (fs_reclaim){+.+.}-{0:0}:
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
       __fs_reclaim_acquire mm/page_alloc.c:3834 [inline]
       fs_reclaim_acquire+0x88/0x130 mm/page_alloc.c:3848
       might_alloc include/linux/sched/mm.h:318 [inline]
       slab_pre_alloc_hook mm/slub.c:4036 [inline]
       slab_alloc_node mm/slub.c:4114 [inline]
       __do_kmalloc_node mm/slub.c:4263 [inline]
       __kmalloc_noprof+0xa9/0x400 mm/slub.c:4276
       kmalloc_noprof include/linux/slab.h:882 [inline]
       kzalloc_noprof include/linux/slab.h:1014 [inline]
       pcpu_mem_zalloc mm/percpu.c:510 [inline]
       pcpu_alloc_chunk mm/percpu.c:1443 [inline]
       pcpu_create_chunk+0x57/0xbc0 mm/percpu-vm.c:338
       pcpu_balance_populated mm/percpu.c:2075 [inline]
       pcpu_balance_workfn+0xc4d/0xd40 mm/percpu.c:2212
       process_one_work kernel/workqueue.c:3229 [inline]
       process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310
       worker_thread+0x870/0xd30 kernel/workqueue.c:3391
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

-> #0 (pcpu_alloc_mutex){+.+.}-{3:3}:
       check_prev_add kernel/locking/lockdep.c:3161 [inline]
       check_prevs_add kernel/locking/lockdep.c:3280 [inline]
       validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
       __lock_acquire+0x1384/0x2050 kernel/locking/lockdep.c:5202
       lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
       __mutex_lock_common kernel/locking/mutex.c:608 [inline]
       __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
       pcpu_alloc_noprof+0x27f/0x16b0 mm/percpu.c:1795
       __six_lock_init+0x104/0x150 fs/bcachefs/six.c:869
       bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
       bch2_btree_node_mem_alloc+0x4f8/0x1340 fs/bcachefs/btree_cache.c:785
       __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:325 [inline]
       bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:554
       bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1252
       bch2_btree_split_leaf+0x123/0x840 fs/bcachefs/btree_update_interior.c:1850
       bch2_trans_commit_error+0x212/0x1390 fs/bcachefs/btree_trans_commit.c:942
       __bch2_trans_commit+0x7ead/0x93c0 fs/bcachefs/btree_trans_commit.c:1140
       bch2_trans_commit fs/bcachefs/btree_update.h:184 [inline]
       bch2_logged_op_start+0x1c8/0x310 fs/bcachefs/logged_ops.c:92
       bch2_truncate+0x19e/0x2d0 fs/bcachefs/io_misc.c:294
       bchfs_truncate+0x85f/0xc90 fs/bcachefs/fs-io.c:464
       notify_change+0xbca/0xe90 fs/attr.c:503
       do_truncate+0x220/0x310 fs/open.c:65
       handle_truncate fs/namei.c:3395 [inline]
       do_open fs/namei.c:3778 [inline]
       path_openat+0x2e1e/0x3590 fs/namei.c:3933
       do_filp_open+0x235/0x490 fs/namei.c:3960
       do_sys_openat2+0x13e/0x1d0 fs/open.c:1415
       do_sys_open fs/open.c:1430 [inline]
       __do_sys_open fs/open.c:1438 [inline]
       __se_sys_open fs/open.c:1434 [inline]
       __x64_sys_open+0x225/0x270 fs/open.c:1434
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f

other info that might help us debug this:

Chain exists of:
  pcpu_alloc_mutex --> fs_reclaim --> &bc->lock

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&bc->lock);
                               lock(fs_reclaim);
                               lock(&bc->lock);
  lock(pcpu_alloc_mutex);

 *** DEADLOCK ***

6 locks held by syz.0.0/5328:
 #0: ffff88803d116420 (sb_writers#11){.+.+}-{0:0}, at: mnt_want_write+0x3f/0x90 fs/namespace.c:515
 #1: ffff8880124e88c8 (&sb->s_type->i_mutex_key#19){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:815 [inline]
 #1: ffff8880124e88c8 (&sb->s_type->i_mutex_key#19){+.+.}-{3:3}, at: do_truncate+0x20c/0x310 fs/open.c:63
 #2: ffff88804ff00a38 (&c->snapshot_create_lock){.+.+}-{3:3}, at: bch2_truncate+0x166/0x2d0 fs/bcachefs/io_misc.c:292
 #3: ffff88804ff04398 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_lock_acquire include/linux/srcu.h:151 [inline]
 #3: ffff88804ff04398 (&c->btree_trans_barrier){.+.+}-{0:0}, at: srcu_read_lock include/linux/srcu.h:250 [inline]
 #3: ffff88804ff04398 (&c->btree_trans_barrier){.+.+}-{0:0}, at: __bch2_trans_get+0x7de/0xd20 fs/bcachefs/btree_iter.c:3228
 #4: ffff88804ff266d0 (&c->gc_lock){++++}-{3:3}, at: bch2_btree_update_start+0x682/0x14e0 fs/bcachefs/btree_update_interior.c:1202
 #5: ffff88804ff01c50 (&bc->lock){+.+.}-{3:3}, at: bch2_btree_node_mem_alloc+0x4ec/0x1340 fs/bcachefs/btree_cache.c:782

stack backtrace:
CPU: 0 UID: 0 PID: 5328 Comm: syz.0.0 Not tainted 6.12.0-rc6-syzkaller-00169-g906bd684e4b1 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2074
 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2206
 check_prev_add kernel/locking/lockdep.c:3161 [inline]
 check_prevs_add kernel/locking/lockdep.c:3280 [inline]
 validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904
 __lock_acquire+0x1384/0x2050 kernel/locking/lockdep.c:5202
 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825
 __mutex_lock_common kernel/locking/mutex.c:608 [inline]
 __mutex_lock+0x136/0xd70 kernel/locking/mutex.c:752
 pcpu_alloc_noprof+0x27f/0x16b0 mm/percpu.c:1795
 __six_lock_init+0x104/0x150 fs/bcachefs/six.c:869
 bch2_btree_lock_init+0x38/0x100 fs/bcachefs/btree_locking.c:12
 bch2_btree_node_mem_alloc+0x4f8/0x1340 fs/bcachefs/btree_cache.c:785
 __bch2_btree_node_alloc fs/bcachefs/btree_update_interior.c:325 [inline]
 bch2_btree_reserve_get+0x2df/0x1890 fs/bcachefs/btree_update_interior.c:554
 bch2_btree_update_start+0xe56/0x14e0 fs/bcachefs/btree_update_interior.c:1252
 bch2_btree_split_leaf+0x123/0x840 fs/bcachefs/btree_update_interior.c:1850
 bch2_trans_commit_error+0x212/0x1390 fs/bcachefs/btree_trans_commit.c:942
 __bch2_trans_commit+0x7ead/0x93c0 fs/bcachefs/btree_trans_commit.c:1140
 bch2_trans_commit fs/bcachefs/btree_update.h:184 [inline]
 bch2_logged_op_start+0x1c8/0x310 fs/bcachefs/logged_ops.c:92
 bch2_truncate+0x19e/0x2d0 fs/bcachefs/io_misc.c:294
 bchfs_truncate+0x85f/0xc90 fs/bcachefs/fs-io.c:464
 notify_change+0xbca/0xe90 fs/attr.c:503
 do_truncate+0x220/0x310 fs/open.c:65
 handle_truncate fs/namei.c:3395 [inline]
 do_open fs/namei.c:3778 [inline]
 path_openat+0x2e1e/0x3590 fs/namei.c:3933
 do_filp_open+0x235/0x490 fs/namei.c:3960
 do_sys_openat2+0x13e/0x1d0 fs/open.c:1415
 do_sys_open fs/open.c:1430 [inline]
 __do_sys_open fs/open.c:1438 [inline]
 __se_sys_open fs/open.c:1434 [inline]
 __x64_sys_open+0x225/0x270 fs/open.c:1434
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f369917e719
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f3699ef8038 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 00007f3699335f80 RCX: 00007f369917e719
RDX: 0000000000000000 RSI: 0000000000046342 RDI: 0000000020000040
RBP: 00007f36991f139e R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f3699335f80 R15: 00007ffea3924aa8
 </TASK>
syz.0.0 (5328) used greatest stack depth: 12728 bytes left
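
To make the lockdep cycle above easier to follow, the sketch below models the three reported lock-ordering edges with plain pthread mutexes. This is only an illustration of the report: bc_lock, fs_reclaim and pcpu_alloc_mtx stand in for &bc->lock, the fs_reclaim pseudo-lock and pcpu_alloc_mutex, and the three functions mirror the acquisition orders in stacks #0, #1 and #2; none of this is bcachefs or mm code.

/*
 * Minimal userspace model of the lock cycle in the report above.
 * Compile with: cc -pthread cycle.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t bc_lock        = PTHREAD_MUTEX_INITIALIZER; /* &bc->lock             */
static pthread_mutex_t fs_reclaim     = PTHREAD_MUTEX_INITIALIZER; /* fs_reclaim pseudo-lock */
static pthread_mutex_t pcpu_alloc_mtx = PTHREAD_MUTEX_INITIALIZER; /* pcpu_alloc_mutex       */

/* Edge #0: the truncate path holds &bc->lock in bch2_btree_node_mem_alloc()
 * while __six_lock_init() ends up in pcpu_alloc_noprof(), giving
 * bc->lock -> pcpu_alloc_mutex. */
static void truncate_path(void)
{
        pthread_mutex_lock(&bc_lock);
        pthread_mutex_lock(&pcpu_alloc_mtx);
        pthread_mutex_unlock(&pcpu_alloc_mtx);
        pthread_mutex_unlock(&bc_lock);
}

/* Edge #1: pcpu_balance_workfn() holds pcpu_alloc_mutex while doing a
 * GFP_KERNEL allocation that may enter reclaim, giving
 * pcpu_alloc_mutex -> fs_reclaim. */
static void pcpu_balance_path(void)
{
        pthread_mutex_lock(&pcpu_alloc_mtx);
        pthread_mutex_lock(&fs_reclaim);
        pthread_mutex_unlock(&fs_reclaim);
        pthread_mutex_unlock(&pcpu_alloc_mtx);
}

/* Edge #2: kswapd in reclaim runs the btree-cache shrinker, which takes
 * &bc->lock in bch2_btree_cache_scan(), giving fs_reclaim -> bc->lock. */
static void reclaim_path(void)
{
        pthread_mutex_lock(&fs_reclaim);
        pthread_mutex_lock(&bc_lock);
        pthread_mutex_unlock(&bc_lock);
        pthread_mutex_unlock(&fs_reclaim);
}

int main(void)
{
        /* Run sequentially so this program terminates. Run concurrently,
         * the three orderings form the cycle
         * bc->lock -> pcpu_alloc_mutex -> fs_reclaim -> bc->lock,
         * which is exactly what lockdep flags above. */
        truncate_path();
        pcpu_balance_path();
        reclaim_path();
        puts("lock orderings exercised; the deadlock lies in the ordering, not in this sequential run");
        return 0;
}

Breaking any single edge removes the cycle. For example, initializing the six lock's per-cpu state before taking &bc->lock would remove the bc->lock -> pcpu_alloc_mutex edge, and teaching the btree-cache shrinker not to block on &bc->lock under reclaim would remove the fs_reclaim -> bc->lock edge; which approach, if either, is appropriate is a question for the bcachefs maintainers.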

Crashes (4):
Time              Kernel    Commit        Syzkaller  Manager                    Title
2024/11/08 03:04  upstream  906bd684e4b1  179b040e   ci-snapshot-upstream-root  possible deadlock in pcpu_alloc_noprof
2024/11/03 21:03  upstream  b9021de3ec2f  f00eed24   ci-snapshot-upstream-root  possible deadlock in pcpu_alloc_noprof
2024/10/24 16:17  upstream  c2ee9f594da8  c08e46d6   ci-snapshot-upstream-root  possible deadlock in pcpu_alloc_noprof
2024/09/25 16:13  upstream  684a64bf32b6  0b45cac3   ci-snapshot-upstream-root  possible deadlock in pcpu_alloc_noprof

Each crash has a kernel .config, console log, and report, plus disk image (non-bootable), vmlinux, and kernel image assets; no syz or C reproducer or VM info is available for any of them.