syzbot


ID Workflow Result Correct Bug Created Started Finished Revision Error
6df996ad-da12-4acf-8d74-efdcb5c4f945 assessment-kcsan Benign: ✅  Confident: ✅  KCSAN: data-race in _copy_to_iter / set_overhead (4) 2026/01/31 21:10 2026/01/31 21:11 2026/01/31 21:34 2186dcabcd743737b50c1a9f99a8bf0d3a5a7914
Benign: ✅    Confident: ✅   

Explanation:
The data race occurs between `set_overhead()` in `fs/ext4/ioctl.c` and `_copy_to_iter()` in `lib/iov_iter.c`.

### Analysis

1.  **Writer Side**: The function `set_overhead()` is called by `ext4_update_primary_sb()` (via `ext4_update_overhead()`) during the ext4 filesystem mount process or during a resize operation. It updates the `s_overhead_clusters` field in the `struct ext4_super_block`. This superblock is located within a buffer head's data, which is part of the block device's page cache. The update is performed while holding the buffer lock (`lock_buffer(bh)`).

2.  **Reader Side**: The function `_copy_to_iter()` is part of the generic `read()` system call path for block devices (`blkdev_read_iter` -> `filemap_read`). In this report, a process (likely `udevd` or `blkid`) is reading the block device directly. This read operation copies data from the same page cache page that contains the superblock.

3.  **The Race**: The writer is updating a 4-byte field (`s_overhead_clusters`) at offset 584 (0x248) within the 1024-byte superblock. The reader is copying the entire 1024-byte superblock from the page cache to user-space. Because the generic page cache read path does not synchronize with buffer-level locks, the reader can access the page while the writer is modifying it.
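The offset arithmetic behind that claim, spelled out (the addresses are copied from the KCSAN report; the field offset comes from the `struct ext4_super_block` layout retrieved later in the trajectory):

```c
/*
 * Write:  0xffff888151cce648, 4 bytes    (set_overhead)
 * Read:   0xffff888151cce400, 1024 bytes (_copy_to_iter)
 *
 *   0xffff888151cce648 - 0xffff888151cce400 = 0x248 = 584
 *
 * s_overhead_clusters sits at byte offset 584 (bit offset 4672) in
 * struct ext4_super_block, so the 4-byte store lands at bytes
 * [584, 588) of the 1024-byte superblock span being copied to
 * user-space.
 */
```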

### Why the Race is Benign

*   **Kernel Stability**: The race does not involve any kernel-internal pointers, linked lists, or complex state machines. It is a simple data race on the *content* of a disk block being transferred to user-space. A 4-byte aligned write (which `s_overhead_clusters` is) is atomic on x86, so the reader will see either the old value or the new value for that specific field.
*   **Data Consistency**: While the reader might see an inconsistent superblock (e.g., the new value of `s_overhead_clusters` but an old checksum, as the checksum is updated in a separate step immediately after), this is expected behavior when a user-space process reads a block device that is being concurrently modified by the kernel (e.g., during mount). User-space tools that scan block devices (like `blkid`) are generally prepared to handle transiently inconsistent metadata or checksum failures.
*   **No Memory Corruption**: There is no risk of memory corruption or a kernel crash. The memory being accessed is valid and owned by the buffer/page cache.

In summary, this is a classic example of a benign data race where the kernel is updating metadata in the page cache while a user-space process is reading the raw block device. It has no impact on the integrity or stability of the kernel itself.
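To make the "Data Consistency" point concrete, here is a minimal, hypothetical sketch of what a user-space scanner such as `blkid` effectively does when probing a device. It is not code from any real tool; the constants (primary superblock at byte offset 1024, `s_magic` at byte offset 56, checksum in the last 4 bytes) come from the superblock layout discussed in the trajectory below.

```c
/* Hypothetical user-space probe of an ext4 superblock (illustrative only). */
#include <endian.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define EXT4_SB_OFFSET 1024	/* primary superblock starts at byte 1024 */
#define EXT4_SB_SIZE   1024
#define EXT4_MAGIC_OFF 56	/* s_magic, __le16 0xEF53 */

/* Returns 1 if the device looks like ext4, 0 if not, -1 on I/O error. */
int probe_ext4(const char *dev)
{
	unsigned char sb[EXT4_SB_SIZE];
	uint16_t magic;
	int fd = open(dev, O_RDONLY);

	if (fd < 0)
		return -1;
	/*
	 * This pread() is the reader side of the report: it copies the
	 * device's page-cache contents to user-space, racing with any
	 * in-kernel update of the cached superblock.
	 */
	if (pread(fd, sb, sizeof(sb), EXT4_SB_OFFSET) != (ssize_t)sizeof(sb)) {
		close(fd);
		return -1;
	}
	close(fd);

	memcpy(&magic, sb + EXT4_MAGIC_OFF, sizeof(magic));
	if (le16toh(magic) != 0xEF53)
		return 0;
	/*
	 * A real scanner would also verify s_checksum (the last 4 bytes of
	 * the superblock). Because the kernel stores the updated field and
	 * the recomputed checksum separately, a concurrent reader can
	 * observe a superblock whose checksum does not match and must be
	 * prepared to retry or ignore it.
	 */
	return 1;
}

int main(int argc, char **argv)
{
	return argc > 1 && probe_ext4(argv[1]) == 1 ? 0 : 1;
}
```

A reader that hits the window between the field update and the checksum update simply sees a transiently invalid superblock; the next read returns a consistent copy, which is consistent with the retry behavior described in the trajectory.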

Crash report:
==================================================================
BUG: KCSAN: data-race in _copy_to_iter / set_overhead

write to 0xffff888151cce648 of 4 bytes by task 7536 on cpu 0:
 set_overhead+0x28/0x40 fs/ext4/ioctl.c:2000
 ext4_update_primary_sb fs/ext4/ioctl.c:77 [inline]
 ext4_update_superblocks_fn+0x206/0xaa0 fs/ext4/ioctl.c:221
 ext4_update_overhead+0x135/0x160 fs/ext4/ioctl.c:2013
 ext4_fill_super+0x3002/0x3800 fs/ext4/super.c:5798
 get_tree_bdev_flags+0x291/0x300 fs/super.c:1691
 get_tree_bdev+0x1f/0x30 fs/super.c:1714
 ext4_get_tree+0x1c/0x30 fs/ext4/super.c:5809
 vfs_get_tree+0x57/0x1d0 fs/super.c:1751
 fc_mount fs/namespace.c:1199 [inline]
 do_new_mount_fc fs/namespace.c:3636 [inline]
 do_new_mount+0x288/0x700 fs/namespace.c:3712
 path_mount+0x4d0/0xbc0 fs/namespace.c:4022
 do_mount fs/namespace.c:4035 [inline]
 __do_sys_mount fs/namespace.c:4224 [inline]
 __se_sys_mount+0x28c/0x2e0 fs/namespace.c:4201
 __x64_sys_mount+0x67/0x80 fs/namespace.c:4201
 x64_sys_call+0x2cca/0x3000 arch/x86/include/generated/asm/syscalls_64.h:166
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff888151cce400 of 1024 bytes by task 3309 on cpu 1:
 instrument_copy_to_user include/linux/instrumented.h:113 [inline]
 copy_to_user_iter lib/iov_iter.c:24 [inline]
 iterate_ubuf include/linux/iov_iter.h:30 [inline]
 iterate_and_advance2 include/linux/iov_iter.h:302 [inline]
 iterate_and_advance include/linux/iov_iter.h:330 [inline]
 _copy_to_iter+0x130/0xea0 lib/iov_iter.c:197
 copy_page_to_iter+0x18f/0x2d0 lib/iov_iter.c:374
 copy_folio_to_iter include/linux/uio.h:204 [inline]
 filemap_read+0x407/0xa10 mm/filemap.c:2851
 blkdev_read_iter+0x22d/0x2e0 block/fops.c:856
 new_sync_read fs/read_write.c:491 [inline]
 vfs_read+0x6c0/0x7f0 fs/read_write.c:572
 ksys_read+0xdc/0x1a0 fs/read_write.c:715
 __do_sys_read fs/read_write.c:724 [inline]
 __se_sys_read fs/read_write.c:722 [inline]
 __x64_sys_read+0x40/0x50 fs/read_write.c:722
 x64_sys_call+0x2889/0x3000 arch/x86/include/generated/asm/syscalls_64.h:1
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 3309 Comm: udevd Tainted: G        W           syzkaller #0 PREEMPT(voluntary) 
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
==================================================================

Trajectory:
Seq Timestamp Type Name Duration
0/0 2026/01/31 21:11 flow assessment-kcsan
23m Results:
map[Benign:true Confident:true Explanation:The data race occurs between `set_overhead()` in `fs/ext4/ioctl.c` and `_copy_to_iter()` in `lib/iov_iter.c`.

### Analysis

1.  **Writer Side**: The function `set_overhead()` is called by `ext4_update_primary_sb()` (via `ext4_update_overhead()`) during the ext4 filesystem mount process or during a resize operation. It updates the `s_overhead_clusters` field in the `struct ext4_super_block`. This superblock is located within a buffer head's data, which is part of the block device's page cache. The update is performed while holding the buffer lock (`lock_buffer(bh)`).

2.  **Reader Side**: The function `_copy_to_iter()` is part of the generic `read()` system call path for block devices (`blkdev_read_iter` -> `filemap_read`). In this report, a process (likely `udevd` or `blkid`) is reading the block device directly. This read operation copies data from the same page cache page that contains the superblock.

3.  **The Race**: The writer is updating a 4-byte field (`s_overhead_clusters`) at offset 584 (0x248) within the 1024-byte superblock. The reader is copying the entire 1024-byte superblock from the page cache to user-space. Because the generic page cache read path does not synchronize with buffer-level locks, the reader can access the page while the writer is modifying it.

### Why the Race is Benign

*   **Kernel Stability**: The race does not involve any kernel-internal pointers, linked lists, or complex state machines. It is a simple data race on the *content* of a disk block being transferred to user-space. A 4-byte aligned write (which `s_overhead_clusters` is) is atomic on x86, so the reader will see either the old value or the new value for that specific field.
*   **Data Consistency**: While the reader might see an inconsistent superblock (e.g., the new value of `s_overhead_clusters` but an old checksum, as the checksum is updated in a separate step immediately after), this is expected behavior when a user-space process reads a block device that is being concurrently modified by the kernel (e.g., during mount). User-space tools that scan block devices (like `blkid`) are generally prepared to handle transiently inconsistent metadata or checksum failures.
*   **No Memory Corruption**: There is no risk of memory corruption or a kernel crash. The memory being accessed is valid and owned by the buffer/page cache.

In summary, this is a classic example of a benign data race where the kernel is updating metadata in the page cache while a user-space process is reading the raw block device. It has no impact on the integrity or stability of the kernel itself.]

1/1 2026/01/31 21:11 action kernel-checkouter
0m Results:
map[KernelSrc:/usr/local/google/home/dvyukov/syzkaller/agent/workdir/cache/src/656999a5500966ef94900a70fd38eebe6ff83d6f]

2/1 2026/01/31 21:11 action kernel-builder
4m Results:
map[KernelObj:/usr/local/google/home/dvyukov/syzkaller/agent/workdir/cache/build/7af8afe082f2674eccaa474a56b91edae2609ab6]

3/1 2026/01/31 21:15 action codesearch-prepare
17m Results:
map[Index:codesearch-index]

4/1 2026/01/31 21:33 agent expert
1m Model:
gemini-3-flash-preview

Results:
map[Benign:true Confident:true]

Instruction:
You are an experienced Linux kernel developer tasked with determining if the given kernel
data race is benign or not. The data race report is from KCSAN tool.
It contains 2 stack traces of the memory accesses that constitute a data race.

A "benign" data race is on a simple int/bool variable or similar field,
and the accesses are not supposed to be protected by any mutual exclusion primitives.
Common examples of such "benign" data races are accesses to various flag fields,
statistics counters, and similar. A "benign" data race does not lead to memory corruption/crash
with a conservative compiler that compiles memory accesses to primitive types
effectively as atomic.

A non-benign (or "harmful") data race can lead to corruption/crash even with
a conservative compiler that compiles memory accesses to primitive types
effectively as atomic. A common example of a "harmful" data race is a race on
a complex container (list/hashmap/etc), where accesses are supposed to be protected
by a mutual exclusion primitive.

In the final reply explain why you think the given data race is benign or is harmful.

Use the provided tools to confirm any assumptions, variables/fields being accessed, etc.
In particular, don't make assumptions about the kernel source code,
use codesearch tools to read the actual source code.

Prefer calling several tools at the same time to save round-trips.


Use set-results tool to provide results of the analysis.
It must be called exactly once before the final reply.
Ignore results of this tool.

Prompt:
The data race report is:

==================================================================
BUG: KCSAN: data-race in _copy_to_iter / set_overhead

write to 0xffff888151cce648 of 4 bytes by task 7536 on cpu 0:
 set_overhead+0x28/0x40 fs/ext4/ioctl.c:2000
 ext4_update_primary_sb fs/ext4/ioctl.c:77 [inline]
 ext4_update_superblocks_fn+0x206/0xaa0 fs/ext4/ioctl.c:221
 ext4_update_overhead+0x135/0x160 fs/ext4/ioctl.c:2013
 ext4_fill_super+0x3002/0x3800 fs/ext4/super.c:5798
 get_tree_bdev_flags+0x291/0x300 fs/super.c:1691
 get_tree_bdev+0x1f/0x30 fs/super.c:1714
 ext4_get_tree+0x1c/0x30 fs/ext4/super.c:5809
 vfs_get_tree+0x57/0x1d0 fs/super.c:1751
 fc_mount fs/namespace.c:1199 [inline]
 do_new_mount_fc fs/namespace.c:3636 [inline]
 do_new_mount+0x288/0x700 fs/namespace.c:3712
 path_mount+0x4d0/0xbc0 fs/namespace.c:4022
 do_mount fs/namespace.c:4035 [inline]
 __do_sys_mount fs/namespace.c:4224 [inline]
 __se_sys_mount+0x28c/0x2e0 fs/namespace.c:4201
 __x64_sys_mount+0x67/0x80 fs/namespace.c:4201
 x64_sys_call+0x2cca/0x3000 arch/x86/include/generated/asm/syscalls_64.h:166
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff888151cce400 of 1024 bytes by task 3309 on cpu 1:
 instrument_copy_to_user include/linux/instrumented.h:113 [inline]
 copy_to_user_iter lib/iov_iter.c:24 [inline]
 iterate_ubuf include/linux/iov_iter.h:30 [inline]
 iterate_and_advance2 include/linux/iov_iter.h:302 [inline]
 iterate_and_advance include/linux/iov_iter.h:330 [inline]
 _copy_to_iter+0x130/0xea0 lib/iov_iter.c:197
 copy_page_to_iter+0x18f/0x2d0 lib/iov_iter.c:374
 copy_folio_to_iter include/linux/uio.h:204 [inline]
 filemap_read+0x407/0xa10 mm/filemap.c:2851
 blkdev_read_iter+0x22d/0x2e0 block/fops.c:856
 new_sync_read fs/read_write.c:491 [inline]
 vfs_read+0x6c0/0x7f0 fs/read_write.c:572
 ksys_read+0xdc/0x1a0 fs/read_write.c:715
 __do_sys_read fs/read_write.c:724 [inline]
 __se_sys_read fs/read_write.c:722 [inline]
 __x64_sys_read+0x40/0x50 fs/read_write.c:722
 x64_sys_call+0x2889/0x3000 arch/x86/include/generated/asm/syscalls_64.h:1
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 UID: 0 PID: 3309 Comm: udevd Tainted: G        W           syzkaller #0 PREEMPT(voluntary) 
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
==================================================================


Reply:
The data race occurs between `set_overhead()` in `fs/ext4/ioctl.c` and `_copy_to_iter()` in `lib/iov_iter.c`.

### Analysis

1.  **Writer Side**: The function `set_overhead()` is called by `ext4_update_primary_sb()` (via `ext4_update_overhead()`) during the ext4 filesystem mount process or during a resize operation. It updates the `s_overhead_clusters` field in the `struct ext4_super_block`. This superblock is located within a buffer head's data, which is part of the block device's page cache. The update is performed while holding the buffer lock (`lock_buffer(bh)`).

2.  **Reader Side**: The function `_copy_to_iter()` is part of the generic `read()` system call path for block devices (`blkdev_read_iter` -> `filemap_read`). In this report, a process (likely `udevd` or `blkid`) is reading the block device directly. This read operation copies data from the same page cache page that contains the superblock.

3.  **The Race**: The writer is updating a 4-byte field (`s_overhead_clusters`) at offset 584 (0x248) within the 1024-byte superblock. The reader is copying the entire 1024-byte superblock from the page cache to user-space. Because the generic page cache read path does not synchronize with buffer-level locks, the reader can access the page while the writer is modifying it.

### Why the Race is Benign

*   **Kernel Stability**: The race does not involve any kernel-internal pointers, linked lists, or complex state machines. It is a simple data race on the *content* of a disk block being transferred to user-space. A 4-byte aligned write (which `s_overhead_clusters` is) is atomic on x86, so the reader will see either the old value or the new value for that specific field.
*   **Data Consistency**: While the reader might see an inconsistent superblock (e.g., the new value of `s_overhead_clusters` but an old checksum, as the checksum is updated in a separate step immediately after), this is expected behavior when a user-space process reads a block device that is being concurrently modified by the kernel (e.g., during mount). User-space tools that scan block devices (like `blkid`) are generally prepared to handle transiently inconsistent metadata or checksum failures.
*   **No Memory Corruption**: There is no risk of memory corruption or a kernel crash. The memory being accessed is valid and owned by the buffer/page cache.

In summary, this is a classic example of a benign data race where the kernel is updating metadata in the page cache while a user-space process is reading the raw block device. It has no impact on the integrity or stability of the kernel itself.

5/2 2026/01/31 21:33 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
							input: 4127
							output: 85
							thoughts: 320
						

Thoughts:
**Data Race Analysis: Ext4 Superblock Update vs. Direct Block Device Read**

Okay, so the report flags a data race, specifically between the `set_overhead` function and `_copy_to_iter`. Let's break this down. It looks like `set_overhead` is writing four bytes to memory address `0xffff888151cce648`, while `_copy_to_iter` is reading 1024 bytes starting at `0xffff888151cce400`. Hold on, let's calculate the offset... `0xffff888151cce648 - 0xffff888151cce400 = 0x248 = 584` bytes. So, the write operation from `set_overhead` *is* actually within the read range of `_copy_to_iter`. This confirms the potential for a race condition.

Now, where are these functions called? `set_overhead` is called from `ext4_update_primary_sb`, which in turn is called from `ext4_update_superblocks_fn`. Right, this is definitely related to updating the ext4 superblock. It makes sense, as these write operations would involve updates to key metadata.

Meanwhile, `_copy_to_iter` is triggered during a `read` system call, specifically within `blkdev_read_iter`. This strongly suggests that someone is trying to directly read the block device itself, most likely the partition that contains the ext4 filesystem. This raises the question of *why* someone is reading the block device directly. It's highly probable that this is happening while the filesystem is mounted or undergoing an update. Perhaps someone is bypassing the VFS layer to directly inspect the on-disk structures.

To confirm the context of `set_overhead`, I should check the source, specifically `fs/ext4/ioctl.c:2000` to be exact. This will give more context on the circumstances that invoke the `set_overhead` function.



6/2 2026/01/31 21:33 tool codesearch-definition-source
0m Args:
map[ContextFile:fs/ext4/ioctl.c IncludeLines:true Name:set_overhead]

Results:
map[SourceCode:1997:	static void set_overhead(struct ext4_sb_info *sbi,
1998:				 struct ext4_super_block *es, const void *arg)
1999:	{
2000:		es->s_overhead_clusters = cpu_to_le32(*((unsigned long *) arg));
2001:	}
 SourceFile:fs/ext4/ioctl.c]

7/2 2026/01/31 21:33 tool codesearch-definition-source
0m Args:
map[ContextFile:fs/ext4/ioctl.c IncludeLines:true Name:ext4_update_primary_sb]

Results:
map[SourceCode:  57:	static
  58:	int ext4_update_primary_sb(struct super_block *sb, handle_t *handle,
  59:				   ext4_update_sb_callback func,
  60:				   const void *arg)
  61:	{
  62:		int err = 0;
  63:		struct ext4_sb_info *sbi = EXT4_SB(sb);
  64:		struct buffer_head *bh = sbi->s_sbh;
  65:		struct ext4_super_block *es = sbi->s_es;
  66:	
  67:		trace_ext4_update_sb(sb, bh->b_blocknr, 1);
  68:	
  69:		BUFFER_TRACE(bh, "get_write_access");
  70:		err = ext4_journal_get_write_access(handle, sb,
  71:						    bh,
  72:						    EXT4_JTR_NONE);
  73:		if (err)
  74:			goto out_err;
  75:	
  76:		lock_buffer(bh);
  77:		func(sbi, es, arg);
  78:		ext4_superblock_csum_set(sb);
  79:		unlock_buffer(bh);
  80:	
  81:		if (buffer_write_io_error(bh) || !buffer_uptodate(bh)) {
  82:			ext4_msg(sbi->s_sb, KERN_ERR, "previous I/O error to "
  83:				 "superblock detected");
  84:			clear_buffer_write_io_error(bh);
  85:			set_buffer_uptodate(bh);
  86:		}
  87:	
  88:		err = ext4_handle_dirty_metadata(handle, NULL, bh);
  89:		if (err)
  90:			goto out_err;
  91:		err = sync_dirty_buffer(bh);
  92:	out_err:
  93:		ext4_std_error(sb, err);
  94:		return err;
  95:	}
 SourceFile:fs/ext4/ioctl.c]

8/2 2026/01/31 21:33 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
							input: 5695
							output: 24
							thoughts: 334
						

Thoughts:
**Analyzing Superblock Updates in ext4**

Okay, let me break down what's happening here. I see that within `ext4_update_primary_sb`, the `func` `set_overhead` gets called while holding a lock on the buffer head (`bh`) using `lock_buffer(bh)`. That's standard practice, ensuring we have exclusive access to the superblock data. And the `es` variable, which is `sbi->s_es`, it's pointing right into the buffer head's data where the superblock resides. Got it.

Now, the read side of this is `blkdev_read_iter`, which, as the name suggests, reads from the block device. It appears to be reading the page that *contains* the superblock. Good. Makes sense. The superblock's location is, of course, a fixed offset on the device. For ext4, it's typically located at 1024 bytes into the partition.

I'm seeing memory addresses: the read is happening from `0xffff888151cce400`. The write is to `0xffff888151cce648`. A quick calculation shows that the difference is 584 bytes. Now if the read starts at the beginning of the page, this offset matches.

Ah, yes, `0xffff888151cce400` *should* be the beginning of the buffer in the page cache for this block device. We're talking about a cached copy.

If `udevd` is involved in reading from the block device, it makes sense. It's probably reading the initial few kilobytes to identify the partition or filesystem. That's a classic scenario, especially during device discovery.

The write is targeting `es->s_overhead_clusters`. Now, I need to check the exact offset of `s_overhead_clusters` within the `struct ext4_super_block`. Gotta confirm the placement. Let me look that up real quick...



9/2 2026/01/31 21:33 tool codesearch-struct-layout
0m Args:
map[Name:ext4_super_block]

Results:
map[Fields:[map[Name:s_inodes_count OffsetBits:0 SizeBits:32] map[Name:s_blocks_count_lo OffsetBits:32 SizeBits:32] map[Name:s_r_blocks_count_lo OffsetBits:64 SizeBits:32] map[Name:s_free_blocks_count_lo OffsetBits:96 SizeBits:32] map[Name:s_free_inodes_count OffsetBits:128 SizeBits:32] map[Name:s_first_data_block OffsetBits:160 SizeBits:32] map[Name:s_log_block_size OffsetBits:192 SizeBits:32] map[Name:s_log_cluster_size OffsetBits:224 SizeBits:32] map[Name:s_blocks_per_group OffsetBits:256 SizeBits:32] map[Name:s_clusters_per_group OffsetBits:288 SizeBits:32] map[Name:s_inodes_per_group OffsetBits:320 SizeBits:32] map[Name:s_mtime OffsetBits:352 SizeBits:32] map[Name:s_wtime OffsetBits:384 SizeBits:32] map[Name:s_mnt_count OffsetBits:416 SizeBits:16] map[Name:s_max_mnt_count OffsetBits:432 SizeBits:16] map[Name:s_magic OffsetBits:448 SizeBits:16] map[Name:s_state OffsetBits:464 SizeBits:16] map[Name:s_errors OffsetBits:480 SizeBits:16] map[Name:s_minor_rev_level OffsetBits:496 SizeBits:16] map[Name:s_lastcheck OffsetBits:512 SizeBits:32] map[Name:s_checkinterval OffsetBits:544 SizeBits:32] map[Name:s_creator_os OffsetBits:576 SizeBits:32] map[Name:s_rev_level OffsetBits:608 SizeBits:32] map[Name:s_def_resuid OffsetBits:640 SizeBits:16] map[Name:s_def_resgid OffsetBits:656 SizeBits:16] map[Name:s_first_ino OffsetBits:672 SizeBits:32] map[Name:s_inode_size OffsetBits:704 SizeBits:16] map[Name:s_block_group_nr OffsetBits:720 SizeBits:16] map[Name:s_feature_compat OffsetBits:736 SizeBits:32] map[Name:s_feature_incompat OffsetBits:768 SizeBits:32] map[Name:s_feature_ro_compat OffsetBits:800 SizeBits:32] map[Name:s_uuid OffsetBits:832 SizeBits:128] map[Name:s_volume_name OffsetBits:960 SizeBits:128] map[Name:s_last_mounted OffsetBits:1088 SizeBits:512] map[Name:s_algorithm_usage_bitmap OffsetBits:1600 SizeBits:32] map[Name:s_prealloc_blocks OffsetBits:1632 SizeBits:8] map[Name:s_prealloc_dir_blocks OffsetBits:1640 SizeBits:8] map[Name:s_reserved_gdt_blocks OffsetBits:1648 SizeBits:16] map[Name:s_journal_uuid OffsetBits:1664 SizeBits:128] map[Name:s_journal_inum OffsetBits:1792 SizeBits:32] map[Name:s_journal_dev OffsetBits:1824 SizeBits:32] map[Name:s_last_orphan OffsetBits:1856 SizeBits:32] map[Name:s_hash_seed OffsetBits:1888 SizeBits:128] map[Name:s_def_hash_version OffsetBits:2016 SizeBits:8] map[Name:s_jnl_backup_type OffsetBits:2024 SizeBits:8] map[Name:s_desc_size OffsetBits:2032 SizeBits:16] map[Name:s_default_mount_opts OffsetBits:2048 SizeBits:32] map[Name:s_first_meta_bg OffsetBits:2080 SizeBits:32] map[Name:s_mkfs_time OffsetBits:2112 SizeBits:32] map[Name:s_jnl_blocks OffsetBits:2144 SizeBits:544] map[Name:s_blocks_count_hi OffsetBits:2688 SizeBits:32] map[Name:s_r_blocks_count_hi OffsetBits:2720 SizeBits:32] map[Name:s_free_blocks_count_hi OffsetBits:2752 SizeBits:32] map[Name:s_min_extra_isize OffsetBits:2784 SizeBits:16] map[Name:s_want_extra_isize OffsetBits:2800 SizeBits:16] map[Name:s_flags OffsetBits:2816 SizeBits:32] map[Name:s_raid_stride OffsetBits:2848 SizeBits:16] map[Name:s_mmp_update_interval OffsetBits:2864 SizeBits:16] map[Name:s_mmp_block OffsetBits:2880 SizeBits:64] map[Name:s_raid_stripe_width OffsetBits:2944 SizeBits:32] map[Name:s_log_groups_per_flex OffsetBits:2976 SizeBits:8] map[Name:s_checksum_type OffsetBits:2984 SizeBits:8] map[Name:s_encryption_level OffsetBits:2992 SizeBits:8] map[Name:s_reserved_pad OffsetBits:3000 SizeBits:8] map[Name:s_kbytes_written OffsetBits:3008 SizeBits:64] map[Name:s_snapshot_inum OffsetBits:3072 SizeBits:32] 
map[Name:s_snapshot_id OffsetBits:3104 SizeBits:32] map[Name:s_snapshot_r_blocks_count OffsetBits:3136 SizeBits:64] map[Name:s_snapshot_list OffsetBits:3200 SizeBits:32] map[Name:s_error_count OffsetBits:3232 SizeBits:32] map[Name:s_first_error_time OffsetBits:3264 SizeBits:32] map[Name:s_first_error_ino OffsetBits:3296 SizeBits:32] map[Name:s_first_error_block OffsetBits:3328 SizeBits:64] map[Name:s_first_error_func OffsetBits:3392 SizeBits:256] map[Name:s_first_error_line OffsetBits:3648 SizeBits:32] map[Name:s_last_error_time OffsetBits:3680 SizeBits:32] map[Name:s_last_error_ino OffsetBits:3712 SizeBits:32] map[Name:s_last_error_line OffsetBits:3744 SizeBits:32] map[Name:s_last_error_block OffsetBits:3776 SizeBits:64] map[Name:s_last_error_func OffsetBits:3840 SizeBits:256] map[Name:s_mount_opts OffsetBits:4096 SizeBits:512] map[Name:s_usr_quota_inum OffsetBits:4608 SizeBits:32] map[Name:s_grp_quota_inum OffsetBits:4640 SizeBits:32] map[Name:s_overhead_clusters OffsetBits:4672 SizeBits:32] map[Name:s_backup_bgs OffsetBits:4704 SizeBits:64] map[Name:s_encrypt_algos OffsetBits:4768 SizeBits:32] map[Name:s_encrypt_pw_salt OffsetBits:4800 SizeBits:128] map[Name:s_lpf_ino OffsetBits:4928 SizeBits:32] map[Name:s_prj_quota_inum OffsetBits:4960 SizeBits:32] map[Name:s_checksum_seed OffsetBits:4992 SizeBits:32] map[Name:s_wtime_hi OffsetBits:5024 SizeBits:8] map[Name:s_mtime_hi OffsetBits:5032 SizeBits:8] map[Name:s_mkfs_time_hi OffsetBits:5040 SizeBits:8] map[Name:s_lastcheck_hi OffsetBits:5048 SizeBits:8] map[Name:s_first_error_time_hi OffsetBits:5056 SizeBits:8] map[Name:s_last_error_time_hi OffsetBits:5064 SizeBits:8] map[Name:s_first_error_errcode OffsetBits:5072 SizeBits:8] map[Name:s_last_error_errcode OffsetBits:5080 SizeBits:8] map[Name:s_encoding OffsetBits:5088 SizeBits:16] map[Name:s_encoding_flags OffsetBits:5104 SizeBits:16] map[Name:s_orphan_file_inum OffsetBits:5120 SizeBits:32] map[Name:s_def_resuid_hi OffsetBits:5152 SizeBits:16] map[Name:s_def_resgid_hi OffsetBits:5168 SizeBits:16] map[Name:s_reserved OffsetBits:5184 SizeBits:2976] map[Name:s_checksum OffsetBits:8160 SizeBits:32]]]

10/2 2026/01/31 21:33 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
							input: 9103
							output: 43
							thoughts: 830
						

Thoughts:
**Analyzing a Potential Data Race in ext4 Superblock Updates**

Okay, so I've just tracked down that `s_overhead_clusters` is located exactly where I expected, 584 bytes into the superblock. That's a good confirmation. Now, here's where things get interesting. `ext4_update_primary_sb` is modifying the superblock in the buffer cache, and simultaneously, `udevd` (or some other process reading the block device) is trying to access the *same* data, triggering a read of the same page from the buffer cache.

Let me think this through. The `read` operation on the block device goes through `blkdev_read_iter` and then `filemap_read`. Right. And `filemap_read` pulls data from the block device's page cache. Now, in Linux, the buffer cache *is* the page cache. So, both the superblock update and the read are hitting the *same* physical page in memory. One is writing, the other is reading, potentially at the *same* time.

Is this dangerous? Well, `ext4_update_primary_sb` is updating `s_overhead_clusters`, then recalculating the checksum with `ext4_superblock_csum_set(sb)`. The reader could see a partially updated superblock. The potential scenarios are: the old cluster count and the old checksum, the new count and the old checksum, or the new count and the new checksum. If the reader sees the second case, it has an invalid checksum. This could make something like `blkid`, which `udevd` uses, fail to identify the filesystem. Although, `blkid` isn't always that strict and might retry.

But wait. Examining the code, `ext4_update_primary_sb` does something critical:

```c
lock_buffer(bh);
func(sbi, es, arg);
ext4_superblock_csum_set(sb);
unlock_buffer(bh);
```

It locks the buffer! Does the reader check for this lock? No, `filemap_read` doesn't care about buffer locks. It just grabs the data from the page.

So, is this a kernel-level issue? No. The read operation is happening in user-space, reading the block device. The kernel is simply serving up whatever data happens to be in the page cache at that moment. User-space programs reading block devices while they're being mounted or modified are, by their nature, vulnerable to inconsistent data if they don't have proper synchronization. And they can't easily synchronize with internal kernel operations.

But, KCSAN is reporting this as a data race. Usually, these kinds of page cache data races between a writer and a reader are considered "benign" because they don't crash the kernel, but can lead to the user seeing bad data.

But wait, hold on. This `ext4_update_primary_sb` is modifying the superblock *in place* in the buffer cache. That *would* be more serious if another *kernel* process was reading it. But here, the reader is just `_copy_to_iter`, copying data to user-space.

Let me look for other superblock writers. `ext4_update_superblocks_fn` looks like it's called during a mount operation or when overhead information is being updated. Okay, so in `ext4_fill_super`:

```c
ext4_update_overhead(sb, false);
```

`ext4_update_overhead` calls `ext4_update_superblocks_fn`. Right. I should look into `ext4_update_overhead` in `fs/ext4/ioctl.c`. Got it.



11/2 2026/01/31 21:33 tool codesearch-definition-source
0m Args:
map[ContextFile:fs/ext4/ioctl.c IncludeLines:true Name:ext4_update_overhead]

Results:
map[SourceCode:2003:	int ext4_update_overhead(struct super_block *sb, bool force)
2004:	{
2005:		struct ext4_sb_info *sbi = EXT4_SB(sb);
2006:	
2007:		if (ext4_emergency_state(sb) || sb_rdonly(sb))
2008:			return 0;
2009:		if (!force &&
2010:		    (sbi->s_overhead == 0 ||
2011:		     sbi->s_overhead == le32_to_cpu(sbi->s_es->s_overhead_clusters)))
2012:			return 0;
2013:		return ext4_update_superblocks_fn(sb, set_overhead, &sbi->s_overhead);
2014:	}
 SourceFile:fs/ext4/ioctl.c]

12/2 2026/01/31 21:33 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
							input: 11027
							output: 46
							thoughts: 12
						

Thoughts:
Okay, here's my interpretation of those thoughts, summarized in the first person, assuming an expert-level understanding:

**Deep Dive into `ext4_update_superblocks_fn`**

Alright, let's dissect this. "And `ext4_update_superblocks_fn`"... that immediately signals something crucial in ext4 filesystem management. I'm thinking about the process of ensuring superblock consistency, the very heartbeat of the filesystem. This likely involves handling multiple superblocks for redundancy, a key design principle for fault tolerance in ext4. I need to consider all the angles, so I can see what is really being said about this.

I'm probably dealing with scenarios where changes have occurred, or might occur to the filesystem. When the changes or events occur, the system *must* update the superblocks in a consistent and safe manner, to avoid filesystem corruption. Is this function about handling the actual *writing* of updated superblock information? Or is it more concerned with *coordinating* the process, ensuring all replicas are updated in the right order and that the operation is atomic?

I'm thinking about the potential challenges. What about power failures during the update? How is journaling involved? Is there a particular concern about the "last write wins" scenario when a power failure occurs? What are the implications in read-only and read-write modes? Also, I must consider the different storage types that can be used. This may be especially important when the file system is hosted on an SSD.

The more important considerations that come to mind are likely around locking and concurrency. It *has* to manage concurrent access to the superblocks. How are mutexes, semaphores, or other synchronization primitives used here? Is there a need for fine-grained locking or broader, more encompassing protection? Also, what are the implications on performance? Is there a bottleneck in the update process that could degrade performance?

I must consider the various error-handling mechanisms built into the function as well. How does it deal with I/O errors? What kind of recovery procedures are employed if one of the superblock replicas fails to update? Is there any mechanism for notifying the user or the administrator about these issues? Are there different versions of `ext4_update_superblocks_fn`?
This is crucial. I want to see the details.



13/2 2026/01/31 21:33 tool codesearch-definition-source
0m Args:
map[ContextFile:fs/ext4/ioctl.c IncludeLines:true Name:ext4_update_superblocks_fn]

Results:
map[SourceCode: 177:	/*
 178:	 * Update primary and backup superblocks using the provided function
 179:	 * func and argument arg.
 180:	 *
 181:	 * Only the primary superblock and at most two backup superblock
 182:	 * modifications are journalled; the rest is modified without journal.
 183:	 * This is safe because e2fsck will re-write them if there is a problem,
 184:	 * and we're very unlikely to ever need more than two backups.
 185:	 */
 186:	static
 187:	int ext4_update_superblocks_fn(struct super_block *sb,
 188:				       ext4_update_sb_callback func,
 189:				       const void *arg)
 190:	{
 191:		handle_t *handle;
 192:		ext4_group_t ngroups;
 193:		unsigned int three = 1;
 194:		unsigned int five = 5;
 195:		unsigned int seven = 7;
 196:		int err = 0, ret, i;
 197:		ext4_group_t grp, primary_grp;
 198:		struct ext4_sb_info *sbi = EXT4_SB(sb);
 199:	
 200:		/*
 201:		 * We can't update superblocks while the online resize is running
 202:		 */
 203:		if (test_and_set_bit_lock(EXT4_FLAGS_RESIZING,
 204:					  &sbi->s_ext4_flags)) {
 205:			ext4_msg(sb, KERN_ERR, "Can't modify superblock while"
 206:				 "performing online resize");
 207:			return -EBUSY;
 208:		}
 209:	
 210:		/*
 211:		 * We're only going to update primary superblock and two
 212:		 * backup superblocks in this transaction.
 213:		 */
 214:		handle = ext4_journal_start_sb(sb, EXT4_HT_MISC, 3);
 215:		if (IS_ERR(handle)) {
 216:			err = PTR_ERR(handle);
 217:			goto out;
 218:		}
 219:	
 220:		/* Update primary superblock */
 221:		err = ext4_update_primary_sb(sb, handle, func, arg);
 222:		if (err) {
 223:			ext4_msg(sb, KERN_ERR, "Failed to update primary "
 224:				 "superblock");
 225:			goto out_journal;
 226:		}
 227:	
 228:		primary_grp = ext4_get_group_number(sb, sbi->s_sbh->b_blocknr);
 229:		ngroups = ext4_get_groups_count(sb);
 230:	
 231:		/*
 232:		 * Update backup superblocks. We have to start from group 0
 233:		 * because it might not be where the primary superblock is
 234:		 * if the fs is mounted with -o sb=<backup_sb_block>
 235:		 */
 236:		i = 0;
 237:		grp = 0;
 238:		while (grp < ngroups) {
 239:			/* Skip primary superblock */
 240:			if (grp == primary_grp)
 241:				goto next_grp;
 242:	
 243:			ret = ext4_update_backup_sb(sb, handle, grp, func, arg);
 244:			if (ret < 0) {
 245:				/* Ignore bad checksum; try to update next sb */
 246:				if (ret == -EFSBADCRC)
 247:					goto next_grp;
 248:				err = ret;
 249:				goto out_journal;
 250:			}
 251:	
 252:			i += ret;
 253:			if (handle && i > 1) {
 254:				/*
 255:				 * We're only journalling primary superblock and
 256:				 * two backup superblocks; the rest is not
 257:				 * journalled.
 258:				 */
 259:				err = ext4_journal_stop(handle);
 260:				if (err)
 261:					goto out;
 262:				handle = NULL;
 263:			}
 264:	next_grp:
 265:			grp = ext4_list_backups(sb, &three, &five, &seven);
 266:		}
 267:	
 268:	out_journal:
 269:		if (handle) {
 270:			ret = ext4_journal_stop(handle);
 271:			if (ret && !err)
 272:				err = ret;
 273:		}
 274:	out:
 275:		clear_bit_unlock(EXT4_FLAGS_RESIZING, &sbi->s_ext4_flags);
 276:		smp_mb__after_atomic();
 277:		return err ? err : 0;
 278:	}
 SourceFile:fs/ext4/ioctl.c]

14/2 2026/01/31 21:33 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
							input: 13067
							output: 46
							thoughts: 275
						

Thoughts:
**My Assessment of the Data Race**

Okay, so I'm looking at this data race, and my initial thought is that it's between `set_overhead` which is writing to `s_overhead_clusters` and `_copy_to_iter`, which is reading the page containing the superblock. The crucial detail here is that the write is protected by `lock_buffer(bh)`, while the read, done through `filemap_read`, doesn't take that lock. This is the heart of the race condition.

Now, if `udevd` is involved, chances are it's trying to figure out what filesystem is on this block device. If it hits an inconsistent superblock – maybe a bad checksum due to the race – it might misidentify it. However, this isn't a persistent problem. It’s just a transient state, so a subsequent read should likely get a valid superblock.

The next thing I'm asking myself: Am I dealing with the possibility of a kernel crash or data corruption? I think not. The write is just updating a field inside a buffer head's data structure and the read copies this data into user space. Neither action should trigger a kernel crash.

Realistically, the only "harm" here is that user-space might see a corrupted superblock for a brief period. However, considering the block device is being mounted and modified by the kernel, user-space reading without any coordination is already open to inconsistencies. It could see any number of intermediate states while the device is in transition.

Hold on a moment, `ext4_update_primary_sb` also calls `ext4_superblock_csum_set(sb)`. Let me go and have a look at what that's doing.



15/2 2026/01/31 21:33 tool codesearch-definition-source
0m Args:
map[ContextFile:fs/ext4/super.c IncludeLines:true Name:ext4_superblock_csum_set]

Results:
map[SourceCode: 317:	void ext4_superblock_csum_set(struct super_block *sb)
 318:	{
 319:		struct ext4_super_block *es = EXT4_SB(sb)->s_es;
 320:	
 321:		if (!ext4_has_feature_metadata_csum(sb))
 322:			return;
 323:	
 324:		es->s_checksum = ext4_superblock_csum(es);
 325:	}
 SourceFile:fs/ext4/super.c]

16/2 2026/01/31 21:33 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
							input: 13923
							output: 26
							thoughts: 1180
						

Thoughts:
**Analyzing a Potential Data Race in ext4 Superblock Updates**

Okay, so I'm looking at this data race report. It seems that the kernel calculates a checksum and writes it to `es->s_checksum`, and there's a reported race on `es->s_overhead_clusters`. KCSAN might also be flagging a race on the checksum itself, if it happens.

My initial thought is that this is likely a benign data race. The write is only 4 bytes, while the read is 1024. Compiler reordering or seeing a partial write (though 4-byte writes *should* be atomic on x86) wouldn't necessarily break kernel stability. This seems like a common scenario: metadata updates in the page cache being read by userspace.

But then I had a thought - could this be an issue if *the kernel itself* reads the superblock from the page cache? The kernel typically accesses the superblock via `sbi->s_es`, which points to the buffer data. If it needs to read from disk, it'll use `sb_bread` or similar which gets a `buffer_head`. If it's in the buffer cache, it gets the existing buffer.

Now, `ext4_update_primary_sb` uses `lock_buffer(bh)` to protect the update. Any other kernel code updating the superblock should also use `lock_buffer(bh)` or be part of the same transaction. And kernel code needing a consistent superblock should also *probably* use `lock_buffer`.

However, the reader here, `filemap_read`, is a generic read path for block devices and it doesn't understand superblock consistency.

So, if this is benign, I should check for similar reports or if this is a known pattern. Actually, it's pretty common for filesystem metadata updates in the page cache to race with block device reads!

Let me focus on `s_overhead_clusters`. It's updated in `ext4_fill_super`, near the end.

```c
5798:		ext4_update_overhead(sb, false);
```

If `udevd` reads a "corrupt" superblock, it probably just ignores it. It's not a kernel-crashing issue, probably.

Could this lead to a kernel problem? If the reader was another kernel thread relying on a consistent superblock… but here, the reader is `udevd` via `sys_read`.

Another thing: `ext4_update_primary_sb` calls `sync_dirty_buffer(bh)` which triggers I/O to write the superblock to disk. This race is occurring between the CPU writing to the buffer and another CPU copying from the *same* buffer to userspace.

Looking at `set_overhead`:

```c
2000:		es->s_overhead_clusters = cpu_to_le32(*((unsigned long *) arg));
```

It's a simple assignment. The reader is `_copy_to_iter`, just looping and copying.

If the reader sees an intermediate value, it's just 4 bytes, and it's aligned.  `s_overhead_clusters` is at offset 584 bytes, which is 4-byte aligned (584 / 4 = 146).

So, the reader will get either the old or new value. However, it might see the new `s_overhead_clusters` with an old `s_checksum`, or vice-versa, because they are separate writes. This is exactly what happens when reading a block device while it's being modified.

Is there any reason to think this is *not* benign? If the kernel *expects* `lock_buffer` to protect against *all* reads, then it's a bug. But `lock_buffer` *only* protects against other users of the buffer layer that *also* use `lock_buffer`. It doesn't protect against generic page cache reads.

To prevent generic reads, the page itself would need to be locked. Does `ext4_update_primary_sb` lock the page? No. It just calls `lock_buffer(bh)`. `lock_buffer` sets the `BH_Lock` bit on the buffer head; it doesn't lock the underlying page.

Wait... `ext4_update_primary_sb` is called during *mount*. If `udevd` is scanning the device at the same time, it's a race.

Ah, there's a known issue where `blkid` can see inconsistent superblocks if they're being updated. Usually, `udev` handles this by retrying or failing to find the UUID/label. This might cause some symlink delays in `/dev/disk/by-uuid/`, but it


17/2 2026/01/31 21:33 tool codesearch-find-references
0m Args:
map[Name:ext4_update_primary_sb]

Results:
map[References:[map[ReferenceKind:calls ReferencingEntityKind:function ReferencingEntityName:ext4_update_superblocks_fn SourceFile:fs/ext4/ioctl.c SourceLine:221]] TruncatedOutput:false]

18/2 2026/01/31 21:33 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
							input: 16228
							output: 42
							thoughts: 1198
						

Thoughts:
**Analyzing a Potential Data Race in ext4 Superblock Updates**

Okay, so KCSAN is flagging a data race involving updates to the ext4 superblock. Let's break this down. From what I can see, `ext4_update_superblocks_fn` is the culprit, and it's called from a few places: `ext4_update_overhead`, `ext4_clear_jbd2_journal_err`, `ext4_commit_super`, `ext4_set_resv_clusters`, `ext4_group_add`, `ext4_group_extend`, `ext4_resize_fs`, and `ext4_ioctl` (specifically for things like encryption and verity policies). So, the superblock gets touched at various points.

If KCSAN is reporting this, it's a real race condition. But is it *harmful*? From the kernel's perspective, it's just a buffer being written to. The fact that the same buffer is being read via `read()` on the block device is just how block devices function. If we wanted to avoid this, we'd need to lock the page during the update, but `ext4_update_primary_sb` is designed to run within a journal transaction, not to lock the page.

There's a helpful comment in `fs/ext4/ioctl.c` mentioning it is updating primary and backup superblocks using a provided function.

The key question is whether this race affects *internal* kernel state leading to a crash, or if it's just a race on the contents of a disk block being copied to user-space. The stack trace in this case shows a `read` syscall issued by `udevd` via `vfs_read`. It's clearly user-space reading from the block device.

Does the kernel itself use `_copy_to_iter` for anything truly critical? Normally not. Here, the stack trace clearly points to the `read` syscall.

Let's look more closely at the write side. `set_overhead` writes to `es->s_overhead_clusters`. `es` is `sbi->s_es`. And `sbi->s_es` is initialized in `ext4_fill_super` from `ext4_get_sb`. This all says that this is buffer data being written and read.

If I look at past KCSAN reports, these are often marked as benign if they're stats or non-critical fields. `s_overhead_clusters` is indeed a field in the superblock, part of the filesystem metadata used for statfs. If `udevd` gets the wrong value for this, it doesn't really care. `udevd` (via `blkid`) cares about the UUID, label, and magic number. The magic number is at offset 0x38 (56 bytes), while the race is at 584 bytes. If the race was on the magic number, `blkid` might fail, but this isn't that.

Should we consider using `READ_ONCE` or `WRITE_ONCE` here? Probably not. The whole superblock is a large structure, and atomicity isn't feasible for the entire thing. The consistency is supposed to be handled by `lock_buffer` and the checksum. If user-space reads it during an update, then it is user-space's problem, likely a checksum failure. However, KCSAN still flags this.

Is there any other potential issue? Could the compiler be doing something odd with the write to `es->s_overhead_clusters`? It's a 4-byte write. `es->s_overhead_clusters = cpu_to_le32(*((unsigned long *) arg));`. `arg` is `&sbi->s_overhead`. `sbi->s_overhead` is `unsigned long`. `cpu_to_le32` converts it to `__le32`. The `*((unsigned long *) arg)` might be a problem if `arg` is not an `unsigned long *`. In `ext4_update_overhead`: `return ext4_update_superblocks_fn(sb, set_overhead, &sbi->s_overhead);`. `sbi->s_overhead` is `unsigned long`, so that's fine.

Wait, is there a race on `sbi->s_overhead` itself? No, `set_overhead` only reads it through `arg` and writes to `es->s_overhead_clusters`. Still, let's check `ext4_statfs` to see whether it reads or updates `sbi->s_overhead`.



19/2 2026/01/31 21:34 tool codesearch-definition-source
0m Args:
map[ContextFile:fs/ext4/super.c IncludeLines:true Name:ext4_statfs]

Results:
map[SourceCode:6903:	static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf)
6904:	{
6905:		struct super_block *sb = dentry->d_sb;
6906:		struct ext4_sb_info *sbi = EXT4_SB(sb);
6907:		struct ext4_super_block *es = sbi->s_es;
6908:		ext4_fsblk_t overhead = 0, resv_blocks;
6909:		s64 bfree;
6910:		resv_blocks = EXT4_C2B(sbi, atomic64_read(&sbi->s_resv_clusters));
6911:	
6912:		if (!test_opt(sb, MINIX_DF))
6913:			overhead = sbi->s_overhead;
6914:	
6915:		buf->f_type = EXT4_SUPER_MAGIC;
6916:		buf->f_bsize = sb->s_blocksize;
6917:		buf->f_blocks = ext4_blocks_count(es) - EXT4_C2B(sbi, overhead);
6918:		bfree = percpu_counter_sum_positive(&sbi->s_freeclusters_counter) -
6919:			percpu_counter_sum_positive(&sbi->s_dirtyclusters_counter);
6920:		/* prevent underflow in case that few free space is available */
6921:		buf->f_bfree = EXT4_C2B(sbi, max_t(s64, bfree, 0));
6922:		buf->f_bavail = buf->f_bfree -
6923:				(ext4_r_blocks_count(es) + resv_blocks);
6924:		if (buf->f_bfree < (ext4_r_blocks_count(es) + resv_blocks))
6925:			buf->f_bavail = 0;
6926:		buf->f_files = le32_to_cpu(es->s_inodes_count);
6927:		buf->f_ffree = percpu_counter_sum_positive(&sbi->s_freeinodes_counter);
6928:		buf->f_namelen = EXT4_NAME_LEN;
6929:		buf->f_fsid = uuid_to_fsid(es->s_uuid);
6930:	
6931:	#ifdef CONFIG_QUOTA
6932:		if (ext4_test_inode_flag(dentry->d_inode, EXT4_INODE_PROJINHERIT) &&
6933:		    sb_has_quota_limits_enabled(sb, PRJQUOTA))
6934:			ext4_statfs_project(sb, EXT4_I(dentry->d_inode)->i_projid, buf);
6935:	#endif
6936:		return 0;
6937:	}
 SourceFile:fs/ext4/super.c]

20/2 2026/01/31 21:34 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
							input: 19284
							output: 20
							thoughts: 51
						

Thoughts:
Okay, here's my take on those thoughts, tailored for an expert audience:

**Memory Jog and Quick Verification**

Hmm, hold on a second... I was pretty sure `ext4_statfs` triggered an overhead update, but something feels off. I could have sworn I'd seen that interaction. Nope, it looks like `ext4_statfs` *doesn't* directly call `ext4_update_overhead`. Either I'm misremembering the exact code path, or that overhead calculation happens in a different function that `ext4_statfs` may call at some point, or it's called elsewhere entirely. Just to be certain, let's fire up a `grep` and see where that `ext4_update_overhead` function is actually being used within the codebase. It's time to confirm the context and clarify this in my mind.



21/2 2026/01/31 21:34 tool grepper
0m Args:
map[Expression:ext4_update_overhead]

Results:
map[Output:fs/ext4/ext4.h=3134=extern void ext4_reset_inode_seed(struct inode *inode);
fs/ext4/ext4.h:3135:int ext4_update_overhead(struct super_block *sb, bool force);
fs/ext4/ext4.h-3136-int ext4_force_shutdown(struct super_block *sb, u32 flags);
--
fs/ext4/ioctl.c=1997=static void set_overhead(struct ext4_sb_info *sbi,
--
fs/ext4/ioctl.c-2002-
fs/ext4/ioctl.c:2003:int ext4_update_overhead(struct super_block *sb, bool force)
fs/ext4/ioctl.c-2004-{
--
fs/ext4/resize.c=98=int ext4_resize_end(struct super_block *sb, bool update_backups)
--
fs/ext4/resize.c-102-	if (update_backups)
fs/ext4/resize.c:103:		return ext4_update_overhead(sb, true);
fs/ext4/resize.c-104-	return 0;
--
fs/ext4/super.c=5757=static int ext4_fill_super(struct super_block *sb, struct fs_context *fc)
--
fs/ext4/super.c-5797-	/* Update the s_overhead_clusters if necessary */
fs/ext4/super.c:5798:	ext4_update_overhead(sb, false);
fs/ext4/super.c-5799-	return 0;
]

22/2 2026/01/31 21:34 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
							input: 19976
							output: 44
							thoughts: 545
						

Thoughts:
**Analyzing a Potential Data Race in ext4 Filesystem Mounting**

Okay, so I'm looking into this issue and it seems to be centered around a potential race condition that arises during the ext4 filesystem mounting process, specifically within the `ext4_fill_super` and `ext4_resize_end` functions. It looks like `udevd` is the culprit, trying to read the block device while the filesystem is being mounted.

My initial thought is that this is likely benign from the kernel's side of things. We're talking about updates happening to the superblock within the page cache, and user-space reading that same page cache. No memory corruption, no crashes – just the possibility of user-space seeing an inconsistent superblock for a brief period. I mean, we are talking about a *transient* state.

But now I'm starting to think about other readers. Could there be another kernel thread involved? Ah, yes, what about `sbi->s_es->s_overhead_clusters`? The `ext4_update_overhead` function itself reads it. Hmm, but that read happens in the same thread that then decides whether to update the field, so it's not a concurrent access.

But hang on. `ext4_update_overhead` is called from `ext4_fill_super`, during the mount process. The filesystem isn't fully mounted yet, but the block device is open. If multiple threads were calling `ext4_update_overhead` concurrently, they should be synchronized by the `EXT4_FLAGS_RESIZING` bit in `ext4_update_superblocks_fn`. So, the writers *are* synchronized. The reader in the report is `_copy_to_iter` which is from a `read` syscall.

This is a pretty common pattern. Page cache data gets updated, and user-space might read it through a `read()` on the block device. We rarely use `READ_ONCE` or `WRITE_ONCE` on all buffer data.

So, is there any reason to think this is *not* benign? If this were a data race on an internal kernel structure like `struct ext4_sb_info`, I'd be more concerned. But `struct ext4_super_block`, pointed to by `sbi->s_es`, is actually buffer data. Now, let me just quickly check the definition of `struct ext4_sb_info` to refresh my memory and verify the role of `s_es`. Okay, I think I'm comfortable that this is not actually a race that needs fixing.



23/2 2026/01/31 21:34 tool codesearch-definition-source
0m Args:
map[ContextFile:fs/ext4/ext4.h IncludeLines:true Name:ext4_sb_info]

Results:
map[SourceCode:1535:	/*
1536:	 * fourth extended-fs super-block data in memory
1537:	 */
1538:	struct ext4_sb_info {
1539:		unsigned long s_desc_size;	/* Size of a group descriptor in bytes */
1540:		unsigned long s_inodes_per_block;/* Number of inodes per block */
1541:		unsigned long s_blocks_per_group;/* Number of blocks in a group */
1542:		unsigned long s_clusters_per_group; /* Number of clusters in a group */
1543:		unsigned long s_inodes_per_group;/* Number of inodes in a group */
1544:		unsigned long s_itb_per_group;	/* Number of inode table blocks per group */
1545:		unsigned long s_gdb_count;	/* Number of group descriptor blocks */
1546:		unsigned long s_desc_per_block;	/* Number of group descriptors per block */
1547:		ext4_group_t s_groups_count;	/* Number of groups in the fs */
1548:		ext4_group_t s_blockfile_groups;/* Groups acceptable for non-extent files */
1549:		unsigned long s_overhead;  /* # of fs overhead clusters */
1550:		unsigned int s_cluster_ratio;	/* Number of blocks per cluster */
1551:		unsigned int s_cluster_bits;	/* log2 of s_cluster_ratio */
1552:		loff_t s_bitmap_maxbytes;	/* max bytes for bitmap files */
1553:		struct buffer_head * s_sbh;	/* Buffer containing the super block */
1554:		struct ext4_super_block *s_es;	/* Pointer to the super block in the buffer */
1555:		/* Array of bh's for the block group descriptors */
1556:		struct buffer_head * __rcu *s_group_desc;
1557:		unsigned int s_mount_opt;
1558:		unsigned int s_mount_opt2;
1559:		unsigned long s_mount_flags;
1560:		unsigned int s_def_mount_opt;
1561:		unsigned int s_def_mount_opt2;
1562:		ext4_fsblk_t s_sb_block;
1563:		atomic64_t s_resv_clusters;
1564:		kuid_t s_resuid;
1565:		kgid_t s_resgid;
1566:		unsigned short s_mount_state;
1567:		unsigned short s_pad;
1568:		int s_addr_per_block_bits;
1569:		int s_desc_per_block_bits;
1570:		int s_inode_size;
1571:		int s_first_ino;
1572:		unsigned int s_inode_readahead_blks;
1573:		unsigned int s_inode_goal;
1574:		u32 s_hash_seed[4];
1575:		int s_def_hash_version;
1576:		int s_hash_unsigned;	/* 3 if hash should be unsigned, 0 if not */
1577:		struct percpu_counter s_freeclusters_counter;
1578:		struct percpu_counter s_freeinodes_counter;
1579:		struct percpu_counter s_dirs_counter;
1580:		struct percpu_counter s_dirtyclusters_counter;
1581:		struct percpu_counter s_sra_exceeded_retry_limit;
1582:		struct blockgroup_lock *s_blockgroup_lock;
1583:		struct proc_dir_entry *s_proc;
1584:		struct kobject s_kobj;
1585:		struct completion s_kobj_unregister;
1586:		struct super_block *s_sb;
1587:		struct buffer_head *s_mmp_bh;
1588:	
1589:		/* Journaling */
1590:		struct journal_s *s_journal;
1591:		unsigned long s_ext4_flags;		/* Ext4 superblock flags */
1592:		struct mutex s_orphan_lock;	/* Protects on disk list changes */
1593:		struct list_head s_orphan;	/* List of orphaned inodes in on disk
1594:						   list */
1595:		struct ext4_orphan_info s_orphan_info;
1596:		unsigned long s_commit_interval;
1597:		u32 s_max_batch_time;
1598:		u32 s_min_batch_time;
1599:		struct file *s_journal_bdev_file;
1600:	#ifdef CONFIG_QUOTA
1601:		/* Names of quota files with journalled quota */
1602:		char __rcu *s_qf_names[EXT4_MAXQUOTAS];
1603:		int s_jquota_fmt;			/* Format of quota to use */
1604:	#endif
1605:		unsigned int s_want_extra_isize; /* New inodes should reserve # bytes */
1606:		struct ext4_system_blocks __rcu *s_system_blks;
1607:	
1608:	#ifdef EXTENTS_STATS
1609:		/* ext4 extents stats */
1610:		unsigned long s_ext_min;
1611:		unsigned long s_ext_max;
1612:		unsigned long s_depth_max;
1613:		spinlock_t s_ext_stats_lock;
1614:		unsigned long s_ext_blocks;
1615:		unsigned long s_ext_extents;
1616:	#endif
1617:	
1618:		/* for buddy allocator */
1619:		struct ext4_group_info ** __rcu *s_group_info;
1620:		struct inode *s_buddy_cache;
1621:		spinlock_t s_md_lock;
1622:		unsigned short *s_mb_offsets;
1623:		unsigned int *s_mb_maxs;
1624:		unsigned int s_group_info_size;
1625:		atomic_t s_mb_free_pending;
1626:		struct list_head s_freed_data_list[2];	/* List of blocks to be freed
1627:							   after commit completed */
1628:		struct list_head s_discard_list;
1629:		struct work_struct s_discard_work;
1630:		atomic_t s_retry_alloc_pending;
1631:		struct xarray *s_mb_avg_fragment_size;
1632:		struct xarray *s_mb_largest_free_orders;
1633:	
1634:		/* tunables */
1635:		unsigned long s_stripe;
1636:		unsigned int s_mb_max_linear_groups;
1637:		unsigned int s_mb_stream_request;
1638:		unsigned int s_mb_max_to_scan;
1639:		unsigned int s_mb_min_to_scan;
1640:		unsigned int s_mb_stats;
1641:		unsigned int s_mb_order2_reqs;
1642:		unsigned int s_mb_group_prealloc;
1643:		unsigned int s_max_dir_size_kb;
1644:		unsigned int s_mb_prefetch;
1645:		unsigned int s_mb_prefetch_limit;
1646:		unsigned int s_mb_best_avail_max_trim_order;
1647:		unsigned int s_sb_update_sec;
1648:		unsigned int s_sb_update_kb;
1649:	
1650:		/* where last allocation was done - for stream allocation */
1651:		ext4_group_t *s_mb_last_groups;
1652:		unsigned int s_mb_nr_global_goals;
1653:	
1654:		/* stats for buddy allocator */
1655:		atomic_t s_bal_reqs;	/* number of reqs with len > 1 */
1656:		atomic_t s_bal_success;	/* we found long enough chunks */
1657:		atomic_t s_bal_allocated;	/* in blocks */
1658:		atomic_t s_bal_ex_scanned;	/* total extents scanned */
1659:		atomic_t s_bal_cX_ex_scanned[EXT4_MB_NUM_CRS];	/* total extents scanned */
1660:		atomic_t s_bal_groups_scanned;	/* number of groups scanned */
1661:		atomic_t s_bal_goals;	/* goal hits */
1662:		atomic_t s_bal_stream_goals;	/* stream allocation global goal hits */
1663:		atomic_t s_bal_len_goals;	/* len goal hits */
1664:		atomic_t s_bal_breaks;	/* too long searches */
1665:		atomic_t s_bal_2orders;	/* 2^order hits */
1666:		atomic64_t s_bal_cX_groups_considered[EXT4_MB_NUM_CRS];
1667:		atomic64_t s_bal_cX_hits[EXT4_MB_NUM_CRS];
1668:		atomic64_t s_bal_cX_failed[EXT4_MB_NUM_CRS];		/* cX loop didn't find blocks */
1669:		atomic_t s_mb_buddies_generated;	/* number of buddies generated */
1670:		atomic64_t s_mb_generation_time;
1671:		atomic_t s_mb_lost_chunks;
1672:		atomic_t s_mb_preallocated;
1673:		atomic_t s_mb_discarded;
1674:		atomic_t s_lock_busy;
1675:	
1676:		/* locality groups */
1677:		struct ext4_locality_group __percpu *s_locality_groups;
1678:	
1679:		/* for write statistics */
1680:		unsigned long s_sectors_written_start;
1681:		u64 s_kbytes_written;
1682:	
1683:		/* the size of zero-out chunk */
1684:		unsigned int s_extent_max_zeroout_kb;
1685:	
1686:		unsigned int s_log_groups_per_flex;
1687:		struct flex_groups * __rcu *s_flex_groups;
1688:		ext4_group_t s_flex_groups_allocated;
1689:	
1690:		/* workqueue for reserved extent conversions (buffered io) */
1691:		struct workqueue_struct *rsv_conversion_wq;
1692:	
1693:		/* timer for periodic error stats printing */
1694:		struct timer_list s_err_report;
1695:	
1696:		/* Lazy inode table initialization info */
1697:		struct ext4_li_request *s_li_request;
1698:		/* Wait multiplier for lazy initialization thread */
1699:		unsigned int s_li_wait_mult;
1700:	
1701:		/* Kernel thread for multiple mount protection */
1702:		struct task_struct *s_mmp_tsk;
1703:	
1704:		/* record the last minlen when FITRIM is called. */
1705:		unsigned long s_last_trim_minblks;
1706:	
1707:		/* minimum folio order of a page cache allocation */
1708:		u16 s_min_folio_order;
1709:		/* supported maximum folio order, 0 means not supported */
1710:		u16 s_max_folio_order;
1711:	
1712:		/* Precomputed FS UUID checksum for seeding other checksums */
1713:		__u32 s_csum_seed;
1714:	
1715:		/* Reclaim extents from extent status tree */
1716:		struct shrinker *s_es_shrinker;
1717:		struct list_head s_es_list;	/* List of inodes with reclaimable extents */
1718:		long s_es_nr_inode;
1719:		struct ext4_es_stats s_es_stats;
1720:		struct mb_cache *s_ea_block_cache;
1721:		struct mb_cache *s_ea_inode_cache;
1722:		spinlock_t s_es_lock ____cacheline_aligned_in_smp;
1723:	
1724:		/* Journal triggers for checksum computation */
1725:		struct ext4_journal_trigger s_journal_triggers[EXT4_JOURNAL_TRIGGER_COUNT];
1726:	
1727:		/* Ratelimit ext4 messages. */
1728:		struct ratelimit_state s_err_ratelimit_state;
1729:		struct ratelimit_state s_warning_ratelimit_state;
1730:		struct ratelimit_state s_msg_ratelimit_state;
1731:		atomic_t s_warning_count;
1732:		atomic_t s_msg_count;
1733:	
1734:		/* Encryption policy for '-o test_dummy_encryption' */
1735:		struct fscrypt_dummy_policy s_dummy_enc_policy;
1736:	
1737:		/*
1738:		 * Barrier between writepages ops and changing any inode's JOURNAL_DATA
1739:		 * or EXTENTS flag or between writepages ops and changing DELALLOC or
1740:		 * DIOREAD_NOLOCK mount options on remount.
1741:		 */
1742:		struct percpu_rw_semaphore s_writepages_rwsem;
1743:		struct dax_device *s_daxdev;
1744:		u64 s_dax_part_off;
1745:	#ifdef CONFIG_EXT4_DEBUG
1746:		unsigned long s_simulate_fail;
1747:	#endif
1748:		/* Record the errseq of the backing block device */
1749:		errseq_t s_bdev_wb_err;
1750:		spinlock_t s_bdev_wb_lock;
1751:	
1752:		/* Information about errors that happened during this mount */
1753:		spinlock_t s_error_lock;
1754:		int s_add_error_count;
1755:		int s_first_error_code;
1756:		__u32 s_first_error_line;
1757:		__u32 s_first_error_ino;
1758:		__u64 s_first_error_block;
1759:		const char *s_first_error_func;
1760:		time64_t s_first_error_time;
1761:		int s_last_error_code;
1762:		__u32 s_last_error_line;
1763:		__u32 s_last_error_ino;
1764:		__u64 s_last_error_block;
1765:		const char *s_last_error_func;
1766:		time64_t s_last_error_time;
1767:		/*
1768:		 * If we are in a context where we cannot update the on-disk
1769:		 * superblock, we queue the work here.  This is used to update
1770:		 * the error information in the superblock, and for periodic
1771:		 * updates of the superblock called from the commit callback
1772:		 * function.
1773:		 */
1774:		struct work_struct s_sb_upd_work;
1775:	
1776:		/* Atomic write unit values in bytes */
1777:		unsigned int s_awu_min;
1778:		unsigned int s_awu_max;
1779:	
1780:		/* Ext4 fast commit sub transaction ID */
1781:		atomic_t s_fc_subtid;
1782:	
1783:		/*
1784:		 * After commit starts, the main queue gets locked, and the further
1785:		 * updates get added in the staging queue.
1786:		 */
1787:	#define FC_Q_MAIN	0
1788:	#define FC_Q_STAGING	1
1789:		struct list_head s_fc_q[2];	/* Inodes staged for fast commit
1790:						 * that have data changes in them.
1791:						 */
1792:		struct list_head s_fc_dentry_q[2];	/* directory entry updates */
1793:		unsigned int s_fc_bytes;
1794:		/*
1795:		 * Main fast commit lock. This lock protects accesses to the
1796:		 * following fields:
1797:		 * ei->i_fc_list, s_fc_dentry_q, s_fc_q, s_fc_bytes, s_fc_bh.
1798:		 */
1799:		struct mutex s_fc_lock;
1800:		struct buffer_head *s_fc_bh;
1801:		struct ext4_fc_stats s_fc_stats;
1802:		tid_t s_fc_ineligible_tid;
1803:	#ifdef CONFIG_EXT4_DEBUG
1804:		int s_fc_debug_max_replay;
1805:	#endif
1806:		struct ext4_fc_replay_state s_fc_replay_state;
1807:	};
 SourceFile:fs/ext4/ext4.h]

24/2 2026/01/31 21:34 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
  input: 25920
  output: 43
  thoughts: 1307

Thoughts:
**Benign Data Race in Superblock Update**

Okay, let me think this through. `s_es` in the code is clearly a pointer to the superblock within the buffer, and `s_sbh` is the buffer head itself. So the write in question *is* happening directly to the buffer data. The read is coming from `filemap_read` – that means it’s a page cache read from the block device.

My initial thought was, could `_copy_to_iter` be pulling from someplace else? No, the address `0xffff888151cce400` is in the direct map, right where buffer heads and their data typically reside. And the 1024-byte read paired with a 4-byte write landing 584 bytes into it – that's the tell. It lines up perfectly with the superblock itself.

So, it's a data race. A writer is updating the superblock in the buffer cache, while a reader is reading from the block device directly through the page cache. Since buffer cache and page cache are integrated, they are looking at the *same* memory. The writer uses `lock_buffer(bh)` to synchronize with other buffer-layer users. However, `filemap_read` *doesn't* use buffer locks. This can lead to the reader seeing a partially updated superblock – maybe the new data, but an old checksum, that sort of thing. But that's… expected. That's the trade-off of allowing concurrent modification of a block device by a mounted filesystem.
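To make the writer side concrete, here is a minimal sketch of the pattern described above, loosely modeled on `ext4_update_primary_sb()`. It assumes the usual fs/ext4 internal headers, and the helper name `sketch_update_sb_field` is hypothetical:

```c
/*
 * Hedged sketch, not the literal ext4 code: update one superblock field
 * under the buffer lock, then refresh the checksum in a separate step.
 * This serializes with other buffer-layer users, but not with page-cache
 * reads of the underlying block device (the reader in this report).
 */
static void sketch_update_sb_field(struct super_block *sb,
				   void (*func)(struct ext4_super_block *es,
						const void *arg),
				   const void *arg)
{
	struct ext4_sb_info *sbi = EXT4_SB(sb);
	struct buffer_head *bh = sbi->s_sbh;		/* buffer holding the sb */
	struct ext4_super_block *es = sbi->s_es;	/* points into bh->b_data */

	lock_buffer(bh);
	func(es, arg);			/* e.g. set_overhead(): the 4-byte write */
	ext4_superblock_csum_set(sb);	/* checksum updated after the field */
	unlock_buffer(bh);
	mark_buffer_dirty(bh);
}
```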

Crucially, this data race is on the *content* being transferred to user-space, not on any kernel control structures or internal state whose corruption could lead to a crash. It's essentially an inconsistency in the view a user-space process has of the block device. So, it *is* benign.

Let me just quickly check where `set_overhead` is called in other contexts. It is called from `ext4_fill_super` and also `ext4_resize_end`, so, resizing the filesystem. It's good that it's *not* called from something like `ext4_statfs` because that'd be far more frequent. But `ext4_update_overhead` *is* called during resize – and that makes sense; resizing is when the superblock *would* be updated. The key here, though, is that the update is protected by `lock_buffer(bh)` and the `EXT4_FLAGS_RESIZING` flag. So, the writer is safe, using the expected locking.

The reader is *always* a user-space process reading the block device, and that's the heart of the issue: the user-space process doesn't synchronize with the kernel's buffer-layer locking.

Could another mount be reading the same superblock at the same time? Let me think… `sb_bread` would get the same `bh`, and it calls `wait_on_buffer(bh)` if locked. So, *other* kernel users of the buffer layer *are* synchronized. The only ones *not* synchronized are those like `filemap_read` that use the page cache directly. That's a known "feature" when reading from block devices.
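For contrast, a rough sketch of a kernel-side reader that goes through the buffer layer (the helper name is hypothetical; `sb_bread()`/`lock_buffer()` are the standard buffer-layer primitives):

```c
/*
 * Hedged sketch: a kernel-side reader of the same block uses the buffer
 * layer and therefore serializes with the lock_buffer()-protected writer,
 * unlike the page-cache read path flagged in this report.
 */
static int sketch_read_sb_via_buffer_layer(struct super_block *sb,
					   sector_t sb_block, char *out)
{
	struct buffer_head *bh = sb_bread(sb, sb_block);

	if (!bh)
		return -EIO;

	lock_buffer(bh);		/* blocks while set_overhead() runs */
	memcpy(out, bh->b_data, bh->b_size);
	unlock_buffer(bh);
	brelse(bh);
	return 0;
}
```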

So, I'm confident. This is benign.

Okay, let’s go deeper. The KCSAN report says: write 4 bytes to `0xffff888151cce648`, read 1024 bytes from `0xffff888151cce400`. The difference between these addresses is exactly 584 bytes – which is the offset of `s_overhead_clusters`. The read is 1024 bytes, and the superblock is 1024 bytes.  All checks out.
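Spelling out the arithmetic with the addresses from the report: 0xffff888151cce648 - 0xffff888151cce400 = 0x248 = 584, and 584 + 4 <= 1024, so the 4-byte write falls entirely inside the 1024-byte region being copied to user-space.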

Is there any risk of tearing that would be harmful? The write is 4 bytes and naturally aligned, so in practice it won't tear on x86; the reader will see either the old value or the new one. The only inconsistency possible is between different fields – say, `s_overhead_clusters` versus the checksum – and that is not a kernel bug.

I'm marking this benign; I'm fairly certain.

Wait... I just need to verify that we don't have a missing `WRITE_ONCE` or `READ_ONCE` situation here. If the kernel was reading `s_overhead_clusters` from `sbi->s_es` without locks, it would need `READ_ONCE`. But `ext4_statfs` reads it from `sbi->s_overhead`, which is a member of `struct ext4_sb_info`. And `sbi->s_overhead` is updated *before* calling `ext4_update_superblocks_fn` in `


25/2 2026/01/31 21:34 tool codesearch-definition-source
0m Args:
map[ContextFile:fs/ext4/super.c IncludeLines:true Name:ext4_calculate_overhead]

Results:
map[SourceCode:4183:	/*
4184:	 * Compute the overhead and stash it in sbi->s_overhead
4185:	 */
4186:	int ext4_calculate_overhead(struct super_block *sb)
4187:	{
4188:		struct ext4_sb_info *sbi = EXT4_SB(sb);
4189:		struct ext4_super_block *es = sbi->s_es;
4190:		struct inode *j_inode;
4191:		unsigned int j_blocks, j_inum = le32_to_cpu(es->s_journal_inum);
4192:		ext4_group_t i, ngroups = ext4_get_groups_count(sb);
4193:		ext4_fsblk_t overhead = 0;
4194:		char *buf = kvmalloc(sb->s_blocksize, GFP_NOFS | __GFP_ZERO);
4195:	
4196:		if (!buf)
4197:			return -ENOMEM;
4198:	
4199:		/*
4200:		 * Compute the overhead (FS structures).  This is constant
4201:		 * for a given filesystem unless the number of block groups
4202:		 * changes so we cache the previous value until it does.
4203:		 */
4204:	
4205:		/*
4206:		 * All of the blocks before first_data_block are overhead
4207:		 */
4208:		overhead = EXT4_B2C(sbi, le32_to_cpu(es->s_first_data_block));
4209:	
4210:		/*
4211:		 * Add the overhead found in each block group
4212:		 */
4213:		for (i = 0; i < ngroups; i++) {
4214:			int blks;
4215:	
4216:			blks = count_overhead(sb, i, buf);
4217:			overhead += blks;
4218:			if (blks)
4219:				memset(buf, 0, sb->s_blocksize);
4220:			cond_resched();
4221:		}
4222:	
4223:		/*
4224:		 * Add the internal journal blocks whether the journal has been
4225:		 * loaded or not
4226:		 */
4227:		if (sbi->s_journal && !sbi->s_journal_bdev_file)
4228:			overhead += EXT4_NUM_B2C(sbi, sbi->s_journal->j_total_len);
4229:		else if (ext4_has_feature_journal(sb) && !sbi->s_journal && j_inum) {
4230:			/* j_inum for internal journal is non-zero */
4231:			j_inode = ext4_get_journal_inode(sb, j_inum);
4232:			if (!IS_ERR(j_inode)) {
4233:				j_blocks = j_inode->i_size >> sb->s_blocksize_bits;
4234:				overhead += EXT4_NUM_B2C(sbi, j_blocks);
4235:				iput(j_inode);
4236:			} else {
4237:				ext4_msg(sb, KERN_ERR, "can't get journal size");
4238:			}
4239:		}
4240:		sbi->s_overhead = overhead;
4241:		smp_wmb();
4242:		kvfree(buf);
4243:		return 0;
4244:	}
 SourceFile:fs/ext4/super.c]

26/2 2026/01/31 21:34 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
  input: 29315
  output: 19
  thoughts: 782

Thoughts:
**Benign Data Race in ext4 Superblock Updates**

Okay, let's break this down. Line 4240 is where `sbi->s_overhead` (the in-memory copy) gets set. Looking at the code around line 5794, `ext4_calculate_overhead` is called before `ext4_update_overhead`. So, `sbi->s_overhead` is calculated first, then the superblock in the buffer cache gets updated from it.
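A hedged sketch of that ordering (the two ext4 calls are real, but the surrounding `ext4_fill_super()` control flow is simplified, the wrapper name is hypothetical, and the second argument to `ext4_update_overhead()` is assumed from recent kernels):

```c
/*
 * Hedged sketch of the mount-time ordering, not the literal ext4_fill_super()
 * code: the overhead is first computed into the in-memory sbi->s_overhead,
 * and only afterwards pushed into the on-disk superblock buffer -- the write
 * KCSAN flagged.
 */
static int sketch_mount_time_overhead(struct super_block *sb)
{
	int err;

	err = ext4_calculate_overhead(sb);	/* fills sbi->s_overhead */
	if (err)
		return err;

	/*
	 * Copies sbi->s_overhead into es->s_overhead_clusters under
	 * lock_buffer(), via ext4_update_superblocks_fn()/set_overhead().
	 */
	ext4_update_overhead(sb, false);	/* 'false' (no force) is an assumption */
	return 0;
}
```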

The race reported focuses on the superblock within the buffer cache. My initial thought was whether `ext4_statfs` could be involved in a race with `ext4_calculate_overhead` on `sbi->s_overhead`. But wait, `ext4_statfs` runs *after* the filesystem is mounted, and `ext4_calculate_overhead` runs *during* the mount process. So, that's not it. This is specifically on the buffer data.

Based on the information, this looks benign.

Let's dig a little deeper. Examining the `set_overhead` function, it uses `cpu_to_le32(*((unsigned long *) arg))`. `arg` points to `sbi->s_overhead`, which is an `unsigned long`, so on 64-bit systems the dereference reads 8 bytes. `cpu_to_le32` then takes a `u32`, so the value is truncated to 32 bits – which is intentional, because `s_overhead_clusters` is a 32-bit on-disk field. The race, then, is a 4-byte write against a 1024-byte read.

Hang on, let me re-examine the cast. If `arg` were already an `unsigned long *`, the cast `(unsigned long *) arg` would be redundant. But looking at the definition, `ext4_update_sb_callback` passes `arg` as `const void *`, so the cast is needed before dereferencing; either way it's harmless.
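For reference, the callback is essentially a one-liner, reconstructed here from the expression quoted above (the exact source in fs/ext4/ioctl.c may differ slightly):

```c
/* Reconstructed from the quoted expression; matches the
 * ext4_update_sb_callback prototype (es, const void *arg). */
static void set_overhead(struct ext4_super_block *es, const void *arg)
{
	/*
	 * arg points at sbi->s_overhead (an unsigned long); the value is
	 * deliberately narrowed to the 32-bit on-disk field.
	 */
	es->s_overhead_clusters = cpu_to_le32(*((unsigned long *) arg));
}
```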

So, the race is *definitely* benign.

The bug report says the write is to `s_overhead_clusters`, a superblock field. The read comes from a `read` syscall, directly from the block device. The data in question is the superblock itself. The race is on the data being read from the block device into userspace while it's being modified. This is expected behavior when reading a block device that's also being written to. It has no implications for the kernel's stability.

Verdict: This data race is benign.



27/2 2026/01/31 21:34 tool set-results
0m Args:
map[Benign:true Confident:true]

Results:
map[Benign:true Confident:true]

28/2 2026/01/31 21:34 llm expert
0m Model:
gemini-3-flash-preview

Tokens:
  input: 30710
  output: 634
  thoughts: 522

Thoughts:
**Analyzing the Data Race in ext4 and Block Device Reads**

Okay, let's break this down. My concern is the data race between `set_overhead()` in the ext4 filesystem code and `_copy_to_iter()` in the I/O iteration library. My immediate goal is to determine if this race is a genuine threat or something we can safely ignore.

First, I see that `set_overhead()` is called when the ext4 filesystem is mounted or resized. It directly modifies the `s_overhead_clusters` field within the superblock, which lives in a buffer head's data. Importantly, the code locks the buffer before making the change, so that's good. The superblock, as I understand it, sits in the page cache of the block device itself.

Now, on the other side, `_copy_to_iter()` is triggered during a `read()` system call, the kind that `blkid` or `udevd` might issue on the block device. It moves data from the block device's page cache to user-space. The path goes through `blkdev_read_iter`, which, as I recall, doesn't take buffer locks. This immediately raises a red flag.

The core of the problem is this: the writer is modifying a 4-byte field (`s_overhead_clusters`) within the superblock, and the reader can *concurrently* read from the same page in the block device's page cache. The 4-byte write itself won't tear, but that alone doesn't make the superblock the reader copies internally consistent if the copy overlaps the update.

Now, assessing the impact. On the kernel side, it appears stable, as expected. The writer is just storing a 4-byte value to a valid memory location, and the reader is performing a simple copy to user-space. No pointers are involved and the data is plain content, so there is no realistic way for this to corrupt the kernel's internal state.

The real concern is data integrity. The user-space program might receive an inconsistent superblock. For instance, the `s_overhead_clusters` might be updated, but an older checksum value remains, which would be inconsistent. However, this is expected behavior. It's essentially reading a moving target, which is typical when accessing a block device that’s also being actively modified by a mounted filesystem. I know tools like `blkid` are designed to handle this. They either retry or ignore seemingly inconsistent data.
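As a concrete picture of the reader side, a blkid-style probe of the primary superblock might look roughly like this; the device path is illustrative, and the 1024-byte offset is the conventional location of ext4's primary superblock rather than something taken from the report:

```c
/*
 * Hedged user-space sketch of the reader side: read the primary ext4
 * superblock (1024 bytes at byte offset 1024) straight from the block device.
 */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	unsigned char sb[1024];
	int fd = open("/dev/loop0", O_RDONLY);	/* hypothetical device */

	if (fd < 0)
		return 1;
	/*
	 * This pread() goes through blkdev_read_iter()/filemap_read() in the
	 * kernel and can observe the superblock mid-update (new field value,
	 * stale checksum).
	 */
	if (pread(fd, sb, sizeof(sb), 1024) != (ssize_t)sizeof(sb)) {
		close(fd);
		return 1;
	}
	close(fd);
	/* A robust probing tool treats checksum mismatches here as transient. */
	return 0;
}
```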

So, in conclusion, this data race is likely benign. It only affects the consistency of data seen by user-space, which is reading a block device that is actively being modified. It is not a kernel-side crash or memory-corruption problem. The risk is limited to transient inconsistencies that user-space tools are built to handle, so this is low priority. This is standard behavior when a block device is read while being concurrently modified.


