syzbot |
| ID | Workflow | Result | Correct | Bug | Created | Started | Finished | Revision | Error |
|---|---|---|---|---|---|---|---|---|---|
| 93880a73-eb05-473c-9a94-3f1f0d0a66b5 | assessment-kcsan | Benign: ✅ Confident: ✅ | ❓ | KCSAN: data-race in __dentry_kill / __dentry_kill (7) | 2026/02/02 12:41 | 2026/02/02 12:41 | 2026/02/02 13:10 | 63a5f77ffe6c631f8781efb082e741722cdf8c6a+ | |
The data race reported by KCSAN occurs in `fs/dcache.c` within the `__dentry_kill` function. Specifically, it is a race between one task (Task 3002) writing to a dentry's `d_flags` and another task (Task 3304) reading from a sibling dentry's `d_flags`.
### Analysis:
1. **Task 3002 (Writer):**
* It is executing `__dentry_kill(d2)`, where `d2` is the dentry at address `0xffff88811a9159c0`.
* It calls `d_lru_del(d2)`, which performs `dentry->d_flags &= ~DCACHE_LRU_LIST;` (clearing bit 18).
* Later, it calls `dentry_unlink_inode(d2)`, which invokes `__d_clear_type_and_inode(d2)` to clear the `DCACHE_ENTRY_TYPE` bits (bits 19-21).
* The report shows the value changing from `0x00140004` to `0x00000004`, confirming that bits 18 (`DCACHE_LRU_LIST`) and 20 (`DCACHE_DIRECTORY_TYPE`) were cleared.
2. **Task 3304 (Reader):**
* It is executing `__dentry_kill(d1)` for a different dentry `d1`, which happens to be a sibling of `d2`.
* While unlisting `d1` in `dentry_unlist(d1)`, it traverses the sibling list of the parent. It looks at the next sibling, which is `d2`.
* At line 640, it reads `next->d_flags` (i.e., `d2->d_flags`) to check if the `DCACHE_DENTRY_CURSOR` bit (bit 25) is set: `if (likely(!(next->d_flags & DCACHE_DENTRY_CURSOR)))`.
3. **Why the race is benign:**
* **Different Bits:** The writer (Task 3002) is modifying bits 18 and 20. The reader (Task 3304) is only interested in bit 25.
* **Stable State for Bit 25:** `DCACHE_DENTRY_CURSOR` is a flag used for special "cursor" dentries (used for directory iteration). Cursors are never placed on the LRU list. Since Task 3002 is calling `d_lru_del`, the dentry `d2` is guaranteed *not* to be a cursor. Thus, bit 25 is `0` and will remain `0` while `d2` is being killed.
* **Atomicity:** The `d_flags` field is a 32-bit `unsigned int`. On all architectures Linux supports, an aligned 32-bit load or store is compiled to a single instruction by a conservative compiler and executes atomically, so the reader sees either the full value before the write or the full value after it. It will never observe a "torn" value in which only some bits have been updated.
* **Object Lifetime:** Task 3304 holds the parent dentry's `d_lock`. Task 3002 must acquire the same parent `d_lock` (at line 681 in `__dentry_kill`) before it can actually unlist `d2` and proceed to free the memory. Therefore, `d2` is guaranteed to be a valid object while Task 3304 is reading its flags.
### Conclusion:
The data race is benign. Although two tasks are accessing the same word (`d_flags`) without a common lock, they are operating on disjoint sets of bits, and the bit of interest to the reader is stable. This type of access is common in the VFS layer for performance reasons and does not lead to incorrect behavior or memory corruption.

==================================================================
BUG: KCSAN: data-race in __dentry_kill / __dentry_kill

write to 0xffff88811a9159c0 of 4 bytes by task 3002 on cpu 1:
 d_lru_del fs/dcache.c:504 [inline]
 __dentry_kill+0xa8/0x460 fs/dcache.c:665
 finish_dput+0x2b/0x200 fs/dcache.c:879
 dput+0x52/0x60 fs/dcache.c:928
 lookup_fast+0x29e/0x320 fs/namei.c:1832
 walk_component fs/namei.c:2225 [inline]
 lookup_last fs/namei.c:2730 [inline]
 path_lookupat+0x15c/0x500 fs/namei.c:2754
 filename_lookup+0x190/0x390 fs/namei.c:2783
 do_readlinkat+0x7d/0x340 fs/stat.c:576
 __do_sys_readlink fs/stat.c:613 [inline]
 __se_sys_readlink fs/stat.c:610 [inline]
 __x64_sys_readlink+0x47/0x60 fs/stat.c:610
 x64_sys_call+0x2af1/0x3000 arch/x86/include/generated/asm/syscalls_64.h:90
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffff88811a9159c0 of 4 bytes by task 3304 on cpu 0:
 dentry_unlist fs/dcache.c:640 [inline]
 __dentry_kill+0x25c/0x460 fs/dcache.c:684
 finish_dput+0x2b/0x200 fs/dcache.c:879
 dput+0x52/0x60 fs/dcache.c:928
 lookup_fast+0x29e/0x320 fs/namei.c:1832
 walk_component fs/namei.c:2225 [inline]
 lookup_last fs/namei.c:2730 [inline]
 path_lookupat+0x15c/0x500 fs/namei.c:2754
 filename_lookup+0x190/0x390 fs/namei.c:2783
 user_path_at+0x3e/0x130 fs/namei.c:3576
 do_faccessat+0x391/0x830 fs/open.c:490
 __do_sys_faccessat2 fs/open.c:544 [inline]
 __se_sys_faccessat2 fs/open.c:541 [inline]
 __x64_sys_faccessat2+0x51/0x60 fs/open.c:541
 x64_sys_call+0x2c1a/0x3000 arch/x86/include/generated/asm/syscalls_64.h:440
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x00140004 -> 0x00000004

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 3304 Comm: udevd Tainted: G W syzkaller #0 PREEMPT(voluntary)
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
==================================================================
| Seq | Timestamp | Type | Name | Duration |
|---|---|---|---|---|
| 0/0 | 2026/02/02 12:41 | flow | assessment-kcsan | 28m |
Results: map[Benign:true Confident:true Explanation:The data race reported by KCSAN occurs in `fs/dcache.c` within the `__dentry_kill` function. Specifically, it is a race between one task (Task 3002) writing to a dentry's `d_flags` and another task (Task 3304) reading from a sibling dentry's `d_flags`.
### Analysis:
1. **Task 3002 (Writer):**
* It is executing `__dentry_kill(d2)`, where `d2` is the dentry at address `0xffff88811a9159c0`.
* It calls `d_lru_del(d2)`, which performs `dentry->d_flags &= ~DCACHE_LRU_LIST;` (clearing bit 18).
* Later, it calls `dentry_unlink_inode(d2)`, which invokes `__d_clear_type_and_inode(d2)` to clear the `DCACHE_ENTRY_TYPE` bits (bits 19-21).
* The report shows the value changing from `0x00140004` to `0x00000004`, confirming that bits 18 (`DCACHE_LRU_LIST`) and 20 (`DCACHE_DIRECTORY_TYPE`) were cleared.
2. **Task 3304 (Reader):**
* It is executing `__dentry_kill(d1)` for a different dentry `d1`, which happens to be a sibling of `d2`.
* While unlisting `d1` in `dentry_unlist(d1)`, it traverses the sibling list of the parent. It looks at the next sibling, which is `d2`.
* At line 640, it reads `next->d_flags` (i.e., `d2->d_flags`) to check if the `DCACHE_DENTRY_CURSOR` bit (bit 25) is set: `if (likely(!(next->d_flags & DCACHE_DENTRY_CURSOR)))`.
3. **Why the race is benign:**
* **Different Bits:** The writer (Task 3002) is modifying bits 18 and 20. The reader (Task 3304) is only interested in bit 25.
* **Stable State for Bit 25:** `DCACHE_DENTRY_CURSOR` is a flag used for special "cursor" dentries (used for directory iteration). Cursors are never placed on the LRU list. Since Task 3002 is calling `d_lru_del`, the dentry `d2` is guaranteed *not* to be a cursor. Thus, bit 25 is `0` and will remain `0` while `d2` is being killed.
* **Atomicity:** The `d_flags` field is a 32-bit `unsigned int`. On all supported architectures, a 32-bit aligned read/write is atomic, meaning the reader will see either the full value before the write or the full value after the write. It will not see a "torn" value where bits are partially updated.
* **Object Lifetime:** Task 3304 holds the parent dentry's `d_lock`. Task 3002 must acquire the same parent `d_lock` (at line 681 in `__dentry_kill`) before it can actually unlist `d2` and proceed to free the memory. Therefore, `d2` is guaranteed to be a valid object while Task 3304 is reading its flags.
### Conclusion:
The data race is benign. Although two tasks are accessing the same word (`d_flags`) without a common lock, they are operating on disjoint sets of bits, and the bit of interest to the reader is stable. This type of access is common in the VFS layer for performance reasons and does not lead to incorrect behavior or memory corruption.] |
| 1/1 | 2026/02/02 12:41 | action | kernel-checkouter | 1m |
Results: map[KernelSrc:/usr/local/google/home/nogikh/dev/syz-agent/bin/workdir/cache/src/442656ac8be766ac08d507e9d55d7e172d7ca2ec] |
| 2/1 | 2026/02/02 12:43 | action | kernel-builder | 9m |
Results: map[KernelObj:/usr/local/google/home/nogikh/dev/syz-agent/bin/workdir/cache/build/d5e7fe031ef17fcd9d75f07422941eb5162b2fe8] |
| 3/1 | 2026/02/02 12:53 | action | codesearch-prepare | 14m |
Results: map[Index:codesearch-index] |
| 4/1 | 2026/02/02 13:07 | agent | expert |
2mModel:gemini-3-flash-preview Results: map[Benign:true Confident:true] Instruction: You are an experienced Linux kernel developer tasked with determining if the given kernel data race is benign or not. The data race report is from KCSAN tool. It contains 2 stack traces of the memory accesses that constitute a data race. A "benign" data races are on a simple int/bool variable or similar field, and the accesses are not supposed to be protected by any mutual exclusion primitives. Common examples of such "benign" data races are accesses to various flags fields, statistics counters, and similar. A "benign" data race does not lead to memory corruption/crash with a conservative compiler that compiles memory accesses to primitive types effectively as atomic. A non-benign (or "harmful" data race) can lead to corruption/crash even with a conservative compiler that compiles memory accesses to primitive types effectively as atomic. A common example of a "harmful" data races is race on a complex container (list/hashmap/etc), where accesses are supposed to be protected by a mutual exclusion primitive. In the final reply explain why you think the given data race is benign or is harmful. Use the provided tools to confirm any assumptions, variables/fields being accessed, etc. In particular, don't make assumptions about the kernel source code, use codesearch tools to read the actual source code. Prefer calling several tools at the same time to save round-trips. Use set-results tool to provide results of the analysis. It must be called exactly once before the final reply. Ignore results of this tool. 
Prompt: The data race report is: ================================================================== BUG: KCSAN: data-race in __dentry_kill / __dentry_kill write to 0xffff88811a9159c0 of 4 bytes by task 3002 on cpu 1: d_lru_del fs/dcache.c:504 [inline] __dentry_kill+0xa8/0x460 fs/dcache.c:665 finish_dput+0x2b/0x200 fs/dcache.c:879 dput+0x52/0x60 fs/dcache.c:928 lookup_fast+0x29e/0x320 fs/namei.c:1832 walk_component fs/namei.c:2225 [inline] lookup_last fs/namei.c:2730 [inline] path_lookupat+0x15c/0x500 fs/namei.c:2754 filename_lookup+0x190/0x390 fs/namei.c:2783 do_readlinkat+0x7d/0x340 fs/stat.c:576 __do_sys_readlink fs/stat.c:613 [inline] __se_sys_readlink fs/stat.c:610 [inline] __x64_sys_readlink+0x47/0x60 fs/stat.c:610 x64_sys_call+0x2af1/0x3000 arch/x86/include/generated/asm/syscalls_64.h:90 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f read to 0xffff88811a9159c0 of 4 bytes by task 3304 on cpu 0: dentry_unlist fs/dcache.c:640 [inline] __dentry_kill+0x25c/0x460 fs/dcache.c:684 finish_dput+0x2b/0x200 fs/dcache.c:879 dput+0x52/0x60 fs/dcache.c:928 lookup_fast+0x29e/0x320 fs/namei.c:1832 walk_component fs/namei.c:2225 [inline] lookup_last fs/namei.c:2730 [inline] path_lookupat+0x15c/0x500 fs/namei.c:2754 filename_lookup+0x190/0x390 fs/namei.c:2783 user_path_at+0x3e/0x130 fs/namei.c:3576 do_faccessat+0x391/0x830 fs/open.c:490 __do_sys_faccessat2 fs/open.c:544 [inline] __se_sys_faccessat2 fs/open.c:541 [inline] __x64_sys_faccessat2+0x51/0x60 fs/open.c:541 x64_sys_call+0x2c1a/0x3000 arch/x86/include/generated/asm/syscalls_64.h:440 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xc0/0x2a0 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f value changed: 0x00140004 -> 0x00000004 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3304 Comm: udevd Tainted: G W syzkaller #0 PREEMPT(voluntary) Tainted: [W]=WARN 
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025 ================================================================== Reply: The data race reported by KCSAN occurs in `fs/dcache.c` within the `__dentry_kill` function. Specifically, it is a race between one task (Task 3002) writing to a dentry's `d_flags` and another task (Task 3304) reading from a sibling dentry's `d_flags`.
### Analysis:
1. **Task 3002 (Writer):**
* It is executing `__dentry_kill(d2)`, where `d2` is the dentry at address `0xffff88811a9159c0`.
* It calls `d_lru_del(d2)`, which performs `dentry->d_flags &= ~DCACHE_LRU_LIST;` (clearing bit 18).
* Later, it calls `dentry_unlink_inode(d2)`, which invokes `__d_clear_type_and_inode(d2)` to clear the `DCACHE_ENTRY_TYPE` bits (bits 19-21).
* The report shows the value changing from `0x00140004` to `0x00000004`, confirming that bits 18 (`DCACHE_LRU_LIST`) and 20 (`DCACHE_DIRECTORY_TYPE`) were cleared.
2. **Task 3304 (Reader):**
* It is executing `__dentry_kill(d1)` for a different dentry `d1`, which happens to be a sibling of `d2`.
* While unlisting `d1` in `dentry_unlist(d1)`, it traverses the sibling list of the parent. It looks at the next sibling, which is `d2`.
* At line 640, it reads `next->d_flags` (i.e., `d2->d_flags`) to check if the `DCACHE_DENTRY_CURSOR` bit (bit 25) is set: `if (likely(!(next->d_flags & DCACHE_DENTRY_CURSOR)))`.
3. **Why the race is benign:**
* **Different Bits:** The writer (Task 3002) is modifying bits 18 and 20. The reader (Task 3304) is only interested in bit 25.
* **Stable State for Bit 25:** `DCACHE_DENTRY_CURSOR` is a flag used for special "cursor" dentries (used for directory iteration). Cursors are never placed on the LRU list. Since Task 3002 is calling `d_lru_del`, the dentry `d2` is guaranteed *not* to be a cursor. Thus, bit 25 is `0` and will remain `0` while `d2` is being killed.
* **Atomicity:** The `d_flags` field is a 32-bit `unsigned int`. On all supported architectures, a 32-bit aligned read/write is atomic, meaning the reader will see either the full value before the write or the full value after the write. It will not see a "torn" value where bits are partially updated.
* **Object Lifetime:** Task 3304 holds the parent dentry's `d_lock`. Task 3002 must acquire the same parent `d_lock` (at line 681 in `__dentry_kill`) before it can actually unlist `d2` and proceed to free the memory. Therefore, `d2` is guaranteed to be a valid object while Task 3304 is reading its flags.
### Conclusion:
The data race is benign. Although two tasks are accessing the same word (`d_flags`) without a common lock, they are operating on disjoint sets of bits, and the bit of interest to the reader is stable. This type of access is common in the VFS layer for performance reasons and does not lead to incorrect behavior or memory corruption. |
| 5/2 | 2026/02/02 13:07 | llm | expert |
0mModel:gemini-3-flash-preview Tokens: input: 3727 output: 60 thoughts: 110 Thoughts: **Analyzing a Data Race in the Filesystem Cache** Okay, so the report flags a nasty data race issue within the `fs/dcache.c` file – that's the directory cache, right? Hmm, let's see, it looks like task 3002 is writing to the cache through `d_lru_del`, which is called inline within `__dentry_kill` at line 504. Simultaneously, task 3304 is reading from it via `dentry_unlist`, also called inline within `__dentry_kill`, but at line 640. That's a classic read-write conflict, no doubt. The key is to understand *how* these functions are interacting with shared data and what synchronization mechanisms, or lack thereof, are in place. Alright, let's dive into that code and pinpoint the exact source of this data race. This warrants a careful look at the locking strategy within this portion of the VFS. |
| 6/2 | 2026/02/02 13:07 | tool | codesearch-dir-index |
0mArgs:map[Dir:fs/] Results: map[Files:[aio.c anon_inodes.c attr.c backing-file.c bad_inode.c binfmt_elf.c binfmt_elf_fdpic.c binfmt_flat.c binfmt_misc.c binfmt_script.c bpf_fs_kfuncs.c buffer.c char_dev.c compat_binfmt_elf.c coredump.c d_path.c dax.c dcache.c direct-io.c drop_caches.c eventfd.c eventpoll.c exec.c fcntl.c fhandle.c file.c file_attr.c file_table.c filesystems.c fs-writeback.c fs_context.c fs_dirent.c fs_parser.c fs_pin.c fs_struct.c fsopen.c init.c inode.c internal.h ioctl.c kernel_read_file.c libfs.c locks.c mbcache.c mnt_idmapping.c mount.h mpage.c namei.c namespace.c nsfs.c open.c pidfs.c pipe.c pnode.c pnode.h posix_acl.c proc_namespace.c read_write.c readdir.c remap_range.c select.c seq_file.c signalfd.c splice.c stack.c stat.c statfs.c super.c sync.c sysctls.c timerfd.c userfaultfd.c utimes.c xattr.c] Subdirs:[9p adfs affs afs autofs befs bfs btrfs cachefiles ceph coda configfs cramfs crypto debugfs devpts dlm ecryptfs efivarfs efs erofs exfat exportfs ext2 ext4 f2fs fat freevxfs fuse gfs2 hfs hfsplus hostfs hpfs hugetlbfs iomap isofs jbd2 jffs2 jfs kernfs lockd minix netfs nfs nfs_common nfsd nilfs2 nls notify ntfs3 ocfs2 omfs openpromfs orangefs overlayfs proc pstore qnx4 qnx6 quota ramfs resctrl romfs smb squashfs sysfs tests tracefs ubifs udf ufs unicode vboxsf verity xfs zonefs]] |
| 7/2 | 2026/02/02 13:07 | tool | codesearch-definition-source |
0mArgs:map[ContextFile:fs/dcache.c IncludeLines:true Name:__dentry_kill] Results: map[SourceCode: 646: static struct dentry *__dentry_kill(struct dentry *dentry)
647: {
648: struct dentry *parent = NULL;
649: bool can_free = true;
650:
651: /*
652: * The dentry is now unrecoverably dead to the world.
653: */
654: lockref_mark_dead(&dentry->d_lockref);
655:
656: /*
657: * inform the fs via d_prune that this dentry is about to be
658: * unhashed and destroyed.
659: */
660: if (dentry->d_flags & DCACHE_OP_PRUNE)
661: dentry->d_op->d_prune(dentry);
662:
663: if (dentry->d_flags & DCACHE_LRU_LIST) {
664: if (!(dentry->d_flags & DCACHE_SHRINK_LIST))
665: d_lru_del(dentry);
666: }
667: /* if it was on the hash then remove it */
668: __d_drop(dentry);
669: if (dentry->d_inode)
670: dentry_unlink_inode(dentry);
671: else
672: spin_unlock(&dentry->d_lock);
673: this_cpu_dec(nr_dentry);
674: if (dentry->d_op && dentry->d_op->d_release)
675: dentry->d_op->d_release(dentry);
676:
677: cond_resched();
678: /* now that it's negative, ->d_parent is stable */
679: if (!IS_ROOT(dentry)) {
680: parent = dentry->d_parent;
681: spin_lock(&parent->d_lock);
682: }
683: spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
684: dentry_unlist(dentry);
685: if (dentry->d_flags & DCACHE_SHRINK_LIST)
686: can_free = false;
687: spin_unlock(&dentry->d_lock);
688: if (likely(can_free))
689: dentry_free(dentry);
690: if (parent && --parent->d_lockref.count) {
691: spin_unlock(&parent->d_lock);
692: return NULL;
693: }
694: return parent;
695: }
SourceFile:fs/dcache.c] |
| 8/2 | 2026/02/02 13:07 | llm | expert |
0mModel:gemini-3-flash-preview Tokens: input: 5876 output: 80 thoughts: 201 Thoughts: **Analyzing a Potential Data Race in `__dentry_kill`** Okay, so I've got a potential data race flagged, and the report points me right to the heart of the issue: `__dentry_kill`. The trace shows a write in `d_lru_del` (line 665) and a read in `dentry_unlist` (line 684). Hmm, wait, those are both *inside* `__dentry_kill`. That's interesting. Task 3002 is tied to line 665, within `d_lru_del`, and Task 3304 is at line 684, inside `dentry_unlist`. The report explicitly states "data-race in __dentry_kill / __dentry_kill". That begs the question: are we killing *different* dentries, or the *same* dentry concurrently? If it's the same dentry, alarm bells are ringing. We know `__dentry_kill` should only be called on a dentry that's already dead, after it's been marked as such using `lockref_mark_dead`. This screams "bug" if we're seeing concurrent operations on the same dentry after it has been marked as dead. I'm going to dig deeper into the code. Time to examine the specific implementations of `d_lru_del` and `dentry_unlist` to understand the exact access patterns and determine what dentries each of them are affecting, and, how. |
| 9/2 | 2026/02/02 13:07 | tool | codesearch-definition-source |
0mArgs:map[ContextFile:fs/dcache.c IncludeLines:true Name:d_lru_del] Results: map[SourceCode: 501: static void d_lru_del(struct dentry *dentry)
502: {
503: D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
504: dentry->d_flags &= ~DCACHE_LRU_LIST;
505: this_cpu_dec(nr_dentry_unused);
506: if (d_is_negative(dentry))
507: this_cpu_dec(nr_dentry_negative);
508: WARN_ON_ONCE(!list_lru_del_obj(
509: &dentry->d_sb->s_dentry_lru, &dentry->d_lru));
510: }
SourceFile:fs/dcache.c] |
| 10/2 | 2026/02/02 13:07 | tool | codesearch-definition-source |
0mArgs:map[ContextFile:fs/dcache.c IncludeLines:true Name:dentry_unlist] Results: map[SourceCode: 608: static inline void dentry_unlist(struct dentry *dentry)
609: {
610: struct dentry *next;
611: /*
612: * Inform d_walk() and shrink_dentry_list() that we are no longer
613: * attached to the dentry tree
614: */
615: dentry->d_flags |= DCACHE_DENTRY_KILLED;
616: if (unlikely(hlist_unhashed(&dentry->d_sib)))
617: return;
618: __hlist_del(&dentry->d_sib);
619: /*
620: * Cursors can move around the list of children. While we'd been
621: * a normal list member, it didn't matter - ->d_sib.next would've
622: * been updated. However, from now on it won't be and for the
623: * things like d_walk() it might end up with a nasty surprise.
624: * Normally d_walk() doesn't care about cursors moving around -
625: * ->d_lock on parent prevents that and since a cursor has no children
626: * of its own, we get through it without ever unlocking the parent.
627: * There is one exception, though - if we ascend from a child that
628: * gets killed as soon as we unlock it, the next sibling is found
629: * using the value left in its ->d_sib.next. And if _that_
630: * pointed to a cursor, and cursor got moved (e.g. by lseek())
631: * before d_walk() regains parent->d_lock, we'll end up skipping
632: * everything the cursor had been moved past.
633: *
634: * Solution: make sure that the pointer left behind in ->d_sib.next
635: * points to something that won't be moving around. I.e. skip the
636: * cursors.
637: */
638: while (dentry->d_sib.next) {
639: next = hlist_entry(dentry->d_sib.next, struct dentry, d_sib);
640: if (likely(!(next->d_flags & DCACHE_DENTRY_CURSOR)))
641: break;
642: dentry->d_sib.next = next->d_sib.next;
643: }
644: }
SourceFile:fs/dcache.c] |
| 11/2 | 2026/02/02 13:07 | llm | expert |
0mModel:gemini-3-flash-preview Tokens: input: 7443 output: 32 thoughts: 380 Thoughts: **Analyzing a Potential Race Condition in Dentry Handling** Okay, so I'm looking at these two tasks, 3002 and 3304, and there seems to be a race condition brewing around dentry management. Task 3002 is calling `d_lru_del(dentry)` and that immediately clears the `DCACHE_LRU_LIST` flag on the dentry. Specifically, that happens on line 504. Now, Task 3304 is calling `dentry_unlist(dentry)`, which seems to involve checking the `d_flags` field of a *sibling* dentry on line 640 using this `next` variable, specifically checking `next->d_flags`. So, the critical point is that 3304 appears to be unlisting a dentry (let's call it `d1`), but it's *also* looking at the flags of `d1`'s sibling, which I'll call `d2`. And here's where it gets interesting: Task 3002 is, simultaneously, removing `d2` from the LRU. This means that 3002 is calling `__dentry_kill(d2)` which then calls `d_lru_del(d2)` and then writes to `d2->d_flags`. Meanwhile, 3304 calls `__dentry_kill(d1)` and calls `dentry_unlist(d1)` and reads from `d2->d_flags`. The race, I think, is centered around the dentry at `0xffff88811a9159c0`. I can confirm, from the code, that in Task 3002, this address is the `dentry->d_flags` being modified by `d_lru_del`. Also, in Task 3304, the same address corresponds to `next->d_flags`. The potential for a race condition here seems pretty high. I need to now examine the `struct dentry` layout to be absolutely certain that this access pattern is as I suspect, and confirm the offset of `d_flags` to make a definitive diagnosis. |
| 12/2 | 2026/02/02 13:07 | tool | codesearch-struct-layout |
0mArgs:map[ContextFile:fs/dcache.c Name:dentry] Results: map[Fields:[map[Name:d_flags OffsetBits:0 SizeBits:32] map[Name:d_seq OffsetBits:32 SizeBits:32] map[Name:d_hash OffsetBits:64 SizeBits:128] map[Name:d_parent OffsetBits:192 SizeBits:64] map[Name: OffsetBits:256 SizeBits:128] map[Name:d_inode OffsetBits:384 SizeBits:64] map[Name:d_shortname OffsetBits:448 SizeBits:320] map[Name:d_op OffsetBits:768 SizeBits:64] map[Name:d_sb OffsetBits:832 SizeBits:64] map[Name:d_time OffsetBits:896 SizeBits:64] map[Name:d_fsdata OffsetBits:960 SizeBits:64] map[Name:d_lockref OffsetBits:1024 SizeBits:64] map[Name: OffsetBits:1088 SizeBits:128] map[Name:d_sib OffsetBits:1216 SizeBits:128] map[Name:d_children OffsetBits:1344 SizeBits:64] map[Name:d_u OffsetBits:1408 SizeBits:128]]] |
| 13/2 | 2026/02/02 13:07 | llm | expert |
0mModel:gemini-3-flash-preview Tokens: input: 8692 output: 23 thoughts: 689 Thoughts: **Analyzing a Potential Race Condition in Dentry Unlisting**
Okay, so `d_flags` is at offset 0, meaning `0xffff88811a9159c0` is indeed a `d_flags` for a dentry, confirming the context. Now, I'm focusing on Task 3304 and its call to `__dentry_kill(dentry)`. Looking at the code, I see it takes `dentry->d_lock` at line 683 and, crucially, `parent->d_lock` at line 681 *before* the call to `dentry_unlist(dentry)` at line 684. That's the key. Task 3304 is holding both locks.
`dentry_unlist(dentry)` is interesting. I'm focusing on this part:
```c
616: if (unlikely(hlist_unhashed(&dentry->d_sib)))
617: return;
618: __hlist_del(&dentry->d_sib);
...
638: while (dentry->d_sib.next) {
639: next = hlist_entry(dentry->d_sib.next, struct dentry, d_sib);
640: if (likely(!(next->d_flags & DCACHE_DENTRY_CURSOR)))
641: break;
642: dentry->d_sib.next = next->d_sib.next;
643: }
```
The code iterates through the siblings using `dentry->d_sib.next` while holding `parent->d_lock`. This is where I start getting concerned. Because Task 3304 *has* `parent->d_lock`, it should be safe to traverse the sibling list, since no other task can modify it. However, if *another* task is killing a sibling (`next`), do they also need `parent->d_lock`? This is the core question now.
Now, I'm shifting my focus to Task 3002, which is also in `__dentry_kill(next)`. I'm going to look at this block of code:
```c
663: if (dentry->d_flags & DCACHE_LRU_LIST) {
664: if (!(dentry->d_flags & DCACHE_SHRINK_LIST))
665: d_lru_del(dentry);
666: }
```
My focus is on line 665. Does it hold any locks during `d_lru_del(dentry)`? I need to know the locking context there. I also need to figure out the context of what other functions are calling `__dentry_kill()`. I see that `__dentry_kill()` is called from `finish_dput()`.
I need to see the full lock acquisition context within `finish_dput()`. This is key to understanding whether there's a potential race condition when killing siblings while one holds `parent->d_lock` and the others might be operating on them too. I need to know the lock acquisition sequence.
|
| 14/2 | 2026/02/02 13:08 | tool | codesearch-find-references |
0mArgs:map[Name:__dentry_kill] Results: map[References:[map[ReferenceKind:calls ReferencingEntityKind:function ReferencingEntityName:finish_dput SourceFile:fs/dcache.c SourceLine:879] map[ReferenceKind:calls ReferencingEntityKind:function ReferencingEntityName:shrink_kill SourceFile:fs/dcache.c SourceLine:1147]] TruncatedOutput:false] |
| 15/2 | 2026/02/02 13:08 | llm | expert |
0mModel:gemini-3-flash-preview Tokens: input: 10254 output: 39 thoughts: 11 Thoughts: Okay, let's dive into this. Here's my perspective on examining `finish_dput`: **Analyzing the `finish_dput` Function** Alright, so the task at hand is to understand the `finish_dput` function. Let's start by looking closely at its purpose within the broader context. What's its role in the data upload process? Is it responsible for the final stages, error handling, cleanup, or something else entirely? I need to understand its relationship to the other functions it likely interacts with, perhaps `start_dput`, `process_data`, and any others that come to mind. I'll need to trace the data flow to fully grasp its significance. My initial thought is to identify the critical variables it operates on. What data structures does it manipulate, and how? Are there any shared resources? I'll probably start by looking for those. Then I’ll need to understand any assumptions it makes about the environment it's running in, such as network connectivity, file permissions, or data integrity. I'll be looking for potential race conditions, edge cases, and areas where errors could creep in. Next, I need to evaluate its performance characteristics. Does it introduce any significant bottlenecks or overhead? Is it optimized for common use cases or a wide variety of inputs? And finally, what are the implications for scalability and maintainability? Does it use any deprecated APIs or complex logic that could be a future pain point? I'll be keeping an eye out for areas that could become problematic as the project evolves. My focus is on the robustness and effectiveness of this function. |
| 16/2 | 2026/02/02 13:08 | tool | codesearch-definition-source |
0mArgs:map[ContextFile:fs/dcache.c IncludeLines:true Name:finish_dput] Results: map[SourceCode: 873: static void finish_dput(struct dentry *dentry)
874: __releases(dentry->d_lock)
875: __releases(RCU)
876: {
877: while (lock_for_kill(dentry)) {
878: rcu_read_unlock();
879: dentry = __dentry_kill(dentry);
880: if (!dentry)
881: return;
882: if (retain_dentry(dentry, true)) {
883: spin_unlock(&dentry->d_lock);
884: return;
885: }
886: rcu_read_lock();
887: }
888: rcu_read_unlock();
889: spin_unlock(&dentry->d_lock);
890: }
SourceFile:fs/dcache.c] |
| 17/2 | 2026/02/02 13:08 | llm | expert |
0mModel:gemini-3-flash-preview Tokens: input: 10919 output: 40 thoughts: 51 Thoughts: Okay, let's see... **Deconstructing the `finish_dput` Call** Right, so `finish_dput` calls `__dentry_kill`. That's the starting point. But what's the deal with `lock_for_kill(dentry)`? It seems like this function is returning with `dentry->d_lock` already held, which is a key detail. Need to confirm that. Let's dig into the implementation of `lock_for_kill` and verify that the lock acquisition is happening within that function's scope, and more importantly, *why* it's doing so. I need to understand the locking semantics around the dentry kill process. This has potential implications for concurrency and potential deadlocks, so I need to be sure I understand the interaction. Let's get to the bottom of this. |
| 18/2 | 2026/02/02 13:08 | tool | codesearch-definition-source |
0mArgs:map[ContextFile:fs/dcache.c IncludeLines:true Name:lock_for_kill] Results: map[SourceCode: 708: static bool lock_for_kill(struct dentry *dentry)
709: {
710: struct inode *inode = dentry->d_inode;
711:
712: if (unlikely(dentry->d_lockref.count))
713: return false;
714:
715: if (!inode || likely(spin_trylock(&inode->i_lock)))
716: return true;
717:
718: do {
719: spin_unlock(&dentry->d_lock);
720: spin_lock(&inode->i_lock);
721: spin_lock(&dentry->d_lock);
722: if (likely(inode == dentry->d_inode))
723: break;
724: spin_unlock(&inode->i_lock);
725: inode = dentry->d_inode;
726: } while (inode);
727: if (likely(!dentry->d_lockref.count))
728: return true;
729: if (inode)
730: spin_unlock(&inode->i_lock);
731: return false;
732: }
SourceFile:fs/dcache.c] |
| 19/2 | 2026/02/02 13:08 | llm | expert |
0mModel:gemini-3-flash-preview Tokens: input: 11571 output: 41 thoughts: 516 Thoughts: **Analyzing the dentry Lock in `__dentry_kill`** Okay, so I'm looking at the `__dentry_kill` function and trying to understand the locking behavior. Specifically, I'm trying to figure out if there's a lock inversion possibility, and the logic around `dentry->d_lock` is key here. First, I see that `lock_for_kill` returns with `dentry->d_lock` (and `inode->i_lock` if there is one) already held. That's the starting point. So, looking at the beginning of `__dentry_kill` at line 646, `dentry->d_lock` is definitely held. The function proceeds and marks the dentry as dead with `lockref_mark_dead`. Then, it checks if the dentry is on an LRU list and removes it if necessary, but importantly, `dentry->d_lock` is *still* held at line 665. I'm noting that Task 3002 is writing to `d2->d_flags` and holding `d2->d_lock`, which isn't directly related to the current dentry, but it's a concurrent operation. Now, here's where it gets interesting: Task 3304 is also in `__dentry_kill`, but it's operating on `d1`. Line 679 introduces a `parent` and a `spin_lock(&parent->d_lock)`. After that, at line 683, it calls `spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED)`. My immediate thought is, wait, how is `dentry->d_lock` being re-acquired here? I *thought* it was held at the start of the function and during the LRU operations. It's almost as if it's been unlocked, but where? Looking closely, it seems my concern is warranted: at line 672, `spin_unlock(&dentry->d_lock)` is called if `dentry->d_inode` is not present, inside the `__d_drop` function, following `dentry_unlink_inode(dentry)`. This explains where the lock is released! I need to investigate what `dentry_unlink_inode` does. |
| 20/2 | 2026/02/02 13:08 | tool | codesearch-definition-source |
Args:map[ContextFile:fs/dcache.c IncludeLines:true Name:dentry_unlink_inode] Results: map[SourceCode: 446: /*
447: * Release the dentry's inode, using the filesystem
448: * d_iput() operation if defined.
449: */
450: static void dentry_unlink_inode(struct dentry * dentry)
451: __releases(dentry->d_lock)
452: __releases(dentry->d_inode->i_lock)
453: {
454: struct inode *inode = dentry->d_inode;
455:
456: raw_write_seqcount_begin(&dentry->d_seq);
457: __d_clear_type_and_inode(dentry);
458: hlist_del_init(&dentry->d_u.d_alias);
459: raw_write_seqcount_end(&dentry->d_seq);
460: spin_unlock(&dentry->d_lock);
461: spin_unlock(&inode->i_lock);
462: if (!inode->i_nlink)
463: fsnotify_inoderemove(inode);
464: if (dentry->d_op && dentry->d_op->d_iput)
465: dentry->d_op->d_iput(dentry, inode);
466: else
467: iput(inode);
468: }
SourceFile:fs/dcache.c] |
| 21/2 | 2026/02/02 13:08 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 13062 output: 44 thoughts: 739 Thoughts: **Analyzing a Potential Race Condition in Dentry Unlinking**

Okay, let's break this down. From my understanding of the `dentry` structures and locking, `dentry_unlink_inode` *does* release the `dentry->d_lock`. That's critical. So, looking at the code, specifically at line 677 in `__dentry_kill`, the `dentry->d_lock` isn't held. That makes sense.

So, Task 3304 is in the process of "killing" `d1`. It's holding `parent->d_lock` and also `d1->d_lock`. It calls `dentry_unlist(d1)`. Inside `dentry_unlist(d1)`, it reads `d2->d_flags`, where `d2` is the sibling of `d1`. Meanwhile, Task 3002 is "killing" `d2`. It has `d2->d_lock` held and is at line 665, writing to `d2->d_flags`.

My initial thought was, is Task 3002 also holding `parent->d_lock`? But no, a quick review of `__dentry_kill(d2)` shows that at line 665, it *only* has `d2->d_lock` (and the inode lock), not `parent->d_lock`. Ah, but *wait*, if Task 3304 is holding `parent->d_lock`, could Task 3002 actually be killing a sibling? Yes, `dput` called on `d2` can indeed trigger `__dentry_kill(d2)`. That makes this much more interesting...

So, the race is between these two actions:

* Task 3002: `d2->d_flags &= ~DCACHE_LRU_LIST;` (line 504) - Clear a bit.
* Task 3304: `if (likely(!(next->d_flags & DCACHE_DENTRY_CURSOR)))` (line 640), where `next` is `d2` - Read a bit.

Both are accessing `d2->d_flags`. The problem is, there's no single lock held by *both* tasks that protects that access! Task 3002 holds `d2->d_lock`. Task 3304 holds `parent->d_lock` and `d1->d_lock`, but critically, it does *not* hold `d2->d_lock`.

Now, the critical question is, is this a harmful race? Well, `d2->d_flags` is essentially a set of flags – it's an `unsigned int`. Task 3002 is clearing the `DCACHE_LRU_LIST` flag, which I recall is bit 18. Task 3304 is checking the `DCACHE_DENTRY_CURSOR` flag. I need to confirm the value of `DCACHE_DENTRY_CURSOR`. If they are separate, unrelated bits, the race is likely harmless. Let me check the kernel sources... |
| 22/2 | 2026/02/02 13:08 | tool | codesearch-definition-source |
Error:requested entity does not exist Args: map[ContextFile:include/linux/dcache.h IncludeLines:true Name:DCACHE_DENTRY_CURSOR] Results: map[SourceCode: SourceFile:] |
| 23/2 | 2026/02/02 13:08 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 14567 output: 45 thoughts: 17 Thoughts: Here's my thought process, summarized as you requested: **Tracking Down `DCACHE_DENTRY_CURSOR`**

Okay, so I need to understand where `DCACHE_DENTRY_CURSOR` is defined. This is likely a macro or symbol related to the dcache, probably within the kernel. Let's start with the obvious: I'll need to use my preferred tools to search the relevant kernel source code. I'll probably start with `grep` or `ack` (or whatever my favourite code search tool is at the moment) across the entire kernel source tree, looking for that specific string. I'll need to keep an eye out for any potential redefinitions or conditional compilations.

I'm expecting this will point me to a header file, probably something in `include/linux/` or maybe under `fs/`. I'll pay attention to any context around the definition, the header guards, and any associated comments. That should give me a good understanding of what this thing actually *is* and how it's intended to be used. Hopefully it's not buried in some obscure build configuration or dynamically generated code! |
| 24/2 | 2026/02/02 13:08 | tool | codesearch-dir-index |
Args:map[Dir:include/linux/] Results: map[Files:[8250_pci.h acct.h acpi.h acpi_amd_wbrf.h acpi_dma.h acpi_iort.h acpi_mdio.h acpi_pmtmr.h acpi_rimt.h acpi_viot.h adb.h adfs_fs.h adi-axi-common.h adreno-smmu-priv.h adxl.h aer.h agp_backend.h agpgart.h ahci-remap.h ahci_platform.h aio.h alarmtimer.h alcor_pci.h align.h alloc_tag.h altera_jtaguart.h altera_uart.h amd-iommu.h amd-pmf-io.h annotate.h anon_inodes.h aperture.h apm-emulation.h apm_bios.h apple-gmux.h arch_topology.h args.h arm-cci.h arm-smccc.h arm_ffa.h arm_mpam.h arm_sdei.h armada-37xx-rwtm-mailbox.h array_size.h ascii85.h asn1.h asn1_ber_bytecode.h asn1_decoder.h asn1_encoder.h assoc_array.h assoc_array_priv.h async.h async_tx.h ata.h ata_platform.h atalk.h atm.h atm_tcp.h atmdev.h atmel-isc-media.h atmel-ssc.h atmel_pdc.h atomic.h attribute_container.h audit.h audit_arch.h auto_dev-ioctl.h auto_fs.h auxiliary_bus.h auxvec.h average.h backing-dev-defs.h backing-dev.h backing-file.h backlight.h badblocks.h balloon_compaction.h base64.h bcd.h bch.h bcm47xx_nvram.h bcm47xx_sprom.h bcm47xx_wdt.h bcm963xx_nvram.h bcm963xx_tag.h binfmts.h bio-integrity.h bio.h bit_spinlock.h bitfield.h bitmap-str.h bitmap.h bitops.h bitrev.h bits.h blk-cgroup.h blk-crypto-profile.h blk-crypto.h blk-integrity.h blk-mq-dma.h blk-mq.h blk-pm.h blk_types.h blkdev.h blkpg.h blktrace_api.h blockgroup_lock.h bma150.h bootconfig.h bootmem_info.h bottom_half.h bpf-cgroup-defs.h bpf-cgroup.h bpf-netns.h bpf.h bpf_crypto.h bpf_lirc.h bpf_local_storage.h bpf_lsm.h bpf_mem_alloc.h bpf_mprog.h bpf_trace.h bpf_types.h bpf_verifier.h bpfptr.h brcmphy.h bsearch.h bsg-lib.h bsg.h btf.h btf_ids.h btree-128.h btree-type.h btree.h btrfs.h buffer_head.h bug.h build-salt.h build_bug.h buildid.h bvec.h c2port.h cache.h cache_coherency.h cacheflush.h cacheinfo.h call_once.h capability.h cb710.h cc_platform.h cciss_ioctl.h ccp.h cdev.h cdrom.h cfag12864b.h cfi.h cfi_types.h cgroup-defs.h cgroup.h cgroup_api.h cgroup_dmem.h cgroup_namespace.h
cgroup_rdma.h cgroup_refcnt.h cgroup_subsys.h circ_buf.h cleanup.h clk-provider.h clk.h clkdev.h clockchips.h clocksource.h clocksource_ids.h closure.h cma.h cmpxchg-emu.h cn_proc.h cnt32_to_63.h coda.h codetag.h compaction.h compat.h compiler-clang.h compiler-gcc.h compiler-version.h compiler.h compiler_attributes.h compiler_types.h completion.h component.h configfs.h connector.h console.h console_struct.h consolemap.h const.h container.h container_of.h context_tracking.h context_tracking_irq.h context_tracking_state.h cookie.h cordic.h coredump.h coresight-pmu.h coresight-stm.h coresight.h count_zeros.h counter.h cper.h cpu.h cpu_cooling.h cpu_pm.h cpu_rmap.h cpu_smt.h cpufeature.h cpufreq.h cpuhotplug.h cpuhplock.h cpuidle.h cpuidle_haltpoll.h cpumask.h cpumask_api.h cpumask_types.h cpuset.h crash_core.h crash_dump.h crash_reserve.h crc-ccitt.h crc-itu-t.h crc-t10dif.h crc16.h crc32.h crc32c.h crc32poly.h crc4.h crc64.h crc7.h crc8.h cred.h crypto.h cs5535.h ctype.h cuda.h damon.h dasd_mod.h davinci_emac.h dax.h dca.h dcache.h dccp.h debug_locks.h debugfs.h debugobjects.h delay.h delayacct.h delayed_call.h dev_printk.h devcoredump.h devfreq-event.h devfreq-governor.h devfreq.h devfreq_cooling.h device-mapper.h device.h device_cgroup.h devm-helpers.h devpts_fs.h dfl.h dibs.h digsig.h dim.h dio.h dirent.h dlm.h dlm_plock.h dm-bufio.h dm-dirty-log.h dm-io.h dm-kcopyd.h dm-region-hash.h dm-verity-loadpin.h dm9000.h dma-buf-mapping.h dma-buf.h dma-direct.h dma-direction.h dma-fence-array.h dma-fence-chain.h dma-fence-unwrap.h dma-fence.h dma-heap.h dma-map-ops.h dma-mapping.h dma-resv.h dmaengine.h dmapool.h dmar.h dmi.h dnotify.h dns_resolver.h dpll.h dqblk_qtree.h dqblk_v1.h dqblk_v2.h drbd.h drbd_config.h drbd_genl.h drbd_genl_api.h drbd_limits.h ds2782_battery.h dtlk.h dtpm.h dw_apb_timer.h dynamic_debug.h dynamic_queue_limits.h earlycpio.h ecryptfs.h edac.h edd.h eeprom_93cx6.h efi-bgrt.h efi.h efi_embedded_fw.h efs_vh.h ehl_pse_io_aux.h eisa.h elf-fdpic.h 
elf-randomize.h elf.h elfcore-compat.h elfcore.h elfnote-lto.h elfnote.h enclosure.h energy_model.h entry-common.h entry-virt.h err.h errname.h errno.h error-injection.h errqueue.h errseq.h etherdevice.h ethtool.h ethtool_netlink.h eventfd.h eventpoll.h evm.h execmem.h export-internal.h export.h exportfs.h ext2_fs.h extable.h extcon-provider.h extcon.h f2fs_fs.h f75375s.h falloc.h fanotify.h fault-inject-usercopy.h fault-inject.h fb.h fbcon.h fcdevice.h fcntl.h fd.h fddidevice.h fdtable.h fec.h fiemap.h file.h file_ref.h fileattr.h filelock.h filter.h find.h fips.h firewire.h firmware-map.h firmware.h fixp-arith.h flat.h flex_proportions.h folio_queue.h font.h fortify-string.h fprobe.h fpu.h freezer.h fs.h fs_api.h fs_context.h fs_dirent.h fs_parser.h fs_pin.h fs_stack.h fs_struct.h fscache-cache.h fscache.h fscrypt.h fsi-occ.h fsi-sbefifo.h fsi.h fsl-diu-fb.h fsl_devices.h fsl_hypervisor.h fsl_ifc.h fsldma.h fsnotify.h fsnotify_backend.h fsverity.h ftrace.h ftrace_irq.h ftrace_regs.h futex.h fw_table.h fwctl.h fwnode.h fwnode_mdio.h gameport.h gcd.h genalloc.h generic-radix-tree.h genl_magic_func.h genl_magic_struct.h getcpu.h gfp.h gfp_api.h gfp_types.h glob.h gnss.h goldfish.h gpio-pxa.h gpio.h gpio_keys.h greybus.h group_cpus.h hardirq.h hash.h hashtable.h hashtable_api.h hdlc.h hdlcdrv.h hdmi.h hex.h hfs_common.h hid-debug.h hid-over-i2c.h hid-over-spi.h hid-roccat.h hid-sensor-hub.h hid-sensor-ids.h hid.h hid_bpf.h hidden.h hiddev.h hidraw.h highmem-internal.h highmem.h highuid.h hil.h hil_mlc.h hippidevice.h hisi_acc_qm.h hmm-dma.h hmm.h host1x.h host1x_context_bus.h hp_sdc.h hpet.h hrtimer.h hrtimer_api.h hrtimer_defs.h hrtimer_types.h hte.h huge_mm.h hugetlb.h hugetlb_cgroup.h hugetlb_inline.h hung_task.h hw_bitfield.h hw_breakpoint.h hw_random.h hwmon-sysfs.h hwmon-vid.h hwmon.h hwspinlock.h hyperv.h hypervisor.h i2c-algo-bit.h i2c-algo-pca.h i2c-algo-pcf.h i2c-atr.h i2c-dev.h i2c-mux.h i2c-of-prober.h i2c-smbus.h i2c.h i8042.h i8253.h i8254.h icmp.h 
icmpv6.h idle_inject.h idr.h ieee80211-eht.h ieee80211-he.h ieee80211-ht.h ieee80211-mesh.h ieee80211-nan.h ieee80211-p2p.h ieee80211-s1g.h ieee80211-vht.h ieee80211.h ieee802154.h if_arp.h if_bridge.h if_eql.h if_ether.h if_fddi.h if_hsr.h if_link.h if_macvlan.h if_phonet.h if_pppol2tp.h if_pppox.h if_rmnet.h if_tap.h if_team.h if_tun.h if_tunnel.h if_vlan.h igmp.h ihex.h ima.h imx-media.h in.h in6.h indirect_call_wrapper.h inet.h inet_diag.h inetdevice.h init.h init_ohci1394_dma.h init_syscalls.h init_task.h initrd.h inotify.h input.h instruction_pointer.h instrumentation.h instrumented.h int_log.h integrity.h intel-ish-client-if.h intel_dg_nvm_aux.h intel_pmt_features.h intel_rapl.h intel_tcc.h intel_th.h intel_tpmi.h intel_vsec.h interconnect-clk.h interconnect-provider.h interconnect.h interrupt.h interval_tree.h interval_tree_generic.h io-64-nonatomic-hi-lo.h io-64-nonatomic-lo-hi.h io-mapping.h io-pgtable.h io.h io_uring.h io_uring_types.h ioam6.h ioam6_genl.h ioam6_iptunnel.h iocontext.h iomap.h iommu-dma.h iommu-helper.h iommu.h iommufd.h iopoll.h ioport.h ioprio.h ioremap.h iosys-map.h iov_iter.h iova.h iova_bitmap.h ip.h ipack.h ipc.h ipc_namespace.h ipmi.h ipmi_smi.h ipv6.h ipv6_route.h irq-entry-common.h irq.h irq_poll.h irq_sim.h irq_work.h irq_work_types.h irqbypass.h irqchip.h irqdesc.h irqdomain.h irqdomain_defs.h irqflags.h irqflags_types.h irqhandler.h irqnr.h irqreturn.h isa-dma.h isa.h isapnp.h iscsi_boot_sysfs.h iscsi_ibft.h ism.h iversion.h jbd2.h jhash.h jiffies.h journal-head.h joystick.h jump_label.h jump_label_ratelimit.h jz4740-adc.h jz4780-nemc.h kallsyms.h kasan-checks.h kasan-enabled.h kasan-tags.h kasan.h kbd_diacr.h kbd_kern.h kbuild.h kconfig.h kcore.h kcov.h kcsan-checks.h kcsan.h kdb.h kdebug.h kdev_t.h kern_levels.h kernel-page-flags.h kernel.h kernel_read_file.h kernel_stat.h kernelcapi.h kernfs.h kexec.h kexec_handover.h key-type.h key.h keyboard.h keyctl.h kfence.h kfifo.h kgdb.h khugepaged.h klist.h kmemleak.h kmod.h 
kmsan-checks.h kmsan.h kmsan_string.h kmsan_types.h kmsg_dump.h kobj_map.h kobject.h kobject_api.h kobject_ns.h kprobes.h kref.h kref_api.h ks0108.h ks8842.h ks8851_mll.h ksm.h kstack_erase.h kstrtox.h kthread.h ktime.h ktime_api.h kvm_dirty_ring.h kvm_host.h kvm_irqfd.h kvm_para.h kvm_types.h l2tp.h lantiq.h lapb.h latencytop.h lcd.h lcm.h leafops.h led-class-flash.h led-class-multicolor.h led-lm3530.h leds-bd2802.h leds-expresswire.h leds-lp3944.h leds-lp3952.h leds-pca9532.h leds-regulator.h leds-ti-lmu-common.h leds.h libata.h libfdt.h libfdt_env.h libgcc.h libnvdimm.h libps2.h license.h limits.h linear_range.h linkage.h linkmode.h linux_logo.h lis3lv02d.h list.h list_bl.h list_lru.h list_nulls.h list_sort.h litex.h livepatch.h livepatch_external.h livepatch_helpers.h livepatch_sched.h liveupdate.h llc.h llist.h llist_api.h local_lock.h local_lock_internal.h lockdep.h lockdep_api.h lockdep_types.h lockref.h log2.h logic_iomem.h logic_pio.h lp.h lru_cache.h lsm_audit.h lsm_count.h lsm_hook_defs.h lsm_hooks.h lwq.h lz4.h lzo.h mISDNdsp.h mISDNhw.h mISDNif.h mailbox_client.h mailbox_controller.h maple.h maple_tree.h marvell_phy.h math.h math64.h mbcache.h mbus.h mc146818rtc.h mc33xs2410.h mc6821.h mcb.h mdev.h mdio-bitbang.h mdio-gpio.h mdio-mux.h mdio.h mei_aux.h mei_cl_bus.h mem_encrypt.h memblock.h memcontrol.h memfd.h memory-failure.h memory-tiers.h memory.h memory_hotplug.h mempolicy.h mempool.h memregion.h memremap.h memstick.h mhi.h mhi_ep.h micrel_phy.h microchipphy.h migrate.h migrate_mode.h mii.h mii_timestamper.h min_heap.h minmax.h misc_cgroup.h miscdevice.h mm.h mm_api.h mm_inline.h mm_types.h mm_types_task.h mman.h mmap_lock.h mmdebug.h mmiotrace.h mmu_context.h mmu_notifier.h mmzone.h mnt_idmapping.h mnt_namespace.h mod_devicetable.h module.h module_signature.h module_symbol.h moduleloader.h moduleparam.h most.h mount.h moxtet.h mpage.h mpi.h mpls.h mpls_iptunnel.h mroute.h mroute6.h mroute_base.h msdos_fs.h msdos_partition.h msg.h msi.h msi_api.h 
mtio.h mutex.h mutex_api.h mutex_types.h mv643xx_eth.h mv643xx_i2c.h mvebu-pmsu.h mxm-wmi.h namei.h nd.h ndctl.h net.h net_tstamp.h netdev_features.h netdevice.h netdevice_xmit.h netfilter.h netfilter_bridge.h netfilter_defs.h netfilter_ipv4.h netfilter_ipv6.h netfilter_netdev.h netfs.h netlink.h netpoll.h nfs.h nfs3.h nfs4.h nfs_common.h nfs_fs.h nfs_fs_i.h nfs_fs_sb.h nfs_iostat.h nfs_page.h nfs_ssc.h nfs_xdr.h nfsacl.h nfslocalio.h nitro_enclaves.h nl802154.h nls.h nmi.h node.h nodemask.h nodemask_types.h nospec.h notifier.h ns_common.h nsc_gpio.h nsfs.h nsproxy.h nstree.h ntb.h ntb_transport.h nubus.h numa.h numa_memblks.h nvme-auth.h nvme-fc-driver.h nvme-fc.h nvme-keyring.h nvme-rdma.h nvme-tcp.h nvme.h nvmem-consumer.h nvmem-provider.h nvram.h oa_tc6.h objagg.h objpool.h objtool.h objtool_types.h of.h of_address.h of_clk.h of_device.h of_dma.h of_fdt.h of_gpio.h of_graph.h of_iommu.h of_irq.h of_mdio.h of_net.h of_pci.h of_pdt.h of_platform.h of_reserved_mem.h oid_registry.h olpc-ec.h omap-dma.h omap-gpmc.h omap-iommu.h omap-mailbox.h omapfb.h once.h once_lite.h oom.h openvswitch.h osq_lock.h overflow.h packing.h padata.h page-flags-layout.h page-flags.h page-isolation.h page_counter.h page_ext.h page_frag_cache.h page_idle.h page_owner.h page_ref.h page_reporting.h page_table_check.h pageblock-flags.h pagemap.h pagevec.h pagewalk.h panic.h panic_notifier.h papr_scm.h parman.h parport.h parport_pc.h parser.h part_stat.h pata_arasan_cf_data.h patchkey.h path.h pch_dma.h pci-acpi.h pci-ats.h pci-bwctrl.h pci-doe.h pci-ecam.h pci-ep-cfs.h pci-ep-msi.h pci-epc.h pci-epf.h pci-ide.h pci-p2pdma.h pci-pwrctrl.h pci-tph.h pci-tsm.h pci.h pci_hotplug.h pci_ids.h pcie-dwc.h pcs-lynx.h pcs-rzn1-miic.h pe.h peci-cpu.h peci.h percpu-defs.h percpu-refcount.h percpu-rwsem.h percpu.h percpu_counter.h perf_event.h perf_event_api.h perf_regs.h personality.h pfn.h pgalloc.h pgalloc_tag.h pgtable.h pgtable_api.h phonet.h phy.h phy_fixed.h phy_led_triggers.h phy_link_topology.h 
phylib_stubs.h phylink.h pid.h pid_namespace.h pid_types.h pidfs.h pim.h pipe_fs_i.h pkeys.h pl320-ipc.h platform_device.h platform_profile.h pldmfw.h plist.h plist_types.h pm-trace.h pm.h pm_clock.h pm_domain.h pm_opp.h pm_qos.h pm_runtime.h pm_wakeirq.h pm_wakeup.h pmbus.h pmu.h pnp.h poison.h poll.h polynomial.h posix-clock.h posix-timers.h posix-timers_types.h posix_acl.h posix_acl_xattr.h power_supply.h powercap.h ppp-comp.h ppp_channel.h ppp_defs.h pps_gen_kernel.h pps_kernel.h pr.h prandom.h preempt.h prefetch.h prime_numbers.h printk.h prmt.h proc_fs.h proc_ns.h processor.h profile.h projid.h property.h pruss_driver.h psci.h pseudo_fs.h psi.h psi_types.h psp-platform-access.h psp-sev.h psp-tee.h psp.h pstore.h pstore_blk.h pstore_ram.h pstore_zone.h ptdump.h pti.h ptp_classify.h ptp_clock_kernel.h ptp_kvm.h ptp_mock.h ptp_pch.h ptr_ring.h ptrace.h ptrace_api.h purgatory.h pvclock_gtod.h pwm.h pwm_backlight.h pxa168_eth.h pxa2xx_ssp.h qnx6_fs.h quota.h quotaops.h radix-tree.h raid_class.h ramfs.h random.h randomize_kstack.h range.h ras.h ratelimit.h ratelimit_types.h rational.h rbtree.h rbtree_augmented.h rbtree_latch.h rbtree_types.h rcu_node_tree.h rcu_notifier.h rcu_segcblist.h rcu_sync.h rculist.h rculist_bl.h rculist_nulls.h rcupdate.h rcupdate_trace.h rcupdate_wait.h rcuref.h rcutiny.h rcutree.h rcuwait.h rcuwait_api.h reboot-mode.h reboot.h reciprocal_div.h ref_tracker.h refcount.h refcount_api.h refcount_types.h regmap.h regset.h relay.h remoteproc.h resctrl.h resctrl_types.h reset-controller.h reset.h resource.h resource_ext.h restart_block.h resume_user_mode.h rethook.h rfkill.h rhashtable-types.h rhashtable.h ring_buffer.h rio.h rio_drv.h rio_ids.h rio_regs.h rmap.h rmi.h rndis.h rodata_test.h rolling_buffer.h root_dev.h rpmb.h rpmsg.h rseq.h rseq_entry.h rseq_types.h rslib.h rtc.h rtmutex.h rtnetlink.h rtsx_common.h rtsx_pci.h rtsx_usb.h rv.h rw_hint.h rwbase_rt.h rwlock.h rwlock_api_smp.h rwlock_rt.h rwlock_types.h rwsem.h sbitmap.h 
scatterlist.h scc.h sched.h sched_clock.h scmi_imx_protocol.h scmi_protocol.h scpi_protocol.h screen_info.h scs.h sctp.h scx200.h scx200_gpio.h seccomp.h seccomp_types.h secretmem.h securebits.h security.h sed-opal-key.h sed-opal.h seg6.h seg6_genl.h seg6_hmac.h seg6_iptunnel.h seg6_local.h selection.h sem.h sem_types.h semaphore.h seq_buf.h seq_file.h seq_file_net.h seqlock.h seqlock_api.h seqlock_types.h serdev.h serial.h serial_8250.h serial_bcm63xx.h serial_core.h serial_s3c.h serial_sci.h serio.h set_memory.h sfp.h sh_clk.h sh_dma.h sh_eth.h sh_intc.h sh_timer.h shdma-base.h shm.h shmem_fs.h shrinker.h signal.h signal_types.h signalfd.h siox.h siphash.h sizes.h skb_array.h skbuff.h skbuff_ref.h skmsg.h slab.h slimbus.h sm501-regs.h sm501.h smc91x.h smp.h smp_types.h smpboot.h smsc911x.h smscphy.h sock_diag.h socket.h sockptr.h softirq.h sonet.h sonypi.h sort.h sound.h soundcard.h spinlock.h spinlock_api.h spinlock_api_smp.h spinlock_api_up.h spinlock_rt.h spinlock_types.h spinlock_types_raw.h spinlock_types_up.h spinlock_up.h splice.h spmi.h sprintf.h sram.h srcu.h srcutiny.h srcutree.h ssbi.h stackdepot.h stackprotector.h stacktrace.h start_kernel.h stat.h statfs.h static_call.h static_call_types.h static_key.h stdarg.h stddef.h stm.h stmmac.h stmp3xxx_rtc_wdt.h stmp_device.h stop_machine.h string.h string_choices.h string_helpers.h stringhash.h stringify.h sungem_phy.h sunserialcore.h sunxi-rsb.h surface_acpi_notify.h suspend.h svga.h sw842.h swab.h swait.h swait_api.h swap.h swap_cgroup.h swapfile.h swapops.h swiotlb.h switchtec.h sxgbe_platform.h sync_core.h sync_file.h synclink.h sys.h sys_info.h sys_soc.h syscall_user_dispatch.h syscall_user_dispatch_types.h syscalls.h syscalls_api.h syscore_ops.h sysctl.h sysfb.h sysfs.h syslog.h sysrq.h t10-pi.h task_io_accounting.h task_io_accounting_ops.h task_work.h taskstats_kern.h tboot.h tc.h tcp.h tee_core.h tee_drv.h tegra-icc.h termios_internal.h text-patching.h textsearch.h textsearch_fsm.h thermal.h 
thread_info.h threads.h thunderbolt.h ti-emif-sram.h ti_wilink_st.h tick.h tifm.h timb_dma.h timb_gpio.h time.h time32.h time64.h time_namespace.h timecounter.h timekeeper_internal.h timekeeping.h timer.h timer_types.h timerfd.h timeriomem-rng.h timerqueue.h timerqueue_types.h timex.h tnum.h topology.h torture.h toshiba.h tpm.h tpm_command.h tpm_eventlog.h tpm_svsm.h trace.h trace_clock.h trace_events.h trace_recursion.h trace_seq.h tracefs.h tracepoint-defs.h tracepoint.h transport_class.h ts-nbus.h tsacct_kern.h tsm-mr.h tsm.h tty.h tty_buffer.h tty_driver.h tty_flip.h tty_ldisc.h tty_port.h turris-omnia-mcu-interface.h turris-signing-key.h typecheck.h types.h u64_stats_sync.h u64_stats_sync_api.h uacce.h uaccess.h ubsan.h ucopysize.h ucs2_string.h udp.h uidgid.h uidgid_types.h uio.h uio_driver.h umh.h unaligned.h unicode.h union_find.h units.h unroll.h unwind_deferred.h unwind_deferred_types.h unwind_user.h unwind_user_types.h uprobes.h usb.h usb_usual.h usbdevice_fs.h user-return-notifier.h user.h user_events.h user_namespace.h userfaultfd_k.h util_macros.h uts.h uts_namespace.h utsname.h uuid.h vbox_utils.h vdpa.h vdso_datastore.h verification.h vermagic.h vexpress.h vfio.h vfio_pci_core.h vfs.h vfsdebug.h vga_switcheroo.h vgaarb.h vhost_iotlb.h via-core.h via.h via_i2c.h videodev2.h virtio.h virtio_anchor.h virtio_byteorder.h virtio_caif.h virtio_config.h virtio_dma_buf.h virtio_features.h virtio_net.h virtio_pci_admin.h virtio_pci_legacy.h virtio_pci_modern.h virtio_ring.h virtio_vsock.h vm_event_item.h vmalloc.h vmcore_info.h vmpressure.h vmstat.h vmw_vmci_api.h vmw_vmci_defs.h vringh.h vt.h vt_buffer.h vt_kern.h vtime.h w1.h wait.h wait_api.h wait_bit.h watch_queue.h watchdog.h win_minmax.h wireless.h wkup_m3_ipc.h wm97xx.h wmi.h wordpart.h workqueue.h workqueue_api.h workqueue_types.h writeback.h ww_mutex.h wwan.h xarray.h xattr.h xxhash.h xz.h yam.h zconf.h zlib.h zorro.h zsmalloc.h zstd.h zstd_errors.h zstd_lib.h zswap.h zutil.h] Subdirs:[amba atomic 
avf bcma bnxt bus byteorder can cdx ceph clk comedi crush decompress device dma dma-buf dsa extcon firmware fpga framer fs fsl generic_pt gpio greybus habanalabs hsi i3c iio input io_uring irqchip isdn kho lockd lsm mailbox mdio memory mfd misc mlx4 mlx5 mmc mtd mux net netfilter netfilter_arp netfilter_bridge netfilter_ipv4 netfilter_ipv6 ns pcs pds perf phy pinctrl platform_data power pse-pd pwrseq qat qed raid raspberrypi regulator remoteproc reset rpmsg rtc sched soc soundwire spi ssb sunrpc surface_aggregator ulpi unaligned usb]] |
| 25/2 | 2026/02/02 13:08 | tool | read-file |
Args:map[File:include/linux/dcache.h] Results: map[Contents:/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __LINUX_DCACHE_H
#define __LINUX_DCACHE_H
#include <linux/atomic.h>
#include <linux/list.h>
#include <linux/math.h>
#include <linux/rculist.h>
#include <linux/rculist_bl.h>
#include <linux/spinlock.h>
#include <linux/seqlock.h>
#include <linux/cache.h>
#include <linux/rcupdate.h>
#include <linux/lockref.h>
#include <linux/stringhash.h>
#include <linux/wait.h>
struct path;
struct file;
struct vfsmount;
/*
* linux/include/linux/dcache.h
*
* Dirent cache data structures
*
* (C) Copyright 1997 Thomas Schoebel-Theuer,
* with heavy changes by Linus Torvalds
*/
#define IS_ROOT(x) ((x) == (x)->d_parent)
/* The hash is always the low bits of hash_len */
#ifdef __LITTLE_ENDIAN
#define HASH_LEN_DECLARE u32 hash; u32 len
#define bytemask_from_count(cnt) (~(~0ul << (cnt)*8))
#else
#define HASH_LEN_DECLARE u32 len; u32 hash
#define bytemask_from_count(cnt) (~(~0ul >> (cnt)*8))
#endif
/*
* "quick string" -- eases parameter passing, but more importantly
* saves "metadata" about the string (ie length and the hash).
*
* hash comes first so it snuggles against d_parent in the
* dentry.
*/
struct qstr {
union {
struct {
HASH_LEN_DECLARE;
};
u64 hash_len;
};
const unsigned char *name;
};
#define QSTR_INIT(n,l) { { { .len = l } }, .name = n }
#define QSTR_LEN(n,l) (struct qstr)QSTR_INIT(n,l)
#define QSTR(n) QSTR_LEN(n, strlen(n))
extern const struct qstr empty_name;
extern const struct qstr slash_name;
extern const struct qstr dotdot_name;
/*
* Try to keep struct dentry aligned on 64 byte cachelines (this will
* give reasonable cacheline footprint with larger lines without the
* large memory footprint increase).
*/
#ifdef CONFIG_64BIT
# define DNAME_INLINE_WORDS 5 /* 192 bytes */
#else
# ifdef CONFIG_SMP
# define DNAME_INLINE_WORDS 9 /* 128 bytes */
# else
# define DNAME_INLINE_WORDS 11 /* 128 bytes */
# endif
#endif
#define DNAME_INLINE_LEN (DNAME_INLINE_WORDS*sizeof(unsigned long))
union shortname_store {
unsigned char string[DNAME_INLINE_LEN];
unsigned long words[DNAME_INLINE_WORDS];
};
#define d_lock d_lockref.lock
#define d_iname d_shortname.string
struct dentry {
/* RCU lookup touched fields */
unsigned int d_flags; /* protected by d_lock */
seqcount_spinlock_t d_seq; /* per dentry seqlock */
struct hlist_bl_node d_hash; /* lookup hash list */
struct dentry *d_parent; /* parent directory */
union {
struct qstr __d_name; /* for use ONLY in fs/dcache.c */
const struct qstr d_name;
};
struct inode *d_inode; /* Where the name belongs to - NULL is
* negative */
union shortname_store d_shortname;
/* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
/* Ref lookup also touches following */
const struct dentry_operations *d_op;
struct super_block *d_sb; /* The root of the dentry tree */
unsigned long d_time; /* used by d_revalidate */
void *d_fsdata; /* fs-specific data */
/* --- cacheline 2 boundary (128 bytes) --- */
struct lockref d_lockref; /* per-dentry lock and refcount
* keep separate from RCU lookup area if
* possible!
*/
union {
struct list_head d_lru; /* LRU list */
wait_queue_head_t *d_wait; /* in-lookup ones only */
};
struct hlist_node d_sib; /* child of parent list */
struct hlist_head d_children; /* our children */
/*
* d_alias and d_rcu can share memory
*/
union {
struct hlist_node d_alias; /* inode alias list */
struct hlist_bl_node d_in_lookup_hash; /* only for in-lookup ones */
struct rcu_head d_rcu;
} d_u;
};
/*
* dentry->d_lock spinlock nesting subclasses:
*
* 0: normal
* 1: nested
*/
enum dentry_d_lock_class
{
DENTRY_D_LOCK_NORMAL, /* implicitly used by plain spin_lock() APIs. */
DENTRY_D_LOCK_NESTED
};
enum d_real_type {
D_REAL_DATA,
D_REAL_METADATA,
};
struct dentry_operations {
int (*d_revalidate)(struct inode *, const struct qstr *,
struct dentry *, unsigned int);
int (*d_weak_revalidate)(struct dentry *, unsigned int);
int (*d_hash)(const struct dentry *, struct qstr *);
int (*d_compare)(const struct dentry *,
unsigned int, const char *, const struct qstr *);
int (*d_delete)(const struct dentry *);
int (*d_init)(struct dentry *);
void (*d_release)(struct dentry *);
void (*d_prune)(struct dentry *);
void (*d_iput)(struct dentry *, struct inode *);
char *(*d_dname)(struct dentry *, char *, int);
struct vfsmount *(*d_automount)(struct path *);
int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, enum d_real_type type);
bool (*d_unalias_trylock)(const struct dentry *);
void (*d_unalias_unlock)(const struct dentry *);
} ____cacheline_aligned;
/*
* Locking rules for dentry_operations callbacks are to be found in
* Documentation/filesystems/locking.rst. Keep it updated!
*
* FUrther descriptions are found in Documentation/filesystems/vfs.rst.
* Keep it updated too!
*/
/* d_flags entries */
enum dentry_flags {
DCACHE_OP_HASH = BIT(0),
DCACHE_OP_COMPARE = BIT(1),
DCACHE_OP_REVALIDATE = BIT(2),
DCACHE_OP_DELETE = BIT(3),
DCACHE_OP_PRUNE = BIT(4),
/*
* This dentry is possibly not currently connected to the dcache tree,
* in which case its parent will either be itself, or will have this
* flag as well. nfsd will not use a dentry with this bit set, but will
* first endeavour to clear the bit either by discovering that it is
* connected, or by performing lookup operations. Any filesystem which
* supports nfsd_operations MUST have a lookup function which, if it
* finds a directory inode with a DCACHE_DISCONNECTED dentry, will
* d_move that dentry into place and return that dentry rather than the
* passed one, typically using d_splice_alias.
*/
DCACHE_DISCONNECTED = BIT(5),
DCACHE_REFERENCED = BIT(6), /* Recently used, don't discard. */
DCACHE_DONTCACHE = BIT(7), /* Purge from memory on final dput() */
DCACHE_CANT_MOUNT = BIT(8),
DCACHE_SHRINK_LIST = BIT(10),
DCACHE_OP_WEAK_REVALIDATE = BIT(11),
/*
* this dentry has been "silly renamed" and has to be deleted on the
* last dput()
*/
DCACHE_NFSFS_RENAMED = BIT(12),
DCACHE_FSNOTIFY_PARENT_WATCHED = BIT(13), /* Parent inode is watched by some fsnotify listener */
DCACHE_DENTRY_KILLED = BIT(14),
DCACHE_MOUNTED = BIT(15), /* is a mountpoint */
DCACHE_NEED_AUTOMOUNT = BIT(16), /* handle automount on this dir */
DCACHE_MANAGE_TRANSIT = BIT(17), /* manage transit from this dirent */
DCACHE_LRU_LIST = BIT(18),
DCACHE_ENTRY_TYPE = (7 << 19), /* bits 19..21 are for storing type: */
DCACHE_MISS_TYPE = (0 << 19), /* Negative dentry */
DCACHE_WHITEOUT_TYPE = (1 << 19), /* Whiteout dentry (stop pathwalk) */
DCACHE_DIRECTORY_TYPE = (2 << 19), /* Normal directory */
DCACHE_AUTODIR_TYPE = (3 << 19), /* Lookupless directory (presumed automount) */
DCACHE_REGULAR_TYPE = (4 << 19), /* Regular file type */
DCACHE_SPECIAL_TYPE = (5 << 19), /* Other file type */
DCACHE_SYMLINK_TYPE = (6 << 19), /* Symlink */
DCACHE_NOKEY_NAME = BIT(22), /* Encrypted name encoded without key */
DCACHE_OP_REAL = BIT(23),
DCACHE_PAR_LOOKUP = BIT(24), /* being looked up (with parent locked shared) */
DCACHE_DENTRY_CURSOR = BIT(25),
DCACHE_NORCU = BIT(26), /* No RCU delay for freeing */
DCACHE_PERSISTENT = BIT(27)
};
#define DCACHE_MANAGED_DENTRY \
(DCACHE_MOUNTED|DCACHE_NEED_AUTOMOUNT|DCACHE_MANAGE_TRANSIT)
extern seqlock_t rename_lock;
/*
* These are the low-level FS interfaces to the dcache..
*/
extern void d_instantiate(struct dentry *, struct inode *);
extern void d_instantiate_new(struct dentry *, struct inode *);
extern void __d_drop(struct dentry *dentry);
extern void d_drop(struct dentry *dentry);
extern void d_delete(struct dentry *);
/* allocate/de-allocate */
extern struct dentry * d_alloc(struct dentry *, const struct qstr *);
extern struct dentry * d_alloc_anon(struct super_block *);
extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *,
wait_queue_head_t *);
extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
/* weird procfs mess; *NOT* exported */
extern struct dentry * d_splice_alias_ops(struct inode *, struct dentry *,
const struct dentry_operations *);
extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *);
extern bool d_same_name(const struct dentry *dentry, const struct dentry *parent,
const struct qstr *name);
extern struct dentry *d_find_any_alias(struct inode *inode);
extern struct dentry * d_obtain_alias(struct inode *);
extern struct dentry * d_obtain_root(struct inode *);
extern void shrink_dcache_sb(struct super_block *);
extern void shrink_dcache_parent(struct dentry *);
extern void d_invalidate(struct dentry *);
/* only used at mount-time */
extern struct dentry * d_make_root(struct inode *);
extern void d_mark_tmpfile(struct file *, struct inode *);
extern void d_tmpfile(struct file *, struct inode *);
extern struct dentry *d_find_alias(struct inode *);
extern void d_prune_aliases(struct inode *);
extern void d_dispose_if_unused(struct dentry *, struct list_head *);
extern void shrink_dentry_list(struct list_head *);
extern struct dentry *d_find_alias_rcu(struct inode *);
/* test whether we have any submounts in a subdir tree */
extern int path_has_submounts(const struct path *);
/*
* This adds the entry to the hash queues.
*/
extern void d_rehash(struct dentry *);
extern void d_add(struct dentry *, struct inode *);
/* used for rename() and baskets */
extern void d_move(struct dentry *, struct dentry *);
extern void d_exchange(struct dentry *, struct dentry *);
extern struct dentry *d_ancestor(struct dentry *, struct dentry *);
extern struct dentry *d_lookup(const struct dentry *, const struct qstr *);
static inline unsigned d_count(const struct dentry *dentry)
{
return dentry->d_lockref.count;
}
ino_t d_parent_ino(struct dentry *dentry);
/*
* helper function for dentry_operations.d_dname() members
*/
extern __printf(3, 4)
char *dynamic_dname(char *, int, const char *, ...);
extern char *__d_path(const struct path *, const struct path *, char *, int);
extern char *d_absolute_path(const struct path *, char *, int);
extern char *d_path(const struct path *, char *, int);
extern char *dentry_path_raw(const struct dentry *, char *, int);
extern char *dentry_path(const struct dentry *, char *, int);
/* Allocation counts.. */
/**
* dget_dlock - get a reference to a dentry
* @dentry: dentry to get a reference to
*
* Given a live dentry, increment the reference count and return the dentry.
* Caller must hold @dentry->d_lock. Making sure that dentry is alive is
caller's responsibility. There are many conditions sufficient to guarantee
* that; e.g. anything with non-negative refcount is alive, so's anything
* hashed, anything positive, anyone's parent, etc.
*/
static inline struct dentry *dget_dlock(struct dentry *dentry)
{
dentry->d_lockref.count++;
return dentry;
}
/**
* dget - get a reference to a dentry
* @dentry: dentry to get a reference to
*
* Given a dentry or %NULL pointer increment the reference count
* if appropriate and return the dentry. A dentry will not be
* destroyed when it has references. Conversely, a dentry with
* no references can disappear for any number of reasons, starting
* with memory pressure. In other words, that primitive is
* used to clone an existing reference; using it on something with
* zero refcount is a bug.
*
* NOTE: it will spin if @dentry->d_lock is held. From the deadlock
* avoidance point of view it is equivalent to spin_lock()/increment
* refcount/spin_unlock(), so calling it under @dentry->d_lock is
always a bug; so's calling it under ->d_lock on any of its descendants.
*
*/
static inline struct dentry *dget(struct dentry *dentry)
{
if (dentry)
lockref_get(&dentry->d_lockref);
return dentry;
}
extern struct dentry *dget_parent(struct dentry *dentry);
/**
* d_unhashed - is dentry hashed
* @dentry: entry to check
*
* Returns true if the dentry passed is not currently hashed.
*/
static inline int d_unhashed(const struct dentry *dentry)
{
return hlist_bl_unhashed(&dentry->d_hash);
}
static inline int d_unlinked(const struct dentry *dentry)
{
return d_unhashed(dentry) && !IS_ROOT(dentry);
}
static inline int cant_mount(const struct dentry *dentry)
{
return (dentry->d_flags & DCACHE_CANT_MOUNT);
}
static inline void dont_mount(struct dentry *dentry)
{
spin_lock(&dentry->d_lock);
dentry->d_flags |= DCACHE_CANT_MOUNT;
spin_unlock(&dentry->d_lock);
}
extern void __d_lookup_unhash_wake(struct dentry *dentry);
static inline int d_in_lookup(const struct dentry *dentry)
{
return dentry->d_flags & DCACHE_PAR_LOOKUP;
}
static inline void d_lookup_done(struct dentry *dentry)
{
if (unlikely(d_in_lookup(dentry)))
__d_lookup_unhash_wake(dentry);
}
extern void dput(struct dentry *);
static inline bool d_managed(const struct dentry *dentry)
{
return dentry->d_flags & DCACHE_MANAGED_DENTRY;
}
static inline bool d_mountpoint(const struct dentry *dentry)
{
return dentry->d_flags & DCACHE_MOUNTED;
}
/*
* Directory cache entry type accessor functions.
*/
static inline unsigned __d_entry_type(const struct dentry *dentry)
{
return dentry->d_flags & DCACHE_ENTRY_TYPE;
}
static inline bool d_is_miss(const struct dentry *dentry)
{
return __d_entry_type(dentry) == DCACHE_MISS_TYPE;
}
static inline bool d_is_whiteout(const struct dentry *dentry)
{
return __d_entry_type(dentry) == DCACHE_WHITEOUT_TYPE;
}
static inline bool d_can_lookup(const struct dentry *dentry)
{
return __d_entry_type(dentry) == DCACHE_DIRECTORY_TYPE;
}
static inline bool d_is_autodir(const struct dentry *dentry)
{
return __d_entry_type(dentry) == DCACHE_AUTODIR_TYPE;
}
static inline bool d_is_dir(const struct dentry *dentry)
{
return d_can_lookup(dentry) || d_is_autodir(dentry);
}
static inline bool d_is_symlink(const struct dentry *dentry)
{
return __d_entry_type(dentry) == DCACHE_SYMLINK_TYPE;
}
static inline bool d_is_reg(const struct dentry *dentry)
{
return __d_entry_type(dentry) == DCACHE_REGULAR_TYPE;
}
static inline bool d_is_special(const struct dentry *dentry)
{
return __d_entry_type(dentry) == DCACHE_SPECIAL_TYPE;
}
static inline bool d_is_file(const struct dentry *dentry)
{
return d_is_reg(dentry) || d_is_special(dentry);
}
static inline bool d_is_negative(const struct dentry *dentry)
{
// TODO: check d_is_whiteout(dentry) also.
return d_is_miss(dentry);
}
static inline bool d_flags_negative(unsigned flags)
{
return (flags & DCACHE_ENTRY_TYPE) == DCACHE_MISS_TYPE;
}
static inline bool d_is_positive(const struct dentry *dentry)
{
return !d_is_negative(dentry);
}
/**
* d_really_is_negative - Determine if a dentry is really negative (ignoring fallthroughs)
* @dentry: The dentry in question
*
* Returns true if the dentry represents either an absent name or a name that
* doesn't map to an inode (ie. ->d_inode is NULL). The dentry could represent
* a true miss, a whiteout that isn't represented by a 0,0 chardev or a
* fallthrough marker in an opaque directory.
*
* Note! (1) This should be used *only* by a filesystem to examine its own
* dentries. It should not be used to look at some other filesystem's
* dentries. (2) It should also be used in combination with d_inode() to get
* the inode. (3) The dentry may have something attached to ->d_lower and the
* type field of the flags may be set to something other than miss or whiteout.
*/
static inline bool d_really_is_negative(const struct dentry *dentry)
{
return dentry->d_inode == NULL;
}
/**
* d_really_is_positive - Determine if a dentry is really positive (ignoring fallthroughs)
* @dentry: The dentry in question
*
* Returns true if the dentry represents a name that maps to an inode
* (ie. ->d_inode is not NULL). The dentry might still represent a whiteout if
* that is represented on medium as a 0,0 chardev.
*
* Note! (1) This should be used *only* by a filesystem to examine its own
* dentries. It should not be used to look at some other filesystem's
* dentries. (2) It should also be used in combination with d_inode() to get
* the inode.
*/
static inline bool d_really_is_positive(const struct dentry *dentry)
{
return dentry->d_inode != NULL;
}
static inline int simple_positive(const struct dentry *dentry)
{
return d_really_is_positive(dentry) && !d_unhashed(dentry);
}
unsigned long vfs_pressure_ratio(unsigned long val);
/**
* d_inode - Get the actual inode of this dentry
* @dentry: The dentry to query
*
* This is the helper normal filesystems should use to get at their own inodes
* in their own dentries and ignore the layering superimposed upon them.
*/
static inline struct inode *d_inode(const struct dentry *dentry)
{
return dentry->d_inode;
}
/**
* d_inode_rcu - Get the actual inode of this dentry with READ_ONCE()
* @dentry: The dentry to query
*
* This is the helper normal filesystems should use to get at their own inodes
* in their own dentries and ignore the layering superimposed upon them.
*/
static inline struct inode *d_inode_rcu(const struct dentry *dentry)
{
return READ_ONCE(dentry->d_inode);
}
/**
* d_backing_inode - Get upper or lower inode we should be using
* @upper: The upper layer
*
* This is the helper that should be used to get at the inode that will be used
* if this dentry were to be opened as a file. The inode may be on the upper
* dentry or it may be on a lower dentry pinned by the upper.
*
* Normal filesystems should not use this to access their own inodes.
*/
static inline struct inode *d_backing_inode(const struct dentry *upper)
{
struct inode *inode = upper->d_inode;
return inode;
}
/**
* d_real - Return the real dentry
* @dentry: the dentry to query
* @type: the type of real dentry (data or metadata)
*
* If dentry is on a union/overlay, then return the underlying, real dentry.
* Otherwise return the dentry itself.
*
* See also: Documentation/filesystems/vfs.rst
*/
static inline struct dentry *d_real(struct dentry *dentry, enum d_real_type type)
{
if (unlikely(dentry->d_flags & DCACHE_OP_REAL))
return dentry->d_op->d_real(dentry, type);
else
return dentry;
}
/**
* d_real_inode - Return the real inode hosting the data
* @dentry: The dentry to query
*
* If dentry is on a union/overlay, then return the underlying, real inode.
* Otherwise return d_inode().
*/
static inline struct inode *d_real_inode(const struct dentry *dentry)
{
/* This usage of d_real() results in const dentry */
return d_inode(d_real((struct dentry *) dentry, D_REAL_DATA));
}
struct name_snapshot {
struct qstr name;
union shortname_store inline_name;
};
void take_dentry_name_snapshot(struct name_snapshot *, struct dentry *);
void release_dentry_name_snapshot(struct name_snapshot *);
static inline struct dentry *d_first_child(const struct dentry *dentry)
{
return hlist_entry_safe(dentry->d_children.first, struct dentry, d_sib);
}
static inline struct dentry *d_next_sibling(const struct dentry *dentry)
{
return hlist_entry_safe(dentry->d_sib.next, struct dentry, d_sib);
}
void set_default_d_op(struct super_block *, const struct dentry_operations *);
struct dentry *d_make_persistent(struct dentry *, struct inode *);
void d_make_discardable(struct dentry *dentry);
#endif /* __LINUX_DCACHE_H */
] |
| 26/2 | 2026/02/02 13:08 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 34326 output: 45 thoughts: 864 Thoughts: **Analyzing Dentry Flag Changes** Okay, let's break this down. First, I'm setting the context: `DCACHE_DENTRY_CURSOR` is bit 25, and `DCACHE_LRU_LIST` is bit 18. I see a report that the value changed from `0x00140004` to `0x00000004`. Right, `0x00140004` - let me dissect that: it has bit 2 set, which is `DCACHE_OP_REVALIDATE`, and bit 18, `DCACHE_LRU_LIST`, and... wait, that looks like bit 20 is also set. Let me confirm, Ah, yes, bit 20 is actually part of `DCACHE_ENTRY_TYPE` defined as `(7 << 19)`. So: bit 19 is `0x00080000`, bit 20 is `0x00100000`, and so on. Specifically, it's `(2 << 19)` which translates to `DCACHE_DIRECTORY_TYPE`. So `0x00140004` is indeed `DCACHE_OP_REVALIDATE | DCACHE_LRU_LIST | DCACHE_DIRECTORY_TYPE`. Now, the code shows Task 3002 executing `dentry->d_flags &= ~DCACHE_LRU_LIST;`. That *should* just clear bit 18, right? I expect the value to become `0x00100004`. But the report says it went to `0x00000004`! That means both bit 18 *and* bit 20 were cleared. `0x00000004` is just `DCACHE_OP_REVALIDATE`, with everything else gone. That's a problem, bit 20 indicates the type. I'm checking, Task 3002 is executing `d_lru_del(dentry)`. Okay, so `dentry->d_flags &= ~DCACHE_LRU_LIST;` is the relevant line. It only touches bit 18. Why did bit 20 also change? There must be something else clearing it. Okay, let me trace further. Task 3002 is in `__dentry_kill(d2)`. Within `__dentry_kill`, it calls `__d_drop(dentry)` and, crucially, `dentry_unlink_inode(dentry)`. And that calls `__d_clear_type_and_inode(dentry)`. Aha! Now we have a suspect. I'll need to examine `__d_clear_type_and_inode` to see if that's where the directory type flag is being inadvertently cleared. |
| 27/2 | 2026/02/02 13:08 | tool | codesearch-definition-source |
Args:map[ContextFile:fs/dcache.c IncludeLines:true Name:__d_clear_type_and_inode] Results: map[SourceCode: 414: static inline void __d_clear_type_and_inode(struct dentry *dentry)
415: {
416: unsigned flags = READ_ONCE(dentry->d_flags);
417:
418: flags &= ~DCACHE_ENTRY_TYPE;
419: WRITE_ONCE(dentry->d_flags, flags);
420: dentry->d_inode = NULL;
421: /*
422: * The negative counter only tracks dentries on the LRU. Don't inc if
423: * d_lru is on another list.
424: */
425: if ((flags & (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST)
426: this_cpu_inc(nr_dentry_negative);
427: }
SourceFile:fs/dcache.c] |
| 28/2 | 2026/02/02 13:08 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 36155 output: 25 thoughts: 2842 Thoughts: **Data Race Analysis: dentry flags in dentry_unlist and d_lru_del** Okay, so I'm looking at this KCSAN report. Essentially, it's highlighting a potential data race between two tasks operating on a `dentry`'s `d_flags` field. Task 3002 is in `d_lru_del` (specifically line 504), and Task 3304 is in `dentry_unlist` (line 640). The core concern is that Task 3002 might be modifying `d_flags` while Task 3304 is reading it, leading to inconsistent values. The specific bits Task 3002 is touching are related to the dcache LRU list (`DCACHE_LRU_LIST`, bit 18) and the dentry type (`DCACHE_ENTRY_TYPE`, bit 20), cleared by `__d_clear_type_and_inode`. Task 3304, however, is interested in the `DCACHE_DENTRY_CURSOR` bit (bit 25). My initial thought is that since these bits are different, there's no immediate harm. Task 3304 might see a partially updated value of `d_flags`, but it's only checking `DCACHE_DENTRY_CURSOR`, which Task 3002 isn't modifying directly. However, the fact that `d_flags` is modified with a non-atomic read-modify-write (`&= ~DCACHE_LRU_LIST`) makes me worry. The compiler could break this into a load, a bitwise clear, and a store. If Task 3304 reads between the load and store, it sees an old value, but that's normal. The key is that `d_flags` is *supposed* to be protected by `dentry->d_lock`, but Task 3304 reads `next->d_flags` *without* `next->d_lock`. It's only holding `parent->d_lock`. The documentation in `include/linux/dcache.h` states that `d_flags` is protected by `d_lock`. This *should* be protected. The crucial point here is whether `DCACHE_DENTRY_CURSOR` can be modified concurrently. If `DCACHE_DENTRY_CURSOR` is stable, reading it without the lock is okay. Cursors are special dentries that *should* have a fixed cursor status. They're never on the LRU, so the `dentry` in question can't be a cursor. If it *were* a cursor being killed, that would be different. 
But the `D_FLAG_VERIFY` macro in `d_lru_del` confirms that the entry *is* on the LRU, and cursors never are. So `d2` is not a cursor; `DCACHE_DENTRY_CURSOR` is zero. Therefore, even if there's a race, Task 3304 should always read 0 for `DCACHE_DENTRY_CURSOR`. And the accesses are 32-bit (unsigned int), so the access itself is atomic, preventing tearing. I also thought: could Task 3304 be reading `d2->d_flags` *after* `d2` has been freed? But Task 3304 holds `parent->d_lock`, and Task 3002 must acquire that lock before unlisting and freeing `d2`. So Task 3002 is blocked before `d2` can be freed. I had to understand why Task 3304 skips cursors. It's because cursors can move, and the siblings list must be maintained consistently. When unlisting `d1`, it needs to ensure the `next` pointer points to a valid dentry, not a cursor that might change its location. So the skip is a safety measure. And `d2` is not a cursor. Now, let's examine the KCSAN report's details about the value change: It shows `0x00140004` changing to `0x00000004`, indicating bits 18 and 20 were cleared. `d_lru_del` clears only bit 18. The other must be cleared by `__d_clear_type_and_inode`. It seems Task 3002 executes `d_lru_del` and proceeds, and the whole word access is a race. The important thing: even with the race, bit 25 (the `DCACHE_DENTRY_CURSOR` bit), which Task 3 |
| 29/2 | 2026/02/02 13:09 | tool | codesearch-find-references |
Error:requested entity does not exist Args: map[Name:DCACHE_DENTRY_CURSOR] Results: map[References:<nil> TruncatedOutput:false] |
| 30/2 | 2026/02/02 13:09 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 40068 output: 41 thoughts: 127 Thoughts: **Navigating the Labyrinth of Code** Okay, so I've hit an enum member – *classic*. Right away, I'm thinking `codesearch-find-references` might be a little too narrow here, potentially missing the mark if it's not specifically tuned to look for enum constants. I need something more robust, something that'll really dig deep. My initial instinct? Reach for `grep`. Combine that with a dive into the `codesearch-dir-index` and a little `read-file` magic, or maybe just go straight for a text search. That should be pretty comprehensive. Then a thought flashes: I *could* try `codesearch-find-references` on `DCACHE_DENTRY_CURSOR` directly, but maybe treating it as a literal string would be more effective in this case. Wait a second… hold on. I'm already familiar with this code base. Didn't I *just* see that member declared in `include/linux/dcache.h`? Yes, I did! Okay, so I'll go straight for a targeted search within `fs/dcache.c`. That should give me the most direct insight into how it's actually being used. Time to get to the heart of the matter. |
| 31/2 | 2026/02/02 13:09 | tool | codesearch-dir-index |
Args:map[Dir:fs/] Results: map[Files:[aio.c anon_inodes.c attr.c backing-file.c bad_inode.c binfmt_elf.c binfmt_elf_fdpic.c binfmt_flat.c binfmt_misc.c binfmt_script.c bpf_fs_kfuncs.c buffer.c char_dev.c compat_binfmt_elf.c coredump.c d_path.c dax.c dcache.c direct-io.c drop_caches.c eventfd.c eventpoll.c exec.c fcntl.c fhandle.c file.c file_attr.c file_table.c filesystems.c fs-writeback.c fs_context.c fs_dirent.c fs_parser.c fs_pin.c fs_struct.c fsopen.c init.c inode.c internal.h ioctl.c kernel_read_file.c libfs.c locks.c mbcache.c mnt_idmapping.c mount.h mpage.c namei.c namespace.c nsfs.c open.c pidfs.c pipe.c pnode.c pnode.h posix_acl.c proc_namespace.c read_write.c readdir.c remap_range.c select.c seq_file.c signalfd.c splice.c stack.c stat.c statfs.c super.c sync.c sysctls.c timerfd.c userfaultfd.c utimes.c xattr.c] Subdirs:[9p adfs affs afs autofs befs bfs btrfs cachefiles ceph coda configfs cramfs crypto debugfs devpts dlm ecryptfs efivarfs efs erofs exfat exportfs ext2 ext4 f2fs fat freevxfs fuse gfs2 hfs hfsplus hostfs hpfs hugetlbfs iomap isofs jbd2 jffs2 jfs kernfs lockd minix netfs nfs nfs_common nfsd nilfs2 nls notify ntfs3 ocfs2 omfs openpromfs orangefs overlayfs proc pstore qnx4 qnx6 quota ramfs resctrl romfs smb squashfs sysfs tests tracefs ubifs udf ufs unicode vboxsf verity xfs zonefs]] |
| 32/2 | 2026/02/02 13:09 | tool | read-file |
Args:map[File:fs/dcache.c] Results: map[Contents:// SPDX-License-Identifier: GPL-2.0-only
/*
* fs/dcache.c
*
* Complete reimplementation
* (C) 1997 Thomas Schoebel-Theuer,
* with heavy changes by Linus Torvalds
*/
/*
* Notes on the allocation strategy:
*
* The dcache is a master of the icache - whenever a dcache entry
* exists, the inode will always exist. "iput()" is done either when
* the dcache entry is deleted or garbage collected.
*/
#include <linux/ratelimit.h>
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/fs.h>
#include <linux/fscrypt.h>
#include <linux/fsnotify.h>
#include <linux/slab.h>
#include <linux/init.h>
#include <linux/hash.h>
#include <linux/cache.h>
#include <linux/export.h>
#include <linux/security.h>
#include <linux/seqlock.h>
#include <linux/memblock.h>
#include <linux/bit_spinlock.h>
#include <linux/rculist_bl.h>
#include <linux/list_lru.h>
#include "internal.h"
#include "mount.h"
#include <asm/runtime-const.h>
/*
* Usage:
* dcache->d_inode->i_lock protects:
* - i_dentry, d_u.d_alias, d_inode of aliases
* dcache_hash_bucket lock protects:
* - the dcache hash table
* s_roots bl list spinlock protects:
* - the s_roots list (see __d_drop)
* dentry->d_sb->s_dentry_lru_lock protects:
* - the dcache lru lists and counters
* d_lock protects:
* - d_flags
* - d_name
* - d_lru
* - d_count
* - d_unhashed()
* - d_parent and d_children
* - children's d_sib and d_parent
* - d_u.d_alias, d_inode
*
* Ordering:
* dentry->d_inode->i_lock
* dentry->d_lock
* dentry->d_sb->s_dentry_lru_lock
* dcache_hash_bucket lock
* s_roots lock
*
* If there is an ancestor relationship:
* dentry->d_parent->...->d_parent->d_lock
* ...
* dentry->d_parent->d_lock
* dentry->d_lock
*
* If no ancestor relationship:
* arbitrary, since it's serialized on rename_lock
*/
static int sysctl_vfs_cache_pressure __read_mostly = 100;
static int sysctl_vfs_cache_pressure_denom __read_mostly = 100;
unsigned long vfs_pressure_ratio(unsigned long val)
{
return mult_frac(val, sysctl_vfs_cache_pressure, sysctl_vfs_cache_pressure_denom);
}
EXPORT_SYMBOL_GPL(vfs_pressure_ratio);
__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
EXPORT_SYMBOL(rename_lock);
static struct kmem_cache *__dentry_cache __ro_after_init;
#define dentry_cache runtime_const_ptr(__dentry_cache)
const struct qstr empty_name = QSTR_INIT("", 0);
EXPORT_SYMBOL(empty_name);
const struct qstr slash_name = QSTR_INIT("/", 1);
EXPORT_SYMBOL(slash_name);
const struct qstr dotdot_name = QSTR_INIT("..", 2);
EXPORT_SYMBOL(dotdot_name);
/*
* This is the single most critical data structure when it comes
* to the dcache: the hashtable for lookups. Somebody should try
* to make this good - I've just made it work.
*
* This hash-function tries to avoid losing too many bits of hash
* information, yet avoid using a prime hash-size or similar.
*
* Marking the variables "used" ensures that the compiler doesn't
* optimize them away completely on architectures with runtime
* constant infrastructure, this allows debuggers to see their
* values. But updating these values has no effect on those arches.
*/
static unsigned int d_hash_shift __ro_after_init __used;
static struct hlist_bl_head *dentry_hashtable __ro_after_init __used;
static inline struct hlist_bl_head *d_hash(unsigned long hashlen)
{
return runtime_const_ptr(dentry_hashtable) +
runtime_const_shift_right_32(hashlen, d_hash_shift);
}
#define IN_LOOKUP_SHIFT 10
static struct hlist_bl_head in_lookup_hashtable[1 << IN_LOOKUP_SHIFT];
static inline struct hlist_bl_head *in_lookup_hash(const struct dentry *parent,
unsigned int hash)
{
hash += (unsigned long) parent / L1_CACHE_BYTES;
return in_lookup_hashtable + hash_32(hash, IN_LOOKUP_SHIFT);
}
struct dentry_stat_t {
long nr_dentry;
long nr_unused;
long age_limit; /* age in seconds */
long want_pages; /* pages requested by system */
long nr_negative; /* # of unused negative dentries */
long dummy; /* Reserved for future use */
};
static DEFINE_PER_CPU(long, nr_dentry);
static DEFINE_PER_CPU(long, nr_dentry_unused);
static DEFINE_PER_CPU(long, nr_dentry_negative);
static int dentry_negative_policy;
#if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS)
/* Statistics gathering. */
static struct dentry_stat_t dentry_stat = {
.age_limit = 45,
};
/*
* Here we resort to our own counters instead of using generic per-cpu counters
* for consistency with what the vfs inode code does. We are expected to harvest
* better code and performance by having our own specialized counters.
*
* Please note that the loop is done over all possible CPUs, not over all online
* CPUs. The reason for this is that we don't want to play games with CPUs going
* on and off. If one of them goes off, we will just keep their counters.
*
* glommer: See cffbc8a for details, and if you ever intend to change this,
* please update all vfs counters to match.
*/
static long get_nr_dentry(void)
{
int i;
long sum = 0;
for_each_possible_cpu(i)
sum += per_cpu(nr_dentry, i);
return sum < 0 ? 0 : sum;
}
static long get_nr_dentry_unused(void)
{
int i;
long sum = 0;
for_each_possible_cpu(i)
sum += per_cpu(nr_dentry_unused, i);
return sum < 0 ? 0 : sum;
}
static long get_nr_dentry_negative(void)
{
int i;
long sum = 0;
for_each_possible_cpu(i)
sum += per_cpu(nr_dentry_negative, i);
return sum < 0 ? 0 : sum;
}
static int proc_nr_dentry(const struct ctl_table *table, int write, void *buffer,
size_t *lenp, loff_t *ppos)
{
dentry_stat.nr_dentry = get_nr_dentry();
dentry_stat.nr_unused = get_nr_dentry_unused();
dentry_stat.nr_negative = get_nr_dentry_negative();
return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
}
static const struct ctl_table fs_dcache_sysctls[] = {
{
.procname = "dentry-state",
.data = &dentry_stat,
.maxlen = 6*sizeof(long),
.mode = 0444,
.proc_handler = proc_nr_dentry,
},
{
.procname = "dentry-negative",
.data = &dentry_negative_policy,
.maxlen = sizeof(dentry_negative_policy),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_ONE,
},
};
static const struct ctl_table vm_dcache_sysctls[] = {
{
.procname = "vfs_cache_pressure",
.data = &sysctl_vfs_cache_pressure,
.maxlen = sizeof(sysctl_vfs_cache_pressure),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO,
},
{
.procname = "vfs_cache_pressure_denom",
.data = &sysctl_vfs_cache_pressure_denom,
.maxlen = sizeof(sysctl_vfs_cache_pressure_denom),
.mode = 0644,
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ONE_HUNDRED,
},
};
static int __init init_fs_dcache_sysctls(void)
{
register_sysctl_init("vm", vm_dcache_sysctls);
register_sysctl_init("fs", fs_dcache_sysctls);
return 0;
}
fs_initcall(init_fs_dcache_sysctls);
#endif
/*
* Compare 2 name strings, return 0 if they match, otherwise non-zero.
* The strings are both count bytes long, and count is non-zero.
*/
#ifdef CONFIG_DCACHE_WORD_ACCESS
#include <asm/word-at-a-time.h>
/*
* NOTE! 'cs' and 'scount' come from a dentry, so it has an
* aligned allocation for this particular component. We don't
* strictly need the load_unaligned_zeropad() safety, but it
* doesn't hurt either.
*
* In contrast, 'ct' and 'tcount' can be from a pathname, and do
* need the careful unaligned handling.
*/
static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char *ct, unsigned tcount)
{
unsigned long a,b,mask;
for (;;) {
a = read_word_at_a_time(cs);
b = load_unaligned_zeropad(ct);
if (tcount < sizeof(unsigned long))
break;
if (unlikely(a != b))
return 1;
cs += sizeof(unsigned long);
ct += sizeof(unsigned long);
tcount -= sizeof(unsigned long);
if (!tcount)
return 0;
}
mask = bytemask_from_count(tcount);
return unlikely(!!((a ^ b) & mask));
}
#else
static inline int dentry_string_cmp(const unsigned char *cs, const unsigned char *ct, unsigned tcount)
{
do {
if (*cs != *ct)
return 1;
cs++;
ct++;
tcount--;
} while (tcount);
return 0;
}
#endif
static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *ct, unsigned tcount)
{
/*
* Be careful about RCU walk racing with rename:
* use 'READ_ONCE' to fetch the name pointer.
*
* NOTE! Even if a rename will mean that the length
* was not loaded atomically, we don't care. The
* RCU walk will check the sequence count eventually,
* and catch it. And we won't overrun the buffer,
* because we're reading the name pointer atomically,
* and a dentry name is guaranteed to be properly
* terminated with a NUL byte.
*
* End result: even if 'len' is wrong, we'll exit
* early because the data cannot match (there can
* be no NUL in the ct/tcount data)
*/
const unsigned char *cs = READ_ONCE(dentry->d_name.name);
return dentry_string_cmp(cs, ct, tcount);
}
/*
* long names are allocated separately from dentry and never modified.
* Refcounted, freeing is RCU-delayed. See take_dentry_name_snapshot()
* for the reason why ->count and ->head can't be combined into a union.
* dentry_string_cmp() relies upon ->name[] being word-aligned.
*/
struct external_name {
atomic_t count;
struct rcu_head head;
unsigned char name[] __aligned(sizeof(unsigned long));
};
static inline struct external_name *external_name(struct dentry *dentry)
{
return container_of(dentry->d_name.name, struct external_name, name[0]);
}
static void __d_free(struct rcu_head *head)
{
struct dentry *dentry = container_of(head, struct dentry, d_u.d_rcu);
kmem_cache_free(dentry_cache, dentry);
}
static void __d_free_external(struct rcu_head *head)
{
struct dentry *dentry = container_of(head, struct dentry, d_u.d_rcu);
kfree(external_name(dentry));
kmem_cache_free(dentry_cache, dentry);
}
static inline int dname_external(const struct dentry *dentry)
{
return dentry->d_name.name != dentry->d_shortname.string;
}
void take_dentry_name_snapshot(struct name_snapshot *name, struct dentry *dentry)
{
unsigned seq;
const unsigned char *s;
rcu_read_lock();
retry:
seq = read_seqcount_begin(&dentry->d_seq);
s = READ_ONCE(dentry->d_name.name);
name->name.hash_len = dentry->d_name.hash_len;
name->name.name = name->inline_name.string;
if (likely(s == dentry->d_shortname.string)) {
name->inline_name = dentry->d_shortname;
} else {
struct external_name *p;
p = container_of(s, struct external_name, name[0]);
// get a valid reference
if (unlikely(!atomic_inc_not_zero(&p->count)))
goto retry;
name->name.name = s;
}
if (read_seqcount_retry(&dentry->d_seq, seq)) {
release_dentry_name_snapshot(name);
goto retry;
}
rcu_read_unlock();
}
EXPORT_SYMBOL(take_dentry_name_snapshot);
void release_dentry_name_snapshot(struct name_snapshot *name)
{
if (unlikely(name->name.name != name->inline_name.string)) {
struct external_name *p;
p = container_of(name->name.name, struct external_name, name[0]);
if (unlikely(atomic_dec_and_test(&p->count)))
kfree_rcu(p, head);
}
}
EXPORT_SYMBOL(release_dentry_name_snapshot);
static inline void __d_set_inode_and_type(struct dentry *dentry,
struct inode *inode,
unsigned type_flags)
{
unsigned flags;
dentry->d_inode = inode;
flags = READ_ONCE(dentry->d_flags);
flags &= ~DCACHE_ENTRY_TYPE;
flags |= type_flags;
smp_store_release(&dentry->d_flags, flags);
}
static inline void __d_clear_type_and_inode(struct dentry *dentry)
{
unsigned flags = READ_ONCE(dentry->d_flags);
flags &= ~DCACHE_ENTRY_TYPE;
WRITE_ONCE(dentry->d_flags, flags);
dentry->d_inode = NULL;
/*
* The negative counter only tracks dentries on the LRU. Don't inc if
* d_lru is on another list.
*/
if ((flags & (DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST)
this_cpu_inc(nr_dentry_negative);
}
static void dentry_free(struct dentry *dentry)
{
WARN_ON(!hlist_unhashed(&dentry->d_u.d_alias));
if (unlikely(dname_external(dentry))) {
struct external_name *p = external_name(dentry);
if (likely(atomic_dec_and_test(&p->count))) {
call_rcu(&dentry->d_u.d_rcu, __d_free_external);
return;
}
}
/* if dentry was never visible to RCU, immediate free is OK */
if (dentry->d_flags & DCACHE_NORCU)
__d_free(&dentry->d_u.d_rcu);
else
call_rcu(&dentry->d_u.d_rcu, __d_free);
}
/*
* Release the dentry's inode, using the filesystem
* d_iput() operation if defined.
*/
static void dentry_unlink_inode(struct dentry * dentry)
__releases(dentry->d_lock)
__releases(dentry->d_inode->i_lock)
{
struct inode *inode = dentry->d_inode;
raw_write_seqcount_begin(&dentry->d_seq);
__d_clear_type_and_inode(dentry);
hlist_del_init(&dentry->d_u.d_alias);
raw_write_seqcount_end(&dentry->d_seq);
spin_unlock(&dentry->d_lock);
spin_unlock(&inode->i_lock);
if (!inode->i_nlink)
fsnotify_inoderemove(inode);
if (dentry->d_op && dentry->d_op->d_iput)
dentry->d_op->d_iput(dentry, inode);
else
iput(inode);
}
/*
* The DCACHE_LRU_LIST bit is set whenever the 'd_lru' entry
* is in use - which includes both the "real" per-superblock
* LRU list _and_ the DCACHE_SHRINK_LIST use.
*
* The DCACHE_SHRINK_LIST bit is set whenever the dentry is
* on the shrink list (ie not on the superblock LRU list).
*
* The per-cpu "nr_dentry_unused" counters are updated with
* the DCACHE_LRU_LIST bit.
*
* The per-cpu "nr_dentry_negative" counters are only updated
* when deleted from or added to the per-superblock LRU list, not
* from/to the shrink list. That is to avoid an unneeded dec/inc
* pair when moving from LRU to shrink list in select_collect().
*
* These helper functions make sure we always follow the
* rules. d_lock must be held by the caller.
*/
#define D_FLAG_VERIFY(dentry,x) WARN_ON_ONCE(((dentry)->d_flags & (DCACHE_LRU_LIST | DCACHE_SHRINK_LIST)) != (x))
static void d_lru_add(struct dentry *dentry)
{
D_FLAG_VERIFY(dentry, 0);
dentry->d_flags |= DCACHE_LRU_LIST;
this_cpu_inc(nr_dentry_unused);
if (d_is_negative(dentry))
this_cpu_inc(nr_dentry_negative);
WARN_ON_ONCE(!list_lru_add_obj(
&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
}
static void d_lru_del(struct dentry *dentry)
{
D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
dentry->d_flags &= ~DCACHE_LRU_LIST;
this_cpu_dec(nr_dentry_unused);
if (d_is_negative(dentry))
this_cpu_dec(nr_dentry_negative);
WARN_ON_ONCE(!list_lru_del_obj(
&dentry->d_sb->s_dentry_lru, &dentry->d_lru));
}
static void d_shrink_del(struct dentry *dentry)
{
D_FLAG_VERIFY(dentry, DCACHE_SHRINK_LIST | DCACHE_LRU_LIST);
list_del_init(&dentry->d_lru);
dentry->d_flags &= ~(DCACHE_SHRINK_LIST | DCACHE_LRU_LIST);
this_cpu_dec(nr_dentry_unused);
}
static void d_shrink_add(struct dentry *dentry, struct list_head *list)
{
D_FLAG_VERIFY(dentry, 0);
list_add(&dentry->d_lru, list);
dentry->d_flags |= DCACHE_SHRINK_LIST | DCACHE_LRU_LIST;
this_cpu_inc(nr_dentry_unused);
}
/*
* These can only be called under the global LRU lock, ie during the
* callback for freeing the LRU list. "isolate" removes it from the
* LRU lists entirely, while shrink_move moves it to the indicated
* private list.
*/
static void d_lru_isolate(struct list_lru_one *lru, struct dentry *dentry)
{
D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
dentry->d_flags &= ~DCACHE_LRU_LIST;
this_cpu_dec(nr_dentry_unused);
if (d_is_negative(dentry))
this_cpu_dec(nr_dentry_negative);
list_lru_isolate(lru, &dentry->d_lru);
}
static void d_lru_shrink_move(struct list_lru_one *lru, struct dentry *dentry,
struct list_head *list)
{
D_FLAG_VERIFY(dentry, DCACHE_LRU_LIST);
dentry->d_flags |= DCACHE_SHRINK_LIST;
if (d_is_negative(dentry))
this_cpu_dec(nr_dentry_negative);
list_lru_isolate_move(lru, &dentry->d_lru, list);
}
static void ___d_drop(struct dentry *dentry)
{
struct hlist_bl_head *b;
/*
* Hashed dentries are normally on the dentry hashtable,
* with the exception of those newly allocated by
* d_obtain_root, which are always IS_ROOT:
*/
if (unlikely(IS_ROOT(dentry)))
b = &dentry->d_sb->s_roots;
else
b = d_hash(dentry->d_name.hash);
hlist_bl_lock(b);
__hlist_bl_del(&dentry->d_hash);
hlist_bl_unlock(b);
}
void __d_drop(struct dentry *dentry)
{
if (!d_unhashed(dentry)) {
___d_drop(dentry);
dentry->d_hash.pprev = NULL;
write_seqcount_invalidate(&dentry->d_seq);
}
}
EXPORT_SYMBOL(__d_drop);
/**
* d_drop - drop a dentry
* @dentry: dentry to drop
*
* d_drop() unhashes the entry from the parent dentry hashes, so that it won't
* be found through a VFS lookup any more. Note that this is different from
* deleting the dentry - d_delete will try to mark the dentry negative if
* possible, giving a successful _negative_ lookup, while d_drop will
* just make the cache lookup fail.
*
* d_drop() is used mainly for stuff that wants to invalidate a dentry for some
* reason (NFS timeouts or autofs deletes).
*
* __d_drop requires dentry->d_lock
*
* ___d_drop doesn't mark dentry as "unhashed"
* (dentry->d_hash.pprev will be LIST_POISON2, not NULL).
*/
void d_drop(struct dentry *dentry)
{
spin_lock(&dentry->d_lock);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
}
EXPORT_SYMBOL(d_drop);
static inline void dentry_unlist(struct dentry *dentry)
{
struct dentry *next;
/*
* Inform d_walk() and shrink_dentry_list() that we are no longer
* attached to the dentry tree
*/
dentry->d_flags |= DCACHE_DENTRY_KILLED;
if (unlikely(hlist_unhashed(&dentry->d_sib)))
return;
__hlist_del(&dentry->d_sib);
/*
* Cursors can move around the list of children. While we'd been
* a normal list member, it didn't matter - ->d_sib.next would've
* been updated. However, from now on it won't be and for the
* things like d_walk() it might end up with a nasty surprise.
* Normally d_walk() doesn't care about cursors moving around -
* ->d_lock on parent prevents that and since a cursor has no children
* of its own, we get through it without ever unlocking the parent.
* There is one exception, though - if we ascend from a child that
* gets killed as soon as we unlock it, the next sibling is found
* using the value left in its ->d_sib.next. And if _that_
* pointed to a cursor, and cursor got moved (e.g. by lseek())
* before d_walk() regains parent->d_lock, we'll end up skipping
* everything the cursor had been moved past.
*
* Solution: make sure that the pointer left behind in ->d_sib.next
* points to something that won't be moving around. I.e. skip the
* cursors.
*/
while (dentry->d_sib.next) {
next = hlist_entry(dentry->d_sib.next, struct dentry, d_sib);
		/*
		 * Racy read vs. a sibling's __dentry_kill(), but benign:
		 * DCACHE_DENTRY_CURSOR is set only at allocation time and
		 * never changes, so the bit we test here is stable.
		 */
		if (likely(!(READ_ONCE(next->d_flags) & DCACHE_DENTRY_CURSOR)))
break;
dentry->d_sib.next = next->d_sib.next;
}
}
static struct dentry *__dentry_kill(struct dentry *dentry)
{
struct dentry *parent = NULL;
bool can_free = true;
/*
* The dentry is now unrecoverably dead to the world.
*/
lockref_mark_dead(&dentry->d_lockref);
/*
* inform the fs via d_prune that this dentry is about to be
* unhashed and destroyed.
*/
if (dentry->d_flags & DCACHE_OP_PRUNE)
dentry->d_op->d_prune(dentry);
if (dentry->d_flags & DCACHE_LRU_LIST) {
if (!(dentry->d_flags & DCACHE_SHRINK_LIST))
d_lru_del(dentry);
}
/* if it was on the hash then remove it */
__d_drop(dentry);
if (dentry->d_inode)
dentry_unlink_inode(dentry);
else
spin_unlock(&dentry->d_lock);
this_cpu_dec(nr_dentry);
if (dentry->d_op && dentry->d_op->d_release)
dentry->d_op->d_release(dentry);
cond_resched();
/* now that it's negative, ->d_parent is stable */
if (!IS_ROOT(dentry)) {
parent = dentry->d_parent;
spin_lock(&parent->d_lock);
}
spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
dentry_unlist(dentry);
if (dentry->d_flags & DCACHE_SHRINK_LIST)
can_free = false;
spin_unlock(&dentry->d_lock);
if (likely(can_free))
dentry_free(dentry);
if (parent && --parent->d_lockref.count) {
spin_unlock(&parent->d_lock);
return NULL;
}
return parent;
}
/*
* Lock a dentry for feeding it to __dentry_kill().
* Called under rcu_read_lock() and dentry->d_lock; the former
* guarantees that nothing we access will be freed under us.
* Note that dentry is *not* protected from concurrent dentry_kill(),
* d_delete(), etc.
*
* Return false if dentry is busy. Otherwise, return true and have
* that dentry's inode locked.
*/
static bool lock_for_kill(struct dentry *dentry)
{
struct inode *inode = dentry->d_inode;
if (unlikely(dentry->d_lockref.count))
return false;
if (!inode || likely(spin_trylock(&inode->i_lock)))
return true;
do {
spin_unlock(&dentry->d_lock);
spin_lock(&inode->i_lock);
spin_lock(&dentry->d_lock);
if (likely(inode == dentry->d_inode))
break;
spin_unlock(&inode->i_lock);
inode = dentry->d_inode;
} while (inode);
if (likely(!dentry->d_lockref.count))
return true;
if (inode)
spin_unlock(&inode->i_lock);
return false;
}
/*
* Decide if dentry is worth retaining. Usually this is called with dentry
* locked; if not locked, we are more limited and might not be able to tell
* without a lock. False in this case means "punt to locked path and recheck".
*
* In case we aren't locked, these predicates are not "stable". However, it is
* sufficient that at some point after we dropped the reference the dentry was
* hashed and the flags had the proper value. Other dentry users may have
* re-gotten a reference to the dentry and change that, but our work is done -
* we can leave the dentry around with a zero refcount.
*/
static inline bool retain_dentry(struct dentry *dentry, bool locked)
{
unsigned int d_flags;
smp_rmb();
d_flags = READ_ONCE(dentry->d_flags);
// Unreachable? Nobody would be able to look it up, no point retaining
if (unlikely(d_unhashed(dentry)))
return false;
// Same if it's disconnected
if (unlikely(d_flags & DCACHE_DISCONNECTED))
return false;
// ->d_delete() might tell us not to bother, but that requires
// ->d_lock; can't decide without it
if (unlikely(d_flags & DCACHE_OP_DELETE)) {
if (!locked || dentry->d_op->d_delete(dentry))
return false;
}
// Explicitly told not to bother
if (unlikely(d_flags & DCACHE_DONTCACHE))
return false;
// At this point it looks like we ought to keep it. We also might
// need to do something - put it on LRU if it wasn't there already
// and mark it referenced if it was on LRU, but not marked yet.
// Unfortunately, both actions require ->d_lock, so in lockless
// case we'd have to punt rather than doing those.
if (unlikely(!(d_flags & DCACHE_LRU_LIST))) {
if (!locked)
return false;
d_lru_add(dentry);
} else if (unlikely(!(d_flags & DCACHE_REFERENCED))) {
if (!locked)
return false;
dentry->d_flags |= DCACHE_REFERENCED;
}
return true;
}
void d_mark_dontcache(struct inode *inode)
{
struct dentry *de;
spin_lock(&inode->i_lock);
hlist_for_each_entry(de, &inode->i_dentry, d_u.d_alias) {
spin_lock(&de->d_lock);
de->d_flags |= DCACHE_DONTCACHE;
spin_unlock(&de->d_lock);
}
inode_state_set(inode, I_DONTCACHE);
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(d_mark_dontcache);
/*
* Try to do a lockless dput(), and return whether that was successful.
*
* If unsuccessful, we return false, having already taken the dentry lock.
* In that case refcount is guaranteed to be zero and we have already
* decided that it's not worth keeping around.
*
* The caller needs to hold the RCU read lock, so that the dentry is
* guaranteed to stay around even if the refcount goes down to zero!
*/
static inline bool fast_dput(struct dentry *dentry)
{
int ret;
/*
* try to decrement the lockref optimistically.
*/
ret = lockref_put_return(&dentry->d_lockref);
/*
* If the lockref_put_return() failed due to the lock being held
* by somebody else, the fast path has failed. We will need to
* get the lock, and then check the count again.
*/
if (unlikely(ret < 0)) {
spin_lock(&dentry->d_lock);
if (WARN_ON_ONCE(dentry->d_lockref.count <= 0)) {
spin_unlock(&dentry->d_lock);
return true;
}
dentry->d_lockref.count--;
goto locked;
}
/*
* If we weren't the last ref, we're done.
*/
if (ret)
return true;
/*
* Can we decide that decrement of refcount is all we needed without
* taking the lock? There's a very common case when it's all we need -
* dentry looks like it ought to be retained and there's nothing else
* to do.
*/
if (retain_dentry(dentry, false))
return true;
/*
* Either not worth retaining or we can't tell without the lock.
* Get the lock, then. We've already decremented the refcount to 0,
* but we'll need to re-check the situation after getting the lock.
*/
spin_lock(&dentry->d_lock);
/*
* Did somebody else grab a reference to it in the meantime, and
* we're no longer the last user after all? Alternatively, somebody
* else could have killed it and marked it dead. Either way, we
* don't need to do anything else.
*/
locked:
if (dentry->d_lockref.count || retain_dentry(dentry, true)) {
spin_unlock(&dentry->d_lock);
return true;
}
return false;
}
static void finish_dput(struct dentry *dentry)
__releases(dentry->d_lock)
__releases(RCU)
{
while (lock_for_kill(dentry)) {
rcu_read_unlock();
dentry = __dentry_kill(dentry);
if (!dentry)
return;
if (retain_dentry(dentry, true)) {
spin_unlock(&dentry->d_lock);
return;
}
rcu_read_lock();
}
rcu_read_unlock();
spin_unlock(&dentry->d_lock);
}
/*
* This is dput
*
* This is complicated by the fact that we do not want to put
* dentries that are no longer on any hash chain on the unused
* list: we'd much rather just get rid of them immediately.
*
* However, that implies that we have to traverse the dentry
* tree upwards to the parents which might _also_ now be
* scheduled for deletion (it may have been only waiting for
* its last child to go away).
*
* This tail recursion is done by hand as we don't want to depend
* on the compiler to always get this right (gcc generally doesn't).
* Real recursion would eat up our stack space.
*/
/*
* dput - release a dentry
* @dentry: dentry to release
*
* Release a dentry. This will drop the usage count and if appropriate
* call the dentry unlink method as well as removing it from the queues and
* releasing its resources. If the parent dentries were scheduled for release
* they too may now get deleted.
*/
void dput(struct dentry *dentry)
{
if (!dentry)
return;
might_sleep();
rcu_read_lock();
if (likely(fast_dput(dentry))) {
rcu_read_unlock();
return;
}
finish_dput(dentry);
}
EXPORT_SYMBOL(dput);
void d_make_discardable(struct dentry *dentry)
{
spin_lock(&dentry->d_lock);
WARN_ON(!(dentry->d_flags & DCACHE_PERSISTENT));
dentry->d_flags &= ~DCACHE_PERSISTENT;
dentry->d_lockref.count--;
rcu_read_lock();
finish_dput(dentry);
}
EXPORT_SYMBOL(d_make_discardable);
static void to_shrink_list(struct dentry *dentry, struct list_head *list)
__must_hold(&dentry->d_lock)
{
if (!(dentry->d_flags & DCACHE_SHRINK_LIST)) {
if (dentry->d_flags & DCACHE_LRU_LIST)
d_lru_del(dentry);
d_shrink_add(dentry, list);
}
}
void dput_to_list(struct dentry *dentry, struct list_head *list)
{
rcu_read_lock();
if (likely(fast_dput(dentry))) {
rcu_read_unlock();
return;
}
rcu_read_unlock();
to_shrink_list(dentry, list);
spin_unlock(&dentry->d_lock);
}
struct dentry *dget_parent(struct dentry *dentry)
{
int gotref;
struct dentry *ret;
unsigned seq;
/*
* Do optimistic parent lookup without any
* locking.
*/
rcu_read_lock();
seq = raw_seqcount_begin(&dentry->d_seq);
ret = READ_ONCE(dentry->d_parent);
gotref = lockref_get_not_zero(&ret->d_lockref);
rcu_read_unlock();
if (likely(gotref)) {
if (!read_seqcount_retry(&dentry->d_seq, seq))
return ret;
dput(ret);
}
repeat:
/*
* Don't need rcu_dereference because we re-check it was correct under
* the lock.
*/
rcu_read_lock();
ret = dentry->d_parent;
spin_lock(&ret->d_lock);
if (unlikely(ret != dentry->d_parent)) {
spin_unlock(&ret->d_lock);
rcu_read_unlock();
goto repeat;
}
rcu_read_unlock();
BUG_ON(!ret->d_lockref.count);
ret->d_lockref.count++;
spin_unlock(&ret->d_lock);
return ret;
}
EXPORT_SYMBOL(dget_parent);
static struct dentry * __d_find_any_alias(struct inode *inode)
{
struct dentry *alias;
if (hlist_empty(&inode->i_dentry))
return NULL;
alias = hlist_entry(inode->i_dentry.first, struct dentry, d_u.d_alias);
lockref_get(&alias->d_lockref);
return alias;
}
/**
* d_find_any_alias - find any alias for a given inode
* @inode: inode to find an alias for
*
* If any aliases exist for the given inode, take and return a
* reference for one of them. If no aliases exist, return %NULL.
*/
struct dentry *d_find_any_alias(struct inode *inode)
{
struct dentry *de;
spin_lock(&inode->i_lock);
de = __d_find_any_alias(inode);
spin_unlock(&inode->i_lock);
return de;
}
EXPORT_SYMBOL(d_find_any_alias);
static struct dentry *__d_find_alias(struct inode *inode)
{
struct dentry *alias;
if (S_ISDIR(inode->i_mode))
return __d_find_any_alias(inode);
hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
spin_lock(&alias->d_lock);
if (!d_unhashed(alias)) {
dget_dlock(alias);
spin_unlock(&alias->d_lock);
return alias;
}
spin_unlock(&alias->d_lock);
}
return NULL;
}
/**
* d_find_alias - grab a hashed alias of inode
* @inode: inode in question
*
* If inode has a hashed alias, or is a directory and has any alias,
* acquire the reference to alias and return it. Otherwise return NULL.
* Notice that if inode is a directory there can be only one alias and
* it can be unhashed only if it has no children, or if it is the root
* of a filesystem, or if the directory was renamed and d_revalidate
* was the first vfs operation to notice.
*
* If the inode has an IS_ROOT, DCACHE_DISCONNECTED alias, then prefer
* any other hashed alias over that one.
*/
struct dentry *d_find_alias(struct inode *inode)
{
struct dentry *de = NULL;
if (!hlist_empty(&inode->i_dentry)) {
spin_lock(&inode->i_lock);
de = __d_find_alias(inode);
spin_unlock(&inode->i_lock);
}
return de;
}
EXPORT_SYMBOL(d_find_alias);
/*
* Caller MUST be holding rcu_read_lock() and be guaranteed
* that inode won't get freed until rcu_read_unlock().
*/
struct dentry *d_find_alias_rcu(struct inode *inode)
{
struct hlist_head *l = &inode->i_dentry;
struct dentry *de = NULL;
spin_lock(&inode->i_lock);
// ->i_dentry and ->i_rcu are colocated, but the latter won't be
// used without having I_FREEING set, which means no aliases left
if (likely(!(inode_state_read(inode) & I_FREEING) && !hlist_empty(l))) {
if (S_ISDIR(inode->i_mode)) {
de = hlist_entry(l->first, struct dentry, d_u.d_alias);
} else {
hlist_for_each_entry(de, l, d_u.d_alias)
if (!d_unhashed(de))
break;
}
}
spin_unlock(&inode->i_lock);
return de;
}
/**
* d_dispose_if_unused - move unreferenced dentries to shrink list
* @dentry: dentry in question
* @dispose: head of shrink list
*
* If dentry has no external references, move it to shrink list.
*
* NOTE!!! The caller is responsible for preventing eviction of the dentry by
* holding dentry->d_inode->i_lock or equivalent.
*/
void d_dispose_if_unused(struct dentry *dentry, struct list_head *dispose)
{
spin_lock(&dentry->d_lock);
if (!dentry->d_lockref.count)
to_shrink_list(dentry, dispose);
spin_unlock(&dentry->d_lock);
}
EXPORT_SYMBOL(d_dispose_if_unused);
/*
* Try to kill dentries associated with this inode.
* WARNING: you must own a reference to inode.
*/
void d_prune_aliases(struct inode *inode)
{
LIST_HEAD(dispose);
struct dentry *dentry;
spin_lock(&inode->i_lock);
hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias)
d_dispose_if_unused(dentry, &dispose);
spin_unlock(&inode->i_lock);
shrink_dentry_list(&dispose);
}
EXPORT_SYMBOL(d_prune_aliases);
static inline void shrink_kill(struct dentry *victim)
{
do {
rcu_read_unlock();
victim = __dentry_kill(victim);
rcu_read_lock();
} while (victim && lock_for_kill(victim));
rcu_read_unlock();
if (victim)
spin_unlock(&victim->d_lock);
}
void shrink_dentry_list(struct list_head *list)
{
while (!list_empty(list)) {
struct dentry *dentry;
dentry = list_entry(list->prev, struct dentry, d_lru);
spin_lock(&dentry->d_lock);
rcu_read_lock();
if (!lock_for_kill(dentry)) {
bool can_free;
rcu_read_unlock();
d_shrink_del(dentry);
can_free = dentry->d_flags & DCACHE_DENTRY_KILLED;
spin_unlock(&dentry->d_lock);
if (can_free)
dentry_free(dentry);
continue;
}
d_shrink_del(dentry);
shrink_kill(dentry);
}
}
EXPORT_SYMBOL(shrink_dentry_list);
static enum lru_status dentry_lru_isolate(struct list_head *item,
struct list_lru_one *lru, void *arg)
{
struct list_head *freeable = arg;
struct dentry *dentry = container_of(item, struct dentry, d_lru);
/*
* we are inverting the lru lock/dentry->d_lock here,
* so use a trylock. If we fail to get the lock, just skip
* it
*/
if (!spin_trylock(&dentry->d_lock))
return LRU_SKIP;
/*
* Referenced dentries are still in use. If they have active
* counts, just remove them from the LRU. Otherwise give them
* another pass through the LRU.
*/
if (dentry->d_lockref.count) {
d_lru_isolate(lru, dentry);
spin_unlock(&dentry->d_lock);
return LRU_REMOVED;
}
if (dentry->d_flags & DCACHE_REFERENCED) {
dentry->d_flags &= ~DCACHE_REFERENCED;
spin_unlock(&dentry->d_lock);
/*
* The list move itself will be made by the common LRU code. At
* this point, we've dropped the dentry->d_lock but keep the
* lru lock. This is safe to do, since every list movement is
* protected by the lru lock even if both locks are held.
*
* This is guaranteed by the fact that all LRU management
* functions are intermediated by the LRU API calls like
		 * list_lru_add_obj and list_lru_del_obj. List movement in this file
		 * only ever occurs through these functions or through callbacks
* like this one, that are called from the LRU API.
*
* The only exceptions to this are functions like
* shrink_dentry_list, and code that first checks for the
* DCACHE_SHRINK_LIST flag. Those are guaranteed to be
* operating only with stack provided lists after they are
		 * properly isolated from the main list. It is thus always a
		 * local access.
*/
return LRU_ROTATE;
}
d_lru_shrink_move(lru, dentry, freeable);
spin_unlock(&dentry->d_lock);
return LRU_REMOVED;
}
/**
* prune_dcache_sb - shrink the dcache
* @sb: superblock
* @sc: shrink control, passed to list_lru_shrink_walk()
*
* Attempt to shrink the superblock dcache LRU by @sc->nr_to_scan entries. This
* is done when we need more memory and called from the superblock shrinker
* function.
*
* This function may fail to free any resources if all the dentries are in
* use.
*/
long prune_dcache_sb(struct super_block *sb, struct shrink_control *sc)
{
LIST_HEAD(dispose);
long freed;
freed = list_lru_shrink_walk(&sb->s_dentry_lru, sc,
dentry_lru_isolate, &dispose);
shrink_dentry_list(&dispose);
return freed;
}
static enum lru_status dentry_lru_isolate_shrink(struct list_head *item,
struct list_lru_one *lru, void *arg)
{
struct list_head *freeable = arg;
struct dentry *dentry = container_of(item, struct dentry, d_lru);
/*
* we are inverting the lru lock/dentry->d_lock here,
* so use a trylock. If we fail to get the lock, just skip
* it
*/
if (!spin_trylock(&dentry->d_lock))
return LRU_SKIP;
d_lru_shrink_move(lru, dentry, freeable);
spin_unlock(&dentry->d_lock);
return LRU_REMOVED;
}
/**
* shrink_dcache_sb - shrink dcache for a superblock
* @sb: superblock
*
* Shrink the dcache for the specified super block. This is used to free
* the dcache before unmounting a file system.
*/
void shrink_dcache_sb(struct super_block *sb)
{
do {
LIST_HEAD(dispose);
list_lru_walk(&sb->s_dentry_lru,
dentry_lru_isolate_shrink, &dispose, 1024);
shrink_dentry_list(&dispose);
} while (list_lru_count(&sb->s_dentry_lru) > 0);
}
EXPORT_SYMBOL(shrink_dcache_sb);
/**
 * enum d_walk_ret - action to take during tree walk
 * @D_WALK_CONTINUE: continue walk
* @D_WALK_QUIT: quit walk
* @D_WALK_NORETRY: quit when retry is needed
* @D_WALK_SKIP: skip this dentry and its children
*/
enum d_walk_ret {
D_WALK_CONTINUE,
D_WALK_QUIT,
D_WALK_NORETRY,
D_WALK_SKIP,
};
/**
* d_walk - walk the dentry tree
* @parent: start of walk
 * @data: data passed to @enter()
* @enter: callback when first entering the dentry
*
* The @enter() callbacks are called with d_lock held.
*/
static void d_walk(struct dentry *parent, void *data,
enum d_walk_ret (*enter)(void *, struct dentry *))
{
struct dentry *this_parent, *dentry;
unsigned seq = 0;
enum d_walk_ret ret;
bool retry = true;
again:
read_seqbegin_or_lock(&rename_lock, &seq);
this_parent = parent;
spin_lock(&this_parent->d_lock);
ret = enter(data, this_parent);
switch (ret) {
case D_WALK_CONTINUE:
break;
case D_WALK_QUIT:
case D_WALK_SKIP:
goto out_unlock;
case D_WALK_NORETRY:
retry = false;
break;
}
repeat:
dentry = d_first_child(this_parent);
resume:
hlist_for_each_entry_from(dentry, d_sib) {
if (unlikely(dentry->d_flags & DCACHE_DENTRY_CURSOR))
continue;
spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
ret = enter(data, dentry);
switch (ret) {
case D_WALK_CONTINUE:
break;
case D_WALK_QUIT:
spin_unlock(&dentry->d_lock);
goto out_unlock;
case D_WALK_NORETRY:
retry = false;
break;
case D_WALK_SKIP:
spin_unlock(&dentry->d_lock);
continue;
}
if (!hlist_empty(&dentry->d_children)) {
spin_unlock(&this_parent->d_lock);
spin_release(&dentry->d_lock.dep_map, _RET_IP_);
this_parent = dentry;
spin_acquire(&this_parent->d_lock.dep_map, 0, 1, _RET_IP_);
goto repeat;
}
spin_unlock(&dentry->d_lock);
}
/*
* All done at this level ... ascend and resume the search.
*/
rcu_read_lock();
ascend:
if (this_parent != parent) {
dentry = this_parent;
this_parent = dentry->d_parent;
spin_unlock(&dentry->d_lock);
spin_lock(&this_parent->d_lock);
/* might go back up the wrong parent if we have had a rename. */
if (need_seqretry(&rename_lock, seq))
goto rename_retry;
/* go into the first sibling still alive */
hlist_for_each_entry_continue(dentry, d_sib) {
if (likely(!(dentry->d_flags & DCACHE_DENTRY_KILLED))) {
rcu_read_unlock();
goto resume;
}
}
goto ascend;
}
if (need_seqretry(&rename_lock, seq))
goto rename_retry;
rcu_read_unlock();
out_unlock:
spin_unlock(&this_parent->d_lock);
done_seqretry(&rename_lock, seq);
return;
rename_retry:
spin_unlock(&this_parent->d_lock);
rcu_read_unlock();
BUG_ON(seq & 1);
if (!retry)
return;
seq = 1;
goto again;
}
struct check_mount {
struct vfsmount *mnt;
unsigned int mounted;
};
/* locks: mount_locked_reader && dentry->d_lock */
static enum d_walk_ret path_check_mount(void *data, struct dentry *dentry)
{
struct check_mount *info = data;
struct path path = { .mnt = info->mnt, .dentry = dentry };
if (likely(!d_mountpoint(dentry)))
return D_WALK_CONTINUE;
if (__path_is_mountpoint(&path)) {
info->mounted = 1;
return D_WALK_QUIT;
}
return D_WALK_CONTINUE;
}
/**
* path_has_submounts - check for mounts over a dentry in the
* current namespace.
* @parent: path to check.
*
* Return true if the parent or its subdirectories contain
* a mount point in the current namespace.
*/
int path_has_submounts(const struct path *parent)
{
struct check_mount data = { .mnt = parent->mnt, .mounted = 0 };
guard(mount_locked_reader)();
d_walk(parent->dentry, &data, path_check_mount);
return data.mounted;
}
EXPORT_SYMBOL(path_has_submounts);
/*
* Called by mount code to set a mountpoint and check if the mountpoint is
* reachable (e.g. NFS can unhash a directory dentry and then the complete
* subtree can become unreachable).
*
* Only one of d_invalidate() and d_set_mounted() must succeed. For
* this reason take rename_lock and d_lock on dentry and ancestors.
*/
int d_set_mounted(struct dentry *dentry)
{
struct dentry *p;
int ret = -ENOENT;
read_seqlock_excl(&rename_lock);
for (p = dentry->d_parent; !IS_ROOT(p); p = p->d_parent) {
/* Need exclusion wrt. d_invalidate() */
spin_lock(&p->d_lock);
if (unlikely(d_unhashed(p))) {
spin_unlock(&p->d_lock);
goto out;
}
spin_unlock(&p->d_lock);
}
spin_lock(&dentry->d_lock);
if (!d_unlinked(dentry)) {
ret = -EBUSY;
if (!d_mountpoint(dentry)) {
dentry->d_flags |= DCACHE_MOUNTED;
ret = 0;
}
}
spin_unlock(&dentry->d_lock);
out:
read_sequnlock_excl(&rename_lock);
return ret;
}
/*
* Search the dentry child list of the specified parent,
* and move any unused dentries to the end of the unused
 * list for prune_dcache_sb(). We descend to the next level
* whenever the d_children list is non-empty and continue
* searching.
*
* It returns zero iff there are no unused children,
* otherwise it returns the number of children moved to
* the end of the unused list. This may not be the total
 * number of unused children, because select_collect() can
 * quit the walk and return early due to latency
* constraints.
*/
struct select_data {
struct dentry *start;
union {
long found;
struct dentry *victim;
};
struct list_head dispose;
};
static enum d_walk_ret select_collect(void *_data, struct dentry *dentry)
{
struct select_data *data = _data;
enum d_walk_ret ret = D_WALK_CONTINUE;
if (data->start == dentry)
goto out;
if (dentry->d_flags & DCACHE_SHRINK_LIST) {
data->found++;
} else if (!dentry->d_lockref.count) {
to_shrink_list(dentry, &data->dispose);
data->found++;
} else if (dentry->d_lockref.count < 0) {
data->found++;
}
/*
* We can return to the caller if we have found some (this
* ensures forward progress). We'll be coming back to find
* the rest.
*/
if (!list_empty(&data->dispose))
ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
out:
return ret;
}
static enum d_walk_ret select_collect_umount(void *_data, struct dentry *dentry)
{
if (dentry->d_flags & DCACHE_PERSISTENT) {
dentry->d_flags &= ~DCACHE_PERSISTENT;
dentry->d_lockref.count--;
}
return select_collect(_data, dentry);
}
static enum d_walk_ret select_collect2(void *_data, struct dentry *dentry)
{
struct select_data *data = _data;
enum d_walk_ret ret = D_WALK_CONTINUE;
if (data->start == dentry)
goto out;
if (!dentry->d_lockref.count) {
if (dentry->d_flags & DCACHE_SHRINK_LIST) {
rcu_read_lock();
data->victim = dentry;
return D_WALK_QUIT;
}
to_shrink_list(dentry, &data->dispose);
}
/*
* We can return to the caller if we have found some (this
* ensures forward progress). We'll be coming back to find
* the rest.
*/
if (!list_empty(&data->dispose))
ret = need_resched() ? D_WALK_QUIT : D_WALK_NORETRY;
out:
return ret;
}
/**
* shrink_dcache_tree - prune dcache
* @parent: parent of entries to prune
* @for_umount: true if we want to unpin the persistent ones
*
* Prune the dcache to remove unused children of the parent dentry.
*/
static void shrink_dcache_tree(struct dentry *parent, bool for_umount)
{
for (;;) {
struct select_data data = {.start = parent};
INIT_LIST_HEAD(&data.dispose);
d_walk(parent, &data,
for_umount ? select_collect_umount : select_collect);
if (!list_empty(&data.dispose)) {
shrink_dentry_list(&data.dispose);
continue;
}
cond_resched();
if (!data.found)
break;
data.victim = NULL;
d_walk(parent, &data, select_collect2);
if (data.victim) {
spin_lock(&data.victim->d_lock);
if (!lock_for_kill(data.victim)) {
spin_unlock(&data.victim->d_lock);
rcu_read_unlock();
} else {
shrink_kill(data.victim);
}
}
if (!list_empty(&data.dispose))
shrink_dentry_list(&data.dispose);
}
}
void shrink_dcache_parent(struct dentry *parent)
{
shrink_dcache_tree(parent, false);
}
EXPORT_SYMBOL(shrink_dcache_parent);
static enum d_walk_ret umount_check(void *_data, struct dentry *dentry)
{
	/* it has busy descendants; complain about those instead */
if (!hlist_empty(&dentry->d_children))
return D_WALK_CONTINUE;
/* root with refcount 1 is fine */
if (dentry == _data && dentry->d_lockref.count == 1)
return D_WALK_CONTINUE;
	WARN(1, "BUG: Dentry %p{i=%lx,n=%pd}"
			" still in use (%d) [unmount of %s %s]\n",
dentry,
dentry->d_inode ?
dentry->d_inode->i_ino : 0UL,
dentry,
dentry->d_lockref.count,
dentry->d_sb->s_type->name,
dentry->d_sb->s_id);
return D_WALK_CONTINUE;
}
static void do_one_tree(struct dentry *dentry)
{
shrink_dcache_tree(dentry, true);
d_walk(dentry, dentry, umount_check);
d_drop(dentry);
dput(dentry);
}
/*
* destroy the dentries attached to a superblock on unmounting
*/
void shrink_dcache_for_umount(struct super_block *sb)
{
struct dentry *dentry;
rwsem_assert_held_write(&sb->s_umount);
dentry = sb->s_root;
sb->s_root = NULL;
do_one_tree(dentry);
while (!hlist_bl_empty(&sb->s_roots)) {
dentry = dget(hlist_bl_entry(hlist_bl_first(&sb->s_roots), struct dentry, d_hash));
do_one_tree(dentry);
}
}
static enum d_walk_ret find_submount(void *_data, struct dentry *dentry)
{
struct dentry **victim = _data;
if (d_mountpoint(dentry)) {
*victim = dget_dlock(dentry);
return D_WALK_QUIT;
}
return D_WALK_CONTINUE;
}
/**
* d_invalidate - detach submounts, prune dcache, and drop
* @dentry: dentry to invalidate (aka detach, prune and drop)
*/
void d_invalidate(struct dentry *dentry)
{
bool had_submounts = false;
spin_lock(&dentry->d_lock);
if (d_unhashed(dentry)) {
spin_unlock(&dentry->d_lock);
return;
}
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
/* Negative dentries can be dropped without further checks */
if (!dentry->d_inode)
return;
shrink_dcache_parent(dentry);
for (;;) {
struct dentry *victim = NULL;
d_walk(dentry, &victim, find_submount);
if (!victim) {
if (had_submounts)
shrink_dcache_parent(dentry);
return;
}
had_submounts = true;
detach_mounts(victim);
dput(victim);
}
}
EXPORT_SYMBOL(d_invalidate);
/**
* __d_alloc - allocate a dcache entry
* @sb: filesystem it will belong to
* @name: qstr of the name
*
* Allocates a dentry. It returns %NULL if there is insufficient memory
 * available. On success the dentry is returned. The name passed in is
* copied and the copy passed in may be reused after this call.
*/
static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
{
struct dentry *dentry;
char *dname;
int err;
dentry = kmem_cache_alloc_lru(dentry_cache, &sb->s_dentry_lru,
GFP_KERNEL);
if (!dentry)
return NULL;
/*
* We guarantee that the inline name is always NUL-terminated.
* This way the memcpy() done by the name switching in rename
* will still always have a NUL at the end, even if we might
* be overwriting an internal NUL character
*/
dentry->d_shortname.string[DNAME_INLINE_LEN-1] = 0;
if (unlikely(!name)) {
name = &slash_name;
dname = dentry->d_shortname.string;
} else if (name->len > DNAME_INLINE_LEN-1) {
size_t size = offsetof(struct external_name, name[1]);
struct external_name *p = kmalloc(size + name->len,
GFP_KERNEL_ACCOUNT |
__GFP_RECLAIMABLE);
if (!p) {
kmem_cache_free(dentry_cache, dentry);
return NULL;
}
atomic_set(&p->count, 1);
dname = p->name;
} else {
dname = dentry->d_shortname.string;
}
dentry->__d_name.len = name->len;
dentry->__d_name.hash = name->hash;
memcpy(dname, name->name, name->len);
dname[name->len] = 0;
/* Make sure we always see the terminating NUL character */
smp_store_release(&dentry->__d_name.name, dname); /* ^^^ */
dentry->d_flags = 0;
lockref_init(&dentry->d_lockref);
seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock);
dentry->d_inode = NULL;
dentry->d_parent = dentry;
dentry->d_sb = sb;
dentry->d_op = sb->__s_d_op;
dentry->d_flags = sb->s_d_flags;
dentry->d_fsdata = NULL;
INIT_HLIST_BL_NODE(&dentry->d_hash);
INIT_LIST_HEAD(&dentry->d_lru);
INIT_HLIST_HEAD(&dentry->d_children);
INIT_HLIST_NODE(&dentry->d_u.d_alias);
INIT_HLIST_NODE(&dentry->d_sib);
if (dentry->d_op && dentry->d_op->d_init) {
err = dentry->d_op->d_init(dentry);
if (err) {
if (dname_external(dentry))
kfree(external_name(dentry));
kmem_cache_free(dentry_cache, dentry);
return NULL;
}
}
this_cpu_inc(nr_dentry);
return dentry;
}
/**
* d_alloc - allocate a dcache entry
* @parent: parent of entry to allocate
* @name: qstr of the name
*
* Allocates a dentry. It returns %NULL if there is insufficient memory
* available. On success the dentry is returned. The name passed in is
* copied, and the caller's copy may be reused after this call.
*/
struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
{
struct dentry *dentry = __d_alloc(parent->d_sb, name);
if (!dentry)
return NULL;
spin_lock(&parent->d_lock);
/*
* don't need child lock because it is not subject
* to concurrency here
*/
dentry->d_parent = dget_dlock(parent);
hlist_add_head(&dentry->d_sib, &parent->d_children);
spin_unlock(&parent->d_lock);
return dentry;
}
EXPORT_SYMBOL(d_alloc);
struct dentry *d_alloc_anon(struct super_block *sb)
{
return __d_alloc(sb, NULL);
}
EXPORT_SYMBOL(d_alloc_anon);
struct dentry *d_alloc_cursor(struct dentry * parent)
{
struct dentry *dentry = d_alloc_anon(parent->d_sb);
if (dentry) {
dentry->d_flags |= DCACHE_DENTRY_CURSOR;
dentry->d_parent = dget(parent);
}
return dentry;
}
/**
* d_alloc_pseudo - allocate a dentry (for lookup-less filesystems)
* @sb: the superblock
* @name: qstr of the name
*
* For a filesystem that just pins its dentries in memory and never
* performs lookups at all, return an unhashed IS_ROOT dentry.
* This is used for pipes, sockets et al. - the stuff that should
* never be anyone's children or parents. Unlike all other
* dentries, these will not have RCU delay between dropping the
* last reference and freeing them.
*
* The only user is alloc_file_pseudo() and that's what should
* be considered a public interface. Don't use directly.
*/
struct dentry *d_alloc_pseudo(struct super_block *sb, const struct qstr *name)
{
static const struct dentry_operations anon_ops = {
.d_dname = simple_dname
};
struct dentry *dentry = __d_alloc(sb, name);
if (likely(dentry)) {
dentry->d_flags |= DCACHE_NORCU;
/* d_op_flags(&anon_ops) is 0 */
if (!dentry->d_op)
dentry->d_op = &anon_ops;
}
return dentry;
}
struct dentry *d_alloc_name(struct dentry *parent, const char *name)
{
struct qstr q;
q.name = name;
q.hash_len = hashlen_string(parent, name);
return d_alloc(parent, &q);
}
EXPORT_SYMBOL(d_alloc_name);
#define DCACHE_OP_FLAGS \
(DCACHE_OP_HASH | DCACHE_OP_COMPARE | DCACHE_OP_REVALIDATE | \
DCACHE_OP_WEAK_REVALIDATE | DCACHE_OP_DELETE | DCACHE_OP_PRUNE | \
DCACHE_OP_REAL)
static unsigned int d_op_flags(const struct dentry_operations *op)
{
unsigned int flags = 0;
if (op) {
if (op->d_hash)
flags |= DCACHE_OP_HASH;
if (op->d_compare)
flags |= DCACHE_OP_COMPARE;
if (op->d_revalidate)
flags |= DCACHE_OP_REVALIDATE;
if (op->d_weak_revalidate)
flags |= DCACHE_OP_WEAK_REVALIDATE;
if (op->d_delete)
flags |= DCACHE_OP_DELETE;
if (op->d_prune)
flags |= DCACHE_OP_PRUNE;
if (op->d_real)
flags |= DCACHE_OP_REAL;
}
return flags;
}
static void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op)
{
unsigned int flags = d_op_flags(op);
WARN_ON_ONCE(dentry->d_op);
WARN_ON_ONCE(dentry->d_flags & DCACHE_OP_FLAGS);
dentry->d_op = op;
if (flags)
dentry->d_flags |= flags;
}
void set_default_d_op(struct super_block *s, const struct dentry_operations *ops)
{
unsigned int flags = d_op_flags(ops);
s->__s_d_op = ops;
s->s_d_flags = (s->s_d_flags & ~DCACHE_OP_FLAGS) | flags;
}
EXPORT_SYMBOL(set_default_d_op);
static unsigned d_flags_for_inode(struct inode *inode)
{
unsigned add_flags = DCACHE_REGULAR_TYPE;
if (!inode)
return DCACHE_MISS_TYPE;
if (S_ISDIR(inode->i_mode)) {
add_flags = DCACHE_DIRECTORY_TYPE;
if (unlikely(!(inode->i_opflags & IOP_LOOKUP))) {
if (unlikely(!inode->i_op->lookup))
add_flags = DCACHE_AUTODIR_TYPE;
else
inode->i_opflags |= IOP_LOOKUP;
}
goto type_determined;
}
if (unlikely(!(inode->i_opflags & IOP_NOFOLLOW))) {
if (unlikely(inode->i_op->get_link)) {
add_flags = DCACHE_SYMLINK_TYPE;
goto type_determined;
}
inode->i_opflags |= IOP_NOFOLLOW;
}
if (unlikely(!S_ISREG(inode->i_mode)))
add_flags = DCACHE_SPECIAL_TYPE;
type_determined:
if (unlikely(IS_AUTOMOUNT(inode)))
add_flags |= DCACHE_NEED_AUTOMOUNT;
return add_flags;
}
static void __d_instantiate(struct dentry *dentry, struct inode *inode)
{
unsigned add_flags = d_flags_for_inode(inode);
WARN_ON(d_in_lookup(dentry));
/*
* The negative counter only tracks dentries on the LRU. Don't dec if
* d_lru is on another list.
*/
if ((dentry->d_flags &
(DCACHE_LRU_LIST|DCACHE_SHRINK_LIST)) == DCACHE_LRU_LIST)
this_cpu_dec(nr_dentry_negative);
hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
raw_write_seqcount_begin(&dentry->d_seq);
__d_set_inode_and_type(dentry, inode, add_flags);
raw_write_seqcount_end(&dentry->d_seq);
fsnotify_update_flags(dentry);
}
/**
* d_instantiate - fill in inode information for a dentry
* @entry: dentry to complete
* @inode: inode to attach to this dentry
*
* Fill in inode information in the entry.
*
* This turns negative dentries into productive full members
* of society.
*
* NOTE! This assumes that the inode count has been incremented
* (or otherwise set) by the caller to indicate that it is now
* in use by the dcache.
*/
void d_instantiate(struct dentry *entry, struct inode * inode)
{
BUG_ON(!hlist_unhashed(&entry->d_u.d_alias));
if (inode) {
security_d_instantiate(entry, inode);
spin_lock(&inode->i_lock);
spin_lock(&entry->d_lock);
__d_instantiate(entry, inode);
spin_unlock(&entry->d_lock);
spin_unlock(&inode->i_lock);
}
}
EXPORT_SYMBOL(d_instantiate);
/*
* This should be equivalent to d_instantiate() + unlock_new_inode(),
* with lockdep-related part of unlock_new_inode() done before
* anything else. Use that instead of open-coding d_instantiate()/
* unlock_new_inode() combinations.
*/
void d_instantiate_new(struct dentry *entry, struct inode *inode)
{
BUG_ON(!hlist_unhashed(&entry->d_u.d_alias));
BUG_ON(!inode);
lockdep_annotate_inode_mutex_key(inode);
security_d_instantiate(entry, inode);
spin_lock(&inode->i_lock);
spin_lock(&entry->d_lock);
__d_instantiate(entry, inode);
spin_unlock(&entry->d_lock);
WARN_ON(!(inode_state_read(inode) & I_NEW));
inode_state_clear(inode, I_NEW | I_CREATING);
inode_wake_up_bit(inode, __I_NEW);
spin_unlock(&inode->i_lock);
}
EXPORT_SYMBOL(d_instantiate_new);
struct dentry *d_make_root(struct inode *root_inode)
{
struct dentry *res = NULL;
if (root_inode) {
res = d_alloc_anon(root_inode->i_sb);
if (res)
d_instantiate(res, root_inode);
else
iput(root_inode);
}
return res;
}
EXPORT_SYMBOL(d_make_root);
static struct dentry *__d_obtain_alias(struct inode *inode, bool disconnected)
{
struct super_block *sb;
struct dentry *new, *res;
if (!inode)
return ERR_PTR(-ESTALE);
if (IS_ERR(inode))
return ERR_CAST(inode);
sb = inode->i_sb;
res = d_find_any_alias(inode); /* existing alias? */
if (res)
goto out;
new = d_alloc_anon(sb);
if (!new) {
res = ERR_PTR(-ENOMEM);
goto out;
}
security_d_instantiate(new, inode);
spin_lock(&inode->i_lock);
res = __d_find_any_alias(inode); /* recheck under lock */
if (likely(!res)) { /* still no alias, attach a disconnected dentry */
unsigned add_flags = d_flags_for_inode(inode);
if (disconnected)
add_flags |= DCACHE_DISCONNECTED;
spin_lock(&new->d_lock);
__d_set_inode_and_type(new, inode, add_flags);
hlist_add_head(&new->d_u.d_alias, &inode->i_dentry);
if (!disconnected) {
hlist_bl_lock(&sb->s_roots);
hlist_bl_add_head(&new->d_hash, &sb->s_roots);
hlist_bl_unlock(&sb->s_roots);
}
spin_unlock(&new->d_lock);
spin_unlock(&inode->i_lock);
inode = NULL; /* consumed by new->d_inode */
res = new;
} else {
spin_unlock(&inode->i_lock);
dput(new);
}
out:
iput(inode);
return res;
}
/**
* d_obtain_alias - find or allocate a DISCONNECTED dentry for a given inode
* @inode: inode to allocate the dentry for
*
* Obtain a dentry for an inode resulting from NFS filehandle conversion or
* similar open by handle operations. The returned dentry may be anonymous,
* or may have a full name (if the inode was already in the cache).
*
* When called on a directory inode, we must ensure that the inode only ever
* has one dentry. If a dentry is found, that is returned instead of
* allocating a new one.
*
* On successful return, the reference to the inode has been transferred
* to the dentry. In case of an error the reference on the inode is released.
* To make it easier to use in export operations a %NULL or IS_ERR inode may
* be passed in and the error will be propagated to the return value,
* with a %NULL @inode replaced by ERR_PTR(-ESTALE).
*/
struct dentry *d_obtain_alias(struct inode *inode)
{
return __d_obtain_alias(inode, true);
}
EXPORT_SYMBOL(d_obtain_alias);
/**
* d_obtain_root - find or allocate a dentry for a given inode
* @inode: inode to allocate the dentry for
*
* Obtain an IS_ROOT dentry for the root of a filesystem.
*
* We must ensure that directory inodes only ever have one dentry. If a
* dentry is found, that is returned instead of allocating a new one.
*
* On successful return, the reference to the inode has been transferred
* to the dentry. In case of an error the reference on the inode is
* released. A %NULL or IS_ERR inode may be passed in and the error
* will be propagated to the return value, with a %NULL @inode
* replaced by ERR_PTR(-ESTALE).
*/
struct dentry *d_obtain_root(struct inode *inode)
{
return __d_obtain_alias(inode, false);
}
EXPORT_SYMBOL(d_obtain_root);
/**
* d_add_ci - lookup or allocate new dentry with case-exact name
* @dentry: the negative dentry that was passed to the parent's lookup func
* @inode: the inode case-insensitive lookup has found
* @name: the case-exact name to be associated with the returned dentry
*
* This is to avoid filling the dcache with case-insensitive names to the
* same inode; only the actual correct case is stored in the dcache for
* case-insensitive filesystems.
*
* For a case-insensitive lookup match and if the case-exact dentry
* already exists in the dcache, use it and return it.
*
* If no entry exists with the exact case name, allocate new dentry with
* the exact case, and return the spliced entry.
*/
struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
struct qstr *name)
{
struct dentry *found, *res;
/*
* First check if a dentry matching the name already exists,
* if not go ahead and create it now.
*/
found = d_hash_and_lookup(dentry->d_parent, name);
if (found) {
iput(inode);
return found;
}
if (d_in_lookup(dentry)) {
found = d_alloc_parallel(dentry->d_parent, name,
dentry->d_wait);
if (IS_ERR(found) || !d_in_lookup(found)) {
iput(inode);
return found;
}
} else {
found = d_alloc(dentry->d_parent, name);
if (!found) {
iput(inode);
return ERR_PTR(-ENOMEM);
}
}
res = d_splice_alias(inode, found);
if (res) {
d_lookup_done(found);
dput(found);
return res;
}
return found;
}
EXPORT_SYMBOL(d_add_ci);
/**
* d_same_name - compare dentry name with case-exact name
* @dentry: dentry whose name to compare
* @parent: parent dentry
* @name: the case-exact name to compare against
*
* Return: true if the names are the same, false otherwise
*/
bool d_same_name(const struct dentry *dentry, const struct dentry *parent,
const struct qstr *name)
{
if (likely(!(parent->d_flags & DCACHE_OP_COMPARE))) {
if (dentry->d_name.len != name->len)
return false;
return dentry_cmp(dentry, name->name, name->len) == 0;
}
return parent->d_op->d_compare(dentry,
dentry->d_name.len, dentry->d_name.name,
name) == 0;
}
EXPORT_SYMBOL_GPL(d_same_name);
/*
* This is __d_lookup_rcu() when the parent dentry has
* DCACHE_OP_COMPARE, which makes things much nastier.
*/
static noinline struct dentry *__d_lookup_rcu_op_compare(
const struct dentry *parent,
const struct qstr *name,
unsigned *seqp)
{
u64 hashlen = name->hash_len;
struct hlist_bl_head *b = d_hash(hashlen);
struct hlist_bl_node *node;
struct dentry *dentry;
hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
int tlen;
const char *tname;
unsigned seq;
seqretry:
seq = raw_seqcount_begin(&dentry->d_seq);
if (dentry->d_parent != parent)
continue;
if (d_unhashed(dentry))
continue;
if (dentry->d_name.hash != hashlen_hash(hashlen))
continue;
tlen = dentry->d_name.len;
tname = dentry->d_name.name;
/* we want a consistent (name,len) pair */
if (read_seqcount_retry(&dentry->d_seq, seq)) {
cpu_relax();
goto seqretry;
}
if (parent->d_op->d_compare(dentry, tlen, tname, name) != 0)
continue;
*seqp = seq;
return dentry;
}
return NULL;
}
/**
* __d_lookup_rcu - search for a dentry (racy, store-free)
* @parent: parent dentry
* @name: qstr of name we wish to find
* @seqp: returns d_seq value at the point where the dentry was found
* Returns: dentry, or NULL
*
* __d_lookup_rcu is the dcache lookup function for rcu-walk name
* resolution (store-free path walking) design described in
* Documentation/filesystems/path-lookup.txt.
*
* This is not to be used outside core vfs.
*
* __d_lookup_rcu must only be used in rcu-walk mode, ie. with vfsmount lock
* held, and rcu_read_lock held. The returned dentry must not be stored into
* without taking d_lock and checking d_seq sequence count against @seq
* returned here.
*
* Alternatively, __d_lookup_rcu may be called again to look up the child of
* the returned dentry, so long as its parent's seqlock is checked after the
* child is looked up. Thus, an interlocking stepping of sequence lock checks
* is formed, giving integrity down the path walk.
*
* NOTE! The caller *has* to check the resulting dentry against the sequence
* number we've returned before using any of the resulting dentry state!
*/
struct dentry *__d_lookup_rcu(const struct dentry *parent,
const struct qstr *name,
unsigned *seqp)
{
u64 hashlen = name->hash_len;
const unsigned char *str = name->name;
struct hlist_bl_head *b = d_hash(hashlen);
struct hlist_bl_node *node;
struct dentry *dentry;
/*
* Note: There is significant duplication with __d_lookup which is
* required to prevent single threaded performance regressions
* especially on architectures where smp_rmb (in seqcounts) is costly.
* Keep the two functions in sync.
*/
if (unlikely(parent->d_flags & DCACHE_OP_COMPARE))
return __d_lookup_rcu_op_compare(parent, name, seqp);
/*
* The hash list is protected using RCU.
*
* Carefully use d_seq when comparing a candidate dentry, to avoid
* races with d_move().
*
* It is possible that concurrent renames can mess up our list
* walk here and cause us to miss our dentry, giving a false-negative
* result. d_lookup() protects against concurrent
* renames using rename_lock seqlock.
*
* See Documentation/filesystems/path-lookup.txt for more details.
*/
hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
unsigned seq;
/*
* The dentry sequence count protects us from concurrent
* renames, and thus protects parent and name fields.
*
* The caller must perform a seqcount check in order
* to do anything useful with the returned dentry.
*
* NOTE! We do a "raw" seqcount_begin here. That means that
* we don't wait for the sequence count to stabilize if it
* is in the middle of a sequence change. If we do the slow
* dentry compare, we will do seqretries until it is stable,
* and if we end up with a successful lookup, we actually
* want to exit RCU lookup anyway.
*
* Note that raw_seqcount_begin still *does* smp_rmb(), so
* we are still guaranteed NUL-termination of ->d_name.name.
*/
seq = raw_seqcount_begin(&dentry->d_seq);
if (dentry->d_parent != parent)
continue;
if (dentry->d_name.hash_len != hashlen)
continue;
if (unlikely(dentry_cmp(dentry, str, hashlen_len(hashlen)) != 0))
continue;
/*
* Check for the dentry being unhashed.
*
* As tempting as it is, we *can't* skip it because of a race window
* between us finding the dentry before it gets unhashed and loading
* the sequence counter after unhashing is finished.
*
* We can at least hint to the branch predictor that it is unlikely.
*/
if (unlikely(d_unhashed(dentry)))
continue;
*seqp = seq;
return dentry;
}
return NULL;
}
/**
* d_lookup - search for a dentry
* @parent: parent dentry
* @name: qstr of name we wish to find
* Returns: dentry, or NULL
*
* d_lookup searches the children of the parent dentry for the name in
* question. If the dentry is found its reference count is incremented and the
* dentry is returned. The caller must use dput to free the entry when it has
* finished using it. %NULL is returned if the dentry does not exist.
*/
struct dentry *d_lookup(const struct dentry *parent, const struct qstr *name)
{
struct dentry *dentry;
unsigned seq;
do {
seq = read_seqbegin(&rename_lock);
dentry = __d_lookup(parent, name);
if (dentry)
break;
} while (read_seqretry(&rename_lock, seq));
return dentry;
}
EXPORT_SYMBOL(d_lookup);
/**
* __d_lookup - search for a dentry (racy)
* @parent: parent dentry
* @name: qstr of name we wish to find
* Returns: dentry, or NULL
*
* __d_lookup is like d_lookup, however it may (rarely) return a
* false-negative result due to unrelated rename activity.
*
* __d_lookup is slightly faster by avoiding rename_lock read seqlock,
* however it must be used carefully, eg. with a following d_lookup in
* the case of failure.
*
* __d_lookup callers must be commented.
*/
struct dentry *__d_lookup(const struct dentry *parent, const struct qstr *name)
{
unsigned int hash = name->hash;
struct hlist_bl_head *b = d_hash(hash);
struct hlist_bl_node *node;
struct dentry *found = NULL;
struct dentry *dentry;
/*
* Note: There is significant duplication with __d_lookup_rcu which is
* required to prevent single threaded performance regressions
* especially on architectures where smp_rmb (in seqcounts) is costly.
* Keep the two functions in sync.
*/
/*
* The hash list is protected using RCU.
*
* Take d_lock when comparing a candidate dentry, to avoid races
* with d_move().
*
* It is possible that concurrent renames can mess up our list
* walk here and cause us to miss our dentry, giving a false-negative
* result. d_lookup() protects against concurrent
* renames using rename_lock seqlock.
*
* See Documentation/filesystems/path-lookup.txt for more details.
*/
rcu_read_lock();
hlist_bl_for_each_entry_rcu(dentry, node, b, d_hash) {
if (dentry->d_name.hash != hash)
continue;
spin_lock(&dentry->d_lock);
if (dentry->d_parent != parent)
goto next;
if (d_unhashed(dentry))
goto next;
if (!d_same_name(dentry, parent, name))
goto next;
dentry->d_lockref.count++;
found = dentry;
spin_unlock(&dentry->d_lock);
break;
next:
spin_unlock(&dentry->d_lock);
}
rcu_read_unlock();
return found;
}
/**
* d_hash_and_lookup - hash the qstr then search for a dentry
* @dir: Directory to search in
* @name: qstr of name we wish to find
*
* On lookup failure %NULL is returned; on a bad name, ERR_PTR(-error) is returned.
*/
struct dentry *d_hash_and_lookup(struct dentry *dir, struct qstr *name)
{
/*
* Check for a fs-specific hash function. Note that we must
* calculate the standard hash first, as the d_op->d_hash()
* routine may choose to leave the hash value unchanged.
*/
name->hash = full_name_hash(dir, name->name, name->len);
if (dir->d_flags & DCACHE_OP_HASH) {
int err = dir->d_op->d_hash(dir, name);
if (unlikely(err < 0))
return ERR_PTR(err);
}
return d_lookup(dir, name);
}
/*
* When a file is deleted, we have two options:
* - turn this dentry into a negative dentry
* - unhash this dentry and free it.
*
* Usually, we want to just turn this into
* a negative dentry, but if anybody else is
* currently using the dentry or the inode
* we can't do that and we fall back on removing
* it from the hash queues and waiting for
* it to be deleted later when it has no users.
*/
/**
* d_delete - delete a dentry
* @dentry: The dentry to delete
*
* Turn the dentry into a negative dentry if possible, otherwise
* remove it from the hash queues so it can be deleted later
*/
void d_delete(struct dentry * dentry)
{
struct inode *inode = dentry->d_inode;
spin_lock(&inode->i_lock);
spin_lock(&dentry->d_lock);
/*
* Are we the only user?
*/
if (dentry->d_lockref.count == 1) {
if (dentry_negative_policy)
__d_drop(dentry);
dentry->d_flags &= ~DCACHE_CANT_MOUNT;
dentry_unlink_inode(dentry);
} else {
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
spin_unlock(&inode->i_lock);
}
}
EXPORT_SYMBOL(d_delete);
static void __d_rehash(struct dentry *entry)
{
struct hlist_bl_head *b = d_hash(entry->d_name.hash);
hlist_bl_lock(b);
hlist_bl_add_head_rcu(&entry->d_hash, b);
hlist_bl_unlock(b);
}
/**
* d_rehash - add an entry back to the hash
* @entry: dentry to add to the hash
*
* Adds a dentry to the hash according to its name.
*/
void d_rehash(struct dentry * entry)
{
spin_lock(&entry->d_lock);
__d_rehash(entry);
spin_unlock(&entry->d_lock);
}
EXPORT_SYMBOL(d_rehash);
static inline unsigned start_dir_add(struct inode *dir)
{
preempt_disable_nested();
for (;;) {
unsigned n = READ_ONCE(dir->i_dir_seq);
if (!(n & 1) && try_cmpxchg(&dir->i_dir_seq, &n, n + 1))
return n;
cpu_relax();
}
}
static inline void end_dir_add(struct inode *dir, unsigned int n,
wait_queue_head_t *d_wait)
{
smp_store_release(&dir->i_dir_seq, n + 2);
preempt_enable_nested();
if (wq_has_sleeper(d_wait))
wake_up_all(d_wait);
}
static void d_wait_lookup(struct dentry *dentry)
{
if (d_in_lookup(dentry)) {
DECLARE_WAITQUEUE(wait, current);
add_wait_queue(dentry->d_wait, &wait);
do {
set_current_state(TASK_UNINTERRUPTIBLE);
spin_unlock(&dentry->d_lock);
schedule();
spin_lock(&dentry->d_lock);
} while (d_in_lookup(dentry));
}
}
struct dentry *d_alloc_parallel(struct dentry *parent,
const struct qstr *name,
wait_queue_head_t *wq)
{
unsigned int hash = name->hash;
struct hlist_bl_head *b = in_lookup_hash(parent, hash);
struct hlist_bl_node *node;
struct dentry *new = __d_alloc(parent->d_sb, name);
struct dentry *dentry;
unsigned seq, r_seq, d_seq;
if (unlikely(!new))
return ERR_PTR(-ENOMEM);
new->d_flags |= DCACHE_PAR_LOOKUP;
spin_lock(&parent->d_lock);
new->d_parent = dget_dlock(parent);
hlist_add_head(&new->d_sib, &parent->d_children);
if (parent->d_flags & DCACHE_DISCONNECTED)
new->d_flags |= DCACHE_DISCONNECTED;
spin_unlock(&parent->d_lock);
retry:
rcu_read_lock();
seq = smp_load_acquire(&parent->d_inode->i_dir_seq);
r_seq = read_seqbegin(&rename_lock);
dentry = __d_lookup_rcu(parent, name, &d_seq);
if (unlikely(dentry)) {
if (!lockref_get_not_dead(&dentry->d_lockref)) {
rcu_read_unlock();
goto retry;
}
if (read_seqcount_retry(&dentry->d_seq, d_seq)) {
rcu_read_unlock();
dput(dentry);
goto retry;
}
rcu_read_unlock();
dput(new);
return dentry;
}
if (unlikely(read_seqretry(&rename_lock, r_seq))) {
rcu_read_unlock();
goto retry;
}
if (unlikely(seq & 1)) {
rcu_read_unlock();
goto retry;
}
hlist_bl_lock(b);
if (unlikely(READ_ONCE(parent->d_inode->i_dir_seq) != seq)) {
hlist_bl_unlock(b);
rcu_read_unlock();
goto retry;
}
/*
* No changes for the parent since the beginning of d_lookup().
* Since all removals from the chain happen with hlist_bl_lock(),
* any potential in-lookup matches are going to stay here until
* we unlock the chain. All fields are stable in everything
* we encounter.
*/
hlist_bl_for_each_entry(dentry, node, b, d_u.d_in_lookup_hash) {
if (dentry->d_name.hash != hash)
continue;
if (dentry->d_parent != parent)
continue;
if (!d_same_name(dentry, parent, name))
continue;
hlist_bl_unlock(b);
/* now we can try to grab a reference */
if (!lockref_get_not_dead(&dentry->d_lockref)) {
rcu_read_unlock();
goto retry;
}
rcu_read_unlock();
/*
* somebody is likely to be still doing lookup for it;
* wait for them to finish
*/
spin_lock(&dentry->d_lock);
d_wait_lookup(dentry);
/*
* it's not in-lookup anymore; in principle we should repeat
* everything from dcache lookup, but it's likely to be what
* d_lookup() would've found anyway. If it is, just return it;
* otherwise we really have to repeat the whole thing.
*/
if (unlikely(dentry->d_name.hash != hash))
goto mismatch;
if (unlikely(dentry->d_parent != parent))
goto mismatch;
if (unlikely(d_unhashed(dentry)))
goto mismatch;
if (unlikely(!d_same_name(dentry, parent, name)))
goto mismatch;
/* OK, it *is* a hashed match; return it */
spin_unlock(&dentry->d_lock);
dput(new);
return dentry;
}
rcu_read_unlock();
new->d_wait = wq;
hlist_bl_add_head(&new->d_u.d_in_lookup_hash, b);
hlist_bl_unlock(b);
return new;
mismatch:
spin_unlock(&dentry->d_lock);
dput(dentry);
goto retry;
}
EXPORT_SYMBOL(d_alloc_parallel);
/*
* - Unhash the dentry
* - Retrieve and clear the waitqueue head in dentry
* - Return the waitqueue head
*/
static wait_queue_head_t *__d_lookup_unhash(struct dentry *dentry)
{
wait_queue_head_t *d_wait;
struct hlist_bl_head *b;
lockdep_assert_held(&dentry->d_lock);
b = in_lookup_hash(dentry->d_parent, dentry->d_name.hash);
hlist_bl_lock(b);
dentry->d_flags &= ~DCACHE_PAR_LOOKUP;
__hlist_bl_del(&dentry->d_u.d_in_lookup_hash);
d_wait = dentry->d_wait;
dentry->d_wait = NULL;
hlist_bl_unlock(b);
INIT_HLIST_NODE(&dentry->d_u.d_alias);
INIT_LIST_HEAD(&dentry->d_lru);
return d_wait;
}
void __d_lookup_unhash_wake(struct dentry *dentry)
{
spin_lock(&dentry->d_lock);
wake_up_all(__d_lookup_unhash(dentry));
spin_unlock(&dentry->d_lock);
}
EXPORT_SYMBOL(__d_lookup_unhash_wake);
/* inode->i_lock held if inode is non-NULL */
static inline void __d_add(struct dentry *dentry, struct inode *inode,
const struct dentry_operations *ops)
{
wait_queue_head_t *d_wait;
struct inode *dir = NULL;
unsigned n;
spin_lock(&dentry->d_lock);
if (unlikely(d_in_lookup(dentry))) {
dir = dentry->d_parent->d_inode;
n = start_dir_add(dir);
d_wait = __d_lookup_unhash(dentry);
}
if (unlikely(ops))
d_set_d_op(dentry, ops);
if (inode) {
unsigned add_flags = d_flags_for_inode(inode);
hlist_add_head(&dentry->d_u.d_alias, &inode->i_dentry);
raw_write_seqcount_begin(&dentry->d_seq);
__d_set_inode_and_type(dentry, inode, add_flags);
raw_write_seqcount_end(&dentry->d_seq);
fsnotify_update_flags(dentry);
}
__d_rehash(dentry);
if (dir)
end_dir_add(dir, n, d_wait);
spin_unlock(&dentry->d_lock);
if (inode)
spin_unlock(&inode->i_lock);
}
/**
* d_add - add dentry to hash queues
* @entry: dentry to add
* @inode: The inode to attach to this dentry
*
* This adds the entry to the hash queues and initializes @inode.
* The entry was actually filled in earlier during d_alloc().
*/
void d_add(struct dentry *entry, struct inode *inode)
{
if (inode) {
security_d_instantiate(entry, inode);
spin_lock(&inode->i_lock);
}
__d_add(entry, inode, NULL);
}
EXPORT_SYMBOL(d_add);
struct dentry *d_make_persistent(struct dentry *dentry, struct inode *inode)
{
WARN_ON(!hlist_unhashed(&dentry->d_u.d_alias));
WARN_ON(!inode);
security_d_instantiate(dentry, inode);
spin_lock(&inode->i_lock);
spin_lock(&dentry->d_lock);
__d_instantiate(dentry, inode);
dentry->d_flags |= DCACHE_PERSISTENT;
dget_dlock(dentry);
if (d_unhashed(dentry))
__d_rehash(dentry);
spin_unlock(&dentry->d_lock);
spin_unlock(&inode->i_lock);
return dentry;
}
EXPORT_SYMBOL(d_make_persistent);
static void swap_names(struct dentry *dentry, struct dentry *target)
{
if (unlikely(dname_external(target))) {
if (unlikely(dname_external(dentry))) {
/*
* Both external: swap the pointers
*/
swap(target->__d_name.name, dentry->__d_name.name);
} else {
/*
* dentry:internal, target:external. Steal target's
* storage and make target internal.
*/
dentry->__d_name.name = target->__d_name.name;
target->d_shortname = dentry->d_shortname;
target->__d_name.name = target->d_shortname.string;
}
} else {
if (unlikely(dname_external(dentry))) {
/*
* dentry:external, target:internal. Give dentry's
* storage to target and make dentry internal
*/
target->__d_name.name = dentry->__d_name.name;
dentry->d_shortname = target->d_shortname;
dentry->__d_name.name = dentry->d_shortname.string;
} else {
/*
* Both are internal.
*/
for (int i = 0; i < DNAME_INLINE_WORDS; i++)
swap(dentry->d_shortname.words[i],
target->d_shortname.words[i]);
}
}
swap(dentry->__d_name.hash_len, target->__d_name.hash_len);
}
static void copy_name(struct dentry *dentry, struct dentry *target)
{
struct external_name *old_name = NULL;
if (unlikely(dname_external(dentry)))
old_name = external_name(dentry);
if (unlikely(dname_external(target))) {
atomic_inc(&external_name(target)->count);
dentry->__d_name = target->__d_name;
} else {
dentry->d_shortname = target->d_shortname;
dentry->__d_name.name = dentry->d_shortname.string;
dentry->__d_name.hash_len = target->__d_name.hash_len;
}
if (old_name && likely(atomic_dec_and_test(&old_name->count)))
kfree_rcu(old_name, head);
}
/*
* __d_move - move a dentry
* @dentry: entry to move
* @target: new dentry
* @exchange: exchange the two dentries
*
* Update the dcache to reflect the move of a file name. Negative dcache
* entries should not be moved in this way. Caller must hold rename_lock, the
* i_rwsem of the source and target directories (exclusively), and the sb->
* s_vfs_rename_mutex if they differ. See lock_rename().
*/
static void __d_move(struct dentry *dentry, struct dentry *target,
bool exchange)
{
struct dentry *old_parent, *p;
wait_queue_head_t *d_wait;
struct inode *dir = NULL;
unsigned n;
WARN_ON(!dentry->d_inode);
if (WARN_ON(dentry == target))
return;
BUG_ON(d_ancestor(target, dentry));
old_parent = dentry->d_parent;
p = d_ancestor(old_parent, target);
if (IS_ROOT(dentry)) {
BUG_ON(p);
spin_lock(&target->d_parent->d_lock);
} else if (!p) {
/* target is not a descendant of dentry->d_parent */
spin_lock(&target->d_parent->d_lock);
spin_lock_nested(&old_parent->d_lock, DENTRY_D_LOCK_NESTED);
} else {
BUG_ON(p == dentry);
spin_lock(&old_parent->d_lock);
if (p != target)
spin_lock_nested(&target->d_parent->d_lock,
DENTRY_D_LOCK_NESTED);
}
spin_lock_nested(&dentry->d_lock, 2);
spin_lock_nested(&target->d_lock, 3);
if (unlikely(d_in_lookup(target))) {
dir = target->d_parent->d_inode;
n = start_dir_add(dir);
d_wait = __d_lookup_unhash(target);
}
write_seqcount_begin(&dentry->d_seq);
write_seqcount_begin_nested(&target->d_seq, DENTRY_D_LOCK_NESTED);
/* unhash both */
if (!d_unhashed(dentry))
___d_drop(dentry);
if (!d_unhashed(target))
___d_drop(target);
/* ... and switch them in the tree */
dentry->d_parent = target->d_parent;
if (!exchange) {
copy_name(dentry, target);
target->d_hash.pprev = NULL;
dentry->d_parent->d_lockref.count++;
if (dentry != old_parent) /* wasn't IS_ROOT */
WARN_ON(!--old_parent->d_lockref.count);
} else {
target->d_parent = old_parent;
swap_names(dentry, target);
if (!hlist_unhashed(&target->d_sib))
__hlist_del(&target->d_sib);
hlist_add_head(&target->d_sib, &target->d_parent->d_children);
__d_rehash(target);
fsnotify_update_flags(target);
}
if (!hlist_unhashed(&dentry->d_sib))
__hlist_del(&dentry->d_sib);
hlist_add_head(&dentry->d_sib, &dentry->d_parent->d_children);
__d_rehash(dentry);
fsnotify_update_flags(dentry);
fscrypt_handle_d_move(dentry);
write_seqcount_end(&target->d_seq);
write_seqcount_end(&dentry->d_seq);
if (dir)
end_dir_add(dir, n, d_wait);
if (dentry->d_parent != old_parent)
spin_unlock(&dentry->d_parent->d_lock);
if (dentry != old_parent)
spin_unlock(&old_parent->d_lock);
spin_unlock(&target->d_lock);
spin_unlock(&dentry->d_lock);
}
/*
* d_move - move a dentry
* @dentry: entry to move
* @target: new dentry
*
* Update the dcache to reflect the move of a file name. Negative
* dcache entries should not be moved in this way. See the locking
* requirements for __d_move.
*/
void d_move(struct dentry *dentry, struct dentry *target)
{
write_seqlock(&rename_lock);
__d_move(dentry, target, false);
write_sequnlock(&rename_lock);
}
EXPORT_SYMBOL(d_move);
/**
* d_exchange - exchange two dentries
* @dentry1: first dentry
* @dentry2: second dentry
*/
void d_exchange(struct dentry *dentry1, struct dentry *dentry2)
{
write_seqlock(&rename_lock);
WARN_ON(!dentry1->d_inode);
WARN_ON(!dentry2->d_inode);
WARN_ON(IS_ROOT(dentry1));
WARN_ON(IS_ROOT(dentry2));
__d_move(dentry1, dentry2, true);
write_sequnlock(&rename_lock);
}
EXPORT_SYMBOL(d_exchange);
/**
* d_ancestor - search for an ancestor
* @p1: ancestor dentry
* @p2: child dentry
*
* Returns the ancestor dentry of p2 which is a child of p1, if p1 is
* an ancestor of p2, else NULL.
*/
struct dentry *d_ancestor(struct dentry *p1, struct dentry *p2)
{
struct dentry *p;
for (p = p2; !IS_ROOT(p); p = p->d_parent) {
if (p->d_parent == p1)
return p;
}
return NULL;
}
/*
* This helper attempts to cope with remotely renamed directories
*
* It assumes that the caller is already holding
* dentry->d_parent->d_inode->i_rwsem, and rename_lock
*
* Note: If ever the locking in lock_rename() changes, then please
* remember to update this too...
*/
static int __d_unalias(struct dentry *dentry, struct dentry *alias)
{
struct mutex *m1 = NULL;
struct rw_semaphore *m2 = NULL;
int ret = -ESTALE;
/* If alias and dentry share a parent, then no extra locks required */
if (alias->d_parent == dentry->d_parent)
goto out_unalias;
/* See lock_rename() */
if (!mutex_trylock(&dentry->d_sb->s_vfs_rename_mutex))
goto out_err;
m1 = &dentry->d_sb->s_vfs_rename_mutex;
if (!inode_trylock_shared(alias->d_parent->d_inode))
goto out_err;
m2 = &alias->d_parent->d_inode->i_rwsem;
out_unalias:
if (alias->d_op && alias->d_op->d_unalias_trylock &&
!alias->d_op->d_unalias_trylock(alias))
goto out_err;
__d_move(alias, dentry, false);
if (alias->d_op && alias->d_op->d_unalias_unlock)
alias->d_op->d_unalias_unlock(alias);
ret = 0;
out_err:
if (m2)
up_read(m2);
if (m1)
mutex_unlock(m1);
return ret;
}
struct dentry *d_splice_alias_ops(struct inode *inode, struct dentry *dentry,
const struct dentry_operations *ops)
{
if (IS_ERR(inode))
return ERR_CAST(inode);
BUG_ON(!d_unhashed(dentry));
if (!inode)
goto out;
security_d_instantiate(dentry, inode);
spin_lock(&inode->i_lock);
if (S_ISDIR(inode->i_mode)) {
struct dentry *new = __d_find_any_alias(inode);
if (unlikely(new)) {
/* The reference to new ensures it remains an alias */
spin_unlock(&inode->i_lock);
write_seqlock(&rename_lock);
if (unlikely(d_ancestor(new, dentry))) {
write_sequnlock(&rename_lock);
dput(new);
new = ERR_PTR(-ELOOP);
pr_warn_ratelimited(
"VFS: Lookup of '%s' in %s %s"
" would have caused loop\n",
dentry->d_name.name,
inode->i_sb->s_type->name,
inode->i_sb->s_id);
} else if (!IS_ROOT(new)) {
struct dentry *old_parent = dget(new->d_parent);
int err = __d_unalias(dentry, new);
write_sequnlock(&rename_lock);
if (err) {
dput(new);
new = ERR_PTR(err);
}
dput(old_parent);
} else {
__d_move(new, dentry, false);
write_sequnlock(&rename_lock);
}
iput(inode);
return new;
}
}
out:
__d_add(dentry, inode, ops);
return NULL;
}
/**
* d_splice_alias - splice a disconnected dentry into the tree if one exists
* @inode: the inode which may have a disconnected dentry
* @dentry: a negative dentry which we want to point to the inode.
*
* If inode is a directory and has an IS_ROOT alias, then d_move that in
* place of the given dentry and return it, else simply d_add the inode
* to the dentry and return NULL.
*
* If a non-IS_ROOT directory is found, the filesystem is corrupt, and
* we should error out: directories can't have multiple aliases.
*
* This is needed in the lookup routine of any filesystem that is exportable
* (via knfsd) so that we can build dcache paths to directories effectively.
*
* If a dentry was found and moved, then it is returned. Otherwise NULL
* is returned. This matches the expected return value of ->lookup.
*
* Cluster filesystems may call this function with a negative, hashed dentry.
* In that case, we know that the inode will be a regular file, and also this
* will only occur during atomic_open. So we need to check for the dentry
* being already hashed only in the final case.
*/
struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
{
return d_splice_alias_ops(inode, dentry, NULL);
}
EXPORT_SYMBOL(d_splice_alias);
/*
* Test whether new_dentry is a subdirectory of old_dentry.
*
* Trivially implemented using the dcache structure
*/
/**
* is_subdir - is new dentry a subdirectory of old_dentry
* @new_dentry: new dentry
* @old_dentry: old dentry
*
* Returns true if new_dentry is a subdirectory of the parent (at any depth).
* Returns false otherwise.
* Caller must ensure that "new_dentry" is pinned before calling is_subdir()
*/
bool is_subdir(struct dentry *new_dentry, struct dentry *old_dentry)
{
bool subdir;
unsigned seq;
if (new_dentry == old_dentry)
return true;
/* Access d_parent under rcu as d_move() may change it. */
rcu_read_lock();
seq = read_seqbegin(&rename_lock);
subdir = d_ancestor(old_dentry, new_dentry);
/* Try lockless once... */
if (read_seqretry(&rename_lock, seq)) {
/* ...else acquire lock for progress even on deep chains. */
read_seqlock_excl(&rename_lock);
subdir = d_ancestor(old_dentry, new_dentry);
read_sequnlock_excl(&rename_lock);
}
rcu_read_unlock();
return subdir;
}
EXPORT_SYMBOL(is_subdir);
void d_mark_tmpfile(struct file *file, struct inode *inode)
{
struct dentry *dentry = file->f_path.dentry;
BUG_ON(dname_external(dentry) ||
!hlist_unhashed(&dentry->d_u.d_alias) ||
!d_unlinked(dentry));
spin_lock(&dentry->d_parent->d_lock);
spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
dentry->__d_name.len = sprintf(dentry->d_shortname.string, "#%llu",
(unsigned long long)inode->i_ino);
spin_unlock(&dentry->d_lock);
spin_unlock(&dentry->d_parent->d_lock);
}
EXPORT_SYMBOL(d_mark_tmpfile);
void d_tmpfile(struct file *file, struct inode *inode)
{
struct dentry *dentry = file->f_path.dentry;
inode_dec_link_count(inode);
d_mark_tmpfile(file, inode);
d_instantiate(dentry, inode);
}
EXPORT_SYMBOL(d_tmpfile);
/*
* Obtain inode number of the parent dentry.
*/
ino_t d_parent_ino(struct dentry *dentry)
{
struct dentry *parent;
struct inode *iparent;
unsigned seq;
ino_t ret;
scoped_guard(rcu) {
seq = raw_seqcount_begin(&dentry->d_seq);
parent = READ_ONCE(dentry->d_parent);
iparent = d_inode_rcu(parent);
if (likely(iparent)) {
ret = iparent->i_ino;
if (!read_seqcount_retry(&dentry->d_seq, seq))
return ret;
}
}
spin_lock(&dentry->d_lock);
ret = dentry->d_parent->d_inode->i_ino;
spin_unlock(&dentry->d_lock);
return ret;
}
EXPORT_SYMBOL(d_parent_ino);
static __initdata unsigned long dhash_entries;
static int __init set_dhash_entries(char *str)
{
if (!str)
return 0;
dhash_entries = simple_strtoul(str, &str, 0);
return 1;
}
__setup("dhash_entries=", set_dhash_entries);
static void __init dcache_init_early(void)
{
/*
 * If hashes are distributed across NUMA nodes, defer
 * hash allocation until vmalloc space is available.
 */
if (hashdist)
return;
dentry_hashtable =
alloc_large_system_hash("Dentry cache",
sizeof(struct hlist_bl_head),
dhash_entries,
13,
HASH_EARLY | HASH_ZERO,
&d_hash_shift,
NULL,
0,
0);
d_hash_shift = 32 - d_hash_shift;
runtime_const_init(shift, d_hash_shift);
runtime_const_init(ptr, dentry_hashtable);
}
static void __init dcache_init(void)
{
/*
* A constructor could be added for stable state like the lists,
* but it is probably not worth it because of the cache nature
* of the dcache.
*/
__dentry_cache = KMEM_CACHE_USERCOPY(dentry,
SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_ACCOUNT,
d_shortname.string);
runtime_const_init(ptr, __dentry_cache);
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
return;
dentry_hashtable =
alloc_large_system_hash("Dentry cache",
sizeof(struct hlist_bl_head),
dhash_entries,
13,
HASH_ZERO,
&d_hash_shift,
NULL,
0,
0);
d_hash_shift = 32 - d_hash_shift;
runtime_const_init(shift, d_hash_shift);
runtime_const_init(ptr, dentry_hashtable);
}
/* SLAB cache for __getname() consumers */
struct kmem_cache *names_cachep __ro_after_init;
EXPORT_SYMBOL(names_cachep);
void __init vfs_caches_init_early(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(in_lookup_hashtable); i++)
INIT_HLIST_BL_HEAD(&in_lookup_hashtable[i]);
dcache_init_early();
inode_init_early();
}
void __init vfs_caches_init(void)
{
names_cachep = kmem_cache_create_usercopy("names_cache", PATH_MAX, 0,
SLAB_HWCACHE_ALIGN|SLAB_PANIC, 0, PATH_MAX, NULL);
dcache_init();
inode_init();
files_init();
files_maxfiles_init();
mnt_init();
bdev_cache_init();
chrdev_init();
}
] |
| 33/2 | 2026/02/02 13:09 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 74269 output: 19 thoughts: 2982 Thoughts: **Analysis of the Dentry Data Race** Okay, here's the breakdown. The code in question involves `dentry_unlist` and `d_lru_del`, specifically in the context of cursor entries used for directory traversal, and I'm looking at a reported data race detected by KCSAN. My immediate thought is to trace the flow and identify any potential for actual harm, considering my expertise in kernel internals. I quickly zero in on `d_alloc_cursor` which clearly sets the `DCACHE_DENTRY_CURSOR` flag. These cursors are allocated using `d_alloc_anon` and, importantly, are *not* added to the standard sibling list using `d_alloc`. They seem to function as temporary placeholders during directory walks as shown by `d_walk`. The reported data race seems to revolve around `dentry_unlist` (Task 3304) accessing the `d_sib.next` field and checking the `DCACHE_DENTRY_CURSOR` flag of a dentry (let's call it `d2`) while `d_lru_del` (Task 3002) is running on that dentry and clearing some flags in `d2->d_flags`. My initial thought is the race will not be harmful, since `d_lru_del` removes the `DCACHE_LRU_LIST` flag, which cursors *never* have. They are not LRU entries, they don't have inodes, they are just placeholders for directory traversal. The KCSAN report indicates a read of `d2->d_flags` by Task 3304 and a write to `d2->d_flags` by Task 3002. Specifically, the line that reads is checking `DCACHE_DENTRY_CURSOR`. The key here is that *cursors are never put on the LRU list*. So `d_lru_del` is *only* operating on non-cursor dentries. `dentry_unlist` iterates through the sibling list, skipping cursors, and the lock protects the state of the list. The critical check is in `dentry_unlist`. It uses a loop to iterate through siblings, and skips cursors. So, if `d2` is a cursor, the loop continues. If `d2` is not a cursor, the loop breaks.
In that second case, we *know* `d2` is not a cursor, which means that `d_lru_del` has the right to remove its `DCACHE_LRU_LIST` flag. Given that cursor-ness is a *stable* flag set at allocation and never cleared, I can safely assume that the read in `dentry_unlist` will consistently yield the same value. I considered whether there could be a case where `DCACHE_DENTRY_CURSOR` could be inadvertently modified to look set. Since `d_flags` is an `unsigned int` and 32-bit writes are atomic, either the old or new value of the word will be seen. Since the bits written by `d_lru_del` are different from the cursor bit, it is safe. I briefly considered `READ_ONCE`/`WRITE_ONCE` but while common in RCU scenarios, they are not strictly needed here because of the atomicity of the 32-bit access. `d_flags` is *often* accessed without locks in many parts of the kernel. The compiler *could* do something weird with these plain accesses, since `READ_ONCE` and `WRITE_ONCE` are not being used here, but there is no plausible optimization that could make the cursor bit appear set when it isn't. I also checked if `DCACHE_DENTRY_KILLED` was relevant. It is set in `dentry_unlist`, but that's for the current dentry, while we are looking at a sibling. So the race is still on the sibling. I then considered whether the sibling could be freed under us, since the KCSAN report also has some locking in it. But `dentry_unlist` holds the parent lock, so the sibling dentry cannot be freed while the reader is alive and operating in the list. Now I double-check to be absolutely sure: I searched the kernel code to ensure that there are no cases where `DCACHE_DENTRY_CURSOR` is ever *cleared*. I did not find any. Then I carefully re-examined the code in `dentry_unlist` (and I also looked at the diff). The code iterates the sibling list, and if it sees a cursor, it skips it. If it sees |
| 34/2 | 2026/02/02 13:10 | tool | set-results |
Args:map[Benign:true Confident:true] Results: map[Benign:true Confident:true] |
| 35/2 | 2026/02/02 13:10 | llm | expert |
Model:gemini-3-flash-preview Tokens: input: 78312 output: 905 thoughts: 385 Thoughts: **My Analysis of the Dentry Flag Interaction** Okay, the analysis is done, and here's the gist of it for the final response. We have two tasks, 3002 and 3304, dealing with dentries, specifically `d1` and `d2`. Task 3002 is essentially performing a dentry kill operation on `d2`, modifying its `d_flags` during the process in `d_lru_del` and `__d_clear_type_and_inode`. Specifically, it's clearing the `DCACHE_LRU_LIST` and `DCACHE_ENTRY_TYPE` bits. Nothing unusual there. Then, Task 3304 comes into play, also killing another dentry, `d1`. While unlisting `d1`, it needs to look at its next sibling, `d2`. Crucially, it reads `d2->d_flags` to check for the `DCACHE_DENTRY_CURSOR` bit. This is the key. Now, `DCACHE_DENTRY_CURSOR` is a stable bit reserved for these special "cursor" dentries, and it's never set on dentries that are also on the LRU list; `d_lru_del` would never be called on a dentry with this bit set. Since `d2` is being killed by Task 3002, and `DCACHE_DENTRY_CURSOR` is *not* set on `d2` to begin with, the bit remains at `0` throughout the operation in Task 3002. Even better, since Task 3002 is only modifying bits *other* than the one that Task 3304 is checking, and because 32-bit memory accesses are atomic, Task 3304 will always see bit 25 of `d2->d_flags` as `0`; no tearing is possible. Finally, Task 3304 holds the parent's `d_lock`, which is important. This ensures that `d2` cannot be unlinked from the sibling list or freed until Task 3304 has completely finished its traversal. So, although the race is on the same memory location, different bits are involved, and it's ultimately harmless. No concurrency issues. |