bisecting cause commit starting from 746f534a4809e07f427f7d13d10f3a6a9641e5c3 building syzkaller on 409809d8a7c9c775eaea317add40e7a86a1e836c testing commit 746f534a4809e07f427f7d13d10f3a6a9641e5c3 with gcc (GCC) 8.1.0 kernel signature: 10ca71ababba9169d0029b4e0656709301490fea2539b3e152d3bd642633135f all runs: crashed: WARNING in tracepoint_probe_register_prio testing release v5.8 testing commit bcf876870b95592b52519ed4aafcf9d95999bc9c with gcc (GCC) 8.1.0 kernel signature: fa4ebf5da22c0b22eaa900c84433806481292f1169c09ba0c70dd47dd42f8f4e all runs: crashed: WARNING in tracepoint_probe_register_prio testing release v5.7 testing commit 3d77e6a8804abcc0504c904bd6e5cdf3a5cf8162 with gcc (GCC) 8.1.0 kernel signature: 076bc1ce31939d0be8004204be9c118272d57f14c4fde3fe9908ba0631dbb6c6 all runs: crashed: WARNING in tracepoint_probe_register_prio testing release v5.6 testing commit 7111951b8d4973bda27ff663f2cf18b663d15b48 with gcc (GCC) 8.1.0 kernel signature: 74e0127b7efe4a692118761bdb3b03ef52681403e21ef24a33fe4da8f852b6ba all runs: crashed: WARNING in tracepoint_probe_register_prio testing release v5.5 testing commit d5226fa6dbae0569ee43ecfc08bdcd6770fc4755 with gcc (GCC) 8.1.0 kernel signature: 35fc28e9cf5ba6267e96e90ea88ab41d73de62e1d253187763fcb6e8a7babb18 all runs: crashed: WARNING in tracepoint_probe_register_prio testing release v5.4 testing commit 219d54332a09e8d8741c1e1982f5eae56099de85 with gcc (GCC) 8.1.0 kernel signature: 011b9dcc04d42ad993a4770ed2a23f69576609a536d77ec4d74da6964bd3a627 all runs: crashed: WARNING in tracepoint_probe_register_prio testing release v5.3 testing commit 4d856f72c10ecb060868ed10ff1b1453943fc6c8 with gcc (GCC) 8.1.0 kernel signature: 0eb6635f05a682b86be203cd46fb471cfddff6b55cb824f7e978afd1457fb263 all runs: crashed: WARNING in tracepoint_probe_register_prio testing release v5.2 testing commit 0ecfebd2b52404ae0c54a878c872bb93363ada36 with gcc (GCC) 8.1.0 kernel signature: 21bac51ef2c087bb69346a64ef8b2aaa1c5552de4f1c440497828a73598579c5 all runs: crashed: WARNING in tracepoint_probe_register_prio testing release v5.1 testing commit e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd with gcc (GCC) 8.1.0 kernel signature: bce0871b9d457bb68b83af9349037aaaa51d1ab914a5c8ce12011a3e893e23f0 run #0: crashed: KASAN: use-after-free Read in bpf_prog_kallsyms_find run #1: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #2: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #3: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #4: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #5: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #6: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #7: crashed: WARNING in bpf_prog_kallsyms_find run #8: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #9: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find testing release v5.0 testing commit 1c163f4c7b3f621efff9b28a47abb36f7378d783 with gcc (GCC) 8.1.0 kernel signature: 4289e2e9af6759197df213673ebd6c569d0ef804122d1d40b2855b67555e6a7c run #0: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #1: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #2: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #3: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #4: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #5: crashed: KASAN: use-after-free Read in bpf_prog_kallsyms_find run #6: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #7: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #8: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #9: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find testing release v4.20 testing commit 8fe28cb58bcb235034b64cbbb7550a8a43fd88be with gcc (GCC) 8.1.0 kernel signature: 681fcf030421b91263de66cf773dcc2193ab02dadf49875524ee520d3291fc15 all runs: OK # git bisect start 1c163f4c7b3f621efff9b28a47abb36f7378d783 8fe28cb58bcb235034b64cbbb7550a8a43fd88be Bisecting: 7011 revisions left to test after this (roughly 13 steps) [af7ddd8a627c62a835524b3f5b471edbbbcce025] Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping testing commit af7ddd8a627c62a835524b3f5b471edbbbcce025 with gcc (GCC) 8.1.0 kernel signature: 5eee717422042e20158a3f7927de8ab91e4bee8575d637046396b3488a4863f7 run #0: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #1: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #2: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #3: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #4: crashed: KASAN: use-after-free Read in bpf_prog_kallsyms_find run #5: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #6: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #7: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #8: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #9: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find # git bisect bad af7ddd8a627c62a835524b3f5b471edbbbcce025 Bisecting: 3448 revisions left to test after this (roughly 12 steps) [792bf4d871dea8b69be2aaabdd320d7c6ed15985] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip testing commit 792bf4d871dea8b69be2aaabdd320d7c6ed15985 with gcc (GCC) 8.1.0 kernel signature: 4afecbc17384c9c3f286c854bc1f145b1705f77bee24d9127ddd65c93918e52a all runs: OK # git bisect good 792bf4d871dea8b69be2aaabdd320d7c6ed15985 Bisecting: 1729 revisions left to test after this (roughly 11 steps) [aa9d6e0f33aea8a1879e7e53fe0e436943f9ce0c] linux/netlink.h: drop unnecessary extern prefix testing commit aa9d6e0f33aea8a1879e7e53fe0e436943f9ce0c with gcc (GCC) 8.1.0 kernel signature: f5eb7d25aff781e4ae060ea37dce55c58d08cf99b58efe7e650016a5095f94b2 run #0: crashed: WARNING in bpf_prog_kallsyms_find run #1: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #2: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #3: crashed: KASAN: use-after-free Read in bpf_prog_kallsyms_find run #4: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #5: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #6: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #7: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #8: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #9: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find # git bisect bad aa9d6e0f33aea8a1879e7e53fe0e436943f9ce0c Bisecting: 858 revisions left to test after this (roughly 10 steps) [2a95471c3397734ba6869ca3fa084490fb35b40b] Merge branch 'prog_test_run-improvement' testing commit 2a95471c3397734ba6869ca3fa084490fb35b40b with gcc (GCC) 8.1.0 kernel signature: dd1232063be6f34e9e91f8e86f1d7f09e5c641c393eb7d7535325f7a3fc7fcdf all runs: OK # git bisect good 2a95471c3397734ba6869ca3fa084490fb35b40b Bisecting: 412 revisions left to test after this (roughly 9 steps) [addb0679839a1f74da6ec742137558be244dd0e9] Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next testing commit addb0679839a1f74da6ec742137558be244dd0e9 with gcc (GCC) 8.1.0 kernel signature: 3be3b473a9e8af5ce39330348aba74bba3cba342448dce57fcc6336fbdc81dcd run #0: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #1: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #2: crashed: KASAN: use-after-free Read in bpf_prog_kallsyms_find run #3: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #4: crashed: KASAN: use-after-free Read in bpf_prog_kallsyms_find run #5: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #6: crashed: KASAN: use-after-free Read in bpf_prog_kallsyms_find run #7: crashed: BUG: unable to handle kernel paging request in bpf_prog_kallsyms_find run #8: crashed: WARNING in bpf_prog_kallsyms_find run #9: crashed: KASAN: use-after-free Read in bpf_prog_kallsyms_find # git bisect bad addb0679839a1f74da6ec742137558be244dd0e9 Bisecting: 222 revisions left to test after this (roughly 8 steps) [b6f153d3e5a5bfbda17b997bd9e258143aa11809] selftests: mlxsw: Add one-armed router test testing commit b6f153d3e5a5bfbda17b997bd9e258143aa11809 with gcc (GCC) 8.1.0 kernel signature: e0531e6a245efa1cab3b2d91a9a0dcf0593dc835b0f5fde646ea92715118d5e0 all runs: OK # git bisect good b6f153d3e5a5bfbda17b997bd9e258143aa11809 Bisecting: 119 revisions left to test after this (roughly 7 steps) [d8ed257f313f64e9835e61d1365dea95a0a1c9c6] tcp: handle EOR and FIN conditions the same in tcp_tso_should_defer() testing commit d8ed257f313f64e9835e61d1365dea95a0a1c9c6 with gcc (GCC) 8.1.0 kernel signature: 557e6acf11988911c27349b20b812af767d328aec9153ba077aab8793ce95194 run #0: crashed: KASAN: use-after-free Read in ___neigh_create run #1: crashed: KASAN: use-after-free Read in neigh_mark_dead run #2: crashed: KASAN: use-after-free Read in neigh_mark_dead run #3: crashed: BUG: corrupted list in neigh_mark_dead run #4: crashed: BUG: corrupted list in neigh_mark_dead run #5: crashed: BUG: corrupted list in neigh_mark_dead run #6: crashed: BUG: corrupted list in neigh_mark_dead run #7: crashed: BUG: corrupted list in neigh_mark_dead run #8: OK run #9: OK # git bisect bad d8ed257f313f64e9835e61d1365dea95a0a1c9c6 Bisecting: 51 revisions left to test after this (roughly 6 steps) [6b241e411607a8f78bee74b96655e9d7835ea8ba] Merge branch 'net-aquantia-add-RSS-configuration' testing commit 6b241e411607a8f78bee74b96655e9d7835ea8ba with gcc (GCC) 8.1.0 kernel signature: 162e4b267b4e5a310cdce30effcf84dbf78a36649858c3af970f23459eb35a8c all runs: OK # git bisect good 6b241e411607a8f78bee74b96655e9d7835ea8ba Bisecting: 25 revisions left to test after this (roughly 5 steps) [c3529177db471b964fbe327ffb801266f0482d64] net: hns3: handle hw errors of SSU testing commit c3529177db471b964fbe327ffb801266f0482d64 with gcc (GCC) 8.1.0 kernel signature: 0c8617a3171b41c36ee7ca74ff7cd16462fd1b5159dc9d617dcc2966f0f4dca1 all runs: OK # git bisect good c3529177db471b964fbe327ffb801266f0482d64 Bisecting: 12 revisions left to test after this (roughly 4 steps) [120d633f199b264e0ad4c98eedc0564a89171c1d] Merge branch 'platform-data-controls-for-mdio-gpio' testing commit 120d633f199b264e0ad4c98eedc0564a89171c1d with gcc (GCC) 8.1.0 kernel signature: d2de6a00352e865c48eecacdaa1fd3e672a81a47ad4e232058cf08745be180f3 run #0: crashed: KASAN: use-after-free Read in neigh_mark_dead run #1: crashed: BUG: corrupted list in neigh_mark_dead run #2: crashed: BUG: corrupted list in neigh_mark_dead run #3: crashed: BUG: corrupted list in neigh_mark_dead run #4: crashed: KASAN: use-after-free Read in neigh_mark_dead run #5: crashed: BUG: corrupted list in ___neigh_create run #6: crashed: BUG: corrupted list in ___neigh_create run #7: crashed: BUG: corrupted list in neigh_mark_dead run #8: crashed: BUG: corrupted list in neigh_mark_dead run #9: OK # git bisect bad 120d633f199b264e0ad4c98eedc0564a89171c1d Bisecting: 6 revisions left to test after this (roughly 3 steps) [dfe465d33e7fce145746d836ddb31f0e27f28e28] tc-testing: Add new TdcResults module testing commit dfe465d33e7fce145746d836ddb31f0e27f28e28 with gcc (GCC) 8.1.0 kernel signature: 39f7eb8dd77690865d28779b5d771093ddbd9d3b85213fa9c3ede8b771b87b97 run #0: crashed: KASAN: use-after-free Read in ___neigh_create run #1: crashed: BUG: corrupted list in ___neigh_create run #2: crashed: BUG: corrupted list in ___neigh_create run #3: crashed: BUG: corrupted list in neigh_mark_dead run #4: crashed: BUG: corrupted list in neigh_mark_dead run #5: crashed: BUG: corrupted list in neigh_mark_dead run #6: OK run #7: OK run #8: OK run #9: OK # git bisect bad dfe465d33e7fce145746d836ddb31f0e27f28e28 Bisecting: 2 revisions left to test after this (roughly 2 steps) [58956317c8de52009d1a38a721474c24aef74fe7] neighbor: Improve garbage collection testing commit 58956317c8de52009d1a38a721474c24aef74fe7 with gcc (GCC) 8.1.0 kernel signature: a590ddd33b4a769b9832f7478d0bf66b878b8531e2eee29b7ea5cd5e7a1ee3cc run #0: crashed: BUG: corrupted list in neigh_mark_dead run #1: crashed: BUG: corrupted list in neigh_mark_dead run #2: crashed: BUG: corrupted list in neigh_mark_dead run #3: crashed: BUG: corrupted list in neigh_mark_dead run #4: crashed: BUG: corrupted list in neigh_mark_dead run #5: crashed: BUG: corrupted list in ___neigh_create run #6: crashed: BUG: corrupted list in neigh_mark_dead run #7: OK run #8: OK run #9: OK # git bisect bad 58956317c8de52009d1a38a721474c24aef74fe7 Bisecting: 0 revisions left to test after this (roughly 1 step) [12edfdfc79860f42fa493f81518e040376b6a5bc] Merge branch 'hns3-error-handling' testing commit 12edfdfc79860f42fa493f81518e040376b6a5bc with gcc (GCC) 8.1.0 kernel signature: ab82a333e6586d1883f43264aaf553335400f558bc6ec8806be28753665668f0 all runs: OK # git bisect good 12edfdfc79860f42fa493f81518e040376b6a5bc 58956317c8de52009d1a38a721474c24aef74fe7 is the first bad commit commit 58956317c8de52009d1a38a721474c24aef74fe7 Author: David Ahern Date: Fri Dec 7 12:24:57 2018 -0800 neighbor: Improve garbage collection The existing garbage collection algorithm has a number of problems: 1. The gc algorithm will not evict PERMANENT entries as those entries are managed by userspace, yet the existing algorithm walks the entire hash table which means it always considers PERMANENT entries when looking for entries to evict. In some use cases (e.g., EVPN) there can be tens of thousands of PERMANENT entries leading to wasted CPU cycles when gc kicks in. As an example, with 32k permanent entries, neigh_alloc has been observed taking more than 4 msec per invocation. 2. Currently, when the number of neighbor entries hits gc_thresh2 and the last flush for the table was more than 5 seconds ago gc kicks in walks the entire hash table evicting *all* entries not in PERMANENT or REACHABLE state and not marked as externally learned. There is no discriminator on when the neigh entry was created or if it just moved from REACHABLE to another NUD_VALID state (e.g., NUD_STALE). It is possible for entries to be created or for established neighbor entries to be moved to STALE (e.g., an external node sends an ARP request) right before the 5 second window lapses: -----|---------x|----------|----- t-5 t t+5 If that happens those entries are evicted during gc causing unnecessary thrashing on neighbor entries and userspace caches trying to track them. Further, this contradicts the description of gc_thresh2 which says "Entries older than 5 seconds will be cleared". One workaround is to make gc_thresh2 == gc_thresh3 but that negates the whole point of having separate thresholds. 3. Clearing *all* neigh non-PERMANENT/REACHABLE/externally learned entries when gc_thresh2 is exceeded is over kill and contributes to trashing especially during startup. This patch addresses these problems as follows: 1. Use of a separate list_head to track entries that can be garbage collected along with a separate counter. PERMANENT entries are not added to this list. The gc_thresh parameters are only compared to the new counter, not the total entries in the table. The forced_gc function is updated to only walk this new gc_list looking for entries to evict. 2. Entries are added to the list head at the tail and removed from the front. 3. Entries are only evicted if they were last updated more than 5 seconds ago, adhering to the original intent of gc_thresh2. 4. Forced gc is stopped once the number of gc_entries drops below gc_thresh2. 5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped when allocating a new neighbor for a PERMANENT entry. By extension this means there are no explicit limits on the number of PERMANENT entries that can be created, but this is no different than FIB entries or FDB entries. Signed-off-by: David Ahern Signed-off-by: David S. Miller Documentation/networking/ip-sysctl.txt | 4 +- include/net/neighbour.h | 3 + net/core/neighbour.c | 119 +++++++++++++++++++++++---------- 3 files changed, 90 insertions(+), 36 deletions(-) culprit signature: a590ddd33b4a769b9832f7478d0bf66b878b8531e2eee29b7ea5cd5e7a1ee3cc parent signature: ab82a333e6586d1883f43264aaf553335400f558bc6ec8806be28753665668f0 revisions tested: 24, total time: 4h54m25.062058639s (build: 2h19m53.218773632s, test: 2h31m58.320132181s) first bad commit: 58956317c8de52009d1a38a721474c24aef74fe7 neighbor: Improve garbage collection recipients (to): ["corbet@lwn.net" "davem@davemloft.net" "davem@davemloft.net" "dsahern@gmail.com" "linux-doc@vger.kernel.org" "netdev@vger.kernel.org"] recipients (cc): ["linux-kernel@vger.kernel.org"] crash: BUG: corrupted list in neigh_mark_dead list_del corruption. next->prev should be ffff88808d37d010, but was ffff888094abd950 ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:56! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 17 Comm: kworker/1:0 Not tainted 4.20.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: events_power_efficient neigh_periodic_work RIP: 0010:__list_del_entry_valid.cold.1+0x37/0x4a lib/list_debug.c:54 Code: e8 11 a9 1e fe 0f 0b 4c 89 ea 48 89 de 48 c7 c7 a0 1b 67 87 e8 fd a8 1e fe 0f 0b 48 89 de 48 c7 c7 00 1d 67 87 e8 ec a8 1e fe <0f> 0b 48 89 de 48 c7 c7 a0 1c 67 87 e8 db a8 1e fe 0f 0b 48 89 d9 RSP: 0018:ffff8880a9867c88 EFLAGS: 00010282 RAX: 0000000000000054 RBX: ffff88808d37d010 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff87671aa0 RDI: ffffffff8a146c60 RBP: ffff8880a9867ca0 R08: ffffed1015d65029 R09: ffffed1015d65028 R10: ffffed1015d65028 R11: ffff8880aeb28147 R12: ffff888095c27490 R13: ffffffff8903aa60 R14: ffff88808d37cda8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880aeb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000090f7d000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __list_del_entry include/linux/list.h:117 [inline] list_del_init include/linux/list.h:159 [inline] neigh_mark_dead+0x86/0x210 net/core/neighbour.c:125 neigh_periodic_work+0x56f/0x870 net/core/neighbour.c:905 process_one_work+0x7b9/0x15a0 kernel/workqueue.c:2153 worker_thread+0x85/0xb60 kernel/workqueue.c:2296 kthread+0x324/0x3e0 kernel/kthread.c:246 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Modules linked in: ---[ end trace 7bab0a0af802de1c ]--- RIP: 0010:__list_del_entry_valid.cold.1+0x37/0x4a lib/list_debug.c:54 Code: e8 11 a9 1e fe 0f 0b 4c 89 ea 48 89 de 48 c7 c7 a0 1b 67 87 e8 fd a8 1e fe 0f 0b 48 89 de 48 c7 c7 00 1d 67 87 e8 ec a8 1e fe <0f> 0b 48 89 de 48 c7 c7 a0 1c 67 87 e8 db a8 1e fe 0f 0b 48 89 d9 RSP: 0018:ffff8880a9867c88 EFLAGS: 00010282 RAX: 0000000000000054 RBX: ffff88808d37d010 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff87671aa0 RDI: ffffffff8a146c60 RBP: ffff8880a9867ca0 R08: ffffed1015d65029 R09: ffffed1015d65028 R10: ffffed1015d65028 R11: ffff8880aeb28147 R12: ffff888095c27490 R13: ffffffff8903aa60 R14: ffff88808d37cda8 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880aeb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000000846a000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400