On Fri, Apr 26, 2024 at 1:56 PM Yi Zhang wrote:
>
> On Wed, Apr 24, 2024 at 9:28 PM Guoqing Jiang wrote:
> >
> > Hi,
> >
> > On 4/8/24 14:03, Yi Zhang wrote:
> > > Hi
> > > I found the below kmemleak issue during blktests nvme/rdma on the
> > > latest linux-rdma/for-next, please help check it and let me know if
> > > you need any info/testing for it, thanks.
> >
> > Could you share which test case caused the issue? I can't reproduce
> > it with 6.9-rc3+ kernel (commit 586b5dfb51b) with the below.
>
> It can be reproduced by [1], you can find more info from the symbol
> info[2], I also attached the config file, maybe you can try this config
> file

Just attached the config file

>
> [1] nvme_trtype=rdma ./check nvme/012
> [2]
> unreferenced object 0xffff8883a87e8800 (size 192):
>   comm "rdma", pid 2355, jiffies 4294836069
>   hex dump (first 32 bytes):
>     32 00 00 00 00 00 00 00 c0 ff ff ff 1f 00 00 00  2...............
>     10 88 7e a8 83 88 ff ff 10 88 7e a8 83 88 ff ff  ..~.......~.....
>   backtrace (crc 4db191c4):
>     [] kmalloc_trace+0x30d/0x3b0
>     [] alloc_gid_entry+0x47/0x380 [ib_core]
>     [] add_modify_gid+0x166/0x930 [ib_core]
>     [] ib_cache_update.part.0+0x6d8/0x910 [ib_core]
>     [] ib_cache_setup_one+0x24a/0x350 [ib_core]
>     [] ib_register_device+0x9e/0x3a0 [ib_core]
>     [] 0xffffffffc24ac389
>     [] nldev_newlink+0x2b8/0x520 [ib_core]
>     [] rdma_nl_rcv_msg+0x2c3/0x520 [ib_core]
>     [] rdma_nl_rcv_skb.constprop.0.isra.0+0x23c/0x3a0 [ib_core]
>     [] netlink_unicast+0x445/0x710
>     [] netlink_sendmsg+0x761/0xc40
>     [] __sys_sendto+0x3a9/0x420
>     [] __x64_sys_sendto+0xdc/0x1b0
>     [] do_syscall_64+0x93/0x180
>     [] entry_SYSCALL_64_after_hwframe+0x71/0x79
>
> (gdb) l *(alloc_gid_entry+0x47)
> 0x2eff7 is in alloc_gid_entry (./include/linux/slab.h:628).
> 623
> 624             if (size > KMALLOC_MAX_CACHE_SIZE)
> 625                     return kmalloc_large(size, flags);
> 626
> 627             index = kmalloc_index(size);
> 628             return kmalloc_trace(
> 629                             kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
> 630                             flags, size);
> 631     }
> 632     return __kmalloc(size, flags);
>
> (gdb) l *(add_modify_gid+0x166)
> 0x30206 is in add_modify_gid (drivers/infiniband/core/cache.c:447).
> 442      * empty table entries instead of storing them.
> 443      */
> 444     if (rdma_is_zero_gid(&attr->gid))
> 445             return 0;
> 446
> 447     entry = alloc_gid_entry(attr);
> 448     if (!entry)
> 449             return -ENOMEM;
> 450
> 451     if (rdma_protocol_roce(attr->device, attr->port_num)) {
>
>
> >
> > use_siw=1 nvme_trtype=rdma ./check nvme/
> >
> > > # dmesg | grep kmemleak
> > > [ 67.130652] kmemleak: Kernel memory leak detector initialized (mem
> > > pool available: 36041)
> > > [ 67.130728] kmemleak: Automatic memory scanning thread started
> > > [ 1051.771867] kmemleak: 2 new suspected memory leaks (see
> > > /sys/kernel/debug/kmemleak)
> > > [ 1832.796189] kmemleak: 8 new suspected memory leaks (see
> > > /sys/kernel/debug/kmemleak)
> > > [ 2578.189075] kmemleak: 17 new suspected memory leaks (see
> > > /sys/kernel/debug/kmemleak)
> > > [ 3330.710984] kmemleak: 4 new suspected memory leaks (see
> > > /sys/kernel/debug/kmemleak)
> > >
> > > unreferenced object 0xffff88855da53400 (size 192):
> > >   comm "rdma", pid 10630, jiffies 4296575922
> > >   hex dump (first 32 bytes):
> > >     37 00 00 00 00 00 00 00 c0 ff ff ff 1f 00 00 00  7...............
> > >     10 34 a5 5d 85 88 ff ff 10 34 a5 5d 85 88 ff ff  .4.].....4.]....
> > >   backtrace (crc 47f66721):
> > >     [] kmalloc_trace+0x30d/0x3b0
> > >     [] alloc_gid_entry+0x47/0x380 [ib_core]
> > >     [] add_modify_gid+0x166/0x930 [ib_core]
> >
> > I guess add_modify_gid is called from config_non_roce_gid_cache, not sure
> > why we don't check the return value of it here.
> >
> > Looks like put_gid_entry is called in case add_modify_gid returns
> > failure. It would trigger schedule_free_gid ->
> > queue_work(ib_wq, &entry->del_work), then free_gid_work ->
> > free_gid_entry_locked would free the storage asynchronously via
> > put_gid_ndev, and also the entry.
> >
> > >     [] ib_cache_update.part.0+0x6d8/0x910 [ib_core]
> > >     [] ib_cache_setup_one+0x24a/0x350 [ib_core]
> > >     [] ib_register_device+0x9e/0x3a0 [ib_core]
> > >     [] 0xffffffffc2a3d389
> > >     [] nldev_newlink+0x2b8/0x520 [ib_core]
> > >     [] rdma_nl_rcv_msg+0x2c3/0x520 [ib_core]
> > >     [] rdma_nl_rcv_skb.constprop.0.isra.0+0x23c/0x3a0 [ib_core]
> > >     [] netlink_unicast+0x445/0x710
> > >     [] netlink_sendmsg+0x761/0xc40
> > >     [] __sys_sendto+0x3a9/0x420
> > >     [] __x64_sys_sendto+0xdc/0x1b0
> > >     [] do_syscall_64+0x93/0x180
> > >     [] entry_SYSCALL_64_after_hwframe+0x71/0x79
> >
> > After ib_cache_setup_one failed, maybe ib_cache_cleanup_one is needed,
> > which flushes ib_wq to ensure the storage is freed. Could you try with
> > the change?
> Will try it later.
>

The kmemleak can still be reproduced with this change:

unreferenced object 0xffff8881f89fde00 (size 192):
  comm "rdma", pid 8708, jiffies 4295703453
  hex dump (first 32 bytes):
    02 00 00 00 00 00 00 00 c0 ff ff ff 1f 00 00 00  ................
    10 de 9f f8 81 88 ff ff 10 de 9f f8 81 88 ff ff  ................
  backtrace (crc 888c494b):
    [] kmalloc_trace+0x30d/0x3b0
    [] alloc_gid_entry+0x47/0x380 [ib_core]
    [] add_modify_gid+0x166/0x930 [ib_core]
    [] ib_cache_update.part.0+0x6d8/0x910 [ib_core]
    [] ib_cache_setup_one+0x24a/0x350 [ib_core]
    [] ib_register_device+0x9e/0x3a0 [ib_core]
    [] siw_qp_state_to_ib_qp_state+0x28a9/0xfffffffffffd1520 [siw]
    [] nldev_newlink+0x2b8/0x520 [ib_core]
    [] rdma_nl_rcv_msg+0x2c3/0x520 [ib_core]
    [] rdma_nl_rcv_skb.constprop.0.isra.0+0x23c/0x3a0 [ib_core]
    [] netlink_unicast+0x445/0x710
    [] netlink_sendmsg+0x761/0xc40
    [] __sys_sendto+0x3a9/0x420
    [] __x64_sys_sendto+0xdc/0x1b0
    [] do_syscall_64+0x93/0x180
    [] entry_SYSCALL_64_after_hwframe+0x71/0x79

> >
> > --- a/drivers/infiniband/core/device.c
> > +++ b/drivers/infiniband/core/device.c
> > @@ -1388,7 +1388,7 @@ int ib_register_device(struct ib_device *device, const char *name,
> >         if (ret) {
> >                 dev_warn(&device->dev,
> >                          "Couldn't set up InfiniBand P_Key/GID cache\n");
> > -               return ret;
> > +               goto cache_cleanup;
> >         }
> >
> > Thanks,
> > Guoqing
> >
>
>
> --
> Best Regards,
>   Yi Zhang

--
Best Regards,
  Yi Zhang
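
For reference, a minimal userspace sketch of the deferred-free hypothesis raised earlier in this thread: dropping the last reference only queues the free, so the entry stays allocated until the work queue is flushed. All names below (struct workqueue, queue_work_sim, flush_workqueue_sim, cache_setup_fail_sim, register_device_sim, live_entries) are hypothetical stand-ins and not kernel API, and the model is single-threaded. It illustrates only the hypothesis. As reported above, the kmemleak was still reproduced with the proposed change, so this sketch does not model the actual root cause.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a work item and a work queue (not kernel types). */
struct work {
	void (*fn)(struct work *w);
	struct work *next;
};

struct workqueue {
	struct work *head;
	struct work *tail;
};

/* Hypothetical refcounted entry whose free is deferred to the work queue. */
struct gid_entry {
	int refcount;
	struct workqueue *wq;
	struct work del_work;
};

static int live_entries;	/* entries allocated but not yet freed */

/* Append a work item; nothing runs until the queue is flushed. */
static void queue_work_sim(struct workqueue *wq, struct work *w)
{
	w->next = NULL;
	if (wq->tail)
		wq->tail->next = w;
	else
		wq->head = w;
	wq->tail = w;
}

/* Run every pending work item in FIFO order. */
static void flush_workqueue_sim(struct workqueue *wq)
{
	while (wq->head) {
		struct work *w = wq->head;

		wq->head = w->next;
		if (!wq->head)
			wq->tail = NULL;
		w->fn(w);	/* may free the object that contains w */
	}
}

/* Deferred free: recover the entry from its embedded work item. */
static void free_entry_work(struct work *w)
{
	struct gid_entry *e = (struct gid_entry *)
		((char *)w - offsetof(struct gid_entry, del_work));

	free(e);
	live_entries--;
}

static struct gid_entry *alloc_entry(struct workqueue *wq)
{
	struct gid_entry *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->refcount = 1;
	e->wq = wq;
	e->del_work.fn = free_entry_work;
	live_entries++;
	return e;
}

/* Dropping the last reference only queues the free. */
static void put_entry(struct gid_entry *e)
{
	if (--e->refcount == 0)
		queue_work_sim(e->wq, &e->del_work);
}

/* Simulated setup that fails after allocating one entry. */
static int cache_setup_fail_sim(struct workqueue *wq)
{
	struct gid_entry *e = alloc_entry(wq);

	if (e)
		put_entry(e);	/* the free is queued, not performed */
	return -1;
}

/*
 * cleanup_on_error == 0 models a plain "return ret" on the error path;
 * cleanup_on_error != 0 models an error path that also flushes the queue.
 */
static int register_device_sim(struct workqueue *wq, int cleanup_on_error)
{
	int ret = cache_setup_fail_sim(wq);

	if (ret && cleanup_on_error)
		flush_workqueue_sim(wq);
	return ret;
}
```

In this model, calling register_device_sim(&wq, 0) leaves live_entries at 1 until something flushes the queue, while register_device_sim(&wq, 1) returns with live_entries back at 0.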