https://bugs.dpdk.org/show_bug.cgi?id=1419

Bug ID: 1419
Summary: [mlx5] Segfault when calling rte_eth_dev_start() twice
Product: DPDK
Version: unspecified
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: vojanec@cesnet.cz
Target Milestone: ---

Created attachment 279
  --> https://bugs.dpdk.org/attachment.cgi?id=279&action=edit
Example application for reproducing

When calling 'rte_eth_dev_start()' on a port whose mempool is not large
enough, the function fails with error code '-ENOMEM' and the message:

    mlx5_net: port 0 Rx queue allocation failed: Cannot allocate memory

This is expected behaviour. However, when the same call is retried right
after the failure, the function fails with error code '-EINVAL' and the
message:

    mlx5_net: port 0 failed to set defaults flows

This behaviour is suspicious: the second call would be expected to return the
same error as the first, since no additional memory was made available in the
meantime.

Even more suspicious, and clearly incorrect, behaviour is observed when flow
isolated mode is enabled. In that case, the first call to
'rte_eth_dev_start()' fails as expected, but the second call succeeds (return
value 0). This leads to undefined behaviour and a segfault when
'rte_eth_rx_burst()' is called later.

[Steps to reproduce]
See the attached patch, which introduces an example application. Apply the
patch and build the application using 'make'. Run the application as follows:

    # dpdk-hugepages --setup 2G
    # ./build/crash -- 1024

The only application argument is the packet mempool size. Setting it to 1024
ensures that the mempool is small enough to be allocated, yet too small for
the first 'rte_eth_dev_start()' to succeed.

The application initializes a single DPDK port (use the '--allow' argument to
select it), enables flow isolated mode and attempts to start the port twice.
After that, the application segfaults when calling 'rte_eth_rx_burst()'.
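The start-twice sequence and the return codes quoted above can be summarised
in a small check. This is only a sketch and is not taken from the attached
application: 'check_start_retry()', 'enum retry_verdict' and the two
'fake_start_*()' functions are hypothetical names, and the fake functions
merely replay the return codes from this report so that the snippet builds
without DPDK.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as rte_eth_dev_start(): takes a port id, returns 0 or -errno. */
typedef int (*port_start_fn)(uint16_t port_id);

/* Hypothetical classification of two back-to-back start attempts. */
enum retry_verdict {
	RETRY_OK,              /* first call succeeded, no retry was made */
	RETRY_CONSISTENT,      /* both calls failed with the same code */
	RETRY_CHANGED_ERROR,   /* both calls failed, but with different codes */
	RETRY_SUSPECT_SUCCESS  /* failed, then succeeded with nothing changed */
};

/* Call start() and, if it fails, call it once more without changing anything. */
static enum retry_verdict
check_start_retry(port_start_fn start, uint16_t port_id, int *first, int *second)
{
	assert(start != NULL && first != NULL && second != NULL);

	*first = start(port_id);
	if (*first == 0) {
		*second = 0;
		return RETRY_OK;
	}

	*second = start(port_id);
	if (*second == 0)
		return RETRY_SUSPECT_SUCCESS;
	return (*second == *first) ? RETRY_CONSISTENT : RETRY_CHANGED_ERROR;
}

/* Stand-in replaying the codes reported without flow isolated mode. */
static int calls_default;
static int
fake_start_default(uint16_t port_id)
{
	(void)port_id;
	return calls_default++ == 0 ? -ENOMEM : -EINVAL;
}

/* Stand-in replaying the codes reported with flow isolated mode enabled. */
static int calls_isolated;
static int
fake_start_isolated(uint16_t port_id)
{
	(void)port_id;
	return calls_isolated++ == 0 ? -ENOMEM : 0;
}
```

In a real application, 'rte_eth_dev_start' has the same 'int (uint16_t)'
shape and could be passed in place of the stand-ins. With the codes from this
report, the default case would be classified as RETRY_CHANGED_ERROR and the
flow isolated case as RETRY_SUSPECT_SUCCESS, which is the case that precedes
the segfault.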
[Bug investigation]
The 'mlx5_dev_start()' function releases the memory it was using when its
first call fails. However, it seems to release more than it actually
allocated, effectively unconfiguring the queues (or possibly the entire port,
I am unsure). In flow isolated mode, the second call to 'mlx5_dev_start()'
seems to skip some initialization and does not return an error.

[DPDK Version]
Tested on:
    e2e546ab5b ("version: 24.07-rc0")
    eeb0605f11 ("version: 23.11.0"), tag: v23.11

[OS Version]
Operating system: Red Hat Enterprise Linux release 8.9 (Ootpa)
Kernel: 4.18.0-477.10.1.el8_8.x86_64
Architecture: x86_64

[Network Devices]
0000:c4:00.0 'MT2892 Family [ConnectX-6 Dx] 101d' if=ens3f0np0 drv=mlx5_core unused=
0000:c4:00.1 'MT2892 Family [ConnectX-6 Dx] 101d' if=ens3f1np1 drv=mlx5_core unused=

-- 
You are receiving this mail because:
You are the assignee for the bug.