From: Shane Miller <gshanemiller6@gmail.com>
To: Jiri Pirko <jiri@resnulli.us>
Cc: netdev@vger.kernel.org
Subject: Re: SR-IOV + switchdev + vlan + Mellanox: Cannot ping
Date: Sun, 28 Apr 2024 16:24:14 -0400
Message-ID: <CAFtQo5BxQR56e5PNFQoRXNHOfssPZNdTDMEFpHFVS07FPpKCKg@mail.gmail.com>
In-Reply-To: <ZizS4MlZcIE0KoHq@nanopsycho>

J Pirko wrote,

"You have to configure forwarding between appropriate representors. Use
ovs (probably easiest) or tc."

Thank you for taking the time to reply, but I need additional guidance on
what to bridge and how to bridge it.

TC can be used to mirror packets, for example, and in fact I have set that up,
which is why I need the NIC in switchdev mode. However, that is orthogonal.
As I said in the original post, leaving the NIC in "legacy" mode causes no ping
issues. As far as I understand it, TC is not part of the solution space here.

My vague understanding is that putting a NIC into switchdev mode means packets
are handled in HW only, without passing through the kernel, and that this is
what breaks ARP, since the kernel is needed a bit. A bridge is supposed to fix
that. I tried:

brctl addbr sriovbr
brctl addif sriovbr <DEV>
ip link set dev sriovbr up
ip addr ... sriovbr ...

where <DEV> was the link name of the physical device, the virtual link, the
port representor, or a combination of them, all to no effect.
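
One reading of "forwarding between appropriate representors" is that the
uplink representor (ieth3, portname p0) and a VF representor (ieth3r0,
portname pf0vf0) have to be members of the same bridge at the same time,
rather than being added one at a time. A minimal sketch of that reading,
assuming iproute2 instead of brctl. Device names are taken from the link
listing quoted below; DRY_RUN is a hypothetical knob of this script, and by
default the script only prints the commands. Whether this resolves the ping
failure is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# Sketch only: put the uplink representor and one VF representor into the
# same Linux bridge so traffic can be forwarded between them.
# DRY_RUN is a hypothetical switch of this script, not an iproute2 feature.

BRIDGE=${BRIDGE:-sriovbr}
UPLINK=${UPLINK:-ieth3}      # PF / uplink representor (portname p0)
VF_REP=${VF_REP:-ieth3r0}    # VF 0 representor (portname pf0vf0)
DRY_RUN=${DRY_RUN:-1}

run() {
    # Always print the command; execute it only when DRY_RUN=0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

bridge_representors() {
    run ip link add name "$BRIDGE" type bridge
    for dev in "$UPLINK" "$VF_REP"; do
        run ip link set dev "$dev" master "$BRIDGE"
        run ip link set dev "$dev" up
    done
    run ip link set dev "$BRIDGE" up
}

bridge_representors
```

In this sketch the IP address stays on the VF netdev (ieth3v0), not on the
bridge. It does not address VLAN 133 tagging; whether the legacy
"vf 0 vlan 133" setting still takes effect in switchdev mode would need to be
checked separately.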

So, restating the issue: a NIC is SR-IOV virtualized into 4 virtual NICs, each
with a VLAN and an IP address. The NIC is placed into switchdev mode. The
virtual NICs are not pingable from other boxes. The other boxes see the virtual
NICs' MAC addresses as incomplete (arp -n or arp -e).
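
To make the "incomplete" symptom easy to re-check after each configuration
change, the neighbor table on the pinging box can be filtered for entries
that never resolved. A small sketch, assuming the iproute2 "ip neigh show"
output format, in which the state is the last field on each line. The
function name and the sample addresses are hypothetical placeholders, not
values from this setup.

```shell
#!/bin/sh
# Sketch: print the IP of every neighbor entry that never resolved.
# Reads "ip neigh show" style lines on stdin; the state is the last field.
unresolved_neighbors() {
    awk '$NF == "INCOMPLETE" || $NF == "FAILED" { print $1 }'
}

# Hypothetical sample input (placeholder addresses, not from this thread).
sample='192.0.2.194 dev tst INCOMPLETE
192.0.2.1 dev tst lladdr 00:11:22:33:44:55 REACHABLE
192.0.2.195 dev tst FAILED'

printf '%s\n' "$sample" | unresolved_neighbors
```

On a live system the same filter would be fed from the real table, for
example "ip neigh show dev tst | unresolved_neighbors" on machine A.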

What do I bridge or link, and how, to fix this problem?

On Sat, Apr 27, 2024 at 6:26 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Fri, Apr 26, 2024 at 10:35:28PM CEST, gshanemiller6@gmail.com wrote:
> >Problem:
> >-----------------------------------------------------------------
> >root@machA $ ping 10.xx.xx.194
> >PING 10.xx.xx.194 (10.xx.xx.194) 56(84) bytes of data
> >From 10.xx.xx.191 icmp seq=10 Destination Host Unreachable
> >Proximate Cause:
> >-----------------------------------------------------------------
> >This seems to be a side effect of "switchdev" mode. When the identical
> >configuration is set up EXCEPT that the SR-IOV virtualized NIC is left
> >"legacy", ping (and ncat) works just fine.
> >
> >As far as I can tell I need a bridge or bridge commands, but I have no
> >idea where to start. This environment will not allow me to add modify
> >commands when enabling switchdev mode. devlink seems to accept
> >"switchdev" alone without modifiers.
>
> You have to configure forwarding between appropriate representors. Use
> ovs (probably easiest) or tc.
>
> >
> >Note: putting a NIC into switchdev mode makes the virtual functions
> >show as "link-state disable" which is confusing. (See below.) Contrary
> >to what it seems to suggest, the virtual NICs are up and running
> >
> >Running "arp -e" on machine A shows machine B's ieth3v0 MAC address as
> >incomplete suggesting switchdev+ARP is broken.
> >
> >Problem Environment:
> >-----------------------------------------------------------------
> >OS: RHEL 8.6 4.18.0-372.46.1.el8 x64
> >NICs: Mellanox ConnectX-6
> >
> >Machine A Links:
> >70 tst@ieth3: <...LOWER_UP...> mtu 1500
> >   link/ether xx.xx.xx.xx.xx.xx
> >   vlan protocol 802.1Q id 133 <REORDER_HDR>
> >   Inet 10.xx.xx.191
> >
> >Machine B Links With ieth3 in SR-IOV mode in switchdev mode:
> ># Physical Function and its virtual functions:
> >2: ieth3: <...PROMISC,UP,LOWER_UP> mtu 1500
> >    link/ether xx.xx.xx.xx.xx.f6 portname p0 switchid xxxxe988
> >    vf 0 link/ether xx.xx.xx.xx.xx.00 vlan 133 spoof off, link-state disable, trust off
> >    . . .
> ># Port representers
> >893: ieth3r0: <...UP,LOWER_UP> mtu 1500
> >link/ether xx.xx.xx.xx.xx.e1 portname pf0vf0 switchid xxxxe988
> >. . .
> ># Virtual Links
> >897: ieth3v0: <...UP,LOWER_UP> mtu 1500
> >  link/ether xx.xx.xx.xx.xx.00 promiscuity 0
> >  inet 10.xx.xx.194/24 scope global ieth3v0
> >  . . .
> >
> >SR-IOV Setup Summary
> >-----------------------------------------------------------------
> >This is done right since, in legacy mode, ping/ncat works fine:
> >
> >1. Enable IOMMU, Vtx in BIOS
> >2. Boot Linux with iommu=on on command line
> >3. Install Mellanox OFED
> >4. Enable SR-IOV for max 8 devices in Mellanox firmware
> >(reboot)
> >5. Create 4 virtual NICs w/ SR-IOV
> >6. Configure 4 virtual NICs mac, trust off, spoofchk off, state auto
> >7. Unbind virtual NICs
> >8. Put ieth3 into switchdev mode
> >9. Rebind virtual NICs
> >10. Bring all links up
> >11. Assign IPV4 addresses to virtual links
> >
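
For anyone trying to reproduce this, steps 5-11 of the quoted setup summary
can be sketched for a single VF as below. This is an approximation under
stated assumptions, not the exact commands that were run: the PCI addresses,
MAC address and IP address are hypothetical placeholders, and the netdev
names are the ones shown in the link listing. The script only prints a plan;
nothing is executed unless its output is piped to a shell.

```shell
#!/bin/sh
# Sketch of steps 5-11 for one VF. It only prints commands; review the
# output, then pipe it to "sh -x" as root if it looks right.
# PF_PCI, VF_PCI, VF_MAC and VF_ADDR are hypothetical placeholders.
PF=${PF:-ieth3}
PF_PCI=${PF_PCI:-0000:03:00.0}
VF_PCI=${VF_PCI:-0000:03:00.2}
VF_NETDEV=${VF_NETDEV:-ieth3v0}
VF_REP=${VF_REP:-ieth3r0}
VF_MAC=${VF_MAC:-02:00:00:00:00:01}
VF_ADDR=${VF_ADDR:-192.0.2.194/24}
DRIVER=/sys/bus/pci/drivers/mlx5_core

switchdev_plan() {
    # One command per line, in the order of steps 5 to 11.
    cat <<EOF
echo 4 > /sys/class/net/$PF/device/sriov_numvfs
ip link set dev $PF vf 0 mac $VF_MAC spoofchk off trust off state auto
echo $VF_PCI > $DRIVER/unbind
devlink dev eswitch set pci/$PF_PCI mode switchdev
echo $VF_PCI > $DRIVER/bind
ip link set dev $PF up
ip link set dev $VF_REP up
ip link set dev $VF_NETDEV up
ip addr add $VF_ADDR dev $VF_NETDEV
EOF
}

switchdev_plan
```

Printing the plan first keeps the unbind, mode change and rebind ordering
visible before anything touches the NIC.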
