All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [virtio-dev] Re: [PATCH] virtio_net: support split header
  2022-02-16  3:01 [virtio-dev] " Xuan Zhuo
@ 2022-02-16  4:32 ` Jason Wang
  2022-02-16  6:05   ` Xuan Zhuo
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Wang @ 2022-02-16  4:32 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Virtio-Dev, Michael S. Tsirkin

On Wed, Feb 16, 2022 at 11:01 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> The purpose of this feature is to write the payload of the packet to a
> specified location in the receive buffer after the device receives the
> packet.
>
> |                    receive buffer                             |
> |                       offset                     |            |
> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload  |

This looks more like header padding instead of header split?

I had something similar thought in the past, I wonder why not simply
add the padding/hold via virtnet hdr?

| virtnet hdr| <- hold -> | mac | ip | tcp | payload |

>
> Based on this feature, we can obtain two benefits.
>
> 1. We can use a buffer of size "offset" plus a separate page when
>    allocating the receive buffer. In this way, we can ensure that all
>    payloads can be independently in a page,

Can we achieve this without this proposal today?

It looks to me it's possible if we allocated a dedicated sg for vnet
header + all the other headers.

> which is very beneficial for
>    the zerocopy implemented by the upper layer.

Right, it allows us to use TCP RX zerocopy.

So TCP zero copy requires header split which is kind of different from
what is being proposed here. AFAIK, it requires the hardware to place
the l4 payload in a descriptor. Is this something what you really
want:

VIRTIO_NET_F_HEADER_SPLIT: The device MUST place the l4 payload at the
start of the next descriptor (or drop it if there's no descriptor in
the chain).

Then we don't even need the stuff like, cvq/commands/protocol types.


>
> 2. We can include SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) when
>    setting the offset, so that in the linux implementation, we can
>    construct skb very efficiently based on build_skb(). No need to copy
>    the network header.

For spec patch, I think we need to avoid talking about any specific
driver implementation but describe the logic behind that.

>
>    For MRG_RX, we require that all payloads are placed in the position
>    specified by offset in each receive buffer. So we can also use
>    build_skb() without wasting memory. Because we can reuse the
>    "offset" parts that are not used.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  conformance.tex |  2 ++
>  content.tex     | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 61 insertions(+)
>
> diff --git a/conformance.tex b/conformance.tex
> index 42f8537..e5d2ca8 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -142,6 +142,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>  \end{itemize}
>
>  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> @@ -401,6 +402,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Gratuitous Packet Sending}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>  \end{itemize}
>
>  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> diff --git a/content.tex b/content.tex
> index c6f116c..02cde55 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3092,6 +3092,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>      channel.
>
> +\item[VIRTIO_NET_F_SPLIT_HEADER (55)] Device can separate the header and the
> +    payload. The payload will be placed at the specified offset.
> +
>  \item[VIRTIO_NET_F_HOST_USO (56)] Device can receive USO packets. Unlike UFO
>   (fragmenting the packet) the USO splits large UDP packet
>   to several segments when each of these smaller packets has UDP header.
> @@ -3139,6 +3142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>  \item[VIRTIO_NET_F_CTRL_MAC_ADDR] Requires VIRTIO_NET_F_CTRL_VQ.
>  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>  \end{description}
>
>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -3370,6 +3374,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>  #define VIRTIO_NET_HDR_F_DATA_VALID    2
>  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> +#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
>          u8 flags;
>  #define VIRTIO_NET_HDR_GSO_NONE        0
>  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> @@ -4471,6 +4476,60 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>  according to the native endian of the guest rather than
>  (necessarily when not using the legacy interface) little-endian.
>
> +\paragraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> +
> +If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated, the device can separate
> +the header and the payload. The payload will be placed at the specified offset.
> +
> +\subparagraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}
> +
> +To configure the split header, the following layout structure and definitions
> +are used:
> +
> +\begin{lstlisting}
> +struct virtio_net_split_header_config {
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCPv4     1
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCPv6     2
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDPv4     4
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDPv6     8
> +    le64 type
> +    le64 offset;
> +};

Can the offset be calculated automatically by the device?

> +
> +#define VIRTIO_NET_CTRL_SPLIT_HEADER       6
> + #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET   0
> +\end{lstlisting}
> +
> +The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command:
> +VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header configuration.
> +
> +\field{type} passed as command data is a bitmask, bits set define
> +packet types to split header, bits cleared - split header to be disabled.
> +
> +\field{offset}(from the beginning of the receive buffer) specifies where the
> +payload is placed.
> +
> +\devicenormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> +
> +Split header MUST be disabled after device initialization.
> +
> +If the packet header plus virtnet hdr exceeds \field{offset}, the device does
> +not need to split the header for this packet.

So the driver needs to deal with this "corner case".

> +
> +If the size of the payload is greater than the size of the receive buffer minus
> +\field{offset}, the device does not need to split the header for this packet.
> +
> +If the packet is successfully split header, then the type of virtnet hdr MUST
> +contains VIRTIO_NET_HDR_F_SPLIT_HEADER.
> +
> +If VIRTIO_NET_F_MRG_RXBUF is negotiated and the device is to use multiple
> +receive buffers, each subsequent receive buffer MUST skip the beginning of
> +offset.

This prevents the driver from coalsing pages.

Thanks

> +
> +\drivernormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> +
> +If VIRTIO_NET_F_SPLIT_HEADER negotiation is successful, the driver MUST be able
> +to properly handle packets containing VIRTIO_NET_HDR_F_SPLIT_HEADER.
>
>  \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
>  Types / Network Device / Legacy Interface: Framing Requirements}
> --
> 2.31.0
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [virtio-dev] Re: [PATCH] virtio_net: support split header
  2022-02-16  4:32 ` [virtio-dev] " Jason Wang
@ 2022-02-16  6:05   ` Xuan Zhuo
  2022-02-17  7:01     ` Jason Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Xuan Zhuo @ 2022-02-16  6:05 UTC (permalink / raw)
  To: Jason Wang; +Cc: Virtio-Dev, Michael S. Tsirkin

On Wed, 16 Feb 2022 12:32:03 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Feb 16, 2022 at 11:01 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > The purpose of this feature is to write the payload of the packet to a
> > specified location in the receive buffer after the device receives the
> > packet.
> >
> > |                    receive buffer                             |
> > |                       offset                     |            |
> > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload  |
>
> This looks more like header padding instead of header split?
>
> I had something similar thought in the past, I wonder why not simply
> add the padding/hold via virtnet hdr?
>
> | virtnet hdr| <- hold -> | mac | ip | tcp | payload |
>
> >
> > Based on this feature, we can obtain two benefits.
> >
> > 1. We can use a buffer of size "offset" plus a separate page when
> >    allocating the receive buffer. In this way, we can ensure that all
> >    payloads can be independently in a page,
>
> Can we achieve this without this proposal today?
>
> It looks to me it's possible if we allocated a dedicated sg for vnet
> header + all the other headers.
>
> > which is very beneficial for
> >    the zerocopy implemented by the upper layer.
>
> Right, it allows us to use TCP RX zerocopy.
>
> So TCP zero copy requires header split which is kind of different from
> what is being proposed here. AFAIK, it requires the hardware to place
> the l4 payload in a descriptor. Is this something what you really
> want:
>
> VIRTIO_NET_F_HEADER_SPLIT: The device MUST place the l4 payload at the
> start of the next descriptor (or drop it if there's no descriptor in
> the chain).

Yes, in fact I am referring to such a function.

In order to make it more general, I introduced a mechanism such as offset.
Because in some cases, we can obtain continuous memory while meeting the
requirements, using offset can save a desc.

|     page    |    page    |
|      |offset|            |
       ^
    pointer

At the same time I want to specify which types of packages to use this function.
And I hope this may be dynamically switchable. So introduce cvq/command.


>
> Then we don't even need the stuff like, cvq/commands/protocol types.
>
>
> >
> > 2. We can include SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) when
> >    setting the offset, so that in the linux implementation, we can
> >    construct skb very efficiently based on build_skb(). No need to copy
> >    the network header.
>
> For spec patch, I think we need to avoid talking about any specific
> driver implementation but describe the logic behind that.

Yes.

>
> >
> >    For MRG_RX, we require that all payloads are placed in the position
> >    specified by offset in each receive buffer. So we can also use
> >    build_skb() without wasting memory. Because we can reuse the
> >    "offset" parts that are not used.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  conformance.tex |  2 ++
> >  content.tex     | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 61 insertions(+)
> >
> > diff --git a/conformance.tex b/conformance.tex
> > index 42f8537..e5d2ca8 100644
> > --- a/conformance.tex
> > +++ b/conformance.tex
> > @@ -142,6 +142,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> >  \end{itemize}
> >
> >  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> > @@ -401,6 +402,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Gratuitous Packet Sending}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> >  \end{itemize}
> >
> >  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> > diff --git a/content.tex b/content.tex
> > index c6f116c..02cde55 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -3092,6 +3092,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> >      channel.
> >
> > +\item[VIRTIO_NET_F_SPLIT_HEADER (55)] Device can separate the header and the
> > +    payload. The payload will be placed at the specified offset.
> > +
> >  \item[VIRTIO_NET_F_HOST_USO (56)] Device can receive USO packets. Unlike UFO
> >   (fragmenting the packet) the USO splits large UDP packet
> >   to several segments when each of these smaller packets has UDP header.
> > @@ -3139,6 +3142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
> >  \end{description}
> >
> >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > @@ -3370,6 +3374,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> >  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> >  #define VIRTIO_NET_HDR_F_DATA_VALID    2
> >  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> > +#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
> >          u8 flags;
> >  #define VIRTIO_NET_HDR_GSO_NONE        0
> >  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> > @@ -4471,6 +4476,60 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >  according to the native endian of the guest rather than
> >  (necessarily when not using the legacy interface) little-endian.
> >
> > +\paragraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > +
> > +If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated, the device can separate
> > +the header and the payload. The payload will be placed at the specified offset.
> > +
> > +\subparagraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}
> > +
> > +To configure the split header, the following layout structure and definitions
> > +are used:
> > +
> > +\begin{lstlisting}
> > +struct virtio_net_split_header_config {
> > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCPv4     1
> > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCPv6     2
> > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDPv4     4
> > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDPv6     8
> > +    le64 type
> > +    le64 offset;
> > +};
>
> Can the offset be calculated automatically by the device?
>

No

> > +
> > +#define VIRTIO_NET_CTRL_SPLIT_HEADER       6
> > + #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET   0
> > +\end{lstlisting}
> > +
> > +The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command:
> > +VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header configuration.
> > +
> > +\field{type} passed as command data is a bitmask, bits set define
> > +packet types to split header, bits cleared - split header to be disabled.
> > +
> > +\field{offset}(from the beginning of the receive buffer) specifies where the
> > +payload is placed.
> > +
> > +\devicenormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > +
> > +Split header MUST be disabled after device initialization.
> > +
> > +If the packet header plus virtnet hdr exceeds \field{offset}, the device does
> > +not need to split the header for this packet.
>
> So the driver needs to deal with this "corner case".

Yes, the driver will always receive some packets that are not header split. Use
virtnet hdr type to distinguish.


>
> > +
> > +If the size of the payload is greater than the size of the receive buffer minus
> > +\field{offset}, the device does not need to split the header for this packet.
> > +
> > +If the packet is successfully split header, then the type of virtnet hdr MUST
> > +contains VIRTIO_NET_HDR_F_SPLIT_HEADER.
> > +
> > +If VIRTIO_NET_F_MRG_RXBUF is negotiated and the device is to use multiple
> > +receive buffers, each subsequent receive buffer MUST skip the beginning of
> > +offset.
>
> This prevents the driver from coalsing pages.

In fact, when we allocate, the header part and the payload part can be allocated
separately, so that the payload part of multiple receive buffers can come from
consecutive pages, so that merged pages can be achieved.


     | header0 | header1 | header2 |
     | page0 | page1 | page2 |

receive buf0 => header0 + page0
receive buf1 => header1 + page1
receive buf2 => header2 + page2

Thanks.


>
> Thanks
>
> > +
> > +\drivernormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > +
> > +If VIRTIO_NET_F_SPLIT_HEADER negotiation is successful, the driver MUST be able
> > +to properly handle packets containing VIRTIO_NET_HDR_F_SPLIT_HEADER.
> >
> >  \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
> >  Types / Network Device / Legacy Interface: Framing Requirements}
> > --
> > 2.31.0
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [virtio-dev] Re: [PATCH] virtio_net: support split header
  2022-02-16  6:05   ` Xuan Zhuo
@ 2022-02-17  7:01     ` Jason Wang
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Wang @ 2022-02-17  7:01 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: Virtio-Dev, Michael S. Tsirkin

On Wed, Feb 16, 2022 at 2:31 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 16 Feb 2022 12:32:03 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Feb 16, 2022 at 11:01 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > The purpose of this feature is to write the payload of the packet to a
> > > specified location in the receive buffer after the device receives the
> > > packet.
> > >
> > > |                    receive buffer                             |
> > > |                       offset                     |            |
> > > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload  |
> >
> > This looks more like header padding instead of header split?
> >
> > I had something similar thought in the past, I wonder why not simply
> > add the padding/hold via virtnet hdr?
> >
> > | virtnet hdr| <- hold -> | mac | ip | tcp | payload |
> >
> > >
> > > Based on this feature, we can obtain two benefits.
> > >
> > > 1. We can use a buffer of size "offset" plus a separate page when
> > >    allocating the receive buffer. In this way, we can ensure that all
> > >    payloads can be independently in a page,
> >
> > Can we achieve this without this proposal today?
> >
> > It looks to me it's possible if we allocated a dedicated sg for vnet
> > header + all the other headers.
> >
> > > which is very beneficial for
> > >    the zerocopy implemented by the upper layer.
> >
> > Right, it allows us to use TCP RX zerocopy.
> >
> > So TCP zero copy requires header split which is kind of different from
> > what is being proposed here. AFAIK, it requires the hardware to place
> > the l4 payload in a descriptor. Is this something what you really
> > want:
> >
> > VIRTIO_NET_F_HEADER_SPLIT: The device MUST place the l4 payload at the
> > start of the next descriptor (or drop it if there's no descriptor in
> > the chain).
>
> Yes, in fact I am referring to such a function.
>
> In order to make it more general, I introduced a mechanism such as offset.
> Because in some cases, we can obtain continuous memory while meeting the
> requirements, using offset can save a desc.
>
> |     page    |    page    |
> |      |offset|            |
>        ^
>     pointer
>

This looks rare, e.g it requires the header start near the end of a
page which looks odd (at least not the behaviour of the current
Linux). Having dedicated buffer for header and data seems much more
simpler.

Btw, using dedicated buffer descriptors seems more general and it is
the way that is used in real hardware vendors (e.g xl710).

> At the same time I want to specify which types of packages to use this function.
> And I hope this may be dynamically switchable. So introduce cvq/command.

That's fine just as control_offload but it needs to be designed that
not having too much of configurations (e.g we may want to support
tunneling anyhow).

>
>
> >
> > Then we don't even need the stuff like, cvq/commands/protocol types.
> >
> >
> > >
> > > 2. We can include SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) when
> > >    setting the offset, so that in the linux implementation, we can
> > >    construct skb very efficiently based on build_skb(). No need to copy
> > >    the network header.
> >
> > For spec patch, I think we need to avoid talking about any specific
> > driver implementation but describe the logic behind that.
>
> Yes.
>
> >
> > >
> > >    For MRG_RX, we require that all payloads are placed in the position
> > >    specified by offset in each receive buffer. So we can also use
> > >    build_skb() without wasting memory. Because we can reuse the
> > >    "offset" parts that are not used.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  conformance.tex |  2 ++
> > >  content.tex     | 59 +++++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 61 insertions(+)
> > >
> > > diff --git a/conformance.tex b/conformance.tex
> > > index 42f8537..e5d2ca8 100644
> > > --- a/conformance.tex
> > > +++ b/conformance.tex
> > > @@ -142,6 +142,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> > >  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> > > +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > >  \end{itemize}
> > >
> > >  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> > > @@ -401,6 +402,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Gratuitous Packet Sending}
> > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> > >  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> > > +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > >  \end{itemize}
> > >
> > >  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> > > diff --git a/content.tex b/content.tex
> > > index c6f116c..02cde55 100644
> > > --- a/content.tex
> > > +++ b/content.tex
> > > @@ -3092,6 +3092,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> > >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> > >      channel.
> > >
> > > +\item[VIRTIO_NET_F_SPLIT_HEADER (55)] Device can separate the header and the
> > > +    payload. The payload will be placed at the specified offset.
> > > +
> > >  \item[VIRTIO_NET_F_HOST_USO (56)] Device can receive USO packets. Unlike UFO
> > >   (fragmenting the packet) the USO splits large UDP packet
> > >   to several segments when each of these smaller packets has UDP header.
> > > @@ -3139,6 +3142,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> > >  \item[VIRTIO_NET_F_CTRL_MAC_ADDR] Requires VIRTIO_NET_F_CTRL_VQ.
> > >  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> > >  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> > > +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
> > >  \end{description}
> > >
> > >  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> > > @@ -3370,6 +3374,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> > >  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> > >  #define VIRTIO_NET_HDR_F_DATA_VALID    2
> > >  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> > > +#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
> > >          u8 flags;
> > >  #define VIRTIO_NET_HDR_GSO_NONE        0
> > >  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> > > @@ -4471,6 +4476,60 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> > >  according to the native endian of the guest rather than
> > >  (necessarily when not using the legacy interface) little-endian.
> > >
> > > +\paragraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > > +
> > > +If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated, the device can separate
> > > +the header and the payload. The payload will be placed at the specified offset.
> > > +
> > > +\subparagraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}
> > > +
> > > +To configure the split header, the following layout structure and definitions
> > > +are used:
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_net_split_header_config {
> > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCPv4     1
> > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCPv6     2
> > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDPv4     4
> > > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDPv6     8
> > > +    le64 type
> > > +    le64 offset;
> > > +};
> >
> > Can the offset be calculated automatically by the device?
> >
>
> No

So the device knows the buffer address, and the header format is well
known. I don't get why it can't.

>
> > > +
> > > +#define VIRTIO_NET_CTRL_SPLIT_HEADER       6
> > > + #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET   0
> > > +\end{lstlisting}
> > > +
> > > +The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command:
> > > +VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header configuration.
> > > +
> > > +\field{type} passed as command data is a bitmask, bits set define
> > > +packet types to split header, bits cleared - split header to be disabled.
> > > +
> > > +\field{offset}(from the beginning of the receive buffer) specifies where the
> > > +payload is placed.
> > > +
> > > +\devicenormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > > +
> > > +Split header MUST be disabled after device initialization.
> > > +
> > > +If the packet header plus virtnet hdr exceeds \field{offset}, the device does
> > > +not need to split the header for this packet.
> >
> > So the driver needs to deal with this "corner case".
>
> Yes, the driver will always receive some packets that are not header split. Use
> virtnet hdr type to distinguish.
>
>
> >
> > > +
> > > +If the size of the payload is greater than the size of the receive buffer minus
> > > +\field{offset}, the device does not need to split the header for this packet.
> > > +
> > > +If the packet is successfully split header, then the type of virtnet hdr MUST
> > > +contains VIRTIO_NET_HDR_F_SPLIT_HEADER.
> > > +
> > > +If VIRTIO_NET_F_MRG_RXBUF is negotiated and the device is to use multiple
> > > +receive buffers, each subsequent receive buffer MUST skip the beginning of
> > > +offset.
> >
> > This prevents the driver from coalsing pages.
>
> In fact, when we allocate, the header part and the payload part can be allocated
> separately, so that the payload part of multiple receive buffers can come from
> consecutive pages, so that merged pages can be achieved.

So this looks like the dedicated descriptor instead of the offset
you're proposing here.

If we use dedicated descriptor for header and data, we can let the
device always use data descriptor and ignore the header in the case of
multiply buffers.

Thanks

>
>
>      | header0 | header1 | header2 |
>      | page0 | page1 | page2 |
>
> receive buf0 => header0 + page0
> receive buf1 => header1 + page1
> receive buf2 => header2 + page2
>
> Thanks.
>
>
> >
> > Thanks
> >
> > > +
> > > +\drivernormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> > > +
> > > +If VIRTIO_NET_F_SPLIT_HEADER negotiation is successful, the driver MUST be able
> > > +to properly handle packets containing VIRTIO_NET_HDR_F_SPLIT_HEADER.
> > >
> > >  \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
> > >  Types / Network Device / Legacy Interface: Framing Requirements}
> > > --
> > > 2.31.0
> > >
> >
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [virtio-dev] [PATCH] virtio_net: support split header
@ 2022-08-01  6:59 Heng Qi
  2022-08-01  8:28 ` [virtio-dev] " Heng Qi
  2022-08-04  6:27 ` Jason Wang
  0 siblings, 2 replies; 12+ messages in thread
From: Heng Qi @ 2022-08-01  6:59 UTC (permalink / raw)
  To: virtio-dev; +Cc: jasowang, mst, xuanzhuo, kangjie.xu

From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

The purpose of this feature is to split the header and the payload of
the packet.

|                    receive buffer                                    |
|                       0th descriptor             | 1th descriptor    |
| virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |

We can use a buffer plus a separate page when allocating the receive
buffer. In this way, we can ensure that all payloads can be
independently in a page, which is very beneficial for the zerocopy
implemented by the upper layer.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
---

v6:
    1. Fix some syntax issues. @Cornelia Huck
    2. Clarify some paragraphs. @Cornelia Huck
    3. Determine the device what to do if it does not perform header split on a packet.

v5:
    1. Determine when hdr_len is credible in the process of rx
    2. Clean up the use of buffers and descriptors
    3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge

v4:
    1. fix typo @Cornelia Huck @Jason Wang
    2. do not split header for IP fragmentation packet. @Jason Wang

v3:
    1. Fix some syntax issues
    2. Fix some terminology issues
    3. It is not unified with ip alignment, so ip alignment is not included
    4. Make it clear that the device must support four types, in the case of
    successful negotiation.

 conformance.tex |   2 +
 content.tex     | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/conformance.tex b/conformance.tex
index 2b86fc6..bd0f463 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
 \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
+\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
@@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
 \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
+\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
 \end{itemize}
 
 \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
diff --git a/content.tex b/content.tex
index e863709..74c36fe 100644
--- a/content.tex
+++ b/content.tex
@@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
 \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
     channel.
 
+\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol
+    header and the payload.
+
 \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
 
 \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
@@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
 \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
 \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
 \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
+\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
 \end{description}
 
 \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
@@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
 #define VIRTIO_NET_HDR_F_DATA_VALID    2
 #define VIRTIO_NET_HDR_F_RSC_INFO      4
+#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
         u8 flags;
 #define VIRTIO_NET_HDR_GSO_NONE        0
 #define VIRTIO_NET_HDR_GSO_TCPV4       1
@@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}.
 
 If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
-been negotiated, the device SHOULD set \field{hdr_len} to a value
+been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
+is not set, the device SHOULD set \field{hdr_len} to a value
 not less than the length of the headers, including the transport
-header.
+header \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}.
 
 If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
 device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in
@@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
 driver MUST NOT use the \field{csum_start} and \field{csum_offset}.
 
 If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
-been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
+been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
+is not set, the driver MAY use \field{hdr_len} only as a hint about the
 transport header size.
-The driver MUST NOT rely on \field{hdr_len} to be correct.
+
+If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the driver
+MUST NOT rely on \field{hdr_len} to be correct.
+
 \begin{note}
 This is due to various bugs in implementations.
 \end{note}
@@ -4483,6 +4493,99 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
 according to the native endian of the guest rather than
 (necessarily when not using the legacy interface) little-endian.
 
+\paragraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
+
+If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated,
+the device supports splitting the protocol header and the payload.
+The protocol header and the payload will be separated into different
+descriptors.
+
+\subparagraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}
+
+To configure the split header, the following layout structure and definitions
+are used:
+
+\begin{lstlisting}
+struct virtio_net_split_header_config {
+#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4     (1 << 0)
+#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6     (1 << 1)
+#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4     (1 << 2)
+#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6     (1 << 3)
+    le64 type;
+};
+
+#define VIRTIO_NET_CTRL_SPLIT_HEADER       6
+ #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET   0
+\end{lstlisting}
+
+The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command:
+VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header configuration.
+
+The driver can enable or disable split header for different protocols by
+setting or clearing corresponding bits in \field{type}.
+
+\begin{itemize}
+    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4: split after ipv4 tcp header
+    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6: split after ipv6 tcp header
+    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4: split after ipv4 udp header
+    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6: split after ipv6 udp header
+\end{itemize}
+
+\devicenormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
+
+A device MUST initialize \field{type} to 0, and MUST set it to 0
+upon device reset.
+
+If VIRTIO_NET_F_SPLIT_HEADER is negotiated, the device MUST support
+VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6,
+VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6.
+
+A device MUST NOT split the header if it encounters any of the following cases:
+\begin{itemize}
+    \item the device does not recognize the protocol of the packet.
+    \item the packet is an IP fragmentation.
+    \item \field{type} does not include the protocol of the packet.
+    \item the buffer consists of only one descriptor.
+    \item the total size of the virtio net header and the protocol header exceeds
+        the size of the first descriptor.
+    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
+        payload exceeds the size of the descriptor chain starting from the 2nd
+        descriptor.
+\end{itemize}
+
+If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_HEADER bit
+in \field{flags} MUST be set. The protocol header MUST be on the first
+descriptor, following the virtio net header. The payload MUST start from the
+second descriptor. The device MUST set \field{hdr_len} of structure
+virtio_net_hdr to the length of the protocol header.
+
+If all of the following applies:
+\begin{itemize}
+    \item the header is split by the device.
+    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
+    \item the received packet is spread over multiple buffers.
+\end {itemize}
+then if the device uses the buffers after the 1st buffer, and the buffer
+consists of multiple descriptors, the device MUST skip the first descriptor,
+because the first descriptor is used to carry the protocol header.
+The used length still reports the number of bytes it has written to memory.
+
+If VIRTIO_NET_F_SPLIT_HEADER is negotiated, but the device does not split the header
+and the buffer consists of at least two descriptors, the device MUST put the virtio net header
+in the first descriptor, and MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}.
+The device MUST use the second descriptor to store the packet, since the first descriptor
+is designed to accommodate the header and is tiny for the packet.
+
+\drivernormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
+
+If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set, the driver
+MUST treat the contents of \field{hdr_len} as the length of the protocol header
+inside the first descriptor.
+
+If the split header is enabled, the buffers submitted to receiveq by the
+driver SHOULD be composed of at least two descriptors.
+When the buffer consists of two descriptors, the length of the first
+descriptor MUST be greater than the one of the virtio net header.
 
 \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
 
-- 
1.8.3.1


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [virtio-dev] Re: [PATCH] virtio_net: support split header
  2022-08-01  6:59 [virtio-dev] [PATCH] virtio_net: support split header Heng Qi
@ 2022-08-01  8:28 ` Heng Qi
  2022-08-04  6:27 ` Jason Wang
  1 sibling, 0 replies; 12+ messages in thread
From: Heng Qi @ 2022-08-01  8:28 UTC (permalink / raw)
  To: virtio-dev; +Cc: jasowang, mst, xuanzhuo, kangjie.xu

The content of the mail just sent is about "[PATCH v6] virtio_net: 
support split header".

在 2022/8/1 下午2:59, Heng Qi 写道:
> From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> The purpose of this feature is to split the header and the payload of
> the packet.
>
> |                    receive buffer                                    |
> |                       0th descriptor             | 1th descriptor    |
> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>
> We can use a buffer plus a separate page when allocating the receive
> buffer. In this way, we can ensure that all payloads can be
> independently in a page, which is very beneficial for the zerocopy
> implemented by the upper layer.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> ---
>
> v6:
>      1. Fix some syntax issues. @Cornelia Huck
>      2. Clarify some paragraphs. @Cornelia Huck
>      3. Determine the device what to do if it does not perform header split on a packet.
>
> v5:
>      1. Determine when hdr_len is credible in the process of rx
>      2. Clean up the use of buffers and descriptors
>      3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>
> v4:
>      1. fix typo @Cornelia Huck @Jason Wang
>      2. do not split header for IP fragmentation packet. @Jason Wang
>
> v3:
>      1. Fix some syntax issues
>      2. Fix some terminology issues
>      3. It is not unified with ip alignment, so ip alignment is not included
>      4. Make it clear that the device must support four types, in the case of
>      successful negotiation.
>
>   conformance.tex |   2 +
>   content.tex     | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>   2 files changed, 109 insertions(+), 4 deletions(-)
>
> diff --git a/conformance.tex b/conformance.tex
> index 2b86fc6..bd0f463 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>   \end{itemize}
>   
>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>   \end{itemize}
>   
>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> diff --git a/content.tex b/content.tex
> index e863709..74c36fe 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>       channel.
>   
> +\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol
> +    header and the payload.
> +
>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>   
>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>   \end{description}
>   
>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
> +#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
>           u8 flags;
>   #define VIRTIO_NET_HDR_GSO_NONE        0
>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
> @@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}.
>   
>   If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
> -been negotiated, the device SHOULD set \field{hdr_len} to a value
> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
> +is not set, the device SHOULD set \field{hdr_len} to a value
>   not less than the length of the headers, including the transport
> -header.
> +header \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}.
>   
>   If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
>   device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in
> @@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>   driver MUST NOT use the \field{csum_start} and \field{csum_offset}.
>   
>   If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
> -been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
> +is not set, the driver MAY use \field{hdr_len} only as a hint about the
>   transport header size.
> -The driver MUST NOT rely on \field{hdr_len} to be correct.
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the driver
> +MUST NOT rely on \field{hdr_len} to be correct.
> +
>   \begin{note}
>   This is due to various bugs in implementations.
>   \end{note}
> @@ -4483,6 +4493,99 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>   according to the native endian of the guest rather than
>   (necessarily when not using the legacy interface) little-endian.
>   
> +\paragraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> +
> +If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated,
> +the device supports splitting the protocol header and the payload.
> +The protocol header and the payload will be separated into different
> +descriptors.
> +
> +\subparagraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}
> +
> +To configure the split header, the following layout structure and definitions
> +are used:
> +
> +\begin{lstlisting}
> +struct virtio_net_split_header_config {
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4     (1 << 0)
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6     (1 << 1)
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4     (1 << 2)
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6     (1 << 3)
> +    le64 type;
> +};
> +
> +#define VIRTIO_NET_CTRL_SPLIT_HEADER       6
> + #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET   0
> +\end{lstlisting}
> +
> +The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command:
> +VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header configuration.
> +
> +The driver can enable or disable split header for different protocols by
> +setting or clearing corresponding bits in \field{type}.
> +
> +\begin{itemize}
> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4: split after ipv4 udp header
> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6: split after ipv6 udp header
> +\end{itemize}
> +
> +\devicenormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> +
> +A device MUST initialize \field{type} to 0, and MUST set it to 0
> +upon device reset.
> +
> +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, the device MUST support
> +VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6,
> +VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6.
> +
> +A device MUST NOT split the header if it encounters any of the following cases:
> +\begin{itemize}
> +    \item the device does not recognize the protocol of the packet.
> +    \item the packet is an IP fragmentation.
> +    \item \field{type} does not include the protocol of the packet.
> +    \item the buffer consists of only one descriptor.
> +    \item the total size of the virtio net header and the protocol header exceeds
> +        the size of the first descriptor.
> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
> +        payload exceeds the size of the descriptor chain starting from the 2nd
> +        descriptor.
> +\end{itemize}
> +
> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_HEADER bit
> +in \field{flags} MUST be set. The protocol header MUST be on the first
> +descriptor, following the virtio net header. The payload MUST start from the
> +second descriptor. The device MUST set \field{hdr_len} of structure
> +virtio_net_hdr to the length of the protocol header.
> +
> +If all of the following applies:
> +\begin{itemize}
> +    \item the header is split by the device.
> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
> +    \item the received packet is spread over multiple buffers.
> +\end {itemize}
> +then if the device uses the buffers after the 1st buffer, and the buffer
> +consists of multiple descriptors, the device MUST skip the first descriptor,
> +because the first descriptor is used to carry the protocol header.
> +The used length still reports the number of bytes it has written to memory.
> +
> +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, but the device does not split the header
> +and the buffer consists of at least two descriptors, the device MUST put the virtio net header
> +in the first descriptor, and MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}.
> +The device MUST use the second descriptor to store the packet, since the first descriptor
> +is designed to accommodate the header and is tiny for the packet.
> +
> +\drivernormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set, the driver
> +MUST treat the contents of \field{hdr_len} as the length of the protocol header
> +inside the first descriptor.
> +
> +If the split header is enabled, the buffers submitted to receiveq by the
> +driver SHOULD be composed of at least two descriptors.
> +When the buffer consists of two descriptors, the length of the first
> +descriptor MUST be greater than the one of the virtio net header.
>   
>   \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>   

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [virtio-dev] Re: [PATCH] virtio_net: support split header
  2022-08-01  6:59 [virtio-dev] [PATCH] virtio_net: support split header Heng Qi
  2022-08-01  8:28 ` [virtio-dev] " Heng Qi
@ 2022-08-04  6:27 ` Jason Wang
  2022-08-04 13:01   ` Heng Qi
       [not found]   ` <4ea77b4a-2571-2b12-433c-2f1a8ceaeff8@linux.alibaba.com>
  1 sibling, 2 replies; 12+ messages in thread
From: Jason Wang @ 2022-08-04  6:27 UTC (permalink / raw)
  To: Heng Qi; +Cc: Virtio-Dev, mst, Xuan Zhuo, Kangjie Xu

On Mon, Aug 1, 2022 at 2:59 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> The purpose of this feature is to split the header and the payload of
> the packet.
>
> |                    receive buffer                                    |
> |                       0th descriptor             | 1th descriptor    |
> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>
> We can use a buffer plus a separate page when allocating the receive
> buffer. In this way, we can ensure that all payloads can be
> independently in a page, which is very beneficial for the zerocopy
> implemented by the upper layer.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Kangjie Xu <kangjie.xu@linux.alibaba.com>
> ---
>
> v6:
>     1. Fix some syntax issues. @Cornelia Huck
>     2. Clarify some paragraphs. @Cornelia Huck
>     3. Determine the device what to do if it does not perform header split on a packet.
>
> v5:
>     1. Determine when hdr_len is credible in the process of rx
>     2. Clean up the use of buffers and descriptors
>     3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>
> v4:
>     1. fix typo @Cornelia Huck @Jason Wang
>     2. do not split header for IP fragmentation packet. @Jason Wang
>
> v3:
>     1. Fix some syntax issues
>     2. Fix some terminology issues
>     3. It is not unified with ip alignment, so ip alignment is not included
>     4. Make it clear that the device must support four types, in the case of
>     successful negotiation.
>
>  conformance.tex |   2 +
>  content.tex     | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 109 insertions(+), 4 deletions(-)
>
> diff --git a/conformance.tex b/conformance.tex
> index 2b86fc6..bd0f463 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>  \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>  \end{itemize}
>
>  \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>  \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>  \end{itemize}
>
>  \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> diff --git a/content.tex b/content.tex
> index e863709..74c36fe 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>  \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>      channel.
>
> +\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol
> +    header and the payload.
> +
>  \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>
>  \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>  \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>  \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>  \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>  \end{description}
>
>  \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>  #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>  #define VIRTIO_NET_HDR_F_DATA_VALID    2
>  #define VIRTIO_NET_HDR_F_RSC_INFO      4
> +#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
>          u8 flags;
>  #define VIRTIO_NET_HDR_GSO_NONE        0
>  #define VIRTIO_NET_HDR_GSO_TCPV4       1
> @@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>  not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}.
>
>  If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
> -been negotiated, the device SHOULD set \field{hdr_len} to a value
> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
> +is not set, the device SHOULD set \field{hdr_len} to a value
>  not less than the length of the headers, including the transport
> -header.
> +header \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}.
>
>  If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
>  device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in
> @@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>  driver MUST NOT use the \field{csum_start} and \field{csum_offset}.
>
>  If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
> -been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
> +is not set, the driver MAY use \field{hdr_len} only as a hint about the
>  transport header size.
> -The driver MUST NOT rely on \field{hdr_len} to be correct.
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the driver
> +MUST NOT rely on \field{hdr_len} to be correct.

I think we should keep the above description as-is. For whatever case,
the driver must not trust the metadata set by the device and must
perform necessary sanity tests on them.

> +
>  \begin{note}
>  This is due to various bugs in implementations.
>  \end{note}
> @@ -4483,6 +4493,99 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>  according to the native endian of the guest rather than
>  (necessarily when not using the legacy interface) little-endian.
>
> +\paragraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> +
> +If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated,
> +the device supports splitting the protocol header and the payload.
> +The protocol header and the payload will be separated into different
> +descriptors.
> +
> +\subparagraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}
> +
> +To configure the split header, the following layout structure and definitions
> +are used:
> +
> +\begin{lstlisting}
> +struct virtio_net_split_header_config {
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4     (1 << 0)
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6     (1 << 1)
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4     (1 << 2)
> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6     (1 << 3)
> +    le64 type;
> +};
> +
> +#define VIRTIO_NET_CTRL_SPLIT_HEADER       6
> + #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET   0
> +\end{lstlisting}
> +
> +The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command:
> +VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header configuration.
> +
> +The driver can enable or disable split header for different protocols by
> +setting or clearing corresponding bits in \field{type}.
> +
> +\begin{itemize}
> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4: split after ipv4 udp header
> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6: split after ipv6 udp header
> +\end{itemize}
> +
> +\devicenormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> +
> +A device MUST initialize \field{type} to 0, and MUST set it to 0
> +upon device reset.
> +
> +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, the device MUST support
> +VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6,
> +VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6.
> +
> +A device MUST NOT split the header if it encounters any of the following cases:
> +\begin{itemize}
> +    \item the device does not recognize the protocol of the packet.
> +    \item the packet is an IP fragmentation.
> +    \item \field{type} does not include the protocol of the packet.
> +    \item the buffer consists of only one descriptor.
> +    \item the total size of the virtio net header and the protocol header exceeds
> +        the size of the first descriptor.
> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
> +        payload exceeds the size of the descriptor chain starting from the 2nd
> +        descriptor.

When MRG_RXBUF is negotiated, what if the rest of the buffers don't
fit for the payload?

> +\end{itemize}
> +
> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_HEADER bit
> +in \field{flags} MUST be set. The protocol header MUST be on the first
> +descriptor, following the virtio net header. The payload MUST start from the
> +second descriptor. The device MUST set \field{hdr_len} of structure
> +virtio_net_hdr to the length of the protocol header.
> +
> +If all of the following applies:
> +\begin{itemize}
> +    \item the header is split by the device.
> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
> +    \item the received packet is spread over multiple buffers.
> +\end {itemize}
> +then if the device uses the buffers after the 1st buffer, and the buffer
> +consists of multiple descriptors, the device MUST skip the first descriptor,
> +because the first descriptor is used to carry the protocol header.
> +The used length still reports the number of bytes it has written to memory.

Does this mean, in order to get best performance, driver needs to
prepare buffer like

[small descriptor for header] + [large descriptor for payload] ?

> +
> +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, but the device does not split the header
> +and the buffer consists of at least two descriptors, the device MUST put the virtio net header
> +in the first descriptor, and MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}.
> +The device MUST use the second descriptor to store the packet, since the first descriptor
> +is designed to accommodate the header and is tiny for the packet.

Which header did you mean here, is it the protocol header or the vnet header?

Btw I don't get why we need this, maybe you can give an example for this?

> +
> +\drivernormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> +
> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set, the driver
> +MUST treat the contents of \field{hdr_len} as the length of the protocol header
> +inside the first descriptor.

Not sure MUST is appropriate here, driver needs to validate the hdr_len anyhow.

> +
> +If the split header is enabled, the buffers submitted to receiveq by the
> +driver SHOULD be composed of at least two descriptors.

It looks to me the "SHOULD" here is somehow conflict with the previous
description about when the device can't perform a split.

> +When the buffer consists of two descriptors, the length of the first
> +descriptor MUST be greater than the one of the virtio net header.

So I think we'd better have guidance on how to achieve the best
performance and move the above there instead of using the normatives.

Thanks

>
>  \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>
> --
> 1.8.3.1
>


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [virtio-dev] Re: [PATCH] virtio_net: support split header
  2022-08-04  6:27 ` Jason Wang
@ 2022-08-04 13:01   ` Heng Qi
  2022-08-04 13:50     ` Cornelia Huck
  2022-08-11  9:01     ` Xuan Zhuo
       [not found]   ` <4ea77b4a-2571-2b12-433c-2f1a8ceaeff8@linux.alibaba.com>
  1 sibling, 2 replies; 12+ messages in thread
From: Heng Qi @ 2022-08-04 13:01 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, xuanzhuo, kangjie.xu, virtio-dev

[-- Attachment #1: Type: text/plain, Size: 15463 bytes --]


在 2022/8/4 下午2:27, Jason Wang 写道:
> On Mon, Aug 1, 2022 at 2:59 PM Heng Qi<hengqi@linux.alibaba.com>  wrote:
>> From: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>>
>> The purpose of this feature is to split the header and the payload of
>> the packet.
>>
>> |                    receive buffer                                    |
>> |                       0th descriptor             | 1th descriptor    |
>> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>>
>> We can use a buffer plus a separate page when allocating the receive
>> buffer. In this way, we can ensure that all payloads can be
>> independently in a page, which is very beneficial for the zerocopy
>> implemented by the upper layer.
>>
>> Signed-off-by: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>> Signed-off-by: Heng Qi<hengqi@linux.alibaba.com>
>> Reviewed-by: Kangjie Xu<kangjie.xu@linux.alibaba.com>
>> ---
>>
>> v6:
>>      1. Fix some syntax issues. @Cornelia Huck
>>      2. Clarify some paragraphs. @Cornelia Huck
>>      3. Determine the device what to do if it does not perform header split on a packet.
>>
>> v5:
>>      1. Determine when hdr_len is credible in the process of rx
>>      2. Clean up the use of buffers and descriptors
>>      3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>>
>> v4:
>>      1. fix typo @Cornelia Huck @Jason Wang
>>      2. do not split header for IP fragmentation packet. @Jason Wang
>>
>> v3:
>>      1. Fix some syntax issues
>>      2. Fix some terminology issues
>>      3. It is not unified with ip alignment, so ip alignment is not included
>>      4. Make it clear that the device must support four types, in the case of
>>      successful negotiation.
>>
>>   conformance.tex |   2 +
>>   content.tex     | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>   2 files changed, 109 insertions(+), 4 deletions(-)
>>
>> diff --git a/conformance.tex b/conformance.tex
>> index 2b86fc6..bd0f463 100644
>> --- a/conformance.tex
>> +++ b/conformance.tex
>> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>>   \end{itemize}
>>
>>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
>> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>>   \end{itemize}
>>
>>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
>> diff --git a/content.tex b/content.tex
>> index e863709..74c36fe 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>       channel.
>>
>> +\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol
>> +    header and the payload.
>> +
>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>
>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>> +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \end{description}
>>
>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
>> +#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
>>           u8 flags;
>>   #define VIRTIO_NET_HDR_GSO_NONE        0
>>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
>> @@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>   not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}.
>>
>>   If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
>> -been negotiated, the device SHOULD set \field{hdr_len} to a value
>> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
>> +is not set, the device SHOULD set \field{hdr_len} to a value
>>   not less than the length of the headers, including the transport
>> -header.
>> +header \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}.
>>
>>   If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
>>   device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in
>> @@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>   driver MUST NOT use the \field{csum_start} and \field{csum_offset}.
>>
>>   If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
>> -been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
>> +is not set, the driver MAY use \field{hdr_len} only as a hint about the
>>   transport header size.
>> -The driver MUST NOT rely on \field{hdr_len} to be correct.
>> +
>> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the driver
>> +MUST NOT rely on \field{hdr_len} to be correct.
> I think we should keep the above description as-is. For whatever case,
> the driver must not trust the metadata set by the device and must
> perform necessary sanity tests on them.


My idea is to keep the current description as it is,
but to emphasize in the next version:
"If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
the driver MAY treat the \field{hdr_len} as the length of the
protocol header inside the first descriptor."


>
>> +
>>   \begin{note}
>>   This is due to various bugs in implementations.
>>   \end{note}
>> @@ -4483,6 +4493,99 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>   according to the native endian of the guest rather than
>>   (necessarily when not using the legacy interface) little-endian.
>>
>> +\paragraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>> +
>> +If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated,
>> +the device supports splitting the protocol header and the payload.
>> +The protocol header and the payload will be separated into different
>> +descriptors.
>> +
>> +\subparagraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}
>> +
>> +To configure the split header, the following layout structure and definitions
>> +are used:
>> +
>> +\begin{lstlisting}
>> +struct virtio_net_split_header_config {
>> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4     (1 << 0)
>> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6     (1 << 1)
>> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4     (1 << 2)
>> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6     (1 << 3)
>> +    le64 type;
>> +};
>> +
>> +#define VIRTIO_NET_CTRL_SPLIT_HEADER       6
>> + #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET   0
>> +\end{lstlisting}
>> +
>> +The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command:
>> +VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header configuration.
>> +
>> +The driver can enable or disable split header for different protocols by
>> +setting or clearing corresponding bits in \field{type}.
>> +
>> +\begin{itemize}
>> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4: split after ipv4 tcp header
>> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6: split after ipv6 tcp header
>> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4: split after ipv4 udp header
>> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6: split after ipv6 udp header
>> +\end{itemize}
>> +
>> +\devicenormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>> +
>> +A device MUST initialize \field{type} to 0, and MUST set it to 0
>> +upon device reset.
>> +
>> +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, the device MUST support
>> +VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6,
>> +VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6.
>> +
>> +A device MUST NOT split the header if it encounters any of the following cases:
>> +\begin{itemize}
>> +    \item the device does not recognize the protocol of the packet.
>> +    \item the packet is an IP fragmentation.
>> +    \item \field{type} does not include the protocol of the packet.
>> +    \item the buffer consists of only one descriptor.
>> +    \item the total size of the virtio net header and the protocol header exceeds
>> +        the size of the first descriptor.
>> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
>> +        payload exceeds the size of the descriptor chain starting from the 2nd
>> +        descriptor.
> When MRG_RXBUF is negotiated, what if the rest of the buffers don't
> fit for the payload?

When VIRTIO_NET_F_SPLIT_HEADER is not negotiated, the device will also face
this situation, and we keep the same as the previous processing logic.

>> +\end{itemize}
>> +
>> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_HEADER bit
>> +in \field{flags} MUST be set. The protocol header MUST be on the first
>> +descriptor, following the virtio net header. The payload MUST start from the
>> +second descriptor. The device MUST set \field{hdr_len} of structure
>> +virtio_net_hdr to the length of the protocol header.
>> +
>> +If all of the following applies:
>> +\begin{itemize}
>> +    \item the header is split by the device.
>> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
>> +    \item the received packet is spread over multiple buffers.
>> +\end {itemize}
>> +then if the device uses the buffers after the 1st buffer, and the buffer
>> +consists of multiple descriptors, the device MUST skip the first descriptor,
>> +because the first descriptor is used to carry the protocol header.
>> +The used length still reports the number of bytes it has written to memory.
> Does this mean, in order to get best performance, driver needs to
> prepare buffer like
>
> [small descriptor for header] + [large descriptor for payload] ?

Yes, that's it.

>> +
>> +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, but the device does not split the header
>> +and the buffer consists of at least two descriptors, the device MUST put the virtio net header
>> +in the first descriptor, and MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}.
>> +The device MUST use the second descriptor to store the packet, since the first descriptor
>> +is designed to accommodate the header and is tiny for the packet.
> Which header did you mean here, is it the protocol header or the vnet header?
>
> Btw I don't get why we need this, maybe you can give an example for this?

"the header" means the virtio net header and the protocol header.


Why does virtio net header MUST be placed in the first descriptor by the device?

When VIRTIO_NET_F_SPLIT_HEADER is negotiated and the packet received by
the driver consists of at least 2 descriptors, the driver MUST use the first
descriptor in order to determine whether the packet is header-splitted by the device.
Otherwise, if virtio net header is placed on the second descriptor, but the packet has been
header-splitted by the device, such processing is unreasonable.


Why does the device MUST skip the first descriptor and use the second
descriptor to store the packet?

There are three points in total.

1. the semantic of the first descriptor:

When VIRTIO_NET_F_SPLIT_HEADER is negotiated, the first descriptor
is designed to store the virtio net header and the protocol header. If the first
descriptor stores the data of the packet at this time, then the semantic of the first
descriptor has changed.

2. the performance of the driver while receiving the packets.

First, it is worth noting that the size of the first descriptor is small.

Second, imagine a situation where if a packet is exactly one byte larger than the size
of the first descriptor, and the device stores the packet starting from
the first descriptor,the packet will be placed on two descriptors, but these two descriptors
not continuous in physical addresses, which will affect the performance of the driver when
receiving packets.

3. compatibility of driver's receiving packet logic:

If the device stores the packet in the second descriptor, the
driver's existing packet receiving logic is minimally affected.

>> +
>> +\drivernormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>> +
>> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set, the driver
>> +MUST treat the contents of \field{hdr_len} as the length of the protocol header
>> +inside the first descriptor.
> Not sure MUST is appropriate here, driver needs to validate the hdr_len anyhow.

I will use "SHOULD" instead of "MUST".

>> +
>> +If the split header is enabled, the buffers submitted to receiveq by the
>> +driver SHOULD be composed of at least two descriptors.
> It looks to me the "SHOULD" here is somehow conflict with the previous
> description about when the device can't perform a split.
>
I will use "MUST" instead of "SHOULD".

>> +When the buffer consists of two descriptors, the length of the first
>> +descriptor MUST be greater than the one of the virtio net header.
> So I think we'd better have guidance on how to achieve the best
> performance and move the above there instead of using the normatives.
>
Will fix.

I will fix these in the next version.

Thanks.


[-- Attachment #2: Type: text/html, Size: 17926 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [virtio-dev] Re: [PATCH] virtio_net: support split header
  2022-08-04 13:01   ` Heng Qi
@ 2022-08-04 13:50     ` Cornelia Huck
  2022-08-05  5:59       ` Heng Qi
  2022-08-11  9:01     ` Xuan Zhuo
  1 sibling, 1 reply; 12+ messages in thread
From: Cornelia Huck @ 2022-08-04 13:50 UTC (permalink / raw)
  To: Heng Qi, Jason Wang; +Cc: mst, xuanzhuo, kangjie.xu, virtio-dev

On Thu, Aug 04 2022, Heng Qi <hengqi@linux.alibaba.com> wrote:

> 在 2022/8/4 下午2:27, Jason Wang 写道:
>> On Mon, Aug 1, 2022 at 2:59 PM Heng Qi<hengqi@linux.alibaba.com>  wrote:
>>> @@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>   driver MUST NOT use the \field{csum_start} and \field{csum_offset}.
>>>
>>>   If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
>>> -been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>>> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
>>> +is not set, the driver MAY use \field{hdr_len} only as a hint about the
>>>   transport header size.
>>> -The driver MUST NOT rely on \field{hdr_len} to be correct.
>>> +
>>> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the driver
>>> +MUST NOT rely on \field{hdr_len} to be correct.
>> I think we should keep the above description as-is. For whatever case,
>> the driver must not trust the metadata set by the device and must
>> perform necessary sanity tests on them.
>
>
> My idea is to keep the current description as it is,
> but to emphasize in the next version:
> "If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
> the driver MAY treat the \field{hdr_len} as the length of the
> protocol header inside the first descriptor."


Just to be clear, you suggest using

"If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
transport header size.

The driver MUST NOT rely on \field{hdr_len} to be correct.

If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
the driver MAY treat the \field{hdr_len} as the length of the
protocol header inside the first descriptor."

(Maybe "...the driver MAY use \field{hdr_len} as a hint about the length
of the protocol header..."? It's still not reliable, right?)

>
>
>>
>>> +
>>>   \begin{note}
>>>   This is due to various bugs in implementations.
>>>   \end{note}


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [virtio-dev] Re: [PATCH] virtio_net: support split header
  2022-08-04 13:50     ` Cornelia Huck
@ 2022-08-05  5:59       ` Heng Qi
  2022-08-09 15:33         ` Cornelia Huck
  0 siblings, 1 reply; 12+ messages in thread
From: Heng Qi @ 2022-08-05  5:59 UTC (permalink / raw)
  To: Cornelia Huck; +Cc: Jason Wang, xuanzhuo, kangjie.xu, virtio-dev

[-- Attachment #1: Type: text/plain, Size: 2547 bytes --]


在 2022/8/4 下午9:50, Cornelia Huck 写道:
> On Thu, Aug 04 2022, Heng Qi<hengqi@linux.alibaba.com>  wrote:
>
>> 在 2022/8/4 下午2:27, Jason Wang 写道:
>>> On Mon, Aug 1, 2022 at 2:59 PM Heng Qi<hengqi@linux.alibaba.com>   wrote:
>>>> @@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>    driver MUST NOT use the \field{csum_start} and \field{csum_offset}.
>>>>
>>>>    If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
>>>> -been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>>>> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
>>>> +is not set, the driver MAY use \field{hdr_len} only as a hint about the
>>>>    transport header size.
>>>> -The driver MUST NOT rely on \field{hdr_len} to be correct.
>>>> +
>>>> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the driver
>>>> +MUST NOT rely on \field{hdr_len} to be correct.
>>> I think we should keep the above description as-is. For whatever case,
>>> the driver must not trust the metadata set by the device and must
>>> perform necessary sanity tests on them.
>>
>> My idea is to keep the current description as it is,
>> but to emphasize in the next version:
>> "If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
>> the driver MAY treat the \field{hdr_len} as the length of the
>> protocol header inside the first descriptor."
>
> Just to be clear, you suggest using
>
> "If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
> been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> transport header size.
>
> The driver MUST NOT rely on \field{hdr_len} to be correct.
>
> If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
> the driver MAY treat the \field{hdr_len} as the length of the
> protocol header inside the first descriptor."

Yes. I will use the above description to make it clearer in the next version.


>
> (Maybe "...the driver MAY use \field{hdr_len} as a hint about the length
> of the protocol header..."? It's still not reliable, right?)
>
\field{hdr_len} is unreliable when VIRTIO_NET_F_SPLIT_HEADER is not negotiated.


If VIRTIO_NET_F_SPLIT_HEADER is negotiated, "split header" MAY perform the split
from the IP layer, so the protocol header and the transport header are different.

so I think the "...the driver MAY use \field{hdr_len} only as a hint about the
transport header size..." paragraph can be left as-is.



[-- Attachment #2: Type: text/html, Size: 3559 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [virtio-dev] Re: [PATCH] virtio_net: support split header
  2022-08-05  5:59       ` Heng Qi
@ 2022-08-09 15:33         ` Cornelia Huck
  0 siblings, 0 replies; 12+ messages in thread
From: Cornelia Huck @ 2022-08-09 15:33 UTC (permalink / raw)
  To: Heng Qi; +Cc: Jason Wang, xuanzhuo, kangjie.xu, virtio-dev

On Fri, Aug 05 2022, Heng Qi <hengqi@linux.alibaba.com> wrote:

> 在 2022/8/4 下午9:50, Cornelia Huck 写道:
>> On Thu, Aug 04 2022, Heng Qi<hengqi@linux.alibaba.com>  wrote:
>>
>>> 在 2022/8/4 下午2:27, Jason Wang 写道:
>>>> On Mon, Aug 1, 2022 at 2:59 PM Heng Qi<hengqi@linux.alibaba.com>   wrote:
>>>>> @@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>>    driver MUST NOT use the \field{csum_start} and \field{csum_offset}.
>>>>>
>>>>>    If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
>>>>> -been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>>>>> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
>>>>> +is not set, the driver MAY use \field{hdr_len} only as a hint about the
>>>>>    transport header size.
>>>>> -The driver MUST NOT rely on \field{hdr_len} to be correct.
>>>>> +
>>>>> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the driver
>>>>> +MUST NOT rely on \field{hdr_len} to be correct.
>>>> I think we should keep the above description as-is. For whatever case,
>>>> the driver must not trust the metadata set by the device and must
>>>> perform necessary sanity tests on them.
>>>
>>> My idea is to keep the current description as it is,
>>> but to emphasize in the next version:
>>> "If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
>>> the driver MAY treat the \field{hdr_len} as the length of the
>>> protocol header inside the first descriptor."
>>
>> Just to be clear, you suggest using
>>
>> "If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
>> been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>> transport header size.
>>
>> The driver MUST NOT rely on \field{hdr_len} to be correct.
>>
>> If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
>> the driver MAY treat the \field{hdr_len} as the length of the
>> protocol header inside the first descriptor."
>
> Yes. I will use the above description to make it clearer in the next version.
>
>
>>
>> (Maybe "...the driver MAY use \field{hdr_len} as a hint about the length
>> of the protocol header..."? It's still not reliable, right?)
>>
> \field{hdr_len} is unreliable when VIRTIO_NET_F_SPLIT_HEADER is not negotiated.
>
>
> If VIRTIO_NET_F_SPLIT_HEADER is negotiated, "split header" MAY perform the split
> from the IP layer, so the protocol header and the transport header are different.
>
> so I think the "...the driver MAY use \field{hdr_len} only as a hint about the
> transport header size..." paragraph can be left as-is.

Ok, then let's keep it like that.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [virtio-dev] Re: [PATCH] virtio_net: support split header
       [not found]     ` <CACGkMEvTr44xhziEwMhCpAVp4dWDPvGHfkwGm+o4WRqsKBX=gw@mail.gmail.com>
@ 2022-08-10  9:10       ` Heng Qi
  0 siblings, 0 replies; 12+ messages in thread
From: Heng Qi @ 2022-08-10  9:10 UTC (permalink / raw)
  To: Jason Wang; +Cc: Cornelia Huck, mst, xuanzhuo, kangjie.xu, virtio-dev

[-- Attachment #1: Type: text/plain, Size: 15040 bytes --]


在 2022/8/9 下午5:18, Jason Wang 写道:
> On Thu, Aug 4, 2022 at 8:48 PM Heng Qi<hengqi@linux.alibaba.com>  wrote:
>>
>> 在 2022/8/4 下午2:27, Jason Wang 写道:
>>
>> On Mon, Aug 1, 2022 at 2:59 PM Heng Qi<hengqi@linux.alibaba.com>  wrote:
>>
>> From: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>>
>> The purpose of this feature is to split the header and the payload of
>> the packet.
>>
>> |                    receive buffer                                    |
>> |                       0th descriptor             | 1th descriptor    |
>> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
>>
>> We can use a buffer plus a separate page when allocating the receive
>> buffer. In this way, we can ensure that all payloads can be
>> independently in a page, which is very beneficial for the zerocopy
>> implemented by the upper layer.
>>
>> Signed-off-by: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
>> Signed-off-by: Heng Qi<hengqi@linux.alibaba.com>
>> Reviewed-by: Kangjie Xu<kangjie.xu@linux.alibaba.com>
>> ---
>>
>> v6:
>>      1. Fix some syntax issues. @Cornelia Huck
>>      2. Clarify some paragraphs. @Cornelia Huck
>>      3. Determine the device what to do if it does not perform header split on a packet.
>>
>> v5:
>>      1. Determine when hdr_len is credible in the process of rx
>>      2. Clean up the use of buffers and descriptors
>>      3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
>>
>> v4:
>>      1. fix typo @Cornelia Huck @Jason Wang
>>      2. do not split header for IP fragmentation packet. @Jason Wang
>>
>> v3:
>>      1. Fix some syntax issues
>>      2. Fix some terminology issues
>>      3. It is not unified with ip alignment, so ip alignment is not included
>>      4. Make it clear that the device must support four types, in the case of
>>      successful negotiation.
>>
>>   conformance.tex |   2 +
>>   content.tex     | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>>   2 files changed, 109 insertions(+), 4 deletions(-)
>>
>> diff --git a/conformance.tex b/conformance.tex
>> index 2b86fc6..bd0f463 100644
>> --- a/conformance.tex
>> +++ b/conformance.tex
>> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
>>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>>   \end{itemize}
>>
>>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
>> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
>>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
>> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>>   \end{itemize}
>>
>>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
>> diff --git a/content.tex b/content.tex
>> index e863709..74c36fe 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
>>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
>>       channel.
>>
>> +\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol
>> +    header and the payload.
>> +
>>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
>>
>>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
>> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
>>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
>>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
>> +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
>>   \end{description}
>>
>>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
>> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
>>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
>>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
>>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
>> +#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
>>           u8 flags;
>>   #define VIRTIO_NET_HDR_GSO_NONE        0
>>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
>> @@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>   not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}.
>>
>>   If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
>> -been negotiated, the device SHOULD set \field{hdr_len} to a value
>> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
>> +is not set, the device SHOULD set \field{hdr_len} to a value
>>   not less than the length of the headers, including the transport
>> -header.
>> +header \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}.
>>
>>   If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
>>   device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in
>> @@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>   driver MUST NOT use the \field{csum_start} and \field{csum_offset}.
>>
>>   If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
>> -been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
>> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
>> +is not set, the driver MAY use \field{hdr_len} only as a hint about the
>>   transport header size.
>> -The driver MUST NOT rely on \field{hdr_len} to be correct.
>> +
>> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the driver
>> +MUST NOT rely on \field{hdr_len} to be correct.
>>
>> I think we should keep the above description as-is. For whatever case,
>> the driver must not trust the metadata set by the device and must
>> perform necessary sanity tests on them.
>>
>>
>> My idea is to keep the current description as it is, but to emphasize in the next version :
>> "If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set, the driver MAY treat
>> the \field{hdr_len} as the length of the protocol header inside the first descriptor."
> I'm not sure this is useful, the original description looks fine:
>
> 1) It says hdr_len is a hint about the transport header size when
> _F_SPLIT_HEADER is set. So this means the device has successfully
> split the header in the first descriptor.
> 2) driver MUST NOT rely on it.
>
> Btw, since the spec said it is the transport header size, it might be
> better to rename the feature as "split transport header"? So we can
> use another feature for network header split in the future otherwise
> there will be a conflict.
>
>>
>> +
>>   \begin{note}
>>   This is due to various bugs in implementations.
>>   \end{note}
>> @@ -4483,6 +4493,99 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
>>   according to the native endian of the guest rather than
>>   (necessarily when not using the legacy interface) little-endian.
>>
>> +\paragraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>> +
>> +If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated,
>> +the device supports splitting the protocol header and the payload.
>> +The protocol header and the payload will be separated into different
>> +descriptors.
>> +
>> +\subparagraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}
>> +
>> +To configure the split header, the following layout structure and definitions
>> +are used:
>> +
>> +\begin{lstlisting}
>> +struct virtio_net_split_header_config {
>> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4     (1 << 0)
>> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6     (1 << 1)
>> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4     (1 << 2)
>> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6     (1 << 3)
>> +    le64 type;
>> +};
>> +
>> +#define VIRTIO_NET_CTRL_SPLIT_HEADER       6
>> + #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET   0
>> +\end{lstlisting}
>> +
>> +The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command:
>> +VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header configuration.
>> +
>> +The driver can enable or disable split header for different protocols by
>> +setting or clearing corresponding bits in \field{type}.
>> +
>> +\begin{itemize}
>> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4: split after ipv4 tcp header
>> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6: split after ipv6 tcp header
>> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4: split after ipv4 udp header
>> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6: split after ipv6 udp header
>> +\end{itemize}
>> +
>> +\devicenormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
>> +
>> +A device MUST initialize \field{type} to 0, and MUST set it to 0
>> +upon device reset.
>> +
>> +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, the device MUST support
>> +VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6,
>> +VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6.
>> +
>> +A device MUST NOT split the header if it encounters any of the following cases:
>> +\begin{itemize}
>> +    \item the device does not recognize the protocol of the packet.
>> +    \item the packet is an IP fragmentation.
>> +    \item \field{type} does not include the protocol of the packet.
>> +    \item the buffer consists of only one descriptor.
>> +    \item the total size of the virtio net header and the protocol header exceeds
>> +        the size of the first descriptor.
>> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
>> +        payload exceeds the size of the descriptor chain starting from the 2nd
>> +        descriptor.
>>
>> When MRG_RXBUF is negotiated, what if the rest of the buffers don't
>> fit for the payload?
>>
>>
>> When VIRTIO_NET_F_SPLIT_HEADER is not negotiated, the device will also face
>> this situation, and we keep the same as the previous processing logic.
> The previous logic only applies for a single buffer not the case for a
> packet that can occupy multiple buffers?

I'm not quite sure what you mean. But I guess you mean something like this:

For example, the device we use is DPDK vhost-user, and the packet receiving
mode of the virtio-net driver is MRG_RXBUF. In this environment, when
VIRTIO_NET_F_SPLIT_HEADER is not negotiated, a buffer in the device consists
of only one descriptor, but a packet may be spread over multiple buffers.
The previous logic is that if the rest of the buffers don't fit for the payload,
it will return an error code and discard the remaining packets of the batch.
When VIRTIO_NET_F_SPLIT_HEADER is negotiated, a buffer consists of 2 descriptors,
the first for the header (the virtio net header and the protocol header) and
the second for the payload. Now, a packet is spread over multiple buffers and if the
rest of the buffers don't fit for the payload, we use the same logic as before
without negotiating VIRTIO_NET_F_SPLIT_HEADER, which is to return an error code
and discard the remaining packets of the batch.

Or do you mean something else? Can you help clarify the question?

Thanks.


>>
>> +\end{itemize}
>> +
>> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_HEADER bit
>> +in \field{flags} MUST be set. The protocol header MUST be on the first
>> +descriptor, following the virtio net header. The payload MUST start from the
>> +second descriptor. The device MUST set \field{hdr_len} of structure
>> +virtio_net_hdr to the length of the protocol header.
>> +
>> +If all of the following applies:
>> +\begin{itemize}
>> +    \item the header is split by the device.
>> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
>> +    \item the received packet is spread over multiple buffers.
>> +\end {itemize}
>> +then if the device uses the buffers after the 1st buffer, and the buffer
>> +consists of multiple descriptors, the device MUST skip the first descriptor,
>> +because the first descriptor is used to carry the protocol header.
>> +The used length still reports the number of bytes it has written to memory.
>>
>> Does this mean, in order to get best performance, driver needs to
>> prepare buffer like
>>
>> [small descriptor for header] + [large descriptor for payload] ?
>>
>>
>> Yes, that's it.
> Let's add something in the driver normatives then. And I wonder if

Okay, we can do this later.


> this will have any side effects for the packet that can be splitted.

Currently when a packet is split, we haven't seen side effects on performance.

In the environment where the device uses OVS-DPDK vhost-user and the driver uses
virtio-net, we perform TCP receive zero-copy testing based on the
'tools/testing/selftests/net/tcp_mmap.c' test program by our split header implement.

Compared with the way that packets pass through the network stack by copying,
the efficiency of zero-copy packet receiving is improved by at least 50%.

>>
>> +
>> +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, but the device does not split the header
>> +and the buffer consists of at least two descriptors, the device MUST put the virtio net header
>> +in the first descriptor, and MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}.
>> +The device MUST use the second descriptor to store the packet, since the first descriptor
>> +is designed to accommodate the header and is tiny for the packet.
>>
>> Which header did you mean here, is it the protocol header or the vnet header?
>>
>> Btw I don't get why we need this, maybe you can give an example for this?
>>
>> "the header" means the virtio net header and the protocol header.
> Okay, let's clarify it in the next version.
>
> Thanks

Okay. Thanks.

[-- Attachment #2: Type: text/html, Size: 16551 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [virtio-dev] Re: [PATCH] virtio_net: support split header
  2022-08-04 13:01   ` Heng Qi
  2022-08-04 13:50     ` Cornelia Huck
@ 2022-08-11  9:01     ` Xuan Zhuo
  1 sibling, 0 replies; 12+ messages in thread
From: Xuan Zhuo @ 2022-08-11  9:01 UTC (permalink / raw)
  To: Heng Qi; +Cc: mst, kangjie.xu, virtio-dev, Jason Wang

On Thu, 4 Aug 2022 21:01:30 +0800, Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> 在 2022/8/4 下午2:27, Jason Wang 写道:
> > On Mon, Aug 1, 2022 at 2:59 PM Heng Qi<hengqi@linux.alibaba.com>  wrote:
> >> From: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
> >>
> >> The purpose of this feature is to split the header and the payload of
> >> the packet.
> >>
> >> |                    receive buffer                                    |
> >> |                       0th descriptor             | 1th descriptor    |
> >> | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->|   payload         |
> >>
> >> We can use a buffer plus a separate page when allocating the receive
> >> buffer. In this way, we can ensure that all payloads can be
> >> independently in a page, which is very beneficial for the zerocopy
> >> implemented by the upper layer.
> >>
> >> Signed-off-by: Xuan Zhuo<xuanzhuo@linux.alibaba.com>
> >> Signed-off-by: Heng Qi<hengqi@linux.alibaba.com>
> >> Reviewed-by: Kangjie Xu<kangjie.xu@linux.alibaba.com>
> >> ---
> >>
> >> v6:
> >>      1. Fix some syntax issues. @Cornelia Huck
> >>      2. Clarify some paragraphs. @Cornelia Huck
> >>      3. Determine the device what to do if it does not perform header split on a packet.
> >>
> >> v5:
> >>      1. Determine when hdr_len is credible in the process of rx
> >>      2. Clean up the use of buffers and descriptors
> >>      3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge
> >>
> >> v4:
> >>      1. fix typo @Cornelia Huck @Jason Wang
> >>      2. do not split header for IP fragmentation packet. @Jason Wang
> >>
> >> v3:
> >>      1. Fix some syntax issues
> >>      2. Fix some terminology issues
> >>      3. It is not unified with ip alignment, so ip alignment is not included
> >>      4. Make it clear that the device must support four types, in the case of
> >>      successful negotiation.
> >>
> >>   conformance.tex |   2 +
> >>   content.tex     | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> >>   2 files changed, 109 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/conformance.tex b/conformance.tex
> >> index 2b86fc6..bd0f463 100644
> >> --- a/conformance.tex
> >> +++ b/conformance.tex
> >> @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> >>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State}
> >>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) }
> >>   \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> >> +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> >>   \end{itemize}
> >>
> >>   \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
> >> @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> >>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode}
> >>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing}
> >>   \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing}
> >> +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> >>   \end{itemize}
> >>
> >>   \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
> >> diff --git a/content.tex b/content.tex
> >> index e863709..74c36fe 100644
> >> --- a/content.tex
> >> +++ b/content.tex
> >> @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> >>   \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> >>       channel.
> >>
> >> +\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol
> >> +    header and the payload.
> >> +
> >>   \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing.
> >>
> >>   \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets.
> >> @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> >>   \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ.
> >>   \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> >>   \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ.
> >> +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ.
> >>   \end{description}
> >>
> >>   \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits}
> >> @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> >>   #define VIRTIO_NET_HDR_F_NEEDS_CSUM    1
> >>   #define VIRTIO_NET_HDR_F_DATA_VALID    2
> >>   #define VIRTIO_NET_HDR_F_RSC_INFO      4
> >> +#define VIRTIO_NET_HDR_F_SPLIT_HEADER  8
> >>           u8 flags;
> >>   #define VIRTIO_NET_HDR_GSO_NONE        0
> >>   #define VIRTIO_NET_HDR_GSO_TCPV4       1
> >> @@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >>   not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}.
> >>
> >>   If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
> >> -been negotiated, the device SHOULD set \field{hdr_len} to a value
> >> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
> >> +is not set, the device SHOULD set \field{hdr_len} to a value
> >>   not less than the length of the headers, including the transport
> >> -header.
> >> +header \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}.
> >>
> >>   If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
> >>   device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in
> >> @@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> >>   driver MUST NOT use the \field{csum_start} and \field{csum_offset}.
> >>
> >>   If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
> >> -been negotiated, the driver MAY use \field{hdr_len} only as a hint about the
> >> +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}
> >> +is not set, the driver MAY use \field{hdr_len} only as a hint about the
> >>   transport header size.
> >> -The driver MUST NOT rely on \field{hdr_len} to be correct.
> >> +
> >> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the driver
> >> +MUST NOT rely on \field{hdr_len} to be correct.
> > I think we should keep the above description as-is. For whatever case,
> > the driver must not trust the metadata set by the device and must
> > perform necessary sanity tests on them.
>
>
> My idea is to keep the current description as it is,
> but to emphasize in the next version:
> "If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set,
> the driver MAY treat the \field{hdr_len} as the length of the
> protocol header inside the first descriptor."
>
>
> >
> >> +
> >>   \begin{note}
> >>   This is due to various bugs in implementations.
> >>   \end{note}
> >> @@ -4483,6 +4493,99 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> >>   according to the native endian of the guest rather than
> >>   (necessarily when not using the legacy interface) little-endian.
> >>
> >> +\paragraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> >> +
> >> +If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated,
> >> +the device supports splitting the protocol header and the payload.
> >> +The protocol header and the payload will be separated into different
> >> +descriptors.
> >> +
> >> +\subparagraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}
> >> +
> >> +To configure the split header, the following layout structure and definitions
> >> +are used:
> >> +
> >> +\begin{lstlisting}
> >> +struct virtio_net_split_header_config {
> >> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4     (1 << 0)
> >> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6     (1 << 1)
> >> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4     (1 << 2)
> >> +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6     (1 << 3)
> >> +    le64 type;
> >> +};
> >> +
> >> +#define VIRTIO_NET_CTRL_SPLIT_HEADER       6
> >> + #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET   0
> >> +\end{lstlisting}
> >> +
> >> +The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command:
> >> +VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header configuration.
> >> +
> >> +The driver can enable or disable split header for different protocols by
> >> +setting or clearing corresponding bits in \field{type}.
> >> +
> >> +\begin{itemize}
> >> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4: split after ipv4 tcp header
> >> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6: split after ipv6 tcp header
> >> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4: split after ipv4 udp header
> >> +    \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6: split after ipv6 udp header
> >> +\end{itemize}
> >> +
> >> +\devicenormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> >> +
> >> +A device MUST initialize \field{type} to 0, and MUST set it to 0
> >> +upon device reset.
> >> +
> >> +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, the device MUST support
> >> +VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6,
> >> +VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6.
> >> +
> >> +A device MUST NOT split the header if it encounters any of the following cases:
> >> +\begin{itemize}
> >> +    \item the device does not recognize the protocol of the packet.
> >> +    \item the packet is an IP fragmentation.
> >> +    \item \field{type} does not include the protocol of the packet.
> >> +    \item the buffer consists of only one descriptor.
> >> +    \item the total size of the virtio net header and the protocol header exceeds
> >> +        the size of the first descriptor.
> >> +    \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the
> >> +        payload exceeds the size of the descriptor chain starting from the 2nd
> >> +        descriptor.
> > When MRG_RXBUF is negotiated, what if the rest of the buffers don't
> > fit for the payload?
>
> When VIRTIO_NET_F_SPLIT_HEADER is not negotiated, the device will also face
> this situation, and we keep the same as the previous processing logic.
>
> >> +\end{itemize}
> >> +
> >> +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_HEADER bit
> >> +in \field{flags} MUST be set. The protocol header MUST be on the first
> >> +descriptor, following the virtio net header. The payload MUST start from the
> >> +second descriptor. The device MUST set \field{hdr_len} of structure
> >> +virtio_net_hdr to the length of the protocol header.
> >> +
> >> +If all of the following applies:
> >> +\begin{itemize}
> >> +    \item the header is split by the device.
> >> +    \item VIRTIO_NET_F_MRG_RXBUF has been negotiated.
> >> +    \item the received packet is spread over multiple buffers.
> >> +\end {itemize}
> >> +then if the device uses the buffers after the 1st buffer, and the buffer
> >> +consists of multiple descriptors, the device MUST skip the first descriptor,
> >> +because the first descriptor is used to carry the protocol header.
> >> +The used length still reports the number of bytes it has written to memory.
> > Does this mean, in order to get best performance, driver needs to
> > prepare buffer like
> >
> > [small descriptor for header] + [large descriptor for payload] ?
>
> Yes, that's it.
>
> >> +
> >> +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, but the device does not split the header
> >> +and the buffer consists of at least two descriptors, the device MUST put the virtio net header
> >> +in the first descriptor, and MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}.
> >> +The device MUST use the second descriptor to store the packet, since the first descriptor
> >> +is designed to accommodate the header and is tiny for the packet.
> > Which header did you mean here, is it the protocol header or the vnet header?
> >
> > Btw I don't get why we need this, maybe you can give an example for this?
>
> "the header" means the virtio net header and the protocol header.
>
>
> Why does virtio net header MUST be placed in the first descriptor by the device?
>
> When VIRTIO_NET_F_SPLIT_HEADER is negotiated and the packet received by
> the driver consists of at least 2 descriptors, the driver MUST use the first
> descriptor in order to determine whether the packet is header-splitted by the device.
> Otherwise, if virtio net header is placed on the second descriptor, but the packet has been
> header-splitted by the device, such processing is unreasonable.
>
>
> Why does the device MUST skip the first descriptor and use the second
> descriptor to store the packet?
>
> There are three points in total.
>
> 1. the semantic of the first descriptor:
>
> When VIRTIO_NET_F_SPLIT_HEADER is negotiated, the first descriptor
> is designed to store the virtio net header and the protocol header. If the first
> descriptor stores the data of the packet at this time, then the semantic of the first
> descriptor has changed.

If split header is not implemented, then we don't need to pay too much attention
to the semantics of the first descriptor, but it may be better to treat it as a
normal receive logic.


>
> 2. the performance of the driver while receiving the packets.
>
> First, it is worth noting that the size of the first descriptor is small.
>
> Second, imagine a situation where if a packet is exactly one byte larger than the size
> of the first descriptor, and the device stores the packet starting from
> the first descriptor,the packet will be placed on two descriptors, but these two descriptors
> not continuous in physical addresses, which will affect the performance of the driver when
> receiving packets.

In fact, since the virtio net hdr must be placed on the first descriptor, we
must access two descriptors. If we place packets consecutively, when the packets
are small, we have the opportunity to use only one descriptor.

>
> 3. compatibility of driver's receiving packet logic:
>
> If the device stores the packet in the second descriptor, the
> driver's existing packet receiving logic is minimally affected.

When we think about specs, we shouldn't be too distracted by the implementation.

But when we did think about this, I noticed that many times we reserve some
headroom in front of the buffer, then your headroom should be in front of the
first descriptor, if we put the packet directly into the second descriptor ,
then we will face the problem that the driver has no headroom when processing
packets.

Whether we do split header or not, we expect the packet to start with
the first descriptor.

Thanks.


>
> >> +
> >> +\drivernormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header}
> >> +
> >> +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set, the driver
> >> +MUST treat the contents of \field{hdr_len} as the length of the protocol header
> >> +inside the first descriptor.
> > Not sure MUST is appropriate here, driver needs to validate the hdr_len anyhow.
>
> I will use "SHOULD" instead of "MUST".
>
> >> +
> >> +If the split header is enabled, the buffers submitted to receiveq by the
> >> +driver SHOULD be composed of at least two descriptors.
> > It looks to me the "SHOULD" here is somehow conflict with the previous
> > description about when the device can't perform a split.
> >
> I will use "MUST" instead of "SHOULD".
>
> >> +When the buffer consists of two descriptors, the length of the first
> >> +descriptor MUST be greater than the one of the virtio net header.
> > So I think we'd better have guidance on how to achieve the best
> > performance and move the above there instead of using the normatives.
> >
> Will fix.
>
> I will fix these in the next version.
>
> Thanks.
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-08-11  9:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-01  6:59 [virtio-dev] [PATCH] virtio_net: support split header Heng Qi
2022-08-01  8:28 ` [virtio-dev] " Heng Qi
2022-08-04  6:27 ` Jason Wang
2022-08-04 13:01   ` Heng Qi
2022-08-04 13:50     ` Cornelia Huck
2022-08-05  5:59       ` Heng Qi
2022-08-09 15:33         ` Cornelia Huck
2022-08-11  9:01     ` Xuan Zhuo
     [not found]   ` <4ea77b4a-2571-2b12-433c-2f1a8ceaeff8@linux.alibaba.com>
     [not found]     ` <CACGkMEvTr44xhziEwMhCpAVp4dWDPvGHfkwGm+o4WRqsKBX=gw@mail.gmail.com>
2022-08-10  9:10       ` Heng Qi
  -- strict thread matches above, loose matches on Subject: below --
2022-02-16  3:01 [virtio-dev] " Xuan Zhuo
2022-02-16  4:32 ` [virtio-dev] " Jason Wang
2022-02-16  6:05   ` Xuan Zhuo
2022-02-17  7:01     ` Jason Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.