From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Sender: List-Post: List-Help: List-Unsubscribe: List-Subscribe: Received: from lists.oasis-open.org (oasis-open.org [10.110.1.242]) by lists.oasis-open.org (Postfix) with ESMTP id 6FB4698658D for ; Thu, 4 Aug 2022 06:28:17 +0000 (UTC) MIME-Version: 1.0 References: <1659337182-128890-1-git-send-email-hengqi@linux.alibaba.com> In-Reply-To: <1659337182-128890-1-git-send-email-hengqi@linux.alibaba.com> From: Jason Wang Date: Thu, 4 Aug 2022 14:27:57 +0800 Message-ID: Subject: [virtio-dev] Re: [PATCH] virtio_net: support split header Content-Type: text/plain; charset="UTF-8" To: Heng Qi Cc: Virtio-Dev , mst , Xuan Zhuo , Kangjie Xu List-ID: On Mon, Aug 1, 2022 at 2:59 PM Heng Qi wrote: > > From: Xuan Zhuo > > The purpose of this feature is to split the header and the payload of > the packet. > > | receive buffer | > | 0th descriptor | 1th descriptor | > | virtnet hdr | mac | ip hdr | tcp hdr|<-- hold -->| payload | > > We can use a buffer plus a separate page when allocating the receive > buffer. In this way, we can ensure that all payloads can be > independently in a page, which is very beneficial for the zerocopy > implemented by the upper layer. > > Signed-off-by: Xuan Zhuo > Signed-off-by: Heng Qi > Reviewed-by: Kangjie Xu > --- > > v6: > 1. Fix some syntax issues. @Cornelia Huck > 2. Clarify some paragraphs. @Cornelia Huck > 3. Determine the device what to do if it does not perform header split on a packet. > > v5: > 1. Determine when hdr_len is credible in the process of rx > 2. Clean up the use of buffers and descriptors > 3. Clarify the meaning of used lenght if the first descriptor is skipped in the case of merge > > v4: > 1. fix typo @Cornelia Huck @Jason Wang > 2. do not split header for IP fragmentation packet. @Jason Wang > > v3: > 1. Fix some syntax issues > 2. Fix some terminology issues > 3. It is not unified with ip alignment, so ip alignment is not included > 4. Make it clear that the device must support four types, in the case of > successful negotiation. > > conformance.tex | 2 + > content.tex | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-- > 2 files changed, 109 insertions(+), 4 deletions(-) > > diff --git a/conformance.tex b/conformance.tex > index 2b86fc6..bd0f463 100644 > --- a/conformance.tex > +++ b/conformance.tex > @@ -150,6 +150,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} > \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Offloads State Configuration / Setting Offloads State} > \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) } > \item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} > +\item \ref{drivernormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} > \end{itemize} > > \conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance} > @@ -415,6 +416,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets} > \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Automatic receive steering in multiqueue mode} > \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Receive-side scaling (RSS) / RSS processing} > \item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} > +\item \ref{devicenormative:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} > \end{itemize} > > \conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance} > diff --git a/content.tex b/content.tex > index e863709..74c36fe 100644 > --- a/content.tex > +++ b/content.tex > @@ -3084,6 +3084,9 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits > \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control > channel. > > +\item[VIRTIO_NET_F_SPLIT_HEADER (52)] Device supports splitting the protocol > + header and the payload. > + > \item[VIRTIO_NET_F_NOTF_COAL(53)] Device supports notifications coalescing. > > \item[VIRTIO_NET_F_GUEST_USO4 (54)] Driver can receive USOv4 packets. > @@ -3140,6 +3143,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device > \item[VIRTIO_NET_F_NOTF_COAL] Requires VIRTIO_NET_F_CTRL_VQ. > \item[VIRTIO_NET_F_RSC_EXT] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. > \item[VIRTIO_NET_F_RSS] Requires VIRTIO_NET_F_CTRL_VQ. > +\item[VIRTIO_NET_F_SPLIT_HEADER] Requires VIRTIO_NET_F_CTRL_VQ. > \end{description} > > \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} > @@ -3371,6 +3375,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O > #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 > #define VIRTIO_NET_HDR_F_DATA_VALID 2 > #define VIRTIO_NET_HDR_F_RSC_INFO 4 > +#define VIRTIO_NET_HDR_F_SPLIT_HEADER 8 > u8 flags; > #define VIRTIO_NET_HDR_GSO_NONE 0 > #define VIRTIO_NET_HDR_GSO_TCPV4 1 > @@ -3799,9 +3804,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > not set VIRTIO_NET_HDR_F_RSC_INFO bit in \field{flags}. > > If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have > -been negotiated, the device SHOULD set \field{hdr_len} to a value > +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} > +is not set, the device SHOULD set \field{hdr_len} to a value > not less than the length of the headers, including the transport > -header. > +header \ref{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header}. > > If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the > device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in > @@ -3820,9 +3826,13 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > driver MUST NOT use the \field{csum_start} and \field{csum_offset}. > > If one of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have > -been negotiated, the driver MAY use \field{hdr_len} only as a hint about the > +been negotiated and the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} > +is not set, the driver MAY use \field{hdr_len} only as a hint about the > transport header size. > -The driver MUST NOT rely on \field{hdr_len} to be correct. > + > +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is not set, the driver > +MUST NOT rely on \field{hdr_len} to be correct. I think we should keep the above description as-is. For whatever case, the driver must not trust the metadata set by the device and must perform necessary sanity tests on them. > + > \begin{note} > This is due to various bugs in implementations. > \end{note} > @@ -4483,6 +4493,99 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi > according to the native endian of the guest rather than > (necessarily when not using the legacy interface) little-endian. > > +\paragraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} > + > +If the VIRTIO_NET_F_SPLIT_HEADER feature is negotiated, > +the device supports splitting the protocol header and the payload. > +The protocol header and the payload will be separated into different > +descriptors. > + > +\subparagraph{Split Header}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Split Header / Setting Split Header} > + > +To configure the split header, the following layout structure and definitions > +are used: > + > +\begin{lstlisting} > +struct virtio_net_split_header_config { > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4 (1 << 0) > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6 (1 << 1) > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4 (1 << 2) > +#define VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6 (1 << 3) > + le64 type; > +}; > + > +#define VIRTIO_NET_CTRL_SPLIT_HEADER 6 > + #define VIRTIO_NET_CTRL_SPLIT_HEADER_SET 0 > +\end{lstlisting} > + > +The class VIRTIO_NET_CTRL_SPLIT_HEADER has one command: > +VIRTIO_NET_CTRL_SPLIT_HEADER_SET applies the new split header configuration. > + > +The driver can enable or disable split header for different protocols by > +setting or clearing corresponding bits in \field{type}. > + > +\begin{itemize} > + \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4: split after ipv4 tcp header > + \item VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6: split after ipv6 tcp header > + \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4: split after ipv4 udp header > + \item VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6: split after ipv6 udp header > +\end{itemize} > + > +\devicenormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} > + > +A device MUST initialize \field{type} to 0, and MUST set it to 0 > +upon device reset. > + > +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, the device MUST support > +VIRTIO_NET_SPLIT_HEADER_TYPE_TCP4, VIRTIO_NET_SPLIT_HEADER_TYPE_TCP6, > +VIRTIO_NET_SPLIT_HEADER_TYPE_UDP4, VIRTIO_NET_SPLIT_HEADER_TYPE_UDP6. > + > +A device MUST NOT split the header if it encounters any of the following cases: > +\begin{itemize} > + \item the device does not recognize the protocol of the packet. > + \item the packet is an IP fragmentation. > + \item \field{type} does not include the protocol of the packet. > + \item the buffer consists of only one descriptor. > + \item the total size of the virtio net header and the protocol header exceeds > + the size of the first descriptor. > + \item when VIRTIO_NET_F_MRG_RXBUF is not negotiated and the size of the > + payload exceeds the size of the descriptor chain starting from the 2nd > + descriptor. When MRG_RXBUF is negotiated, what if the rest of the buffers don't fit for the payload? > +\end{itemize} > + > +If the header is split by the device, the VIRTIO_NET_HDR_F_SPLIT_HEADER bit > +in \field{flags} MUST be set. The protocol header MUST be on the first > +descriptor, following the virtio net header. The payload MUST start from the > +second descriptor. The device MUST set \field{hdr_len} of structure > +virtio_net_hdr to the length of the protocol header. > + > +If all of the following applies: > +\begin{itemize} > + \item the header is split by the device. > + \item VIRTIO_NET_F_MRG_RXBUF has been negotiated. > + \item the received packet is spread over multiple buffers. > +\end {itemize} > +then if the device uses the buffers after the 1st buffer, and the buffer > +consists of multiple descriptors, the device MUST skip the first descriptor, > +because the first descriptor is used to carry the protocol header. > +The used length still reports the number of bytes it has written to memory. Does this mean, in order to get best performance, driver needs to prepare buffer like [small descriptor for header] + [large descriptor for payload] ? > + > +If VIRTIO_NET_F_SPLIT_HEADER is negotiated, but the device does not split the header > +and the buffer consists of at least two descriptors, the device MUST put the virtio net header > +in the first descriptor, and MUST NOT set the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags}. > +The device MUST use the second descriptor to store the packet, since the first descriptor > +is designed to accommodate the header and is tiny for the packet. Which header did you mean here, is it the protocol header or the vnet header? Btw I don't get why we need this, maybe you can give an example for this? > + > +\drivernormative{\subparagraph}{Setting Split Header}{Device Types / Network Device / Device Operation / Control Virtqueue / Split Header} > + > +If the VIRTIO_NET_HDR_F_SPLIT_HEADER bit in \field{flags} is set, the driver > +MUST treat the contents of \field{hdr_len} as the length of the protocol header > +inside the first descriptor. Not sure MUST is appropriate here, driver needs to validate the hdr_len anyhow. > + > +If the split header is enabled, the buffers submitted to receiveq by the > +driver SHOULD be composed of at least two descriptors. It looks to me the "SHOULD" here is somehow conflict with the previous description about when the device can't perform a split. > +When the buffer consists of two descriptors, the length of the first > +descriptor MUST be greater than the one of the virtio net header. So I think we'd better have guidance on how to achieve the best performance and move the above there instead of using the normatives. Thanks > > \paragraph{Notifications Coalescing}\label{sec:Device Types / Network Device / Device Operation / Control Virtqueue / Notifications Coalescing} > > -- > 1.8.3.1 > --------------------------------------------------------------------- To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org