Netdev Archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2 0/3] net: A lightweight zero-copy notification
@ 2024-04-19 21:48 zijianzhang
  2024-04-19 21:48 ` [PATCH net-next v2 1/3] selftests: fix OOM problem in msg_zerocopy selftest zijianzhang
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: zijianzhang @ 2024-04-19 21:48 UTC (permalink / raw
  To: netdev
  Cc: edumazet, willemdebruijn.kernel, davem, kuba, cong.wang,
	xiaochun.lu, Zijian Zhang

From: Zijian Zhang <zijianzhang@bytedance.com>

Original title is "net: socket sendmsg MSG_ZEROCOPY_UARG"
https://lore.kernel.org/all/
20240409205300.1346681-2-zijianzhang@bytedance.com/

Original notification mechanism needs poll + recvmmsg which is not
easy for applcations to accommodate. And, it also incurs unignorable
overhead including extra system calls and usage of optmem.

While making maximum reuse of the existing MSG_ZEROCOPY related code,
this patch set introduces a new zerocopy socket notification mechanism.
Users of sendmsg pass a control message as a placeholder for the incoming
notifications. Upon returning, kernel embeds notifications directly into
user arguments passed in. By doing so, we can significantly reduce the
complexity and overhead for managing notifications. In an ideal pattern,
the user will keep calling sendmsg with SO_ZC_NOTIFICATION msg_control,
and the notification will be delivered as soon as possible.

Changelog:
  v1 -> v2:
    - Reuse msg_errqueue in the new notification mechanism, suggested
      by Willem de Bruijn, users can actually use these two mechanisms
      in hybrid way if they want to do so.
    - Update case SO_ZC_NOTIFICATION in __sock_cmsg_send
      1. Regardless of 32-bit, 64-bit program, we will always handle
      u64 type user address.
      2. The size of data to copy_to_user is precisely calculated
      in case of kernel stack leak.
    - fix (kbuild-bot)
      1. Add SO_ZC_NOTIFICATION to arch-specific header files.
      2. header file types.h in include/uapi/linux/socket.h

* Performance

We extend the selftests/msg_zerocopy.c to accommodate the new mechanism,
test result is as follows,

cfg_notification_limit = 1, in this case the original method approximately
aligns with the semantics of new one. In this case, the new flag has
around 13% cpu savings in TCP and 18% cpu savings in UDP.

+---------------------+---------+---------+---------+---------+
| Test Type / Protocol| TCP v4  | TCP v6  | UDP v4  | UDP v6  |
+---------------------+---------+---------+---------+---------+
| ZCopy (MB)          | 5147    | 4885    | 7489    | 7854    |
+---------------------+---------+---------+---------+---------+
| New ZCopy (MB)      | 5859    | 5505    | 9053    | 9236    |
+---------------------+---------+---------+---------+---------+
| New ZCopy / ZCopy   | 113.83% | 112.69% | 120.88% | 117.59% |
+---------------------+---------+---------+---------+---------+


cfg_notification_limit = 32, the new mechanism performs 8% better in TCP.
For UDP, no obvious performance gain is observed and sometimes may lead
to degradation. Thus, if users don't need to retrieve the notification
ASAP in UDP, the original mechanism is preferred.

+---------------------+---------+---------+---------+---------+
| Test Type / Protocol| TCP v4  | TCP v6  | UDP v4  | UDP v6  |
+---------------------+---------+---------+---------+---------+
| ZCopy (MB)          | 6272    | 6138    | 12138   | 10055   |
+---------------------+---------+---------+---------+---------+
| New ZCopy (MB)      | 6774    | 6620    | 11504   | 10355   |
+---------------------+---------+---------+---------+---------+
| New ZCopy / ZCopy   | 108.00% | 107.85% | 94.78%  | 102.98% |
+---------------------+---------+---------+---------+---------+

Zijian Zhang (3):
  selftests: fix OOM problem in msg_zerocopy selftest
  sock: add MSG_ZEROCOPY notification mechanism based on msg_control
  selftests: add MSG_ZEROCOPY msg_control notification test

 arch/alpha/include/uapi/asm/socket.h        |   2 +
 arch/mips/include/uapi/asm/socket.h         |   2 +
 arch/parisc/include/uapi/asm/socket.h       |   2 +
 arch/sparc/include/uapi/asm/socket.h        |   2 +
 include/uapi/asm-generic/socket.h           |   2 +
 include/uapi/linux/socket.h                 |  16 +++
 net/core/sock.c                             |  70 +++++++++++++
 tools/testing/selftests/net/msg_zerocopy.c  | 105 ++++++++++++++++++--
 tools/testing/selftests/net/msg_zerocopy.sh |   1 +
 9 files changed, 195 insertions(+), 7 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net-next v2 1/3] selftests: fix OOM problem in msg_zerocopy selftest
  2024-04-19 21:48 [PATCH net-next v2 0/3] net: A lightweight zero-copy notification zijianzhang
@ 2024-04-19 21:48 ` zijianzhang
  2024-04-21 15:16   ` Willem de Bruijn
  2024-04-19 21:48 ` [PATCH net-next v2 2/3] sock: add MSG_ZEROCOPY notification mechanism based on msg_control zijianzhang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: zijianzhang @ 2024-04-19 21:48 UTC (permalink / raw
  To: netdev
  Cc: edumazet, willemdebruijn.kernel, davem, kuba, cong.wang,
	xiaochun.lu, Zijian Zhang

From: Zijian Zhang <zijianzhang@bytedance.com>

In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
until the socket is not writable. Typically, it will start the receiving
process after around 30+ sendmsgs. However, because of the
commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
the sender is always writable and does not get any chance to run recv
notifications. The selftest always exits with OUT_OF_MEMORY because the
memory used by opt_skb exceeds the core.sysctl_optmem_max.

According to our experiments, this problem can be solved by open the
DEBUG_LOCKDEP configuration for the kernel. But it will makes the
notificatoins disordered even in good commits before
commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale").

We introduce "cfg_notification_limit" to force sender to receive
notifications after some number of sendmsgs. And, notifications may not
come in order, because of the reason we present above. We have order
checking code managed by cfg_verbose.

Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
---
 tools/testing/selftests/net/msg_zerocopy.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c
index bdc03a2097e8..6c18b07cab30 100644
--- a/tools/testing/selftests/net/msg_zerocopy.c
+++ b/tools/testing/selftests/net/msg_zerocopy.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
 /* Evaluate MSG_ZEROCOPY
  *
  * Send traffic between two processes over one of the supported
@@ -85,6 +86,7 @@ static bool cfg_rx;
 static int  cfg_runtime_ms	= 4200;
 static int  cfg_verbose;
 static int  cfg_waittime_ms	= 500;
+static int  cfg_notification_limit = 32;
 static bool cfg_zerocopy;
 
 static socklen_t cfg_alen;
@@ -95,6 +97,8 @@ static char payload[IP_MAXPACKET];
 static long packets, bytes, completions, expected_completions;
 static int  zerocopied = -1;
 static uint32_t next_completion;
+/* The number of sendmsgs which have not received notified yet */
+static uint32_t sendmsg_counter;
 
 static unsigned long gettimeofday_ms(void)
 {
@@ -208,6 +212,7 @@ static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy, int domain)
 		error(1, errno, "send");
 	if (cfg_verbose && ret != len)
 		fprintf(stderr, "send: ret=%u != %u\n", ret, len);
+	sendmsg_counter++;
 
 	if (len) {
 		packets++;
@@ -435,7 +440,7 @@ static bool do_recv_completion(int fd, int domain)
 	/* Detect notification gaps. These should not happen often, if at all.
 	 * Gaps can occur due to drops, reordering and retransmissions.
 	 */
-	if (lo != next_completion)
+	if (cfg_verbose && lo != next_completion)
 		fprintf(stderr, "gap: %u..%u does not append to %u\n",
 			lo, hi, next_completion);
 	next_completion = hi + 1;
@@ -460,6 +465,7 @@ static bool do_recv_completion(int fd, int domain)
 static void do_recv_completions(int fd, int domain)
 {
 	while (do_recv_completion(fd, domain)) {}
+	sendmsg_counter = 0;
 }
 
 /* Wait for all remaining completions on the errqueue */
@@ -549,6 +555,9 @@ static void do_tx(int domain, int type, int protocol)
 		else
 			do_sendmsg(fd, &msg, cfg_zerocopy, domain);
 
+		if (cfg_zerocopy && sendmsg_counter >= cfg_notification_limit)
+			do_recv_completions(fd, domain);
+
 		while (!do_poll(fd, POLLOUT)) {
 			if (cfg_zerocopy)
 				do_recv_completions(fd, domain);
@@ -708,7 +717,7 @@ static void parse_opts(int argc, char **argv)
 
 	cfg_payload_len = max_payload_len;
 
-	while ((c = getopt(argc, argv, "46c:C:D:i:mp:rs:S:t:vz")) != -1) {
+	while ((c = getopt(argc, argv, "46c:C:D:i:mp:rs:S:t:vzl:n")) != -1) {
 		switch (c) {
 		case '4':
 			if (cfg_family != PF_UNSPEC)
@@ -760,6 +769,9 @@ static void parse_opts(int argc, char **argv)
 		case 'z':
 			cfg_zerocopy = true;
 			break;
+		case 'l':
+			cfg_notification_limit = strtoul(optarg, NULL, 0);
+			break;
 		}
 	}
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v2 2/3] sock: add MSG_ZEROCOPY notification mechanism based on msg_control
  2024-04-19 21:48 [PATCH net-next v2 0/3] net: A lightweight zero-copy notification zijianzhang
  2024-04-19 21:48 ` [PATCH net-next v2 1/3] selftests: fix OOM problem in msg_zerocopy selftest zijianzhang
@ 2024-04-19 21:48 ` zijianzhang
  2024-04-21 15:45   ` Willem de Bruijn
  2024-04-23  3:16   ` kernel test robot
  2024-04-19 21:48 ` [PATCH net-next v2 3/3] selftests: add MSG_ZEROCOPY msg_control notification test zijianzhang
  2024-04-20  3:47 ` [PATCH net-next v2 0/3] net: A lightweight zero-copy notification Jakub Kicinski
  3 siblings, 2 replies; 11+ messages in thread
From: zijianzhang @ 2024-04-19 21:48 UTC (permalink / raw
  To: netdev
  Cc: edumazet, willemdebruijn.kernel, davem, kuba, cong.wang,
	xiaochun.lu, Zijian Zhang

From: Zijian Zhang <zijianzhang@bytedance.com>

The MSG_ZEROCOPY flag enables copy avoidance for socket send calls.
However, zerocopy is not a free lunch. Apart from the management of user
pages, the combination of poll + recvmsg to receive notifications incurs
unignorable overhead in the applications. The overhead of such sometimes
might be more than the CPU savings from zerocopy. We try to solve this
problem with a new notification mechanism based on msgcontrol.
This new mechanism aims to reduce the overhead associated with receiving
notifications by embedding them directly into user arguments passed with
each sendmsg control message. By doing so, we can significantly reduce
the complexity and overhead for managing notifications. In an ideal
pattern, the user will keep calling sendmsg with SO_ZC_NOTIFICATION
msg_control, and the notification will be delivered as soon as possible.

Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
---
 arch/alpha/include/uapi/asm/socket.h  |  2 +
 arch/mips/include/uapi/asm/socket.h   |  2 +
 arch/parisc/include/uapi/asm/socket.h |  2 +
 arch/sparc/include/uapi/asm/socket.h  |  2 +
 include/uapi/asm-generic/socket.h     |  2 +
 include/uapi/linux/socket.h           | 16 ++++++
 net/core/sock.c                       | 70 +++++++++++++++++++++++++++
 7 files changed, 96 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index e94f621903fe..b24622a9cd47 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -140,6 +140,8 @@
 #define SO_PASSPIDFD		76
 #define SO_PEERPIDFD		77
 
+#define SO_ZC_NOTIFICATION 78
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 60ebaed28a4c..638a4ebbffa7 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -151,6 +151,8 @@
 #define SO_PASSPIDFD		76
 #define SO_PEERPIDFD		77
 
+#define SO_ZC_NOTIFICATION 78
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index be264c2b1a11..393f1a6e9562 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -132,6 +132,8 @@
 #define SO_PASSPIDFD		0x404A
 #define SO_PEERPIDFD		0x404B
 
+#define SO_ZC_NOTIFICATION 0x404C
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 682da3714686..4fe127b0682b 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -133,6 +133,8 @@
 #define SO_PASSPIDFD             0x0055
 #define SO_PEERPIDFD             0x0056
 
+#define SO_ZC_NOTIFICATION 0x0057
+
 #if !defined(__KERNEL__)
 
 
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 8ce8a39a1e5f..acbbbe7ac06a 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -135,6 +135,8 @@
 #define SO_PASSPIDFD		76
 #define SO_PEERPIDFD		77
 
+#define SO_ZC_NOTIFICATION 78
+
 #if !defined(__KERNEL__)
 
 #if __BITS_PER_LONG == 64 || (defined(__x86_64__) && defined(__ILP32__))
diff --git a/include/uapi/linux/socket.h b/include/uapi/linux/socket.h
index d3fcd3b5ec53..60e4db759d49 100644
--- a/include/uapi/linux/socket.h
+++ b/include/uapi/linux/socket.h
@@ -2,6 +2,8 @@
 #ifndef _UAPI_LINUX_SOCKET_H
 #define _UAPI_LINUX_SOCKET_H
 
+#include <linux/types.h>
+
 /*
  * Desired design of maximum size and alignment (see RFC2553)
  */
@@ -35,4 +37,18 @@ struct __kernel_sockaddr_storage {
 #define SOCK_TXREHASH_DISABLED	0
 #define SOCK_TXREHASH_ENABLED	1
 
+#define SOCK_ZC_INFO_MAX 256
+
+struct zc_info_elem {
+	__u32 lo;
+	__u32 hi;
+	__u8 zerocopy;
+};
+
+struct zc_info_usr {
+	__u64 usr_addr;
+	unsigned int length;
+	struct zc_info_elem info[];
+};
+
 #endif /* _UAPI_LINUX_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index fe9195186c13..13f06480f2d8 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2809,6 +2809,13 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
 		     struct sockcm_cookie *sockc)
 {
 	u32 tsflags;
+	int ret, zc_info_size, i = 0;
+	unsigned long flags;
+	struct sk_buff_head *q, local_q;
+	struct sk_buff *skb, *tmp;
+	struct sock_exterr_skb *serr;
+	struct zc_info_usr *zc_info_usr_p, *zc_info_kern_p;
+	void __user	*usr_addr;
 
 	switch (cmsg->cmsg_type) {
 	case SO_MARK:
@@ -2842,6 +2849,69 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
 	case SCM_RIGHTS:
 	case SCM_CREDENTIALS:
 		break;
+	case SO_ZC_NOTIFICATION:
+		if (!sock_flag(sk, SOCK_ZEROCOPY) || sk->sk_family == PF_RDS)
+			return -EINVAL;
+
+		zc_info_usr_p = (struct zc_info_usr *)CMSG_DATA(cmsg);
+		if (zc_info_usr_p->length <= 0 || zc_info_usr_p->length > SOCK_ZC_INFO_MAX)
+			return -EINVAL;
+
+		zc_info_size = struct_size(zc_info_usr_p, info, zc_info_usr_p->length);
+		if (cmsg->cmsg_len != CMSG_LEN(zc_info_size))
+			return -EINVAL;
+
+		usr_addr = (void *)(uintptr_t)(zc_info_usr_p->usr_addr);
+		if (!access_ok(usr_addr, zc_info_size))
+			return -EFAULT;
+
+		zc_info_kern_p = kmalloc(zc_info_size, GFP_KERNEL);
+		if (!zc_info_kern_p)
+			return -ENOMEM;
+
+		q = &sk->sk_error_queue;
+		skb_queue_head_init(&local_q);
+		spin_lock_irqsave(&q->lock, flags);
+		skb = skb_peek(q);
+		while (skb && i < zc_info_usr_p->length) {
+			struct sk_buff *skb_next = skb_peek_next(skb, q);
+
+			serr = SKB_EXT_ERR(skb);
+			if (serr->ee.ee_errno == 0 &&
+			    serr->ee.ee_origin == SO_EE_ORIGIN_ZEROCOPY) {
+				zc_info_kern_p->info[i].hi = serr->ee.ee_data;
+				zc_info_kern_p->info[i].lo = serr->ee.ee_info;
+				zc_info_kern_p->info[i].zerocopy = !(serr->ee.ee_code
+								& SO_EE_CODE_ZEROCOPY_COPIED);
+				__skb_unlink(skb, q);
+				__skb_queue_tail(&local_q, skb);
+				i++;
+			}
+			skb = skb_next;
+		}
+		spin_unlock_irqrestore(&q->lock, flags);
+
+		zc_info_kern_p->usr_addr = zc_info_usr_p->usr_addr;
+		zc_info_kern_p->length = i;
+
+		ret = copy_to_user(usr_addr,
+				   zc_info_kern_p,
+					struct_size(zc_info_kern_p, info, i));
+		kfree(zc_info_kern_p);
+
+		if (unlikely(ret)) {
+			spin_lock_irqsave(&q->lock, flags);
+			skb_queue_reverse_walk_safe(&local_q, skb, tmp) {
+				__skb_unlink(skb, &local_q);
+				__skb_queue_head(q, skb);
+			}
+			spin_unlock_irqrestore(&q->lock, flags);
+			return -EFAULT;
+		}
+
+		while ((skb = __skb_dequeue(&local_q)))
+			consume_skb(skb);
+		break;
 	default:
 		return -EINVAL;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net-next v2 3/3] selftests: add MSG_ZEROCOPY msg_control notification test
  2024-04-19 21:48 [PATCH net-next v2 0/3] net: A lightweight zero-copy notification zijianzhang
  2024-04-19 21:48 ` [PATCH net-next v2 1/3] selftests: fix OOM problem in msg_zerocopy selftest zijianzhang
  2024-04-19 21:48 ` [PATCH net-next v2 2/3] sock: add MSG_ZEROCOPY notification mechanism based on msg_control zijianzhang
@ 2024-04-19 21:48 ` zijianzhang
  2024-04-20  3:47 ` [PATCH net-next v2 0/3] net: A lightweight zero-copy notification Jakub Kicinski
  3 siblings, 0 replies; 11+ messages in thread
From: zijianzhang @ 2024-04-19 21:48 UTC (permalink / raw
  To: netdev
  Cc: edumazet, willemdebruijn.kernel, davem, kuba, cong.wang,
	xiaochun.lu, Zijian Zhang

From: Zijian Zhang <zijianzhang@bytedance.com>

We update selftests/net/msg_zerocopy.c to accommodate the new mechanism.

Test result from selftests/net/msg_zerocopy.c,
cfg_notification_limit = 1, in this case MSG_ZEROCOPY approximately
aligns with the semantics of MSG_ZEROCOPY_UARG. In this case, the new
flag has around 13% cpu savings in TCP and 18% cpu savings in UDP.
+---------------------+---------+---------+---------+---------+
| Test Type / Protocol| TCP v4  | TCP v6  | UDP v4  | UDP v6  |
+---------------------+---------+---------+---------+---------+
| ZCopy (MB)          | 5147    | 4885    | 7489    | 7854    |
+---------------------+---------+---------+---------+---------+
| New ZCopy (MB)      | 5859    | 5505    | 9053    | 9236    |
+---------------------+---------+---------+---------+---------+
| New ZCopy / ZCopy   | 113.83% | 112.69% | 120.88% | 117.59% |
+---------------------+---------+---------+---------+---------+

cfg_notification_limit = 32, it means less poll + recvmsg overhead,
the new mechanism performs 8% better in TCP. For UDP, no obvious
performance gain is observed and sometimes may lead to degradation.
Thus, if users don't need to retrieve the notification ASAP in UDP,
the original mechanism is preferred.
+---------------------+---------+---------+---------+---------+
| Test Type / Protocol| TCP v4  | TCP v6  | UDP v4  | UDP v6  |
+---------------------+---------+---------+---------+---------+
| ZCopy (MB)          | 6272    | 6138    | 12138   | 10055   |
+---------------------+---------+---------+---------+---------+
| New ZCopy (MB)      | 6774    | 6620    | 11504   | 10355   |
+---------------------+---------+---------+---------+---------+
| New ZCopy / ZCopy   | 108.00% | 107.85% | 94.78%  | 102.98% |
+---------------------+---------+---------+---------+---------+

Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
---
 tools/testing/selftests/net/msg_zerocopy.c  | 91 +++++++++++++++++++--
 tools/testing/selftests/net/msg_zerocopy.sh |  1 +
 2 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c
index 6c18b07cab30..aa60d5c3f0a7 100644
--- a/tools/testing/selftests/net/msg_zerocopy.c
+++ b/tools/testing/selftests/net/msg_zerocopy.c
@@ -67,6 +67,10 @@
 #define SO_ZEROCOPY	60
 #endif
 
+#ifndef SO_ZC_NOTIFICATION
+#define SO_ZC_NOTIFICATION	78
+#endif
+
 #ifndef SO_EE_CODE_ZEROCOPY_COPIED
 #define SO_EE_CODE_ZEROCOPY_COPIED	1
 #endif
@@ -75,6 +79,12 @@
 #define MSG_ZEROCOPY	0x4000000
 #endif
 
+#define ZC_MSGERR_NOTIFICATION 1
+#define ZC_MSGCTL_NOTIFICATION 2
+
+#define SOCK_ZC_INFO_NUM 8
+#define ZC_INFO_SIZE(x) (sizeof(struct zc_info_usr) + x * sizeof(struct zc_info_elem))
+
 static int  cfg_cork;
 static bool cfg_cork_mixed;
 static int  cfg_cpu		= -1;		/* default: pin to last cpu */
@@ -87,13 +97,14 @@ static int  cfg_runtime_ms	= 4200;
 static int  cfg_verbose;
 static int  cfg_waittime_ms	= 500;
 static int  cfg_notification_limit = 32;
-static bool cfg_zerocopy;
+static int  cfg_zerocopy;           /* Either 1 or 2, mode for SO_ZEROCOPY */
 
 static socklen_t cfg_alen;
 static struct sockaddr_storage cfg_dst_addr;
 static struct sockaddr_storage cfg_src_addr;
 
 static char payload[IP_MAXPACKET];
+static char zc_ckbuf[CMSG_SPACE(ZC_INFO_SIZE(SOCK_ZC_INFO_NUM))];
 static long packets, bytes, completions, expected_completions;
 static int  zerocopied = -1;
 static uint32_t next_completion;
@@ -171,6 +182,23 @@ static int do_accept(int fd)
 	return fd;
 }
 
+static void add_zcopy_info(struct msghdr *msg)
+{
+	struct cmsghdr *cm;
+	struct zc_info_usr *zc_info_usr_p;
+
+	if (!msg->msg_control)
+		error(1, errno, "NULL user arg");
+	cm = (void *)msg->msg_control;
+	cm->cmsg_len = CMSG_LEN(ZC_INFO_SIZE(SOCK_ZC_INFO_NUM));
+	cm->cmsg_level = SOL_SOCKET;
+	cm->cmsg_type = SO_ZC_NOTIFICATION;
+
+	zc_info_usr_p = (struct zc_info_usr *)CMSG_DATA(cm);
+	zc_info_usr_p->usr_addr = (__u64)zc_info_usr_p;
+	zc_info_usr_p->length = SOCK_ZC_INFO_NUM;
+}
+
 static void add_zcopy_cookie(struct msghdr *msg, uint32_t cookie)
 {
 	struct cmsghdr *cm;
@@ -184,7 +212,7 @@ static void add_zcopy_cookie(struct msghdr *msg, uint32_t cookie)
 	memcpy(CMSG_DATA(cm), &cookie, sizeof(cookie));
 }
 
-static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy, int domain)
+static bool do_sendmsg(int fd, struct msghdr *msg, int do_zerocopy, int domain)
 {
 	int ret, len, i, flags;
 	static uint32_t cookie;
@@ -202,6 +230,11 @@ static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy, int domain)
 			msg->msg_controllen = CMSG_SPACE(sizeof(cookie));
 			msg->msg_control = (struct cmsghdr *)ckbuf;
 			add_zcopy_cookie(msg, ++cookie);
+		} else if (do_zerocopy == ZC_MSGCTL_NOTIFICATION) {
+			memset(&msg->msg_control, 0, sizeof(msg->msg_control));
+			msg->msg_controllen = CMSG_SPACE(ZC_INFO_SIZE(SOCK_ZC_INFO_NUM));
+			msg->msg_control = (struct cmsghdr *)zc_ckbuf;
+			add_zcopy_info(msg);
 		}
 	}
 
@@ -220,7 +253,7 @@ static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy, int domain)
 		if (do_zerocopy && ret)
 			expected_completions++;
 	}
-	if (do_zerocopy && domain == PF_RDS) {
+	if (msg->msg_control) {
 		msg->msg_control = NULL;
 		msg->msg_controllen = 0;
 	}
@@ -394,6 +427,43 @@ static bool do_recvmsg_completion(int fd)
 	return ret;
 }
 
+static void do_recv_completions2(struct cmsghdr *cmsg)
+{
+	int i;
+	__u32 hi, lo, range;
+	__u8 zerocopy;
+	struct zc_info_usr *zc_info_usr_p = (struct zc_info_usr *)CMSG_DATA(cmsg);
+
+	for (i = 0; i < zc_info_usr_p->length; ++i) {
+		struct zc_info_elem elem = zc_info_usr_p->info[i];
+
+		hi = elem.hi;
+		lo = elem.lo;
+		zerocopy = elem.zerocopy;
+		range = hi - lo + 1;
+
+		if (cfg_verbose && lo != next_completion)
+			fprintf(stderr, "gap: %u..%u does not append to %u\n",
+				lo, hi, next_completion);
+		next_completion = hi + 1;
+
+		if (zerocopied == -1)
+			zerocopied = zerocopy;
+		else if (zerocopied != zerocopy) {
+			fprintf(stderr, "serr: inconsistent\n");
+			zerocopied = zerocopy;
+		}
+
+		completions += range;
+
+		if (cfg_verbose >= 2)
+			fprintf(stderr, "completed: %u (h=%u l=%u)\n",
+				range, hi, lo);
+	}
+
+	sendmsg_counter -= zc_info_usr_p->length;
+}
+
 static bool do_recv_completion(int fd, int domain)
 {
 	struct sock_extended_err *serr;
@@ -555,11 +625,15 @@ static void do_tx(int domain, int type, int protocol)
 		else
 			do_sendmsg(fd, &msg, cfg_zerocopy, domain);
 
-		if (cfg_zerocopy && sendmsg_counter >= cfg_notification_limit)
+		if (cfg_zerocopy == ZC_MSGERR_NOTIFICATION &&
+			sendmsg_counter >= cfg_notification_limit)
 			do_recv_completions(fd, domain);
 
+		if (cfg_zerocopy == ZC_MSGCTL_NOTIFICATION)
+			do_recv_completions2((struct cmsghdr *)zc_ckbuf);
+
 		while (!do_poll(fd, POLLOUT)) {
-			if (cfg_zerocopy)
+			if (cfg_zerocopy == ZC_MSGERR_NOTIFICATION)
 				do_recv_completions(fd, domain);
 		}
 
@@ -767,11 +841,14 @@ static void parse_opts(int argc, char **argv)
 			cfg_verbose++;
 			break;
 		case 'z':
-			cfg_zerocopy = true;
+			cfg_zerocopy = ZC_MSGERR_NOTIFICATION;
 			break;
 		case 'l':
 			cfg_notification_limit = strtoul(optarg, NULL, 0);
 			break;
+		case 'n':
+			cfg_zerocopy = ZC_MSGCTL_NOTIFICATION;
+			break;
 		}
 	}
 
@@ -781,6 +858,8 @@ static void parse_opts(int argc, char **argv)
 			error(1, 0, "-D <server addr> required for PF_RDS\n");
 		if (!cfg_rx && !saddr)
 			error(1, 0, "-S <client addr> required for PF_RDS\n");
+		if (cfg_zerocopy == ZC_MSGCTL_NOTIFICATION)
+			error(1, 0, "PF_RDS does not support ZC_MSGCTL_NOTIFICATION");
 	}
 	setup_sockaddr(cfg_family, daddr, &cfg_dst_addr);
 	setup_sockaddr(cfg_family, saddr, &cfg_src_addr);
diff --git a/tools/testing/selftests/net/msg_zerocopy.sh b/tools/testing/selftests/net/msg_zerocopy.sh
index 89c22f5320e0..022a6936d86f 100755
--- a/tools/testing/selftests/net/msg_zerocopy.sh
+++ b/tools/testing/selftests/net/msg_zerocopy.sh
@@ -118,4 +118,5 @@ do_test() {
 
 do_test "${EXTRA_ARGS}"
 do_test "-z ${EXTRA_ARGS}"
+do_test "-n ${EXTRA_ARGS}"
 echo ok
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 0/3] net: A lightweight zero-copy notification
  2024-04-19 21:48 [PATCH net-next v2 0/3] net: A lightweight zero-copy notification zijianzhang
                   ` (2 preceding siblings ...)
  2024-04-19 21:48 ` [PATCH net-next v2 3/3] selftests: add MSG_ZEROCOPY msg_control notification test zijianzhang
@ 2024-04-20  3:47 ` Jakub Kicinski
  2024-04-23 19:29   ` [External] " Zijian Zhang
  3 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2024-04-20  3:47 UTC (permalink / raw
  To: zijianzhang
  Cc: netdev, edumazet, willemdebruijn.kernel, davem, cong.wang,
	xiaochun.lu

On Fri, 19 Apr 2024 21:48:16 +0000 zijianzhang@bytedance.com wrote:
> Original title is "net: socket sendmsg MSG_ZEROCOPY_UARG"
> https://lore.kernel.org/all/
> 20240409205300.1346681-2-zijianzhang@bytedance.com/

AFAICT sparse reports this new warning:

net/core/sock.c:2864:26: warning: incorrect type in assignment (different address spaces)
net/core/sock.c:2864:26:    expected void [noderef] __user *usr_addr
net/core/sock.c:2864:26:    got void *
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 1/3] selftests: fix OOM problem in msg_zerocopy selftest
  2024-04-19 21:48 ` [PATCH net-next v2 1/3] selftests: fix OOM problem in msg_zerocopy selftest zijianzhang
@ 2024-04-21 15:16   ` Willem de Bruijn
  2024-04-23 19:31     ` [External] " Zijian Zhang
  0 siblings, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2024-04-21 15:16 UTC (permalink / raw
  To: zijianzhang, netdev
  Cc: edumazet, willemdebruijn.kernel, davem, kuba, cong.wang,
	xiaochun.lu, Zijian Zhang

zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
> on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
> until the socket is not writable. Typically, it will start the receiving
> process after around 30+ sendmsgs. However, because of the
> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> the sender is always writable and does not get any chance to run recv
> notifications. The selftest always exits with OUT_OF_MEMORY because the
> memory used by opt_skb exceeds the core.sysctl_optmem_max.
> 
> According to our experiments, this problem can be solved by open the
> DEBUG_LOCKDEP configuration for the kernel.

Not so much solved, as mitigated. 

> But it will makes the
> notificatoins disordered even in good commits before

typo: notifications

We still have no explanation for this behavior, right. OOO
notifications for TCP should be extremely rare.

A completion is queued when both the skb with the send() data was sent
and freed, and the ACK arrived, freeing the clone on the retransmit
queue. This is almost certainly some effect of running over loopback.

OOO being rare is also what makes the notification batch mechanism so
effective for TCP.

> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale").
> 
> We introduce "cfg_notification_limit" to force sender to receive
> notifications after some number of sendmsgs. And, notifications may not
> come in order, because of the reason we present above. We have order
> checking code managed by cfg_verbose.
> 
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
> ---
>  tools/testing/selftests/net/msg_zerocopy.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c
> index bdc03a2097e8..6c18b07cab30 100644
> --- a/tools/testing/selftests/net/msg_zerocopy.c
> +++ b/tools/testing/selftests/net/msg_zerocopy.c
> @@ -1,3 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0
>  /* Evaluate MSG_ZEROCOPY
>   *
>   * Send traffic between two processes over one of the supported
> @@ -85,6 +86,7 @@ static bool cfg_rx;
>  static int  cfg_runtime_ms	= 4200;
>  static int  cfg_verbose;
>  static int  cfg_waittime_ms	= 500;
> +static int  cfg_notification_limit = 32;
>  static bool cfg_zerocopy;
>  
>  static socklen_t cfg_alen;
> @@ -95,6 +97,8 @@ static char payload[IP_MAXPACKET];
>  static long packets, bytes, completions, expected_completions;
>  static int  zerocopied = -1;
>  static uint32_t next_completion;
> +/* The number of sendmsgs which have not received notified yet */
> +static uint32_t sendmsg_counter;
>  
>  static unsigned long gettimeofday_ms(void)
>  {
> @@ -208,6 +212,7 @@ static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy, int domain)
>  		error(1, errno, "send");
>  	if (cfg_verbose && ret != len)
>  		fprintf(stderr, "send: ret=%u != %u\n", ret, len);
> +	sendmsg_counter++;
>  
>  	if (len) {
>  		packets++;
> @@ -435,7 +440,7 @@ static bool do_recv_completion(int fd, int domain)
>  	/* Detect notification gaps. These should not happen often, if at all.
>  	 * Gaps can occur due to drops, reordering and retransmissions.
>  	 */
> -	if (lo != next_completion)
> +	if (cfg_verbose && lo != next_completion)
>  		fprintf(stderr, "gap: %u..%u does not append to %u\n",
>  			lo, hi, next_completion);
>  	next_completion = hi + 1;
> @@ -460,6 +465,7 @@ static bool do_recv_completion(int fd, int domain)
>  static void do_recv_completions(int fd, int domain)
>  {
>  	while (do_recv_completion(fd, domain)) {}
> +	sendmsg_counter = 0;
>  }
>  
>  /* Wait for all remaining completions on the errqueue */
> @@ -549,6 +555,9 @@ static void do_tx(int domain, int type, int protocol)
>  		else
>  			do_sendmsg(fd, &msg, cfg_zerocopy, domain);
>  
> +		if (cfg_zerocopy && sendmsg_counter >= cfg_notification_limit)
> +			do_recv_completions(fd, domain);
> +
>  		while (!do_poll(fd, POLLOUT)) {
>  			if (cfg_zerocopy)
>  				do_recv_completions(fd, domain);
> @@ -708,7 +717,7 @@ static void parse_opts(int argc, char **argv)
>  
>  	cfg_payload_len = max_payload_len;
>  
> -	while ((c = getopt(argc, argv, "46c:C:D:i:mp:rs:S:t:vz")) != -1) {
> +	while ((c = getopt(argc, argv, "46c:C:D:i:mp:rs:S:t:vzl:n")) != -1) {

no n defined

please keep lexicographic order
>  		switch (c) {
>  		case '4':
>  			if (cfg_family != PF_UNSPEC)
> @@ -760,6 +769,9 @@ static void parse_opts(int argc, char **argv)
>  		case 'z':
>  			cfg_zerocopy = true;
>  			break;
> +		case 'l':
> +			cfg_notification_limit = strtoul(optarg, NULL, 0);
> +			break;
>  		}
>  	}
>  
> -- 
> 2.20.1
> 



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 2/3] sock: add MSG_ZEROCOPY notification mechanism based on msg_control
  2024-04-19 21:48 ` [PATCH net-next v2 2/3] sock: add MSG_ZEROCOPY notification mechanism based on msg_control zijianzhang
@ 2024-04-21 15:45   ` Willem de Bruijn
  2024-04-23 19:51     ` [External] " Zijian Zhang
  2024-04-23  3:16   ` kernel test robot
  1 sibling, 1 reply; 11+ messages in thread
From: Willem de Bruijn @ 2024-04-21 15:45 UTC (permalink / raw
  To: zijianzhang, netdev
  Cc: edumazet, willemdebruijn.kernel, davem, kuba, cong.wang,
	xiaochun.lu, Zijian Zhang

zijianzhang@ wrote:
> From: Zijian Zhang <zijianzhang@bytedance.com>
> 
> The MSG_ZEROCOPY flag enables copy avoidance for socket send calls.
> However, zerocopy is not a free lunch. Apart from the management of user
> pages, the combination of poll + recvmsg to receive notifications incurs
> unignorable overhead in the applications. The overhead of such sometimes
> might be more than the CPU savings from zerocopy. We try to solve this
> problem with a new notification mechanism based on msgcontrol.
> This new mechanism aims to reduce the overhead associated with receiving
> notifications by embedding them directly into user arguments passed with
> each sendmsg control message. By doing so, we can significantly reduce
> the complexity and overhead for managing notifications. In an ideal
> pattern, the user will keep calling sendmsg with SO_ZC_NOTIFICATION
> msg_control, and the notification will be delivered as soon as possible.
> 
> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
> Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
> ---
>  arch/alpha/include/uapi/asm/socket.h  |  2 +
>  arch/mips/include/uapi/asm/socket.h   |  2 +
>  arch/parisc/include/uapi/asm/socket.h |  2 +
>  arch/sparc/include/uapi/asm/socket.h  |  2 +
>  include/uapi/asm-generic/socket.h     |  2 +
>  include/uapi/linux/socket.h           | 16 ++++++
>  net/core/sock.c                       | 70 +++++++++++++++++++++++++++
>  7 files changed, 96 insertions(+)
> 
> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
> index e94f621903fe..b24622a9cd47 100644
> --- a/arch/alpha/include/uapi/asm/socket.h
> +++ b/arch/alpha/include/uapi/asm/socket.h
> @@ -140,6 +140,8 @@
>  #define SO_PASSPIDFD		76
>  #define SO_PEERPIDFD		77
>  
> +#define SO_ZC_NOTIFICATION 78
> +

SCM_ for cmsgs

>  /*
>   * Desired design of maximum size and alignment (see RFC2553)
>   */
> @@ -35,4 +37,18 @@ struct __kernel_sockaddr_storage {
>  #define SOCK_TXREHASH_DISABLED	0
>  #define SOCK_TXREHASH_ENABLED	1
>  
> +#define SOCK_ZC_INFO_MAX 256
> +
> +struct zc_info_elem {
> +	__u32 lo;
> +	__u32 hi;
> +	__u8 zerocopy;
> +};
> +
> +struct zc_info_usr {
> +	__u64 usr_addr;
> +	unsigned int length;
> +	struct zc_info_elem info[];
> +};
> +

Don't pass a pointer to user memory, just have msg_control point to an
array of zc_info_elem.

>  #endif /* _UAPI_LINUX_SOCKET_H */
> diff --git a/net/core/sock.c b/net/core/sock.c
> index fe9195186c13..13f06480f2d8 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2809,6 +2809,13 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
>  		     struct sockcm_cookie *sockc)
>  {
>  	u32 tsflags;
> +	int ret, zc_info_size, i = 0;
> +	unsigned long flags;
> +	struct sk_buff_head *q, local_q;
> +	struct sk_buff *skb, *tmp;
> +	struct sock_exterr_skb *serr;
> +	struct zc_info_usr *zc_info_usr_p, *zc_info_kern_p;
> +	void __user	*usr_addr;

Please wrap the case in parentheses and define variables in that scope
(Since there are so many variables for this case only.)

>  
>  	switch (cmsg->cmsg_type) {
>  	case SO_MARK:
> @@ -2842,6 +2849,69 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
>  	case SCM_RIGHTS:
>  	case SCM_CREDENTIALS:
>  		break;
> +	case SO_ZC_NOTIFICATION:
> +		if (!sock_flag(sk, SOCK_ZEROCOPY) || sk->sk_family == PF_RDS)
> +			return -EINVAL;
> +

Why allow PF_RDS without the sock flag set?

> +		zc_info_usr_p = (struct zc_info_usr *)CMSG_DATA(cmsg);
> +		if (zc_info_usr_p->length <= 0 || zc_info_usr_p->length > SOCK_ZC_INFO_MAX)
> +			return -EINVAL;
> +
> +		zc_info_size = struct_size(zc_info_usr_p, info, zc_info_usr_p->length);
> +		if (cmsg->cmsg_len != CMSG_LEN(zc_info_size))
> +			return -EINVAL;

By passing a straightforward array, the array len can be inferred from
cmsg_len, simplifying all these checks.

See for instance how SO_DEVMEM_DONTNEED returns an array of tokens to
the kernel.

> +
> +		usr_addr = (void *)(uintptr_t)(zc_info_usr_p->usr_addr);
> +		if (!access_ok(usr_addr, zc_info_size))
> +			return -EFAULT;
> +
> +		zc_info_kern_p = kmalloc(zc_info_size, GFP_KERNEL);
> +		if (!zc_info_kern_p)
> +			return -ENOMEM;
> +
> +		q = &sk->sk_error_queue;
> +		skb_queue_head_init(&local_q);
> +		spin_lock_irqsave(&q->lock, flags);
> +		skb = skb_peek(q);
> +		while (skb && i < zc_info_usr_p->length) {
> +			struct sk_buff *skb_next = skb_peek_next(skb, q);
> +
> +			serr = SKB_EXT_ERR(skb);
> +			if (serr->ee.ee_errno == 0 &&
> +			    serr->ee.ee_origin == SO_EE_ORIGIN_ZEROCOPY) {
> +				zc_info_kern_p->info[i].hi = serr->ee.ee_data;
> +				zc_info_kern_p->info[i].lo = serr->ee.ee_info;
> +				zc_info_kern_p->info[i].zerocopy = !(serr->ee.ee_code
> +								& SO_EE_CODE_ZEROCOPY_COPIED);
> +				__skb_unlink(skb, q);
> +				__skb_queue_tail(&local_q, skb);
> +				i++;
> +			}
> +			skb = skb_next;
> +		}
> +		spin_unlock_irqrestore(&q->lock, flags);

In almost all sane cases, all outstanding notifications can be passed
to userspace.

It may be interesting to experiment with briefly taking the lock to
move to a private list. See for instance net_rx_action.

Then if userspace cannot handle all notifications, the rest have to be
spliced back. This can reorder notifications. But rare reordering is
not a correctness issue.

I would choose the more complex splice approach only if it shows
benefit, i.e., if taking the lock does contend with error enqueue
events.

> +
> +		zc_info_kern_p->usr_addr = zc_info_usr_p->usr_addr;
> +		zc_info_kern_p->length = i;
> +
> +		ret = copy_to_user(usr_addr,
> +				   zc_info_kern_p,
> +					struct_size(zc_info_kern_p, info, i));

You'll still need to support the gnarly MSG_CMSG_COMPAT version too.

Wait, is this the reason to pass a usr_addr explicitly? To get around
any compat issues?

Or even the entire issue of having to copy msg_sys->msg_control to
user if !msg_control_is_user.

I suppose this simplifies a lot in terms of development. If making the
user interface uglier.

IMHO the sane interface should be used eventually. There may also be
other uses of passing msg_control data up to userspace from sendmsg.

But this approach works for now for evaluation and discussion.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net-next v2 2/3] sock: add MSG_ZEROCOPY notification mechanism based on msg_control
  2024-04-19 21:48 ` [PATCH net-next v2 2/3] sock: add MSG_ZEROCOPY notification mechanism based on msg_control zijianzhang
  2024-04-21 15:45   ` Willem de Bruijn
@ 2024-04-23  3:16   ` kernel test robot
  1 sibling, 0 replies; 11+ messages in thread
From: kernel test robot @ 2024-04-23  3:16 UTC (permalink / raw
  To: zijianzhang, netdev
  Cc: oe-kbuild-all, edumazet, willemdebruijn.kernel, davem, kuba,
	cong.wang, xiaochun.lu, Zijian Zhang

Hi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/zijianzhang-bytedance-com/selftests-fix-OOM-problem-in-msg_zerocopy-selftest/20240420-055035
base:   net-next/main
patch link:    https://lore.kernel.org/r/20240419214819.671536-3-zijianzhang%40bytedance.com
patch subject: [PATCH net-next v2 2/3] sock: add MSG_ZEROCOPY notification mechanism based on msg_control
config: i386-randconfig-061-20240423 (https://download.01.org/0day-ci/archive/20240423/202404231145.7NfmE7Hn-lkp@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240423/202404231145.7NfmE7Hn-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404231145.7NfmE7Hn-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> net/core/sock.c:2864:26: sparse: sparse: incorrect type in assignment (different address spaces) @@     expected void [noderef] __user *usr_addr @@     got void * @@
   net/core/sock.c:2864:26: sparse:     expected void [noderef] __user *usr_addr
   net/core/sock.c:2864:26: sparse:     got void *
   net/core/sock.c:2393:9: sparse: sparse: context imbalance in 'sk_clone_lock' - wrong count at exit
   net/core/sock.c:2397:6: sparse: sparse: context imbalance in 'sk_free_unlock_clone' - unexpected unlock
   net/core/sock.c:4103:13: sparse: sparse: context imbalance in 'proto_seq_start' - wrong count at exit
   net/core/sock.c:4115:13: sparse: sparse: context imbalance in 'proto_seq_stop' - wrong count at exit

vim +2864 net/core/sock.c

  2807	
  2808	int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
  2809			     struct sockcm_cookie *sockc)
  2810	{
  2811		u32 tsflags;
  2812		int ret, zc_info_size, i = 0;
  2813		unsigned long flags;
  2814		struct sk_buff_head *q, local_q;
  2815		struct sk_buff *skb, *tmp;
  2816		struct sock_exterr_skb *serr;
  2817		struct zc_info_usr *zc_info_usr_p, *zc_info_kern_p;
  2818		void __user	*usr_addr;
  2819	
  2820		switch (cmsg->cmsg_type) {
  2821		case SO_MARK:
  2822			if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) &&
  2823			    !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
  2824				return -EPERM;
  2825			if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
  2826				return -EINVAL;
  2827			sockc->mark = *(u32 *)CMSG_DATA(cmsg);
  2828			break;
  2829		case SO_TIMESTAMPING_OLD:
  2830		case SO_TIMESTAMPING_NEW:
  2831			if (cmsg->cmsg_len != CMSG_LEN(sizeof(u32)))
  2832				return -EINVAL;
  2833	
  2834			tsflags = *(u32 *)CMSG_DATA(cmsg);
  2835			if (tsflags & ~SOF_TIMESTAMPING_TX_RECORD_MASK)
  2836				return -EINVAL;
  2837	
  2838			sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
  2839			sockc->tsflags |= tsflags;
  2840			break;
  2841		case SCM_TXTIME:
  2842			if (!sock_flag(sk, SOCK_TXTIME))
  2843				return -EINVAL;
  2844			if (cmsg->cmsg_len != CMSG_LEN(sizeof(u64)))
  2845				return -EINVAL;
  2846			sockc->transmit_time = get_unaligned((u64 *)CMSG_DATA(cmsg));
  2847			break;
  2848		/* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
  2849		case SCM_RIGHTS:
  2850		case SCM_CREDENTIALS:
  2851			break;
  2852		case SO_ZC_NOTIFICATION:
  2853			if (!sock_flag(sk, SOCK_ZEROCOPY) || sk->sk_family == PF_RDS)
  2854				return -EINVAL;
  2855	
  2856			zc_info_usr_p = (struct zc_info_usr *)CMSG_DATA(cmsg);
  2857			if (zc_info_usr_p->length <= 0 || zc_info_usr_p->length > SOCK_ZC_INFO_MAX)
  2858				return -EINVAL;
  2859	
  2860			zc_info_size = struct_size(zc_info_usr_p, info, zc_info_usr_p->length);
  2861			if (cmsg->cmsg_len != CMSG_LEN(zc_info_size))
  2862				return -EINVAL;
  2863	
> 2864			usr_addr = (void *)(uintptr_t)(zc_info_usr_p->usr_addr);
  2865			if (!access_ok(usr_addr, zc_info_size))
  2866				return -EFAULT;
  2867	
  2868			zc_info_kern_p = kmalloc(zc_info_size, GFP_KERNEL);
  2869			if (!zc_info_kern_p)
  2870				return -ENOMEM;
  2871	
  2872			q = &sk->sk_error_queue;
  2873			skb_queue_head_init(&local_q);
  2874			spin_lock_irqsave(&q->lock, flags);
  2875			skb = skb_peek(q);
  2876			while (skb && i < zc_info_usr_p->length) {
  2877				struct sk_buff *skb_next = skb_peek_next(skb, q);
  2878	
  2879				serr = SKB_EXT_ERR(skb);
  2880				if (serr->ee.ee_errno == 0 &&
  2881				    serr->ee.ee_origin == SO_EE_ORIGIN_ZEROCOPY) {
  2882					zc_info_kern_p->info[i].hi = serr->ee.ee_data;
  2883					zc_info_kern_p->info[i].lo = serr->ee.ee_info;
  2884					zc_info_kern_p->info[i].zerocopy = !(serr->ee.ee_code
  2885									& SO_EE_CODE_ZEROCOPY_COPIED);
  2886					__skb_unlink(skb, q);
  2887					__skb_queue_tail(&local_q, skb);
  2888					i++;
  2889				}
  2890				skb = skb_next;
  2891			}
  2892			spin_unlock_irqrestore(&q->lock, flags);
  2893	
  2894			zc_info_kern_p->usr_addr = zc_info_usr_p->usr_addr;
  2895			zc_info_kern_p->length = i;
  2896	
  2897			ret = copy_to_user(usr_addr,
  2898					   zc_info_kern_p,
  2899						struct_size(zc_info_kern_p, info, i));
  2900			kfree(zc_info_kern_p);
  2901	
  2902			if (unlikely(ret)) {
  2903				spin_lock_irqsave(&q->lock, flags);
  2904				skb_queue_reverse_walk_safe(&local_q, skb, tmp) {
  2905					__skb_unlink(skb, &local_q);
  2906					__skb_queue_head(q, skb);
  2907				}
  2908				spin_unlock_irqrestore(&q->lock, flags);
  2909				return -EFAULT;
  2910			}
  2911	
  2912			while ((skb = __skb_dequeue(&local_q)))
  2913				consume_skb(skb);
  2914			break;
  2915		default:
  2916			return -EINVAL;
  2917		}
  2918		return 0;
  2919	}
  2920	EXPORT_SYMBOL(__sock_cmsg_send);
  2921	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [External] Re: [PATCH net-next v2 0/3] net: A lightweight zero-copy notification
  2024-04-20  3:47 ` [PATCH net-next v2 0/3] net: A lightweight zero-copy notification Jakub Kicinski
@ 2024-04-23 19:29   ` Zijian Zhang
  0 siblings, 0 replies; 11+ messages in thread
From: Zijian Zhang @ 2024-04-23 19:29 UTC (permalink / raw
  To: Jakub Kicinski
  Cc: netdev, edumazet, willemdebruijn.kernel, davem, cong.wang,
	xiaochun.lu

Thanks, will update in the next version.

On 4/19/24 8:47 PM, Jakub Kicinski wrote:
> On Fri, 19 Apr 2024 21:48:16 +0000 zijianzhang@bytedance.com wrote:
>> Original title is "net: socket sendmsg MSG_ZEROCOPY_UARG"
>> https://lore.kernel.org/all/
>> 20240409205300.1346681-2-zijianzhang@bytedance.com/
> 
> AFAICT sparse reports this new warning:
> 
> net/core/sock.c:2864:26: warning: incorrect type in assignment (different address spaces)
> net/core/sock.c:2864:26:    expected void [noderef] __user *usr_addr
> net/core/sock.c:2864:26:    got void *

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [External] Re: [PATCH net-next v2 1/3] selftests: fix OOM problem in msg_zerocopy selftest
  2024-04-21 15:16   ` Willem de Bruijn
@ 2024-04-23 19:31     ` Zijian Zhang
  0 siblings, 0 replies; 11+ messages in thread
From: Zijian Zhang @ 2024-04-23 19:31 UTC (permalink / raw
  To: Willem de Bruijn, netdev; +Cc: edumazet, davem, kuba, cong.wang, xiaochun.lu

Thanks for the suggestions, all make sense to me. I will update in the
next version.

On 4/21/24 8:16 AM, Willem de Bruijn wrote:
> zijianzhang@ wrote:
>> From: Zijian Zhang <zijianzhang@bytedance.com>
>>
>> In selftests/net/msg_zerocopy.c, it has a while loop keeps calling sendmsg
>> on a socket with MSG_ZEROCOPY flag, and it will recv the notifications
>> until the socket is not writable. Typically, it will start the receiving
>> process after around 30+ sendmsgs. However, because of the
>> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale")
>> the sender is always writable and does not get any chance to run recv
>> notifications. The selftest always exits with OUT_OF_MEMORY because the
>> memory used by opt_skb exceeds the core.sysctl_optmem_max.
>>
>> According to our experiments, this problem can be solved by open the
>> DEBUG_LOCKDEP configuration for the kernel.
> 
> Not so much solved, as mitigated.
> 
>> But it will makes the
>> notificatoins disordered even in good commits before
> 
> typo: notifications
> 
> We still have no explanation for this behavior, right. OOO
> notifications for TCP should be extremely rare.
> 
> A completion is queued when both the skb with the send() data was sent
> and freed, and the ACK arrived, freeing the clone on the retransmit
> queue. This is almost certainly some effect of running over loopback.
> 
> OOO being rare is also what makes the notification batch mechanism so
> effective for TCP.
> 
>> commit dfa2f0483360 ("tcp: get rid of sysctl_tcp_adv_win_scale").
>>
>> We introduce "cfg_notification_limit" to force sender to receive
>> notifications after some number of sendmsgs. And, notifications may not
>> come in order, because of the reason we present above. We have order
>> checking code managed by cfg_verbose.
>>
>> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
>> Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
>> ---
>>   tools/testing/selftests/net/msg_zerocopy.c | 16 ++++++++++++++--
>>   1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c
>> index bdc03a2097e8..6c18b07cab30 100644
>> --- a/tools/testing/selftests/net/msg_zerocopy.c
>> +++ b/tools/testing/selftests/net/msg_zerocopy.c
>> @@ -1,3 +1,4 @@
>> +// SPDX-License-Identifier: GPL-2.0
>>   /* Evaluate MSG_ZEROCOPY
>>    *
>>    * Send traffic between two processes over one of the supported
>> @@ -85,6 +86,7 @@ static bool cfg_rx;
>>   static int  cfg_runtime_ms	= 4200;
>>   static int  cfg_verbose;
>>   static int  cfg_waittime_ms	= 500;
>> +static int  cfg_notification_limit = 32;
>>   static bool cfg_zerocopy;
>>   
>>   static socklen_t cfg_alen;
>> @@ -95,6 +97,8 @@ static char payload[IP_MAXPACKET];
>>   static long packets, bytes, completions, expected_completions;
>>   static int  zerocopied = -1;
>>   static uint32_t next_completion;
>> +/* The number of sendmsgs which have not received notified yet */
>> +static uint32_t sendmsg_counter;
>>   
>>   static unsigned long gettimeofday_ms(void)
>>   {
>> @@ -208,6 +212,7 @@ static bool do_sendmsg(int fd, struct msghdr *msg, bool do_zerocopy, int domain)
>>   		error(1, errno, "send");
>>   	if (cfg_verbose && ret != len)
>>   		fprintf(stderr, "send: ret=%u != %u\n", ret, len);
>> +	sendmsg_counter++;
>>   
>>   	if (len) {
>>   		packets++;
>> @@ -435,7 +440,7 @@ static bool do_recv_completion(int fd, int domain)
>>   	/* Detect notification gaps. These should not happen often, if at all.
>>   	 * Gaps can occur due to drops, reordering and retransmissions.
>>   	 */
>> -	if (lo != next_completion)
>> +	if (cfg_verbose && lo != next_completion)
>>   		fprintf(stderr, "gap: %u..%u does not append to %u\n",
>>   			lo, hi, next_completion);
>>   	next_completion = hi + 1;
>> @@ -460,6 +465,7 @@ static bool do_recv_completion(int fd, int domain)
>>   static void do_recv_completions(int fd, int domain)
>>   {
>>   	while (do_recv_completion(fd, domain)) {}
>> +	sendmsg_counter = 0;
>>   }
>>   
>>   /* Wait for all remaining completions on the errqueue */
>> @@ -549,6 +555,9 @@ static void do_tx(int domain, int type, int protocol)
>>   		else
>>   			do_sendmsg(fd, &msg, cfg_zerocopy, domain);
>>   
>> +		if (cfg_zerocopy && sendmsg_counter >= cfg_notification_limit)
>> +			do_recv_completions(fd, domain);
>> +
>>   		while (!do_poll(fd, POLLOUT)) {
>>   			if (cfg_zerocopy)
>>   				do_recv_completions(fd, domain);
>> @@ -708,7 +717,7 @@ static void parse_opts(int argc, char **argv)
>>   
>>   	cfg_payload_len = max_payload_len;
>>   
>> -	while ((c = getopt(argc, argv, "46c:C:D:i:mp:rs:S:t:vz")) != -1) {
>> +	while ((c = getopt(argc, argv, "46c:C:D:i:mp:rs:S:t:vzl:n")) != -1) {
> 
> no n defined
> 
> please keep lexicographic order
>>   		switch (c) {
>>   		case '4':
>>   			if (cfg_family != PF_UNSPEC)
>> @@ -760,6 +769,9 @@ static void parse_opts(int argc, char **argv)
>>   		case 'z':
>>   			cfg_zerocopy = true;
>>   			break;
>> +		case 'l':
>> +			cfg_notification_limit = strtoul(optarg, NULL, 0);
>> +			break;
>>   		}
>>   	}
>>   
>> -- 
>> 2.20.1
>>
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [External] Re: [PATCH net-next v2 2/3] sock: add MSG_ZEROCOPY notification mechanism based on msg_control
  2024-04-21 15:45   ` Willem de Bruijn
@ 2024-04-23 19:51     ` Zijian Zhang
  0 siblings, 0 replies; 11+ messages in thread
From: Zijian Zhang @ 2024-04-23 19:51 UTC (permalink / raw
  To: Willem de Bruijn, netdev; +Cc: edumazet, davem, kuba, cong.wang, xiaochun.lu

On 4/21/24 8:45 AM, Willem de Bruijn wrote:
> zijianzhang@ wrote:
>> From: Zijian Zhang <zijianzhang@bytedance.com>
>>
>> The MSG_ZEROCOPY flag enables copy avoidance for socket send calls.
>> However, zerocopy is not a free lunch. Apart from the management of user
>> pages, the combination of poll + recvmsg to receive notifications incurs
>> unignorable overhead in the applications. The overhead of such sometimes
>> might be more than the CPU savings from zerocopy. We try to solve this
>> problem with a new notification mechanism based on msgcontrol.
>> This new mechanism aims to reduce the overhead associated with receiving
>> notifications by embedding them directly into user arguments passed with
>> each sendmsg control message. By doing so, we can significantly reduce
>> the complexity and overhead for managing notifications. In an ideal
>> pattern, the user will keep calling sendmsg with SO_ZC_NOTIFICATION
>> msg_control, and the notification will be delivered as soon as possible.
>>
>> Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
>> Signed-off-by: Xiaochun Lu <xiaochun.lu@bytedance.com>
>> ---
>>   arch/alpha/include/uapi/asm/socket.h  |  2 +
>>   arch/mips/include/uapi/asm/socket.h   |  2 +
>>   arch/parisc/include/uapi/asm/socket.h |  2 +
>>   arch/sparc/include/uapi/asm/socket.h  |  2 +
>>   include/uapi/asm-generic/socket.h     |  2 +
>>   include/uapi/linux/socket.h           | 16 ++++++
>>   net/core/sock.c                       | 70 +++++++++++++++++++++++++++
>>   7 files changed, 96 insertions(+)
>>
>> diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
>> index e94f621903fe..b24622a9cd47 100644
>> --- a/arch/alpha/include/uapi/asm/socket.h
>> +++ b/arch/alpha/include/uapi/asm/socket.h
>> @@ -140,6 +140,8 @@
>>   #define SO_PASSPIDFD		76
>>   #define SO_PEERPIDFD		77
>>   
>> +#define SO_ZC_NOTIFICATION 78
>> +
> 
> SCM_ for cmsgs
> 

Ack.

>>   /*
>>    * Desired design of maximum size and alignment (see RFC2553)
>>    */
>> @@ -35,4 +37,18 @@ struct __kernel_sockaddr_storage {
>>   #define SOCK_TXREHASH_DISABLED	0
>>   #define SOCK_TXREHASH_ENABLED	1
>>   
>> +#define SOCK_ZC_INFO_MAX 256
>> +
>> +struct zc_info_elem {
>> +	__u32 lo;
>> +	__u32 hi;
>> +	__u8 zerocopy;
>> +};
>> +
>> +struct zc_info_usr {
>> +	__u64 usr_addr;
>> +	unsigned int length;
>> +	struct zc_info_elem info[];
>> +};
>> +
> 
> Don't pass a pointer to user memory, just have msg_control point to an
> array of zc_info_elem.
> 

Ack.

>>   #endif /* _UAPI_LINUX_SOCKET_H */
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index fe9195186c13..13f06480f2d8 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -2809,6 +2809,13 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
>>   		     struct sockcm_cookie *sockc)
>>   {
>>   	u32 tsflags;
>> +	int ret, zc_info_size, i = 0;
>> +	unsigned long flags;
>> +	struct sk_buff_head *q, local_q;
>> +	struct sk_buff *skb, *tmp;
>> +	struct sock_exterr_skb *serr;
>> +	struct zc_info_usr *zc_info_usr_p, *zc_info_kern_p;
>> +	void __user	*usr_addr;
> 
> Please wrap the case in parentheses and define variables in that scope
> (Since there are so many variables for this case only.)
> 
>>   
>>   	switch (cmsg->cmsg_type) {
>>   	case SO_MARK:
>> @@ -2842,6 +2849,69 @@ int __sock_cmsg_send(struct sock *sk, struct cmsghdr *cmsg,
>>   	case SCM_RIGHTS:
>>   	case SCM_CREDENTIALS:
>>   		break;
>> +	case SO_ZC_NOTIFICATION:
>> +		if (!sock_flag(sk, SOCK_ZEROCOPY) || sk->sk_family == PF_RDS)
>> +			return -EINVAL;
>> +
> 
> Why allow PF_RDS without the sock flag set?
> 

PF_RDS uses POLLIN instead of POLLERR, thus this mechanism cannot work
for PF_RDS. I am rejecting any PF_RDS socket here.

>> +		zc_info_usr_p = (struct zc_info_usr *)CMSG_DATA(cmsg);
>> +		if (zc_info_usr_p->length <= 0 || zc_info_usr_p->length > SOCK_ZC_INFO_MAX)
>> +			return -EINVAL;
>> +
>> +		zc_info_size = struct_size(zc_info_usr_p, info, zc_info_usr_p->length);
>> +		if (cmsg->cmsg_len != CMSG_LEN(zc_info_size))
>> +			return -EINVAL;
> 
> By passing a straightforward array, the array len can be inferred from
> cmsg_len, simplifying all these checks.
> 
> See for instance how SO_DEVMEM_DONTNEED returns an array of tokens to
> the kernel.
> 

Ack.

>> +
>> +		usr_addr = (void *)(uintptr_t)(zc_info_usr_p->usr_addr);
>> +		if (!access_ok(usr_addr, zc_info_size))
>> +			return -EFAULT;
>> +
>> +		zc_info_kern_p = kmalloc(zc_info_size, GFP_KERNEL);
>> +		if (!zc_info_kern_p)
>> +			return -ENOMEM;
>> +
>> +		q = &sk->sk_error_queue;
>> +		skb_queue_head_init(&local_q);
>> +		spin_lock_irqsave(&q->lock, flags);
>> +		skb = skb_peek(q);
>> +		while (skb && i < zc_info_usr_p->length) {
>> +			struct sk_buff *skb_next = skb_peek_next(skb, q);
>> +
>> +			serr = SKB_EXT_ERR(skb);
>> +			if (serr->ee.ee_errno == 0 &&
>> +			    serr->ee.ee_origin == SO_EE_ORIGIN_ZEROCOPY) {
>> +				zc_info_kern_p->info[i].hi = serr->ee.ee_data;
>> +				zc_info_kern_p->info[i].lo = serr->ee.ee_info;
>> +				zc_info_kern_p->info[i].zerocopy = !(serr->ee.ee_code
>> +								& SO_EE_CODE_ZEROCOPY_COPIED);
>> +				__skb_unlink(skb, q);
>> +				__skb_queue_tail(&local_q, skb);
>> +				i++;
>> +			}
>> +			skb = skb_next;
>> +		}
>> +		spin_unlock_irqrestore(&q->lock, flags);
> 
> In almost all sane cases, all outstanding notifications can be passed
> to userspace.
> 
> It may be interesting to experiment with briefly taking the lock to
> move to a private list. See for instance net_rx_action.
> 

Nice catch.

> Then if userspace cannot handle all notifications, the rest have to be
> spliced back. This can reorder notifications. But rare reordering is
> not a correctness issue.
> 
> I would choose the more complex splice approach only if it shows
> benefit, i.e., if taking the lock does contend with error enqueue
> events.
> 

Maybe when the network is very busy, it will contend with
__msg_zerocopy_callback(where new notifications are added)?
I think splice is a better idea.

>> +
>> +		zc_info_kern_p->usr_addr = zc_info_usr_p->usr_addr;
>> +		zc_info_kern_p->length = i;
>> +
>> +		ret = copy_to_user(usr_addr,
>> +				   zc_info_kern_p,
>> +					struct_size(zc_info_kern_p, info, i));
> 
> You'll still need to support the gnarly MSG_CMSG_COMPAT version too.
> 

Assume users pass in zc_info_elem array pointer here, I may use
in_compat_syscall function to check if it's compat. If so, I can
use compat_ptr to convert it. Is it correct?

> Wait, is this the reason to pass a usr_addr explicitly? To get around
> any compat issues?
> 

Yes, I try to make it u64 regardless of 32-bit or 64-bit program, but
it is indeed ugly though.

> Or even the entire issue of having to copy msg_sys->msg_control to
> user if !msg_control_is_user.
> 
> I suppose this simplifies a lot in terms of development. If making the
> user interface uglier.
> 
> IMHO the sane interface should be used eventually. There may also be
> other uses of passing msg_control data up to userspace from sendmsg.
> 
> But this approach works for now for evaluation and discussion.
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-04-23 19:52 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-19 21:48 [PATCH net-next v2 0/3] net: A lightweight zero-copy notification zijianzhang
2024-04-19 21:48 ` [PATCH net-next v2 1/3] selftests: fix OOM problem in msg_zerocopy selftest zijianzhang
2024-04-21 15:16   ` Willem de Bruijn
2024-04-23 19:31     ` [External] " Zijian Zhang
2024-04-19 21:48 ` [PATCH net-next v2 2/3] sock: add MSG_ZEROCOPY notification mechanism based on msg_control zijianzhang
2024-04-21 15:45   ` Willem de Bruijn
2024-04-23 19:51     ` [External] " Zijian Zhang
2024-04-23  3:16   ` kernel test robot
2024-04-19 21:48 ` [PATCH net-next v2 3/3] selftests: add MSG_ZEROCOPY msg_control notification test zijianzhang
2024-04-20  3:47 ` [PATCH net-next v2 0/3] net: A lightweight zero-copy notification Jakub Kicinski
2024-04-23 19:29   ` [External] " Zijian Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).