LKML Archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address.
@ 2024-09-20 17:02 Tiago Lam
  2024-09-20 17:02 ` [RFC PATCH v2 1/3] ipv4: Support setting src port in sendmsg() Tiago Lam
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Tiago Lam @ 2024-09-20 17:02 UTC (permalink / raw)
  To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan
  Cc: netdev, linux-kernel, bpf, linux-kselftest, Jakub Sitnicki,
	Tiago Lam, kernel-team

Currently, sk_lookup allows an ebpf program to run on the ingress socket
lookup path, and accept traffic not only on a range of addresses, but
also on a range of ports. At Cloudflare we use sk_lookup for two main
cases:
1. Sharing a single port between multiple services - i.e. two services
   (or more) use disjoint IP ranges but share the same port;
2. Receiving traffic on all ports - i.e. a service which accepts traffic
   on specific IP ranges but any port [1].

However, one main challenge we face while using sk_lookup for these use
cases is how to source return UDP traffic:
- On point 1. above, sometimes this range of addresses are not local
  (i.e. there's no local routes for these in the server), which means we
  need IP_TRANSPARENT set to be able to egress traffic from addresses
  we've received traffic on (or simply IP_FREEBIND in the case of IPv6);
- And on point 2. above, allowing traffic to a range of ports means a
  service could get traffic on multiple ports, but currently there's no
  way to set the source UDP port egress traffic should be sourced from -
  it's possible to receive the original destination port using the
  IP_ORIGDSTADDR ancilliary message in recvmsg, but not set it in
  sendmsg.

Both of these limitations can be worked around, but in a sub-optimal
way. Using IP_TRANSPARENT, for instance, requires special privileges.
And while one could use UDP connected sockets to send return traffic,
creating a connected socket for each different address a UDP traffic is
received on does have performance implications.

Given sk_lookup allows services to accept traffic on a range of
addresses or ports, it seems sensible to also allow return traffic to
proceed through as well, without needing extra configurations / set ups.

This patch sets out to fix both of this issues by:
1. Allowing users to set the src address/port egress traffic should be
   sent from, when calling sendmsg();
2. Validating that this egress traffic comes from a socket that matches
   an ingress socket in sk_lookup.
   - If it does, traffic is allowed to proceed;
   - Otherwise it falls back to the regular egress path.

The downsides to this is that this runs on the egress hot path, although
this work tries to minimise its impact by only performing the reverse
socket lookup when necessary (i.e. only when the src address/port are
modified). Further performance measurements are to be taken, but we're
reaching out early for feedback to see what the technical concerns are
and if we can address them.

[1] https://blog.cloudflare.com/how-we-built-spectrum/

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Tiago Lam <tiagolam@cloudflare.com>
---
Changes in v2:
- Amended commit messages and cover letter to make the intent and
  implementation clearer (Willem de Bruijn);
- Fixed socket comparison by not using socket cookies and comparing them
  directly (Eric Dumazet);
- Fixed misspellings and checkpatch.pl warnings on line lengths (Simon
  Horman);
- Fixed usage of start_server_addr() and gcc compilation (Philo Lu);
- Link to v1: https://lore.kernel.org/r/20240913-reverse-sk-lookup-v1-0-e721ea003d4c@cloudflare.com

---
Tiago Lam (3):
      ipv4: Support setting src port in sendmsg().
      ipv6: Support setting src port in sendmsg().
      bpf: Add sk_lookup test to use ORIGDSTADDR cmsg.

 include/net/ip.h                                   |  1 +
 net/ipv4/ip_sockglue.c                             | 11 +++
 net/ipv4/udp.c                                     | 35 +++++++++-
 net/ipv6/datagram.c                                | 79 ++++++++++++++++++++++
 net/ipv6/udp.c                                     |  8 ++-
 tools/testing/selftests/bpf/prog_tests/sk_lookup.c | 67 ++++++++++++------
 6 files changed, 176 insertions(+), 25 deletions(-)
---
base-commit: 6562a89739bbefddb5495c09aaab67c1c3756f36
change-id: 20240909-reverse-sk-lookup-f7bf36292bc4

Best regards,
-- 
Tiago Lam <tiagolam@cloudflare.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC PATCH v2 1/3] ipv4: Support setting src port in sendmsg().
  2024-09-20 17:02 [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address Tiago Lam
@ 2024-09-20 17:02 ` Tiago Lam
  2024-09-21  9:12   ` Willem de Bruijn
  2024-09-22 16:23   ` ericnetdev dumazet
  2024-09-20 17:02 ` [RFC PATCH v2 2/3] ipv6: " Tiago Lam
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 12+ messages in thread
From: Tiago Lam @ 2024-09-20 17:02 UTC (permalink / raw)
  To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan
  Cc: netdev, linux-kernel, bpf, linux-kselftest, Jakub Sitnicki,
	Tiago Lam, kernel-team

sendmsg() doesn't currently allow users to set the src port from which
egress traffic should be sent from. This is possible if a user wants to
configure the src address from which egress traffic should be sent from
- with the IP_PKTINFO ancillary message, a user is currently able to
  specify a source address to egress from when calling sendmsg().
However, this still requires the user to set the IP_TRANSPARENT flag
using setsockopt(), which happens to require special privileges in the
case of IPv4.

To support users setting the src port for egress traffic when using
sendmsg(), this patch extends the ancillary messages supported by
sendmsg() to support the IP_ORIGDSTADDR ancillary message, reusing the
same cmsg and struct used in recvmsg() - which already supports
specifying a port.

Additionally, to avoid having to have special configurations, such as
IP_TRANSPARENT, this patch allows egress traffic that's been configured
using (the newly added) IP_ORIGDSTADDR to proceed if there's an ingress
socket lookup (sk_lookup) that matches that traffic - by performing a
reserve sk_lookup. Thus, if the sk_lookup reverse call returns a socket
that matches the egress socket, we also let the egress traffic through -
following the principle of, allowing return traffic to proceed if
ingress traffic is allowed in. In case no match is found in the reverse
sk_lookup, traffic falls back to the regular egress path.

This reverse lookup is only performed in case an sk_lookup ebpf program
is attached and the source address and/or port for the return traffic
have been modified using the (newly added) IP_ORIGDSTADDR in sendmsg.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Tiago Lam <tiagolam@cloudflare.com>
---
 include/net/ip.h       |  1 +
 net/ipv4/ip_sockglue.c | 11 +++++++++++
 net/ipv4/udp.c         | 35 ++++++++++++++++++++++++++++++++++-
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index c5606cadb1a5..e5753abd7247 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -75,6 +75,7 @@ static inline unsigned int ip_hdrlen(const struct sk_buff *skb)
 struct ipcm_cookie {
 	struct sockcm_cookie	sockc;
 	__be32			addr;
+	__be16			port;
 	int			oif;
 	struct ip_options_rcu	*opt;
 	__u8			protocol;
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index cf377377b52d..6e55bd25b5f7 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -297,6 +297,17 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
 			ipc->addr = info->ipi_spec_dst.s_addr;
 			break;
 		}
+		case IP_ORIGDSTADDR:
+		{
+			struct sockaddr_in *dst_addr;
+
+			if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct sockaddr_in)))
+				return -EINVAL;
+			dst_addr = (struct sockaddr_in *)CMSG_DATA(cmsg);
+			ipc->port = dst_addr->sin_port;
+			ipc->addr = dst_addr->sin_addr.s_addr;
+			break;
+		}
 		case IP_TTL:
 			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
 				return -EINVAL;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 49c622e743e8..208cee40c0ec 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1060,6 +1060,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name);
 	struct flowi4 fl4_stack;
 	struct flowi4 *fl4;
+	__u8 flow_flags = inet_sk_flowi_flags(sk);
 	int ulen = len;
 	struct ipcm_cookie ipc;
 	struct rtable *rt = NULL;
@@ -1179,6 +1180,39 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		}
 	}
 
+	/* If we're egressing with a different source address and/or port, we
+	 * perform a reverse socket lookup.  The rationale behind this is that
+	 * we can allow return UDP traffic that has ingressed through sk_lookup
+	 * to also egress correctly. In case this the reverse lookup fails.
+	 *
+	 * The lookup is performed if either source address and/or port
+	 * changed, and neither is "0".
+	 */
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
+	    !connected &&
+	    (ipc.port && ipc.addr) &&
+	    (inet->inet_saddr != ipc.addr || inet->inet_sport != ipc.port)) {
+		struct sock *sk_egress;
+
+		bpf_sk_lookup_run_v4(sock_net(sk), IPPROTO_UDP,
+				     daddr, dport, ipc.addr, ntohs(ipc.port),
+				     1, &sk_egress);
+		if (IS_ERR_OR_NULL(sk_egress) || sk_egress != sk) {
+			net_info_ratelimited("No reverse socket lookup match for local addr %pI4:%d remote addr %pI4:%d\n",
+					     &ipc.addr, ntohs(ipc.port), &daddr,
+					     ntohs(dport));
+		} else {
+			/* Override the source port to use with the one we got
+			 * in cmsg, and tell routing to let us use a non-local
+			 * address. Otherwise route lookups will fail with
+			 * non-local source address when IP_TRANSPARENT isn't
+			 * set.
+			 */
+			inet->inet_sport = ipc.port;
+			flow_flags |= FLOWI_FLAG_ANYSRC;
+		}
+	}
+
 	saddr = ipc.addr;
 	ipc.addr = faddr = daddr;
 
@@ -1223,7 +1257,6 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	if (!rt) {
 		struct net *net = sock_net(sk);
-		__u8 flow_flags = inet_sk_flowi_flags(sk);
 
 		fl4 = &fl4_stack;
 

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH v2 2/3] ipv6: Support setting src port in sendmsg().
  2024-09-20 17:02 [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address Tiago Lam
  2024-09-20 17:02 ` [RFC PATCH v2 1/3] ipv4: Support setting src port in sendmsg() Tiago Lam
@ 2024-09-20 17:02 ` Tiago Lam
  2024-09-21  9:13   ` Willem de Bruijn
  2024-09-23 13:08   ` David Laight
  2024-09-20 17:02 ` [RFC PATCH v2 3/3] bpf: Add sk_lookup test to use ORIGDSTADDR cmsg Tiago Lam
  2024-09-23 14:34 ` [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address Jakub Sitnicki
  3 siblings, 2 replies; 12+ messages in thread
From: Tiago Lam @ 2024-09-20 17:02 UTC (permalink / raw)
  To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan
  Cc: netdev, linux-kernel, bpf, linux-kselftest, Jakub Sitnicki,
	Tiago Lam, kernel-team

This follows the same rationale provided for the ipv4 counterpart, where
the sendmsg() path is also extended here to support the IPV6_ORIGDSTADDR
ancillary message to be able to specify a source address/port. This
allows users to configure the source address and/or port egress traffic
should be sent from.

To limit its usage, a reverse socket lookup is performed to check if the
configured egress source address and/or port have any ingress sk_lookup
match. If it does, traffic is allowed to proceed, otherwise it falls
back to the regular egress path.

Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Tiago Lam <tiagolam@cloudflare.com>
---
 net/ipv6/datagram.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/udp.c      |  8 ++++--
 2 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index fff78496803d..369c64a478ec 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -756,6 +756,29 @@ void ip6_datagram_recv_ctl(struct sock *sk, struct msghdr *msg,
 }
 EXPORT_SYMBOL_GPL(ip6_datagram_recv_ctl);
 
+static inline bool reverse_sk_lookup(struct flowi6 *fl6, struct sock *sk,
+				     struct in6_addr *saddr, __be16 sport)
+{
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
+	    (saddr && sport) &&
+	    (ipv6_addr_cmp(&sk->sk_v6_rcv_saddr, saddr) ||
+	    inet_sk(sk)->inet_sport != sport)) {
+		struct sock *sk_egress;
+
+		bpf_sk_lookup_run_v6(sock_net(sk), IPPROTO_UDP, &fl6->daddr,
+				     fl6->fl6_dport, saddr, ntohs(sport), 0,
+				     &sk_egress);
+		if (!IS_ERR_OR_NULL(sk_egress) && sk_egress == sk)
+			return true;
+
+		net_info_ratelimited("No reverse socket lookup match for local addr %pI6:%d remote addr %pI6:%d\n",
+				     &saddr, ntohs(sport), &fl6->daddr,
+				     ntohs(fl6->fl6_dport));
+	}
+
+	return false;
+}
+
 int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 			  struct msghdr *msg, struct flowi6 *fl6,
 			  struct ipcm6_cookie *ipc6)
@@ -844,7 +867,63 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 
 			break;
 		    }
+		case IPV6_ORIGDSTADDR:
+			{
+			struct sockaddr_in6 *sockaddr_in;
+			struct net_device *dev = NULL;
+
+			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct sockaddr_in6))) {
+				err = -EINVAL;
+				goto exit_f;
+			}
+
+			sockaddr_in = (struct sockaddr_in6 *)CMSG_DATA(cmsg);
+
+			addr_type = __ipv6_addr_type(&sockaddr_in->sin6_addr);
+
+			if (addr_type & IPV6_ADDR_LINKLOCAL)
+				return -EINVAL;
+
+			/* If we're egressing with a different source address
+			 * and/or port, we perform a reverse socket lookup. The
+			 * rationale behind this is that we can allow return
+			 * UDP traffic that has ingressed through sk_lookup to
+			 * also egress correctly. In case the reverse lookup
+			 * fails, we continue with the normal path.
+			 *
+			 * The lookup is performed if either source address
+			 * and/or port changed, and neither is "0".
+			 */
+			if (reverse_sk_lookup(fl6, sk, &sockaddr_in->sin6_addr,
+					      sockaddr_in->sin6_port)) {
+				/* Override the source port and address to use
+				 * with the one we got in cmsg and bail early.
+				 */
+				fl6->saddr = sockaddr_in->sin6_addr;
+				fl6->fl6_sport = sockaddr_in->sin6_port;
+				break;
+			}
 
+			if (addr_type != IPV6_ADDR_ANY) {
+				int strict = __ipv6_addr_src_scope(addr_type) <= IPV6_ADDR_SCOPE_LINKLOCAL;
+
+				if (!ipv6_can_nonlocal_bind(net, inet_sk(sk)) &&
+				    !ipv6_chk_addr_and_flags(net,
+							     &sockaddr_in->sin6_addr,
+							     dev, !strict, 0,
+							     IFA_F_TENTATIVE) &&
+				    !ipv6_chk_acast_addr_src(net, dev,
+							     &sockaddr_in->sin6_addr))
+					err = -EINVAL;
+				else
+					fl6->saddr = sockaddr_in->sin6_addr;
+			}
+
+			if (err)
+				goto exit_f;
+
+			break;
+			}
 		case IPV6_FLOWINFO:
 			if (cmsg->cmsg_len < CMSG_LEN(4)) {
 				err = -EINVAL;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 6602a2e9cdb5..6121cbb71ad3 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1476,6 +1476,12 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	fl6->flowi6_uid = sk->sk_uid;
 
+	/* We use fl6's daddr and fl6_sport in the reverse sk_lookup done
+	 * within ip6_datagram_send_ctl() now.
+	 */
+	fl6->daddr = *daddr;
+	fl6->fl6_sport = inet->inet_sport;
+
 	if (msg->msg_controllen) {
 		opt = &opt_space;
 		memset(opt, 0, sizeof(struct ipv6_txoptions));
@@ -1511,10 +1517,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	fl6->flowi6_proto = sk->sk_protocol;
 	fl6->flowi6_mark = ipc6.sockc.mark;
-	fl6->daddr = *daddr;
 	if (ipv6_addr_any(&fl6->saddr) && !ipv6_addr_any(&np->saddr))
 		fl6->saddr = np->saddr;
-	fl6->fl6_sport = inet->inet_sport;
 
 	if (cgroup_bpf_enabled(CGROUP_UDP6_SENDMSG) && !connected) {
 		err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk,

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC PATCH v2 3/3] bpf: Add sk_lookup test to use ORIGDSTADDR cmsg.
  2024-09-20 17:02 [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address Tiago Lam
  2024-09-20 17:02 ` [RFC PATCH v2 1/3] ipv4: Support setting src port in sendmsg() Tiago Lam
  2024-09-20 17:02 ` [RFC PATCH v2 2/3] ipv6: " Tiago Lam
@ 2024-09-20 17:02 ` Tiago Lam
  2024-09-23 14:34 ` [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address Jakub Sitnicki
  3 siblings, 0 replies; 12+ messages in thread
From: Tiago Lam @ 2024-09-20 17:02 UTC (permalink / raw)
  To: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan
  Cc: netdev, linux-kernel, bpf, linux-kselftest, Jakub Sitnicki,
	Tiago Lam, kernel-team

This patch reuses the framework already in place for sk_lookup, allowing
it now to send a reply from the server fd directly, instead of having to
create a socket bound to the original destination address and reply from
there. It does this by passing the source address and port from where to
reply from in a IP_ORIGDSTADDR ancillary message passed in sendmsg.

Signed-off-by: Tiago Lam <tiagolam@cloudflare.com>
---
 tools/testing/selftests/bpf/prog_tests/sk_lookup.c | 67 +++++++++++++++-------
 1 file changed, 45 insertions(+), 22 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sk_lookup.c b/tools/testing/selftests/bpf/prog_tests/sk_lookup.c
index ae87c00867ba..df780624c16c 100644
--- a/tools/testing/selftests/bpf/prog_tests/sk_lookup.c
+++ b/tools/testing/selftests/bpf/prog_tests/sk_lookup.c
@@ -75,6 +75,7 @@ struct test {
 	struct inet_addr listen_at;
 	enum server accept_on;
 	bool reuseport_has_conns; /* Add a connected socket to reuseport group */
+	bool dont_bind_reply; /* Don't bind, send direct from server fd. */
 };
 
 struct cb_opts {
@@ -363,7 +364,7 @@ static void v4_to_v6(struct sockaddr_storage *ss)
 	memset(&v6->sin6_addr.s6_addr[0], 0, 10);
 }
 
-static int udp_recv_send(int server_fd)
+static int udp_recv_send(int server_fd, bool dont_bind_reply)
 {
 	char cmsg_buf[CMSG_SPACE(sizeof(struct sockaddr_storage))];
 	struct sockaddr_storage _src_addr = { 0 };
@@ -373,7 +374,7 @@ static int udp_recv_send(int server_fd)
 	struct iovec iov = { 0 };
 	struct cmsghdr *cm;
 	char buf[1];
-	int ret, fd;
+	int fd;
 	ssize_t n;
 
 	iov.iov_base = buf;
@@ -415,26 +416,37 @@ static int udp_recv_send(int server_fd)
 		v4_to_v6(dst_addr);
 	}
 
-	/* Reply from original destination address. */
-	fd = start_server_addr(SOCK_DGRAM, dst_addr, sizeof(*dst_addr), NULL);
-	if (!ASSERT_OK_FD(fd, "start_server_addr")) {
-		log_err("failed to create tx socket");
-		return -1;
-	}
+	if (dont_bind_reply) {
+		/* Reply directly from server fd, specifying the source address
+		 * and/or port in struct msghdr.
+		 */
+		n = sendmsg(server_fd, &msg, 0);
+		if (CHECK(n <= 0, "sendmsg", "failed\n")) {
+			log_err("failed to send echo reply");
+			return -1;
+		}
+	} else {
+		/* Reply from original destination address. */
+		fd = start_server_addr(SOCK_DGRAM, dst_addr, sizeof(*dst_addr),
+				       NULL);
+		if (!ASSERT_OK_FD(fd, "start_server_addr")) {
+			log_err("failed to create tx socket");
+			return -1;
+		}
 
-	msg.msg_control = NULL;
-	msg.msg_controllen = 0;
-	n = sendmsg(fd, &msg, 0);
-	if (CHECK(n <= 0, "sendmsg", "failed\n")) {
-		log_err("failed to send echo reply");
-		ret = -1;
-		goto out;
+		msg.msg_control = NULL;
+		msg.msg_controllen = 0;
+		n = sendmsg(fd, &msg, 0);
+		if (CHECK(n <= 0, "sendmsg", "failed\n")) {
+			log_err("failed to send echo reply");
+			close(fd);
+			return -1;
+		}
+
+		close(fd);
 	}
 
-	ret = 0;
-out:
-	close(fd);
-	return ret;
+	return 0;
 }
 
 static int tcp_echo_test(int client_fd, int server_fd)
@@ -454,14 +466,14 @@ static int tcp_echo_test(int client_fd, int server_fd)
 	return 0;
 }
 
-static int udp_echo_test(int client_fd, int server_fd)
+static int udp_echo_test(int client_fd, int server_fd, bool dont_bind_reply)
 {
 	int err;
 
 	err = send_byte(client_fd);
 	if (err)
 		return -1;
-	err = udp_recv_send(server_fd);
+	err = udp_recv_send(server_fd, dont_bind_reply);
 	if (err)
 		return -1;
 	err = recv_byte(client_fd);
@@ -653,7 +665,8 @@ static void run_lookup_prog(const struct test *t)
 	if (t->sotype == SOCK_STREAM)
 		tcp_echo_test(client_fd, server_fds[t->accept_on]);
 	else
-		udp_echo_test(client_fd, server_fds[t->accept_on]);
+		udp_echo_test(client_fd, server_fds[t->accept_on],
+			      t->dont_bind_reply);
 
 	close(client_fd);
 close:
@@ -775,6 +788,16 @@ static void test_redirect_lookup(struct test_sk_lookup *skel)
 			.listen_at	= { INT_IP4, INT_PORT },
 			.accept_on	= SERVER_B,
 		},
+		{
+			.desc		= "UDP IPv4 redir different ports",
+			.lookup_prog	= skel->progs.select_sock_a_no_reuseport,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+			.accept_on	= SERVER_A,
+			.dont_bind_reply = true,
+		},
 		{
 			.desc		= "UDP IPv4 redir and reuseport with conns",
 			.lookup_prog	= skel->progs.select_sock_a,

-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH v2 1/3] ipv4: Support setting src port in sendmsg().
  2024-09-20 17:02 ` [RFC PATCH v2 1/3] ipv4: Support setting src port in sendmsg() Tiago Lam
@ 2024-09-21  9:12   ` Willem de Bruijn
  2024-09-22 16:23   ` ericnetdev dumazet
  1 sibling, 0 replies; 12+ messages in thread
From: Willem de Bruijn @ 2024-09-21  9:12 UTC (permalink / raw)
  To: Tiago Lam, David S. Miller, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan
  Cc: netdev, linux-kernel, bpf, linux-kselftest, Jakub Sitnicki,
	Tiago Lam, kernel-team

Tiago Lam wrote:
> sendmsg() doesn't currently allow users to set the src port from which
> egress traffic should be sent from. This is possible if a user wants to
> configure the src address from which egress traffic should be sent from
> - with the IP_PKTINFO ancillary message, a user is currently able to
>   specify a source address to egress from when calling sendmsg().
> However, this still requires the user to set the IP_TRANSPARENT flag
> using setsockopt(), which happens to require special privileges in the
> case of IPv4.
> 
> To support users setting the src port for egress traffic when using
> sendmsg(), this patch extends the ancillary messages supported by
> sendmsg() to support the IP_ORIGDSTADDR ancillary message, reusing the
> same cmsg and struct used in recvmsg() - which already supports
> specifying a port.
> 
> Additionally, to avoid having to have special configurations, such as
> IP_TRANSPARENT, this patch allows egress traffic that's been configured
> using (the newly added) IP_ORIGDSTADDR to proceed if there's an ingress
> socket lookup (sk_lookup) that matches that traffic - by performing a
> reserve sk_lookup. Thus, if the sk_lookup reverse call returns a socket
> that matches the egress socket, we also let the egress traffic through -
> following the principle of, allowing return traffic to proceed if
> ingress traffic is allowed in. In case no match is found in the reverse
> sk_lookup, traffic falls back to the regular egress path.
> 
> This reverse lookup is only performed in case an sk_lookup ebpf program
> is attached and the source address and/or port for the return traffic
> have been modified using the (newly added) IP_ORIGDSTADDR in sendmsg.
> 
> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
> Signed-off-by: Tiago Lam <tiagolam@cloudflare.com>
> ---
>  include/net/ip.h       |  1 +
>  net/ipv4/ip_sockglue.c | 11 +++++++++++
>  net/ipv4/udp.c         | 35 ++++++++++++++++++++++++++++++++++-
>  3 files changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/ip.h b/include/net/ip.h
> index c5606cadb1a5..e5753abd7247 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -75,6 +75,7 @@ static inline unsigned int ip_hdrlen(const struct sk_buff *skb)
>  struct ipcm_cookie {
>  	struct sockcm_cookie	sockc;
>  	__be32			addr;
> +	__be16			port;
>  	int			oif;
>  	struct ip_options_rcu	*opt;
>  	__u8			protocol;
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index cf377377b52d..6e55bd25b5f7 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -297,6 +297,17 @@ int ip_cmsg_send(struct sock *sk, struct msghdr *msg, struct ipcm_cookie *ipc,
>  			ipc->addr = info->ipi_spec_dst.s_addr;
>  			break;
>  		}
> +		case IP_ORIGDSTADDR:

Should this just be IP_SRCADDR?

> +		{
> +			struct sockaddr_in *dst_addr;
> +
> +			if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct sockaddr_in)))
> +				return -EINVAL;
> +			dst_addr = (struct sockaddr_in *)CMSG_DATA(cmsg);
> +			ipc->port = dst_addr->sin_port;
> +			ipc->addr = dst_addr->sin_addr.s_addr;
> +			break;
> +		}
>  		case IP_TTL:
>  			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
>  				return -EINVAL;
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 49c622e743e8..208cee40c0ec 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1060,6 +1060,7 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  	DECLARE_SOCKADDR(struct sockaddr_in *, usin, msg->msg_name);
>  	struct flowi4 fl4_stack;
>  	struct flowi4 *fl4;
> +	__u8 flow_flags = inet_sk_flowi_flags(sk);
>  	int ulen = len;
>  	struct ipcm_cookie ipc;
>  	struct rtable *rt = NULL;
> @@ -1179,6 +1180,39 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  		}
>  	}
>  
> +	/* If we're egressing with a different source address and/or port, we
> +	 * perform a reverse socket lookup.  The rationale behind this is that
> +	 * we can allow return UDP traffic that has ingressed through sk_lookup
> +	 * to also egress correctly. In case this the reverse lookup fails.
> +	 *
> +	 * The lookup is performed if either source address and/or port
> +	 * changed, and neither is "0".
> +	 */
> +	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
> +	    !connected &&
> +	    (ipc.port && ipc.addr) &&
> +	    (inet->inet_saddr != ipc.addr || inet->inet_sport != ipc.port)) {
> +		struct sock *sk_egress;
> +
> +		bpf_sk_lookup_run_v4(sock_net(sk), IPPROTO_UDP,
> +				     daddr, dport, ipc.addr, ntohs(ipc.port),
> +				     1, &sk_egress);

Does this need to use a bpf helper rather than a normal route lookup
function?

I don't know this func, but the sk is returned without a reference
taken?

> +		if (IS_ERR_OR_NULL(sk_egress) || sk_egress != sk) {
> +			net_info_ratelimited("No reverse socket lookup match for local addr %pI4:%d remote addr %pI4:%d\n",
> +					     &ipc.addr, ntohs(ipc.port), &daddr,
> +					     ntohs(dport));

No need for logging to the kernel log when syscalls can just return an
error.

> +		} else {
> +			/* Override the source port to use with the one we got
> +			 * in cmsg, and tell routing to let us use a non-local
> +			 * address. Otherwise route lookups will fail with
> +			 * non-local source address when IP_TRANSPARENT isn't
> +			 * set.
> +			 */
> +			inet->inet_sport = ipc.port;
> +			flow_flags |= FLOWI_FLAG_ANYSRC;
> +		}
> +	}
> +
>  	saddr = ipc.addr;
>  	ipc.addr = faddr = daddr;
>  
> @@ -1223,7 +1257,6 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  
>  	if (!rt) {
>  		struct net *net = sock_net(sk);
> -		__u8 flow_flags = inet_sk_flowi_flags(sk);
>  
>  		fl4 = &fl4_stack;
>  
> 
> -- 
> 2.34.1
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH v2 2/3] ipv6: Support setting src port in sendmsg().
  2024-09-20 17:02 ` [RFC PATCH v2 2/3] ipv6: " Tiago Lam
@ 2024-09-21  9:13   ` Willem de Bruijn
  2024-09-23 13:08   ` David Laight
  1 sibling, 0 replies; 12+ messages in thread
From: Willem de Bruijn @ 2024-09-21  9:13 UTC (permalink / raw)
  To: Tiago Lam, David S. Miller, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan
  Cc: netdev, linux-kernel, bpf, linux-kselftest, Jakub Sitnicki,
	Tiago Lam, kernel-team

Tiago Lam wrote:
> This follows the same rationale provided for the ipv4 counterpart, where
> the sendmsg() path is also extended here to support the IPV6_ORIGDSTADDR
> ancillary message to be able to specify a source address/port. This
> allows users to configure the source address and/or port egress traffic
> should be sent from.
> 
> To limit its usage, a reverse socket lookup is performed to check if the
> configured egress source address and/or port have any ingress sk_lookup
> match. If it does, traffic is allowed to proceed, otherwise it falls
> back to the regular egress path.
> 
> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
> Signed-off-by: Tiago Lam <tiagolam@cloudflare.com>
> ---
>  net/ipv6/datagram.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  net/ipv6/udp.c      |  8 ++++--
>  2 files changed, 85 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
> index fff78496803d..369c64a478ec 100644
> --- a/net/ipv6/datagram.c
> +++ b/net/ipv6/datagram.c
> @@ -756,6 +756,29 @@ void ip6_datagram_recv_ctl(struct sock *sk, struct msghdr *msg,
>  }
>  EXPORT_SYMBOL_GPL(ip6_datagram_recv_ctl);
>  
> +static inline bool reverse_sk_lookup(struct flowi6 *fl6, struct sock *sk,
> +				     struct in6_addr *saddr, __be16 sport)
> +{
> +	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
> +	    (saddr && sport) &&
> +	    (ipv6_addr_cmp(&sk->sk_v6_rcv_saddr, saddr) ||
> +	    inet_sk(sk)->inet_sport != sport)) {
> +		struct sock *sk_egress;
> +
> +		bpf_sk_lookup_run_v6(sock_net(sk), IPPROTO_UDP, &fl6->daddr,
> +				     fl6->fl6_dport, saddr, ntohs(sport), 0,
> +				     &sk_egress);
> +		if (!IS_ERR_OR_NULL(sk_egress) && sk_egress == sk)
> +			return true;
> +
> +		net_info_ratelimited("No reverse socket lookup match for local addr %pI6:%d remote addr %pI6:%d\n",
> +				     &saddr, ntohs(sport), &fl6->daddr,
> +				     ntohs(fl6->fl6_dport));
> +	}
> +
> +	return false;
> +}
> +
>  int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
>  			  struct msghdr *msg, struct flowi6 *fl6,
>  			  struct ipcm6_cookie *ipc6)
> @@ -844,7 +867,63 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
>  
>  			break;
>  		    }
> +		case IPV6_ORIGDSTADDR:
> +			{
> +			struct sockaddr_in6 *sockaddr_in;
> +			struct net_device *dev = NULL;
> +
> +			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct sockaddr_in6))) {
> +				err = -EINVAL;
> +				goto exit_f;
> +			}
> +
> +			sockaddr_in = (struct sockaddr_in6 *)CMSG_DATA(cmsg);
> +
> +			addr_type = __ipv6_addr_type(&sockaddr_in->sin6_addr);
> +
> +			if (addr_type & IPV6_ADDR_LINKLOCAL)
> +				return -EINVAL;
> +
> +			/* If we're egressing with a different source address
> +			 * and/or port, we perform a reverse socket lookup. The
> +			 * rationale behind this is that we can allow return
> +			 * UDP traffic that has ingressed through sk_lookup to
> +			 * also egress correctly. In case the reverse lookup
> +			 * fails, we continue with the normal path.
> +			 *
> +			 * The lookup is performed if either source address
> +			 * and/or port changed, and neither is "0".
> +			 */
> +			if (reverse_sk_lookup(fl6, sk, &sockaddr_in->sin6_addr,
> +					      sockaddr_in->sin6_port)) {
> +				/* Override the source port and address to use
> +				 * with the one we got in cmsg and bail early.
> +				 */
> +				fl6->saddr = sockaddr_in->sin6_addr;
> +				fl6->fl6_sport = sockaddr_in->sin6_port;
> +				break;
> +			}
>  
> +			if (addr_type != IPV6_ADDR_ANY) {
> +				int strict = __ipv6_addr_src_scope(addr_type) <= IPV6_ADDR_SCOPE_LINKLOCAL;
> +
> +				if (!ipv6_can_nonlocal_bind(net, inet_sk(sk)) &&
> +				    !ipv6_chk_addr_and_flags(net,
> +							     &sockaddr_in->sin6_addr,
> +							     dev, !strict, 0,
> +							     IFA_F_TENTATIVE) &&
> +				    !ipv6_chk_acast_addr_src(net, dev,
> +							     &sockaddr_in->sin6_addr))
> +					err = -EINVAL;
> +				else
> +					fl6->saddr = sockaddr_in->sin6_addr;
> +			}
> +
> +			if (err)
> +				goto exit_f;
> +
> +			break;
> +			}

How come IPv6 runs the check in the cmsg handler, but ipv4 in sendmsg
directly? Can the two be symmetric?

>  		case IPV6_FLOWINFO:
>  			if (cmsg->cmsg_len < CMSG_LEN(4)) {
>  				err = -EINVAL;
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 6602a2e9cdb5..6121cbb71ad3 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -1476,6 +1476,12 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  
>  	fl6->flowi6_uid = sk->sk_uid;
>  
> +	/* We use fl6's daddr and fl6_sport in the reverse sk_lookup done
> +	 * within ip6_datagram_send_ctl() now.
> +	 */
> +	fl6->daddr = *daddr;
> +	fl6->fl6_sport = inet->inet_sport;
> +
>  	if (msg->msg_controllen) {
>  		opt = &opt_space;
>  		memset(opt, 0, sizeof(struct ipv6_txoptions));
> @@ -1511,10 +1517,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  
>  	fl6->flowi6_proto = sk->sk_protocol;
>  	fl6->flowi6_mark = ipc6.sockc.mark;
> -	fl6->daddr = *daddr;
>  	if (ipv6_addr_any(&fl6->saddr) && !ipv6_addr_any(&np->saddr))
>  		fl6->saddr = np->saddr;
> -	fl6->fl6_sport = inet->inet_sport;
>  
>  	if (cgroup_bpf_enabled(CGROUP_UDP6_SENDMSG) && !connected) {
>  		err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk,
> 
> -- 
> 2.34.1
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH v2 1/3] ipv4: Support setting src port in sendmsg().
  2024-09-20 17:02 ` [RFC PATCH v2 1/3] ipv4: Support setting src port in sendmsg() Tiago Lam
  2024-09-21  9:12   ` Willem de Bruijn
@ 2024-09-22 16:23   ` ericnetdev dumazet
  1 sibling, 0 replies; 12+ messages in thread
From: ericnetdev dumazet @ 2024-09-22 16:23 UTC (permalink / raw)
  To: Tiago Lam
  Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, netdev, linux-kernel, bpf, linux-kselftest,
	Jakub Sitnicki, kernel-team

Le ven. 20 sept. 2024 à 19:03, Tiago Lam <tiagolam@cloudflare.com> a écrit :
>
> sendmsg() doesn't currently allow users to set the src port from which
> egress traffic should be sent from. This is possible if a user wants to
> configure the src address from which egress traffic should be sent from
> - with the IP_PKTINFO ancillary message, a user is currently able to
>   specify a source address to egress from when calling sendmsg().
> However, this still requires the user to set the IP_TRANSPARENT flag
> using setsockopt(), which happens to require special privileges in the
> case of IPv4.
>
> To support users setting the src port for egress traffic when using
> sendmsg(), this patch extends the ancillary messages supported by
> sendmsg() to support the IP_ORIGDSTADDR ancillary message, reusing the
> same cmsg and struct used in recvmsg() - which already supports
> specifying a port.
>
> Additionally, to avoid having to have special configurations, such as
> IP_TRANSPARENT, this patch allows egress traffic that's been configured
> using (the newly added) IP_ORIGDSTADDR to proceed if there's an ingress
> socket lookup (sk_lookup) that matches that traffic - by performing a
> reserve sk_lookup. Thus, if the sk_lookup reverse call returns a socket
> that matches the egress socket, we also let the egress traffic through -
> following the principle of, allowing return traffic to proceed if
> ingress traffic is allowed in. In case no match is found in the reverse
> sk_lookup, traffic falls back to the regular egress path.
>
> This reverse lookup is only performed in case an sk_lookup ebpf program
> is attached and the source address and/or port for the return traffic
> have been modified using the (newly added) IP_ORIGDSTADDR in sendmsg.

Is it compatible with SO_REUSEPORT ?

Most heavy duty UDP servers use SO_REUSEPORT to spread incoming
packets to multiple sockets.

How is the reverse lookup going to choose the 'right' socket ?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [RFC PATCH v2 2/3] ipv6: Support setting src port in sendmsg().
  2024-09-20 17:02 ` [RFC PATCH v2 2/3] ipv6: " Tiago Lam
  2024-09-21  9:13   ` Willem de Bruijn
@ 2024-09-23 13:08   ` David Laight
  2024-09-23 14:56     ` Jakub Sitnicki
  1 sibling, 1 reply; 12+ messages in thread
From: David Laight @ 2024-09-23 13:08 UTC (permalink / raw)
  To: 'Tiago Lam', David S. Miller, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Jakub Sitnicki, kernel-team@cloudflare.com

From: Tiago Lam <tiagolam@cloudflare.com>
> Sent: 20 September 2024 18:02
> 
> This follows the same rationale provided for the ipv4 counterpart, where
> the sendmsg() path is also extended here to support the IPV6_ORIGDSTADDR
> ancillary message to be able to specify a source address/port. This
> allows users to configure the source address and/or port egress traffic
> should be sent from.

I'd missed that being added - could save us the horrid problem of getting
the UDP checksum correct when sending UDP over a raw socket.
(That isn't a problem for IPv6.)

> To limit its usage, a reverse socket lookup is performed to check if the
> configured egress source address and/or port have any ingress sk_lookup
> match. If it does, traffic is allowed to proceed, otherwise it falls
> back to the regular egress path.

Is that really useful/necessary?
The check (but not the commit message) implies that some 'bpf thingy'
also needs to be enabled.
Any check would need to include the test that the program sending the packet
has the ability to send a packet through the ingress socket.
Additionally a check for the sending process having (IIRC) CAP_NET_ADMIN
(which would let the process send the message by other means) would save the
slow path.

The code we have sends a lot of UDP RTP (typically 160 bytes of audio every 20ms).
There is actually no reason for there to be a valid matching ingress path.
(That code would benefit from being able to bind a lot of ports to the same
UDP socket.)

	David

> 
> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
> Signed-off-by: Tiago Lam <tiagolam@cloudflare.com>
> ---
>  net/ipv6/datagram.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  net/ipv6/udp.c      |  8 ++++--
>  2 files changed, 85 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
> index fff78496803d..369c64a478ec 100644
> --- a/net/ipv6/datagram.c
> +++ b/net/ipv6/datagram.c
> @@ -756,6 +756,29 @@ void ip6_datagram_recv_ctl(struct sock *sk, struct msghdr *msg,
>  }
>  EXPORT_SYMBOL_GPL(ip6_datagram_recv_ctl);
> 
> +static inline bool reverse_sk_lookup(struct flowi6 *fl6, struct sock *sk,
> +				     struct in6_addr *saddr, __be16 sport)
> +{
> +	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
> +	    (saddr && sport) &&
> +	    (ipv6_addr_cmp(&sk->sk_v6_rcv_saddr, saddr) ||
> +	    inet_sk(sk)->inet_sport != sport)) {
> +		struct sock *sk_egress;
> +
> +		bpf_sk_lookup_run_v6(sock_net(sk), IPPROTO_UDP, &fl6->daddr,
> +				     fl6->fl6_dport, saddr, ntohs(sport), 0,
> +				     &sk_egress);
> +		if (!IS_ERR_OR_NULL(sk_egress) && sk_egress == sk)
> +			return true;
> +
> +		net_info_ratelimited("No reverse socket lookup match for local addr %pI6:%d remote addr
> %pI6:%d\n",
> +				     &saddr, ntohs(sport), &fl6->daddr,
> +				     ntohs(fl6->fl6_dport));
> +	}
> +
> +	return false;
> +}
> +
>  int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
>  			  struct msghdr *msg, struct flowi6 *fl6,
>  			  struct ipcm6_cookie *ipc6)
> @@ -844,7 +867,63 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
> 
>  			break;
>  		    }
> +		case IPV6_ORIGDSTADDR:
> +			{
> +			struct sockaddr_in6 *sockaddr_in;
> +			struct net_device *dev = NULL;
> +
> +			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct sockaddr_in6))) {
> +				err = -EINVAL;
> +				goto exit_f;
> +			}
> +
> +			sockaddr_in = (struct sockaddr_in6 *)CMSG_DATA(cmsg);
> +
> +			addr_type = __ipv6_addr_type(&sockaddr_in->sin6_addr);
> +
> +			if (addr_type & IPV6_ADDR_LINKLOCAL)
> +				return -EINVAL;
> +
> +			/* If we're egressing with a different source address
> +			 * and/or port, we perform a reverse socket lookup. The
> +			 * rationale behind this is that we can allow return
> +			 * UDP traffic that has ingressed through sk_lookup to
> +			 * also egress correctly. In case the reverse lookup
> +			 * fails, we continue with the normal path.
> +			 *
> +			 * The lookup is performed if either source address
> +			 * and/or port changed, and neither is "0".
> +			 */
> +			if (reverse_sk_lookup(fl6, sk, &sockaddr_in->sin6_addr,
> +					      sockaddr_in->sin6_port)) {
> +				/* Override the source port and address to use
> +				 * with the one we got in cmsg and bail early.
> +				 */
> +				fl6->saddr = sockaddr_in->sin6_addr;
> +				fl6->fl6_sport = sockaddr_in->sin6_port;
> +				break;
> +			}
> 
> +			if (addr_type != IPV6_ADDR_ANY) {
> +				int strict = __ipv6_addr_src_scope(addr_type) <= IPV6_ADDR_SCOPE_LINKLOCAL;
> +
> +				if (!ipv6_can_nonlocal_bind(net, inet_sk(sk)) &&
> +				    !ipv6_chk_addr_and_flags(net,
> +							     &sockaddr_in->sin6_addr,
> +							     dev, !strict, 0,
> +							     IFA_F_TENTATIVE) &&
> +				    !ipv6_chk_acast_addr_src(net, dev,
> +							     &sockaddr_in->sin6_addr))
> +					err = -EINVAL;
> +				else
> +					fl6->saddr = sockaddr_in->sin6_addr;
> +			}
> +
> +			if (err)
> +				goto exit_f;
> +
> +			break;
> +			}
>  		case IPV6_FLOWINFO:
>  			if (cmsg->cmsg_len < CMSG_LEN(4)) {
>  				err = -EINVAL;
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 6602a2e9cdb5..6121cbb71ad3 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -1476,6 +1476,12 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> 
>  	fl6->flowi6_uid = sk->sk_uid;
> 
> +	/* We use fl6's daddr and fl6_sport in the reverse sk_lookup done
> +	 * within ip6_datagram_send_ctl() now.
> +	 */
> +	fl6->daddr = *daddr;
> +	fl6->fl6_sport = inet->inet_sport;
> +
>  	if (msg->msg_controllen) {
>  		opt = &opt_space;
>  		memset(opt, 0, sizeof(struct ipv6_txoptions));
> @@ -1511,10 +1517,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> 
>  	fl6->flowi6_proto = sk->sk_protocol;
>  	fl6->flowi6_mark = ipc6.sockc.mark;
> -	fl6->daddr = *daddr;
>  	if (ipv6_addr_any(&fl6->saddr) && !ipv6_addr_any(&np->saddr))
>  		fl6->saddr = np->saddr;
> -	fl6->fl6_sport = inet->inet_sport;
> 
>  	if (cgroup_bpf_enabled(CGROUP_UDP6_SENDMSG) && !connected) {
>  		err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk,
> 
> --
> 2.34.1
> 

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address.
  2024-09-20 17:02 [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address Tiago Lam
                   ` (2 preceding siblings ...)
  2024-09-20 17:02 ` [RFC PATCH v2 3/3] bpf: Add sk_lookup test to use ORIGDSTADDR cmsg Tiago Lam
@ 2024-09-23 14:34 ` Jakub Sitnicki
  3 siblings, 0 replies; 12+ messages in thread
From: Jakub Sitnicki @ 2024-09-23 14:34 UTC (permalink / raw)
  To: Tiago Lam
  Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Willem de Bruijn, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, netdev, linux-kernel, bpf, linux-kselftest,
	kernel-team

Just an FYI to the reviewers -

Tiago is out this week, so his reponses will be delayed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH v2 2/3] ipv6: Support setting src port in sendmsg().
  2024-09-23 13:08   ` David Laight
@ 2024-09-23 14:56     ` Jakub Sitnicki
  2024-09-23 15:45       ` David Laight
  0 siblings, 1 reply; 12+ messages in thread
From: Jakub Sitnicki @ 2024-09-23 14:56 UTC (permalink / raw)
  To: David Laight, Martin KaFai Lau, 'Tiago Lam', Eric Dumazet,
	Willem de Bruijn
  Cc: David S. Miller, David Ahern, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kernel-team@cloudflare.com

On Mon, Sep 23, 2024 at 01:08 PM GMT, David Laight wrote:
> From: Tiago Lam <tiagolam@cloudflare.com>

[...]

>> To limit its usage, a reverse socket lookup is performed to check if the
>> configured egress source address and/or port have any ingress sk_lookup
>> match. If it does, traffic is allowed to proceed, otherwise it falls
>> back to the regular egress path.
>
> Is that really useful/necessary?

We've been asking ourselves the same question during Plumbers with
Martin.

Unprivileges processes can already source UDP traffic from (almost) any
IP & port by binding a socket to the desired source port and passing
IP_PKTINFO. So perhaps having a reverse socket lookup is an overkill.

We should probably respect net.ipv4.ip_local_reserved_ports and
net.ipv4.ip_unprivileged_port_start system settings, though, or check
for relevant caps.

Open question if it is acceptable to disregard exclusive UDP port
ownership by sockets binding to a wildcard address without SO_REUSEADDR?

[...]





> The check (but not the commit message) implies that some 'bpf thingy'
> also needs to be enabled.
> Any check would need to include the test that the program sending the packet
> has the ability to send a packet through the ingress socket.
> Additionally a check for the sending process having (IIRC) CAP_NET_ADMIN
> (which would let the process send the message by other means) would save the
> slow path.
>
> The code we have sends a lot of UDP RTP (typically 160 bytes of audio every 20ms).
> There is actually no reason for there to be a valid matching ingress path.
> (That code would benefit from being able to bind a lot of ports to the same
> UDP socket.)
>
> 	David
>
>> 
>> Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
>> Signed-off-by: Tiago Lam <tiagolam@cloudflare.com>
>> ---
>>  net/ipv6/datagram.c | 79 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  net/ipv6/udp.c      |  8 ++++--
>>  2 files changed, 85 insertions(+), 2 deletions(-)
>> 
>> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
>> index fff78496803d..369c64a478ec 100644
>> --- a/net/ipv6/datagram.c
>> +++ b/net/ipv6/datagram.c
>> @@ -756,6 +756,29 @@ void ip6_datagram_recv_ctl(struct sock *sk, struct msghdr *msg,
>>  }
>>  EXPORT_SYMBOL_GPL(ip6_datagram_recv_ctl);
>> 
>> +static inline bool reverse_sk_lookup(struct flowi6 *fl6, struct sock *sk,
>> +				     struct in6_addr *saddr, __be16 sport)
>> +{
>> +	if (static_branch_unlikely(&bpf_sk_lookup_enabled) &&
>> +	    (saddr && sport) &&
>> +	    (ipv6_addr_cmp(&sk->sk_v6_rcv_saddr, saddr) ||
>> +	    inet_sk(sk)->inet_sport != sport)) {
>> +		struct sock *sk_egress;
>> +
>> +		bpf_sk_lookup_run_v6(sock_net(sk), IPPROTO_UDP, &fl6->daddr,
>> +				     fl6->fl6_dport, saddr, ntohs(sport), 0,
>> +				     &sk_egress);
>> +		if (!IS_ERR_OR_NULL(sk_egress) && sk_egress == sk)
>> +			return true;
>> +
>> +		net_info_ratelimited("No reverse socket lookup match for local addr %pI6:%d remote addr
>> %pI6:%d\n",
>> +				     &saddr, ntohs(sport), &fl6->daddr,
>> +				     ntohs(fl6->fl6_dport));
>> +	}
>> +
>> +	return false;
>> +}
>> +
>>  int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
>>  			  struct msghdr *msg, struct flowi6 *fl6,
>>  			  struct ipcm6_cookie *ipc6)
>> @@ -844,7 +867,63 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
>> 
>>  			break;
>>  		    }
>> +		case IPV6_ORIGDSTADDR:
>> +			{
>> +			struct sockaddr_in6 *sockaddr_in;
>> +			struct net_device *dev = NULL;
>> +
>> +			if (cmsg->cmsg_len < CMSG_LEN(sizeof(struct sockaddr_in6))) {
>> +				err = -EINVAL;
>> +				goto exit_f;
>> +			}
>> +
>> +			sockaddr_in = (struct sockaddr_in6 *)CMSG_DATA(cmsg);
>> +
>> +			addr_type = __ipv6_addr_type(&sockaddr_in->sin6_addr);
>> +
>> +			if (addr_type & IPV6_ADDR_LINKLOCAL)
>> +				return -EINVAL;
>> +
>> +			/* If we're egressing with a different source address
>> +			 * and/or port, we perform a reverse socket lookup. The
>> +			 * rationale behind this is that we can allow return
>> +			 * UDP traffic that has ingressed through sk_lookup to
>> +			 * also egress correctly. In case the reverse lookup
>> +			 * fails, we continue with the normal path.
>> +			 *
>> +			 * The lookup is performed if either source address
>> +			 * and/or port changed, and neither is "0".
>> +			 */
>> +			if (reverse_sk_lookup(fl6, sk, &sockaddr_in->sin6_addr,
>> +					      sockaddr_in->sin6_port)) {
>> +				/* Override the source port and address to use
>> +				 * with the one we got in cmsg and bail early.
>> +				 */
>> +				fl6->saddr = sockaddr_in->sin6_addr;
>> +				fl6->fl6_sport = sockaddr_in->sin6_port;
>> +				break;
>> +			}
>> 
>> +			if (addr_type != IPV6_ADDR_ANY) {
>> +				int strict = __ipv6_addr_src_scope(addr_type) <= IPV6_ADDR_SCOPE_LINKLOCAL;
>> +
>> +				if (!ipv6_can_nonlocal_bind(net, inet_sk(sk)) &&
>> +				    !ipv6_chk_addr_and_flags(net,
>> +							     &sockaddr_in->sin6_addr,
>> +							     dev, !strict, 0,
>> +							     IFA_F_TENTATIVE) &&
>> +				    !ipv6_chk_acast_addr_src(net, dev,
>> +							     &sockaddr_in->sin6_addr))
>> +					err = -EINVAL;
>> +				else
>> +					fl6->saddr = sockaddr_in->sin6_addr;
>> +			}
>> +
>> +			if (err)
>> +				goto exit_f;
>> +
>> +			break;
>> +			}
>>  		case IPV6_FLOWINFO:
>>  			if (cmsg->cmsg_len < CMSG_LEN(4)) {
>>  				err = -EINVAL;
>> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
>> index 6602a2e9cdb5..6121cbb71ad3 100644
>> --- a/net/ipv6/udp.c
>> +++ b/net/ipv6/udp.c
>> @@ -1476,6 +1476,12 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>> 
>>  	fl6->flowi6_uid = sk->sk_uid;
>> 
>> +	/* We use fl6's daddr and fl6_sport in the reverse sk_lookup done
>> +	 * within ip6_datagram_send_ctl() now.
>> +	 */
>> +	fl6->daddr = *daddr;
>> +	fl6->fl6_sport = inet->inet_sport;
>> +
>>  	if (msg->msg_controllen) {
>>  		opt = &opt_space;
>>  		memset(opt, 0, sizeof(struct ipv6_txoptions));
>> @@ -1511,10 +1517,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>> 
>>  	fl6->flowi6_proto = sk->sk_protocol;
>>  	fl6->flowi6_mark = ipc6.sockc.mark;
>> -	fl6->daddr = *daddr;
>>  	if (ipv6_addr_any(&fl6->saddr) && !ipv6_addr_any(&np->saddr))
>>  		fl6->saddr = np->saddr;
>> -	fl6->fl6_sport = inet->inet_sport;
>> 
>>  	if (cgroup_bpf_enabled(CGROUP_UDP6_SENDMSG) && !connected) {
>>  		err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk,
>> 
>> --
>> 2.34.1
>> 
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [RFC PATCH v2 2/3] ipv6: Support setting src port in sendmsg().
  2024-09-23 14:56     ` Jakub Sitnicki
@ 2024-09-23 15:45       ` David Laight
  2024-09-23 16:44         ` Jakub Sitnicki
  0 siblings, 1 reply; 12+ messages in thread
From: David Laight @ 2024-09-23 15:45 UTC (permalink / raw)
  To: 'Jakub Sitnicki', Martin KaFai Lau, 'Tiago Lam',
	Eric Dumazet, Willem de Bruijn
  Cc: David S. Miller, David Ahern, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kernel-team@cloudflare.com

From: Jakub Sitnicki
> Sent: 23 September 2024 15:56
> 
> On Mon, Sep 23, 2024 at 01:08 PM GMT, David Laight wrote:
> > From: Tiago Lam <tiagolam@cloudflare.com>
> 
> [...]
> 
> >> To limit its usage, a reverse socket lookup is performed to check if the
> >> configured egress source address and/or port have any ingress sk_lookup
> >> match. If it does, traffic is allowed to proceed, otherwise it falls
> >> back to the regular egress path.
> >
> > Is that really useful/necessary?
> 
> We've been asking ourselves the same question during Plumbers with
> Martin.
> 
> Unprivileges processes can already source UDP traffic from (almost) any
> IP & port by binding a socket to the desired source port and passing
> IP_PKTINFO. So perhaps having a reverse socket lookup is an overkill.

Traditionally you'd need to bind to the source port on any local IP
(or the wildcard IP) that didn't have another socket bound to that port.
Modern Linux might have more restrictions and SO_REUSADDR muddies things.

And I don't think you can do a connect() on an unbound UDP socket to
set the source port at the same time as the destination IP+port.
(That would actually be useful.)

OTOH if you just want to send a UDP message you can just use another
system on the same network.
You might need to spoof the source mac - but that isn't hard (although
it might confuse any ethernet switches).

> We should probably respect net.ipv4.ip_local_reserved_ports and
> net.ipv4.ip_unprivileged_port_start system settings, though, or check
> for relevant caps.

True.

> Open question if it is acceptable to disregard exclusive UDP port
> ownership by sockets binding to a wildcard address without SO_REUSEADDR?

We've often suffered from the opposite - a program binds to the wildcard
IP and everything works until something else binds to the same port and
a specific local IP.
I'm sure this is grief some on both TCP and UDP - especially since you
often need to set SO_REUSADDR to stop other things breaking.

	David
 

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [RFC PATCH v2 2/3] ipv6: Support setting src port in sendmsg().
  2024-09-23 15:45       ` David Laight
@ 2024-09-23 16:44         ` Jakub Sitnicki
  0 siblings, 0 replies; 12+ messages in thread
From: Jakub Sitnicki @ 2024-09-23 16:44 UTC (permalink / raw)
  To: David Laight
  Cc: Martin KaFai Lau, 'Tiago Lam', Eric Dumazet,
	Willem de Bruijn, David S. Miller, David Ahern, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Mykola Lysenko,
	Shuah Khan, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kernel-team@cloudflare.com

On Mon, Sep 23, 2024 at 03:45 PM GMT, David Laight wrote:
> From: Jakub Sitnicki
>> Sent: 23 September 2024 15:56
>> 
>> On Mon, Sep 23, 2024 at 01:08 PM GMT, David Laight wrote:
>> > From: Tiago Lam <tiagolam@cloudflare.com>
>> 
>> [...]
>> 
>> >> To limit its usage, a reverse socket lookup is performed to check if the
>> >> configured egress source address and/or port have any ingress sk_lookup
>> >> match. If it does, traffic is allowed to proceed, otherwise it falls
>> >> back to the regular egress path.
>> >
>> > Is that really useful/necessary?
>> 
>> We've been asking ourselves the same question during Plumbers with
>> Martin.
>> 
>> Unprivileges processes can already source UDP traffic from (almost) any
>> IP & port by binding a socket to the desired source port and passing
>> IP_PKTINFO. So perhaps having a reverse socket lookup is an overkill.
>
> Traditionally you'd need to bind to the source port on any local IP
> (or the wildcard IP) that didn't have another socket bound to that port.

Right. Linux IP_PKTINFO extension relaxes this requirement. You can bind
to some local IP (whichever is free, plently to choose from in 127/8
local subnet), and specify the source address to use OOB at sendmsg()
time (as long as the address is local to the host, otherwise you need
additional capabilities).

> Modern Linux might have more restrictions and SO_REUSADDR muddies things.
>
> And I don't think you can do a connect() on an unbound UDP socket to
> set the source port at the same time as the destination IP+port.
> (That would actually be useful.)

You can. It's somewhat recent (v6.3+) [1]:

https://manpages.debian.org/unstable/manpages/ip.7.en.html#IP_LOCAL_PORT_RANGE

It's not on par with TCP when it comes to local port sharing because we
hash UDP sockets only by 2-tuple. Though, some effort to improve that is
taking place I see.

The recipe is:

1. delay the auto-bind until connect() time with IP_BIND_ADDRESS_NO_PORT
   socket option, and
2. tell the udp stack to consider only a single local port during the
   free port search with IP_LOCAL_PORT_RANGE option.

That amounts to something like (in pseudocode):

  s = socket(AF_INET, SOCK_DGRAM)
  s.setsockopt(SOL_IP, IP_BIND_ADDRESS_NO_PORT, 1)
  s.setsockopt(SOL_IP, IP_LOCAL_PORT_RANGE, 44_444 << 16 | 44_444)
  s.bind(("192.0.2.42", 0))
  s.connect(("1.1.1.1", 53))

You can combine it with SO_REUSEADDR to share the local address between
sockets, but you have to ensure manually that you don't run into
conflicts between sockets (two sockets using the same 4-tuple). That's
something we're hoping to improve in the future.

> OTOH if you just want to send a UDP message you can just use another
> system on the same network.
> You might need to spoof the source mac - but that isn't hard (although
> it might confuse any ethernet switches).
>
>> We should probably respect net.ipv4.ip_local_reserved_ports and
>> net.ipv4.ip_unprivileged_port_start system settings, though, or check
>> for relevant caps.
>
> True.
>
>> Open question if it is acceptable to disregard exclusive UDP port
>> ownership by sockets binding to a wildcard address without SO_REUSEADDR?
>
> We've often suffered from the opposite - a program binds to the wildcard
> IP and everything works until something else binds to the same port and
> a specific local IP.

Let me see if I understand - what would happen today for UDP is:

app #1 - bind(("0.0.0.0", 53)) -> OK
app #2 - bind(("192.0.2.1", 53)) -> EADDRINUSE

... unless both are setting SO_REUSEADDR (or SO_REUSEPORT and run under
same UID).

That is why if we allow selecting the source port at sendmsg() time, we
would be relaxing the existing UDP port ownership guarantees for
wildcard binds.

Perhaps this merits a sysctl, so the admin can decide if it is an
acceptable trade-off in their environment.

> I'm sure this is grief some on both TCP and UDP - especially since you
> often need to set SO_REUSADDR to stop other things breaking.
>
> 	David
>  
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-09-23 16:44 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-20 17:02 [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address Tiago Lam
2024-09-20 17:02 ` [RFC PATCH v2 1/3] ipv4: Support setting src port in sendmsg() Tiago Lam
2024-09-21  9:12   ` Willem de Bruijn
2024-09-22 16:23   ` ericnetdev dumazet
2024-09-20 17:02 ` [RFC PATCH v2 2/3] ipv6: " Tiago Lam
2024-09-21  9:13   ` Willem de Bruijn
2024-09-23 13:08   ` David Laight
2024-09-23 14:56     ` Jakub Sitnicki
2024-09-23 15:45       ` David Laight
2024-09-23 16:44         ` Jakub Sitnicki
2024-09-20 17:02 ` [RFC PATCH v2 3/3] bpf: Add sk_lookup test to use ORIGDSTADDR cmsg Tiago Lam
2024-09-23 14:34 ` [RFC PATCH v2 0/3] Allow sk_lookup UDP return traffic to egress when setting src port/address Jakub Sitnicki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).