All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements
@ 2021-06-15  2:13 Cong Wang
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 1/8] skmsg: improve udp_bpf_recvmsg() accuracy Cong Wang
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Cong Wang @ 2021-06-15  2:13 UTC (permalink / raw)
  To: netdev; +Cc: bpf, Cong Wang, John Fastabend

From: Cong Wang <cong.wang@bytedance.com>

This patchset contains a few bug fixes and improvements for sock_map.

Patch 1 improves recvmsg() accuracy for UDP, patch 2 improves UDP
non-blocking read() by retrying on EAGAIN. With both of them, the
failure rate of the UDP test case goes down from 10% to 1%.

Patch 3 is memory leak fix I posted, no change since v1. The rest
patches address similar memory leaks or improve error handling,
including one increases sk_drops counter for error cases. Please
check each patch description for more details.

Acked-by: John Fastabend <john.fastabend@gmail.com>

---
Resend this patchset as it is lost after John's review.

v3: add another bug fix as patch 4
    update patch 5 accordingly
    address John's review on the last patch
    fix a few typos in patch descriptions

v2: group all patches together
    set max for retries of EAGAIN

Cong Wang (8):
  skmsg: improve udp_bpf_recvmsg() accuracy
  selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  udp: fix a memory leak in udp_read_sock()
  skmsg: clear skb redirect pointer before dropping it
  skmsg: fix a memory leak in sk_psock_verdict_apply()
  skmsg: teach sk_psock_verdict_apply() to return errors
  skmsg: pass source psock to sk_psock_skb_redirect()
  skmsg: increase sk->sk_drops when dropping packets

 include/linux/skmsg.h                         |  2 -
 net/core/skmsg.c                              | 82 +++++++++----------
 net/ipv4/tcp_bpf.c                            | 24 +++++-
 net/ipv4/udp.c                                |  2 +
 net/ipv4/udp_bpf.c                            | 47 +++++++++--
 .../selftests/bpf/prog_tests/sockmap_listen.c |  7 +-
 6 files changed, 112 insertions(+), 52 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH RESEND bpf v3 1/8] skmsg: improve udp_bpf_recvmsg() accuracy
  2021-06-15  2:13 [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Cong Wang
@ 2021-06-15  2:13 ` Cong Wang
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 2/8] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected() Cong Wang
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2021-06-15  2:13 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

I tried to reuse sk_msg_wait_data() for different protocols,
but it turns out it can not be simply reused. For example,
UDP actually uses two queues to receive skb:
udp_sk(sk)->reader_queue and sk->sk_receive_queue. So we have
to check both of them to know whether we have received any
packet.

Also, UDP does not lock the sock during BH Rx path, it makes
no sense for its ->recvmsg() to lock the sock. It is always
possible for ->recvmsg() to be called before packets actually
arrive in the receive queue, we just use best effort to make
it accurate here.

Fixes: 1f5be6b3b063 ("udp: Implement udp_bpf_recvmsg() for sockmap")
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 include/linux/skmsg.h |  2 --
 net/core/skmsg.c      | 23 ---------------------
 net/ipv4/tcp_bpf.c    | 24 +++++++++++++++++++++-
 net/ipv4/udp_bpf.c    | 47 ++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index aba0f0f429be..e3d080c299f6 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -126,8 +126,6 @@ int sk_msg_zerocopy_from_iter(struct sock *sk, struct iov_iter *from,
 			      struct sk_msg *msg, u32 bytes);
 int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 			     struct sk_msg *msg, u32 bytes);
-int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags,
-		     long timeo, int *err);
 int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 		   int len, int flags);
 
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 43ce17a6a585..f9a81b314e4c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -399,29 +399,6 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 }
 EXPORT_SYMBOL_GPL(sk_msg_memcopy_from_iter);
 
-int sk_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags,
-		     long timeo, int *err)
-{
-	DEFINE_WAIT_FUNC(wait, woken_wake_function);
-	int ret = 0;
-
-	if (sk->sk_shutdown & RCV_SHUTDOWN)
-		return 1;
-
-	if (!timeo)
-		return ret;
-
-	add_wait_queue(sk_sleep(sk), &wait);
-	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
-	ret = sk_wait_event(sk, &timeo,
-			    !list_empty(&psock->ingress_msg) ||
-			    !skb_queue_empty(&sk->sk_receive_queue), &wait);
-	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
-	remove_wait_queue(sk_sleep(sk), &wait);
-	return ret;
-}
-EXPORT_SYMBOL_GPL(sk_msg_wait_data);
-
 /* Receive sk_msg from psock->ingress_msg to @msg. */
 int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
 		   int len, int flags)
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index ad9d17923fc5..bb49b52d7be8 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -163,6 +163,28 @@ static bool tcp_bpf_stream_read(const struct sock *sk)
 	return !empty;
 }
 
+static int tcp_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags,
+			     long timeo, int *err)
+{
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
+	int ret = 0;
+
+	if (sk->sk_shutdown & RCV_SHUTDOWN)
+		return 1;
+
+	if (!timeo)
+		return ret;
+
+	add_wait_queue(sk_sleep(sk), &wait);
+	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	ret = sk_wait_event(sk, &timeo,
+			    !list_empty(&psock->ingress_msg) ||
+			    !skb_queue_empty(&sk->sk_receive_queue), &wait);
+	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	remove_wait_queue(sk_sleep(sk), &wait);
+	return ret;
+}
+
 static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		    int nonblock, int flags, int *addr_len)
 {
@@ -188,7 +210,7 @@ static int tcp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		long timeo;
 
 		timeo = sock_rcvtimeo(sk, nonblock);
-		data = sk_msg_wait_data(sk, psock, flags, timeo, &err);
+		data = tcp_msg_wait_data(sk, psock, flags, timeo, &err);
 		if (data) {
 			if (!sk_psock_queue_empty(psock))
 				goto msg_bytes_ready;
diff --git a/net/ipv4/udp_bpf.c b/net/ipv4/udp_bpf.c
index 954c4591a6fd..565a70040c57 100644
--- a/net/ipv4/udp_bpf.c
+++ b/net/ipv4/udp_bpf.c
@@ -21,6 +21,45 @@ static int sk_udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	return udp_prot.recvmsg(sk, msg, len, noblock, flags, addr_len);
 }
 
+static bool udp_sk_has_data(struct sock *sk)
+{
+	return !skb_queue_empty(&udp_sk(sk)->reader_queue) ||
+	       !skb_queue_empty(&sk->sk_receive_queue);
+}
+
+static bool psock_has_data(struct sk_psock *psock)
+{
+	return !skb_queue_empty(&psock->ingress_skb) ||
+	       !sk_psock_queue_empty(psock);
+}
+
+#define udp_msg_has_data(__sk, __psock)	\
+		({ udp_sk_has_data(__sk) || psock_has_data(__psock); })
+
+static int udp_msg_wait_data(struct sock *sk, struct sk_psock *psock, int flags,
+			     long timeo, int *err)
+{
+	DEFINE_WAIT_FUNC(wait, woken_wake_function);
+	int ret = 0;
+
+	if (sk->sk_shutdown & RCV_SHUTDOWN)
+		return 1;
+
+	if (!timeo)
+		return ret;
+
+	add_wait_queue(sk_sleep(sk), &wait);
+	sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	ret = udp_msg_has_data(sk, psock);
+	if (!ret) {
+		wait_woken(&wait, TASK_INTERRUPTIBLE, timeo);
+		ret = udp_msg_has_data(sk, psock);
+	}
+	sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk);
+	remove_wait_queue(sk_sleep(sk), &wait);
+	return ret;
+}
+
 static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 			   int nonblock, int flags, int *addr_len)
 {
@@ -34,8 +73,7 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	if (unlikely(!psock))
 		return sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
 
-	lock_sock(sk);
-	if (sk_psock_queue_empty(psock)) {
+	if (!psock_has_data(psock)) {
 		ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
 		goto out;
 	}
@@ -47,9 +85,9 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		long timeo;
 
 		timeo = sock_rcvtimeo(sk, nonblock);
-		data = sk_msg_wait_data(sk, psock, flags, timeo, &err);
+		data = udp_msg_wait_data(sk, psock, flags, timeo, &err);
 		if (data) {
-			if (!sk_psock_queue_empty(psock))
+			if (psock_has_data(psock))
 				goto msg_bytes_ready;
 			ret = sk_udp_recvmsg(sk, msg, len, nonblock, flags, addr_len);
 			goto out;
@@ -62,7 +100,6 @@ static int udp_bpf_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	}
 	ret = copied;
 out:
-	release_sock(sk);
 	sk_psock_put(sk, psock);
 	return ret;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RESEND bpf v3 2/8] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
  2021-06-15  2:13 [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Cong Wang
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 1/8] skmsg: improve udp_bpf_recvmsg() accuracy Cong Wang
@ 2021-06-15  2:13 ` Cong Wang
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 3/8] udp: fix a memory leak in udp_read_sock() Cong Wang
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2021-06-15  2:13 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, Jiang Wang, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

We use non-blocking sockets for testing sockmap redirections,
and got some random EAGAIN errors from UDP tests.

There is no guarantee the packet would be immediately available
to receive as soon as it is sent out, even on the local host.
For UDP, this is especially true because it does not lock the
sock during BH (unlike the TCP path). This is probably why we
only saw this error in UDP cases.

No matter how hard we try to make the queue empty check accurate,
it is always possible for recvmsg() to beat ->sk_data_ready().
Therefore, we should just retry in case of EAGAIN.

Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
Reported-by: Jiang Wang <jiang.wang@bytedance.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 648d9ae898d2..01ab11259809 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1610,6 +1610,7 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
 	struct sockaddr_storage addr;
 	int c0, c1, p0, p1;
 	unsigned int pass;
+	int retries = 100;
 	socklen_t len;
 	int err, n;
 	u64 value;
@@ -1686,9 +1687,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
+again:
 	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0)
+	if (n < 0) {
+		if (errno == EAGAIN && retries--)
+			goto again;
 		FAIL_ERRNO("%s: read", log_prefix);
+	}
 	if (n == 0)
 		FAIL("%s: incomplete read", log_prefix);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RESEND bpf v3 3/8] udp: fix a memory leak in udp_read_sock()
  2021-06-15  2:13 [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Cong Wang
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 1/8] skmsg: improve udp_bpf_recvmsg() accuracy Cong Wang
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 2/8] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected() Cong Wang
@ 2021-06-15  2:13 ` Cong Wang
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 4/8] skmsg: clear skb redirect pointer before dropping it Cong Wang
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2021-06-15  2:13 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

sk_psock_verdict_recv() clones the skb and uses the clone
afterward, so udp_read_sock() should free the skb after using
it, regardless of error or not.

This fixes a real kmemleak.

Fixes: d7f571188ecf ("udp: Implement ->read_sock() for sockmap")
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/ipv4/udp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 15f5504adf5b..e31d67fd5183 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1798,11 +1798,13 @@ int udp_read_sock(struct sock *sk, read_descriptor_t *desc,
 		if (used <= 0) {
 			if (!copied)
 				copied = used;
+			kfree_skb(skb);
 			break;
 		} else if (used <= skb->len) {
 			copied += used;
 		}
 
+		kfree_skb(skb);
 		if (!desc->count)
 			break;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RESEND bpf v3 4/8] skmsg: clear skb redirect pointer before dropping it
  2021-06-15  2:13 [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Cong Wang
                   ` (2 preceding siblings ...)
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 3/8] udp: fix a memory leak in udp_read_sock() Cong Wang
@ 2021-06-15  2:13 ` Cong Wang
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 5/8] skmsg: fix a memory leak in sk_psock_verdict_apply() Cong Wang
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2021-06-15  2:13 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, Jiang Wang, John Fastabend, Daniel Borkmann,
	Jakub Sitnicki, Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

When we drop skb inside sk_psock_skb_redirect(), we have to clear
its skb->_sk_redir pointer too, otherwise kfree_skb() would
misinterpret it as a valid skb->_skb_refdst and dst_release()
would eventually complain.

Fixes: e3526bb92a20 ("skmsg: Move sk_redir from TCP_SKB_CB to skb")
Reported-by: Jiang Wang <jiang.wang@bytedance.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/skmsg.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index f9a81b314e4c..4334720e2a04 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -843,12 +843,14 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 	 * a socket that is in this state so we drop the skb.
 	 */
 	if (!psock_other || sock_flag(sk_other, SOCK_DEAD)) {
+		skb_bpf_redirect_clear(skb);
 		kfree_skb(skb);
 		return;
 	}
 	spin_lock_bh(&psock_other->ingress_lock);
 	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
 		spin_unlock_bh(&psock_other->ingress_lock);
+		skb_bpf_redirect_clear(skb);
 		kfree_skb(skb);
 		return;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RESEND bpf v3 5/8] skmsg: fix a memory leak in sk_psock_verdict_apply()
  2021-06-15  2:13 [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Cong Wang
                   ` (3 preceding siblings ...)
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 4/8] skmsg: clear skb redirect pointer before dropping it Cong Wang
@ 2021-06-15  2:13 ` Cong Wang
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 6/8] skmsg: teach sk_psock_verdict_apply() to return errors Cong Wang
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2021-06-15  2:13 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

If the dest psock does not set SK_PSOCK_TX_ENABLED,
the skb can't be queued anywhere so must be dropped.

This one is found during code review.

Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()")
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/skmsg.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 4334720e2a04..5464477e2d3d 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -924,8 +924,13 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 			if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
 				skb_queue_tail(&psock->ingress_skb, skb);
 				schedule_work(&psock->work);
+				err = 0;
 			}
 			spin_unlock_bh(&psock->ingress_lock);
+			if (err < 0) {
+				skb_bpf_redirect_clear(skb);
+				goto out_free;
+			}
 		}
 		break;
 	case __SK_REDIRECT:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RESEND bpf v3 6/8] skmsg: teach sk_psock_verdict_apply() to return errors
  2021-06-15  2:13 [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Cong Wang
                   ` (4 preceding siblings ...)
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 5/8] skmsg: fix a memory leak in sk_psock_verdict_apply() Cong Wang
@ 2021-06-15  2:13 ` Cong Wang
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 7/8] skmsg: pass source psock to sk_psock_skb_redirect() Cong Wang
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2021-06-15  2:13 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

Currently sk_psock_verdict_apply() is void, but it handles some
error conditions too. Its caller is impossible to learn whether
it succeeds or fails, especially sk_psock_verdict_recv().

Make it return int to indicate error cases and propagate errors
to callers properly.

Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/skmsg.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 5464477e2d3d..e3d210811db4 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -824,7 +824,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
 }
 EXPORT_SYMBOL_GPL(sk_psock_msg_verdict);
 
-static void sk_psock_skb_redirect(struct sk_buff *skb)
+static int sk_psock_skb_redirect(struct sk_buff *skb)
 {
 	struct sk_psock *psock_other;
 	struct sock *sk_other;
@@ -835,7 +835,7 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 	 */
 	if (unlikely(!sk_other)) {
 		kfree_skb(skb);
-		return;
+		return -EIO;
 	}
 	psock_other = sk_psock(sk_other);
 	/* This error indicates the socket is being torn down or had another
@@ -845,19 +845,20 @@ static void sk_psock_skb_redirect(struct sk_buff *skb)
 	if (!psock_other || sock_flag(sk_other, SOCK_DEAD)) {
 		skb_bpf_redirect_clear(skb);
 		kfree_skb(skb);
-		return;
+		return -EIO;
 	}
 	spin_lock_bh(&psock_other->ingress_lock);
 	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
 		spin_unlock_bh(&psock_other->ingress_lock);
 		skb_bpf_redirect_clear(skb);
 		kfree_skb(skb);
-		return;
+		return -EIO;
 	}
 
 	skb_queue_tail(&psock_other->ingress_skb, skb);
 	schedule_work(&psock_other->work);
 	spin_unlock_bh(&psock_other->ingress_lock);
+	return 0;
 }
 
 static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int verdict)
@@ -894,14 +895,15 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(sk_psock_tls_strp_read);
 
-static void sk_psock_verdict_apply(struct sk_psock *psock,
-				   struct sk_buff *skb, int verdict)
+static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
+				  int verdict)
 {
 	struct sock *sk_other;
-	int err = -EIO;
+	int err = 0;
 
 	switch (verdict) {
 	case __SK_PASS:
+		err = -EIO;
 		sk_other = psock->sk;
 		if (sock_flag(sk_other, SOCK_DEAD) ||
 		    !sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
@@ -934,13 +936,15 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
 		}
 		break;
 	case __SK_REDIRECT:
-		sk_psock_skb_redirect(skb);
+		err = sk_psock_skb_redirect(skb);
 		break;
 	case __SK_DROP:
 	default:
 out_free:
 		kfree_skb(skb);
 	}
+
+	return err;
 }
 
 static void sk_psock_write_space(struct sock *sk)
@@ -1107,7 +1111,8 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
 		ret = sk_psock_map_verd(ret, skb_bpf_redirect_fetch(skb));
 		skb->sk = NULL;
 	}
-	sk_psock_verdict_apply(psock, skb, ret);
+	if (sk_psock_verdict_apply(psock, skb, ret) < 0)
+		len = 0;
 out:
 	rcu_read_unlock();
 	return len;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RESEND bpf v3 7/8] skmsg: pass source psock to sk_psock_skb_redirect()
  2021-06-15  2:13 [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Cong Wang
                   ` (5 preceding siblings ...)
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 6/8] skmsg: teach sk_psock_verdict_apply() to return errors Cong Wang
@ 2021-06-15  2:13 ` Cong Wang
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 8/8] skmsg: increase sk->sk_drops when dropping packets Cong Wang
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2021-06-15  2:13 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

sk_psock_skb_redirect() only takes skb as a parameter, we
will need to know where this skb is from, so just pass
the source psock to this function as a new parameter.
This patch prepares for the next one.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/skmsg.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index e3d210811db4..3aa9065811ad 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -824,7 +824,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
 }
 EXPORT_SYMBOL_GPL(sk_psock_msg_verdict);
 
-static int sk_psock_skb_redirect(struct sk_buff *skb)
+static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
 {
 	struct sk_psock *psock_other;
 	struct sock *sk_other;
@@ -861,11 +861,12 @@ static int sk_psock_skb_redirect(struct sk_buff *skb)
 	return 0;
 }
 
-static void sk_psock_tls_verdict_apply(struct sk_buff *skb, struct sock *sk, int verdict)
+static void sk_psock_tls_verdict_apply(struct sk_buff *skb,
+				       struct sk_psock *from, int verdict)
 {
 	switch (verdict) {
 	case __SK_REDIRECT:
-		sk_psock_skb_redirect(skb);
+		sk_psock_skb_redirect(from, skb);
 		break;
 	case __SK_PASS:
 	case __SK_DROP:
@@ -889,7 +890,7 @@ int sk_psock_tls_strp_read(struct sk_psock *psock, struct sk_buff *skb)
 		ret = sk_psock_map_verd(ret, skb_bpf_redirect_fetch(skb));
 		skb->sk = NULL;
 	}
-	sk_psock_tls_verdict_apply(skb, psock->sk, ret);
+	sk_psock_tls_verdict_apply(skb, psock, ret);
 	rcu_read_unlock();
 	return ret;
 }
@@ -936,7 +937,7 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
 		}
 		break;
 	case __SK_REDIRECT:
-		err = sk_psock_skb_redirect(skb);
+		err = sk_psock_skb_redirect(psock, skb);
 		break;
 	case __SK_DROP:
 	default:
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH RESEND bpf v3 8/8] skmsg: increase sk->sk_drops when dropping packets
  2021-06-15  2:13 [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Cong Wang
                   ` (6 preceding siblings ...)
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 7/8] skmsg: pass source psock to sk_psock_skb_redirect() Cong Wang
@ 2021-06-15  2:13 ` Cong Wang
  2021-06-19  9:57 ` [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Jakub Sitnicki
  2021-06-21 15:00 ` patchwork-bot+netdevbpf
  9 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2021-06-15  2:13 UTC (permalink / raw)
  To: netdev
  Cc: bpf, Cong Wang, John Fastabend, Daniel Borkmann, Jakub Sitnicki,
	Lorenz Bauer

From: Cong Wang <cong.wang@bytedance.com>

It is hard to observe packet drops without increasing relevant
drop counters, here we should increase sk->sk_drops which is
a protocol-independent counter. Fortunately psock is always
associated with a struct sock, we can just use psock->sk.

Suggested-by: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/core/skmsg.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 3aa9065811ad..9b6160a191f8 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -578,6 +578,12 @@ static int sk_psock_handle_skb(struct sk_psock *psock, struct sk_buff *skb,
 	return sk_psock_skb_ingress(psock, skb);
 }
 
+static void sock_drop(struct sock *sk, struct sk_buff *skb)
+{
+	sk_drops_add(sk, skb);
+	kfree_skb(skb);
+}
+
 static void sk_psock_backlog(struct work_struct *work)
 {
 	struct sk_psock *psock = container_of(work, struct sk_psock, work);
@@ -617,7 +623,7 @@ static void sk_psock_backlog(struct work_struct *work)
 				/* Hard errors break pipe and stop xmit. */
 				sk_psock_report_error(psock, ret ? -ret : EPIPE);
 				sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED);
-				kfree_skb(skb);
+				sock_drop(psock->sk, skb);
 				goto end;
 			}
 			off += ret;
@@ -708,7 +714,7 @@ static void __sk_psock_zap_ingress(struct sk_psock *psock)
 
 	while ((skb = skb_dequeue(&psock->ingress_skb)) != NULL) {
 		skb_bpf_redirect_clear(skb);
-		kfree_skb(skb);
+		sock_drop(psock->sk, skb);
 	}
 	__sk_psock_purge_ingress_msg(psock);
 }
@@ -834,7 +840,7 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
 	 * return code, but then didn't set a redirect interface.
 	 */
 	if (unlikely(!sk_other)) {
-		kfree_skb(skb);
+		sock_drop(from->sk, skb);
 		return -EIO;
 	}
 	psock_other = sk_psock(sk_other);
@@ -844,14 +850,14 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
 	 */
 	if (!psock_other || sock_flag(sk_other, SOCK_DEAD)) {
 		skb_bpf_redirect_clear(skb);
-		kfree_skb(skb);
+		sock_drop(from->sk, skb);
 		return -EIO;
 	}
 	spin_lock_bh(&psock_other->ingress_lock);
 	if (!sk_psock_test_state(psock_other, SK_PSOCK_TX_ENABLED)) {
 		spin_unlock_bh(&psock_other->ingress_lock);
 		skb_bpf_redirect_clear(skb);
-		kfree_skb(skb);
+		sock_drop(from->sk, skb);
 		return -EIO;
 	}
 
@@ -942,7 +948,7 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
 	case __SK_DROP:
 	default:
 out_free:
-		kfree_skb(skb);
+		sock_drop(psock->sk, skb);
 	}
 
 	return err;
@@ -977,7 +983,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
 	sk = strp->sk;
 	psock = sk_psock(sk);
 	if (unlikely(!psock)) {
-		kfree_skb(skb);
+		sock_drop(sk, skb);
 		goto out;
 	}
 	prog = READ_ONCE(psock->progs.stream_verdict);
@@ -1098,7 +1104,7 @@ static int sk_psock_verdict_recv(read_descriptor_t *desc, struct sk_buff *skb,
 	psock = sk_psock(sk);
 	if (unlikely(!psock)) {
 		len = 0;
-		kfree_skb(skb);
+		sock_drop(sk, skb);
 		goto out;
 	}
 	prog = READ_ONCE(psock->progs.stream_verdict);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements
  2021-06-15  2:13 [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Cong Wang
                   ` (7 preceding siblings ...)
  2021-06-15  2:13 ` [PATCH RESEND bpf v3 8/8] skmsg: increase sk->sk_drops when dropping packets Cong Wang
@ 2021-06-19  9:57 ` Jakub Sitnicki
  2021-06-21 15:00 ` patchwork-bot+netdevbpf
  9 siblings, 0 replies; 11+ messages in thread
From: Jakub Sitnicki @ 2021-06-19  9:57 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, bpf, Cong Wang, John Fastabend

On Tue, Jun 15, 2021 at 04:13 AM CEST, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
>
> This patchset contains a few bug fixes and improvements for sock_map.
>
> Patch 1 improves recvmsg() accuracy for UDP, patch 2 improves UDP
> non-blocking read() by retrying on EAGAIN. With both of them, the
> failure rate of the UDP test case goes down from 10% to 1%.
>
> Patch 3 is memory leak fix I posted, no change since v1. The rest
> patches address similar memory leaks or improve error handling,
> including one increases sk_drops counter for error cases. Please
> check each patch description for more details.
>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
>
> ---

For the series:

Acked-by: Jakub Sitnicki <jakub@cloudflare.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements
  2021-06-15  2:13 [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Cong Wang
                   ` (8 preceding siblings ...)
  2021-06-19  9:57 ` [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Jakub Sitnicki
@ 2021-06-21 15:00 ` patchwork-bot+netdevbpf
  9 siblings, 0 replies; 11+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-06-21 15:00 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, bpf, cong.wang, john.fastabend

Hello:

This series was applied to bpf/bpf.git (refs/heads/master):

On Mon, 14 Jun 2021 19:13:34 -0700 you wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> 
> This patchset contains a few bug fixes and improvements for sock_map.
> 
> Patch 1 improves recvmsg() accuracy for UDP, patch 2 improves UDP
> non-blocking read() by retrying on EAGAIN. With both of them, the
> failure rate of the UDP test case goes down from 10% to 1%.
> 
> [...]

Here is the summary with links:
  - [RESEND,bpf,v3,1/8] skmsg: improve udp_bpf_recvmsg() accuracy
    https://git.kernel.org/bpf/bpf/c/9f2470fbc4cb
  - [RESEND,bpf,v3,2/8] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()
    https://git.kernel.org/bpf/bpf/c/a7e65fe7d820
  - [RESEND,bpf,v3,3/8] udp: fix a memory leak in udp_read_sock()
    https://git.kernel.org/bpf/bpf/c/e00a5c331bf5
  - [RESEND,bpf,v3,4/8] skmsg: clear skb redirect pointer before dropping it
    https://git.kernel.org/bpf/bpf/c/30b9c54a707d
  - [RESEND,bpf,v3,5/8] skmsg: fix a memory leak in sk_psock_verdict_apply()
    https://git.kernel.org/bpf/bpf/c/0cf6672b23c8
  - [RESEND,bpf,v3,6/8] skmsg: teach sk_psock_verdict_apply() to return errors
    https://git.kernel.org/bpf/bpf/c/1581a6c1c329
  - [RESEND,bpf,v3,7/8] skmsg: pass source psock to sk_psock_skb_redirect()
    https://git.kernel.org/bpf/bpf/c/42830571f1fd
  - [RESEND,bpf,v3,8/8] skmsg: increase sk->sk_drops when dropping packets
    https://git.kernel.org/bpf/bpf/c/781dd0431eb5

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-06-21 15:00 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-15  2:13 [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Cong Wang
2021-06-15  2:13 ` [PATCH RESEND bpf v3 1/8] skmsg: improve udp_bpf_recvmsg() accuracy Cong Wang
2021-06-15  2:13 ` [PATCH RESEND bpf v3 2/8] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected() Cong Wang
2021-06-15  2:13 ` [PATCH RESEND bpf v3 3/8] udp: fix a memory leak in udp_read_sock() Cong Wang
2021-06-15  2:13 ` [PATCH RESEND bpf v3 4/8] skmsg: clear skb redirect pointer before dropping it Cong Wang
2021-06-15  2:13 ` [PATCH RESEND bpf v3 5/8] skmsg: fix a memory leak in sk_psock_verdict_apply() Cong Wang
2021-06-15  2:13 ` [PATCH RESEND bpf v3 6/8] skmsg: teach sk_psock_verdict_apply() to return errors Cong Wang
2021-06-15  2:13 ` [PATCH RESEND bpf v3 7/8] skmsg: pass source psock to sk_psock_skb_redirect() Cong Wang
2021-06-15  2:13 ` [PATCH RESEND bpf v3 8/8] skmsg: increase sk->sk_drops when dropping packets Cong Wang
2021-06-19  9:57 ` [PATCH RESEND bpf v3 0/8] sock_map: some bug fixes and improvements Jakub Sitnicki
2021-06-21 15:00 ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.