Stable Archive mirror
 help / color / mirror / Atom feed
* [PATCH stable, 6.1] net: sockmap, fix missing MSG_MORE causing TCP disruptions
@ 2024-05-03 16:48 John Fastabend
  2024-05-03 18:16 ` Daniel Borkmann
  2024-05-04  7:30 ` Greg KH
  0 siblings, 2 replies; 4+ messages in thread
From: John Fastabend @ 2024-05-03 16:48 UTC (permalink / raw
  To: stable; +Cc: bpf, daniel, john.fastabend, dhowells

[ Upstream commit ebf2e8860eea66e2c4764316b80c6a5ee5f336ee]
[ Upstream commit f8dd95b29d7ef08c19ec9720564acf72243ddcf6]

In the first patch,

ebf2e8860eea ("tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg")

This block of code is added to tcp_bpf_push(). The
tcp_bpf_push is the code used by BPF to submit messages into the TCP
stack.

 if (flags & MSG_SENDPAGE_NOTLAST)
     msghdr.msg_flags | MSG_MORE;

In the second patch,

f8dd95b29d7e ("tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage")

this logic was further changed to,

  if (flags & MSG_SENDPAGE_NOTLAST)
     msghdr.msg_flags |= MSG_MORE

This was done as part of an improvement to use the sendmsg() callbacks
and remove the sendpage usage inside the various sub systems.

However, these two patches together fixed a bug. The issue is without
MSG_MORE set we will break a msg up into many smaller sends. In some
case a lot because the operation loops over the scatter gather list.
Without the MSG_MORE set (the current 6.1 case) we see stalls in data
send/recv and sometimes applications failing to receive data. This
generally is the result of an application that gives up after calling
recv() or similar too many times. We introduce this because of how
we incorrectly change the TCP send pattern.

Now that we have both 6.5 and 6.1 stable kernels deployed we've
observed a series of issues related to this in real deployments. In 6.5
kernels all the HTTP and other compliance tests pass and we are not
observing any other issues. On 6.1 various compliance tests fail
(nginx for example), but more importantly in these clusters without
the flag set we observe stalled applications and increased retries in
other applications. Openssl users where we have annotations to monitor
retries and failures observed a significant increase in retries for
example.

For the backport we isolated the fix to the two lines in the above
patches that fixed the code. With this patch we deployed the workloads
again and error rates and stalls went away and 6.1 stable kernels
perform similar to 6.5 stable kernels. Similarly the compliance tests
also passed.

Cc: <stable@vger.kernel.org> # 6.1.x
Fixes: 604326b41a6fb ("tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/ipv4/tcp_bpf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index f8037d142bb7..20d94f67fde2 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -105,6 +105,9 @@ static int tcp_bpf_push(struct sock *sk, struct sk_msg *msg, u32 apply_bytes,
 
 		tcp_rate_check_app_limited(sk);
 retry:
+		if (size < sge->length && msg->sg.start != msg->sg.end)
+			flags |= MSG_SENDPAGE_NOTLAST;
+
 		has_tx_ulp = tls_sw_has_ctx_tx(sk);
 		if (has_tx_ulp) {
 			flags |= MSG_SENDPAGE_NOPOLICY;
-- 
2.33.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH stable, 6.1] net: sockmap, fix missing MSG_MORE causing TCP disruptions
  2024-05-03 16:48 [PATCH stable, 6.1] net: sockmap, fix missing MSG_MORE causing TCP disruptions John Fastabend
@ 2024-05-03 18:16 ` Daniel Borkmann
  2024-05-04  7:30 ` Greg KH
  1 sibling, 0 replies; 4+ messages in thread
From: Daniel Borkmann @ 2024-05-03 18:16 UTC (permalink / raw
  To: John Fastabend, stable; +Cc: bpf, dhowells

On 5/3/24 6:48 PM, John Fastabend wrote:
> [ Upstream commit ebf2e8860eea66e2c4764316b80c6a5ee5f336ee]
> [ Upstream commit f8dd95b29d7ef08c19ec9720564acf72243ddcf6]
> 
> In the first patch,
> 
> ebf2e8860eea ("tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg")
> 
> This block of code is added to tcp_bpf_push(). The
> tcp_bpf_push is the code used by BPF to submit messages into the TCP
> stack.
> 
>   if (flags & MSG_SENDPAGE_NOTLAST)
>       msghdr.msg_flags | MSG_MORE;
> 
> In the second patch,
> 
> f8dd95b29d7e ("tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage")
> 
> this logic was further changed to,
> 
>    if (flags & MSG_SENDPAGE_NOTLAST)
>       msghdr.msg_flags |= MSG_MORE
> 
> This was done as part of an improvement to use the sendmsg() callbacks
> and remove the sendpage usage inside the various sub systems.
> 
> However, these two patches together fixed a bug. The issue is without
> MSG_MORE set we will break a msg up into many smaller sends. In some
> case a lot because the operation loops over the scatter gather list.
> Without the MSG_MORE set (the current 6.1 case) we see stalls in data
> send/recv and sometimes applications failing to receive data. This
> generally is the result of an application that gives up after calling
> recv() or similar too many times. We introduce this because of how
> we incorrectly change the TCP send pattern.
> 
> Now that we have both 6.5 and 6.1 stable kernels deployed we've
> observed a series of issues related to this in real deployments. In 6.5
> kernels all the HTTP and other compliance tests pass and we are not
> observing any other issues. On 6.1 various compliance tests fail
> (nginx for example), but more importantly in these clusters without
> the flag set we observe stalled applications and increased retries in
> other applications. Openssl users where we have annotations to monitor
> retries and failures observed a significant increase in retries for
> example.
> 
> For the backport we isolated the fix to the two lines in the above
> patches that fixed the code. With this patch we deployed the workloads
> again and error rates and stalls went away and 6.1 stable kernels
> perform similar to 6.5 stable kernels. Similarly the compliance tests
> also passed.
> 
> Cc: <stable@vger.kernel.org> # 6.1.x
> Fixes: 604326b41a6fb ("tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH stable, 6.1] net: sockmap, fix missing MSG_MORE causing TCP disruptions
  2024-05-03 16:48 [PATCH stable, 6.1] net: sockmap, fix missing MSG_MORE causing TCP disruptions John Fastabend
  2024-05-03 18:16 ` Daniel Borkmann
@ 2024-05-04  7:30 ` Greg KH
  2024-05-04 16:48   ` John Fastabend
  1 sibling, 1 reply; 4+ messages in thread
From: Greg KH @ 2024-05-04  7:30 UTC (permalink / raw
  To: John Fastabend; +Cc: stable, bpf, daniel, dhowells

On Fri, May 03, 2024 at 09:48:05AM -0700, John Fastabend wrote:
> [ Upstream commit ebf2e8860eea66e2c4764316b80c6a5ee5f336ee]
> [ Upstream commit f8dd95b29d7ef08c19ec9720564acf72243ddcf6]

Why are you mushing 2 patches together?  Why can't we just take the two
as-is instead?  That makes tracking everything much simpler and
possible.

> In the first patch,
> 
> ebf2e8860eea ("tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around tcp_sendmsg")
> 
> This block of code is added to tcp_bpf_push(). The
> tcp_bpf_push is the code used by BPF to submit messages into the TCP
> stack.
> 
>  if (flags & MSG_SENDPAGE_NOTLAST)
>      msghdr.msg_flags | MSG_MORE;
> 
> In the second patch,
> 
> f8dd95b29d7e ("tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage")
> 
> this logic was further changed to,
> 
>   if (flags & MSG_SENDPAGE_NOTLAST)
>      msghdr.msg_flags |= MSG_MORE
> 
> This was done as part of an improvement to use the sendmsg() callbacks
> and remove the sendpage usage inside the various sub systems.
> 
> However, these two patches together fixed a bug. The issue is without
> MSG_MORE set we will break a msg up into many smaller sends. In some
> case a lot because the operation loops over the scatter gather list.
> Without the MSG_MORE set (the current 6.1 case) we see stalls in data
> send/recv and sometimes applications failing to receive data. This
> generally is the result of an application that gives up after calling
> recv() or similar too many times. We introduce this because of how
> we incorrectly change the TCP send pattern.
> 
> Now that we have both 6.5 and 6.1 stable kernels deployed we've
> observed a series of issues related to this in real deployments. In 6.5
> kernels all the HTTP and other compliance tests pass and we are not
> observing any other issues. On 6.1 various compliance tests fail
> (nginx for example), but more importantly in these clusters without
> the flag set we observe stalled applications and increased retries in
> other applications. Openssl users where we have annotations to monitor
> retries and failures observed a significant increase in retries for
> example.
> 
> For the backport we isolated the fix to the two lines in the above
> patches that fixed the code. With this patch we deployed the workloads
> again and error rates and stalls went away and 6.1 stable kernels
> perform similar to 6.5 stable kernels. Similarly the compliance tests
> also passed.

Can we just take the two original patches instead?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH stable, 6.1] net: sockmap, fix missing MSG_MORE causing TCP disruptions
  2024-05-04  7:30 ` Greg KH
@ 2024-05-04 16:48   ` John Fastabend
  0 siblings, 0 replies; 4+ messages in thread
From: John Fastabend @ 2024-05-04 16:48 UTC (permalink / raw
  To: Greg KH, John Fastabend; +Cc: stable, bpf, daniel, dhowells

Greg KH wrote:
> On Fri, May 03, 2024 at 09:48:05AM -0700, John Fastabend wrote:
> > [ Upstream commit ebf2e8860eea66e2c4764316b80c6a5ee5f336ee]
> > [ Upstream commit f8dd95b29d7ef08c19ec9720564acf72243ddcf6]
> 
> Why are you mushing 2 patches together?  Why can't we just take the two
> as-is instead?  That makes tracking everything much simpler and
> possible.

OK the thought was to get the minimal diff needed. But that
is problematic. We can take the first one as-is and
then the second one will have a couple chunks that don't
apply but we don't need those chunks because the infiniband
part it touches doesn't have the same issue in 6.1.

> 
> > In the first patch,

[...]

> > For the backport we isolated the fix to the two lines in the above
> > patches that fixed the code. With this patch we deployed the workloads
> > again and error rates and stalls went away and 6.1 stable kernels
> > perform similar to 6.5 stable kernels. Similarly the compliance tests
> > also passed.
> 
> Can we just take the two original patches instead?

Yes minus the couple chunks that don't apply on the second one. I'll
do some testing and resend thanks.

> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-05-04 16:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-03 16:48 [PATCH stable, 6.1] net: sockmap, fix missing MSG_MORE causing TCP disruptions John Fastabend
2024-05-03 18:16 ` Daniel Borkmann
2024-05-04  7:30 ` Greg KH
2024-05-04 16:48   ` John Fastabend

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).