All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v2] net: enable SOCK_NOSPACE for UDP
@ 2024-04-04 23:37 Pavel Begunkov
  2024-04-05  9:11 ` Eric Dumazet
  0 siblings, 1 reply; 3+ messages in thread
From: Pavel Begunkov @ 2024-04-04 23:37 UTC (permalink / raw
  To: netdev; +Cc: edumazet, davem, dsahern, pabeni, kuba, Pavel Begunkov

wake_up_poll() and variants can be expensive even if they don't actually
wake anything up as it involves disabling irqs, taking a spinlock and
walking through the poll list, which is fraught with cache bounces.
That might happen when someone waits for POLLOUT or even POLLIN as the
waitqueue is shared, even though we should be able to skip these
false positive calls when the tx queue is not full.

Add support for SOCK_NOSPACE for UDP sockets. The udp_poll() change is
straightforward and repeats after tcp_poll() and others. However, for
sock_wfree() it's done as an optional feature flagged by
SOCK_SUPPORT_NOSPACE, because the feature requires support from the
corresponding poll handler but there are many users of sock_wfree()
that might be not prepared.

Note, it optimises the sock_wfree() path but not sock_def_write_space().
That's fine because it leads to more false positive wake ups, which is
tolerable and not performance critical.

It wins +5% to throughput testing with a CPU bound tx only io_uring
based benchmark and showed 0.5-3% in more realistic workloads.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/sock.h |  1 +
 net/core/sock.c    |  5 +++++
 net/ipv4/udp.c     | 15 ++++++++++++++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 2253eefe2848..027a398471c4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -944,6 +944,7 @@ enum sock_flags {
 	SOCK_XDP, /* XDP is attached */
 	SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
 	SOCK_RCVMARK, /* Receive SO_MARK  ancillary data with packet */
+	SOCK_NOSPACE_SUPPORTED, /* socket supports the SOCK_NOSPACE flag */
 };
 
 #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
diff --git a/net/core/sock.c b/net/core/sock.c
index 5ed411231fc7..e4f486e9296a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3393,6 +3393,11 @@ static void sock_def_write_space_wfree(struct sock *sk)
 
 		/* rely on refcount_sub from sock_wfree() */
 		smp_mb__after_atomic();
+
+		if (sock_flag(sk, SOCK_NOSPACE_SUPPORTED) &&
+		    !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
+			return;
+
 		if (wq && waitqueue_active(&wq->wait))
 			wake_up_interruptible_sync_poll(&wq->wait, EPOLLOUT |
 						EPOLLWRNORM | EPOLLWRBAND);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 11460d751e73..309fa96e9020 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -342,6 +342,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
 		hslot2->count++;
 		spin_unlock(&hslot2->lock);
 	}
+	sock_set_flag(sk, SOCK_NOSPACE_SUPPORTED);
 	sock_set_flag(sk, SOCK_RCU_FREE);
 	error = 0;
 fail_unlock:
@@ -2885,8 +2886,20 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	/* psock ingress_msg queue should not contain any bad checksum frames */
 	if (sk_is_readable(sk))
 		mask |= EPOLLIN | EPOLLRDNORM;
-	return mask;
 
+	if (!sock_writeable(sk)) {
+		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
+
+		/* Order with the wspace read so either we observe it
+		 * writeable or udp_sock_wfree() would find SOCK_NOSPACE and
+		 * wake us up.
+		 */
+		smp_mb__after_atomic();
+
+		if (sock_writeable(sk))
+			mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND;
+	}
+	return mask;
 }
 EXPORT_SYMBOL(udp_poll);
 
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH net-next v2] net: enable SOCK_NOSPACE for UDP
  2024-04-04 23:37 [PATCH net-next v2] net: enable SOCK_NOSPACE for UDP Pavel Begunkov
@ 2024-04-05  9:11 ` Eric Dumazet
  2024-04-05 11:45   ` Pavel Begunkov
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Dumazet @ 2024-04-05  9:11 UTC (permalink / raw
  To: Pavel Begunkov; +Cc: netdev, davem, dsahern, pabeni, kuba

On Fri, Apr 5, 2024 at 1:37 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> wake_up_poll() and variants can be expensive even if they don't actually
> wake anything up as it involves disabling irqs, taking a spinlock and
> walking through the poll list, which is fraught with cache bounces.
> That might happen when someone waits for POLLOUT or even POLLIN as the
> waitqueue is shared, even though we should be able to skip these
> false positive calls when the tx queue is not full.
>
> Add support for SOCK_NOSPACE for UDP sockets. The udp_poll() change is
> straightforward and repeats after tcp_poll() and others. However, for
> sock_wfree() it's done as an optional feature flagged by
> SOCK_SUPPORT_NOSPACE, because the feature requires support from the
> corresponding poll handler but there are many users of sock_wfree()
> that might be not prepared.
>
> Note, it optimises the sock_wfree() path but not sock_def_write_space().
> That's fine because it leads to more false positive wake ups, which is
> tolerable and not performance critical.
>
> It wins +5% to throughput testing with a CPU bound tx only io_uring
> based benchmark and showed 0.5-3% in more realistic workloads.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/net/sock.h |  1 +
>  net/core/sock.c    |  5 +++++
>  net/ipv4/udp.c     | 15 ++++++++++++++-
>  3 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 2253eefe2848..027a398471c4 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -944,6 +944,7 @@ enum sock_flags {
>         SOCK_XDP, /* XDP is attached */
>         SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
>         SOCK_RCVMARK, /* Receive SO_MARK  ancillary data with packet */
> +       SOCK_NOSPACE_SUPPORTED, /* socket supports the SOCK_NOSPACE flag */
>  };
>
>  #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 5ed411231fc7..e4f486e9296a 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -3393,6 +3393,11 @@ static void sock_def_write_space_wfree(struct sock *sk)
>
>                 /* rely on refcount_sub from sock_wfree() */
>                 smp_mb__after_atomic();
> +
> +               if (sock_flag(sk, SOCK_NOSPACE_SUPPORTED) &&
> +                   !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
> +                       return;
> +
>                 if (wq && waitqueue_active(&wq->wait))
>                         wake_up_interruptible_sync_poll(&wq->wait, EPOLLOUT |
>                                                 EPOLLWRNORM | EPOLLWRBAND);
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 11460d751e73..309fa96e9020 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -342,6 +342,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
>                 hslot2->count++;
>                 spin_unlock(&hslot2->lock);
>         }
> +       sock_set_flag(sk, SOCK_NOSPACE_SUPPORTED);
>         sock_set_flag(sk, SOCK_RCU_FREE);
>         error = 0;
>  fail_unlock:
> @@ -2885,8 +2886,20 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
>         /* psock ingress_msg queue should not contain any bad checksum frames */
>         if (sk_is_readable(sk))
>                 mask |= EPOLLIN | EPOLLRDNORM;
> -       return mask;
>
> +       if (!sock_writeable(sk)) {

I think there is a race that you could avoid here by using

if  (!(mask & (EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND))) {

(We called datagram_poll() at the beginning of udp_poll())

> +               set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> +
> +               /* Order with the wspace read so either we observe it
> +                * writeable or udp_sock_wfree() would find SOCK_NOSPACE and
> +                * wake us up.
> +                */
> +               smp_mb__after_atomic();
> +
> +               if (sock_writeable(sk))
> +                       mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND;
> +       }
> +       return mask;
>  }
>  EXPORT_SYMBOL(udp_poll);
>
> --
> 2.44.0
>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH net-next v2] net: enable SOCK_NOSPACE for UDP
  2024-04-05  9:11 ` Eric Dumazet
@ 2024-04-05 11:45   ` Pavel Begunkov
  0 siblings, 0 replies; 3+ messages in thread
From: Pavel Begunkov @ 2024-04-05 11:45 UTC (permalink / raw
  To: Eric Dumazet; +Cc: netdev, davem, dsahern, pabeni, kuba

On 4/5/24 10:11, Eric Dumazet wrote:
> On Fri, Apr 5, 2024 at 1:37 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> wake_up_poll() and variants can be expensive even if they don't actually
>> wake anything up as it involves disabling irqs, taking a spinlock and
>> walking through the poll list, which is fraught with cache bounces.
>> That might happen when someone waits for POLLOUT or even POLLIN as the
>> waitqueue is shared, even though we should be able to skip these
>> false positive calls when the tx queue is not full.
>>
>> Add support for SOCK_NOSPACE for UDP sockets. The udp_poll() change is
>> straightforward and repeats after tcp_poll() and others. However, for
>> sock_wfree() it's done as an optional feature flagged by
>> SOCK_SUPPORT_NOSPACE, because the feature requires support from the
>> corresponding poll handler but there are many users of sock_wfree()
>> that might be not prepared.
>>
>> Note, it optimises the sock_wfree() path but not sock_def_write_space().
>> That's fine because it leads to more false positive wake ups, which is
>> tolerable and not performance critical.
>>
>> It wins +5% to throughput testing with a CPU bound tx only io_uring
>> based benchmark and showed 0.5-3% in more realistic workloads.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   include/net/sock.h |  1 +
>>   net/core/sock.c    |  5 +++++
>>   net/ipv4/udp.c     | 15 ++++++++++++++-
>>   3 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/net/sock.h b/include/net/sock.h
>> index 2253eefe2848..027a398471c4 100644
>> --- a/include/net/sock.h
>> +++ b/include/net/sock.h
>> @@ -944,6 +944,7 @@ enum sock_flags {
>>          SOCK_XDP, /* XDP is attached */
>>          SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
>>          SOCK_RCVMARK, /* Receive SO_MARK  ancillary data with packet */
>> +       SOCK_NOSPACE_SUPPORTED, /* socket supports the SOCK_NOSPACE flag */
>>   };
>>
>>   #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index 5ed411231fc7..e4f486e9296a 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -3393,6 +3393,11 @@ static void sock_def_write_space_wfree(struct sock *sk)
>>
>>                  /* rely on refcount_sub from sock_wfree() */
>>                  smp_mb__after_atomic();
>> +
>> +               if (sock_flag(sk, SOCK_NOSPACE_SUPPORTED) &&
>> +                   !test_bit(SOCK_NOSPACE, &sk->sk_socket->flags))
>> +                       return;
>> +
>>                  if (wq && waitqueue_active(&wq->wait))
>>                          wake_up_interruptible_sync_poll(&wq->wait, EPOLLOUT |
>>                                                  EPOLLWRNORM | EPOLLWRBAND);
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index 11460d751e73..309fa96e9020 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -342,6 +342,7 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
>>                  hslot2->count++;
>>                  spin_unlock(&hslot2->lock);
>>          }
>> +       sock_set_flag(sk, SOCK_NOSPACE_SUPPORTED);
>>          sock_set_flag(sk, SOCK_RCU_FREE);
>>          error = 0;
>>   fail_unlock:
>> @@ -2885,8 +2886,20 @@ __poll_t udp_poll(struct file *file, struct socket *sock, poll_table *wait)
>>          /* psock ingress_msg queue should not contain any bad checksum frames */
>>          if (sk_is_readable(sk))
>>                  mask |= EPOLLIN | EPOLLRDNORM;
>> -       return mask;
>>
>> +       if (!sock_writeable(sk)) {
> 
> I think there is a race that you could avoid here by using
> 
> if  (!(mask & (EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND))) {

Seems like it, I'll add the change, thanks

> (We called datagram_poll() at the beginning of udp_poll())

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-04-05 11:45 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-04 23:37 [PATCH net-next v2] net: enable SOCK_NOSPACE for UDP Pavel Begunkov
2024-04-05  9:11 ` Eric Dumazet
2024-04-05 11:45   ` Pavel Begunkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.