Linux-api Archive mirror
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Joe Damato <jdamato@fastly.com>,
	 Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: linux-kernel@vger.kernel.org,  netdev@vger.kernel.org,
	 chuck.lever@oracle.com,  jlayton@kernel.org,
	 linux-api@vger.kernel.org,  brauner@kernel.org,
	 edumazet@google.com,  davem@davemloft.net,
	 alexander.duyck@gmail.com,  sridhar.samudrala@intel.com,
	 kuba@kernel.org,  weiwan@google.com,
	 Alexander Viro <viro@zeniv.linux.org.uk>,
	 Andrew Waterman <waterman@eecs.berkeley.edu>,
	 Arnd Bergmann <arnd@arndb.de>,
	 Dominik Brodowski <linux@dominikbrodowski.net>,
	 Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	 Jan Kara <jack@suse.cz>,  Jiri Slaby <jirislaby@kernel.org>,
	 Jonathan Corbet <corbet@lwn.net>,
	 Julien Panis <jpanis@baylibre.com>,
	 "open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	 "(open list:FILESYSTEMS \\(VFS and infrastructure\\))"
	<linux-fsdevel@vger.kernel.org>,
	 Michael Ellerman <mpe@ellerman.id.au>,
	 Nathan Lynch <nathanl@linux.ibm.com>,
	 Palmer Dabbelt <palmer@dabbelt.com>,
	 Steve French <stfrench@microsoft.com>,
	 Thomas Huth <thuth@redhat.com>,
	 Thomas Zimmermann <tzimmermann@suse.de>
Subject: Re: [PATCH net-next v3 0/3] Per epoll context busy poll support
Date: Tue, 30 Jan 2024 15:33:33 -0500	[thread overview]
Message-ID: <65b95d1de41cc_ce3aa294fa@willemb.c.googlers.com.notmuch> (raw)
In-Reply-To: <20240129190922.GA1315@fastly.com>

Joe Damato wrote:
> On Sat, Jan 27, 2024 at 11:20:51AM -0500, Willem de Bruijn wrote:
> > Joe Damato wrote:
> > > Greetings:
> > > 
> > > Welcome to v3. Cover letter updated from v2 to explain why ioctl and
> > > adjusted my cc_cmd to try to get the correct people in addition to folks
> > > who were added in v1 & v2. Labeled as net-next because it seems networking
> > > related to me even though it is fs code.
> > > 
> > > TL;DR This builds on commit bf3b9f6372c4 ("epoll: Add busy poll support to
> > > epoll with socket fds.") by allowing user applications to enable
> > > epoll-based busy polling and set a busy poll packet budget on a per epoll
> > > context basis.
> > > 
> > > This makes epoll-based busy polling much more usable for user
> > > applications than the current system-wide sysctl and hardcoded budget.
> > > 
> > > To allow for this, two ioctls have been added for epoll contexts for
> > > getting and setting a new struct, struct epoll_params.
> > > 
> > > ioctl was chosen vs a new syscall after reviewing a suggestion by Willem
> > > de Bruijn [1]. I am open to using a new syscall instead of an ioctl, but it
> > > seemed that: 
> > >   - Busy poll affects all existing epoll_wait and epoll_pwait variants in
> > >     the same way, so new verions of many syscalls might be needed. It
> > 
> > There is no need to support a new feature on legacy calls. Applications have
> > to be upgraded to the new ioctl, so they can also be upgraded to the latest
> > epoll_wait variant.
> 
> Sure, that's a fair point. I think we could probably make reasonable
> arguments in both directions about the pros/cons of each approach.
> 
> It's still not clear to me that a new syscall is the best way to go on
> this, and IMO it does not offer a clear advantage. I understand that part
> of the premise of your argument is that ioctls are not recommended, but in
> this particular case it seems like a good use case and there have been
> new ioctls added recently (at least according to git log).
> 
> This makes me think that while their use is not recommended, they can serve
> a purpose in specific use cases. To me, this use case seems very fitting.
> 
> More of a joke and I hate to mention this, but this setting is changing how
> io is done and it seems fitting that this done via an ioctl ;)
> 
> > epoll_pwait extends epoll_wait with a sigmask.
> > epoll_pwait2 extends extends epoll_pwait with nsec resolution timespec.
> > Since they are supersets, nothing is lots by limiting to the most recent API.
> > 
> > In the discussion of epoll_pwait2 the addition of a forward looking flags
> > argument was discussed, but eventually dropped. Based on the argument that
> > adding a syscall is not a big task and does not warrant preemptive code.
> > This decision did receive a suitably snarky comment from Jonathan Corbet [1].
> > 
> > It is definitely more boilerplate, but essentially it is as feasible to add an
> > epoll_pwait3 that takes an optional busy poll argument. In which case, I also
> > believe that it makes more sense to configure the behavior of the syscall
> > directly, than through another syscall and state stored in the kernel.
> 
> I definitely hear what you are saying; I think I'm still not convinced, but
> I am thinking it through.
> 
> In my mind, all of the other busy poll settings are configured by setting
> options on the sockets using various SO_* options, which modify some state
> in the kernel. The existing system-wide busy poll sysctl also does this. It
> feels strange to me to diverge from that pattern just for epoll.

I think the stateful approach for read is because there we do want
to support all variants: read, readv, recv, recvfrom, recvmsg,
recvmmsg. So there is no way to pass it directly.

That said, I don't mean to argue strenously for this API or against
yours. Want to make sure the option space is explored. There does not
seem to be much other feedback. I don't hold a strong opinion either.

> In the case of epoll_pwait2 the addition of a new syscall is an approach
> that I think makes a lot of sense. The new system call is also probably
> better from an end-user usability perspective, as well. For busy poll, I
> don't see a clear reasoning why a new system call is better, but maybe I am
> still missing something.
>
> > I don't think that the usec fine grain busy poll argument is all that useful.
> > Documentation always suggests setting it to 50us or 100us, based on limited
> > data. Main point is to set it to exceed the round-trip delay of whatever the
> > process is trying to wait on. Overestimating is not costly, as the call
> > returns as soon as the condition is met. An epoll_pwait3 flag EPOLL_BUSY_POLL
> > with default 100us might be sufficient.
> > 
> > [1] https://lwn.net/Articles/837816/
> 
> Perhaps I am misunderstanding what you are suggesting, but I am opposed to
> hardcoding a value. If it is currently configurable system-wide and via
> SO_* options for other forms of busy poll, I think it should similarly be
> configurable for epoll busy poll.
> 
> I may yet be convinced by the new syscall argument, but I don't think I'd
> agree on imposing a default. The value can be modified by other forms of
> busy poll and the goal of my changes are to:
>   - make epoll-based busy poll per context
>   - allow applications to configure (within reason) how epoll-based busy
>     poll behaves, like they can do now with the existing SO_* options for
>     other busy poll methods.

Okay. I expected some push back. Was curious if people would come back
with examples of where the full range is actually being used.

> > >     seems much simpler for users to use the correct
> > >     epoll_wait/epoll_pwait for their app and add a call to ioctl to enable
> > >     or disable busy poll as needed. This also probably means less work to
> > >     get an existing epoll app using busy poll.
> > 



      reply	other threads:[~2024-01-30 20:33 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-25 22:56 [PATCH net-next v3 0/3] Per epoll context busy poll support Joe Damato
2024-01-25 22:56 ` [PATCH net-next v3 1/3] eventpoll: support busy poll per epoll instance Joe Damato
2024-01-25 22:56 ` [PATCH net-next v3 2/3] eventpoll: Add per-epoll busy poll packet budget Joe Damato
2024-01-25 22:56 ` [PATCH net-next v3 3/3] eventpoll: Add epoll ioctl for epoll_params Joe Damato
2024-01-25 23:20   ` Greg Kroah-Hartman
2024-01-26  0:04     ` Joe Damato
2024-01-25 23:21   ` Greg Kroah-Hartman
2024-01-26  0:11     ` Joe Damato
2024-01-26  0:23       ` Greg Kroah-Hartman
2024-01-26  2:36         ` Joe Damato
2024-01-26  6:16           ` Arnd Bergmann
2024-01-27 14:51             ` David Laight
2024-01-26 10:07           ` Christian Brauner
2024-01-26 16:52             ` Joe Damato
2024-01-25 23:22   ` Greg Kroah-Hartman
2024-01-26  0:07     ` Joe Damato
2024-01-28 11:24   ` kernel test robot
2024-01-27 16:20 ` [PATCH net-next v3 0/3] Per epoll context busy poll support Willem de Bruijn
2024-01-29 19:09   ` Joe Damato
2024-01-30 20:33     ` Willem de Bruijn [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=65b95d1de41cc_ce3aa294fa@willemb.c.googlers.com.notmuch \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=arnd@arndb.de \
    --cc=brauner@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jack@suse.cz \
    --cc=jdamato@fastly.com \
    --cc=jirislaby@kernel.org \
    --cc=jlayton@kernel.org \
    --cc=jpanis@baylibre.com \
    --cc=kuba@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@dominikbrodowski.net \
    --cc=mpe@ellerman.id.au \
    --cc=nathanl@linux.ibm.com \
    --cc=netdev@vger.kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=sridhar.samudrala@intel.com \
    --cc=stfrench@microsoft.com \
    --cc=thuth@redhat.com \
    --cc=tzimmermann@suse.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=waterman@eecs.berkeley.edu \
    --cc=weiwan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).