LKML Archive mirror
 help / color / mirror / Atom feed
From: Eric Wong <normalperson@yhbt.net>
To: Jason Baron <jbaron@akamai.com>
Cc: Ingo Molnar <mingo@kernel.org>,
	peterz@infradead.org, mingo@redhat.com, viro@zeniv.linux.org.uk,
	akpm@linux-foundation.org, davidel@xmailserver.org,
	mtk.manpages@gmail.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"luto@amacapital.net >> Andy Lutomirski" <luto@amacapital.net>
Subject: Re: [PATCH v2 2/2] epoll: introduce EPOLLEXCLUSIVE and EPOLLROUNDROBIN
Date: Sun, 22 Feb 2015 00:24:32 +0000	[thread overview]
Message-ID: <20150222002432.GA9031@dcvr.yhbt.net> (raw)
In-Reply-To: <54E557CF.8080702@akamai.com>

Jason Baron <jbaron@akamai.com> wrote:
> On 02/18/2015 12:51 PM, Ingo Molnar wrote:
> > * Ingo Molnar <mingo@kernel.org> wrote:
> >
> >>> [...] However, I think the userspace API change is less 
> >>> clear since epoll_wait() doesn't currently have an 
> >>> 'input' events argument as epoll_ctl() does.
> >> ... but the change would be a bit clearer and somewhat 
> >> more flexible: LIFO or FIFO queueing, right?
> >>
> >> But having the queueing model as part of the epoll 
> >> context is a legitimate approach as well.
> > Btw., there's another optimization that the networking code 
> > already does when processing incoming packets: waking up a 
> > thread on the local CPU, where the wakeup is running.
> >
> > Doing the same on epoll would have real scalability 
> > advantages where incoming events are IRQ driven and are 
> > distributed amongst multiple CPUs.
> >
> > Where events are task driven the scheduler will already try 
> > to pair up waker and wakee so it might not show up in 
> > measurements that markedly.
> >
> 
> Right, so this makes me think that we may want to potentially
> support a variety of wakeup policies. Adding these to the
> generic wake up code is just going to be too messy. So, perhaps
> a better approach here would be to register a single
> wait_queue_t with the event source queue that will always
> be woken up, and then layer any epoll balancing/irq affinity
> policies on top of that. So in essence we end up with sort of
> two queues layers, but I think it provides much nicer isolation
> between layers. Also, the bulk of the changes are going to be
> isolated to the epoll code, and we avoid Andy's concern about
> missing, or starving out wakeups.
> 
> So here's a stab at how this API could look:
> 
> 1. ep1 = epoll_create1(EPOLL_POLICY);
> 
> So EPOLL_POLICY here could the round robin policy described
> here, or the irq affinity or other ideas. The idea is to create
> an fd that is local to the process, such that other processes
> can not subsequently attach to it and affect our policy.

I'm not against defining more policies if needed.
Maybe FIFO vs LIFO is a good case for this.

For affinity, it could probably be done transparently based on
epoll_wait retrievals + EPOLL_CTL_MOD operations.

> 2. epoll_ctl(ep1, EPOLL_CTL_ADD, fd_source, NULL);
> 
> This associates ep1 with the event source. ep1 can be
> associated with or added to at most 1 wakeup source. This call
> would largely just form the association, but not queue anything
> to the fd_source wait queue.

This would mean one extra FD for every fd_source, but that's
only a handful of FDs (listen sockets), correct?

> 3. epoll_ctl(ep2, EPOLL_CTL_ADD, ep1, event);
>     epoll_ctl(ep3, EPOLL_CTL_ADD, ep1, event);
>     epoll_ctl(ep4, EPOLL_CTL_ADD, ep1, event);
>      .
>      .
>      .
> 
> Finally, we add the epoll sets to the event source (indirectly via
> ep1). So the first add would actually queue the callback to the
> fd_source. While the subsequent calls would simply queue things
> to the 'nested' wakeup queue associated with ep1.

I'm not sure I follow, wouldn't this increase the number of wakeups?

> So any existing epoll/poll/select calls could be queued as well
> to fd_source and will operate independenly from this mechanism,
> as the fd_source queue continues to be 'wake all'. Also, there
> should be no changes necessary to __wake_up_common(), other
> than potentially passing more back though the
> wait_queue_func_t, such as 'nr_exclusive'.

  reply	other threads:[~2015-02-22  0:24 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-17 19:33 [PATCH v2 0/2] Add epoll round robin wakeup mode Jason Baron
2015-02-17 19:33 ` [PATCH v2 1/2] sched/wait: add " Jason Baron
2015-02-17 19:33 ` [PATCH v2 2/2] epoll: introduce EPOLLEXCLUSIVE and EPOLLROUNDROBIN Jason Baron
2015-02-18  8:07   ` Ingo Molnar
2015-02-18 15:42     ` Jason Baron
2015-02-18 16:33       ` Ingo Molnar
2015-02-18 17:38         ` Jason Baron
2015-02-18 17:45           ` Ingo Molnar
2015-02-18 17:51             ` Ingo Molnar
2015-02-18 22:18               ` Eric Wong
2015-02-19  3:26               ` Jason Baron
2015-02-22  0:24                 ` Eric Wong [this message]
2015-02-25 15:48                   ` Jason Baron
2015-02-18 23:12           ` Andy Lutomirski
     [not found]   ` <CAPh34mcPNQELwZCDTHej+HK=bpWgJ=jb1LeCtKoUHVgoDJOJoQ@mail.gmail.com>
2015-02-27 22:24     ` Jason Baron
2015-02-17 19:46 ` [PATCH v2 0/2] Add epoll round robin wakeup mode Andy Lutomirski
2015-02-17 20:33   ` Jason Baron
2015-02-17 21:09     ` Andy Lutomirski
2015-02-18  3:15       ` Jason Baron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150222002432.GA9031@dcvr.yhbt.net \
    --to=normalperson@yhbt.net \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=davidel@xmailserver.org \
    --cc=jbaron@akamai.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).