From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: Jason Baron <jbaron@akamai.com>, akpm@linux-foundation.org
Cc: mtk.manpages@gmail.com, mingo@kernel.org, peterz@infradead.org,
	viro@ftp.linux.org.uk, normalperson@yhbt.net, m@silodev.com,
	corbet@lwn.net, luto@amacapital.net,
	torvalds@linux-foundation.org, hagen@jauu.net,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-api@vger.kernel.org
Subject: Re: [PATCH] epoll: add exclusive wakeups flag
Date: Fri, 29 Jan 2016 09:14:36 +0100
Message-ID: <56AB1F6C.7000609@gmail.com>
In-Reply-To: <56AA56A2.3000700@akamai.com>

Hello Jason,

On 01/28/2016 06:57 PM, Jason Baron wrote:
> Hi,
>
> On 01/28/2016 02:16 AM, Michael Kerrisk (man-pages) wrote:
>> Hi Jason,
>>
>> On 12/08/2015 04:23 AM, Jason Baron wrote:
>>> Hi,
>>>
>>> Re-post of an old series addressing thundering herd issues when sharing
>>> an event source fd amongst multiple epoll fds. Last posting was here
>>> for reference: https://lkml.org/lkml/2015/2/25/56
>>>
>>> The patch herein drops the core scheduler 'rotate' changes I had previously
>>> proposed as this patch seems performant without those.
>>>
>>> I was prompted to re-post this because Madars Vitolins reported some good
>>> speedups with this patch using the Enduro/X application. His writeup is here:
>>> https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/
>>>
>>> Thanks,
>>>
>>> -Jason
>>>
>>> Sample epoll_ctl text:
>>
>> Thanks for the proposed text. I have some questions about points
>> that are not quite clear to me.
>>
>>> EPOLLEXCLUSIVE
>>> Sets an exclusive wakeup mode for the epfd file descriptor that is
>>> being attached to the target file descriptor, fd. Thus, when an
>>> event occurs and multiple epfd file descriptors are attached to the
>>> same target file using EPOLLEXCLUSIVE, one or more epfds will receive
>>> an event with epoll_wait(2). The default in this scenario (when
>>> EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
>>> EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
>>
>> So, assuming an FD is present in the interest list of multiple (say 6)
>> epoll FDs, and some (say 3) of those attachments were done using
>> EPOLLEXCLUSIVE, which of the following statements are correct?
>>
>> (a) It's guaranteed that *none* of the epoll FDs that did NOT specify
>> EPOLLEXCLUSIVE will receive an event.
>>
>> (b) It's guaranteed that *all* of the epoll FDs that did NOT specify
>> EPOLLEXCLUSIVE will receive an event.
>>
>> (c) From 1 to 3 of the epoll FDs that did specify EPOLLEXCLUSIVE
>> will receive an event.
>>
>> (d) Exactly one epoll FD that did specify EPOLLEXCLUSIVE will get
>> an event, and it is indeterminate which one.
>>
>
> So b and c. All the non-exclusive adds will get it and at least 1 of the
> exclusive adds will as well.

So is it fair to say that the expected use case is that all epoll sets
would use EPOLLEXCLUSIVE?

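
For reference, a minimal C sketch of the scenario in (b) and (c). The
assumptions here are mine, not from the patch: the target FD is the read
end of a pipe, the helper name count_wakeups() and the constants N_EPFD
and N_EXCL are hypothetical, and each epoll FD is polled with a zero
timeout rather than having a thread blocked in epoll_wait(2). It needs a
kernel that provides EPOLLEXCLUSIVE (Linux 4.5 or later).

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* fallback for older libc headers */
#endif

#define N_EPFD 6   /* total epoll instances (hypothetical) */
#define N_EXCL 3   /* the first N_EXCL attach with EPOLLEXCLUSIVE */

/*
 * Hypothetical helper. Attaches one pipe read end to N_EPFD epoll
 * instances, makes the pipe readable, then counts how many instances
 * of each kind report the event.
 * Returns 0 on success, -1 if any setup step fails.
 */
int count_wakeups(int *plain_hits, int *excl_hits)
{
    int pipefd[2], epfd[N_EPFD];
    int i, created = 0, rc = -1;
    struct epoll_event ev, got;

    *plain_hits = 0;
    *excl_hits = 0;

    if (pipe(pipefd) == -1)
        return -1;

    for (i = 0; i < N_EPFD; i++) {
        epfd[i] = epoll_create1(0);
        if (epfd[i] == -1)
            goto cleanup;
        created++;

        ev.events = EPOLLIN | (i < N_EXCL ? EPOLLEXCLUSIVE : 0u);
        ev.data.fd = pipefd[0];
        if (epoll_ctl(epfd[i], EPOLL_CTL_ADD, pipefd[0], &ev) == -1)
            goto cleanup;
    }

    /* Generate a single event on the shared target FD. */
    if (write(pipefd[1], "x", 1) != 1)
        goto cleanup;

    for (i = 0; i < N_EPFD; i++) {
        int n = epoll_wait(epfd[i], &got, 1, 0);   /* zero timeout: poll */

        if (n == -1)
            goto cleanup;
        assert(n == 0 || n == 1);   /* maxevents is 1 */
        if (n == 1) {
            if (i < N_EXCL)
                (*excl_hits)++;
            else
                (*plain_hits)++;
        }
    }
    rc = 0;

cleanup:
    for (i = 0; i < created; i++)
        close(epfd[i]);
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```

Going by the answer quoted above, one would expect *plain_hits to be 3
(statement b) and *excl_hits to be anywhere from 1 to 3 (statement c);
the exact exclusive count is not something to rely on.
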
>> I suppose one point I'm trying to uncover in the above is: what is
>> the scope of EPOLLEXCLUSIVE? Is it just applicable for one process's
>> FD, or is it setting an attribute in the epoll "interest list" record
>> for that FD that affects notification behavior across all processes?
>>
>
> Right - so 'EPOLLEXCLUSIVE' will affect other epoll sets that are also
> using 'EPOLLEXCLUSIVE' against the same fd, but will have no effect
> on epoll sets connected to fd that do not specify it.
>
>
>> And then:
>>
>> (1) What are the semantics of EPOLLEXCLUSIVE if the added FD becomes
>> disabled via EPOLLONESHOT (or explicitly via EPOLL_CTL_MOD with
>> the 'events' field set to 0)?
>>
>
> In the case of EPOLLEXCLUSIVE and EPOLLONESHOT, one would have to re-arm
> at least one of the threads that were woken up by doing EPOLL_CTL_MOD to
> guarantee further wakeups.
>
> And likewise with an EPOLL_CTL_MOD with 'events' all set to 0, one
> would need to either re-arm the thread that set the 'events' field to 0
> (by setting back to non-zero), or re-arm in at least one other thread
> via EPOLL_CTL_MOD (or delete and add).

Okay -- so when an EPOLLEXCLUSIVE FD becomes disarmed it is possible
to re-enable with EPOLL_CTL_MOD; one doesn't need to delete and re-add
the FD.

>> (2) The source code contains a comment "we do not currently supported
>> nested exclusive wakeups". Could you elaborate on this point? It
>> sounds like something that should be documented.
>
> So I was just trying to say that we return -EINVAL if you try to do an
> EPOLL_CTL_ADD with EPOLLEXCLUSIVE and the 'fd' argument is an epoll fd
> returned via epoll_create().

Okay -- that definitely belongs in the man page.
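
As a rough illustration of that restriction, here is a minimal C sketch
that tries to add one epoll FD to another with EPOLLEXCLUSIVE and reports
the resulting errno. The function names nested_exclusive_errno() and
check_nested_exclusive() are hypothetical, and the sketch assumes a
kernel that provides EPOLLEXCLUSIVE (Linux 4.5 or later).

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* fallback for older libc headers */
#endif

/*
 * Hypothetical helper. Returns the errno from the nested exclusive
 * EPOLL_CTL_ADD, 0 if the add unexpectedly succeeds, or -1 if the
 * epoll instances could not be created.
 */
int nested_exclusive_errno(void)
{
    int outer = epoll_create1(0);
    int inner = epoll_create1(0);
    struct epoll_event ev;
    int err = -1;

    if (outer != -1 && inner != -1) {
        ev.events = EPOLLIN | EPOLLEXCLUSIVE;
        ev.data.fd = inner;
        /* The target 'fd' argument is itself an epoll FD. */
        if (epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) == -1)
            err = errno;
        else
            err = 0;
    }

    if (outer != -1)
        close(outer);
    if (inner != -1)
        close(inner);
    return err;
}

/* Usage: the nested exclusive add is expected to fail with EINVAL. */
void check_nested_exclusive(void)
{
    assert(nested_exclusive_errno() == EINVAL);
}
```

Without EPOLLEXCLUSIVE in 'events', nesting one epoll FD inside another
is permitted; the sketch only exercises the exclusive case described
above.
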

I'll work up a text, but would like to get input about the "use case"
question above.

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/