From: Jason Baron <jbaron@akamai.com>
To: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: mingo@kernel.org, peterz@infradead.org, viro@ftp.linux.org.uk,
normalperson@yhbt.net, m@silodev.com, corbet@lwn.net,
luto@amacapital.net, torvalds@linux-foundation.org,
hagen@jauu.net, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Subject: Re: [PATCH] epoll: add exclusive wakeups flag
Date: Mon, 14 Mar 2016 15:32:19 -0400
Message-ID: <56E711C3.8020008@akamai.com>
In-Reply-To: <56E6F941.9040307@gmail.com>

On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
> [Restoring CC, which I see I accidentally dropped, one iteration back.]
>
> Hi Jason,
>
> Thanks for the review. I've tweaked one piece to respond to your
> feedback. But I also have another new question below.
>
> On 03/15/2016 03:55 AM, Jason Baron wrote:
>> On 03/11/2016 06:25 PM, Michael Kerrisk (man-pages) wrote:
>>> On 03/11/2016 09:51 PM, Jason Baron wrote:
>>>> On 03/11/2016 03:30 PM, Michael Kerrisk (man-pages) wrote:
>
> [...]
>
>> Hi Michael,
>>
>> Looks good. One comment below.
>>
>> Thanks,
>>
>>> EPOLLEXCLUSIVE (since Linux 4.5)
>>> Sets an exclusive wakeup mode for the epoll file
>>> descriptor that is being attached to the target file
>>> descriptor, fd. When a wakeup event occurs and multiple
>>> epoll file descriptors are attached to the same target
>>> file using EPOLLEXCLUSIVE, one or more of the epoll file
>>> descriptors will receive an event with epoll_wait(2).
>>> The default in this scenario (when EPOLLEXCLUSIVE is not
>>> set) is for all epoll file descriptors to receive an
>>> event. EPOLLEXCLUSIVE is thus useful for avoiding thun‐
>>> dering herd problems in certain scenarios.
>>>
>>> If the same file descriptor is in multiple epoll
>>> instances, some with the EPOLLEXCLUSIVE flag, and others
>>> without, then events will be provided to all epoll
>>> instances that did not specify EPOLLEXCLUSIVE, and at
>>> least one of the epoll instances that did specify
>>> EPOLLEXCLUSIVE.
>>>
>>> The following values may be specified in conjunction
>>> with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
>>> EPOLLET. EPOLLHUP and EPOLLERR can also be specified,
>>> but are ignored (as usual). Attempts to specify other
>>
>> I'm not sure 'ignored' is the right wording here. 'EPOLLHUP' and
>> 'EPOLLERR' are always included in the set of events when something is
>> added as EPOLLEXCLUSIVE. This is consistent with the non-EPOLLEXCLUSIVE
>> add case.
>
> Yes.
>
>> So 'EPOLLHUP' and 'EPOLLERR' may be specified but will be
>> included in the set of events on an add, whether they are specified or not.
>
> Yes. I understand your discomfort with the word "ignored", but the
> problem was that, because it made special mention of EPOLLHUP and EPOLLERR,
> your proposed text made it sound as though EPOLLEXCLUSIVE somehow was
> special with respect to these two flags. I wanted to clarify that it is not.
> How about this:
>
> The following values may be specified in conjunction
> with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
> EPOLLET. EPOLLHUP and EPOLLERR can also be specified,
> but this is not required: as usual, these events are
> always reported if they occur, regardless of whether
> they are specified in events.
> ?

Yes, nothing special here with respect to EPOLLHUP and EPOLLERR. So this
looks fine to me.
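
To illustrate the point about EPOLLHUP and EPOLLERR, here is a minimal
sketch (not taken from the patch or the man page). It assumes Linux 4.5
or later. The helper name events_after_writer_close() is made up for
illustration, and the fallback #define is only there for older headers
that lack the flag. It registers the read end of a pipe with just
EPOLLIN | EPOLLEXCLUSIVE, closes the write end, and returns whatever
epoll_wait(2) reports:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* value from the kernel UAPI header */
#endif

/*
 * Hypothetical helper: returns the event mask reported for the read end
 * of a pipe after its write end is closed, or 0 on failure/timeout.
 * EPOLLHUP is deliberately NOT requested in ev.events.
 */
uint32_t events_after_writer_close(void)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
    struct epoll_event out = { 0 };
    uint32_t got = 0;
    int p[2], epfd;

    if (pipe(p) < 0)
        return 0;
    epfd = epoll_create1(0);
    if (epfd >= 0 && epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0) {
        close(p[1]);                /* hang-up on the read end */
        p[1] = -1;
        if (epoll_wait(epfd, &out, 1, 1000) == 1)
            got = out.events;
    }
    if (epfd >= 0)
        close(epfd);
    close(p[0]);
    if (p[1] >= 0)
        close(p[1]);
    return got;
}

/*
 * Example use:
 *     assert(events_after_writer_close() & EPOLLHUP);
 */
```

If the reasoning above holds, the returned mask should include EPOLLHUP
even though it was never asked for, just as in the non-exclusive case.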
>
>>> values in events yield an error. EPOLLEXCLUSIVE may be
>>> used only in an EPOLL_CTL_ADD operation; attempts to
>>> employ it with EPOLL_CTL_MOD yield an error. If
>>> EPOLLEXCLUSIVE has been set using epoll_ctl(2), then a
>>> subsequent EPOLL_CTL_MOD on the same epfd, fd pair yields an
>>> error. An epoll_ctl(2) that specifies EPOLLEXCLUSIVE in
>>> events and specifies the target file descriptor fd as an
>>> epoll instance will likewise fail. The error in all of
>>> these cases is EINVAL.
>>>
>>> ERRORS
>>> EINVAL An invalid event type was specified along with EPOLLEX‐
>>> CLUSIVE in events.
>>>
>>> EINVAL op was EPOLL_CTL_MOD and events included EPOLLEXCLUSIVE.
>>>
>>> EINVAL op was EPOLL_CTL_MOD and the EPOLLEXCLUSIVE flag has
>>> previously been applied to this epfd, fd pair.
>>>
>>> EINVAL EPOLLEXCLUSIVE was specified in event and fd refers
>>> to an epoll instance.
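
The four EINVAL cases listed above can be exercised with a short sketch
like the following. This is only an illustration under the assumption of
a Linux 4.5+ kernel; ctl_errno() and count_einval_cases() are
hypothetical helper names, and EPOLLONESHOT is used merely as one example
of a flag that is not in the permitted set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* value from the kernel UAPI header */
#endif

/* Hypothetical helper: run epoll_ctl() and return 0 or the errno value. */
static int ctl_errno(int epfd, int op, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events };

    return epoll_ctl(epfd, op, fd, &ev) == 0 ? 0 : errno;
}

/* Returns how many of the four documented cases failed with EINVAL,
 * or -1 if the setup itself failed. */
int count_einval_cases(void)
{
    int p[2], epfd, inner, n = 0;

    if (pipe(p) < 0)
        return -1;
    epfd = epoll_create1(0);
    inner = epoll_create1(0);
    if (epfd < 0 || inner < 0)
        return -1;

    /* 1. A flag outside the permitted set combined with EPOLLEXCLUSIVE. */
    n += ctl_errno(epfd, EPOLL_CTL_ADD, p[0],
                   EPOLLIN | EPOLLONESHOT | EPOLLEXCLUSIVE) == EINVAL;

    /* A valid exclusive add, needed for cases 2 and 3. */
    if (ctl_errno(epfd, EPOLL_CTL_ADD, p[0], EPOLLIN | EPOLLEXCLUSIVE) != 0)
        return -1;

    /* 2. EPOLL_CTL_MOD with EPOLLEXCLUSIVE in events. */
    n += ctl_errno(epfd, EPOLL_CTL_MOD, p[0],
                   EPOLLIN | EPOLLEXCLUSIVE) == EINVAL;

    /* 3. EPOLL_CTL_MOD on a pair that was added with EPOLLEXCLUSIVE. */
    n += ctl_errno(epfd, EPOLL_CTL_MOD, p[0], EPOLLIN) == EINVAL;

    /* 4. EPOLLEXCLUSIVE where the target fd is itself an epoll instance. */
    n += ctl_errno(epfd, EPOLL_CTL_ADD, inner,
                   EPOLLIN | EPOLLEXCLUSIVE) == EINVAL;

    close(inner);
    close(epfd);
    close(p[0]);
    close(p[1]);
    return n;
}

/*
 * Example use:
 *     assert(count_einval_cases() == 4);
 */
```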
>
> Returning to the second sentence in this description:
>
> When a wakeup event occurs and multiple epoll file descrip‐
> tors are attached to the same target file using EPOLLEXCLU‐
> SIVE, one or more of the epoll file descriptors will
> receive an event with epoll_wait(2).
>
> There is a point that is unclear to me: what does "target file" refer to?
> Is it an open file description (aka open file table entry) or an inode?
> I suspect the former, but it was not clear in your original text.
>
So from epoll's perspective, the wakeups are associated with a 'wait
queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
file->poll()) result in adding to the same 'wait queue' then we will
get 'exclusive' wakeup behavior.

So in general, I think the answer here is that it's associated with the
inode (I couldn't say with 100% certainty without really looking at all
file->poll() implementations). Certainly, with the 'FIFO' example below,
the two scenarios will have the same behavior with respect to
EPOLLEXCLUSIVE.

Also, the 'non-exclusive' mode would be subject to the same question of
which wait queue the epfd is associated with...

Thanks,

-Jason

> To make this point even clearer, here are two scenarios I'm thinking of.
> In each case, we're talking of monitoring the read end of a FIFO.
>
> ===
>
> Scenario 1:
>
> We have three processes each of which
> 1. Creates an epoll instance
> 2. Opens the read end of the FIFO
> 3. Adds the read end of the FIFO to the epoll instance, specifying
> EPOLLEXCLUSIVE
>
> When input becomes available on the FIFO, how many processes
> get a wakeup?
>
> ===
>
> Scenario 2:
>
> A parent process opens the read end of a FIFO and then calls
> fork() three times to create three children. Each child then:
>
> 1. Creates an epoll instance
> 2. Adds the read end of the FIFO to the epoll instance, specifying
> EPOLLEXCLUSIVE
>
> When input becomes available on the FIFO, how many processes
> get a wakeup?
>
> ===
>
> Cheers,
>
> Michael
>
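
For anyone who wants to try the two FIFO scenarios quoted above, a rough
sketch follows. It is an assumption-laden test harness rather than
anything from the patch: count_wakeups() and child() are hypothetical
names, the temporary path and the one-second timeout are arbitrary, and
it needs Linux 4.5 or later. Passing 0 makes each child open its own read
end (scenario 1); passing 1 makes the parent open the read end before
fork() (scenario 2). The return value is the number of children whose
epoll_wait(2) reported an event.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* value from the kernel UAPI header */
#endif

#define NCHILD 3

/* Child: exit status 1 = got an event, 0 = timed out, 2 = setup error. */
static int child(const char *path, int rfd, int ready_fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
    struct epoll_event out;
    int epfd = epoll_create1(0);

    if (rfd < 0)                        /* scenario 1: open our own read end */
        rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (epfd < 0 || rfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, rfd, &ev) < 0)
        return 2;
    if (write(ready_fd, "r", 1) != 1)   /* tell the parent we are registered */
        return 2;
    return epoll_wait(epfd, &out, 1, 1000) == 1 ? 1 : 0;
}

int count_wakeups(int shared_open)
{
    char dir[] = "/tmp/epollexcl.XXXXXX";
    char path[64], c;
    int ready[2], rfd = -1, wfd, status, woken = 0;

    if (!mkdtemp(dir))
        return -1;
    snprintf(path, sizeof(path), "%s/fifo", dir);
    if (mkfifo(path, 0600) < 0 || pipe(ready) < 0)
        return -1;
    if (shared_open)                    /* scenario 2: one open, inherited */
        rfd = open(path, O_RDONLY | O_NONBLOCK);

    for (int i = 0; i < NCHILD; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(child(path, rfd, ready[1]));
    }
    close(ready[1]);
    for (int i = 0; i < NCHILD; i++)    /* wait until every child has added */
        if (read(ready[0], &c, 1) != 1)
            break;

    /* Make input available on the FIFO exactly once. */
    wfd = open(path, O_WRONLY | O_NONBLOCK);
    if (wfd >= 0) {
        ssize_t w = write(wfd, "x", 1);
        (void)w;
    }

    while (wait(&status) > 0)
        if (WIFEXITED(status) && WEXITSTATUS(status) == 1)
            woken++;
    assert(woken <= NCHILD);

    if (wfd >= 0)
        close(wfd);
    if (rfd >= 0)
        close(rfd);
    close(ready[0]);
    unlink(path);
    rmdir(dir);
    return woken;
}
```

Given the "one or more" wording in the proposed text, the only thing one
can safely expect is a result between 1 and NCHILD in both scenarios; the
exact count may depend on timing and on the kernel version.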