From: Jason Baron <jbaron@akamai.com>
To: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: mingo@kernel.org, peterz@infradead.org, viro@ftp.linux.org.uk,
	normalperson@yhbt.net, m@silodev.com, corbet@lwn.net,
	luto@amacapital.net, torvalds@linux-foundation.org,
	hagen@jauu.net, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Subject: Re: [PATCH] epoll: add exclusive wakeups flag
Date: Mon, 14 Mar 2016 15:32:19 -0400
Message-ID: <56E711C3.8020008@akamai.com>
In-Reply-To: <56E6F941.9040307@gmail.com>



On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
> [Restoring CC, which I see I accidentally dropped, one iteration back.]
> 
> Hi Jason,
> 
> Thanks for the review. I've tweaked one piece to respond to your
> feedback. But I also have another new question below.
> 
> On 03/15/2016 03:55 AM, Jason Baron wrote:
>> On 03/11/2016 06:25 PM, Michael Kerrisk (man-pages) wrote:
>>> On 03/11/2016 09:51 PM, Jason Baron wrote:
>>>> On 03/11/2016 03:30 PM, Michael Kerrisk (man-pages) wrote:
> 
> [...]
> 
>> Hi Michael,
>>
>> Looks good. One comment below.
>>
>> Thanks,
>>
>>>        EPOLLEXCLUSIVE (since Linux 4.5)
>>>               Sets  an  exclusive  wakeup  mode  for  the  epoll  file
>>>               descriptor  that  is  being  attached to the target file
>>>               descriptor, fd.  When a wakeup event occurs and multiple
>>>               epoll  file  descriptors are attached to the same target
>>>               file using EPOLLEXCLUSIVE, one or more of the epoll file
>>>               descriptors  will  receive  an event with epoll_wait(2).
>>>               The default in this scenario (when EPOLLEXCLUSIVE is not
>>>               set)  is  for  all  epoll file descriptors to receive an
>>>               event.  EPOLLEXCLUSIVE is thus useful for avoiding thun‐
>>>               dering herd problems in certain scenarios.
>>>
>>>               If  the  same  file  descriptor  is  in  multiple  epoll
>>>               instances, some with the EPOLLEXCLUSIVE flag, and others
>>>               without,  then  events  will be provided to all epoll
>>>               instances that did not specify  EPOLLEXCLUSIVE,  and  at
>>>               least  one  of  the  epoll  instances  that  did specify
>>>               EPOLLEXCLUSIVE.
>>>
>>>               The following values may  be  specified  in  conjunction
>>>               with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
>>>               EPOLLET.  EPOLLHUP and EPOLLERR can also  be  specified,
>>>               but  are  ignored (as usual).  Attempts to specify other
>>
>> I'm not sure 'ignored' is the right wording here. 'EPOLLHUP' and
>> 'EPOLLERR' are always included in the set of events when something is
>> added as EPOLLEXCLUSIVE. This is consistent with the non-EPOLLEXCLUSIVE
>> add case. 
> 
> Yes.
> 
>> So 'EPOLLHUP' and 'EPOLLERR' may be specified but will be
>> included in the set of events on an add, whether they are specified or not.
> 
> Yes. I understand your discomfort with the word "ignored", but the
> problem was that, because it made special mention of EPOLLHUP and EPOLLERR,
> your proposed text made it sound as though EPOLLEXCLUSIVE somehow was
> special with respect to these two flags. I wanted to clarify that it is not.
> How about this:
> 
>               The following values may  be  specified  in  conjunction
>               with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
>               EPOLLET.  EPOLLHUP and EPOLLERR can also  be  specified,
>               but  this  is  not  required: as usual, these events are
>               always reported if they  occur,  regardless  of  whether
>               they are specified in events.
> ?

Yes, nothing special here with respect to EPOLLHUP and EPOLLERR. So this
looks fine to me.
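
As a rough illustration of that point (a sketch only, not taken from the
patch: hup_reported_unrequested() is a name invented here, and the fallback
#define assumes that older libc headers may not provide EPOLLEXCLUSIVE),
something like the following registers the read end of a pipe with just
EPOLLIN | EPOLLEXCLUSIVE and then checks whether EPOLLHUP still shows up
once the writer goes away:

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* value from the kernel UAPI header */
#endif

/*
 * Hypothetical helper: returns 1 if EPOLLHUP is reported even though it
 * was not requested, 0 if it is not, and -1 on setup failure.
 */
int hup_reported_unrequested(void)
{
    int p[2];
    /* Note: neither EPOLLHUP nor EPOLLERR is requested here. */
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
    struct epoll_event out = { 0 };
    int epfd = epoll_create1(0);

    if (epfd < 0 || pipe(p) < 0)
        return -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
        return -1;

    close(p[1]);    /* last writer goes away: hangup on the read end */

    int n = epoll_wait(epfd, &out, 1, 1000);
    int hup = (n == 1) && (out.events & EPOLLHUP);

    close(p[0]);
    close(epfd);
    return hup ? 1 : 0;
}
```

On a kernel with EPOLLEXCLUSIVE support this should return 1, the same as
it would if the flag were left out of the add.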

> 
>>>               values in events yield an error.  EPOLLEXCLUSIVE may  be
>>>               used  only  in  an  EPOLL_CTL_ADD operation; attempts to
>>>               employ  it  with  EPOLL_CTL_MOD  yield  an  error.    If
>>>               EPOLLEXCLUSIVE has been set using epoll_ctl(2),  then  a
>>>               subsequent EPOLL_CTL_MOD on the same epfd, fd pair yields
>>>               an error.  An epoll_ctl(2) that specifies EPOLLEXCLUSIVE in
>>>               events and specifies the target file descriptor fd as an
>>>               epoll  instance will likewise fail.  The error in all of
>>>               these cases is EINVAL.
>>>
>>>    ERRORS
>>>        EINVAL An invalid event type was specified along with  EPOLLEX‐
>>>               CLUSIVE in events.
>>>
>>>        EINVAL op was EPOLL_CTL_MOD and events included EPOLLEXCLUSIVE.
>>>
>>>        EINVAL op  was  EPOLL_CTL_MOD  and  the EPOLLEXCLUSIVE flag has
>>>               previously been applied to this epfd, fd pair.
>>>
>>>        EINVAL EPOLLEXCLUSIVE was specified in event and fd refers
>>>               to an epoll instance.
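
For reference, the four EINVAL cases listed above can be exercised with a
short sketch along these lines. The helper names try_ctl() and
exclusive_einval_cases() are made up for illustration, EPOLLONESHOT is
picked here as one example of an "invalid event type", and a pipe stands
in for the target file:

```c
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* value from the kernel UAPI header */
#endif

/* Issue one epoll_ctl() call; return 0 on success, else the errno value. */
static int try_ctl(int epfd, int op, int fd, unsigned int events)
{
    struct epoll_event ev = { .events = events };

    return epoll_ctl(epfd, op, fd, &ev) == 0 ? 0 : errno;
}

/*
 * Hypothetical helper: stores the outcome of five epoll_ctl() calls in
 * res[0..4].  Returns 0 if the setup worked, -1 otherwise.
 */
int exclusive_einval_cases(int res[5])
{
    int p[2];
    int epfd = epoll_create1(0);
    int inner = epoll_create1(0);

    if (epfd < 0 || inner < 0 || pipe(p) < 0)
        return -1;

    /* An event type that is not allowed together with EPOLLEXCLUSIVE. */
    res[0] = try_ctl(epfd, EPOLL_CTL_ADD, p[0],
                     EPOLLIN | EPOLLONESHOT | EPOLLEXCLUSIVE);
    /* A valid exclusive add, needed for the two EPOLL_CTL_MOD cases. */
    res[1] = try_ctl(epfd, EPOLL_CTL_ADD, p[0], EPOLLIN | EPOLLEXCLUSIVE);
    /* EPOLL_CTL_MOD with EPOLLEXCLUSIVE in events. */
    res[2] = try_ctl(epfd, EPOLL_CTL_MOD, p[0], EPOLLIN | EPOLLEXCLUSIVE);
    /* EPOLL_CTL_MOD on a pair that was added with EPOLLEXCLUSIVE. */
    res[3] = try_ctl(epfd, EPOLL_CTL_MOD, p[0], EPOLLIN);
    /* EPOLLEXCLUSIVE where the target fd is itself an epoll instance. */
    res[4] = try_ctl(epfd, EPOLL_CTL_ADD, inner, EPOLLIN | EPOLLEXCLUSIVE);

    close(p[0]);
    close(p[1]);
    close(inner);
    close(epfd);
    return 0;
}
```

Going by the text above, res[1] should be 0 and the other four entries
should be EINVAL.
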
> 
> Returning to the second sentence in this description:
> 
>               When a wakeup event occurs and multiple epoll file descrip‐
>               tors are attached to the same target file using EPOLLEXCLU‐
>               SIVE, one or  more  of  the  epoll  file  descriptors  will
>               receive  an  event with epoll_wait(2).
> 
> There is a point that is unclear to me: what does "target file" refer to?
> Is it an open file description (aka open file table entry) or an inode?
> I suspect the former, but it was not clear in your original text.
>

So from epoll's perspective, the wakeups are associated with a 'wait
queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
file->poll()) results in adding to the same 'wait queue' then we will
get 'exclusive' wakeup behavior.

So in general, I think the answer here is that it's associated with the
inode (I couldn't say with 100% certainty without really looking at all
file->poll() implementations). Certainly, with the 'FIFO' example below,
the two scenarios will have the same behavior with respect to
EPOLLEXCLUSIVE.

Also, the 'non-exclusive' mode would be subject to the same question of
which wait queue the epfd is associated with...
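
To make the two FIFO cases concrete, here is a minimal sketch (not part of
the patch) that counts how many of three children get a wakeup from a
single write. The names count_wakeups() and child(), the /tmp path, and the
100ms/500ms timings are all assumptions made for this example. With
reopen != 0 each child opens the FIFO itself; with reopen == 0 the children
inherit the descriptor opened by the parent before fork():

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* for mkdtemp() and usleep() under strict -std modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* value from the kernel UAPI header */
#endif

#define NCHILD 3

/* Child: register rfd with EPOLLEXCLUSIVE, tell the parent, then wait.
 * Exit status: 0 = woken, 1 = timed out, 2 = setup error. */
static void child(int rfd, int ready_wfd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
    struct epoll_event out;
    int epfd = epoll_create1(0);
    int ok = epfd >= 0 && epoll_ctl(epfd, EPOLL_CTL_ADD, rfd, &ev) == 0;

    /* Always signal the parent so it never blocks on a failed child. */
    if (write(ready_wfd, "r", 1) != 1 || !ok)
        _exit(2);
    _exit(epoll_wait(epfd, &out, 1, 500) == 1 ? 0 : 1);
}

/* Returns the number of children woken by one write, or -1 on failure. */
int count_wakeups(int reopen)
{
    char dir[] = "/tmp/epollexcl.XXXXXX", path[64], buf[NCHILD];
    int ready[2], rfd, wfd, status, woken = 0;
    pid_t pids[NCHILD];

    if (!mkdtemp(dir) || pipe(ready) < 0)
        return -1;
    snprintf(path, sizeof(path), "%s/fifo", dir);
    if (mkfifo(path, 0600) < 0)
        return -1;
    rfd = open(path, O_RDONLY | O_NONBLOCK);    /* no writer needed yet */
    wfd = open(path, O_WRONLY);
    if (rfd < 0 || wfd < 0)
        return -1;

    for (int i = 0; i < NCHILD; i++) {
        if ((pids[i] = fork()) < 0)
            return -1;
        if (pids[i] == 0)   /* child() never returns */
            child(reopen ? open(path, O_RDONLY | O_NONBLOCK) : rfd, ready[1]);
    }

    /* Wait until every child has attempted its EPOLL_CTL_ADD ... */
    for (int got = 0; got < NCHILD; ) {
        ssize_t n = read(ready[0], buf, NCHILD - got);
        if (n <= 0)
            return -1;
        got += (int)n;
    }
    usleep(100 * 1000);     /* ... and give them time to block in epoll_wait() */

    if (write(wfd, "x", 1) != 1)    /* the single wakeup event */
        return -1;

    for (int i = 0; i < NCHILD; i++)
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            woken++;
    assert(woken <= NCHILD);

    close(rfd); close(wfd); close(ready[0]); close(ready[1]);
    unlink(path);
    rmdir(dir);
    return woken;
}
```

If the wait queue really is per-inode for FIFOs, count_wakeups(1) and
count_wakeups(0) should behave alike: at least one child woken, and
typically fewer than all three. The exact count is not guaranteed.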

Thanks,

-Jason

> To make this point even clearer, here are two scenarios I'm thinking of.
> In each case, we're talking of monitoring the read end of a FIFO.
> 
> ===
> 
> Scenario 1:
> 
> We have three processes each of which
> 1. Creates an epoll instance
> 2. Opens the read end of the FIFO
> 3. Adds the read end of the FIFO to the epoll instance, specifying
>    EPOLLEXCLUSIVE
> 
> When input becomes available on the FIFO, how many processes
> get a wakeup?
> 
> ===
> 
> Scenario 2:
> 
> A parent process opens the read end of a FIFO and then calls
> fork() three times to create three children. Each child then:
> 
> 1. Creates an epoll instance
> 2. Adds the read end of the FIFO to the epoll instance, specifying
>    EPOLLEXCLUSIVE
> 
> When input becomes available on the FIFO, how many processes
> get a wakeup?
> 
> ===
> 
> Cheers,
> 
> Michael
> 
