LKML Archive mirror
From: Jason Baron <jbaron@akamai.com>
To: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
	akpm@linux-foundation.org
Cc: mingo@kernel.org, peterz@infradead.org, viro@ftp.linux.org.uk,
	normalperson@yhbt.net, m@silodev.com, corbet@lwn.net,
	luto@amacapital.net, torvalds@linux-foundation.org,
	hagen@jauu.net, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Subject: Re: [PATCH] epoll: add exclusive wakeups flag
Date: Thu, 10 Mar 2016 15:40:34 -0500
Message-ID: <56E1DBC2.6040109@akamai.com>
In-Reply-To: <56E1D1D7.8040000@gmail.com>



On 03/10/2016 02:58 PM, Michael Kerrisk (man-pages) wrote:
> On 03/10/2016 07:53 PM, Jason Baron wrote:
>> Hi Michael,
>>
>> On 01/29/2016 03:14 AM, Michael Kerrisk (man-pages) wrote:
>>> Hello Jason,
>>> On 01/28/2016 06:57 PM, Jason Baron wrote:
>>>> Hi,
>>>>
>>>> On 01/28/2016 02:16 AM, Michael Kerrisk (man-pages) wrote:
>>>>> Hi Jason,
>>>>>
>>>>> On 12/08/2015 04:23 AM, Jason Baron wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Re-post of an old series addressing thundering herd issues when sharing
>>>>>> an event source fd amongst multiple epoll fds. Last posting was here
>>>>>> for reference: https://lkml.org/lkml/2015/2/25/56
>>>>>>  
>>>>>> The patch herein drops the core scheduler 'rotate' changes I had previously
>>>>>> proposed as this patch seems performant without those.
>>>>>>
>>>>>> I was prompted to re-post this because Madars Vitolins reported some good
>>>>>> speedups with this patch using Enduro/X application. His writeup is here:
>>>>>> https://mvitolin.wordpress.com/2015/12/05/endurox-testing-epollexclusive-flag/
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -Jason
>>>>>>
>>>>>> Sample epoll_ctl text:
>>>>>
>>>>> Thanks for the proposed text. I have some questions about points
>>>>> that are not quite clear to me.
>>>>>
>>>>>> EPOLLEXCLUSIVE
>>>>>>         Sets an exclusive wakeup mode for the epfd file descriptor that is
>>>>>> 	being attached to the target file descriptor, fd. Thus, when an
>>>>>> 	event occurs and multiple epfd file descriptors are attached to the
>>>>>> 	same target file using EPOLLEXCLUSIVE, one or more epfds will receive
>>>>>> 	an event with epoll_wait(2). The default in this scenario (when
>>>>>> 	EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
>>>>>> 	EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
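A minimal userspace sketch of the registration described above, assuming Linux 4.5 or later and a pollable fd (here, in the usage, a pipe's read end) shared by several epoll instances. The helper names add_exclusive() and attach_exclusive() are hypothetical, and the fallback definition of EPOLLEXCLUSIVE exists only for libc headers that predate the flag.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Fallback for older libc headers; 1u << 28 matches the Linux UAPI value. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/* Hypothetical helper: attach fd to epfd with an exclusive wakeup.
 * The flag is only accepted together with EPOLL_CTL_ADD.
 * Returns 0 on success, -1 with errno set on failure. */
static int add_exclusive(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events | EPOLLEXCLUSIVE };

    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: create n epoll instances and attach the same fd
 * to each one exclusively. Stores the epoll fds in epfds[] (-1 where
 * setup failed) and returns how many attachments succeeded. */
static int attach_exclusive(int fd, int n, int epfds[])
{
    int ok = 0;

    for (int i = 0; i < n; i++) {
        epfds[i] = epoll_create1(EPOLL_CLOEXEC);
        if (epfds[i] < 0)
            continue;
        if (add_exclusive(epfds[i], fd, EPOLLIN) == 0) {
            ok++;
        } else {
            close(epfds[i]);
            epfds[i] = -1;
        }
    }
    return ok;
}
```

In a real server each of these epoll instances would typically be owned by a separate worker thread blocked in epoll_wait(2); the flag only changes how many of those waiters a single wakeup on fd is meant to reach.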
>>>>>
>>>>> So, assuming an FD is present in the interest list of multiple (say 6)
>>>>> epoll FDs, and some (say 3) of those attachments were done using
>>>>> EPOLLEXCLUSIVE. Which of the following statements are correct:
>>>>>
>>>>> (a) It's guaranteed that *none* of the epoll FDs that did NOT specify
>>>>>     EPOLLEXCLUSIVE will receive an event.
>>>>>
>>>>> (b) It's guaranteed that *all* of the epoll FDs that did NOT specify
>>>>>     EPOLLEXCLUSIVE will receive an event.
>>>>>
>>>>> (c) From 1 to 3 of the epoll FDs that did specify EPOLLEXCLUSIVE
>>>>>     will receive an event.
>>>>>
>>>>> (d) Exactly one epoll FD that did specify EPOLLEXCLUSIVE will get
>>>>>     an event, and it is indeterminate which one.
>>>>>
>>>>
>>>> So b and c. All the non-exclusive adds will get it and at least 1 of the
>>>> exclusive adds will as well.
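To illustrate answers (b) and (c), here is a sketch, assuming a pipe as the shared target and using epoll_wait(2) with a zero timeout to check each instance. It only checks what the answer above promises: every non-exclusive instance sees the event, and at least one exclusive instance does. How many exclusive instances beyond the first see it when nobody is blocked in epoll_wait(2) is not something the sketch relies on. The names new_instance(), is_ready() and count_ready() are hypothetical.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

#define MAX_INST 8

/* Attach fd (for EPOLLIN) to a new epoll instance, optionally with
 * EPOLLEXCLUSIVE. Returns the epoll fd, or -1 on error. */
static int new_instance(int fd, int exclusive)
{
    struct epoll_event ev = { .events = EPOLLIN };
    int epfd = epoll_create1(EPOLL_CLOEXEC);

    if (epfd < 0)
        return -1;
    if (exclusive)
        ev.events |= EPOLLEXCLUSIVE;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Zero-timeout poll: 1 if epfd currently reports an event, else 0. */
static int is_ready(int epfd)
{
    struct epoll_event ev;

    return epoll_wait(epfd, &ev, 1, 0) == 1;
}

/* Attach one pipe to n plain and n exclusive instances, make it readable
 * once, and count how many instances of each kind report the event.
 * Returns 0 on success, -1 on setup failure. */
static int count_ready(int n, int *plain_ready, int *excl_ready)
{
    int p[2], plain[MAX_INST], excl[MAX_INST], rc = 0;

    if (n < 1 || n > MAX_INST || pipe(p) < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        plain[i] = new_instance(p[0], 0);
        excl[i] = new_instance(p[0], 1);
        if (plain[i] < 0 || excl[i] < 0)
            rc = -1;
    }
    if (rc == 0 && write(p[1], "x", 1) != 1)
        rc = -1;

    *plain_ready = *excl_ready = 0;
    for (int i = 0; i < n; i++) {
        if (rc == 0) {
            *plain_ready += is_ready(plain[i]);
            *excl_ready += is_ready(excl[i]);
        }
        if (plain[i] >= 0)
            close(plain[i]);
        if (excl[i] >= 0)
            close(excl[i]);
    }
    close(p[0]);
    close(p[1]);
    return rc;
}
```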
>>>
>>> So is it fair to say that the expected use case is that all epoll sets
>>> would use EPOLLEXCLUSIVE?
>>>
>>>>> I suppose one point I'm trying to uncover in the above is: what is
>>>>> the scope of EPOLLEXCLUSIVE? Is it just applicable for one process's
>>>>> FD, or is it setting an attribute in the epoll "interest list" record
>>>>> for that FD that affects notification behavior across all processes?
>>>>>
>>>>
>>>> Right - so 'EPOLLEXCLUSIVE' will affect other epoll sets that are also
>>>> using 'EPOLLEXCLUSIVE' against the same fd, but will have no effect
>>>> on epoll sets connected to fd that do not specify it.
>>>>
>>>>
>>>>> And then:
>>>>>
>>>>> (1) What are the semantics of EPOLLEXCLUSIVE if the added FD becomes
>>>>>     disabled via EPOLLONESHOT (or explicitly via EPOLL_CTL_MOD with
>>>>>     the 'events' field set to 0)?
>>>>>
>>>>
>>>> In the case of EPOLLEXCLUSIVE and EPOLLONESHOT, one would have to re-arm
>>>> at least one of the threads that were woken up by doing EPOLL_CTL_MOD to
>>>> guarantee further wakeups.
>>>>
>>>> And likewise with an EPOLL_CTL_MOD with 'events' all set to 0, one
>>>> would need to either re-arm the thread that set the 'events' field to 0
>>>> (by setting it back to non-zero), or re-arm at least one other thread
>>>> via EPOLL_CTL_MOD (or delete and add).
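A sketch of re-arming an exclusive attachment, assuming the semantics in the updated text further down this message, where a subsequent EPOLL_CTL_MOD on an epfd, fd pair that was added with EPOLLEXCLUSIVE fails with EINVAL. Under that assumption the sketch takes the "delete and add" route mentioned above; rearm_exclusive() is a hypothetical helper.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/* Hypothetical helper: (re-)arm an exclusive attachment of fd to epfd.
 * EPOLL_CTL_MOD is assumed to be refused for an exclusive epfd, fd pair,
 * so drop the old entry (if any) and add it back with the new events.
 * Returns 0 on success, -1 with errno set on failure. */
static int rearm_exclusive(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events | EPOLLEXCLUSIVE };

    ev.data.fd = fd;
    /* ENOENT only means fd was not attached yet, so it is not an error. */
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) < 0 && errno != ENOENT)
        return -1;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}
```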
>>>
>>> Okay -- so when an EPOLLEXCLUSIVE FD becomes disarmed it is possible
>>> to re-enable it with EPOLL_CTL_MOD; one doesn't need to delete and re-add
>>> the FD.
>>>
>>>>> (2) The source code contains a comment "we do not currently supported 
>>>>>     nested exclusive wakeups". Could you elaborate on this point? It
>>>>>     sounds like something that should be documented.
>>>>
>>>> So I was just trying to say that we return -EINVAL if you try to do an
>>>> EPOLL_CTL_ADD with EPOLLEXCLUSIVE and the 'fd' argument is an epoll fd
>>>> returned via epoll_create().
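A minimal check of that restriction, assuming Linux 4.5 or later: nesting one epoll instance inside another is accepted without the flag and rejected with EINVAL when EPOLLEXCLUSIVE is set. nest_epoll() is a hypothetical name; it returns 0 on success or the errno value.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/* Hypothetical probe: add one epoll fd to another, optionally with
 * EPOLLEXCLUSIVE. Returns 0 if accepted, else the errno value. */
static int nest_epoll(int exclusive)
{
    int outer = epoll_create1(EPOLL_CLOEXEC);
    int inner = epoll_create1(EPOLL_CLOEXEC);
    int err = 0;

    if (outer < 0 || inner < 0) {
        err = errno;
    } else {
        struct epoll_event ev = { .events = EPOLLIN };

        if (exclusive)
            ev.events |= EPOLLEXCLUSIVE;
        ev.data.fd = inner;
        if (epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) < 0)
            err = errno; /* captured before close() can clobber errno */
    }
    if (inner >= 0)
        close(inner);
    if (outer >= 0)
        close(outer);
    return err;
}
```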
>>>
>>> Okay -- that definitely belongs in the man page.
>>>
>>> I'll work up a text, but would like to get input about the "use case"
>>> question above.
>>>
>>> Cheers,
>>>
>>> Michael
>>>
>>>
>>>
>>
>> Ok, here's some updated text:
>>
>> EPOLLEXCLUSIVE
>>
>> Sets an exclusive wakeup mode for the epfd file descriptor that is being
>> attached to the target file descriptor, fd. When a wakeup event occurs
>> and multiple epfd file descriptors are attached to the same target file
>> using EPOLLEXCLUSIVE, one or more epfds will receive an event with
>> epoll_wait(2). The default in this scenario (when EPOLLEXCLUSIVE is not
>> set) is for all epfds to receive an event.
>>
>> The events supported by EPOLLEXCLUSIVE are: EPOLLIN, EPOLLOUT, EPOLLERR,
>> EPOLLHUP, EPOLLWAKEUP, and EPOLLET. epoll_wait(2) will always wait for
>> EPOLLERR and EPOLLHUP; it is not necessary to set them in events. If
>> EPOLLEXCLUSIVE is set using epoll_ctl(2), then a subsequent
>> EPOLL_CTL_MOD on the same epfd, fd pair will fail with EINVAL. An
>> epoll_ctl(2) call that specifies EPOLLEXCLUSIVE in events and specifies
>> the target file descriptor fd as an epoll instance will fail with EINVAL
>> as well.
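A sketch probing the allowed-events rule, assuming (as current kernels do) that an event bit outside the list above, such as EPOLLONESHOT, makes EPOLL_CTL_ADD with EPOLLEXCLUSIVE fail with EINVAL. add_exclusive_errno() is hypothetical and returns 0 or the errno value.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/* Hypothetical probe: try EPOLL_CTL_ADD of a pipe's read end with
 * events | EPOLLEXCLUSIVE. Returns 0 if accepted, else the errno value. */
static int add_exclusive_errno(uint32_t events)
{
    int p[2], epfd, err = 0;
    struct epoll_event ev = { .events = events | EPOLLEXCLUSIVE };

    if (pipe(p) < 0)
        return errno;
    epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0) {
        err = errno;
    } else {
        ev.data.fd = p[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
            err = errno;
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return err;
}
```

For example, add_exclusive_errno(EPOLLIN | EPOLLET) would be expected to return 0, while add_exclusive_errno(EPOLLIN | EPOLLONESHOT) would be expected to return EINVAL.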
> 
> By the way, in the code you have
> 
>         case EPOLL_CTL_MOD:
>                 if (epi) { 
>                         if (!(epi->event.events & EPOLLEXCLUSIVE)) {
>                                 epds.events |= POLLERR | POLLHUP;
>                                 error = ep_modify(ep, epi, &epds);
>                         }
> 
> I think the "if" here is redundant. IIUC, earlier in the code you
> disallow EPOLL_CTL_MOD with EPOLLEXCLUSIVE.
> 
> Cheers,
> 
> Michael
> 
> 

Hi Michael,

So the previous check ensures that you cannot add the EPOLLEXCLUSIVE
flag to the events via an EPOLL_CTL_MOD operation, where EPOLLEXCLUSIVE
may not be in the existing events set. This check, by contrast, ensures
that you can't modify an existing set that already has the EPOLLEXCLUSIVE
flag.
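The two checks described above can be told apart from userspace with a sketch like the following, assuming Linux 4.5 or later; mod_errno() is a hypothetical helper. It adds a pipe's read end with or without EPOLLEXCLUSIVE, then issues an EPOLL_CTL_MOD with or without the flag and returns the resulting errno, or 0 on success.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/* add_excl: whether the original EPOLL_CTL_ADD used EPOLLEXCLUSIVE.
 * mod_excl: whether the EPOLL_CTL_MOD request itself carries it.
 * Returns 0 if every call succeeded, else the errno of the failing call. */
static int mod_errno(int add_excl, int mod_excl)
{
    int p[2], epfd, err = 0;
    struct epoll_event ev = { .events = EPOLLIN };

    if (pipe(p) < 0)
        return errno;
    epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0) {
        err = errno;
    } else {
        ev.data.fd = p[0];
        if (add_excl)
            ev.events |= EPOLLEXCLUSIVE;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0) {
            err = errno;
        } else {
            ev.events = EPOLLIN;
            if (mod_excl)
                ev.events |= EPOLLEXCLUSIVE;
            if (epoll_ctl(epfd, EPOLL_CTL_MOD, p[0], &ev) < 0)
                err = errno;
        }
        close(epfd);
    }
    close(p[0]);
    close(p[1]);
    return err;
}
```

With this helper, (0, 0) should succeed; (0, 1) and (1, 1) should be refused by the earlier check, since any EPOLL_CTL_MOD carrying EPOLLEXCLUSIVE is rejected; and (1, 0) should be refused by the check in the code Michael quoted.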

Thanks,

-Jason
