From: Jason Baron <jbaron@akamai.com>
To: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: mingo@kernel.org, peterz@infradead.org, viro@ftp.linux.org.uk,
normalperson@yhbt.net, m@silodev.com, corbet@lwn.net,
luto@amacapital.net, torvalds@linux-foundation.org,
hagen@jauu.net, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Subject: Re: [PATCH] epoll: add exclusive wakeups flag
Date: Mon, 14 Mar 2016 18:35:07 -0400
Message-ID: <56E73C9B.9060206@akamai.com>
In-Reply-To: <56E7273D.3010403@gmail.com>
Hi Michael,
On 03/14/2016 05:03 PM, Michael Kerrisk (man-pages) wrote:
> Hi Jason,
>
> On 03/15/2016 09:01 AM, Michael Kerrisk (man-pages) wrote:
>> Hi Jason,
>>
>> On 03/15/2016 08:32 AM, Jason Baron wrote:
>>>
>>>
>>> On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
>>>> [Restoring CC, which I see I accidentally dropped, one iteration back.]
>
> [...]
>
>>>> Returning to the second sentence in this description:
>>>>
>>>>     When a wakeup event occurs and multiple epoll file descriptors
>>>>     are attached to the same target file using EPOLLEXCLUSIVE, one
>>>>     or more of the epoll file descriptors will receive an event
>>>>     with epoll_wait(2).
>>>>
>>>> There is a point that is unclear to me: what does "target file" refer to?
>>>> Is it an open file description (aka open file table entry) or an inode?
>>>> I suspect the former, but it was not clear in your original text.
>>>>
>>>
>>> So from epoll's perspective, the wakeups are associated with a 'wait
>>> queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
>>> file->poll()) results in adding to the same 'wait queue' then we will
>>> get 'exclusive' wakeup behavior.
>>>
>>> So in general, I think the answer here is that it's associated with the
>>> inode (I couldn't say with 100% certainty without really looking at all
>>> file->poll() implementations). Certainly, with the 'FIFO' example below,
>>> the two scenarios will have the same behavior with respect to
>>> EPOLLEXCLUSIVE.
>
> So, I was actually a little surprised by this, and went away and tested
> this point. It appears to me that the two scenarios described below
> do NOT have the same behavior with respect to EPOLLEXCLUSIVE. See below.
>
>> So, in both scenarios, *one or more* processes will get a wakeup?
>> (I'll try to add something to the text to clarify the detail we're
>> discussing.)
>>
>>> Also, the 'non-exclusive' mode would be subject to the same question of
>>> which wait queue the epfd is associated with...
>>
>> I'm not sure of the point you are trying to make here?
>>
>> Cheers,
>>
>> Michael
>>
>>
>>>> To make this point even clearer, here are two scenarios I'm thinking of.
>>>> In each case, we're talking of monitoring the read end of a FIFO.
>>>>
>>>> ===
>>>>
>>>> Scenario 1:
>>>>
>>>> We have three processes each of which
>>>> 1. Creates an epoll instance
>>>> 2. Opens the read end of the FIFO
>>>> 3. Adds the read end of the FIFO to the epoll instance, specifying
>>>> EPOLLEXCLUSIVE
>>>>
>>>> When input becomes available on the FIFO, how many processes
>>>> get a wakeup?
>
> When I test this scenario, all three processes get a wakeup.
>
>>>> ===
>>>>
>>>> Scenario 2:
>>>>
>>>> A parent process opens the read end of a FIFO and then calls
>>>> fork() three times to create three children. Each child then:
>>>>
>>>> 1. Creates an epoll instance
>>>> 2. Adds the read end of the FIFO to the epoll instance, specifying
>>>> EPOLLEXCLUSIVE
>>>>
>>>> When input becomes available on the FIFO, how many processes
>>>> get a wakeup?
>
> When I test this scenario, one process gets a wakeup.
>
> In other words, "target file" appears to mean open file description
> (aka open file table entry), not inode.
>
> This is actually what I suspected might be the case, but now I am
> puzzled. Given what I've discovered and what you suggest are the
> semantics, is the implementation correct? (I suspect that it is,
> but it is at odds with your statement above.) My test programs are
> inline below.
>
> Cheers,
>
> Michael
>
Thanks for the test cases. In your first test case, each process exits
immediately after epoll_wait() returns. That exit closes the process's
FIFO file descriptor, and the close() is what causes the next wakeup.
The 2nd process then returns from epoll_wait() and exits, which causes
the 3rd wakeup.

So the wakeups are not coming from the write directly, but from the
readers doing a close(). If you add some sort of sleep after the
epoll_wait(), you can confirm the behavior. So I believe this is
working as expected.
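
A minimal sketch of that check, in case it is useful. It is not the
original test program: the helper names reader() and
count_exclusive_wakeups(), the /tmp path, the O_NONBLOCK open and the
"ready" pipe handshake are all assumptions made for illustration. Each
reader opens the FIFO itself (as in the first test case), and sleeps
after epoll_wait() returns so that its exit cannot be what wakes the
others. It assumes hold_secs * 1000 is larger than timeout_ms.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)
#endif

/* Hypothetical helper, run in a child. Opens its own read end of the FIFO,
   registers it with EPOLLEXCLUSIVE and waits.
   Exit status: 0 = woken, 1 = timed out, 2 = setup failure. */
static void
reader(const char *path, int ready_fd, int timeout_ms, unsigned hold_secs)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE }, rev;
    int fd, epfd, nready;

    fd = open(path, O_RDONLY | O_NONBLOCK);   /* no writer yet: don't block */
    epfd = epoll_create1(0);
    if (fd == -1 || epfd == -1 ||
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1 ||
        write(ready_fd, "r", 1) != 1)         /* tell the parent: registered */
        _exit(2);

    nready = epoll_wait(epfd, &rev, 1, timeout_ms);
    sleep(hold_secs);   /* hold the fd open: exiting now would close() it */
    _exit(nready == 1 ? 0 : 1);
}

/* Hypothetical helper. Returns how many readers reported a wakeup after a
   single one-byte write, or -1 if the setup failed. */
int
count_exclusive_wakeups(int nreaders, int timeout_ms, unsigned hold_secs)
{
    char dir[] = "/tmp/epollexcl.XXXXXX", path[64], c;
    int ready[2], wfd = -1, nforked = 0, woken = 0, failed = 0, status, i;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof(path), "%s/fifo", dir);
    if (mkfifo(path, 0600) == -1 || pipe(ready) == -1) {
        unlink(path);
        rmdir(dir);
        return -1;
    }

    for (i = 0; i < nreaders; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            close(ready[0]);
            reader(path, ready[1], timeout_ms, hold_secs);  /* never returns */
        }
        if (pid == -1)
            failed = 1;
        else
            nforked++;
    }
    close(ready[1]);

    /* Wait until every reader has added its fd to its epoll instance. */
    for (i = 0; i < nforked && !failed; i++)
        if (read(ready[0], &c, 1) != 1)
            failed = 1;

    /* One write. The write end stays open until all readers have exited,
       so closing it cannot cause wakeups either. */
    if (!failed) {
        wfd = open(path, O_WRONLY);
        if (wfd == -1 || write(wfd, "x", 1) != 1)
            failed = 1;
    }

    for (i = 0; i < nforked; i++) {
        if (wait(&status) == -1 || !WIFEXITED(status) ||
            WEXITSTATUS(status) == 2)
            failed = 1;
        else if (WEXITSTATUS(status) == 0)
            woken++;
    }

    if (wfd != -1)
        close(wfd);
    close(ready[0]);
    unlink(path);
    rmdir(dir);
    return failed ? -1 : woken;
}
```

Calling count_exclusive_wakeups(3, 500, 1) from a main() and printing
the result shows how many of the three readers the single write woke.
The exact count may depend on the kernel version, so the sketch does
not hard-code an expectation.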
Thanks,
-Jason
> ============
>
> /* t_EPOLLEXCLUSIVE_multipen.c
>
> Licensed under GNU GPLv2 or later.
> */
> #include <sys/epoll.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
>
> #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
>                      } while (0)
>
> #define usageErr(msg, progName) \
>                      do { fprintf(stderr, "Usage: "); \
>                           fprintf(stderr, msg, progName); \
>                           exit(EXIT_FAILURE); } while (0)
>
> #ifndef EPOLLEXCLUSIVE
> #define EPOLLEXCLUSIVE (1 << 28)
> #endif
>
> int
> main(int argc, char *argv[])
> {
>     int fd, epfd, nready;
>     struct epoll_event ev, rev;
>
>     if (argc != 2 || strcmp(argv[1], "--help") == 0)
>         usageErr("%s <FIFO>\n", argv[0]);
>
>     epfd = epoll_create(2);
>     if (epfd == -1)
>         errExit("epoll_create");
>
>     fd = open(argv[1], O_RDONLY);
>     if (fd == -1)
>         errExit("open");
>     printf("Opened %s\n", argv[1]);
>
>     ev.events = EPOLLIN | EPOLLEXCLUSIVE;
>     if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
>         errExit("epoll_ctl");
>
>     nready = epoll_wait(epfd, &rev, 1, -1);
>     if (nready == -1)
>         errExit("epoll_wait");
>     printf("epoll_wait() returned %d\n", nready);
>
>     exit(EXIT_SUCCESS);
> }
>
> ===============
>
> /* t_EPOLLEXCLUSIVE_fork.c
>
> Licensed under GNU GPLv2 or later.
> */
>
> #include <sys/epoll.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
>
> #define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
>                      } while (0)
>
> #define usageErr(msg, progName) \
>                      do { fprintf(stderr, "Usage: "); \
>                           fprintf(stderr, msg, progName); \
>                           exit(EXIT_FAILURE); } while (0)
>
> #ifndef EPOLLEXCLUSIVE
> #define EPOLLEXCLUSIVE (1 << 28)
> #endif
>
> int
> main(int argc, char *argv[])
> {
>     int fd, epfd, nready;
>     struct epoll_event ev, rev;
>     int cnum;
>
>     if (argc != 2 || strcmp(argv[1], "--help") == 0)
>         usageErr("%s <FIFO>\n", argv[0]);
>
>     fd = open(argv[1], O_RDONLY);
>     if (fd == -1)
>         errExit("open");
>     printf("Opened %s\n", argv[1]);
>
>     for (cnum = 0; cnum < 3; cnum++) {
>         switch (fork()) {
>         case -1:
>             errExit("fork");
>
>         case 0: /* Child */
>             epfd = epoll_create(2);
>             if (epfd == -1)
>                 errExit("epoll_create");
>
>             ev.events = EPOLLIN | EPOLLEXCLUSIVE;
>             if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
>                 errExit("epoll_ctl");
>
>             nready = epoll_wait(epfd, &rev, 1, -1);
>             if (nready == -1)
>                 errExit("epoll_wait");
>             printf("Child %d: epoll_wait() returned %d\n", cnum, nready);
>             exit(EXIT_SUCCESS);
>
>         default: /* Parent: loop around to create the next child */
>             break;
>         }
>     }
>
>     wait(NULL);
>     wait(NULL);
>     wait(NULL);
>
>     exit(EXIT_SUCCESS);
> }
>