Date: Tue, 15 Mar 2016 01:09:21 +0200
From: Madars Vitolins
To: Jason Baron, "Michael Kerrisk (man-pages)"
Cc: Andrew Morton, mingo@kernel.org, peterz@infradead.org, viro@ftp.linux.org.uk, normalperson@yhbt.net, corbet@lwn.net, luto@amacapital.net, torvalds@linux-foundation.org, hagen@jauu.net, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org
Subject: Re: [PATCH] epoll: add exclusive wakeups flag
In-Reply-To: <56E73C9B.9060206@akamai.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Jason and Michael,

Hmm... I tried the pipe samples below, but even with a sleep added
all of the processes were woken up (maybe I am missing something too).
I also tried with EPOLLIN. On the same basis I created a sample using
POSIX message queues with EPOLLIN | EPOLLEXCLUSIVE, and the good news
is that it works correctly.

file q.c:
==================
/* Headers required by the calls below */
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <sys/epoll.h>
#include <fcntl.h>
#include <mqueue.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                        } while (0)

#define usageErr(msg, progName) \
                        do { fprintf(stderr, "Usage: "); \
                             fprintf(stderr, msg, progName); \
                             exit(EXIT_FAILURE); } while (0)

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

#define MAX_SIZE 10

int
main (int argc, char *argv[])
{
  int epfd, nready;
  struct epoll_event ev, rev;
  mqd_t fd;
  struct mq_attr attr;
  char buffer[MAX_SIZE + 1];
  int cnum;

  /* initialize the queue attributes */
  attr.mq_flags = 0;
  attr.mq_maxmsg = 5;
  attr.mq_msgsize = MAX_SIZE;
  attr.mq_curmsgs = 0;

  /* cleanup for multiple runs... */
  mq_unlink ("/TESTQ");

  /* create the message queue */
  fd = mq_open ("/TESTQ", O_CREAT | O_RDWR | O_NONBLOCK,
                S_IWUSR | S_IRUSR, &attr);
  if (fd == -1)
    errExit ("open");

  /* three children, each with its own epoll instance on the same queue */
  for (cnum = 0; cnum < 3; cnum++)
    {
      switch (fork ())
        {
        case -1:
          errExit ("fork");

        case 0:                 /* Child */
          epfd = epoll_create (2);
          if (epfd == -1)
            errExit ("epoll_create");

          ev.events = EPOLLIN | EPOLLEXCLUSIVE;
          if (epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
            errExit ("epoll_ctl");

          printf ("About to wait...\n");
          nready = epoll_wait (epfd, &rev, 1, -1);
          if (nready == -1)
            errExit ("epoll-wait");
          printf ("Child %d: epoll_wait() returned %d\n", cnum, nready);
          exit (EXIT_SUCCESS);

        default:
          break;
        }
    }

  /* give the children time to block in epoll_wait() */
  sleep (1);

  /* send a msg to Q */
  memset (buffer, 0, MAX_SIZE);
  if (0 > mq_send (fd, buffer, MAX_SIZE, 0))
    errExit ("mq_send");
  printf ("msg sent ok...\n");

  wait (NULL);
  wait (NULL);
  wait (NULL);

  exit (EXIT_SUCCESS);
}
==================

$ gcc q.c -lrt
$ ./a.out
About to wait...
About to wait...
About to wait...
msg sent ok...
Child 2: epoll_wait() returned 1
^C
$

Best regards,
Madars

Jason Baron @ 2016-03-15 00:35 wrote:
> Hi Michael,
>
> On 03/14/2016 05:03 PM, Michael Kerrisk (man-pages) wrote:
>> Hi Jason,
>>
>> On 03/15/2016 09:01 AM, Michael Kerrisk (man-pages) wrote:
>>> Hi Jason,
>>>
>>> On 03/15/2016 08:32 AM, Jason Baron wrote:
>>>>
>>>>
>>>> On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
>>>>> [Restoring CC, which I see I accidentally dropped, one iteration
>>>>> back.]
>>
>> [...]
>>
>>>>> Returning to the second sentence in this description:
>>>>>
>>>>>     When a wakeup event occurs and multiple epoll file
>>>>>     descriptors are attached to the same target file using
>>>>>     EPOLLEXCLUSIVE, one or more of the epoll file descriptors
>>>>>     will receive an event with epoll_wait(2).
>>>>>
>>>>> There is a point that is unclear to me: what does "target file"
>>>>> refer to? Is it an open file description (aka open file table
>>>>> entry) or an inode? I suspect the former, but it was not clear
>>>>> in your original text.
>>>>>
>>>>
>>>> So from epoll's perspective, the wakeups are associated with a
>>>> 'wait queue'. So if the open() and subsequent EPOLL_CTL_ADD (which
>>>> is done via file->poll()) results in adding to the same 'wait
>>>> queue' then we will get 'exclusive' wakeup behavior.
>>>>
>>>> So in general, I think the answer here is that it's associated
>>>> with the inode (I couldn't say with 100% certainty without really
>>>> looking at all file->poll() implementations). Certainly, with the
>>>> 'FIFO' example below, the two scenarios will have the same
>>>> behavior with respect to EPOLLEXCLUSIVE.
>>
>> So, I was actually a little surprised by this, and went away and
>> tested this point. It appears to me that the two scenarios described
>> below do NOT have the same behavior with respect to EPOLLEXCLUSIVE.
>> See below.
>>
>>> So, in both scenarios, *one or more* processes will get a wakeup?
>>> (I'll try to add something to the text to clarify the detail we're
>>> discussing.)
>>>
>>>> Also, the 'non-exclusive' mode would be subject to the same
>>>> question of which wait queue the epfd is associated with...
>>>
>>> I'm not sure of the point you are trying to make here?
>>>
>>> Cheers,
>>>
>>> Michael
>>>
>>>
>>>>> To make this point even clearer, here are two scenarios I'm
>>>>> thinking of. In each case, we're talking of monitoring the read
>>>>> end of a FIFO.
>>>>>
>>>>> ===
>>>>>
>>>>> Scenario 1:
>>>>>
>>>>> We have three processes each of which
>>>>> 1. Creates an epoll instance
>>>>> 2. Opens the read end of the FIFO
>>>>> 3. Adds the read end of the FIFO to the epoll instance, specifying
>>>>>    EPOLLEXCLUSIVE
>>>>>
>>>>> When input becomes available on the FIFO, how many processes
>>>>> get a wakeup?
>>
>> When I test this scenario, all three processes get a wakeup.
>>
>>>>> ===
>>>>>
>>>>> Scenario 2:
>>>>>
>>>>> A parent process opens the read end of a FIFO and then calls
>>>>> fork() three times to create three children. Each child then:
>>>>>
>>>>> 1. Creates an epoll instance
>>>>> 2. Adds the read end of the FIFO to the epoll instance, specifying
>>>>>    EPOLLEXCLUSIVE
>>>>>
>>>>> When input becomes available on the FIFO, how many processes
>>>>> get a wakeup?
>>
>> When I test this scenario, one process gets a wakeup.
>>
>> In other words, "target file" appears to mean open file description
>> (aka open file table entry), not inode.
>>
>> This is actually what I suspected might be the case, but now I am
>> puzzled. Given what I've discovered and what you suggest are the
>> semantics, is the implementation correct? (I suspect that it is,
>> but it is at odds with your statement above.) My test programs are
>> inline below.
>>
>> Cheers,
>>
>> Michael
>>
>
> Thanks for the test cases. So in your first test case, you are exiting
> immediately after the epoll_wait() returns. So this is actually causing
> the next wakeup.
> And then the 2nd thread returns from epoll_wait() and this causes
> the 3rd wakeup.
>
> So the wakeups are actually not happening from the write directly, but
> instead from the readers doing a close(). If you do some sort of sleep
> after the epoll_wait() you can confirm the behavior. So I believe this
> is working as expected.
>
> Thanks,
>
> -Jason
>
>
>> ============
>>
>> /* t_EPOLLEXCLUSIVE_multipen.c
>>
>>    Licensed under GNU GPLv2 or later.
>> */
>> /* Headers required by the calls below */
>> #include <sys/epoll.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <unistd.h>
>>
>> #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
>>                         } while (0)
>>
>> #define usageErr(msg, progName) \
>>                         do { fprintf(stderr, "Usage: "); \
>>                              fprintf(stderr, msg, progName); \
>>                              exit(EXIT_FAILURE); } while (0)
>>
>> #ifndef EPOLLEXCLUSIVE
>> #define EPOLLEXCLUSIVE (1 << 28)
>> #endif
>>
>> int
>> main(int argc, char *argv[])
>> {
>>     int fd, epfd, nready;
>>     struct epoll_event ev, rev;
>>
>>     if (argc != 2 || strcmp(argv[1], "--help") == 0)
>>         usageErr("%s n", argv[0]);
>>
>>     epfd = epoll_create(2);
>>     if (epfd == -1)
>>         errExit("epoll_create");
>>
>>     /* Each process opens the FIFO itself (separate open file
>>        descriptions) */
>>     fd = open(argv[1], O_RDONLY);
>>     if (fd == -1)
>>         errExit("open");
>>     printf("Opened %s\n", argv[1]);
>>
>>     ev.events = EPOLLIN | EPOLLEXCLUSIVE;
>>     if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
>>         errExit("epoll_ctl");
>>
>>     nready = epoll_wait(epfd, &rev, 1, -1);
>>     if (nready == -1)
>>         errExit("epoll-wait");
>>     printf("epoll_wait() returned %d\n", nready);
>>
>>     exit(EXIT_SUCCESS);
>> }
>>
>> ===============
>>
>> /* t_EPOLLEXCLUSIVE_fork.c
>>
>>    Licensed under GNU GPLv2 or later.
>> */
>>
>> /* Headers required by the calls below */
>> #include <sys/epoll.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <sys/wait.h>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <unistd.h>
>>
>> #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
>>                         } while (0)
>>
>> #define usageErr(msg, progName) \
>>                         do { fprintf(stderr, "Usage: "); \
>>                              fprintf(stderr, msg, progName); \
>>                              exit(EXIT_FAILURE); } while (0)
>>
>> #ifndef EPOLLEXCLUSIVE
>> #define EPOLLEXCLUSIVE (1 << 28)
>> #endif
>>
>> int
>> main(int argc, char *argv[])
>> {
>>     int fd, epfd, nready;
>>     struct epoll_event ev, rev;
>>     int cnum;
>>
>>     if (argc != 2 || strcmp(argv[1], "--help") == 0)
>>         usageErr("%s n", argv[0]);
>>
>>     /* Opened once in the parent, so all children share one open
>>        file description */
>>     fd = open(argv[1], O_RDONLY);
>>     if (fd == -1)
>>         errExit("open");
>>     printf("Opened %s\n", argv[1]);
>>
>>     for (cnum = 0; cnum < 3; cnum++) {
>>         switch (fork()) {
>>         case -1:
>>             errExit("fork");
>>
>>         case 0: /* Child */
>>             epfd = epoll_create(2);
>>             if (epfd == -1)
>>                 errExit("epoll_create");
>>
>>             ev.events = EPOLLIN | EPOLLEXCLUSIVE;
>>             if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
>>                 errExit("epoll_ctl");
>>
>>             nready = epoll_wait(epfd, &rev, 1, -1);
>>             if (nready == -1)
>>                 errExit("epoll-wait");
>>             printf("Child %d: epoll_wait() returned %d\n", cnum,
>>                     nready);
>>             exit(EXIT_SUCCESS);
>>
>>         default:
>>             break;
>>         }
>>     }
>>
>>     wait(NULL);
>>     wait(NULL);
>>     wait(NULL);
>>
>>     exit(EXIT_SUCCESS);
>> }
>>