* [PATCH Resend] epoll: add EPOLLEXCLUSIVE support @ 2012-03-28 13:57 Hagen Paul Pfeifer 2012-03-28 14:09 ` richard -rw- weinberger 0 siblings, 1 reply; 12+ messages in thread From: Hagen Paul Pfeifer @ 2012-03-28 13:57 UTC (permalink / raw) To: torvalds; +Cc: linux-kernel, Hagen Paul Pfeifer High performance server sometimes create one listening socket (e.g. port 80), create a epoll file descriptor and add the socket. Afterwards create SC_NPROCESSORS_ONLN threads and wait for events. This often result in a thundering herd problem because all CPUs are scheduled. This patch add an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE. If a descriptor is added with this flag only one CPU is scheduled in. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> --- Dave rejected the patch and said not network specific. Because there is no epoll maintainer this time directly. fs/eventpoll.c | 7 +++++-- include/linux/eventpoll.h | 3 +++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 629e9ed..16d787f 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -88,7 +88,7 @@ */ /* Epoll private bits inside the event mask */ -#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET) +#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE) /* Maximum number of nesting allowed inside epoll sets */ #define EP_MAX_NESTS 4 @@ -969,7 +969,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead, init_waitqueue_func_entry(&pwq->wait, ep_poll_callback); pwq->whead = whead; pwq->base = epi; - add_wait_queue(whead, &pwq->wait); + if (unlikely(epi->event.events & EPOLLEXCLUSIVE)) + add_wait_queue_exclusive(whead, &pwq->wait); + else + add_wait_queue(whead, &pwq->wait); list_add_tail(&pwq->llink, &epi->pwqlist); epi->nwait++; } else { diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h index 657ab55..d334389 100644 --- a/include/linux/eventpoll.h +++ b/include/linux/eventpoll.h @@ -26,6 +26,9 @@ #define EPOLL_CTL_DEL 2 #define EPOLL_CTL_MOD 3 +/* Set Exclusive wake up behaviour for the target file descriptor */ +#define EPOLLEXCLUSIVE (1 << 29) + /* Set the One Shot behaviour for the target file descriptor */ #define EPOLLONESHOT (1 << 30) -- 1.7.9.1 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support 2012-03-28 13:57 [PATCH Resend] epoll: add EPOLLEXCLUSIVE support Hagen Paul Pfeifer @ 2012-03-28 14:09 ` richard -rw- weinberger 2012-03-28 16:21 ` Jason Baron 0 siblings, 1 reply; 12+ messages in thread From: richard -rw- weinberger @ 2012-03-28 14:09 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, jbaron, linux-fsdevel On Wed, Mar 28, 2012 at 3:57 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote: > High performance server sometimes create one listening socket (e.g. port > 80), create a epoll file descriptor and add the socket. Afterwards > create SC_NPROCESSORS_ONLN threads and wait for events. This often > result in a thundering herd problem because all CPUs are scheduled. > > This patch add an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE. > If a descriptor is added with this flag only one CPU is scheduled in. > > Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> > --- > Dave rejected the patch and said not network specific. Because there > is no epoll maintainer this time directly. CC'ing maintainers for you... Please use scripts/get_maintainer.pl. -- Thanks, //richard ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support 2012-03-28 14:09 ` richard -rw- weinberger @ 2012-03-28 16:21 ` Jason Baron 2012-03-28 19:58 ` Hagen Paul Pfeifer 0 siblings, 1 reply; 12+ messages in thread From: Jason Baron @ 2012-03-28 16:21 UTC (permalink / raw) To: richard -rw- weinberger Cc: Hagen Paul Pfeifer, torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, linux-fsdevel, eric.dumazet On Wed, Mar 28, 2012 at 04:09:24PM +0200, richard -rw- weinberger wrote: > On Wed, Mar 28, 2012 at 3:57 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote: > > High performance server sometimes create one listening socket (e.g. port > > 80), create a epoll file descriptor and add the socket. Afterwards > > create SC_NPROCESSORS_ONLN threads and wait for events. This often > > result in a thundering herd problem because all CPUs are scheduled. > > > > This patch add an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE. > > If a descriptor is added with this flag only one CPU is scheduled in. > > > > Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> > > --- > > Dave rejected the patch and said not network specific. Because there > > is no epoll maintainer this time directly. > > CC'ing maintainers for you... > Please use scripts/get_maintainer.pl. > Hmmm...Looking at ep_poll() it does an '__add_wait_queue_exclusive()'. So, I *think* epoll_wait() should do what you want, if you are waiting on the same epfd in all the threads. I think the case you are describing is where each thread does its own ep_create(), and then a subsequent epoll_wait() on the fd from the create? So, I *think* you can get what you want without adding this flag. Thanks, -Jason ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support 2012-03-28 16:21 ` Jason Baron @ 2012-03-28 19:58 ` Hagen Paul Pfeifer 2012-03-29 14:16 ` Jason Baron 2012-03-29 14:51 ` Hagen Paul Pfeifer 0 siblings, 2 replies; 12+ messages in thread From: Hagen Paul Pfeifer @ 2012-03-28 19:58 UTC (permalink / raw) To: Jason Baron Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, linux-fsdevel, eric.dumazet * Jason Baron | 2012-03-28 12:21:08 [-0400]: >Hmmm...Looking at ep_poll() it does an '__add_wait_queue_exclusive()'. >So, I *think* epoll_wait() should do what you want, if you are waiting >on the same epfd in all the threads. > >I think the case you are describing is where each thread does its own >ep_create(), and then a subsequent epoll_wait() on the fd from the >create? > >So, I *think* you can get what you want without adding this flag. ;) sorry: epoll_wait returned epoll_wait returned epoll_wait returned epoll_wait returned epoll_wait returned epoll_wait returned epoll_wait returned epoll_wait returned epoll_wait returned epoll_wait returned minimal example: >>>>>>>>>>> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <pthread.h> #include <sys/epoll.h> #define AMAX 16 static void *runner(void *args) { int fd = (int) *((int *) args); struct epoll_event events[AMAX]; epoll_wait(fd, events, AMAX, -1); write(1, "epoll_wait returned\n", 20); return NULL; } int main(int ac, char **av) { int i, evfd, pipefd[2]; pthread_t thread_id[2]; struct epoll_event epoll_ev; pipe(pipefd); evfd = epoll_create(64); memset(&epoll_ev, 0, sizeof(struct epoll_event)); epoll_ev.events = EPOLLIN | EPOLLPRI | EPOLLERR | EPOLLHUP; epoll_ctl(evfd, EPOLL_CTL_ADD, pipefd[0], &epoll_ev); for (i = 0; i < 10; i++) pthread_create(&thread_id[0], NULL, runner, &evfd); sleep(1); close(pipefd[1]); write(pipefd[0], "x", 1); sleep(1); return EXIT_SUCCESS; } <<<<<<<<<<< Cheers, Hagen ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support 2012-03-28 19:58 ` Hagen Paul Pfeifer @ 2012-03-29 14:16 ` Jason Baron 2012-03-29 15:05 ` Hagen Paul Pfeifer 2012-03-29 14:51 ` Hagen Paul Pfeifer 1 sibling, 1 reply; 12+ messages in thread From: Jason Baron @ 2012-03-29 14:16 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, linux-fsdevel, eric.dumazet On Wed, Mar 28, 2012 at 09:58:48PM +0200, Hagen Paul Pfeifer wrote: > > >Hmmm...Looking at ep_poll() it does an '__add_wait_queue_exclusive()'. > >So, I *think* epoll_wait() should do what you want, if you are waiting > >on the same epfd in all the threads. > > > >I think the case you are describing is where each thread does its own > >ep_create(), and then a subsequent epoll_wait() on the fd from the > >create? > > > >So, I *think* you can get what you want without adding this flag. > > ;) sorry: > > epoll_wait returned > epoll_wait returned > epoll_wait returned > epoll_wait returned > epoll_wait returned > epoll_wait returned > epoll_wait returned > epoll_wait returned > epoll_wait returned > epoll_wait returned > > > minimal example: > > >>>>>>>>>>> > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > #include <unistd.h> > #include <pthread.h> > #include <sys/epoll.h> > > #define AMAX 16 > > static void *runner(void *args) > { > int fd = (int) *((int *) args); > struct epoll_event events[AMAX]; > > epoll_wait(fd, events, AMAX, -1); > write(1, "epoll_wait returned\n", 20); > > return NULL; > } > > int main(int ac, char **av) > { > int i, evfd, pipefd[2]; > pthread_t thread_id[2]; > struct epoll_event epoll_ev; > > pipe(pipefd); > evfd = epoll_create(64); > > memset(&epoll_ev, 0, sizeof(struct epoll_event)); > epoll_ev.events = EPOLLIN | EPOLLPRI | EPOLLERR | EPOLLHUP; > epoll_ctl(evfd, EPOLL_CTL_ADD, pipefd[0], &epoll_ev); > > for (i = 0; i < 10; i++) > pthread_create(&thread_id[0], NULL, runner, &evfd); > > sleep(1); > close(pipefd[1]); > write(pipefd[0], "x", 1); > sleep(1); > > return EXIT_SUCCESS; > } Right, for level triggered events, they all wait up. However, if you use edge triggered, ie add 'EPOLLET', then the event gets 'consumed' by the first thread that wakes up, and the subseqent waiters wouldn't get woken up. IE you'll get one wakeup. Thanks, -Jason ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support 2012-03-29 14:16 ` Jason Baron @ 2012-03-29 15:05 ` Hagen Paul Pfeifer 2012-03-29 15:53 ` Jason Baron 2012-04-05 22:30 ` Andy Lutomirski 0 siblings, 2 replies; 12+ messages in thread From: Hagen Paul Pfeifer @ 2012-03-29 15:05 UTC (permalink / raw) To: Jason Baron Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, linux-fsdevel, eric.dumazet * Jason Baron | 2012-03-29 10:16:53 [-0400]: >Right, for level triggered events, they all wait up. However, if you use >edge triggered, ie add 'EPOLLET', then the event gets 'consumed' by the >first thread that wakes up, and the subseqent waiters wouldn't get woken >up. IE you'll get one wakeup. I addressed level triggered, right - it match the model. But I don't wanted to wake up every every thread anyway. I don't want to abandon level triggered functioning. Any objective against this flag? Hagen ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support 2012-03-29 15:05 ` Hagen Paul Pfeifer @ 2012-03-29 15:53 ` Jason Baron 2012-03-29 16:32 ` Hagen Paul Pfeifer 2012-04-05 22:30 ` Andy Lutomirski 1 sibling, 1 reply; 12+ messages in thread From: Jason Baron @ 2012-03-29 15:53 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, linux-fsdevel, eric.dumazet On Thu, Mar 29, 2012 at 05:05:41PM +0200, Hagen Paul Pfeifer wrote: > * Jason Baron | 2012-03-29 10:16:53 [-0400]: > > >Right, for level triggered events, they all wait up. However, if you use > >edge triggered, ie add 'EPOLLET', then the event gets 'consumed' by the > >first thread that wakes up, and the subseqent waiters wouldn't get woken > >up. IE you'll get one wakeup. > > I addressed level triggered, right - it match the model. But I don't wanted to > wake up every every thread anyway. I don't want to abandon level triggered > functioning. > > Any objective against this flag? > I was trying to better understand the use-case, since at least for the test case you posted, 'EPOLLET', already does what you want. Also, the 'EPOLLEXCLUSIVE' flag in your patch addresses multiple threads blocking on *different* epoll fds. However, if multiple threads are blocked on a single epoll fd, they will all be woken even if 'EPOLLEXCLUSIVE' is set. Shouldn't 'EPOLLEXCLUSIVE' affect that case too? Thanks, -Jason ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support 2012-03-29 15:53 ` Jason Baron @ 2012-03-29 16:32 ` Hagen Paul Pfeifer 2012-03-29 18:54 ` Jason Baron 0 siblings, 1 reply; 12+ messages in thread From: Hagen Paul Pfeifer @ 2012-03-29 16:32 UTC (permalink / raw) To: Jason Baron Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, linux-fsdevel, eric.dumazet * Jason Baron | 2012-03-29 11:53:24 [-0400]: >I was trying to better understand the use-case, since at least for the >test case you posted, 'EPOLLET', already does what you want. > >Also, the 'EPOLLEXCLUSIVE' flag in your patch addresses multiple threads >blocking on *different* epoll fds. However, if multiple threads are >blocked on a single epoll fd, they will all be woken even if 'EPOLLEXCLUSIVE' >is set. Shouldn't 'EPOLLEXCLUSIVE' affect that case too? Hey Jason, I just wanted to address the "main use-case" (as implemented in a bunch of network server): one listen socket (say 80) is created and a epoll fd is created. The listen socket is added to the set and n threads are created afterwards. So now you have the situation that one listening socket is added to the set and all threads are awoken if a new client connects. This patch reduce the useless-all-thread-awoken-overhead by awake only one thread. Hagen ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support 2012-03-29 16:32 ` Hagen Paul Pfeifer @ 2012-03-29 18:54 ` Jason Baron 2012-03-29 21:19 ` Hagen Paul Pfeifer 0 siblings, 1 reply; 12+ messages in thread From: Jason Baron @ 2012-03-29 18:54 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, linux-fsdevel, eric.dumazet On Thu, Mar 29, 2012 at 06:32:22PM +0200, Hagen Paul Pfeifer wrote: > * Jason Baron | 2012-03-29 11:53:24 [-0400]: > > >I was trying to better understand the use-case, since at least for the > >test case you posted, 'EPOLLET', already does what you want. > > > >Also, the 'EPOLLEXCLUSIVE' flag in your patch addresses multiple threads > >blocking on *different* epoll fds. However, if multiple threads are > >blocked on a single epoll fd, they will all be woken even if 'EPOLLEXCLUSIVE' > >is set. Shouldn't 'EPOLLEXCLUSIVE' affect that case too? > > Hey Jason, > > I just wanted to address the "main use-case" (as implemented in a bunch of > network server): one listen socket (say 80) is created and a epoll fd is > created. The listen socket is added to the set and n threads are created > afterwards. So now you have the situation that one listening socket is added > to the set and all threads are awoken if a new client connects. This patch > reduce the useless-all-thread-awoken-overhead by awake only one thread. > > Hagen Hi, But the behavior of the testcase you've supplied is not changed by the 'EPOLLEXCLUSIVE' support. So is this not the right testcase? Thanks, -Jason ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support 2012-03-29 18:54 ` Jason Baron @ 2012-03-29 21:19 ` Hagen Paul Pfeifer 0 siblings, 0 replies; 12+ messages in thread From: Hagen Paul Pfeifer @ 2012-03-29 21:19 UTC (permalink / raw) To: Jason Baron Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, linux-fsdevel, eric.dumazet * Jason Baron | 2012-03-29 14:54:17 [-0400]: >But the behavior of the testcase you've supplied is not changed by the >'EPOLLEXCLUSIVE' support. So is this not the right testcase? Hi Jason, I am currently abroad, the example yesterday was just a quick hack somewhere in between hotel and meeting to demonstrate the behaviour. ;) I travel with a company laptop, my testcode is on another PC, I have access to my git tree and just wanted to get this patch merged in this window. Hagen ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support 2012-03-29 15:05 ` Hagen Paul Pfeifer 2012-03-29 15:53 ` Jason Baron @ 2012-04-05 22:30 ` Andy Lutomirski 1 sibling, 0 replies; 12+ messages in thread From: Andy Lutomirski @ 2012-04-05 22:30 UTC (permalink / raw) To: Hagen Paul Pfeifer Cc: Jason Baron, richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, linux-fsdevel, eric.dumazet On 03/29/2012 08:05 AM, Hagen Paul Pfeifer wrote: > * Jason Baron | 2012-03-29 10:16:53 [-0400]: > >> Right, for level triggered events, they all wait up. However, if you use >> edge triggered, ie add 'EPOLLET', then the event gets 'consumed' by the >> first thread that wakes up, and the subseqent waiters wouldn't get woken >> up. IE you'll get one wakeup. > > I addressed level triggered, right - it match the model. But I don't wanted to > wake up every every thread anyway. I don't want to abandon level triggered > functioning. I think that what you want is a mode in which, once epoll_wait returns, the bits it returns get cleared from the event mask. Am I right? (To get this right, you'd need another flag that disables the automatic addition of POLLERR and POLLHUP.) If you do that and add an extra syscall that simultaneously does a bunch of epoll_ctls, a timerfd_settime, and an epoll_wait, I'll be extra happy. --Andy ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support 2012-03-28 19:58 ` Hagen Paul Pfeifer 2012-03-29 14:16 ` Jason Baron @ 2012-03-29 14:51 ` Hagen Paul Pfeifer 1 sibling, 0 replies; 12+ messages in thread From: Hagen Paul Pfeifer @ 2012-03-29 14:51 UTC (permalink / raw) To: Jason Baron Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, linux-fsdevel, eric.dumazet Ping? Any signed-off-by for this patch now? Andrew? ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-04-05 22:36 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-03-28 13:57 [PATCH Resend] epoll: add EPOLLEXCLUSIVE support Hagen Paul Pfeifer 2012-03-28 14:09 ` richard -rw- weinberger 2012-03-28 16:21 ` Jason Baron 2012-03-28 19:58 ` Hagen Paul Pfeifer 2012-03-29 14:16 ` Jason Baron 2012-03-29 15:05 ` Hagen Paul Pfeifer 2012-03-29 15:53 ` Jason Baron 2012-03-29 16:32 ` Hagen Paul Pfeifer 2012-03-29 18:54 ` Jason Baron 2012-03-29 21:19 ` Hagen Paul Pfeifer 2012-04-05 22:30 ` Andy Lutomirski 2012-03-29 14:51 ` Hagen Paul Pfeifer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).