LKML Archive mirror
 help / color / mirror / Atom feed
* [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
@ 2012-03-28 13:57 Hagen Paul Pfeifer
  2012-03-28 14:09 ` richard -rw- weinberger
  0 siblings, 1 reply; 12+ messages in thread
From: Hagen Paul Pfeifer @ 2012-03-28 13:57 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel, Hagen Paul Pfeifer

High performance server sometimes create one listening socket (e.g. port
80), create a epoll file descriptor and add the socket. Afterwards
create SC_NPROCESSORS_ONLN threads and wait for events. This often
result in a thundering herd problem because all CPUs are scheduled.

This patch add an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE.
If a descriptor is added with this flag only one CPU is scheduled in.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
---
Dave rejected the patch and said not network specific. Because there
is no epoll maintainer this time directly.

 fs/eventpoll.c            |    7 +++++--
 include/linux/eventpoll.h |    3 +++
 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 629e9ed..16d787f 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -88,7 +88,7 @@
  */
 
 /* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
 
 /* Maximum number of nesting allowed inside epoll sets */
 #define EP_MAX_NESTS 4
@@ -969,7 +969,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
 		pwq->whead = whead;
 		pwq->base = epi;
-		add_wait_queue(whead, &pwq->wait);
+		if (unlikely(epi->event.events & EPOLLEXCLUSIVE))
+			add_wait_queue_exclusive(whead, &pwq->wait);
+		else
+			add_wait_queue(whead, &pwq->wait);
 		list_add_tail(&pwq->llink, &epi->pwqlist);
 		epi->nwait++;
 	} else {
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..d334389 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -26,6 +26,9 @@
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
 
+/* Set Exclusive wake up behaviour for the target file descriptor */
+#define EPOLLEXCLUSIVE (1 << 29)
+
 /* Set the One Shot behaviour for the target file descriptor */
 #define EPOLLONESHOT (1 << 30)
 
-- 
1.7.9.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
  2012-03-28 13:57 [PATCH Resend] epoll: add EPOLLEXCLUSIVE support Hagen Paul Pfeifer
@ 2012-03-28 14:09 ` richard -rw- weinberger
  2012-03-28 16:21   ` Jason Baron
  0 siblings, 1 reply; 12+ messages in thread
From: richard -rw- weinberger @ 2012-03-28 14:09 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: torvalds, LKML, Al Viro, Lucas De Marchi, Andrew Morton, jbaron,
	linux-fsdevel

On Wed, Mar 28, 2012 at 3:57 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> High performance server sometimes create one listening socket (e.g. port
> 80), create a epoll file descriptor and add the socket. Afterwards
> create SC_NPROCESSORS_ONLN threads and wait for events. This often
> result in a thundering herd problem because all CPUs are scheduled.
>
> This patch add an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE.
> If a descriptor is added with this flag only one CPU is scheduled in.
>
> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
> ---
> Dave rejected the patch and said not network specific. Because there
> is no epoll maintainer this time directly.

CC'ing maintainers for you...
Please use scripts/get_maintainer.pl.

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
  2012-03-28 14:09 ` richard -rw- weinberger
@ 2012-03-28 16:21   ` Jason Baron
  2012-03-28 19:58     ` Hagen Paul Pfeifer
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Baron @ 2012-03-28 16:21 UTC (permalink / raw)
  To: richard -rw- weinberger
  Cc: Hagen Paul Pfeifer, torvalds, LKML, Al Viro, Lucas De Marchi,
	Andrew Morton, linux-fsdevel, eric.dumazet

On Wed, Mar 28, 2012 at 04:09:24PM +0200, richard -rw- weinberger wrote:
> On Wed, Mar 28, 2012 at 3:57 PM, Hagen Paul Pfeifer <hagen@jauu.net> wrote:
> > High performance server sometimes create one listening socket (e.g. port
> > 80), create a epoll file descriptor and add the socket. Afterwards
> > create SC_NPROCESSORS_ONLN threads and wait for events. This often
> > result in a thundering herd problem because all CPUs are scheduled.
> >
> > This patch add an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE.
> > If a descriptor is added with this flag only one CPU is scheduled in.
> >
> > Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
> > ---
> > Dave rejected the patch and said not network specific. Because there
> > is no epoll maintainer this time directly.
> 
> CC'ing maintainers for you...
> Please use scripts/get_maintainer.pl.
> 

Hmmm...Looking at ep_poll() it does an '__add_wait_queue_exclusive()'.
So, I *think* epoll_wait() should do what you want, if you are waiting
on the same epfd in all the threads.

I think the case you are describing is where each thread does its own
ep_create(), and then a subsequent epoll_wait() on the fd from the
create?

So, I *think* you can get what you want without adding this flag.

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
  2012-03-28 16:21   ` Jason Baron
@ 2012-03-28 19:58     ` Hagen Paul Pfeifer
  2012-03-29 14:16       ` Jason Baron
  2012-03-29 14:51       ` Hagen Paul Pfeifer
  0 siblings, 2 replies; 12+ messages in thread
From: Hagen Paul Pfeifer @ 2012-03-28 19:58 UTC (permalink / raw)
  To: Jason Baron
  Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi,
	Andrew Morton, linux-fsdevel, eric.dumazet

* Jason Baron | 2012-03-28 12:21:08 [-0400]:

>Hmmm...Looking at ep_poll() it does an '__add_wait_queue_exclusive()'.
>So, I *think* epoll_wait() should do what you want, if you are waiting
>on the same epfd in all the threads.
>
>I think the case you are describing is where each thread does its own
>ep_create(), and then a subsequent epoll_wait() on the fd from the
>create?
>
>So, I *think* you can get what you want without adding this flag.

;) sorry:

epoll_wait returned
epoll_wait returned
epoll_wait returned
epoll_wait returned
epoll_wait returned
epoll_wait returned
epoll_wait returned
epoll_wait returned
epoll_wait returned
epoll_wait returned


minimal example:

>>>>>>>>>>>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/epoll.h>

#define	AMAX 16

static void *runner(void *args)
{
	int fd = (int) *((int *) args);
	struct epoll_event events[AMAX];

	epoll_wait(fd, events, AMAX, -1);
	write(1, "epoll_wait returned\n", 20);

	return NULL;
}

int main(int ac, char **av)
{
	int i, evfd, pipefd[2];
	pthread_t thread_id[2];
	struct epoll_event epoll_ev;

	pipe(pipefd);
	evfd = epoll_create(64);

	memset(&epoll_ev, 0, sizeof(struct epoll_event));
	epoll_ev.events   = EPOLLIN | EPOLLPRI | EPOLLERR | EPOLLHUP;
	epoll_ctl(evfd, EPOLL_CTL_ADD, pipefd[0], &epoll_ev);

	for (i = 0; i < 10; i++)
		pthread_create(&thread_id[0], NULL, runner, &evfd);

	sleep(1);
	close(pipefd[1]);
	write(pipefd[0], "x", 1);
	sleep(1);

	return EXIT_SUCCESS;
}
<<<<<<<<<<<


Cheers, Hagen

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
  2012-03-28 19:58     ` Hagen Paul Pfeifer
@ 2012-03-29 14:16       ` Jason Baron
  2012-03-29 15:05         ` Hagen Paul Pfeifer
  2012-03-29 14:51       ` Hagen Paul Pfeifer
  1 sibling, 1 reply; 12+ messages in thread
From: Jason Baron @ 2012-03-29 14:16 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi,
	Andrew Morton, linux-fsdevel, eric.dumazet

On Wed, Mar 28, 2012 at 09:58:48PM +0200, Hagen Paul Pfeifer wrote:
> 
> >Hmmm...Looking at ep_poll() it does an '__add_wait_queue_exclusive()'.
> >So, I *think* epoll_wait() should do what you want, if you are waiting
> >on the same epfd in all the threads.
> >
> >I think the case you are describing is where each thread does its own
> >ep_create(), and then a subsequent epoll_wait() on the fd from the
> >create?
> >
> >So, I *think* you can get what you want without adding this flag.
> 
> ;) sorry:
> 
> epoll_wait returned
> epoll_wait returned
> epoll_wait returned
> epoll_wait returned
> epoll_wait returned
> epoll_wait returned
> epoll_wait returned
> epoll_wait returned
> epoll_wait returned
> epoll_wait returned
> 
> 
> minimal example:
> 
> >>>>>>>>>>>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
> #include <pthread.h>
> #include <sys/epoll.h>
> 
> #define	AMAX 16
> 
> static void *runner(void *args)
> {
> 	int fd = (int) *((int *) args);
> 	struct epoll_event events[AMAX];
> 
> 	epoll_wait(fd, events, AMAX, -1);
> 	write(1, "epoll_wait returned\n", 20);
> 
> 	return NULL;
> }
> 
> int main(int ac, char **av)
> {
> 	int i, evfd, pipefd[2];
> 	pthread_t thread_id[2];
> 	struct epoll_event epoll_ev;
> 
> 	pipe(pipefd);
> 	evfd = epoll_create(64);
> 
> 	memset(&epoll_ev, 0, sizeof(struct epoll_event));
> 	epoll_ev.events   = EPOLLIN | EPOLLPRI | EPOLLERR | EPOLLHUP;
> 	epoll_ctl(evfd, EPOLL_CTL_ADD, pipefd[0], &epoll_ev);
> 
> 	for (i = 0; i < 10; i++)
> 		pthread_create(&thread_id[0], NULL, runner, &evfd);
> 
> 	sleep(1);
> 	close(pipefd[1]);
> 	write(pipefd[0], "x", 1);
> 	sleep(1);
> 
> 	return EXIT_SUCCESS;
> }

Right, for level triggered events, they all wait up. However, if you use
edge triggered, ie add 'EPOLLET', then the event gets 'consumed' by the
first thread that wakes up, and the subseqent waiters wouldn't get woken
up. IE you'll get one wakeup.

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
  2012-03-28 19:58     ` Hagen Paul Pfeifer
  2012-03-29 14:16       ` Jason Baron
@ 2012-03-29 14:51       ` Hagen Paul Pfeifer
  1 sibling, 0 replies; 12+ messages in thread
From: Hagen Paul Pfeifer @ 2012-03-29 14:51 UTC (permalink / raw)
  To: Jason Baron
  Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi,
	Andrew Morton, linux-fsdevel, eric.dumazet

Ping? Any signed-off-by for this patch now? Andrew?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
  2012-03-29 14:16       ` Jason Baron
@ 2012-03-29 15:05         ` Hagen Paul Pfeifer
  2012-03-29 15:53           ` Jason Baron
  2012-04-05 22:30           ` Andy Lutomirski
  0 siblings, 2 replies; 12+ messages in thread
From: Hagen Paul Pfeifer @ 2012-03-29 15:05 UTC (permalink / raw)
  To: Jason Baron
  Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi,
	Andrew Morton, linux-fsdevel, eric.dumazet

* Jason Baron | 2012-03-29 10:16:53 [-0400]:

>Right, for level triggered events, they all wait up. However, if you use
>edge triggered, ie add 'EPOLLET', then the event gets 'consumed' by the
>first thread that wakes up, and the subseqent waiters wouldn't get woken
>up. IE you'll get one wakeup.

I addressed level triggered, right - it match the model. But I don't wanted to
wake up every every thread anyway. I don't want to abandon level triggered
functioning.

Any objective against this flag?

Hagen

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
  2012-03-29 15:05         ` Hagen Paul Pfeifer
@ 2012-03-29 15:53           ` Jason Baron
  2012-03-29 16:32             ` Hagen Paul Pfeifer
  2012-04-05 22:30           ` Andy Lutomirski
  1 sibling, 1 reply; 12+ messages in thread
From: Jason Baron @ 2012-03-29 15:53 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi,
	Andrew Morton, linux-fsdevel, eric.dumazet

On Thu, Mar 29, 2012 at 05:05:41PM +0200, Hagen Paul Pfeifer wrote:
> * Jason Baron | 2012-03-29 10:16:53 [-0400]:
> 
> >Right, for level triggered events, they all wait up. However, if you use
> >edge triggered, ie add 'EPOLLET', then the event gets 'consumed' by the
> >first thread that wakes up, and the subseqent waiters wouldn't get woken
> >up. IE you'll get one wakeup.
> 
> I addressed level triggered, right - it match the model. But I don't wanted to
> wake up every every thread anyway. I don't want to abandon level triggered
> functioning.
> 
> Any objective against this flag?
> 

I was trying to better understand the use-case, since at least for the
test case you posted, 'EPOLLET', already does what you want.

Also, the 'EPOLLEXCLUSIVE' flag in your patch addresses multiple threads
blocking on *different* epoll fds. However, if multiple threads are
blocked on a single epoll fd, they will all be woken even if 'EPOLLEXCLUSIVE'
is set. Shouldn't 'EPOLLEXCLUSIVE' affect that case too?

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
  2012-03-29 15:53           ` Jason Baron
@ 2012-03-29 16:32             ` Hagen Paul Pfeifer
  2012-03-29 18:54               ` Jason Baron
  0 siblings, 1 reply; 12+ messages in thread
From: Hagen Paul Pfeifer @ 2012-03-29 16:32 UTC (permalink / raw)
  To: Jason Baron
  Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi,
	Andrew Morton, linux-fsdevel, eric.dumazet

* Jason Baron | 2012-03-29 11:53:24 [-0400]:

>I was trying to better understand the use-case, since at least for the
>test case you posted, 'EPOLLET', already does what you want.
>
>Also, the 'EPOLLEXCLUSIVE' flag in your patch addresses multiple threads
>blocking on *different* epoll fds. However, if multiple threads are
>blocked on a single epoll fd, they will all be woken even if 'EPOLLEXCLUSIVE'
>is set. Shouldn't 'EPOLLEXCLUSIVE' affect that case too?

Hey Jason,

I just wanted to address the "main use-case" (as implemented in a bunch of
network server): one listen socket (say 80) is created and a epoll fd is
created. The listen socket is added to the set and n threads are created
afterwards. So now you have the situation that one listening socket is added
to the set and all threads are awoken if a new client connects. This patch
reduce the useless-all-thread-awoken-overhead by awake only one thread.

Hagen

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
  2012-03-29 16:32             ` Hagen Paul Pfeifer
@ 2012-03-29 18:54               ` Jason Baron
  2012-03-29 21:19                 ` Hagen Paul Pfeifer
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Baron @ 2012-03-29 18:54 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi,
	Andrew Morton, linux-fsdevel, eric.dumazet

On Thu, Mar 29, 2012 at 06:32:22PM +0200, Hagen Paul Pfeifer wrote:
> * Jason Baron | 2012-03-29 11:53:24 [-0400]:
> 
> >I was trying to better understand the use-case, since at least for the
> >test case you posted, 'EPOLLET', already does what you want.
> >
> >Also, the 'EPOLLEXCLUSIVE' flag in your patch addresses multiple threads
> >blocking on *different* epoll fds. However, if multiple threads are
> >blocked on a single epoll fd, they will all be woken even if 'EPOLLEXCLUSIVE'
> >is set. Shouldn't 'EPOLLEXCLUSIVE' affect that case too?
> 
> Hey Jason,
> 
> I just wanted to address the "main use-case" (as implemented in a bunch of
> network server): one listen socket (say 80) is created and a epoll fd is
> created. The listen socket is added to the set and n threads are created
> afterwards. So now you have the situation that one listening socket is added
> to the set and all threads are awoken if a new client connects. This patch
> reduce the useless-all-thread-awoken-overhead by awake only one thread.
> 
> Hagen

Hi,

But the behavior of the testcase you've supplied is not changed by the
'EPOLLEXCLUSIVE' support. So is this not the right testcase?

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
  2012-03-29 18:54               ` Jason Baron
@ 2012-03-29 21:19                 ` Hagen Paul Pfeifer
  0 siblings, 0 replies; 12+ messages in thread
From: Hagen Paul Pfeifer @ 2012-03-29 21:19 UTC (permalink / raw)
  To: Jason Baron
  Cc: richard -rw- weinberger, torvalds, LKML, Al Viro, Lucas De Marchi,
	Andrew Morton, linux-fsdevel, eric.dumazet

* Jason Baron | 2012-03-29 14:54:17 [-0400]:

>But the behavior of the testcase you've supplied is not changed by the
>'EPOLLEXCLUSIVE' support. So is this not the right testcase?

Hi Jason,

I am currently abroad, the example yesterday was just a quick hack somewhere
in between hotel and meeting to demonstrate the behaviour. ;)

I travel with a company laptop, my testcode is on another PC, I have access to
my git tree and just wanted to get this patch merged in this window.

Hagen


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH Resend] epoll: add EPOLLEXCLUSIVE support
  2012-03-29 15:05         ` Hagen Paul Pfeifer
  2012-03-29 15:53           ` Jason Baron
@ 2012-04-05 22:30           ` Andy Lutomirski
  1 sibling, 0 replies; 12+ messages in thread
From: Andy Lutomirski @ 2012-04-05 22:30 UTC (permalink / raw)
  To: Hagen Paul Pfeifer
  Cc: Jason Baron, richard -rw- weinberger, torvalds, LKML, Al Viro,
	Lucas De Marchi, Andrew Morton, linux-fsdevel, eric.dumazet

On 03/29/2012 08:05 AM, Hagen Paul Pfeifer wrote:
> * Jason Baron | 2012-03-29 10:16:53 [-0400]:
> 
>> Right, for level triggered events, they all wait up. However, if you use
>> edge triggered, ie add 'EPOLLET', then the event gets 'consumed' by the
>> first thread that wakes up, and the subseqent waiters wouldn't get woken
>> up. IE you'll get one wakeup.
> 
> I addressed level triggered, right - it match the model. But I don't wanted to
> wake up every every thread anyway. I don't want to abandon level triggered
> functioning.

I think that what you want is a mode in which, once epoll_wait returns,
the bits it returns get cleared from the event mask.  Am I right?

(To get this right, you'd need another flag that disables the automatic
addition of POLLERR and POLLHUP.)

If you do that and add an extra syscall that simultaneously does a bunch
of epoll_ctls, a timerfd_settime, and an epoll_wait, I'll be extra happy.

--Andy

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-04-05 22:36 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-28 13:57 [PATCH Resend] epoll: add EPOLLEXCLUSIVE support Hagen Paul Pfeifer
2012-03-28 14:09 ` richard -rw- weinberger
2012-03-28 16:21   ` Jason Baron
2012-03-28 19:58     ` Hagen Paul Pfeifer
2012-03-29 14:16       ` Jason Baron
2012-03-29 15:05         ` Hagen Paul Pfeifer
2012-03-29 15:53           ` Jason Baron
2012-03-29 16:32             ` Hagen Paul Pfeifer
2012-03-29 18:54               ` Jason Baron
2012-03-29 21:19                 ` Hagen Paul Pfeifer
2012-04-05 22:30           ` Andy Lutomirski
2012-03-29 14:51       ` Hagen Paul Pfeifer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).