* [PATCH net-next] epoll: add EPOLLEXCLUSIVE support
@ 2012-02-14 20:48 Hagen Paul Pfeifer
  2012-02-14 21:06 ` Eric Dumazet
  2012-02-14 21:23 ` David Miller
  0 siblings, 2 replies; 4+ messages in thread
From: Hagen Paul Pfeifer @ 2012-02-14 20:48 UTC
  To: linux-kernel; +Cc: netdev, Hagen Paul Pfeifer, Davide Libenzi, Eric Dumazet

High-performance servers sometimes create one listening socket (e.g. port
80), create an epoll file descriptor, and add the socket to it. Afterwards
they create SC_NPROCESSORS_ONLN threads and wait for events. This often
results in a thundering herd problem because all CPUs are scheduled.

This patch adds an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE.
If a descriptor is added with this flag, only one CPU is scheduled in.

Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Reported-by: Li Yu <raise.sail@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---
 fs/eventpoll.c            |    7 +++++--
 include/linux/eventpoll.h |    3 +++
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index aabdfc3..bb442b1 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -88,7 +88,7 @@
  */
 
 /* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
 
 /* Maximum number of nesting allowed inside epoll sets */
 #define EP_MAX_NESTS 4
@@ -913,7 +913,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 		init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
 		pwq->whead = whead;
 		pwq->base = epi;
-		add_wait_queue(whead, &pwq->wait);
+		if (unlikely(epi->event.events & EPOLLEXCLUSIVE))
+			add_wait_queue_exclusive(whead, &pwq->wait);
+		else
+			add_wait_queue(whead, &pwq->wait);
 		list_add_tail(&pwq->llink, &epi->pwqlist);
 		epi->nwait++;
 	} else {
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..d334389 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -26,6 +26,9 @@
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
 
+/* Set Exclusive wake up behaviour for the target file descriptor */
+#define EPOLLEXCLUSIVE (1 << 29)
+
 /* Set the One Shot behaviour for the target file descriptor */
 #define EPOLLONESHOT (1 << 30)
 
-- 
1.7.9
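
The fs/eventpoll.c hunk above switches between add_wait_queue() and
add_wait_queue_exclusive(). As a rough illustration of why that matters,
here is a simplified userspace model of the wake-up walk. It is not kernel
code: struct waiter, WQ_EXCLUSIVE and wake_up_model() are hypothetical
names invented for this sketch. It assumes the usual rule that a wake-up
walks the queue in order, wakes every entry it visits, and stops once
nr_exclusive exclusive entries have been woken.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's exclusive-waiter flag. */
#define WQ_EXCLUSIVE 0x01

struct waiter {
	int flags;	/* 0 or WQ_EXCLUSIVE */
	int woken;	/* set to 1 once the waiter has been woken */
};

/*
 * Walk the queue in order and wake each waiter. Stop after nr_exclusive
 * exclusive waiters have been woken. Returns the number of waiters woken.
 * With nr_exclusive == 0 the walk never stops early (wake everybody).
 */
int wake_up_model(struct waiter *q, size_t n, int nr_exclusive)
{
	int woken = 0;

	for (size_t i = 0; i < n; i++) {
		q[i].woken = 1;
		woken++;
		if ((q[i].flags & WQ_EXCLUSIVE) && --nr_exclusive == 0)
			break;
	}
	return woken;
}
```

In this model, four non-exclusive waiters are all woken by a single event,
while four exclusive waiters result in exactly one wake-up per event.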


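For readers who want to see the userspace side, the pattern from the commit
message (one listening socket, one epoll descriptor, one worker per online
CPU) might look roughly like the sketch below. The helper names
(make_exclusive_event, epoll_add_exclusive, worker_count, worker_loop) are
hypothetical, and the fallback value of EPOLLEXCLUSIVE is simply the one
proposed in this patch, so treat it as an assumption on any kernel that
does not carry the patch. The listening socket is assumed to be
non-blocking so that a worker which loses the race in accept() does not
block.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallback for headers without the flag: value as proposed in this patch
 * (an assumption elsewhere). */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 29)
#endif

/* Build the event description: readable events, exclusive wake-up. */
struct epoll_event make_exclusive_event(int fd)
{
	struct epoll_event ev = { 0 };

	ev.events = EPOLLIN | EPOLLEXCLUSIVE;
	ev.data.fd = fd;
	return ev;
}

/* Add listen_fd to the epoll set epfd; returns 0 or -1 like epoll_ctl(2). */
int epoll_add_exclusive(int epfd, int listen_fd)
{
	struct epoll_event ev = make_exclusive_event(listen_fd);

	return epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);
}

/* Number of workers to start: one per online CPU, at least one. */
long worker_count(void)
{
	long n = sysconf(_SC_NPROCESSORS_ONLN);

	return n > 0 ? n : 1;
}

/* Body of each worker thread: wait for an event, accept one connection. */
void worker_loop(int epfd)
{
	struct epoll_event ev;

	for (;;) {
		int n = epoll_wait(epfd, &ev, 1, -1);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return;
		}
		if (n == 1 && (ev.events & EPOLLIN)) {
			int conn = accept(ev.data.fd, NULL, NULL);

			if (conn >= 0)
				close(conn);	/* a real server would serve it */
		}
	}
}
```

A server would call epoll_add_exclusive() once after listen(), then start
worker_count() threads that each run worker_loop() on the shared epoll
descriptor.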

* Re: [PATCH net-next] epoll: add EPOLLEXCLUSIVE support
  2012-02-14 20:48 [PATCH net-next] epoll: add EPOLLEXCLUSIVE support Hagen Paul Pfeifer
@ 2012-02-14 21:06 ` Eric Dumazet
  2012-02-14 21:38   ` Hagen Paul Pfeifer
  2012-02-14 21:23 ` David Miller
  1 sibling, 1 reply; 4+ messages in thread
From: Eric Dumazet @ 2012-02-14 21:06 UTC
  To: Hagen Paul Pfeifer; +Cc: linux-kernel, netdev, Davide Libenzi

On Tuesday, 14 February 2012 at 21:48 +0100, Hagen Paul Pfeifer wrote:
> High-performance servers sometimes create one listening socket (e.g. port
> 80), create an epoll file descriptor, and add the socket to it. Afterwards
> they create SC_NPROCESSORS_ONLN threads and wait for events. This often
> results in a thundering herd problem because all CPUs are scheduled.
> 
> This patch adds an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE.
> If a descriptor is added with this flag, only one CPU is scheduled in.
> 
> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
> Reported-by: Li Yu <raise.sail@gmail.com>
> Cc: Davide Libenzi <davidel@xmailserver.org>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> ---

Seems pretty good to me.

Do you have some performance numbers to share?






* Re: [PATCH net-next] epoll: add EPOLLEXCLUSIVE support
  2012-02-14 20:48 [PATCH net-next] epoll: add EPOLLEXCLUSIVE support Hagen Paul Pfeifer
  2012-02-14 21:06 ` Eric Dumazet
@ 2012-02-14 21:23 ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: David Miller @ 2012-02-14 21:23 UTC
  To: hagen; +Cc: linux-kernel, netdev, davidel, eric.dumazet

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Tue, 14 Feb 2012 21:48:04 +0100

> High-performance servers sometimes create one listening socket (e.g. port
> 80), create an epoll file descriptor, and add the socket to it. Afterwards
> they create SC_NPROCESSORS_ONLN threads and wait for events. This often
> results in a thundering herd problem because all CPUs are scheduled.
> 
> This patch adds an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE.
> If a descriptor is added with this flag, only one CPU is scheduled in.
> 
> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
> Reported-by: Li Yu <raise.sail@gmail.com>

This is not a networking specific change and therefore should not
be submitted via my tree(s).


* Re: [PATCH net-next] epoll: add EPOLLEXCLUSIVE support
  2012-02-14 21:06 ` Eric Dumazet
@ 2012-02-14 21:38   ` Hagen Paul Pfeifer
  0 siblings, 0 replies; 4+ messages in thread
From: Hagen Paul Pfeifer @ 2012-02-14 21:38 UTC
  To: Eric Dumazet; +Cc: linux-kernel, netdev, Davide Libenzi

* Eric Dumazet | 2012-02-14 22:06:15 [+0100]:

>Seems pretty good to me.
>
>Do you have some performance numbers to share?

No, but I did some tests with one of my network performance tools. I imagine
that I can *construct* test cases and add some 'perf stat cs:u' statistics.
IMHO it is not fair to present artificially tuned performance numbers.
There are use cases where EPOLLEXCLUSIVE can be really helpful; yes, I think
that this flag SHOULD be a userspace default. ;-)

Hagen
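
As a possible starting point for such a test case, context switches can also
be counted from inside the test program rather than with perf. The sketch
below uses getrusage(2) as a stand-in for the perf statistics mentioned
above; it is not the measurement used in this thread, and struct cs_count,
cs_read() and cs_total_delta() are hypothetical helper names.

```c
#include <sys/resource.h>

/* Context-switch counters for the whole process (all threads). */
struct cs_count {
	long voluntary;		/* ru_nvcsw: thread gave up the CPU, e.g. blocked */
	long involuntary;	/* ru_nivcsw: thread was preempted */
};

/* Snapshot the current counters; returns 0 on success, -1 on error. */
int cs_read(struct cs_count *out)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	out->voluntary = ru.ru_nvcsw;
	out->involuntary = ru.ru_nivcsw;
	return 0;
}

/* Total context switches that happened between two snapshots. */
long cs_total_delta(const struct cs_count *before, const struct cs_count *after)
{
	return (after->voluntary - before->voluntary) +
	       (after->involuntary - before->involuntary);
}
```

Taking one snapshot before the load run and one after, with and without the
flag, would give a comparable per-run count. Whether such a constructed
benchmark is representative is exactly the concern raised above.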
