From: Jason Baron <jbaron@akamai.com>
To: peterz@infradead.org, mingo@redhat.com, viro@zeniv.linux.org.uk
Cc: akpm@linux-foundation.org, normalperson@yhbt.net,
davidel@xmailserver.org, mtk.manpages@gmail.com,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 0/2] Add epoll round robin wakeup mode
Date: Mon, 9 Feb 2015 20:05:53 +0000 (GMT)
Message-ID: <cover.1423509605.git.jbaron@akamai.com>

Hi,

When we are sharing a wakeup source among multiple epoll fds, we end up with
thundering herd wakeups, since there is currently no way to add to the
wakeup source exclusively. This series introduces 2 new epoll flags:
EPOLLEXCLUSIVE, for adding to a wakeup source exclusively, and EPOLLROUNDROBIN,
which is to be used in conjunction with EPOLLEXCLUSIVE to evenly distribute
the wakeups. I'm showing perf results from the simple pipe() use case below,
but this patch was originally motivated by a desire to improve wakeup balance
and cpu usage for a shared listen socket.

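The difference between the wakeup modes described above can be modelled in user space. The following is only a simplified sketch, not kernel code: it assumes a fixed queue of waiters on one wakeup source, that a plain exclusive wakeup always goes to the waiter at the head of the queue, and that round robin moves the woken waiter to the tail. The names `enum wake_mode` and `simulate_wakeups` are hypothetical and do not appear in the series.

```c
#include <stddef.h>

/* Hypothetical names; this is a user-space model, not the kernel code. */
enum wake_mode {
	WAKE_ALL,         /* default: every waiter is woken for every event */
	WAKE_EXCLUSIVE,   /* assumed: only the waiter at the head is woken */
	WAKE_ROUND_ROBIN  /* assumed: one waiter is woken, then moved to the tail */
};

/*
 * Deliver nevents events to nwaiters waiters and record in counts[]
 * how often each waiter was woken. Returns the total number of wakeups.
 */
long simulate_wakeups(enum wake_mode mode, size_t nwaiters,
		      long nevents, long *counts)
{
	size_t head = 0;	/* index of the waiter at the front of the queue */
	long total = 0;

	for (size_t i = 0; i < nwaiters; i++)
		counts[i] = 0;
	if (nwaiters == 0)
		return 0;

	for (long ev = 0; ev < nevents; ev++) {
		switch (mode) {
		case WAKE_ALL:
			for (size_t i = 0; i < nwaiters; i++)
				counts[i]++;
			total += (long)nwaiters;
			break;
		case WAKE_EXCLUSIVE:
			counts[0]++;
			total++;
			break;
		case WAKE_ROUND_ROBIN:
			counts[head]++;
			head = (head + 1) % nwaiters;	/* next waiter becomes the head */
			total++;
			break;
		}
	}
	return total;
}
```

With the parameters of the test program in this mail (100 waiters, 20000 events), this model gives 2000000 wakeups for wake-all and 20000 for either exclusive mode; only the round-robin mode spreads them evenly, at 200 per waiter.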
Perf stat, 3.19.0-rc7+, 4 core, Intel(R) Xeon(R) CPU E3-1265L v3 @ 2.50GHz:

pipe test wake all:

 Performance counter stats for './wake':

      10837.480396      task-clock (msec)         #    1.879 CPUs utilized
           2047108      context-switches          #    0.189 M/sec
            214491      cpu-migrations            #    0.020 M/sec
               247      page-faults               #    0.023 K/sec
       23655687888      cycles                    #    2.183 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
       11242141621      instructions              #    0.48  insns per cycle
        2313479486      branches                  #  213.470 M/sec
          13679036      branch-misses             #    0.59% of all branches

       5.768295821 seconds time elapsed

pipe test wake balanced:

 Performance counter stats for './wake -o':

        291.250312      task-clock (msec)         #    0.094 CPUs utilized
             40308      context-switches          #    0.138 M/sec
              1448      cpu-migrations            #    0.005 M/sec
               248      page-faults               #    0.852 K/sec
         646407197      cycles                    #    2.219 GHz
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
         364256883      instructions              #    0.56  insns per cycle
          65775397      branches                  #  225.838 M/sec
            535637      branch-misses             #    0.81% of all branches

       3.086694452 seconds time elapsed

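As a rough sanity check on the counters above, the ratios between the two runs can be computed directly. The helpers below only do arithmetic on numbers already reported; `reduction_factor` and `wake_all_wakeups` are hypothetical names, and comparing the context-switch count against NUM_THREADS * NUM_EVENTS from the test program is an interpretation, not something perf reports.

```c
/* How many times larger `before` is than `after`. */
double reduction_factor(double before, double after)
{
	return before / after;
}

/* Wakeups expected if every waiter is woken for every event. */
long wake_all_wakeups(long nthreads, long nevents)
{
	return nthreads * nevents;
}
```

For example, `reduction_factor(2047108, 40308)` is about 50.8 for context switches and `reduction_factor(10837.480396, 291.250312)` is about 37.2 for task-clock. `wake_all_wakeups(100, 20000)` is 2000000, which is close to the 2047108 context switches seen in the wake-all run.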
Rough epoll manpage text:

EPOLLEXCLUSIVE
        Provides exclusive wakeups when attaching multiple epoll fds to a
        shared wakeup source. Must be specified on an EPOLL_CTL_ADD operation.

EPOLLROUNDROBIN
        Provides balancing for exclusive wakeups when attaching multiple epoll
        fds to a shared wakeup source. Must be specified with EPOLLEXCLUSIVE
        during an EPOLL_CTL_ADD operation.

Thanks,
-Jason
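
#include <unistd.h>
#include <sys/epoll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

#define NUM_THREADS 100
#define NUM_EVENTS 20000
/* Flag values proposed by this series; newer headers may already define them. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif
/* Presumably the EPOLLROUNDROBIN flag described above. */
#define EPOLLBALANCED (1 << 27)

int optimize, exclusive;
int p[2];
pthread_t threads[NUM_THREADS];
int event_count[NUM_THREADS];

/* Template event; each thread works on its own copy. */
struct epoll_event evt = {
	.events = EPOLLIN
};

void die(const char *msg) {
	perror(msg);
	exit(-1);
}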

void *run_func(void *ptr)
{
	int ret;
	int epfd;
	char buf[4];
	int id = *(int *)ptr;
	/* Per-thread copy: the global evt is shared, so don't modify it here. */
	struct epoll_event ev = evt;

	if ((epfd = epoll_create(1)) < 0)
		die("create");

	if (optimize)
		ev.events |= (EPOLLBALANCED | EPOLLEXCLUSIVE);
	else if (exclusive)
		ev.events |= EPOLLEXCLUSIVE;
	ret = epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev);
	if (ret)
		perror("epoll_ctl add error");

	while (1) {
		/* ev holds a single event, so ask for at most one. */
		ret = epoll_wait(epfd, &ev, 1, -1);
		ret = read(p[0], buf, sizeof(buf));
		if (ret == (int)sizeof(buf))
			event_count[id]++;
	}
	return NULL;
}

int main(int argc, char *argv[])
{
	int i, j;
	int id[NUM_THREADS];
	int total = 0;
	int nohit = 0;

	/* -o: exclusive + balanced wakeups, -e: exclusive only, default: wake all */
	if (argc == 2) {
		if (strcmp(argv[1], "-o") == 0)
			optimize = 1;
		if (strcmp(argv[1], "-e") == 0)
			exclusive = 1;
	}

	if (pipe(p) < 0)
		die("pipe");

	for (i = 0; i < NUM_THREADS; i++) {
		id[i] = i;
		pthread_create(&threads[i], NULL, run_func, &id[i]);
	}

	/* Each write is one event on the shared wakeup source. */
	for (j = 0; j < NUM_EVENTS; j++) {
		if (write(p[1], p, sizeof(int)) < 0)
			die("write");
		usleep(100);
	}

	for (i = 0; i < NUM_THREADS; i++) {
		pthread_cancel(threads[i]);
		pthread_join(threads[i], NULL);
		printf("joined: %d\n", i);
		printf("event count: %d\n", event_count[i]);
		total += event_count[i];
		if (!event_count[i])
			nohit++;
	}

	printf("total events is: %d\n", total);
	printf("nohit is: %d\n", nohit);
	return 0;
}

Jason Baron (2):
  sched/wait: add round robin wakeup mode
  epoll: introduce EPOLLEXCLUSIVE and EPOLLROUNDROBIN

 fs/eventpoll.c                 | 25 ++++++++++++++++++++-----
 include/linux/wait.h           | 11 +++++++++++
 include/uapi/linux/eventpoll.h |  6 ++++++
 kernel/sched/wait.c            |  5 ++++-
 4 files changed, 41 insertions(+), 6 deletions(-)

--
1.8.2.rc2