* Sustained queuing on one listener can block requests from other listeners
From: Stan Hu @ 2020-04-15 5:06 UTC
To: unicorn-public
My unicorn.rb has two listeners:
listen "127.0.0.1:8080", :tcp_nopush => false
listen "/var/run/unicorn.socket", :backlog => 1024
We found that because of the greedy attempt to accept new connections
before calling select() in
https://github.com/defunkt/unicorn/blob/981f561a726bb4307d01e4a09a308edba8d69fe3/lib/unicorn/http_server.rb#L707-L714,
listeners on another socket stall out until the first listener is
drained. We would expect Unicorn to round-robin between the two
listeners, but that doesn't happen as long as there is work to be done
for the first listener. We've verified that deleting that `redo` block
fixes the problem.
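The starvation described above can be reproduced with a toy model of the accept loop. This is a hypothetical sketch, not unicorn's code: each listener is modelled as a Ruby Array of pending client ids, the "select" step returns the non-empty arrays, and Array#shift stands in for the non-blocking accept. The `nr` bookkeeping mirrors the `unless nr == 0 ... redo` bet in worker_loop.

```ruby
# Hypothetical toy model of the pre-fix accept loop, not unicorn's code:
# a listener is an Array of pending client ids, "select" returns the
# non-empty listeners, and Array#shift stands in for a non-blocking accept.
def greedy_served(listeners, rounds, &arrive)
  served = []
  ready = listeners.reject(&:empty?)   # initial select()
  rounds.times do
    arrive.call(listeners)             # new clients queue up
    nr = 0
    ready.each do |q|
      client = q.shift                 # nil when nothing is pending
      next unless client
      served << client
      nr += 1
    end
    # The old bet: if anything was accepted, skip select() and retry only
    # the listeners that were ready last time (`unless nr == 0 ... redo`).
    ready = listeners.reject(&:empty?) if nr.zero?
  end
  served
end

tcp = [:a0]   # listener A: busy
unix = []     # listener B: idle when select() last ran
tick = 0
served = greedy_served([tcp, unix], 5) do |a, b|
  tick += 1
  a << :"a#{tick}"        # one new client on A before every pass
  b << :b1 if tick == 1   # a single client shows up on B
end
```

After five passes, `served` holds only A's clients (:a0 through :a4) while :b1 is still waiting in `unix`, because in this model select() is never re-run as long as A keeps producing work.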
What do you think about the various options?
1. Only running that redo block if there is one listener
2. Removing the redo block entirely
* Re: Sustained queuing on one listener can block requests from other listeners
From: Eric Wong @ 2020-04-15 5:26 UTC
To: Stan Hu; +Cc: unicorn-public
Stan Hu <stanhu@gmail.com> wrote:
> My unicorn.rb has two listeners:
>
> listen "127.0.0.1:8080", :tcp_nopush => false
> listen "/var/run/unicorn.socket", :backlog => 1024
FWIW, lowering :backlog may make sense if you have other
hosts/instances. More below.
> We found that because of the greedy attempt to accept new connections
> before calling select() in
> https://github.com/defunkt/unicorn/blob/981f561a726bb4307d01e4a09a308edba8d69fe3/lib/unicorn/http_server.rb#L707-L714,
> listeners on another socket stall out until the first listener is
> drained. We would expect Unicorn to round-robin between the two
> listeners, but that doesn't happen as long as there is work to be done
> for the first listener. We've verified that deleting that `redo` block
> fixes the problem.
>
> What do you think about the various options?
>
> 1. Only running that redo block if there is one listener
That seems reasonable, or if ready.size == nr_listeners
(proposed patch below)
> 2. Removing the redo block entirely
From what I recall from ages ago, select() entry cost is pretty
high, and I remember that redo helping a fair bit even in 2009
with simple apps. Syscall cost is even higher now with CPU
vulnerability mitigations, and the Ruby 1.9+ GVL release+reacquire
is also a penalty I didn't have when developing this on 1.8.
Do you have time+hardware to benchmark either approach on a
simple app? I no longer have stable/reliable hardware for
benchmarking. Thanks.
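Short of the app-level benchmark asked for here, a much cruder first check is to time the two calls the redo trades between: a zero-timeout IO.select versus a failing non-blocking accept on an idle listener. This is a sketch under stated assumptions: Ruby's stdlib `TCPServer#accept_nonblock(exception: false)` stands in for kgio_tryaccept, the listener is a loopback TCP socket, and `syscall_costs` is a made-up helper. It measures raw call overhead only, says nothing about behaviour under load, and results will vary with kernel, CPU mitigations and Ruby version.

```ruby
require 'socket'
require 'benchmark'

# Crude per-call timings against an idle loopback listener. Assumption:
# TCPServer#accept_nonblock(exception: false) stands in for kgio_tryaccept.
def syscall_costs(n = 10_000)
  srv = TCPServer.new('127.0.0.1', 0)  # port 0 lets the OS pick a free port
  select_t = Benchmark.realtime do
    n.times { IO.select([srv], nil, nil, 0) }  # zero timeout: never blocks
  end
  accept_t = Benchmark.realtime do
    n.times { srv.accept_nonblock(exception: false) }  # :wait_readable here
  end
  { select: select_t / n, accept_nonblock: accept_t / n }
ensure
  srv&.close
end

syscall_costs.each do |name, secs|
  printf("%-16s %.2f us/call\n", name, secs * 1e6)
end
```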
Totally untested patch to try approach #1
diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..69f1f60 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -686,6 +686,7 @@ def worker_loop(worker)
     trap(:USR1) { nr = -65536 }
 
     ready = readers.dup
+    nr_listeners = readers.size
     @after_worker_ready.call(self, worker)
 
     begin
@@ -698,7 +699,6 @@ def worker_loop(worker)
         # but that will return false
         if client = sock.kgio_tryaccept
           process_client(client)
-          nr += 1
           worker.tick = time_now.to_i
         end
         break if nr < 0
@@ -708,7 +708,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if ready.size == nr_listeners
         tmp = ready.dup
         redo
       end
And `nr' can probably just be a boolean `reopen' flag if we're
not overloading it as a counter.
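That remark applies to this first patch, where `nr` would no longer be a counter; the fix that was eventually committed still counts with `nr`. If the two roles were split anyway, one way it might look is sketched below. This is hypothetical: `WorkerPass`, `reopen` and `run` are illustrative names, not unicorn code; in worker_loop the flag would be a local set from `trap(:USR1)`, and listeners are again modelled as Arrays with Array#shift standing in for the non-blocking accept.

```ruby
# Hypothetical sketch, not unicorn code.
class WorkerPass
  # In worker_loop this would be a local set by trap(:USR1) { reopen = true }.
  attr_accessor :reopen

  def initialize(nr_listeners)
    @nr_listeners = nr_listeners
    @reopen = false
  end

  # Tries one non-blocking accept per ready listener. Returns true when the
  # caller should skip select() and retry the same listeners.
  def run(ready)
    accepted = 0
    ready.each do |q|
      break if @reopen                # plays the role of `break if nr < 0`
      accepted += 1 if q.shift
    end
    !@reopen && accepted == @nr_listeners
  end
end

wp = WorkerPass.new(2)
p wp.run([[:c1], [:c2]])   # true: both listeners yielded a client
p wp.run([[:c3], []])      # false: fall back to select()
```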
* Re: Sustained queuing on one listener can block requests from other listeners
From: Stan Hu @ 2020-04-16 5:46 UTC
To: Eric Wong; +Cc: unicorn-public
Thanks, Eric. That patch didn't work; it spun the CPU. I think this worked?
diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..aaa4955 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -708,7 +708,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if nr == readers.size
         tmp = ready.dup
         redo
       end
* Re: Sustained queuing on one listener can block requests from other listeners
From: Eric Wong @ 2020-04-16 6:59 UTC
To: Stan Hu; +Cc: unicorn-public
Stan Hu <stanhu@gmail.com> wrote:
> Thanks, Eric. That patch didn't work; it spun the CPU. I think this worked?
Oops, sorry. I was too eager to drop `nr += 1' :x
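Why dropping `nr += 1` made the loop spin: once select() has reported every listener ready, `ready.size == nr_listeners` stays true on every later pass whether or not anything was accepted, so the worker never gets back to select(). With a single listener, `ready` always has size 1 after select() returns, so the condition always holds. The hypothetical model below (queues as Arrays, Array#shift as the accept, a made-up `limit` cap so the model terminates) compares that condition with one that counts accepted clients.

```ruby
# Hypothetical model: counts the speculative passes a worker makes before
# falling back to select(), assuming select() last reported every listener
# ready. Queues are Arrays; Array#shift stands in for a non-blocking accept.
# +limit+ only exists so the model terminates; :spins means "never".
def passes_before_select(queues, strategy, limit: 1_000)
  nr_listeners = queues.size
  ready = queues.dup
  limit.times do |i|
    nr = 0
    ready.each { |q| nr += 1 if q.shift }
    retry_pass =
      case strategy
      when :ready_size then ready.size == nr_listeners # first, untested patch
      when :nr         then nr == nr_listeners         # keeps `nr += 1`
      else raise ArgumentError, "unknown strategy: #{strategy.inspect}"
      end
    return i + 1 unless retry_pass
  end
  :spins
end

p passes_before_select([[:c1], [:c2]], :ready_size)  # :spins
p passes_before_select([[:c1], [:c2]], :nr)          # 2
```

With counting, the second pass accepts nothing, so the loop returns to select(); without it, the loop retries forever.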
Btw, please don't top post. FWIW, I wouldn't mind if we stopped
quoting at all on publicly-archived lists (saves space and
bandwidth).
> +++ b/lib/unicorn/http_server.rb
> @@ -708,7 +708,7 @@ def worker_loop(worker)
>        # we're probably reasonably busy, so avoid calling select()
>        # and do a speculative non-blocking accept() on ready listeners
>        # before we sleep again in select().
> -      unless nr == 0
> +      if nr == readers.size
>          tmp = ready.dup
>          redo
>        end
Your patch looks close. However, the `readers' array gets
dropped on SIGQUIT with `nuke_listeners!', so `readers.size'
is unstable.
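A minimal illustration of why a snapshot is safer, assuming (hypothetically) that `nuke_listeners!` empties `readers` in place; the thread only says the array gets dropped on SIGQUIT, and `fake_nuke_listeners!` is a made-up stand-in. Under that assumption, a pass that accepted nothing satisfies `nr == readers.size` and would redo instead of returning to select(), while a size captured before the loop keeps its meaning.

```ruby
# Hypothetical stand-in for nuke_listeners!, assuming it empties +readers+
# in place; the thread only says the array "gets dropped" on SIGQUIT.
def fake_nuke_listeners!(readers)
  readers.clear
end

readers = [:tcp_listener, :unix_listener]
nr_listeners = readers.size           # snapshot, as in the committed patch

fake_nuke_listeners!(readers)         # SIGQUIT handled mid-loop

nr = 0                                # this pass accepted nothing
live_redo = (nr == readers.size)      # condition from the first revision
snapshot_redo = (nr == nr_listeners)  # condition in the final patch
p [live_redo, snapshot_redo]          # prints [true, false]
```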
How about this?
diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..45a2e97 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -686,6 +686,7 @@ def worker_loop(worker)
     trap(:USR1) { nr = -65536 }
 
     ready = readers.dup
+    nr_listeners = readers.size
     @after_worker_ready.call(self, worker)
 
     begin
@@ -708,7 +709,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if nr == nr_listeners
         tmp = ready.dup
         redo
       end
Thanks
* Re: Sustained queuing on one listener can block requests from other listeners
From: Stan Hu @ 2020-04-16 7:24 UTC
To: Eric Wong; +Cc: unicorn-public
On Wed, Apr 15, 2020 at 11:59 PM Eric Wong <e@yhbt.net> wrote:
> Your patch looks close. However the `readers' array gets
> dropped on SIGQUIT with `nuke_listeners!', so `readers.size'
> is unstable.
That seems to work, thanks!
* [PATCH] prevent single listener from monopolizing a worker
From: Eric Wong @ 2020-04-16 9:24 UTC
To: Stan Hu; +Cc: unicorn-public
Stan Hu <stanhu@gmail.com> wrote:
> That seems to work, thanks!
Thanks for confirming. I'll push the patch below out.
(ugh, dealing with crazy packet loss all around)
Expect a v5.6.0 release within a few days, or a week at most
(hopefully with no regressions).
And... I wonder, are most deployments nowadays single-listener?
I don't think I've used multiple listeners for this aside from
experiments in the early days.
---------8<----------
Subject: [PATCH] prevent single listener from monopolizing a worker
In setups with multiple listeners, it's possible for our greedy
select(2)-avoidance optimization to get pinned on a single, busy
listener and starve the other listener(s).
Prevent starvation by retrying the select(2)-avoidance
optimization if and only if all listeners were active. This
should have no effect on the majority of deployments with only a
single listener.
Thanks to Stan Hu for reporting and testing.
Reported-by: Stan Hu <stanhu@gmail.com>
Tested-by: Stan Hu <stanhu@gmail.com>
Link: https://yhbt.net/unicorn-public/CAMBWrQ=Yh42MPtzJCEO7XryVknDNetRMuA87irWfqVuLdJmiBQ@mail.gmail.com/
---
lib/unicorn/http_server.rb | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..45a2e97 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -686,6 +686,7 @@ def worker_loop(worker)
     trap(:USR1) { nr = -65536 }
 
     ready = readers.dup
+    nr_listeners = readers.size
     @after_worker_ready.call(self, worker)
 
     begin
@@ -708,7 +709,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if nr == nr_listeners
         tmp = ready.dup
         redo
       end
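The effect of the committed condition can be checked with a toy model: listeners as Arrays of pending client ids, a "select" that returns the non-empty ones, and Array#shift as the non-blocking accept. This is a hypothetical sketch, not unicorn's code; `simulate`, `patched:` and the arrival patterns are made up for illustration. It compares `unless nr == 0` with `if nr == nr_listeners` for two listeners under sustained load on one of them, and checks that a single listener behaves the same under both.

```ruby
# Hypothetical toy model, not unicorn's code: listeners are Arrays of
# pending client ids, "select" returns the non-empty ones, and Array#shift
# stands in for a non-blocking accept (nil when nothing is pending).
# Returns [clients served in order, number of select() calls].
def simulate(listeners, rounds, patched:, &arrive)
  nr_listeners = listeners.size            # snapshot taken once
  served = []
  selects = 1
  ready = listeners.reject(&:empty?)       # initial select()
  rounds.times do
    arrive&.call(listeners)
    nr = 0
    ready.each do |q|
      client = q.shift
      next unless client
      served << client
      nr += 1
    end
    # Old: retry while anything was accepted (`unless nr == 0`).
    # Patched: retry only if every listener yielded a client.
    retry_pass = patched ? nr == nr_listeners : nr != 0
    next if retry_pass
    ready = listeners.reject(&:empty?)     # fall back to select()
    selects += 1
  end
  [served, selects]
end

# Two listeners: sustained load on A, one client arrives on B.
def two_listener_run(patched)
  tick = 0
  result = simulate([[:a0], []], 5, patched: patched) do |a, b|
    tick += 1
    a << :"a#{tick}"
    b << :b1 if tick == 1
  end
  result.first
end

# One listener with bursty load, so its queue sometimes drains.
def one_listener_run(patched)
  tick = 0
  simulate([[:x0]], 6, patched: patched) do |ls|
    tick += 1
    ls[0] << :"x#{tick}" if tick.odd?
  end
end

p two_listener_run(true)   # [:a0, :a1, :b1, :a2, :a3, :a4]
p two_listener_run(false)  # [:a0, :a1, :a2, :a3, :a4]
p one_listener_run(true) == one_listener_run(false)  # true
```

In this model the patched loop serves :b1 on the second pass, while the old loop leaves it queued. With one listener, `nr` can only be 0 or 1 after a pass, so both conditions agree, consistent with the commit message's claim about single-listener deployments.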