* Sustained queuing on one listener can block requests from other listeners
@ 2020-04-15  5:06 Stan Hu
  2020-04-15  5:26 ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Stan Hu @ 2020-04-15  5:06 UTC
  To: unicorn-public

My unicorn.rb has two listeners:

listen "127.0.0.1:8080", :tcp_nopush => false
listen "/var/run/unicorn.socket", :backlog => 1024

We found that because of the greedy attempt to accept new connections
before calling select() in
https://github.com/defunkt/unicorn/blob/981f561a726bb4307d01e4a09a308edba8d69fe3/lib/unicorn/http_server.rb#L707-L714,
listeners on another socket stall out until the first listener is
drained. We would expect Unicorn to round-robin between the two
listeners, but that doesn't happen as long as there is work to be done
for the first listener. We've verified that deleting that `redo` block
fixes the problem.

What do you think about the various options?

1. Only running that redo block if there is one listener
2. Removing the redo block entirely
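The stall reported above can be reproduced in a toy model. The sketch below is not unicorn code: `FakeListener`, `greedy_worker`, and the in-memory `pending` queues are hypothetical stand-ins for the listener sockets, `kgio_tryaccept`, and `IO.select`. It assumes that only the TCP listener was ready when select() last returned, and it keeps the original `unless nr == 0` retry.

```ruby
# Toy model only: `pending` stands in for a listener's kernel accept queue.
FakeListener = Struct.new(:name, :pending) do
  # Mimics a non-blocking accept: returns a client or nil, never blocks.
  def tryaccept
    pending.shift
  end
end

# Simplified version of the original worker loop (hypothetical helper).
# `ready` is the listener set returned by the previous select() call.
def greedy_worker(readers, ready)
  served = []
  loop do
    nr = 0
    tmp = ready.dup
    while (sock = tmp.shift)
      if (client = sock.tryaccept)
        served << client
        nr += 1
      end
    end
    # Original behaviour: any accepted client means "retry the same ready
    # set without calling select()", so `ready` is not refreshed here.
    next unless nr == 0
    # Stand-in for IO.select: listeners that have queued connections.
    ready = readers.reject { |l| l.pending.empty? }
    break if ready.empty?
  end
  served
end

tcp  = FakeListener.new("tcp",  %w[tcp-1 tcp-2 tcp-3])
unix = FakeListener.new("unix", %w[unix-1 unix-2])

# Only the TCP listener was ready when select() last returned.
order = greedy_worker([tcp, unix], [tcp])
p order # ["tcp-1", "tcp-2", "tcp-3", "unix-1", "unix-2"]
```

In this model the UNIX socket is served only after the TCP queue is empty. With a steady stream of TCP connections the `nr == 0` branch would never be reached, so the second listener would wait indefinitely.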
* Re: Sustained queuing on one listener can block requests from other listeners
  2020-04-15  5:06 Sustained queuing on one listener can block requests from other listeners Stan Hu
@ 2020-04-15  5:26 ` Eric Wong
  2020-04-16  5:46   ` Stan Hu
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2020-04-15  5:26 UTC
  To: Stan Hu; +Cc: unicorn-public

Stan Hu <stanhu@gmail.com> wrote:
> My unicorn.rb has two listeners:
>
> listen "127.0.0.1:8080", :tcp_nopush => false
> listen "/var/run/unicorn.socket", :backlog => 1024

Fwiw, lowering :backlog may make sense if you got other
hosts/instances.  More below..

> We found that because of the greedy attempt to accept new connections
> before calling select() in
> https://github.com/defunkt/unicorn/blob/981f561a726bb4307d01e4a09a308edba8d69fe3/lib/unicorn/http_server.rb#L707-L714,
> listeners on another socket stall out until the first listener is
> drained. We would expect Unicorn to round-robin between the two
> listeners, but that doesn't happen as long as there is work to be done
> for the first listener. We've verified that deleting that `redo` block
> fixes the problem.
>
> What do you think about the various options?
>
> 1. Only running that redo block if there is one listener

That seems reasonable, or if ready.size == nr_listeners
(proposed patch below)

> 2. Removing the redo block entirely

From what I recall ages ago, select() entry cost is pretty high
and I remember that redo helping a fair bit even in 2009 with
simple apps.  Syscall cost is even higher now with CPU
vulnerability mitigations, and Ruby 1.9+ GVL release+reacquire
is also a penalty I didn't have when developing this on 1.8.

Do you have time+hardware to benchmark either approach on a
simple app?  I no longer have stable/reliable hardware for
benchmarking.  Thanks.

Totally untested patch to try approach #1

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..69f1f60 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -686,6 +686,7 @@ def worker_loop(worker)
     trap(:USR1) { nr = -65536 }

     ready = readers.dup
+    nr_listeners = readers.size
     @after_worker_ready.call(self, worker)

     begin
@@ -698,7 +699,6 @@ def worker_loop(worker)
         # but that will return false
         if client = sock.kgio_tryaccept
           process_client(client)
-          nr += 1
           worker.tick = time_now.to_i
         end
         break if nr < 0
@@ -708,7 +708,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if ready.size == nr_listeners
         tmp = ready.dup
         redo
       end

And `nr' can probably just be a boolean `reopen' flag if we're
not overloading it as a counter.
* Re: Sustained queuing on one listener can block requests from other listeners
  2020-04-15  5:26 ` Eric Wong
@ 2020-04-16  5:46   ` Stan Hu
  2020-04-16  6:59     ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Stan Hu @ 2020-04-16  5:46 UTC
  To: Eric Wong; +Cc: unicorn-public

Thanks, Eric. That patch didn't work; it spun the CPU. I think this worked?

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..aaa4955 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -708,7 +708,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if nr == readers.size
         tmp = ready.dup
         redo
       end
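One possible reading of why the first proposed patch (the `ready.size == nr_listeners` condition) spun the CPU, inferred from the diff rather than confirmed in the thread: `ready` starts out as a copy of `readers`, so the condition holds even when no client was accepted, and the loop retries without ever sleeping in select(). The sketch below is a toy model under that assumption; `iterations_before_select` and the iteration cap are hypothetical, and empty arrays stand in for idle listeners.

```ruby
# Toy model only, not unicorn code. Each listener is an empty Array,
# standing in for an idle accept queue.
def iterations_before_select(nr_listeners, max_iterations)
  ready = Array.new(nr_listeners) { [] } # mirrors `ready = readers.dup`
  iterations = 0
  while iterations < max_iterations
    iterations += 1
    tmp = ready.dup
    while (queue = tmp.shift)
      queue.shift # non-blocking accept stand-in; nil when nothing is queued
    end
    # Condition from the first patch: it does not depend on whether any
    # client was accepted, only on how many listeners are in `ready`.
    next if ready.size == nr_listeners
    return iterations # the real loop would fall through to select() here
  end
  nil # select() was never reached within the cap
end

p iterations_before_select(2, 1_000) # nil
```

In this model the loop never reaches the select() stand-in while every listener is idle, which is consistent with the busy loop reported above.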
* Re: Sustained queuing on one listener can block requests from other listeners
  2020-04-16  5:46   ` Stan Hu
@ 2020-04-16  6:59     ` Eric Wong
  2020-04-16  7:24       ` Stan Hu
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2020-04-16  6:59 UTC
  To: Stan Hu; +Cc: unicorn-public

Stan Hu <stanhu@gmail.com> wrote:
> Thanks, Eric. That patch didn't work; it spun the CPU. I think this worked?

Oops, sorry.  I was too eager to drop `nr += 1' :x

Btw, please don't top post.  Fwiw, I wouldn't mind if we stopped
quoting at all on publicly-archived lists (saves space and
bandwidth).

> +++ b/lib/unicorn/http_server.rb
> @@ -708,7 +708,7 @@ def worker_loop(worker)
>        # we're probably reasonably busy, so avoid calling select()
>        # and do a speculative non-blocking accept() on ready listeners
>        # before we sleep again in select().
> -      unless nr == 0
> +      if nr == readers.size
>          tmp = ready.dup
>          redo
>        end

Your patch looks close.  However, the `readers' array gets
dropped on SIGQUIT with `nuke_listeners!', so `readers.size'
is unstable.  How about this?

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..45a2e97 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -686,6 +686,7 @@ def worker_loop(worker)
     trap(:USR1) { nr = -65536 }

     ready = readers.dup
+    nr_listeners = readers.size
     @after_worker_ready.call(self, worker)

     begin
@@ -708,7 +709,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if nr == nr_listeners
         tmp = ready.dup
         redo
       end

Thanks
* Re: Sustained queuing on one listener can block requests from other listeners
  2020-04-16  6:59     ` Eric Wong
@ 2020-04-16  7:24       ` Stan Hu
  2020-04-16  9:24         ` [PATCH] prevent single listener from monopolizing a worker Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Stan Hu @ 2020-04-16  7:24 UTC
  To: Eric Wong; +Cc: unicorn-public

On Wed, Apr 15, 2020 at 11:59 PM Eric Wong <e@yhbt.net> wrote:
> Your patch looks close.  However the `readers' array gets
> dropped on SIGQUIT with `nuke_listeners!', so `readers.size'
> is unstable.

That seems to work, thanks!
* [PATCH] prevent single listener from monopolizing a worker
  2020-04-16  7:24       ` Stan Hu
@ 2020-04-16  9:24         ` Eric Wong
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Wong @ 2020-04-16  9:24 UTC
  To: Stan Hu; +Cc: unicorn-public

Stan Hu <stanhu@gmail.com> wrote:
> That seems to work, thanks!

Thanks for confirming.  I'll push the patch below out.
(ugh, dealing with crazy packet loss all around)

Expect a v5.6.0 release within a few days, or a week at most
(hopefully no regressions).

And... I wonder, are most deployments nowadays single listener?
I don't think I've used multiple listeners for this aside
from experiments in the early days.

---------8<----------
Subject: [PATCH] prevent single listener from monopolizing a worker

In setups with multiple listeners, it's possible for our greedy
select(2)-avoidance optimization to get pinned on a single, busy
listener and starve the other listener(s).

Prevent starvation by retrying the select(2)-avoidance
optimization if and only if all listeners were active.  This
should have no effect on the majority of deployments with only a
single listener.

Thanks to Stan Hu for reporting and testing.

Reported-by: Stan Hu <stanhu@gmail.com>
Tested-by: Stan Hu <stanhu@gmail.com>
Link: https://yhbt.net/unicorn-public/CAMBWrQ=Yh42MPtzJCEO7XryVknDNetRMuA87irWfqVuLdJmiBQ@mail.gmail.com/
---
 lib/unicorn/http_server.rb | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..45a2e97 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -686,6 +686,7 @@ def worker_loop(worker)
     trap(:USR1) { nr = -65536 }

     ready = readers.dup
+    nr_listeners = readers.size
     @after_worker_ready.call(self, worker)

     begin
@@ -708,7 +709,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if nr == nr_listeners
         tmp = ready.dup
         redo
       end
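The effect of the `nr == nr_listeners` condition can be checked in a toy model. The sketch below is not unicorn code: `fair_worker` is a hypothetical helper, plain arrays stand in for each listener's accept queue, and `reject(&:empty?)` stands in for `IO.select`. It assumes, as in the patch, that the listener count is captured once before the loop.

```ruby
# Toy model only: each listener is an Array standing in for its accept queue.
def fair_worker(readers, ready)
  nr_listeners = readers.size # snapshot taken once, as in the patch
  served = []
  selects = 0
  loop do
    nr = 0
    tmp = ready.dup
    while (queue = tmp.shift)
      if (client = queue.shift) # non-blocking accept stand-in
        served << client
        nr += 1
      end
    end
    # Patched condition: skip select() only when every listener
    # produced a client during this pass.
    next if nr == nr_listeners
    selects += 1
    ready = readers.reject(&:empty?) # stand-in for IO.select
    break if ready.empty?
  end
  [served, selects]
end

tcp  = %w[tcp-1 tcp-2 tcp-3]
unix = %w[unix-1 unix-2]

# Only the TCP listener was ready when select() last returned.
order, selects = fair_worker([tcp, unix], [tcp])
p order   # ["tcp-1", "tcp-2", "unix-1", "tcp-3", "unix-2"]
p selects # 2

# A single busy listener still avoids select() until its queue is empty.
single = %w[a b c]
_, single_selects = fair_worker([single], [single])
p single_selects # 1
```

In this model the second listener is picked up after one TCP request instead of after the whole TCP queue, while the single-listener case still drains its queue with one call to the select() stand-in, matching the commit message's claim that single-listener deployments are unaffected.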