Sustained queuing on one listener can block requests from other listeners

unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help

* Sustained queuing on one listener can block requests from other listeners
From: Stan Hu @ 2020-04-15  5:06 UTC
  To: unicorn-public

My unicorn.rb has two listeners:

listen "127.0.0.1:8080", :tcp_nopush => false
listen "/var/run/unicorn.socket", :backlog => 1024

We found that because of the greedy attempt to accept new connections
before calling select() in
https://github.com/defunkt/unicorn/blob/981f561a726bb4307d01e4a09a308edba8d69fe3/lib/unicorn/http_server.rb#L707-L714,
listeners on another socket stall out until the first listener is
drained. We would expect Unicorn to round-robin between the two
listeners, but that doesn't happen as long as there is work to be done
for the first listener. We've verified that deleting that `redo` block
fixes the problem.
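A toy simulation can make the starvation concrete. It is a sketch under stated assumptions, not unicorn's code: `ToyListener`, `try_accept` and `run_greedy` are hypothetical stand-ins for the listener sockets, `kgio_tryaccept` and the redo logic in `worker_loop`, and each "pass" approximates one trip through the accept loop.

```ruby
# Toy model of the pre-patch worker loop; NOT unicorn's code. Each
# listener is a queue of pending connections, try_accept stands in for
# kgio_tryaccept, and readers.select(&:readable?) stands in for IO.select.
class ToyListener
  attr_reader :served

  def initialize(pending)
    @pending = pending
    @served = 0
  end

  # Non-blocking accept: a pending connection, or nil if none is queued.
  def try_accept
    conn = @pending.shift
    @served += 1 unless conn.nil?
    conn
  end

  def readable?
    !@pending.empty?
  end

  def push(conn)
    @pending << conn
  end
end

# Greedy policy: if anything was accepted on this pass, skip select()
# and retry only the listeners that were ready at the last select().
def run_greedy(readers, passes)
  ready = readers.select(&:readable?)
  passes.times do |i|
    nr = 0
    ready.each { |sock| nr += 1 unless sock.try_accept.nil? }
    yield i # the caller injects new connections between passes
    ready = readers.select(&:readable?) if nr == 0
  end
end

tcp = ToyListener.new([:first]) # 127.0.0.1:8080 already has a backlog
unix = ToyListener.new([])      # /var/run/unicorn.socket is idle at first

run_greedy([tcp, unix], 100) do |i|
  tcp.push(i)  # sustained queuing on the TCP listener
  unix.push(i) # requests keep arriving on the UNIX socket, too
end

puts "tcp served:  #{tcp.served}"  # => tcp served:  100
puts "unix served: #{unix.served}" # => unix served: 0
```

Because the TCP listener never drains, `nr` never drops back to 0, select() is never reached again, and the UNIX socket's queue only grows.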

What do you think about the various options?

1. Only running that redo block if there is one listener
2. Removing the redo block entirely


* Re: Sustained queuing on one listener can block requests from other listeners
From: Eric Wong @ 2020-04-15  5:26 UTC
  To: Stan Hu; +Cc: unicorn-public

Stan Hu <stanhu@gmail.com> wrote:
> My unicorn.rb has two listeners:
> 
> listen "127.0.0.1:8080", :tcp_nopush => false
> listen "/var/run/unicorn.socket", :backlog => 1024

Fwiw, lowering :backlog may make sense if you have other
hosts/instances.  More below.

> We found that because of the greedy attempt to accept new connections
> before calling select() in
> https://github.com/defunkt/unicorn/blob/981f561a726bb4307d01e4a09a308edba8d69fe3/lib/unicorn/http_server.rb#L707-L714,
> listeners on another socket stall out until the first listener is
> drained. We would expect Unicorn to round-robin between the two
> listeners, but that doesn't happen as long as there is work to be done
> for the first listener. We've verified that deleting that `redo` block
> fixes the problem.
> 
> What do you think about the various options?
> 
> 1. Only running that redo block if there is one listener

That seems reasonable, or if ready.size == nr_listeners
(proposed patch below)

> 2. Removing the redo block entirely

From what I recall from ages ago, the cost of entering select() is
pretty high, and I remember that redo helping a fair bit even in
2009 with simple apps.  Syscall cost is even higher now with CPU
vulnerability mitigations, and the GVL release+reacquire in Ruby
1.9+ is another penalty I didn't have when developing this on 1.8.

Do you have time+hardware to benchmark either approach on a
simple app?  I no longer have stable/reliable hardware for
benchmarking.  Thanks.
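This is not the app-level benchmark asked for, but a quick micro-benchmark sketch like the following can at least compare, on a given machine and Ruby, the entry cost of a zero-timeout `IO.select` against a speculative `accept_nonblock` on an idle listener. The helper `per_call_cost` is made up for this example, and absolute numbers will vary with the kernel, CPU mitigations and Ruby version.

```ruby
require "socket"

# Average wall-clock cost of one call, in seconds, using the
# monotonic clock (hypothetical helper for this sketch).
def per_call_cost(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) / iterations
end

# An idle listener on an ephemeral loopback port: nothing ever connects,
# so both calls below return immediately without accepting anything.
server = TCPServer.new("127.0.0.1", 0)
n = 10_000

# Zero-timeout select: approximates select() entry cost without sleeping.
select_cost = per_call_cost(n) { IO.select([server], nil, nil, 0) }

# Speculative accept: returns :wait_readable instead of raising.
accept_cost = per_call_cost(n) { server.accept_nonblock(exception: false) }

server.close

printf("IO.select:       %.2f us/call\n", select_cost * 1e6)
printf("accept_nonblock: %.2f us/call\n", accept_cost * 1e6)
```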

Totally untested patch to try approach #1

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..69f1f60 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -686,6 +686,7 @@ def worker_loop(worker)
     trap(:USR1) { nr = -65536 }
 
     ready = readers.dup
+    nr_listeners = readers.size
     @after_worker_ready.call(self, worker)
 
     begin
@@ -698,7 +699,6 @@ def worker_loop(worker)
         # but that will return false
         if client = sock.kgio_tryaccept
           process_client(client)
-          nr += 1
           worker.tick = time_now.to_i
         end
         break if nr < 0
@@ -708,7 +708,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if ready.size == nr_listeners
         tmp = ready.dup
         redo
       end



And `nr' can probably just be a boolean `reopen' flag if we're
not overloading it as a counter.


* Re: Sustained queuing on one listener can block requests from other listeners
From: Stan Hu @ 2020-04-16  5:46 UTC
  To: Eric Wong; +Cc: unicorn-public

Thanks, Eric. That patch didn't work; it spun the CPU. I think this worked?

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..aaa4955 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -708,7 +708,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if nr == readers.size
         tmp = ready.dup
         redo
       end

On Tue, Apr 14, 2020 at 10:26 PM Eric Wong <e@yhbt.net> wrote:
> [...]


* Re: Sustained queuing on one listener can block requests from other listeners
From: Eric Wong @ 2020-04-16  6:59 UTC
  To: Stan Hu; +Cc: unicorn-public

Stan Hu <stanhu@gmail.com> wrote:
> Thanks, Eric. That patch didn't work; it spun the CPU. I think this worked?

Oops, sorry.  I was too eager to drop `nr += 1' :x
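One plausible reading of the spin: the first patch's test, `ready.size == nr_listeners`, no longer depends on whether any accept succeeded, and `ready` is only refreshed by select(), so once every listener has been seen ready the worker keeps redoing failed non-blocking accepts without ever sleeping. The toy check below (hypothetical `passes_before_select`, with plain arrays standing in for listeners) compares the original test, the first patch and the final one.

```ruby
# Toy check, NOT unicorn's code: count how many consecutive passes skip
# select() under a given redo test, giving up at max_passes (which here
# stands for "spins forever").
def passes_before_select(queues, max_passes, &redo_test)
  ready = queues.reject(&:empty?) # result of the last select()
  nr_listeners = queues.size
  max_passes.times do |pass|
    nr = 0
    ready.each { |q| nr += 1 unless q.shift.nil? } # non-blocking "accept"
    return pass + 1 unless redo_test.call(nr, ready.size, nr_listeners)
  end
  max_passes
end

max = 1_000
# Both listeners had exactly one pending client at the last select().
old_test   = passes_before_select([[:a], [:b]], max) { |nr, _size, _n| nr != 0 }
first_try  = passes_before_select([[:a], [:b]], max) { |_nr, size, n| size == n }
final_test = passes_before_select([[:a], [:b]], max) { |nr, _size, n| nr == n }

p [old_test, first_try, final_test] # => [2, 1000, 2]
```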

Btw, please don't top post.  Fwiw, I wouldn't mind if we stopped
quoting at all on publicly-archived lists (saves space and
bandwidth).

> +++ b/lib/unicorn/http_server.rb
> @@ -708,7 +708,7 @@ def worker_loop(worker)
>        # we're probably reasonably busy, so avoid calling select()
>        # and do a speculative non-blocking accept() on ready listeners
>        # before we sleep again in select().
> -      unless nr == 0
> +      if nr == readers.size
>          tmp = ready.dup
>          redo
>        end

Your patch looks close.  However the `readers' array gets
dropped on SIGQUIT with `nuke_listeners!', so `readers.size'
is unstable.
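A toy illustration of why a live `readers.size` is fragile: if shutdown mutates `readers` in place, a check against its current size changes meaning mid-loop. This sketch assumes, for illustration only, that `nuke_listeners!` empties the array in place; symbols stand in for sockets. With nothing accepted, `nr == readers.size` becomes `0 == 0` and would keep the worker redoing instead of falling through to the loop's exit check, while a size snapshot taken before the loop does not.

```ruby
# Toy illustration, NOT unicorn's code: symbols stand in for sockets.
readers = [:tcp_listener, :unix_listener]
nr_listeners = readers.size # snapshot taken once, before the loop starts

# Hypothetical stand-in for nuke_listeners!, assuming (for illustration)
# that it empties the shared `readers` array in place on SIGQUIT.
nuke = ->(ary) { ary.clear }
nuke.call(readers)

nr = 0 # nothing was accepted on this pass
redo_with_live_size = (nr == readers.size) # 0 == 0: keeps skipping select()
redo_with_snapshot = (nr == nr_listeners)  # 0 == 2: falls through as intended

p [readers.size, nr_listeners, redo_with_live_size, redo_with_snapshot]
# => [0, 2, true, false]
```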

How about this?

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..45a2e97 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -686,6 +686,7 @@ def worker_loop(worker)
     trap(:USR1) { nr = -65536 }
 
     ready = readers.dup
+    nr_listeners = readers.size
     @after_worker_ready.call(self, worker)
 
     begin
@@ -708,7 +709,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if nr == nr_listeners
         tmp = ready.dup
         redo
       end

Thanks


* Re: Sustained queuing on one listener can block requests from other listeners
From: Stan Hu @ 2020-04-16  7:24 UTC
  To: Eric Wong; +Cc: unicorn-public

On Wed, Apr 15, 2020 at 11:59 PM Eric Wong <e@yhbt.net> wrote:
> Your patch looks close.  However the `readers' array gets
> dropped on SIGQUIT with `nuke_listeners!', so `readers.size'
> is unstable.

That seems to work, thanks!


* [PATCH] prevent single listener from monopolizing a worker
From: Eric Wong @ 2020-04-16  9:24 UTC
  To: Stan Hu; +Cc: unicorn-public

Stan Hu <stanhu@gmail.com> wrote:
> That seems to work, thanks!

Thanks for confirming.  I'll push the patch below out.
(ugh, dealing with crazy packet loss all around)

Expect a v5.6.0 release within a few days, or a week at most
(hopefully with no regressions).

And... I wonder, are most deployments nowadays single-listener?

I don't think I've used multiple listeners for this aside from
experiments in the early days.

---------8<----------
Subject: [PATCH] prevent single listener from monopolizing a worker

In setups with multiple listeners, it's possible for our greedy
select(2)-avoidance optimization to get pinned on a single, busy
listener and starve the other listener(s).

Prevent starvation by retrying the select(2)-avoidance
optimization if and only if all listeners were active.  This
should have no effect on the majority of deployments with only a
single listener.

Thanks to Stan Hu for reporting and testing.

Reported-by: Stan Hu <stanhu@gmail.com>
Tested-by: Stan Hu <stanhu@gmail.com>
Link: https://yhbt.net/unicorn-public/CAMBWrQ=Yh42MPtzJCEO7XryVknDNetRMuA87irWfqVuLdJmiBQ@mail.gmail.com/
---
 lib/unicorn/http_server.rb | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index a52931a..45a2e97 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -686,6 +686,7 @@ def worker_loop(worker)
     trap(:USR1) { nr = -65536 }

     ready = readers.dup
+    nr_listeners = readers.size
     @after_worker_ready.call(self, worker)

     begin
@@ -708,7 +709,7 @@ def worker_loop(worker)
       # we're probably reasonably busy, so avoid calling select()
       # and do a speculative non-blocking accept() on ready listeners
       # before we sleep again in select().
-      unless nr == 0
+      if nr == nr_listeners
         tmp = ready.dup
         redo
       end
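For comparison, the same kind of toy model with the patched test shows both listeners being served. As before, `ToyListener` and `run_patched` are hypothetical stand-ins for the listener sockets and `worker_loop`, not unicorn's code.

```ruby
# Toy model of the patched worker loop; NOT unicorn's code.
class ToyListener
  attr_reader :served

  def initialize(pending)
    @pending = pending
    @served = 0
  end

  # Non-blocking accept: a pending connection, or nil if none is queued.
  def try_accept
    conn = @pending.shift
    @served += 1 unless conn.nil?
    conn
  end

  def readable?
    !@pending.empty?
  end

  def push(conn)
    @pending << conn
  end
end

# Patched policy: skip select() only when every listener produced a
# client on this pass; otherwise go back to select(), which notices any
# listener that became readable in the meantime.
def run_patched(readers, passes)
  nr_listeners = readers.size # snapshot, like the patch's nr_listeners
  ready = readers.select(&:readable?) # stands in for IO.select
  passes.times do |i|
    nr = 0
    ready.each { |sock| nr += 1 unless sock.try_accept.nil? }
    yield i # the caller injects new connections between passes
    ready = readers.select(&:readable?) unless nr == nr_listeners
  end
end

tcp = ToyListener.new([:first]) # busy TCP listener
unix = ToyListener.new([])      # UNIX socket, idle at the first select()

run_patched([tcp, unix], 100) do |i|
  tcp.push(i)
  unix.push(i)
end

puts "tcp served:  #{tcp.served}"  # => tcp served:  100
puts "unix served: #{unix.served}" # => unix served: 99
```

The UNIX socket misses only the first pass, because it was not readable at the initial select(). With a single listener, each pass accepts at most one client, so while `nr` is counting accepts it can only be 0 or 1, and redoing on `nr == nr_listeners` is the same as the old `unless nr == 0`. That is consistent with the commit message's expectation that single-listener deployments are unaffected.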


Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/
