unicorn worker being killed issue

unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed

* unicorn worker being killed issue
@ 2023-05-06 22:02 subashkc1
  2023-05-07 23:13 ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: subashkc1 @ 2023-05-06 22:02 UTC (permalink / raw)
  To: unicorn-public@yhbt.net

Hey,

First of all, thanks for this awesome project.

I have a weird issue that I'm having trouble debugging. I've created a sample bare bones rails app where there issue is reproducible https://github.com/subashkc/sidekiq_webui_issue.

After running the rails app, and load up the sidekiq web UI, if I click on any tabs e.g. busy/queues then the request is served right away with a 200 but it seems there is a 2nd request (can't tell where from) which eventually times out and kills the unicorn worker.

I originally opened the issue with sidekiq repo but Mike (sidekiq owner) wasn't able to find anything wrong with sidekiq code and he suggested it seemed unicorn was double reading the request and the 2nd one dies after 60s timeout.

Warm Regards

Sent with Proton Mail secure email.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: unicorn worker being killed issue
  2023-05-06 22:02 unicorn worker being killed issue subashkc1
@ 2023-05-07 23:13 ` Eric Wong
  2023-05-10 21:34   ` subashkc1
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2023-05-07 23:13 UTC (permalink / raw)
  To: subashkc1; +Cc: unicorn-public

subashkc1 <subashkc1@protonmail.com> wrote:
> I have a weird issue that I'm having trouble debugging. I've
> created a sample bare bones rails app where there issue is
> reproducible https://github.com/subashkc/sidekiq_webui_issue.

That's too big and having https://enterprise.contribsys.com/ as
a source in your Gemfile means I'm probably not able/allowed
to inspect those gems.

Some general questions:

* how are you starting unicorn and what is the config file?
  (mainly, is `preload_app' enabled? and any hooks?)

  `preload_app true' is the most common source of problems if
  the rest of your application (including all dependencies)
  isn't equipped to handle fork.

* What are the RACK_ENV / RAILS_ENV?

`unicorn_rails' is actually pointless nowadays, and the plain
`unicorn' command is highly recommended.

Use a single worker for debugging, especially if you want to
strace/truss the process.

Fwiw, I haven't touched Rails roughly a decade (and barely at
that).  My only web apps for the past 15 years or so are on bare
Rack and meant to be consumed via curl or w3m (nothing involving
JavaScript or fancy browsers).

Maybe others watching this mailbox can chime in....

> After running the rails app, and load up the sidekiq web UI,
> if I click on any tabs e.g. busy/queues then the request is
> served right away with a 200 but it seems there is a 2nd
> request (can't tell where from) which eventually times out and
> kills the unicorn worker.

I suspect a 2nd request comes from some JS code.  Check for a
network debugging tool included with your browser if it has one.

Or use a single unicorn worker, then strace/truss that worker.
Something like `strace -p $PID_OF_WORKER -f -v -s5000 -o /tmp/dump`
(assuming Linux)

Are you able to reproduce the issue with JS disabled?
Preferably just via curl to minimize surprises.

> I originally opened the issue with sidekiq repo but Mike
> (sidekiq owner) wasn't able to find anything wrong with
> sidekiq code and he suggested it seemed unicorn was double
> reading the request and the 2nd one dies after 60s timeout.

Double-reading a request is impossible unless there's something
very broken in Ruby or your networking stack.  There'd be a
hundreds of reports about it if that were true if Ruby or
unicorn were double-reading requests.

Are you going through some firewalls, or caching/scanning
proxies which might send a request twice?  I've seen that before.

I have seen some folks accidentally enable multiple loggers,
so it looks like a single request is happening multiple times.

strace is usually my tool for debugging stuff like this; along
with sprinkling `warn "#{__LINE__} ..."' statements throughout.
And then perhaps commenting out lots of code and disabling
as many dependencies as possible.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: unicorn worker being killed issue
  2023-05-07 23:13 ` Eric Wong
@ 2023-05-10 21:34   ` subashkc1
  2023-05-11 19:08     ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: subashkc1 @ 2023-05-10 21:34 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public

Hi Eric,

Thanks for you reply, I'll try to answer all questions below



> subashkc1 subashkc1@protonmail.com wrote:
>
> > I have a weird issue that I'm having trouble debugging. I've
> > created a sample bare bones rails app where there issue is
> > reproducible https://github.com/subashkc/sidekiq_webui_issue.
>
>
> That's too big and having https://enterprise.contribsys.com/ as
> a source in your Gemfile means I'm probably not able/allowed
> to inspect those gems.
>

I've removed the sidekiq pro form gemfile, it also happens with just the sidekiq gem

> Some general questions:
>
> * how are you starting unicorn and what is the config file?
> (mainly, is `preload_app' enabled? and any hooks?)` preload_app true' is the most common source of problems if
> the rest of your application (including all dependencies)
> isn't equipped to handle fork.

unicorn is started just with `bundle exec unicorn` from a Procfile using foreman, no other config options so its just 1 unicorn worker and its a very bare-bones rails app to reproduce the issue, all I did was add sidekiq gem, and load up the sidekiq webui to see the issue in action
>
> * What are the RACK_ENV / RAILS_ENV?
>

It is happening locally, so all env are development

> `unicorn_rails' is actually pointless nowadays, and the plain` unicorn' command is highly recommended.

I'm only using the unicorn gem, not touching unicorn_rails at all

>
> Use a single worker for debugging, especially if you want to
> strace/truss the process.

Yeah just using 1 unicorn worker with all the default configs, (this repo doesn't have a unicorn config file so all defaults)

>
> Fwiw, I haven't touched Rails roughly a decade (and barely at
> that). My only web apps for the past 15 years or so are on bare
> Rack and meant to be consumed via curl or w3m (nothing involving
> JavaScript or fancy browsers).
>
> Maybe others watching this mailbox can chime in....
>
> > After running the rails app, and load up the sidekiq web UI,
> > if I click on any tabs e.g. busy/queues then the request is
> > served right away with a 200 but it seems there is a 2nd
> > request (can't tell where from) which eventually times out and
> > kills the unicorn worker.
>
>
> I suspect a 2nd request comes from some JS code. Check for a
> network debugging tool included with your browser if it has one.
>

Yeah that's what I can't figure out, I can't see any other requests in the browser network tools

> Or use a single unicorn worker, then strace/truss that worker.
> Something like `strace -p $PID_OF_WORKER -f -v -s5000 -o /tmp/dump`
> (assuming Linux)

I did this, but the output was so verbose I couldn't make any sense of it as I wasn't sure what to look for

>
> Are you able to reproduce the issue with JS disabled?
> Preferably just via curl to minimize surprises.

Ok, it doesn't happen if I do the same request using curl, just doing `curl localhost:8080/sidekiq/busy` and if I do this right after clicking on the tab in the UI, this request doesn't get served right away, it waits until the request from the UI times out >60s, kills the unicorn worker and then the curl req is served

>
> > I originally opened the issue with sidekiq repo but Mike
> > (sidekiq owner) wasn't able to find anything wrong with
> > sidekiq code and he suggested it seemed unicorn was double
> > reading the request and the 2nd one dies after 60s timeout.
>
>
> Double-reading a request is impossible unless there's something
> very broken in Ruby or your networking stack. There'd be a
> hundreds of reports about it if that were true if Ruby or
> unicorn were double-reading requests.
>
> Are you going through some firewalls, or caching/scanning
> proxies which might send a request twice? I've seen that before.
>

No, this is all happening locally on my dev machine, no firewalls or proxies involved

> I have seen some folks accidentally enable multiple loggers,
> so it looks like a single request is happening multiple times.
>

No, just a very basic rails app with the default logger, haven't enabled or added anything else

> strace is usually my tool for debugging stuff like this; along
> with sprinkling `warn "#{LINE} ..."' statements throughout.
> And then perhaps commenting out lots of code and disabling
> as many dependencies as possible.

Yeah it is a rails app with just the sidekiq gem, no other dependencies and no code to disable, haven't added anything.

Let me know if that gives you some hints or if there are other things that I can try or check

Warm Regards

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: unicorn worker being killed issue
  2023-05-10 21:34   ` subashkc1
@ 2023-05-11 19:08     ` Eric Wong
  2023-05-12  6:41       ` subashkc1
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2023-05-11 19:08 UTC (permalink / raw)
  To: subashkc1; +Cc: unicorn-public, Mike Perham

subashkc1 <subashkc1@protonmail.com> wrote:
> Eric Wong wrote:
> > Or use a single unicorn worker, then strace/truss that worker.
> > Something like `strace -p $PID_OF_WORKER -f -v -s5000 -o /tmp/dump`
> > (assuming Linux)
> I did this, but the output was so verbose I couldn't make any
> sense of it as I wasn't sure what to look for

+cc Mike Perham since it's sidekiq-related

subashkc1:  Can you show us the lines leading up to the SIGKILL?

Anything starting with the last accept|accept4 syscall up to
when it says the process is killed should help.

Adding `-t' to enable strace timestamps can also help, thus
`strace -t -p $PID_OF_WORKER -f -v -s5000 -o /tmp/dump`


reminders: disable HTML and reply-to-all to ensure replication + archival
subscribers: use List-Unsubscribe header for one-click unsubscribe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: unicorn worker being killed issue
  2023-05-11 19:08     ` Eric Wong
@ 2023-05-12  6:41       ` subashkc1
  2023-05-12  8:05         ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: subashkc1 @ 2023-05-12  6:41 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public, Mike Perham

Hi Eric,

Here is the trace from accept4 upto SIGKILL

46585 16:32:39 accept4(11, {sa_family=AF_INET, sin_port=htons(59496), sin_addr=inet_addr("127.0.0.1")}, [128 => 16], SOCK_CLOEXEC) = 9
46585 16:32:39 recvfrom(9, 0x56012fa88ca0, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
46585 16:32:39 getpid()                 = 46585
46585 16:32:39 ppoll([{fd=9, events=POLLIN}], 1, NULL, NULL, 8 <unfinished ...>
46601 16:33:38 <... ppoll resumed>)     = 0 (Timeout)
46601 16:33:38 getpid()                 = 46585
46601 16:33:38 read(3, 0x7fc1327506a0, 8) = -1 EAGAIN (Resource temporarily unavailable)
46601 16:33:38 getpid()                 = 46585
46601 16:33:38 sched_yield()            = 0
46601 16:33:38 ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=60, tv_nsec=0}, NULL, 8 <unfinished ...>) = ?
46586 16:33:40 <... futex resumed>)     = ?
46585 16:33:40 <... ppoll resumed> <unfinished ...>) = ?
46601 16:33:40 +++ killed by SIGKILL +++
46586 16:33:40 +++ killed by SIGKILL +++
46585 16:33:40 +++ killed by SIGKILL +++

I've uploaded the whole strace dump into codeshare if you need the whole file, https://codeshare.io/3AdLy6.

Warm Regards



Subash K.C., Software Developer | 397 Mt Dandenong Road, Croydon, VIC 3134  | 0405299404

Sent with Proton Mail secure email.

------- Original Message -------
On Friday, May 12th, 2023 at 05:08, Eric Wong <e@80x24.org> wrote:


> subashkc1 subashkc1@protonmail.com wrote:
> 
> > Eric Wong wrote:
> > 
> > > Or use a single unicorn worker, then strace/truss that worker.
> > > Something like `strace -p $PID_OF_WORKER -f -v -s5000 -o /tmp/dump`
> > > (assuming Linux)
> > > I did this, but the output was so verbose I couldn't make any
> > > sense of it as I wasn't sure what to look for
> 
> 
> +cc Mike Perham since it's sidekiq-related
> 
> subashkc1: Can you show us the lines leading up to the SIGKILL?
> 
> Anything starting with the last accept|accept4 syscall up to
> when it says the process is killed should help.
> 
> Adding `-t' to enable strace timestamps can also help, thus` strace -t -p $PID_OF_WORKER -f -v -s5000 -o /tmp/dump`
> 
> 
> reminders: disable HTML and reply-to-all to ensure replication + archival
> subscribers: use List-Unsubscribe header for one-click unsubscribe

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: unicorn worker being killed issue
  2023-05-12  6:41       ` subashkc1
@ 2023-05-12  8:05         ` Eric Wong
  2023-05-13  0:10           ` Eric Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Wong @ 2023-05-12  8:05 UTC (permalink / raw)
  To: subashkc1; +Cc: unicorn-public, Mike Perham

subashkc1 <subashkc1@protonmail.com> wrote:
> Hi Eric,
> 
> Here is the trace from accept4 upto SIGKILL
> 
> 46585 16:32:39 accept4(11, {sa_family=AF_INET, sin_port=htons(59496), sin_addr=inet_addr("127.0.0.1")}, [128 => 16], SOCK_CLOEXEC) = 9
> 46585 16:32:39 recvfrom(9, 0x56012fa88ca0, 16384, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> 46585 16:32:39 getpid()                 = 46585
> 46585 16:32:39 ppoll([{fd=9, events=POLLIN}], 1, NULL, NULL, 8 <unfinished ...>
> 46601 16:33:38 <... ppoll resumed>)     = 0 (Timeout)
> 46601 16:33:38 getpid()                 = 46585
> 46601 16:33:38 read(3, 0x7fc1327506a0, 8) = -1 EAGAIN (Resource temporarily unavailable)
> 46601 16:33:38 getpid()                 = 46585
> 46601 16:33:38 sched_yield()            = 0
> 46601 16:33:38 ppoll([{fd=3, events=POLLIN}], 1, {tv_sec=60, tv_nsec=0}, NULL, 8 <unfinished ...>) = ?
> 46586 16:33:40 <... futex resumed>)     = ?
> 46585 16:33:40 <... ppoll resumed> <unfinished ...>) = ?
> 46601 16:33:40 +++ killed by SIGKILL +++
> 46586 16:33:40 +++ killed by SIGKILL +++
> 46585 16:33:40 +++ killed by SIGKILL +++

OK, threadid 46485 looks like the worker thread, and
46601 looks like the Ruby timer thread.

FD=9 is the client socket, so it looks like you have a client
that's opening a connection and not doing anything so recvfrom()
fails and ppoll times out.  You'd need to track down why you
have a client opening a connection like that.

You can use `ss -tp' to dump the TCP ports and processes
using them.  So you can find the process which pairs with
the socket accepted to find the culprit client.

In your above process, you'd look for the one which pairs with
127.0.0.1:59496 (based on what I see in the accept4 line);
but the port changes every connection.

So something like: `ss -tp | grep 127.0.0.1:59496' should show
both the unicorn worker and the local client process.

> I've uploaded the whole strace dump into codeshare if you need
> the whole file, https://codeshare.io/3AdLy6.

It's not viewable to me since it requires JS; but shouldn't be
needed.  The 16:32:xx => 16:33:xx range was enough.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: unicorn worker being killed issue
  2023-05-12  8:05         ` Eric Wong
@ 2023-05-13  0:10           ` Eric Wong
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Wong @ 2023-05-13  0:10 UTC (permalink / raw)
  To: subashkc1; +Cc: unicorn-public, Mike Perham

Eric Wong <e@80x24.org> wrote:
> FD=9 is the client socket, so it looks like you have a client
> that's opening a connection and not doing anything so recvfrom()
> fails and ppoll times out.  You'd need to track down why you
> have a client opening a connection like that.

Btw, this wouldn't be a problem in production with nginx in
front of unicorn (which is the only recommended production
configuration).

nginx acts as a sponge to buffer the entire request (including
request body) since unicorn was never designed for slow clients,
malicious or not.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-05-13  0:10 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-06 22:02 unicorn worker being killed issue subashkc1
2023-05-07 23:13 ` Eric Wong
2023-05-10 21:34   ` subashkc1
2023-05-11 19:08     ` Eric Wong
2023-05-12  6:41       ` subashkc1
2023-05-12  8:05         ` Eric Wong
2023-05-13  0:10           ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).