fork() errors lead to a completely dead unicorn

unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help

* fork() errors lead to a completely dead unicorn
@ 2014-09-03 10:13 Jonathan del Strother
  2014-09-03 17:11 ` Eric Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Jonathan del Strother @ 2014-09-03 10:13 UTC (permalink / raw)
  To: unicorn-public

Hi - on SmartOS & Solaris, we occasionally run into problems where
unicorn receives USR2 to reload itself, but can't fork off its workers
due to not having enough RAM.  It then kills all of its workers and
sits there failing to process any requests.  Unfortunately, the master
process stays alive - if it actually died, we'd be able to
automatically restart it.

Can we do anything to handle this more elegantly?

Jonathan

PS: An example log file from when this occurs -

I, [2014-09-03T08:51:29.034227 #7556]  INFO -- : executing
["/app/common/bundle/ruby/2.1.0/bin/unicorn", "--env",
"production-live", "--daemonize", "--config-file",
"/app/code/config/unicorn.rb", {17=>#<Kgio::TCPServer:fd 17>}] (in
/app/code)
I, [2014-09-03T08:51:29.035223 #7556]  INFO -- : forked child re-executing...
I, [2014-09-03T08:51:30.480393 #7556]  INFO -- : inherited
addr=0.0.0.0:8090 fd=17
I, [2014-09-03T08:51:30.481257 #7556]  INFO -- : Refreshing Gem list
D, [2014-09-03T08:51:41.715061 #7556] DEBUG -- : ** [Airbrake]
Notifier 3.1.14 ready to catch errors
I, [2014-09-03T08:51:45.437499 #7952]  INFO -- : worker=0 ready
I, [2014-09-03T08:51:45.471084 #7959]  INFO -- : worker=1 ready
I, [2014-09-03T08:51:45.513301 #7960]  INFO -- : worker=2 ready
I, [2014-09-03T08:51:45.558417 #7961]  INFO -- : worker=3 ready
E, [2014-09-03T08:51:45.931282 #7556] ERROR -- : Not enough space -
fork(2) (Errno::ENOMEM)
/app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:520:in
`fork'
/app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:520:in
`spawn_missing_workers'
/app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:140:in
`start'
/app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/bin/unicorn:126:in
`<top (required)>'
/app/common/bundle/ruby/2.1.0/bin/unicorn:23:in `load'
/app/common/bundle/ruby/2.1.0/bin/unicorn:23:in `<main>'
E, [2014-09-03T08:51:46.139737 #10484] ERROR -- : reaped
#<Process::Status: pid 7556 exit 1> exec()-ed
I, [2014-09-03T08:51:48.069452 #10484]  INFO -- : reaped
#<Process::Status: pid 21801 exit 0> worker=1
I, [2014-09-03T08:51:48.372431 #10484]  INFO -- : reaped
#<Process::Status: pid 67829 exit 0> worker=3
I, [2014-09-03T08:51:48.473412 #10484]  INFO -- : reaped
#<Process::Status: pid 57211 exit 0> worker=0
I, [2014-09-03T08:51:48.574279 #10484]  INFO -- : reaped
#<Process::Status: pid 70992 exit 0> worker=2
I, [2014-09-03T08:51:48.675085 #10484]  INFO -- : reaped
#<Process::Status: pid 11195 exit 0> worker=5
I, [2014-09-03T08:51:48.876051 #10484]  INFO -- : reaped
#<Process::Status: pid 11194 exit 0> worker=4
I, [2014-09-03T08:51:48.876341 #10484]  INFO -- : master complete


* Re: fork() errors lead to a completely dead unicorn
  2014-09-03 10:13 fork() errors lead to a completely dead unicorn Jonathan del Strother
@ 2014-09-03 17:11 ` Eric Wong
  2014-09-07 10:12   ` Jonathan del Strother
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Wong @ 2014-09-03 17:11 UTC (permalink / raw)
  To: Jonathan del Strother; +Cc: unicorn-public

Jonathan del Strother <maillist@steelskies.com> wrote:
> Hi - on SmartOS & Solaris, we occasionally run into problems where
> unicorn receives USR2 to reload itself, but can't fork off its workers
> due to not having enough RAM.  It then kills all of its workers and
> sits there failing to process any requests.  Unfortunately, the master
> process stays alive - if it actually died, we'd be able to
> automatically restart it.

I wonder if this is an SMF problem.  At the bottom of your log,
it says "master complete", which seems to be the master which received
the USR2.

I'll walk through the log to see how things look from my end...

> Can we do anything to handle this more elegantly?

> PS: An example log file from when this occurs -
> 
> I, [2014-09-03T08:51:29.034227 #7556]  INFO -- : executing
> ["/app/common/bundle/ruby/2.1.0/bin/unicorn", "--env",
> "production-live", "--daemonize", "--config-file",

7556 is the new child which eventually fails.

> "/app/code/config/unicorn.rb", {17=>#<Kgio::TCPServer:fd 17>}] (in
> /app/code)
> I, [2014-09-03T08:51:29.035223 #7556]  INFO -- : forked child re-executing...
> I, [2014-09-03T08:51:30.480393 #7556]  INFO -- : inherited
> addr=0.0.0.0:8090 fd=17
> I, [2014-09-03T08:51:30.481257 #7556]  INFO -- : Refreshing Gem list
> D, [2014-09-03T08:51:41.715061 #7556] DEBUG -- : ** [Airbrake]
> Notifier 3.1.14 ready to catch errors
> I, [2014-09-03T08:51:45.437499 #7952]  INFO -- : worker=0 ready
> I, [2014-09-03T08:51:45.471084 #7959]  INFO -- : worker=1 ready
> I, [2014-09-03T08:51:45.513301 #7960]  INFO -- : worker=2 ready
> I, [2014-09-03T08:51:45.558417 #7961]  INFO -- : worker=3 ready
> E, [2014-09-03T08:51:45.931282 #7556] ERROR -- : Not enough space -
> fork(2) (Errno::ENOMEM)

OK, fork fails from the new child; current behavior is to exit.
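
For readers following along, here is a minimal sketch of that failure path:
fork(2) raising in the parent, with the caller then deciding to give up.
This is not unicorn's code; `spawn_one` and the injectable `forker` are
hypothetical names, used only so the ENOMEM case can be simulated without
actually running the machine out of memory.

```ruby
# Sketch only: `spawn_one` and `forker` are hypothetical, not unicorn internals.
# `forker` defaults to the real Process.fork; a stand-in can be passed in to
# simulate a failing fork(2).
def spawn_one(forker = Process.method(:fork))
  pid = forker.call { exit!(0) } # a real child would run its worker loop here
  [:ok, pid]
rescue Errno::ENOMEM, Errno::EAGAIN => e
  # fork(2) failed in the parent, so no child process was created.
  [:failed, e]
end

# Simulate the ENOMEM seen in the log above.
failing_fork = ->(&_blk) { raise Errno::ENOMEM, "fork(2)" }
status, detail = spawn_one(failing_fork)

if status == :failed
  warn "#{detail.message} (#{detail.class})"
  # A master following the behavior described above would exit non-zero here.
end
```

The wording of the error message depends on the platform's strerror text, so
it may not match the log line exactly.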

> /app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:520:in
> `fork'
> /app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:520:in
> `spawn_missing_workers'
> /app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:140:in
> `start'
> /app/common/bundle/ruby/2.1.0/gems/unicorn-4.8.3/bin/unicorn:126:in
> `<top (required)>'

OK, this should've hit the exit! case in spawn_missing_workers,
and it does...

> /app/common/bundle/ruby/2.1.0/bin/unicorn:23:in `load'
> /app/common/bundle/ruby/2.1.0/bin/unicorn:23:in `<main>'
> E, [2014-09-03T08:51:46.139737 #10484] ERROR -- : reaped
> #<Process::Status: pid 7556 exit 1> exec()-ed

The old master, which originally received USR2, is notified that the
new master (7556) died.
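
As a rough illustration of how a parent learns that a child died, the
following sketch forks a child that exits with status 1 and reaps it with
Process.waitpid2. It is a standalone example under that assumption and not
taken from unicorn; it only shows where a "reaped #<Process::Status ...>"
style line can come from.

```ruby
# Standalone sketch: the parent forks a child that fails immediately,
# then reaps it and inspects the exit status.
pid = fork { exit!(1) }

# waitpid2 blocks until the child exits and returns [pid, Process::Status].
reaped_pid, status = Process.waitpid2(pid)

puts "reaped #{status.inspect}"
puts "child failed" unless status.success?
```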

> I, [2014-09-03T08:51:48.069452 #10484]  INFO -- : reaped
> #<Process::Status: pid 21801 exit 0> worker=1
> I, [2014-09-03T08:51:48.372431 #10484]  INFO -- : reaped
> #<Process::Status: pid 67829 exit 0> worker=3
> I, [2014-09-03T08:51:48.473412 #10484]  INFO -- : reaped
> #<Process::Status: pid 57211 exit 0> worker=0
> I, [2014-09-03T08:51:48.574279 #10484]  INFO -- : reaped
> #<Process::Status: pid 70992 exit 0> worker=2
> I, [2014-09-03T08:51:48.675085 #10484]  INFO -- : reaped
> #<Process::Status: pid 11195 exit 0> worker=5
> I, [2014-09-03T08:51:48.876051 #10484]  INFO -- : reaped
> #<Process::Status: pid 11194 exit 0> worker=4

Workers in the old master dying looks like the SMF problem you
encountered with SIGABRT earlier.

> I, [2014-09-03T08:51:48.876341 #10484]  INFO -- : master complete

But the original master does not die after this?

Can you truss it and see if it's stuck on reading/unlinking the pidfile?
That would be the only thing preventing the master from actually dying,
but the old master dying should not happen in the first place.

-- 
EW


* Re: fork() errors lead to a completely dead unicorn
  2014-09-03 17:11 ` Eric Wong
@ 2014-09-07 10:12   ` Jonathan del Strother
  0 siblings, 0 replies; 3+ messages in thread
From: Jonathan del Strother @ 2014-09-07 10:12 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public

Just wanted to say thanks for the reply - I've been trying to figure
this out over the weekend and not succeeding.  I can't seem to
reproduce it in a self-contained environment; it only ever happens in
production, which is making debugging a bit frustrating...

>
>> I, [2014-09-03T08:51:48.876341 #10484]  INFO -- : master complete
>
> But the original master does not die after this?

99% sure it doesn't - it just sits there in a zombie state with no
workers.  But I want to verify that, so I guess I'm stuck waiting
until it happens in production again.  Will let you know.
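
One way to check whether a given pid still exists the next time this
happens is to send it signal 0, which tests for existence without
delivering anything.  A minimal sketch follows; `process_alive?` is a
hypothetical helper name, and in practice the pid would come from
wherever the master's pidfile is configured (not shown here).

```ruby
# Hypothetical helper: true if a process with this pid currently exists.
def process_alive?(pid)
  Process.kill(0, pid) # signal 0 checks for existence, sends nothing
  true
rescue Errno::ESRCH
  false                # no such process
rescue Errno::EPERM
  true                 # process exists but belongs to another user
end

# Demonstration with a short-lived child instead of a real master.
child = fork { exit!(0) }
Process.wait(child)

puts "self alive?  #{process_alive?(Process.pid)}"
puts "child alive? #{process_alive?(child)}"
```

This only says whether the pid exists, not what the process is doing, so
it complements rather than replaces tracing the process.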



Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/
