Master repeatedly killing workers due to timeouts

* Master repeatedly killing workers due to timeouts
@ 2011-12-08 18:19 Jonathan del Strother
  2011-12-08 20:23 ` Eric Wong
  2011-12-11 14:19 ` Troex Nevelin
  0 siblings, 2 replies; 6+ messages in thread
From: Jonathan del Strother @ 2011-12-08 18:19 UTC (permalink / raw)
  To: mongrel-unicorn

Hi,
We're using unicorn as a Rails server on Solaris, and it's been
running great for several months.  We've recently been having a few
problems and I'm at a loss what might cause it.  A number of times in
the past few days, our unicorn slaves keep timing out & the master
keeps restarting them.  unicorn.log looks something like :

E, [2011-12-08T18:11:32.912237 #26661] ERROR -- : worker=5 PID:15367
timeout (61s > 60s), killing
E, [2011-12-08T18:11:32.952041 #26661] ERROR -- : reaped
#<Process::Status: pid 15367 SIGKILL (signal 9)> worker=5
I, [2011-12-08T18:11:32.985925 #17824]  INFO -- : worker=5 ready
E, [2011-12-08T18:11:42.962869 #26661] ERROR -- : worker=3 PID:15499
timeout (61s > 60s), killing
E, [2011-12-08T18:11:43.003741 #26661] ERROR -- : reaped
#<Process::Status: pid 15499 SIGKILL (signal 9)> worker=3
I, [2011-12-08T18:11:43.043797 #18263]  INFO -- : worker=3 ready
E, [2011-12-08T18:11:51.016129 #26661] ERROR -- : worker=0 PID:16072
timeout (61s > 60s), killing
E, [2011-12-08T18:11:51.053344 #26661] ERROR -- : reaped
#<Process::Status: pid 16072 SIGKILL (signal 9)> worker=0
I, [2011-12-08T18:11:51.084365 #18532]  INFO -- : worker=0 ready
E, [2011-12-08T18:11:53.063737 #26661] ERROR -- : worker=4 PID:16126
timeout (61s > 60s), killing
E, [2011-12-08T18:11:53.117573 #26661] ERROR -- : reaped
#<Process::Status: pid 16126 SIGKILL (signal 9)> worker=4
I, [2011-12-08T18:11:53.154511 #18577]  INFO -- : worker=4 ready
E, [2011-12-08T18:11:59.130075 #26661] ERROR -- : worker=6 PID:16165
timeout (61s > 60s), killing
E, [2011-12-08T18:11:59.176834 #26661] ERROR -- : reaped
#<Process::Status: pid 16165 SIGKILL (signal 9)> worker=6
I, [2011-12-08T18:11:59.220115 #18814]  INFO -- : worker=6 ready
E, [2011-12-08T18:12:12.188372 #26661] ERROR -- : worker=7 PID:16931
timeout (61s > 60s), killing
E, [2011-12-08T18:12:12.245647 #26661] ERROR -- : reaped
#<Process::Status: pid 16931 SIGKILL (signal 9)> worker=7
I, [2011-12-08T18:12:12.282607 #19718]  INFO -- : worker=7 ready
E, [2011-12-08T18:12:29.256887 #26661] ERROR -- : worker=1 PID:17406
timeout (61s > 60s), killing
E, [2011-12-08T18:12:29.302142 #26661] ERROR -- : reaped
#<Process::Status: pid 17406 SIGKILL (signal 9)> worker=1
I, [2011-12-08T18:12:29.335366 #20731]  INFO -- : worker=1 ready
E, [2011-12-08T18:12:31.313734 #26661] ERROR -- : worker=2 PID:17659
timeout (61s > 60s), killing
E, [2011-12-08T18:12:31.357066 #26661] ERROR -- : reaped
#<Process::Status: pid 17659 SIGKILL (signal 9)> worker=2
I, [2011-12-08T18:12:31.390368 #20800]  INFO -- : worker=2 ready

- which seems like quite a lot of timeouts.  However, our database
connection, network, NFS etc all seem fine.  There's no useful errors
in the rails log files.  The master still serves occasional requests,
but drops a high percentage of them.

Can anyone suggest where to look into this?  I'm not even sure if that
timeout is occurring during the initial fork of the worker process, or
if it's timing out during a slow rails request.

-Jonathan
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[flat|nested] 6+ messages in thread