unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
* More unexplained timeouts
From: nick @ 2013-09-29 20:13 UTC
  To: mongrel-unicorn

We're still suffering from unexplained workers timing out.  We recently upgraded to the latest unicorn 4.6.3 (while still on REE 1.8.7) in the hopes that it would solve our issues.  Unfortunately, this seemed to exacerbate the problem, with timeouts happening more frequently, but that could be related to greater precision in timeouts in newer versions of unicorn.  (In our unicorn 3.6.2, a timeout set to 120s might not ACTUALLY time out until 180s or more, thus allowing a bit more time for Ruby to finish whatever it was choking on.)

We dropped the timeout down to 65s (to make sure it was triggered) and then tried to add more logging (per http://permalink.gmane.org/gmane.comp.lang.ruby.unicorn.general/1269).  The START/FINISH approach confirms it's not an issue with our application code, i.e.:

HH:MM:SS- S/F[PID]- /PATH
15:21:01- START-25904- /pathA
15:21:01- FINISH-25904- /pathA
15:21:01- START-25904- /pathB
15:21:01- FINISH-25904- /pathB
15:21:01- START-25904- /pathC
15:21:01- FINISH-25904- /pathC
worker=11 PID:25904 timeout (66s > 65s), killing
reaped #<Process::Status: pid=25904,signaled(SIGKILL=9)> worker=11

For each START we always get a corresponding FINISH and then the worker is killed.  Additionally, our nginx logs confirm that this last request was sent back to the client.  No 'upstream' errors in our nginx log, either.
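
For reference, START/FINISH logging of this kind could be done with a small Rack middleware along the following lines.  This is only a sketch and not necessarily the code from the linked post: the class name StartFinishLogger and its io parameter are hypothetical.  Note that FINISH is written when the app's call returns, which happens before the server has written the response body to the client.

```ruby
# Hypothetical Rack middleware; writes one START and one FINISH line per
# request in the "HH:MM:SS- S/F-PID- /PATH" format shown above.
class StartFinishLogger
  def initialize(app, io = $stderr)
    @app = app
    @io = io
  end

  def call(env)
    log("START", env)
    @app.call(env)
  ensure
    # Runs even if the app raises, so every START gets a FINISH.
    log("FINISH", env)
  end

  private

  def log(tag, env)
    stamp = Time.now.strftime("%H:%M:%S")
    @io.write("#{stamp}- #{tag}-#{Process.pid}- #{env['PATH_INFO']}\n")
  end
end
```

In a config.ru this would be enabled with `use StartFinishLogger` ahead of the application.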

When we tried the Thread sleep approach, nothing actually appeared in the logs.  I imagine this means that Ruby or some C extension is misbehaving.
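
A sleeping watchdog thread of this sort might look roughly like the sketch below.  The exact code that was tried is not shown in this thread, so the class name RequestWatchdog and the interval/threshold parameters are assumptions.  Ruby 1.8 uses green threads, so if a C extension blocks without yielding to the interpreter's scheduler, the watchdog thread never gets to run, which would be consistent with nothing appearing in the logs.

```ruby
# Hypothetical Rack middleware: a background thread wakes up every
# `interval` seconds and reports a request that has been running for
# longer than `threshold` seconds.
class RequestWatchdog
  def initialize(app, io = $stderr, interval = 1.0, threshold = 5.0)
    @app = app
    @io = io
    @interval = interval
    @threshold = threshold
    @started = nil
    @path = nil
    @thread = nil
  end

  def call(env)
    # Started lazily so the thread lives in the worker process, not in a
    # master that loaded the app before forking.
    @thread ||= Thread.new { watch }
    @path = env["PATH_INFO"]
    @started = Time.now
    @app.call(env)
  ensure
    @started = nil
  end

  private

  def watch
    loop do
      sleep @interval
      started = @started
      next if started.nil?
      elapsed = Time.now - started
      next if elapsed <= @threshold
      @io.write(format("%d still running after %.1fs: %s\n",
                       Process.pid, elapsed, @path))
    end
  end
end
```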

Unfortunately, it's been impossible for us to recreate this in development.  

Thoughts?

RHEL 5.6
REE 1.8.7 2011.12
Unicorn 4.6.3
16 unicorn workers on 8 cores
No swap activity, no peaks in load

Again, thanks for all your help!

-Nick

_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying


* Re: More unexplained timeouts
From: Eric Wong @ 2013-09-30  0:06 UTC
  To: unicorn list

nick@auger.net wrote:
> We're still suffering from unexplained workers timing out.  We
> recently upgraded to the latest unicorn 4.6.3 (while still on REE
> 1.8.7) in the hopes that it would solve our issues.  Unfortunately,
> this seemed to exacerbate the problem, with timeouts happening more
> frequently, but that could be related to greater precision in timeouts
> in newer versions of unicorn.  (In our unicorn 3.6.2, a timeout set to
> 120s might not ACTUALLY time out until 180s or more, thus allowing a
> bit more time for Ruby to finish whatever it was choking on.)

Yes, there were some fixes in 4.x to improve the timeout accuracy.

> We dropped the timeout down to 65s (to make sure it was triggered) and
> then tried to add more logging (per
> http://permalink.gmane.org/gmane.comp.lang.ruby.unicorn.general/1269).
> The START/FINISH approach confirms it's not an issue with our
> application code, i.e.:
> 
> HH:MM:SS- S/F[PID]- /PATH
> 15:21:01- START-25904- /pathA
> 15:21:01- FINISH-25904- /pathA
> 15:21:01- START-25904- /pathB
> 15:21:01- FINISH-25904- /pathB
> 15:21:01- START-25904- /pathC
> 15:21:01- FINISH-25904- /pathC
> worker=11 PID:25904 timeout (66s > 65s), killing
> reaped #<Process::Status: pid=25904,signaled(SIGKILL=9)> worker=11
> 
> For each START we always get a corresponding FINISH and then the
> worker is killed.  Additionally, our nginx logs confirm that this last
> request was sent back to the client.  No 'upstream' errors in our
> nginx log, either.
> 
> When we tried the Thread sleep approach, nothing actually appeared in
> the logs.  I imagine this means that Ruby or some C extension is
> misbehaving.

Sounds like it.  1.8 and old C extensions could easily lock up the
interpreter on blocking calls.

Another problem could be using new versions of C extensions that are no
longer tested under 1.8.  I admit I haven't tested recent versions of
unicorn/kgio/raindrops on 1.8 lately, either, but I'm _fairly_ sure they
still work since they haven't changed much.

> Unfortunately, it's been impossible for us to recreate this in
> development.  

Are you running any different gems/extensions in development vs
production?
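
One way to answer this is to dump the gems actually loaded in each environment and diff the two lists.  The following is a minimal sketch: the helper name loaded_gem_report is hypothetical, and it relies only on RubyGems' Gem.loaded_specs and Gem::Specification#extensions.

```ruby
require "rubygems"

# Hypothetical helper: returns one line per loaded gem, flagging gems
# that ship native extensions.
def loaded_gem_report
  Gem.loaded_specs.values.sort_by { |spec| spec.name }.map do |spec|
    flag = spec.extensions.empty? ? "" : " [native ext]"
    "#{spec.name} #{spec.version}#{flag}"
  end
end

puts loaded_gem_report
```

Running this after the application has booted in development and in production, then comparing the output, would show version or extension differences between the two.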

> Thoughts?
> 
> RHEL 5.6
> REE 1.8.7 2011.12
> Unicorn 4.6.3
> 16 unicorn workers on 8 cores
> No swap activity, no peaks in load

What other gems/extensions do you use?

