unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
* worker freeze and strace interpretation
@ 2014-12-02 11:21 Jérémy Lecour
  2014-12-02 18:58 ` Eric Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Jérémy Lecour @ 2014-12-02 11:21 UTC (permalink / raw)
  To: unicorn-public

Hi,

I've been trying to debug a weird situation on a Rails app (4.1) using
Unicorn (4.8.3).
Sometimes, some requests are hanging and I can't find why.


I've hooked "strace" to the workers and I have a lot of lines with
"Resource temporarily unavailable" and I wonder if it's normal.

Here is a snippet :

11587 write(7, "\27\3\3\1\253\36\274d\263\340\375\250d\374~X\364\306^\227{F\357~\223\347\245M\351-\360\301"...,
432 <unfinished ...>
11581 read(3,  <unfinished ...>
11587 <... write resumed> )             = 432 <0.000016>
11581 <... read resumed> 0x7f5b04878cc0, 1024) = -1 EAGAIN (Resource
temporarily unavailable) <0.000012>
11581 read(5,  <unfinished ...>
11587 write(7, "\27\3\3\t`+\22zIY\242\252L\346?n\245!\347c\251\341\fo\202Is\346\23\10\320\34"...,
2405 <unfinished ...>
11581 <... read resumed> "!", 1024)     = 1 <0.000016>
11587 <... write resumed> )             = 2405 <0.000015>
11581 read(5, 0x7f5b04878cc0, 1024)     = -1 EAGAIN (Resource
temporarily unavailable) <0.000014>
11587 read(7,  <unfinished ...>
11581 poll([{fd=3, events=POLLIN}], 1, 100 <unfinished ...>
11587 <... read resumed> 0x7f5af47c0123, 5) = -1 EAGAIN (Resource
temporarily unavailable) <0.000009>
11587 futex(0x7f5b04885054, FUTEX_WAKE_OP_PRIVATE, 1, 1,
0x7f5b04885050, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000005>
1184  <... futex resumed> )             = 0 <0.000226>
11587 futex(0x7f5b0488508c, FUTEX_WAIT_PRIVATE, 231, NULL <unfinished ...>
1184  futex(0x7f5b0488508c, FUTEX_WAKE_OP_PRIVATE, 1, 1,
0x7f5b04885088, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
11587 <... futex resumed> )             = -1 EAGAIN (Resource
temporarily unavailable) <0.000015>
1184  <... futex resumed> )             = 0 <0.000009>
11587 futex(0x7f5b04885020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
1184  futex(0x7f5b04885020, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
11587 <... futex resumed> )             = -1 EAGAIN (Resource
temporarily unavailable) <0.000007>
1184  <... futex resumed> )             = 0 <0.000008>



Is there a better way to hook into a worker process when the request
is hanging, to see what it is doing ?



Also, I didn't find a way to instrument how a worker handles a request.
I was looking for a debug message when a new request is taken and when
it is returned.


Thanks for any help.

-- 
Jérémy Lecour :
http://jeremy.wordpress.com - http://twitter.com/jlecour

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: worker freeze and strace interpretation
  2014-12-02 11:21 worker freeze and strace interpretation Jérémy Lecour
@ 2014-12-02 18:58 ` Eric Wong
  2014-12-02 21:04   ` Jérémy Lecour
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Wong @ 2014-12-02 18:58 UTC (permalink / raw)
  To: Jérémy Lecour; +Cc: unicorn-public

Jérémy Lecour <jeremy.lecour@gmail.com> wrote:
> Hi,
> 
> I've been trying to debug a weird situation on a Rails app (4.1) using
> Unicorn (4.8.3).
> Sometimes, some requests are hanging and I can't find why.
> 
> 
> I've hooked "strace" to the workers and I have a lot of lines with
> "Resource temporarily unavailable" and I wonder if it's normal.

Depends on the FD, but from yours, yes, EAGAIN is common for
non-blocking IO.

> Here is a snippet :
> 
> 11587 write(7, "\27\3\3\1\253\36\274d\263\340\375\250d\374~X\364\306^\227{F\357~\223\347\245M\351-\360\301"...,
> 432 <unfinished ...>
> 11581 read(3,  <unfinished ...>

lsof -p $PID will tell you what FD=3 is, but it looks to me 11581 is the
timer thread.

That means 11587 could be the main thread which handles your app.
I'm more curious to know what FD=7 is (some encrypted service your
app connects to?)

> 11587 <... write resumed> )             = 432 <0.000016>
> 11581 <... read resumed> 0x7f5b04878cc0, 1024) = -1 EAGAIN (Resource
> temporarily unavailable) <0.000012>

1024 matches CCP_READ_BUFF_SIZE in the MRI sources (thread_pthread.c)
in recent-ish versions of Ruby, so yes,

> 11581 read(5,  <unfinished ...>
> 11587 write(7, "\27\3\3\t`+\22zIY\242\252L\346?n\245!\347c\251\341\fo\202Is\346\23\10\320\34"...,
> 2405 <unfinished ...>
> 11581 <... read resumed> "!", 1024)     = 1 <0.000016>

Yep, "!" is the character the MRI timer thread uses to communicate.

> 11587 <... write resumed> )             = 2405 <0.000015>
> 11581 read(5, 0x7f5b04878cc0, 1024)     = -1 EAGAIN (Resource
> temporarily unavailable) <0.000014>

Hmm, FD=5...  I suspect that's the other timer thread pipe.
Which version of Ruby are you using?  Newere ones have two
timer thread pipes.

> 11587 read(7,  <unfinished ...>
> 11581 poll([{fd=3, events=POLLIN}], 1, 100 <unfinished ...>
> 11587 <... read resumed> 0x7f5af47c0123, 5) = -1 EAGAIN (Resource
> temporarily unavailable) <0.000009>

OK, now FD=7 is probably the most interesting to your process.

> 11587 futex(0x7f5b04885054, FUTEX_WAKE_OP_PRIVATE, 1, 1,
> 0x7f5b04885050, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000005>
> 1184  <... futex resumed> )             = 0 <0.000226>

However, 1184 is a thread we haven't seen before.  The TID
is far off from 11581 and 11587, so it could've been spawned after TID
wraparound.

Just to confirm, is 11587 or 1184 the PID you saw in "ps" output?
(and unicorn logs, in the time period you were stracing)

(Of course, PIDs and TIDs get recycled over time, so they're only valid
 in the context while these straced processes was running)

> 11587 futex(0x7f5b0488508c, FUTEX_WAIT_PRIVATE, 231, NULL <unfinished ...>
> 1184  futex(0x7f5b0488508c, FUTEX_WAKE_OP_PRIVATE, 1, 1,
> 0x7f5b04885088, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
> 11587 <... futex resumed> )             = -1 EAGAIN (Resource
> temporarily unavailable) <0.000015>
> 1184  <... futex resumed> )             = 0 <0.000009>
> 11587 futex(0x7f5b04885020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> 1184  futex(0x7f5b04885020, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> 11587 <... futex resumed> )             = -1 EAGAIN (Resource
> temporarily unavailable) <0.000007>
> 1184  <... futex resumed> )             = 0 <0.000008>

Anything more after this?

So I'm curious for your process, what FD=7 is, and what TID=1184 is
doing.  That above futex calls looks like normal thread switching
behavior in MRI with no interesting syscalls going on for network I/O.

There could be some CPU-intensive stuff going on but no interesting
syscalls, so maybe we need to check CPU usage for this process in "top",
too.

> Is there a better way to hook into a worker process when the request
> is hanging, to see what it is doing ?

Maybe, but I usually don't need more than strace + lsof
(use lsof to figure out which FD is which).

If I need more, I prefer to I sprinkle warn() calls in the application
to figure out what and where things are going.  This can aid in
the strace, as you'll see your warn() calls become:

	write(2, "warning...", ...)

calls in strace.

unicorn always sets "$stderr.sync = true" immediately to ensure any
warnings are flushed to the OS so it's visible in strace immediately.

In C programs and C extensions, I may also use to write() on an invalid
(out-of-bounds too large or negative) FD to view the location in strace.

	int preserve_errno = errno; /* avoid errno-side effects */
	write(-1, "message", msg_len);
	errno = preserve_errno;

> Also, I didn't find a way to instrument how a worker handles a request.
> I was looking for a debug message when a new request is taken and when
> it is returned.

You can strace earlier and look for the following syscalls, in order:

- accept() (or accept4() on recent Linux) succeeding and
  returning the client FD

- read() succeeding on the client FD returned by accept()
  You'll see "something like "GET /... HTTP/1.0" from nginx, you may
  want to use "-s 16384" or some bigger number for big requests

  There's usually only one read() for small requests when talking
  to nginx.

- possibly lots of application processing here
  You might see more read() calls to the client FD if you're
  handling uploads

- write()s on the client FD you saw your HTTP requests with
  earlier.  You should see your HTTP response here, something
  like: write(FD, "HTTP/1.1 200 OK\r\n ...", ...)

- shutdown() + close() on the client FD.

Hope that helps.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: worker freeze and strace interpretation
  2014-12-02 18:58 ` Eric Wong
@ 2014-12-02 21:04   ` Jérémy Lecour
  0 siblings, 0 replies; 3+ messages in thread
From: Jérémy Lecour @ 2014-12-02 21:04 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public

On Tue, Dec 2, 2014 at 7:58 PM, Eric Wong <e@80x24.org> wrote:
> Hope that helps.

Man, your answer is detailed. Thanks.

The PID 11587 was indeed for the worker process.

FD 7 might be a connection to NewRelic over HTTPS.

This app is doing a lot : connections to Redis, MySQL, Memcached, …
It uses Ruby 2.1.5.


If those EAGAIN are probably expected, I wouldn't dig deeper.
I'm definitely not in my comfort zone here and wanted to be sure that
there were not an elephant in the room that I couldn't see.

I'll try to put some warn() in the code at the beginning and end of
the request cycle to see if something shows up.


Thank you very much for your time and help.

-- 
Jérémy Lecour :
http://jeremy.wordpress.com - http://twitter.com/jlecour

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2014-12-02 21:04 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-02 11:21 worker freeze and strace interpretation Jérémy Lecour
2014-12-02 18:58 ` Eric Wong
2014-12-02 21:04   ` Jérémy Lecour

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).