unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
From: Eric Wong <e@80x24.org>
To: "Jérémy Lecour" <jeremy.lecour@gmail.com>
Cc: unicorn-public@bogomips.org
Subject: Re: worker freeze and strace interpretation
Date: Tue, 2 Dec 2014 18:58:54 +0000	[thread overview]
Message-ID: <20141202185854.GA15690@dcvr.yhbt.net> (raw)
In-Reply-To: <CANrRGgq5SNDVSjTEO7GTLEaDA=cSjO23uR+GoXvzfAfvJDD0Yw@mail.gmail.com>

Jérémy Lecour <jeremy.lecour@gmail.com> wrote:
> Hi,
> 
> I've been trying to debug a weird situation on a Rails app (4.1) using
> Unicorn (4.8.3).
> Sometimes, some requests are hanging and I can't find why.
> 
> 
> I've hooked "strace" to the workers and I have a lot of lines with
> "Resource temporarily unavailable" and I wonder if it's normal.

Depends on the FD, but from yours, yes, EAGAIN is common for
non-blocking IO.

> Here is a snippet :
> 
> 11587 write(7, "\27\3\3\1\253\36\274d\263\340\375\250d\374~X\364\306^\227{F\357~\223\347\245M\351-\360\301"...,
> 432 <unfinished ...>
> 11581 read(3,  <unfinished ...>

lsof -p $PID will tell you what FD=3 is, but it looks to me 11581 is the
timer thread.

That means 11587 could be the main thread which handles your app.
I'm more curious to know what FD=7 is (some encrypted service your
app connects to?)

> 11587 <... write resumed> )             = 432 <0.000016>
> 11581 <... read resumed> 0x7f5b04878cc0, 1024) = -1 EAGAIN (Resource
> temporarily unavailable) <0.000012>

1024 matches CCP_READ_BUFF_SIZE in the MRI sources (thread_pthread.c)
in recent-ish versions of Ruby, so yes,

> 11581 read(5,  <unfinished ...>
> 11587 write(7, "\27\3\3\t`+\22zIY\242\252L\346?n\245!\347c\251\341\fo\202Is\346\23\10\320\34"...,
> 2405 <unfinished ...>
> 11581 <... read resumed> "!", 1024)     = 1 <0.000016>

Yep, "!" is the character the MRI timer thread uses to communicate.

> 11587 <... write resumed> )             = 2405 <0.000015>
> 11581 read(5, 0x7f5b04878cc0, 1024)     = -1 EAGAIN (Resource
> temporarily unavailable) <0.000014>

Hmm, FD=5...  I suspect that's the other timer thread pipe.
Which version of Ruby are you using?  Newere ones have two
timer thread pipes.

> 11587 read(7,  <unfinished ...>
> 11581 poll([{fd=3, events=POLLIN}], 1, 100 <unfinished ...>
> 11587 <... read resumed> 0x7f5af47c0123, 5) = -1 EAGAIN (Resource
> temporarily unavailable) <0.000009>

OK, now FD=7 is probably the most interesting to your process.

> 11587 futex(0x7f5b04885054, FUTEX_WAKE_OP_PRIVATE, 1, 1,
> 0x7f5b04885050, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000005>
> 1184  <... futex resumed> )             = 0 <0.000226>

However, 1184 is a thread we haven't seen before.  The TID
is far off from 11581 and 11587, so it could've been spawned after TID
wraparound.

Just to confirm, is 11587 or 1184 the PID you saw in "ps" output?
(and unicorn logs, in the time period you were stracing)

(Of course, PIDs and TIDs get recycled over time, so they're only valid
 in the context while these straced processes was running)

> 11587 futex(0x7f5b0488508c, FUTEX_WAIT_PRIVATE, 231, NULL <unfinished ...>
> 1184  futex(0x7f5b0488508c, FUTEX_WAKE_OP_PRIVATE, 1, 1,
> 0x7f5b04885088, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
> 11587 <... futex resumed> )             = -1 EAGAIN (Resource
> temporarily unavailable) <0.000015>
> 1184  <... futex resumed> )             = 0 <0.000009>
> 11587 futex(0x7f5b04885020, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> 1184  futex(0x7f5b04885020, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> 11587 <... futex resumed> )             = -1 EAGAIN (Resource
> temporarily unavailable) <0.000007>
> 1184  <... futex resumed> )             = 0 <0.000008>

Anything more after this?

So I'm curious for your process, what FD=7 is, and what TID=1184 is
doing.  That above futex calls looks like normal thread switching
behavior in MRI with no interesting syscalls going on for network I/O.

There could be some CPU-intensive stuff going on but no interesting
syscalls, so maybe we need to check CPU usage for this process in "top",
too.

> Is there a better way to hook into a worker process when the request
> is hanging, to see what it is doing ?

Maybe, but I usually don't need more than strace + lsof
(use lsof to figure out which FD is which).

If I need more, I prefer to I sprinkle warn() calls in the application
to figure out what and where things are going.  This can aid in
the strace, as you'll see your warn() calls become:

	write(2, "warning...", ...)

calls in strace.

unicorn always sets "$stderr.sync = true" immediately to ensure any
warnings are flushed to the OS so it's visible in strace immediately.

In C programs and C extensions, I may also use to write() on an invalid
(out-of-bounds too large or negative) FD to view the location in strace.

	int preserve_errno = errno; /* avoid errno-side effects */
	write(-1, "message", msg_len);
	errno = preserve_errno;

> Also, I didn't find a way to instrument how a worker handles a request.
> I was looking for a debug message when a new request is taken and when
> it is returned.

You can strace earlier and look for the following syscalls, in order:

- accept() (or accept4() on recent Linux) succeeding and
  returning the client FD

- read() succeeding on the client FD returned by accept()
  You'll see "something like "GET /... HTTP/1.0" from nginx, you may
  want to use "-s 16384" or some bigger number for big requests

  There's usually only one read() for small requests when talking
  to nginx.

- possibly lots of application processing here
  You might see more read() calls to the client FD if you're
  handling uploads

- write()s on the client FD you saw your HTTP requests with
  earlier.  You should see your HTTP response here, something
  like: write(FD, "HTTP/1.1 200 OK\r\n ...", ...)

- shutdown() + close() on the client FD.

Hope that helps.

  reply	other threads:[~2014-12-02 18:58 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-02 11:21 worker freeze and strace interpretation Jérémy Lecour
2014-12-02 18:58 ` Eric Wong [this message]
2014-12-02 21:04   ` Jérémy Lecour

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://yhbt.net/unicorn/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141202185854.GA15690@dcvr.yhbt.net \
    --to=e@80x24.org \
    --cc=jeremy.lecour@gmail.com \
    --cc=unicorn-public@bogomips.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).