Rainbows! Rack HTTP server user/dev discussion
From: Samuel Kadolph <samuel.kadolph-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>
To: "Rainbows! list" <rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org>
Cc: Cody Fauser <cody.fauser-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>,
	ops <ops-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>,
	Harry Brundage
	<harry.brundage-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>,
	Jonathan Rudenberg
	<jonathan.rudenberg-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>
Subject: Re: Unicorn is killing our rainbows workers
Date: Thu, 19 Jul 2012 20:23:35 -0400
Message-ID: <CAFFC5+MKdkmLknbLeRzMNzfTVoyj9JDahFSd1Nb90vsbgS4fuQ@mail.gmail.com>
In-Reply-To: <20120719213125.GA17708-yBiyF41qdooeIZ0/mPfg9Q@public.gmane.org>

On Thu, Jul 19, 2012 at 5:31 PM, Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
> Samuel Kadolph <samuel.kadolph-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org> wrote:
>> On Thu, Jul 19, 2012 at 4:16 PM, Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
>> > Samuel Kadolph <samuel.kadolph-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org> wrote:
>> > > On Wed, Jul 18, 2012 at 8:26 PM, Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
>> > > > Samuel Kadolph <samuel.kadolph-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org> wrote:
>> > > >> On Wed, Jul 18, 2012 at 5:52 PM, Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
>> > > >> > Samuel Kadolph <samuel.kadolph-/3HedJEncLlQ0OI7PeSoCw@public.gmane.org> wrote:
>> > > >> >> https://gist.github.com/9ec96922e55a59753997. Any insight into why
>> > > >> >> unicorn is killing our ThreadPool workers would help us greatly. If
>> > > >> >> you require additional info I would be happy to provide it.
>> > > >
>> > > > Also, are you using "preload_app true" ?
>> > >
>> > > Yes we are using preload_app true.
>> > >
>> > > > I'm a bit curious how these messages are happening, too:
>> > > > D, [2012-07-18T15:12:43.185808 #17213] DEBUG -- : waiting 151.5s after
>> > > > suspend/hibernation
>> > >
>> > > They are strange. My current hunch is that the killing and that message
>> > > are symptoms of the same issue, since the message always follows a killing.
>> >
>> > I wonder if there's some background thread one of your gems spawns on
>> > load that causes the master to stall.  I'm not seeing how else unicorn
>> > could think it was in suspend/hibernation.
>
>> > Anyways, I'm happy your problem seems to be fixed with the mysql2
>> > upgrade :)
>>
>> Unfortunately that didn't fix the problem. We had a large sale today
>> and got two 502s. We're going to try p194 next week and I'll let you
>> know if that fixes it.
>
> Are you seeing the same errors as before in stderr for those?

Yeah, we get the same killing, reaping and suspend/hibernation
messages with the 5 second timeout. Upgrading mysql2 seemed to have
prevented any 502s during our stress tests, but that turned out not to
be the case.

> Can you also try disabling preload_app?
>
> But before disabling preload_app, you can also check a few things on
> a running master?
>
> * "lsof -p <pid_of_master>"
>
>   To see if the master is making any odd connections.
>
> * Assuming you're on Linux, can you also check for any other threads
>   the master might be running (and possibly stuck on)?
>
>     ls /proc/<pid_of_master>/task/
>
>   The output should be 2 directories:
>
>     <pid_of_master>/
>     <tid_of_timer_thread>/
>
>   If you have a 3rd entry, you can confirm that something in your app
>   or one of your gems is spawning a background thread, which could be
>   throwing the master off...
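The /proc check above can be scripted so it is easy to repeat on every server. This is a minimal Ruby sketch, assuming Linux procfs and Ruby 2.5+ (for `Dir.children`); the helper names `thread_ids` and `extra_threads?` and the script name are hypothetical, and the expected count of two entries (main thread plus timer thread) is taken from the advice above rather than checked against every Ruby version.

```ruby
# Hypothetical helpers (assumes Linux procfs): list the kernel thread IDs
# of a process and flag a master that runs more threads than expected.

# Per the advice above, a healthy master shows two task entries:
# the main thread and the Ruby VM's timer thread.
EXPECTED_MASTER_THREADS = 2

# Returns the entries of <proc_root>/<pid>/task, sorted numerically.
# proc_root is a parameter only so the helper can be exercised elsewhere.
def thread_ids(pid, proc_root = "/proc")
  Dir.children(File.join(proc_root, pid.to_s, "task")).sort_by(&:to_i)
end

# True when the process has more threads than the expected two, which
# would suggest something spawned a background thread in the master.
def extra_threads?(pid, proc_root = "/proc")
  thread_ids(pid, proc_root).size > EXPECTED_MASTER_THREADS
end

# Usage (hypothetical script name): ruby check_master_threads.rb <pid_of_master>
if __FILE__ == $PROGRAM_NAME && ARGV.first
  pid = Integer(ARGV.first)
  ids = thread_ids(pid)
  puts "task entries for #{pid}: #{ids.join(' ')}"
  if extra_threads?(pid)
    warn "more than #{EXPECTED_MASTER_THREADS} threads; check for a gem spawning one at load"
  end
end
```

The third entry, if present, is only a hint; `lsof -p` on the same PID is still the way to see what that thread might be talking to.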

I'll see if we can try this tomorrow, but it will probably be Monday.
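Disabling preload_app, as asked above, is a one-line change in the unicorn config file that Rainbows! reads (e.g. the file passed with `-c`; its location is an assumption here). With `preload_app false` the application is loaded in each worker after fork rather than in the master, so a thread started by a gem at load time would run in the workers and no longer in the master. The trade-off is that every worker loads the app itself, so workers take longer to boot.

```ruby
# unicorn/Rainbows! config file (e.g. the one passed via -c)
# Load the application in each worker after fork, not in the master.
preload_app false
```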

>> > > Our ops guys say we had this problem before we were using ThreadTimeout.
>> >
>> > OK.  That's somewhat reassuring to know (especially since the culprit
>> > seems to be an old mysql2 gem).  I've had other users (privately) report
>> > issues with recursive locking because of ensure clauses (e.g.
>> > Mutex#synchronize) that I forgot to document.
>>
>> We're going to try going without ThreadTimeout again to make sure
>> that's not the issue.
>
> Alright.
>
> Btw, I also suggest that any Rails/application-level logs include the
> PID and timestamp of each request.  This way you can correlate a worker
> being killed with when (or whether) the Rails app stopped processing
> requests.
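A minimal sketch of PID- and timestamp-stamped logging using Ruby's standard `Logger`. The helper name `pid_stamped_logger` and the line layout are assumptions for illustration, not how Rails formats its own logs; wiring it into a particular Rails version's logger is left out.

```ruby
require "logger"
require "time"

# Hypothetical helper: a Logger whose lines carry the PID and a
# microsecond UTC timestamp, so application log lines can be lined up
# against the killing/reaping messages unicorn writes to stderr.
def pid_stamped_logger(io = $stderr)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, _progname, msg|
    # Process.pid is read on every call, so after fork each worker
    # stamps its own PID rather than the master's.
    "#{time.getutc.iso8601(6)} pid=#{Process.pid} #{severity}: #{msg}\n"
  end
  logger
end
```

For example, `pid_stamped_logger.info("GET /checkout start")` writes one line to stderr carrying the UTC timestamp, the current worker's PID and the message. Reading the PID per line rather than caching it when the logger is built matters with preload_app, where the logger may be created in the master before fork.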

We found that one of our servers was actually out of the ELB pool, so
it wasn't being pinged constantly, and it has no killing messages
(other than during deploys, which also produced the suspend/hibernation
messages). We'll have more free time next week to dig further into
this.
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying


Thread overview: 17+ messages
2012-07-18 18:52 Unicorn is killing our rainbows workers Samuel Kadolph
2012-07-18 19:20   ` Jason Lewis
2012-07-18 21:52   ` Eric Wong
2012-07-18 23:06       ` Samuel Kadolph
2012-07-19  0:26           ` Eric Wong
2012-07-19 14:29               ` Samuel Kadolph
2012-07-19 20:16                   ` Eric Wong
2012-07-19 20:57                       ` Samuel Kadolph
2012-07-19 21:31                           ` Eric Wong
2012-07-20  0:23                               ` Samuel Kadolph [this message]
2012-07-26 23:48                                   ` Eric Wong
2012-07-27  0:00                                       ` Samuel Kadolph
2012-07-27  0:11                                           ` Eric Wong
2012-07-27 20:01                                               ` Samuel Kadolph
2012-07-27 20:40                                                   ` Eric Wong
2012-07-31 14:09                                                       ` Samuel Kadolph
2012-07-31 20:28                                                           ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/rainbows.git/
