From: Eric Wong
Newsgroups: gmane.comp.lang.ruby.rainbows.general
Subject: Re: Unicorn is killing our rainbows workers
Date: Thu, 19 Jul 2012 14:31:25 -0700
Message-ID: <20120719213125.GA17708@dcvr.yhbt.net>
References: <20120718215222.GA11539@dcvr.yhbt.net>
 <20120719002641.GA17210@dcvr.yhbt.net>
 <20120719201633.GA8203@dcvr.yhbt.net>
To: Rainbows! list
Cc: Cody Fauser, ops, Harry Brundage, Jonathan Rudenberg

Samuel Kadolph wrote:
> On Thu, Jul 19, 2012 at 4:16 PM, Eric Wong wrote:
> > Samuel Kadolph wrote:
> > > On Wed, Jul 18, 2012 at 8:26 PM, Eric Wong wrote:
> > > > Samuel Kadolph wrote:
> > > >> On Wed, Jul 18, 2012 at 5:52 PM, Eric Wong wrote:
> > > >> > Samuel Kadolph wrote:
> > > >> >> https://gist.github.com/9ec96922e55a59753997. Any insight into why
> > > >> >> unicorn is killing our ThreadPool workers would help us greatly. If
> > > >> >> you require additional info I would be happy to provide it.
> > > >
> > > > Also, are you using "preload_app true" ?
> > >
> > > Yes we are using preload_app true.
> > >
> > > > I'm a bit curious how these messages are happening, too:
> > > > D, [2012-07-18T15:12:43.185808 #17213] DEBUG -- : waiting 151.5s after
> > > > suspend/hibernation
> > >
> > > They are strange. My current hunch is the killing and that message are
> > > symptoms of the same issue, since it always follows a killing.
> >
> > I wonder if there's some background thread one of your gems spawns on
> > load that causes the master to stall.  I'm not seeing how else unicorn
> > could think it was in suspend/hibernation.
> > Anyways, I'm happy your problem seems to be fixed with the mysql2
> > upgrade :)
>
> Unfortunately that didn't fix the problem. We had a large sale today
> and had 2 502s. We're going to try p194 next week and I'll let you
> know if that fixes it.

Are you seeing the same errors as before in stderr for those?

Can you also try disabling preload_app?  But before disabling
preload_app, can you also check a few things on a running master?

* "lsof -p <pid of master>" to see if there's any odd connections
  the master is making.

* Assuming you're on Linux, can you also check for any other threads
  the master might be running (and possibly stuck on)?

      ls /proc/<pid of master>/task/

  The output should be 2 directories:

      <pid of master>/
      <tid>/

  If you have a 3rd entry, you can confirm something in your app or
  one of your gems is spawning a background thread which could be
  throwing the master off...

> > > Our ops guys say we had this problem before we were using ThreadTimeout.
> >
> > OK.  That's somewhat reassuring to know (especially since the culprit
> > seems to be an old mysql2 gem).  I've had other users (privately)
> > report issues with recursive locking because of ensure clauses (e.g.
> > Mutex#synchronize) that I forgot to document.
>
> We're going to try going without ThreadTimeout again to make sure
> that's not the issue.

Alright.
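The /proc check above can also be done from inside Ruby itself.  A minimal
sketch (the helper name report_background_threads is mine, not something
from this thread): run it at the end of app preload, and anything besides
the main thread was spawned by the app or one of its gems at load time:

```ruby
# Sketch: detect unexpected background threads after the app is
# preloaded, before the master enters its main loop.  In a plain,
# freshly started process only the main thread should be alive.
def report_background_threads
  extra = Thread.list - [Thread.main]
  if extra.empty?
    warn "pid=#{Process.pid}: only the main thread is running"
  else
    extra.each do |t|
      warn "pid=#{Process.pid}: background thread found: #{t.inspect}"
    end
  end
  extra.size  # number of threads besides the main one
end
```

Calling this right after "preload_app true" finishes loading the app
would surface the same extra task/ entry the /proc listing shows.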
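The recursive-locking problem Eric mentions is easy to reproduce with a
plain Mutex: MRI's Mutex#synchronize is not reentrant, so code that
re-enters the same lock -- e.g. an ensure clause triggered by a
ThreadTimeout-style interrupt while the lock is still held -- raises
ThreadError.  A minimal sketch (contrived example, not from the thread):

```ruby
# Sketch: a second acquisition of the same Mutex by the same thread
# raises ThreadError ("deadlock; recursive locking" in MRI), which is
# the failure mode ensure clauses can hit under thread timeouts.
def recursive_lock_error
  m = Mutex.new
  m.synchronize do
    begin
      m.synchronize { }   # re-enter the lock we already hold
    rescue ThreadError => e
      return e.message
    end
  end
  nil  # not reached in MRI
end
```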
Btw, I also suggest any Rails/application-level logs include the PID
and timestamp of the request.  This way you can correlate a worker
being killed with when/if the Rails app stopped processing requests.

_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying
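One way to get PID-and-timestamp log lines with nothing but the stdlib
(a Rails app would do the equivalent through its logger configuration;
the helper name pid_stamped_logger is mine):

```ruby
require "logger"
require "time"

# Sketch: prefix every log line with an ISO-8601 timestamp and the
# worker PID, so a killed worker's last application log line can be
# matched against unicorn's stderr output.
def pid_stamped_logger(io = $stdout)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, _progname, msg|
    "#{time.utc.iso8601(6)} pid=#{Process.pid} #{severity}: #{msg}\n"
  end
  logger
end
```

With lines like these on both sides, "worker killed" in the unicorn log
can be lined up against the last request the worker logged.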