From mboxrd@z Thu Jan  1 00:00:00 1970
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on dcvr.yhbt.net
X-Spam-Level: *
X-Spam-ASN: AS33070 50.56.128.0/17
X-Spam-Status: No, score=1.0 required=3.0 tests=AWL,HK_RANDOM_FROM,
 MSGID_FROM_MTA_HEADER,TVD_RCVD_IP shortcircuit=no autolearn=no version=3.3.2
Path: news.gmane.org!not-for-mail
From: Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org>
Newsgroups: gmane.comp.lang.ruby.rainbows.general
Subject: Re: Unicorn is killing our rainbows workers
Date: Thu, 19 Jul 2012 20:16:33 +0000
Message-ID: <20120719201633.GA8203@dcvr.yhbt.net>
References:
 <CAFFC5+MUdUoXhBXvw8VnnVAZsQpN1idELr0nc_Xm0HYcdtQVhA@mail.gmail.com>
 <20120718215222.GA11539@dcvr.yhbt.net>
 <CAFFC5+N=_bnyM=0WbtLxPAncs0TV4wA9P8TXZ_-T3qOtW-+w3Q@mail.gmail.com>
 <20120719002641.GA17210@dcvr.yhbt.net>
 <CAFFC5+NfChEobr7asqPx+3-U8_mHZqOgCLjRw=w6iCZ=z0-oCg@mail.gmail.com>
NNTP-Posting-Host: plane.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Trace: dough.gmane.org 1342729154 24921 80.91.229.3 (19 Jul 2012 20:19:14
 GMT)
X-Complaints-To: usenet@dough.gmane.org
NNTP-Posting-Date: Thu, 19 Jul 2012 20:19:14 +0000 (UTC)
Cc: Cody Fauser <cody.fauser-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>, ops
 <ops-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>, Harry Brundage
 <harry.brundage-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>, Jonathan Rudenberg
 <jonathan.rudenberg-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org>
To: Rainbows! list <rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org>
Original-X-From:
 rainbows-talk-bounces-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org Thu Jul 19
 22:19:14 2012
Return-path: <rainbows-talk-bounces-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org>
Envelope-to: gclrrg-rainbows-talk@m.gmane.org
X-Original-To: rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
Delivered-To: rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
Content-Disposition: inline
In-Reply-To:
 <CAFFC5+NfChEobr7asqPx+3-U8_mHZqOgCLjRw=w6iCZ=z0-oCg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-BeenThere: rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: <rainbows-public@bogomips.org>
List-Unsubscribe: <http://rubyforge.org/mailman/options/rainbows-talk>,
 <mailto:rainbows-talk-request-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org?subject=unsubscribe>
List-Archive: <http://rubyforge.org/pipermail/rainbows-talk>
List-Post: <rainbows-public@bogomips.org>
List-Help:
 <mailto:rainbows-talk-request-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org?subject=help>
List-Subscribe: <http://rubyforge.org/mailman/listinfo/rainbows-talk>,
 <mailto:rainbows-talk-request-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org?subject=subscribe>
Original-Sender:
 rainbows-talk-bounces-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
Errors-To: rainbows-talk-bounces-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
Xref: news.gmane.org gmane.comp.lang.ruby.rainbows.general:377
Archived-At:
 <http://permalink.gmane.org/gmane.comp.lang.ruby.rainbows.general/377>
Received: from 50-56-192-79.static.cloud-ips.com ([50.56.192.79]
 helo=rubyforge.org) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from
 <rainbows-talk-bounces-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org>) id
 1SrxBk-0007MH-Mt for gclrrg-rainbows-talk@m.gmane.org; Thu, 19 Jul 2012
 22:19:09 +0200
Received: from localhost.localdomain (localhost [127.0.0.1]) by rubyforge.org
 (Postfix) with ESMTP id 660072E06D; Thu, 19 Jul 2012 20:19:07 +0000 (UTC)
Received: from dcvr.yhbt.net (dcvr.yhbt.net [64.71.152.64]) by rubyforge.org
 (Postfix) with ESMTP id 80A3E2E067 for
 <rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org>; Thu, 19 Jul 2012
 20:16:34 +0000 (UTC)
Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net
 (Postfix) with ESMTP id 1FB471F449; Thu, 19 Jul 2012 20:16:34 +0000 (UTC)

Samuel Kadolph <samuel.kadolph-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org> wrote:
> On Wed, Jul 18, 2012 at 8:26 PM, Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
> > Samuel Kadolph <samuel.kadolph-BqItboTaHx1BDgjK7y7TUQ@public.gmane.org> wrote:
> >> On Wed, Jul 18, 2012 at 5:52 PM, Eric Wong <normalperson-rMlxZR9MS24@public.gmane.org> wrote:
> >> > Samuel Kadolph <samuel.kadolph-/3HedJEncLlQ0OI7PeSoCw@public.gmane.org> wrote:
> >> >> https://gist.github.com/9ec96922e55a59753997. Any insight into why
> >> >> unicorn is killing our ThreadPool workers would help us greatly. If
> >> >> you require additional info I would be happy to provide it.
> >
> > Also, are you using "preload_app true" ?
> 
> Yes we are using preload_app true.
> 
> > I'm a bit curious how these messages are happening, too:
> > D, [2012-07-18T15:12:43.185808 #17213] DEBUG -- : waiting 151.5s after
> > suspend/hibernation
> 
> They are strange. My current hunch is the killing and that message are
> symptoms of the same issue. Since it always follows a killing.

I wonder if there's some background thread one of your gems spawns on
load that causes the master to stall.  I'm not seeing how else unicorn
could think it was in suspend/hibernation.

> > Can you tell (from Rails logs) if the to-be-killed workers are still
> > processing requests/responses the 300s before when the unicorn timeout
> > hits it?  AFAIK, Rails logs the PID of each worker processing the
> > request.
> 
> rails doesn't log the pid but it would seem that after upgrading to
> mysql 0.2.18 it is no longer killing workers that are busy with
> requests.

Oops, I think I've been spoiled into thinking the Hodel3000CompliantLogger
is the default Rails logger :)

> > If anything, I'd lower the unicorn timeout to something low (maybe
> > 5-10s) since that detects hard lockups at the VM level.  Individual
> > requests in Rainbows! _are_ allowed to take longer than the unicorn
> > timeout.
> 
> We lowered the unicorn timeout to 5 seconds and but that did not
> change the killings but they seem to be happening less often. I have
> some of our stderr logs after setting the timeout to 5 seconds at
> https://gist.github.com/3144250.

Thanks for trying that!

> > Newer versions of mysql2 should avoid potential issues with
> > ThreadTimeout/Timeout (or anything that hits Thread#kill).  I think
> > mysql2 0.2.9 fixed a fairly important bug, and 0.2.18 fixed a very rare
> > (but possibly related to your issue) bug,
> 
> Upgrading mysql2 seems to have stopped unicorn from killing workers
> that are currently busy. We were stress testing it last night and
> after we upgraded to 0.2.18 we had no more 502s from the app but this
> could be a coincidence since the killings are still happen.

Alright, good to know 0.2.18 solved your problems.  Btw, have you
noticed any general connectivity issues to your MySQL server?
There were quite a few bugfixes from 0.2.6..0.2.18, though.

Anyways, I'm happy your problem seems to be fixed with the mysql2
upgrade :)

> Our ops guys say we had this problem before we were using ThreadTimeout.

OK.  That's somewhat reassuring to know (especially since the culprit
seems to be an old mysql2 gem).  I've had other users (privately) report
issues with recursive locking because of ensure clauses (e.g.
Mutex#synchronize) that I forgot to document.
_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying