From: Eric Wong
Newsgroups: gmane.comp.lang.ruby.rainbows.general
Subject: Re: Unicorn is killing our rainbows workers
Date: Thu, 19 Jul 2012 14:31:25 -0700
Message-ID: <20120719213125.GA17708@dcvr.yhbt.net>
References: <20120718215222.GA11539@dcvr.yhbt.net>
 <20120719002641.GA17210@dcvr.yhbt.net>
 <20120719201633.GA8203@dcvr.yhbt.net>
To: Rainbows! list
Cc: Cody Fauser, ops, Harry Brundage, Jonathan Rudenberg

Samuel Kadolph wrote:
> On Thu, Jul 19, 2012 at 4:16 PM, Eric Wong wrote:
> > Samuel Kadolph wrote:
> > > On Wed, Jul 18, 2012 at 8:26 PM, Eric Wong wrote:
> > > > Samuel Kadolph wrote:
> > > >> On Wed, Jul 18, 2012 at 5:52 PM, Eric Wong wrote:
> > > >> > Samuel Kadolph wrote:
> > > >> >> https://gist.github.com/9ec96922e55a59753997. Any insight into why
> > > >> >> unicorn is killing our ThreadPool workers would help us greatly. If
> > > >> >> you require additional info I would be happy to provide it.
> > > >
> > > > Also, are you using "preload_app true" ?
> > >
> > > Yes we are using preload_app true.
> > >
> > > > I'm a bit curious how these messages are happening, too:
> > > > D, [2012-07-18T15:12:43.185808 #17213] DEBUG -- : waiting 151.5s after
> > > > suspend/hibernation
> > >
> > > They are strange. My current hunch is the killing and that message are
> > > symptoms of the same issue, since it always follows a killing.
> >
> > I wonder if there's some background thread one of your gems spawns on
> > load that causes the master to stall.  I'm not seeing how else unicorn
> > could think it was in suspend/hibernation.
> > Anyways, I'm happy your problem seems to be fixed with the mysql2
> > upgrade :)
>
> Unfortunately that didn't fix the problem. We had a large sale today
> and had 2 502s. We're going to try p194 next week and I'll let you
> know if that fixes it.

Are you seeing the same errors as before in stderr for those?

Can you also try disabling preload_app?  But before disabling
preload_app, can you also check a few things on a running master?

* "lsof -p <pid of master>" to see if there's any odd connections
  the master is making.

* Assuming you're on Linux, can you also check for any other threads
  the master might be running (and possibly stuck on)?

      ls /proc/<pid of master>/task/

  The output should be 2 directories:

      <pid of master>/
      <tid>/

  If you have a 3rd entry, you can confirm something in your app or
  one of your gems is spawning a background thread which could be
  throwing the master off...

> > > Our ops guys say we had this problem before we were using ThreadTimeout.
> >
> > OK.  That's somewhat reassuring to know (especially since the culprit
> > seems to be an old mysql2 gem).  I've had other users (privately)
> > report issues with recursive locking because of ensure clauses (e.g.
> > Mutex#synchronize) that I forgot to document.
>
> We're going to try going without ThreadTimeout again to make sure
> that's not the issue.

Alright.
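The /proc check above can also be done from inside Ruby itself.  A minimal
sketch (the helper name report_background_threads is mine, not something
from this thread): run it at the end of app preload, and anything besides
the main thread was spawned by the app or one of its gems at load time:

```ruby
# Sketch: detect unexpected background threads after the app is
# preloaded, before the master enters its main loop.  In a plain,
# freshly started process only the main thread should be alive.
def report_background_threads
  extra = Thread.list - [Thread.main]
  if extra.empty?
    warn "pid=#{Process.pid}: only the main thread is running"
  else
    extra.each do |t|
      warn "pid=#{Process.pid}: background thread found: #{t.inspect}"
    end
  end
  extra.size  # number of threads besides the main one
end
```

Calling this right after "preload_app true" finishes loading the app
would surface the same extra task/ entry the /proc listing shows.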
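The recursive-locking problem Eric mentions is easy to reproduce with a
plain Mutex: MRI's Mutex#synchronize is not reentrant, so code that
re-enters the same lock -- e.g. an ensure clause triggered by a
ThreadTimeout-style interrupt while the lock is still held -- raises
ThreadError.  A minimal sketch (contrived example, not from the thread):

```ruby
# Sketch: a second acquisition of the same Mutex by the same thread
# raises ThreadError ("deadlock; recursive locking" in MRI), which is
# the failure mode ensure clauses can hit under thread timeouts.
def recursive_lock_error
  m = Mutex.new
  m.synchronize do
    begin
      m.synchronize { }   # re-enter the lock we already hold
    rescue ThreadError => e
      return e.message
    end
  end
  nil  # not reached in MRI
end
```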
Btw, I also suggest any Rails/application-level logs include the PID
and timestamp of the request.  This way you can correlate a worker
being killed with when/if the Rails app stopped processing requests.

_______________________________________________
Rainbows! mailing list - rainbows-talk-GrnCvJ7WPxnNLxjTenLetw@public.gmane.org
http://rubyforge.org/mailman/listinfo/rainbows-talk
Do not quote signatures (like this one) or top post when replying
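One way to get PID-and-timestamp log lines with nothing but the stdlib
(a Rails app would do the equivalent through its logger configuration;
the helper name pid_stamped_logger is mine):

```ruby
require "logger"
require "time"

# Sketch: prefix every log line with an ISO-8601 timestamp and the
# worker PID, so a killed worker's last application log line can be
# matched against unicorn's stderr output.
def pid_stamped_logger(io = $stdout)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, _progname, msg|
    "#{time.utc.iso8601(6)} pid=#{Process.pid} #{severity}: #{msg}\n"
  end
  logger
end
```

With lines like these on both sides, "worker killed" in the unicorn log
can be lined up against the last request the worker logged.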