From: nick@auger.net
To: "unicorn list"
Subject: Re: A barrage of unexplained timeouts
Date: Wed, 21 Aug 2013 09:33:58 -0400 (EDT)
Message-ID: <1377092038.61531322@apps.rackspace.com>
In-Reply-To: <20130820213257.GA26442@dcvr.yhbt.net>

"Eric Wong" said:
> nick@auger.net wrote:
>> "Eric Wong" said:
>> > nick@auger.net wrote:
>> >> "Eric Wong" said:
>> > I'm stumped :<
>>
>> I was afraid you'd say that :(.
>
> Actually, another potential issue is DNS lookups timing out.  But they
> shouldn't take *that* long...
>
>> > Do you have any background threads running that could be hanging the
>> > workers?  This is Ruby 1.8, after all, so there's more likely to be
>> > some blocking call hanging the entire process.  AFAIK, some monitoring
>> > software runs a background thread in the unicorn worker and maybe the
>> > OpenSSL extension doesn't work as well if it encountered network
>> > problems under Ruby 1.8
>>
>> We don't explicitly create any threads in our rails code.  We do
>> communicate with backgroundrb worker processes, although, none of the
>> strangeness today involved any routes that would hit backgroundrb
>> workers.
>
> I proactively audit every piece of code (including external
> libraries/gems) loaded by an app for potentially blocking calls (hits to
> the filesystem, socket calls w/o timeout/blocking).  I use strace to
> help me find that sometimes...
>
>> Is there any instrumentation that I could add that might help
>> debugging in the future?  ($request_time and $upstream_response_time
>> are now in my nginx logs.)  We have noticed these "unexplainable
>> timeouts" before, but typically for a single worker.  If there's some
>> debugging that could be added I might be able to track it down during
>> these one-off events.
>
> As an experiment, can you replay traffic a few minutes leading up to and
> including that 7m period in a test setup with only one straced worker?

I've replayed this traffic in a development environment without
triggering the issue.  The lack of POST payloads and slight differences
between the two environments make this difficult to test.

> Run "strace -T -f -o $FILE -p $PID_OF_WORKER" and see if there's any
> unexpected/surprising dependencies (connect() to unrecognized addresses,
> open() to networked filesystems, fcntl locks, etc...).
>
> You can play around with some other strace options (-v/-s SIZE/-e filters)

I'll do my best to look through the output for anything out of the
ordinary.  Unfortunately, strace is a new tool for me and a bit out of
my wheelhouse.
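
For anyone else new to strace: with -T, each completed syscall line ends
with the time spent in that call, in angle brackets.  Below is a minimal
sketch of how the file written by the strace command quoted above could
be scanned for slow calls.  It assumes the common
"PID name(args) = ret <seconds>" layout, which may vary between strace
versions.  The function name, the 1-second threshold, and the SAMPLE
lines are all made up for illustration; none of them come from this
thread.

```python
import re

# Assumed layout of a completed call in "strace -T -f -o FILE" output:
#   [PID] name(args...) = ret <seconds>
# or, for a call that was interrupted by another process's output:
#   [PID] <... name resumed> args...) = ret <seconds>
LINE_RE = re.compile(
    r"^(?:(?P<pid>\d+)\s+)?"                                  # PID column (optional)
    r"(?:<\.\.\. (?P<resumed>\w+) resumed>|(?P<name>\w+)\()"  # syscall name
    r".*<(?P<secs>\d+\.\d+)>\s*$"                             # elapsed time from -T
)


def slow_syscalls(lines, threshold=1.0):
    """Yield (pid, syscall, seconds, raw_line) for calls taking >= threshold seconds."""
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            # Signal lines, exit lines, and "<unfinished ...>" halves carry no timing.
            continue
        secs = float(m.group("secs"))
        if secs >= threshold:
            pid = int(m.group("pid")) if m.group("pid") else None
            name = m.group("name") or m.group("resumed")
            yield pid, name, secs, line.rstrip("\n")


# Invented sample lines, for illustration only (not real output from this thread).
SAMPLE = [
    '1234 open("/etc/hosts", O_RDONLY) = 5 <0.000020>',
    '1234 connect(7, {sa_family=AF_INET, sin_port=htons(5432), '
    'sin_addr=inet_addr("10.0.0.5")}, 16) = 0 <6.512345>',
    '1234 read(5,  <unfinished ...>',
    '1234 <... read resumed> "x", 4096) = 1 <30.001000>',
    '1234 --- SIGCHLD (Child exited) @ 0 (0) ---',
]

if __name__ == "__main__":
    # Print the slowest calls first.
    for pid, name, secs, raw in sorted(slow_syscalls(SAMPLE), key=lambda r: -r[2]):
        print(f"{secs:10.6f}s pid={pid} {name}: {raw}")
```

To scan a real log, pass an open file instead of SAMPLE, e.g.
"with open(path) as fh: ... slow_syscalls(fh)".  Any connect(), open(),
or fcntl() call that shows up here is a candidate for the kind of
unexpected dependency described above.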