From: Eric Wong
Newsgroups: gmane.comp.lang.ruby.unicorn.general
Subject: Re: Sending ABRT to timeout-errant process before KILL
Date: Thu, 8 Sep 2011 19:13:52 +0000
Message-ID: <20110908191352.GA25251@dcvr.yhbt.net>
In-Reply-To: <6310128B-282E-40C0-801B-15DBBE9E6AC3@engineyard.com>
To: unicorn list
Cc: "J. Austin Hughey"

"J. Austin Hughey" wrote:
> The general idea is that I'd like to have some way to "warn" the
> application when it's about to be killed. I've patched
> murder_lazy_workers to send an abort signal via kill_worker, sleep for
> 5 seconds, then look and see if the process still exists by using
> Process.getpgid. If so, it sends the original kill command, and if
> not, a rescue block kicks in to prevent the raised error from
> Process.getpgid from making things explode.

The problem with anything other than SIGKILL (or SIGSTOP) is that it
assumes the Ruby VM is working and in a good state.

> I've created a simulation app, built on Rails 3.0, that uses a generic
> "posts" controller to simulate a long-running request. Instead of
> just throwing a straight-up sleep 65 in there, I have it running
> through sleeping 1 second on a decrementing counter, and doing that 65
> times. The reason is because, assuming I've read the code correctly,
> even with my "skip sleeping workers" commented line below, it'll skip
> over the process, thus rendering my simulation of a long-running
> process invalid. However, clarification on this point is certainly
> welcome. You can see the app here:
> https://github.com/jaustinhughey/unicorn_test/blob/master/app/controllers/posts_controller.rb

(purely for educational purposes, since I'll point you towards another
 approach I believe is better)

    Signal.trap(:ABRT) do
      # Write some stuff to the Rails log
      logger.info "Caught Unicorn kill exception!"

If this is the logger that ships with Ruby, it locks a Mutex, so it'll
deadlock if another SIGABRT is received while logging the above
statement (a very small window, admittedly).
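
To illustrate the point about handlers and locks, here is a minimal
sketch (not taken from Unicorn or from the app under discussion) of a
trap handler that avoids the logger entirely. The handler only writes
one byte to a pipe, and the logging happens later in normal code. The
names log_io, rd and wr are made up for this example, and the signal is
sent to the process itself only so the sketch can be run standalone.

```ruby
require "logger"
require "stringio"

# Hypothetical stand-in for the Rails log, so the example is self-contained.
log_io = StringIO.new
logger = Logger.new(log_io)

# Pipe used to pass the "signal arrived" event out of the handler.
rd, wr = IO.pipe

# The handler takes no locks and does no logging: it only attempts a
# non-blocking one-byte write. If the pipe is full, a wakeup is already
# pending, so the failed write can be ignored.
Signal.trap(:ABRT) do
  wr.write_nonblock(".", exception: false)
end

# Simulate the master sending the warning signal (assumption: sent to
# ourselves here purely for demonstration).
Process.kill(:ABRT, Process.pid)

# Normal (non-handler) code: wait for the wakeup byte, then do the work
# that needs a Mutex, such as logging or disconnecting from a database.
if IO.select([rd], nil, nil, 5)
  rd.read_nonblock(16)
  logger.info "Caught ABRT, cleaning up outside the trap handler"
end
```

This only helps if the interpreter is still healthy enough to reach the
code after IO.select, so it does not remove the need for SIGKILL as the
last resort.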

      # Do a controlled disconnect from ActiveRecord
      ActiveRecord::Base.connection.disconnect!

Likewise, if AR needs to lock internal structures before disconnecting,
it also must be reentrant. Ruby's normal Mutex implementation is not
reentrant-safe.

> So it looks like Worker 1 is hitting a strange/false timeout of
> 1315467289 seconds, which isn't really possible as it wasn't even
> running 1315467289 seconds prior to that (which equates to roughly 41
> years ago if my math is right).

You're getting this because you removed the following line:

    0 == tick and next # skip workers that are sleeping

sleeping means they haven't accepted a client connection, yet. Not
sleeping while processing a client request. I'll clarify that in the
code.

> Needless to say, I'm a bit stumped at this point, and would sincerely
> appreciate another point of view on this. Am I going about this all
> wrong? Is there a better approach I should consider? And if I'm on
> the right track, how can I get this to work regardless of how many
> Unicorn workers are running?

Since it's an application error, it can be done as middleware. You can
try something like the Rainbows::ThreadTimeout middleware, it's
currently Rainbows! specific but can easily be made to work with
Unicorn.

    git clone git://bogomips.org/rainbows
    cat rainbows/lib/rainbows/thread_timeout.rb

This is conceptually similar to "timeout" in the Ruby standard library,
but does not allow nesting.

I'll try to clarify more later today if you have questions, in a bit of
a rush right now.

-- 
Eric Wong