From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-2.9 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00, URIBL_BLOCKED shortcircuit=no autolearn=unavailable version=3.3.2 X-Original-To: yahns-public@yhbt.net Received: from localhost (dcvr.yhbt.net [127.0.0.1]) by dcvr.yhbt.net (Postfix) with ESMTP id DD9C11FABD; Sat, 9 May 2015 01:03:49 +0000 (UTC) Date: Sat, 9 May 2015 01:03:49 +0000 From: Eric Wong To: "Lin Jen-Shin (godfat)" Cc: yahns-public@yhbt.net, wildjcrt@gmail.com Subject: Re: What would happen if a worker thread died? Message-ID: <20150509010349.GA23261@dcvr.yhbt.net> References: <20150508170311.GA1260@dcvr.yhbt.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: List-Id: "Lin Jen-Shin (godfat)" wrote: > On Sat, May 9, 2015 at 1:03 AM, Eric Wong wrote: > > It's unfortunately difficult to detect thread death from ruby (no > > SIGCHLD handler unlike for processes) besides polling Thread#join > > > > We had this issue in ruby-core a few years back, but apparently > > it was forgotten/ignored by matz. Care to chime in? > > https://bugs.ruby-lang.org/issues/6647 > > I just sent a few characters, hope that would speed up the process. Thanks for reminding us of this, care to examine/fix some of the MRI test failures in the patch I posted to MRI? :) > >> I am aware that yahns is *extremely sensitive to fatal bugs in the > >> applications it hosts*, so I am just curious. > >> > >> For reference, Puma would immediately close the socket without > >> sending anything, and Unicorn would log the error backtrace and > >> kill the worker (If I read it correctly). Actually, unicorn doesn't do that explicitly, it's standard Ruby behavior for the main thread. > >> In this case, Unicorn helped me figure out what's happened. > > > > yahns can probably rescue Exception (or Object(!) like puma) > > and then log + abort the entire process. > > I think rescuing Object is misleading. AFAIK, we cannot raise > an instance which is not a kind of Exception. I guess, there's some internal non-object interrupts in MRI for threads (eKillSignal, eTerminateSignal) but I don't think those get exposed to Ruby-land... > I learned that rescuing Exception is a bad idea because like > signal handling and some other stuffs are also using Exception > to communicate, and of course we won't want to interfere. Right. > However for a worker thread, I guess that might be ok? Maybe limiting it to the common types {Standard,Load,Syntax}Error is sufficient. Below, I'm choosing to both leave the socket open and keep the worker running to slow down a potentially malicious client if this happens and to hopefully prevent an evil client from taking others down with it. The process may be in bad state from Load/SyntaxErrors anyways with partially loaded code, though. yahns cannot be made error-tolerant when given buggy code, but it should at least allow users to find problems since the Ruby default behavior sucks right now: diff --git a/lib/yahns/queue_epoll.rb b/lib/yahns/queue_epoll.rb index 4f3289e..2875920 100644 --- a/lib/yahns/queue_epoll.rb +++ b/lib/yahns/queue_epoll.rb @@ -64,7 +64,7 @@ class Yahns::Queue < SleepyPenguin::Epoll::IO # :nodoc: raise "BUG: #{io.inspect}#yahns_step returned: #{rv.inspect}" end end - rescue => e + rescue StandardError, LoadError, SyntaxError => e break if closed? # can still happen due to shutdown_timeout Yahns::Log.exception(logger, 'queue loop', e) end while true diff --git a/lib/yahns/queue_kqueue.rb b/lib/yahns/queue_kqueue.rb index 4176f7a..33f5f8b 100644 --- a/lib/yahns/queue_kqueue.rb +++ b/lib/yahns/queue_kqueue.rb @@ -72,7 +72,7 @@ class Yahns::Queue < SleepyPenguin::Kqueue::IO # :nodoc: raise "BUG: #{io.inspect}#yahns_step returned: #{rv.inspect}" end end - rescue => e + rescue StandardError, LoadError, SyntaxError => e break if closed? # can still happen due to shutdown_timeout Yahns::Log.exception(logger, 'queue loop', e) end while true Thoughts?