unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
* Worker Timeout Debugging
@ 2013-04-20  0:52 Bill Vieux
  2013-04-20  1:26 ` Eric Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Bill Vieux @ 2013-04-20  0:52 UTC (permalink / raw)
  To: mongrel-unicorn

I am getting occasional worker timeouts for a Rails app hosted on
Heroku. I have rack-timeout set at the top of the middleware with a
shorter timeout than unicorn workers, but it is not firing for some
reason.

Are there any recommended techniques to determine the call stack when
the worker is reaped?

The solutions that come to mind for me seem to require running a
customized build of unicorn. For example: start a script (e.g., gdb to
attach and core dump the worker) before (or in place of) sending the
SIGKILL.
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying



* Re: Worker Timeout Debugging
  2013-04-20  0:52 Worker Timeout Debugging Bill Vieux
@ 2013-04-20  1:26 ` Eric Wong
  2013-04-20  2:32   ` Eric Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Wong @ 2013-04-20  1:26 UTC (permalink / raw)
  To: unicorn list

Bill Vieux <billv@yahoo.com> wrote:
> I am getting occasional worker timeouts for a Rails app hosted on
> Heroku. I have rack-timeout set at the top of the middleware with a
> shorter timeout than unicorn workers, but it is not firing for some
> reason.

Which version of Ruby is this and what C extensions are you using?
This is probably a buggy C extension which blocks the VM.

> Are there any recommended techniques to determine the call stack when
> the worker is reaped?

Not the call stack, but you can get the Rails endpoint regardless of
Ruby version:

  Ensure your Rails logger is configured to log the PID at the start
  of every request.  (I think Rails logs parameters by default for
  every request.)

  Then match the PIDs of the killed workers in the unicorn log against
  the PIDs in the Rails log that started a request but never logged a
  completion.
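
The log-matching approach above could be sketched as follows. This is
only an illustration under assumptions: the middleware name
PidRequestLogger, the "start"/"done" line format, and the helper
unfinished_requests are all made up for this example and are not part
of Rails or unicorn. Note that "done" is logged when the app returns
its response, not when the body has been fully written.

```ruby
require "logger"

# Hypothetical Rack middleware: logs the worker PID when a request
# starts and again when the app returns (or raises).
class PidRequestLogger
  def initialize(app, logger = Logger.new($stderr))
    @app = app
    @logger = logger
  end

  def call(env)
    tag = "pid=#{Process.pid} #{env['REQUEST_METHOD']} #{env['PATH_INFO']}"
    @logger.info("start #{tag}")
    begin
      @app.call(env)
    ensure
      # Not reached if the worker is killed mid-request, which is
      # exactly the gap we look for in the log afterwards.
      @logger.info("done  #{tag}")
    end
  end
end

# Hypothetical helper: given log lines in the format above, return a
# Hash of { pid => request tag } for requests that logged "start" but
# never "done".  Relies on a unicorn worker handling one request at a
# time, so each PID has at most one request in flight.
def unfinished_requests(lines)
  pending = {}
  lines.each do |line|
    case line
    when /start (pid=(\d+) .*)$/ then pending[$2] = $1
    when /done  pid=(\d+) /      then pending.delete($1)
    end
  end
  pending
end
```

Any PID returned by unfinished_requests that also appears in a unicorn
"timeout ... killing" log line points at the request that was running
when the worker was reaped.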

> The solutions that come to mind for me seem to require running a
> customized build of unicorn. For example: start a script (e.g., gdb to
> attach and core dump the worker) before (or in place of) sending the
> SIGKILL.

If you're using Ruby 1.9 or later, maybe sending SIGBUS/SIGSEGV can work
to trigger a Ruby core dump.

Do not attempt to install SIGSEGV/SIGBUS handlers via Ruby; Ruby 1.9
already handles those signals internally.  Ruby 2.0.0 even prevents
trapping SEGV/BUS with Ruby-level Signal.trap handlers.
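
Both points can be checked with a small standalone script. This is a
sketch assuming MRI 2.0 or later on a platform with fork; the helper
names trap_refused? and signal_sleeping_child are made up for this
example. It only shows the behavior for an idle (sleeping) child, not
for a VM wedged inside a C extension.

```ruby
# Returns true if Ruby refuses to install a Ruby-level handler for
# +sig+ (it raises ArgumentError for signals the VM reserves).
def trap_refused?(sig)
  Signal.trap(sig) {}
  false
rescue ArgumentError
  true
end

# Forks a child that only sleeps, sends it +sig+, and returns
# [Process::Status, text the child wrote to stderr].
def signal_sleeping_child(sig)
  err_r, err_w = IO.pipe
  ready_r, ready_w = IO.pipe
  pid = fork do
    err_r.close
    ready_r.close
    $stderr.reopen(err_w)   # the VM's crash report goes to fd 2
    ready_w.write("x")      # tell the parent stderr is redirected
    ready_w.close
    sleep
  end
  err_w.close
  ready_w.close
  ready_r.read(1)           # wait for the child to be ready
  ready_r.close
  Process.kill(sig, pid)
  report = err_r.read       # read until the child exits
  err_r.close
  _, status = Process.wait2(pid)
  [status, report]
end

# Usage:
#   status, report = signal_sleeping_child(:SEGV)
#   puts report             # the VM's own crash report, if any
#   p status.signaled?
```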

* Re: Worker Timeout Debugging
  2013-04-20  1:26 ` Eric Wong
@ 2013-04-20  2:32   ` Eric Wong
  0 siblings, 0 replies; 3+ messages in thread
From: Eric Wong @ 2013-04-20  2:32 UTC (permalink / raw)
  To: unicorn list

Eric Wong <normalperson@yhbt.net> wrote:
> If you're using Ruby 1.9 or later, maybe sending SIGBUS/SIGSEGV can work
> to trigger a Ruby core dump.
> 
> Do not attempt to install SIGSEGV/SIGBUS handlers via Ruby; Ruby 1.9
> already handles those signals internally.  Ruby 2.0.0 even prevents
> trapping SEGV/BUS with Ruby-level Signal.trap handlers.

Totally untested, but this may work (use "timeout seconds, :SIGSEGV"
in your config file).

diff --git a/lib/unicorn/configurator.rb b/lib/unicorn/configurator.rb
index 0d0eac7..7599d63 100644
--- a/lib/unicorn/configurator.rb
+++ b/lib/unicorn/configurator.rb
@@ -32,6 +32,7 @@ class Unicorn::Configurator
   # Default settings for Unicorn
   DEFAULTS = {
     :timeout => 60,
+    :timeout_sig => :SIGKILL,
     :logger => Logger.new($stderr),
     :worker_processes => 1,
     :after_fork => lambda { |server, worker|
@@ -179,6 +180,10 @@ def before_exec(*args, &block)
   # low-complexity, low-overhead implementation, timeouts of less
   # than 3.0 seconds can be considered inaccurate and unsafe.
   #
+  # This timeout is only intended as the last line of defense.
+  # See http://unicorn.bogomips.org/Application_Timeouts.html for
+  # an explanation.
+  #
   # For running Unicorn behind nginx, it is recommended to set
   # "fail_timeout=0" for in your nginx configuration like this
   # to have nginx always retry backends that may have had workers
@@ -195,11 +200,30 @@ def before_exec(*args, &block)
   #      server 192.168.0.8:8080 fail_timeout=0;
   #      server 192.168.0.9:8080 fail_timeout=0;
   #    }
-  def timeout(seconds)
+  #
+  # Optionally, unicorn may be configured to (ab)use Ruby VM internals
+  # by sending :SIGSEGV or :SIGBUS to generate a backtrace with debugging
+  # information.  Users must not attempt to install :SIGSEGV or :SIGBUS
+  # handlers via Ruby (Ruby 2.0.0 and later explicitly prevents this).
+  # This feature is experimental, potentially confusing, and may not be
+  # as reliable as using the default signal (:SIGKILL).
+  def timeout(seconds, signal = :SIGKILL)
     set_int(:timeout, seconds, 3)
     # POSIX says 31 days is the smallest allowed maximum timeout for select()
     max = 30 * 60 * 60 * 24
     set[:timeout] = seconds > max ? max : seconds
+
+    # Allow users to (ab)use Ruby VM internal sig handlers for timeout
+    # handling.  MatzRuby 1.9 installs handlers for SIGBUS and SIGSEGV
+    # which continue to work when the VM is wedged.  Rubinius appears to
+    # have similar handling of SIGBUS/SIGSEGV
+    case signal
+    when :SIGSEGV, :SIGBUS, :SIGKILL
+      set[:timeout_sig] = signal
+    else
+      raise ArgumentError,
+        "timeout signal must be one of: :SIGSEGV, :SIGBUS, or :SIGKILL"
+    end
   end
 
   # sets the current number of worker_processes to +nr+.  Each worker
diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index cc0a705..b245ec8 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -16,7 +16,8 @@ class Unicorn::HttpServer
                 :before_fork, :after_fork, :before_exec,
                 :listener_opts, :preload_app,
                 :reexec_pid, :orig_app, :init_listeners,
-                :master_pid, :config, :ready_pipe, :user
+                :master_pid, :config, :ready_pipe, :user,
+                :timeout_sig
 
   attr_reader :pid, :logger
   include Unicorn::SocketHelper
@@ -470,7 +471,7 @@ def murder_lazy_workers
       next_sleep = 0
       logger.error "worker=#{worker.nr} PID:#{wpid} timeout " \
                    "(#{diff}s > #{@timeout}s), killing"
-      kill_worker(:KILL, wpid) # take no prisoners for timeout violations
+      kill_worker(@timeout_sig, wpid)
     end
     next_sleep <= 0 ? 1 : next_sleep
   end
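
For reference, a config file using the patched method might look like
the fragment below. This is only a sketch: the second argument to
"timeout" exists only with the untested patch above applied, and the
worker count and the 30-second value are placeholders.

```ruby
# unicorn config file (values are placeholders)
worker_processes 2

# The second argument requires the patch above; without it, timeout
# takes only the number of seconds and workers are always sent SIGKILL.
timeout 30, :SIGSEGV
```

With the patch, any signal other than :SIGSEGV, :SIGBUS, or :SIGKILL
raises ArgumentError when the config file is loaded.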
