unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
* funky process tree + stillborn masters
@ 2010-04-08 21:21 Jamie Wilkinson
  2010-04-08 23:55 ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Jamie Wilkinson @ 2010-04-08 21:21 UTC (permalink / raw)
  To: mongrel-unicorn

Since upgrading bundler (but applying the RUBYOPT path fixes) we've started experiencing intermittent, difficult-to-isolate USR2 restart failures.

After a USR2 signal our process tree winds up looking like this, with several master-esque processes listed as children (but without the "worker[N]" label):

app      14402  4.4  0.8 199612 70264 ?        S    14:07   0:04 unicorn_rails master -c config/unicorn.rb -E production -D
app      14433  0.0  0.8 204540 68504 ?        Sl   14:07   0:00  \_ unicorn_rails worker[0] -c config/unicorn.rb -E production -D
app      14435  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[1] -c config/unicorn.rb -E production -D
app      14438  0.0  0.8 199748 65840 ?        S    14:07   0:00  \_ /usr/bin/ruby1.8 /usr/bin/unicorn_rails -c config/unicorn.rb -E production -D
app      14440  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[3] -c config/unicorn.rb -E production -D
app      14442  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[4] -c config/unicorn.rb -E production -D
app      14445  0.0  0.8 199760 65840 ?        S    14:07   0:00  \_ /usr/bin/ruby1.8 /usr/bin/unicorn_rails -c config/unicorn.rb -E production -D
app      14447  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[6] -c config/unicorn.rb -E production -D
app      14449  0.0  0.8 204780 69272 ?        Sl   14:07   0:00  \_ unicorn_rails worker[7] -c config/unicorn.rb -E production -D

Sending another USR2 signal brings a new master into the mix as a child; it spins up a single worker of its own (which also resembles the "/usr/bin/ruby1.8" master-esque processes) and then fails to continue.

Further USR2 restarts will obviously do nothing, and we're forced to either kill -9 the stillborn master or cold-restart all of the unicorns. Nothing out of the ordinary is dumped to stderr or stdout.

Starting unicorns fresh produces a nice process list where every child is listed cleanly as "unicorn_rails worker[N]".

We only have this issue in one of our applications, on a machine that has 1 Rails app & 2 Sinatra apps, all powered by nginx+unicorn. We've also only noticed this since upgrading to bundler from bundler08.

Are the goofy worker processes in the process tree a real problem, or just a red herring?

-jamie
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying



* Re: funky process tree + stillborn masters
  2010-04-08 21:21 funky process tree + stillborn masters Jamie Wilkinson
@ 2010-04-08 23:55 ` Eric Wong
  2010-04-09  1:20   ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2010-04-08 23:55 UTC (permalink / raw)
  To: unicorn list

Jamie Wilkinson <jamie@tramchase.com> wrote:
> Since upgrading bundler (but applying the RUBYOPT path fixes) we've
> started experiencing intermittent, difficult-to-isolate USR2 restart
> failures.
> 
> After a USR2 signal our process tree winds up looking like this, with
> several master-esque processes listed as children (but without the
> "worker[N]" label):
> 
> app      14402  4.4  0.8 199612 70264 ?        S    14:07   0:04 unicorn_rails master -c config/unicorn.rb -E production -D
> app      14433  0.0  0.8 204540 68504 ?        Sl   14:07   0:00  \_ unicorn_rails worker[0] -c config/unicorn.rb -E production -D
> app      14435  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[1] -c config/unicorn.rb -E production -D
> app      14438  0.0  0.8 199748 65840 ?        S    14:07   0:00  \_ /usr/bin/ruby1.8 /usr/bin/unicorn_rails -c config/unicorn.rb -E production -D
> app      14440  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[3] -c config/unicorn.rb -E production -D
> app      14442  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[4] -c config/unicorn.rb -E production -D
> app      14445  0.0  0.8 199760 65840 ?        S    14:07   0:00  \_ /usr/bin/ruby1.8 /usr/bin/unicorn_rails -c config/unicorn.rb -E production -D
> app      14447  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[6] -c config/unicorn.rb -E production -D
> app      14449  0.0  0.8 204780 69272 ?        Sl   14:07   0:00  \_ unicorn_rails worker[7] -c config/unicorn.rb -E production -D

> Sending another USR2 signal will bring a new master into the mix as a
> child, spins up a single child worker of its own (which also resembles
> the "/usr/bin/ruby1.8" master-esque processes), and then fails to
> continue. 

Hi Jamie,

Odd, if I had to guess PIDs 14438 and 14445 are actually worker[2] and
worker[5] based on the PIDs relative to other workers.  So the new
master died right away, which really should've been logged...

> Further USR2 restarts will obviously do nothing, and we're forced to
> either kill -9 the stillborn master or cold-restart all of the
> unicorns. Nothing out of the ordinary is dumped to stderr or stdout

Anything in your before_fork/after_fork hooks?  Since it looks like
you're on a Linux system, can you strace the master while you send
it a USR2 and see if anything strange happens?

Also, can you strace the weird looking processes above and see if
they're doing anything?

> Starting unicorns fresh produces a nice process list where every child
> is listed cleanly as "unicorn_rails worker[N]"
> 
> We only have this issue in one of our applications, on a machine that
> has 1 Rails app & 2 Sinatra apps, all powered by nginx+unicorn. We've
> also only noticed this since upgrading to bundler from bundler08

I assume you're using regular "unicorn" to run your Sinatra apps and not
"unicorn_rails".  I made some largish cleanups to both for the 0.97.0
release, and perhaps some bugs slipped into the "_rails" variant.

Can you try regular "unicorn" with a config.ru for Rails?  I've stolen
this from the Rainbows! FAQ (http://rainbows.rubyforge.org/FAQ.html):

  For Rails 2.3.x and later, the following config.ru will work for you:

    ENV["RAILS_ENV"] ||= ENV["RACK_ENV"]
    require "config/environment"
    use Rails::Rack::Static
    run ActionController::Dispatcher.new

  For older versions of Rails, the following config.ru will work:

    ENV["RAILS_ENV"] ||= ENV["RACK_ENV"]
    require 'config/boot'
    require 'config/environment'
    require 'unicorn/app/old_rails'
    require 'unicorn/app/old_rails/static' # not needed with Unicorn 0.95+
    use Unicorn::App::OldRails::Static
    run Unicorn::App::OldRails.new

  One thing to watch out for is that RAILS_ENV will not be set in the
  environment for you, thus we set it to match RACK_ENV.

> Are the goofy worker processes in the process tree a real problem, or
> just a red herring?

Not sure if it's a problem, but with Bundler I assume Rack itself is a
bundled dependency, but you're starting unicorn_rails out of
/usr/bin/unicorn_rails which indicates Unicorn is not bundled (and won't
use the bundled Rack).  Can you ensure your unbundled Rack is the same
version as the bundled one to be on the safe side?

I've yet to try bundler 0.9 (and have barely used earlier), but you can
also try bundling Unicorn and using the bundled bin/unicorn(_rails)
launchers instead to ensure a consistent environment.

Unicorn currently ends up (auto)loading "rack/utils" before the
application is loaded, maybe it could (auto)load it after the app is
loaded for preload_app users.

-- 
Eric Wong



* Re: funky process tree + stillborn masters
  2010-04-08 23:55 ` Eric Wong
@ 2010-04-09  1:20   ` Eric Wong
  2010-04-12  2:52     ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2010-04-09  1:20 UTC (permalink / raw)
  To: unicorn list

Eric Wong <normalperson@yhbt.net> wrote:
> I assume you're using regular "unicorn" to run your Sinatra apps and not
> "unicorn_rails".  I made some largish cleanups to both for the 0.97.0
> release, and perhaps some bugs slipped into the "_rails" variant.

<snip>

> Jamie Wilkinson <jamie@tramchase.com> wrote:
> > Are the goofy worker processes in the process tree a real problem, or
> > just a red herring?
> 
> Not sure if it's a problem, but with Bundler I assume Rack itself is a
> bundled dependency, but you're starting unicorn_rails out of
> /usr/bin/unicorn_rails which indicates Unicorn is not bundled (and won't
> use the bundled Rack).  Can you ensure your unbundled Rack is the same
> version as the bundled one to be on the safe side?
> 
> I've yet to try bundler 0.9 (and have barely used earlier), but you can
> also try bundling Unicorn and using the bundled bin/unicorn(_rails)
> launchers instead to ensure a consistent environment.
> 
> Unicorn currently ends up (auto)loading "rack/utils" before the
> application is loaded, maybe it could (auto)load it after the app is
> loaded for preload_app users.

Do the following two patches help?  I've also pushed out a few
cleanups to unicorn.git and also put up a prerelease gem at:

  http://unicorn.bogomips.org/files/gems/unicorn-0.97.0.7.g22e3.gem

Shortlog of changes since 0.97.0:

Eric Wong (7):
      tests: fix to run under Ruby 1.9.2dev
      KNOWN_ISSUES: document Array#shuffle / Array#sample issue under 1.9
      unicorn_rails: use TOPLEVEL_BINDING for eval
      unicorn_rails: respect user's encoding in config.ru in 1.9
      unicorn_rails: rename variable for consistency
      bin/*: remove unnecessary listeners variable
      unicorn: load constants after app has loaded

From e1a72d58add4260feb6da56d9d588270173da74f Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Thu, 8 Apr 2010 17:10:46 -0700
Subject: [PATCH] unicorn_rails: use TOPLEVEL_BINDING for eval

This is to ensure there are no namespace inconsistencies
---
 bin/unicorn_rails |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/bin/unicorn_rails b/bin/unicorn_rails
index 37ee027..de2361e 100755
--- a/bin/unicorn_rails
+++ b/bin/unicorn_rails
@@ -148,7 +148,7 @@ def rails_builder(config, daemonize)
     when /\.ru$/
       raw = File.open(config, "rb") { |fp| fp.sysread(fp.stat.size) }
       raw.sub!(/^__END__\n.*/, '')
-      eval("Rack::Builder.new {(#{raw}\n)}.to_app", nil, config)
+      eval("Rack::Builder.new {(#{raw}\n)}.to_app", TOPLEVEL_BINDING, config)
     else
       require config
       Object.const_get(File.basename(config, '.rb').capitalize)
-- 

From 22e3ed4de0e89b97dac91c95c796eb8a7f93e5de Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Thu, 8 Apr 2010 18:05:43 -0700
Subject: [PATCH] unicorn: load constants after app has loaded

This will help ensure we use the same version of Rack the
application uses and avoid loading conflicting/incompatible
versions.
---
 lib/unicorn.rb |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/unicorn.rb b/lib/unicorn.rb
index b63abeb..75ce09d 100644
--- a/lib/unicorn.rb
+++ b/lib/unicorn.rb
@@ -800,15 +800,15 @@ module Unicorn
 
     def build_app!
       if app.respond_to?(:arity) && app.arity == 0
-        # exploit COW in case of preload_app.  Also avoids race
-        # conditions in Rainbows! since load/require are not thread-safe
-        Unicorn.constants.each { |x| Unicorn.const_get(x) }
-
         if defined?(Gem) && Gem.respond_to?(:refresh)
           logger.info "Refreshing Gem list"
           Gem.refresh
         end
         self.app = app.call
+
+        # exploit COW in case of preload_app.  Also avoids race
+        # conditions in Rainbows! since load/require are not thread-safe
+        Unicorn.constants.each { |x| Unicorn.const_get(x) }
       end
     end

-- 
Eric Wong



* Re: funky process tree + stillborn masters
  2010-04-09  1:20   ` Eric Wong
@ 2010-04-12  2:52     ` Eric Wong
  2010-04-13  3:23       ` Jamie Wilkinson
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2010-04-12  2:52 UTC (permalink / raw)
  To: unicorn list

Eric Wong <normalperson@yhbt.net> wrote:
> Do the following two patches help?  I've also pushed out a few
> cleanups to unicorn.git and also put up a prerelease gem at:

Hi Jamie, did you get a chance to try this?

-- 
Eric Wong



* Re: funky process tree + stillborn masters
  2010-04-12  2:52     ` Eric Wong
@ 2010-04-13  3:23       ` Jamie Wilkinson
  2010-04-13  5:24         ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Jamie Wilkinson @ 2010-04-13  3:23 UTC (permalink / raw)
  To: unicorn list

Whew! After a long day of debugging, I think I've gotten closer to isolating this.

I've upgraded us to your pre-release gem but I can still reproduce the failure (both before & after a lot of the fixes below).

>> After a USR2 signal our process tree winds up looking like this, with
>> several master-esque processes listed as children (but without the
>> "worker[N]" label):
>> 
>> app      14402  4.4  0.8 199612 70264 ?        S    14:07   0:04 unicorn_rails master -c config/unicorn.rb -E production -D
>> app      14433  0.0  0.8 204540 68504 ?        Sl   14:07   0:00  \_ unicorn_rails worker[0] -c config/unicorn.rb -E production -D
>> app      14435  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[1] -c config/unicorn.rb -E production -D
>> app      14438  0.0  0.8 199748 65840 ?        S    14:07   0:00  \_ /usr/bin/ruby1.8 /usr/bin/unicorn_rails -c config/unicorn.rb -E production -D
>> app      14440  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[3] -c config/unicorn.rb -E production -D
>> app      14442  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[4] -c config/unicorn.rb -E production -D
>> app      14445  0.0  0.8 199760 65840 ?        S    14:07   0:00  \_ /usr/bin/ruby1.8 /usr/bin/unicorn_rails -c config/unicorn.rb -E production -D
>> app      14447  0.0  0.8 204540 68508 ?        Sl   14:07   0:00  \_ unicorn_rails worker[6] -c config/unicorn.rb -E production -D
>> app      14449  0.0  0.8 204780 69272 ?        Sl   14:07   0:00  \_ unicorn_rails worker[7] -c config/unicorn.rb -E production -D
>> 
>> Sending another USR2 signal will bring a new master into the mix as a
>> child, spins up a single child worker of its own (which also resembles
>> the "/usr/bin/ruby1.8" master-esque processes), and then fails to
>> continue. 
> 
> Anything in your before_fork/after_fork hooks?  Since it looks like
> you're on a Linux system, can you strace the master while you send
> it a USR2 and see if anything strange happens?

The only real content of our before_fork hook is a send-QUIT-on-first-worker, which I swapped out for the default SIGTTOU behavior. No change.
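(For reference, a sketch of that send-QUIT-on-fork idiom, along the lines of the example configs shipped with unicorn; the ".oldbin" pid-file path is an assumption:)

```ruby
# unicorn config fragment: tell the old master to shut down gracefully
# once the new master forks a worker.  The pid path is hypothetical.
before_fork do |server, worker|
  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exist?(old_pid) && server.pid != old_pid
    begin
      Process.kill(:QUIT, File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # old master already gone
    end
  end
end
```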


> I assume you're using regular "unicorn" to run your Sinatra apps and not
> "unicorn_rails".  I made some largish cleanups to both for the 0.97.0
> release and and perhaps some bugs slipped into the "_rails" variant.
> 
> Not sure if it's a problem, but with Bundler I assume Rack itself is a
> bundled dependency, but you're starting unicorn_rails out of
> /usr/bin/unicorn_rails which indicates Unicorn is not bundled (and won't
> use the bundled Rack).  Can you ensure your unbundled Rack is the same
> version as the bundled one to be on the safe side?
> 

My system & bundled rack versions match.

Swapped to vanilla "unicorn" instead of "unicorn_rails" -- also no dice

I switched to using "bundle exec unicorn", which uses RAILS_ROOT/vendor/bundler_gems/bin/unicorn instead of /usr/bin/unicorn. Was convinced this would be it, but no dice.

Attaching some relevant straces... the new "orphan" master & its 1st child are pretty boring while just hanging out:

Process 20738 attached - interrupt to quit
futex(0x2aaaaafb23c0, FUTEX_WAIT, 2, NULL

Sending a USR2 to the new, orphaned master...

Process 20738 attached - interrupt to quit
select(10, [9], [], [], {23, 661000})   = ? ERESTARTNOHAND (To be restarted)
--- SIGUSR2 (User defined signal 2) @ 0 (0) ---
rt_sigreturn(0xc)                       = -1 EINTR (Interrupted system call)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
clock_gettime(CLOCK_MONOTONIC, {2782804, 359708496}) = 0
select(0, [], [], [], {0, 0})           = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
fcntl(5, F_GETFL)                       = 0x801 (flags O_WRONLY|O_NONBLOCK)
write(5, "."..., 1)                     = 1
clock_gettime(CLOCK_MONOTONIC, {2782804, 360046496}) = 0
select(10, [9], [], [], {20, 464341}


I've also produced straces of the *original* master during USR2 restarts, both a success trace and a failure trace.
Here's a tarball with both complete traces as well as filtered/grepped ones:
http://jamiedubs.com/files/unicorn-strace.tgz

I've also found that kill -9'ing the 1st worker of the new orphaned master allows it to continue operation as normal (spinning up workers and taking control from the original master) -- suggesting something is up with just that first worker (!). I'm going to keep noodling with before_/after_fork strategies.

-jamie



* Re: funky process tree + stillborn masters
  2010-04-13  3:23       ` Jamie Wilkinson
@ 2010-04-13  5:24         ` Eric Wong
  2010-04-19 18:21           ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2010-04-13  5:24 UTC (permalink / raw)
  To: unicorn list

Jamie Wilkinson <jamie@tramchase.com> wrote:
> I've also produced straces of the *original* master during USR2
> restarts, both a success trace and a failure trace.  Here's a tarball
> with both complete traces as well as filtered/grepp'd ones:
> http://jamiedubs.com/files/unicorn-strace.tgz
> 
> I've also found that kill -9'ing the 1st worker of the new orphaned
> master allows it to continue operation as normal (spinning up workers
> and taking control from the original master) -- suggesting something
> is up with just that first worker (!). I'm going to keep noodling with
> before_/after_fork strategies.

Hi Jamie, your successful straces have this oddity in them:

  poll([{fd=7, events=POLLIN|POLLPRI}], 1, 0) = 1 ([{fd=7, revents=POLLIN|POLLHUP}
  read(7, ""..., 8192)                    = 0
  shutdown(7, 2 /* send and receive */)   = -1 ENOTCONN (Transport endpoint is not
  close(7)                                = 0

> Eric Wong <normalperson@yhbt.net> wrote:
> > Anything in your before_fork/after_fork hooks?  Since it looks like
> > you're on a Linux system, can you strace the master while you send
> > it a USR2 and see if anything strange happens?
> 
> The only real contents of our before_hook is a
> send-QUIT-on-first-worker, which I swapped out for the default SIGTTOU
> behavior. No change.

It looks like you should be disconnecting/reconnecting whatever is using
fd=7 in the before_fork/after_fork hooks respectively.

Unicorn (and mainline Ruby) do not use the poll() system call (which is
why that strace raised red flags), so there's another C extension
calling poll() in your master process (not good).  I remember one (or
all of) the Ruby Postgres libraries using poll() internally, and there
are likely other things that use poll() in Ruby as well.

Perhaps something like this in the config (assuming you're using
Rails ActiveRecord) (also see http://unicorn.bogomips.org/examples/)

  before_fork do |server,worker|
    defined?(ActiveRecord::Base) and
      ActiveRecord::Base.connection.disconnect!
  end

  after_fork do |server,worker|
    defined?(ActiveRecord::Base) and
      ActiveRecord::Base.establish_connection
  end

-- 
Eric Wong



* Re: funky process tree + stillborn masters
  2010-04-13  5:24         ` Eric Wong
@ 2010-04-19 18:21           ` Eric Wong
  2010-04-28  4:05             ` Jamie Wilkinson
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2010-04-19 18:21 UTC (permalink / raw)
  To: unicorn list; +Cc: Jamie Wilkinson

Hi Jamie, any more news on this issue?  I plan on doing a minor release
later today or tomorrow with a few minor bug fixes in unicorn.git

-- 
Eric Wong



* Re: funky process tree + stillborn masters
  2010-04-19 18:21           ` Eric Wong
@ 2010-04-28  4:05             ` Jamie Wilkinson
  2010-04-28  7:40               ` Eric Wong
  0 siblings, 1 reply; 10+ messages in thread
From: Jamie Wilkinson @ 2010-04-28  4:05 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn list

Update from the trenches: I've traced this down to the newrelic_rpm agent

Noticed this is not the 1st time this has been brought up, since newrelic spins up the stats collector in its own thread.

Attempted the (old) logger mutex monkeypatch mentioned in the unicorn docs without luck. Noodled with various permutations of NewRelic::Agent.shutdown in before/after_fork without success. NewRelic apparently has some compat issues with bundler, but that didn't affect it, nor did switching to the plugin.

I'm running the latest newrelic_rpm agent (2.11.2) and the latest unicorn (0.97.1). 

I imagine this is contention over its logfile. Is there any low-hanging fruit I should try?

I've also filed a bug with NewRelic:
http://support.newrelic.com/discussions/support/2577-newrelic-agentbundler-causing-stillborn-unicorn-processes?unresolve=true

-jamie

On Apr 19, 2010, at 11:21 AM, Eric Wong wrote:

> Hi Jamie, any more news on this issue?  I plan on doing a minor release
> later today or tomorrow with a few minor bug fixes in unicorn.git
> 
> -- 
> Eric Wong




* Re: funky process tree + stillborn masters
  2010-04-28  4:05             ` Jamie Wilkinson
@ 2010-04-28  7:40               ` Eric Wong
  2010-06-08 23:01                 ` Jamie Wilkinson
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Wong @ 2010-04-28  7:40 UTC (permalink / raw)
  To: Jamie Wilkinson; +Cc: unicorn list

Jamie Wilkinson <jamie@tramchase.com> wrote:
> Update from the trenches: I've traced this down to the newrelic_rpm
> agent
> 
> Noticed this is not the 1st time this has been brought up, since
> newrelic spins up the stats collector in its own thread.
> 
> Attempted the (old) logger mutex monkeypatch mentioned in the unicorn
> docs without luck. Noodled with various permutations of
> NewRelic::Agent.shutdown in before/after_fork without success.
> NewRelic apparently has some compat issues with bundler, but that
> didn't affect it, nor did switching to the plugin.
> 
> I'm running the latest newrelic_rpm agent (2.11.2) and the latest
> unicorn (0.97.1). 
> 
> I imagine this is contention over its logfile. Is there any
> low-hanging fruit I should try?

Hi Jamie, thanks for the follow up.

Exactly which version of Ruby + patchlevel are you using?  Some of the
1.8.7 releases had threading bugs in them which you may be hitting:

  http://redmine.ruby-lang.org/issues/show/1993
  http://www.daniel-azuma.com/blog/view/z2ysbx0e4c3it9/ruby_1_8_7_io_select_threading_bug

But ...

> I've also filed a bug with NewRelic:
> http://support.newrelic.com/discussions/support/2577-newrelic-agentbundler-causing-stillborn-unicorn-processes?unresolve=true

> # straces show there's a bad file descriptor read -- presumably logfiles.
> # I've noodled with shutting down the agent in unicorn before/after forks
> # without a lot of luck. Tried newrelic as a plugin with the same issue,
> # as well as some of the bundler fixes mentioned in the FAQ, as well as
> # the

Now that we know it's NewRelic, I suspect it could be reading from the
agent's client socket, not a log file.  You can map fd => files/sockets
with "ls -l /proc/$pid/fd" or "lsof -p $pid"

Perhaps in the before_fork hook you can try closing the TCPSocket (or
similar) that NewRelic is using.  Merely stopping the agent thread
isn't guaranteed to close the client socket properly (in fact, I can
almost guarantee it won't close the socket at the OS level).

Since you're on Linux, try putting the output of "ls -l /proc/#$$/fd" or
"lsof -p #$$" in both the after_fork+before_fork hooks to get an idea of
which descriptors are open across forks.
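A sketch of that debugging aid, assuming a Linux /proc filesystem (the logging approach here is an assumption; only the standard before_fork/after_fork hooks are unicorn's API):

```ruby
# unicorn config fragment: dump open file descriptors in both hooks
# so they can be diffed across the fork().  Assumes Linux /proc.
before_fork do |server, worker|
  server.logger.info("before_fork fds:\n" + `ls -l /proc/#{Process.pid}/fd`)
end

after_fork do |server, worker|
  server.logger.info("after_fork[#{worker.nr}] fds:\n" +
                     `ls -l /proc/#{Process.pid}/fd`)
end
```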

> # Unicorn doesn't play very nicely with threads, are there are any other
> # manual setup/teardown methods beyond NewRelic::Agent.shutdown I could
> # try to get fd's closed properly between forks?

There's nothing inherent to Unicorn that prevents it from playing nicely
with threads.  It's just not playing nicely with threaded code written
without potential fork() calls in mind.
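A minimal, self-contained illustration of that point (nothing unicorn- or NewRelic-specific): after fork(), only the calling thread survives in the child, so a library's background collector thread silently disappears while any descriptors it held stay open and unserviced:

```ruby
# Background threads do not survive fork(): the child starts with only
# the thread that called fork(), so an agent's collector thread is gone.
r, w = IO.pipe
agent = Thread.new { sleep }          # stand-in for a collector thread

pid = fork do
  w.syswrite(Thread.list.size.to_s)   # child sees only its main thread
  exit!(0)
end

Process.wait(pid)
w.close
puts r.read                           # => "1" (the collector thread is gone)
agent.kill
```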

-- 
Eric Wong



* Re: funky process tree + stillborn masters
  2010-04-28  7:40               ` Eric Wong
@ 2010-06-08 23:01                 ` Jamie Wilkinson
  0 siblings, 0 replies; 10+ messages in thread
From: Jamie Wilkinson @ 2010-06-08 23:01 UTC (permalink / raw)
  To: unicorn list

On Apr 28, 2010, at 12:05 AM, Jamie Wilkinson wrote:

> Update from the trenches: I've traced this down to the newrelic_rpm agent
> 
> I've also filed a bug with NewRelic:
> http://support.newrelic.com/discussions/support/2577-newrelic-agentbundler-causing-stillborn-unicorn-processes?unresolve=true

I'm happy to announce the recently released 2.12.2 newrelic_rpm gem fixed this long-running issue for me (newrelic_rpm causing some unicorns to fail to start).
http://rubygems.org/gems/newrelic_rpm/versions/2.12.2 

Huge thanks to Bill Kayser from NewRelic & Eric Wong for their patient debugging!

-jamie
http://jamiedubs.com



end of thread, other threads:[~2010-06-08 23:01 UTC | newest]

Thread overview: 10+ messages
2010-04-08 21:21 funky process tree + stillborn masters Jamie Wilkinson
2010-04-08 23:55 ` Eric Wong
2010-04-09  1:20   ` Eric Wong
2010-04-12  2:52     ` Eric Wong
2010-04-13  3:23       ` Jamie Wilkinson
2010-04-13  5:24         ` Eric Wong
2010-04-19 18:21           ` Eric Wong
2010-04-28  4:05             ` Jamie Wilkinson
2010-04-28  7:40               ` Eric Wong
2010-06-08 23:01                 ` Jamie Wilkinson

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/
