unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download mbox.gz: |
* Re: On USR2, new master runs with same PID
  2015-03-12  1:45  0% ` Eric Wong
@ 2015-03-12  6:26  0%   ` Kevin Yank
  0 siblings, 0 replies; 4+ results
From: Kevin Yank @ 2015-03-12  6:26 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public

[-- Attachment #1: Type: text/plain, Size: 5164 bytes --]

Thanks for your help, Eric!

> On 12 Mar 2015, at 12:45 pm, Eric Wong <e@80x24.org> wrote:
> 
> Kevin Yank <kyank@avalanche.com.au> wrote:
>> When I send USR2 to the master process (PID 19216 in this example), I
>> get the following in the Unicorn log:
>> 
>> I, [2015-03-11T23:47:33.992274 #6848]  INFO -- : executing ["/srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn", "/srv/ourapp/current/config.ru", "-Dc", "/srv/ourapp/shared/config/unicorn.rb", {10=>#<Kgio::UNIXServer:/srv/ourapp/shared/sockets/unicorn.sock>}] (in /srv/ourapp/releases/a0e8b5df474ad5129200654f92a76af00a750f47)
>> I, [2015-03-11T23:47:36.504235 #6848]  INFO -- : inherited addr=/srv/ourapp/shared/sockets/unicorn.sock fd=10
>> /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:206:in `pid=': Already running on PID:19216 (or pid=/srv/ourapp/shared/pids/unicorn.pid is stale) (ArgumentError)
> 
> Nothing suspicious until the above line...

That’s right.

>> And when I check, indeed, there is now unicorn.pid and
>> unicorn.pid.oldbin, both containing 19216.
> 
> Any chance you have a process manager or something else creating the
> (non-.oldbin) pid file behind unicorn's back?

It’s possible; I’m using eye (https://github.com/kostya/eye) as a process monitor. I’m not aware of it writing .pid files for processes that self-daemonize like Unicorn, though. And once one of my servers “goes bad” (i.e. Unicorn starts failing to restart in response to a USR2), it does so 100% consistently until I stop and restart Unicorn entirely. Based on that, I don’t believe it’s a race condition where my process monitor is slipping in a new unicorn.pid file some of the time.

> Can you check your process table to ensure there's not multiple
> unicorn instances running off the same config and pid files, too?

As far as I can tell, I have some servers that have gotten themselves into this “USR2 restart fails” state, while others are working just fine. In both cases, the Unicorn process tree (as show in htop) looks like this “at rest” (i.e. no deployments/restarts in progress):

unicorn master
`- unicorn master
`- unicorn worker[2]
|  `- unicorn worker[2]
`- unicorn worker[1]
|  `- unicorn worker[1]
`- unicorn worker[0]
   `- unicorn worker[0]

At first glance I’d definitely say it appears that I have two masters running from the same config files. However, there’s only one unicorn.pid file of course (the root process in the tree above), and when I try to kill -TERM the master process that doesn’t have a .pid file, the entire process tree exits. Am I misinterpreting the process table? Is this process list actually normal?

Thus far I’ve been unable to find any difference in state between a properly-behaving server and a misbehaving server, apart from the behaviour of the Unicorn master when it receives a USR2.

> Also, were there other activity (another USR2 or HUP) in the logs
> a few seconds beforehand?

No, didn’t see anything like that (and I was looking for it).

> What kind of filesystem / kernel is the pid file on?

EXT4 / Ubuntu Server 12.04 LTS

> A network FS or anything without the consistency guarantees provided
> by POSIX would not work for pid files.

Given my environment above, I should be OK, right?

> pid files are unfortunately prone to to nasty race conditions,
> but I'm not sure what you're seeing happens very often.

This has been happening pretty frequently across multiple server instances, and again once it starts happening on an instance, it keeps happening 100% of the time (until I restart Unicorn completely). So it’s not a rare edge case.

> -------------------8<-------------------
>>  sleep 10
>> 
>>  old_pid = "#{server.config[:pid]}.oldbin"
>>  if File.exists?(old_pid) && server.pid != old_pid
>>    begin
>>      Process.kill("QUIT", File.read(old_pid).to_i)
>>    rescue Errno::ENOENT, Errno::ESRCH
>>      # someone else did our job for us
>>    end
>>  end
> -------------------8<-------------------
> 
> I'd get rid of that hunk starting with the "sleep 10" (at least while
> debugging this issue).

If I simply delete this hunk, I’ll have old masters left running on my servers because they’ll never get sent the QUIT signal. I can definitely remove it temporarily (and kill the old master myself) while debugging, though.

> If you did a USR2 previously, maybe it could
> affect the current USR2 upgrade process.  Sleeping so long in the master
> like that is pretty bad it throws off timing and delays signal handling.

I’d definitely like to get rid of the sleep, as my restarts definitely feel slow. I’m not clear on what a better approach would be, though.

> That's a pretty fragile config and I regret ever including it in the
> example config files

Any better examples/docs you’d recommend I consult for guidance? Or is expecting to achieve a robust zero-downtime restart using before_fork/after_fork hooks unrealistic?

--
Kevin Yank
Chief Technology Officer, Avalanche Technology Group
http://avalanche.com.au/

ph: +61 4 2241 0083




^ permalink raw reply	[relevance 0%]

* Re: On USR2, new master runs with same PID
  2015-03-12  1:04  7% On USR2, new master runs with same PID Kevin Yank
@ 2015-03-12  1:45  0% ` Eric Wong
  2015-03-12  6:26  0%   ` Kevin Yank
  0 siblings, 1 reply; 4+ results
From: Eric Wong @ 2015-03-12  1:45 UTC (permalink / raw)
  To: Kevin Yank; +Cc: unicorn-public

Kevin Yank <kyank@avalanche.com.au> wrote:
> Having recently migrated our Rails app to MRI 2.2.0 (which may or may
> not be related), we’re experiencing problems with our Unicorn
> zero-downtime restarts.
> 
> When I send USR2 to the master process (PID 19216 in this example), I
> get the following in the Unicorn log:
>
> I, [2015-03-11T23:47:33.992274 #6848]  INFO -- : executing ["/srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn", "/srv/ourapp/current/config.ru", "-Dc", "/srv/ourapp/shared/config/unicorn.rb", {10=>#<Kgio::UNIXServer:/srv/ourapp/shared/sockets/unicorn.sock>}] (in /srv/ourapp/releases/a0e8b5df474ad5129200654f92a76af00a750f47)
> I, [2015-03-11T23:47:36.504235 #6848]  INFO -- : inherited addr=/srv/ourapp/shared/sockets/unicorn.sock fd=10
> /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:206:in `pid=': Already running on PID:19216 (or pid=/srv/ourapp/shared/pids/unicorn.pid is stale) (ArgumentError)

Nothing suspicious until the above line...

<snip>

> And when I check, indeed, there is now unicorn.pid and
> unicorn.pid.oldbin, both containing 19216.
> 
> What could cause this situation to arise?

Any chance you have a process manager or something else creating the
(non-.oldbin) pid file behind unicorn's back?

Can you check your process table to ensure there's not multiple
unicorn instances running off the same config and pid files, too?

Also, were there other activity (another USR2 or HUP) in the logs
a few seconds beforehand?

What kind of filesystem / kernel is the pid file on?
A network FS or anything without the consistency guarantees provided
by POSIX would not work for pid files.

pid files are unfortunately prone to to nasty race conditions,
but I'm not sure what you're seeing happens very often.

Likewise, the check for stale Unix domain socket paths at startup is
inevitably racy, too, but the window is usually small enough to be
unnoticeable.  But yes, just in case, check the process table to make
sure there aren't multiple, non-related masters running on off the
same paths.

<snip>

> before_fork do |server, worker|
>   ActiveRecord::Base.connection.disconnect!

-------------------8<-------------------
>   sleep 10
> 
>   old_pid = "#{server.config[:pid]}.oldbin"
>   if File.exists?(old_pid) && server.pid != old_pid
>     begin
>       Process.kill("QUIT", File.read(old_pid).to_i)
>     rescue Errno::ENOENT, Errno::ESRCH
>       # someone else did our job for us
>     end
>   end
-------------------8<-------------------

I'd get rid of that hunk starting with the "sleep 10" (at least while
debugging this issue).  If you did a USR2 previously, maybe it could
affect the current USR2 upgrade process.  Sleeping so long in the master
like that is pretty bad it throws off timing and delays signal handling.

That's a pretty fragile config and I regret ever including it in the
example config files

^ permalink raw reply	[relevance 0%]

* [ANN] unicorn 4.8.1 released
@ 2014-01-29  8:54  5% Eric Wong
  0 siblings, 0 replies; 4+ results
From: Eric Wong @ 2014-01-29  8:54 UTC (permalink / raw)
  To: mongrel-unicorn

Note: this is identical to the 4.8.0.1.g10a2 earlier.

fix races/error handling in worker SIGQUIT handler

This protects us from two problems:

1) we (or our app) somehow called IO#close on one of the sockets
   we listen on without removing it from the readers array.
   We'll ignore IOErrors from IO#close and assume we wanted to
   close it.

2) our SIGQUIT handler is interrupted by itself.  This can happen as
   a fake signal from the master could be handled and a real signal
   from an outside user is sent to us (e.g. from unicorn-worker-killer)
   or if a user uses the killall(1) command.

* http://unicorn.bogomips.org/
* mongrel-unicorn@rubyforge.org
* git://bogomips.org/unicorn.git
* http://unicorn.bogomips.org/NEWS.atom.xml
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[relevance 5%]

* On USR2, new master runs with same PID
@ 2015-03-12  1:04  7% Kevin Yank
  2015-03-12  1:45  0% ` Eric Wong
  0 siblings, 1 reply; 4+ results
From: Kevin Yank @ 2015-03-12  1:04 UTC (permalink / raw)
  To: unicorn-public

Having recently migrated our Rails app to MRI 2.2.0 (which may or may not be related), we’re experiencing problems with our Unicorn zero-downtime restarts.

When I send USR2 to the master process (PID 19216 in this example), I get the following in the Unicorn log:

I, [2015-03-11T23:47:33.992274 #6848]  INFO -- : executing ["/srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn", "/srv/ourapp/current/config.ru", "-Dc", "/srv/ourapp/shared/config/unicorn.rb", {10=>#<Kgio::UNIXServer:/srv/ourapp/shared/sockets/unicorn.sock>}] (in /srv/ourapp/releases/a0e8b5df474ad5129200654f92a76af00a750f47)
I, [2015-03-11T23:47:36.504235 #6848]  INFO -- : inherited addr=/srv/ourapp/shared/sockets/unicorn.sock fd=10
/srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:206:in `pid=': Already running on PID:19216 (or pid=/srv/ourapp/shared/pids/unicorn.pid is stale) (ArgumentError)
  from /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:134:in `start'
  from /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/bin/unicorn:126:in `<top (required)>'
  from /srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn:23:in `load'
  from /srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn:23:in `<main>'
E, [2015-03-11T23:47:36.519549 #19216] ERROR -- : reaped #<Process::Status: pid 6848 exit 1> exec()-ed
E, [2015-03-11T23:47:36.520296 #19216] ERROR -- : master loop error: Already running on PID:19216 (or pid=/srv/ourapp/shared/pids/unicorn.pid is stale) (ArgumentError)
E, [2015-03-11T23:47:36.520496 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:206:in `pid='
E, [2015-03-11T23:47:36.520650 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:404:in `reap_all_workers'
E, [2015-03-11T23:47:36.520790 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/lib/unicorn/http_server.rb:279:in `join'
E, [2015-03-11T23:47:36.520928 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/gems/unicorn-4.8.1/bin/unicorn:126:in `<top (required)>'
E, [2015-03-11T23:47:36.521115 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn:23:in `load'
E, [2015-03-11T23:47:36.521254 #19216] ERROR -- : /srv/ourapp/shared/bundle/ruby/2.2.0/bin/unicorn:23:in `<main>’

And when I check, indeed, there is now unicorn.pid and unicorn.pid.oldbin, both containing 19216.

What could cause this situation to arise?


Here’s my unicorn.rb FWIW:

# Set your full path to application.
app_path = "/srv/ourapp/current"

# Set unicorn options
worker_processes 3
preload_app true
timeout 30
listen "/srv/ourapp/shared/sockets/unicorn.sock", :backlog => 64

# Spawn unicorn master worker for user deploy (group: deploy)
user 'deploy', 'deploy'

# Fill path to your app
working_directory app_path

# Should be 'production' by default, otherwise use other env
rails_env = ENV['RAILS_ENV'] || 'production'

# Log everything to one file
stderr_path "/srv/ourapp/shared/log/unicorn.log"
stdout_path "/srv/ourapp/shared/log/unicorn.log"

# Set master PID location
pid "/srv/ourapp/shared/pids/unicorn.pid"

before_exec do |server|
  ENV["BUNDLE_GEMFILE"] = "#{app_path}/Gemfile"
end

before_fork do |server, worker|
  ActiveRecord::Base.connection.disconnect!

  sleep 10

  old_pid = "#{server.config[:pid]}.oldbin"
  if File.exists?(old_pid) && server.pid != old_pid
    begin
      Process.kill("QUIT", File.read(old_pid).to_i)
    rescue Errno::ENOENT, Errno::ESRCH
      # someone else did our job for us
    end
  end
end

after_fork do |server, worker|
  ActiveRecord::Base.establish_connection

  Sidekiq.configure_client do |config|
    config.redis = { namespace: 'sidekiq' }
  end
end


--
Kevin Yank
Chief Technology Officer, Avalanche Technology Group
http://avalanche.com.au/

ph: +61 4 2241 0083



^ permalink raw reply	[relevance 7%]

Results 1-4 of 4 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2015-03-12  1:04  7% On USR2, new master runs with same PID Kevin Yank
2015-03-12  1:45  0% ` Eric Wong
2015-03-12  6:26  0%   ` Kevin Yank
2014-01-29  8:54  5% [ANN] unicorn 4.8.1 released Eric Wong

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).