unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
From: Jean Boussier <jean.boussier@shopify.com>
To: unicorn-public@yhbt.net
Subject: [PATCH] Master promotion with SIGURG (CoW optimization)
Date: Tue, 5 Jul 2022 22:05:11 +0200	[thread overview]
Message-ID: <b9a7c1b1-0242-af8c-71a2-4d659ab4647d@shopify.com> (raw)

Unicorn rely on Copy on Write to limit applications' memory usage,
however Ruby Copy on Write performance isn't exactly perfect.
As code get executed various internal performance caches get warmed
which cause shared pages to be invalidated.

While there's certainly some improvements to be made to Ruby itself
on that front, it will likely get worse in the near future if YJIT
become popular, as it will generate executable code in the workers
hence won't be shared.

One way effective CoW performance could be improved would be to
periodically promote one worker as the new master. Since that worker
processed some requests, VM caches, JITed code, etc should be somewhat
warm already, hence shared pages should be dirtied less frequently after
each promotion.

Puma 5.0.0 introduced a similar experimental feature called
`fork_worker` or `refork`.

It's a bit more limited though, as instead of promoting a new
master and exiting, they only ask worker #0 to fork itself to
replace its siblings. So once all workers are replaces, there
is 3 levels of processes: cluster -> first_worker -> other_workers.

They also include automatic reforking after a predetermined amount
of requests (1k by default).

The happy path works pretty much like this:

- Assume daemonized mode.

- master: on `SIGURG`
   - Forward `SIGURG` to a chosen worker.

- worker: on `SIGURG` (become a master)
   - do the prep, register traps, open self-pipe, etc
   - don't spawn any worker, set number of wokers to 0.
   - send `SIGWHINCH` to master.

- old master: on `SIGWHINCH`
   - Gracefully shutdown workers, like during a reexec.

- old master: when a worker is reaped.
   - Send `SIGTTIN` to the new master
   - If it was the last worker:
     - replace pidfile
     - exit

This patch is not exactly production quality yet:

   - I need to write some tests
   - There's a race condition that can cause the promoted master
     master to have one less worker than required. Need to be addressed.
   - The pidfile replacement should be improved.
   - Multiple corner cases like a `QUIT` during promotion are not handled.

However it work well enough for demonstration.

So I figured I'd ask wether the feature is desired upstream
before investing more effort into it.

For this to be used in production without too much integration effort
I think automatic promotion based on some criteria like number of
request or process lifetime would be needed.

Ideally a hook interface to programatically trigger promotion would
be very valuable as I'd likely want to trigger promotion
based on memory metrics read from `/proc/<pid>/smaps_rollup`.
---
  SIGNALS                        |   4 ++
  lib/unicorn.rb                 |   2 +-
  lib/unicorn/http_server.rb     | 115 +++++++++++++++++++++++++--------
  lib/unicorn/promoted_worker.rb |  40 ++++++++++++
  4 files changed, 134 insertions(+), 27 deletions(-)
  create mode 100644 lib/unicorn/promoted_worker.rb

diff --git a/SIGNALS b/SIGNALS
index 7321f2b..f5716b9 100644
--- a/SIGNALS
+++ b/SIGNALS
@@ -39,6 +39,10 @@ https://yhbt.net/unicorn/examples/init.sh
  * WINCH - gracefully stops workers but keep the master running.
    This will only work for daemonized processes.

+* URG - promote one of the existing workers as a new master, and gracefully
+  stop workers.
+  This will only work for daemonized processes.
+
  * TTIN - increment the number of worker processes by one

  * TTOU - decrement the number of worker processes by one
diff --git a/lib/unicorn.rb b/lib/unicorn.rb
index 1a50631..832f78d 100644
--- a/lib/unicorn.rb
+++ b/lib/unicorn.rb
@@ -133,6 +133,6 @@ def self.pipe # :nodoc:
  # :enddoc:

  %w(const socket_helper stream_input tee_input http_request configurator
-   tmpio util http_response worker http_server).each do |s|
+   tmpio util http_response worker promoted_worker http_server).each do |s|
    require_relative "unicorn/#{s}"
  end
diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
index 21f2a05..dd8f021 100644
--- a/lib/unicorn/http_server.rb
+++ b/lib/unicorn/http_server.rb
@@ -50,6 +50,7 @@ class Unicorn::HttpServer
    #   Unicorn::HttpServer::START_CTX[0] = "/home/bofh/2.3.0/bin/unicorn"
    START_CTX = {
      :argv => ARGV.map(&:dup),
+    :generation => 0,
      0 => $0.dup,
    }
    # We favor ENV['PWD'] since it is (usually) symlink aware for Capistrano
@@ -106,7 +107,7 @@ def initialize(app, options = {})
      @orig_app = app
      # list of signals we care about and trap in master.
      @queue_sigs = [
-      :WINCH, :QUIT, :INT, :TERM, :USR1, :USR2, :HUP, :TTIN, :TTOU ]
+      :WINCH, :QUIT, :INT, :TERM, :USR1, :USR2, :HUP, :TTIN, :TTOU, :URG ]

      @worker_data = if worker_data = ENV['UNICORN_WORKER']
        worker_data = worker_data.split(',').map!(&:to_i)
@@ -119,7 +120,24 @@ def initialize(app, options = {})

    # Runs the thing.  Returns self so you can run join on it
    def start
+    @new_master = false
      inherit_listeners!
+    promote
+
+    # write pid early for Mongrel compatibility if we're not inheriting 
sockets
+    # This is needed for compatibility some Monit setups at least.
+    # This unfortunately has the side effect of clobbering valid PID if
+    # we upgrade and the upgrade breaks during preload_app==true && 
build_app!
+    self.pid = config[:pid]
+
+    build_app! if preload_app
+    bind_new_listeners!
+
+    spawn_missing_workers
+    self
+  end
+
+  def promote
      # this pipe is used to wake us up from select(2) in #join when signals
      # are trapped.  See trap_deferred.
      @self_pipe.replace(Unicorn.pipe)
@@ -130,18 +148,6 @@ def start
      # Note that signals don't actually get handled until the #join method
      @queue_sigs.each { |sig| trap(sig) { @sig_queue << sig; 
awaken_master } }
      trap(:CHLD) { awaken_master }
-
-    # write pid early for Mongrel compatibility if we're not inheriting 
sockets
-    # This is needed for compatibility some Monit setups at least.
-    # This unfortunately has the side effect of clobbering valid PID if
-    # we upgrade and the upgrade breaks during preload_app==true && 
build_app!
-    self.pid = config[:pid]
-
-    build_app! if preload_app
-    bind_new_listeners!
-
-    spawn_missing_workers
-    self
    end

    # replaces current listener set with +listeners+.  This will
@@ -178,16 +184,16 @@ def logger=(obj)
      Unicorn::HttpRequest::DEFAULTS["rack.logger"] = @logger = obj
    end

-  def clobber_pid(path)
+  def clobber_pid(path, content = $$)
      unlink_pid_safe(@pid) if @pid
      if path
        fp = begin
-        tmp = "#{File.dirname(path)}/#{rand}.#$$"
+        tmp = "#{File.dirname(path)}/#{rand}.#{content}"
          File.open(tmp, File::RDWR|File::CREAT|File::EXCL, 0644)
        rescue Errno::EEXIST
          retry
        end
-      fp.syswrite("#$$\n")
+      fp.syswrite("#{content}\n")
        File.rename(fp.path, path)
        fp.close
      end
@@ -279,6 +285,11 @@ def join
        end
        @ready_pipe = @ready_pipe.close rescue nil
      end
+
+    if @promoted
+      Process.kill(:WINCH, Process.ppid)
+    end
+
      begin
        reap_all_workers
        case @sig_queue.shift
@@ -292,6 +303,13 @@ def join
            @logger.debug("waiting #{sleep_time}s after 
suspend/hibernation")
          end
          maintain_worker_count if respawn
+
+        if @new_master && @new_master.ready? && @workers.empty?
+          # TODO: we should handle the new master dying like with reexec.
+          clobber_pid(pid, @new_master.pid)
+          break
+        end
+
          master_sleep(sleep_time)
        when :QUIT # graceful shutdown
          break
@@ -305,10 +323,20 @@ def join
          soft_kill_each_worker(:USR1)
        when :USR2 # exec binary, stay alive in case something went wrong
          reexec
+      when :URG
+        if $stdin.tty?
+          logger.info "SIGURG ignored because we're not daemonized"
+        else
+          promote_new_master
+        end
        when :WINCH
          if $stdin.tty?
            logger.info "SIGWINCH ignored because we're not daemonized"
          else
+          if @new_master
+            @new_master.ready!
+          end
+
            respawn = false
            logger.info "gracefully stopping all workers"
            soft_kill_each_worker(:QUIT)
@@ -408,11 +436,30 @@ def reap_all_workers
          worker = @workers.delete(wpid) and worker.close rescue nil
          @after_worker_exit.call(self, worker, status)
        end
+
+      if @new_master && @new_master.ready?
+        @new_master.scale(@workers.size)
+      end
      rescue Errno::ECHILD
        break
      end while true
    end

+  def promote_new_master
+    # Promoting the oldest worker
+    # TODO: handle `@new_master` being dead.
+    if @new_master
+      logger.error "can't promote because worker=#{@new_master.nr} is 
being promoted"
+    elsif pair = @workers.first
+      @new_master = Unicorn::PromotedWorker.new(*pair, worker_processes)
+      @workers.delete(@new_master.pid)
+      logger.info "master promoting worker=#{@new_master.worker.nr}"
+      @new_master.promote
+    else
+      logger.error "can't promote because there is no existing workers"
+    end
+  end
+
    # reexecutes the START_CTX with a new binary
    def reexec
      if @reexec_pid > 0
@@ -516,10 +563,11 @@ def murder_lazy_workers
    end

    def after_fork_internal
+    self.worker_processes = 0
      @self_pipe.each(&:close).clear # this is master-only, now
      @ready_pipe.close if @ready_pipe
      Unicorn::Configurator::RACKUP.clear
-    @ready_pipe = @init_listeners = @before_exec = @before_fork = nil
+    @ready_pipe = nil

      # The OpenSSL PRNG is seeded with only the pid, and apps with 
frequently
      # dying workers can recycle pids
@@ -545,6 +593,13 @@ def spawn_missing_workers
        unless pid
          after_fork_internal
          worker_loop(worker)
+
+        if @promoted
+          worker.tick = 0
+          promote
+          join
+        end
+
          exit
        end

@@ -678,19 +733,22 @@ def init_worker_process(worker)
      trap(:CHLD, 'DEFAULT')
      @sig_queue.clear
      proc_name "worker[#{worker.nr}]"
-    START_CTX.clear
      @workers.clear

      after_fork.call(self, worker) # can drop perms and create listeners
      LISTENERS.each { |sock| sock.close_on_exec = true }

      worker.user(*user) if user.kind_of?(Array) && ! worker.switched
-    @config = nil
      build_app! unless preload_app
-    @after_fork = @listener_opts = @orig_app = nil
+    @listener_opts = @orig_app = nil
      readers = LISTENERS.dup
      readers << worker
      trap(:QUIT) { nuke_listeners!(readers) }
+    @promoted = false
+    trap(:URG) do
+      @promoted = true
+      START_CTX[:generation] += 1
+    end
      readers
    end

@@ -706,11 +764,11 @@ def reopen_worker_logs(worker_nr)

    def prep_readers(readers)
      wtr = Unicorn::Waiter.prep_readers(readers)
-    @timeout *= 500 # to milliseconds for epoll, but halved
+    @select_timeout = @timeout * 500 # to milliseconds for epoll, but 
halved
      wtr
    rescue
      require_relative 'select_waiter'
-    @timeout /= 2.0 # halved for IO.select
+    @select_timeout = @timeout / 2.0 # halved for IO.select
      Unicorn::SelectWaiter.new
    end

@@ -720,7 +778,7 @@ def prep_readers(readers)
    def worker_loop(worker)
      readers = init_worker_process(worker)
      waiter = prep_readers(readers)
-    reopen = false
+    promote = reopen = false

      # this only works immediately if the master sent us the signal
      # (which is the normal case)
@@ -739,12 +797,17 @@ def worker_loop(worker)
            process_client(client)
            worker.tick = time_now.to_i
          end
+        if @promoted
+          worker.tick = time_now.to_i
+          return
+        end
+
          break if reopen
        end

        # timeout so we can .tick and keep parent from SIGKILL-ing us
        worker.tick = time_now.to_i
-      waiter.get_readers(ready, readers, @timeout)
+      waiter.get_readers(ready, readers, @select_timeout)
      rescue => e
        redo if reopen && readers[0]
        Unicorn.log_error(@logger, "listen loop error", e) if readers[0]
@@ -823,8 +886,8 @@ def build_app!
    end

    def proc_name(tag)
-    $0 = ([ File.basename(START_CTX[0]), tag
-          ]).concat(START_CTX[:argv]).join(' ')
+    $0 = ([ File.basename(START_CTX[0]), tag, "(gen: 
#{START_CTX[:generation]})",
+          ]).concat(START_CTX[:argv]).compact.join(' ')
    end

    def redirect_io(io, path)
diff --git a/lib/unicorn/promoted_worker.rb b/lib/unicorn/promoted_worker.rb
new file mode 100644
index 0000000..2595182
--- /dev/null
+++ b/lib/unicorn/promoted_worker.rb
@@ -0,0 +1,40 @@
+# -*- encoding: binary -*-
+
+class Unicorn::PromotedWorker
+  attr_reader :pid, :worker
+
+  def initialize(pid, worker, expected_worker_processes)
+    @pid = pid
+    @worker = worker
+    @worker_processes = 0
+    @expected_worker_processes = expected_worker_processes
+    @ready = false
+  end
+
+  def ready?
+    @ready
+  end
+
+  def ready!
+    @ready = true
+  end
+
+  def promote
+    @worker.soft_kill(:URG)
+  end
+
+  def scale(old_master_worker_processes)
+    diff = @expected_worker_processes -
+      old_master_worker_processes -
+      @worker_processes
+
+    if diff > 0
+      diff.times { kill(:TTIN) }
+      @worker_processes += diff
+    end
+  end
+
+  def kill(sig)
+    Process.kill(sig, @pid)
+  end
+end
-- 
2.35.1


             reply	other threads:[~2022-07-05 20:05 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-05 20:05 Jean Boussier [this message]
2022-07-06  2:33 ` [PATCH] Master promotion with SIGURG (CoW optimization) Eric Wong
2022-07-06  7:40   ` Jean Boussier
2022-07-07 10:23     ` Eric Wong
     [not found]       ` <CANPRWbHTNiEcYq5qhN6Kio8Wg9a+2gXmc2bAcB2oVw4LZv8rcw@mail.gmail.com>
     [not found]         ` <CANPRWbGArasDtbAej4LsCOGeYZSrNz87p5kLjG+x__jHAn-5ng@mail.gmail.com>
2022-07-08  0:13           ` Eric Wong
2022-07-08  0:30         ` Eric Wong
2022-07-08  6:22           ` Jean Boussier
2022-09-21 22:16             ` Eric Wong
2022-09-22  6:43               ` Jean Boussier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://yhbt.net/unicorn/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b9a7c1b1-0242-af8c-71a2-4d659ab4647d@shopify.com \
    --to=jean.boussier@shopify.com \
    --cc=unicorn-public@yhbt.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).