From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS3215 2.6.0.0/16 X-Spam-Status: No, score=-3.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MISSING_HEADERS, RCVD_IN_DNSWL_HI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from mail-oi1-x232.google.com (mail-oi1-x232.google.com [IPv6:2607:f8b0:4864:20::232]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id AE6621F403 for ; Wed, 6 Jul 2022 07:41:00 +0000 (UTC) Authentication-Results: dcvr.yhbt.net; dkim=pass (1024-bit key; unprotected) header.d=shopify.com header.i=@shopify.com header.b="k685WKVT"; dkim-atps=neutral Received: by mail-oi1-x232.google.com with SMTP id be10so19016962oib.7 for ; Wed, 06 Jul 2022 00:41:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shopify.com; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:cc; bh=57E20XkS9Y+cKFln8iSb10O2+UNtRfUGLv+ZalXbJSg=; b=k685WKVT7ghGdlUgZ/M3J7JO7U556KXUzFPvHbVoIj/Dke7aInXlgRskXNTihGNQh9 lr6gt7A/n5X4iB/zp0PhglkESZ4KA7cqN4bPxYIxeH5TH1iBfXHGbKT5ntvVaT6pTpvf 8dpbagtw6rjfwfDCrzVMeguzQS4Ky5/uJfj6w= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:cc; bh=57E20XkS9Y+cKFln8iSb10O2+UNtRfUGLv+ZalXbJSg=; b=VEL4zQNQiZ6ZLSPp1/TQuMpZI/vR09Kz2P/BneLxjcTTqk55alxYQz6+ZD28lylsyz ICPzkO12s2xttovASnGITVO64UtspishCQi6KOkjqwQsVUS2fM+XPZBR6zZAjnVIxd0Q 7QUwQN2hcdC1C07+AvERs0SUwTGNvzNeAwIYO4H6yYWbRxH3p+GpTRENh15pIj0JNaL7 0RjAd43BiTzpuMCfiy0Qcmux7OL+Y8bgSwTOzkV6v/FJJwwoIBb30DyaUwbtSiONXPKK VFNXleMqMaMRL3E36eAbqtlvC7oiVpWwMgryOdvzPSC+TcgAd1cjGNIwmOToaIMLyZsI +Vow== X-Gm-Message-State: AJIora8SdL116OxgTBc/v73j5vVd40TJR2Dhxaq4YB1CqhG65fDZgBjm jWfjcL+HjmPLe0gIkkm0go6xTH/A9pGnwk4b3aszDZx+CpsRcg== X-Google-Smtp-Source: AGRyM1u3/yMst1OdYJ9kMsM+mTgUihMz7s0KsVNt2FBSI88PlQjQFS3if80zsvoRFUsW+3cRQ4QQP+34pdqhjsT7KfI= X-Received: by 2002:a05:6808:21a6:b0:337:abf7:96c7 with SMTP id be38-20020a05680821a600b00337abf796c7mr1414494oib.98.1657093259445; Wed, 06 Jul 2022 00:40:59 -0700 (PDT) MIME-Version: 1.0 References: <20220706023352.M393316@dcvr> In-Reply-To: <20220706023352.M393316@dcvr> From: Jean Boussier Date: Wed, 6 Jul 2022 09:40:48 +0200 Message-ID: Subject: Re: [PATCH] Master promotion with SIGURG (CoW optimization) Cc: unicorn-public@yhbt.net Content-Type: text/plain; charset="UTF-8" List-Id: > OK, any numbers from Puma users which can be used to project > improvements for unicorn (and other servers)? There are some user reports here: https://github.com/puma/puma/issues/2258 but they are mixed in reports for two other new featurs. Some reports are up to 20-30% savings, but I'd expect unicorn to benefit even more from it, given that typical puma users spawn less processes than with unicorn. > > They also include automatic reforking after a predetermined amount > > of requests (1k by default). > > 1k seems like a lot of requests (more on this below) Agreed. Shared pages start being invalidated much faster than that. > > - master: on `SIGURG` > > - Forward `SIGURG` to a chosen worker. > > OK, I guess SIGURG is safe to use since nobody else relies on it (right?). That's my understanding, an alternative could be to re-use USR2 and have a config flag to define wether it is a rexec or refork. > Right. PID file races and corner cases were painful to deal > with in the earliest days of this project, and I don't look > forward to having to deal with them ever again... > > It looked like the world was moving towards socket-activation > with dedicated process managers and away from fragile PID files; > so being self-daemonization dependent seems like a step back. Agreed. We're currently running unicorn as PID 1 inside containers and I'm not exactly looking forward to have to minitor a PID file. Another avenue I explored was to keep the existing master and refork from one of the worker like puma, but re-assign the parent to the orignal master using PR_SET_CHILD_SUBREAPER. Here's my notes of how it works: ### Requirements: - Need `PR_SET_CHILD_SUBREAPER`, so Linux 3.4+ only (May 2012) - Need `File.mkfifo`, so Ruby 2.3 (Dec 2015), but can be shimed for older rubies. ### Flow: - master: set `PR_SET_CHILD_SUBREAPER` during boot. - master: create a temporary directory (`$TMP`). - master: spawn initial workers the old way. - master: on `SIGCHLD` (a child died): - Fake signal oldest worker (`SIGURG`). - Write the new worker number in the fake signal pipe (at the same time). - worker: on `SIGURG` (should spawn a sibling): - Note: worker fake signals are processed after the current request is completed. - Run `before_fork` callback (MAYBE: need a special `before_refork` callback?) - create a pipe. - daemonize (fork twice so that the new process is attributed to the nearest `CHILD_SUBREAPER`, aka the original master). - wait for the first child to exit. - write into the pipe to signal the parent is dead. - Run `after_fork` callback (MAYBE: need a special `after_refork` callback?) - new sibling after fork: - wait for the parent to exit (though the temporary pipe). - create a named pipe (`File.mkfifo`) at `$TMP/$WORKER_NUM.pipe`. - create a pidfile at `$TMP/WORKER_NUM.pid`. - Open the named pipe in `IO::RDONLY | IO::NONBLOCK` (otherwise it would block until the master open in write mode). - NB: Need to convert it to a `Kgio::Pipe` with `Kgio::Pipe.for_fd(f.to_i)`, and keep `f` need `autoclose = false`. - Create the `Unicorn::Worker` instance with that pipe and worker number. - Send `SIGURG` to the parent process (should be the master). - Wait for `SIGCONT` in the named pipe. - enter the worker loop. - master: on `SIGURG`: - for each outstanding refork order: - Look for `$TMP/$WORKER_NUM.pid` and `$TMP/$WORKER_NUM.pipe` - Read the pidfile - Open the pipe with `IO::RDONLY | IO::NONBLOCK` - NB: Need to convert it to a `Kgio::Pipe` with `Kgio::Pipe.for_fd(f.to_i)`, and keep `f` need `autoclose = false`. - Delete pidfile and named pipe. - Register that new worker normally. - Write `SIGCONT` into the pipe. I can share the patch if you are interested, but the extra complexity and Linux only features made me prefer the master-promotion approach. > > However it work well enough for demonstration. > > > > So I figured I'd ask wether the feature is desired upstream > > before investing more effort into it. > > It really depends on: > > 1) what memory savings and/or speedups can be achieved > > 2) what support can we expect from you (and other users) for > this feature and potential regressions going forward? > > I don't have the free time nor mental capacity to deal with > fallout from major new features such as this, myself. Yeah, this is just a RFC, I wouldn't submit a final patch without first deploying it on our infra with good results. I'm just on the fence between trying to get this upstream vs maintaining our own server that solely work like this, hence somewhat simpler and that can make more assumptions. > The hook interface would be yet another documentation+support > burden I'm worried about (and also more unicorn-specific stuff > to lock people in :<). Understandable. I suppose it can be done with a monitoring process. Check `smaps_rollup` and send `SIGURG` when deemed necessary. More moving pieces but keep unicorn simpler. > A completely different idea which should achieve the same goal, > completely independent of unicorn: > > App startup/loading can be made to trigger warmup paths which > hit common endpoints. IOW, app loading could run a small > warmup suite which simulates common requests to heat up caches > and trigger JIT. Yeah, I don't really believe in this for a few reasons: - That would slow boot time. - On large apps there's just too many codepath for this to be realistic. - Many endpoints require database and other network access you probably don't want in the master. - Endpoints may have side effects. > Ultimately (long-term for ruby-core), it would be better to make > JIT (and perhaps even VM cache) results persistent and shareable > across different processes, somehow... That would help all Ruby > tools, even short-lived CLI tools. Portable locking would be a > pain, I suppose, but proprietary OSes are always a pain :P Based on my limited understanding of both JIT and VM caches, I don't think that's really possible. The VM itself could definitely do better CoW wise, and I have some proposals on that front (https://bugs.ruby-lang.org/issues/18885) but that will take time and will never be perfect. That's why I think periodically promoting a new master has potential. > There's some line-wrapping issues caused by your MUA; Urk. Ok, trying another client now, and I'll resend the patch. > Perhaps documenting this as experimental and subject to removal > at any time would make the addition of a major new feature an > easier pill to swallow; but I still think this is better outside > of app servers. Up to you, if you don't feel like maintaining such a feature I would perfectly understand. --- SIGNALS | 4 ++ lib/unicorn.rb | 2 +- lib/unicorn/http_server.rb | 115 +++++++++++++++++++++++++-------- lib/unicorn/promoted_worker.rb | 40 ++++++++++++ 4 files changed, 134 insertions(+), 27 deletions(-) create mode 100644 lib/unicorn/promoted_worker.rb diff --git a/SIGNALS b/SIGNALS index 7321f2b..f5716b9 100644 --- a/SIGNALS +++ b/SIGNALS @@ -39,6 +39,10 @@ https://yhbt.net/unicorn/examples/init.sh * WINCH - gracefully stops workers but keep the master running. This will only work for daemonized processes. +* URG - promote one of the existing workers as a new master, and gracefully + stop workers. + This will only work for daemonized processes. + * TTIN - increment the number of worker processes by one * TTOU - decrement the number of worker processes by one diff --git a/lib/unicorn.rb b/lib/unicorn.rb index 1a50631..832f78d 100644 --- a/lib/unicorn.rb +++ b/lib/unicorn.rb @@ -133,6 +133,6 @@ def self.pipe # :nodoc: # :enddoc: %w(const socket_helper stream_input tee_input http_request configurator - tmpio util http_response worker http_server).each do |s| + tmpio util http_response worker promoted_worker http_server).each do |s| require_relative "unicorn/#{s}" end diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb index 21f2a05..dd8f021 100644 --- a/lib/unicorn/http_server.rb +++ b/lib/unicorn/http_server.rb @@ -50,6 +50,7 @@ class Unicorn::HttpServer # Unicorn::HttpServer::START_CTX[0] = "/home/bofh/2.3.0/bin/unicorn" START_CTX = { :argv => ARGV.map(&:dup), + :generation => 0, 0 => $0.dup, } # We favor ENV['PWD'] since it is (usually) symlink aware for Capistrano @@ -106,7 +107,7 @@ def initialize(app, options = {}) @orig_app = app # list of signals we care about and trap in master. @queue_sigs = [ - :WINCH, :QUIT, :INT, :TERM, :USR1, :USR2, :HUP, :TTIN, :TTOU ] + :WINCH, :QUIT, :INT, :TERM, :USR1, :USR2, :HUP, :TTIN, :TTOU, :URG ] @worker_data = if worker_data = ENV['UNICORN_WORKER'] worker_data = worker_data.split(',').map!(&:to_i) @@ -119,7 +120,24 @@ def initialize(app, options = {}) # Runs the thing. Returns self so you can run join on it def start + @new_master = false inherit_listeners! + promote + + # write pid early for Mongrel compatibility if we're not inheriting sockets + # This is needed for compatibility some Monit setups at least. + # This unfortunately has the side effect of clobbering valid PID if + # we upgrade and the upgrade breaks during preload_app==true && build_app! + self.pid = config[:pid] + + build_app! if preload_app + bind_new_listeners! + + spawn_missing_workers + self + end + + def promote # this pipe is used to wake us up from select(2) in #join when signals # are trapped. See trap_deferred. @self_pipe.replace(Unicorn.pipe) @@ -130,18 +148,6 @@ def start # Note that signals don't actually get handled until the #join method @queue_sigs.each { |sig| trap(sig) { @sig_queue << sig; awaken_master } } trap(:CHLD) { awaken_master } - - # write pid early for Mongrel compatibility if we're not inheriting sockets - # This is needed for compatibility some Monit setups at least. - # This unfortunately has the side effect of clobbering valid PID if - # we upgrade and the upgrade breaks during preload_app==true && build_app! - self.pid = config[:pid] - - build_app! if preload_app - bind_new_listeners! - - spawn_missing_workers - self end # replaces current listener set with +listeners+. This will @@ -178,16 +184,16 @@ def logger=(obj) Unicorn::HttpRequest::DEFAULTS["rack.logger"] = @logger = obj end - def clobber_pid(path) + def clobber_pid(path, content = $$) unlink_pid_safe(@pid) if @pid if path fp = begin - tmp = "#{File.dirname(path)}/#{rand}.#$$" + tmp = "#{File.dirname(path)}/#{rand}.#{content}" File.open(tmp, File::RDWR|File::CREAT|File::EXCL, 0644) rescue Errno::EEXIST retry end - fp.syswrite("#$$\n") + fp.syswrite("#{content}\n") File.rename(fp.path, path) fp.close end @@ -279,6 +285,11 @@ def join end @ready_pipe = @ready_pipe.close rescue nil end + + if @promoted + Process.kill(:WINCH, Process.ppid) + end + begin reap_all_workers case @sig_queue.shift @@ -292,6 +303,13 @@ def join @logger.debug("waiting #{sleep_time}s after suspend/hibernation") end maintain_worker_count if respawn + + if @new_master && @new_master.ready? && @workers.empty? + # TODO: we should handle the new master dying like with reexec. + clobber_pid(pid, @new_master.pid) + break + end + master_sleep(sleep_time) when :QUIT # graceful shutdown break @@ -305,10 +323,20 @@ def join soft_kill_each_worker(:USR1) when :USR2 # exec binary, stay alive in case something went wrong reexec + when :URG + if $stdin.tty? + logger.info "SIGURG ignored because we're not daemonized" + else + promote_new_master + end when :WINCH if $stdin.tty? logger.info "SIGWINCH ignored because we're not daemonized" else + if @new_master + @new_master.ready! + end + respawn = false logger.info "gracefully stopping all workers" soft_kill_each_worker(:QUIT) @@ -408,11 +436,30 @@ def reap_all_workers worker = @workers.delete(wpid) and worker.close rescue nil @after_worker_exit.call(self, worker, status) end + + if @new_master && @new_master.ready? + @new_master.scale(@workers.size) + end rescue Errno::ECHILD break end while true end + def promote_new_master + # Promoting the oldest worker + # TODO: handle `@new_master` being dead. + if @new_master + logger.error "can't promote because worker=#{@new_master.nr} is being promoted" + elsif pair = @workers.first + @new_master = Unicorn::PromotedWorker.new(*pair, worker_processes) + @workers.delete(@new_master.pid) + logger.info "master promoting worker=#{@new_master.worker.nr}" + @new_master.promote + else + logger.error "can't promote because there is no existing workers" + end + end + # reexecutes the START_CTX with a new binary def reexec if @reexec_pid > 0 @@ -516,10 +563,11 @@ def murder_lazy_workers end def after_fork_internal + self.worker_processes = 0 @self_pipe.each(&:close).clear # this is master-only, now @ready_pipe.close if @ready_pipe Unicorn::Configurator::RACKUP.clear - @ready_pipe = @init_listeners = @before_exec = @before_fork = nil + @ready_pipe = nil # The OpenSSL PRNG is seeded with only the pid, and apps with frequently # dying workers can recycle pids @@ -545,6 +593,13 @@ def spawn_missing_workers unless pid after_fork_internal worker_loop(worker) + + if @promoted + worker.tick = 0 + promote + join + end + exit end @@ -678,19 +733,22 @@ def init_worker_process(worker) trap(:CHLD, 'DEFAULT') @sig_queue.clear proc_name "worker[#{worker.nr}]" - START_CTX.clear @workers.clear after_fork.call(self, worker) # can drop perms and create listeners LISTENERS.each { |sock| sock.close_on_exec = true } worker.user(*user) if user.kind_of?(Array) && ! worker.switched - @config = nil build_app! unless preload_app - @after_fork = @listener_opts = @orig_app = nil + @listener_opts = @orig_app = nil readers = LISTENERS.dup readers << worker trap(:QUIT) { nuke_listeners!(readers) } + @promoted = false + trap(:URG) do + @promoted = true + START_CTX[:generation] += 1 + end readers end @@ -706,11 +764,11 @@ def reopen_worker_logs(worker_nr) def prep_readers(readers) wtr = Unicorn::Waiter.prep_readers(readers) - @timeout *= 500 # to milliseconds for epoll, but halved + @select_timeout = @timeout * 500 # to milliseconds for epoll, but halved wtr rescue require_relative 'select_waiter' - @timeout /= 2.0 # halved for IO.select + @select_timeout = @timeout / 2.0 # halved for IO.select Unicorn::SelectWaiter.new end @@ -720,7 +778,7 @@ def prep_readers(readers) def worker_loop(worker) readers = init_worker_process(worker) waiter = prep_readers(readers) - reopen = false + promote = reopen = false # this only works immediately if the master sent us the signal # (which is the normal case) @@ -739,12 +797,17 @@ def worker_loop(worker) process_client(client) worker.tick = time_now.to_i end + if @promoted + worker.tick = time_now.to_i + return + end + break if reopen end # timeout so we can .tick and keep parent from SIGKILL-ing us worker.tick = time_now.to_i - waiter.get_readers(ready, readers, @timeout) + waiter.get_readers(ready, readers, @select_timeout) rescue => e redo if reopen && readers[0] Unicorn.log_error(@logger, "listen loop error", e) if readers[0] @@ -823,8 +886,8 @@ def build_app! end def proc_name(tag) - $0 = ([ File.basename(START_CTX[0]), tag - ]).concat(START_CTX[:argv]).join(' ') + $0 = ([ File.basename(START_CTX[0]), tag, "(gen: #{START_CTX[:generation]})", + ]).concat(START_CTX[:argv]).compact.join(' ') end def redirect_io(io, path) diff --git a/lib/unicorn/promoted_worker.rb b/lib/unicorn/promoted_worker.rb new file mode 100644 index 0000000..2595182 --- /dev/null +++ b/lib/unicorn/promoted_worker.rb @@ -0,0 +1,40 @@ +# -*- encoding: binary -*- + +class Unicorn::PromotedWorker + attr_reader :pid, :worker + + def initialize(pid, worker, expected_worker_processes) + @pid = pid + @worker = worker + @worker_processes = 0 + @expected_worker_processes = expected_worker_processes + @ready = false + end + + def ready? + @ready + end + + def ready! + @ready = true + end + + def promote + @worker.soft_kill(:URG) + end + + def scale(old_master_worker_processes) + diff = @expected_worker_processes - + old_master_worker_processes - + @worker_processes + + if diff > 0 + diff.times { kill(:TTIN) } + @worker_processes += diff + end + end + + def kill(sig) + Process.kill(sig, @pid) + end +end -- 2.35.1