From mboxrd@z Thu Jan 1 00:00:00 1970 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS14383 205.234.109.0/24 X-Spam-Status: No, score=-0.5 required=5.0 tests=AWL,MSGID_FROM_MTA_HEADER, RP_MATCHES_RCVD shortcircuit=no autolearn=unavailable version=3.3.2 Path: news.gmane.org!not-for-mail From: Eric Wong Newsgroups: gmane.comp.lang.ruby.unicorn.general Subject: Re: Problem with binding UNIX listeners before checking PID Date: Mon, 4 Oct 2010 04:17:13 +0000 Message-ID: <20101004041713.GA28709@dcvr.yhbt.net> References: <8D95A44B-A098-43BE-B532-7D74BD957F31@darkridge.com> NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Trace: dough.gmane.org 1286166403 12956 80.91.229.12 (4 Oct 2010 04:26:43 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Mon, 4 Oct 2010 04:26:43 +0000 (UTC) To: unicorn list Original-X-From: mongrel-unicorn-bounces@rubyforge.org Mon Oct 04 06:26:38 2010 Return-path: Envelope-to: gclrug-mongrel-unicorn@m.gmane.org X-Original-To: mongrel-unicorn@rubyforge.org Delivered-To: mongrel-unicorn@rubyforge.org Content-Disposition: inline In-Reply-To: <8D95A44B-A098-43BE-B532-7D74BD957F31@darkridge.com> User-Agent: Mutt/1.5.18 (2008-05-17) X-BeenThere: mongrel-unicorn@rubyforge.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: mongrel-unicorn-bounces@rubyforge.org Errors-To: mongrel-unicorn-bounces@rubyforge.org Xref: news.gmane.org gmane.comp.lang.ruby.unicorn.general:717 Archived-At: Received: from rubyforge.org ([205.234.109.19]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1P2cdJ-0007NJ-Ay for gclrug-mongrel-unicorn@m.gmane.org; Mon, 04 Oct 2010 06:26:37 +0200 Received: from rubyforge.org (rubyforge.org [127.0.0.1]) by rubyforge.org (Postfix) with ESMTP id 3C6021858368; Mon, 4 Oct 2010 00:26:36 -0400 (EDT) Received: from dcvr.yhbt.net (dcvr.yhbt.net [64.71.152.64]) by rubyforge.org (Postfix) with ESMTP id BA3511858368 for ; Mon, 4 Oct 2010 00:17:22 -0400 (EDT) Received: from localhost (unknown [127.0.2.5]) by dcvr.yhbt.net (Postfix) with ESMTP id CC4C31F974; Mon, 4 Oct 2010 04:17:13 +0000 (UTC) Jordan Ritter wrote: > Howdy. > > I have lately been frustrated by the following use case: > > 1. Run nginx/unicorn in production, listening on a UNIX socket > with a defined pid file. Things run good. > 2. Someone pushes code, unicorn restarts just fine, workers are > all up and running. > 3. But someone is suspicious, or maybe they forget which > box they're logged into, so they invoke unicorn manually. > Same directory, same settings. > > 4. It looks like the pid file check kicked in, because unicorn > refuses to boot - hey, it's already running, bugger off. great. > 5. BUT, this happened *after* the listener processing: the > manually-invoked unicorn unlinks the real unicorn master's unix > listener, so it's left dead in the water and everybody loses. > > unicorn master doesn't know its listener is actually gone (but lsof shows > open unix socket fd, netstat shows unix socket still present, so cursory > investigation is misleading), but nginx keeps spewing ECONNREFUSEDs > because the unix socket it's hitting belongs to that accidental unicorn > instance that already decided not to stick around. > > I think this is effectively about a behavioral difference in > Unicorn::SocketHelper#bind_listen around the handling of UNIX vs. TCP > sockets (this doesn't happen with TCP sockets because there's no > unlink/disconnect step), and the fact that HttpServer#start evaluates > the listener config before the PID path/config. > > Now I see comments in and around HttpServer#initialize talking about races > wrt binding to the listener and whatnot, and being newish to the codebase > I admit I haven't yet fully absorbed all the considerations at play. > > But I think it's fair to say that killing the listener(s) (in the UNIX > socket case) before discovering you shouldn't have run in the first place > (from the PID file) qualifies as buggy/bad/broken behavior. Hi Jordan, Thanks for the detailed bug report. I knew from experience with other daemons that lingering UNIX sockets caused troubles for some users, but I failed to take into account the case where a user mistakenly starting the process twice. Yes, getting pid file writing/ordering "right"[1] is very tricky. > I might suggest simply swapping their processing order in #start, but > given the complexity of in-place restarts and other race considerations, > I have doubts solving this would be that easy. That wouldn't work if pid files weren't in use at all. > Any thoughts/ideas? A simpler check would be to use connect(2) (but not make any HTTP request) to see if the socket is alive. Patch coming. [1] - I don't believe there actually is a way to always be right, just less bad/broken than the alternatives. -- Eric Wong _______________________________________________ Unicorn mailing list - mongrel-unicorn@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-unicorn Do not quote signatures (like this one) or top post when replying