From: Eric Wong
To: yahns-public@yhbt.net
Date: Thu, 20 Apr 2017 07:35:15 +0000
Subject: graceful shutdown can take forever
Message-ID: <20170420073515.GA6109@dcvr>

Mostly thinking out loud, here...

Yes, shutdown_timeout exists as a config directive, but it also kinda sucks since it doesn't currently distinguish between idle and active clients.

For timing out clients the server is writing to, that can be really tricky, since the server doesn't always know how much a client is buffering. So if your kernel socket buffers are 4MB and your client is on dialup downloading at 4K/s, the default shutdown_timeout of 15s is completely insufficient.

On https://YHBT.net/ where I host some gigantic git repositories and files over 1GB, I have the shutdown_timeout set to 86400 seconds (1 day), which is enough to make "systemctl restart" time out.

This only seems to be triggered by idle clients from some web crawlers, so yes, idle clients should be treated differently from active clients... I don't think people with dialup connections tend to download gigantic files off my server...

Now that I think about it again, cmogstored (the single-purpose server on which yahns is based) actually becomes single-threaded during the graceful shutdown phase, which might be a reasonable course for yahns to follow, too...

The multithreaded queue design can get very tricky as far as checking timeouts go, so simplifying the shutdown phase into a single-threaded portion removed the need for extra locking overhead (any amount of per-client locking overhead would spill into the regular non-shutdown event loop). It also avoided extra branches during normal operation as an additional win.

But, I knew all this back in 2013, so... why didn't yahns use it?

Hm... yahns does allow running arbitrary user-supplied code, which can have unpredictable runtimes even if bug-free. In contrast, cmogstored is single-purpose, and even the rotational disks cmogstored was designed for have more predictable upper-bound seek times than the dispatch time of user-supplied Rack apps.

So, yes, shutting down a multi-threaded event loop gracefully can be tricky...
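
To make the idle-vs-active idea above a bit more concrete, here is a rough sketch of what one pass of a single-threaded shutdown sweep could look like. It is written in Python purely for illustration and is not how yahns or cmogstored do it. The names Client, shutdown_sweep, idle_timeout and active_timeout are all made up for this example (yahns only has the single shutdown_timeout directive), and drain_seconds just redoes the dialup arithmetic, assuming 4MB means 4 * 1024 * 1024 bytes and 4K/s means 4096 bytes per second.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """Hypothetical per-client state; not a yahns or cmogstored structure."""
    name: str
    last_activity: float  # timestamp (seconds) of the last read or write
    in_flight: bool       # True while a response is still being written


def shutdown_sweep(clients, now, idle_timeout, active_timeout):
    """One pass of a single-threaded shutdown loop.

    Idle clients get the short timeout, clients with a response in
    flight get the long one.  Since only one thread runs this during
    shutdown, no per-client locking is needed here.
    Returns (kept, dropped).
    """
    kept, dropped = [], []
    for c in clients:
        limit = active_timeout if c.in_flight else idle_timeout
        if now - c.last_activity > limit:
            dropped.append(c)
        else:
            kept.append(c)
    return kept, dropped


def drain_seconds(buffered_bytes, bytes_per_second):
    """Time a client needs to drain data already sitting in socket buffers."""
    return buffered_bytes / bytes_per_second


if __name__ == "__main__":
    clients = [
        Client("crawler", last_activity=0.0, in_flight=False),
        Client("downloader", last_activity=0.0, in_flight=True),
    ]
    kept, dropped = shutdown_sweep(
        clients, now=100.0, idle_timeout=15.0, active_timeout=86400.0
    )
    print("dropped:", [c.name for c in dropped])
    print("kept:", [c.name for c in kept])
    # 4MB of kernel buffer at 4K/s, versus the 15s default timeout
    print("drain time (s):", drain_seconds(4 * 1024 * 1024, 4 * 1024))
```

With split timeouts like these, an idle crawler connection would be dropped after the short timeout while a slow download could keep the long one. This sketch does nothing about the harder part mentioned above: user-supplied Rack apps with unpredictable dispatch times.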