From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,RCVD_IN_DNSWL_LOW,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from mail-wm0-x22a.google.com (mail-wm0-x22a.google.com [IPv6:2a00:1450:400c:c09::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 5C96A201B0 for ; Wed, 22 Feb 2017 12:02:59 +0000 (UTC) Received: by mail-wm0-x22a.google.com with SMTP id v77so357290wmv.0 for ; Wed, 22 Feb 2017 04:02:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=shopify.com; s=google; h=mime-version:from:date:message-id:subject:to :content-transfer-encoding; bh=aRW9wFtxyz7pusKnNrAe7B0in79mHABMjvdctNnv9to=; b=Kd8+kpm/T2Ev1rQJrMY/nPeON28Z8+ai5Z/m5XYrXi4tW8y7EWLnErdggUKlhFrz6Y cC1+zaojTEaACr/qQm3iboRATkOF10E5E9buNYKGU6FY/vZQiTicjPwmpIdnyrUfgkQn wUie2zJLewETEJK9QEDebwQ/0uPcKm01A3Clk= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:from:date:message-id:subject:to :content-transfer-encoding; bh=aRW9wFtxyz7pusKnNrAe7B0in79mHABMjvdctNnv9to=; b=IZwjAgVR/f/2bRgkwWFRZoW75pqrgde4qrVDcQGvv879MutlnFRU1yphUSM+OxeLHS 5nkXH0I66h26+dDOdbZXx3Ty1SAivrI8FBnh+uyDxHW68TdWgFuv1bmLfHB7SYd9Kzxb IZ4EMxBQ+i0cTygN2Xw+16647zsuP8eN+XgyGAoCWLlbikq4kizdfr99m6fefG0id18q rY5p1KKMj1CVGgzthf7uSfM9oaeJiQLSU/ba08Sw9uKN09QTyZDfc6mdMWBVFnjEA7ZG 9do/ruJCr8sRetU/5qTUkzC8f4BGUdFcG6CNwwm/s4ciRlYcy6X2imOk1+tbZL0Ffxuf 9rBg== X-Gm-Message-State: AMke39lnhsfNh2BdWnaWE8qtyva4TVbFRPGsuHaC/YWJ1D3yml37EuuBpLhAylzldJP9UwNDshSGFMcrKIbws4tt X-Received: by 10.28.1.216 with SMTP id 207mr2205553wmb.7.1487764977298; Wed, 22 Feb 2017 04:02:57 -0800 (PST) MIME-Version: 1.0 Received: by 10.28.194.195 with HTTP; Wed, 22 Feb 2017 04:02:56 -0800 (PST) From: Simon Eskildsen Date: Wed, 22 Feb 2017 07:02:56 -0500 Message-ID: Subject: check_client_connection using getsockopt(2) To: unicorn-public@bogomips.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable List-Id: Hello! Almost five years ago Tom Burns contributed the patch in collaboration with Eric that introduced the `check_client_connection` configuration option. To summarize the patch, it was a solution to a problem we have of rapid refreshes during sales where Unicorn would render a page 5 times if an eager customer refreshed Shopify 5 times, despite only seeing one-rendering. This is a large amount of lost capacity. Four of these connections would effectively be in a `CLOSE` state in the backlog, get `accept(2)`ed and a response would be sent back only to get an error that the client had closed its connection. The patch solved this problem by instead of doing a single `write(2)`, it would do a write of the generic HTTP version line, then jump into the middleware stack and render the Rack response in a second write. If the client had closed, the first `write(2)` of the HTTP version header would usually throw an exception causing Unicorn to bail before rendering the heavy Rack response. This saves a large amount of capacity during spiky traffic. A subsequent commit after testing by Eric revealed that: > This only affects clients connecting over Unix domain sockets and TCP via= loopback (127...*). It is unlikely to detect disconnects if the client is = on a remote host (even on a fast LAN). Thanks for that testing Eric. If we hadn't stumbled upon this in the documentation proactively, this wouldn't have been easy to debug in production. In my testing, I can confirm Eric's tests. My testing essentially consists of a snippet like the following to send rapid requests and then closing the client. I've confirmed with Wireshark this is roughly how popular browsers behave when refreshing fast on a slowly rendered page: 100.times do |i| client =3D TCPSocket.new("some-unicorn", 20_000) client.write("GET /collections/#{rand(10000)} HTTP/1.1\r\nHost:walrusser.myshopify.com\r\n\r\n") client.close end This confirms Eric's comment that the existing `check_client_connection` works perfectly on loopback, but as soon as you put an actual network between the Unicorn and client=E2=80=94it's only effective 20% of the time, even with `TCP_NODELAY`. I'm assuming, due to buffering, even when disabling Nagle's. As we're changing our architecture, we move from ngx (lb) -> ngx (host) -> unicorn to ngx (lb) -> unicorn. That means this patch will no longer work for us. I propose instead of the early `write(2)` hack, that we use `getsockopt(2)` with the `TCP_INFO` flag and read the state of the socket. If it's in `CLOSE_WAIT` or `CLOSE`, we kill the connection and move on to the next. https://github.com/torvalds/linux/blob/8fa3b6f9392bf6d90cb7b908e07bd9016663= 9f0a/include/uapi/linux/tcp.h#L163 This is not going to be portable, but we can do this on only Linux which I suspect is where most production deployments of Unicorn that would benefit from this feature run. It's not useful in development (which is likely more common to not be Linux). We could also introduce it under a new configuration option if desired. In my testing, this works to reject 100% of requests early when not on loopback. The code is essentially: def client_closed? tcp_info =3D socket.getsockopt(Socket::SOL_TCP, Socket::TCP_INFO) state =3D tcp_info.unpack("c")[0] state =3D=3D TCP_CLOSE || state =3D=3D TCP_CLOSE_WAIT end This could be called at the top of `#process_client` in `http_server.rb`. Would there be interest in this upstream? Any comments on this proposed implementation? Currently, we're using a middleware with the Rack hijack API, but this seems like it belongs at the webserver level.