unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
From: Simon Eskildsen <simon.eskildsen@shopify.com>
To: Eric Wong <e@80x24.org>
Cc: unicorn-public@bogomips.org
Subject: Re: [PATCH] check_client_connection: use tcp state on linux
Date: Mon, 27 Feb 2017 06:44:36 -0500
Message-ID: <CAO3HKM6H33D5=3=TwPJYKST26dkVyh4dkfebxFpf5c7h+jv7XQ@mail.gmail.com>
In-Reply-To: <20170225231243.GA6224@dcvr.yhbt.net>

> I prefer we use a hash or case statement.  Both allow more
> optimization in the YARV VM of CRuby (opt_aref and
> opt_case_dispatch in insns.def).  case _might_ be a little
> faster if there's no constant lookup overhead, but
> a microbench or dumping the bytecode will be necessary
> to be sure :)
>
> A hash or a case can also help portability-wise in case
> we hit a system where these numbers are non-sequential;
> or if we forgot something.

Good point. I double-checked all the states on Linux and found that we
were missing TCP_CLOSING [1] [2]. This is a state where the other side
is closed, and you have buffered data on your side. It doesn't seem
like this would ever happen in Unicorn, but I think we should include
it for completeness. This also means the range becomes non-sequential.
I looked at illumos (Solaris-derived) [4] and FreeBSD [3], and for the
TCP states we're interested in they also appear to have a
non-contiguous range.
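
To make the gap concrete, here is a minimal sketch of the Linux
numbering from tcp_states.h [1]. The constant names LINUX_TCP_STATES
and CLOSED_NAMES are made up for this illustration and are not part of
the patch; the numbers apply to Linux only.

```ruby
# Linux TCP state numbers (include/net/tcp_states.h). Illustration only;
# other operating systems number these states differently.
LINUX_TCP_STATES = {
  1  => :ESTABLISHED,
  2  => :SYN_SENT,
  3  => :SYN_RECV,
  4  => :FIN_WAIT1,
  5  => :FIN_WAIT2,
  6  => :TIME_WAIT,
  7  => :CLOSE,
  8  => :CLOSE_WAIT,
  9  => :LAST_ACK,
  10 => :LISTEN,
  11 => :CLOSING
}.freeze

# The states the check treats as "client is gone".
CLOSED_NAMES = [:TIME_WAIT, :CLOSE, :CLOSE_WAIT, :LAST_ACK, :CLOSING].freeze

closed_numbers = LINUX_TCP_STATES.select { |_, name| CLOSED_NAMES.include?(name) }
                                 .keys.sort

# LISTEN (10) sits between LAST_ACK (9) and CLOSING (11), so a simple
# Range comparison would wrongly include it.
contiguous = closed_numbers.each_cons(2).all? { |a, b| b == a + 1 }
```

With CLOSING included, `contiguous` is false, which is why a range
check no longer fits.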

My co-worker, Kir Shatrov, benchmarked a bunch of approaches to the
state check and found that case is a good solution [5].  Because
non-sequential states really do occur in common operating systems, I
think case is the way to go here, as you suggested. I've made sure to
short-circuit the common case of TCP_ESTABLISHED. I've only seen
CLOSE_WAIT in testing, but in the wild at large production scale
I would assume you would see TIME_WAIT and CLOSE. LAST_ACK seems
pretty unlikely to hit, but not impossible. As with CLOSING,
I've included LAST_ACK for completeness.

[1] https://github.com/torvalds/linux/blob/5924bbecd0267d87c24110cbe2041b5075173a25/include/net/tcp_states.h#L27
[2] https://github.com/torvalds/linux/blob/ca78d3173cff3503bcd15723b049757f75762d15/net/ipv4/tcp.c#L228
[3] https://github.com/freebsd/freebsd/blob/386ddae58459341ec567604707805814a2128a57/sys/netinet/tcp_fsm.h
[4] https://github.com/illumos/illumos-gate/blob/f7877f5d39900cfd8b20dd673e5ccc1ef7cc7447/usr/src/uts/common/netinet/tcp_fsm.h
[5] https://gist.github.com/kirs/11ba4ce84c08188c9f7eba9c639616a5
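
The case-versus-hash comparison can be reproduced with a small script
along these lines. This is a rough sketch, not the benchmark from [5]:
the method names, the CLOSED_STATE_HASH constant and the iteration
count are assumptions made for illustration, and the bytecode dump only
works on CRuby.

```ruby
require 'benchmark'

# Same shape as the method in the patch.
def closed_state_case?(state)
  case state
  when 1 # ESTABLISHED
    false
  when 6, 7, 8, 9, 11 # TIME_WAIT, CLOSE, CLOSE_WAIT, LAST_ACK, CLOSING
    true
  else
    false
  end
end

# Hash alternative; the default value makes unknown states return false.
CLOSED_STATE_HASH = Hash.new(false).merge!(
  6 => true, 7 => true, 8 => true, 9 => true, 11 => true
).freeze

def closed_state_hash?(state)
  CLOSED_STATE_HASH[state]
end

# Times the common ESTABLISHED path, then dumps the bytecode of the case
# version so the dispatch instruction can be inspected by eye.
def run_closed_state_benchmark(iterations = 1_000_000)
  Benchmark.bm(6) do |x|
    x.report('case') { iterations.times { closed_state_case?(1) } }
    x.report('hash') { iterations.times { closed_state_hash?(1) } }
  end
  if defined?(RubyVM::InstructionSequence)
    puts RubyVM::InstructionSequence.of(method(:closed_state_case?)).disasm
  end
end
```

Calling `run_closed_state_benchmark` prints the timings; results will
vary by Ruby version and machine, so treat them as indicative only.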

> Yep, we need to account for the UNIX socket case.  I forget if
> kgio even makes them different...

I read the implementation and verified by dumping the class when
testing on some test boxes. You are right: it's a plain Kgio::Socket
object, with no distinction between Kgio::TCPSocket and
Kgio::UnixSocket at the class level. Kgio only does this if they're
explicitly passed to override the class returned from #try_accept.
Unicorn doesn't do this.

I've tried to find a way to determine the socket domain (INET vs.
UNIX) on the socket object, but neither Ruby's Socket class nor Kgio
seems to expose this. I'm not entirely sure what the simplest way to
do this check would be. We could have the accept loop pass the correct
class to #try_accept based on the listening socket that came back from
#accept. If we passed the listening socket to #read after accept, we'd
know, but I don't like the request knowing about the listener
either. Alternatively, we could expose the socket domain in Kgio, but
that'll be problematic in the near-ish future as you've mentioned
wanting to move away from Kgio now that Ruby's IO library is at parity
as of Ruby 2.4.

What do you suggest pursuing here to check whether the client socket
is a TCP socket?
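
To illustrate the first option (choosing the client class from the
listener in the accept loop), here is a sketch that uses only Ruby's
standard socket library rather than Kgio. SketchTCPClient,
SketchUnixClient and the helper methods are hypothetical names invented
for this example; how this would map onto Kgio's #try_accept is left
open.

```ruby
require 'socket'

# Hypothetical marker subclasses, named for this sketch only.
class SketchTCPClient < Socket; end
class SketchUnixClient < Socket; end

# Pick the client class from the kind of listener that became readable.
def client_class_for(listener)
  TCPServer === listener ? SketchTCPClient : SketchUnixClient
end

# Stand-in for the accept loop: accept on the listener and wrap the new
# file descriptor in the class chosen above.
def accept_client(listener)
  client_class_for(listener).for_fd(listener.sysaccept)
end

# The request code then needs only a class check on the client socket
# and never has to see the listener.
def tcp_client?(socket)
  SketchTCPClient === socket
end
```

The trade-off is that the decision is made once per accept, and the
request object stays unaware of the listener.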

Below is a patch addressing the other concerns. I had to add
`require 'raindrops'` so the `defined?` check would do the right thing,
as the only other file that requires Raindrops is the worker one, which
is loaded after http_request. I can change the load order or require
raindrops in lib/unicorn.rb if you prefer.
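
The load-order issue comes from `defined?` being evaluated once, when
the class body runs. A small sketch with stand-in names (SketchDrops,
LoadedTooEarly and LoadedAfter are invented here; the module definition
stands in for requiring raindrops):

```ruby
# Evaluated once, while this class body is being loaded. The constant
# does not exist yet, so defined? returns nil.
class LoadedTooEarly
  TCP_INFO_DEFINED = defined?(SketchDrops::TCP_Info)
end

# Stand-in for the library being required later in the load order.
module SketchDrops
  class TCP_Info; end
end

# The same expression now sees the constant and returns "constant".
class LoadedAfter
  TCP_INFO_DEFINED = defined?(SketchDrops::TCP_Info)
end
```

Without the require ahead of the class body, the cached flag would
stay nil and the TCP state check would never be used.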

Missing is the socket type check. Thanks for your feedback!

---
 lib/unicorn/http_request.rb | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/lib/unicorn/http_request.rb b/lib/unicorn/http_request.rb
index 0c1f9bb..eedccac 100644
--- a/lib/unicorn/http_request.rb
+++ b/lib/unicorn/http_request.rb
@@ -2,6 +2,7 @@
 # :enddoc:
 # no stable API here
 require 'unicorn_http'
+require 'raindrops'

 # TODO: remove redundant names
 Unicorn.const_set(:HttpRequest, Unicorn::HttpParser)
@@ -29,6 +30,7 @@ class Unicorn::HttpParser
   # 2.2+ optimizes hash assignments when used with literal string keys
   HTTP_RESPONSE_START = [ 'HTTP', '/1.1 ']
   @@input_class = Unicorn::TeeInput
+  @@raindrops_tcp_info_defined = defined?(Raindrops::TCP_Info)
   @@check_client_connection = false

   def self.input_class
@@ -83,11 +85,7 @@ def read(socket)
       false until add_parse(socket.kgio_read!(16384))
     end

-    # detect if the socket is valid by writing a partial response:
-    if @@check_client_connection && headers?
-      self.response_start_sent = true
-      HTTP_RESPONSE_START.each { |c| socket.write(c) }
-    end
+    check_client_connection(socket) if @@check_client_connection

     e['rack.input'] = 0 == content_length ?
                       NULL_IO : @@input_class.new(socket, self)
@@ -108,4 +106,27 @@ def call
   def hijacked?
     env.include?('rack.hijack_io'.freeze)
   end
+
+  private
+
+  def check_client_connection(socket)
+    if @@raindrops_tcp_info_defined
+      tcp_info = Raindrops::TCP_Info.new(socket)
+      raise Errno::EPIPE, "client closed connection".freeze, [] if closed_state?(tcp_info.state)
+    elsif headers?
+      self.response_start_sent = true
+      HTTP_RESPONSE_START.each { |c| socket.write(c) }
+    end
+  end
+
+  def closed_state?(state)
+    case state
+    when 1 # ESTABLISHED
+      false
+    when 6, 7, 8, 9, 11 # TIME_WAIT, CLOSE, CLOSE_WAIT, LAST_ACK, CLOSING
+      true
+    else
+      false
+    end
+  end
 end
-- 
2.11.0
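
For trying the idea without Raindrops, the state lookup can be
approximated with Ruby's standard library on Linux. This is only a
sketch under assumptions: it relies on Socket::TCP_INFO being defined
and on tcpi_state being the first byte of struct tcp_info, both of
which hold on Linux but are not portable. The names tcp_state,
client_gone? and SKETCH_CLOSED_STATES are invented for this example.

```ruby
require 'socket'

# Linux numbers for TIME_WAIT, CLOSE, CLOSE_WAIT, LAST_ACK, CLOSING.
SKETCH_CLOSED_STATES = [6, 7, 8, 9, 11].freeze

# Linux-only assumption: tcpi_state is the first unsigned byte of the
# struct tcp_info returned by the TCP_INFO socket option.
def tcp_state(socket)
  opt = socket.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_INFO)
  opt.data.unpack('C').first
end

# True when the peer has closed, or is closing, the connection.
def client_gone?(socket)
  SKETCH_CLOSED_STATES.include?(tcp_state(socket))
end
```

On a freshly accepted connection `tcp_state` should report 1
(ESTABLISHED); once the client closes, the accepted socket typically
moves to CLOSE_WAIT.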

