raindrops RubyGem user+dev discussion/patches/pulls/bugs/help
* Re: check_client_connection using getsockopt(2)
       [not found] <CAO3HKM49+aLD=KLij3zhJqkWnR7bCWVan0mOvxD85xfrW8RXOw@mail.gmail.com>
@ 2017-02-22 18:33 ` Eric Wong
  2017-02-22 20:09   ` Simon Eskildsen
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2017-02-22 18:33 UTC (permalink / raw)
  To: Simon Eskildsen; +Cc: unicorn-public, raindrops-public

Simon Eskildsen <simon.eskildsen@shopify.com> wrote:

<snip> great to know it's still working after all these years :>

> This confirms Eric's comment that the existing
> `check_client_connection` works perfectly on loopback, but as soon as
> you put an actual network between Unicorn and the client, it's only
> effective 20% of the time, even with `TCP_NODELAY`. I'm assuming this
> is due to buffering, even with Nagle's algorithm disabled. As we're
> changing our architecture from ngx (lb) -> ngx (host) -> unicorn to
> ngx (lb) -> unicorn, this patch will no longer work for us.

Side comment: I'm a bit curious how you guys arrived at removing
nginx at the host level (if you're allowed to share that info)

I've mainly kept nginx on the same host as unicorn, but used
haproxy or similar (with minimal buffering) at the LB level.
That allows better filesystem load distribution for large uploads.

> I propose, instead of the early `write(2)` hack, that we use
> `getsockopt(2)` with the `TCP_INFO` flag and read the state of the
> socket. If it's in `CLOSE_WAIT` or `CLOSE`, we kill the connection and
> move on to the next.
> 
> https://github.com/torvalds/linux/blob/8fa3b6f9392bf6d90cb7b908e07bd90166639f0a/include/uapi/linux/tcp.h#L163
> 
> This is not going to be portable, but we can do this only on Linux,
> which I suspect is where most production deployments of Unicorn that
> would benefit from this feature run. It's not useful in development
> (where non-Linux platforms are more common). We could also introduce
> it under a new configuration option if desired. In my testing, this
> works to reject 100% of requests early when not on loopback.

Good to know about its effectiveness!  I don't mind adding
Linux-only features as long as existing functionality still
works on the server-oriented *BSDs.

> The code is essentially:
> 
> TCP_CLOSE      = 7 # from the tcp_states enum in linux/tcp.h
> TCP_CLOSE_WAIT = 8
> 
> def client_closed?
>   tcp_info = socket.getsockopt(Socket::SOL_TCP, Socket::TCP_INFO)
>   state = tcp_info.data.unpack("C")[0] # tcpi_state is the first byte
>   state == TCP_CLOSE || state == TCP_CLOSE_WAIT
> end

+Cc: raindrops-public@bogomips.org -> https://bogomips.org/raindrops-public/

Fwiw, raindrops (already a dependency of unicorn) has TCP_INFO
support under Linux:

https://bogomips.org/raindrops.git/tree/ext/raindrops/linux_tcp_info.c

I haven't used it much, but FreeBSD also implements a subset of
this feature nowadays and ought to be supportable, too.  We can
look into adding it to raindrops.
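
Untested, but reusing that binding for this check could look
something like the following sketch (the #state accessor name
follows the field naming in the Linux ext):

  require 'raindrops'

  TCP_CLOSE      = 7 # from the tcp_states enum in linux/tcp.h
  TCP_CLOSE_WAIT = 8

  def client_closed?(socket)
    # Raindrops::TCP_Info wraps getsockopt(2) w/ TCP_INFO and exposes
    # struct tcp_info fields without the tcpi_ prefix
    state = Raindrops::TCP_Info.new(socket).state
    state == TCP_CLOSE || state == TCP_CLOSE_WAIT
  end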

I don't know about other BSDs...

> This could be called at the top of `#process_client` in `http_server.rb`.
> 
> Would there be interest in this upstream? Any comments on this
> proposed implementation? Currently, we're using a middleware with the
> Rack hijack API, but this seems like it belongs at the webserver
> level.

I guess hijack means you have to discard other middlewares and
the normal rack response handling in unicorn?  If so, ouch!

Unfortunately, without hijack, there's no portable way in Rack
to get at the underlying IO object; so I guess this needs to
be done at the server level...

So yeah, I guess it belongs in the webserver.

I think we can automatically use TCP_INFO if available for
check_client_connection; then fall back to the old early write
hack for Unix sockets and systems w/o TCP_INFO (which would set
the response_start_sent flag).

No need to introduce yet another configuration option.
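
Rough sketch of that fallback (untested; tcp? and RESPONSE_START
here are stand-ins for whatever unicorn ends up using):

  def check_client_connection(socket)
    if defined?(Raindrops::TCP_Info) && tcp?(socket)
      # cheap getsockopt(2); 7 == TCP_CLOSE, 8 == TCP_CLOSE_WAIT
      state = Raindrops::TCP_Info.new(socket).state
      raise Errno::EPIPE if state == 7 || state == 8
    else
      # Unix sockets and platforms w/o TCP_INFO: old early write hack
      socket.write(RESPONSE_START) # "HTTP/1.1 "
      @response_start_sent = true
    end
  end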


* Re: check_client_connection using getsockopt(2)
  2017-02-22 18:33 ` check_client_connection using getsockopt(2) Eric Wong
@ 2017-02-22 20:09   ` Simon Eskildsen
  2017-02-23  1:42     ` Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Simon Eskildsen @ 2017-02-22 20:09 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public, raindrops-public

On Wed, Feb 22, 2017 at 1:33 PM, Eric Wong <e@80x24.org> wrote:
> Simon Eskildsen <simon.eskildsen@shopify.com> wrote:
>
> <snip> great to know it's still working after all these years :>
>
>> This confirms Eric's comment that the existing
>> `check_client_connection` works perfectly on loopback, but as soon as
>> you put an actual network between Unicorn and the client, it's only
>> effective 20% of the time, even with `TCP_NODELAY`. I'm assuming this
>> is due to buffering, even with Nagle's algorithm disabled. As we're
>> changing our architecture from ngx (lb) -> ngx (host) -> unicorn to
>> ngx (lb) -> unicorn, this patch will no longer work for us.
>
> Side comment: I'm a bit curious how you guys arrived at removing
> nginx at the host level (if you're allowed to share that info)
>
> I've mainly kept nginx on the same host as unicorn, but used
> haproxy or similar (with minimal buffering) at the LB level.
> That allows better filesystem load distribution for large uploads.


Absolutely. We're starting to experiment with Kubernetes, and as a
result we're interested in simplifying ingress as much as possible. We
could still run the host nginxes, but if I can avoid explaining why
they're there for the next 5 years, I'll be happy :) As I see it, the
current use-cases we have for the host nginx are (with why we can get
rid of each):

* Keepalive between edge nginx LBs and host LBs to avoid excessive
network traffic. This is not a deal-breaker, so we'll just ignore this
problem. It's a _massive_ amount of traffic normally though.
* Out of rotation. We take nodes gracefully out of rotation by
touching a file that makes a health-check endpoint return 404 to the
edge LBs. Kubernetes implements this by just removing all the
containers on that node.
* Graceful deploys. When we deploy with containers, we take each
container out of rotation with the local host Nginx, wait for it to
come up, and put it back in rotation. We don't utilize Unicorn's
graceful restarts because we want a Ruby upgrade deploy (replacing the
container) to be the same as an updated code rollout.
* File uploads. As you mention, we currently load-balance these
across the host nginxes. I haven't finished investigating whether
this is feasible on the front LBs. Otherwise we may go for a 2-tier
Nginx solution or expand the edge. However, with Kubernetes I'd
really like to avoid having host nginxes; they're not very native to
the Kubernetes paradigm. Does it balance across the network, or only
to the pods on that node?
* check_client_connection working. :-) This thread.

Slow-client protection and the other advantages we want come from the
edge Nginxes.

>> I propose, instead of the early `write(2)` hack, that we use
>> `getsockopt(2)` with the `TCP_INFO` flag and read the state of the
>> socket. If it's in `CLOSE_WAIT` or `CLOSE`, we kill the connection and
>> move on to the next.
>>
>> https://github.com/torvalds/linux/blob/8fa3b6f9392bf6d90cb7b908e07bd90166639f0a/include/uapi/linux/tcp.h#L163
>>
>> This is not going to be portable, but we can do this only on Linux,
>> which I suspect is where most production deployments of Unicorn that
>> would benefit from this feature run. It's not useful in development
>> (where non-Linux platforms are more common). We could also introduce
>> it under a new configuration option if desired. In my testing, this
>> works to reject 100% of requests early when not on loopback.
>
> Good to know about its effectiveness!  I don't mind adding
> Linux-only features as long as existing functionality still
> works on the server-oriented *BSDs.
>
>> The code is essentially:
>> 
>> TCP_CLOSE      = 7 # from the tcp_states enum in linux/tcp.h
>> TCP_CLOSE_WAIT = 8
>> 
>> def client_closed?
>>   tcp_info = socket.getsockopt(Socket::SOL_TCP, Socket::TCP_INFO)
>>   state = tcp_info.data.unpack("C")[0] # tcpi_state is the first byte
>>   state == TCP_CLOSE || state == TCP_CLOSE_WAIT
>> end
>
> +Cc: raindrops-public@bogomips.org -> https://bogomips.org/raindrops-public/
>
> Fwiw, raindrops (already a dependency of unicorn) has TCP_INFO
> support under Linux:
>
> https://bogomips.org/raindrops.git/tree/ext/raindrops/linux_tcp_info.c
>
> I haven't used it much, but FreeBSD also implements a subset of
> this feature nowadays and ought to be supportable, too.  We can
> look into adding it to raindrops.

Cool, I think it makes sense to use Raindrops here, the advantage
being that it'd use the actual struct instead of a sketchy `#unpack`.

I meant to ask, in Raindrops why do you use the netlink API to get the
socket backlog instead of `getsockopt(2)` with `TCP_INFO` to get
`tcpi_unacked`? (as described in
http://www.ryanfrantz.com/posts/apache-tcp-backlog/) We use this to
monitor socket backlogs with a sidekick Ruby daemon. Although we're
looking to replace it with a middleware to simplify for Kubernetes.
It's one of our main metrics for monitoring performance, especially
around deploys.
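
Roughly, that daemon boils down to something like this (the listener
address is made up here):

  require 'raindrops'

  # netlink lets a separate process watch the listen queue without
  # owning the listening socket
  stats = Raindrops::Linux.tcp_listener_stats(["0.0.0.0:8080"])
  stats.each do |addr, stat|
    puts "#{addr} queued=#{stat.queued} active=#{stat.active}"
  end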

>
> I don't know about other BSDs...
>
>> This could be called at the top of `#process_client` in `http_server.rb`.
>>
>> Would there be interest in this upstream? Any comments on this
>> proposed implementation? Currently, we're using a middleware with the
>> Rack hijack API, but this seems like it belongs at the webserver
>> level.
>
> I guess hijack means you have to discard other middlewares and
> the normal rack response handling in unicorn?  If so, ouch!
>
> Unfortunately, without hijack, there's no portable way in Rack
> to get at the underlying IO object; so I guess this needs to
> be done at the server level...
>
> So yeah, I guess it belongs in the webserver.

I was going to use `env["unicorn.socket"]`/`env["puma.socket"]`, but
you could also do `env.delete("rack.hijack_io")` after hijacking to
allow Unicorn to still render the response. Unfortunately the
`<webserver>.socket` key is not part of the Rack standard, so I'm
hesitant to use it. When this gets into Unicorn I'm planning to
propose it upstream to Puma as well.
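
For reference, a stripped-down sketch of that middleware (the
bookkeeping details are elided; 499 is just a placeholder status):

  require 'socket'

  class CheckClientConnection
    CLOSED_STATES = [7, 8] # TCP_CLOSE, TCP_CLOSE_WAIT in linux/tcp.h

    def initialize(app)
      @app = app
    end

    def call(env)
      if env["rack.hijack?"]
        io = env["rack.hijack"].call
        env.delete("rack.hijack_io") # let the server render the response
        state = io.getsockopt(Socket::SOL_TCP, Socket::TCP_INFO).data.unpack("C")[0]
        return [499, {}, []] if CLOSED_STATES.include?(state)
      end
      @app.call(env)
    end
  end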

>
> I think we can automatically use TCP_INFO if available for
> check_client_connection; then fall back to the old early write
> hack for Unix sockets and systems w/o TCP_INFO (which would set
> the response_start_sent flag).
>
> No need to introduce yet another configuration option.

Cool. How would you suggest I check for TCP_INFO-compatible platforms
in Unicorn? Is `RUBY_PLATFORM.end_with?("linux".freeze)` sufficient,
or do you prefer another mechanism? I agree that we should fall back
to the write hack on other platforms.


* Re: check_client_connection using getsockopt(2)
  2017-02-22 20:09   ` Simon Eskildsen
@ 2017-02-23  1:42     ` Eric Wong
  2017-02-23  2:42       ` Simon Eskildsen
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Wong @ 2017-02-23  1:42 UTC (permalink / raw)
  To: Simon Eskildsen; +Cc: unicorn-public, raindrops-public

Simon Eskildsen <simon.eskildsen@shopify.com> wrote:

<snip> Thanks for the writeup!

Another sidenote: It seems nginx <-> unicorn is a bit odd
for deployment in a containerized environment(*).

> I meant to ask, in Raindrops why do you use the netlink API to get the
> socket backlog instead of `getsockopt(2)` with `TCP_INFO` to get
> `tcpi_unacked`? (as described in
> http://www.ryanfrantz.com/posts/apache-tcp-backlog/) We use this to
> monitor socket backlogs with a sidekick Ruby daemon. Although we're
> looking to replace it with a middleware to simplify for Kubernetes.
> It's one of our main metrics for monitoring performance, especially
> around deploys.

The netlink API allows independently-spawned processes to
monitor others; so it can be system-wide.  TCP_INFO requires the
process doing the checking to also have the socket open.

I guess this reasoning for using netlink is invalid for containers,
though...

> I was going to use `env["unicorn.socket"]`/`env["puma.socket"]`, but
> you could also do `env.delete("rack.hijack_io")` after hijacking to
> allow Unicorn to still render the response. Unfortunately the
> `<webserver>.socket` key is not part of the Rack standard, so I'm
> hesitant to use it. When this gets into Unicorn I'm planning to
> propose it upstream to Puma as well.

I was going to say env.delete("rack.hijack_io") is dangerous
(since env could be copied by middleware); but apparently not:
rack.hijack won't work with a copied env, either.
You only need to delete with the same env object you call
rack.hijack with.

But calling rack.hijack followed by env.delete may still
have unintended side-effects in other servers; so I guess
we (again) cannot rely on hijack working portably.

> Cool. How would you suggest I check for TCP_INFO-compatible platforms
> in Unicorn? Is `RUBY_PLATFORM.end_with?("linux".freeze)` sufficient,
> or do you prefer another mechanism? I agree that we should fall back
> to the write hack on other platforms.

The Raindrops::TCP_Info class should be undefined on unsupported
platforms, so I think you can check for that, instead.  Then it
should be transparent to add FreeBSD support from unicorn's
perspective.
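
i.e. a feature test instead of platform sniffing, something like:

  if defined?(Raindrops::TCP_Info)
    # use the getsockopt(2)/TCP_INFO state check
  else
    # fall back to the early write hack
  end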


(*) So I've been wondering if adding a "unicorn-mode" to an
    existing C10K, slow-client-capable Ruby/Rack server +
    reverse proxy makes sense for containerized deploys.
    Maybe...


* Re: check_client_connection using getsockopt(2)
  2017-02-23  1:42     ` Eric Wong
@ 2017-02-23  2:42       ` Simon Eskildsen
  2017-02-23  3:52         ` Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Simon Eskildsen @ 2017-02-23  2:42 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public, raindrops-public

On Wed, Feb 22, 2017 at 8:42 PM, Eric Wong <e@80x24.org> wrote:
> Simon Eskildsen <simon.eskildsen@shopify.com> wrote:
>
> <snip> Thanks for the writeup!
>
> Another sidenote: It seems nginx <-> unicorn is a bit odd
> for deployment in a containerized environment(*).
>
>> I meant to ask, in Raindrops why do you use the netlink API to get the
>> socket backlog instead of `getsockopt(2)` with `TCP_INFO` to get
>> `tcpi_unacked`? (as described in
>> http://www.ryanfrantz.com/posts/apache-tcp-backlog/) We use this to
>> monitor socket backlogs with a sidekick Ruby daemon. Although we're
>> looking to replace it with a middleware to simplify for Kubernetes.
>> It's one of our main metrics for monitoring performance, especially
>> around deploys.
>
> The netlink API allows independently-spawned processes to
> monitor others; so it can be system-wide.  TCP_INFO requires the
> process doing the checking to also have the socket open.
>
> I guess this reasoning for using netlink is invalid for containers,
> though...

If you namespace the network it's problematic, yeah. I'm considering
right now putting Raindrops in a middleware with the netlink API
inside the container, but it feels weird. That said, if you consider
the alternative of using `getsockopt(2)` on the listening socket, I
don't know how you'd get access to the Unicorn listening socket from a
middleware. Would it be nuts to expose a hook in Unicorn that allows
periodic execution for monitoring listening stats from Raindrops on
the listening socket? It seems like a narrow use-case, but on the
other hand I'm also not a fan of doing
`Raindrops::Linux.tcp_listener_stats("localhost:#{ENV["PORT"]}")`,
though that might be less ugly.

>
>> I was going to use `env["unicorn.socket"]`/`env["puma.socket"]`, but
>> you could also do `env.delete("rack.hijack_io")` after hijacking to
>> allow Unicorn to still render the response. Unfortunately the
>> `<webserver>.socket` key is not part of the Rack standard, so I'm
>> hesitant to use it. When this gets into Unicorn I'm planning to
>> propose it upstream to Puma as well.
>
> I was going to say env.delete("rack.hijack_io") is dangerous
> (since env could be copied by middleware); but apparently not:
> rack.hijack won't work with a copied env, either.
> You only need to delete with the same env object you call
> rack.hijack with.
>
> But calling rack.hijack followed by env.delete may still
> have unintended side-effects in other servers; so I guess
> we (again) cannot rely on hijack working portably.

Exactly, it gives the illusion of portability but e.g. Puma stores an
instance variable to check whether a middleware hijacked, rendering
the `env#delete` option useless.

>
>> Cool. How would you suggest I check for TCP_INFO-compatible platforms
>> in Unicorn? Is `RUBY_PLATFORM.end_with?("linux".freeze)` sufficient,
>> or do you prefer another mechanism? I agree that we should fall back
>> to the write hack on other platforms.
>
> The Raindrops::TCP_Info class should be undefined on unsupported
> platforms, so I think you can check for that, instead.  Then it
> should be transparent to add FreeBSD support from unicorn's
> perspective.

Perfect. I'll start working on a patch.

>
>
> (*) So I've been wondering if adding a "unicorn-mode" to an
>     existing C10K, slow-client-capable Ruby/Rack server +
>     reverse proxy makes sense for containerized deploys.
>     Maybe...

I'd love to hear more about this idea. What are you contemplating?


* Re: check_client_connection using getsockopt(2)
  2017-02-23  2:42       ` Simon Eskildsen
@ 2017-02-23  3:52         ` Eric Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2017-02-23  3:52 UTC (permalink / raw)
  To: Simon Eskildsen; +Cc: unicorn-public, raindrops-public

Simon Eskildsen <simon.eskildsen@shopify.com> wrote:
> On Wed, Feb 22, 2017 at 8:42 PM, Eric Wong <e@80x24.org> wrote:
> > Simon Eskildsen <simon.eskildsen@shopify.com> wrote:
> >> I meant to ask, in Raindrops why do you use the netlink API to get the
> >> socket backlog instead of `getsockopt(2)` with `TCP_INFO` to get
> >> `tcpi_unacked`? (as described in
> >> http://www.ryanfrantz.com/posts/apache-tcp-backlog/) We use this to
> >> monitor socket backlogs with a sidekick Ruby daemon. Although we're
> >> looking to replace it with a middleware to simplify for Kubernetes.
> >> It's one of our main metrics for monitoring performance, especially
> >> around deploys.
> >
> > The netlink API allows independently-spawned processes to
> > monitor others; so it can be system-wide.  TCP_INFO requires the
> > process doing the checking to also have the socket open.
> >
> > I guess this reasoning for using netlink is invalid for containers,
> > though...
> 
> If you namespace the network it's problematic, yeah. I'm considering
> right now putting Raindrops in a middleware with the netlink API
> inside the container, but it feels weird. That said, if you consider
> the alternative of using `getsockopt(2)` on the listening socket, I
> don't know how you'd get access to the Unicorn listening socket from a
> middleware. Would it be nuts to expose a hook in Unicorn that allows
> periodic execution for monitoring listening stats from Raindrops on
> the listening socket? It seems like a narrow use-case, but on the
> other hand I'm also not a fan of doing
> `Raindrops::Linux.tcp_listener_stats("localhost:#{ENV["PORT"]}")`,
> though that might be less ugly.

Yeah, all options get pretty ugly since Rack doesn't expose that
in a standardized way.  Unicorn::HttpServer::LISTENERS is a
historical constant which stores all listeners and is used by
some parts of raindrops.

Maybe relying on ObjectSpace or walking the LISTEN_FDS env from
systemd is acceptable for other servers *shrug*
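
e.g. from inside a worker, something like this untested sketch:

  # map unicorn's TCP listeners to "host:port" strings for raindrops
  addrs = Unicorn::HttpServer::LISTENERS.grep(TCPServer).map do |io|
    port, host = Socket.unpack_sockaddr_in(io.getsockname)
    "#{host}:#{port}"
  end
  stats = Raindrops::Linux.tcp_listener_stats(addrs)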


Another way might be to rely on tcpi_last_data_recv in struct
tcp_info from each and every client socket.  That should tell
you when a client stopped writing to the socket, which works for
most HTTP requests.  You could use the same getsockopt() call
you'd use for check_client_connection to read that field.

However, this won't be visible until the client is accept()ed.
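
Untested sketch of such a per-client idle check (client_socket being
an accepted client; raindrops drops the tcpi_ prefix on field names):

  info = Raindrops::TCP_Info.new(client_socket)
  idle_ms = info.last_data_recv # msec since the client last sent data
  # e.g. treat clients idle beyond some threshold as dead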

raindrops actually has some support for this, but it was
ugly, too (hooking into *accept* methods):
https://bogomips.org/raindrops/Raindrops/LastDataRecv.html

Perhaps refinements could be used nowadays (I've never used
them).  Just throwing options out there...


In any case, what I definitely do not want is to introduce more
specialized configuration or APIs which might lock people into
unicorn or burden them with versioning incompatibilities.

> > (*) So I've been wondering if adding a "unicorn-mode" to an
> >     existing C10K, slow-client-capable Ruby/Rack server +
> >     reverse proxy makes sense for containerized deploys.
> >     Maybe...
> 
> I'd love to hear more about this idea. What are you contemplating?

Maybe another time, and not on the unicorn list.  I don't feel
comfortable writing about unrelated/tangential projects on the
unicorn list, even if I'm the leader of both projects.  Anyways,
this other server is also in the rack README.md and I've been
announcing it on ruby-talk since 2013.


end of thread, other threads:[~2017-02-23  3:52 UTC | newest]

Thread overview: 5+ messages
     [not found] <CAO3HKM49+aLD=KLij3zhJqkWnR7bCWVan0mOvxD85xfrW8RXOw@mail.gmail.com>
2017-02-22 18:33 ` check_client_connection using getsockopt(2) Eric Wong
2017-02-22 20:09   ` Simon Eskildsen
2017-02-23  1:42     ` Eric Wong
2017-02-23  2:42       ` Simon Eskildsen
2017-02-23  3:52         ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/raindrops.git/
