unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
* 502s with Nginx, Unicorn, and Unix Domain Sockets
@ 2009-09-18  4:54 Tom Preston-Werner
  2009-09-18  6:48 ` Eric Wong
  0 siblings, 1 reply; 5+ messages in thread
From: Tom Preston-Werner @ 2009-09-18  4:54 UTC (permalink / raw)
  To: mongrel-unicorn

I'm doing some benchmarking on our new Rackspace frontend machines (8
core, 16GB) and running into some problems with the Unix domain socket
setup. At high request rates (on simple pages) I'm getting a lot of
HTTP 502 errors from Nginx. Nothing shows up in the Unicorn error log,
but Nginx has the following in its error log:

2009/09/17 19:36:52 [error] 28277#0: *524824 connect() to
unix:/data/github/current/tmp/sockets/unicorn.sock failed (11:
Resource temporarily unavailable) while connecting to upstream,
client: 172.17.1.5, server: github.com, request: "GET /site/junk
HTTP/1.1", upstream:
"http://unix:/data/github/current/tmp/sockets/unicorn.sock:/site/junk",
host: "github.com"

This problem does not exist with the nginx -> haproxy -> unicorn
setup. Thinking this might be a file descriptor problem, I upped the
fd limit to 32768 with no luck. Then I tried upping net.core.somaxconn
to 262144 which also had no effect. I thought I'd ask about the
problem here to see if anyone knows a simple solution that I'm
missing. Perhaps there is an Nginx configuration directive I need?
Thanks. Unicorn rocks!

Tom

--
Tom Preston-Werner
GitHub Cofounder
http://tom.preston-werner.com
github.com/mojombo


* Re: 502s with Nginx, Unicorn, and Unix Domain Sockets
  2009-09-18  4:54 502s with Nginx, Unicorn, and Unix Domain Sockets Tom Preston-Werner
@ 2009-09-18  6:48 ` Eric Wong
  2009-09-19  2:30   ` Eric Wong
  2009-09-19 20:23   ` Tom Preston-Werner
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Wong @ 2009-09-18  6:48 UTC (permalink / raw)
  To: Tom Preston-Werner; +Cc: mongrel-unicorn

Tom Preston-Werner <tom@github.com> wrote:
> I'm doing some benchmarking on our new Rackspace frontend machines (8
> core, 16GB) and running into some problems with the Unix domain socket
> setup. At high request rates (on simple pages) I'm getting a lot of
> HTTP 502 errors from Nginx. Nothing shows up in the Unicorn error log,
> but Nginx has the following in its error log:

Hi Tom,

At what request rates were you running into this?  Also how large are
your responses?  It could be the listen() backlog overflowing if Unicorn
isn't logging anything.  Anything in the system/kernel logs (doubtful,
actually)?
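
A minimal sketch of what a listen() backlog overflow looks like from the client side, assuming Linux-like behavior where a non-blocking connect() to a full Unix socket queue fails with EAGAIN (errno 11, the same "Resource temporarily unavailable" nginx logs). The helper name, the temporary socket path, and the tiny backlog of 4 are made up for illustration; nothing here is Unicorn or nginx code. Other systems may report a full queue differently, so ECONNREFUSED is rescued as well.

```ruby
require "socket"
require "tmpdir"

# Hypothetical helper: count how many non-blocking connects succeed against
# a Unix socket whose owner never calls accept(), for a given backlog.
def connects_until_full(backlog, max_attempts = 64)
  Dir.mktmpdir do |dir|
    addr = Addrinfo.unix(File.join(dir, "demo.sock"))
    server = Socket.new(:UNIX, :STREAM)
    server.bind(addr)
    server.listen(backlog) # nobody accepts, so the queue only fills up
    clients = []
    connected = 0
    begin
      max_attempts.times do
        client = Socket.new(:UNIX, :STREAM)
        clients << client
        client.connect_nonblock(addr) # raises once the queue is full
        connected += 1
      end
    rescue Errno::EAGAIN, Errno::ECONNREFUSED
      # Queue is full: this is the point where nginx would log a failed
      # connect() and answer the client with a 502.
    ensure
      clients.each(&:close)
      server.close
    end
    connected
  end
end

puts connects_until_full(4)
```

On Linux the printed count should land close to the backlog value. A busy Unicorn master behaves the same way whenever workers accept more slowly than nginx connects.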

Does increasing the listen :backlog parameter work?  Default is 1024
(which is pretty high already), maybe try a higher number along with the
net.core.netdev_max_backlog sysctl.

Is there a large discrepancy between the times your benchmark client
logs, the request time nginx logs, and whatever Rails/Rack logs for
request times for any particular request?

If the Rails/Rack logging times all seem consistently low but your
nginx/benchmark has some weird spikes/outliers, then some are stuck in
the kernel listen backlog.

How much of the 8 cores are being used on those boxes when this
starts happening?

> 2009/09/17 19:36:52 [error] 28277#0: *524824 connect() to
> unix:/data/github/current/tmp/sockets/unicorn.sock failed (11:
> Resource temporarily unavailable) while connecting to upstream,
> client: 172.17.1.5, server: github.com, request: "GET /site/junk
> HTTP/1.1", upstream:
> "http://unix:/data/github/current/tmp/sockets/unicorn.sock:/site/junk",
> host: "github.com"

Raising proxy_connect_timeout in nginx may be a workaround; what is it
set to now?  On the other hand, keeping it (and :backlog in Unicorn) low
would give better indications for failover to other hosts.

> This problem does not exist with the nginx -> haproxy -> unicorn
> setup. Thinking this might be a file descriptor problem, I upped the
> fd limit to 32768 with no luck. Then I tried upping net.core.somaxconn
> to 262144 which also had no effect. I thought I'd ask about the
> problem here to see if anyone knows a simple solution that I'm
> missing. Perhaps there is an Nginx configuration directive I need?
> Thanks. Unicorn rocks!

Definitely not a file descriptor problem (at least not inside Unicorn).

Also, I'm not sure there's a reason to keep haproxy between nginx
and Unicorn...  Maybe haproxy in front of the entire cluster of servers.

Are you already hitting higher request rates (and more consistent
times logged by client/nginx) with:

  nginx -> unicorn/unix

vs

  nginx -> unicorn/tcp(localhost)

?

Under extremely high loads, 502s may actually be wanted since it allows
failover to a less loaded box if there's uneven balancing; but we really
need to have numbers on the request rates.

-- 
Eric Wong


* Re: 502s with Nginx, Unicorn, and Unix Domain Sockets
  2009-09-18  6:48 ` Eric Wong
@ 2009-09-19  2:30   ` Eric Wong
  2009-09-19 20:23   ` Tom Preston-Werner
  1 sibling, 0 replies; 5+ messages in thread
From: Eric Wong @ 2009-09-19  2:30 UTC (permalink / raw)
  To: Tom Preston-Werner; +Cc: mongrel-unicorn

Hi Tom, any updates on this?  I'd really like to get to the bottom of
this, thanks!

-- 
Eric Wong


* Re: 502s with Nginx, Unicorn, and Unix Domain Sockets
  2009-09-18  6:48 ` Eric Wong
  2009-09-19  2:30   ` Eric Wong
@ 2009-09-19 20:23   ` Tom Preston-Werner
  2009-09-19 22:08     ` Eric Wong
  1 sibling, 1 reply; 5+ messages in thread
From: Tom Preston-Werner @ 2009-09-19 20:23 UTC (permalink / raw)
  To: Eric Wong; +Cc: mongrel-unicorn

On Thu, Sep 17, 2009 at 11:48 PM, Eric Wong <normalperson@yhbt.net> wrote:
> At what request rates were you running into this?  Also how large are
> your responses?  It could be the listen() backlog overflowing if Unicorn
> isn't logging anything.

I was hitting the 502s at about 1300 req/sec and 80% CPU utilization.
Response size was only a few bytes + headers. I was just testing a
very simple string response from our Rails app to make sure our setup
could tolerate very high request rates.

> Does increasing the listen :backlog parameter work?  Default is 1024
> (which is pretty high already), maybe try a higher number along with the
> net.core.netdev_max_backlog sysctl.

This was the first thing I tried after getting your response, and it
seems that upping the :backlog to 2048 solves the 502 problem! I'm now
able to get 1500 req/sec out of Unicorn/UNIX (as opposed to 1350
req/sec with the TCP/HAProxy setup). I'm quite satisfied with this
result, and I think this is how we'll end up deploying the app.
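
For reference, a minimal sketch of what that change might look like in a Unicorn config file. The socket path is taken from the nginx error log earlier in the thread; the rest of the config (worker count, timeouts, and so on) is not shown in this thread and is left out here.

```ruby
# Unicorn config fragment (sketch): only the listen line is shown.
# The path comes from the nginx error log quoted earlier; :backlog is
# raised from the default of 1024 to the 2048 that made the 502s go away.
listen "/data/github/current/tmp/sockets/unicorn.sock", :backlog => 2048
```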

Thanks for your help, and I'll try to keep you updated on how our
installation performs and if I see any strange behavior under normal
traffic.

Tom


* Re: 502s with Nginx, Unicorn, and Unix Domain Sockets
  2009-09-19 20:23   ` Tom Preston-Werner
@ 2009-09-19 22:08     ` Eric Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2009-09-19 22:08 UTC (permalink / raw)
  To: Tom Preston-Werner; +Cc: mongrel-unicorn

Tom Preston-Werner <tom@github.com> wrote:
> On Thu, Sep 17, 2009 at 11:48 PM, Eric Wong <normalperson@yhbt.net> wrote:
> > At what request rates were you running into this?  Also how large are
> > your responses?  It could be the listen() backlog overflowing if Unicorn
> > isn't logging anything.
> 
> I was hitting the 502s at about 1300 req/sec and 80% CPU utilization.
> Response size was only a few bytes + headers. I was just testing a
> very simple string response from our Rails app to make sure our setup
> could tolerate very high request rates.

Yup, as I suspected: your UNIX socket setup was maxing out right around
where your TCP setup was maxing out.  TCP is just better at
handling/recovering from errors.

> > Does increasing the listen :backlog parameter work?  Default is 1024
> > (which is pretty high already), maybe try a higher number along with the
> > net.core.netdev_max_backlog sysctl.
> 
> This was the first thing I tried after getting your response, and it
> seems that upping the :backlog to 2048 solves the 502 problem! I'm now
> able to get 1500 req/sec out of Unicorn/UNIX (as opposed to 1350
> req/sec with the TCP/HAProxy setup). I'm quite satisfied with this
> result, and I think this is how we'll end up deploying the app.

Good to know it worked!

However, I do hesitate to recommend a large listen() backlog for
production.  It can interfere with monitoring/failover/load-balancing in
multi-server setups even if it looks good on benchmarks.
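
A rough back-of-the-envelope sketch of why a deep backlog can hide an overloaded box: if the workers drain the queue at about the measured throughput (an assumption; real traffic is burstier than a benchmark), a full backlog amounts to roughly backlog / rate seconds of waiting before a connection is even accepted. The function name is made up for illustration; the numbers are the ones reported in this thread.

```ruby
# Approximate time for a full listen() backlog to drain, assuming the
# workers accept connections at a steady requests_per_second.
def backlog_drain_seconds(backlog, requests_per_second)
  backlog.to_f / requests_per_second
end

# Default backlog at the rate where the 502s appeared.
puts backlog_drain_seconds(1024, 1300).round(2) # prints 0.79
# Raised backlog at the rate reached afterwards.
puts backlog_drain_seconds(2048, 1500).round(2) # prints 1.37
```

With a small backlog the box rejects quickly and the load balancer can try another host; with a large one, requests may sit queued for a second or more first.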

I'll send a separate call-for-testing message to the mailing list on
this subject in a bit...

> Thanks for your help, and I'll try to keep you updated on how our
> installation performs and if I see any strange behavior under normal
> traffic.

No problem, thanks for the feedback!  It's great to know people
actually use it.

-- 
Eric Wong

