unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
* Re: [Mongrel] scaling unicorn
       [not found] <AANLkTilv4e_DPDKy440xotrlE7ucFIFXs74uHyGrzCKL@mail.gmail.com>
@ 2010-06-22  0:16 ` Eric Wong
  2010-06-22  2:34   ` Jamie Wilkinson
  2010-06-22 17:30   ` [Mongrel] " snacktime
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Wong @ 2010-06-22  0:16 UTC (permalink / raw)
  To: mongrel-unicorn; +Cc: mongrel-users

snacktime <snacktime@gmail.com> wrote:
> Interested in some feedback on this (does it sound right?), or maybe
> this might be of interest to others.

Hi Chris,

I think you meant to post this to the mongrel-unicorn@rubyforge.org
list, not mongrel-users@rubyforge.org :>

> We are launching a new Facebook app in a couple of weeks and we did
> some load testing over the weekend on our unicorn web cluster.  The
> servers are 8 way Xeons with 24GB of RAM.  Our app ended up being
> primarily CPU bound.  So far the sweet spot for the number of
> unicorns seems to be around 40.  This seemed to yield the most
> requests per second without overloading the server or hitting memory
> bandwidth issues.  The backlog is at the somaxconn default of 128,
> I'm still not sure if we will bump that up or not.

The default backlog we try to specify is actually 1024 (same as
Mongrel).  But it's always a murky value anyways, as it's
kernel/sysctl-dependent.  With Unix domain sockets, some folks use
crazy values like 2048 to look better on synthetic benchmarks :)
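
For reference, the backlog is set per-listener in the unicorn config
file, e.g. something like the following -- the socket path and the
numbers here are only placeholders:

  listen "/tmp/unicorn.sock", :backlog => 2048  # Unix domain socket
  listen 8080, :backlog => 1024                 # TCP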

> Increasing the number of unicorns beyond a
> certain point resulted in a noticeable drop in the requests per second
> the server could handle.   I'm pretty sure the cause is the box
> running out of memory bandwidth.  The load average and resource usage
> in general (except for memory) would keep going down but so did the
> requests per second.  At 80 unicorns the requests per second dropped
> by more than half.  I'm going to disable hyperthreading and rerun some
> of the tests to see what impact that has.

That's "8 way xeon" _before_ hyperthreading, right?  Which family of
Xeons are you using, the Pentium4-based crap or the awesome new ones?

How much memory is each Unicorn worker using for your app?

40 workers for 8 physical cores sounds reasonable.  Depending on the
app, I think the reasonable range is anywhere from 2-8 workers per
physical core.  More if you're (unfortunately) limited by external
network calls, but since you claim to be CPU bound, less.
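
That count is just the worker_processes setting in the unicorn config
file; as a rough sketch for an 8-core box (40 is only an example here,
not a recommendation):

  worker_processes 40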

Do you have actual performance numbers you're able to share?
Mean/median request times/rates would be very useful.  If your requests
run very quickly, you may be limited by contention with the accept()
syscall on the listen socket, too.

I assume you're using nginx as the proxy -- is this with Unix domain
sockets or TCP sockets?  Unix domain sockets should give a small
performance boost over TCP if it's all on the same box.

With TCP, you should also check to see that you have enough local
ports available if you're hitting extremely high (and probably
unrealistic :) request rates.

-- 
Eric Wong

* Re: scaling unicorn
  2010-06-22  0:16 ` [Mongrel] scaling unicorn Eric Wong
@ 2010-06-22  2:34   ` Jamie Wilkinson
  2010-06-22  4:53     ` Eric Wong
  2010-06-22 17:30   ` [Mongrel] " snacktime
  1 sibling, 1 reply; 8+ messages in thread
From: Jamie Wilkinson @ 2010-06-22  2:34 UTC (permalink / raw)
  To: unicorn list


On Jun 21, 2010, at 5:16 PM, Eric Wong wrote:

>> overloading the server or hitting memory bandwidth issues.  The
>> backlog is at the somaxconn default of 128, I'm still not sure if we
>> will bump that up or not.
> 
> The default backlog we try to specify is actually 1024 (same as
> Mongrel).  But it's always a murky value anyways, as it's
> kernel/sysctl-dependent.  With Unix domain sockets, some folks use
> crazy values like 2048 to look better on synthetic benchmarks :)

Somewhat related -- I've been meaning to discuss the finer points of backlog tuning.

I've been experimenting with the multi-server socket+TCP megaunicorn configuration from your CDT:
http://rubyforge.org/pipermail/mongrel-unicorn/2009-September/000033.html

Which I think is what this sentence from TUNING is talking about?

	"Setting a very low value for the :backlog parameter in “listen” directives can allow failover to happen more quickly if your cluster is configured for it."

Our app can catch a batch of requests which will be slow (1-3s), and these can pool on one individual server in our load-balanced EC2 cluster -- exactly the case for the multi-server failover setup. 

I've put this into production under a healthy load (5000+ RPM) and it appears to work really well!  It produces very high requests/s rates at significantly higher concurrency than without, and serves zero 502 errors (part of the goal).

I currently have the Unix socket set to a backlog of 64, then failing over to a TCP listener with a backlog of 1024 (so that things are queued rather than 502'd).
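
In unicorn config terms that's roughly the following -- the socket path and port are made up, the backlog numbers are the ones above:

  listen "/tmp/app.sock", :backlog => 64   # local socket, overflows quickly
  listen 8080, :backlog => 1024            # TCP listener that soaks up overflow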

I can imagine there might be a case for keeping the TCP backlog low as well & serving errors when overloaded, rather than getting caught in an unrecoverable back-queue tarpit

I'm currently failing over to a dedicated "backup" instance, so that I can measure exactly how much traffic is being offloaded. This means my benchmarks w/o failover are 1 server, but with failover are actually 2 servers. We're reconfiguring to something more like the original diagram, at which point I'll do some cluster-wide stress-tests & share data/scripts/process.

BTW, this configuration needs a cool name!

-jamie
http://jamiedubs.com
http://fffff.at


* Re: scaling unicorn
  2010-06-22  2:34   ` Jamie Wilkinson
@ 2010-06-22  4:53     ` Eric Wong
  2010-06-22 18:03       ` snacktime
  2010-06-22 19:18       ` Jamie Wilkinson
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Wong @ 2010-06-22  4:53 UTC (permalink / raw)
  To: unicorn list

Jamie Wilkinson <jamie@tramchase.com> wrote:
> On Jun 21, 2010, at 5:16 PM, Eric Wong wrote:
> >> overloading the server or hitting memory bandwidth issues.  The
> >> backlog is at the somaxconn default of 128, I'm still not sure if
> >> we will bump that up or not.
> > 
> > The default backlog we try to specify is actually 1024 (same as
> > Mongrel).  But it's always a murky value anyways, as it's
> > kernel/sysctl-dependent.  With Unix domain sockets, some folks use
> > crazy values like 2048 to look better on synthetic benchmarks :)
>  
> Somewhat related -- I've been meaning to discuss the finer points of
> backlog tuning.
>  
> I've been experimenting with the multi-server socket+TCP megaunicorn
> configuration from your CDT:
> http://rubyforge.org/pipermail/mongrel-unicorn/2009-September/000033.html
> 
> Which I think is what this sentence from TUNING is talking about?
>  
> 	"Setting a very low value for the :backlog parameter in “listen”
> 	directives can allow failover to happen more quickly if your
> 	cluster is configured for it."

Yes.

<snip>

Thanks for sharing, and good to hear this is working well for you.

I'm still unlikely to have the chance to test this anywhere soon, but
maybe more folks can give it a try now that we've had one successful
report.   More reports (success or not) would definitely be good
to hear.

> BTW, this configuration needs a cool name!

Since you're the first person brave enough to try (or at least report
about it), you shall have the honor of naming it :)

-- 
Eric Wong

* Re: [Mongrel] scaling unicorn
  2010-06-22  0:16 ` [Mongrel] scaling unicorn Eric Wong
  2010-06-22  2:34   ` Jamie Wilkinson
@ 2010-06-22 17:30   ` snacktime
  1 sibling, 0 replies; 8+ messages in thread
From: snacktime @ 2010-06-22 17:30 UTC (permalink / raw)
  To: unicorn list

On Mon, Jun 21, 2010 at 5:16 PM, Eric Wong <normalperson@yhbt.net> wrote:
> snacktime <snacktime@gmail.com> wrote:
>> Interested in some feedback on this (does it sound right?), or maybe
>> this might be of interest to others.
>
> Hi Chris,
>
> I think you meant to post this to the mongrel-unicorn@rubyforge.org
> list, not mongrel-users@rubyforge.org :>
>
Yes, not sure how that got mixed up...


>
> That's "8 way xeon" _before_ hyperthreading, right?  Which family of
> Xeons are you using, the Pentium4-based crap or the awesome new ones?
>
Two quad core Nehalems on each server.

> How much memory is each Unicorn worker using for your app?
>
Undoubtedly this is lower than it will be under a real load, but under
our load tests they stabilize at around 160MB.

> Do you have actual performance numbers you're able to share?
> Mean/median request times/rates would be very useful.  If your requests
> run very quickly, you may be limited by contention with the accept()
> syscall on the listen socket, too.
>

I had two different types of requests that I tested in varying
combinations.  One takes 600ms on average, and the other 40ms.  98% of
our requests will be the faster one.  Deviations were really low.

> I assume you're using nginx as the proxy, is this with Unix domain
> sockets or TCP sockets?  Unix domain sockets should give a small
> performance over TCP if it's all on the same box.
>

Yes, nginx with Unix domain sockets.


Chris

* Re: scaling unicorn
  2010-06-22  4:53     ` Eric Wong
@ 2010-06-22 18:03       ` snacktime
  2010-06-22 18:57         ` Jamie Wilkinson
  2010-06-23  9:32         ` Eric Wong
  2010-06-22 19:18       ` Jamie Wilkinson
  1 sibling, 2 replies; 8+ messages in thread
From: snacktime @ 2010-06-22 18:03 UTC (permalink / raw)
  To: unicorn list

>> Somewhat related -- I've been meaning to discuss the finer points of
>> backlog tuning.
>>
>> I've been experimenting with the multi-server socket+TCP megaunicorn
>> configuration from your CDT:
>> http://rubyforge.org/pipermail/mongrel-unicorn/2009-September/000033.html

So I'm in the position of launching a web app in a couple of weeks
that is pretty much guaranteed to get huge traffic.  I'm working with
ops people who are very good, but this is not how they would normally
set up load balancing and scale out.  I'm having a meeting with our
network ops lead tomorrow to talk about this.  I like the idea of this
approach; it seems like it gives you more fine-grained control over
how much load you put on individual servers as well as how individual
requests are handled.  But I'm not too keen on using something like
this at scale when we simply don't have the chance to test it out at a
smaller scale.  I have yet to see anyone with this setup running at
scale.  That of course doesn't mean it's not a great idea, only that I
doubt our ops guys are going to want to be the first.  They are
already overworked as it is:)

So assuming we will scale out the 'normal' way by not having a short
backlog, any info on how to manage that?   Should we control the
backlog queue in nginx (not sure exactly how I would do that) or via
the listen backlog?  I was looking around last night and couldn't find
a way to actually poll the listen backlog queue size.

Also, any ideas on how you would practically manage this type of load
balancing setup?  Seems like you would have some type of 'reserve'
cluster for requests that hit the listen backlog, and when you start
seeing too much traffic going to the reserve, you add more servers to
your main pool.  How else would you manage the configuration for
something like this when you are working with 100 - 200 servers?  You
can't be changing the nginx configs every time you add servers, that's
just not practical.

Chris

* Re: scaling unicorn
  2010-06-22 18:03       ` snacktime
@ 2010-06-22 18:57         ` Jamie Wilkinson
  2010-06-23  9:32         ` Eric Wong
  1 sibling, 0 replies; 8+ messages in thread
From: Jamie Wilkinson @ 2010-06-22 18:57 UTC (permalink / raw)
  To: unicorn list

>> Somewhat related -- I've been meaning to discuss the finer points of
>> backlog tuning.
>> 
>> I've been experimenting with the multi-server socket+TCP megaunicorn
>> configuration from your CDT:
>> http://rubyforge.org/pipermail/mongrel-unicorn/2009-September/000033.html

On Jun 22, 2010, at 11:03 AM, snacktime wrote:

> Seems like you would have some type of 'reserve'
> cluster for requests that hit the listen backlog, and when you start
> seeing too much traffic going to the reserve, you add more servers to
> your main pool.  How else would you manage the configuration for
> something like this when you are working with 100 - 200 servers?  You
> can't be changing the nginx configs every time you add servers, that's
> just not practical.

We are using Chef for machine configuration, which makes these kinds of numbers doable:
http://wiki.opscode.com/display/chef/Home

I would love to see an nginx module for distributed configuration management.

Right now we are running 6 frontend machines, 4 in use & 2 in reserve like you described. We are doing about 5000rpm with this, almost all dynamic. 10-30% of requests might be 'slow' (1+s) depending on usage patterns. 

To measure health, I am using Munin to watch system load, nginx requests & nginx errors. In this configuration, 502 Bad Gateways from the frontend nginx indicate a busy unicorn socket & thus a handoff of the request to the backups. Then we measure the Rails production.log for request counts + speed on each server, as well as using NewRelic RPM.

monit also emails us when 502s show up. 
In theory monit could automatically spin up another backup server, provision it using Chef, then reprovision the rest of the cluster to start handing over traffic. Alternately, the new server could just act as a backup for the one overloaded machine, which could make isolating performance issues easier.

-jamie


* Re: scaling unicorn
  2010-06-22  4:53     ` Eric Wong
  2010-06-22 18:03       ` snacktime
@ 2010-06-22 19:18       ` Jamie Wilkinson
  1 sibling, 0 replies; 8+ messages in thread
From: Jamie Wilkinson @ 2010-06-22 19:18 UTC (permalink / raw)
  To: unicorn list

On Jun 21, 2010, at 9:53 PM, Eric Wong wrote:

> Thanks for sharing, and good to hear this is working well for you.
> 
> I'm still unlikely to have the chance to test this anywhere soon, but
> maybe more folks can give it a try now that we've had one successful
> report.   More reports (success or not) would definitely be good
> to hear.
> 
>> BTW, this configuration needs a cool name!
> 
> Since you're the first person brave enough to try (or at least report
> about it), you shall have the honor of naming it :)

The all-knowing WikiAnswers says "a group of unicorns is a blessing" :)

http://wiki.answers.com/Q/What_is_a_group_of_Unicorns_called

Some great fan art out there:

http://www.elfwood.com/~ara-tun/Unicorn-Herd.2537340.html

But my coworkers & I are voting "pegacorn"

http://images.elfwood.com/art/m/i/michelle16/pegacorn.jpg

-jamie

* Re: scaling unicorn
  2010-06-22 18:03       ` snacktime
  2010-06-22 18:57         ` Jamie Wilkinson
@ 2010-06-23  9:32         ` Eric Wong
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Wong @ 2010-06-23  9:32 UTC (permalink / raw)
  To: unicorn list

snacktime <snacktime@gmail.com> wrote:
> >> Somewhat related -- I've been meaning to discuss the finer points of
> >> backlog tuning.
> >>
> >> I've been experimenting with the multi-server socket+TCP megaunicorn
> >> configuration from your CDT:
> >> http://rubyforge.org/pipermail/mongrel-unicorn/2009-September/000033.html
> 
> So I'm in the position of launching a web app in a couple of weeks
> that is pretty much guaranteed to get huge traffic.  I'm working with
> ops people who are very good, but this is not how they would normally
> set up load balancing and scale out.  I'm having a meeting with our
> network ops lead tomorrow to talk about this.  I like the idea of this
> approach; it seems like it gives you more fine-grained control over
> how much load you put on individual servers as well as how individual
> requests are handled.  But I'm not too keen on using something like
> this at scale when we simply don't have the chance to test it out at a
> smaller scale.  I have yet to see anyone with this setup running at
> scale.  That of course doesn't mean it's not a great idea, only that I
> doubt our ops guys are going to want to be the first.  They are
> already overworked as it is:)

No worries.  Don't ever feel obligated to try something you're not
comfortable with.  Heck, it took months before anybody besides myself
was comfortable with Unicorn.

> So assuming we will scale out the 'normal' way by not having a short
> backlog, any info on how to manage that?   Should we control the
> backlog queue in nginx (not sure exactly how I would do that) or via
> the listen backlog?  I was looking around last night and couldn't find
> a way to actually poll the listen backlog queue size.

nginx lets you specify a backlog=num with the "listen" directive
much like Unicorn does (Unicorn steals most configuration parameter
names/options from nginx):

  http://wiki.nginx.org/NginxHttpCoreModule#listen
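
i.e. roughly this in an nginx server block (the port is just a
placeholder):

  listen 8080 backlog=1024;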

If you use Linux, you can poll the current listen queue using
Raindrops (http://raindrops.bogomips.org/), the ss(8) utility, or by
parsing /proc/net/tcp and/or /proc/net/unix.  Unfortunately, checking
the listen queue for Unix domain sockets is expensive: Raindrops and
ss(8) both need to parse /proc/net/unix because that info isn't
available via netlink.
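
As a rough sketch of polling with Raindrops (method names are from
memory of the raindrops docs, so double-check them against the version
you install; the addresses are just examples):

  require 'raindrops'

  tcp  = %w(0.0.0.0:8080)        # TCP listener(s) to inspect
  unix = %w(/tmp/unicorn.sock)   # Unix domain socket(s) to inspect

  Raindrops::Linux.tcp_listener_stats(tcp).each do |addr, stats|
    puts "#{addr}: active=#{stats.active} queued=#{stats.queued}"
  end
  Raindrops::Linux.unix_listener_stats(unix).each do |path, stats|
    puts "#{path}: active=#{stats.active} queued=#{stats.queued}"
  end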

> Also, any ideas on how you would practically manage this type of load
> balancing setup?  Seems like you would have some type of 'reserve'
> cluster for requests that hit the listen backlog, and when you start
> seeing too much traffic going to the reserve, you add more servers to
> your main pool.  How else would you manage the configuration for
> something like this when you are working with 100 - 200 servers?  You
> can't be changing the nginx configs every time you add servers, that's
> just not practical.

I've never tried this setup, so what Jamie said :)

One extra note: 100-200 hosts in an upstream {} block makes for a very
long nginx config file.  You could use ERB or something else to
template it, but based on a previous reading of the nginx source code,
you can also set up a round-robin DNS entry for all the servers.

nginx only does DNS lookups for upstreams at load time.  For round-robin
DNS entries, nginx adds an entry for every IP address a name resolves
to, so just specify the one DNS name in the upstream block instead of
the list of IP(s).
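
So the upstream block can stay tiny, something like this (the hostname
is of course hypothetical):

  upstream app_pool {
    # one round-robin DNS name; nginx expands it to every A record
    # it resolves to at config load (or HUP) time
    server app-pool.internal.example.com:8080 fail_timeout=0;
  }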

Just remember to HUP the nginxes (or if you're forgetful, make an
occasional cronjob to HUP them) when you make DNS changes and add/remove
a box.

-- 
Eric Wong


Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/
