[CFT] multi server failover setup

unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help

From: Eric Wong @ 2009-09-19 23:23 UTC
  To: mongrel-unicorn

I've been meaning to test this setup somewhere for a while, but never
got the right app/(real) traffic/hardware to do it with.  So maybe
somebody can try it out and let us know if this works...

It's for applications running the same code/data on a cluster of
machines, so this doesn't apply to the majority of low-traffic sites
out there.

The goal is to avoid having to run a dedicated load
balancer/proxy/virtual IP in front of the application servers
by using round-robin DNS for the general load-balancing case.

The immediate downside to this approach is that if one host goes
completely dead (or just its nginx instance does), your clients
bear the burden of failover.  I know curl will fail over when DNS
resolves to multiple addresses, but I'm not sure about other
HTTP clients...
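
For anyone writing their own client, this is roughly what "the client does the failover" means, sketched in Ruby.  The helper names (resolve_all, connect_with_failover) are made up for illustration and are not part of any library; only Addrinfo.getaddrinfo and Socket.tcp are real stdlib calls.  The connect itself is passed in as a block so the failover loop can be exercised without a network.

```ruby
require "socket"

# Hypothetical helper: every address round-robin DNS returns for +host+.
def resolve_all(host, port)
  Addrinfo.getaddrinfo(host, port, nil, :STREAM).map(&:ip_address).uniq
end

# Hypothetical helper: try each address in order and return whatever the
# block returns for the first one that connects.  The block does the
# actual connect, so the failover logic can be tested without a network.
def connect_with_failover(addresses)
  errors = []
  addresses.each do |addr|
    begin
      return yield(addr)
    rescue SystemCallError, SocketError => e
      errors << "#{addr}: #{e.class}"   # remember why, move on to the next
    end
  end
  raise SocketError, "all addresses failed (#{errors.join('; ')})"
end

# Example against a real name ("app.example.com" is a placeholder):
#   addrs = resolve_all("app.example.com", 80)
#   sock = connect_with_failover(addrs) do |ip|
#     Socket.tcp(ip, 80, connect_timeout: 2)
#   end
```

The connect timeout matters here: a host that is completely dead often does not refuse connections, it just never answers, so a client without a timeout hangs instead of moving on to the next address.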

This setup requires that nginx + unicorn run on all application
boxes.

The request flow for a 3-machine cluster would look like this:

             /--> host1(nginx --> unicorn)
            /
           /
    client ----> host2(nginx --> unicorn)
           \
            \
             \--> host3(nginx --> unicorn)

Now in the unicorn configs:

    # We configure unicorn to listen on both a UNIX socket and TCP:

    # internal_ip is not provided by Unicorn; set it to this host's
    # private address (the value below is only a placeholder)
    internal_ip = "10.0.0.1"

    # First, the UNIX domain socket
    listen "/tmp/sock", :backlog => 5 # fails quickly if overloaded

    # use an internal IP here since Unicorn should not talk to the
    # public...
    listen "#{internal_ip}:8080", :backlog => 1024 # fails slowly

    # the exact numbers for the :backlog values are flexible;
    # the idea is just to have a very low :backlog for the
    # UNIX domain socket and a big one as a fallback for the
    # TCP socket
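
`internal_ip` is not something Unicorn defines for you.  Since the unicorn config file is plain Ruby, one possible way to fill it in is to pick the machine's first private IPv4 address.  This is a sketch under assumptions: it assumes your private network uses RFC 1918 IPv4 space and that the first such address is the one the other hosts can reach; on multi-homed boxes you probably want to hard-code it instead.  The helper name first_private_ipv4 is hypothetical.

```ruby
require "socket"

# Hypothetical helper: first private (10/8, 172.16/12, 192.168/16) IPv4
# address among +addrs+, or nil when there is none.
def first_private_ipv4(addrs = Socket.ip_address_list)
  found = addrs.find { |ai| ai.ipv4? && ai.ipv4_private? }
  found && found.ip_address
end

# In the unicorn config this might be used as:
#   internal_ip = first_private_ipv4 or raise "no private IPv4 address"
#   listen "#{internal_ip}:8080", :backlog => 1024
```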

And the nginx configs:

    upstream unicorn_failover {
      # primary connection; "fail_timeout=0" is to ensure that
      # we always *try* to use the UNIX socket on every request
      # that comes in:
      server unix:/tmp/sock fail_timeout=0;

      # failover connections; "backup" ensures these will not
      # be used unless connections to unix:/tmp/sock are failing.
      # It may be advisable to reorder these on a per-host basis
      # so "host1" does not connect to "host1_private" as its
      # first choice...
      server host1_private:8080 fail_timeout=0 backup;
      server host2_private:8080 fail_timeout=0 backup;
      server host3_private:8080 fail_timeout=0 backup;
    }

    # the upstream is only used once a proxy_pass points at it
    server {
      listen 80;
      location / {
        proxy_pass http://unicorn_failover;
      }
    }
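
Reordering the backup list by hand on every host gets tedious as the cluster grows.  A small generator like the hypothetical one below (it assumes the hostN_private naming, /tmp/sock and port 8080 from the configs above) rotates the list so each host's own TCP listener comes last, or leaves it out entirely.  One caveat worth checking against your nginx version: nginx spreads requests across backup servers with its usual round-robin rather than strictly trying them top to bottom, so leaving out the local entry is the surer way to keep a host from failing over to itself.

```ruby
# Hypothetical generator for the per-host upstream block.  The host's own
# private address is moved to the end of the backup list, or left out
# entirely when include_self is false.
def upstream_for(me, hosts, include_self: true, socket: "/tmp/sock", port: 8080)
  i = hosts.index(me) or raise ArgumentError, "unknown host: #{me}"
  backups = hosts.rotate(i + 1)          # e.g. host2 -> [host3, host1, host2]
  backups.pop unless include_self        # after rotate, +me+ is last
  lines = ["upstream unicorn_failover {",
           "  server unix:#{socket} fail_timeout=0;"]
  backups.each do |h|
    lines << "  server #{h}_private:#{port} fail_timeout=0 backup;"
  end
  lines << "}"
  lines.join("\n") + "\n"
end

puts upstream_for("host2", %w[host1 host2 host3])
```

For host2 this lists host3_private, then host1_private, then host2_private last.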

The idea is for the majority of requests to use the UNIX
socket, which is a bit faster than the TCP one.  However, if
_some_ of your machines start getting overloaded, nginx can
fail over to TCP, likely on a different host which may be
less loaded.

So under heavy load, you may end up with requests flowing like
this:

                              /------>-----\
                             /              \
             /--> host1(nginx --> unicorn)<--+-<-\
            /                \___/           |   |
           /                 /   \           |   |
    client ----> host2(nginx --> unicorn)    V   ^
           \                 \___/           |   |
            \                /   \           |   |
             \--> host3(nginx --> unicorn)<--/   |
                             \                   |
                              `------>----------'

All the extra lines from this diagram are the "backup" flows.

This should help address the problem of certain (rare) actions
being extremely slow while the majority of the actions still run
quickly.  It would help smooth out pathological cases where all
the slow actions somehow end up clustering on a small subset of
machines in the cluster while the rest of the machines are still
in the comfort zone.

This setup will not help under extreme load when the entire
cluster is at capacity; it only helps when an unbalanced subset
of the cluster is maxed out.
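
To make those two cases concrete, here is a toy model in Ruby.  It is a deliberate simplification and not how nginx or Unicorn actually schedule work: each host gets a fixed number of hypothetical worker "slots", requests never finish, and an overloaded host sends its overflow to the first other host with a free slot (real nginx round-robins among the backup servers, and the big TCP backlog queues connections rather than skipping a busy host).

```ruby
# Toy model only: +slots+ stands in for a host's spare capacity.
Host = Struct.new(:name, :slots, :busy) do
  def free?
    busy < slots
  end
end

# Route one request that DNS sent to +entry+.  Use the local host (the
# UNIX socket path) when it has room, otherwise the first other host with
# room (the "backup" TCP path).  Returns the serving host's name, or nil
# when the whole cluster is full.
def route(entry, hosts)
  local = hosts.find { |h| h.name == entry } or raise ArgumentError, entry
  target = local.free? ? local : hosts.find { |h| h.name != entry && h.free? }
  return nil unless target
  target.busy += 1
  target.name
end

hosts = %w[host1 host2 host3].map { |n| Host.new(n, 2, 0) }
# the slow requests all happen to land on host1:
p 5.times.map { route("host1", hosts) }
# => ["host1", "host1", "host2", "host2", "host3"]
p route("host1", hosts)   # => "host3" (last free slot in the cluster)
p route("host2", hosts)   # => nil (entire cluster at capacity)
```

The first five requests show host1's overflow spilling onto host2 and host3; the final nil is the "entire cluster at capacity" case that no amount of failover can fix.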

Let me know if you have any questions/comments and especially
any results if you're brave enough to try this :)

-- 
Eric Wong

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/
