From c0d79dbb2e5f0f23236c60a0e7c5bb92be2512aa Mon Sep 17 00:00:00 2001 From: Eric Wong Date: Wed, 1 Apr 2009 03:35:47 -0700 Subject: Documentation updates, prep for 0.4.1 release --- PHILOSOPHY | 139 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 139 insertions(+) create mode 100644 PHILOSOPHY (limited to 'PHILOSOPHY') diff --git a/PHILOSOPHY b/PHILOSOPHY new file mode 100644 index 0000000..ce7763a --- /dev/null +++ b/PHILOSOPHY @@ -0,0 +1,139 @@ += The Philosophy Behind Unicorn + +Being a server that only runs on Unix-like platforms, Unicorn is +strongly tied to the Unix philosophy of doing one thing and (hopefully) +doing it well. Despite using HTTP, Unicorn is strictly a _backend_ +application server for running Rack-based Ruby applications. + +== Avoid Complexity + +Instead of attempting to be efficient at serving slow clients, Unicorn +relies on a buffering reverse proxy to efficiently deal with slow +clients. + +Unicorn uses an old-fashioned preforking worker model with blocking I/O. +Our processing model is the antithesis of more modern (and theoretically +more efficient) server processing models using threads or non-blocking +I/O with events. + +=== Threads and Events Are Hard + +...to many developers. Reasons for this is beyond the scope of this +document. Unicorn avoids concurrency within each worker process so you +have fewer things to worry about when developing your application. Of +course Unicorn can use multiple worker processes to utilize multiple +CPUs or spindles. Applications can still use threads internally, however. + +== Slow Clients Are Problematic + +Most benchmarks we've seen don't tell you this, and Unicorn doesn't +care about slow clients... but you should. + +A "slow client" can be any client outside of your datacenter. Network +traffic within a local network is always faster than traffic that +crosses outside of it. The laws of physics do not allow otherwise. + +Persistent connections were introduced in HTTP/1.1 reduce latency from +connection establishment and TCP slow start. They also waste server +resources when clients are idle. + +Persistent connections mean one of the Unicorn worker processes +(depending on your application, it can be very memory hungry) would +spend a significant amount of its time idle keeping the connection alive +and not doing anything else. Being single-threaded and using +blocking I/O, a worker cannot serve other clients while keeping a +connection alive. Thus Unicorn does not implement persistent +connections. + +If your application responses are larger than the socket buffer or if +you're handling large requests (uploads), worker processes will also be +bottlenecked by the speed of the *client* connection. You should +not allow Unicorn to serve clients outside of your local network. + +== Application Concurrency != Network Concurrency + +Performance is asymmetric across the different subsystems of the machine +and parts of the network. CPUs and main memory can process gigabytes of +data in a second; clients on the Internet are usually only capable of a +tiny fraction of that. Unicorn deployments should avoid dealing with +slow clients directly and instead rely on a reverse proxy to shield it +from the effects of slow I/O. + +== Improved Performance Through Reverse Proxying + +By acting as a buffer to shield Unicorn from slow I/O, a reverse proxy +will inevitably incur overhead in the form of extra data copies. +However, as I/O within a local network is fast (and faster still +with local sockets), this overhead is neglible for the vast majority +of HTTP requests and responses. + +The ideal reverse proxy complements the weaknesses of Unicorn. +A reverse proxy for Unicorn should meet the following requirements: + + 1. It should fully buffer all HTTP requests (and large responses). + Each request should be "corked" in the reverse proxy and sent + as fast as possible to the backend Unicorn processes. This is + the most important feature to look for when choosing a + reverse proxy for Unicorn. + + 2. It should spend minimal time in userspace. Network (and disk) I/O + are system-level tasks and usually managed by the kernel. + This may change if userspace TCP stacks become more popular in the + future; but the reverse proxy should not waste time with + application-level logic. These concerns should be separated + + 3. It should avoid context switches and CPU scheduling overhead. + In many (most?) cases, network devices and their interrupts are + only be handled by one CPU at a time. It should avoid contention + within the system by serializing all network I/O into one (or few) + userspace procceses. Network I/O is not a CPU-intensive task and + it is not helpful to use multiple CPU cores (at least not for GigE). + + 4. It should efficiently manage persistent connections (and + pipelining) to slow clients. If you care to serve slow clients + outside your network, then these features of HTTP/1.1 will help. + + 5. It should (optionally) serve static files. If you have static + files on your site (especially large ones), they are far more + efficiently served with as few data copies as possible (e.g. with + sendfile() to completely avoid copying the data to userspace). + +nginx is the only (Free) solution we know of that meets the above +requirements. + +Indeed, the author of Unicorn has deployed nginx as a reverse-proxy not +only for Ruby applications, but also for production applications running +Apache/mod_perl, Apache/mod_php and Apache Tomcat. In every single +case, performance improved because application servers were able to use +backend resources more efficiently and spend less time waiting on slow +I/O. + +== Worse Is Better + +Requirements and scope for applications change frequently and +drastically. Thus languages like Ruby and frameworks like Rails were +built to give developers fewer things to worry about in the face of +rapid change. + +On the other hand, stable protocols which host your applications (HTTP +and TCP) only change rarely. This is why we recommend you NOT tie your +rapidly-changing application logic directly into the processes that deal +with the stable outside world. Instead, use HTTP as a common RPC +protocol to communicate between your frontend and backend. + +In short: separate your concerns. + +Of course a theoretical "perfect" solution would combine the pieces +and _maybe_ give you better performance at the end of the day, but +that is not the Unix way. + +== Just Worse in Some Cases + +Unicorn is not suited for all applications. Unicorn is optimized for +applications that are CPU/memory/disk intensive and spend little time +waiting on external resources (e.g. a database server or external API). + +Unicorn is highly inefficient for Comet/reverse-HTTP/push applications +where the HTTP connection spends a large amount of time idle. +Nevertheless, the ease of troubleshooting, debugging, and management of +Unicorn may still outweigh the drawbacks for these applications. -- cgit v1.2.3-24-ge0c7