From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-2.9 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
	shortcircuit=no autolearn=unavailable version=3.3.2
X-Original-To: yahns-public@yhbt.net
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 04DAB1F4E1;
	Wed, 14 Jan 2015 07:28:39 +0000 (UTC)
Date: Wed, 14 Jan 2015 07:28:39 +0000
From: Eric Wong
To: yahns-public@yhbt.net
Subject: [PATCH] doc: add design_notes document
Message-ID: <20150114072839.GA30416@dcvr.yhbt.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
List-Id:

I actually forgot we have the part where we yield a client at the end
of every HTTP/1.x request.  This hurts CPU cache locality, but I bet
there are other things in Ruby which destroy CPU cache locality much
more than this.

Hopefully we don't forget this stuff for HTTP/2, because concurrency
for that is so tricky with multiplexed connections!
---
 Documentation/design_notes.txt | 101 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)
 create mode 100644 Documentation/design_notes.txt

diff --git a/Documentation/design_notes.txt b/Documentation/design_notes.txt
new file mode 100644
index 0000000..bf63617
--- /dev/null
+++ b/Documentation/design_notes.txt
@@ -0,0 +1,101 @@
+event queues in yahns
+---------------------
+
+There are currently two classes of queues and two classes of thread
+pools in yahns.
+
+While non-blocking I/O with epoll or kqueue is a cheap way to handle
+thousands of socket connections, multi-threading is required for many
+existing APIs, including Rack and the standard POSIX filesystem
+interfaces.
+
+listen queue + accept() thread pool
+-----------------------------------
+
+Like all TCP servers, there is a standard listen queue inside the
+kernel for every listen socket we have.
+
+Each listen queue has a dedicated thread pool running the _blocking_
+accept(2) (or accept4(2)) syscall in a loop.  We use dedicated threads
+and blocking accept to benefit from "wake-one" behavior in the Linux
+kernel.  By default, this thread pool has only one thread per process,
+doing nothing but accepting sockets and injecting them into the event
+queue (managed by epoll or kqueue).
+
+worker thread pool
+------------------
+
+This is where all the interesting application dispatch happens in
+yahns.  An epoll(2) (or kqueue(2)) descriptor is the heart of the
+event queue.  This design allows clients to migrate between different
+threads as they become active, preventing the head-of-line blocking
+of traditional designs where a client is pinned to a single thread
+(at the cost of weaker cache locality).
+
+The critical component for implementing this thread pool is the
+"one-shot" notification in the epoll and kqueue APIs, which allows
+them to be used as readiness queues for feeding the thread pool.
+Used correctly, this lets us guarantee exclusive access to a client
+socket without additional locks managed in userspace.
+
+Idle threads sit in epoll_wait (or kqueue) indefinitely until a
+socket is reported as "ready" by the kernel.
+
+queue flow
+----------
+
+Once a client is accept(2)-ed, it is immediately pushed into the worker
+thread pool (via EPOLL_CTL_ADD or EV_ADD).  This mimics the effect of
+TCP_DEFER_ACCEPT (in Linux) and the "dataready" accept filter (in
+FreeBSD) from the perspective of the epoll_wait(2)/kqueue(2) caller.
+No explicit locking controlled from userspace is necessary.
+
+TCP_DEFER_ACCEPT/"dataready"/"httpready" themselves are not used, as
+they have documented and unresolved issues (and add latency):
+
+  https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/134274
+  http://labs.apnic.net/blabs/?p=57
+
+Denial-of-Service and head-of-line blocking mitigation
+------------------------------------------------------
+
+As mentioned above, traditional uses of multi-threaded event loops
+may suffer from head-of-line blocking because clients on a busy
+thread may not be able to migrate to a less busy one.  In yahns, a
+client automatically migrates to the next available thread in the
+worker thread pool.
+
+yahns can safely yield a client after every HTTP request, forcing the
+client to be rescheduled (via epoll/kqueue) after any existing ready
+clients have completed processing.
+
+"Yielding" a client is accomplished by re-arming the already-"ready"
+socket with EPOLL_CTL_MOD (and EPOLLONESHOT); the one-shot
+notification requeues the descriptor at the end of the internal epoll
+ready queue, achieving an effect similar to yielding a thread (via
+sched_yield or Thread.pass) in a purely multi-threaded design.
+
+Once the client is yielded, epoll_wait is called again to pull the
+next client off the ready queue.
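+
+illustrative sketches (not yahns code)
+--------------------------------------
+
+To make the queue flow concrete, here is a minimal C sketch of the
+accept-to-event-queue handoff described above.  yahns itself is Ruby,
+so this only illustrates the underlying syscalls; error handling and
+thread setup are omitted:
+
+	/* one acceptor thread per listen socket ("wake-one" behavior) */
+	#define _GNU_SOURCE
+	#include <stddef.h>
+	#include <sys/epoll.h>
+	#include <sys/socket.h>
+
+	struct acceptor {
+		int lfd;	/* listen socket */
+		int epfd;	/* event queue feeding the worker pool */
+	};
+
+	void *acceptor_loop(void *arg)
+	{
+		struct acceptor *a = arg;
+
+		for (;;) {
+			/* blocking accept4(2); the listen socket itself
+			 * is never registered with epoll */
+			int cfd = accept4(a->lfd, NULL, NULL, SOCK_NONBLOCK);
+			struct epoll_event ev;
+
+			if (cfd < 0)
+				continue; /* real code would inspect errno */
+
+			/* EPOLLONESHOT means exactly one worker thread
+			 * is given this client when it becomes ready,
+			 * with no locking in userspace */
+			ev.events = EPOLLIN | EPOLLONESHOT;
+			ev.data.fd = cfd;
+			epoll_ctl(a->epfd, EPOLL_CTL_ADD, cfd, &ev);
+		}
+		return NULL;
+	}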
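+
+A corresponding sketch of a worker thread pulling clients off the
+event queue and yielding them after each HTTP request via
+EPOLL_CTL_MOD, as described above.  handle_one_request() is a
+hypothetical stand-in for the Rack dispatch; again, this is an
+illustration, not the actual (Ruby) implementation:
+
+	#include <sys/epoll.h>
+
+	/* hypothetical: process one request, nonzero if keep-alive */
+	extern int handle_one_request(int cfd);
+
+	void *worker_loop(void *arg)
+	{
+		int epfd = *(int *)arg;
+
+		for (;;) {
+			struct epoll_event ev;
+
+			/* idle workers sleep here until the kernel
+			 * reports a client socket as "ready" */
+			if (epoll_wait(epfd, &ev, 1, -1) != 1)
+				continue;
+
+			if (handle_one_request(ev.data.fd)) {
+				/* yield: re-arm the still-ready socket
+				 * so it requeues behind other ready
+				 * clients instead of hogging this
+				 * thread */
+				ev.events = EPOLLIN | EPOLLONESHOT;
+				epoll_ctl(epfd, EPOLL_CTL_MOD,
+					  ev.data.fd, &ev);
+			}
+		}
+		return NULL;
+	}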
+
+Output buffering notes
+----------------------
+
+yahns will not read data from a client socket if there is any
+outgoing data buffered for that client.  This prevents clients from
+mounting a DoS by sending a barrage of requests while never reading
+our responses (this should be obvious behavior for any server!).
+
+If outgoing data cannot fit into the kernel socket buffer, we buffer
+it to the filesystem immediately to avoid putting pressure on malloc
+(or the Ruby GC).  This also allows use of the sendfile(2) syscall to
+avoid extra copies between userspace and the kernel.
+
+Input buffering notes (for Rack)
+--------------------------------
+
+As demonstrated by the famous "Slowloris" attack, slow clients can
+ruin some HTTP servers.  By default, yahns uses non-blocking I/O to
+fully buffer an HTTP request before allowing the Rack 1.x application
+dispatch to block a thread.  This unfortunately means we double the
+amount of data copied, but it prevents us from being hogged by slow
+clients, given the synchronous nature of the Rack 1.x API for
+handling uploads.
-- 
EW