From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-2.9 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
	shortcircuit=no autolearn=unavailable version=3.3.2
X-Original-To: yahns-public@yhbt.net
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 04DAB1F4E1;
	Wed, 14 Jan 2015 07:28:39 +0000 (UTC)
Date: Wed, 14 Jan 2015 07:28:39 +0000
From: Eric Wong
To: yahns-public@yhbt.net
Subject: [PATCH] doc: add design_notes document
Message-ID: <20150114072839.GA30416@dcvr.yhbt.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
List-Id:

I actually forgot we have the part where we yield a client at the end
of every HTTP/1.x request.  This hurts CPU cache locality, but I bet
there are other things in Ruby which destroy CPU cache locality much
more than this.

Hopefully we don't forget this stuff for HTTP/2, because concurrency
for that is so tricky with multiplexed connections!
---
 Documentation/design_notes.txt | 101 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)
 create mode 100644 Documentation/design_notes.txt

diff --git a/Documentation/design_notes.txt b/Documentation/design_notes.txt
new file mode 100644
index 0000000..bf63617
--- /dev/null
+++ b/Documentation/design_notes.txt
@@ -0,0 +1,101 @@
+event queues in yahns
+---------------------
+
+There are currently two classes of queues and two classes of thread
+pools in yahns.
+
+While non-blocking I/O with epoll or kqueue is a cheap way to handle
+thousands of socket connections, multi-threading is required for many
+existing APIs, including Rack and the standard POSIX filesystem
+interfaces.
+
+listen queue + accept() thread pool
+-----------------------------------
+
+Like all TCP servers, there is a standard listen queue inside the
+kernel for every listen socket we have.
+
+Each listen queue has a dedicated thread pool running the _blocking_
+accept(2) (or accept4(2)) syscall in a loop.  We use dedicated threads
+and blocking accept to benefit from "wake-one" behavior in the Linux
+kernel.  By default, this thread pool has only one thread per process,
+doing nothing but accepting sockets and injecting them into the event
+queue (managed by epoll or kqueue).
+
+worker thread pool
+------------------
+
+This is where all the interesting application dispatch happens in
+yahns.  An epoll(2) (or kqueue(2)) descriptor is the heart of the
+event queue.  This design allows clients to migrate between different
+threads as they become active, preventing the head-of-line blocking
+of traditional designs where a client is pinned to a single thread
+(at the cost of weaker cache locality).
+
+The critical component for implementing this thread pool is the
+"one-shot" notification in the epoll and kqueue APIs, which allows
+them to be used as readiness queues for feeding the thread pool.
+Used correctly, this lets us guarantee exclusive access to a client
+socket without additional locks managed in userspace.
+
+Idle threads sit in epoll_wait (or kqueue) indefinitely until a
+socket is reported as "ready" by the kernel.
+
+queue flow
+----------
+
+Once a client is accept(2)-ed, it is immediately pushed into the worker
+thread pool (via EPOLL_CTL_ADD or EV_ADD).  This mimics the effect of
+TCP_DEFER_ACCEPT (in Linux) and the "dataready" accept filter (in
+FreeBSD) from the perspective of the epoll_wait(2)/kqueue(2) caller.
+No explicit locking controlled from userspace is necessary.
+
+TCP_DEFER_ACCEPT/"dataready"/"httpready" themselves are not used, as
+they have documented and unresolved issues (and add latency):
+
+  https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/134274
+  http://labs.apnic.net/blabs/?p=57
+
+Denial-of-Service and head-of-line blocking mitigation
+------------------------------------------------------
+
+As mentioned above, traditional uses of multi-threaded event loops
+may suffer from head-of-line blocking because clients on a busy
+thread may not be able to migrate to a less busy one.  In yahns, a
+client automatically migrates to the next available thread in the
+worker thread pool.
+
+yahns can safely yield a client after every HTTP request, forcing the
+client to be rescheduled (via epoll/kqueue) after any existing ready
+clients have completed processing.
+
+"Yielding" a client is accomplished by re-arming the already-"ready"
+socket with EPOLL_CTL_MOD (and EPOLLONESHOT); the one-shot
+notification requeues the descriptor at the end of the internal epoll
+ready queue, achieving an effect similar to yielding a thread (via
+sched_yield or Thread.pass) in a purely multi-threaded design.
+
+Once the client is yielded, epoll_wait is called again to pull the
+next client off the ready queue.
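+
+illustrative sketches (not yahns code)
+--------------------------------------
+
+To make the queue flow concrete, here is a minimal C sketch of the
+accept-to-event-queue handoff described above.  yahns itself is Ruby,
+so this only illustrates the underlying syscalls; error handling and
+thread setup are omitted:
+
+	/* one acceptor thread per listen socket ("wake-one" behavior) */
+	#define _GNU_SOURCE
+	#include <stddef.h>
+	#include <sys/epoll.h>
+	#include <sys/socket.h>
+
+	struct acceptor {
+		int lfd;	/* listen socket */
+		int epfd;	/* event queue feeding the worker pool */
+	};
+
+	void *acceptor_loop(void *arg)
+	{
+		struct acceptor *a = arg;
+
+		for (;;) {
+			/* blocking accept4(2); the listen socket itself
+			 * is never registered with epoll */
+			int cfd = accept4(a->lfd, NULL, NULL, SOCK_NONBLOCK);
+			struct epoll_event ev;
+
+			if (cfd < 0)
+				continue; /* real code would inspect errno */
+
+			/* EPOLLONESHOT means exactly one worker thread
+			 * is given this client when it becomes ready,
+			 * with no locking in userspace */
+			ev.events = EPOLLIN | EPOLLONESHOT;
+			ev.data.fd = cfd;
+			epoll_ctl(a->epfd, EPOLL_CTL_ADD, cfd, &ev);
+		}
+		return NULL;
+	}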
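+
+A corresponding sketch of a worker thread pulling clients off the
+event queue and yielding them after each HTTP request via
+EPOLL_CTL_MOD, as described above.  handle_one_request() is a
+hypothetical stand-in for the Rack dispatch; again, this is an
+illustration, not the actual (Ruby) implementation:
+
+	#include <sys/epoll.h>
+
+	/* hypothetical: process one request, nonzero if keep-alive */
+	extern int handle_one_request(int cfd);
+
+	void *worker_loop(void *arg)
+	{
+		int epfd = *(int *)arg;
+
+		for (;;) {
+			struct epoll_event ev;
+
+			/* idle workers sleep here until the kernel
+			 * reports a client socket as "ready" */
+			if (epoll_wait(epfd, &ev, 1, -1) != 1)
+				continue;
+
+			if (handle_one_request(ev.data.fd)) {
+				/* yield: re-arm the still-ready socket
+				 * so it requeues behind other ready
+				 * clients instead of hogging this
+				 * thread */
+				ev.events = EPOLLIN | EPOLLONESHOT;
+				epoll_ctl(epfd, EPOLL_CTL_MOD,
+					  ev.data.fd, &ev);
+			}
+		}
+		return NULL;
+	}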
+
+Output buffering notes
+----------------------
+
+yahns will not read data from a client socket if there is any
+outgoing data buffered for that client.  This prevents clients from
+mounting a DoS by sending a barrage of requests while never reading
+our responses (this should be obvious behavior for any server!).
+
+If outgoing data cannot fit into the kernel socket buffer, we buffer
+it to the filesystem immediately to avoid putting pressure on malloc
+(or the Ruby GC).  This also allows use of the sendfile(2) syscall to
+avoid extra copies between userspace and the kernel.
+
+Input buffering notes (for Rack)
+--------------------------------
+
+As demonstrated by the famous "Slowloris" attack, slow clients can
+ruin some HTTP servers.  By default, yahns uses non-blocking I/O to
+fully buffer an HTTP request before allowing the Rack 1.x application
+dispatch to block a thread.  This unfortunately means we double the
+amount of data copied, but it prevents us from being hogged by slow
+clients, given the synchronous nature of the Rack 1.x API for
+handling uploads.
-- 
EW