about summary refs log tree commit homepage
path: root/ext/unicorn_http/unicorn_http.rl
DateCommit message (Collapse)
2020-03-19http: improve RFC 7230 conformance
We need to favor "Transfer-Encoding: chunked" over "Content-Length" in the request header if they both exist. Furthermore, we now reject redundant chunking and cases where "chunked" is not the final encoding. We currently do not and have no plans to decode "gzip", "deflate", or "compress" encoding as described by RFC 7230. That's a job more appropriate for middleware, anyways. cf. https://tools.ietf.org/html/rfc7230 https://www.rfc-editor.org/errata_search.php?rfc=7230
2020-01-20doc: s/bogomips.org/yhbt.net/g
bogomips.org is due to expire, soon, and I'm not willing to pay extortionist fees to Ethos Capital/PIR/ICANN to keep a .org. So it's at yhbt.net, for now, but it will change again to whatever's affordable... Identity is overrated. Tor users can use .onions and kick ICANN to the curb: torsocks w3m http://unicorn.ou63pmih66umazou.onion/ torsocks git clone http://ou63pmih66umazou.onion/unicorn.git/ torsocks w3m http://ou63pmih66umazou.onion/unicorn-public/ While we're at it, `s/news.gmane.org/news.gmane.io/g', too. (but I suspect that'll need to be resynched since our mail "List-Id:" header is changing).
2018-12-26use rb_gc_register_mark_object
Since Ruby 2.6, it's a documented part of the API and we may depend on it: https://bugs.ruby-lang.org/issues/9894 It's been around since the early Ruby 1.9 days, and reduces overhead compared to relying on rb_global_variable: https://bogomips.org/unicorn-public/20170301002854.29198-1-e@80x24.org/
2017-12-16avoid reusing env on hijack
Hijackers may capture and reuse `env' indefinitely, so we must not use it in those cases for future requests. For non-hijack requests, we continue to reuse the `env' object to reduce memory recycling. Reported-and-tested-by: Sam Saffron <sam.saffron@gmail.com>
2017-10-03fix GC issue on rb_global_variable array
We need to add the array to ruby's global_list right after created it; otherwise it probably gets GCed.
2017-03-08unicorn_http: reduce rb_global_variable calls
rb_global_variable registers the address of the variable which refers to the object, instead of the object itself. This adds extra overhead to each global variable for our case, where the variable is frozen and never changed. Given there are currently 59 elements in this array, this saves 58 singly-linked list entries and associated malloc calls and associated overhead in the current mainline Ruby 2.x implementation. On 64-bit GNU libc malloc, this is already 16 * 58 = 928 bytes; more than the extra object slot and array slack space used by the new mark array. Mainline Ruby 1.9+ currently has a rb_gc_register_mark_object public function which would suite our needs, too, but it is currently undocumented, and may not be available in the future.
2015-12-13http: TypedData C-API conversion
This provides some extra type safety if combined with other C extensions, as well as allowing us to account for memory usage of the HTTP parser in ObjectSpace. This requires Ruby 1.9.3+ and has remained a stable API since then. This will become officially supported when Ruby 2.3.0 is released later this month. This API has only been documented in doc/extension.rdoc (formerly README.EXT) in the Ruby source tree since April 2015, r50318
2015-07-15doc: remove references to old servers
They'll continue to be maintained, but we're no longer advertising them. Also, favor lowercase "unicorn" while we're at it since that matches the executable and gem name to avoid unnecessary escaping for RDoc.
2015-06-06http: move response_start_sent into the C ext
Combined with the previous commit to eliminate the `@socket' instance variable, this eliminates the last instance variable in the Unicorn::HttpRequest class. Eliminating the last instance variable avoids the creation of a internal hash table used for implementing the "generic" instance variables found in non-pure-Ruby classes. Method entry overhead remains the same. While this change doesn't do a whole lot for unicorn memory usage where the HttpRequest is a singleton, it helps other HTTP servers which rely on this code where thousands of clients may be connected.
2015-05-29http: use rb_hash_clear in Ruby 2.0+
Calling the function directly avoids the overhead of Ruby method table lookup and global method cache. The only downside is this is now hidden from tracers and cannot be overridden from Ruby, but I doubt anybody cares about that.
2015-03-02http: remove experimental dechunk! method
It was never used anywhere AFAIK and wastes precious bytes.
2015-03-02http: remove deprecated reset method
We use the `clear' method everywhere nowadays.
2015-02-04http: standalone require + reduction in binary size
This allows requiring just the C extension part of "unicorn_http", without requiring the rest of unicorn, allowing other HTTP servers using the same parser to be slimmer. On my x86-64 Debian 7.0 system: text data bss dec hex filename 44026 1976 488 46490 b59a lib/unicorn_http.so 43930 1976 456 46362 b51a lib/unicorn_http.so
2015-01-28http: -Wshorten-64-to-32 warnings on clang
Tested on x86_64, clang version 3.5-1ubuntu1 (trunk) (LLVM 3.5) These warnings were introduced on commit 4b2782a926d8f131b1e7382be35e3abb77bf4be5 ("http: reduce parser from 72 to 56 bytes on 64-bit") and did not affect any releases. These length checks should not be necessary in reality because HTTP header sizes never come close to 4GB in size. Fixup a minor coding style (inherited from Mongrel) violation while we're at it (tabs => spaces).
2014-09-17http: reduce parser from 72 to 56 bytes on 64-bit
This allows the parser struct to fit in one cache line on x86-64 systems where cache lines are 64 bytes. Using 32-bit integer lengths is safe here because these are only for tracking offsets within the HTTP header buffer. We can safely limit HTTP headers and in-memory buffers to be less than 4GB without anybody complaining. HTTP bodies continue to use off_t (usually 64-bit, even on 32-bit systems) sizes and support as much as the OS/hardware can handle.
2014-08-18http: remove the keepalive requests limit
This was a hack for some event loops such as those found in nginx and some Rainbows! concurrency models. Using epoll/kqueue with one-shot notification (which yahns does) avoids all fairness problems.
2014-05-29http: remove xftrust options
This has long been considered a mistake and not documented for very long. I considered removing X-Forwarded-Proto and X-Forwarded-SSL handling, too, so rack.url_scheme is always "http", but that might lead to compatibility issues in rare apps if Rack::Request#scheme is not used.
2013-10-26license: allow all future versions of the GNU GPL
There is currently no GPLv4, so this change has no effect at the moment. In case the GPLv4 arrives and I am not alive to approve/review it, the lesser of evils is have give blanket approval of all future GPL versions (as published by the FSF). The worse evil is to be stuck with a license which cannot guarantee the Free-ness of this project in the future. This unfortunately means the FSF can theoretically come out with license terms I do not agree with, but the GPLv2 and GPLv3 will always be an option to all users.
2013-05-08HttpParser#next? becomes response_start_sent-aware
This could allow servers with persistent connection support[1] to support our check_client_connection in the future. [1] - Rainbows!/zbatery, possibly others
2012-11-29Begin writing HTTP request headers early to detect disconnected clients
This patch checks incoming connections and avoids calling the application if the connection has been closed. It works by sending the beginning of the HTTP response before calling the application to see if the socket can successfully be written to. By enabling this feature users can avoid wasting application rendering time only to find the connection is closed when attempting to write, and throwing out the result. When a client disconnects while being queued or processed, Nginx will log HTTP response 499 but the application will log a 200. Enabling this feature will minimize the time window during which the problem can arise. The feature is disabled by default and can be enabled by adding 'check_client_connection true' to the unicorn config. [ew: After testing this change, Tom Burns wrote: So we just finished the US Black Friday / Cyber Monday weekend running unicorn forked with the last version of the patch I had sent you. It worked splendidly and helped us handle huge flash sales without increased response time over the weekend. Whereas in previous flash traffic scenarios we would see the number of HTTP 499 responses grow past the number of real HTTP 200 responses, over the weekend we saw no growth in 499s during flash sales. Unexpectedly the patch also helped us ward off a DoS attack where the attackers were disconnecting immediately after making a request. ref: <CAK4qKG3rkfVYLyeqEqQyuNEh_nZ8yw0X_cwTxJfJ+TOU+y8F+w@mail.gmail.com> ] Signed-off-by: Eric Wong <normalperson@yhbt.net>
2011-08-29add GPLv3 option to the license
Existing license terms (Ruby-specific) and GPLv2 remain in place, but GPLv3 is preferred as it helps with distribution of AGPLv3 code and is explicitly compatible with Apache License (v2.0). Many more reasons are documented by the FSF: https://www.gnu.org/licenses/quick-guide-gplv3.html http://gplv3.fsf.org/rms-why.html ref: http://thread.gmane.org/gmane.comp.lang.ruby.unicorn.general/933
2011-06-15http: delay CoW string invalidations in filter_body
Not all invocations of filter_body will trigger CoW on the given destination string. We can also avoid an unnecessary rb_str_set_len() in the non-chunked path, too.
2011-06-15http: remove tainting flag
Needless line noise, kgio doesn't support tainting anyways.
2011-06-14http: fix documentation for dechunk!
chunk_ready! was my original name for it, but I'm indecisive when it comes to naming things.
2011-06-13http: dechunk! method to enter dechunk mode
This allows one to enter the dechunker without parsing HTTP headers beforehand. Since we skipped header parsing, trailer parsing is not supported since we don't know what trailers might be (to our knowledge, nobody uses trailers anyways)
2011-06-13http: document reasoning for memcpy in filter_body
copy-on-write behavior doesn't help you if your common use case triggers copies.
2011-06-13http: rename variables in filter_body implementation
Makes things easier-to-understand since it's based on memcpy()
2011-05-23http: call rb_str_modify before rb_str_resize
Ruby 1.9.3dev (trunk) requires it if the string size is unchanged.
2011-05-23strip trailing and leading linear whitespace in headers
RFC 2616, section 4.2: > The field-content does not include any leading or trailing LWS: > linear white space occurring before the first non-whitespace > character of the field-value or after the last non-whitespace > character of the field-value. Such leading or trailing LWS MAY be > removed without changing the semantics of the field value. Any LWS > that occurs between field-content MAY be replaced with a single SP > before interpreting the field value or forwarding the message > downstream.
2011-05-05http_parser: add max_header_len accessor
Rainbows! wants to be able to lower this eventually...
2011-05-04http_parser: new add_parse method
Combines the following sequence: http_parser.buf << socket.readpartial(0x4000) http_parser.parse Into: http_parser.add_parse(socket.readpartial(0x4000)) It was too damn redundant otherwise...
2011-05-04return 414 for URI length violations
There's an HTTP status code allocated for it in <http://www.iana.org/assignments/http-status-codes>, so return that instead of 400.
2011-02-02http: parser handles IPv6 bracketed IP hostnames
Just in case we have people that don't use DNS, we can support folks who enter ugly IPv6 addresses... IPv6 uses brackets around the address to avoid confusing the colons used in the address with the colon used to denote the TCP port number in URIs.
2011-01-05http_parser: add clear method, deprecate reset
But allows small optimizations to be made to avoid constant/instance variable lookups later :)
2011-01-04http_response: implement httpdate in C
This can return a static string and be significantly faster as it reduces object allocations and Ruby method calls for the fastest websites that serve thousands of requests a second. It assumes the Ruby runtime is single-threaded, but that is the case of Ruby 1.8 and 1.9 and also what Unicorn is all about. This change is safe for Rainbows! under 1.8 and 1.9.
2010-12-26http: #keepalive? and #headers? work after #next?
We need to preserve our internal flags and only clear them on HttpParser#parse. This allows the async concurrency models in Rainbows! to work properly.
2010-12-21http: hook up "trust_x_forwarded" to configurator
More config bloat, sadly this is necessary for Rainbows! :<
2010-12-20http: allow ignoring X-Forwarded-* for url_scheme
Evil clients may be exposed to the Unicorn parser via Rainbows!, so we'll allow people to turn off blindly trusting certain X-Forwarded* headers for "rack.url_scheme" and rely on middleware to handle it.
2010-12-20http: refactor finalize_header function
rack.url_scheme handling and SERVER_{NAME,PORT} handling each deserve their own functions.
2010-12-20http: update setting of "https" for rack.url_scheme
The first value of X-Forwarded-Proto in rack.url_scheme should be used as it can be chained. This header can be set multiple times via different proxies in the chain, but consider the first one to be valid. Additionally, respect X-Forwarded-SSL as it may be passed with the "on" flag instead of X-Forwarded-Proto. ref: rack commit 85ca454e6143a3081d90e4546ccad602a4c3ad2e and 35bb5ba6746b5d346de9202c004cc926039650c7
2010-12-20http: support keepalive_requests directive
This limits the number of keepalive requests of a single connection to prevent a single client from monopolizing server resources. On multi-process servers (e.g. Rainbows!) with many keepalive clients per worker process, this can force a client to reconnect and increase its chances of being accepted on a less-busy worker process. This directive is named after the nginx directive which is identical in function.
2010-12-19http: delay clearing env on HttpParser#next?
This allows apps/middlewares on Rainbows! that rely on env in the response_body#close to hold onto the env.
2010-11-07tee_input: switch to simpler API for parsing trailers
Not that anybody uses trailers extensively, but it's good to know it's there.
2010-11-06http_parser: add HttpParser#next? method
An easy combination of the existing HttpParser#keepalive? and HttpParser#reset methods, this makes it easier to implement persistence.
2010-11-06enable HTTP keepalive support for all methods
Yes, this means even POST/PUT bodies may be kept alive, but only if the body (and trailers) are fully-consumed.
2010-10-07http: fix behavior with pipelined requests
We cannot clear the buffer between requests because clients may send multiple requests that get taken in one read()/recv() call.
2010-10-07http: remove unnecessary rb_str_update() calls
Rubinius no longer uses it, and it conflicts with a public method in MRI.
2010-10-07http: allow this to be used as a request object
The parser and request object become one and the same, since the parser lives for the lifetime of the request.
2010-10-05http: raise empty backtrace for HttpParserError
It's expensive to generate a backtrace and this exception is only triggered by bad clients. So make it harder for them to DoS us by sending bad requests.
2010-06-24http: avoid (re-)declaring the Unicorn module
It makes for messy documentation.