unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download mbox.gz: |
* Re: Some issues in the HTTP parser (and suggestions)
  @ 2010-05-07 18:33  7% ` Eric Wong
  0 siblings, 0 replies; 1+ results
From: Eric Wong @ 2010-05-07 18:33 UTC (permalink / raw)
  To: unicorn list

Iñaki Baz Castillo <ibc@aliax.net> wrote:
> Hi, by inspecting the Ragel grammar of the HTTP parser (coming from
> Mongrel) I've realized of some possible issues and bugs:
> 
> 
> hostname = (alnum | "-" | "." | "_")+;
> 
> - It doesn't allow IPv6. This is important IMHO.
> - It allows "_" which is an invalid symbol (not valid for a domain).
> 
> I suggest:
>   hostname = (alnum | "-" | "." | "[" | "]" | ":")+;

Hi Iñaki,

Underscore isn't valid for hostnames, but it is allowed in domain names
and most DNS servers will resolve them.  I've personally seen websites
with underscores in their domain names in the wild[1].

We'll have to test the IPv6 addresses and probably split that out into a
separate regexp since ":" would raise issues with the port number in
existing cases.  This is probably something for post-1.0.

> ------------------
> 
> host_with_port = (hostname (":" digit*)?) >mark %host;
> 
> - It allows something ugly as "mydomain.org:"
> 
> I suggest:
>   host_with_port = (hostname (":" digit{1,5})?) >mark %host;

It's ugly, but section 3.2.2 of RFC 2396 appears to allows it.

> ------------------
> 
> message_header = ((field_name ":" " "* field_value)|value_cont) :> CRLF;
> 
> - It doesn't allow valid spaces before ":" as:
>      Host : mydomain.org

Spaces before the ":" aren't allowed in rfc2616, and I have yet to see
evidence of clients sending headers like this in ~4 years of using this
parser.

> - Tabulators are also allowed.
> 
> I suggest:
>   message_header = ((field_name [ \t]* ":" [ \t]*
> field_value)|value_cont) :> CRLF;

I just pushed this out to unicorn.git to allow horizontal tabs:

>From 935912a422cabfd323f9b4ff268ded09a2ebe7a6 Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Fri, 7 May 2010 18:20:49 +0000
Subject: [PATCH] http: allow horizontal tab as leading whitespace in header values
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is allowed by RFC 2616, section 2.2, where spaces and
horizontal tabs are counted as linear white space and linear
white space (not just regular spaces) may prefix field-values
(section 4.2).

This has _not_ been a real issue in ~4 years of using this
parser (starting with Mongrel) with clients in the wild.

Thanks to Iñaki Baz Castillo for pointing this out.
---
 ext/unicorn_http/unicorn_http_common.rl |    2 +-
 test/unit/test_http_parser.rb           |    8 ++++++++
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/ext/unicorn_http/unicorn_http_common.rl b/ext/unicorn_http/unicorn_http_common.rl
index 6fca604..f165e3f 100644
--- a/ext/unicorn_http/unicorn_http_common.rl
+++ b/ext/unicorn_http/unicorn_http_common.rl
@@ -54,7 +54,7 @@
 
   value_cont = lws+ any* >start_value %write_cont_value;
 
-  message_header = ((field_name ":" " "* field_value)|value_cont) :> CRLF;
+  message_header = ((field_name ":" lws* field_value)|value_cont) :> CRLF;
   chunk_ext_val = token*;
   chunk_ext_name = token*;
   chunk_extension = ( ";" " "* chunk_ext_name ("=" chunk_ext_val)? )*;
diff --git a/test/unit/test_http_parser.rb b/test/unit/test_http_parser.rb
index 5b0ca9f..b7c8a1c 100644
--- a/test/unit/test_http_parser.rb
+++ b/test/unit/test_http_parser.rb
@@ -51,6 +51,14 @@ class HttpParserTest < Test::Unit::TestCase
     assert parser.keepalive?
   end
 
+  def test_tab_lws
+    parser = HttpParser.new
+    req = {}
+    tmp = "GET / HTTP/1.1\r\nHost:\tfoo.bar\r\n\r\n"
+    assert_equal req.object_id, parser.headers(req, tmp).object_id
+    assert_equal "foo.bar", req['HTTP_HOST']
+  end
+
   def test_connection_close_no_ka
     parser = HttpParser.new
     req = {}
-- 

[1] - and those were wild NSFW sites, of course :>

-- 
Eric Wong
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply related	[relevance 7%]

Results 1-1 of 1 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2010-05-07 13:51     Some issues in the HTTP parser (and suggestions) Iñaki Baz Castillo
2010-05-07 18:33  7% ` Eric Wong

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).