unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download mbox.gz: |
* [PATCH] unicorn_http_common.rl: use only ASCII spaces for compatibility
@ 2023-06-20 10:46  5% EW
  0 siblings, 0 replies; 11+ results
From: EW @ 2023-06-20 10:46 UTC (permalink / raw)
  To: unicorn-public

Ragel 6.10 on FreeBSD 12.4 amd64 complains and fails on this, yet the same
Ragel version on Debian 11.x i386 and amd64 never has.  I suspect this can
fix compatibility on s390x, arm64, armel, and armhf Debian builds:

https://buildd.debian.org/status/fetch.php?pkg=unicorn&arch=s390x&ver=6.1.0-1&stamp=1687156375&file=log
https://buildd.debian.org/status/fetch.php?pkg=unicorn&arch=arm64&ver=6.1.0-1&stamp=1687156478&file=log
https://buildd.debian.org/status/fetch.php?pkg=unicorn&arch=armel&ver=6.1.0-1&stamp=1687156619&file=log
https://buildd.debian.org/status/fetch.php?pkg=unicorn&arch=armhf&ver=6.1.0-1&stamp=1687156807&file=log

Fixes: d5fbbf547203061b (Add some tolerance (RFC2616 sec. 19.3), 2016-10-20)
---
 ext/unicorn_http/unicorn_http_common.rl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ext/unicorn_http/unicorn_http_common.rl b/ext/unicorn_http/unicorn_http_common.rl
index 0988b54..507e570 100644
--- a/ext/unicorn_http/unicorn_http_common.rl
+++ b/ext/unicorn_http/unicorn_http_common.rl
@@ -4,7 +4,7 @@
 
 #### HTTP PROTOCOL GRAMMAR
 # line endings
-  CRLF = ("\r\n" | "\n");
+  CRLF = ("\r\n" | "\n");
 
 # character types
   CTL = (cntrl | 127);

^ permalink raw reply related	[relevance 5%]

* Re: [PATCH] Add early hints support
  @ 2020-07-16 10:50  3% ` Eric Wong
  0 siblings, 0 replies; 11+ results
From: Eric Wong @ 2020-07-16 10:50 UTC (permalink / raw)
  To: Jean Boussier; +Cc: unicorn-public

Jean Boussier <jean.boussier@shopify.com> wrote:
> While not part of the rack spec, this API is exposed
> by both puma and falcon, and Rails use it when available.
> 
> The 103 Early Hints response code is specified in RFC 8297.

Thanks, I can probably accept it with some minor tweaks.
Some comment below...

> ---
>  lib/unicorn/configurator.rb |  5 +++++
>  lib/unicorn/http_server.rb  | 33 +++++++++++++++++++++++++++++++--
>  test/unit/test_server.rb    | 30 ++++++++++++++++++++++++++++++
>  3 files changed, 66 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/unicorn/configurator.rb b/lib/unicorn/configurator.rb
> index c3a4f2d..43e91f4 100644
> --- a/lib/unicorn/configurator.rb
> +++ b/lib/unicorn/configurator.rb
> @@ -276,6 +276,11 @@ def default_middleware(bool)
>      set_bool(:default_middleware, bool)
>    end
>  
> +  # sets wether to enable rack's early hints API.

spelling: "whether".

Since it's not officially part of Rack, yet, perhaps something
like:

	Enable the proposed early hints Rack API.

I'm no grammar expert, so I also rephrased that sentence to
avoid the apostrophe.

Since this RDoc ends up on the website, links to any relevant
Rails documentation and RFC 8297 would also be appropriate.
Otherwise non-Rails users like me might have no clue what
it's for.

> +  def early_hints(bool)
> +    set_bool(:early_hints, bool)
> +  end
> +
>    # sets listeners to the given +addresses+, replacing or augmenting the
>    # current set.  This is for the global listener pool shared by all
>    # worker processes.  For per-worker listeners, see the after_fork example
> diff --git a/lib/unicorn/http_server.rb b/lib/unicorn/http_server.rb
> index 45a2e97..3e0c9a4 100644
> --- a/lib/unicorn/http_server.rb
> +++ b/lib/unicorn/http_server.rb
> @@ -15,7 +15,7 @@ class Unicorn::HttpServer
>                  :before_fork, :after_fork, :before_exec,
>                  :listener_opts, :preload_app,
>                  :orig_app, :config, :ready_pipe, :user,
> -                :default_middleware
> +                :default_middleware, :early_hints
>    attr_writer   :after_worker_exit, :after_worker_ready, :worker_exec
>  
>    attr_reader :pid, :logger
> @@ -588,6 +588,27 @@ def handle_error(client, e)
>    rescue
>    end
>  
> +  def e103_response_write(client, headers)
> +    response = if @request.response_start_sent
> +      "103 Early Hints\r\n"
> +    else
> +      "HTTP/1.1 103 Early Hints\r\n"
> +    end
> +
> +    headers.each_pair do |k, vs|
> +      if vs.respond_to?(:to_s) && !vs.to_s.empty?
> +        vs.to_s.split("\n".freeze).each do |v|

Are the method calls for .to_s necessary?  At least for regular
Rack responses, this bit from unicorn/http_response.rb seems to
handle odd apps which (improperly) give `nil' value:

          if value =~ /\n/
            # avoiding blank, key-only cookies with /\n+/
            value.split(/\n+/).each { |v| buf << "#{key}: #{v}\r\n" }
          else
            buf << "#{key}: #{value}\r\n"
          end

The split would never be attempted on nil since it wouldn't
match /\n/, and the /\n+/ was needed in the Rails 2.3.5 days:

https://public-inbox.org/rack-devel/20100105235845.GB3377@dcvr.yhbt.net/
https://yhbt.net/unicorn-public/52400de1c9e9437b5c9df899f273485f663bb5b5/s/

> +          response << "#{k}: #{v}\r\n"
> +        end
> +      else
> +        response << "#{k}: #{vs}\r\n"
> +      end
> +    end
> +    response << "\r\n".freeze
> +    response << "HTTP/1.1 ".freeze if @request.response_start_sent
> +    client.write(response)
> +  end
> +
>    def e100_response_write(client, env)
>      # We use String#freeze to avoid allocations under Ruby 2.1+
>      # Not many users hit this code path, so it's better to reduce the
> @@ -602,7 +623,15 @@ def e100_response_write(client, env)
>    # once a client is accepted, it is processed in its entirety here
>    # in 3 easy steps: read request, call app, write app response
>    def process_client(client)
> -    status, headers, body = @app.call(env = @request.read(client))
> +    env = @request.read(client)
> +
> +    if early_hints
> +      env["rack.early_hints"] = lambda do |headers|
> +        e103_response_write(client, headers)
> +      end
> +    end

Eep, extra branch...  What's the performance impact for existing
users when not activated? (on Unix sockets).

Perhaps bypassing the method and accessing the @early_hints ivar
directly can be slightly faster w/o method dispatch.  That
should also allow using attr_writer instead of attr_accessor,
I think.

Not sure how much people here care about minor performance
regressions, here...  I don't really upgrade or touch old
Rack apps, anymore; and I'm certainly never going to buy
a faster CPU.

> +    status, headers, body = @app.call(env)
>  
>      begin
>        return if @request.hijacked?

^ permalink raw reply	[relevance 3%]

* [RFC] doc: unicorn_rails: clarify that it is intended for rails <= 2.x
@ 2019-04-15  6:52  4% Eric Wong
  0 siblings, 0 replies; 11+ results
From: Eric Wong @ 2019-04-15  6:52 UTC (permalink / raw)
  To: unicorn-public

Hopefully the wording is a little more explicit and clearer
by stating its purpose in the first line of the description.
---
 Grammar/wording awkwardness review/comments greatly appreciated,
 haven't used English or Ruby much, lately.

 Documentation/unicorn_rails.1.txt | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/Documentation/unicorn_rails.1.txt b/Documentation/unicorn_rails.1.txt
index fb0e60f..0ce9bcf 100644
--- a/Documentation/unicorn_rails.1.txt
+++ b/Documentation/unicorn_rails.1.txt
@@ -12,14 +12,12 @@ unicorn_rails [-c CONFIG_FILE] [-E RAILS_ENV] [-D] [RACKUP_FILE]
 
 # DESCRIPTION
 
-A rackup(1)-like command to launch Rails applications using Unicorn.  It
-is expected to be started in your Rails application root (RAILS_ROOT),
-but the "working_directory" directive may be used in the CONFIG_FILE.
+A rackup(1)-like command to launch ancient Rails (2.x and earlier)
+applications using Unicorn.  Rails 3 (and later) support Rack natively,
+so users are encouraged to use unicorn(1) instead of unicorn_rails(1).
 
-It is designed to help Rails 1.x and 2.y users transition to Rack, but
-it is NOT needed for Rails 3 applications.  Rails 3 users are encouraged
-to use unicorn(1) instead of unicorn_rails(1).  Users of Rails 1.x/2.y
-may also use unicorn(1) instead of unicorn_rails(1).
+It is expected to be started in your Rails application root (RAILS_ROOT),
+but the "working_directory" directive may be used in the CONFIG_FILE.
 
 The outward interface resembles rackup(1), the internals and default
 middleware loading is designed like the `script/server` command
-- 
EW

^ permalink raw reply related	[relevance 4%]

* [PATCH] Add some tolerance (RFC2616 sec. 19.3)
@ 2016-10-20  9:05  3% Mishael A Sibiryakov
  0 siblings, 0 replies; 11+ results
From: Mishael A Sibiryakov @ 2016-10-20  9:05 UTC (permalink / raw)
  To: unicorn-public

Hi all.

We're implementing client certificate authentication with nginx and
unicorn. 

Nginx configured in the following way:

proxy_set_header X-SSL-Client-Cert $ssl_client_cert;

When client submits certificate and nginx passes it to the unicorn,
unicorn responds with 400 (Bad Request). This caused because nginx
doesn't use "\r\n" they using just "\n" and multilne headers is failed
to parse (I've added test).

Accorording to RFC2616 section 19.3:
https://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.3

"The line terminator for message-header fields is the sequence CRLF.
However, we recommend that applications, when parsing such headers,
recognize a single LF as a line terminator and ignore the leading CR."

CRLF changed to ("\r\n" | "\n")

Github commit https://github.com/uno4ki/unicorn/commit/ed127b66e162aaf1
76de05720f6be758f8b41b1f


PS: Googling "nginx unicorn ssl_client_cert" shows the problem. 

---
 ext/unicorn_http/unicorn_http_common.rl |  2 +-
 test/unit/test_http_parser.rb           | 16 ++++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/ext/unicorn_http/unicorn_http_common.rl
b/ext/unicorn_http/unicorn_http_common.rl
index cc1d455..507e570 100644
--- a/ext/unicorn_http/unicorn_http_common.rl
+++ b/ext/unicorn_http/unicorn_http_common.rl
@@ -4,7 +4,7 @@
 
 #### HTTP PROTOCOL GRAMMAR
 # line endings
-  CRLF = "\r\n";
+  CRLF = ("\r\n" | "\n");
 
 # character types
   CTL = (cntrl | 127);
diff --git a/test/unit/test_http_parser.rb
b/test/unit/test_http_parser.rb
index c72f7f2..4b1a16e 100644
--- a/test/unit/test_http_parser.rb
+++ b/test/unit/test_http_parser.rb
@@ -230,6 +230,22 @@ def test_nasty_pound_header
     assert_equal expect, req['HTTP_X_SSL_BULLSHIT']
   end
 
+  def test_multiline_header_0d0a
+    parser = HttpParser.new
+    parser.buf << "GET / HTTP/1.0\r\nX-Multiline-Header: foo
bar\r\n\tcha cha\r\n\tzha zha\r\n\r\n"
+    req = parser.env
+    assert_equal req, parser.parse
+    assert_equal 'foo bar cha cha zha zha',
req['HTTP_X_MULTILINE_HEADER']
+  end
+
+  def test_multiline_header_0a
+    parser = HttpParser.new
+    parser.buf << "GET / HTTP/1.0\nX-Multiline-Header: foo bar\n\tcha
cha\n\tzha zha\n\n"
+    req = parser.env
+    assert_equal req, parser.parse
+    assert_equal 'foo bar cha cha zha zha',
req['HTTP_X_MULTILINE_HEADER']
+  end
+
   def test_continuation_eats_leading_spaces
     parser = HttpParser.new
     header = "GET / HTTP/1.1\r\n" \
-- 
2.10.1

^ permalink raw reply related	[relevance 3%]

* Re: Please move to github
  @ 2014-08-01 22:48  4%     ` Gabe da Silveira
  0 siblings, 0 replies; 11+ results
From: Gabe da Silveira @ 2014-08-01 22:48 UTC (permalink / raw)
  To: Michael Grosser; +Cc: Eric Wong, unicorn-public

I am a huge fan of GitHub and use it daily in my professional job, but both
the original message and now this comes off full of youthful arrogance and
hubris (in addition to abysmal grammar). GitHub is not the only way to do
open-source, and developers who refuse to use something other than GitHub
tend to be people impressed by superficial polish, but lacking
understanding in the type of deep engineering that allows for things like
the Linux kernel (or preforking app servers like Unicorn) to work.  There
is more to software engineering than what is fashionable in the last 5
years.  Please be more respectful of the maintainers of serious software
like Unicorn than to piss on their choice of project management tools and
techniques.


On Fri, Aug 1, 2014 at 5:27 PM, Michael Grosser <michael@grosser.it> wrote:

> Using github would mean more contributors, better software and
> especially helping more people (compare patches/issues/communication
> on unicorn to any other os project and github and the difference
> should be apparent),
> and I think that's what OS is about ... it's not ideal, but it's the
> best we got.
>
> On Fri, Aug 1, 2014 at 2:32 PM, Eric Wong <e@80x24.org> wrote:
> > Michael Grosser <michael@grosser.it> wrote:
> >> Current state makes it very hard to mange/search/fork/open-issues etc
> >> especially for newcomers,
> >> please move the project to github so we can have nice disussions
> >> forks/prs etc goodness.
> >
> > No.  Never.  Github is proprietary communications tool which requires
> > users to accept a terms of service and login.  That gives power and
> > influence to a single entity (and a for-profit organization at that).
> >
> > Contributing to unicorn is *socially* as easy as contributing to git or
> > the Linux kernel.  There is no need to signup for anything, no need to
> > ever touch a bloated web browser.
> >
> > The reason I contribute to Free Software is because I am against any
> > sort of lock-in or proprietary features.  It absolutely sickens me to
> > encounter users who seem to be incapable of using git without a
> > proprietary communications tool.
>
>


^ permalink raw reply	[relevance 4%]

* [PATCH] doc: emphasize the importance of stderr_path
  @ 2010-06-04 20:47  2%   ` Eric Wong
  0 siblings, 0 replies; 11+ results
From: Eric Wong @ 2010-06-04 20:47 UTC (permalink / raw)
  To: unicorn list

Eric Wong <normalperson@yhbt.net> wrote:
> I shall emphasize the importance of stderr_path if you're using the
> default logger configuration in the documentation.

I've just pushed the following out to unicorn.git and website.

Comments/grammar/speling korections welcome as always.

>From 2d5a4b075801493a85c6e8d2dbdf0c95002e046d Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Fri, 4 Jun 2010 13:42:34 -0700
Subject: [PATCH] doc: emphasize the importance of stderr_path

While second nature to myself, stderr_path may be an
overlooked configuration parameter for some users.  Also,
add a minimal sample configuration file that is shorter
and hopefully less intimidating to new users.
---
 examples/unicorn.conf.minimal.rb |   13 +++++++++++++
 examples/unicorn.conf.rb         |   12 +++++++++---
 lib/unicorn/configurator.rb      |   22 ++++++++++++++++++----
 3 files changed, 40 insertions(+), 7 deletions(-)
 create mode 100644 examples/unicorn.conf.minimal.rb

diff --git a/examples/unicorn.conf.minimal.rb b/examples/unicorn.conf.minimal.rb
new file mode 100644
index 0000000..2a47910
--- /dev/null
+++ b/examples/unicorn.conf.minimal.rb
@@ -0,0 +1,13 @@
+# Minimal sample configuration file for Unicorn (not Rack) when used
+# with daemonization (unicorn -D) started in your working directory.
+#
+# See http://unicorn.bogomips.org/Unicorn/Configurator.html for complete
+# documentation.
+# See also http://unicorn.bogomips.org/examples/unicorn.conf.rb for
+# a more verbose configuration using more features.
+
+listen 2007 # by default Unicorn listens on port 8080
+worker_processes 2 # this should be >= nr_cpus
+pid "/path/to/app/shared/pids/unicorn.pid"
+stderr_path "/path/to/app/shared/log/unicorn.log"
+stdout_path "/path/to/app/shared/log/unicorn.log"
diff --git a/examples/unicorn.conf.rb b/examples/unicorn.conf.rb
index e209894..37c3e81 100644
--- a/examples/unicorn.conf.rb
+++ b/examples/unicorn.conf.rb
@@ -1,4 +1,9 @@
-# Sample configuration file for Unicorn (not Rack)
+# Sample verbose configuration file for Unicorn (not Rack)
+#
+# This configuration file documents many features of Unicorn
+# that may not be needed for some applications. See
+# http://unicorn.bogomips.org/examples/unicorn.conf.minimal.rb
+# for a much simpler configuration file.
 #
 # See http://unicorn.bogomips.org/Unicorn/Configurator.html for complete
 # documentation.
@@ -22,8 +27,9 @@ timeout 30
 # feel free to point this anywhere accessible on the filesystem
 pid "/path/to/app/shared/pids/unicorn.pid"
 
-# some applications/frameworks log to stderr or stdout, so prevent
-# them from going to /dev/null when daemonized here:
+# By default, the Unicorn logger will write to stderr.
+# Additionally, ome applications/frameworks log to stderr or stdout,
+# so prevent them from going to /dev/null when daemonized here:
 stderr_path "/path/to/app/shared/log/unicorn.stderr.log"
 stdout_path "/path/to/app/shared/log/unicorn.stdout.log"
 
diff --git a/lib/unicorn/configurator.rb b/lib/unicorn/configurator.rb
index 64a25e3..4fa745d 100644
--- a/lib/unicorn/configurator.rb
+++ b/lib/unicorn/configurator.rb
@@ -7,9 +7,11 @@ module Unicorn
 
   # Implements a simple DSL for configuring a Unicorn server.
   #
-  # See http://unicorn.bogomips.org/examples/unicorn.conf.rb for an
-  # example config file.  An example config file for use with nginx is
-  # also available at http://unicorn.bogomips.org/examples/nginx.conf
+  # See http://unicorn.bogomips.org/examples/unicorn.conf.rb and
+  # http://unicorn.bogomips.org/examples/unicorn.conf.minimal.rb
+  # example configuration files.  An example config file for use with
+  # nginx is also available at
+  # http://unicorn.bogomips.org/examples/nginx.conf
   class Configurator < Struct.new(:set, :config_file, :after_reload)
 
     # Default settings for Unicorn
@@ -78,6 +80,10 @@ module Unicorn
     # sets object to the +new+ Logger-like object.  The new logger-like
     # object must respond to the following methods:
     #  +debug+, +info+, +warn+, +error+, +fatal+, +close+
+    # The default Logger will log its output to the path specified
+    # by +stderr_path+.  If you're running Unicorn daemonized, then
+    # you must specify a path to prevent error messages from going
+    # to /dev/null.
     def logger(new)
       %w(debug info warn error fatal close).each do |m|
         new.respond_to?(m) and next
@@ -310,11 +316,19 @@ module Unicorn
     # file will be opened with the File::APPEND flag and writes
     # synchronized to the kernel (but not necessarily to _disk_) so
     # multiple processes can safely append to it.
+    #
+    # If you are daemonizing and using the default +logger+, it is important
+    # to specify this as errors will otherwise be lost to /dev/null.
+    # Some applications/libraries may also triggering warnings that go to
+    # stderr, and they will end up here.
     def stderr_path(path)
       set_path(:stderr_path, path)
     end
 
-    # Same as stderr_path, except for $stdout
+    # Same as stderr_path, except for $stdout.  Not many Rack applications
+    # write to $stdout, but any that do will have their output written here.
+    # It is safe to point this to the same location a stderr_path.
+    # Like stderr_path, this defaults to /dev/null when daemonized.
     def stdout_path(path)
       set_path(:stdout_path, path)
     end
-- 
Eric Wong
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply related	[relevance 2%]

* Re: Some issues in the HTTP parser (and suggestions)
  2010-05-08 16:17  0%     ` Eric Wong
@ 2010-05-08 16:50  4%       ` Iñaki Baz Castillo
  0 siblings, 0 replies; 11+ results
From: Iñaki Baz Castillo @ 2010-05-08 16:50 UTC (permalink / raw)
  To: unicorn list

2010/5/8 Eric Wong <normalperson@yhbt.net>:
> Iñaki Baz Castillo <ibc@aliax.net> wrote:
>> 2010/5/7 Eric Wong <normalperson@yhbt.net>:
>> > Underscore isn't valid for hostnames, but it is allowed in domain names
>> > and most DNS servers will resolve them.  I've personally seen websites
>> > with underscores in their domain names in the wild[1].
>>
>> Hi Eric, could you point me to the spec stating that underscore is
>> allowed for a domain? In the past I've done a SIP parser [*] with
>> Ragel, being 100% strict at BNF grammar, and note that SIP reuses 80%
>> of the grammar of HTTP. I'm pretty sure that "_" is not valid in a
>> domain (host, hostname or whatever). Anyhow it's better just to allow
>> it at parsing level :)
>
> http://www.ietf.org/rfc/rfc2782.txt

Hi Eric. DNS SRV are not domain names, but DNS queries. For example,
in SIP (and also in XMPP) if a phone is configured to use
"myproxy.org" as proxy then it must perform a DNS SRV query for these
entries:

  _sip._udp.myproxy.org
  _sip._tcp.myproxy.org

Then the DNS query recevives some DNS A records for which the client
also retrieves the associated IP's. But after it, when the client
generates a SIP request it uses "myproxy.org" rather than
"_sip._udp.myproxy.org". This is, "_sip._udp.myproxy.org" is not a
domain/hostname, but a format for querying DNS SRV records.


> Even if it's not part of the RFC, our parser will match reality and
> accommodate broken things we see in the wild, as it has done in the
> past:
>
>  http://mid.gmane.org/20080327215027.GA14531@untitled

Yes, I agree.



>> alphanum  =  ALPHA / DIGIT
>> domainlabel = alphanum | ( alphanum ( alphanum | "-" )* alphanum );
>> toplabel = ALPHA | ( ALPHA ( alphanum | "-" )* alphanum );
>> hostname = ( domainlabel "." )* toplabel "."?;
>> dec_octet = DIGIT | ( 0x31..0x39 DIGIT ) | ( "1" DIGIT{2} ) | ( "2"
>> 0x30..0x34 DIGIT ) | ( "25" 0x30..0x35 );
>> IPv4address = dec_octet "." dec_octet "." dec_octet "." dec_octet;
>> h16 = HEXDIG{1,4};
>> ls32 = ( h16 ":" h16 ) | IPv4address;
>> IPv6address = ( ( h16 ":" ){6} ls32 ) | ( "::" ( h16 ":" ){5} ls32 ) |
>> ( h16? "::" ( h16 ":" ){4} ls32 ) | ( ( ( h16 ":" )? h16 )? "::" ( h16
>> ":" ){3} ls32 ) | ( ( ( h16 ":" ){,2} h16 )? "::" ( h16 ":" ){2} ls32
>> ) | ( ( ( h16 ":" ){,3} h16 )? "::" h16 ":" ls32 ) | ( ( ( h16 ":"
>> ){,4} h16 )? "::" ls32 ) | ( ( ( h16 ":" ){,5} h16 )? "::" h16 ) | ( (
>> ( h16 ":" ){,6} h16 )? "::" );
>> IPv6reference = "[" IPv6address "]";
>> host = hostname | IPv4address | IPv6reference;
>> port = DIGIT{1,5};
>> hostport = host ( ":" port )?;
>>
>>
>> This is much better than the deprecated and bogus grammar in RFC 2396 ;)
>
> Thanks, it might be worth simplifying a bit for readability, simplicity
> (and possibly performance) at the expense of 100% conformance.

You can try to simplicity it, but note that some previous IPv6 BNF
grammars failed to cover all the valid cases and they are bogus. For
example, the original IPv6 BNF grammar appearing in RFC 3261 (SIP
protocol) is buggy (even if it seems simpler) and the specification
has been fixed with the draft I linked in my previous mail.


Regards.


-- 
Iñaki Baz Castillo
<ibc@aliax.net>
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[relevance 4%]

* Re: Some issues in the HTTP parser (and suggestions)
  2010-05-08 13:11  7%   ` Iñaki Baz Castillo
@ 2010-05-08 16:17  0%     ` Eric Wong
  2010-05-08 16:50  4%       ` Iñaki Baz Castillo
  0 siblings, 1 reply; 11+ results
From: Eric Wong @ 2010-05-08 16:17 UTC (permalink / raw)
  To: unicorn list

Iñaki Baz Castillo <ibc@aliax.net> wrote:
> 2010/5/7 Eric Wong <normalperson@yhbt.net>:
> > Underscore isn't valid for hostnames, but it is allowed in domain names
> > and most DNS servers will resolve them.  I've personally seen websites
> > with underscores in their domain names in the wild[1].
> 
> Hi Eric, could you point me to the spec stating that underscore is
> allowed for a domain? In the past I've done a SIP parser [*] with
> Ragel, being 100% strict at BNF grammar, and note that SIP reuses 80%
> of the grammar of HTTP. I'm pretty sure that "_" is not valid in a
> domain (host, hostname or whatever). Anyhow it's better just to allow
> it at parsing level :)

http://www.ietf.org/rfc/rfc2782.txt

Even if it's not part of the RFC, our parser will match reality and
accommodate broken things we see in the wild, as it has done in the
past:

  http://mid.gmane.org/20080327215027.GA14531@untitled

> > We'll have to test the IPv6 addresses and probably split that out into a
> > separate regexp since ":" would raise issues with the port number in
> > existing cases.  This is probably something for post-1.0.
> 
> There is a IETF draft to improve and *fix* the existing BNF grammar for IPv6.
> It also improves the grammar for IPv4 (by dissallowing values greater than 255):
> 
>     http://tools.ietf.org/html/draft-ietf-sip-ipv6-abnf-fix
> 
> 
> I've already implemented it in Ragel and I can sure that it's 100%
> valid and strict (I've done lots of tests):
> 
> alphanum  =  ALPHA / DIGIT
> domainlabel = alphanum | ( alphanum ( alphanum | "-" )* alphanum );
> toplabel = ALPHA | ( ALPHA ( alphanum | "-" )* alphanum );
> hostname = ( domainlabel "." )* toplabel "."?;
> dec_octet = DIGIT | ( 0x31..0x39 DIGIT ) | ( "1" DIGIT{2} ) | ( "2"
> 0x30..0x34 DIGIT ) | ( "25" 0x30..0x35 );
> IPv4address = dec_octet "." dec_octet "." dec_octet "." dec_octet;
> h16 = HEXDIG{1,4};
> ls32 = ( h16 ":" h16 ) | IPv4address;
> IPv6address = ( ( h16 ":" ){6} ls32 ) | ( "::" ( h16 ":" ){5} ls32 ) |
> ( h16? "::" ( h16 ":" ){4} ls32 ) | ( ( ( h16 ":" )? h16 )? "::" ( h16
> ":" ){3} ls32 ) | ( ( ( h16 ":" ){,2} h16 )? "::" ( h16 ":" ){2} ls32
> ) | ( ( ( h16 ":" ){,3} h16 )? "::" h16 ":" ls32 ) | ( ( ( h16 ":"
> ){,4} h16 )? "::" ls32 ) | ( ( ( h16 ":" ){,5} h16 )? "::" h16 ) | ( (
> ( h16 ":" ){,6} h16 )? "::" );
> IPv6reference = "[" IPv6address "]";
> host = hostname | IPv4address | IPv6reference;
> port = DIGIT{1,5};
> hostport = host ( ":" port )?;
> 
> 
> This is much better than the deprecated and bogus grammar in RFC 2396 ;)

Thanks, it might be worth simplifying a bit for readability, simplicity
(and possibly performance) at the expense of 100% conformance.

-- 
Eric Wong
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply	[relevance 0%]

* Re: Some issues in the HTTP parser (and suggestions)
  2010-05-07 18:33  0% ` Eric Wong
@ 2010-05-08 13:11  7%   ` Iñaki Baz Castillo
  2010-05-08 16:17  0%     ` Eric Wong
  0 siblings, 1 reply; 11+ results
From: Iñaki Baz Castillo @ 2010-05-08 13:11 UTC (permalink / raw)
  To: unicorn list

2010/5/7 Eric Wong <normalperson@yhbt.net>:
> Underscore isn't valid for hostnames, but it is allowed in domain names
> and most DNS servers will resolve them.  I've personally seen websites
> with underscores in their domain names in the wild[1].

Hi Eric, could you point me to the spec stating that underscore is
allowed for a domain? In the past I've done a SIP parser [*] with
Ragel, being 100% strict at BNF grammar, and note that SIP reuses 80%
of the grammar of HTTP. I'm pretty sure that "_" is not valid in a
domain (host, hostname or whatever). Anyhow it's better just to allow
it at parsing level :)



> We'll have to test the IPv6 addresses and probably split that out into a
> separate regexp since ":" would raise issues with the port number in
> existing cases.  This is probably something for post-1.0.

There is a IETF draft to improve and *fix* the existing BNF grammar for IPv6.
It also improves the grammar for IPv4 (by dissallowing values greater than 255):

    http://tools.ietf.org/html/draft-ietf-sip-ipv6-abnf-fix


I've already implemented it in Ragel and I can sure that it's 100%
valid and strict (I've done lots of tests):

alphanum  =  ALPHA / DIGIT
domainlabel = alphanum | ( alphanum ( alphanum | "-" )* alphanum );
toplabel = ALPHA | ( ALPHA ( alphanum | "-" )* alphanum );
hostname = ( domainlabel "." )* toplabel "."?;
dec_octet = DIGIT | ( 0x31..0x39 DIGIT ) | ( "1" DIGIT{2} ) | ( "2"
0x30..0x34 DIGIT ) | ( "25" 0x30..0x35 );
IPv4address = dec_octet "." dec_octet "." dec_octet "." dec_octet;
h16 = HEXDIG{1,4};
ls32 = ( h16 ":" h16 ) | IPv4address;
IPv6address = ( ( h16 ":" ){6} ls32 ) | ( "::" ( h16 ":" ){5} ls32 ) |
( h16? "::" ( h16 ":" ){4} ls32 ) | ( ( ( h16 ":" )? h16 )? "::" ( h16
":" ){3} ls32 ) | ( ( ( h16 ":" ){,2} h16 )? "::" ( h16 ":" ){2} ls32
) | ( ( ( h16 ":" ){,3} h16 )? "::" h16 ":" ls32 ) | ( ( ( h16 ":"
){,4} h16 )? "::" ls32 ) | ( ( ( h16 ":" ){,5} h16 )? "::" h16 ) | ( (
( h16 ":" ){,6} h16 )? "::" );
IPv6reference = "[" IPv6address "]";
host = hostname | IPv4address | IPv6reference;
port = DIGIT{1,5};
hostport = host ( ":" port )?;


This is much better than the deprecated and bogus grammar in RFC 2396 ;)





>> ------------------
>>
>> host_with_port = (hostname (":" digit*)?) >mark %host;
>>
>> - It allows something ugly as "mydomain.org:"
>>
>> I suggest:
>>   host_with_port = (hostname (":" digit{1,5})?) >mark %host;
>
> It's ugly, but section 3.2.2 of RFC 2396 appears to allows it.

Sometimes there are bugs in the RFC's related to parsing and BNF
grammars. I know several cases. Unfortunatelly RFC's cannot be fixed,
instead the errors are reported and a new draft or RFC "xxx-fix"
appears some years later.



>> message_header = ((field_name ":" " "* field_value)|value_cont) :> CRLF;
>>
>> - It doesn't allow valid spaces before ":" as:
>>      Host : mydomain.org
>
> Spaces before the ":" aren't allowed in rfc2616, and I have yet to see
> evidence of clients sending headers like this in ~4 years of using this
> parser.

In SIP protocol spaces and tabulators before ":" are allowed, I really
expected that in HTTP the same occurs as SIP grammar is based on HTTP
grammar. But it could be different in some aspects, of course.


>> - Tabulators are also allowed.
>>
>> I suggest:
>>   message_header = ((field_name [ \t]* ":" [ \t]*
>> field_value)|value_cont) :> CRLF;
>
> I just pushed this out to unicorn.git to allow horizontal tabs:

Thanks.


[*] http://dev.sipdoc.net/projects/ragel-sip-parser/wiki/Phase1


-- 
Iñaki Baz Castillo
<ibc@aliax.net>
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[relevance 7%]

* Re: Some issues in the HTTP parser (and suggestions)
  2010-05-07 13:51  4% Some issues in the HTTP parser (and suggestions) Iñaki Baz Castillo
@ 2010-05-07 18:33  0% ` Eric Wong
  2010-05-08 13:11  7%   ` Iñaki Baz Castillo
  0 siblings, 1 reply; 11+ results
From: Eric Wong @ 2010-05-07 18:33 UTC (permalink / raw)
  To: unicorn list

Iñaki Baz Castillo <ibc@aliax.net> wrote:
> Hi, by inspecting the Ragel grammar of the HTTP parser (coming from
> Mongrel) I've realized of some possible issues and bugs:
> 
> 
> hostname = (alnum | "-" | "." | "_")+;
> 
> - It doesn't allow IPv6. This is important IMHO.
> - It allows "_" which is an invalid symbol (not valid for a domain).
> 
> I suggest:
>   hostname = (alnum | "-" | "." | "[" | "]" | ":")+;

Hi Iñaki,

Underscore isn't valid for hostnames, but it is allowed in domain names
and most DNS servers will resolve them.  I've personally seen websites
with underscores in their domain names in the wild[1].

We'll have to test the IPv6 addresses and probably split that out into a
separate regexp since ":" would raise issues with the port number in
existing cases.  This is probably something for post-1.0.

> ------------------
> 
> host_with_port = (hostname (":" digit*)?) >mark %host;
> 
> - It allows something ugly as "mydomain.org:"
> 
> I suggest:
>   host_with_port = (hostname (":" digit{1,5})?) >mark %host;

It's ugly, but section 3.2.2 of RFC 2396 appears to allows it.

> ------------------
> 
> message_header = ((field_name ":" " "* field_value)|value_cont) :> CRLF;
> 
> - It doesn't allow valid spaces before ":" as:
>      Host : mydomain.org

Spaces before the ":" aren't allowed in rfc2616, and I have yet to see
evidence of clients sending headers like this in ~4 years of using this
parser.

> - Tabulators are also allowed.
> 
> I suggest:
>   message_header = ((field_name [ \t]* ":" [ \t]*
> field_value)|value_cont) :> CRLF;

I just pushed this out to unicorn.git to allow horizontal tabs:

>From 935912a422cabfd323f9b4ff268ded09a2ebe7a6 Mon Sep 17 00:00:00 2001
From: Eric Wong <normalperson@yhbt.net>
Date: Fri, 7 May 2010 18:20:49 +0000
Subject: [PATCH] http: allow horizontal tab as leading whitespace in header values
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This is allowed by RFC 2616, section 2.2, where spaces and
horizontal tabs are counted as linear white space and linear
white space (not just regular spaces) may prefix field-values
(section 4.2).

This has _not_ been a real issue in ~4 years of using this
parser (starting with Mongrel) with clients in the wild.

Thanks to Iñaki Baz Castillo for pointing this out.
---
 ext/unicorn_http/unicorn_http_common.rl |    2 +-
 test/unit/test_http_parser.rb           |    8 ++++++++
 2 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/ext/unicorn_http/unicorn_http_common.rl b/ext/unicorn_http/unicorn_http_common.rl
index 6fca604..f165e3f 100644
--- a/ext/unicorn_http/unicorn_http_common.rl
+++ b/ext/unicorn_http/unicorn_http_common.rl
@@ -54,7 +54,7 @@
 
   value_cont = lws+ any* >start_value %write_cont_value;
 
-  message_header = ((field_name ":" " "* field_value)|value_cont) :> CRLF;
+  message_header = ((field_name ":" lws* field_value)|value_cont) :> CRLF;
   chunk_ext_val = token*;
   chunk_ext_name = token*;
   chunk_extension = ( ";" " "* chunk_ext_name ("=" chunk_ext_val)? )*;
diff --git a/test/unit/test_http_parser.rb b/test/unit/test_http_parser.rb
index 5b0ca9f..b7c8a1c 100644
--- a/test/unit/test_http_parser.rb
+++ b/test/unit/test_http_parser.rb
@@ -51,6 +51,14 @@ class HttpParserTest < Test::Unit::TestCase
     assert parser.keepalive?
   end
 
+  def test_tab_lws
+    parser = HttpParser.new
+    req = {}
+    tmp = "GET / HTTP/1.1\r\nHost:\tfoo.bar\r\n\r\n"
+    assert_equal req.object_id, parser.headers(req, tmp).object_id
+    assert_equal "foo.bar", req['HTTP_HOST']
+  end
+
   def test_connection_close_no_ka
     parser = HttpParser.new
     req = {}
-- 

[1] - and those were wild NSFW sites, of course :>

-- 
Eric Wong
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying


^ permalink raw reply related	[relevance 0%]

* Some issues in the HTTP parser (and suggestions)
@ 2010-05-07 13:51  4% Iñaki Baz Castillo
  2010-05-07 18:33  0% ` Eric Wong
  0 siblings, 1 reply; 11+ results
From: Iñaki Baz Castillo @ 2010-05-07 13:51 UTC (permalink / raw)
  To: unicorn list

Hi, by inspecting the Ragel grammar of the HTTP parser (coming from
Mongrel) I've realized of some possible issues and bugs:


hostname = (alnum | "-" | "." | "_")+;

- It doesn't allow IPv6. This is important IMHO.
- It allows "_" which is an invalid symbol (not valid for a domain).

I suggest:
  hostname = (alnum | "-" | "." | "[" | "]" | ":")+;

------------------

host_with_port = (hostname (":" digit*)?) >mark %host;

- It allows something ugly as "mydomain.org:"

I suggest:
  host_with_port = (hostname (":" digit{1,5})?) >mark %host;

------------------

message_header = ((field_name ":" " "* field_value)|value_cont) :> CRLF;

- It doesn't allow valid spaces before ":" as:
     Host : mydomain.org

- Tabulators are also allowed.

I suggest:
  message_header = ((field_name [ \t]* ":" [ \t]*
field_value)|value_cont) :> CRLF;



Regards.



-- 
Iñaki Baz Castillo
<ibc@aliax.net>
_______________________________________________
Unicorn mailing list - mongrel-unicorn@rubyforge.org
http://rubyforge.org/mailman/listinfo/mongrel-unicorn
Do not quote signatures (like this one) or top post when replying

^ permalink raw reply	[relevance 4%]

Results 1-11 of 11 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2010-05-07 13:51  4% Some issues in the HTTP parser (and suggestions) Iñaki Baz Castillo
2010-05-07 18:33  0% ` Eric Wong
2010-05-08 13:11  7%   ` Iñaki Baz Castillo
2010-05-08 16:17  0%     ` Eric Wong
2010-05-08 16:50  4%       ` Iñaki Baz Castillo
2010-06-04 19:06     Logging errors from Unicorn Peter Kieltyka
2010-06-04 20:06     ` Eric Wong
2010-06-04 20:47  2%   ` [PATCH] doc: emphasize the importance of stderr_path Eric Wong
2014-08-01 20:27     Please move to github Michael Grosser
2014-08-01 21:32     ` Eric Wong
2014-08-01 22:27       ` Michael Grosser
2014-08-01 22:48  4%     ` Gabe da Silveira
2016-10-20  9:05  3% [PATCH] Add some tolerance (RFC2616 sec. 19.3) Mishael A Sibiryakov
2019-04-15  6:52  4% [RFC] doc: unicorn_rails: clarify that it is intended for rails <= 2.x Eric Wong
2020-07-16 10:05     [PATCH] Add early hints support Jean Boussier
2020-07-16 10:50  3% ` Eric Wong
2023-06-20 10:46  5% [PATCH] unicorn_http_common.rl: use only ASCII spaces for compatibility EW

Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).