unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
From: Eric Wong <e@80x24.org>
To: Gary Grossman <gary.grossman@gmail.com>
Cc: unicorn-public@bogomips.org, michael@grosser.it
Subject: Re: Please move to github
Date: Sat, 2 Aug 2014 08:50:41 +0000	[thread overview]
Message-ID: <20140802085040.GA16241@dcvr.yhbt.net> (raw)
In-Reply-To: <19466F7B-03C2-49BF-97E8-058AD3BE83D6@gmail.com>

Gary Grossman <gary.grossman@gmail.com> wrote:
> Hi Eric,
> 
> I work with Michael, and this discussion sure got off on the
> wrong foot... we love unicorn and use it heavily, and just
> want to contribute back to it.

No worries, cultural differences happen.  Thanks for following up.

> To detail the encoding problem we were trying to fix, unicorn
> uses rb_str_new in several places to create Ruby strings.
> For Ruby 1.9 and later, these strings are assigned ASCII-8BIT
> encoding.
> 
> While the Rack specification doesn't dictate what encoding
> should be used for strings in the environment, many
> developers would probably expect the default external encoding
> setting in Encoding.default_external to be used.

Right, the Rack spec does not dictate this.  Doing this out-of-spec has
the ability to break existing apps as well as compatibility with other
app servers.

What do other app servers do?

My main concern is having more different behavior between various Rack
servers servers, making it harder to switch between them.

Another concern is breaking apps which are already working around this
but work with non-UTF-8 encodings.

The rack-devel mailing list had a discussion on this in September 2010
and a decision was never reached. You can search the archives at:
http://groups.google.com/group/rack-devel

I've also saved the thread to a mbox at
http://80x24.org/rack-devel-encoding-2010.mbox.gz
since Google Groups archives are a bit painful to navigate.

Disclaimer: I am not an encoding expert, so for that reason I prefer
to let other Rack folks make the decision.

> Many Rails applications use UTF8 heavily. The use of ASCII-8BIT
> in the env can lead to Encoding::CompatibilityErrors being
> raised when a UTF8 string and ASCII-8BIT string are concatenated,
> which happens frequently when properties like request.url are
> referenced in erb templates. To get around these problems,
> an app would have to force encoding on the strings in the env
> manually. It seems a shame to do this in slower Ruby code when
> it could be done up front by unicorn.

Yes, this existing behavior sucks on UTF-8-heavy apps.  I would rather
not add more unicorn-only options which make switching between servers
harder.

Do you have performance measurements for doing this as pure-Ruby
middleware vs your patch?

My dislike of lock-in also applies to app servers.  Application-visible
differences like these should be avoided so people can switch between
servers, too.

So it should be best if there were a way to do this for all Rack
servers.

> We'd like to propose that unicorn use rb_external_str_new to
> make strings instead of rb_str_new.
> 
> Perhaps you have your reasons for continuing to use rb_str_new
> but we figured we'd run this by you.

If the Rack spec mandated encodings, I would do it in a heartbeat.

> Subject: [PATCH] If unicorn is used with Ruby 1.9 or later, use
>  rb_external_str_new instead of rb_str_new to create strings. The resulting
>  strings will use the default external encoding. Continue using rb_str_new for
>  older versions of Ruby.

A better, shorter, more direct subject would be:

Subject: use Encoding.default_external for header values

Commit message body is fine <snip>

> +#ifdef HAVE_RUBY_ENCODING_H
> +/* Use default external encoding for strings for Ruby 1.9+,
> + * fall back to rb_str_new when unavailable */
> +#define rb_str_new rb_external_str_new
> +#endif

This is too heavy-handed, as some strings (buffers) may
need to stay binary via rb_str_new.  If we were to do this, it would
something like:

#ifdef HAVE_RUBY_ENCODING_H
#  define env_val_new(ptr,len) rb_external_str_new((ptr),(len))
#else
#  define env_val_new(ptr,len) rb_str_new((ptr),(len))
#endif

... And only making sure header values are set to external.

Last I checked the HTTP RFCs (it's been a while) header keys are
required to be US-ASCII-only (and our parser enforces that).

> +  def test_encoding
> +    if ''.respond_to?(:encoding)
> +      client = MockRequest.new("GET http://e:3/x?y=z HTTP/1.1\r\n" \
> +                               "Host: foo\r\n\r\n")
> +      env = @request.read(client)
> +      encoding = Encoding.default_external
> +      assert_equal encoding, env['REQUEST_PATH'].encoding
> +      assert_equal encoding, env['PATH_INFO'].encoding
> +      assert_equal encoding, env['QUERY_STRING'].encoding
> +    end

This would need to test and work with (and appropriately reject)
invalid requests with bad encodings, too.

  parent reply	other threads:[~2014-08-02  8:50 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-02  7:51 Please move to github Gary Grossman
2014-08-02  7:54 ` Kapil Israni
2014-08-02  8:02   ` Eric Wong
2014-08-02  8:50 ` Eric Wong [this message]
2014-08-02 19:07   ` Gary Grossman
2014-08-02 19:33     ` Michael Fischer
2014-08-04  7:22       ` Hongli Lai
2014-08-04  8:48         ` Rack encodings (was: Please move to github) Eric Wong
2014-08-04  9:46           ` Hongli Lai
2014-08-02 20:15     ` Please move to github Eric Wong
  -- strict thread matches above, loose matches on Subject: below --
2014-08-02  7:46 Gary Grossman
2014-08-01 22:32 Victor Kmita
2014-08-01 20:27 Michael Grosser
2014-08-01 21:32 ` Eric Wong
2014-08-01 22:27   ` Michael Grosser
2014-08-01 22:41     ` Eric Wong
     [not found]       ` <CAAms34NByMcXnbGQ3DvCZTsQuh6yBn8sMSNeTi7Ss4VmmYoOrQ@mail.gmail.com>
2014-08-01 23:07         ` Eric Wong
2014-08-01 22:48     ` Gabe da Silveira
2014-08-01 23:09       ` Xavier Noria
2014-08-01 23:12         ` Michael Grosser
2014-08-01 23:24           ` Eric Wong
2014-08-01 23:26           ` Daniel Evans
2014-08-01 23:38             ` Michael Grosser
2014-08-01 23:18         ` Aaron Suggs

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://yhbt.net/unicorn/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140802085040.GA16241@dcvr.yhbt.net \
    --to=e@80x24.org \
    --cc=gary.grossman@gmail.com \
    --cc=michael@grosser.it \
    --cc=unicorn-public@bogomips.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://yhbt.net/unicorn.git/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).