unicorn Ruby/Rack server user+dev discussion/patches/pulls/bugs/help
 help / color / mirror / code / Atom feed
* Re: Please move to github
@ 2014-08-02  7:46 Gary Grossman
  0 siblings, 0 replies; 22+ messages in thread
From: Gary Grossman @ 2014-08-02  7:46 UTC (permalink / raw)
  To: e; +Cc: unicorn-public, michael

Hi Eric,

I work with Michael, and this discussion sure got off on the
wrong foot... we love unicorn and use it heavily, and just
want to contribute back to it.

To detail the encoding problem we were trying to fix, unicorn
uses rb_str_new in several places to create Ruby strings.
For Ruby 1.9 and later, these strings are assigned ASCII-8BIT
encoding.

While the Rack specification doesn't dictate what encoding
should be used for strings in the environment, many
developers would probably expect the default external encoding
setting in Encoding.default_external to be used.

Many Rails applications use UTF8 heavily. The use of ASCII-8BIT
in the env can lead to Encoding::CompatibilityErrors being
raised when a UTF8 string and ASCII-8BIT string are concatenated,
which happens frequently when properties like request.url are
referenced in erb templates. To get around these problems,
an app would have to force encoding on the strings in the env
manually. It seems a shame to do this in slower Ruby code when
it could be done up front by unicorn.

We'd like to propose that unicorn use rb_external_str_new to
make strings instead of rb_str_new.

Perhaps you have your reasons for continuing to use rb_str_new
but we figured we'd run this by you.

Here's a proposed patch.

From befb01530c8d930ba53cc58b979ddf42a4c12565 Mon Sep 17 00:00:00 2001
From: Gary Grossman <gary.grossman@gmail.com>
Date: Sat, 2 Aug 2014 00:19:30 -0700
Subject: [PATCH] If unicorn is used with Ruby 1.9 or later, use
 rb_external_str_new instead of rb_str_new to create strings. The resulting
 strings will use the default external encoding. Continue using rb_str_new
for
 older versions of Ruby.

Using the default external encoding instead of ASCII-8BIT for
strings is more in line with developer expectations and will cause
less unexpected bugs such as Encoding::CompatibilityErrors which
result when, say, a UTF8 string and ASCII-8BIT string are
concatenated together.

Added a unit test to ensure that strings returned in the Rack
environment conform to the default external encoding.
---
 ext/unicorn_http/ext_help.h |  6 ++++++
 test/unit/test_request.rb   | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/ext/unicorn_http/ext_help.h b/ext/unicorn_http/ext_help.h
index c87c272..6806f8e 100644
--- a/ext/unicorn_http/ext_help.h
+++ b/ext/unicorn_http/ext_help.h
@@ -79,4 +79,10 @@ static int str_cstr_case_eq(VALUE val, const char *ptr,
long len)
 #define STR_CSTR_CASE_EQ(val, const_str) \
   str_cstr_case_eq(val, const_str, sizeof(const_str) - 1)

+#ifdef HAVE_RUBY_ENCODING_H
+/* Use default external encoding for strings for Ruby 1.9+,
+ * fall back to rb_str_new when unavailable */
+#define rb_str_new rb_external_str_new
+#endif
+
 #endif /* ext_help_h */
diff --git a/test/unit/test_request.rb b/test/unit/test_request.rb
index fbda1a2..0a105e0 100644
--- a/test/unit/test_request.rb
+++ b/test/unit/test_request.rb
@@ -179,4 +179,17 @@ class RequestTest < Test::Unit::TestCase
     env['rack.input'].rewind
     res = @lint.call(env)
   end
+
+  def test_encoding
+    if ''.respond_to?(:encoding)
+      client = MockRequest.new("GET http://e:3/x?y=z HTTP/1.1\r\n" \
+                               "Host: foo\r\n\r\n")
+      env = @request.read(client)
+      encoding = Encoding.default_external
+      assert_equal encoding, env['REQUEST_PATH'].encoding
+      assert_equal encoding, env['PATH_INFO'].encoding
+      assert_equal encoding, env['QUERY_STRING'].encoding
+    end
+  end
+
 end
-- 
1.9.1

Gary


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-02 19:33     ` Michael Fischer
@ 2014-08-04  7:22       ` Hongli Lai
  0 siblings, 0 replies; 22+ messages in thread
From: Hongli Lai @ 2014-08-04  7:22 UTC (permalink / raw)
  To: Michael Fischer; +Cc: Gary Grossman, Eric Wong, unicorn-public, Michael Grosser

On Sat, Aug 2, 2014 at 9:33 PM, Michael Fischer <mfischer@zendesk.com> wrote:
> On Sat, Aug 2, 2014 at 12:07 PM, Gary Grossman <gary.grossman@gmail.com>
> wrote:
>> This might be one of those instances where it would be helpful for
>> implementation to lead specification. Unicorn is one of the leading
>> servers of its genre, if not the leader. If you supported a switch
>> that made the encoding regime more sane, I think other popular servers
>> like Thin and Passenger would swiftly follow and it might re-energize
>> the discussion about getting encodings into the Rack spec once and
>> for all, and give a base for experimentation and iteration for
>> getting the encodings in the spec right.
>>
>
> I agree with Gary here.  It's often too easy to decide to preserve the
> status quo because things work well enough -- and then, eventually, time
> catches up with you and it no longer does.

Hi guys. Phusion Passenger author here. I would very much support
standardization of encoding issues. Every now and then, a user submits
a bug report on Phusion Passenger, mentioning an encoding problem. The
user would say that the problem occurs on Phusion Passenger but not on
Unicorn/Thin/etc. The Rack spec doesn't say anything about encodings
so strictly speaking it's not "our fault", but it's still hard to tell
users that it's "their fault" or "their framework's fault" based on
this alone. It's also not a helpful answer: users often have no idea
what to do about the issue.

At this point, I don't really care what the standard is, as long as
it's a sane standard that everybody can follow.

In my opinion, following Encoding.default_external is not helpful.
Most users have absolutely no idea how to configure
Encoding.default_external, or even know that it exists. I've also
never, ever seen anybody who does *not* want default_external to be
UTF-8. If it's not set to UTF-8, then it's always by accident (e.g.
the user not knowing that it depends on LC_CTYPE, that LC_CTYPE is set
differently in the shell than from an init script, or even what
LC_CTYPE is).

-- 
Phusion | Web Application deployment, scaling, and monitoring solutions

Web: http://www.phusion.nl/
E-mail: info@phusion.nl
Chamber of commerce no: 08173483 (The Netherlands)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-02 19:07   ` Gary Grossman
  2014-08-02 19:33     ` Michael Fischer
@ 2014-08-02 20:15     ` Eric Wong
  1 sibling, 0 replies; 22+ messages in thread
From: Eric Wong @ 2014-08-02 20:15 UTC (permalink / raw)
  To: Gary Grossman; +Cc: unicorn-public, michael

Gary Grossman <gary.grossman@gmail.com> wrote:
> We'd pretty much need to introduce some kind of configuration
> switch, at least for the short term and maybe for the long term.
> The hope would be that it could become the default setting.
> Apps that don't use UTF8 should be able to set their desired default
> external encoding appropriately.

If possible, I would like to avoid an option and rely on
Encoding.default_external in a new major version.  Too many ways to set
the same thing is confusing and requires extra documentation overhead.

> >The rack-devel mailing list had a discussion on this in September 2010
> >and a decision was never reached. You can search the archives at:
> >http://groups.google.com/group/rack-devel
> 
> I came across this thread but didn't realize that was the last word
> so far when it came to Rack and encodings.
> 
> This might be one of those instances where it would be helpful for
> implementation to lead specification. Unicorn is one of the leading
> servers of its genre, if not the leader. If you supported a switch
> that made the encoding regime more sane, I think other popular servers
> like Thin and Passenger would swiftly follow and it might re-energize
> the discussion about getting encodings into the Rack spec once and
> for all, and give a base for experimentation and iteration for
> getting the encodings in the spec right.

I might start with WEBrick (or the Rack/WEBrick handler).  WEBrick is
distributed with Ruby and maintained by the core team.  It's not used in
production much, but it the reference implementation which is usable
from all Ruby implementations.

naruse (from that rack-devel thread) is also active in Ruby core and
is very knowledgeable in these areas.

> Thanks again for reviewing the patch. I'll work on a new patch that
> incorporates your comments and has a switch for enabling/disabling
> the functionality, and I'll try to follow roughly what the spec
> group in 2010 thought would make sense in terms of encodings for
> the various strings in the env. And I'll see if I can ask the
> Rack folks to chime in.

Definitely get other Rack folks to chime in, even if it is a
unicorn-only change.  This has been a problem for years already,
so taking more time to get things right won't hurt.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-02 19:07   ` Gary Grossman
@ 2014-08-02 19:33     ` Michael Fischer
  2014-08-04  7:22       ` Hongli Lai
  2014-08-02 20:15     ` Eric Wong
  1 sibling, 1 reply; 22+ messages in thread
From: Michael Fischer @ 2014-08-02 19:33 UTC (permalink / raw)
  To: Gary Grossman; +Cc: Eric Wong, unicorn-public, Michael Grosser

On Sat, Aug 2, 2014 at 12:07 PM, Gary Grossman <gary.grossman@gmail.com>
wrote:

This might be one of those instances where it would be helpful for
> implementation to lead specification. Unicorn is one of the leading
> servers of its genre, if not the leader. If you supported a switch
> that made the encoding regime more sane, I think other popular servers
> like Thin and Passenger would swiftly follow and it might re-energize
> the discussion about getting encodings into the Rack spec once and
> for all, and give a base for experimentation and iteration for
> getting the encodings in the spec right.
>

I agree with Gary here.  It's often too easy to decide to preserve the
status quo because things work well enough -- and then, eventually, time
catches up with you and it no longer does.

If Gary's proposal makes sense, and improves matters without doing
significant harm -- despite it not adhering to the letter of Rack
compliance as it is currently specified today -- it would represent a major
step forward if implemented in Unicorn.  (And as Gary suggested, the
specification and other implementors will probably catch up by necessity if
the behavior proves beneficial.)

The first step is to prove it's worth shaking the tree with some
benchmarks, though. :)

--Michael


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-02  8:50 ` Eric Wong
@ 2014-08-02 19:07   ` Gary Grossman
  2014-08-02 19:33     ` Michael Fischer
  2014-08-02 20:15     ` Eric Wong
  0 siblings, 2 replies; 22+ messages in thread
From: Gary Grossman @ 2014-08-02 19:07 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public, michael

Hi Eric,

Thanks for your reply and for reviewing the patch!

>Right, the Rack spec does not dictate this.  Doing this out-of-spec has
>the ability to break existing apps as well as compatibility with other
>app servers.

It's true, my patch is too naive since it's a pretty drastic change
in behavior not behind any kind of switch.

>What do other app servers do?

I did a little survey. ASCII-8BIT is kind of the de facto standard
even if it's not mandated by the Rack specification. Phusion
Passenger, Thin and WEBrick all send mostly ASCII-8BIT strings in
the env.

>My main concern is having more different behavior between various Rack
>servers servers, making it harder to switch between them.

Very valid; Rack wouldn't be much of a standard if there were a bunch
of variants in use.

>Another concern is breaking apps which are already working around this
>but work with non-UTF-8 encodings.

We'd pretty much need to introduce some kind of configuration
switch, at least for the short term and maybe for the long term.
The hope would be that it could become the default setting.
Apps that don't use UTF8 should be able to set their desired default
external encoding appropriately.

>The rack-devel mailing list had a discussion on this in September 2010
>and a decision was never reached. You can search the archives at:
>http://groups.google.com/group/rack-devel

I came across this thread but didn't realize that was the last word
so far when it came to Rack and encodings.

This might be one of those instances where it would be helpful for
implementation to lead specification. Unicorn is one of the leading
servers of its genre, if not the leader. If you supported a switch
that made the encoding regime more sane, I think other popular servers
like Thin and Passenger would swiftly follow and it might re-energize
the discussion about getting encodings into the Rack spec once and
for all, and give a base for experimentation and iteration for
getting the encodings in the spec right.

There's a lot of developer pain here. Many apps probably are serving
up encoding-related 500 errors without knowing it. There are
stories of developers adding "# encoding" everywhere, setting
the external/internal encoding, and then "things are fine until it
blows up somewhere else." I heard recently that a very large company
has stuck with Ruby 1.8.7, probably to avoid these encoding issues
among other things. It would be nice to improve the situation.

>Disclaimer: I am not an encoding expert, so for that reason I prefer
>to let other Rack folks make the decision.

I'm not an encoding expert either! Most people aren't... which is
why it'd be nice if they didn't have to know so much about it when
they write a Rack app!

>Do you have performance measurements for doing this as pure-Ruby
>middleware vs your patch?

I don't have measurements currently but I'll get some.
Our app is several years old and so there's a lot of stuff in
request.env by the time we get around to forcing everything to
UTF8 encoding. I wouldn't be surprised if the hit on
every single request is small but significant for us.

>So it should be best if there were a way to do this for all Rack
>servers.

Thanks again for reviewing the patch. I'll work on a new patch that
incorporates your comments and has a switch for enabling/disabling
the functionality, and I'll try to follow roughly what the spec
group in 2010 thought would make sense in terms of encodings for
the various strings in the env. And I'll see if I can ask the
Rack folks to chime in.

Gary


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-02  7:51 Gary Grossman
  2014-08-02  7:54 ` Kapil Israni
@ 2014-08-02  8:50 ` Eric Wong
  2014-08-02 19:07   ` Gary Grossman
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Wong @ 2014-08-02  8:50 UTC (permalink / raw)
  To: Gary Grossman; +Cc: unicorn-public, michael

Gary Grossman <gary.grossman@gmail.com> wrote:
> Hi Eric,
> 
> I work with Michael, and this discussion sure got off on the
> wrong foot... we love unicorn and use it heavily, and just
> want to contribute back to it.

No worries, cultural differences happen.  Thanks for following up.

> To detail the encoding problem we were trying to fix, unicorn
> uses rb_str_new in several places to create Ruby strings.
> For Ruby 1.9 and later, these strings are assigned ASCII-8BIT
> encoding.
> 
> While the Rack specification doesn't dictate what encoding
> should be used for strings in the environment, many
> developers would probably expect the default external encoding
> setting in Encoding.default_external to be used.

Right, the Rack spec does not dictate this.  Doing this out-of-spec has
the ability to break existing apps as well as compatibility with other
app servers.

What do other app servers do?

My main concern is having more different behavior between various Rack
servers servers, making it harder to switch between them.

Another concern is breaking apps which are already working around this
but work with non-UTF-8 encodings.

The rack-devel mailing list had a discussion on this in September 2010
and a decision was never reached. You can search the archives at:
http://groups.google.com/group/rack-devel

I've also saved the thread to a mbox at
http://80x24.org/rack-devel-encoding-2010.mbox.gz
since Google Groups archives are a bit painful to navigate.

Disclaimer: I am not an encoding expert, so for that reason I prefer
to let other Rack folks make the decision.

> Many Rails applications use UTF8 heavily. The use of ASCII-8BIT
> in the env can lead to Encoding::CompatibilityErrors being
> raised when a UTF8 string and ASCII-8BIT string are concatenated,
> which happens frequently when properties like request.url are
> referenced in erb templates. To get around these problems,
> an app would have to force encoding on the strings in the env
> manually. It seems a shame to do this in slower Ruby code when
> it could be done up front by unicorn.

Yes, this existing behavior sucks on UTF-8-heavy apps.  I would rather
not add more unicorn-only options which make switching between servers
harder.

Do you have performance measurements for doing this as pure-Ruby
middleware vs your patch?

My dislike of lock-in also applies to app servers.  Application-visible
differences like these should be avoided so people can switch between
servers, too.

So it should be best if there were a way to do this for all Rack
servers.

> We'd like to propose that unicorn use rb_external_str_new to
> make strings instead of rb_str_new.
> 
> Perhaps you have your reasons for continuing to use rb_str_new
> but we figured we'd run this by you.

If the Rack spec mandated encodings, I would do it in a heartbeat.

> Subject: [PATCH] If unicorn is used with Ruby 1.9 or later, use
>  rb_external_str_new instead of rb_str_new to create strings. The resulting
>  strings will use the default external encoding. Continue using rb_str_new for
>  older versions of Ruby.

A better, shorter, more direct subject would be:

Subject: use Encoding.default_external for header values

Commit message body is fine <snip>

> +#ifdef HAVE_RUBY_ENCODING_H
> +/* Use default external encoding for strings for Ruby 1.9+,
> + * fall back to rb_str_new when unavailable */
> +#define rb_str_new rb_external_str_new
> +#endif

This is too heavy-handed, as some strings (buffers) may
need to stay binary via rb_str_new.  If we were to do this, it would
something like:

#ifdef HAVE_RUBY_ENCODING_H
#  define env_val_new(ptr,len) rb_external_str_new((ptr),(len))
#else
#  define env_val_new(ptr,len) rb_str_new((ptr),(len))
#endif

... And only making sure header values are set to external.

Last I checked the HTTP RFCs (it's been a while) header keys are
required to be US-ASCII-only (and our parser enforces that).

> +  def test_encoding
> +    if ''.respond_to?(:encoding)
> +      client = MockRequest.new("GET http://e:3/x?y=z HTTP/1.1\r\n" \
> +                               "Host: foo\r\n\r\n")
> +      env = @request.read(client)
> +      encoding = Encoding.default_external
> +      assert_equal encoding, env['REQUEST_PATH'].encoding
> +      assert_equal encoding, env['PATH_INFO'].encoding
> +      assert_equal encoding, env['QUERY_STRING'].encoding
> +    end

This would need to test and work with (and appropriately reject)
invalid requests with bad encodings, too.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-02  7:54 ` Kapil Israni
@ 2014-08-02  8:02   ` Eric Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Wong @ 2014-08-02  8:02 UTC (permalink / raw)
  To: Kapil Israni; +Cc: unicorn-public

Kapil Israni <kapil.israni@gmail.com> wrote:
> How do I unsubscribe from this email list?

Send an email to: unicorn-public+unsubscribe@bogomips.org
(it should've been mentioned in the welcome message, and is in
 every header).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-02  7:51 Gary Grossman
@ 2014-08-02  7:54 ` Kapil Israni
  2014-08-02  8:02   ` Eric Wong
  2014-08-02  8:50 ` Eric Wong
  1 sibling, 1 reply; 22+ messages in thread
From: Kapil Israni @ 2014-08-02  7:54 UTC (permalink / raw)
  Cc: unicorn-public

How do I unsubscribe from this email list?


On Sat, Aug 2, 2014 at 12:51 AM, Gary Grossman <gary.grossman@gmail.com>
wrote:

> Hi Eric,
>
> I work with Michael, and this discussion sure got off on the
> wrong foot... we love unicorn and use it heavily, and just
> want to contribute back to it.
>
> To detail the encoding problem we were trying to fix, unicorn
> uses rb_str_new in several places to create Ruby strings.
> For Ruby 1.9 and later, these strings are assigned ASCII-8BIT
> encoding.
>
> While the Rack specification doesn't dictate what encoding
> should be used for strings in the environment, many
> developers would probably expect the default external encoding
> setting in Encoding.default_external to be used.
>
> Many Rails applications use UTF8 heavily. The use of ASCII-8BIT
> in the env can lead to Encoding::CompatibilityErrors being
> raised when a UTF8 string and ASCII-8BIT string are concatenated,
> which happens frequently when properties like request.url are
> referenced in erb templates. To get around these problems,
> an app would have to force encoding on the strings in the env
> manually. It seems a shame to do this in slower Ruby code when
> it could be done up front by unicorn.
>
> We'd like to propose that unicorn use rb_external_str_new to
> make strings instead of rb_str_new.
>
> Perhaps you have your reasons for continuing to use rb_str_new
> but we figured we'd run this by you.
>
> Here's a proposed patch.
>
> Gary
>
> From befb01530c8d930ba53cc58b979ddf42a4c12565 Mon Sep 17 00:00:00 2001
> From: Gary Grossman <gary.grossman@gmail.com>
> Date: Sat, 2 Aug 2014 00:19:30 -0700
> Subject: [PATCH] If unicorn is used with Ruby 1.9 or later, use
>  rb_external_str_new instead of rb_str_new to create strings. The resulting
>  strings will use the default external encoding. Continue using rb_str_new
> for
>  older versions of Ruby.
>
> Using the default external encoding instead of ASCII-8BIT for
> strings is more in line with developer expectations and will cause
> less unexpected bugs such as Encoding::CompatibilityErrors which
> result when, say, a UTF8 string and ASCII-8BIT string are
> concatenated together.
>
> Added a unit test to ensure that strings returned in the Rack
> environment conform to the default external encoding.
> ---
>  ext/unicorn_http/ext_help.h |  6 ++++++
>  test/unit/test_request.rb   | 13 +++++++++++++
>  2 files changed, 19 insertions(+)
>
> diff --git a/ext/unicorn_http/ext_help.h b/ext/unicorn_http/ext_help.h
> index c87c272..6806f8e 100644
> --- a/ext/unicorn_http/ext_help.h
> +++ b/ext/unicorn_http/ext_help.h
> @@ -79,4 +79,10 @@ static int str_cstr_case_eq(VALUE val, const char *ptr,
> long len)
>  #define STR_CSTR_CASE_EQ(val, const_str) \
>    str_cstr_case_eq(val, const_str, sizeof(const_str) - 1)
>
> +#ifdef HAVE_RUBY_ENCODING_H
> +/* Use default external encoding for strings for Ruby 1.9+,
> + * fall back to rb_str_new when unavailable */
> +#define rb_str_new rb_external_str_new
> +#endif
> +
>  #endif /* ext_help_h */
> diff --git a/test/unit/test_request.rb b/test/unit/test_request.rb
> index fbda1a2..0a105e0 100644
> --- a/test/unit/test_request.rb
> +++ b/test/unit/test_request.rb
> @@ -179,4 +179,17 @@ class RequestTest < Test::Unit::TestCase
>      env['rack.input'].rewind
>      res = @lint.call(env)
>    end
> +
> +  def test_encoding
> +    if ''.respond_to?(:encoding)
> +      client = MockRequest.new("GET http://e:3/x?y=z HTTP/1.1\r\n" \
> +                               "Host: foo\r\n\r\n")
> +      env = @request.read(client)
> +      encoding = Encoding.default_external
> +      assert_equal encoding, env['REQUEST_PATH'].encoding
> +      assert_equal encoding, env['PATH_INFO'].encoding
> +      assert_equal encoding, env['QUERY_STRING'].encoding
> +    end
> +  end
> +
>  end
> --
> 1.9.1
>
>
>


-- 
Kapil


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
@ 2014-08-02  7:51 Gary Grossman
  2014-08-02  7:54 ` Kapil Israni
  2014-08-02  8:50 ` Eric Wong
  0 siblings, 2 replies; 22+ messages in thread
From: Gary Grossman @ 2014-08-02  7:51 UTC (permalink / raw)
  To: e; +Cc: unicorn-public, michael

Hi Eric,

I work with Michael, and this discussion sure got off on the
wrong foot... we love unicorn and use it heavily, and just
want to contribute back to it.

To detail the encoding problem we were trying to fix, unicorn
uses rb_str_new in several places to create Ruby strings.
For Ruby 1.9 and later, these strings are assigned ASCII-8BIT
encoding.

While the Rack specification doesn't dictate what encoding
should be used for strings in the environment, many
developers would probably expect the default external encoding
setting in Encoding.default_external to be used.

Many Rails applications use UTF8 heavily. The use of ASCII-8BIT
in the env can lead to Encoding::CompatibilityErrors being
raised when a UTF8 string and ASCII-8BIT string are concatenated,
which happens frequently when properties like request.url are
referenced in erb templates. To get around these problems,
an app would have to force encoding on the strings in the env
manually. It seems a shame to do this in slower Ruby code when
it could be done up front by unicorn.

We'd like to propose that unicorn use rb_external_str_new to
make strings instead of rb_str_new.

Perhaps you have your reasons for continuing to use rb_str_new
but we figured we'd run this by you.

Here's a proposed patch.

Gary

From befb01530c8d930ba53cc58b979ddf42a4c12565 Mon Sep 17 00:00:00 2001
From: Gary Grossman <gary.grossman@gmail.com>
Date: Sat, 2 Aug 2014 00:19:30 -0700
Subject: [PATCH] If unicorn is used with Ruby 1.9 or later, use
 rb_external_str_new instead of rb_str_new to create strings. The resulting
 strings will use the default external encoding. Continue using rb_str_new for
 older versions of Ruby.

Using the default external encoding instead of ASCII-8BIT for
strings is more in line with developer expectations and will cause
less unexpected bugs such as Encoding::CompatibilityErrors which
result when, say, a UTF8 string and ASCII-8BIT string are
concatenated together.

Added a unit test to ensure that strings returned in the Rack
environment conform to the default external encoding.
---
 ext/unicorn_http/ext_help.h |  6 ++++++
 test/unit/test_request.rb   | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/ext/unicorn_http/ext_help.h b/ext/unicorn_http/ext_help.h
index c87c272..6806f8e 100644
--- a/ext/unicorn_http/ext_help.h
+++ b/ext/unicorn_http/ext_help.h
@@ -79,4 +79,10 @@ static int str_cstr_case_eq(VALUE val, const char *ptr, long len)
 #define STR_CSTR_CASE_EQ(val, const_str) \
   str_cstr_case_eq(val, const_str, sizeof(const_str) - 1)
 
+#ifdef HAVE_RUBY_ENCODING_H
+/* Use default external encoding for strings for Ruby 1.9+,
+ * fall back to rb_str_new when unavailable */
+#define rb_str_new rb_external_str_new
+#endif
+
 #endif /* ext_help_h */
diff --git a/test/unit/test_request.rb b/test/unit/test_request.rb
index fbda1a2..0a105e0 100644
--- a/test/unit/test_request.rb
+++ b/test/unit/test_request.rb
@@ -179,4 +179,17 @@ class RequestTest < Test::Unit::TestCase
     env['rack.input'].rewind
     res = @lint.call(env)
   end
+
+  def test_encoding
+    if ''.respond_to?(:encoding)
+      client = MockRequest.new("GET http://e:3/x?y=z HTTP/1.1\r\n" \
+                               "Host: foo\r\n\r\n")
+      env = @request.read(client)
+      encoding = Encoding.default_external
+      assert_equal encoding, env['REQUEST_PATH'].encoding
+      assert_equal encoding, env['PATH_INFO'].encoding
+      assert_equal encoding, env['QUERY_STRING'].encoding
+    end
+  end
+
 end
-- 
1.9.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-01 23:26           ` Daniel Evans
@ 2014-08-01 23:38             ` Michael Grosser
  0 siblings, 0 replies; 22+ messages in thread
From: Michael Grosser @ 2014-08-01 23:38 UTC (permalink / raw)
  To: Daniel Evans; +Cc: Xavier Noria, Gabe da Silveira, Eric Wong, unicorn-public

Let me quickly tell the other 8.9M projects on GH that they are doing
it wrong ;)

I think having different ways of interacting with every project you
want to work on makes OS harder,
also contributing on PRs is much easier when using some collaboration
tool, viewing the source is easier and patches are simpler to read.
It also encourages nice docs and other things newbies need (does not
really matter for some backend thing like unicorn).

Sure crazy snowflakes like linux kernel might need something
different, but most other giant and small projects do just fine.

This discussion is going nowhere, so let's just stop now ;)

On Fri, Aug 1, 2014 at 4:26 PM, Daniel Evans <evans.daniel.n@gmail.com> wrote:
> This brings back memories of the BitKeeper debacle with the Linux kernel.
> One of the reasons git exists is because a large open source project trusted
> a proprietary product and it got ripped out from underneath them.
>
> On Fri, Aug 1, 2014 at 5:12 PM, Michael Grosser <michael@grosser.it> wrote:
>>
>> Patch coming soon, already pinpointed it, just wanted to look at the
>> issues to see if someone already solved it when I noticed that it's
>> not on github.
>>
>> But yeah otherwise, use whatever you like, chances are you do the most
>> work here anyway ;)
>>
>> As far as I am concerned any other OS/self-hosted tool like gitlab etc
>> would also be an improvement.
>>
>> On Fri, Aug 1, 2014 at 4:09 PM, Xavier Noria <fxn@hashref.com> wrote:
>> > Guys, Eric has obviously made a conscious choice by not hosting the source
>> > code on GitHub, and his rationale is clear.
>> >
>> > Eric I for one respect and understand your point of view, and think you have
>> > the right to do whatever you want with your projects. Thanks a lot for your
>> > open source and your integrity.
>> >
>>
>
>
>
> --
> Daniel Evans

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-01 23:12         ` Michael Grosser
  2014-08-01 23:24           ` Eric Wong
@ 2014-08-01 23:26           ` Daniel Evans
  2014-08-01 23:38             ` Michael Grosser
  1 sibling, 1 reply; 22+ messages in thread
From: Daniel Evans @ 2014-08-01 23:26 UTC (permalink / raw)
  To: Michael Grosser; +Cc: Xavier Noria, Gabe da Silveira, Eric Wong, unicorn-public

This brings back memories of the BitKeeper debacle with the Linux kernel.
One of the reasons git exists is because a large open source project trusted
a proprietary product and it got ripped out from underneath them.

On Fri, Aug 1, 2014 at 5:12 PM, Michael Grosser <michael@grosser.it> wrote:
>
> Patch coming soon, already pinpointed it, just wanted to look at the
> issues to see if someone already solved it when I noticed that it's
> not on github.
>
> But yeah otherwise, use whatever you like, chances are you do the most
> work here anyway ;)
>
> As far as I am concerned any other OS/self-hosted tool like gitlab etc
> would also be an improvement.
>
> On Fri, Aug 1, 2014 at 4:09 PM, Xavier Noria <fxn@hashref.com> wrote:
> > Guys, Eric has obviously made a conscious choice by not hosting the source
> > code on GitHub, and his rationale is clear.
> >
> > Eric I for one respect and understand your point of view, and think you have
> > the right to do whatever you want with your projects. Thanks a lot for your
> > open source and your integrity.
> >
>



--
Daniel Evans

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-01 23:12         ` Michael Grosser
@ 2014-08-01 23:24           ` Eric Wong
  2014-08-01 23:26           ` Daniel Evans
  1 sibling, 0 replies; 22+ messages in thread
From: Eric Wong @ 2014-08-01 23:24 UTC (permalink / raw)
  To: Michael Grosser; +Cc: Xavier Noria, Gabe da Silveira, Eric Wong, unicorn-public

Michael Grosser <michael@grosser.it> wrote:
> Patch coming soon, already pinpointed it, just wanted to look at the
> issues to see if someone already solved it when I noticed that it's
> not on github.

Thanks, I look forward to it.

> As far as I am concerned any other OS/self-hosted tool like gitlab etc
> would also be an improvement.

I do not like using web browsers or any sort of login/registration.
I made that clear when I first announced unicorn on the mongrel
list many years ago[1].  The only reason I can sustain working on
a project is if I don't have to deal with that stuff.

_Anybody_ can send plain-text email to this list, and it'll be archived
and viewable forever.

[1] http://mid.gmane.org/20090211230457.GB22926@dcvr.yhbt.net
    (it turns out I can provide support the project, but only
     as long as I stay inside my plain-text mail client).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-01 23:09       ` Xavier Noria
  2014-08-01 23:12         ` Michael Grosser
@ 2014-08-01 23:18         ` Aaron Suggs
  1 sibling, 0 replies; 22+ messages in thread
From: Aaron Suggs @ 2014-08-01 23:18 UTC (permalink / raw)
  To: Xavier Noria; +Cc: Gabe da Silveira, Michael Grosser, Eric Wong, unicorn-public



> On Aug 1, 2014, at 7:09 PM, Xavier Noria <fxn@hashref.com> wrote:
> 
> Guys, Eric has obviously made a conscious choice by not hosting the source
> code on GitHub, and his rationale is clear.
> 
> Eric I for one respect and understand your point of view, and think you
> have the right to do whatever you want with your projects. Thanks a lot for
> your open source and your integrity.

Well said, Xavier. Thank you, Eric.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-01 23:09       ` Xavier Noria
@ 2014-08-01 23:12         ` Michael Grosser
  2014-08-01 23:24           ` Eric Wong
  2014-08-01 23:26           ` Daniel Evans
  2014-08-01 23:18         ` Aaron Suggs
  1 sibling, 2 replies; 22+ messages in thread
From: Michael Grosser @ 2014-08-01 23:12 UTC (permalink / raw)
  To: Xavier Noria; +Cc: Gabe da Silveira, Eric Wong, unicorn-public

Patch coming soon, already pinpointed it, just wanted to look at the
issues to see if someone already solved it when I noticed that it's
not on github.

But yeah otherwise, use whatever you like, chances are you do the most
work here anyway ;)

As far as I am concerned any other OS/self-hosted tool like gitlab etc
would also be an improvement.

On Fri, Aug 1, 2014 at 4:09 PM, Xavier Noria <fxn@hashref.com> wrote:
> Guys, Eric has obviously made a conscious choice by not hosting the source
> code on GitHub, and his rationale is clear.
>
> Eric I for one respect and understand your point of view, and think you have
> the right to do whatever you want with your projects. Thanks a lot for your
> open source and your integrity.
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-01 22:48     ` Gabe da Silveira
@ 2014-08-01 23:09       ` Xavier Noria
  2014-08-01 23:12         ` Michael Grosser
  2014-08-01 23:18         ` Aaron Suggs
  0 siblings, 2 replies; 22+ messages in thread
From: Xavier Noria @ 2014-08-01 23:09 UTC (permalink / raw)
  To: Gabe da Silveira; +Cc: Michael Grosser, Eric Wong, unicorn-public

Guys, Eric has obviously made a conscious choice by not hosting the source
code on GitHub, and his rationale is clear.

Eric I for one respect and understand your point of view, and think you
have the right to do whatever you want with your projects. Thanks a lot for
your open source and your integrity.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
       [not found]       ` <CAAms34NByMcXnbGQ3DvCZTsQuh6yBn8sMSNeTi7Ss4VmmYoOrQ@mail.gmail.com>
@ 2014-08-01 23:07         ` Eric Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Wong @ 2014-08-01 23:07 UTC (permalink / raw)
  To: Michael Grosser; +Cc: unicorn-public

(please don't drop Cc-s, we do not require subscription)

Michael Grosser <michael@grosser.it> wrote:
> Most open-source projects need some kind of work done even if they are
> stable (atm we are trying to fix some encoding issue)

Please describe the issues you're experiencing, then.

> Easier access would mean more PRs vs people just monkey-patching it
> and not caring about giving back.

The better option is to encourage people to use non-proprietary
communications platforms.

As long as contributions are technically valid, I'll accept patches/pull
requests from anyone, anywhere.  I do not even require people use real
names.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-01 22:27   ` Michael Grosser
  2014-08-01 22:41     ` Eric Wong
@ 2014-08-01 22:48     ` Gabe da Silveira
  2014-08-01 23:09       ` Xavier Noria
  1 sibling, 1 reply; 22+ messages in thread
From: Gabe da Silveira @ 2014-08-01 22:48 UTC (permalink / raw)
  To: Michael Grosser; +Cc: Eric Wong, unicorn-public

I am a huge fan of GitHub and use it daily in my professional job, but both
the original message and now this comes off full of youthful arrogance and
hubris (in addition to abysmal grammar). GitHub is not the only way to do
open-source, and developers who refuse to use something other than GitHub
tend to be people impressed by superficial polish, but lacking
understanding in the type of deep engineering that allows for things like
the Linux kernel (or preforking app servers like Unicorn) to work.  There
is more to software engineering than what is fashionable in the last 5
years.  Please be more respectful of the maintainers of serious software
like Unicorn than to piss on their choice of project management tools and
techniques.


On Fri, Aug 1, 2014 at 5:27 PM, Michael Grosser <michael@grosser.it> wrote:

> Using github would mean more contributors, better software and
> especially helping more people (compare patches/issues/communication
> on unicorn to any other os project and github and the difference
> should be apparent),
> and I think that's what OS is about ... it's not ideal, but it's the
> best we got.
>
> On Fri, Aug 1, 2014 at 2:32 PM, Eric Wong <e@80x24.org> wrote:
> > Michael Grosser <michael@grosser.it> wrote:
> >> Current state makes it very hard to mange/search/fork/open-issues etc
> >> especially for newcomers,
> >> please move the project to github so we can have nice disussions
> >> forks/prs etc goodness.
> >
> > No.  Never.  Github is proprietary communications tool which requires
> > users to accept a terms of service and login.  That gives power and
> > influence to a single entity (and a for-profit organization at that).
> >
> > Contributing to unicorn is *socially* as easy as contributing to git or
> > the Linux kernel.  There is no need to signup for anything, no need to
> > ever touch a bloated web browser.
> >
> > The reason I contribute to Free Software is because I am against any
> > sort of lock-in or proprietary features.  It absolutely sickens me to
> > encounter users who seem to be incapable of using git without a
> > proprietary communications tool.
>
>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-01 22:27   ` Michael Grosser
@ 2014-08-01 22:41     ` Eric Wong
       [not found]       ` <CAAms34NByMcXnbGQ3DvCZTsQuh6yBn8sMSNeTi7Ss4VmmYoOrQ@mail.gmail.com>
  2014-08-01 22:48     ` Gabe da Silveira
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Wong @ 2014-08-01 22:41 UTC (permalink / raw)
  To: Michael Grosser; +Cc: unicorn-public

Michael Grosser <michael@grosser.it> wrote:
> Using github would mean more contributors, better software and
> especially helping more people (compare patches/issues/communication
> on unicorn to any other os project and github and the difference
> should be apparent),
> and I think that's what OS is about ... it's not ideal, but it's the
> best we got.

The problem is the Ruby community accepting non-Free solutions and
letting them become a standard.  We need to fix that, but I'm not sure
how.

Also, does a pre-forking server based on ancient technology need much
work done on it?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
@ 2014-08-01 22:32 Victor Kmita
  0 siblings, 0 replies; 22+ messages in thread
From: Victor Kmita @ 2014-08-01 22:32 UTC (permalink / raw)
  To: e; +Cc: unicorn-public, michael

"No.  Never.=94 =93It absolutely sickens me to encounter users who seem =
to be incapable of using git without a proprietary communications tool."
Dude. You sicken me.=


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-01 21:32 ` Eric Wong
@ 2014-08-01 22:27   ` Michael Grosser
  2014-08-01 22:41     ` Eric Wong
  2014-08-01 22:48     ` Gabe da Silveira
  0 siblings, 2 replies; 22+ messages in thread
From: Michael Grosser @ 2014-08-01 22:27 UTC (permalink / raw)
  To: Eric Wong; +Cc: unicorn-public

Using github would mean more contributors, better software and
especially helping more people (compare patches/issues/communication
on unicorn to any other os project and github and the difference
should be apparent),
and I think that's what OS is about ... it's not ideal, but it's the
best we got.

On Fri, Aug 1, 2014 at 2:32 PM, Eric Wong <e@80x24.org> wrote:
> Michael Grosser <michael@grosser.it> wrote:
>> Current state makes it very hard to mange/search/fork/open-issues etc
>> especially for newcomers,
>> please move the project to github so we can have nice disussions
>> forks/prs etc goodness.
>
> No.  Never.  Github is proprietary communications tool which requires
> users to accept a terms of service and login.  That gives power and
> influence to a single entity (and a for-profit organization at that).
>
> Contributing to unicorn is *socially* as easy as contributing to git or
> the Linux kernel.  There is no need to signup for anything, no need to
> ever touch a bloated web browser.
>
> The reason I contribute to Free Software is because I am against any
> sort of lock-in or proprietary features.  It absolutely sickens me to
> encounter users who seem to be incapable of using git without a
> proprietary communications tool.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Please move to github
  2014-08-01 20:27 Michael Grosser
@ 2014-08-01 21:32 ` Eric Wong
  2014-08-01 22:27   ` Michael Grosser
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Wong @ 2014-08-01 21:32 UTC (permalink / raw)
  To: Michael Grosser; +Cc: unicorn-public

Michael Grosser <michael@grosser.it> wrote:
> Current state makes it very hard to mange/search/fork/open-issues etc
> especially for newcomers,
> please move the project to github so we can have nice disussions
> forks/prs etc goodness.

No.  Never.  Github is proprietary communications tool which requires
users to accept a terms of service and login.  That gives power and
influence to a single entity (and a for-profit organization at that).

Contributing to unicorn is *socially* as easy as contributing to git or
the Linux kernel.  There is no need to signup for anything, no need to
ever touch a bloated web browser.

The reason I contribute to Free Software is because I am against any
sort of lock-in or proprietary features.  It absolutely sickens me to
encounter users who seem to be incapable of using git without a
proprietary communications tool.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Please move to github
@ 2014-08-01 20:27 Michael Grosser
  2014-08-01 21:32 ` Eric Wong
  0 siblings, 1 reply; 22+ messages in thread
From: Michael Grosser @ 2014-08-01 20:27 UTC (permalink / raw)
  To: unicorn-public

Current state makes it very hard to mange/search/fork/open-issues etc
especially for newcomers,
please move the project to github so we can have nice disussions
forks/prs etc goodness.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-08-04  7:23 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-02  7:46 Please move to github Gary Grossman
  -- strict thread matches above, loose matches on Subject: below --
2014-08-02  7:51 Gary Grossman
2014-08-02  7:54 ` Kapil Israni
2014-08-02  8:02   ` Eric Wong
2014-08-02  8:50 ` Eric Wong
2014-08-02 19:07   ` Gary Grossman
2014-08-02 19:33     ` Michael Fischer
2014-08-04  7:22       ` Hongli Lai
2014-08-02 20:15     ` Eric Wong
2014-08-01 22:32 Victor Kmita
2014-08-01 20:27 Michael Grosser
2014-08-01 21:32 ` Eric Wong
2014-08-01 22:27   ` Michael Grosser
2014-08-01 22:41     ` Eric Wong
     [not found]       ` <CAAms34NByMcXnbGQ3DvCZTsQuh6yBn8sMSNeTi7Ss4VmmYoOrQ@mail.gmail.com>
2014-08-01 23:07         ` Eric Wong
2014-08-01 22:48     ` Gabe da Silveira
2014-08-01 23:09       ` Xavier Noria
2014-08-01 23:12         ` Michael Grosser
2014-08-01 23:24           ` Eric Wong
2014-08-01 23:26           ` Daniel Evans
2014-08-01 23:38             ` Michael Grosser
2014-08-01 23:18         ` Aaron Suggs

Code repositories for project(s) associated with this inbox:

	../../../unicorn.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).