From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS15169 74.125.0.0/16 X-Spam-Status: No, score=-1.1 required=3.0 tests=AWL,BAYES_00,FREEMAIL_FROM, RCVD_IN_DNSWL_LOW shortcircuit=no autolearn=ham version=3.3.2 X-Original-To: unicorn-public@bogomips.org Received: from mail-wg0-f42.google.com (mail-wg0-f42.google.com [74.125.82.42]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 3ECC01FEC8 for ; Sat, 2 Aug 2014 07:46:13 +0000 (UTC) Received: by mail-wg0-f42.google.com with SMTP id l18so5132632wgh.13 for ; Sat, 02 Aug 2014 00:46:11 -0700 (PDT) MIME-Version: 1.0 X-Received: by 10.180.91.111 with SMTP id cd15mr13687341wib.69.1406965571743; Sat, 02 Aug 2014 00:46:11 -0700 (PDT) Received: by 10.180.238.73 with HTTP; Sat, 2 Aug 2014 00:46:11 -0700 (PDT) Date: Sat, 2 Aug 2014 00:46:11 -0700 Message-ID: Subject: Re: Please move to github From: Gary Grossman To: e@80x24.org Cc: unicorn-public@bogomips.org, michael@grosser.it Content-Type: text/plain; charset=ISO-8859-1 X-Content-Filtered-By: PublicInbox::Filter 0.0.1 List-Id: Hi Eric, I work with Michael, and this discussion sure got off on the wrong foot... we love unicorn and use it heavily, and just want to contribute back to it. To detail the encoding problem we were trying to fix, unicorn uses rb_str_new in several places to create Ruby strings. For Ruby 1.9 and later, these strings are assigned ASCII-8BIT encoding. While the Rack specification doesn't dictate what encoding should be used for strings in the environment, many developers would probably expect the default external encoding setting in Encoding.default_external to be used. Many Rails applications use UTF8 heavily. The use of ASCII-8BIT in the env can lead to Encoding::CompatibilityErrors being raised when a UTF8 string and ASCII-8BIT string are concatenated, which happens frequently when properties like request.url are referenced in erb templates. To get around these problems, an app would have to force encoding on the strings in the env manually. It seems a shame to do this in slower Ruby code when it could be done up front by unicorn. We'd like to propose that unicorn use rb_external_str_new to make strings instead of rb_str_new. Perhaps you have your reasons for continuing to use rb_str_new but we figured we'd run this by you. Here's a proposed patch. >From befb01530c8d930ba53cc58b979ddf42a4c12565 Mon Sep 17 00:00:00 2001 From: Gary Grossman Date: Sat, 2 Aug 2014 00:19:30 -0700 Subject: [PATCH] If unicorn is used with Ruby 1.9 or later, use rb_external_str_new instead of rb_str_new to create strings. The resulting strings will use the default external encoding. Continue using rb_str_new for older versions of Ruby. Using the default external encoding instead of ASCII-8BIT for strings is more in line with developer expectations and will cause less unexpected bugs such as Encoding::CompatibilityErrors which result when, say, a UTF8 string and ASCII-8BIT string are concatenated together. Added a unit test to ensure that strings returned in the Rack environment conform to the default external encoding. --- ext/unicorn_http/ext_help.h | 6 ++++++ test/unit/test_request.rb | 13 +++++++++++++ 2 files changed, 19 insertions(+) diff --git a/ext/unicorn_http/ext_help.h b/ext/unicorn_http/ext_help.h index c87c272..6806f8e 100644 --- a/ext/unicorn_http/ext_help.h +++ b/ext/unicorn_http/ext_help.h @@ -79,4 +79,10 @@ static int str_cstr_case_eq(VALUE val, const char *ptr, long len) #define STR_CSTR_CASE_EQ(val, const_str) \ str_cstr_case_eq(val, const_str, sizeof(const_str) - 1) +#ifdef HAVE_RUBY_ENCODING_H +/* Use default external encoding for strings for Ruby 1.9+, + * fall back to rb_str_new when unavailable */ +#define rb_str_new rb_external_str_new +#endif + #endif /* ext_help_h */ diff --git a/test/unit/test_request.rb b/test/unit/test_request.rb index fbda1a2..0a105e0 100644 --- a/test/unit/test_request.rb +++ b/test/unit/test_request.rb @@ -179,4 +179,17 @@ class RequestTest < Test::Unit::TestCase env['rack.input'].rewind res = @lint.call(env) end + + def test_encoding + if ''.respond_to?(:encoding) + client = MockRequest.new("GET http://e:3/x?y=z HTTP/1.1\r\n" \ + "Host: foo\r\n\r\n") + env = @request.read(client) + encoding = Encoding.default_external + assert_equal encoding, env['REQUEST_PATH'].encoding + assert_equal encoding, env['PATH_INFO'].encoding + assert_equal encoding, env['QUERY_STRING'].encoding + end + end + end -- 1.9.1 Gary