* [Qemu-devel] [PATCH] gtk: Hardcode LC_CTYPE as C.utf-8
@ 2017-01-31 10:09 Kevin Wolf
  2017-01-31 11:22 ` Alberto Garcia
  2017-01-31 13:04 ` Gerd Hoffmann
  0 siblings, 2 replies; 7+ messages in thread
From: Kevin Wolf @ 2017-01-31 10:09 UTC
  To: qemu-devel; +Cc: kwolf, armbru, kraxel, berto, mfabian

Commit 2cb5d2a4 removed setlocale() for everything except LC_MESSAGES in
order to avoid unwanted side effects such as using the wrong decimal
separator in generated JSON objects. However, the problem that unsetting
LC_CTYPE caused is that non-ASCII characters are considered
non-printable now and therefore the GTK menus display question marks for
accented letters, Chinese characters etc.

A first attempt to fix this [1] was rejected because even just setting
LC_CTYPE to the user's locale (and thereby modifying the semantics of
the ctype.h functions) could have unwanted effects that we're not aware
of yet.

Recently, however, glibc introduced a new locale "C.utf-8" that just
uses UTF-8 as its charset, but otherwise leaves the semantics alone.
Just setting the right character set is enough for our use case, so we
can just hardcode this one without having to be afraid of nasty side
effects.

Older systems that don't have the new locale will continue displaying
question marks, but this should fix the problem for most users.

[1] https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg03591.html
    ('Re: gtk: use setlocale() for LC_MESSAGES only')

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---

I happened to talk to Mike Fabian of our Internationalization team at
devconf.cz and mentioned our problem, and this is the solution that he
suggested. I hope we can finally get things back into a non-broken state
with this. :-)


 ui/gtk.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/ui/gtk.c b/ui/gtk.c
index 86368e3..8330762 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -2205,8 +2205,12 @@ void gtk_display_init(DisplayState *ds, bool full_screen, bool grab_on_hover)
 
     s->free_scale = FALSE;
 
-    /* LC_MESSAGES only. See early_gtk_display_init() for details */
+    /* Mostly LC_MESSAGES only. See early_gtk_display_init() for details. For
+     * LC_CTYPE, we need to make sure that non-ASCII characters are considered
+     * printable, but without changing any of the character classes to make
+     * sure that we don't accidentally break implicit assumptions.  */
     setlocale(LC_MESSAGES, "");
+    setlocale(LC_CTYPE, "C.utf-8");
     bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR);
     textdomain("qemu");
 
-- 
2.9.3
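
The commit message argues that setting only LC_MESSAGES and LC_CTYPE leaves the number formatting of the "C" locale untouched. A minimal sketch of that claim follows, assuming a program that starts in the default "C" locale; the helper name is hypothetical and not part of the patch.

```c
#include <locale.h>

/* Hypothetical helper (not in ui/gtk.c): repeat the two setlocale()
 * calls from the patch and report the decimal separator that
 * printf() and strtod() would use afterwards. */
const char *decimal_point_after_gtk_locale_setup(void)
{
    setlocale(LC_MESSAGES, "");
    /* May return NULL if the locale is not installed; LC_CTYPE then
     * simply stays at "C". */
    setlocale(LC_CTYPE, "C.UTF-8");

    /* LC_NUMERIC was never touched, so this is still the "C" value. */
    return localeconv()->decimal_point;
}
```

Because LC_NUMERIC is never set, the returned separator stays "." whatever the user's environment says, which is the property the JSON generator depends on.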


* Re: [Qemu-devel] [PATCH] gtk: Hardcode LC_CTYPE as C.utf-8
  2017-01-31 10:09 [Qemu-devel] [PATCH] gtk: Hardcode LC_CTYPE as C.utf-8 Kevin Wolf
@ 2017-01-31 11:22 ` Alberto Garcia
  2017-01-31 13:11   ` Mike FABIAN
  2017-01-31 13:59   ` Eric Blake
  2017-01-31 13:04 ` Gerd Hoffmann
  1 sibling, 2 replies; 7+ messages in thread
From: Alberto Garcia @ 2017-01-31 11:22 UTC
  To: Kevin Wolf, qemu-devel; +Cc: armbru, kraxel, mfabian

On Tue 31 Jan 2017 11:09:45 AM CET, Kevin Wolf <kwolf@redhat.com> wrote:

> Recently, however, glibc introduced a new locale "C.utf-8" that just
> uses UTF-8 as its charset, but otherwise leaves the semantics alone.
> Just setting the right character set is enough for our use case, so we
> can just hardcode this one without having to be afraid of nasty side
> effects.

>     setlocale(LC_MESSAGES, "");
> +   setlocale(LC_CTYPE, "C.utf-8");
>     bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR);

A couple of quick questions:

- Is it C.utf-8 or C.UTF-8 ? 'locale -a' shows only the latter in my
  system.

- When was this added? This bug seems to be still open:
  https://sourceware.org/bugzilla/show_bug.cgi?id=17318

Berto


* Re: [Qemu-devel] [PATCH] gtk: Hardcode LC_CTYPE as C.utf-8
  2017-01-31 10:09 [Qemu-devel] [PATCH] gtk: Hardcode LC_CTYPE as C.utf-8 Kevin Wolf
  2017-01-31 11:22 ` Alberto Garcia
@ 2017-01-31 13:04 ` Gerd Hoffmann
  1 sibling, 0 replies; 7+ messages in thread
From: Gerd Hoffmann @ 2017-01-31 13:04 UTC
  To: Kevin Wolf; +Cc: qemu-devel, armbru, berto, mfabian

> Recently, however, glibc introduced a new locale "C.utf-8" that just
> uses UTF-8 as its charset, but otherwise leaves the semantics alone.
> Just setting the right character set is enough for our use case, so we
> can just hardcode this one without having to be afraid of nasty side
> effects.

Cool.  Added to ui queue.

thanks,
  Gerd.


* Re: [Qemu-devel] [PATCH] gtk: Hardcode LC_CTYPE as C.utf-8
  2017-01-31 11:22 ` Alberto Garcia
@ 2017-01-31 13:11   ` Mike FABIAN
  2017-01-31 13:23     ` Alberto Garcia
  2017-01-31 13:59   ` Eric Blake
  1 sibling, 1 reply; 7+ messages in thread
From: Mike FABIAN @ 2017-01-31 13:11 UTC
  To: Alberto Garcia; +Cc: Kevin Wolf, qemu-devel, armbru, kraxel

Alberto Garcia <berto@igalia.com> wrote:

> On Tue 31 Jan 2017 11:09:45 AM CET, Kevin Wolf <kwolf@redhat.com> wrote:
>
>> Recently, however, glibc introduced a new locale "C.utf-8" that just
>> uses UTF-8 as its charset, but otherwise leaves the semantics alone.
>> Just setting the right character set is enough for our use case, so we
>> can just hardcode this one without having to be afraid of nasty side
>> effects.
>
>>     setlocale(LC_MESSAGES, "");
>> +   setlocale(LC_CTYPE, "C.utf-8");
>>     bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR);
>
> A couple of quick questions:
>
> - Is it C.utf-8 or C.UTF-8 ? 'locale -a' shows only the latter in my
>   system.

Both work:

mfabian@taka:~
$ LC_ALL=C.utf-8 strace -eopen ls 2>&1 |  grep LC_CTYPE
open("/usr/lib/locale/C.utf-8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/C.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = 3
mfabian@taka:~
$ LC_ALL=C.UTF-8 strace -eopen ls 2>&1 |  grep LC_CTYPE
open("/usr/lib/locale/C.UTF-8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/C.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = 3
mfabian@taka:~
$ LC_ALL=C.UTF8 strace -eopen ls 2>&1 |  grep LC_CTYPE
open("/usr/lib/locale/C.UTF8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/C.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = 3
mfabian@taka:~
$ LC_ALL=C.utf8 strace -eopen ls 2>&1 |  grep LC_CTYPE
open("/usr/lib/locale/C.utf8/LC_CTYPE", O_RDONLY|O_CLOEXEC) = 3
mfabian@taka:~
$ 

I like C.UTF-8 because “UTF-8” is the official spelling
of that encoding:

https://en.wikipedia.org/wiki/UTF-8#Official_name_and_variants

Using “C.utf8” saves one failed open() though, because it is the last
fallback, as you can see in the strace.
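
All four strace runs above end with an open() of C.utf8, which suggests that the codeset part of the name is reduced to a canonical spelling before the last lookup. A rough sketch of such a normalization follows (drop non-alphanumeric characters, lowercase the rest). This is an illustration under that assumption, not glibc's actual code, and the function name is made up.

```c
#include <ctype.h>
#include <stddef.h>

/* Illustrative only: turn a codeset name such as "UTF-8" into "utf8"
 * by dropping non-alphanumeric characters and lowercasing the rest.
 * The result is truncated to fit out_size and always NUL-terminated. */
void normalize_codeset_sketch(const char *in, char *out, size_t out_size)
{
    size_t n = 0;

    if (out_size == 0) {
        return;
    }
    for (; *in != '\0' && n + 1 < out_size; in++) {
        if (isalnum((unsigned char)*in)) {
            out[n++] = (char)tolower((unsigned char)*in);
        }
    }
    out[n] = '\0';
}
```

With this sketch, "utf-8", "UTF-8", "UTF8" and "utf8" all map to "utf8", matching the file name that the successful open() calls use on this system.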

> - When was this added? This bug seems to be still open:
>   https://sourceware.org/bugzilla/show_bug.cgi?id=17318

Fedora has it since Fedora 24 (spring 2016), Debian for a while longer.

I’ll ping again to get it included upstream.

It needs 1.5MB at runtime, but only because of

https://sourceware.org/bugzilla/show_bug.cgi?id=18978

As soon as that sorting bug is fixed, the C.UTF-8 locale will need
less than 200k.

> Berto

-- 
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.


* Re: [Qemu-devel] [PATCH] gtk: Hardcode LC_CTYPE as C.utf-8
  2017-01-31 13:11   ` Mike FABIAN
@ 2017-01-31 13:23     ` Alberto Garcia
  0 siblings, 0 replies; 7+ messages in thread
From: Alberto Garcia @ 2017-01-31 13:23 UTC
  To: Mike FABIAN; +Cc: Kevin Wolf, qemu-devel, armbru, kraxel

On Tue 31 Jan 2017 02:11:07 PM CET, Mike FABIAN <mfabian@redhat.com> wrote:

>> - Is it C.utf-8 or C.UTF-8 ? 'locale -a' shows only the latter in my
>>   system.
>
> Both work:

Hmmm... apparently not on my system:

$ LC_CTYPE=C.utf-8 locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.utf8
LANGUAGE=
LC_CTYPE=C.utf-8
[...]

$ LC_CTYPE=C.UTF-8 locale
LANG=en_US.utf8
LANGUAGE=
LC_CTYPE=C.UTF-8

I have glibc 2.24-8 (Debian).

>> - When was this added? This bug seems to be still open:
>>   https://sourceware.org/bugzilla/show_bug.cgi?id=17318
>
> Fedora has it since Fedora 24 (spring 2016), Debian for a while
> longer.
>
> I’ll ping again to get it included upstream.

Ah ok, so it's a distro-specific change at the moment.

Thanks,

Berto
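
Since the accepted spelling differs between systems (C.utf-8 fails on the Debian system above, while C.UTF-8 works), one defensive option is to try several spellings and check the return value of setlocale(), which is NULL on failure. The sketch below is an illustration only, not what the patch does; the helper name and the candidate order are assumptions.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: try several spellings
 * of the C UTF-8 locale for LC_CTYPE.  Returns the spelling that
 * setlocale() accepted, or NULL if none is available, in which case
 * LC_CTYPE is left unchanged. */
const char *set_c_utf8_ctype(void)
{
    static const char *const candidates[] = {
        "C.UTF-8", "C.utf8", "C.utf-8",
    };
    size_t i;

    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL) {
            return candidates[i];
        }
    }
    return NULL;
}
```

A caller could log a warning when the function returns NULL, since the GTK menus would then keep showing question marks for non-ASCII characters.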


* Re: [Qemu-devel] [PATCH] gtk: Hardcode LC_CTYPE as C.utf-8
  2017-01-31 11:22 ` Alberto Garcia
  2017-01-31 13:11   ` Mike FABIAN
@ 2017-01-31 13:59   ` Eric Blake
  2017-01-31 16:05     ` Mike FABIAN
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Blake @ 2017-01-31 13:59 UTC
  To: Alberto Garcia, Kevin Wolf, qemu-devel; +Cc: mfabian, armbru, kraxel

On 01/31/2017 05:22 AM, Alberto Garcia wrote:
> On Tue 31 Jan 2017 11:09:45 AM CET, Kevin Wolf <kwolf@redhat.com> wrote:
> 
>> Recently, however, glibc introduced a new locale "C.utf-8" that just
>> uses UTF-8 as its charset, but otherwise leaves the semantics alone.
>> Just setting the right character set is enough for our use case, so we
>> can just hardcode this one without having to be afraid of nasty side
>> effects.
> 
>>     setlocale(LC_MESSAGES, "");
>> +   setlocale(LC_CTYPE, "C.utf-8");
>>     bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR);
> 
> A couple of quick questions:
> 
> - Is it C.utf-8 or C.UTF-8 ? 'locale -a' shows only the latter in my
>   system.

At least Cygwin has C.UTF-8, but not C.utf-8.  Furthermore, since my
system defaults to en_US.UTF-8, I would expect the upper-case variant
for the character set name across all locales.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH] gtk: Hardcode LC_CTYPE as C.utf-8
  2017-01-31 13:59   ` Eric Blake
@ 2017-01-31 16:05     ` Mike FABIAN
  0 siblings, 0 replies; 7+ messages in thread
From: Mike FABIAN @ 2017-01-31 16:05 UTC
  To: Eric Blake; +Cc: Alberto Garcia, Kevin Wolf, qemu-devel, armbru, kraxel

Eric Blake <eblake@redhat.com> wrote:

> On 01/31/2017 05:22 AM, Alberto Garcia wrote:
>> On Tue 31 Jan 2017 11:09:45 AM CET, Kevin Wolf <kwolf@redhat.com> wrote:
>> 
>>> Recently, however, glibc introduced a new locale "C.utf-8" that just
>>> uses UTF-8 as its charset, but otherwise leaves the semantics alone.
>>> Just setting the right character set is enough for our use case, so we
>>> can just hardcode this one without having to be afraid of nasty side
>>> effects.
>> 
>>>     setlocale(LC_MESSAGES, "");
>>> +   setlocale(LC_CTYPE, "C.utf-8");
>>>     bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR);
>> 
>> A couple of quick questions:
>> 
>> - Is it C.utf-8 or C.UTF-8 ? 'locale -a' shows only the latter in my
>>   system.
>
> At least Cygwin has C.UTF-8, but not C.utf-8.  Furthermore, since my
> system defaults to en_US.UTF-8, I would expect the upper-case variant
> for the character set name across all locales.

Yes, I think it is more compatible to use the C.UTF-8 spelling because
it uses the “official” spelling of “UTF-8”.

-- 
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.

