Issue #9: [doc, unicode] UTF-8 issues in the changelog (and not only)
Reported by: | 557058%3Adea91e4c-e257-42be-bc28-2cf352c368c8 |
State: | resolved |
Created on: | 2014-07-14 08:50 |
Updated on: | 2017-06-30 08:41 |
Description
For example, ‘Vernooij’ shows as ‘Vernoo?’.
Attachments
Comments
Comment by Mads Kiilerich, on 2014-07-14 16:14
Thanks for the hint about where to look for a problem. Here is an example of the problem: https://kallithea-scm.org/repos/kallithea/changeset/9ccdb6c537c9
The username is shown correctly with hg on a utf-8 capable unix command line.
Comment by Mads Kiilerich, on 2014-07-14 16:24
It works fine here. I guess it is caused by running the server with LANG=C and thus disabling some unicode handling.
It should perhaps be documented or made independent of the env settings.
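A minimal sketch of how a single '?' could end up in place of the end of the name, assuming the changelog spells it with the one-character ligature U+0133 (ĳ) and that some layer falls back to ASCII with replacement under LANG=C. This only illustrates the symptom; it is not a trace of what Kallithea or Mercurial actually do.

```python
# Assumption: the author name ends in the single code point U+0133 (ij ligature).
name = "Vernoo\u0133"

# Under an ASCII-only locale, a lossy conversion replaces each character
# that ASCII cannot represent with one '?'.
degraded = name.encode("ascii", errors="replace").decode("ascii")

print(degraded)
```

The single replacement character matches the ‘Vernoo?’ seen in the report, which is consistent with (but does not prove) an ASCII fallback somewhere on the server side.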
Comment by Mads Kiilerich, on 2014-07-14 19:28
Comment by Thomas De Schampheleire, on 2015-06-16 19:53
@andrew_shadura @kiilerix What is the status of this one?
Comment by Mads Kiilerich, on 2015-06-16 22:31
I'm not entirely sure.
In some areas Mercurial makes naive guesses/assumptions about which encoding is used. It might thus be necessary to run Kallithea (and thus hg) in an environment with, for example, HGENCODING=UTF-8 (or perhaps LANG=UTF-8 ... but that might also have other consequences). I guess it should be tested/reviewed, and that either the code or the documentation should be changed.
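As a rough sketch of the HGENCODING suggestion, the environment could be prepared before the server (or any hg subprocess) starts. The helper name `hg_environment` is hypothetical and not part of Kallithea or Mercurial; whether this is sufficient would still need to be tested as noted above.

```python
import os


def hg_environment(base=None):
    """Return a copy of the environment with HGENCODING defaulting to UTF-8.

    An HGENCODING value that is already set is left untouched.
    """
    env = dict(os.environ if base is None else base)
    env.setdefault("HGENCODING", "UTF-8")
    return env


# Example: a server started with LANG=C still gets an explicit hg encoding.
env = hg_environment({"LANG": "C"})
print(env["HGENCODING"])
```

The resulting dictionary could be passed as the `env` argument when spawning hg, or the variable could simply be exported in whatever script launches the server.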
Comment by Thomas De Schampheleire, on 2015-06-19 12:33
Some info: I encountered errors when users added Unicode characters in changeset/pull request comments, pull request titles or descriptions, ... It turned out that this was caused by the PostgreSQL database having encoding SQL_ASCII rather than the recommended UTF-8 (you can check this with 'psql -l').
This in itself was caused by having LC_CTYPE=C set when creating the database initially. Creating the database again (and migrating the existing data) with LC_CTYPE unset, so that the databases are all in UTF-8, made these issues disappear.
For reference, the LANG was always set to en_US.UTF-8 here.
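The 'psql -l' check can also be scripted. The sketch below holds a catalog query that lists each database with its encoding, plus a small helper that flags anything that is not UTF8. Running the query needs a database connection, which is left out here; the helper name and the sample rows are hypothetical.

```python
# Query against the standard PostgreSQL catalog: one row per database,
# with the encoding given by name (e.g. 'UTF8' or 'SQL_ASCII').
ENCODING_QUERY = "SELECT datname, pg_encoding_to_char(encoding) FROM pg_database;"


def non_utf8_databases(rows):
    """Return the names of databases whose encoding is not UTF8."""
    return [name for name, encoding in rows if encoding.upper() != "UTF8"]


# Hypothetical result rows, shaped like the output of ENCODING_QUERY.
rows = [("kallithea", "SQL_ASCII"), ("postgres", "UTF8")]
print(non_utf8_databases(rows))
```

When recreating a database, the encoding can be requested explicitly, for example with `CREATE DATABASE kallithea ENCODING 'UTF8' TEMPLATE template0;` (the database name is only an example), rather than relying on LC_CTYPE at creation time.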
Comment by Mads Kiilerich, on 2015-06-19 16:39
The database encoding might be something that should be mentioned in the documentation?
Comment by Thomas De Schampheleire, on 2015-07-27 20:21
Comment by Andrej Shadura, on 2017-06-28 20:28
I think this can now be closed; we've addressed a bunch of related issues since this bug was reported.
Comment by Andrej Shadura, on 2017-06-28 20:29
Comment by Thomas De Schampheleire, on 2017-06-30 08:41
We should still make sure to update the documentation, though: the user still needs to create the database in UTF-8.