Kallithea issues archive

Issue #9: [doc, unicode] UTF-8 issues in the changelog (and not only)

Reported by: 557058:dea91e4c-e257-42be-bc28-2cf352c368c8
State: resolved
Created on: 2014-07-14 08:50
Updated on: 2017-06-30 08:41

Description

For example, ‘Vernooij’ shows as ‘Vernoo?’.

Attachments

Comments

Comment by Mads Kiilerich, on 2014-07-14 16:14

Thanks for the hint about where to look for a problem. Here is an example of the problem: https://kallithea-scm.org/repos/kallithea/changeset/9ccdb6c537c9

The username is shown correctly with hg on a UTF-8-capable Unix command line.

Comment by Mads Kiilerich, on 2014-07-14 16:24

It works fine here. I guess it is caused by running the server with LANG=C, which disables some Unicode handling.

This should perhaps be documented, or the behaviour made independent of the environment settings.
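
The symptom from the description can be reproduced in isolation. The Python sketch below is not Kallithea code. It assumes the name was stored with the single-character ligature U+0133, which would explain why two letters collapse into a single '?', and it shows what happens when such text is forced through an ASCII-only encoding, as may happen in a process running under LANG=C.

```python
# Assumption: the author name ends in the ligature U+0133,
# not in the two separate letters "i" and "j".
name = "Vernoo\u0133"

# ASCII cannot represent U+0133; errors="replace" substitutes "?"
# for every character that cannot be encoded.
as_ascii = name.encode("ascii", errors="replace").decode("ascii")

# UTF-8 can represent the character, so the round trip is lossless.
as_utf8 = name.encode("utf-8").decode("utf-8")

print(as_ascii)
print(as_utf8 == name)
```

If the assumption about the ligature holds, the first value matches the broken rendering reported above, while the UTF-8 round trip keeps the name intact.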

Comment by Mads Kiilerich, on 2014-07-14 19:28

Comment by Thomas De Schampheleire, on 2015-06-16 19:53

@andrew_shadura @kiilerix What is the status of this one?

Comment by Mads Kiilerich, on 2015-06-16 22:31

I'm not entirely sure.

In some areas Mercurial makes naive guesses/assumptions about which encoding is used. It might thus be necessary to run Kallithea (and thus hg) in an environment with, for example, HGENCODING=UTF-8 (or perhaps LANG set to a UTF-8 locale ... but that might also have other consequences). I guess it should be tested/reviewed and that the code or the documentation should be changed.
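
As a rough illustration of the environment idea, the sketch below builds an environment with HGENCODING set to UTF-8 while leaving LANG untouched. The helper name hg_environment is hypothetical and not part of Kallithea or Mercurial. Whether this is sufficient for a given deployment is exactly what would need to be tested.

```python
import os


def hg_environment(base_env=None):
    """Return a copy of the environment with HGENCODING forced to UTF-8.

    hg_environment is a hypothetical helper, used here for illustration only.
    """
    # Copy so that the caller's mapping (or os.environ) is not modified.
    env = dict(os.environ if base_env is None else base_env)
    env["HGENCODING"] = "UTF-8"
    return env


original = {"LANG": "C"}
env = hg_environment(original)
print(env)
```

The resulting mapping could be passed as the env argument of subprocess.run when invoking hg, or the same variable could be exported in the environment from which the server is started.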

Comment by Thomas De Schampheleire, on 2015-06-19 12:33

Some info: I encountered errors when users added Unicode characters in changeset or pull request comments, pull request titles or descriptions, ... It turned out that this was caused by the PostgreSQL database having the encoding SQL_ASCII rather than the recommended UTF-8 (you can check this with 'psql -l').

This in itself was caused by having LC_CTYPE=C set when the database was initially created. Recreating the database (and migrating the existing data) with LC_CTYPE unset, so that all databases use UTF-8, made these issues disappear.

For reference, LANG was always set to en_US.UTF-8 here.
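
The encoding check mentioned above can also be scripted. The following is a minimal sketch, not taken from Kallithea: the function names and the database name "kallithea" are assumptions, and it queries a single database with SHOW server_encoding instead of parsing the table printed by 'psql -l'.

```python
def encoding_query_command(dbname):
    """Build a psql command that prints only the encoding of one database."""
    # -A: unaligned output, -t: rows only, -d: database, -c: run one command.
    return ["psql", "-At", "-d", dbname, "-c", "SHOW server_encoding"]


def is_utf8(encoding_name):
    """Return True if the reported PostgreSQL encoding name is UTF8."""
    return encoding_name.strip().upper() == "UTF8"


# "kallithea" is a hypothetical database name.
print(encoding_query_command("kallithea"))
print(is_utf8("SQL_ASCII"))
```

The command list could be run with subprocess.run(..., capture_output=True, text=True) and its stdout passed to is_utf8; a database created under LC_CTYPE=C as described above would be expected to report SQL_ASCII.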

Comment by Mads Kiilerich, on 2015-06-19 16:39

The database encoding might be something that should be mentioned in the documentation?

Comment by Thomas De Schampheleire, on 2015-07-27 20:21

Comment by Andrej Shadura, on 2017-06-28 20:28

I think this can now be closed; we've addressed a number of related issues since this bug was reported.

Comment by Andrej Shadura, on 2017-06-28 20:29

Comment by Thomas De Schampheleire, on 2017-06-30 08:41

We should still update the documentation, though: the user needs to make sure that the database is created with UTF-8 encoding.
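
One possible shape for such a documentation note is sketched below. It builds a createdb invocation that requests UTF-8 explicitly instead of relying on the environment. The helper name, the database name and the default locale are assumptions; the locale must exist on the server, and template0 is used because a new encoding can only be chosen when copying from that template.

```python
def createdb_command(dbname, locale="en_US.UTF-8"):
    """Build a createdb command that creates a UTF-8 encoded database.

    createdb_command is a hypothetical helper, used here for illustration only.
    """
    return [
        "createdb",
        "--encoding=UTF8",       # request UTF-8 regardless of LC_CTYPE
        "--template=template0",  # pristine template that allows a new encoding
        "--locale=" + locale,    # assumption: this locale is installed
        dbname,
    ]


# "kallithea" is a hypothetical database name.
command = createdb_command("kallithea")
print(" ".join(command))
```

Passing the encoding on the command line avoids the situation described earlier in this thread, where a stray LC_CTYPE=C at creation time silently produced an SQL_ASCII database.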