Issue #281: Cyrillic letters in repository filenames result in an error
Reported by: | Alexander Nikitin |
State: | new |
Created on: | 2017-06-01 11:33 |
Updated on: | 2017-06-04 14:02 |
Description
Hello.
I have a problem displaying repository contents when the repository contains files with Cyrillic letters in their names.
With the default setting default_encoding = utf8 and lang = ru, I get the following behaviour:
- The user interface is OK.
- The file name and its contents are not OK.
- The file's URL doesn't work.
With the setting default_encoding = utf8,cp1251 and lang = ru, I get the following behaviour:
- The user interface is still OK.
- The file name and its contents are OK now.
- But the file link still doesn't work.
Attached is the hg repository that I used for the tests.
Attachments
Comments
Comment by Mads Kiilerich, on 2017-06-01 11:57
As I think I mentioned on another issue: It looks like Kallithea is running in a Python environment where Python doesn't know how to encode non-ASCII characters. On Linux, that would be the case if LANG=C, and it can be fixed with, for example, LANG=en_US.utf8.
Can you try that? What is your platform and setup?
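A quick way to check what the Python process itself sees is sketched below. It uses only standard-library calls; the helper name encoding_report is made up for illustration and is not part of Kallithea. It should be run in the same environment (same user, same virtualenv, same startup method) as the server, since a service manager may set LANG differently from an interactive shell.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related settings visible to this Python process.

    Hypothetical helper for diagnosis only; not a Kallithea function.
    """
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        # Encoding Python derives from the locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used to convert file names to and from the OS.
        "filesystem": sys.getfilesystemencoding(),
        # May be None if stdout has been replaced or detached.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print("%s = %r" % (key, value))
```

If the reported values mention ASCII or ANSI_X3.4-1968 rather than UTF-8, the process is probably running with a C/POSIX locale.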
Comment by Alexander Nikitin, on 2017-06-01 12:05
(venv) nikitin@ubuntu:/srv/kallithea/venv$ uname -a
Linux ubuntu 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
(venv) nikitin@ubuntu:/srv/kallithea/venv$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial
(venv) nikitin@ubuntu:/srv/kallithea/venv$ pip freeze | grep Kallithea
Kallithea==0.3.2
(venv) nikitin@ubuntu:/srv/kallithea/venv$ echo $LANG
en_US.UTF-8
I start the Kallithea instance with:
(venv) nikitin@ubuntu:/srv/kallithea/venv$ paster serve my.ini
Comment by Alexander Nikitin, on 2017-06-01 12:13
Oh, I forgot to mention another part of this bug: Kallithea corrupts Cyrillic names in zip archives.
Comment by Alexander Nikitin, on 2017-06-01 13:50
OK, I've done some debugging (well, I'm not a Python programmer :) ).
I think that _file_paths in class MercurialChangeset contains data that doesn't match the request.
For example, in def get_node(self, path):
I have the "human readable" path value (from the parameter):
path ::: текст с кириллицей и пробелами.txt
and a hex-escaped file name in self._file_paths:
self._file_paths :::
['.hgignore', '\xf2\xe5\xea\xf1\xf2 \xf1 \xea\xe8\xf0\xe8\xeb\xeb\xe8\xf6\xe5\xe9 \xe8 \xef\xf0\xee\xe1\xe5\xeb\xe0\xec\xe8.txt']
These hex-escaped bytes are the file name in cp1251 encoding.
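The observation above can be checked outside Kallithea. The sketch below (Python 3, standard library only) takes the byte string exactly as printed from self._file_paths and compares it with the requested path; the variable names are chosen for illustration.

```python
# Byte string copied from the self._file_paths dump above.
stored = (b'\xf2\xe5\xea\xf1\xf2 \xf1 '
          b'\xea\xe8\xf0\xe8\xeb\xeb\xe8\xf6\xe5\xe9 \xe8 '
          b'\xef\xf0\xee\xe1\xe5\xeb\xe0\xec\xe8.txt')

# Path as it arrives in get_node(self, path).
requested = 'текст с кириллицей и пробелами.txt'

# Decoding the stored bytes as cp1251 gives the requested name.
matches_as_cp1251 = stored.decode('cp1251') == requested

# Encoding the requested name as UTF-8 gives different bytes.
matches_as_utf8 = requested.encode('utf-8') == stored

# The stored bytes are not even valid UTF-8.
try:
    stored.decode('utf-8')
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(matches_as_cp1251, matches_as_utf8, valid_utf8)
```

So the two values name the same file, but only when the stored bytes are interpreted as cp1251.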
Comment by Alexander Nikitin, on 2017-06-04 13:53
Some more updates for this issue:
The path parameter in def get_node(self, path): is passed as a unicode string (or UTF-8, I'm not sure).
The self._file_paths data contains the file name as hex-escaped bytes in cp1251 encoding.
That's why the method def get_node(self, path): cannot find the path in Mercurial's _file_paths array.
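One way such a mismatch could be avoided is to decode each stored name with a list of candidate encodings before comparing. The following is a minimal sketch of that idea, not Kallithea's actual code: decode_with_fallback and find_path are hypothetical helpers, and the assumption that default_encoding = utf8,cp1251 means "try each encoding in order" is mine.

```python
def decode_with_fallback(raw, encodings=('utf-8', 'cp1251')):
    """Decode bytes using the first encoding that accepts them.

    Hypothetical helper; the encoding order is an assumption.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing fit: fall back to lossy decoding with the first encoding.
    return raw.decode(encodings[0], errors='replace')


def find_path(requested, file_paths, encodings=('utf-8', 'cp1251')):
    """Return the stored byte path whose decoded form equals `requested`."""
    for raw in file_paths:
        if decode_with_fallback(raw, encodings) == requested:
            return raw
    return None


# Data taken from the debugging output in this issue.
file_paths = [
    b'.hgignore',
    b'\xf2\xe5\xea\xf1\xf2 \xf1 '
    b'\xea\xe8\xf0\xe8\xeb\xeb\xe8\xf6\xe5\xe9 \xe8 '
    b'\xef\xf0\xee\xe1\xe5\xeb\xe0\xec\xe8.txt',
]
requested = 'текст с кириллицей и пробелами.txt'

found = find_path(requested, file_paths)
found_utf8_only = find_path(requested, file_paths, encodings=('utf-8',))
```

With both encodings the stored entry is found; with UTF-8 alone the lookup returns None, which matches the behaviour described above. Trying UTF-8 first is a deliberate choice here, because cp1251 accepts almost any byte sequence and would otherwise mask valid UTF-8 names.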
Comment by Alexander Nikitin, on 2017-06-04 14:02
As for the second part of this bug (Kallithea corrupts Cyrillic names in zip archives):
the Cyrillic file name is encoded as cp1252 instead of cp1251, so this could be an issue in Mercurial's API or a misconfiguration of the environment variables in my Kallithea setup.
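The symptom itself is easy to reproduce in isolation: bytes written as cp1251 but read as cp1252 turn into accented Latin letters. The sketch below (Python 3) only illustrates that effect; it does not show where in Kallithea, Mercurial, or the zip tool the wrong encoding is applied, which is still unknown.

```python
name = 'текст с кириллицей и пробелами.txt'

# Bytes as they would be stored for a cp1251 file name.
raw = name.encode('cp1251')

# The same bytes interpreted as cp1252 (Western European).
mojibake = raw.decode('cp1252')

# The damage is reversible as long as no bytes were dropped.
recovered = mojibake.encode('cp1252').decode('cp1251')

print(mojibake)
print(recovered == name)
```

The first word, "текст", comes out as "òåêñò". If the names inside the downloaded archive look like this, the bytes are most likely intact and only labelled with the wrong encoding.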