Kallithea issues archive

Issue #281: Cyrillic letters in repository filenames result in an error

Reported by: Alexander Nikitin
State: new
Created on: 2017-06-01 11:33
Updated on: 2017-06-04 14:02

Description

Hello.

I have a problem displaying the contents of a repository that contains files with Cyrillic letters in their names.

With the default settings of default_encoding = utf8 and lang = ru, I get the following behaviour:

  1. User interface is ok: 1.user_interface_is_ok.png

  2. File name and its contents are not ok 2.file_name_and_file_content_are_not_ok.png

  3. File's URL doesn't work 3.url_link_to_file_with_cyrillic_doesn't_work.png

With the settings default_encoding = utf8,cp1251 and lang = ru, I get the following behaviour:

  1. User interface is still ok

  2. File name and its contents are ok now 2.1.file_name_and_file_content_are_ok_now.png

  3. But file link still doesn't work 3.url_link_to_file_with_cyrillic_still_doesn't_work.png
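The difference between the two settings is consistent with a decoder that tries each configured encoding in order. Below is a minimal Python 3 sketch of that idea. `decode_with_fallback` is a hypothetical helper written for illustration, not Kallithea's actual code, and the sample bytes are simply the cp1251 form of "текст.txt".

```python
def decode_with_fallback(raw, encodings):
    """Return raw decoded with the first encoding in the list that accepts it."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: decode anyway, marking undecodable bytes with U+FFFD.
    return raw.decode(encodings[0], errors="replace")


raw_name = b"\xf2\xe5\xea\xf1\xf2.txt"  # "текст.txt" encoded as cp1251

# Only utf8 configured: the bytes are not valid UTF-8, so the name is garbled.
garbled = decode_with_fallback(raw_name, ["utf8"])

# utf8 fails first, then cp1251 succeeds and the name is readable.
readable = decode_with_fallback(raw_name, ["utf8", "cp1251"])
print(readable)
```

This would explain why the displayed file name improves with the second setting, but it says nothing about how the URL path is looked up.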

Attached is the hg repository that I used for the tests.

Attachments

cyrillic_filename_test.7z

Comments

Comment by Mads Kiilerich, on 2017-06-01 11:57

As I think I mentioned on another issue: It looks like Kallithea is running in a Python environment where Python doesn't know how to encode non-ASCII. On Linux, that would be the case if LANG=C, and it can be fixed with, for example, LANG=en_US.utf8.

Can you try that? What is your platform and setup?
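One way to check what the interpreter itself sees is to print the encodings it reports. This is a small Python sketch. `describe_encodings` is a hypothetical helper, and the values depend on the interpreter version and on the environment of the process that actually serves Kallithea, which may differ from an interactive shell.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings the running interpreter reports."""
    return {
        "preferred": locale.getpreferredencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in describe_encodings().items():
    print(name, "=", value)
```

Running it from the same virtualenv and with the same environment variables as the server process gives the most relevant answer.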

Comment by Alexander Nikitin, on 2017-06-01 12:05

(venv) nikitin@ubuntu:/srv/kallithea/venv$ uname -a

Linux ubuntu 4.4.0-78-generic #99-Ubuntu SMP Thu Apr 27 15:29:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

(venv) nikitin@ubuntu:/srv/kallithea/venv$ lsb_release -a

No LSB modules are available.

Distributor ID: Ubuntu

Description: Ubuntu 16.04.2 LTS

Release: 16.04

Codename: xenial

(venv) nikitin@ubuntu:/srv/kallithea/venv$ pip freeze | grep Kallithea

Kallithea==0.3.2

(venv) nikitin@ubuntu:/srv/kallithea/venv$ echo $LANG

en_US.UTF-8

I start Kallithea instance with

(venv) nikitin@ubuntu:/srv/kallithea/venv$ paster serve my.ini

Comment by Alexander Nikitin, on 2017-06-01 12:13

Oh, I forgot to mention another part of this bug: Kallithea corrupts Cyrillic names in zip archives.

4.bad_cyrillic_filename_in_zip.png

Comment by Alexander Nikitin, on 2017-06-01 13:50

OK, I've done some debugging (although I'm not a Python programmer :) ).

I think that _file_paths in class MercurialChangeset contains data that doesn't match the request.

For example, in def get_node(self, path):

I have a "human readable" path value (from the parameter)

path ::: текст с кириллицей и пробелами.txt

and a hex-escaped filename in self._file_paths

self._file_paths :::

['.hgignore', '\xf2\xe5\xea\xf1\xf2 \xf1 \xea\xe8\xf0\xe8\xeb\xeb\xe8\xf6\xe5\xe9 \xe8 \xef\xf0\xee\xe1\xe5\xeb\xe0\xec\xe8.txt']

These hex-escaped bytes are in cp1251 encoding.
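The cp1251 claim can be checked directly. This Python 3 sketch decodes the escaped bytes from the list above and compares them with the requested path. The variable names are illustrative only.

```python
# Byte string copied from the self._file_paths dump above.
stored = (
    b"\xf2\xe5\xea\xf1\xf2 \xf1 "
    b"\xea\xe8\xf0\xe8\xeb\xeb\xe8\xf6\xe5\xe9 \xe8 "
    b"\xef\xf0\xee\xe1\xe5\xeb\xe0\xec\xe8.txt"
)

# Path value received by get_node, as shown above.
requested = "текст с кириллицей и пробелами.txt"

decoded = stored.decode("cp1251")
print(decoded == requested)  # True: the stored bytes are the cp1251 form of the path
```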

Comment by Alexander Nikitin, on 2017-06-04 13:53

Some more updates for this issue

The path parameter in def get_node(self, path): is passed as a unicode string (or UTF-8, I'm not sure).

The self._file_paths data contains the hex-escaped file name in cp1251 encoding.

That's why the method def get_node(self, path): cannot find the path in Mercurial's _file_paths array.
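The mismatch can be reproduced in isolation. The Python 3 sketch below assumes the stored paths are raw bytes and the requested path is text. `find_stored_path` is a hypothetical helper that shows one possible lookup strategy. It is not Kallithea's or Mercurial's code.

```python
def find_stored_path(path, file_paths, encodings):
    """Return the stored byte path matching the text path, or None."""
    for encoding in encodings:
        try:
            candidate = path.encode(encoding)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the path
        if candidate in file_paths:
            return candidate
    return None


# Stand-in for self._file_paths: names stored as cp1251 bytes.
file_paths = [b".hgignore", "текст.txt".encode("cp1251")]

# Encoding the request as UTF-8 only never matches the cp1251 bytes.
miss = find_stored_path("текст.txt", file_paths, ["utf8"])

# Trying cp1251 as well finds the stored entry.
hit = find_stored_path("текст.txt", file_paths, ["utf8", "cp1251"])
print(miss, hit)
```

Whether a fix belongs on the lookup side like this, or in how the paths are stored, is not settled by the report.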

Comment by Alexander Nikitin, on 2017-06-04 14:02

As for the second part of this bug (Kallithea corrupts Cyrillic names in zip archives):

The Cyrillic file name is encoded as cp1252 instead of cp1251, so this could be an issue in Mercurial's API or a misconfiguration of the environment variables in my Kallithea setup.
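For illustration, this is what cp1251 bytes look like when they are interpreted as cp1252. It is a Python 3 sketch of one way such corruption could arise. Where in the archive path it actually happens is not established here.

```python
name = "текст.txt"

stored = name.encode("cp1251")     # bytes as kept in the repository
misread = stored.decode("cp1252")  # same bytes read with the wrong code page

print(misread)

# The damage is reversible as long as the bytes themselves are unchanged.
recovered = misread.encode("cp1252").decode("cp1251")
```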