Issue #275: Regression: Unicode comments fail to be posted
Reported by: Konstantin Veretennicov
Created on: 2017-04-29 17:48
Updated on: 2018-01-20 21:07
Steps to reproduce:
- Clean install Kallithea (on Windows, from sources, revision a1f8bf0)
- Create 2 users
- Create a repo and a PR
- Post an inline comment with Unicode characters
Expected: comments to work as usual for all users.
Actual: only the PR owner can post Unicode; for other users it fails.
There is an error in the server log:
  File "c:\kallithea\kallithea\lib\celerylib\tasks.py", line 307, in send_email
    % (' '.join(recipients), headers, subject, body, html_body))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 150: ordinal not in range(128)
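The error in the log is the kind Python 2 raises when a byte string holding non-ASCII data is combined with unicode text, which forces an implicit ASCII decode. The following is a minimal Python 3 sketch that performs that decode step explicitly. The helper name and the sample comment body are made up for illustration and are not taken from Kallithea.

```python
# Python 3 sketch: perform explicitly the ASCII decode that Python 2
# applied implicitly when byte strings and unicode text were mixed.

def decode_as_ascii(raw):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return raw.decode("ascii"), None
    except UnicodeDecodeError as exc:
        return None, str(exc)

# Hypothetical comment body with one Cyrillic letter, encoded as UTF-8.
# The letter U+0441 encodes to the two bytes 0xd1 0x81.
body = "fix: \u0441".encode("utf-8")

text, error = decode_as_ascii(body)
print(error)

# Decoding with the encoding that was actually used works fine.
print(body.decode("utf-8"))
```

Byte 0xd1 is a common lead byte for Cyrillic letters in UTF-8, which is consistent with a UTF-8 encoded body being decoded as ASCII.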
Comment by Konstantin Veretennicov, on 2017-04-30 18:00
Confirmed on Ubuntu as well.
The following patch fixes it:
diff -r a1f8bf0428c5 kallithea/model/notification.py
--- a/kallithea/model/notification.py Sat Apr 15 01:56:27 2017 +0200
+++ b/kallithea/model/notification.py Sun Apr 30 19:56:08 2017 +0200
@@ -342,4 +342,4 @@
         })
         log.debug('rendering tmpl %s with kwargs %s', base, _kwargs)
-        return email_template.render(**_kwargs)
+        return email_template.render_unicode(**_kwargs)
I wonder though if Mako should be configured globally to always emit Unicode. It has output_encoding='utf-8' at the moment, probably set somewhere by TG2 - I couldn't find it in Kallithea code.
Comment by Mads Kiilerich, on 2017-05-29 00:11
I guess the problem is caused by running the WSGI application in an environment where the encoding is set to 7-bit ASCII. I can reproduce the behaviour on Linux with LANG=C gearbox serve.
I would be surprised if this is the only problem you see. Can you, for example, create repositories with non-ASCII characters in the name?
The Kallithea WSGI application must return encoded Unicode, so it must know which encoding the system uses, for example in the file system (also on Windows, where the Python stack uses the 8-bit API).
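To check which encodings a Python process actually sees in a given environment, a small script like the one below can be run under the same settings as the server (for example with LANG=C). This is only a sketch using standard library calls; the function name is made up for illustration. Note that Python 3.7 and later may coerce the C locale to UTF-8, so the output can differ from what the Python 2 stack of 2017 would have reported.

```python
import locale
import sys

def describe_encodings():
    """Collect the encodings the running interpreter uses or prefers."""
    return {
        # Encoding used to convert file names to and from the OS.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding suggested by the locale settings (LANG, LC_ALL, ...).
        "preferred": locale.getpreferredencoding(False),
        # Default encoding used by str.encode() and bytes.decode().
        "default": sys.getdefaultencoding(),
    }

for name, value in describe_encodings().items():
    print(f"{name}: {value}")
```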
Comment by Konstantin Veretennicov, on 2017-05-29 07:30
We avoid any non-ASCII paths and filenames in general. Those are fraught with issues. PR comments are different though - sometimes an accented character gets in, other times it's a copy-pasted typographic quote.
I tried to create a Unicode-named repo through the Kallithea UI - it worked (I don't know if it would blow up somewhere later). Adding a Unicode-named file to it also worked. Hope it helps.
Comment by Mads Kiilerich, on 2017-05-30 01:53
Ok, pushed this - then let's see what comes up next.
Comment by Thomas De Schampheleire, on 2018-01-20 21:07
Problem seems solved with patch applied. Please reopen if there is still an issue.