Issue #147: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
Reported by: Udo Spallek
State: invalid
Created on: 2015-07-23 09:36
Updated on: 2015-07-26 12:18
Description
Related Issue #9
When I try to create a Kallithea fork of the repository https://code.google.com/p/hgnested/, an error is raised.
The hg log contains an author named "Cédric", whose name uses the character u'\xe9'.
TIA and best regards Udo
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/weberror/errormiddleware.py', line 162 in __call__
  app_iter = self.application(environ, sr_checker)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/beaker/middleware.py', line 155 in __call__
  return self.wrap_app(environ, session_start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/routes/middleware.py', line 131 in __call__
  response = self.app(environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/wsgiapp.py', line 107 in __call__
  response = self.dispatch(controller, environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/wsgiapp.py', line 312 in dispatch
  return controller(environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/base.py', line 383 in __call__
  return WSGIController.__call__(self, environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 211 in __call__
  response = self._dispatch_call()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 162 in _dispatch_call
  response = self._inspect_call(func)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 105 in _inspect_call
  result = self._perform_call(func, args)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 57 in _perform_call
  return func(**args)
File '<string>', line 2 in index
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/auth.py', line 782 in __wrapper
  return func(*fargs, **fkwargs)
File '<string>', line 2 in index
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/auth.py', line 841 in __wrapper
  return func(*fargs, **fkwargs)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/controllers/summary.py', line 180 in index
  return render('summary/summary.html')
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/templating.py', line 243 in render_mako
  cache_type=cache_type, cache_expire=cache_expire)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/templating.py', line 218 in cached_template
  return render_func()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/templating.py', line 240 in render_template
  return literal(template.render_unicode(**globs))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/template.py', line 452 in render_unicode
  as_unicode=True)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 803 in _render
  **_kwargs_for_callable(callable_, data))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 835 in _render_context
  _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 855 in _exec_template
  _render_error(template, context, compat.exception_as())
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 864 in _render_error
  result = template.error_handler(context, error)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 853 in _exec_template
  callable_(context, *args, **kwargs)
File '/var/local/kallithea/data/templates/base/root.html.py', line 209 in render_body
  __M_writer(escape(next.body()))
File '/var/local/kallithea/data/templates/base/base.html.py', line 42 in render_body
  __M_writer(escape(next.main()))
File '/var/local/kallithea/data/templates/summary/summary.html.py', line 241 in render_main
  runtime._include_file(context, u'../changelog/changelog_summary_data.html', _template_uri)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 730 in _include_file
  callable_(ctx, **_kwargs_for_include(callable_, context._data, **kwargs))
File '/var/local/kallithea/data/templates/changelog/changelog_summary_data.html.py', line 79 in render_body
  __M_writer(escape(h.person(cs.author)))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/helpers.py', line 518 in person
  user = user_or_none(author)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/helpers.py', line 487 in user_or_none
  user = User.get_by_username(author_name(author), case_insensitive=True, cache=True)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/model/db.py', line 541 in get_by_username
  return q.scalar()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2215 in scalar
  ret = self.one()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2184 in one
  ret = list(self)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/caching_query.py', line 80 in __iter__
  return self.get_value(createfunc=lambda:
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/caching_query.py', line 99 in get_value
  ret = cache.get_value(cache_key, createfunc=createfunc)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/beaker/cache.py', line 305 in get
  return self._get_value(key, **kw).get_value()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/beaker/container.py', line 385 in get_value
  v = self.createfunc()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/caching_query.py', line 81 in <lambda>
  list(Query.__iter__(self)))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2227 in __iter__
  return self._execute_and_instances(context)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2242 in _execute_and_instances
  result = conn.execute(querycontext.statement, self._params)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py', line 1449 in execute
  params)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py', line 1584 in _execute_clauseelement
  compiled_sql, distilled_params
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py', line 1691 in _execute_context
  context)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py', line 331 in do_execute
  cursor.execute(statement, parameters)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
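The error in the last frame can be reproduced in isolation, without Kallithea or PostgreSQL. The sketch below is only an illustration: it assumes that something between SQLAlchemy and the database driver encodes the author name with the ascii codec, and the helper name try_encode is hypothetical.

```python
# -*- coding: utf-8 -*-
# Illustration only: reproduce the kind of error shown in the traceback.
# `try_encode` is a hypothetical helper, not part of Kallithea or psycopg2.


def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


author = u"C\xe9dric"  # "Cédric"; u'\xe9' sits at index 1

ascii_result = try_encode(author, "ascii")
utf8_result = try_encode(author, "utf-8")

print(repr(ascii_result))  # a UnicodeEncodeError pointing at position 1
print(repr(utf8_result))   # the UTF-8 bytes; this encoding succeeds
```

The position reported by the ascii attempt matches the "position 1" in the issue title, which is consistent with the author name being the string that failed to encode.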
Comments
Comment by Thomas De Schampheleire, on 2015-07-23 10:34
Which database are you using? I have seen similar backtraces when adding data with unicode characters (like review comments or pull request descriptions) into a PostgreSQL database. After a long investigation it turned out that the database was in SQL_ASCII format rather than UTF8. Recreating the database in UTF8 solved the problem for me.
To detect if you are in that situation, run 'psql -l' and check the Encoding column.
In my case, the default UTF8 encoding was not chosen during database creation (initdb) because I had one of the LC_* environment variables set to C. Unsetting that variable (LC_CTYPE in my case), leaving only LANG and LC_ALL (set to a UTF-8 compatible value), before creating the database again solved it.
It is apparently not possible to convert a live database, but the migration was painless. Essentially I took a pg_dumpall and a 'pg_dump kallithea', and manually fixed the SQL_ASCII references in these files. Then in the new database I recreated the kallithea user with the right permissions and imported the data from the pg_dump file (kallithea db only) by piping the dump to psql on stdin.
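The locale check described in this comment can be sketched as a small helper that lists locale variables which are set but do not name a UTF-8 locale. The function name non_utf8_locale_vars and the example values are hypothetical; the helper only flags candidates and does not know which encoding initdb would actually pick.

```python
import os


def non_utf8_locale_vars(environ):
    """Return {name: value} for set locale variables that do not name UTF-8."""
    flagged = {}
    for name, value in environ.items():
        # Only LANG and the LC_* family are of interest here.
        if name != "LANG" and not name.startswith("LC_"):
            continue
        normalized = value.lower().replace("-", "")
        if value and "utf8" not in normalized:
            flagged[name] = value
    return flagged


# Hypothetical environment resembling the one described above.
example = {"LANG": "en_US.utf8", "LC_CTYPE": "C", "LC_ALL": "", "HOME": "/root"}
print(non_utf8_locale_vars(example))  # {'LC_CTYPE': 'C'}
```

To inspect the real environment of the shell that will run initdb, pass os.environ instead of the example dictionary.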
Comment by Udo Spallek, on 2015-07-23 10:59
Thanks for the pointer and you are right, we use Postgres as database. Unfortunately we already use utf-8 as encoding.
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+-----------+----------+-------------+-------------+-----------------------
kallithea | kallithea | UTF8 | de_DE.UTF-8 | de_DE.UTF-8 |
I will try to examine the dump.
Comment by Udo Spallek, on 2015-07-23 11:10
In the dump I found SQL_ASCII only in the head part as SET client_encoding = 'SQL_ASCII';
But in the database it seems to be sane:
kallithea=# show client_encoding ;
 client_encoding
-----------------
 UTF8

kallithea=# show server_encoding ;
 server_encoding
-----------------
 UTF8
Any idea what I can do next? TIA Udo
Comment by Mads Kiilerich, on 2015-07-23 11:37
Something in your stack must be forcing everything to be encoded as plain ascii.
Try to reproduce it while running under paster serve, and try to reproduce it with a simple SQLite database ... just to figure out what makes the problem occur.
Comment by Udo Spallek, on 2015-07-23 11:52
We use kallithea 0.2.1.
My locale settings are:
LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
Comment by Udo Spallek, on 2015-07-23 17:45
After explicitly setting the client_encoding[1] in postgresql.conf to utf-8 and restarting the database, everything works as expected.
Thanks for all the good hints!
[1] http://www.postgresql.org/docs/9.3/static/runtime-config-client.html#GUC-CLIENT-ENCODING
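To see whether a postgresql.conf carries an explicit client_encoding, a small parser along the following lines may help. It is a sketch under simplifying assumptions: it only looks at this one parameter, ignores include directives, and the function name configured_client_encoding is hypothetical.

```python
import re

# Matches an uncommented "client_encoding = value" line; the equals sign and
# the single quotes around the value are optional.
_SETTING = re.compile(
    r"^\s*client_encoding(?:\s*=\s*|\s+)'?([A-Za-z0-9_-]+)'?",
    re.IGNORECASE,
)


def configured_client_encoding(conf_text):
    """Return the last uncommented client_encoding value, or None if unset."""
    value = None
    for line in conf_text.splitlines():
        match = _SETTING.match(line)
        if match:
            value = match.group(1)  # a later entry overrides an earlier one
    return value


# Hypothetical file content: a commented-out line has no effect.
sample = """
#client_encoding = sql_ascii
client_encoding = 'SQL_ASCII'
"""
print(configured_client_encoding(sample))  # SQL_ASCII
```

A return value of None means the file does not set the parameter, so the server falls back to its default behavior.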
Comment by Mads Kiilerich, on 2015-07-23 19:21
Please consider contributing documentation improvements that can help others to avoid this problem.
Comment by Thomas De Schampheleire, on 2015-07-24 11:24
@udono Good that you found a solution. I still wonder though what was causing this problem. Would you mind testing the problem (with the change in postgresql.conf undone) in the following cases:
- LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)
- LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I know works)
- LC_ALL=en_US.UTF-8 + LANG=en_US.utf8
Obviously you will have to restart Kallithea between each test.
This analysis can serve as input into improving the documentation. Thanks a lot in advance...
Comment by Udo Spallek, on 2015-07-24 17:28
"I still wonder though what was causing this problem."
IMHO Kallithea has UTF-8 problems when client_encoding is set to SQL_ASCII in postgresql.conf.
A solution could be to enforce the client encoding[1] in Kallithea/SQLAlchemy.
Workarounds are (a) to remove the SQL_ASCII setting or (b) to set client_encoding to utf8 in postgresql.conf.
[1] http://initd.org/psycopg/docs/connection.html#connection.set_client_encoding
=============
All Scenarios
=============

dispatch.wsgi
=============

Reset general locale setup:

    os.environ['LANG'] = ''
    os.environ['LC_CTYPE'] =""
    os.environ['LC_NUMERIC'] =""
    os.environ['LC_TIME'] =""
    os.environ['LC_COLLATE'] =""
    os.environ['LC_MONETARY'] =""
    os.environ['LC_MESSAGES'] =""
    os.environ['LC_PAPER'] =""
    os.environ['LC_NAME'] =""
    os.environ['LC_ADDRESS'] =""
    os.environ['LC_TELEPHONE'] =""
    os.environ['LC_MEASUREMENT'] =""
    os.environ['LC_IDENTIFICATION'] =""
    os.environ['LC_ALL'] = ""

Scenario A
==========

postgresql.conf
---------------

Change from:

    client_encoding = utf8

to:

    client_encoding = SQL_ASCII

Scenario A1
-----------

LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)

    os.environ['LANG'] = 'de_DE.UTF-8'
    os.environ['LC_ALL'] = 'de_DE.UTF-8'

::

    $ systemctl restart uwsgi-emperor

Same Error

Scenario A2
-----------

LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I know works).

    os.environ['LANG'] = 'en_US.UTF-8'

::

    $ systemctl restart uwsgi-emperor

Same Error

Scenario A3
-----------

LC_ALL=en_US.UTF-8 + LANG=en_US.utf8

    os.environ['LANG'] = 'en_US.UTF-8'
    os.environ['LC_ALL'] = 'en_US.UTF-8'

::

    $ systemctl restart uwsgi-emperor

Same Error

Scenario B
==========

postgresql.conf
---------------

Remove option:

    client_encoding = utf8

and use Postgres default.

Scenario B1
-----------

LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)

    os.environ['LANG'] = 'de_DE.UTF-8'
    os.environ['LC_ALL'] = 'de_DE.UTF-8'

::

    $ systemctl restart uwsgi-emperor

Works Perfect

Scenario B2
-----------

LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I know works).

    os.environ['LANG'] = 'en_US.UTF-8'

::

    $ systemctl restart uwsgi-emperor

Works perfect

Scenario B3
-----------

LC_ALL=en_US.UTF-8 + LANG=en_US.utf8

    os.environ['LANG'] = 'en_US.UTF-8'
    os.environ['LC_ALL'] = 'en_US.UTF-8'

::

    $ systemctl restart uwsgi-emperor

Works perfect

Scenario C
==========

postgresql.conf
---------------

Set:

    client_encoding = utf8

Scenario C1
-----------

LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)

    os.environ['LANG'] = 'de_DE.UTF-8'
    os.environ['LC_ALL'] = 'de_DE.UTF-8'

::

    $ systemctl restart uwsgi-emperor

Works Perfect

Scenario C2
-----------

LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I know works).

    os.environ['LANG'] = 'en_US.UTF-8'

::

    $ systemctl restart uwsgi-emperor

Works perfect

Scenario C3
-----------

LC_ALL=en_US.UTF-8 + LANG=en_US.utf8

    os.environ['LANG'] = 'en_US.UTF-8'
    os.environ['LC_ALL'] = 'en_US.UTF-8'

::

    $ systemctl restart uwsgi-emperor

Works perfect
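The idea of enforcing the client encoding on the Kallithea/SQLAlchemy side could look roughly like the sketch below, which adds client_encoding=utf8 to the query string of a database URL. Several things here are assumptions and were not tested in this thread: that the URL is the one from Kallithea's sqlalchemy.db1.url setting, that the SQLAlchemy and psycopg2 versions in use forward this query argument to the connection, and the helper name with_client_encoding. The credentials in the example are placeholders.

```python
try:  # Python 3
    from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
except ImportError:  # Python 2
    from urllib import urlencode
    from urlparse import parse_qsl, urlsplit, urlunsplit


def with_client_encoding(db_url, encoding="utf8"):
    """Return db_url with client_encoding set in its query string."""
    parts = urlsplit(db_url)
    # Drop any existing client_encoding argument, keep everything else.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "client_encoding"]
    query.append(("client_encoding", encoding))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL, not taken from the reporter's configuration.
url = with_client_encoding("postgresql://kallithea:secret@localhost/kallithea")
print(url)
```

Compared with editing postgresql.conf, this keeps the override local to one application and leaves other clients of the same server untouched.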
Comment by Mads Kiilerich, on 2015-07-24 18:32
ASCII means no unicode. Why would anybody want to configure that if they want unicode? And if they do, is that Kallithea's problem?
If some system vendor defaults to that, their users should file a bug and ask them to default to unicode or make it very clear how to change it. As a last resort, the Kallithea documentation could explain how to work around it.
How did it end up that way in your case?
I can see you use uwsgi. I haven't tried it but it seems appealing. It would be nice if we could get some documentation of how to configure that. ;-)
Comment by Udo Spallek, on 2015-07-25 18:25
"ASCII means no unicode. Why would anybody want to configure that if they want unicode? And if they do, is that Kallithea's problem?"
I investigated a little further and found a nice solution: Kallithea just needs to be started with this environment variable set: PGCLIENTENCODING='UTF8'. I tested it with the SQL_ASCII client encoding in postgresql.conf and it works perfectly.
Additionally, to my shame, it is already documented in Kallithea: http://docs.kallithea-scm.org/en/latest/setup.html#apache-s-wsgi-config
"If some system vendor defaults to that, their users should file a bug and ask them to default to unicode or make it very clear how to change it. As a last resort, the Kallithea documentation could explain how to work around it. How did it end up that way in your case?"
We got it through debops: https://github.com/debops/ansible-postgresql/search?utf8=%E2%9C%93&q=client_encoding
"I can see you use uwsgi. I haven't tried it but it seems appealing. It would be nice if we could get some documentation of how to configure that. ;-)"
You can find the configuration at the link above. I do not know more.
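The PGCLIENTENCODING approach can be sketched as follows. libpq, which psycopg2 builds on, reads this variable when a connection is opened, so it has to be set before Kallithea opens its first database connection. Placing it at the top of the WSGI script is an assumption here, modeled on the os.environ edits to dispatch.wsgi used in the test scenarios earlier in the thread.

```python
import os

# Assumed placement: near the top of the WSGI script, before the application
# (and with it the first PostgreSQL connection) is created.
os.environ["PGCLIENTENCODING"] = "UTF8"

print(os.environ["PGCLIENTENCODING"])  # UTF8
```

Setting the variable in the service definition or the shell that starts Kallithea should have the same effect, as long as the process actually inherits it.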
Comment by Mads Kiilerich, on 2015-07-25 20:06
Thanks for following up.
But if the admin explicitly set ASCII in postgresql.conf, should Kallithea really try to overrule it?
I don't see exactly your problem documented in setup.html ... but the documentation is a bit unclear and can be read in many ways ;-)
Comment by Thomas De Schampheleire, on 2015-07-26 12:18
I also wonder why you had SQL_ASCII in postgresql.conf; that is not standard.
For the documentation, I think we can list the following attention points:
- is the database itself in UTF8
- is there no explicit override in postgresql.conf
- is there nothing in the environment when starting Kallithea that disables UTF8.
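The three attention points can be combined into one small check. This is a sketch with hypothetical names (is_utf8, utf8_findings); the caller has to supply the observed values, and treating PGCLIENTENCODING as the environment item to look at is an assumption drawn from the workaround found in this thread.

```python
def is_utf8(name):
    """True if an encoding name such as 'UTF8', 'utf-8' or 'UTF_8' means UTF-8."""
    return name.lower().replace("-", "").replace("_", "") == "utf8"


def utf8_findings(database_encoding, conf_client_encoding, pg_client_encoding_env):
    """Collect findings for the three checks; None means 'not set'."""
    findings = []
    if not is_utf8(database_encoding):
        findings.append("database encoding is %s" % database_encoding)
    if conf_client_encoding is not None and not is_utf8(conf_client_encoding):
        findings.append(
            "postgresql.conf sets client_encoding = %s" % conf_client_encoding
        )
    if pg_client_encoding_env is not None and not is_utf8(pg_client_encoding_env):
        findings.append("PGCLIENTENCODING is %s" % pg_client_encoding_env)
    return findings


# The reporter's situation: UTF8 database, SQL_ASCII in postgresql.conf,
# PGCLIENTENCODING not set.
print(utf8_findings("UTF8", "SQL_ASCII", None))
```

An empty list means none of the three points looks suspicious; it does not prove that the whole stack is UTF-8 clean.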