Kallithea issues archive

Issue #147: UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)

Reported by: Udo Spallek
State: invalid
Created on: 2015-07-23 09:36
Updated on: 2015-07-26 12:18

Description

Related Issue #9

When I try to create a Kallithea fork of the repository https://code.google.com/p/hgnested/, an error is raised. The hg log contains an author named "Cédric", whose name includes the character u'\xe9'.

TIA and best regards Udo

File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/weberror/errormiddleware.py', line 162 in __call__
  app_iter = self.application(environ, sr_checker)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/beaker/middleware.py', line 155 in __call__
  return self.wrap_app(environ, session_start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/routes/middleware.py', line 131 in __call__
  response = self.app(environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/wsgiapp.py', line 107 in __call__
  response = self.dispatch(controller, environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/wsgiapp.py', line 312 in dispatch
  return controller(environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/base.py', line 383 in __call__
  return WSGIController.__call__(self, environ, start_response)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 211 in __call__
  response = self._dispatch_call()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 162 in _dispatch_call
  response = self._inspect_call(func)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 105 in _inspect_call
  result = self._perform_call(func, args)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/controllers/core.py', line 57 in _perform_call
  return func(**args)
File '<string>', line 2 in index
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/auth.py', line 782 in __wrapper
  return func(*fargs, **fkwargs)
File '<string>', line 2 in index
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/auth.py', line 841 in __wrapper
  return func(*fargs, **fkwargs)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/controllers/summary.py', line 180 in index
  return render('summary/summary.html')
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/templating.py', line 243 in render_mako
  cache_type=cache_type, cache_expire=cache_expire)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/templating.py', line 218 in cached_template
  return render_func()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/pylons/templating.py', line 240 in render_template
  return literal(template.render_unicode(**globs))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/template.py', line 452 in render_unicode
  as_unicode=True)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 803 in _render
  **_kwargs_for_callable(callable_, data))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 835 in _render_context
  _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 855 in _exec_template
  _render_error(template, context, compat.exception_as())
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 864 in _render_error
  result = template.error_handler(context, error)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 853 in _exec_template
  callable_(context, *args, **kwargs)
File '/var/local/kallithea/data/templates/base/root.html.py', line 209 in render_body
  __M_writer(escape(next.body()))
File '/var/local/kallithea/data/templates/base/base.html.py', line 42 in render_body
  __M_writer(escape(next.main()))
File '/var/local/kallithea/data/templates/summary/summary.html.py', line 241 in render_main
  runtime._include_file(context, u'../changelog/changelog_summary_data.html', _template_uri)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/mako/runtime.py', line 730 in _include_file
  callable_(ctx, **_kwargs_for_include(callable_, context._data, **kwargs))
File '/var/local/kallithea/data/templates/changelog/changelog_summary_data.html.py', line 79 in render_body
  __M_writer(escape(h.person(cs.author)))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/helpers.py', line 518 in person
  user = user_or_none(author)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/helpers.py', line 487 in user_or_none
  user = User.get_by_username(author_name(author), case_insensitive=True, cache=True)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/model/db.py', line 541 in get_by_username
  return q.scalar()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2215 in scalar
  ret = self.one()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2184 in one
  ret = list(self)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/caching_query.py', line 80 in __iter__
  return self.get_value(createfunc=lambda:
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/caching_query.py', line 99 in get_value
  ret = cache.get_value(cache_key, createfunc=createfunc)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/beaker/cache.py', line 305 in get
  return self._get_value(key, **kw).get_value()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/beaker/container.py', line 385 in get_value
  v = self.createfunc()
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/kallithea/lib/caching_query.py', line 81 in <lambda>
  list(Query.__iter__(self)))
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2227 in __iter__
  return self._execute_and_instances(context)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py', line 2242 in _execute_and_instances
  result = conn.execute(querycontext.statement, self._params)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py', line 1449 in execute
  params)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py', line 1584 in _execute_clauseelement
  compiled_sql, distilled_params
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py', line 1691 in _execute_context
  context)
File '/usr/share/python/kallithea/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py', line 331 in do_execute
  cursor.execute(statement, parameters)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1: ordinal not in range(128)
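
The exception on the last line of the traceback can be reproduced without any database. The following is a minimal sketch, written for Python 3 (the traceback above is from Python 2.7, where the encode step happens implicitly inside the database driver). The author name is the one from the report; everything else is illustrative.

```python
# Encoding a non-ASCII author name with the 'ascii' codec fails at the
# first character outside the 7-bit range.
author = u"C\xe9dric"  # "Cédric", the author name from the hg log

try:
    author.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; 'exc' is cleared after the except block
    print(exc.encoding, exc.start, repr(exc.object[exc.start]))

# The same name encodes fine as UTF-8; the e-acute becomes two bytes.
encoded = author.encode("utf-8")
print(encoded)
```

The reported position (1) is the index of the e-acute in the string, which matches "position 1" in the error message of this issue.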

Comments

Comment by Thomas De Schampheleire, on 2015-07-23 10:34

Which database are you using? I have seen similar backtraces when adding data with unicode characters (such as review comments or pull request descriptions) to a PostgreSQL database. After a long investigation it turned out that the database had been created with the SQL_ASCII encoding rather than UTF8. Recreating the database in UTF8 solved the problem for me.

To detect if you are in that situation, run 'psql -l'

In my case, the default UTF8 encoding was not chosen during database creation (initdb) because one of the LC_* environment variables was set to C. Unsetting that variable (LC_CTYPE in my case), leaving only LANG and LC_ALL (set to a UTF-8 compatible value), before creating the database again solved it.
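
To see which locale variable actually decides the character type in a given environment, the lookup can be sketched in Python 3 as below. This assumes the standard POSIX precedence (LC_ALL, then LC_CTYPE, then LANG, with empty values treated as unset); the function names are hypothetical, and the script only reports on the environment of the process it runs in, not on what initdb saw in the past.

```python
import os


def effective_ctype(environ):
    """Return (variable, value) for the setting that determines LC_CTYPE."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name, "")
        if value:  # an empty string counts as unset
            return name, value
    return None, "C"  # nothing set: the "C"/POSIX locale applies


def is_utf8_locale(value):
    """True if the locale name carries a UTF-8 codeset, e.g. de_DE.UTF-8."""
    # The codeset follows the dot; a modifier such as "@euro" may follow it.
    codeset = value.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    source, value = effective_ctype(os.environ)
    print("LC_CTYPE is taken from %s=%s (UTF-8: %s)"
          % (source, value, is_utf8_locale(value)))
```

Running it in the shell that will call initdb (or start Kallithea) shows whether a stray LC_* value of C wins over a UTF-8 LANG.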

It is apparently not possible to convert a live database in place, but the migration was painless. Essentially, I took a pg_dumpall and a 'pg_dump kallithea' and manually fixed the SQL_ASCII references in these files. Then, in the new database, I recreated the kallithea user with the right permissions and imported the data from the pg_dump file (kallithea db only), also using pg_dump, piping the dump from stdin.

Comment by Udo Spallek, on 2015-07-23 10:59

Thanks for the pointer and you are right, we use Postgres as database. Unfortunately we already use utf-8 as encoding.

   Name    |   Owner   | Encoding |   Collate   |    Ctype    | Access privileges
-----------+-----------+----------+-------------+-------------+-----------------------
 kallithea | kallithea | UTF8     | de_DE.UTF-8 | de_DE.UTF-8 |

I will try to examine the dump.

Comment by Udo Spallek, on 2015-07-23 11:10

In the dump I found SQL_ASCII only in the header, as SET client_encoding = 'SQL_ASCII';. In the database itself, however, everything looks sane:

kallithea=# show client_encoding ;
 client_encoding 
-----------------
 UTF8

kallithea=# show server_encoding ;
 server_encoding 
-----------------
 UTF8
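
The same two settings can also be read over the connection the application itself uses, which may differ from what an interactive psql session reports (psql can derive its client encoding from the terminal's locale). The sketch below is written against the generic Python DB-API cursor interface; `read_encodings` is a hypothetical helper, and the psycopg2 usage in the comment assumes that driver is installed and that the DSN is adapted to the local setup.

```python
def read_encodings(cursor):
    """Return (client_encoding, server_encoding) as PostgreSQL reports them."""
    values = []
    for setting in ("client_encoding", "server_encoding"):
        # SHOW returns a single row with a single column.
        cursor.execute("SHOW " + setting)
        values.append(cursor.fetchone()[0])
    return tuple(values)


# Possible usage with psycopg2 (placeholder DSN, not taken from the report):
#   import psycopg2
#   conn = psycopg2.connect("dbname=kallithea user=kallithea")
#   print(read_encodings(conn.cursor()))
```

A client_encoding of SQL_ASCII on the application's connection, combined with a UTF8 server_encoding, would match the situation that was eventually found in this issue.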

Any idea what I can do next? TIA Udo

Comment by Mads Kiilerich, on 2015-07-23 11:37

Something in your stack must be forcing everything to be encoded as plain ascii.

Try to reproduce it while running under paster serve, and try to reproduce it with a simple sqlite database ... just to figure out what makes the problem occur.

Comment by Udo Spallek, on 2015-07-23 11:52

We use kallithea 0.2.1.

My locale settings are:

LANG=de_DE.UTF-8
LANGUAGE=
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

Comment by Udo Spallek, on 2015-07-23 17:45

After explicitly setting client_encoding[1] to utf-8 in postgresql.conf and restarting the database, everything works as expected. Thanks for all the good hints!

[1] http://www.postgresql.org/docs/9.3/static/runtime-config-client.html#GUC-CLIENT-ENCODING

Comment by Mads Kiilerich, on 2015-07-23 19:21

Please consider contributing documentation improvements that can help others to avoid this problem.

Comment by Thomas De Schampheleire, on 2015-07-24 11:24

@udono Good that you found a solution. I still wonder, though, what was causing this problem. Would you mind testing the problem (with the change in postgresql.conf undone) in the following cases:

- LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)
- LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I know works)
- LC_ALL=en_US.UTF-8 + LANG=en_US.utf8

Obviously you will have to restart Kallithea between each test.

This analysis can serve as input into improving the documentation. Thanks a lot in advance...

Comment by Udo Spallek, on 2015-07-24 17:28

> I still wonder though what was causing this problem.

IMHO Kallithea has UTF-8 problems when client_encoding is set to SQL_ASCII in postgresql.conf.

A solution could be to enforce the client encoding[1] in Kallithea/SQLAlchemy.

Workarounds are either (a) to remove the SQL_ASCII setting from postgresql.conf or (b) to set client_encoding to utf8 there.

[1] http://initd.org/psycopg/docs/connection.html#connection.set_client_encoding
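
One way to enforce the client encoding from the application side, sketched here for Python 3, is to pass client_encoding as a connection parameter in the database URL; SQLAlchemy's psycopg2 dialect also accepts a client_encoding keyword on create_engine(). The helper name and the example URL are hypothetical, and whether the SQLAlchemy and psycopg2 versions shipped with a particular Kallithea release honour the parameter is an assumption that needs to be verified.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_client_encoding(db_url, encoding="utf8"):
    """Return db_url with client_encoding=<encoding> in its query string."""
    parts = urlsplit(db_url)
    # Drop any existing client_encoding so the value is not duplicated.
    query = [(key, value) for key, value in parse_qsl(parts.query)
             if key != "client_encoding"]
    query.append(("client_encoding", encoding))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder credentials, not taken from the report.
url = with_client_encoding("postgresql://kallithea:secret@localhost/kallithea")
print(url)
# Alternative (assumes SQLAlchemy with the psycopg2 dialect is installed):
#   engine = sqlalchemy.create_engine(db_url, client_encoding="utf8")
```

The resulting URL could then be used as the database URL in the Kallithea .ini file, so that the encoding no longer depends on the server-side default in postgresql.conf.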

=============
All Scenarios
=============
dispatch.wsgi
=============

Reset general locale setup:

import os

# Clear every locale variable so that each scenario starts from a clean state.
os.environ['LANG'] = ''
os.environ['LC_CTYPE'] = ''
os.environ['LC_NUMERIC'] = ''
os.environ['LC_TIME'] = ''
os.environ['LC_COLLATE'] = ''
os.environ['LC_MONETARY'] = ''
os.environ['LC_MESSAGES'] = ''
os.environ['LC_PAPER'] = ''
os.environ['LC_NAME'] = ''
os.environ['LC_ADDRESS'] = ''
os.environ['LC_TELEPHONE'] = ''
os.environ['LC_MEASUREMENT'] = ''
os.environ['LC_IDENTIFICATION'] = ''
os.environ['LC_ALL'] = ''


Scenario A
==========
postgresql.conf
---------------
Change from:
    client_encoding = utf8
to:
    client_encoding = SQL_ASCII


Scenario A1
-----------
LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)
os.environ['LANG'] = 'de_DE.UTF-8'
os.environ['LC_ALL'] = 'de_DE.UTF-8'
::
    $ systemctl restart uwsgi-emperor

Same error


Scenario A2
-----------
LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I
know works).
os.environ['LANG'] = 'en_US.UTF-8'
::
    $ systemctl restart uwsgi-emperor

Same error


Scenario A3
-----------
LC_ALL=en_US.UTF-8 + LANG=en_US.utf8
os.environ['LANG'] = 'en_US.UTF-8'
os.environ['LC_ALL'] = 'en_US.UTF-8'
::
    $ systemctl restart uwsgi-emperor

Same error


Scenario B
==========
postgresql.conf
---------------
Remove option:
    client_encoding = utf8

and use Postgres default.


Scenario B1
-----------
LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)
os.environ['LANG'] = 'de_DE.UTF-8'
os.environ['LC_ALL'] = 'de_DE.UTF-8'
::
    $ systemctl restart uwsgi-emperor

Works perfectly


Scenario B2
-----------
LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I
know works).
os.environ['LANG'] = 'en_US.UTF-8'
::
    $ systemctl restart uwsgi-emperor

Works perfectly


Scenario B3
-----------
LC_ALL=en_US.UTF-8 + LANG=en_US.utf8
os.environ['LANG'] = 'en_US.UTF-8'
os.environ['LC_ALL'] = 'en_US.UTF-8'
::
    $ systemctl restart uwsgi-emperor

Works perfectly


Scenario C
==========
postgresql.conf
---------------
Set:
    client_encoding = utf8


Scenario C1
-----------
LANG=de_DE.UTF-8 + LC_ALL=de_DE.UTF-8 (all other LC_* variables unset)
os.environ['LANG'] = 'de_DE.UTF-8'
os.environ['LC_ALL'] = 'de_DE.UTF-8'
::
    $ systemctl restart uwsgi-emperor

Works perfectly


Scenario C2
-----------
LANG=en_US.utf8 (all LC variables unset; this is the situation I have and I
know works).
os.environ['LANG'] = 'en_US.UTF-8'
::
    $ systemctl restart uwsgi-emperor

Works perfectly


Scenario C3
-----------
LC_ALL=en_US.UTF-8 + LANG=en_US.utf8
os.environ['LANG'] = 'en_US.UTF-8'
os.environ['LC_ALL'] = 'en_US.UTF-8'
::
    $ systemctl restart uwsgi-emperor

Works perfectly

Comment by Mads Kiilerich, on 2015-07-24 18:32

ASCII means no unicode. Why would anybody want to configure that if they want unicode? And if they do, is that Kallithea's problem?

If some system vendor defaults to that, their users should file a bug and ask them to default to unicode or make it very clear how to change it. As a last resort, the Kallithea documentation could explain how to work around it.

How did it end up that way in your case?

I can see you use uwsgi. I haven't tried it but it seems appealing. It would be nice if we could get some documentation of how to configure that. ;-)

Comment by Udo Spallek, on 2015-07-25 18:25

> ASCII means no unicode. Why would anybody want to configure that if they want unicode? And if they do, is that Kallithea's problem?

I investigated a little further and found a nice solution: Kallithea just needs to be started with the environment variable PGCLIENTENCODING='UTF8'. I tested this with client_encoding set to SQL_ASCII in postgresql.conf and it works perfectly.

Additionally, to my shame, it is already documented in Kallithea: http://docs.kallithea-scm.org/en/latest/setup.html#apache-s-wsgi-config
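
For a WSGI deployment, the PGCLIENTENCODING approach can be sketched as follows. PGCLIENTENCODING is the libpq environment variable named in this comment; the helper name is hypothetical, and the placement at the top of the WSGI script (such as the dispatch.wsgi mentioned earlier in this thread) is an assumption about where the variable is set early enough.

```python
import os


def force_pg_client_encoding(environ=os.environ, encoding="UTF8"):
    """Set PGCLIENTENCODING unless a value has already been chosen."""
    # setdefault keeps an explicit administrator choice intact.
    return environ.setdefault("PGCLIENTENCODING", encoding)


# This has to run before the first database connection is opened, for
# example at the top of the WSGI script, ahead of loading the application.
force_pg_client_encoding()
```

Using setdefault rather than a plain assignment means that a value exported by the service manager or the shell still takes precedence over the script.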

> If some system vendor defaults to that, their users should file a bug and ask them to default to unicode or make it very clear how to change it. As a last resort, the Kallithea documentation could explain how to work around it.
>
> How did it end up that way in your case?

We got there via debops: https://github.com/debops/ansible-postgresql/search?utf8=%E2%9C%93&q=client_encoding

> I can see you use uwsgi. I haven't tried it but it seems appealing. It would be nice if we could get some documentation of how to configure that. ;-)

You can find the configuration in the above link. I do not know more.

Comment by Mads Kiilerich, on 2015-07-25 20:06

Thanks for following up.

But if the admin explicitly set ASCII in postgresql.conf, should Kallithea really try to overrule it?

I don't see exactly your problem documented in setup.html ... but the documentation is a bit unclear and can be read in many ways ;-)

Comment by Thomas De Schampheleire, on 2015-07-26 12:18

I also wonder why you had SQL_ASCII in postgresql.conf; that is not standard.

For the documentation, I think we can list the following attention points:

- is the database itself in UTF8
- is there no explicit override in postgresql.conf
- is there nothing in the environment when starting Kallithea that disables UTF8