Kallithea issues archive

Issue #130: [unicode] Some characters in filenames cause the indexing to fail

Reported by: Samuel Delisle
State: new
Created on: 2015-05-04 17:05
Updated on: 2015-08-06 19:53

Description

While creating the index necessary for full-text search, it looks like filenames with special accented characters (è, à, é, ...?) within a repository cause an exception.

Here's the output of the command:

/opt/kallithea/venv/bin/paster make-index /opt/kallithea/data/production.ini
2015-05-04 12:54:42.587 INFO  [kallithea.model] initializing db for sqlite:////opt/kallithea/data/kallithea.db?timeout=60
2015-05-04 12:54:42.747 INFO  [kallithea.model.scm] scanning for repositories in /opt/kallithea/repos
Traceback (most recent call last):
  File "/opt/kallithea/venv/bin/paster", line 9, in <module>
    load_entry_point('PasteScript==1.7.5', 'console_scripts', 'paster')()
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
    invoke(command, command_name, options, args[1:])
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
    exit_code = runner.run(args)
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/utils.py", line 753, in run
    return super(BasePasterCommand, self).run(args[1:])
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
    result = self.command()
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/paster_commands/make_index.py", line 84, in command
    .run(full_index=self.options.full_index)
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/indexers/daemon.py", line 451, in run
    self.update_indexes()
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/indexers/daemon.py", line 443, in update_indexes
    self.update_file_index()
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/indexers/daemon.py", line 390, in update_file_index
    i, iwc = self.add_doc(writer, path, repo, repo_name)
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/indexers/daemon.py", line 175, in add_doc
    node = self.get_node(repo, path, index_rev)
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/indexers/daemon.py", line 163, in get_node
    node = cs.get_node(node_path)
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/vcs/backends/hg/changeset.py", line 347, in get_node
    % (path, self.short_id))
kallithea.lib.vcs.exceptions.NodeDoesNotExistError: There is no file nor directory at the given path: 'Mod?le.xml' at revision b2a74d2081af

The actual filename referred to should be "Modèle.xml", not "Mod?le.xml". In the console it's not an actual question mark, it's a "♦".

This is using Kallithea 0.2.1 on an Ubuntu server (running through VirtualBox, but that shouldn't change anything?), using Mercurial repositories.

Attachments

Comments

Comment by Samuel Delisle, on 2015-05-04 17:08

Comment by Mads Kiilerich, on 2015-05-04 17:31

Try setting HGENCODING="UTF-8" before running the command.
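
One way to apply this suggestion for a single run, sketched in Python under assumptions: HGENCODING is the environment variable Mercurial reads for its encoding, and run_with_hgencoding is a hypothetical helper, not part of Kallithea. The child command here is a stand-in that only echoes the variable; the paster make-index command from the report would be substituted for it.

```python
import os
import subprocess
import sys


def run_with_hgencoding(cmd, encoding="UTF-8"):
    """Run cmd with HGENCODING set in the child's environment only.

    Hypothetical helper for illustration; it does not touch the
    parent process environment or any system-wide configuration.
    """
    env = dict(os.environ, HGENCODING=encoding)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in child command that prints the variable it received.
result = run_with_hgencoding(
    [sys.executable, "-c", "import os; print(os.environ['HGENCODING'])"]
)
print(result.stdout.strip())
```

Setting the variable per command like this avoids relying on /etc/environment being picked up by the shell or service that starts the indexer.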

Comment by Samuel Delisle, on 2015-05-04 19:05

Is that an environment variable? I added HGENCODING="UTF-8" to /etc/environment, but it does the same thing... Not sure if I did it right, but after rebooting, echo $HGENCODING gives me "UTF-8" as expected. I didn't find anything in Kallithea's settings related to the encoding.

It's probably relevant to note that the files in question were created on Windows. I notice Mercurial's documentation seems to point out compatibility issues for non-ASCII characters: http://mercurial.selenic.com/wiki/EncodingStrategy
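
A minimal sketch of how such a mismatch could produce the 'Mod?le.xml' seen in the traceback. It assumes, without confirmation from this report, that the filename bytes were written in the Windows-1252 code page and later decoded as UTF-8. All variable names are illustrative.

```python
# "Mod\u00e8le.xml" is "Modèle.xml"; the escape keeps this file ASCII-only.
original = "Mod\u00e8le.xml"

# Assumption: a Windows client stored the name in the cp1252 code page,
# where the accented e is the single byte 0xE8.
stored = original.encode("cp1252")

# Decoding those bytes as UTF-8 fails on 0xE8, so a lenient decoder
# substitutes U+FFFD (the replacement character).
misread = stored.decode("utf-8", errors="replace")

# Forcing the result into ASCII turns the replacement character into "?".
as_ascii = misread.encode("ascii", errors="replace").decode("ascii")

print(stored)
print(ascii(misread))
print(as_ascii)
```

Under that assumption the path the indexer looks up no longer matches the bytes recorded in the repository, which would be consistent with the NodeDoesNotExistError above.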

Comment by Mads Kiilerich, on 2015-05-04 19:09

Yes, it seems to be an inconsistent repo. It should still work with some caveats.

I guess the main issue here is that the error is considered fatal. It should just continue, possibly after issuing a warning.
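
The idea can be sketched roughly as follows. This is not Kallithea's actual indexer code: index_paths, the callbacks, and the local NodeDoesNotExistError class are stand-ins invented for illustration, named after the exception in the traceback.

```python
import logging

log = logging.getLogger("indexer_sketch")


class NodeDoesNotExistError(Exception):
    """Stand-in for kallithea.lib.vcs.exceptions.NodeDoesNotExistError."""


def index_paths(paths, get_node, add_doc):
    """Index every path, skipping the ones whose node cannot be loaded.

    Returns (indexed, skipped) so the caller can report what was left out.
    """
    indexed, skipped = [], []
    for path in paths:
        try:
            node = get_node(path)
        except NodeDoesNotExistError as exc:
            # Warn and move on instead of aborting the whole run.
            log.warning("skipping %r: %s", path, exc)
            skipped.append(path)
            continue
        add_doc(path, node)
        indexed.append(path)
    return indexed, skipped


# Tiny fake repository: one readable file, one path that cannot be resolved.
repo = {"README.txt": "hello"}


def get_node(path):
    try:
        return repo[path]
    except KeyError:
        raise NodeDoesNotExistError("no file at %r" % path)


index = {}
indexed, skipped = index_paths(
    ["README.txt", "Mod?le.xml"], get_node, index.__setitem__
)
print(indexed, skipped)
```

Catching only the specific exception keeps unrelated failures, such as a corrupt index, fatal as before.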

Comment by Samuel Delisle, on 2015-05-04 19:24

Exactly, I don't mind if it doesn't index that particular file properly, but it should just continue.

Comment by Thomas De Schampheleire, on 2015-08-06 19:53