Issue #130: [unicode] Some characters in filenames cause the indexing to fail
Reported by: Samuel Delisle
State: new
Created on: 2015-05-04 17:05
Updated on: 2015-08-06 19:53
Description
While creating the index necessary for full-text search, it looks like filenames with special accented characters (è, à, é, ...?) within a repository cause an exception.
Here's the output of the command: /opt/kallithea/venv/bin/paster make-index /opt/kallithea/data/production.ini
/opt/kallithea/venv/bin/paster make-index /opt/kallithea/data/production.ini
2015-05-04 12:54:42.587 INFO [kallithea.model] initializing db for sqlite:////opt/kallithea/data/kallithea.db?timeout=60
2015-05-04 12:54:42.747 INFO [kallithea.model.scm] scanning for repositories in /opt/kallithea/repos
Traceback (most recent call last):
  File "/opt/kallithea/venv/bin/paster", line 9, in <module>
    load_entry_point('PasteScript==1.7.5', 'console_scripts', 'paster')()
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
    invoke(command, command_name, options, args[1:])
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
    exit_code = runner.run(args)
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/utils.py", line 753, in run
    return super(BasePasterCommand, self).run(args[1:])
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
    result = self.command()
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/paster_commands/make_index.py", line 84, in command
    .run(full_index=self.options.full_index)
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/indexers/daemon.py", line 451, in run
    self.update_indexes()
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/indexers/daemon.py", line 443, in update_indexes
    self.update_file_index()
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/indexers/daemon.py", line 390, in update_file_index
    i, iwc = self.add_doc(writer, path, repo, repo_name)
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/indexers/daemon.py", line 175, in add_doc
    node = self.get_node(repo, path, index_rev)
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/indexers/daemon.py", line 163, in get_node
    node = cs.get_node(node_path)
  File "/opt/kallithea/venv/local/lib/python2.7/site-packages/kallithea/lib/vcs/backends/hg/changeset.py", line 347, in get_node
    % (path, self.short_id))
kallithea.lib.vcs.exceptions.NodeDoesNotExistError: There is no file nor directory at the given path: 'Mod?le.xml' at revision b2a74d2081af
The actual filename referred to should be "Modèle.xml", not "Mod?le.xml". In the console the character is not an actual question mark, it's a "♦".
This is using Kallithea 0.2.1 on an Ubuntu server (running through VirtualBox, but that shouldn't change anything?), using Mercurial repositories.
Comments
Comment by Samuel Delisle, on 2015-05-04 17:08
Comment by Mads Kiilerich, on 2015-05-04 17:31
Try setting HGENCODING="UTF-8" before running the command.
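One way to be sure the variable actually reaches the process is to pass it explicitly when launching the command. The following is a minimal Python 3 sketch, not something taken from Kallithea: the helper name `run_with_hgencoding` is hypothetical, and the only thing it relies on is that Mercurial reads the `HGENCODING` environment variable.

```python
import os
import subprocess
import sys


def run_with_hgencoding(cmd, encoding="UTF-8"):
    """Run cmd with HGENCODING set in the child's environment.

    Hypothetical helper for illustration; it copies the current
    environment and overrides only HGENCODING.
    """
    env = dict(os.environ, HGENCODING=encoding)
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Self-check: a child Python process reports the value it received.
check = run_with_hgencoding(
    [sys.executable, "-c", "import os; print(os.environ['HGENCODING'])"]
)
```

For the command in this report, the call would be `run_with_hgencoding(["/opt/kallithea/venv/bin/paster", "make-index", "/opt/kallithea/data/production.ini"])`.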
Comment by Samuel Delisle, on 2015-05-04 19:05
Is that an environment variable? I added HGENCODING="UTF-8" in /etc/environment, and it does the same thing... Not sure if I did it right, but after rebooting, echo $HGENCODING gives me "UTF-8" as expected. I didn't find anything in Kallithea's settings related to the encoding.
It's probably relevant to note that the files in question were created on Windows. I notice Mercurial's documentation seems to point out compatibility issues for non-ASCII characters: http://mercurial.selenic.com/wiki/EncodingStrategy
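A plausible way such a name ends up as "Mod?le.xml" can be reproduced in a few lines of Python 3. This is only an illustration under a stated assumption, namely that the filename was stored as cp1252 bytes on Windows; it does not claim to be what Mercurial or Kallithea does internally.

```python
# Assumption: the name was committed on Windows as cp1252 bytes.
stored = "Mod\u00e8le.xml".encode("cp1252")  # the byte for "è" is 0xE8

# Read as UTF-8, 0xE8 starts a multi-byte sequence that never completes,
# so a lenient decoder substitutes U+FFFD (the replacement character).
as_utf8 = stored.decode("utf-8", errors="replace")

# Forcing the result into ASCII turns U+FFFD into "?",
# which matches the path shown in the error message.
as_ascii = as_utf8.encode("ascii", errors="replace").decode("ascii")

print(repr(stored))
print(repr(as_utf8))
print(as_ascii)
```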
Comment by Mads Kiilerich, on 2015-05-04 19:09
Yes, it seems to be an inconsistent repo. It should still work with some caveats.
I guess the main issue here is that the error is considered fatal. It should just continue, possibly after issuing a warning.
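The "warn and continue" behaviour could look roughly like the sketch below. It is not the actual indexer code: `index_paths`, `add_doc` and `NodeLookupError` are hypothetical names, with `NodeLookupError` standing in for `kallithea.lib.vcs.exceptions.NodeDoesNotExistError` so that the example runs on its own.

```python
import logging

log = logging.getLogger(__name__)


class NodeLookupError(Exception):
    """Hypothetical stand-in for NodeDoesNotExistError."""


def index_paths(paths, add_doc):
    """Call add_doc(path) for every path, skipping the ones that fail.

    A lookup failure is logged as a warning instead of aborting the run.
    Returns two lists: the indexed paths and the skipped paths.
    """
    indexed, skipped = [], []
    for path in paths:
        try:
            add_doc(path)
        except NodeLookupError as exc:
            log.warning("skipping %r: %s", path, exc)
            skipped.append(path)
        else:
            indexed.append(path)
    return indexed, skipped
```

Catching only the lookup error, rather than every exception, keeps unrelated failures (a corrupt index, a full disk) fatal, which is probably what one wants.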
Comment by Samuel Delisle, on 2015-05-04 19:24
Exactly, I don't mind if it doesn't index that particular file properly, but it should just continue.
Comment by Thomas De Schampheleire, on 2015-08-06 19:53