Kallithea issues archive

Issue #361: FullTextSearch index creation error

Reported by: vyom
State: closed
Created on: 2020-02-17 03:17
Updated on: 2020-06-05 07:41

Description

When I try to create a FullTextSearch index with "build from scratch", I get the following error in my celery task, related to .gitmodules (I have a few git repositories in it).

Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]: [2020-02-17 10:57:08,049: ERROR/MainProcess] Task kallithea.lib.celerylib.whoosh_index[a4
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]: Traceback (most recent call last):
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/celery/app/trace.
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     R = retval = fun(*args, **kwargs)
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/celery/app/trace.
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     return self.run(*args, **kwargs)
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/hg/kallithea/kallithea/lib/celerylib/__init__.py", line 67, i
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     f_org(*args, **kwargs)
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "</home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/decorator.py:dec
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/hg/kallithea/kallithea/lib/celerylib/__init__.py", line 109,
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     ret = func(*fargs, **fkwargs)
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "</home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/decorator.py:dec
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/hg/kallithea/kallithea/lib/celerylib/__init__.py", line 127,
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     ret = func(*fargs, **fkwargs)
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/hg/kallithea/kallithea/lib/celerylib/tasks.py", line 66, in w
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     .run(full_index=full_index)
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 451, in
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     self.build_indexes()
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 437, in
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     self.index_changesets(chgset_idx_writer, repo_name, repo)
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 244, in
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     added=' '.join(node.path for node in cs.added).lower(),
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 244, in
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     added=' '.join(node.path for node in cs.added).lower(),
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/hg/kallithea/kallithea/lib/vcs/nodes.py", line 58, in __iter_
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     yield self.cs.get_node(p)
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/hg/kallithea/kallithea/lib/vcs/backends/git/changeset.py", li
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     cf = ConfigFile.from_file(BytesIO(self.repository._repo.get_object(tree[b'.gitmodules
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/dulwich/objects.p
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     return self._entries[name]
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]: KeyError: b'.gitmodules'

To use Kallithea effectively, I wanted to introduce code search functionality to my team, but index creation through Whoosh does not always succeed.

Attachments

Comments

Comment by Mads Kiilerich, on 2020-02-17 17:44

Can you try to add a print(name, stat, id) to kallithea/lib/vcs/backends/git/changeset.py before the failing line?

It would be nice if someone could contribute a gitmodules test case - it could belong in kallithea/tests/vcs/test_changesets.py or kallithea/tests/vcs/test_nodes.py or kallithea/tests/vcs/test_repository.py or kallithea/tests/vcs/test_git.py …

Comment by Mads Kiilerich, on 2020-02-18 03:34

The original report had the line numbers truncated, but the call stack said it failed in get_node which thus must have been at https://kallithea-scm.org/repos/kallithea/files/0.5.2/kallithea/lib/vcs/backends/git/changeset.py#L413

The new stack shows that the print statement was added elsewhere: in get_nodes on line 441.

Comment by vyom, on 2020-02-18 07:09

In the previous report, the failing line number was also 441 according to my logs, so I added the print statement there. The line that raised the error is:

File "/usr/local/projects/hg/kallithea/kallithea/lib/vcs/backends/git/changeset.py", line 441, in get_node

cf = ConfigFile.from_file(BytesIO(self.repository._repo.get_object(tree[b'.gitmodules'][1]).data))

The original log was:

File "/home/kallithea/projects/hg/kallithea/kallithea/lib/vcs/nodes.py", line 58, in __iter__
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     yield self.cs.get_node(p)
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/hg/kallithea/kallithea/lib/vcs/backends/git/changeset.py", line 441, in get_node
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     cf = ConfigFile.from_file(BytesIO(self.repository._repo.get_object(tree[b'.gitmodules'][1]).data))
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:   File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/dulwich/objects.py", line 982, in __getitem__
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]:     return self._entries[name]
Feb 17 10:57:08 kallithea-5-2 kallithea-cli[15779]: KeyError: b'.gitmodules'

Comment by Mads Kiilerich, on 2020-02-18 13:46

What version is this? The official 0.5.2?

Comment by Mads Kiilerich, on 2020-02-18 13:59

If this is on the default branch with Python 3, could you try with '.gitmodules' instead of b'.gitmodules'?

We really need a test case covering this …

Comment by vyom, on 2020-02-19 03:05

I changed that and it still gives the same error. I am not sure whether the lines will be truncated again, so instead of using a code block I am pasting the log directly into the editor.

Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: [2020-02-18 22:26:04,713: ERROR/MainProcess] Task kallithea.lib.celerylib.whoosh_index[9591466a-5b00-4e18-b057-968fd8d1d57e] raised unexpected: KeyError('.gitmodules',)
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: Traceback (most recent call last):
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/celery/app/trace.py", line 240, in trace_task
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: R = retval = fun(*args, **kwargs)
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/celery/app/trace.py", line 438, in protected_call
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: return self.run(*args, **kwargs)
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/hg/kallithea/kallithea/lib/celerylib/__init__.py", line 67, in f_async
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: f_org(*args, **kwargs)
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "</home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/decorator.py:decorator-gen-4>", line 2, in whoosh_index
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/hg/kallithea/kallithea/lib/celerylib/__init__.py", line 109, in __wrapper
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: ret = func(*fargs, **fkwargs)
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "</home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/decorator.py:decorator-gen-3>", line 2, in whoosh_index
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/hg/kallithea/kallithea/lib/celerylib/__init__.py", line 127, in __wrapper
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: ret = func(*fargs, **fkwargs)
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/hg/kallithea/kallithea/lib/celerylib/tasks.py", line 66, in whoosh_index
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: .run(full_index=full_index)
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 451, in run
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: self.build_indexes()
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 437, in build_indexes
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: self.index_changesets(chgset_idx_writer, repo_name, repo)
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 244, in index_changesets
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: added=' '.join(node.path for node in cs.added).lower(),
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 244, in <genexpr>
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: added=' '.join(node.path for node in cs.added).lower(),
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/hg/kallithea/kallithea/lib/vcs/nodes.py", line 58, in __iter__
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: yield self.cs.get_node(p)
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/hg/kallithea/kallithea/lib/vcs/backends/git/changeset.py", line 441, in get_node
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: cf = ConfigFile.from_file(BytesIO(self.repository._repo.get_object(tree['.gitmodules'][1]).data))
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/dulwich/objects.py", line 982, in __getitem__
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: return self._entries[name]
Feb 18 22:26:04 kallithea-5-2 kallithea-cli[30001]: KeyError: '.gitmodules'

Comment by Mads Kiilerich, on 2020-02-19 03:08

Then I guess we will need instructions for how to reproduce the problem. Can you make a small shell script that creates a git repo that triggers the problem?
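
As a starting point for such a reproduction, here is a minimal sketch in Python rather than shell. It builds a repository whose only commit contains a gitlink entry (mode 160000) but no .gitmodules file, which is the situation the KeyError suggests. This is an assumption about the cause, untested against Kallithea, and nothing in this thread establishes that the affected repository looks like this. The names `make_repo_with_orphan_gitlink`, `FAKE_COMMIT`, and the path `sub` are made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Arbitrary 40-hex commit id for the gitlink (hypothetical value); git does
# not need the submodule commit to exist in the superproject's object store.
FAKE_COMMIT = "1" * 40


def git(repo, *args):
    """Run git inside `repo` with a throwaway identity and return stdout."""
    cmd = ["git", "-C", str(repo),
           "-c", "user.name=repro", "-c", "user.email=repro@example.invalid",
           "-c", "commit.gpgsign=false", *args]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def make_repo_with_orphan_gitlink(path):
    """Create a repo whose only commit has a gitlink but no .gitmodules."""
    path = Path(path)
    subprocess.run(["git", "init", "-q", str(path)], check=True)
    # Mode 160000 marks the index entry as a gitlink (submodule pointer).
    git(path, "update-index", "--add", "--cacheinfo",
        f"160000,{FAKE_COMMIT},sub")
    git(path, "commit", "-q", "-m", "add gitlink without .gitmodules")
    # Return the tree listing so the caller can inspect the entry.
    return git(path, "ls-tree", "HEAD")


if __name__ == "__main__":
    if shutil.which("git") is None:
        print("git executable not found; nothing to do")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            print(make_repo_with_orphan_gitlink(Path(tmp) / "repro"))
```

If indexing a repository created this way fails with the same traceback, that would confirm the cause; if it does not, the script at least rules this scenario out.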

Comment by vyom, on 2020-02-19 03:11

I have a lot of repositories and need to check which one is causing this error. I probably need to print the name of the repository whose index is being created.

Comment by vyom, on 2020-02-26 00:53

With celery 4.4, it is still the same error.

Comment by Thomas De Schampheleire, on 2020-03-01 20:12

I tried reproducing this problem with a test repo with a submodule, as well as with a public one (https://github.com/mit-gfx/multicopter_design), in both cases using the default branch (with Python 3). The indexing works fine.

But then, looking at the code:

            if stat and objects.S_ISGITLINK(stat):
                tree = self.repository._repo[self._tree_id]
                cf = ConfigFile.from_file(BytesIO(self.repository._repo.get_object(tree[b'.gitmodules'][1]).data))
                url = ascii_str(cf.get(('submodule', path), 'url'))
                node = SubModuleNode(path, url=url, changeset=ascii_str(id_),
                                     alias=self.repository.alias)

It seems to me that the code expects that, if stat is of type ‘GITLINK’, there is a ‘.gitmodules’ file in the repo root, but in your case there isn’t.
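
To illustrate that expectation, here is a minimal sketch of a lookup that tolerates a missing .gitmodules entry. Plain dicts stand in for the tree object, on the assumption (taken from the snippet above) that indexing by name returns a pair whose second element is the blob id and raises KeyError when the name is absent. The helper name `gitmodules_blob_id` is hypothetical, and this is not presented as Kallithea's actual fix.

```python
GITLINK_MODE = 0o160000  # mode git uses for submodule (gitlink) entries


def gitmodules_blob_id(tree):
    """Return the blob id of .gitmodules in `tree`, or None if it is absent.

    `tree` is assumed to map entry names (bytes) to (mode, id) pairs.
    """
    try:
        return tree[b'.gitmodules'][1]
    except KeyError:
        return None


# Stand-in trees (plain dicts, not real dulwich objects).
with_config = {
    b'.gitmodules': (0o100644, b'abc123'),
    b'sub': (GITLINK_MODE, b'def456'),
}
without_config = {b'sub': (GITLINK_MODE, b'def456')}

print(gitmodules_blob_id(with_config))     # b'abc123'
print(gitmodules_blob_id(without_config))  # None
```

What the caller should do with None (for example, create the submodule node without a URL) is a design decision this sketch leaves open.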

Could you please identify on which repository this happens, and whether there is anything special about it? You could enable debug logs, or alternatively make the following change:

diff --git a/kallithea/lib/indexers/daemon.py b/kallithea/lib/indexers/daemon.py
--- a/kallithea/lib/indexers/daemon.py
+++ b/kallithea/lib/indexers/daemon.py
@@ -435,6 +435,7 @@ class WhooshIndexingDaemon(object):

         for repo_name, repo in sorted(self.repo_paths.items()):
             log.debug('Updating indices for repo %s', repo_name)
+            log.warn('Updating indices for repo %s', repo_name)
             # skip indexing if there aren't any revisions
             if len(repo) < 1:
                 continue
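
Another way to find the offending repository, sketched here under assumptions, is to catch the exception per repository and log its name instead of letting the first failure abort the whole run. The function `index_all` and the callable `index_one` are hypothetical stand-ins and not part of Kallithea's indexing daemon.

```python
import logging

log = logging.getLogger("whoosh_indexer_sketch")  # hypothetical logger name


def index_all(repo_paths, index_one):
    """Index every repo, collecting failures instead of stopping at the first.

    `repo_paths` maps repo names to repo objects; `index_one` is a
    hypothetical callable that does the per-repo indexing work.
    """
    failures = {}
    for repo_name, repo in sorted(repo_paths.items()):
        try:
            index_one(repo_name, repo)
        except Exception as exc:
            # Log the repo name together with the full traceback.
            log.exception("Indexing failed for repo %s", repo_name)
            failures[repo_name] = exc
    return failures


def fake_index(name, repo):
    """Simulated indexer that fails for one repo, mimicking the report."""
    if name == "flask":
        raise KeyError(b'.gitmodules')


failed = index_all({"flask": object(), "other": object()}, fake_index)
print(sorted(failed))  # ['flask']
```

The trade-off is that a broad `except Exception` can hide unrelated bugs, so this is better suited to diagnosis than to permanent use.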

Comment by vyom, on 2020-03-09 09:51

I removed the git repository and then tried to index; it works fine. So I can close this ticket. If I try indexing git again and it throws an error, I will re-open it.

Comment by vyom, on 2020-03-09 09:51

Resolving this, as I removed the git repository with .gitmodules (it was the Flask project repository). It worked fine after that.

Comment by Mads Kiilerich, on 2020-03-09 16:24

We would still very much like to know how to reproduce the problem.

Can you share the repo that showed the problem … or create another git repo to reproduce it?

Comment by vyom, on 2020-03-12 07:03

The issue hasn't been fixed. I tried indexing the Flask repository and got the same error.

Comment by Mads Kiilerich, on 2020-03-12 15:55

Sure. We made no claims that it was fixed. We asked for a way to reproduce. If you have steps to reproduce, please share them.

Comment by Thomas De Schampheleire, on 2020-03-12 21:11

I cloned the flask repo from https://github.com/pallets/flask, then successfully indexed it with:

kallithea-cli index-create -c development.ini --index-only flask   

I don’t see any problem here. I tested it using Kallithea revision abb83e4edfd9.

Can you clarify which exact steps you follow?

Edit: I also tried using the Admin option ‘Full-text search → Build from scratch’ as you showed in the screenshot in the first post. This was also successful.

But note that this option in Admin works on all repositories. The error could thus come from any of them. Could you reproduce with ‘DEBUG’ logging on the ‘logger_whoosh_indexer’ section in the ini file? It will show more details about which repo index is being created, and so should also reveal which repo has this b'.gitmodules' problem. Thanks.
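
For anyone who prefers to script that ini change, here is a minimal sketch using Python's standard configparser. The sample section contents (`handlers`, `qualname`, `propagate` and their values) are assumptions about what a typical ini file contains and should be checked against the real file; `set_logger_level` is a hypothetical helper. Note that writing a file back through configparser drops comments, so editing the `level` line by hand is usually simpler.

```python
import configparser
import io

# Assumed shape of the logger section; verify against the real ini file.
SAMPLE_INI = """\
[logger_whoosh_indexer]
level = WARN
handlers =
qualname = whoosh_indexer
propagate = 1
"""


def set_logger_level(ini_text, section, level):
    """Return `ini_text` with the `level` option of `section` set to `level`."""
    # interpolation=None keeps '%'-style values in the ini file untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    parser.set(section, "level", level)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


updated = set_logger_level(SAMPLE_INI, "logger_whoosh_indexer", "DEBUG")
print(updated)
```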

Comment by vyom, on 2020-03-13 01:20

When I try, I get the same old error.

$  sudo -u www-data venv3/bin/kallithea-cli index-create -c new.ini --index-only Flask
Traceback (most recent call last):
  File "venv3/bin/kallithea-cli", line 11, in <module>
    load_entry_point('Kallithea', 'console_scripts', 'kallithea-cli')()
  File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/kallithea/projects/hg/kallithea/kallithea/bin/kallithea_cli_base.py", line 81, in runtime_wrapper
    return annotated(*args, **kwargs)
  File "/home/kallithea/projects/hg/kallithea/kallithea/bin/kallithea_cli_index.py", line 59, in index_create
    .run(full_index=full_index)
  File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 454, in run
    self.update_indexes()
  File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 447, in update_indexes
    self.update_changeset_index()
  File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 315, in update_changeset_index
    repo_name, repo, start_id)
  File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 244, in index_changesets
    added=' '.join(node.path for node in cs.added).lower(),
  File "/home/kallithea/projects/hg/kallithea/kallithea/lib/indexers/daemon.py", line 244, in <genexpr>
    added=' '.join(node.path for node in cs.added).lower(),
  File "/home/kallithea/projects/hg/kallithea/kallithea/lib/vcs/nodes.py", line 58, in __iter__
    yield self.cs.get_node(p)
  File "/home/kallithea/projects/hg/kallithea/kallithea/lib/vcs/backends/git/changeset.py", line 441, in get_node
    cf = ConfigFile.from_file(BytesIO(self.repository._repo.get_object(tree['.gitmodules'][1]).data))
  File "/home/kallithea/projects/kallithea/venv3/lib/python3.6/site-packages/dulwich/objects.py", line 982, in __getitem__
    return self._entries[name]
KeyError: '.gitmodules'

Comment by Mads Kiilerich, on 2020-03-13 01:27

What output do you get from pip freeze ?

Comment by vyom, on 2020-03-13 01:52

This is the output.

$ pip freeze
alembic==1.4.1
amqp==2.5.2
anyjson==0.3.3
Babel==2.8.0
backlash==0.2.0
bcrypt==3.1.7
Beaker==1.11.0
billiard==3.6.3.0
bleach==3.1.1
celery==4.4.1
certifi==2019.11.28
cffi==1.14.0
click==7.1.1
crank==0.8.1
decorator==4.4.2
docutils==0.16
dulwich==0.19.15
FormEncode==1.3.1
gearbox==0.2.0
gunicorn==20.0.4
hupper==1.10.2
importlib-metadata==1.5.0
ipaddr==2.2.0
-e hg+https://kallithea-scm.org/repos/kallithea@abb83e4edfd90eeaeff9a347e95dd580831c8233#egg=Kallithea
kombu==4.6.8
Mako==1.1.2
Markdown==3.1.1
MarkupSafe==1.1.1
mercurial==5.3.1
paginate==0.5.6
paginate-sqlalchemy==0.3.0
Paste==3.3.0
PasteDeploy==2.1.0
pkg-resources==0.0.0
pycparser==2.20
Pygments==2.5.2
python-dateutil==2.8.1
python-editor==1.0.4
pytz==2019.3
repoze.lru==0.7
Routes==2.4.1
six==1.14.0
SQLAlchemy==1.3.14
Tempita==0.5.2
tgext.routes==0.2.1
TurboGears2==2.4.3
urllib3==1.25.8
URLObject==2.4.3
vine==1.3.0
waitress==1.4.3
webencodings==0.5.1
WebHelpers2==2.0
WebOb==1.8.6
Whoosh==2.7.4
zipp==3.1.0

Comment by Mads Kiilerich, on 2020-03-13 13:31

I also cannot reproduce this with these exact package versions and Python 3.6.10 on Fedora.

Please try to reproduce on another system and provide exact instructions of how to reproduce.

Comment by Thomas De Schampheleire, on 2020-03-13 13:52

And please also include in the instructions exactly which upstream URL you use for this Flask repository.