Kallithea issues archive

Issue #352: SSH does not ingore trailing slash(es) in repo names

Reported by: Ross Thomas
State: resolved
Created on: 2020-01-23 20:39
Updated on: 2020-02-02 20:34

Description

When a repo is presented to the Kallithea SSH service with one or more trailing slashes (/) it fails to recognize the repo.

This was highlighted by a bug in the mercurial ‘schemes’ extension where a repo ref that only has a single path component will generate a repo name with a trailing slash.

The 'schemes' extension is not needed to reproduce, just add a slash to the end of a working SSH repo ref. For example,

$ hg in ssh://…/working-repo

works, but this doesn’t:

$ hg in ssh://…/working-repo/

Other SSH services I’ve tried (and obviously https-based ones) ignore the trailing slash(es).

Kallithea version: 0.5.1 (also in 0.5.0)

Attachments

Comments

Comment by Mads Kiilerich, on 2020-01-23 23:43

Too bad this missed 5.1 …

Also, I’m a bit concerned if “flexibility” in this place somehow could be security concern. Thus, we are a bit paranoid. I doubt think it would be a real problem, I just don’t have the evidence that it wouldn’t.

How much normalization should we apply? Just rstrip ‘/'? Or strip all superfluous ‘/’ in path? Or also normalize ‘/bogus/../’ and ‘/./’?

Comment by Ross Thomas, on 2020-01-24 04:12

Too bad this missed 5.1 …

Yeah. Unfortunately until this morning it was on my migration TODO list as a ‘schemes extension bug'. That is until I actually got around to investigating it properly.

Also, I’m a bit concerned if “flexibility” in this place somehow could be security concern. Thus, we are a bit paranoid. I doubt think it would be a real problem, I just don’t have the evidence that it wouldn’t.

How much normalization should we apply? Just rstrip ‘/'? Or strip all superfluous ‘/’ in path? Or also normalize ‘/bogus/../’ and ‘/./’?

From what I have been able to determine from the hg sources they treat the (resolved) repo name as an OS path and as such the multiple/trailing ‘/' issue and the ‘/./’ and '/../’ issues essentially solve themselves. However, KT (and other DB-driven services) use it as a key or key-path (i.e. groups + repo) and as such it may well need to be normalized prior to existence and access checks.

The strange thing is, these all work:
$ hg in https://svr/grp1/grp2/repo
$ hg in https://svr/grp1/grp2/repo//
$ hg in https://svr/grp1/grp2/repo/
$ hg in https://svr/grp1/grp2/./repo
$ hg in https://svr/grp1/grp2/../grp2/repo
$ hg in https://svr/grp1////grp2/repo

So the repo resolution mechanism for SSH does seem a LOT more restrictive.

Comment by Mads Kiilerich, on 2020-01-25 16:18

Yes. They are different. The https “flexibility” could be considered wrong - then it would be wrong to duplicate it for SSH.

It is important that people get exactly the repo they ask for, so insisting on specifying it correctly can help catch problems. Flexibility and giving the wrong repo would be bad.

Anyway, I guess something like this will work for you:

--- kallithea/lib/vcs/backends/ssh.py
+++ kallithea/lib/vcs/backends/ssh.py
@@ -19,6 +19,7 @@ vcs.backends.ssh
 SSH backend for all available SCMs
 """

+import posixpath
 import datetime
 import logging
 import sys
@@ -60,6 +61,7 @@ class BaseSshHandler(object):
         """Verify basic sanity of the repository, and that the user is
         valid and has access - then serve the native VCS protocol for
         repository access."""
+        self.repo_name = posixpath.normpath(self.repo_name).strip('/')
         dbuser = User.get(user_id)
         if dbuser is None:
             self.exit('User %r not found' % user_id)

It will strip all kinds of extra characters:

>>> posixpath.normpath('//foo///bar/x/../baz/.//').strip('/')
'foo/bar/baz'

Or should we only strip some kind of extra characters? Just the stripping, without normpath?

Superfluous ‘..’ is a common way to escape a root directory and peek elsewhere. Supporting it could be misleading, one way or other. But the ssh implementation is safe against that and only accept repos where the exact name already is listed in the database.

@Thomas De Schampheleire what’s your thoughts about this?

Comment by Thomas De Schampheleire, on 2020-01-25 19:57

Being relaxed about one or more trailing slashes sounds reasonable to me, and is the originally reported problem. I think we can fix this with the strip('/').

But allowing and resolving other components like ./, ../, or even multiple slashes in the middle of a path, seems unnecessary to me. And as @Mads Kiilerich already mentioned, supporting ‘../’ at some level is a risk if it is not done correctly, possibly bypassing authorization rules.

Comment by Mads Kiilerich, on 2020-01-25 23:03

Comment by Ross Thomas, on 2020-01-28 01:07

Yes, stripping trailing slashes would symptomatically fix the reported issue. Thanks.

There is, however, another issue highlighted here that may need a new ticket: HTTPS and SSH handle repo names differently. Changing the access method for a repo shouldn’t fail if it works for the other method. As it is now, changing from:

ssh://svr/grp1/repo → https://svr/grp1/repo

will work because SSH is stricter. Whereas changing from:

https://svr/grp1//repo/ → ssh://svr/grp1//repo/

will fail even though the original one didn’t. Yes, it would make you fix it to be more ‘compliant’ but it is an inconsistency that may surprise some.

I guess it also comes down to whether a stricter https mode is a breaking change or not.

Just a thought.

Comment by Mads Kiilerich, on 2020-01-28 01:15

Yes, using “normpath” would solve the new case you describe.

But my main perspective on these cases is that even though we for now might preserve some “undefined behaviour” tech debt for http, there is no reason to propagate it to ssh. Perhaps we instead should drop the backwards compatible undefined behaviour for https. Or perhaps we should avoid breaking anything if we don’t have to, even if that means that ssh and https have different degree of support for undefined behaviour. Which is what we do now.

Comment by Ross Thomas, on 2020-01-28 01:42

Yep. That gets back to my comment about 'a breaking change'.

At least I am now able to address the https/ssh dichotomy issue locally as/when my folks trip over it. 🙂

Thanks again.

Comment by Thomas De Schampheleire, on 2020-01-31 20:35

I agree with @Mads Kiilerich regarding not propagating previous ‘undefined behavior’ to SSH. Especially because there is no technical reason why current users of this behavior with HTTP could not normalize the path at their end, i.e. we are not imposing unnecessary complexity on users by keeping things as they are now.

Comment by Mads Kiilerich, on 2020-02-01 04:32

@Thomas De Schampheleire so exactly what do you propose? There is also no reason to be unnecessarily strict. I’m really not sure were to draw the line.

Comment by Thomas De Schampheleire, on 2020-02-01 16:02

I was proposing to leave the curent state, ie allow/strip trailing slashes.

We could use normpath too, but before we do we must check that there are no . or most importantly .. components in the path.

I think we have such a check somewhere already…

If you prefer this latter approach it’s also fine for me.

Comment by Thomas De Schampheleire, on 2020-02-02 20:34

The stripping of slashes was pushed as 7163feda7140. With this we consider the issue as resolved. Thanks for reporting it!