fix: avoid unescaping slashes when proxying URLs #1134

refi64 · 2023-09-07T21:21:39Z

Doing all the path operations on URL.Path means that they're being performed on an unescaped URL. Unfortunately, this URL is not guaranteed to be the directly unescaped version of URL.EscapedPath(); in particular, slashes will be unescaped in URL.Path, e.g. given:

/abc%2Fdef

URL.Path will be:

/abc/def

In these cases, URL.RawPath is also set with the escaped version of the path. However, once URL.Path is modified, URL.RawPath is ignored, and URL.EscapedPath() returns the path-escaped version of URL.Path...which, in this case, is now /abc/def, not the correct /abc%2Fdef.

As a result, when performing any path modifications during proxying (i.e. proxying to an upstream URL with a path component, or applying StripPath), this results in any escaped slashes in the proxied URL being unescaped.

In order to work around this, rewriting operations need to take place on the escaped path, not the unescaped one, and then an intermediate URL can be used to determine the Path and RawPath values to set on the forward URL.

This incurs a small breaking change in that StripPath is now applied to the escaped URL, not the unescaped URL. These semantics arguably make more sense, since StripPath otherwise also cannot distinguish between unencoded slashes within a path segment vs those that are separating path segments, but it's still a breaking change nonetheless.

BREAKING CHANGES: This patch changes the behavior of strip_path to be applied to the *escaped* URL as returned by Go's `URL.EscapePath()`, which means that slashes inside path segments, carets, unicode characters, and others that are need to be percent-encoded in strip_path.

Checklist

I have read the contributing guidelines.
I have referenced an issue containing the design document if my change
introduces a new feature.
I am following the
contributing code guidelines.
I have read the security policy.
I confirm that this pull request does not address a security
vulnerability. If this pull request addresses a security vulnerability, I
confirm that I got the approval (please contact
[email protected]) from the maintainers to push
the changes.
- (the listed email was contacted in advance and confirmed that this change can be opened as a public PR)
I have added tests that prove my fix is effective or that my feature
works.
I have added or changed the documentation.

Further Comments

I debating trying to preserve the current strip_path semantics, but in addition to being unable to distinguish / and %2F, it's also just incredibly ugly: I need to put the value of the attribute in a stub URL and parse that to get the escaped path, which also introduces a large amount of weird edge cases that come from quirks in URL parsers.

CLAassistant · 2023-09-07T21:21:44Z

All committers have signed the CLA.

codecov · 2023-09-07T21:29:17Z

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 77.74%. Comparing base (8fc9b7a) to head (b16e209).
Report is 2 commits behind head on master.

Files	Patch %	Lines
proxy/proxy.go	80.00%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1134      +/-   ##
==========================================
- Coverage   77.90%   77.74%   -0.17%     
==========================================
  Files          80       79       -1     
  Lines        3929     4040     +111     
==========================================
+ Hits         3061     3141      +80     
- Misses        595      618      +23     
- Partials      273      281       +8

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Doing all the path operations on URL.Path means that they're being performed on an unescaped URL. Unfortunately, this URL is *not* guaranteed to be the directly unescaped version of URL.EscapedPath(); in particular, slashes will be unescaped in URL.Path, e.g. given: /abc%2Fdef URL.Path will be: /abc/def In these cases, URL.RawPath is also set with the escaped version of the path. However, once URL.Path is modified, URL.RawPath is ignored, and URL.EscapedPath() returns the path-escaped version of URL.Path...which, in this case, is now /abc/def, not the correct /abc%2Fdef. As a result, when performing any path modifications during proxying (i.e. proxying to an upstream URL with a path component, or applying StripPath), this results in any *escaped* slashes in the proxied URL being unescaped. In order to work around this, rewriting operations need to take place on the *escaped* path, not the unescaped one, and then an intermediate URL can be used to determine the Path and RawPath values to set on the forward URL. This incurs a small breaking change in that StripPath is now applied to the *escaped* URL, not the unescaped URL. These semantics arguably make more sense, since StripPath otherwise also cannot distinguish between unencoded slashes within a path segment vs those that are separating path segments, but it's still a breaking change nonetheless. Signed-off-by: Ryan Gonzalez <[email protected]>

em- · 2024-04-22T22:43:12Z

Any chance to get this reviewed? Thank you!

refi64 requested a review from aeneasr as a code owner September 7, 2023 21:21

refi64 force-pushed the slash-unescape branch from 276c9bc to 2e4ed10 Compare September 8, 2023 14:28

refi64 force-pushed the slash-unescape branch from 2e4ed10 to b16e209 Compare March 12, 2024 20:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: avoid unescaping slashes when proxying URLs #1134

fix: avoid unescaping slashes when proxying URLs #1134

refi64 commented Sep 7, 2023

CLAassistant commented Sep 7, 2023 •

edited

Loading

codecov bot commented Sep 7, 2023 •

edited

Loading

em- commented Apr 22, 2024

fix: avoid unescaping slashes when proxying URLs #1134

Are you sure you want to change the base?

fix: avoid unescaping slashes when proxying URLs #1134

Conversation

refi64 commented Sep 7, 2023

Checklist

Further Comments

CLAassistant commented Sep 7, 2023 • edited Loading

codecov bot commented Sep 7, 2023 • edited Loading

Codecov Report

em- commented Apr 22, 2024

CLAassistant commented Sep 7, 2023 •

edited

Loading

codecov bot commented Sep 7, 2023 •

edited

Loading