Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some aiohttp web server redirects fail #5319

Open
dmoklaf opened this issue Dec 7, 2020 · 9 comments
Open

Some aiohttp web server redirects fail #5319

dmoklaf opened this issue Dec 7, 2020 · 9 comments

Comments

@dmoklaf
Copy link

dmoklaf commented Dec 7, 2020

🐞 Describe the bug
Aiohttp server redirects ( triggered through "raise HTTPFound(...)") fail sometimes, depending on the final webserver targeted by the redirect, when the URL includes a question mark "?" in a query string argument (e.g., when this query argument is itself a URL), even when this question mark is URL-encoded.

💡 To Reproduce
On an aiohttp server send back a redirect to the client with "raise HTTPFound(...)", to a URL (on another final web server) that includes a query string that includes another URL with an encoded question mark. Some final web servers will reject that URL due to the second question mark, and the network trace will show that the initial aiohttp webserver had transformed the redirection URL, decoding the question mark in the query string.

Further analysis revealed this bug is due to 2 things:
(1) The aiowebserver decodes and reencodes the URL through the yarl library
(2) The yarl library has a bug with some query string characters that it does not escape (which is compliant with the web standards but not compliant with many webservers expectations):
aio-libs/yarl#245

Although the yarl maintainers may fix (2), this bug has been opened for 2 years and, more importantly, the usefulness of (1) is questioned: why not letting the user be responsible for the URL it submits? Reencoding it exposes him to expectation mismatchs (I never expected the web server to silently transforms my URL) and such potential bugs.

💡 Expected behavior
The redirect should work flawlessly.

📋 Logs/tracebacks

📋 Your version of the Python
3.7.8

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.7.3
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: [email protected]
License: Apache 2
Location: /usr/local/Caskroom/miniconda/base/envs/myenv/lib/python3.7/site-packages
Requires: async-timeout, chardet, yarl, multidict, attrs, typing-extensions
Required-by: slackclient, gcsfs, aiosocks, aiohttp-jinja2
$ python -m pip show multidict
Name: multidict
Version: 5.0.0
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /usr/local/Caskroom/miniconda/base/envs/myenv/lib/python3.7/site-packages
Requires: 
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.6.3
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: [email protected]
License: Apache 2
Location: /usr/local/Caskroom/miniconda/base/envs/myenv/lib/python3.7/site-packages
Requires: multidict, idna, typing-extensions
Required-by: aiohttp

📋 Additional context

@dmoklaf dmoklaf added the bug label Dec 7, 2020
@asvetlov
Copy link
Member

asvetlov commented Dec 8, 2020

The solution is: aiohttp should not requote HTTP redirect URL. Other quoting logic should be untouched.

The issue is easy and wais for a champion.

@dmoklaf
Copy link
Author

dmoklaf commented Dec 8, 2020

I submitted a PR that works in my program but ... raised interesting failures in the CI pipeline

E.g. exotic test cases where the redirection URL has linefeeds characters ("\r\n"), my patch transmits these characters unchanged.

The current code (before this PR) silently fixes these URLs with yarl by encoding these as any character.

For my patch to be valid, we should enrich it to check that the URL follows the HTTP standard. This becomes harder as we need a standard-compliant checker, and it may potentially break redirects for existing aiohttp users who were relying heavily on yarl transparent url corrections and didn't realize their redirection URLs were not standard-compliant...

In such case it may be simpler to just fix yarl (I am changing my mind on this as I realize we are pulling other potentially larger issues by trying to workaround yarl).

@dralley
Copy link

dralley commented Jul 8, 2022

@asvetlov I have a concrete and trivial case of this

In [6]: import aiohttp
   ...: import asyncio
   ...: 
   ...: async def main():
   ...: 
   ...:     async with aiohttp.ClientSession() as session:
   ...:         async with session.get('https://releases.jfrog.io/artifactory/artifactory-pro-rpms/repodata/8c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz') as response:
   ...: 
   ...:             print("Status:", response.status)
   ...:             print("Content-type:", response.headers['content-type'])
   ...: 
   ...:             html = await response.text()
   ...:             print("Body:", html)
   ...: 
   ...: loop = asyncio.get_event_loop()
   ...: loop.run_until_complete(main())
Status: 403
Content-type: text/xml
Body: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access denied</Message></Error>

aiohttp processes the redirect location as the following - note the "X-Artifactory-artifactPath" parameter

https://releases-cdn.jfrog.io/filestore/8c/8c87521e43dbed223c90e23922c0c4bbe7159112?response-content-type=application/x-gzip&response-content-disposition=attachment%3Bfilename%3D%228c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz%22&x-jf-traceId=da959416009abe79&X-Artifactory-repositoryKey=artifactory-pro-rpms&X-Artifactory-projectKey=default&X-Artifactory-artifactPath=repodata/8c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz&X-Artifactory-username=anonymous&Expires=1657248968&Signature=MA21YnR3A8BncqlPFTcELv15ndn2B6yNQpaag2JuuPiWgiwjYnGHlbQkDbXp6Gk3ygrKOPvUETeb-gSzv6g64kFVnNzYzDxMRdV71nIP2YRWkufG-2R9AGSA9OtAevmAWYG-wmGazZt0L6VqX-4XDfpQPoVG5RITCMzQnQ3W~XbX8lB2AhliU0GI8QNOFzVKB8bAiFIFjh0HBV-P~~DlRaOn0ouKkdS-6paLjDq7NPEmlXrt2A~oOd5NvwViRpUNSsbov6WugJHPMMxPb5wj3B7hISYfwN6~6TWgbZ8Y6o3GXV-3zhxN4idpkFuvF90d2Aoep~gmeGKUysk3EohzVQ__&Key-Pair-Id=APKAJ6NHFWMVU3M6DPBA

Whereas wget, httpie and others process the redirect as

https://releases-cdn.jfrog.io/filestore/8c/8c87521e43dbed223c90e23922c0c4bbe7159112?response-content-type=application/x-gzip&response-content-disposition=attachment%3Bfilename%3D%228c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz%22&x-jf-traceId=e4edec532ab037c3&X-Artifactory-repositoryKey=artifactory-pro-rpms&X-Artifactory-projectKey=default&X-Artifactory-artifactPath=repodata%2F8c87521e43dbed223c90e23922c0c4bbe7159112-primary.xml.gz&X-Artifactory-username=anonymous&Expires=1657249015&Signature=dwsfXduAz0Avg62JQfIgXuQGP2cLv6AExW4u9gR3B0wzOu5R599u~satGkD6yLIzB4Y1uBCLhU6IO7b-ovr0wT9AJTxMaE-eiF2awcaj2eGzXpFibJVoUuzhsG3CW5koby8cytTThxRYewn1nACLna58og14o~EkXIgA3~c32y5ltR1rtJDyclQ~wpQNTpc4vX4uk2hr5z0lolmCGy0fgLxmuiFkzNmsHFAUbo1qqb8kJF~SAE-l4erLPAIlyIuDpAl6myyJ1cHmS-dolXiz8M6yNel-sJY5jaOH-iAlZoZm9PHdqmKbMqrzdgzLS2emZF5zER9g068lY~7fLqgqrw__&Key-Pair-Id=APKAJ6NHFWMVU3M6DPBA

and it works fine

@dralley
Copy link

dralley commented Jul 8, 2022

Perhaps yarl could have a "max_compatibility" flag which adopts the more aggressive encoding behavior?

@webknjaz
Copy link
Member

E.g. exotic test cases where the redirection URL has linefeeds characters ("\r\n"), my patch transmits these characters unchanged.

Not exotic really but request smuggling attack prevention.

@webknjaz
Copy link
Member

Perhaps yarl could have a "max_compatibility" flag which adopts the more aggressive encoding behavior?

Per Andrew's comment, maybe aiohttp could make use of the raw argument. But dunno, I haven't checked the code.

@webknjaz
Copy link
Member

@dralley Have you tried setting requote_redirect_url=False?

@dralley
Copy link

dralley commented Jul 13, 2022

@webknjaz This works, thank you.

It does beg the question though - why does aiohttp default to rewriting the redirect URLs it receives, rather than defaulting to not doing so? All things considered it does still seem like surprising behavior.

@webknjaz
Copy link
Member

I think it has to do with the fact that yarl.URL is used for internal representation of URLs in aiohttp and it so happens that multiple valid inputs may be equivalent. It's also a way to be sure that malicious inputs don't get forwarded, I suppose.

I did some Git archeology and found that this issue was first reported in #1474 against aiohttp v1.1.4. Then, with @fafhrd91's patch ed9bf2b this setting appeared in v2.1.0, followed by a later change by @asvetlov#3436 — that deprecated the corresponding attribute mutability. If anybody's able to recall the original motivation, it'd probably be @fafhrd91.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants