Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup URL percent-encoding behavior. #2990

Merged
merged 13 commits into from
Dec 15, 2023
Merged

Conversation

tomchristie
Copy link
Member

@tomchristie tomchristie commented Dec 7, 2023

Closes #2883

Cleanup test cases

  • The first two commits here cleanup three test cases into an exactly equivalent parameterised test case.
  • The next commit supplements the test cases with some extra cases (using with the current behaviour for the assertions).
  • Drop a test case that's now been superseeded.

Determine correct behaviour

For each case here I'm listing...

  • The URL under test.
  • The URL once it's been percent-escaped by chrome.
  • The URL we're currently percent-escaping into.

Test 1: Correct

# URL with unescaped chars in path.
https://example.com/!$&'()*+,;= abc ABC 123 :/[]@
https://example.com/!$&'()*+,;=%20abc%20ABC%20123%20:/[]@
https://example.com/!$&'()*+,;=%20abc%20ABC%20123%20:/[]@

Test 2: Correct

# URL with escaped chars in path.
https://example.com/!$&'()*+,;=%20abc%20ABC%20123%20:/[]@
https://example.com/!$&'()*+,;=%20abc%20ABC%20123%20:/[]@
https://example.com/!$&'()*+,;=%20abc%20ABC%20123%20:/[]@

Test 3: Incorrect. We're double-escaping here.

# URL with mix of unescaped and escaped chars in path.
https://example.com/ %97%98%99
https://example.com/%20%97%98%99
https://example.com/%20%2597%2598%2599

Test 4: Incorrect. We don't need to escape / in the query portion. We do need to percent escape '.

# URL with unescaped chars in query.
https://example.com/?!$&'()*+,;= abc ABC 123 :/[]@?
https://example.com/?!$&%27()*+,;=%20abc%20ABC%20123%20:/[]@?
https://example.com/?!$&'()*+,;=%20abc%20ABC%20123%20:%2F[]@?

Test 5: Correct.

# URL with escaped chars in query.
https://example.com/?!$&%27()*+,;=%20abc%20ABC%20123%20:%2F[]@?
https://example.com/?!$&%27()*+,;=%20abc%20ABC%20123%20:%2F[]@?
https://example.com/?!$&%27()*+,;=%20abc%20ABC%20123%20:%2F[]@?

Test 6: Correct.

# URL with mix of unescaped and escaped chars in query.
https://example.com/?%20%61%62%63
https://example.com/?%20%61%62%63
https://example.com/?%20%61%62%63

Test 7: Correct.

# URL encoding characters in fragment.
# (The fragment isn't sent as part of the HTTP request.)
https://example.com/#!$&'()*+,;= abc ABC 123 :/[]@?#
https://example.com/
https://example.com/

Resolve issue

  • Update test cases to reflect desired behaviour instead of existing behaviour.
  • Update behaviour so that double-escaping is not applied.
  • Update behaviour so that / is treated as a safe character in the query portion of the URL.

@tomchristie
Copy link
Member Author

At this point I've change the expectation in "test 3" to reflect the desired behaviour, not the existing behaviour.

@tomchristie
Copy link
Member Author

This is once I've fixed the double escaping.

@tomchristie
Copy link
Member Author

And now updated to also fix test 4.

(Well, almost except we're not encoding ' to %27 - spec seems to imply we're getting this correct here vs. chrome)

@tomchristie tomchristie marked this pull request as ready for review December 7, 2023 18:30
@tomchristie tomchristie requested a review from a team December 8, 2023 11:49
@Kludex
Copy link
Member

Kludex commented Dec 8, 2023

Copy link
Contributor

@T-256 T-256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need minor version release for this merge, because it is behavior change.

httpx/_urlparse.py Show resolved Hide resolved
httpx/_urlparse.py Show resolved Hide resolved
httpx/_urlparse.py Show resolved Hide resolved
@elupus
Copy link

elupus commented Dec 10, 2023

This unnecessarily encode slashes if you pass them as query params. Which will not solve cases with servers needing stuff like: https://host?mimetype=application/data.

The library will not mutate and break an existing string like that, which is good. But you can not construct it using params={"mimetime": "application/data"}

These server are in theory broken since they should expect possible encoding inside the arguments, but feels like we should avoid overly encoding that. There is quite a lot that in theory is safe here: https://github.com/encode/httpx/pull/2980/files#diff-78d8d93b5dd4c77d99c3e2b46b7286ba71a8fd60e92d8bd4eee5fb200b4f87bfR51

@elupus
Copy link

elupus commented Dec 10, 2023

Ps. Since it does solve most cases where the URL is in a pre-encoded form, its absolutely a good change. So i think something like this could go in to start with.

I think some optimization should be done with the use of sets for the reserved characters like i did in my change, since now it's doing linear scans of strings many times over during encoding, but its much less important.

@tomchristie
Copy link
Member Author

tomchristie commented Dec 11, 2023

I think we need minor version release for this merge, because it is behavior change.

We can commit to that yep.

I think some optimization should be done [...]

Yep, I have some follow-ups I'd like to make wrt...

  • Optimization.
  • Further test cleanup.
  • Ensuring we've got the correct handling for params={...}

I'll treat those as independent follow-ups.

if you pass them as query params.

I'll discuss this separately.

@tomchristie
Copy link
Member Author

@T-256 Thanks for the review. ☺️
If you'd like to be added as a maintainer ping me a private message on gitter.

@tomchristie
Copy link
Member Author

Okay, I think we're good here now.

Just pending a review from @encode/maintainers. 🙏🏼

httpx/_transports/asgi.py Outdated Show resolved Hide resolved
@T-256
Copy link
Contributor

T-256 commented Dec 11, 2023

@T-256 Thanks for the review. ☺️ If you'd like to be added as a maintainer ping me a private message on gitter.

Thanks, I sent PM in gitter.

httpx/_urlparse.py Outdated Show resolved Hide resolved
httpx/_urlparse.py Outdated Show resolved Hide resolved
@@ -449,6 +446,39 @@ def quote(string: str, safe: str = "/") -> str:
)


def quote(string: str, safe: str = "/") -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we should also add unit tests for these functions, rather than simply testing them with httpx.URL.
This would be a more robust approach, in my opinion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO current approach is fine.

This is clearly a code-smell, because our test cases ought to be tests against our public API, rather than testing implementation details. Perhaps there's some cases where it's a necessary hack, but... perhaps not?

from #2492 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree testing the public API should be sufficient unless something private is particularly expensive to test via the public API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a matter of preference, but if we encounter regression in our httpx.URL tests, we will go through these functions and find the method that isn't working properly, which is why unit tests are useful.

Copy link
Contributor

@T-256 T-256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM!

@tomchristie
Copy link
Member Author

thanks @karpetrosyan 🙏🏼

@tomchristie tomchristie merged commit a11fc38 into master Dec 15, 2023
5 checks passed
@tomchristie tomchristie deleted the cleanup-url-encoding branch December 15, 2023 11:35
@T-256 T-256 mentioned this pull request Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ULR parser percent encoding corrupting query params
7 participants