Cleanup URL percent-encoding behavior. #2990

tomchristie · 2023-12-07T16:59:01Z

Cleanup test cases

The first two commits here clean up three test cases into an exactly equivalent parameterised test case.
The next commit supplements the test cases with some extra cases (using the current behaviour for the assertions).
Drop a test case that's now been superseded.

Determine correct behaviour

For each case here I'm listing...

The URL under test.
The URL once it's been percent-escaped by Chrome.
The URL we're currently percent-escaping into.

Test 1: Correct

# URL with unescaped chars in path.
https://example.com/!$&'()*+,;= abc ABC 123 :/[]@
https://example.com/!$&'()*+,;=%20abc%20ABC%20123%20:/[]@
https://example.com/!$&'()*+,;=%20abc%20ABC%20123%20:/[]@

Test 2: Correct

# URL with escaped chars in path.
https://example.com/!$&'()*+,;=%20abc%20ABC%20123%20:/[]@
https://example.com/!$&'()*+,;=%20abc%20ABC%20123%20:/[]@
https://example.com/!$&'()*+,;=%20abc%20ABC%20123%20:/[]@

Test 3: Incorrect. We're double-escaping here.

# URL with mix of unescaped and escaped chars in path.
https://example.com/ %97%98%99
https://example.com/%20%97%98%99
https://example.com/%20%2597%2598%2599

Test 4: Incorrect. We don't need to escape / in the query portion. We do need to percent escape '.

# URL with unescaped chars in query.
https://example.com/?!$&'()*+,;= abc ABC 123 :/[]@?
https://example.com/?!$&%27()*+,;=%20abc%20ABC%20123%20:/[]@?
https://example.com/?!$&'()*+,;=%20abc%20ABC%20123%20:%2F[]@?

Test 5: Correct.

# URL with escaped chars in query.
https://example.com/?!$&%27()*+,;=%20abc%20ABC%20123%20:%2F[]@?
https://example.com/?!$&%27()*+,;=%20abc%20ABC%20123%20:%2F[]@?
https://example.com/?!$&%27()*+,;=%20abc%20ABC%20123%20:%2F[]@?

Test 6: Correct.

# URL with mix of unescaped and escaped chars in query.
https://example.com/?%20%61%62%63
https://example.com/?%20%61%62%63
https://example.com/?%20%61%62%63

Test 7: Correct.

# URL encoding characters in fragment.
# (The fragment isn't sent as part of the HTTP request.)
https://example.com/#!$&'()*+,;= abc ABC 123 :/[]@?#
https://example.com/
https://example.com/
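The double-escaping in Test 3 is easy to reproduce outside httpx. The following is a minimal sketch using the standard library's `urllib.parse.quote` as a stand-in; it is not the httpx implementation, only an illustration of how a quoting function that treats every `%` as a literal character produces the third line of Test 3.

```python
from urllib.parse import quote

# Path from Test 3: one raw space followed by three existing escapes.
path = "/ %97%98%99"

# A naive quote escapes the "%" of each existing escape a second time.
naive = quote(path, safe="/")
print(naive)  # /%20%2597%2598%2599
```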

Resolve issue

Update test cases to reflect desired behaviour instead of existing behaviour.
Update behaviour so that double-escaping is not applied.
Update behaviour so that / is treated as a safe character in the query portion of the URL.
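One way to get the behaviour described in these steps is to copy valid `%XX` sequences through untouched and only quote the text between them. The sketch below assumes that approach; `quote_preserving_escapes`, `PATH_SAFE` and `QUERY_SAFE` are hypothetical names chosen for illustration, the safe-character sets are an assumption based on the test cases above, and the actual change in httpx/_urlparse.py may be structured differently.

```python
import re
from urllib.parse import quote

# Matches an already-valid percent escape such as "%20" or "%2F".
PERCENT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")

# Assumed safe sets for illustration; note "/" is safe in the query too.
PATH_SAFE = "!$&'()*+,;=:/[]@"
QUERY_SAFE = "!$&'()*+,;=:/?[]@"


def quote_preserving_escapes(string: str, safe: str = "/") -> str:
    """Percent-encode `string`, leaving existing %XX sequences untouched."""
    parts = []
    position = 0
    for match in PERCENT_ENCODED.finditer(string):
        # Quote the raw text before this escape, then copy the escape as-is.
        parts.append(quote(string[position:match.start()], safe=safe))
        parts.append(match.group(0))
        position = match.end()
    # Quote whatever follows the last escape sequence.
    parts.append(quote(string[position:], safe=safe))
    return "".join(parts)


fixed_path = quote_preserving_escapes("/ %97%98%99", safe=PATH_SAFE)
fixed_query = quote_preserving_escapes(
    "!$&'()*+,;= abc ABC 123 :/[]@?", safe=QUERY_SAFE
)
print(fixed_path)   # /%20%97%98%99
print(fixed_query)  # !$&'()*+,;=%20abc%20ABC%20123%20:/[]@?
```

A `%` that is not followed by two hex digits is still escaped to `%25`, so only well-formed escapes are preserved.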

tomchristie · 2023-12-07T18:13:02Z

At this point I've changed the expectation in "test 3" to reflect the desired behaviour, not the existing behaviour.

tomchristie · 2023-12-07T18:17:57Z

This is once I've fixed the double escaping.

tomchristie · 2023-12-07T18:28:29Z

And now updated to also fix test 4.

(Well, almost, except we're not encoding ' to %27. The spec seems to imply we're getting this correct here vs. Chrome.)

httpx/_urlparse.py

Kludex · 2023-12-08T11:54:13Z

Fixes "httpx 0.25.0 changes decoding behaviour of .query_params when using TestClient" (encode/starlette#2282).

T-256

I think we need a minor version release for this merge, because it is a behavior change.

httpx/_urlparse.py

elupus · 2023-12-10T15:00:27Z

This unnecessarily encodes slashes if you pass them as query params, so it will not solve cases where servers need something like: https://host?mimetype=application/data.

The library will not mutate and break an existing string like that, which is good. But you cannot construct it using params={"mimetype": "application/data"}.

These servers are in theory broken, since they should expect possible encoding inside the arguments, but it feels like we should avoid overly encoding that. There is quite a lot that is in theory safe here: https://github.com/encode/httpx/pull/2980/files#diff-78d8d93b5dd4c77d99c3e2b46b7286ba71a8fd60e92d8bd4eee5fb200b4f87bfR51
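The difference between the two outcomes can be illustrated with the standard library. This is only a sketch using `urllib.parse.urlencode` as a stand-in for building a query from a params mapping; it makes no claim about how httpx handles `params={...}` internally.

```python
from urllib.parse import urlencode

params = {"mimetype": "application/data"}

# Default behaviour: "/" inside a value is percent-encoded.
strict = urlencode(params)
print(strict)  # mimetype=application%2Fdata

# Marking "/" as safe keeps it readable in the query string.
relaxed = urlencode(params, safe="/")
print(relaxed)  # mimetype=application/data
```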

elupus · 2023-12-10T15:07:15Z

P.S. Since it does solve most cases where the URL is in a pre-encoded form, it's absolutely a good change. So I think something like this could go in to start with.

I think some optimization should be done with the use of sets for the reserved characters, like I did in my change, since it's now doing linear scans of strings many times over during encoding, but it's much less important.
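To make the set-based idea concrete, here is a minimal sketch. `UNRESERVED` and `percent_encode` are hypothetical names for illustration, not the code from either pull request, and the function deliberately ignores existing escapes so that it shows only the lookup strategy.

```python
# Unreserved characters, stored as a frozenset so that each membership
# test is a hash lookup instead of a linear scan over a string.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(string: str, safe: str = "/") -> str:
    # Build the combined lookup set once per call, not once per character.
    allowed = UNRESERVED | frozenset(safe)
    return "".join(
        char
        if char in allowed
        else "".join(f"%{byte:02X}" for byte in char.encode("utf-8"))
        for char in string
    )


print(percent_encode("a b/c"))  # a%20b/c
```

For safe strings this short the gain is likely to be small, which fits the remark that this is much less important than the behaviour fix.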

tomchristie · 2023-12-11T09:39:46Z

I think we need a minor version release for this merge, because it is a behavior change.

We can commit to that yep.

I think some optimization should be done [...]

Yep, I have some follow-ups I'd like to make wrt...

Optimization.
Further test cleanup.
Ensuring we've got the correct handling for params={...}

I'll treat those as independent follow-ups.

if you pass them as query params.

I'll discuss this separately.

tomchristie · 2023-12-11T09:46:46Z

@T-256 Thanks for the review. ☺️
If you'd like to be added as a maintainer ping me a private message on gitter.

tomchristie · 2023-12-11T09:48:08Z

Okay, I think we're good here now.

Just pending a review from @encode/maintainers. 🙏🏼

httpx/_transports/asgi.py

T-256 · 2023-12-11T19:28:54Z

@T-256 Thanks for the review. ☺️ If you'd like to be added as a maintainer ping me a private message on gitter.

Thanks, I sent a PM on Gitter.

httpx/_urlparse.py

karpetrosyan · 2023-12-13T10:55:47Z

httpx/_urlparse.py

@@ -449,6 +446,39 @@ def quote(string: str, safe: str = "/") -> str:
    )


+def quote(string: str, safe: str = "/") -> str:


I believe we should also add unit tests for these functions, rather than simply testing them with httpx.URL.
This would be a more robust approach, in my opinion.

T-256 · 2023-12-13

IMO the current approach is fine.

This is clearly a code-smell, because our test cases ought to be tests against our public API, rather than testing implementation details. Perhaps there's some cases where it's a necessary hack, but... perhaps not?

from #2492 (comment)

zanieb · 2023-12-14

I agree testing the public API should be sufficient unless something private is particularly expensive to test via the public API.

karpetrosyan · 2023-12-14

It's a matter of preference, but if we encounter a regression in our httpx.URL tests, we will have to go through these functions to find the one that isn't working properly, which is why unit tests are useful.

T-256

Thanks, LGTM!

tomchristie · 2023-12-15T11:35:13Z

thanks @karpetrosyan 🙏🏼

tomchristie added 3 commits December 7, 2023 11:46

Replace path_query_fragment encoding tests

9c669b7

Remove replaced test cases

6cc7555

Fix test case to use correct hex sequence for 'abc'

b7d4258

Fix 'quote' behaviour so we don't double-escape.

412a880

Add '/' to safe chars in query strings

a1b4afc

tomchristie marked this pull request as ready for review December 7, 2023 18:30

tomchristie mentioned this pull request Dec 8, 2023

Pre encoded path/params/fragments should be kept #2976

Closed

3 tasks

tomchristie commented Dec 8, 2023

httpx/_urlparse.py

tomchristie commented Dec 8, 2023

httpx/_urlparse.py

tomchristie requested a review from a team December 8, 2023 11:49

T-256 approved these changes Dec 10, 2023

httpx/_urlparse.py

httpx/_urlparse.py

httpx/_urlparse.py

T-256 mentioned this pull request Dec 10, 2023

Differenctiate between encoded form and element form for url #2980

Closed

3 tasks

Kludex mentioned this pull request Dec 10, 2023

httpx 0.25.0 changes decoding behaviour of .query_params when using TestClient encode/starlette#2282

Closed

tomchristie added 2 commits December 11, 2023 09:35

Update docstring

de9968a

Linting

7bdf227

tomchristie mentioned this pull request Dec 11, 2023

hotfix: urlparse keep the existing escape string in query #2929

Closed

3 tasks

tomchristie requested a review from karpetrosyan December 11, 2023 12:13

Merge branch 'master' into cleanup-url-encoding

76ec6eb

T-256 reviewed Dec 11, 2023

httpx/_transports/asgi.py (outdated)

jkseppan reviewed Dec 13, 2023

httpx/_urlparse.py (outdated)

Merge branch 'master' into cleanup-url-encoding

dd7b657

tomchristie commented Dec 13, 2023

httpx/_urlparse.py (outdated)

tomchristie added 2 commits December 13, 2023 07:36

Update outdated comment.

0d8cc4b

Revert unrelated change

e3532aa

karpetrosyan reviewed Dec 13, 2023

Merge branch 'master' into cleanup-url-encoding

84f6338

T-256 approved these changes Dec 13, 2023

Merge branch 'master' into cleanup-url-encoding

53d15f1

tomchristie requested review from zanieb and karpetrosyan December 15, 2023 10:47

karpetrosyan approved these changes Dec 15, 2023

tomchristie merged commit a11fc38 into master Dec 15, 2023
5 checks passed

tomchristie deleted the cleanup-url-encoding branch December 15, 2023 11:35

T-256 mentioned this pull request Dec 19, 2023

Version 0.26.0 #3009

Merged

zevisert mentioned this pull request Jan 4, 2024

httpx 0.26 supertokens/supertokens-python#468

Closed

thorinaboenke mentioned this pull request Jul 30, 2024

Urlencoding is broken in http-client ait-testbed/attackmate#44

Open
