Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue where query parameters for S3 are double encoded #704

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

kerryjj
Copy link

@kerryjj kerryjj commented Jul 21, 2020

When using ExAws.S3.list_objects_v2 or ExAws.S3.list_objects and a key that has a space or unicode chars in it, the query params passed to S3 are double encoded.

The following command fails because the URL signatures don't match. This is because the query prefix is encoded to replace " " with +. Then the sanitize method replacces + with %2B.

This PR changes the sanitize method so that it only sanitised the path and not the query params again and the output should be as follows. There are tests as part of the PR that show the new behaviour.

Copy link
Contributor

@benwilson512 benwilson512 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone have a good location to find example encoding tests from AWS itself? Or known good tests in another language? Every time there is an encoding issue I get a PR addressing that specific issue in some partial way and over time param encoding has gotten really messy in ExAws. I'm wondering if there is some more canonical, comprehensive thing we could be using.

@@ -106,6 +105,9 @@ defmodule ExAws.Request.Url do

def uri_encode(url), do: URI.encode(url, &valid_path_char?/1)

defp query_replace_spaces(nil), do: nil
defp query_replace_spaces(query), do: String.replace(query, "+", "%20")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's %2B in one place and %20 in another. This seems wrong?

Copy link
Author

@kerryjj kerryjj Jul 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that whole String.replace(new_path, "+", "%2B") isn't needed and has no effect. The new_path gets uri_encoded a little higher up.

In the S3 operation code path, the url that gets passed to sanize has already had the query params encoded, but the reset of the URL hasn't been. So if new_path in this case contains a +, then it is actually a + and not a space.

s3.ex:29 calls ExAws.Request.Url.build(config) which is what encodes the query and not the path.

It feels a little bit inconsistent to encode one earlier and the other later. Maybe it would be better to shift the encoding to one place in the code. And perhaps it's best to encode both the path and the query in Url.sanitize() because ExAws.Operation.{JSON,Query,RestQuery} also call ExAws.Request.Url.build and therefore have their query and not path encoded if they include any.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a quick change taking out that replace. Let me know if you think it'd be a good idea for me to try to move the uri_encoding out of ExAws.Request.Url.build(config) and to encode both the path and query in ExAws.Request.Url.sanitize instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query_replace_spaces appears to actually be replacing +'s rather than spaces .. and I think that could more simply use URI.encode_www_form/1 instead of URI.encode.

On a related note, the query string parse is redundant with the URI.parse/1 call which also separates out the query string.

It would be nice to see this fix get into the repo, but perhaps with a bit more cleanup?

@kerryjj
Copy link
Author

kerryjj commented Jul 22, 2020

Yeah good question about the tests. I have a much simpler implementation of the encoding now that encodes both in the build() method. It works for the tests currently in url_test.ex but I'm a little nervous about pushing it as it's not a comprehensive set of tests.

I'll see what I can find for more test coverage, but yeah if anyone can point out a good suite to copy that would help.

@kerryjj
Copy link
Author

kerryjj commented Aug 14, 2020

Sorry I haven't forgotten about trying to get better test coverage in here. I've just ended up with crunch time at work.

I'll hopefully get a chance to add some more over the weekend.

I have had a look through a few of the amazon provided libraries in python and ruby to see if there's an obvious set of tests to copy from there and haven't found anything comprehensive yet. I'll keep looking, but if anyone else finds a set of tests that are worth reviewing and perhaps copying and can point me in the right direction I'm happy to add them.

I am using this patch in production and it's fixed the original issue I had and the test cases in this PR at least cover that case. i.e. "When using ExAws.S3.list_objects_v2 or ExAws.S3.list_objects and a key that has a space or unicode chars in it, the query params passed to S3 are double encoded."

The encoding in the code here can definitely be simplified.

@jotaviobiondo
Copy link

Heyy, thanks for creating the lib! It's super helpful :)

there's any updates on this issue? I'm also facing this problem and I'm currently using this PR as a fix.

@bernardd
Copy link
Contributor

bernardd commented Jun 9, 2023

@jotaviobiondo No, no updates. If you'd like to have a stab at adding some broader encoding tests and tidying up the PR, I'd be more than happy to accept an updated version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants