Skip to content

Commit

Permalink
EmbedAPI: decode filepath before open them from S3 storage (#9860)
Browse files Browse the repository at this point in the history
* EmbedAPI: decode filepath before open them from S3 storage

We can use encoded URLs (e.g. with %20 for white spaces) for external requests
since they are valid URLs. However, when it's internal and we get it from the
storage, we have to convert them to regular characters (white spaces, in this
example).

We use `urllib.parse.unquote` for this before passing the filename to the S3
backend storage.

Closes #8301

* `unquote` only the filename provided by the user

* Remove the fix from v2 since it's deprecated anyways

The path is calculated from inside `get_storage_path`, so there is no simple way
to unquote only the filename.

Whitespaces on names is an edge case and APIv2 should not be used anyways. So, I
prefer to not fix the bug here than expose a security issue.
  • Loading branch information
humitos authored Jan 10, 2023
1 parent 8048602 commit ca4e3f8
Show file tree
Hide file tree
Showing 2 changed files with 23 additions and 0 deletions.
17 changes: 17 additions & 0 deletions readthedocs/embed/v3/tests/test_internal_pages.py
Original file line number Diff line number Diff line change
Expand Up @@ -78,3 +78,20 @@ def test_default_main_section(self, storage_exists, storage_open, app, client):
'content': content,
'external': False,
}

@pytest.mark.sphinx("html", srcdir=srcdir, freshenv=False)
@mock.patch("readthedocs.embed.v3.views.build_media_storage.open")
@mock.patch("readthedocs.embed.v3.views.build_media_storage.exists")
def test_s3_storage_decoded_filename(
self, storage_exists, storage_open, app, client
):
storage_exists.return_value = True
storage_open.side_effect = self._mock_open('<div id="section">content</div>')

params = {
"url": "https://project.readthedocs.io/en/latest/My%20Spaced%20File.html#section",
}
response = client.get(self.api_url, params)
assert response.status_code == 200

storage_open.assert_called_once_with("html/project/latest/My Spaced File.html")
6 changes: 6 additions & 0 deletions readthedocs/embed/v3/views.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
"""Views for the EmbedAPI v3 app."""

import re
import urllib.parse
from urllib.parse import urlparse

import requests
Expand Down Expand Up @@ -90,11 +91,16 @@ def _get_page_content_from_storage(self, project, version_slug, filename):
include_file=False,
version_type=version.type,
)

# Decode encoded URLs (e.g. convert %20 into a whitespace)
filename = urllib.parse.unquote(filename)

relative_filename = filename.lstrip("/")
file_path = build_media_storage.join(
storage_path,
relative_filename,
)

try:
with build_media_storage.open(file_path) as fd: # pylint: disable=invalid-name
return fd.read()
Expand Down

0 comments on commit ca4e3f8

Please sign in to comment.