Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

content-app will fully stream a file to the user before doing any checksum verification #5012

Open
dralley opened this issue Feb 1, 2024 · 2 comments · May be fixed by #6026
Open

content-app will fully stream a file to the user before doing any checksum verification #5012

dralley opened this issue Feb 1, 2024 · 2 comments · May be fixed by #6026
Assignees
Milestone

Comments

@dralley
Copy link
Contributor

dralley commented Feb 1, 2024

Version

Any

Describe the bug

In the event that a file pointed to by a remote artifact is changed but left in place (i.e. doesn't 404) and no longer represents the original file (different checksums), the content app will still stream the file in its entirety back to the client before pulp verifies the checksum, which it does when saving the file to artifact storage.

An error message will be printed when this fails, but at this point, the wrong file has already been delivered.

Correct behavior in this situation is actually a bit ambiguous because the chance to download the original copy of the file is probably gone now that it has been replaced at the upstream source. And there's no HTTP status code for "checksum validation failed", so the best we could do might be HTTP status code 404 or 410.

To Reproduce

  1. Create a local repository from where pulp will sync. Ensure you will be able to make changes on the repository content.

Example of a small repository, easy to create on localhost:

mkdir /var/www/html/pub/repo           
cd /var/www/html/pub/repo
curl -O https://repos.fedorapeople.org/pulp/pulp/demo_repos/zoo/bear-4.1-1.noarch.rpm
createrepo .
  1. create a repository that will sync from http://localhost/pub/repo/ and sync it:

  2. Change the content of the rpmfile bear-4.1-1.noarch.rpm from the upstream (on localhost) repository. Ensure the checksum of the file changed!!

$ sha256sum /var/www/html/pub/repo/bear-4.1-1.noarch.rpm 
7a831f9f90bf4d21027572cb503d20b702de8e8785b02c0397445c2e481d81b3  /var/www/html/pub/repo/bear-4.1-1.noarch.rpm    <=== original checksum 

$ echo "CORRUPTED" >> /var/www/html/pub/repo/bear-4.1-1.noarch.rpm                                           <=== forcing a corruption of the file

$ sha256sum /var/www/html/pub/repo/bear-4.1-1.noarch.rpm 
1469d4fc47073fffbecf8e68419adf804de3c9c3645719779778fd0d634fca0e  /var/www/html/pub/repo/bear-4.1-1.noarch.rpm   <==== corrupted checksum
  1. Try downloading the file from the repository created in pulp:
$ curl -O $path_to_pulp_repo/Packages/b/bear-4.1-1.noarch.rpm
$ echo $?     <=== note that command returns no error

$ sha256sum bear-4.1-1.noarch.rpm         <====== checksum is something else

Actual results:

Client receives a bad file and no errors.

Pulpcore-content logs errors when saving the artifact:

Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]: Giving up download_wrapper(...) after 5 tries (pulpcore.exceptions.validation.DigestValidationError: A file located at the url http://localhost/pub/repo/bear-4.1-1.noarch.rpm failed validation due to checksum. Expected '7a831f9f90bf4d21027572cb503d20b702de8e8785b02c0397445c2e481d81b3', Actual '1469d4fc47073fffbecf8e68419adf804de3c9c3645719779778fd0d634fca0e')
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]: pulp [None]: backoff:ERROR: Giving up download_wrapper(...) after 5 tries (pulpcore.exceptions.validation.DigestValidationError: A file located at the url http://localhost/pub/repo/bear-4.1-1.noarch.rpm failed validation due to checksum. Expected '7a831f9f90bf4d21027572cb503d20b702de8e8785b02c0397445c2e481d81b3', Actual '1469d4fc47073fffbecf8e68419adf804de3c9c3645719779778fd0d634fca0e')
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]: [2024-01-31 16:10:08 +0000] [1141] [ERROR] Error handling request
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]: Traceback (most recent call last):
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib64/python3.9/site-packages/aiohttp/web_protocol.py", line 435, in _handle_request
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     resp = await request_handler(request)
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib64/python3.9/site-packages/aiohttp/web_app.py", line 504, in _handle
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     resp = await handler(request)
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib64/python3.9/site-packages/aiohttp/web_middlewares.py", line 117, in impl
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     return await handler(request)
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/content/authentication.py", line 41, in authenticate
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     return await handler(request)
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 339, in cached_function
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     response = await self.make_entry(
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/cache/cache.py", line 378, in make_entry
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     response = await handler(*args, **kwargs)
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/content/handler.py", line 237, in stream_content
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     return await self._match_and_stream(path, request)
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/content/handler.py", line 544, in _match_and_stream
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     return await self._stream_content_artifact(
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/content/handler.py", line 693, in _stream_content_artifact
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     response = await self._stream_remote_artifact(request, response, remote_artifact)
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/content/handler.py", line 943, in _stream_remote_artifact
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     download_result = await downloader.run()
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/download/http.py", line 273, in run
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     return await download_wrapper()
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/backoff/_async.py", line 151, in retry
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     ret = await target(*args, **kwargs)
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/download/http.py", line 258, in download_wrapper
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     return await self._run(extra_data=extra_data)
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulp_rpm/app/downloaders.py", line 118, in _run
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     to_return = await self._handle_response(response)
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/download/http.py", line 209, in _handle_response
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     await self.finalize()
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/content/handler.py", line 934, in finalize
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     await original_finalize()
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/download/base.py", line 171, in finalize
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     self.validate_digests()
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:   File "/usr/lib/python3.9/site-packages/pulpcore/download/base.py", line 223, in validate_digests
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]:     raise DigestValidationError(actual_digest, expected_digest, url=self.url)
Jan 31 11:10:08 satellite.example.com pulpcore-content[1141]: pulpcore.exceptions.validation.DigestValidationError: A file located at the url http://localhost/pub/repo/bear-4.1-1.noarch.rpm failed validation due to checksum. Expected '7a831f9f90bf4d21027572cb503d20b702de8e8785b02c0397445c2e481d81b3', Actual '1469d4fc47073fffbecf8e68419adf804de3c9c3645719779778fd0d634fca0e'

Expected behavior

Well, presumably if the upstream file has changed out from under us, then the chance to download the original is gone, so probably an HTTP status code 404 or 410?

Additional context
https://bugzilla.redhat.com/show_bug.cgi?id=2262110

@lubosmj
Copy link
Member

lubosmj commented Sep 5, 2024

Try closing the connection before sending the last chunk on the digest error.

@pedro-psb
Copy link
Member

We discussed this more on this on the last week's rpm meeting. Some key notes:

Close connection

Q: Do we really want to close the connection after all?
Not obviously better than letting the client get a digest validation error.

A: Yes

  • (a) breaking in the server by closing the connection.
    • pros: Fails faster if server breaks.
    • cons: No error message for the client. At the same time, this is a complex error to report in any case and there is no http code for "maybe one of multiple sources may be broken, maybe temporarilaly? Dunno.".
  • (b) breaking in the client with checksum validation
    • pros: provides some error message to the client. That somehow tells the repo data is bad.
    • cons: the error is correct but doesnt tell the whole story

Stream Design vs More Consistent Error responses

Q: Got feedback from support that we should consider trading the streaming feature (send-as-we-receive) for more consistent error message.

That's it, allowing Pulp to first verify it can reach the content at all before sendind any response, which is slower, but could give us a chance to retry on all available RAs and provide an error message on worst case.

Should we consider this?

A: Probably not.

The client may have to wait very long (and possibly close the conn on their side) if we try every remote before starting a response (on a bad case where we have to try a bunch). Also, as stated above, we can't really provide a reasonably good error message in http standards, and we can't even tell if it's a transient problem or not.

Avoid resource wasting

We may still conder trying something to address

"that we do need to handle the unreachable case in such a way that it's not going to consume needlessly large amounts of resources constantly if the RAs are unreachable" - #5725 (comment)

Better Error on Logs for admins/support

Post meeting though: if we close the conn, the admins or support will probably look at pulp logs.
Then, we could write a more complete description of the caveats with RemoteArtifacts (maybe even point to some "known limitations" docs section).

DigestValidatonError: Couldn't get the expected content from {url}.

Pulp may try different Remote sources for the same package.
If this is failing consistently, check known limitations and workarounds with on-demand syncing:
https://pulpproject.org/...

pedro-psb added a commit to pedro-psb/pulpcore that referenced this issue Nov 14, 2024
Assuming we want to keep our stream-redirect approach on the
content-app, We cant recover from wrong data already sent if
the Remote happens to be corrupted (contains wrong binaries).

In order to not give a 200 reponse to client, we decided to
close the connection as soon as the request handler realizes
the checksum is wrong.

That only happens after we already sent the whole blob minus EOF,
so we close the connection before sending the EOF.

fixes pulp#5012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants