-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for rpm2extents transcoder #222
base: master
Are you sure you want to change the base?
Conversation
a74ef8f
to
9d8e174
Compare
9d8e174
to
f09fd4b
Compare
The second commit fixes an issue that only really surfaced when using public mirrors. In my previous testing (on CentOS) I didn't hit it because our internal mirrors were generally more consistent. I'm aiming to get this code into Fedora 34 shortly, either as part of an updated tagged release (1.12.2?) or if not, just as a patch we carry in the rpm src. As part of the Fedora 34 change: this code path is intended to be optional. As far as I can tell, if the environment variable isn't set, then nothing should be different. |
@malmond77 This looks good to me, but I am not comfortable merging this until the RPM PR is merged: rpm-software-management/rpm#1470 |
f09fd4b
to
cef6443
Compare
DNF has been carrying around yum's old checksum function. These functions duplicate code in librepo. They are slower because librepo can employ caching of digests. Lastly, these functions in Python do not know about changes in checksum logic like rpm-software-management/librepo#222 The choices here are: 1. Replicate `lr_checksum_cow_fd()` and caching logic in Python 2. Just use librepo from dnf. This is 2. Note there is an open bug in librepo that forces no caching for `checksum_value()` (rpm-software-management/librepo#233). For now we can keep the perf hit - this is no worse than the existing code and aim for more correctness. This change goes hand in hand with a change to `dnf` itself to make use of the new functions and eliminate the old ones.
DNF has been carrying around yum's old checksum function. These functions duplicate code in librepo. They are slower because librepo can employ caching of digests. Lastly, these functions in Python do not know about changes in checksum logic like rpm-software-management/librepo#222 The choices here are: 1. Replicate `lr_checksum_cow_fd()` and caching logic in Python 2. Just use librepo from dnf. This is 2. Note there is an open bug in librepo that forces no caching for `checksum_value()` (rpm-software-management/librepo#233). For now we can keep the perf hit - this is no worse than the existing code and aim for more correctness. This change goes hand in hand with a change to `dnf` itself to make use of the new functions and eliminate the old ones. On errors, these functions raise libdnf.error.Error which can be easily mapped into MiscError in dnf
libdnf has the canonical implementation of checksum handling. We aim to replace all use of dnf.yum.misc.checksum() with this. In doing so, this fixes installing previously downloaded and transcoded rpms to support rpm-software-management/librepo#222 This also has some minor performance benefits: librepo's checksum handling employs caching of previously downloaded files via extended attributes. This works for ordinary rpms in the dnf cache, but does not work (yet) for rpm paths specified on the command line due to rpm-software-management/librepo#233. That issue is pretty minor, and the fix ends up in libdnf later. The previous implementation maps all runtime errors to MiscError. We do this still by taking the libdnf.error.Error class (defined in SWIG) and map it directly back to the Python exception as before.
libdnf has the canonical implementation of checksum handling. We aim to replace all use of dnf.yum.misc.checksum() with this. In doing so, this fixes installing previously downloaded and transcoded rpms to support rpm-software-management/librepo#222 This also has some minor performance benefits: librepo's checksum handling employs caching of previously downloaded files via extended attributes. This works for ordinary rpms in the dnf cache, but does not work (yet) for rpm paths specified on the command line due to rpm-software-management/librepo#233. That issue is pretty minor, and the fix ends up in libdnf later. The previous implementation maps all runtime errors to MiscError. We do this still by taking the libdnf.error.Error class (defined in SWIG) and map it directly back to the Python exception as before.
libdnf has the canonical implementation of checksum handling. We aim to replace all use of dnf.yum.misc.checksum() with this. In doing so, this fixes installing previously downloaded and transcoded rpms to support rpm-software-management/librepo#222 This also has some minor performance benefits: librepo's checksum handling employs caching of previously downloaded files via extended attributes. This works for ordinary rpms in the dnf cache, but does not work (yet) for rpm paths specified on the command line due to rpm-software-management/librepo#233. That issue is pretty minor, and the fix ends up in libdnf later. The previous implementation maps all runtime errors to MiscError. We do this still by taking the libdnf.error.Error class (defined in SWIG) and map it directly back to the Python exception as before.
DNF has been carrying around yum's old checksum function. These functions duplicate code in librepo. They are slower because librepo can employ caching of digests. Lastly, these functions in Python do not know about changes in checksum logic like rpm-software-management/librepo#222 The choices here are: 1. Replicate `lr_checksum_cow_fd()` and caching logic in Python 2. Just use librepo from dnf. This is 2. Note there is an open bug in librepo that forces no caching for `checksum_value()` (rpm-software-management/librepo#233). For now we can keep the perf hit - this is no worse than the existing code and aim for more correctness. This change goes hand in hand with a change to `dnf` itself to make use of the new functions and eliminate the old ones. On errors, these functions raise libdnf.error.Error which can be easily mapped into MiscError in dnf
DNF has been carrying around yum's old checksum function. These functions duplicate code in librepo. They are slower because librepo can employ caching of digests. Lastly, these functions in Python do not know about changes in checksum logic like rpm-software-management/librepo#222 The choices here are: 1. Replicate `lr_checksum_cow_fd()` and caching logic in Python 2. Just use librepo from dnf. This is 2. Note there was bug in librepo that forces no caching for `checksum_value()` (rpm-software-management/librepo#233). This is now fixed, so we depend on librepo-1.13.1 to ensure this fix is present. This change goes hand in hand with a change to `dnf` itself to make use of the new functions and eliminate the old ones. This is rpm-software-management/dnf#1743. We bump the version of this library to ensure this dependency is expressed properly. On errors, these functions raise libdnf.error.Error which can be easily mapped into MiscError in dnf
DNF has been carrying around yum's old checksum function. These functions duplicate code in librepo. They are slower because librepo can employ caching of digests. Lastly, these functions in Python do not know about changes in checksum logic like rpm-software-management/librepo#222 The choices here are: 1. Replicate `lr_checksum_cow_fd()` and caching logic in Python 2. Just use librepo from dnf. This is 2. Note there was bug in librepo that forces no caching for `checksum_value()` (rpm-software-management/librepo#233). This is now fixed, so we depend on librepo-1.13.1 to ensure this fix is present. This change goes hand in hand with a change to `dnf` itself to make use of the new functions and eliminate the old ones. This is rpm-software-management/dnf#1743. We bump the version of this library in the next commit to ensure this dependency is expressed properly. On errors, these functions raise libdnf.error.Error which can be easily mapped into MiscError in dnf
DNF has been carrying around yum's old checksum function. These functions duplicate code in librepo. They are slower because librepo can employ caching of digests. Lastly, these functions in Python do not know about changes in checksum logic like rpm-software-management/librepo#222 The choices here are: 1. Replicate `lr_checksum_cow_fd()` and caching logic in Python 2. Just use librepo from dnf. This is 2. Note there was bug in librepo that forces no caching for `checksum_value()` (rpm-software-management/librepo#233). This is now fixed, so we depend on librepo-1.13.1 to ensure this fix is present. This change goes hand in hand with a change to `dnf` itself to make use of the new functions and eliminate the old ones. This is rpm-software-management/dnf#1743. We bump the version of this library in the next commit to ensure this dependency is expressed properly. On errors, these functions raise libdnf.error.Error which can be easily mapped into MiscError in dnf
DNF has been carrying around yum's old checksum function. These functions duplicate code in librepo. They are slower because librepo can employ caching of digests. Lastly, these functions in Python do not know about changes in checksum logic like rpm-software-management/librepo#222 The choices here are: 1. Replicate `lr_checksum_cow_fd()` and caching logic in Python 2. Just use librepo from dnf. This is 2. Note there was bug in librepo that forces no caching for `checksum_value()` (rpm-software-management/librepo#233). This is now fixed, so we depend on librepo-1.13.1 to ensure this fix is present. This change goes hand in hand with a change to `dnf` itself to make use of the new functions and eliminate the old ones. This is rpm-software-management/dnf#1743. We bump the version of this library in the next commit to ensure this dependency is expressed properly. On errors, these functions raise libdnf.error.Error which can be easily mapped into MiscError in dnf
libdnf has the canonical implementation of checksum handling. We aim to replace all use of dnf.yum.misc.checksum() with this. In doing so, this fixes installing previously downloaded and transcoded rpms to support rpm-software-management/librepo#222 This also has some minor performance benefits: librepo's checksum handling employs caching of previously downloaded files via extended attributes. This works for ordinary rpms in the dnf cache, but does not work (yet) for rpm paths specified on the command line due to rpm-software-management/librepo#233. That issue is pretty minor, and the fix ends up in libdnf later. The previous implementation maps all runtime errors to MiscError. We do this still by taking the libdnf.error.Error class (defined in SWIG) and map it directly back to the Python exception as before.
cef6443
to
566400a
Compare
The previous code would only truncate the file, and turn off resume if the xattr was missing. This left a gap when resume was disabled, and the xattr was missing - the file wouldn't get truncated if present. This behaviour is strictly needed for rpm-software-management#222 to work correctly. That does not support resuming (sorry, one day it could), so it explicitly disables resuming.
The previous code would only truncate the file, and turn off resume if the xattr was missing. This left a gap when resume was disabled, and the xattr was missing - the file wouldn't get truncated if present. This behaviour is strictly needed for rpm-software-management#222 to work correctly. That does not support resuming (sorry, one day it could), so it explicitly disables resuming.
@Conan-Kudo This is not a change I am comfortable with from a security perspective, as per past discussions. The verification needs to happen before transcoding, which means either buffering the entire package on disk or changing the metadata format. |
To elaborate: the idea of rpm2extents is fine, but it needs to be a separate entry in the metadata XML, with its own digest. |
Two related parts: 1. If `LIBREPO_TRANSCODE_RPMS` environment is set to a program (with parameters) then downloads are piped through it. 2. Transcoded RPMS by definition will not have the same bits on disk as downloaded. This is inherent. The transcoder is tasked with measuring the bits that enter stdin and storing a copy of the digest(s) seen in the footer. `librepo` can then use these stored digests instead if the environment variable is set. This is part of changes described in https://fedoraproject.org/wiki/Changes/RPMCoW
I've observed errors when some mirrors are not completely synced. The library tries to download a file, but gets a 404 error. This means we need to retry with another mirror, not crash out. This was crashing because setting `err` in `clean_transcode()` was firing the assert at the start of `truncate_transfer_file()`. Note this failure mode was most common with 404's, but any transfer error could likely have turned fatal, for invalid reasons. We use `cleanup_transcode()` in two contexts. 1. Within `check_transfer_statuses()`. The first call here happens during a normal download after `check_finished_transfer_status()`. The cleanup checks for errors, and any here will be flagged as a `transfer_err` (not a general, err). 2. In 3 other places where an error has already occurred. We need to wait for the program to exit (and it should stop due to a SIGPIPE or short read from stdin), but we don't need to set an error because something already is handling one. It doesn't matter that the transcoder crashed because we're not going to use the output anyway, and we are likely to retry.
566400a
to
e890128
Compare
Two related parts:
LIBREPO_TRANSCODE_RPMS
environment is set to a program (with parameters) then downloads are piped through it.librepo
can then use these stored digests instead if the environment variable is set.This is part of changes described in https://fedoraproject.org/wiki/Changes/RPMCoW