Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lr_checksum_fd_compare() does not return cached calculated value #233

Closed
malmond77 opened this issue Mar 11, 2021 · 1 comment
Closed

lr_checksum_fd_compare() does not return cached calculated value #233

malmond77 opened this issue Mar 11, 2021 · 1 comment
Labels
Triaged Someone on the DNF 5 team has read the issue and determined the next steps to take

Comments

@malmond77
Copy link
Contributor

malmond77 commented Mar 11, 2021

Context: I'm working on https://fedoraproject.org/wiki/Changes/RPMCoW

I've found that DNF still uses some old yum code to produce and validate checksums of downloaded rpms (https://github.com/rpm-software-management/dnf/blob/master/dnf/yum/misc.py#L153). In the normal case: librepo downloads and validates rpm files using librepo. In the less common path, the rpm is already there, and dnf uses the python code linked to verify the checksum.

This works today because the implementation is close enough to not matter.

Enter: #222 . This augments checksum handling with specific comprehension of "transcoded" rpms. This works in the normal path, but fails with existing packages in cache because the python code is used, and it doesn't know about the transcoded .recorded checksum.

Solving this is fairly involved. There's changes to librepo, libdnf, and dnf needed to fix this. As listed: librepo is first, and I'd like to fix a latent bug in the API, or get guidance on how to better do it.

My thought is to make dnf use lr_checksum_fd_compare() and lr_checksum_fd_cmp() instead of yum.misc.checksum(). Why both? Because in one case, we need to validate checksums (https://github.com/rpm-software-management/dnf/blob/master/dnf/package.py#L333) and another we need to just calculate it (https://github.com/rpm-software-management/dnf/blob/master/dnf/package.py#L59)

The API for librepo doesn't explicitly support calculating checksums. The closest we get is is to call lr_checksum_fd_compare() and pass an empty string for expected, ignore matches, and pass a char * for calculated.

If you call lr_checksum_fd_compare() with caching = TRUE, and an xattr is present, then matches is set, but calculated is not set. This feels like a bug: if the caching mechanic is robust enough to use for the normal case, then it should be possible to set *calculated too. Using caching shouldn't come with this drawback.

For now, I'm going to call lr_checksum_fd_compare() in my libdnf SWIG wrapper with caching = FALSE to force a full checksum though librepo. I will reference this issue in my code, and one day, when the fix to this issue is available in downstream packages, I can enable caching for this less common path.

malmond77 added a commit to malmond77/librepo that referenced this issue Mar 11, 2021
If a file is downloaded via librepo (e.g. `dnf install --downloadonly`)
then a request to get the checksum via `lr_checksum_fd_compare()` will
not work. It'll only return whether the checksum is valid, and not the
actual checksum. This is the simple fix.

Addresses rpm-software-management#233
malmond77 added a commit to malmond77/libdnf that referenced this issue Mar 11, 2021
DNF has been carrying around yum's old checksum function. These
functions duplicate code in librepo. They are slower because librepo can
employ caching of digests. Lastly, these functions in Python do not know
about changes in checksum logic like
rpm-software-management/librepo#222

The choices here are:

1. Replicate `lr_checksum_cow_fd()` and caching logic in Python
2. Just use librepo from dnf.

This is 2. Note there is an open bug in librepo that forces no caching
for `checksum_value()`
(rpm-software-management/librepo#233). For
now we can keep the perf hit - this is no worse than the existing code
and aim for more correctness.

This change goes hand in hand with a change to `dnf` itself to make use
of the new functions and eliminate the old ones.
malmond77 added a commit to malmond77/libdnf that referenced this issue Mar 11, 2021
DNF has been carrying around yum's old checksum function. These
functions duplicate code in librepo. They are slower because librepo can
employ caching of digests. Lastly, these functions in Python do not know
about changes in checksum logic like
rpm-software-management/librepo#222

The choices here are:

1. Replicate `lr_checksum_cow_fd()` and caching logic in Python
2. Just use librepo from dnf.

This is 2. Note there is an open bug in librepo that forces no caching
for `checksum_value()`
(rpm-software-management/librepo#233). For
now we can keep the perf hit - this is no worse than the existing code
and aim for more correctness.

This change goes hand in hand with a change to `dnf` itself to make use
of the new functions and eliminate the old ones.

On errors, these functions raise libdnf.error.Error which can be easily
mapped into MiscError in dnf
malmond77 added a commit to malmond77/dnf that referenced this issue Mar 11, 2021
libdnf has the canonical implementation of checksum handling. We aim to
replace all use of dnf.yum.misc.checksum() with this. In doing so, this
fixes installing previously downloaded and transcoded rpms to support

rpm-software-management/librepo#222

This also has some minor performance benefits: librepo's checksum
handling employs caching of previously downloaded files via extended
attributes. This works for ordinary rpms in the dnf cache, but does not
work (yet) for rpm paths specified on the command line due to
rpm-software-management/librepo#233. That
issue is pretty minor, and the fix ends up in libdnf later.

The previous implementation maps all runtime errors to MiscError. We do
this still by taking the libdnf.error.Error class (defined in SWIG) and
map it directly back to the Python exception as before.
malmond77 added a commit to malmond77/dnf that referenced this issue Mar 11, 2021
libdnf has the canonical implementation of checksum handling. We aim to
replace all use of dnf.yum.misc.checksum() with this. In doing so, this
fixes installing previously downloaded and transcoded rpms to support

rpm-software-management/librepo#222

This also has some minor performance benefits: librepo's checksum
handling employs caching of previously downloaded files via extended
attributes. This works for ordinary rpms in the dnf cache, but does not
work (yet) for rpm paths specified on the command line due to
rpm-software-management/librepo#233. That
issue is pretty minor, and the fix ends up in libdnf later.

The previous implementation maps all runtime errors to MiscError. We do
this still by taking the libdnf.error.Error class (defined in SWIG) and
map it directly back to the Python exception as before.
malmond77 added a commit to malmond77/librepo that referenced this issue Mar 12, 2021
If a file is downloaded via librepo (e.g. `dnf install --downloadonly`)
then a request to get the checksum via `lr_checksum_fd_compare()` will
not work. It'll only return whether the checksum is valid, and not the
actual checksum. This is the simple fix.

Addresses rpm-software-management#233
malmond77 added a commit to malmond77/dnf that referenced this issue Mar 12, 2021
libdnf has the canonical implementation of checksum handling. We aim to
replace all use of dnf.yum.misc.checksum() with this. In doing so, this
fixes installing previously downloaded and transcoded rpms to support

rpm-software-management/librepo#222

This also has some minor performance benefits: librepo's checksum
handling employs caching of previously downloaded files via extended
attributes. This works for ordinary rpms in the dnf cache, but does not
work (yet) for rpm paths specified on the command line due to
rpm-software-management/librepo#233. That
issue is pretty minor, and the fix ends up in libdnf later.

The previous implementation maps all runtime errors to MiscError. We do
this still by taking the libdnf.error.Error class (defined in SWIG) and
map it directly back to the Python exception as before.
malmond77 added a commit to malmond77/libdnf that referenced this issue Mar 12, 2021
DNF has been carrying around yum's old checksum function. These
functions duplicate code in librepo. They are slower because librepo can
employ caching of digests. Lastly, these functions in Python do not know
about changes in checksum logic like
rpm-software-management/librepo#222

The choices here are:

1. Replicate `lr_checksum_cow_fd()` and caching logic in Python
2. Just use librepo from dnf.

This is 2. Note there is an open bug in librepo that forces no caching
for `checksum_value()`
(rpm-software-management/librepo#233). For
now we can keep the perf hit - this is no worse than the existing code
and aim for more correctness.

This change goes hand in hand with a change to `dnf` itself to make use
of the new functions and eliminate the old ones.

On errors, these functions raise libdnf.error.Error which can be easily
mapped into MiscError in dnf
@m-blaha m-blaha added the Triaged Someone on the DNF 5 team has read the issue and determined the next steps to take label Mar 15, 2021
malmond77 added a commit to malmond77/librepo that referenced this issue Mar 15, 2021
If a file is downloaded via librepo (e.g. `dnf install --downloadonly`)
then a request to get the checksum via `lr_checksum_fd_compare()` will
not work. It'll only return whether the checksum is valid, and not the
actual checksum. This is the simple fix.

Addresses rpm-software-management#233
malmond77 added a commit to malmond77/librepo that referenced this issue Mar 17, 2021
If a file is downloaded via librepo (e.g. `dnf install --downloadonly`)
then a request to get the checksum via `lr_checksum_fd_compare()` will
not work. It'll only return whether the checksum is valid, and not the
actual checksum. This is the simple fix.

Addresses rpm-software-management#233
m-blaha pushed a commit that referenced this issue Mar 17, 2021
If a file is downloaded via librepo (e.g. `dnf install --downloadonly`)
then a request to get the checksum via `lr_checksum_fd_compare()` will
not work. It'll only return whether the checksum is valid, and not the
actual checksum. This is the simple fix.

Addresses #233
@malmond77
Copy link
Contributor Author

Fix is merged. I will update rpm-software-management/libdnf#1164 to explicitly depend on v1.13.1 which has the fix so I can reliably pass TRUE for caching in checksum_value(). I will also bump versions and depend on them in dnf itself, completing the chain.

malmond77 added a commit to malmond77/libdnf that referenced this issue Mar 17, 2021
DNF has been carrying around yum's old checksum function. These
functions duplicate code in librepo. They are slower because librepo can
employ caching of digests. Lastly, these functions in Python do not know
about changes in checksum logic like
rpm-software-management/librepo#222

The choices here are:

1. Replicate `lr_checksum_cow_fd()` and caching logic in Python
2. Just use librepo from dnf.

This is 2. Note there was bug in librepo that forces no caching
for `checksum_value()`
(rpm-software-management/librepo#233). This is
now fixed, so we depend on librepo-1.13.1 to ensure this fix is present.

This change goes hand in hand with a change to `dnf` itself to make use
of the new functions and eliminate the old ones. This is
rpm-software-management/dnf#1743. We bump the
version of this library to ensure this dependency is expressed properly.

On errors, these functions raise libdnf.error.Error which can be easily
mapped into MiscError in dnf
malmond77 added a commit to malmond77/libdnf that referenced this issue Mar 17, 2021
DNF has been carrying around yum's old checksum function. These
functions duplicate code in librepo. They are slower because librepo can
employ caching of digests. Lastly, these functions in Python do not know
about changes in checksum logic like
rpm-software-management/librepo#222

The choices here are:

1. Replicate `lr_checksum_cow_fd()` and caching logic in Python
2. Just use librepo from dnf.

This is 2. Note there was bug in librepo that forces no caching
for `checksum_value()`
(rpm-software-management/librepo#233). This is
now fixed, so we depend on librepo-1.13.1 to ensure this fix is present.

This change goes hand in hand with a change to `dnf` itself to make use
of the new functions and eliminate the old ones. This is
rpm-software-management/dnf#1743. We bump the
version of this library in the next commit to ensure this dependency is
expressed properly.

On errors, these functions raise libdnf.error.Error which can be easily
mapped into MiscError in dnf
malmond77 added a commit to malmond77/libdnf that referenced this issue Mar 17, 2021
DNF has been carrying around yum's old checksum function. These
functions duplicate code in librepo. They are slower because librepo can
employ caching of digests. Lastly, these functions in Python do not know
about changes in checksum logic like
rpm-software-management/librepo#222

The choices here are:

1. Replicate `lr_checksum_cow_fd()` and caching logic in Python
2. Just use librepo from dnf.

This is 2. Note there was bug in librepo that forces no caching
for `checksum_value()`
(rpm-software-management/librepo#233). This is
now fixed, so we depend on librepo-1.13.1 to ensure this fix is present.

This change goes hand in hand with a change to `dnf` itself to make use
of the new functions and eliminate the old ones. This is
rpm-software-management/dnf#1743. We bump the
version of this library in the next commit to ensure this dependency is
expressed properly.

On errors, these functions raise libdnf.error.Error which can be easily
mapped into MiscError in dnf
m-blaha pushed a commit to rpm-software-management/libdnf that referenced this issue Mar 18, 2021
DNF has been carrying around yum's old checksum function. These
functions duplicate code in librepo. They are slower because librepo can
employ caching of digests. Lastly, these functions in Python do not know
about changes in checksum logic like
rpm-software-management/librepo#222

The choices here are:

1. Replicate `lr_checksum_cow_fd()` and caching logic in Python
2. Just use librepo from dnf.

This is 2. Note there was bug in librepo that forces no caching
for `checksum_value()`
(rpm-software-management/librepo#233). This is
now fixed, so we depend on librepo-1.13.1 to ensure this fix is present.

This change goes hand in hand with a change to `dnf` itself to make use
of the new functions and eliminate the old ones. This is
rpm-software-management/dnf#1743. We bump the
version of this library in the next commit to ensure this dependency is
expressed properly.

On errors, these functions raise libdnf.error.Error which can be easily
mapped into MiscError in dnf
m-blaha pushed a commit to rpm-software-management/dnf that referenced this issue Mar 19, 2021
libdnf has the canonical implementation of checksum handling. We aim to
replace all use of dnf.yum.misc.checksum() with this. In doing so, this
fixes installing previously downloaded and transcoded rpms to support

rpm-software-management/librepo#222

This also has some minor performance benefits: librepo's checksum
handling employs caching of previously downloaded files via extended
attributes. This works for ordinary rpms in the dnf cache, but does not
work (yet) for rpm paths specified on the command line due to
rpm-software-management/librepo#233. That
issue is pretty minor, and the fix ends up in libdnf later.

The previous implementation maps all runtime errors to MiscError. We do
this still by taking the libdnf.error.Error class (defined in SWIG) and
map it directly back to the Python exception as before.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Triaged Someone on the DNF 5 team has read the issue and determined the next steps to take
Projects
None yet
Development

No branches or pull requests

2 participants