This gives a major boost in librepo performance. For a reposync of an Amazon Linux 2023 x86-64 repository on an m5n.16xlarge EC2 instance with a 500MB/sec, 3000 IOPS EBS volume, this change alone cuts 30 seconds of wall time and gets reposync using nearly a whole core rather than only two thirds of one.
For reference, my benchmarking has been done on an m5n.16xlarge EC2 instance, both to the in-region S3 buckets and to the CDN repositories. That instance type has 256GB of memory, a 75Gbit network connection, and is a 64 core Cascade Lake system. The root volume is a 256GB gp3 EBS volume with 500MB/sec of IO and 3000 IOPS.
The background of this is that a lot of EC2 instances don't live that long (relatively speaking), and never install RPMs except on launch - so all the time spent installing RPMs is time spent scaling up a system that would be better spent running the customer workload.
Goes well when paired with #294, #295, and #296.
What I'm not entirely sure of here are the other implications of this change - as in, what is relying on this checksum being crash safe, and should we instead re-compute it sometimes?
I'm open to putting this behind an ifdef or something if that seems safer. I'd love input here.