speed up analyze-outcomes stage #144
Why `-T8`? AFAIK we don't have 8-core executors on the CI. Beware of insisting on too much parallelism: it can create a load spike that slows down other executors, resulting in worse performance overall. You can't measure the impact just by running one job, and measuring on a machine with a different number of cores is not a useful guide to performance on the CI.
Well, `-T1000` does not degrade performance locally, so my assumption is that `-T8` is fine and will benefit if the CI improves in the future. It looks like `xz` caps threads at the CPU count (locally, I see improvement up to `-T8`), so my expectation is that this will work well on the CI. I am aware of the load-spike concern, but since this is a serial stage, it is a win to steal CPU capacity from the parallel stages.
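For what it's worth, the capping behaviour described here can be sketched as a toy model. This is an illustration of the discussion, not `xz` internals: the function name, the block-size arithmetic, and the core cap are all assumptions made for the sketch.

```python
import os

def effective_threads(requested: int, file_size: int, block_size: int) -> int:
    """Toy model (not xz's actual code) of the caps discussed above:
    -T0 resolves to the detected core count, the thread count is capped
    at the CPU count, and parallelism is further bounded by how many
    blocks the input splits into."""
    cores = os.cpu_count() or 1
    if requested == 0:                 # -T0: use the detected core count
        requested = cores
    requested = min(requested, cores)  # observed locally: capped at CPU count
    # A thread can only work on a whole block, so a small file limits
    # parallelism regardless of -T.
    blocks = max(1, -(-file_size // block_size))  # ceil division
    return min(requested, blocks)

# A 100 MiB file with 24 MiB blocks splits into 5 blocks, so even
# -T1000 can use at most 5 threads (fewer on a machine with <5 cores).
print(effective_threads(1000, 100 * 1024**2, 24 * 1024**2))
```

Under this model, an over-large `-T` value is harmless on its own; the effective parallelism settles at whatever the machine and the input allow.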
I still don't see a reason to use `-T8`. Why not `-T0`? As far as I understand, the actual number of threads is limited by the memory consumption (which I don't expect to matter at `-O0`), by the `-T` value and by the file size, but not by the number of cores. Looking at the `xz` source code, `lzma_cputhreads()` is called to determine what `hardware_threads_get()` will return when using `-T0`, and then the encoder only relies on `hardware_threads_get()`, not `lzma_cputhreads()`.

Since 8 is an arbitrary number here, I'd want to see a benchmark that justifies it. Otherwise please use `-T0` (the number of cores seems like a sensible default choice).
This is still unaddressed.
I'd expect them to result in worse compression, but maybe we're OK with trading some compression for lower CPU consumption?
The trade-off looked unfavourable in all cases. The closest was probably gzip, which is faster (in single-thread mode; it doesn't have a multi-thread mode) but produces output almost twice as large, so `xz -T<n>` wins overall.
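The gzip-vs-xz size gap discussed above is easy to reproduce with Python's stdlib. A rough sketch; the sample data below is an invented stand-in for an outcomes file (highly repetitive text), so the absolute numbers are illustrative only:

```python
import gzip
import lzma

# Stand-in for an outcomes file: many near-identical text lines.
# (Illustrative data, not the real analyze-outcomes output.)
data = b"PASS;component_x;test_suite_foo;case_0042\n" * 200_000

gz = gzip.compress(data, compresslevel=1)  # the format gzip -1 / pigz -1 emit
xz = lzma.compress(data, preset=0)         # xz -0 (single-threaded here)

print(f"raw {len(data)}  gzip-1 {len(gz)}  xz-0 {len(xz)}")
```

On repetitive input like this, xz's larger dictionary should beat gzip level 1 on size even at its fastest preset, which matches the direction of the trade-off described above.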
Multithreaded mode may be counterproductive anyway, but have you tried `pigz`? (In a realistic scenario, not on your machine!)
`pigz -1` does look considerably better for both performance (0.36 s locally) and size (25 MB, about 2x better).
By the way, we need to keep in mind that another element of the compromise is storage. That's the reason we compress, not to save on download time/bandwidth. The vast majority of outcome files are never consumed, but all are stored for some time. I don't know how much we're spending on storage, and in particular how the (compressed) outcome files compare with logs.
We can always up the storage budget if we need to. As it stands, an extra 18 MB / PR run is unlikely to cause big budget issues IMO.
I'm tracking the `pigz` suggestion (thank you) as a separate follow-up in #150, since it requires additional CI work beyond the scope of this PR.