
speed up analyze-outcomes stage #144

Merged Jan 4, 2024 (5 commits; changes shown from 4 commits)
11 changes: 7 additions & 4 deletions in vars/analysis.groovy

@@ -237,16 +237,19 @@ def process_outcomes() {
     deleteDir()
 }

-// The complete outcome file is ~14MB compressed as I write.
+// The complete outcome file is 2.1GB uncompressed / 56MB compressed as I write.
 // Often we just want the failures, so make an artifact with just those.
 // Only produce a failure file if there was a failing job (otherwise
 // we'd just waste time creating an empty file).
+//
+// Note that grep ';FAIL;' could pick up false positives, if another field such
+// as test description or test suite was "FAIL".
 if (gen_jobs.failed_builds) {
     sh '''\
-awk -F';' '$5 == "FAIL"' outcomes.csv >"failures.csv"
+LC_ALL=C grep ';FAIL;' outcomes.csv >"failures.csv"
 # Compress the failure list if it is large (for some value of large)
 if [ "$(wc -c <failures.csv)" -gt 99999 ]; then
-    xz failures.csv
+    xz -0 -T8 failures.csv
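The false-positive risk noted in the new comment can be illustrated with a toy CSV (hypothetical rows; as in the original awk filter, field 5 is assumed to hold the result):

```shell
# Build a tiny stand-in for outcomes.csv. Row 2 has "FAIL" as its test
# description (field 4) but actually passed (field 5 is PASS).
cat >toy_outcomes.csv <<'EOF'
platform;config;suite;case one;FAIL;real failure
platform;config;suite;FAIL;PASS;passing test whose description is FAIL
EOF

# Exact-field match: prints only the genuine failure.
awk -F';' '$5 == "FAIL"' toy_outcomes.csv

# Substring match: also prints row 2, because ";FAIL;" appears around
# field 4. Much faster on a 2.1GB file, at the cost of this false positive.
LC_ALL=C grep ';FAIL;' toy_outcomes.csv
```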
Contributor:

Why -T8? AFAIK we don't have 8-core executors on the CI.

Beware of insisting on too much parallelism: this can create a load spike that slows down other executors, resulting in worse performance overall. You can't measure the impact just by running one job.

> Using -T8 (8 threads) reduces it further to 2.6s locally (expect ~5s on CI). There's no penalty for going too big on thread count, but no benefit beyond about 8-16.

Measuring on a machine with a different number of cores is not useful to guess at the performance on the CI.

Contributor (author):

Well, -T1000 does not downgrade performance locally. So my assumption is that -T8 is fine and will benefit if the CI improves in the future. It looks like xz caps threads at CPU count (locally, I see improvement up to -T8), so my expectation is that this will work well on the CI.

I am aware of the load-spike concern, but as this is a serial stage, it is a win to steal CPU capacity from the parallel stages.

Contributor:

I still don't see a reason to use -T8. Why not -T0?

As far as I understand, the actual number of threads is limited by the memory consumption (which I don't expect to matter at -0), by the -T value and by the file size, but not by the number of cores. Looking at the xz source code, lzma_cputhreads() is called to determine what hardware_threads_get() will return when using -T0, and then the encoder only relies on hardware_threads_get() and not lzma_cputhreads().

Since 8 is an arbitrary number here, I'd want to see some benchmark that justifies it. Otherwise please use -T0 (the number of cores seems like a sensible default choice).
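For what it's worth, the effect of the thread flags can be measured with a throwaway loop like the one below (an illustrative sketch on generated data; absolute numbers are machine-dependent and say nothing definitive about the CI):

```shell
# Generate ~6MB of repetitive CSV-like data to stand in for outcomes.csv.
seq 1 200000 | sed 's/.*/platform;config;suite;case &;PASS;/' >sample_base.csv

# Time xz preset 0 single-threaded, with -T0 (one thread per detected
# core) and with the fixed -T8 from this PR.
for opts in "-0" "-0 -T0" "-0 -T8"; do
    cp sample_base.csv sample.csv
    start=$(date +%s)
    xz $opts sample.csv            # replaces sample.csv with sample.csv.xz
    end=$(date +%s)
    printf '%-8s %2ds %8d bytes\n' "$opts" "$((end - start))" \
           "$(wc -c <sample.csv.xz)"
    rm sample.csv.xz
done
```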

Contributor:

This is still unaddressed.

Contributor:

> lzma, gzip, bzip2, zip are all worse.

I'd expect them to result in worse compression, but maybe we're OK with less compression in exchange for less CPU consumption?

Contributor (author):

The trade-off looked unfavourable in all cases. The closest was probably gzip, which is faster (in single-threaded mode; it has no multi-threaded mode) but produces output almost twice as large, so xz -T<n> wins overall.
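A rough way to reproduce that comparison locally (generated stand-in data, not a real outcomes file; sizes and timings will differ from the CI):

```shell
# Same repetitive stand-in data compressed with both candidates.
seq 1 200000 | sed 's/.*/platform;config;suite;case &;PASS;/' >data.csv
cp data.csv g.csv
cp data.csv x.csv

time gzip -1 g.csv       # fast, but gzip has no multi-threaded mode
time xz -0 -T8 x.csv     # the option chosen in this PR

wc -c g.csv.gz x.csv.xz  # compare output sizes
```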

Contributor:

Multithreaded mode may be counterproductive anyway, but have you tried pigz? (In a realistic scenario, not on your machine!)

Contributor (author):

pigz -1 does look considerably better for both perf (0.36s locally) and size (25 MB, about 2x better).
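A hedged sketch of the pigz variant (pigz is assumed to be installed; it emits ordinary .gz output, so a plain-gzip fallback keeps the script portable):

```shell
# Generate a stand-in failures list, then compress it with pigz when
# available, falling back to single-threaded gzip (same .gz format, so
# consumers can decompress either result with gzip/zcat).
seq 1 1000 | sed 's/.*/platform;config;suite;case &;FAIL;/' >failures.csv
if command -v pigz >/dev/null 2>&1; then
    pigz -1 failures.csv
else
    gzip -1 failures.csv
fi
ls -l failures.csv.gz
```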

Contributor:

By the way, we need to keep in mind that another element of the compromise is storage. That's the reason we compress, not to save on download time/bandwidth. A vast majority of outcome files are never consumed, but all are stored for some time. I don't know how much we're spending on storage, and in particular how the (compressed) outcome files compare with logs.

Contributor (author):

We can always up the storage budget if we need to. As it stands, an extra 18 MB / PR run is unlikely to cause big budget issues IMO.

Contributor (author) @daverodgman, Jan 2, 2024:

I'm tracking the pigz suggestion (thank you) as a separate follow-up in #150, since that requires additional CI work beyond the scope of this PR.

 fi
 '''
 }

@@ -258,7 +261,7 @@ fi
 }
 }
 } finally {
-    sh 'xz outcomes.csv'
+    sh 'xz -0 -T8 outcomes.csv'
     archiveArtifacts(artifacts: 'outcomes.csv.xz, failures.csv*',
         fingerprint: true,
         allowEmptyArchive: true)