-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RF] Make RooBatchCompute thread safe #14877
[RF] Make RooBatchCompute thread safe #14877
Conversation
19906a9
to
09b3913
Compare
09b3913
to
a99fd34
Compare
Test Results 12 files 12 suites 2d 8h 12m 42s ⏱️ Results for commit 2abf7e5. ♻️ This comment has been updated with latest results. |
This can be quite a big overhead if the dataset is small.
a99fd34
to
32dc1a5
Compare
Starting build on |
Build failed on ROOT-ubuntu2004/python3. Errors:
|
Build failed on ROOT-ubuntu2204/nortcxxmod. Errors:
|
Build failed on ROOT-performance-centos8-multicore/soversion. Errors:
|
Build failed on windows10/default. Errors:
|
32dc1a5
to
174c302
Compare
Starting build on |
Build failed on ROOT-ubuntu2204/nortcxxmod. Errors:
|
Build failed on ROOT-ubuntu2004/python3. Errors:
|
Build failed on ROOT-performance-centos8-multicore/soversion. Errors:
|
Build failed on windows10/default. Errors:
|
174c302
to
2abf7e5
Compare
Starting build on |
In the RooBatchCompute CPU library, all scalar inputs have to be copied n times into a buffer that is as long as the SIMD registers, to allow for vectorization in all cases. To avoid frequent memory allocations, this buffer was made a `static` variable in the original implementation of the batchcompute library, which of course made it non-threadsafe. This is now hitting us, because RooFit needs to be able to do multiple fits concurrently. This is a requirement for CMSSW, and a blocker for ROOT master adoption in CMSSW since the new CPU backend is the default: cms-sw/cmsdist#9034 This commit fixes the concurrency problem by doing the buffering in the DataMaps that are used in the `RooFit::Evaluator`. Like this, multiple computation graphs can be concurrently evaluated. It was tested with the ATLAS benchmarks in `rootbench` that the fitting performance remains the same.
The Batches classes in RooBatchCompute have only public data members anyway. It's not necessary to have extra member functions for changing them.
Build failed on windows10/default. Errors:
And 71 more |
The failures on Windows Jenkins are the occasional out-of-heap-space of the compiler. It has nothing to do with this PR, and the new CI is green for Windows 32 bit and 64 bit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I trust you Jonas on these important improvements and fixes
In the RooBatchCompute CPU library, all scalar inputs have to be copied
n times into a buffer that is as long as the SIMD registers, to allow
for vectorization in all cases.
To avoid frequent memory allocations, this buffer was made a
static
variable in the original implementation of the batchcompute library,
which of course made it non-threadsafe.
This is now hitting us, because RooFit needs to be able to do multiple
fits concurrently. This is a requirement for CMSSW, and a blocker for
ROOT master adoption in CMSSW since the new CPU backend is the default:
cms-sw/cmsdist#9034
This commit fixes the concurrency problem by doing the buffering in the
DataMaps that are used in the
RooFit::Evaluator
. Like this, multiplecomputation graphs can be concurrently evaluated.
It was tested with the ATLAS benchmarks in
rootbench
that the fittingperformance remains the same.
This PR also makes some code style and memory management improvements documented in the commit messages.
It also updates the documentation to make clear the new CPU backend is now the default, which was not done in this PR:
#14742