Dealing with RooFit vectorization target #44308

Open
makortel opened this issue Mar 4, 2024 · 6 comments

Comments

@makortel
Contributor

makortel commented Mar 4, 2024

In ROOT master, a new vectorizing CPU evaluation backend was made the default in RooFit (see cms-sw/cmsdist#9034 (comment)). By default, RooFit selects among its backends (generic, SSE4.1, AVX, AVX2, AVX-512) based on the capabilities of the CPU. We should discuss (at least in a future Core Software meeting) and decide how we want to deal with RooFit's vectorized backends in CMS.

@makortel
Contributor Author

makortel commented Mar 4, 2024

assign core

@makortel
Contributor Author

makortel commented Mar 4, 2024

type root

@cmsbuild
Contributor

cmsbuild commented Mar 4, 2024

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

cmsbuild commented Mar 4, 2024

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented Mar 4, 2024

A new Issue was created by @makortel.

@makortel, @Dr15Jones, @antoniovilela, @smuzaffar, @sextonkennedy, @rappoccio can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor Author

makortel commented Mar 4, 2024

Copying over comments in cms-sw/cmsdist#9034 (comment)

By @guitargeek

... making the new vectorizing CPU evaluation backend in RooFit the default.

The latter will have a big impact on the users, speeding RooFit likelihood minimizations up by up to a factor 10. The new evaluation backend was carefully validated in the last years, and I have fixed all problems I was aware of.
...
The reason why AVX2 code is executed, is because RooFit ships with the evaluation library compiled multiple times for different SIMD instruction sets. Then at runtime, RooFit dynamically loads the fastest version of the library that is supported by the CPU: https://github.com/root-project/root/blob/master/roofit/batchcompute/src/Initialisation.cxx#L68

In that logic, AVX is preferred over SSE. Is that a problem for CMSSW?
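
The runtime selection described above can be sketched roughly as follows. This is only an illustration of the "load the fastest supported variant" idea, not RooFit's actual code (that lives in the linked Initialisation.cxx); the `CpuFeatures` struct and the `pickBackend` function are hypothetical names, and a real implementation would fill the feature flags from CPUID rather than by hand.

```cpp
#include <string>

// Hypothetical description of what the CPU supports. A real implementation
// would query the hardware (e.g. via CPUID) instead of setting these by hand.
struct CpuFeatures {
    bool sse41 = false;
    bool avx = false;
    bool avx2 = false;
    bool avx512 = false;
};

// Return the name of the widest backend the CPU supports, falling back to
// "generic" when none of the SIMD extensions is available.
std::string pickBackend(const CpuFeatures& cpu) {
    if (cpu.avx512) return "AVX-512";
    if (cpu.avx2) return "AVX2";
    if (cpu.avx) return "AVX";      // AVX is preferred over SSE4.1
    if (cpu.sse41) return "SSE4.1";
    return "generic";
}
```

With this ordering, a CPU that reports both SSE4.1 and AVX ends up on the AVX variant, which is the behavior the question above is about.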

Reply by @makortel

The reason why AVX2 code is executed, is because RooFit ships with the evaluation library compiled multiple times for different SIMD instruction sets. Then at runtime, RooFit dynamically loads the fastest version of the library that is supported by the CPU: https://github.com/root-project/root/blob/master/roofit/batchcompute/src/Initialisation.cxx#L68
In that logic, AVX is preferred over SSE. Is that a problem for CMSSW?

We would generally want to be in full control of the vectorization target (or as close as we can get). Our baseline is still SSE3, but there is ongoing work towards deploying a "multi-architecture" build of CMSSW (plus some select externals); more information is in cms-sw/cmssw#43652.

We have some exceptions to this general approach:

  • Tensorflow (and I believe also ONNX) is allowed to use its more dynamic mechanisms for wider-than-SSE3 vectorization targets
  • We don't try to prevent any dynamic behavior of glibc

With Tensorflow we have had quite a bit of trouble, mostly but not only in special cases (some of the story is recorded in cms-sw/cmssw#42444 and other issues linked there). On a somewhat related note, cms-sw/cmssw#44188 shows some "fun" we are currently dealing with in Eigen (which I hope is not very relevant for our use of ROOT).

I see there is already a way for a user to select the target binary, so minimally we could use that. Do I understand correctly that SSE3 would correspond to generic?

I'm quite sure CMS would e.g. want to skip the original AVX implementation because of the frequency scaling behavior of that era of CPUs.
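
As a rough illustration of what such a policy could look like, the sketch below picks the widest supported backend that is not on an exclusion list, with an optional forced choice for full control. Everything here is hypothetical (the `pickBackendWithPolicy` helper, the string names, the idea of an exclusion set); it does not reflect any existing RooFit or CMSSW interface.

```cpp
#include <set>
#include <string>
#include <vector>

// Backends ordered from widest to narrowest, following the list in this issue.
// "generic" is assumed to run on every CPU and is used as the fallback.
const std::vector<std::string> kBackendsWidestFirst = {
    "AVX-512", "AVX2", "AVX", "SSE4.1"};

// Hypothetical policy helper:
//  - if `forced` is non-empty, return it as-is (the caller is responsible for
//    making sure the CPU can actually run it);
//  - otherwise return the widest backend that the CPU supports and that the
//    policy does not exclude, or "generic" if there is none.
std::string pickBackendWithPolicy(const std::set<std::string>& cpuSupports,
                                  const std::set<std::string>& excluded,
                                  const std::string& forced = "") {
    if (!forced.empty()) return forced;
    for (const auto& name : kBackendsWidestFirst) {
        if (cpuSupports.count(name) != 0 && excluded.count(name) == 0) {
            return name;
        }
    }
    return "generic";
}
```

With "AVX" excluded, a CPU that supports only SSE4.1 and AVX would fall back to SSE4.1, while a CPU that also supports AVX2 would still get AVX2.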

Anyway, I think in CMS we need to discuss more how we want to deal with the by-default dynamic behavior of RooFit. What kind of guarantees for reproducibility of the fit results does RooFit give between different vectorization targets?
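
To illustrate why the reproducibility question matters in general: floating-point addition is not associative, so a reduction split over a different number of SIMD lanes can round differently. The sketch below mimics that with a hypothetical `laneSum` helper; it says nothing about how RooFit actually performs its reductions or what guarantees it gives.

```cpp
#include <cstddef>
#include <vector>

// Sum `values` the way a SIMD unit with `lanes` lanes might: element i is
// accumulated into the partial sum of lane (i % lanes), and the lane sums are
// combined at the end. With lanes == 1 this is a plain left-to-right sum.
double laneSum(const std::vector<double>& values, std::size_t lanes) {
    std::vector<double> partial(lanes, 0.0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        partial[i % lanes] += values[i];
    }
    double total = 0.0;
    for (double p : partial) {
        total += p;
    }
    return total;
}
```

For the input {1e16, 1.0, -1e16, 1.0} in IEEE double precision, one lane gives 1 (the first 1.0 is lost when added to 1e16), whereas two lanes give 2 (the large terms cancel in one lane and the small terms add up in the other).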

Projects
Status: Work in CMS