-
Notifications
You must be signed in to change notification settings - Fork 348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: oneAPI SYCL for AMD GPUs #3341
CI: oneAPI SYCL for AMD GPUs #3341
Conversation
CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE | ||
SOFTWARE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below:
Can you download those huge installer files in .github/workflows/dependencies/codeplay/
on the fly?
If we plan to merge this, we would not commit 3MB files to AMReX' git repo, because it makes downloads for downstream users in superbuilds very slow :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, totally agree.
At the moment, you need to create a token, to which you associate a list of IPs, which then allows you to download on demand using curl/wget. Codeplay say we can contact them to lift the IP whitelist restriction to allow any client to download the plugins. See here. Unfortunately, these plugins are still quite tied to oneAPI releases, but they do tend to be released within one or two days of the respective oneAPI release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So you use the GitHub action IP range for this?
If you like, we can add a secret environment variable for this in the project that you can pick up in the scripts.
Hi @nmnobre, Thank you for this, very exciting! I would not say this goes against any of our philosophy :) |
@@ -20,7 +20,7 @@ echo 'Acquire::Retries "3";' | sudo tee /etc/apt/apt.conf.d/80-retries | |||
# Ref.: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html#ubuntu | |||
curl -O https://repo.radeon.com/rocm/rocm.gpg.key | |||
sudo apt-key add rocm.gpg.key | |||
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' \ | |||
echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/5.4.3/ ubuntu main' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the contrary. The AMD plug-in only supports version 5.4.3. Using 5.5.x leads to some opaque pointers problems. Again, unfortunately these plug-ins are quite tied to specific versions of their dependencies... for the time being at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, we need have a new script or a new optional argument for installing an older version. This script has already been used for other jobs that are using the latest rocm release available (5.5 at the moment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Why did you need to change all the Is this the main change of this PR to make the compiler stack work? (Would be totally fine, imho.) |
Because, unfortunately, loads of CXX standard library functions are unsupported on device code for the AMD/HIP backend. This is indeed the most disruptive change. |
We have heard that a few AMReX applications have successfully run on A100 using SYCL without changes in amrex/Src/. So let's not worry about SYCL on AMD GPUs for now until the support for it is more mature. Once we figure out the plugin download issue, we can add a CI for oneAPI on Nvidia GPUs. |
Hi @WeiqunZhang, Would this work for you for the time being? I've confirmed today I was fortunate the ElectromagneticPIC tutorial never uses any of the AMReX functionality calling into std functions within device code. That's why I never faced this problem... I hear your concerns about the allowed subgroup sizes, I agree with them :) Thank you! |
Sounds good. Left a comment about the script installing rocm. |
Note the SYCL subgroup size will still be subject to a runtime check
Hi @ax3l, @WeiqunZhang, I hope you're enjoying your time in Switzerland. :) I've removed some redundancy around the nvcc dependencies files, mostly for consistency with the changes I did to the hip dependencies file in #3341. It should all be pretty self-explanatory. Cheers, -Nuno
Hi @WeiqunZhang and @ax3l,
This PR demonstrates how we could use Intel's SYCL compiler to target AMD and Nvidia GPUs.
I understand this goes against AMReX's philosophy of using each vendor's software to target each vendor's hardware.
For that reason, I'm taking a proof-of-concept, here's-what-we-can-do approach, similarly to #3184, without making any changes to the officially supported compilation flows. The objective is to incite discussion, to lure your interest and for me to better understand if this is something you'd ultimately be interested in or not. :-)
Adding support for Nvidia GPUs is quite trivial, mostly because the support in DPC++ is more mature.
On the other hand, adding support for AMD GPUs requires avoiding std:: functions in device code, and using their sycl:: equivalents instead, which would imply reverting changes we made in the past.
Lastly, the public CI workflows don't seem to actually run any SYCL code, they are limited to compilation tests. It'd be interesting to execute this code on actual hardware to assess correctness and performance. I'm fairly confident as I've been testing SYCL with the Electromagnetic PIC tutorial for a while now on both AMD and Nvidia GPUs, but that doesn't stress the entirety of AMReX's code...
Let me know what you think :-)
-Nuno