CI: oneAPI SYCL for AMD GPUs #3341

nmnobre · 2023-06-02T18:20:27Z

This PR demonstrates how we could use Intel's SYCL compiler to target AMD and Nvidia GPUs.
I understand this goes against AMReX's philosophy of using each vendor's software to target each vendor's hardware.
For that reason, I'm taking a proof-of-concept, here's-what-we-can-do approach, similar to #3184, without making any changes to the officially supported compilation flows. The objective is to spark discussion, to pique your interest, and for me to better understand whether this is something you'd ultimately be interested in. :-)

Adding support for Nvidia GPUs is quite trivial, mostly because the support in DPC++ is more mature.
On the other hand, adding support for AMD GPUs requires avoiding std:: functions in device code, and using their sycl:: equivalents instead, which would imply reverting changes we made in the past.

Lastly, the public CI workflows don't seem to actually run any SYCL code; they are limited to compilation tests. It'd be interesting to execute this code on actual hardware to assess correctness and performance. I'm fairly confident, as I've been testing SYCL with the Electromagnetic PIC tutorial for a while now on both AMD and Nvidia GPUs, but that doesn't stress the entirety of AMReX's code...

Let me know what you think :-)
-Nuno

ax3l · 2023-06-03T00:18:32Z

.github/workflows/dependencies/codeplay/LICENSE.md

+CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS WITH THE
+SOFTWARE.


Below:

Can you download those huge installer files in .github/workflows/dependencies/codeplay/ on the fly?
If we plan to merge this, we would not commit 3MB files to AMReX's git repo, because that makes downloads very slow for downstream users in superbuilds :)

nmnobre · 2023-06-03

Yes, totally agree.
At the moment, you need to create a token and associate a list of IPs with it; the token then allows you to download on demand using curl/wget. Codeplay say we can contact them to lift the IP whitelist restriction so that any client can download the plugins. See here. Unfortunately, these plugins are still quite tied to oneAPI releases, but they do tend to be released within one or two days of the respective oneAPI release.

ax3l · 2023-06-05

I see. So you use the GitHub action IP range for this?

If you like, we can add a secret environment variable for this in the project that you can pick up in the scripts.
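
As a rough illustration of the "secret environment variable" idea, a download helper could refuse to run when the secret is missing and otherwise hand it to curl. Everything named here is an assumption for the sake of the sketch: `PLUGIN_DOWNLOAD_TOKEN`, the `download_plugin` helper, and in particular the way the token is presented to the server (a bearer header); the thread does not say how Codeplay expects the token to be passed.

```bash
#!/usr/bin/env bash
# Sketch only. PLUGIN_DOWNLOAD_TOKEN and download_plugin are hypothetical names;
# sending the token as a bearer header is an assumption, not Codeplay's documented API.

download_plugin() {
    local url="$1" out="$2"

    # Fail early, without touching the network, if the secret was not provided.
    if [ -z "${PLUGIN_DOWNLOAD_TOKEN:-}" ]; then
        echo "PLUGIN_DOWNLOAD_TOKEN is not set; cannot download ${url}" >&2
        return 1
    fi

    # --fail turns HTTP errors into a non-zero exit status,
    # --location follows redirects, --retry repeats transient failures.
    curl --fail --location --retry 3 \
         --header "Authorization: Bearer ${PLUGIN_DOWNLOAD_TOKEN}" \
         --output "${out}" "${url}"
}
```

In a GitHub Actions workflow, the variable would typically be populated from a repository secret through the step's `env:` mapping, so the token never appears in the committed scripts.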

ax3l · 2023-06-03T00:21:59Z

Hi @nmnobre,

Thank you for this, very exciting!

I would not say this goes against any of our philosophy :)
If we are able to support targeting the same platforms with different acceleration backends in a sustainable way, then this adds to the diversity of supported toolsets, more research tools, more performance tools, more resilience, enables apples-to-apples comparisons of backends across vendors, and overall more awesomeness.

ax3l · 2023-06-03T00:22:19Z

.github/workflows/dependencies/dependencies_hip.sh

@@ -20,7 +20,7 @@ echo 'Acquire::Retries "3";' | sudo tee /etc/apt/apt.conf.d/80-retries
 # Ref.: https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation-Guide.html#ubuntu
 curl -O https://repo.radeon.com/rocm/rocm.gpg.key
 sudo apt-key add rocm.gpg.key
-echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/debian/ ubuntu main' \
+echo 'deb [arch=amd64] https://repo.radeon.com/rocm/apt/5.4.3/ ubuntu main' \


Unrelated change?

nmnobre · 2023-06-03

On the contrary. The AMD plug-in only supports version 5.4.3. Using 5.5.x leads to some opaque-pointer problems. Again, unfortunately these plug-ins are quite tied to specific versions of their dependencies... for the time being at least.

WeiqunZhang · 2023-06-19

In that case, we need to have a new script or a new optional argument for installing an older version. This script is already used by other jobs that rely on the latest ROCm release available (5.5 at the moment).
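
The "optional argument" option could look roughly like the sketch below. `rocm_apt_line` is a hypothetical helper, not the code that was merged: with no argument it reproduces the repository line from before this change (the `debian` path, which per the comment above resolves to the latest release), and with an argument it pins a specific release such as 5.4.3.

```bash
#!/usr/bin/env bash
# Sketch only: rocm_apt_line is a hypothetical helper used to illustrate the idea.

rocm_apt_line() {
    # Default to the "debian" path used by the existing jobs;
    # an explicit first argument selects a pinned ROCm release instead.
    local version="${1:-debian}"
    echo "deb [arch=amd64] https://repo.radeon.com/rocm/apt/${version}/ ubuntu main"
}

# Existing jobs keep their behavior; the oneAPI-on-AMD job asks for 5.4.3.
rocm_apt_line
rocm_apt_line 5.4.3
```

Defaulting to the old path means jobs that call the script without arguments are unaffected, which addresses the concern about other workflows sharing this file.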

ax3l · 2023-06-03T00:24:42Z

Why did you need to change all the std:: math to amrex::Math::? 😅

Is this the main change of this PR to make the compiler stack work? (Would be totally fine, imho.)

nmnobre · 2023-06-03T00:46:57Z

> Why did you need to change all the std:: math to amrex::Math::? 😅
>
> Is this the main change of this PR to make the compiler stack work? (Would be totally fine, imho.)

Because, unfortunately, loads of C++ standard library functions are unsupported in device code for the AMD/HIP backend. This is indeed the most disruptive change.

WeiqunZhang · 2023-06-05T18:05:33Z

We have heard that a few AMReX applications have successfully run on A100 using SYCL without changes in amrex/Src/. So let's not worry about SYCL on AMD GPUs for now until the support for it is more mature. Once we figure out the plugin download issue, we can add a CI for oneAPI on Nvidia GPUs.

nmnobre · 2023-06-15T19:48:06Z

Hi @WeiqunZhang,

Would this work for you for the time being?
I'm skipping all tests to avoid any linking and, thus, any problems with the CXX stdlib functions.

I've confirmed today that I was simply fortunate: the ElectromagneticPIC tutorial never uses any of the AMReX functionality that calls into std functions within device code. That's why I never faced this problem...

I hear your concerns about the allowed subgroup sizes, I agree with them :)
But let's first see if you are happy with this test, before we proceed to that problem (in any case, we already do a runtime check, so we aren't totally in the dark).

Thank you!

WeiqunZhang · 2023-06-19T06:06:34Z

Sounds good. Left a comment about the script installing rocm.



ax3l reviewed Jun 3, 2023

ax3l requested a review from WeiqunZhang June 3, 2023 00:26

ax3l added the GPU and install labels Jun 3, 2023

nmnobre mentioned this pull request Jun 6, 2023

CI: oneAPI SYCL for Nvidia GPUs #3351

Closed

nmnobre marked this pull request as draft June 6, 2023 16:06

nmnobre force-pushed the oneapi_beyond branch from e41431d to e6d828b June 15, 2023 19:35

nmnobre changed the title from "[WIP/POC] CI: oneAPI SYCL for AMD and Nvidia GPUs" to "CI: oneAPI SYCL for AMD GPUs" Jun 15, 2023

nmnobre added 4 commits June 27, 2023 11:35

Allow picking the ROCm version to use in CI workflows

31f8e07

Download and install oneAPI for AMD GPUs

e407286

Allow choosing 64 as the SYCL subgroup size at compile time

6cceaa8

Note the SYCL subgroup size will still be subject to a runtime check

Add workflow for oneAPI SYCL on AMD GPUs

31c859c

nmnobre force-pushed the oneapi_beyond branch from e6d828b to 31c859c June 27, 2023 10:37

nmnobre mentioned this pull request Jun 27, 2023

CI: Remove redundancy in nvcc dependencies files #3387

Merged

nmnobre marked this pull request as ready for review June 27, 2023 12:28

WeiqunZhang approved these changes Jun 27, 2023

WeiqunZhang merged commit 0236a37 into AMReX-Codes:development Jun 27, 2023
