Detection of ROCM vs CUDA device on Clariden #1447

Closed
edopao opened this issue Feb 8, 2024 · 4 comments · Fixed by #1448
Labels
gt4py.next Issues concerning the new version with support for non-cartesian grids. module: backend Related to analysis/backend subpackages

Comments

@edopao
Contributor

edopao commented Feb 8, 2024

I encountered an issue when running GT4Py with the gtfn_gpu backend on Clariden. I am running on a GPU node with an Nvidia A100, but this code selects the ROCM CuPy device:

CUPY_DEVICE: Final[Literal[None, core_defs.DeviceType.CUDA, core_defs.DeviceType.ROCM]] = (
    None
    if not cp
    else (core_defs.DeviceType.ROCM if cp.cuda.get_hipcc_path() else core_defs.DeviceType.CUDA)
)

I suspect that CUDA was installed on Clariden with support for both Nvidia and AMD GPUs, depending on the type of node allocated by Slurm.

You can run this test:

pytest -s -v -k gtfn_gpu tests/next_tests/integration_tests/multi_feature_tests/ffront_tests/test_icon_like_scan.py::test_solve_nonhydro_stencil_52_like

It will produce this output:

        if self.device_type == core_defs.DeviceType.ROCM:
            # until we can rely on dlpack
>           ndarray.__hip_array_interface__ = {  # type: ignore[attr-defined]
                "shape": ndarray.shape,  # type: ignore[union-attr]
                "typestr": ndarray.dtype.descr[0][1],  # type: ignore[union-attr]
                "descr": ndarray.dtype.descr,  # type: ignore[union-attr]
                "stream": 1,
                "version": 3,
                "strides": ndarray.strides,  # type: ignore[union-attr, attr-defined]
                "data": (ndarray.data.ptr, False),  # type: ignore[union-attr, attr-defined]
            }
E           AttributeError: 'ndarray' object has no attribute '__hip_array_interface__'

src/gt4py/storage/allocators.py:270: AttributeError
================================================================= short test summary info ==================================================================
ERROR tests/next_tests/integration_tests/multi_feature_tests/ffront_tests/test_icon_like_scan.py::test_solve_nonhydro_stencil_52_like[gtfn.run_gtfn_gpu] - AttributeError: 'ndarray' object has no attribute '__hip_array_interface__'
@edopao edopao added gt4py.next Issues concerning the new version with support for non-cartesian grids. module: backend Related to analysis/backend subpackages labels Feb 8, 2024
@havogt
Contributor

havogt commented Feb 8, 2024

Does the environment have an installation of cupy-rocm or just cupy-cuda? When we wrote that code, there was no clean or documented way to have CuPy for both GPU types. Not sure if that has changed.

@edopao
Contributor Author

edopao commented Feb 8, 2024

cupy-cuda11x 13.0.0

>>> import cupy as cp
>>> cp.cuda.get_hipcc_path()
'/usr/bin/hipcc'

@havogt
Contributor

havogt commented Feb 8, 2024

Maybe we should use this variable instead: cp.cuda.runtime.is_hip

@edopao
Contributor Author

edopao commented Feb 8, 2024

Yes, that seems to work!
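
For readers following along, here is a minimal sketch of what the suggested check could look like. It is not the code from the linked pull request. The `DeviceType` enum is a hypothetical stand-in for `core_defs.DeviceType`, `detect_cupy_device` is an illustrative helper name, and `fake_cupy` is a mock that imitates the setup reported above (a CUDA build of CuPy on a system where `hipcc` is also installed), so the snippet runs without a GPU.

```python
import enum
from types import SimpleNamespace
from typing import Any, Optional


class DeviceType(enum.Enum):
    """Hypothetical stand-in for core_defs.DeviceType."""

    CUDA = "cuda"
    ROCM = "rocm"


def detect_cupy_device(cp: Optional[Any]) -> Optional[DeviceType]:
    """Return the device type CuPy was built for, or None if CuPy is missing."""
    if cp is None:
        return None
    # is_hip describes the CuPy build itself, whereas get_hipcc_path() only
    # reports whether a hipcc compiler can be found on the system.
    return DeviceType.ROCM if cp.cuda.runtime.is_hip else DeviceType.CUDA


# Mock of the reported environment: CUDA build of CuPy, hipcc present on disk.
fake_cupy = SimpleNamespace(
    cuda=SimpleNamespace(
        runtime=SimpleNamespace(is_hip=False),
        get_hipcc_path=lambda: "/usr/bin/hipcc",
    )
)

print(detect_cupy_device(fake_cupy))
```

With the mock, the old check would have picked ROCM because `get_hipcc_path()` returns a non-empty path, while the `is_hip` check picks CUDA. In real code, the actual `cupy` module (or `None` when the import fails) would be passed in place of the mock.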

@edopao edopao linked a pull request Feb 8, 2024 that will close this issue

2 participants