Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cuDF numba cuda 12 updates #13337

Merged

Conversation

brandon-b-miller
Copy link
Contributor

@brandon-b-miller brandon-b-miller commented May 11, 2023

Summary of changes:

  • Removed some old code that is only used for numba<0.54 which hasn't been supported for a while now.
  • Removed some old code that is only used when cubinlinker is not present, which is has been a hard requirement for a while now as well.
  • Created a file _numba.py and moved into this file all of the machinery used to configure numba upon cuDF import. This includes functions for determining which toolkit version was used to build the PTX file our UDFs rely on as well as the functions for potentially putting numba into MVC mode if necessary.
  • Created a file _ptxcompiler.py which vendors the driver/runtime version checking machinery from ptxcompiler in case we're in a cuda 12 environment that doesn't have it
  • Changed the code to issue a warning in cuda 12+ MVC situations that the library will likely not work
  • The version of the toolkit used to determine if MVC is required is now determined from the cc=60 PTX file which is always built. This is to avoid needing to query the device compute capability through numba's cuda module. This needs to be avoided during numba's setup because if numba.cuda is imported before numba's config is modified, the config options will have no effect.

Closes #13351
Closes #13339

@brandon-b-miller brandon-b-miller added feature request New feature or request numba Numba issue non-breaking Non-breaking change python labels May 11, 2023
@brandon-b-miller brandon-b-miller self-assigned this May 11, 2023
@github-actions github-actions bot added the Python Affects Python cuDF API. label May 11, 2023

from numba import config

ANY_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally there's a better way of doing this rather than relying on a relative path.

Copy link
Contributor

@bdice bdice May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A relative path is appropriate but use os.path.join and not string concatenation. You could also import cudf or a neighboring module and compute the path relative to that?

# cc=60 ptx is always built
cc = int(os.environ.get("STRINGS_UDF_CC", "60"))
else:
from numba import cuda
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime imports can be expensive but importing cuda must be avoided as a side effect of importing the module in which this function resides. Another option could be passing the cuda module as an optional argument here.

@brandon-b-miller brandon-b-miller marked this pull request as ready for review May 15, 2023 12:40
@brandon-b-miller brandon-b-miller requested review from a team as code owners May 15, 2023 12:40
@brandon-b-miller brandon-b-miller requested review from vyasr and shwina May 15, 2023 12:40
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. I have attached comments.

dependencies.yaml Show resolved Hide resolved
python/cudf/cudf/__init__.py Outdated Show resolved Hide resolved

from numba import config

ANY_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx"
Copy link
Contributor

@bdice bdice May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A relative path is appropriate but use os.path.join and not string concatenation. You could also import cudf or a neighboring module and compute the path relative to that?

python/cudf/cudf/utils/_numba_setup.py Outdated Show resolved Hide resolved
python/cudf/cudf/utils/_numba_setup.py Outdated Show resolved Hide resolved
sm_number = file_name.rstrip(".ptx").lstrip(prefix)
if sm_number.endswith("a"):
processed_sm_number = int(sm_number.rstrip("a"))
if processed_sm_number == cc:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m confused why we follow a different logical path for compute capabilities ending in “a.” They’re not fundamentally different, just differently named.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this logic accounts for a multitude of different build configurations for both CI and local builds from source. I am happy to follow up on this but given that this function is only being moved I'm hesitant to anything that could perturb it as part of this PR.

python/cudf/cudf/utils/_numba_setup.py Outdated Show resolved Hide resolved
driver_version, runtime_version, ptx_toolkit_version
):
# Numba thinks cubinlinker is only needed if the driver is older than
# the ctk, but when PTX files are present, it might also need to patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanting to be precise in our comments here. Do we specifically mean the runtime rather than the “toolkit”?

Suggested change
# the ctk, but when PTX files are present, it might also need to patch
# the CUDA runtime, but when PTX files are present, it might also need to patch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The toolkit component of concern is NVVM. We only use the runtime to check what version NVVM is (assuming if we find the runtime, that it's the same version as NVVM) because NVVM provides no way to check what version it is. I find it less confusing to refer to "the toolkit" as it's the collection of components including the runtime and NVVM, amongst other things.

python/cudf/cudf/utils/_numba_setup.py Outdated Show resolved Hide resolved
"7.6": (11, 6),
"7.7": (11, 7),
"7.8": (11, 8),
"8.0": (12, 0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to expand this to include 12.1? Users could run CUDA 12 pip wheels on 12.1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to fall back to parsing the line “Cuda compilation tools, release 11.6” if this mapping fails? I know it might be less safe than the current solution but I wish we could use that to eliminate the need for a specific mapping that must be updated with CUDA releases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this sentiment. Similarly to above however I think it's better to tackle this particular update separately as this function is a move. That said the change should be simple and I can commit to having it in for this release.

@bdice bdice mentioned this pull request May 22, 2023
python/cudf/cudf/__init__.py Outdated Show resolved Hide resolved
python/cudf/cudf/__init__.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/udf/utils.py Outdated Show resolved Hide resolved
CC_60_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx"
_NO_DRIVER = (math.inf, math.inf)

CMD = """\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a descriptive name.

Suggested change
CMD = """\
CHECK_CUDA_VERSION_CMD = """\

python/cudf/cudf/utils/_numba.py Show resolved Hide resolved
python/cudf/cudf/utils/_numba.py Show resolved Hide resolved
This function is mostly vendored from ptxcompiler and is used
to check the system CUDA driver and runtime versions in its absence.
"""
cp = subprocess.run([sys.executable, "-c", CMD], capture_output=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ptxcompiler use a subprocess currently? I am surprised by this because this tends to be costly. If possible, we should avoid launching a subprocess at cudf import time.

Copy link
Contributor

@bdice bdice May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to use any other tool (cuda-python, rmm, ...) to know this version information? I assume the goal of using a subprocess is to avoid importing numba.cuda in this process...

Copy link
Contributor Author

@brandon-b-miller brandon-b-miller May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ptxcompiler uses a subprocess. This command is vendored directly from it here. While doing all of our setup before importing numba.cuda is one goal, another requirement is that the process for getting the versions doesn't cuinit, otherwise it can interfere with the way dask initializes its networking. More generally we need to avoid cuinit fully during import.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside it would be nice if we had some kind of small independent package we could list as a dependency that served as a source of truth for the driver and runtime versions, which we could also configure to avoid forking a subprocess, calling cuinit, etc. This would avoid a bunch of reinventing the wheel across cuda-python, numba, rmm, etc....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe cuda-python is supposed to be just this, but in practice we’ve seen various issues (not providing the right runtime version, among others). I am not sure but recent cuda-python versions may be better at this?

Copy link
Contributor Author

@brandon-b-miller brandon-b-miller May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried putting a

from cuda import cuda, cudart
cuda.cuDriverGetVersion()
cudart.cudaRuntimeGetVersion()

At the top of cudf's __init__.py and ran test_no_cuinit.py, which passed. So I suppose it's a clean approach from the cuInit perspective. Happy to refactor as such here and see if things work.

RMM does seem to list some kind of "limitation" here and falls back to numba, but I'm not sure what it means exactly
https://github.com/rapidsai/rmm/blob/branch-23.06/python/rmm/_cuda/gpu.py#L81-L85

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this is referring to issue ( NVIDIA/cuda-python#16 )

Namely the version cuda-python returns is hard-coded at build time as opposed to querying it at run time. So the version it returns may not reflect what the user has installed on their system (and will instead reflect how the binary was built). The issue explains why this is happening currently (though it still presents an issue with intended usage)

There's more details in the threads in PR ( rapidsai/rmm#946 ) as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To close the loop on this I decided in 5cb0ce6 to just vendor the three functions to _ptxcompiler.py.

To summarize the way it works now, the user either has ptxcompiler or doesn't. If they do, use it to get the versions (this launches a subprocess). If they don't, use the vendored functions to do so (this also launches the same subprocess using the same command).

We are constrained to use a subprocess for three reasons:

  1. We can not call cuinit on cudf's import without interfering with dask.
  2. We can not directly use numba.cuda to obtain the versions because we need to finish configuring numba before numba.cuda is imported.
  3. We can not utilize cuda.cuda due to constraint described above

Even with the approach above we need to leave in place a mechanism to disable the spawning of a subprocess in HPC environments where it is not safe to do so. In this case the user may provide an environment variable which disables the check and two more environment variables specifying the driver and runtime versions manually, and these are parsed and returned instead. These environment variables are the same ones, between ptxcompiler and our vendored version.

@brandon-b-miller
Copy link
Contributor Author

I moved a little of the logic around here while vendoring the few extra pieces of ptxcompiler I think we need to keep things safe to _ptxcompiler.py. I think the way the logic reads is a lot simpler now and I'm hoping one more round of review should be enough to merge this.


from numba import config

CC_60_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to use os.path.join.

Copy link
Member

@ajschmidt8 ajschmidt8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving ops-codeowner file changes


NO_DRIVER = (math.inf, math.inf)

CMD = """\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name this a proper name? NUMBA_CHECK_VERSION_CMD or similar.

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me. I discussed with @brandon-b-miller and I feel like I understand more of the constraints. I am not sure that we have the best solution but I'm not sure how to improve it at this point.

I requested that we run a manual test, installing the CUDA 12 wheel from the CI artifacts on a system with driver 12.0 and a Docker image with runtime 12.1. This is supposed to raise a warning, I think, and I'd like to see that warning occur in manual testing (we don't have a CI configuration where this can be tested). @brandon-b-miller Please report back the result or let me know if you need help doing this.

@brandon-b-miller
Copy link
Contributor Author

brandon-b-miller commented May 23, 2023

Looks fine to me. I discussed with @brandon-b-miller and I feel like I understand more of the constraints. I am not sure that we have the best solution but I'm not sure how to improve it at this point.

I requested that we run a manual test, installing the CUDA 12 wheel from the CI artifacts on a system with driver 12.0 and a Docker image with runtime 12.1. This is supposed to raise a warning, I think, and I'd like to see that warning occur in manual testing (we don't have a CI configuration where this can be tested). @brandon-b-miller Please report back the result or let me know if you need help doing this.

I have tested this with a bare-metal machine with

NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0

And the 12.1 toolkit, verified with nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

With this I obtain the warning I expect

UserWarning: Using CUDA toolkit version (12, 1) with CUDA driver version (12, 0) requires minor version compatibility, which is not yet supported for CUDA driver versions 12.0 and above. It is likely that many cuDF operations will not work in this state. Please install CUDA toolkit version (12, 0) to continue using cuDF

As well as the error I expect when performing an operation that requires a numba kernel such as

>>> cudf.Series([1,2,3])[0]

From which I get

ptxas application ptx input, line 9; fatal   : Unsupported .version 8.1; current version is '8.0'

So things check out from my end.

Copy link
Contributor

@gmarkall gmarkall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. The vendoring of small parts of the ptxcompiler patch module look fine.

@brandon-b-miller
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 12acf92 into rapidsai:branch-23.06 May 23, 2023
@shwina
Copy link
Contributor

shwina commented May 23, 2023

/merge

@jakirkham
Copy link
Member

Thanks all! 🙏

Really impressive work! 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Non-breaking change numba Numba issue Python Affects Python cuDF API.
Projects
None yet
8 participants