-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cuDF numba cuda 12 updates #13337
cuDF numba cuda 12 updates #13337
Conversation
|
||
from numba import config | ||
|
||
ANY_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally there's a better way of doing this rather than relying on a relative path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A relative path is appropriate but use os.path.join
and not string concatenation. You could also import cudf or a neighboring module and compute the path relative to that?
# cc=60 ptx is always built | ||
cc = int(os.environ.get("STRINGS_UDF_CC", "60")) | ||
else: | ||
from numba import cuda |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime imports can be expensive but importing cuda
must be avoided as a side effect of importing the module in which this function resides. Another option could be passing the cuda
module as an optional argument here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. I have attached comments.
|
||
from numba import config | ||
|
||
ANY_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A relative path is appropriate but use os.path.join
and not string concatenation. You could also import cudf or a neighboring module and compute the path relative to that?
sm_number = file_name.rstrip(".ptx").lstrip(prefix) | ||
if sm_number.endswith("a"): | ||
processed_sm_number = int(sm_number.rstrip("a")) | ||
if processed_sm_number == cc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m confused why we follow a different logical path for compute capabilities ending in “a.” They’re not fundamentally different, just differently named.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, this logic accounts for a multitude of different build configurations for both CI and local builds from source. I am happy to follow up on this but given that this function is only being moved I'm hesitant to anything that could perturb it as part of this PR.
driver_version, runtime_version, ptx_toolkit_version | ||
): | ||
# Numba thinks cubinlinker is only needed if the driver is older than | ||
# the ctk, but when PTX files are present, it might also need to patch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanting to be precise in our comments here. Do we specifically mean the runtime rather than the “toolkit”?
# the ctk, but when PTX files are present, it might also need to patch | |
# the CUDA runtime, but when PTX files are present, it might also need to patch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The toolkit component of concern is NVVM. We only use the runtime to check what version NVVM is (assuming if we find the runtime, that it's the same version as NVVM) because NVVM provides no way to check what version it is. I find it less confusing to refer to "the toolkit" as it's the collection of components including the runtime and NVVM, amongst other things.
"7.6": (11, 6), | ||
"7.7": (11, 7), | ||
"7.8": (11, 8), | ||
"8.0": (12, 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to expand this to include 12.1? Users could run CUDA 12 pip wheels on 12.1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we try to fall back to parsing the line “Cuda compilation tools, release 11.6” if this mapping fails? I know it might be less safe than the current solution but I wish we could use that to eliminate the need for a specific mapping that must be updated with CUDA releases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this sentiment. Similarly to above however I think it's better to tackle this particular update separately as this function is a move. That said the change should be simple and I can commit to having it in for this release.
Co-authored-by: Bradley Dice <[email protected]>
…er/cudf into cudf-numba-cuda12-updates
python/cudf/cudf/utils/_numba.py
Outdated
CC_60_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx" | ||
_NO_DRIVER = (math.inf, math.inf) | ||
|
||
CMD = """\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a descriptive name.
CMD = """\ | |
CHECK_CUDA_VERSION_CMD = """\ |
python/cudf/cudf/utils/_numba.py
Outdated
This function is mostly vendored from ptxcompiler and is used | ||
to check the system CUDA driver and runtime versions in its absence. | ||
""" | ||
cp = subprocess.run([sys.executable, "-c", CMD], capture_output=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does ptxcompiler use a subprocess currently? I am surprised by this because this tends to be costly. If possible, we should avoid launching a subprocess at cudf import time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to use any other tool (cuda-python, rmm, ...) to know this version information? I assume the goal of using a subprocess is to avoid importing numba.cuda
in this process...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ptxcompiler
uses a subprocess. This command is vendored directly from it here. While doing all of our setup before importing numba.cuda
is one goal, another requirement is that the process for getting the versions doesn't cuinit
, otherwise it can interfere with the way dask initializes its networking. More generally we need to avoid cuinit
fully during import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside it would be nice if we had some kind of small independent package we could list as a dependency that served as a source of truth for the driver and runtime versions, which we could also configure to avoid forking a subprocess, calling cuinit, etc. This would avoid a bunch of reinventing the wheel across cuda-python, numba, rmm, etc....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe cuda-python is supposed to be just this, but in practice we’ve seen various issues (not providing the right runtime version, among others). I am not sure but recent cuda-python versions may be better at this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried putting a
from cuda import cuda, cudart
cuda.cuDriverGetVersion()
cudart.cudaRuntimeGetVersion()
At the top of cudf
's __init__.py
and ran test_no_cuinit.py
, which passed. So I suppose it's a clean approach from the cuInit
perspective. Happy to refactor as such here and see if things work.
RMM does seem to list some kind of "limitation" here and falls back to numba, but I'm not sure what it means exactly
https://github.com/rapidsai/rmm/blob/branch-23.06/python/rmm/_cuda/gpu.py#L81-L85
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think this is referring to issue ( NVIDIA/cuda-python#16 )
Namely the version cuda-python returns is hard-coded at build time as opposed to querying it at run time. So the version it returns may not reflect what the user has installed on their system (and will instead reflect how the binary was built). The issue explains why this is happening currently (though it still presents an issue with intended usage)
There's more details in the threads in PR ( rapidsai/rmm#946 ) as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To close the loop on this I decided in 5cb0ce6 to just vendor the three functions to _ptxcompiler.py
.
To summarize the way it works now, the user either has ptxcompiler
or doesn't. If they do, use it to get the versions (this launches a subprocess). If they don't, use the vendored functions to do so (this also launches the same subprocess using the same command).
We are constrained to use a subprocess for three reasons:
- We can not call
cuinit
oncudf
's import without interfering with dask. - We can not directly use
numba.cuda
to obtain the versions because we need to finish configuring numba beforenumba.cuda
is imported. - We can not utilize
cuda.cuda
due to constraint described above
Even with the approach above we need to leave in place a mechanism to disable the spawning of a subprocess in HPC environments where it is not safe to do so. In this case the user may provide an environment variable which disables the check and two more environment variables specifying the driver and runtime versions manually, and these are parsed and returned instead. These environment variables are the same ones, between ptxcompiler
and our vendored version.
I moved a little of the logic around here while vendoring the few extra pieces of ptxcompiler I think we need to keep things safe to |
Co-authored-by: Bradley Dice <[email protected]> Co-authored-by: Graham Markall <[email protected]>
python/cudf/cudf/utils/_numba.py
Outdated
|
||
from numba import config | ||
|
||
CC_60_PTX_FILE = os.path.dirname(__file__) + "/../core/udf/shim_60.ptx" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to use os.path.join
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving ops-codeowner
file changes
|
||
NO_DRIVER = (math.inf, math.inf) | ||
|
||
CMD = """\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we name this a proper name? NUMBA_CHECK_VERSION_CMD
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me. I discussed with @brandon-b-miller and I feel like I understand more of the constraints. I am not sure that we have the best solution but I'm not sure how to improve it at this point.
I requested that we run a manual test, installing the CUDA 12 wheel from the CI artifacts on a system with driver 12.0 and a Docker image with runtime 12.1. This is supposed to raise a warning, I think, and I'd like to see that warning occur in manual testing (we don't have a CI configuration where this can be tested). @brandon-b-miller Please report back the result or let me know if you need help doing this.
I have tested this with a bare-metal machine with
And the 12.1 toolkit, verified with
With this I obtain the warning I expect
As well as the error I expect when performing an operation that requires a numba kernel such as >>> cudf.Series([1,2,3])[0] From which I get
So things check out from my end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. The vendoring of small parts of the ptxcompiler patch module look fine.
/merge |
/merge |
Thanks all! 🙏 Really impressive work! 👏 |
Summary of changes:
numba<0.54
which hasn't been supported for a while now.cubinlinker
is not present, which is has been a hard requirement for a while now as well._numba.py
and moved into this file all of the machinery used to configure numba upon cuDF import. This includes functions for determining which toolkit version was used to build the PTX file our UDFs rely on as well as the functions for potentially putting numba into MVC mode if necessary._ptxcompiler.py
which vendors the driver/runtime version checking machinery from ptxcompiler in case we're in a cuda 12 environment that doesn't have itcc=60
PTX file which is always built. This is to avoid needing to query the device compute capability through numba'scuda
module. This needs to be avoided during numba's setup because ifnumba.cuda
is imported before numba's config is modified, the config options will have no effect.Closes #13351
Closes #13339