-
Notifications
You must be signed in to change notification settings - Fork 942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable third party library integration tests in CI with cudf.pandas
#17936
Enable third party library integration tests in CI with cudf.pandas
#17936
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this ensure that the tests are run with with the changes to cudf in the current PR? I'm not sure it does.
Good call Matt. We will probably have to update https://github.com/rapidsai/cudf/blob/branch-25.04/ci/cudf_pandas_scripts/third-party-integration/test.sh to support downloading CI artifacts like in other jobs. |
/okay to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Prem! This looks good to me once CI passes
def wrap_from_pandas(original_call): | ||
@functools.wraps(original_call) | ||
def wrapped_from_pandas(obj, *args, **kwargs): | ||
if is_proxy_object(obj): | ||
obj = obj.as_gpu_object() | ||
return obj | ||
return original_call(obj, *args, **kwargs) | ||
|
||
return wrapped_from_pandas | ||
|
||
|
||
def wrap_from_pandas_dataframe(original_call): | ||
@functools.wraps(original_call) | ||
def wrapped_from_pandas_dataframe(dataframe, *args, **kwargs): | ||
if is_proxy_object(dataframe): | ||
dataframe = dataframe.as_gpu_object() | ||
if isinstance(dataframe, cudf.DataFrame): | ||
return dataframe | ||
return original_call(dataframe, *args, **kwargs) | ||
|
||
return wrapped_from_pandas_dataframe | ||
|
||
|
||
def wrap_from_pandas_series(original_call): | ||
@functools.wraps(original_call) | ||
def wrapped_from_pandas_series(s, *args, **kwargs): | ||
if is_proxy_object(s): | ||
s = s.as_gpu_object() | ||
if isinstance(s, cudf.Series): | ||
return s | ||
return original_call(s, *args, **kwargs) | ||
|
||
return wrapped_from_pandas_series | ||
|
||
|
||
def wrap_from_pandas_index(original_call): | ||
@functools.wraps(original_call) | ||
def wrapped_from_pandas_index(index, *args, **kwargs): | ||
if is_proxy_object(index): | ||
index = index.as_gpu_object() | ||
if isinstance(index, cudf.core.index.BaseIndex): | ||
return index | ||
return original_call(index, *args, **kwargs) | ||
|
||
return wrapped_from_pandas_index | ||
|
||
|
||
def wrap_from_pandas_multiindex(original_call): | ||
@functools.wraps(original_call) | ||
def wrapped_from_pandas_multiindex(multiindex, *args, **kwargs): | ||
if is_proxy_object(multiindex): | ||
multiindex = multiindex.as_gpu_object() | ||
if isinstance(multiindex, cudf.MultiIndex): | ||
return multiindex | ||
return original_call(multiindex, *args, **kwargs) | ||
|
||
return wrapped_from_pandas_multiindex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this work?
def wrap_from_pandas(original_call, type_=None):
@functools.wraps(original_call)
def wrapped(obj, *args, **kwargs)
if is_proxy_object(obj):
obj = obj.as_gpu_object()
if (type_ is None) or isinstance(obj, type_):
return obj
return original_call(obj, *args, **kwargs)
return wrapped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem will arise if some passed keywords like cudf.DataFrame.from_pandas(dataframe=df)
, cudf.Series.from_pandas(s=ser)
. Your suggestion was the way I initially wanted to go by but all the functions have different keyword argument names so I had to separate out the implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could presumably handle that in kwargs
in the unified wrapper? This isn't a blocker btw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That approach might get very complicated with a bunch of if/elif's to handle all possible combinations of each constructor.
cudf.from_pandas = wrap_from_pandas(_original_from_pandas) | ||
cudf.DataFrame.from_pandas = wrap_from_pandas_dataframe( | ||
_original_DataFrame_from_pandas | ||
) | ||
cudf.Series.from_pandas = wrap_from_pandas_series( | ||
_original_Series_from_pandas | ||
) | ||
cudf.BaseIndex.from_pandas = wrap_from_pandas_index( | ||
_original_Index_from_pandas | ||
) | ||
cudf.MultiIndex.from_pandas = wrap_from_pandas_multiindex( | ||
_original_MultiIndex_from_pandas | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then use functools.partial
cudf.from_pandas = wrap_from_pandas(_original_from_pandas) | |
cudf.DataFrame.from_pandas = wrap_from_pandas_dataframe( | |
_original_DataFrame_from_pandas | |
) | |
cudf.Series.from_pandas = wrap_from_pandas_series( | |
_original_Series_from_pandas | |
) | |
cudf.BaseIndex.from_pandas = wrap_from_pandas_index( | |
_original_Index_from_pandas | |
) | |
cudf.MultiIndex.from_pandas = wrap_from_pandas_multiindex( | |
_original_MultiIndex_from_pandas | |
) | |
cudf.from_pandas = partial(wrap_from_pandas, type=None) | |
cudf.DataFrame.from_pandas = partial(wrap_from_pandas, type_=cudf.DataFrame) | |
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as: #17936 (comment)
def wrap_from_pandas_dataframe(original_call): | ||
@functools.wraps(original_call) | ||
def wrapped_from_pandas_dataframe(dataframe, *args, **kwargs): | ||
if is_proxy_object(dataframe): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For wrapping these classmethod
s, wouldn't dataframe
be cls
here? We want to be extracting cudf.pandas
objects which might be in args
or kwargs
?
It would be nice to match the method signatures exactly as well if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, should we plan on deprecating the arguments to name them in a unified fashion? or will that be too much noise for the warnings and rather should it just be a breaking change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually just meant making the wrapped function signature here match the <obj>.from_pandas
signature e.g.
# nan_as_null=False matches what would happen when evaluating the mode.pandas_compatible setting in cudf.pandas
def wrapped_from_pandas_dataframe(cls, dataframe, nan_as_null=False):
...
As for the API change, I suppose we could do that as a breaking change, but I don't think we absolutely need to align the signatures of from_pandas
Yes, on it. |
ci/cudf_pandas_scripts/third-party-integration/run-library-tests.sh
Outdated
Show resolved
Hide resolved
ci/cudf_pandas_scripts/third-party-integration/run-library-tests.sh
Outdated
Show resolved
Hide resolved
@Matt711 The pr is now ready for review. Can you sign off on it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I have a blocking question about the stumpy failures because they seem new to me.
@@ -30,13 +30,15 @@ def dask_client(): | |||
yield dask_client | |||
|
|||
|
|||
@pytest.mark.skip(reason="TODO: Fix these stumpy tests to work with dask") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen these stumpy failures before. Is the monkey patching in this PR causing these failures?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind checking if you haven't? It it is, I'd rather xfail the cugraph tests in this PR (which I think are passing now). And follow-up and add the monkey patching in a way that the integration tests pass (cugraph and stumpy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong about the monkeypatching causing the stumpy failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not being caused by monkey-patching from_pandas
in this PR. The stumpy distributed failures seem to be related to some ptx failures. I'm investigating it, but I didn't want them to block this PR. Let's proceed with this PR and then tackle the stumpy failures separately?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, thanks for looking into it!
@@ -262,7 +262,7 @@ dependencies: | |||
packages: | |||
- pip | |||
- pip: | |||
- ibis-framework[pandas] | |||
- ibis-framework[pandas]<10.0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine for now, I'll remove it in #17945
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was my intention behind this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @galipremsagar! Matt R had a comment about making the classmethods match the signatures of the methods they're patching. But I think that can be added in a follow up.
Responded here: #17936 (comment) |
/merge |
Description
This PR enables 3rd party library integration tests that are run with
cudf.pandas
enabled.Fixes: rapidsai/cuml#6301
Checklist