Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable third party library integration tests in CI with cudf.pandas #17936

Merged
merged 23 commits into from
Feb 10, 2025

Conversation

galipremsagar
Copy link
Contributor

@galipremsagar galipremsagar commented Feb 6, 2025

Description

This PR enables 3rd party library integration tests that are run with cudf.pandas enabled.

Fixes: rapidsai/cuml#6301

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@galipremsagar galipremsagar added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 6, 2025
@galipremsagar galipremsagar self-assigned this Feb 6, 2025
@galipremsagar galipremsagar requested a review from a team as a code owner February 6, 2025 20:22
Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this ensure that the tests are run with with the changes to cudf in the current PR? I'm not sure it does.

@vyasr
Copy link
Contributor

vyasr commented Feb 6, 2025

Good call Matt. We will probably have to update https://github.com/rapidsai/cudf/blob/branch-25.04/ci/cudf_pandas_scripts/third-party-integration/test.sh to support downloading CI artifacts like in other jobs.

@galipremsagar
Copy link
Contributor Author

/okay to test

@galipremsagar galipremsagar requested a review from a team as a code owner February 7, 2025 15:21
@github-actions github-actions bot added Python Affects Python cuDF API. cudf.pandas Issues specific to cudf.pandas labels Feb 7, 2025
Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Prem! This looks good to me once CI passes

Comment on lines +1784 to +1840
def wrap_from_pandas(original_call):
@functools.wraps(original_call)
def wrapped_from_pandas(obj, *args, **kwargs):
if is_proxy_object(obj):
obj = obj.as_gpu_object()
return obj
return original_call(obj, *args, **kwargs)

return wrapped_from_pandas


def wrap_from_pandas_dataframe(original_call):
@functools.wraps(original_call)
def wrapped_from_pandas_dataframe(dataframe, *args, **kwargs):
if is_proxy_object(dataframe):
dataframe = dataframe.as_gpu_object()
if isinstance(dataframe, cudf.DataFrame):
return dataframe
return original_call(dataframe, *args, **kwargs)

return wrapped_from_pandas_dataframe


def wrap_from_pandas_series(original_call):
@functools.wraps(original_call)
def wrapped_from_pandas_series(s, *args, **kwargs):
if is_proxy_object(s):
s = s.as_gpu_object()
if isinstance(s, cudf.Series):
return s
return original_call(s, *args, **kwargs)

return wrapped_from_pandas_series


def wrap_from_pandas_index(original_call):
@functools.wraps(original_call)
def wrapped_from_pandas_index(index, *args, **kwargs):
if is_proxy_object(index):
index = index.as_gpu_object()
if isinstance(index, cudf.core.index.BaseIndex):
return index
return original_call(index, *args, **kwargs)

return wrapped_from_pandas_index


def wrap_from_pandas_multiindex(original_call):
@functools.wraps(original_call)
def wrapped_from_pandas_multiindex(multiindex, *args, **kwargs):
if is_proxy_object(multiindex):
multiindex = multiindex.as_gpu_object()
if isinstance(multiindex, cudf.MultiIndex):
return multiindex
return original_call(multiindex, *args, **kwargs)

return wrapped_from_pandas_multiindex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this work?

def wrap_from_pandas(original_call, type_=None):
    @functools.wraps(original_call)
    def wrapped(obj, *args, **kwargs)
        if is_proxy_object(obj):
            obj = obj.as_gpu_object()
            if (type_ is None) or isinstance(obj, type_):
                return obj
        return original_call(obj, *args, **kwargs)
    return wrapped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem will arise if some passed keywords like cudf.DataFrame.from_pandas(dataframe=df) , cudf.Series.from_pandas(s=ser). Your suggestion was the way I initially wanted to go by but all the functions have different keyword argument names so I had to separate out the implementations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could presumably handle that in kwargs in the unified wrapper? This isn't a blocker btw

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That approach might get very complicated with a bunch of if/elif's to handle all possible combinations of each constructor.

Comment on lines +1878 to +1890
cudf.from_pandas = wrap_from_pandas(_original_from_pandas)
cudf.DataFrame.from_pandas = wrap_from_pandas_dataframe(
_original_DataFrame_from_pandas
)
cudf.Series.from_pandas = wrap_from_pandas_series(
_original_Series_from_pandas
)
cudf.BaseIndex.from_pandas = wrap_from_pandas_index(
_original_Index_from_pandas
)
cudf.MultiIndex.from_pandas = wrap_from_pandas_multiindex(
_original_MultiIndex_from_pandas
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then use functools.partial

Suggested change
cudf.from_pandas = wrap_from_pandas(_original_from_pandas)
cudf.DataFrame.from_pandas = wrap_from_pandas_dataframe(
_original_DataFrame_from_pandas
)
cudf.Series.from_pandas = wrap_from_pandas_series(
_original_Series_from_pandas
)
cudf.BaseIndex.from_pandas = wrap_from_pandas_index(
_original_Index_from_pandas
)
cudf.MultiIndex.from_pandas = wrap_from_pandas_multiindex(
_original_MultiIndex_from_pandas
)
cudf.from_pandas = partial(wrap_from_pandas, type=None)
cudf.DataFrame.from_pandas = partial(wrap_from_pandas, type_=cudf.DataFrame)
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as: #17936 (comment)

def wrap_from_pandas_dataframe(original_call):
@functools.wraps(original_call)
def wrapped_from_pandas_dataframe(dataframe, *args, **kwargs):
if is_proxy_object(dataframe):
Copy link
Contributor

@mroeschke mroeschke Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For wrapping these classmethods, wouldn't dataframe be cls here? We want to be extracting cudf.pandas objects which might be in args or kwargs?

It would be nice to match the method signatures exactly as well if possible

Copy link
Contributor Author

@galipremsagar galipremsagar Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, should we plan on deprecating the arguments to name them in a unified fashion? or will that be too much noise for the warnings and rather should it just be a breaking change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually just meant making the wrapped function signature here match the <obj>.from_pandas signature e.g.

# nan_as_null=False matches what would happen when evaluating the mode.pandas_compatible setting in cudf.pandas 
def wrapped_from_pandas_dataframe(cls, dataframe, nan_as_null=False):
    ...

As for the API change, I suppose we could do that as a breaking change, but I don't think we absolutely need to align the signatures of from_pandas

@galipremsagar
Copy link
Contributor Author

Hey @galipremsagar can we just add the CI changes in this PR and skip any of the failing tests? We can open targeted PRs to fix the remaining tests afterwards.

Yes, on it.

@galipremsagar galipremsagar requested a review from a team as a code owner February 9, 2025 22:51
@galipremsagar
Copy link
Contributor Author

@Matt711 The pr is now ready for review. Can you sign off on it?

Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I have a blocking question about the stumpy failures because they seem new to me.

@@ -30,13 +30,15 @@ def dask_client():
yield dask_client


@pytest.mark.skip(reason="TODO: Fix these stumpy tests to work with dask")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen these stumpy failures before. Is the monkey patching in this PR causing these failures?

Copy link
Contributor

@Matt711 Matt711 Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind checking if you haven't? It it is, I'd rather xfail the cugraph tests in this PR (which I think are passing now). And follow-up and add the monkey patching in a way that the integration tests pass (cugraph and stumpy).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong about the monkeypatching causing the stumpy failures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not being caused by monkey-patching from_pandas in this PR. The stumpy distributed failures seem to be related to some ptx failures. I'm investigating it, but I didn't want them to block this PR. Let's proceed with this PR and then tackle the stumpy failures separately?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, thanks for looking into it!

@@ -262,7 +262,7 @@ dependencies:
packages:
- pip
- pip:
- ibis-framework[pandas]
- ibis-framework[pandas]<10.0.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now, I'll remove it in #17945

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was my intention behind this change.

Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @galipremsagar! Matt R had a comment about making the classmethods match the signatures of the methods they're patching. But I think that can be added in a follow up.

@galipremsagar
Copy link
Contributor Author

Thanks @galipremsagar! Matt R had a comment about making the classmethods match the signatures of the methods they're patching. But I think that can be added in a follow up.

Responded here: #17936 (comment)

@galipremsagar
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 218d67d into rapidsai:branch-25.04 Feb 10, 2025
122 checks passed
@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Feb 10, 2025
@Matt711 Matt711 mentioned this pull request Feb 10, 2025
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge cudf.pandas Issues specific to cudf.pandas improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Archived in project
5 participants