Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove cudf._lib.column in favor of pylibcudf. #17760

Merged
merged 28 commits into from
Feb 19, 2025

Conversation

mroeschke
Copy link
Contributor

@mroeschke mroeschke commented Jan 17, 2025

Description

Removes cudf._lib.column.Column and moves its methods and attributes to column.core.column.ColumnBase

closes #17317

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added Python Affects Python cuDF API. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 17, 2025
@mroeschke mroeschke self-assigned this Jan 17, 2025
@mroeschke mroeschke requested review from a team as code owners January 17, 2025 03:18
@github-actions github-actions bot added CMake CMake build issue pylibcudf Issues specific to the pylibcudf package labels Jan 17, 2025
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving CMake. I did not review the Python code.

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small suggestions. I haven't diligently gone through every line but I think most of it is just Column.from_pylibcudf -> ColumnBase.from_pylibcudf?

Comment on lines +311 to +318
return cudf.core.column.build_column( # type: ignore[return-value]
data=self.data,
dtype=self.dtype,
mask=mask,
size=self.size,
offset=0,
children=self.children,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: (and possibly this is just moving code so it was already there before). What happens if the previous column offset was > 0. Does that not mean the data buffer pointer is now in the wrong place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this was part of just moving code, but the docstring claims

Replaces the mask buffer of the column and returns a new column. This will zero the column offset, compute a new mask buffer if necessary, and compute new data Buffers zero-copy that use pointer arithmetic to properly adjust the pointer.

Last changed in #4057

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

python/cudf/cudf/core/column/column.py Show resolved Hide resolved
@@ -628,6 +628,35 @@ def dtype_to_pylibcudf_type(dtype) -> plc.DataType:
return plc.DataType(SUPPORTED_NUMPY_TO_PYLIBCUDF_TYPES[dtype])


def dtype_from_pylibcudf_column(col: plc.Column):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: type for the return type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Done

Comment on lines 472 to 476
cpdef Column column_from_column_view(Column col):
"""
Return a new Column from a Column.view().
"""
return Column.from_libcudf(move(make_unique[column](col.view())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this one when you can just call col.column_from_self_view?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops this was a remnant when I thought this should have been a free function. Removed

Comment on lines 176 to 178
cpdef Column column_from_self_view(self):
"""Return a new column from self.view()."""
return Column.from_libcudf(move(make_unique[column](self.view())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: shall we call this what it is? cpdef Column copy(self) ? Or possibly deepcopy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, this made me realize pylibcudf Column already has a copy method. Thanks I was able to not reinvent the wheel here then

@github-actions github-actions bot removed the pylibcudf Issues specific to the pylibcudf package label Jan 21, 2025
@galipremsagar
Copy link
Contributor

@mroeschke I cancelled the most recent workflow to free up resources to unblock all of cudf CI for this PR: #17771

I'll rerun once #17771 is merged.

@vyasr vyasr changed the base branch from branch-25.02 to branch-25.04 January 31, 2025 19:36
rapids-bot bot pushed a commit to rapidsai/cuspatial that referenced this pull request Feb 3, 2025
Follow up to #1514

Addressing the review in #1514 (comment) by ensuring the Cython bindings return pylibcudf objects where possible instead of python objects (like cudf Python columns) so we can also have better return annotations for these methods where appropriate. Subsequently, the conversion from pylibcudf to cudf Python has been moved to the cuspatial Python layer.

Additionally, this PR will be necessary for rapidsai/cudf#17760 not to break cuspatial

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)
  - Michael Wang (https://github.com/isVoid)
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Michael Wang (https://github.com/isVoid)
  - Bradley Dice (https://github.com/bdice)

URL: #1523
@mroeschke mroeschke requested a review from wence- February 3, 2025 22:49
@mroeschke
Copy link
Contributor Author

@wence- could use another review when you have a moment.

Copy link
Contributor

@wence- wence- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mroeschke

Comment on lines 13 to 14
# ruff does not identify that there's a relative import in use
from cudf.core.column import ColumnBase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an issue we can report?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm interesting I changed this back to a runtime import in the function and as a typing import and ruff isn't complaining anymore. I might have added this comment in a very early commit where this was failing

Comment on lines +311 to +318
return cudf.core.column.build_column( # type: ignore[return-value]
data=self.data,
dtype=self.dtype,
mask=mask,
size=self.size,
offset=0,
children=self.children,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

python/cudf/cudf/core/column/column.py Show resolved Hide resolved
@mroeschke
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 78e59c9 into rapidsai:branch-25.04 Feb 19, 2025
108 of 109 checks passed
@mroeschke mroeschke deleted the cudf/_lib/column branch February 19, 2025 02:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake CMake build issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[FEA] Remove cudf Cython bindings (cudf._lib) in favor of pylibcudf
5 participants