-
Notifications
You must be signed in to change notification settings - Fork 931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove cudf._lib.column in favor of pylibcudf. #17760
Remove cudf._lib.column in favor of pylibcudf. #17760
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving CMake. I did not review the Python code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some small suggestions. I haven't diligently gone through every line but I think most of it is just Column.from_pylibcudf -> ColumnBase.from_pylibcudf
?
return cudf.core.column.build_column( # type: ignore[return-value] | ||
data=self.data, | ||
dtype=self.dtype, | ||
mask=mask, | ||
size=self.size, | ||
offset=0, | ||
children=self.children, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: (and possibly this is just moving code so it was already there before). What happens if the previous column offset was > 0
. Does that not mean the data
buffer pointer is now in the wrong place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this was part of just moving code, but the docstring claims
Replaces the mask buffer of the column and returns a new column. This will zero the column offset, compute a new mask buffer if necessary, and compute new data Buffers zero-copy that use pointer arithmetic to properly adjust the pointer.
Last changed in #4057
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
python/cudf/cudf/utils/dtypes.py
Outdated
@@ -628,6 +628,35 @@ def dtype_to_pylibcudf_type(dtype) -> plc.DataType: | |||
return plc.DataType(SUPPORTED_NUMPY_TO_PYLIBCUDF_TYPES[dtype]) | |||
|
|||
|
|||
def dtype_from_pylibcudf_column(col: plc.Column): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: type for the return type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Done
cpdef Column column_from_column_view(Column col): | ||
""" | ||
Return a new Column from a Column.view(). | ||
""" | ||
return Column.from_libcudf(move(make_unique[column](col.view()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need this one when you can just call col.column_from_self_view
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops this was a remnant when I thought this should have been a free function. Removed
cpdef Column column_from_self_view(self): | ||
"""Return a new column from self.view().""" | ||
return Column.from_libcudf(move(make_unique[column](self.view()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: shall we call this what it is? cpdef Column copy(self)
? Or possibly deepcopy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, this made me realize pylibcudf Column already has a copy
method. Thanks I was able to not reinvent the wheel here then
@mroeschke I cancelled the most recent workflow to free up resources to unblock all of cudf CI for this PR: #17771 I'll rerun once #17771 is merged. |
Follow up to #1514 Addressing the review in #1514 (comment) by ensuring the Cython bindings return pylibcudf objects where possible instead of python objects (like cudf Python columns) so we can also have better return annotations for these methods where appropriate. Subsequently, the conversion from pylibcudf to cudf Python has been moved to the cuspatial Python layer. Additionally, this PR will be necessary for rapidsai/cudf#17760 not to break cuspatial Authors: - Matthew Roeschke (https://github.com/mroeschke) - Michael Wang (https://github.com/isVoid) - Bradley Dice (https://github.com/bdice) Approvers: - Michael Wang (https://github.com/isVoid) - Bradley Dice (https://github.com/bdice) URL: #1523
@wence- could use another review when you have a moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mroeschke
# ruff does not identify that there's a relative import in use | ||
from cudf.core.column import ColumnBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an issue we can report?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm interesting I changed this back to a runtime import in the function and as a typing import and ruff isn't complaining anymore. I might have added this comment in a very early commit where this was failing
return cudf.core.column.build_column( # type: ignore[return-value] | ||
data=self.data, | ||
dtype=self.dtype, | ||
mask=mask, | ||
size=self.size, | ||
offset=0, | ||
children=self.children, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/merge |
78e59c9
into
rapidsai:branch-25.04
Description
Removes
cudf._lib.column.Column
and moves its methods and attributes tocolumn.core.column.ColumnBase
cudf.core._internals
needed to start returningpylibcudf.Column
s to avoid circular importscloses #17317
Checklist