Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix scatter_by_map with spilling enabled #18095

Merged

Conversation

mroeschke
Copy link
Contributor

Description

closes #18088

Before the old Cython bindings of columns_split spill locked the conversion from libcudf to a cudf Python column. When I replaced these bindings, this spill locking was removed during the refactor.

I'm spot checking that other APIs are not affected. If so I can open PRs for those

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mroeschke mroeschke added bug Something isn't working Python Affects Python cuDF API. non-breaking Non-breaking change labels Feb 25, 2025
@mroeschke mroeschke self-assigned this Feb 25, 2025
@mroeschke mroeschke requested a review from a team as a code owner February 25, 2025 22:33
Copy link
Contributor

@Matt711 Matt711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! One non-blocking suggestion

Comment on lines +3311 to +3314
@acquire_spill_lock()
def split_from_pylibcudf(split: list[plc.Column]) -> list[ColumnBase]:
return [ColumnBase.from_pylibcudf(col) for col in split]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work?

Suggested change
@acquire_spill_lock()
def split_from_pylibcudf(split: list[plc.Column]) -> list[ColumnBase]:
return [ColumnBase.from_pylibcudf(col) for col in split]
with acquire_spill_lock()
columns = [ColumnBase.from_pylibcudf(col) for col in split]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split here is an iterator variable (from for split in columns_split) below.

So in order to do it like this I would have to do

with acquire_spill_lock():
    column = [[ColumnBase.from_pylibcudf(col) for col in split] for split in columns_split]

I could put the entire operation in the with acquire_spill_lock() block, but I would prefer tightly scope the operation that definitely needs the spill lock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thanks I didn't see the for split in. Sounds good

return [
self._from_columns_like_self(
[ColumnBase.from_pylibcudf(col) for col in split],
split_from_pylibcudf(split),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
split_from_pylibcudf(split),
columns,

@mroeschke
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit d8b3d80 into rapidsai:branch-25.04 Feb 26, 2025
109 of 110 checks passed
@mroeschke mroeschke deleted the bug/spilling/scatter_by_map branch February 26, 2025 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

[BUG] cuDF spilling leads to errors in Dask-cuDF shuffle
2 participants