Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX-#5091: Handle pd.Grouper objects correctly #6174

Merged
merged 6 commits into from
May 22, 2023

Conversation

pyrito
Copy link
Collaborator

@pyrito pyrito commented May 19, 2023

What do these changes do?

This PR addresses two issues - it handles pd.Grouper objects correctly and also obeys the order of the groupby sorting keys as passed in.

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves BUG: KeyError for TimeGrouper with df.group_by() #5091
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

@pyrito pyrito requested a review from a team as a code owner May 19, 2023 15:59
modin/pandas/test/test_groupby.py Outdated Show resolved Hide resolved
modin/pandas/test/test_groupby.py Outdated Show resolved Hide resolved
@@ -573,11 +573,19 @@ def _internal_by(self):
internal_by = tuple()
if self._drop:
if is_list_like(self._by):
internal_by = tuple(by for by in self._by if isinstance(by, str))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're fixing the order here, can you port the test case from #6164 and address the comments I left there?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved. there's a separate test case for order now, but this doesn't fix #6164

@@ -573,11 +573,19 @@ def _internal_by(self):
internal_by = tuple()
if self._drop:
if is_list_like(self._by):
internal_by = tuple(by for by in self._by if isinstance(by, str))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved. there's a separate test case for order now, but this doesn't fix #6164

)
def test_groupby_with_grouper(by):
# See https://github.com/modin-project/modin/issues/5091 for more details
data = {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pyrito can you add some more columns too? We should have multiple column partitions on ray/dask

mvashishtha
mvashishtha previously approved these changes May 19, 2023
vnlitvinov
vnlitvinov previously approved these changes May 22, 2023
Copy link
Collaborator

@vnlitvinov vnlitvinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@vnlitvinov
Copy link
Collaborator

@pyrito you now have a merge conflict 😢

@pyrito pyrito dismissed stale reviews from vnlitvinov and mvashishtha via bf999ff May 22, 2023 14:11
Copy link
Collaborator

@devin-petersohn devin-petersohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @pyrito !

@devin-petersohn devin-petersohn merged commit 681c326 into modin-project:master May 22, 2023
pyrito pushed a commit to ponder-org/modin-public that referenced this pull request May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: KeyError for TimeGrouper with df.group_by()
4 participants