Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GroupBy: Avoid guessing variable types #6906

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

janezd
Copy link
Contributor

@janezd janezd commented Oct 3, 2024

Issue

The Group By widget will fail on Pandas 3.0.

Group by currently computes aggregations, puts them into a pandas data frame and then guesses data types - at least in part. In particular, it tries to interpret columns as time and creates a TimeVariable if conversion succeeds. In tests, we have a concatenation of numeric data that results in value "1.0 1.0". Pandas 3.0 now converts this to January 1st 1970 (I think), so the column becomes a time variable and appears among primitive attributes, not metas(!).

Description of changes

I think a proper solution is to define how to construct variables with aggregations. It can be

  • a fixed type (e.g. StringVariable for concatenation, ContinuousVariable for Span...)
  • a copy of the existing variable (e.g. for Mode, Min value, Random value...)
  • a variable of the same type as the existing variable (Mean...)

The difference between the latter two cases is that it resets the number of decimals.

I modified one test. Apparently it assumes that Count and Count Defined for string variables will be metas, but as far as I see, they are attributes. This PR keeps the behaviour, so I don't understand why tests were failing before.

Includes
  • Code changes
  • Tests

@janezd janezd added the needs discussion Core developers need to discuss the issue label Oct 3, 2024
Copy link

codecov bot commented Oct 3, 2024

Codecov Report

Attention: Patch coverage is 98.21429% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.21%. Comparing base (60a2f61) to head (c575527).
Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6906      +/-   ##
==========================================
+ Coverage   88.19%   88.21%   +0.01%     
==========================================
  Files         326      326              
  Lines       71233    71290      +57     
==========================================
+ Hits        62827    62887      +60     
+ Misses       8406     8403       -3     

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs discussion Core developers need to discuss the issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant