GroupBy: Avoid guessing variable types #6906

janezd · 2024-10-03T17:47:43Z

Issue

The Group By widget will fail on Pandas 3.0.

Group By currently computes the aggregations, puts them into a pandas data frame, and then guesses the data types, at least in part. In particular, it tries to interpret columns as time and creates a TimeVariable if the conversion succeeds. The tests include a concatenation of numeric data that produces the value "1.0 1.0". Pandas 3.0 now converts this to January 1st, 1970 (I think), so the column becomes a time variable and appears among the primitive attributes, not the metas(!).
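To illustrate why guessing by attempted conversion is fragile, here is a minimal sketch that uses only the standard library. All names (`strict_time`, `lenient_time`, `guess_kind`) are hypothetical, and the lenient parser is an assumption made for illustration; it is not a description of what pandas actually does.

```python
from datetime import datetime, timezone


def strict_time(text):
    """Accept only dates written as YYYY-MM-DD; raise ValueError otherwise."""
    return datetime.strptime(text, "%Y-%m-%d")


def lenient_time(text):
    """Hypothetical lenient parser: if the text is not a date, read its
    leading number as seconds since the Unix epoch (an assumption)."""
    try:
        return strict_time(text)
    except ValueError:
        seconds = float(text.partition(" ")[0])  # ValueError if not a number
        return datetime.fromtimestamp(seconds, tz=timezone.utc)


def guess_kind(values, parse_time):
    """Guess 'time' if every value parses as time, else fall back to 'string'."""
    try:
        for value in values:
            parse_time(value)
    except ValueError:
        return "string"
    return "time"


# Concatenated numeric values, similar to the value mentioned above.
column = ["1.0 1.0", "2.0 3.0"]
```

With the strict parser the column stays a string; with the lenient one the same data is classified as time. The guessed type therefore depends on the parser's behaviour rather than on the aggregation that produced the column.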

Description of changes

I think the proper solution is to define, for each aggregation, how its output variable is constructed. It can be

a fixed type (e.g. StringVariable for concatenation, ContinuousVariable for Span...)
a copy of the existing variable (e.g. for Mode, Min value, Random value...)
a variable of the same type as the existing variable (Mean...)

The difference between the latter two cases is that the last one resets the number of decimals, while a copy keeps it.
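The three cases above can be sketched as follows. This is a minimal illustration, not the code in this PR: `Variable` is a hypothetical stand-in rather than an Orange class, and the `AGGREGATIONS` table, the `Result` enum, `output_variable`, and the output naming scheme are all assumptions.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto
from typing import Optional


@dataclass(frozen=True)
class Variable:
    """Hypothetical stand-in for a variable descriptor (not Orange's class)."""
    name: str
    kind: str                                  # e.g. "continuous", "string"
    number_of_decimals: Optional[int] = None   # None means "use the default"


class Result(Enum):
    FIXED = auto()       # always the same, predefined type
    COPY = auto()        # copy of the source variable, settings included
    SAME_TYPE = auto()   # fresh variable of the source's type


# Hypothetical table: aggregation name -> (strategy, fixed kind if any).
AGGREGATIONS = {
    "Concatenate": (Result.FIXED, "string"),
    "Span": (Result.FIXED, "continuous"),
    "Mode": (Result.COPY, None),
    "Min value": (Result.COPY, None),
    "Mean": (Result.SAME_TYPE, None),
}


def output_variable(source, aggregation):
    """Construct the output variable from a declared rule, without guessing."""
    strategy, fixed_kind = AGGREGATIONS[aggregation]
    name = f"{source.name} - {aggregation}"    # naming scheme is an assumption
    if strategy is Result.FIXED:
        return Variable(name, fixed_kind)
    if strategy is Result.COPY:
        # Keeps every setting of the source, including number_of_decimals.
        return replace(source, name=name)
    # Same type, but number_of_decimals falls back to its default.
    return Variable(name, source.kind)
```

For a source such as `Variable("price", "continuous", 2)`, "Mode" keeps the two decimals, "Mean" yields a continuous variable with the default number of decimals, and "Concatenate" always yields a string variable, whatever the concatenated text looks like.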

I modified one test. It apparently assumed that Count and Count Defined for string variables would be metas, but as far as I can see, they are attributes. This PR keeps that behaviour, so I don't understand why the tests were failing before.

Includes

Code changes
Tests

codecov · 2024-10-03T18:05:03Z

Codecov Report

Attention: Patch coverage is 98.21429% with 1 line in your changes missing coverage. Please review.

Project coverage is 88.21%. Comparing base (60a2f61) to head (c575527).
Report is 5 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6906      +/-   ##
==========================================
+ Coverage   88.19%   88.21%   +0.01%     
==========================================
  Files         326      326              
  Lines       71233    71290      +57     
==========================================
+ Hits        62827    62887      +60     
+ Misses       8406     8403       -3

janezd added the "needs discussion" label (Core developers need to discuss the issue) Oct 3, 2024

janezd mentioned this pull request Oct 3, 2024

Fixes for tests on nightly builds of numpy/scipy/pandas #6897

Open

1 task

GroupBy: Avoid guessing variable types

c575527

janezd force-pushed the groupby-keep-variables branch from f025959 to c575527 Compare October 3, 2024 18:19
