-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Adding dataframe estimated size #1549
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DeaMariaLeon π I left a bunch of nitpicks, but I expect to merge soon π
Edit: Don't worry about scikit-lego, we knew that the scikit-learn 1.6 release was going to break our CI. We are working on fixing it
narwhals/dataframe.py
Outdated
sz = self._compliant_frame.estimated_size() | ||
return scale_bytes(sz, unit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't this directly be
sz = self._compliant_frame.estimated_size() | |
return scale_bytes(sz, unit) | |
return self._compliant_frame.estimated_size(unit=unit) |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did the scale_bytes
function go? return self._compliant_frame.estimated_size(unit=unit)
..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each compliant frame estimated_size
uses it internally, don't they? π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π«£ .... of course. I need new glasses. π€
narwhals/_arrow/dataframe.py
Outdated
@@ -285,6 +287,10 @@ def schema(self: Self) -> dict[str, DType]: | |||
def collect_schema(self: Self) -> dict[str, DType]: | |||
return self.schema | |||
|
|||
def estimated_size(self: Self, unit: SizeUnit = "b") -> int | float: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are trying to avoid having defaults for compliant object methods
def estimated_size(self: Self, unit: SizeUnit = "b") -> int | float: | |
def estimated_size(self: Self, unit: SizeUnit) -> int | float: |
narwhals/_pandas_like/dataframe.py
Outdated
@@ -370,6 +372,10 @@ def drop_nulls(self, subset: str | list[str] | None) -> Self: | |||
plx = self.__narwhals_namespace__() | |||
return self.filter(~plx.any_horizontal(plx.col(*subset).is_null())) | |||
|
|||
def estimated_size(self, unit: SizeUnit = "b") -> int | float: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly as per arrow:
def estimated_size(self, unit: SizeUnit = "b") -> int | float: | |
def estimated_size(self, unit: SizeUnit) -> int | float: |
narwhals/utils.py
Outdated
Arguments: | ||
sz: size | ||
unit: size unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may I suggest:
Arguments: | |
sz: size | |
unit: size unit | |
Arguments: | |
sz: original size in bytes | |
unit: size unit to covert into |
with pytest.raises( | ||
ValueError, | ||
match="`unit` must be one of {'b', 'kb', 'mb', 'gb', 'tb'}, got 'pizza'", | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That got me a serious smile π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we should allow unit='pizza'
I think I can eat a pizza in 40 bites. Haven't measured, just guessing. So, unit='pizza'
should be 1/40th of unit='b'
We would allow unit='pizza'
, but leave it undocumented, as a little easter egg for anyone who chooses to read the source code
idk if this is too silly π
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bow in front of such piece of art π
narwhals/dataframe.py
Outdated
Estimated size is given in the specified unit (bytes by default). | ||
|
||
Arguments: | ||
unit : 'b', 'kb', 'mb', 'gb', 'tb', 'bytes', 'kilobytes', 'megabytes', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unit : 'b', 'kb', 'mb', 'gb', 'tb', 'bytes', 'kilobytes', 'megabytes', | |
unit: 'b', 'kb', 'mb', 'gb', 'tb', 'bytes', 'kilobytes', 'megabytes', |
Otherwise the type
won't render in via mkdocstrings
@FBruzzesi I can't get rid of a typing error. And I have a coverage error too that has nothing to do with my changes (I think). I haven't added the pizza yet. I would need to add it to typing, is that OK? Edit: I wonder why the coverage error only shows locally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! just got a question
i think you can type: ignore
the typing failure
ok to leave pizza out π
narwhals/dataframe.py
Outdated
>>> agnostic_estimated_size(df_pd) | ||
330 | ||
>>> agnostic_estimated_size(df_pl) | ||
51 | ||
>>> agnostic_estimated_size(df_pa) | ||
63 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fascinating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good! Thanks for the adjustments! Sad about not allowing pizza size bites π |
What type of PR is this? (check all applicable)
Related issues
estimated_size
Β #1535Checklist
If you have comments or can explain your changes, please do so below
EDIT: Forgot to push the test file, but it looks like it needs some work