Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Adding dataframe estimated size #1549

Merged
merged 11 commits into from
Dec 10, 2024

Conversation

DeaMariaLeon
Copy link
Member

@DeaMariaLeon DeaMariaLeon commented Dec 9, 2024

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

EDIT: Forgot to push the test file, but it looks like it needs some work

@github-actions github-actions bot added the enhancement New feature or request label Dec 9, 2024
Copy link
Member

@FBruzzesi FBruzzesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @DeaMariaLeon πŸš€ I left a bunch of nitpicks, but I expect to merge soon πŸ˜‡

Edit: Don't worry about scikit-lego, we knew that the scikit-learn 1.6 release was going to break our CI. We are working on fixing it

Comment on lines 811 to 812
sz = self._compliant_frame.estimated_size()
return scale_bytes(sz, unit)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't this directly be

Suggested change
sz = self._compliant_frame.estimated_size()
return scale_bytes(sz, unit)
return self._compliant_frame.estimated_size(unit=unit)

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where did the scale_bytes function go? return self._compliant_frame.estimated_size(unit=unit)..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each compliant frame estimated_size uses it internally, don't they? πŸ˜‡

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🫣 .... of course. I need new glasses. πŸ€“

@@ -285,6 +287,10 @@ def schema(self: Self) -> dict[str, DType]:
def collect_schema(self: Self) -> dict[str, DType]:
return self.schema

def estimated_size(self: Self, unit: SizeUnit = "b") -> int | float:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are trying to avoid having defaults for compliant object methods

Suggested change
def estimated_size(self: Self, unit: SizeUnit = "b") -> int | float:
def estimated_size(self: Self, unit: SizeUnit) -> int | float:

@@ -370,6 +372,10 @@ def drop_nulls(self, subset: str | list[str] | None) -> Self:
plx = self.__narwhals_namespace__()
return self.filter(~plx.any_horizontal(plx.col(*subset).is_null()))

def estimated_size(self, unit: SizeUnit = "b") -> int | float:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly as per arrow:

Suggested change
def estimated_size(self, unit: SizeUnit = "b") -> int | float:
def estimated_size(self, unit: SizeUnit) -> int | float:

Comment on lines 688 to 690
Arguments:
sz: size
unit: size unit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may I suggest:

Suggested change
Arguments:
sz: size
unit: size unit
Arguments:
sz: original size in bytes
unit: size unit to covert into

Comment on lines +24 to +27
with pytest.raises(
ValueError,
match="`unit` must be one of {'b', 'kb', 'mb', 'gb', 'tb'}, got 'pizza'",
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That got me a serious smile πŸ˜‚

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should allow unit='pizza'

I think I can eat a pizza in 40 bites. Haven't measured, just guessing. So, unit='pizza' should be 1/40th of unit='b'

We would allow unit='pizza', but leave it undocumented, as a little easter egg for anyone who chooses to read the source code

idk if this is too silly 😜

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bow in front of such piece of art πŸ•

Estimated size is given in the specified unit (bytes by default).

Arguments:
unit : 'b', 'kb', 'mb', 'gb', 'tb', 'bytes', 'kilobytes', 'megabytes',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unit : 'b', 'kb', 'mb', 'gb', 'tb', 'bytes', 'kilobytes', 'megabytes',
unit: 'b', 'kb', 'mb', 'gb', 'tb', 'bytes', 'kilobytes', 'megabytes',

Otherwise the type won't render in via mkdocstrings

@DeaMariaLeon
Copy link
Member Author

DeaMariaLeon commented Dec 10, 2024

@FBruzzesi I can't get rid of a typing error. And I have a coverage error too that has nothing to do with my changes (I think).

I haven't added the pizza yet. I would need to add it to typing, is that OK?

Edit: I wonder why the coverage error only shows locally.

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! just got a question

i think you can type: ignore the typing failure

ok to leave pizza out πŸ˜„

Comment on lines 803 to 808
>>> agnostic_estimated_size(df_pd)
330
>>> agnostic_estimated_size(df_pl)
51
>>> agnostic_estimated_size(df_pa)
63
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fascinating

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @DeaMariaLeon , looks good to me

@FBruzzesi any objections or good to go?

@FBruzzesi
Copy link
Member

thanks @DeaMariaLeon , looks good to me

@FBruzzesi any objections or good to go?

It looks good! Thanks for the adjustments! Sad about not allowing pizza size bites πŸ˜‚

@MarcoGorelli MarcoGorelli merged commit 36afa70 into narwhals-dev:main Dec 10, 2024
22 of 24 checks passed
@DeaMariaLeon DeaMariaLeon deleted the estimated-size branch December 10, 2024 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enh]: Add estimated_size
3 participants