Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Update is_(not)_in_range (#87) to support max/min limits from col #153

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

karthik-ballullaya-db
Copy link

Changes

Changes in the is_in_range and is_not_in_range column function to handle a column as the min/max limit along with the literal value.

Linked issues

Resolves #87

Tests

  • manually tested
  • added unit tests
  • added integration tests

Copy link

github-actions bot commented Feb 5, 2025

✅ 126/126 passed, 1 skipped, 17m53s total

Running from acceptance #498

Copy link
Contributor

@mwojtyczka mwojtyczka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 318 to 319
min_limit: int | datetime.date | datetime.datetime | str,
max_limit: int | datetime.date | datetime.datetime | str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make it a bit different:

  1. add support for a string as a valid range
  2. add support for a column object as min/max arguments, because people may use not simple column names but more complex expressions (via F.expr).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, updated code to support string ranges and column expression

Comment on lines 285 to 288
min_limit: int | datetime.date | datetime.datetime | str = '',
max_limit: int | datetime.date | datetime.datetime | str = '',
min_limit_col_expr: str | Column = '',
max_limit_col_expr: str | Column = '',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use Optional type or None instead of default of empty string

Comment on lines 311 to 314
min_limit: int | datetime.date | datetime.datetime | str = '',
max_limit: int | datetime.date | datetime.datetime | str = '',
min_limit_col_expr: str | Column = '',
max_limit_col_expr: str | Column = '',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here as well

Comment on lines 348 to 351
min_limit: int | datetime.date | datetime.datetime | str = '',
max_limit: int | datetime.date | datetime.datetime | str = '',
min_limit_col_expr: str | Column = '',
max_limit_col_expr: str | Column = '',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

Comment on lines +298 to +305
if min_limit_col_expr is None:
min_limit_expr = F.lit(min_limit)
else:
min_limit_expr = F.col(min_limit_col_expr) if isinstance(min_limit_col_expr, str) else min_limit_col_expr
if max_limit_col_expr is None:
max_limit_expr = F.lit(max_limit)
else:
max_limit_expr = F.col(max_limit_col_expr) if isinstance(max_limit_col_expr, str) else max_limit_col_expr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to throw an error if both min_limit and min_limit_col_expr are None (same for max_limit and max_limit_col_expr).

Otherwise we'll get conditions like (F.col(col_name) < F.lit(None)) | (F.col(col_name) > F.lit(None))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE]: is_(not)_in_range supports min/max_limit from the dataframe
3 participants