Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: let "ambiguous" take "null" value #14961

Merged
merged 3 commits into from
Mar 14, 2024

Conversation

MarcoGorelli
Copy link
Collaborator

@MarcoGorelli MarcoGorelli commented Mar 10, 2024

to be rebased onto #14958

Quick demo of what this lets you do:

In [13]: s = pl.datetime_range(datetime(2018, 10, 28), datetime(2018, 10, 28, 5), '1h', eager=True)
    ...: 

In [14]: s
Out[14]: 
shape: (6,)
Series: 'literal' [datetime[μs]]
[
        2018-10-28 00:00:00
        2018-10-28 01:00:00
        2018-10-28 02:00:00
        2018-10-28 03:00:00
        2018-10-28 04:00:00
        2018-10-28 05:00:00
]

In [15]: s.dt.replace_time_zone('Europe/Amsterdam', ambiguous='null')
Out[15]: 
shape: (6,)
Series: 'literal' [datetime[μs, Europe/Amsterdam]]
[
        2018-10-28 00:00:00 CEST
        2018-10-28 01:00:00 CEST
        null
        2018-10-28 03:00:00 CET
        2018-10-28 04:00:00 CET
        2018-10-28 05:00:00 CET
]

towards #11579. Based on #14842 (comment), I think there's a need for being able to just have null for problematic datetimes.

I'm suggesting to start with adding ambiguous='null', and then adding the option non_existent: Literal['null', 'raise']

Perf implication:

  • for the trivial case of setting/unsetting UTC and ambiguous='raise' (the default), no implication. There's already a fast-path for that
  • base case (ambiguous='raise', the default): no impact. ambiguous: no impact. I've made a fastpath to preserve this case. Check there's no impact here, which shows 12.373298853000051 => 12.362355302333375 (main => here)
  • For the non-base-case, I'm seeing a slight perf hit, which I've timed here: 13.127492245999747 => 13.452795328000017 (main => here). I think this is to be expected, as the result needs to be constructed from a vector of PolarsResult<Option<i64>> as opposed to PolarsResult<i64>, but I think it's worth it for the user benefit

@MarcoGorelli MarcoGorelli changed the title Ambiguous null feat(rust, python): let "ambiguous" take "null" value Mar 10, 2024
@MarcoGorelli MarcoGorelli changed the title feat(rust, python): let "ambiguous" take "null" value feat: let "ambiguous" take "null" value Mar 10, 2024
@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Mar 10, 2024
Copy link

codecov bot commented Mar 10, 2024

Codecov Report

Attention: Patch coverage is 94.59459% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 80.99%. Comparing base (8c2eca9) to head (37cfb09).

❗ Current head 37cfb09 differs from pull request most recent head cce0c4c. Consider uploading reports for the commit cce0c4c to get more accurate results

Files Patch % Lines
crates/polars-arrow/src/legacy/kernels/time.rs 75.00% 3 Missing ⚠️
...ps/src/chunked_array/datetime/replace_time_zone.rs 96.10% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #14961   +/-   ##
=======================================
  Coverage   80.99%   80.99%           
=======================================
  Files        1333     1333           
  Lines      173174   173229   +55     
  Branches     2458     2458           
=======================================
+ Hits       140265   140312   +47     
- Misses      32442    32449    +7     
- Partials      467      468    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MarcoGorelli MarcoGorelli force-pushed the ambiguous-null branch 4 times, most recently from 80b07d8 to 00c0d71 Compare March 10, 2024 15:55
@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 10, 2024 17:55
@MarcoGorelli MarcoGorelli marked this pull request as draft March 10, 2024 18:08
Copy link
Contributor

@stinodego stinodego left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave the Rust review to someone else, but I'm happy with the addition of the "null" option!

Comment on lines +478 to +484
match="datetime '2021-03-28 02:30:00' is non-existent in time zone 'Europe/Warsaw'",
):
pl.Series(["2021-03-28 02:30"]).str.to_datetime(
time_unit="us",
time_zone="Europe/Warsaw",
ambiguous="null",
)
Copy link
Collaborator Author

@MarcoGorelli MarcoGorelli Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separately, a non_existent argument could be added to make sure that this would return null values instead of raising. This is just to check that ambiguous='null' doesn't affect what happens for non-existent datetimes (coverage purposes)

EDIT: #15062 opens the doors for non_existent

@MarcoGorelli MarcoGorelli marked this pull request as ready for review March 11, 2024 15:03
@ritchie46 ritchie46 merged commit f08914b into pola-rs:main Mar 14, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants