feat: add non_existent arg to replace_time_zone #15062

Merged
merged 5 commits into from
Mar 18, 2024

Conversation

MarcoGorelli
Collaborator

@MarcoGorelli MarcoGorelli commented Mar 14, 2024

closes #11579

needs rebasing onto #14961

This allows users to do:

In [1]: from datetime import datetime
   ...: 
   ...: import polars as pl
   ...: 
   ...: df = pl.DataFrame(
   ...:     {
   ...:         "ts": pl.datetime_range(
   ...:             datetime(2020, 3, 29, 0), datetime(2020, 3, 29, 5), "1h", eager=True
   ...:         ),
   ...:     },
   ...: )
   ...: 
   ...: df.with_columns(
   ...:     ts_london=pl.col("ts").dt.replace_time_zone("Europe/London", non_existent="null")
   ...: )
Out[1]: 
shape: (6, 2)
┌─────────────────────┬─────────────────────────────┐
│ ts                  ┆ ts_london                   │
│ ---                 ┆ ---                         │
│ datetime[μs]        ┆ datetime[μs, Europe/London] │
╞═════════════════════╪═════════════════════════════╡
│ 2020-03-29 00:00:00 ┆ 2020-03-29 00:00:00 GMT     │
│ 2020-03-29 01:00:00 ┆ null                        │
│ 2020-03-29 02:00:00 ┆ 2020-03-29 02:00:00 BST     │
│ 2020-03-29 03:00:00 ┆ 2020-03-29 03:00:00 BST     │
│ 2020-03-29 04:00:00 ┆ 2020-03-29 04:00:00 BST     │
│ 2020-03-29 05:00:00 ┆ 2020-03-29 05:00:00 BST     │
└─────────────────────┴─────────────────────────────┘

Currently there's no way to get that output, as non-existent datetimes just raise. (You could convert to string and then use to_datetime with strict=False, but that would turn every input which doesn't match the given format into null, with no way to distinguish the non-existent datetimes from the ones which simply didn't match the format.)

For now, I'm only adding it to replace_time_zone, and preserving current behaviour in other places. Adding non_existent to to_datetime would require quite a refactor because of how this would interact with strict, but I think for now it's valuable enough to add this to replace_time_zone

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Mar 14, 2024
@MarcoGorelli MarcoGorelli force-pushed the non-existent branch 3 times, most recently from b764012 to ad4e103 Compare March 14, 2024 17:49

codecov bot commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 98.51852% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 81.07%. Comparing base (5d449cc) to head (8db1fd2).

Files Patch % Lines
crates/polars-arrow/src/legacy/kernels/time.rs 85.71% 1 Missing ⚠️
...ates/polars-plan/src/dsl/function_expr/datetime.rs 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #15062      +/-   ##
==========================================
+ Coverage   81.06%   81.07%   +0.01%     
==========================================
  Files        1342     1342              
  Lines      173935   174030      +95     
  Branches     2459     2459              
==========================================
+ Hits       141000   141097      +97     
+ Misses      32468    32467       -1     
+ Partials      467      466       -1     


@@ -669,6 +669,20 @@ def test_to_datetime_use_earliest(exact: bool) -> None:
).item()


def test_to_datetime_naive_format_and_time_zone() -> None:
Collaborator Author


unrelated to this PR, but codecov flagged it as uncovered, so it's a good chance to add it

@MarcoGorelli MarcoGorelli force-pushed the non-existent branch 2 times, most recently from 004fb76 to a81c0ab Compare March 16, 2024 08:09
@ritchie46 ritchie46 merged commit 1195f85 into pola-rs:main Mar 18, 2024
27 checks passed
@deanm0000
Collaborator

I wish I'd stumbled on this before I wrote this monstrosity

df.join(
    df.select(
        ts=pl.datetime_range(
            pl.col('ts').first().dt.replace_time_zone("Europe/London").dt.convert_time_zone('UTC'), 
            pl.col('ts').last().dt.replace_time_zone("Europe/London").dt.convert_time_zone('UTC'), 
            '1h'
            )
        )
    .with_columns(
        ts_london=(lon:=pl.col('ts').dt.convert_time_zone('Europe/London')), 
        ts=lon.dt.replace_time_zone(None)
        ),
    on='ts', how='left')

Development

Successfully merging this pull request may close these issues.

Enable handling non-existent times in expr.dt.replace_time_zone
3 participants