feat: Datetime(time_unit, time_zone) and Duration(time_unit) types #960

Merged
39 commits merged into main on Sep 30, 2024

FBruzzesi commented Sep 13, 2024

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below.

Introduces time units and time zones in Datetime type.

github-actions bot added the enhancement label Sep 13, 2024
@@ -75,13 +75,13 @@ def test_cast_date_datetime_pandas() -> None:
     df = df.select(nw.col("a").cast(nw.Datetime))
     result = nw.to_native(df)
     expected = pd.DataFrame({"a": [datetime(2020, 1, 1), datetime(2020, 1, 2)]}).astype(
-        {"a": "timestamp[ns][pyarrow]"}
+        {"a": "timestamp[us][pyarrow]"}
Member Author

This changes the default time unit to the Polars one ("us").

nw.col("date")
.cast(nw.Datetime("ms", time_zone="Europe/Rome"))
.cast(nw.String())
.str.slice(offset=0, length=19)
Member Author

19 is the number of characters in "2024-01-01 01:00:00". Whatever follows those characters is formatted differently by each backend.
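
A rough illustration of where the 19 comes from, using only the Python standard library rather than narwhals or any backend. The fixed +01:00 offset is an assumption standing in for Europe/Rome in January, and `rome_winter` is a name made up for this sketch.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +01:00 offset stands in for Europe/Rome in winter.
rome_winter = timezone(timedelta(hours=1))

# Midnight UTC on 2024-01-01, viewed at the +01:00 offset.
utc_midnight = datetime(2024, 1, 1, tzinfo=timezone.utc)
local = utc_midnight.astimezone(rome_winter)

as_text = str(local)   # Python appends the UTC offset after the seconds
prefix = as_text[:19]  # "YYYY-MM-DD HH:MM:SS" is exactly 19 characters

print(as_text)  # 2024-01-01 01:00:00+01:00
print(prefix)   # 2024-01-01 01:00:00
```

Python happens to render the suffix as "+01:00"; other libraries render it differently, which is the reason for comparing only the first 19 characters.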

Comment on lines 227 to 230
pd_datetime_rgx = (
r"^datetime64\[(?P<time_unit>ms|us|ns)(?:, (?P<time_zone>[a-zA-Z\/]+))?\]$"
)
pa_datetime_rgx = r"^timestamp\[(?P<time_unit>ms|us|ns)(?:, tz=(?P<time_zone>[a-zA-Z\/]+))?\]\[pyarrow\]$"
Member Author

Please try to break these 🙈
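
As a sketch of how one might probe these patterns, the snippet below copies the two regexes from the hunk and runs them with the standard `re` module. The `parse` helper and the sample dtype strings are assumptions made for illustration. The last two inputs show one way to break them: the character class `[a-zA-Z\/]` has no room for underscores, digits, "+" or "-", so zone names such as America/New_York do not match.

```python
import re

# Patterns copied from the diff under review.
pd_datetime_rgx = (
    r"^datetime64\[(?P<time_unit>ms|us|ns)(?:, (?P<time_zone>[a-zA-Z\/]+))?\]$"
)
pa_datetime_rgx = r"^timestamp\[(?P<time_unit>ms|us|ns)(?:, tz=(?P<time_zone>[a-zA-Z\/]+))?\]\[pyarrow\]$"


def parse(pattern, dtype_str):
    """Hypothetical helper: (time_unit, time_zone), or None when there is no match."""
    m = re.match(pattern, dtype_str)
    return (m.group("time_unit"), m.group("time_zone")) if m else None


print(parse(pd_datetime_rgx, "datetime64[ns]"))               # ('ns', None)
print(parse(pd_datetime_rgx, "datetime64[ms, Europe/Rome]"))  # ('ms', 'Europe/Rome')
print(parse(pa_datetime_rgx, "timestamp[us, tz=Asia/Kathmandu][pyarrow]"))
# Zone names outside [a-zA-Z/] are rejected:
print(parse(pd_datetime_rgx, "datetime64[ns, America/New_York]"))  # None
print(parse(pd_datetime_rgx, "datetime64[ns, Etc/GMT+1]"))         # None
```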

Comment on lines 441 to 448
# Pandas does not support "ms" or "us" time units before version 1.5.0
# Let's overwrite with "ns"
if implementation is Implementation.PANDAS and backend_version < (
1,
5,
0,
): # pragma: no cover
time_unit = "ns"
Member Author

I don't think we can do much else here
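
The fallback relies on Python's element-wise tuple comparison. A minimal standalone sketch follows; `resolve_time_unit` and its parameters are hypothetical names, and the (1, 5, 0) threshold is simply the one used in the hunk above.

```python
def resolve_time_unit(requested, is_pandas, backend_version, minimum=(1, 5, 0)):
    """Hypothetical helper: fall back to "ns" when pandas is older than `minimum`."""
    # Tuples compare element by element, left to right.
    if is_pandas and backend_version < minimum:
        return "ns"
    return requested


print(resolve_time_unit("ms", True, (1, 4, 4)))   # ns
print(resolve_time_unit("ms", True, (2, 2, 0)))   # ms
print(resolve_time_unit("ms", False, (1, 4, 4)))  # ms

# Caveat: a shorter tuple sorts before a longer one with the same prefix,
# so a version parsed as (1, 5) would still count as older than (1, 5, 0).
print((1, 5) < (1, 5, 0))  # True
```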

@FBruzzesi
Member Author

@MarcoGorelli any experience with locating the timezone database at "C:\Users\runneradmin\Downloads\tzdata"? 😂

FBruzzesi changed the title from "feat: time zone aware Datetime type" to "feat: Datetime(time_unit, time_zone) and Duration(time_unit) types" Sep 14, 2024
@MarcoGorelli
Member

wow, nice one! i'll try breaking it a bit, but this is amazing, been wanting to do this for a while 🚀

@MarcoGorelli
Member

I think the dtype comparison isn't quite right:

In [17]: s
Out[17]:
shape: (1,)
Series: '' [datetime[μs, Asia/Kathmandu]]
[
        2019-12-31 18:15:00 +0545
]

In [18]: nw.from_native(s, allow_series=True).dtype == nw.Datetime('us')
Out[18]: True

In [19]: s.dtype == pl.Datetime('us')
Out[19]: False
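
One way to picture the semantics being asked for, written as a toy class rather than the narwhals implementation (`ToyDatetime` and its behaviour are assumptions made for illustration). Comparing against the bare class asks "is this any datetime?", while comparing against an instance also checks the time unit and time zone.

```python
class ToyDatetime:
    """Hypothetical stand-in for a parametrised datetime dtype."""

    def __init__(self, time_unit="us", time_zone=None):
        self.time_unit = time_unit
        self.time_zone = time_zone

    def __eq__(self, other):
        # Compared with the bare class: any ToyDatetime counts as equal.
        if isinstance(other, type):
            return issubclass(other, ToyDatetime)
        # Compared with an instance: unit and zone must both agree.
        if isinstance(other, ToyDatetime):
            return (self.time_unit, self.time_zone) == (
                other.time_unit,
                other.time_zone,
            )
        return False

    # Defining __eq__ alone would make instances unhashable, so keep a hash
    # that is consistent with instance-to-instance equality.
    def __hash__(self):
        return hash((ToyDatetime, self.time_unit, self.time_zone))


kathmandu = ToyDatetime("us", "Asia/Kathmandu")

print(kathmandu == ToyDatetime)                          # True
print(kathmandu == ToyDatetime("us"))                    # False
print(kathmandu == ToyDatetime("us", "Asia/Kathmandu"))  # True
```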

@FBruzzesi
Member Author

Nice catch! Thanks! Will fix later on 👌

@MarcoGorelli
Member

🤔 interesting, so elif dtype in {nw.Datetime, nw.Date}: breaks if we add the time_unit and time_zone attributes

@MarcoGorelli
Member

Looks like this might be it 🥳 🍾 can't believe it... this took hours... worth it though. It means we can change the main narwhals namespace, with zero impact on Altair users. As it turns out, the stable API was really worth doing... this is so rewarding 🕺

i think the nightly ci failure is unrelated

i'll check this again tomorrow, then hopefully we can make a release with this in on Tuesday

@FBruzzesi
Member Author

That's awesome! I will take some time this upcoming week to check what happened in detail!
Also plotly has a test with a specific time zone, thus I am looking forward to the release of this feature 😁

However, to use the "edge" dtypes (or in general features), should we

+ import narwhals as nw
- import narwhals.stable.v1 as nw

?

Comment on lines 224 to 225
pd_duration_rgx = r"^timedelta64\[(?P<time_unit>ms|us|ns)\]$"
pa_duration_rgx = r"^duration\[(?P<time_unit>ms|us|ns)\]\[pyarrow\]$"
Member

pandas / pyarrow support 'second' time unit, I think that should be allowed to pass through

Member Author

Just by passing it along or doing manipulation for the user?

Member

i think we can just pass it through - adding a commit soon
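
A sketch of what letting seconds pass through could look like at the regex level. The `_with_s` variants are hypothetical and are not taken from the follow-up commit; only the first pattern of each pair comes from the hunk above.

```python
import re

# From the diff: only "ms", "us" and "ns" are accepted.
pd_duration_rgx = r"^timedelta64\[(?P<time_unit>ms|us|ns)\]$"
pa_duration_rgx = r"^duration\[(?P<time_unit>ms|us|ns)\]\[pyarrow\]$"

# Hypothetical variants that also accept "s".
pd_duration_rgx_with_s = r"^timedelta64\[(?P<time_unit>s|ms|us|ns)\]$"
pa_duration_rgx_with_s = r"^duration\[(?P<time_unit>s|ms|us|ns)\]\[pyarrow\]$"

print(re.match(pd_duration_rgx, "timedelta64[s]"))  # None
print(re.match(pd_duration_rgx_with_s, "timedelta64[s]").group("time_unit"))  # s
print(re.match(pa_duration_rgx_with_s, "duration[s][pyarrow]").group("time_unit"))  # s
# The existing units keep matching with the wider alternation.
print(re.match(pd_duration_rgx_with_s, "timedelta64[ms]").group("time_unit"))  # ms
```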

@MarcoGorelli
Member

> However, to use the "edge" dtypes (or in general features), should we

it's fine to use time_unit / time_zone with stable.v1, it's just probably a good idea to avoid anything which calls __hash__. For example, instead of series.dtype in {nw.Datetime, nw.Date}, it'd be safer to do series.dtype == nw.Datetime or series.dtype == nw.Date
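
To see why membership in a set can disagree with `==`, here is a toy model (all class names are hypothetical, and this is not the narwhals code). Set lookup hashes the value first and only calls `__eq__` on entries whose hash matches, so a parametrised instance whose hash folds in its parameters is not found among the bare classes.

```python
class ToyDType:
    """Hypothetical base: equality accepts either a class or an instance."""

    def __eq__(self, other):
        if isinstance(other, type):
            return issubclass(other, type(self))
        return type(self) is type(other) and vars(self) == vars(other)

    def __hash__(self):
        # The hash folds in the parameters, so it differs from hash(class).
        return hash((type(self), *vars(self).values()))


class ToyDate(ToyDType):
    pass


class ToyDatetime(ToyDType):
    def __init__(self, time_unit="us", time_zone=None):
        self.time_unit = time_unit
        self.time_zone = time_zone


dtype = ToyDatetime("us", "Asia/Kathmandu")

# Set membership looks the value up by hash before it ever calls __eq__.
by_membership = dtype in {ToyDatetime, ToyDate}
# Chained equality never touches __hash__.
by_equality = dtype == ToyDatetime or dtype == ToyDate

print(by_membership, by_equality)
```

In this toy model the membership test comes out false while the chained equality is true, which is the mismatch the advice above avoids.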

Comment on lines 455 to 460
du_time_unit = getattr(dtype, "time_unit", "us")
return (
f"duration[{du_time_unit}][pyarrow]"
if dtype_backend == "pyarrow-nullable"
else f"timedelta64[{du_time_unit}]"
)
Member

we need to do the same pre-1.5.0 check here
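
A hedged sketch of the duration branch with that check added. The function name and its parameters are made up for this example, and the (1, 5, 0) threshold is copied from the datetime branch as an assumption rather than verified here.

```python
def duration_dtype_string(time_unit, dtype_backend, is_pandas, backend_version):
    """Hypothetical helper: build the native duration dtype string."""
    # Same guard as the datetime branch: older pandas only gets "ns".
    if is_pandas and backend_version < (1, 5, 0):
        time_unit = "ns"
    if dtype_backend == "pyarrow-nullable":
        return f"duration[{time_unit}][pyarrow]"
    return f"timedelta64[{time_unit}]"


print(duration_dtype_string("us", None, True, (2, 2, 0)))                # timedelta64[us]
print(duration_dtype_string("us", "pyarrow-nullable", True, (2, 2, 0)))  # duration[us][pyarrow]
print(duration_dtype_string("us", None, True, (1, 4, 4)))                # timedelta64[ns]
```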

MarcoGorelli left a comment

alright, happy with this 🥳 let's ship it?

MarcoGorelli merged commit 93d2fc7 into main Sep 30, 2024
25 checks passed

FBruzzesi commented Oct 1, 2024

Thanks for taking care of this Marco! But most importantly, this is the cutest gif so far! How did you discover it?
FYI: @anopsy

FBruzzesi deleted the feat/time-zone-aware-datetime branch October 1, 2024 08:11

anopsy commented Oct 1, 2024

This is uber cute!

akmalsoliev pushed a commit to akmalsoliev/narwhals that referenced this pull request Oct 15, 2024
Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

[Enh]: Add time units and time zone specifics
3 participants