-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented DATEDIFF function #262
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Core changes LGTM! Most of my comments are nitpicks, a few will require some minor tinkers.
documentation/functions.md
Outdated
- `DATEDIFF(x, y, "minutes")` returns y-x in minutes (same idea as hours). | ||
- `DATEDIFF(x, y, "seconds")` returns y-x in seconds (same idea as hours) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's full expand these out instead of saying "same idea as hours". Also missing a period on line 304.
documentation/functions.md
Outdated
|
||
Calling `DATEDIFF` between 2 timestamps returns the difference in one of `years`, `months`,`days`,`hours`,`minutes`,`seconds`. Default is `days`. | ||
|
||
- `DATEDIFF(x, y, "years")` returns y-x in years (December 31st of 2009 and January 1st of 2010 count as 1 year apart). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets explain these in a more plainspeak manner instead of the shorthand I gave in the issue. E.g.: "returns the number of years since x
that y
occurred"
documentation/functions.md
Outdated
# This calculates the difference between date_time attribute of Transactions collection | ||
# and datetime.date(2023, 4, 2) in days. | ||
Transactions.WHERE(YEAR(date_time) <= 2024) \ | ||
(x = date_time, y = datetime.date(2023, 4, 2), | ||
diff = DATEDIFF(date_time, datetime.date(2023, 4, 2), 'days')).TOP_K(30,by=diff.DESC()) | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets simplify this:
# Calculates, for each order, the number of days since January 1st 1992
# that the order was placed:
orders(
days_since=DATEDIFF(datetime.date(1992, 1, 1), order_date, "days")
)
documentation/functions.md
Outdated
|
||
- `DATEDIFF(x, y, "years")` returns y-x in years (December 31st of 2009 and January 1st of 2010 count as 1 year apart). | ||
- `DATEDIFF(x, y, "months") `returns y-x in months (January 31st of 2014 and February 1st of 2014 count as 1 month apart). | ||
- `DATEDIFF(x, y, "days")` returns y-x in days (11:59 pm of one day vs 12:01 am of the next day count as 1 day apart). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vineetg3 Upon reflection, I think we may want to switch this to DATEDIFF(unit, x, y)
to be more consistent with how SQL dialects do it (& get rid of the default=days behavior). Doing so would likely streamline the LLM generation since it would recognize the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also explicitly specify in the docs for this function:
- What happens if the unit is NOT one of those provided units?
- Are the unit strings case sensitive?
- Are there any aliases that are allowed for those units? (e.g. Snowflake has a lot of aliases for its common units that can be used for functions like DATEDIFF) -> if we support any of these, we should document & test.
@@ -84,6 +84,8 @@ These functions must be called on singular data as a function. | |||
- `HOUR`: Returns the hour component of a datetime. | |||
- `MINUTE`: Returns the minute component of a datetime. | |||
- `SECOND`: Returns the second component of a datetime. | |||
- `DATEDIFF`: Returns the difference between two dates in one of | |||
`years`, `months`,`days`,`hours`,`minutes`,`seconds`. Default is `days`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:minutes
, or seconds
.
# (days_diff*24 + hours_y) - hours_x | ||
# On expansion: (( day_y - day_x )*24 + hours_y) - hours_x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: fix the spacing here, its kinda inconsistent
tests/simple_pydough_functions.py
Outdated
def months_datediff(): | ||
y_datetime = datetime.datetime(2025, 4, 2, 11, 00, 0) | ||
return ( | ||
Transactions.WHERE((YEAR(date_time) <= 2025) & (DAY(date_time) >= 3))( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be < 2025
so it is deterministic? (Doesn't include the current year, since a lot of the data is generated relative to the current date)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Since the init_defog.sql
has relative initialization time stamps. Will rectify.
tests/simple_pydough_functions.py
Outdated
return Transactions.WHERE((YEAR(date_time) <= 2025) & (DAY(date_time) >= 3))( | ||
x=date_time, y=y_datetime, diff=DATEDIFF(date_time, y_datetime, "years") | ||
).TOP_K(20, by=diff.DESC()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For simplicity, you can merge a lot of these into 1 function that just has a bunch of diff
columns.
@@ -568,6 +568,226 @@ def convert_sqrt( | |||
) | |||
|
|||
|
|||
def convert_datediff( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is a different binding for ANSI, this should get ANSI SQL generation tested also to make sure the generated syntax here is correct.
answer = sqlglot_expressions.DateDiff( | ||
unit=sqlglot_expressions.Var(this="days"), | ||
this=date_y, | ||
expression=date_x, | ||
) | ||
return answer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simplify this by just recursively calling convert_datediff
with the transformed arguments.
Resolves #221
Please check the issue for more details on the implementation.
This implementation calculates date/time differences for various units as follows:
(years_diff * 12 + month_y) - month_x
to get the total months difference.((days_diff * 24 + hours_y) - hours_x)
for total hours difference.((hours_diff * 60 + minutes_y) - minutes_x)
for total minutes difference.((minutes_diff * 60 + seconds_y) - seconds_x)
for total seconds difference.