Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented DATEDIFF function #262

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Implemented DATEDIFF function #262

wants to merge 5 commits into from

Conversation

vineetg3
Copy link
Contributor

Resolves #221
Please check the issue for more details on the implementation.
This implementation calculates date/time differences for various units as follows:

  • Years: Extracts the year from both dates and calculates the difference.
  • Months: Calculates (years_diff * 12 + month_y) - month_x to get the total months difference.
  • Days: Subtracts the start-of-day values of both dates to compute the days difference.
  • Hours: Expands to ((days_diff * 24 + hours_y) - hours_x) for total hours difference.
  • Minutes: Expands to ((hours_diff * 60 + minutes_y) - minutes_x) for total minutes difference.
  • Seconds: Expands to ((minutes_diff * 60 + seconds_y) - seconds_x) for total seconds difference.

Copy link
Contributor

@knassre-bodo knassre-bodo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Core changes LGTM! Most of my comments are nitpicks, a few will require some minor tinkers.

Comment on lines 303 to 304
- `DATEDIFF(x, y, "minutes")` returns y-x in minutes (same idea as hours).
- `DATEDIFF(x, y, "seconds")` returns y-x in seconds (same idea as hours)
Copy link
Contributor

@knassre-bodo knassre-bodo Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's full expand these out instead of saying "same idea as hours". Also missing a period on line 304.


Calling `DATEDIFF` between 2 timestamps returns the difference in one of `years`, `months`,`days`,`hours`,`minutes`,`seconds`. Default is `days`.

- `DATEDIFF(x, y, "years")` returns y-x in years (December 31st of 2009 and January 1st of 2010 count as 1 year apart).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets explain these in a more plainspeak manner instead of the shorthand I gave in the issue. E.g.: "returns the number of years since x that y occurred"

Comment on lines 307 to 312
# This calculates the difference between date_time attribute of Transactions collection
# and datetime.date(2023, 4, 2) in days.
Transactions.WHERE(YEAR(date_time) <= 2024) \
(x = date_time, y = datetime.date(2023, 4, 2),
diff = DATEDIFF(date_time, datetime.date(2023, 4, 2), 'days')).TOP_K(30,by=diff.DESC())
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets simplify this:

# Calculates, for each order, the number of days since January 1st 1992
# that the order was placed:
orders(
 days_since=DATEDIFF(datetime.date(1992, 1, 1), order_date, "days")
)


- `DATEDIFF(x, y, "years")` returns y-x in years (December 31st of 2009 and January 1st of 2010 count as 1 year apart).
- `DATEDIFF(x, y, "months") `returns y-x in months (January 31st of 2014 and February 1st of 2014 count as 1 month apart).
- `DATEDIFF(x, y, "days")` returns y-x in days (11:59 pm of one day vs 12:01 am of the next day count as 1 day apart).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vineetg3 Upon reflection, I think we may want to switch this to DATEDIFF(unit, x, y) to be more consistent with how SQL dialects do it (& get rid of the default=days behavior). Doing so would likely streamline the LLM generation since it would recognize the pattern.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also explicitly specify in the docs for this function:

  • What happens if the unit is NOT one of those provided units?
  • Are the unit strings case sensitive?
  • Are there any aliases that are allowed for those units? (e.g. Snowflake has a lot of aliases for its common units that can be used for functions like DATEDIFF) -> if we support any of these, we should document & test.

@@ -84,6 +84,8 @@ These functions must be called on singular data as a function.
- `HOUR`: Returns the hour component of a datetime.
- `MINUTE`: Returns the minute component of a datetime.
- `SECOND`: Returns the second component of a datetime.
- `DATEDIFF`: Returns the difference between two dates in one of
`years`, `months`,`days`,`hours`,`minutes`,`seconds`. Default is `days`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT:minutes, or seconds.

Comment on lines 708 to 709
# (days_diff*24 + hours_y) - hours_x
# On expansion: (( day_y - day_x )*24 + hours_y) - hours_x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: fix the spacing here, its kinda inconsistent

def months_datediff():
y_datetime = datetime.datetime(2025, 4, 2, 11, 00, 0)
return (
Transactions.WHERE((YEAR(date_time) <= 2025) & (DAY(date_time) >= 3))(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be < 2025 so it is deterministic? (Doesn't include the current year, since a lot of the data is generated relative to the current date)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Since the init_defog.sql has relative initialization time stamps. Will rectify.

Comment on lines 396 to 398
return Transactions.WHERE((YEAR(date_time) <= 2025) & (DAY(date_time) >= 3))(
x=date_time, y=y_datetime, diff=DATEDIFF(date_time, y_datetime, "years")
).TOP_K(20, by=diff.DESC())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity, you can merge a lot of these into 1 function that just has a bunch of diff columns.

@@ -568,6 +568,226 @@ def convert_sqrt(
)


def convert_datediff(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is a different binding for ANSI, this should get ANSI SQL generation tested also to make sure the generated syntax here is correct.

Comment on lines +698 to +703
answer = sqlglot_expressions.DateDiff(
unit=sqlglot_expressions.Var(this="days"),
this=date_y,
expression=date_x,
)
return answer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simplify this by just recursively calling convert_datediff with the transformed arguments.

@vineetg3 vineetg3 self-assigned this Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for DATEDIFF function in PyDough, with operators controlling units
2 participants