Implemented DATEDIFF function #262

Merged
merged 6 commits into from
Feb 15, 2025
Changes from 3 commits
21 changes: 21 additions & 0 deletions documentation/functions.md
@@ -28,6 +28,7 @@ Below is the list of every function/operator currently supported in PyDough as a
* [HOUR](#hour)
* [MINUTE](#minute)
* [SECOND](#second)
* [DATEDIFF](#datediff)
- [Conditional Functions](#conditional-functions)
* [IFF](#iff)
* [ISIN](#isin)
@@ -290,6 +291,26 @@ is from 0-59:
Orders(is_lt_30_seconds = SECOND(order_date) < 30)
```

<!-- TOC --><a name="datediff"></a>
### DATEDIFF

Calling `DATEDIFF` between 2 timestamps returns the difference in one of `years`, `months`, `days`, `hours`, `minutes`, `seconds`. Default is `days`.

- `DATEDIFF(x, y, "years")` returns y-x in years (December 31st of 2009 and January 1st of 2010 count as 1 year apart).
Review comment (Contributor):
Let's explain these in a more plainspeak manner instead of the shorthand I gave in the issue. E.g.: "returns the number of years since x that y occurred"

- `DATEDIFF(x, y, "months")` returns y-x in months (January 31st of 2014 and February 1st of 2014 count as 1 month apart).
- `DATEDIFF(x, y, "days")` returns y-x in days (11:59 pm of one day vs 12:01 am of the next day count as 1 day apart).
Review comment (Contributor):
@vineetg3 Upon reflection, I think we may want to switch this to DATEDIFF(unit, x, y) to be more consistent with how SQL dialects do it (& get rid of the default=days behavior). Doing so would likely streamline the LLM generation since it would recognize the pattern.

Review comment (Contributor):
Let's also explicitly specify in the docs for this function:

  • What happens if the unit is NOT one of those provided units?
  • Are the unit strings case sensitive?
  • Are there any aliases that are allowed for those units? (e.g. Snowflake has a lot of aliases for its common units that can be used for functions like DATEDIFF) -> if we support any of these, we should document & test.

- `DATEDIFF(x, y, "hours")` returns y-x in hours (6:59 pm vs 7:01 pm of the same day count as 1 hour apart).
- `DATEDIFF(x, y, "minutes")` returns y-x in minutes (same idea as hours).
- `DATEDIFF(x, y, "seconds")` returns y-x in seconds (same idea as hours)
Review comment (Contributor, @knassre-bodo, Feb 13, 2025):
Let's fully expand these out instead of saying "same idea as hours". Also missing a period on line 304.


```py
# This calculates the difference between date_time attribute of Transactions collection
# and datetime.date(2023, 4, 2) in days.
Transactions.WHERE(YEAR(date_time) <= 2024) \
(x = date_time, y = datetime.date(2023, 4, 2),
diff = DATEDIFF(date_time, datetime.date(2023, 4, 2), 'days')).TOP_K(30,by=diff.DESC())
```
Review comment (Contributor):
Let's simplify this:

# Calculates, for each order, the number of days since January 1st 1992
# that the order was placed:
orders(
 days_since=DATEDIFF(datetime.date(1992, 1, 1), order_date, "days")
)
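
The examples in the bullets above suggest that `DATEDIFF` counts unit boundaries crossed rather than elapsed time. A minimal pure-Python sketch of that distinction follows. It does not use PyDough, and the variable names are illustrative only.

```python
import datetime

# 11:59 pm on one day vs 12:01 am on the next day.
x = datetime.datetime(2024, 5, 1, 23, 59)
y = datetime.datetime(2024, 5, 2, 0, 1)

# Elapsed-time view: only two minutes have passed, so zero whole days.
elapsed_days = (y - x).days

# Boundary-counting view (assumed DATEDIFF semantics): one midnight was
# crossed, so the two timestamps are 1 day apart.
boundary_days = (y.date() - x.date()).days

# December 31st, 2009 vs January 1st, 2010: one year boundary crossed.
a = datetime.date(2009, 12, 31)
b = datetime.date(2010, 1, 1)
boundary_years = b.year - a.year

# January 31st, 2014 vs February 1st, 2014: one month boundary crossed.
c = datetime.date(2014, 1, 31)
d = datetime.date(2014, 2, 1)
boundary_months = (d.year - c.year) * 12 + (d.month - c.month)

print(elapsed_days, boundary_days, boundary_years, boundary_months)
```

Under this reading, a result of 1 means "one boundary apart", not "at least one full unit elapsed".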


<!-- TOC --><a name="conditional-functions"></a>
## Conditional Functions

2 changes: 2 additions & 0 deletions pydough/pydough_operators/__init__.py
@@ -17,6 +17,7 @@
"CONTAINS",
"COUNT",
"ConstantType",
"DATEDIFF",
"DAY",
"DEFAULT_TO",
"DIV",
@@ -83,6 +84,7 @@
BXR,
CONTAINS,
COUNT,
DATEDIFF,
DAY,
DEFAULT_TO,
DIV,
2 changes: 2 additions & 0 deletions pydough/pydough_operators/expression_operators/README.md
@@ -84,6 +84,8 @@ These functions must be called on singular data as a function.
- `HOUR`: Returns the hour component of a datetime.
- `MINUTE`: Returns the minute component of a datetime.
- `SECOND`: Returns the second component of a datetime.
- `DATEDIFF`: Returns the difference between two dates in one of
`years`, `months`,`days`,`hours`,`minutes`,`seconds`. Default is `days`.
Review comment (Contributor):
NIT: minutes, or seconds.


##### Conditional Functions

2 changes: 2 additions & 0 deletions pydough/pydough_operators/expression_operators/__init__.py
@@ -15,6 +15,7 @@
"BinaryOperator",
"CONTAINS",
"COUNT",
"DATEDIFF",
"DAY",
"DEFAULT_TO",
"DIV",
@@ -77,6 +78,7 @@
BXR,
CONTAINS,
COUNT,
DATEDIFF,
DAY,
DEFAULT_TO,
DIV,
@@ -1,5 +1,5 @@
"""
-Definition bindings of builtin PyDough operators that reutrn an expression.
+Definition bindings of builtin PyDough operators that return an expression.
"""

__all__ = [
@@ -12,6 +12,7 @@
"BXR",
"CONTAINS",
"COUNT",
"DATEDIFF",
"DAY",
"DEFAULT_TO",
"DIV",
@@ -149,6 +150,9 @@
SECOND = ExpressionFunctionOperator(
"SECOND", False, RequireNumArgs(1), ConstantType(Int64Type())
)
DATEDIFF = ExpressionFunctionOperator(
"DATEDIFF", False, RequireMinArgs(2), ConstantType(Int64Type())
)
Review comment (Contributor):
Should also verify that there are not more than 3 arguments. Can do so by creating a new RequireArgRange verifier (in the same file as RequireMinArgs) to check if the # of arguments is between two values. E.g. RequireArgRange(2, 3) means it checks that the # of args is between 2 and 3.

Review comment (Contributor Author):
Since we are going for no default arguments, should I keep this implementation even though I would not be using it in this PR?
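
For reference, the `RequireArgRange` verifier suggested above could look roughly like the sketch below. This is a standalone illustration: the interface of PyDough's existing verifiers is not shown in this diff, so the method names `accepts` and `verify` are assumptions, not the library's API.

```python
from typing import Any, Sequence


class RequireArgRange:
    """
    Hypothetical verifier that accepts a call only if the number of
    arguments lies between `low` and `high`, inclusive.
    """

    def __init__(self, low: int, high: int) -> None:
        if low > high:
            raise ValueError(f"Invalid argument range: {low} > {high}")
        self.low = low
        self.high = high

    def accepts(self, args: Sequence[Any]) -> bool:
        # True when the argument count is within the inclusive range.
        return self.low <= len(args) <= self.high

    def verify(self, args: Sequence[Any]) -> None:
        # Raises instead of returning a flag, for callers that want an error.
        if not self.accepts(args):
            raise ValueError(
                f"Expected between {self.low} and {self.high} arguments "
                f"inclusive, received {len(args)}."
            )


verifier = RequireArgRange(2, 3)
print(verifier.accepts(["x", "y"]))
print(verifier.accepts(["x", "y", "days", "extra"]))
```

With `RequireArgRange(2, 3)`, a two- or three-argument `DATEDIFF` call would pass and a four-argument call would be rejected.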

SLICE = ExpressionFunctionOperator(
"SLICE", False, RequireNumArgs(4), SelectArgumentType(0)
)
222 changes: 222 additions & 0 deletions pydough/sqlglot/transform_bindings.py
@@ -568,6 +568,226 @@ def convert_sqrt(
)


def convert_datediff(
Review comment (Contributor):
Since there is a different binding for ANSI, this should get ANSI SQL generation tested also to make sure the generated syntax here is correct.

raw_args: Sequence[RelationalExpression] | None,
sql_glot_args: Sequence[SQLGlotExpression],
) -> SQLGlotExpression:
"""
Support for getting the difference between two dates.

Args:
`raw_args`: The operands passed to the function before they were converted to
SQLGlot expressions. (Not actively used in this implementation.)
`sql_glot_args`: The operands passed to the function after they were converted
to SQLGlot expressions.

Returns:
The SQLGlot expression matching the functionality of
`DATEDIFF(y, x)`, i.e. the difference between two dates.
"""
assert len(sql_glot_args) == 2 or len(sql_glot_args) == 3
unit: str
if len(sql_glot_args) == 2:
unit = "days"
else:
unit = sql_glot_args[2].this
Review comment (Contributor):
Let's verify that it is a string:

  • Check that sql_glot_args[2] is a literal
  • Make sure the is_string property is True

x = sql_glot_args[0]
y = sql_glot_args[1]
datediff_unit: str = unit
match unit:
Review comment (Contributor):
Might as well be kind and do match unit.lower(): so it's case-insensitive.

case "years":
datediff_unit = "year"
case "months":
datediff_unit = "month"
case "days":
datediff_unit = "day"
case "hours":
datediff_unit = "hour"
case "minutes":
datediff_unit = "minute"
case "seconds":
datediff_unit = "second"
case _:
raise ValueError(f"Unsupported argument {unit} for DATEDIFF.")
vineetg3 marked this conversation as resolved.
answer = sqlglot_expressions.DateDiff(
unit=sqlglot_expressions.Var(this=datediff_unit), this=y, expression=x
)
return answer
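
The two review suggestions on this function (check that the unit argument is a string literal, and match it case-insensitively) could be combined along the lines of the sketch below. To keep it runnable without SQLGlot, `UnitLiteral` is a hypothetical stand-in that only mimics the `this` and `is_string` attributes referenced in the code and comments above. The helper name `normalize_unit` is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class UnitLiteral:
    """Hypothetical stand-in for a SQLGlot literal expression."""

    this: str
    is_string: bool


# Maps the accepted PyDough unit strings to the singular SQL unit names
# used in the match statement above.
UNIT_TO_SQL: dict[str, str] = {
    "years": "year",
    "months": "month",
    "days": "day",
    "hours": "hour",
    "minutes": "minute",
    "seconds": "second",
}


def normalize_unit(arg: object) -> str:
    """Validates the unit argument and returns the SQL unit name."""
    # Reject anything that is not a string literal.
    if not isinstance(arg, UnitLiteral) or not arg.is_string:
        raise ValueError("DATEDIFF unit must be a string literal.")
    # Lowercase first so that "Days" and "DAYS" are treated like "days".
    unit = arg.this.lower()
    if unit not in UNIT_TO_SQL:
        raise ValueError(f"Unsupported argument {arg.this} for DATEDIFF.")
    return UNIT_TO_SQL[unit]


print(normalize_unit(UnitLiteral("Days", True)))
```

Keeping the mapping in one dictionary would also give a single place to document any unit aliases, if those are ever supported.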


def convert_sqlite_datediff(
raw_args: Sequence[RelationalExpression] | None,
sql_glot_args: Sequence[SQLGlotExpression],
) -> SQLGlotExpression:
"""
Support for getting the difference between two dates in sqlite.

Args:
`raw_args`: The operands passed to the function before they were converted to
SQLGlot expressions. (Not actively used in this implementation.)
`sql_glot_args`: The operands passed to the function after they were converted
to SQLGlot expressions.

Returns:
The SQLGlot expression matching the functionality of
`DATEDIFF(y, x)`, i.e. the difference between two dates.
"""
assert len(sql_glot_args) == 2 or len(sql_glot_args) == 3
unit: str
if len(sql_glot_args) == 2:
unit = "days"
else:
unit = sql_glot_args[2].this
Review comment (Contributor):
Same point about the units.

match unit:
case "years":
# Extracts the year from the date and subtracts the years.
year_x: SQLGlotExpression = convert_sqlite_datetime_extract("'%Y'")(
None, [sql_glot_args[0]]
)
year_y: SQLGlotExpression = convert_sqlite_datetime_extract("'%Y'")(
None, [sql_glot_args[1]]
)
# Equivalent to: this - expression, i.e. year_y - year_x
years_diff: SQLGlotExpression = sqlglot_expressions.Sub(
this=year_y, expression=year_x
)
return years_diff
case "months":
# Extracts the difference in years multiplied by 12.
# Extracts the month from the date and subtracts the months.
# Adds the difference in months to the difference in years*12.
# Implementation wise, this is equivalent to:
# (years_diff*12 + month_y) - month_x
# On expansion: (year_y - year_x) * 12 + month_y - month_x
sql_glot_args_years = [
sql_glot_args[0],
sql_glot_args[1],
sqlglot_expressions.Literal(this="years", is_string=True),
]
_years_diff: SQLGlotExpression = convert_sqlite_datediff(
raw_args, sql_glot_args_years
)
years_diff_in_months = sqlglot_expressions.Mul(
this=apply_parens(_years_diff),
expression=sqlglot_expressions.Literal.number(12),
)
month_x = convert_sqlite_datetime_extract("'%m'")(None, [sql_glot_args[0]])
month_y = convert_sqlite_datetime_extract("'%m'")(None, [sql_glot_args[1]])
months_diff: SQLGlotExpression = sqlglot_expressions.Sub(
this=sqlglot_expressions.Add(
this=years_diff_in_months, expression=month_y
),
expression=month_x,
)
return months_diff
case "days":
# Extracts the start of date from the datetime and subtracts the dates.
date_x = sqlglot_expressions.Date(
this=sql_glot_args[0],
expressions=[
sqlglot_expressions.Literal(this="start of day", is_string=True)
],
)
date_y = sqlglot_expressions.Date(
this=sql_glot_args[1],
expressions=[
sqlglot_expressions.Literal(this="start of day", is_string=True)
],
)
# This calculates 'this-expression'.
answer = sqlglot_expressions.DateDiff(
unit=sqlglot_expressions.Var(this="days"),
this=date_y,
expression=date_x,
)
return answer
Review comment on lines +707 to +712 (Contributor):
You can simplify this by just recursively calling convert_datediff with the transformed arguments.

case "hours":
# Extracts the difference in days multiplied by 24 to get difference in hours.
Review comment (Contributor):
NIT: ensure all the comments are <= 80 characters wide

# Extracts the hours of x and hours of y. Adds the difference in hours to the (difference in days*24).
# Implementation wise, this is equivalent to:
# (days_diff*24 + hours_y) - hours_x
# On expansion: (( day_y - day_x )*24 + hours_y) - hours_x
Review comment (Contributor):
NIT: fix the spacing here, it's kinda inconsistent

sql_glot_args_days = [
sql_glot_args[0],
sql_glot_args[1],
sqlglot_expressions.Literal(this="days", is_string=True),
]
_days_diff: SQLGlotExpression = convert_sqlite_datediff(
raw_args, sql_glot_args_days
)
days_diff_in_hours = sqlglot_expressions.Mul(
this=apply_parens(_days_diff),
expression=sqlglot_expressions.Literal.number(24),
)
hours_x: SQLGlotExpression = convert_sqlite_datetime_extract("'%H'")(
None, [sql_glot_args[0]]
)
hours_y: SQLGlotExpression = convert_sqlite_datetime_extract("'%H'")(
None, [sql_glot_args[1]]
)
hours_diff: SQLGlotExpression = sqlglot_expressions.Sub(
this=sqlglot_expressions.Add(
this=days_diff_in_hours, expression=hours_y
),
expression=hours_x,
)
return hours_diff
case "minutes":
# Extracts the difference in hours multiplied by 60 to get difference in minutes.
# Extracts the minutes of x and minutes of y. Adds the difference in minutes to the (difference in hours*60).
# Implementation wise, this is equivalent to:
# (hours_diff*60 + minutes_y) - minutes_x
# On expansion: (( hours_y - hours_x )*60 + minutes_y) - minutes_x
sql_glot_args_hours = [
sql_glot_args[0],
sql_glot_args[1],
sqlglot_expressions.Literal(this="hours", is_string=True),
]
_hours_diff: SQLGlotExpression = convert_sqlite_datediff(
raw_args, sql_glot_args_hours
)
hours_diff_in_mins = sqlglot_expressions.Mul(
this=apply_parens(_hours_diff),
expression=sqlglot_expressions.Literal.number(60),
)
min_x = convert_sqlite_datetime_extract("'%M'")(None, [sql_glot_args[0]])
min_y = convert_sqlite_datetime_extract("'%M'")(None, [sql_glot_args[1]])
mins_diff: SQLGlotExpression = sqlglot_expressions.Sub(
this=sqlglot_expressions.Add(this=hours_diff_in_mins, expression=min_y),
expression=min_x,
)
return mins_diff
case "seconds":
# Extracts the difference in minutes multiplied by 60 to get difference in seconds.
# Extracts the seconds of x and seconds of y. Adds the difference in seconds to the (difference in minutes*60).
# Implementation wise, this is equivalent to:
# (mins_diff*60 + seconds_y) - seconds_x
# On expansion: (( mins_y - mins_x )*60 + seconds_y) - seconds_x
sql_glot_args_minutes = [
sql_glot_args[0],
sql_glot_args[1],
sqlglot_expressions.Literal(this="minutes", is_string=True),
]
_mins_diff: SQLGlotExpression = convert_sqlite_datediff(
raw_args, sql_glot_args_minutes
)
minutes_diff_in_secs = sqlglot_expressions.Mul(
this=apply_parens(_mins_diff),
expression=sqlglot_expressions.Literal.number(60),
)
sec_x = convert_sqlite_datetime_extract("'%S'")(None, [sql_glot_args[0]])
sec_y = convert_sqlite_datetime_extract("'%S'")(None, [sql_glot_args[1]])
secs_diff: SQLGlotExpression = sqlglot_expressions.Sub(
this=sqlglot_expressions.Add(
this=minutes_diff_in_secs, expression=sec_y
),
expression=sec_x,
)
return secs_diff
case _:
raise ValueError(f"Unsupported argument {unit} for DATEDIFF.")
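
The cascading arithmetic that `convert_sqlite_datediff` builds as SQL (each unit derived from the next coarser one) can be checked with a plain-Python sketch. It mirrors the formulas in the comments above but operates on `datetime` values instead of SQLGlot expressions. The function name `cascading_datediff` is hypothetical.

```python
import datetime


def cascading_datediff(
    x: datetime.datetime, y: datetime.datetime, unit: str
) -> int:
    """Computes y - x in `unit`, deriving each unit from the coarser one."""
    # years: year_y - year_x
    years = y.year - x.year
    if unit == "years":
        return years
    # months: (year_y - year_x) * 12 + month_y - month_x
    if unit == "months":
        return years * 12 + y.month - x.month
    # days: difference between the two dates truncated to start of day
    days = (y.date() - x.date()).days
    if unit == "days":
        return days
    # hours: days_diff * 24 + hours_y - hours_x
    hours = days * 24 + y.hour - x.hour
    if unit == "hours":
        return hours
    # minutes: hours_diff * 60 + minutes_y - minutes_x
    minutes = hours * 60 + y.minute - x.minute
    if unit == "minutes":
        return minutes
    # seconds: mins_diff * 60 + seconds_y - seconds_x
    if unit == "seconds":
        return minutes * 60 + y.second - x.second
    raise ValueError(f"Unsupported argument {unit} for DATEDIFF.")


start = datetime.datetime(2024, 3, 1, 18, 59, 30)
end = datetime.datetime(2024, 3, 1, 19, 1, 10)
for unit in ("hours", "minutes", "seconds"):
    print(unit, cascading_datediff(start, end, unit))
```

For 6:59:30 pm vs 7:01:10 pm on the same day, the hour and minute results count boundaries crossed (1 and 2), while the second result equals the true elapsed time (100).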


class SqlGlotTransformBindings:
"""
Binding infrastructure used to associate PyDough operators with a procedure
@@ -742,6 +962,7 @@ def add_builtin_bindings(self) -> None:
self.bindings[pydop.HOUR] = create_convert_time_unit_function("HOUR")
self.bindings[pydop.MINUTE] = create_convert_time_unit_function("MINUTE")
self.bindings[pydop.SECOND] = create_convert_time_unit_function("SECOND")
self.bindings[pydop.DATEDIFF] = convert_datediff

# Binary operators
self.bind_binop(pydop.ADD, sqlglot_expressions.Add)
@@ -779,6 +1000,7 @@ def add_sqlite_bindings(self) -> None:
self.bindings[pydop.HOUR] = convert_sqlite_datetime_extract("'%H'")
self.bindings[pydop.MINUTE] = convert_sqlite_datetime_extract("'%M'")
self.bindings[pydop.SECOND] = convert_sqlite_datetime_extract("'%S'")
self.bindings[pydop.DATEDIFF] = convert_sqlite_datediff

# String function overrides
if sqlite3.sqlite_version < "3.44.1":