Implemented DATEDIFF function #262
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,6 +28,7 @@ Below is the list of every function/operator currently supported in PyDough as a | |
* [HOUR](#hour) | ||
* [MINUTE](#minute) | ||
* [SECOND](#second) | ||
* [DATEDIFF](#datediff) | ||
- [Conditional Functions](#conditional-functions) | ||
* [IFF](#iff) | ||
* [ISIN](#isin) | ||
|
@@ -290,6 +291,26 @@ is from 0-59: | |
Orders(is_lt_30_seconds = SECOND(order_date) < 30) | ||
``` | ||
|
||
<!-- TOC --><a name="datediff"></a> | ||
### DATEDIFF | ||
|
||
Calling `DATEDIFF` between two timestamps returns the difference between them in one of the following units: `years`, `months`, `days`, `hours`, `minutes`, or `seconds`. The default unit is `days`. | ||
|
||
- `DATEDIFF(x, y, "years")` returns y-x in years (December 31st of 2009 and January 1st of 2010 count as 1 year apart). | ||
- `DATEDIFF(x, y, "months")` returns y-x in months (January 31st of 2014 and February 1st of 2014 count as 1 month apart). | ||
- `DATEDIFF(x, y, "days")` returns y-x in days (11:59 pm of one day vs 12:01 am of the next day count as 1 day apart). | ||
Review comment: @vineetg3 Upon reflection, I think we may want to switch this to
Review comment: Let's also explicitly specify in the docs for this function:
|
||
- `DATEDIFF(x, y, "hours")` returns y-x in hours (6:59 pm vs 7:01 pm of the same day count as 1 hour apart). | ||
- `DATEDIFF(x, y, "minutes")` returns y-x in minutes (same idea as hours). | ||
- `DATEDIFF(x, y, "seconds")` returns y-x in seconds (same idea as hours) | ||
Review comment: Let's fully expand these out instead of saying "same idea as hours". Also missing a period on line 304.
||
|
||
```py | ||
# This calculates the difference between date_time attribute of Transactions collection | ||
# and datetime.date(2023, 4, 2) in days. | ||
Transactions.WHERE(YEAR(date_time) <= 2024) \ | ||
(x = date_time, y = datetime.date(2023, 4, 2), | ||
diff = DATEDIFF(date_time, datetime.date(2023, 4, 2), 'days')).TOP_K(30,by=diff.DESC()) | ||
``` | ||
Review comment: Let's simplify this:
# Calculates, for each order, the number of days since January 1st 1992
# that the order was placed:
orders(
    days_since=DATEDIFF(datetime.date(1992, 1, 1), order_date, "days")
)
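The boundary-counting behaviour described in the `DATEDIFF` documentation above can be sketched in plain Python. This is only an illustration of the documented semantics as they stand in this diff (including the `days` default, which the review discussion may change); the function name `datediff` and its signature are assumptions and are not part of PyDough.

```python
from datetime import datetime

_UNITS = ("years", "months", "days", "hours", "minutes", "seconds")


def datediff(x: datetime, y: datetime, unit: str = "days") -> int:
    """Return y - x counted in unit boundaries crossed, not elapsed time.

    Hypothetical helper mirroring the documented examples, e.g.
    December 31st 2009 and January 1st 2010 are 1 year apart.
    """
    if unit not in _UNITS:
        raise ValueError(f"Unsupported argument {unit} for DATEDIFF.")
    # Each finer unit is built from the next coarser one.
    years = y.year - x.year
    if unit == "years":
        return years
    if unit == "months":
        return years * 12 + (y.month - x.month)
    days = (y.date() - x.date()).days
    if unit == "days":
        return days
    hours = days * 24 + (y.hour - x.hour)
    if unit == "hours":
        return hours
    minutes = hours * 60 + (y.minute - x.minute)
    if unit == "minutes":
        return minutes
    return minutes * 60 + (y.second - x.second)


# 6:59 pm vs 7:01 pm of the same day: one hour boundary is crossed.
one_hour = datediff(
    datetime(2024, 3, 1, 18, 59), datetime(2024, 3, 1, 19, 1), "hours"
)  # 1
```

Because only boundaries are counted, two timestamps two minutes apart can still be reported as one hour apart, which is exactly the behaviour the documentation examples describe.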
||
|
||
<!-- TOC --><a name="conditional-functions"></a> | ||
## Conditional Functions | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -84,6 +84,8 @@ These functions must be called on singular data as a function. | |
- `HOUR`: Returns the hour component of a datetime. | ||
- `MINUTE`: Returns the minute component of a datetime. | ||
- `SECOND`: Returns the second component of a datetime. | ||
- `DATEDIFF`: Returns the difference between two dates in one of | ||
`years`, `months`, `days`, `hours`, `minutes`, `seconds`. Default is `days`. | ||
Review comment: NIT:
||
|
||
##### Conditional Functions | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
""" | ||
Definition bindings of builtin PyDough operators that reutrn an expression. | ||
Definition bindings of builtin PyDough operators that return an expression. | ||
""" | ||
|
||
__all__ = [ | ||
|
@@ -12,6 +12,7 @@ | |
"BXR", | ||
"CONTAINS", | ||
"COUNT", | ||
"DATEDIFF", | ||
"DAY", | ||
"DEFAULT_TO", | ||
"DIV", | ||
|
@@ -149,6 +150,9 @@ | |
SECOND = ExpressionFunctionOperator( | ||
"SECOND", False, RequireNumArgs(1), ConstantType(Int64Type()) | ||
) | ||
DATEDIFF = ExpressionFunctionOperator( | ||
"DATEDIFF", False, RequireMinArgs(2), ConstantType(Int64Type()) | ||
) | ||
Review comment: Should also verify that there are not more than 3 arguments. Can do so by creating a new
Review comment: Since we are going for no default arguments, should I keep this implementation even though I would not be using it in this PR?
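One way to bound the argument count from both sides, as the review asks, is a verifier that accepts a range. The sketch below is standalone and hypothetical: the class name `RequireArgRange`, its `accepts` method, and the `error_on_fail` flag are assumptions made for illustration and may not match PyDough's actual verifier interface.

```python
from collections.abc import Sequence


class RequireArgRange:
    """Hypothetical verifier: accepts between `low` and `high` arguments."""

    def __init__(self, low: int, high: int) -> None:
        if low < 0 or high < low:
            raise ValueError(f"Invalid argument range: [{low}, {high}]")
        self.low = low
        self.high = high

    def accepts(self, args: Sequence[object], error_on_fail: bool = True) -> bool:
        """Check the number of arguments, optionally raising on failure."""
        ok = self.low <= len(args) <= self.high
        if not ok and error_on_fail:
            raise TypeError(
                f"Expected between {self.low} and {self.high} arguments "
                f"inclusive, received {len(args)}"
            )
        return ok


# DATEDIFF with an optional unit would take 2 or 3 arguments.
verifier = RequireArgRange(2, 3)
two_args_ok = verifier.accepts(["x", "y"])  # True
```

If the function ends up with no default unit, as the follow-up comment suggests, a fixed count of 3 would be enough and a range verifier like this would not be needed for `DATEDIFF`.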
||
SLICE = ExpressionFunctionOperator( | ||
"SLICE", False, RequireNumArgs(4), SelectArgumentType(0) | ||
) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -568,6 +568,226 @@ def convert_sqrt( | |
) | ||
|
||
|
||
def convert_datediff( | ||
Review comment: Since there is a different binding for ANSI, this should get ANSI SQL generation tested also to make sure the generated syntax here is correct.
||
raw_args: Sequence[RelationalExpression] | None, | ||
sql_glot_args: Sequence[SQLGlotExpression], | ||
) -> SQLGlotExpression: | ||
""" | ||
Support for getting the difference between two dates via the generic (non-SQLite) binding. | ||
|
||
Args: | ||
`raw_args`: The operands passed to the function before they were converted to | ||
SQLGlot expressions. (Not actively used in this implementation.) | ||
`sql_glot_args`: The operands passed to the function after they were converted | ||
to SQLGlot expressions. | ||
|
||
Returns: | ||
The SQLGlot expression matching the functionality of | ||
`DATEDIFF(y, x)`, i.e. the difference between two dates. | ||
""" | ||
assert len(sql_glot_args) == 2 or len(sql_glot_args) == 3 | ||
unit: str | ||
if len(sql_glot_args) == 2: | ||
unit = "days" | ||
else: | ||
unit = sql_glot_args[2].this | ||
Review comment: Let's verify that it is a string:
|
||
x = sql_glot_args[0] | ||
y = sql_glot_args[1] | ||
datediff_unit: str = unit | ||
match unit: | ||
Review comment: Might as well be kind and do
||
case "years": | ||
datediff_unit = "year" | ||
case "months": | ||
datediff_unit = "month" | ||
case "days": | ||
datediff_unit = "day" | ||
case "hours": | ||
datediff_unit = "hour" | ||
case "minutes": | ||
datediff_unit = "minute" | ||
case "seconds": | ||
datediff_unit = "second" | ||
case _: | ||
raise ValueError(f"Unsupported argument {unit} for DATEDIFF.") | ||
|
||
answer = sqlglot_expressions.DateDiff( | ||
unit=sqlglot_expressions.Var(this=datediff_unit), this=y, expression=x | ||
) | ||
return answer | ||
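The unit handling in `convert_datediff` above (check the unit, then map the plural name to the singular name passed to `DateDiff`) can also be written as a table lookup combined with the string check requested in review. This is a minimal standalone sketch; `normalize_unit` and `_UNIT_NAMES` are hypothetical names, and the real function would receive a SQLGlot literal rather than a plain Python value.

```python
# Plural unit names accepted by DATEDIFF mapped to the singular form.
_UNIT_NAMES: dict[str, str] = {
    "years": "year",
    "months": "month",
    "days": "day",
    "hours": "hour",
    "minutes": "minute",
    "seconds": "second",
}


def normalize_unit(unit: object) -> str:
    """Validate a DATEDIFF unit and return its singular form."""
    if not isinstance(unit, str):
        raise TypeError(
            f"DATEDIFF unit must be a string, received {type(unit).__name__}"
        )
    try:
        return _UNIT_NAMES[unit]
    except KeyError:
        # Same message as the `case _` branch in the diff.
        raise ValueError(f"Unsupported argument {unit} for DATEDIFF.") from None


singular = normalize_unit("hours")  # "hour"
```

Keeping the accepted names in one dictionary means the generic and SQLite conversions could share a single list of supported units.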
|
||
|
||
def convert_sqlite_datediff( | ||
raw_args: Sequence[RelationalExpression] | None, | ||
sql_glot_args: Sequence[SQLGlotExpression], | ||
) -> SQLGlotExpression: | ||
""" | ||
Support for getting the difference between two dates in sqlite. | ||
|
||
Args: | ||
`raw_args`: The operands passed to the function before they were converted to | ||
SQLGlot expressions. (Not actively used in this implementation.) | ||
`sql_glot_args`: The operands passed to the function after they were converted | ||
to SQLGlot expressions. | ||
|
||
Returns: | ||
The SQLGlot expression matching the functionality of | ||
`DATEDIFF(y, x)`, i.e. the difference between two dates. | ||
""" | ||
assert len(sql_glot_args) == 2 or len(sql_glot_args) == 3 | ||
unit: str | ||
if len(sql_glot_args) == 2: | ||
unit = "days" | ||
else: | ||
unit = sql_glot_args[2].this | ||
Review comment: Same point about the units.
||
match unit: | ||
case "years": | ||
# Extracts the year from the date and subtracts the years. | ||
year_x: SQLGlotExpression = convert_sqlite_datetime_extract("'%Y'")( | ||
None, [sql_glot_args[0]] | ||
) | ||
year_y: SQLGlotExpression = convert_sqlite_datetime_extract("'%Y'")( | ||
None, [sql_glot_args[1]] | ||
) | ||
# equivalent to: expression - this | ||
years_diff: SQLGlotExpression = sqlglot_expressions.Sub( | ||
this=year_y, expression=year_x | ||
) | ||
return years_diff | ||
case "months": | ||
# Extracts the difference in years multiplied by 12. | ||
# Extracts the month from the date and subtracts the months. | ||
# Adds the difference in months to the difference in years*12. | ||
# Implementation wise, this is equivalent to: | ||
# (years_diff*12 + month_y) - month_x | ||
# On expansion: (year_y - year_x) * 12 + month_y - month_x | ||
sql_glot_args_years = [ | ||
sql_glot_args[0], | ||
sql_glot_args[1], | ||
sqlglot_expressions.Literal(this="years", is_string=True), | ||
] | ||
_years_diff: SQLGlotExpression = convert_sqlite_datediff( | ||
raw_args, sql_glot_args_years | ||
) | ||
years_diff_in_months = sqlglot_expressions.Mul( | ||
this=apply_parens(_years_diff), | ||
expression=sqlglot_expressions.Literal.number(12), | ||
) | ||
month_x = convert_sqlite_datetime_extract("'%m'")(None, [sql_glot_args[0]]) | ||
month_y = convert_sqlite_datetime_extract("'%m'")(None, [sql_glot_args[1]]) | ||
months_diff: SQLGlotExpression = sqlglot_expressions.Sub( | ||
this=sqlglot_expressions.Add( | ||
this=years_diff_in_months, expression=month_y | ||
), | ||
expression=month_x, | ||
) | ||
return months_diff | ||
case "days": | ||
# Extracts the start of date from the datetime and subtracts the dates. | ||
date_x = sqlglot_expressions.Date( | ||
this=sql_glot_args[0], | ||
expressions=[ | ||
sqlglot_expressions.Literal(this="start of day", is_string=True) | ||
], | ||
) | ||
date_y = sqlglot_expressions.Date( | ||
this=sql_glot_args[1], | ||
expressions=[ | ||
sqlglot_expressions.Literal(this="start of day", is_string=True) | ||
], | ||
) | ||
# This calculates 'this-expression'. | ||
answer = sqlglot_expressions.DateDiff( | ||
unit=sqlglot_expressions.Var(this="days"), | ||
this=date_y, | ||
expression=date_x, | ||
) | ||
return answer | ||
Review comment (on lines +707 to +712): You can simplify this by just recursively calling
||
case "hours": | ||
# Extracts the difference in days multiplied by 24 to get difference in hours. | ||
Review comment: NIT: ensure all the comments are <= 80 characters
||
# Extracts the hours of x and hours of y. Adds the difference in hours to the (difference in days*24). | ||
# Implementation wise, this is equivalent to: | ||
# (days_diff*24 + hours_y) - hours_x | ||
# On expansion: (( day_y - day_x )*24 + hours_y) - hours_x | ||
Review comment: NIT: fix the spacing here, it's kinda inconsistent
||
sql_glot_args_days = [ | ||
sql_glot_args[0], | ||
sql_glot_args[1], | ||
sqlglot_expressions.Literal(this="days", is_string=True), | ||
] | ||
_days_diff: SQLGlotExpression = convert_sqlite_datediff( | ||
raw_args, sql_glot_args_days | ||
) | ||
days_diff_in_hours = sqlglot_expressions.Mul( | ||
this=apply_parens(_days_diff), | ||
expression=sqlglot_expressions.Literal.number(24), | ||
) | ||
hours_x: SQLGlotExpression = convert_sqlite_datetime_extract("'%H'")( | ||
None, [sql_glot_args[0]] | ||
) | ||
hours_y: SQLGlotExpression = convert_sqlite_datetime_extract("'%H'")( | ||
None, [sql_glot_args[1]] | ||
) | ||
hours_diff: SQLGlotExpression = sqlglot_expressions.Sub( | ||
this=sqlglot_expressions.Add( | ||
this=days_diff_in_hours, expression=hours_y | ||
), | ||
expression=hours_x, | ||
) | ||
return hours_diff | ||
case "minutes": | ||
# Extracts the difference in hours multiplied by 60 to get difference in minutes. | ||
# Extracts the minutes of x and minutes of y. Adds the difference in minutes to the (difference in hours*60). | ||
# Implementation wise, this is equivalent to: | ||
# (hours_diff*60 + minutes_y) - minutes_x | ||
# On expansion: (( hours_y - hours_x )*60 + minutes_y) - minutes_x | ||
sql_glot_args_hours = [ | ||
sql_glot_args[0], | ||
sql_glot_args[1], | ||
sqlglot_expressions.Literal(this="hours", is_string=True), | ||
] | ||
_hours_diff: SQLGlotExpression = convert_sqlite_datediff( | ||
raw_args, sql_glot_args_hours | ||
) | ||
hours_diff_in_mins = sqlglot_expressions.Mul( | ||
this=apply_parens(_hours_diff), | ||
expression=sqlglot_expressions.Literal.number(60), | ||
) | ||
min_x = convert_sqlite_datetime_extract("'%M'")(None, [sql_glot_args[0]]) | ||
min_y = convert_sqlite_datetime_extract("'%M'")(None, [sql_glot_args[1]]) | ||
mins_diff: SQLGlotExpression = sqlglot_expressions.Sub( | ||
this=sqlglot_expressions.Add(this=hours_diff_in_mins, expression=min_y), | ||
expression=min_x, | ||
) | ||
return mins_diff | ||
case "seconds": | ||
# Extracts the difference in minutes multiplied by 60 to get difference in seconds. | ||
# Extracts the seconds of x and seconds of y. Adds the difference in seconds to the (difference in minutes*60). | ||
# Implementation wise, this is equivalent to: | ||
# (mins_diff*60 + seconds_y) - seconds_x | ||
# On expansion: (( mins_y - mins_x )*60 + seconds_y) - seconds_x | ||
sql_glot_args_minutes = [ | ||
sql_glot_args[0], | ||
sql_glot_args[1], | ||
sqlglot_expressions.Literal(this="minutes", is_string=True), | ||
] | ||
_mins_diff: SQLGlotExpression = convert_sqlite_datediff( | ||
raw_args, sql_glot_args_minutes | ||
) | ||
minutes_diff_in_secs = sqlglot_expressions.Mul( | ||
this=apply_parens(_mins_diff), | ||
expression=sqlglot_expressions.Literal.number(60), | ||
) | ||
sec_x = convert_sqlite_datetime_extract("'%S'")(None, [sql_glot_args[0]]) | ||
sec_y = convert_sqlite_datetime_extract("'%S'")(None, [sql_glot_args[1]]) | ||
secs_diff: SQLGlotExpression = sqlglot_expressions.Sub( | ||
this=sqlglot_expressions.Add( | ||
this=minutes_diff_in_secs, expression=sec_y | ||
), | ||
expression=sec_x, | ||
) | ||
return secs_diff | ||
case _: | ||
raise ValueError(f"Unsupported argument {unit} for DATEDIFF.") | ||
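The recursive structure of `convert_sqlite_datediff` (each finer unit is the next coarser difference times a factor, plus the difference of the extracted components) can be checked against a real SQLite engine using only the standard library. The sketch below builds SQL text directly instead of SQLGlot expressions; the helper names are hypothetical, and the use of `JULIANDAY` for the day difference is an assumption, since the SQL that SQLGlot emits for `DateDiff` in the SQLite dialect may differ.

```python
import sqlite3

# unit -> (coarser unit, multiplier, strftime format of the component)
_STEPS = {
    "months": ("years", 12, "%m"),
    "hours": ("days", 24, "%H"),
    "minutes": ("hours", 60, "%M"),
    "seconds": ("minutes", 60, "%S"),
}


def _part(fmt: str, expr: str) -> str:
    """SQL text extracting one datetime component as an integer."""
    return f"CAST(STRFTIME('{fmt}', {expr}) AS INTEGER)"


def datediff_sql(x: str, y: str, unit: str) -> str:
    """Build SQLite SQL text computing y - x in the given unit."""
    if unit == "years":
        return f"({_part('%Y', y)} - {_part('%Y', x)})"
    if unit == "days":
        # Truncate both sides to midnight, then subtract Julian day numbers.
        return (
            f"CAST(JULIANDAY(DATE({y}, 'start of day')) - "
            f"JULIANDAY(DATE({x}, 'start of day')) AS INTEGER)"
        )
    if unit in _STEPS:
        coarser, factor, fmt = _STEPS[unit]
        base = datediff_sql(x, y, coarser)
        return f"(({base}) * {factor} + {_part(fmt, y)} - {_part(fmt, x)})"
    raise ValueError(f"Unsupported argument {unit} for DATEDIFF.")


def sqlite_datediff(x: str, y: str, unit: str) -> int:
    """Evaluate the generated SQL on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        sql = "SELECT " + datediff_sql(":x", ":y", unit)
        return conn.execute(sql, {"x": x, "y": y}).fetchone()[0]
    finally:
        conn.close()


# 6:59 pm vs 7:01 pm of the same day count as 1 hour apart.
hours_apart = sqlite_datediff("2024-03-01 18:59:00", "2024-03-01 19:01:00", "hours")
```

Driving the recursion from a table keeps the five near-identical `case` branches in one place, which is one possible reading of the review suggestions about recursion and repetition.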
|
||
|
||
class SqlGlotTransformBindings: | ||
""" | ||
Binding infrastructure used to associate PyDough operators with a procedure | ||
|
@@ -742,6 +962,7 @@ def add_builtin_bindings(self) -> None: | |
self.bindings[pydop.HOUR] = create_convert_time_unit_function("HOUR") | ||
self.bindings[pydop.MINUTE] = create_convert_time_unit_function("MINUTE") | ||
self.bindings[pydop.SECOND] = create_convert_time_unit_function("SECOND") | ||
self.bindings[pydop.DATEDIFF] = convert_datediff | ||
|
||
# Binary operators | ||
self.bind_binop(pydop.ADD, sqlglot_expressions.Add) | ||
|
@@ -779,6 +1000,7 @@ def add_sqlite_bindings(self) -> None: | |
self.bindings[pydop.HOUR] = convert_sqlite_datetime_extract("'%H'") | ||
self.bindings[pydop.MINUTE] = convert_sqlite_datetime_extract("'%M'") | ||
self.bindings[pydop.SECOND] = convert_sqlite_datetime_extract("'%S'") | ||
self.bindings[pydop.DATEDIFF] = convert_sqlite_datediff | ||
|
||
# String function overrides | ||
if sqlite3.sqlite_version < "3.44.1": | ||
|
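The two hunks above register `DATEDIFF` twice: once in the builtin bindings and once in the SQLite bindings. A minimal sketch of how such a dialect override could behave is shown below. It assumes the SQLite bindings are applied after the builtin ones so that they replace them; the class, the string keys, and the placeholder converters are all hypothetical and stand in for the real `SqlGlotTransformBindings` machinery.

```python
from typing import Callable

Converter = Callable[[list[str]], str]


def generic_datediff(args: list[str]) -> str:
    """Placeholder for the generic conversion (not real SQL generation)."""
    return "generic:" + ",".join(args)


def sqlite_datediff_binding(args: list[str]) -> str:
    """Placeholder for the SQLite-specific conversion."""
    return "sqlite:" + ",".join(args)


class TransformBindings:
    """Hypothetical registry mapping operator names to converters."""

    def __init__(self, dialect: str) -> None:
        self.bindings: dict[str, Converter] = {}
        self.add_builtin_bindings()
        # Assumption: dialect-specific bindings are layered on afterwards.
        if dialect == "sqlite":
            self.add_sqlite_bindings()

    def add_builtin_bindings(self) -> None:
        self.bindings["DATEDIFF"] = generic_datediff

    def add_sqlite_bindings(self) -> None:
        # Overrides the builtin entry registered above.
        self.bindings["DATEDIFF"] = sqlite_datediff_binding


sqlite_bindings = TransformBindings("sqlite")
ansi_bindings = TransformBindings("ansi")
```

Under this assumption, each dialect path produces different SQL, which is why the review asks for the ANSI generation to be tested separately from the SQLite one.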
Review comment: Let's explain these in a more plainspeak manner instead of the shorthand I gave in the issue. E.g.: "returns the number of years since `x` that `y` occurred"