Add support for DATEDIFF function in PyDough, with operators controlling units #221
Labels
documentation
Improvements or additions to documentation
effort - medium
mid-sized issue with average implementation time/difficulty
enhancement
New feature or request
user feature
Adding a new user-facing feature/functionality
Goal: add the following functionalities to PyDough, which should be tested e2e (see
test_pipeline.py
for examples):DATEDIFF(x, y, "years")
returnsy-x
in years (December 31st of 2009 and January 1st of 2010 count as 1 year apart).DATEDIFF(x, y, "months")
returnsy-x
in months (January 31st of 2014 and February 1st of 2014 count as 1 month apart).DATEDIFF(x, y, "days")
returnsy-x
in days (11:59 pm of one day vs 12:01 am of the next day count as 1 day apart).DATEDIFF(x, y, "hours")
returnsy-x
in hours (6:59 pm vs 7:01 pm of the same day count as 1 hour apart).DATEDIFF(x, y, "minutes")
returnsy-x
in minutes (same idea as hours).DATEDIFF(x, y, "seconds")
returnsy-x
in seconds (same idea as hours).The default, without
unit
, should be"days"
. Any other unit should raise an exception.How these should be converted for SQLite:
x
andy
to the corresponding unit usingdate
, take the difference in theirjulianday
values after truncation & multiply accordingly. Everything after the truncation can be done with the SQLGlotDateDiff
function. For example:DATEDIFF(x, y, "days")
-> sqlglot code isDATEDIFF(this=DATE(y, 'start of day'), expression=DATE(x, 'start of day'), unit=VAR('DAY'))
-> sqlite text isCAST(JULIANDAY(DATE(y, 'start of day')) - JULIANDAY(DATE(x, 'start of day')) AS INTEGER)
DATEDIFF(x, y, "hours")
-> sqlglot code isDATEDIFF(this=DATE(y, 'start of hour'), expression=DATE(x, 'start of hour'), unit=VAR('HOUR'))
-> sqlite text isCAST(JULIANDAY(DATE(y, 'start of hour')) - JULIANDAY(DATE(x, 'start of hour')) * 24.0 AS INTEGER)
x
to the start of the year usingdate
, take the difference in years. For example:DATEDIFF(x, y, "years")
-> becomes the equivalent ofCAST(strftime('%Y', date(y, 'start of year')) AS INTEGER) - CAST(strftime('%Y', date(x, 'start of year')) AS INTEGER)
DATEDIFF(x, y, "months")
-> becomes the equivalent of12 * (CAST(strftime('%Y', date(y, 'start of year')) AS INTEGER) - CAST(strftime('%Y', date(x, 'start of year')) AS INTEGER)) + CAST(strftime('%m', date(y, 'start of month')) AS INTEGER) - CAST(strftime('%m', date(x, 'start of month')) AS INTEGER)
All of these features need to be documented in the function list documentation
The text was updated successfully, but these errors were encountered: