-
Notifications
You must be signed in to change notification settings - Fork 1.5k
add with_timezone #5906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add with_timezone #5906
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BuiltinScalarFunction::WithTimezone => match input_expr_types[0] { | ||
DataType::Timestamp(TimeUnit::Nanosecond, _) => { | ||
Ok(DataType::Timestamp(TimeUnit::Nanosecond, None)) | ||
} | ||
DataType::Timestamp(TimeUnit::Microsecond, _) => { | ||
Ok(DataType::Timestamp(TimeUnit::Microsecond, None)) | ||
} | ||
DataType::Timestamp(TimeUnit::Millisecond, _) => { | ||
Ok(DataType::Timestamp(TimeUnit::Millisecond, None)) | ||
} | ||
DataType::Timestamp(TimeUnit::Second, _) => { | ||
Ok(DataType::Timestamp(TimeUnit::Second, None)) | ||
} | ||
_ => return Err(DataFusionError::Internal( | ||
"The with_timezone function can only accept timestamp as the first arg." | ||
.to_string(), | ||
)), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn return_type(
fun: &BuiltinScalarFunction,
input_expr_types: &[DataType],
) -> Result<DataType> {
this function only has 2 inputs: function pointer and args types. there's no way to inference timezone in the final return type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may need to special case it, in the same way as arrow_cast
: https://github.com/apache/arrow-datafusion/blob/f00ef9d5484039dc7013af5b8fa04f7b29ffaba7/datafusion/sql/src/expr/function.rs#L139-L142
let timezone = format!("{}", input_phy_exprs[1]); | ||
data_type = match data_type { | ||
DataType::Timestamp(TimeUnit::Nanosecond, _) => DataType::Timestamp(TimeUnit::Nanosecond, Some(timezone)), | ||
DataType::Timestamp(TimeUnit::Microsecond, _) => DataType::Timestamp(TimeUnit::Microsecond, Some(timezone)), | ||
DataType::Timestamp(TimeUnit::Millisecond, _) => DataType::Timestamp(TimeUnit::Millisecond, Some(timezone)), | ||
DataType::Timestamp(TimeUnit::Second, _) => DataType::Timestamp(TimeUnit::Second, Some(timezone)), | ||
other => return Err(DataFusionError::Internal(format!( | ||
"Unsupported data type {other:?} as the first arg for function with_timezone", | ||
))) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct the timezone in return data_type according to the args value
@alamb is this implementation ok for now? or it's recommended to modify the return_type
function to accept some new args to achieve this?
I could submit another ticket to modify it as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @waitingkuo --
Even though arrow_cast
could (and probably should) be updated to support timezones having a specific WITH_TIMEZONE seems like a good idea to me as it will be a better experience
I think this PR needs
- Get the type correct (I left a comment about how to do this)
- Add an entry to
WITH_TIMEZONE
the user documentation (e.g. https://github.com/apache/arrow-datafusion/blob/f00ef9d5484039dc7013af5b8fa04f7b29ffaba7/docs/source/user-guide/sql/scalar_functions.md?plain=1#L1208)
BuiltinScalarFunction::WithTimezone => match input_expr_types[0] { | ||
DataType::Timestamp(TimeUnit::Nanosecond, _) => { | ||
Ok(DataType::Timestamp(TimeUnit::Nanosecond, None)) | ||
} | ||
DataType::Timestamp(TimeUnit::Microsecond, _) => { | ||
Ok(DataType::Timestamp(TimeUnit::Microsecond, None)) | ||
} | ||
DataType::Timestamp(TimeUnit::Millisecond, _) => { | ||
Ok(DataType::Timestamp(TimeUnit::Millisecond, None)) | ||
} | ||
DataType::Timestamp(TimeUnit::Second, _) => { | ||
Ok(DataType::Timestamp(TimeUnit::Second, None)) | ||
} | ||
_ => return Err(DataFusionError::Internal( | ||
"The with_timezone function can only accept timestamp as the first arg." | ||
.to_string(), | ||
)), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may need to special case it, in the same way as arrow_cast
: https://github.com/apache/arrow-datafusion/blob/f00ef9d5484039dc7013af5b8fa04f7b29ffaba7/datafusion/sql/src/expr/function.rs#L139-L142
Hi @alamb thanks, in fact i'm working on adding timezone support for |
I think we may want to fix apache/arrow-rs#1936 upstream, before exposing the currently incorrect timezone casting behaviour in arrow-rs? TBC this is not a comment on this PR, the metadata only |
Hi @alamb thank you for the suggestion. does it mean that i need to move |
I am not sure to be honest -- there might not be a good existing pattern to follow for such functions. 🤔 |
@alamb thanks, i'll think about how to fix this |
Since this has been open for more than a year, closing it down. Feel free to reopen if/when you keep working on it. |
Which issue does this PR close?
Closes #5905
Rationale for this change
What changes are included in this PR?
a new function
with_timezone
that replace the timezone of timestamp without changing underline valuewhile the timezone is incorrect, it raises error
Are these changes tested?
yes, test cases are added to
timestamps.slt
Are there any user-facing changes?