Skip to content

add with_timezone #5906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

add with_timezone #5906

wants to merge 1 commit into from

Conversation

waitingkuo
Copy link
Contributor

Which issue does this PR close?

Closes #5905

Rationale for this change

What changes are included in this PR?

a new function with_timezone that replace the timezone of timestamp without changing underline value

select WITH_TIMEZONE(timestamp '2000-01-01T00:00:00.123456789', '+08:00');
+--------------------------------------------------------------------+
| withtimezone(Utf8("2000-01-01T00:00:00.123456789"),Utf8("+08:00")) |
+--------------------------------------------------------------------+
| 2000-01-01T08:00:00.123456789+08:00                                |
+--------------------------------------------------------------------+
1 row in set. Query took 0.004 seconds.
select WITH_TIMEZONE(WITH_TIMEZONE(timestamp '2000-01-01T00:00:00.123456789', '+08:00'), '+07:00');
+-------------------------------------------------------------------------------------------------+
| withtimezone(withtimezone(Utf8("2000-01-01T00:00:00.123456789"),Utf8("+08:00")),Utf8("+07:00")) |
+-------------------------------------------------------------------------------------------------+
| 2000-01-01T07:00:00.123456789+07:00                                                             |
+-------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.003 seconds.

while the timezone is incorrect, it raises error

select WITH_TIMEZONE(timestamp '2000-01-01T00:00:00.123456789', '+25:00');
Arrow error: Parser error: Invalid timezone "+25:00": '+25:00' is not a valid timezone

Are these changes tested?

yes, test cases are added to timestamps.slt

Are there any user-facing changes?

@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Apr 7, 2023
Copy link
Contributor Author

@waitingkuo waitingkuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since there's no way to extract timezonefn return_type , the return data_type is corrected in create_physical_expr
@alamb please help to evaluate whether it make sense

cc @tustvold

Comment on lines +265 to +282
BuiltinScalarFunction::WithTimezone => match input_expr_types[0] {
DataType::Timestamp(TimeUnit::Nanosecond, _) => {
Ok(DataType::Timestamp(TimeUnit::Nanosecond, None))
}
DataType::Timestamp(TimeUnit::Microsecond, _) => {
Ok(DataType::Timestamp(TimeUnit::Microsecond, None))
}
DataType::Timestamp(TimeUnit::Millisecond, _) => {
Ok(DataType::Timestamp(TimeUnit::Millisecond, None))
}
DataType::Timestamp(TimeUnit::Second, _) => {
Ok(DataType::Timestamp(TimeUnit::Second, None))
}
_ => return Err(DataFusionError::Internal(
"The with_timezone function can only accept timestamp as the first arg."
.to_string(),
)),
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub fn return_type(
    fun: &BuiltinScalarFunction,
    input_expr_types: &[DataType],
) -> Result<DataType> {

this function only has 2 inputs: function pointer and args types. there's no way to inference timezone in the final return type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +176 to +185
let timezone = format!("{}", input_phy_exprs[1]);
data_type = match data_type {
DataType::Timestamp(TimeUnit::Nanosecond, _) => DataType::Timestamp(TimeUnit::Nanosecond, Some(timezone)),
DataType::Timestamp(TimeUnit::Microsecond, _) => DataType::Timestamp(TimeUnit::Microsecond, Some(timezone)),
DataType::Timestamp(TimeUnit::Millisecond, _) => DataType::Timestamp(TimeUnit::Millisecond, Some(timezone)),
DataType::Timestamp(TimeUnit::Second, _) => DataType::Timestamp(TimeUnit::Second, Some(timezone)),
other => return Err(DataFusionError::Internal(format!(
"Unsupported data type {other:?} as the first arg for function with_timezone",
)))
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct the timezone in return data_type according to the args value

@alamb is this implementation ok for now? or it's recommended to modify the return_type function to accept some new args to achieve this?

I could submit another ticket to modify it as well

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @waitingkuo --

Even though arrow_cast could (and probably should) be updated to support timezones having a specific WITH_TIMEZONE seems like a good idea to me as it will be a better experience

I think this PR needs

  1. Get the type correct (I left a comment about how to do this)
  2. Add an entry to WITH_TIMEZONE the user documentation (e.g. https://github.com/apache/arrow-datafusion/blob/f00ef9d5484039dc7013af5b8fa04f7b29ffaba7/docs/source/user-guide/sql/scalar_functions.md?plain=1#L1208)

Comment on lines +265 to +282
BuiltinScalarFunction::WithTimezone => match input_expr_types[0] {
DataType::Timestamp(TimeUnit::Nanosecond, _) => {
Ok(DataType::Timestamp(TimeUnit::Nanosecond, None))
}
DataType::Timestamp(TimeUnit::Microsecond, _) => {
Ok(DataType::Timestamp(TimeUnit::Microsecond, None))
}
DataType::Timestamp(TimeUnit::Millisecond, _) => {
Ok(DataType::Timestamp(TimeUnit::Millisecond, None))
}
DataType::Timestamp(TimeUnit::Second, _) => {
Ok(DataType::Timestamp(TimeUnit::Second, None))
}
_ => return Err(DataFusionError::Internal(
"The with_timezone function can only accept timestamp as the first arg."
.to_string(),
)),
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@waitingkuo
Copy link
Contributor Author

Thanks @waitingkuo --

Even though arrow_cast could (and probably should) be updated to support timezones having a specific WITH_TIMEZONE seems like a good idea to me as it will be a better experience

I think this PR needs

  1. Get the type correct (I left a comment about how to do this)
  2. Add an entry to WITH_TIMEZONE the user documentation (e.g. https://github.com/apache/arrow-datafusion/blob/f00ef9d5484039dc7013af5b8fa04f7b29ffaba7/docs/source/user-guide/sql/scalar_functions.md?plain=1#L1208
    )

Hi @alamb thanks, in fact i'm working on adding timezone support for arrow_cast now, will send a pr soon

@tustvold
Copy link
Contributor

tustvold commented Apr 7, 2023

In fact i'm working on adding timezone support for arrow_cast now, will send a pr soon

I think we may want to fix apache/arrow-rs#1936 upstream, before exposing the currently incorrect timezone casting behaviour in arrow-rs?

TBC this is not a comment on this PR, the metadata only with_timezone added by this PR is good imo, but this should not be the behaviour of arrow-cast, which should instead preserve the same time "instant" as per the arrow specification

@waitingkuo waitingkuo marked this pull request as draft April 7, 2023 14:02
@waitingkuo
Copy link
Contributor Author

Thanks @waitingkuo --

Even though arrow_cast could (and probably should) be updated to support timezones having a specific WITH_TIMEZONE seems like a good idea to me as it will be a better experience

I think this PR needs

  1. Get the type correct (I left a comment about how to do this)
  2. Add an entry to WITH_TIMEZONE the user documentation (e.g. https://github.com/apache/arrow-datafusion/blob/f00ef9d5484039dc7013af5b8fa04f7b29ffaba7/docs/source/user-guide/sql/scalar_functions.md?plain=1#L1208
    )

Hi @alamb thank you for the suggestion.

does it mean that i need to move WITH_TIMEZONE out of BuiltinScalarFunction ?

@alamb
Copy link
Contributor

alamb commented Apr 7, 2023

does it mean that i need to move WITH_TIMEZONE out of BuiltinScalarFunction ?

I am not sure to be honest -- there might not be a good existing pattern to follow for such functions. 🤔

@waitingkuo
Copy link
Contributor Author

@alamb thanks, i'll think about how to fix this

@alamb
Copy link
Contributor

alamb commented Apr 8, 2024

Since this has been open for more than a year, closing it down. Feel free to reopen if/when you keep working on it.

@alamb alamb closed this Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement with_timezone function
3 participants