-
Notifications
You must be signed in to change notification settings - Fork 1.5k
support SET Timezone
#4107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support SET Timezone
#4107
Conversation
fn convert_simple_data_type(&self, sql_type: &SQLDataType) -> Result<DataType> { | ||
match sql_type { | ||
SQLDataType::Boolean => Ok(DataType::Boolean), | ||
SQLDataType::TinyInt(_) => Ok(DataType::Int8), | ||
SQLDataType::SmallInt(_) => Ok(DataType::Int16), | ||
SQLDataType::Int(_) | SQLDataType::Integer(_) => Ok(DataType::Int32), | ||
SQLDataType::BigInt(_) => Ok(DataType::Int64), | ||
SQLDataType::UnsignedTinyInt(_) => Ok(DataType::UInt8), | ||
SQLDataType::UnsignedSmallInt(_) => Ok(DataType::UInt16), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
becomes method of SqlToRel, so that we could get SessionState
-> config_options
-> timezone
datafusion/sql/src/planner.rs
Outdated
SQLDataType::Timestamp(tz_info) => { | ||
let tz = if matches!(tz_info, TimezoneInfo::Tz) | ||
|| matches!(tz_info, TimezoneInfo::WithTimeZone) | ||
{ | ||
match self.schema_provider.get_config_option("datafusion.execution.time_zone") { | ||
Some(ScalarValue::Utf8(s)) => { | ||
s | ||
} | ||
Some(_) => { | ||
None | ||
} | ||
None => None | ||
} | ||
//Some("+00:00".to_string()) | ||
|
||
} else { | ||
None | ||
}; | ||
Ok(DataType::Timestamp(TimeUnit::Nanosecond, tz)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as this becomes the method of SqltoRel
, we can now get the timezone and use it to replace the fixed UTC
note that we need to add some timezone validation here (e.g. 00:00.1 isn't valid)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is somewhat strange to me that the type of a table that is created depends on the current session 's timezone setting -- but I can't think of anything better short of the user explicitly providing the timezone themselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sql's TimestampTz determine the timezone during the execution (perhaps right before it's been execute). arrow doesn't support TimestampTz in nature. what arrow has here is Timestamp<TimeUnit, Some(tz)>
.
I've been thinking about how to deal with this for a long time 🤮 . I have 2 approaches to do so
-
add some thing like
Timestamp<TimeUnit, Some("session")>
All the time related functions/kernels need to convert this "session" to the timezone by themselves. i.e.fn week()
acceptTimestamp<TimeUnit, Some("session")>
as input and use the timezone in config_options to convert the underline timestamp and then do the week extraction. This will be more that theTImestampTz
in SQL -
arrow's
Timestamp<TImeUnit, Some("some-timezone")>
is determined while SQL is translated to arrow's data type. and this is what this pr is using.
SQL's TimestampTz isn't deterministic while making arrow's data deterministic makes more sense to me; so i picked the 2nd approach for now. But i'm happy to try the first approach if anyone is interested in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the second approach makes sense to me.
Maybe usability wise we can just add enough comments about what is happening, or maybe add that best practice is to explicitly specify the timezone in the create table statement / timestamp types.
datafusion/sql/src/planner.rs
Outdated
/* | ||
pub fn convert_simple_data_type(sql_type: &SQLDataType) -> Result<DataType> { | ||
match sql_type { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the original function which is now the method of SqlToRel
perhaps we should a deprecated notifunction here as this is a pub function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove it personally or of we want to deprecate it for a release cycle that would be fine
Bonus points for somehow calling the new implementation (like making a dummy SqlToRel
somehow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is looking great !
@@ -1741,6 +1741,12 @@ impl ContextProvider for SessionState { | |||
.as_ref() | |||
.and_then(|provider| provider.get(&provider_type)?.get_type(variable_names)) | |||
} | |||
|
|||
fn get_config_option(&self, variable: &str) -> Option<ScalarValue> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
datafusion/sql/src/planner.rs
Outdated
SQLDataType::Timestamp(tz_info) => { | ||
let tz = if matches!(tz_info, TimezoneInfo::Tz) | ||
|| matches!(tz_info, TimezoneInfo::WithTimeZone) | ||
{ | ||
match self.schema_provider.get_config_option("datafusion.execution.time_zone") { | ||
Some(ScalarValue::Utf8(s)) => { | ||
s | ||
} | ||
Some(_) => { | ||
None | ||
} | ||
None => None | ||
} | ||
//Some("+00:00".to_string()) | ||
|
||
} else { | ||
None | ||
}; | ||
Ok(DataType::Timestamp(TimeUnit::Nanosecond, tz)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is somewhat strange to me that the type of a table that is created depends on the current session 's timezone setting -- but I can't think of anything better short of the user explicitly providing the timezone themselves
datafusion/sql/src/planner.rs
Outdated
let tz = if matches!(tz_info, TimezoneInfo::Tz) | ||
|| matches!(tz_info, TimezoneInfo::WithTimeZone) | ||
{ | ||
match self.schema_provider.get_config_option("datafusion.execution.time_zone") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to use a named constant instead of the string here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb named constants are in datafusion/core
. i'm not sure how to import them from here (due to circular import)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 maybe we should eventually move the config option stuff to common? I have a todo on my list to write up a ticket about rationalizing the config system - if no one else beats me to it I'll probably start in a week or two
datafusion/sql/src/planner.rs
Outdated
/* | ||
pub fn convert_simple_data_type(sql_type: &SQLDataType) -> Result<DataType> { | ||
match sql_type { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove it personally or of we want to deprecate it for a release cycle that would be fine
Bonus points for somehow calling the new implementation (like making a dummy SqlToRel
somehow)
1c334bf
to
6862860
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb @avantgardnerio
i made some updates
- fix examples
- limit the usage to fixedoffset (like +08:00)
- add test cases for different timezone format
- delete
pub fn convert_simple_data_type
, it becomes the method ofSqlToRel
- still use
dataufion.execution.time_zone
instead ofTIME_ZONE_OPT
since importing constant variable insql/planners
introduces circular import.
/// Convert SQL simple data type to relational representation of data type | ||
pub fn convert_simple_data_type(sql_type: &SQLDataType) -> Result<DataType> { | ||
match sql_type { | ||
SQLDataType::Boolean => Ok(DataType::Boolean), | ||
SQLDataType::TinyInt(_) => Ok(DataType::Int8), | ||
SQLDataType::SmallInt(_) => Ok(DataType::Int16), | ||
SQLDataType::Int(_) | SQLDataType::Integer(_) => Ok(DataType::Int32), | ||
SQLDataType::BigInt(_) => Ok(DataType::Int64), | ||
SQLDataType::UnsignedTinyInt(_) => Ok(DataType::UInt8), | ||
SQLDataType::UnsignedSmallInt(_) => Ok(DataType::UInt16), | ||
SQLDataType::UnsignedInt(_) | SQLDataType::UnsignedInteger(_) => { | ||
Ok(DataType::UInt32) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb i deleted it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I marked the PR as "API change"
datafusion/sql/src/planner.rs
Outdated
match self | ||
.schema_provider | ||
.get_config_option("datafusion.execution.time_zone") | ||
{ | ||
Some(ScalarValue::Utf8(s)) => s, | ||
Some(v) => { | ||
return Err(DataFusionError::Internal(format!( | ||
"Incorrect data type for time_zone: {}", | ||
v.get_datatype(), | ||
))) | ||
} | ||
None => return Err(DataFusionError::Internal(format!( | ||
"Config Option datafusion.execution.time_zone doesn't exist" | ||
))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ScalarValue::UTF8
-> Option<String>
other ScalarValue
-> Err
None
(variable name doen't exsit) -> Err
async fn set_time_zone_good_time_zone_format() { | ||
let ctx = | ||
SessionContext::with_config(SessionConfig::new().with_information_schema(true)); | ||
|
||
plan_and_collect(&ctx, "SET TIME ZONE = '+08:00'") | ||
.await | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test cases for good time zone format
// casting UTF-8 to TimestampTZ isn't supported yet, add Timestamp as the middle layer for now | ||
let result = | ||
plan_and_collect(&ctx, "SELECT '2000-01-01T00:00:00'::TIMESTAMP::TIMESTAMPTZ") | ||
.await | ||
.unwrap(); | ||
let expected = vec![ | ||
"+-----------------------------+", | ||
"| Utf8(\"2000-01-01T00:00:00\") |", | ||
"+-----------------------------+", | ||
"| 2000-01-01T08:00:00+08:00 |", | ||
"+-----------------------------+", | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally we could change this to '2000-01-01T00:00:00'::TIMESTAMPTZ
once it's added in the casting kernel (in arrow-rs)
async fn set_time_zone_good_time_zone_format() { | ||
let ctx = | ||
SessionContext::with_config(SessionConfig::new().with_information_schema(true)); | ||
|
||
plan_and_collect(&ctx, "SET TIME ZONE = '+08:00'") | ||
.await | ||
.unwrap(); | ||
|
||
// casting UTF-8 to TimestampTZ isn't supported yet, add Timestamp as the middle layer for now | ||
let result = | ||
plan_and_collect(&ctx, "SELECT '2000-01-01T00:00:00'::TIMESTAMP::TIMESTAMPTZ") | ||
.await | ||
.unwrap(); | ||
let expected = vec![ | ||
"+-----------------------------+", | ||
"| Utf8(\"2000-01-01T00:00:00\") |", | ||
"+-----------------------------+", | ||
"| 2000-01-01T08:00:00+08:00 |", | ||
"+-----------------------------+", | ||
]; | ||
// this might break once https://github.com/apache/arrow-rs/issues/1936 fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that these test cases might need to be modified after apache/arrow-rs#1936 fixed. I added a message here
#[tokio::test] | ||
async fn set_time_zone_bad_time_zone_format() { | ||
let ctx = | ||
SessionContext::with_config(SessionConfig::new().with_information_schema(true)); | ||
|
||
plan_and_collect(&ctx, "SET TIME ZONE = '+08:00:00'") | ||
.await | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test cases for bad time zone format
let expected = vec![ | ||
"+-------------------------------------------------------+", | ||
"| Utf8(\"2000-01-01T00:00:00\") |", | ||
"+-------------------------------------------------------+", | ||
"| 2000-01-01T00:00:00 (Unknown Time Zone 'Asia/Taipei') |", | ||
"+-------------------------------------------------------+", | ||
]; | ||
assert_batches_eq!(expected, &result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@avantgardnerio
based on the discussion here #4106 (comment)
named timezone like America/Denver
or Asia/Taipei
won't be supported in this pr
match self | ||
.schema_provider | ||
.get_config_option("datafusion.execution.time_zone") | ||
{ | ||
Some(ScalarValue::Utf8(s)) => s, | ||
Some(v) => { | ||
return Err(DataFusionError::Internal(format!( | ||
"Incorrect data type for time_zone: {}", | ||
v.get_datatype(), | ||
))) | ||
} | ||
None => return Err(DataFusionError::Internal( | ||
"Config Option datafusion.execution.time_zone doesn't exist" | ||
.to_string(), | ||
)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ScalarValue::Utf8
-> Option<String>
other ScalarValue
-> Err
None (variable datafusion.execution.time_zone doesn't exist in config_options) -> Err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thank you @waitingkuo
I will wait for 24 hours to merge this as it has an API change.
/// Convert SQL simple data type to relational representation of data type | ||
pub fn convert_simple_data_type(sql_type: &SQLDataType) -> Result<DataType> { | ||
match sql_type { | ||
SQLDataType::Boolean => Ok(DataType::Boolean), | ||
SQLDataType::TinyInt(_) => Ok(DataType::Int8), | ||
SQLDataType::SmallInt(_) => Ok(DataType::Int16), | ||
SQLDataType::Int(_) | SQLDataType::Integer(_) => Ok(DataType::Int32), | ||
SQLDataType::BigInt(_) => Ok(DataType::Int64), | ||
SQLDataType::UnsignedTinyInt(_) => Ok(DataType::UInt8), | ||
SQLDataType::UnsignedSmallInt(_) => Ok(DataType::UInt16), | ||
SQLDataType::UnsignedInt(_) | SQLDataType::UnsignedInteger(_) => { | ||
Ok(DataType::UInt32) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I marked the PR as "API change"
Thank you again @waitingkuo for driving this feature set forward |
Thanks again @waitingkuo |
Benchmark runs are scheduled for baseline = 63386d1 and contender = e8fe042. e8fe042 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4106
Rationale for this change
a POC for #4106
to support set time zone
casting to timestamptz should use the timezone instead of the fixed UTC
What changes are included in this PR?
Are there any user-facing changes?