-
Notifications
You must be signed in to change notification settings - Fork 3
EPIC: Icehut timestamp support #100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Timestamps in datafusion works peculiarly. For example in query like, select convert_timezone('America/New_York, 'UTC', v3), where v3 is a timestamp with value '2024-01-06 08:00:00 America/New_York'. |
A udf cannot return inside a timestamp timezone_wildcard or any predtermined timezones, because it understands you literally. So if you get a timestamp with a timezone of "+00" it expects that the return type will have the same timezone. And in fn return types, since this is dynamic and we only have the argument types, not the arguments themselves, we cannot know the timezone in advance, or pattern match on it, etc. We can however write by hand or with the use of chrono_tz "TZ_Variants" or/and " Tz"(chrono_tz it is used internally by chrono for the Timezone trait and is made by the same people) in the creation of the signature all possible timezone combinations. |
My suggestion is that we always accept only utf8 and get the timezone from there before datafusion converts it to a timestamp with right value (ts), but a broken timezone (tz). |
Even if we do this we still cannot know the output timezone in advance, because even if the timezone is not going to be "+00" we will change it, and for example in a 2 argument version select convert_timezone('UTC', v3), where v3 is a timestamp with value '2024-01-06 08:00:00 America/New_York', it will still expect 'America/New_York' not "+00", "UTC", etc. |
This is confirmed by apache/datafusion#12218 |
All those finds are connected to #76 |
Problem DescriptionTimestamps in Datafusion are not handled as expected in certain scenarios. Specifically, when using a query like: SELECT convert_timezone('America/New_York', 'UTC', v3) Where v3 is a timestamp with the value Expected BehaviorThe system should recognize that the Timestamp<Unit>(Some(ts), None)
Actual BehaviorInstead of throwing an error or handling it properly, the query passes through. However, the internal math in Datafusion breaks. For example, with v3 as
Root CauseSomewhere inside Datafusion, the timezone is added to the ObservationsWhen attempting to match the pattern: Timestamp<Unit>(Some(ts), Some(tz))
This pattern is used in the two-argument version of the function (where the timestamp should include a timezone). However: The Suggested FixInvestigate and prevent the internal conversion that adds the timezone offset to the i64 representation of the timestamp. Steps to ReproduceExecute the query: SELECT convert_timezone('America/New_York', 'UTC', '2024-01-06 08:00:00 America/New_York') Observe the incorrect result:
Compare with the expected result for v3 as '2024-01-06 08:00:00' (without timezone):
@DanCodedThis Let me know if ChatGPT accurately summarized the problem :) |
@slyons It did, actually.
|
Everything we know about timestamp: snowflake support, datafusion, arrow, Iceberg, etc.
All problems we identified, solved. All decisions we made.
The text was updated successfully, but these errors were encountered: