-
Notifications
You must be signed in to change notification settings - Fork 472
Bring EXTRACT into alignment with PostgreSQL v14 #10027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/repr/src/adt/numeric.rs
Outdated
|
||
/// Used to do potentially lossy value-to-value conversions while consuming the input value. Useful | ||
/// for interoperability between Numeric and f64. | ||
pub trait LossyFrom<T>: Sized { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't super happy with this approach. The issue is that you can't do f64::from(i)
if i
is type i64
since the conversion is potentially lossy. Which means I can't use the From<i64
trait on generic methods. So if anyone has a better approach please let me know. Also if there's a better place to put this please let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if you have a better name, since it's technically only lossy when used with f64
and not Numeric
, let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think d155340 cleans this up slightly with the DecimalLike
trait.
Before this commit, EXTRACT was treated as an alias to date_part with slightly different syntax. Both of these functions returned a float data type. PostgreSQL v14 updated EXTRACT to return a numeric data type, which makes it compliant with the SQL standard. This commit updates the EXTRACT function so that it return a numeric data type so that it matches PostgreSQL. date_part still returns a float. Additionally PostgreSQL v14 implemented EXTRACT explicitly for the DATE data type. Previously DATEs were casted to TIMESTAMPs before extracting fields from them. This commit also explicitly implements EXTRACT for DATE types, so they aren't cast to a TIMESTAMP first. A consequence of this is that time related fields (e.g. SECOND) are now rejected when trying to EXTRACT a DATE type. However, The implementation for date_part still casts DATE types to TIMESTAMP types. This means that date_part does not reject time related fields for DATE types. This implementation matches PostgreSQL. The postgres commit that implements these changes can be found here: postgres/postgres@a2da77c This commit also implements extracting EPOCHs from TIME types, which wasn't previously implemented. Fixes #9853, #9870
f292922
to
3a3e978
Compare
Also, in the discussion and commit for postgres they talk about making an optimized numeric division function:
I haven't looked into this in our code base, but if there's interest than I can see if we can make a similar optimization. |
…sh44/extract � Conflicts: � doc/user/content/release-notes.md
{ | ||
/// Used to do value-to-value conversions while consuming the input value. Depending on the | ||
/// implementation it may be potentially lossy. | ||
fn lossy_from(i: i64) -> Self; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently I just found out that there's a similar RFC for Rust: rust-lang/rfcs#2484
…sh44/extract � Conflicts: � doc/user/content/release-notes.md
…sh44/extract � Conflicts: � doc/user/content/release-notes.md
…sh44/extract � Conflicts: � doc/user/content/release-notes.md
So sorry for the delay here, @jkosh44! Looks like this slipped through the cracks. I've pinged a few relevant folks. |
No worries and no rush. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a soft LGTM on this right now, modulo the comments; I need to take another look at it this evening.
Feel free to hold off on implementing anything mentioned here until I (or someone else) comes back with firmer suggestions. I agree this feels a little over-engineered, but I think that's maybe just an artifact of the trait.
src/expr/src/scalar/func.rs
Outdated
DatePartTimestamp(units) => date_part_timestamp_inner(*units, a.unwrap_timestamp()), | ||
DatePartTimestampTz(units) => date_part_timestamp_inner(*units, a.unwrap_timestamptz()), | ||
ExtractInterval(units) => { | ||
date_part_interval_inner::<Numeric>(*units, a.unwrap_interval()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not your doing, but all of these unwraps could just get moved inside this inner func:
diff --git a/src/expr/src/scalar/func.rs b/src/expr/src/scalar/func.rs
index 331e1acdb..683d941d6 100644
--- a/src/expr/src/scalar/func.rs
+++ b/src/expr/src/scalar/func.rs
@@ -1735,13 +1735,11 @@ fn date_part_interval(a: Datum, interval: Interval) -> Result<Datum, EvalError>
}
}
-fn date_part_interval_inner<T>(
- units: DateTimeUnits,
- interval: Interval,
-) -> Result<Datum<'static>, EvalError>
+fn date_part_interval_inner<T>(units: DateTimeUnits, a: Datum) -> Result<Datum<'static>, EvalError>
where
T: DecimalLike + Into<Datum<'static>>,
{
+ let interval = a.unwrap_interval();
match units {
DateTimeUnits::Epoch => Ok(interval.as_seconds::<T>().into()),
DateTimeUnits::Millennium => Ok(T::from(interval.millennia()).into()),
@@ -3752,9 +3750,7 @@ impl UnaryFunc {
CharLength => char_length(a),
IsRegexpMatch(regex) => Ok(is_regexp_match_static(a, ®ex)),
RegexpMatch(regex) => regexp_match_static(a, temp_storage, ®ex),
- ExtractInterval(units) => {
- date_part_interval_inner::<Numeric>(*units, a.unwrap_interval())
- }
+ ExtractInterval(units) => date_part_interval_inner::<Numeric>(*units, a),
ExtractTime(units) => date_part_time_inner::<_, Numeric>(*units, a.unwrap_time()),
ExtractTimestamp(units) => {
date_part_timestamp_inner::<_, Numeric>(*units, a.unwrap_timestamp())
@@ -3763,7 +3759,7 @@ impl UnaryFunc {
date_part_timestamp_inner::<_, Numeric>(*units, a.unwrap_timestamptz())
}
ExtractDate(units) => extract_date_inner(*units, a.unwrap_date()),
- DatePartInterval(units) => date_part_interval_inner::<f64>(*units, a.unwrap_interval()),
+ DatePartInterval(units) => date_part_interval_inner::<f64>(*units, a),
DatePartTime(units) => date_part_time_inner::<_, f64>(*units, a.unwrap_time()),
DatePartTimestamp(units) => {
date_part_timestamp_inner::<_, f64>(*units, a.unwrap_timestamp())
note that I don't think this blithely works with some of the other suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed this change for interval
, time
, and date
.
@@ -8,13 +8,15 @@ menu: | |||
|
|||
`EXTRACT` returns some time component from a time-based value, such as the year from a Timestamp. | |||
|
|||
See [`date_part`](../date-part) for the traditional Ingres equivalent function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to point people from date_part to extract because extract is the new preferred way. Does it make sense to not even link people from extract to date_part, and instead remove this sentence?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this sentence.
…sh44/extract � Conflicts: � doc/user/content/release-notes.md
@sploiselle @mjibson Just pushed some changes that address your comments and fix merge conflicts. Thanks for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs changes LGTM. Other folks might have opinions about the Rust, I don't.
…sh44/extract � Conflicts: � doc/user/content/release-notes.md
…sh44/extract � Conflicts: � doc/user/content/release-notes.md
@jkosh44 Sorry for the delay here; I saw there were some failing tests and forgot to circle back here. Overall this approach LGTM; if there's some simplification that becomes obvious, can always refactor. When I came back to this today, though, I had an easy time reasoning about the changes, so I think it's in a good state. LMK if you'd like any help chasing down these failures. |
@sploiselle Sorry about that, I didn't notice I had failed tests. I just pushed some changes that should hopefully fix them, but I think I need someone to starts the build for me (I'm not a member of the build-authorized team). |
@sploiselle Double apologies, I definitely should have confirmed locally that I had fixed everything. I fixed the remaining errors and confirmed locally. Lint Docs:
Testdrive
|
No apologies necessary! Testing is iterative |
Breaking changes are supposed to be listed first in the release notes. PR MaterializeInc#10027 mistakenly did not list it's breaking changes first, so this commit fixes that.
Before this commit, EXTRACT was treated as an alias to date_part with slightly
different syntax. Both of these functions returned a float data type. PostgreSQL
v14 updated EXTRACT to return a numeric data type, which makes it compliant with
the SQL standard. This commit updates the EXTRACT function so that it return a
numeric data type so that it matches PostgreSQL. date_part still returns a float.
Additionally PostgreSQL v14 implemented EXTRACT explicitly for the DATE data
type. Previously DATEs were casted to TIMESTAMPs before extracting fields from
them. This commit also explicitly implements EXTRACT for DATE types, so they
aren't cast to a TIMESTAMP first. A consequence of this is that time related
fields (e.g. SECOND) are now rejected when trying to EXTRACT a DATE type.
However, The implementation for date_part still casts DATE types to TIMESTAMP
types. This means that date_part does not reject time related fields for DATE
types. This implementation matches PostgreSQL.
The postgres commit that implements these changes can be found here:
postgres/postgres@a2da77c
This commit also implements extracting EPOCHs from TIME types, which wasn't
previously implemented.
Fixes MaterializeInc/database-issues#2980, MaterializeInc/database-issues#2981, #10027
Motivation
This PR adds a known-desirable feature: https://github.com/MaterializeInc/database-issues/issues/2980 and https://github.com/MaterializeInc/database-issues/issues/2981
Tips for reviewer
The previous implementation has the following structure. Each type (
Interval
,TimestampLike
, etc) implemented methods to extract out the various parts from the type and return anf64
. For exampleInterval
hadfn extract_quarter(&self) -> f64
. These methods can be further broken down into two categories:f64
before returning. These are all the other date/time parts.The
date_part_X_inner
functions (ex:date_part_timestamp_inner
) would simply call these functions and wrap the result in aResult<Datum<'a>, EvalError>
.This PR changes it to the following structure. Type 1 methods are converted into generic methods that let them do either floating point arithmetic or Numeric arithmetic. Type 2 methods are converted to returning signed/unsigned integers directly. The
date_part_X_inner
methods are also made generic so that they can be called with eitherf64
orNumeric
. When adate_part_X_inner
method calls a type 1 method, is uses the supplied generic type. When adate_part_X_inner
method calls a type 2 method, it casts the result to the supplied generic type.The goal was to try and avoid duplicating every method twice for the different return types. Though after finishing it feels slightly over-engineered so I'm curious to hear if anyone has thoughts/suggestions on the approach.
Also I'm very open to stylistic comments since I'm new to both Rust and the code base.
Checklist