-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Add support for default column values in engine #25679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
796b75e
to
9b59d7c
Compare
9b59d7c
to
30ab353
Compare
3eae180
to
2e3b139
Compare
b905a01
to
d75264c
Compare
core/trino-main/src/main/java/io/trino/execution/CreateTableTask.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/execution/CreateTableTask.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/execution/AddColumnTask.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/planner/QueryPlanner.java
Outdated
Show resolved
Hide resolved
core/trino-parser/src/main/java/io/trino/sql/parser/AstBuilder.java
Outdated
Show resolved
Hide resolved
testing/trino-testing/src/main/java/io/trino/testing/TestingConnectorBehavior.java
Outdated
Show resolved
Hide resolved
@@ -535,7 +535,7 @@ private RelationPlan getInsertPlan( | |||
if (supportsMissingColumnsOnInsert) { | |||
continue; | |||
} | |||
expression = new Constant(column.getType(), null); | |||
expression = new Constant(column.getType(), column.getDefaultValue().orElse(null)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the connector supportsMissingColumnsOnInsert
, then it is supposed to apply the default value on its own, right?
Do we have tests to cover both cases (connector applies the default value versus the engine applies it here)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. We don't have such a test. I'm wondering if we can write a helpful test covering both cases in the engine.
d75264c
to
995de6e
Compare
Addressed comments. |
core/trino-grammar/src/main/antlr4/io/trino/grammar/sql/SqlBase.g4
Outdated
Show resolved
Hide resolved
ColumnMetadata column = ColumnMetadata.builder() | ||
.setName(columnName.getValue()) | ||
.setType(getSupportedType(session, catalogHandle, tableMetadata.metadata().getProperties(), type)) | ||
.setType(supportedType) | ||
.setDefaultValue(element.getDefaultValue().map(value -> evaluateLiteral(literalInterpreter, value, type))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn’t the literal be evaluated as supported type ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the connector supports VARCHAR(4)
and default value abcde
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, I was wondering which type should win. My question is why users specify 5 characters on VARCHAR(4)
column. That looks user mistake to me.
I will change to supportedType
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are special rules regarding type coercions at insert. They are described in the section "Store assignment" of the spec. TL;DR it is legal to trim a string to fit into the declared type, but you can only trim spaces.
I think we should respect the declared type of the column, and follow the coercion rules for inserts.
You can refer to the method noTruncationCast()
in LogicalPlanner
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a logic to trim following spaces. Could you take another look?
Also, extracted helper method so we can reuse from CreateTableTask
and AddColumnTask
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ebyhr I discussed this with Martin, and we found relevant information in the spec. I'm sorry for the confusion. Here are the rules:
If the subject data type is character string, then the <literal> shall be a <character string
literal>. If the length of the subject data type is fixed, then the length in characters of
the <character string literal> shall not be greater than the length of the subject data
type. If the length of the subject data type is variable, then the length in characters of
the <character string literal> shall not be greater than the maximum length of the
subject data type.
It implies that
- the default value declaration should fail if the value is too long for the type
- the declaration should succeed when the value is shorter. In such a case, we should coerce the value to the declared type. Since we only support literals, we can statically adjust the value at column declaration time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we confirmed that the DEFAULT NULL NOT NULL
behavior is implemented correctly: it can be legally declared, and should fail whenever the default value is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kasiafi In case of CREATE TABLE/CTAS
we have introduced supportedType
spi method which allows the engine to automatically coerce the data to supported type - In case of hive if it supports only timestamp(3) and if we provide a default value which is of timestamp(6) - then shouldn't we cast the default value to the supported type ?
core/trino-main/src/test/java/io/trino/execution/TestCreateTableTask.java
Show resolved
Hide resolved
testing/trino-testing/src/main/java/io/trino/testing/BaseConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-memory/src/test/java/io/trino/plugin/memory/TestMemoryConnectorTest.java
Outdated
Show resolved
Hide resolved
cecbfec
to
62525ca
Compare
Addressed comments and added anothor commit supporting the default option in Memory connector. |
column.getDefaultValue().ifPresent(defaultValue -> defaultColumnValues.put(columnHandle, defaultValue)); | ||
if (!column.isNullable()) { | ||
nonNullableColumnHandles.add(columnHandle); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the column is non-nullable and the default value is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It fails as NOT NULL violation. Added another test to BaseConnectorTest and verified locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So it is legal to declare: y int DEFAULT null NOT NULL
but then it fails if the default value is actually inserted. I wondered if the spec even allows to define DEFAULT null NOT NULL
but seems like it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of DEFAULT null NOT NULL
shouldn't we fail as the default value here doesn't makes much sense ?
core/trino-main/src/main/java/io/trino/sql/rewrite/ShowQueriesRewrite.java
Outdated
Show resolved
Hide resolved
core/trino-main/src/main/java/io/trino/sql/rewrite/ShowQueriesRewrite.java
Outdated
Show resolved
Hide resolved
29b8e3b
to
ccfec5d
Compare
@martint @kasiafi @Praveen2112 Could you take another look? |
ccfec5d
to
e7d9bdd
Compare
0b2730f
to
9a473c9
Compare
Regarding the |
Do we plan to support other variants of the default value in the future? For values like |
4a4d211
to
7550565
Compare
I'm a little wondering whether fallback to existing implementation is really good idea. The behavior between default-null and missing-columns might be different in remote connectors. For instance, ClickHouse has default values in each type (e.g. I changed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment about type coercion.
@martint could you please check if it makes sense to you? Other than that, I think it's ready for you to review.
{ | ||
try { | ||
Object value = literalInterpreter.evaluate(literal, type); | ||
checkDefaultValue(value, type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if we declare column type as double
, and pass an integer default value?
According to spec, it is legal:
If the subject data type is approximate numeric, then the <literal> shall be a <signed
numeric literal>.
In such a case, we should apply a coercion.
For the general case, we should:
- analyze the passed literal to get the type
- if the type is not equal to the declared column type, check if this combination is legal according to the spec. Most cases can probably be handled with a
TypeCoercion#canCoerce()
call. - apply a coercion
- evaluate the expression and store as a Constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to use ConstantEvaluator.evaluateConstant
method. Let me know if I should avoid the method.
edit : The method doesn't work well in some cases, e.g. TINYINT DEFAULT 123
. I will avoid this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of CREATE TABLE/CTAS we have introduced supportedType spi method which allows the engine to automatically coerce the data to supported type - In case of hive if it supports only timestamp(3) and if we provide a default value which is of timestamp(6) - then shouldn't we cast the default value to the supported type ?
@Praveen2112 good point about the supported types.
The spec does not consider the possibility that the effective column type is different than the declared type.
In such case, we should probably validate the default value against the effective column type.
However, this could hypothetically lead to the unintuitive situation when the default value type is the same as the declared column type, but the default value is rejected.
@martint could you please share you opinion on how we should validate and coerce the default value in case when the declared column type differs from the effective type of the created column?
ColumnMetadata column = ColumnMetadata.builder() | ||
.setName(columnName.getValue()) | ||
.setType(getSupportedType(session, catalogHandle, tableMetadata.metadata().getProperties(), type)) | ||
.setType(supportedType) | ||
.setDefaultValue(element.getDefaultValue().map(value -> evaluateLiteral(literalInterpreter, value, type))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kasiafi In case of CREATE TABLE/CTAS
we have introduced supportedType
spi method which allows the engine to automatically coerce the data to supported type - In case of hive if it supports only timestamp(3) and if we provide a default value which is of timestamp(6) - then shouldn't we cast the default value to the supported type ?
column.getDefaultValue().ifPresent(defaultValue -> defaultColumnValues.put(columnHandle, defaultValue)); | ||
if (!column.isNullable()) { | ||
nonNullableColumnHandles.add(columnHandle); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of DEFAULT null NOT NULL
shouldn't we fail as the default value here doesn't makes much sense ?
7550565
to
9507eb0
Compare
Description
This PR adds support for the DEFAULT clause with literal values when creating a new table or adding a column.
The SHOW CREATE TABLE statement now includes the specified default values.
These defaults are respected in INSERT and MERGE operations.
Non-literal expressions such as <datetime value function>, USER, and similar are not supported in this PR.
Release notes