
[BUG] Incorrect error message when using INSERT SELECT * and the source table has more columns than the target table #3701

Open
felipepessoto opened this issue Sep 20, 2024 · 1 comment
Labels: bug (Something isn't working)


felipepessoto (Contributor) commented Sep 20, 2024

Bug

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Describe the problem

The error message is misleading: [DELTA_DUPLICATE_COLUMNS_FOUND] Found duplicate column(s) in the data to save: name

Steps to reproduce

DROP TABLE IF EXISTS MySourceTable;
DROP TABLE IF EXISTS MyTargetTable;
CREATE TABLE MySourceTable USING DELTA AS SELECT 1 as Id, 30 as Age, 'John' as Name;
CREATE TABLE MyTargetTable (Id INT, Name STRING) USING DELTA;
INSERT INTO MyTargetTable SELECT * FROM MySourceTable;
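
For comparison, explicitly selecting the matching columns avoids the positional expansion of * and inserts as expected:

-- Workaround given the schemas above: select only the target's columns.
INSERT INTO MyTargetTable SELECT Id, Name FROM MySourceTable;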

Observed results

[DELTA_DUPLICATE_COLUMNS_FOUND] Found duplicate column(s) in the data to save: name
org.apache.spark.sql.delta.schema.SchemaMergingUtils$.checkColumnNameDuplication(SchemaMergingUtils.scala:123)
org.apache.spark.sql.delta.schema.SchemaMergingUtils$.mergeSchemas(SchemaMergingUtils.scala:168)
org.apache.spark.sql.delta.schema.ImplicitMetadataOperation$.mergeSchema(ImplicitMetadataOperation.scala:219)
org.apache.spark.sql.delta.schema.ImplicitMetadataOperation.updateMetadata(ImplicitMetadataOperation.scala:84)
org.apache.spark.sql.delta.schema.ImplicitMetadataOperation.updateMetadata$(ImplicitMetadataOperation.scala:66)
org.apache.spark.sql.delta.commands.WriteIntoDelta.updateMetadata(WriteIntoDelta.scala:77)
org.apache.spark.sql.delta.commands.WriteIntoDelta.writeAndReturnCommitData(WriteIntoDelta.scala:162)
org.apache.spark.sql.delta.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:106)
org.apache.spark.sql.delta.commands.WriteIntoDelta.$anonfun$run$1$adapted(WriteIntoDelta.scala:101)
org.apache.spark.sql.delta.DeltaLog.withNewTransaction(DeltaLog.scala:227)
org.apache.spark.sql.delta.commands.WriteIntoDelta.run(WriteIntoDelta.scala:101)
org.apache.spark.sql.delta.catalog.WriteIntoDeltaBuilder$$anon$1$$anon$2.insert(DeltaTableV2.scala:432)
org.apache.spark.sql.execution.datasources.v2.SupportsV1Write.writeWithV1(V1FallbackWriters.scala:79)

Expected results

A message saying the data source schema doesn't match the target table's columns, i.e. that the number of columns in the inserted data doesn't match the target table.
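
For illustration, a clearer message could look something like the following (hypothetical wording, modeled on Spark's INSERT_COLUMN_ARITY_MISMATCH style of arity errors):

Cannot write to MyTargetTable: the target table has 2 column(s) (Id, Name) but the inserted data has 3 column(s) (Id, Age, Name).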

Environment information

  • Delta Lake version: 3.2
  • Spark version: 3.5
  • Scala version: 2.12

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.
felipepessoto (Contributor, Author) commented:

It happens because Spark expands the * in INSERT INTO MyTargetTable SELECT * FROM MySourceTable by position into INSERT INTO MyTargetTable SELECT Id as Id, Age as Name, Name FROM MySourceTable, which makes sense since the second column in the target is Name. The leftover third column, Name, then collides with the renamed Age during schema merging. I think we need a column count validation first; a sketch of such a check follows the plan below.

Project [Id#724 AS Id#760, cast(Age#725 as string) AS Name#761, Name#726]
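
A minimal sketch of what such a validation could look like, assuming it runs before the schema merge in SchemaMergingUtils.mergeSchemas (the helper name checkColumnArity and the exception type are illustrative, not existing Delta APIs):

import org.apache.spark.sql.types.StructType

// Hypothetical pre-check: fail fast with a clear message when the incoming
// data has a different number of columns than the target table, instead of
// letting the positional rename surface later as a duplicate-column error.
def checkColumnArity(tableName: String, target: StructType, data: StructType): Unit = {
  if (data.length != target.length) {
    throw new IllegalArgumentException(
      s"Cannot write to $tableName: the target table has ${target.length} column(s) " +
      s"(${target.fieldNames.mkString(", ")}) but the inserted data has " +
      s"${data.length} column(s) (${data.fieldNames.mkString(", ")}).")
  }
}

With the schemas from the repro, this would report 2 target columns (Id, Name) versus 3 data columns (Id, Age, Name) rather than the misleading duplicate-column error. A real fix would also need to account for schema evolution, where extra data columns are allowed when mergeSchema is enabled.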
