-
Notifications
You must be signed in to change notification settings - Fork 202
fix: Support Schema Evolution in iceberg #1723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1217,34 +1217,6 @@ abstract class ParquetReadSuite extends CometTestBase { | |||
} | |||
} | |||
|
|||
test("schema evolution") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we update the test rather than remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me this only tests CometConf.COMET_SCHEMA_EVOLUTION_ENABLED
. Since I removed the config, I think this test is not needed any more.
I currently did my test in Iceberg. I am thinking maybe we can add an iceberg-integration module to have the iceberg related test, or depend on iceberg CI (#1715)?
Thanks @huaxingao. The implementation changes LGTM, but I would like to understand how this will be tested. |
@@ -28,33 +28,33 @@ | |||
|
|||
public class Utils { | |||
|
|||
/** This method is called from Apache Iceberg. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) Shall we keep this comment? I think it is useful if it's still valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will use the same method for both Comet and Iceberg. The comment is not needed any more.
@@ -33,7 +33,7 @@ import org.apache.spark.sql.execution.datasources.v2.parquet.ParquetScan | |||
import org.apache.spark.sql.internal.SQLConf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) This import can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. Thanks
62e547d
to
6fc8b85
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1723 +/- ##
============================================
+ Coverage 56.12% 58.70% +2.58%
- Complexity 976 1139 +163
============================================
Files 119 129 +10
Lines 11743 12707 +964
Branches 2251 2377 +126
============================================
+ Hits 6591 7460 +869
- Misses 4012 4058 +46
- Partials 1140 1189 +49 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The config |
One set of Spark Sql test failures in native_iceberg_compat is exactly because this schema evolution/type promotion check is not being performed correctly (or not being performed at all, in fact). I'm trying to address these failures atm by writing a |
Which issue does this PR close?
We original have CometConf.COMET_SCHEMA_EVOLUTION_ENABLED to set schema evolution to true in Scan rule if the scan is Iceberg table scan. However, it doesn't work for the following case:
In this example, when executing SELECT * FROM table, Iceberg creates a Comet ColumnReader and invokes TypeUtil.checkParquetType. This throws an exception because the scan rule hasn't been applied yet, but the column type has already changed to bigint.
Instead of enabling schema evolution in the scan rule, I will update Utils.getColumnReader to accept a boolean supportsSchemaEvolution parameter and pass true from the Iceberg side.
Closes #.
Rationale for this change
What changes are included in this PR?
How are these changes tested?
I current test the new patch in Iceberg.
Without the fix, I got
With fix, the problem went away.