allow user-specified schema in read if it's consistent #3929
base: master
Conversation
@allisonport-db @cloud-fan will this very important fix be in the 3.3.0 release?
cc @tdas
Hi, @tdas any news about this fix? Thanks!
Hi, @cloud-fan @tdas any update? Anything?
Hi @nimrod-doubleverify, apologies for the delay here. Unfortunately this didn't make it into the 3.3.0 release, but we'll work on getting a 3.3.1 release out this week.
Which Delta project/connector is this regarding?
Description
A user-specified schema may come from the catalog if the Delta table is stored in an external catalog that syncs the table schema with the Delta log. We should allow it as long as it matches the actual Delta table schema.
This is already the case for batch reads; see apache/spark#15046.
This PR changes the Delta streaming read to allow it as well, roughly as sketched below.
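The consistency check this PR describes can be pictured as follows. This is a minimal sketch, not Delta's actual code; the object and method names, and the error message, are illustrative.

```scala
import org.apache.spark.sql.types.StructType

object SchemaConsistencyCheck {
  // Hypothetical helper: accept a user-specified schema only when it matches
  // the schema recorded in the Delta log; otherwise reject it. (A real check
  // may also normalize case sensitivity and column metadata.)
  def validate(
      userSpecifiedSchema: Option[StructType],
      deltaLogSchema: StructType): StructType = {
    userSpecifiedSchema match {
      case Some(schema) if schema != deltaLogSchema =>
        throw new IllegalArgumentException(
          s"User-specified schema $schema does not match " +
          s"the Delta table schema $deltaLogSchema")
      case _ =>
        // Always read with the schema from the Delta log.
        deltaLogSchema
    }
  }
}
```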
Note: since Delta uses DS v2 (`TableProvider`) and explicitly claims that user-specified schemas are not supported (`TableProvider#supportsExternalMetadata` returns false by default), end users still can't specify a schema in `spark.read`/`readStream.schema`. This change is only for advanced Spark plugins that can construct logical plans that trigger the Delta v1 source stream scan. A sketch of the DSv2 hook involved follows.
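For context, the hook mentioned above looks roughly like this. The `TableProvider` interface and its method signatures are Spark's real DSv2 API; the provider class itself is a hypothetical example of a source that opts in to external metadata (Delta keeps the default of false).

```scala
import java.util

import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Illustrative provider; only the TableProvider interface is Spark's.
class ExampleTableProvider extends TableProvider {

  override def inferSchema(options: CaseInsensitiveStringMap): StructType =
    ??? // derive the schema from the source's own metadata

  override def getTable(
      schema: StructType,
      partitioning: Array[Transform],
      properties: util.Map[String, String]): Table =
    ??? // build a Table using the (possibly user-specified) schema

  // The default implementation returns false. Delta keeps that default,
  // which is why spark.read.schema(...) / readStream.schema(...) are
  // still rejected for Delta tables even after this change.
  override def supportsExternalMetadata(): Boolean = true
}
```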
How was this patch tested?
A new test.
Does this PR introduce any user-facing changes?
No