
Mapping Data Flow / CDC support for writing to Delta Tables with Delta Lake Change Feed enabled #532

Open
stephen-bowser opened this issue Feb 9, 2023 · 1 comment

@stephen-bowser

Hi,

This is a follow-up to a comment on issue #531.

I am using your new CDC resource to stream data from my Azure SQL Server to a Delta Table. Since my downstream applications need to keep track of Inserts, Updates, and Deletes, it would be very helpful if the CDC could support writing to tables that have Delta Lake Change Feed enabled.
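
For context, this is roughly the pattern my downstream consumers rely on once the feature is enabled (a minimal PySpark sketch; the table name is a placeholder):

# Enable Change Data Feed on an existing Delta table
# (placeholder table name; requires a runtime that supports the feature).
spark.sql("""
    ALTER TABLE planned_activity
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Downstream readers can then query row-level changes between table versions.
# Each row is tagged with _change_type (insert / update_preimage /
# update_postimage / delete) plus the commit version and timestamp.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 0)
    .table("planned_activity")
)
changes.select("_change_type", "_commit_version", "_commit_timestamp").show()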

Currently, when I convert the Delta table to have this enabled, the CDC activity fails. I think this might be because ADF is writing to the table using an old Databricks Runtime, since the docs I linked to say that once this feature is enabled, 'you can no longer write to the table using Databricks Runtime 8.1 or below.'
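
In case it helps with diagnosis: as I understand it, enabling the feature raises the table's minimum writer protocol version, which is what older runtimes reject. The bumped version can be confirmed with DESCRIBE DETAIL (again a sketch, with a placeholder table name):

# Inspect the Delta protocol versions the table now requires. Enabling
# Change Data Feed raises minWriterVersion, and writers on an older
# runtime then fail with InvalidProtocolVersionException, as in the log below.
detail = spark.sql("DESCRIBE DETAIL planned_activity")
detail.select("minReaderVersion", "minWriterVersion").show()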

Would it be possible to upgrade the way ADF is writing to delta to make it compatible with this feature?

Error message from ADF when this feature is enabled:

{
"message": "{\"StatusCode\":\"DFExecutorUserError\",\"Message\":\"Job failed due to reason: at Sink 'SinkdatabasefeeddboPlannedActivity': Delta protocol version is too new for this version of the Databricks Runtime. Please upgrade to a newer release.\",\"Details\":\"org.apache.spark.sql.delta.actions.InvalidProtocolVersionException: Delta protocol version is too new for this version of the Databricks Runtime. Please upgrade to a newer release.\\n\\tat org.apache.spark.sql.delta.DeltaLog.protocolWrite(DeltaLog.scala:294)\\n\\tat org.apache.spark.sql.delta.OptimisticTransactionImpl$class.prepareCommit(OptimisticTransaction.scala:390)\\n\\tat org.apache.spark.sql.delta.OptimisticTransaction.prepareCommit(OptimisticTransaction.scala:80)\\n\\tat org.apache.spark.sql.delta.OptimisticTransactionImpl$$anonfun$commit$1.apply$mcJ$sp(OptimisticTransaction.scala:287)\\n\\tat org.apache.spark.sql.delta.OptimisticTransactionImpl$$anonfun$commit$1.apply(OptimisticTransaction.scala:284)\\n\\tat org.apache.spark.sql.delta.OptimisticTransactionImpl$$anonfun$commit$1.apply(OptimisticTransaction.scala:284)\\n\\tat com.microsoft.spark.telemetry.delta.SynapseLoggingShim$class.recordOperation(SynapseLoggingShim.scala:72)\\n\\tat org.apache.spark.sql.delta.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:80)\\n\"}",
"failureType": "UserError",
"target": "SystemPipeline_03de3c8c736a4f059fd967faf62aedcc"
}
@rnandurimsft

Hi @stephen-bowser,

The latest Spark runtime currently supported by ADF is 'Synapse Spark 3.1.2'. Even in this latest version, Change Data Feed support is not available, hence you would see the same error.

Therefore, we may not be able to support CDC for Delta right now.
