Support for Spark version 3.5.0 - support required for UNTYPED_SCALA_UDF #1297

Open
VIKCT001 opened this issue Sep 24, 2024 · 2 comments
@VIKCT001

With Spark 3.4 and Scala 2.12.16 on Dataproc image 2, we were able to run our jobs by setting the property below:

--properties=spark.sql.legacy.allowUntypedScalaUDF=true

But since we migrated to Spark 3.5 and Scala 2.12.18 on Dataproc image 2, we have been getting the error message below.

"exception": "AnalysisException: [UNTYPED_SCALA_UDF] You\u0027re using untyped Scala UDF,
which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument,
and the closure will see the default value of the Java type for the null argument, e.g. udf((x: Int) \u003d\u003e x, IntegerType),
the result is 0 for null input. To get rid of this error, you could:\n1. use typed Scala UDF APIs(without return type parameter),
e.g. udf((x: Int) \u003d\u003e x).\n2. use Java UDF APIs, e.g. udf(new UDF1[String, Integer] { override def call(s: String): Integer \u003d s.length() }, IntegerType),
if input types are all non primitive.\n3. set "spark.sql.legacy.allowUntypedScalaUDF" to "true" and use this API with caution."

Is this property deprecated in Spark 3.5?
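
For reference, a minimal sketch of the two udf variants the error distinguishes (the column, app name, and function here are hypothetical, not from our job):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType // only needed for the untyped variant

object UdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udf-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(Some(1), None).toDF("x")

    // Untyped variant: the return DataType is passed explicitly, so Spark has
    // no input type information. This is the call shape that raises
    // UNTYPED_SCALA_UDF unless spark.sql.legacy.allowUntypedScalaUDF=true:
    // val inc = udf((x: Int) => x + 1, IntegerType)

    // Typed variant: input and return types are inferred from the closure,
    // so Spark can return null for the null row instead of silently passing 0.
    val inc = udf((x: Int) => x + 1)

    df.select(inc($"x")).show()
    spark.stop()
  }
}
```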

@davidrabinowitz
Member

Is this related to the Spark BigQuery connector? I fail to see the relation. For general Dataproc support, please see https://cloud.google.com/dataproc/docs/support/getting-support

davidrabinowitz self-assigned this Sep 24, 2024
@VIKCT001
Author

Yes, similar code was working fine with Spark 3.3 and Scala 2.12.16 on Dataproc image 2.0 with the old BigQuery connector; we just needed to specify spark.sql.legacy.allowUntypedScalaUDF=true when submitting the Spark job to the Dataproc cluster.

Since migrating to image 2.2, with Scala 2.12.18 and Spark 3.5 (the versions compatible with the new Dataproc image), we have been facing this issue.
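
If the legacy flag still has to be used on Spark 3.5, a possible alternative to passing --properties at submit time is setting it when the session is built; a minimal sketch, assuming the session is created in application code (the app name is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: set the legacy flag in code rather than via --properties.
val spark = SparkSession.builder()
  .appName("legacy-udf-job") // hypothetical
  .config("spark.sql.legacy.allowUntypedScalaUDF", "true")
  .getOrCreate()
```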
