Support for Spark version 3.5.0 - support required for UNTYPED_SCALA_UDF #1297

Open
VIKCT001 opened this issue Sep 24, 2024 · 2 comments
@VIKCT001

With Spark 3.4 and Scala 2.12.16 on Dataproc image 2, we were able to run our jobs by setting the property below:

--properties=spark.sql.legacy.allowUntypedScalaUDF=true

But since we migrated to Spark 3.5 and Scala 2.12.18 on Dataproc image 2, we have been getting the error message below.

"exception": "AnalysisException: [UNTYPED_SCALA_UDF] You\u0027re using untyped Scala UDF,
which does not have the input type information. Spark may blindly pass null to the Scala closure with primitive-type argument,
and the closure will see the default value of the Java type for the null argument, e.g. udf((x: Int) \u003d\u003e x, IntegerType),
the result is 0 for null input. To get rid of this error, you could:\n1. use typed Scala UDF APIs(without return type parameter),
e.g. udf((x: Int) \u003d\u003e x).\n2. use Java UDF APIs, e.g. udf(new UDF1[String, Integer] { override def call(s: String): Integer \u003d s.length() }, IntegerType),
if input types are all non primitive.\n3. set "spark.sql.legacy.allowUntypedScalaUDF" to "true" and use this API with caution."

Is this property deprecated in Spark 3.5?
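
For reference, a minimal sketch of the two udf variants the error distinguishes (the column, app name, and function here are hypothetical, not from our job):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType // only needed for the untyped variant

object UdfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udf-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq(Some(1), None).toDF("x")

    // Untyped variant: the return DataType is passed explicitly, so Spark has
    // no input type information. This is the call shape that raises
    // UNTYPED_SCALA_UDF unless spark.sql.legacy.allowUntypedScalaUDF=true:
    // val inc = udf((x: Int) => x + 1, IntegerType)

    // Typed variant: input and return types are inferred from the closure,
    // so Spark can return null for the null row instead of silently passing 0.
    val inc = udf((x: Int) => x + 1)

    df.select(inc($"x")).show()
    spark.stop()
  }
}
```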

@davidrabinowitz
Member

Is this related to the Spark BigQuery connector? I fail to see the relation. For general Dataproc support, please see https://cloud.google.com/dataproc/docs/support/getting-support

davidrabinowitz self-assigned this Sep 24, 2024
@VIKCT001
Author

Yes, similar code was working fine with Spark 3.3 and Scala 2.12.16 on Dataproc image 2.0 with the old BigQuery connector; we just needed to specify spark.sql.legacy.allowUntypedScalaUDF=true when submitting the Spark job to the Dataproc cluster.

Since migrating to image 2.2, with Scala 2.12.18 and Spark 3.5 (the versions compatible with the new Dataproc image), we have been facing this issue.
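
If the legacy flag still has to be used on Spark 3.5, a possible alternative to passing --properties at submit time is setting it when the session is built; a minimal sketch, assuming the session is created in application code (the app name is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: set the legacy flag in code rather than via --properties.
val spark = SparkSession.builder()
  .appName("legacy-udf-job") // hypothetical
  .config("spark.sql.legacy.allowUntypedScalaUDF", "true")
  .getOrCreate()
```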
