
Bump SLF4J from 1.7.30 to 2.0.16. #33574

Merged
merged 18 commits into from Feb 24, 2025
@@ -96,7 +96,7 @@ jobs:
- name: run integrationTest
uses: ./.github/actions/gradle-command-self-hosted-action
with:
gradle-command: :sdks:java:io:sparkreceiver:2:integrationTest
gradle-command: :sdks:java:io:sparkreceiver:3:integrationTest
arguments: |
--info \
--tests org.apache.beam.sdk.io.sparkreceiver.SparkReceiverIOIT \
5 changes: 3 additions & 2 deletions CHANGES.md
@@ -68,12 +68,13 @@
## New Features / Improvements

* Support custom coders in Reshuffle ([#29908](https://github.com/apache/beam/issues/29908), [#33356](https://github.com/apache/beam/issues/33356)).

* [Java] Upgrade SLF4J to 2.0.16. Update default Spark version to 3.5.0. ([#33574](https://github.com/apache/beam/pull/33574))
* X feature added (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).

## Breaking Changes
* [Python] Reshuffle now correctly respects user-specified type hints, fixing a previous bug where it might use FastPrimitivesCoder wrongly. This change could break pipelines with incorrect type hints in Reshuffle. If you have issues after upgrading, temporarily set update_compatibility_version to a previous Beam version to use the old behavior. The recommended solution is to fix the type hints in your code. ([#33932](https://github.com/apache/beam/pull/33932))

* [Java] SparkReceiver 2 has been moved to SparkReceiver 3 that supports Spark 3.x. ([#33574](https://github.com/apache/beam/pull/33574))
* X behavior was changed ([#X](https://github.com/apache/beam/issues/X)).

## Deprecations
2 changes: 1 addition & 1 deletion build.gradle.kts
@@ -304,7 +304,7 @@ tasks.register("javaPreCommit") {
dependsOn(":sdks:java:io:contextualtextio:build")
dependsOn(":sdks:java:io:expansion-service:build")
dependsOn(":sdks:java:io:file-based-io-tests:build")
dependsOn(":sdks:java:io:sparkreceiver:2:build")
dependsOn(":sdks:java:io:sparkreceiver:3:build")
dependsOn(":sdks:java:io:synthetic:build")
dependsOn(":sdks:java:io:xml:build")
dependsOn(":sdks:java:javadoc:allJavadoc")
@@ -636,12 +636,12 @@ class BeamModulePlugin implements Plugin<Project> {
def quickcheck_version = "1.0"
def sbe_tool_version = "1.25.1"
def singlestore_jdbc_version = "1.1.4"
def slf4j_version = "1.7.30"
def slf4j_version = "2.0.16"
def snakeyaml_engine_version = "2.6"
def snakeyaml_version = "2.2"
def solace_version = "10.21.0"
def spark2_version = "2.4.8"
def spark3_version = "3.2.2"
def spark3_version = "3.5.0"
def spotbugs_version = "4.0.6"
def testcontainers_version = "1.19.7"
// [bomupgrader] determined by: org.apache.arrow:arrow-memory-core, consistent with: google_cloud_platform_libraries_bom
4 changes: 2 additions & 2 deletions runners/google-cloud-dataflow-java/worker/build.gradle
@@ -87,7 +87,7 @@ applyJavaNature(
// TODO(https://github.com/apache/beam/issues/19114): Move DataflowRunnerHarness class under org.apache.beam.runners.dataflow.worker namespace
"com/google/cloud/dataflow/worker/DataflowRunnerHarness.class",
// Allow slf4j implementation worker for logging during pipeline execution
"org/slf4j/impl/**"
"org/slf4j/jul/**"
],
generatedClassPatterns: [
/^org\.apache\.beam\.runners\.dataflow\.worker\.windmill.*/,
@@ -240,7 +240,7 @@ project.task('validateShadedJarContainsSlf4jJdk14', dependsOn: 'shadowJar') {
doLast {
project.configurations.shadow.artifacts.files.each {
FileTree slf4jImpl = project.zipTree(it).matching {
include "org/slf4j/impl/JDK14LoggerAdapter.class"
include "org/slf4j/jul/JDK14LoggerAdapter.class"
}
outFile.text = slf4jImpl.files
if (slf4jImpl.files.isEmpty()) {
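The `validateShadedJarContainsSlf4jJdk14` task above scans the shaded jar for the relocated SLF4J 2.x JUL adapter class. The same check can be sketched in plain Java with `java.util.zip` (class and method names here are ours, for illustration only):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ShadedJarCheck {
  /** Returns true if the given jar contains the SLF4J 2.x JUL adapter class. */
  static boolean containsJdk14Adapter(Path jar) throws IOException {
    try (ZipFile zip = new ZipFile(jar.toFile())) {
      // SLF4J 2.x moved this class from org/slf4j/impl/ to org/slf4j/jul/.
      return zip.getEntry("org/slf4j/jul/JDK14LoggerAdapter.class") != null;
    }
  }

  public static void main(String[] args) throws IOException {
    // Build a tiny stand-in jar containing only the adapter entry.
    Path jar = Files.createTempFile("shaded", ".jar");
    try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(jar))) {
      out.putNextEntry(new ZipEntry("org/slf4j/jul/JDK14LoggerAdapter.class"));
      out.closeEntry();
    }
    System.out.println(containsJdk14Adapter(jar)); // prints "true"
  }
}
```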
20 changes: 20 additions & 0 deletions runners/spark/3/build.gradle
@@ -56,6 +56,26 @@ sparkVersions.each { kv ->
}

dependencies {
// Spark versions prior to 3.4.0 are compiled against SLF4J 1.x. The
// `org.apache.spark.internal.Logging.isLog4j12()` function references an
// SLF4J 1.x binding class (org.slf4j.impl.StaticLoggerBinder) which is
// no longer available in SLF4J 2.x. This results in a
// `java.lang.NoClassDefFoundError`.
//
// The workaround is to provide an SLF4J 1.x binding module from outside the
// `org.slf4j` group.
// The module `org.apache.logging.log4j:log4j-slf4j-impl` is one example: it
// provides a compatible SLF4J 1.x binding regardless of the SLF4J upgrade.
// Binding/provider modules under the `org.slf4j` group (e.g.,
// slf4j-simple, slf4j-reload4j) get upgraded along with SLF4J itself, and
// therefore no longer contain the 1.x binding classes.
//
// Notice that Spark 3.1.x uses `ch.qos.logback:logback-classic` and is
// unaffected by the SLF4J upgrade. Spark 3.3.x already uses
// `log4j-slf4j-impl` so it is also unaffected.
if ("$kv.key" >= "320" && "$kv.key" <= "324") {
"sparkVersion$kv.key" library.java.log4j2_slf4j_impl
}
spark.components.each { component -> "sparkVersion$kv.key" "$component:$kv.value" }
}
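The version guard above relies on Groovy's lexicographic string comparison of three-digit Spark version keys (e.g. `322` for Spark 3.2.2), which is safe because every key has the same width. A standalone sketch of the same check (the method name is ours):

```java
public class SparkVersionKeys {
  /** True for three-digit version keys 320..324, i.e. Spark 3.2.0 through 3.2.4. */
  static boolean needsLog4j2Slf4jImpl(String key) {
    // Lexicographic comparison mirrors the Gradle condition
    // `"$kv.key" >= "320" && "$kv.key" <= "324"`; it is correct only
    // because all keys are exactly three digits wide.
    return key.compareTo("320") >= 0 && key.compareTo("324") <= 0;
  }

  public static void main(String[] args) {
    System.out.println(needsLog4j2Slf4jImpl("322")); // prints "true"
    System.out.println(needsLog4j2Slf4jImpl("350")); // prints "false"
  }
}
```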

8 changes: 8 additions & 0 deletions runners/spark/spark_runner.gradle
@@ -176,6 +176,10 @@ dependencies {
spark.components.each { component ->
provided "$component:$spark_version"
}
if ("$spark_version" >= "3.5.0") {
implementation "org.apache.spark:spark-common-utils_$spark_scala_version:$spark_version"
implementation "org.apache.spark:spark-sql-api_$spark_scala_version:$spark_version"
}
permitUnusedDeclared "org.apache.spark:spark-network-common_$spark_scala_version:$spark_version"
implementation "io.dropwizard.metrics:metrics-core:4.1.1" // version used by Spark 3.1
compileOnly "org.scala-lang:scala-library:2.12.15"
@@ -202,6 +206,10 @@ dependencies {
testImplementation library.java.mockito_core
testImplementation "org.assertj:assertj-core:3.11.1"
testImplementation "org.apache.zookeeper:zookeeper:3.4.11"
if ("$spark_version" >= "3.5.0") {
testImplementation "org.apache.spark:spark-common-utils_$spark_scala_version:$spark_version"
testImplementation "org.apache.spark:spark-sql-api_$spark_scala_version:$spark_version"
}
validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest")
validatesRunner project(path: ":runners:core-java", configuration: "testRuntimeMigration")
validatesRunner project(":sdks:java:io:hadoop-format")
15 changes: 3 additions & 12 deletions sdks/java/extensions/arrow/build.gradle
@@ -24,19 +24,10 @@ description = "Apache Beam :: SDKs :: Java :: Extensions :: Arrow"
dependencies {
implementation library.java.vendored_guava_32_1_2_jre
implementation project(path: ":sdks:java:core", configuration: "shadow")
implementation(library.java.arrow_vector) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
implementation(library.java.arrow_memory_core) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
implementation(library.java.arrow_vector)
implementation(library.java.arrow_memory_core)
implementation library.java.joda_time
testImplementation(library.java.arrow_memory_netty) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
testImplementation(library.java.arrow_memory_netty)
testImplementation library.java.junit
testImplementation library.java.hamcrest
testRuntimeOnly library.java.slf4j_simple
16 changes: 13 additions & 3 deletions sdks/java/io/cdap/build.gradle
@@ -45,7 +45,11 @@ dependencies {
implementation library.java.cdap_etl_api
implementation library.java.cdap_etl_api_spark
implementation library.java.cdap_hydrator_common
implementation library.java.cdap_plugin_hubspot
implementation (library.java.cdap_plugin_hubspot) {
// Exclude the Scala 2.11 module, because Spark 3.x uses Scala
// 2.12 instead.
exclude group: "com.fasterxml.jackson.module", module: "jackson-module-scala_2.11"
}
implementation library.java.cdap_plugin_salesforce
implementation library.java.cdap_plugin_service_now
implementation library.java.cdap_plugin_zendesk
@@ -56,11 +60,17 @@ dependencies {
implementation library.java.jackson_core
implementation library.java.jackson_databind
implementation library.java.slf4j_api
implementation library.java.spark_streaming
implementation (library.java.spark3_streaming) {
// Excluding `org.slf4j:jul-to-slf4j`, which was introduced as a
// transitive dependency in Spark 3.5.0 (specifically from
// spark-common-utils_2.12) and would cause a stack overflow together with
// `org.slf4j:slf4j-jdk14`.
exclude group: "org.slf4j", module: "jul-to-slf4j"
}
implementation library.java.tephra
implementation library.java.vendored_guava_32_1_2_jre
implementation project(path: ":sdks:java:core", configuration: "shadow")
implementation project(":sdks:java:io:sparkreceiver:2")
implementation project(":sdks:java:io:sparkreceiver:3")
implementation project(":sdks:java:io:hadoop-format")
testImplementation library.java.cdap_plugin_service_now
testImplementation library.java.cdap_etl_api
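The `jul-to-slf4j` exclusion above guards against a logging cycle: `jul-to-slf4j` routes `java.util.logging` calls into SLF4J, while `slf4j-jdk14` routes SLF4J calls back into `java.util.logging`, so with both on the classpath a single log call bounces between the bridges until the stack overflows. A toy stdlib-only model of that mutual delegation (the method names are ours, not the real bridge classes):

```java
public class BridgeCycleDemo {
  // Stand-in for jul-to-slf4j: forwards a JUL-style call to "SLF4J".
  static void julToSlf4j(String msg) {
    slf4jToJul(msg);
  }

  // Stand-in for slf4j-jdk14: forwards an SLF4J-style call back to "JUL".
  static void slf4jToJul(String msg) {
    julToSlf4j(msg);
  }

  /** Returns true if the two bridges recurse until the stack overflows. */
  static boolean overflows() {
    try {
      julToSlf4j("hello");
      return false;
    } catch (StackOverflowError e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(overflows()); // prints "true"
  }
}
```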
15 changes: 3 additions & 12 deletions sdks/java/io/google-cloud-platform/build.gradle
@@ -138,22 +138,13 @@ dependencies {
implementation library.java.slf4j_api
implementation library.java.vendored_grpc_1_69_0
implementation library.java.vendored_guava_32_1_2_jre
implementation(library.java.arrow_memory_core) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
implementation(library.java.arrow_vector) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
implementation library.java.arrow_memory_core
implementation library.java.arrow_vector

implementation 'com.google.http-client:google-http-client-gson:1.41.2'
implementation "org.threeten:threetenbp:1.4.4"

testImplementation(library.java.arrow_memory_netty) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
testImplementation library.java.arrow_memory_netty
testImplementation project(path: ":sdks:java:core", configuration: "shadowTest")
testImplementation project(path: ":sdks:java:extensions:avro", configuration: "testRuntimeMigration")
testImplementation project(path: ":sdks:java:extensions:google-cloud-platform-core", configuration: "testRuntimeMigration")
@@ -18,11 +18,11 @@
-->
# SparkReceiverIO

SparkReceiverIO provides I/O transforms to read messages from an [Apache Spark Receiver](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html) `org.apache.spark.streaming.receiver.Receiver` as an unbounded source.
SparkReceiverIO provides I/O transforms to read messages from an [Apache Spark Receiver](https://spark.apache.org/docs/3.5.0/streaming-custom-receivers.html) `org.apache.spark.streaming.receiver.Receiver` as an unbounded source.

## Prerequisites

SparkReceiverIO supports [Spark Receivers](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html) (Spark version 2.4).
SparkReceiverIO supports [Spark Receivers](https://spark.apache.org/docs/3.5.0/streaming-custom-receivers.html) (Spark version 3.x, tested on Spark version 3.5.0).
1. The corresponding Spark Receiver should implement the [HasOffset](https://github.com/apache/beam/blob/master/sdks/java/io/sparkreceiver/src/main/java/org/apache/beam/sdk/io/sparkreceiver/HasOffset.java) interface.
2. Records should have a numeric field that represents the record offset. *Example:* the `RecordId` field for Salesforce and the `vid` field for Hubspot Receivers.
For more details, please see the [GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java) class from the CDAP plugin examples.
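The two requirements above can be sketched in plain Java. The `HasOffset` interface below is a simplified local stand-in for Beam's `org.apache.beam.sdk.io.sparkreceiver.HasOffset` (see the linked source for the real interface), and `ToyReceiver` is a hypothetical receiver that does not extend Spark's `Receiver`, so the example stays self-contained:

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetReceiverSketch {
  // Simplified stand-in for Beam's HasOffset interface (not the real one).
  interface HasOffset {
    void setStartOffset(Long startOffset);
  }

  // Toy receiver: each record carries a numeric offset, and reading
  // resumes from the offset supplied via setStartOffset.
  static class ToyReceiver implements HasOffset {
    private long startOffset;
    private final List<Long> recordOffsets = new ArrayList<>();

    @Override
    public void setStartOffset(Long startOffset) {
      this.startOffset = startOffset == null ? 0L : startOffset;
    }

    /** Simulates receiving records with offsets startOffset..startOffset+count-1. */
    List<Long> receive(int count) {
      for (long i = startOffset; i < startOffset + count; i++) {
        recordOffsets.add(i);
      }
      return recordOffsets;
    }
  }

  public static void main(String[] args) {
    ToyReceiver r = new ToyReceiver();
    r.setStartOffset(5L);
    System.out.println(r.receive(3)); // prints "[5, 6, 7]"
  }
}
```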
@@ -53,7 +53,7 @@ To learn more, please check out CDAP Streaming plugins [complete examples](https

## Dependencies

To use SparkReceiverIO, add a dependency on `beam-sdks-java-io-sparkreceiver`.
To use SparkReceiverIO, add a dependency on `beam-sdks-java-io-sparkreceiver-3`.

```maven
<dependency>
@@ -43,8 +43,8 @@ dependencies {
implementation library.java.commons_lang3
implementation library.java.joda_time
implementation library.java.slf4j_api
implementation library.java.spark_streaming
implementation library.java.spark_core
implementation library.java.spark3_streaming
implementation library.java.spark3_core
implementation library.java.vendored_guava_32_1_2_jre
implementation project(path: ":sdks:java:core", configuration: "shadow")
compileOnly "org.scala-lang:scala-library:2.11.12"
2 changes: 1 addition & 1 deletion settings.gradle.kts
@@ -252,7 +252,7 @@ include(":sdks:java:io:rabbitmq")
include(":sdks:java:io:redis")
include(":sdks:java:io:rrio")
include(":sdks:java:io:solr")
include(":sdks:java:io:sparkreceiver:2")
include(":sdks:java:io:sparkreceiver:3")
include(":sdks:java:io:snowflake")
include(":sdks:java:io:snowflake:expansion-service")
include(":sdks:java:io:splunk")