Upgrade Apache Spark to 2.4 #184

Closed · seratch opened this issue Nov 22, 2018 · 20 comments

@seratch (Contributor) commented Nov 22, 2018

As you may already know, Apache Spark 2.4.0 was released on Nov 3.

https://spark.apache.org/releases/spark-release-2-4-0.html

Are you already planning to upgrade the Spark version? I am interested in working on the upgrade task.

@tovbinm (Collaborator) commented Nov 22, 2018

The 0.5.0 release is going to ship with Spark 2.3.2. We can start working on adding Spark 2.4.0 once we are done with the release.

In general we wait until the x.y.1 version is out, which is usually more stable than the .0 releases.

@tovbinm (Collaborator) commented Nov 22, 2018

Update: 0.5.0 was just released.

@seratch (Contributor, Author) commented Nov 26, 2018

@tovbinm Understood. Once Spark 2.4.1 is out, I will start working on it.

UPDATE (Mar 2019): @seratch is no longer going to work on this task.

@tovbinm (Collaborator) commented Jan 9, 2019

@seratch since it might take some time to implement, let's start adding Spark 2.4.x support (if you have time, of course).

Ideally it would be great to have a cross-version build for Spark 2.3.x/2.4.x and Scala 2.11/2.12. For example: com.salesforce.transmogrifai:transmogrifai-core_<scala-version>:<version>_<spark-version>, i.e.:
com.salesforce.transmogrifai:transmogrifai-core_2.11:0.6.0_2.3.2
com.salesforce.transmogrifai:transmogrifai-core_2.11:0.6.0_2.4.0
com.salesforce.transmogrifai:transmogrifai-core_2.12:0.6.0_2.4.0

WDYT?

@seratch (Contributor, Author) commented Jan 10, 2019

@tovbinm Sounds nice. It's also possible to put the Spark version in the artifact name, like com.salesforce.transmogrifai:transmogrifai-core_spark-<spark-version>_<scala-version>:<version>, e.g. transmogrifai-core_spark-2.3.2_2.11:0.6.0

The idea is inspired by Scala.js binary versions: Scala.js libraries have sjs0.6 in the middle of their artifact names. https://search.maven.org/search?q=_sjs0.6_

@tovbinm (Collaborator) commented Jan 10, 2019

Yeah, that makes more sense - include it in the artifact name instead of the version.

Let's just drop the dash and keep only the major Spark version, i.e. com.salesforce.transmogrifai:transmogrifai-core_spark2.3_2.11:0.6.0
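
For illustration, a minimal sbt-style sketch of how such a suffix could be derived (hypothetical, since the project currently builds with Gradle; names and versions here are assumptions):

  // build.sbt sketch -- hypothetical, not the project's actual build.
  // Encodes the Spark major.minor version in the artifact name; sbt appends
  // the Scala binary suffix on its own, yielding e.g.
  // com.salesforce.transmogrifai:transmogrifai-core_spark2.3_2.11:0.6.0
  val sparkVersion = "2.3.2"                                       // assumed single source of truth
  val sparkSuffix  = sparkVersion.split('.').take(2).mkString(".") // "2.3"

  lazy val core = (project in file("core")).settings(
    organization := "com.salesforce.transmogrifai",
    moduleName   := s"transmogrifai-core_spark$sparkSuffix",
    version      := "0.6.0",
    crossScalaVersions := Seq("2.11.12", "2.12.8")
  )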

@tovbinm (Collaborator) commented Feb 19, 2019

This plugin seems promising - https://github.com/ADTRAN/gradle-scala-multiversion-plugin (though when I tried it, the scoverage plugin broke).

And a few more options that I found:

  1. https://github.com/prokod/gradle-crossbuild-scala
  2. https://github.com/linkedin/scanns/tree/master/buildSrc/src/main/groovy/com/linkedin/nn/build/plugins

We should evaluate which one is the best option and go from there.

@deanwampler commented

I attempted building with Scala 2.12, but the build failed immediately because there is no Scala 2.12 version of this: org.github.ngbinh.scalastyle:gradle-scalastyle-plugin_2.12:1.0.1. That's as far as I got. How important is scalastyle? Or is there a more up-to-date version? (Lightbend doesn't even support Scala 2.11 anymore...)

@tovbinm (Collaborator) commented Mar 15, 2019

@deanwampler commented

Thanks. I found that last night, then proceeded to misspell his name...

That worked, but now I'm stuck on Gradle's hard-coded (?) integration with an ancient version of zinc (gradle/gradle#2158). There's a PR for upgrading it that appears to be languishing (gradle/gradle#8485).

@tovbinm (Collaborator) commented Mar 15, 2019

I am starting to wonder whether it's worth moving to sbt or another build tool that can handle the cross-build properly. The value proposition of Gradle has somewhat diminished over time: all of the functionality we need is already available in sbt with plugins - scalastyle check, spark-submit, Bintray publish, etc.
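
For reference, a hedged sketch of what the sbt cross-build could look like (assumed versions, purely illustrative, not a worked-out migration plan):

  // build.sbt sketch -- assumed versions, purely illustrative.
  // With crossScalaVersions set, `sbt +test` / `sbt +publish` run the build
  // once per Scala version, which is the cross-build handling referred to above.
  ThisBuild / scalaVersion       := "2.12.8"
  ThisBuild / crossScalaVersions := Seq("2.11.12", "2.12.8")

  // Spark stays an ordinary dependency; bumping 2.3.x -> 2.4.x is one line.
  libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.4.1" % Provided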

@tovbinm (Collaborator) commented Apr 6, 2019

Spark 2.4.1 is out - https://spark.apache.org/news/spark-2-4-1-released.html - and Scala 2.12 support is now officially GA. Yay!

@tovbinm (Collaborator) commented Apr 11, 2019

I just found out that Kafka implemented a cross-build for Scala 2.11 and 2.12 with some manual customization in their Gradle build files - here and here.

@koertkuipers commented

I tried to get this going for Scala 2.12 as well. It seems I ran into the same issue as @deanwampler with old zinc not liking Scala 2.12.

I get a lot of errors like:

assertion failed: ClassBType.info not yet assigned: Lorg/apache/spark/rdd/RDD;

Although if Kafka can compile for Scala 2.12 with Gradle, then it sounds like I am messing something up here.

@koertkuipers commented

Removing -optimize from scalaCompileOptions seems to make the errors above go away (a short aside on that flag follows at the end of this comment). Now I get much further along; the compiler now dies on:

  trying to do lub/glb of typevar ?_
     while compiling: core/src/main/scala/com/salesforce/op/ModelInsights.scala
        during phase: uncurry
     library version: version 2.12.8
    compiler version: version 2.12.8

The issue seems to be on line 460 of ModelInsights.scala:

                symbol: value x$27
     symbol definition: x$27: com.salesforce.op.stages.base.binary.OpTransformer2[com.salesforce.op.features.types.RealNN,com.salesforce.op.features.types.OPVector,com.salesforce.op.features.types.Prediction] with org.apache.spark.ml.Model[_ >: com.salesforce.op.stages.impl.selector.SelectedModel with com.salesforce.op.stages.sparkwrappers.specific.OpPredictorWrapperModel[_] <: com.salesforce.op.stages.base.binary.OpTransformer2[com.salesforce.op.features.types.RealNN,com.salesforce.op.features.types.OPVector,com.salesforce.op.features.types.Prediction] with org.apache.spark.ml.Model[_ >: com.salesforce.op.stages.impl.selector.SelectedModel with com.salesforce.op.stages.sparkwrappers.specific.OpPredictorWrapperModel[_] <: com.salesforce.op.stages.base.binary.OpTransformer2[com.salesforce.op.features.types.RealNN,com.salesforce.op.features.types.OPVector,com.salesforce.op.features.types.Prediction] with org.apache.spark.ml.Model[_ >: com.salesforce.op.stages.impl.selector.SelectedModel with com.salesforce.op.stages.sparkwrappers.specific.OpPredictorWrapperModel[_] <: com.salesforce.op.stages.base.binary.OpTransformer2[com.salesforce.op.features.types.RealNN,com.salesforce.op.features.types.OPVector,com.salesforce.op.features.types.Prediction]] with com.salesforce.op.stages.sparkwrappers.generic.SparkWrapperParams[_ >: _ <: org.apache.spark.ml.Model[_]]] with com.salesforce.op.stages.sparkwrappers.generic.SparkWrapperParams[_ >: _ <: org.apache.spark.ml.Model[_]]] with com.salesforce.op.stages.sparkwrappers.generic.SparkWrapperParams[_ >: _ <: org.apache.spark.ml.Model[_]] forSome { type _ <: org.apache.spark.ml.PredictionModel[org.apache.spark.ml.linalg.Vector,_] } (a TermSymbol)
        symbol package: com.salesforce.op
         symbol owners: value x$27 -> value $anonfun -> method extractFromStages -> object ModelInsights
             call site: method extractFromStages in object ModelInsights in package op
  
  == Source file context for tree position ==
  
     457     val model = models.lastOption
     458     log.info(
     459       s"Found ${models.length} models will " +
     460         s"${model.map("use results from the last model:" + _.uid + "to").getOrElse("not")}" +
     461         s" to fill in model insights"
     462     )
     463 
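
As promised above, a hedged sbt-flavored aside on the -optimize flag (purely illustrative, since the project builds with Gradle):

  // "-optimize" is the pre-2.12 optimizer flag; the 2.12 backend trips over it
  // (the ClassBType assertion quoted earlier). Drop it, and on 2.12 opt in to
  // the new optimizer via "-opt:..." flags instead, if optimization is wanted.
  scalacOptions -= "-optimize"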

@koertkuipers commented Jun 14, 2019

OK, with a little extra type help the compiler no longer dies (a sketch of the idea follows this list).
Now it's chasing down dependencies for Scala 2.12. Ones missing that I have found so far:

  1. mleap-spark - looks like @tovbinm is already on to this, see combust/mleap#496
  2. xgboost4j-spark - haven't found any mention of Scala 2.12 yet, but it will also require an akka upgrade
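
As a hedged illustration of the kind of "extra type help" mentioned above (hypothetical code, not the actual fix in ModelInsights.scala):

  // Hypothetical sketch: when the elements of `models` carry a huge inferred
  // compound/existential type, ascribing a plain supertype up front stops
  // scalac from computing the enormous least upper bound that crashed it.
  import org.apache.spark.ml.Transformer

  def describeLastModel(models: Seq[Transformer]): String = {
    val model: Option[Transformer] = models.lastOption // explicit annotation = the "extra type help"
    s"Found ${models.length} models; will " +
      model.map("use results from the last model: " + _.uid + " to").getOrElse("not") +
      " fill in model insights"
  }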

@koertkuipers commented

Sorry, I think I have polluted this thread enough; I will create a separate ticket for Scala 2.12.

@wsuchy (Contributor) commented Jun 14, 2019

@koertkuipers could you please share your branch with Scala 2.12 so we could follow up with the migration?

@tovbinm (Collaborator) commented Jun 21, 2019

FYI, #327 is up to date with changes from master.

@tovbinm (Collaborator) commented Jun 21, 2019

Hopefully we can release this one next.

tovbinm mentioned this issue Jul 11, 2019
tovbinm closed this as completed Jul 16, 2019