Commit 96834fb

shanyu authored and srowen committed
[SPARK-26011][SPARK-SUBMIT] Yarn mode pyspark app without python main resource does not honor "spark.jars.packages"
SparkSubmit determines pyspark app by the suffix of primary resource but Livy uses "spark-internal" as the primary resource when calling spark-submit, therefore args.isPython is set to false in SparkSubmit.scala.

In Yarn mode, SparkSubmit module is responsible for resolving maven coordinates and adding them to "spark.submit.pyFiles" so that python's system path can be set correctly. The fix is to resolve maven coordinates not only when args.isPython is true, but also when primary resource is spark-internal.

Tested the patch with Livy submitting pyspark app, spark-submit, pyspark with or without packages config.

Signed-off-by: Shanyu Zhao <shzhaomicrosoft.com>

Closes apache#23009 from shanyu/shanyu-26011.

Authored-by: Shanyu Zhao <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
(cherry picked from commit 9a5fda6)
Signed-off-by: Sean Owen <[email protected]>
1 parent aaa21d8 commit 96834fb

1 file changed: +1 -1 lines changed

core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala

@@ -318,7 +318,7 @@ private[spark] class SparkSubmit extends Logging {

       if (!StringUtils.isBlank(resolvedMavenCoordinates)) {
         args.jars = mergeFileLists(args.jars, resolvedMavenCoordinates)
-        if (args.isPython) {
+        if (args.isPython || isInternal(args.primaryResource)) {
           args.pyFiles = mergeFileLists(args.pyFiles, resolvedMavenCoordinates)
         }
       }
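
To illustrate the effect of the changed condition outside of Spark, here is a minimal, self-contained sketch. It is not the SparkSubmit implementation: `PyFilesSketch`, `Args`, and `withResolved` are hypothetical names, and `isPython`, `isInternal`, and `mergeFileLists` are simplified stand-ins for the helpers referenced in the diff. The sketch assumes a Python app is recognized by a `.py` suffix and that the internal resource is the literal string "spark-internal", as the commit message describes.

```scala
object PyFilesSketch {
  // Hypothetical stand-in for the subset of submit arguments touched by the diff.
  final case class Args(primaryResource: String, jars: String, pyFiles: String)

  // Simplified assumption: a Python app is detected by the ".py" suffix.
  def isPython(res: String): Boolean = res != null && res.endsWith(".py")

  // Assumption from the commit message: Livy passes "spark-internal".
  def isInternal(res: String): Boolean = res == "spark-internal"

  // Simplified merge: join the non-empty comma-separated lists.
  def mergeFileLists(lists: String*): String =
    lists.filter(s => s != null && s.nonEmpty).mkString(",")

  // Mirrors the patched branch: resolved packages always go to jars, and go to
  // pyFiles when the app is Python OR the primary resource is internal.
  def withResolved(args: Args, resolved: String): Args = {
    if (resolved == null || resolved.trim.isEmpty) {
      args
    } else {
      val jars = mergeFileLists(args.jars, resolved)
      val pyFiles =
        if (isPython(args.primaryResource) || isInternal(args.primaryResource)) {
          mergeFileLists(args.pyFiles, resolved)
        } else {
          args.pyFiles
        }
      args.copy(jars = jars, pyFiles = pyFiles)
    }
  }
}
```

With the old condition (`isPython` only), a submission whose primary resource is "spark-internal" would get the resolved packages in `jars` but not in `pyFiles`. With the added `isInternal` check both lists receive them, while a plain JAR application is unaffected.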
