This repository has been archived by the owner on Jan 9, 2020. It is now read-only.

In-cluster client mode #456

Open · wants to merge 5 commits into base: branch-2.2-kubernetes
Changes from 3 commits
12 changes: 10 additions & 2 deletions core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
@@ -344,8 +344,8 @@ object SparkSubmit extends CommandLineUtils {

// The following modes are not supported or applicable
(clusterManager, deployMode) match {
case (KUBERNETES, CLIENT) =>
printErrorAndExit("Client mode is currently not supported for Kubernetes.")
case (KUBERNETES, CLIENT) if !inK8sCluster() =>

I agree it's probably best to limit this to in-cluster now, but generally speaking I don't see why we shouldn't allow client mode with the appropriate networking caveats. Some network configurations allow the pod network to be fully routable (e.g. Calico).

Author:

I'm not sure how difficult it would be to detect pod network connectivity upon application submission. The logic would probably have to abstracted out to the cluster manager or one of the initial steps, and in that case, we would have to throw a validation error much later on in the process as we would need to allow client mode through unobstructed in SparkSubmit. Definitely feasible though.

printErrorAndExit("Kubernetes currently only supports in-cluster client mode.")
case (KUBERNETES, CLUSTER) if args.isR =>
printErrorAndExit("Kubernetes does not currently support R applications.")
case (STANDALONE, CLUSTER) if args.isPython =>
@@ -856,6 +856,14 @@ object SparkSubmit extends CommandLineUtils {
res == SparkLauncher.NO_RESOURCE
}

/**
* Return whether the submission environment is within a Kubernetes cluster
*/
private[deploy] def inK8sCluster(): Boolean = {
!sys.env.get("KUBERNETES_SERVICE_HOST").isEmpty &&
!sys.env.get("KUBERNETES_SERVICE_PORT").isEmpty
}

/**
* Merge a sequence of comma-separated file lists, some of which may be null to indicate
* no files, into a single comma-separated string.
50 changes: 41 additions & 9 deletions docs/running-on-kubernetes.md
@@ -65,7 +65,7 @@ For example, if the registry host is `registry-host` and the registry is listeni
docker push registry-host:5000/spark-driver:latest
docker push registry-host:5000/spark-executor:latest
docker push registry-host:5000/spark-init:latest

Note that `spark-base` is the base image for the other images. It must be built first; the other images can then be built in any order.
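
For illustration, a build sequence consistent with the note above might look like the following sketch (the Dockerfile paths are assumptions about the distribution layout, not taken from this page):

```
# Build the shared base image first; the other images extend it.
docker build -t spark-base:latest -f dockerfiles/spark-base/Dockerfile .

# The remaining images can then be built in any order.
docker build -t registry-host:5000/spark-driver:latest -f dockerfiles/driver/Dockerfile .
docker build -t registry-host:5000/spark-executor:latest -f dockerfiles/executor/Dockerfile .
docker build -t registry-host:5000/spark-init:latest -f dockerfiles/init-container/Dockerfile .
```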

## Submitting Applications to Kubernetes
@@ -182,10 +182,10 @@ is currently supported.

### Running PySpark

Running PySpark on Kubernetes leverages the same spark-submit logic as when launching on Yarn and Mesos.
Python files can be distributed by including them via `--py-files` in the submission.

Below is an example submission:


@@ -240,6 +240,38 @@ the command may then look like the following:

## Advanced

### Running in-cluster client mode applications

While Spark on Kubernetes does not officially support client mode applications, such as the PySpark shell, there is a workaround that allows these applications to be executed from within an existing Kubernetes cluster. This _in-cluster_ client mode bypasses some of the networking and dependency issues inherent in running a client from outside of a cluster while allowing much of the same functionality for interactive use cases.

Suggest rephrasing as:

"While Spark on Kubernetes does not support client mode applications, such as the PySpark shell, when launched from outside Kubernetes, Spark on Kubernetes does support client mode applications launched within the cluster. This in-cluster..."

Author:
Agreed.


interactive use cases like PySpark shell and Jupyter.


In order to run in client mode, use `kubectl attach` to attach to an existing driver pod on the cluster, or the following to run a new driver:

kubectl run <pod name> -it --image=<driver image> --restart=Never -- /bin/bash

use kubespark/spark-driver:latest as image in the commandline?

Author:
Did not want to specify, but I see the benefit.
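
For example, a concrete invocation along the lines of this suggestion might be (the pod name `spark-client` and the `kubespark/spark-driver:latest` image are illustrative, not mandated by this page):

```
kubectl run spark-client -it --image=kubespark/spark-driver:latest --restart=Never -- /bin/bash
```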


This will open up a shell into the specified driver pod from which you can run client mode applications. In order to appropriately configure
these in-cluster applications, be sure to set the following configuration value for all applications, as in the following `spark-submit` example,
which essentially tells the cluster manager to refer back to the current driver pod as the driver for any applications you submit:

nit; drop "essentially"


spark.kubernetes.driver.pod.name=$HOSTNAME
Member:

is there a reason spark.kubernetes.driver.pod.name is being set prior to the spark-submit command, instead of using --conf ... ?

Author:

The way I worded it makes it seem like that is the case. I was going for making the user aware that spark.kubernetes.driver.pod.name must be set for all client mode applications executed in-cluster.

Perhaps appending to "be sure to set the following configuration value" with "in all client-mode applications you run, either through --conf or spark-defaults.conf" would help clarify the point?

Member:

It is also in the example command, so maybe it is clear enough. Possibly add in "as in the following example spark-submit command"

Author:

Amended for clarity.

Why is this parameter special compared to other spark kubernetes options you use below?

Author:

Specifying the driver pod's name tells the cluster manager that the application being submitted with this configuration should refer back to a pod in the k8s cluster with the provided name. See the validation and reference logic in the cluster scheduler backend class.

Author:

By setting the driver pod name to the hostname of the user's pod, every application uses the user's pod as its driver in client mode, which means that a new driver pod isn't allocated.
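
As the thread above notes, the property can be supplied per application with `--conf` or set once for every application in `conf/spark-defaults.conf`. A minimal sketch of both follows; note that `spark-defaults.conf` is read as a plain properties file, so the pod name must be written out literally rather than via `$HOSTNAME` (`my-client-pod` is a hypothetical name):

```
# Per application, on the spark-submit command line:
--conf spark.kubernetes.driver.pod.name=$HOSTNAME

# Or once for all applications, in conf/spark-defaults.conf:
spark.kubernetes.driver.pod.name=my-client-pod
```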


With that set, you should be able to run the following example from within the pod:

bin/spark-submit \
--class org.apache.spark.examples.SparkPi \

where is --deploy-mode client? If this is the default, I would add it explicitly in case the user has a global spark.conf that defaults to cluster mode instead.

Author:

It is default, but I'll include it to be more explicit.

--master k8s://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT \
--kubernetes-namespace default \
--conf spark.app.name=spark-pi \
--conf spark.kubernetes.driver.pod.name=$HOSTNAME \

I wonder if pods can infer their own name via the environment?

--conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:latest \
--conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:latest \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.shuffle.service.enabled=true \

I feel perhaps we should leave `--conf spark.dynamicAllocation.enabled=true` and `--conf spark.shuffle.service.enabled=true` out of the example? these are not default or required when running k8s right?

Author:
Agreed.

--conf spark.kubernetes.shuffle.namespace=default \
--conf spark.kubernetes.shuffle.labels="app=spark-shuffle-service,spark-version=2.1.0" \

2.2.0?

Author:

Yup, will change later today.

local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar 10
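
Folding in the review feedback above (explicit `--deploy-mode client`, the dynamic allocation and shuffle service settings omitted, and the examples jar matching 2.2.0), the assembled command might look roughly like this sketch rather than a definitive form:

```
bin/spark-submit \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  --master k8s://$KUBERNETES_SERVICE_HOST:$KUBERNETES_SERVICE_PORT \
  --kubernetes-namespace default \
  --conf spark.app.name=spark-pi \
  --conf spark.kubernetes.driver.pod.name=$HOSTNAME \
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:latest \
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:latest \
  local:///opt/spark/examples/jars/spark_examples_2.11-2.2.0.jar 10
```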

### Securing the Resource Staging Server with TLS

The default configuration of the resource staging server is not secured with TLS. It is highly recommended to configure
@@ -759,25 +791,25 @@ from the other deployment modes. See the [configuration page](configuration.html
</td>
</tr>
<tr>
<td><code>spark.kubernetes.node.selector.[labelKey]</code></td>
<td>(none)</td>
<td>
Adds to the node selector of the driver pod and executor pods, with key <code>labelKey</code> and the value as the
configuration's value. For example, setting <code>spark.kubernetes.node.selector.identifier</code> to <code>myIdentifier</code>
will result in the driver pod and executors having a node selector with key <code>identifier</code> and value
<code>myIdentifier</code>. Multiple node selector keys can be added by setting multiple configurations with this prefix.
</td>
</tr>
<tr>
<td><code>spark.executorEnv.[EnvironmentVariableName]</code></td>
<td>(none)</td>
<td>
Add the environment variable specified by <code>EnvironmentVariableName</code> to
the Executor process. The user can specify multiple of these to set multiple environment variables.
</td>
</tr>
<tr>
<td><code>spark.kubernetes.driverEnv.[EnvironmentVariableName]</code></td>
<td>(none)</td>
<td>
Add the environment variable specified by <code>EnvironmentVariableName</code> to