Move docker image management and test entrypoint to Maven #31
Changes from 16 commits
@@ -1,6 +1,11 @@
.idea/
spark/
integration-test/target/
target/
build/*.jar
build/apache-maven*
build/scala*
build/zinc*
build/run-mvn
*.class
*.log
*.iml
*.swp
@@ -8,98 +8,67 @@ title: Spark on Kubernetes Integration Tests

Note that the integration test framework is currently being heavily revised and
is subject to change. Currently, the integration tests only run with Java 8.

As shorthand to run the tests against any given cluster, you can use the `e2e/runner.sh` script.
The script assumes that you have a functioning Kubernetes cluster (1.6+) with kubectl
configured to access it. The master URL of the currently configured cluster on your
machine can be discovered as follows:

```
$ kubectl cluster-info

Kubernetes master is running at https://xyz
```

If you want to use a local [minikube](https://github.com/kubernetes/minikube) cluster,
the minimum tested version is 0.23.0, the kube-dns addon must be enabled,
and the recommended configuration is 3 CPUs and 4G of memory. There is also a wrapper
script, `e2e/e2e-minikube.sh`, for running on minikube, which specifically tests
the master branch of the apache/spark repository.

```
$ minikube start --memory 4000 --cpus 3
```

If you're using a non-local cluster, you must provide an image repository
to which you have write access, using the `-i` option, in order to store Docker images
generated during the test.

Example usages of the script:

```
$ ./e2e/runner.sh -m https://xyz -i docker.io/foxish -d cloud
$ ./e2e/runner.sh -m https://xyz -i test -d minikube
$ ./e2e/runner.sh -m https://xyz -i test -r https://github.com/my-spark/spark -d minikube
$ ./e2e/runner.sh -m https://xyz -i test -r https://github.com/my-spark/spark -b my-branch -d minikube
```

# Detailed Documentation

## Running the tests using Maven

Integration tests require installing [Minikube](https://kubernetes.io/docs/getting-started-guides/minikube/) on
your machine, and the `Minikube` binary must be on your `PATH`. Refer to the Minikube documentation for instructions
on how to install it. It is recommended to allocate at least 8 CPUs and 8GB of memory to the Minikube cluster.
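For example, a cluster with that allocation could be started as follows (a sketch; adjust the flags for your Minikube version):

```
$ minikube start --cpus 8 --memory 8192
```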
Running the integration tests requires a Spark distribution package tarball that
contains Spark jars, submission clients, etc. You can download a tarball from
http://spark.apache.org/downloads.html, or you can create a distribution from
source code using `make-distribution.sh`. For example:

```
$ git clone git@github.com:apache/spark.git
$ cd spark
$ ./dev/make-distribution.sh --tgz \
    -Phadoop-2.7 -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver
```

The above command will create a tarball like spark-2.3.0-SNAPSHOT-bin.tgz in the
top-level directory. For more details, see the related section in
[building-spark.md](https://github.com/apache/spark/blob/master/docs/building-spark.md#building-a-runnable-distribution).

Once you have prepared the tarball, the integration tests can be executed with Maven or
your IDE. Note that when running tests from an IDE, the `pre-integration-test`
phase must be run every time the Spark main code changes. When running tests
from the command line, the `pre-integration-test` phase should automatically be
invoked if the `integration-test` phase is run.
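For example, when iterating from an IDE, the `pre-integration-test` phase can be re-run on its own from the command line after Spark code changes (a sketch, assuming the same `spark-distro-tgz` property used in the full command below):

```
$ mvn pre-integration-test \
  -Dspark-distro-tgz=spark/spark-2.3.0-SNAPSHOT-bin.tgz
```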
With Maven, the integration test can be run using the following command:

```
$ mvn clean integration-test \
  -Dspark-distro-tgz=spark/spark-2.3.0-SNAPSHOT-bin.tgz
```

## Running against an arbitrary cluster

In order to run against any cluster, use the following:

```sh
$ mvn clean integration-test \
  -Dspark-distro-tgz=spark/spark-2.3.0-SNAPSHOT-bin.tgz \
  -DextraScalaTestArgs="-Dspark.kubernetes.test.master=k8s://https://<master>"
```
## Reuse the previous Docker images

The integration tests build a number of Docker images, which takes some time.
By default, the images are built every time the tests run. You may want to skip
re-building those images during development if the distribution package has not
changed since the last run. To do so, pass the property
`spark.kubernetes.test.imageDockerTag` to the test process, specifying the Docker
image tag to reuse. Here is an example:

```
$ mvn clean integration-test \
  -Dspark-distro-tgz=spark/spark-2.3.0-SNAPSHOT-bin.tgz \
  -Dspark.kubernetes.test.imageDockerTag=latest
```
The simplest way to run the integration tests is to install and run Minikube, then run the following:

    dev/dev-run-integration-tests.sh

The minimum tested version of Minikube is 0.23.0. The kube-dns addon must be enabled. Minikube should
run with a minimum of 3 CPUs and 4G of memory:

    minikube start --cpus 3 --memory 4096

You can download Minikube [here](https://github.com/kubernetes/minikube/releases).
# Integration test customization

Configuration of the integration test runtime is done by passing arguments to the test script. The main useful options are outlined below.

## Use a non-local cluster

To use your own cluster running in the cloud, set the following:

* `--deploy-mode cloud` - indicates that the test is connecting to a remote cluster instead of Minikube,
* `--spark-master <master-url>` - set `<master-url>` to the externally accessible Kubernetes cluster URL,
* `--image-repo <repo>` - set `<repo>` to a write-accessible Docker image repository that provides the images for your cluster. The framework assumes your local Docker client can push to this repository.

The resulting command looks like this:

    dev/dev-run-integration-tests.sh \
      --deploy-mode cloud \
      --spark-master https://example.com:8443/apiserver \
      --image-repo docker.example.com/spark-images
## Re-using Docker Images

By default, the test framework will build new Docker images on every test execution. A unique image tag is generated
and written to the file `target/imageTag.txt`. To reuse the images built in a previous run, or to use a Docker image tag
that you have already built by other means, pass the tag to the test script:

    dev/dev-run-integration-tests.sh --image-tag <tag>

For example, to reuse the images that the test framework built in a previous run:

    dev/dev-run-integration-tests.sh --image-tag $(cat target/imageTag.txt)
## Customizing the Spark Source Code to Test

By default, the test framework will test the master branch of Spark from [here](https://github.com/apache/spark). You
can specify the following options to test against different source versions of Spark:

* `--spark-repo <repo>` - set `<repo>` to the git or http URI of the Spark git repository to clone,
* `--spark-branch <branch>` - set `<branch>` to the branch of the repository to build.

An example:

    dev/dev-run-integration-tests.sh \
      --spark-repo https://github.com/apache-spark-on-k8s/spark \
      --spark-branch new-feature

Additionally, you can use a pre-built Spark distribution. In this case, the repository is not cloned at all, and no
source code has to be compiled.

* `--spark-tgz <path-to-tgz>` - set `<path-to-tgz>` to point to a tarball containing the Spark distribution to test.
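For example (the tarball path here is hypothetical; point it at whichever distribution you built or downloaded):

    dev/dev-run-integration-tests.sh \
      --spark-tgz /path/to/spark-2.3.0-SNAPSHOT-bin.tgz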
When the tests are cloning a repository and building it, the Spark distribution is placed in `target/spark/spark-<VERSION>.tgz`.
Reuse this tarball to save a significant amount of time if you are iterating on the development of these integration tests.
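A sketch of that workflow, assuming exactly one tarball exists under `target/spark/` from a previous clone-and-build run:

    dev/dev-run-integration-tests.sh \
      --spark-tgz $(find target/spark -name 'spark-*.tgz')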
@@ -1,5 +1,6 @@
#!/bin/bash
#!/usr/bin/env bash
Review comment: Why do we need this? We'll need to ensure that it stays in sync with our repo? Or maybe we can get rid of it once we merge this upstream?

Reply: We get rid of this when we merge into upstream, but even aside from that, this has hardly been changing in upstream and it's theoretically completely isolated from what we will use to build Spark (since the shell forks the second Maven process it's completely independent).

Reply: We also have to add it here now because unlike before, now

Review comment: Instead of copying the content here, could you just have a much shorter script that just downloads from https://raw.githubusercontent.com/apache/spark/master/build/mvn and execs it? Something like build/mvn-getter whose content is:
Reply: Hm, I think that works. Will incorporate that.

Reply: Done
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.

@@ -14,23 +15,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

### This script can be used to run integration tests locally on minikube.
### Requirements: minikube v0.23+ with the DNS addon enabled, and kubectl configured to point to it.

set -ex

### Basic Validation ###
if [ ! -d "integration-test" ]; then
  echo "This script must be invoked from the top-level directory of the integration-tests repository"
  usage
  exit 1
fi

# Set up config.
master=$(kubectl cluster-info | head -n 1 | grep -oE "https?://[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(:[0-9]+)?")
repo="https://github.com/apache/spark"
image_repo=test

# Run tests in minikube mode.
./e2e/runner.sh -m $master -r $repo -i $image_repo -d minikube

BUILD_DIR=$(dirname $0)
MVN_RUNNER=$BUILD_DIR/run-mvn
curl -s https://raw.githubusercontent.com/apache/spark/master/build/mvn > $MVN_RUNNER
Review comment: Why not just pipe to bash?

Reply: Do the arguments sent to this script follow the pipe?

Reply: We also don't want to download the mvn script if we already have it - so the script ends up writing simpler with that fix. Addressed in latest patch.
chmod +x $MVN_RUNNER
source $MVN_RUNNER
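The "addressed in latest patch" fix presumably guards the download, along the lines of this sketch (hypothetical; shown only to illustrate skipping the download when `run-mvn` already exists):

```
BUILD_DIR=$(dirname $0)
MVN_RUNNER=$BUILD_DIR/run-mvn
# Only fetch the upstream mvn wrapper if we don't already have a local copy.
if [ ! -f "$MVN_RUNNER" ]; then
  curl -s https://raw.githubusercontent.com/apache/spark/master/build/mvn > "$MVN_RUNNER"
  chmod +x "$MVN_RUNNER"
fi
source "$MVN_RUNNER"
```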
@@ -0,0 +1,100 @@
#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

TEST_ROOT_DIR=$(git rev-parse --show-toplevel)
BRANCH="master"
Review comment: Remove unnecessary quotes

SPARK_REPO="https://github.com/apache/spark"

Review comment: Remove unnecessary quotes

SPARK_REPO_LOCAL_DIR="$TEST_ROOT_DIR/target/spark"
DEPLOY_MODE=minikube
IMAGE_REPO="docker.io/kubespark"

Review comment: Remove unnecessary quotes

SPARK_TGZ="N/A"

Review comment: Remove unnecessary quotes

IMAGE_TAG="N/A"

Review comment: Remove unnecessary quotes

SPARK_MASTER=
# Parse arguments
while (( "$#" )); do
  case $1 in
    --spark-branch)
      BRANCH="$2"
      shift
      ;;
    --spark-repo)
      SPARK_REPO="$2"
      shift
      ;;
    --image-repo)
      IMAGE_REPO="$2"
      shift
      ;;
    --image-tag)
      IMAGE_TAG="$2"
      shift
      ;;
    --deploy-mode)
      DEPLOY_MODE="$2"
      shift
      ;;
    --spark-tgz)
      SPARK_TGZ="$2"
      shift
      ;;
    *)
      break
      ;;
  esac
  shift
done
if [[ $SPARK_TGZ == "N/A" ]];

Review comment: Are you going to keep this clause, or delete it? I think it's pretty straightforward this way, so I think it'd be reasonable to keep it.

then
  echo "Cloning $SPARK_REPO into $SPARK_REPO_LOCAL_DIR and checking out $BRANCH."

  # Clone the Spark repository if needed.
  if [ -d "$SPARK_REPO_LOCAL_DIR" ];
  then
    (cd $SPARK_REPO_LOCAL_DIR && git fetch origin $BRANCH);
  else
    mkdir -p $SPARK_REPO_LOCAL_DIR;
    git clone -b $BRANCH --single-branch $SPARK_REPO $SPARK_REPO_LOCAL_DIR;
  fi
  cd $SPARK_REPO_LOCAL_DIR
  git checkout -B $BRANCH origin/$BRANCH
  ./dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes -DskipTests;
  SPARK_TGZ=$(find $SPARK_REPO_LOCAL_DIR -name spark-*.tgz)
  echo "Built Spark TGZ at $SPARK_TGZ."
  cd -
fi
cd $TEST_ROOT_DIR

if [ -z $SPARK_MASTER ];
then
  build/mvn integration-test \
    -Dspark.kubernetes.test.sparkTgz=$SPARK_TGZ \

Review comment: Would like to add support for arbitrary maven arguments, just couldn't quite get the shell scripting to work out yet.

    -Dspark.kubernetes.test.imageTag=$IMAGE_TAG \
    -Dspark.kubernetes.test.imageRepo=$IMAGE_REPO \
    -Dspark.kubernetes.test.deployMode=$DEPLOY_MODE

Review comment: Should drop the backslash here.

Reply: Duh, it's just the last line (line 92) where it should be dropped, I got it now.

Reply: I think this is addressed.

else
  build/mvn integration-test \
    -Dspark.kubernetes.test.sparkTgz=$SPARK_TGZ \
    -Dspark.kubernetes.test.imageTag=$IMAGE_TAG \
    -Dspark.kubernetes.test.imageRepo=$IMAGE_REPO \
    -Dspark.kubernetes.test.deployMode=$DEPLOY_MODE \
    -Dspark.kubernetes.test.master=$SPARK_MASTER

Review comment: Should drop the backslash.

fi
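As a sketch of the "arbitrary maven arguments" idea raised in review (hypothetical; not part of this patch), the parser's `*)` branch could collect unrecognized arguments into an array and forward them to Maven:

```
# Hypothetical: accumulate unrecognized flags and pass them through to Maven.
EXTRA_MVN_ARGS=()
# ... inside the while/case loop, replace the `break` in `*)` with:
#   *)
#     EXTRA_MVN_ARGS+=("$1")
#     ;;
build/mvn integration-test \
  -Dspark.kubernetes.test.sparkTgz=$SPARK_TGZ \
  -Dspark.kubernetes.test.imageTag=$IMAGE_TAG \
  -Dspark.kubernetes.test.imageRepo=$IMAGE_REPO \
  -Dspark.kubernetes.test.deployMode=$DEPLOY_MODE \
  "${EXTRA_MVN_ARGS[@]}"
```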
This file was deleted.
Review comment: I don't understand the motivation of why we should have the integration test code ever try to clone and build spark. Why not just always depend on a pre-built spark distribution? (Sorry if I missed the reason for this)

Reply: I've found it a lot easier locally to be able to run a single command that both fetches the Spark distribution and builds it and then the integration test uses that Spark distribution. We provide the optionality for the local development scenario but we can still provide the TGZ specifically in Jenkins via `spark.kubernetes.test.sparkTgz`.
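A sketch of what that Jenkins-style invocation might look like (the tarball path and image repository here are hypothetical), using a pre-built distribution so no Spark source is cloned or compiled:

```
build/mvn integration-test \
  -Dspark.kubernetes.test.sparkTgz=/path/to/spark-2.3.0-SNAPSHOT-bin.tgz \
  -Dspark.kubernetes.test.imageRepo=docker.example.com/spark-images \
  -Dspark.kubernetes.test.deployMode=cloud
```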