Move docker image management and test entrypoint to Maven #31
Conversation
Also requires the test commands in Jenkins to change.
TODO and not handled by this patch:
@@ -0,0 +1,158 @@
#!/usr/bin/env bash
Why do we need this? We'll need to ensure that it stays in sync with our repo? Or maybe we can get rid of it once we merge this upstream?
We can get rid of this when we merge into upstream. But even aside from that, this file has hardly been changing upstream, and it's theoretically completely isolated from what we use to build Spark (since the shell forks a second Maven process, it's completely independent).
We also have to add it here now because, unlike before, mvn is now the entrypoint into the whole system; only a substep of the build reactor clones the Spark repository down. Aside from that, we also allow a Spark TGZ to be provided, which removes the use of the Spark source code entirely. So no matter what, we're going to need our own Maven here.
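For illustration, a rough sketch of the two ways that entrypoint can be driven (the integration-test phase and the sparkTgz property come from elsewhere in this PR; the path is a placeholder):

```bash
# Default: the reactor clones Spark and builds a distribution itself.
build/mvn clean integration-test

# Alternative: supply a pre-built Spark TGZ so the Spark sources are never cloned.
build/mvn clean integration-test \
  -Dspark.kubernetes.test.sparkTgz=/path/to/spark-distribution.tgz
```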
Instead of copying the content here, could you have a much shorter script that just downloads from https://raw.githubusercontent.com/apache/spark/master/build/mvn and execs it? Something like build/mvn-getter, whose content is:
#!/bin/sh
# Fetch Apache Spark's build/mvn wrapper and run it, forwarding any arguments.
curl -s https://raw.githubusercontent.com/apache/spark/master/build/mvn | bash -s -- "$@"
Hm, I think that works. Will incorporate that.
Done
scripts/parse-arguments.sh
Outdated
#
set -ex
TEST_ROOT_DIR=$(git rev-parse --show-toplevel)
A lot of these defaults are repeated again in pom.xml - the idea is that maintainers of the framework can run the scripts in isolation without using Maven. This can be done both with the top-level setup-integration-test-env.sh script and with any of the sub-components, e.g. clone-spark.sh.
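As a rough sketch of what that looks like (the default values are the ones quoted later in this review; the option loop is simplified, and any flag beyond --deploy-mode and --spark-tgz is an assumption):

```bash
#!/usr/bin/env bash
# Defaults let the script run with no arguments at all; Maven passes the
# same flags explicitly. Simplified sketch, not the full parse-arguments.sh.
DEPLOY_MODE=minikube
IMAGE_REPO="docker.io/kubespark"
SKIP_BUILDING_IMAGES=false
SPARK_TGZ="N/A"

while (( "$#" )); do
  case $1 in
    --deploy-mode) DEPLOY_MODE="$2"; shift ;;
    --image-repo)  IMAGE_REPO="$2";  shift ;;
    --spark-tgz)   SPARK_TGZ="$2";   shift ;;
  esac
  shift
done
```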
Hmm... it seems error-prone that this needs to stay in sync with the args in pom.xml. Is there a better dispatch mechanism than this?
The pom.xml can assume that these are the defaults and that they are correct, but I like having them explicitly enumerated in the pom.xml because the behavior can then be known right from the pom. I also like being able to run the script independently without any arguments. All of this is just for convenience, though, so for the sake of developer sanity we can make one of these ideas less convenient.
I don't have a strong opinion on the best way to move forward.
scripts/parse-arguments.sh
Outdated
DEPLOY_MODE=minikube
IMAGE_REPO="docker.io/kubespark"
SKIP_BUILDING_IMAGES=false
SPARK_TGZ="N/A"
This is less than ideal. It would be more elegant to have SPARK_TGZ= here and then to check in other places with if [[ -z $SPARK_TGZ ]]. The problem is that Maven's exec plugin has to pass in a fixed set of command line arguments, so Maven will always pass in some spark-tgz argument. But we can't pass in an empty value, because we would then end up with commands like setup-integration-test-env --spark-tgz --deploy-mode cloud, which screws up the argument shifting.
An alternative we can employ in general is to allow these variables to be set by environment variables as well and to pass them in as environment variables from Maven. Ideas are welcome here.
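To make that alternative concrete, a minimal sketch of the environment-variable approach (the sentinel and variable name are from this PR; the echo messages are placeholders):

```bash
# Sketch of the env-var alternative: Maven exports SPARK_TGZ, and the script
# falls back to the "N/A" sentinel when the variable is unset or empty.
SPARK_TGZ="${SPARK_TGZ:-N/A}"

if [[ $SPARK_TGZ == "N/A" ]]; then
  echo "No Spark TGZ provided; the sources will be cloned and built instead."
else
  echo "Using pre-built Spark distribution at $SPARK_TGZ."
fi
```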
You could use the Maven exec plugin like this:
<executable>/bin/sh</executable>
<arguments>
  <argument>-c</argument>
  <argument>arbitrary length here ; another command</argument>
</arguments>
The problem is that the arguments themselves have to be fixed. I can't sometimes pass in the --spark-tgz flag and sometimes not pass it.
This gets simpler with the idea that this code always takes --spark-tgz, I think.
scripts/write-docker-tag.sh
Outdated
fi

rm -f $IMAGE_TAG_OUTPUT_FILE
touch $IMAGE_TAG_OUTPUT_FILE
Less than ideal. The problem is that Maven wants to pass the image tag into KubernetesSuite, but Maven doesn't know the tag offhand until it's resolved by the test environment script. We use a UUID for the default image tag because there can be multiple builds running concurrently against the same Spark git hash. But the UUID can't be generated by Maven, meaning the script has to make it and pass that information along somehow, and a temporary file was the best bridge I could think of. Other more elegant ideas would be appreciated here.
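A minimal sketch of that bridge, assuming the target/imageTag.txt location mentioned in the README changes and a uuidgen-based default tag (both are assumptions about the exact script contents):

```bash
#!/usr/bin/env bash
# Generate a unique tag and write it to a file that Maven (and KubernetesSuite)
# can read back after the environment script has run.
IMAGE_TAG_OUTPUT_FILE="${IMAGE_TAG_OUTPUT_FILE:-target/imageTag.txt}"
IMAGE_TAG="${IMAGE_TAG:-$(uuidgen)}"

mkdir -p "$(dirname "$IMAGE_TAG_OUTPUT_FILE")"
rm -f "$IMAGE_TAG_OUTPUT_FILE"
echo -n "$IMAGE_TAG" > "$IMAGE_TAG_OUTPUT_FILE"
```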
Concurrent runs against the same git sha - I'm not quite sure about that. If a new PR is submitted, it would apply that commit on the base and then run the tests, which would give each one a unique sha. If they are the same sha, then invariably they are identical and the image could be reused.
I think there are two git hashes to consider, and neither will work:
- If we're talking about the integration test repository git hash, multiple builds should certainly be able to run against the same commit. Multiple test runs on the same node testing two different commits of Spark but using the same integration test version come to mind.
- If we're talking about the Spark repository git hash, the inverse applies - multiple versions of the integration test running against the same Spark version. Reusing the images is difficult because we need to know when it's safe to GC an image, which is only when all tests holding that given tag are finished. Then we can run into race conditions, etc.
It seems a lot easier to bias towards isolation for correctness and optimize later.
Would it be simpler to just GC any images that come from this but are over a day old?
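For reference, a hedged sketch of what such a GC could look like, assuming the test images carried an identifying label (which nothing in this PR currently adds):

```bash
# Hypothetical cleanup: prune integration-test images older than a day.
# Assumes images are built with a label like "spark-integration-test=true".
docker image prune --all --force \
  --filter "label=spark-integration-test=true" \
  --filter "until=24h"
```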
There's still a potential race with who gets to build the image with the given hash though. Two builds trying to use the same hash need to agree on who builds the image. I don't want to have to deal with that locking at all, so just use completely unique tags everywhere.
Ok!
@@ -1,39 +0,0 @@
#!/bin/bash
This is specific to GKE test suites; I think we still need this setup script, even if the runner goes away.
Ah, I didn't notice the apt-get calls. I think we want these to be assumed by the framework. Installing components on demand in a test framework is scary-ish - not to mention that installation probably depends on the user's OS; e.g. if the user is running on Mac OS X this won't work. We should have separate scripts for these that aren't invoked in the build lifecycle.
Yes, I did not intend for these to be part of the test suite. These are external "setup" steps and the entrypoint for the k8s CI system.
Agreed. It's separate, and the only expected consumer is k8s test infrastructure.
The Kubernetes integration tests now always expect an image to be pre-built, so we no longer build images with Scala code. Maven's pre-integration-test phase invokes a single script to bootstrap the environment with the built images, etc. In the transition we try to keep as much of the same semantics as possible.
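In effect, that phase reduces to a single script invocation along these lines (a sketch; the exact flag list Maven passes is an assumption based on the options discussed in this review):

```bash
# Roughly what pre-integration-test runs: one bootstrap script that builds
# the Docker images and prepares the Spark distribution for the tests.
scripts/setup-integration-test-env.sh \
  --deploy-mode minikube \
  --image-repo docker.io/kubespark \
  --spark-tgz N/A
```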
Force-pushed from ed69645 to 016505d (compare).
README.md
Outdated
## Re-using Docker Images

By default, the test framework will build new Docker images on every test execution. A unique image tag is generated,
and it is written to file at `target/imageTag.txt`. To reuse the images built in a previous run, set:
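The property the README goes on to name is cut off in this excerpt; a hedged guess at the reuse invocation, based on the spark.kubernetes.test.imageDockerTag constant that appears later in this PR, would be something like:

```bash
# Hypothetical reuse of a previous run's images: both property names here are
# assumptions (imageDockerTag from the config object in this PR, the
# skip-building property inferred from the SKIP_BUILDING_IMAGES variable).
build/mvn integration-test \
  -Dspark.kubernetes.test.imageDockerTag=$(cat target/imageTag.txt) \
  -Dspark.kubernetes.test.skipBuildingImages=true
```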
I liked before that it was using the git sha - it made it easy to correlate to the distribution that was just tested.
Discussed in #31 (comment)
README.md
Outdated
By default, the test framework will test the master branch of Spark from [here](https://github.com/apache/spark). You
can specify the following options to test against different source versions of Spark:

* `spark.kubernetes.test.sparkRepo` to the git or http URI of the Spark git repository to clone
gitRepo and gitBranch may be better names for these.
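For context, a usage sketch with the property names as they currently stand in this PR (the invocation itself is assumed; the property values are taken from the README diff quoted below):

```bash
# Test against a fork and branch instead of apache/spark master.
build/mvn integration-test \
  -Dspark.kubernetes.test.sparkRepo=https://github.com/apache-spark-on-k8s/spark \
  -Dspark.kubernetes.test.sparkBranch=new-feature
```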
scripts/build-spark.sh
Outdated
mkdir -p $UNPACKED_SPARK_TGZ
if [[ $SPARK_TGZ == "N/A" ]];
then
./dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes -DskipTests;
If we omit --tgz, we should just get the unpacked distribution under dist/.
It makes it easier to reuse the distribution with the spark.kubernetes.test.sparkTgz option. That way the configuration can pull double duty, covering both custom Spark distributions and re-use of dynamically built ones.
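To illustrate the double duty, a sketch of the branch in build-spark.sh (the unpack commands are assumptions layered on the quoted lines above, not the exact script contents):

```bash
# Either build a distribution TGZ from the cloned sources, or unpack the one
# supplied via spark.kubernetes.test.sparkTgz. Both end up in the same place.
mkdir -p "$UNPACKED_SPARK_TGZ"
if [[ $SPARK_TGZ == "N/A" ]]; then
  ./dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes -DskipTests
  tar -xzf spark-*.tgz -C "$UNPACKED_SPARK_TGZ" --strip-components=1
else
  tar -xzf "$SPARK_TGZ" -C "$UNPACKED_SPARK_TGZ" --strip-components=1
fi
```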
I think for Jenkins we will want to use the pre-built distributions exclusively, because we don't want to keep minikube locked during the building of the distribution. If the Spark distribution builds "inside" of the spark-integration test, then it will be harder to express this partial locking. As it currently stands, one Jenkins job creates the Spark distribution and another Jenkins job does the integration test, which makes the locking easy.
package object config {
  val KUBERNETES_TEST_DOCKER_TAG_SYSTEM_PROPERTY = "spark.kubernetes.test.imageDockerTag"
  val DRIVER_DOCKER_IMAGE = "spark.kubernetes.driver.container.image"
These need to be cleaned up I think.
Changed the base to
if [[ $SPARK_TGZ == "N/A" ]];
then
$SCRIPTS_DIR/clone-spark.sh "$@";
Instead of passing all the args around, we should parse the args here, then set the right env vars/args and call the other scripts, maybe? I think it's an antipattern for each script to be calling parse-args and seeing unrelated args.
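A minimal sketch of that alternative, assuming the variable names used elsewhere in these scripts (parse once at the top level, export the results, and let sub-scripts read the environment):

```bash
# Parse arguments a single time, then dispatch without forwarding "$@".
source "$SCRIPTS_DIR/parse-arguments.sh" "$@"
export DEPLOY_MODE IMAGE_REPO SKIP_BUILDING_IMAGES SPARK_TGZ

if [[ $SPARK_TGZ == "N/A" ]]; then
  "$SCRIPTS_DIR/clone-spark.sh"   # no longer needs the raw argument list
fi
```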
README.md
Outdated
-Dspark.kubernetes.test.sparkRepo=https://github.com/apache-spark-on-k8s/spark \
-Dspark.kubernetes.test.sparkBranch=new-feature

Additionally, you can use a pre-built Spark distribution. In this case, the repository is not cloned at all, and no
I don't understand the motivation of why we should have the integration test code ever try to clone and build spark. Why not just always depend on a pre-built spark distribution? (Sorry if I missed the reason for this)
I've found it a lot easier locally to be able to run a single command that both fetches and builds the Spark distribution, and then has the integration test use that distribution. We provide the optionality for the local development scenario, but we can still provide the TGZ specifically in Jenkins via spark.kubernetes.test.sparkTgz.
@ssuchter @foxish approach has changed - the
The reason we have to do this was discussed in sig-big-data - Jenkins will only use the Maven build without building the Spark distribution, so that the Maven build doesn't lock the Minikube resource while building Spark, which takes a long time.
rerun integration tests please
1 similar comment
rerun integration tests please
@kimoonkim the Couple of questions:
@ssuchter let's just have quotes everywhere.
@kimoonkim the
Sorry, forgot to mention what the specific command should be in Jenkins without using the dev script:
rerun integration tests please
@kimoonkim integration tests aren't running when I comment; can you trigger the build?
I think this build 98 actually ran upon your comment, given the timing of the build start. I believe there is a quiet period of a dozen seconds, so maybe it did not trigger right away.
Right. It gets stopped and deleted at the end. The Pepperdata Jenkins cluster has only one node, and it still has jobs running old integration tests from our fork. If we keep minikube running, that will interfere with the old integration tests. Stopping minikube is the best compromise for this setup, IMO.
Hm, but the error is this:
But the tests work locally for me. I wonder if Java's
Jenkins can switch to this command, but what is the advised way of building Docker images now that the Scala code does not build them?
@kimoonkim this PR puts us in an in-between state where Maven is building the Docker images, but we expect the external Jenkins job to build the Spark distribution. If we build using Maven, the reactor will still build the images for us.
I see. I'll try adding the minikube location to the PATH used by Maven.
rerun integration tests please
We're working on that same issue (missing docker) in slack sig-big-data
rerun integration tests please
1 similar comment
rerun integration tests please
Closes #30.