-
Notifications
You must be signed in to change notification settings - Fork 390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Speed up repository builds in the test suite #1202
[MRG] Speed up repository builds in the test suite #1202
Conversation
The thing to consider about this PR is about the removal of the gitlab build. I think it is far more valuable to save any local developer or someone waiting for the tests to complete to save time than to do what repo2docker already does in its tests. |
2495818
to
492a728
Compare
As I wrote in the issue: I disagree about removing the GitLab test. For local development you can simply not run tests you don't care about/only run tests you care about. For CI runs saving a few minutes is not worth it because CI minutes are something we have a lot of compared to human time required to react to downtime on mybinder.org. The BinderHub repo is special in the sense that we have to have a very high probability that We can speed up the GitLab test by modifying the contents of the repository to for example something like https://github.com/binder-examples/minimal-dockerfile which is about as fast to build as it gets. Why is there an issue and this PR? It feels like living in a parallel universe because there are two places to discuss the same thing. I think there is no need to create an issue if a PR for that issue is created straight away. Let's either only make a PR or make an issue which is only followed up with a PR after discussion. |
Updated the title of the PR to something more descriptive. WDYT? |
binderhub/tests/test_build.py
Outdated
"gh/binderhub-ci-repos/requirements/d687a7f9e6946ab01ef2baa7bd6d5b73c6e904fd", | ||
"git/{}/d687a7f9e6946ab01ef2baa7bd6d5b73c6e904fd".format( | ||
"gh/binderhub-ci-repos/requirements/20c4fe55a9b2c5011d228545e821b1c7b1723652", | ||
"git/{}/20c4fe55a9b2c5011d228545e821b1c7b1723652".format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use HEAD here to avoid manually changing the ref for the tests? That also tests the default behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the /master test was added, it was added with the motivation of testing a not already full git hash. Do we want to keep testing a full git hash? I suggest not as we compromise on needing to update this repo to avoid a performance degradation if we add another commit to the test repo.
I favor not using any full git commit reference when building images like we do here, but instead stick to using HEAD only. I suggest we retain a full git commit hash in other tests instead to capture that variation if it is relevant to capture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a test case for HEAD is a good idea, especially because that is the default now.
I'd also keep the explicit reference case. My reasoning is that it tests a case that isn't the default and infrequently used/developed for. As a developer you might test master/main/HEAD explicitly but rarely think "ah yes, let's make sure explicit commit hashes still work". This makes it a good task for robots because they don't get bored of testing the same stuff over and over again.
Maybe we can remove the manual work of having to update the commit hash by making the test find out the latest hash and then updating the build slug. Like a templated build slug. However we commit so rarely to the CI repos used here that I'd not spend a lot of time on building this automation and instead would add a comment to let performance improvers from the future know that there is a low hanging fruit here.
I understand the friction you have experienced @betatim, I mean to avoid that. I first created the issue, and an hour later or so I decided to invest my time to try address it in a PR.
Wieee yes! Let's do https://github.com/binder-examples/minimal-dockerfile builds! To not need to rebuild the entire repo2docker boilerplate is the biggest improvement we can get I think. |
nods. It is tricky because you want to create an issue when you think of it and don't know if you'll have time to work on it or not. So when you then shortly after do have time, what to do? I don't know. Maybe we should then post in the issue saying "Please discuss in the PR"? No good ideas from my side :(
Lets keep using the GitLab repository but change its contents, not switch to the binder-examples repo. |
I think that was a good idea. Updating the issue to reflect there is now an open PR, and declaring where the discussion should go, and in general I figure it should go in the PR where some of the discussion always will need to go as it is where the changes are declared practically. Action points
|
How many times do we run test_build, and with what arguments?I think this is mutually agreed, so I'll make that explicit.
GitLab or not?I accept building a GitLab repo alongside the GitHub repo if needed, but would like to have a motivation that it is still needed even though we have a GitLab resolver test in Dedicated hash, HEAD, master?Since we have this test trying to resolve various URLs etc, I think the only thing we need to test in test_build, is if we manage to handle weird URLs in practice, but not the resolution of references. Due to this, I suggest the test_build function is run twice. Currently in master
SuggestedI suggest we combine the dedicated commit hash, the URL quoting of the commit hash, and the master branch reference into one.
|
492a728
to
e5ea660
Compare
4b49b6d
to
2ad44de
Compare
Performance testingpytest --durations=0 cached-minimal-dockerfile repo First repo provider build takes ~65-85s (cached) or ~85-105s (non cached) Second repo provider build requests takes ~5s (assuming no image build pod is started) Comparative examples
Conclusions
Action points |
ae61547
to
413247b
Compare
@betatim @minrk I have documented the new minimal-dockerfile and cached-minimal-dockerfile repos in https://github.com/binderhub-ci-repos/minimal-dockerfile. I want to highlight some outcomes of this setup.
Together, this makes us able to update the minimal-dockerfile with changes without thinking about commit changes, as we only reference cached-minimal-dockerfile, and by automatically updating the minimal-dockerfile image by daily rebuilds, we ensure that don't start to fail because the built image has become outdated. It also makes our build duration go from ~20s to ~5s. I have now optimized as much as I can, and it is not a question on how much tests variations should be run. Each additional repo provider we add adds ~65+5 seconds if we do a HEAD+SHA test against a cached-minimal-dockerfile. In the PRs current state we test 3 repo providers with HEAD+SHA. |
@betatim okay to merge? |
I've not had time to review this yet. Is there a reason to be in a hurry merging this optimisation? |
Thanks for working on this optimisation |
jupyterhub/binderhub#1202 Merge pull request #1202 from consideRatio/pr/ci-test-build-speedup
Thank you for getting this merged @betatim! ❤️
Its a personal benefit to a large degree. It improves my own focus to avoid context switching as becomes more needed if a PRs active life is longer rather than shorter. I also personally care about having this and #1184 merged before resuming work on #1101 relying on the features part of 0.10.X of the z2jh Helm chart. Of course I can start without them, but it just adds a bit more complexity to manage when having PRs build on other PRs which may be subject to change still etc. Thank you for your work to maintain this repo Tim! |
Closes #1199 by #1202 (comment)