-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track PR metrics with Bencher #3849
Conversation
Thanks for opening this PR. For now I would like to use the As a more general note for bencher itself: It would be great if the frontend could group benchmarks from the same group somewhat. We likely do not want to see all benchmarks for all sizes at the same time. Either we would be interested in seeing all |
Oh, my misunderstanding. I'm sorry! I'll make the change to the
I totally agree! My current thinking is split between two ways of doing this:
I think either way, I will end up doing 2, so the question then becomes, would it be worth it to also have 1? |
cadacec
to
a03da24
Compare
16150d3
to
9dce159
Compare
Okay, I have moved things over to the There are three comments on the PR, one for each testbed: I have an option to only start commenting once there is an Alert is generated (using the Currently the "tag to run benchmarks" mechanism is in place, but it is not meant as a security feature. Once the tag is added any subsequent revisions with that tag will also be benchmarked. I can setup GitHub Actions to require a reviewer before each run if you would prefer that. However, the way that I have things set up now should be secure from pwn requests. |
I just want to leave a comment here that this PR is not forgotten. I'm away for a few days. I try to have a closer look when I'm back. |
Thanks for updating the PR to work with the comparing results on PR's. That look helpful. There are a few things I would like clarify before moving forward:
|
Thank you for the great feedback!
Great point! I've now added two sets of data points:
Metric Boundary:
Very true! I have update things so now for all public projects the PR comment only uses public links. I've also split out the benchmarking of the |
9e6c6ff
to
03c5e3d
Compare
I've been doing a lot of thinking lately about the security implications of running benchmarks from fork PRs. Was the original intention behind the "run on tag" semantics meant as a security feature (requiring review) or simply as a speed of CI optimization (on run benchmarks on PRs where it would matter)? If the intention of the "run on tag" was meant as a security feature, I think we should move over to creating a couple of GitHub Environments and add Required Reviewers for fork PRs. I have a write up on this option here: https://bencher.dev/docs/how-to/github-actions/#benchmark-fork-pr-from-target-branch-with-required-reviewers If the "run on tag" was meant more as a speed of CI optimization, I can work that into this much safer option: https://bencher.dev/docs/how-to/github-actions/#benchmark-fork-pr-and-upload-from-default-branch Please, let me know what you think, and I will update this PR accordingly! |
Thanks for the updates. It's good to see what it would look like. You mention that this approach now reuses "old" builds for the master branch as optimization. That might become problematic, as it seems like there is really a lot of noise between running benchmark jobs on different action runners. The old approach did run both the current PR and the master PR after each other on the same machine. That would enforce the all benchmarks have a similar environment. It reduces the amount of noise drastically in my opinion, which is the thing that it makes helpful for judging the performance impact of certain changes. I'm not sure that this behavior is still there with the new approach? Maybe you can clarify that?
The original reasoning behind the tag based solution is mostly that it is a speed optimization, as everything is "local" to the PR. The worst thing that could happened there was that someone submitted a PR that wast CI time. There are no results that would have been stored for a longer time, so it couldn't influence anything else. |
I do need to clarify 😃 For example, under the old system if we had 3 PRs against The statistical thresholds used by Bencher are what are meant to help deal with this noise, and we only want a single sample run against any branch per change. So yes, being able to judge the performance impact of certain changes is still here in the new approach! It is relying on the statistics, powered by the data of previous runs. The issue with relying on only a single run per version is you can't do such statistical analysis, as
This is very much a positive externality/benefit and not at all the motivation behind the change. I'm sorry for any confusion!
Okay, that makes sense. That has me leaning towards adding "run on tag" to this much safer option: https://bencher.dev/docs/how-to/github-actions/#benchmark-fork-pr-and-upload-from-default-branch |
I have updated this PR to now use the Here is a new example PR doing just that: epompeii#3 Let me know what you think! |
Thanks for the update.
Well, that true if we would have done a single run only, but we use
My main issue with this approach is that the old benchmark on HEAD#1 might have run on a completely different runner with different setup, etc. The metrics data indicate that there is quite a lot of difference between these different runs. I totally understand that you can do some statistics there to get a threshold that says: Anything above/below that is a change in the runtime, but my fear is that this threshold is really large. That would render such a system much less useful for me. |
Sounds good! I can move things over to using relative benchmarking then: https://bencher.dev/docs/how-to/track-benchmarks/#relative-benchmarking I will only run the benchmark harness itself once, as |
Okay, I've moved things back to using relative benchmarking. For security reasons, the results are uploaded as artifacts. Then in a separate You can see the results in this example pull request: epompeii#3 |
Thanks for the update. I wont have the capacity to review this PR before my Christmas holidays, I will try to do this in the new year. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for continue to work on this. It look nearly good now. I left some minor comments about the github action definition. The results + comments look ok for me, although I personally would prefer if the table posted in the comment to the PR would contain the old (main branch) and new (PR branch) timing in an explicit manner + the absolute difference in a third column, but that's a shouldn't block merging this PR.
Thanks for all the great feedback! 😃 |
(I resolved the merge conflict, CI builds will likely fail due to the recent rustc API, will take care of that later) |
@weiznich I just wanted to check in. What can I do to help get this PR across the finish line? |
There is not much that can currently be done as this is blocked on passing CI, which in turn is blocked on fixing rust-lang/rust#120830 which is essentially just needs some more time to get the PR with the fixed merged there. |
ecdc673
to
9dd67d5
Compare
This PR is a follow up from a discussion with @weiznich on adding the nightly
master
metrics to Bencher: #3848A separate Testbed has been created for all three targets in order to distinguish their results (ie
ubuntu-latest-postgres
,ubuntu-latest-sqlite
, andubuntu-latest-mysql
).Currently only the
sqlite
metrics are being ingested by Bencher, due to build failures in the other two. Please, let me know if you would like for me to try to fix those in a follow up PR.The results can be seen here: https://bencher.dev/perf/diesel
Note that it takes a hot second for all 99 results to be returned.
Those results are from a test run on
epompeii/diesel
: https://github.com/epompeii/diesel/actions/runs/6826611755/job/18566903071#step:10:3477In order for this to work on
diesel-rs/diesel
a maintainer will need to add theBENCHER_API_TOKEN
as a repository secret. I could either send over the key I setup or add one of the maintainers ofdiesel
to the project on Bencher and you can use your own API token (preferred solution).