Include failed benchmarks in output and continue #272
Conversation
I would prefer if it had N/A for the entire row of failed benchmarks. NaN usually indicates that some numerical error happened.
Thanks, I appreciate that feedback! I had originally written up a solution that more intentionally created the data in the loop (which I think is what we'll need to do eventually). I lost my changes, and I tried to quickly reimplement so that I could push something to preserve the WIP. This version ended up being much simpler to start: instead of rewriting the data generation, it just fakes the data into something that can get past the calculations (which is what produces the NaNs). I expect there will be more work needed to handle (whatever format is used for) the new empty data (anything that reads these CSVs), so I'll need to do a lot of exploring.
Could maybe do something simple, like not storing any data for failing benchmarks; then at the end when reporting the data, if there's a benchmark that was in the list to be benchmarked but has no associated data, you just report N/A for all the values?
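Something like this (sketch only; `bench_names`, `results`, and the column layout are made up for illustration, not from this repo):

```ruby
# Sketch: successful runs put their data in `results`; anything that was
# scheduled but has no entry gets N/A across the row at report time.
bench_names = ["railsbench", "fluentd", "ruby-lsp"]
results     = { "railsbench" => { itr_time_ms: 123.4 } }  # fluentd and ruby-lsp failed
columns     = [:itr_time_ms]

rows = bench_names.map do |name|
  data = results[name]
  [name, *columns.map { |col| data ? data[col].to_s : "N/A" }]
end

rows.each { |row| puts row.join("\t") }
# railsbench  123.4
# fluentd     N/A
# ruby-lsp    N/A
```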
Yeah, that's a good idea! That would be simpler than altering that loop. I also wondered if there was any useful data we could include about the failure. I think the message would be too large. Not sure if the error class might be useful.
Maybe write the stderr output to a log file?
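If we went that route, `Kernel#system` accepts spawn-style redirects, so it could be as simple as the sketch below (paths, names, and the command are hypothetical stand-ins):

```ruby
require "fileutils"

# Sketch: redirect a benchmark's stderr to a per-benchmark log file
# instead of capturing it in memory.
env        = {}
bench_name = "fluentd"
command    = "ruby -e 'warn \"boom\"; exit 1'"  # stand-in for the real benchmark command
log_path   = File.join("data", "#{bench_name}.err.log")

FileUtils.mkdir_p("data")
success = system(env, command, err: log_path)   # a string filename is opened for writing
warn "#{bench_name} failed; stderr written to #{log_path}" unless success
```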
Well, yeah, if there isn't value in jamming the failure into the data report, we could certainly just add extra output to the console.
Force-pushed from 15dd57a to 7027af3.
You can see it working in the failed CI run. Which raises a question: would we want these CI tests to fail fast, or run as much as they can?
I would personally like this CI to run everything. If it failed early, you wouldn't know what other tests could fail, which could ultimately end up wasting your time by failing again after you fix the immediate failure. It runs only one warmup iteration and one benchmark iteration, and 12 minutes doesn't seem too long for a CI job.
It doesn't appear to have ever been used and all it would have done was to merge stderr into stdout.
Force-pushed from 52535d0 to c0a767f.
I think it would be a lot more noticeable if the two failed benchmarks appeared in the table, maybe as the first rows, with N/A for all values. Makes it hard not to notice.
Injected leading rows into the table filled with N/A.
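Roughly like this (sketch only; the header and row shapes here are made up, not the harness's actual ones):

```ruby
# Sketch: put one N/A row per failed benchmark at the top of the table
# so failures are the first thing you see.
header      = ["bench", "interp (ms)", "yjit (ms)"]
failed      = ["fluentd", "ruby-lsp"]
passed_rows = [["railsbench", "1234.5", "789.0"]]

na_rows = failed.map { |name| [name, *Array.new(header.size - 1, "N/A")] }
table   = [header, *na_rows, *passed_rows]

table.each { |row| puts row.map { |cell| cell.ljust(12) }.join(" ") }
```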
    result = {}

    IO.pipe do |reader, writer|
      result[:success] = system(env, command, err: writer)
Might want to capture the exit status number here?
I considered that, but didn't see a use for it (I expected they'd all be the same). We could print what it was when we print the failure output. 🤷
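If we did want it, the status is available via `$?` after `Kernel#system`, so recording it would look roughly like this (sketch; the command here is a stand-in):

```ruby
# Sketch: record the numeric exit status alongside the boolean result.
env     = {}
command = "ruby -e 'exit 3'"   # stand-in for a failing benchmark command

result = { success: system(env, command) }
result[:exitstatus] = $?&.exitstatus   # => 3 here; nil if the command couldn't be spawned

puts "failed with exit status #{result[:exitstatus]}" unless result[:success]
```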
Fair enough. It will occasionally be different when we hit unexpected cases -- that is, when we hit different exit codes. But reasonable to not bother with it.
    result = {}

    IO.pipe do |reader, writer|
So I did #280, and it works for `--yjit-stats`. But then I also noticed I sometimes print debugging logs to stderr (not stdout, so that it doesn't influence tests, etc.). The capturing seems to make it pretty slow, and you don't see any of that output live since it's no longer streamed.
Could we do one of the following two options?
1. Stream the stderr output to the console, and also show the captured result again at the end.
2. Just avoid capturing stderr.
To my knowledge, there's no easy way to implement (1). You'd need to spawn a Thread to achieve it, like Open3 does. Because of the complexity, I'm personally in favor of (2): simply stop capturing `result[:err]`.
WDYT?
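For completeness, (1) would look roughly like the sketch below (a tee thread similar to what Open3 does; everything here is illustrative, not from this repo):

```ruby
# Sketch: stream the child's stderr to the console as it arrives while
# also capturing it, using a reader thread similar to what Open3 does.
def run_streaming_stderr(env, command)
  captured = +""
  IO.pipe do |reader, writer|
    tee = Thread.new do
      reader.each_line do |line|
        $stderr.print(line)   # stream immediately
        captured << line      # keep a copy for an end-of-run summary
      end
    end
    success = system(env, command, err: writer)
    writer.close              # child has exited; closing our end gives the tee thread EOF
    tee.join
    { success: success, err: captured }
  end
end

p run_streaming_stderr({}, "ruby -e 'warn \"oops\"; exit 1'")
```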
👍 to avoiding capturing stderr; it's enough to scroll up to find the error(s), and we definitely don't want to delay showing those.
Yeah, with the summary at the end it's obvious that some failed, and even if you can't scroll back you can always run individual ones again to see the errors. Always happy to make it simpler: #281
Running the benchmarks where `fluentd` and `ruby-lsp` fail (on some executables) will generate data for all the successful benchmarks and produce output like this, which reprints any failure output at the end and exits non-zero.
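Conceptually, the end-of-run handling amounts to something like this (sketch only; the captured output strings are placeholders):

```ruby
# Sketch: reprint each failure's captured output after the table and
# exit non-zero so CI marks the run as failed.
failures = {
  "fluentd"  => "(captured stderr/stdout for fluentd)",
  "ruby-lsp" => "(captured stderr/stdout for ruby-lsp)",
}

unless failures.empty?
  failures.each do |bench, output|
    puts "-" * 60
    puts "#{bench} failed:"
    puts output
  end
  exit 1
end
```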
As for the `./data/output_*` files:
- The `.txt` file contains the same output as the console (including the N/A rows).
- The `.csv` file contains only rows for benchmarks that passed on all executables (see the sketch below).
- The `.json` file shows the data that passed (in this case "fluentd" appears under "yjit" but not under the other two).