Include failed benchmarks in output and continue #272
Conversation
I would prefer if it had N/A for the entire row of failed benchmarks. NaN usually indicates that some numerical error happened.
Thanks, I appreciate that feedback! I had originally written up a solution that more intentionally created the data in the loop (which I think is what we'll need to do eventually). I lost my changes, and I tried to quickly reimplement so that I could push something to preserve the WIP. This version ended up being much simpler to start: instead of rewriting the data generation, it just fakes the data into something that can get past the calculations (which is what produces the NaNs). I expect there will be more work needed to handle (whatever format is used for) the new empty data (anything that reads these CSVs), so I'll need to do a lot of exploring.
Could maybe do something simple, like not storing any data for failing benchmarks; then at the end when reporting the data, if there's a benchmark that was in the list to be benchmarked but has no associated data, you just report N/A for all the values?
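Something like this (sketch only; `bench_names`, `results`, and the column layout are made up for illustration, not from this repo):

```ruby
# Sketch: successful runs put their data in `results`; anything that was
# scheduled but has no entry gets N/A across the row at report time.
bench_names = ["railsbench", "fluentd", "ruby-lsp"]
results     = { "railsbench" => { itr_time_ms: 123.4 } }  # fluentd and ruby-lsp failed
columns     = [:itr_time_ms]

rows = bench_names.map do |name|
  data = results[name]
  [name, *columns.map { |col| data ? data[col].to_s : "N/A" }]
end

rows.each { |row| puts row.join("\t") }
# railsbench  123.4
# fluentd     N/A
# ruby-lsp    N/A
```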
Yeah, that's a good idea! That would be simpler than altering that loop. I also wondered if there was any useful data we could include about the failure. I think the message would be too large. Not sure if the error class might be useful.
Maybe write the stderr output to a log file?
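If we went that route, `Kernel#system` accepts spawn-style redirects, so it could be as simple as the sketch below (paths, names, and the command are hypothetical stand-ins):

```ruby
require "fileutils"

# Sketch: redirect a benchmark's stderr to a per-benchmark log file
# instead of capturing it in memory.
env        = {}
bench_name = "fluentd"
command    = "ruby -e 'warn \"boom\"; exit 1'"  # stand-in for the real benchmark command
log_path   = File.join("data", "#{bench_name}.err.log")

FileUtils.mkdir_p("data")
success = system(env, command, err: log_path)   # a string filename is opened for writing
warn "#{bench_name} failed; stderr written to #{log_path}" unless success
```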
Well, yeah, if there isn't value in jamming the failure into the data report, we could certainly just add extra output to the console.
Force-pushed from 15dd57a to 7027af3.
You can see it working in the failed CI run. Which raises a question: would we want these CI tests to fail fast, or run as much as they can?
I would personally like this CI to run everything. If it failed early, you wouldn't know what other tests could fail, which could ultimately end up wasting your time by failing again after you fix the immediate failure. It runs only one warmup iteration and one benchmark iteration, and 12 minutes doesn't seem too long for a CI job.
It doesn't appear to have ever been used and all it would have done was to merge stderr into stdout.
Force-pushed from 52535d0 to c0a767f.
I think it would be a lot more noticeable if the two failed benchmarks appeared in the table, maybe as the first rows, with N/A for all values. Makes it hard not to notice.
Injected leading rows into the table filled with N/A.
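Roughly like this (sketch only; the header and row shapes here are made up, not the harness's actual ones):

```ruby
# Sketch: put one N/A row per failed benchmark at the top of the table
# so failures are the first thing you see.
header      = ["bench", "interp (ms)", "yjit (ms)"]
failed      = ["fluentd", "ruby-lsp"]
passed_rows = [["railsbench", "1234.5", "789.0"]]

na_rows = failed.map { |name| [name, *Array.new(header.size - 1, "N/A")] }
table   = [header, *na_rows, *passed_rows]

table.each { |row| puts row.map { |cell| cell.ljust(12) }.join(" ") }
```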
    result = {}

    IO.pipe do |reader, writer|
      result[:success] = system(env, command, err: writer)
Might want to capture the exit status number here?
I considered that, but didn't see a use for it (I expected they'd all be the same). We could print what it was when we print the failure output. 🤷
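If we did want it, the status is available via `$?` after `Kernel#system`, so recording it would look roughly like this (sketch; the command here is a stand-in):

```ruby
# Sketch: record the numeric exit status alongside the boolean result.
env     = {}
command = "ruby -e 'exit 3'"   # stand-in for a failing benchmark command

result = { success: system(env, command) }
result[:exitstatus] = $?&.exitstatus   # => 3 here; nil if the command couldn't be spawned

puts "failed with exit status #{result[:exitstatus]}" unless result[:success]
```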
Fair enough. It will occasionally be different when we hit unexpected cases -- that is, when we hit different exit codes. But reasonable to not bother with it.
    result = {}

    IO.pipe do |reader, writer|
So I did #280, and it works for `--yjit-stats`. But then I also noticed I sometimes print debugging logs to stderr (not stdout, so that it doesn't influence tests, etc.). The capturing seems to make it pretty slow, and you don't see any of that output live since it's no longer streamed.
Could we do one of the following two options?
1. Stream the stderr output to the console, and also show the captured result again at the end.
2. Just avoid capturing stderr.
To my knowledge, there's no easy way to implement (1). You'd need to spawn a Thread to achieve it, like Open3 does. Because of the complexity, I'm personally in favor of (2): simply stop capturing `result[:err]`.
WDYT?
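For completeness, (1) would look roughly like the sketch below (a tee thread similar to what Open3 does; everything here is illustrative, not from this repo):

```ruby
# Sketch: stream the child's stderr to the console as it arrives while
# also capturing it, using a reader thread similar to what Open3 does.
def run_streaming_stderr(env, command)
  captured = +""
  IO.pipe do |reader, writer|
    tee = Thread.new do
      reader.each_line do |line|
        $stderr.print(line)   # stream immediately
        captured << line      # keep a copy for an end-of-run summary
      end
    end
    success = system(env, command, err: writer)
    writer.close              # child has exited; closing our end gives the tee thread EOF
    tee.join
    { success: success, err: captured }
  end
end

p run_streaming_stderr({}, "ruby -e 'warn \"oops\"; exit 1'")
```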
👍 to avoiding capturing stderr; it's enough to scroll up to find the error(s), and we definitely don't want to delay showing those.
Yeah, with the summary at the end it's obvious that some failed, and even if you can't scroll back you can always run individual ones again to see the errors. Always happy to make it simpler: #281
Running the benchmarks where `fluentd` and `ruby-lsp` fail (on some executables) will generate data for all the successful benchmarks and produce output like this, which reprints any failure output at the end and exits non-zero.
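Conceptually, the end-of-run handling amounts to something like this (sketch only; the captured output strings are placeholders):

```ruby
# Sketch: reprint each failure's captured output after the table and
# exit non-zero so CI marks the run as failed.
failures = {
  "fluentd"  => "(captured stderr/stdout for fluentd)",
  "ruby-lsp" => "(captured stderr/stdout for ruby-lsp)",
}

unless failures.empty?
  failures.each do |bench, output|
    puts "-" * 60
    puts "#{bench} failed:"
    puts output
  end
  exit 1
end
```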
As for the `./data/output_*` files:
- The `.txt` file contains the same output as the console (including the N/A rows).
- The `.csv` file contains only rows for benchmarks that passed on all executables (see the sketch below).
- The `.json` file shows the data that passed (in this case "fluentd" appears under "yjit" but not under the other two).