Include failed benchmarks in output and continue #272

Merged
merged 4 commits into main from rwstauner/continue-on-error on Jan 24, 2024

Conversation

@rwstauner (Contributor) commented Jan 20, 2024

Running

./run_benchmarks.rb --once fluentd nqueens ruby-lsp setivar -e "head::$HOME/.rubies/ruby-dev/bin/ruby;yjit::$HOME/.rubies/ruby-dev/bin/ruby --enable-yjit;prism::$HOME/.rubies/ruby-dev/bin/ruby --parser=prism"

Here, fluentd and ruby-lsp fail (on some executables). The run still generates data for all the successful benchmarks and produces output like the following, which reprints any failure output at the end and exits non-zero.

As for the ./data/output_* files:

  • The .txt file contains the same output as the console (including the N/A rows).
  • The .csv file contains only rows for benchmarks that passed for all executables.
  • The .json file contains only data that passed (in this case "fluentd" appears under "yjit" but not under the other two executables).
itr #1: 38ms
RSS: 13.6MiB
MAXRSS: 14512.0MiB

Error output for fluentd (head):
/Users/rwstauner/src/github.com/Shopify/yjit-bench/benchmarks/fluentd/benchmark.rb:6:in `<main>': yjit (RuntimeError)


Error output for fluentd (prism):
/Users/rwstauner/.rubies/ruby-dev/bin/ruby: warning: The compiler based on the Prism parser is currently experimental and compatibility with the compiler based on parse.y is not yet complete. Please report any issues you find on the `ruby/prism` issue tracker.
/Users/rwstauner/src/github.com/Shopify/yjit-bench/benchmarks/fluentd/benchmark.rb:5:in `<compiled>': yjit (RuntimeError)


Error output for ruby-lsp (prism):
/Users/rwstauner/.rubies/ruby-dev/bin/ruby: warning: The compiler based on the Prism parser is currently experimental and compatibility with the compiler based on parse.y is not yet complete. Please report any issues you find on the `ruby/prism` issue tracker.
/Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/signature.rb:70:in `initialize': undefined method `any?' for nil (NoMethodError)
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:373:in `new'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:373:in `build_sig'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:342:in `run_sig'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:242:in `block in _on_method_added'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:443:in `run_sig_block_for_key'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:423:in `maybe_run_sig_block_for_key'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/methods/_methods.rb:252:in `block in _on_method_added'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/_props.rb:34:in `new'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/_props.rb:34:in `decorator'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/_props.rb:148:in `included'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/private/mixins/mixins.rb:8:in `included'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/optional.rb:4:in `include'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/optional.rb:4:in `<module:Optional>'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/types/props/optional.rb:3:in `<compiled>'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/sorbet-runtime.rb:102:in `require_relative'
        from /Users/rwstauner/.gem/ruby/dev/gems/sorbet-runtime-0.5.11212/lib/sorbet-runtime.rb:102:in `<compiled>'
        from /Users/rwstauner/.rubies/ruby-dev/lib/ruby/3.4.0+0/bundled_gems.rb:73:in `require'
        from /Users/rwstauner/.rubies/ruby-dev/lib/ruby/3.4.0+0/bundled_gems.rb:73:in `block (2 levels) in replace_require'
        from /Users/rwstauner/.gem/ruby/dev/gems/ruby-lsp-0.13.4/lib/ruby_lsp/internal.rb:8:in `<compiled>'
        from /Users/rwstauner/.rubies/ruby-dev/lib/ruby/3.4.0+0/bundled_gems.rb:73:in `require'
        from /Users/rwstauner/.rubies/ruby-dev/lib/ruby/3.4.0+0/bundled_gems.rb:73:in `block (2 levels) in replace_require'
        from /Users/rwstauner/src/github.com/Shopify/yjit-bench/benchmarks/ruby-lsp/benchmark.rb:10:in `<compiled>'

Total time spent benchmarking: 5s
Failed benchmarks: 3

head: ruby 3.4.0dev (2024-01-18T15:35:46Z master 00814fd672) [arm64-darwin23]
yjit: ruby 3.4.0dev (2024-01-18T15:35:46Z master 00814fd672) +YJIT [arm64-darwin23]
prism: ruby 3.4.0dev (2024-01-18T15:35:46Z master 00814fd672) +PRISM [arm64-darwin23]

--------  ---------  ----------  ---------  ----------  ----------  ----------  ------------  -------------  ---------  ----------
bench     head (ms)  stddev (%)  yjit (ms)  stddev (%)  prism (ms)  stddev (%)  yjit 1st itr  prism 1st itr  head/yjit  head/prism
fluentd   N/A        N/A         N/A        N/A         N/A         N/A         N/A           N/A            N/A        N/A
ruby-lsp  N/A        N/A         N/A        N/A         N/A         N/A         N/A           N/A            N/A        N/A
nqueens   166.4      0.0         166.8      0.0         166.3       0.0         1.00          1.00           1.00       1.00
setivar   38.5       0.0         38.5       0.0         38.4        0.0         1.00          1.00           1.00       1.00
--------  ---------  ----------  ---------  ----------  ----------  ----------  ------------  -------------  ---------  ----------
Legend:
- yjit 1st itr: ratio of head/yjit time for the first benchmarking iteration.
- head/yjit: ratio of head/yjit time. Higher is better for yjit. Above 1 represents a speedup.
- prism 1st itr: ratio of head/prism time for the first benchmarking iteration.
- head/prism: ratio of head/prism time. Higher is better for prism. Above 1 represents a speedup.

Output:
./data/output_158.csv

Failed benchmarks:
  head: fluentd
  prism: fluentd, ruby-lsp

@maximecb (Contributor)

I would prefer if it had N/A for the entire row of failed benchmarks.

NaN usually indicates that some numerical error happened.

@rwstauner (Contributor, Author)

Thanks, I appreciate that feedback!

I had originally written a solution that more intentionally built the data in the loop (which I think is what we'll need to do eventually). I lost those changes and quickly reimplemented it so that I could push something to preserve the WIP. This version ended up being much simpler as a starting point: instead of rewriting the data generation, it just fakes the data into something that can get past the calculations (which is what produces the NaNs).

I expect more work will be needed for anything that reads these CSVs to handle whatever format we use for the new empty data, so I'll need to do a lot of exploring.

@maximecb (Contributor)

Could we maybe do something simple, like not storing any data for failing benchmarks? Then at the end, when reporting the data, if a benchmark was in the list to be benchmarked but has no associated data, you just report N/A for all its values.
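
Something roughly like this, perhaps (a hypothetical sketch, not code from this PR; bench_names, results, table, and column_count are made-up names for the benchmark list, the collected data, the report rows, and the number of value columns):

# Hypothetical sketch: emit an all-N/A row for any requested benchmark with no data.
bench_names.each do |bench|
  row = results[bench]
  if row
    table << [bench, *row]
  else
    # The benchmark was in the list but produced no data (it failed somewhere).
    table << [bench, *Array.new(column_count, "N/A")]
  end
end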

@rwstauner (Contributor, Author)

Yeah, that's a good idea! That would be simpler than altering that loop.

I also wondered whether there was any useful data we could include about the failure. The full message would probably be too large, and I'm not sure whether the error class would be useful.
Either way, I'd have to see how anything downstream would handle the text in the data.

@maximecb (Contributor)

Maybe write the stderr output to a log file?
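
For instance (a hypothetical one-liner, not from this PR; failures_log is a made-up path, and the redirection uses the standard err: [file, mode] spawn option that system accepts):

system(env, command, err: [failures_log, "a"]) # append the child's stderr to a log file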

@rwstauner (Contributor, Author)

Well, yeah, if there isn't value in jamming the failure into the data report, we could certainly just add extra output to the console.
That would be a quick way to start, at least!
OK, I'll pursue that, and we can decide whether to do anything else with it down the road.

@rwstauner force-pushed the rwstauner/continue-on-error branch 2 times, most recently from 15dd57a to 7027af3 on January 22, 2024 at 21:03
@rwstauner (Contributor, Author) commented Jan 22, 2024

You can see it working in the failed head build:
https://github.com/Shopify/yjit-bench/actions/runs/7617594824/job/20746942610?pr=272#step:5:1376
(fixed with #274)

Which raises a question: Would we want these CI tests to fail fast? Or run as much as they can?

@k0kubun (Member) commented Jan 22, 2024

Would we want these CI tests to fail fast? Or run as much as they can?

I would personally like this CI to run everything. If it failed early, you wouldn't know what other tests could fail, which could ultimately waste your time by failing again after you fix the immediate failure. It runs only one warmup iteration and one benchmark iteration, and 12 minutes doesn't seem too long for a CI job.

@rwstauner force-pushed the rwstauner/continue-on-error branch from 52535d0 to c0a767f on January 22, 2024 at 21:49
@rwstauner marked this pull request as ready for review on January 22, 2024 at 21:49
@maximecb (Contributor)

I think it would be a lot more noticeable if the two failed benchmarks appeared in the table, maybe as the first rows, with N/A for all values. Makes it hard not to notice.

@rwstauner (Contributor, Author)

Injected leading rows, filled with N/A, into the table for any benchmarks that failed.

result = {}

IO.pipe do |reader, writer|
  result[:success] = system(env, command, err: writer)

Contributor:

Might want to capture the exit status number here?

Contributor (Author):

I considered that, but didn't see a use for it (I expected they'd all be the same).
We could print it when we print the failure output. 🤷
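
For reference, capturing it would be a one-line addition (a sketch only; result, env, command, and writer come from the snippet under review, while $? is Ruby's status of the last spawned process):

result[:success] = system(env, command, err: writer)
# $? holds the Process::Status of the command run by system above;
# exitstatus is nil if the child was killed by a signal instead of exiting.
result[:exit_status] = $?&.exitstatus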

Contributor:

Fair enough. It will occasionally be different when we hit unexpected cases -- that is, when we hit different exit codes. But it's reasonable not to bother with it.

@maximecb merged commit 88809c9 into main on Jan 24, 2024
4 checks passed
@maximecb deleted the rwstauner/continue-on-error branch on January 24, 2024 at 16:42
@k0kubun mentioned this pull request on Jan 25, 2024

result = {}

IO.pipe do |reader, writer|

Member:

So I did #280, and it works for --yjit-stats. But then I also noticed that I sometimes print debugging logs to stderr (not stdout, so that they don't influence tests, etc.). The capturing seems to make it pretty slow, and you don't see any of that output while the benchmark runs, since stderr is no longer streamed.

Can we do either of the following two options?

  1. Stream the stderr output to the console as it happens, and show the captured result again at the end.
  2. Just avoid capturing stderr.

To my knowledge, there's no easy way to implement (1). You'd need to spawn a Thread to achieve it like Open3 does. Because of the complexity, I'm personally in favor of (2); simply stop capturing result[:err].
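
For illustration, (1) might look roughly like this (a sketch only, not code from this PR; it reuses env and command from the snippet above and spawns a reader thread the way Open3 does):

result = {}
IO.pipe do |reader, writer|
  captured = +""
  # Reader thread: stream each stderr chunk to the console as it arrives,
  # keeping a copy so the failure output can be reprinted at the end.
  tee = Thread.new do
    until reader.eof?
      chunk = reader.readpartial(4096)
      $stderr.write(chunk)
      captured << chunk
    end
  end
  result[:success] = system(env, command, err: writer)
  writer.close # close the parent's write end so the reader thread sees EOF
  tee.join
  result[:err] = captured
end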

WDYT?

@eregon (Contributor) commented Jan 26, 2024

👍 to not capturing stderr; it's enough to scroll up to find the error(s), and we definitely don't want to delay showing those.

@rwstauner (Contributor, Author) commented Jan 26, 2024

Yeah, the summary at the end makes it obvious that some benchmarks failed, and even if you can't scroll back, you can always run the individual ones again to see the errors. Always happy to make it simpler: #281
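
For example, rerunning a single failed benchmark against just the failing executable (same --once and -e flags, and the same name::command syntax, as the invocation at the top of this PR description):

./run_benchmarks.rb --once ruby-lsp -e "prism::$HOME/.rubies/ruby-dev/bin/ruby --parser=prism"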
