max-time-per-run #576

Open
christianhujer opened this issue Oct 8, 2022 · 4 comments


@christianhujer

For benchmarking, we already have -P, --parameter-scan, and its more flexible counterpart -L, --parameter-list, and that's great. As the example shows, we can use this to run hyperfine like this:

hyperfine -p 'make clean' -P threads 1 8 'make -j {threads}'

I've now found a use case where a possibility to have hyperfine limit the benchmarking runs based on elapsed time could be useful.

hyperfine -L dir 'C,Rust,bash' -L N 10,20,30,40 'make -C {dir} fibonacci-recursive-benchmark N={N}'

For benchmarks where performance varies greatly, like between C and bash, it could occasionally be useful to present results as "aborted (took too long)" by having a --max-time-per-run <TIME> argument, for example --max-time-per-run 2s, that automatically terminates a run and its associated benchmark when a run takes longer than --max-time-per-run. The aborted values could be output as ∞.

@sharkdp
Owner

sharkdp commented Oct 13, 2022

Thank you for your request.

I think the implementation of such a feature would require a LOT of special cases downstream to properly handle the absence of values.

Have you considered using something like timeout to limit the time (in combination with hyperfine's --ignore-failure option to ignore the non-zero exit code)? That would not show something like "aborted" or "∞", but it would run into the time limit, and that shows up in the results:

▶ hyperfine --ignore-failure -L time 1,5 'timeout 2 sleep {time}' 
Benchmark 1: timeout 2 sleep 1
  Time (mean ± σ):      1.002 s ±  0.000 s    [User: 0.001 s, System: 0.002 s]
  Range (min … max):    1.001 s …  1.002 s    10 runs
 
Benchmark 2: timeout 2 sleep 5
  Time (mean ± σ):      2.001 s ±  0.000 s    [User: 0.002 s, System: 0.001 s]
  Range (min … max):    2.001 s …  2.002 s    10 runs
 
  Warning: Ignoring non-zero exit code.
 
Summary
  'timeout 2 sleep 1' ran
    2.00 ± 0.00 times faster than 'timeout 2 sleep 5'
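The truncation in the output above can be reproduced without hyperfine at all. A minimal sketch using only coreutils (timeout, sleep, date): timeout caps the wall time of the wrapped command and reports the abort via exit code 124, which is the non-zero status that --ignore-failure suppresses.

```shell
# Time a command that would run for 5 s, capped at 1 s by timeout.
start=$(date +%s%N)
timeout 1 sleep 5
status=$?
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))
# timeout signals the abort with exit code 124; wall time is ~1 s, not 5 s.
echo "status=$status elapsed_ms=$elapsed_ms"
```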

In general: why would you be interested in including benchmarks that would potentially run into a time limit?

@sharkdp
Owner

sharkdp commented Oct 13, 2022

see also: #106

@christianhujer
Author

christianhujer commented Oct 23, 2022

To answer the question "why would I be interested in including benchmarks that would potentially run into a time limit?":

I am benchmarking a matrix of languages and programs automatically.
From my Makefile:

ALL:=$(patsubst %/,%,$(filter-out \
    asm-m68k-amiga-gasm/ \
    asm-m68k-amiga-masm/ \
    asm-m68k-amiga2-masm/ \
    Carbon/ \
    Concurnas/ \
    Logo/ \
    , $(wildcard */)))

.PHONY: hyperfine-roundtrip
hyperfine-roundtrip: hyperfine-roundtrip.csv
hyperfine-roundtrip.csv:
	hyperfine --export-csv hyperfine-roundtrip.csv -L variant $(shell echo $(ALL) | sed -e 's/ /,/g') -p 'make -C {variant} clean' 'make -sC {variant}'

I think you can see how well hyperfine works for this case. ❤️

Before hyperfine, my Makefile looked like this:

.PHONY: time-%
time-%:
	@# Requires bash (SHELL := /bin/bash) for the C-style for loop; the
	@# recipe lines are continued with `\` so they run in a single shell.
	@for ((i = 0; i < 10; i++)); do \
		$(MAKE) -s -C $* clean 2>&1; \
		start=$$(date -u +'%s%N'); \
		$(MAKE) -s -C $* >/dev/null 2>&1; \
		end=$$(date -u +'%s%N'); \
		echo '$*,'$$(($$end - $$start)); \
	done

time.csv:
	echo 'Language,time (ns)' >$@
	$(MAKE) -s time >>$@

clean::
	$(RM) time.csv

time-processed.csv: time.csv
	# Each recipe line runs in its own shell, so the script is piped to
	# sqlite3 via printf instead of a here-document.
	printf '%s\n' \
		'.mode csv' \
		'.import time.csv times' \
		'select "Language", ((1.0 * sum("time (ns)") - max("time (ns)") - min("time (ns)")) / (count("time (ns)") - 2.0)) / 1000000 as "Time (ms)" from times group by "Language" order by "Time (ms)";' \
		| sqlite3 >>$@
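The select statement computes a trimmed mean: drop the single fastest and slowest run, average the rest, and convert ns to ms. The same aggregation can be sketched in plain shell, with made-up sample values standing in for one language's rows in time.csv:

```shell
# Five hypothetical run times in ns for one language (illustrative values).
times='5000000 9000000 6000000 7000000 6500000'
# Sort, drop min and max (first and last line), average the rest, ns -> ms.
trimmed=$(printf '%s\n' $times | sort -n | sed '1d;$d' \
  | awk '{ sum += $1; n++ } END { printf "%.2f", sum / n / 1000000 }')
echo "Time (ms): $trimmed"   # 6.50 for the values above
```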

Using timeout as a wrapper will work from a functional perspective, but the measurement would no longer cover just the target program: it would be timeout plus the target program. One would then have to benchmark timeout itself and subtract that overhead, by first benchmarking true, then benchmarking timeout true, and subtracting the former from the latter. That's why having this feature in hyperfine itself would be great.
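The subtraction idea can be sketched with a crude timing loop standing in for hyperfine (purely illustrative; the iteration count is made up and date-based timing is far noisier than hyperfine's measurements):

```shell
# Rough mean wall time per invocation, in microseconds, over 30 runs.
measure() {
  local start end
  start=$(date +%s%N)
  for _ in $(seq 1 30); do "$@" >/dev/null 2>&1; done
  end=$(date +%s%N)
  echo $(( (end - start) / 30 / 1000 ))
}
base=$(measure true)              # baseline: the (builtin) true command
wrapped=$(measure timeout 1 true) # baseline plus the timeout wrapper
echo "timeout overhead ~ $((wrapped - base)) us per run"
```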

For a lot of purposes, timeout will work fine; this feature is not essential.
It only matters when some of the results are so fast that the time it takes to run timeout itself (I guess 3-6 ms) makes a significant difference.

(I'm measuring roundtrip times of programming languages, and they can range from a few ms in Perl or Assembler to many seconds like Flix, and it also heavily depends on the problem statement.)

@sharkdp
Owner

sharkdp commented Nov 19, 2022

> For a lot of purposes, timeout will work fine; this feature is not essential.
> It only matters when some of the results are so fast that the time it takes to run timeout itself (I guess 3-6 ms) makes a significant difference.

Don't guess - measure 😄

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `fd` | 12.2 ± 0.9 | 10.5 | 14.7 | 1.00 |
| `timeout 2s fd` | 12.9 ± 0.8 | 11.1 | 15.7 | 1.06 ± 0.10 |

The overhead seems to be below 1 ms.

@sharkdp sharkdp changed the title Feature Request: max-time-per-run max-time-per-run Apr 17, 2023