How to measure a Halide::Func #5710
Unanswered
TongMoNumb
asked this question in
Q&A
One thing you can do to get some understanding at the per-Func level is to enable Halide's built-in profiler. If your code is in a generator, in CMake you would do something like the following:
add_halide_library(<some_function> FROM <some_generator>
                   TARGETS ${Halide_CMAKE_TARGET}-profile  # <- `-profile` here
                   AUTOSCHEDULER ${SCHEDULER_TYPE}
                   SCHEDULE OUTVAR
                   PARAMS auto_schedule=true)
With the profile feature enabled, running the pipeline prints a report like this:
<pipeline>
total time: 1071.243164 ms samples: 995 runs: 10 time/run: 107.124313 ms
average threads used: 7.522613
heap allocations: 2880 peak heap usage: 1357504 bytes
<func1>: 1.057ms (33%) threads: 6.400
<func2>: 1.057ms (33%) threads: 6.400
<func3>: 1.057ms (33%) threads: 6.400
Hi All,
I have been experimenting with the Adams2019 autoscheduler code recently. During the beam search I can extract the Halide::Func objects and the stages of the pipeline, and I would like to know how to measure the pipeline on hardware in this situation.
Take the matmul case as an example. Based on the tutorials, to measure a pipeline I should follow steps like:
(1) define var
(2) compile_jit(target)
(3) realize it.
But now I do not know the exact input size; all I have are the Funcs I obtained:
(1) Func output
produce output:
  for y.y in [0, 127]:
    unrolled y.yi in [0, 11]:
      for x.x in [0, 1]:
        for x.xi.xi in [0, 47]:
          produce matrix_mul:
            vectorized x.xi in [0, 15]:
              matrix_mul(...) = ...
            vectorized x.xi in [0, 15]:
              for r8 in [0, 1535]:
                matrix_mul(...) = ...
          consume matrix_mul:
            vectorized x.xi.xii in [0, 15]:
              output(...) = ...
(2) Func Matmul
produce matrix_mul:
  unrolled y:
    for x.x:
      vectorized x.xi in [0, 15]:
        matrix_mul(...) = ...
  unrolled y:
    for x.x:
      vectorized x.xi in [0, 15]:
        for r8 in [0, 1535]:
          matrix_mul(...) = ...
(3) Func Input
produce input_b_im:
  for _1:
    for _0:
      input_b_im(...) = ...
produce input_a_im:
  for _1:
    for _0:
      input_a_im(...) = ...
These are the four Halide::Funcs I can obtain (output, matrix_mul, input_b_im and input_a_im). Based on these, how can I combine them into a multi-stage pipeline, and how can I determine the input size from the Funcs?
Thanks in advance!