[benchmarks] don't fail on suite setup issues #2654

Open: wants to merge 1 commit into main

pbalcer (Contributor) commented Feb 3, 2025

No description provided.

pbalcer requested a review from a team as a code owner on February 3, 2025 at 11:59

github-actions bot commented Feb 3, 2025

Compute Benchmarks level_zero run (with params: --sycl-target intel_gpu_pvc):
https://github.com/oneapi-src/unified-runtime/actions/runs/13112970891

github-actions bot added the ci/cd (Continuous integration/delivery) label on Feb 3, 2025

github-actions bot commented Feb 3, 2025

Compute Benchmarks level_zero run (--sycl-target intel_gpu_pvc):
https://github.com/oneapi-src/unified-runtime/actions/runs/13112970891
Job status: success. Test status: success.

Summary

79 benchmarks in total contribute to the mean.
Geomean: 100.132%.
Improved: 13, Regressed: 9 (threshold 2.00%).

(A relative perf above 100% means the result in this PR is better than the baseline.)
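The summary figures can be recomputed from the per-benchmark values. The following is a minimal Python sketch under assumptions inferred from the numbers in the tables (not from the benchmark scripts themselves): relative perf is baseline / this PR for time-like metrics and this PR / baseline for throughput-like metrics, a group's figure is the geometric mean of its members' relative perfs, and a change counts as improved or regressed only when it exceeds the 2.00% threshold. The names `Result`, `relative_perf`, `geomean` and `classify` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    this_pr: float
    baseline: float
    lower_is_better: bool = True  # True for times (μs, ns), False for throughput (GB/s)


def relative_perf(r: Result) -> float:
    """Return a ratio where > 1.0 means this PR beats the baseline (assumed definition)."""
    if r.lower_is_better:
        return r.baseline / r.this_pr
    return r.this_pr / r.baseline


def geomean(ratios: list[float]) -> float:
    """Geometric mean, computed in log space."""
    return math.exp(sum(math.log(x) for x in ratios) / len(ratios))


def classify(ratio: float, threshold: float = 0.02) -> str:
    """Label a ratio with the report's 2.00% threshold (boundary handling is assumed)."""
    if ratio - 1.0 > threshold:
        return "improved"
    if 1.0 - ratio > threshold:
        return "regressed"
    return "unchanged"


# Values copied from the "memory" group of this report.
memory = [
    Result("StreamMemory Triad 10240", 3.183, 3.158, lower_is_better=False),
    Result("QueueInOrderMemcpy D2D 1024", 250.480, 251.872),
    Result("QueueMemcpy D2D 1024", 5.550, 5.573),
    Result("QueueInOrderMemcpy H2D 1024", 133.629, 132.472),
]

if __name__ == "__main__":
    ratios = [relative_perf(r) for r in memory]
    for r, ratio in zip(memory, ratios):
        print(f"{r.name}: {ratio * 100:.2f}% ({classify(ratio)})")
    print(f"group geomean: {geomean(ratios) * 100:.3f}%")
```

For the memory group this reproduces the per-row percentages and a group geomean of about 100.222%, which agrees with the reported figure and supports the assumed definitions.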

Performance change in benchmark groups

Relative perf in group api (12): 99.570%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| api_overhead_benchmark_l0 SubmitKernel out of order | 11.708000 μs | 11.868 μs | 101.37% | 1.37% | . |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 2.098000 μs | 2.113 μs | 100.71% | 0.71% | . |
| api_overhead_benchmark_ur SubmitKernel out of order CPU count | 104663.000000 instr | 104663.000 instr | 100.00% | 0.00% | . |
| api_overhead_benchmark_ur SubmitKernel in order CPU count | 110006.000000 instr | 110006.000 instr | 100.00% | 0.00% | . |
| api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count | 123063.000 instr | 122876.000000 instr | 99.85% | -0.15% | . |
| api_overhead_benchmark_ur SubmitKernel in order with measure completion | 21.072 μs | 21.005000 μs | 99.68% | -0.32% | . |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 1.689 μs | 1.679000 μs | 99.41% | -0.59% | . |
| api_overhead_benchmark_ur SubmitKernel in order | 16.358 μs | 16.241000 μs | 99.28% | -0.72% | . |
| api_overhead_benchmark_sycl SubmitKernel in order | 24.312 μs | 24.133000 μs | 99.26% | -0.74% | . |
| api_overhead_benchmark_ur SubmitKernel out of order | 15.915 μs | 15.750000 μs | 98.96% | -1.04% | . |
| api_overhead_benchmark_sycl SubmitKernel out of order | 23.382 μs | 22.969000 μs | 98.23% | -1.77% | . |
| api_overhead_benchmark_l0 SubmitKernel in order | 11.636 μs | 11.418000 μs | 98.13% | -1.87% | . |

Relative perf in group memory (4): 100.222%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.183000 GB/s | 3.158 GB/s | 100.79% | 0.79% | . |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 250.480000 μs | 251.872 μs | 100.56% | 0.56% | . |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 5.550000 μs | 5.573 μs | 100.41% | 0.41% | . |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 133.629 μs | 132.472000 μs | 99.13% | -0.87% | . |

Relative perf in group miscellaneous (1): 100.034%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| miscellaneous_benchmark_sycl VectorSum | 860.370000 bw GB/s | 860.664 bw GB/s | 100.03% | 0.03% | . |

Relative perf in group multithread (10): 99.357%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 | 7425.063000 μs | 7472.404 μs | 100.64% | 0.64% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 | 6930.378000 μs | 6939.950 μs | 100.14% | 0.14% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 | 1200.630000 μs | 1201.865 μs | 100.10% | 0.10% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 | 2095.744 μs | 2093.086000 μs | 99.87% | -0.13% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events | 113100.883 μs | 112790.682000 μs | 99.73% | -0.27% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 | 8725.475 μs | 8689.121000 μs | 99.58% | -0.42% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events | 41248.839 μs | 40846.653000 μs | 99.02% | -0.98% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 | 26016.861 μs | 25587.435000 μs | 98.35% | -1.65% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 | 17482.001 μs | 17154.077000 μs | 98.12% | -1.88% | . |
| multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 | 47867.991 μs | 46935.372000 μs | 98.05% | -1.95% | . |

Relative perf in group graph (10): 100.689%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 | 5585.614000 μs | 5721.966 μs | 102.44% | 2.44% | ++ |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 | 56449.397000 μs | 57817.523 μs | 102.42% | 2.42% | ++ |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 | 5591.936000 μs | 5688.177 μs | 101.72% | 1.72% | . |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 | 62.106000 μs | 62.367 μs | 100.42% | 0.42% | . |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 | 72509.690000 μs | 72642.878 μs | 100.18% | 0.18% | . |
| graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 | 54.513000 μs | 54.566 μs | 100.10% | 0.10% | . |
| graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 | 353437.568000 μs | 353502.721 μs | 100.02% | 0.02% | . |
| graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 | 71760.323 μs | 71747.470000 μs | 99.98% | -0.02% | . |
| graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 | 353432.608 μs | 353339.946000 μs | 99.97% | -0.03% | . |
| graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 | 676.466 μs | 674.284000 μs | 99.68% | -0.32% | . |

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (5): 101.878%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| `alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider>` | 2862.910000 ns | 3174.620 ns | 110.89% | 10.89% | ++++++++ |
| `alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider` | 2085.710000 ns | 2192.650 ns | 105.13% | 5.13% | ++++ |
| `alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider>` | 310.765 ns | 306.767000 ns | 98.71% | -1.29% | . |
| `alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy` | 2783.700 ns | 2735.530000 ns | 98.27% | -1.73% | . |
| `alloc/size:10000/0/4096/iterations:200000/threads:4 glibc` | 2699.630 ns | 2620.060000 ns | 97.05% | -2.95% | -- |

Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (5): 100.230%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| `alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider` | 192.698000 ns | 195.988 ns | 101.71% | 1.71% | . |
| `alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider>` | 267.576000 ns | 271.315 ns | 101.40% | 1.40% | . |
| `alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider>` | 213.528000 ns | 213.992 ns | 100.22% | 0.22% | . |
| `alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy` | 718.204 ns | 711.693000 ns | 99.09% | -0.91% | . |
| `alloc/size:10000/0/4096/iterations:200000/threads:1 glibc` | 719.650 ns | 710.790000 ns | 98.77% | -1.23% | . |

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (5): 98.903%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| `alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider` | 1715.220000 ns | 1936.480 ns | 112.90% | 12.90% | +++++++++ |
| `alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider>` | 3193.060000 ns | 3386.980 ns | 106.07% | 6.07% | ++++ |
| `alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider>` | 249.441000 ns | 253.226 ns | 101.52% | 1.52% | . |
| `alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc` | 1408.400 ns | 1267.280000 ns | 89.98% | -10.02% | ------- |
| `alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy` | 1421.850 ns | 1230.060000 ns | 86.51% | -13.49% | ---------- |

Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (5): 99.348%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| `alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider>` | 289.022000 ns | 299.838 ns | 103.74% | 3.74% | +++ |
| `alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider` | 191.496000 ns | 192.935 ns | 100.75% | 0.75% | . |
| `alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider>` | 208.625 ns | 206.336000 ns | 98.90% | -1.10% | . |
| `alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy` | 755.180 ns | 730.895000 ns | 96.78% | -3.22% | -- |
| `alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc` | 752.595 ns | 727.999000 ns | 96.73% | -3.27% | -- |

Relative perf in group alloc/min (6): 102.427%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| `alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider>` | 993.201000 ns | 1128.250 ns | 113.60% | 13.60% | ++++++++++ |
| `alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy` | 176.046000 ns | 182.287 ns | 103.55% | 3.55% | +++ |
| `alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider>` | 962.199000 ns | 968.189 ns | 100.62% | 0.62% | . |
| `alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy` | 834.393000 ns | 834.560 ns | 100.02% | 0.02% | . |
| `alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc` | 814.536 ns | 809.442000 ns | 99.37% | -0.63% | . |
| `alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc` | 180.546 ns | 177.227000 ns | 98.16% | -1.84% | . |

Relative perf in group multiple (16): 99.885%

| Benchmark | This PR | baseline | Relative perf | Change | - |
|---|---|---|---|---|---|
| `multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy` | 26694.800000 ns | 27865.300 ns | 104.38% | 4.38% | +++ |
| `multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider` | 139774.000000 ns | 144859.000 ns | 103.64% | 3.64% | +++ |
| `multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy` | 4130.730000 ns | 4241.250 ns | 102.68% | 2.68% | ++ |
| `multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider>` | 1151730.000000 ns | 1181150.000 ns | 102.55% | 2.55% | ++ |
| `multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider>` | 74249.600000 ns | 75687.100 ns | 101.94% | 1.94% | . |
| `multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider>` | 158545.000000 ns | 160647.000 ns | 101.33% | 1.33% | . |
| `multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy` | 138895.000 ns | 138580.000000 ns | 99.77% | -0.23% | . |
| `multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy` | 31092.200 ns | 31018.400000 ns | 99.76% | -0.24% | . |
| `multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc` | 139588.000 ns | 139089.000000 ns | 99.64% | -0.36% | . |
| `multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider>` | 15375.100 ns | 15279.900000 ns | 99.38% | -0.62% | . |
| `multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc` | 4256.920 ns | 4200.920000 ns | 98.68% | -1.32% | . |
| `multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider>` | 42358.400 ns | 41527.800000 ns | 98.04% | -1.96% | . |
| `multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider>` | 25788.500 ns | 25041.800000 ns | 97.10% | -2.90% | -- |
| `multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider` | 1198540.000 ns | 1162710.000000 ns | 97.01% | -2.99% | -- |
| `multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc` | 32125.800 ns | 31133.200000 ns | 96.91% | -3.09% | -- |
| `multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc` | 31533.000 ns | 30222.700000 ns | 95.84% | -4.16% | --- |

Relative perf in group Velocity-Bench (9): cannot calculate
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Hashtable - 358.375158 M keys/sec
Velocity-Bench Bitcracker - 35.965200 s
Velocity-Bench CudaSift - 201.701000 ms
Velocity-Bench Easywave - 226.000000 ms
Velocity-Bench QuickSilver - 117.580000 MMS/CTT
Velocity-Bench Sobel Filter - 611.944000 ms
Velocity-Bench dl-cifar - 23.442800 s
Velocity-Bench dl-mnist - 2.720000 s
Velocity-Bench svm - 0.134300 s
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_SingleTask - 268.614000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor - 277.626000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor - 277.078000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor - 277.264000 ms
Runtime_DAGTaskThroughput_SingleTask - 1688.724000 ms
Runtime_DAGTaskThroughput_BasicParallelFor - 1764.745000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor - 1737.282000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor - 1705.559000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous - 5.241000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous - 4.991000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous - 4.763000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous - 4.863000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous - 618.230000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous - 618.282000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided - 4.928000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided - 5.197000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided - 5.079000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided - 5.207000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided - 617.816000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided - 617.727000 ms
MicroBench_LocalMem_int32_4096 - 29.924000 ms
MicroBench_LocalMem_fp32_4096 - 29.864000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_NDRange_int32 - 16.761000 ms
Pattern_Reduction_Hierarchical_int32 - 16.736000 ms
Pattern_SegmentedReduction_NDRange_int16 - 2.264000 ms
Pattern_SegmentedReduction_NDRange_int32 - 2.166000 ms
Pattern_SegmentedReduction_NDRange_int64 - 2.337000 ms
Pattern_SegmentedReduction_NDRange_fp32 - 2.165000 ms
Pattern_SegmentedReduction_Hierarchical_int16 - 11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32 - 11.589000 ms
Pattern_SegmentedReduction_Hierarchical_int64 - 11.771000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 - 11.590000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 - 3.744000 ms
ScalarProduct_NDRange_int64 - 5.440000 ms
ScalarProduct_NDRange_fp32 - 3.760000 ms
ScalarProduct_Hierarchical_int32 - 10.507000 ms
ScalarProduct_Hierarchical_int64 - 11.485000 ms
ScalarProduct_Hierarchical_fp32 - 10.152000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device - 0.066000 ms
USM_Allocation_latency_fp32_host - 37.402000 ms
USM_Allocation_latency_fp32_shared - 0.065000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch - 1.681000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch - 1.056000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch - 1.838000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch - 1.205000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 - 1.492000 ms
VectorAddition_int64 - 3.061000 ms
VectorAddition_fp32 - 1.434000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
Polybench_2mm - 1.039000 ms
Polybench_3mm - 1.482000 ms
Polybench_Atax - 6.416000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 - 14.144000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 - 899.874000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
MolecularDynamics - 0.029000 ms
Relative perf in group llama.cpp (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
llama.cpp Prompt Processing Batched 128 - 824.202968 token/s
llama.cpp Text Generation Batched 128 - 62.990615 token/s
llama.cpp Prompt Processing Batched 256 - 870.375426 token/s
llama.cpp Text Generation Batched 256 - 62.990517 token/s
llama.cpp Prompt Processing Batched 512 - 429.991968 token/s
llama.cpp Text Generation Batched 512 - 62.959741 token/s

Details

Benchmark details - environment, command...
api_overhead_benchmark_l0 SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
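The four SubmitKernel entries above share one command line and differ only in the binary and the `--Ioq` flag (0 for "out of order", 1 for "in order"). The `api_overhead_benchmark_ur` entries later in this list follow the same pattern and set `--MeasureCompletion=1` for the "with measure completion" variants; the "CPU count" rows reuse the same command as their timing counterparts. A small Python sketch that rebuilds these command lines from the benchmark name's parameters; `submit_kernel_cmd` is a hypothetical helper, and `BIN_DIR` is simply copied from the paths in this report.

```python
import shlex

# Build directory copied from the commands in this report; adjust for your machine.
BIN_DIR = "/home/pmdk/bench_workdir/compute-benchmarks-build/bin"


def submit_kernel_cmd(binary: str, in_order: bool, measure_completion: bool = False) -> list[str]:
    """Rebuild a SubmitKernel command line; only --Ioq and --MeasureCompletion vary."""
    return [
        f"{BIN_DIR}/{binary}",
        "--test=SubmitKernel",
        "--csv",
        "--noHeaders",
        f"--Ioq={int(in_order)}",  # "in order" -> 1, "out of order" -> 0
        "--DiscardEvents=0",
        f"--MeasureCompletion={int(measure_completion)}",  # "with measure completion" -> 1
        "--iterations=100000",
        "--Profiling=0",
        "--NumKernels=10",
        "--KernelExecTime=1",
    ]


if __name__ == "__main__":
    # Print the "api_overhead_benchmark_sycl SubmitKernel in order" command.
    print(shlex.join(submit_kernel_cmd("api_overhead_benchmark_sycl", in_order=True)))
```

Returning an argument list rather than a single string keeps the result directly usable with `subprocess.run` without shell quoting concerns.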

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=1 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1
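Each MemcpyExecute label encodes most of its flags: `opsPerThread`, `numThreads`, `allocSize`, `srcUSM` and `dstUSM` map to `--NumOpsPerThread`, `--NumThreads`, `--AllocSize`, `--SrcUSM` and `--DstUSM`, and a trailing "without events" switches `--UseEvents` from 1 to 0. The iteration count (10, 1000 or 10000 above) is not part of the label. A hedged Python sketch of that mapping; `parse_params` and `memcpy_execute_cmd` are hypothetical helpers, not part of the benchmark tooling.

```python
import re

# Build directory copied from the commands in this report; adjust for your machine.
BIN_DIR = "/home/pmdk/bench_workdir/compute-benchmarks-build/bin"


def parse_params(label: str) -> dict[str, int]:
    """Extract key:value pairs such as 'numThreads:8' from a benchmark label."""
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", label)}


def memcpy_execute_cmd(label: str, iterations: int) -> list[str]:
    """Rebuild a MemcpyExecute command line from its report label.

    The iteration count does not appear in the label, so it must be passed in.
    """
    p = parse_params(label)
    use_events = 0 if label.endswith("without events") else 1
    return [
        f"{BIN_DIR}/multithread_benchmark_ur",
        "--test=MemcpyExecute",
        "--csv",
        "--noHeaders",
        "--Ioq=1",
        f"--UseEvents={use_events}",
        "--MeasureCompletion=1",
        "--UseQueuePerThread=1",
        f"--AllocSize={p['allocSize']}",
        f"--NumThreads={p['numThreads']}",
        f"--NumOpsPerThread={p['opsPerThread']}",
        f"--iterations={iterations}",
        f"--SrcUSM={p['srcUSM']}",
        f"--DstUSM={p['dstUSM']}",
    ]


if __name__ == "__main__":
    label = "MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1"
    print(" ".join(memcpy_execute_cmd(label, iterations=10)))
```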

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=1

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=100

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=100
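The graph benchmark labels map onto flags in the same way: "graphs:N" becomes `--withGraphs=N` for SinKernelGraph, while "ioq:N" and "submit:N" become `--ioq=N` and `--measureSubmit=N` for SubmitExecGraph; every entry above uses `--iterations=100`. A minimal Python sketch of that mapping, with hypothetical helper names and `BIN_DIR` copied from the paths above.

```python
# Build directory copied from the commands in this report; adjust for your machine.
BIN_DIR = "/home/pmdk/bench_workdir/compute-benchmarks-build/bin"
GRAPH_BIN = f"{BIN_DIR}/graph_api_benchmark_sycl"


def sin_kernel_graph_cmd(with_graphs: bool, num_kernels: int) -> list[str]:
    """'SinKernelGraph graphs:G, numKernels:K' -> command line."""
    return [
        GRAPH_BIN, "--test=SinKernelGraph", "--csv", "--noHeaders",
        "--iterations=100",
        f"--numKernels={num_kernels}",
        f"--withGraphs={int(with_graphs)}",
    ]


def submit_exec_graph_cmd(ioq: bool, measure_submit: bool, num_kernels: int) -> list[str]:
    """'SubmitExecGraph ioq:Q, submit:S, numKernels:K' -> command line."""
    return [
        GRAPH_BIN, "--test=SubmitExecGraph", "--csv", "--noHeaders",
        "--iterations=100",
        f"--measureSubmit={int(measure_submit)}",
        f"--ioq={int(ioq)}",
        f"--numKernels={num_kernels}",
    ]


if __name__ == "__main__":
    print(" ".join(submit_exec_graph_cmd(ioq=True, measure_submit=False, num_kernels=100)))
```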

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc
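To rerun the UMF allocator benchmarks locally, the commands above can be scripted. Below is a minimal Python sketch. `build_dir` stands in for the `umf_build` directory, and the function names are hypothetical. The `--benchmark_format`/`--benchmark_filter` flags suggest that `umf-benchmark` is a Google Benchmark binary. The CSV column names (`name`, `real_time`, `time_unit`) are therefore assumed from Google Benchmark's CSV reporter; this report does not confirm them.

```python
import csv
import io
import os
import subprocess


def run_umf_benchmark(build_dir: str, use_proxy: bool) -> str:
    """Run umf-benchmark once and return its CSV output (stdout).

    With use_proxy=True, libumf_proxy.so is preloaded and only the glibc
    benchmarks are selected, mirroring the umfProxy commands above.
    build_dir is a placeholder for your own UMF build tree.
    """
    cmd = [os.path.join(build_dir, "benchmark", "umf-benchmark"), "--benchmark_format=csv"]
    env = dict(os.environ)
    if use_proxy:
        env["LD_PRELOAD"] = os.path.join(build_dir, "lib", "libumf_proxy.so")
        cmd.append("--benchmark_filter=glibc")
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return result.stdout


def parse_benchmark_csv(text: str) -> dict:
    """Map benchmark name -> (real_time, time_unit).

    Assumes Google Benchmark-style CSV with a header row.
    """
    results = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Rows for benchmarks that reported an error may leave real_time empty.
        if not row.get("real_time"):
            continue
        results[row["name"]] = (float(row["real_time"]), row["time_unit"])
    return results
```

Running it once with `use_proxy=False` and once with `use_proxy=True` compares plain glibc against the same glibc benchmarks with `libumf_proxy.so` preloaded. That is how the umfProxy rows above are produced.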

@pbalcer force-pushed the add-sycl-target-pvc branch from 8378842 to 27c2f7a on February 3, 2025 15:00

github-actions bot commented Feb 3, 2025

Compute Benchmarks level_zero run (with params: --sycl-target intel_gpu_pvc):
https://github.com/oneapi-src/unified-runtime/actions/runs/13116361310

github-actions bot commented Feb 3, 2025

Compute Benchmarks level_zero run (--sycl-target intel_gpu_pvc):
https://github.com/oneapi-src/unified-runtime/actions/runs/13116361310
Job status: success. Test status: success.

Summary

Total 79 benchmarks in mean.
Geomean 99.804%.
Improved 10 Regressed 15 (threshold 2.00%)
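The summary figures can be reproduced from the per-benchmark rows. The sketch below is inferred from the numbers in this report and is not the benchmark script's actual code, and its function names are hypothetical. It assumes three rules. Relative perf is baseline / this PR for time-like metrics (μs, ns, ms) and this PR / baseline for throughput-like metrics (GB/s). A group or overall figure is the geometric mean of those percentages. A benchmark counts as improved or regressed only when its change exceeds the 2.00% threshold.

```python
import math
from typing import Sequence


def relative_perf(this_pr: float, baseline: float, lower_is_better: bool = True) -> float:
    """Relative performance in percent; values above 100 mean this PR is better.

    Assumption: time-like metrics (lower is better) use baseline / this_pr,
    throughput-like metrics (higher is better) use this_pr / baseline.
    """
    ratio = baseline / this_pr if lower_is_better else this_pr / baseline
    return ratio * 100.0


def geomean(percentages: Sequence[float]) -> float:
    """Geometric mean of relative-perf percentages, computed in log space."""
    return math.exp(sum(math.log(p) for p in percentages) / len(percentages))


def classify(rel_perf: float, threshold: float = 2.0) -> str:
    """Mark a benchmark improved/regressed only if |change| exceeds the threshold (%)."""
    change = rel_perf - 100.0
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "regressed"
    return "unchanged"
```

For example, `relative_perf(11.688, 11.868)` gives about 101.54%. That matches the `api_overhead_benchmark_l0 SubmitKernel out of order` row. The geometric mean of the twelve rounded `api` percentages comes out at roughly 99.33%, in line with the reported group figure.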

(result is better)

Performance change in benchmark groups

Relative perf in group api (12): 99.330%
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_l0 SubmitKernel out of order 11.688000 μs 11.868 μs 101.54% 1.54% .
api_overhead_benchmark_ur SubmitKernel out of order 15.553000 μs 15.750 μs 101.27% 1.27% .
api_overhead_benchmark_ur SubmitKernel out of order CPU count 104663.000000 instr 104663.000 instr 100.00% 0.00% .
api_overhead_benchmark_ur SubmitKernel in order CPU count 110006.000000 instr 110006.000 instr 100.00% 0.00% .
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count 122876.000000 instr 122876.000 instr 100.00% 0.00% .
api_overhead_benchmark_ur SubmitKernel in order with measure completion 21.091 μs 21.005000 μs 99.59% -0.41% .
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.686 μs 1.679000 μs 99.58% -0.42% .
api_overhead_benchmark_sycl SubmitKernel out of order 23.322 μs 22.969000 μs 98.49% -1.51% .
api_overhead_benchmark_l0 SubmitKernel in order 11.639 μs 11.418000 μs 98.10% -1.90% .
api_overhead_benchmark_sycl SubmitKernel in order 24.627 μs 24.133000 μs 97.99% -2.01% -
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.157 μs 2.113000 μs 97.96% -2.04% -
api_overhead_benchmark_ur SubmitKernel in order 16.653 μs 16.241000 μs 97.53% -2.47% -
Relative perf in group memory (4): 99.370%
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.162000 GB/s 3.158 GB/s 100.13% 0.13% .
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.594 μs 5.573000 μs 99.62% -0.38% .
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 253.003 μs 251.872000 μs 99.55% -0.45% .
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 134.918 μs 132.472000 μs 98.19% -1.81% .
Relative perf in group miscellaneous (1): 105.917%
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 812.587000 bw GB/s 860.664 bw GB/s 105.92% 5.92% +++
Relative perf in group multithread (10): 99.730%
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 2068.430000 μs 2093.086 μs 101.19% 1.19% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 7433.408000 μs 7472.404 μs 100.52% 0.52% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 25552.489000 μs 25587.435 μs 100.14% 0.14% .
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events 40877.062 μs 40846.653000 μs 99.93% -0.07% .
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 17175.787 μs 17154.077000 μs 99.87% -0.13% .
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events 113014.974 μs 112790.682000 μs 99.80% -0.20% .
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 1204.627 μs 1201.865000 μs 99.77% -0.23% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 6958.898 μs 6939.950000 μs 99.73% -0.27% .
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 8723.371 μs 8689.121000 μs 99.61% -0.39% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 48485.934 μs 46935.372000 μs 96.80% -3.20% --
Relative perf in group graph (10): 100.664%
Benchmark This PR baseline Relative perf Change -
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 56469.890000 μs 57817.523 μs 102.39% 2.39% +
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 5595.119000 μs 5721.966 μs 102.27% 2.27% +
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 5615.528000 μs 5688.177 μs 101.29% 1.29% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 62.003000 μs 62.367 μs 100.59% 0.59% .
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 72560.815000 μs 72642.878 μs 100.11% 0.11% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 673.827000 μs 674.284 μs 100.07% 0.07% .
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 71727.394000 μs 71747.470 μs 100.03% 0.03% .
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 353408.656000 μs 353502.721 μs 100.03% 0.03% .
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 353371.260 μs 353339.946000 μs 99.99% -0.01% .
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 54.612 μs 54.566000 μs 99.92% -0.08% .
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (5): 101.332%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> 285.187000 ns 306.767 ns 107.57% 7.57% ++++
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3130.450000 ns 3174.620 ns 101.41% 1.41% .
alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy 2698.480000 ns 2735.530 ns 101.37% 1.37% .
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider 2174.040000 ns 2192.650 ns 100.86% 0.86% .
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc 2735.030 ns 2620.060000 ns 95.80% -4.20% --
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (5): 99.678%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider 192.683000 ns 195.988 ns 101.72% 1.72% .
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> 273.010 ns 271.315000 ns 99.38% -0.62% .
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> 215.383 ns 213.992000 ns 99.35% -0.65% .
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc 717.186 ns 710.790000 ns 99.11% -0.89% .
alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy 719.890 ns 711.693000 ns 98.86% -1.14% .
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (5): 100.739%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc 1223.930000 ns 1267.280 ns 103.54% 3.54% ++
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider 1899.840000 ns 1936.480 ns 101.93% 1.93% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy 1217.490000 ns 1230.060 ns 101.03% 1.03% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3378.360000 ns 3386.980 ns 100.26% 0.26% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> 260.917 ns 253.226000 ns 97.05% -2.95% -
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (5): 95.253%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> 292.458000 ns 299.838 ns 102.52% 2.52% +
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider 191.592000 ns 192.935 ns 100.70% 0.70% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc 742.173 ns 727.999000 ns 98.09% -1.91% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy 746.594 ns 730.895000 ns 97.90% -2.10% -
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> 260.879 ns 206.336000 ns 79.09% -20.91% ----------
Relative perf in group alloc/min (6): 101.945%
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> 996.326000 ns 1128.250 ns 113.24% 13.24% ++++++
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy 176.906000 ns 182.287 ns 103.04% 3.04% +
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc 174.627000 ns 177.227 ns 101.49% 1.49% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> 967.803000 ns 968.189 ns 100.04% 0.04% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy 857.112 ns 834.560000 ns 97.37% -2.63% -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc 831.810 ns 809.442000 ns 97.31% -2.69% -
Relative perf in group multiple (16): 99.347%
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider 140271.000000 ns 144859.000 ns 103.27% 3.27% ++
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy 4152.220000 ns 4241.250 ns 102.14% 2.14% +
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> 158512.000000 ns 160647.000 ns 101.35% 1.35% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> 1165480.000000 ns 1181150.000 ns 101.34% 1.34% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> 74983.800000 ns 75687.100 ns 100.94% 0.94% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy 30770.300000 ns 31018.400 ns 100.81% 0.81% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> 15186.100000 ns 15279.900 ns 100.62% 0.62% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> 41813.000 ns 41527.800000 ns 99.32% -0.68% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc 140913.000 ns 139089.000000 ns 98.71% -1.29% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy 140543.000 ns 138580.000000 ns 98.60% -1.40% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc 31651.200 ns 31133.200000 ns 98.36% -1.64% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy 28511.500 ns 27865.300000 ns 97.73% -2.27% -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider 1195510.000 ns 1162710.000000 ns 97.26% -2.74% -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc 4330.310 ns 4200.920000 ns 97.01% -2.99% -
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> 26003.600 ns 25041.800000 ns 96.30% -3.70% --
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc 31432.800 ns 30222.700000 ns 96.15% -3.85% --
Relative perf in group Velocity-Bench (9): cannot calculate
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Hashtable - 358.375158 M keys/sec
Velocity-Bench Bitcracker - 35.965200 s
Velocity-Bench CudaSift - 201.701000 ms
Velocity-Bench Easywave - 226.000000 ms
Velocity-Bench QuickSilver - 117.580000 MMS/CTT
Velocity-Bench Sobel Filter - 611.944000 ms
Velocity-Bench dl-cifar - 23.442800 s
Velocity-Bench dl-mnist - 2.720000 s
Velocity-Bench svm - 0.134300 s
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_SingleTask - 268.614000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor - 277.626000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor - 277.078000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor - 277.264000 ms
Runtime_DAGTaskThroughput_SingleTask - 1688.724000 ms
Runtime_DAGTaskThroughput_BasicParallelFor - 1764.745000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor - 1737.282000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor - 1705.559000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous - 5.241000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous - 4.991000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous - 4.763000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous - 4.863000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous - 618.230000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous - 618.282000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided - 4.928000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided - 5.197000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided - 5.079000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided - 5.207000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided - 617.816000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided - 617.727000 ms
MicroBench_LocalMem_int32_4096 - 29.924000 ms
MicroBench_LocalMem_fp32_4096 - 29.864000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_NDRange_int32 - 16.761000 ms
Pattern_Reduction_Hierarchical_int32 - 16.736000 ms
Pattern_SegmentedReduction_NDRange_int16 - 2.264000 ms
Pattern_SegmentedReduction_NDRange_int32 - 2.166000 ms
Pattern_SegmentedReduction_NDRange_int64 - 2.337000 ms
Pattern_SegmentedReduction_NDRange_fp32 - 2.165000 ms
Pattern_SegmentedReduction_Hierarchical_int16 - 11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32 - 11.589000 ms
Pattern_SegmentedReduction_Hierarchical_int64 - 11.771000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 - 11.590000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 - 3.744000 ms
ScalarProduct_NDRange_int64 - 5.440000 ms
ScalarProduct_NDRange_fp32 - 3.760000 ms
ScalarProduct_Hierarchical_int32 - 10.507000 ms
ScalarProduct_Hierarchical_int64 - 11.485000 ms
ScalarProduct_Hierarchical_fp32 - 10.152000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device - 0.066000 ms
USM_Allocation_latency_fp32_host - 37.402000 ms
USM_Allocation_latency_fp32_shared - 0.065000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch - 1.681000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch - 1.056000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch - 1.838000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch - 1.205000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 - 1.492000 ms
VectorAddition_int64 - 3.061000 ms
VectorAddition_fp32 - 1.434000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
Polybench_2mm - 1.039000 ms
Polybench_3mm - 1.482000 ms
Polybench_Atax - 6.416000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 - 14.144000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 - 899.874000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
MolecularDynamics - 0.029000 ms
Relative perf in group llama.cpp (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
llama.cpp Prompt Processing Batched 128 - 824.202968 token/s
llama.cpp Text Generation Batched 128 - 62.990615 token/s
llama.cpp Prompt Processing Batched 256 - 870.375426 token/s
llama.cpp Text Generation Batched 256 - 62.990517 token/s
llama.cpp Prompt Processing Batched 512 - 429.991968 token/s
llama.cpp Text Generation Batched 512 - 62.959741 token/s

Benchmark details - environment, command...
api_overhead_benchmark_l0 SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=1 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1
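The MemcpyExecute entries above differ only in a handful of flags, and each label field maps onto one of them. Below is a minimal Python sketch that rebuilds these command lines from the label parameters. The helper names are hypothetical. The log does not show how the iteration count is chosen (10, 1000 or 10000 above), so the sketch takes it as an explicit argument instead of deriving it.

```python
import shlex

# Binary path copied from the logged commands.
BIN = "/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur"


def memcpy_execute_cmd(ops_per_thread, num_threads, alloc_size, src_usm, dst_usm,
                       iterations, use_events=1, ioq=1,
                       measure_completion=1, queue_per_thread=1):
    """Return argv for one MemcpyExecute run, in the flag order used by the log."""
    return [
        BIN,
        "--test=MemcpyExecute",
        "--csv",
        "--noHeaders",
        f"--Ioq={ioq}",
        f"--UseEvents={use_events}",
        f"--MeasureCompletion={measure_completion}",
        f"--UseQueuePerThread={queue_per_thread}",
        f"--AllocSize={alloc_size}",
        f"--NumThreads={num_threads}",
        f"--NumOpsPerThread={ops_per_thread}",
        f"--iterations={iterations}",
        f"--SrcUSM={src_usm}",
        f"--DstUSM={dst_usm}",
    ]


def memcpy_execute_label(ops_per_thread, num_threads, alloc_size, src_usm, dst_usm,
                         use_events=1):
    """Return the human-readable label that precedes each command in the log."""
    label = (f"multithread_benchmark_ur MemcpyExecute opsPerThread:{ops_per_thread}, "
             f"numThreads:{num_threads}, allocSize:{alloc_size} "
             f"srcUSM:{src_usm} dstUSM:{dst_usm}")
    # Runs with --UseEvents=0 carry a " without events" suffix.
    return label if use_events else label + " without events"
```

For example, `shlex.join(memcpy_execute_cmd(400, 8, 1024, 1, 1, iterations=1000))` reproduces the command logged for opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1.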

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=1

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=100

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=100
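The graph_api_benchmark_sycl entries follow the same pattern. The label fields map onto flags as graphs to --withGraphs, ioq to --ioq, submit to --measureSubmit and numKernels to --numKernels, and every run uses --iterations=100. The SubmitExecGraph sweep listed here has six combinations, with ioq:0 appearing only for numKernels:10. The following hypothetical sketch regenerates these commands. The function and constant names are illustrative only.

```python
import shlex

# Binary path copied from the logged commands.
BIN = "/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl"


def sin_kernel_graph_cmd(with_graphs, num_kernels, iterations=100):
    """argv for a SinKernelGraph run ("graphs:X, numKernels:Y" in the label)."""
    return [BIN, "--test=SinKernelGraph", "--csv", "--noHeaders",
            f"--iterations={iterations}", f"--numKernels={num_kernels}",
            f"--withGraphs={with_graphs}"]


def submit_exec_graph_cmd(ioq, measure_submit, num_kernels, iterations=100):
    """argv for a SubmitExecGraph run ("ioq:X, submit:Y, numKernels:Z")."""
    return [BIN, "--test=SubmitExecGraph", "--csv", "--noHeaders",
            f"--iterations={iterations}", f"--measureSubmit={measure_submit}",
            f"--ioq={ioq}", f"--numKernels={num_kernels}"]


# (ioq, submit, numKernels) combinations in the order they appear in the log.
SUBMIT_EXEC_CONFIGS = [(0, 1, 10), (1, 1, 10), (1, 1, 100),
                       (0, 0, 10), (1, 0, 10), (1, 0, 100)]

if __name__ == "__main__":
    for ioq, submit, n in SUBMIT_EXEC_CONFIGS:
        print(shlex.join(submit_exec_graph_cmd(ioq, submit, n)))
```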

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv
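The umf-benchmark entry names appear to follow Google Benchmark's naming convention: a family name, then slash-separated arguments (some labelled, such as `size:10000`, some positional), then `iterations:N` and `threads:M`. The harness then appends the allocator variant after a space. The minimal sketch below splits such a name into its parts. It assumes that layout and only separates the fields; the log does not say what each positional argument means. The function name is hypothetical.

```python
def parse_umf_bench_name(full_name):
    """Split e.g. 'alloc/size:10000/0/4096/iterations:200000/threads:4 glibc'."""
    # The variant (glibc, os_provider, ...) follows the last space; labels such
    # as 'min size' also contain spaces, so split from the right.
    name, _, variant = full_name.rpartition(" ")
    if not name:  # no trailing variant present
        name, variant = variant, ""
    family, *parts = name.split("/")
    args, iterations, threads = [], None, None
    for part in parts:
        key, sep, value = part.rpartition(":")
        if key == "iterations":
            iterations = int(value)
        elif key == "threads":
            threads = int(value)
        else:
            # Labelled args look like 'size:10000'; positional ones have no ':'.
            args.append((key if sep else None, int(value)))
    return {"family": family, "args": args, "iterations": iterations,
            "threads": threads, "variant": variant}
```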

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc
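Every umf-benchmark entry above invokes the same binary with --benchmark_format=csv. The umfProxy entries also set LD_PRELOAD to libumf_proxy.so and pass --benchmark_filter=glibc. In other words, they rerun only the glibc cases with the UMF proxy library loaded ahead of libc, presumably so that its allocator replacements take effect, and their results are reported under the umfProxy label. The following sketch shows how a harness could launch either variant with Python's subprocess module. The function names are hypothetical, and the paths are copied from the log.

```python
import os
import subprocess

UMF_BUILD = "/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build"
UMF_BENCH = f"{UMF_BUILD}/benchmark/umf-benchmark"
UMF_PROXY = f"{UMF_BUILD}/lib/libumf_proxy.so"


def umf_bench_invocation(use_proxy):
    """Return (argv, extra_env) for one umf-benchmark run as logged above."""
    argv = [UMF_BENCH, "--benchmark_format=csv"]
    extra_env = {}
    if use_proxy:
        # Only the glibc cases are rerun, with the proxy library preloaded.
        argv.append("--benchmark_filter=glibc")
        extra_env["LD_PRELOAD"] = UMF_PROXY
    return argv, extra_env


def run_umf_bench(use_proxy):
    """Run the benchmark and return its CSV output as text."""
    argv, extra_env = umf_bench_invocation(use_proxy)
    env = {**os.environ, **extra_env}  # keep the caller's environment
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return result.stdout
```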

@pbalcer force-pushed the add-sycl-target-pvc branch from 27c2f7a to f93adc4 on February 3, 2025
@pbalcer changed the title from "[benchmarks] add explicit sycl target for building benchmarks" to "[benchmarks] don't fail on suite setup issues" on February 3, 2025
github-actions bot commented Feb 3, 2025

Compute Benchmarks level_zero run (with params: ):
https://github.com/oneapi-src/unified-runtime/actions/runs/13117455679

github-actions bot commented Feb 3, 2025

Compute Benchmarks level_zero run ():
https://github.com/oneapi-src/unified-runtime/actions/runs/13117455679
Job status: success. Test status: success.

Summary

Total 90 benchmarks in mean.
Geomean 99.390%.
Improved 14 Regressed 14 (threshold 2.00%)
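The summary figures can be reproduced from the per-benchmark relative-perf column. The group and overall figures are geometric means of those percentages. A benchmark counts as improved or regressed only when its change (relative perf minus 100%) exceeds the 2.00% threshold, assumed here to be a strict comparison. The minimal sketch below is checked against the twelve values of the api group listed below, and it reproduces that group's 99.148% with three regressions and no improvements. The function name is hypothetical.

```python
import math


def summarize(rel_perfs, threshold=2.0):
    """rel_perfs: relative performance in percent (100.0 means unchanged)."""
    # Geometric mean of the ratios, expressed again as a percentage.
    logs = [math.log(p / 100.0) for p in rel_perfs]
    geomean = 100.0 * math.exp(sum(logs) / len(logs))
    # Only changes beyond +/- threshold percentage points are counted.
    improved = sum(1 for p in rel_perfs if p - 100.0 > threshold)
    regressed = sum(1 for p in rel_perfs if p - 100.0 < -threshold)
    return geomean, improved, regressed


# Relative perf column of the "api" group in this comment.
API_GROUP = [101.06, 100.47, 100.00, 100.00, 100.00, 99.64,
             98.88, 98.80, 98.02, 97.86, 97.59, 97.54]

geomean, improved, regressed = summarize(API_GROUP)
print(f"{geomean:.3f}% improved={improved} regressed={regressed}")
# prints 99.148% improved=0 regressed=3
```

Note that 98.02% (a -1.98% change) stays below the threshold, which matches the "." marker it carries in the table.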

(result is better)

Performance change in benchmark groups

Relative perf in group api (12): 99.148%
Benchmark This PR baseline Relative perf Change -
api_overhead_benchmark_ur SubmitKernel out of order 15.585000 μs 15.750 μs 101.06% 1.06% .
api_overhead_benchmark_l0 SubmitKernel out of order 11.813000 μs 11.868 μs 100.47% 0.47% .
api_overhead_benchmark_ur SubmitKernel out of order CPU count 104663.000000 instr 104663.000 instr 100.00% 0.00% .
api_overhead_benchmark_ur SubmitKernel in order CPU count 110006.000000 instr 110006.000 instr 100.00% 0.00% .
api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count 122876.000000 instr 122876.000 instr 100.00% 0.00% .
api_overhead_benchmark_ur SubmitKernel in order with measure completion 21.080 μs 21.005000 μs 99.64% -0.36% .
api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 2.137 μs 2.113000 μs 98.88% -1.12% .
api_overhead_benchmark_sycl SubmitKernel in order 24.425 μs 24.133000 μs 98.80% -1.20% .
api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 1.713 μs 1.679000 μs 98.02% -1.98% .
api_overhead_benchmark_sycl SubmitKernel out of order 23.472 μs 22.969000 μs 97.86% -2.14% -
api_overhead_benchmark_ur SubmitKernel in order 16.642 μs 16.241000 μs 97.59% -2.41% -
api_overhead_benchmark_l0 SubmitKernel in order 11.706 μs 11.418000 μs 97.54% -2.46% -
Relative perf in group memory (4): 99.520%
Benchmark This PR baseline Relative perf Change -
memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 3.198000 GB/s 3.158 GB/s 101.27% 1.27% .
memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 5.619 μs 5.573000 μs 99.18% -0.82% .
memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 254.840 μs 251.872000 μs 98.84% -1.16% .
memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 134.056 μs 132.472000 μs 98.82% -1.18% .
Relative perf in group miscellaneous (1): 105.882%
Benchmark This PR baseline Relative perf Change -
miscellaneous_benchmark_sycl VectorSum 812.850000 bw GB/s 860.664 bw GB/s 105.88% 5.88% +++
Relative perf in group multithread (10): 100.062%
Benchmark This PR baseline Relative perf Change -
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1 2056.593000 μs 2093.086 μs 101.77% 1.77% .
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events 111136.742000 μs 112790.682 μs 101.49% 1.49% .
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1 8626.433000 μs 8689.121 μs 100.73% 0.73% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1 7442.492000 μs 7472.404 μs 100.40% 0.40% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1 25506.903000 μs 25587.435 μs 100.32% 0.32% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1 6940.709 μs 6939.950000 μs 99.99% -0.01% .
multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1 1205.284 μs 1201.865000 μs 99.72% -0.28% .
multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events 41092.553 μs 40846.653000 μs 99.40% -0.60% .
multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1 17272.728 μs 17154.077000 μs 99.31% -0.69% .
multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1 48112.388 μs 46935.372000 μs 97.55% -2.45% -
Relative perf in group graph (10): 100.579%
Benchmark This PR baseline Relative perf Change -
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10 5580.336000 μs 5721.966 μs 102.54% 2.54% +
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100 56471.937000 μs 57817.523 μs 102.38% 2.38% +
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10 5596.135000 μs 5688.177 μs 101.64% 1.64% .
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100 353205.252000 μs 353502.721 μs 100.08% 0.08% .
graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10 72619.075000 μs 72642.878 μs 100.03% 0.03% .
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10 71742.419000 μs 71747.470 μs 100.01% 0.01% .
graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10 54.581 μs 54.566000 μs 99.97% -0.03% .
graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100 353516.899 μs 353339.946000 μs 99.95% -0.05% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10 62.520 μs 62.367000 μs 99.76% -0.24% .
graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100 677.834 μs 674.284000 μs 99.48% -0.52% .
Relative perf in group Velocity-Bench (9): 100.042%
Benchmark This PR baseline Relative perf Change -
Velocity-Bench Bitcracker 35.521300 s 35.965 s 101.25% 1.25% .
Velocity-Bench QuickSilver 118.350000 MMS/CTT 117.580 MMS/CTT 100.65% 0.65% .
Velocity-Bench dl-mnist 2.730 s 2.720000 s 99.63% -0.37% .
Velocity-Bench Sobel Filter 615.011 ms 611.944000 ms 99.50% -0.50% .
Velocity-Bench Hashtable 355.453 M keys/sec 358.375158 M keys/sec 99.18% -0.82% .
Velocity-Bench CudaSift - 201.701000 ms
Velocity-Bench Easywave - 226.000000 ms
Velocity-Bench dl-cifar - 23.442800 s
Velocity-Bench svm - 0.134300 s
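Rows such as CudaSift have a value on only one side, so no ratio can be formed for them. Recomputing the Velocity-Bench group from the raw values suggests that such rows are simply left out. The geometric mean of the five complete rows gives the reported 100.042%. Counting only complete rows across all groups also gives exactly the 90 benchmarks in the summary. Groups without any complete row, such as Runtime and MicroBench below, then report "cannot calculate". The sketch below works under those assumptions. Relative perf is baseline / this PR for lower-is-better metrics (times) and this PR / baseline for higher-is-better ones (throughput). The helper names are hypothetical.

```python
import math


def relative_perf(this_pr, baseline, higher_is_better):
    """Relative performance in percent; above 100 means this PR is better."""
    ratio = this_pr / baseline if higher_is_better else baseline / this_pr
    return 100.0 * ratio


def group_relative_perf(rows):
    """rows: (this_pr, baseline, higher_is_better), with None for a missing value.

    Returns the geometric mean over rows that have both values, or None when
    no row is comparable (shown as "cannot calculate" in the report).
    """
    perfs = [relative_perf(t, b, hib) for t, b, hib in rows
             if t is not None and b is not None]
    if not perfs:
        return None
    return 100.0 * math.exp(sum(math.log(p / 100.0) for p in perfs) / len(perfs))


# Velocity-Bench rows from the table above: (This PR, baseline, higher is better).
VELOCITY_BENCH = [
    (35.5213, 35.965, False),     # Bitcracker, s
    (118.35, 117.58, True),       # QuickSilver, MMS/CTT
    (2.730, 2.72, False),         # dl-mnist, s
    (615.011, 611.944, False),    # Sobel Filter, ms
    (355.453, 358.375158, True),  # Hashtable, M keys/sec
    (None, 201.701, False),       # CudaSift: "-" in the This PR column
    (None, 226.0, False),         # Easywave
    (None, 23.4428, False),       # dl-cifar
    (None, 0.1343, False),        # svm
]

print(f"{group_relative_perf(VELOCITY_BENCH):.3f}%")
```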
Relative perf in group llama.cpp (6): 99.226%
Benchmark This PR baseline Relative perf Change -
llama.cpp Prompt Processing Batched 256 867.196 token/s 870.375426 token/s 99.63% -0.37% .
llama.cpp Prompt Processing Batched 128 820.005 token/s 824.202968 token/s 99.49% -0.51% .
llama.cpp Text Generation Batched 128 62.483 token/s 62.990615 token/s 99.19% -0.81% .
llama.cpp Text Generation Batched 256 62.482 token/s 62.990517 token/s 99.19% -0.81% .
llama.cpp Text Generation Batched 512 62.450 token/s 62.959741 token/s 99.19% -0.81% .
llama.cpp Prompt Processing Batched 512 424.217 token/s 429.991968 token/s 98.66% -1.34% .
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:4 (5): 102.024%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider 2093.460000 ns 2192.650 ns 104.74% 4.74% ++
alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy 2663.560000 ns 2735.530 ns 102.70% 2.70% +
alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3109.090000 ns 3174.620 ns 102.11% 2.11% +
alloc/size:10000/0/4096/iterations:200000/threads:4 glibc 2604.620000 ns 2620.060 ns 100.59% 0.59% .
alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool<os_provider> 306.617000 ns 306.767 ns 100.05% 0.05% .
Relative perf in group alloc/size:10000/0/4096/iterations:200000/threads:1 (5): 99.117%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool<os_provider> 271.318 ns 271.315000 ns 100.00% -0.00% .
alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider 196.030 ns 195.988000 ns 99.98% -0.02% .
alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy 718.486 ns 711.693000 ns 99.05% -0.95% .
alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool<os_provider> 217.713 ns 213.992000 ns 98.29% -1.71% .
alloc/size:10000/0/4096/iterations:200000/threads:1 glibc 723.248 ns 710.790000 ns 98.28% -1.72% .
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:4 (5): 96.005%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool<os_provider> 3317.410000 ns 3386.980 ns 102.10% 2.10% +
alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider 1919.300000 ns 1936.480 ns 100.90% 0.90% .
alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool<os_provider> 259.377 ns 253.226000 ns 97.63% -2.37% -
alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc 1375.520 ns 1267.280000 ns 92.13% -7.87% ----
alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy 1397.420 ns 1230.060000 ns 88.02% -11.98% ------
Relative perf in group alloc/size:10000/100000/4096/iterations:200000/threads:1 (5): 91.420%
Benchmark This PR baseline Relative perf Change -
alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool<os_provider> 291.382000 ns 299.838 ns 102.90% 2.90% +
alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider 193.823 ns 192.935000 ns 99.54% -0.46% .
alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy 816.172 ns 730.895000 ns 89.55% -10.45% -----
alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc 818.649 ns 727.999000 ns 88.93% -11.07% -----
alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool<os_provider> 263.580 ns 206.336000 ns 78.28% -21.72% ----------
Relative perf in group alloc/min (6): 101.387%
Benchmark This PR baseline Relative perf Change -
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool<os_provider> 979.688000 ns 1128.250 ns 115.16% 15.16% +++++++
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy 182.265000 ns 182.287 ns 100.01% 0.01% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy 838.330 ns 834.560000 ns 99.55% -0.45% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool<os_provider> 981.010 ns 968.189000 ns 98.69% -1.31% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc 180.268 ns 177.227000 ns 98.31% -1.69% .
alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc 829.074 ns 809.442000 ns 97.63% -2.37% -
Relative perf in group multiple (16): 100.056%
Benchmark This PR baseline Relative perf Change -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider 137763.000000 ns 144859.000 ns 105.15% 5.15% ++
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool<os_provider> 1132110.000000 ns 1181150.000 ns 104.33% 4.33% ++
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc 29159.200000 ns 30222.700 ns 103.65% 3.65% ++
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool<os_provider> 157077.000000 ns 160647.000 ns 102.27% 2.27% +
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy 30333.700000 ns 31018.400 ns 102.26% 2.26% +
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy 4161.020000 ns 4241.250 ns 101.93% 1.93% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool<os_provider> 15132.200000 ns 15279.900 ns 100.98% 0.98% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider 1161470.000000 ns 1162710.000 ns 100.11% 0.11% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool<os_provider> 75688.200 ns 75687.100000 ns 100.00% -0.00% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy 138767.000 ns 138580.000000 ns 99.87% -0.13% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc 4224.700 ns 4200.920000 ns 99.44% -0.56% .
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc 140112.000 ns 139089.000000 ns 99.27% -0.73% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy 28135.400 ns 27865.300000 ns 99.04% -0.96% .
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool<os_provider> 42394.300 ns 41527.800000 ns 97.96% -2.04% -
multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc 32503.900 ns 31133.200000 ns 95.78% -4.22% --
multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool<os_provider> 27852.800 ns 25041.800000 ns 89.91% -10.09% -----
Relative perf in group Runtime (8): cannot calculate
Benchmark This PR baseline Relative perf Change -
Runtime_IndependentDAGTaskThroughput_SingleTask - 268.614000 ms
Runtime_IndependentDAGTaskThroughput_BasicParallelFor - 277.626000 ms
Runtime_IndependentDAGTaskThroughput_HierarchicalParallelFor - 277.078000 ms
Runtime_IndependentDAGTaskThroughput_NDRangeParallelFor - 277.264000 ms
Runtime_DAGTaskThroughput_SingleTask - 1688.724000 ms
Runtime_DAGTaskThroughput_BasicParallelFor - 1764.745000 ms
Runtime_DAGTaskThroughput_HierarchicalParallelFor - 1737.282000 ms
Runtime_DAGTaskThroughput_NDRangeParallelFor - 1705.559000 ms
Relative perf in group MicroBench (14): cannot calculate
Benchmark This PR baseline Relative perf Change -
MicroBench_HostDeviceBandwidth_1D_H2D_Contiguous - 5.241000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Contiguous - 4.991000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Contiguous - 4.763000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Contiguous - 4.863000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Contiguous - 618.230000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Contiguous - 618.282000 ms
MicroBench_HostDeviceBandwidth_1D_H2D_Strided - 4.928000 ms
MicroBench_HostDeviceBandwidth_2D_H2D_Strided - 5.197000 ms
MicroBench_HostDeviceBandwidth_3D_H2D_Strided - 5.079000 ms
MicroBench_HostDeviceBandwidth_1D_D2H_Strided - 5.207000 ms
MicroBench_HostDeviceBandwidth_2D_D2H_Strided - 617.816000 ms
MicroBench_HostDeviceBandwidth_3D_D2H_Strided - 617.727000 ms
MicroBench_LocalMem_int32_4096 - 29.924000 ms
MicroBench_LocalMem_fp32_4096 - 29.864000 ms
Relative perf in group Pattern (10): cannot calculate
Benchmark This PR baseline Relative perf Change -
Pattern_Reduction_NDRange_int32 - 16.761000 ms
Pattern_Reduction_Hierarchical_int32 - 16.736000 ms
Pattern_SegmentedReduction_NDRange_int16 - 2.264000 ms
Pattern_SegmentedReduction_NDRange_int32 - 2.166000 ms
Pattern_SegmentedReduction_NDRange_int64 - 2.337000 ms
Pattern_SegmentedReduction_NDRange_fp32 - 2.165000 ms
Pattern_SegmentedReduction_Hierarchical_int16 - 11.801000 ms
Pattern_SegmentedReduction_Hierarchical_int32 - 11.589000 ms
Pattern_SegmentedReduction_Hierarchical_int64 - 11.771000 ms
Pattern_SegmentedReduction_Hierarchical_fp32 - 11.590000 ms
Relative perf in group ScalarProduct (6): cannot calculate
Benchmark This PR baseline Relative perf Change -
ScalarProduct_NDRange_int32 - 3.744000 ms
ScalarProduct_NDRange_int64 - 5.440000 ms
ScalarProduct_NDRange_fp32 - 3.760000 ms
ScalarProduct_Hierarchical_int32 - 10.507000 ms
ScalarProduct_Hierarchical_int64 - 11.485000 ms
ScalarProduct_Hierarchical_fp32 - 10.152000 ms
Relative perf in group USM (7): cannot calculate
Benchmark This PR baseline Relative perf Change -
USM_Allocation_latency_fp32_device - 0.066000 ms
USM_Allocation_latency_fp32_host - 37.402000 ms
USM_Allocation_latency_fp32_shared - 0.065000 ms
USM_Instr_Mix_fp32_device_1:1mix_with_init_no_prefetch - 1.681000 ms
USM_Instr_Mix_fp32_host_1:1mix_with_init_no_prefetch - 1.056000 ms
USM_Instr_Mix_fp32_device_1:1mix_no_init_no_prefetch - 1.838000 ms
USM_Instr_Mix_fp32_host_1:1mix_no_init_no_prefetch - 1.205000 ms
Relative perf in group VectorAddition (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
VectorAddition_int32 - 1.492000 ms
VectorAddition_int64 - 3.061000 ms
VectorAddition_fp32 - 1.434000 ms
Relative perf in group Polybench (3): cannot calculate
Benchmark This PR baseline Relative perf Change -
Polybench_2mm - 1.039000 ms
Polybench_3mm - 1.482000 ms
Polybench_Atax - 6.416000 ms
Relative perf in group Kmeans (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
Kmeans_fp32 - 14.144000 ms
Relative perf in group LinearRegressionCoeff (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
LinearRegressionCoeff_fp32 - 899.874000 ms
Relative perf in group MolecularDynamics (1): cannot calculate
Benchmark This PR baseline Relative perf Change -
MolecularDynamics - 0.029000 ms

Details

Benchmark details - environment, command...
api_overhead_benchmark_l0 SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_l0 SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_l0 --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_sycl SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros --multiplier=1

api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

miscellaneous_benchmark_sycl VectorSum

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:1 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=1 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:1, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=1 --NumOpsPerThread=400 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:100, numThreads:8, allocSize:102400 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=102400 --NumThreads=8 --NumOpsPerThread=100 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:400, numThreads:8, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=8 --NumOpsPerThread=400 --iterations=1000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:10, numThreads:16, allocSize:1024 srcUSM:0 dstUSM:1

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=1 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=16 --NumOpsPerThread=10 --iterations=10000 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:1, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=1 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

multithread_benchmark_ur MemcpyExecute opsPerThread:4096, numThreads:4, allocSize:1024 srcUSM:0 dstUSM:1 without events

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/multithread_benchmark_ur --test=MemcpyExecute --csv --noHeaders --Ioq=1 --UseEvents=0 --MeasureCompletion=1 --UseQueuePerThread=1 --AllocSize=1024 --NumThreads=4 --NumOpsPerThread=4096 --iterations=10 --SrcUSM=0 --DstUSM=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=10 --withGraphs=1

graph_api_benchmark_sycl SinKernelGraph graphs:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=0

graph_api_benchmark_sycl SinKernelGraph graphs:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SinKernelGraph --csv --noHeaders --iterations=100 --numKernels=100 --withGraphs=1

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:1, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=1 --ioq=1 --numKernels=100

graph_api_benchmark_sycl SubmitExecGraph ioq:0, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=0 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:10

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=10

graph_api_benchmark_sycl SubmitExecGraph ioq:1, submit:0, numKernels:100

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/graph_api_benchmark_sycl --test=SubmitExecGraph --csv --noHeaders --iterations=100 --measureSubmit=0 --ioq=1 --numKernels=100

api_overhead_benchmark_ur SubmitKernel out of order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel out of order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion CPU count

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

api_overhead_benchmark_ur SubmitKernel in order with measure completion

Environment Variables:

Command:

/home/pmdk/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=1 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Velocity-Bench Hashtable

Environment Variables:

Command:

/home/pmdk/bench_workdir/hashtable/hashtable_sycl --no-verify

Velocity-Bench Bitcracker

Environment Variables:

Command:

/home/pmdk/bench_workdir/bitcracker/bitcracker -f /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/pmdk/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Velocity-Bench QuickSilver

Environment Variables:

QS_DEVICE=GPU

Command:

/home/pmdk/bench_workdir/QuickSilver/qs -i /home/pmdk/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Velocity-Bench Sobel Filter

Environment Variables:

OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/pmdk/bench_workdir/sobel_filter/sobel_filter -i /home/pmdk/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Velocity-Bench dl-mnist

Environment Variables:

NEOReadDebugKeys=1
DisableScratchPages=0

Command:

/home/pmdk/bench_workdir/dl-mnist/dl-mnist-sycl -conv_algo ONEDNN_AUTO

llama.cpp Prompt Processing Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 128

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Prompt Processing Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 256

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Prompt Processing Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

llama.cpp Text Generation Batched 512

Environment Variables:

Command:

/home/pmdk/bench_workdir/llamacpp-build/bin/llama-bench --output csv -n 128 -p 512 -b 128,256,512 --numa isolate -t 56 --model /home/pmdk/bench_workdir/models/Phi-3-mini-4k-instruct-q4.gguf

alloc/size:10000/0/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/100000/4096/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 glibc

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 proxy_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 os_provider

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 scalable_pool

Environment Variables:

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv

alloc/size:10000/0/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/0/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/size:10000/100000/4096/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

alloc/min size:10000/max size:0/granularity:8/65536/8/iterations:200000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/size:10000/4096/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:4 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc

multiple_malloc_free/min size:10000/max size:8/granularity:65536/8/iterations:2000/threads:1 umfProxy

Environment Variables:

LD_PRELOAD=/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/lib/libumf_proxy.so

Command:

/home/pmdk/ur-actions-runner/_work/unified-runtime/unified-runtime/umf_build/benchmark/umf-benchmark --benchmark_format=csv --benchmark_filter=glibc
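The umfProxy entries reuse the glibc cases of umf-benchmark, run with libumf_proxy.so injected through LD_PRELOAD so that malloc/free are served by UMF. The Python sketch below shows one way to reproduce such a run and read its CSV output. It assumes umf-benchmark uses Google Benchmark, which its `--benchmark_format`/`--benchmark_filter` flags suggest. It also assumes the standard Google Benchmark CSV header (`name,iterations,real_time,cpu_time,time_unit,...`). The helper names are hypothetical and the paths are placeholders.

```python
import csv
import io
import os
import subprocess
from typing import Optional

def build_invocation(
    binary: str,
    preload: Optional[str] = None,
    benchmark_filter: Optional[str] = None,
) -> tuple[list[str], dict[str, str]]:
    """Build argv and environment mirroring the commands listed above."""
    argv = [binary, "--benchmark_format=csv"]
    if benchmark_filter:
        argv.append(f"--benchmark_filter={benchmark_filter}")
    env = dict(os.environ)
    if preload:
        # Route the process's allocations through the preloaded library.
        env["LD_PRELOAD"] = preload
    return argv, env

def parse_benchmark_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV output, skipping any lines that precede the header row."""
    lines = text.splitlines()
    start = next((i for i, ln in enumerate(lines) if ln.startswith("name,")), None)
    if start is None:
        return []
    return list(csv.DictReader(io.StringIO("\n".join(lines[start:]))))

def run_benchmark(
    binary: str,
    preload: Optional[str] = None,
    benchmark_filter: Optional[str] = None,
) -> list[dict[str, str]]:
    """Run the benchmark binary and return one dict per result row."""
    argv, env = build_invocation(binary, preload, benchmark_filter)
    out = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return parse_benchmark_csv(out.stdout)

# Usage (placeholder paths):
#   rows = run_benchmark("<umf_build>/benchmark/umf-benchmark",
#                        preload="<umf_build>/lib/libumf_proxy.so",
#                        benchmark_filter="glibc")
```

Passing the library path only in the child's environment keeps the Python interpreter itself free of the proxy allocator.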

Labels
ci/cd Continuous integration/delivery
2 participants