Optimize OpenCL kernel building time (still has a problem on Win10) #34
base: master
On Win10, I find the CPU usage is high during inference (or while waiting on an infer request); clWaitForEvents seems to be a busy wait.
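If clWaitForEvents really spins, one application-side workaround is to poll the event status and sleep between polls. The sketch below is a hypothetical helper, not part of OpenCL. `wait_with_backoff` takes a status callback that, in a real program, would wrap `clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &status, NULL)`. `STATUS_COMPLETE` mirrors `CL_COMPLETE` (0), and negative values are treated as errors, as in OpenCL. The sketch uses POSIX `nanosleep`; on Windows, `Sleep` would take its place. As the driver team's reply in this thread warns, this trades completion latency for lower CPU usage.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Mirrors OpenCL's CL_COMPLETE (0); positive = queued/submitted/running,
 * negative = error, as with CL_EVENT_COMMAND_EXECUTION_STATUS. */
#define STATUS_COMPLETE 0

/* Hypothetical callback type: returns the current status of one command. */
typedef int (*status_query_fn)(void *ctx);

/* Poll until the command completes or fails, sleeping between polls instead
 * of spinning. The sleep doubles from min_us up to max_us, so short commands
 * are still noticed quickly while long ones cost little CPU.
 * Returns the final status and stores the number of polls in *polls_out. */
static int wait_with_backoff(status_query_fn query, void *ctx,
                             long min_us, long max_us, int *polls_out)
{
    assert(query != NULL && min_us > 0 && max_us >= min_us);
    long delay_us = min_us;
    int polls = 0;
    for (;;) {
        int status = query(ctx);
        polls++;
        if (status <= STATUS_COMPLETE) {      /* complete (0) or error (<0) */
            if (polls_out)
                *polls_out = polls;
            return status;
        }
        struct timespec ts = { delay_us / 1000000L,
                               (delay_us % 1000000L) * 1000L };
        nanosleep(&ts, NULL);
        delay_us = (delay_us * 2 > max_us) ? max_us : delay_us * 2;
    }
}

/* Mock command for demonstration: reports "running" (1) until the counter
 * pointed to by ctx reaches zero, then reports complete. */
static int countdown_query(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0)
        return STATUS_COMPLETE;
    (*remaining)--;
    return 1;
}
```

With a counter of 3, `countdown_query` reports "running" three times and completes on the fourth poll.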
As for the long time spent in clCreateBuffer, I found the cause: there are too many clCreateBuffer calls (>30000). Each call is as fast as with CL_MEM_USE_HOST_PTR, but there are simply too many calls. I will try Drop 7.0, since it has a memory optimization. clGetPlatformIDs is indeed slow. I also tried my own code and clCaffe with the same result (200+ ms), but it seems that Windows is slow on the first OpenCL call.
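One common way to cut the number of clCreateBuffer calls is to allocate one large pool buffer and carve the small allocations out of it. Sub-buffers are created with `clCreateSubBuffer(pool, flags, CL_BUFFER_CREATE_TYPE_REGION, &region, &err)`, where `region` is a `cl_buffer_region { origin, size }`. Each origin must be a multiple of `CL_DEVICE_MEM_BASE_ADDR_ALIGN`, which is reported in bits and must be divided by 8. Otherwise the call fails with `CL_MISALIGNED_SUB_BUFFER_OFFSET`. Each sub-buffer is still one API call, so whether this helps depends on where the driver spends its time. An alternative is to keep one buffer and pass byte offsets as kernel arguments. The sketch below covers only the offset planning. `plan_pool` and `align_up` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Plan one pool for n small allocations: offsets[i] receives the byte offset
 * of allocation i inside the pool, each aligned to align_bytes
 * (CL_DEVICE_MEM_BASE_ADDR_ALIGN / 8). Returns the total pool size, i.e. the
 * size to pass to the single clCreateBuffer call. */
static size_t plan_pool(const size_t *sizes, size_t n, size_t align_bytes,
                        size_t *offsets)
{
    size_t cursor = 0;
    for (size_t i = 0; i < n; i++) {
        cursor = align_up(cursor, align_bytes);
        offsets[i] = cursor;        /* becomes region.origin */
        cursor += sizes[i];         /* region.size is sizes[i] */
    }
    return cursor;
}
```

For example, sizes of 10, 100 and 4 bytes with 128-byte alignment land at offsets 0, 128 and 256, in a 260-byte pool.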
Conclusion:
3. High CPU usage (it is time to optimize the OpenCL driver or the GPU driver)
Hello @liyuming1978, I am from the OpenCL driver team.
DECLARE_DEBUG_VARIABLE(int32_t, OverrideEnableKmdNotify, -1, "-1: dont override, 0: disable, 1: enable")
It is also platform dependent, so different SKUs have different timers. Be warned, though: this may decrease performance due to completion latencies in non-busy mode. For long compilation times, we have an experimental feature in the driver called cl_cache.
@MichalMrozek cl_cache works! That is good for my patch. As for DECLARE_DEBUG_VARIABLE..., does it require rebuilding the OpenCL driver, or is setting a registry key enough? I use a fast, small model to get the best performance, and I notice this only happens on Windows, not on Ubuntu.
Windows is a bit different. Unfortunately, Windows builds cannot be produced from the open-source code, so you will not be able to try those flags, and registry flags do not work with official drivers because they are disabled there.
@MichalMrozek How do I enable cl_cache on Ubuntu? I use a NUC on Windows, so it is always on AC power. My email is [email protected] :)
cl_cache works the same way on Linux & Windows.
Now, for the SSD model, loading takes 5 s on Linux and ~10 s on Windows...
I have done the kernel-build optimization for Windows; Linux still needs set_exepath() to be added.
I have not tested it on Linux, only on Windows (my ISV needs a Windows version), to speed up the loading time.
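`set_exepath()` is the author's helper, and its body is not shown here. Assuming its job is to switch the working directory to the executable's folder (so that a `cl_cache` directory next to the binary is found no matter where the app is launched from), a Linux counterpart could resolve `/proc/self/exe`. `chdir_to_exe_dir` is a hypothetical name for this Linux-only sketch.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Change the working directory to the directory containing the running
 * executable (Linux only: resolves the /proc/self/exe symlink).
 * Returns 0 on success, -1 on failure. */
static int chdir_to_exe_dir(void)
{
    char path[PATH_MAX];
    ssize_t len = readlink("/proc/self/exe", path, sizeof path - 1);
    if (len <= 0 || (size_t)len >= sizeof path - 1)
        return -1;                 /* error or possibly truncated path */
    path[len] = '\0';              /* readlink does not NUL-terminate */
    char *slash = strrchr(path, '/');
    assert(slash != NULL);         /* /proc/self/exe is an absolute path */
    if (slash == path)
        slash[1] = '\0';           /* executable lives directly in "/" */
    else
        *slash = '\0';             /* strip the file name */
    return chdir(path);
}
```

Calling it once at the start of `main`, before any OpenCL call, would make relative paths such as `cl_cache` resolve next to the executable.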
Now, the status on Windows is still bad. If the kernels are always built, clCreateBuffer costs 2 s, clGetPlatformIDs 2 s, and clBuildProgram 7 s.
After the optimization, clBuildProgram is gone, but clCreateBuffer now takes 9 s...
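Independently of the driver's cl_cache, an application can cache compiled kernels itself. After the first build, it reads the binary with `clGetProgramInfo` (`CL_PROGRAM_BINARY_SIZES`, then `CL_PROGRAM_BINARIES`) and writes it to disk. On later runs, it loads the binary with `clCreateProgramWithBinary` followed by `clBuildProgram`, which is then typically much cheaper. This is not necessarily what this PR does. The sketch below covers only the cache key. A hypothetical `binary_cache_name` hashes (64-bit FNV-1a) everything assumed to change the binary, here the device name, driver version, build options and source. A stale key would silently load a binary built for a different driver or option set.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a, continuing from hash state h. */
static uint64_t fnv1a64(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;     /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Build a cache file name such as "0123456789abcdef.bin" from the device
 * name (CL_DEVICE_NAME), driver version (CL_DRIVER_VERSION), build options
 * and kernel source. Each field is hashed with its terminating NUL, so
 * ("ab", "c") and ("a", "bc") produce different keys. out_len must be >= 21. */
static void binary_cache_name(const char *device, const char *driver,
                              const char *options, const char *source,
                              char *out, size_t out_len)
{
    const char *fields[4] = { device, driver, options, source };
    uint64_t h = 0xcbf29ce484222325ULL;    /* FNV-1a offset basis */
    for (int i = 0; i < 4; i++) {
        assert(fields[i] != NULL);
        h = fnv1a64(h, fields[i], strlen(fields[i]) + 1);
    }
    snprintf(out, out_len, "%016llx.bin", (unsigned long long)h);
}
```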
I use Skylake GT3e and Kaby Lake GT2, with the same result on Win10. The driver is https://downloadmirror.intel.com/27803/a08/win64_24.20.100.6094.exe
The two models are at https://github.com/liyuming1978/openvino_example/tree/master/windows/facedemo/facedemo/model