Execution of Inference
Network execution happens when the user calls inferRequest->infer() or inferRequest->start_async(). (link)
At a high level, all we need to do is enqueue OCL kernels with their buffers. For that purpose, we need to find the cldnn::network instance, because it contains the buffers required for execution. (link) CPUStreamExecutor holds the streams, and each stream corresponds to a cldnn::network instance. (link)
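The stream-to-network relationship can be pictured with a toy C++ model. It assumes, as described above, that each stream owns its own network instance and therefore its own buffers. ToyNetwork, ToyStreamExecutor and network_for_stream are hypothetical names for this sketch only. They do not reproduce the CPUStreamExecutor or cldnn::network interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cldnn::network: each instance owns the
// buffers its kernels read and write.
struct ToyNetwork {
    std::vector<float> buffers;  // per-instance buffers
    std::size_t executions = 0;  // how many times this instance has run

    explicit ToyNetwork(std::size_t buffer_size) : buffers(buffer_size, 0.0f) {}

    // Stands in for enqueuing kernels: writes a value into every buffer.
    void execute(float value) {
        for (float& b : buffers) b = value;
        ++executions;
    }
};

// Hypothetical stand-in for the stream executor: one network per stream.
class ToyStreamExecutor {
public:
    ToyStreamExecutor(std::size_t num_streams, std::size_t buffer_size) {
        networks_.reserve(num_streams);
        for (std::size_t i = 0; i < num_streams; ++i)
            networks_.emplace_back(buffer_size);
    }

    // Finding the network for a request is a lookup by its stream id.
    ToyNetwork& network_for_stream(std::size_t stream_id) {
        assert(stream_id < networks_.size());
        return networks_[stream_id];
    }

private:
    std::vector<ToyNetwork> networks_;
};
```

In this model, requests running on different streams never write into each other's buffers. The price is one full set of buffers per stream.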
The main body of network execution is cldnn::network::execute_impl. (link) In this function, set_arguments() is called to set the OpenCL kernel arguments, and execute_primitive is called to enqueue the kernels into the OCL queue.
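The control flow of execute_impl described above (bind arguments, then enqueue) can be sketched with a toy in-order queue. This is a hypothetical model, not plugin code. ToyQueue, ToyPrimitive and toy_execute_impl are invented names. The queue defers every enqueued task until finish(), so the sketch shows that enqueuing a kernel is not the same as completing it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-order queue. enqueue() only records work and finish()
// runs it in submission order, modeling the fact that an enqueued kernel
// is not guaranteed to have completed until the queue is finished.
class ToyQueue {
public:
    void enqueue(std::string name, std::function<void()> work) {
        names_.push_back(std::move(name));
        pending_.push_back(std::move(work));
    }
    void finish() {
        for (auto& work : pending_) work();
        pending_.clear();
    }
    const std::vector<std::string>& enqueued() const { return names_; }
    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::string> names_;
    std::vector<std::function<void()>> pending_;
};

// Hypothetical primitive: a kernel name plus the buffers bound to it.
struct ToyPrimitive {
    std::string kernel;
    float scale = 1.0f;
    const float* input = nullptr;
    float* output = nullptr;

    // Plays the role of set_arguments(): binds buffers to the kernel.
    void set_arguments(const float* in, float* out) {
        input = in;
        output = out;
    }
};

// Toy version of the execute_impl loop: for each primitive in execution
// order, bind its arguments, then enqueue its kernel (execute_primitive).
void toy_execute_impl(std::vector<ToyPrimitive>& prims,
                      std::vector<float>& bufs, ToyQueue& queue) {
    assert(bufs.size() == prims.size() + 1);  // buffer i feeds primitive i
    for (std::size_t i = 0; i < prims.size(); ++i) {
        ToyPrimitive* p = &prims[i];
        p->set_arguments(&bufs[i], &bufs[i + 1]);
        queue.enqueue(p->kernel, [p] { *p->output = *p->input * p->scale; });
    }
}
```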
In the case of a synchronous API call (i.e. inferRequest->infer()), the plugin must also wait for the kernels to complete. This wait is performed in the cldnn::network_output::get_memory() function. (link)
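The difference between the two API calls can be modeled in the same toy style. The model assumes that the synchronous path is an enqueue followed by a read of the output memory, and that this read is what blocks. ToyRequest and ToyNetworkOutput are hypothetical. Only the idea that the wait lives in get_memory() comes from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical queue: enqueued work only runs when finish() is called.
class ToyQueue {
public:
    void enqueue(std::function<void()> work) { pending_.push_back(std::move(work)); }
    void finish() {
        for (auto& work : pending_) work();
        pending_.clear();
    }
    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::function<void()>> pending_;
};

// Hypothetical output handle: get_memory() finishes the queue before
// returning the result, so the caller never reads an incomplete value.
class ToyNetworkOutput {
public:
    ToyNetworkOutput(ToyQueue& queue, const float& value) : queue_(queue), value_(value) {}
    const float& get_memory() {
        queue_.finish();  // the wait happens here, not at enqueue time
        return value_;
    }

private:
    ToyQueue& queue_;
    const float& value_;
};

// Hypothetical request with one "kernel" that doubles its input.
struct ToyRequest {
    ToyQueue queue;
    float result = 0.0f;

    // Asynchronous path: enqueue the kernel and return immediately.
    void start_async(float input) {
        queue.enqueue([this, input] { result = input * 2.0f; });
    }

    // Synchronous path: enqueue, then block by reading the output memory.
    float infer(float input) {
        start_async(input);
        ToyNetworkOutput out(queue, result);
        return out.get_memory();
    }
};
```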
cldnn::network::execute_impl also contains logic to dump layer input/output buffers for debugging purposes. Because this logic is related to memory usage, it deserves a description, too.
To dump buffers, we need to catch the moment just before a kernel is called (for its source buffer) or just after it has been called (for its destination buffer). At any other moment the layer's buffer is not available, because buffers are reused from the memory pool. TBD
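A toy memory pool shows why the dump must happen at exactly that moment. The sketch assumes two layers whose output lifetimes do not overlap, so the pool assigns both outputs to one physical buffer. ToyPool, Layer and run_layers are hypothetical names. The first layer's output survives only in a snapshot taken right after its kernel runs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pool: both layers' outputs are assigned to one physical
// buffer because, by assumption, their lifetimes do not overlap.
struct ToyPool {
    std::vector<float> shared_buffer = std::vector<float>(4, 0.0f);
};

using Layer = std::pair<std::string, float>;  // (name, value its kernel writes)

// Runs the layers in order, each "kernel" overwriting the pooled buffer.
// With dump_each_layer set, the buffer is copied right after each kernel,
// the only moment it still holds that layer's output.
std::map<std::string, std::vector<float>> run_layers(
        ToyPool& pool, const std::vector<Layer>& layers, bool dump_each_layer) {
    std::map<std::string, std::vector<float>> dumps;
    for (const Layer& layer : layers) {
        for (float& x : pool.shared_buffer) x = layer.second;
        if (dump_each_layer) dumps[layer.first] = pool.shared_buffer;
    }
    return dumps;
}
```

Once the run finishes, the pooled buffer holds only the last writer's data. Inspecting buffers after execution therefore cannot recover the earlier layer's output.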
get_stream().finish() is called first, because the dump must be synchronized with kernel execution. (link) Then the buffer can be accessed. (link) How it is accessed depends on the kind of buffer. A usm_host or usm_shared buffer is accessed directly. A usm_device buffer is accessed after its data is copied into host memory, because the host cannot access usm_device memory directly. (link) An OCL memory buffer is mapped into host memory. (link) In typical network execution, usm_host is used for the network inputs and outputs, and usm_device is used for the buffers inside the network.
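The access rules above amount to a small decision table. The sketch below encodes it. AllocKind and HostAccess are hypothetical enums standing in for whatever types the plugin actually uses. The copy and map operations themselves are not shown, because the real calls depend on the USM and OpenCL APIs in use.

```cpp
#include <cassert>

// Hypothetical enums; the allocation kinds mirror the names in the text.
enum class AllocKind { usm_host, usm_shared, usm_device, ocl_memory };
enum class HostAccess { direct, copy_to_host, map_to_host };

// How the dumping code reaches a buffer's data from the host, after the
// stream has been finished.
HostAccess host_access_for(AllocKind kind) {
    switch (kind) {
        case AllocKind::usm_host:
        case AllocKind::usm_shared:
            return HostAccess::direct;        // host-accessible pointers
        case AllocKind::usm_device:
            return HostAccess::copy_to_host;  // host cannot read device USM
        case AllocKind::ocl_memory:
            return HostAccess::map_to_host;   // OCL buffers are mapped
    }
    assert(false && "unknown allocation kind");
    return HostAccess::direct;
}

// Typical placement described in the text: network inputs/outputs in
// usm_host, intermediate buffers in usm_device.
AllocKind typical_kind(bool is_network_io) {
    return is_network_io ? AllocKind::usm_host : AllocKind::usm_device;
}
```

Under this table, in a typical run only the intermediate (usm_device) buffers need an extra device-to-host copy when they are dumped.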
For how to use this dumping feature, please see (link).
© Copyright 2018-2024, OpenVINO team