-
Dear all, I am trying to build a NN model using Brevitas and FINN. For the hardware implementation, I exported the stitched IP and ran a behavioral simulation of the IP within Vivado. In the folding config, I made each layer fully parallel (PE and SIMD both set to their maximum values) to minimize latency. Thank you in advance.
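(For context: a FINN folding configuration is a JSON file that maps node names to folding attributes, typically applied via the ApplyConfig transformation or the builder's folding_config_file option. Below is a minimal sketch of a "fully parallel" config; the node names and values are hypothetical, and the legal PE/SIMD maxima depend on each layer's matrix dimensions.)

```json
{
  "Defaults": {},
  "MatrixVectorActivation_0": {
    "PE": 64,
    "SIMD": 64
  },
  "MatrixVectorActivation_1": {
    "PE": 32,
    "SIMD": 64
  }
}
```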
-
Hi, maybe your pipeline parallelism is not fully balanced, so you have a bottleneck in a later stage of the pipeline? Normally you would want the bottleneck to be your first layer, and have all following layers match or exceed its throughput. For full unfolding of CNN layers you need to use the ConvolutionInputGenerator_rtl with the parallel_window option enabled.
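(As a hedged illustration of the suggestion above: enabling the parallel window mode of the RTL sliding-window generator in the folding config could look like the sketch below. The node name and SIMD value are hypothetical placeholders.)

```json
{
  "ConvolutionInputGenerator_rtl_0": {
    "SIMD": 3,
    "parallel_window": 1
  }
}
```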
I found a way to do this.
In principle, the default IP generated by FINN does not include this pipeline setup, so back pressure builds up when the input data arrives too frequently.
After the stitched IP was generated, I found that the temporary HLS projects are located under /tmp/.
Then I simply opened each project and added one line to the source code:
#pragma HLS PIPELINE II=1 style=frp
exported the RTL, and repeated this for all the HLS projects.
With the newly generated IP, the dataflow is fully pipelined and handles back pressure correctly.
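(For reference, here is a minimal sketch of where such a pragma sits inside an HLS top-level function. The function name, stream types, and body are hypothetical placeholders, not FINN's actual generated code.)

```cpp
#include <hls_stream.h>
#include <ap_int.h>

// Hypothetical top-level layer function; FINN's generated sources differ.
void layer_top(hls::stream<ap_uint<8>> &in0, hls::stream<ap_uint<8>> &out0) {
#pragma HLS INTERFACE axis port=in0
#pragma HLS INTERFACE axis port=out0
#pragma HLS INTERFACE ap_ctrl_none port=return
    // Free-running pipeline: II=1, stalling only on empty/full streams,
    // so back pressure propagates cleanly through the AXI streams.
#pragma HLS PIPELINE II=1 style=frp
    ap_uint<8> x = in0.read();   // blocking read: stalls if input is empty
    out0.write(x);               // placeholder passthrough computation
}
```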