Section 6 - Larger Example Designs

A number of example designs are available here that further illustrate many of the unique features of AI Engines and the NPU array in Ryzen™ AI. This section covers more complex application designs for both vision and machine learning use cases. In particular, we describe a ResNet implementation for Ryzen™ AI.

Vision Kernels

| Design name | Data type | Description |
|-|-|-|
| Vision Passthrough | i8 | A simple pipeline with just one `passThrough` kernel. This pipeline mainly tests whether the data movement correctly copies a greyscale image. |
| Color Detect | i32 | A multi-kernel, multi-core pipeline that detects colors in an RGBA image. |
| Edge Detect | i32 | A multi-kernel, multi-core pipeline that detects edges in an image and overlays the detection on the original image. |
| Color Threshold | i32 | A multi-core, data-parallel implementation of color thresholding of an RGBA image. |

Machine Learning Designs

| Design name | Data type | Description |
|-|-|-|
| bottleneck | ui8 | A Bottleneck Residual Block is a variant of the residual block that uses three convolutions with 1x1, 3x3, and 1x1 filter sizes, respectively. The implementation fuses multiple kernels and applies dataflow optimizations, highlighting the unique architectural capabilities of AI Engines. |
| resnet | ui8 | ResNet with offloaded conv2_x layers. The implementation features a depth-first implementation of multiple bottleneck blocks across multiple NPU columns. |
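
To make the 1x1, 3x3, 1x1 structure concrete, the following is a minimal reference sketch of a bottleneck residual block in plain Python. It is not the AI Engine implementation: it works on floating-point nested lists rather than ui8 data, ignores kernel fusion and quantization, and assumes a stride of 1 with zero padding of 1 on the 3x3 convolution so that the skip connection has matching dimensions. The helper names (`conv2d`, `relu`, `add`, `bottleneck`, `shape`, `const_weights`) are hypothetical and chosen only for illustration.

```python
def conv2d(x, w, pad=0):
    """Naive stride-1 convolution on nested lists.

    x is indexed as x[row][col][c_in], w as w[ky][kx][c_in][c_out].
    """
    k, c_in, c_out = len(w), len(w[0][0]), len(w[0][0][0])
    h, wd = len(x), len(x[0])
    out = []
    for i in range(h + 2 * pad - k + 1):
        row = []
        for j in range(wd + 2 * pad - k + 1):
            acc = [0.0] * c_out
            for ky in range(k):
                for kx in range(k):
                    y, xx = i + ky - pad, j + kx - pad
                    if 0 <= y < h and 0 <= xx < wd:  # outside the image counts as zero
                        for ci in range(c_in):
                            for co in range(c_out):
                                acc[co] += x[y][xx][ci] * w[ky][kx][ci][co]
            row.append(acc)
        out.append(row)
    return out


def relu(t):
    return [[[max(v, 0.0) for v in px] for px in row] for row in t]


def add(a, b):
    return [[[u + v for u, v in zip(pa, pb)] for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def bottleneck(x, w1, w2, w3):
    y = relu(conv2d(x, w1))         # 1x1: reduce the channel count
    y = relu(conv2d(y, w2, pad=1))  # 3x3: spatial filtering (assumed padding of 1)
    y = conv2d(y, w3)               # 1x1: restore the channel count
    return relu(add(y, x))          # skip connection, then activation


def shape(t):
    return (len(t), len(t[0]), len(t[0][0]))


def const_weights(k, c_in, c_out, value):
    """Hypothetical helper: a k x k x c_in x c_out kernel filled with one value."""
    return [[[[value] * c_out for _ in range(c_in)] for _ in range(k)]
            for _ in range(k)]


x = [[[1.0] * 8 for _ in range(4)] for _ in range(4)]  # 4x4 image, 8 channels
out = bottleneck(x,
                 const_weights(1, 8, 2, 0.0),
                 const_weights(3, 2, 2, 0.0),
                 const_weights(1, 2, 8, 0.0))
print(shape(out))  # (4, 4, 8)
```

With all-zero weights the convolution path contributes nothing, so the output equals the input passed through the skip connection, which is a quick sanity check that the residual add is wired correctly.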

Exercises

  1. In the bottleneck design, how many different types of fused computations do you observe?
  2. In the bottleneck design, which follows a dataflow approach, how many elements does the 3x3 convolution require from the 1x1 convolution core before it can proceed with its computation?
  3. Suppose you have a bottleneck block with input dimensions of 32x32x256. After passing through the 1x1 convolutional layer, the output dimensions become 32x32x64. What would the output dimensions be after the subsequent 3x3 convolutional layer, assuming a stride of 1, no padding, and 64 output channels?
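
When reasoning about shape questions like the last exercise, the standard convolution output-size formula is useful: for input size `n`, kernel size `k`, stride `s`, and padding `p`, the output size is `floor((n + 2p - k) / s) + 1`. The sketch below encodes that formula; the function names are hypothetical and the example numbers are deliberately different from the exercise.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_out_shape(height, width, kernel, out_channels, stride=1, padding=0):
    """Output (height, width, channels) for a square kernel."""
    return (conv_out_size(height, kernel, stride, padding),
            conv_out_size(width, kernel, stride, padding),
            out_channels)


# A 3x3 kernel with padding 1 preserves the spatial size.
print(conv_out_shape(56, 56, kernel=3, out_channels=64, padding=1))  # (56, 56, 64)

# Without padding, a 3x3 kernel trims one pixel from each border.
print(conv_out_shape(8, 8, kernel=3, out_channels=16))  # (6, 6, 16)
```

Note that the number of output channels is set by the number of filters alone; only the spatial dimensions depend on kernel size, stride, and padding.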

[Prev - Section 5] [Top]