Depthwise convolution for oneAPI #1131

Merged 129 commits on Dec 18, 2024

Commits
ce287f0
snapshot adding oneapi
jmitrevs Dec 21, 2023
cd0a2b8
fix reduce constexpr
jmitrevs Dec 21, 2023
3b3d40d
further updates
jmitrevs Dec 23, 2023
b742901
update the bridge and testbench
jmitrevs Dec 26, 2023
8f6ef78
fix issues discovered when compiling
jmitrevs Dec 28, 2023
2e56be4
update bridge writing files
jmitrevs Jan 8, 2024
db780f0
Merge remote-tracking branch 'upstream/main' into oneapi_backend
jmitrevs Jan 8, 2024
b90021f
build library (but not tested)
jmitrevs Jan 9, 2024
f086aa2
fix a bug in testbench
jmitrevs Jan 10, 2024
1f28cbf
snapshot after some debugging
jmitrevs Jan 11, 2024
3e69b9a
remove forgotten debug printing
jmitrevs Jan 11, 2024
17e6856
add build
jmitrevs Jan 11, 2024
2766a6e
pre-commit fixes
jmitrevs Jan 12, 2024
c4ce138
fix more pre-commit
jmitrevs Jan 12, 2024
354d708
fix more pre-commit errors
jmitrevs Jan 12, 2024
8119029
snapshot of work before reworking types
jmitrevs Jan 21, 2024
cae1a8a
Use using to decide array type, some preliminary updates
jmitrevs Feb 12, 2024
06a8c27
snapshot unifying types
jmitrevs Feb 14, 2024
8f58778
fix the testbench and bridge
jmitrevs Feb 14, 2024
86b0f4b
snapshot updating nnet_utils (not finished)
jmitrevs Feb 14, 2024
62c5ecb
define array in nnet_types for oneAPI
jmitrevs Feb 14, 2024
d203b42
fix parallel conv2d
jmitrevs Feb 14, 2024
f983ece
add back the streaming versions of algs, most unconverted
jmitrevs Feb 14, 2024
5dd9282
tentatively complete streaming for dense but not functional
jmitrevs Feb 15, 2024
09b9513
first version that compiles streaming
jmitrevs Feb 15, 2024
0e3f9ba
change how the pipe value type is extracted
jmitrevs Feb 16, 2024
e9f49ad
Merge remote-tracking branch 'upstream/main' into oneapi_backend
jmitrevs Feb 16, 2024
99038eb
fix pre-commit error
jmitrevs Feb 16, 2024
3d555ac
always treat elu as ELU class
jmitrevs Feb 26, 2024
68c6a51
fix batchnorm
jmitrevs Feb 26, 2024
a3f5b3c
snapshot towards fixing conv
jmitrevs Feb 27, 2024
0cbf5be
snapshot fixing test for streaming
jmitrevs Feb 28, 2024
75c9301
fix conv1d
jmitrevs Feb 28, 2024
ba2e283
fix conv2d
jmitrevs Feb 28, 2024
a7c08d3
fix reshape and flatten for oneAPI
jmitrevs Feb 29, 2024
fa05c8b
initial oneAPI tests
jmitrevs Feb 29, 2024
36d5c85
remove nnet_dense_compressed from oneAPI
jmitrevs Mar 14, 2024
de8b76d
add merge functionality (untested)
jmitrevs Mar 16, 2024
058adb4
fix merge for oneAPI
jmitrevs Mar 16, 2024
0a7c761
fix merge for oneAPI (missing commit)
jmitrevs Mar 16, 2024
4c847b2
add zeropadding
jmitrevs Mar 17, 2024
f690c98
standardize paralellization spelling
jmitrevs Mar 17, 2024
262bc0c
fix pointwise for oneAPI
jmitrevs Apr 15, 2024
a8da30e
remove references to quartus
jmitrevs Apr 15, 2024
46ccc1d
more replace quartus with oneapi
jmitrevs Apr 15, 2024
8c9313b
snapshot on the way towards implementing pooling
jmitrevs Apr 15, 2024
0498d44
fix io_stream pooling for oneAPI
jmitrevs Apr 16, 2024
7bd7ba5
add fix for Conv2DBatchnorm
jmitrevs Apr 16, 2024
4ff035f
accidentally committed CMakeLists.txt in my debug setup
jmitrevs Apr 16, 2024
b754f76
reshaping, not fully tested
jmitrevs Apr 17, 2024
2e5a05e
fix cloning of streams
jmitrevs Apr 18, 2024
77c5672
Merge remote-tracking branch 'upstream/main' into oneapi_backend
jmitrevs Apr 19, 2024
8470a6c
fix pytest library loading
jmitrevs Apr 22, 2024
20128bb
remove unused template
jmitrevs Apr 22, 2024
efb6a7a
fix some activation bugs
jmitrevs Apr 22, 2024
6f439d5
fix the overwriting of directories in the pytest
jmitrevs Apr 22, 2024
637e192
update version of test repository
jmitrevs Apr 23, 2024
0f12c96
try to fix docker issue
jmitrevs Apr 23, 2024
a5aac2a
bump hls4ml-testing tag to 0.5.2
jmitrevs Apr 24, 2024
412bd43
try not restricting tensorflow-model-optimizatoin
jmitrevs Apr 24, 2024
5cffadf
Update to 0.5.3 for testing
jmitrevs Apr 24, 2024
d156339
bump to docker image 0.5.4, suggested by Ben
jmitrevs Apr 25, 2024
924af07
fix pre-commit warning
jmitrevs Apr 25, 2024
7ded550
dial down N_TESTS_PER_YAML to 4
jmitrevs Apr 25, 2024
e966b18
revert tensorflow-model-optimization change
jmitrevs Apr 25, 2024
e649f34
fix issue of saving in "obsolete" h5 format
jmitrevs Apr 25, 2024
4743a5d
Merge remote-tracking branch 'upstream/main' into oneapi_backend
jmitrevs May 3, 2024
bf68958
fix embedding for oneAPI
jmitrevs May 8, 2024
d07985d
First attempt at adding RNNs to oneAPI
jmitrevs May 8, 2024
a58e4f5
fix bug in array size
jmitrevs May 9, 2024
eb9575a
fix order or indices
jmitrevs May 9, 2024
04e0fcf
Merge branch 'main' into oneapi_backend
jmitrevs May 31, 2024
b4ed5bc
make queues static in bridge
jmitrevs Jun 6, 2024
ba55211
fix logic error in repack stream
jmitrevs Jun 27, 2024
9b790c5
changing the style, but functionally identical
jmitrevs Jun 27, 2024
b4e8873
Merge remote-tracking branch 'upstream/main' into oneapi_backend
jmitrevs Jul 17, 2024
60fe56b
Merge branch 'main' into oneapi_backend
jmitrevs Jul 25, 2024
056765e
update pointwise optimizer for oneAPI
jmitrevs Jul 25, 2024
ee6817d
add oneAPI to test_multi_dense.py
jmitrevs Jul 25, 2024
5a5b015
fix updating weight types
jmitrevs Jul 29, 2024
1d72aa8
initial changes of templates, for testing
jmitrevs Jul 29, 2024
106d578
fix weight naming, product selection
jmitrevs Jul 29, 2024
80902d7
make im2col the default; fix winograd size
jmitrevs Jul 29, 2024
ea213a3
fix up streaming dense and convolution
jmitrevs Jul 30, 2024
5ba9a29
fix prelu, some batchnorm
jmitrevs Jul 30, 2024
fdd0baf
fix weight array of exponential types
jmitrevs Jul 31, 2024
3ff54a9
move ACExponentialPrecisionDefinition to oneapi_types
jmitrevs Jul 31, 2024
d6604f0
attempt to fix batchnorm and recurrent
jmitrevs Aug 3, 2024
0f74122
Merge branch 'main' into oneapi_backend
jmitrevs Aug 22, 2024
9ffd18e
fixed BatchNormalizationQuantizedTanhConfigTemplate template selection
jmitrevs Aug 22, 2024
be08ad0
fix embedding_stream
jmitrevs Aug 22, 2024
c06beda
fix lstm and simple rnn
jmitrevs Aug 22, 2024
5452fab
fix GRU
jmitrevs Aug 22, 2024
e39e867
fix winograd, and also disable it by default
jmitrevs Aug 22, 2024
cfe229f
fix threshold name
jmitrevs Aug 23, 2024
70617e1
split bn_quant to be backend-specific
jmitrevs Aug 23, 2024
5bc6cbe
add type inference to oneAPI
jmitrevs Aug 25, 2024
c0cf580
add oneAPI to pytorch tests
jmitrevs Aug 27, 2024
8c827b8
fix pooling with padding for oneAPI and Quartus
jmitrevs Aug 29, 2024
a4f4bd9
Merge branch 'main' into oneapi_backend
jmitrevs Sep 12, 2024
f1c0301
Merge branch 'main' into oneapi_backend
jmitrevs Sep 13, 2024
7e0a8ca
Compilation for larger models enabled by increasing -fconstexpr-steps
laurilaatu Sep 13, 2024
acdc363
Merge pull request #6 from laurilaatu/oneapi_constexpr_fix
jmitrevs Sep 13, 2024
d1e14de
add oneapi clone tests; remove reduntand multi_clone test
jmitrevs Sep 18, 2024
1b78e57
remove some attributes to avoid overwrite warnings
jmitrevs Sep 24, 2024
865e2c8
Merge branch 'main' into oneapi_backend
jmitrevs Oct 1, 2024
f9a71f1
make extra handling for oneAPI like others (as in PR #1067)
jmitrevs Oct 1, 2024
320615d
remove warnings for extra optimizers that are not scheduled on purpose
jmitrevs Oct 1, 2024
5d13de5
update parametrized activations
jmitrevs Oct 2, 2024
09c5d5b
intial depthconv2d implementation
laurilaatu Oct 2, 2024
c92091b
intial depthconv2d implementation
laurilaatu Oct 2, 2024
8403348
Merge remote-tracking branch 'refs/remotes/origin/oneapi_separablecon…
laurilaatu Oct 2, 2024
c596f30
Rename to depthconv, add strides and add tests
laurilaatu Oct 9, 2024
bcd8c70
Remove class for DepthwiseConv2D
laurilaatu Oct 9, 2024
8981112
Remove Separable convolution template
laurilaatu Oct 10, 2024
5ad1188
Remove layer optimizer for sepconv
laurilaatu Oct 11, 2024
3c5b633
Merge branch 'main' into oneapi_separableconv
laurilaatu Nov 1, 2024
6b9bf0c
Loop unroll
laurilaatu Nov 7, 2024
09013a1
Merge remote-tracking branch 'origin' into oneapi_separableconv
laurilaatu Nov 18, 2024
21f21fc
Pre-commit format
laurilaatu Nov 18, 2024
9536248
Fix spelling
laurilaatu Nov 18, 2024
8ebdf22
Merge branch 'fastmachinelearning:main' into oneapi_separableconv
laurilaatu Nov 20, 2024
0fb0997
depthconv1d, channel order in loop, product
laurilaatu Nov 20, 2024
d34876d
Gather result to accum
laurilaatu Nov 20, 2024
7d9ec3a
Merge branch 'main' into oneapi_separableconv
laurilaatu Nov 22, 2024
d1c10ca
Merge branch 'main' into oneapi_separableconv
laurilaatu Nov 23, 2024
6de4043
Merge branch 'main' into oneapi_separableconv
laurilaatu Dec 5, 2024
326b188
Merge branch 'main' into oneapi_separableconv
laurilaatu Dec 9, 2024
c58db99
Merge branch 'main' into oneapi_separableconv
laurilaatu Dec 16, 2024
2 changes: 1 addition & 1 deletion hls4ml/backends/fpga/fpga_backend.py
@@ -94,7 +94,7 @@ def __init__(self, name):
attrs.append(ConfigurableAttribute('reuse_factor', default=1, description=descriptions.reuse_factor))
self.attribute_map[layer] = attrs

# seperable is kind of special because it is effectively two layers that will be split
# separable is kind of special because it is effectively two layers that will be split
for layer in (SeparableConv1D, SeparableConv2D):
attrs = self.attribute_map.get(layer, [])
attrs.append(TypeAttribute('depthwise_accum'))
40 changes: 37 additions & 3 deletions hls4ml/backends/oneapi/passes/convolution_templates.py
@@ -1,7 +1,7 @@
from hls4ml.backends.backend import get_backend
from hls4ml.backends.oneapi.oneapi_template import StreamFunctionCallTemplate, TaskSequenceTemplate
from hls4ml.backends.template import FunctionCallTemplate, LayerConfigTemplate
from hls4ml.model.layers import Conv1D, Conv2D, Conv2DBatchnorm
from hls4ml.model.layers import Conv1D, Conv2D, Conv2DBatchnorm, DepthwiseConv1D, DepthwiseConv2D

# TODO - Dilation rate ?

@@ -70,9 +70,20 @@
conv1d_include_list = ['nnet_utils/nnet_conv1d.h', 'nnet_utils/nnet_conv1d_stream.h']


depthconv1d_function_template = (
'nnet::depthwise_conv_1d_{data_format}<{input_t}, {output_t}, {config}>({input}, {output}, {w}, {b});'
)
depthconv1d_include_list = [
'nnet_utils/nnet_conv1d.h',
'nnet_utils/nnet_conv1d_resource.h',
'nnet_utils/nnet_depthconv1d.h',
'nnet_utils/nnet_depthconv1d_resource.h',
]


class Conv1DConfigTemplate(LayerConfigTemplate):
def __init__(self):
super().__init__(Conv1D)
super().__init__((Conv1D, DepthwiseConv1D))
self.template = conv1d_config_template
self.mult_template = conv_mult_config_template

@@ -137,6 +148,12 @@ def format(self, node):
return self.template.format(**params)


class DepthwiseConv1DFunctionTemplate(Conv1DFunctionTemplate):
def __init__(self):
super(Conv1DFunctionTemplate, self).__init__(DepthwiseConv1D, include_header=depthconv1d_include_list)
self.template = depthconv1d_function_template


''' 2D Conv '''
conv2d_config_template = """struct config{index} : nnet::conv2d_config {{
static const unsigned in_height = {in_height};
@@ -183,7 +200,7 @@

class Conv2DConfigTemplate(LayerConfigTemplate):
def __init__(self):
super().__init__((Conv2D, Conv2DBatchnorm))
super().__init__((Conv2D, Conv2DBatchnorm, DepthwiseConv2D))
self.template = conv2d_config_template
self.mult_template = conv_mult_config_template

@@ -233,3 +250,20 @@ def format(self, node):
raise RuntimeError('channels_first not supported on oneAPI')
params['data_format'] = 'cl'
return self.template.format(**params)


depthconv2d_function_template = (
'nnet::depthwise_conv_2d_{data_format}<{input_t}, {output_t}, {config}>({input}, {output}, {w}, {b});'
)
depthconv2d_include_list = [
'nnet_utils/nnet_conv2d.h',
'nnet_utils/nnet_conv2d_resource.h',
'nnet_utils/nnet_depthconv2d.h',
'nnet_utils/nnet_depthconv2d_resource.h',
]


class DepthwiseConv2DFunctionTemplate(Conv2DFunctionTemplate):
def __init__(self):
super(Conv2DFunctionTemplate, self).__init__(DepthwiseConv2D, include_header=depthconv2d_include_list)
self.template = depthconv2d_function_template
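The function templates above are plain Python format strings that the backend fills in per layer. A minimal sketch of how such a template gets rendered — all parameter values below are made up for illustration (the oneAPI backend supports channels-last only, hence `cl`):

```python
# The depthwise conv1d function template from the PR, rendered with
# hypothetical per-layer parameters (names are illustrative only).
depthconv1d_function_template = (
    'nnet::depthwise_conv_1d_{data_format}<{input_t}, {output_t}, {config}>({input}, {output}, {w}, {b});'
)

params = {
    'data_format': 'cl',      # oneAPI backend: channels-last only
    'input_t': 'input_t',
    'output_t': 'result_t',
    'config': 'config2',
    'input': 'layer1_out',
    'output': 'layer2_out',
    'w': 'w2',
    'b': 'b2',
}

rendered = depthconv1d_function_template.format(**params)
print(rendered)
```

The rendered string is the C++ call emitted into the generated firmware for that layer.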
2 changes: 1 addition & 1 deletion hls4ml/model/optimizer/__init__.py
@@ -59,7 +59,7 @@
'convert',
[
'channels_last_converter',
'seperable_to_depthwise_and_conv',
'separable_to_depthwise_and_conv',
'remove_transpose_before_flatten',
'remove_nop_transpose',
'remove_single_channel_transpose',
10 changes: 5 additions & 5 deletions hls4ml/model/optimizer/passes/seperable_to_dw_conv.py
@@ -1,5 +1,5 @@
"""
This optimizer converts a seperable convolution to a depthwise followed by a regular convolution.
This optimizer converts a separable convolution to a depthwise followed by a regular convolution.
For backends with a custom pointwise implementations the regular convolution will subsequently
be converted to a pointwise convolution by a different optimizer.
"""
@@ -10,8 +10,8 @@
from hls4ml.model.optimizer import OptimizerPass


class SeperableToDepthwiseAndConv(OptimizerPass):
"""Convert Seperable to DepthwiseConv + Conv (potentially later Pointwise)"""
class SeparableToDepthwiseAndConv(OptimizerPass):
"""Convert Separable to DepthwiseConv + Conv (potentially later Pointwise)"""

_dw_attributes = (
'in_width',
@@ -70,7 +70,7 @@ def transform(self, model, node):
model.config.parse_name_config(dw_name, dw_layer_config)

# creating the attributes
dw_attributes = {k: node.attributes[k] for k in SeperableToDepthwiseAndConv._dw_attributes if k in node.attributes}
dw_attributes = {k: node.attributes[k] for k in SeparableToDepthwiseAndConv._dw_attributes if k in node.attributes}
dw_attributes['n_filt'] = dw_attributes['n_chan'] * dw_attributes['depth_multiplier']
dw_attributes['use_bias'] = False

@@ -100,7 +100,7 @@ def transform(self, model, node):
model.config.parse_name_config(pw_name, pw_layer_config)

# creating the attributes
pw_attributes = {k: node.attributes[k] for k in SeperableToDepthwiseAndConv._pw_attributes if k in node.attributes}
pw_attributes = {k: node.attributes[k] for k in SeparableToDepthwiseAndConv._pw_attributes if k in node.attributes}
pw_attributes['filt_width'] = 1
pw_attributes['filt_height'] = 1
pw_attributes['stride_width'] = 1
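The attribute handling in this pass can be summarized: the depthwise stage inherits the separable layer's attributes, derives `n_filt` as `n_chan * depth_multiplier`, and drops the bias (which belongs to the later pointwise stage). A minimal sketch with made-up attribute values, not taken from a real model:

```python
# Hypothetical separable-conv node attributes; only the keys relevant
# to the depthwise derivation are shown.
node_attributes = {'n_chan': 8, 'depth_multiplier': 2, 'in_width': 32}

dw_attributes = dict(node_attributes)
# Each input channel produces depth_multiplier outputs in the depthwise stage.
dw_attributes['n_filt'] = dw_attributes['n_chan'] * dw_attributes['depth_multiplier']
# The separable layer's bias is applied by the following pointwise conv instead.
dw_attributes['use_bias'] = False

print(dw_attributes['n_filt'])  # 16
```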
19 changes: 19 additions & 0 deletions hls4ml/templates/oneapi/firmware/nnet_utils/nnet_depthconv1d.h
@@ -0,0 +1,19 @@
#ifndef NNET_DEPTH_CONV1D_H_
#define NNET_DEPTH_CONV1D_H_

#include "nnet_common.h"
#include "nnet_conv1d.h"
#include "nnet_depthconv1d_resource.h"

namespace nnet {

template <class data_T, class res_T, typename CONFIG_T>
void depthwise_conv_1d_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
const typename CONFIG_T::bias_t &biases) {

depthwise_conv_1d_resource_cl<data_T, res_T, CONFIG_T>(data, res, weights, biases);
}

} // namespace nnet

#endif
60 changes: 60 additions & 0 deletions hls4ml/templates/oneapi/firmware/nnet_utils/nnet_depthconv1d_resource.h
@@ -0,0 +1,60 @@
#ifndef NNET_DEPTH_CONV1D_LATENCY_H_
#define NNET_DEPTH_CONV1D_LATENCY_H_

#include "nnet_common.h"
#include "nnet_conv1d_resource.h"
#include "nnet_mult.h"

namespace nnet {

template <class data_T, class res_T, typename CONFIG_T>
void depthwise_conv_1d_resource_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
const typename CONFIG_T::bias_t &biases) {

int depth_multiplier = CONFIG_T::n_filt / CONFIG_T::n_chan;
[[intel::fpga_register]] int res_idx = 0;

[[intel::fpga_register]] typename CONFIG_T::accum_t acc[CONFIG_T::out_width * CONFIG_T::n_filt];

DM_LOOP:
#pragma unroll
for (int dm = 0; dm < depth_multiplier; dm++) {

WIDTH_LOOP:
#pragma unroll
for (int w = 0; w < CONFIG_T::out_width; w++) {

CHAN_LOOP:
#pragma unroll
for (int c = 0; c < CONFIG_T::n_chan; c++) {

res_idx = (w * CONFIG_T::n_filt) + (c * depth_multiplier) + dm;

acc[res_idx] = biases[c * depth_multiplier + dm];

KERNEL_W_LOOP:
#pragma unroll
for (int kw = 0; kw < CONFIG_T::filt_width; kw++) {

int w_in = w * CONFIG_T::stride_width + kw - CONFIG_T::pad_left;

if ((w_in >= 0) && (w_in < CONFIG_T::in_width)) {

acc[res_idx] += CONFIG_T::mult_config::
template product<typename data_T::value_type, typename CONFIG_T::weight_t::value_type>::product(
data[(w_in)*CONFIG_T::n_chan + c],
weights[(dm * CONFIG_T::filt_width * CONFIG_T::n_chan) + (kw * CONFIG_T::n_chan) + c]);
}
}
}
}
}

RESULT:
#pragma unroll
for (int ires = 0; ires < CONFIG_T::out_width * CONFIG_T::n_filt; ires++) {
res[ires] = cast<typename CONFIG_T::accum_t, typename res_T::value_type, CONFIG_T>(acc[ires]);
}
}
} // namespace nnet
#endif
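As a cross-check of the `res_idx` and weight indexing in the kernel above, here is a plain-Python reference model of the same loop nest (illustrative only; this is not part of the hls4ml API):

```python
# Reference model of the oneAPI depthwise conv1d kernel: channels-last
# data layout, weights indexed [dm][kw][c], output indexed
# w * n_filt + c * depth_multiplier + dm.
def depthwise_conv_1d_cl(data, weights, biases, in_width, n_chan,
                         filt_width, n_filt, out_width,
                         stride=1, pad_left=0):
    depth_multiplier = n_filt // n_chan
    acc = [0.0] * (out_width * n_filt)
    for dm in range(depth_multiplier):
        for w in range(out_width):
            for c in range(n_chan):
                res_idx = w * n_filt + c * depth_multiplier + dm
                acc[res_idx] = biases[c * depth_multiplier + dm]
                for kw in range(filt_width):
                    w_in = w * stride + kw - pad_left
                    if 0 <= w_in < in_width:
                        acc[res_idx] += (
                            data[w_in * n_chan + c]
                            * weights[dm * filt_width * n_chan + kw * n_chan + c]
                        )
    return acc

# Tiny sanity check: one channel, 1-wide kernel, weight 2.0, bias 0.5,
# so each output is 2*x + 0.5.
out = depthwise_conv_1d_cl([1, 2, 3], [2.0], [0.5], in_width=3, n_chan=1,
                           filt_width=1, n_filt=1, out_width=3)
print(out)  # [2.5, 4.5, 6.5]
```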
19 changes: 19 additions & 0 deletions hls4ml/templates/oneapi/firmware/nnet_utils/nnet_depthconv2d.h
@@ -0,0 +1,19 @@
#ifndef NNET_DEPTH_CONV2D_H_
#define NNET_DEPTH_CONV2D_H_

#include "nnet_common.h"
#include "nnet_conv2d.h"
#include "nnet_depthconv2d_resource.h"

namespace nnet {

template <class data_T, class res_T, typename CONFIG_T>
void depthwise_conv_2d_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
const typename CONFIG_T::bias_t &biases) {

depthwise_conv_2d_resource_cl<data_T, res_T, CONFIG_T>(data, res, weights, biases);
}

} // namespace nnet

#endif
76 changes: 76 additions & 0 deletions hls4ml/templates/oneapi/firmware/nnet_utils/nnet_depthconv2d_resource.h
@@ -0,0 +1,76 @@
#ifndef NNET_SEPARABLE_CONV2D_LATENCY_H_
#define NNET_SEPARABLE_CONV2D_LATENCY_H_

#include "nnet_common.h"
#include "nnet_conv2d_resource.h"
#include "nnet_mult.h"

namespace nnet {

template <class data_T, class res_T, typename CONFIG_T>
void depthwise_conv_2d_resource_cl(const data_T &data, res_T &res, const typename CONFIG_T::weight_t &weights,
const typename CONFIG_T::bias_t &biases) {

int depth_multiplier = CONFIG_T::n_filt / CONFIG_T::n_chan;
[[intel::fpga_register]] int res_idx = 0;

[[intel::fpga_register]] typename CONFIG_T::accum_t acc[CONFIG_T::out_width * CONFIG_T::out_height * CONFIG_T::n_filt];

DM_LOOP:
#pragma unroll
for (int dm = 0; dm < depth_multiplier; dm++) {

HEIGHT_LOOP:
#pragma unroll
for (int h = 0; h < CONFIG_T::out_height; h++) {
WIDTH_LOOP:
#pragma unroll
for (int w = 0; w < CONFIG_T::out_width; w++) {

CHAN_LOOP:
#pragma unroll
for (int c = 0; c < CONFIG_T::n_chan; c++) {

res_idx =
(h * CONFIG_T::out_width * CONFIG_T::n_filt) + (w * CONFIG_T::n_filt) + (c * depth_multiplier) + dm;

acc[res_idx] = biases[c * depth_multiplier + dm];

KERNEL_H_LOOP:
#pragma unroll
for (int kh = 0; kh < CONFIG_T::filt_height; kh++) {
KERNEL_W_LOOP:
#pragma unroll
for (int kw = 0; kw < CONFIG_T::filt_width; kw++) {

int h_in = h * CONFIG_T::stride_height + kh - CONFIG_T::pad_top;
int w_in = w * CONFIG_T::stride_width + kw - CONFIG_T::pad_left;

if ((h_in >= 0) && (h_in < CONFIG_T::in_height) && (w_in >= 0) && (w_in < CONFIG_T::in_width)) {

acc[res_idx] +=
CONFIG_T::mult_config::template product<typename data_T::value_type,
typename CONFIG_T::weight_t::value_type>::
product(
data[(h_in)*CONFIG_T::in_width * CONFIG_T::n_chan + (w_in)*CONFIG_T::n_chan + c],
weights[(dm * CONFIG_T::filt_height * CONFIG_T::filt_width * CONFIG_T::n_chan) +
(kh * CONFIG_T::filt_width * CONFIG_T::n_chan) +
(kw * CONFIG_T::n_chan) + c]);

}
}
}
}
}
}
}

RESULT:
#pragma unroll
for (int ires = 0; ires < CONFIG_T::out_width * CONFIG_T::out_height * CONFIG_T::n_filt; ires++) {
res[ires] = cast<typename CONFIG_T::accum_t, typename res_T::value_type, CONFIG_T>(acc[ires]);
}
}
} // namespace nnet
#endif
1 change: 1 addition & 0 deletions test/pytest/test_depthconv1d.py
@@ -23,6 +23,7 @@
@pytest.mark.parametrize(
'backend, io_type',
[
('oneAPI', 'io_parallel'),
('Vivado', 'io_parallel'),
('Vitis', 'io_parallel'),
('Vivado', 'io_stream'),
1 change: 1 addition & 0 deletions test/pytest/test_depthconv2d.py
@@ -24,6 +24,7 @@
@pytest.mark.parametrize(
'backend, io_type',
[
('oneAPI', 'io_parallel'),
('Vivado', 'io_parallel'),
('Vitis', 'io_parallel'),
('Vivado', 'io_stream'),
1 change: 1 addition & 0 deletions test/pytest/test_sepconv1d.py
@@ -23,6 +23,7 @@
@pytest.mark.parametrize(
'backend, io_type',
[
('oneAPI', 'io_parallel'),
('Vivado', 'io_parallel'),
('Vitis', 'io_parallel'),
('Vivado', 'io_stream'),
1 change: 1 addition & 0 deletions test/pytest/test_sepconv2d.py
@@ -23,6 +23,7 @@
@pytest.mark.parametrize(
'backend, io_type',
[
('oneAPI', 'io_parallel'),
('Vivado', 'io_parallel'),
('Vitis', 'io_parallel'),
('Vivado', 'io_stream'),