CFG Parameters in the different layers
Image processing [N x C x H x W]:
- [convolutional] - convolutional layer
  - batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)
  - filters=64 - number of kernel filters (1 by default)
  - size=3 - kernel size of the filter (1 by default)
  - groups=32 - number of groups for grouped (depth-wise) convolution (1 by default)
  - stride=1 - stride (offset step) of the kernel filter (1 by default)
  - padding=1 - size of padding (0 by default)
  - pad=1 - if 1, padding = size/2 is used; if 0, the value of the padding= parameter is used (0 by default)
  - dilation=1 - size of dilation (1 by default)
  - activation=leaky - activation function after convolution: logistic (by default), loggy, relu, elu, selu, relie, plse, hardtan, lhtan, linear, ramp, leaky, tanh, stair, relu6, swish, mish
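How size, stride, pad, padding and dilation interact can be sketched as below. This is a minimal sketch that assumes the textbook convolution output-size formula; the helper names `effective_padding` and `conv_output_size` are hypothetical and are not taken from the framework's source.

```python
def effective_padding(size, pad=0, padding=0):
    """Return the padding actually applied, following the pad= rule above."""
    # pad=1 overrides padding= with size/2 (integer division, as in C).
    return size // 2 if pad else padding


def conv_output_size(in_size, size=1, stride=1, pad=0, padding=0, dilation=1):
    """Output width or height for one spatial dimension.

    Assumption: the standard convolution formula, not a transcription
    of the framework's own code.
    """
    p = effective_padding(size, pad, padding)
    kernel_extent = dilation * (size - 1) + 1  # span covered by a dilated kernel
    return (in_size + 2 * p - kernel_extent) // stride + 1


same = conv_output_size(416, size=3, stride=1, pad=1)   # 3x3, stride 1, pad=1
half = conv_output_size(416, size=3, stride=2, pad=1)   # 3x3, stride 2, pad=1
print(same, half)
```

Under these assumptions a 3x3 kernel with pad=1 keeps the resolution at stride=1 and halves it at stride=2.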
- [activation] - separate activation layer
  - activation=leaky - activation function: linear (by default), loggy, relu, elu, selu, relie, plse, hardtan, lhtan, ramp, leaky, tanh, stair
- [batchnorm] - separate batch-normalization layer
- [maxpool] - max-pooling layer (takes the maximum value)
  - size=2 - size of the max-pooling kernel
  - stride=2 - stride (offset step) of the max-pooling kernel
- [avgpool] - average pooling layer: input W x H x C -> output 1 x 1 x C
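The W x H x C -> 1 x 1 x C reduction can be illustrated as follows. This is a minimal sketch on nested Python lists; the function name `global_avgpool` and the `[channel][row][column]` layout are assumptions made for the example.

```python
def global_avgpool(feature_map):
    """Average every channel over its whole W x H plane.

    feature_map is indexed as [channel][row][column]; the result holds
    one value per channel, i.e. a 1 x 1 x C output.
    """
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


fm = [
    [[1.0, 2.0], [3.0, 4.0]],       # channel 0
    [[10.0, 10.0], [10.0, 10.0]],   # channel 1
]
pooled = global_avgpool(fm)
print(pooled)
```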
- [shortcut] - residual connection (ResNet)
  - from=-3,-5 - relative layer numbers; performs element-wise addition of several layers: the previous layer and the layers specified in the from= parameter
  - weights_type=per_feature - weights are used for the shortcut: y[i] = w1*layer1[i] + w2*layer2[i] ...
    - per_feature - 1 weight per layer/feature
    - per_channel - 1 weight per channel
    - none - weights are not used (by default)
  - weights_normalization=softmax - weights normalization is used
    - softmax - softmax normalization
    - relu - relu normalization
    - none - without weights normalization, unbounded weights (by default)
  - activation=linear - activation function after the shortcut/residual connection (linear by default)
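The weighted sum y[i] = w1*layer1[i] + w2*layer2[i] ... with one weight per layer can be sketched like this. The exact form of the relu normalization (clip at zero, then divide by the sum plus a small epsilon) is an assumption, as are the names `normalize_weights` and `weighted_shortcut`; per_channel weights are not covered.

```python
import math


def normalize_weights(weights, mode="none", eps=1e-4):
    """Normalize per-layer shortcut weights.

    Assumption: "relu" clips negatives to zero and divides by the sum;
    "softmax" is the usual softmax; "none" leaves weights unbounded.
    """
    if mode == "softmax":
        m = max(weights)  # subtract the max for numerical stability
        exps = [math.exp(w - m) for w in weights]
        total = sum(exps)
        return [e / total for e in exps]
    if mode == "relu":
        clipped = [max(w, 0.0) for w in weights]
        total = sum(clipped) + eps
        return [c / total for c in clipped]
    return list(weights)


def weighted_shortcut(layers, weights=None, normalization="none"):
    """Element-wise y[i] = w1*layer1[i] + w2*layer2[i] + ..."""
    if weights is None:
        weights = [1.0] * len(layers)  # plain residual addition
    w = normalize_weights(weights, normalization)
    n = len(layers[0])
    return [sum(wk * layer[i] for wk, layer in zip(w, layers)) for i in range(n)]


plain = weighted_shortcut([[1.0, 2.0], [3.0, 4.0]])
fused = weighted_shortcut([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0], "softmax")
print(plain, fused)
```

With equal weights, softmax normalization turns the sum into an average of the input layers.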
- [upsample] - upsample layer (increases the W x H resolution of the input by duplicating elements)
  - stride=2 - factor by which both width and height are increased (new_w = w*stride, new_h = h*stride)
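Upsampling by duplicating elements, with new_w = w*stride and new_h = h*stride, can be sketched for a single channel as below. The function name `upsample` and the list-of-rows layout are assumptions for illustration.

```python
def upsample(channel, stride=2):
    """Duplicate every element stride times along both width and height."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(stride)]  # repeat along width
        for _ in range(stride):                         # repeat along height
            out.append(list(wide))
    return out


up = upsample([[1, 2], [3, 4]], stride=2)
for row in up:
    print(row)
```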
- [scale_channels] - scales channels (SE: squeeze-and-excitation blocks, or ASFF: adaptively spatial feature fusion); it multiplies elements of one layer by elements of another layer
  - from=-3 - relative layer number; multiplies all elements of channel N from layer -3 by one element of channel N from the previous layer -1 (i.e. for(int i=0; i < b*c*h*w; ++i) output[i] = from_layer[i] * previous_layer[i/(w*h)];)
  - scale_wh=0 - SE-layer (previous layer 1x1xC); scale_wh=1 - ASFF-layer (previous layer WxHx1)
  - activation=linear - activation function after the scale_channels-layer (linear by default)
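The loop quoted for from= (the SE case, scale_wh=0) translates directly to flat lists. This is a minimal sketch assuming the data is stored in batch, channel, height, width order, so that i // (w*h) selects the scale of the channel that element i belongs to; the function name is hypothetical.

```python
def scale_channels(from_layer, previous_layer, w, h):
    """SE-style scaling: each channel plane is multiplied by one scalar.

    from_layer has b*c*h*w elements; previous_layer has b*c elements
    (a 1 x 1 x C tensor per batch item).
    """
    return [
        from_layer[i] * previous_layer[i // (w * h)]
        for i in range(len(from_layer))
    ]


# One batch item, two channels, each plane 2 wide and 1 high.
scaled = scale_channels([1, 2, 3, 4], [10, 100], w=2, h=1)
print(scaled)
```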
- [sam] - Spatial Attention Module (SAM); it multiplies elements of one layer by elements of another layer
  - from=-3 - relative layer number (this layer and the previous layer should be the same size WxHxC)
- [reorg3d] - reorg layer (resizes W x H x C)
  - stride=2 - if reverse=0, the input is resized to W/2 x H/2 x C*4; if reverse=1, to W*2 x H*2 x C/4 (1 by default)
  - reverse=1 - if 0, WxH is decreased; if 1, WxH is increased (0 by default)
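The shape change described for stride and reverse can be sketched as a small helper. The text only gives the stride=2 case (a factor of 4 on channels); generalizing that to stride*stride is an assumption, and `reorg3d_shape` is a hypothetical name.

```python
def reorg3d_shape(w, h, c, stride=1, reverse=0):
    """Return the (W, H, C) output shape of a reorg with the given stride.

    Assumption: channels change by stride*stride, which matches the
    documented C*4 / C/4 for stride=2.
    """
    area = stride * stride
    if reverse:
        return w * stride, h * stride, c // area   # increase WxH
    return w // stride, h // stride, c * area      # decrease WxH


down = reorg3d_shape(26, 26, 64, stride=2, reverse=0)
up_shape = reorg3d_shape(13, 13, 256, stride=2, reverse=1)
print(down, up_shape)
```

The number of elements W*H*C is unchanged in both directions, since the layer only rearranges data.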
- [reorg] - OLD reorg layer from Yolo v2; has incorrect logic (resizes W x H x C); deprecated
  - stride=2 - if reverse=0, the input is resized to W/2 x H/2 x C*4; if reverse=1, to W*2 x H*2 x C/4 (1 by default)
  - reverse=1 - if 0, WxH is decreased; if 1, WxH is increased (0 by default)
- [route] - concatenation layer: Concat for several input layers, or Identity for one input layer
  - layers = -1, 61 - layers that will be concatenated; output: W x H x (C_layer_1 + C_layer_2)
    - if index < 0, it is a relative layer number (-1 means the previous layer)
    - if index >= 0, it is an absolute layer number
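Resolving relative and absolute indexes in layers= and summing the channel counts can be sketched as below. The function name `resolve_route`, the `channels` lookup table and the example layer numbers are hypothetical.

```python
def resolve_route(layers, current_index, channels):
    """Map layers= entries to absolute indexes and compute output channels.

    layers        - values from the layers= parameter
    current_index - absolute index of the [route] layer itself
    channels      - dict: absolute layer index -> channel count
    """
    # Negative entries are relative to the [route] layer; others are absolute.
    absolute = [current_index + i if i < 0 else i for i in layers]
    out_channels = sum(channels[i] for i in absolute)  # concatenation along C
    return absolute, out_channels


# A [route] at index 86 with layers = -1, 61 (channel counts are made up).
resolved, out_c = resolve_route([-1, 61], 86, {85: 256, 61: 512})
print(resolved, out_c)
```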
- [yolo] - detection layer for Yolo v3 / v4
  - mask = 3,4,5 - indexes of the anchors which are used in this [yolo]-layer
  - anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 - initial sizes of bounding boxes that will be adjusted
  - num=9 - total number of anchors
  - classes=80 - number of classes of objects which can be detected
  - ignore_thresh = .7 - keeps duplicated detections if IoU(detect, truth) > ignore_thresh, which will be fused during NMS (used for training only)
  - truth_thresh = 1 - adjusts duplicated detections if IoU(detect, truth) > truth_thresh, which will be fused during NMS (used for training only)
  - jitter=.3 - randomly crops and resizes images, changing the aspect ratio from x(1 - 2*jitter) to x(1 + 2*jitter) (data augmentation parameter, used only from the last layer)
  - random=1 - randomly resizes the network every 10 iterations from 1/1.4 to 1.4 (data augmentation parameter, used only from the last layer)
  - resize=1.5 - randomly resizes the image in the range 1/1.5 - 1.5x
  - max=200 - maximum number of objects per image during training
  - counters_per_class=100,10,1000 - number of objects per class in the training dataset, used to eliminate the imbalance
  - label_smooth_eps=0.1 - label smoothing
  - scale_x_y=1.05 - eliminates grid sensitivity
  - iou_thresh=0.2 - use many anchors per object if IoU(Obj, Anchor) > 0.2
  - iou_loss=mse - IoU-loss: mse, giou, diou, ciou
  - iou_normalizer=0.07 - normalizer for delta-IoU
  - cls_normalizer=1.0 - normalizer for delta-Objectness
  - max_delta=5 - limits delta for each entry
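How mask picks entries out of anchors can be illustrated with the example values above. This is a minimal sketch; it assumes anchors is a flat list of width,height pairs and that mask holds zero-based indexes into those pairs. The name `select_anchors` is hypothetical.

```python
def select_anchors(anchors, mask):
    """Return the (width, height) anchor pairs that a [yolo] layer uses."""
    pairs = list(zip(anchors[0::2], anchors[1::2]))  # flat list -> (w, h) pairs
    return [pairs[i] for i in mask]


anchors = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119,
           116, 90, 156, 198, 373, 326]
num = len(anchors) // 2          # total number of anchors, matches num=9
used = select_anchors(anchors, [3, 4, 5])
print(num, used)
```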
  - parameters for tracking, if contrastive learning is used:
    - track_history_size = 5 - find similarity over the 5 previous frames [1 - inf)
    - sim_thresh = 0.8 - similarity threshold for considering an object on two frames to be the same (0.0 to 1.0)
    - dets_for_show = 2 - number of frames with this object before it is shown [0 - inf)
    - dets_for_track = 8 - number of frames with this object before it is tracked [0 - inf)
    - track_ciou_norm = 0.3 - how much CIoU is taken into account (0.0 to 1.0)
- [crnn] - convolutional RNN-layer (recurrent)
  - batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)
  - size=1 - convolutional kernel size of the filter (1 by default)
  - pad=0 - if 1, padding = size/2 is used; if 0, the value of the padding= parameter is used (0 by default)
  - output = 1024 - number of kernel filters in the one output convolutional layer (1 by default)
  - hidden=1024 - number of kernel filters in the two (input and hidden) convolutional layers (1 by default)
  - activation=leaky - activation function for each of the 3 convolutional layers in the [crnn]-layer (logistic by default)
- [conv_lstm] - convolutional LSTM-layer (recurrent)
  - batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)
  - size=3 - convolutional kernel size of the filter (1 by default)
  - padding=1 - convolutional size of padding (0 by default)
  - pad=1 - if 1, padding = size/2 is used; if 0, the value of the padding= parameter is used (0 by default)
  - stride=1 - convolutional stride (offset step) of the kernel filter (1 by default)
  - dilation=1 - convolutional size of dilation (1 by default)
  - output=256 - number of kernel filters in each of the 8 or 11 convolutional layers (1 by default)
  - groups=4 - number of groups for grouped (depth-wise) convolution (1 by default)
  - state_constrain=512 - constrains LSTM-state values to [-512; +512] after each inference (time_steps*32 by default)
  - peephole=0 - if 1, Peephole is used (3 additional conv-layers); if 0, it is not (1 by default)
  - bottleneck=0 - if 1, a reduced optimal version of the conv-lstm layer is used
  - activation=leaky - activation function for each of the 8 or 11 convolutional layers in the [conv_lstm]-layer (linear by default)
  - lstm_activation=tanh - activation for G (gate: g = tanh(wg + ug)) and C (memory cell: h = o * tanh(c))
Free-form data processing [Inputs]:
- [connected] - fully connected layer
  - output=256 - number of outputs (1 by default), so the number of connections is equal to inputs*outputs
  - activation=leaky - activation after the layer (logistic by default)
- [dropout] - dropout layer
  - probability=0.5 - dropout probability: the fraction of inputs that will be zeroed (0.5 = 50% by default)
  - dropblock=1 - use as DropBlock
  - dropblock_size_abs=7 - size of the DropBlock in pixels (7x7)
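What probability= controls can be sketched as follows. Rescaling the surviving inputs by 1/(1 - probability) is the common "inverted dropout" convention and is an assumption here, as are the function name `dropout` and its `rng` argument; DropBlock is not covered.

```python
import random


def dropout(inputs, probability=0.5, rng=None):
    """Zero each input with the given probability (training-time behaviour).

    Assumption: survivors are scaled by 1/(1 - probability) so that the
    expected sum of the output matches the input.
    """
    if not 0.0 <= probability < 1.0:
        raise ValueError("probability must be in [0, 1)")
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - probability)
    return [0.0 if rng.random() < probability else x * scale for x in inputs]


x = [1.0, 2.0, 3.0, 4.0]
y = dropout(x, probability=0.5, rng=random.Random(0))
print(y)
```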
- [softmax] - SoftMax CE (cross-entropy) layer: categorical cross-entropy for multi-class classification
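Softmax followed by categorical cross-entropy can be sketched with the standard definitions. The function names and the use of a single target class index are assumptions made for the example.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Categorical cross-entropy for one sample with a one-hot target."""
    return -math.log(probs[target_index])


probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, target_index=0)
print(probs, loss)
```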
- [contrastive] - contrastive loss layer for supervised and unsupervised learning ([net] contrastive=1 should be set, and optionally [net] unsupervised=1)
  - yolo_layer= -2 - index (absolute or relative) of the related [yolo] layer
  - classes=1000 - number of classes
  - temperature=1.0 - temperature
  - cls_normalizer=1.0 - normalizer for delta-Objectness
  - max_delta=5 - limits delta for each entry
- [cost] - cost layer; calculates the (linear) Delta and the (squared) Loss
  - type=sse - cost type: sse (L2), masked, smooth (smooth-L1) (SSE by default)
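The linear Delta and squared Loss of the sse type, next to a smooth-L1 variant, can be sketched as below. The sign convention delta = truth - prediction and the textbook smooth-L1 form (quadratic below 1, linear above) are assumptions and may differ from the framework's implementation; the masked type is not covered.

```python
def sse_cost(pred, truth):
    """SSE (L2): delta is linear in the error, loss is its square."""
    delta = [t - p for p, t in zip(pred, truth)]
    loss = [d * d for d in delta]
    return delta, loss


def smooth_l1_cost(pred, truth):
    """Textbook smooth-L1: quadratic for |error| < 1, linear otherwise."""
    delta, loss = [], []
    for p, t in zip(pred, truth):
        d = t - p
        if abs(d) < 1.0:
            loss.append(0.5 * d * d)
            delta.append(d)
        else:
            loss.append(abs(d) - 0.5)
            delta.append(1.0 if d > 0 else -1.0)  # gradient magnitude is capped
    return delta, loss


sse_delta, sse_loss = sse_cost([1.0, 2.0], [2.0, 0.0])
sl1_delta, sl1_loss = smooth_l1_cost([0.0, 0.0], [0.5, 3.0])
print(sse_delta, sse_loss, sl1_delta, sl1_loss)
```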
- [rnn] - fully connected RNN-layer (recurrent)
  - batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)
  - output = 1024 - number of outputs in the one output connected layer (1 by default)
  - hidden=1024 - number of outputs in the two (input and hidden) connected layers (1 by default)
  - activation=leaky - activation after the layer (logistic by default)
- [lstm] - fully connected LSTM-layer (recurrent)
  - batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)
  - output = 1024 - number of outputs in all connected layers (1 by default)
- [gru] - fully connected GRU-layer (recurrent)
  - batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)
  - output = 1024 - number of outputs in all connected layers (1 by default)