Applies Layer Normalization over a mini-batch of inputs, as described in the paper Layer Normalization.
The mean and standard deviation are calculated over the trailing dimensions selected by axis. For example, if axis = -2, the mean is computed over the last 2 dimensions of the input.
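The effect of axis can be illustrated with a minimal NumPy sketch. It is not this operator's implementation: the function name layer_norm, the eps value of 1e-5, and the optional w and b arguments are assumptions made for illustration only.

```python
import numpy as np

def layer_norm(x, axis=-1, w=None, b=None, eps=1e-5):
    """Normalize x over its last |axis| dimensions (axis must be negative)."""
    # axis = -2 reduces over dimensions (-2, -1), i.e. the last two.
    reduce_dims = tuple(range(axis, 0))
    mean = x.mean(axis=reduce_dims, keepdims=True)
    var = x.var(axis=reduce_dims, keepdims=True)
    y = (x - mean) / np.sqrt(var + eps)  # eps is an assumed stabilizer
    if w is not None:
        y = y * w  # optional transformation weight
    if b is not None:
        y = y + b  # optional transformation bias
    return y

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
y = layer_norm(x, axis=-2)  # statistics are taken over each 3 x 4 slice
```

With axis = -2, each of the two 3 x 4 slices is normalized independently, so every slice of y has approximately zero mean and unit variance.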
SkipLayerNorm is performed by the formula below:
Axis at which the normalization dimensions are split off.
Whether the operator has W and B.
Whether to apply SkipLayerNorm.
Input features.
Shape:
Transformation weight.
Shape:
Transformation bias.
Shape:
Skip input.
Shape: same as X
Output features.
Shape: same as X
SkipOutput. If SkipIn is not present, SkipOut will be a copy of X.
Shape: same as X
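The relationship between X, SkipIn, Y, and SkipOut can be sketched in NumPy as follows. This is one plausible reading, not a confirmed description of the operator: it assumes that SkipIn is added to X before normalization and that SkipOut carries that sum, which is at least consistent with SkipOut being a copy of X when SkipIn is absent. The function name skip_layer_norm and the eps value are hypothetical.

```python
import numpy as np

def skip_layer_norm(x, w, b, skip_in=None, axis=-1, eps=1e-5):
    # Assumption: the skip input is added to X before normalization.
    skip_out = x.copy() if skip_in is None else x + skip_in
    dims = tuple(range(axis, 0))  # last |axis| dimensions
    mean = skip_out.mean(axis=dims, keepdims=True)
    var = skip_out.var(axis=dims, keepdims=True)
    y = (skip_out - mean) / np.sqrt(var + eps) * w + b
    return y, skip_out

x = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
w = np.ones(4, dtype=np.float32)
b = np.zeros(4, dtype=np.float32)
y, skip_out = skip_layer_norm(x, w, b)               # no SkipIn: SkipOut copies X
y2, skip_out2 = skip_layer_norm(x, w, b, skip_in=x)  # SkipOut is X + SkipIn
```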
If the input is float16, the data is converted to float32 before LayerNorm is applied.
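A short NumPy sketch shows why such an upcast helps. The largest finite float16 value is 65504, so squaring moderately large inputs while computing the variance can overflow in half precision. The function name layer_norm_upcast and the eps value are assumptions, and returning float32 is also an assumption, since the output precision is not stated here.

```python
import numpy as np

def layer_norm_upcast(x, axis=-1, eps=1e-5):
    # Upcast half-precision input so the statistics are computed in float32.
    x32 = x.astype(np.float32) if x.dtype == np.float16 else x
    dims = tuple(range(axis, 0))  # last |axis| dimensions
    mean = x32.mean(axis=dims, keepdims=True)
    var = x32.var(axis=dims, keepdims=True)
    return (x32 - mean) / np.sqrt(var + eps)

# These values are exactly representable in float16, but their squared
# deviations (up to 1500**2 = 2.25e6) exceed the float16 maximum of 65504.
x_half = np.array([[1000.0, 2000.0, 3000.0, 4000.0]], dtype=np.float16)
y_half = layer_norm_upcast(x_half)
```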