SwiGLU

Applies gated Swish to $X$, where the last dimension of $X$ is $D$. Since the output has last dimension $D/2$, $D$ must be even.

$$Swish(X)=X*sigmoid(\beta X)$$

$$SwiGLU(X)=X[\cdots, D/2:]*Swish(X[\cdots, :D/2])$$
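
The two formulas above can be sketched in NumPy as follows. This is a minimal reference sketch, not the operator's actual kernel: the function names `swish` and `swiglu` are illustrative, and the even-$D$ check is an assumption drawn from the $D/2$ split.

```python
import numpy as np


def swish(x, beta=1.0):
    # Swish(x) = x * sigmoid(beta * x), written as x / (1 + exp(-beta * x)).
    return x / (1.0 + np.exp(-beta * x))


def swiglu(x, beta=1.0):
    # Hypothetical reference implementation of the SwiGLU formula above.
    d = x.shape[-1]
    if d % 2 != 0:
        # Assumption: the last dimension must split into two equal halves.
        raise ValueError(f"last dimension must be even, got {d}")
    half = d // 2
    gate = x[..., :half]   # first half goes through Swish
    value = x[..., half:]  # second half is multiplied by the gate
    return value * swish(gate, beta)


x = np.array([[0.0, 1.0, 2.0, 3.0]], dtype=np.float32)
y = swiglu(x)
print(y.shape)  # (1, 2)
```

Here the first half of the last dimension, `[0, 1]`, is passed through Swish and multiplied elementwise by the second half, `[2, 3]`, so the last dimension shrinks from 4 to 2.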

Attributes/Parameters

beta: float (default: 1.0f), the scaling factor $\beta$ in the Swish formula

Inputs

X: tensor(T)

Shape: $(*,D)$

Outputs

Y: tensor(T)

Shape: $(*,D/2)$

Type Constraints

T: float32, float16