diff --git a/chapters/en/unit13/hyena.mdx b/chapters/en/unit13/hyena.mdx
index 7599888f7..87812dd46 100644
--- a/chapters/en/unit13/hyena.mdx
+++ b/chapters/en/unit13/hyena.mdx
@@ -63,13 +63,13 @@ However, unlike the attention mechanism, which typically uses a single dense lay
 The core idea is to repeatedly apply linear operators that are fast to evaluate to an input sequence \\(u \in \mathbb{R}^{L}\\) with \\(L\\) the length of the sequence.
 Because global convolutions have a large number of parameters, they are expensive to train.
 A notable design choice is the use of **implicit convolutions**.
-Unlike standard convolutional layers, the convolution filter \\(h\\) is learned implicitly with a small neural network \\(gamma_{\theta}\\) (also called the Hyena Filter).
-This network takes the positional index and potentially positional encodings as inputs. From the outputs of \\(gamma_theta\\) one can construct a Toeplitz matrix \\(T_h\\).
+Unlike standard convolutional layers, the convolution filter \\(h\\) is learned implicitly with a small neural network \\(\gamma_{\theta}\\) (also called the Hyena Filter).
+This network takes the positional index and potentially positional encodings as inputs. From the outputs of \\(\gamma_{\theta}\\) one can construct a Toeplitz matrix \\(T_h\\).
 This implies that instead of learning the values of the convolution filter directly, we learn a mapping from a temporal positional encoding to the values, which is more computationally efficient, especially for long sequences.
-It's important to note that the mapping function can be conceptualized within various abstract models, such Neural Field or State Space Models (S4) as discussed in H3 Paper.
+It's important to note that the mapping function can be conceptualized within various abstract models, such as Neural Fields or State Space Models (S4), as discussed in the H3 paper.
 
 ### Implicit convolutions
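To make the implicit parameterization in the edited passage concrete, here is a minimal PyTorch sketch. It is an illustrative assumption rather than the actual Hyena code: the `ImplicitFilter` module, its sinusoidal positional encoding, and the FFT-based `long_conv` helper are hypothetical stand-ins for \\(\gamma_{\theta}\\) and the product with the Toeplitz matrix \\(T_h\\).

```python
import torch
import torch.nn as nn


class ImplicitFilter(nn.Module):
    """Sketch of an implicitly parameterized long-convolution filter.

    Instead of storing L filter taps directly, a small MLP (standing in for the
    Hyena filter network gamma_theta) maps a positional encoding of each time
    step t = 0..L-1 to the filter value h[t] for every channel.
    """

    def __init__(self, d_model: int, emb_dim: int = 8, hidden_dim: int = 64):
        super().__init__()
        self.emb_dim = emb_dim
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, d_model),
        )

    def positional_encoding(self, length: int) -> torch.Tensor:
        # Simple sinusoidal features of the normalized position t / L.
        t = torch.linspace(0, 1, length).unsqueeze(-1)           # (L, 1)
        freqs = 2 ** torch.arange(self.emb_dim // 2)              # (emb_dim/2,)
        angles = 2 * torch.pi * t * freqs                         # (L, emb_dim/2)
        return torch.cat([angles.sin(), angles.cos()], dim=-1)    # (L, emb_dim)

    def forward(self, length: int) -> torch.Tensor:
        # Returns the filter h of shape (L, d_model): one long filter per channel.
        return self.mlp(self.positional_encoding(length))


def long_conv(u: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """Causal global convolution of u (B, L, D) with filter h (L, D) via FFT.

    This is equivalent to multiplying each channel of u by the Toeplitz matrix
    T_h built from the filter values, but costs O(L log L) instead of O(L^2).
    """
    L = u.shape[1]
    u_f = torch.fft.rfft(u, n=2 * L, dim=1)
    h_f = torch.fft.rfft(h, n=2 * L, dim=0)
    return torch.fft.irfft(u_f * h_f, n=2 * L, dim=1)[:, :L]


# Usage: the filter is generated for whatever sequence length arrives at run time.
B, L, D = 2, 1024, 16
u = torch.randn(B, L, D)
filt = ImplicitFilter(d_model=D)
y = long_conv(u, filt(L))
print(y.shape)  # torch.Size([2, 1024, 16])
```

Because the filter values are generated from positions rather than stored, the parameter count is independent of the sequence length \\(L\\), which is what makes this parameterization attractive for long sequences.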