The Exponentially Weighted Moving Average (EWMA) is a statistical measure used to model or describe a time series, but it is also at the heart of one of the best-known optimization algorithms in Deep Learning: Gradient Descent with Momentum.
In order to compute the EWMA you must define one parameter, the weight $\beta \in (0, 1)$.
Let's work through an example based on the daily temperatures of Paris, France, in 2019.
Define:

- $\beta$ := weight parameter
- $\theta_t$ := temperature on day $t$
- $W_t$ := EWMA for day $t$ (set $W_0 = 0$).
For this example, suppose that $\beta = 0.9$.
In general, to compute the EWMA for a given weight parameter $\beta$:

$$ W_t = \begin{cases} 0 & t = 0 \\ \textcolor{red}{\beta}\cdot W_{t-1} + (\textcolor{red}{1-\beta})\cdot\theta_t & t > 0 \end{cases} $$
If we plot this in red, we can see that what we get is a moving average of the daily temperature: a smoother, less noisy curve.
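The recursion above can be sketched in a few lines of Python. The temperature values here are made up for illustration, not real Paris data:

```python
def ewma(thetas, beta=0.9):
    """Exponentially weighted moving average of a sequence.

    W_0 = 0; W_t = beta * W_{t-1} + (1 - beta) * theta_t.
    """
    w = 0.0
    out = []
    for theta in thetas:
        w = beta * w + (1 - beta) * theta
        out.append(w)
    return out

# Hypothetical daily temperatures (°C), for illustration only
temps = [4.0, 9.0, 6.0, 5.0, 8.0, 10.0, 7.0]
smoothed = ewma(temps, beta=0.9)
```

Plotting `smoothed` against `temps` would show the smoothing effect described above.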
Let's explain the general equation a bit more:
We can see that the value of $\beta$ controls how quickly the average adapts to new observations.

Take a value of $\beta = 0.98$: the resulting curve is much smoother, but it adapts slowly to changes in temperature.

Let's try the other extreme and set $\beta = 0.5$: the curve becomes much noisier, but it adapts much more quickly to changes in temperature.
If you want to understand the meaning of the parameter $\beta$, you can think of $\frac{1}{1-\beta}$ as the approximate number of observations used to adapt your EWMA.
| $\beta$ | $\frac{1}{1-\beta}$ | EWMA |
|---|---|---|
| 0.9 | 10 | Adapts normally |
| 0.98 | 50 | Adapts slowly |
| 0.5 | 2 | Adapts quickly |
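The table's behavior can be checked numerically. As a sketch, feed each $\beta$ a synthetic step change in temperature (made-up data, not the Paris series) and look at the average a few days after the jump:

```python
def ewma(thetas, beta):
    """W_0 = 0; W_t = beta * W_{t-1} + (1 - beta) * theta_t."""
    w, out = 0.0, []
    for theta in thetas:
        w = beta * w + (1 - beta) * theta
        out.append(w)
    return out

# Synthetic step change: 20 days at 5 °C, then 20 days at 15 °C
temps = [5.0] * 20 + [15.0] * 20

for beta in (0.5, 0.9, 0.98):
    w = ewma(temps, beta)
    # Value 5 days after the jump: the larger beta is,
    # the further the average lags behind the new 15 °C level
    print(beta, round(w[24], 2))
```

Running this shows $\beta = 0.5$ almost caught up after 5 days, while $\beta = 0.98$ has barely moved.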
To go a little deeper into the intuition of what this algorithm actually does, let's expand the third term ($W_3$), using $\beta = 0.9$:
$$
\begin{aligned}
W_3 &= \textcolor{red}{0.9}\cdot W_2 + \textcolor{red}{0.1}\cdot\theta_{3}\\
W_2 &= \textcolor{red}{0.9}\cdot W_1 + \textcolor{red}{0.1}\cdot\theta_{2}\\
W_1 &= \textcolor{red}{0.9}\cdot \underbrace{W_0}_{0} + \textcolor{red}{0.1}\cdot\theta_{1}
\end{aligned}
$$
Plugging $W_1$ and $W_2$ into $W_3$:
$$ W_{3} = \textcolor{red}{0.9}\cdot \underbrace{(\textcolor{red}{0.9}\cdot\underbrace{(\textcolor{red}{0.9}\cdot 0 + \textcolor{red}{0.1}\cdot\theta_{1})}_{W_1} + \textcolor{red}{0.1}\cdot\theta_{2})}_{W_2} + \textcolor{red}{0.1}\cdot\theta_{3} $$
Simplifying:

$$ W_3 = \textcolor{red}{0.1}\cdot\theta_{3} + \textcolor{red}{0.1}\cdot\textcolor{red}{0.9}\cdot\theta_{2} + \textcolor{red}{0.1}\cdot\textcolor{red}{0.9}^2\cdot\theta_{1} $$
Here it is quite clear what the role of $\beta$ is: each temperature $\theta_t$ is weighted by a power of $\beta$, so the older an observation is, the less it contributes to the average.
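This expansion is easy to verify numerically. In this sketch the three temperatures are arbitrary placeholder values:

```python
beta = 0.9
t1, t2, t3 = 6.0, 8.0, 7.0  # arbitrary example temperatures

# Recursive computation, starting from W_0 = 0
w1 = beta * 0.0 + (1 - beta) * t1
w2 = beta * w1 + (1 - beta) * t2
w3 = beta * w2 + (1 - beta) * t3

# Expanded form: each theta weighted by a power of beta
expanded = (1 - beta) * (t3 + beta * t2 + beta**2 * t1)

assert abs(w3 - expanded) < 1e-12
```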
In general we have:

$$ W_t = \beta\cdot W_{t-1} + (1-\beta)\cdot\theta_t $$

or the closed formula:

$$ W_t = (1-\beta)\sum_{i=1}^{t}\beta^{\,t-i}\,\theta_i $$
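The equivalence between the recursion and the closed formula can be sketched and checked on made-up data:

```python
def ewma_recursive(thetas, beta):
    """W_0 = 0; W_t = beta * W_{t-1} + (1 - beta) * theta_t."""
    w = 0.0
    for theta in thetas:
        w = beta * w + (1 - beta) * theta
    return w

def ewma_closed(thetas, beta):
    """Closed form: W_t = (1 - beta) * sum_{i=1..t} beta^(t-i) * theta_i."""
    t = len(thetas)
    return (1 - beta) * sum(beta ** (t - i) * theta
                            for i, theta in enumerate(thetas, start=1))

# Arbitrary example temperatures
temps = [4.0, 9.0, 6.0, 5.0, 8.0]
assert abs(ewma_recursive(temps, 0.9) - ewma_closed(temps, 0.9)) < 1e-12
```

Both functions give the same value for any $\beta$, since the closed form is just the recursion unrolled.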
If you are a visual learner, here is another approach: rewrite the EWMA as a weighted sum of the daily temperatures, where the weights decay exponentially with age. Plotting that weight curve over the temperature series shows how much each past day contributes to $W_t$.