
Commit

update

mattiasvillani committed Dec 19, 2023
1 parent 9af71e3 commit 36aa5a6
Showing 4 changed files with 34 additions and 16 deletions.
docs/search.json (2 changes: 1 addition & 1 deletion)
@@ -137,7 +137,7 @@
"href": "tutorial/statespace/statespace.html#state-space-model---filtering-smoothing-and-forecasting",
"title": "State-space models - filtering, smoothing and forecasting",
"section": "State-space model - filtering, smoothing and forecasting",
"text": "State-space model - filtering, smoothing and forecasting\n\nThe state space model\nAll of the models above, and many, many, many more can be written as a so called state-space model. A state-space model for a univariate time series \\(y_t\\) with a state vector \\(\\boldsymbol{\\theta}_t\\) can be written as\n\\[\n\\begin{align}\ny_t &= \\boldsymbol{F} \\boldsymbol{\\theta}_t + v_t,\\hspace{1.5cm} v_t \\sim N(\\boldsymbol{0},\\boldsymbol{V}) \\\\\n\\boldsymbol{\\theta}_t &= \\boldsymbol{G} \\boldsymbol{\\theta}_{t-1} + \\boldsymbol{w}_t, \\qquad \\boldsymbol{w}_t \\sim N(\\boldsymbol{0},\\boldsymbol{W})\n\\end{align}\n\\]\nFor example, the local level model is a state-space model with a single scalar state variable \\(\\boldsymbol{\\theta}_t = \\mu_t\\) and parameters\n\\[\n\\begin{align}\n\\boldsymbol{F} &= 1 \\\\\n\\boldsymbol{G} &= 1 \\\\\n\\boldsymbol{V} &= \\sigma_\\varepsilon^2 \\\\\n\\boldsymbol{W} &= \\sigma_\\nu^2\n\\end{align}\n\\]\nWe learn about the state \\(\\mu_t\\) from the observed time series \\(y_t\\) . The first equation is often called the observation or measurement model since it gives the connection between the unobserved state and the observed measurements. The measurements can also be a vector, but we will use a single measurement in this tutorial. The second equation is called the state transition model since it determines how the state evolves over time.\nWe can even let the state-space parameters \\(\\boldsymbol{F}, \\boldsymbol{G}, \\boldsymbol{V}, \\boldsymbol{W}\\) be different i every time period. This is in fact needed if we want to write the time-varying regression model in state-space form. Recall the time varying regression model\n\\[\n\\begin{align} \ny_t &= \\alpha_{t} + \\beta_{t} x_t + \\varepsilon_t, \\quad \\varepsilon_t \\sim N(0, \\sigma_\\varepsilon^2) \\\\ \n\\alpha_{t} &= \\alpha_{t-1} + \\eta_t, \\qquad \\quad \\eta_t \\sim N(0, \\sigma_\\alpha^2) \\\\ \n\\beta_{t} &= \\beta_{t-1} + \\nu_t, \\qquad \\quad \\nu_t \\sim N(0, \\sigma_\\beta^2)\n\\end{align}\n\\]\nWe can tuck the two time-varying parameters in a vector \\(\\boldsymbol{\\beta}=(\\alpha_t,\\beta_t)^\\top\\) and also write the models as\n\\[\n\\begin{align} \ny_t &= \\boldsymbol{x}_t^\\top\\boldsymbol{\\beta}_{t} + \\varepsilon_t, \\hspace{0.8cm} \\varepsilon_t \\sim N(0, \\sigma_\\varepsilon^2) \\\\ \n\\boldsymbol{\\beta}_{t} &= \\boldsymbol{\\beta}_{t-1} + \\boldsymbol{w}_t, \\quad \\quad \\nu_t \\sim N(0, \\boldsymbol{W})\n\\end{align}\n\\]\nwhere\n\\[\n\\begin{align} \n\\boldsymbol{x}_t &= (1,x_t)^\\top \\\\ \n\\boldsymbol{w}_t &= (\\eta_t,\\nu_t)^\\top \\\\\n\\boldsymbol{W} &=\n\\begin{pmatrix}\n\\sigma_\\alpha^2 & 0 \\\\\n0 & \\sigma_\\eta^2\n\\end{pmatrix}\n\\end{align}\n\\]\nNote this is a state-space model with\n\\[\n\\begin{align}\n\\boldsymbol{F}_t &= \\boldsymbol{x}_t\\\\\n\\boldsymbol{G} &=\n\\begin{pmatrix}\n1 & 0 \\\\\n0 & 1\n\\end{pmatrix} \\\\\n\\boldsymbol{V} &= \\sigma_\\varepsilon^2 \\\\\n\\boldsymbol{W} &=\n\\begin{pmatrix}\n\\sigma_\\alpha^2 & 0 \\\\\n0 & \\sigma_\\eta^2\n\\end{pmatrix}\n\\end{align}\n\\]\nand note now that \\(\\boldsymbol{F}\\) changes in every time period, hence the subscript \\(t\\).\n\n\nFiltering and smoothing\nThere are two different types of relevant inferences in state-space models: filtering and smoothing:\n\nThe filtered estimate \\(\\hat{\\boldsymbol{\\theta}}_{t|t}\\) of the state \\(\\boldsymbol{\\theta}_t\\) uses data up to time \\(t\\).\nThe smoothed estimate \\(\\hat{\\boldsymbol{\\theta}}_{t|T}\\) of the state 
\\(\\boldsymbol{\\theta}_t\\) uses data up to time \\(T\\), the end of the time series.\n\nThe filtered estimate is therefore the instantaneous estimate, giving the best estimate of the current state. The smoothed estimate is the retrospective estimate that looks back in time and gives us the best estimate using all the data.\nFiltering means to compute the sequence of instantaneous estimates of the unobserved state at every time point \\(t=1,2,\\ldots,T\\)\n\\[\n\\hat{\\boldsymbol{\\theta}}_{1|1},\\hat{\\boldsymbol{\\theta}}_{2|2},\\ldots,\\hat{\\boldsymbol{\\theta}}_{T|T}\n\\]\nWe will take a time series and compute the filtered estimates for the whole time series, but it is important to understand that filtering is often done in real-time, which means it is a continously ongoing process that returns filtered estimates of the state \\(\\boldsymbol{\\theta}_t\\) as time progresses and new measurements \\(y_t\\) come in. Think about a self-driving car that is continously trying to understand the environment (people, other cars, the road conditions etc). The environment is the state and the car uses its sensors to collect measurements. The filtering estimates tells the car about the best guess for the environment at every point in time.\nFor state-space models of the type discussed here (linear measurement equation and linear evolution of the state, with independent Normal measurement errors and state innovations), the filtered estimates are computed with one of the most famous algorithms in statistics: the Kalman filter.\nThe Kalman filter is a little messy to write up, we will do it for completeness, but we will use a package for it so don’t worry if the linear algebra is intidimating. We will use the notation $\\(\\boldsymbol{\\mu}_{t|t}\\) instead of \\(\\hat{\\boldsymbol{\\theta}}_{t|t}\\), but they really mean the same.\n\ntime \\(t = 0\\). The Kalman filter starts with mean \\(\\boldsymbol{\\mu}_{0|0}\\) and covariance matrix \\(\\boldsymbol{\\Omega}_{0|0}\\) for the state at time \\(t=0\\). Think about \\(\\boldsymbol{\\mu}_{0|0}\\) as the best guess \\(\\boldsymbol{\\theta}_0\\) of the state vector at time \\(t=0\\) and \\(\\boldsymbol{\\Omega}_{0|0}\\) representing how sure we can be about this guess2.\ntime \\(t = 1\\). The Kalman filter then uses the first measurement \\(y_1\\) to update \\(\\boldsymbol{\\mu}_{0|0} \\rightarrow \\boldsymbol{\\mu}_{1|1}\\) and \\(\\boldsymbol{\\Omega}_{0|0} \\rightarrow \\boldsymbol{\\Omega}_{1|1}\\) to represent the estimate and the uncertainty for \\(\\boldsymbol{\\theta}_1\\), the state at time \\(t=1\\).\ntime \\(t = 2,...,T\\). 
It then continues in this fashion using the next measurement \\(y_2\\) to compute \\(\\boldsymbol{\\mu}_{2|2}\\) and \\(\\boldsymbol{\\Omega}_{2|2}\\) and so on all the way to the end of the time series to finally get \\(\\boldsymbol{\\mu}_{T|T}\\) and \\(\\boldsymbol{\\Omega}_{T|T}\\).\n\nHere is the Kalman filter algorithm:\n\n\nInitialization: set \\(\\boldsymbol{\\mu}_{0|0}\\) and \\(\\boldsymbol{\\Omega}_{0|0}\\)\nfor \\(t=1,\\ldots,T\\) do\n\nPrediction update\\[\n\\begin{align}\n\\boldsymbol{\\mu}_{t|t-1} &= \\boldsymbol{G} \\boldsymbol{\\mu}_{t-1|t-1} \\\\ \n\\boldsymbol{\\Omega}_{t|t-1} &= \\boldsymbol{G}\\boldsymbol{\\Omega}_{t-1|t-1} \\boldsymbol{G}^\\top + \\boldsymbol{W}\n\\end{align}\n\\]\nMeasurement update\\[\n\\begin{align}\n\\boldsymbol{\\mu}_{t|t} &= \\boldsymbol{\\mu}_{t|t-1} + \\boldsymbol{K}_t ( y_t - \\boldsymbol{F} \\boldsymbol{\\mu}_{t|t-1} ) \\\\ \n\\boldsymbol{\\Omega}_{t|t} &= (\\boldsymbol{I} - \\boldsymbol{K}_t \\boldsymbol{F} )\\boldsymbol{\\Omega}_{t|t-1}\n\\end{align}\n\\]\n\n\nwhere \\[\\boldsymbol{K}_t = \\boldsymbol{\\Omega}_{t|t-1}\\boldsymbol{F}^\\top ( \\boldsymbol{F} \\boldsymbol{\\Omega}_{t|t-1}\\boldsymbol{F}^\\top + \\boldsymbol{V})^{-1}\\] is the Kalman Gain.\n\nThe widget below lets you experiment with the Kalman filter for the local level model fitted to the Nile river data. In the widget we infer (filter) the local levels \\(\\mu_1,\\mu_2,\\ldots,\\mu_T\\) and can experiment with the measurement standard deviation \\(\\sigma_\\varepsilon\\), the standard deviation of the innovations to the local mean \\(\\sigma_\\eta\\), and also the initial guess for \\(\\mu_0\\) and the standard deviation \\(\\sigma_0\\) of that guess.\nHere are few things to try out in the widget below:\n\nIncrease the measurement standard deviation \\(\\sigma_\\varepsilon\\) and note how the filtered mean pays less and less attention to changes in the data (because the model believes that the data is very poor quality (noisy) and tells us basically nothing about the level). Then move \\(\\sigma_\\varepsilon\\) to smaller values and note how the filtered mean starts chasing the data (because the model believes that the data are super informative about the level).\nMake the standard deviation for the initial level \\(\\sigma_0\\) very small and then change the initial mean \\(\\mu_0\\) to see how this affects the filtered mean at the first part of the time series.\nMove the standard deviation of the innovations to the level \\(\\sigma_\\eta\\) small and note how the filtered mean becomes smoother and smoother over time."
"text": "State-space model - filtering, smoothing and forecasting\n\nThe state space model\nAll of the models above, and many, many, many more can be written as a so called state-space model. A state-space model for a univariate time series \\(y_t\\) with a state vector \\(\\boldsymbol{\\theta}_t\\) can be written as\n$$ \\[\\begin{align}\ny_t &= \\boldsymbol{F} \\boldsymbol{\\theta}_t + v_t,\\hspace{1.5cm} v_t \\sim N(\\boldsymbol{0},\\boldsymbol{V}) \\\\\n\\boldsymbol{\\theta}_t &= \\boldsymbol{G} \\boldsymbol{\\theta}_{t-1} + \\boldsymbol{w}_t, \\qquad \\boldsymbol{w}_t \\sim N(\\boldsymbol{0},\\boldsymbol{W})\n\\end{align}\\] $$ where we have written the multivariate distribution \\(N(\\boldsymbol{0},\\boldsymbol{V})\\) for \\(v_t\\), even though it is actually a scalar here, to be consistent with the notation used later.\nFor example, the local level model is a state-space model with a single scalar state variable \\(\\boldsymbol{\\theta}_t = \\mu_t\\) and parameters\n\\[\n\\begin{align}\n\\boldsymbol{F} &= 1 \\\\\n\\boldsymbol{G} &= 1 \\\\\n\\boldsymbol{V} &= \\sigma_\\varepsilon^2 \\\\\n\\boldsymbol{W} &= \\sigma_\\nu^2\n\\end{align}\n\\]\nWe learn about the state \\(\\mu_t\\) from the observed time series \\(y_t\\) . The first equation is often called the observation or measurement model since it gives the connection between the unobserved state and the observed measurements. The measurements can also be a vector, but we will use a single measurement in this tutorial. The second equation is called the state transition model since it determines how the state evolves over time.\nWe can even let the state-space parameters \\(\\boldsymbol{F}, \\boldsymbol{G}, \\boldsymbol{V}, \\boldsymbol{W}\\) be different in every time period. This is in fact needed if we want to write the time-varying regression model in state-space form. 
Recall the time-varying regression model\n\\[\n\\begin{align} \ny_t &= \\alpha_{t} + \\beta_{t} x_t + \\varepsilon_t, \\quad \\varepsilon_t \\sim N(0, \\sigma_\\varepsilon^2) \\\\ \n\\alpha_{t} &= \\alpha_{t-1} + \\eta_t, \\qquad \\quad \\eta_t \\sim N(0, \\sigma_\\alpha^2) \\\\ \n\\beta_{t} &= \\beta_{t-1} + \\nu_t, \\qquad \\quad \\nu_t \\sim N(0, \\sigma_\\beta^2)\n\\end{align}\n\\]\nWe can tuck the two time-varying parameters into a vector \\(\\boldsymbol{\\beta}_t=(\\alpha_t,\\beta_t)^\\top\\) and write the model as\n\\[\n\\begin{align} \ny_t &= \\boldsymbol{x}_t^\\top\\boldsymbol{\\beta}_{t} + \\varepsilon_t, \\hspace{0.8cm} \\varepsilon_t \\sim N(0, \\sigma_\\varepsilon^2) \\\\ \n\\boldsymbol{\\beta}_{t} &= \\boldsymbol{\\beta}_{t-1} + \\boldsymbol{w}_t, \\qquad \\quad \\boldsymbol{w}_t \\sim N(\\boldsymbol{0}, \\boldsymbol{W})\n\\end{align}\n\\]\nwhere\n\\[\n\\begin{align} \n\\boldsymbol{x}_t &= (1,x_t)^\\top \\\\ \n\\boldsymbol{w}_t &= (\\eta_t,\\nu_t)^\\top \\\\\n\\boldsymbol{W} &=\n\\begin{pmatrix}\n\\sigma_\\alpha^2 & 0 \\\\\n0 & \\sigma_\\beta^2\n\\end{pmatrix}\n\\end{align}\n\\]\nNote that this is a state-space model with\n\\[\n\\begin{align}\n\\boldsymbol{F}_t &= \\boldsymbol{x}_t\\\\\n\\boldsymbol{G} &=\n\\begin{pmatrix}\n1 & 0 \\\\\n0 & 1\n\\end{pmatrix} \\\\\n\\boldsymbol{V} &= \\sigma_\\varepsilon^2 \\\\\n\\boldsymbol{W} &=\n\\begin{pmatrix}\n\\sigma_\\alpha^2 & 0 \\\\\n0 & \\sigma_\\beta^2\n\\end{pmatrix}\n\\end{align}\n\\]\nand note that \\(\\boldsymbol{F}_t\\) now changes in every time period, hence the subscript \\(t\\).\nFinally, we can also have a multivariate response vector \\(\\boldsymbol{y}_t\\), as in\n\\[\n\\begin{align}\n\\boldsymbol{y}_t &= \\boldsymbol{F} \\boldsymbol{\\theta}_t + \\boldsymbol{v}_t,\\hspace{1.5cm} \\boldsymbol{v}_t \\sim N(\\boldsymbol{0},\\boldsymbol{V}) \\\\\n\\boldsymbol{\\theta}_t &= \\boldsymbol{G} \\boldsymbol{\\theta}_{t-1} + \\boldsymbol{w}_t, \\qquad \\boldsymbol{w}_t \\sim N(\\boldsymbol{0},\\boldsymbol{W})\n\\end{align}\n\\]\n\n\nFiltering and smoothing\nTwo different types of inference are relevant in state-space models, filtering and smoothing:\n\nThe filtered estimate \\(\\hat{\\boldsymbol{\\theta}}_{t|t}\\) of the state \\(\\boldsymbol{\\theta}_t\\) uses data up to time \\(t\\).\nThe smoothed estimate \\(\\hat{\\boldsymbol{\\theta}}_{t|T}\\) of the state \\(\\boldsymbol{\\theta}_t\\) uses data up to time \\(T\\), the end of the time series.\n\nThe filtered estimate is therefore the instantaneous estimate, giving the best estimate of the current state. The smoothed estimate is the retrospective estimate that looks back in time and gives us the best estimate using all the data.\nFiltering means computing the sequence of instantaneous estimates of the unobserved state at every time point \\(t=1,2,\\ldots,T\\)\n\\[\n\\hat{\\boldsymbol{\\theta}}_{1|1},\\hat{\\boldsymbol{\\theta}}_{2|2},\\ldots,\\hat{\\boldsymbol{\\theta}}_{T|T}\n\\]\nWe will take a time series and compute the filtered estimates for the whole series, but it is important to understand that filtering is often done in real time, which means it is a continuously ongoing process that returns filtered estimates of the state \\(\\boldsymbol{\\theta}_t\\) as time progresses and new measurements \\(y_t\\) come in. Think about a self-driving car that is continuously trying to understand the environment (people, other cars, the road conditions, etc.). The environment is the state and the car uses its sensors to collect measurements. 
The filtered estimates give the car its best guess of the environment at every point in time.\nFor state-space models of the type discussed here (linear measurement equation and linear evolution of the state, with independent Normal measurement errors and state innovations), the filtered estimates can be computed with one of the most famous algorithms in statistics: the Kalman filter.\nThe Kalman filter is a little messy to write up if you are shaky on vectors and matrices, but we will do it for completeness. We will, however, use a package for it, so don’t worry if the linear algebra is intimidating. We will use the notation \\(\\boldsymbol{\\mu}_{t|t}\\) instead of \\(\\hat{\\boldsymbol{\\theta}}_{t|t}\\), but they mean the same thing.\n\ntime \\(t = 0\\). The Kalman filter starts with mean \\(\\boldsymbol{\\mu}_{0|0}\\) and covariance matrix \\(\\boldsymbol{\\Omega}_{0|0}\\) for the state at time \\(t=0\\). Think about \\(\\boldsymbol{\\mu}_{0|0}\\) as the best guess of the state vector \\(\\boldsymbol{\\theta}_0\\) at time \\(t=0\\) and \\(\\boldsymbol{\\Omega}_{0|0}\\) representing how sure we can be about this guess.\ntime \\(t = 1\\). The Kalman filter then uses the first measurement \\(y_1\\) to update \\(\\boldsymbol{\\mu}_{0|0} \\rightarrow \\boldsymbol{\\mu}_{1|1}\\) and \\(\\boldsymbol{\\Omega}_{0|0} \\rightarrow \\boldsymbol{\\Omega}_{1|1}\\) to represent the estimate and the uncertainty for \\(\\boldsymbol{\\theta}_1\\), the state at time \\(t=1\\).\ntime \\(t = 2,\\ldots,T\\). It then continues in this fashion, using the next measurement \\(y_2\\) to compute \\(\\boldsymbol{\\mu}_{2|2}\\) and \\(\\boldsymbol{\\Omega}_{2|2}\\) and so on, all the way to the end of the time series to finally get \\(\\boldsymbol{\\mu}_{T|T}\\) and \\(\\boldsymbol{\\Omega}_{T|T}\\).\n\nHere is the Kalman filter algorithm:\n\n\nInitialization: set \\(\\boldsymbol{\\mu}_{0|0}\\) and \\(\\boldsymbol{\\Omega}_{0|0}\\)\nfor \\(t=1,\\ldots,T\\) do\n\nPrediction update\\[\n\\begin{align}\n\\boldsymbol{\\mu}_{t|t-1} &= \\boldsymbol{G} \\boldsymbol{\\mu}_{t-1|t-1} \\\\ \n\\boldsymbol{\\Omega}_{t|t-1} &= \\boldsymbol{G}\\boldsymbol{\\Omega}_{t-1|t-1} \\boldsymbol{G}^\\top + \\boldsymbol{W}\n\\end{align}\n\\]\nMeasurement update\\[\n\\begin{align}\n\\boldsymbol{\\mu}_{t|t} &= \\boldsymbol{\\mu}_{t|t-1} + \\boldsymbol{K}_t ( y_t - \\boldsymbol{F} \\boldsymbol{\\mu}_{t|t-1} ) \\\\ \n\\boldsymbol{\\Omega}_{t|t} &= (\\boldsymbol{I} - \\boldsymbol{K}_t \\boldsymbol{F} )\\boldsymbol{\\Omega}_{t|t-1}\n\\end{align}\n\\]\n\n\nwhere \\[\\boldsymbol{K}_t = \\boldsymbol{\\Omega}_{t|t-1}\\boldsymbol{F}^\\top ( \\boldsymbol{F} \\boldsymbol{\\Omega}_{t|t-1}\\boldsymbol{F}^\\top + \\boldsymbol{V})^{-1}\\] is the Kalman gain.\n\nThe widget below lets you experiment with the Kalman filter for the local level model fitted to the Nile river data. In the widget we infer (filter) the local levels \\(\\mu_1,\\mu_2,\\ldots,\\mu_T\\) and can experiment with the measurement standard deviation \\(\\sigma_\\varepsilon\\), the standard deviation of the innovations to the local mean \\(\\sigma_\\eta\\), and also the initial guess for \\(\\mu_0\\) and the standard deviation \\(\\sigma_0\\) of that guess.\nHere are a few things to try out in the widget below:\n\nIncrease the measurement standard deviation \\(\\sigma_\\varepsilon\\) and note how the filtered mean pays less and less attention to changes in the data (because the model believes that the data are of very poor quality (noisy) and tell us basically nothing about the level). 
Then move \\(\\sigma_\\varepsilon\\) to smaller values and note how the filtered mean starts chasing the data (because the model believes that the data are super informative about the level).\nMake the standard deviation for the initial level \\(\\sigma_0\\) very small and then change the initial mean \\(\\mu_0\\) to see how this affects the filtered mean in the first part of the time series.\nMake the standard deviation of the innovations to the level \\(\\sigma_\\eta\\) smaller and note how the filtered mean becomes smoother and smoother over time."
},
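As a concrete illustration of the recursions in the changed entry above: for the local level model the Kalman filter collapses to scalar updates, since F = G = 1, V = sigma_eps^2 and W = sigma_eta^2. Below is a minimal sketch in Python. The function name, the simulated stand-in for the Nile series, and the parameter values (sigma_eps = 120, sigma_eta = 40, prior mean 1000 and prior standard deviation 100) are illustrative assumptions, not values from the tutorial.

```python
import numpy as np

def kalman_filter_local_level(y, sigma_eps, sigma_eta, mu0, omega0):
    """Filtered means mu_{t|t} and variances Omega_{t|t} for the local level model."""
    T = len(y)
    mu, omega = np.empty(T), np.empty(T)
    mu_prev, omega_prev = mu0, omega0
    for t in range(T):
        # Prediction update: G = 1, so the mean is unchanged and W is added.
        mu_pred = mu_prev
        omega_pred = omega_prev + sigma_eta**2
        # Kalman gain K_t = Omega_{t|t-1} F' (F Omega_{t|t-1} F' + V)^{-1}, with F = 1.
        K = omega_pred / (omega_pred + sigma_eps**2)
        # Measurement update.
        mu[t] = mu_pred + K * (y[t] - mu_pred)
        omega[t] = (1 - K) * omega_pred
        mu_prev, omega_prev = mu[t], omega[t]
    return mu, omega

# Simulated local level series as an illustrative stand-in for the Nile data.
rng = np.random.default_rng(1)
T = 100
level = 1000 + np.cumsum(rng.normal(0, 40, size=T))   # sigma_eta = 40
y = level + rng.normal(0, 120, size=T)                # sigma_eps = 120

mu_filt, omega_filt = kalman_filter_local_level(y, 120.0, 40.0, 1000.0, 100.0**2)
```

The sketch also reproduces the widget experiments described in the text: increasing sigma_eps drives the gain K toward zero, so the filtered mean ignores the data, while decreasing it drives K toward one, so the filtered mean chases the data.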
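The time-varying regression case only changes F to F_t = x_t = (1, x_t)'. Here is a sketch of the vector-state version of the same filter; the helper name, the diagonal W, and the simulated data (generated with constant coefficients for simplicity) are again illustrative assumptions.

```python
import numpy as np

def kalman_filter_tvp_regression(y, x, sigma_eps, W, mu0, Omega0):
    """Filtered estimates of (alpha_t, beta_t)' in the time-varying regression."""
    T = len(y)
    mu = np.asarray(mu0, dtype=float)
    Omega = np.asarray(Omega0, dtype=float)
    betas = np.empty((T, 2))
    for t in range(T):
        F = np.array([1.0, x[t]])          # F_t = x_t changes every period
        # Prediction update: G = I, so only the covariance grows by W.
        Omega_pred = Omega + W
        # Kalman gain with scalar innovation variance F Omega_pred F' + V.
        S = F @ Omega_pred @ F + sigma_eps**2
        K = Omega_pred @ F / S
        # Measurement update.
        mu = mu + K * (y[t] - F @ mu)
        Omega = (np.eye(2) - np.outer(K, F)) @ Omega_pred
        betas[t] = mu
    return betas

# Illustrative call with an assumed diagonal W = diag(sigma_alpha^2, sigma_beta^2).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 + 1.0 * x + rng.normal(0, 0.5, size=200)
betas = kalman_filter_tvp_regression(y, x, 0.5, np.diag([0.01, 0.01]),
                                     np.zeros(2), np.eye(2))
```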
{
"objectID": "tutorial/statespace/statespace.html#non-gaussian-state-space-models",