Cubic smoothing splines with natural boundary conditions and automated choice of the smoothing parameter
A natural cubic smoothing spline module for smoothing out noise and estimating the first two derivatives (velocity and acceleration in the case of a particle trajectory).
Several methods are implemented for the automatic choice of the smoothing parameter.
It can also be used to get an interpolating natural cubic spline.
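
As a rough illustration of what an interpolating natural cubic spline involves, here is a standalone NumPy sketch. It does not use this module's API, and the function names `natural_second_derivatives` and `evaluate_spline` are hypothetical. The second derivatives at the knots come from a tridiagonal system, with the two end values fixed to zero by the natural boundary conditions.

```python
import numpy as np

def natural_second_derivatives(x, y):
    """Second derivatives at the knots of the interpolating natural cubic spline."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    h = np.diff(x)                      # knot spacings
    m = np.zeros(n)                     # natural boundary: m[0] = m[-1] = 0
    if n > 2:
        # Tridiagonal system for the interior knots (stored dense for clarity)
        A = np.zeros((n - 2, n - 2))
        idx = np.arange(n - 2)
        A[idx, idx] = 2.0 * (h[:-1] + h[1:])
        A[idx[:-1], idx[:-1] + 1] = h[1:-1]
        A[idx[1:], idx[1:] - 1] = h[1:-1]
        rhs = 6.0 * (np.diff(y[1:]) / h[1:] - np.diff(y[:-1]) / h[:-1])
        m[1:-1] = np.linalg.solve(A, rhs)
    return m

def evaluate_spline(x, y, m, xq):
    """Evaluate the piecewise cubic defined by knots x, values y, second derivatives m."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xq = np.asarray(xq, dtype=float)
    h = np.diff(x)
    # Index of the interval containing each query point
    i = np.clip(np.searchsorted(x, xq, side="right") - 1, 0, x.size - 2)
    t0 = x[i + 1] - xq                  # distance to the right knot
    t1 = xq - x[i]                      # distance to the left knot
    return ((m[i] * t0**3 + m[i + 1] * t1**3) / (6.0 * h[i])
            + (y[i] / h[i] - m[i] * h[i] / 6.0) * t0
            + (y[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * t1)

x = np.array([0.0, 1.0, 2.5, 4.0])
y = np.sin(x)
m = natural_second_derivatives(x, y)
y_at_knots = evaluate_spline(x, y, m, x)   # reproduces y at the knots
```

A production implementation would use a banded solver instead of a dense matrix; the dense form is kept here only for readability.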
This code was adapted to Python from the Octave splines package created by N.Y. Krakauer (see Refs. below), with algorithmic modifications, mainly to gain performance.
- Krakauer, N. Y. & Fekete, B. M. Are climate model simulations useful for forecasting precipitation trends? Hindcast and synthetic-data experiments. Environ. Res. Lett. 9, 024009 (2014).
- Krakauer's original Octave splines code: http://octave.sourceforge.net/splines/
- E. Afik. Robust and highly performant ring detection algorithm for 3d particle tracking using 2d microscope imaging. Sci. Rep. 5, 13584; doi: 10.1038/srep13584 (2015)
- E. Afik and V. Steinberg. On the role of initial velocities in pair dispersion in a microfluidic chaotic flow. Nature Communications 8, Article number: 468 (2017) doi: 10.1038/s41467-017-00389-8.
- E. Afik and V. Steinberg. A Lagrangian approach to elastic turbulence in a curvilinear microfluidic channel. figshare doi: 10.6084/m9.figshare.5112991 (2017).
- T. Li, A. Yan, N. Bhatia, A. Altinok, E. Afik, P. Durand-Smet, ... & E. M. Meyerowitz. Calcium signals are necessary to establish auxin transporter polarity in a plant stem cell niche. Nature Communications 10, Article number: 726 (2019) doi: 10.1038/s41467-019-08575-6.
TODO: render math properly
Carl de Boor (1978), A Practical Guide to Splines, Springer, Chapter XIV
Given noisy data y parametrised by x (with weights
**Some notation (following de Boor (1978)):**
The following relations result from the continuity of the first derivative and the minimisation of the regularised LSQ
expressing
It should be noted that for a natural spline $ c_{0} = c_{N-1} = 0 $,
hence in the above equation the first and last elements of
The ppform (in terms of
C.M. Hurvich, J.S. Simonoff, C-L Tsai (1998), Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion, J. Royal Statistical Society, 60B:271-293
P. Craven and G. Wahba (1978), Smoothing noisy data with spline functions, Numerische Mathematik, 31:377-403
M. F. Hutchinson and F. R. de Hoog (1985), Smoothing noisy data with spline functions, Numerische Mathematik, 47:99-106
V. Cherkassky and F. Mulier (2007), Learning from Data: Concepts, Theory, and Methods. Wiley
Nir Krakauer, octave-splines 1.2.4
L. Wasserman (2004), All of Nonparametric Statistics
Following Hurvich et al. (1998):
"Classical" methods to choose the smoothing parameter are based on the
minimisation of an approximately unbiased estimator of either the mean
average squared error: $$ \qquad MASE = \frac{1}{N} E \left\{ \| g(x_i) - f_p(x_i) \|^2 \right\} $$
(as in the case of GCV) or the expected Kullback-Leibler discrepancy (as
in the case of the AIC).
The smoothing parameter is the minimiser of $$ \qquad \log \hat \sigma ^2 + \psi (H_p / N) $$
where $$ \qquad \hat \sigma ^2 = \frac{1}{N} \sum_i \left( y_i - f_p(x_i) \right)^2 = \frac{1}{N} \| (I-H_p)y \|^2 $$
and $H_p$ (called the hat matrix in Wasserman (2004), or the influence matrix in Hutchinson & de Hoog (1985)) is the matrix transforming the (noisy) data to the estimator:
$$ \qquad a = H_p y $$
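
The relation between the data, the hat matrix and $\hat \sigma^2$ can be checked numerically with a small dense sketch. It assumes unit weights and the objective $p \sum_i (y_i - f(x_i))^2 + (1-p) \int f''(t)^2 dt$, which gives $H_p = (I + \frac{1-p}{p} K)^{-1}$ with $K$ the roughness-penalty matrix of the natural cubic spline. The name `hat_matrix` and the dense construction are illustrative assumptions, not this module's implementation.

```python
import numpy as np

def hat_matrix(x, p):
    """Dense hat matrix of a natural cubic smoothing spline (unit weights, 0 < p <= 1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.diff(x)
    Q = np.zeros((n, n - 2))        # second-difference operator
    R = np.zeros((n - 2, n - 2))    # Gram matrix of the second-derivative basis
    for j in range(n - 2):
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    K = Q @ np.linalg.solve(R, Q.T)  # roughness penalty: int f''^2 = a^T K a
    lam = (1.0 - p) / p
    return np.linalg.inv(np.eye(n) + lam * K)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

H = hat_matrix(x, p=0.999)
a = H @ y                                   # smoothed values, a = H_p y
sigma2_hat = np.sum((y - a) ** 2) / x.size  # (1/N) * ||(I - H_p) y||^2
effective_dof = np.trace(H)                 # between 2 (straight line) and N (interpolation)
```

Forming the full $N \times N$ matrix costs $O(N^3)$ and is only sensible for small examples; it is used here to make the definitions explicit.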
Denote
Cherkassky and Mulier (2007) [p.129]
Vapnik-Chervonenkis penalization factor or Vapnik's measure: $$ \qquad \psi_{VM} = -\log \left[ 1 - \sqrt{ h - h \log h + \frac{\log N}{2N} } \right] $$
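
A minimal sketch of this penalty as a plain function follows. It assumes that $h$ is the fraction of effective degrees of freedom, e.g. $Tr(H_p)/N$ with $0 < h \le 1$, and that $\log N / 2N$ reads as $\frac{\log N}{2N}$. The function name `vapnik_measure` is hypothetical. When the bracket is not positive the penalty is taken as infinite, following the positive-part convention of Cherkassky and Mulier.

```python
import math

def vapnik_measure(h, n):
    """psi_VM = -log(1 - sqrt(h - h*log(h) + log(n) / (2*n))).

    h: assumed fraction of effective degrees of freedom, 0 < h <= 1
    n: number of data points
    """
    s = h - h * math.log(h) + math.log(n) / (2.0 * n)
    inner = 1.0 - math.sqrt(s)
    # A non-positive bracket means the model is too complex for n points
    return -math.log(inner) if inner > 0.0 else math.inf

penalty_small = vapnik_measure(0.05, 100)
penalty_large = vapnik_measure(0.10, 100)
```

The penalty grows with $h$, so a smoother that uses more effective degrees of freedom must reduce $\log \hat \sigma^2$ by more to be preferred.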
All require the estimation of Craven & Wahba (1978)
, denote Craven & Wahba (1978)
suggest choosing the smoothing parameter $p$ by
$$ \qquad \min_p \frac{\hat \sigma^2} { [Tr(I-H_p)/N]^2} =
\min_p N \frac{ || \Lambda U^T y ||^2}{\left(Tr(\Lambda)
\right)^2}$$
where $ \Lambda $ denotes the diagonal matrix with the elements
$\frac{\sigma_i^2}{6(1-p)\sigma_i^2 + p} $ on the diagonal.
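
The first form of the GCV criterion above can be evaluated directly on a grid of $p$ values. The sketch below is a brute-force illustration with dense matrices and unit weights, assuming the objective $p \sum_i (y_i - f(x_i))^2 + (1-p) \int f''(t)^2 dt$. It does not use the $\Lambda$, $U$ factorisation, and the names `hat_matrix` and `gcv_score` are hypothetical rather than part of this module.

```python
import numpy as np

def hat_matrix(x, p):
    """Dense hat matrix of a natural cubic smoothing spline (unit weights, 0 < p <= 1)."""
    n = x.size
    h = np.diff(x)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    K = Q @ np.linalg.solve(R, Q.T)
    return np.linalg.inv(np.eye(n) + (1.0 - p) / p * K)

def gcv_score(x, y, p):
    """GCV(p) = sigma_hat^2 / (Tr(I - H_p) / N)^2."""
    n = x.size
    H = hat_matrix(x, p)
    residual = y - H @ y
    sigma2_hat = residual @ residual / n
    return sigma2_hat / ((n - np.trace(H)) / n) ** 2

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

# Grid over lambda = (1 - p) / p, mapped back to p; p = 1 is excluded
# because Tr(I - H_p) vanishes there.
lam_grid = np.logspace(-8, 0, 33)
p_grid = 1.0 / (1.0 + lam_grid)
scores = np.array([gcv_score(x, y, p) for p in p_grid])
p_best = p_grid[np.argmin(scores)]
```

A grid search is the simplest choice for a sketch; a scalar minimiser over $p$ (or over $\log \lambda$) would be the natural refinement.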