Bryce M. Henson, Dong K. Shin, Kieran F. Thomas
A matlab function that uses bootstrapping to find the standard error in an arbitrary analysis operation. Status: This core functionality provided here is ready for use in other projects. Testing is implemented and passing for the core functionality which provides error determination.
It only takes a moderate amount of complexity in data analysis operation(estimation function) before it is difficult to determine the error in the result. Bootstraping/resampling is a powerful statistical method that performs the analysis operation repeatedly on smaller subsets of the data in order to estimate the error in the result of the the operation(estimation function) on the full data set. Further the method is able to work with an analysis operation that only produces meaningful results when performed with many data points (such as a (non)linear fit)
The procedure is reasonably simple given some analysis operation (estimation function) A(x) (that produces a scalar) and a dataset D
- select a random sample of the data S of length n_samp (with replacements) out of all data collected (D, with length n_tot)
- compute the analysis operation A(S)
- repeat steps 1 to 2 many times saving the result of each analysis operation (on the subset)
- calculate the standard deviation across these results and multiply by sqrt(n_samp)/sqrt(n_tot) to estimate the standard error in A(D). (This is known as mean-like scaling.)
As a test it is advisable to check that there is no trend in either: the output of the function, or the estimated standard error, as function of the size of the subset. Thus the above procedure may be repeated at many different fractions of the whole dataset giving the graph below.
The above uses random sampling with replacement in order to prevent biasing of the standard error estimate. It is however possible to use the Finite sample correction from L. Isserlis,On the Value of a Mean as Calculated from a Sample,J. Royal Stat. Soc Vol. 81, No. 1 (Jan., 1918), pp. 75-81 to correct for the bias when using random sampling without replacements. Both methods are implemented in this work.
If you are studying the error from some analysis operation then it may be required to know how significant some change in this error is. This is where it is natural to start worrying about the error in the SE. This code provides two estimates, the first assumes a normal distribution (and is unbiased) the second does not and is slightly biased.
- wiki
- pi bootstrapped
- Jason Brownlee,A gentler introduction to the bootstrap method
- [https://ijpam.eu/contents/2005-21-3/10/10.pdf]
- [https://en.wikipedia.org/wiki/Variance#Distribution_of_the_sample_variance]
- [https://ijpam.eu/contents/2009-52-1/5/5.pdf]
- dist. of sample var.
- centeral moments
- see derivation folder
- https://ijpam.eu/contents/2005-21-3/10/10.pdf
- https://ijpam.eu/contents/2009-52-1/5/5.pdf
contributors welcome! There is a lot to do to build this into a powerful tool. Drop me an email.
- allow for vctor output of function, producing the estimated error for each element of the vector
- add an option to let the estimator function handle the subsampling
- fix the overestimation of error at small fractions of the dataset
- allow second output from anal_opp function (to be used as a structure of details about the fit)
- allow for this second output not to be provided
- provide all these second outputs in a cell matrix
- error in the error
- fix how the estimated SE of the SE estimate is calculated
- make all work for even a single subsample size
- check estimated SE of the SE estimate by nesting the bootstrapper
- undestand why results are wrong
- understand how should treat combining multiple sampling fractions
- fit a laurent series to the mean and error dependence
- more Documentation
- careful documentation in function of what each output is
- commenting in main function with links
- organizing the resources in this readme
- write tutorial in more detail
- normalily testing
- during the bootstrap try and determine if the underlying distribution is normal or not
- package
- build convergence test for some distributions (normal,uniform,arb. kurtosis
- make a nice logo/diagram
- add to matlab file exchange