- The R-squared performance metric for regression/continuous outcomes was previously calculated using the `defaultSummary()` function from `caret`, which uses the square of the Pearson correlation coefficient, instead of the correct coefficient of determination, calculated as 1 - RSS/TSS, where RSS = residual sum of squares and TSS = total sum of squares. The correct formula for R-squared is now applied.
- Prevent bug if `x` is a single predictor.
- Added `prc()` which enables easy building of precision-recall curves from 'nestedcv' models and `repeatcv()` results.
- Added `predict` method for `cva.glmnet`.
- Removed magrittr as an imported package. The standard R pipe `|>` can be used instead.
- Added `metrics()` which gives additional performance metrics for binary classification models such as F1 score, Matthews correlation coefficient and precision-recall AUC.
- Added `pls_filter()` which uses partial least squares regression to filter features.
- Enabled parallelisation over repeats in `repeatcv()`, leading to a significant improvement in speed.
- Fixed issue with xgboost on linux/windows with parallel processing in `nestcv.train()`. If argument `cv.cores` > 1, OpenMP multithreading is now disabled, which prevents caret models `xgbTree` and `xgbLinear` from crashing, and allows them to be parallelised efficiently over the outer CV loops.
- Improvements to `var_stability()` and its plots.
- Fixed major bug in multivariate Gaussian and Cox models in `nestcv.glmnet()`
- Added new feature `repeatcv()` to apply repeated nested CV to the main `nestedcv` model functions for robust measurement of model performance.
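For intuition only, here is a minimal sketch of the idea behind repeated CV (in Python, using plain rather than nested CV, and not the package's R implementation): repeating cross-validation with different random fold splits and pooling the results gives a more stable performance estimate than any single split.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)

def cv_r2(x, y, k, rng):
    """One round of k-fold CV for a simple least-squares fit y = a*x + b."""
    folds = np.array_split(rng.permutation(len(y)), k)
    r2s = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        a, b = np.polyfit(x[train], y[train], 1)  # fit on training folds only
        resid = y[test] - (a * x[test] + b)
        r2s.append(1 - np.sum(resid**2) / np.sum((y[test] - y[test].mean())**2))
    return np.mean(r2s)

# Each repeat reshuffles fold membership; averaging over repeats reduces
# the variance of the estimate caused by any one random split
scores = [cv_r2(x, y, k=5, rng=rng) for _ in range(10)]
estimate = np.mean(scores)
```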
- Added new feature via `modifyX` argument to all `nestedcv` models. This allows more powerful manipulation of the predictors such as scaling, imputing missing values, or adding extra columns through variable manipulations. Importantly, these are applied to train and test input data separately.
- Added `predict()` function for `nestcv.SuperLearner()`
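The train/test separation that `modifyX` performs can be sketched as follows (hypothetical Python with invented helper names, purely to illustrate the principle): transformation parameters are learned from the training data only and merely applied to the test data, so no information leaks from the test set.

```python
import numpy as np

def fit_modify(x_train):
    # Learn transformation parameters (here simple centring/scaling)
    # from the training data only
    return x_train.mean(axis=0), x_train.std(axis=0)

def apply_modify(x, centre, scale):
    return (x - centre) / scale

rng = np.random.default_rng(1)
x_train = rng.normal(loc=5.0, scale=2.0, size=(80, 4))
x_test = rng.normal(loc=5.0, scale=2.0, size=(20, 4))

centre, scale = fit_modify(x_train)
x_train_s = apply_modify(x_train, centre, scale)
# The test data reuse the training parameters; nothing is learned from them
x_test_s = apply_modify(x_test, centre, scale)
```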
- Added `pred_SuperLearner` wrapper for use with `fastshap::explain`
- Fixed parallelisation of `nestcv.SuperLearner()` on windows.
- Added support for multivariate Gaussian and Cox models in `nestcv.glmnet()`
- Added argument `verbose` in `nestcv.train()`, `nestcv.glmnet()` and `outercv()` to show progress.
- Added argument `multicore_fork` in `nestcv.train()` and `outercv()` to allow choice of parallelisation between forked multicore processing using `mclapply` or non-forked using `parLapply`. This can help prevent errors with certain multithreaded caret models, e.g. `model = "xgbTree"`.
- In `one_hot()` changed the `all_levels` argument default to `FALSE` to be compatible with regression models by default.
- Add coefficient column to `lm_filter()` full results table
- Fixed significant bug in `lm_filter()` where variables with zero variance were incorrectly reporting very low p-values in linear models instead of returning `NA`. This is due to how rank-deficient models are handled by `RcppEigen::fastLmPure`. The default method for `fastLmPure` has been changed to `0` to allow detection of rank-deficient models.
- Fixed bug in `weight()` caused by `NA`. Allow `weight()` to tolerate character vectors.
- Better handling of dataframes in filters. A `keep_factors` option has been added to filters to control filtering of factors with 3 or more levels.
- Added `one_hot()` for fast one-hot encoding of factors and character columns by creating dummy variables.
- Added `stat_filter()` which applies univariate filtering to dataframes with mixed data types (continuous & categorical combined).
- Changed one-way ANOVA test in `anova_filter()` from `Rfast::ftests()` to `matrixTests::col_oneway_welch()` for much better accuracy
- Fixed bug caused by use of weights with `nestcv.train()` (Matt Siggins suggestion)
- Added `n_inner_folds` argument to `nestcv.train()` to make it easier to set the number of inner CV folds, and `inner_folds` argument which enables setting the inner CV fold indices directly (suggestion Aline Wildberger)
- Fixed error in `plot_shap_beeswarm()` caused by change in fastshap 0.1.0 output from tibble to matrix
- Fixed bug with categorical features and `nestcv.train()`
- Add argument `pass_outer_folds` to both `nestcv.glmnet` and `nestcv.train`: this enables passing of outer CV fold indices stored in `outer_folds` to the final round of CV. Note this can only work if `n_outer_folds` = number of inner CV folds and balancing is not applied, so that `y` is a consistent length.
- Fix: ensure `nfolds` for final CV equals `n_inner_folds` in `nestcv.glmnet()`
- Improve `plot_var_stability()` to be more user friendly
- Add `top` argument to shap plots
- Modified examples and vignette in anticipation of the new version of fastshap (0.1.0)
- Add vignette for variable stability and SHAP value analysis
- Refine variable stability and shap plots
- Switch some packages from Imports to Suggests to make basic installation simpler.
- Provide helper prediction wrapper functions to make it easier to use the `fastshap` package for calculating SHAP values.
- Add `force_vars` argument to `glmnet_filter()`
- Add `ranger_filter()`
- Disable printing in `nestcv.train()` from models such as `gbm`. This fixes a multicore bug when using the standard R gui on mac/linux.
- Bugfix if `nestcv.glmnet()` model has 0 or 1 coefficients.
- Add multiclass AUC for multinomial classification.
- `nestedcv` models now return `xsub` containing a subset of the predictor matrix `x` with filtered variables across outer folds and the final fit
- `boxplot_model()` no longer needs the predictor matrix to be specified as it is contained in `xsub` in `nestedcv` models
- `boxplot_model()` now works for all `nestedcv` model types
- Add function `var_stability()` to assess variance and stability of variable importance across outer folds, and directionality for binary outcome
- Add function `plot_var_stability()` to plot variable stability across outer folds
- Add `finalCV = NA` option which skips fitting the final model completely. This gives a useful speed boost if performance metrics are all that is needed.
- The `model` argument in `outercv` now prefers a character value instead of a function for the model to be fitted
- Bugfixes
- Add check that model exists in `outercv`
- Perform final model fit first in `nestcv.train`, which improves error detection in caret, so `nestcv.train` can be run in multicore mode straightaway.
- Removes predictors with variance = 0
- Fix bug caused by filter p-values = `NA`
- Add confusion matrix to results summaries for classification
- Fix bugs in extraction of inner CV predictions for `nestcv.glmnet`
- Fix multinomial `nestcv.glmnet`
- Add `outer_train_predict` argument to enable saving of predictions on outer training folds
- Add function `train_preds` to obtain outer training fold predictions
- Add function `train_summary` to show performance metrics on outer training folds
- Add examples of imbalanced datasets
- Fix rowname bug in `smote()`
- Add support for nested CV on ensemble models from the `SuperLearner` package
- Final CV on whole data is now the default in `nestcv.train` and `nestcv.glmnet`
- Fix windows parallelisation bugs
- Fix bug in `nestcv.train` for caret models with tuning parameters which are factors
- Fix bug in `nestcv.train` for caret models using regression
- Add option in `nestcv.train` and `nestcv.glmnet` to tune final model parameters using a final round of CV on the whole dataset
- Fix bugs in LOOCV
- Add balancing to final model fitting
- Add case weights to `nestcv.train` and `outercv`
- Add `randomsample()` to handle class imbalance using random over/undersampling
- Add `smote()` for SMOTE algorithm for increasing minority class data
- Add bootstrap wrapper to filters, e.g. `boot_ttest()`
- Final lambda in `nestcv.glmnet()` is the mean of best lambdas on log scale
- Added `plot_varImp` for plotting variable importance for `nestcv.glmnet` final models
- Corrected handling of multinomial models in `nestcv.glmnet()`
- Align lambda in `cva.glmnet()`
- Improve plotting of error bars in `plot.cva.glmnet`
- Bugfix: plot of single `alphaSet` in `plot.cva.glmnet`
- Updated documentation and vignette
- Parallelisation on windows added
- hsstan model has been added (Athina Spiliopoulou)
- `outer_folds` can be specified for consistent model comparisons
- Checks on `x`, `y` added
- `NA` handling
- summary and print methods
- Implemented LOOCV
- Collinearity filter
- Implement `lm` and `glm` as models in `outercv()`
- Runnable examples have been added throughout
- Major update to include the `nestcv.train` function, which adds nested CV to the `train` function of `caret`
- Note: passing of extra arguments to filter functions specified by `filterFUN` is no longer done through `...` but with a list of arguments passed through a new argument `filter_options`.
- Initial build of nestedcv
- Added `outercv.rf` function for measuring performance of rf
- Added `cv.rf` for tuning the `mtry` parameter
- Added `plot_caret` for plotting caret objects with error bars on the tuning metric