Statistical Learning

Introduction

This GitHub repository provides materials (slides, scripts, datasets, etc.) for one part of the Statistical Learning course at the UPC-UB MSc in Statistics and Operations Research (MESIO).

This part of the course has an introduction and two blocks, each with two parts.

Class material

All class materials are available from the repository https://aspteaching.github.io/Introduction2StatisticalLearning/.

On this page you will find links to the HTML/PDF versions of the slides and other documents, as well as to datasets, references and resources.

Course presentation

Introduction to Statistical Learning

1.1 Introduction: Statistics, Machine Learning, Statistical Learning and Data Science
1.2 Overview of Supervised Learning
1.3 Model validation and Resampling
Rlabs
- Regression with KNN
- Classification with KNN
Complements
- Introduction to biomarkers and diagnostic tests

Tree based methods

Decision Trees

Decision trees are a type of non-parametric classifier that has been very successful because of its interpretability, flexibility and reasonably good accuracy.
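To make the interpretability point concrete, here is a minimal sketch of the core step in growing a classification tree: choosing the threshold on one feature that minimises the weighted Gini impurity of the two child nodes. The function names (`gini`, `best_split`) and the toy data are assumptions made for illustration; they are not taken from the course labs, which use dedicated R and Python libraries.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best split `x <= t` on one feature."""
    best_t, best_score = None, float("inf")
    values = sorted(set(xs))
    # Candidate thresholds are the midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Hypothetical toy data: one numeric feature, two classes.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(xs, ys)
print(f"rule: if x <= {threshold} predict 'a', else predict 'b' (impurity {impurity})")
```

The resulting rule is a single readable condition; a full tree repeats this search recursively inside each child node.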

Slides
R-lab
- Lab_1- Classification and Regression Trees
Python-labs
- Lab-0 Introduction to Python (from ISL. Ch 02)
- Lab_1- Decision Trees lab (from ISL. Ch 08)

Ensemble methods

The term "ensemble" ("together" in French) refers to a family of approaches that build predictors by combining multiple models.

They have proved to address some limitations of single trees well, improving accuracy and robustness, reducing overfitting and capturing complex relationships.
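As one illustration of combining multiple models, the sketch below implements bagging (bootstrap aggregating) with one-split trees as base learners and a majority vote as the combiner. Bagging is only one of several ensemble approaches, and the helper names (`fit_stump`, `bagging`), the toy data and the choice of 25 models are assumptions made for this example rather than the method used in the labs.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(xs, ys):
    """One-split tree on a single feature, chosen to minimise training errors."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label, right_label = majority(left), majority(right)
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    if best is None:
        # The sample has a single distinct x value: fall back to a constant predictor.
        label = majority(ys)
        return lambda x: label
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def bagging(xs, ys, n_models=25, seed=0):
    """Fit one stump per bootstrap sample and predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: majority([model(x) for model in models])


# Hypothetical toy data: one numeric feature, two classes.
xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
ys = ["a", "a", "a", "a", "b", "b", "b", "b"]
predict = bagging(xs, ys)
print(predict(0.0), predict(10.0))
```

Each stump sees a slightly different resample of the data, so individual thresholds vary; the vote averages out that variability, which is the mechanism behind the robustness gain.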

Slides
Labs
- Ensemble Lab (Rmd version)
- Ensemble Lab (Python notebook)

Artificial Neural Networks

Shallow Neural Networks

These are traditional ML models, inspired by the brain, that simulate neuron behavior: they receive an input, process it and produce an output prediction.
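The input-process-output behavior can be sketched in a few lines. The example below assumes the common formulation in which a neuron computes a weighted sum of its inputs plus a bias and passes it through a sigmoid activation; the weights are hand-picked, hypothetical values, not trained ones, and the function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def shallow_net(inputs, hidden_layer, output_unit):
    """One hidden layer of neurons feeding a single output neuron."""
    hidden = [neuron(inputs, weights, bias) for weights, bias in hidden_layer]
    out_weights, out_bias = output_unit
    return neuron(hidden, out_weights, out_bias)


# Hypothetical hand-picked parameters: 2 inputs -> 2 hidden neurons -> 1 output.
hidden_layer = [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)]
output_unit = ([1.0, -1.0], 0.0)
print(shallow_net([1.0, 1.0], hidden_layer, output_unit))
```

In practice the weights and biases are learned from data rather than set by hand; this sketch only shows the forward computation.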

For a long time their applicability was relatively restricted to a few fields or problems, mainly because their "black box" functioning made them hard to interpret.

The scenario has now completely changed with the advent of deep neural networks, which are at the basis of many powerful applications of artificial intelligence.

Neural Networks Slides
Labs
- NeuralNets Lab (Rmd version)
- NeuralNets Lab (Python notebook)

Deep Neural Networks

Essentially these are ANNs with multiple hidden layers, which allow them to overcome many of the limitations of shallow networks.

They can be tuned in a much more automatic way and have been applied to many complex tasks, such as computer vision, natural language processing or recommender systems.
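What "multiple hidden layers" means computationally can be sketched as a chain of layers applied one after another. The example below assumes fully connected layers with ReLU activations and a linear output layer, a common but by no means the only design; the layer sizes and weights are hypothetical values chosen so the arithmetic is easy to follow.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(values, weights, biases):
    """Affine map: one output per row of `weights`."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Apply each hidden layer (affine map + ReLU) in turn, then a linear output layer."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = relu(dense(x, weights, biases))
    weights, biases = output
    return dense(x, weights, biases)


# Hypothetical parameters for a network with two hidden layers.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer 1: 2 -> 2
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0]),  # hidden layer 2: 2 -> 2
    ([[1.0, 2.0]], [0.5]),                    # output layer:   2 -> 1
]
print(forward([2.0, 1.0], layers))  # prints [4.0]
```

Adding depth only means appending more `(weights, biases)` pairs to `layers`; deep learning libraries add automatic training of those parameters on top of this forward pass.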

Deep learning Slides
- DeepLearning Lab (Rmd version)
- DeepLearning Lab (Python notebook)

References and resources

References for Tree based methods

Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression trees. CRC press.
Greenwell, B. M. (2022). Tree-based methods for statistical learning in R (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9781003089032
Efron, B., & Hastie, T. (2016). Computer age statistical inference. Cambridge University Press.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). Springer.

References for deep neural networks

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning (Vol. 1). MIT press.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Chollet, F. (2018). Deep learning with Python. Manning Publications.
Chollet, F. (2023). Deep learning with R (2nd ed.). Manning Publications.

Some interesting online resources

Decision Trees free course (9 videos). By Analytics Vidhya
Applied Data Mining and Statistical Learning (Penn State University)
R for statistical learning
An Introduction to Recursive Partitioning Using the RPART Routines
Introduction to Artificial Neural Networks
