Unsupervised machine learning is a class of algorithms that identifies patterns in unlabeled data, i.e. without considering an outcome or target. This workshop will describe and demonstrate powerful unsupervised learning algorithms used for clustering (HDBSCAN, latent class analysis, HOPACH), dimensionality reduction (UMAP, generalized low-rank models), and anomaly detection (isolation forests). Participants will learn how to structure unsupervised learning analyses and will gain familiarity with example code that can be adapted to their own projects.
Author: Chris Kennedy
This is an intermediate machine learning workshop. Participants should have significant prior experience with R and RStudio, including manipulation of data frames, installation of packages, and plotting.
Prerequisite workshops
- R Fundamentals or similar training in R basics.
Recommended workshops
- Machine Learning in R or other supervised learning experience.
Participants should have access to a computer with the following software:
- R version 3.6 or greater
- RStudio
- Rtools (if using Windows)
To prepare for the workshop, please download the materials and work through the package installation in 0-install.Rmd. Please report any errors to the GitHub issue queue.
There is also an RStudio Cloud workspace that can be used.
Please create a GitHub issue to report any errors or give feedback on this workshop.
Books
- Boehmke & Greenwell (2019). Hands-On Machine Learning with R - free online version
- Hennig et al. (2015). Handbook of Cluster Analysis - thorough and highly recommended
- Aggarwal & Reddy (2014). Data Clustering: Algorithms and Applications - great complement to Hennig et al.
- Dolnicar et al. (2018). Market Segmentation Analysis - free, closely tied to R, and chapter 7 is especially helpful
- Izenman (2013). Modern Multivariate Statistical Techniques
- Everitt et al. (2011). Cluster Analysis