Ordination
Ordination is a collective term for multivariate techniques that summarize a multidimensional dataset in such a way that when it is projected onto a low-dimensional space, any intrinsic pattern the data may possess becomes apparent upon visual inspection.
- Since it is impossible to visualize many dimensions simultaneously, ordination projects the data into a space that can be inspected visually
- Saves time compared to running a separate univariate analysis for each variable
- Beyond being a “dimension reduction technique”, by focusing on the important dimensions we avoid interpreting (and misinterpreting) noise; ordination is therefore also a “noise reduction technique”
Principal Component Analysis, or PCA, is a dimensionality-reduction method often used to summarize large data sets in fewer dimensions while preserving as much information as possible. PCA is used in exploratory data analysis and for making predictive models.
- “Easy to implement” tool for exploratory data analysis & for making predictive models
- Convenient visualization of high-dimensional data
- Highly affected by outliers in the data
- Favours strong correlations
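As an illustration, here is a minimal PCA sketch using only NumPy (the data and variable names are hypothetical, not from the text):

```python
import numpy as np

def pca(X, n_components=2):
    """Minimal PCA: eigendecomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)                    # center each variable
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # PC1 = most variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components]    # project samples onto the PCs
    explained = eigvals / eigvals.sum()        # fraction of variance per PC
    return scores, explained

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 1] = 2 * X[:, 0] + 0.1 * rng.normal(size=50)  # inject a strong correlation
scores, explained = pca(X)
```

Because PCA favours strong correlations, the injected correlation between the first two variables dominates PC1 in this toy data.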
Principal Coordinate Analysis, or PCoA, is a method to explore and visualize similarities or dissimilarities in data. It starts with a similarity matrix or dissimilarity matrix (= distance matrix) and assigns each item a location in a low-dimensional space.
- Can handle a wide range of data
- Convenient visualization of high-dimensional data
- Values of the objects along a PCoA axis of interest may be correlated
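A sketch of classical PCoA in NumPy, using Gower's double-centering of the squared distance matrix (the example points and distances are made up):

```python
import numpy as np

def pcoa(D, n_axes=2):
    """Classical PCoA: double-center the squared distance matrix,
    then eigendecompose (Gower's method)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # centered inner-product matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # coordinates: eigenvectors scaled by sqrt of (non-negative) eigenvalues
    coords = eigvecs[:, :n_axes] * np.sqrt(np.maximum(eigvals[:n_axes], 0))
    return coords, eigvals

# Hypothetical demo: Euclidean distances among six 2-D points are
# recovered exactly by the first two PCoA axes.
rng = np.random.default_rng(1)
pts = rng.normal(size=(6, 2))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords, eigvals = pcoa(D, n_axes=2)
```

Note that only the distance matrix enters the computation, which is why PCoA can handle any data for which a distance or dissimilarity can be defined.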
Source: https://www.youtube.com/watch?v=HMOI_lkzW08&t=1s&ab_channel=StatQuestwithJoshStarmer
- There is a Principal Component/Coordinate for each dimension
- If we have “n” variables, we would have “n” Principal Components/Coordinates
- PC1/PCo1/Dim1 spans the direction of most variation
- PC2/PCo2/Dim2 spans the direction of 2nd-most variation
- …
- PC“n”/PCo“n”/Dim“n” spans the direction of “n”th-most variation
- Each axis has an eigenvalue whose magnitude indicates the amount of variation captured by that axis
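The points above can be checked numerically; a small NumPy sketch on random example data showing that n variables yield n eigenvalues, ordered from most to least variation, whose sum equals the total variance:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))                      # n = 5 variables
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # PC1 first, PC5 last
# one eigenvalue per dimension; their sum equals the total variance
total_variance = np.trace(cov)
```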
Redundancy Analysis (RDA) can analyse relationships between two tables of variables. It is very similar to PCA.
RDA can be described as constrained PCA. While PCA, being unconstrained, searches for the combinations of variables that best explain sample composition, RDA searches only among the supplied explanatory variables for those that best define the samples.
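One common formulation of this constraint is a multivariate regression of the response table on the explanatory table, followed by PCA of the fitted values; a NumPy sketch under that assumption (the two tables are hypothetical):

```python
import numpy as np

def rda(Y, X):
    """Sketch of RDA: regress response table Y on explanatory table X,
    then run PCA on the fitted (constrained) part of Y."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)    # least-squares fit
    Y_fit = Xc @ B                                  # variation explained by X
    eigvals, eigvecs = np.linalg.eigh(np.cov(Y_fit, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return Y_fit @ eigvecs[:, order], eigvals[order]

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))                        # 2 explanatory variables
Y = X @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(40, 4))
scores, eigvals = rda(Y, X)
```

Because the ordination is constrained, the number of meaningful axes cannot exceed the number of explanatory variables: here only the first two eigenvalues are non-zero.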
Another technique similar to PCA is Correspondence Analysis (CA), which can summarise a two-way table of data. While PCA decomposes relations between columns only, CA decomposes columns and rows simultaneously. CA is more suitable for categorical data than for continuous data.
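A minimal CA sketch via SVD of the standardized residuals of a contingency table, in plain NumPy (the example tables are made up):

```python
import numpy as np

def ca(table):
    """Sketch of Correspondence Analysis on a two-way contingency table."""
    P = table / table.sum()                    # relative frequencies
    r, c = P.sum(axis=1), P.sum(axis=0)        # row and column masses
    E = np.outer(r, c)                         # expected under independence
    S = (P - E) / np.sqrt(E)                   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]      # row principal coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]   # column principal coordinates
    return rows, cols, sv ** 2                 # inertia captured per axis

# A table whose rows and columns are independent has ~zero total inertia.
independent = np.outer([1.0, 2.0, 3.0], [4.0, 5.0])
_, _, inertia0 = ca(independent)

# An association between rows and columns yields positive inertia.
associated = np.array([[10.0, 1.0], [1.0, 10.0], [5.0, 5.0]])
_, _, inertia1 = ca(associated)
```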
Canonical Correspondence Analysis (CCA) identifies patterns shared between two multivariate datasets and constructs sets of transformed variables by projecting the data onto them.
PCA looks for patterns within a single dataset that represent the maximum variation in the data, whereas CCA looks for patterns across two datasets that describe their shared variability.
- Non-Metric Multidimensional Scaling (NMDS) is fundamentally different from PCA and CA, and more robust: it produces an ordination based on a distance or dissimilarity matrix.
- The ordination is based on ranks rather than distances: rather than recording that object A is 2.1 units from object B and 4.4 units from object C, NMDS records only that object C is the most distant from object A and object B the second-most distant.
- Avoids the assumption of linear relationships among variables
NMDS maximizes the rank-order correlation between the original distance measures and distances in the ordination space. Points are moved iteratively to minimize “stress”, a measure of the mismatch between the two kinds of distance.
“Goodness-of-fit” is measured by stress, a measure of rank-order disagreement between the observed and fitted (in the reduced dimension) distances. Ideally, all points should fall on the monotonic “red” line. A Shepard diagram helps us decide how many dimensions we should use to plot our ordination results.
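Full NMDS iterates the point positions; as a sketch of just the stress computation, here is Kruskal's stress-1 with a pool-adjacent-violators monotone fit in plain NumPy (a simplified illustration, not a complete NMDS):

```python
import numpy as np

def monotone_fit(y):
    """Pool-adjacent-violators: best non-decreasing approximation to y."""
    vals, wts = [], []
    for v in y:
        vals.append(float(v)); wts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:   # fix rank violations
            w = wts[-2] + wts[-1]
            merged = (vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w
            vals.pop(); wts.pop()
            vals[-1], wts[-1] = merged, w
    return np.repeat(vals, wts)

def kruskal_stress(d_obs, d_fit):
    """Stress-1: mismatch between ordination distances d_fit and the best
    monotone (rank-preserving) transform of the observed distances d_obs."""
    order = np.argsort(d_obs)                  # rank order of observed distances
    disparities = np.empty_like(d_fit, dtype=float)
    disparities[order] = monotone_fit(d_fit[order])
    return np.sqrt(((d_fit - disparities) ** 2).sum() / (d_fit ** 2).sum())
```

When the ordination distances are already in the same rank order as the observed distances, the stress is zero, matching the ideal case where all points in the Shepard diagram fall on the monotone line.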
As with other ordination plots, you should qualitatively identify gradients corresponding to the underlying process.
Differences from eigenanalysis:
- Does not extract components (it is based on distances), so the axes are meaningless
- The plot can be rotated, translated, or scaled as long as relative distances are maintained
In multivariate statistics, a scree plot is a line plot of the eigenvalues of the factors or principal components in an analysis. It is used to decide how many principal components to keep in a principal component analysis (PCA).
In this case, we can see that 2 principal components are enough to capture approximately 90% of the variance of the data across all dimensions.
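That kind of decision can be reproduced numerically; a NumPy sketch with synthetic data driven by two strong latent directions, so the cumulative variance curve flattens after PC2:

```python
import numpy as np

rng = np.random.default_rng(4)
# synthetic data: 5 observed variables driven by 2 latent directions + noise
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
cumulative = np.cumsum(eigvals) / eigvals.sum()
# a scree plot draws eigvals against the component number; the "elbow"
# (and the cumulative curve) show that 2 components suffice here
```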
Programming Language R
- What is R?
- Why R?
- Install R & RStudio
- Data Types & Their Modes
- Reading and Writing Data
- Data Wrangling with tidyr
User-Interface
Group Comparison of Variables within 2 Groups
Comparison of Multiple Groups
Group Comparison of Multivariate Data
Unsupervised Learning
Supervised Learning