GitHub - mmp2/manifold-learning-examples

Welcome to the Manifold Learning Examples project!

Manifold Learning (ML) algorithms -- also called Embedding algorithms -- can help us interpret data with many dimensions (such as a cloud of word embeddings or of configurations of a molecule) by mapping it to 2D or to 3D where we can see it. But is what we are seeing the real shape of the data? Almost always, ML algorithms distort the shape. Sometimes the distortions are unimportant, but sometimes they can make us see clusters, "arms", holes, and "horseshoes" (what we will call artefacts) that are not properties of the data, but just effects of the algorithm and parameter choices.

This project illustrates some of the most common effects and artefacts you will encounter as you start using Embedding algorithms on your real data. The artefacts are not symptoms of "too little data" -- most of them persist even when the sample size n goes to infinity! We chose simple artificial examples because the most common effects are present even with the simplest data.

The good news is that once you are aware of their presence, the artefacts and distortions can be recognized, and methods exist to circumvent or correct them.

What you will find on this site

a recent talk hosted by the ACMS social Unsupervised Learning in the age of Big Data
a recent talk hosted by IMSI: Dimension reduction from a user's perspective. Understanding the configuration spaces of molecules with Manifold Learning
short tutorial introductions to manifold learning
- Manifolds explained
- Manifold Learning explained
- What is an "embedding", and what is not one, with a touch of topology on the torus.
- more examples of embeddings of some familiar shapes. Can you recognize them?
- more examples of embeddings of some familiar shapes: answers unveiled
links to longer tutorials on manifold learning
- Video lectures on Manifold Learning by Marina Meila: Video Lecture 1, Video Lecture 2, Video Lecture 3; annotated slides from Lecture 1 and from Lectures 2-3; and unannotated Lecture slides with additional definitions and notes
- Manifold learning: what, how and why by Marina Meila and Hanyu Zhang. Review article in Annual Review of Statistics and Its Application, Volume 11, 2024.
links to software
- scikit-learn simple Manifold Learning software package
- megaman performance-optimized Manifold Learning software package, with an API compatible with scikit-learn; megaman implements state-of-the-art theoretical results. Updated instructions for installing and using megaman on your computer are in install_use_megaman.ipynb, install_env.sh.
a series of short articles illustrating significant but less widely known counterintuitive behaviors of manifold learning algorithms. Most of these effects are predictable or documented theoretically, and we include (light) references to the main sources.
- Aspect-ratio and horseshoes aspect-ratio.md Most real (manifold) data extends more in one direction than in others, that is, it looks more like elongated strips than like discs or blobs. This can have unexpected, drastic consequences for some ML algorithms when they are used naively. We demonstrate what can happen, how the problem can be detected by simple inspection, and a simple way to fix it. See also Manifold learning: what, how and why, Figure 3.
- Eigenvector selection (TBW) The aspect ratio problem can be fixed by selecting (functionally independent) eigenvectors, as shown in "Selecting the independent coordinates of manifolds with large aspect ratios".
- Non-uniform density, stretching and contracting variable_density.md For most ML algorithms, the way we sample the data affects the algorithm's output. In other words, what we see is not just the shape of the manifold, but a combination of the shape and the density of the data on the manifold, that can vary by algorithm and by the parameters used.
- Subtle effects: nearest neighbors, renormalization (TBW; in the meantime, see Manifold learning: what, how and why, Figure 6)
- t-SNE tsne.md (in progress) t-SNE is a very popular algorithm for visualizing high-dimensional data in 2D and 3D. However, care must be taken, as sometimes the features we see do not exist in the original data.
- UMAP experiments (under construction). How to choose the parameters? See also the UMAP documentation and a relevant article on t-SNE, UMAP and similar algorithms "Attraction-Repulsion Spectrum in Neighbor Embeddings"
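The aspect-ratio effect described in the articles above can be reproduced in a few lines. The following is a minimal sketch, not the repository's code: it uses scikit-learn's SpectralEmbedding as a stand-in for the megaman implementation, and the helper names (sample_rectangle, embed_spectral, quadratic_r2), the sample size and the neighborhood size are assumptions chosen for illustration. It embeds a long strip and a square, then measures how well the second embedding coordinate is predicted by a quadratic function of the first. A score near 1 suggests that the second coordinate adds no new direction, which is what a horseshoe looks like.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding


def sample_rectangle(n, width, height, seed=0):
    """Draw n points uniformly from the rectangle [0, width] x [0, height]."""
    rng = np.random.default_rng(seed)
    return rng.uniform([0.0, 0.0], [width, height], size=(n, 2))


def embed_spectral(X, n_neighbors=15, seed=0):
    """2D spectral embedding (Laplacian Eigenmaps) on a k-nearest-neighbor graph."""
    model = SpectralEmbedding(
        n_components=2,
        affinity="nearest_neighbors",
        n_neighbors=n_neighbors,
        random_state=seed,
    )
    return model.fit_transform(X)


def quadratic_r2(u, v):
    """R^2 of the least-squares fit v ~ a + b*u + c*u^2."""
    A = np.column_stack([np.ones_like(u), u, u ** 2])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    residual = v - A @ coef
    return 1.0 - residual.var() / v.var()


# Hypothetical settings: a strip with aspect ratio 8 and a square, 1000 points each.
Y_strip = embed_spectral(sample_rectangle(1000, width=8.0, height=1.0))
Y_square = embed_spectral(sample_rectangle(1000, width=1.0, height=1.0))

r2_strip = quadratic_r2(Y_strip[:, 0], Y_strip[:, 1])
r2_square = quadratic_r2(Y_square[:, 0], Y_square[:, 1])
print(f"strip:  R^2 = {r2_strip:.2f}")
print(f"square: R^2 = {r2_square:.2f}")
```

On the strip, the first two non-trivial eigenvectors are expected to both vary along the long side, so the score should be close to 1, while on the square it should be much lower. A scatter plot of Y_strip would show the horseshoe directly.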
the Python code used to generate the examples. Most of the examples are based on scikit-learn; for spectral_embedding (aka the DiffusionMaps / LaplacianEigenmaps algorithm) the megaman code is used; for UMAP and t-SNE we used the code provided by the authors. Additionally, we occasionally share our experience in installing/running these codes.
- generating samples from simple synthetic manifolds (rectangle, rectangle with a hole, swiss roll=rolled up rectangle, 3D rectangle, circle, torus); in these examples, the samples are distributed nearly uniformly [more details TBW]
- generating samples from the same manifolds as above, but in a highly non-uniform manner (that can be controlled).
- generating informative plots of the algorithms' output
- instructions on how to install megaman on Windows: install_use_megaman.ipynb, install_env.sh.
- instructions on how to make the ellipse distortion plots
the plots used in the articles and many more
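As an illustration of the data generation described above, here is a minimal NumPy sketch of sampling a swiss roll as a rolled-up rectangle, either nearly uniformly or with a controllable non-uniform density. It is not the repository's code: the function names, the default angles and width, and the use of a power transform to control the density are all assumptions made for this example. The roll is parametrized by arc length, so uniform samples on the flat strip stay uniform on the rolled surface.

```python
import numpy as np


def arc_length(theta):
    """Arc length of the spiral r = theta, measured from theta = 0."""
    return 0.5 * (theta * np.sqrt(1.0 + theta ** 2) + np.arcsinh(theta))


def sample_swiss_roll(n, theta_min=1.5 * np.pi, theta_max=4.5 * np.pi,
                      width=10.0, power=1.0, seed=0):
    """Sample n points from a swiss roll obtained by rolling up a flat strip.

    power = 1 gives uniform samples on the strip; power > 1 concentrates
    samples near the inner end of the roll (hypothetical density control).
    Returns the 3D points and their intrinsic strip coordinates (t, s).
    """
    rng = np.random.default_rng(seed)
    length = arc_length(theta_max) - arc_length(theta_min)

    # Intrinsic coordinates: t along the strip, s across it.
    t = length * rng.uniform(size=n) ** power
    s = width * rng.uniform(size=n)

    # Invert the arc-length function numerically to get the angle for each t.
    grid = np.linspace(theta_min, theta_max, 2000)
    theta = np.interp(t, arc_length(grid) - arc_length(theta_min), grid)

    X = np.column_stack([theta * np.cos(theta), s, theta * np.sin(theta)])
    return X, np.column_stack([t, s])


X_uniform, coords_uniform = sample_swiss_roll(2000, power=1.0)
X_dense, coords_dense = sample_swiss_roll(2000, power=3.0)
```

Keeping the intrinsic coordinates alongside the 3D points makes it easy to color a plot of an embedding by position on the original strip, and to compare the embedding of the uniform and non-uniform samples.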

How to use this site

Feel free to use the code, articles and graphics, citing this repository (please see sidebar About to obtain citation). Currently, this is a working repository; changes to the code or files are possible.

Contributors (in alphabetical order)

Haoqiang (Murray) Kang, original repository creator, non-uniform density, aspect ratio, t-SNE
Marina Meila, Professor, concept and scientific leadership
Hangliang Ren (Harry), spectral embedding, non-uniform density, plotting, aspect ratio
Qirui Wang, UMAP, aspect ratio
Yujia Wu, data generation, plotting, Local Linear Embedding, aspect ratio
Shuzhen Zhang, manifold learning explained, Riemannian metric, maps embeddings, site curator 2024
