separate getting-started page from intro in doc
* put intro of scikit-matter into the index page and abbreviate
* add getting started page which gives an overview of important
  implementations
* include existing introductory text for reconstruction measures
  into API reference so all introductory texts from the API are
  included into the getting started
* reword text a bit for more soundness within the getting started
agoscinski committed Jul 27, 2023
1 parent 12ffc9c commit c42e389
Showing 6 changed files with 152 additions and 100 deletions.
83 changes: 83 additions & 0 deletions docs/src/getting-started.rst
@@ -0,0 +1,83 @@
Getting started
===============

A short introduction to all methods implemented in scikit-matter.
For a detailed explanation, please look at the :ref:`selection-api`.

Features and Samples Selection
------------------------------

.. include:: selection.rst
:start-after: marker-selection-introduction-begin
:end-before: marker-selection-introduction-end


These selectors are available:

* :ref:`CUR-api`: an iterative feature selection method based upon the singular
  value decomposition.
* :ref:`PCov-CUR-api`: extends CUR by using augmented right or left singular
  vectors, inspired by Principal Covariates Regression.
* :ref:`FPS-api`: a common selection technique intended to exploit the diversity of
  the input space. The selection of the first point is made at random or by a
  separate metric.
* :ref:`PCov-FPS-api`: extends FPS much like PCov-CUR extends CUR.
* :ref:`Voronoi-FPS-api`: conducts FPS selection, taking advantage of Voronoi
  tessellations to accelerate the selection.
* :ref:`DCH-api`: selects samples by constructing a directional convex hull and
  determining which samples lie on the bounding surface.
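The greedy logic shared by the FPS-style selectors above can be pictured with a short, self-contained NumPy sketch of plain farthest point sampling (a simplified stand-in for the skmatter implementation; function and variable names are illustrative):

```python
import numpy as np

def farthest_point_sampling(X, n_to_select, first=0):
    """Greedily select rows of X, always taking the point farthest
    from everything selected so far, starting from index `first`."""
    selected = [first]
    # distance of every sample to its closest already-selected sample
    min_dist = np.linalg.norm(X - X[first], axis=1)
    while len(selected) < n_to_select:
        nxt = int(np.argmax(min_dist))  # most distant remaining sample
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 0.0], [0.0, 1.0]])
print(farthest_point_sampling(X, 3))  # [0, 2, 3]
```

Note how the near-duplicate sample at ``[0.1, 0.1]`` is never picked: FPS favours diversity over density, which is exactly why it is used to subsample redundant datasets.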

Examples
^^^^^^^^

.. include:: examples/selection/index.rst
:start-line: 4


Reconstruction Measures
-----------------------

.. include:: gfrm.rst
:start-after: marker-reconstruction-introduction-begin
:end-before: marker-reconstruction-introduction-end


These reconstruction measures are available:

* :ref:`GRE-api` (GRE) computes the amount of linearly-decodable information
recovered through a global linear reconstruction.
* :ref:`GRD-api` (GRD) computes the amount of distortion contained in a global linear
reconstruction.
* :ref:`LRE-api` (LRE) computes the amount of decodable information recovered through
a local linear reconstruction for the k-nearest neighborhood of each sample.
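The essence of these measures can be sketched in a few lines of NumPy: regress one feature matrix onto the other and report the residual RMSE (a simplified sketch without the cross-validation and normalization used by the library; names are illustrative):

```python
import numpy as np

def global_reconstruction_error(X, Y):
    """RMSE of the best linear reconstruction of feature space Y from X."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # best linear map X -> Y
    residual = Y - X @ W
    return float(np.sqrt(np.mean(residual ** 2)))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
Y_contained = X @ rng.normal(size=(4, 3))  # Y is linearly decodable from X
print(global_reconstruction_error(X, Y_contained))  # ~0: no information lost
```

A `Y` that is independent of `X` would instead give a large residual, signalling that the two feature spaces carry distinct linearly-decodable information.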

Examples
^^^^^^^^

.. include:: examples/reconstruction/index.rst
:start-line: 4

Principal Covariates Regression
-------------------------------

.. include:: pcovr.rst
:start-after: marker-pcovr-introduction-begin
:end-before: marker-pcovr-introduction-end

It includes:

* :ref:`PCovR-api`: the standard Principal Covariates Regression. It utilises a
  combination of a PCA-like and an LR-like loss, and therefore attempts to find
  a low-dimensional projection of the feature vectors that simultaneously minimises
  information loss and the error in predicting the target properties using only the
  latent space vectors :math:`\mathbf{T}`.
* :ref:`KPCovR-api`: Kernel Principal Covariates Regression, a kernel-based
  variation on the original PCovR method, proposed in [Helfrecht2020]_.
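The core construction can be sketched in NumPy using the sample-space Gram matrix :math:`\alpha\,\mathbf{X}\mathbf{X}^T + (1-\alpha)\,\hat{\mathbf{Y}}\hat{\mathbf{Y}}^T`, with plain least squares for :math:`\hat{\mathbf{Y}}` and no centering or regularization, so this is only an illustration of the idea, not the skmatter implementation:

```python
import numpy as np

def pcovr_latent(X, y, alpha=0.5, n_components=2):
    """Latent projection T from the mixed Gram matrix of PCovR (sketch)."""
    y = y.reshape(len(X), -1)
    W, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares prediction of y
    y_hat = X @ W
    # alpha = 1 recovers PCA-like behaviour, alpha = 0 leans on the targets
    G = alpha * X @ X.T + (1 - alpha) * y_hat @ y_hat.T
    vals, vecs = np.linalg.eigh(G)             # eigenvalues in ascending order
    top_vecs = vecs[:, ::-1][:, :n_components]
    return top_vecs * np.sqrt(vals[::-1][:n_components])

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = X @ rng.normal(size=5)
T = pcovr_latent(X, y, alpha=0.5)
print(T.shape)  # (20, 2)
```

The mixing parameter trades off the two terms of the Gram matrix, which is what lets the latent space :math:`\mathbf{T}` serve both visualisation and regression.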


Examples
^^^^^^^^

.. include:: examples/pcovr/index.rst
:start-line: 4
27 changes: 22 additions & 5 deletions docs/src/gfrm.rst
@@ -1,31 +1,48 @@
.. _gfrm:

Reconstruction Measures
======================================
=======================

.. marker-reconstruction-introduction-begin
A set of easily-interpretable error measures of the relative information capacity of
feature space `F` with respect to feature space `F'`. Each method returns a value
between 0 and 1, where 0 means that `F` and `F'` are completely distinct in terms of
linearly-decodable information, and 1 means that `F'` is contained in `F`. All
methods are implemented as the root mean-square error for the regression of the
feature matrix `X_F'` (sometimes called `Y` in the docs) from `X_F` (sometimes
called `X` in the docs) for transformations with different constraints (linear,
orthogonal, locally linear). By default, a custom 2-fold cross-validation
:py:class:`skmatter.linear_model.RidgeRegression2FoldCV` is used to ensure the
generalization of the transformation and the efficiency of the computation, since we
deal with a multi-target regression problem. These methods were applied to compare
different forms of featurizations through different hyperparameters and induced
metrics and kernels in [Goscinski2021]_.

.. marker-reconstruction-introduction-end
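The 2-fold scheme mentioned in the introduction can be pictured with a short NumPy sketch: fit a ridge map on one half of the samples, evaluate its RMSE on the other half, and average over the two splits (an illustration of the idea only, not the ``RidgeRegression2FoldCV`` implementation; names are illustrative):

```python
import numpy as np

def ridge_2fold_rmse(X, Y, regularization=1e-6):
    """Average held-out RMSE of a ridge map X -> Y over the two halves."""
    n = len(X) // 2
    halves = [(slice(0, n), slice(n, None)), (slice(n, None), slice(0, n))]
    errors = []
    for train, test in halves:
        # closed-form ridge solution on the training half
        A = X[train].T @ X[train] + regularization * np.eye(X.shape[1])
        W = np.linalg.solve(A, X[train].T @ Y[train])
        errors.append(np.sqrt(np.mean((Y[test] - X[test] @ W) ** 2)))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
Y = X @ rng.normal(size=(4, 2))       # Y perfectly decodable from X
print(ridge_2fold_rmse(X, Y) < 1e-4)  # True: near-zero held-out error
```

Scoring on held-out samples is what makes the measure a statement about generalizable information rather than in-sample fit.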
.. currentmodule:: skmatter.metrics


.. _GRE-api:

Global Reconstruction Error
###########################
---------------------------

.. autofunction:: pointwise_global_reconstruction_error
.. autofunction:: global_reconstruction_error

.. _GRD-api:

Global Reconstruction Distortion
################################
--------------------------------

.. autofunction:: pointwise_global_reconstruction_distortion
.. autofunction:: global_reconstruction_distortion

.. _LRE-api:

Local Reconstruction Error
##########################
--------------------------

.. autofunction:: pointwise_local_reconstruction_error
.. autofunction:: local_reconstruction_error
30 changes: 11 additions & 19 deletions docs/src/index.rst
@@ -1,23 +1,11 @@
scikit-matter documentation
===========================
scikit-matter
=============

``scikit-matter`` is a collection of `scikit-learn <https://scikit.org/>`_ compatible
utilities that implement methods born out of the materials science and chemistry
communities.

Convenient-to-use libraries such as scikit-learn have accelerated the adoption and
application of machine learning (ML) workflows and data-driven methods. Such libraries
have gained great popularity partly because the implemented methods are generally
applicable in multiple domains. While developments in the atomistic learning community
have put forward general-use machine learning methods, their deployment is commonly
entangled with domain-specific functionalities, preventing access to a wider audience.

scikit-matter targets domain-agnostic implementations of methods developed in the
computational chemical and materials science community, following the scikit-learn API
scikit-matter is a toolbox of methods developed in the
computational chemical and materials science community, following the
`scikit-learn <https://scikit.org/>`_ API
and coding guidelines to promote usability and interoperability with existing workflows.
scikit-matter contains a toolbox of methods for unsupervised and supervised analysis of
ML datasets, including the comparison, decomposition, and selection of features and
samples.


.. include:: ../../README.rst
:start-after: marker-issues
@@ -27,9 +15,13 @@ samples.
:maxdepth: 1
:caption: Contents:

intro
getting-started
installation
reference
tutorials
contributing
bibliography


If you would like to contribute to scikit-matter, check out our :ref:`contributing`
page!
68 changes: 0 additions & 68 deletions docs/src/intro.rst

This file was deleted.

23 changes: 23 additions & 0 deletions docs/src/pcovr.rst
@@ -1,6 +1,29 @@
Principal Covariates Regression (PCovR)
=======================================


.. marker-pcovr-introduction-begin
Often, one wants to construct new ML features from their
current representation in order to compress data or visualise
trends in the dataset. In the archetypal method for this
dimensionality reduction, principal components analysis (PCA),
features are transformed into the latent space which best
preserves the variance of the original data. Principal Covariates
Regression (PCovR), as introduced by [deJong1992]_,
is a modification to PCA that incorporates target information,
such that the resulting embedding can be tuned using a
mixing parameter :math:`\alpha` to improve performance in regression
tasks (:math:`\alpha = 0` corresponding to linear regression
and :math:`\alpha = 1` corresponding to PCA).
[Helfrecht2020]_ introduced the non-linear
version, Kernel Principal Covariates Regression (KPCovR),
where the mixing parameter :math:`\alpha` now interpolates between kernel ridge
regression (:math:`\alpha = 0`) and kernel principal components
analysis (KPCA, :math:`\alpha = 1`).

.. marker-pcovr-introduction-end
.. _PCovR-api:

PCovR
21 changes: 13 additions & 8 deletions docs/src/selection.rst
@@ -1,18 +1,22 @@
.. _selection-api:

Feature and Sample Selection
============================

`scikit-matter` contains multiple data sub-selection modules,
primarily corresponding to methods derived from CUR matrix decomposition
and Farthest Point Sampling. In their classical form, CUR and FPS determine
a data subset that maximizes the
variance (CUR) or distribution (FPS) of the features or samples. These methods
can be modified to combine supervised and unsupervised learning, in a formulation
denoted `PCov-CUR` and `PCov-FPS`.
.. marker-selection-introduction-begin
Data sub-selection modules primarily corresponding to methods derived from
CUR matrix decomposition and Farthest Point Sampling. In their classical form,
CUR and FPS determine a data subset that maximizes the variance (CUR) or
distribution (FPS) of the features or samples.
These methods can be modified to incorporate supervised target information, in the
variants denoted `PCov-CUR` and `PCov-FPS`.
For further reading, refer to [Imbalzano2018]_ and [Cersonsky2021]_.
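A toy NumPy sketch of CUR-style column scoring via leverage scores can make the contrast with FPS concrete (illustrative only; the deterministic CUR in the library differs in details such as iterative orthogonalization):

```python
import numpy as np

def cur_column_scores(X, k=2):
    """Leverage score of each column of X, computed from the top-k
    right singular vectors: columns that contribute most to the
    dominant variance directions score highest."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return np.sum(Vt[:k] ** 2, axis=0)  # one score per feature/column

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
X[:, 4] = 1e-3 * rng.normal(size=30)   # a nearly constant, low-variance feature
scores = cur_column_scores(X, k=2)
print(int(np.argmin(scores)))          # the low-variance column scores lowest
```

Selecting the highest-scoring columns keeps the features that best span the dominant singular subspace, which is the variance-maximizing behaviour described above.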

These selectors can be used for both feature and sample selection, with similar
instantiations. This can be executed using:
instantiations. All sub-selection methods score each feature or sample
(without an estimator) and select those with the highest scores. As a simple
example

.. doctest::

@@ -62,6 +66,7 @@ instantiations. This can be executed using:
>>> print(Xr.shape)
(2, 3)

.. marker-selection-introduction-end
.. _CUR-api:

