Skip to content

IARCbioinfo/DRMetrics

This branch is 35 commits ahead of, 9 commits behind emathian/DRMetrics:NextJournalH.

Folders and files

NameName
Last commit message
Last commit date
Feb 19, 2024
Jun 3, 2020
Aug 4, 2020
Jan 1, 2020
Oct 30, 2020
Jun 4, 2020

Repository files navigation

Dimensionality Reduction Metrics

This repository contains R functions to evaluate the quality of projections obtained after using dimensionality reduction techniques. A nextjournal notebook is associated to this repository and uses the functions described in this README file to evaluate the quality of a molecular map of lung neuroendocrine tumors produced using the UMAP algorithm.

Sequence difference view (SD) metric

SD metric calculation for one sample compute_SD

Description

This function computes the sequence difference (SD) view metric value for a single given sample (i), following the equation 3 described by Martins et al. in 2015. This dissimilarity metric compares the k-neighborhood of a given sample in two different dimensional spaces. The lower is the SD value, the better is the neighborhood preservation.

Usage

compute_SD(dist_space1,dist_space2,k)

Arguments

  • dist_space1: vector containing the distances of sample i to all samples in space1

  • dist_space2: vector containing the distances of sample i to all samples in space2

  • k: number of neighbors considered

Value

A numeric value corresponding to the SD value is returned.

SD metric calculation for all samples compute_SD_allSamples

Description

This function computes the SD metric for all samples included in the dimensionality reduction. The metric is computed to compare one or multiple comparison reduced spaces to a the reference space. The SD values are computed for several k values (number of neighbors to consider).

Usage

compute_SD_allSamples(distRef,List_projection,k_values,colnames_res_df, threads=2)

Arguments

  • distRef: vector containing the distances of sample i to all samples in the reference space

  • List_projection: list of data frames where each data frame contains the coordinates of all samples in each reduced space for which the SD metric needs to be calculated.

  • k_values: vector listing the k values corresponding to the number of neighbors considered

  • colnames_res_df: vector specifying the colnames associated to the computed SD values in the returned data frame. The vector should have the same length as List_projection

Value

  • Data frame containing a column with the samples IDs, a column correspoding to the k values, and n colunms containing the SD values, n corresponding to the number of data frames listed in List_projection.

Visualizing the SD metric in a two dimensional map SD_map_f

Description

This function allows to display, on a two dimensional projection, the samples SD values averaged over different values of k (number of neighbors considered to compute the SD metric).

Usage

SD_map_f(SD_df, Coords_df, legend_pos = "right")

Arguments

  • SD_df: a data frame resulting from the call to the function compute_SD_allSamples. The data frame contains the following columns: i) the samples IDs, ii) k values, the number of neighbors considered to compute the SD metric, and iii) the SD values

  • Coords_df: data frame containing the coordinates of each sample in the projection to use for the representation of the samples

  • legend_pos: Optional argument to define the position of the legend

Value

A list containing:

  • A data frame containing the same columns as Coords_df and a column corresponding to the averaged SD values over k.
  • The plot representing all samples in a two dimensional space. A color gradient is used to represent the SD values averaged over the k levels.

Spatial autocorrelation

Moran's Index (MI) computation moran_I_knn

Description

This function allows to compute the Moran’s Index autocorrelation coefficient for a given feature used in the dimensionality reduction technique, for different levels of the parameter k which corresponds to the number of samples to consider for the samples neighborhood definition. The MI values are computed using the Moran.I function from the R package ape.

Usage

moran_I_knn(expr_data , spatial_data, listK)

Arguments

  • expr_data: matrix containing, for each sample (in rows), the values of the features (in columns) for which the MI values will be calculated

  • spatial_data: matrix containing the coordinates of each sample in the projection used to define the samples neighborhood

  • listK: vector listing the k values corresponding to the number of samples considered to define samples neighborhood

Value

  • MI_array: 3D array containing the MI values and their associated p-values for each feature (in columns), and each k level (in rows).

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 99.8%
  • R 0.2%