MultiVis

The MultiVis package contains the necessary tools for visualisation of multivariate data.

Installation

Dependencies

multivis requires:

  • Python (==3.11.4)
  • NumPy (==1.25.2)
  • OpenPyXL (==2.6.1)
  • Pandas (==2.1.0)
  • Matplotlib (==3.8.0)
  • Seaborn (==0.12.2)
  • Networkx (==3.1.0)
  • statsmodels (==0.14.0)
  • scikits-bootstrap (==1.1.0)
  • SciPy (==1.11.2)
  • Scikit-learn (==1.3.1)
  • tqdm (==4.66.1)
  • xlrd (==2.0.1)

User installation

The recommended way to install multivis and its dependencies is with conda:

conda install -c brett.chapman multivis

or pip:

pip install multivis

Alternatively, to install directly from GitHub:

pip install https://github.com/brettChapman/multivis/archive/master.zip

API

For further details on usage, refer to the docstrings.

multivis

  • Edge: Builds nodes and edges and is the base class for the Network class.

    • init_parameters
      • [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
      • [datatable] : Pandas dataframe matrix containing scores.
      • [pvalues] : Pandas dataframe matrix containing score/similarity pvalues (if available, otherwise set to None).
    • methods
      • [set_params] : Set parameters

        • [filter_type] : The value type to filter the data on (default: 'pvalue')
        • [hard_threshold] : Value to filter the data on (default: 0.005)
        • [withinBlocks] : Include scores within blocks if building multi-block network (default: False)
        • [sign] : The sign of the score/similarity to filter on ('pos', 'neg' or 'both') (default: 'both')
      • [help] : Print this help text

      • [build] : Builds the nodes and edges.

      • [getNodes] : Returns a Pandas dataframe of all nodes.

      • [getEdges] : Returns a Pandas dataframe of all edges.
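
The filtering controlled by hard_threshold and sign can be illustrated in plain pandas. This is a minimal sketch and not the package's implementation: the function name filter_edges, the output column names, and the strict "p-value below threshold" rule are all assumptions made for illustration.

```python
import pandas as pd

def filter_edges(scores, pvalues, hard_threshold=0.005, sign="both"):
    """Build an edge table from square score and p-value matrices.

    Hypothetical helper: keeps pairs whose p-value is below hard_threshold
    and whose score matches the requested sign ('pos', 'neg' or 'both').
    """
    rows = []
    names = list(scores.index)
    for i, a in enumerate(names):
        for b in names[i + 1:]:  # upper triangle only: no self-pairs, no duplicates
            score = scores.loc[a, b]
            pval = pvalues.loc[a, b]
            if pval >= hard_threshold:
                continue
            if sign == "pos" and score <= 0:
                continue
            if sign == "neg" and score >= 0:
                continue
            rows.append({"source": a, "target": b,
                         "score": float(score), "pvalue": float(pval)})
    return pd.DataFrame(rows, columns=["source", "target", "score", "pvalue"])

# Toy symmetric matrices for three features (illustrative values only).
names = ["A", "B", "C"]
scores = pd.DataFrame([[1.0, 0.9, -0.8],
                       [0.9, 1.0, 0.1],
                       [-0.8, 0.1, 1.0]], index=names, columns=names)
pvalues = pd.DataFrame([[0.0, 0.001, 0.002],
                        [0.001, 0.0, 0.6],
                        [0.002, 0.6, 0.0]], index=names, columns=names)

edges = filter_edges(scores, pvalues, hard_threshold=0.005, sign="both")
print(edges)
```

Here the B-C pair is dropped because its p-value exceeds the threshold, while A-B and A-C are kept.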

  • Network: Builds nodes and edges, with added NetworkX functionality. Inherits from Edge.

    • init_parameters
      • [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
      • [datatable] : Pandas dataframe matrix containing scores.
      • [pvalues] : Pandas dataframe matrix containing score/similarity pvalues.
    • methods
      • [set_params] : Set parameters

        • [filter_type] : The value type to filter the data on (default: 'pvalue')
        • [hard_threshold] : Value to filter the data on (default: 0.005)
        • [link_type] : The value type to represent links in the network (default: 'score')
        • [withinBlocks] : Include scores within blocks if building multi-block network (default: False)
        • [sign] : The sign of the score/similarity to filter on ('pos', 'neg' or 'both') (default: 'both')
      • [help] : Print this help text

      • [build] : Builds nodes, edges and NetworkX graph.

      • [getNetworkx] : Returns a NetworkX graph.

      • [getLinkType] : Returns the link type parameter used in building the network.

  • edgeBundle: Produces an interactive hierarchical edge bundle in D3.js, from nodes and edges.

    • init_parameters
      • [nodes] : Pandas dataframe containing nodes generated from Edge.
      • [edges] : Pandas dataframe containing edges generated from Edge.
    • methods
      • [set_params] : Set parameters

        • [html_file] : Name to save the HTML file as (default: 'hEdgeBundle.html')
        • [innerRadiusOffset] : Sets the inner radius based on the offset value from the canvas width/diameter (default: 120)
        • [blockSeparation] : Value to set the distance between different segmented blocks (default: 1)
        • [linkFadeOpacity] : The link fade opacity when hovering over/clicking nodes (default: 0.05)
        • [mouseOver] : Setting to 'True' swaps from clicking to hovering over nodes to select them (default: True)
        • [fontSize] : The font size in pixels set for each node (default: 10)
        • [backgroundColor] : Set the background colour of the plot (default: 'white')
        • [foregroundColor] : Set the foreground colour of the plot (default: 'black')
        • [node_data] : Peak Table column names to include in the mouse over information (default: 'Name' and 'Label')
        • [nodeColorScale] : The scale to use for colouring the nodes ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
        • [node_color_column] : The Peak Table column to use for node colours (default: None sets to black)
        • [node_cmap] : Set the CMAP colour palette to use for colouring the nodes (default: 'brg')
        • [edgeColorScale] : The scale to use for colouring the edges, if edge_color_value is 'pvalue' ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
        • [edge_color_value] : Set the values to colour the edges by. Either 'sign', 'score' or 'pvalue' (default: 'score')
        • [edge_cmap] : Set the CMAP colour palette to use for colouring the edges (default: 'brg')
        • [addArcs] : Setting to 'True' adds arcs around the edge bundle for each block (default: False)
        • [arcRadiusOffset] : Sets the arc radius offset from the inner radius (default: 20)
        • [extendArcAngle] : Sets the angle value to add to each end of the arc (default: 2)
        • [arc_cmap] : Set the CMAP colour palette to use for colouring the arcs (default: 'Set1')
      • [help] : Print this help text

      • [build] : Generates the JavaScript embedded HTML code, writes to a HTML file and opens it in a browser.

      • [buildDashboard] : Generates the JavaScript embedded HTML code in a dashboard format, writes to a HTML file and opens it in a browser.

  • plotNetwork: Produces a static spring-embedded network from a NetworkX graph.

    • init_parameters
      • [g] : NetworkX graph.
    • methods
      • [set_params] : Set parameters

        • [imageFileName] : The image file name to save to (default: 'networkPlot.jpg')
        • [edgeLabels] : Setting to 'True' labels all edges with the score/similarity value (default: True)
        • [saveImage] : Setting to 'True' will save the image to file (default: True)
        • [layout] : Set the NetworkX layout type ('circular', 'kamada_kawai', 'random', 'spring', 'spectral') (default: 'spring')
        • [transparent] : Setting to 'True' will make the background transparent (default: False)
        • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
        • [figSize] : The figure size as a tuple (width,height) (default: (30,20))
        • [node_cmap] : The CMAP colour palette to use for nodes (default: 'brg')
        • [colorScale] : The node colour scale to apply ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
        • [node_color_column] : The Peak Table column to use for node colours (default: None sets to black)
        • [sizeScale] : The node size scale to apply ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'reverse_linear')
        • [size_range] : The node size scale range to apply. Tuple of length 2. Minimum size to maximum size (default: (150,2000))
        • [sizing_column] : The node sizing column to use (default: sizes all nodes to 1)
        • [alpha] : Node opacity value (default: 0.5)
        • [nodeLabels] : Setting to 'True' will label the nodes (default: True)
        • [fontSize] : The font size set for each node (default: 15)
        • [keepSingletons] : Setting to 'True' will keep any single nodes not connected by edges in the NetworkX graph (default: True)
        • [column] : Column from Peak Table to filter on (default: no filtering)
        • [threshold] : Value to filter on (default: no filtering)
        • [operator] : The comparison operator to use when filtering (default: '>')
        • [sign] : The sign of the score to filter on ('pos', 'neg' or 'both') (default: 'pos')
      • [help] : Print this help text

      • [build] : Generates and displays the NetworkX graph.
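
The interaction between sizeScale and size_range can be pictured as mapping a Peak Table column onto a range of node sizes. The following is a rough sketch under stated assumptions: scale_to_range is a hypothetical helper, and treating 'reverse_linear' as a flipped linear mapping (smallest value gets the largest size) is an assumption, not something confirmed by the package source.

```python
import numpy as np

def scale_to_range(values, size_range=(150, 2000), reverse=False):
    """Linearly map values onto size_range; optionally flip the ordering."""
    v = np.asarray(values, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:
        unit = np.zeros_like(v)      # constant input: everything maps to the minimum size
    else:
        unit = (v - lo) / (hi - lo)  # rescale to [0, 1]
    if reverse:
        unit = 1.0 - unit            # assumed meaning of a "reverse" scale
    return size_range[0] + unit * (size_range[1] - size_range[0])

# With a reversed scale, the smallest p-value gets the largest node.
pvals = [0.001, 0.01, 0.05]
sizes = scale_to_range(pvals, reverse=True)
print(sizes)
```

A reversed scale is a natural fit for sizing nodes by p-value, which may be why 'reverse_linear' is the documented default for sizeScale.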

  • springNetwork: Interactive spring-embedded network which inherits data from the NetworkX graph.

    • init_parameters
      • [g] : NetworkX graph.
    • methods
      • [set_params] : Set parameters

        • [node_size_scale] : Dictionary keyed by Peak Table column name, where each value is a dictionary with 'scale' (one of "linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") and 'range' (a number array of length 2: minimum size to maximum size) (default: sizes all nodes to 10 with no dropdown menu)
        • [node_color_scale] : Dictionary keyed by Peak Table column name, where each value is a dictionary with 'scale' (one of "linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: colours all nodes 'black')
        • [html_file] : Name to save the HTML file as (default: 'springNetwork.html')
        • [backgroundColor] : Set the background colour of the plot (default: 'white')
        • [foregroundColor] : Set the foreground colour of the plot (default: 'black')
        • [chargeStrength] : The charge strength of the spring-embedded network (force between nodes) (default: -120)
        • [groupByBlock] : Setting to 'True' will group nodes by 'Block' if present in the data (default: False)
        • [groupFociStrength] : Set the strength of foci for each group (default: 0.2)
        • [intraGroupStrength] : Set the strength between each group (default: 0.01)
        • [groupLayoutTemplate] : Set the layout template to use for grouping (default: 'treemap')
        • [node_text_size] : The text size for each node (default: 15)
        • [fix_nodes] : Setting to 'True' will fix nodes in place when manually moved (default: False)
        • [displayLabel] : Setting to 'True' will set the node labels to the 'Label' column, otherwise it will set the labels to the 'Name' column from the Peak Table (default: False)
        • [node_data] : Peak Table column names to include in the mouse over information (default: 'Name' and 'Label')
        • [link_type] : The link type used in building the network (default: 'score')
        • [link_width] : The width of the links (default: 0.5)
        • [pos_score_color] : Colour value for positive scores. Can be HTML/CSS name, hex code, and (R,G,B) tuples (default: 'red')
        • [neg_score_color] : Colour value for negative scores. Can be HTML/CSS name, hex code, and (R,G,B) tuples (default: 'black')
      • [help] : Print this help text

      • [build] : Generates the JavaScript embedded HTML code, writes to a HTML file and opens it in a browser.

      • [buildDashboard] : Generates the JavaScript embedded HTML code in a dashboard format, writes to a HTML file and opens it in a browser.

  • clustermap: Produces a Hierarchical Clustered Heatmap.

    • init_parameters
      • [scores] : Pandas dataframe scores.
      • [row_linkage] : Precomputed linkage matrix for the rows from a linkage clustered distance/similarities matrix
      • [col_linkage] : Precomputed linkage matrix for the columns from a linkage clustered distance/similarities matrix
    • methods
      • [set_params] : Set parameters

        • [xLabels] : A Pandas Series for labelling the X axis
        • [yLabels] : A Pandas Series for labelling the Y axis
        • [imageFileName] : The image file name to save to (default: 'clusterMap.png')
        • [saveImage] : Setting to 'True' will save the image to file (default: True)
        • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
        • [figSize] : The figure size as a tuple (width,height) (default: (80,70))
        • [dendrogram_ratio_shift] : The ratio to shift the position of the dendrogram in relation to the heatmap (default: 0.0)
        • [dendrogram_line_width] : The line width of the dendrograms (default: 1.5)
        • [background_colour] : Set the background colour (default: 'white')
        • [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
        • [fontSize] : The font size for all text (default: 30)
        • [heatmap_annotation] : Annotate the heatmap with values (default: False)
        • [heatmap_cmap] : The CMAP colour palette to use for the heatmap (default: 'RdYlGn')
        • [cluster_cmap] : The CMAP colour palette to use for the branch separation of clusters in the dendrogram (default: 'Set1')
        • [rowColorCluster] : Setting to 'True' will display a colour bar for the clustered rows (default: False)
        • [colColorCluster] : Setting to 'True' will display a colour bar for the clustered columns (default: False)
        • [row_color_threshold] : The colouring threshold for the row dendrogram (default: 1)
        • [col_color_threshold] : The colouring threshold for the column dendrogram (default: 1)
      • [help] : Print this help text

      • [build] : Generates and displays the Hierarchical Clustered Heatmap (HCH).

  • plotFeatures: Produces different types of feature plots

    • init_parameters
      • [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
      • [datatable] : Pandas dataframe containing matrix of values to plot (N samples x N features). Columns/features must be same as 'Name' from Peak Table.
    • methods
      • [set_params] : Set parameters

        • [plot_type] : The type of plot. Either "point", "violin", "box", "swarm", "violin-swarm" or "box-swarm" (default: 'point')
        • [column_numbers] : The number of columns to display in the plots (default: 4)
        • [log_data] : Perform a log ('natural', base 2 or base 10) on all data (default: (True, 2))
        • [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centers to the median and scales to between 25th and 75th quantile range)) (default: (True, 'minmax'))
        • [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (True, 3))
        • [style] : Set the seaborn style (default: 'seaborn-v0_8-white')
        • [transparent] : Setting to 'True' will make the background transparent (default: False)
        • [figSize] : The figure size as a tuple (width,height) (default: (15,10))
        • [fontSize] : The font size for all text (default: 12)
        • [colour_palette] : The colour palette to use for the plot (default: None)
        • [y_axis_label] : The label to customise the y axis (default: None)
        • [x_axis_rotation] : Rotate the x axis labels this number of degrees (default: 0)
        • [group_column_name] : The group column name used in the datatable (e.g. 'Class') (default: None)
        • [point_estimator] : The statistical function to use for the point plot. Either "mean" or "median" (default: 'mean')
        • [point_ci] : The bootstrapped confidence interval for the point plot. Can also be standard deviation ("sd") (default: 95)
        • [violin_distribution_type] : The representation of the distribution of data points within the violin plot. Either "quartile", "box", "point", "stick" or None (default: 'box')
        • [violin_width_scale] : The method used to scale the width of the violin plot. Either "area", "count" or "width" (default: "width")
        • [box_iqr] : The proportion past the lower and upper quartiles to extend the plot whiskers for the box plot. Points outside this range will be identified as outliers (default: 1.5)
        • [saveImage] : Setting to 'True' will save the image to file (default: True)
        • [imageFileName] : The image file name to save to (default: '[plot_type]_features.png')
        • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
      • [help] : Print this help text

      • [plot] : Generates feature plots.

  • polarDendrogram: Polar dendrogram

    • init_parameters
      • [dn] : Dendrogram dictionary labelled by Peak Table index
    • methods
      • [set_params] : Set parameters

        • [imageFileName] : The image file name to save to (default: 'polarDendrogram.png')
        • [saveImage] : Setting to 'True' will save the image to file (default: True)
        • [branch_scale] : The branch distance scale to apply ('linear', 'log', 'square') (default: 'linear')
        • [gap] : The gap size within the polar dendrogram (default: 0.1)
        • [grid] : Setting to 'True' will overlay a grid (default: False)
        • [style] : Set the seaborn style (default: 'seaborn-v0_8-white')
        • [transparent] : Setting to 'True' will make the background of all plots transparent (default: False)
        • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
        • [figSize] : The figure size as a tuple (width,height) (default: (10,10))
        • [fontSize] : The font size for all text (default: 15)
        • [PeakTable] : The Peak Table Pandas dataframe (default: empty dataframe)
        • [DataTable] : The Data Table Pandas dataframe (default: empty dataframe)
        • [group_column_name] : The group column name used in the datatable (e.g. 'Class') (default: None)
        • [textColorScale] : The scale to use for colouring the text ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
        • [text_color_column] : The colour column to use from Peak Table (Can be colour or numerical values such as 'pvalue') (default: 'black')
        • [label_column] : The label column to use from Peak Table (default: use original Peak Table index from cartesian dendrogram)
        • [text_cmap] : The CMAP colour palette to use (default: 'brg')
      • [plotClusters] : Aggregates peaks from each cluster of the polar dendrogram and generates different feature plots across the group/class variables.

        • [plot_type] : The type of plot. Either "point", "violin", "box", "swarm", "violin-swarm" or "box-swarm" (default: 'point')
        • [column_numbers] : The number of columns to display in the plots (default: 4)
        • [log_data] : Perform a log ('natural', base 2 or base 10) on all data (default: (True, 2))
        • [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centres to the median and scales to between 25th and 75th quantile range)) (default: (True, 'minmax'))
        • [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (True, 3))
        • [figSize] : The figure size as a tuple (width,height) (default: (15,10))
        • [fontSize] : The font size for all text (default: 12)
        • [colour_palette] : The colour palette to use for the plot (default: None)
        • [y_axis_label] : The label to customise the y axis (default: None)
        • [x_axis_rotation] : Rotate the x axis labels this number of degrees (default: 0)
        • [point_estimator] : The statistical function to use for the point plot. Either "mean" or "median" (default: 'mean')
        • [point_ci] : The bootstrapped confidence interval for the point plot. Can also be standard deviation ("sd") (default: 95)
        • [violin_distribution_type] : The representation of the distribution of data points within the violin plot. Either "quartile", "box", "point", "stick" or None (default: 'box')
        • [violin_width_scale] : The method used to scale the width of the violin plot. Either "area", "count" or "width" (default: "width")
        • [box_iqr] : The proportion past the lower and upper quartiles to extend the plot whiskers for the box plot. Points outside this range will be identified as outliers (default: 1.5)
        • [saveImage] : Setting to 'True' will save the image to file (default: True)
        • [imageFileName] : The image file name to save to (default: '[plot_type]_clusterPlots.png')
        • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
      • [help] : Print this help text

      • [build] : Generates and displays the Polar dendrogram.

  • pca: Creates a Principal Component Analysis (PCA) scores and loadings biplot.

    • parameters
      • [data] : array-like matrix, shape (n_samples, n_features)
      • [imageFileName] : The image file name to save to (default: 'PCA.png')
      • [saveImage] : Setting to 'True' will save the image to file (default: True)
      • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
      • [pcx] : The first component (default: 1)
      • [pcy] : The second component (default: 2)
      • [group_label] : Labels to assign to each group/class in the PCA plot (default: None)
      • [sample_label] : Labels to assign to each sample in the PCA plot (default: None)
      • [peak_label] : Labels to assign to each peak in the loadings biplot (default: None)
      • [markerSize] : The size of each marker (default: 100)
      • [fontSize] : The font size for all text (default: 12)
      • [figSize] : The figure size as a tuple (width,height) (default: (20,10))
      • [background_colour] : Set the background colour (default: 'white')
      • [grid] : Setting to 'True' will overlay a grid (default: True)
      • [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
      • [cmap] : The CMAP colour palette to use (default: 'Set1')
  • pcaLoadings: Creates a lollipop plot of PCA components with bootstrapped confidence intervals.

    • parameters
      • [data] : array-like, shape (n_samples, n_features)
      • [peak_label] : A list of peaks to plot
      • [imageFileName] : The image file name to save to (default: 'PCA_loadings.png')
      • [saveImage] : Setting to 'True' will save the image to file (default: True)
      • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
      • [pc_num] : The principal component to plot (default: 1)
      • [boot_num] : The number of bootstrap samples to use to calculate confidence intervals (default: 500)
      • [alpha] : The alpha value for the bootstrapped confidence intervals (default: 0.05)
      • [fontSize] : The font size for all text (default: 30)
      • [markerSize] : The size of each marker (default: 100)
      • [figSize] : The figure size as a tuple (width,height) (default: (40,40))
      • [transparent] : Setting to 'True' will make the background transparent (default: False)
  • pcoa: Creates a Principal Coordinate Analysis (PCoA) plot.

    • parameters
      • [similarities] : array-like matrix, shape (n_samples, n_features)
      • [imageFileName] : The image file name to save to (default: 'PCOA.png')
      • [saveImage] : Setting to 'True' will save the image to file (default: True)
      • [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
      • [n_components] : Number of components (default: 2)
      • [max_iter] : Maximum number of iterations of the SMACOF algorithm (default: 300)
      • [eps] : Relative tolerance with respect to stress at which to declare convergence (default: 1e-3)
      • [seed] : Seed number used by the random number generator for the RandomState instance (default: 3)
      • [group_label] : Labels to assign to each group/class (default: None)
      • [peak_label] : Labels to assign to each peak (default: None)
      • [markerSize] : The size of each marker (default: 100)
      • [fontSize] : The font size for all text (default: 12)
      • [figSize] : The figure size as a tuple (width,height) (default: (20,10))
      • [background_colour] : Set the background colour (default: 'white')
      • [grid] : Setting to 'True' will overlay a grid (default: True)
      • [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
      • [cmap] : The CMAP colour palette to use (default: 'Set1')
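
The scores and loadings that the pca and pcaLoadings entries above refer to can be sketched with a plain NumPy singular value decomposition. This is only an illustration of the underlying quantities, not the package's own code (which may well rely on scikit-learn); the helper name pca_scores_loadings and the random toy data are assumptions.

```python
import numpy as np

def pca_scores_loadings(data, n_components=2):
    """Return (scores, loadings, explained variance ratio) via SVD."""
    X = np.asarray(data, dtype=float)
    X = X - X.mean(axis=0)                      # centre each feature
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                     # feature weights per component
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, loadings, explained[:n_components]

# Toy matrix with shape (n_samples, n_features), as the [data] parameter expects.
rng = np.random.default_rng(3)
data = rng.normal(size=(20, 5))

scores, loadings, explained = pca_scores_loadings(data)
print(scores.shape, loadings.shape, explained)
```

The first two columns of scores correspond to the default pcx=1 and pcy=2 axes of a scores plot, and each column of loadings gives the per-peak weights that a loadings plot would display.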

multivis.utils

  • loadData: Loads and validates the Data and Peak sheets from an Excel file.

    • parameters
      • [filename] : The name of the Excel file (.xlsx file) e.g. 'Data.xlsx'.
      • [DataSheet] : The name of the data sheet in the file e.g. 'Data'. The data sheet must contain an 'Idx', 'SampleID', and 'Class' column.
      • [PeakSheet] : The name of the peak sheet in the file e.g. 'Peak'. The peak sheet must contain an 'Idx', 'Name', and 'Label' column.
    • Returns
      • [DataTable] : Pandas dataFrame
      • [PeakTable] : Pandas dataFrame
  • groups2blocks: Slices the data by group/class name into blocks for later identification of multi-block associations and places the data into a dictionary indexed by group/class name.

    • parameters
      • [PeakTable] : Pandas dataframe containing the feature/peak data. Must contain 'Name' and 'Label'.
      • [DataTable] : Pandas dataframe matrix containing values. The data must contain a column separating out the different groups in the data (e.g. Class)
      • [group_column_name] : The group column name used in the datatable (e.g. Class)
    • Returns
      • [DataBlocks] : A dictionary containing DataTables indexed by group names
      • [PeakBlocks] : A dictionary containing PeakTables indexed by group names
  • mergeBlocks: Merges multiple Data Tables and Peak Tables from dictionaries into a single Peak Table and Data Table (used for multi-block/multi-omics data preparation). The 'Name' column needs to be unique across all blocks. Automatically annotates the merged Peak Table with a 'Block' column and consolidates any statistical results generated from the multivis.utils.statistics package in relation to each block.

    • parameters
      • [peak_blocks] : A dictionary of Pandas Peak Table dataframes from different datasets indexed by dataset type.
      • [data_blocks] : A dictionary of Pandas Data Table dataframes from different datasets indexed by dataset type.
      • [mergeType] : The type of merging to perform. Either by 'SampleID' or 'Index'.
    • Returns
      • [DataTable] : Merged Pandas dataFrame
      • [PeakTable] : Merged Pandas dataFrame (with any statistical results generated by multivis.utils.statistics consolidated into each block)
  • transform: Scales and transforms data in forward or reverse order based on different transform options.

    • parameters
      • [data] : A 1D numpy array of values
      • [transform_type] : The transform type to apply to the data ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal")
      • [min] : The minimum value for scaling
      • [max] : The maximum value for scaling
    • Returns
      • [transformed_data] : A scaled and transformed 1D numpy array
  • scaler: Scales a series of values in a 1D numpy array or pandas dataframe matrix based on different scaling functions

    • parameters

      • [data] : A pandas dataframe matrix or 1D numpy array of numerical values
      • [type] : The scaler type to apply based on sklearn preprocessing functions (default: "standard")
      • [stdScaler_with_mean] : Using "standard" scaler, center the data to the mean before scaling (default: True)
      • [stdScaler_with_std] : Using "standard" scaler, scale the data to unit variance (default: True)
      • [robust_with_centering] : Using "robust" scaler, center the data to the median before scaling (default: True)
      • [robust_with_scaling] : Using "robust" scaler, scale the data to within the quantile range (default: True)
      • [robust_unit_variance] : Using "robust" scaler, scale the data so that normally distributed features have a variance of 1 (default: False)
      • [minimum] : Using "minmax" scaler, set the minimum value for scaling (default: 0)
      • [maximum] : Using "minmax" scaler, set the maximum value for scaling (default: 1)
      • [lower_iqr] : Using "robust" scaler, set the lower quantile range (default: 25.0)
      • [upper_iqr] : Using "robust" scaler, set the upper quantile range (default: 75.0)
    • Returns

      • [scaled_data] : A scaled pandas dataframe matrix or 1D numpy array of numerical values
  • imputeData: Imputes data given a pandas dataframe of values

    • parameters
      • [data] : A pandas dataframe of values
      • [k] : The number of nearest neighbours
    • Returns
      • [data_filled] : Imputed data
  • statistics: Generate a table of parametric or non-parametric statistics and merges them with the Peak Table (node table).

    • init_parameters
      • [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
      • [datatable] : Pandas dataframe matrix containing values for statistical analysis
    • methods
      • [set_params] : Set parameters

        • [parametric] : Perform parametric statistical analysis, assuming the data is normally distributed (default: True)
        • [log_data] : Perform a log ('natural', base 2 or base 10) on all data prior to statistical analysis (default: (False, 2))
        • [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centers to the median and scales to between 25th and 75th quantile range)) (default: (True, 'standard'))
        • [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (False, 3))
        • [group_column_name] : The group column name used in the datatable (default: None)
        • [control_group_name] : The control group name in the datatable, if available (default: None)
        • [group_alpha_CI] : The alpha value for group confidence intervals (default: 0.05)
        • [fold_change_alpha_CI] : The alpha value for mean/median fold change confidence intervals (default: 0.05)
        • [pca_alpha_CI] : The alpha value for the PCA confidence intervals (default: 0.05)
        • [total_missing] : Calculate the total missing values per feature (default: False)
        • [group_missing] : Calculate the missing values per feature per group (if group_column_name not None) (default: False)
        • [pca_loadings] : Calculate PC1 and PC2 loadings for each feature (default: True)
        • [normality_test] : Determine normal distribution across whole dataset using Shapiro-Wilk test (pvalues < 0.05 ~ non-normal distribution) (default: True)
        • [group_normality_test] : Determine normal distribution across each group (if group_column_name not None) using Shapiro-Wilk test (pvalues < 0.05 ~ non-normal distribution) (default: True)
        • [group_mean_CI] : Determine the mean with bootstrapped CI across each group (if parametric = True and group_column_name not None) (default: True)
        • [group_median_CI] : Determine the median with bootstrapped CI across each group (if parametric = False and group_column_name not None) (default: True)
        • [mean_fold_change] : Calculate the mean fold change with bootstrapped confidence intervals (if parametric = True, group_column_name not None and control_group_name not None) (default: False)
        • [median_fold_change] : Calculate the median fold change with bootstrapped confidence intervals (if parametric = False, group_column_name not None and control_group_name not None) (default: False)
        • [levene_twoGroup] : Test null hypothesis that control group and each of the other groups come from populations with equal variances (if group_column_name not None and control_group_name not None) (default: False)
        • [levene_allGroup] : Test null hypothesis that all groups come from populations with equal variances (if group_column_name not None) (default: False)
        • [oneway_Anova_test] : Test null hypothesis that all groups have the same population mean, with included Benjamini-Hochberg FDR (if parametric = True and group_column_name not None) (default: False)
        • [kruskal_wallis_test] : Test null hypothesis that population median of all groups are equal, with included Benjamini-Hochberg FDR (if parametric = False and group_column_name not None) (default: False)
        • [ttest_oneGroup] : Calculate the T-test for the mean across all the data (one group), with included Benjamini-Hochberg FDR (if parametric = True, group_column_name is None or there is only 1 group in the data) (default: False)
        • [ttest_twoGroup] : Calculate the T-test for the mean of two groups, with one group being the control group, with included Benjamini-Hochberg FDR (if parametric = True, group_column_name not None and control_group_name not None) (default: False)
        • [mann_whitney_u_test] : Compute the Mann-Whitney rank test on two groups, with one being the control group, with included Benjamini-Hochberg FDR (if parametric = False, group_column_name not None and control_group_name not None) (default: False)
      • [help] : Print this help text

      • [calculate] : Performs the statistical calculations and outputs the Peak Table (node table) with the results appended.
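
Several of the tests above report a Benjamini-Hochberg FDR alongside the raw p-values. For readers unfamiliar with the procedure, here is a small pure-Python sketch of the standard BH adjustment. It is illustrative only: the function name benjamini_hochberg is hypothetical, and the package itself may delegate this step to a library such as statsmodels.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# One p-value per feature, e.g. from a two-group test (toy values).
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
print(qvals)
```

Each adjusted value is the smallest FDR level at which that feature would still be called significant, so filtering on it controls the expected proportion of false discoveries across all features.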

  • corrAnalysis: Correlation analysis on a matrix of values with Pearson, Spearman or Kendall's Tau.

    • parameters
      • [df_data] : A Pandas dataframe matrix of values
      • [correlationType] : The correlation type to apply. Either 'Pearson', 'Spearman' or 'KendallTau'
    • Returns
      • [df_corr] : Pandas dataframe matrix of all correlation coefficients
      • [df_pval] : Pandas dataframe matrix of all correlation pvalues
  • cluster: Clusters data using a linkage cluster method. If the data is correlated the correlations are first preprocessed, then clustered, otherwise a distance metric is applied to non-correlated data before clustering.

    • parameters
      • [matrix] : A Pandas dataframe matrix of scores
      • [transpose_non_correlated] : Setting to 'True' will transpose the matrix if it is not correlated data
      • [is_correlated] : Setting to 'True' will treat the matrix as if it contains correlation coefficients
      • [distance_metric] : Set the distance metric. Used if the matrix does not contain correlation coefficients.
      • [linkage_method] : Set the linkage method for the clustering.
    • Returns
      • [matrix] : The original matrix, transposed if transpose_non_correlated is 'True' and is_correlated is 'False'.
      • [row_linkage] : linkage matrix for the rows from a linkage clustered distance/similarities matrix
      • [col_linkage] : linkage matrix for the columns from a linkage clustered distance/similarities matrix
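
To show what the coefficient matrix from a correlation analysis looks like, the sketch below uses pandas' built-in DataFrame.corr rather than corrAnalysis itself. It covers only the coefficients (no p-values), and the toy column names are made up for illustration.

```python
import pandas as pd

# Toy data: rows are samples, columns are features.
df_data = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x_cubed": [1.0, 8.0, 27.0, 64.0, 125.0],  # monotonic but non-linear in x
    "neg_x": [-1.0, -2.0, -3.0, -4.0, -5.0],   # perfectly anti-correlated with x
})

# Square feature-by-feature matrices of correlation coefficients.
pearson = df_data.corr(method="pearson")
spearman = df_data.corr(method="spearman")

print(pearson.round(3))
print(spearman.round(3))
```

Spearman treats the monotonic x / x_cubed relationship as a perfect correlation while Pearson does not, which is the practical reason for choosing between the correlation types. A square matrix of this shape is the kind of input the Edge, Network and cluster entries describe as a matrix of scores.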

License

Multivis is licensed under the MIT license.

Authors

Correspondence

Dr. Brett Chapman, Post-doctoral Research Fellow at the Western Crop Genetics Alliance, Murdoch University. E-mail: [email protected], [email protected]

Citation

If you would like to cite MultiVis in a scientific publication, please cite this GitHub page until a citation to a publication becomes available.
