The MultiVis package contains the necessary tools for visualisation of multivariate data.
multivis requires:
- Python (==3.11.4)
- NumPy (==1.25.2)
- OpenPyXL (==2.6.1)
- Pandas (==2.1.0)
- Matplotlib (==3.8.0)
- Seaborn (==0.12.2)
- Networkx (==3.1.0)
- statsmodels (==0.14.0)
- scikits-bootstrap (==1.1.0)
- SciPy (==1.11.2)
- Scikit-learn (==1.3.1)
- tqdm (==4.66.1)
- xlrd (==2.0.1)
The recommended way to install multivis and its dependencies is with conda:
conda install -c brett.chapman multivis
or pip:
pip install multivis
Alternatively, to install directly from GitHub:
pip install https://github.com/brettChapman/multivis/archive/master.zip
For further details on usage, refer to the docstrings.
- Edge: Builds nodes and edges and is the base class for the Network class.
- init_parameters
- [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
- [datatable] : Pandas dataframe matrix containing scores.
- [pvalues] : Pandas dataframe matrix containing score/similarity pvalues (if available, otherwise set to None).
- methods
- [set_params] : Set parameters
- [filter_type] : The value type to filter the data on (default: 'pvalue')
- [hard_threshold] : Value to filter the data on (default: 0.005)
- [withinBlocks] : Include scores within blocks if building multi-block network (default: False)
- [sign] : The sign of the score/similarity to filter on ('pos', 'neg' or 'both') (default: 'both')
- [help] : Print this help text
- [build] : Builds the nodes and edges.
- [getNodes] : Returns a Pandas dataframe of all nodes.
- [getEdges] : Returns a Pandas dataframe of all edges.
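The filtering that Edge describes (keep a feature pair only if its p-value passes hard_threshold and its score has the requested sign) can be sketched with plain Pandas. This is an illustration only, not the multivis implementation: the function name build_edges, the output column names and the choice to keep p-values strictly below the threshold are all assumptions.

```python
import pandas as pd


def build_edges(scores, pvalues, hard_threshold=0.005, sign="both"):
    """Hypothetical helper: one row per feature pair that passes the filters."""
    names = list(scores.index)
    rows = []
    for i in range(len(names)):
        # Upper triangle only, so self-pairs and mirrored duplicates are skipped.
        for j in range(i + 1, len(names)):
            score = scores.iat[i, j]
            pval = pvalues.iat[i, j]
            if pval >= hard_threshold:
                continue  # assumed rule: keep p-values strictly below the threshold
            if sign == "pos" and score <= 0:
                continue
            if sign == "neg" and score >= 0:
                continue
            rows.append({"source": names[i], "target": names[j],
                         "score": score, "pvalue": pval})
    return pd.DataFrame(rows, columns=["source", "target", "score", "pvalue"])


names = ["A", "B", "C"]
scores = pd.DataFrame([[1.0, 0.9, -0.8],
                       [0.9, 1.0, 0.1],
                       [-0.8, 0.1, 1.0]], index=names, columns=names)
pvalues = pd.DataFrame([[0.0, 0.001, 0.002],
                        [0.001, 0.0, 0.6],
                        [0.002, 0.6, 0.0]], index=names, columns=names)

edges = build_edges(scores, pvalues, hard_threshold=0.005, sign="pos")
print(edges)
```

With sign set to 'both' the negative A/C pair would also be kept, while the B/C pair is dropped in every case because its p-value is above the threshold.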
- Network: Builds nodes and edges, with added NetworkX functionality. Inherits from Edge.
- init_parameters
- [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
- [datatable] : Pandas dataframe matrix containing scores.
- [pvalues] : Pandas dataframe matrix containing score/similarity pvalues.
- methods
- [set_params] : Set parameters
- [filter_type] : The value type to filter the data on (default: 'pvalue')
- [hard_threshold] : Value to filter the data on (default: 0.005)
- [link_type] : The value type to represent links in the network (default: 'score')
- [withinBlocks] : Include scores within blocks if building multi-block network (default: False)
- [sign] : The sign of the score/similarity to filter on ('pos', 'neg' or 'both') (default: 'both')
- [help] : Print this help text
- [build] : Builds nodes, edges and NetworkX graph.
- [getNetworkx] : Returns a NetworkX graph.
- [getLinkType] : Returns the link type parameter used in building the network.
- edgeBundle: Produces an interactive hierarchical edge bundle in D3.js, from nodes and edges.
- init_parameters
- [nodes] : Pandas dataframe containing nodes generated from Edge.
- [edges] : Pandas dataframe containing edges generated from Edge.
- methods
- [set_params] : Set parameters
- [html_file] : Name to save the HTML file as (default: 'hEdgeBundle.html')
- [innerRadiusOffset] : Sets the inner radius based on the offset value from the canvas width/diameter (default: 120)
- [blockSeparation] : Value to set the distance between different segmented blocks (default: 1)
- [linkFadeOpacity] : The link fade opacity when hovering over/clicking nodes (default: 0.05)
- [mouseOver] : Setting to 'True' swaps from clicking to hovering over nodes to select them (default: True)
- [fontSize] : The font size in pixels set for each node (default: 10)
- [backgroundColor] : Set the background colour of the plot (default: 'white')
- [foregroundColor] : Set the foreground colour of the plot (default: 'black')
- [node_data] : Peak Table column names to include in the mouse over information (default: 'Name' and 'Label')
- [nodeColorScale] : The scale to use for colouring the nodes ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
- [node_color_column] : The Peak Table column to use for node colours (default: None sets to black)
- [node_cmap] : Set the CMAP colour palette to use for colouring the nodes (default: 'brg')
- [edgeColorScale] : The scale to use for colouring the edges, if edge_color_value is 'pvalue' ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
- [edge_color_value] : Set the values to colour the edges by. Either 'sign', 'score' or 'pvalue' (default: 'score')
- [edge_cmap] : Set the CMAP colour palette to use for colouring the edges (default: 'brg')
- [addArcs] : Setting to 'True' adds arcs around the edge bundle for each block (default: False)
- [arcRadiusOffset] : Sets the arc radius offset from the inner radius (default: 20)
- [extendArcAngle] : Sets the angle value to add to each end of the arc (default: 2)
- [arc_cmap] : Set the CMAP colour palette to use for colouring the arcs (default: 'Set1')
- [help] : Print this help text
- [build] : Generates the JavaScript embedded HTML code, writes to a HTML file and opens it in a browser.
- [buildDashboard] : Generates the JavaScript embedded HTML code in a dashboard format, writes to a HTML file and opens it in a browser.
- plotNetwork: Produces a static spring-embedded network from a NetworkX graph.
- init_parameters
- [g] : NetworkX graph.
- methods
- [set_params] : Set parameters
- [imageFileName] : The image file name to save to (default: 'networkPlot.jpg')
- [edgeLabels] : Setting to 'True' labels all edges with the score/similarity value (default: True)
- [saveImage] : Setting to 'True' will save the image to file (default: True)
- [layout] : Set the NetworkX layout type ('circular', 'kamada_kawai', 'random', 'spring', 'spectral') (default: 'spring')
- [transparent] : Setting to 'True' will make the background transparent (default: False)
- [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
- [figSize] : The figure size as a tuple (width,height) (default: (30,20))
- [node_cmap] : The CMAP colour palette to use for nodes (default: 'brg')
- [colorScale] : The node colour scale to apply ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
- [node_color_column] : The Peak Table column to use for node colours (default: None sets to black)
- [sizeScale] : The node size scale to apply ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'reverse_linear')
- [size_range] : The node size scale range to apply. Tuple of length 2. Minimum size to maximum size (default: (150,2000))
- [sizing_column] : The node sizing column to use (default: sizes all nodes to 1)
- [alpha] : Node opacity value (default: 0.5)
- [nodeLabels] : Setting to 'True' will label the nodes (default: True)
- [fontSize] : The font size set for each node (default: 15)
- [keepSingletons] : Setting to 'True' will keep any single nodes not connected by edges in the NetworkX graph (default: True)
- [column] : Column from Peak Table to filter on (default: no filtering)
- [threshold] : Value to filter on (default: no filtering)
- [operator] : The comparison operator to use when filtering (default: '>')
- [sign] : The sign of the score to filter on ('pos', 'neg' or 'both') (default: 'pos')
- [help] : Print this help text
- [build] : Generates and displays the NetworkX graph.
- springNetwork: Interactive spring-embedded network which inherits data from the NetworkX graph.
- init_parameters
- [g] : NetworkX graph.
- methods
- [set_params] : Set parameters
- [node_size_scale] : A dictionary keyed by Peak Table column name, where each value is a dictionary with 'scale' (one of "linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") and 'range' (a number array of length 2: minimum size to maximum size) (default: sizes all nodes to 10 with no dropdown menu)
- [node_color_scale] : A dictionary keyed by Peak Table column name, where each value is a dictionary with 'scale' (one of "linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: colours all nodes 'black')
- [html_file] : Name to save the HTML file as (default: 'springNetwork.html')
- [backgroundColor] : Set the background colour of the plot (default: 'white')
- [foregroundColor] : Set the foreground colour of the plot (default: 'black')
- [chargeStrength] : The charge strength of the spring-embedded network (force between nodes) (default: -120)
- [groupByBlock] : Setting to 'True' will group nodes by 'Block' if present in the data (default: False)
- [groupFociStrength] : Set the strength of foci for each group (default: 0.2)
- [intraGroupStrength] : Set the strength between each group (default: 0.01)
- [groupLayoutTemplate] : Set the layout template to use for grouping (default: 'treemap')
- [node_text_size] : The text size for each node (default: 15)
- [fix_nodes] : Setting to 'True' will fix nodes in place when manually moved (default: False)
- [displayLabel] : Setting to 'True' will set the node labels to the 'Label' column, otherwise it will set the labels to the 'Name' column from the Peak Table (default: False)
- [node_data] : Peak Table column names to include in the mouse over information (default: 'Name' and 'Label')
- [link_type] : The link type used in building the network (default: 'score')
- [link_width] : The width of the links (default: 0.5)
- [pos_score_color] : Colour value for positive scores. Can be HTML/CSS name, hex code, and (R,G,B) tuples (default: 'red')
- [neg_score_color] : Colour value for negative scores. Can be HTML/CSS name, hex code, and (R,G,B) tuples (default: 'black')
- [help] : Print this help text
- [build] : Generates the JavaScript embedded HTML code, writes to a HTML file and opens it in a browser.
- [buildDashboard] : Generates the JavaScript embedded HTML code in a dashboard format, writes to a HTML file and opens it in a browser.
- clustermap: Produces a Hierarchical Clustered Heatmap.
- init_parameters
- [scores] : Pandas dataframe scores.
- [row_linkage] : Precomputed linkage matrix for the rows from a linkage clustered distance/similarities matrix
- [col_linkage] : Precomputed linkage matrix for the columns from a linkage clustered distance/similarities matrix
- methods
- [set_params] : Set parameters
- [xLabels] : A Pandas Series for labelling the X axis
- [yLabels] : A Pandas Series for labelling the Y axis
- [imageFileName] : The image file name to save to (default: 'clusterMap.png')
- [saveImage] : Setting to 'True' will save the image to file (default: True)
- [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
- [figSize] : The figure size as a tuple (width,height) (default: (80,70))
- [dendrogram_ratio_shift] : The ratio to shift the position of the dendrogram in relation to the heatmap (default: 0.0)
- [dendrogram_line_width] : The line width of the dendrograms (default: 1.5)
- [background_colour] : Set the background colour (default: 'white')
- [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
- [fontSize] : The font size for all text (default: 30)
- [heatmap_annotation] : Annotate the heatmap with values (default: False)
- [heatmap_cmap] : The CMAP colour palette to use for the heatmap (default: 'RdYlGn')
- [cluster_cmap] : The CMAP colour palette to use for the branch separation of clusters in the dendrogram (default: 'Set1')
- [rowColorCluster] : Setting to 'True' will display a colour bar for the clustered rows (default: False)
- [colColorCluster] : Setting to 'True' will display a colour bar for the clustered columns (default: False)
- [row_color_threshold] : The colouring threshold for the row dendrogram (default: 1)
- [col_color_threshold] : The colouring threshold for the column dendrogram (default: 1)
- [help] : Print this help text
- [build] : Generates and displays the Hierarchical Clustered Heatmap (HCH).
- plotFeatures: Produces different types of feature plots
- init_parameters
- [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
- [datatable] : Pandas dataframe containing matrix of values to plot (N samples x N features). Columns/features must be same as 'Name' from Peak Table.
- methods
- [set_params] : Set parameters
- [plot_type] : The type of plot. Either "point", "violin", "box", "swarm", "violin-swarm" or "box-swarm" (default: 'point')
- [column_numbers] : The number of columns to display in the plots (default: 4)
- [log_data] : Perform a log ('natural', base 2 or base 10) on all data (default: (True, 2))
- [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centers to the median and scales to between 25th and 75th quantile range)) (default: (True, 'minmax'))
- [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (True, 3))
- [style] : Set the seaborn style (default: 'seaborn-v0_8-white')
- [transparent] : Setting to 'True' will make the background transparent (default: False)
- [figSize] : The figure size as a tuple (width,height) (default: (15,10))
- [fontSize] : The font size for all text (default: 12)
- [colour_palette] : The colour palette to use for the plot (default: None)
- [y_axis_label] : The label to customise the y axis (default: None)
- [x_axis_rotation] : Rotate the x axis labels this number of degrees (default: 0)
- [group_column_name] : The group column name used in the datatable (e.g. 'Class') (default: None)
- [point_estimator] : The statistical function to use for the point plot. Either "mean" or "median" (default: 'mean')
- [point_ci] : The bootstrapped confidence interval for the point plot. Can also be standard deviation ("sd") (default: 95)
- [violin_distribution_type] : The representation of the distribution of data points within the violin plot. Either "quartile", "box", "point", "stick" or None (default: 'box')
- [violin_width_scale] : The method used to scale the width of the violin plot. Either "area", "count" or "width" (default: "width")
- [box_iqr] : The proportion past the lower and upper quartiles to extend the plot whiskers for the box plot. Points outside this range will be identified as outliers (default: 1.5)
- [saveImage] : Setting to 'True' will save the image to file (default: True)
- [imageFileName] : The image file name to save to (default: '[plot_type]_features.png')
- [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
- [help] : Print this help text
- [plot] : Generates feature plots.
- polarDendrogram: Polar dendrogram
- init_parameters
- [dn] : Dendrogram dictionary labelled by Peak Table index
- methods
- [set_params] : Set parameters
- [imageFileName] : The image file name to save to (default: 'polarDendrogram.png')
- [saveImage] : Setting to 'True' will save the image to file (default: True)
- [branch_scale] : The branch distance scale to apply ('linear', 'log', 'square') (default: 'linear')
- [gap] : The gap size within the polar dendrogram (default: 0.1)
- [grid] : Setting to 'True' will overlay a grid (default: False)
- [style] : Set the seaborn style (default: 'seaborn-v0_8-white')
- [transparent] : Setting to 'True' will make the background of all plots transparent (default: False)
- [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
- [figSize] : The figure size as a tuple (width,height) (default: (10,10))
- [fontSize] : The font size for all text (default: 15)
- [PeakTable] : The Peak Table Pandas dataframe (default: empty dataframe)
- [DataTable] : The Data Table Pandas dataframe (default: empty dataframe)
- [group_column_name] : The group column name used in the datatable (e.g. 'Class') (default: None)
- [textColorScale] : The scale to use for colouring the text ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal") (default: 'linear')
- [text_color_column] : The colour column to use from Peak Table (Can be colour or numerical values such as 'pvalue') (default: 'black')
- [label_column] : The label column to use from Peak Table (default: use original Peak Table index from cartesian dendrogram)
- [text_cmap] : The CMAP colour palette to use (default: 'brg')
- [plotClusters] : Aggregates peaks from each cluster of the polar dendrogram and generates different feature plots across the group/class variables.
- [plot_type] : The type of plot. Either "point", "violin", "box", "swarm", "violin-swarm" or "box-swarm" (default: 'point')
- [column_numbers] : The number of columns to display in the plots (default: 4)
- [log_data] : Perform a log ('natural', base 2 or base 10) on all data (default: (True, 2))
- [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centers to the median and scales to between 25th and 75th quantile range)) (default: (True, 'minmax'))
- [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (True, 3))
- [figSize] : The figure size as a tuple (width,height) (default: (15,10))
- [fontSize] : The font size for all text (default: 12)
- [colour_palette] : The colour palette to use for the plot (default: None)
- [y_axis_label] : The label to customise the y axis (default: None)
- [x_axis_rotation] : Rotate the x axis labels this number of degrees (default: 0)
- [point_estimator] : The statistical function to use for the point plot. Either "mean" or "median" (default: 'mean')
- [point_ci] : The bootstrapped confidence interval for the point plot. Can also be standard deviation ("sd") (default: 95)
- [violin_distribution_type] : The representation of the distribution of data points within the violin plot. Either "quartile", "box", "point", "stick" or None (default: 'box')
- [violin_width_scale] : The method used to scale the width of the violin plot. Either "area", "count" or "width" (default: "width")
- [box_iqr] : The proportion past the lower and upper quartiles to extend the plot whiskers for the box plot. Points outside this range will be identified as outliers (default: 1.5)
- [saveImage] : Setting to 'True' will save the image to file (default: True)
- [imageFileName] : The image file name to save to (default: '[plot_type]_clusterPlots.png')
- [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
- [help] : Print this help text
- [build] : Generates and displays the Polar dendrogram.
- pca: Creates a Principal Component Analysis (PCA) scores and loadings biplot.
- parameters
- [data] : array-like matrix, shape (n_samples, n_features)
- [imageFileName] : The image file name to save to (default: 'PCA.png')
- [saveImage] : Setting to 'True' will save the image to file (default: True)
- [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
- [pcx] : The first component (default: 1)
- [pcy] : The second component (default: 2)
- [group_label] : Labels to assign to each group/class in the PCA plot (default: None)
- [sample_label] : Labels to assign to each sample in the PCA plot (default: None)
- [peak_label] : Labels to assign to each peak in the loadings biplot (default: None)
- [markerSize] : The size of each marker (default: 100)
- [fontSize] : The font size for all text (default: 12)
- [figSize] : The figure size as a tuple (width,height) (default: (20,10))
- [background_colour] : Set the background colour (default: 'white')
- [grid] : Setting to 'True' will overlay a grid (default: True)
- [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
- [cmap] : The CMAP colour palette to use (default: 'Set1')
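For readers unfamiliar with the two halves of a biplot, the sketch below computes PCA scores (one point per sample) and loadings (one vector per feature) with NumPy's SVD. It is not how multivis computes them internally, which this README does not state; the function name pca_scores_loadings is hypothetical and no plotting is done.

```python
import numpy as np


def pca_scores_loadings(data, n_components=2):
    """Hypothetical helper: PCA via SVD of the mean-centred data matrix."""
    X = np.asarray(data, dtype=float)
    centred = X - X.mean(axis=0)                       # centre each feature
    U, S, Vt = np.linalg.svd(centred, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]    # sample coordinates
    loadings = Vt[:n_components].T                     # feature weights per component
    explained = (S ** 2) / np.sum(S ** 2)              # fraction of variance per component
    return scores, loadings, explained[:n_components]


rng = np.random.default_rng(0)
data = rng.normal(size=(20, 5))                        # 20 samples x 5 features
scores, loadings, explained = pca_scores_loadings(data)
print(scores.shape, loadings.shape, explained)
```

Under this sketch, pcx=1 and pcy=2 would correspond to columns 0 and 1 of scores and loadings.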
- pcaLoadings: Creates a lollipop plot of PCA components with bootstrapped confidence intervals.
- parameters
- [data] : array-like, shape (n_samples, n_features)
- [peak_label] : A list of peaks to plot
- [imageFileName] : The image file name to save to (default: 'PCA_loadings.png')
- [saveImage] : Setting to 'True' will save the image to file (default: True)
- [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
- [pc_num] : The principal component to plot (default: 1)
- [boot_num] : The number of bootstrap samples to use to calculate confidence intervals (default: 500)
- [alpha] : The alpha value for the bootstrapped confidence intervals (default: 0.05)
- [fontSize] : The font size for all text (default: 30)
- [markerSize] : The size of each marker (default: 100)
- [figSize] : The figure size as a tuple (width,height) (default: (40,40))
- [transparent] : Setting to 'True' will make the background transparent (default: False)
- pcoa: Creates a Principal Coordinate Analysis (PCoA) plot.
- parameters
- [similarities] : array-like matrix, shape (n_samples, n_features)
- [imageFileName] : The image file name to save to (default: 'PCOA.png')
- [saveImage] : Setting to 'True' will save the image to file (default: True)
- [dpi] : The number of Dots Per Inch (DPI) for the image (default: 200)
- [n_components] : Number of components (default: 2)
- [max_iter] : Maximum number of iterations of the SMACOF algorithm (default: 300)
- [eps] : Relative tolerance with respect to stress at which to declare convergence (default: 1e-3)
- [seed] : Seed number used by the random number generator for the RandomState instance (default: 3)
- [group_label] : Labels to assign to each group/class (default: None)
- [peak_label] : Labels to assign to each peak (default: None)
- [markerSize] : The size of each marker (default: 100)
- [fontSize] : The font size for all text (default: 12)
- [figSize] : The figure size as a tuple (width,height) (default: (20,10))
- [background_colour] : Set the background colour (default: 'white')
- [grid] : Setting to 'True' will overlay a grid (default: True)
- [transparent] : Setting to 'True' will ignore background_colour and make the background transparent (default: False)
- [cmap] : The CMAP colour palette to use (default: 'Set1')
- loadData: Loads and validates the Data and Peak sheet from an Excel file.
- parameters
- [filename] : The name of the Excel file (.xlsx file) e.g. 'Data.xlsx'.
- [DataSheet] : The name of the data sheet in the file e.g. 'Data'. The data sheet must contain an 'Idx', 'SampleID', and 'Class' column.
- [PeakSheet] : The name of the peak sheet in the file e.g. 'Peak'. The peak sheet must contain an 'Idx', 'Name', and 'Label' column.
- Returns
- DataTable: Pandas dataFrame
- PeakTable: Pandas dataFrame
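The column requirements above can be checked on any pair of dataframes before they are passed further down the pipeline. The sketch below is an assumption about what such validation might look like, not the checks loadData actually performs: check_tables is a hypothetical name, and the rule that every peak 'Name' must appear as a data column is borrowed from the plotFeatures description.

```python
import pandas as pd

REQUIRED_DATA_COLUMNS = ("Idx", "SampleID", "Class")
REQUIRED_PEAK_COLUMNS = ("Idx", "Name", "Label")


def check_tables(data_table, peak_table):
    """Hypothetical helper: raise ValueError if a required column is missing."""
    missing_data = [c for c in REQUIRED_DATA_COLUMNS if c not in data_table.columns]
    missing_peak = [c for c in REQUIRED_PEAK_COLUMNS if c not in peak_table.columns]
    if missing_data:
        raise ValueError(f"Data sheet is missing columns: {missing_data}")
    if missing_peak:
        raise ValueError(f"Peak sheet is missing columns: {missing_peak}")
    # Assumed rule: each peak 'Name' should be a column of the data sheet.
    unknown = [n for n in peak_table["Name"] if n not in data_table.columns]
    if unknown:
        raise ValueError(f"Peaks without a data column: {unknown}")
    return True


data_table = pd.DataFrame({"Idx": [1, 2], "SampleID": ["s1", "s2"],
                           "Class": ["Control", "Case"], "M1": [0.5, 0.7]})
peak_table = pd.DataFrame({"Idx": [1], "Name": ["M1"], "Label": ["glucose"]})

print(check_tables(data_table, peak_table))
```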
- groups2blocks: Slices the data by group/class name into blocks for later identification of multi-block associations and places the data into a dictionary indexed by group/class name.
- parameters
- [PeakTable] : Pandas dataframe containing the feature/peak data. Must contain 'Name' and 'Label'.
- [DataTable] : Pandas dataframe matrix containing values. The data must contain a column separating out the different groups in the data (e.g. Class)
- [group_column_name] : The group column name used in the datatable (e.g. Class)
- Returns
- [DataBlocks] : A dictionary containing DataTables indexed by group names
- [PeakBlocks] : A dictionary containing PeakTables indexed by group names
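The slicing step can be pictured as a Pandas groupby that fills two dictionaries keyed by group name. The sketch below is only an approximation under stated assumptions: split_by_group is a hypothetical name, and it copies the Peak Table unchanged into every block, whereas groups2blocks may rename or annotate peaks per block.

```python
import pandas as pd


def split_by_group(peak_table, data_table, group_column_name):
    """Hypothetical helper: one DataTable and one PeakTable per group."""
    data_blocks, peak_blocks = {}, {}
    for group, rows in data_table.groupby(group_column_name):
        data_blocks[group] = rows.reset_index(drop=True)
        peak_blocks[group] = peak_table.copy()  # assumption: same peaks in every block
    return data_blocks, peak_blocks


data_table = pd.DataFrame({"Idx": [1, 2, 3, 4],
                           "SampleID": ["s1", "s2", "s3", "s4"],
                           "Class": ["Control", "Case", "Control", "Case"],
                           "M1": [1.0, 2.0, 3.0, 4.0]})
peak_table = pd.DataFrame({"Idx": [1], "Name": ["M1"], "Label": ["glucose"]})

data_blocks, peak_blocks = split_by_group(peak_table, data_table, "Class")
print(sorted(data_blocks))
```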
- mergeBlocks: Merges multiple Data Tables and Peak Tables from dictionaries into a single Peak Table and Data Table (used for multi-block/multi-omics data preparation). The 'Name' column needs to be unique across all blocks. Automatically annotates the merged Peak Table with a 'Block' column and consolidates any statistical results generated from the multivis.utils.statistics package in relation to each block.
- parameters
- [peak_blocks] : A dictionary of Pandas Peak Table dataframes from different datasets indexed by dataset type.
- [data_blocks] : A dictionary of Pandas Data Table dataframes from different datasets indexed by dataset type.
- [mergeType] : The type of merging to perform. Either by 'SampleID' or 'Index'.
- Returns
- [DataTable] : Merged Pandas dataFrame
- [PeakTable] : Merged Pandas dataFrame (with any statistical results generated by multivis.utils.statistics consolidated into each block)
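A rough picture of the 'SampleID' merge: tag each Peak Table with its block name, stack the Peak Tables, and join the Data Tables on the shared sample identifier. The sketch below assumes an inner join and ignores the consolidation of statistical results; merge_blocks_sketch and its merge_on argument are hypothetical names, not the multivis signature.

```python
import pandas as pd


def merge_blocks_sketch(peak_blocks, data_blocks, merge_on="SampleID"):
    """Hypothetical helper: merge per-block tables into one pair of tables."""
    peak_parts = []
    merged_data = None
    for block_name, peaks in peak_blocks.items():
        peaks = peaks.copy()
        peaks["Block"] = block_name                     # annotate the block of origin
        peak_parts.append(peaks)
        data = data_blocks[block_name][[merge_on] + list(peaks["Name"])]
        if merged_data is None:
            merged_data = data
        else:
            # Assumption: keep only samples present in every block.
            merged_data = merged_data.merge(data, on=merge_on, how="inner")
    merged_peaks = pd.concat(peak_parts, ignore_index=True)
    if merged_peaks["Name"].duplicated().any():
        raise ValueError("'Name' must be unique across all blocks")
    return merged_data, merged_peaks


peak_blocks = {
    "metabolites": pd.DataFrame({"Name": ["M1", "M2"], "Label": ["glucose", "lactate"]}),
    "proteins": pd.DataFrame({"Name": ["P1"], "Label": ["albumin"]}),
}
data_blocks = {
    "metabolites": pd.DataFrame({"SampleID": ["s1", "s2"], "M1": [1.0, 2.0], "M2": [3.0, 4.0]}),
    "proteins": pd.DataFrame({"SampleID": ["s2", "s1"], "P1": [20.0, 10.0]}),
}

merged_data, merged_peaks = merge_blocks_sketch(peak_blocks, data_blocks)
print(merged_data)
print(merged_peaks)
```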
- transform: Scales and transforms data in forward or reverse order based on different transform options.
- parameters
- [data] : A 1D numpy array of values
- [transform_type] : The transform type to apply to the data ("linear", "reverse_linear", "log", "reverse_log", "square", "reverse_square", "area", "reverse_area", "volume", "reverse_volume", "ordinal", "reverse_ordinal")
- [min] : The minimum value for scaling
- [max] : The maximum value for scaling
- Returns
- [transformed_data] : A scaled and transformed 1D numpy array
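To show what a forward versus reverse transform into a [min, max] range could mean, here is a small NumPy sketch covering only "linear", "log" and "square" and their reverse_ variants. The exact formulas multivis uses are not given in this README, so every step here (including shifting the data into a positive range before taking logs) is an assumption, and transform_sketch is a hypothetical name.

```python
import numpy as np


def rescale(values, new_min, new_max):
    """Linearly map values onto [new_min, new_max]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        return np.full_like(values, new_min)
    return (values - values.min()) / span * (new_max - new_min) + new_min


def transform_sketch(data, transform_type, new_min, new_max):
    """Hypothetical helper: shape the data, rescale it, optionally flip the order."""
    data = np.asarray(data, dtype=float)
    reverse = transform_type.startswith("reverse_")
    kind = transform_type[len("reverse_"):] if reverse else transform_type
    if kind == "linear":
        shaped = data
    elif kind == "log":
        shaped = np.log(rescale(data, 1.0, 10.0))   # assumed: shift to [1, 10] first
    elif kind == "square":
        shaped = np.square(rescale(data, 0.0, 1.0))
    else:
        raise ValueError(f"transform type not covered by this sketch: {transform_type}")
    scaled = rescale(shaped, new_min, new_max)
    # Reverse order: the largest input gets the smallest output.
    return (new_min + new_max) - scaled if reverse else scaled


pvals = np.array([0.001, 0.01, 0.05])
sizes = transform_sketch(pvals, "reverse_linear", 150, 2000)
print(sizes)
```

A reverse scale is the natural choice when smaller values should stand out, for example sizing nodes by p-value so that the most significant peaks are drawn largest.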
- scaler: Scales a series of values in a 1D numpy array or pandas dataframe matrix based on different scaling functions
- parameters
- [data] : A pandas dataframe matrix or 1D numpy array of numerical values
- [type] : The scaler type to apply based on sklearn preprocessing functions (default: "standard")
- [stdScaler_with_mean] : Using "standard" scaler, center the data to the mean before scaling (default: True)
- [stdScaler_with_std] : Using "standard" scaler, scale the data to unit variance (default: True)
- [robust_with_centering] : Using "robust" scaler, center the data to the median before scaling (default: True)
- [robust_with_scaling] : Using "robust" scaler, scale the data to within the quantile range (default: True)
- [robust_unit_variance] : Using "robust" scaler, scale the data so that normally distributed features have a variance of 1 (default: False)
- [minimum] : Using "minmax" scaler, set the minimum value for scaling (default: 0)
- [maximum] : Using "minmax" scaler, set the maximum value for scaling (default: 1)
- [lower_iqr] : Using "robust" scaler, set the lower quantile range (default: 25.0)
- [upper_iqr] : Using "robust" scaler, set the upper quantile range (default: 75.0)
- Returns
- [scaled_data] : A scaled pandas dataframe matrix or 1D numpy array of numerical values
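The four scaler types correspond to simple column-wise formulas. The NumPy sketch below writes them out for reference; it mirrors the default behaviour of the matching sklearn preprocessing scalers but is not the multivis code, and scale_sketch and scaler_type are hypothetical names.

```python
import numpy as np


def scale_sketch(data, scaler_type="standard", minimum=0.0, maximum=1.0,
                 lower_iqr=25.0, upper_iqr=75.0):
    """Hypothetical helper: column-wise scaling with four common formulas."""
    x = np.asarray(data, dtype=float)
    if scaler_type == "standard":
        # Centre to the mean and scale to unit variance.
        return (x - x.mean(axis=0)) / x.std(axis=0)
    if scaler_type == "minmax":
        unit = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
        return unit * (maximum - minimum) + minimum
    if scaler_type == "maxabs":
        return x / np.abs(x).max(axis=0)
    if scaler_type == "robust":
        # Centre to the median and scale by the chosen quantile range.
        q_low, q_high = np.percentile(x, [lower_iqr, upper_iqr], axis=0)
        return (x - np.median(x, axis=0)) / (q_high - q_low)
    raise ValueError(f"unknown scaler type: {scaler_type}")


x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
for name in ("standard", "minmax", "maxabs", "robust"):
    print(name, scale_sketch(x, name)[:, 0])
```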
- imputeData: Imputes data given a pandas dataframe of values
- parameters
- [data] : A pandas dataframe of values
- [k] : The number of nearest neighbours
- Returns
- [data_filled] : Imputed data
- statistics: Generates a table of parametric or non-parametric statistics and merges them with the Peak Table (node table).
- init_parameters
- [peaktable] : Pandas dataframe containing peak data. Must contain 'Name' and 'Label'.
- [datatable] : Pandas dataframe matrix containing values for statistical analysis
- methods
- [set_params] : Set parameters
- [parametric] : Perform parametric statistical analysis, assuming the data is normally distributed (default: True)
- [log_data] : Perform a log ('natural', base 2 or base 10) on all data prior to statistical analysis (default: (False, 2))
- [scale_data] : Scale the data ('standard' (centers to the mean and scales to unit variance), 'minmax' (scales between 0 and 1), 'maxabs' (scales to the absolute maximum value), 'robust' (centers to the median and scales to between 25th and 75th quantile range)) (default: (True, 'standard'))
- [impute_data] : Impute any missing values using KNN impute with a set number of nearest neighbours (default: (False, 3))
- [group_column_name] : The group column name used in the datatable (default: None)
- [control_group_name] : The control group name in the datatable, if available (default: None)
- [group_alpha_CI] : The alpha value for group confidence intervals (default: 0.05)
- [fold_change_alpha_CI] : The alpha value for mean/median fold change confidence intervals (default: 0.05)
- [pca_alpha_CI] : The alpha value for the PCA confidence intervals (default: 0.05)
- [total_missing] : Calculate the total missing values per feature (default: False)
- [group_missing] : Calculate the missing values per feature per group (if group_column_name not None) (default: False)
- [pca_loadings] : Calculate PC1 and PC2 loadings for each feature (default: True)
- [normality_test] : Determine normal distribution across whole dataset using Shapiro-Wilk test (pvalues < 0.05 ~ non-normal distribution) (default: True)
- [group_normality_test] : Determine normal distribution across each group (if group_column_name not None) using Shapiro-Wilk test (pvalues < 0.05 ~ non-normal distribution) (default: True)
- [group_mean_CI] : Determine the mean with bootstrapped CI across each group (if parametric = True and group_column_name not None) (default: True)
- [group_median_CI] : Determine the median with bootstrapped CI across each group (if parametric = False and group_column_name not None) (default: True)
- [mean_fold_change] : Calculate the mean fold change with bootstrapped confidence intervals (if parametric = True, group_column_name not None and control_group_name not None) (default: False)
- [median_fold_change] : Calculate the median fold change with bootstrapped confidence intervals (if parametric = False, group_column_name not None and control_group_name not None) (default: False)
- [levene_twoGroup] : Test null hypothesis that control group and each of the other groups come from populations with equal variances (if group_column_name not None and control_group_name not None) (default: False)
- [levene_allGroup] : Test null hypothesis that all groups come from populations with equal variances (if group_column_name not None) (default: False)
- [oneway_Anova_test] : Test null hypothesis that all groups have the same population mean, with included Benjamini-Hochberg FDR (if parametric = True and group_column_name not None) (default: False)
- [kruskal_wallis_test] : Test null hypothesis that population median of all groups are equal, with included Benjamini-Hochberg FDR (if parametric = False and group_column_name not None) (default: False)
- [ttest_oneGroup] : Calculate the T-test for the mean across all the data (one group), with included Benjamini-Hochberg FDR (if parametric = True, group_column_name is None or there is only 1 group in the data) (default: False)
- [ttest_twoGroup] : Calculate the T-test for the mean of two groups, with one group being the control group, with included Benjamini-Hochberg FDR (if parametric = True, group_column_name not None and control_group_name not None) (default: False)
- [mann_whitney_u_test] : Compute the Mann-Whitney rank test on two groups, with one being the control group, with included Benjamini-Hochberg FDR (if parametric = False, group_column_name not None and control_group_name not None) (default: False)
- [help] : Print this help text
- [calculate] : Performs the statistical calculations and outputs the Peak Table (node table) with the results appended.
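Several of the tests above report a Benjamini-Hochberg FDR alongside the raw p-values. For reference, the standard Benjamini-Hochberg adjustment can be written in a few lines of NumPy. This is a generic sketch of the procedure, not the code multivis calls, and benjamini_hochberg is a hypothetical name.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)                               # ascending p-values
    ranked = p[order] * n / np.arange(1, n + 1)         # p * n / rank
    # Enforce monotonicity: each value is capped by every larger-rank value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(pvals))
```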
- corrAnalysis: Correlation analysis on a matrix of values with Pearson, Spearman or Kendall's Tau.
- parameters
- [df_data] : A Pandas dataframe matrix of values
- [correlationType] : The correlation type to apply. Either 'Pearson', 'Spearman' or 'KendallTau'
- Returns
- [df_corr] : Pandas dataframe matrix of all correlation coefficients
- [df_pval] : Pandas dataframe matrix of all correlation pvalues
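The two returned matrices can be reproduced for small tables with SciPy's pairwise correlation functions, as in the sketch below. It loops over column pairs for clarity rather than speed and is not the multivis implementation; correlation_tables is a hypothetical name, and treating each column as a feature is an assumption.

```python
import numpy as np
import pandas as pd
from scipy import stats

CORRELATION_FUNCTIONS = {
    "Pearson": stats.pearsonr,
    "Spearman": stats.spearmanr,
    "KendallTau": stats.kendalltau,
}


def correlation_tables(df_data, correlationType="Pearson"):
    """Hypothetical helper: coefficient and p-value matrices between columns."""
    func = CORRELATION_FUNCTIONS[correlationType]
    cols = list(df_data.columns)
    df_corr = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    df_pval = pd.DataFrame(np.zeros((len(cols), len(cols))), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r, p = func(df_data[a], df_data[b])
            df_corr.loc[a, b] = df_corr.loc[b, a] = r   # matrices are symmetric
            df_pval.loc[a, b] = df_pval.loc[b, a] = p
    return df_corr, df_pval


df_data = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                        "y": [2.0, 4.0, 6.0, 8.0, 10.5],
                        "z": [5.0, 4.0, 3.0, 2.0, 1.0]})
df_corr, df_pval = correlation_tables(df_data, "Pearson")
print(df_corr.round(3))
```

Matrices of this shape are what Edge and Network take as their datatable and pvalues inputs.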
- cluster: Clusters data using a linkage cluster method. If the data is correlated the correlations are first preprocessed, then clustered, otherwise a distance metric is applied to non-correlated data before clustering.
- parameters
- [matrix] : A Pandas dataframe matrix of scores
- [transpose_non_correlated] : Setting to 'True' will transpose the matrix if it is not correlated data
- [is_correlated] : Setting to 'True' will treat the matrix as if it contains correlation coefficients
- [distance_metric] : Set the distance metric. Used if the matrix does not contain correlation coefficients.
- [linkage_method] : Set the linkage method for the clustering.
- Returns
- [matrix] : The original matrix, transposed if transpose_non_correlated is 'True' and is_correlated is 'False'.
- [row_linkage] : linkage matrix for the rows from a linkage clustered distance/similarities matrix
- [col_linkage] : linkage matrix for the columns from a linkage clustered distance/similarities matrix
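The two branches described above can be sketched with SciPy's hierarchical clustering. How multivis preprocesses correlations is not stated in this README; the sketch assumes the common choice of converting a correlation r into a distance 1 - r. The name linkage_sketch and the default arguments are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist, squareform


def linkage_sketch(matrix, is_correlated=True,
                   distance_metric="euclidean", linkage_method="average"):
    """Hypothetical helper: row and column linkage matrices for a score table."""
    values = matrix.to_numpy(dtype=float)
    if is_correlated:
        dist = 1.0 - values                 # assumed preprocessing: r -> 1 - r
        dist = (dist + dist.T) / 2.0        # remove tiny asymmetries
        np.fill_diagonal(dist, 0.0)
        condensed = squareform(dist, checks=False)
        row_linkage = col_linkage = linkage(condensed, method=linkage_method)
    else:
        row_linkage = linkage(pdist(values, metric=distance_metric), method=linkage_method)
        col_linkage = linkage(pdist(values.T, metric=distance_metric), method=linkage_method)
    return row_linkage, col_linkage


corr = pd.DataFrame([[1.0, 0.9, 0.1, 0.0],
                     [0.9, 1.0, 0.2, 0.1],
                     [0.1, 0.2, 1.0, 0.8],
                     [0.0, 0.1, 0.8, 1.0]])
row_linkage, col_linkage = linkage_sketch(corr, is_correlated=True)
print(row_linkage)
```

The resulting linkage matrices have the same form as the row_linkage and col_linkage inputs that clustermap expects.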
Multivis is licensed under the MIT license.
Dr. Brett Chapman, Post-doctoral Research Fellow at the Western Crop Genetics Alliance, Murdoch University. E-mail: [email protected], [email protected]
If you would like to cite MultiVis in a scientific publication, please cite this GitHub page until a citation to a publication becomes available.