feat: integrate analyze readii outputs functions #79
Closed
+5,576
−1,909
64 commits
95586e5
feat: add function to calculate feature correlation matrix
strixy16 5193fb4
feat: add function to generate a heatmap plot figure from a correlati…
strixy16 a0771c6
feat: add init file to analyze directory
strixy16 cf26afc
feat: add error handling in getFeatureCorrelations
strixy16 e643349
feat: add general loading file, add loading config and data file func…
strixy16 f5882da
feat: add file for loading functions related to feature files
strixy16 5495550
build: add numpy and seaborn for correlation code
strixy16 decf8e5
refactor: remove so far unused imports
strixy16 fcc1b9e
feat: started test function for getFeatureCorrelations
strixy16 a708182
feat: make files for better function organization
strixy16 d706863
Merge remote-tracking branch 'origin/main' into katys/integrate-analy…
strixy16 d63a1c5
fix: remove duplicate tool.pixi.dependencies from merge
strixy16 484c12e
build: add seaborn for correlation plot functions, need to specify nu…
strixy16 c6b945f
feat: add init files for new directories
strixy16 fc83d69
feat: add function to calculate feature correlations and a function t…
strixy16 46f0773
feat: add function to drop a set of features at the beginning of a pa…
strixy16 fe56257
fix: set continuous setting in StructureSetToSegmentation to False
strixy16 e618269
build: moved seaborn and numpy to project dependencies
strixy16 a6ab888
test: make test feature matrix to test correlation functions with, up…
strixy16 0f1d837
feat: set StructureSetToSegmentation continuous argument to False
strixy16 5b0dccc
build: lock file from installing on katys mac
strixy16 0d9c943
Merge branch 'katys/fix_continuous_rtstruct_index' into katys/integra…
strixy16 5b4e5cb
feat: add functions for selecting subsets of dataframes
strixy16 b36f3d2
refactor: renamed process to select for specificity
strixy16 fa4da89
style: rename labelling for consistent filename convention
strixy16 0256466
feat: add function to extract patient ID label from a dataframe
strixy16 f0b87c2
feat: add functions to replace column values in a dataset for imputat…
strixy16 d44e1ce
feat: add function to save out seaborn plot figure to a png
strixy16 bfdc357
feat: add function to convert numerical days column to years
strixy16 1e89c17
feat: add function to set up a time outcome column for survival predi…
strixy16 948b426
feat: add function for survival status mapping from string to numeric…
strixy16 86f13ec
feat: add function to set patient ID column as index in a dataframe
strixy16 7842ebe
feat: add function to intersect two dataframes by their patient ID va…
strixy16 81b884a
feat: add function that takes outcome labels from clinical data and a…
strixy16 de2dd2c
feat: add function to get a list of image types from a directory of f…
strixy16 1d49ec1
feat: add function to plot and return a correlation heatmap
strixy16 8e0868f
feat: add function to plot a histogram of correlation values
strixy16 45b8fb0
feat: add functions to extract subsets of a full correlation matrix
strixy16 6b84ef8
style: rename plot to plot_correlations for specificity
strixy16 61cdedd
feat: add functions for self and cross correlation plotting
strixy16 e021051
refactor: remove unused imports
strixy16 730361b
refactor: remove unused scipy import
strixy16 1f4edf2
build: latest pixi lock file for analysis code addition
strixy16 de1c752
feat: change continuous to True in loadRTSTRUCTSITK so tests pass for…
strixy16 2647168
fix: need default vertical and horizontal suffixes when same feature …
strixy16 253aba2
fix: default feature names will have underscore at the front and unde…
strixy16 31bf5bf
feat: testing getFeatureCorrelations function
strixy16 231c390
fix: handle mutable input argument event_column_mapping
strixy16 40c1cba
fix: add fstring so variable is used properly in error message
strixy16 550c32a
fix: remove mutable version of outcome_labels input for addOutcomeLabels
strixy16 0d36600
fix: update error handling of old values to be replaced not existing …
strixy16 187b1cb
feat: change input image_types list for loadFeatureFilesFromImageType…
strixy16 b1daaf0
fix: change labels to drop default to None and assign in the function…
strixy16 6075966
refactor: use context manager for file operations and improve error h…
strixy16 501e20d
feat: improve error handling and input validation in loadFileToDataframe
strixy16 5ea0b99
refactor: change assert statements in getFeatureCorrelations to if st…
strixy16 da16d68
feat: handle NaN values in existing event values list in survival sta…
strixy16 0c8ccbf
docs: describe handling of NaNs in survival outcome column when mappi…
strixy16 2ab08e6
refactor: check dtype of event outcome column instead of first elemen…
strixy16 90839a2
refactor: simplify event column mapping dictionary check with sets
strixy16 80b81a7
refactor: change out string to numeric replacement with the replaceCo…
strixy16 b0a892d
feat: check that extracted feature directory exists
strixy16 edaf74c
refactor: improve error handling for dropping labels in loadFeatureFi…
strixy16 dc2e86a
feat: validate that any feature sets were loaded before return
strixy16
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,3 @@ | ||
# read version from installed package | ||
from importlib.metadata import version | ||
__version__ = "1.18.0" | ||
|
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,230 @@ | ||
import pandas as pd | ||
from typing import Optional | ||
import matplotlib.pyplot as plt | ||
import seaborn as sns | ||
import numpy as np | ||
|
||
|
||
def getFeatureCorrelations(vertical_features:pd.DataFrame, | ||
horizontal_features:pd.DataFrame, | ||
method:str = "pearson", | ||
vertical_feature_name:str = '_vertical', | ||
horizontal_feature_name:str = '_horizontal'): | ||
""" Function to calculate correlation between two sets of features. | ||
Parameters | ||
---------- | ||
vertical_features : pd.DataFrame | ||
Dataframe containing features to calculate correlations with. Index must be the same as the index of the horizontal_features dataframe. | ||
horizontal_features : pd.DataFrame | ||
Dataframe containing features to calculate correlations with. Index must be the same as the index of the vertical_features dataframe. | ||
method : str | ||
Method to use for calculating correlations. Default is "pearson". | ||
vertical_feature_name : str | ||
Name of the vertical features to use as suffix in correlation dataframe. Default is "_vertical". | ||
horizontal_feature_name : str | ||
Name of the horizontal features to use as suffix in correlation dataframe. Default is "_horizontal". | ||
Returns | ||
------- | ||
correlation_matrix : pd.DataFrame | ||
Dataframe containing correlation values. | ||
""" | ||
# Check that features are dataframes | ||
if not isinstance(vertical_features, pd.DataFrame): | ||
raise TypeError("vertical_features must be a pandas DataFrame") | ||
if not isinstance(horizontal_features, pd.DataFrame): | ||
raise TypeError("horizontal_features must be a pandas DataFrame") | ||
|
||
|
||
if method not in ["pearson", "spearman", "kendall"]: | ||
raise ValueError("Correlation method must be one of 'pearson', 'spearman', or 'kendall'.") | ||
|
||
if not vertical_features.index.equals(horizontal_features.index): | ||
raise ValueError("Vertical and horizontal features must have the same index to calculate correlation. Set the index to the intersection of patient IDs.") | ||
|
||
# Add _ to the beginning of feature names if they don't start with _ so they can be used as suffixes | ||
if not vertical_feature_name.startswith("_"): vertical_feature_name = f"_{vertical_feature_name}" | ||
if not horizontal_feature_name.startswith("_"): horizontal_feature_name = f"_{horizontal_feature_name}" | ||
|
||
# Join the features into one dataframe | ||
# Use inner join to keep only the rows that have a value in both vertical and horizontal features | ||
features_to_correlate = vertical_features.join(horizontal_features, | ||
how='inner', | ||
lsuffix=vertical_feature_name, | ||
rsuffix=horizontal_feature_name) | ||
|
||
try: | ||
# Calculate correlation between vertical features and horizontal features | ||
correlation_matrix = features_to_correlate.corr(method=method) | ||
except Exception as e: | ||
raise ValueError(f"Error calculating correlation matrix: {e}") | ||
|
||
return correlation_matrix | ||
|
||
|
||
def plotCorrelationHeatmap(correlation_matrix_df:pd.DataFrame, | ||
diagonal:Optional[bool] = False, | ||
triangle:Optional[str] = "lower", | ||
cmap:Optional[str] = "nipy_spectral", | ||
xlabel:Optional[str] = "", | ||
ylabel:Optional[str] = "", | ||
title:Optional[str] = "", | ||
subtitle:Optional[str] = "", | ||
show_tick_labels:Optional[bool] = False | ||
): | ||
"""Function to plot a correlation heatmap. | ||
Parameters | ||
---------- | ||
correlation_matrix_df : pd.DataFrame | ||
Dataframe containing the correlation matrix to plot. | ||
diagonal : bool, optional | ||
Whether to only plot half of the matrix. The default is False. | ||
triangle : str, optional | ||
Which triangle half of the matrix to plot. The default is "lower". | ||
xlabel : str, optional | ||
Label for the x-axis. The default is "". | ||
ylabel : str, optional | ||
Label for the y-axis. The default is "". | ||
title : str, optional | ||
Title for the plot. The default is "". | ||
subtitle : str, optional | ||
Subtitle for the plot. The default is "". | ||
show_tick_labels : bool, optional | ||
Whether to show the tick labels on the x and y axes. These would be the feature names. The default is False. | ||
Returns | ||
------- | ||
corr_fig : matplotlib.figure.Figure | ||
Figure object containing a Seaborn heatmap. | ||
""" | ||
|
||
if diagonal: | ||
# Set up mask for hiding half the matrix in the plot | ||
if triangle == "lower": | ||
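# Mask out the upper right triangle half of the matrix (including the diagonal) | ||
# Build the mask from positions rather than values so zero correlations are masked too | ||
mask = np.triu(np.ones_like(correlation_matrix_df, dtype=bool)) | ||
elif triangle == "upper": | ||
# Mask out the lower left triangle half of the matrix (including the diagonal) | ||
mask = np.tril(np.ones_like(correlation_matrix_df, dtype=bool)) | ||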
else: | ||
raise ValueError("If diagonal is True, triangle must be either 'lower' or 'upper'.") | ||
else: | ||
# The entire correlation matrix will be visible in the plot | ||
mask = None | ||
|
||
# Set a default title if one is not provided | ||
if not title: | ||
title = "Correlation Heatmap" | ||
|
||
# Set up figure and axes for the plot | ||
corr_fig, corr_ax = plt.subplots() | ||
|
||
# Plot the correlation matrix | ||
corr_ax = sns.heatmap(correlation_matrix_df, | ||
mask = mask, | ||
cmap=cmap, | ||
vmin=-1.0, | ||
vmax=1.0) | ||
|
||
if not show_tick_labels: | ||
# Remove the individual feature names from the axes | ||
corr_ax.set_xticklabels(labels=[]) | ||
corr_ax.set_yticklabels(labels=[]) | ||
|
||
# Set axis labels | ||
corr_ax.set_xlabel(xlabel) | ||
corr_ax.set_ylabel(ylabel) | ||
|
||
# Set title and subtitle | ||
# Suptitle is the super title, which will be above the title | ||
plt.title(subtitle, fontsize=12) | ||
plt.suptitle(title, fontsize=14) | ||
|
||
return corr_fig | ||
|
||
|
||
|
||
def getVerticalSelfCorrelations(correlation_matrix:pd.DataFrame, | ||
num_vertical_features:int): | ||
""" Function to get the vertical (y-axis) self correlations from a correlation matrix. Gets the top left quadrant of the correlation matrix. | ||
Parameters | ||
---------- | ||
correlation_matrix : pd.DataFrame | ||
Dataframe containing the correlation matrix to get the vertical self correlations from. | ||
num_vertical_features : int | ||
Number of vertical features in the correlation matrix. | ||
Returns | ||
------- | ||
pd.DataFrame | ||
Dataframe containing the vertical self correlations from the correlation matrix. | ||
""" | ||
if num_vertical_features > correlation_matrix.shape[0]: | ||
raise ValueError(f"Number of vertical features ({num_vertical_features}) is greater than the number of rows in the correlation matrix ({correlation_matrix.shape[0]}).") | ||
|
||
if num_vertical_features > correlation_matrix.shape[1]: | ||
raise ValueError(f"Number of vertical features ({num_vertical_features}) is greater than the number of columns in the correlation matrix ({correlation_matrix.shape[1]}).") | ||
|
||
# Get the correlation matrix for vertical vs vertical - this is the top left corner of the matrix | ||
return correlation_matrix.iloc[0:num_vertical_features, 0:num_vertical_features] | ||
|
||
|
||
|
||
def getHorizontalSelfCorrelations(correlation_matrix:pd.DataFrame, | ||
num_horizontal_features:int): | ||
""" Function to get the horizontal (x-axis) self correlations from a correlation matrix. Gets the bottom right quadrant of the correlation matrix. | ||
Parameters | ||
---------- | ||
correlation_matrix : pd.DataFrame | ||
Dataframe containing the correlation matrix to get the horizontal self correlations from. | ||
num_horizontal_features : int | ||
Number of horizontal features in the correlation matrix. | ||
Returns | ||
------- | ||
pd.DataFrame | ||
Dataframe containing the horizontal self correlations from the correlation matrix. | ||
""" | ||
|
||
if num_horizontal_features > correlation_matrix.shape[0]: | ||
raise ValueError(f"Number of horizontal features ({num_horizontal_features}) is greater than the number of rows in the correlation matrix ({correlation_matrix.shape[0]}).") | ||
|
||
if num_horizontal_features > correlation_matrix.shape[1]: | ||
raise ValueError(f"Number of horizontal features ({num_horizontal_features}) is greater than the number of columns in the correlation matrix ({correlation_matrix.shape[1]}).") | ||
|
||
# Get the index of the start of the horizontal correlations | ||
start_of_horizontal_correlations = len(correlation_matrix.columns) - num_horizontal_features | ||
|
||
# Get the correlation matrix for horizontal vs horizontal - this is the bottom right corner of the matrix | ||
return correlation_matrix.iloc[start_of_horizontal_correlations:, start_of_horizontal_correlations:] | ||
|
||
|
||
|
||
def getCrossCorrelationMatrix(correlation_matrix:pd.DataFrame, | ||
num_vertical_features:int): | ||
""" Function to get the cross correlation matrix subsection for a correlation matrix. Gets the top right quadrant of the correlation matrix so vertical and horizontal features are correctly labeled. | ||
Parameters | ||
---------- | ||
correlation_matrix : pd.DataFrame | ||
Dataframe containing the correlation matrix to get the cross correlation matrix subsection from. | ||
num_vertical_features : int | ||
Number of vertical features in the correlation matrix. | ||
Returns | ||
------- | ||
pd.DataFrame | ||
Dataframe containing the cross correlations from the correlation matrix. | ||
""" | ||
|
||
if num_vertical_features > correlation_matrix.shape[0]: | ||
raise ValueError(f"Number of vertical features ({num_vertical_features}) is greater than the number of rows in the correlation matrix ({correlation_matrix.shape[0]}).") | ||
|
||
if num_vertical_features > correlation_matrix.shape[1]: | ||
raise ValueError(f"Number of vertical features ({num_vertical_features}) is greater than the number of columns in the correlation matrix ({correlation_matrix.shape[1]}).") | ||
|
||
return correlation_matrix.iloc[0:num_vertical_features, num_vertical_features:] |
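The three slicing helpers above rely on the layout of the joined correlation matrix: vertical features occupy the first rows and columns, horizontal features the rest. The following is a minimal pure-Python sketch of that layout, not the PR's implementation. It uses nested lists in place of a pandas DataFrame, and `pearson`, `correlation_matrix`, and `quadrants` are hypothetical names introduced only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Correlate every feature column with every other feature column."""
    return [[pearson(a, b) for b in columns] for a in columns]


def quadrants(matrix, num_vertical):
    """Split a joined correlation matrix into the three blocks of interest."""
    # Top-left block: vertical vs. vertical
    vertical_self = [row[:num_vertical] for row in matrix[:num_vertical]]
    # Bottom-right block: horizontal vs. horizontal
    horizontal_self = [row[num_vertical:] for row in matrix[num_vertical:]]
    # Top-right block: vertical (rows) vs. horizontal (columns)
    cross = [row[num_vertical:] for row in matrix[:num_vertical]]
    return vertical_self, horizontal_self, cross


# Illustrative data: each inner list is one feature measured on four patients
vertical = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]
horizontal = [[4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 2.0, 4.0], [2.0, 2.0, 3.0, 5.0]]

# Joining vertical first, then horizontal, fixes the quadrant layout
matrix = correlation_matrix(vertical + horizontal)
vertical_self, horizontal_self, cross = quadrants(matrix, len(vertical))
```

With two vertical and three horizontal features, `vertical_self` is 2×2, `horizontal_self` is 3×3, and `cross` is 2×3. The same index arithmetic is what the `.iloc` slices in the diff perform on the DataFrame.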
🛠️ Refactor suggestion
Use exception chaining with `raise ... from e`
When raising a new exception within an `except` block, use `from e` to preserve the original exception context.
🧰 Tools
🪛 Ruff (0.8.0)
61-61: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
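The effect of the suggested `raise ... from e` pattern can be sketched as follows. This is an illustration under stated assumptions rather than the reviewer's actual diff: `compute_ratio` is a hypothetical function standing in for the `try`/`except` around the correlation call.

```python
def compute_ratio(numerator, denominator):
    """Hypothetical helper that wraps a low-level error in a ValueError."""
    try:
        return numerator / denominator
    except ZeroDivisionError as e:
        # "from e" stores the original exception on __cause__, so the
        # traceback shows it as the direct cause of the new error
        raise ValueError(f"Error calculating ratio: {e}") from e


try:
    compute_ratio(1, 0)
except ValueError as err:
    # Keep a reference so the chained exception can be inspected afterwards
    caught = err
```

Here `caught.__cause__` is the original `ZeroDivisionError`. Without `from e`, Python still records it on `__context__`, but the traceback describes it as an error that occurred while handling another one, which is the ambiguity Ruff's B904 rule flags.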