feat: add analysis functions #92

Merged
merged 41 commits into from
Dec 17, 2024
Changes from 12 commits
Commits
a7cd9e2
feat: add functions for correlation matrix calculations and subsetting
strixy16 Dec 16, 2024
bceafa4
style: change feature name if statements to be multi-line
strixy16 Dec 16, 2024
dc60e19
build: updated pixi lock file
strixy16 Dec 16, 2024
c426006
feat: add readii logger for errors and exceptions
strixy16 Dec 16, 2024
b0bd53c
test: add testing for getFeatureCorrelations
strixy16 Dec 16, 2024
9713455
build: add seaborn for plotting
strixy16 Dec 16, 2024
28ac67b
build: add pandas dependency
strixy16 Dec 16, 2024
f690a46
feat: add function to plot a correlation matrix as a seaborn heatmap
strixy16 Dec 16, 2024
6370597
feat: add function for plotting correlation distribution as a histogram
strixy16 Dec 16, 2024
0f9f331
feat: add correlation plot functions to init
strixy16 Dec 16, 2024
689eebf
feat: made writer class for plot figures
strixy16 Dec 16, 2024
23c87e4
fix: removing what I think is a typo import
strixy16 Dec 16, 2024
de87bef
style: add space in import statement for plot correlation
strixy16 Dec 16, 2024
6706b9f
feat: add function return types
strixy16 Dec 16, 2024
382de73
feat: specify PlotWriter save object as matplotlib Figure
strixy16 Dec 16, 2024
b07e5a6
Merge remote-tracking branch 'origin/main' into katys/add-analysis
strixy16 Dec 16, 2024
916c42b
docs: correct docstring in getFeatureCorrelations for default feature…
strixy16 Dec 16, 2024
24d41dd
docs: make function docstring oneliners imperative form
strixy16 Dec 16, 2024
d917b7a
docs: make docstring oneliner imperative
strixy16 Dec 16, 2024
293f40f
feat: add possible file extensions for plot figure, remove logger.exc…
strixy16 Dec 16, 2024
7bd422a
refactor: replace print statement with logger
strixy16 Dec 16, 2024
f564992
docs: update triangle parameter description in plotCorrelationHeatmap
strixy16 Dec 16, 2024
9b1424a
feat: add helper function to check if the subsetting of a dataframe i…
strixy16 Dec 16, 2024
724ee50
Merge remote-tracking branch 'origin/main' into katys/add-analysis fo…
strixy16 Dec 16, 2024
9af1289
docs/refactor: add parameter descriptions for PlotWriter save, change…
strixy16 Dec 17, 2024
5406928
style: remove blank line after function docstring
strixy16 Dec 17, 2024
573f8a9
refactor: remove unused import
strixy16 Dec 17, 2024
d3f713e
refactor: remove unused error variables in loadFeatureFilesFromImageT…
strixy16 Dec 17, 2024
0012f6e
refactor: change fstring to regular string in error msgs
strixy16 Dec 17, 2024
16c989b
style: remove whitespace around docstrings
strixy16 Dec 17, 2024
657b72a
feat: add all io, data, analyze functions to ruff config
strixy16 Dec 17, 2024
6dee126
style: sorted imports
strixy16 Dec 17, 2024
cd12dd2
refactor: replaced matplotlib import to specifically import Figure
strixy16 Dec 17, 2024
ce56c78
style: sort imports, remove whitespace in docstring
strixy16 Dec 17, 2024
b62b735
style: sort imports
strixy16 Dec 17, 2024
64052c0
feat: add error handling
strixy16 Dec 17, 2024
4cf18a4
refactor: correct help message for overwrite
strixy16 Dec 17, 2024
227e02a
refactor: replace matplotlib import with Figure import
strixy16 Dec 17, 2024
d583a84
style: sort imports
strixy16 Dec 17, 2024
173b212
refactor: remove io readers and data directory for now, will add in s…
strixy16 Dec 17, 2024
5be1831
feat: add check for empty dataframes in getFeatureCorrelations
strixy16 Dec 17, 2024
9 changes: 4 additions & 5 deletions notebooks/nifti_writer_example.ipynb
@@ -16,7 +16,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 1,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -26,7 +26,6 @@
 "import subprocess\n",
 "import SimpleITK as sitk\n",
 "import pandas as pd\n",
-"import uuid\n",
 "import random\n",
 "import sys\n",
 "from readii.utils import logger"
@@ -41,7 +40,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 14,
+"execution_count": 2,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -65,7 +64,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": null,
+"execution_count": 3,
 "metadata": {},
 "outputs": [
 {
@@ -210,7 +209,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.12.7"
+"version": "3.12.8"
 }
 },
 "nbformat": 4,
1,674 changes: 701 additions & 973 deletions pixi.lock

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions pyproject.toml
@@ -13,6 +13,9 @@ dependencies = [
     "pydicom>=2.3.1",
     "pyradiomics-bhklab>=3.1.4,<4",
     "orcestra-downloader>=0.9.0,<1",
+    "numpy==1.26.4.*",
+    "seaborn>=0.13.2,<0.14",
+    "pandas>=2.2.3,<3"
 ]
 requires-python = ">=3.10, <3.13"

14 changes: 14 additions & 0 deletions src/readii/analyze/__init__.py
@@ -0,0 +1,14 @@
"""Module to perform analysis on READII outputs."""

from .correlation import getFeatureCorrelations, getVerticalSelfCorrelations, getHorizontalSelfCorrelations, getCrossCorrelationMatrix
from .plot_correlation import plotCorrelationHeatmap, plotCorrelationHistogram


__all__ = [
    'getFeatureCorrelations',
    'getVerticalSelfCorrelations',
    'getHorizontalSelfCorrelations',
    'getCrossCorrelationMatrix',
    'plotCorrelationHeatmap',
    'plotCorrelationHistogram'
]
167 changes: 167 additions & 0 deletions src/readii/analyze/correlation.py
@@ -0,0 +1,167 @@
import pandas as pd
from readii.utils import logger

def getFeatureCorrelations(vertical_features:pd.DataFrame,
                           horizontal_features:pd.DataFrame,
                           method:str = "pearson",
                           vertical_feature_name:str = '_vertical',
                           horizontal_feature_name:str = '_horizontal') -> pd.DataFrame:
    """Calculate correlation between two sets of features.

    Parameters
    ----------
    vertical_features : pd.DataFrame
        Dataframe containing features to calculate correlations with. Index must be the same as the index of the horizontal_features dataframe.
    horizontal_features : pd.DataFrame
        Dataframe containing features to calculate correlations with. Index must be the same as the index of the vertical_features dataframe.
    method : str
        Method to use for calculating correlations. Must be one of "pearson", "spearman", or "kendall". Default is "pearson".
    vertical_feature_name : str
        Name of the vertical features to use as suffix in correlation dataframe. Default is "_vertical".
    horizontal_feature_name : str
        Name of the horizontal features to use as suffix in correlation dataframe. Default is "_horizontal".

    Returns
    -------
    correlation_matrix : pd.DataFrame
        Square dataframe containing the pairwise correlation values for all joined feature columns.
    """
    # Check that features are dataframes
    # No exception is active during these checks, so log with error() rather than exception()
    if not isinstance(vertical_features, pd.DataFrame):
        msg = "vertical_features must be a pandas DataFrame"
        logger.error(msg)
        raise TypeError(msg)
    if not isinstance(horizontal_features, pd.DataFrame):
        msg = "horizontal_features must be a pandas DataFrame"
        logger.error(msg)
        raise TypeError(msg)

    if method not in ["pearson", "spearman", "kendall"]:
        msg = "Correlation method must be one of 'pearson', 'spearman', or 'kendall'."
        logger.error(msg)
        raise ValueError(msg)

    if not vertical_features.index.equals(horizontal_features.index):
        msg = "Vertical and horizontal features must have the same index to calculate correlation. Set the index to the intersection of patient IDs."
        logger.error(msg)
        raise ValueError(msg)

    # Add _ to beginning of feature names if they don't start with _ so they can be used as suffixes
    if not vertical_feature_name.startswith("_"):
        vertical_feature_name = f"_{vertical_feature_name}"
    if not horizontal_feature_name.startswith("_"):
        horizontal_feature_name = f"_{horizontal_feature_name}"

    # Join the features into one dataframe
    # Use inner join to keep only the rows that have a value in both vertical and horizontal features
    features_to_correlate = vertical_features.join(horizontal_features,
                                                   how='inner',
                                                   lsuffix=vertical_feature_name,
                                                   rsuffix=horizontal_feature_name)

    try:
        # Calculate correlation between vertical features and horizontal features
        correlation_matrix = features_to_correlate.corr(method=method)
    except Exception as e:
        msg = f"Error calculating correlation matrix: {e}"
        logger.exception(msg)
        # Re-raise the original exception with its traceback intact
        raise

    return correlation_matrix
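
Review note: the join-then-correlate step above can be tried in isolation. The sketch below is a minimal illustration using only pandas; the patient IDs, feature names, and values are made up for the example and do not come from this PR. It shows one property of `DataFrame.join` that matters here: `lsuffix` and `rsuffix` are applied only to column names that appear in both dataframes.

```python
import pandas as pd

# Hypothetical toy data: four patients, two feature sets sharing the same index.
index = pd.Index(["pat1", "pat2", "pat3", "pat4"], name="ID")
vertical = pd.DataFrame(
    {"mean": [1.0, 2.0, 3.0, 4.0], "volume": [4.0, 3.0, 2.0, 1.0]}, index=index
)
horizontal = pd.DataFrame(
    {"mean": [2.0, 4.0, 6.0, 8.0], "entropy": [1.0, 3.0, 2.0, 5.0]}, index=index
)

# Same kind of join as in the diff: only the overlapping column ("mean") gets a suffix.
joined = vertical.join(
    horizontal, how="inner", lsuffix="_vertical", rsuffix="_horizontal"
)
print(list(joined.columns))

# Pairwise Pearson correlation of every joined column against every other one.
correlation_matrix = joined.corr(method="pearson")
print(correlation_matrix.shape)
```

If every column should carry its feature-set suffix regardless of overlap, one option (an assumption on my part, not something this PR does) would be to call `DataFrame.add_suffix` on each input before joining.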



def getVerticalSelfCorrelations(correlation_matrix:pd.DataFrame,
                                num_vertical_features:int) -> pd.DataFrame:
    """Get the vertical (y-axis) self correlations from a correlation matrix.

    Gets the top left quadrant of the correlation matrix.

    Parameters
    ----------
    correlation_matrix : pd.DataFrame
        Dataframe containing the correlation matrix to get the vertical self correlations from.
    num_vertical_features : int
        Number of vertical features in the correlation matrix.

    Returns
    -------
    pd.DataFrame
        Dataframe containing the vertical self correlations from the correlation matrix.
    """
    if num_vertical_features > correlation_matrix.shape[0]:
        msg = f"Number of vertical features ({num_vertical_features}) is greater than the number of rows in the correlation matrix ({correlation_matrix.shape[0]})."
        logger.error(msg)
        raise ValueError(msg)

    if num_vertical_features > correlation_matrix.shape[1]:
        msg = f"Number of vertical features ({num_vertical_features}) is greater than the number of columns in the correlation matrix ({correlation_matrix.shape[1]})."
        logger.error(msg)
        raise ValueError(msg)

    # Get the correlation matrix for vertical vs vertical - this is the top left corner of the matrix
    return correlation_matrix.iloc[0:num_vertical_features, 0:num_vertical_features]
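
Review note on the log-then-raise checks in this file: in the standard library, `Logger.exception` is documented for use from inside an exception handler, where it attaches the active traceback. The sketch below uses the stdlib `logging` module as a stand-in and assumes the `readii.utils` logger behaves similarly, which this diff does not confirm. The logger name and the `check_count` helper are hypothetical.

```python
import io
import logging

# Hypothetical stand-in logger that writes to an in-memory stream.
stream = io.StringIO()
demo_logger = logging.getLogger("readii_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False
demo_logger.addHandler(logging.StreamHandler(stream))


def check_count(num_features: int, num_rows: int) -> None:
    """Validate a feature count, logging and raising with the same message."""
    if num_features > num_rows:
        msg = f"Number of features ({num_features}) is greater than the number of rows ({num_rows})."
        # No exception is active yet, so a plain error-level record is enough.
        demo_logger.error(msg)
        # Pass the message along so callers see why the check failed.
        raise ValueError(msg)


try:
    check_count(5, 3)
except ValueError as err:
    caught_message = str(err)
    # Inside the handler, exception() adds the traceback of the active error.
    demo_logger.exception("Validation failed")

log_output = stream.getvalue()
print(log_output)
```

Passing `msg` to the raised exception keeps the explanation available to callers that do not have access to the log output.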



def getHorizontalSelfCorrelations(correlation_matrix:pd.DataFrame,
                                  num_horizontal_features:int) -> pd.DataFrame:
    """Get the horizontal (x-axis) self correlations from a correlation matrix.

    Gets the bottom right quadrant of the correlation matrix.

    Parameters
    ----------
    correlation_matrix : pd.DataFrame
        Dataframe containing the correlation matrix to get the horizontal self correlations from.
    num_horizontal_features : int
        Number of horizontal features in the correlation matrix.

    Returns
    -------
    pd.DataFrame
        Dataframe containing the horizontal self correlations from the correlation matrix.
    """
    if num_horizontal_features > correlation_matrix.shape[0]:
        msg = f"Number of horizontal features ({num_horizontal_features}) is greater than the number of rows in the correlation matrix ({correlation_matrix.shape[0]})."
        logger.error(msg)
        raise ValueError(msg)

    if num_horizontal_features > correlation_matrix.shape[1]:
        msg = f"Number of horizontal features ({num_horizontal_features}) is greater than the number of columns in the correlation matrix ({correlation_matrix.shape[1]})."
        logger.error(msg)
        raise ValueError(msg)

    # Get the index of the start of the horizontal correlations
    start_of_horizontal_correlations = len(correlation_matrix.columns) - num_horizontal_features

    # Get the correlation matrix for horizontal vs horizontal - this is the bottom right corner of the matrix
    return correlation_matrix.iloc[start_of_horizontal_correlations:, start_of_horizontal_correlations:]



def getCrossCorrelationMatrix(correlation_matrix:pd.DataFrame,
                              num_vertical_features:int) -> pd.DataFrame:
    """Get the cross correlation matrix subsection for a correlation matrix.

    Gets the top right quadrant of the correlation matrix so vertical and horizontal features are correctly labeled.

    Parameters
    ----------
    correlation_matrix : pd.DataFrame
        Dataframe containing the correlation matrix to get the cross correlation matrix subsection from.
    num_vertical_features : int
        Number of vertical features in the correlation matrix.

    Returns
    -------
    pd.DataFrame
        Dataframe containing the cross correlations from the correlation matrix.
    """
    if num_vertical_features > correlation_matrix.shape[0]:
        msg = f"Number of vertical features ({num_vertical_features}) is greater than the number of rows in the correlation matrix ({correlation_matrix.shape[0]})."
        logger.error(msg)
        raise ValueError(msg)

    if num_vertical_features > correlation_matrix.shape[1]:
        msg = f"Number of vertical features ({num_vertical_features}) is greater than the number of columns in the correlation matrix ({correlation_matrix.shape[1]})."
        logger.error(msg)
        raise ValueError(msg)

    # Get the correlation matrix for vertical vs horizontal - this is the top right corner of the matrix
    return correlation_matrix.iloc[0:num_vertical_features, num_vertical_features:]
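
Review note: the three subsetting functions in this file all slice one combined matrix whose rows and columns list the vertical features first and the horizontal features second. The sketch below reproduces that quadrant logic with plain `iloc` on a hypothetical 5x5 frame. The labels are invented, and the cell values are placeholders that encode `row * 10 + column` so each slice is easy to check by eye; they are not real correlations.

```python
import pandas as pd

# Hypothetical combined matrix: 2 vertical features followed by 3 horizontal ones.
labels = ["v1", "v2", "h1", "h2", "h3"]
values = [[row * 10 + col for col in range(5)] for row in range(5)]
combined = pd.DataFrame(values, index=labels, columns=labels)

num_vertical = 2
num_horizontal = len(labels) - num_vertical

# Top left quadrant: vertical vs vertical.
vertical_self = combined.iloc[0:num_vertical, 0:num_vertical]

# Bottom right quadrant: horizontal vs horizontal.
start = len(combined.columns) - num_horizontal
horizontal_self = combined.iloc[start:, start:]

# Top right quadrant: vertical rows against horizontal columns.
cross = combined.iloc[0:num_vertical, num_vertical:]

print(vertical_self.shape, horizontal_self.shape, cross.shape)
print(list(cross.index), list(cross.columns))
```

Because the slices are positional, they only line up with the intended feature sets when the column order of the combined matrix matches the order in which the two dataframes were joined.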