feat: add analysis functions #92

Merged
merged 41 commits into from
Dec 17, 2024
Commits
a7cd9e2
feat: add functions for correlation matrix calculations and subsetting
strixy16 Dec 16, 2024
bceafa4
style: change feature name if statements to be multi-line
strixy16 Dec 16, 2024
dc60e19
build: updated pixi lock file
strixy16 Dec 16, 2024
c426006
feat: add readii logger for errors and exceptions
strixy16 Dec 16, 2024
b0bd53c
test: add testing for getFeatureCorrelations
strixy16 Dec 16, 2024
9713455
build: add seaborn for plotting
strixy16 Dec 16, 2024
28ac67b
build: add pandas dependency
strixy16 Dec 16, 2024
f690a46
feat: add function to plot a correlation matrix as a seaborn heatmap
strixy16 Dec 16, 2024
6370597
feat: add function for plotting correlation distribution as a histogram
strixy16 Dec 16, 2024
0f9f331
feat: add correlation plot functions to init
strixy16 Dec 16, 2024
689eebf
feat: made writer class for plot figures
strixy16 Dec 16, 2024
23c87e4
fix: removing what I think is a typo import
strixy16 Dec 16, 2024
de87bef
style: add space in import statement for plot correlation
strixy16 Dec 16, 2024
6706b9f
feat: add function return types
strixy16 Dec 16, 2024
382de73
feat: specify PlotWriter save object as matplotlib Figure
strixy16 Dec 16, 2024
b07e5a6
Merge remote-tracking branch 'origin/main' into katys/add-analysis
strixy16 Dec 16, 2024
916c42b
docs: correct docstring in getFeatureCorrelations for default feature…
strixy16 Dec 16, 2024
24d41dd
docs: make function docstring oneliners imperative form
strixy16 Dec 16, 2024
d917b7a
docs: make docstring oneliner imperative
strixy16 Dec 16, 2024
293f40f
feat: add possible file extensions for plot figure, remove logger.exc…
strixy16 Dec 16, 2024
7bd422a
refactor: replace print statement with logger
strixy16 Dec 16, 2024
f564992
docs: update triangle parameter description in plotCorrelationHeatmap
strixy16 Dec 16, 2024
9b1424a
feat: add helper function to check if the subsetting of a dataframe i…
strixy16 Dec 16, 2024
724ee50
Merge remote-tracking branch 'origin/main' into katys/add-analysis fo…
strixy16 Dec 16, 2024
9af1289
docs/refactor: add parameter descriptions for PlotWriter save, change…
strixy16 Dec 17, 2024
5406928
style: remove blank line after function docstring
strixy16 Dec 17, 2024
573f8a9
refactor: remove unused import
strixy16 Dec 17, 2024
d3f713e
refactor: remove unused error variables in loadFeatureFilesFromImageT…
strixy16 Dec 17, 2024
0012f6e
refactor: change fstring to regular string in error msgs
strixy16 Dec 17, 2024
16c989b
style: remove whitespace around docstrings
strixy16 Dec 17, 2024
657b72a
feat: add all io, data, analyze functions to ruff config
strixy16 Dec 17, 2024
6dee126
style: sorted imports
strixy16 Dec 17, 2024
cd12dd2
refactor: replaced matplotlib import to specifically import Figure
strixy16 Dec 17, 2024
ce56c78
style: sort imports, remove whitespace in docstring
strixy16 Dec 17, 2024
b62b735
style: sort imports
strixy16 Dec 17, 2024
64052c0
feat: add error handling
strixy16 Dec 17, 2024
4cf18a4
refactor: correct help message for overwrite
strixy16 Dec 17, 2024
227e02a
refactor: replace matplotlib import with Figure import
strixy16 Dec 17, 2024
d583a84
style: sort imports
strixy16 Dec 17, 2024
173b212
refactor: remove io readers and data directory for now, will add in s…
strixy16 Dec 17, 2024
5be1831
feat: add check for empty dataframes in getFeatureCorrelations
strixy16 Dec 17, 2024
1 change: 1 addition & 0 deletions config/ruff.toml
@@ -7,6 +7,7 @@ include = [
"src/readii/cli/**/*.py",
"src/readii/negative_controls_refactor/**.py",
"src/readii/io/writers/**.py",
+ "src/readii/analyze/**.py"
]

# extend-exclude is used to exclude directories from the flake8 checks
9 changes: 4 additions & 5 deletions notebooks/nifti_writer_example.ipynb
@@ -16,7 +16,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
@@ -26,7 +26,6 @@
"import subprocess\n",
"import SimpleITK as sitk\n",
"import pandas as pd\n",
- "import uuid\n",
"import random\n",
"import sys\n",
"from readii.utils import logger"
@@ -41,7 +40,7 @@
},
{
"cell_type": "code",
- "execution_count": 14,
+ "execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
@@ -65,7 +64,7 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 3,
"metadata": {},
"outputs": [
{
@@ -210,7 +209,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.12.7"
+ "version": "3.12.8"
}
},
"nbformat": 4,
4,026 changes: 3,253 additions & 773 deletions pixi.lock

Large diffs are not rendered by default.

3 changes: 3 additions & 0 deletions pyproject.toml
@@ -13,6 +13,9 @@ dependencies = [
"pydicom>=2.3.1",
"pyradiomics-bhklab>=3.1.4,<4",
"orcestra-downloader>=0.9.0,<1",
+ "numpy==1.26.4.*",
+ "seaborn>=0.13.2,<0.14",
+ "pandas>=2.2.3,<3"
]
requires-python = ">=3.10, <3.13"

17 changes: 17 additions & 0 deletions src/readii/analyze/__init__.py
@@ -0,0 +1,17 @@
"""Module to perform analysis on READII outputs."""
from .correlation import (
    getCrossCorrelationMatrix,
    getFeatureCorrelations,
    getHorizontalSelfCorrelations,
    getVerticalSelfCorrelations,
)
from .plot_correlation import plotCorrelationHeatmap, plotCorrelationHistogram

__all__ = [
    'getFeatureCorrelations',
    'getVerticalSelfCorrelations',
    'getHorizontalSelfCorrelations',
    'getCrossCorrelationMatrix',
    'plotCorrelationHeatmap',
    'plotCorrelationHistogram'
]
159 changes: 159 additions & 0 deletions src/readii/analyze/correlation.py
@@ -0,0 +1,159 @@
import pandas as pd

from readii.data.select import validateDataframeSubsetSelection
from readii.utils import logger


def getFeatureCorrelations(vertical_features: pd.DataFrame,
                           horizontal_features: pd.DataFrame,
                           method: str = "pearson",
                           vertical_feature_name: str = '_vertical',
                           horizontal_feature_name: str = '_horizontal') -> pd.DataFrame:
    """Calculate correlation between two sets of features.

    Parameters
    ----------
    vertical_features : pd.DataFrame
        Dataframe containing features to calculate correlations with. Index must be the same as the index of the horizontal_features dataframe.
    horizontal_features : pd.DataFrame
        Dataframe containing features to calculate correlations with. Index must be the same as the index of the vertical_features dataframe.
    method : str
        Method to use for calculating correlations. Must be one of "pearson", "spearman", or "kendall". Default is "pearson".
    vertical_feature_name : str
        Name of the vertical features to use as suffix in correlation dataframe. Default is "_vertical".
    horizontal_feature_name : str
        Name of the horizontal features to use as suffix in correlation dataframe. Default is "_horizontal".

    Returns
    -------
    correlation_matrix : pd.DataFrame
        Square dataframe of correlation values for every pair of features, with vertical features first and horizontal features second.
    """
    # Check that features are dataframes
    if not isinstance(vertical_features, pd.DataFrame):
        msg = "vertical_features must be a pandas DataFrame"
        logger.error(msg)
        raise TypeError(msg)
    if not isinstance(horizontal_features, pd.DataFrame):
        msg = "horizontal_features must be a pandas DataFrame"
        logger.error(msg)
        raise TypeError(msg)

    if method not in ["pearson", "spearman", "kendall"]:
        msg = "Correlation method must be one of 'pearson', 'spearman', or 'kendall'."
        logger.error(msg)
        raise ValueError(msg)

    if not vertical_features.index.equals(horizontal_features.index):
        msg = "Vertical and horizontal features must have the same index to calculate correlation. Set the index to the intersection of patient IDs."
        logger.error(msg)
        raise ValueError(msg)

    # Add _ to beginning of feature names if they don't start with _ so they can be used as suffixes
    if not vertical_feature_name.startswith("_"):
        vertical_feature_name = f"_{vertical_feature_name}"
    if not horizontal_feature_name.startswith("_"):
        horizontal_feature_name = f"_{horizontal_feature_name}"

    # Join the features into one dataframe
    # Use inner join to keep only the rows that have a value in both vertical and horizontal features
    features_to_correlate = vertical_features.join(horizontal_features,
                                                   how='inner',
                                                   lsuffix=vertical_feature_name,
                                                   rsuffix=horizontal_feature_name)

    try:
        # Calculate correlation between vertical features and horizontal features
        correlation_matrix = features_to_correlate.corr(method=method)
    except Exception as e:
        msg = f"Error calculating correlation matrix: {e}"
        logger.exception(msg)
        raise e

    return correlation_matrix
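
A note on the join step above: pandas applies `lsuffix` and `rsuffix` only to column names that appear in both dataframes, so only shared feature names pick up the `_vertical` or `_horizontal` suffix in the resulting matrix. The sketch below mimics that labeling, and the vertical-then-horizontal ordering, in plain Python without pandas. `pearson` and `combined_correlations` are hypothetical helper names that are not part of READII, and the feature values are made up for illustration.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def combined_correlations(
    vertical: dict[str, list[float]],
    horizontal: dict[str, list[float]],
    v_suffix: str = "_vertical",
    h_suffix: str = "_horizontal",
) -> tuple[list[str], list[list[float]]]:
    """Hypothetical stand-in for the join + corr steps, using plain dicts."""
    # Only names present on both sides receive a suffix, as in DataFrame.join.
    overlap = vertical.keys() & horizontal.keys()
    columns: dict[str, list[float]] = {}
    for name, values in vertical.items():
        columns[name + v_suffix if name in overlap else name] = values
    for name, values in horizontal.items():
        columns[name + h_suffix if name in overlap else name] = values
    labels = list(columns)  # vertical features first, then horizontal
    matrix = [[pearson(columns[r], columns[c]) for c in labels] for r in labels]
    return labels, matrix


# Made-up feature values for four patients.
vertical = {"mean": [1.0, 2.0, 3.0, 4.0], "entropy": [2.0, 1.0, 4.0, 3.0]}
horizontal = {"mean": [2.0, 4.0, 6.0, 8.0], "volume": [4.0, 3.0, 2.0, 1.0]}
labels, matrix = combined_correlations(vertical, horizontal)
print(labels)
```

Here only `mean` is shared, so it appears as `mean_vertical` and `mean_horizontal`, while `entropy` and `volume` keep their original names.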



def getVerticalSelfCorrelations(correlation_matrix: pd.DataFrame,
                                num_vertical_features: int) -> pd.DataFrame:
    """Get the vertical (y-axis) self correlations from a correlation matrix. Gets the top left quadrant of the correlation matrix.

    Parameters
    ----------
    correlation_matrix : pd.DataFrame
        Dataframe containing the correlation matrix to get the vertical self correlations from.
    num_vertical_features : int
        Number of vertical features in the correlation matrix.

    Returns
    -------
    pd.DataFrame
        Dataframe containing the vertical self correlations from the correlation matrix.
    """
    try:
        validateDataframeSubsetSelection(correlation_matrix, num_vertical_features, num_vertical_features)
    except ValueError as e:
        msg = "Number of vertical features provided is greater than the number of rows or columns in the correlation matrix."
        logger.exception(msg)
        raise e

    # Get the correlation matrix for vertical vs vertical - this is the top left corner of the matrix
    return correlation_matrix.iloc[0:num_vertical_features, 0:num_vertical_features]
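
The bounds check before the `iloc` slice matters because slice-based indexing does not raise when the stop index runs past the end; it returns a smaller block. The body of `validateDataframeSubsetSelection` is not shown in this diff, so the following is only a guess at the kind of check it performs. `check_subset_bounds` is a hypothetical name, and it takes a plain `(rows, columns)` shape so the sketch runs without pandas.

```python
def check_subset_bounds(shape: tuple[int, int], num_rows: int, num_cols: int) -> None:
    """Raise ValueError if the requested subset exceeds the matrix shape.

    Hypothetical helper for illustration; not READII's actual validator.
    """
    total_rows, total_cols = shape
    if num_rows > total_rows:
        msg = f"Requested {num_rows} rows, but the matrix has only {total_rows}."
        raise ValueError(msg)
    if num_cols > total_cols:
        msg = f"Requested {num_cols} columns, but the matrix has only {total_cols}."
        raise ValueError(msg)


# A 4x4 correlation matrix can supply a 2x2 block...
check_subset_bounds((4, 4), 2, 2)

# ...but not a 5x5 one.
try:
    check_subset_bounds((4, 4), 5, 5)
except ValueError as err:
    error_message = str(err)
    print(error_message)
```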



def getHorizontalSelfCorrelations(correlation_matrix: pd.DataFrame,
                                  num_horizontal_features: int) -> pd.DataFrame:
    """Get the horizontal (x-axis) self correlations from a correlation matrix. Gets the bottom right quadrant of the correlation matrix.

    Parameters
    ----------
    correlation_matrix : pd.DataFrame
        Dataframe containing the correlation matrix to get the horizontal self correlations from.
    num_horizontal_features : int
        Number of horizontal features in the correlation matrix.

    Returns
    -------
    pd.DataFrame
        Dataframe containing the horizontal self correlations from the correlation matrix.
    """
    try:
        validateDataframeSubsetSelection(correlation_matrix, num_horizontal_features, num_horizontal_features)
    except ValueError as e:
        msg = "Number of horizontal features provided is greater than the number of rows or columns in the correlation matrix."
        logger.exception(msg)
        raise e

    # Get the index of the start of the horizontal correlations
    start_of_horizontal_correlations = len(correlation_matrix.columns) - num_horizontal_features

    # Get the correlation matrix for horizontal vs horizontal - this is the bottom right corner of the matrix
    return correlation_matrix.iloc[start_of_horizontal_correlations:, start_of_horizontal_correlations:]
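
The subsetting helpers share a log-then-re-raise pattern around validation. A minimal standard-library sketch of that pattern follows. In Python's `logging` module, `logger.exception` is meant to be called from inside an `except` block, where it logs at ERROR level and attaches the active traceback. The logger name, the `ListHandler` class, and `first_block` are assumptions made for this example and are not READII code.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log records in memory so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


# Stand-in logger for illustration; READII's own logger is configured elsewhere.
logger = logging.getLogger("correlation_sketch")
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)


def first_block(matrix: list[list[float]], size: int) -> list[list[float]]:
    """Return the top-left size x size block, logging and re-raising on bad input."""
    try:
        if size > len(matrix):
            raise ValueError(f"size {size} exceeds matrix dimension {len(matrix)}")
    except ValueError:
        # Inside the handler, logger.exception records the message plus the traceback.
        logger.exception("Requested block is larger than the matrix.")
        raise  # a bare raise re-raises the active exception unchanged
    return [row[:size] for row in matrix[:size]]


try:
    first_block([[1.0]], 2)
except ValueError as err:
    propagated_message = str(err)
```

The caller still receives the original `ValueError`, while the log record carries the traceback for later debugging.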



def getCrossCorrelationMatrix(correlation_matrix: pd.DataFrame,
                              num_vertical_features: int) -> pd.DataFrame:
    """Get the cross correlation matrix subsection for a correlation matrix. Gets the top right quadrant of the correlation matrix so vertical and horizontal features are correctly labeled.

    Parameters
    ----------
    correlation_matrix : pd.DataFrame
        Dataframe containing the correlation matrix to get the cross correlation matrix subsection from.
    num_vertical_features : int
        Number of vertical features in the correlation matrix.

    Returns
    -------
    pd.DataFrame
        Dataframe containing the cross correlations from the correlation matrix.
    """
    try:
        validateDataframeSubsetSelection(correlation_matrix, num_vertical_features, num_vertical_features)
    except ValueError as e:
        msg = "Number of vertical features provided is greater than the number of rows or columns in the correlation matrix."
        logger.exception(msg)
        raise e

    return correlation_matrix.iloc[0:num_vertical_features, num_vertical_features:]
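
To make the quadrant layout concrete, here is the same index arithmetic on a plain nested list: top left for vertical self-correlations, bottom right for horizontal self-correlations, and top right for cross-correlations. This is a sketch that assumes the vertical features come first in the combined matrix, as they do after the join in `getFeatureCorrelations`. The function names and the numbers in `matrix` are invented for illustration.

```python
def top_left(matrix: list[list[float]], num_vertical: int) -> list[list[float]]:
    """Vertical vs vertical block (rows and columns 0..num_vertical-1)."""
    return [row[:num_vertical] for row in matrix[:num_vertical]]


def bottom_right(matrix: list[list[float]], num_horizontal: int) -> list[list[float]]:
    """Horizontal vs horizontal block (the last num_horizontal rows and columns)."""
    start = len(matrix) - num_horizontal
    return [row[start:] for row in matrix[start:]]


def top_right(matrix: list[list[float]], num_vertical: int) -> list[list[float]]:
    """Cross block: vertical features as rows, horizontal features as columns."""
    return [row[num_vertical:] for row in matrix[:num_vertical]]


# Made-up 3x3 correlation matrix: two vertical features, then one horizontal feature.
matrix = [
    [1.0, 0.2, 0.7],
    [0.2, 1.0, -0.4],
    [0.7, -0.4, 1.0],
]

vertical_self = top_left(matrix, 2)
horizontal_self = bottom_right(matrix, 1)
cross = top_right(matrix, 2)
print(vertical_self, horizontal_self, cross)
```

Taking the top right rather than the bottom left keeps vertical features on the rows and horizontal features on the columns, which matches the labeling described in the docstring above.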