-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: update correlation functions to get subsections of matrix #103
Conversation
…pe name Filter the dataframe by column and/or index suffixes instead of number of columns and rows.
WalkthroughThe pull request introduces significant modifications to the correlation analysis module in the Changes
Sequence DiagramsequenceDiagram
participant User
participant CorrelationModule
participant CorrelationMatrix
User->>CorrelationModule: getSelfAndCrossCorrelations(correlation_matrix)
CorrelationModule->>CorrelationMatrix: Filter self correlations
CorrelationModule->>CorrelationMatrix: Filter cross correlations
CorrelationModule-->>User: Return combined correlation results
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/readii/analyze/correlation.py (3)
82-84
: Confirm naming consistency in docstring.
The docstring references "vertical self correlations" even though the method has been renamed to getSelfCorrelations, which works for any selected suffix. Updating the docstring will help maintain consistency.
101-102
: Fix grammar in error message.
The error message "No features with found with ..." is missing a word and may be confusing.Apply this diff to correct the grammar:
- msg = f"No features with found with {feature_type_name} suffix in the correlation matrix." + msg = f"No features found with {feature_type_name} suffix in the correlation matrix."
132-133
: Fix grammar in error message.
Similar to the earlier comment, the message "No features with found with ..." is missing a word.Apply this diff:
- msg = f"No features with found with {vertical_feature_name} and {horizontal_feature_name} suffix in the correlation matrix." + msg = f"No features found with {vertical_feature_name} and {horizontal_feature_name} suffix in the correlation matrix."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/readii/analyze/correlation.py
(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
src/readii/analyze/correlation.py
173-173: Undefined name getCrossCorrelationMatrix
(F821)
🔇 Additional comments (1)
src/readii/analyze/correlation.py (1)
110-113
: Assess filter approach for cross correlations.
Using filter(like=vertical_feature_name) and filter(like=horizontal_feature_name) may inadvertently match columns if their names simply contain the substring rather than specifically ending with the name. Consider being more strict if you anticipate columns that contain these substrings but are not actually the intended features.
src/readii/analyze/correlation.py
Outdated
def getSelfAndCrossCorrelations(correlation_matrix:pd.DataFrame, | ||
vertical_feature_name:str = '_vertical', | ||
horizontal_feature_name:str = '_horizontal') -> pd.DataFrame: | ||
"""Get the vertical and horizontal self correlations and cross correlations from a correlation matrix. | ||
|
||
Parameters | ||
---------- | ||
correlation_matrix : pd.DataFrame | ||
Dataframe containing the correlation matrix to get the cross correlation matrix subsection from. | ||
num_vertical_features : int | ||
Number of vertical features in the correlation matrix. | ||
|
||
Dataframe containing the correlation matrix to get the self and cross correlations from. | ||
vertical_feature_name : str | ||
Name of the vertical feature type to get self correlations for. Must be the suffix of some feature names in the correlation matrix. | ||
horizontal_feature_name : str | ||
Name of the horizontal feature type to get self correlations for. Must be the suffix of some feature names in the correlation matrix. | ||
|
||
Returns | ||
------- | ||
pd.DataFrame | ||
vertical_correlations : pd.DataFrame | ||
Dataframe containing the vertical self correlations from the correlation matrix. | ||
horizontal_correlations : pd.DataFrame | ||
Dataframe containing the horizontal self correlations from the correlation matrix. | ||
cross_correlations : pd.DataFrame | ||
Dataframe containing the cross correlations from the correlation matrix. | ||
""" | ||
try: | ||
validateDataframeSubsetSelection(correlation_matrix, num_vertical_features, num_vertical_features) | ||
except ValueError as e: | ||
msg = "Number of vertical features provided is greater than the number of rows or columns in the correlation matrix." | ||
vertical_correlations, horizontal_correlations = getSelfCorrelations(correlation_matrix, vertical_feature_name), getSelfCorrelations(correlation_matrix, horizontal_feature_name) | ||
|
||
except Exception as e: | ||
msg = f"Error getting self correlations from correlation matrix: {e}" | ||
logger.exception(msg) | ||
raise e | ||
|
||
return correlation_matrix.iloc[0:num_vertical_features, num_vertical_features:] No newline at end of file | ||
try: | ||
cross_correlations = getCrossCorrelationMatrix(correlation_matrix, vertical_feature_name, horizontal_feature_name) | ||
except Exception as e: | ||
msg = f"Error getting cross correlations from correlation matrix: {e}" | ||
logger.exception(msg) | ||
raise e | ||
|
||
return vertical_correlations, horizontal_correlations, cross_correlations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Undefined function call: getCrossCorrelationMatrix.
The function getCrossCorrelationMatrix is no longer defined, which triggers an error and contradicts the new consolidated structure. Replace this call with getCrossCorrelations to align with the new architecture.
Apply this diff to fix the undefined call:
- cross_correlations = getCrossCorrelationMatrix(correlation_matrix, vertical_feature_name, horizontal_feature_name)
+ cross_correlations = getCrossCorrelations(correlation_matrix, vertical_feature_name, horizontal_feature_name)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def getSelfAndCrossCorrelations(correlation_matrix:pd.DataFrame, | |
vertical_feature_name:str = '_vertical', | |
horizontal_feature_name:str = '_horizontal') -> pd.DataFrame: | |
"""Get the vertical and horizontal self correlations and cross correlations from a correlation matrix. | |
Parameters | |
---------- | |
correlation_matrix : pd.DataFrame | |
Dataframe containing the correlation matrix to get the cross correlation matrix subsection from. | |
num_vertical_features : int | |
Number of vertical features in the correlation matrix. | |
Dataframe containing the correlation matrix to get the self and cross correlations from. | |
vertical_feature_name : str | |
Name of the vertical feature type to get self correlations for. Must be the suffix of some feature names in the correlation matrix. | |
horizontal_feature_name : str | |
Name of the horizontal feature type to get self correlations for. Must be the suffix of some feature names in the correlation matrix. | |
Returns | |
------- | |
pd.DataFrame | |
vertical_correlations : pd.DataFrame | |
Dataframe containing the vertical self correlations from the correlation matrix. | |
horizontal_correlations : pd.DataFrame | |
Dataframe containing the horizontal self correlations from the correlation matrix. | |
cross_correlations : pd.DataFrame | |
Dataframe containing the cross correlations from the correlation matrix. | |
""" | |
try: | |
validateDataframeSubsetSelection(correlation_matrix, num_vertical_features, num_vertical_features) | |
except ValueError as e: | |
msg = "Number of vertical features provided is greater than the number of rows or columns in the correlation matrix." | |
vertical_correlations, horizontal_correlations = getSelfCorrelations(correlation_matrix, vertical_feature_name), getSelfCorrelations(correlation_matrix, horizontal_feature_name) | |
except Exception as e: | |
msg = f"Error getting self correlations from correlation matrix: {e}" | |
logger.exception(msg) | |
raise e | |
return correlation_matrix.iloc[0:num_vertical_features, num_vertical_features:] | |
\ No newline at end of file | |
try: | |
cross_correlations = getCrossCorrelationMatrix(correlation_matrix, vertical_feature_name, horizontal_feature_name) | |
except Exception as e: | |
msg = f"Error getting cross correlations from correlation matrix: {e}" | |
logger.exception(msg) | |
raise e | |
return vertical_correlations, horizontal_correlations, cross_correlations | |
def getSelfAndCrossCorrelations(correlation_matrix:pd.DataFrame, | |
vertical_feature_name:str = '_vertical', | |
horizontal_feature_name:str = '_horizontal') -> pd.DataFrame: | |
"""Get the vertical and horizontal self correlations and cross correlations from a correlation matrix. | |
Parameters | |
---------- | |
correlation_matrix : pd.DataFrame | |
Dataframe containing the correlation matrix to get the self and cross correlations from. | |
vertical_feature_name : str | |
Name of the vertical feature type to get self correlations for. Must be the suffix of some feature names in the correlation matrix. | |
horizontal_feature_name : str | |
Name of the horizontal feature type to get self correlations for. Must be the suffix of some feature names in the correlation matrix. | |
Returns | |
------- | |
vertical_correlations : pd.DataFrame | |
Dataframe containing the vertical self correlations from the correlation matrix. | |
horizontal_correlations : pd.DataFrame | |
Dataframe containing the horizontal self correlations from the correlation matrix. | |
cross_correlations : pd.DataFrame | |
Dataframe containing the cross correlations from the correlation matrix. | |
""" | |
try: | |
vertical_correlations, horizontal_correlations = getSelfCorrelations(correlation_matrix, vertical_feature_name), getSelfCorrelations(correlation_matrix, horizontal_feature_name) | |
except Exception as e: | |
msg = f"Error getting self correlations from correlation matrix: {e}" | |
logger.exception(msg) | |
raise e | |
try: | |
cross_correlations = getCrossCorrelations(correlation_matrix, vertical_feature_name, horizontal_feature_name) | |
except Exception as e: | |
msg = f"Error getting cross correlations from correlation matrix: {e}" | |
logger.exception(msg) | |
raise e | |
return vertical_correlations, horizontal_correlations, cross_correlations |
🧰 Tools
🪛 Ruff (0.8.2)
173-173: Undefined name getCrossCorrelationMatrix
(F821)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/readii/analyze/correlation.py (2)
82-106
: Fix minor grammar in error message.
The message "No features with found with {feature_type_name}" is slightly off.Here’s a quick fix:
-if self_correlations.empty: - msg = f"No features with found with {feature_type_name} suffix in the correlation matrix." +if self_correlations.empty: + msg = f"No features found with {feature_type_name} suffix in the correlation matrix."
110-137
: Correct grammar in error message.
Similarly, correct the message "No features with found with {vertical_feature_name} and {horizontal_feature_name}" to be more precise.-if cross_correlations.empty: - msg = f"No features with found with {vertical_feature_name} and {horizontal_feature_name} suffix in the correlation matrix." +if cross_correlations.empty: + msg = f"No features found with {vertical_feature_name} and {horizontal_feature_name} suffix in the correlation matrix."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pixi.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
src/readii/analyze/__init__.py
(1 hunks)src/readii/analyze/correlation.py
(1 hunks)src/readii/analyze/plot_correlation.py
(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
src/readii/analyze/plot_correlation.py
11-11: readii.io.writers.base_writer.BaseWriter
imported but unused
Remove unused import: readii.io.writers.base_writer.BaseWriter
(F401)
🔇 Additional comments (3)
src/readii/analyze/plot_correlation.py (1)
191-191
: Return tuple matches function docstring.
This return statement aligns with the documented triple return (figure, bin values, and bin edges).
src/readii/analyze/__init__.py (1)
3-6
: Reorganized imports and all look consistent.
These changes properly align the public API with the newly renamed methods.
Also applies to: 12-14
src/readii/analyze/correlation.py (1)
141-179
: Overall logic for consolidating self and cross correlations looks good.
The new structure is more modular and clear, allowing users to retrieve multiple subsets of the correlation matrix in a single function call.
@@ -7,6 +7,9 @@ | |||
from matplotlib.figure import Figure | |||
from scipy.linalg import issymmetric | |||
|
|||
from readii.analyze.correlation import getSelfCorrelations | |||
from readii.io.writers.base_writer import BaseWriter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Remove unused import.
The static analysis tool indicates that BaseWriter is imported but never used.
You can remove it as shown below:
-from readii.io.writers.base_writer import BaseWriter
🧰 Tools
🪛 Ruff (0.8.2)
11-11: readii.io.writers.base_writer.BaseWriter
imported but unused
Remove unused import: readii.io.writers.base_writer.BaseWriter
(F401)
|
||
def plotSelfCorrHeatmap(correlation_matrix:pd.DataFrame, | ||
feature_type_name:str, | ||
correlation_method:str = "pearson", | ||
cmap='nipy_spectral', | ||
save_path:Optional[str] = None,) -> None: | ||
|
||
self_corr = getSelfCorrelations(correlation_matrix, feature_type_name) | ||
|
||
self_corr_heatmap = plotCorrelationHeatmap(self_corr, | ||
diagonal=True, | ||
cmap=cmap, | ||
xlabel=feature_type_name, | ||
ylabel=feature_type_name, | ||
title=f"{correlation_method.capitalize()} Self Correlations", subtitle=f"{feature_type_name}") | ||
|
||
|
||
if save_path is not None: | ||
heatmap_writer = PlotWriter(root_directory = save_path / "heatmap", | ||
filename_format = "{ColorMap}/" + "{FeatureType}_{CorrelationType}_self_correlation_heatmap.png", | ||
overwrite = False, | ||
create_dirs = True | ||
) | ||
|
||
self_corr_save_path = heatmap_writer.resolve_path(FeatureType = feature_type_name, | ||
CorrelationType = correlation_method, | ||
ColorMap = cmap) | ||
if self_corr_save_path.exists(): | ||
logger.warning(f"Correlation heatmap already exists at {self_corr_save_path}.") | ||
|
||
else: | ||
logger.debug("Saving correlation heatmaps.") | ||
|
||
self_corr_save_path = heatmap_writer.save(self_corr_heatmap, FeatureType = feature_type_name, CorrelationType = correlation_method, ColorMap = cmap) | ||
|
||
return self_corr_heatmap, self_corr_save_path | ||
|
||
else: | ||
return self_corr_heatmap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Update the function’s declared return type for clarity.
Currently, the docstring and type hint (→ None) do not match the actual return values. This inconsistency might confuse users since the function returns a tuple (figure, path) when save_path is provided or just one object otherwise.
Consider unifying the return type so it's consistent, for example:
-def plotSelfCorrHeatmap(correlation_matrix:pd.DataFrame,
- feature_type_name:str,
- correlation_method:str = "pearson",
- cmap='nipy_spectral',
- save_path:Optional[str] = None,) -> None:
+def plotSelfCorrHeatmap(correlation_matrix: pd.DataFrame,
+ feature_type_name: str,
+ correlation_method: str = "pearson",
+ cmap: str = 'nipy_spectral',
+ save_path: Optional[str] = None,) -> Union[Figure, Tuple[Figure, Path]]:
And update the docstring to reflect the actual return(s).
Committable suggestion skipped: line range outside the PR's diff.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #103 +/- ##
==========================================
- Coverage 43.76% 41.91% -1.85%
==========================================
Files 33 33
Lines 1451 1453 +2
==========================================
- Hits 635 609 -26
- Misses 816 844 +28 ☔ View full report in Codecov by Sentry. |
Summary by CodeRabbit
New Features
Bug Fixes
Documentation