-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add analysis functions #92
Conversation
Had to specify numpy due to dependency clash between seaborn and med-imagetools
Adding numpy and seaborn seems to have impacted the pandas dependency that was being handled before, needs to be explicitly added now
WalkthroughThis pull request introduces enhancements to the READII project, focusing on data analysis capabilities. Key changes include the addition of a new analysis module for feature correlation, updated project dependencies, improved error handling in plot writing, and expanded platform support. The version has been incremented from 1.25.0 to 1.26.0, reflecting these improvements. Changes
Sequence DiagramsequenceDiagram
participant User
participant CorrelationModule
participant DataFrame
participant Plotter
User->>CorrelationModule: Request feature correlations
CorrelationModule->>DataFrame: Validate and compute correlations
DataFrame-->>CorrelationModule: Return correlation matrix
CorrelationModule->>Plotter: Generate correlation visualization
Plotter-->>User: Return correlation plot
Poem
Tip CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 12
🧹 Nitpick comments (10)
src/readii/analyze/plot_correlation.py (6)
29-29
: Correct typographical error in the docstring.There's a typo in the parameter description for
triangle
. "matrixto" should be corrected to "matrix to".Apply this diff to fix the typo:
- Which triangle half of the matrixto plot. The default is "lower". + Which triangle half of the matrix to plot. The default is "lower".
48-48
: Fix typo in the debug logging message.In the debug message, "traingle" should be corrected to "triangle".
Apply this diff to correct the spelling:
- logger.debug(f"Creating {triangle} traingle mask for diagonal correlation plot.") + logger.debug(f"Creating {triangle} triangle mask for diagonal correlation plot.")
62-62
: Correct typographical error in the comment.There's a typo in the comment: "visisble" should be "visible".
Apply this diff to fix the typo:
- # The entire correlation matrix will be visisble in the plot + # The entire correlation matrix will be visible in the plot
73-77
: Ensure consistent spacing in function parameters.In the
sns.heatmap
function call, spacing around=
should be consistent for readability.Apply this diff to standardize the spacing:
corr_ax = sns.heatmap(correlation_matrix_df, - mask = mask, + mask=mask, cmap=cmap, vmin=-1.0, vmax=1.0)
141-142
: Replacelogger
method.Using
logger
for consistent logging practices.Apply this diff to use the logger:
- print("Correlation matrix is symmetric.") + logger.info("Correlation matrix is symmetric.")
27-29
: Clarify thediagonal
parameter in the docstring.The description for
diagonal
might be misleading. Since settingdiagonal=True
results in plotting only half of the matrix, consider renaming it tomask_half
or updating the docstring for clarity.If renaming is not preferred, consider updating the docstring as follows:
- diagonal : bool, optional - Whether to only plot half of the matrix. The default is False. + diagonal : bool, optional + Whether to apply a mask and plot only one triangle half of the matrix. The default is False.src/readii/analyze/correlation.py (1)
50-50
: Fix typographical error in comment.Correct the typo "beginnging" to "beginning".
Apply this diff:
- # Add _ to beginnging of feature names if they don't start with _ so they can be used as suffixes + # Add _ to beginning of feature names if they don't start with _ so they can be used as suffixestests/analyze/test_correlation.py (2)
68-72
: Enhance exception testing by specifying expected error messages.When using
pytest.raises
, include thematch
parameter to ensure the correct exception is raised.Apply this diff to specify the expected error message:
with pytest.raises(ValueError): + # Expected message: "Correlation method must be one of 'pearson', 'spearman', or 'kendall'." getFeatureCorrelations(vertical_features=random_features, horizontal_features=random_features, method=correlation_method)
Or better yet:
with pytest.raises(ValueError, match="Correlation method must be one of 'pearson', 'spearman', or 'kendall'\\."): getFeatureCorrelations(vertical_features=random_features, horizontal_features=random_features, method=correlation_method)
84-93
: Enhance exception testing by specifying expected error messages.Include the
match
parameter inpytest.raises
for testingTypeError
.Apply this diff:
with pytest.raises(TypeError, match="horizontal_features must be a pandas DataFrame"): getFeatureCorrelations(vertical_features=random_features, horizontal_features=wrong_features, method="pearson") with pytest.raises(TypeError, match="vertical_features must be a pandas DataFrame"): getFeatureCorrelations(vertical_features=wrong_features, horizontal_features=random_features, method="pearson")pyproject.toml (1)
16-16
: Consider relaxing the numpy version constraintThe current pinned version
numpy==1.26.4.*
is very restrictive and might cause dependency conflicts with other packages that depend on numpy. Consider using a more flexible version range like>=1.26.4,<2
to allow for compatible updates while maintaining stability.- "numpy==1.26.4.*", + "numpy>=1.26.4,<2",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pixi.lock
is excluded by!**/*.lock
📒 Files selected for processing (5)
pyproject.toml
(1 hunks)src/readii/analyze/__init__.py
(1 hunks)src/readii/analyze/correlation.py
(1 hunks)src/readii/analyze/plot_correlation.py
(1 hunks)tests/analyze/test_correlation.py
(1 hunks)
🔇 Additional comments (1)
pyproject.toml (1)
16-18
: Verify version compatibility
Let's ensure the new dependencies are compatible with:
- The project's Python version range (3.10-3.12)
- Each other (numpy, pandas, seaborn have interdependencies)
- Existing dependencies (especially matplotlib)
✅ Verification successful
Let me gather information about matplotlib compatibility since it's mentioned in the review.
Dependencies version compatibility verified
All dependencies are compatible:
- Python version compatibility: All packages support Python 3.10-3.12 (numpy ≥3.10, pandas ≥3.9, seaborn ≥3.8)
- Inter-package compatibility:
- seaborn 0.13.2 requires numpy ≥1.20 and pandas ≥1.2
- pandas 2.2.3 requires numpy ≥1.26.0 for Python 3.12
- matplotlib ≥3.9.2 requires numpy ≥1.23
- The specified versions in pyproject.toml (numpy 1.26.4.*, pandas ≥2.2.3, seaborn ≥0.13.2) satisfy all these constraints
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check compatibility of new dependencies
# Check Python version compatibility for new packages
for pkg in numpy pandas seaborn; do
echo "Checking $pkg compatibility..."
curl -s "https://pypi.org/pypi/$pkg/json" | jq -r '.info | "Package: \(.name)\nPython requires: \(.requires_python)\nLatest version: \(.version)"'
done
# Check if seaborn declares compatible numpy/pandas versions
curl -s "https://pypi.org/pypi/seaborn/0.13.2/json" | jq -r '.info.requires_dist[]' | grep -E 'numpy|pandas'
# Check if pandas declares compatible numpy versions
curl -s "https://pypi.org/pypi/pandas/2.2.3/json" | jq -r '.info.requires_dist[]' | grep 'numpy'
Length of output: 1371
Script:
#!/bin/bash
# Check matplotlib dependencies and version constraints
echo "Checking matplotlib in pyproject.toml..."
rg "matplotlib" pyproject.toml
echo -e "\nChecking matplotlib compatibility..."
curl -s "https://pypi.org/pypi/matplotlib/json" | jq -r '.info.requires_dist[]' | grep -E 'numpy|pandas'
# Get the specific matplotlib version if defined
echo -e "\nGetting matplotlib version constraint..."
rg -A 1 -B 1 "matplotlib" pyproject.toml
Length of output: 566
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #92 +/- ##
==========================================
- Coverage 49.04% 41.23% -7.81%
==========================================
Files 25 32 +7
Lines 991 1363 +372
==========================================
+ Hits 486 562 +76
- Misses 505 801 +296 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/readii/io/writers/plot_writer.py (3)
9-13
: Remove Redundantpass
Statements in Exception ClassesWhen an exception class contains only a docstring, the
pass
statement is unnecessary because the docstring suffices to define the class body.Apply this diff to remove the unnecessary
pass
statements:class PlotWriterError(Exception): """Base exception for PlotWriter errors.""" - pass class PlotWriterIOError(PlotWriterError): """Raised when I/O operations fail.""" - passAlso applies to: 15-19
48-54
: Catch Specific Exceptions Instead of GeneralException
Catching all exceptions with
except Exception as e
is too broad and can mask unexpected errors. It's better to catch specific exceptions that may occur during the save operation, such asOSError
.Apply this diff to catch a more specific exception:
- except Exception as e: + except OSError as e:This change ensures that only I/O-related errors are caught, allowing other exceptions to propagate appropriately.
24-29
: Evaluate the Necessity ofmetadata
in theoverwrite
FieldThe
metadata
parameter in theoverwrite
field is currently not utilized within the class. If it's not serving a purpose, consider removing it to simplify the code.Apply this diff to remove the unused
metadata
:overwrite: bool = field( default=False, - metadata={ - "help": "If True, allows overwriting existing files. If False, raises PlotWriterIOError." - }, )If the
metadata
is intended for future use or integration with external tools (e.g., for command-line interfaces or documentation), you may choose to retain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
notebooks/nifti_writer_example.ipynb (2)
43-43
: Consider enhancing the CSVWriter implementation.While the current implementation is functional, consider these improvements:
- Add more specific type hints for the
data
parameter (e.g.,List[Dict[str, Any]]
)- Add docstring explaining the expected data format
- Add error handling for potential pandas DataFrame conversion issues
Example enhancement:
class CSVWriter(BaseWriter): + """Writer class for saving data to CSV files. + + Args: + data (List[Dict[str, Any]]): List of dictionaries to be converted to CSV + **kwargs: Additional arguments for path resolution + + Returns: + Path: Path to the saved CSV file + + Raises: + ValueError: If data cannot be converted to DataFrame + """ def save(self, data: list, **kwargs) -> Path: + if not data: + raise ValueError("Empty data provided") output_path = self.resolve_path(**kwargs) with output_path.open('w') as f: pd.DataFrame(data).to_csv(f, index=False) return output_path
67-67
: Consider improving example code structure and configurability.The example demonstrates good practices with context managers and error handling. Consider these enhancements:
- Extract magic numbers into named constants
- Make modalities configurable
- Add more comprehensive error handling
Example enhancement:
+ # Configuration + IMAGE_SIZES = { + "small": (10, 10, 10), + "medium": (20, 20, 20), + "large": (30, 30, 30) + } + MODALITIES = ["CT", "RTSTRUCT"] + data_sets = [] random.seed(42) - for MODALITY in ["CT", "RTSTRUCT"]: + for MODALITY in MODALITIES: data_sets.extend([ { "image": sitk.Image( - 10, 10, 10, + *IMAGE_SIZES["small"], sitk.sitkInt16 ), # ... rest of the code
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
notebooks/nifti_writer_example.ipynb
(4 hunks)
🔇 Additional comments (3)
notebooks/nifti_writer_example.ipynb (3)
19-19
: LGTM! Import statements are appropriate.
The imports are well-organized and necessary for the demonstrated functionality.
Line range hint 1-212
: Notebook effectively demonstrates NIFTI writing functionality.
The notebook serves as a good example of using the NIFTI writer class and aligns well with the PR objectives. It demonstrates proper file handling, error management, and includes good practices like using context managers and setting random seeds for reproducibility.
212-212
: Verify Python version compatibility with dependencies.
The Python version has been updated to 3.12.8. Ensure all dependencies support this version.
✅ Verification successful
Python version update appears safe to proceed
The update from Python 3.12.7 to 3.12.8 is a patch-level change that only includes bug fixes. The notebook uses standard libraries and common packages (SimpleITK, pandas) that are well-maintained and compatible with Python 3.12. No explicit version constraints were found in the project configuration files that would conflict with Python 3.12.8.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if pyproject.toml specifies Python version constraints
if [ -f "pyproject.toml" ]; then
echo "Checking Python version constraints in pyproject.toml:"
grep -A 5 "python_requires" pyproject.toml
fi
# Check if requirements.txt exists and contains version constraints
if [ -f "requirements.txt" ]; then
echo "Checking requirements.txt for potential version conflicts:"
grep -i "python_version" requirements.txt
fi
Length of output: 219
Script:
#!/bin/bash
# Let's check for any setup.py or setup.cfg files that might contain Python version requirements
if [ -f "setup.py" ]; then
echo "Checking setup.py for Python version requirements:"
cat setup.py
fi
if [ -f "setup.cfg" ]; then
echo "Checking setup.cfg for Python version requirements:"
cat setup.cfg
fi
# Let's also check for any environment.yml or conda files
if [ -f "environment.yml" ]; then
echo "Checking environment.yml for Python version:"
cat environment.yml
fi
# Let's check the notebook metadata for any kernel specifications
echo "Checking notebook kernel specifications:"
fd -e ipynb . -x jq -r '.metadata.kernelspec // empty'
Length of output: 759
Script:
#!/bin/bash
# Let's check for any pixi.toml or other dependency files that might specify Python version
if [ -f "pixi.toml" ]; then
echo "Checking pixi.toml for Python version requirements:"
cat pixi.toml
fi
# Let's also check for any Python imports in the notebook to see what dependencies are used
echo "Checking imports in the notebook:"
jq -r '.cells[] | select(.cell_type == "code") | .source[]' notebooks/nifti_writer_example.ipynb | grep -E "^import|^from.*import"
Length of output: 485
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (2)
src/readii/analyze/plot_correlation.py (2)
31-31
: Fix typographical error in docstringThere's a missing space in the docstring. Correct "matrixto" to "matrix to".
Apply this diff to fix the typo:
- Which triangle half of the matrixto plot. The default is "lower". + Which triangle half of the matrix to plot. The default is "lower".
64-64
: Fix typographical error in commentCorrect "visisble" to "visible" in the comment.
Apply this diff:
- # The entire correlation matrix will be visisble in the plot + # The entire correlation matrix will be visible in the plot
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pixi.lock
is excluded by!**/*.lock
📒 Files selected for processing (8)
.github/workflows/ci-cd.yml
(1 hunks)CHANGELOG.md
(1 hunks)pyproject.toml
(3 hunks)src/readii/__init__.py
(1 hunks)src/readii/analyze/__init__.py
(1 hunks)src/readii/analyze/correlation.py
(1 hunks)src/readii/analyze/plot_correlation.py
(1 hunks)src/readii/io/writers/plot_writer.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/readii/init.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/readii/analyze/init.py
- pyproject.toml
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md
[uncategorized] ~12-~12: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...r mac platform to the pixi configuration so developers with those machines can be...
(COMMA_COMPOUND_SENTENCE_2)
🪛 Markdownlint (0.37.0)
CHANGELOG.md
20-20: Column: 20
Hard tabs
(MD010, no-hard-tabs)
20-20: Column: 80
Hard tabs
(MD010, no-hard-tabs)
21-21: Column: 24
Hard tabs
(MD010, no-hard-tabs)
🔇 Additional comments (14)
.github/workflows/ci-cd.yml (1)
19-19
: LGTM! Good addition of Windows testing environment.
Adding windows-latest
to the OS matrix improves cross-platform compatibility testing coverage.
CHANGELOG.md (1)
4-26
: LGTM! Well-structured changelog entry.
The changelog entry follows best practices:
- Uses conventional commit format
- Includes PR link and commit hash
- Provides clear description of changes
- Contains auto-generated summary
🧰 Tools
🪛 LanguageTool
[uncategorized] ~12-~12: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...r mac platform to the pixi configuration so developers with those machines can be...
(COMMA_COMPOUND_SENTENCE_2)
🪛 Markdownlint (0.37.0)
20-20: Column: 20
Hard tabs
(MD010, no-hard-tabs)
20-20: Column: 80
Hard tabs
(MD010, no-hard-tabs)
21-21: Column: 24
Hard tabs
(MD010, no-hard-tabs)
src/readii/analyze/plot_correlation.py (1)
61-61
: Include error message when raising ValueError
Currently, ValueError
is raised without an error message. Including the message improves debugging and clarity.
Apply this diff to include the error message:
- raise ValueError()
+ raise ValueError(msg)
src/readii/analyze/correlation.py (10)
33-33
: Include error message when raising TypeError
Currently, TypeError
is raised without an error message. Including the message improves debugging and clarity.
Apply this diff:
logger.exception(msg)
- raise TypeError()
+ raise TypeError(msg)
37-37
: Include error message when raising TypeError
Consistent exception messages enhance debugging.
Apply this diff:
logger.exception(msg)
- raise TypeError()
+ raise TypeError(msg)
42-42
: Include error message when raising ValueError
Adding the error message improves clarity on the exception raised.
Apply this diff:
logger.exception(msg)
- raise ValueError()
+ raise ValueError(msg)
47-47
: Include error message when raising ValueError
Including the message aids in debugging.
Apply this diff:
logger.exception(msg)
- raise ValueError()
+ raise ValueError(msg)
93-93
: Include error message when raising ValueError
Consistent error messages provide better debugging information.
Apply this diff:
logger.exception(msg)
- raise ValueError()
+ raise ValueError(msg)
98-98
: Include error message when raising ValueError
Same as above for consistency.
Apply this diff:
logger.exception(msg)
- raise ValueError()
+ raise ValueError(msg)
125-125
: Include error message when raising ValueError
Adding the message enhances clarity.
Apply this diff:
logger.exception(msg)
- raise ValueError()
+ raise ValueError(msg)
130-130
: Include error message when raising ValueError
Include the error message for better debugging.
Apply this diff:
logger.exception(msg)
- raise ValueError()
+ raise ValueError(msg)
160-160
: Include error message when raising ValueError
Consistent exception handling improves maintainability.
Apply this diff:
logger.exception(msg)
- raise ValueError()
+ raise ValueError(msg)
165-165
: Include error message when raising ValueError
Adding the message aids in debugging.
Apply this diff:
logger.exception(msg)
- raise ValueError()
+ raise ValueError(msg)
src/readii/io/writers/plot_writer.py (1)
1-72
: LGTM!
The PlotWriter
class is well-implemented with appropriate exception handling and logging.
… name values Co-authored-by: Jermiah Joseph <[email protected]>
…s out of bounds for that dataframe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/readii/analyze/correlation.py (1)
31-48
:⚠️ Potential issueInclude error messages in exceptions
Error messages are logged but not included in the exceptions, which makes debugging harder.
Apply this diff:
if not isinstance(vertical_features, pd.DataFrame): msg = "vertical_features must be a pandas DataFrame" logger.exception(msg) - raise TypeError() + raise TypeError(msg) if not isinstance(horizontal_features, pd.DataFrame): msg = "horizontal_features must be a pandas DataFrame" logger.exception(msg) - raise TypeError() + raise TypeError(msg) if method not in ["pearson", "spearman", "kendall"]: msg = "Correlation method must be one of 'pearson', 'spearman', or 'kendall'." logger.exception(msg) - raise ValueError() + raise ValueError(msg) if not vertical_features.index.equals(horizontal_features.index): msg = "Vertical and horizontal features must have the same index to calculate correlation. Set the index to the intersection of patient IDs." logger.exception(msg) - raise ValueError() + raise ValueError(msg)
🧹 Nitpick comments (6)
src/readii/analyze/correlation.py (4)
5-9
: Add spaces after colons in type annotationsAccording to PEP 8, there should be a space after the colon in type annotations.
Apply this diff:
-def getFeatureCorrelations(vertical_features:pd.DataFrame, - horizontal_features:pd.DataFrame, - method:str = "pearson", - vertical_feature_name:str = '_vertical', - horizontal_feature_name:str = '_horizontal') -> pd.DataFrame: +def getFeatureCorrelations(vertical_features: pd.DataFrame, + horizontal_features: pd.DataFrame, + method: str = "pearson", + vertical_feature_name: str = '_vertical', + horizontal_feature_name: str = '_horizontal') -> pd.DataFrame:
94-94
: Remove unnecessary f-string prefixThe string doesn't contain any placeholders, so the f-string prefix is unnecessary.
Apply this diff:
- msg = f"Number of vertical features provided is greater than the number of rows or columns in the correlation matrix." + msg = "Number of vertical features provided is greater than the number of rows or columns in the correlation matrix."🧰 Tools
🪛 Ruff (0.8.2)
94-94: f-string without any placeholders
Remove extraneous
f
prefix(F541)
122-122
: Fix typo and remove unnecessary f-string prefixThere's a typo in the error message ("horizontalfeatures" should be "horizontal features"), and the f-string prefix is unnecessary.
Apply this diff:
- msg = f"Number of horizontalfeatures provided is greater than the number of rows or columns in the correlation matrix." + msg = "Number of horizontal features provided is greater than the number of rows or columns in the correlation matrix."🧰 Tools
🪛 Ruff (0.8.2)
122-122: f-string without any placeholders
Remove extraneous
f
prefix(F541)
154-154
: Remove unnecessary f-string prefixThe string doesn't contain any placeholders, so the f-string prefix is unnecessary.
Apply this diff:
- msg = f"Number of vertical features provided is greater than the number of rows or columns in the correlation matrix." + msg = "Number of vertical features provided is greater than the number of rows or columns in the correlation matrix."🧰 Tools
🪛 Ruff (0.8.2)
154-154: f-string without any placeholders
Remove extraneous
f
prefix(F541)
src/readii/io/writers/plot_writer.py (2)
54-80
: Enhance docstring with parameter descriptions.The save method implementation is robust, but the docstring could be more descriptive.
- """Save the plot to a .png file.""" + """Save the plot to a file. + + Args: + plot: The matplotlib Figure to save + **kwargs: Format specifiers for the output filename + + Returns: + Path: The path where the plot was saved + + Raises: + PlotWriterIOError: If the file exists and overwrite=False, + or if there's an error saving the plot + + Note: + Uses bbox_inches='tight' to remove extra whitespace around the figure + """
83-93
: Consider adding more comprehensive usage examples.While the current example is helpful, consider adding examples that demonstrate:
- Different file formats
- Error handling scenarios
- Various plot types
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
src/readii/analyze/correlation.py
(1 hunks)src/readii/analyze/plot_correlation.py
(1 hunks)src/readii/data/select.py
(1 hunks)src/readii/io/writers/plot_writer.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/readii/analyze/plot_correlation.py
🧰 Additional context used
🪛 Ruff (0.8.2)
src/readii/analyze/correlation.py
94-94: f-string without any placeholders
Remove extraneous f
prefix
(F541)
122-122: f-string without any placeholders
Remove extraneous f
prefix
(F541)
154-154: f-string without any placeholders
Remove extraneous f
prefix
(F541)
🔇 Additional comments (5)
src/readii/analyze/correlation.py (1)
50-71
: LGTM! Well-implemented correlation calculation
The code properly handles feature name prefixes, uses inner join for data alignment, and has appropriate error handling for the correlation calculation.
src/readii/io/writers/plot_writer.py (4)
1-20
: LGTM! Well-structured exception hierarchy.
The imports are clean and the exception hierarchy is well designed with a base class and specific error types.
22-26
: LGTM! Clear and purposeful validation exception.
The validation exception is well documented and fits appropriately in the exception hierarchy.
31-36
: Ensure consistency between help text and raised exception.
The metadata help text states that a FileExistsError
will be raised, but the code raises PlotWriterIOError
instead.
Choose one of these options:
# Option 1: Update help text
metadata={
- "help": "If True, allows overwriting existing files. If False, raises FileExistsError."
+ "help": "If True, allows overwriting existing files. If False, raises PlotWriterIOError."
},
# Option 2: Use built-in exception
- raise PlotWriterIOError(msg)
+ raise FileExistsError(msg)
46-53
: LGTM! Robust validation with clear error messaging.
The post-init validation effectively ensures valid file extensions and provides a helpful error message.
… VALID_EXTENSIONS to a tuple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (3)
src/readii/analyze/plot_correlation.py (1)
62-62
:⚠️ Potential issueInclude error message in ValueError.
The error message should be included when raising the ValueError for better debugging.
Apply this fix:
- raise ValueError() + raise ValueError(msg)src/readii/analyze/correlation.py (2)
33-40
:⚠️ Potential issueInclude error messages in TypeError raises.
Error messages should be included when raising TypeError for better debugging.
Apply these fixes:
- raise TypeError() + raise TypeError(msg) - raise TypeError() + raise TypeError(msg)
42-45
:⚠️ Potential issueInclude error message in ValueError raise.
Error message should be included when raising ValueError for better debugging.
Apply this fix:
- raise ValueError() + raise ValueError(msg)
🧹 Nitpick comments (5)
src/readii/io/writers/plot_writer.py (2)
91-92
: Consider adding DPI parameter for better control over image qualityWhen saving plots, especially for publication or high-resolution displays, controlling the DPI (dots per inch) can be important.
- plot.savefig(out_path, bbox_inches='tight') + plot.savefig(out_path, bbox_inches='tight', dpi=300) # or make DPI configurable
94-96
: Enhance error logging for better debuggingConsider logging the full exception traceback for better debugging capabilities.
except Exception as e: + logger.exception(f"Failed to save plot to {out_path}") msg = f"Error saving plot to file {out_path}: {e}" raise PlotWriterIOError(msg) from e
src/readii/analyze/plot_correlation.py (2)
73-80
: Consider memory optimization for large correlation matrices.For large correlation matrices, consider adding a parameter to downsample or chunk the data before visualization to prevent memory issues.
109-109
: Fix typo in docstring.There's a spacing issue in the docstring.
- """Plot a histogram to show thedistribution of correlation values for a correlation matrix. + """Plot a histogram to show the distribution of correlation values for a correlation matrix.src/readii/analyze/correlation.py (1)
124-124
: Fix typo in error message.There's a spacing issue in the error message.
- msg = "Number of horizontalfeatures provided is greater than the number of rows or columns in the correlation matrix." + msg = "Number of horizontal features provided is greater than the number of rows or columns in the correlation matrix."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pixi.lock
is excluded by!**/*.lock
📒 Files selected for processing (7)
config/ruff.toml
(1 hunks)src/readii/analyze/__init__.py
(1 hunks)src/readii/analyze/correlation.py
(1 hunks)src/readii/analyze/plot_correlation.py
(1 hunks)src/readii/image_processing.py
(0 hunks)src/readii/io/loaders/features.py
(2 hunks)src/readii/io/writers/plot_writer.py
(1 hunks)
💤 Files with no reviewable changes (1)
- src/readii/image_processing.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/readii/analyze/init.py
🔇 Additional comments (6)
src/readii/io/loaders/features.py (1)
56-56
: LGTM: Exception handling improvement
Good cleanup of unused exception variables in the catch blocks. This follows Python best practices and improves code clarity.
Also applies to: 73-73
src/readii/io/writers/plot_writer.py (2)
34-35
:
Update help text to match actual exception
The help text mentions FileExistsError
but the code raises PlotWriterIOError
.
- "help": "If True, allows overwriting existing files. If False, raises FileExistsError."
+ "help": "If True, allows overwriting existing files. If False, raises PlotWriterIOError."
Likely invalid or redundant comment.
38-44
:
Fix type annotation for VALID_EXTENSIONS
The type annotation indicates a list but the value is a tuple. This mismatch should be corrected.
- VALID_EXTENSIONS: ClassVar[list[str]] = (
+ VALID_EXTENSIONS: ClassVar[tuple[str, ...]] = (
Likely invalid or redundant comment.
config/ruff.toml (1)
9-11
: LGTM: Appropriate path additions
The new include paths correctly cover the new analysis functionality being added to the project.
src/readii/analyze/plot_correlation.py (1)
1-11
: LGTM! Clean and appropriate imports.
The imports are well-organized and necessary for the functionality.
src/readii/analyze/correlation.py (1)
93-98
: Refactor duplicate validation logic.
The validation logic is repeated across functions. Consider creating a helper function.
Create a helper function to handle the validation:
def validate_feature_count(correlation_matrix: pd.DataFrame, feature_count: int, feature_type: str):
try:
validateDataframeSubsetSelection(correlation_matrix, feature_count, feature_count)
except ValueError as e:
msg = f"Number of {feature_type} features provided is greater than the number of rows or columns in the correlation matrix."
logger.exception(msg)
raise e
Also applies to: 121-126, 152-157
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/readii/analyze/plot_correlation.py (2)
13-49
: Consider using a more specific return type annotation.While the implementation is good, the return type annotation could be more specific by using
matplotlib.figure.Figure
instead of justFigure
.Apply this diff:
- ) -> Figure: + ) -> matplotlib.figure.Figure:
114-114
: Fix typo in docstring.There's a typo in the docstring: "thedistribution" should be "the distribution".
Apply this diff:
- """Plot a histogram to show thedistribution of correlation values for a correlation matrix. + """Plot a histogram to show the distribution of correlation values for a correlation matrix.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/readii/analyze/plot_correlation.py
(1 hunks)src/readii/io/writers/plot_writer.py
(1 hunks)
🔇 Additional comments (6)
src/readii/io/writers/plot_writer.py (4)
1-26
: LGTM! Well-structured exception hierarchy.
The exception classes are properly organized with clear inheritance and docstrings.
46-53
: LGTM! Good validation implementation.
The post-init validation is well implemented with clear error messages.
102-112
: LGTM! Good example implementation.
The main block provides a clear example of how to use the PlotWriter class.
38-44
:
Fix type annotation for VALID_EXTENSIONS.
The type annotation indicates list[str]
but the actual value is a tuple. This inconsistency should be fixed.
Apply this diff to fix the type annotation:
- VALID_EXTENSIONS: ClassVar[list[str]] = (
+ VALID_EXTENSIONS: ClassVar[tuple[str, ...]] = (
Likely invalid or redundant comment.
src/readii/analyze/plot_correlation.py (2)
164-166
:
Include error message in ValueError.
The error message is logged but not included in the raised ValueError.
Apply this diff:
msg = f"Number of bins must be positive, got {num_bins}"
logger.exception(msg)
- raise ValueError()
+ raise ValueError(msg)
Likely invalid or redundant comment.
60-62
:
Include error message in ValueError.
The error message is logged but not included in the raised ValueError.
Apply this diff:
msg = f"If diagonal is True, triangle must be either 'lower' or 'upper'. Got {triangle}."
logger.exception(msg)
- raise ValueError()
+ raise ValueError(msg)
Likely invalid or redundant comment.
src/readii/io/writers/plot_writer.py
Outdated
def save(self, plot:matplotlib.figure.Figure, **kwargs: str) -> Path: | ||
"""Save the plot to a .png file with extra whitespace removed. | ||
|
||
Parameters | ||
---------- | ||
plot : matplotlib.figure.Figure | ||
The plot to save. | ||
kwargs : dict | ||
Additional keyword arguments to pass to the save method. | ||
|
||
Returns | ||
------- | ||
Path | ||
The path to the saved file. | ||
|
||
Raises | ||
------ | ||
PlotWriterIOError | ||
If there is an error saving the plot. | ||
PlotWriterValidationError | ||
If the filename format is invalid. | ||
""" | ||
logger.debug("Saving.", kwargs=kwargs) | ||
|
||
# Generate the output path | ||
out_path = self.resolve_path(**kwargs) | ||
|
||
# Check if the output path already exists | ||
if out_path.exists(): | ||
if not self.overwrite: | ||
msg = f"File {out_path} already exists. \nSet {self.__class__.__name__}.overwrite to True to overwrite." | ||
raise PlotWriterIOError(msg) | ||
else: | ||
logger.warning(f"File {out_path} already exists. Overwriting.") | ||
|
||
logger.debug("Saving plot to file", out_path=out_path) | ||
try: | ||
# Save the plot to the output path, remove extra whitespace around the figure | ||
plot.savefig(out_path, bbox_inches='tight') | ||
|
||
except Exception as e: | ||
msg = f"Error saving plot to file {out_path}: {e}" | ||
raise PlotWriterIOError(msg) from e | ||
else: | ||
logger.info("Plot saved successfully.", out_path=out_path) | ||
return out_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add type validation for the plot parameter.
While the method implementation is good, it should validate that the plot parameter is actually a Figure object before proceeding.
Add this validation at the start of the method:
def save(self, plot:matplotlib.figure.Figure, **kwargs: str) -> Path:
+ if not isinstance(plot, matplotlib.figure.Figure):
+ msg = f"Expected matplotlib.figure.Figure, got {type(plot)}"
+ raise PlotWriterValidationError(msg)
+
logger.debug("Saving.", kwargs=kwargs)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def save(self, plot:matplotlib.figure.Figure, **kwargs: str) -> Path: | |
"""Save the plot to a .png file with extra whitespace removed. | |
Parameters | |
---------- | |
plot : matplotlib.figure.Figure | |
The plot to save. | |
kwargs : dict | |
Additional keyword arguments to pass to the save method. | |
Returns | |
------- | |
Path | |
The path to the saved file. | |
Raises | |
------ | |
PlotWriterIOError | |
If there is an error saving the plot. | |
PlotWriterValidationError | |
If the filename format is invalid. | |
""" | |
logger.debug("Saving.", kwargs=kwargs) | |
# Generate the output path | |
out_path = self.resolve_path(**kwargs) | |
# Check if the output path already exists | |
if out_path.exists(): | |
if not self.overwrite: | |
msg = f"File {out_path} already exists. \nSet {self.__class__.__name__}.overwrite to True to overwrite." | |
raise PlotWriterIOError(msg) | |
else: | |
logger.warning(f"File {out_path} already exists. Overwriting.") | |
logger.debug("Saving plot to file", out_path=out_path) | |
try: | |
# Save the plot to the output path, remove extra whitespace around the figure | |
plot.savefig(out_path, bbox_inches='tight') | |
except Exception as e: | |
msg = f"Error saving plot to file {out_path}: {e}" | |
raise PlotWriterIOError(msg) from e | |
else: | |
logger.info("Plot saved successfully.", out_path=out_path) | |
return out_path | |
def save(self, plot:matplotlib.figure.Figure, **kwargs: str) -> Path: | |
if not isinstance(plot, matplotlib.figure.Figure): | |
msg = f"Expected matplotlib.figure.Figure, got {type(plot)}" | |
raise PlotWriterValidationError(msg) | |
logger.debug("Saving.", kwargs=kwargs) | |
# Generate the output path | |
out_path = self.resolve_path(**kwargs) | |
# Check if the output path already exists | |
if out_path.exists(): | |
if not self.overwrite: | |
msg = f"File {out_path} already exists. \nSet {self.__class__.__name__}.overwrite to True to overwrite." | |
raise PlotWriterIOError(msg) | |
else: | |
logger.warning(f"File {out_path} already exists. Overwriting.") | |
logger.debug("Saving plot to file", out_path=out_path) | |
try: | |
# Save the plot to the output path, remove extra whitespace around the figure | |
plot.savefig(out_path, bbox_inches='tight') | |
except Exception as e: | |
msg = f"Error saving plot to file {out_path}: {e}" | |
raise PlotWriterIOError(msg) from e | |
else: | |
logger.info("Plot saved successfully.", out_path=out_path) | |
return out_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/readii/io/writers/plot_writer.py (2)
11-26
: Remove unnecessarypass
statements from exception classesSince the exception classes have docstrings, the
pass
statements are redundant and can be removed for cleaner code.class PlotWriterError(Exception): """Base exception for PlotWriter errors.""" - pass class PlotWriterIOError(PlotWriterError): """Raised when I/O operations fail.""" - pass class PlotWriterValidationError(PlotWriterError): """Raised when validation of writer configuration fails.""" - pass
106-111
: Consider using a more conventional example pathThe example uses "TRASH" in the path, which isn't a conventional directory name. Consider using a more standard path like "examples" or "output".
plot_writer = PlotWriter( - root_directory=Path("TRASH", "plot_writer_examples"), + root_directory=Path("examples", "plots"), filename_format="{DatasetName}_{PlotType}_{FeatureType}_{ImageType}_plot.png", overwrite=True, create_dirs=True )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
config/ruff.toml
(1 hunks)src/readii/io/writers/plot_writer.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- config/ruff.toml
🔇 Additional comments (3)
src/readii/io/writers/plot_writer.py (3)
39-45
: Use tuple instead of list for VALID_EXTENSIONS
The comment indicates that extensions should be immutable, but the implementation uses a list. Additionally, the syntax used is for a tuple but the type annotation indicates a list.
- VALID_EXTENSIONS: ClassVar[list[str]] = (
+ VALID_EXTENSIONS: ClassVar[tuple[str, ...]] = (
".png",
".pdf",
".ps",
".eps",
".svg"
)
47-54
: LGTM! Well-implemented validation
The post-initialization validation is thorough and provides clear error messages. Good practice calling the parent's __post_init__
method first.
55-100
: 🛠️ Refactor suggestion
Add type validation for the plot parameter
While the implementation is solid, it should validate that the plot parameter is actually a Figure object before proceeding.
Add this validation at the start of the method:
def save(self, plot:Figure, **kwargs: str) -> Path:
+ if not isinstance(plot, Figure):
+ msg = f"Expected matplotlib.figure.Figure, got {type(plot)}"
+ raise PlotWriterValidationError(msg)
+
logger.debug("Saving.", kwargs=kwargs)
Likely invalid or redundant comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/readii/analyze/correlation.py (2)
33-40
:⚠️ Potential issueInclude error messages in TypeError exceptions
Error messages are logged but not included in the raised exceptions, which loses context for error handling.
if not isinstance(vertical_features, pd.DataFrame): msg = "vertical_features must be a pandas DataFrame" logger.exception(msg) - raise TypeError() + raise TypeError(msg) if not isinstance(horizontal_features, pd.DataFrame): msg = "horizontal_features must be a pandas DataFrame" logger.exception(msg) - raise TypeError() + raise TypeError(msg)
48-51
:⚠️ Potential issueInclude error message in ValueError for invalid correlation method
The error message should be included in the ValueError for better error handling.
if method not in ["pearson", "spearman", "kendall"]: msg = "Correlation method must be one of 'pearson', 'spearman', or 'kendall'." logger.exception(msg) - raise ValueError() + raise ValueError(msg)
🧹 Nitpick comments (4)
src/readii/analyze/correlation.py (4)
12-12
: Use imperative form in docstringPer PEP-257, one-line docstrings should be in imperative form.
- """Calculate correlation between two sets of features. + """Calculate correlations between two sets of features and return correlation matrix.
85-85
: Enhance docstring clarityThe docstring could better explain the matrix quadrant selection and its significance.
- """Get the vertical (y-axis) self correlations from a correlation matrix. Gets the top left quadrant of the correlation matrix. + """Extract vertical feature self-correlations from the top-left quadrant of the correlation matrix. + + This quadrant represents how vertical features correlate with each other.
130-130
: Fix typo in error messageThere's a missing space in "horizontalfeatures".
- msg = "Number of horizontalfeatures provided is greater than the number of rows or columns in the correlation matrix." + msg = "Number of horizontal features provided is greater than the number of rows or columns in the correlation matrix."
99-104
: Consider refactoring common error handling patternThe error handling pattern is repeated across functions. Consider extracting it into a helper function.
def handle_validation_error(e: ValueError, context: str) -> None: msg = f"Number of {context} features provided is greater than the number of rows or columns in the correlation matrix." logger.exception(msg) raise e # Usage in functions: try: validateDataframeSubsetSelection(correlation_matrix, num_vertical_features, num_vertical_features) except ValueError as e: handle_validation_error(e, "vertical")Also applies to: 127-132, 158-163
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/readii/analyze/correlation.py
(1 hunks)
🔇 Additional comments (2)
src/readii/analyze/correlation.py (2)
1-6
: LGTM! Clean import structure
The imports are well-organized and appropriately separated, importing only the necessary dependencies.
142-165
: LGTM! Well-implemented cross-correlation extraction
The function follows consistent patterns with proper validation and error handling. The implementation correctly extracts the top-right quadrant for cross-correlations.
Includes correlation calculations and plotting those correlations as a heatmap and histogram
Summary by CodeRabbit
Release Notes
New Features
numpy
,seaborn
, andpandas
.osx-64
andwin-64
.Bug Fixes
Documentation
Chores