
feat: integrate analyze readii outputs functions #79

Closed
wants to merge 64 commits

Conversation

Collaborator

@strixy16 strixy16 commented Dec 9, 2024

Summary by CodeRabbit

  • New Features

    • Added functions for correlation analysis and visualization, including correlation matrix calculations and heatmap plotting.
    • Introduced functions for patient data manipulation, including patient ID handling and outcome standardization.
    • Added functions for DataFrame filtering and splitting based on column values.
    • New functions for loading and saving image datasets and plots.
    • Enhanced dataset configuration loading and data import capabilities.
    • Added new functions for loading extracted imaging feature sets from a directory.
  • Bug Fixes

    • Enhanced the loadRTSTRUCTSITK function for improved data handling.
  • Tests

    • Introduced unit tests for correlation functionalities to ensure expected behavior.
    • Added comprehensive tests for new correlation functions, covering various scenarios and input validations.

…ion and to split data by column values like for train and test
@strixy16 strixy16 linked an issue Dec 9, 2024 that may be closed by this pull request
Contributor

coderabbitai bot commented Dec 9, 2024

Walkthrough

The changes in this pull request primarily involve the addition of new functionality and dependencies to the project. The pyproject.toml file has been updated to include new dependencies for numpy and seaborn. Several new Python files have been introduced, focusing on correlation analysis, data manipulation, and visualization, along with functions to handle patient data. Additionally, a new test file has been created to validate the correlation functionalities. Minor formatting adjustments were also made in existing files.

Changes

File Path Change Summary
pyproject.toml Added dependencies: numpy==1.26.4, seaborn>=0.13.2,<0.14. Existing dependency for pyradiomics-bhklab>=3.1.4,<4 remains unchanged.
src/readii/__init__.py Removed a blank line after __version__ declaration.
src/readii/analyze/correlation.py Introduced functions for correlation analysis: getFeatureCorrelations, plotCorrelationHeatmap, getVerticalSelfCorrelations, getHorizontalSelfCorrelations, getCrossCorrelationMatrix.
src/readii/analyze/plot_correlations.py Added functions for visualizing correlations: plotCorrelationHeatmap, plotSelfCorrelationHeatMaps, plotCrossCorrelationHeatmap, plotCorrelationDistribution, plotSelfCorrelationDistributionPlots, plotCrossCorrelationDistributionPlots.
src/readii/data/label.py Introduced functions for patient data manipulation: getPatientIdentifierLabel, setPatientIdAsIndex, convertDaysToYears, timeOutcomeColumnSetup, survivalStatusToNumericMapping, eventOutcomeColumnSetup, addOutcomeLabels.
src/readii/data/select.py Added functions for DataFrame manipulation: dropUpToFeature, selectByColumnValue, getOnlyPyradiomicsFeatures, getPatientIntersectionDataframes.
src/readii/data/split.py Introduced functions for DataFrame processing: replaceColumnValues, splitDataByColumnValue.
src/readii/io/loaders/features.py Added function loadFeatureFilesFromImageTypes for loading imaging feature sets.
src/readii/io/loaders/general.py Introduced functions: loadImageDatasetConfig, loadFileToDataFrame for dataset configuration and loading data.
src/readii/io/loaders/images.py Added function getImageTypesFromDirectory for retrieving image types from a directory.
src/readii/io/writers/plots.py Introduced function saveSeabornPlotFigure for saving Seaborn plots to files.
src/readii/loaders.py Updated loadRTSTRUCTSITK function to include a new parameter continuous in the StructureSetToSegmentation instantiation.
tests/test_correlation.py Created a new test file with unit tests for correlation functionalities, including a fixture for generating random feature matrices and tests for getFeatureCorrelations and plotCorrelationHeatmap.
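The test file described above pairs a random feature matrix fixture with correlation checks. A hedged sketch of what such a setup could look like (names, shapes, and the seed are illustrative, not the project's actual fixture; in the real test module the helper would be wrapped with `@pytest.fixture`):

```python
import numpy as np
import pandas as pd

def make_random_features(n_patients: int = 10, n_features: int = 5) -> pd.DataFrame:
    """Reproducible random feature matrix keyed by patient ID.

    Illustrative stand-in for the fixture described in the summary;
    wrap with @pytest.fixture in an actual test module.
    """
    rng = np.random.default_rng(10)
    return pd.DataFrame(
        rng.random((n_patients, n_features)),
        index=[f"Patient_{i}" for i in range(n_patients)],
        columns=[f"feature_{j}" for j in range(n_features)],
    )

features = make_random_features()
# Pairwise Pearson correlations between feature columns, as a square matrix
correlations = features.corr(method="pearson")
```

A test over this data can then assert structural properties (square shape, unit diagonal, symmetry) without depending on the random values themselves.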

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CorrelationModule
    participant DataFrame
    participant VisualizationModule

    User->>CorrelationModule: getFeatureCorrelations()
    CorrelationModule->>DataFrame: Compute correlation matrix
    DataFrame-->>CorrelationModule: Return correlation matrix
    CorrelationModule->>VisualizationModule: plotCorrelationHeatmap()
    VisualizationModule-->>User: Display heatmap

🐰 "In the code, new features hop,
With numpy and seaborn on top.
Correlations dance, plots take flight,
Data's magic shines so bright.
With tests to ensure all's well,
In our rabbit hole, we surely excel!" 🐇


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Outside diff range and nitpick comments (13)
src/readii/io/loaders/features.py (1)

62-62: Remove unused variable e in exception handling

The variable e is assigned but never used in the except block. If you don't need the exception object, you can omit it.

Apply this diff to address the issue:

-        except Exception as e:
+        except Exception:

Alternatively, if you intend to use the exception message, include it in your error handling.

🧰 Tools
🪛 Ruff (0.8.0)

62-62: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

src/readii/data/split.py (1)

29-29: Simplify dictionary iteration by removing unnecessary .keys()

When iterating over a dictionary, you can iterate directly over the keys without calling .keys(). This is more Pythonic and slightly more efficient.

Apply this diff to simplify the code:

-        for new_value in replacement_value_data.keys():
+        for new_value in replacement_value_data:
-        for split_column_name in split_col_data.keys():
+        for split_column_name in split_col_data:

Also applies to: 65-65

🧰 Tools
🪛 Ruff (0.8.0)

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
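Iterating a dict directly yields the same keys as iterating its `.keys()` view, which is the basis of the SIM118 suggestion; a minimal check (values here are made up, not the project's):

```python
# Illustrative split configuration, not the project's actual data
split_col_data = {"train": 0.8, "test": 0.2}

# Plain iteration and .keys() iteration produce identical key sequences
assert list(split_col_data) == list(split_col_data.keys()) == ["train", "test"]

for split_column_name in split_col_data:  # same keys, one attribute lookup fewer
    assert split_column_name in split_col_data
```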

src/readii/data/select.py (1)

78-78: Simplify dictionary iteration by removing unnecessary .keys()

When iterating over a dictionary, iterate directly over the keys without using .keys(). This improves readability.

Apply this diff to enhance the code:

-                for key in include_col_values.keys():
+                for key in include_col_values:
-                for key in exclude_col_values.keys():
+                for key in exclude_col_values:

Also applies to: 85-85

🧰 Tools
🪛 Ruff (0.8.0)

78-78: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

src/readii/analyze/correlation.py (2)

44-45: Follow PEP 8 style guidelines: One statement per line

Having multiple statements on one line can reduce code readability. It's recommended to place each statement on its own line.

Apply this diff to adhere to style guidelines:

-        if vertical_feature_name: vertical_feature_name = f"_{vertical_feature_name}"
-        if horizontal_feature_name: horizontal_feature_name = f"_{horizontal_feature_name}"
+        if vertical_feature_name:
+            vertical_feature_name = f"_{vertical_feature_name}"
+        if horizontal_feature_name:
+            horizontal_feature_name = f"_{horizontal_feature_name}"
🧰 Tools
🪛 Ruff (0.8.0)

44-44: Multiple statements on one line (colon)

(E701)


45-45: Multiple statements on one line (colon)

(E701)


58-58: Preserve exception context when re-raising exceptions

When re-raising an exception, use raise ... from ... to maintain the original exception context. This helps with debugging by preserving the traceback of the original exception.

Apply this diff to improve exception handling:

-        raise ValueError(f"Error calculating correlation matrix: {e}")
+        raise ValueError(f"Error calculating correlation matrix: {e}") from e
🧰 Tools
🪛 Ruff (0.8.0)

58-58: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
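A minimal illustration of the B904 difference (generic code, not the project's): with `raise ... from e`, the original exception is preserved on `__cause__`, and the traceback reports it as the direct cause.

```python
def correlation_or_error(method: str):
    try:
        if method not in ("pearson", "spearman"):
            raise KeyError(method)  # stand-in for the underlying failure
    except KeyError as e:
        # `from e` records the KeyError as __cause__, so the traceback reads
        # "The above exception was the direct cause of the following exception"
        raise ValueError(f"Error calculating correlation matrix: {e}") from e

try:
    correlation_or_error("cosine")
except ValueError as err:
    assert isinstance(err.__cause__, KeyError)
```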

src/readii/data/label.py (2)

34-34: Fix typographical error in error message

The word "recognizeable" is misspelled. It should be "recognizable".

Apply this diff to correct the typo:

-    raise ValueError("Dataframe doesn't have a recognizeable patient ID column. Must contain patient or case ID.")
+    raise ValueError("Dataframe doesn't have a recognizable patient ID column. Must contain patient or case ID.")

241-241: Avoid shadowing built-in functions

The variable map shadows the built-in function map. Consider renaming it to avoid confusion.

Apply this diff to rename the variable:

-if not all(value == map for value, map in zip(existing_event_values, sorted(event_column_value_mapping.keys()))):
+if not all(value == mapped_value for value, mapped_value in zip(existing_event_values, sorted(event_column_value_mapping.keys()))):
src/readii/analyze/plot_correlations.py (1)

254-254: Replace print statement with logging

Using print statements in library code is not recommended. Consider using the logging module to provide informational messages.

Apply this diff to use logging:

-    print("Correlation matrix is symmetric.")
+    import logging
+    logging.info("Correlation matrix is symmetric.")
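For library code, the conventional pattern is a module-level logger named after the module, rather than importing `logging` inside a function; the application configures handlers, the library only emits records. A sketch (generic, not the project's module):

```python
import logging

# Module-level logger; applications attach handlers and set levels,
# the library code only emits records through it.
logger = logging.getLogger(__name__)

def report_symmetry(is_symmetric: bool) -> None:
    if is_symmetric:
        logger.info("Correlation matrix is symmetric.")
    else:
        logger.warning("Correlation matrix is not symmetric.")
```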
src/readii/io/writers/plots.py (1)

20-20: Update docstring to match parameter type

The output_dir_path parameter accepts both Path and str types. Update the docstring for accuracy.

Apply this diff to update the docstring:

-        output_dir_path : str
+        output_dir_path : Union[Path, str]
tests/test_correlation.py (2)

3-3: Remove unused import plotCorrelationHeatmap

The function plotCorrelationHeatmap is imported but not used in this file.

Apply this diff to remove the unused import:

-from readii.analyze.correlation import (
-    getFeatureCorrelations,
-    plotCorrelationHeatmap,
-)
+from readii.analyze.correlation import getFeatureCorrelations
🧰 Tools
🪛 Ruff (0.8.0)

3-3: readii.analyze.correlation.plotCorrelationHeatmap imported but unused

Remove unused import: readii.analyze.correlation.plotCorrelationHeatmap

(F401)


6-6: Remove unused import dropUpToFeature

The function dropUpToFeature is imported but not used.

Apply this diff to remove the unused import:

-from readii.data.process import dropUpToFeature
🧰 Tools
🪛 Ruff (0.8.0)

6-6: readii.data.process.dropUpToFeature imported but unused

Remove unused import: readii.data.process.dropUpToFeature

(F401)

src/readii/io/loaders/images.py (2)

42-43: Fix typo in comment

There's a typo in the comment: "banes" should be "names"

-    # Get list of file banes with the specified prefix and suffix in the directory
+    # Get list of file names with the specified prefix and suffix in the directory

40-52: Consider sorting image types for consistent output

The function returns image types in arbitrary order based on filesystem traversal.

Consider sorting the list before returning:

     # Initialize an empty list to store the image types
     image_types = []
 
     for file in raw_data_dir.glob(f"{feature_file_prefix}*{feature_file_suffix}"):
         file_name = file.name
         image_type = file_name.removeprefix(feature_file_prefix).removesuffix(feature_file_suffix)
         image_types.append(image_type)
 
-    return image_types
+    return sorted(image_types)
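A self-contained sketch of the sorted variant (file names here are hypothetical): `Path.glob` order is filesystem-dependent, so sorting makes the returned list deterministic.

```python
from pathlib import Path
import tempfile

# Hypothetical feature-file names; the point is only that glob order is
# filesystem-dependent, so sort before returning.
with tempfile.TemporaryDirectory() as raw_data_dir:
    for name in ("features_original.csv", "features_shuffled.csv"):
        (Path(raw_data_dir) / name).touch()
    prefix, suffix = "features_", ".csv"
    image_types = sorted(
        f.name.removeprefix(prefix).removesuffix(suffix)
        for f in Path(raw_data_dir).glob(f"{prefix}*{suffix}")
    )
```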
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3bcaa28 and 1f4edf2.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • pyproject.toml (2 hunks)
  • src/readii/__init__.py (0 hunks)
  • src/readii/analyze/correlation.py (1 hunks)
  • src/readii/analyze/plot_correlations.py (1 hunks)
  • src/readii/data/label.py (1 hunks)
  • src/readii/data/select.py (1 hunks)
  • src/readii/data/split.py (1 hunks)
  • src/readii/io/loaders/features.py (1 hunks)
  • src/readii/io/loaders/general.py (1 hunks)
  • src/readii/io/loaders/images.py (1 hunks)
  • src/readii/io/writers/plots.py (1 hunks)
  • src/readii/loaders.py (1 hunks)
  • tests/test_correlation.py (1 hunks)
💤 Files with no reviewable changes (1)
  • src/readii/__init__.py
🧰 Additional context used
🪛 Ruff (0.8.0)
src/readii/io/loaders/general.py

32-32: Use a context manager for opening files

(SIM115)

src/readii/io/loaders/features.py

10-10: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


12-12: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


62-62: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

tests/test_correlation.py

3-3: readii.analyze.correlation.plotCorrelationHeatmap imported but unused

Remove unused import: readii.analyze.correlation.plotCorrelationHeatmap

(F401)


6-6: readii.data.process.dropUpToFeature imported but unused

Remove unused import: readii.data.process.dropUpToFeature

(F401)

src/readii/data/select.py

78-78: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


85-85: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

src/readii/data/split.py

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


65-65: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

src/readii/analyze/correlation.py

44-44: Multiple statements on one line (colon)

(E701)


45-45: Multiple statements on one line (colon)

(E701)


58-58: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

src/readii/data/label.py

188-188: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


261-261: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🔇 Additional comments (3)
src/readii/io/writers/plots.py (1)

9-9: ⚠️ Potential issue

Correct type annotation syntax

When using Union, types should be separated by commas, not the | operator.

Apply this diff to fix the type annotation:

-def saveSeabornPlotFigure(sns_plot: heatmap,
+def saveSeabornPlotFigure(sns_plot: heatmap,
                           plot_name: str,
-                          output_dir_path: Union[Path|str]):
+                          output_dir_path: Union[Path, str]):

Likely invalid or redundant comment.

src/readii/loaders.py (1)

77-79: Verify impact of continuous=False parameter

The change ensures integer indices, but we should verify this doesn't affect existing code that might expect float indices.

Let's check for any dependencies on the index type:

✅ Verification successful

Setting continuous=False appears safe to use

The change to use integer indices is safe because:

  • The output segMasks is only used locally within the function
  • No downstream code depends on float indices
  • The parameter is well documented
  • The existing_roi_indices parameter already uses integer values (0 for background)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for code that might depend on the index type from StructureSetToSegmentation
rg -l "StructureSetToSegmentation" | xargs rg -A 5 "\.index|indices"

# Search for test files that might need updating
fd -e py test_ | xargs rg -A 5 "StructureSetToSegmentation"

Length of output: 446


Script:

#!/bin/bash
# Let's search for any usage of the output from StructureSetToSegmentation
rg -A 5 "makeMask\(" 

# Also search for any test cases involving StructureSetToSegmentation
ast-grep --pattern 'class $_ {
  $$$
  test_$_StructureSetToSegmentation$_($$$) {
    $$$
  }
  $$$
}'

Length of output: 393


Script:

#!/bin/bash
# Let's check how segMasks is used after being created
rg -B 2 -A 5 "segMasks = makeMask"

# Also check for any direct usage of continuous parameter
rg -B 2 -A 2 "continuous\s*=\s*(True|False)" 

Length of output: 713

pyproject.toml (1)

14-16: 🛠️ Refactor suggestion

Consider using version range for numpy instead of strict pinning

The strict version pinning of numpy (==1.26.4) differs from the project's convention of using version ranges. Since numpy is a core dependency that other packages might depend on, strict pinning could lead to dependency conflicts.

Consider using a version range similar to other dependencies:

-  "numpy==1.26.4",
+  "numpy>=1.26.4,<2"

Let's verify the compatibility of these dependencies:

✅ Verification successful

Consider using version range for numpy instead of strict pinning

Based on the codebase analysis:

  • numpy is widely used across multiple files (9 files import numpy)
  • No version-specific numpy functionality is detected
  • seaborn is used in correlation analysis and plotting modules
  • The codebase follows a consistent pattern of using numpy's standard features through the np alias

Since there's no evidence of version-specific numpy requirements and no detected incompatibilities, using a version range would be more maintainable and consistent with the project's dependency style:

-  "numpy==1.26.4",
+  "numpy>=1.26.4,<2"
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check compatibility of numpy with other dependencies

# Check if any other packages in the project have numpy as a dependency
pip index versions seaborn | grep -A 5 "Required-by:"
pip index versions pyradiomics-bhklab | grep -A 5 "Required-by:"

# Check for any known conflicts
pip check numpy==1.26.4 seaborn==0.13.2 pyradiomics-bhklab==3.1.4

Length of output: 468


Script:

#!/bin/bash
# Let's try a different approach by examining the codebase for numpy usage patterns

# Find files that import numpy and check their version constraints
rg "import numpy|from numpy" -l | while read -r file; do
  echo "=== $file ==="
  rg "import numpy|from numpy" "$file"
done

# Look for any requirements.txt or similar files that might have dependency info
fd "requirements.*\.txt|setup\.py" --type f

# Check if there are any CI/testing configuration files that might specify dependencies
fd "tox.ini|pytest.ini|\.github/workflows/.*\.ya?ml" --type f

# Look for any dependency management related comments in Python files
rg "# *(?i)(requires?|depends?|numpy)" "\.py$"

Length of output: 1985


Script:

#!/bin/bash
# Let's check the docs/requirements.txt file and look for any numpy version constraints
cat docs/requirements.txt

# Also check if seaborn is used in the codebase
rg "import seaborn|from seaborn" -l | while read -r file; do
  echo "=== $file ==="
  rg "import seaborn|from seaborn" "$file"
done

# Look for any version-specific numpy functionality
rg "numpy\.[0-9]|np\.[0-9]" -l

Length of output: 764



def loadFeatureFilesFromImageTypes(extracted_feature_dir:str,
                                   image_types:Optional[list]=['original'],

⚠️ Potential issue

Avoid mutable default arguments in function parameters

Using mutable default arguments like lists can lead to unexpected behavior because the default list is shared across all function calls. It's better to use None as the default value and initialize the list inside the function.

Apply this diff to fix the issue:

-def loadFeatureFilesFromImageTypes(extracted_feature_dir:str,
-                                   image_types:Optional[list]=['original'], 
-                                   drop_labels:Optional[bool]=True, 
-                                   labels_to_drop:Optional[list]=["patient_ID","survival_time_in_years","survival_event_binary"])->Dict[str,pd.DataFrame]:
+def loadFeatureFilesFromImageTypes(extracted_feature_dir:str,
+                                   image_types:Optional[list]=None, 
+                                   drop_labels:Optional[bool]=True, 
+                                   labels_to_drop:Optional[list]=None)->Dict[str,pd.DataFrame]:

Within the function, initialize the lists if they are None:

if image_types is None:
    image_types = ['original']
if labels_to_drop is None:
    labels_to_drop = ["patient_ID", "survival_time_in_years", "survival_event_binary"]

Also applies to: 12-12

🧰 Tools
🪛 Ruff (0.8.0)

10-10: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)
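The B006 failure mode is easy to reproduce with a toy function (generic example, not the project's code): the default list is created once at definition time and shared across every call that relies on it.

```python
def append_label_broken(label, labels_to_drop=[]):
    # The same list object is reused on every call that omits the argument
    labels_to_drop.append(label)
    return labels_to_drop

def append_label_fixed(label, labels_to_drop=None):
    if labels_to_drop is None:
        labels_to_drop = []  # fresh list per call
    labels_to_drop.append(label)
    return labels_to_drop

assert append_label_broken("patient_ID") == ["patient_ID"]
# Second call leaks state from the first:
assert append_label_broken("survival_time_in_years") == [
    "patient_ID", "survival_time_in_years",
]
assert append_label_fixed("patient_ID") == ["patient_ID"]
assert append_label_fixed("survival_time_in_years") == ["survival_time_in_years"]
```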

Comment on lines 31 to 32
            if new_value not in dataframe[column_to_change].unique():
                raise ValueError(f"New value {new_value} not found in column {column_to_change}.")

⚠️ Potential issue

Potential logical error in value replacement check

The function checks if the new_value exists in the column before replacement. However, you should check for the presence of the old values to be replaced, not the new value.

Modify the check to verify the presence of the old values:

-            if new_value not in dataframe[column_to_change].unique():
-                raise ValueError(f"New value {new_value} not found in column {column_to_change}.")
+            old_values = replacement_value_data[new_value]
+            if not set(old_values).intersection(set(dataframe[column_to_change].unique())):
+                raise ValueError(f"Values {old_values} to replace not found in column {column_to_change}.")
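A hedged, runnable sketch of the corrected check. The mapping orientation `{new_value: [old_values, ...]}` is inferred from the suggested diff above; the helper name and sample data are hypothetical, not the project's actual function.

```python
import pandas as pd

def replace_column_values(df: pd.DataFrame, column: str, mapping: dict) -> pd.DataFrame:
    """mapping is {new_value: [old_values, ...]}; validate the OLD values exist."""
    for new_value, old_values in mapping.items():
        # Check that at least one of the values to be replaced is present,
        # rather than checking for the new value (the original bug).
        if not set(old_values).intersection(df[column].unique()):
            raise ValueError(f"Values {old_values} to replace not found in column {column}.")
        df[column] = df[column].replace(old_values, new_value)
    return df

df = pd.DataFrame({"survival_event": ["alive", "dead", "deceased"]})
out = replace_column_values(df, "survival_event", {1: ["dead", "deceased"], 0: ["alive"]})
```

With the original check, this call would have failed immediately, since the new value `1` never appears in the column before replacement.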

Additional review comments on src/readii/data/label.py (4) and tests/test_correlation.py (1) were marked outdated and resolved.
Comment on lines 6 to 36
def loadImageDatasetConfig(dataset_name:str,
                           config_dir_path:str) -> dict:
    """Load the configuration file for a given dataset. Expects the configuration file to be named <dataset_name>.yaml.

    Parameters
    ----------
    dataset_name : str
        Name of the dataset to load the configuration file for.
    config_dir_path : str
        Path to the directory containing the configuration files.

    Returns
    -------
    dict
        Dictionary containing the configuration settings for the dataset.

    Examples
    --------
    >>> config = loadImageDatasetConfig("NSCLC_Radiogenomics", "config/")
    """
    # Make full path to config file
    config_file_path = os.path.join(config_dir_path, f"{dataset_name}.yaml")

    # Check if config file exists
    if os.path.exists(config_file_path):
        # Load the config file
        config = yaml.safe_load(open(config_file_path, "r"))
        return config
    else:
        print(f"Config file {config_file_path} does not exist.")
        return None

🛠️ Refactor suggestion

Use context manager for file operations and improve error handling

The function has potential resource leaks and could be more robust in error handling.

Consider this implementation:

 def loadImageDatasetConfig(dataset_name:str,
                          config_dir_path:str) -> dict:
     config_file_path = os.path.join(config_dir_path, f"{dataset_name}.yaml")
     
     if not os.path.exists(config_file_path):
         raise FileNotFoundError(f"Config file {config_file_path} does not exist.")
     
-    config = yaml.safe_load(open(config_file_path, "r"))
-    return config
-    else:
-        print(f"Config file {config_file_path} does not exist.")
-        return None
+    try:
+        with open(config_file_path, "r") as f:
+            return yaml.safe_load(f)
+    except yaml.YAMLError as e:
+        raise ValueError(f"Invalid YAML in config file: {e}")
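A self-contained sketch of the context-manager pattern the suggestion relies on (the config file and its keys are hypothetical): the `with` block guarantees the file handle is closed even if `yaml.safe_load` raises.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML, assumed available as in the project

with tempfile.TemporaryDirectory() as d:
    cfg_path = Path(d) / "DATASET.yaml"  # hypothetical dataset config
    cfg_path.write_text("dataset_name: DATASET\noutcome_column: survival\n")
    with cfg_path.open() as f:  # closed on block exit, even on parse errors
        config = yaml.safe_load(f)
```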
🧰 Tools
🪛 Ruff (0.8.0)

32-32: Use a context manager for opening files

(SIM115)

Comment on lines 40 to 68
def loadFileToDataFrame(file_path:str) -> pd.DataFrame:
    """Load data from a csv or xlsx file into a pandas dataframe.

    Parameters
    ----------
    file_path (str): Path to the data file.

    Returns
    -------
    pd.DataFrame: Dataframe containing the data from the file.
    """
    # Get the file extension
    _, file_extension = os.path.splitext(file_path)

    try:
        # Check if the file is an Excel file
        if file_extension == '.xlsx':
            df = pd.read_excel(file_path)
        # Check if the file is a CSV file
        elif file_extension == '.csv':
            df = pd.read_csv(file_path)
        else:
            raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")

        return df

    except Exception as e:
        print(f"An error occurred: {e}")
        return None

@coderabbitai coderabbitai bot Dec 9, 2024


🛠️ Refactor suggestion

Improve error handling and input validation

The function could be more robust with input validation and specific error handling.

Consider this implementation:

 def loadFileToDataFrame(file_path:str) -> pd.DataFrame:
+    if not file_path:
+        raise ValueError("file_path cannot be empty")
+
+    if not os.path.exists(file_path):
+        raise FileNotFoundError(f"File {file_path} does not exist")
+
     _, file_extension = os.path.splitext(file_path)
     
     try:
         if file_extension == '.xlsx':
             df = pd.read_excel(file_path)
         elif file_extension == '.csv':
             df = pd.read_csv(file_path)
         else:
             raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")
         
+        if df.empty:
+            raise ValueError("Loaded DataFrame is empty")
+        
         return df
     
-    except Exception as e:
-        print(f"An error occurred: {e}")
-        return None
+    except pd.errors.EmptyDataError:
+        raise ValueError("File is empty")
+    except (pd.errors.ParserError, ValueError) as e:
+        raise ValueError(f"Error parsing file: {e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def loadFileToDataFrame(file_path:str) -> pd.DataFrame:
"""Load data from a csv or xlsx file into a pandas dataframe.
Parameters
----------
file_path (str): Path to the data file.
Returns
-------
pd.DataFrame: Dataframe containing the data from the file.
"""
# Get the file extension
_, file_extension = os.path.splitext(file_path)
try:
# Check if the file is an Excel file
if file_extension == '.xlsx':
df = pd.read_excel(file_path)
# Check if the file is a CSV file
elif file_extension == '.csv':
df = pd.read_csv(file_path)
else:
raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")
return df
except Exception as e:
print(f"An error occurred: {e}")
return None
def loadFileToDataFrame(file_path:str) -> pd.DataFrame:
"""Load data from a csv or xlsx file into a pandas dataframe.
Parameters
----------
file_path (str): Path to the data file.
Returns
-------
pd.DataFrame: Dataframe containing the data from the file.
"""
if not file_path:
raise ValueError("file_path cannot be empty")
if not os.path.exists(file_path):
raise FileNotFoundError(f"File {file_path} does not exist")
# Get the file extension
_, file_extension = os.path.splitext(file_path)
try:
# Check if the file is an Excel file
if file_extension == '.xlsx':
df = pd.read_excel(file_path)
# Check if the file is a CSV file
elif file_extension == '.csv':
df = pd.read_csv(file_path)
else:
raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")
if df.empty:
raise ValueError("Loaded DataFrame is empty")
return df
except pd.errors.EmptyDataError:
raise ValueError("File is empty")
except (pd.errors.ParserError, ValueError) as e:
raise ValueError(f"Error parsing file: {e}")

Collaborator Author


good suggestion

Contributor


@strixy16, glad you found the suggestion helpful.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
src/readii/analyze/correlation.py (4)

44-45: Refactor single-line conditional statements for better readability

Multiple statements on one line reduce code readability and can make debugging more difficult.

Apply this diff to improve readability:

-if not vertical_feature_name.startswith("_"): vertical_feature_name = f"_{vertical_feature_name}"
+if not vertical_feature_name.startswith("_"):
+    vertical_feature_name = f"_{vertical_feature_name}"

-if not horizontal_feature_name.startswith("_"): horizontal_feature_name = f"_{horizontal_feature_name}"
+if not horizontal_feature_name.startswith("_"):
+    horizontal_feature_name = f"_{horizontal_feature_name}"
🧰 Tools
🪛 Ruff (0.8.0)

44-44: Multiple statements on one line (colon)

(E701)


45-45: Multiple statements on one line (colon)

(E701)


58-58: Use raise ... from e to preserve exception context

When re-raising an exception within an except block, using raise ... from e helps maintain the original traceback, making debugging easier.

Apply this diff to modify the exception handling:

-except Exception as e:
-    raise ValueError(f"Error calculating correlation matrix: {e}")
+except Exception as e:
+    raise ValueError(f"Error calculating correlation matrix: {e}") from e
🧰 Tools
🪛 Ruff (0.8.0)

58-58: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


83-83: Fix typo in docstring

There is a typo in the docstring for the triangle parameter.

Apply this diff to correct the typo:

-    Which triangle half of the matrixto plot. The default is "lower".
+    Which triangle half of the matrix to plot. The default is "lower".

162-170: Refactor duplicated validation logic in correlation functions

The functions getVerticalSelfCorrelations, getHorizontalSelfCorrelations, and getCrossCorrelationMatrix have similar validation code for checking feature counts against the correlation matrix dimensions.

Consider creating a shared helper function to handle this validation logic to reduce code duplication and improve maintainability.

Also applies to: 190-201, 221-226

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 1f4edf2 and 31bf5bf.

📒 Files selected for processing (3)
  • src/readii/analyze/correlation.py (1 hunks)
  • src/readii/loaders.py (1 hunks)
  • tests/test_correlation.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/readii/loaders.py
🧰 Additional context used
🪛 Ruff (0.8.0)
src/readii/analyze/correlation.py

44-44: Multiple statements on one line (colon)

(E701)


45-45: Multiple statements on one line (colon)

(E701)


58-58: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (1)
tests/test_correlation.py (1)

1-115: Comprehensive and well-structured test suite

The test cases cover various scenarios, including valid inputs, default behaviors, invalid methods, incorrect feature inputs, and feature name handling. The tests are well-organized and effectively validate the functionality of the getFeatureCorrelations function.
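For readers unfamiliar with what these tests exercise: `getFeatureCorrelations` defaults to Pearson correlation (delegating to `pandas.DataFrame.corr` in the package). The quantity itself can be illustrated with a pure-Python sketch; the function name `pearson` here is illustrative, not the package's implementation.

```python
import math

def pearson(x, y):
    # Pearson correlation: covariance divided by the product of standard deviations.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly correlated -> 1.0
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly anti-correlated -> -1.0
```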

src/readii/analyze/correlation.py Outdated Show resolved Hide resolved

codecov bot commented Dec 10, 2024

Codecov Report

Attention: Patch coverage is 6.12813% with 337 lines in your changes missing coverage. Please review.

Project coverage is 38.97%. Comparing base (3bcaa28) to head (b1daaf0).

Files with missing lines Patch % Lines
src/readii/data/label.py 0.00% 72 Missing ⚠️
src/readii/analyze/plot_correlations.py 0.00% 66 Missing ⚠️
src/readii/data/select.py 0.00% 55 Missing ⚠️
src/readii/analyze/correlation.py 35.59% 38 Missing ⚠️
src/readii/data/split.py 0.00% 29 Missing ⚠️
src/readii/io/loaders/features.py 0.00% 26 Missing ⚠️
src/readii/io/loaders/general.py 0.00% 22 Missing ⚠️
src/readii/io/loaders/images.py 0.00% 17 Missing ⚠️
src/readii/io/writers/plots.py 0.00% 12 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #79       +/-   ##
===========================================
- Coverage   60.92%   38.97%   -21.96%     
===========================================
  Files          10       19        +9     
  Lines         540      898      +358     
===========================================
+ Hits          329      350       +21     
- Misses        211      548      +337     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 31bf5bf and 550c32a.

📒 Files selected for processing (1)
  • src/readii/data/label.py (1 hunks)
🔇 Additional comments (2)
src/readii/data/label.py (2)

203-203: Ensure consistency between docstring and default parameter value

The docstring mentions that the default for event_column_value_mapping is an empty dictionary, but in the function signature, it is set to None. Update the docstring to reflect the actual default value.

Apply this diff to fix the docstring:

-            The default is an empty dictionary.
+            The default is `None`.

272-272: Ensure consistency between docstring and default parameter value

The docstring states that outcome_labels defaults to ["survival_time_in_years", "survival_event_binary"], but in the function signature, it is set to None. Update the docstring to reflect the current default value.

Apply this diff to fix the docstring:

-            The default is ["survival_time_in_years", "survival_event_binary"].
+            The default is `None`.

Comment on lines +245 to +246
# TODO: add handling for values not in the dictionary

Contributor

@coderabbitai coderabbitai bot Dec 10, 2024


⚠️ Potential issue

Implement handling for unmapped event values

There is a TODO comment indicating that handling for event values not in the mapping dictionary is incomplete. This could lead to errors if unexpected values are encountered. Please implement the necessary handling to ensure all event values are properly managed.

Would you like assistance in implementing this feature?

Collaborator Author


Yes I would like assistance.

Contributor


@strixy16, glad to assist! Here's an updated version of the eventOutcomeColumnSetup function with handling for unmapped event values:

def eventOutcomeColumnSetup(dataframe_with_outcome: DataFrame,
                            outcome_column_label: str,
                            standard_column_label: str,
                            event_column_value_mapping: Optional[dict] = None):
    """Function to set up an event outcome column in a dataframe.

    Parameters
    ----------
    dataframe_with_outcome : DataFrame
        Dataframe containing the outcome column to convert.
    outcome_column_label : str
        Label for the outcome column to convert in the dataframe.
    standard_column_label : str
        Name of the column to save the standardized outcome column as.
    event_column_value_mapping : dict, optional
        Dictionary of event values to convert to numeric values. Keys are the event values, values are the numeric values.
        If provided, any unmapped event values will be assigned new numeric values.
        If not provided, will attempt to convert based on the values in the outcome column. By default, "alive" and "dead" 
        will be converted to 0 and 1, respectively, if present in the outcome column.

    Returns
    -------
    dataframe_with_standardized_outcome : DataFrame
        Dataframe with a copy of the specified outcome column converted to numeric values.
    """

    # Get the type of the existing event column
    event_variable_type = type(dataframe_with_outcome[outcome_column_label].iloc[0])

    # Make a copy of the dataframe to work on 
    dataframe_with_standardized_outcome = dataframe_with_outcome.copy()

    # Handle numeric event column
    if np.issubdtype(event_variable_type, np.number):
        dataframe_with_standardized_outcome[standard_column_label] = dataframe_with_outcome[outcome_column_label]

    # Handle boolean event column
    elif np.issubdtype(event_variable_type, np.bool_):
        dataframe_with_standardized_outcome[standard_column_label] = dataframe_with_outcome[outcome_column_label].astype(int)

    # Handle string event column
    elif np.issubdtype(event_variable_type, np.str_):
        # Make values of outcome column lowercase
        dataframe_with_standardized_outcome[outcome_column_label] = dataframe_with_outcome[outcome_column_label].str.lower()
        
        # Get the existing event values in the provided dataframe
        existing_event_values = sorted(dataframe_with_standardized_outcome[outcome_column_label].unique())

        if not event_column_value_mapping:
            # Create a dictionary to map event values to numeric values
            event_column_value_mapping = survivalStatusToNumericMapping(
                event_outcome_column=dataframe_with_standardized_outcome[outcome_column_label]
            )
        else:
            # Convert all dictionary keys in provided mapping to lowercase
            event_column_value_mapping = dict((status.lower(), value) 
                                              for status, value in event_column_value_mapping.items())
            
            # Identify any values not in the mapping
            unmapped_values = [value for value in existing_event_values 
                               if value not in event_column_value_mapping]
            if unmapped_values:
                print(f"Unmapped event values found: {unmapped_values}. They will be assigned new numeric values.")
                # Assign next available numeric value to unmapped event values
                max_mapped_value = max(event_column_value_mapping.values(), default=-1)
                for i, value in enumerate(unmapped_values, start=max_mapped_value + 1):
                    event_column_value_mapping[value] = i

        # Map the event column using the updated mapping
        dataframe_with_standardized_outcome[standard_column_label] = \
            dataframe_with_standardized_outcome[outcome_column_label].map(event_column_value_mapping)
    else:
        raise ValueError(f"Unsupported data type for outcome column: {event_variable_type}")

    return dataframe_with_standardized_outcome

Explanation:

  • Handling Unmapped Values: The function now checks for any event values in existing_event_values that are not present in event_column_value_mapping.
  • Assigning New Numeric Values: Unmapped event values are assigned new numeric codes starting from the next available integer after the maximum value in the existing mapping.
  • Mapping the Outcome Column: The event outcome column is then mapped using the updated event_column_value_mapping, ensuring all event values are properly managed.
  • User Feedback: A message is printed to inform the user about any unmapped event values found and their assignment.

This implementation ensures that any unexpected or unmapped event values are handled gracefully, preventing errors during execution.

Let me know if you have any questions or need further assistance!
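The core unmapped-value logic described above can be isolated and verified on its own. This is a reduced sketch of just the mapping-extension step; the helper name `extend_event_mapping` is illustrative and does not exist in the package.

```python
def extend_event_mapping(event_column_value_mapping, existing_event_values):
    """Assign the next free numeric codes to any event values missing from the mapping."""
    mapping = dict(event_column_value_mapping)  # avoid mutating the caller's dict
    unmapped_values = [v for v in existing_event_values if v not in mapping]
    # Start numbering after the largest already-assigned code
    max_mapped_value = max(mapping.values(), default=-1)
    for code, value in enumerate(unmapped_values, start=max_mapped_value + 1):
        mapping[value] = code
    return mapping

mapping = extend_event_mapping({"alive": 0, "dead": 1},
                               ["alive", "dead", "lost to follow-up"])
print(mapping)  # {'alive': 0, 'dead': 1, 'lost to follow-up': 2}
```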

src/readii/data/label.py Outdated Show resolved Hide resolved
src/readii/data/label.py Outdated Show resolved Hide resolved
src/readii/data/label.py Outdated Show resolved Hide resolved
src/readii/data/label.py Outdated Show resolved Hide resolved
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (5)
src/readii/data/split.py (3)

29-39: Optimize dictionary key iteration.

The current implementation uses .keys() which is unnecessary when iterating over a dictionary.

Apply this optimization:

-    for new_value in replacement_value_data.keys():
+    for new_value in replacement_value_data:
🧰 Tools
🪛 Ruff (0.8.0)

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


66-70: Optimize dictionary key iteration.

Similar to the previous function, .keys() is unnecessary when iterating over a dictionary.

Apply this optimization:

-    for split_column_name in split_col_data.keys():
+    for split_column_name in split_col_data:
🧰 Tools
🪛 Ruff (0.8.0)

66-66: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


96-96: Consider avoiding inplace operations for better safety.

Using inplace=True with reset_index can be problematic in concurrent operations and makes the code less functional.

Apply this change for better safety:

-            split_dataframe.reset_index(inplace=True, drop=True)
+            split_dataframe = split_dataframe.reset_index(drop=True)
src/readii/io/loaders/features.py (2)

13-31: Update docstring to match current implementation.

The docstring contains inconsistencies with the actual implementation:

  1. It mentions a default value of ['original'] for image_types, but this parameter is now required
  2. It references specific default values for labels_to_drop in the description, which could become outdated

Consider updating the docstring:

     Parameters
     ----------
     extracted_feature_dir : str
         Path to the directory containing the extracted feature csv files
-    image_types : list, optional
-        List of image types to load in. The default is ['original'].
+    image_types : list
+        List of image types to load in
     drop_labels : bool, optional
         Whether to drop the labels from the dataframes. Use when loading labelled data from data_setup_for_modeling.ipynb. The default is True.
     labels_to_drop : list, optional
-        List of labels to drop from the dataframes. The default is ["patient_ID","survival_time_in_years","survival_event_binary"] based on code
-        in data_setup_for_modeling.ipynb.
+        List of labels to drop from the dataframes. If None, defaults to common patient identifiers and survival metrics.

42-53: Consider providing more context in error messages.

The error handling could be more informative by including the specific error details and providing guidance on how to resolve the issue.

     for image_type in image_types:
         try:
             image_type_feature_file = [file for file in feature_file_list if (image_type in file) and (file.endswith(".csv"))][0]
             feature_file_list.remove(image_type_feature_file)
-        except Exception as e:
-            print(f"{e}\n No {image_type} feature csv files found in {extracted_feature_dir}")
+        except IndexError:
+            print(f"No CSV files found for image type '{image_type}' in {extracted_feature_dir}.")
+            print(f"Expected filename pattern: *{image_type}*.csv")
             # Skip to the next image type
             continue
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 550c32a and b1daaf0.

📒 Files selected for processing (2)
  • src/readii/data/split.py (1 hunks)
  • src/readii/io/loaders/features.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
src/readii/data/split.py

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


66-66: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

src/readii/io/loaders/features.py

66-66: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

🔇 Additional comments (4)
src/readii/data/split.py (3)

1-6: LGTM! Clean imports and well-typed function signatures.

The code follows good practices with explicit type hints and minimal imports.


7-23: LGTM! Well-documented function with clear parameters and return types.

The docstring provides comprehensive information about the function's purpose, parameters, and return value.


46-61: LGTM! Clear and comprehensive documentation.

The docstring effectively communicates the function's purpose, parameters, and return value.

src/readii/io/loaders/features.py (1)

1-7: LGTM! Imports are well-organized.

The imports are logically grouped and all are being used in the code.

src/readii/io/loaders/features.py Outdated Show resolved Hide resolved
src/readii/io/loaders/features.py Show resolved Hide resolved
src/readii/io/loaders/features.py Outdated Show resolved Hide resolved
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (3)
src/readii/io/loaders/general.py (1)

58-68: Ensure file extension checks are case-insensitive

Currently, the file extension checks are case-sensitive, which may cause issues if the file has an uppercase extension (e.g., .XLSX, .CSV). Consider normalizing the file extension to lowercase before comparison.

Apply this diff to normalize the file extension:

 _, file_extension = os.path.splitext(file_path)
+file_extension = file_extension.lower()

 try:
     # Check if the file is an Excel file
     if file_extension == '.xlsx':
         df = pd.read_excel(file_path)
     # Check if the file is a CSV file
     elif file_extension == '.csv':
         df = pd.read_csv(file_path)
     else:
         raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")
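Normalizing the extension before dispatching keeps the check robust regardless of how the file was named. A minimal demonstration of the pattern; the function name `dispatch_extension` is illustrative, not part of the package:

```python
import os

def dispatch_extension(file_path):
    _, file_extension = os.path.splitext(file_path)
    file_extension = file_extension.lower()  # '.XLSX' and '.xlsx' now compare equal
    if file_extension not in (".csv", ".xlsx"):
        raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")
    return file_extension

print(dispatch_extension("clinical_DATA.XLSX"))  # .xlsx
print(dispatch_extension("features.csv"))        # .csv
```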
src/readii/analyze/correlation.py (1)

47-48: Split multiple statements into separate lines

Lines 47 and 48 contain multiple statements on one line, which reduces readability. According to PEP 8 style guidelines, each statement should be on its own line.

Apply this diff:

-if not vertical_feature_name.startswith("_"): vertical_feature_name = f"_{vertical_feature_name}"
+if not vertical_feature_name.startswith("_"):
+    vertical_feature_name = f"_{vertical_feature_name}"
-if not horizontal_feature_name.startswith("_"): horizontal_feature_name = f"_{horizontal_feature_name}"
+if not horizontal_feature_name.startswith("_"):
+    horizontal_feature_name = f"_{horizontal_feature_name}"
🧰 Tools
🪛 Ruff (0.8.0)

47-47: Multiple statements on one line (colon)

(E701)


48-48: Multiple statements on one line (colon)

(E701)

src/readii/data/label.py (1)

4-4: Remove unused import pandas as pd

The module pandas is imported as pd but not used in the code. Please remove the unused import to clean up your code.

Apply this diff:

-import pandas as pd
🧰 Tools
🪛 Ruff (0.8.0)

4-4: pandas imported but unused

Remove unused import: pandas

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b1daaf0 and dc2e86a.

📒 Files selected for processing (4)
  • src/readii/analyze/correlation.py (1 hunks)
  • src/readii/data/label.py (1 hunks)
  • src/readii/io/loaders/features.py (1 hunks)
  • src/readii/io/loaders/general.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
src/readii/io/loaders/features.py

70-70: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

src/readii/io/loaders/general.py

39-39: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


76-76: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


78-78: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

src/readii/data/label.py

4-4: pandas imported but unused

Remove unused import: pandas

(F401)

src/readii/analyze/correlation.py

47-47: Multiple statements on one line (colon)

(E701)


48-48: Multiple statements on one line (colon)

(E701)


61-61: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (2)
src/readii/io/loaders/features.py (2)

70-70: Remove unused exception variable e

The exception variable e is assigned but not used in the except block. Remove it to clean up the code.

Apply this diff:

-except KeyError as e:
+except KeyError:
     print(f"{feature_file_path} does not have the labels {labels_to_drop} to drop.")
     # Skip to the next image type
     continue
🧰 Tools
🪛 Ruff (0.8.0)

70-70: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


9-12: 🛠️ Refactor suggestion

Fix inconsistency between function signature and docstring for image_types

The parameter image_types is required in the function signature but is described as optional with a default value in the docstring. Please update the function signature or the docstring to reflect the intended usage.

If image_types should have a default value, avoid using a mutable default argument. Set it to None and initialize within the function.

Apply this diff:

 def loadFeatureFilesFromImageTypes(extracted_feature_dir:str,
-                                   image_types:list, 
+                                   image_types:Optional[list]=None, 
                                    drop_labels:Optional[bool]=True, 
                                    labels_to_drop:Optional[list]=None)->Dict[str,pd.DataFrame]:

Then initialize image_types inside the function:

if image_types is None:
    image_types = ['original']

Comment on lines +77 to +78
except (pd.errors.ParserError, ValueError) as e:
raise ValueError(f"Error parsing file: {e}")
Contributor


🛠️ Refactor suggestion

Use exception chaining with raise ... from e

To maintain the original exception context, add from e when raising a new exception inside an except block.

Apply this diff:

 except (pd.errors.ParserError, ValueError) as e:
-    raise ValueError(f"Error parsing file: {e}")
+    raise ValueError(f"Error parsing file: {e}") from e
📝 Committable suggestion


Suggested change
except (pd.errors.ParserError, ValueError) as e:
raise ValueError(f"Error parsing file: {e}")
except (pd.errors.ParserError, ValueError) as e:
raise ValueError(f"Error parsing file: {e}") from e
🧰 Tools
🪛 Ruff (0.8.0)

78-78: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Comment on lines +39 to +40
raise ValueError(f"Invalid YAML in config file: {e}")
def loadFileToDataFrame(file_path:str) -> pd.DataFrame:
Contributor


🛠️ Refactor suggestion

Use exception chaining with raise ... from e

To preserve the original traceback and provide better error context, use exception chaining by adding from e when raising a new exception within an except block.

Apply this diff:

 except yaml.YAMLError as e:
-    raise ValueError(f"Invalid YAML in config file: {e}")
+    raise ValueError(f"Invalid YAML in config file: {e}") from e
📝 Committable suggestion


Suggested change
raise ValueError(f"Invalid YAML in config file: {e}")
def loadFileToDataFrame(file_path:str) -> pd.DataFrame:
raise ValueError(f"Invalid YAML in config file: {e}") from e
def loadFileToDataFrame(file_path:str) -> pd.DataFrame:
🧰 Tools
🪛 Ruff (0.8.0)

39-39: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

# Calculate correlation between vertical features and horizontal features
correlation_matrix = features_to_correlate.corr(method=method)
except Exception as e:
raise ValueError(f"Error calculating correlation matrix: {e}")
Contributor


🛠️ Refactor suggestion

Use exception chaining with raise ... from e

When raising a new exception within an except block, use from e to preserve the original exception context.

Apply this diff:

 except Exception as e:
-    raise ValueError(f"Error calculating correlation matrix: {e}")
+    raise ValueError(f"Error calculating correlation matrix: {e}") from e
📝 Committable suggestion


Suggested change
raise ValueError(f"Error calculating correlation matrix: {e}")
raise ValueError(f"Error calculating correlation matrix: {e}") from e
🧰 Tools
🪛 Ruff (0.8.0)

61-61: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
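The practical difference `from e` makes is that the original exception is attached as `__cause__`, so tracebacks show both errors ("The above exception was the direct cause of ..."). A quick check with stand-in names (`calculate` here is illustrative, not the package's function):

```python
def calculate(denominator):
    try:
        return 1 / denominator
    except ZeroDivisionError as e:
        # Chained: the ZeroDivisionError survives as err.__cause__
        raise ValueError("Error calculating correlation matrix") from e

try:
    calculate(0)
except ValueError as err:
    print(type(err.__cause__).__name__)  # ZeroDivisionError
```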


# Swap the keys and values in the mapping dictionary to use the replaceColumnValues function
# So the numeric values will be the keys and the string event values will be the values
replacement_value_data = dict([num_event, str_event] for str_event, num_event in event_column_value_mapping.items())
Contributor


⚠️ Potential issue

Correct the key-value order in replacement_value_data

The dictionary comprehension for replacement_value_data may have the keys and values swapped, which can lead to incorrect replacements.

Apply this diff to fix the key-value pair order:

-        replacement_value_data = dict([num_event, str_event] for str_event, num_event in event_column_value_mapping.items())
+        replacement_value_data = {num_event: str_event for str_event, num_event in event_column_value_mapping.items()}

Ensure that the keys in replacement_value_data are the numeric event codes and the values are the corresponding string event labels.

📝 Committable suggestion


Suggested change
replacement_value_data = dict([num_event, str_event] for str_event, num_event in event_column_value_mapping.items())
replacement_value_data = {num_event: str_event for str_event, num_event in event_column_value_mapping.items()}
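For reference, the dict comprehension inverts the mapping so numeric codes key back to string labels, which is what `replaceColumnValues` expects. A self-contained check with illustrative values:

```python
event_column_value_mapping = {"alive": 0, "dead": 1}

# Inverted mapping: numeric codes become keys, string labels become values
replacement_value_data = {num_event: str_event
                          for str_event, num_event in event_column_value_mapping.items()}
print(replacement_value_data)  # {0: 'alive', 1: 'dead'}
```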

@strixy16 strixy16 closed this Dec 30, 2024
Development

Successfully merging this pull request may close these issues.

Integrate analyze_readii_outputs into package
1 participant