feat: integrate analyze readii outputs functions #79
Conversation
…ze_readii_outputs
…mpy for seaborn install to work
…o plot them as a heatmap
…nds dataframe based on a column name
…date test to use this
…te-analyze_readii_outputs
…ion and to split data by column values like for train and test
Walkthrough
The changes in this pull request primarily involve the addition of new functionality and dependencies to the project.
Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant CorrelationModule
    participant DataFrame
    participant VisualizationModule
    User->>CorrelationModule: getFeatureCorrelations()
    CorrelationModule->>DataFrame: Compute correlation matrix
    DataFrame-->>CorrelationModule: Return correlation matrix
    CorrelationModule->>VisualizationModule: plotCorrelationHeatmap()
    VisualizationModule-->>User: Display heatmap
```
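For orientation, a minimal sketch of that flow in Python follows. The keyword-argument names are assumptions for illustration and are not confirmed by this PR; only the function names and the test import path appear in the review below:

```python
# Hypothetical usage sketch of the sequence above; argument names are assumed.
import numpy as np
import pandas as pd

from readii.analyze.correlation import getFeatureCorrelations, plotCorrelationHeatmap

rng = np.random.default_rng(0)
vertical = pd.DataFrame(rng.normal(size=(20, 3)), columns=["f1", "f2", "f3"])
horizontal = pd.DataFrame(rng.normal(size=(20, 3)), columns=["g1", "g2", "g3"])

# Compute the correlation matrix, then plot it as a heatmap
corr = getFeatureCorrelations(vertical_features=vertical,
                              horizontal_features=horizontal,
                              method="pearson")
plotCorrelationHeatmap(corr, triangle="lower")
```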
Actionable comments posted: 9
🧹 Outside diff range and nitpick comments (13)
src/readii/io/loaders/features.py (1)

62-62: Remove unused variable `e` in exception handling

The variable `e` is assigned but never used in the `except` block. If you don't need the exception object, you can omit it. Apply this diff to address the issue:

```diff
- except Exception as e:
+ except Exception:
```

Alternatively, if you intend to use the exception message, include it in your error handling.
🧰 Tools
🪛 Ruff (0.8.0)
62-62: Local variable `e` is assigned to but never used. Remove assignment to unused variable `e` (F841)
src/readii/data/split.py (1)

29-29: Simplify dictionary iteration by removing unnecessary `.keys()`

When iterating over a dictionary, you can iterate directly over the keys without calling `.keys()`. This is more Pythonic and slightly more efficient. Apply this diff to simplify the code:

```diff
- for new_value in replacement_value_data.keys():
+ for new_value in replacement_value_data:
- for split_column_name in split_col_data.keys():
+ for split_column_name in split_col_data:
```

Also applies to: 65-65
🧰 Tools
🪛 Ruff (0.8.0)
29-29: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
src/readii/data/select.py (1)

78-78: Simplify dictionary iteration by removing unnecessary `.keys()`

When iterating over a dictionary, iterate directly over the keys without using `.keys()`. This improves readability. Apply this diff to enhance the code:

```diff
- for key in include_col_values.keys():
+ for key in include_col_values:
- for key in exclude_col_values.keys():
+ for key in exclude_col_values:
```

Also applies to: 85-85
🧰 Tools
🪛 Ruff (0.8.0)
78-78: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
src/readii/analyze/correlation.py (2)

44-45: Follow PEP 8 style guidelines: one statement per line

Having multiple statements on one line can reduce code readability. It's recommended to place each statement on its own line. Apply this diff to adhere to style guidelines:

```diff
- if vertical_feature_name: vertical_feature_name = f"_{vertical_feature_name}"
- if horizontal_feature_name: horizontal_feature_name = f"_{horizontal_feature_name}"
+ if vertical_feature_name:
+     vertical_feature_name = f"_{vertical_feature_name}"
+ if horizontal_feature_name:
+     horizontal_feature_name = f"_{horizontal_feature_name}"
```

🧰 Tools
🪛 Ruff (0.8.0)
44-44: Multiple statements on one line (colon) (E701)
45-45: Multiple statements on one line (colon) (E701)
58-58: Preserve exception context when re-raising exceptions

When re-raising an exception, use `raise ... from ...` to maintain the original exception context. This helps with debugging by preserving the traceback of the original exception. Apply this diff to improve exception handling:

```diff
- raise ValueError(f"Error calculating correlation matrix: {e}")
+ raise ValueError(f"Error calculating correlation matrix: {e}") from e
```

🧰 Tools
🪛 Ruff (0.8.0)
58-58: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
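As a standalone illustration of the pattern (not code from this PR), chaining keeps both tracebacks visible:

```python
import json

def parse_config(text: str) -> dict:
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        # "from e" links the original JSONDecodeError to the new ValueError,
        # so the traceback shows the root cause and this wrapper together.
        raise ValueError(f"Invalid config: {e}") from e
```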
src/readii/data/label.py (2)

34-34: Fix typographical error in error message

The word "recognizeable" is misspelled; it should be "recognizable". Apply this diff to correct the typo:

```diff
- raise ValueError("Dataframe doesn't have a recognizeable patient ID column. Must contain patient or case ID.")
+ raise ValueError("Dataframe doesn't have a recognizable patient ID column. Must contain patient or case ID.")
```

241-241: Avoid shadowing built-in functions

The variable `map` shadows the built-in function `map`. Consider renaming it to avoid confusion. Apply this diff to rename the variable:

```diff
-if not all(value == map for value, map in zip(existing_event_values, sorted(event_column_value_mapping.keys()))):
+if not all(value == mapped_value for value, mapped_value in zip(existing_event_values, sorted(event_column_value_mapping.keys()))):
```

src/readii/analyze/plot_correlations.py (1)
254-254: Replace print statement with logging

Use the `logging` module to provide informational messages. Apply this diff to use logging:

```diff
- print("Correlation matrix is symmetric.")
+ import logging
+ logging.info("Correlation matrix is symmetric.")
```

src/readii/io/writers/plots.py (1)
20-20: Update docstring to match parameter type

The `output_dir_path` parameter accepts both `Path` and `str` types. Update the docstring for accuracy. Apply this diff to update the docstring:

```diff
- output_dir_path : str
+ output_dir_path : Union[Path, str]
```

tests/test_correlation.py (2)
3-3: Remove unused import `plotCorrelationHeatmap`

The function `plotCorrelationHeatmap` is imported but not used in this file. Apply this diff to remove the unused import:

```diff
-from readii.analyze.correlation import (
-    getFeatureCorrelations,
-    plotCorrelationHeatmap,
-)
+from readii.analyze.correlation import getFeatureCorrelations
```

🧰 Tools
🪛 Ruff (0.8.0)
3-3: `readii.analyze.correlation.plotCorrelationHeatmap` imported but unused. Remove unused import: `readii.analyze.correlation.plotCorrelationHeatmap` (F401)
6-6: Remove unused import `dropUpToFeature`

The function `dropUpToFeature` is imported but not used. Apply this diff to remove the unused import:

```diff
-from readii.data.process import dropUpToFeature
```

🧰 Tools
🪛 Ruff (0.8.0)
6-6: `readii.data.process.dropUpToFeature` imported but unused. Remove unused import: `readii.data.process.dropUpToFeature` (F401)
src/readii/io/loaders/images.py (2)

42-43: Fix typo in comment

There's a typo in the comment: "banes" should be "names".

```diff
- # Get list of file banes with the specified prefix and suffix in the directory
+ # Get list of file names with the specified prefix and suffix in the directory
```

40-52: Consider sorting image types for consistent output

The function returns image types in arbitrary order based on filesystem traversal. Consider sorting the list before returning:

```diff
  # Initialize an empty list to store the image types
  image_types = []
  for file in raw_data_dir.glob(f"{feature_file_prefix}*{feature_file_suffix}"):
      file_name = file.name
      image_type = file_name.removeprefix(feature_file_prefix).removesuffix(feature_file_suffix)
      image_types.append(image_type)
- return image_types
+ return sorted(image_types)
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

⛔ Files ignored due to path filters (1)
- `pixi.lock` is excluded by `!**/*.lock`

📒 Files selected for processing (13)
- pyproject.toml (2 hunks)
- src/readii/__init__.py (0 hunks)
- src/readii/analyze/correlation.py (1 hunks)
- src/readii/analyze/plot_correlations.py (1 hunks)
- src/readii/data/label.py (1 hunks)
- src/readii/data/select.py (1 hunks)
- src/readii/data/split.py (1 hunks)
- src/readii/io/loaders/features.py (1 hunks)
- src/readii/io/loaders/general.py (1 hunks)
- src/readii/io/loaders/images.py (1 hunks)
- src/readii/io/writers/plots.py (1 hunks)
- src/readii/loaders.py (1 hunks)
- tests/test_correlation.py (1 hunks)

💤 Files with no reviewable changes (1)
- src/readii/__init__.py
🧰 Additional context used
🪛 Ruff (0.8.0)
src/readii/io/loaders/general.py
32-32: Use a context manager for opening files
(SIM115)
src/readii/io/loaders/features.py
10-10: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
12-12: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
62-62: Local variable `e` is assigned to but never used. Remove assignment to unused variable `e` (F841)
tests/test_correlation.py
3-3: `readii.analyze.correlation.plotCorrelationHeatmap` imported but unused. Remove unused import: `readii.analyze.correlation.plotCorrelationHeatmap` (F401)
6-6: `readii.data.process.dropUpToFeature` imported but unused. Remove unused import: `readii.data.process.dropUpToFeature` (F401)
src/readii/data/select.py
78-78: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
85-85: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)

src/readii/data/split.py
29-29: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
65-65: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
src/readii/analyze/correlation.py
44-44: Multiple statements on one line (colon) (E701)
45-45: Multiple statements on one line (colon) (E701)
58-58: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

src/readii/data/label.py
188-188: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
261-261: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
🔇 Additional comments (3)
src/readii/io/writers/plots.py (1)

9-9: Correct type annotation syntax

When using `Union`, types should be separated by commas, not the `|` operator. Apply this diff to fix the type annotation:

```diff
 def saveSeabornPlotFigure(sns_plot: heatmap,
                           plot_name: str,
-                          output_dir_path: Union[Path|str]):
+                          output_dir_path: Union[Path, str]):
```

Likely invalid or redundant comment.
src/readii/loaders.py (1)

77-79: Verify impact of `continuous=False` parameter

The change ensures integer indices, but we should verify this doesn't affect existing code that might expect float indices. Let's check for any dependencies on the index type:

✅ Verification successful

Setting `continuous=False` appears safe to use. The change to use integer indices is safe because:
- The output segMasks is only used locally within the function
- No downstream code depends on float indices
- The parameter is well documented
- The existing_roi_indices parameter already uses integer values (0 for background)
🏁 Scripts executed

The following scripts were executed for the analysis:

```bash
#!/bin/bash
# Search for code that might depend on the index type from StructureSetToSegmentation
rg -l "StructureSetToSegmentation" | xargs rg -A 5 "\.index|indices"
# Search for test files that might need updating
fd -e py test_ | xargs rg -A 5 "StructureSetToSegmentation"
```

Length of output: 446

```bash
#!/bin/bash
# Let's search for any usage of the output from StructureSetToSegmentation
rg -A 5 "makeMask\("
# Also search for any test cases involving StructureSetToSegmentation
ast-grep --pattern 'class $_ {
  $$$
  test_$_StructureSetToSegmentation$_($$$) {
    $$$
  }
  $$$
}'
```

Length of output: 393

```bash
#!/bin/bash
# Let's check how segMasks is used after being created
rg -B 2 -A 5 "segMasks = makeMask"
# Also check for any direct usage of continuous parameter
rg -B 2 -A 2 "continuous\s*=\s*(True|False)"
```

Length of output: 713
pyproject.toml (1)

14-16: 🛠️ Refactor suggestion

Consider using version range for numpy instead of strict pinning

The strict version pinning of numpy (`==1.26.4`) differs from the project's convention of using version ranges. Since numpy is a core dependency that other packages might depend on, strict pinning could lead to dependency conflicts. Consider using a version range similar to other dependencies:

```diff
- "numpy==1.26.4",
+ "numpy>=1.26.4,<2"
```

Let's verify the compatibility of these dependencies:

✅ Verification successful

Consider using version range for numpy instead of strict pinning. Based on the codebase analysis:
- numpy is widely used across multiple files (9 files import numpy)
- No version-specific numpy functionality is detected
- seaborn is used in correlation analysis and plotting modules
- The codebase follows a consistent pattern of using numpy's standard features through the `np` alias

Since there's no evidence of version-specific numpy requirements and no detected incompatibilities, using a version range would be more maintainable and consistent with the project's dependency style:

```diff
- "numpy==1.26.4",
+ "numpy>=1.26.4,<2"
```
🏁 Scripts executed

The following scripts were executed for the analysis:

```bash
#!/bin/bash
# Description: Check compatibility of numpy with other dependencies
# Check if any other packages in the project have numpy as a dependency
pip index versions seaborn | grep -A 5 "Required-by:"
pip index versions pyradiomics-bhklab | grep -A 5 "Required-by:"
# Check for any known conflicts
pip check numpy==1.26.4 seaborn==0.13.2 pyradiomics-bhklab==3.1.4
```

Length of output: 468

```bash
#!/bin/bash
# Let's try a different approach by examining the codebase for numpy usage patterns
# Find files that import numpy and check their version constraints
rg "import numpy|from numpy" -l | while read -r file; do
  echo "=== $file ==="
  rg "import numpy|from numpy" "$file"
done
# Look for any requirements.txt or similar files that might have dependency info
fd "requirements.*\.txt|setup\.py" --type f
# Check if there are any CI/testing configuration files that might specify dependencies
fd "tox.ini|pytest.ini|\.github/workflows/.*\.ya?ml" --type f
# Look for any dependency management related comments in Python files
rg "# *(?i)(requires?|depends?|numpy)" "\.py$"
```

Length of output: 1985

```bash
#!/bin/bash
# Let's check the docs/requirements.txt file and look for any numpy version constraints
cat docs/requirements.txt
# Also check if seaborn is used in the codebase
rg "import seaborn|from seaborn" -l | while read -r file; do
  echo "=== $file ==="
  rg "import seaborn|from seaborn" "$file"
done
# Look for any version-specific numpy functionality
rg "numpy\.[0-9]|np\.[0-9]" -l
```

Length of output: 764
src/readii/io/loaders/features.py (Outdated)

```python
def loadFeatureFilesFromImageTypes(extracted_feature_dir:str,
                                   image_types:Optional[list]=['original'],
```
Avoid mutable default arguments in function parameters

Using mutable default arguments like lists can lead to unexpected behavior because the default list is shared across all function calls. It's better to use `None` as the default value and initialize the list inside the function. Apply this diff to fix the issue:

```diff
-def loadFeatureFilesFromImageTypes(extracted_feature_dir:str,
-                                   image_types:Optional[list]=['original'],
-                                   drop_labels:Optional[bool]=True,
-                                   labels_to_drop:Optional[list]=["patient_ID","survival_time_in_years","survival_event_binary"])->Dict[str,pd.DataFrame]:
+def loadFeatureFilesFromImageTypes(extracted_feature_dir:str,
+                                   image_types:Optional[list]=None,
+                                   drop_labels:Optional[bool]=True,
+                                   labels_to_drop:Optional[list]=None)->Dict[str,pd.DataFrame]:
```

Within the function, initialize the lists if they are `None`:

```python
if image_types is None:
    image_types = ['original']
if labels_to_drop is None:
    labels_to_drop = ["patient_ID", "survival_time_in_years", "survival_event_binary"]
```

Also applies to: 12-12
🧰 Tools
🪛 Ruff (0.8.0)
10-10: Do not use mutable data structures for argument defaults. Replace with `None`; initialize within function (B006)
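A standalone demonstration of the pitfall (not code from this PR): the default list is created once, at function definition time, and shared across calls:

```python
def append_item(item, bucket=[]):  # the same list object is reused on every call
    bucket.append(item)
    return bucket

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2]  <- state leaks between calls

def append_item_safe(item, bucket=None):
    if bucket is None:
        bucket = []  # a fresh list per call
    bucket.append(item)
    return bucket

print(append_item_safe(1))  # [1]
print(append_item_safe(2))  # [2]
```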
src/readii/data/split.py (Outdated)

```python
if new_value not in dataframe[column_to_change].unique():
    raise ValueError(f"New value {new_value} not found in column {column_to_change}.")
```
Potential logical error in value replacement check

The function checks if the `new_value` exists in the column before replacement. However, you should check for the presence of the old values to be replaced, not the new value. Modify the check to verify the presence of the old values:

```diff
- if new_value not in dataframe[column_to_change].unique():
-     raise ValueError(f"New value {new_value} not found in column {column_to_change}.")
+ old_values = replacement_value_data[new_value]
+ if not set(old_values).intersection(set(dataframe[column_to_change].unique())):
+     raise ValueError(f"Values {old_values} to replace not found in column {column_to_change}.")
```
src/readii/io/loaders/general.py (Outdated)

```python
def loadImageDatasetConfig(dataset_name:str,
                           config_dir_path:str) -> dict:
    """Load the configuration file for a given dataset. Expects the configuration file to be named <dataset_name>.yaml.

    Parameters
    ----------
    dataset_name : str
        Name of the dataset to load the configuration file for.
    config_dir_path : str
        Path to the directory containing the configuration files.

    Returns
    -------
    dict
        Dictionary containing the configuration settings for the dataset.

    Examples
    --------
    >>> config = loadImageDatasetConfig("NSCLC_Radiogenomics", "config/")
    """
    # Make full path to config file
    config_file_path = os.path.join(config_dir_path, f"{dataset_name}.yaml")

    # Check if config file exists
    if os.path.exists(config_file_path):
        # Load the config file
        config = yaml.safe_load(open(config_file_path, "r"))
        return config
    else:
        print(f"Config file {config_file_path} does not exist.")
        return None
```
🛠️ Refactor suggestion

Use context manager for file operations and improve error handling

The function has potential resource leaks and could be more robust in error handling. Consider this implementation:

```diff
 def loadImageDatasetConfig(dataset_name:str,
                            config_dir_path:str) -> dict:
     config_file_path = os.path.join(config_dir_path, f"{dataset_name}.yaml")
     if not os.path.exists(config_file_path):
         raise FileNotFoundError(f"Config file {config_file_path} does not exist.")
-    config = yaml.safe_load(open(config_file_path, "r"))
-    return config
-    else:
-        print(f"Config file {config_file_path} does not exist.")
-        return None
+    try:
+        with open(config_file_path, "r") as f:
+            return yaml.safe_load(f)
+    except yaml.YAMLError as e:
+        raise ValueError(f"Invalid YAML in config file: {e}")
```
🧰 Tools
🪛 Ruff (0.8.0)
32-32: Use a context manager for opening files (SIM115)
src/readii/io/loaders/general.py (Outdated)
```python
def loadFileToDataFrame(file_path:str) -> pd.DataFrame:
    """Load data from a csv or xlsx file into a pandas dataframe.

    Parameters
    ----------
    file_path (str): Path to the data file.

    Returns
    -------
    pd.DataFrame: Dataframe containing the data from the file.
    """
    # Get the file extension
    _, file_extension = os.path.splitext(file_path)

    try:
        # Check if the file is an Excel file
        if file_extension == '.xlsx':
            df = pd.read_excel(file_path)
        # Check if the file is a CSV file
        elif file_extension == '.csv':
            df = pd.read_csv(file_path)
        else:
            raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")

        return df

    except Exception as e:
        print(f"An error occurred: {e}")
        return None
```
🛠️ Refactor suggestion

Improve error handling and input validation

The function could be more robust with input validation and specific error handling. Consider this implementation:

```diff
 def loadFileToDataFrame(file_path:str) -> pd.DataFrame:
+    if not file_path:
+        raise ValueError("file_path cannot be empty")
+
+    if not os.path.exists(file_path):
+        raise FileNotFoundError(f"File {file_path} does not exist")
+
     _, file_extension = os.path.splitext(file_path)
     try:
         if file_extension == '.xlsx':
             df = pd.read_excel(file_path)
         elif file_extension == '.csv':
             df = pd.read_csv(file_path)
         else:
             raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")
+        if df.empty:
+            raise ValueError("Loaded DataFrame is empty")
+
         return df
-    except Exception as e:
-        print(f"An error occurred: {e}")
-        return None
+    except pd.errors.EmptyDataError:
+        raise ValueError("File is empty")
+    except (pd.errors.ParserError, ValueError) as e:
+        raise ValueError(f"Error parsing file: {e}")
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def loadFileToDataFrame(file_path:str) -> pd.DataFrame: | |
"""Load data from a csv or xlsx file into a pandas dataframe. | |
Parameters | |
---------- | |
file_path (str): Path to the data file. | |
Returns | |
------- | |
pd.DataFrame: Dataframe containing the data from the file. | |
""" | |
# Get the file extension | |
_, file_extension = os.path.splitext(file_path) | |
try: | |
# Check if the file is an Excel file | |
if file_extension == '.xlsx': | |
df = pd.read_excel(file_path) | |
# Check if the file is a CSV file | |
elif file_extension == '.csv': | |
df = pd.read_csv(file_path) | |
else: | |
raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.") | |
return df | |
except Exception as e: | |
print(f"An error occurred: {e}") | |
return None | |
def loadFileToDataFrame(file_path:str) -> pd.DataFrame: | |
"""Load data from a csv or xlsx file into a pandas dataframe. | |
Parameters | |
---------- | |
file_path (str): Path to the data file. | |
Returns | |
------- | |
pd.DataFrame: Dataframe containing the data from the file. | |
""" | |
if not file_path: | |
raise ValueError("file_path cannot be empty") | |
if not os.path.exists(file_path): | |
raise FileNotFoundError(f"File {file_path} does not exist") | |
# Get the file extension | |
_, file_extension = os.path.splitext(file_path) | |
try: | |
# Check if the file is an Excel file | |
if file_extension == '.xlsx': | |
df = pd.read_excel(file_path) | |
# Check if the file is a CSV file | |
elif file_extension == '.csv': | |
df = pd.read_csv(file_path) | |
else: | |
raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.") | |
if df.empty: | |
raise ValueError("Loaded DataFrame is empty") | |
return df | |
except pd.errors.EmptyDataError: | |
raise ValueError("File is empty") | |
except (pd.errors.ParserError, ValueError) as e: | |
raise ValueError(f"Error parsing file: {e}") |
good suggestion
…matrix is passed into getFeatureCorrelations
…rscore prefix will be added if there isn't one
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (4)
src/readii/analyze/correlation.py (4)

44-45: Refactor single-line conditional statements for better readability

Multiple statements on one line reduce code readability and can make debugging more difficult. Apply this diff to improve readability:

```diff
-if not vertical_feature_name.startswith("_"): vertical_feature_name = f"_{vertical_feature_name}"
+if not vertical_feature_name.startswith("_"):
+    vertical_feature_name = f"_{vertical_feature_name}"
-if not horizontal_feature_name.startswith("_"): horizontal_feature_name = f"_{horizontal_feature_name}"
+if not horizontal_feature_name.startswith("_"):
+    horizontal_feature_name = f"_{horizontal_feature_name}"
```

🧰 Tools
🪛 Ruff (0.8.0)
44-44: Multiple statements on one line (colon) (E701)
45-45: Multiple statements on one line (colon) (E701)
58-58: Use `raise ... from e` to preserve exception context

When re-raising an exception within an `except` block, using `raise ... from e` helps maintain the original traceback, making debugging easier. Apply this diff to modify the exception handling:

```diff
-except Exception as e:
-    raise ValueError(f"Error calculating correlation matrix: {e}")
+except Exception as e:
+    raise ValueError(f"Error calculating correlation matrix: {e}") from e
```

🧰 Tools
🪛 Ruff (0.8.0)
58-58: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
83-83: Fix typo in docstring

There is a typo in the docstring for the `triangle` parameter. Apply this diff to correct the typo:

```diff
- Which triangle half of the matrixto plot. The default is "lower".
+ Which triangle half of the matrix to plot. The default is "lower".
```
162-170: Refactor duplicated validation logic in correlation functions

The functions `getVerticalSelfCorrelations`, `getHorizontalSelfCorrelations`, and `getCrossCorrelationMatrix` have similar validation code for checking feature counts against the correlation matrix dimensions. Consider creating a shared helper function to handle this validation logic to reduce code duplication and improve maintainability (see the sketch below).

Also applies to: 190-201, 221-226
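One possible shape for such a helper, sketched under the assumption that each function compares an expected feature count against one dimension of the correlation matrix; the name and signature are made up for illustration:

```python
import pandas as pd

def _validateFeatureCount(correlation_matrix: pd.DataFrame,
                          expected_feature_count: int,
                          axis: int = 0) -> None:
    """Raise ValueError if the given axis of the correlation matrix doesn't match the expected feature count."""
    actual = correlation_matrix.shape[axis]
    if actual != expected_feature_count:
        raise ValueError(
            f"Correlation matrix has {actual} features along axis {axis}, "
            f"expected {expected_feature_count}."
        )
```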
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📒 Files selected for processing (3)
- src/readii/analyze/correlation.py (1 hunks)
- src/readii/loaders.py (1 hunks)
- tests/test_correlation.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)
- src/readii/loaders.py

🧰 Additional context used
🪛 Ruff (0.8.0)

src/readii/analyze/correlation.py
44-44: Multiple statements on one line (colon) (E701)
45-45: Multiple statements on one line (colon) (E701)
58-58: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
🔇 Additional comments (1)

tests/test_correlation.py (1)

1-115: Comprehensive and well-structured test suite

The test cases cover various scenarios, including valid inputs, default behaviors, invalid methods, incorrect feature inputs, and feature name handling. The tests are well-organized and effectively validate the functionality of the `getFeatureCorrelations` function.
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##             main      #79       +/-   ##
===========================================
- Coverage   60.92%   38.97%   -21.96%
===========================================
  Files          10       19        +9
  Lines         540      898      +358
===========================================
+ Hits          329      350       +21
- Misses        211      548      +337
```

☔ View full report in Codecov by Sentry.
…in the passed column
Actionable comments posted: 5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- src/readii/data/label.py (1 hunks)
🔇 Additional comments (2)

src/readii/data/label.py (2)

203-203: Ensure consistency between docstring and default parameter value

The docstring mentions that the default for `event_column_value_mapping` is an empty dictionary, but in the function signature, it is set to `None`. Update the docstring to reflect the actual default value. Apply this diff to fix the docstring:

```diff
- The default is an empty dictionary.
+ The default is `None`.
```

272-272: Ensure consistency between docstring and default parameter value

The docstring states that `outcome_labels` defaults to `["survival_time_in_years", "survival_event_binary"]`, but in the function signature, it is set to `None`. Update the docstring to reflect the current default value. Apply this diff to fix the docstring:

```diff
- The default is ["survival_time_in_years", "survival_event_binary"].
+ The default is `None`.
```
```python
# TODO: add handling for values not in the dictionary
```
Implement handling for unmapped event values

There is a `TODO` comment indicating that handling for event values not in the mapping dictionary is incomplete. This could lead to errors if unexpected values are encountered. Please implement the necessary handling to ensure all event values are properly managed.

Would you like assistance in implementing this feature?
Yes I would like assistance.
@strixy16, glad to assist! Here's an updated version of the `eventOutcomeColumnSetup` function with handling for unmapped event values:

```python
def eventOutcomeColumnSetup(dataframe_with_outcome: DataFrame,
                            outcome_column_label: str,
                            standard_column_label: str,
                            event_column_value_mapping: Optional[dict] = None):
    """Function to set up an event outcome column in a dataframe.

    Parameters
    ----------
    dataframe_with_outcome : DataFrame
        Dataframe containing the outcome column to convert.
    outcome_column_label : str
        Label for the outcome column to convert in the dataframe.
    standard_column_label : str
        Name of the column to save the standardized outcome column as.
    event_column_value_mapping : dict, optional
        Dictionary of event values to convert to numeric values. Keys are the event values, values are the numeric values.
        If provided, any unmapped event values will be assigned new numeric values.
        If not provided, will attempt to convert based on the values in the outcome column. By default, "alive" and "dead"
        will be converted to 0 and 1, respectively, if present in the outcome column.

    Returns
    -------
    dataframe_with_standardized_outcome : DataFrame
        Dataframe with a copy of the specified outcome column converted to numeric values.
    """
    # Get the type of the existing event column
    event_variable_type = type(dataframe_with_outcome[outcome_column_label].iloc[0])

    # Make a copy of the dataframe to work on
    dataframe_with_standardized_outcome = dataframe_with_outcome.copy()

    # Handle numeric event column
    if np.issubdtype(event_variable_type, np.number):
        dataframe_with_standardized_outcome[standard_column_label] = dataframe_with_outcome[outcome_column_label]

    # Handle boolean event column
    elif np.issubdtype(event_variable_type, np.bool_):
        dataframe_with_standardized_outcome[standard_column_label] = dataframe_with_outcome[outcome_column_label].astype(int)

    # Handle string event column
    elif np.issubdtype(event_variable_type, np.str_):
        # Make values of outcome column lowercase
        dataframe_with_standardized_outcome[outcome_column_label] = dataframe_with_outcome[outcome_column_label].str.lower()

        # Get the existing event values in the provided dataframe
        existing_event_values = sorted(dataframe_with_standardized_outcome[outcome_column_label].unique())

        if not event_column_value_mapping:
            # Create a dictionary to map event values to numeric values
            event_column_value_mapping = survivalStatusToNumericMapping(
                event_outcome_column=dataframe_with_standardized_outcome[outcome_column_label]
            )
        else:
            # Convert all dictionary keys in provided mapping to lowercase
            event_column_value_mapping = dict((status.lower(), value)
                                              for status, value in event_column_value_mapping.items())

            # Identify any values not in the mapping
            unmapped_values = [value for value in existing_event_values
                               if value not in event_column_value_mapping]

            if unmapped_values:
                print(f"Unmapped event values found: {unmapped_values}. They will be assigned new numeric values.")
                # Assign next available numeric value to unmapped event values
                max_mapped_value = max(event_column_value_mapping.values(), default=-1)
                for i, value in enumerate(unmapped_values, start=max_mapped_value + 1):
                    event_column_value_mapping[value] = i

        # Map the event column using the updated mapping
        dataframe_with_standardized_outcome[standard_column_label] = \
            dataframe_with_standardized_outcome[outcome_column_label].map(event_column_value_mapping)

    else:
        raise ValueError(f"Unsupported data type for outcome column: {event_variable_type}")

    return dataframe_with_standardized_outcome
```
Explanation:
- Handling Unmapped Values: The function now checks for any event values in `existing_event_values` that are not present in `event_column_value_mapping`.
- Assigning New Numeric Values: Unmapped event values are assigned new numeric codes starting from the next available integer after the maximum value in the existing mapping.
- Mapping the Outcome Column: The event outcome column is then mapped using the updated `event_column_value_mapping`, ensuring all event values are properly managed.
- User Feedback: A message is printed to inform the user about any unmapped event values found and their assignment.

This implementation ensures that any unexpected or unmapped event values are handled gracefully, preventing errors during execution.
Let me know if you have any questions or need further assistance!
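A hypothetical usage of the sketch above, assuming the function and its dependencies (`numpy` as `np`, `DataFrame`, `Optional`, and `survivalStatusToNumericMapping`) are in scope; the column names and values are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"status": ["Alive", "Dead", "Lost to follow-up"]})
mapping = {"alive": 0, "dead": 1}  # "lost to follow-up" is intentionally unmapped

result = eventOutcomeColumnSetup(dataframe_with_outcome=df,
                                 outcome_column_label="status",
                                 standard_column_label="survival_event_binary",
                                 event_column_value_mapping=mapping)
print(result["survival_event_binary"].tolist())  # expected: [0, 1, 2]
```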
…s to be a required argument
… to remove mutable default argument
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (5)
src/readii/data/split.py (3)
29-39: Optimize dictionary key iteration

The current implementation uses `.keys()`, which is unnecessary when iterating over a dictionary. Apply this optimization:

```diff
- for new_value in replacement_value_data.keys():
+ for new_value in replacement_value_data:
```

🧰 Tools
🪛 Ruff (0.8.0)
29-29: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
66-70: Optimize dictionary key iteration

Similar to the previous function, `.keys()` is unnecessary when iterating over a dictionary. Apply this optimization:

```diff
- for split_column_name in split_col_data.keys():
+ for split_column_name in split_col_data:
```

🧰 Tools
🪛 Ruff (0.8.0)
66-66: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
96-96: Consider avoiding inplace operations for better safety

Using `inplace=True` with `reset_index` can be problematic in concurrent operations and makes the code less functional. Apply this change for better safety:

```diff
- split_dataframe.reset_index(inplace=True, drop=True)
+ split_dataframe = split_dataframe.reset_index(drop=True)
```

src/readii/io/loaders/features.py (2)
13-31: Update docstring to match current implementation

The docstring contains inconsistencies with the actual implementation:
- It mentions a default value of `['original']` for `image_types`, but this parameter is now required
- It references specific default values for `labels_to_drop` in the description, which could become outdated

Consider updating the docstring:

```diff
 Parameters
 ----------
 extracted_feature_dir : str
     Path to the directory containing the extracted feature csv files
-image_types : list, optional
-    List of image types to load in. The default is ['original'].
+image_types : list
+    List of image types to load in
 drop_labels : bool, optional
     Whether to drop the labels from the dataframes. Use when loading labelled data from data_setup_for_modeling.ipynb. The default is True.
 labels_to_drop : list, optional
-    List of labels to drop from the dataframes. The default is ["patient_ID","survival_time_in_years","survival_event_binary"] based on code
-    in data_setup_for_modeling.ipynb.
+    List of labels to drop from the dataframes. If None, defaults to common patient identifiers and survival metrics.
```
42-53: Consider providing more context in error messages

The error handling could be more informative by including the specific error details and providing guidance on how to resolve the issue.

```diff
 for image_type in image_types:
     try:
         image_type_feature_file = [file for file in feature_file_list if (image_type in file) and (file.endswith(".csv"))][0]
         feature_file_list.remove(image_type_feature_file)
-    except Exception as e:
-        print(f"{e}\n No {image_type} feature csv files found in {extracted_feature_dir}")
+    except IndexError:
+        print(f"No CSV files found for image type '{image_type}' in {extracted_feature_dir}.")
+        print(f"Expected filename pattern: *{image_type}*.csv")
         # Skip to the next image type
         continue
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- src/readii/data/split.py (1 hunks)
- src/readii/io/loaders/features.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
src/readii/data/split.py
29-29: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
66-66: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)

src/readii/io/loaders/features.py
66-66: Local variable `e` is assigned to but never used. Remove assignment to unused variable `e` (F841)
🔇 Additional comments (4)

src/readii/data/split.py (3)

1-6: LGTM! Clean imports and well-typed function signatures. The code follows good practices with explicit type hints and minimal imports.

7-23: LGTM! Well-documented function with clear parameters and return types. The docstring provides comprehensive information about the function's purpose, parameters, and return value.

46-61: LGTM! Clear and comprehensive documentation. The docstring effectively communicates the function's purpose, parameters, and return value.

src/readii/io/loaders/features.py (1)

1-7: LGTM! Imports are well-organized. The imports are logically grouped and all are being used in the code.
…andling in loadImageDatasetConfig
…tus mapping function
…t, add check when converting the values to lowercase
…lumnValues function in split.py
…lesFromImageTypes
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (3)
src/readii/io/loaders/general.py (1)

58-68: Ensure file extension checks are case-insensitive

Currently, the file extension checks are case-sensitive, which may cause issues if the file has an uppercase extension (e.g., `.XLSX`, `.CSV`). Consider normalizing the file extension to lowercase before comparison. Apply this diff to normalize the file extension:

```diff
 _, file_extension = os.path.splitext(file_path)
+file_extension = file_extension.lower()

 try:
     # Check if the file is an Excel file
     if file_extension == '.xlsx':
         df = pd.read_excel(file_path)
     # Check if the file is a CSV file
     elif file_extension == '.csv':
         df = pd.read_csv(file_path)
     else:
         raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")
```
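A standalone illustration of the normalization (not code from this PR; the path is made up):

```python
import os

_, ext = os.path.splitext("cohort/features.CSV")
print(ext)          # '.CSV' fails a case-sensitive comparison with '.csv'
print(ext.lower())  # '.csv' matches after normalizing
```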
src/readii/analyze/correlation.py (1)

47-48: Split multiple statements into separate lines

Lines 47 and 48 contain multiple statements on one line, which reduces readability. According to PEP 8 style guidelines, each statement should be on its own line. Apply this diff:

```diff
-if not vertical_feature_name.startswith("_"): vertical_feature_name = f"_{vertical_feature_name}"
+if not vertical_feature_name.startswith("_"):
+    vertical_feature_name = f"_{vertical_feature_name}"
-if not horizontal_feature_name.startswith("_"): horizontal_feature_name = f"_{horizontal_feature_name}"
+if not horizontal_feature_name.startswith("_"):
+    horizontal_feature_name = f"_{horizontal_feature_name}"
```

🧰 Tools
🪛 Ruff (0.8.0)
47-47: Multiple statements on one line (colon) (E701)
48-48: Multiple statements on one line (colon) (E701)
src/readii/data/label.py (1)

4-4: Remove unused import `pandas as pd`

The module `pandas` is imported as `pd` but not used in the code. Please remove the unused import to clean up your code. Apply this diff:

```diff
-import pandas as pd
```

🧰 Tools
🪛 Ruff (0.8.0)
4-4: `pandas` imported but unused. Remove unused import: `pandas` (F401)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📒 Files selected for processing (4)
- src/readii/analyze/correlation.py (1 hunks)
- src/readii/data/label.py (1 hunks)
- src/readii/io/loaders/features.py (1 hunks)
- src/readii/io/loaders/general.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)

src/readii/io/loaders/features.py
70-70: Local variable `e` is assigned to but never used. Remove assignment to unused variable `e` (F841)

src/readii/io/loaders/general.py
39-39: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
76-76: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
78-78: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

src/readii/data/label.py
4-4: `pandas` imported but unused. Remove unused import: `pandas` (F401)

src/readii/analyze/correlation.py
47-47: Multiple statements on one line (colon) (E701)
48-48: Multiple statements on one line (colon) (E701)
61-61: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
🔇 Additional comments (2)

src/readii/io/loaders/features.py (2)

70-70: Remove unused exception variable `e`

The exception variable `e` is assigned but not used in the `except` block. Remove it to clean up the code. Apply this diff:

```diff
-except KeyError as e:
+except KeyError:
     print(f"{feature_file_path} does not have the labels {labels_to_drop} to drop.")
     # Skip to the next image type
     continue
```

🧰 Tools
🪛 Ruff (0.8.0)
70-70: Local variable `e` is assigned to but never used. Remove assignment to unused variable `e` (F841)
9-12: 🛠️ Refactor suggestion

Fix inconsistency between function signature and docstring for `image_types`

The parameter `image_types` is required in the function signature but is described as optional with a default value in the docstring. Please update the function signature or the docstring to reflect the intended usage. If `image_types` should have a default value, avoid using a mutable default argument: set it to `None` and initialize within the function. Apply this diff:

```diff
 def loadFeatureFilesFromImageTypes(extracted_feature_dir:str,
-                                   image_types:list,
+                                   image_types:Optional[list]=None,
                                    drop_labels:Optional[bool]=True,
                                    labels_to_drop:Optional[list]=None)->Dict[str,pd.DataFrame]:
```

Then initialize `image_types` inside the function:

```python
if image_types is None:
    image_types = ['original']
```
```python
except (pd.errors.ParserError, ValueError) as e:
    raise ValueError(f"Error parsing file: {e}")
```
🛠️ Refactor suggestion

Use exception chaining with `raise ... from e`

To maintain the original exception context, add `from e` when raising a new exception inside an `except` block. Apply this diff:

```diff
 except (pd.errors.ParserError, ValueError) as e:
-    raise ValueError(f"Error parsing file: {e}")
+    raise ValueError(f"Error parsing file: {e}") from e
```
🧰 Tools
🪛 Ruff (0.8.0)
78-78: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
raise ValueError(f"Invalid YAML in config file: {e}") | ||
def loadFileToDataFrame(file_path:str) -> pd.DataFrame: |
🛠️ Refactor suggestion

Use exception chaining with `raise ... from e`

To preserve the original traceback and provide better error context, use exception chaining by adding `from e` when raising a new exception within an `except` block. Apply this diff:

```diff
 except yaml.YAMLError as e:
-    raise ValueError(f"Invalid YAML in config file: {e}")
+    raise ValueError(f"Invalid YAML in config file: {e}") from e
```
🧰 Tools
🪛 Ruff (0.8.0)
39-39: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
```python
    # Calculate correlation between vertical features and horizontal features
    correlation_matrix = features_to_correlate.corr(method=method)
except Exception as e:
    raise ValueError(f"Error calculating correlation matrix: {e}")
```
🛠️ Refactor suggestion

Use exception chaining with `raise ... from e`

When raising a new exception within an `except` block, use `from e` to preserve the original exception context. Apply this diff:

```diff
 except Exception as e:
-    raise ValueError(f"Error calculating correlation matrix: {e}")
+    raise ValueError(f"Error calculating correlation matrix: {e}") from e
```
🧰 Tools
🪛 Ruff (0.8.0)
61-61: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
```python
# Swap the keys and values in the mapping dictionary to use the replaceColumnValues function
# So the numeric values will be the keys and the string event values will be the values
replacement_value_data = dict([num_event, str_event] for str_event, num_event in event_column_value_mapping.items())
```
Correct the key-value order in `replacement_value_data`

The dictionary comprehension for `replacement_value_data` may have the keys and values swapped, which can lead to incorrect replacements. Apply this diff to fix the key-value pair order:

```diff
- replacement_value_data = dict([num_event, str_event] for str_event, num_event in event_column_value_mapping.items())
+ replacement_value_data = {num_event: str_event for str_event, num_event in event_column_value_mapping.items()}
```

Ensure that the keys in `replacement_value_data` are the numeric event codes and the values are the corresponding string event labels.
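A standalone illustration of the corrected swap (values made up):

```python
event_column_value_mapping = {"alive": 0, "dead": 1}
replacement_value_data = {num_event: str_event
                          for str_event, num_event in event_column_value_mapping.items()}
print(replacement_value_data)  # {0: 'alive', 1: 'dead'}
```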
Summary by CodeRabbit

New Features

Bug Fixes
- Updated `loadRTSTRUCTSITK` function for improved data handling.

Tests