-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add io/loaders module #83
Conversation
…ic loading functions
WalkthroughThis pull request introduces a new module for loading various data types within the READII pipeline. It includes multiple new functions for loading imaging feature sets, dataset configurations, and data files into pandas DataFrames. Additionally, two new YAML configuration files for specific datasets are added, along with new tests to validate the functionality of the loading functions and configuration handling. Changes
Possibly related PRs
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #83 +/- ##
==========================================
- Coverage 50.89% 49.04% -1.86%
==========================================
Files 21 25 +4
Lines 894 991 +97
==========================================
+ Hits 455 486 +31
- Misses 439 505 +66 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (7)
src/readii/io/loaders/features.py (2)
23-25
: Avoid referencing notebooks in docstringsReferencing
data_setup_for_modeling.ipynb
in the docstring may not be helpful for users who are not familiar with that notebook. Consider providing a clear description without external references.Apply this diff to improve the docstring:
labels_to_drop : list, optional List of labels to drop from the dataframes. The default is ["patient_ID", "survival_time_in_years", "survival_event_binary"]. - based on code in data_setup_for_modeling.ipynb.
70-70
: Remove unused variablee
in exception handlingThe variable
e
is assigned but not used in theexcept
clause. You can removeas e
to clean up the code.Apply this diff to fix the issue:
- except KeyError as e: + except KeyError:🧰 Tools
🪛 Ruff (0.8.2)
70-70: Local variable
e
is assigned to but never usedRemove assignment to unused variable
e
(F841)
tests/io/loaders/test_general.py (1)
23-24
: Useis
for comparison toNone
Comparisons to
None
should useis
oris not
to check for identity.Apply this diff to fix the comparisons:
- assert config["outcome_variables"]["event_label"] == None - assert config["outcome_variables"]["event_value_mapping"] == None + assert config["outcome_variables"]["event_label"] is None + assert config["outcome_variables"]["event_value_mapping"] is None🧰 Tools
🪛 Ruff (0.8.2)
23-23: Comparison to
None
should becond is None
Replace with
cond is None
(E711)
24-24: Comparison to
None
should becond is None
Replace with
cond is None
(E711)
src/readii/io/loaders/general.py (1)
39-39
: Useraise ... from ...
to preserve exception contextWhen re-raising exceptions in an
except
block, useraise ... from e
to maintain the original traceback and provide better debugging information.Apply this diff to improve exception handling:
except yaml.YAMLError as e: - raise ValueError(f"Invalid YAML in config file: {e}") + raise ValueError(f"Invalid YAML in config file: {e}") from e # ... except pd.errors.EmptyDataError: - raise ValueError("File is empty") + raise ValueError("File is empty") from None except (pd.errors.ParserError, ValueError) as e: - raise ValueError(f"Error parsing file: {e}") + raise ValueError(f"Error parsing file: {e}") from eAlso applies to: 79-79, 81-81
🧰 Tools
🪛 Ruff (0.8.2)
39-39: Within an
except
clause, raise exceptions withraise ... from err
orraise ... from None
to distinguish them from errors in exception handling(B904)
tests/4D-Lung/4D-Lung.yaml (1)
20-20
: Fix YAML formatting for better readabilityThe image types list needs proper spacing after commas.
-image_types: ["original", "shuffled_full","shuffled_roi","shuffled_non_roi","randomized_sampled_full","randomized_sampled_roi","randomized_sampled_non_roi"] +image_types: ["original", "shuffled_full", "shuffled_roi", "shuffled_non_roi", "randomized_sampled_full", "randomized_sampled_roi", "randomized_sampled_non_roi"]🧰 Tools
🪛 yamllint (1.35.1)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
tests/NSCLC_Radiogenomics/NSCLC_Radiogenomics.yaml (2)
20-20
: Fix YAML formatting for consistencySimilar to the 4D-Lung config, the image types list needs proper spacing.
-image_types: ["original", "shuffled_full","shuffled_roi","shuffled_non_roi","randomized_sampled_full","randomized_sampled_roi","randomized_sampled_non_roi"] +image_types: ["original", "shuffled_full", "shuffled_roi", "shuffled_non_roi", "randomized_sampled_full", "randomized_sampled_roi", "randomized_sampled_non_roi"]🧰 Tools
🪛 yamllint (1.35.1)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
12-17
: Document empty configuration sectionsThe
exclusion_variables
andtrain_test_split
sections are empty. Consider adding comments to explain why these are empty or if they will be populated in the future.-exclusion_variables: +exclusion_variables: # No exclusion criteria defined for this dataset train_test_split: split: False - split_variable: - impute: + split_variable: # Not applicable when split is False + impute: # Not applicable when split is False🧰 Tools
🪛 yamllint (1.35.1)
[error] 12-12: trailing spaces
(trailing-spaces)
[error] 16-16: trailing spaces
(trailing-spaces)
[error] 17-17: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
tests/4D-Lung/results/features/radiomicfeatures_original_4D-Lung.csv
is excluded by!**/*.csv
📒 Files selected for processing (8)
src/readii/io/loaders/__init__.py
(1 hunks)src/readii/io/loaders/features.py
(1 hunks)src/readii/io/loaders/general.py
(1 hunks)src/readii/io/loaders/images.py
(1 hunks)tests/4D-Lung/4D-Lung.yaml
(1 hunks)tests/NSCLC_Radiogenomics/NSCLC_Radiogenomics.yaml
(1 hunks)tests/io/loaders/test_general.py
(1 hunks)tests/test_feature_extraction.py
(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/readii/io/loaders/init.py
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/io/loaders/test_general.py
1-1: from readii.io.loaders.general import *
used; unable to detect undefined names
(F403)
13-13: loadImageDatasetConfig
may be undefined, or defined from star imports
(F405)
20-20: loadImageDatasetConfig
may be undefined, or defined from star imports
(F405)
23-23: Comparison to None
should be cond is None
Replace with cond is None
(E711)
24-24: Comparison to None
should be cond is None
Replace with cond is None
(E711)
src/readii/io/loaders/general.py
39-39: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
79-79: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
81-81: Within an except
clause, raise exceptions with raise ... from err
or raise ... from None
to distinguish them from errors in exception handling
(B904)
src/readii/io/loaders/features.py
70-70: Local variable e
is assigned to but never used
Remove assignment to unused variable e
(F841)
tests/test_feature_extraction.py
117-117: Local variable actual
is assigned to but never used
Remove assignment to unused variable actual
(F841)
127-127: Local variable actual
is assigned to but never used
Remove assignment to unused variable actual
(F841)
🪛 yamllint (1.35.1)
tests/4D-Lung/4D-Lung.yaml
[error] 2-2: trailing spaces
(trailing-spaces)
[error] 7-7: trailing spaces
(trailing-spaces)
[error] 8-8: trailing spaces
(trailing-spaces)
[error] 10-10: trailing spaces
(trailing-spaces)
[error] 12-12: trailing spaces
(trailing-spaces)
[error] 16-16: trailing spaces
(trailing-spaces)
[error] 17-17: trailing spaces
(trailing-spaces)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
tests/NSCLC_Radiogenomics/NSCLC_Radiogenomics.yaml
[error] 2-2: trailing spaces
(trailing-spaces)
[error] 12-12: trailing spaces
(trailing-spaces)
[error] 16-16: trailing spaces
(trailing-spaces)
[error] 17-17: trailing spaces
(trailing-spaces)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
[warning] 20-20: too few spaces after comma
(commas)
🔇 Additional comments (2)
tests/test_feature_extraction.py (1)
49-51
: LGTM: New fixture for 4D Lung metadata path
The fixture is well-defined and follows the same pattern as the existing nsclcMetadataPath
fixture.
tests/NSCLC_Radiogenomics/NSCLC_Radiogenomics.yaml (1)
6-10
: LGTM: Well-defined outcome variables
The outcome variables section is properly configured with clear mappings for survival status.
tests/test_feature_extraction.py
Outdated
def test_NSCLC_radiomicFeatureExtraction_output(nsclcMetadataPath): | ||
"""Test output creation from radiomic feature extraction for SEG dataset""" | ||
actual = radiomicFeatureExtraction(nsclcMetadataPath, | ||
imageDirPath = "tests/", | ||
roiNames = None, | ||
outputDirPath = "tests/output/") | ||
outputDirPath = "tests/NSCLC_Radiogenomics/results/") | ||
expected_path = "tests/output/features/radiomicfeatures_original_NSCLC_Radiogenomics.csv" | ||
assert os.path.exists(expected_path) No newline at end of file | ||
assert os.path.exists(expected_path) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Fix unused variable warning and consider adding assertions
The test function has been renamed for better clarity, but there are two issues to address:
- The
actual
variable is assigned but never used - The test only verifies file existence without checking the content
Consider this improvement:
def test_NSCLC_radiomicFeatureExtraction_output(nsclcMetadataPath):
"""Test output creation from radiomic feature extraction for SEG dataset"""
- actual = radiomicFeatureExtraction(nsclcMetadataPath,
+ radiomicFeatureExtraction(nsclcMetadataPath,
imageDirPath = "tests/",
roiNames = None,
outputDirPath = "tests/NSCLC_Radiogenomics/results/")
expected_path = "tests/output/features/radiomicfeatures_original_NSCLC_Radiogenomics.csv"
- assert os.path.exists(expected_path)
+ assert os.path.exists(expected_path), "Output file was not created"
+ # Consider adding assertions to verify the content of the output file
+ df = pd.read_csv(expected_path)
+ assert not df.empty, "Output file is empty"
+ assert "original_shape_MeshVolume" in df.columns, "Expected features are missing"
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.8.2)
117-117: Local variable actual
is assigned to but never used
Remove assignment to unused variable actual
(F841)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
solid, have a PR at #87 for with some minor changes for one of the files and a couple changes in the feature file
generally try to use pathlib over os.path to make reading easier
…87) these are mostly just qc stuff from ruff - pathlib over os.path - formatted docstrings (if looking to add these functions on the mkdocs, the numpydoc parser is not really forgiving) - exception handling is a lot easier to debug when you pass exception context back to the stack trace with `raise .... from e`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (5)
tests/io/loaders/test_general.py (3)
4-10
: Use path utilities for test directoriesThe hardcoded paths in fixtures could make tests fragile. Consider using
pathlib
oros.path
utilities to construct paths relative to the test file location.Here's a more robust implementation:
+import os +from pathlib import Path + @pytest.fixture def nsclcConfigDirPath(): - return "tests/NSCLC_Radiogenomics" + return str(Path(__file__).parent.parent.parent / "NSCLC_Radiogenomics") @pytest.fixture def lung4DConfigDirPath(): - return "tests/4D-Lung" + return str(Path(__file__).parent.parent.parent / "4D-Lung")
12-17
: Improve test readability and coverageThe test verifies the happy path but could be improved in two ways:
- Break down the long image_types list for better readability
- Add error case testing
Here's how to improve readability:
def test_NSCLC_loadImageDatasetConfig(nsclcConfigDirPath): config = loadImageDatasetConfig("NSCLC_Radiogenomics", nsclcConfigDirPath) + expected_image_types = [ + "original", + "shuffled_full", + "shuffled_roi", + "shuffled_non_roi", + "randomized_sampled_full", + "randomized_sampled_roi", + "randomized_sampled_non_roi" + ] assert config["dataset_name"] == "NSCLC_Radiogenomics" - assert config["image_types"] == ["original", "shuffled_full","shuffled_roi","shuffled_non_roi","randomized_sampled_full","randomized_sampled_roi","randomized_sampled_non_roi"] + assert config["image_types"] == expected_image_types assert config["outcome_variables"]["event_label"] == "Survival Status" assert config["outcome_variables"]["event_value_mapping"] == {'Alive': 0, 'Dead': 1}Would you like me to help generate additional test cases for error scenarios?
1-24
: Reduce code duplication by sharing image types listThe same image_types list is duplicated in both test functions. Consider moving it to a shared fixture or constant.
Here's how to implement this:
from readii.io.loaders.general import loadImageDatasetConfig import pytest +@pytest.fixture +def expected_image_types(): + return [ + "original", + "shuffled_full", + "shuffled_roi", + "shuffled_non_roi", + "randomized_sampled_full", + "randomized_sampled_roi", + "randomized_sampled_non_roi" + ] + @pytest.fixture def nsclcConfigDirPath(): return "tests/NSCLC_Radiogenomics"Then update both test functions to use this fixture:
def test_NSCLC_loadImageDatasetConfig(nsclcConfigDirPath, expected_image_types): config = loadImageDatasetConfig("NSCLC_Radiogenomics", nsclcConfigDirPath) assert config["dataset_name"] == "NSCLC_Radiogenomics" assert config["image_types"] == expected_image_types # ... rest of the test def test_lung4D_loadImageDatasetConfig(lung4DConfigDirPath, expected_image_types): config = loadImageDatasetConfig("4D-Lung", lung4DConfigDirPath) assert config["dataset_name"] == "4D-Lung" assert config["image_types"] == expected_image_types # ... rest of the test🧰 Tools
🪛 Ruff (0.8.2)
23-23: Comparison to
None
should becond is None
Replace with
cond is None
(E711)
24-24: Comparison to
None
should becond is None
Replace with
cond is None
(E711)
src/readii/io/loaders/general.py (2)
7-15
: Enhance exception classes with more descriptive docstrings and default messages.While custom exceptions are good practice, they could be more informative.
Consider this enhancement:
class ConfigError(Exception): - """Base class for errors in the config module.""" + """Exception raised for errors during configuration loading and parsing. + + This exception is raised when there are issues with loading or parsing + configuration files, such as invalid YAML syntax or empty configurations. + """ - pass + def __init__(self, message="Configuration error occurred"): + super().__init__(message) class DataFrameLoadError(Exception): - """Custom exception for DataFrame loading errors.""" + """Exception raised for errors during DataFrame loading operations. + + This exception is raised when there are issues with loading data into + DataFrames, such as empty files or parsing errors. + """ - pass + def __init__(self, message="Error occurred while loading DataFrame"): + super().__init__(message)
91-96
: Enhance error handling with more specific error messages.The current error handling could provide more detailed information about the nature of the error.
Consider this enhancement:
except pd.errors.EmptyDataError as e: - raise DataFrameLoadError("File is empty") from e + raise DataFrameLoadError(f"File {file_path} is empty or contains no data") from e except (pd.errors.ParserError, ValueError) as e: - raise DataFrameLoadError("Error parsing file") from e + raise DataFrameLoadError(f"Error parsing file {file_path}: {str(e)}") from e
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
tests/output/ct_to_seg_match_list_4D-Lung.csv
is excluded by!**/*.csv
📒 Files selected for processing (3)
.gitignore
(1 hunks)src/readii/io/loaders/general.py
(1 hunks)tests/io/loaders/test_general.py
(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .gitignore
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/io/loaders/test_general.py
23-23: Comparison to None
should be cond is None
Replace with cond is None
(E711)
24-24: Comparison to None
should be cond is None
Replace with cond is None
(E711)
🔇 Additional comments (1)
src/readii/io/loaders/general.py (1)
1-5
: LGTM! Clean and well-organized imports.
The imports follow good practices with standard library imports first, followed by third-party imports. All imports are necessary for the implemented functionality.
try: | ||
if file_extension == '.xlsx': | ||
df = pd.read_excel(file_path) | ||
elif file_extension == '.csv': | ||
df = pd.read_csv(file_path) | ||
else: | ||
msg = f"Unsupported file format {file_extension}. Please provide a .csv or .xlsx file." | ||
raise ValueError(msg) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Based on the gathered information, I can now provide the final response:
Add validation for potential Excel file format exploits
The current implementation only handles basic file extension checks and empty file errors, but lacks specific Excel format validation. Add the following security measures:
- Use
openpyxl
engine explicitly withread_only=True
to prevent formula execution - Add file size limits to prevent memory exhaustion attacks
- Validate the Excel file structure before processing
Example locations that need attention:
src/readii/io/loaders/general.py:84
: Updatepd.read_excel()
call with security parameters
🔗 Analysis chain
Add validation for Excel file format.
The current implementation doesn't validate Excel file formats, which could lead to security issues.
Let's verify the Excel file handling in the codebase:
Let me gather more context about the Excel file handling implementation.
Let me check for any additional error handling or validation mechanisms in the codebase.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for Excel file handling patterns
rg -l "\.xlsx?" .
# Check for potential Excel-related security measures
ast-grep --pattern 'pd.read_excel($$$)'
Length of output: 199
Script:
#!/bin/bash
# Check for any Excel-related validation or security measures
rg -A 5 -B 5 "\.xlsx" .
# Look for any Excel-related error handling
rg -A 5 -B 5 "read_excel" .
# Check for any custom Excel validation functions
ast-grep --pattern 'def $_($$$) {
$$$
xlsx
$$$
}'
Length of output: 2634
Script:
#!/bin/bash
# Check for custom exceptions and error handling
rg -A 10 "class.*Error" .
# Look for any validation utilities
rg -A 5 "validate|check|verify" .
# Check the full context of the file loading function
ast-grep --pattern 'def load_$_($$$) {
$$$
}'
Length of output: 18408
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (4)
src/readii/io/loaders/features.py (4)
11-33
: Fix docstring inconsistency with function signature.The docstring indicates that
image_types
is optional with a default value of['original']
, but the parameter is required in the function signature. Additionally, consider using consistent type references between docstring and type hints (Dict vs dict).Apply this diff to fix the docstring:
- image_types : list, optional - List of image types to load in. The default is ['original']. + image_types : list + List of image types to load in.
34-40
: Consider making default labels configurable.The hardcoded default labels (
["patient_ID","survival_time_in_years","survival_event_binary"]
) are specific to survival analysis. Consider moving these to a configuration file or module-level constant to make them more maintainable and reusable for different types of analyses.
56-59
: Include exception details in warning message.The caught exception variable
e
is unused. Consider including its details in the warning message for better debugging.Apply this diff:
- logger.warning(f"No {image_type} feature csv files found in {extracted_feature_dir}") + logger.warning(f"No {image_type} feature csv files found in {extracted_feature_dir}: {str(e)}")🧰 Tools
🪛 Ruff (0.8.2)
56-56: Local variable
e
is assigned to but never usedRemove assignment to unused variable
e
(F841)
73-76
: Include exception details in warning message.Similarly, include the KeyError details in the warning message.
Apply this diff:
- logger.warning(f"{feature_file_path} does not have the labels {labels_to_drop} to drop.") + logger.warning(f"{feature_file_path} does not have the labels {labels_to_drop} to drop: {str(e)}")🧰 Tools
🪛 Ruff (0.8.2)
73-73: Local variable
e
is assigned to but never usedRemove assignment to unused variable
e
(F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pixi.lock
is excluded by!**/*.lock
📒 Files selected for processing (2)
src/readii/io/loaders/features.py
(1 hunks)tests/io/loaders/test_general.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/io/loaders/test_general.py
🧰 Additional context used
🪛 Ruff (0.8.2)
src/readii/io/loaders/features.py
56-56: Local variable e
is assigned to but never used
Remove assignment to unused variable e
(F841)
73-73: Local variable e
is assigned to but never used
Remove assignment to unused variable e
(F841)
🔇 Additional comments (3)
src/readii/io/loaders/features.py (3)
1-10
: LGTM! Well-organized imports.
The imports are properly organized and all imported modules are utilized in the code.
41-46
: LGTM! Proper directory validation.
The code appropriately validates the directory existence and efficiently retrieves the file list once.
81-85
: LGTM! Proper final validation.
The code appropriately validates that at least one feature set was loaded before returning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
tests/test_metadata.py (1)
52-66
: Consider enhancing test parameterization and path handlingWhile the parameterization is a good start, there are several potential improvements:
- Add more test cases to fully utilize parameterization
- Use temporary directories for output files
- Add cleanup after test execution
Here's a suggested improvement:
@pytest.mark.parametrize( "summary_file_path, modality, output_file_path", [ - ("nsclcSummaryFilePath","SEG","tests/NSCLC_Radiogenomics/procdata/ct_to_seg_match_list_NSCLC_Radiogenomics.csv") + ("nsclcSummaryFilePath", "SEG", "ct_to_seg_match_list_NSCLC_Radiogenomics.csv"), + ("lung4DSummaryFilePath", "RTSTRUCT", "ct_to_seg_match_list_4D-Lung.csv") ] ) -def test_matchCTtoSegmentation_output(summary_file_path, modality, output_file_path, request): +def test_matchCTtoSegmentation_output(summary_file_path, modality, output_file_path, request, tmp_path): """Test saving output of summary file""" summary_file_path = request.getfixturevalue(summary_file_path) + output_path = tmp_path / output_file_path matchCTtoSegmentation(summary_file_path, segType = modality, - outputFilePath = output_file_path) + outputFilePath = str(output_path)) - assert Path(output_file_path).exists(), \ + assert output_path.exists(), \ "Output does not exist, double check output file is named correctly."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.gitignore
(1 hunks)tests/test_feature_extraction.py
(2 hunks)tests/test_metadata.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .gitignore
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_feature_extraction.py
117-117: Local variable actual
is assigned to but never used
Remove assignment to unused variable actual
(F841)
127-127: Local variable actual
is assigned to but never used
Remove assignment to unused variable actual
(F841)
🔇 Additional comments (3)
tests/test_feature_extraction.py (2)
47-51
: LGTM: Consistent fixture paths
The metadata path fixtures follow a consistent pattern and structure within the test directory.
115-132
: 🛠️ Refactor suggestion
Fix unused variables and enhance test coverage
Both test functions have similar issues that need addressing:
- The
actual
variables are assigned but never used - The tests only verify file existence without validating the content
Consider this improvement for both functions:
def test_NSCLC_radiomicFeatureExtraction_output(nsclcMetadataPath):
"""Test output creation from radiomic feature extraction for SEG dataset"""
- actual = radiomicFeatureExtraction(nsclcMetadataPath,
+ radiomicFeatureExtraction(nsclcMetadataPath,
imageDirPath = "tests/",
roiNames = None,
outputDirPath = "tests/NSCLC_Radiogenomics/results/")
expected_path = "tests/NSCLC_Radiogenomics/results/features/radiomicfeatures_original_NSCLC_Radiogenomics.csv"
- assert os.path.exists(expected_path)
+ assert os.path.exists(expected_path), "Output file was not created"
+ df = pd.read_csv(expected_path)
+ assert not df.empty, "Output file is empty"
+ assert "original_shape_MeshVolume" in df.columns, "Expected features are missing"
def test_4DLung_radiomicFeatureExtraction_output(lung4DMetadataPath):
"""Test output creation from radiomic feature extraction for RTSTRUCT dataset"""
- actual = radiomicFeatureExtraction(lung4DMetadataPath,
+ radiomicFeatureExtraction(lung4DMetadataPath,
imageDirPath = "tests/",
roiNames = "Tumor_c40",
outputDirPath = "tests/4D-Lung/results/")
expected_path = "tests/4D-Lung/results/features/radiomicfeatures_original_4D-Lung.csv"
- assert os.path.exists(expected_path)
+ assert os.path.exists(expected_path), "Output file was not created"
+ df = pd.read_csv(expected_path)
+ assert not df.empty, "Output file is empty"
+ assert "original_shape_MeshVolume" in df.columns, "Expected features are missing"
+ assert all(df["roi_name"] == "Tumor_c40"), "Unexpected ROI names in output"
🧰 Tools
🪛 Ruff (0.8.2)
117-117: Local variable actual
is assigned to but never used
Remove assignment to unused variable actual
(F841)
127-127: Local variable actual
is assigned to but never used
Remove assignment to unused variable actual
(F841)
tests/test_metadata.py (1)
2-2
: LGTM! Good modernization using pathlib.Path
Using pathlib.Path
instead of os.path
is a better approach as it provides more robust cross-platform path handling and is the recommended way in modern Python.
…emove them after tests for cleanliness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
tests/test_feature_extraction.py (1)
49-63
: Consider reducing code duplication in fixturesBoth fixtures share identical logic for file copying and cleanup. Consider extracting the common functionality into a helper function.
+def _setup_metadata_path(oldpath: str, newpath: str) -> str: + oldpath = Path(oldpath) + newpath = Path(newpath) + newpath.parent.mkdir(parents=True, exist_ok=True) + shutil.copy(oldpath, newpath) + return newpath @pytest.fixture def nsclcMetadataPath(): - oldpath = Path("tests/output/ct_to_seg_match_list_NSCLC_Radiogenomics.csv") - newpath = Path("tests/NSCLC_Radiogenomics/procdata/ct_to_seg_match_list_NSCLC_Radiogenomics.csv") - newpath.parent.mkdir(parents=True, exist_ok=True) - shutil.copy(oldpath, newpath) - yield newpath.as_posix() - newpath.unlink() + newpath = _setup_metadata_path( + "tests/output/ct_to_seg_match_list_NSCLC_Radiogenomics.csv", + "tests/NSCLC_Radiogenomics/procdata/ct_to_seg_match_list_NSCLC_Radiogenomics.csv" + ) + yield newpath.as_posix() + newpath.unlink() @pytest.fixture def lung4DMetadataPath(): - oldpath = Path("tests/output/ct_to_seg_match_list_4D-Lung.csv") - newpath = Path("tests/4D-Lung/procdata/ct_to_seg_match_list_4D-Lung.csv") - newpath.parent.mkdir(parents=True, exist_ok=True) - shutil.copy(oldpath, newpath) - yield newpath.as_posix() - newpath.unlink() + newpath = _setup_metadata_path( + "tests/output/ct_to_seg_match_list_4D-Lung.csv", + "tests/4D-Lung/procdata/ct_to_seg_match_list_4D-Lung.csv" + ) + yield newpath.as_posix() + newpath.unlink()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pixi.lock
is excluded by!**/*.lock
📒 Files selected for processing (1)
tests/test_feature_extraction.py
(3 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_feature_extraction.py
129-129: Local variable actual
is assigned to but never used
Remove assignment to unused variable actual
(F841)
139-139: Local variable actual
is assigned to but never used
Remove assignment to unused variable actual
(F841)
🔇 Additional comments (1)
tests/test_feature_extraction.py (1)
127-143
: 🛠️ Refactor suggestion
Enhance test coverage and fix unused variables
Both test functions have similar issues that need to be addressed:
- The
actual
variable is assigned but never used - The tests only verify file existence without validating the content
Apply these improvements to both test functions:
def test_NSCLC_radiomicFeatureExtraction_output(nsclcMetadataPath):
"""Test output creation from radiomic feature extraction for SEG dataset"""
- actual = radiomicFeatureExtraction(nsclcMetadataPath,
+ radiomicFeatureExtraction(nsclcMetadataPath,
imageDirPath = "tests/",
roiNames = None,
outputDirPath = "tests/NSCLC_Radiogenomics/results/")
expected_path = "tests/NSCLC_Radiogenomics/results/features/radiomicfeatures_original_NSCLC_Radiogenomics.csv"
- assert os.path.exists(expected_path)
+ assert os.path.exists(expected_path), "Output file was not created"
+ df = pd.read_csv(expected_path)
+ assert not df.empty, "Output file is empty"
+ assert "original_shape_MeshVolume" in df.columns, "Expected features are missing"
def test_4DLung_radiomicFeatureExtraction_output(lung4DMetadataPath):
"""Test output creation from radiomic feature extraction for RTSTRUCT dataset"""
- actual = radiomicFeatureExtraction(lung4DMetadataPath,
+ radiomicFeatureExtraction(lung4DMetadataPath,
imageDirPath = "tests/",
roiNames = "Tumor_c40",
outputDirPath = "tests/4D-Lung/results/")
expected_path = "tests/4D-Lung/results/features/radiomicfeatures_original_4D-Lung.csv"
- assert os.path.exists(expected_path)
+ assert os.path.exists(expected_path), "Output file was not created"
+ df = pd.read_csv(expected_path)
+ assert not df.empty, "Output file is empty"
+ assert "original_shape_MeshVolume" in df.columns, "Expected features are missing"
+ assert all(df["roi_name"] == "Tumor_c40"), "Unexpected ROI names in output"
🧰 Tools
🪛 Ruff (0.8.2)
129-129: Local variable actual
is assigned to but never used
Remove assignment to unused variable actual
(F841)
139-139: Local variable actual
is assigned to but never used
Remove assignment to unused variable actual
(F841)
Includes features, general, and images specific loading functions
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores
.gitignore
to exclude test output files.