fix: look for image type in Path stem as Path is not iterable #97

Merged
strixy16 merged 2 commits into main from katys/fix-feature-file-loader
Dec 18, 2024

Conversation

strixy16
Collaborator

@strixy16 strixy16 commented Dec 18, 2024

Summary by CodeRabbit

  • New Features

    • Improved file matching logic for identifying features based on image types.
    • Enhanced clarity in handling scenarios for matching files with updated control flow.
  • Bug Fixes

    • Retained consistent error handling with appropriate logging for warnings and exceptions.

Contributor

coderabbitai bot commented Dec 18, 2024

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock

Walkthrough

The pull request modifies the loadFeatureFilesFromImageTypes function in the features loader module. The primary changes involve updating the file matching logic to use file stem (name without extension) instead of the full filename, and implementing a more structured pattern matching approach for handling different scenarios of file discovery. The modifications aim to improve file matching precision and enhance the clarity of the control flow logic.
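To illustrate the issue named in the PR title, here is a minimal sketch of why a membership test against a `Path` fails and why the same test against `Path.stem` works. The file name and image type are hypothetical and are not taken from the repository.

```python
from pathlib import Path

# Hypothetical feature file and image type, for illustration only
feature_file = Path("extracted_features/features_original.csv")
image_type = "original"

# A Path object is not a container, so `in` raises TypeError
try:
    _ = image_type in feature_file
    membership_on_path_works = True
except TypeError:
    membership_on_path_works = False

# Path.stem is a plain str (the file name without its final suffix),
# so `in` becomes an ordinary substring test
stem = feature_file.stem
matches = image_type in stem

print(membership_on_path_works, stem, matches)
```

Because `stem` drops only the final suffix, a name such as `features.csv.gz` would have the stem `features.csv`, which is worth keeping in mind if compressed feature files are ever used.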

Changes

File: src/readii/io/loaders/features.py
Change summary:
  • Updated file matching to use file.stem instead of the full filename
  • Implemented pattern matching (match ... case) for handling file discovery scenarios
  • Retained existing error handling and logging mechanisms

Poem

🐰 A Rabbit's Ode to File Matching Magic

With stem and pattern, code now dances bright,
Matching files with algorithmic delight!
No more full names, just pure essence we seek,
Our loader's logic, no longer oblique.
Hop, hop, hurray for cleaner design! 🎉


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
src/readii/io/loaders/features.py (1)

Line range hint 58-71: Improve clarity and control in the pattern matching logic

The current implementation handles both single and multiple matches the same way, with only a warning for multiple matches. This could lead to unexpected behavior.

Consider making the logic more explicit:

 match len(matching_files):
     case 1:
-        # Only one file found, use it  
-        pass
+        # Only one file found, use it
+        image_type_feature_file = matching_files[0]
     case 0:
         # No files found for this image type
         logger.warning(f"No {image_type} feature csv files found in {extracted_feature_dir}")
         # Skip to the next image type
         continue
     case _:
         # Multiple files found
         msg = f"Multiple {image_type} feature csv files found in {extracted_feature_dir}. First one will be used."
         logger.warning(msg)
+        # Consider if this should be an error instead of a warning
+        image_type_feature_file = matching_files[0]

-image_type_feature_file = matching_files[0]
🧹 Nitpick comments (1)
src/readii/io/loaders/features.py (1)

56-56: Consider using more precise string matching for image types

While using Path.stem fixes the Path iteration issue, the current string containment check (in) could lead to false matches if image types are substrings of each other (e.g., "original" would match "original_filtered").

Consider using more precise matching:

-matching_files = [file for file in feature_file_list if (image_type in file.stem)]
+matching_files = [file for file in feature_file_list if (
+    file.stem == image_type or
+    file.stem.startswith(f"{image_type}_") or
+    file.stem.endswith(f"_{image_type}")
+)]
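As a rough check of the stricter predicate, the sketch below wraps the three conditions in a hypothetical helper, `stem_matches_image_type`, and tries a few made-up stems. The conditions are joined with plain `or`, since `any()` expects an iterable rather than a single boolean.

```python
def stem_matches_image_type(stem: str, image_type: str) -> bool:
    """Return True if the stem equals the image type or has it as an
    underscore-delimited prefix or suffix (hypothetical helper)."""
    return (
        stem == image_type
        or stem.startswith(f"{image_type}_")
        or stem.endswith(f"_{image_type}")
    )


# Made-up stems, for illustration only
exact = stem_matches_image_type("original", "original")
suffix = stem_matches_image_type("features_original", "original")
embedded = stem_matches_image_type("aboriginal", "original")
prefix = stem_matches_image_type("original_filtered", "original")

print(exact, suffix, embedded, prefix)
```

The predicate rejects `aboriginal`, which a plain substring test would accept, but it still accepts `original_filtered` through the `startswith` branch. It also rejects stems where the image type sits in the middle, such as `features_original_v2`. Whether either behaviour is acceptable depends on the repository's actual file naming, which this sketch does not assume.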
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 679ea8a and eaa8e3a.

📒 Files selected for processing (1)
  • src/readii/io/loaders/features.py (1 hunks)
🔇 Additional comments (1)
src/readii/io/loaders/features.py (1)

Line range hint 56-71: Verify the impact on existing image type patterns

Let's verify how this function is used in the codebase and what image types are typically processed.

✅ Verification successful

Changes to image type handling are safe and well-supported

The verification shows that the changes to handle multiple matching files are safe because:

  1. The image types are well-defined in configuration files (NSCLC_Radiogenomics.yaml and 4D-Lung.yaml) with specific patterns like "original", "shuffled_full", etc.
  2. The warning message for multiple matches is appropriate as it maintains backward compatibility by using the first match, while informing users about potential issues.
  3. The function is properly tested and integrated with the dataset configurations, as seen in the test files.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check usage patterns of loadFeatureFilesFromImageTypes and common image types

# Check how the function is called and what image types are passed
rg -A 5 "loadFeatureFilesFromImageTypes" --glob '!*.csv'

# Look for test files with image type patterns
fd -e csv . | grep -i "feature"

# Check for any test files that might be affected
fd -e py -e ipynb | xargs rg "image_type.*original"

Length of output: 1427


Script:

#!/bin/bash
# Let's try a different approach to understand the usage and impact

# Check for any direct usage of this function in the codebase
rg -l "loadFeatureFilesFromImageTypes\(" 

# Look for patterns of image types being defined or used
rg "image_types.*=.*\[" -A 2

# Check the entire features.py file to understand the context better
cat src/readii/io/loaders/features.py

# Look for test files related to features
fd "test.*feature" -t f

Length of output: 4982


Script:

#!/bin/bash
# Let's check the test file and any other related usage patterns

# Check the content of the test file
cat tests/test_feature_extraction.py

# Look for any image type definitions in the test files
rg -g "test_*.py" "image_type" -B 2 -A 2

# Check for any configuration files that might define image types
fd -e yaml -e json -e toml | xargs rg "image.*type"

# Look for any notebooks that might use this function
fd -e ipynb | grep -v checkpoint | xargs rg "loadFeatureFilesFromImageTypes"

Length of output: 10642

@strixy16 strixy16 merged commit 875a6c3 into main Dec 18, 2024
18 checks passed
@strixy16 strixy16 deleted the katys/fix-feature-file-loader branch December 18, 2024 20:19

codecov bot commented Dec 18, 2024

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 41.66%. Comparing base (b241c42) to head (b6a161e).
Report is 3 commits behind head on main.

Files with missing lines | Patch % | Lines
src/readii/io/loaders/features.py | 0.00% | 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main      #97   +/-   ##
=======================================
  Coverage   41.66%   41.66%           
=======================================
  Files          33       33           
  Lines        1452     1452           
=======================================
  Hits          605      605           
  Misses        847      847           
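The project coverage figure can be reproduced from the hit and miss counts in the table. The sketch below assumes the report truncates to two decimals rather than rounding; that is an inference from the numbers shown here, not something the report states.

```python
import math

# Counts taken from the coverage diff above
hits, misses = 605, 847
lines = hits + misses

# Coverage as a percentage of all tracked lines
coverage_pct = 100 * hits / lines

# Assumption: the displayed value is truncated to two decimals
displayed = math.floor(coverage_pct * 100) / 100

print(lines, displayed)
```

Since the hit, miss, and line counts are the same on both sides of the diff, the project percentage is unchanged by this PR, while the single changed line is counted as uncovered in the patch figure.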
