Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add io/readers to ruff config #94

Merged
merged 11 commits into from
Dec 18, 2024
Merged

Conversation

strixy16
Copy link
Collaborator

@strixy16 strixy16 commented Dec 17, 2024

Updated include list to have the io/readers functions and updated the files so all ruff tests pass.

Summary by CodeRabbit

  • New Features

    • Enhanced flexibility in loading feature files by accepting both Path objects and strings.
    • Improved error handling and logging for file loading and directory access.
  • Bug Fixes

    • Enhanced error handling and logging for file loading and directory access, providing clearer context for issues.
  • Documentation

    • Updated docstrings for functions to improve clarity and detail, including return type annotations.

Copy link
Contributor

coderabbitai bot commented Dec 17, 2024

Walkthrough

The pull request introduces modifications to several files in the readii project, focusing on improving error handling, logging, and code structure. Changes span configuration files and loader modules, with updates to function signatures, error handling mechanisms, and logging approaches. The modifications aim to enhance code robustness, provide more informative error messages, and modernize the codebase by utilizing Path objects and structured error logging.

Changes

File Change Summary
config/ruff.toml - Updated file inclusion pattern to "src/readii/io/**/**.py"
- Removed commented-out select list
- Updated ignore list with specific linting rules
src/readii/io/loaders/features.py - Enhanced loadFeatureFilesFromImageTypes function
- Updated parameter type to accept Path or str
- Improved directory and file handling using Path methods
- Modernized error handling with logging and pattern matching
src/readii/io/loaders/general.py - Integrated logging for error handling
- Added error logging in loadImageDatasetConfig and loadFileToDataFrame
- Maintained existing exception types
src/readii/io/loaders/images.py - Added return type annotation to getImageTypesFromDirectory
- Enhanced error handling with logging
- Improved docstring clarity

Poem

🐰 In the realm of code, a rabbit's delight,
Paths now clearer, errors shine bright
Logging whispers where bugs might hide
With Path and pattern, we now stride
Refactored magic, clean and light! 🔧

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@strixy16 strixy16 requested a review from jjjermiah December 17, 2024 17:37
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (8)
src/readii/io/loaders/images.py (2)

32-34: Consider using f-strings consistently

While the error logging is good, the message construction could be more consistent with Python's modern practices.

-        msg = f"Directory {raw_data_dir} does not exist."
+        msg = f"Directory '{raw_data_dir}' does not exist."

38-40: Enhance error messages with more context

The error handling pattern is good, but consider adding more context to help users troubleshoot:

  1. For NotADirectoryError: Include the actual file type
  2. For FileNotFoundError: Include the full search pattern
-        msg = f"Path {raw_data_dir} is not a directory."
+        msg = f"Path '{raw_data_dir}' is not a directory (type: {raw_data_dir.stat().st_mode})"

-        msg = f"No files with prefix {feature_file_prefix} and suffix {feature_file_suffix} found in directory {raw_data_dir}."
+        msg = f"No files matching pattern '{feature_file_prefix}*{feature_file_suffix}' found in directory '{raw_data_dir}'"

Also applies to: 44-46

src/readii/io/loaders/general.py (1)

113-115: Consider adding data shape information

When logging empty DataFrame errors, it would be helpful to include the shape information.

-        msg = "Loaded Dataframe is empty"
+        msg = f"Loaded DataFrame is empty (shape: {df.shape})"
src/readii/io/loaders/features.py (5)

10-10: Justify or remove the noqa comment.

The # noqa comment is suppressing linter warnings, but it's not clear which specific warning is being suppressed. Consider either:

  1. Specifying the exact warning code being suppressed (e.g., # noqa: E501 for line length)
  2. Removing it if no longer needed

Line range hint 14-32: Update docstring to match the new type annotation.

The parameter type for extracted_feature_dir should be updated to reflect that it accepts both string and Path objects:

    Parameters
    ----------
-   extracted_feature_dir : str
+   extracted_feature_dir : Union[str, Path]
        Path to the directory containing the extracted feature csv files

45-48: Avoid duplicating the error message.

The error message is defined but not used in the exception. Consider:

    if not extracted_feature_dir.exists() or not extracted_feature_dir.is_dir():
        msg = f"Extracted feature directory {extracted_feature_dir} does not exist."
-       logger.error(f"Extracted feature directory {extracted_feature_dir} does not exist.")
-       raise FileNotFoundError()
+       logger.error(msg)
+       raise FileNotFoundError(msg)

76-78: Consider more specific exception handling.

The current implementation catches all exceptions with a broad Exception clause. Consider catching specific exceptions that might occur during file operations:

-       except Exception as e:
+       except (OSError, IOError) as e:
            logger.warning(f"Error loading {image_type} feature csv files from {extracted_feature_dir}: {e}")
            raise e

104-105: Include error message in ValueError.

Consider passing the error message to the ValueError for better debugging:

        logger.error(f"No valid feature sets were loaded from {extracted_feature_dir}")
-       raise ValueError()
+       raise ValueError(f"No valid feature sets were loaded from {extracted_feature_dir}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9b734be and 5dbfc73.

📒 Files selected for processing (4)
  • config/ruff.toml (1 hunks)
  • src/readii/io/loaders/features.py (4 hunks)
  • src/readii/io/loaders/general.py (4 hunks)
  • src/readii/io/loaders/images.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
src/readii/io/loaders/features.py

93-93: SyntaxError: Expected an indented block after except clause

🔇 Additional comments (7)
src/readii/io/loaders/images.py (2)

4-5: LGTM: Logger import added correctly

The logger import from readii.utils is appropriately placed at the module level.


9-9: LGTM: Return type annotation added

The return type annotation -> list improves type safety and documentation.

src/readii/io/loaders/general.py (2)

6-7: LGTM: Logger import added correctly

The logger import from readii.utils is appropriately placed at the module level.


45-46: LGTM: Error handling in loadImageDatasetConfig

The error handling pattern is consistent and appropriate:

  1. FileNotFoundError for missing config
  2. ConfigError for YAML parsing issues
  3. ConfigError for empty config

Also applies to: 52-54, 57-59

config/ruff.toml (2)

Line range hint 1-116: LGTM: Comprehensive ruff configuration

The configuration is well-structured with:

  1. Clear categorization of rules (type annotations, imports, code quality, etc.)
  2. Appropriate ignores for specific cases
  3. Consistent formatting settings

9-9: Verify the impact of broader include pattern

The pattern src/readii/io/**/**.py will include all Python files under io directory. Let's verify this doesn't accidentally include any test files or other unintended files.

✅ Verification successful

The broader include pattern correctly targets only source files

The pattern src/readii/io/**/**.py is safe to use as it only includes legitimate source files:

  • All files are properly organized under feature-specific subdirectories (loaders, utils, writers)
  • No test files were found in the io directory structure
  • The included files align with expected source code organization
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# List all Python files that will be included by the new pattern
fd '.py$' src/readii/io/

# Check if there are any test files that might be accidentally included
fd -I 'test.*\.py$' src/readii/io/

Length of output: 440

src/readii/io/loaders/features.py (1)

33-36: LGTM! Clean Path object handling.

The conversion from string to Path is implemented correctly with proper type checking.

Comment on lines +108 to +110
msg = "Error pargins {file_path}"
logger.exception(msg)
raise DataFrameLoadError() from e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix typo in error message

There's a typo in the error message for file parsing errors.

-        msg = "Error pargins {file_path}"
+        msg = f"Error parsing '{file_path}'"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
msg = "Error pargins {file_path}"
logger.exception(msg)
raise DataFrameLoadError() from e
msg = f"Error parsing '{file_path}'"
logger.exception(msg)
raise DataFrameLoadError() from e

Comment on lines 92 to 93
except KeyError:
except KeyError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix the duplicated except clause.

There's a syntax error with the duplicated except KeyError: line:

-       except KeyError:
        except KeyError:
            logger.warning(f"{feature_file_path} does not have the labels {labels_to_drop} to drop.")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
except KeyError:
except KeyError:
except KeyError:
logger.warning(f"{feature_file_path} does not have the labels {labels_to_drop} to drop.")
🧰 Tools
🪛 Ruff (0.8.2)

93-93: SyntaxError: Expected an indented block after except clause

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/readii/io/loaders/features.py (4)

10-10: Remove unnecessary noqa comment.

The # noqa comment appears unnecessary as there are no apparent linting issues with this line. Consider removing it unless there's a specific linting rule being suppressed.


Line range hint 14-32: Update docstring to reflect new parameter type.

The docstring for extracted_feature_dir parameter should be updated to reflect that it now accepts both string and Path objects.

-    extracted_feature_dir : str
+    extracted_feature_dir : Union[str, Path]

45-48: Improve error handling consistency.

The error handling can be improved:

  1. The msg variable is defined but not used in the exception
  2. The error logging message is duplicated
     if not extracted_feature_dir.exists() or not extracted_feature_dir.is_dir():
         msg = f"Extracted feature directory {extracted_feature_dir} does not exist."
-        logger.error(f"Extracted feature directory {extracted_feature_dir} does not exist.")
+        logger.error(msg)
-        raise FileNotFoundError()
+        raise FileNotFoundError(msg)

104-105: Include error message in ValueError.

The ValueError should include the error message for better debugging.

         logger.error(f"No valid feature sets were loaded from {extracted_feature_dir}")
-        raise ValueError()
+        raise ValueError(f"No valid feature sets were loaded from {extracted_feature_dir}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5dbfc73 and 99e8924.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • src/readii/io/loaders/features.py (4 hunks)
🔇 Additional comments (1)
src/readii/io/loaders/features.py (1)

10-13: Verify all callers handle the new parameter type.

The function signature has been updated to accept both Path and str types. Let's verify that all callers of this function are compatible with this change.

✅ Verification successful

Function appears to be newly added or currently unused

The function loadFeatureFilesFromImageTypes is only defined in src/readii/io/loaders/features.py and exported in src/readii/io/loaders/__init__.py, but there are no actual calls to this function found in the codebase. This suggests that either:

  1. The function is newly added and will be used in future implementations
  2. The function is used in external code that imports this package
  3. The function is used in test files that might be in a separate directory

Since the change to accept both Path and str types makes the function more flexible for future use, and there are no current callers that need updating, this change is safe.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for all calls to loadFeatureFilesFromImageTypes
ast-grep --pattern 'loadFeatureFilesFromImageTypes($$$)'

Length of output: 58


Script:

#!/bin/bash
# Search for function calls using ripgrep for better coverage
rg "loadFeatureFilesFromImageTypes" -l

# Also search for the function definition to understand its location
rg "def loadFeatureFilesFromImageTypes" -l

Length of output: 185


Script:

#!/bin/bash
# Let's see the actual usage context in both files
rg "loadFeatureFilesFromImageTypes" -B 2 -A 2 src/readii/io/loaders/features.py src/readii/io/loaders/__init__.py

Length of output: 1177


Script:

#!/bin/bash
# Let's search for any imports and actual usage of this function in the entire codebase
rg -g '!{features,__init__}.py' "loadFeatureFilesFromImageTypes" -B 3 -A 3

Length of output: 74

src/readii/io/loaders/features.py Show resolved Hide resolved
Copy link

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 9.37500% with 58 lines in your changes missing coverage. Please review.

Project coverage is 40.31%. Comparing base (81fbd78) to head (99e8924).
Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
src/readii/io/loaders/features.py 13.79% 25 Missing ⚠️
src/readii/io/loaders/general.py 4.00% 24 Missing ⚠️
src/readii/io/loaders/images.py 10.00% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #94      +/-   ##
==========================================
- Coverage   49.04%   40.31%   -8.73%     
==========================================
  Files          25       32       +7     
  Lines         991     1399     +408     
==========================================
+ Hits          486      564      +78     
- Misses        505      835     +330     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

continue
case _:
# Multiple files found
msg = f"Multiple {image_type} feature csv files found in {extracted_feature_dir}. First one will be used."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
msg = f"Multiple {image_type} feature csv files found in {extracted_feature_dir}. First one will be used."
msg = f"{len(matching_files) csv files found for {image_type} in {extracted_feature_dir}. First one ({matching_files[0]) will be used."

Comment on lines +76 to +78
except Exception as e:
logger.warning(f"Error loading {image_type} feature csv files from {extracted_feature_dir}: {e}")
raise e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of exceptions do you forsee in this try-except block? Looks like the most possible source of errors are well handled i.e no chance for index error

@strixy16 strixy16 merged commit 7efc97b into main Dec 18, 2024
18 of 20 checks passed
@strixy16 strixy16 deleted the katys/update-ruff-config branch December 18, 2024 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants