YOLOv8 classification #59

Merged
merged 17 commits into from
Oct 4, 2024

Conversation

@Eldies Eldies commented Sep 19, 2024

Summary

Adds support for the YOLOv8 classification format.

How to test

Checklist

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.
  • I have updated the license header for each file (see an example below)
# Copyright (C) 2022 CVAT.ai Corporation
#
# SPDX-License-Identifier: MIT

Summary by CodeRabbit

  • New Features

    • Introduced support for YOLOv8 Classification format, enhancing dataset management capabilities.
    • Added comprehensive documentation for importing and exporting YOLOv8 Classification datasets.
    • New classes for handling YOLOv8 Classification data formats, including converters, importers, and extractors.
  • Bug Fixes

    • Improved error handling and item retrieval in dataset extraction processes.
  • Documentation

    • Updated changelog and user manual to include details on YOLOv8 Classification format and its usage.

coderabbitai bot commented Sep 19, 2024

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

The pull request introduces support for the YOLOv8 Classification format across multiple components of the Datumaro framework. Key changes include the addition of new classes for importing, exporting, and extracting datasets in the YOLOv8 Classification format. The changelog has been updated to reflect these enhancements, and comprehensive documentation has been created to guide users in utilizing the new format. Additionally, unit tests have been implemented to ensure functionality and robustness of the new features.

Changes

| Files | Change Summary |
|---|---|
| CHANGELOG.md | Updated to include support for the YOLOv8 Classification format and enhancements to the env.detect_dataset() function. |
| datumaro/plugins/yolo_format/converter.py | Introduced YOLOv8ClassificationConverter class for exporting datasets in the YOLOv8 classification format, including methods for media type validation and exporting media files. |
| datumaro/plugins/yolo_format/extractor.py | Added YoloBaseExtractor class for improved structure and code reuse, with YoloExtractor now subclassing it. Introduced YOLOv8ClassificationExtractor for handling classification tasks. |
| datumaro/plugins/yolo_format/format.py | Added YOLOv8ClassificationFormat class with a constant for handling unlabeled images. |
| datumaro/plugins/yolo_format/importer.py | Introduced YOLOv8ClassificationImporter class with a find_sources method to recognize and handle classification data. |
| site/content/en/docs/formats/yolo_v8_classification.md | Created documentation for the YOLOv8 Classification dataset format, detailing import/export processes and expected dataset structure. |
| site/content/en/docs/user-manual/supported_formats.md | Added a new section for "Classification" formats, including links to format specifications, dataset examples, and documentation. |
| tests/unit/data_formats/test_yolo_format.py | Added tests for the new converter, importer, and extractor classes related to YOLOv8 classification, covering various scenarios to ensure functionality and integrity during conversion and import processes. |

Poem

In the meadow where data plays,
New formats bloom in sunny rays.
YOLOv8 now takes its flight,
Classification shines so bright!
With tests and docs, we hop along,
In Datumaro, we all belong! 🐇✨


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Outside diff range and nitpick comments (8)
site/content/en/docs/formats/yolo_v8_classification.md (5)

7-14: Clearly specifies the supported annotation type and attributes.

The section provides a helpful link to the format specification and clearly states that the format supports Label annotations without any attributes.

Please fix the grammatical number in this sentence:

-Format doesn't support any attributes for annotations objects.
+Format doesn't support any attributes for annotation objects.
Tools
LanguageTool

[uncategorized] ~14-~14: The grammatical number of this noun doesn’t look right. Consider replacing it.
Context: ...rmat doesn't support any attributes for annotations objects. ## Import YOLOv8 classificat...

(AI_EN_LECTOR_REPLACEMENT_NOUN_NUMBER)


17-52: Provides clear instructions for importing datasets.

The section offers helpful guidance on importing YOLOv8 Classification datasets using both the command line and Python API, along with the expected directory structure.

Please make the following changes:

-A Datumaro project with a ImageNet dataset can be created
+A Datumaro project with an ImageNet dataset can be created
-For successful importing of YOLOv8 Classification dataset the input directory with dataset
+For successful importing of the YOLOv8 Classification dataset, the input directory with the dataset
-should has the following structure:
+should have the following structure:
Tools
LanguageTool

[uncategorized] ~19-~19: Use the indefinite article “an” before nouns that start with a vowel sound.
Context: ...cation dataset A Datumaro project with a ImageNet dataset can be created in the ...

(AI_EN_LECTOR_REPLACEMENT_DETERMINER_A_AN)


[uncategorized] ~35-~35: You might be missing the article “the” here.
Context: ...tion') ``` For successful importing of YOLOv8 Classification dataset the input direct...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~35-~35: A comma might be missing here.
Context: ...sful importing of YOLOv8 Classification dataset the input directory with dataset should...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)


[uncategorized] ~36-~36: This verb does not appear to agree with the subject. Consider using a different form.
Context: ...the input directory with dataset should has the following structure: ```bash datas...

(AI_EN_LECTOR_REPLACEMENT_VERB_AGREEMENT)


54-83: Provides clear instructions for exporting datasets.

The section offers helpful guidance on exporting YOLOv8 Classification datasets to other formats supported by Datumaro, using both the command line and Python API. The note about extra export options for some formats, along with the link to format-specific documentation, is useful.

Please add the missing article and a comma in this sentence:

-For particular format see the
+For a particular format, see the
Tools
LanguageTool

[uncategorized] ~72-~72: You might be missing the article “the” here.
Context: ...our YOLOv8 Classification dataset using Python API ```python import datumaro as dm d...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)


[uncategorized] ~82-~82: Possible missing comma found.
Context: ...ve extra export options. For particular format see the > docs to get...

(AI_HYDRA_LEO_MISSING_COMMA)


85-99: Provides clear instructions for converting datasets to YOLOv8 Classification format.

The section offers helpful guidance on converting datasets containing Label annotations to the YOLOv8 Classification format using Datumaro, with examples for both the command line and a Datumaro project.

Please add a comma before "and" in this sentence:

-If your dataset contains `Label` for images and you want to convert this
+If your dataset contains `Label` for images, and you want to convert this
Tools
LanguageTool

[uncategorized] ~87-~87: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...your dataset contains Label for images and you want to convert this dataset into t...

(COMMA_COMPOUND_SENTENCE)


101-105: Clearly lists and explains extra export options.

The section provides a clear list of extra options for exporting to YOLOv8 Classification formats, along with brief explanations of each option.

Please rephrase these sentences to fix the grammatical issues:

-- `--save-media` allow to export dataset with saving media files
+- `--save-media` allows exporting the dataset with media files
  (by default `False`)
-- `--save-dataset-meta` - allow to export dataset with saving dataset meta
+- `--save-dataset-meta` allows exporting the dataset with dataset metadata
  file (by default `False`)
Tools
LanguageTool

[grammar] ~102-~102: Did you mean “exporting”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...ication formats: - --save-media allow to export dataset with saving media files (by d...

(ALLOW_TO)


[grammar] ~104-~104: Did you mean “exporting”? Or maybe you should add a pronoun? In active voice, ‘allow’ + ‘to’ takes an object, usually a pronoun.
Context: ...False) - --save-dataset-meta` - allow to export dataset with saving dataset meta file...

(ALLOW_TO)

datumaro/plugins/yolo_format/converter.py (1)

409-443: The apply method looks good overall!

The method follows a clear structure:

  1. Validates the media type.
  2. Creates necessary directories.
  3. Saves dataset metadata if required.
  4. Iterates through subsets and items.
  5. Calls _export_media_for_label based on item annotations.

A few minor suggestions:

  • Consider adding a comment to explain the purpose of the DEFAULT_SUBSET_NAME check at line 425.
  • The assert statement at line 429 could be replaced with a more informative error message if the condition is not met.
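
The steps listed above can be sketched roughly as follows. This is a minimal standard-library sketch, not the converter from this PR: `export_classification`, the `(subset, label, image_path)` item tuples, and the `no_label` directory name are assumptions made for illustration. The real converter works with Datumaro items and `YOLOv8ClassificationFormat.IMAGE_DIR_NO_LABEL`, whose value is not shown in this thread. The sketch also raises an explicit error where a bare `assert` might otherwise be used; the condition it checks (a missing image file) is only an example, since the thread does not show what the assert at line 429 verifies.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional, Tuple

# Hypothetical stand-in for YOLOv8ClassificationFormat.IMAGE_DIR_NO_LABEL;
# the real value is not shown in this thread.
NO_LABEL_DIR = "no_label"

# (subset name, label name or None, path to the source image)
Item = Tuple[str, Optional[str], Path]


def export_classification(items: Iterable[Item], save_dir: Path) -> None:
    """Copy each image to <save_dir>/<subset>/<label>/<file name>."""
    save_dir.mkdir(parents=True, exist_ok=True)
    for subset, label, image_path in items:
        if not image_path.is_file():
            # An explicit error carries more context than a bare `assert`.
            raise FileNotFoundError(f"Item image is missing: {image_path}")
        label_dir = label if label is not None else NO_LABEL_DIR
        target_dir = save_dir / subset / label_dir
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(image_path, target_dir / image_path.name)
```

Unlabeled items end up in the reserved directory, so every exported image still has exactly one parent directory that encodes its class.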
tests/unit/data_formats/test_yolo_format.py (2)

1001-1015: Add docstrings to empty overridden test methods

The methods test_export_rotated_bbox, test_cant_save_with_reserved_subset_name, test_inplace_save_writes_only_updated_data, test_can_load_dataset_with_exact_image_info, and test_can_save_and_load_without_path_prefix are overridden with empty bodies. Providing docstrings explaining why these methods are intentionally left empty will improve code readability and help other developers understand their purpose.
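
An intentionally empty override might be documented along these lines. This is only a sketch: `BaseFormatTest` is a hypothetical stand-in for the shared YOLO test suite, and the reason given in the docstring is an assumption, since the thread does not say why these tests are disabled.

```python
import unittest


class BaseFormatTest(unittest.TestCase):
    # Hypothetical base class standing in for the shared YOLO test suite.
    def test_export_rotated_bbox(self):
        self.assertTrue(True)  # placeholder for a real bounding-box check


class ClassificationFormatTest(BaseFormatTest):
    def test_export_rotated_bbox(self):
        """Intentionally empty: a classification format has no bounding boxes."""
```

A docstring alone is a valid function body, so no `pass` statement is needed, and the explanation shows up in `help()` output and in most test runners.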


978-1000: Consistent handling of overridden methods with different arguments

The method test_can_save_and_load_image_with_arbitrary_extension overrides a base class method with a different signature but does not disable the pylint warning for arguments-differ. For consistency and to avoid linting issues, consider adding # pylint: disable=arguments-differ or aligning the method signature with the base class.
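
The first option could look like the sketch below. The class names and parameters are hypothetical; the real test signatures are not shown in this thread.

```python
class BaseSuite:
    # Hypothetical base-class method with a single parameter.
    def check_extension(self, extension: str) -> str:
        return f"image{extension}"


class ClassificationSuite(BaseSuite):
    # The override adds a parameter, so pylint would report arguments-differ
    # unless the warning is disabled on the definition line.
    def check_extension(self, extension: str, label: str) -> str:  # pylint: disable=arguments-differ
        return f"{label}/image{extension}"
```

Keeping the disable comment on the `def` line limits it to this one method, so the check still applies to the rest of the module.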

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 393cb66 and 5a38e5e.

Files ignored due to path filters (3)
  • tests/assets/yolo_dataset/yolov8_classification/train/label_0/2.jpg is excluded by !**/*.jpg
  • tests/assets/yolo_dataset/yolov8_classification/train/label_0/subfolder/1.jpg is excluded by !**/*.jpg
  • tests/assets/yolo_dataset/yolov8_classification/train/label_1/3.jpg is excluded by !**/*.jpg
Files selected for processing (8)
  • CHANGELOG.md (1 hunks)
  • datumaro/plugins/yolo_format/converter.py (2 hunks)
  • datumaro/plugins/yolo_format/extractor.py (9 hunks)
  • datumaro/plugins/yolo_format/format.py (1 hunks)
  • datumaro/plugins/yolo_format/importer.py (3 hunks)
  • site/content/en/docs/formats/yolo_v8_classification.md (1 hunks)
  • site/content/en/docs/user-manual/supported_formats.md (1 hunks)
  • tests/unit/data_formats/test_yolo_format.py (7 hunks)
Additional comments not posted (6)
datumaro/plugins/yolo_format/format.py (1)

31-32: LGTM!

The addition of the YOLOv8ClassificationFormat class and the IMAGE_DIR_NO_LABEL constant enhances the structure of the code by providing support for the YOLOv8 classification format and a mechanism to handle images without labels.
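
To illustrate how a reserved directory name can mark unlabeled images, here is a minimal standard-library sketch. It assumes the `<subset>/<label>/<image>` layout suggested by the test assets in this PR (including nested subfolders); `labels_by_image` and the `no_label` value are hypothetical and are not taken from `format.py`.

```python
from pathlib import Path
from typing import Dict, Optional

# Assumed value; the actual IMAGE_DIR_NO_LABEL constant is not shown here.
NO_LABEL_DIR = "no_label"


def labels_by_image(subset_dir: Path) -> Dict[str, Optional[str]]:
    """Map every file under <subset_dir>/<label>/... to its label.

    Files inside the reserved NO_LABEL_DIR directory map to None.
    """
    result: Dict[str, Optional[str]] = {}
    label_dirs = sorted(p for p in subset_dir.iterdir() if p.is_dir())
    for label_dir in label_dirs:
        label = None if label_dir.name == NO_LABEL_DIR else label_dir.name
        # rglob also picks up files in nested subfolders of a label directory.
        for file in sorted(label_dir.rglob("*")):
            if file.is_file():
                result[file.relative_to(subset_dir).as_posix()] = label
    return result
```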

datumaro/plugins/yolo_format/importer.py (1)

114-123: LGTM!

The new YOLOv8ClassificationImporter class looks good:

  • It correctly extends the Importer class.
  • The find_sources method implementation is logically sound and handles the cases appropriately.
  • The changes enhance the functionality of the YOLO format importer to support classification data.

Great job!

site/content/en/docs/user-manual/supported_formats.md (1)

181-184: LGTM!

The new section for the YOLOv8 classification format is well-structured and consistent with the rest of the document. The provided links will be helpful for users to understand and utilize this format effectively.

CHANGELOG.md (1)

50-51: LGTM!

The changelog entry is well-formatted, descriptive, and consistent with the feature addition mentioned in the summary. Great job!

datumaro/plugins/yolo_format/extractor.py (1)

691-701: Verify Python version compatibility due to use of assignment expressions

The code uses assignment expressions (the walrus operator :=) in the comprehension at line 699, which requires Python 3.8 or newer. If the project needs to support earlier Python versions, this will cause syntax errors. Please verify the minimum required Python version for the project and adjust accordingly.

To confirm the project's Python version compatibility, run the following script:

Verification successful

Python version compatibility verified and consistent with codebase

The project's setup.py file specifies a minimum Python version of 3.8, which is appropriate for the use of assignment expressions (walrus operator). This requirement is consistent with the code in question and the usage of assignment expressions throughout the codebase, particularly in the yolo_format plugin.
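
For reference, this is the kind of construct in question. The snippet is an illustrative example, not the code from `extractor.py`; `parse_label` is a hypothetical helper.

```python
from typing import List, Optional


def parse_label(name: str) -> Optional[int]:
    """Return the numeric suffix of names like 'label_3', else None."""
    prefix, _, suffix = name.rpartition("_")
    return int(suffix) if prefix and suffix.isdigit() else None


names = ["label_0", "notes", "label_12"]

# Python 3.8+: bind the parsed value and filter on it in one pass.
ids: List[int] = [
    value for name in names if (value := parse_label(name)) is not None
]

# Equivalent form that also runs on interpreters older than 3.8.
ids_compat: List[int] = [
    value for value in map(parse_label, names) if value is not None
]
```

Both comprehensions call `parse_label` once per name; the assignment expression simply avoids the intermediate `map`.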

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for the minimum required Python version in project files
# Expected result: Python version should be >= 3.8 due to usage of assignment expressions

# Searching for 'python_requires' in 'setup.py' or 'pyproject.toml'
grep -r "python_requires" setup.py pyproject.toml || echo "python_requires not specified."

# If 'python_requires' is not specified, check for assignment expressions usage
grep -r ".*:=.*" datumaro/ || echo "No assignment expressions found."

Length of output: 973

tests/unit/data_formats/test_yolo_format.py (1)

1804-1820: Verify the use of extractor name in dataset export

In the YOLOv8ClassificationExtractorTest.test_can_parse method, the dataset is exported using YOLOv8ClassificationExtractor.NAME. Since this is an extractor test, ensure that using the extractor's name for export is intended and functions correctly. Typically, a converter's name is used when exporting datasets.

@Bobronium
Member

Could you resolve the coderabbitai comments by either applying the suggestions or explaining why they are not applicable?

@Eldies Eldies force-pushed the dl/yolo8-classification branch 3 times, most recently from 7e52b98 to 07734fe Compare September 24, 2024 11:38
@Bobronium
Member

Black is failing, but otherwise LGTM!



class YOLOv8ClassificationConverter(Converter):
# https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
Member

The url seems to be unrelated to YOLOv8. I think it would make sense to add a similar one for v8.

Author

Removed the url.
There is already a link to the official YOLOv8 docs in the Datumaro docs, so I don't see the point of adding a link to some unofficial guide here (I also did not manage to find one).

sonarcloud bot commented Oct 4, 2024

@Eldies Eldies merged commit e612d1b into develop Oct 4, 2024
20 checks passed