Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend ImageFolderDataset to support import of COCO detection data #2612

Merged
merged 1 commit into from
Dec 20, 2024

Conversation

jin-eld
Copy link
Contributor

@jin-eld jin-eld commented Dec 12, 2024

Pull Request Template

Checklist

  • Confirmed that run-checks all script has been executed*.
    * with wgpu errors not related to this PR
  • Made sure the book is up to date with changes in this PR.

Changes

COCO is a popular dataset which defines an own dataset format: https://cocodataset.org/#format-data

This commit introduces an ImageFolderDataset::new_coco_detection() function, which expects a path to the JSON annotations file in COCO format and a path where the actual images are stored.

Note, that while COCO also offers segmentation and pose estimation data, for now only the import of detection (bounding boxes) data is supported.

Testing

Apart from testing with actual COCO data, a small dataset containing three images and annotations in COCO format has been added to crates/burn-dataset/tests/data. Each image has a different amount of annotations, the test checks if the imported dataset contains the correct number of images and the expected number of annotations per image.

Copy link

codecov bot commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 83.15217% with 31 lines in your changes missing coverage. Please review.

Project coverage is 81.84%. Comparing base (ebd7649) to head (1687e55).
Report is 10 commits behind head on main.

Files with missing lines Patch % Lines
crates/burn-dataset/src/vision/image_folder.rs 83.15% 31 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2612      +/-   ##
==========================================
- Coverage   81.86%   81.84%   -0.02%     
==========================================
  Files         832      836       +4     
  Lines      106399   107398     +999     
==========================================
+ Hits        87099    87896     +797     
- Misses      19300    19502     +202     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@laggui laggui self-requested a review December 13, 2024 13:54
Copy link
Member

@laggui laggui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's awesome! The implementation is rather concise 😄

Looks like it fit pretty well with the provisioned types for ImageFolderDataset.

I have some minor comments, but otherwise great job!

I tested the code on a tiny subset of COCO and it works as expected.

/edit: sorry for the late review btw 😅 been quite busy

crates/burn-dataset/src/vision/image_folder.rs Outdated Show resolved Hide resolved
crates/burn-dataset/src/vision/image_folder.rs Outdated Show resolved Hide resolved
@@ -82,7 +85,7 @@ pub struct SegmentationMask {
/// Object detection bounding box annotation.
#[derive(Deserialize, Serialize, Debug, Clone, PartialEq)]
pub struct BoundingBox {
/// Coordinates.
/// Coordinates in [x,y,width,height] format.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checked the format, and for clarity I think I would be even more precise by stating [x_min, y_min, width, height] since there also exists [cx, cy, width, height] which could be confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I actually missed that, was not aware of the _min thing at all, will update the comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the comment wasn't updated with the last changes 🤔

@jin-eld jin-eld force-pushed the coco-detection-import branch from 4e76f29 to 4b32ab8 Compare December 19, 2024 20:47
@jin-eld
Copy link
Contributor Author

jin-eld commented Dec 19, 2024

@laggui updated, I usually amend and force push to keep the history clean, hope that's OK for you.

Copy link
Member

@laggui laggui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually amend and force push to keep the history clean, hope that's OK for you.

Might mess up the github review UI for stuff that was already reviewed and untouched, but I don't have any issues with that. We squash & merge anyway so it will be reduced to a single commit 👍

Looks like the comment update was missed, otherwise it will be good to go!

@@ -82,7 +85,7 @@ pub struct SegmentationMask {
/// Object detection bounding box annotation.
#[derive(Deserialize, Serialize, Debug, Clone, PartialEq)]
pub struct BoundingBox {
/// Coordinates.
/// Coordinates in [x,y,width,height] format.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the comment wasn't updated with the last changes 🤔

COCO is a popular dataset which defines an own dataset format:
https://cocodataset.org/#format-data

This commit introduces a ImageFolderDataset::new_coco_detection()
function, which expects a path to the JSON annotations file in COCO
format and a path where the actual images are stored.

Note, that while COCO also offers segmentation and pose estimation data,
for now only the import of detection (bounding boxes) data is supported.
@jin-eld jin-eld force-pushed the coco-detection-import branch from 4b32ab8 to 1687e55 Compare December 19, 2024 21:00
@jin-eld
Copy link
Contributor Author

jin-eld commented Dec 19, 2024

I usually amend and force push to keep the history clean, hope that's OK for you.

Might mess up the github review UI for stuff that was already reviewed and untouched, but I don't have any issues with that. We squash & merge anyway so it will be reduced to a single commit 👍

Looks like the comment update was missed, otherwise it will be good to go!

sorry, C&P fail 🤦🏻‍♂️

@laggui laggui merged commit a882843 into tracel-ai:main Dec 20, 2024
11 checks passed
@laggui
Copy link
Member

laggui commented Dec 20, 2024

Thank you 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants