Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GSoC2024] Resolved issue #7523: Fixed error caused by filenames with same names #7602

Closed
wants to merge 11 commits into from

Conversation

adkbbx
Copy link
Contributor

@adkbbx adkbbx commented Mar 13, 2024

Resolved issue #7523 : Fixed error caused by filenames with the same name but different extensions.

Motivation and context

This commit addresses a bug where uploading coco annotation files with the same image name but different extensions resulted in an error. The fix ensures that such files are processed correctly without causing errors in the system.

Before fix:
image

After fix:
Screenshot 2024-03-13 213751

How has this been tested?

  1. Create a new task on CVAT.
  2. Upload images named image1.png and image1.jpg.
  3. Upload COCO annotations.
  4. Annotations get uploaded sucessfully.

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)
  • I have increased versions of npm packages if it is necessary
    (cvat-canvas,
    cvat-core,
    cvat-data and
    cvat-ui)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

Copy link

codecov bot commented Mar 13, 2024

Codecov Report

Merging #7602 (98f6e02) into develop (f513aa1) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7602   +/-   ##
========================================
  Coverage    83.47%   83.47%           
========================================
  Files          373      373           
  Lines        39739    39760   +21     
  Branches      3741     3741           
========================================
+ Hits         33171    33191   +20     
- Misses        6568     6569    +1     
Components Coverage Δ
cvat-ui 79.28% <ø> (ø)
cvat-server 87.34% <100.00%> (+<0.01%) ⬆️

dataset = Dataset.import_from(src_file.name,
'coco_instances', env=dm_env)
#Reverse mapping names
for item in dataset:
item.id = reverse_no_ext_mapping[item.id]
Copy link
Contributor

@zhiltsov-max zhiltsov-max Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please try to implement the solution I described in #7523 (comment)? This code currently misses the most important part, that, in fact, should address the problem. Currently, as I see, the code doesn't change the situation, as it just modifies the input dataset and then changes everything back on the Datumaro side. As the problem is on the Datumaro side, it's not possible to solve it just by changing the Datumaro dataset object. Please also check the zip archive case as well, and then COCO Keypoints format in this file, it will have the same implementation for the fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhiltsov-max

I've been experimenting with the implementation provided in #7523(comment), and I encountered a circular import error while attempting to load the rename_mapping variable into bindings.py.

ImportError: cannot import name 'GetCVATDataExtractor' from partially initialized module 'cvat.apps.dataset_manager.bindings' (most likely due to a circular import) (/home/adkbbx/projects/CVAT/cvat/cvat/apps/dataset_manager/bindings.py)

Which I thought would be the way of implementing this logic please do correct me if I was wrong with the implementation.

On furthur debugging I found that the

self._frame_mapping variable inside the bindings.py only contains the value {"image1": 1} may be since same keys were used since path "Image1" was same for both the images in the dataset and only the value of fram got update from 0->1 which I felt would be the cause of multiple annotations being placed on the same image on my current implementation.

To resolve this, I attempted to updating the _frame_mapping attribute with the instance_data and this seems to resolve the issue of multiple annotations being placed on the same image on my current implementation but while I was testing the export of datasets the annotations aren't getting updated into json file.
"annotations": []

So I think updating _frame_mapping might be the cause of this export error which I am unable to debug.

I would appreciate any hints, insights or suggestions for the implementation of this solution further. I'm sorry for the frequent contact and thank you for your time and patience! 🙇

Copy link
Contributor

@zhiltsov-max zhiltsov-max Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been experimenting with the implementation provided in #7523 (comment), and I encountered a circular import error while attempting to load the rename_mapping variable into bindings.py.

ImportError: cannot import name 'GetCVATDataExtractor' from partially initialized module 'cvat.apps.dataset_manager.bindings' (most likely due to a circular import) (/home/adkbbx/projects/CVAT/cvat/cvat/apps/dataset_manager/bindings.py)

Which I thought would be the way of implementing this logic please do correct me if I was wrong with the implementation.

The suggestion in the comment was to add a new function parameter to import_dm_annotations() in bindings.py. I believe, this doesn't require importing any modules. Then, you can supply the constructed mapping into this function, when it's called in coco.py. After this, in import_dm_annotations(), the match_dm_item() function gets called. It also needs a new parameter. match_dm_item() needs to be updated to use this new parameter to map the item name back into the original one, and find the matching frame id in the job or task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhiltsov-max
I've pushed a new commit attempting to address the issue and think I was able to incorporated the suggestions you provided earlier, including adding a new parameter to import_dm_annotations() in bindings.py and updating match_dm_item() accordingly. Despite these changes, It seems that the annotations are still being duplicated on some images even though the anotations are being accepted now. As of what I am able to understand from my debugging the way instance_dataset._frame_mapping updates when there are images with same name and different extensions might be the cause of this issue.
Any further guidance you can provide would be greatly appreciated. Thank you for your continued support and feedback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very nice to see advances in solving this problem.

As of what I am able to understand from my debugging the way instance_dataset._frame_mapping updates when there are images with same name and different extensions might be the cause of this issue.

Yeah, it seems to be the issue now. From what I've checked, we can try to change what's stored in these frame mappings, as they seem to only be used for frame matching. Probably, we can include file extensions in these mappings as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to get the inital tests setup and am able to run the tests sucessfully but using pytest ./tests/python and am getting theese results where some test cases are failing as well as status code 500 error which I think is an internal server error and am a little unsure about how to deal with this any pointers would be helpful.

Please try to check server logs with docker logs test_cvat_server_1, docker logs test_cvat_worker_import_1, docker logs test_cvat_worker_export_1. You can also attach a debugger, if it's needed.

please note that I came up with this refering to the current implementation of the test so do let me know if it is incorrect. Thank you for your support!

  • Seems like it does quite a few unnecessary actions. Please consider minimizing the amount of actions done in the test setup.
  • Importing the exported dataset makes no sense, as it doesn't reproduce the problem (since the export is broken too wrt. this issue).
  • Why the test is checking that there are no annotations in the task after uploading?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhiltsov-max I am sorry I think that was a mistake from me I will update the tests and check the logs as requested thank you!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, any updates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhiltsov-max I am sorry for the delayed response and must inform you that progress on this task has been slower than anticipated since I had been a little busy with the proposal submission for GSOC. I plan to complete working on this issue as soon as possible. I apologize for any inconvenience this may have caused, and I assure you that I am dedicated to completing this task promptly and effectively. Thank you for your understanding and continued support. 🙇

Copy link
Contributor

@zhiltsov-max zhiltsov-max Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, do as you see fits best to you, it's your PR after all. Just keep in mind the program deadlines, if you want it to be a part of GSoC.

@nmanovic nmanovic changed the title [GSOC24] Resolved issue #7523: Fixed error caused by filenames with same names [GSoC24] Resolved issue #7523: Fixed error caused by filenames with same names Mar 15, 2024
@nmanovic nmanovic changed the title [GSoC24] Resolved issue #7523: Fixed error caused by filenames with same names [GSoC2024] Resolved issue #7523: Fixed error caused by filenames with same names Mar 15, 2024
@zhiltsov-max
Copy link
Contributor

Hi, any updates on this PR?

@zhiltsov-max
Copy link
Contributor

Closing as there's no activity for 1 month. Feel free to reopen, if you want to continue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants