Data preparation #13

Open
sunhope54 opened this issue Jan 9, 2025 · 4 comments

Comments

@sunhope54

Thank you for your extraordinary work! The official website offers several download choices; which ones should I download to get the right dataset?
image

@sunrainyg
Member

Thank you for your interest. We used the 2014 training, validation, and test images and the corresponding annotations.

@sunhope54
Author

Thank you very much for your timely reply. Sorry to bother you again. I would like to ask how to obtain the multi-modal, multi-task datasets used in your training process. Aren't the storage formats of the datasets different from one another? My main problem is that I did not fully understand the content of DATASET.md. I'm sorry to take up your time, and please accept my apologies again!

@sunrainyg
Member

You can gather all the necessary multi-modal data for various tasks by following the instructions in DATASET.md to execute the scripts. Once the process is complete, all training data will be stored in the data/image_pairs_train directory.

This data must be generated before starting the training. During the training phase, the model will utilize data from different tasks for training.
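As a quick way to confirm that the data has been generated before launching training, a small check like the following may help. This is only a sketch: the helper name `count_generated_files` is hypothetical and not part of the repository, and the path should be whatever you passed as `--save_root`.

```python
from pathlib import Path


def count_generated_files(save_root: str) -> int:
    """Count regular files under save_root, recursing into subdirectories.

    Returns 0 if the directory does not exist.
    """
    root = Path(save_root)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Path taken from the comment above; adjust it to the --save_root you used.
    n = count_generated_files("data/image_pairs_train")
    if n == 0:
        print("No generated data found; run the build_data script before training.")
    else:
        print(f"Found {n} generated files.")
```

The check only counts files; it does not validate their contents or format.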

To begin, you can run the following command:

python build_data/format_dataset_rp.py --save_root './image_pairs_train' --tasks ['det'] --data_root './data/coco'

Afterwards, you can modify the --tasks or --data_root parameters to generate data for other tasks.
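If you plan to run the script for several tasks, one option is to generate the command lines from a list of (task, data root) pairs. The sketch below only prints the commands in the same form as the one above, so you can review them before running anything. The helper `build_commands` is hypothetical, and only the `det` / `./data/coco` pair comes from this thread; any other task name or data root is an assumption that should be checked against DATASET.md.

```python
SCRIPT = "build_data/format_dataset_rp.py"


def build_commands(jobs, save_root="./image_pairs_train"):
    """Return one command string per (task, data_root) pair.

    The strings mirror the command format shown in this thread; nothing is executed.
    """
    commands = []
    for task, data_root in jobs:
        commands.append(
            f"python {SCRIPT} --save_root '{save_root}' "
            f"--tasks ['{task}'] --data_root '{data_root}'"
        )
    return commands


if __name__ == "__main__":
    # Only this pair is confirmed in the thread. Append further
    # (task, data_root) pairs once you have verified them in DATASET.md.
    jobs = [("det", "./data/coco")]
    for cmd in build_commands(jobs):
        print(cmd)
```

Printing rather than executing keeps the shell quoting of the `--tasks` argument exactly as written in the original command.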

Let me know if you have any further questions.

@sunhope54
Author

Thank you very much for your previous answers, and I apologize again for my questions. I am still having some issues with building a multi-task instruction-tuning dataset. Can I build the dataset by executing the following commands?
python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['det'] --data_root './data/coco'
python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['seg'] --data_root './data/ADE20k'
python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['cls'] --data_root './data/Oxford-IIIT'
python build_data/format_dataset_rp.py --save_root './image_pairs' --tasks ['depes'] --data_root './data/NYUV2'
Also, when I process datasets other than COCO, the following error occurs:
image
It seems that the code is still processing the COCO dataset. How can I solve this problem?
Finally, thank you for taking the time to look at my problem. Best regards.
