Dataset Preparation Script #29
Hi, unfortunately, the datasets were not prepared by us. You can try this download tool to download large-scale datasets. We do use ShareGPT4V to recaption the data, and the prompt is …
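As a rough illustration of the recaptioning step, here is a minimal sketch that rewrites the `.txt` captions inside a WebDataset-style tar shard, where each sample is stored as files sharing a key (e.g. `000001.jpg` and `000001.txt`). The `caption_fn` callable is a hypothetical placeholder for a ShareGPT4V inference call; neither the actual model invocation nor the prompt used by the authors is shown here, and `IMAGE_EXTS` is an assumed set of image extensions.

```python
import io
import tarfile
from typing import BinaryIO, Callable

# Assumed image extensions; adjust to whatever the shards actually contain.
IMAGE_EXTS = {"jpg", "jpeg", "png", "webp"}


def recaption_shard(src: BinaryIO, dst: BinaryIO,
                    caption_fn: Callable[[bytes], str]) -> int:
    """Copy a WebDataset-style tar shard, replacing each sample's .txt caption.

    caption_fn is a hypothetical stand-in for a captioning model such as
    ShareGPT4V: it takes raw image bytes and returns a new caption.
    Returns the number of captions that were rewritten.
    """
    # Read the whole shard into memory (acceptable for a sketch).
    with tarfile.open(fileobj=src, mode="r") as tin:
        files = [(m.name, tin.extractfile(m).read())
                 for m in tin.getmembers() if m.isfile()]

    # Index image bytes by sample key ("000001.jpg" -> "000001").
    images = {}
    for name, data in files:
        key, _, ext = name.partition(".")
        if ext.lower() in IMAGE_EXTS:
            images[key] = data

    rewritten = 0
    with tarfile.open(fileobj=dst, mode="w") as tout:
        for name, data in files:
            key, _, ext = name.partition(".")
            # Replace the caption only when the sample has a matching image.
            if ext == "txt" and key in images:
                data = caption_fn(images[key]).encode("utf-8")
                rewritten += 1
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tout.addfile(info, io.BytesIO(data))
    return rewritten
```

In practice one would batch images through the captioning model rather than calling it once per sample; that is omitted here for brevity.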
Thanks for replying! I already knew that repo; it provides datasets in WebDataset format. However, when checking the project config file, I noticed that `laion-aesthetics-12m` is used during pretraining. I could not find this dataset by searching online, which is why I asked: without it, we cannot reproduce your experiments. Thanks!
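For readers unfamiliar with the WebDataset format mentioned above: a shard is a plain tar archive in which each sample is a group of files sharing a key and differing only by extension. A minimal standard-library sketch of packing image/caption pairs into such a shard and grouping them back; the `.jpg`/`.txt` extensions, the six-digit keys, and the helper names `write_shard`/`read_shard` are illustrative assumptions, not part of this project.

```python
import io
import tarfile
from typing import BinaryIO, Dict, Iterable, Tuple


def write_shard(out: BinaryIO, samples: Iterable[Tuple[bytes, str]]) -> int:
    """Pack (image_bytes, caption) pairs into a WebDataset-style tar shard."""
    count = 0
    with tarfile.open(fileobj=out, mode="w") as tar:
        for idx, (image, caption) in enumerate(samples):
            key = f"{idx:06d}"  # all files of one sample share this key
            for ext, payload in (("jpg", image), ("txt", caption.encode("utf-8"))):
                info = tarfile.TarInfo(f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
            count += 1
    return count


def read_shard(src: BinaryIO) -> Dict[str, Dict[str, bytes]]:
    """Group shard members back into {key: {extension: bytes}} samples."""
    samples: Dict[str, Dict[str, bytes]] = {}
    with tarfile.open(fileobj=src, mode="r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            key, _, ext = member.name.partition(".")
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples
```

The `webdataset` Python library groups tar members into samples by the same key-plus-extension convention, so shards like this should be loadable with it, though this sketch has not been checked against it.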
Hi, laion-aesthetics-12m is available here: https://huggingface.co/datasets/dclure/laion-aesthetics-12m-umap
Hi, thanks for this excellent work.
Would you mind releasing the dataset preparation code, such as the steps for downloading the datasets and recaptioning them with ShareGPT4V? I could not find it in the repo.
Thanks.