
Is there any possible script/instruction to look at for fine-tuning? #129

Open
underdogliu opened this issue Oct 8, 2023 · 5 comments

@underdogliu

Hi! First of all, thanks a lot for such an amazing project.

I wonder if there is a supported way to fine-tune the model on specific tasks using custom datasets. I am trying to adapt the model to improve performance, and the source data I have is also structured for classification. Each example consists of:

  • Source audio
  • Text prompt
  • The event ID

I found that this script looks promising, but I do not know how to configure my data to replicate the process on my own datasets, since many of the script's options are very specific to ESC50. Is there a gist covering this? Thanks in advance!
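To make the structure concrete, one record could look like the following (field names are hypothetical and purely illustrative):

```python
# One hypothetical record with the three fields above; names are illustrative.
record = {
    "audio_path": "clips/0001.wav",           # source audio
    "text": "a dog barking in the distance",  # text prompt
    "event_id": 12,                           # classification target
}
```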

@RetroCirce
Contributor

Hi,

Since our model is trained on a large-scale data collection, we use the webdataset library to handle I/O instead of downloading all the data to our server.

In your case, you can fine-tune the model without webdataset, which means you need to write a dataset class and a dataloader yourself. Those classes should load the audio and the text for model training. There are many ways to do that; I can refer you to my HTS-AT audio loader (which does not use webdataset), and you would need to add the text loading yourself. The old audio loader reads from an h5 file, but nowadays you can use the torchaudio.load API directly.
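A minimal sketch of such a dataset, assuming a simple metadata list like the record shown earlier and CLAP's 48 kHz, 10-second audio input; all names here are hypothetical and not part of this repo:

```python
import torch
import torchaudio
from torch.utils.data import Dataset, DataLoader

# Hypothetical metadata: a list of records as sketched above.
metadata = [
    {"audio_path": "clips/0001.wav", "text": "a dog barking", "event_id": 12},
    # ... more records
]

class AudioTextDataset(Dataset):
    def __init__(self, metadata, target_sr=48000):
        self.metadata = metadata
        self.target_sr = target_sr  # LAION-CLAP expects 48 kHz audio

    def __len__(self):
        return len(self.metadata)

    def __getitem__(self, idx):
        item = self.metadata[idx]
        waveform, sr = torchaudio.load(item["audio_path"])  # (channels, samples)
        if sr != self.target_sr:
            waveform = torchaudio.functional.resample(waveform, sr, self.target_sr)
        waveform = waveform.mean(dim=0)  # downmix to mono
        return {"waveform": waveform, "text": item["text"], "label": item["event_id"]}

# Variable-length clips need a custom collate_fn: pad or crop each waveform
# to a fixed 10-second window (480000 samples at 48 kHz).
def collate(batch, max_len=480000):
    waves = []
    for b in batch:
        w = b["waveform"]
        w = torch.nn.functional.pad(w, (0, max(0, max_len - w.shape[-1])))[:max_len]
        waves.append(w)
    return {
        "waveform": torch.stack(waves),
        "text": [b["text"] for b in batch],
        "label": torch.tensor([b["label"] for b in batch]),
    }

loader = DataLoader(AudioTextDataset(metadata), batch_size=8, collate_fn=collate)
```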

@RetroCirce
Contributor

And as for how to fine-tune our model, I think the eval_linear_probe code is definitely the sample code to use as a reference.
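For orientation, here is a rough linear-probe sketch that trains a classification head on top of frozen CLAP audio embeddings, assuming the laion_clap package; the exact embedding API and the 512-dim embedding size are assumptions that may differ across versions:

```python
# A rough linear-probe sketch; treat the laion_clap calls as assumptions.
import torch
import torch.nn as nn
import laion_clap

clap = laion_clap.CLAP_Module(enable_fusion=False)
clap.load_ckpt()  # loads a default pretrained checkpoint

num_classes = 50  # e.g. ESC-50; replace with your number of event IDs
head = nn.Linear(512, num_classes)  # assumes 512-dim CLAP embeddings
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for batch in loader:  # `loader` from the dataset sketch above
    with torch.no_grad():  # freeze the CLAP backbone; only the head trains
        emb = clap.get_audio_embedding_from_data(
            x=batch["waveform"], use_tensor=True
        )
    logits = head(emb)
    loss = criterion(logits, batch["label"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

To fully fine-tune rather than linear-probe, one would drop the `torch.no_grad()` block and pass the backbone's parameters to the optimizer as well, typically with a much smaller learning rate.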

@Kinetic-shaun

Hi Xuechen,

I saw that you have done some work on CLAP fine-tuning, and I wonder whether you have run the fine-tuning job in MSFT_CLAP/example for an ASR task. How did it go? Since I also want to fine-tune CLAP and run some downstream tasks, I would appreciate it if you could share your experience.

@underdogliu
Author

@Kinetic-shaun Thanks for your message and for reading my recent paper! Yes, I do have scripts for fine-tuning the Microsoft CLAP model, but they are not ready to contribute, since they are very specific to the datasets described in the paper rather than the one used in the original repo; the tasks and scenarios are very different, as you may have read.

I've been busy recently, but once the paper's status is updated I should have a chance to clean up the code and publish it to the repo.

BTW @RetroCirce, sorry that in the LAION-AI case I did not have time to get fine-tuning working on my dataset.

@cvillela

Hello @underdogliu, I am also very interested in fine-tuning CLAP for some specific use cases and would love some insight into how you prepared your data!
