MuLaN #384

Open
lucidrains opened this issue Jan 27, 2023 · 11 comments

Comments

@lucidrains
Contributor

The new MusicLM relies on an audio CLIP named MuLaN

I will build out an initial implementation here, but eventually we should also get the audio encoder design into open clip, so that we can do audio-text contrastive learning.

@lucidrains
Contributor Author

I'll be happy to lead the initial PR, once I get MusicLM to a good place, by next Tuesday I estimate

@lucidrains
Contributor Author

Oh, there's actually not much to the audio encoder: either a ResNet-50 or a Transformer, with the requirement that SpecAugment is applied first
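
For readers unfamiliar with SpecAugment, here is a minimal NumPy sketch of the masking part of it, applied to a spectrogram before it would reach the encoder. This is an illustration only, not the code from the PR or from MuLaN: the function name `spec_augment`, the parameter names, and the default mask sizes are assumptions, and the time-warping step of the original SpecAugment recipe is left out.

```python
import numpy as np

def spec_augment(spec, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=16, rng=None):
    """Zero out random frequency bands and time spans of a (freq, time) array.

    Hypothetical helper for illustration; mask counts and widths are
    assumed defaults, not values taken from the MuLaN paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # leave the caller's spectrogram untouched
    n_freq, n_time = out.shape

    # Frequency masking: blank out horizontal bands.
    for _ in range(num_freq_masks):
        width = min(int(rng.integers(0, max_freq_width + 1)), n_freq)
        start = int(rng.integers(0, n_freq - width + 1))
        out[start:start + width, :] = 0.0

    # Time masking: blank out vertical spans.
    for _ in range(num_time_masks):
        width = min(int(rng.integers(0, max_time_width + 1)), n_time)
        start = int(rng.integers(0, n_time - width + 1))
        out[:, start:start + width] = 0.0

    return out
```

Usage would look like `augmented = spec_augment(mel_spec)` on a `(n_mels, n_frames)` array during training only, with the result then fed to whichever encoder (ResNet-50 or Transformer) is used.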

@rwightman
Collaborator

@lucidrains definitely sounds worthwhile, having finally merged CoCa I think we have a recent template for integrating more models -- finding the right balance between reusing existing code where possible, and adding new bits where it's cleaner to do so.

There was a group/person adapting OpenCLIP for audio, I had a note of it at one point but can't track it back down, likely a fairly different approach

@lucidrains
Contributor Author

@rwightman awesome! with the new CoCa, and some minor modifications to allow for audio input, we'll have audio captioners too 😄

@lucidrains
Contributor Author

this repository is about to become a big success in the open source world

@lucidrains
Contributor Author

> I'll be happy to lead the initial PR, once I get MusicLM to a good place, by next Tuesday I estimate

software estimates, always multiply by 2 or 3

i'll get around to this tomorrow evening

@lucidrains
Contributor Author

also realized the MuLaN authors went with decoupled contrastive learning but i question how important this is

probably bigger gains to be had just simply applying CoCa to MuLaN. I am also redoing the audio spectrogram transformer with a better design
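
For context on what "decoupled" changes, here is a small NumPy sketch comparing a standard symmetric InfoNCE loss with a decoupled variant, where the positive pair is dropped from the denominator. It is a sketch under assumptions, not the MuLaN or open_clip implementation: the function name `contrastive_loss`, the fixed `temperature=0.1`, and the plain averaging of the two directions are all illustrative choices.

```python
import numpy as np

def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def contrastive_loss(audio_emb, text_emb, temperature=0.1, decoupled=True):
    """Symmetric audio-text contrastive loss over a batch of paired embeddings.

    Hypothetical helper for illustration. Row i of each input is assumed to
    be a matching (audio, text) pair. Needs batch size >= 2 when decoupled.
    """
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature      # (batch, batch) cosine similarities
    pos = np.diag(logits)               # similarities of the matching pairs

    if decoupled:
        # Decoupled variant: exclude the positive from the denominator.
        denom = np.where(np.eye(len(logits), dtype=bool), -np.inf, logits)
    else:
        # Standard InfoNCE: the positive stays in the denominator.
        denom = logits

    loss_a2t = -pos + _logsumexp(denom, axis=1)  # audio -> text
    loss_t2a = -pos + _logsumexp(denom, axis=0)  # text -> audio
    return float(0.5 * (loss_a2t.mean() + loss_t2a.mean()))
```

The only difference between the two modes is the masked diagonal, so switching between them to test how much the decoupling matters is a one-flag change.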

@lucidrains
Contributor Author

PR started here

@rwightman
Collaborator

I guess figuring out what data this will be trained on would be prudent, the win with most projects so far is that we've managed to wrangle enough coding help, compute, AND data to train at scale and release something.

This was the other proj I was thinking of, forked from here at some point https://github.com/LAION-AI/CLAP

The AudioLDM is related https://github.com/haoheliu/AudioLDM

I wonder if Christoph @ LAION has anything in his pile of dataset TODOs that overlaps

@lucidrains
Contributor Author

@rwightman yes, i've already reached out to Yusong @lukewys . He and Ke @RetroCirce have graciously offered to help out with the hyperparameters for the spectrogram, specaugment, and some of the intricacies with data loading

@lucidrains
Contributor Author

@haoheliu also, if you are interested in MuLaN, join the fun 😄
