MuLaN #384
Comments
I'll be happy to lead the initial PR once I get MusicLM to a good place, which I estimate will be by next Tuesday.
Oh, there's actually not much to the audio encoder: it is either a ResNet50 or a Transformer, with the requirement that SpecAugment is applied first.
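For readers unfamiliar with SpecAugment, here is a minimal sketch of its two masking steps (frequency masking and time masking) on a spectrogram stored as a plain list of rows. It is an illustration under stated assumptions, not the MuLaN or MusicLM implementation: the function name `spec_augment`, the mask-width parameters, and their defaults are hypothetical, time warping is omitted, and a real encoder would operate on tensors rather than Python lists.

```python
import random


def spec_augment(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    """Return a masked copy of `spec`, a list of rows shaped [n_freq][n_time].

    Hypothetical helper for illustration only: one frequency mask and one
    time mask are applied, each with a random width up to the given maximum.
    """
    rng = rng or random.Random()
    n_freq = len(spec)
    n_time = len(spec[0])
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency masking: zero out f consecutive frequency bins.
    f = rng.randint(0, min(max_freq_mask, n_freq))
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time

    # Time masking: zero out t consecutive time steps in every bin.
    t = rng.randint(0, min(max_time_mask, n_time))
    t0 = rng.randint(0, n_time - t)
    for row in out:
        row[t0:t0 + t] = [0.0] * t

    return out


# Example: a 16-bin, 50-frame spectrogram filled with ones.
spec = [[1.0] * 50 for _ in range(16)]
augmented = spec_augment(spec, rng=random.Random(0))
```

The masked spectrogram would then be fed to whichever backbone (ResNet50 or Transformer) is chosen.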
@lucidrains definitely sounds worthwhile. Having finally merged CoCa, I think we have a recent template for integrating more models: finding the right balance between reusing existing code where possible and adding new bits where it's cleaner to do so. There was a group/person adapting OpenCLIP for audio; I had a note of it at one point but can't track it back down. It is likely a fairly different approach.
@rwightman awesome! with the new CoCa, and some minor modifications to allow for audio input, we'll have audio captioners too 😄
this repository is about to become a big success in the open source world
software estimates: always multiply by 2 or 3. i'll get around to this tomorrow evening
also realized the MuLaN authors went with decoupled contrastive learning, but i question how important this is. probably bigger gains to be had just simply applying
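To make the comparison concrete, the sketch below contrasts a standard InfoNCE-style loss with its decoupled variant, in which the positive pair is dropped from the denominator. This is a rough illustration, not the MuLaN authors' code: the function name `contrastive_loss`, the temperature value, and the use of a plain list-of-lists similarity matrix are all assumptions, and only the audio-to-text direction is computed.

```python
import math


def contrastive_loss(sim, temperature=0.1, decoupled=False):
    """Mean audio-to-text contrastive loss over a batch.

    `sim[i][j]` is the similarity between audio i and text j; the diagonal
    holds the positive pairs. With `decoupled=True` the positive term is
    removed from the denominator (requires a batch of at least 2).
    Hypothetical helper for illustration only.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        pos = logits[i]
        # Denominator terms: all pairs, or negatives only when decoupled.
        terms = [l for j, l in enumerate(logits) if not (decoupled and j == i)]
        m = max(terms)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in terms))
        total += log_denom - pos
    return total / n


sim = [[1.0, 0.0], [0.0, 1.0]]
standard = contrastive_loss(sim)
dcl = contrastive_loss(sim, decoupled=True)
```

The two losses differ only in whether the positive logit appears inside the log-sum-exp, which is why switching between them is a small change in practice.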
PR started here
I guess figuring out what data this will be trained on would be prudent. The win with most projects so far is that we've managed to wrangle enough coding help, compute, AND data to train at scale and release something.
This was the other project I was thinking of, forked from here at some point: https://github.com/LAION-AI/CLAP
AudioLDM is related: https://github.com/haoheliu/AudioLDM
I wonder if Christoph @ LAION has anything in his pile of dataset TODOs that overlaps.
@rwightman yes, i've already reached out to Yusong @lukewys. He and Ke @RetroCirce have graciously offered to help out with the hyperparameters for the spectrogram, specaugment, and some of the intricacies with data loading
@haoheliu also, if you are interested in MuLaN, join the fun 😄
The new MusicLM relies on an audio CLIP named MuLaN.
I will build out an initial implementation here, but eventually we should also get the audio encoder design into OpenCLIP, so that we can do audio-text contrastive learning.