MuLaN #384

lucidrains · 2023-01-27T19:22:19Z

The new MusicLM relies on an audio CLIP named MuLaN

I will build out an initial implementation here, but eventually we should also get the audio encoder design into OpenCLIP, so that we can do audio-text contrastive learning.

lucidrains · 2023-01-27T19:23:09Z

I'll be happy to lead the initial PR, once I get MusicLM to a good place, by next Tuesday I estimate

lucidrains · 2023-01-27T21:57:31Z

Oh, there's actually not much to the audio encoder - either a ResNet-50 or a Transformer, with the requirement that SpecAugment is applied first
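For anyone following along, here is a rough sketch of the "SpecAugment first" step on a single mel spectrogram, in NumPy. The function name `spec_augment`, its parameters, and the default mask sizes are illustrative assumptions, not values from the MuLaN paper or from this repository, and the time-warping part of SpecAugment is left out.

```python
import numpy as np


def spec_augment(spec, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, rng=None):
    """Return a copy of `spec` with random frequency and time bands zeroed.

    spec: array of shape (n_mels, n_frames).
    All names and defaults are hypothetical; time warping is omitted.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    n_mels, n_frames = out.shape

    # Frequency masking: zero out a few horizontal bands of mel bins.
    for _ in range(num_freq_masks):
        width = int(rng.integers(0, max_freq_width + 1))
        start = int(rng.integers(0, max(1, n_mels - width + 1)))
        out[start:start + width, :] = 0.0

    # Time masking: zero out a few vertical bands of frames.
    for _ in range(num_time_masks):
        width = int(rng.integers(0, max_time_width + 1))
        start = int(rng.integers(0, max(1, n_frames - width + 1)))
        out[:, start:start + width] = 0.0

    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.random((64, 256))          # stand-in for a log-mel spectrogram
    masked = spec_augment(spec, rng=rng)  # this would be fed to the encoder
    print(masked.shape)
```

The masked spectrogram is what would then go into the ResNet-50 or Transformer encoder during training; at evaluation time the masking would normally be skipped.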

rwightman · 2023-01-29T05:14:40Z

@lucidrains definitely sounds worthwhile, having finally merged CoCa I think we have a recent template for integrating more models -- finding the right balance between reusing existing code where possible, and adding new bits where it's cleaner to do so.

There was a group/person adapting OpenCLIP for audio. I had a note of it at one point but can't track it back down; it's likely a fairly different approach

lucidrains · 2023-01-29T17:26:13Z

@rwightman awesome! with the new CoCa, and some minor modifications to allow for audio input, we'll have audio captioners too 😄

lucidrains · 2023-01-29T17:26:41Z

this repository is about to become a big success in the open source world

lucidrains · 2023-02-01T18:09:46Z

> I'll be happy to lead the initial PR, once I get MusicLM to a good place, by next Tuesday I estimate

software estimates, always multiply by 2 or 3

i'll get around to this tomorrow evening

lucidrains · 2023-02-01T18:11:07Z

also realized the MuLaN authors went with decoupled contrastive learning but i question how important this is

probably bigger gains to be had by simply applying CoCa to MuLaN. I am also redoing the audio spectrogram transformer with a better design
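To make the decoupled contrastive learning point concrete: as I understand it, the decoupled variant drops the positive pair from the denominator of the usual CLIP-style softmax loss, so each row only normalizes over its negatives. The NumPy sketch below compares the two on an audio-text batch. The function name `contrastive_loss`, the temperature value, and the symmetric averaging are assumptions for illustration and are not taken from the MuLaN paper or from this repository.

```python
import numpy as np


def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def contrastive_loss(audio_emb, text_emb, temperature=0.1, decoupled=False):
    """Symmetric audio-text contrastive loss over a batch of paired embeddings.

    audio_emb, text_emb: arrays of shape (batch, dim); row i of each is a pair.
    decoupled=True removes the positive pair from the denominator.
    Names and defaults here are hypothetical.
    """
    if decoupled and audio_emb.shape[0] < 2:
        raise ValueError("the decoupled loss needs at least one negative per row")

    # Cosine similarity logits between every audio and every text embedding.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = a @ t.T / temperature
    pos = np.diag(sim)  # logits of the matched pairs

    denom = sim.copy()
    if decoupled:
        # exp(-inf) = 0, so the positive no longer contributes to the sum.
        np.fill_diagonal(denom, -np.inf)

    loss_a2t = -pos + _logsumexp(denom, axis=1)  # audio -> text direction
    loss_t2a = -pos + _logsumexp(denom, axis=0)  # text -> audio direction
    return float(np.mean(0.5 * (loss_a2t + loss_t2a)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(4, 8))
    text = rng.normal(size=(4, 8))
    print(contrastive_loss(audio, text))
    print(contrastive_loss(audio, text, decoupled=True))
```

The only difference between the two modes is the masked diagonal, which is part of why it is reasonable to wonder how much it matters at large batch sizes, where a single positive term is a small share of the denominator.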

lucidrains · 2023-02-04T03:21:11Z

PR started here

rwightman · 2023-02-04T23:04:44Z

I guess figuring out what data this will be trained on would be prudent. The win with most projects so far is that we've managed to wrangle enough coding help, compute, AND data to train at scale and release something.

This was the other project I was thinking of, forked from here at some point: https://github.com/LAION-AI/CLAP

AudioLDM is related: https://github.com/haoheliu/AudioLDM

I wonder if Christoph @ LAION has anything in his pile of dataset TODOs that overlaps

lucidrains · 2023-02-04T23:35:33Z

@rwightman yes, i've already reached out to Yusong @lukewys. He and Ke @RetroCirce have graciously offered to help out with the hyperparameters for the spectrogram, specaugment, and some of the intricacies with data loading

lucidrains · 2023-02-05T17:29:53Z

@haoheliu also, if you are interested in MuLaN, join the fun 😄
