Any release date? #1
Comments
Thanks! Great paper already! PS: I've made a quick Google Colab adaptation for inference: https://github.com/jarredou/larsnet-colab
Hi @jarredou, we developed LarsNet using an early version of StemGMD, so it will take us some time to refactor the code so that it works with the off-the-shelf version available on Zenodo. We plan to release the training code soon; I am sure it will be available by the time the article is published. In the meantime, we added a section to the README. Thanks for the Colab, it's a great idea!
The inference speed is really mind-blowing, even on CPU; that's really amazing, congrats for that! About the quality: do you think the baseline models would perform better with more epochs? Seen from the outside, 22 epochs seems quite low, and some projects like drumsep (Demucs-based, with a smaller, private dataset but more sound diversity) are getting quite good results, probably with more training epochs per model. What do you think?
Hi @jarredou, we process 110k clips per epoch; with a batch size of 24, this corresponds to just over 4500 batches. This means that each U-Net model is trained for about 100k steps, which is pretty standard. After 100k steps, the validation loss had already stopped decreasing, so I reckon we'd need more than an increased number of epochs to improve the output quality. We already have a few ideas for a v2: most importantly, adding synthetic drums to the dataset, but also improving robustness to stereo imaging, which we noticed can sometimes cause problems. Which artifacts are you most concerned with? I'll try and take a look.
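For reference, here is a quick back-of-the-envelope check of the training budget described above (a sketch using the figures quoted in this comment; the exact values in the released training code may differ):

```python
# Rough check of the training budget quoted above (110k clips/epoch, batch size 24, 22 epochs).
clips_per_epoch = 110_000
batch_size = 24
epochs = 22

batches_per_epoch = clips_per_epoch // batch_size  # ~4583 batches per epoch
total_steps = batches_per_epoch * epochs           # ~100k optimizer steps per U-Net

print(batches_per_epoch, total_steps)  # 4583 100826
```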
I can't speak for everybody, but for most of my own use cases, I prefer separations with occasional bleed but a full-sounding target stem over separations with no bleed but missing content in the target stem (with an "underwater"-like sound in some parts). Using Demucs like drumsep did is a good idea because, until recently (see next message), Demucs was the best open-source architecture for separating drums from a full mixture, better than KUIELab's TFC-TDF-Net when trained on the same dataset. (The original drumsep model download link is dead, but it was shared later in inagoy/drumsep#3 (comment); it was a student project, and there is no publication related to it.)
Side note: lucidrains has open-sourced SAMI-ByteDance's work (which is the current SOTA in music source separation, by quite a big margin). You may also find this work interesting, aimed at enhancing source-separated audio with a GAN: https://github.com/interactiveaudiolab/MSG
Sure thing! Demucs is arguably a better architecture than our Spleeter-like model. Nevertheless, at this point we mainly wanted to showcase StemGMD by releasing a baseline for future research, which is why we decided to start from a simpler architecture. We will try better architectures as we go forward! As for bleed vs. hard-separation artifacts, you may want to play around with the α-Wiener filter. We noticed that choosing α < 1 may sometimes lead to more natural-sounding stems while allowing for more bleed. You can try and specify the corresponding option.
The best α really depends on the input track, but it's worth trying different values, as doing so may produce more appealing results.
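To make the α discussion concrete, here is a minimal sketch of α-Wiener soft masking, assuming you already have per-stem magnitude estimates (e.g., from the U-Net models) and the complex STFT of the mixture. The function and variable names are illustrative, not LarsNet's actual API:

```python
# Minimal sketch of alpha-Wiener filtering for stem separation.
# Assumes: stem_magnitudes has shape (n_stems, freq, time) with non-negative values,
# and mixture_stft is the complex STFT of the mix with shape (freq, time).
import numpy as np

def alpha_wiener_masks(stem_magnitudes, alpha=1.0, eps=1e-8):
    """Soft masks |s_i|^alpha / sum_j |s_j|^alpha for each stem.

    alpha < 1 tends to give fuller-sounding stems with more bleed;
    alpha > 1 gives harder separation with more suppression artifacts.
    """
    powered = np.power(stem_magnitudes, alpha)
    return powered / (powered.sum(axis=0, keepdims=True) + eps)

def separate_with_alpha(mixture_stft, stem_magnitudes, alpha=1.0):
    """Apply the masks to the complex mixture STFT to get per-stem STFTs."""
    masks = alpha_wiener_masks(stem_magnitudes, alpha)
    return masks * mixture_stft[None, ...]  # broadcast over the stem axis

# Usage idea: compare a softer and a harder separation of the same track,
# where `estimates` are the model's magnitude outputs (hypothetical variable names).
# softer = separate_with_alpha(mixture_stft, estimates, alpha=0.8)
# harder = separate_with_alpha(mixture_stft, estimates, alpha=1.5)
```

The masked STFTs would then be inverted back to waveforms with your STFT library of choice; only the mask exponent changes between the "more bleed" and "harder separation" settings.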