
Any release date ? #1

Open
jarredou opened this issue Jul 5, 2023 · 9 comments

Comments

@jarredou

jarredou commented Jul 5, 2023

No description provided.

@jarredou jarredou changed the title Any released date ? Any release date ? Jul 5, 2023
@ilic-mezza
Collaborator

Hi @jarredou,

We plan to release the source code and pretrained models after the publication of our paper, "Toward Deep Drum Source Separation," which is currently under peer review. Unfortunately, we do not have a set date yet.

Regardless, the full dataset is now freely available on Zenodo.

@riccardogiampiccolo
Collaborator

Hi @jarredou,

We published the source code, and the paper is now on arXiv! It is just a preprint, though; we have submitted it to Pattern Recognition Letters. You can take a look!

Best

@jarredou
Author

jarredou commented Dec 19, 2023

Thanks! Great paper already!
I don't see any training code; do you plan to release it as well (maybe when the final paper is published)?

PS: I've made a quick Google Colab adaptation for inference: https://github.com/jarredou/larsnet-colab

@ilic-mezza
Collaborator

ilic-mezza commented Dec 19, 2023

Hi @jarredou,

We developed LarsNet using an early version of StemGMD. Therefore, it will take us some time to refactor the code and have it working with the off-the-shelf version available on Zenodo.

We plan to release the training code soon. I am sure it will be available by the time the article is published. In the meantime, we added a section in the README.

Thanks for the Colab, it's a great idea!

@ilic-mezza ilic-mezza reopened this Dec 19, 2023
@jarredou
Author

jarredou commented Dec 20, 2023

The inference speed is really mind-blowing, even on CPU; that's amazing, congrats on that!

About the quality: do you think the baseline models would perform better with more epochs? From the outside, 22 epochs seems quite low, and projects like drumsep (Demucs-based, with a smaller, private dataset but more sound diversity) get quite good results, probably with more training epochs per model. What do you think?

@ilic-mezza
Collaborator

ilic-mezza commented Dec 20, 2023

Hi @jarredou,

We process 110k clips per epoch; with a batch size of 24, this corresponds to just above 4500 batches. This means that each U-Net model is trained for about 100k steps, which is pretty standard. After 100k steps, the validation loss had already stopped decreasing, so I reckon improving the output quality would take more than just increasing the number of epochs.
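
For reference, the arithmetic works out roughly as follows (a back-of-the-envelope sketch in Python; the 22-epoch figure comes from the previous comment, not from this one):

clips_per_epoch = 110_000  # clips processed per epoch
batch_size = 24
epochs = 22  # as quoted in the previous comment

batches_per_epoch = clips_per_epoch // batch_size  # 4583, i.e. "just above 4500"
total_steps = batches_per_epoch * epochs           # 100,826, i.e. ~100k steps
print(batches_per_epoch, total_steps)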

We already have a few ideas for a v2: most importantly, adding synthetic drums to the dataset, but also improving robustness to stereo imaging, which we noticed can sometimes cause problems. Which artifacts are you most concerned with?

I'll try and take a look at drumsep in the next few days. (We were not aware there was already a drum demixing model out there, thanks for the heads up!)

@jarredou
Author

jarredou commented Dec 20, 2023

I can't speak for everybody, but for most of my own use cases, I prefer separations with occasional bleed but a full-sounding target stem over separations with no bleed but missing content in the target stem (with "underwater"-like sound in some parts).
Occasional bleed is easier to remove with automatic/manual post-processing; missing content is much harder to deal with.
But I know that other people prefer it the other way around.

Using Demucs as drumsep did is a good idea because, until recently (see next message), Demucs was the best open-source architecture for separating drums from a full mixture, better than KUIELab's TFC-TDF-Net when trained on the same dataset.

(The original drumsep download link is dead, but the model was later re-shared in this issue: inagoy/drumsep#3 (comment). It was a student project; there is no related publication.)

@jarredou
Author

jarredou commented Dec 20, 2023

Side note: lucidrains has open-sourced SAMI-ByteDance's work (the current SOTA in music source separation, by quite a big step):
https://github.com/lucidrains/BS-RoFormer/

You may also find this work interesting; it aims at enhancing source-separated audio with a GAN: https://github.com/interactiveaudiolab/MSG

@ilic-mezza
Collaborator

ilic-mezza commented Dec 21, 2023

Sure thing! Demucs is arguably a better architecture than our Spleeter-like model. Nevertheless, at this point, we mainly wanted to showcase StemGMD by releasing a baseline for future research. This is why we decided to start from a simpler architecture. We will try better architectures as we go forward!

As for bleed vs. hard separation artifacts, you may want to play around with the α-Wiener filter. We noticed that choosing α < 1 can sometimes lead to more natural-sounding stems while allowing more bleed. You can try specifying the option -w with a value of 0.5, which nonlinearly modifies the masks by applying square-root compression. Namely, you could run

$ python separate.py -i path/to/input/folder -o path/to/output/folder -w 0.5

The best α really depends on the input track, but it's worth trying different values as it may produce more appealing results.
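
For intuition, here is a minimal Python sketch of what α-Wiener-style mask compression does to per-stem magnitude estimates (illustrative only; alpha_wiener_masks and the array shapes are assumptions for the example, not LarsNet's actual code):

import numpy as np

def alpha_wiener_masks(stem_mags, alpha=1.0, eps=1e-8):
    # stem_mags: (num_stems, freq, time) non-negative magnitude estimates.
    # alpha=1 gives a standard ratio (Wiener-like) mask; alpha=0.5 applies
    # square-root compression, trading harder separation for more bleed.
    compressed = stem_mags ** alpha
    return compressed / (compressed.sum(axis=0, keepdims=True) + eps)

# Two hypothetical stems over a tiny 2x2 spectrogram patch
mags = np.array([[[4.0, 1.0], [0.5, 2.0]],
                 [[1.0, 1.0], [0.5, 0.0]]])
masks = alpha_wiener_masks(mags, alpha=0.5)
print(masks.sum(axis=0))  # masks sum to ~1 in every time-frequency bin

With α = 0.5 the masks are less extreme than with α = 1, so dominant stems claim less of each time-frequency bin; that flattening is where both the extra bleed and the more natural sound come from.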
