Implement p_mode="per_example" in Compose() #90

Open
keunwoochoi opened this issue Jul 21, 2021 · 13 comments

Comments

@keunwoochoi
Contributor

Hi, thanks for this great software!

Is per_example supported currently or not? With the ValueError raised in Compose (https://github.com/asteroid-team/torch-audiomentations/blob/master/torch_audiomentations/core/composition.py#L30), I assume it is not supported in Compose. But the readme says it is supported - does it mean that it's supported in individual transforms but not in Compose?

Maybe it's worth using it in the example code in readme :)

@keunwoochoi
Contributor Author

A follow-up question: what would be the best practice for applying a sequence of augmentations to the examples in a batch while varying the randomized parameters per example?

@iver56
Collaborator

iver56 commented Jul 22, 2021

Hi keunwoochoi :) Thanks for the appreciation.

Sorry for the confusion. Please let me try to explain.

mode is not the same as p_mode.

mode is about how audio gets grouped when applying transforms.

For example, mode="per_channel" means that each channel gets augmented independently (with different parameters).

mode="per_example" means that every piece of audio (which can be multichannel or mono) gets augmented independently - this is what one typically wants.

mode="per_batch" means that all the audio snippets in a batch get augmented in the same way.
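To make the three mode values concrete, here is a minimal pure-Python sketch of how many independent parameter draws each mode implies for a batch of shape (batch_size, num_channels). It is not the library's implementation: the helper name sample_gains_db and its arguments are hypothetical, and a gain in dB merely stands in for any randomized parameter.

```python
import random


def sample_gains_db(mode, batch_size, num_channels,
                    min_db=-15.0, max_db=5.0, rng=None):
    """Return a [batch_size][num_channels] grid of gain values in dB.

    Hypothetical helper for illustration only. It mimics the grouping
    described above, not torch-audiomentations internals.
    """
    rng = rng or random.Random()

    def draw():
        return rng.uniform(min_db, max_db)

    if mode == "per_batch":
        gain = draw()  # one draw shared by the whole batch
        return [[gain] * num_channels for _ in range(batch_size)]
    if mode == "per_example":
        grid = []
        for _ in range(batch_size):
            gain = draw()  # one draw per example, shared by its channels
            grid.append([gain] * num_channels)
        return grid
    if mode == "per_channel":
        # one draw for every channel of every example
        return [[draw() for _ in range(num_channels)]
                for _ in range(batch_size)]
    raise ValueError(f"unknown mode: {mode!r}")
```

With a batch of 3 stereo examples, this gives 1 distinct gain for "per_batch", 3 for "per_example" and 6 for "per_channel".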

p_mode refers to the behavior of "p", the probability of applying the transform.

p_mode="per_batch" together with e.g. p=0.5 means that a transform will be applied to only 50% of the batches on average. I.e. ~50% of the time you call it, it will be a no-op (it will do nothing).

p_mode="per_example" together with p=0.5 means that a transform will be applied to 50% of the examples (audio snippets) in a batch on average. The others will be left untouched.

p_mode="per_channel" together with p=0.5 means that the transform will be applied to 50% of the channels on average.
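The p_mode behavior described above can be pictured as sampling an on/off mask at different granularities. The following is a minimal pure-Python sketch under that assumption; sample_apply_mask is a hypothetical name, not a function in torch-audiomentations.

```python
import random


def sample_apply_mask(p, p_mode, batch_size, num_channels, rng=None):
    """Return a [batch_size][num_channels] grid of booleans (True = apply).

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()

    def coin():
        return rng.random() < p  # True with probability p

    if p_mode == "per_batch":
        on = coin()  # a single coin flip decides for the whole batch
        return [[on] * num_channels for _ in range(batch_size)]
    if p_mode == "per_example":
        mask = []
        for _ in range(batch_size):
            on = coin()  # one coin flip per example
            mask.append([on] * num_channels)
        return mask
    if p_mode == "per_channel":
        # one coin flip per channel of every example
        return [[coin() for _ in range(num_channels)]
                for _ in range(batch_size)]
    raise ValueError(f"unknown p_mode: {p_mode!r}")
```

Entries that come out False correspond to audio that is left untouched by the transform.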

We can think of Compose as a transform that contains several transforms. I think in Compose it is often useful to have p=1.0, which means it will always run the Compose pipeline on the whole batch, while the individual transforms inside it may be turned on or off randomly. If you have a Compose that you want applied in only e.g. 50% of the calls, you could leave p_mode in Compose at "per_batch" while setting p=0.5.

I haven't defined mode in Compose, because I could not think of a way to have it well-defined.

Maybe I should remove p_mode in Compose to make it less confusing? I'm not sure if I'll ever implement p_mode!="per_batch" in Compose. I guess I could also remove the p in Compose and instead make a wrapper class for skipping things randomly.
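As an illustration of the "wrapper class for skipping things randomly" idea, here is a minimal pure-Python sketch. RandomSkip and invert_polarity are hypothetical names invented for this example; nothing like them is claimed to exist in the library, and nested lists stand in for audio tensors.

```python
import random


class RandomSkip:
    """Hypothetical wrapper: call `transform` with probability p per call,
    otherwise return the input unchanged (per-batch behavior)."""

    def __init__(self, transform, p=0.5, rng=None):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be in [0, 1]")
        self.transform = transform
        self.p = p
        self.rng = rng or random.Random()

    def __call__(self, batch):
        if self.rng.random() < self.p:
            return self.transform(batch)
        return batch  # no-op for this call


def invert_polarity(batch):
    """Toy stand-in for a transform: flip the sign of every sample."""
    return [[-sample for sample in example] for example in batch]


# Applies the toy transform on roughly half of the calls.
maybe_invert = RandomSkip(invert_polarity, p=0.5, rng=random.Random(0))
```

A design like this would keep the skipping logic out of the pipeline class itself, which is the trade-off the comment above is weighing.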

@iver56
Collaborator

iver56 commented Jul 22, 2021

What would be the best practice for applying a sequence of augmentations to the examples in a batch while varying the randomized parameters per example?

I'm not sure what the best practice is. I guess that depends on the application.

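But you could do something like what is mentioned in the readme:

from torch_audiomentations import Compose, Gain, PolarityInversion

# Each transform is switched on or off independently with probability p
apply_augmentation = Compose(
    transforms=[
        Gain(
            min_gain_in_db=-15.0,
            max_gain_in_db=5.0,
            p=0.5,
        ),
        PolarityInversion(p=0.5),
    ]
)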

In this case, on average 50% of the examples (AKA audio snippets) will have gain applied and 50% will be polarity-inverted. The two probabilities are independent. The gain values will be different for every example that gets gained.
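Because the two decisions are independent, the expected share of examples in each combination is simply the product of the individual probabilities. A small worked enumeration, assuming each transform decides per example as described above (the variable names are made up for this sketch):

```python
from itertools import product

p_gain = 0.5    # probability that Gain is applied to an example
p_invert = 0.5  # probability that PolarityInversion is applied

# Probability of each (gained, inverted) combination under independence
outcomes = {}
for gained, inverted in product([True, False], repeat=2):
    prob_gain = p_gain if gained else 1.0 - p_gain
    prob_invert = p_invert if inverted else 1.0 - p_invert
    outcomes[(gained, inverted)] = prob_gain * prob_invert

for (gained, inverted), prob in outcomes.items():
    print(f"gained={gained!s:5} inverted={inverted!s:5} -> {prob:.2f}")
```

With p=0.5 for both, each of the four combinations (both, only gain, only inversion, neither) has probability 0.25, so about a quarter of the examples pass through unchanged.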

I would advise you to play around with it. If you want, you can give feedback and/or contributions to the project to make it better, in the spirit of open source, community-driven projects 😄

@iver56
Collaborator

iver56 commented Jul 22, 2021

By the way, there is a demo script that applies various transforms in all three modes (per_batch, per_example and per_channel) and writes the results to WAV files. Listening to these output audio files can help you understand what is going on.

Here's the script: https://github.com/asteroid-team/torch-audiomentations/blob/master/scripts/demo.py

@keunwoochoi
Contributor Author

Thanks for all the answers! Knowing the difference between p_mode and mode, it seems clear to me that in Compose(), only p_mode=per_batch is allowed. It's still confusing to me, but that's largely because the problem we're solving here is complicated.

Maybe I should remove p_mode in Compose to make it less confusing?

I think the function is definitely useful!

Maybe all we need is a nice visualization or two. How about something like this?

@keunwoochoi
Contributor Author

keunwoochoi commented Jul 22, 2021

(I drew the image at www.draw.io. You can open this file there: https://www.dropbox.com/s/taapi8jaskts6yx/torch-audiomentation?dl=0)

@iver56
Collaborator

iver56 commented Jul 22, 2021

Nice visualization :) Should we add it to the readme for now? Feel free to make a pull request.

I have not started setting up proper documentation yet.

@keunwoochoi
Contributor Author

  • I was trying to make a PR, but do you think we should add visualizations where p_mode is per_example or per_channel?
  • And... I realized that maybe the figures on the bottom are not correct. It should be p_mode="per_example", right?

@iver56
Collaborator

iver56 commented Jul 23, 2021

* I was trying to make a PR, but do you think we should add visualizations where `p_mode` is `per_example` or `per_channel`?

p_mode="per_example" is the most relevant in most cases.

* And... I realized that maybe the figures on the bottom are not correct. It should be `p_mode="per_example"`, right?

Yes, those three on the bottom should say p_mode="per_example" to be correctly aligned with the illustrations 👍

@keunwoochoi
Contributor Author

Agree that p_mode="per_example" would be the most relevant. I changed the figure on my side.

Related to that, I think p_mode="per_example" would be quite necessary in Compose(). I don't know the implementation deeply enough, but why would it not be well-defined? I'd assume that with Compose(p_mode="per_example", p=0.8), on average 20% of the examples would never be augmented, while the other 80% would go through the stochastic augmentation pipeline.
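The semantics proposed here could be sketched as follows: pick a random subset of examples, run the whole pipeline on that subset only, and write the results back in place. This is a minimal pure-Python illustration of the idea, not how torch-audiomentations implements (or would implement) it; compose_per_example is a hypothetical name, and each transform is assumed to take and return a list of examples of the same length.

```python
import random


def compose_per_example(batch, transforms, p, rng=None):
    """Run `transforms` in order on a random subset of the examples.

    Each example is selected independently with probability p. Examples
    that are not selected are returned unchanged. Hypothetical sketch.
    """
    rng = rng or random.Random()
    selected = [i for i in range(len(batch)) if rng.random() < p]
    out = list(batch)
    if not selected:
        return out  # nothing was selected, so the batch passes through

    # Run the pipeline only on the selected examples
    sub_batch = [batch[i] for i in selected]
    for transform in transforms:
        sub_batch = transform(sub_batch)

    # Scatter the augmented examples back to their original positions
    for i, example in zip(selected, sub_batch):
        out[i] = example
    return out
```

With p=0.8, each example has a 20% chance of skipping the pipeline entirely, regardless of the p values of the transforms inside it.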

@iver56
Collaborator

iver56 commented Jul 23, 2021

You're probably right :) Maybe I thought about it briefly when I initially coded it and thought "this is possible, but I'll leave it as a TODO for later".

iver56 closed this as completed Jul 27, 2021
iver56 changed the title from "Is per_example supported or not?" to "Implement p_mode="per_example" in Compose()" Jul 27, 2021
iver56 reopened this Jul 27, 2021

@HLasse
Contributor

HLasse commented Oct 29, 2021

Thumbs up for implementing p_mode = "per_example" from me, would be very helpful. Thanks for an excellent package!

@iver56
Collaborator

iver56 commented Oct 29, 2021

I'm glad you like it :) If you want to make a contribution, that would be welcome
