Make BaseWaveformTransform able to pad transform outputs that have a different length #158

iver56 · 2022-10-05T13:58:44Z

This is a prerequisite for transforms that change the length of the audio, like time stretching.

hbredin · 2022-10-05T14:09:29Z

Related: it would be nice for such length-changing transforms to expose some kind of API to let the user know about this behavior.

For instance, in pyannote.audio, I usually train models on chunks of fixed 5s length.
When using a time stretching transform, I'd still want the output of the transform to be 5s long.

If the transform speeds up the signal by a factor of up to 2, I should therefore be warned to input 10s chunks (so that the output is actually at least 5s long).

So it could expose something like TimeStretching.input_output_length_ratio := 0.5
By default input_output_length_ratio would be 1.0.
Compose.input_output_length_ratio would be the product of the composed transforms' input_output_length_ratio attributes.
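
To make the proposal concrete, here is a minimal sketch of how such an attribute could compose. None of these classes exist in the library; `SketchTransform`, `SketchTimeStretch`, `SketchCompose` and `required_input_seconds` are hypothetical names, and the ratio is read as worst-case output length divided by input length, which matches the 0.5 value suggested above.

```python
import math


class SketchTransform:
    """Hypothetical base class: by default a transform keeps the length."""

    input_output_length_ratio = 1.0


class SketchTimeStretch(SketchTransform):
    """Hypothetical time stretch that may speed audio up by max_speedup."""

    def __init__(self, max_speedup):
        # Assumption: worst case, output_length = input_length / max_speedup.
        self.input_output_length_ratio = 1.0 / max_speedup


class SketchCompose(SketchTransform):
    """Hypothetical composition: the ratio is the product of the parts."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    @property
    def input_output_length_ratio(self):
        return math.prod(t.input_output_length_ratio for t in self.transforms)


def required_input_seconds(transform, target_seconds):
    """Input duration needed so the output is at least target_seconds long."""
    return target_seconds / transform.input_output_length_ratio


pipeline = SketchCompose([SketchTimeStretch(max_speedup=2.0), SketchTransform()])
needed = required_input_seconds(pipeline, target_seconds=5.0)
```

With a maximum speed-up of 2, the composed ratio is 0.5, so a 5s target calls for 10s input chunks, as in the example above.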

Does that make sense? Or is this out of the scope of the library?

iver56 · 2022-10-06T06:58:30Z

> it would be nice for such length-changing transforms to expose some kind of API to let the user know about this behavior.

Yes, I had the same idea.

One idea: time stretching can make the audio shorter or longer, and a RandomCrop applied afterwards can then make sure that we get a fixed length.
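
As a rough illustration of the "fix the length afterwards" step, the sketch below randomly crops a signal that came out too long and zero-pads one that came out too short. It is not the library's RandomCrop or padding code; `fix_length` is a hypothetical helper, and it works on plain Python lists of samples rather than tensors to stay self-contained.

```python
import random


def fix_length(samples, target_length, rng=None):
    """Return exactly target_length samples.

    Longer inputs are cropped at a random offset; shorter inputs are
    right-padded with zeros. This is an assumed policy, not library behavior.
    """
    rng = rng if rng is not None else random
    n = len(samples)
    if n >= target_length:
        # randint is inclusive on both ends, so the crop always fits.
        start = rng.randint(0, n - target_length)
        return samples[start:start + target_length]
    return samples + [0.0] * (target_length - n)


# A "stretched" output that is too long gets cropped...
cropped = fix_length([float(i) for i in range(10)], 5, random.Random(0))
# ...and one that is too short gets zero-padded at the end.
padded = fix_length([1.0, 2.0], 4)
```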
