Speed table for audiomentations #190

Open
ZFTurbo opened this issue May 15, 2022 · 3 comments
Labels: documentation (Add or improve documentation), good first issue (Good for newcomers)

Comments

ZFTurbo commented May 15, 2022

I wrote a small script to measure the speed of each augmentation. I made it for myself, but I think it would be useful to have it somewhere in the repository.

Aug: AddGaussianNoise Time: 2.23 sec Per sample: 0.022290 sec
Aug: AddGaussianSNR Time: 2.58 sec Per sample: 0.025806 sec
Aug: ApplyImpulseResponse Time: 4.13 sec Per sample: 0.041310 sec
Aug: BandPassFilter Time: 1.02 sec Per sample: 0.010221 sec
Aug: BandStopFilter Time: 1.01 sec Per sample: 0.010077 sec
Aug: HighPassFilter Time: 0.92 sec Per sample: 0.009171 sec
Aug: HighShelfFilter Time: 0.85 sec Per sample: 0.008480 sec
Aug: LowPassFilter Time: 0.91 sec Per sample: 0.009150 sec
Aug: LowShelfFilter Time: 0.85 sec Per sample: 0.008496 sec
Aug: PeakingFilter Time: 0.85 sec Per sample: 0.008530 sec
Aug: ClippingDistortion Time: 1.47 sec Per sample: 0.014670 sec
Aug: GainTransition Time: 0.42 sec Per sample: 0.004211 sec
Aug: Mp3Compression Time: 38.12 sec Per sample: 0.381207 sec
Aug: LoudnessNormalization Time: 3.23 sec Per sample: 0.032335 sec
Aug: PitchShift Time: 70.60 sec Per sample: 0.705962 sec
Aug: PolarityInversion Time: 0.10 sec Per sample: 0.001050 sec
Aug: Resample Time: 26.85 sec Per sample: 0.268525 sec
Aug: Reverse Time: 0.00 sec Per sample: 0.000010 sec
Aug: RoomSimulator Time: 31.89 sec Per sample: 0.318857 sec
Aug: SevenBandParametricEQ Time: 5.95 sec Per sample: 0.059508 sec
Aug: Shift Time: 0.10 sec Per sample: 0.001000 sec
Aug: TanhDistortion Time: 1.80 sec Per sample: 0.018049 sec
Aug: TimeMask Time: 0.13 sec Per sample: 0.001340 sec
Aug: TimeStretch Time: 40.79 sec Per sample: 0.407884 sec

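import glob
import os
import time

import librosa
import soundfile as sf
import tqdm
from audiomentations import *

# Placeholder locations (not defined in the original post); adjust to your setup.
INPUT_PATH = './input/'
CACHE_PATH = './cache/'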

def check_audiomentations_speed(
        path_to_wav_files,
        maximum_files=10,
        sample_rate=44100,
        save_to_check=False,
):
    wav_paths = glob.glob(path_to_wav_files + '/*.wav')[:maximum_files]
    data = []
    for i, wav_path in enumerate(tqdm.tqdm(wav_paths)):
        # Load at the requested sample rate instead of a hard-coded 44100
        audio1, _ = librosa.load(wav_path, sr=sample_rate, mono=False)
        data.append(audio1)

        if save_to_check:
            audio1 = audio1.transpose()
            out_folder = CACHE_PATH + 'original' + '/'
            # Also creates missing parent folders; no error if the folder exists
            os.makedirs(out_folder, exist_ok=True)
            save_path = out_folder + os.path.basename(wav_paths[i])
            sf.write(save_path, audio1, samplerate=sample_rate, subtype='float')

    full_list_to_check = [
        AddGaussianNoise(p=1.0, min_amplitude=0.001, max_amplitude=0.025),
        AddGaussianSNR(p=1.0, min_snr_in_db=5, max_snr_in_db=40.0),
        ApplyImpulseResponse(p=1.0, ir_path=INPUT_PATH + 'ir_data/', lru_cache_size=500, leave_length_unchanged=True),

        BandPassFilter(p=1.0, min_center_freq=200.0, max_center_freq=4000.0, min_bandwidth_fraction=0.5, max_bandwidth_fraction=1.99, min_rolloff=12, max_rolloff=24,),
        BandStopFilter(p=1.0, min_center_freq=200.0, max_center_freq=4000.0, min_bandwidth_fraction=0.5, max_bandwidth_fraction=1.99, min_rolloff=12, max_rolloff=24,),
        HighPassFilter(p=1.0, min_cutoff_freq=20, max_cutoff_freq=2400, min_rolloff=12, max_rolloff=24, zero_phase=False, ),
        HighShelfFilter(p=1.0, min_center_freq=300.0, max_center_freq=7500.0, min_gain_db=-18.0, max_gain_db=18.0, min_q=0.1, max_q=0.999,),
        LowPassFilter(p=1.0,  min_cutoff_freq=150, max_cutoff_freq=7500, min_rolloff=12, max_rolloff=24, zero_phase=False,),
        LowShelfFilter(p=1.0, min_center_freq=50.0, max_center_freq=4000.0, min_gain_db=-18.0, max_gain_db=18.0, min_q=0.1, max_q=0.999,),
        PeakingFilter(p=1.0, min_center_freq=50.0, max_center_freq=7500.0, min_gain_db=-24, max_gain_db=24, min_q=0.5, max_q=5.0, ),

        ClippingDistortion(p=1.0, min_percentile_threshold=0, max_percentile_threshold=40),
        GainTransition(p=1.0, min_gain_in_db=-24.0,  max_gain_in_db=6.0, min_duration=0.2, max_duration=6.0,),
        Mp3Compression(p=1.0, min_bitrate=8, max_bitrate=128, backend="pydub",),
        LoudnessNormalization(p=1.0, min_lufs_in_db=-31, max_lufs_in_db=-13),
        PitchShift(p=1.0, min_semitones=-4, max_semitones=4),
        PolarityInversion(p=1.0, ),
        Resample(p=1.0, min_sample_rate=8000, max_sample_rate=44100),
        Reverse(p=1.0, ),
        RoomSimulator(p=1.0, ),
        SevenBandParametricEQ(p=1.0, min_gain_db=-12.0, max_gain_db=12.0,),
        Shift(p=1.0, min_fraction=-0.5, max_fraction=0.5, rollover=True, fade=False, fade_duration=0.01,),
        TanhDistortion(p=1.0, min_distortion=0.01, max_distortion=0.7),
        TimeMask(p=1.0, min_band_part=0.0, max_band_part=0.1, fade=False),
        TimeStretch(p=1.0, min_rate=0.8, max_rate=1.25, leave_length_unchanged=True, ),
    ]

    # Not available
    # AddShortNoises(p=1.0),
    # AirAbsorption()

    for f in full_list_to_check:
        name = f.__class__.__name__
        aug1 = Compose([
            f,
        ], p=1.0)

        # perf_counter is a monotonic, high-resolution clock suited to timing
        start_time = time.perf_counter()
        for i, wav in enumerate(data):
            try:
                audio1 = aug1(samples=wav, sample_rate=sample_rate)
            except Exception as e:
                print('Augmentation error: {}'.format(str(e)))
                continue
            # Note: with save_to_check=True the measured time includes file writes
            if save_to_check:
                audio1 = audio1.transpose()
                out_folder = CACHE_PATH + name + '/'
                os.makedirs(out_folder, exist_ok=True)
                save_path = out_folder + os.path.basename(wav_paths[i])
                sf.write(save_path, audio1, samplerate=sample_rate, subtype='float')

        delta = time.perf_counter() - start_time
        # Guard against division by zero when no files were loaded
        print('Aug: {} Time: {:.2f} sec Per sample: {:.6f} sec'.format(name, delta, delta / max(len(data), 1)))


if __name__ == '__main__':
    path_to_wav_files = INPUT_PATH + 'train_wav/'
    maximum_files = 100
    sample_rate = 44100
    save_to_check = False

    check_audiomentations_speed(
        path_to_wav_files,
        maximum_files,
        sample_rate,
        save_to_check,
    )
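
The script above reports one average per transform. If a spread is also wanted, a per-call measurement could look roughly like the sketch below. It is not part of the original script: `time_transform`, its `repeats` parameter, and the stand-in `reverse` transform are hypothetical names used only for illustration, and it assumes the transform accepts `samples` and `sample_rate` keyword arguments as in the loop above.

```python
import statistics
import time


def time_transform(transform, samples_list, sample_rate, repeats=3):
    """Return (mean, std) of per-call wall-clock time in seconds.

    `transform` is any callable taking `samples` and `sample_rate`
    keyword arguments. `repeats` is a hypothetical parameter, not
    something the original script has.
    """
    durations = []
    for _ in range(repeats):
        for samples in samples_list:
            start = time.perf_counter()
            transform(samples=samples, sample_rate=sample_rate)
            durations.append(time.perf_counter() - start)
    mean = statistics.mean(durations)
    # Standard deviation needs at least two measurements
    std = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, std


def reverse(samples, sample_rate):
    """Stand-in transform so the sketch runs without audio files."""
    return samples[::-1]


if __name__ == "__main__":
    # One second of silence at 44.1 kHz, five times, as plain Python lists
    fake_audio = [[0.0] * 44100 for _ in range(5)]
    mean, std = time_transform(reverse, fake_audio, sample_rate=44100)
    print("reverse  {:.6f} s (std: {:.6f} s)".format(mean, std))
```

Timing each call separately keeps file I/O and error handling out of the measurement, at the cost of a little overhead per call.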

iver56 (Owner) commented May 16, 2022

Measuring execution times is definitely relevant :) There's also code for doing that in the demo script, which will output something similar:

AddBackgroundNoiseRelative       0.056 s (std: 0.064 s)
AddBackgroundNoiseAbsolute       0.054 s (std: 0.063 s)
AddBackgroundNoiseWithTransform  0.055 s (std: 0.064 s)
AddGaussianNoise                 0.011 s (std: 0.000 s)
AddGaussianSNR                   0.012 s (std: 0.000 s)
ApplyImpulseResponseWithTail     0.030 s
ApplyImpulseResponseLeaveLengthUnchanged 0.029 s
AddShortNoisesAbsolute           0.019 s (std: 0.009 s)
AddShortNoisesRelative           0.018 s (std: 0.012 s)
AddShortNoisesWithSignalGain     0.041 s (std: 0.018 s)
AddShortNoisesWithNoiseTransform 4.793 s (std: 2.357 s)
BandPassFilter                   0.006 s (std: 0.001 s)
BandStopFilter                   0.006 s (std: 0.001 s)
ClippingDistortion               0.007 s (std: 0.000 s)
FrequencyMask                    0.008 s (std: 0.000 s)
Gain                             0.001 s (std: 0.000 s)
GainTransition                   0.004 s (std: 0.002 s)
HighPassFilter                   0.005 s (std: 0.000 s)
HighShelfFilter                  0.004 s (std: 0.000 s)
LowPassFilter                    0.005 s (std: 0.001 s)
LowShelfFilter                   0.004 s (std: 0.000 s)
PitchShift                       0.475 s (std: 0.052 s)
LoudnessNormalization            0.018 s (std: 0.002 s)
Mp3CompressionLameenc            3.802 s (std: 0.447 s)
Mp3CompressionPydub              4.390 s (std: 0.408 s)
Normalize                        0.001 s
PaddingSilenceEnd                0.001 s (std: 0.000 s)
PaddingWrapEnd                   0.001 s (std: 0.000 s)
PaddingReflectEnd                0.001 s (std: 0.000 s)
PaddingSilenceStart              0.001 s (std: 0.000 s)
PaddingWrapStart                 0.001 s (std: 0.000 s)
PeakingFilter                    0.006 s (std: 0.000 s)
PolarityInversion                0.001 s
Resample                         0.376 s (std: 0.041 s)
Reverse                          0.000 s
RoomSimulator                    0.392 s (std: 0.143 s)
SevenBandParametricEQ            0.033 s (std: 0.002 s)
ShiftWithoutFade                 0.001 s (std: 0.000 s)
ShiftWithShortFade               0.001 s (std: 0.000 s)
ShiftWithoutRolloverWithLongFade 0.001 s (std: 0.000 s)
TanhDistortion                   0.012 s (std: 0.002 s)
TimeMask                         0.001 s (std: 0.000 s)
TimeStretch                      0.218 s (std: 0.023 s)
Trim                             0.006 s
BigCompose                       0.314 s (std: 0.316 s)
AirAbsorption                    0.049 s (std: 0.003 s)

I think if we make a plot (with a logarithmic execution-time axis), it can be included in the README so people can get an idea of how quick or slow the transforms are.
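
One possible way to produce such a plot, as a rough sketch: parse the "name, seconds" lines and draw a horizontal bar chart with a log-scaled time axis. The function names, the regular expression, and the output file name `execution_times.png` are assumptions made for illustration and are not taken from the demo script. matplotlib is assumed to be installed for the plotting step. The sample text is a handful of lines copied from the output above.

```python
import re

# A few lines copied from the demo-script output above
SAMPLE_OUTPUT = """\
AddGaussianNoise                 0.011 s (std: 0.000 s)
PitchShift                       0.475 s (std: 0.052 s)
Mp3CompressionPydub              4.390 s (std: 0.408 s)
Normalize                        0.001 s
TimeStretch                      0.218 s (std: 0.023 s)
"""

# Transform name, whitespace, then the mean time in seconds
LINE_PATTERN = re.compile(r"^(\S+)\s+(\d+\.\d+) s")


def parse_timings(text):
    """Map transform name -> mean execution time in seconds."""
    timings = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            timings[match.group(1)] = float(match.group(2))
    return timings


def plot_timings(timings, out_path="execution_times.png"):
    """Horizontal bar chart with a logarithmic time axis (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    # Sorted ascending, so the slowest transform ends up at the top
    names = sorted(timings, key=timings.get)
    # Clamp to 0.1 ms so entries reported as 0.000 s stay visible on a log axis
    values = [max(timings[name], 1e-4) for name in names]

    fig, ax = plt.subplots(figsize=(8, 0.3 * len(names) + 1))
    ax.barh(names, values)
    ax.set_xscale("log")
    ax.set_xlabel("Mean execution time (s)")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)


if __name__ == "__main__":
    plot_timings(parse_timings(SAMPLE_OUTPUT))
```

A log axis is useful here because the measured times span roughly four orders of magnitude, from about 1 ms to several seconds.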

ZFTurbo (Author) commented May 16, 2022

Yes, I think this info is very useful. Maybe it could be a separate page with the results, linked from the main page.

I also propose moving the changelog from the main page to a separate file such as "Changes.md", with a link to it.

I noticed that PitchShift and TimeStretch are very useful but very slow... We need to think about how to speed them up.

iver56 (Owner) commented May 16, 2022

According to Spijkervet, the pitch shifting implementation in WavAugment is fast

https://twitter.com/JanneSpijkervet/status/1292411014584180736

There's also a pitch shift transform in https://github.com/asteroid-team/torch-audiomentations but that isn't very fast

Then there's https://github.com/maxrmorrison/clpcnet which is good, but only works for speech and only with a 16 kHz sample rate

iver56 added the "documentation" label on Sep 14, 2022
iver56 added the "good first issue" label on Oct 5, 2022