Torch and Torchaudio makes Docker Image size very big #380

h3110Fr13nd · 2024-08-23T13:08:23Z

The image size is currently around 6 GB, which is too large given that we don't run inference on any ML model locally.

Most of that size comes from torch and torchaudio, which pull in 4-5 GB of NVIDIA packages. They can be replaced with lightweight libraries such as pydub or wave.

Remove Torch and Torchaudio dependency
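
As a rough illustration of the replacement idea, here is a minimal sketch of PCM/WAV conversion using only the standard-library `wave` module. The function names mirror the ones that appear in the test output later in this thread, but the signatures and defaults (8 kHz, mono, 16-bit) are assumptions, not the project's actual code.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 8000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM frames in a WAV container (assumed defaults: 8 kHz, mono, 16-bit)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample, 2 = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # wave does not close a file object it did not open, so the buffer is still readable.
    return buffer.getvalue()


def wav_bytes_to_pcm(wav_bytes: bytes) -> bytes:
    """Strip the WAV header and return the raw PCM frames."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.readframes(wav_file.getnframes())


# Round trip on 100 ms of 16-bit silence at 8 kHz (800 samples, 2 bytes each).
silence = b"\x00\x00" * 800
wav_data = pcm_to_wav_bytes(silence)
restored = wav_bytes_to_pcm(wav_data)
```

Neither function needs torch or torchaudio, so this part of the audio handling adds nothing to the image size.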

h3110Fr13nd · 2024-08-23T13:09:11Z

Already halfway there. Can you assign this to me?

marmikcfc · 2024-08-24T18:40:12Z

Hey @h3110Fr13nd, can you make sure that switching to pydub doesn't affect call latency? I remember pydub degrading call quality a bit because its operations are time-consuming.

h3110Fr13nd · 2024-08-25T02:18:50Z

> Hey @h3110Fr13nd, can you make sure that switching to pydub doesn't affect call latency? I remember pydub degrading call quality a bit because its operations are time-consuming.

Sure, I'll check and report back.

h3110Fr13nd · 2024-08-25T20:38:30Z

Your concern was correct.

I didn't notice any difference while taking calls, but the test cases I wrote to compare the old and new functions showed similar or better execution times, with one exception: resampling with pydub was about 12x slower than with torchaudio (0.0987 s vs. 0.0083 s).

I tried several libraries for resampling, including torchaudio, soxr, pydub, scipy, numpy, soundfile, and librosa.
Plain numpy may be faster in some cases, but it only does linear interpolation, so the resampled audio is lower quality. soxr was the best option: it resampled fastest, produced high-quality output, and is a lightweight library.
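
To make the trade-off concrete, here is a minimal sketch of the two approaches, assuming mono 16-bit PCM held in a numpy array. `resample_linear` and `resample` are hypothetical helper names, not functions from the project. The linear version has no anti-aliasing filter, so when downsampling to 8 kHz any content above 4 kHz folds back into the audible band. The wrapper uses `soxr.resample` when the library is installed and falls back to linear interpolation otherwise.

```python
import numpy as np

try:
    import soxr  # optional dependency; the resampler preferred in this thread
except ImportError:
    soxr = None


def resample_linear(pcm: np.ndarray, in_rate: int, out_rate: int) -> np.ndarray:
    """Resample mono int16 PCM by linear interpolation (fast, no anti-aliasing)."""
    if len(pcm) == 0:
        return pcm.copy()
    n_out = int(round(len(pcm) * out_rate / in_rate))
    # Output sample times expressed as positions on the input sample grid.
    positions = np.arange(n_out) * (in_rate / out_rate)
    resampled = np.interp(positions, np.arange(len(pcm)), pcm.astype(np.float64))
    return np.round(resampled).astype(np.int16)


def resample(pcm: np.ndarray, in_rate: int, out_rate: int) -> np.ndarray:
    """Hypothetical wrapper: prefer soxr if available, else linear interpolation."""
    if soxr is not None:
        return soxr.resample(pcm, in_rate, out_rate)
    return resample_linear(pcm, in_rate, out_rate)


# One second of a 440 Hz tone at 24 kHz, downsampled to 8 kHz.
tone = (np.sin(2 * np.pi * 440 * np.arange(24000) / 24000) * 10000).astype(np.int16)
tone_8k = resample_linear(tone, 24000, 8000)
```

A 440 Hz tone is well below the 4 kHz limit, so both paths should sound the same here; the difference shows up on wideband input such as speech with sibilants.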

python test.py

Old pcm_to_wav_bytes function took 1.1920928955078125e-06 seconds
New pcm_to_wav_bytes function took 4.76837158203125e-07 seconds
.

Resampling from 24000 to 8000
Torchaudio resample function took 0.00834965705871582 seconds
pydub audiosegment resample function took 0.0986635684967041 seconds
Soxr resample function took 0.00360107421875 seconds
Numpy resample function took 0.004979848861694336 seconds
Scipy resample function took 0.0089263916015625 seconds
.

Old wav_bytes_to_pcm function took 5.340576171875e-05 seconds
New wav_bytes_to_pcm function took 4.2438507080078125e-05 seconds
.

----------------------------------------------------------------------
Ran 4 tests in 0.417s

OK
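
Single-call timings in the microsecond range, like the pcm_to_wav_bytes figures above, sit close to the timer's resolution and can be noisy. A minimal sketch of a steadier comparison, using the standard-library `timeit` module, is shown here. `best_time` and the two stand-in workloads are hypothetical and only illustrate the harness; they are not the functions that were benchmarked.

```python
import timeit


def best_time(func, *args, number=1000, repeat=5):
    """Return the best per-call time in seconds over `repeat` runs of `number` calls."""
    timer = timeit.Timer(lambda: func(*args))
    # Taking the minimum reduces the influence of scheduler and cache noise.
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Hypothetical stand-ins for an "old" and a "new" implementation.
def old_impl(data):
    return bytes(bytearray(data))


def new_impl(data):
    return bytes(data)


payload = b"\x00\x01" * 4000
old_seconds = best_time(old_impl, payload)
new_seconds = best_time(new_impl, payload)
print(f"old: {old_seconds:.3e} s per call, new: {new_seconds:.3e} s per call")
```

Averaging over many calls and keeping the best run gives figures that are easier to compare across libraries than one wall-clock measurement per function.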

I've committed the changes to use soxr for resampling. The results confirm that replacing torchaudio adds no latency.

h3110Fr13nd · 2024-08-27T09:54:09Z

@marmikcfc @prateeksachan Please review and share any suggestions.

prateeksachan assigned h3110Fr13nd Aug 23, 2024

This was referenced Aug 23, 2024

Using Torch Audio instead of Scipy #370

Open

feat: Replace torchaudio with pydub #381

Open

h3110Fr13nd linked a pull request Sep 1, 2024 that will close this issue

Install bolna from repo #388

Open
