Optimized Background Noise Augmentation for Large Background Files #360

Open

PratikKulkar wants to merge 5 commits into main
Conversation

PratikKulkar

Proposed Algorithm:

1. Get the duration of the background file (bg_file) in seconds.
2. Sample a random value from the range [0, bg_file_seconds - event_file_seconds).
3. Read the background file from sampled_value to sampled_value + event_file_seconds.
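
A minimal sketch of that idea, assuming soundfile is used for I/O (the function and variable names below are illustrative, not the actual implementation):

```python
import random

import soundfile as sf


def read_random_background_slice(bg_path, event_num_samples):
    """Illustrative sketch: read only the slice of the background file that the augmentation needs."""
    info = sf.info(bg_path)  # reads just the header, not the audio data
    # Assumes the background file is at least as long as the event
    max_start = info.frames - event_num_samples
    start = random.randint(0, max_start)  # random offset within the valid range
    # Load only event_num_samples frames starting at the random offset
    background, sample_rate = sf.read(
        bg_path, start=start, frames=event_num_samples, dtype="float32"
    )
    return background, sample_rate
```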

This approach ensures that:

  • Only the portion of the background file required for the augmentation is loaded.
  • Randomness in background selection is maintained while memory overhead is reduced.
  • It adapts to varied sample rates and event/background file durations.

Experiments and Results:

I’ve tested this algorithm using:

  • Event durations ranging from 1 to 9 seconds.
  • Background durations ranging from 81 to 10,000 seconds.
  • Sample rates: 16,000 Hz, 22,500 Hz, and 44,100 Hz.

This optimized approach significantly reduces memory usage while maintaining augmentation quality. I’ve attached the comparison plot showcasing the performance difference for your reference.

  • Improves scalability by avoiding unnecessary memory consumption for large files.
  • Enhances performance in real-time audio augmentation workflows.
  • Can be integrated as a feature or an option in AddBackgroundNoise to provide more flexibility to users.

Please let me know your thoughts on this proposal and if any further details or clarifications are needed.

In the figure below, the first plot shows the difference in memory usage over the test cases (normalized by 1e6), and the second plot compares the time taken by the old approach and the proposed one.

[Image: Labeled_final]

iver56 (Owner) commented Oct 14, 2024

Thanks for the PR. I will have a closer look when I have time

iver56 (Owner) commented Jan 15, 2025

In this case I would prefer lazy caching over eager caching. The difference becomes quite noticeable when there is a large number of files. Hypothetically, if you have half a million files, and it takes 1 ms to check the duration of each file, initializing the class would take 500 seconds. On the other hand, with lazy caching, initializing the class would be almost instant.
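
For illustration only, a lazy cache along these lines could look roughly like this (a sketch assuming soundfile is used to read the header; the class and method names are made up):

```python
import soundfile as sf


class LazyDurationCache:
    """Illustrative sketch: cache file durations on first use instead of probing every file up front."""

    def __init__(self, sound_file_paths):
        self.sound_file_paths = sound_file_paths  # no I/O happens at init time
        self._durations = {}  # path -> duration in seconds, filled on demand

    def get_duration(self, path):
        duration = self._durations.get(path)
        if duration is None:
            # First access: read the file header once and remember the result
            duration = sf.info(path).duration
            self._durations[path] = duration
        return duration
```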

PratikKulkar (Author)

Hello @iver56,

Thank you for your valuable feedback. I have implemented the suggested changes and replaced eager caching with lazy caching. The system now caches file-related time information on demand, significantly improving the initialization speed for large datasets.

I do have a question regarding the lookup mechanism for file time information. Currently, I am using a dictionary for this purpose, but its average-case time complexity for lookups is not guaranteed to be constant. I am exploring an alternative approach using an array of size len(sound_file_paths). With this method:

  • Each file would be assigned an index (e.g., from 0 to len(sound_file_paths) - 1).
  • File paths and corresponding time information could then be accessed directly using the index, enabling efficient retrieval.

Additionally, I was wondering if there’s any provision in the current system to prioritize sampling certain files more frequently than others—for instance, based on importance, weight, or any custom-defined priority. If such functionality does not currently exist, is there a plan to introduce it in the future?

Thank you for your time and guidance. I appreciate your input and look forward to your feedback!

Best regards,
Pratik Kulkar

iver56 (Owner) commented Jan 16, 2025

Thanks for implementing that change

> I do have a question regarding the lookup mechanism for file time information. Currently, I am using a dictionary for this purpose, but its average-case time complexity for lookups is not guaranteed to be constant. I am exploring an alternative approach using an array of size len(sound_file_paths). With this method:
>
>   • Each file would be assigned an index (e.g., from 0 to len(sound_file_paths) - 1).
>   • File paths and corresponding time information could then be accessed directly using the index, enabling efficient retrieval.

  • dict lookups are O(1) on average for both string and integer keys.
  • Having integers as keys is faster than having strings as keys, due to faster hashing and comparison. And it uses less memory.
  • Accessing a value in an array/list is also O(1), but in practice it is faster than a dict lookup.
  • A NumPy array requires less memory than a Python-native list of floats.

Here's a rough comparison of the memory usage in the three different alternatives, given that there are half a million items:

  • List of floats: ~16 MB
  • Dictionary (int keys, float values): ~47 MB
  • NumPy array (float32): ~2 MB
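
A rough way to reproduce estimates of this kind (a sketch; sys.getsizeof counts a container's own footprint but not its elements, so the elements are summed separately, and interning of small ints is ignored):

```python
# Illustrative sketch for rough memory estimates, not a precise measurement
import sys

import numpy as np

n = 500_000
durations = [float(i) * 0.1 for i in range(n)]

# List of floats: pointer array plus one float object per element
list_bytes = sys.getsizeof(durations) + sum(sys.getsizeof(x) for x in durations)

# Dict with int keys and float values: hash table plus the key and value objects
d = {i: durations[i] for i in range(n)}
dict_bytes = sys.getsizeof(d) + sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in d.items())

# NumPy float32 array: 4 bytes per element plus a small fixed header
arr = np.asarray(durations, dtype=np.float32)
numpy_bytes = arr.nbytes

print(f"list:  ~{list_bytes / 1e6:.0f} MB")
print(f"dict:  ~{dict_bytes / 1e6:.0f} MB")
print(f"numpy: ~{numpy_bytes / 1e6:.0f} MB")
```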

If you feel like optimizing it with your array idea, here's my green light: 🟢
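
For illustration, combining lazy caching with the array idea could look roughly like this (a sketch with assumed class and method names; NaN marks durations that have not been read yet):

```python
import numpy as np
import soundfile as sf


class IndexedDurationCache:
    """Illustrative sketch: lazily cache durations in a float32 array indexed by file position."""

    def __init__(self, sound_file_paths):
        self.sound_file_paths = list(sound_file_paths)
        # One float32 slot per file; NaN means the duration has not been read yet
        self._durations = np.full(len(self.sound_file_paths), np.nan, dtype=np.float32)

    def get_duration(self, index):
        if np.isnan(self._durations[index]):
            # Read this one file's header lazily and cache its duration
            self._durations[index] = sf.info(self.sound_file_paths[index]).duration
        return float(self._durations[index])
```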

iver56 (Owner) commented Jan 16, 2025

> Additionally, I was wondering if there’s any provision in the current system to prioritize sampling certain files more frequently than others—for instance, based on importance, weight, or any custom-defined priority. If such functionality does not currently exist, is there a plan to introduce it in the future?

I don't have any immediate plans for adding that feature, but you're welcome to add an issue for it

PratikKulkar (Author)

Hello @iver56,

Thank you for your detailed response and insights into the performance and memory usage of different data structures. The comparison between list, dictionary, and NumPy array was particularly helpful.

Based on your feedback:

  • I have implemented the idea of saving time information as discussed, with a focus on efficient storage and retrieval.
  • I have also fixed a logical bug in the previous commit titled "Avoiding Preloading" to ensure the updated implementation aligns with the lazy caching approach.

Regarding the feature to prioritize file sampling based on weights or importance, I will create an issue for it in the repository to track any discussions or future plans around it.

Thank you once again for your guidance and support. I look forward to your feedback on the latest changes.

Best regards,
Pratik Kulkar
