Commit 0db5702 (parent d94763b): 22 changed files with 130 additions and 77 deletions.
@@ -0,0 +1,45 @@

# StereoSampleGAN: A Computationally Inexpensive Approach to High-Fidelity Stereo Audio Generation

Abhay Shukla\
[email protected]\
Continuation of UCLA COSMOS 2024 Research

## 1. Abstract

Existing convolutional approaches to audio generation are often limited to producing low-fidelity, single-channel (monophonic) audio, while demanding significant computational resources for both training and inference. To address these challenges, this work introduces StereoSampleGAN, a novel audio generation architecture that combines a Deep Convolutional Wasserstein GAN (WGAN), attention mechanisms, and loss optimization techniques. StereoSampleGAN enables high-fidelity stereo generation of audio samples while remaining computationally efficient. Trained on three distinct sample datasets with varying spectral overlap (two of kick drums and one of tonal one-shots), StereoSampleGAN demonstrates promising results in generating high-quality, simple stereo sounds. While it successfully learns to generate the "shape" of the required audio, it displays notable limitations in achieving the correct "tone", in some cases even generating incoherent noise. These results point to concrete limitations of this approach to audio generation and to areas for improvement.
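The abstract names a Wasserstein GAN as the core of the architecture. As an illustration only (the paper does not yet specify its loss details), the sketch below shows the standard WGAN objectives computed from critic scores, plus the weight clipping the original WGAN used to keep the critic approximately 1-Lipschitz. Whether StereoSampleGAN uses weight clipping, a gradient penalty, or some other constraint is left open here, and all function names are hypothetical.

```python
from statistics import mean
from typing import List


def critic_loss(real_scores: List[float], fake_scores: List[float]) -> float:
    """WGAN critic loss: the critic maximises E[D(real)] - E[D(fake)],
    so we minimise the negation, E[D(fake)] - E[D(real)]."""
    return mean(fake_scores) - mean(real_scores)


def generator_loss(fake_scores: List[float]) -> float:
    """WGAN generator loss: raise the critic's score on generated
    samples, i.e. minimise -E[D(fake)]."""
    return -mean(fake_scores)


def clip_weights(weights: List[float], c: float = 0.01) -> List[float]:
    """Weight clipping from the original WGAN: clamp each critic weight
    into [-c, c] after every critic update (c = 0.01 in that paper)."""
    return [max(-c, min(c, w)) for w in weights]


if __name__ == "__main__":
    real = [1.0, 3.0]  # hypothetical critic scores on real clips
    fake = [0.5, 1.5]  # hypothetical critic scores on generated clips
    print(critic_loss(real, fake))  # -1.0
    print(generator_loss(fake))     # -1.0
```

The critic outputs unbounded scores rather than probabilities, which is why both losses are plain means instead of log-likelihood terms.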
## 2. Introduction

## 3. Data Manipulation

### 3.1. Datasets

This paper uses three distinct datasets designed to measure the model's resilience to variation in spectral content.

1. Curated Kick Drum Set: Kick drum impulses with primarily short decay profiles.

2. Diverse Kick Drum Set: Kick drum impulses with greater variation in decay profile and overall harmonic content.

3. Instrument One Shot Set: Single-note impulses capturing the tonal qualities and spectral characteristics of various synthesizer and instrument sounds.

Together, these datasets provide a robust framework for measuring how the model responds to progressively greater variation in its training data. Most audio is sourced from online "sample packs" for digital audio production, which compile sounds for a wide variety of genres and use cases.
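The three datasets are intended to differ in spectral content. One common way to quantify that (an assumption; the paper does not say how, or whether, it measures spectral variation) is the spectral centroid, the magnitude-weighted mean frequency of a clip. The sketch below computes it for one channel with a naive DFT in pure Python, which is only practical for short excerpts; a real pipeline would use an FFT library. Function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(signal: Sequence[float]) -> List[float]:
    """One-sided magnitude spectrum (bins 0..n/2) via a naive O(n^2) DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(signal: Sequence[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency in Hz; 0.0 for a silent clip."""
    mags = magnitude_spectrum(signal)
    n = len(signal)
    total = sum(mags)
    if total == 0:
        return 0.0
    # Bin k corresponds to frequency k * sample_rate / n.
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


if __name__ == "__main__":
    sr, n = 8000, 256
    # Two equal-amplitude partials at DFT bins 10 and 30 (312.5 Hz and 937.5 Hz).
    tone = [
        math.sin(2 * math.pi * 10 * t / n) + math.sin(2 * math.pi * 30 * t / n)
        for t in range(n)
    ]
    print(round(spectral_centroid(tone, sr), 3))  # 625.0
```

For stereo clips, one option is to compute the centroid per channel; the spread of centroids within each dataset would then give a rough, hedged measure of how varied its spectral content is.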
### 3.2. Feature Extraction and Encoding

## 4. Model Implementation

### 4.1. Architecture

### 4.2. Training

## 5. Results and Discussion

### 5.1. Evaluation

The model generates audio in a high-fidelity format (44.1 kHz), but the audio it generates is not of high quality; this distinction is important. The model captures the "shape" of a sound but not its "tone": the fundamental is completely missing. This is consistent with the limitations of the Fourier transform (FT) representation and with the model being trained to reproduce the shape of an image (a spectrogram) rather than audio itself.
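The observation that the fundamental is "completely missing" could be checked numerically. A minimal sketch, assuming a mono excerpt and a hypothetical cutoff placed just above the expected fundamental, measures what share of the clip's non-DC spectral power lies at or below that cutoff. A generated clip whose ratio is far lower than that of comparable training clips would be consistent with a missing fundamental. This is not the paper's evaluation method; the names and the cutoff are assumptions, and the naive DFT is only suitable for short excerpts.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(signal: Sequence[float]) -> List[float]:
    """One-sided power spectrum |X[k]|^2 for bins 0..n/2 via a naive O(n^2) DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))) ** 2
        for k in range(n // 2 + 1)
    ]


def low_band_power_ratio(signal: Sequence[float], sample_rate: float, cutoff_hz: float) -> float:
    """Share of non-DC spectral power in bins whose frequency is <= cutoff_hz."""
    n = len(signal)
    power = power_spectrum(signal)
    total = sum(power[1:])  # ignore the DC offset (bin 0)
    if total == 0:
        return 0.0
    low = sum(p for k, p in enumerate(power) if k > 0 and k * sample_rate / n <= cutoff_hz)
    return low / total


if __name__ == "__main__":
    sr, n = 8000, 256
    # Hypothetical clip: a partial at bin 10 (312.5 Hz) plus one at bin 30 (937.5 Hz).
    clip = [math.sin(2 * math.pi * 10 * t / n) + math.sin(2 * math.pi * 30 * t / n) for t in range(n)]
    # The same clip with the low partial removed, mimicking a missing fundamental.
    no_fundamental = [math.sin(2 * math.pi * 30 * t / n) for t in range(n)]
    print(round(low_band_power_ratio(clip, sr, 500.0), 3))            # 0.5
    print(round(low_band_power_ratio(no_fundamental, sr, 500.0), 3))  # 0.0
```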
### 5.2. Contributions

## 6. Conclusion

## 7. References
@@ -1,9 +1,7 @@

     from utils.generation_helpers import generate_audio
    -from usage_params import (
    -    model_to_generate_with,
    -    training_sample_length,
    -)
    +from usage_params import UsageParams
     
     
     # Generate based on usage_params
    -generate_audio(model_to_generate_with, training_sample_length)
    +params = UsageParams()
    +generate_audio(params.model_to_generate_with, params.training_sample_length, True)
@@ -1,16 +1,22 @@

     # Main params
    -audio_generation_count = 2 # Audio examples to generate
    +class UsageParams:
    +    def __init__(self):
    +        self.audio_generation_count = 2 # Audio examples to generate
     
    -# Training params
    -training_sample_length = 1.5 # seconds
    -outputs_dir = "outputs" # Where to save your generated audio & model
    +        # Training params
    +        self.training_sample_length = 1.5 # seconds
    +        self.outputs_dir = "outputs" # Where to save your generated audio & model
     
    -model_save_name = "StereoSampleGAN-InstrumentOneShot" # What to name your model save
    -training_audio_dir = "data/one_shots" # Your training data path
    -compiled_data_path = "data/compiled_data.npy" # Your compiled data/output path
    -model_save_path = f"{outputs_dir}/{model_save_name}.pth"
    +        self.model_save_name = (
    +            "StereoSampleGAN-InstrumentOneShot" # What to name your model save
    +        )
    +        self.training_audio_dir = "data/one_shots" # Your training data path
    +        self.compiled_data_path = (
    +            "data/compiled_data.npy" # Your compiled data/output path
    +        )
    +        self.model_save_path = f"{self.outputs_dir}/{self.model_save_name}.pth"
     
    -# Generating audio
    -model_to_generate_with = model_save_path # Generation model path
    -generated_audio_name = "generated_audio" # Output file name
    -visualize_generated = True # Show generated audio spectrograms
    +        # Generating audio
    +        self.model_to_generate_with = self.model_save_path # Generation model path
    +        self.generated_audio_name = "generated_audio" # Output file name
    +        self.visualize_generated = True # Show generated audio spectrograms
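The refactored class computes `model_save_path` and `model_to_generate_with` once, inside `__init__`, so changing `outputs_dir` or `model_save_name` on an instance afterwards leaves those paths stale. A hedged alternative sketch (not part of this commit) uses a `dataclass` with a derived property so constructor overrides propagate. The `sample_rate` field and `samples_per_channel` property are hypothetical additions, assuming the 44.1 kHz rate mentioned in the evaluation notes; at 1.5 seconds that works out to 66,150 samples per channel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageParams:
    """Sketch of usage_params as a dataclass; defaults mirror the committed class."""

    audio_generation_count: int = 2  # Audio examples to generate
    training_sample_length: float = 1.5  # seconds
    outputs_dir: str = "outputs"  # Where to save generated audio & model
    model_save_name: str = "StereoSampleGAN-InstrumentOneShot"
    training_audio_dir: str = "data/one_shots"
    compiled_data_path: str = "data/compiled_data.npy"
    generated_audio_name: str = "generated_audio"
    visualize_generated: bool = True
    model_to_generate_with: Optional[str] = None  # None -> use model_save_path
    sample_rate: int = 44100  # Hypothetical field: assumes 44.1 kHz audio

    def __post_init__(self) -> None:
        # Resolve the generation model path after constructor overrides are applied.
        if self.model_to_generate_with is None:
            self.model_to_generate_with = self.model_save_path

    @property
    def model_save_path(self) -> str:
        # Derived on every access, so it tracks later edits to outputs_dir/model_save_name.
        return f"{self.outputs_dir}/{self.model_save_name}.pth"

    @property
    def samples_per_channel(self) -> int:
        # Hypothetical helper: clip length in samples for each stereo channel.
        return round(self.training_sample_length * self.sample_rate)


if __name__ == "__main__":
    params = UsageParams(outputs_dir="runs")
    print(params.model_to_generate_with)  # runs/StereoSampleGAN-InstrumentOneShot.pth
    print(params.samples_per_channel)     # 66150
```

Because the attribute names are unchanged, the generation script's `params.model_to_generate_with` and `params.training_sample_length` accesses would work with this version as well.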