more work

shuklabhay · Aug 19, 2024 · 2661e10
1 parent 3490335

Showing 2 changed files with 15 additions and 11 deletions.
diff --git a/paper/main.md b/paper/main.md
@@ -1,21 +1,29 @@
-# Kick it Out: Audio Generation With a Deep Convolution Generative Network
+# Kick it Out: Generating Kick Drum Samples With a Deep Convolution Generative Network
 
 Abhay Shukla\
 [email protected]\
 Continuation of UCLA COSMOS 2024 Research
 
 ## Abstract
 
-Generative adversarial networks have been used to much sucess for generating images
-
 ## Introduction
 
+[what are kick drums]
+
+Generative Adversarial Networks (GANs) have changed the landscape of machine learning, reaching new bounds in image generation [cite] and, more recently, in natural language and audio generation [cite]. These audio generative models often employ Deep Convolutional GANs (DCGANs) to create spectrogram representations of audio.
+
+This work aims specifically to generate kick drums using a similar DCGAN approach.
+
+This investigation tries to determine whether DCGANs can learn to recognize and replicate the spatial patterns and nonspatial distributions of a kick drum.
+
 ## Background
 
 ## Methodology
 
 ## Results
 
+compare w/ wavegan?? ig
+
 ## Discussion
 
 ## Conclusion
@@ -28,17 +36,13 @@ https://arxiv.org/abs/1511.06434
 similar result to me
 https://openaccess.thecvf.com/content_CVPR_2020/papers/Durall_Watch_Your_Up-Convolution_CNN_Based_Generative_Deep_Neural_Networks_Are_CVPR_2020_paper.pdf
 
-also talk abt like wavenet as other ideas for models
-
-i have to be doing something wrong. it has to be doable. quickly just check it all make sure theres noooothing more i can do bc im sure its possible just limitations here idk what else i can do to improve model or wtv. there has to be some way to improve at least get better, allthe changes i made should be making it better bruh.
-
-- go thru code and like clean up vars/make naming consistent (moreso helpers)
+- TRY USING WASSERSTEIN LOSS SEE IF THERES ANY IMPROVEMENT IF THERE IS THEN HELL YEAH NEW NOVEL KICK DRUM GEN YAY!
 - see if theres anything else i know that can be improved/possible source of error (prob not, but there has to be something it should be better w/ changes i made not "worse" its back to noise)
 - at most spend today doing this but thats it. paper has to happen now.
 
 STRUCTURE OF A PAPER (claude generated)
 
-1.
+1. title: done
 2. Abstract: A brief summary of your paper, including the problem, methods, key results, and conclusions.
 3. Introduction: Present the research problem, its importance, and your objectives.
 4. Background/Literature Review: Provide context on deep convolution and its applications in audio generation. Review relevant previous work.

diff --git a/src/utils/helpers.py b/src/utils/helpers.py
@@ -17,12 +17,12 @@
 AUDIO_SAMPLE_LENGTH = 0.5  # 500 ms
 GLOBAL_SR = 44100
 N_CHANNELS = 2  # Left, right
-N_FRAMES = 352
+N_FRAMES = 176
 N_FREQ_BINS = 257
 
 # Initialize STFT Object
 GLOBAL_WIN = 2**9
-GLOBAL_HOP = 2**6
+GLOBAL_HOP = 2**7
 win = scipy.signal.windows.kaiser(GLOBAL_WIN, beta=14)
 STFT = scipy.signal.ShortTimeFFT(
     win=win, hop=GLOBAL_HOP, fs=GLOBAL_SR, scale_to="magnitude"