Commit

work on paper more
shuklabhay committed Aug 20, 2024
1 parent d9375fa commit b5c5fbc
Showing 1 changed file with 21 additions and 7 deletions.
28 changes: 21 additions & 7 deletions paper/main.md
@@ -10,22 +10,36 @@ Continuation of UCLA COSMOS 2024 Research

Since their introduction, CNN-based Generative Adversarial Networks (DCGANs) have vastly increased the capabilities of machine learning models, allowing high-fidelity synthetic image generation [1], but requiring additional optimizations for audio generation [2]. To generate high-quality audio, models must capture temporal relationships and spectral characteristics and replicate them without the inconsistencies that could go unnoticed in an image but are apparent in audio. Accounting for these complexities requires additional modifications and straying from the pure DCGAN architecture. This work attempts to recognize the limitations of audio representation generation using **only** Deep Convolution in a Generative Network.

This project uses kick drums as the sound to generate since they best fit the criteria of complexity, tonality, length, and temporal patterns. Kick drums are also an integral part of digital audio production and the foundational element of almost every song and drumset. Due to their importance, finding a large quantitity of high quality, unique kick drum samples is often a problem in the digital audio production enviroment. The characteristics of the audio that we are looking to replicate are the following:

- A 500 milisecond long audio sample
- An atonal transient “click” at the beginning of the generated audio incorporating most of the frequency spectrum
- A sustained, decaying low "rumble" following the transient of the sample
This project uses kick drums as the sound to generate since they best fit the criteria of complexity, tonality, length, and temporal patterns. Kick drums are also an integral part of digital audio production and the foundational element of almost every song and drum set. Due to their importance, finding a large quantity of high-quality, unique kick drum samples is often a problem in the digital audio production environment.

This investigation specifically seeks to determine how feasible it is to use a DCGAN architecture to recognize and replicate the spatial and temporal patterns of an image representation of a kick drum. We will also experiment with pure sine wave validation at a single frequency.

## Methodology

## Data
### Data Collection and Processing

Training data is first sourced from digital production “sample packs” compiled by various parties. These packs contain varying numbers of analog, cinematic, heavy, and EDM kick drum samples, providing a holistic yet random selection of kick drums that loosely share the same characteristics.

The goal of this model is to replicate the following characteristics of a kick drum:

- A specific length audio sample
- An atonal transient “click” at the beginning of the generated audio incorporating most of the frequency spectrum
- A sustained, decaying low "rumble" following the transient of the sample
- The overall "decaying" nature of a kick drum

This work uses 7856 data points split into batches of eight. Each audio sample is normalized to a length of 500 milliseconds and then

Sooo
old:
Every audio clip is normalized to 500 ms and passed into a Short-time Fourier Transform, returning amplitudes for frequency bins at every frame of audio for each channel. To amplify audio features, the amplitude data is then passed through a noise floor, decibel scaled, and rescaled to be between -1 and 1.
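
The preprocessing steps described above could be sketched roughly as follows. This is an illustrative outline only, not the project's actual code: the sample rate, FFT size, hop length, and the -80 dB noise floor are assumed values that the text does not specify, and the function names are hypothetical. The sketch also assumes the noise floor is applied as a clamp in the decibel domain, which is one possible reading of "passed through a noise floor".

```python
import numpy as np

# All constants except the 500 ms clip length are assumptions for illustration.
SAMPLE_RATE = 44100     # assumed sample rate in Hz
CLIP_SECONDS = 0.5      # 500 ms, as stated in the text
N_FFT = 512             # assumed STFT window size
HOP = 128               # assumed hop length between frames
NOISE_FLOOR_DB = -80.0  # assumed noise floor


def fit_length(audio, n_samples):
    """Trim or zero-pad the last axis to exactly n_samples."""
    audio = audio[..., :n_samples]
    pad = n_samples - audio.shape[-1]
    return np.pad(audio, [(0, 0)] * (audio.ndim - 1) + [(0, pad)])


def stft_magnitude(audio, n_fft=N_FFT, hop=HOP):
    """Hann-windowed magnitude STFT with shape (channels, frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (audio.shape[-1] - n_fft) // hop
    # Index matrix of shape (frames, n_fft) selecting each overlapping frame.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = audio[..., idx] * window
    return np.abs(np.fft.rfft(frames, axis=-1))


def preprocess(audio):
    """Map raw audio (samples,) or (channels, samples) to values in [-1, 1]."""
    audio = np.atleast_2d(np.asarray(audio, dtype=float))
    n_samples = int(SAMPLE_RATE * CLIP_SECONDS)
    mag = stft_magnitude(fit_length(audio, n_samples))
    peak = mag.max()
    ref = peak if peak > 0 else 1.0
    # Decibel scale relative to the clip's peak, so the loudest bin is 0 dB.
    db = 20.0 * np.log10(np.maximum(mag, 1e-10) / ref)
    # Clamp everything quieter than the noise floor.
    db = np.maximum(db, NOISE_FLOOR_DB)
    # Linearly rescale [NOISE_FLOOR_DB, 0] to [-1, 1].
    return 2.0 * (db - NOISE_FLOOR_DB) / (-NOISE_FLOOR_DB) - 1.0
```

With these assumed settings, a mono clip yields an array of shape `(1, 169, 257)`: one channel, 169 frames, and 257 frequency bins. A range of -1 to 1 matches the output of a `tanh` activation, which is a common choice for a generator's final layer, though the text does not state which activation this model uses.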

### Model Architecture

## Results

### Kick Drum Generation

### Sine Validation

## Discussion

## Conclusion