
starting work
shuklabhay committed Aug 19, 2024
1 parent 1435a58 commit 3490335
Showing 3 changed files with 95 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -4,6 +4,7 @@ Implementing Deep Convolution to generate audio using a generative network

## Directories

- `paper`: Research paper and static images
- `model`: Trained model and generated audio
- `src`: Model source code
- `utils`: Model and data utilities
60 changes: 60 additions & 0 deletions paper/main.md
@@ -0,0 +1,60 @@
# Kick it Out: Audio Generation With a Deep Convolution Generative Network

Abhay Shukla\
[email protected]\
Continuation of UCLA COSMOS 2024 Research

## Abstract

Generative adversarial networks have been used to much success for generating images.

## Introduction

## Background

## Methodology

## Results

## Discussion

## Conclusion

## References

<a id="1">[1]</a> DCGAN paper methodology structure etc
https://arxiv.org/abs/1511.06434

Reports results similar to mine: CNN-based generative networks that rely on up-convolutions fail to reproduce the spectral distributions of real data (see the sketch below for a quick check).
https://openaccess.thecvf.com/content_CVPR_2020/papers/Durall_Watch_Your_Up-Convolution_CNN_Based_Generative_Deep_Neural_Networks_Are_CVPR_2020_paper.pdf
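
A possible way to check for this (illustrative sketch only, not code from this repo; the file paths, FFT size, and `soundfile` loader are assumptions): compare the average log-magnitude spectrum of real and generated samples and look for a gap in the upper frequency bins.

```python
import numpy as np
import soundfile as sf  # assumed available; any audio loader works


def mean_log_spectrum(paths, n_fft=1024):
    """Average log-magnitude spectrum over a list of audio files."""
    spectra = []
    for path in paths:
        audio, _ = sf.read(path)
        if audio.ndim > 1:  # mix stereo down to mono
            audio = audio.mean(axis=1)
        mag = np.abs(np.fft.rfft(audio[:n_fft], n=n_fft))
        spectra.append(np.log1p(mag))
    return np.mean(spectra, axis=0)


# Placeholder paths; point these at real and generated samples.
real = mean_log_spectrum(["data/real_kick_01.wav"])
fake = mean_log_spectrum(["outputs/generated_kick_01.wav"])
# A large gap in the upper bins suggests the high-frequency artifacts
# described in the paper above.
print("mean high-frequency gap:", np.abs(real[-64:] - fake[-64:]).mean())
```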

Also discuss WaveNet-style models as an alternative modeling idea.
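
For context, a minimal sketch of the dilated causal convolution stack that WaveNet-style models build on (illustrative only, not code from this repo; the channel count and dilation schedule are arbitrary):

```python
import torch
import torch.nn as nn


class DilatedCausalStack(nn.Module):
    """A few dilated causal Conv1d layers; output length equals input length."""

    def __init__(self, channels=32, dilations=(1, 2, 4, 8)):
        super().__init__()
        blocks = []
        for d in dilations:
            blocks += [
                nn.ConstantPad1d((d, 0), 0.0),  # left-pad so no future samples are seen
                nn.Conv1d(channels, channels, kernel_size=2, dilation=d),
                nn.ReLU(inplace=True),
            ]
        self.net = nn.Sequential(*blocks)

    def forward(self, x):  # x: (batch, channels, samples)
        return self.net(x)


x = torch.randn(1, 32, 16000)  # one second at 16 kHz, placeholder input
print(DilatedCausalStack()(x).shape)  # torch.Size([1, 32, 16000])
```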

I have to be doing something wrong; this has to be doable. Quickly check everything and make sure there is nothing more I can do, because I'm sure it's possible and the remaining problems are just limitations here; I don't know what else I can do to improve the model. There has to be some way to improve, or at least get better; all the changes I made should be making it better.

- Go through the code and clean up variables / make naming consistent (mostly the helpers)
- See if there is anything else that can be improved or any possible source of error (probably not, but there has to be something; with the changes I made the output should be better, not "worse", i.e. back to noise)
- Spend at most today doing this, but that's it. The paper has to happen now.

STRUCTURE OF A PAPER (Claude-generated)

1. Abstract: A brief summary of your paper, including the problem, methods, key results, and conclusions.
2. Introduction: Present the research problem, its importance, and your objectives.
3. Background/Literature Review: Provide context on deep convolution and its applications in audio generation. Review relevant previous work.
4. Methodology: Describe your approach, including:

- Neural network architecture
- Dataset description
- Training process
- Evaluation metrics

5. Results: Present your findings, including:

- Performance metrics
- Audio samples (if possible)
- Comparisons with other methods

6. Discussion: Interpret your results, discuss limitations, and suggest future work.
7. Conclusion: Summarize your key findings and their implications.
8. References: List all sources cited in your paper.
34 changes: 34 additions & 0 deletions src/architecture.py
@@ -12,6 +12,33 @@


# Model Components
class PhaseShuffleLayer(nn.Module):
    """Randomly shifts each (batch, channel) slice along the frame (time) axis."""

    def __init__(self, max_shift=2):
        super(PhaseShuffleLayer, self).__init__()
        self.max_shift = max_shift

    def forward(self, x):
        batch_size, channels, frames, freq = x.size()
        # One random shift in [-max_shift, max_shift] per (batch, channel) pair
        shifts = torch.randint(
            -self.max_shift,
            self.max_shift + 1,
            (batch_size, channels, 1, 1),
            device=x.device,
        )
        shifts = shifts.expand(-1, -1, frames, freq)

        # Frame indices, broadcastable to (batch, channels, frames, freq)
        idx = (
            torch.arange(frames, device=x.device)
            .unsqueeze(0)
            .unsqueeze(0)
            .unsqueeze(-1)
        )
        idx_shifted = idx + shifts
        # Clamp so shifted indices stay in range (border frames are repeated)
        idx_shifted = torch.clamp(idx_shifted, 0, frames - 1)

        return x.gather(2, idx_shifted)


class Generator(nn.Module):
def __init__(self):
super(Generator, self).__init__()
@@ -22,24 +49,31 @@ def __init__(self):
nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),
nn.BatchNorm2d(256),
nn.ReLU(inplace=True),
PhaseShuffleLayer(max_shift=2),
nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
nn.BatchNorm2d(128),
nn.ReLU(inplace=True),
PhaseShuffleLayer(max_shift=2),
nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
nn.BatchNorm2d(64),
nn.ReLU(inplace=True),
PhaseShuffleLayer(max_shift=2),
nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
nn.BatchNorm2d(32),
nn.ReLU(inplace=True),
PhaseShuffleLayer(max_shift=2),
nn.ConvTranspose2d(32, 16, kernel_size=4, stride=2, padding=1),
nn.BatchNorm2d(16),
nn.ReLU(inplace=True),
PhaseShuffleLayer(max_shift=2),
nn.ConvTranspose2d(16, 8, kernel_size=4, stride=2, padding=1),
nn.BatchNorm2d(8),
nn.ReLU(inplace=True),
PhaseShuffleLayer(max_shift=2),
nn.ConvTranspose2d(8, 4, kernel_size=4, stride=2, padding=1),
nn.BatchNorm2d(4),
nn.ReLU(inplace=True),
PhaseShuffleLayer(max_shift=2),
nn.ConvTranspose2d(4, N_CHANNELS, kernel_size=4, stride=2, padding=1),
nn.Upsample(
size=(N_FRAMES, N_FREQ_BINS), mode="bilinear", align_corners=False
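
A quick, hypothetical sanity check for the `PhaseShuffleLayer` added above (tensor sizes here are placeholders, not the model's actual constants):

```python
import torch

layer = PhaseShuffleLayer(max_shift=2)
x = torch.randn(4, 16, 64, 32)  # (batch, channels, frames, freq_bins), placeholder sizes
y = layer(x)
print(y.shape)  # torch.Size([4, 16, 64, 32]); the shape is unchanged
# Each (batch, channel) slice is shifted along the frame axis by a random
# offset in [-2, 2]; border frames repeat because the indices are clamped.
```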
