improvements #21

Merged
merged 46 commits into from
Jan 24, 2025
b976705
wording
shuklabhay Oct 6, 2024
9794e83
feature engineering
shuklabhay Oct 8, 2024
2931c95
istft stuff
shuklabhay Oct 9, 2024
7a1285a
model architecutre
shuklabhay Oct 9, 2024
197f963
fix stuff
shuklabhay Oct 9, 2024
8e3d993
training loop
shuklabhay Oct 12, 2024
8f6bc12
more outlining
shuklabhay Oct 15, 2024
aca7360
wording
shuklabhay Oct 16, 2024
da74f31
results conclusion
shuklabhay Oct 16, 2024
b51f1f7
update wording and stuff
shuklabhay Oct 18, 2024
418d49c
outline nitro
shuklabhay Oct 19, 2024
b498a41
checkkerboard reference
shuklabhay Oct 21, 2024
e6fc7b7
intro, other wording
shuklabhay Oct 21, 2024
b45e567
change org :(
shuklabhay Oct 21, 2024
946c4d6
organize headers
shuklabhay Oct 22, 2024
7b2a680
related works
shuklabhay Oct 22, 2024
1ee197c
wavenet related works
shuklabhay Oct 23, 2024
cdb0cb3
wavegan explaination
shuklabhay Oct 24, 2024
15304c0
update description
shuklabhay Oct 24, 2024
53997ba
update results and stuff
shuklabhay Oct 24, 2024
c93cb89
talk abt gan
shuklabhay Oct 25, 2024
6be57be
wording stuff
shuklabhay Oct 25, 2024
69d9f66
mel based spec representation
shuklabhay Oct 27, 2024
099209a
remove 1k artifact
shuklabhay Oct 27, 2024
e210fba
CURATED KICK MODEL
shuklabhay Oct 28, 2024
a9b00d9
massive cleanup
shuklabhay Oct 29, 2024
305850f
lint
shuklabhay Oct 29, 2024
a7df899
resolve helper imports
shuklabhay Oct 29, 2024
ab70108
implement resize conv
shuklabhay Oct 30, 2024
0e35f58
wording
shuklabhay Nov 5, 2024
5fe2d29
Snare model prereqs & Abstract
shuklabhay Nov 8, 2024
b5fc84f
snare model
shuklabhay Nov 8, 2024
ce80a85
dataset description
shuklabhay Nov 8, 2024
aa54d44
feature eng part 1
shuklabhay Nov 13, 2024
8aebdbd
lil note
shuklabhay Nov 14, 2024
224aff6
wording stuff
shuklabhay Nov 18, 2024
7974f30
remove old
shuklabhay Nov 18, 2024
87ee183
feature eng
shuklabhay Nov 18, 2024
0f091e4
visualize bool
shuklabhay Nov 22, 2024
4c393d6
cleanup and write stuff
shuklabhay Nov 22, 2024
37b90f0
architecture and stuff
shuklabhay Nov 22, 2024
654b1dd
update lint action
shuklabhay Nov 24, 2024
5f77126
wording
shuklabhay Nov 25, 2024
78a2c1a
spellig
shuklabhay Nov 25, 2024
9e719be
cleanup ish
shuklabhay Nov 25, 2024
94e3bd5
wording
shuklabhay Nov 25, 2024
lil note
shuklabhay committed Nov 14, 2024

commit 8aebdbde89cf9a7c18aff7e07ead6a723856479b
2 changes: 2 additions & 0 deletions paper/paper.md
@@ -52,6 +52,8 @@ When converting generated audio representations to audio, this process occurs in

This work uses a GAN architecture to create high-fidelity audio, exploiting adversarial loss to promote realism and detail in the generated audio. To address the training instability of GANs, this work uses a Wasserstein GAN with gradient penalty (WGAN-GP). The Wasserstein distance provides a more stable measure of divergence between the real and generated audio distributions than typical GAN loss functions, and minimizing this distance through the WGAN-GP framework empirically improves training stability and promotes convergence. In this work, the switch from a standard GAN to a WGAN architecture was instrumental in creating a model that consistently converged to generating actual audio rather than noise.
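
For reference, the critic objective in the standard WGAN-GP formulation can be written as below. The notation is introduced here for illustration and is not taken from this paper: $D$ is the critic, $\mathbb{P}_r$ and $\mathbb{P}_g$ are the real and generated distributions, and $\lambda$ is the penalty weight, whose value in this work is not stated in this section.

$$
L_{\text{critic}} = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\left[D(\tilde{x})\right] - \mathbb{E}_{x \sim \mathbb{P}_r}\left[D(x)\right] + \lambda \, \mathbb{E}_{\hat{x}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right],
\qquad \hat{x} = \epsilon x + (1 - \epsilon)\tilde{x}, \quad \epsilon \sim U[0, 1]
$$

The first two terms estimate the negative Wasserstein distance between the two distributions, and the last term penalizes critic gradients whose norm departs from 1 at points interpolated between real and generated samples.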

VISION TRANSFORMER

The final generator passes 128 latent dimensions into six transpose convolution blocks. The first five each consist of a 2D transpose convolution and batch normalization, followed by a Leaky ReLU activation and a dropout layer. The final block contains a 2D transpose convolution and a hyperbolic tangent activation, creating a 256 by 256 representation of audio with values between -1 and 1.
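
As a rough sanity check on the generator's shapes, the sketch below traces how six transpose convolutions could reach a 256 by 256 output. The kernel size, stride, padding, and the 4 by 4 starting feature map are assumptions chosen for illustration; the paper text above does not state them.

```python
def transpose_conv_out(size, kernel, stride, padding, output_padding=0):
    """Output side length of a 2D transpose convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Hypothetical hyperparameters, NOT taken from the paper.
START_SIZE = 4                     # assumed side length after reshaping the latent vector
KERNEL, STRIDE, PADDING = 4, 2, 1  # a common setting that doubles the side length
NUM_BLOCKS = 6                     # six transpose convolution blocks, as described above


def generator_sizes(start=START_SIZE, blocks=NUM_BLOCKS):
    """Return the side length before the first block and after each block."""
    sizes = [start]
    for _ in range(blocks):
        sizes.append(transpose_conv_out(sizes[-1], KERNEL, STRIDE, PADDING))
    return sizes


if __name__ == "__main__":
    print(generator_sizes())
```

Under these assumptions each block doubles the spatial size, so six blocks take a 4 by 4 map to 256 by 256.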

The critic consists of six convolution blocks, converting a 256 by 256 representation of audio to a single value, an approximation of the Wasserstein distance. The critic utilizes seven 2D convolution blocks, each with spectral normalization to stabilize training, batch normalization, a Leaky ReLU activation, and a dropout layer. There are two exceptions: the first layer does not use batch normalization, and the third layer includes a Linear Attention mechanism to help the model understand contextual relationships in feature maps and prevent the checkerboard artifacts that often plague audio generation. After these operations, a final 2D convolution with spectral normalization is applied and the result is flattened, returning one Wasserstein distance approximation per input.
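
The critic's downsampling can be checked the same way. This sketch assumes six stride-2 convolution blocks followed by a final convolution that collapses the remaining map to a single value. The kernel sizes, strides, and padding are hypothetical and are not given in the text above.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a 2D convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def critic_sizes(start=256, blocks=6):
    """Side lengths through the assumed critic: `blocks` halving convs, then a final conv."""
    sizes = [start]
    for _ in range(blocks):
        # Assumed block setting: kernel 4, stride 2, padding 1 halves the side length.
        sizes.append(conv_out(sizes[-1], kernel=4, stride=2, padding=1))
    # Assumed final conv: kernel 4, stride 1, no padding reduces 4x4 to 1x1.
    sizes.append(conv_out(sizes[-1], kernel=4, stride=1, padding=0))
    return sizes


if __name__ == "__main__":
    print(critic_sizes())
```

With these settings the 256 by 256 input shrinks to 4 by 4 after six blocks, and the final convolution yields a 1 by 1 map that flattens to one scalar per input.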