Commit
commit fcaacad (1 parent: c090a62)
Showing 9 changed files with 168 additions and 58 deletions.
@@ -1,2 +1,3 @@
__pycache__/*
notebooks/__pycache__/*
data_processing/__pycache__/*
data_processing/.ipynb_checkpoints/*
@@ -1,2 +1,24 @@
# VAD
VAD Challenge - Sonos
## VAD Challenge - Sonos

This project implements a Voice Activity Detection (VAD) algorithm based on the paper: __Sofer, A., & Chazan, S. E. (2022). CNN self-attention voice activity detector. arXiv preprint arXiv:2203.02944.__

The `data_processing` folder mainly contains a notebook used to process the annotations and to perform data augmentation.
`data_utils.py` contains helper functions for annotation processing.
`energy_vad.py` contains third-party code: an implementation of an energy-based VAD found on [GitHub](https://github.com/idnavid/py_vad_tool), which I used to extract noise signals from LibriSpeech samples.

The training pipeline is organised as follows:

- `data.py` implements the PyTorch dataset together with the Lightning DataModule
- `modules.py` implements the PyTorch neural network model
- `model.py` implements the LightningModule used for training
- `train.py` is the main script to launch a training run
- `inference.py` is a simple script to test the model on real-world audio
- the `config` folder gathers the YAML files holding the experiment hyperparameters. `baseline_sa_cnn.yml` is the hyperparameter set described in the paper, while `128_mels.yml` is a slightly modified version.

You will also find some artifacts produced by training:

- `checkpoints` contains the saved model checkpoints (weights, optimizer state, hyperparameters, etc.)
- `tb_logs` contains the TensorBoard logs
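The README mentions using an energy-based VAD to extract noise from LibriSpeech samples. The general idea can be sketched as below, assuming NumPy. `energy_vad` and `extract_noise` are hypothetical helpers (not the interface of `energy_vad.py` or py_vad_tool), and the 25 ms frames, 10 ms hop and -40 dB threshold relative to the loudest frame are illustrative values.

```python
import numpy as np


def energy_vad(signal, sr=16000, frame_ms=25.0, hop_ms=10.0, threshold_db=-40.0):
    """Flag each frame as speech (True) or non-speech (False) from its energy.

    Hypothetical helper: frame sizes and the threshold (relative to the
    loudest frame) are illustrative choices, not py_vad_tool's defaults.
    """
    signal = np.asarray(signal, dtype=np.float64)
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    # Mean-square energy of each (overlapping) frame
    energies = np.array(
        [np.mean(signal[i * hop : i * hop + frame_len] ** 2) for i in range(n_frames)]
    )
    # Convert to dB relative to the loudest frame (0 dB = loudest)
    energies_db = 10.0 * np.log10(energies + 1e-12)
    energies_db -= energies_db.max()
    return energies_db > threshold_db


def extract_noise(signal, mask, sr=16000, hop_ms=10.0):
    """Concatenate the hop-sized chunks whose frames were flagged as non-speech."""
    signal = np.asarray(signal, dtype=np.float64)
    hop = int(sr * hop_ms / 1000)
    chunks = [
        signal[i * hop : (i + 1) * hop]
        for i, is_speech in enumerate(mask)
        if not is_speech
    ]
    return np.concatenate(chunks) if chunks else np.zeros(0)


# Usage: 1 s of a very quiet tone followed by 1 s of a loud tone
sr = 16000
t = np.arange(sr) / sr
audio = np.concatenate(
    [1e-4 * np.sin(2 * np.pi * 100 * t), 0.5 * np.sin(2 * np.pi * 440 * t)]
)
mask = energy_vad(audio, sr=sr)
noise = extract_noise(audio, mask, sr=sr)
print(f"{mask.sum()} of {mask.size} frames flagged as speech; {noise.size} noise samples kept")
```

Frames far below the loudest one are treated as non-speech and their samples are kept as noise. A fixed relative threshold is the simplest choice and would likely need tuning for recordings with a high noise floor.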
@@ -0,0 +1,38 @@
data:
  batch_size: 128
  data_dir: /home/virgile/data/vad_data_augmented
  hop_length: 512
  n_frames: 256
  n_mels: 128
  n_workers: 4
  nfft: 1048
  norm: false
  pin_memory: false
  sr: 16000
  valid_percent: 0.85
model:
  cnn_channels: 32
  dff: 512
  embed_dim: 256
  n_feat: 128
  num_heads: 16
model_checkpoint:
  filename: VAD-{epoch:02d}
  monitor: val_loss
  save_last: true
trainer:
  accumulate_grad_batches: 1
  auto_lr_find: false
  fast_dev_run: false
  gpus: '1'
  max_epochs: 100
  precision: 32
  profiler: false
  val_check_interval: 1.0
training:
  lr: 0.0003
  optim: Adam
  weight_decay: 1.0e-05
xp_config:
  dataset: sonos-vad
  model_type: VAD
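To make the `data` and `model` values concrete, the sketch below derives a few quantities from them. It assumes each training example is `n_frames` consecutive spectrogram frames spaced `hop_length` samples apart, that the FFT window spans `nfft` samples, and that `n_feat` is meant to match `n_mels`; `derived_quantities` is a hypothetical helper, not part of the repository.

```python
# Values copied by hand from the `data` and `model` sections of the config above.
DATA_CFG = {"sr": 16000, "hop_length": 512, "nfft": 1048, "n_frames": 256, "n_mels": 128}
MODEL_CFG = {"embed_dim": 256, "num_heads": 16, "n_feat": 128}


def derived_quantities(data_cfg: dict, model_cfg: dict) -> dict:
    """Derive time/shape quantities implied by the config (illustrative helper)."""
    sr = data_cfg["sr"]
    hop_s = data_cfg["hop_length"] / sr       # time step between spectrogram frames
    window_s = data_cfg["nfft"] / sr          # FFT window span, assuming win_length == nfft
    context_s = data_cfg["n_frames"] * hop_s  # approximate audio span of one example
    # Standard multi-head attention splits embed_dim evenly across heads.
    head_dim, remainder = divmod(model_cfg["embed_dim"], model_cfg["num_heads"])
    if remainder:
        raise ValueError("embed_dim must be divisible by num_heads")
    # Assumption: the model's input feature size is the number of mel bands.
    if model_cfg["n_feat"] != data_cfg["n_mels"]:
        raise ValueError("n_feat is assumed to match n_mels")
    return {
        "hop_ms": 1000 * hop_s,
        "window_ms": 1000 * window_s,
        "context_s": context_s,
        "head_dim": head_dim,
    }


print(derived_quantities(DATA_CFG, MODEL_CFG))
```

Under these assumptions, one example covers roughly 256 × 512 / 16000 ≈ 8.19 s of audio at a 32 ms frame step, and with standard multi-head attention each of the 16 heads works on 256 / 16 = 16-dimensional projections.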