- Sponsors
- About
- Installation
- Data pipeline
- What's available
- What's coming up
- On contributing
- References
## Sponsors

This work would not be possible without cloud resources provided by Google's TPU Research Cloud (TRC) program. I also thank the TRC support team for quickly resolving whatever issues I had: you're awesome!
Want to become a sponsor? Feel free to reach out!
## About

A home for audio ML in JAX. It provides common feature extraction, popular learnable frontends, and pretrained supervised and self-supervised models. Unlike popular frameworks, the objective is not to become an end-to-end, end-all-be-all DL framework, but to act as a starting point for doing things the jax way, through reference implementations and recipes built on the jax / flax / optax stack.
PS: I'm quite new to using Jax and its functional-at-heart design, so I admit the code can be a bit untidy in places. Expect changes, restructuring, and, as the official Jax repository itself warns, sharp edges!
## Installation

```sh
pip install audax
```
To install from the latest source, use the following commands:
```sh
git clone https://github.com/SarthakYadav/audax.git
cd audax
pip install -r requirements.txt
pip install .
```
A Colab installation walkthrough can be found here.
## Data pipeline

- All training is done on custom TFRecords. I initially tried using tensorflow-datasets, but decided against it.
- The tfrecords comprise examples with the audio file stored as an encoded `PCM_16` `flac` buffer, along with label info and duration, resulting in smaller `tfrecord` files and faster I/O compared to storing audio as a sequence of floats. A minimal serialization sketch is shown after this list.
- A step-by-step guide to setting up data can be found in recipes/data_prep, including a sample script to convert data into tfrecords.
- More info can be found in `audax.training_utils.data_v2`.
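For illustration, here is a minimal sketch of how one such example could be serialized. This is not the actual data_prep script, and the feature keys (`audio`, `label`, `duration`) are assumptions based on the description above:

```python
# Hypothetical sketch: serialize one clip as a tf.train.Example with the
# audio encoded as a PCM_16 flac buffer, plus label and duration.
import io

import soundfile as sf
import tensorflow as tf


def serialize_example(waveform, sample_rate, label):
    # encode the float waveform as 16-bit PCM flac instead of raw floats
    buf = io.BytesIO()
    sf.write(buf, waveform, sample_rate, format="FLAC", subtype="PCM_16")
    features = tf.train.Features(feature={
        "audio": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[buf.getvalue()])),
        "label": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
        "duration": tf.train.Feature(
            float_list=tf.train.FloatList(value=[len(waveform) / sample_rate])),
    })
    return tf.train.Example(features=features).SerializeToString()


# usage sketch:
# with tf.io.TFRecordWriter("train-000.tfrec") as writer:
#     writer.write(serialize_example(waveform, 16000, label))
```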
## What's available

At the time of writing, `jax.scipy.signal` does not have a native Short-time Fourier Transform (`stft`) implementation. Rather than emulating the `scipy.signal` implementation, which has a lot more bells and whistles and is more feature-packed, the `stft` implementation in `audax.core` is designed so that it can be built upon to extract `spectrogram` and `melspectrogram` features like those found in torchaudio, which are quite popular. The result is a simple implementation of `stft`, `spectrogram` and `melspectrogram` that is compatible with its torchaudio counterparts, as shown in the figure below.

Currently, `spectrogram` and `melspectrogram` features are supported. Visit the `audax.core` readme for more info.
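To make that layered design concrete, here is a minimal, self-contained sketch of the idea, not the actual `audax.core` code; parameter names follow torchaudio conventions:

```python
# Minimal sketch of an stft -> spectrogram stack in JAX. Signatures are
# illustrative, not necessarily audax.core's API.
import jax.numpy as jnp


def stft(x, n_fft=400, hop_length=160):
    """x: (samples,) mono waveform -> (num_frames, n_fft // 2 + 1) complex."""
    window = jnp.hanning(n_fft)
    num_frames = 1 + (x.shape[-1] - n_fft) // hop_length
    # gather overlapping windowed frames: (num_frames, n_fft)
    idx = hop_length * jnp.arange(num_frames)[:, None] + jnp.arange(n_fft)[None, :]
    frames = x[idx] * window
    return jnp.fft.rfft(frames, n=n_fft)


def spectrogram(x, power=2.0, **stft_kwargs):
    # a spectrogram is a pointwise transform of the stft output, so it
    # composes cleanly with jax.jit / jax.vmap; a melspectrogram further
    # applies a mel filterbank matrix to this result
    return jnp.abs(stft(x, **stft_kwargs)) ** power
```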
Apart from features, `jax.vmap`-compatible mixup and SpecAugment implementations (no TimeStretch as of now, unfortunately) are also provided; a rough mixup sketch follows.
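As an illustration of the `jax.vmap`-friendly style (the function and argument names here are assumptions, not audax's actual API), batch mixup can be written as a per-pair function mapped over the leading axis:

```python
# Hypothetical sketch of vmap-compatible mixup: mix each example with a
# permuted partner using a per-example coefficient lam. Assumes soft or
# one-hot labels y so that labels can be mixed linearly as well.
import jax
import jax.numpy as jnp


def mixup_one(x1, y1, x2, y2, lam):
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2


def mixup_batch(key, x, y, alpha=0.2):
    perm_key, beta_key = jax.random.split(key)
    perm = jax.random.permutation(perm_key, x.shape[0])
    lam = jax.random.beta(beta_key, alpha, alpha, (x.shape[0],))
    # map the per-pair function over the batch dimension
    return jax.vmap(mixup_one)(x, y, x[perm], y[perm], lam)
```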
Several prominent neural network architecture reference implementations are provided, with more to come. The current release has:
- ResNets [1]
- EfficientNet [2]
- ConvNeXT [3]
Pretrained models can be found in the respective recipes; expect more to be added soon.
Two popular learnable feature extraction frontends are available in `audax.frontends`: `LEAF` [4] and `SincNet` [5]. Sample recipes, as well as pretrained models (AudioSet for now), can be found in recipes/leaf; a rough sketch of how such a frontend composes with a backbone is shown below.
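To show where a learnable frontend sits in a model, here is a hedged flax sketch; the module and argument names are illustrative, not the actual `audax.frontends` API. The frontend replaces fixed mel filterbanks and is trained jointly with the encoder:

```python
# Illustrative only: compose a learnable frontend (LEAF / SincNet style)
# with a classification backbone as flax modules.
import flax.linen as nn


class FrontendClassifier(nn.Module):
    frontend: nn.Module   # learnable filterbank operating on raw waveforms
    encoder: nn.Module    # e.g. a ResNet / EfficientNet / ConvNeXT backbone
    num_classes: int

    @nn.compact
    def __call__(self, waveform):
        feats = self.frontend(waveform)   # (batch, frames, channels) features
        embedding = self.encoder(feats)
        return nn.Dense(self.num_classes)(embedding)
```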
- `COLA` [6] models on AudioSet for the various aforementioned architectures can be found in recipes/cola.
- A working implementation of `SimCLR` [7, 8] can be found in recipes/simclr, and pretrained models will be added soon (experiments ongoing!). A simplified sketch of the contrastive objective these recipes rely on follows this list.
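For orientation, both recipes revolve around a contrastive objective over paired embeddings. Below is a simplified, symmetric InfoNCE-style sketch in JAX; the full NT-Xent of [7] additionally counts intra-view pairs as negatives, and the names and temperature value here are assumptions, not the recipe code:

```python
# Hedged sketch of a symmetric InfoNCE-style contrastive loss over two
# views z1, z2 of the same batch, each of shape (B, D).
import jax.numpy as jnp
import optax


def contrastive_loss(z1, z2, temperature=0.1):
    # L2-normalize so the logits are scaled cosine similarities
    z1 = z1 / jnp.linalg.norm(z1, axis=-1, keepdims=True)
    z2 = z2 / jnp.linalg.norm(z2, axis=-1, keepdims=True)
    logits = z1 @ z2.T / temperature      # (B, B); positives on the diagonal
    labels = jnp.arange(z1.shape[0])
    # symmetrized cross-entropy: view 1 -> view 2 and view 2 -> view 1
    loss12 = optax.softmax_cross_entropy_with_integer_labels(logits, labels)
    loss21 = optax.softmax_cross_entropy_with_integer_labels(logits.T, labels)
    return jnp.mean(loss12 + loss21) / 2
```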
## What's coming up

- Pretrained `COLA` models and linear probe experiments. (VERY SOON!)
- Better documentation and walk-throughs.
- Pretrained `SimCLR` models.
- Recipes for Speaker Recognition on VoxCeleb.
- More `AudioSet` pretrained checkpoints for architectures already added.
- Reference implementations for more neural architectures, esp. Transformer-based networks.
## On contributing

- At the time of writing, I've been the sole person involved in the development of this work, and quite frankly, I would love to have help!
- Happy to hear from open source contributors, both newbies and experienced, about their experience and needs.
- Always open to hearing about possible ways to clean up and better structure the code.
## References

[1] He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
[2] Tan, M. and Le, Q., 2019, May. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105-6114). PMLR.
[3] Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T. and Xie, S., 2022. A ConvNet for the 2020s. arXiv preprint arXiv:2201.03545.
[4] Zeghidour, N., Teboul, O., Quitry, F. and Tagliasacchi, M., 2021. LEAF: A learnable frontend for audio classification. In International Conference on Learning Representations.
[5] Ravanelli, M. and Bengio, Y., 2018, December. Speaker recognition from raw waveform with sincnet. In 2018 IEEE Spoken Language Technology Workshop (SLT) (pp. 1021-1028). IEEE.
[6] Saeed, A., Grangier, D. and Zeghidour, N., 2021, June. Contrastive learning of general-purpose audio representations. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3875-3879). IEEE.
[7] Chen, T., Kornblith, S., Norouzi, M. and Hinton, G., 2020, November. A simple framework for contrastive learning of visual representations. In International conference on machine learning (pp. 1597-1607). PMLR.
[8] Chen, T., Kornblith, S., Swersky, K., Norouzi, M. and Hinton, G., 2020. Big self-supervised models are strong semi-supervised learners. Advances in Neural Information Processing Systems, 33 (pp. 22243-22255).