EMMA-Ferret

Single-stream EMMA framework used for ferret stimuli analysis. For detailed experiments and insights, please read our articles listed in the References below.

Introduction

This repository implements a computational model inspired by the Explicit-Memory Multi-resolution Adaptive (EMMA) framework for speech segregation. The model provides a framework for understanding the role of temporal coherence in segregating complex auditory scenes, as observed in biological experiments.

Overview

The model aims to segregate a target voice from a mixture of voices by incorporating a directive or attentional focus, similar to how ferrets were trained to direct their attention to a specific voice. The model leverages two key principles to achieve this:

High-dimensional Nonlinear Mapping: The input voice mixture is mapped to a nonlinear high-dimensional space, simulating the diverse selectivity of cortical neurons to attributes like frequency, pitch, and location. This stage is implemented using deep neural network embeddings, referred to as pre-attentional model embeddings (Mp), which capture the characteristics of the input mixture.
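
As a rough illustration of this stage, the sketch below maps a mixture spectrogram to high-dimensional per-frame embeddings. The layer choices and names (PreAttentionalEmbedding, n_freq, embed_dim) are assumptions made for the example, not the architecture actually used in this repository.

```python
# Minimal sketch of a pre-attentional embedding stage (Mp). Layer types,
# sizes, and names are illustrative assumptions, not the repository's
# actual architecture.
import torch
import torch.nn as nn

class PreAttentionalEmbedding(nn.Module):
    """Maps a mixture spectrogram (batch, time, freq) to high-dimensional
    per-frame embeddings encoding attributes such as frequency and pitch."""

    def __init__(self, n_freq: int = 128, embed_dim: int = 512):
        super().__init__()
        self.rnn = nn.LSTM(n_freq, embed_dim, num_layers=2, batch_first=True)
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, mixture: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(mixture)           # (batch, time, embed_dim)
        return torch.tanh(self.proj(h))    # nonlinear high-dimensional map Mp

# Example: 100 time frames of a mixture with 128 frequency bins.
mp_model = PreAttentionalEmbedding()
mp = mp_model(torch.randn(1, 100, 128))   # -> (1, 100, 512)
```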

Attentional Focus for Voice Segregation: The attentional focus stage (Ma) selectively enhances the representation of the attended target voice while suppressing other competing sources. This is achieved by aligning the Mp embeddings based on their temporal coherence with the target voice. This stage enables the model to gate the Mp embeddings in a way that emphasizes the target voice features, allowing for successful segregation.
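
One way such gating could look in code is sketched below: each embedding channel is correlated over time with an envelope of the attended target (an "anchor"), and channels that co-vary with the target are emphasized while the rest are suppressed. The coherence_gate function, the anchor input, and the sigmoid gate are assumptions made for the example, not the repository's implementation.

```python
# Minimal sketch of coherence-based gating (Ma), under the assumption that
# temporal coherence with a target anchor modulates the Mp embeddings.
import torch

def coherence_gate(mp: torch.Tensor, anchor: torch.Tensor) -> torch.Tensor:
    """Gate Mp embeddings by their temporal coherence with a target anchor.

    mp:     (batch, time, embed_dim) pre-attentional embeddings
    anchor: (batch, time) envelope of the attended target voice
    """
    mp_c = mp - mp.mean(dim=1, keepdim=True)
    an_c = anchor - anchor.mean(dim=1, keepdim=True)
    # Per-channel correlation with the anchor across time (temporal coherence).
    cov = (mp_c * an_c.unsqueeze(-1)).mean(dim=1)
    coherence = cov / (mp_c.std(dim=1) * an_c.std(dim=1, keepdim=True) + 1e-8)
    gate = torch.sigmoid(5.0 * coherence)   # emphasize coherent channels
    return mp * gate.unsqueeze(1)           # suppress incoherent channels

gated = coherence_gate(torch.randn(1, 100, 512), torch.randn(1, 100))
```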

Usage

To run this code, follow these steps:

  1. Clone the repository.
  2. Create a new conda environment using environment.yml and set the path variables in dataloader.py.
  3. Set wandb_api_key in the train_model.py script and run it (see the sketch after this list).
  4. For analysis scripts, refer to the 'analysis' folder.
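
As a minimal sketch of step 3, reading the key from an environment variable and logging in before training might look like the following; the environment-variable convention is an assumption, so check train_model.py for where wandb_api_key is actually set.

```python
# Hedged sketch of step 3: one way the wandb_api_key might be supplied.
# The environment-variable convention is an assumption; see train_model.py.
import os
import wandb

wandb.login(key=os.environ["WANDB_API_KEY"])

# Training is then launched from the command line, e.g.:
#   python train_model.py
```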

For pre-trained model checkpoints and stimuli data, please reach out to Karan Thakkar ([email protected]) or Prof. Elhilali ([email protected]).

References

@article{bellur2023explicit,
  title={Explicit-memory multiresolution adaptive framework for speech and music separation},
  author={Bellur, Ashwin and Thakkar, Karan and Elhilali, Mounya},
  journal={EURASIP Journal on Audio, Speech, and Music Processing},
  volume={2023},
  number={1},
  pages={20},
  year={2023},
  publisher={Springer}
}

@article{joshi2024temporal,
  title={Temporal Coherence Shapes Cortical Responses to Speech Mixtures in a Ferret Cocktail Party},
  author={Joshi, Neha and Ng, Yu and Thakkar, Karran and Duque, Daniel and Yin, Pingbo and Fritz, Jonathan and Elhilali, Mounya and Shamma, Shihab},
  journal={bioRxiv},
  pages={2024--05},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}
