Readme

This is the data simulation script for the paper "Target Sound Extraction (TSE) with Variable Cross-modality Clues".

How to use:

Clone this project: git clone --recursive https://github.com/LiChenda/Multi-clue-TSE-data.git
Install pytorch: pip install torch==1.8.2 torchvision==0.9.2 torchaudio==0.8.2 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
Install requirements: pip install -r requirements.txt
Download the AudioSet and AudioCaps datasets.
Run simulation script: python data_simulation.py
Prepare tag clues: python gen_tag_clue.py (the one-hot tags are written to output/[train|val|test|unseen]/tag_onehot/).
Prepare text clues: python gen_text_clue.py
Prepare visual clues: python gen_visual_clue.py
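
For readers unfamiliar with one-hot tag clues, the sketch below illustrates the idea behind the tag clue step. It is not code from this repository: the helper names (tag_onehot_dirs, one_hot_tag) and the four-entry vocabulary are hypothetical, and the file format actually written by gen_tag_clue.py is not shown here. Only the split names and the tag_onehot directory name come from the path pattern above.

```python
from pathlib import PurePosixPath
from typing import List, Sequence

# Split names taken from the pattern output/[train|val|test|unseen]/tag_onehot/.
SPLITS = ("train", "val", "test", "unseen")


def tag_onehot_dirs(root: str = "output") -> List[str]:
    """Expand the path pattern into one directory per split (hypothetical helper)."""
    return [str(PurePosixPath(root) / split / "tag_onehot") for split in SPLITS]


def one_hot_tag(tag: str, vocabulary: Sequence[str]) -> List[float]:
    """Return a vector with 1.0 at the position of `tag` and 0.0 elsewhere."""
    if tag not in vocabulary:
        raise ValueError(f"unknown tag: {tag!r}")
    index = vocabulary.index(tag)
    return [1.0 if i == index else 0.0 for i in range(len(vocabulary))]


# Hypothetical vocabulary, for illustration only (not the repository's label set).
VOCAB = ["dog_bark", "siren", "speech", "rain"]

print(tag_onehot_dirs())
print(one_hot_tag("siren", VOCAB))
```

In this sketch the vector length equals the vocabulary size, so every sound class maps to exactly one position. The real label set and vector length depend on the classes used by the simulation script.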

Supported clues:

Tag clue
Video clue
Text clue

Citations:

@inproceedings{liTargetSoundExtraction2023a,
  title = {Target {{Sound Extraction}} with {{Variable Cross-Modality Clues}}},
  booktitle = {{{ICASSP}} 2023 - 2023 {{IEEE International Conference}} on {{Acoustics}}, {{Speech}} and {{Signal Processing}} ({{ICASSP}})},
  author = {Li, Chenda and Qian, Yao and Chen, Zhuo and Wang, Dongmei and Yoshioka, Takuya and Liu, Shujie and Qian, Yanmin and Zeng, Michael},
  year = {2023},
  month = jun,
  pages = {1--5},
  doi = {10.1109/ICASSP49357.2023.10095266},
}
