Raw Waveform Based End-to-End Deep Convolutional Network for Spatial Localization of Multiple Acoustic Sources
This repository holds the code for the final-year thesis project at PESIT - Bangalore South Campus.
In this project, we implement an end-to-end deep convolutional neural network that operates on multi-channel raw audio data to localize multiple simultaneously active acoustic sources in space, as proposed by Harshavardhan Sundar, Weiran Wang, Ming Sun and Chao Wang in [1]. The method uses a novel encoding scheme to represent the spatial coordinates of multiple sources, which makes end-to-end 2D localization of multiple sources possible. We aim to experiment with and test our implementation of this method.
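The encoding scheme itself is defined in [1] and is not reproduced here. To illustrate the general idea of turning a variable number of 2D source positions into a fixed-size network target, here is a minimal sketch of a generic grid-based encoding. It is an assumption made for illustration, not necessarily the scheme from [1]; the function names, room size, grid size, `sigma` and `threshold` are all hypothetical.

```python
import math


def encode_sources(sources, room=(5.0, 4.0), grid=(10, 8), sigma=0.5):
    """Encode (x, y) source positions in metres as a grid of Gaussian bumps.

    Hypothetical illustration only: the defaults are arbitrary and this is
    not claimed to be the encoding used in [1].
    """
    width, depth = room
    nx, ny = grid
    target = [[0.0] * nx for _ in range(ny)]
    for row in range(ny):
        for col in range(nx):
            # Centre of this grid cell in room coordinates.
            cx = (col + 0.5) * width / nx
            cy = (row + 0.5) * depth / ny
            for sx, sy in sources:
                d2 = (cx - sx) ** 2 + (cy - sy) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                # Where bumps overlap, keep the strongest response per cell.
                target[row][col] = max(target[row][col], value)
    return target


def decode_sources(target, room=(5.0, 4.0), threshold=0.5):
    """Return the centres of cells that are strict local maxima above threshold."""
    width, depth = room
    ny, nx = len(target), len(target[0])
    found = []
    for row in range(ny):
        for col in range(nx):
            v = target[row][col]
            if v < threshold:
                continue
            # Values of the up-to-eight surrounding cells.
            neighbours = [
                target[r][c]
                for r in range(max(0, row - 1), min(ny, row + 2))
                for c in range(max(0, col - 1), min(nx, col + 2))
                if (r, c) != (row, col)
            ]
            if all(v > n for n in neighbours):
                found.append(((col + 0.5) * width / nx,
                              (row + 0.5) * depth / ny))
    return found


if __name__ == "__main__":
    grid_target = encode_sources([(1.25, 0.75), (3.75, 3.25)])
    print(decode_sources(grid_target))
```

In this sketch the target has the same shape regardless of how many sources are active, which is what lets a network predict it in a single forward pass. Positions are recovered only to the resolution of the grid.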
[1]: Harshavardhan Sundar, Weiran Wang, Ming Sun, and Chao Wang, “Raw waveform based end-to-end deep convolutional network for spatial localization of multiple acoustic sources,” in Proceedings of IEEE ICASSP, Barcelona, Spain, May 4-8, 2020.
[2]: E.A.P. Habets, “Room impulse response (RIR) generator,” Sep. 2010.
[3]: Jongpil Lee, Jiyoung Park, Keunhyoung Luke Kim, and Juhan Nam, “Sample-level deep convolutional neural networks for music auto-tagging using raw waveforms,” in Sound and Music Computing Conference (SMC), 2017.