Real-time processing for FRCRN_SE_16K? #23
Comments
Hi, thank you for your feedback. This release currently focuses on performance, refined on large training data; we have made changes to the FRCRN structure for better performance on 16 kHz audio. For 48 kHz audio, please try the MossFormer2_SE_48K model.
And for real-time processing?
Related to modelscope#23. Add real-time processing support for the FRCRN_SE_16K model.

* **clearvoice/models/frcrn_se/frcrn.py**
  - Add a new method `real_time_process` to the `FRCRN_SE_16K` class for real-time processing.
  - Modify the `forward` method to support both offline and real-time processing.
  - Update the `DCCRN` class to handle real-time processing.
* **clearvoice/config/inference/FRCRN_SE_16K.yaml**
  - Change `win_len` to 320 to use 20 ms input windows.
  - Change `win_inc` to 160 to use a 10 ms hop.
* **clearvoice/demo.py**
  - Add a new demo case for real-time processing using the `FRCRN_SE_16K` model.
* **clearvoice/demo_with_more_comments.py**
  - Add a new demo case for real-time processing using the `FRCRN_SE_16K` model.
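The 20 ms window / 10 ms hop streaming path described above can be sketched as a generic frame-in/frame-out overlap-add loop. This is a minimal sketch, not the PR's actual code: `enhance_frame` is a hypothetical placeholder for per-frame model inference (identity here), and only the windowing/buffering logic a real-time wrapper would need is shown.

```python
import numpy as np

WIN_LEN = 320  # 20 ms at 16 kHz (the proposed win_len)
WIN_INC = 160  # 10 ms hop (the proposed win_inc)


class StreamingFrameProcessor:
    """Generic frame-in/frame-out overlap-add loop (sketch)."""

    def __init__(self, win_len=WIN_LEN, win_inc=WIN_INC):
        self.win_len = win_len
        self.win_inc = win_inc
        # sqrt of a periodic Hann window: analysis * synthesis windows
        # sum to 1 at 50% overlap, so identity "enhancement" reconstructs
        # the input exactly after the initial ramp-in.
        hann = 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(win_len) / win_len))
        self.window = np.sqrt(hann)
        self.in_buf = np.zeros(0)
        self.ola_buf = np.zeros(win_len)

    def enhance_frame(self, frame):
        return frame  # placeholder: real per-frame model inference goes here

    def process(self, samples):
        """Feed arbitrary-length PCM; returns samples ready for playback."""
        self.in_buf = np.concatenate([self.in_buf, samples])
        out = []
        while len(self.in_buf) >= self.win_len:
            frame = self.in_buf[: self.win_len] * self.window    # analysis
            frame = self.enhance_frame(frame)                    # inference
            self.ola_buf[: self.win_len] += frame * self.window  # synthesis
            out.append(self.ola_buf[: self.win_inc].copy())      # emit one hop
            self.ola_buf = np.roll(self.ola_buf, -self.win_inc)
            self.ola_buf[-self.win_inc:] = 0.0
            self.in_buf = self.in_buf[self.win_inc:]
        return np.concatenate(out) if out else np.zeros(0)
```

With identity enhancement and these windows, the output matches the input after the first full window, which is a quick sanity check that a streaming wrapper's buffering is correct before any model is plugged in.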
Thanks @vishwamartur for the new additions. Meanwhile, we have released a 48 kHz real-time model on ModelScope. Please have a look at it: https://modelscope.cn/models/iic/speech_dfsmn_ans_psm_48k_causal
@alibabasglab thank you for sharing the causal 48 kHz model! It works! Too bad it uses double the window size (hence, double the latency) of the model described in the paper; still, it is very interesting to see a frame-in/frame-out implementation. @vishwamartur, your implementation does not work.
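The "double window size, hence double latency" point follows from the fact that a windowed frame-in/frame-out enhancer must buffer at least one analysis window before it can emit output. A back-of-the-envelope sketch (the 320/640-sample figures are illustrative for 16 kHz, not taken from the released configs):

```python
SR_16K = 16000  # Hz

def window_latency_ms(win_len_samples, sample_rate):
    """Minimum algorithmic latency implied by buffering one analysis window."""
    return 1000.0 * win_len_samples / sample_rate

# 20 ms window (win_len=320 at 16 kHz) vs. 40 ms window (win_len=640):
print(window_latency_ms(320, SR_16K))  # 20.0
print(window_latency_ms(640, SR_16K))  # 40.0 -> double the window, double the latency
```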
Hi, thanks for sharing this great piece of work! Regarding ClearVoice, the documentation seems to suggest that the trained FRCRN_SE_16K model is the same one used for the IEEE ICASSP 2022 DNS Challenge. Is that so? How is it possible? First, the code running FRCRN_SE_16K is an offline process (large chunks of audio are processed at once), while the referenced DNS Challenge would involve real-time processing. Also, the code here uses 40 ms input windows, while 20 ms windows are used in the paper referenced in the documentation.

Do you plan to also add the model actually used for the DNS Challenge, and possibly an example running it in a real-time framework (i.e., frame-by-frame processing instead of large chunks of audio)? Thanks!