real-time processing for FRCRN_SE_16K ? #23

Open
vctu6 opened this issue Dec 9, 2024 · 4 comments

Comments

@vctu6

vctu6 commented Dec 9, 2024

Hi, thanks for sharing this great piece of work! Regarding ClearVoice, the documentation seems to suggest that the trained model FRCRN_SE_16K is the same one used for the IEEE ICASSP 2022 DNS Challenge. Is that so? How is that possible? First, the code running FRCRN_SE_16K is an offline process (big chunks of audio are processed at once), while the referenced DNS Challenge would involve real-time processing. Also, the code here uses input windows of 40 ms, while the paper you reference in the documentation uses 20 ms. Do you plan to also add the model actually used for the DNS Challenge, and possibly an example running it in a real-time framework (i.e., frame-by-frame processing instead of big chunks of audio)? Thanks!
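To make the "frame-by-frame instead of big chunks" request concrete, here is a minimal sketch of a frame-in-frame-out loop with overlap-add. It is not the ClearVoice or FRCRN code: `FrameStreamer` and `enhance_frame` are hypothetical names, the enhancement step is an identity placeholder working on the time-domain frame, and the 20 ms window with a 10 ms hop at 16 kHz is assumed only for illustration.

```python
import math

SAMPLE_RATE = 16000
WIN_LEN = 320  # 20 ms at 16 kHz (assumed for this sketch)
HOP = 160      # 10 ms hop, i.e. 50% overlap (assumed for this sketch)


def enhance_frame(frame):
    """Hypothetical stand-in for a causal model; returns the frame unchanged."""
    return frame


class FrameStreamer:
    """Consumes one hop of samples per call and emits one hop of output."""

    def __init__(self, win_len=WIN_LEN, hop=HOP):
        if win_len != 2 * hop:
            raise ValueError("this sketch assumes 50% overlap")
        self.win_len = win_len
        self.hop = hop
        # Periodic sqrt-Hann window: applied at analysis and synthesis,
        # the squared windows of overlapping frames sum to exactly 1.
        self.window = [
            math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * n / win_len))
            for n in range(win_len)
        ]
        self.in_buf = [0.0] * win_len   # most recent win_len input samples
        self.out_buf = [0.0] * win_len  # overlap-add accumulator

    def push(self, samples):
        if len(samples) != self.hop:
            raise ValueError("expected exactly one hop of samples")
        # Slide the input buffer forward by one hop.
        self.in_buf = self.in_buf[self.hop:] + list(samples)
        # Analysis window, per-frame processing, synthesis window.
        frame = [s * w for s, w in zip(self.in_buf, self.window)]
        frame = enhance_frame(frame)
        frame = [s * w for s, w in zip(frame, self.window)]
        # Overlap-add, then emit the hop that is now complete.
        self.out_buf = [a + b for a, b in zip(self.out_buf, frame)]
        out = self.out_buf[:self.hop]
        self.out_buf = self.out_buf[self.hop:] + [0.0] * self.hop
        return out
```

With the identity placeholder, the output is the input delayed by one hop, which shows that the buffering delay in this scheme grows with the window and hop sizes. A real model would replace `enhance_frame` and would also need to carry its own recurrent state between calls.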

@alibabasglab
Collaborator

Hi, thank you for your feedback. This release currently focuses on performance refined with large training data, and we have made changes to the FRCRN structure for better performance on 16K audio. For 48K audio, please try the MossFormer2_SE_48K model.

@vctu6
Author

vctu6 commented Dec 10, 2024

And for real-time processing?

vishwamartur added a commit to vishwamartur/ClearerVoice-Studio that referenced this issue Dec 10, 2024
Related to modelscope#23

Add real-time processing support for the FRCRN_SE_16K model.

* **clearvoice/models/frcrn_se/frcrn.py**
  - Add a new method `real_time_process` to the `FRCRN_SE_16K` class for real-time processing.
  - Modify the `forward` method to support both offline and real-time processing.
  - Update the `DCCRN` class to handle real-time processing.

* **clearvoice/config/inference/FRCRN_SE_16K.yaml**
  - Change `win_len` to 320 to use 20 ms input windows.
  - Change `win_inc` to 160 for a 10 ms hop at 16 kHz (50% overlap between the 20 ms windows).

* **clearvoice/demo.py**
  - Add a new demo case for real-time processing using the `FRCRN_SE_16K` model.

* **clearvoice/demo_with_more_comments.py**
  - Add a new demo case for real-time processing using the `FRCRN_SE_16K` model.
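The window settings named in the commit message above can be sanity-checked with a few lines of arithmetic. This is a small sketch, not part of the repository: `samples_to_ms` and `describe_stft` are hypothetical helpers, and the 16 kHz sample rate is taken from the model name.

```python
def samples_to_ms(n_samples, sample_rate):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * n_samples / sample_rate


def describe_stft(win_len, win_inc, sample_rate=16000):
    """Summarise window length, hop and overlap for an STFT configuration."""
    return {
        "window_ms": samples_to_ms(win_len, sample_rate),
        "hop_ms": samples_to_ms(win_inc, sample_rate),
        "overlap": 1.0 - win_inc / win_len,
    }


# Values from the commit message: win_len=320, win_inc=160.
print(describe_stft(320, 160))
# The 40 ms window mentioned in the issue corresponds to 640 samples.
print(samples_to_ms(640, 16000))
```

So `win_len: 320` gives the 20 ms window from the paper, while `win_inc: 160` sets the hop between frames rather than the window length.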

@alibabasglab
Collaborator

Thanks to vishwamartur for the new additions. In the meantime, we have released a 48K real-time model on ModelScope. Please check it out: https://modelscope.cn/models/iic/speech_dfsmn_ans_psm_48k_causal

@vctu6
Author

vctu6 commented Dec 12, 2024

@alibabasglab thank you for sharing the causal 48k model! This works! Too bad it uses double the window size (hence, double the latency) of the model described in the paper. Still, it is very interesting to see a frame-in-frame-out implementation.

@vishwamartur your implementation does not work.
