[Enhancement] Pre-process Audio #66

dnhkng · 2024-06-20T06:20:12Z

There is an interesting set of Nvidia models for audio processing:
https://docs.nvidia.com/deeplearning/maxine/audio-effects-sdk/index.html

Of particular interest are:

Noise Removal/Denoising
Acoustic Echo Cancellation (AEC)

The first could clean up the input audio before it gets passed to ASR. The second takes two streams, the recorded microphone audio and GLaDOS's voice output, and could help remove her own voice from what she's listening to. This would improve the 'interruption' feature significantly!
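To make the echo-cancellation idea concrete, here is a minimal sketch of a classic normalized LMS (NLMS) adaptive filter. This is a textbook technique swapped in for illustration, not how the Maxine SDK implements AEC. It assumes the microphone stream and GLaDOS's TTS output (the far-end reference) are mono float samples at the same sample rate and already time-aligned; the function name `nlms_echo_cancel` and its parameters are hypothetical. The filter learns the speaker-to-microphone echo path from the reference signal and subtracts its predicted echo, so what remains is ideally only the user's voice.

```python
from collections import deque


def nlms_echo_cancel(mic, far_end, taps=32, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    Hypothetical helper for illustration (assumes aligned, same-rate mono floats).
    mic      -- recorded samples (user speech + GLaDOS's voice leaking back in)
    far_end  -- reference samples that were sent to the speaker (GLaDOS's TTS)
    taps     -- length of the estimated echo path, in samples
    mu       -- NLMS step size, stable for 0 < mu < 2
    eps      -- small regularizer so we never divide by zero
    Returns the echo-cancelled signal, same length as mic.
    """
    weights = [0.0] * taps                       # estimated echo-path impulse response
    history = deque([0.0] * taps, maxlen=taps)   # recent far-end samples, newest first
    cleaned = []
    for n, d in enumerate(mic):
        # Push the newest reference sample (pad with silence if far_end is shorter).
        history.appendleft(far_end[n] if n < len(far_end) else 0.0)
        # Predicted echo at this instant: dot product of weights and history.
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = d - echo_estimate  # what is left after removing the predicted echo
        # Normalize the step by the reference energy so adaptation speed
        # does not depend on how loud GLaDOS is.
        step = mu * error / (eps + sum(x * x for x in history))
        weights = [w + step * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned
```

Real-world cancellers typically add delay estimation, double-talk detection and residual echo suppression on top of this core loop. Without double-talk handling, the user's speech perturbs the weights whenever they talk over GLaDOS, which is exactly the interruption case this issue cares about.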

Looking for coders to help out with this!

codearranger · 2024-06-20T07:16:13Z

The docs say it needs CUDA 11.8+

This will need to be upgraded:
https://github.com/dnhkng/GlaDOS/blob/main/Dockerfile#L1

dnhkng · 2024-06-20T07:57:31Z

@joecryptotoo Do you want to try working on this?

codearranger · 2024-06-20T13:51:51Z

@dnhkng I don't think I have the skills for this one, but I'm happy to help research it.

codearranger · 2024-06-20T13:53:43Z

https://github.com/NVIDIA/MAXINE-AFX-SDK/tree/v1.3.0

dnhkng · 2024-06-20T13:55:49Z

That repo looks surprisingly old for audio AI!

codearranger · 2024-06-20T14:01:39Z

This microservice might do the job.

https://catalog.ngc.nvidia.com/orgs/nvidia/teams/maxine/helm-charts/ucf-audio-multistream

codearranger · 2024-06-22T10:54:05Z

I tried to download the SDK for this so I can run it on my Linux server with an RTX 4090, but it looks like the Linux (server) SDK only supports these NVIDIA data-center GPUs: A40/A30/A2/V100/A10/T4/A16/A100,

while any RTX GPU can be used on Windows:

Supported Hardware
Windows SDK: NVIDIA GeForce RTX 20XX and 30XX Series, Quadro RTX 3000, TITAN RTX, or higher (any NVIDIA GPUs with Tensor Cores)
Server SDK (Linux): V100, T4, A2, A10, A16, A30, A40, A100 (with MIG support)
Support for Ada-generation GPUs for Windows SDKs

codearranger · 2024-06-22T12:02:06Z

I found a discussion about this on reddit here: https://www.reddit.com/r/linux/comments/vs9pdd/for_those_who_also_want_nvidia_rtx_voice_on_linux/

Someone suggested this project as an alternative:

https://github.com/Rikorose/DeepFilterNet

lawrenceakka · 2024-06-23T08:21:49Z

See also https://pypi.org/project/audio-denoiser/

milsun · 2024-08-01T09:11:22Z

This is a really cool feature in my opinion. Any luck so far?

milsun · 2024-08-03T08:58:06Z

Would a denoising or echo-cancellation approach really solve this issue?

dnhkng · 2024-08-03T12:14:23Z

The best option is to use a conference-room microphone/speaker combo. They do hardware noise cancellation and don't waste GPU VRAM on a signal-processing model.

codearranger mentioned this issue Aug 19, 2024

Noise removal · huggingface/speech-to-speech#17 (open)
