[Enhancement] Pre-process Audio #66

Open
dnhkng opened this issue Jun 20, 2024 · 12 comments

Comments

@dnhkng
Owner

dnhkng commented Jun 20, 2024

There is an interesting set of Nvidia models for audio processing:
https://docs.nvidia.com/deeplearning/maxine/audio-effects-sdk/index.html

Of particular interest are:

  • Noise Removal/Denoising
  • Acoustic Echo Cancellation (AEC)

The first could clean up input audio before it gets passed to ASR. The second takes two streams, the recorded audio and Glados's voice output, and could help remove her voice from what she's listening to. This would improve the 'interruption' feature significantly!
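The echo-cancellation idea can be illustrated with a normalized least-mean-squares (NLMS) adaptive filter, a classic AEC technique. This is not the NVIDIA Maxine SDK's method or API. It is a minimal pure-Python sketch that assumes the speaker output (far end) and the mic recording are already time-aligned, mono, and at the same sample rate. The function names `nlms_echo_cancel`, `simulate_echo`, and `mean_power`, the parameters `taps` and `mu`, and the simulated echo path are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def nlms_echo_cancel(
    far_end: Sequence[float],
    mic: Sequence[float],
    taps: int = 32,
    mu: float = 0.5,
    eps: float = 1e-6,
) -> List[float]:
    """Subtract an adaptive estimate of the far-end echo from the mic signal (NLMS).

    far_end: samples sent to the speaker (e.g. the TTS output).
    mic:     samples recorded by the microphone (echo + user speech + noise).
    Returns e[n] = mic[n] - y[n], the mic signal with the estimated echo removed.
    """
    weights = [0.0] * taps  # adaptive FIR estimate of the speaker-to-mic echo path
    history = [0.0] * taps  # most recent far-end samples, newest first
    out: List[float] = []
    for x_n, d_n in zip(far_end, mic):
        history = [x_n] + history[:-1]
        y_n = sum(w * x for w, x in zip(weights, history))  # predicted echo sample
        e_n = d_n - y_n                                     # echo-cancelled sample
        norm = eps + sum(x * x for x in history)            # input energy (normalization)
        step = mu * e_n / norm
        weights = [w + step * x for w, x in zip(weights, history)]
        out.append(e_n)
    return out


def simulate_echo(n: int = 4000, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Hypothetical setup: white-noise 'voice' played through a short made-up echo path."""
    rng = random.Random(seed)
    far = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    path = [0.0, 0.6, 0.3, -0.1]  # made-up speaker-to-mic impulse response
    mic = [
        sum(path[k] * far[i - k] for k in range(len(path)) if i - k >= 0)
        for i in range(n)
    ]
    return far, mic  # the user is silent here, so the mic holds only echo


def mean_power(samples: Sequence[float]) -> float:
    return sum(v * v for v in samples) / len(samples)


if __name__ == "__main__":
    far, mic = simulate_echo()
    cleaned = nlms_echo_cancel(far, mic)
    print(f"echo power (last 1000 samples):     {mean_power(mic[-1000:]):.4f}")
    print(f"residual power (last 1000 samples): {mean_power(cleaned[-1000:]):.2e}")
```

A real room's echo path spans tens to hundreds of milliseconds, so a practical canceller would likely need many more taps or a frequency-domain variant. Plain NLMS also tends to drift while the user talks over the speaker output unless adaptation is paused by a double-talk detector, which is part of what a dedicated AEC model or SDK handles.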

Looking for coders to help out with this!

@codearranger

The docs say it needs CUDA 11.8+

This will need to be upgraded:
https://github.com/dnhkng/GlaDOS/blob/main/Dockerfile#L1

@dnhkng
Owner Author

dnhkng commented Jun 20, 2024

@joecryptotoo Do you want to try working on this?

@codearranger

codearranger commented Jun 20, 2024

@dnhkng I don't think I have the skills for this one, but I'm happy to help research it.

@codearranger

@dnhkng
Owner Author

dnhkng commented Jun 20, 2024

That repo looks surprisingly old for audio AI!

@codearranger

@codearranger

codearranger commented Jun 22, 2024

I tried to download the SDK so I could run it on my Linux server with an RTX 4090, but it looks like on Linux they only support these NVIDIA products: A40/A30/A2/V100/A10/T4/A16/A100, while any RTX GPU can be used on Windows.

Supported Hardware
Windows SDK: NVIDIA GeForce RTX 20XX and 30XX Series, Quadro RTX 3000, TITAN RTX, or higher (any NVIDIA GPUs with Tensor Cores)
Server SDK (Linux): V100, T4, A2, A10, A16, A30, A40, A100 (with MIG support)
Support for Ada-generation GPUs for Windows SDKs

@codearranger

I found a discussion about this on Reddit here: https://www.reddit.com/r/linux/comments/vs9pdd/for_those_who_also_want_nvidia_rtx_voice_on_linux/

Someone suggested this project as an alternative:

https://github.com/Rikorose/DeepFilterNet

@lawrenceakka
Contributor

@milsun

milsun commented Aug 1, 2024

This is a really cool feature in my opinion. Any luck so far?

@milsun

milsun commented Aug 3, 2024

Would a denoising or echo-cancellation approach really solve this issue?

@dnhkng
Owner Author

dnhkng commented Aug 3, 2024

The best option is to use a conference-room microphone/speaker combo. They do hardware noise cancellation and don't waste GPU VRAM on a signal-processing model.
