
VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration

VoiceRestore is a cutting-edge speech restoration model designed to significantly enhance the quality of degraded voice recordings. Leveraging flow-matching transformers, this model excels at addressing a wide range of audio imperfections commonly found in speech, including background noise, reverberation, distortion, and signal loss.

Demo of audio restorations: VoiceRestore

Credits: This repository is based on the E2-TTS implementation by Lucidrains.

Example

Degraded input (reverberation, distortion, noise, random cut):

Note: Adjust your volume before playing the degraded audio sample, as it may contain distortions.

degraded.mp4

Restored (steps=32, cfg=1.0):

restored.mp4

Ground truth:


Key Features

  • Universal Restoration: Handles a wide range of levels and types of voice recording degradation.
  • Easy to Use: Simple interface for processing degraded audio files.
  • Pretrained Model: Includes a 301-million-parameter transformer with pre-trained weights. (Training is ongoing; further checkpoint updates will be published.)

Quick Start

  1. Clone the repository:

    git clone --recurse-submodules https://github.com/skirdey/voicerestore.git
    cd voicerestore

    If you did not clone with --recurse-submodules, run:

    git submodule update --init --recursive
  2. Install dependencies:

    pip install -r requirements.txt
  3. Download the pre-trained model and place it in the checkpoints folder.

  4. Run a test restoration:

    python audio_restoration_model.py --checkpoint ./checkpoints/voice-restore-20d-16h-optim.pt --input test_input.wav --output test_output.wav --steps 64 --cfg_strength 0.5

    This will process test_input.wav and save the result as test_output.wav.
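To restore many files at once, the Quick Start command can be wrapped in a small script. The sketch below simply assembles and runs the CLI invocation from step 4 for every WAV file in a folder; the checkpoint filename and flag values are copied from the example above and should be adjusted to your setup.

```python
# Batch-restore a folder of WAV files by shelling out to the CLI shown in
# step 4. The checkpoint path and default flag values mirror the Quick Start
# example; change them to match your local setup.
import subprocess
from pathlib import Path


def build_restore_command(checkpoint, input_path, output_path,
                          steps=64, cfg_strength=0.5):
    """Assemble the CLI invocation from Quick Start step 4."""
    return [
        "python", "audio_restoration_model.py",
        "--checkpoint", str(checkpoint),
        "--input", str(input_path),
        "--output", str(output_path),
        "--steps", str(steps),
        "--cfg_strength", str(cfg_strength),
    ]


def restore_folder(in_dir, out_dir,
                   checkpoint="./checkpoints/voice-restore-20d-16h-optim.pt"):
    """Run the restoration CLI once per WAV file in in_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(in_dir).glob("*.wav")):
        cmd = build_restore_command(checkpoint, wav, out / wav.name)
        subprocess.run(cmd, check=True)
```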

Usage

To restore your own audio files:

from audio_restoration_model import VoiceRestoreModel

model = VoiceRestoreModel()
restored_audio = model.forward(input_audio, steps=32, cfg_strength=1.0)
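The exact input format `VoiceRestoreModel` expects (tensor layout, sample rate) is not documented in this README. As a starting point, a 16-bit mono WAV file can be read into a normalized float array with only the standard library, then converted to whatever representation your setup requires:

```python
# Minimal 16-bit PCM WAV reader using only the Python standard library.
# The normalized float output is an assumption about what preprocessing is
# needed; convert it (e.g. to a torch tensor) as your pipeline requires.
import struct
import wave


def load_wav_as_floats(path):
    """Read a 16-bit PCM WAV file and return (samples, sample_rate),
    with samples normalized to the range [-1.0, 1.0]."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "expects 16-bit PCM"
        n_frames = wf.getnframes()
        raw = wf.readframes(n_frames)
        ints = struct.unpack("<%dh" % (n_frames * wf.getnchannels()), raw)
        return [s / 32768.0 for s in ints], wf.getframerate()
```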

Model Details

  • Architecture: Flow-matching transformer
  • Parameters: ~301 million
  • Input: Degraded speech audio (various formats supported)
  • Output: Restored speech
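At inference time, a flow-matching model generates its output by integrating a learned velocity field over a number of steps, which is what the `steps` and `cfg_strength` parameters above control. The toy NumPy sketch below (an illustration of the general technique, not this repository's actual sampler) shows the principle: Euler integration from t=0 to t=1, with classifier-free guidance blending conditional and unconditional velocity estimates.

```python
# Toy flow-matching sampler with classifier-free guidance (CFG).
# Illustrative only: the real model replaces `velocity_fn` with a transformer.
import numpy as np


def euler_cfg_sample(velocity_fn, x0, cond, steps=32, cfg_strength=1.0):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with Euler steps.
    CFG blends conditional and unconditional velocity estimates:
        v = v_uncond + cfg_strength * (v_cond - v_uncond)
    """
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_cond = velocity_fn(x, t, cond)
        v_uncond = velocity_fn(x, t, None)
        v = v_uncond + cfg_strength * (v_cond - v_uncond)
        x = x + dt * v
    return x


# Hypothetical stand-in "model": its velocity points from x toward the
# conditioning target, so the trajectory drifts toward `target`.
target = np.array([1.0, -2.0, 0.5])


def toy_velocity(x, t, cond):
    goal = cond if cond is not None else np.zeros_like(x)
    return goal - x


restored = euler_cfg_sample(toy_velocity, np.zeros(3), target, steps=64)
```

More steps give a finer integration of the same trajectory; `cfg_strength=0` ignores the conditioning entirely, while values above 1 push the sample harder toward the conditional prediction.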

Limitations and Future Work

  • The current model is optimized for speech and may not perform well on music or other audio types.
  • Ongoing research to improve performance on extreme degradations.
  • Future updates may include real-time processing capabilities.

Citation

If you use VoiceRestore in your research, please cite our paper:

@article{kirdey2024voicerestore,
  title={VoiceRestore: Flow-Matching Transformers for Speech Recording Quality Restoration},
  author={Kirdey, Stanislav},
  journal={arXiv},
  year={2024}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
