LLaSE: Maximizing Acoustic Preservation for LLaMA-based Speech Enhancement

Boyi Kang*¹, Xinfa Zhu*¹, Zihan Zhang¹, Zhen Ye², Ziqian Wang¹, Wei Xue², Lei Xie¹
¹ Audio, Speech and Language Processing Group (ASLP@NPU),
School of Computer Science, Northwestern Polytechnical University, Xi’an, China
² The Hong Kong University of Science and Technology

Abstract

Language Models (LMs) have shown strong semantic understanding and contextual modeling capabilities, which have recently driven progress in generative speech enhancement. However, most LM-based speech enhancement approaches focus on semantic information and overlook the vital role of acoustic information. This leads to acoustic inconsistency after enhancement, such as changes in speaker timbre and intonation. This paper proposes LLaSE, a LLaMA-based language model for speech enhancement. To address acoustic inconsistency, LLaSE takes continuous representations from WavLM as input and predicts speech tokens from XCodec2, a recently released efficient codec, thereby maximizing acoustic preservation. Experimental results demonstrate that LLaSE achieves state-of-the-art performance on speech enhancement, offering a robust and scalable solution for speech denoising and quality improvement.
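The abstract describes an interface rather than an algorithm: continuous WavLM features go in, and discrete XCodec2 tokens come out. The toy Python sketch below uses NumPy to illustrate only that shape contract. It is not the LLaSE model. A random projection stands in for the LLaMA backbone. The sizes and the one-token-per-frame alignment are assumptions made for illustration: a 50 Hz frame rate shared by features and tokens, 1024-dimensional WavLM features, and a 65,536-entry XCodec2 codebook.

```python
import numpy as np

# Illustrative sizes; these are assumptions, not values read from the LLaSE code.
FRAME_RATE_HZ = 50      # assumed frames per second for both WavLM features and codec tokens
WAVLM_DIM = 1024        # assumed WavLM (Large) hidden size
CODEBOOK_SIZE = 65536   # assumed size of the single XCodec2 codebook
HIDDEN_DIM = 64         # toy stand-in for the LLaMA hidden size

rng = np.random.default_rng(0)
W_IN = rng.standard_normal((WAVLM_DIM, HIDDEN_DIM)) / np.sqrt(WAVLM_DIM)
W_OUT = rng.standard_normal((HIDDEN_DIM, CODEBOOK_SIZE)) / np.sqrt(HIDDEN_DIM)


def toy_enhance(wavlm_feats: np.ndarray) -> np.ndarray:
    """Map continuous features of shape (T, WAVLM_DIM) to token ids of shape (T,).

    Random projections replace the real LLaMA backbone; only the interface is shown.
    """
    if wavlm_feats.ndim != 2 or wavlm_feats.shape[1] != WAVLM_DIM:
        raise ValueError(f"expected (T, {WAVLM_DIM}), got {wavlm_feats.shape}")
    hidden = np.tanh(wavlm_feats @ W_IN)   # (T, HIDDEN_DIM)
    logits = hidden @ W_OUT                # (T, CODEBOOK_SIZE): one score vector per frame
    return logits.argmax(axis=-1)          # greedy choice of one token id per frame


# Two seconds of (random) noisy-speech features -> 100 frames -> 100 token ids.
noisy_feats = rng.standard_normal((2 * FRAME_RATE_HZ, WAVLM_DIM))
tokens = toy_enhance(noisy_feats)
print(tokens.shape)  # (100,)
```

In the real pipeline, the predicted token ids would then be decoded by XCodec2 back into a waveform.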

Demo Page

Demo Page: https://kevin-naticl.github.io/LLaSE-Demopage/

DNSMOS results on the DNS Challenge test set (higher is better)

Model	Type	Testset	SIG	BAK	OVRL
Unprocessed	-	syn_with_reverb	1.76	1.50	1.39
		syn_no_reverb	3.39	2.62	2.48
		real_recording	3.05	2.51	2.26
Conv-TasNet	Discriminative	syn_with_reverb	2.42	2.71	2.01
		syn_no_reverb	3.09	3.34	3.00
		real_recording	3.10	2.98	2.41
DEMUCS	Discriminative	syn_with_reverb	2.86	3.90	2.55
		syn_no_reverb	3.58	4.15	3.35
		real_recording	3.26	4.03	2.99
FRCRN	Discriminative	syn_with_reverb	2.93	2.92	2.28
		syn_no_reverb	3.58	4.13	3.34
		real_recording	3.37	3.98	3.04
SELM	Generative	syn_with_reverb	3.16	3.58	2.70
		syn_no_reverb	3.51	4.10	3.26
		real_recording	3.59	3.44	3.12
MaskSR	Generative	syn_with_reverb	3.53	4.07	3.25
		syn_no_reverb	3.59	4.12	3.34
		real_recording	3.43	4.03	3.14
GENSE	Generative	syn_with_reverb	3.49	3.73	3.19
		syn_no_reverb	3.65	4.18	3.43
		real_recording	-	-	-
LLaSE	Generative	syn_with_reverb	3.59	4.10	3.33
		syn_no_reverb	3.65	4.17	3.43
		real_recording	3.50	4.10	3.24
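The short Python snippet below makes the table easier to read at a glance. It copies the OVRL column and computes, for each test set, LLaSE's gain over the unprocessed input and its margin over the strongest other system. Every number comes straight from the table; the script only does the subtraction.

```python
# OVRL scores copied from the DNSMOS table above (None = not reported).
OVRL = {
    "Unprocessed": {"syn_with_reverb": 1.39, "syn_no_reverb": 2.48, "real_recording": 2.26},
    "Conv-TasNet": {"syn_with_reverb": 2.01, "syn_no_reverb": 3.00, "real_recording": 2.41},
    "DEMUCS":      {"syn_with_reverb": 2.55, "syn_no_reverb": 3.35, "real_recording": 2.99},
    "FRCRN":       {"syn_with_reverb": 2.28, "syn_no_reverb": 3.34, "real_recording": 3.04},
    "SELM":        {"syn_with_reverb": 2.70, "syn_no_reverb": 3.26, "real_recording": 3.12},
    "MaskSR":      {"syn_with_reverb": 3.25, "syn_no_reverb": 3.34, "real_recording": 3.14},
    "GENSE":       {"syn_with_reverb": 3.19, "syn_no_reverb": 3.43, "real_recording": None},
    "LLaSE":       {"syn_with_reverb": 3.33, "syn_no_reverb": 3.43, "real_recording": 3.24},
}


def ovrl_margins(model: str = "LLaSE") -> dict:
    """Per test set: (gain over unprocessed, margin over the best other system)."""
    result = {}
    for testset, score in OVRL[model].items():
        # Compare only against other enhancement systems that reported this test set.
        others = [
            scores[testset]
            for name, scores in OVRL.items()
            if name not in (model, "Unprocessed") and scores[testset] is not None
        ]
        gain = round(score - OVRL["Unprocessed"][testset], 2)
        margin = round(score - max(others), 2)
        result[testset] = (gain, margin)
    return result


for testset, (gain, margin) in ovrl_margins().items():
    print(f"{testset}: +{gain:.2f} over unprocessed, {margin:+.2f} vs best baseline")
```

On OVRL, this gives LLaSE a margin of +0.08 over MaskSR on syn_with_reverb, a tie with GENSE on syn_no_reverb, and +0.10 over MaskSR on real_recording. The comparison covers OVRL only. On SIG for real_recording, for example, SELM scores higher (3.59 vs. 3.50).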

Usage

1. Clone the Repo

git clone https://github.com/Kevin-naticl/LLaSE.git
cd LLaSE

2. Install Requirements

conda create -n LLaSE python=3.10
conda activate LLaSE
pip install -r requirements.txt
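After the install, you can confirm that every package listed in requirements.txt actually landed in the LLaSE environment by querying importlib.metadata for each name. The sketch below is a convenience check, not part of the repository. Its parser handles only plain name-plus-version lines and skips pip options, URLs, and VCS requirements. It does not verify version pins.

```python
import re
from importlib import metadata


def parse_requirement_names(text: str) -> list[str]:
    """Extract distribution names from requirements.txt content (simple cases only)."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop full-line and inline comments
        # Skip blank lines, pip options (-r, -e, --index-url, ...) and URL/VCS lines.
        if not line or line.startswith("-") or "://" in line:
            continue
        # The distribution name ends at the first extras bracket or version operator.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_distributions(names: list[str]) -> list[str]:
    """Return the names that have no installed distribution in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

From the repository root, missing_distributions(parse_requirement_names(open("requirements.txt").read())) should return an empty list after a successful install.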

3. Download the Checkpoint from Hugging Face

You can use the provided shell script to download the checkpoint or manually download it from Hugging Face.

cd ckpt
bash download.sh

4. Inference

List the noisy input files in ./config/test.yml.
Run the inference script:

bash inference.sh

By default, the enhanced .wav files are saved to ./decode/wav with a 16 kHz sample rate.
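To confirm that the enhanced files really are 16 kHz, the sketch below reads the sample rate from each WAV header in ./decode/wav. It is a standalone check, not part of the LLaSE scripts. It walks the RIFF chunks directly instead of using Python's wave module. This is because wave only opens PCM files, and the README does not say whether inference.sh writes PCM or 32-bit float WAV.

```python
import struct
from pathlib import Path

EXPECTED_RATE = 16_000  # README: enhanced files are written at a 16 kHz sample rate


def wav_sample_rate(data: bytes) -> int:
    """Read the sample rate from the 'fmt ' chunk of a RIFF/WAVE byte string.

    Works for any format tag (PCM, IEEE float, ...), because only the header is parsed.
    """
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12  # first chunk starts right after the 12-byte RIFF header
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        if chunk_id == b"fmt ":
            # fmt layout: format tag (H), channels (H), sample rate (I), ...
            _tag, _channels, rate = struct.unpack_from("<HHI", data, pos + 8)
            return rate
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no 'fmt ' chunk found")


def files_with_wrong_rate(out_dir: str = "decode/wav",
                          expected: int = EXPECTED_RATE) -> list[tuple[str, int]]:
    """Return (file name, sample rate) for each .wav in out_dir not at `expected` Hz."""
    bad = []
    for path in sorted(Path(out_dir).glob("*.wav")):
        rate = wav_sample_rate(path.read_bytes())  # reads the whole file; fine for a check
        if rate != expected:
            bad.append((path.name, rate))
    return bad
```

An empty list from files_with_wrong_rate() means every output file in ./decode/wav is at 16 kHz.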

Future Updates

A Python module will be available in the future.
