From 121b3a6b9b4534c9a183aab71e2b248c3d664086 Mon Sep 17 00:00:00 2001
From: Julius Richter
Date: Mon, 24 Jun 2024 13:50:08 +0200
Subject: [PATCH] clean up

---
index.html | 268 ++++++++++++++++++++++++++++++++---------------------
1 file changed, 162 insertions(+), 106 deletions(-)

diff --git a/index.html b/index.html
index c3ee1c8..bfd2333 100644
--- a/index.html
+++ b/index.html
@@ -67,11 +67,18 @@
}
-

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

+

+ EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
+

-

Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann

+

+ Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo
+ Gerkmann
+

-

Abstract

+

+ Abstract
+

We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality
@@ -84,9 +91,14 @@

Abstract

Dataset download links and automatic evaluation server can be found online.

-

EARS Dataset

+

+ EARS Dataset
+

-

The EARS dataset is characterized by its scale, diversity, and high recording quality. In Table 1, we list characteristics of the EARS dataset in comparison to other speech datasets.

+

+ The EARS dataset is characterized by its scale, diversity, and high recording quality. In Table 1, we list
+ characteristics of the EARS dataset in comparison to other speech datasets.
+

@@ -150,99 +162,106 @@

EARS Dataset

- EARS contains 100 h of anechoic speech recordings at 48 kHz from over 100 English speakers with high demographic diversity.
- The dataset spans the full range of human speech, including reading tasks in seven different reading styles, emotional reading
- and freeform speech in 22 different emotions, conversational speech, and non-verbal sounds like laughter or coughing. Reading
- tasks feature seven styles (regular, loud, whisper, fast, slow, high pitch, and low pitch). Additionally, the dataset features
- unconstrained freeform speech and speech in 22 different emotional styles. We provide transcriptions of the reading portion and
- meta-data of the speakers (gender, age, race, first language).
+ EARS contains 100 h of anechoic speech recordings at 48 kHz from over 100 English speakers with high demographic
+ diversity. The dataset spans the full range of human speech, including reading tasks in seven different reading
+ styles, emotional reading and freeform speech in 22 different emotions, conversational speech, and non-verbal sounds
+ like laughter or coughing. Reading tasks feature seven styles (regular, loud, whisper, fast, slow, high pitch, and
+ low pitch). Additionally, the dataset features unconstrained freeform speech and speech in 22 different emotional
+ styles. We provide transcriptions of the reading portion and meta-data of the speakers (gender, age, race, first
+ language).
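The file paths shown in the audio examples on this page (e.g. p002/emo_adoration_sentences.wav, p012/rainbow_05_whisper.wav) suggest that each recording's speaking style is encoded as a prefix of its filename inside a per-speaker directory. As a minimal sketch under that assumption (the directory layout and naming are inferred from the examples here, not from an official spec), recordings could be grouped by style like this:

```python
from collections import defaultdict
from pathlib import Path


def group_by_style(dataset_root: str) -> dict:
    """Group EARS recordings by the style prefix encoded in each filename,
    e.g. 'emo', 'rainbow', 'freeform', 'vegetative'.

    Assumes the p<speaker>/<style>_... .wav layout seen in the audio
    examples on this page (hypothetical; check the dataset docs).
    """
    groups = defaultdict(list)
    for wav in sorted(Path(dataset_root).glob("p*/*.wav")):
        # The first underscore-separated token of the stem names the style.
        style = wav.stem.split("_")[0]
        groups[style].append(wav)
    return dict(groups)
```

This keeps one pass over the tree and works regardless of how many speakers or styles are present.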

-

Audio Examples

+

+ Audio Examples
+

-

Here we present a few audio examples from the EARS dataset.

+

+ Here we present a few audio examples from the EARS dataset.
+

-p002/emo_adoration_sentences.wav
- -
+ p002/emo_adoration_sentences.wav
+ +
-p008/emo_contentment_sentences.wav
- -
+ p008/emo_contentment_sentences.wav
+ +
-p010/emo_cuteness_sentences.wav
- -
+ p010/emo_cuteness_sentences.wav
+ +
-p011/emo_anger_sentences.wav
- -
+ p011/emo_anger_sentences.wav
+ +
-p012/rainbow_05_whisper.wav
- -
+ p012/rainbow_05_whisper.wav
+ +
-p014/rainbow_04_loud.wav
- -
+ p014/rainbow_04_loud.wav
+ +
-p016/rainbow_03_regular.wav
- -
+ p016/rainbow_03_regular.wav
+ +
-p017/rainbow_08_fast.wav
- -
+ p017/rainbow_08_fast.wav
+ +
-p018/vegetative_eating.wav
- -
+ p018/vegetative_eating.wav
+ +
-p019/vegetative_yawning.wav
- -
+ p019/vegetative_yawning.wav
+ +
-p020/freeform_speech_01.wav
- + p020/freeform_speech_01.wav
+


-

Benchmarks

+

+ Benchmarks
+

The EARS dataset enables various speech processing tasks to be evaluated in a controlled and comparable way. Here, we
@@ -251,12 +270,14 @@

Benchmarks


-

EARS-WHAM

+

+ EARS-WHAM
+

- For the task of speech enhancement, we construct the EARS-WHAM dataset, which mixes speech from the EARS dataset with
- real noise recordings from the WHAM! dataset . More details can be
- found in the paper.
+ For the task of speech enhancement, we construct the EARS-WHAM dataset, which mixes speech from the EARS dataset
+ with real noise recordings from the WHAM! dataset . More details can
+ be found in the paper.
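Mixing clean speech with recorded noise at a controlled signal-to-noise ratio is the core operation behind a dataset like EARS-WHAM. The sketch below shows the generic recipe only; the actual EARS-WHAM construction (SNR range, normalization, train/test splits) is specified in the paper, and the function name and details here are illustrative assumptions:

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float):
    """Mix a clean speech signal with a noise signal at a target SNR in dB.

    Generic sketch, not the official EARS-WHAM recipe: the noise is
    looped/trimmed to the speech length and scaled so that
    10*log10(P_speech / P_noise) equals snr_db.
    """
    # Loop the noise if it is shorter than the speech, then trim.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    # Scale the noise to reach the target SNR.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silence
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))

    scaled_noise = scale * noise
    return speech + scaled_noise, scaled_noise
```

Returning the scaled noise alongside the mixture makes it easy to verify the realized SNR or to reuse the same noise instance for paired clean/noisy training examples.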

Results

@@ -264,8 +285,8 @@

Results

@@ -322,9 +343,18 @@

Results

- Table 2: Results on EARS-WHAM. Values indicate the mean of the metrics over the test set. The
- best results are highlighted in bold.
+ Table 2: Results on EARS-WHAM. Values indicate the mean of the metrics over the test set.
+ The best results are highlighted in bold.
-

Audio Examples

+

+ Audio Examples
+

-

Here we present audio examples for the speech enhancement task. Below we show the noisy input, processed files for Conv-TasNet , CDiffuSE , Demucs , SGMSE+ , and the clean ground truth.

+

+ Here we present audio examples for the speech enhancement task. Below we show the noisy input, processed files for
+ Conv-TasNet ,
+ CDiffuSE ,
+ Demucs ,
+ SGMSE+ ,
+ and the clean ground truth.
+

Select an audio file: