
Commit

clean up
Julius Richter committed Jun 24, 2024
1 parent 5c983ff commit 121b3a6
Showing 1 changed file with 162 additions and 106 deletions.
268 changes: 162 additions & 106 deletions index.html
@@ -67,11 +67,18 @@
}
</style>

<h1>
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
</h1>

<p class="left-align">Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann</p>
<p class="left-align">
Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo
Gerkmann
</p>

<h2>
Abstract
</h2>

<p>
We release the EARS (<b>E</b>xpressive <b>A</b>nechoic <b>R</b>ecordings of <b>S</b>peech) dataset, a high-quality
@@ -84,9 +91,14 @@ <h2>Abstract</h2>
Dataset download links and the automatic evaluation server can be found online.
</p>

<h2>
EARS Dataset
</h2>

<p>
The EARS dataset is characterized by its scale, diversity, and high recording quality. In Table 1, we list
characteristics of the EARS dataset in comparison to other speech datasets.
</p>

<div class="table-container">
<table>
@@ -150,99 +162,106 @@ <h2>EARS Dataset</h2>
</div>

<p>
EARS contains 100 h of anechoic speech recordings at 48 kHz from over 100 English speakers with high demographic
diversity. The dataset spans the full range of human speech: reading tasks in seven styles (regular, loud, whisper,
fast, slow, high pitch, and low pitch), emotional reading and freeform speech in 22 different emotions, unconstrained
freeform speech, conversational speech, and non-verbal sounds like laughter or coughing. We provide transcriptions of
the reading portion and metadata of the speakers (gender, age, race, first language).
</p>
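
<p>
As a minimal sketch of how the recordings might be accessed after download, the snippet below iterates over the
speaker folders and reads each file with the <code>soundfile</code> library. The root path is a placeholder, and the
per-speaker folder layout is inferred from the example paths listed below (e.g. p002/emo_adoration_sentences.wav).
</p>

<pre><code># Sketch: iterate over EARS speaker folders and load the 48 kHz recordings.
# Assumes the dataset was downloaded to ./EARS with one folder per speaker
# (p001, p002, ...), matching the example paths shown on this page.
from pathlib import Path

import soundfile as sf

dataset_root = Path("EARS")  # placeholder path

for speaker_dir in sorted(dataset_root.glob("p*")):
    for wav_path in sorted(speaker_dir.glob("*.wav")):
        audio, sample_rate = sf.read(wav_path)
        assert sample_rate == 48000  # all EARS recordings are 48 kHz
        duration = len(audio) / sample_rate
        print(f"{wav_path.relative_to(dataset_root)}: {duration:.1f} s")
        break  # only the first file per speaker in this sketch
</code></pre>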

<h3>
Audio Examples
</h3>

<p>
Here we present a few audio examples from the EARS dataset.
</p>

<p>
p002/emo_adoration_sentences.wav<br>
<audio controls>
<source src="https://www2.informatik.uni-hamburg.de/sp/audio/publications/interspeech2024-ears/files/ears/p002_emo_adoration_sentences.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
<br>

p008/emo_contentment_sentences.wav<br>
<audio controls>
<source src="https://www2.informatik.uni-hamburg.de/sp/audio/publications/interspeech2024-ears/files/ears/p008_emo_contentment_sentences.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
<br>

p010/emo_cuteness_sentences.wav<br>
<audio controls>
<source src="https://www2.informatik.uni-hamburg.de/sp/audio/publications/interspeech2024-ears/files/ears/p010_emo_cuteness_sentences.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
<br>

p011/emo_anger_sentences.wav<br>
<audio controls>
<source src="https://www2.informatik.uni-hamburg.de/sp/audio/publications/interspeech2024-ears/files/ears/p011_emo_anger_sentences.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
<br>

p012/rainbow_05_whisper.wav<br>
<audio controls>
<source src="https://www2.informatik.uni-hamburg.de/sp/audio/publications/interspeech2024-ears/files/ears/p012_rainbow_05_whisper.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
<br>

p014/rainbow_04_loud.wav<br>
<audio controls>
<source src="https://www2.informatik.uni-hamburg.de/sp/audio/publications/interspeech2024-ears/files/ears/p014_rainbow_04_loud.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
<br>

p016/rainbow_03_regular.wav<br>
<audio controls>
<source src="https://www2.informatik.uni-hamburg.de/sp/audio/publications/interspeech2024-ears/files/ears/p016_rainbow_03_regular.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
<br>

p017/rainbow_08_fast.wav<br>
<audio controls>
<source src="https://www2.informatik.uni-hamburg.de/sp/audio/publications/interspeech2024-ears/files/ears/p017_rainbow_08_fast.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
<br>

p018/vegetative_eating.wav<br>
<audio controls>
<source src="https://www2.informatik.uni-hamburg.de/sp/audio/publications/interspeech2024-ears/files/ears/p018_vegetative_eating.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
<br>

p019/vegetative_yawning.wav<br>
<audio controls>
<source src="https://www2.informatik.uni-hamburg.de/sp/audio/publications/interspeech2024-ears/files/ears/p019_vegetative_yawning.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
<br>

p020/freeform_speech_01.wav<br>
<audio controls>
<source src="https://www2.informatik.uni-hamburg.de/sp/audio/publications/interspeech2024-ears/files/ears/p020_freeform_speech_01.wav" type="audio/wav">
Your browser does not support the audio element.
</audio>
</p>

<br>

<h2>
Benchmarks
</h2>

<p>
The EARS dataset enables various speech processing tasks to be evaluated in a controlled and comparable way. Here, we
@@ -251,21 +270,23 @@ <h2>Benchmarks</h2>

<br>

<h3>
EARS-WHAM
</h3>

<p>
For the task of speech enhancement, we construct the EARS-WHAM dataset, which mixes speech from the EARS dataset
with real noise recordings from the WHAM! dataset <span class="reference" data-ref="wham"></span>. More details can
be found in the <a href="https://arxiv.org/abs/2406.06185" target="_blank">paper</a>.
</p>
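
<p>
As a rough illustration of this construction, the sketch below mixes a clean EARS utterance with a noise clip at a
target signal-to-noise ratio. The file names and the SNR value are placeholders; the exact SNR range, cropping, and
level conventions used for EARS-WHAM are described in the paper rather than here.
</p>

<pre><code># Sketch: mix clean speech with noise at a target SNR in dB.
# Assumes mono signals and a noise clip at least as long as the speech;
# file names and the SNR value are placeholders.
import numpy as np
import soundfile as sf

speech, sr = sf.read("p002_emo_adoration_sentences.wav")
noise, sr_noise = sf.read("wham_noise_clip.wav")
assert sr == sr_noise

noise = noise[: len(speech)]  # crop noise to the speech length
snr_db = 5.0                  # placeholder target SNR

speech_power = np.mean(speech ** 2)
noise_power = np.mean(noise ** 2)
scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))

noisy = speech + scale * noise
sf.write("noisy_mixture.wav", noisy, sr)
</code></pre>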

<h4>Results</h4>

<div class="table-container">
<table style="width:100%; border-collapse: collapse;">
<caption style="caption-side: bottom; text-align: left; padding: 8px; font-style: italic;">
<strong>Table 2: Results on EARS-WHAM.</strong> Values indicate the mean of the metrics over the test set.
The best results are highlighted in bold.
</caption>
<thead>
<tr style="border-top: 2px solid black; border-bottom: 2px solid black;">
@@ -322,9 +343,18 @@ <h4>Results</h4>
</table>
</div>
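
<p>
The metric values in Table 2 are averaged over the test set. As an example of one widely used intrusive metric for
speech enhancement, the sketch below computes the scale-invariant signal-to-distortion ratio (SI-SDR) between an
enhanced signal and the clean reference; it is a generic reference implementation, not the exact evaluation code used
for this benchmark.
</p>

<pre><code># Sketch: scale-invariant SDR (SI-SDR) in dB between an enhanced signal
# and the clean reference. Generic implementation, not the benchmark code.
import numpy as np

def si_sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Both inputs are 1-D arrays of equal length."""
    reference = reference - np.mean(reference)
    estimate = estimate - np.mean(estimate)
    # Project the estimate onto the reference (optimal scaling of the target).
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    distortion = estimate - target
    return 10 * np.log10(np.sum(target ** 2) / np.sum(distortion ** 2))
</code></pre>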

<h4>
Audio Examples
</h4>

<p>
Here we present audio examples for the speech enhancement task. Below we show the noisy input, processed files for
Conv-TasNet <span class="reference" data-ref="convtasnet"></span>,
CDiffuSE <span class="reference" data-ref="cdiffuse"></span>,
Demucs <span class="reference" data-ref="demucs"></span>,
SGMSE+ <span class="reference" data-ref="sgmse"></span>,
and the clean ground truth.
</p>

<p>Select an audio file: &nbsp;&nbsp;
<select id="audioSelect" onchange="playAudio()">
@@ -442,7 +472,9 @@ <h4>Audio Examples</h4>
</script>
<br>

<h3>
Blind test set
</h3>

<p>
We create a blind test set for which we only publish the noisy audio files but not the clean ground truth. It
@@ -455,8 +487,8 @@ <h4>Results</h4>
<div class="table-container">
<table>
<caption style="caption-side: bottom; text-align: left; padding: 8px; font-style: italic;">
<strong>Table 3: Results for the blind test.</strong> Values indicate the mean of the metrics over the test
set. The best results are highlighted in bold.
</caption>
<thead>
<tr style="border-top: 2px solid black; border-bottom: 2px solid black;">
@@ -513,10 +545,16 @@ <h4>Results</h4>
</table>
</div>

<h4>
Audio Examples
</h4>

<p>
Here we present audio examples for the blind test set. Below we show the noisy input, processed files for
Conv-TasNet <span class="reference" data-ref="convtasnet"></span>,
CDiffuSE <span class="reference" data-ref="cdiffuse"></span>,
Demucs <span class="reference" data-ref="demucs"></span>,
and SGMSE+ <span class="reference" data-ref="sgmse"></span>.
</p>

<p>Select an audio file: &nbsp;&nbsp;
@@ -624,7 +662,9 @@ <h4>Audio Examples</h4>
</script>
<br>

<h3>
Evaluation on real-world data
</h3>

<p>
This demo showcases the denoising capabilities of SGMSE+ <span class="reference" data-ref="sgmse"></span> trained using the EARS-WHAM dataset.
@@ -637,14 +677,20 @@ <h3>Evaluation on real-world data</h3>

<br>

<h3>
Dereverberation (EARS-Reverb)
</h3>

<p>
For the task of dereverberation, we use real recorded room impulse responses (RIRs) from multiple public datasets
<span class="reference" data-ref="ace air arni brudex dechorate detmoldsrir palimpsest"></span>. We generate
reverberant speech by convolving the clean speech with the RIR. More details can be found in the
<a href="https://arxiv.org/abs/2406.06185" target="_blank">paper</a>.
</p>
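
<p>
A minimal sketch of this generation step is shown below: a clean utterance is convolved with a measured RIR to obtain
the reverberant input. File names are placeholders, and any alignment or normalization applied when building
EARS-Reverb is described in the paper rather than here.
</p>

<pre><code># Sketch: create a reverberant utterance by convolving clean speech with a
# recorded room impulse response (RIR). File names are placeholders.
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

speech, sr = sf.read("p002_emo_adoration_sentences.wav")
rir, sr_rir = sf.read("measured_rir.wav")
assert sr == sr_rir

reverberant = fftconvolve(speech, rir, mode="full")[: len(speech)]
reverberant /= max(1.0, np.max(np.abs(reverberant)))  # avoid clipping
sf.write("reverberant.wav", reverberant, sr)
</code></pre>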

<h4>
Results
</h4>

<div class="table-container">
<table>
@@ -683,9 +729,14 @@ <h4>Results</h4>
</table>
</div>

<h4>
Audio Examples
</h4>

<p>
Here we present audio examples for the dereverberation task. Below we show the reverberant input, processed files
for SGMSE+ <span class="reference" data-ref="sgmse"></span>, and the clean ground truth.
</p>

<p>
Select an audio file: &nbsp;&nbsp;
@@ -772,10 +823,13 @@ <h4>Audio Examples</h4>

<br>

<h2 id="citation">Citation</h2>
<h2 id="citation">
Citation
</h2>

<p>
If you use the dataset or any derivative of it, please cite our
<a href="https://arxiv.org/abs/2406.06185" target="_blank">research paper</a>:
</p>

<p>
@@ -784,10 +838,12 @@ <h2 id="citation">Citation</h2>
author={Richter, Julius and Wu, Yi-Chiao and Krenn, Steven and Welker, Simon and Lay, Bunlong and Watanabe, Shinji and Richard, Alexander and Gerkmann, Timo},
booktitle={Interspeech},
year={2024}{% endraw %}
}</code></pre>
</p>

<h2>
References
</h2>

<ol id="refList" style="list-style-type: none; padding: 0;">
<!-- JavaScript will populate this list -->