
Speech Paper Voting Round 2 #12

Open

EverlynAsiko opened this issue Feb 2, 2022 · 4 comments

@EverlynAsiko
In this issue you can either:

  • Add papers that you think are interesting to read and discuss (please stick to the format).
  • Vote: cast your vote with a 👍 reaction on the paper's comment.

Example: Voting Paper #1

I have added some papers collected from the Papers to read sheet.

@EverlynAsiko

Listen, Attend and Spell

Link to paper

Abstract

We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters. This is the key improvement of LAS over previous end-to-end CTC models. On a subset of the Google voice search task, LAS achieves a word error rate (WER) of 14.1% without a dictionary or a language model, and 10.3% with language model rescoring over the top 32 beams. By comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 8.0%.
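To make the listener concrete, here is a minimal sketch (not the authors' implementation) of a pyramidal BiLSTM encoder in PyTorch. The hidden size and class name are illustrative assumptions; the 40-dim filter-bank input and the three pyramidal layers (an 8x reduction in time resolution) follow the paper's setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidalListener(nn.Module):
    """Sketch of the LAS listener: a BiLSTM followed by pyramidal
    BiLSTM layers that halve the time axis by concatenating adjacent
    frames, so the speller attends over far fewer encoder states."""

    def __init__(self, input_dim=40, hidden_dim=256, pyramid_layers=3):
        super().__init__()
        self.base = nn.LSTM(input_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Each pyramidal layer consumes two concatenated BiLSTM frames.
        self.pyramid = nn.ModuleList(
            nn.LSTM(4 * hidden_dim, hidden_dim,
                    batch_first=True, bidirectional=True)
            for _ in range(pyramid_layers)
        )

    def forward(self, feats):
        # feats: (batch, time, input_dim) filter-bank spectra
        h, _ = self.base(feats)                 # (B, T, 2*hidden)
        for lstm in self.pyramid:
            B, T, D = h.shape
            if T % 2:                           # pad odd-length sequences
                h = F.pad(h, (0, 0, 0, 1))
                T += 1
            h = h.reshape(B, T // 2, 2 * D)     # halve the time axis
            h, _ = lstm(h)
        return h                                # (B, ~T/8, 2*hidden)

# 100 input frames shrink to 13 encoder states for the attention decoder.
enc = PyramidalListener()(torch.randn(2, 100, 40))
print(enc.shape)  # torch.Size([2, 13, 512])
```

The shrinking matters because the speller's attention must scan every encoder state at every output step; attending over T/8 states instead of T makes both training and decoding far cheaper.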

@EverlynAsiko changed the title from "Speech Paper Voting #1" to "Speech Paper Voting Round 1" on Feb 2, 2022
@EverlynAsiko changed the title from "Speech Paper Voting Round 1" to "Speech Paper Voting Round 2" on Feb 16, 2022
@EverlynAsiko

Automatic speech recognition: a survey

Link to paper

Abstract

Recently, great strides have been made in the field of automatic speech recognition (ASR) by using various deep learning techniques. In this study, we present a thorough comparison of the cutting-edge techniques currently being used in this area, with a special focus on the various deep learning methods. This study explores different feature extraction methods and state-of-the-art classification models, and their impact on ASR performance. As deep learning techniques are very data-dependent, the different speech datasets that are available online are also discussed in detail. Finally, the various online toolkits, resources, and language models that can be helpful in building an ASR system are presented. In this study, we have captured every aspect that can impact the performance of an ASR system; hence, we believe that this work is a good starting point for academics interested in ASR research.
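Since the survey foregrounds feature extraction, here is a short hedged sketch of computing log-mel filter-bank features with torchaudio. The file path is a placeholder, and the window/hop sizes assume 16 kHz audio:

```python
import torch
import torchaudio

# Log-mel filter-bank features: the standard front-end for the neural
# ASR models discussed in this thread.
waveform, sample_rate = torchaudio.load("utterance.wav")  # placeholder path

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=400,        # 25 ms analysis window at 16 kHz
    hop_length=160,   # 10 ms frame shift
    n_mels=40,        # 40 mel bands, matching the LAS input above
)(waveform)

log_mel = torch.log(mel + 1e-6)   # (channels, n_mels, frames)
print(log_mel.shape)
```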

@JRMeyer

JRMeyer commented Feb 16, 2022

Sequence Transduction with Recurrent Neural Networks

The transducer described in this paper extends CTC by defining a distribution over output sequences of all lengths, and by jointly modelling both input-output and output-output dependencies.

Link to paper

Abstract

Many machine learning tasks can be expressed as the transformation — or transduction — of input sequences into output sequences: speech recognition, machine translation, protein secondary structure prediction and text-to-speech to name but a few. One of the key challenges in sequence transduction is learning to represent both the input and output sequences in a way that is invariant to sequential distortions such as shrinking, stretching and translating. Recurrent neural networks (RNNs) are a powerful sequence learning architecture that has proven capable of learning such representations. However, RNNs traditionally require a pre-defined alignment between the input and output sequences to perform transduction. This is a severe limitation since finding the alignment is the most difficult aspect of many sequence transduction problems. Indeed, even determining the length of the output sequence is often challenging. This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence. Experimental results for phoneme recognition are provided on the TIMIT speech corpus.
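To make the "all alignments" idea concrete, here is a minimal NumPy sketch (an illustration, not the paper's code) of the transducer forward recursion. It assumes a precomputed lattice log_probs[t, u, k], which in a real model would come from the joint network combining the transcription (encoder) and prediction networks; summing over every interleaving of label emissions and blanks is exactly how the transducer defines a distribution over output sequences of all lengths.

```python
import numpy as np

def rnnt_log_likelihood(log_probs, labels, blank=0):
    """Forward algorithm over the RNN-T lattice.

    log_probs: (T, U+1, K) log-distribution over K symbols (index 0 =
               blank) at each (time step t, labels-emitted-so-far u).
    labels:    the length-U target sequence.
    Returns log P(labels | input), summed over all alignments.
    """
    T, U_plus_1, _ = log_probs.shape
    U = len(labels)
    assert U_plus_1 == U + 1

    alpha = np.full((T, U + 1), -np.inf)   # alpha[t, u]: log-prob of
    alpha[0, 0] = 0.0                      # emitting u labels by time t
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Arrive by emitting label u (moving up in the lattice)...
            emit = (alpha[t, u - 1] + log_probs[t, u - 1, labels[u - 1]]
                    if u > 0 else -np.inf)
            # ...or by emitting blank (advancing one time step).
            stay = (alpha[t - 1, u] + log_probs[t - 1, u, blank]
                    if t > 0 else -np.inf)
            alpha[t, u] = np.logaddexp(emit, stay)

    # Finish with a final blank after the last label.
    return alpha[T - 1, U] + log_probs[T - 1, U, blank]

# Toy check: 4 frames, target [1, 2], vocabulary {blank, 1, 2}.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3, 3))
log_probs = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
print(rnnt_log_likelihood(log_probs, [1, 2]))
```

In practice one would train with an optimized implementation of this recursion (e.g. torchaudio's RNNTLoss) rather than an O(T·U) Python loop, but the lattice above is the whole idea: unlike CTC, the label distribution at each node depends on the labels already emitted.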
