---

copyright:

lastupdated: "2025-02-05"

subcollection: speech-to-text

---

{{site.data.keyword.attribute-definition-list}}

# The science behind the service
{: #science}
{{site.data.keyword.IBM}} has been at the forefront of speech recognition research since the early 1960s and continues this rich tradition of research and development with the {{site.data.keyword.speechtotextfull}} service.
{: shortdesc}
{{site.data.keyword.IBM_notm}} has demonstrated industry-record speech recognition accuracy on the public benchmark data sets for Conversational Telephone Speech (CTS) and Broadcast News (BN) transcription. In this work, {{site.data.keyword.IBM_notm}} demonstrated the effectiveness of neural networks for both acoustic and language modeling.
The following announcements summarize {{site.data.keyword.IBM_notm}}'s recent speech recognition accomplishments:
- Reaching new records in speech recognition
- {{site.data.keyword.IBM_notm}} Breaks Industry Record for Conversational Speech Recognition by Extending Deep Learning Technologies
- {{site.data.keyword.IBM_notm}} Sets New Transcription Performance Milestone on Automatic Broadcast News Captioning
- IBM Research AI Advances Speaker Diarization in Real Use Cases
- Advancing RNN Transducer Technology for Speech Recognition
These accomplishments continue to advance {{site.data.keyword.IBM_notm}}'s speech services. Recent innovations that best fit the cloud-based {{site.data.keyword.speechtotextshort}} service include the following:
- For language modeling, {{site.data.keyword.IBM_notm}} leverages a neural network-based language model to generate training text.
- For acoustic modeling, {{site.data.keyword.IBM_notm}} uses a fairly compact model to accommodate the resource limitations of the cloud. To train this compact model, {{site.data.keyword.IBM_notm}} uses teacher-student training, also known as knowledge distillation. Large, powerful neural networks such as Long Short-Term Memory (LSTM) networks, VGG, and Residual Networks (ResNet) are trained first. The outputs of these networks then serve as teacher signals to train a compact model for actual deployment.
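The teacher-student idea can be illustrated with a minimal sketch. This is not the service's actual training code; it is a plain-NumPy illustration, with hypothetical function names, of the core objective: the compact student model is trained to match the teacher network's softened output distribution rather than only the hard labels.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax with a temperature that softens the distribution when T > 1."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's softened distribution (the
    'teacher signal') and the student's softened distribution.
    Minimizing this pushes the student toward the teacher's outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature))
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())
```

In practice this distillation term is typically combined with the standard cross-entropy loss on the ground-truth labels, and the gradients update only the student; the large teacher networks stay fixed and are not deployed.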
To push the envelope further, {{site.data.keyword.IBM_notm}} also focuses on end-to-end modeling. For example, it has established a strong modeling pipeline for direct acoustic-to-word models that it continues to improve. It is also working to create compact end-to-end models for future deployment on the cloud.