ASR Timeline #2

Open
wants to merge 2 commits into
base: master
99 changes: 49 additions & 50 deletions html/asr.html
@@ -45,77 +45,76 @@ <h1>Automatic Speech Recognition</h1>
<div class="container">
<ul>
<li>
<h3 class="heading">IndicWav2Vec</h3>
<p>IndicWav2Vec is a speech model pretrained on 17,000 hours of unlabelled audio across 40 Indian
languages, offering the most extensive language coverage among models tailored for Indian
languages. </p>
<a href="github"><i class="fa-brands fa-github"></i></a>
<a href="arxiv"><i class="fa-regular fa-file-pdf"></i></a>
<a href="jupyter"><img src="../assets/logos/colab.png"></i></a>
<span class="date dataset">February 2021</span>
<h3 class="heading">IndicVoices 2.0</h3>
<p>IndicVoices 2.0 is a dataset of natural and spontaneous speech containing a total of 12,000 hours of read (8%), extempore (76%) and conversational (15%) audio from 22,563 speakers covering 208 Indian districts and 22 languages. Of these 12,000 hours, 3,200 hours have already been transcribed, with a median of 122 hours per language.</p>
<a href="https://github.com/AI4Bharat/IndicVoices"><img src="../assets/logos/github.png"></a>
<a href="https://arxiv.org/abs/2403.01926"><img src="../assets/logos/paper.png"></a>
<!-- <a href="jupyter"><img src="../assets/logos/colab.png"></i></a> -->
<span class="date dataset">June 2024</span>
<span class="circle"></span>
</li>
<li>
<h3 class="heading">Some Model</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Fugit excepturi accusamus minus
totam </p>
<a href="github"><i class="fa-brands fa-github"></i></a>
<a href="arxiv"><i class="fa-regular fa-file-pdf"></i></a>
<span class="date model">February 2021</span>
<h3 class="heading">Lahaja</h3>
<p>Lahaja is a benchmark featuring 12.5 hours of Hindi audio to facilitate a comprehensive assessment of Hindi ASR systems across various accents. This dataset includes read and spontaneous speech on diverse topics, collected from 132 speakers across 83 districts in India.</p>
<a href="https://github.com/AI4Bharat/Lahaja"><img src="../assets/logos/github.png"></a>
<a href="https://arxiv.org/abs/2408.11440"><img src="../assets/logos/paper.png"></a>
<span class="date dataset">June 2024</span>
<span class="circle"></span>
</li>
<li>
<h3 class="heading">Some Dataset</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Fugit excepturi accusamus minus
totam </p>
<a href="github"><i class="fa-brands fa-github"></i></a>
<a href="arxiv"><i class="fa-regular fa-file-pdf"></i></a>
<span class="date dataset">March 2021</span>
<h3 class="heading">IndicVoices 1.0</h3>
<p>IndicVoices 1.0 is a dataset of natural and spontaneous speech containing a total of 7,348 hours of read (9%), extempore (74%) and conversational (17%) audio from 16,237 speakers covering 145 Indian districts and 22 languages. Of these 7,348 hours, 1,639 hours have already been transcribed, with a median of 73 hours per language.</p>
<a href="https://github.com/AI4Bharat/IndicVoices"><img src="../assets/logos/github.png"></a>
<a href="https://arxiv.org/abs/2403.01926"><img src="../assets/logos/paper.png"></a>
<span class="date dataset">March 2024</span>
<span class="circle"></span>
</li>
<li>
<h3 class="heading">Some Model</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Fugit excepturi accusamus minus
totam </p>
<a href="github"><i class="fa-brands fa-github"></i></a>
<a href="arxiv"><i class="fa-regular fa-file-pdf"></i></a>
<span class="date model">April 2021</span>
<h3 class="heading">Svarah</h3>
<p>Svarah is a benchmark addressing gaps in ASR performance on Indian accents, featuring 9.6 hours of transcribed English audio from 117 speakers across 65 locations in India. It includes both read and spontaneous speech across various domains, ensuring diverse vocabulary.</p>
<a href="https://github.com/AI4Bharat/Svarah"><img src="../assets/logos/github.png"></a>
<a href="https://arxiv.org/abs/2305.15760"><img src="../assets/logos/paper.png"></a>
<span class="date dataset">August 2023</span>
<span class="circle"></span>
</li>
<li>
<h3 class="heading">Some Model</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Fugit excepturi accusamus minus
totam </p>
<a href="github"><i class="fa-brands fa-github"></i></a>
<a href="arxiv"><i class="fa-regular fa-file-pdf"></i></a>
<span class="date model">April 2021</span>
<h3 class="heading">IndicWhisper</h3>
<p>IndicWhisper is OpenAI’s Whisper model fine-tuned on the Vistaar-train set, which contains over 10,000 hours of audio across 12 Indian languages.</p>
<a href="https://github.com/AI4Bharat/vistaar"><img src="../assets/logos/github.png"></a>
<a href="https://arxiv.org/abs/2305.15386"><img src="../assets/logos/paper.png"></a>
<span class="date model">July 2023</span>
<span class="circle"></span>
</li>
<li>
<h3 class="heading">Some Dataset</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Fugit excepturi accusamus minus
totam </p>
<a href="github"><i class="fa-brands fa-github"></i></a>
<a href="arxiv"><i class="fa-regular fa-file-pdf"></i></a>
<span class="date dataset">April 2021</span>
<h3 class="heading">Kathbath</h3>
<p>Kathbath is a comprehensive dataset comprising 1,684 hours of labeled speech data collected from 1,218 contributors across 203 districts in India, spanning 12 Indian languages. </p>
<a href="https://github.com/AI4Bharat/IndicSUPERB"><img src="../assets/logos/github.png"></a>
<a href="https://arxiv.org/abs/2208.11761"><img src="../assets/logos/paper.png"></a>
<span class="date dataset">February 2023</span>
<span class="circle"></span>
</li>
<li>
<h3 class="heading">Some Model</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Fugit excepturi accusamus minus
totam </p>
<a href="github"><i class="fa-brands fa-github"></i></a>
<a href="arxiv"><i class="fa-regular fa-file-pdf"></i></a>
<span class="date model">April 2021</span>
<h3 class="heading">Shrutilipi</h3>
<p>Shrutilipi is a dataset with 6,400+ hours of labeled audio across 12 Indian languages, totaling 4.95M sentences, created by mining text-audio pairs from All India Radio.</p>
<a href="https://github.com/AI4Bharat/Shrutilipi"><img src="../assets/logos/github.png"></a>
<a href="https://arxiv.org/abs/2208.12666"><img src="../assets/logos/paper.png"></a>
<span class="date dataset">August 2022</span>
<span class="circle"></span>
</li>
<li>
<h3 class="heading">Some Dataset</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Fugit excepturi accusamus minus
totam </p>
<a href="github"><i class="fa-brands fa-github"></i></a>
<a href="arxiv"><i class="fa-regular fa-file-pdf"></i></a>
<span class="date dataset">April 2021</span>
<h3 class="heading">Dhwani</h3>
<p>Dhwani is an unlabelled audio dataset consisting of 17,000 hours of raw speech data for 40 Indian languages from a wide variety of domains including education, news, technology, and finance.</p>
<a href="https://github.com/AI4Bharat/IndicWav2Vec/tree/main/data_prep_scripts/urls"><img src="../assets/logos/github.png"></a>
<a href="https://arxiv.org/abs/2111.03945"><img src="../assets/logos/paper.png"></a>
<span class="date dataset">February 2022</span>
<span class="circle"></span>
</li>
<li>
<h3 class="heading">IndicWav2Vec</h3>
<p>IndicWav2Vec is a speech model pretrained on 17,000 hours of unlabelled audio across 40 Indian languages, offering the most extensive language coverage among models tailored for Indian languages. </p>
<a href="https://github.com/AI4Bharat/IndicWav2Vec"><img src="../assets/logos/github.png"></a>
<a href="https://arxiv.org/abs/2111.03945"><img src="../assets/logos/paper.png"></a>
<span class="date model">February 2022</span>
<span class="circle"></span>
</li>
</ul>