Commit: Update reading-group.md · Create 2024-11-22-william-held.md

Showing 2 changed files with 39 additions and 1 deletion.
2024-11-22-william-held.md
@@ -0,0 +1,38 @@
---
title: "Distilling an End-to-End Voice Assistant Without Instruction Training Data"
venue: Georgia Tech
names: William Held
author: William Held
tags:
- NLP RG
categories:
- Reading-Group
- Fall-2024
layout: archive
classes:
- wide
- no-sidebar
---
*{{ page.names }}*

**{{ page.venue }}**

{% include display-publication-links.html pub=page %}

The [NLP Reading Group]({% link _pages/reading-group.md %}) is excited to host [William Held](https://williamheld.com/), a PhD student at Georgia Tech, who will be speaking remotely on Zoom on Friday, November 22nd, about "Distilling an End-to-End Voice Assistant Without Instruction Training Data".
## Talk Description

In this talk, I'll cover the methods we used to train our Distilled Voice Assistant (DiVA) model. Recent efforts to simplify spoken NLP with end-to-end Speech Large Language Models (LLMs) trained with supervised finetuning (SFT) have led to models "forgetting" capabilities from text-only LLMs. Our work proposes an alternative paradigm for training Speech LLMs without instruction data, using the response of a text-only LLM to transcripts as self-supervision. Importantly, this process can be performed directly with ASR data. We show that our Distilled Voice Assistant (DiVA) generalizes to unseen tasks and improves user experience, achieving a 72% win rate compared with state-of-the-art open models like Qwen 2 Audio. Finally, I'll cover the open-source efforts we've made to support training and demoing Speech LLM systems.
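To make the training paradigm concrete, here is a minimal PyTorch-style sketch of the distillation idea. It is illustrative only: the `speech_llm` and `text_llm` interfaces, the `generate` behavior, and the single KL objective are assumptions, not the DiVA implementation; see the paper for the exact objective.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch only: `speech_llm` and `text_llm` are illustrative
# stand-ins for a trainable speech LLM and a frozen text-only teacher.

def distillation_step(speech_llm, text_llm, optimizer, audio, transcript_ids):
    """One step of self-supervised distillation from a single ASR example.

    The frozen text-only LLM acts as the teacher: its own response to the
    transcript supervises the speech model, which sees only raw audio.
    """
    with torch.no_grad():
        # Teacher responds to the ASR transcript; assume `generate`
        # returns just the newly generated response tokens.
        response_ids = text_llm.generate(transcript_ids, max_new_tokens=256)
        teacher_logits = text_llm(
            torch.cat([transcript_ids, response_ids], dim=-1)
        ).logits[:, -response_ids.size(-1):]

    # Student is teacher-forced on the same response tokens, conditioned
    # on audio instead of text; no instruction data is involved.
    student_logits = speech_llm(
        audio=audio, response_ids=response_ids
    ).logits[:, -response_ids.size(-1):]

    # Pull the student's next-token distributions toward the teacher's.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the supervision signal is just the teacher's own response to the transcript, any paired (audio, transcript) ASR corpus can drive training, which is why no instruction data is needed.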
## Speaker Bio

William Held is a Machine Learning PhD student at Georgia Tech, advised by Diyi Yang in the Stanford NLP Group. Before that, he was an early engineer at Sunshine. His research focuses on enabling inclusive language technology by modeling linguistic variation.
## Logistics

Date: November 22nd<br>
Time: 11:30 AM<br>
Location: Zoom (see email)