From 1025b6a5b57850695bacca70d364771d44a9cd6c Mon Sep 17 00:00:00 2001
From: oriern
Date: Thu, 21 Nov 2024 16:47:25 -0500
Subject: [PATCH] Adding William Held (#434)

* Update reading-group.md

* Create 2024-11-22-william-held.md
---
 _pages/reading-group.md                       |  2 +-
 .../fall-2024/2024-11-22-william-held.md      | 38 +++++++++++++++++++
 2 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 _posts/reading-group/fall-2024/2024-11-22-william-held.md

diff --git a/_pages/reading-group.md b/_pages/reading-group.md
index 430881b..a5bc696 100644
--- a/_pages/reading-group.md
+++ b/_pages/reading-group.md
@@ -28,7 +28,7 @@ For the Fall 2024 semester, the reading group will meet on Fridays at 11:30AM (w
 | November 1st @ 11:45 AM | Bang Liu | Applications and Enhancements of LLM Agents Across Diverse Environments | [click here]({% link _posts/reading-group/fall-2024/2024-11-1-bang-liu.md %}) |
 | November 8th @ 11:30 AM | Boyuan Zheng | Towards a Generalist Web Agent | [click here]({% link _posts/reading-group/fall-2024/2024-11-07-boyuan-zheng.md %}) |
 | November 12th to 16th | **EMNLP 2024** | | |
-| November 22nd @ 11:30 AM | William Held | *TBA* | *TBA* |
+| November 22nd @ 11:30 AM | William Held | Distilling an End-to-End Voice Assistant Without Instruction Training Data | [click here]({% link _posts/reading-group/fall-2024/2024-11-22-william-held.md %}) |
 | November 29th @ 11:30 AM | Luke Guerdan | *TBA* | *TBA* |
 | December 6th @ 11:30 AM | Amal Zouaq | *TBA* | *TBA* |
diff --git a/_posts/reading-group/fall-2024/2024-11-22-william-held.md b/_posts/reading-group/fall-2024/2024-11-22-william-held.md
new file mode 100644
index 0000000..c01be02
--- /dev/null
+++ b/_posts/reading-group/fall-2024/2024-11-22-william-held.md
@@ -0,0 +1,38 @@
+---
+title: "Distilling an End-to-End Voice Assistant Without Instruction Training Data"
+venue: Georgia Tech
+names: William Held
+author: William Held
+tags:
+- NLP RG
+categories:
+  - Reading-Group
+  - Fall-2024
+layout: archive
+classes:
+  - wide
+  - no-sidebar
+---
+
+*{{ page.names }}*
+
+**{{ page.venue }}**
+
+{% include display-publication-links.html pub=page %}
+
+The [NLP Reading Group]({% link _pages/reading-group.md %}) is excited to host [William Held](https://williamheld.com/), a PhD student at Georgia Tech, who will be speaking remotely on Zoom on Friday, November 22nd about "Distilling an End-to-End Voice Assistant Without Instruction Training Data".
+
+
+## Talk Description
+
+In this talk, I'll cover the methods we used to train our Distilled Voice Assistant (DiVA) model. Recent efforts to simplify spoken NLP with end-to-end Speech Large Language Models (LLMs) trained with supervised finetuning (SFT) have led to models "forgetting" capabilities from text-only LLMs. Our work proposes an alternative paradigm for training Speech LLMs without instruction data, using the response of a text-only LLM to transcripts as self-supervision. Importantly, this process can be performed directly with ASR data. We show that our Distilled Voice Assistant (DiVA) generalizes to unseen tasks and improves user experience, achieving a 72% win rate compared with state-of-the-art open models like Qwen 2 Audio. Finally, I'll cover the open-source efforts we've made to support training and demoing Speech LLM systems.
+
+## Speaker Bio
+
+William Held is a Machine Learning PhD student at Georgia Tech, advised by Diyi Yang in the Stanford NLP Group. Before that, he was an early engineer at Sunshine. <br> His research focuses on enabling inclusive language technology by modeling linguistic variation.
+
+## Logistics
+
+Date: November 22nd<br>
+Time: 11:30AM<br>
+Location: Zoom (See email)<br>