From e3cf872474489bdbe7b765a75e8a8a7aff728d85 Mon Sep 17 00:00:00 2001
From: cesare-spinoso <36133778+cesare-spinoso@users.noreply.github.com>
Date: Tue, 26 Nov 2024 10:09:49 -0500
Subject: [PATCH] Adding Luke Guerdan's talk (#435)

* fix bugs and add speaker

* commit

* add speaker
---
 _pages/reading-group.md                  |  4 +-
 .../fall-2024/2024-11-29-luke-guerdan.md | 40 +++++++++++++++++++
 2 files changed, 42 insertions(+), 2 deletions(-)
 create mode 100644 _posts/reading-group/fall-2024/2024-11-29-luke-guerdan.md

diff --git a/_pages/reading-group.md b/_pages/reading-group.md
index a5bc696..105a71b 100644
--- a/_pages/reading-group.md
+++ b/_pages/reading-group.md
@@ -29,8 +29,8 @@ For the Fall 2024 semester, the reading group will meet on Fridays at 11:30AM (w
 | November 8th @ 11:30 AM | Boyuan Zheng | Towards a Generalist Web Agent | [click here]({% link _posts/reading-group/fall-2024/2024-11-07-boyuan-zheng.md %}) |
 | November 12th to 16th | **EMNLP 2024** | | |
 | November 22nd @ 11:30 AM | William Held | Distilling an End-to-End Voice Assistant Without Instruction Training Data | [click here]({% link _posts/reading-group/fall-2024/2024-11-22-william-held.md %}) |
-| November 29th @ 11:30 AM | Luke Guerdan | *TBA* | *TBA* |
-| December 6th @ 11:30 AM | Amal Zouaq | *TBA* | *TBA* |
+| November 29th @ 11:30 AM | Luke Guerdan | Towards Principled Model Evaluation Under Imperfect "Ground Truth" Labels | [click here]({% link _posts/reading-group/fall-2024/2024-11-29-luke-guerdan.md %}) |
+| December 6th @ 11:30 AM | Poster presentations for [Comp 767](https://mcgill-nlp.github.io/teaching/comp767-ling782-F24/) | Various LLM-related topics - To be hosted at the Mila Agora | |
 
 ## History
 
diff --git a/_posts/reading-group/fall-2024/2024-11-29-luke-guerdan.md b/_posts/reading-group/fall-2024/2024-11-29-luke-guerdan.md
new file mode 100644
index 0000000..3cb21bb
--- /dev/null
+++ b/_posts/reading-group/fall-2024/2024-11-29-luke-guerdan.md
@@ -0,0 +1,40 @@
+---
+title: Towards Principled Model Evaluation Under Imperfect "Ground Truth" Labels
+venue: Carnegie Mellon University
+names: Luke Guerdan
+author: Luke Guerdan
+tags:
+- NLP RG
+categories:
+  - Reading-Group
+  - Fall-2024
+layout: archive
+classes:
+  - wide
+  - no-sidebar
+---
+
+*{{ page.names }}*
+
+**{{ page.venue }}**
+
+{% include display-publication-links.html pub=page %}
+
+The [NLP Reading Group]({% link _pages/reading-group.md %}) is excited to host [Luke Guerdan](https://lukeguerdan.com/), a PhD student at CMU, who will be speaking remotely on Zoom on Friday, November 29th about "Towards Principled Model Evaluation Under Imperfect 'Ground Truth' Labels".
+
+## Talk Description
+
+In many evaluation contexts, “ground truth” labels are an imperfect proxy for the broader capabilities or limitations of interest, such as the “relevance” of retrieval-augmented generation (RAG) outputs or the “toxicity” of chatbot responses. How can we conduct statistically rigorous and informative performance evaluations under an imperfect gold standard?
+
+In this talk, I begin by addressing this question in the context of predictive modeling for algorithmic decision support. I describe an approach that leverages structured human feedback in the form of expert anchor assumptions that better connect observable proxy labels to unobservable constructs of interest. I validate this approach theoretically and empirically demonstrate that measurement error modeling is critical for learning reliable models. I conclude by illustrating that a similar approach is necessary when evaluating LLMs under violations of the “gold labels” assumption.
+
+## Speaker Bio
+
+Luke Guerdan is a Ph.D. student in the Human-Computer Interaction Institute at Carnegie Mellon University. His research focuses on developing tools to evaluate the capabilities and limitations of human-algorithmic systems under imperfect labels.
+Luke's work has been recognized with an ACM FAccT Best Paper Award and an NSF Graduate Research Fellowship.
+
+## Logistics
+
+Date: November 29th<br>
+Time: 11:30AM<br>
+Location: Zoom (See email)