---
title: Towards Principled Model Evaluation Under Imperfect "Ground Truth" Labels
venue: Carnegie Mellon University
names: Luke Guerdan
author: Luke Guerdan
tags:
- NLP RG
categories:
- Reading-Group
- Fall-2024
layout: archive
classes:
- wide
- no-sidebar
---
*{{ page.names }}*

**{{ page.venue }}**

{% include display-publication-links.html pub=page %}
The [NLP Reading Group]({% link _pages/reading-group.md %}) is excited to host [Luke Guerdan](https://lukeguerdan.com/), a PhD student at CMU, who will speak remotely over Zoom on Friday, November 29th, about "Towards Principled Model Evaluation Under Imperfect 'Ground Truth' Labels".
## Talk Description

In many evaluation contexts, "ground truth" labels are an imperfect proxy for the broader capabilities or limitations of interest, such as the "relevance" of retrieval-augmented generation (RAG) outputs or the "toxicity" of chatbot responses. How can we conduct statistically rigorous and informative performance evaluations under an imperfect gold standard?

In this talk, I begin by addressing this question in the context of predictive modeling for algorithmic decision support. I describe an approach that leverages structured human feedback, in the form of expert anchor assumptions, to better connect observable proxy labels to unobservable constructs of interest. I validate this approach theoretically and empirically demonstrate that measurement error modeling is critical for learning reliable models. I conclude by illustrating that a similar approach is necessary when evaluating LLMs under violations of the "gold labels" assumption.
## Speaker Bio

Luke Guerdan is a PhD student in the Human-Computer Interaction Institute at Carnegie Mellon University. His research focuses on developing tools to evaluate the capabilities and limitations of human-algorithmic systems under imperfect labels. Luke's work has been recognized with an ACM FAccT Best Paper Award and an NSF Graduate Research Fellowship.
## Logistics

Date: November 29th<br>
Time: 11:30 AM<br>
Location: Zoom (see email)