People

Please add your name and an intro. (Let's order alphabetically by last name)

  • David Atkinson. PhD student at Northeastern University's Interpretable Deep Learning Lab, interested in mechanistic interpretability.
  • David Bau. Northeastern University Professor, directs the Interpretable Deep Learning Lab and NDIF.
  • Jannik Brinkmann. PhD student interested in mechanistic interpretability. Previously visiting student at David Bau's Interpretable Deep Learning Lab.
  • Katrina Brown. BA/MS student at Harvard interested in control, fairness, representations.
  • Niv Cohen. Research Scientist (Postdoc) at New York University. Interested in AI Safety, anomaly detection, and disentanglement.
  • Trevor DePodesta. PhD student at Harvard Insight+Interaction Lab. Interested in interpretability for ethical Human-AI Interaction.
  • Clément Dumas. Neel Nanda MATS stream (previously EPFL). Interested in applying model diffing to AI safety.
  • Matthew Kowal. Researcher at FAR AI, and PhD candidate at York University (Toronto). Interested in both theoretical aspects and practical applications of interpretability and AI Safety.
  • Andrew Lee. Postdoctoral fellow at Harvard Insight+Interaction Lab. Interested in neural network representations!
  • Victoria Li. Harvard BA/MS student interested in interp/representations/control!
  • Can Rager. Incoming PhD student at David Bau's Interpretable Deep Learning Lab. Interested in AI auditing and mechanistic interpretability.
  • Shivam Raval. PhD student in Physics at Harvard Insight+Interaction Lab. Interested in explaining and visualizing clustering structures in high-dimensional data and interpreting latent activations in frontier AI models.
  • Naomi Saphra. Kempner Research Fellow at Harvard. Interested in understanding how reasoning develops and detecting its failure modes through internal representations.
  • Sigurd Schacht. COAI Research - AI Safety, Interpretability. Interested in understanding reasoning models - especially reasoning in latent space and behavior analysis.
  • Kunvar Thaman. Standard Intelligence. Machine learning engineer focused on architecture search and meta-learning research. Also excited about mech interp and learning interesting representations in NNs.
  • Dmitrii Troitskii. Independent (Previously @ NDIF and BauLab). Interested in more rigorous approaches towards interpretability.
  • Fernanda Viégas. At Harvard Insight+Interaction Lab. Interested in AI interpretability in general and, more specifically, in finding useful ways to bring interpretability to Human-AI Interaction.
  • Martin Wattenberg. At Harvard Insight+Interaction Lab. Interested in geometric approaches to interpretability, and ways for people to control AI output.
  • Melanie Weber. At Harvard Geometric Machine Learning Group. Interested in leveraging geometric structure in data for the design of efficient and interpretable machine learning methods.
  • Chris Wendler. Post-doc at Northeastern University's Interpretable Deep Learning Lab. Interested in deep learning and mechanistic interpretability.
  • Brian Zhou. BA/MS student at Harvard interested in interpretability and cognition.