People
Please add your name and an intro. (Let's order alphabetically by last name)
- David Atkinson. PhD student at Northeastern University's Interpretable Deep Learning Lab, interested in mechanistic interpretability.
- David Bau. Northeastern University Professor, directs the Interpretable Deep Learning Lab and NDIF.
- Jannik Brinkmann. PhD student interested in mechanistic interpretability. Previously visiting student at David Bau's Interpretable Deep Learning Lab.
- Katrina Brown. BA/MS student at Harvard interested in control, fairness, and representations.
- Niv Cohen. Research Scientist (Postdoc) at New York University. Interested in AI Safety, anomaly detection, and disentanglement.
- Trevor DePodesta. PhD student at Harvard Insight+Interaction Lab. Interested in interpretability for ethical Human-AI Interaction.
- Clément Dumas. Neel Nanda MATS stream (previously EPFL). Interested in applying model diffing to AI safety.
- Matthew Kowal. Researcher at FAR AI, and PhD candidate at York University (Toronto). Interested in both theoretical aspects and practical applications of interpretability and AI Safety.
- Andrew Lee. Postdoctoral fellow at Harvard Insight+Interaction Lab, interested in neural network representations!
- Victoria Li. Harvard BA/MS student interested in interp/representations/control!
- Can Rager. Incoming PhD student at David Bau's Interpretable Deep Learning Lab. Interested in AI auditing and mechanistic interpretability.
- Shivam Raval. PhD student in Physics at Harvard Insight+Interaction Lab. Interested in explaining and visualizing clustering structures in high-dimensional data and interpreting latent activations in frontier AI models.
- Naomi Saphra. Kempner Research Fellow at Harvard. Interested in understanding how reasoning develops and detecting its failure modes through internal representations.
- Sigurd Schacht. COAI Research - AI Safety, Interpretability. Interested in understanding reasoning models - especially reasoning in latent space and behavior analysis.
- Kunvar Thaman. Standard Intelligence. Machine learning engineer focused on architecture search and meta-learning research. Also excited about mech interp and learning interesting representations in NNs.
- Dmitrii Troitskii. Independent (Previously @ NDIF and BauLab). Interested in more rigorous approaches towards interpretability.
- Fernanda Viégas. At Harvard Insight+Interaction Lab. Interested in AI interpretability in general and, more specifically, in finding useful ways to bring interpretability to Human-AI Interaction.
- Martin Wattenberg. At Harvard Insight+Interaction Lab. Interested in geometric approaches to interpretability, and ways for people to control AI output.
- Melanie Weber. At Harvard Geometric Machine Learning Group. Interested in leveraging geometric structure in data for the design of efficient and interpretable machine learning methods.
- Chris Wendler. At Northeastern University's Interpretable Deep Learning Lab. Post-doc interested in deep learning and mechanistic interpretability.
- Brian Zhou. BA/MS student at Harvard interested in interpretability and cognition.