People

Please add your name and an intro. (Let's order alphabetically by last name)

  • David Atkinson. PhD student at Northeastern University's Interpretable Deep Learning Lab, interested in mechanistic interpretability.
  • David Bau. Northeastern University Professor, directs the Interpretable Deep Learning Lab and NDIF.
  • Jannik Brinkmann. PhD student interested in mechanistic interpretability. Previously visiting student at David Bau's Interpretable Deep Learning Lab.
  • Katrina Brown. BA/MS student at Harvard interested in control, fairness, representations.
  • Niv Cohen. Research Scientist (Postdoc) at New York University. Interested in AI Safety, anomaly detection, and disentanglement.
  • Trevor DePodesta. PhD student at Harvard Insight+Interaction Lab. Interested in interpretability for ethical Human-AI Interaction.
  • Clément Dumas. Neel Nanda MATS stream (previously EPFL). Interested in applying model diffing to AI safety.
  • Matthew Kowal. Researcher at FAR AI, and PhD candidate at York University (Toronto). Interested in both theoretical aspects and practical applications of interpretability and AI Safety.
  • Andrew Lee. Postdoctoral fellow at Harvard Insight+Interaction Lab. Interested in neural network representations!
  • Victoria Li. Harvard BA/MS student interested in interp/representations/control!
  • Can Rager. Incoming PhD student at David Bau's Interpretable Deep Learning Lab. Interested in AI auditing and mechanistic interpretability.
  • Shivam Raval. PhD student in Physics at Harvard Insight+Interaction Lab. Interested in explaining and visualizing clustering structures in high-dimensional data and interpreting latent activations in frontier AI models.
  • Naomi Saphra. Kempner Research Fellow at Harvard. Interested in understanding how reasoning develops and detecting its failure modes through internal representations.
  • Sigurd Schacht. COAI Research - AI Safety, Interpretability. Interested in understanding reasoning models - especially reasoning in latent space and behavior analysis.
  • Kunvar Thaman. Standard Intelligence. Machine learning engineer focused on architecture search and meta-learning research. Also excited about mech interp and learning interesting representations in NNs.
  • Dmitrii Troitskii. Independent (Previously @ NDIF and BauLab). Interested in more rigorous approaches towards interpretability.
  • Fernanda Viégas. At Harvard Insight+Interaction Lab. Interested in AI interpretability in general and, more specifically, in finding useful ways to bring interpretability to Human-AI Interaction.
  • Martin Wattenberg. At Harvard Insight+Interaction Lab. Interested in geometric approaches to interpretability, and ways for people to control AI output.
  • Melanie Weber. At Harvard Geometric Machine Learning Group. Interested in leveraging geometric structure in data for the design of efficient and interpretable machine learning methods.
  • Chris Wendler. Post-doc at Northeastern University's Interpretable Deep Learning Lab. Interested in deep learning and mechanistic interpretability.
  • Brian Zhou. BA/MS student at Harvard interested in interpretability and cognition.