
Commit

tamarott committed Apr 18, 2024
2 parents 8de383c + 75ed913 commit dd7299f
Showing 3 changed files with 18 additions and 7 deletions.
13 changes: 12 additions & 1 deletion README.md
@@ -1 +1,12 @@
MAIA
# A Multimodal Automated Interpretability Agent #

### [Project Page](https://multimodal-interpretability.csail.mit.edu/maia) | [Arxiv](https://multimodal-interpretability.csail.mit.edu/maia)

<img align="right" width="42%" src="/docs/static/figures/maia_teaser.jpg">

[Tamar Rott Shaham](https://tamarott.github.io/)\*, [Sarah Schwettmann](https://cogconfluence.com/)\*, <br>
[Franklin Wang](https://frankxwang.github.io/), [Achyuta Rajaram](https://twitter.com/AchyutaBot), [Evan Hernandez](https://evandez.com/), [Jacob Andreas](https://www.mit.edu/~jda/), [Antonio Torralba](https://groups.csail.mit.edu/vision/torralbalab/) <br>
\*equal contribution <br><br>
**This repo is under active development, and the MAIA codebase will be released in the coming weeks. Sign up for updates by email using [this google form](https://forms.gle/Zs92DHbs3Y3QGjXG6).**

MAIA is a system that uses neural models to automate neural model understanding tasks like feature interpretation and failure mode discovery. It equips a pre-trained vision-language model with a set of tools that support iterative experimentation on subcomponents of other models to explain their behavior. These include tools commonly used by human interpretability researchers for synthesizing and editing inputs, computing maximally activating exemplars from real-world datasets, and summarizing and describing experimental results. Interpretability experiments proposed by MAIA compose these tools to describe and explain system behavior.
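
The paragraph above describes MAIA composing these tools into interpretability experiments. Purely as illustration, here is a minimal, self-contained Python sketch of what such a tool-composition experiment could look like. Every name in it (`StubNeuron`, `Tools`, `dataset_exemplars`, `text2image`) is a hypothetical stand-in invented for this sketch, not the released MAIA API.

```python
# Hypothetical sketch of a MAIA-style experiment program.
# The classes and tool names below are illustrative placeholders only.
import random


class StubNeuron:
    """Stands in for a single unit inside a vision model under study."""

    def call(self, image_description: str) -> float:
        # Toy behaviour: pretend the unit fires on leashes.
        return 0.9 if "leash" in image_description else random.uniform(0.0, 0.2)


class Tools:
    """Stand-ins for the agent's experimental tools."""

    def dataset_exemplars(self, neuron):
        # A real tool would search a dataset for maximally activating images;
        # here we just rank a few hand-written descriptions.
        candidates = ["dog on a leash", "cat on a sofa", "leash on a hook"]
        return sorted(((c, neuron.call(c)) for c in candidates), key=lambda x: -x[1])

    def text2image(self, prompts):
        # A real tool would synthesize images from the prompts; here we pass them through.
        return prompts


def run_experiment(neuron, tools):
    # 1. Inspect maximally activating dataset exemplars.
    exemplars = tools.dataset_exemplars(neuron)
    print("Top exemplars:", exemplars)
    # 2. Form a hypothesis and probe it with controlled synthetic inputs.
    probes = tools.text2image(
        ["a dog wearing a leash", "a dog by itself", "a leash by itself"]
    )
    results = {p: neuron.call(p) for p in probes}
    # 3. Summarize the outcome (a real agent would keep iterating,
    #    updating the hypothesis and designing further experiments).
    if results["a leash by itself"] > results["a dog by itself"]:
        return "The unit appears selective for leashes rather than dogs."
    return "The unit appears selective for dogs rather than leashes."


print(run_experiment(StubNeuron(), Tools()))
```

In the actual system, a vision-language model writes and revises programs along these lines iteratively, feeding observed outcomes back into new hypotheses until it can answer the user query.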
12 changes: 6 additions & 6 deletions docs/index.html
@@ -81,7 +81,7 @@ <h1 class="title is-1 publication-title">A Multimodal Automated Interpretability
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<a href="https://arxiv.org/pdf/2309.03886.pdf"
<a href="http://www.cogconfluence.com/wp-content/uploads/2024/04/MAIA_preprint_04_17_2024-1.pdf"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
@@ -90,7 +90,7 @@ <h1 class="title is-1 publication-title">A Multimodal Automated Interpretability
</a>
</span>
<span class="link-block">
<a href="https://arxiv.org/abs/2309.03886"
<a href="https://multimodal-interpretability.csail.mit.edu/maia/"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
@@ -131,7 +131,7 @@ <h1 class="title is-1 publication-title">A Multimodal Automated Interpretability
<div class="container is-max-desktop">
<div style="text-align: justify;">
<p><h3 class="title is-4">How can AI systems help us understand other AI systems?</h3></p>
<p>Interpretability Agents automate the process of scientific experimentation to answer user queries about trained models. See the <a href="https://multimodal-interpretability.csail.mit.edu/maia/experiment-browser/" target="_blank">experiment browser/</a> for more experiments.</p>
<p>Interpretability Agents automate aspects of scientific experimentation to answer user queries about trained models. See the <a href="https://multimodal-interpretability.csail.mit.edu/maia/experiment-browser/" target="_blank">experiment browser</a> for more experiments.</p>
</div>
</div>
</section>
@@ -157,9 +157,9 @@ <h1 class="title is-1 publication-title">A Multimodal Automated Interpretability
<div class="container is-max-desktop">
<div style="text-align: justify;">
<!-- <p><h3 class="title is-4">How can AI systems help us understand other AI systems?</h3></p> -->
<p>Understanding a neural model can take many forms. For instance, we might want to know when and how the system relies on sensitive or spurious features, identify systematic errors in its predictions, or learn how to modify the training data and model architecture to improve accuracy and robustness. Today, answering these types of questions often involves significant human effort—researchers must formalize their question, formulate hypotheses about a model’s decision-making process, design datasets on which to evaluate model behavior, then use these datasets to refine and validate hypotheses. As a result, this type of understanding is slow and expensive to obtain, even about the most widely used models.</p><br>
<p>Understanding of a neural model can take many forms. For instance, we might want to know when and how the system relies on sensitive or spurious features, identify systematic errors in its predictions, or learn how to modify the training data and model architecture to improve accuracy and robustness. Today, answering these types of questions often involves significant human effort—researchers must formalize their question, formulate hypotheses about a model’s decision-making process, design datasets on which to evaluate model behavior, then use these datasets to refine and validate hypotheses. As a result, this type of understanding is slow and expensive to obtain, even about the most widely used models.</p><br>
<p>Automated Interpretability approaches have begun to address the issue of scale. Recently, such approaches have used pretrained language models like GPT-4 (in <a href="https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html" target="_blank">Bills et al. 2023</a>) or Claude (in <a href="https://transformer-circuits.pub/2023/monosemantic-features" target="_blank">Bricken et al. 2023</a>) to generate feature explanations. In earlier work, we introduced MILAN (<a href="https://arxiv.org/abs/2201.11114" target="_blank">Hernandez et al. 2022</a>), a captioner model trained on human feature annotations that takes as input a feature visualization and outputs a description of that feature. But automated approaches that use learned models to label features leave something to be desired: they are primarily tools for one-shot hypothesis generation (<a href="https://arxiv.org/abs/2309.10312" target="_blank">Huang et al. 2023</a>) rather than causal explanation, they characterize behavior on a limited set of inputs, and they are often low precision.</p><br>
<!-- <p>Our current line of research aims to build tools that help users understand models, while combining the flexibility of human experimentation with the scalability of automated techniques. In <a href="https://arxiv.org/abs/2309.03886" target="_blank">Schwettmann et al. 2023</a>, we introduced the <em>Automated Interpretability Agent</em> (AIA) paradigm, where an LM-based agent interactively probes systems to explain their behavior. We now introduce a multimodal AIA, with a vision-language model backbone and an API of tools for designing experiments on other systems. With simple modifications to the user query to the agent, the same modular system can field both "macroscopic" questions like identifying systematic biases in model predictions (see the tench example above), as well as "microscopic" questions like describing individual features (see example below).</p><br> -->
<!-- <p>Our current line of research aims to build tools that help users understand models, while combining the flexibility of human experimentation with the scalability of automated techniques. In <a href="https://arxiv.org/abs/2309.03886" target="_blank">Schwettmann et al. 2023</a>, we introduced the <em>Automated Interpretability Agent</em> (AIA) paradigm, where an LM-based agent interactively probes systems to explain their behavior. We now introduce a multimodal AIA, with a vision-language model backbone and an API of tools for designing experiments on other systems. With simple modifications to the user query, the same modular system can field both "macroscopic" questions like identifying systematic biases in model predictions (see the tench example above), as well as "microscopic" questions like describing individual features (see example below).</p><br> -->
<p>Our current line of research aims to build tools that help users understand models, while combining the flexibility of human experimentation with the scalability of automated techniques. We introduce the <b>M</b>ultimodal <b>A</b>utomated <b>I</b>nterpretability <b>A</b>gent (MAIA), which designs experiments to answer user queries about components of AI systems. MAIA iteratively generates hypotheses, runs experiments that test these hypotheses, observes experimental outcomes, and updates hypotheses until it can answer the user query. MAIA builds on the <em>Automated Interpretability Agent</em> (AIA) paradigm we introduced in <a href="https://arxiv.org/abs/2309.03886" target="_blank">Schwettmann et al. 2023</a>, where an LM-based agent interactively probes systems to explain their behavior. MAIA is equipped with a vision-language model backbone and an API of <a href="#tools-description">tools</a> for designing interpretability experiments. With simple modifications to the user query to the agent, the same modular system can field both "macroscopic" questions like identifying systematic biases in model predictions (see the tench example above), as well as "microscopic" questions like describing individual features (see example below).</p><br>
</div>
</div>
@@ -206,7 +206,7 @@ <h2 class="title is-3 has-text-centered">MAIA</h2>
<div class="hero-body">
<h2 class="title is-3">MAIA uses tools to design experiments on other systems</h2>
<div class="content" style="text-align: justify;">
<p>MAIA composes interpretability subroutines into python programs to answer user queries about a system. What kind of experiments does MAIA design? Below we highlight example usage of individual tools to run experiments on neurons inside common vision architectures (CLIP, ResNet, DINO). These are experimental excerpts intended to demonstrate tool use (often, MAIA runs many more experiments to reach its final conclusion!) For full experiment logs, check out our interactive <a href="https://multimodal-interpretability.csail.mit.edu/maia/experiment-browser/" target="_blank">experiment browser/</a>. </p>
<p>MAIA composes interpretability subroutines into python programs to answer user queries about a system. What kind of experiments does MAIA design? Below we highlight example usage of individual tools to run experiments on neurons inside common vision architectures (CLIP, ResNet, DINO). These are experimental excerpts intended to demonstrate tool use (often, MAIA runs many more experiments to reach its final conclusion!) For full experiment logs, check out our interactive <a href="https://multimodal-interpretability.csail.mit.edu/maia/experiment-browser/" target="_blank">experiment browser</a>. </p>
</div>
<div class="content" style="text-align: justify;">
<h3 class="subtitle is-5" style="text-align: justify;">Visualizing Dataset Exemplars</h3>
Binary file added docs/static/figures/maia_teaser.jpg
