
    Repositories list

    • ✱ Interpreting how similar sequence continuation tasks share internal representations ✱
      Jupyter Notebook
      MIT License
      Updated Sep 20, 2024
    • 😎 Code to run hackathons efficiently
      HTML
      MIT License
      Updated Sep 4, 2024
    • 🌍 Website for NeurIPS2023MI
      CSS
      Updated Aug 19, 2024
    • ✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks
      Jupyter Notebook
      MIT License
      Updated Aug 16, 2024
    • ✱ Interpreting implicit reward models learnt in RLHF using sparse autoencoders.
      Jupyter Notebook
      MIT License
      Updated Aug 7, 2024
    • Python
      Updated Jul 19, 2024
    • How to get started in evaluations and demonstrations research for dangerous capabilities
      MIT License
      Updated May 24, 2024
    • 🦠 DeepDecipher: An open source API to MLP neurons
      Rust
      MIT License
      Updated May 2, 2024
    • 📚📚📚📚📚📚📚📚📚 Reading everything
      CSS
      Updated Apr 21, 2024
    • 🌍 Website for the Scaling Laws workshop
      CSS
      Updated Mar 22, 2024
    • .github
      Public
      Updated Mar 14, 2024
    • 🚨 METR Task Standard fork for the Code Red Hackathon
      TypeScript
      Updated Feb 29, 2024
    • Jupyter Notebook
      Updated Feb 6, 2024
    • 👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"
      Python
      Other
      Updated Jan 19, 2024
    • open
      Public
      🌍 Repository to update our open data
      MIT License
      Updated Nov 30, 2023
    • Updated Oct 28, 2023
    • Tools for exploring Transformer neuron behaviour, including input pruning and diversification.
      Jupyter Notebook
      Apache License 2.0
      Updated Sep 28, 2023
    • 💡 The web app CI/CD for aisafetyideas.com
      Svelte
      Updated Sep 25, 2023
    • n2g
      Public archive
      Tools for exploring Transformer neuron behaviour, including input pruning and diversification.
      Jupyter Notebook
      Apache License 2.0
      Updated Aug 9, 2023
    • 🧠 Starter templates for doing interpretability research
      Updated Jul 16, 2023
    • Cost-effectiveness models, tools, and results for various AI safety field-building programs.
      Python
      MIT License
      Updated Jul 15, 2023
    • 🌍 Website template for academic papers
      JavaScript
      MIT License
      Updated Jun 9, 2023
    • Interpretability Hackathon 2.0 entry
      Jupyter Notebook
      MIT License
      Updated Apr 28, 2023
    • Uses ChatGPT to simulate a townhall discussion between avatars
      Python
      Updated Apr 3, 2023
    • GPT-4 frontend with open source Next.js template.
      JavaScript
      MIT License
      Updated Mar 22, 2023
    • 📆 Showcases specific times in local time zones
      HTML
      Updated Feb 3, 2023
    • A repository for awesome resources in mechanistic interpretability
      Updated Jan 18, 2023
    • Code templates to get started as an AI psychologist
      Jupyter Notebook
      Updated Oct 31, 2022
    • 🤖 A systematic review on how to create empathetic AI
      TeX
      Updated Oct 14, 2022
    • Conducting psychology experiments on black box language models
      HTML
      Updated Oct 6, 2022