apartresearch

35 repositories

seqcont_circuits
Public
✱ Interpreting how similar sequence continuation tasks share internal representations ✱
Jupyter Notebook • MIT License • 0 • 1 • 0 • 0 • Updated Sep 20, 2024
hackathon-utils
Public
😎 Code to run hackathons efficiently
HTML • MIT License • 0 • 1 • 0 • 0 • Updated Sep 4, 2024
ICML2024MI
Public
🌍 Website for NeurIPS2023MI
CSS • 2 • 1 • 0 • 0 • Updated Aug 19, 2024
Integer_Addition
Public
✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks
Jupyter Notebook • MIT License • 1 • 12 • 0 • 0 • Updated Aug 16, 2024
Probing-Learned-Feedback-Patterns
Public
✱ Interpreting implicit reward models learnt in RLHF using sparse autoencoders.
Jupyter Notebook • MIT License • 2 • 1 • 7 • 0 • Updated Aug 7, 2024
Research-Augmentation-Hackbook
Public
Python • 0 • 5 • 0 • 0 • Updated Jul 19, 2024
evaluations-starter
Public
How to get started in evaluations and demonstrations research for dangerous capabilities
MIT License • 1 • 5 • 1 • 0 • Updated May 24, 2024
deepdecipher
Public
🦠 DeepDecipher: An open source API to MLP neurons
api website machine-learning research academic interpretability interpretability-methods interpretability-jam mechanistic-interpretability
Rust • MIT License • 0 • 9 • 46 • 0 • Updated May 2, 2024
readingwhatwecan
Public
📚📚📚📚📚📚📚📚📚 Reading everything
CSS • 3 • 12 • 0 • 0 • Updated Apr 21, 2024
scale-llm-24
Public
🌍 Website for the Scaling Laws workshop
CSS • 2 • 1 • 0 • 0 • Updated Mar 22, 2024
.github
Public
0 • 0 • 0 • 0 • Updated Mar 14, 2024
task-standard
Public
🚨 METR Task Standard fork for the Code Red Hackathon
TypeScript • 28 • 1 • 0 • 0 • Updated Feb 29, 2024
Verified_addition
Public
Jupyter Notebook • 0 • 1 • 0 • 0 • Updated Feb 6, 2024
specificityplus
Public
👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"
benchmarking llm
Python • Other • 3 • 20 • 1 • 1 • Updated Jan 19, 2024
open
Public
🌍 Repository to update our open data
MIT License • 0 • 0 • 0 • 0 • Updated Nov 30, 2023
Apart-Evals
Public
0 • 0 • 0 • 0 • Updated Oct 28, 2023
Neuron2Graph
Public
Tools for exploring Transformer neuron behaviour, including input pruning and diversification.
Jupyter Notebook • Apache License 2.0 • 5 • 19 • 1 • 0 • Updated Sep 28, 2023
aisafetyideas
Public
💡 The web app CI/CD for aisafetyideas.com
Svelte • 3 • 8 • 20 • 1 • Updated Sep 25, 2023
n2g
Public archive
Tools for exploring Transformer neuron behaviour, including input pruning and diversification.
Jupyter Notebook • Apache License 2.0 • 5 • 1 • 0 • 0 • Updated Aug 9, 2023
interpretability-starter
Public
🧠 Starter templates for doing interpretability research
interpretability interpretability-jam alignment-jam mechanistic-interpretability
1 • 60 • 0 • 0 • Updated Jul 16, 2023
AIS-cost-effectiveness
Public
Cost-effectiveness models, tools, and results for various AI safety field-building programs.
Python • MIT License • 4 • 2 • 0 • 0 • Updated Jul 15, 2023
paper-website
Public
🌍 Website template for academic papers
template academic-website website-template
JavaScript • MIT License • 0 • 0 • 0 • 0 • Updated Jun 9, 2023
othelloscope
Public
Interpretability Hackathon 2.0 entry
Jupyter Notebook • MIT License • 39 • 2 • 1 • 0 • Updated Apr 28, 2023
town_hall_avatar
Public
Uses ChatGPT to simulate a townhall discussion between avatars
Python • 1 • 0 • 0 • 0 • Updated Apr 3, 2023
GPT-4-Chat-UI
Public
GPT-4 frontend with open source Next.js template.
JavaScript • MIT License • 4 • 0 • 0 • 0 • Updated Mar 22, 2023
scheduling-widget
Public
📆 Showcases specific times in local time zones
HTML • 0 • 2 • 0 • 0 • Updated Feb 3, 2023
mechanisticinterpretability
Public
A repository for awesome resources in mechanistic interpretability
0 • 4 • 0 • 0 • Updated Jan 18, 2023
ai-psychology-starter
Public
Code templates to get started as an AI psychologist
Jupyter Notebook • 0 • 4 • 0 • 0 • Updated Oct 31, 2022
empathetic-ai
Public
🤖 A systematic review on how to create empathetic AI
TeX • 0 • 1 • 0 • 0 • Updated Oct 14, 2022
blackbox-psych
Public
Conducting psychology experiments on black box language models
HTML • 0 • 1 • 0 • 0 • Updated Oct 6, 2022