Compute activations other than MLP neurons #6

tim-lawson · 2024-12-30T12:49:41Z

FYI: I don't know whether you're open to contributions, but someone might find this helpful.

Modifies the activations lib and the expgen project to generate exemplars for activation types other than MLP neurons: the residual stream, MLP inputs/outputs, and self-attention outputs. These match the activations collected by the Subject collect_acts method, excluding self-attention maps (which have a different shape) and unembedding inputs/outputs (which have no layer index).

Follows the recommendation in project/expgen/README.md to extend get_activations_computing_func. Specifically, introduces an ActivationType enum that determines two access paths: from the subject to a component, and from that component to its activations.
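The enum-plus-access-path idea can be illustrated with a small, self-contained sketch. It is not the code in this PR: apart from "neurons", the enum values are hypothetical, and the toy subject (with made-up attributes layers, mlps, attns, input, output, neurons) stands in for the real Subject class, whose internals are not shown here. Each activation type maps to one function for subject -> component and one for component -> activations.

```python
from dataclasses import dataclass
from enum import Enum
from types import SimpleNamespace
from typing import Any, Callable, Dict


class ActivationType(str, Enum):
    # Only "neurons" appears in the PR text; the other values are hypothetical.
    NEURONS = "neurons"
    RESID = "resid"
    MLP_IN = "mlp_in"
    MLP_OUT = "mlp_out"
    ATTN_OUT = "attn_out"


@dataclass(frozen=True)
class AccessPath:
    # subject -> the component that owns the activations at a given layer
    get_component: Callable[[Any, int], Any]
    # component -> the activations themselves
    get_acts: Callable[[Any], Any]


# Hypothetical attribute names on a toy subject; the real Subject class differs.
ACCESS_PATHS: Dict[ActivationType, AccessPath] = {
    ActivationType.NEURONS: AccessPath(lambda s, layer: s.mlps[layer], lambda c: c.neurons),
    ActivationType.RESID: AccessPath(lambda s, layer: s.layers[layer], lambda c: c.output),
    ActivationType.MLP_IN: AccessPath(lambda s, layer: s.mlps[layer], lambda c: c.input),
    ActivationType.MLP_OUT: AccessPath(lambda s, layer: s.mlps[layer], lambda c: c.output),
    ActivationType.ATTN_OUT: AccessPath(lambda s, layer: s.attns[layer], lambda c: c.output),
}


def get_acts(subject: Any, layer: int, act_type: str) -> Any:
    """Follow both access paths for the requested activation type."""
    path = ACCESS_PATHS[ActivationType(act_type)]  # raises ValueError for unknown types
    return path.get_acts(path.get_component(subject, layer))


def make_toy_subject(n_layers: int = 2) -> SimpleNamespace:
    """Build a stand-in subject whose 'activations' are just labelled strings."""
    return SimpleNamespace(
        layers=[SimpleNamespace(output=f"resid[{i}]") for i in range(n_layers)],
        mlps=[
            SimpleNamespace(input=f"mlp_in[{i}]", neurons=f"neurons[{i}]", output=f"mlp_out[{i}]")
            for i in range(n_layers)
        ],
        attns=[SimpleNamespace(output=f"attn_out[{i}]") for i in range(n_layers)],
    )


subject = make_toy_subject()
print(get_acts(subject, 1, "mlp_out"))  # mlp_out[1]
print(get_acts(subject, 0, "neurons"))  # neurons[0]
```

Keeping both lookups in one table means that, in this sketch, supporting a new layer-indexed activation type is a single new enum member plus one table entry; types without a layer index (such as the unembedding inputs/outputs mentioned above) do not fit the (subject, layer) signature, which matches why they were excluded.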

Adds an activation-type suffix to the exemplars folder name, except when the activation type is "neurons", so that existing folder names are unchanged. Similarly, adds a command-line argument for the activation type to the expgen compute_exemplars.py script, with default value "neurons".
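A minimal sketch of the folder-naming rule and the command-line default, assuming an underscore separator and a flag called --activation_type; both are hypothetical, and the actual names in compute_exemplars.py may differ. The point it illustrates is that the "neurons" default leaves the existing folder path untouched, so earlier runs keep working.

```python
import argparse
from pathlib import Path
from typing import List, Optional

# Only "neurons" (the default) comes from the PR text; the other choices are hypothetical.
ACTIVATION_TYPES = ("neurons", "resid", "mlp_in", "mlp_out", "attn_out")


def exemplars_dir(base: Path, activation_type: str) -> Path:
    """Append an activation-type suffix, except for "neurons" (folder name unchanged)."""
    if activation_type == "neurons":
        return base
    return base.with_name(f"{base.name}_{activation_type}")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Compute exemplars (sketch).")
    # Hypothetical flag name; check compute_exemplars.py for the real one.
    parser.add_argument("--activation_type", choices=ACTIVATION_TYPES, default="neurons")
    return parser.parse_args(argv)


args = parse_args(["--activation_type", "mlp_out"])
print(exemplars_dir(Path("results/exemplars"), args.activation_type))
```

Restricting the flag with choices makes argparse reject unknown activation types before any exemplars are computed.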

Caveat: not tested with the rest of the expgen pipeline yet.

choidami · 2024-12-31T03:08:32Z

Thank you for implementing this! We are totally open to contributions (it's precisely why we open sourced our code!).
I'll merge the changes in once I test the integration with the expgen pipeline.

tim-lawson · 2025-01-02T15:23:35Z

You're welcome! Thanks for open-sourcing it. There's no rush to merge; if you find there are incompatibilities with the pipeline as-is, I can revisit the changes.

tim-lawson added 2 commits December 30, 2024 12:47

handle other activation types

fe95da3

add enum and refactor comp funcs

309ba42

tzengtif assigned choidami Jan 28, 2025
