[Question] fail to reproduce the GPT-2-small SAE with provided hyperparameters #405
Comments
Are you running locally or on Google Colab? If you're running locally, what version of Python are you using? You should be using Python 3.10.14. I accidentally used Python 3.10.16 (the latest release of Python 3.10), and as a result the attention on the feature dashboard was wrong. Maybe you're encountering a similar issue.
The default hyperparameters have shifted since I trained those. The thing most likely going wrong is that you might not be using the MSE loss calculation I used; when I trained them I had a somewhat unusual variant that does help in some cases. Also, I loaded the models from TransformerLens with pre-processing, including residual stream centering. I'm not sure how much those factors matter, but they might. Sorry for the slow and very brief response (pretty busy at my new job).
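For readers unfamiliar with the pre-processing mentioned above: centering the residual stream amounts to subtracting the per-token mean over the model dimension from the activations (TransformerLens can achieve an equivalent effect by centering the weights that write to the residual stream). A minimal numpy sketch of the general idea, not necessarily the exact pre-processing used for these SAEs:

```python
import numpy as np

def center_residual_stream(acts: np.ndarray) -> np.ndarray:
    """Center residual stream activations by subtracting the mean over
    the model dimension (last axis), one mean per token position."""
    return acts - acts.mean(axis=-1, keepdims=True)

# toy batch of activations: (n_tokens, d_model)
acts = np.array([[1.0, 3.0],
                 [2.0, 6.0]])
centered = center_residual_stream(acts)
print(centered)  # each row now sums to zero
```

If the reference SAEs were trained on centered activations and yours are not (or vice versa), the input distributions differ, which alone can shift L0 and explained variance.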
@NatanFreeman I'm running locally, and using Python 3.10.14 does not solve the issue, but thanks for your advice!
@jbloom-aisi I've read your post on LessWrong, but I can't find information about certain details like the MSE loss calculation, pre-processing steps, or whether residual stream centering was applied. Since these details aren't fully covered in the post, would you be able to share any insights or repositories you used to train the SAEs? Any guidance would be greatly appreciated. Thanks in advance!
@cwyoon-99 Just as well; it turns out my issue had nothing to do with my Python installation and was simply a misunderstanding of Neuronpedia's interface.
@NatanFreeman Curious what the misunderstanding of Neuronpedia's interface was. My guess is that we should be making it clearer somehow; if it was confusing to you, it was likely confusing to others as well.
@hijohnnylin The interface for calculating the activations of a specific neuron confused me. There's a feature that allows you to copy and analyze an example text. When it's set to show only the snippet with the activations, only the snippet is copied and analyzed. This causes a mismatch in activations: in the example text the whole text was analyzed, whereas in the copied text only the snippet is analyzed. Here's an example from my phone; I'm not in front of my computer right now.
Questions
Hi. I'm trying to reproduce the results of GPT-2 SAE blocks.8.hook_resid_pre (https://huggingface.co/jbloom/GPT2-Small-SAEs-Reformatted/tree/main/blocks.8.hook_resid_pre).
However, even though I use exactly the same hyperparameters, I can't match the performance of the uploaded HF version.
(the main metrics I monitor are L0, explained_variance, and log10_feature_density histogram)
How can I achieve this?
The l1_coefficient is set to 8e-05 in cfg.json, but I’ve read comments suggesting that a value closer to 1.0 might be more effective (see: GitHub comment). Is it possible that the learning rate calculation method has changed since then?
Or any other suggestions?
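For context on the l1_coefficient question above: a standard SAE training objective combines reconstruction MSE with an L1 penalty on the feature activations, scaled by l1_coefficient. Whether the MSE is averaged, summed, or normalized changes the effective scale of l1_coefficient, which could explain why values as different as 8e-5 and 1.0 have both been reported. A sketch of the common formulation (the exact MSE variant used to train the uploaded SAEs may differ, per the comments above):

```python
import numpy as np

def sae_loss(x, W_enc, b_enc, W_dec, b_dec, l1_coefficient):
    """One common SAE objective: mean squared reconstruction error plus
    an L1 sparsity penalty on feature activations.

    x: (batch, d_model) input activations
    W_enc: (d_model, n_features), W_dec: (n_features, d_model)
    """
    feature_acts = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU encoder
    x_hat = feature_acts @ W_dec + b_dec               # linear decoder
    mse_loss = ((x_hat - x) ** 2).mean()
    l1_loss = np.abs(feature_acts).sum(axis=-1).mean()
    return mse_loss + l1_coefficient * l1_loss
```

Note that if a library later changed, say, from summing to averaging the MSE over d_model, the same l1_coefficient value would produce a very different sparsity/reconstruction trade-off.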
I tried many combinations of l1_coefficient (8e-5, 8e-3, 8e-3, 0.1, 1.0, 2.0, 4.0) and lr (4e-4, 5e-5, 1e-5),
but can't even get a decent SAE (the log10 feature density is either too sparse or too dense).
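For reference, log10 feature density is typically computed as the fraction of tokens on which each feature fires, on a log scale. A minimal sketch (assuming post-ReLU activations; the epsilon and dead-feature handling are conventions, not necessarily what the reference repo does):

```python
import numpy as np

def log10_feature_density(feature_acts: np.ndarray,
                          eps: float = 1e-10) -> np.ndarray:
    """feature_acts: (n_tokens, n_features) post-ReLU SAE activations.
    Density of a feature = fraction of tokens on which it fires (> 0).
    Returns log10 densities; dead features land near log10(eps)."""
    density = (feature_acts > 0).mean(axis=0)
    return np.log10(density + eps)

# toy example: 4 tokens, 3 features (first feature never fires)
acts = np.array([[0.0, 1.0, 2.0],
                 [0.0, 0.0, 3.0],
                 [0.0, 2.0, 1.0],
                 [0.0, 0.0, 4.0]])
print(log10_feature_density(acts))
```

A histogram of these values concentrated far to the left means most features are dead or nearly dead (l1 penalty effectively too strong); a peak near 0 means features fire on almost every token (penalty effectively too weak).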
Hyperparameters
[screenshots: mine vs. jbloom/GPT2-Small-SAEs-Reformatted]
log10 feature density
[screenshots: mine vs. jbloom/GPT2-Small-SAEs-Reformatted]