Describe the bug
When training an SAE on Gemma-2b layer 12, the L0 values remain unusually high (>1500) and are largely unresponsive to L0 lambda parameter adjustments. This behavior persists across various hyperparameter configurations:
L0 lambda variations (50, 100, 500) show minimal impact on L0 values
MSE losses remain around 110 at 600k steps (≈2.5B tokens)
Even with a significantly reduced learning rate (7e-8), L0 values still exceed 1500 after 2.5B tokens
Bandwidth and initial threshold adjustments showed limited effect on controlling L0
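For reference, here is a minimal sketch of how L0 is typically measured for an SAE, i.e. the average number of non-zero feature activations per token. The tensor name `feature_acts` is hypothetical and not a sae-lens internal:

```python
import torch

def mean_l0(feature_acts: torch.Tensor) -> float:
    """Average number of active (non-zero) SAE features per token.

    feature_acts: [n_tokens, d_sae] post-activation feature values
    (hypothetical name, not a sae-lens internal variable).
    """
    return (feature_acts > 0).float().sum(dim=-1).mean().item()

# An L0 > 1500 means that, on average, more than 1500 of the SAE's
# features fire on every token -- far from a sparse code.
```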
Code example
The training config is as follows. I simply swept l1_coefficient, trying 1e-3, 1, 5, 5k, and 500k:
50, 500, and 50k were also tried, but I accidentally deleted those run logs.
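For context, here is a minimal sketch of what such a sae-lens training run can look like. The field names follow the public LanguageModelSAERunnerConfig / SAETrainingRunner API as I understand it and may differ between sae-lens versions; the values are illustrative placeholders, not the reporter's actual config:

```python
from sae_lens import LanguageModelSAERunnerConfig, SAETrainingRunner

cfg = LanguageModelSAERunnerConfig(
    model_name="gemma-2b",
    hook_name="blocks.12.hook_resid_post",  # layer 12 residual stream
    d_in=2048,                              # Gemma-2b hidden size
    expansion_factor=8,                     # placeholder SAE width multiplier
    l1_coefficient=5.0,                     # the sparsity coefficient being swept
    lr=7e-5,
    training_tokens=2_500_000_000,          # ~2.5B tokens, as in the report
    context_size=1024,
    train_batch_size_tokens=4096,
)

SAETrainingRunner(cfg).run()
```

Since the report also mentions bandwidth and initial-threshold adjustments, the run is presumably using a JumpReLU-style architecture, where sparsity is controlled through a learned threshold (with a bandwidth parameter for its gradient estimator) rather than a plain L1 penalty.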
System Info
Describe the characteristics of your environment:
All libraries were installed using uv; the pyproject.toml is shown below:
Note: I also tried previous versions of sae-lens (4.3.4, 4.3.5).
@muyo8692 Have you found any solutions related to this issue yet?
I have a similar problem where L0 does not converge to a reasonable level (<= 150).
Run log

