
fix: base k_aux on d_in instead of d_sae in topk aux loss #432

Open · wants to merge 2 commits into main

Conversation

@chanind (Collaborator) commented on Feb 22, 2025

Description

This PR fixes a minor issue with our topk aux loss implementation, where we calculate k_aux using d_sae instead of the correct d_in. This likely doesn't make a huge difference in practice, but it can't hurt to fix. This PR also calls detach() on the residual error before calculating the topk aux loss, similar to dictionary_learning's implementation. This should help ensure that the aux loss only pulls dead latents towards the SAE error and doesn't accidentally pull live latents towards dead latents.
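
Concretely, the relevant part of the change in calculate_topk_aux_loss() looks roughly like this (see the review diff further down for the full hunk):

    # Detach the residual so gradients from the aux loss can't flow back
    # into the live-latent reconstruction path.
    residual = (sae_in - sae_out).detach()

    # Heuristic from Appendix B.1 in the paper: k_aux = d_in // 2.
    # Previously this was hidden_pre.shape[-1] // 2, i.e. d_sae // 2.
    k_aux = sae_in.shape[-1] // 2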

This PR also adds a test asserting that our topk aux loss matches EleutherAI's sparsify implementation aside from a normalization factor.

As a side note, we should probably add an optional aux_coefficient to the config to further customize this, but this is something that probably makes sense to hold off on until the refactor is done.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

You have tested formatting, typing and tests

  • I have run make check-ci to check format and linting. (you can run make format to format code if needed.)

Comment on lines +476 to +479
residual = (sae_in - sae_out).detach()

# Heuristic from Appendix B.1 in the paper
- k_aux = hidden_pre.shape[-1] // 2
+ k_aux = sae_in.shape[-1] // 2

Since we're changing calculate_topk_aux_loss(), want to also refactor it to use an early return?

It'd be

    def calculate_topk_aux_loss(
        self,
        sae_in: torch.Tensor,
        sae_out: torch.Tensor,
        hidden_pre: torch.Tensor,
        dead_neuron_mask: torch.Tensor | None,
    ) -> torch.Tensor:
        # Mostly taken from https://github.com/EleutherAI/sae/blob/main/sae/sae.py, except without variance normalization
        # NOTE: checking the number of dead neurons will force a GPU sync, so performance can likely be improved here
        if dead_neuron_mask is None or int(dead_neuron_mask.sum()) == 0:
            return sae_out.new_tensor(0.0)

        num_dead = int(dead_neuron_mask.sum())
        residual = (sae_in - sae_out).detach()

        # Heuristic from Appendix B.1 in the paper
        k_aux = sae_in.shape[-1] // 2

        # Reduce the scale of the loss if there are a small number of dead latents
        scale = min(num_dead / k_aux, 1.0)
        k_aux = min(k_aux, num_dead)

        auxk_acts = _calculate_topk_aux_acts(
            k_aux=k_aux,
            hidden_pre=hidden_pre,
            dead_neuron_mask=dead_neuron_mask,
        )

        # Encourage the top ~50% of dead latents to predict the residual of the
        # top k living latents
        recons = self.decode(auxk_acts)
        auxk_loss = (recons - residual).pow(2).sum(dim=-1).mean()
        return scale * auxk_loss

which has a lot less indentation and is a lot easier to read.

Comment on lines +107 to +108
d_in=128,
d_sae=192,

Want to extract all the 128s and 192s to variables d_in and d_sae, respectively?
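
Something along these lines (sketch only; the rest of the test setup is unchanged):

    d_in = 128
    d_sae = 192

with the config then using the variables:

    d_in=d_in,
    d_sae=d_sae,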

normalize_sae_decoder=False,
)

sae = TrainingSAE(TrainingSAEConfig.from_sae_runner_config(cfg))
comparison_sae = SparseCoder(d_in=128, cfg=SparseCoderConfig(num_latents=192, k=26))

I think sparse_coder might be a better name than comparison_sae, as someone reading the test could quickly tell it's an instance of SparseCoder.
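
e.g. (sketch, reusing the d_in/d_sae variables suggested above):

    sparse_coder = SparseCoder(d_in=d_in, cfg=SparseCoderConfig(num_latents=d_sae, k=26))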
