Proposal

Due to the sparsity of SAE activations, we don't need to fully multiply the SAE hidden activations by the decoder weights during decoding, since most of the activations are zero. It may be a significant performance improvement to use `torch.nn.functional.embedding_bag` during decoding in place of `acts @ sae.W_dec`. We should benchmark this to see whether it is in fact a performance improvement (a rough benchmark sketch is included after the checklist below). It could be, for example, that finding all the non-zero activation locations is more expensive than just running the standard decode. This is likely to be an improvement for topk SAEs at the very least.
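For a topk SAE the nonzero locations are already known from the topk call, so the sparse decode is essentially a weighted gather of decoder rows. Here is a minimal sketch of what the replacement could look like, assuming `W_dec` has shape `[d_sae, d_model]` and ignoring any decoder bias; the function names and shapes are hypothetical, not the library's API:

```python
import torch
import torch.nn.functional as F


def decode_dense(acts: torch.Tensor, W_dec: torch.Tensor) -> torch.Tensor:
    # Standard decode: a full matmul, even though most entries of `acts` are zero.
    return acts @ W_dec


def decode_sparse_topk(acts: torch.Tensor, W_dec: torch.Tensor, k: int) -> torch.Tensor:
    # For a topk SAE the indices/values below are already produced by the encoder,
    # so the extra `topk` call here could be skipped in practice.
    top_vals, top_idx = acts.topk(k, dim=-1)
    # With mode="sum" and per_sample_weights, embedding_bag computes, per row,
    # sum_i top_vals[b, i] * W_dec[top_idx[b, i]], i.e. acts @ W_dec restricted
    # to the nonzero activations.
    return F.embedding_bag(top_idx, W_dec, per_sample_weights=top_vals, mode="sum")


if __name__ == "__main__":
    # Quick equivalence check on toy shapes with exactly k nonzeros per row.
    batch, d_sae, d_model, k = 8, 1024, 64, 32
    W_dec = torch.randn(d_sae, d_model)
    acts = torch.zeros(batch, d_sae)
    vals, idx = torch.rand(batch, d_sae).topk(k, dim=-1)
    acts.scatter_(-1, idx, vals)
    assert torch.allclose(
        decode_dense(acts, W_dec), decode_sparse_topk(acts, W_dec, k), atol=1e-4
    )
```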
Checklist
I have checked that there is no similar issue in the repo (required)
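Since the proposal calls for a benchmark, here is a rough sketch of how the comparison could be set up with `torch.utils.benchmark`; the shapes are made-up placeholders, not measurements from this repo:

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark

# Made-up shapes roughly in the range of a topk SAE; adjust to the real config.
batch, d_sae, d_model, k = 4096, 24576, 768, 64
device = "cuda" if torch.cuda.is_available() else "cpu"

W_dec = torch.randn(d_sae, d_model, device=device)
acts = torch.zeros(batch, d_sae, device=device)
vals, idx = torch.rand(batch, d_sae, device=device).topk(k, dim=-1)
acts.scatter_(-1, idx, vals)

timers = [
    benchmark.Timer(
        stmt=stmt,
        globals={"F": F, "acts": acts, "W_dec": W_dec, "idx": idx, "vals": vals},
        label="SAE decode",
        description=desc,
    ).blocked_autorange(min_run_time=1)
    for desc, stmt in [
        ("dense matmul", "acts @ W_dec"),
        ("embedding_bag", 'F.embedding_bag(idx, W_dec, per_sample_weights=vals, mode="sum")'),
    ]
]
benchmark.Compare(timers).print()
```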
`embedding_bag` takes up a ton of memory when the SAE isn't sparse, so it doesn't seem suitable for non-topk SAEs. Even with topk SAEs, I get a memory access violation after it runs for a few minutes on a backwards pass, so I'm not sure what's up with that 🤔