Commit
Move from manual sharding to HF dataset builder.
Depends on #389.

Inspired by: https://opensourcemechanistic.slack.com/archives/C07EHMK3XC7/p1732413633220709

Instead of manually writing each Arrow shard, we can create a dataset builder that does this more efficiently. This speeds up saving quite a lot: the old method spent some time calculating the fingerprint of each shard, which was unnecessary and would have required a hack to get around.

> Along with this change, I also switched to a 1D activation scheme.

- Previously the dataset was stored as a `(seq_len d_in)` array.
- Now it is stored as a flat `d_in` array.

The primary reason for this change is shuffling activations. I found that when activations are stored as whole sequences, they are not properly shuffled. This is a problem with `ActivationCache` too, but there's not a great solution for it there. You can observe this in the loss of the SAE by using small buffer sizes with either the cache or `ActivationStore`.
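To illustrate the shuffling argument, here is a toy sketch in plain Python. It is not code from this commit: the function names, sizes, and the `(sequence_id, position)` labels standing in for real `d_in` activation vectors are all assumptions made for the example. It compares shuffling rows that each hold a whole sequence against shuffling flat, one-activation rows, and measures how many distinct source sequences end up in each batch.

```python
import random

# Hypothetical sizes chosen for illustration only.
N_SEQS, SEQ_LEN, BATCH_SIZE = 64, 16, 16


def make_sequences(n_seqs, seq_len):
    """Stand-in for activations: each item is a (sequence_id, position) label."""
    return [[(s, p) for p in range(seq_len)] for s in range(n_seqs)]


def chunk(items, size):
    """Split a list into consecutive batches of the given size."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def batches_sequence_rows(seqs, batch_size, rng):
    """Sequence-per-row scheme: shuffle whole sequences, then flatten into batches."""
    order = list(seqs)
    rng.shuffle(order)
    flat = [tok for seq in order for tok in seq]
    return chunk(flat, batch_size)


def batches_flat_rows(seqs, batch_size, rng):
    """Flat scheme: every activation is its own row, so the shuffle mixes sequences."""
    flat = [tok for seq in seqs for tok in seq]
    rng.shuffle(flat)
    return chunk(flat, batch_size)


def mean_distinct_sequences(batches):
    """Average number of distinct source sequences represented per batch."""
    return sum(len({s for s, _ in b}) for b in batches) / len(batches)


rng = random.Random(0)
seqs = make_sequences(N_SEQS, SEQ_LEN)
seq_level = mean_distinct_sequences(batches_sequence_rows(seqs, BATCH_SIZE, rng))
flat_level = mean_distinct_sequences(batches_flat_rows(seqs, BATCH_SIZE, rng))
print(f"sequence rows: {seq_level:.2f} distinct sequences per batch")
print(f"flat rows:     {flat_level:.2f} distinct sequences per batch")
```

In this toy setup the batch size equals the sequence length, so with sequence rows every batch comes from a single sequence, while flat rows spread each batch over many sequences. A real buffer would mix several sequences, but the correlation within a batch shrinks the same way once rows are stored per activation.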
1 parent dd09264, commit a1da04c
Showing 4 changed files with 149 additions and 277 deletions.