Releases: adamkarvonen/SAEBench

v0.4.0

22 Feb 02:39

v0.4.0 (2025-02-22)

Chore

  • chore: making test less flaky (3effa18)

  • chore: fix updated torch types (4c46da6)

  • chore: fixing linting errors and adding precommit hook (85f6241)

Feature

  • feat: allow setting the artifacts path (2a4b4dc)

Fix

  • fix: gracefully handle slashes in model filename for autointerp (5d6464a)

  • fix: fix typing and updating mdl for saelens >=5.4.0 (802d1c3)

  • fix: load probe class with weights_only = False (f05bf40)

  • fix: Update README to include eval output schema update instructions (f0adee2)

  • fix: Update json schema jsons (2b2a6d3)
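
The slash-handling fix above concerns model names such as `google/gemma-2-2b`, where the `/` would otherwise be read as a directory separator when the name is used in an output filename. A minimal sketch of one way to handle this, assuming slashes are simply replaced with underscores; the helper name `sanitize_model_name` and the replacement character are hypothetical and not taken from the SAEBench code:

```python
def sanitize_model_name(model_name: str, replacement: str = "_") -> str:
    """Return a filename-safe version of a model name.

    Hypothetical helper: replaces forward and backward slashes so that
    a name like "org/model" cannot create an unintended subdirectory.
    """
    return model_name.replace("/", replacement).replace("\\", replacement)


# Hypothetical usage: build an output filename from a Hugging Face style name.
safe_name = sanitize_model_name("google/gemma-2-2b")
output_filename = f"{safe_name}_autointerp.json"
```

The actual fix in 5d6464a may use a different replacement scheme; this only illustrates the kind of problem being addressed.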

Unknown

  • Merge pull request #60 from chanind/deflaking-test

chore: making test less flaky (963f2e8)

  • Remove threshold from state dict if we aren't using it (d91a218)

  • Merge pull request #59 from chanind/artifacts-path-option

feat: allow setting the artifacts path (53901a2)

  • Merge pull request #58 from chanind/fixing-types

chore: fix updated torch types (849018f)

  • Merge pull request #57 from chanind/fix-slash-in-model-name-autointerp

fix: gracefully handle slashes in model filename for autointerp (11b2e38)

  • adding artifacts_path to unlearning eval (ce1de32)

  • By default we don't use a threshold for custom topk SAEs (60579ed)

  • Merge pull request #56 from chanind/type-fixes

fix: fix typing and updating mdl for saelens >=5.4.0 (0888d07)

  • Merge pull request #55 from chanind/precommit-check

chore: fixing linting errors and adding precommit hook (7ac7ced)

  • Fix SAE Bench SAEs repo names (18dc457)

  • Prevent potential division by zero (92315dd)

  • Add optional pinned dependencies (e74f0cf)

  • Calculate featurewise statistics in demo (5204b48)

  • Improve documentation on custom SAE usage (f15fe53)

  • Merge pull request #53 from adamkarvonen/hide_absorption_stddev

hide stddev from default display for absorption (155afbc)

  • hide stddev from default display for absorption (d970f05)

  • Merge pull request #52 from adamkarvonen/update_scr_tpp

update scr_tpp_schema to show top 20 by default (f551e7b)

  • update scr_tpp_schema to show top 20 by default (59320e2)

  • Merge pull request #51 from adamkarvonen/update_schema_jsons

fix: Update eval output schema jsons (7b2021c)

  • Add computational requirements (9b621a9)

  • Improve graphing notebook, include matryoshka results in graphs (f2d1d98)

  • Merge pull request #50 from chanind/lint-and-type-check

chore: Adding formatting, linting and type checking (a0fb5e9)

  • adding README and Makefile with helpers (7452eca)

  • fixing linting and type-checking issues (e663e3a)

  • formatting with ruff (14dad45)

  • Check that unlearning data exists before running unlearning eval (294b25c)

  • Improve export notebook (e2b0b3c)

  • Improve graphing utils (661920d)

  • Fix spelling (8c0df93)

  • Add standard deviation for absorption / autointerp, store results per class for sparse probing / tpp for potential error bars (141aff7)

  • Use GPU probing in correct location (ec5efa8)

v0.3.2

14 Jan 23:47

v0.3.2 (2025-01-14)

Fix

  • fix: use GPU for llm probing (ba0956e)

Unknown

  • Don't hardcode the device for unlearning (a594ee6)

  • Update unlearning data path (443761d)

v0.3.1

14 Jan 17:07

v0.3.1 (2025-01-14)

Fix

  • fix: pass device into core evals (e6651ea)

Unknown

  • fold W_dec norm when loading SAE Lens SAEs (511d51a)

  • Change default sparse probing k values (271a9d4)
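
For readers unfamiliar with the "fold W_dec norm" change above: folding rescales each decoder row to unit norm and moves that scale into the matching encoder column and bias, which leaves the reconstruction of a plain ReLU SAE unchanged. The following is a minimal pure-Python sketch of that idea, assuming a standard ReLU SAE with nonzero decoder rows; the function names and list-based layout are illustrative assumptions, not the SAEBench or SAE Lens implementation:

```python
import math


def fold_decoder_norm(W_enc, b_enc, W_dec):
    """Fold decoder row norms into the encoder (illustrative sketch).

    W_enc: d_in x d_sae nested list, b_enc: list of length d_sae,
    W_dec: d_sae x d_in nested list. Assumes every decoder row is nonzero.
    """
    norms = [math.sqrt(sum(w * w for w in row)) for row in W_dec]
    # Each decoder row becomes unit norm.
    W_dec_folded = [[w / n for w in row] for row, n in zip(W_dec, norms)]
    # The matching encoder column and bias absorb the removed scale.
    W_enc_folded = [[w * n for w, n in zip(row, norms)] for row in W_enc]
    b_enc_folded = [b * n for b, n in zip(b_enc, norms)]
    return W_enc_folded, b_enc_folded, W_dec_folded


def reconstruct(x, W_enc, b_enc, W_dec):
    """ReLU SAE forward pass without a decoder bias, for comparison only."""
    d_sae = len(b_enc)
    acts = [
        max(0.0, sum(x[i] * W_enc[i][j] for i in range(len(x))) + b_enc[j])
        for j in range(d_sae)
    ]
    return [
        sum(acts[j] * W_dec[j][k] for j in range(d_sae))
        for k in range(len(W_dec[0]))
    ]


# Tiny example: 2 input dimensions, 3 latents.
W_enc = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]
b_enc = [0.1, 0.2, -0.3]
W_dec = [[3.0, 4.0], [0.0, 2.0], [1.0, 1.0]]
x = [1.0, 2.0]

before = reconstruct(x, W_enc, b_enc, W_dec)
after = reconstruct(x, *fold_decoder_norm(W_enc, b_enc, W_dec))
```

Because ReLU is positively homogeneous, `before` and `after` agree up to floating-point error, while the folded latent activations become comparable in scale across features. Architectures with thresholds, such as JumpReLU, need extra care that this sketch does not cover.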

v0.3.0

13 Jan 17:21

v0.3.0 (2025-01-13)

Feature

  • feat: Add a frac alive calculation to core (0399550)

Unknown

  • added absorption fraction metric (#48)

feat: added absorption fraction metric

  • Small fixes

  • remove unused FeatureAbsorptionCalculator._filter_prompts function


Co-authored-by: Demian Till <[email protected]> (7545ee3)

  • Add a script for organizing and uploading results (4689129)

  • Calculate featurewise statistics by default (bca84ca)

v0.2.0

09 Jan 23:02

v0.2.0 (2025-01-09)

Feature

  • feat: add misc core metrics (2c731f6)

Unknown

  • Make sure grad is enabled for absorption tests (bd25ca0)

v0.1.0

09 Jan 21:17

v0.1.0 (2025-01-09)

Feature

  • feat: EvalOutput and EvalConfig base classes to allow easy JSON schema export (537219a)

Fix

  • fix: eval_result_unstructured should be optional (38e81b0)

  • fix: dump to json file correctly (5f1cf15)

Unknown

  • git commit -m "fix: add missing __init__.py" (20b20f2)

  • Merge pull request #47 from chanind/packaging

feat: Setting up Python packaging and autodeploy with Semantic Release (e52a418)

  • Merge branch 'main' into packaging (9bc22a4)

  • Merge branch 'main' into packaging (bb10234)

  • Update SAE Bench demo to use new graphing functions (9bbfdc5)

  • switching to poetry and setting up CI (a9af271)

  • Add option to pass in arbitrary sae_class (e450661)

  • Mention dictionary_learning (c140e71)

  • Update graphing notebook to work with filenames (dc6f951)

  • deprecate graphing notebook (67118ee)

  • migrating to sae_bench base dir (bb8e145)

  • Use a smaller batch size for unlearning (3a099d2)

  • Reduce memory usage by only caching required activations (f026998)

  • Remove debugging check (8ea7162)

  • Add sanity checks before major run (0908b18)

  • Improve normalization check (16a3c0e)

  • Add normalization for batchtopk SAEs (6a031bd)

  • Add matryoshka loader (1078899)

  • Add pythia 160m (b219497)

  • simplify process of evaluating dictionary learning SAEs (c2dca52)

  • Add a script to run evals on dictionary learning SAEs (3f4139b)

  • Make the layer argument optional (e53675d)

  • Add batch_top_k, top_k, gated, and jump_relu implementations (9a7fce8)

  • Add a function to test the saes (864b4b3)

  • Update demo for new relu sae setup (5d04ce5)

  • Ensure loaded SAEs are on correct dtype and device (a5d6d62)

  • Create a base SAE class (8fcc9fe)

  • Add blog post link (2d47229)

  • cleanup README (0e724df)

  • Clean up graphing notebook (c08f3f5)

  • Graph results for all evals in demo notebook (29ac97b)

  • Clean up for release (1c9822c)

  • Include baseline pca in every graph. (a45afd2)

  • Clean up plot legends, support graphing subplots (7ade8b0)

  • Merge pull request #45 from adamkarvonen/update_jsonschemas

update jsonschemas (879c7ca)

  • update jsonschemas (a14d465)

  • Use notebook as default demo, mention in README (298796b)

  • Minor fixes to demo (05808c7)

  • Add missing batch size argument (877f2e7)

  • Fixes for changes to eval config formats (e0cb629)

  • Add an optional best of k graphing cell (081b59c)

  • Ignore any folder containing "eval_results" (12f8d66)

  • Add cell to add training tokens to config dictionaries (38173c9)

  • Also plot all sae bench checkpoints (93563e0)

  • Add eval links (2216f99)

  • rename core results to match convention (51e47fd)

  • Ignore autointerp with generations when downloading (aa20644)

  • Use != instead of > for L0 measurement (83504b7)

  • Add utility cell for removing llm generations (67c9b03)

  • Add utility cell for splitting up files by release name (3cc51ea)

  • Add force rerun option to core, match sae loading to other evals (8676d5d)

  • Improve plotting of results (89e5567)

  • Consolidate SAE loading and output locations (293b385)

  • Plot generator for SAE Bench (c2cb78e)

  • Add utility notebook for adding sae configs (8508a01)

  • Improve custom SAE usage (e959f65)

  • Improve graphing (490cd2a)

  • Fix failing tests (ed88f65)

  • match core output filename with others (8ca0787)

  • Remove del sae flag (feaf1f8)

  • Add current status to repo (9c95af7)

  • Add sae config to output file (b2fbd6d)

  • Add a flag for k sparse probing batch size (6f2e38f)

  • Merge pull request #44 from adamkarvonen/absorption-tweaks-2

improving memory usage of k-sparse probing (6ae8235)

  • Merge pull request #43 from adamkarvonen/fake_branch

single...
