Latest release, published 22 Feb 02:39 by github-actions (commit c0f5431)

v0.4.0 (2025-02-22)

Chore

chore: making test less flaky (3effa18)
chore: fix updated torch types (4c46da6)
chore: fixing linting errors and adding precommit hook (85f6241)

Feature

feat: allow setting the artifacts path (2a4b4dc)

Fix

fix: gracefully handle slashes in model filename for autointerp (5d6464a)
fix: fix typing and updating mdl for saelens >=5.4.0 (802d1c3)
fix: load probe class with weights_only = False (f05bf40)
fix: Update README to include eval output schema update instructions (f0adee2)
fix: Update json schema jsons (2b2a6d3)

Unknown

Merge pull request #60 from chanind/deflaking-test

chore: making test less flaky (963f2e8)

Remove threshold from state dict if we aren't using it (d91a218)
Merge pull request #59 from chanind/artifacts-path-option

feat: allow setting the artifacts path (53901a2)

Merge pull request #58 from chanind/fixing-types

chore: fix updated torch types (849018f)

Merge pull request #57 from chanind/fix-slash-in-model-name-autointerp

fix: gracefully handle slashes in model filename for autointerp (11b2e38)

adding artifacts_path to unlearning eval (ce1de32)
By default we don't use a threshold for custom topk SAEs (60579ed)
Merge pull request #56 from chanind/type-fixes

fix: fix typing and updating mdl for saelens >=5.4.0 (0888d07)

Merge pull request #55 from chanind/precommit-check

chore: fixing linting errors and adding precommit hook (7ac7ced)

Fix SAE Bench SAEs repo names (18dc457)
Prevent potential division by zero (92315dd)
Add optional pinned dependencies (e74f0cf)
Calculate featurewise statistics in demo (5204b48)
Improve documentation on custom SAE usage (f15fe53)
Merge pull request #53 from adamkarvonen/hide_absorption_stddev

hide stddev from default display for absorption (155afbc)

hide stddev from default display for absorption (d970f05)
Merge pull request #52 from adamkarvonen/update_scr_tpp

update scr_tpp_schema to show top 20 by default (f551e7b)

update scr_tpp_schema to show top 20 by default (59320e2)
Merge pull request #51 from adamkarvonen/update_schema_jsons

fix: Update eval output schema jsons (7b2021c)

Add computational requirements (9b621a9)
Improve graphing notebook, include matryoshka results in graphs (f2d1d98)
Merge pull request #50 from chanind/lint-and-type-check

chore: Adding formatting, linting and type checking (a0fb5e9)

adding README and Makefile with helpers (7452eca)
fixing linting and type-checking issues (e663e3a)
formatting with ruff (14dad45)
Check that unlearning data exists before running unlearning eval (294b25c)
Improve export notebook (e2b0b3c)
Improve graphing utils (661920d)
Fix spelling (8c0df93)
Add standard deviation for absorption / autointerp, store results per class for sparse probing / tpp for potential error bars (141aff7)
Use GPU probing in correct location (ec5efa8)

Published 14 Jan 23:47 by github-actions (commit b5cb985)

v0.3.2 (2025-01-14)

Fix

fix: use GPU for llm probing (ba0956e)

Unknown

Don't hardcode the device for unlearning (a594ee6)
Update unlearning data path (443761d)

Published 14 Jan 17:07 by github-actions (commit fda4e30)

v0.3.1 (2025-01-14)

Fix

fix: pass device into core evals (e6651ea)

Unknown

fold W_dec norm when loading SAE Lens SAEs (511d51a)
Change default sparse probing k values (271a9d4)

Published 13 Jan 17:21 by github-actions (commit 0358911)

v0.3.0 (2025-01-13)

Feature

feat: Add a frac alive calculation to core (0399550)

Unknown

added absorption fraction metric (#48)

feat: added absorption fraction metric

Small fixes
remove unused FeatureAbsorptionCalculator._filter_prompts function

Co-authored-by: Demian Till (7545ee3)

Add a script for organizing and uploading results (4689129)
Calculate featurewise statistics by default (bca84ca)

Published 09 Jan 23:02 by github-actions (commit b993543)

v0.2.0 (2025-01-09)

Feature

feat: add misc core metrics (2c731f6)

Unknown

Make sure grad is enabled for absorption tests (bd25ca0)

Published 09 Jan 21:17 by github-actions (commit 800dafa)

v0.1.0 (2025-01-09)

Feature

feat: EvalOutput and EvalConfig base classes to allow easy JSON schema export (537219a)

Fix

fix: eval_result_unstructured should be optional (38e81b0)
fix: dump to json file correctly (5f1cf15)

Unknown

git commit -m "fix: add missing __init__.py" (20b20f2)
Merge pull request #47 from chanind/packaging

feat: Setting up Python packaging and autodeploy with Semantic Release (e52a418)

Merge branch 'main' into packaging (9bc22a4)
Merge branch 'main' into packaging (bb10234)
Update SAE Bench demo to use new graphing functions (9bbfdc5)
switching to poetry and setting up CI (a9af271)
Add option to pass in arbitrary sae_class (e450661)
Mention dictionary_learning (c140e71)
Update graphing notebook to work with filenames (dc6f951)
deprecate graphing notebook (67118ee)
migrating to sae_bench base dir (bb8e145)
Use a smaller batch size for unlearning (3a099d2)
Reduce memory usage by only caching required activations (f026998)
Remove debugging check (8ea7162)
Add sanity checks before major run (0908b18)
Improve normalization check (16a3c0e)
Add normalization for batchtopk SAEs (6a031bd)
Add Matryoshka loader (1078899)
Add pythia 160m (b219497)
simplify process of evaluating dictionary learning SAEs (c2dca52)
Add a script to run evals on dictionary learning SAEs (3f4139b)
Make the layer argument optional (e53675d)
Add batch_top_k, top_k, gated, and jump_relu implementations (9a7fce8)
Add a function to test the saes (864b4b3)
Update demo for new relu sae setup (5d04ce5)
Ensure loaded SAEs are on correct dtype and device (a5d6d62)
Create a base SAE class (8fcc9fe)
Add blog post link (2d47229)
cleanup README (0e724df)
Clean up graphing notebook (c08f3f5)
Graph results for all evals in demo notebook (29ac97b)
Clean up for release (1c9822c)
Include baseline pca in every graph. (a45afd2)
Clean up plot legends, support graphing subplots (7ade8b0)
Merge pull request #45 from adamkarvonen/update_jsonschemas

update jsonschemas (879c7ca)

update jsonschemas (a14d465)
Use notebook as default demo, mention in README (298796b)
Minor fixes to demo (05808c7)
Add missing batch size argument (877f2e7)
Fixes for changes to eval config formats (e0cb629)
Add an optional best of k graphing cell (081b59c)
Ignore any folder containing "eval_results" (12f8d66)
Add cell to add training tokens to config dictionaries (38173c9)
Also plot all sae bench checkpoints (93563e0)
Add eval links (2216f99)
rename core results to match convention (51e47fd)
Ignore autointerp with generations when downloading (aa20644)
Use != instead of > for L0 measurement (83504b7)
Add utility cell for removing llm generations (67c9b03)
Add utility cell for splitting up files by release name (3cc51ea)
Add force rerun option to core, match sae loading to other evals (8676d5d)
Improve plotting of results (89e5567)
Consolidate SAE loading and output locations (293b385)
Plot generator for SAE Bench (c2cb78e)
Add utility notebook for adding sae configs (8508a01)
Improve custom SAE usage (e959f65)
Improve graphing (490cd2a)
Fix failing tests (ed88f65)
match core output filename with others (8ca0787)
Remove del sae flag (feaf1f8)
Add current status to repo (9c95af7)
Add sae config to output file (b2fbd6d)
Add a flag for k sparse probing batch size (6f2e38f)
Merge pull request #44 from adamkarvonen/absorption-tweaks-2

improving memory usage of k-sparse probing (6ae8235)

Merge pull request #43 from adamkarvonen/fake_branch

single...

Releases: adamkarvonen/SAEBench