Add AdEMAMix optimizer #1360
Conversation
```python
# For parity with bnb implementation we combine both fast
# and slow EMA stats into one stacked tensor.
state["m1_m2"] = p.new_zeros((2, *p.size()))
```
This is done for ease of compatibility with the existing test suite. In most other implementations we'll see two separate buffers here.
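For reference, here is a minimal sketch of how the stacked buffer can be split back into the two EMAs inside an update step. This is an assumed usage pattern, not code from this PR; `update_emas` and its arguments are hypothetical names.

```python
import torch

def update_emas(state: dict, grad: torch.Tensor, beta1: float, beta3: float) -> None:
    # unbind(0) returns views into the stacked buffer, so the in-place
    # updates below write directly back into state["m1_m2"].
    m1, m2 = state["m1_m2"].unbind(0)
    m1.mul_(beta1).add_(grad, alpha=1 - beta1)  # fast (Adam-style) EMA
    m2.mul_(beta3).add_(grad, alpha=1 - beta3)  # slow EMA added by AdEMAMix
```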
```cuda
// AdEMAMix has an additional state buffer, which we packed
// into state1. We need thread-local storage here for these.
// TODO: Mark with [[maybe_unused]] after upgrade to min compiler.
float s3_vals[NUM_PER_THREAD];
```
There are a few extra memory allocations like this to support AdEMAMix. I haven't confirmed whether the compiler optimizes these out for instantiations with OPTIMIZER=ADAM, but even if it doesn't, the overhead isn't very much.
This looks all good to me.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* Add AdEMAMix optimizer
* Add PagedAdEMAMix32bit, AdEMAMix32bit
* AdEMAMix: add support for alpha/beta3 scheduling
* Update paged AdEMAMix
Adds support for the AdEMAMix optimizer described here: https://arxiv.org/abs/2409.03137
Includes blockwise 8bit and 32bit versions, each supporting paged operation.
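For illustration, a hypothetical instantiation: the 32-bit class names appear in the commit list above, but the exact keyword arguments here are assumptions based on the paper's defaults, not a confirmed API.

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(16, 16)

# Hypothetical usage sketch; keyword arguments are assumptions.
optimizer = bnb.optim.AdEMAMix32bit(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999, 0.9999),  # (beta1, beta2, beta3)
    alpha=5.0,                   # weight of the slow EMA in the update
)
```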
AdEMAMix is a modification of Adam that introduces an additional, slower-moving EMA of the gradients. The paper observes that AdEMAMix forgets training data more slowly and can reach a loss comparable to AdamW's with significantly less training data.
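As a rough reference, here is a minimal single-step sketch of the paper's update under the stacked-state layout used in this PR. This is an assumed plain-PyTorch restatement, not the bitsandbytes kernel; `ademamix_step` and the `state` layout are hypothetical.

```python
import torch

def ademamix_step(p, grad, state, lr=1e-3, betas=(0.9, 0.999, 0.9999),
                  alpha=5.0, eps=1e-8, weight_decay=0.0):
    beta1, beta2, beta3 = betas
    state["step"] += 1
    t = state["step"]
    m1, m2 = state["m1_m2"].unbind(0)  # views into the stacked buffer
    nu = state["nu"]
    m1.mul_(beta1).add_(grad, alpha=1 - beta1)            # fast EMA
    m2.mul_(beta3).add_(grad, alpha=1 - beta3)            # slow EMA
    nu.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second moment
    m1_hat = m1 / (1 - beta1 ** t)  # bias-correct the fast EMA only
    nu_hat = nu / (1 - beta2 ** t)
    if weight_decay:
        p.mul_(1 - lr * weight_decay)  # decoupled (AdamW-style) weight decay
    p.addcdiv_(m1_hat + alpha * m2, nu_hat.sqrt().add_(eps), value=-lr)
```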
TODO: Implement scheduler for alpha/beta3
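For reference, a sketch of the alpha/beta3 schedules described in the paper: alpha warms up linearly, while beta3 is interpolated so that the EMA half-life grows roughly linearly from that of beta1 to that of the final beta3. Function names and argument conventions are assumptions.

```python
import math

def scheduled_alpha(step: int, alpha: float, t_alpha: int) -> float:
    # Linear warmup of alpha from 0 to its final value over t_alpha steps.
    return alpha * min(step / t_alpha, 1.0)

def scheduled_beta3(step: int, beta_start: float, beta3: float, t_beta3: int) -> float:
    # Interpolate in 1/ln(beta) so the EMA half-life grows linearly from
    # that of beta_start (typically beta1) to that of the final beta3.
    if step >= t_beta3:
        return beta3
    frac = step / t_beta3
    log_beta = (math.log(beta_start) * math.log(beta3)) / (
        (1.0 - frac) * math.log(beta3) + frac * math.log(beta_start)
    )
    return min(math.exp(log_beta), beta3)
```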