Maxvit #65
base: main
Conversation
Needs testing, but the main code should be done.
It works on random input. After testing the VQGAN on a WebDataset with the sbatch script, I'll test this too.
I added a custom TransformerLayer for MaxViT. Let me know if anyone has ideas on formulating this differently! My next step is mainly testing this out and comparing the VRAM usage. Then I'll open it for review.
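For reference, here's a minimal sketch of the MaxViT attention pattern (block attention within local windows, then grid attention across windows), loosely following lucidrains' implementation. This is not this PR's actual TransformerLayer: the class name, window size, use of `nn.MultiheadAttention`, and the layer layout are illustrative assumptions, and the real layer also keeps the original transformer feed-forward (hence the three FFNs discussed further down).

```python
# Illustrative sketch only, not the PR's TransformerLayer.
import torch
import torch.nn as nn
from einops import rearrange

def ffn(dim, mult=4):
    return nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim * mult),
                         nn.GELU(), nn.Linear(dim * mult, dim))

class MaxVitLayer(nn.Module):
    """Block (local window) attention, then grid (sparse global) attention,
    each followed by a feed-forward, with pre-norm residual connections."""

    def __init__(self, dim, heads=8, window=8):
        super().__init__()
        self.window = window
        self.block_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.grid_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.block_norm, self.grid_norm = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.block_ffn, self.grid_ffn = ffn(dim), ffn(dim)

    def forward(self, x, h, w):
        # x: (batch, h * w, dim) -- flattened VQ token grid
        ws, hb, wb = self.window, h // self.window, w // self.window

        # block attention: each token attends within its own (ws x ws) window
        blk = rearrange(x, 'b (hb ws1 wb ws2) d -> (b hb wb) (ws1 ws2) d',
                        hb=hb, ws1=ws, wb=wb, ws2=ws)
        n = self.block_norm(blk)
        blk = blk + self.block_attn(n, n, n, need_weights=False)[0]
        blk = blk + self.block_ffn(blk)
        x = rearrange(blk, '(b hb wb) (ws1 ws2) d -> b (hb ws1 wb ws2) d',
                      hb=hb, ws1=ws, wb=wb, ws2=ws)

        # grid attention: tokens at the same position in every window attend to each other
        grd = rearrange(x, 'b (ws1 hb ws2 wb) d -> (b hb wb) (ws1 ws2) d',
                        ws1=ws, hb=hb, ws2=ws, wb=wb)
        n = self.grid_norm(grd)
        grd = grd + self.grid_attn(n, n, n, need_weights=False)[0]
        grd = grd + self.grid_ffn(grd)
        return rearrange(grd, '(b hb wb) (ws1 ws2) d -> b (ws1 hb ws2 wb) d',
                         ws1=ws, hb=hb, ws2=ws, wb=wb)

# e.g. a 32x32 token grid (f8 VQGAN on 256x256 images) with 8x8 windows
layer = MaxVitLayer(dim=512)
tokens = torch.randn(2, 32 * 32, 512)
out = layer(tokens, h=32, w=32)   # -> (2, 1024, 512)
```

The point of splitting attention this way is that no attention call ever operates on the full 1024-token sequence, only on 64-token groups.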
Fixed!
I can run the code without any shape errors now, but I'm noticing that the MaxViT layers OOM while the counterpart doesn't. I think I'm initializing some parameters to be too large, which I plan to check tomorrow.
OK! I found that the main memory issue was the feed-forward networks in each transformer layer. They have the most parameters within a transformer layer, and in MaxViT we need 3 of them instead of just 1, which makes the memory usage per layer roughly 3 times higher. I fixed it so the model is only about 2 times the size now. The checklist now is:
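To put rough numbers on the feed-forward point (the hidden size and expansion factor below are assumptions for illustration, not this PR's config), the FFNs dominate a layer's parameter count, so tripling them roughly triples the per-layer memory:

```python
# Back-of-the-envelope parameter counts; dim and mult are assumed values.
dim, mult = 1024, 4
ffn_params = 2 * dim * (mult * dim)      # two linears: dim -> 4*dim and 4*dim -> dim
attn_params = 4 * dim * dim              # q, k, v, out projections

standard_layer = attn_params + ffn_params            # one attention + one FFN
maxvit_layer = 3 * attn_params + 3 * ffn_params       # block + grid + original attention, each with its own FFN

print(f"FFN params: {ffn_params:,}")                  # ~8.4M
print(f"standard layer: {standard_layer:,}")          # ~12.6M
print(f"maxvit-style layer: {maxvit_layer:,}")        # ~37.7M, roughly 3x
```

The PR's fix brings the overall model down to roughly 2x; the exact change isn't reproduced here.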
Without MaxViT: [memory usage screenshot]
With MaxViT: [memory usage screenshot]

I think the main way Google resolved this higher VRAM usage is to use optimizers like Lion and Adafactor instead of AdamW, since AdamW keeps extra optimizer state proportional to the model weights.

With Lion: [memory usage screenshot]

So, relative to the memory taken up by the weights themselves, it comes out better with MaxViT.
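For context on the optimizer point (this is general PyTorch behavior, not code from this PR): AdamW stores two extra buffers per parameter (first and second moments), roughly 2x the parameter memory, while Lion keeps a single momentum buffer, roughly 1x, so the extra parameters from MaxViT matter less in total. A quick way to check the AdamW side:

```python
# General PyTorch behavior, not code from this PR: inspect AdamW's optimizer state.
import torch

model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.Linear(4096, 1024))
params = sum(p.numel() for p in model.parameters())

opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
model(torch.randn(2, 1024)).sum().backward()
opt.step()

# exp_avg and exp_avg_sq are kept per parameter, so state is ~2x the parameters
state = sum(t.numel() for s in opt.state.values()
            for t in s.values() if torch.is_tensor(t))
print(f"parameters: {params:,}, AdamW state: {state:,} (~{state / params:.1f}x)")
```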
@williamberman @patil-suraj @sayakpaul @pcuenca I think I'm pretty much done. Let me know if there are any experiments or code changes you'd recommend!

The TL;DR for this PR: this is the attention format Google used for the second stage of Muse to reduce VRAM usage at the higher sequence length that comes from using an f8 VQGAN instead of an f16 VQGAN. This PR is heavily inspired by lucidrains' MaxViT implementation here.
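To spell out why the sequence length goes up (the 256x256 resolution below is an assumed example; the point is the ratio, not the absolute numbers): an f8 VQGAN downsamples by 8 and an f16 by 16, so the f8 token grid has 4x the area, and full self-attention cost grows with the square of the token count.

```python
# Assumed example resolution; only the f8 vs f16 ratio matters here.
image = 256
for factor in (16, 8):
    side = image // factor            # token grid side length
    seq_len = side * side             # tokens per image
    full_attn = seq_len ** 2          # pairwise interactions for full self-attention
    print(f"f{factor}: {side}x{side} grid -> {seq_len} tokens, ~{full_attn:,} attention pairs")
# f16: 16x16 -> 256 tokens (~65,536 pairs); f8: 32x32 -> 1024 tokens (~1,048,576 pairs)
# MaxViT-style block/grid attention instead keeps every attention within 8x8 = 64-token groups.
```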
Draft PR where I'm adapting MaxViT from lucidrains' code. The corresponding issue is here.