Memory-saving weight loading for non-quant models #56
Trying to fix #51.
This also increases the speed of loading weights (on my machine, about 1 min vs. 2 min).
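For context, here is a minimal sketch of one common PyTorch pattern for memory-saving weight loading: build the model on the `meta` device so parameters allocate no storage, then load the checkpoint memory-mapped and assign the tensors in place. `TinyModel` and `checkpoint.pt` are hypothetical stand-ins, and this is not necessarily the exact approach taken in this PR.

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for GemmaForCausalLM.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(1000, 64)
        self.proj = nn.Linear(64, 1000)

    def forward(self, x):
        return self.proj(self.embed(x))

# 1. Instantiate on the meta device: parameters are shape/dtype
#    placeholders, so no real memory is allocated yet.
with torch.device("meta"):
    model = TinyModel()

# 2. mmap=True (PyTorch >= 2.1) maps the checkpoint file instead of
#    reading it all into RAM; tensors are paged in lazily on access.
#    "checkpoint.pt" is a placeholder path.
state_dict = torch.load("checkpoint.pt", map_location="cpu", mmap=True)

# 3. assign=True (PyTorch >= 2.1) swaps the loaded tensors in directly
#    instead of copying into preallocated buffers, so the weights are
#    never held in memory twice. Depending on the PyTorch version,
#    assign=True may also carry over tensor properties such as
#    requires_grad, which is one plausible way the flags can end up
#    changed after loading.
model.load_state_dict(state_dict, assign=True)
```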
Tested on `1.1-7b-it` and `7b-it` models, but:

- quant models are not tested;
- this changes `requires_grad` of the `nn.Parameter`s in `Linear` and `Embedding` to `True` after loading is completed. (I don't know why some `nn.Parameter`s in `model.py` have `requires_grad` set to `False` while others keep the default `True`.) I think this shouldn't matter, though, since the `forward` function of `GemmaForCausalLM` is decorated with `@torch.no_grad()`; see the sketch below.
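To illustrate that last point, here is a small sketch (using a hypothetical stand-in module, not the actual Gemma code) of why the flipped `requires_grad` flags should be harmless for inference: a `@torch.no_grad()`-decorated call builds no autograd graph regardless of the flags, and the flags can always be reset explicitly after loading if they matter elsewhere.

```python
import torch
import torch.nn as nn

# Stand-in module; assume its parameters ended up with
# requires_grad=True after loading, as described above.
model = nn.Linear(8, 8)

# Mirroring GemmaForCausalLM: with @torch.no_grad() on the call,
# no autograd graph is recorded during inference, no matter what
# each parameter's requires_grad flag says.
@torch.no_grad()
def generate(x):
    return model(x)

out = generate(torch.randn(2, 8))
assert out.grad_fn is None  # no graph was built

# If the original flags matter elsewhere (e.g. a later fine-tuning
# path), they can be restored explicitly after loading:
for p in model.parameters():
    p.requires_grad_(False)
```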