Description

I use a script similar to `cola.sh` to train and/or evaluate a model for sequence classification. There are two possible parameters for model state files: `init_model` and `pre_trained`.

I want and expect the model weights to be loaded from `pre_trained` when it is provided, while the vocabulary is loaded based on `init_model` if `init_model` is one of the provided pretrained models. However, the model parameters are actually loaded using `init_model` only; the `pre_trained` flag has no effect in this function, although I expect `pre_trained` to override `init_model`.
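The override semantics I expect could look like the following. This is a minimal illustrative sketch, not the actual DeBERTa loading code; `load_model` and its signature are hypothetical, and only the precedence logic is the point:

```python
import torch
import torch.nn as nn

def load_model(model: nn.Module, init_model: str, pre_trained: str = None) -> nn.Module:
    """Hypothetical loader: init_model drives vocabulary/config resolution,
    while pre_trained, when given, should take precedence for the weights."""
    # ... vocabulary/tokenizer would be resolved from init_model here (omitted) ...
    if pre_trained is not None:
        # Expected behavior: weights from pre_trained override those from init_model.
        state_dict = torch.load(pre_trained, map_location="cpu")
        model.load_state_dict(state_dict, strict=False)
    return model
```

In the current code, the `if pre_trained is not None` branch effectively never runs, so the weights always come from `init_model`.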
Steps to reproduce

1. Set `init_model` to `deberta-v3-base`.
2. Set `pre_trained` to `$PATH_TO_MY_MODEL`, which is a path to the pretrained mDeBERTa-V3-Base, for example.
3. Check the model parameters after loading, e.g. `print(model.deberta.encoder.layer[7].output.dense.weight[:5,:4])` after this line.
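To tell which source the weights actually came from, a slice of the loaded weights can be compared against the checkpoint on disk. A self-contained sketch of that check, with a toy `nn.Linear` standing in for `model.deberta.encoder.layer[7].output.dense` and for the checkpoint:

```python
import torch
import torch.nn as nn

# Toy stand-ins (the real comparison would use the DeBERTa layer and
# the state dict loaded from $PATH_TO_MY_MODEL):
checkpoint_layer = nn.Linear(8, 8)  # plays the role of the pre_trained checkpoint
loaded_layer = nn.Linear(8, 8)      # plays the role of the model after loading

# If pre_trained had been applied, the slices would match:
loaded_layer.load_state_dict(checkpoint_layer.state_dict())
print(torch.equal(loaded_layer.weight[:5, :4], checkpoint_layer.weight[:5, :4]))  # True

# A freshly initialized model (weights from init_model instead) differs:
fresh_layer = nn.Linear(8, 8)
print(torch.equal(fresh_layer.weight[:5, :4], checkpoint_layer.weight[:5, :4]))  # False
```

In my case the comparison comes out `False`: the printed slice matches `init_model`'s weights rather than the checkpoint at `$PATH_TO_MY_MODEL`.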
tensor([[-0.0212,  0.0130,  0.0446,  0.0156],
        [ 0.0811,  0.0023,  0.0057, -0.0301],
        [-0.0190,  0.0097, -0.0114,  0.0306],
        [ 0.0049, -0.0174,  0.0064, -0.0275],
        [-0.0152, -0.0411, -0.0166, -0.0447]], dtype=torch.float16)

tensor([[ 0.0278, -0.0206, -0.0062,  0.0368],
        [ 0.0262, -0.0676,  0.0477,  0.0249],
        [-0.0364,  0.0453,  0.0912,  0.0590],
        [-0.0638,  0.0402,  0.0272, -0.0013],
        [-0.0352, -0.0579,  0.0320,  0.0003]], grad_fn=<...>)
Additional information/Environment
My system setup is: