ViLT token_type_embeddings implemented twice #34758
Comments
Hi, yes, ViLT uses two types of token type embeddings: the BERT-style segment embeddings inside the `TextEmbeddings` class, and the modality-type embeddings inside the `ViltEmbeddings` class that distinguish text tokens from image patches.
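You can see that the two tables are sized by two separate config fields (a quick check; this assumes the `type_vocab_size` and `modality_type_vocab_size` fields of `ViltConfig` correspond to the two tables, which matches the module definitions linked below):

```python
from transformers import ViltConfig

config = ViltConfig.from_pretrained("dandelin/vilt-b32-mlm")

# Size of the BERT-style segment table used inside TextEmbeddings:
print(config.type_vocab_size)
# Size of the modality table used inside ViltEmbeddings (text vs. image):
print(config.modality_type_vocab_size)
```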
Hi, thank you for your reply. Do you know whether the first ones are present in the original ViLT implementation, or whether they are an addition of the Hugging Face implementation? In the equations that describe ViLT in the original paper, the first token type embeddings are not mentioned.
It's included in the original implementation here, so I assume they just don't mention it in the paper.
I loaded `ViltModel.from_pretrained("dandelin/vilt-b32-mlm")` and printed out the parameters of the model. I noticed that `token_type_embeddings` was included twice: once in the `TextEmbeddings` class (https://github.com/huggingface/transformers/blob/13493215abceafc1653af88b045120014fb4c1fc/src/transformers/models/vilt/modeling_vilt.py#L239) and once in the `ViltEmbeddings` class (https://github.com/huggingface/transformers/blob/13493215abceafc1653af88b045120014fb4c1fc/src/transformers/models/vilt/modeling_vilt.py#L98).
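A minimal way to reproduce the observation (a sketch; the parameter names follow the module attributes linked above and may differ across library versions):

```python
from transformers import ViltModel

model = ViltModel.from_pretrained("dandelin/vilt-b32-mlm")

# Print every parameter whose name mentions token_type_embeddings;
# both tables show up:
for name, param in model.named_parameters():
    if "token_type_embeddings" in name:
        print(name, tuple(param.shape))

# Expected output (roughly, assuming current attribute names):
#   embeddings.text_embeddings.token_type_embeddings.weight (2, 768)
#   embeddings.token_type_embeddings.weight (2, 768)
```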
Additionally, both of these `token_type_embeddings` are added to the text embeddings, which doesn't seem to agree with the original paper.
The first addition happens in the `TextEmbeddings` class in this line: `embeddings = inputs_embeds + token_type_embeddings` (https://github.com/huggingface/transformers/blob/13493215abceafc1653af88b045120014fb4c1fc/src/transformers/models/vilt/modeling_vilt.py#L280).
The second happens in the `ViltEmbeddings` class in this line: `text_embeds = text_embeds + self.token_type_embeddings(torch.zeros_like(attention_mask, dtype=torch.long, device=text_embeds.device))` (https://github.com/huggingface/transformers/blob/13493215abceafc1653af88b045120014fb4c1fc/src/transformers/models/vilt/modeling_vilt.py#L218C9-L220C10).
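In simplified form, the effective computation for the text stream looks like this (a sketch of the two additions above, not the actual library code; word and position embeddings are omitted for brevity):

```python
import torch
import torch.nn as nn

hidden_size = 768

# Two independent tables, mirroring the two classes linked above (sketch):
text_token_type = nn.Embedding(2, hidden_size)  # TextEmbeddings.token_type_embeddings
modality_type = nn.Embedding(2, hidden_size)    # ViltEmbeddings.token_type_embeddings

def embed_text(inputs_embeds, token_type_ids, attention_mask):
    # First addition (inside TextEmbeddings.forward):
    embeddings = inputs_embeds + text_token_type(token_type_ids)
    # Second addition (inside ViltEmbeddings.forward, index 0 = text modality):
    embeddings = embeddings + modality_type(
        torch.zeros_like(attention_mask, dtype=torch.long)
    )
    return embeddings
```

So every text token receives both a segment-type vector and a modality-type vector, which is the double addition described above.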
The image embeddings don't seem to have the same issue, as the `token_type_embeddings` are added to them only once, in `ViltEmbeddings`. Is this intended behaviour for some reason I don't understand, or is it a mistake?