Attention mask? #88

Open
pcuenca opened this issue Jun 17, 2023 · 1 comment

Comments

pcuenca (Member) commented Jun 17, 2023

Like in Stable Diffusion, no attention mask appears to be used for input tokens:

input_ids = self.tokenizer(
    text,
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=self.tokenizer.model_max_length,
).input_ids  # TODO: remove hardcode
input_ids = input_ids.to(self.device)
encoder_hidden_states = self.text_encoder(input_ids).last_hidden_state

But according to third-party analysis, this appears to have been a mistake all along. Do we have any insight into whether attention masks would help with prompt-image alignment?
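For reference, here is a minimal sketch of what a masked variant could look like, assuming a transformers CLIP-style text encoder that accepts an attention_mask argument (the standalone tokenizer/encoder and the checkpoint name are illustrative stand-ins for the pipeline's own self.tokenizer / self.text_encoder):

import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Illustrative checkpoint; the actual pipeline would use its own tokenizer/encoder.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

def encode_with_mask(text, device="cpu"):
    # Request the attention mask alongside the input ids.
    batch = tokenizer(
        text,
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=tokenizer.model_max_length,
    )
    input_ids = batch.input_ids.to(device)
    attention_mask = batch.attention_mask.to(device)
    # Passing the mask lets the encoder's self-attention ignore PAD positions.
    with torch.no_grad():
        encoder_hidden_states = text_encoder(
            input_ids, attention_mask=attention_mask
        ).last_hidden_state
    return encoder_hidden_states, attention_mask

Note that to fully match a masked training setup, the same mask would presumably also need to be forwarded to the model's cross-attention over the text states, not just to the text encoder.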


Birch-san commented Jul 7, 2023

These authors reckon it's better to train on unmasked text embeddings (even though that risks learning from PAD token embeddings):
huggingface/diffusers#1890 (comment)

As for inference: the user needs to be able to match whatever approach was used during training.

I thought Muse was a bit wackier though. It actually masks vision tokens:

https://github.com/lucidrains/muse-maskgit-pytorch/
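For context, a rough sketch of MaskGIT-style masking of vision tokens during training, assuming a cosine masking schedule and a dedicated mask_token_id (both are illustrative assumptions rather than the exact implementation in that repository):

import math
import torch

def mask_vision_tokens(token_ids, mask_token_id):
    batch, seq_len = token_ids.shape
    # Sample a masking ratio per example from a cosine schedule (assumption).
    t = torch.rand(batch, device=token_ids.device)
    mask_ratio = torch.cos(t * math.pi / 2)
    num_masked = (mask_ratio * seq_len).long().clamp(min=1)
    # Randomly choose which positions to mask in each example.
    scores = torch.rand(batch, seq_len, device=token_ids.device)
    ranks = scores.argsort(dim=-1).argsort(dim=-1)  # rank of each position
    mask = ranks < num_masked.unsqueeze(-1)
    masked_ids = torch.where(mask, torch.full_like(token_ids, mask_token_id), token_ids)
    return masked_ids, mask

The model is then trained to predict the original token ids at the masked positions, and at inference the masked grid is progressively filled in over several decoding steps.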
