Dear authors,

Thank you for your excellent work. I have a question about your training methodology, specifically how the training data is used. Examining the code in your GitHub repository (`anole/facilitating_image_generation/train_image_head.py`, line 19 in 219a9a3), I noticed that only image tokens appear to be fed into the network. Could you confirm whether my understanding is correct? If so, how does the model learn to generate images corresponding to different text inputs?
It seems the training is built on top of the official Chameleon checkpoint.
I think the training doc is quite clear: they build a dataset of (text, image_tokens) pairs and train only the output layer that produces the special image tokens (4, 8196).
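To make that recipe concrete, here is a minimal PyTorch sketch of the setup, assuming an HF-style model with an `lm_head` attribute and taking the (4, 8196) quoted above as the image-token id range; this is an illustration of the idea, not the repository's actual code.

```python
import torch
import torch.nn.functional as F

# Assumed image-token id range, following the (4, 8196) cited in this
# thread, i.e. vocabulary rows 4..8195. Verify against the real tokenizer.
IMG_START, IMG_END = 4, 8196

def _row_mask(grad):
    # Zero the gradient everywhere except the image-token rows.
    mask = torch.zeros_like(grad)
    mask[IMG_START:IMG_END] = 1.0
    return grad * mask

def prepare_for_image_head_tuning(model):
    """Freeze every parameter except the output head, then hook the head's
    gradient so that only the image-token rows actually update."""
    for p in model.parameters():
        p.requires_grad = False
    head = model.lm_head.weight  # assumption: HF-style `lm_head`
    head.requires_grad = True
    head.register_hook(_row_mask)

def image_head_loss(logits, labels):
    """Next-token loss computed only at positions whose target is an image
    token; text positions are masked out with ignore_index=-100."""
    shift_logits = logits[:, :-1].reshape(-1, logits.size(-1))
    shift_labels = labels[:, 1:].reshape(-1)
    is_image = (shift_labels >= IMG_START) & (shift_labels < IMG_END)
    shift_labels = torch.where(
        is_image, shift_labels, torch.full_like(shift_labels, -100)
    )
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
```

Because the sequences are (text, image_tokens) pairs, the text still conditions the frozen transformer's hidden states; only the head rows that map those states to image-token logits are learned, which is how text-to-image generation emerges without touching the rest of the checkpoint.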