Dear authors,

Thank you for your excellent work. I have a question about your training methodology, specifically how the training data is used. Examining the code in your GitHub repository (`anole/facilitating_image_generation/train_image_head.py`, line 19 in 219a9a3), I noticed that only image tokens appear to be fed into the network. Could you confirm whether my understanding is correct? If so, how does the model learn to generate images corresponding to different text inputs?
It seems the training is built on top of the official Chameleon checkpoint.
I think the training doc is quite clear: they build a dataset of (text, image_tokens) pairs and train only the output layer that produces the special image tokens (4, 8196).
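To make that recipe concrete, here is a minimal PyTorch sketch of the setup, assuming an HF-style model with an `lm_head` attribute and taking the (4, 8196) quoted above as the image-token id range; this is an illustration of the idea, not the repository's actual code.

```python
import torch
import torch.nn.functional as F

# Assumed image-token id range, following the (4, 8196) cited in this
# thread, i.e. vocabulary rows 4..8195. Verify against the real tokenizer.
IMG_START, IMG_END = 4, 8196

def _row_mask(grad):
    # Zero the gradient everywhere except the image-token rows.
    mask = torch.zeros_like(grad)
    mask[IMG_START:IMG_END] = 1.0
    return grad * mask

def prepare_for_image_head_tuning(model):
    """Freeze every parameter except the output head, then hook the head's
    gradient so that only the image-token rows actually update."""
    for p in model.parameters():
        p.requires_grad = False
    head = model.lm_head.weight  # assumption: HF-style `lm_head`
    head.requires_grad = True
    head.register_hook(_row_mask)

def image_head_loss(logits, labels):
    """Next-token loss computed only at positions whose target is an image
    token; text positions are masked out with ignore_index=-100."""
    shift_logits = logits[:, :-1].reshape(-1, logits.size(-1))
    shift_labels = labels[:, 1:].reshape(-1)
    is_image = (shift_labels >= IMG_START) & (shift_labels < IMG_END)
    shift_labels = torch.where(
        is_image, shift_labels, torch.full_like(shift_labels, -100)
    )
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
```

Because the sequences are (text, image_tokens) pairs, the text still conditions the frozen transformer's hidden states; only the head rows that map those states to image-token logits are learned, which is how text-to-image generation emerges without touching the rest of the checkpoint.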