About multimodal sequence input #38
Comments
Hi, we did not try that.
Hello, something about the multimodal sequence input in MMU seems strange to me.
Hi, for continuous CLIP-ViT features, we follow LLaVA's processing. In our experiments, the order does not seem to matter much.
This result is quite interesting. I'd like to know which input order was used during training.
Hello, I am very interested in your great work. I see in the code that, for image generation, the input sequence basically places text tokens before image tokens. What about reversing the order when generating the image?
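
To make the ordering question concrete, here is a minimal sketch of assembling the two layouts. It is not the repository's actual code: `build_sequence`, the special-token ids, and the `text_first` flag are all hypothetical names chosen for illustration.

```python
from typing import List

# Hypothetical special-token ids, chosen only for illustration.
SOT, EOT = 1003, 1004  # start / end of text
SOI, EOI = 1001, 1002  # start / end of image


def build_sequence(
    text_ids: List[int], image_ids: List[int], text_first: bool = True
) -> List[int]:
    """Concatenate text and image token ids in the requested order."""
    text_part = [SOT, *text_ids, EOT]
    image_part = [SOI, *image_ids, EOI]
    return text_part + image_part if text_first else image_part + text_part


# Text tokens before image tokens (the order described in the question).
text_then_image = build_sequence([11, 12], [501, 502])
# Reversed order: image tokens before text tokens.
image_then_text = build_sequence([11, 12], [501, 502], text_first=False)
```

Under a causal attention mask, each token attends only to earlier positions, so the order decides which modality can condition on the other. Whether this matters in practice depends on the attention scheme the model actually uses.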