The code mixes the t2i, llm, and mmu datasets together during training #48
Comments
They have always been trained together as a mixture.
If GPU memory is the constraint, you can enable DeepSpeed ZeRO-3 or turn on gradient accumulation steps.
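For reference, a minimal sketch of gradient accumulation in a plain PyTorch loop (the model, optimizer, and `accumulation_steps` value here are toy stand-ins, not the actual Show-o training code):

```python
import torch
from torch import nn

# Toy stand-ins; in practice these are the Show-o model, optimizer, and dataloader.
model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
data = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(32)]

accumulation_steps = 4  # effective batch = micro-batch * 4, at the cost of more forward passes per update

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = nn.functional.mse_loss(model(x), y)
    # Scale the loss so the accumulated gradient matches one large-batch update
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```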
Got it, thanks for clearing that up.
Feel free to give our repository a star :)
Hi authors, when training only on the stage-3 high-quality data, are all three parts (t2i, llm, mmu) still required? The paper seems to use only t2i and LLaVA data in stage 3.
Yes, they are. The high-quality data simply replaces one of the parts, and the three flows are still trained together as a mixture.
Then, when fine-tuning from (show-o-512x512-wo-llava-tuning), will performance suffer much if RefinedWeb and the language modeling loss are not used?
The impact on the understanding benchmarks should be small. Pure-text modeling would likely be affected, but we did not evaluate pure-text modeling ability.
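To make the fine-tuning trade-off concrete, here is a hedged sketch of how the per-flow losses could be combined, with the language-modeling term zeroed out (the function and coefficient names are hypothetical; the real weights come from the Show-o training configs):

```python
def combined_loss(loss_t2i, loss_lm, loss_mmu,
                  coeff_t2i=1.0, coeff_lm=0.0, coeff_mmu=1.0):
    """Weighted sum of the three per-flow training objectives.

    Setting coeff_lm = 0.0 corresponds to fine-tuning without the
    RefinedWeb / language-modeling term discussed above: understanding
    benchmarks should change little, but pure-text modeling may degrade.
    """
    return coeff_t2i * loss_t2i + coeff_lm * loss_lm + coeff_mmu * loss_mmu
```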
```python
iterables = {
    "t2i_flow": train_dataloader_t2i,
    "lm_flow": train_dataloader_lm,
    "mmu_flow": train_dataloader_mmu,
}
```
Because of hardware limitations I cannot run the training myself, but as I understand the code, the t2i, llm, and mmu datasets are mixed together for training. The datasets may change when the json files differ, but however they change, they still seem to be trained as a mixture.
Or is it possible that one of the datasets is sometimes empty?
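For illustration, here is a minimal sketch of one way the three flows could be consumed together in a single training loop. This is only an assumption about the mechanism, not the repository's actual loop (the real code presumably builds a combined loader from the `iterables` dict above), and the toy lists stand in for the real dataloaders:

```python
from itertools import cycle

# Toy stand-ins for the three dataloaders referenced in the iterables dict.
train_dataloader_t2i = [{"flow": "t2i", "idx": i} for i in range(6)]
train_dataloader_lm = [{"flow": "lm", "idx": i} for i in range(3)]
train_dataloader_mmu = [{"flow": "mmu", "idx": i} for i in range(4)]

iterables = {
    "t2i_flow": train_dataloader_t2i,
    "lm_flow": train_dataloader_lm,
    "mmu_flow": train_dataloader_mmu,
}

# Cycle the shorter flows so every step sees one batch from each flow,
# stopping once the longest flow has been traversed.
num_steps = max(len(v) for v in iterables.values())
iters = {k: cycle(v) for k, v in iterables.items()}

for step in range(num_steps):
    batch = {k: next(it) for k, it in iters.items()}
    # In real training, each sub-batch yields its own loss and the losses
    # are combined before a single backward pass.
    print(step, {k: b["idx"] for k, b in batch.items()})
```

Under this reading none of the flows is ever empty; each optimization step draws from all three, which matches the maintainer's answer above.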