Training environment:
torch==2.1.2, CUDA 11.8, transformers==4.45.0.dev0, LLaMA-Factory==v0.9.0, GPU: A800

Problem encountered during training: I started from the training script https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/train_lora/qwen2vl_lora_sft.yaml and modified the video parsing strategy. With non-streaming data preprocessing, a dataset of 100k+ samples finishes preprocessing in about 90 minutes with preprocessing_num_workers==48. When training only the LLM LoRA, training runs normally. However, when the "additional_target: merger" option is added, preprocessing still completes normally, but the run hangs for a long time with no error right after the first training step finishes (reproduced three or four times, it happens consistently; with a much smaller training dataset it trains fine). While hung, GPU memory is fully occupied and GPU utilization stays at 100% for a long time. Could someone explain why this happens?
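For reference, a minimal sketch of the non-streaming LoRA config being described. Field names follow the LLaMA-Factory example linked above; the model path, dataset name, and hyperparameter values are placeholders rather than the exact values used in this run, with only additional_target: merger and preprocessing_num_workers: 48 taken from the report:

```yaml
### model
model_name_or_path: Qwen/Qwen2-VL-7B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
additional_target: merger   # also train the multimodal merger module alongside the LLM LoRA

### dataset
dataset: my_video_dataset   # placeholder dataset name
template: qwen2_vl
cutoff_len: 2048
preprocessing_num_workers: 48

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
bf16: true
```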
I also tried training with "additional_target: merger" under streaming data processing. In that mode the model trains normally, but streaming does not support mixed-modality data and training becomes extremely slow. The streaming configuration is as follows:

buffer_size: 64
preprocessing_batch_size: 64
streaming: true
accelerator_config:
  dispatch_batches: false
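For completeness, a sketch of how those streaming options would sit in the same config. Only buffer_size, preprocessing_batch_size, streaming, and accelerator_config.dispatch_batches come from the report above; the remaining fields are placeholders, and the max_steps line reflects the general HF Trainer requirement that a step budget be set when the dataset is iterable:

```yaml
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
additional_target: merger

### dataset (streaming mode)
dataset: my_video_dataset   # placeholder dataset name
template: qwen2_vl
streaming: true
buffer_size: 64
preprocessing_batch_size: 64

### train
accelerator_config:
  dispatch_batches: false   # each process fetches its own batches instead of the main process dispatching them
max_steps: 10000            # placeholder; an iterable (streaming) dataset needs max_steps rather than num_train_epochs
```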
Hello, could anyone help answer this question?
Hello, can anyone reply... I also asked in the group chat, but no one responded.
Same question here.