```bash
torchrun $DISTRIBUTED_ARGS --nnodes 1 --nproc_per_node $gpu_num --master_port 12345 \
  /home/funasr-dev/FunAsr-fork/FunASR/funasr/bin/train_ds.py \
  ++model="/home/funasr-dev/FunAsr-fork/models/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch" \
  ++train_data_set_list="./train.jsonl" ++valid_data_set_list="./val.jsonl" \
  ++dataset="AudioDataset" ++dataset_conf.index_ds="IndexDSJsonl" \
  ++dataset_conf.data_split_num=1 ++dataset_conf.batch_sampler="BatchSampler" \
  ++dataset_conf.batch_size=10000 ++dataset_conf.max_source_length=5000 \
  ++dataset_conf.max_token_length=5000 ++dataset_conf.sort_size=1024 \
  ++dataset_conf.batch_type="token" ++dataset_conf.num_workers=2 \
  ++train_conf.max_epoch=20 ++train_conf.log_interval=1 ++train_conf.resume=true \
  ++train_conf.validate_interval=3000 ++train_conf.save_checkpoint_interval=3000 \
  ++train_conf.keep_nbest_models=20 ++train_conf.avg_nbest_model=10 \
  ++train_conf.use_deepspeed=false ++train_conf.avg_keep_nbest_models_type="acc" \
  ++train_conf.deepspeed_config=/home/funasr-dev/FunAsr-fork/FunASR/examples/deepspeed_conf/ds_stage1.json \
  ++optim_conf.lr=0.00005 ++output_dir="./outputs" &> ./outputs/log.txt
```
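For context on the batching settings above: with `++dataset_conf.batch_type="token"`, `batch_size=10000` is a per-batch token budget rather than a sample count, so the padded batch tensor shape changes from step to step. Here is a toy sketch of how token-budget batching typically packs length-sorted samples (illustrative only, not FunASR's actual `BatchSampler`):

```python
# Toy illustration of token-based batching (NOT FunASR's actual BatchSampler):
# samples are sorted by length, then packed into batches whose *padded*
# token count stays under a budget (cf. dataset_conf.batch_size=10000).
def token_batches(lengths, token_budget=10000):
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batch, batch_max = [], 0
    for i in order:
        new_max = max(batch_max, lengths[i])
        # Padded cost if sample i joins the current batch: every sample
        # is padded to the longest one, so cost = max_len * batch_size.
        if batch and new_max * (len(batch) + 1) > token_budget:
            yield batch
            batch, batch_max = [], 0
            new_max = lengths[i]
        batch.append(i)
        batch_max = new_max
    if batch:
        yield batch
```

Because each previously unseen batch shape can force PyTorch's caching allocator to reserve fresh blocks, shape variety alone can produce step-like memory growth.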
GPU memory usage is normal for the first few hundred steps, then suddenly starts climbing in a staircase pattern until the GPU runs out of memory. Changing batch_size does not help; memory is exhausted gradually either way.
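To tell a genuine leak apart from allocator behavior, it may help to log the CUDA allocator counters every few steps from inside the training loop. A minimal sketch using standard PyTorch APIs (`step` and `device` are placeholders to adapt to your setup):

```python
import torch

def log_cuda_memory(step, device=0):
    # Bytes held by live tensors vs. bytes reserved by the caching
    # allocator. A growing gap points at fragmentation, while growing
    # allocated bytes points at tensors kept alive across steps.
    alloc = torch.cuda.memory_allocated(device) / 2**20
    reserved = torch.cuda.memory_reserved(device) / 2**20
    peak = torch.cuda.max_memory_allocated(device) / 2**20
    print(f"step {step}: allocated={alloc:.0f}MiB "
          f"reserved={reserved:.0f}MiB peak={peak:.0f}MiB")
```

If `allocated` stays flat while `reserved` climbs in a staircase, the growth is caching/fragmentation from varying batch shapes rather than leaked tensors; on recent PyTorch builds, setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the environment before launching torchrun is one thing worth trying. If `allocated` itself climbs, look for tensors kept alive across steps (for example, accumulating a loss without calling `.item()`).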