Problem when using run_pt.py #4

Melonoyk · 2024-04-23T07:12:02Z

I ran into a problem while testing the pre-training part of your code. Following the Start Training section of the README, I ran: bash script/run_pt.sh, and the process blocks at Epoch 1/100. Eventually the process is forcibly killed. I also tried interrupting the process and found that it was stuck reading the length of the dataloader. I wonder whether this is because my hardware (RTX 3090) does not meet the requirements for pre-training. Looking forward to your reply.

The output is as follows:

tyang816 · 2024-04-25T06:27:45Z

Hi, Ruiwen,
Sorry for the late reply.
I just tested the pre-training code (RTX 3090) and did not run into any errors. Can you provide more information?
Thx
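
For anyone gathering more information on a hang like this, one option is Python's standard faulthandler module, which can print the stack of every thread if the program is still running after a timeout. The sketch below is only an illustration: the helper name hang_watchdog and the 60-second default are assumptions, not part of this repository, and the training call it would wrap is left to the reader.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def hang_watchdog(timeout_s=60.0, stream=None):
    """Dump all thread stacks every `timeout_s` seconds while the block runs.

    `hang_watchdog` is a hypothetical helper name used for illustration.
    `stream` must be a real file object with a file descriptor; it
    defaults to sys.stderr.
    """
    if stream is None:
        stream = sys.stderr
    # repeat=True keeps dumping at each interval until cancelled.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)
    try:
        yield
    finally:
        # Always cancel so the timer does not fire after the block exits.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the suspect step, for example the call that reads the length of the dataloader, in `with hang_watchdog(60):` should print where each thread is blocked once a minute, which is usually enough to attach to a report.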

Melonoyk · 2024-04-25T15:02:13Z

Hi, Yang,
Some of the packages I use are not the same versions as those listed in environment.yaml, which may be the cause of this issue. When I ran: conda env create -f environment.yaml, the process shut down partway through. I fixed the problem by rewriting the BatchSampler function, because I found that dataset loading was stuck at loading the first item of the dataset into the sampler, though I don't know why this happens. Finally, thank you for testing!
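
The rewritten sampler itself is not shown in this thread. For readers hitting the same hang, here is a minimal sketch of the general idea: a batch sampler that works purely on indices, so neither iterating it nor asking for its length ever loads an item from the dataset. The class name IndexBatchSampler and all of its parameters are hypothetical; this is not the repository's sampler and not necessarily the fix described above.

```python
import math
import random


class IndexBatchSampler:
    """Yield lists of dataset indices without touching the dataset.

    Hypothetical example class: it only needs the number of samples,
    so __len__ is simple arithmetic and cannot block on data loading.
    """

    def __init__(self, num_samples, batch_size, shuffle=True,
                 drop_last=False, seed=0):
        if num_samples < 0 or batch_size <= 0:
            raise ValueError("num_samples must be >= 0 and batch_size > 0")
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.drop_last = drop_last
        self.seed = seed

    def __len__(self):
        # Number of batches, computed from counts only.
        if self.drop_last:
            return self.num_samples // self.batch_size
        return math.ceil(self.num_samples / self.batch_size)

    def __iter__(self):
        indices = list(range(self.num_samples))
        if self.shuffle:
            # A seeded local RNG keeps the order reproducible.
            random.Random(self.seed).shuffle(indices)
        for start in range(0, self.num_samples, self.batch_size):
            batch = indices[start:start + self.batch_size]
            if self.drop_last and len(batch) < self.batch_size:
                break
            yield batch
```

An object like this can be passed to PyTorch as DataLoader(dataset, batch_sampler=sampler), since that argument accepts any iterable that yields lists of indices. If batches must be grouped by sequence length, the lengths would need to be precomputed or cached separately so the sampler still avoids reading the dataset.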

tyang816 · 2024-04-28T05:55:25Z

Great! I will check the environment file soon.
