how much time to evaluate? #11

YueFan1014 · 2023-07-25T06:05:59Z

Hello, I am running
python pasture_runner.py -a src.models.agent_fbe_owl -n 8 --arch B32 --center
on a single RTX 4090, and after three hours no results have been produced. I also hit the problem raised in [https://github.com//issues/4], but the processes are still running on the GPU.

So I am wondering: how long does evaluation usually take? Or is the program stuck somewhere and therefore producing no results? Thanks.


sagadre · 2023-07-25T15:54:37Z

Hi @YueFan1014 is anything getting written to a results/ folder? Are you able to follow the pointers here? Trying to understand if things are running slowly or not at all. Thanks!
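One way to answer the "slow or not at all" question is to look at how many files exist under the output folder and how recently the newest one was written. The snippet below is only a minimal sketch under stated assumptions: `summarize_results` is a hypothetical helper, not part of this repository, and it assumes the run writes its outputs as ordinary files somewhere under `results/`.

```python
import time
from pathlib import Path


def summarize_results(results_dir):
    """Return (file_count, seconds_since_last_write) for results_dir.

    Hypothetical helper, not part of the repository. The second value is
    None when the directory is missing or contains no files yet.
    """
    root = Path(results_dir)
    if not root.is_dir():
        return 0, None
    # Walk the whole tree, since outputs may land in a per-run subfolder.
    files = [p for p in root.rglob("*") if p.is_file()]
    if not files:
        return 0, None
    newest = max(p.stat().st_mtime for p in files)
    # Clamp at zero in case of small clock differences.
    return len(files), max(0.0, time.time() - newest)


if __name__ == "__main__":
    # "results" is assumed to be the output folder mentioned in this thread.
    count, age = summarize_results("results")
    if age is None:
        print("No files written yet.")
    else:
        print(f"{count} file(s); last write {age:.0f} seconds ago.")
```

If the file count stays at zero, or the time since the last write keeps growing across repeated checks, the run is more likely stuck than slow.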

YueFan1014 · 2023-07-26T01:28:28Z

> Hi @YueFan1014 is anything getting written to a results/ folder? Are you able to follow the pointers here? Trying to understand if things are running slowly or not at all. Thanks!

A folder longtail_longtail_fbe_owl-b32-openai-center was created under results but it is still empty after 15 hours.

sagadre · 2023-07-26T01:57:30Z

Something appears to be locked up. Can you try running with -n 1 instead of -n 8, especially if you are using only one GPU? In my experience, spawning too many THOR processes on one GPU can lead to problems.

YueFan1014 · 2023-07-26T02:13:55Z

> Something appears to be locked up. Can you try running with -n 1 instead of -n 8, especially if you are using only one GPU? In my experience, spawning too many THOR processes on one GPU can lead to problems.

Thanks, I will try it. Also, approximately how long does python pasture_runner.py -a src.models.agent_fbe_owl -n 1 --arch B32 --center take to finish on a single GPU?

hszhoushen · 2023-10-20T05:08:55Z

> Hi @YueFan1014 is anything getting written to a results/ folder? Are you able to follow the pointers here? Trying to understand if things are running slowly or not at all. Thanks!

> A folder longtail_longtail_fbe_owl-b32-openai-center was created under results but it is still empty after 15 hours.

I used the same number of GPUs, but the error (queue.Empty #4) still occurs. How do I fix this?

AmingWu · 2024-01-05T13:41:26Z

@hszhoushen, how long do you need to train? Thank you.
