Update subprocess.py #102

Constannnnnt · 2018-07-06T04:18:24Z

Description: I trained the network on 3 GPUs and interrupted training before the final step, so only the checkpoint was saved, not the config. When I then tried to test the model, it failed unexpectedly at start = subinds[i][0] with the error "list index out of range".

Issue: At line 64, instead of gpu_inds = range(cfg.NUM_GPUS), I think it is more reasonable to write gpu_inds = range(NUM_GPUS). Let me explain.

After the yaml and config files are imported in subprocess.py, cfg.NUM_GPUS is 8 instead of 3. (Training does not crash because train_net_step contains a statement that assigns cfg.NUM_GPUS = torch.cuda.device_count().) In my case NUM_GPUS = torch.cuda.device_count() = 3, so at line 56 the size of subinds is 3.

I let CUDA see all my GPUs. Later, at line 64, if gpu_inds = range(cfg.NUM_GPUS) is used, the size of gpu_inds is 8, so the loop crashes at line 68 as soon as its index goes past the 3 entries of subinds. Therefore, at line 64, gpu_inds = range(NUM_GPUS) is more reasonable.
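To make the mismatch concrete, here is a minimal, self-contained sketch of the failure. It is not the actual code from subprocess.py: split_indices and assign_ranges are hypothetical helpers written only for illustration, and the numbers (3 detected GPUs, a stale config value of 8, 10 items to split) are assumptions chosen to match the situation described above.

```python
def split_indices(total, num_chunks):
    """Split range(total) into num_chunks contiguous, nearly equal chunks.

    Hypothetical stand-in for however subinds is built at line 56.
    """
    base, extra = divmod(total, num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def assign_ranges(total, detected_gpus, gpu_inds):
    """Pair each GPU index with the [start, end) slice it should process."""
    # One chunk per GPU that is actually visible (3 in the report above).
    subinds = split_indices(total, detected_gpus)
    ranges = []
    for i, gpu_ind in enumerate(gpu_inds):
        # Raises IndexError as soon as i >= len(subinds).
        start = subinds[i][0]
        end = subinds[i][-1] + 1
        ranges.append((gpu_ind, start, end))
    return ranges


DETECTED_GPUS = 3    # assumed value of torch.cuda.device_count()
STALE_CFG_GPUS = 8   # assumed value of cfg.NUM_GPUS read from the config

# Iterating over the detected GPU count works.
ok = assign_ranges(10, DETECTED_GPUS, range(DETECTED_GPUS))

# Iterating over the stale config value runs past the end of subinds.
try:
    assign_ranges(10, DETECTED_GPUS, range(STALE_CFG_GPUS))
    crashed = False
except IndexError:
    crashed = True
```

In this sketch, the list being indexed and the list being iterated only stay in step when both are sized from the same GPU count, which is the point of the proposed one-line change.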

Please check and see if my solution is correct or not. Thanks.

issue fix.

ternaus · 2018-09-30T03:47:31Z

👍

Update subprocess.py

2f15597

issue fix.
