nnUNetV2Runner cannot be run with NVIDIA MIG configuration

**Describe the bug**

`python -m monai.apps.nnunet nnUNetV2Runner train_single_model --input_config "./input.yaml" --config "2d" --fold 0 --gpu_id {MIG_UUID}`

When I pass the UUID of the MIG device as `gpu_id`, the command above fails with the following error:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
    fn(i, *args)
  File "/usr/local/lib/python3.10/dist-packages/nnunetv2/run/run_training.py", line 113, in run_ddp
    torch.cuda.set_device(torch.device('cuda', dist.get_rank()))
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py", line 404, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
```

Similarly, setting `CUDA_VISIBLE_DEVICES` in the environment (`CUDA_VISIBLE_DEVICES={MIG_UUID} python -m monai.apps.nnunet nnUNetV2Runner train_single_model`) does not work, because nnUNetV2Runner overwrites the variable.

Running `nnUNet` natively works fine with:

`CUDA_VISIBLE_DEVICES={MIG_UUID} nnUNetv2_train ... 2d 4`

**To Reproduce**
Steps to reproduce the behavior:

1. Use a computer with a MIG device.
2. Run
```
python -m monai.apps.nnunet nnUNetV2Runner train_single_model --input_config "./input.yaml" --config "2d" --fold 0 --gpu_id {MIG_UUID}
```
OR
```
CUDA_VISIBLE_DEVICES={MIG_UUID} python -m monai.apps.nnunet nnUNetV2Runner train_single_model --input_config "./input.yaml" --config "2d" --fold 0 
```

**Expected behavior**

`CUDA_VISIBLE_DEVICES` should not be overwritten if it was provided. 
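To illustrate the behavior I have in mind, here is a minimal sketch. It is not the actual nnUNetV2Runner code: `build_training_env` is a hypothetical helper, and the precedence (an already exported `CUDA_VISIBLE_DEVICES` wins over `gpu_id`) is my assumption about what would be least surprising. The device identifier is treated as an opaque string, so a MIG UUID passes through unchanged.

```python
import os
from typing import Mapping, Optional, Sequence, Union

GpuId = Union[int, str, Sequence[Union[int, str]]]


def build_training_env(
    gpu_id: Optional[GpuId] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: build the environment for the training subprocess.

    An existing CUDA_VISIBLE_DEVICES is kept as-is; gpu_id is only used
    as a fallback when the variable is not set.
    """
    env = dict(os.environ if base_env is None else base_env)

    # Respect a value the user exported (e.g. a MIG UUID).
    if "CUDA_VISIBLE_DEVICES" in env:
        return env

    if gpu_id is not None:
        if isinstance(gpu_id, (list, tuple)):
            # Several devices: join them into a comma-separated list.
            env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_id)
        else:
            # Single device: an index or an opaque identifier string.
            env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env
```

The resulting dictionary could then be handed to the subprocess call (for example through the `env` argument of `subprocess.run`), so that the variable is never rewritten once the user has set it.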



