"train#trainer#train_handlers": "$@train#handlers[: -2 if dist.get_rank() > 0 else None]",
Thanks @wyli. I will take a look at this issue and your suggestion. Or @KumoLiu, if you have time, could you please help address it? We can check with the deepedit bundle first.
model-zoo/models/spleen_ct_segmentation/configs/multi_gpu_train.json
Line 18 in cf5e032
The multi-GPU override essentially sets the trainer handlers to `$@train#handlers[:-2]` for the worker nodes. But because of the `@train#handlers` reference, the config parser will still trigger handler constructor calls on all nodes.

For TensorBoard handlers this is an issue, as each constructor call creates a new event log file. As a result, the multi-node log will contain unnecessary event logging files. https://github.com/Project-MONAI/MONAI/blob/e36982b87bf87fb9559fc4d124e132b67f177d23/monai/handlers/tensorboard_handlers.py#L52-L55
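To make the behaviour described above concrete, here is a minimal sketch in plain Python. It does not use MONAI or its config parser. `FakeTensorBoardHandler`, `handler_specs`, `build_then_slice`, and `slice_then_build` are hypothetical names invented for illustration, and the `created` list stands in for the event files that a real handler constructor would write. The first function mirrors what the override effectively does (construct every handler, then slice the list). The second shows one possible alternative (slice the specs before constructing anything); it is an assumption about a workable direction, not the fix adopted in the repository.

```python
# Records every constructor call; stands in for event log files on disk.
created = []


class FakeTensorBoardHandler:
    """Hypothetical handler whose constructor has a visible side effect."""

    def __init__(self, name):
        created.append(name)  # a real handler would create an event file here
        self.name = name


# Hypothetical handler list; the last two entries play the TensorBoard role.
handler_specs = ["checkpoint", "stats", "tb_stats", "tb_image"]


def build_then_slice(specs, rank):
    """Construct all handlers first, then drop the last two on worker ranks."""
    handlers = [FakeTensorBoardHandler(s) for s in specs]
    return handlers[: -2 if rank > 0 else None]


def slice_then_build(specs, rank):
    """Drop the last two specs on worker ranks before constructing anything."""
    kept = specs[: -2 if rank > 0 else None]
    return [FakeTensorBoardHandler(s) for s in kept]
```

On a worker rank (`rank > 0`), both functions return two handlers, but `build_then_slice` still runs all four constructors, while `slice_then_build` runs only the two that are kept.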