multi-gpu tensorboard handlers initialization #520

Open
@wyli

Description

"train#trainer#train_handlers": "$@train#handlers[: -2 if dist.get_rank() > 0 else None]",

The multi-gpu override essentially sets the trainer handlers to $@train#handlers[:-2] on the worker nodes (rank > 0). However, because of the @train#handlers reference, the config parser still triggers the handler constructor calls on all nodes, including for the two handlers that the slice then drops.
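The behaviour described above can be illustrated without MONAI at all. The following is a minimal pure-Python sketch, not the actual config parser: `FakeHandler`, `resolve_handlers`, `handlers_for_rank`, and the four handler names are all hypothetical stand-ins. It only assumes what the issue states, namely that the referenced list is fully instantiated before the slice is applied.

```python
# Hypothetical stand-ins; this does not call MONAI's config parser.
constructed = []  # records every constructor call


class FakeHandler:
    def __init__(self, name):
        constructed.append(name)  # side effect happens at construction time
        self.name = name


def resolve_handlers():
    # Emulates resolving the @train#handlers reference:
    # every item in the list is instantiated first.
    names = ["checkpoint", "stats", "tb_stats", "tb_image"]  # assumed names
    return [FakeHandler(n) for n in names]


def handlers_for_rank(rank):
    # Same slicing expression as the override in this issue,
    # with the rank passed in instead of dist.get_rank().
    handlers = resolve_handlers()
    return handlers[: -2 if rank > 0 else None]


worker_handlers = handlers_for_rank(rank=1)
print(len(worker_handlers), "handlers kept,", len(constructed), "constructed")
```

On a worker rank the slice keeps two handlers, but all four constructors have already run by the time the slice is evaluated.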

For the tensorboard handlers this is an issue, as each constructor call creates a new event log file. As a result, a multi-node run ends up with unnecessary event log files, one set per rank. See https://github.com/Project-MONAI/MONAI/blob/e36982b87bf87fb9559fc4d124e132b67f177d23/monai/handlers/tensorboard_handlers.py#L52-L55
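One possible direction, sketched here under assumptions and not as the fix adopted by the project, is to guard construction by rank so that the file-creating constructor never runs on workers. `EventFileHandler`, `build_eager`, and `build_guarded` are hypothetical names; the handler only mimics the "constructor creates a file" behaviour described above, and the loop over ranks simulates separate processes sharing one log directory.

```python
import os
import tempfile
import uuid


class EventFileHandler:
    """Hypothetical stand-in for a handler whose constructor creates a log file."""

    def __init__(self, log_dir):
        self.path = os.path.join(log_dir, f"events.{uuid.uuid4().hex}")
        open(self.path, "w").close()  # file is created as soon as we construct


def build_eager(rank, log_dir):
    # Construct everything, then slice: mirrors the current override.
    handlers = ["checkpoint", "stats", EventFileHandler(log_dir)]
    return handlers[: -1 if rank > 0 else None]


def build_guarded(rank, log_dir):
    # Only construct the file-creating handler on rank 0.
    handlers = ["checkpoint", "stats"]
    if rank == 0:
        handlers.append(EventFileHandler(log_dir))
    return handlers


with tempfile.TemporaryDirectory() as eager_dir:
    for rank in range(4):  # simulate 4 ranks
        build_eager(rank, eager_dir)
    eager_files = len(os.listdir(eager_dir))

with tempfile.TemporaryDirectory() as guarded_dir:
    for rank in range(4):
        build_guarded(rank, guarded_dir)
    guarded_files = len(os.listdir(guarded_dir))

print("eager:", eager_files, "guarded:", guarded_files)
```

With four simulated ranks the eager version leaves four event files and the guarded version leaves one. How such a guard would be expressed in the bundle config itself is left open here.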


Labels: bug
