Right now, we set the `multiprocessing_context` for the Trainer based on the `num_workers` used for the data loader:

https://github.com/drivendataorg/zamba/blob/master/zamba/pytorch_lightning/utils.py#L67-L71
https://github.com/drivendataorg/zamba/blob/master/zamba/models/model_manager.py#L283-L286
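For context, a rough, hypothetical sketch of the kind of coupling described above (this paraphrases the idea, not the actual code behind those links; the function names are made up): a single `num_workers` value drives both the DataLoader arguments and the Trainer's multiprocessing/strategy choice.

```python
# Illustrative only -- not the actual zamba code at the links above.
# One num_workers value controls both data loading and the Trainer's
# multiprocessing/strategy setup, which is the coupling at issue.
# Assumes PTL 2.x-style Trainer arguments.
from torch.utils.data import DataLoader
import pytorch_lightning as pl


def build_dataloader(dataset, num_workers: int) -> DataLoader:
    return DataLoader(
        dataset,
        num_workers=num_workers,
        # persistent workers only make sense when there are workers
        persistent_workers=num_workers > 0,
        # multiprocessing context inferred from num_workers
        multiprocessing_context="forkserver" if num_workers > 0 else None,
    )


def build_trainer(num_workers: int) -> pl.Trainer:
    # the Trainer's strategy is also inferred from num_workers, even on a single GPU
    strategy = "ddp" if num_workers > 0 else "auto"
    return pl.Trainer(strategy=strategy)
```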
It would be good to separate those out for a couple reasons:
- it lets us use multiple cores for data loading without needing to set a multiprocessing strategy for the trainer when only running on a single GPU
- we've only trained models on a single GPU, so it's not clear that multiprocessing for the model is fully and properly configured
- PyTorch Lightning is currently making a lot of changes to its accelerators and strategies for distributed training, so it would be nice to let those settle a bit before supporting multi-GPU training in zamba
Implementation thoughts:
- do not infer the multiprocessing context from `num_workers` (only use `num_workers` for the dataloaders and to determine `persistent_workers`)
- consider adding a multiprocessing strategy to the train config object with the PTL default; another option is to make this a boolean and let zamba determine the best strategy/accelerator combo (a sketch of this separation is below)
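A minimal sketch of what that separation could look like, assuming a hypothetical `strategy` field on the train config that defaults to the PTL default (field and function names are illustrative, not an existing zamba API):

```python
# Hypothetical sketch of the proposed separation -- names are illustrative,
# not an existing zamba API. Assumes PTL 2.x-style Trainer arguments.
from dataclasses import dataclass

from torch.utils.data import DataLoader
import pytorch_lightning as pl


@dataclass
class TrainConfig:
    num_workers: int = 3     # affects data loading only
    strategy: str = "auto"   # Trainer strategy, kept at the PTL default unless overridden


def build_dataloader(dataset, config: TrainConfig) -> DataLoader:
    # num_workers is only used for the dataloaders and persistent_workers
    return DataLoader(
        dataset,
        num_workers=config.num_workers,
        persistent_workers=config.num_workers > 0,
    )


def build_trainer(config: TrainConfig) -> pl.Trainer:
    # the strategy comes from its own config field, independent of num_workers
    return pl.Trainer(strategy=config.strategy)
```

With that shape, setting `num_workers=8` on a single GPU no longer implies any distributed strategy, and multi-GPU support can be layered in later by exposing the strategy (or a boolean flag that zamba maps to a strategy/accelerator combo).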
Hey, @sambujangfofana and I are students from the University of Michigan. We are currently working on a project in which we have to contribute to a GitHub repository (https://eecs481.org/hw6.html). We are interested in this issue and would like to work on it. We hope to submit a pull request this week. Could we be assigned this issue?