In Ascend NPU, stop workers together with its children processes #1284

Open
wants to merge 5 commits into
base: master

majieyue
Collaborator

@majieyue majieyue commented Oct 1, 2024

What changes were proposed in this pull request?

Adapt worker shutdown to the Ascend NPU case: when stopping workers, make sure no remaining processes are still using the NPU.

Why are the changes needed?

On Ascend NPU, workers fork many child processes, and all of them need to be cleaned up when the workers are stopped.
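The idea of stopping a worker together with its children can be sketched as follows. This is a minimal illustration, not the code in this PR: `collect_process_tree` and `kill_process_tree` are hypothetical helper names, and `parent_of` is assumed to be a snapshot mapping each pid to its parent pid, obtained by whatever means the platform offers.

```python
import os
import signal
from collections import defaultdict
from typing import Dict, Iterable, Set


def collect_process_tree(
    root_pids: Iterable[int], parent_of: Dict[int, int]
) -> Set[int]:
    """Return the root pids plus every descendant found in the snapshot.

    `parent_of` maps pid -> parent pid (hypothetical input for this sketch).
    """
    # Invert the pid -> ppid mapping into ppid -> [child pids].
    children = defaultdict(list)
    for pid, ppid in parent_of.items():
        children[ppid].append(pid)

    # Iterative traversal so deep process trees cannot hit the recursion limit.
    seen: Set[int] = set()
    stack = list(root_pids)
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        stack.extend(children.get(pid, []))
    return seen


def kill_process_tree(
    root_pids: Iterable[int],
    parent_of: Dict[int, int],
    sig: int = signal.SIGTERM,
) -> None:
    """Send `sig` to every process in the trees rooted at `root_pids`."""
    for pid in collect_process_tree(root_pids, parent_of):
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            # The process exited between the snapshot and the signal.
            pass
```

Collecting the full set first and signalling afterwards means a child that is reparented when its parent dies is still covered, because the tree is resolved from the snapshot taken before any process is killed.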

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests (UT).

codecov bot commented Oct 1, 2024

Codecov Report

Attention: Patch coverage is 95.34884% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.57%. Comparing base (6764a09) to head (c2e69a4).
Report is 26 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| dlrover/python/elastic_agent/torch/training.py | 89.47% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1284      +/-   ##
==========================================
+ Coverage   80.41%   80.57%   +0.15%     
==========================================
  Files         219      219              
  Lines       20126    20167      +41     
==========================================
+ Hits        16185    16249      +64     
+ Misses       3941     3918      -23     


…value, not a pids set. so we need to call values() to get all the worker pids
…y stop workers while the main thread is running in _invoke_run