Job launcher #3049

Merged
merged 35 commits into from
Nov 1, 2024

Conversation

yhwen
Collaborator

@yhwen yhwen commented Oct 23, 2024

Fixes # .

Description

Adds the job launcher API, which lets each site choose how its jobs are launched. Only client-side job launch is currently supported. Two launchers are available: ProcessJobLauncher and K8sJobLauncher. The job launcher is configured in the components section of local/resources.json, for example:

  {
    "id": "process_launcher",
    "path": "nvflare.app_opt.job_launcher.process_launcher.ProcessJobLauncher",
    "args": {}
  }
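
To illustrate how a component entry like the one above can be read, here is a minimal sketch that parses a resources.json-style document and splits the launcher's "path" into a module name and a class name. The helper names (find_component, split_class_path) are hypothetical and are not part of NVFlare; the surrounding "components" list is assumed from the description, and the sketch deliberately stops short of importing or instantiating the class.

```python
import json

# A resources.json-style document. The "components" wrapper is assumed from
# the PR description; only the inner entry is taken verbatim from it.
RESOURCES_JSON = """
{
  "components": [
    {
      "id": "process_launcher",
      "path": "nvflare.app_opt.job_launcher.process_launcher.ProcessJobLauncher",
      "args": {}
    }
  ]
}
"""


def find_component(resources: dict, component_id: str) -> dict:
    """Return the component entry with the given id (hypothetical helper)."""
    for comp in resources.get("components", []):
        if comp.get("id") == component_id:
            return comp
    raise KeyError(f"no component with id {component_id!r}")


def split_class_path(path: str) -> tuple:
    """Split a dotted class path into (module name, class name)."""
    module_name, _, class_name = path.rpartition(".")
    return module_name, class_name


resources = json.loads(RESOURCES_JSON)
launcher_cfg = find_component(resources, "process_launcher")
module_name, class_name = split_class_path(launcher_cfg["path"])
print(module_name, class_name)
```

Switching a site to the other launcher would presumably mean changing the "path" (and any "args") of this entry; the K8sJobLauncher arguments are not given in this description, so they are not shown here.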

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

@yhwen yhwen requested a review from nvidianz October 23, 2024 18:14
Collaborator

@yanchengnv yanchengnv left a comment

Please switch to using an event to select launchers, as we discussed before. This will keep our core code completely free of launcher-specific knowledge (e.g. image).
Define a set of standard return codes for the "poll" method.
The "can_launch" method is not needed in the launcher spec.
See the comments for details.
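
The two requests above (event-based launcher selection and standard return codes for "poll") can be sketched roughly as follows. This is only an illustration of the idea under stated assumptions, not the code that was merged: every name here (SketchReturnCode, SELECT_LAUNCHER_EVENT, ProcessLauncherSketch, ImageLauncherSketch, select_launcher) is hypothetical, and the rule that a job carrying an "image" entry goes to the image-based launcher is an assumption made for the example.

```python
import subprocess
import sys
from enum import IntEnum
from typing import List, Optional

SELECT_LAUNCHER_EVENT = "sketch_select_launcher"  # hypothetical event name


class SketchReturnCode(IntEnum):
    """Hypothetical standard return codes shared by all launchers."""
    SUCCESS = 0
    EXECUTION_ERROR = 1
    ABORTED = 2


class ProcessHandleSketch:
    """Wraps a subprocess and reports its state using the standard codes."""

    def __init__(self, proc: subprocess.Popen):
        self._proc = proc

    @staticmethod
    def _translate(raw: Optional[int]) -> Optional[SketchReturnCode]:
        if raw is None:
            return None  # process is still running
        if raw == 0:
            return SketchReturnCode.SUCCESS
        if raw < 0:
            return SketchReturnCode.ABORTED  # terminated by a signal (POSIX)
        return SketchReturnCode.EXECUTION_ERROR

    def poll(self) -> Optional[SketchReturnCode]:
        return self._translate(self._proc.poll())

    def wait(self) -> Optional[SketchReturnCode]:
        return self._translate(self._proc.wait())


class ProcessLauncherSketch:
    """Volunteers for jobs that do not ask for an image (assumed rule)."""

    def handle_event(self, event_type: str, ctx: dict) -> None:
        if event_type == SELECT_LAUNCHER_EVENT and "image" not in ctx["job_meta"]:
            ctx["candidates"].append(self)

    def launch(self, command: List[str]) -> ProcessHandleSketch:
        return ProcessHandleSketch(subprocess.Popen(command))


class ImageLauncherSketch:
    """Volunteers for jobs that carry an image (assumed rule)."""

    def handle_event(self, event_type: str, ctx: dict) -> None:
        if event_type == SELECT_LAUNCHER_EVENT and "image" in ctx["job_meta"]:
            ctx["candidates"].append(self)

    def launch(self, command: List[str]):
        raise NotImplementedError("container launch is out of scope for this sketch")


def select_launcher(launchers: list, job_meta: dict):
    """Core code: fires the event and takes the first volunteer.

    Note that it never inspects launcher-specific fields such as "image".
    """
    ctx = {"job_meta": job_meta, "candidates": []}
    for launcher in launchers:
        launcher.handle_event(SELECT_LAUNCHER_EVENT, ctx)
    return ctx["candidates"][0] if ctx["candidates"] else None


launchers = [ImageLauncherSketch(), ProcessLauncherSketch()]
chosen = select_launcher(launchers, job_meta={"name": "demo"})
handle = chosen.launch([sys.executable, "-c", "raise SystemExit(3)"])
result = handle.wait()
print(type(chosen).__name__, result.name)
```

Because each launcher decides for itself whether to volunteer, a separate "can_launch" method on the spec becomes unnecessary in this design, and callers of poll() only ever see the shared codes rather than raw process or pod statuses.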

nvflare/app_opt/job_launcher/__init__.py
nvflare/app_opt/job_launcher/job_launcher_spec.py (outdated)
nvflare/app_opt/job_launcher/job_launcher_spec.py (outdated)
nvflare/app_opt/job_launcher/k8s_launcher.py (outdated)
nvflare/app_opt/job_launcher/k8s_launcher.py (outdated)
nvflare/private/fed/client/client_executor.py (outdated)
nvflare/private/fed/client/client_executor.py (outdated)
nvflare/private/fed/client/client_executor.py (outdated)
nvflare/private/fed/client/client_executor.py (outdated)
nvflare/private/fed/client/client_executor.py (outdated)
Collaborator

@yanchengnv yanchengnv left a comment

Not quite there yet:

  • The return code of poll() should be standardized.
  • The condition for checking job_launcher isn't right.
  • See other comments for improvement in other areas.

nvflare/apis/job_launcher_spec.py
nvflare/apis/job_launcher_spec.py
nvflare/app_opt/job_launcher/k8s_launcher.py (outdated)
nvflare/app_opt/job_launcher/k8s_launcher.py (outdated)
nvflare/private/fed/client/client_executor.py (outdated)
nvflare/private/fed/utils/fed_utils.py
nvflare/private/fed/client/client_executor.py
nvflare/private/fed/client/client_executor.py (outdated)
yanchengnv previously approved these changes Oct 29, 2024
Collaborator

@yanchengnv yanchengnv left a comment

Looks okay. See my comments on return code improvements, which can be done in another PR.

We also need another PR to modify job_meta_validator.py to support the new deploy_map format.

nvflare/app_opt/job_launcher/k8s_launcher.py
nvflare/app_opt/job_launcher/k8s_launcher.py
@yhwen
Collaborator Author

yhwen commented Oct 29, 2024

/build

@yhwen yhwen enabled auto-merge (squash) October 29, 2024 19:34
@YuanTingHsieh
Collaborator

/build

@YuanTingHsieh
Collaborator

/build

@yhwen
Collaborator Author

yhwen commented Nov 1, 2024

/build

@YuanTingHsieh
Collaborator

/build

@yhwen yhwen merged commit dd256fe into NVIDIA:main Nov 1, 2024
20 checks passed

3 participants