feat: parametrize GPUS_PER_NODE and CPUS_PER_WORKER in ray.sub #410

Merged
merged 6 commits into from
May 23, 2025

Collaborator

@terrykong terrykong commented May 18, 2025

Closes #309

Collaborator

@SahilJain314 SahilJain314 left a comment

We're getting to a lot of 'knobs' in ray.sub. Maybe it's time to add a doc for it? It didn't seem obvious to me that envvars like HF_HOME and WANDB_API_KEY would get plumbed through ray.sub and now we're adding GPUS_PER_NODE and CPUS_PER_WORKER too.

@terrykong
Collaborator Author

I'll address UV_CACHE_DIR in a follow-up PR, #426.

terrykong added 2 commits May 20, 2025 23:06
Signed-off-by: Terry Kong <[email protected]>
Signed-off-by: Terry Kong <[email protected]>
SahilJain314 previously approved these changes May 21, 2025
Collaborator

@jgerh jgerh left a comment

Completed the Tech Pubs review of docs/cluster.md and provided some copyedits and suggested text revisions. Comments added inline with the "add a suggestion" tool as well as line-by-line for read-only text.

Co-authored-by: jgerh <[email protected]>
Signed-off-by: Terry Kong <[email protected]>
Signed-off-by: Terry Kong <[email protected]>
@terrykong terrykong enabled auto-merge May 22, 2025 20:38
Signed-off-by: Terry Kong <[email protected]>
@terrykong terrykong added this pull request to the merge queue May 22, 2025
@parthchadha parthchadha removed this pull request from the merge queue due to a manual request May 22, 2025
@SahilJain314 SahilJain314 added this pull request to the merge queue May 23, 2025
Merged via the queue into main with commit f9e45de May 23, 2025
13 of 14 checks passed
@SahilJain314 SahilJain314 deleted the tk/cpu-task branch May 23, 2025 04:43
Labels
documentation Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

init_ray's runtime_env (with full os.environ) causes Ray runtime_env_agent to fail
4 participants