Batch worker cannot start with more than 64 CPUs #14644

Open
cjllanwarne opened this issue Jul 30, 2024 · 0 comments · May be fixed by #14643
Labels: needs-triage (a brand new issue that needs triaging)

cjllanwarne (Collaborator) commented Jul 30, 2024

What happened?

Try running a job with _machine_type: 'n1-highmem-64'. This machine type is needed to get enough memory for some larger jobs (more than roughly 200 GB).

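The batch worker fails at startup because it calculates how many network namespaces it could theoretically support (4 per CPU across 64 CPUs, plus some for JVMs) without considering that the IPv4 addressing scheme puts a hard limit of 255 on namespaces when only one subnet value changes per namespace. With 64 CPUs, the per-CPU namespaces alone come to 4 × 64 = 256, which already exceeds that limit before any JVM namespaces are added.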
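The arithmetic can be sketched in a few lines of Python. This is only an illustration of the failure mode, not the worker's actual code: the function names, the `10.0.<index>.0/24` subnet pattern, and the `n_jvm_namespaces` parameter are all hypothetical. Only the figures of 4 namespaces per CPU and a limit of 255 come from the description above.

```python
import ipaddress

# Figures taken from the issue description.
NAMESPACES_PER_CPU = 4
MAX_NAMESPACES = 255


def requested_namespaces(n_cpus: int, n_jvm_namespaces: int = 0) -> int:
    """Number of namespaces the worker would try to create (hypothetical helper)."""
    return NAMESPACES_PER_CPU * n_cpus + n_jvm_namespaces


def fits_in_one_subnet_value(n_cpus: int, n_jvm_namespaces: int = 0) -> bool:
    """True if the requested count stays within the limit stated in the issue."""
    return requested_namespaces(n_cpus, n_jvm_namespaces) <= MAX_NAMESPACES


def subnet_is_valid(index: int) -> bool:
    """Check whether an assumed 10.0.<index>.0/24 subnet parses as IPv4.

    The address pattern is an assumption for illustration only.
    """
    try:
        ipaddress.ip_network(f"10.0.{index}.0/24")
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    for cpus in (16, 32, 64):
        print(cpus, requested_namespaces(cpus), fits_in_one_subnet_value(cpus))
    print(subnet_is_valid(255), subnet_is_valid(256))
```

Under these assumptions, a 32-CPU machine requests 128 namespaces and fits, while a 64-CPU machine requests 256 and does not, even with no JVM namespaces counted.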

Version

Live 7/30/24

Relevant log output

No response

Security considerations:

Low risk of impacting security. High-CPU machine types are not materially different from other machine types with respect to security, and the bug is a simple logic error.

cjllanwarne added the needs-triage label Jul 30, 2024
cjllanwarne linked a pull request Jul 30, 2024 that will close this issue