
github act/mpi4py disable multi proc runs #12302

Merged
hppritcha merged 1 commit into open-mpi:main on Feb 2, 2024

Conversation

hppritcha
Member

Disable the multi-proc runs for now; they seem to time out randomly.

Signed-off-by: Howard Pritchard <[email protected]>
Contributor

@wenduwan left a comment


Hmm are we sure this isn't a bug?

@hppritcha
Member Author

Yes, likely, but it seems many of these problems are coming from accept/connect or spawn. Although it's good to get those working better, I'd rather not have PRs that have nothing to do with this functionality failing this CI test.

@rhc54
Contributor

rhc54 commented Feb 1, 2024

Just wondering: is there a way for the CI to detect relevance? Maybe the PR submitter has to provide some hints, or the CI can look at the impacted subdirectories (e.g., if ompi/dpm is involved, you know you need spawn and accept/connect - if not, then maybe don't block the PR if those tests fail).

Or maybe look at CI history and see if the PR makes things worse? If not, then the PR isn't the issue.

@hppritcha
Member Author

> Just wondering: is there a way for the CI to detect relevance? Maybe the PR submitter has to provide some hints, or the CI can look at the impacted subdirectories (e.g., if ompi/dpm is involved, you know you need spawn and accept/connect - if not, then maybe don't block the PR if those tests fail).
>
> Or maybe look at CI history and see if the PR makes things worse? If not, then the PR isn't the issue.

I'll check if there are some options here.
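
As a rough illustration of the relevance idea above, here is a minimal Python sketch of a check a CI step could run to decide whether the dynamic-process (spawn / accept-connect) tests are relevant to a PR. The `ompi/dpm` path is the example from the comment; the `origin/main` base ref, the path list, and the script layout are assumptions, not project policy.

```python
# Hypothetical sketch (not part of this PR): decide whether a change touches
# code that plausibly affects spawn / accept-connect, based on git diff.
import subprocess
import sys

# Directories whose changes plausibly affect dynamic-process support;
# "ompi/dpm" comes from the comment above, anything else is an assumption.
RELEVANT_PATHS = ("ompi/dpm",)

def changed_files(base="origin/main"):
    """List files changed on this branch relative to the base ref."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.splitlines()

def needs_dynamic_tests():
    """True if any changed file lives under a relevant subdirectory."""
    return any(path.startswith(RELEVANT_PATHS) for path in changed_files())

if __name__ == "__main__":
    # Exit 0 when the spawn/accept-connect tests should run, 1 otherwise,
    # so a workflow step can gate on the exit code.
    sys.exit(0 if needs_dynamic_tests() else 1)
```

A workflow using something like this would also need enough git history in the checkout for the diff against the base branch to work.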

@dalcinl
Contributor

dalcinl commented Feb 2, 2024

Things are still broken. This new thing I'm getting today in ompi@main smells like a bug:
https://github.com/mpi4py/mpi4py-testing/actions/runs/7749602944/job/21134452410#step:22:8815

Yesterday, the v5.0.x runs failed after a deadlock with np=3.
https://github.com/mpi4py/mpi4py-testing/actions/runs/7734380956/job/21088271374#step:20:4748

BTW, you should eventually backport this PR to branch v5.0.x (and maybe even v4.1.x?).

There are also a few tests that mpi4py is already skipping when running under Open MPI, I'll submit issues about them separately.
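
For context, a minimal, hypothetical sketch of what such a skip can look like in an mpi4py-style unittest suite, keyed off `MPI.Get_library_version()`. The test class and method names are illustrative only, not the actual mpi4py test suite.

```python
# Hypothetical sketch: skipping a flaky test when running under Open MPI.
# MPI.Get_library_version(), MPI.Comm.Get_parent() and unittest.skipIf are
# real APIs; the test class/method names are illustrative.
import sys
import unittest

from mpi4py import MPI

IS_OPEN_MPI = "Open MPI" in MPI.Get_library_version()

class TestSpawnExample(unittest.TestCase):

    @unittest.skipIf(IS_OPEN_MPI, "spawn is flaky under Open MPI (see #12302)")
    def test_spawn_child(self):
        # Spawn one child process that connects back to us and disconnects.
        code = "from mpi4py import MPI; MPI.Comm.Get_parent().Disconnect()"
        child = MPI.COMM_SELF.Spawn(sys.executable, args=["-c", code], maxprocs=1)
        child.Disconnect()

if __name__ == "__main__":
    unittest.main()
```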

@hppritcha hppritcha merged commit 8197ce9 into open-mpi:main Feb 2, 2024
11 checks passed
@hppritcha
Member Author

> Things are still broken. This new thing I'm getting today in ompi@main smells like a bug: https://github.com/mpi4py/mpi4py-testing/actions/runs/7749602944/job/21134452410#step:22:8815

I can't reproduce this one, at least not standalone outside of the GitHub Actions environment.
