Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix a bug introduced by PR 11854 #12242

Merged
merged 1 commit into from
Jan 19, 2024
Merged

Conversation

hppritcha
Copy link
Member

PR #11854 introduced a big that causes singleton runs to segfault at startup in some cases

this bug was rooted out by the github action in
PR #12217

mca_base_parse_paramfile(file, &params);
free(file);
OPAL_LIST_FOREACH (fv, &params, mca_base_var_file_value_t) {
pmix_overlap = (NULL == evar) ? 0 : check_pmix_overlap(fv->mbvfv_var, evar);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comment on your mpi4py PR - this is an incorrect fix. The evar variable is not set - you are using a dangling value from the prior code block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hu? the triad has check for evar being null

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay i see what you're saying. we just shouldn't use evar in this section.

mca_base_parse_paramfile(file, &params);
free(file);
OPAL_LIST_FOREACH (fv, &params, mca_base_var_file_value_t) {
if (check_pmix_param(fv->mbvfv_var)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still isn't quite correct. You do need to check the params from the file for overlap - i.e., legacy ORTE params that need to be converted to PMIx. The problem was that we were passing the wrong things to check_pmix_param - the params in that function are (the variable name, the variable's value). So here you would pass check_pmix_overlap(fv->mbvfv_var, fv->mbvfv_value).

You still need check_pmix_param in case they put a PMIx var in the file - you just need to check overlap as well.

HTH

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhc54 check now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ggouaillardet
Copy link
Contributor

there is a typo in the commit message (PR ... introduced a big)

@hppritcha
Copy link
Member Author

i think the nvidia failure is related to #12245

@hppritcha
Copy link
Member Author

@qkoziol please review when you have a chance

@wenduwan
Copy link
Contributor

i think the nvidia failure is related to #12245

@hppritcha That's correct. The fix is here #12246

I will ping Quincey to take a look at the PR.

Copy link
Contributor

@qkoziol qkoziol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fixes!

@wenduwan wenduwan changed the title fix a bug introduced by PR 11854 fix a bug introduced by PR 11854(trigger CI again) Jan 19, 2024
@wenduwan wenduwan changed the title fix a bug introduced by PR 11854(trigger CI again) fix a bug introduced by PR 11854 Jan 19, 2024
@wenduwan
Copy link
Contributor

Hmmm not sure how to trigger nvidia CI. The failure should have been fixed.

PR open-mpi#11854 introduced a big that causes singleton runs to
segfault at startup in some cases

this bug was rooted out by the github action in
PR open-mpi#12217

Signed-off-by: Howard Pritchard <[email protected]>
@hppritcha hppritcha merged commit 6ab6775 into open-mpi:main Jan 19, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants