-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stokhos: Remove NUM_MPI_PROCS from parallel-enabled tests #11322
Stokhos: Remove NUM_MPI_PROCS from parallel-enabled tests #11322
Conversation
Apparely including NUM_MPI_PROCS N with N > 1 prevents the test from running in a serial build, which is not desired behavior. I don't remember this being the case years ago when this option was added, but appears to be now.
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection Is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-11.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
Using Repos:
Pull Request Author: etphipp |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-11.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ rppawlo ]! |
Status Flag 'Pull Request AutoTester' - Pull Request will be Automerged |
Merge on Pull Request# 11322: IS A SUCCESS - Pull Request successfully merged |
@bartlettroscoe - is this intended? I'm pretty sure we have use cases where we want a test to run in both serial and parallel builds, and where we need to force a specific number of mpi processes to be able to check the answer in parallel. |
It was a defect in TriBITS and very undesirable behavior if you think about it. (I can provide some concrete examples.) If you want to run a test both in serial and in parallel, just add two tests, one with |
So is this something you plan to fix in tribits? Or would it break too many projects? I would probably have to go through all the nox and loca examples. I bet every epetra and tpetra test has this issue. It would be much easier to see it fixed in tribits, but that may lead to issues downstream (test harnesses timing out...). |
@rppawlo, I was not clear in my response above: The behavior where TriBITS would simply ignore TL;DR I suspect that most cases where a developer was setting Note that @etphipp's fix in this PR was to simply remove Note that this behavior of having the default for So what do we want to do here? This change was made quite a while ago to Trilinos 'develop' so many new tests have likely been added using the new (arguably correct) behavior where setting I will go back to my assertion that we want to design our test suites so that doing an MPI build also runs a superset of all same one-proc tests run by a non-MPI build would run, in addition to the multi-process tests. That way, doing an MPI build and passing all of the tests gives great confidence that all of the tests will pass with an non-MPI build. (That was not the case with the old behavior of TriBITS and is not the case with these updated Stokhos tests or any tests that leave off the Other than the argument to maintain backwards compatibility for all time (even for confusing and surprising behavior), is there another view here? Any more discussion on this should likely happen in a short MS Teams meeting (which will speed up the communication). |
NOTE: Really, there never should have been a variable |
I removed NUM_MPI_PROCS because I have no reason to believe it won't pass on any number of processors and I want it to run in serial builds. If specifying NUM_MPI_PROCS N means only run the test with exactly N procs, why do you allow SERIAL as an option for COMM if N > 1? Seems like if the user includes SERIAL for COMM, they expect it to run in a non-MPI build. If this is the behavior you intend, you should make it a configure error if SERIAL is specified when N > 1. |
👍
@etphipp, good point, should be an error because the test will never be added and never be run. I can update TriBITS to error out in those cases. Update: I added TriBITSPub/TriBITS#542 |
Getting that through the autotester will require updating lots of tests :) |
@etphipp you think there are a lot of tests passing in |
I don't know about other packages, but I know there are many NOX/LOCA tests where this is the case. I don't think you should just comment out those tests. The proper solution would be to add a separate SERIAL test. |
@etphipp, can you point out some examples?
Right, but some those tests may be failing once you add then back the so someone will have to fix those tests. I guess we just post a PR for each package independently and ask the package teams to take them over and fix the one-proc tests that are failing? |
There are many examples in NOX/LOCA, e.g., https://github.com/trilinos/Trilinos/blob/master/packages/nox/test/epetra/DS6.5.1/CMakeLists.txt. I am not aware of any tests in NOX/LOCA with NUM_MPI_PROCS > 1 that aren't expected to pass in serial builds. So hopefully none of these tests fail in serial as that represents a serious regression. |
Those NOX/LOCA tests show the args:
One idea is to change the behavior of Another idea is to extend the Opinions? |
I would prefer the first choice as that is how I thought it worked all along, and I don't understand what the problems are with it. If the test writer includes serial for COMM, they clearly expect it to run in a serial build, don't they? |
Agree with @etphipp . First choice would be my preference. |
What about the case where |
But I come back to the problem that this approach does not run the single-process versions of these tests in an MPI build. How about if you set But I argue that it should still be the case that if the user passes in |
That sounds like a good idea to me.
Agreed. |
See updated issue TriBITSPub/TriBITS#542 |
Apparently including NUM_MPI_PROCS N with N > 1 prevents the test from running in a serial build, which is not desired behavior. I don't remember this being the case years ago when this option was added, but appears to be now.