Compiler wrappers + update packages + use spack environments #520

Merged: 39 commits, Apr 22, 2024

adrienbernede (Member) commented Dec 11, 2023

Supersedes #517 (Update Shared CI, add poodle, update Spack configs accordingly.)

Adds
- Update to the 2023-12-0 release of Shared CI, which fixes Flux allocations to allow MPI tests and improves the reproducer.
- Update to the 2023-12-3 release of Shared CI, which fixes the reproducer.
- Update the RADIUSS Spack Configs submodule.

Purpose

In RADIUSS Spack Configs:

Notes

Packages in RADIUSS Spack Configs still have some specific changes compared to the PR mentioned above.

Supersedes LLNL/Umpire#876

adrienbernede (Member Author) commented Dec 18, 2023

@daboehme, could you have a look at the jobs on poodle?

daboehme (Member)


It fails in the PAPI test because the PAPI library on poodle apparently doesn't know PAPI_TOT_CYC :-( Kind of weird, since I think it used to work. Did we only enable PAPI recently? Anyway, a workaround would be to disable the papi option in the Caliper build spec for this machine.

adrienbernede (Member Author) commented Dec 18, 2023

Yeah... I had checked the PAPI version on both machines before asking you: same version and same Spack hash for both the ruby and the poodle PAPI installations.

However, I added poodle support only recently, and I remember having this issue from the beginning; I’ve never seen it work.

daboehme (Member)

The available counters don't really depend on the PAPI version so much as on what is implemented for the given architecture. However, PAPI_TOT_CYC is usually a pretty reliable one, so I'm kind of surprised it doesn't work, especially since it has been available on every other Intel CPU so far. Oh well. Let's just disable PAPI on poodle for now.

daboehme (Member)

Now that I have an account on poodle myself, I checked and found that the PAPI install there doesn't work at all for some reason. So it looks like we really have to disable PAPI support on poodle.

adrienbernede (Member Author)

Thanks for the update, @daboehme. I’ll do that.
For the record, I updated this PR with the latest changes in RADIUSS Spack Configs (moving to ROCm 5.7), but we now have an MPI-related failure on Tioga that emits an unsupported character in the logs, causing Flux watch to fail.
The Flux team has been notified, but we still need to fix this bug and, if possible, identify the offending character.
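One way to hunt for the offending character is to scan the raw job log for bytes that are not valid UTF-8 and for non-printing control characters. The Python sketch below assumes the unsupported character falls into one of those two classes, which is not confirmed in this thread; `find_suspicious_chars` is a hypothetical helper, not part of Flux or the Shared CI.

```python
import unicodedata


def find_suspicious_chars(data: bytes):
    """Hypothetical helper: list (line, column, description) for characters
    in a raw log that are invalid UTF-8 or are control characters.

    Assumes the problematic character is one of these two kinds.
    """
    # surrogateescape maps each undecodable byte 0xNN to U+DCNN,
    # so invalid bytes survive decoding and can be located.
    text = data.decode("utf-8", errors="surrogateescape")
    findings = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            if 0xDC80 <= cp <= 0xDCFF:
                findings.append((line_no, col, f"invalid UTF-8 byte 0x{cp - 0xDC00:02X}"))
            elif unicodedata.category(ch) == "Cc" and ch not in "\t\r":
                # Control characters other than tab and carriage return.
                findings.append((line_no, col, f"control character U+{cp:04X}"))
    return findings


# Example with a made-up log containing an ESC byte and a stray 0xFF byte.
sample = b"rank 0: ok\nrank 1: \x1b[0m done\xff\n"
for finding in find_suspicious_chars(sample):
    print(finding)
```

On a real log, the input would be the file's raw bytes (opened in binary mode), so that invalid sequences are reported rather than raising a decode error.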

adrienbernede (Member Author)

@daboehme I updated the CI configuration to the new standard. There is a new class of failures I’d like you to look into. Thanks.

daboehme (Member)


It looks like for Tioga we may also want to disable the MPI tests with -DRUN_MPI_TESTS=Off, as we do for Lassen.

The Lassen failure is weird. It only occurs with xlc, and only in a Release build. The specific failure is very strange and just shouldn't be happening; I suspect a compiler bug. I can dig around a bit more and see if I can come up with a workaround.

adrienbernede (Member Author)

@daboehme, I must ask about this:

..........== CALIPER: Region stack mismatch: Trying to end
    "testbinding"
  but current region is
    "inner".
  Ceasing region tracking!
  Run program with CALI_SERVICES_ENABLE=validator to examine nesting errors, or
  run with CALI_CALIPER_ALLOW_REGION_OVERLAP=true to continue region tracking.

Could this be causing the MPI issues?

daboehme (Member) commented Apr 9, 2024

Hi @adrienbernede, this should actually be fixed in the latest master branch. Are you still seeing this somewhere?
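To illustrate what the region-stack error above means, here is a minimal Python sketch assuming strict LIFO nesting. `RegionStack` is a hypothetical toy class, not Caliper code: it only rejects ending any region other than the innermost open one and then stops tracking, mirroring the wording of the log. It does not implement the validator service or CALI_CALIPER_ALLOW_REGION_OVERLAP.

```python
class RegionStack:
    """Hypothetical toy model of strictly nested begin/end regions.

    This is NOT Caliper's implementation; it only mimics the behavior
    described by the log message: ending a region that is not the
    innermost open one is a mismatch, after which tracking stops.
    """

    def __init__(self):
        self.stack = []        # open regions, innermost last
        self.tracking = True   # becomes False after the first mismatch

    def begin(self, name):
        if self.tracking:
            self.stack.append(name)

    def end(self, name):
        """Close `name`; return an error string on mismatch, else None."""
        if not self.tracking:
            return None
        current = self.stack[-1] if self.stack else None
        if current != name:
            # Mirror the log: report the mismatch and cease region tracking.
            self.tracking = False
            return (f'Region stack mismatch: Trying to end "{name}" '
                    f'but current region is "{current}".')
        self.stack.pop()
        return None


# Reproduces the pattern from the log: "inner" is still open when
# "testbinding" is ended.
rs = RegionStack()
rs.begin("testbinding")
rs.begin("inner")
print(rs.end("testbinding"))
```

In this model, a single missing end of "inner" is enough to trigger the mismatch; the sketch says nothing about whether that relates to the MPI failures.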

adrienbernede changed the title from "[WIP] [Woptim] RADIUSS Spack Configs update" to "[DEPRECATED] [Woptim] RADIUSS Spack Configs update" on Apr 12, 2024
adrienbernede changed the title from "[DEPRECATED] [Woptim] RADIUSS Spack Configs update" to "[Woptim] RADIUSS Spack Configs update" on Apr 12, 2024
adrienbernede changed the title from "[Woptim] RADIUSS Spack Configs update" to "[WIP] [Woptim] RADIUSS Spack Configs update" on Apr 12, 2024
adrienbernede changed the title from "[WIP] [Woptim] RADIUSS Spack Configs update" to "[WIP] [Woptim] Compiler wrappers + update packages + use spack environments" on Apr 12, 2024
adrienbernede (Member Author)

@daboehme What should we do about the Lassen failure with the XL compiler?

adrienbernede changed the title from "[WIP] [Woptim] Compiler wrappers + update packages + use spack environments" to "Compiler wrappers + update packages + use spack environments" on Apr 15, 2024
daboehme (Member) commented Apr 22, 2024


Hi @adrienbernede, yeah, this is a weird case. It seems to work if we build with -DCMAKE_BUILD_TYPE=Debug. Can we do that in the test framework?

daboehme merged commit 45dffec into master on Apr 22, 2024
7 checks passed
adrienbernede deleted the woptim/shared-ci-2023-12-0 branch on April 22, 2024 16:30