Fix race-condition between collect_artifacts.sh instances #92

Open
theurich opened this issue Mar 16, 2023 · 0 comments
Labels
bug Something isn't working

Comments

@theurich
Member

test_esmf.py currently spawns separate child processes that run collect_artifacts.sh after the build and test phases of each combo that is executed. Each collect_artifacts.sh script executes collect_artifacts.py, which runs Git commands from within the local esmf-test-artifacts clone directory. This causes race conditions between all of those collect_artifacts.py instances.

collect_artifacts.py currently contains code that is supposed to act as a locking mechanism to prevent the race condition. The locking implementation is file-based and, given file-system (FS) issues, is not guaranteed to work. In fact, on Lustre it does not work reliably at all!

One solution might be to avoid running multiple collect_artifacts.py instances in the first place. Instead, a single instance would be responsible for processing all of the running build and test jobs. This could be managed by a simple file that lists all of the job ids to wait on. The single collect_artifacts.py instance then loops over those ids, checks whether any of the jobs is done, and if so handles the collection. With this approach there is virtually no potential for conflict between the collection steps of the different combos. At the same time it should be just as flexible: whatever finishes gets processed as soon as possible, with no serialization of the order, because the single instance keeps looping over all ids and checks which ones are ready for collection. The process finishes once all jobs have finished and have been processed.
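A rough sketch of what such a single-instance loop could look like is given here. It is only an illustration of the idea, not the existing collect_artifacts.py code. The names `read_job_ids`, `collect_all`, `is_done`, and `collect` are hypothetical, as is the one-id-per-line file format. How a job is detected as finished (scheduler query, marker file, ...) is left open and passed in as a callable.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List


def read_job_ids(path: Path) -> List[str]:
    """Read job ids from a text file (assumed format: one id per line).

    Blank lines and lines starting with '#' are skipped.
    """
    ids = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def collect_all(
    job_ids: Iterable[str],
    is_done: Callable[[str], bool],    # hypothetical: True once the job has finished
    collect: Callable[[str], None],    # hypothetical: collection + Git work for one job
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Process every job exactly once, in whatever order the jobs finish.

    Only this one process ever calls `collect`, so the Git commands of
    different combos cannot overlap and no lock file is needed.
    """
    pending = list(dict.fromkeys(job_ids))  # drop duplicate ids, keep order
    processed: List[str] = []
    while pending:
        still_running = []
        for job_id in pending:
            if is_done(job_id):
                collect(job_id)
                processed.append(job_id)
            else:
                still_running.append(job_id)
        pending = still_running
        if pending:
            sleep(poll_seconds)  # wait before polling the remaining jobs again
    return processed
```

In this sketch test_esmf.py would write the job ids to the file and start the collector once, for example `collect_all(read_job_ids(Path("job_ids.txt")), is_done, collect)`, where the file name is again just a placeholder. If jobs can be submitted while the collector is already running, the file would have to be re-read inside the loop, which is not shown here.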

@theurich theurich added the feature New feature or request label Mar 16, 2023
@theurich theurich added bug Something isn't working and removed feature New feature or request labels Oct 25, 2023