Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

md5_status check update #553

Merged
merged 5 commits into from
Dec 14, 2023
Merged

md5_status check update #553

merged 5 commits into from
Dec 14, 2023

Conversation

aschroed
Copy link
Member

@aschroed aschroed commented Dec 12, 2023

This check has a throttle that limits the number of runs of these workflows that can be kicked off by the action to avoid going beyond the docker-hub pull limit of 200/6h.

Previously the way this was written was that the list of files checked was obtained by grabbing first n items from the search response where n is determined as (pull limit - the number of workflow_runs) created in the past 6 hours. However, we just ran into the case where 1 submitter uploaded 100+ files (and the md5 was kicked for some but not all of them - due to the pull limit throttle ) and in the meantime before n was high enough to finish all the runs a different submitter uploaded metadata for ~1000 files but did not upload the actual files. This meant that the same n files from the beginning of the list (those most recently submitted) got processed by the check and since they are still not uploaded the check is futilely spinning until they are.

This PR allows all the file items in the search_metadata response to be checked and how the appropriate number to run to avoid hitting the limit is determined (within the iterated results rather than limiting the number of items that are input to the iteration.

NOTE: that there is an (probably related) issue in that the check does not look to be triggered at all even though it is scheduled. We thought that updating the docker that was pointed to in the Workflow Item had fixed this but the check still doesn't seem to be running as far as Will can tell. However, the check appeared to run fine locally both before and after this update so I have no way to know if this change will resolve that. I did also switch search_metadata to return as a generator if maybe there is a memory issue.

Copy link
Member

@willronchetti willronchetti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like some of these changes are bug fixes ie: appending to summary that should be ported elsewhere?

Copy link
Member

@clarabakker clarabakker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the branch with a null query to see whether the conditional evaluated as expected with and without the generator, and I believe it does behave differently. Ultimately the check still correctly evaluates, but the generator does break that piece.

Never mind, you were too speedy. I think this looks good!

@aschroed aschroed merged commit 019baad into master Dec 14, 2023
1 check failed
@aschroed aschroed deleted the ajs_fix_md5_check branch December 14, 2023 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants