md5_status check update #553
Conversation
…ate status even if not uploaded but still respect the docker pull limit
Seems like some of these changes are bug fixes (e.g. appending to the summary) that should be ported elsewhere?
I tested the branch with a null query to see whether the conditional evaluated as expected with and without the generator, and I believe it does behave differently. Ultimately the check still evaluates correctly, but the generator does break that piece.
Never mind, you were too speedy. I think this looks good!
This check has a throttle that limits the number of workflow runs the action can kick off, to avoid exceeding the docker-hub pull limit of 200 per 6 hours.
Previously, the list of files checked was obtained by grabbing the first n items from the search response, where n = (pull limit - the number of workflow_runs created in the past 6 hours). However, we just ran into a case where one submitter uploaded 100+ files, and the md5 check was kicked off for some but not all of them due to the pull-limit throttle. In the meantime, before n grew high enough to finish all those runs, a different submitter uploaded metadata for ~1000 files but did not upload the actual files. This meant that the same n files from the beginning of the list (those most recently submitted) got processed by the check on every run, and since they are still not uploaded, the check is futilely spinning until they are.
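As a rough illustration of the previous behavior (all names here are hypothetical, not the actual check's code): the budget n was computed up front and used to truncate the item list, so metadata-only items at the head of the list could permanently occupy the budget.

```python
# Hypothetical sketch of the OLD throttle logic (names assumed for illustration).
PULL_LIMIT = 200  # docker-hub pull limit per 6-hour window

def files_to_check(search_items, recent_workflow_runs):
    """Take only the first n items from the search response,
    where n is the remaining pull budget for the window."""
    n = PULL_LIMIT - len(recent_workflow_runs)
    return search_items[:n]
```

With 195 runs already created in the window, only the first 5 search items are ever considered, regardless of whether their files were actually uploaded.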
This PR allows all the file items in the search_metadata response to be checked, and determines the appropriate number of runs to kick off (to avoid hitting the limit) within the iterated results, rather than limiting the number of items that are input to the iteration.
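A minimal sketch of the new approach (helper names like `is_uploaded` and `kick_off_run` are assumed for illustration, not the check's real API): every item is iterated, items without an uploaded file are skipped without consuming budget, and only runs actually kicked off count toward the limit.

```python
# Hypothetical sketch of the NEW logic: throttle inside the iteration.
PULL_LIMIT = 200  # docker-hub pull limit per 6-hour window

def run_md5_checks(search_items, recent_workflow_runs, is_uploaded, kick_off_run):
    """Iterate over ALL search items; skip metadata-only items and stop
    once the number of runs kicked off reaches the remaining budget."""
    budget = PULL_LIMIT - len(recent_workflow_runs)
    kicked = 0
    for item in search_items:
        if kicked >= budget:
            break  # budget exhausted; remaining items wait for the next run
        if not is_uploaded(item):
            continue  # metadata-only item: no longer consumes the budget
        kick_off_run(item)
        kicked += 1
    return kicked
```

The key difference is that a block of not-yet-uploaded items at the head of the list no longer starves uploaded items further down.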
NOTE: there is a (probably related) issue in that the check does not appear to be triggered at all, even though it is scheduled. We thought that updating the docker image pointed to in the Workflow Item had fixed this, but the check still doesn't seem to be running as far as Will can tell. However, the check ran fine locally both before and after this update, so I have no way to know whether this change will resolve that. I also switched search_metadata to return a generator, in case there is a memory issue.
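For reference, the generator switch amounts to yielding results page by page instead of materializing the full result list; this sketch assumes a hypothetical `fetch_page(offset, size)` callable standing in for the actual paged search request.

```python
# Hypothetical sketch of a generator-style search_metadata
# (fetch_page is an assumed stand-in for the paged search request).
def search_metadata_generator(fetch_page, page_size=100):
    """Yield search results one page at a time rather than
    accumulating the entire response in memory."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return  # no more results
        yield from page
        offset += page_size
```

One caveat (consistent with the review comment above): a generator is consumed once and is never falsy just because it is empty, so any conditional that previously tested the result list needs to be adjusted.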