Speedup CI manifest #540
Comments
A more efficient approach would be webhooks, though they can only be created by the repository owner. We could ask the owners to help with this project; it would be the most efficient option because it avoids going through the GitHub API, and the most GitHub-friendly. The webhook information can be cached on a dedicated GitHub branch, so that the manifest CI won't refetch it.
cc @jayvdb AFAIK this is especially slow since the recent codegen changes (#473).
Currently, we always issue several requests, each limited to 100 releases, to fetch all the releases. However, if codegen is called with the "latest" argument, as in CI, it is usually sufficient to look at the latest 100 releases. Eventually we may need to change the overall logic, but I guess there are a number of areas that could be improved without such a big change.
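A minimal sketch of that idea, not the actual codegen logic: stop after the first page when only the latest release is needed. `collect_releases` and the injected `fetch_page` callable are hypothetical names introduced for illustration; `fetch_page` is assumed to return one page of releases, newest first.

```python
from typing import Callable, Dict, List

PER_PAGE = 100  # page size mentioned in the thread


def collect_releases(
    fetch_page: Callable[[int], List[Dict]],
    latest_only: bool,
) -> List[Dict]:
    """Collect releases page by page; stop after page 1 when latest_only is set.

    fetch_page is a hypothetical callable returning the releases on a given
    1-based page, for example a wrapper around the releases list endpoint.
    """
    releases: List[Dict] = []
    page = 1
    while True:
        batch = fetch_page(page)
        releases.extend(batch)
        # A short (or empty) page means there is nothing further to fetch.
        if latest_only or len(batch) < PER_PAGE:
            return releases
        page += 1
```

With a repository that has 250 releases, `latest_only=True` costs one request instead of three.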
@taiki-e I think using webhooks for certain projects and caching the results in a dedicated branch would be much more efficient. cc @sunshowers
By the way, in my understanding, one of the actual causes of the "slowness" is that even after the token has reached the rate limit, that fact is not remembered on the codegen side, so every time a download is invoked it tries again with the token that has already hit the limit.
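One way to remember that state is sketched below. This is an illustration under assumptions, not the codegen implementation: `TokenBudget` and its methods are hypothetical names, and the reset time is assumed to come from the `x-ratelimit-reset` response header (epoch seconds) that the GitHub REST API returns.

```python
import time
from typing import Callable, Optional


class TokenBudget:
    """Remembers when a token hit its rate limit so callers can skip it."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable so the logic can be tested
        self._blocked_until: Optional[float] = None

    def mark_rate_limited(self, reset_epoch: float) -> None:
        """Record the time (epoch seconds) at which the limit resets."""
        self._blocked_until = reset_epoch

    def usable(self) -> bool:
        """Return True if the token is not known to be rate limited."""
        if self._blocked_until is None:
            return True
        if self._clock() >= self._blocked_until:
            # The limit window has passed; forget the block.
            self._blocked_until = None
            return True
        return False
```

A caller would check `usable()` before each download and fall back to an unauthenticated request, or wait, instead of retrying with a token that is known to be exhausted.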
The binstalk-downloader crate checks for rate limits and applies a backoff strategy.
We also have the binstalk-git-repo-api crate for some GraphQL APIs. I could add an artifact listing API for a specific version, if you need one.
Yes, this is the main cause. It added one GitHub API fetch for each tool.
A very quick improvement for the license file fetches is to use the existing manifest data to first verify that the existing URLs still exist, and fall back to the existing logic, which attempts to find the appropriate license files.
We can likely also skip a lot of fetching by storing, in the manifests, a copy of the GitHub headers related to caching. Due to the way the code is structured, the /releases list API and release asset fetches are easy to skip if github.com says the content isn't modified.
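The header-caching idea could look roughly like the sketch below. The thread does not name the headers, so this assumes the standard HTTP validators `ETag` and `Last-Modified`; the function names and the `etag` / `last_modified` manifest keys are hypothetical.

```python
from typing import Dict, Mapping, Tuple


def conditional_headers(cached: Mapping[str, str]) -> Dict[str, str]:
    """Build request headers from validators stored with a manifest entry."""
    headers: Dict[str, str] = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def update_cache(
    cached: Mapping[str, str],
    status: int,
    response_headers: Mapping[str, str],
) -> Tuple[Dict[str, str], bool]:
    """Return (cache entry to store, whether the body must be reprocessed)."""
    if status == 304:
        # Not modified: keep the stored validators and skip further fetches.
        return dict(cached), False
    fresh: Dict[str, str] = {}
    for key, header in (("etag", "ETag"), ("last_modified", "Last-Modified")):
        if header in response_headers:
            fresh[key] = response_headers[header]
    return fresh, True
```

On a 304 response the caller can skip the release and asset processing for that tool entirely and reuse what is already in the manifest.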
For the license file, can't we just grab it from the tarball downloaded from crates.io? The tarball on crates.io is immutable, so it can be cached for each release, and I think we already download from crates.io?
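A rough sketch of pulling license files out of a crate tarball, assuming the usual `.crate` layout (a gzip-compressed tar with a single top-level `name-version/` directory). The function name and the filename prefixes are assumptions for illustration; the real matching rules may differ.

```python
import io
import tarfile
from typing import Dict

# Hypothetical set of filename prefixes treated as license files.
LICENSE_PREFIXES = ("license", "licence", "copying", "unlicense")


def extract_license_files(crate_bytes: bytes) -> Dict[str, bytes]:
    """Return {filename: contents} for license files at the crate root."""
    found: Dict[str, bytes] = {}
    with tarfile.open(fileobj=io.BytesIO(crate_bytes), mode="r:gz") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            parts = member.name.split("/")
            # Only consider files directly under the top-level directory.
            if len(parts) != 2:
                continue
            if parts[1].lower().startswith(LICENSE_PREFIXES):
                handle = archive.extractfile(member)
                if handle is not None:
                    found[parts[1]] = handle.read()
    return found
```

Because a published crate version never changes, the result can be cached per version and never fetched again.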
Yup; good idea.
Yes, we do.
Before a5ddc5a: about 11m - 22m
GITHUB_TOKEN wasn't applied properly in the first place (I think it worked at one point, but GitHub may later have made the validation stricter).
Closing this issue since it is no longer urgent. (Opened separate issue #546 for license file fetch improvements.)
CI manifest is quite slow, sometimes taking up to 20m.
I think using a dedicated GitHub token might help: secrets.GITHUB_TOKEN only provides 1000 API hits per hour, whereas one created by a user gets 5000.
Switching to GraphQL might also help, since we can fetch all the artifact information when fetching the release information.
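To illustrate the GraphQL point, here is a sketch of a single query that returns releases together with their assets. It only builds the request body; the function name is hypothetical, and the query is an assumption about what would be useful here rather than what the project or binstalk-git-repo-api actually sends.

```python
import json

# One request returns each release and its assets, instead of a releases
# call followed by per-release asset lookups.
RELEASES_WITH_ASSETS_QUERY = """
query($owner: String!, $name: String!, $count: Int!) {
  repository(owner: $owner, name: $name) {
    releases(first: $count, orderBy: {field: CREATED_AT, direction: DESC}) {
      nodes {
        tagName
        releaseAssets(first: 100) {
          nodes { name downloadUrl }
        }
      }
    }
  }
}
"""


def build_graphql_payload(owner: str, name: str, count: int = 100) -> str:
    """Return the JSON body to POST to https://api.github.com/graphql."""
    if not 1 <= count <= 100:
        # GraphQL connections accept at most 100 items per page.
        raise ValueError("count must be between 1 and 100")
    return json.dumps(
        {
            "query": RELEASES_WITH_ASSETS_QUERY,
            "variables": {"owner": owner, "name": name, "count": count},
        }
    )
```

The body would be sent as an authenticated POST; pagination over more than 100 releases or assets is left out of this sketch.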