-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new API endpoint requests/latest #951
Add new API endpoint requests/latest #951
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.
d6aef93
to
12af2bc
Compare
I'll give this another look in the morning with fresh eyes, but it's mostly ready for review |
ab37d31
to
07ba3f2
Compare
cachito/web/api_v1.py
Outdated
@@ -184,6 +184,70 @@ def get_request(request_id): | |||
return flask.jsonify(json) | |||
|
|||
|
|||
def get_latest_request() -> flask.Response: | |||
""" | |||
Retrieve the latest request for a repo_name/ref. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be something like "get
wrapper for 'latest_request()'" - currently it's identical to the docstring for _get_latest_request_by_repo_name_and_ref
below, which makes me wonder what this method actually does...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this one's just the public API entrypoint to the private "backend" handler which this one wraps with JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my point was that we shouldn't have two different methods with identical docstrings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can update the docstring to make it more clear 👍
Request.query.filter(Request.ref == ref) | ||
.filter(or_(Request.repo.endswith(repo_name), Request.repo.endswith(repo_name_with_git))) | ||
.order_by(desc(Request.id)) | ||
) | ||
|
||
# Check the first result for a matching repo_name | ||
first_request = query.first() | ||
if first_request and get_repo_name(first_request.repo) == repo_name: | ||
return first_request | ||
|
||
# Fall back to iterating over the full result set, loading it in chunks, | ||
# looking for a matching repo_name | ||
for request in query.yield_per(10): | ||
if get_repo_name(request.repo) == repo_name: | ||
return request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's clearly something I'm missing here about those requests, so please clarify the following to me:
- we filter requests by a given commit ref
- we then filter those results to limit the set to only those which are related to
repo_name
- finally we order the set by highest request ID
So far so good. But why do we need to check get_repo_name(first_request.repo) == repo_name
again? Doesn't the SQL filter take care of that already?
Is it because of git protocol in the URI? Or is it because e.g. "repo": "https://github.com/git/org/eggs"
and "repo": "https://github.com/org/eggs"
are 2 different repos?
Also, how slow is loading those requests by pages of 10 results? I'm only asking since you're special-handling the first entry and then falling back to iterating over the rest of the set instead of just iterating it right away. What's the performance gain here on this API? Do we plan on running it that often instead only occasionally in a cron job?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it because of git protocol in the URI? Or is it because e.g. "repo": "https://github.com/git/org/eggs" and "repo": "https://github.com/org/eggs" are 2 different repos?
It's both 🥲
- Requests for repos [email protected]:org/eggs and https://github.com/org/eggs end up in two separate directories in the archives volume
- https://github.com/git/org/eggs and https://github.com/org/eggs are two different repos that would have separate source archives. We wouldn't want repo_name: org/eggs (derived from the archive name) to have latest data that is based upon results from git/org/eggs too
It all really comes down to how get_repo_name is used to determine the path of the archive on the volume and when assessing an archive, that's all the information we have to query the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, how slow is loading those requests by pages of 10 results? I'm only asking since you're special-handling the first entry and then falling back to iterating over the rest of the set instead of just iterating it right away. What's the performance gain here on this API? Do we plan on running it that often instead only occasionally in a cron job?
I expect that the initial query will cover nearly all requests and that the fall-back will just handle the odd edge-case. We'll probably just be running this as-needed and when it comes time to write the script, my plan was to rate-limit the requests anyway to avoid hammering the API for a non-critical maintenance task.
cachito/web/api_v1.py
Outdated
@@ -184,6 +184,70 @@ def get_request(request_id): | |||
return flask.jsonify(json) | |||
|
|||
|
|||
def get_latest_request() -> flask.Response: | |||
""" | |||
Retrieve the latest request for a repo_name/ref. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this one's just the public API entrypoint to the private "backend" handler which this one wraps with JSON.
ee0d306
to
97849c2
Compare
The requests/latest endpoint will return the most recent request for a given repo_name and git ref. - The git repo_name and ref will be provided to the API as query parameters - The git repo_name is the namespaced repository name, not the full git URL (e.g. release-engineering/retrodep) - The most recent request among those with the same repo_name and ref is considered to be the one with the highest request_id Signed-off-by: Taylor Madore <[email protected]>
Generate duplicate requests for a repo/ref and query the requests/latest API endpoint to determine the "latest". The latest request should always be the last of the duplicate requests for each repo+ref combination Signed-off-by: Taylor Madore <[email protected]>
97849c2
to
6326d91
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
d912982
The requests/latest endpoint will return the most recent request for a given repo_name and git ref
Maintainers will complete the following section