Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new API endpoint requests/latest #951

Merged

Conversation

taylormadore
Copy link
Contributor

@taylormadore taylormadore commented Dec 5, 2023

The requests/latest endpoint will return the most recent request for a given repo_name and git ref

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • New code has type annotations
  • OpenAPI schema is updated (if applicable)
  • [n/a] DB schema change has corresponding DB migration (if applicable)
  • [n/a] README updated (if worker configuration changed, or if applicable)
  • Draft release notes are updated before merging

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@taylormadore
Copy link
Contributor Author

I'll give this another look in the morning with fresh eyes, but it's mostly ready for review

@taylormadore taylormadore force-pushed the get-latest-request branch 2 times, most recently from ab37d31 to 07ba3f2 Compare December 7, 2023 13:59
@taylormadore taylormadore marked this pull request as ready for review December 7, 2023 13:59
@@ -184,6 +184,70 @@ def get_request(request_id):
return flask.jsonify(json)


def get_latest_request() -> flask.Response:
"""
Retrieve the latest request for a repo_name/ref.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be something like "get wrapper for 'latest_request()'" - currently it's identical to the docstring for _get_latest_request_by_repo_name_and_ref below, which makes me wonder what this method actually does...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this one's just the public API entrypoint to the private "backend" handler which this one wraps with JSON.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my point was that we shouldn't have two different methods with identical docstrings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can update the docstring to make it more clear 👍

Comment on lines +232 to +244
Request.query.filter(Request.ref == ref)
.filter(or_(Request.repo.endswith(repo_name), Request.repo.endswith(repo_name_with_git)))
.order_by(desc(Request.id))
)

# Check the first result for a matching repo_name
first_request = query.first()
if first_request and get_repo_name(first_request.repo) == repo_name:
return first_request

# Fall back to iterating over the full result set, loading it in chunks,
# looking for a matching repo_name
for request in query.yield_per(10):
if get_repo_name(request.repo) == repo_name:
return request
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's clearly something I'm missing here about those requests, so please clarify the following to me:

  • we filter requests by a given commit ref
  • we then filter those results to limit the set to only those which are related to repo_name
  • finally we order the set by highest request ID

So far so good. But why do we need to check get_repo_name(first_request.repo) == repo_name again? Doesn't the SQL filter take care of that already?
Is it because of git protocol in the URI? Or is it because e.g. "repo": "https://github.com/git/org/eggs" and "repo": "https://github.com/org/eggs" are 2 different repos?

Also, how slow is loading those requests by pages of 10 results? I'm only asking since you're special-handling the first entry and then falling back to iterating over the rest of the set instead of just iterating it right away. What's the performance gain here on this API? Do we plan on running it that often instead only occasionally in a cron job?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because of git protocol in the URI? Or is it because e.g. "repo": "https://github.com/git/org/eggs" and "repo": "https://github.com/org/eggs" are 2 different repos?

It's both 🥲

It all really comes down to how get_repo_name is used to determine the path of the archive on the volume and when assessing an archive, that's all the information we have to query the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how slow is loading those requests by pages of 10 results? I'm only asking since you're special-handling the first entry and then falling back to iterating over the rest of the set instead of just iterating it right away. What's the performance gain here on this API? Do we plan on running it that often instead only occasionally in a cron job?

I expect that the initial query will cover nearly all requests and that the fall-back will just handle the odd edge-case. We'll probably just be running this as-needed and when it comes time to write the script, my plan was to rate-limit the requests anyway to avoid hammering the API for a non-critical maintenance task.

@@ -184,6 +184,70 @@ def get_request(request_id):
return flask.jsonify(json)


def get_latest_request() -> flask.Response:
"""
Retrieve the latest request for a repo_name/ref.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC this one's just the public API entrypoint to the private "backend" handler which this one wraps with JSON.

@taylormadore taylormadore force-pushed the get-latest-request branch 2 times, most recently from ee0d306 to 97849c2 Compare December 11, 2023 20:12
The requests/latest endpoint will return the most recent request for a
given repo_name and git ref.

- The git repo_name and ref will be provided to the API as query
  parameters
- The git repo_name is the namespaced repository name, not the full git
  URL (e.g. release-engineering/retrodep)
- The most recent request among those with the same repo_name and ref
  is considered to be the one with the highest request_id

Signed-off-by: Taylor Madore <[email protected]>
Generate duplicate requests for a repo/ref and query the requests/latest
API endpoint to determine the "latest". The latest request should always
be the last of the duplicate requests for each repo+ref combination

Signed-off-by: Taylor Madore <[email protected]>
Copy link
Contributor

@brunoapimentel brunoapimentel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@taylormadore taylormadore added this pull request to the merge queue Dec 13, 2023
Merged via the queue into containerbuildsystem:master with commit d912982 Dec 13, 2023
14 checks passed
@taylormadore taylormadore deleted the get-latest-request branch December 13, 2023 14:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants