Stream extra_metrics fails on repos with large number of issues/PRs #205

Open
laurentS opened this issue May 12, 2023 · 1 comment

@laurentS
Contributor

When running the tap on https://github.com/microsoft/TypeScript with the extra_metrics stream, it crashes because the project page shows the number of open issues as a truncated count (5k+) rather than an exact number, and the scraper cannot parse that as an integer.

When navigating to https://github.com/microsoft/TypeScript/issues the actual number is 5988 (as of writing this), so really closer to 6k.

Stack trace:

File "tap_github/repository_streams.py", line 2071, in parse_response
     yield from scrape_metrics(response, self.logger)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "tap_github/scraping.py", line 126, in scrape_metrics
     issues = parse_counter(soup.find("span", id="issues-repo-tab-count"), logger)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "tap_github/scraping.py", line 109, in parse_counter
     return int(title_string.strip().replace(",", ""))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '5000+'

It would make sense to source the number of open issues and PRs from the GraphQL API endpoint instead.
I will open a PR to fix this.

@laurentS laurentS added the bug Something isn't working label May 12, 2023
@laurentS laurentS self-assigned this May 12, 2023
@laurentS
Contributor Author

Dependents and contributors counts are included in the same stream, but are not available through the API. As a temporary fix, I'll make sure the 5k+ token does not crash the tap and returns something. Not ideal, but it should allow the tap to keep running.
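
To illustrate the kind of temporary fix described above, here is a minimal sketch of a tolerant counter parser. It is not the tap's actual code: `parse_counter_text` is a hypothetical helper that takes the already-extracted string (the real `parse_counter` receives a tag and a logger), and returning the lower bound for truncated values such as '5000+' is an assumption about what "returns something" could mean.

```python
import re
from typing import Optional


def parse_counter_text(text: Optional[str]) -> Optional[int]:
    """Hypothetical helper: parse a scraped counter string into an int.

    Accepts exact values ("5,988") and truncated ones ("5000+", "5k+").
    For truncated values the result is only a lower bound (assumption).
    Returns None instead of raising when the text cannot be parsed.
    """
    if text is None:
        return None
    # Drop whitespace, thousands separators and the trailing "+" marker.
    cleaned = text.strip().replace(",", "").rstrip("+").lower()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(k?)", cleaned)
    if match is None:
        return None
    number, suffix = match.groups()
    multiplier = 1000 if suffix == "k" else 1
    return int(round(float(number) * multiplier))
```

With this approach the value from the stack trace, '5000+', would be recorded as 5000 rather than raising a ValueError, at the cost of under-reporting the real count.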

For the record, the following GraphQL query:

{
  repository(name: "TypeScript", owner: "microsoft") {
    issues(states: OPEN) {
      totalCount
    }
    pullRequests(states: OPEN) {
      totalCount
    }
  }
}

provides the number of open issues and PRs as shown on the web page.
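
As a rough sketch of how the query above could be used from Python, the snippet below builds the JSON body for a POST to GitHub's GraphQL endpoint and pulls the two counts out of the decoded response. The names `OPEN_COUNTS_QUERY`, `build_payload` and `extract_open_counts` are hypothetical and not part of the tap, and the query is rewritten with variables purely for illustration. Sending the request (with an authentication token) is left out.

```python
import json
from typing import Any, Dict, Tuple

# Same fields as the query above, parameterised with GraphQL variables.
OPEN_COUNTS_QUERY = """
query ($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    issues(states: OPEN) { totalCount }
    pullRequests(states: OPEN) { totalCount }
  }
}
"""


def build_payload(owner: str, name: str) -> str:
    """Hypothetical helper: JSON body for a POST to the GraphQL endpoint."""
    return json.dumps(
        {"query": OPEN_COUNTS_QUERY, "variables": {"owner": owner, "name": name}}
    )


def extract_open_counts(body: Dict[str, Any]) -> Tuple[int, int]:
    """Hypothetical helper: return (open_issues, open_prs) from a decoded response."""
    repo = body["data"]["repository"]
    return repo["issues"]["totalCount"], repo["pullRequests"]["totalCount"]
```

Because `totalCount` is an exact integer, this path avoids the truncated '5000+' value entirely. It only covers issues and PRs, though, so the scraped dependents and contributors counts would still need the tolerant parsing.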
