Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tools/internal/parser: check TXT records #2213

Merged
merged 1 commit into from
Oct 14, 2024

Conversation

danderson
Copy link
Contributor

Supports checking a single PR, as well as re-verifying the entire PSL.


Opening as a draft because I forgot about this until now :( and I need to re-read the change to make sure it's ready to go and doesn't have leftover debugging cruft.

The main portion of the change is in validate.go, and is hopefully commented enough to explain what's happening. Essentially, we do parallel DNS lookups to discover the _psl records, gather a list of all PRs to verify, an then fan out again with github API requests to inspect the PRs and see if they match the TXT record.

Note that when running the validation on the entire PSL, or doing frequent reruns, you need a github auth token in the environment as GITHUB_TOKEN, otherwise you'll get rate limited by github pretty much instantly.

@danderson
Copy link
Contributor Author

It looks like github rate limits became much more aggressive since last time I used this code, previously I was able to run a check against the entire PSL a few time/hour. Now if I do that, I get limited almost immediately, even with authenticated requests. I'll see if I can make that more graceful, but in the meantime checking individual PRs works, since that's much lower load.

Supports checking a single PR, as well as re-verifying the entire PSL.
@danderson danderson marked this pull request as ready for review October 14, 2024 20:53
@danderson
Copy link
Contributor Author

For now, I've put --online-checks behind a dev-only gate for whole-file revalidation, since that causes a lot of github API traffic, and that causes too many false positives from getting rate-limited. Consulting with @simon-friedberger I also lowered the concurrency for DNS record lookup to account for weaker home router DNS resolvers that don't like getting slammed with hundreds of qps of lookups 😅

Simon made a very good suggestion out of band: the github commit history actually has enough information to verify most TXT record values without asking github for anything! I have a prototype of this working now that cuts out >99% of the github API requests by using a local git clone. 5 PRs still need github queries to verify because the corresponding git commit does not list the PR number anywhere, so we cannot discover the PR<>commit data any other way than asking github.

This PR should still be good to go as-is for verifying individual PRs. I plan to send the optimization outlined above as a followup, along with some improvements to DNS querying to try and make whole-file revalidation more usable.

@simon-friedberger simon-friedberger merged commit 3ea913a into publicsuffix:master Oct 14, 2024
2 checks passed
@danderson danderson deleted the push-kluwykpttnpo branch October 15, 2024 01:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants