This is a template repository for a GitHub scraper, built on the Git scraping technique pioneered by Simon Willison.
This template assumes you need to query the DOM or HTML of the page to get the data you need. If you only need to fetch a data file directly, use the simple-scrape-template.
Modify the scrape.js file to scrape the data you need. The template is set up to use Playwright, but you can use any other scraping library you prefer by modifying the dependencies in the package.json file.
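For reference, here is a minimal sketch of what a Playwright-based scrape.js could look like; the URL, selector, and output path are placeholders rather than part of the template:

```js
// Minimal Playwright scraping sketch — replace the URL, selector,
// and output file with whatever your scraper actually needs.
const { chromium } = require("playwright");
const fs = require("fs");

async function scrape() {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto("https://example.com"); // placeholder URL
  // Query the DOM for the data you care about.
  const headings = await page.$$eval("h2", (els) =>
    els.map((el) => el.textContent.trim())
  );
  await browser.close();
  // Write the result into the data directory so the workflow can commit it.
  fs.mkdirSync("data", { recursive: true });
  fs.writeFileSync("data/data.json", JSON.stringify(headings, null, 2));
}

scrape().catch((err) => {
  console.error(err);
  process.exit(1);
});
```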
Commit and push the repo to GitHub and you're ready to go.
By default the scraper will run once per week, but you can change the cron schedule in the fetch.yaml file.
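A weekly schedule in a GitHub Actions workflow looks roughly like this (the exact cron expression below is just an illustration):

```yaml
on:
  schedule:
    # Run at 06:00 UTC every Monday; adjust to taste.
    - cron: "0 6 * * 1"
  # Also allow the scraper to be triggered manually from the Actions tab.
  workflow_dispatch:
```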
Data can be stored anywhere you like in the repository. The template is set up to store it in the data directory.
You may need to update the settings on the new repository to allow workflows to make commits to it: under Settings → Actions → General, set the workflow permissions to "Read and write permissions".
By default, the scraper updates the data as a JSON file in the repository, so the repo always contains the latest version of the data, while the repository history contains a full record of the data from when scraping began.
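As a rough sketch of that pattern (the actual fetch.yaml in this template may differ), a workflow job for this kind of scraper typically checks out the repo, runs the scraper, and commits the refreshed JSON file. Granting contents: write at the workflow level is the in-file alternative to the repository settings change mentioned above:

```yaml
permissions:
  contents: write   # let the workflow push commits back to the repo

jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx playwright install --with-deps chromium
      - run: node scrape.js
      - name: Commit updated data
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add data/
          git commit -m "Update scraped data" || echo "No changes to commit"
          git push
```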
This makes time series analysis of the data possible, though not entirely straightforward. The git-history tool can be used to extract the full history into an SQLite database.
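For example, assuming the scraped data lives at data/data.json and the output database is named scraper.db (both names are placeholders), the extraction looks roughly like this; see the git-history documentation for options such as --id that match the shape of your data:

```bash
pip install git-history
git-history file scraper.db data/data.json
```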