You should migrate to my more recent template: here. This repository is still a great learning resource for beginners!
With NodeJS and TypeScript, you are free to structure your project any way you want, which can be a bit intimidating for beginners.
The puppeteer-extra-boilerplate is a good learning resource for scraping beginners, and it gives advanced users a batteries-included template that is ready to use.
This is a work-in-progress boilerplate. You might also want to check out prescience's boilerplate: https://github.com/prescience-data/foundation.
If you are new to scraping, welcome! Make sure to join the Scraping Enthusiasts Discord server.
You can use this project to help you get started architecting your own projects.
If you are already experienced, you will probably want to compare your stack with mine.
You need to rename .env.example to .env.
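Once the values from .env are available in process.env (for example through the dotenv package; how this repository loads them is an assumption here), it can help to validate them once at startup. The helper below is a hypothetical sketch, not part of this repo, and DATABASE_URL is only assumed because it is the name Prisma's default schema uses.

```typescript
// Hypothetical helper (not part of this repo): check that required .env
// entries are present once at startup, so a missing value fails fast with a
// clear message instead of deep inside a scraping run.

type EnvSource = Record<string, string | undefined>;

function requireEnv<K extends string>(
  env: EnvSource,
  keys: readonly K[],
): Record<K, string> {
  // Treat both undefined and empty strings as missing.
  const missing = keys.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required .env entries: ${missing.join(", ")}`);
  }
  const result = {} as Record<K, string>;
  for (const key of keys) {
    result[key] = env[key] as string;
  }
  return result;
}
```

In Node you would pass process.env, e.g. `const { DATABASE_URL } = requireEnv(process.env, ["DATABASE_URL"]);`. Taking the environment as a parameter keeps the helper easy to test.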
To access and edit the database, you'll need to set up a local PostgreSQL instance. Then, edit the connection string in .env.
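If you are unsure what the connection string should look like, this sketch builds one in the URL format Prisma documents for PostgreSQL (postgresql://USER:PASSWORD@HOST:PORT/DATABASE?schema=SCHEMA). The PgSettings interface, the buildPostgresUrl function, and the "public" schema default are hypothetical choices for illustration.

```typescript
// Hypothetical sketch: assemble a PostgreSQL connection string in the format
//   postgresql://USER:PASSWORD@HOST:PORT/DATABASE?schema=SCHEMA

interface PgSettings {
  user: string;
  password: string;
  host: string;
  port: number;
  database: string;
  schema?: string; // defaults to "public" (an assumption)
}

function buildPostgresUrl(s: PgSettings): string {
  // Percent-encode the parts that may contain reserved characters, so an "@"
  // or ":" inside a password does not break URL parsing.
  const user = encodeURIComponent(s.user);
  const password = encodeURIComponent(s.password);
  const database = encodeURIComponent(s.database);
  const schema = encodeURIComponent(s.schema ?? "public");
  return `postgresql://${user}:${password}@${s.host}:${s.port}/${database}?schema=${schema}`;
}
```

For example, a password of p@ss:word becomes p%40ss%3Aword in the URL. Paste the resulting string into .env (under DATABASE_URL, if the project follows Prisma's default naming).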
For more details, see Prisma's TypeScript getting started guide.
puppeteer-extra is used to extend puppeteer's base functionality.
It is required in order to use puppeteer-extra plugins.
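In puppeteer-extra itself, plugins are registered with puppeteer.use(...) before calling puppeteer.launch(), and a plugin can hook into steps such as beforeLaunch. The self-contained toy model below only illustrates that registration idea. PluginHost, Plugin, LaunchOptions, and resolveLaunchOptions are hypothetical names, and this is not puppeteer-extra's actual API.

```typescript
// Toy model of the plugin pattern (hypothetical names, NOT puppeteer-extra's
// API): plugins are registered with use() and may adjust the launch options
// before the browser would be started.

interface LaunchOptions {
  headless: boolean;
  args: string[];
}

interface Plugin {
  name: string;
  beforeLaunch?(options: LaunchOptions): LaunchOptions;
}

class PluginHost {
  private readonly plugins: Plugin[] = [];

  // Returning `this` allows chaining: host.use(a).use(b)
  use(plugin: Plugin): this {
    this.plugins.push(plugin);
    return this;
  }

  // Run every plugin's beforeLaunch hook in registration order.
  resolveLaunchOptions(base: LaunchOptions): LaunchOptions {
    return this.plugins.reduce(
      (opts, plugin) => (plugin.beforeLaunch ? plugin.beforeLaunch(opts) : opts),
      base,
    );
  }
}
```

For instance, a plugin whose beforeLaunch appends "--window-size=1280,800" to args would leave every other option untouched. Each plugin only sees and returns options, so plugins stay independent of each other.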
Setup:
- Create a GCP project https://console.cloud.google.com
- Enable the Cloud Logging API
- Create a service account
  - Required roles:
    - Logging > Logs Writer
    - Monitoring > Monitoring Metric Writer
  - Source: https://cloud.google.com/logging/docs/agent/authorization
- Download the service account's credentials in JSON format
- Add the key to the root of the project and rename it to "gcp-creds.json"
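Before wiring gcp-creds.json into a logger, a quick sanity check can catch the wrong file being used. This sketch assumes the standard service account key layout (a JSON object with "type": "service_account", project_id, private_key, client_email, among other fields). The function name and the choice of fields to check are hypothetical, and it does not verify that the roles listed above were granted.

```typescript
// Hypothetical sketch: validate the contents of gcp-creds.json.

interface ServiceAccountKey {
  type: string;
  project_id: string;
  private_key: string;
  client_email: string;
}

function parseServiceAccountKey(json: string): ServiceAccountKey {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("gcp-creds.json must contain a JSON object");
  }
  const record = data as Record<string, unknown>;
  // User credentials or API keys have a different "type" (or none at all).
  if (record.type !== "service_account") {
    throw new Error('Expected "type": "service_account" in gcp-creds.json');
  }
  for (const field of ["project_id", "private_key", "client_email"]) {
    if (typeof record[field] !== "string" || record[field] === "") {
      throw new Error(`gcp-creds.json is missing "${field}"`);
    }
  }
  return record as unknown as ServiceAccountKey;
}
```

In Node, the file could be read with readFileSync("gcp-creds.json", "utf8") from node:fs and passed to this function.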
Rules:
- Respect the code format. If you are using VSCode, install the Prettier extension, which should automatically pick up the .prettierrc file.
- All contributions are welcome: documentation, code, etc.
- Add stealth utils (jitter, stealth mouse movements)
- Add stealth measures wherever there are // STEALTH comments. These measures should be toggled by an environment variable, since they add delay, which is not great for development.
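The jitter and // STEALTH items above could start from something like this sketch: a random delay that only runs when stealth mode is enabled. The STEALTH variable name, the function names, and the 250 to 1200 ms default range are assumptions, not values from this project. The random number generator is injectable so the delay logic can be tested deterministically.

```typescript
// Hypothetical sketch of an env-gated jitter helper for // STEALTH spots.

type Rng = () => number; // returns a float in [0, 1), like Math.random

// Pick an integer delay uniformly in [minMs, maxMs] (both inclusive).
function jitterMs(minMs: number, maxMs: number, rng: Rng = Math.random): number {
  if (minMs > maxMs) throw new RangeError("minMs must be <= maxMs");
  return minMs + Math.floor(rng() * (maxMs - minMs + 1));
}

// Stealth mode is on only when the (hypothetical) STEALTH variable is "true".
function isStealthEnabled(env: Record<string, string | undefined>): boolean {
  return env.STEALTH === "true";
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Waits a random amount of time when enabled; returns the delay applied
// (0 when stealth is off, so development runs stay fast).
async function stealthPause(
  enabled: boolean,
  minMs = 250,
  maxMs = 1200,
  rng: Rng = Math.random,
): Promise<number> {
  if (!enabled) return 0;
  const ms = jitterMs(minMs, maxMs, rng);
  await sleep(ms);
  return ms;
}
```

At a // STEALTH spot this might read `await stealthPause(isStealthEnabled(process.env));` just before a click or keystroke. Stealth mouse movements are not covered by this sketch.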
Check the VSCode and TypeScript issue trackers.
See: microsoft/TypeScript#43249
This package is licensed under the MIT license.
Feel free to raise an issue.