This is a web crawler designed for extracting information from websites.
Copy the Files
- Crawl.js: Main script for web crawling and filtering.
- json.js: A versatile script for creating and updating data.json files, suitable for use in multiple projects.
Integration
- Include the following line in your main script:
  require('./path-to/Crawl.js');
  Make sure to adjust the path according to your project structure.
- Use the crawl function, which is declared as:
  async function crawl(url, maxDepth, maxTime)
  - url: Starting point for the web crawling.
  - maxDepth: Maximum number of links visited on a single page.
  - maxTime: Time limit given in minutes for the crawling process.
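To make the three parameters concrete, here is a minimal sketch of a function with the same parameter list. It is not the code in Crawl.js: the in-memory `pages` map, the `fetchLinks` helper, and the returned list of visited URLs are all assumptions made so the example runs offline. It follows the descriptions above, treating maxDepth as a cap on links followed per page and maxTime as minutes.

```javascript
// Hypothetical in-memory "site": maps each URL to the links found on that page.
// This stands in for real HTTP requests so the sketch runs offline.
const pages = {
  'https://example.com/': [
    'https://example.com/a',
    'https://example.com/b',
    'https://example.com/c',
  ],
  'https://example.com/a': ['https://example.com/b', 'https://example.com/d'],
  'https://example.com/b': [],
  'https://example.com/c': [],
  'https://example.com/d': [],
};

// Stand-in for a page fetch; a real crawler would download and parse HTML here.
async function fetchLinks(url) {
  return pages[url] || [];
}

// Same parameter list as the documented crawl(url, maxDepth, maxTime).
// Assumption: the function resolves to the list of visited URLs.
async function crawl(url, maxDepth, maxTime) {
  const deadline = Date.now() + maxTime * 60 * 1000; // minutes -> milliseconds
  const visited = new Set();
  const queue = [url];

  while (queue.length > 0 && Date.now() < deadline) {
    const current = queue.shift();
    if (visited.has(current)) continue;
    visited.add(current);

    const links = await fetchLinks(current);
    // Follow at most maxDepth links from this page.
    for (const link of links.slice(0, maxDepth)) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return [...visited];
}

// Usage: start at the root, follow up to 2 links per page, stop after 1 minute.
crawl('https://example.com/', 2, 1).then((visited) => {
  console.log(visited);
});
```

With maxDepth set to 2, the third link on the root page is never queued, so https://example.com/c is not visited in this sketch.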
⭕ Excluded Websites
- Check the Crawl.js file for the excludedWebsites variable, defined as:
  const excludedWebsites = require('./excluded-websites.json').websites;
- Modify the contents of ./excluded-websites.json to include pages that should be ignored during crawling.
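The `.websites` access above implies that excluded-websites.json holds a top-level "websites" array. The sketch below shows one way such a list might be applied. The entry format (bare hostnames), the placeholder domains, and the `isExcluded` helper are assumptions for illustration; check Crawl.js for how the list is actually matched.

```javascript
// Assumed shape of excluded-websites.json: a "websites" array, matching the
// `.websites` access in Crawl.js. The entries are placeholder hostnames.
const excludedConfig = {
  websites: ['example.org', 'ads.example.com'],
};
const excludedWebsites = excludedConfig.websites;

// Hypothetical helper: true when the URL's hostname equals an excluded entry
// or is a subdomain of one.
function isExcluded(url, excluded) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return true; // treat unparseable URLs as not crawlable
  }
  return excluded.some(
    (site) => hostname === site || hostname.endsWith('.' + site)
  );
}

const candidates = [
  'https://example.org/page',
  'https://www.example.org/',
  'https://example.com/about',
];
const allowed = candidates.filter((u) => !isExcluded(u, excludedWebsites));
console.log(allowed); // [ 'https://example.com/about' ]
```

Matching on the parsed hostname rather than on a substring of the URL avoids excluding unrelated pages that only mention an excluded domain in their path or query string.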