A simple web crawler written in JavaScript for Node.js.
`crawling` is available on both GitHub Packages and npm.
To install it from GitHub Packages, first follow the GitHub Docs guide on installing packages from the GitHub Packages npm registry. Then run:
```sh
$ npm install @lgrachov/crawling
```
This should install the package in your project.
To install it from npm instead, you only need to run one command:
```sh
$ npm install crawling
```
This should install the package in your project.
This example will create an array with all of the links gathered from the page.
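```javascript
// Uses import and top-level await, so run it as an ES module
// (a .mjs file, or "type": "module" in package.json).
import { crawlSite } from "crawling";
const links = [];
for await (const url of crawlSite("https://github.com/", 500)) {
  links.push(url);
}
```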
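Because `crawlSite` is consumed with `for await...of`, the array-building loop above works the same way for any async iterable. A minimal sketch of that pattern as a reusable helper, assuming only that the argument is an async iterable; `collect` and `fakeLinks` are hypothetical names, and `fakeLinks` stands in for `crawlSite` so the snippet runs without network access:

```javascript
// Hypothetical helper: drains any async iterable into an array.
async function collect(iterable) {
  const items = [];
  for await (const item of iterable) {
    items.push(item);
  }
  return items;
}

// Hypothetical stand-in for crawlSite: yields a fixed list of URLs.
async function* fakeLinks() {
  yield "https://github.com/about";
  yield "https://github.com/pricing";
}

collect(fakeLinks()).then((links) => {
  console.log(links.length); // 2
});
```

On Node.js 22 and later, the built-in `Array.fromAsync` should do the same job, for example `await Array.fromAsync(crawlSite("https://github.com/", 500))`.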
This example logs each link as soon as it is received, instead of waiting for the whole crawl to finish as the previous example does:
```javascript
import { crawlSite } from "crawling";
for await (const url of crawlSite("https://github.com/", 500)) {
  console.log(url);
}
```
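Since the loop handles links one at a time, it can also stop early with `break`, which ends the `for await...of` loop and calls the iterator's `return()` method. Whether `crawlSite` then stops fetching depends on its implementation, which this README does not describe. A sketch using a hypothetical `firstN` helper and a hypothetical stand-in, `fakeCrawl`, in place of `crawlSite`; the stand-in counts how many URLs it produced, to show the generator is not advanced past the `break`:

```javascript
// Hypothetical helper: returns at most `n` items from an async iterable,
// leaving the loop early with `break` once it has enough.
async function firstN(iterable, n) {
  const items = [];
  if (n <= 0) return items; // otherwise one item would be pulled before the check
  for await (const item of iterable) {
    items.push(item);
    if (items.length >= n) break; // closes the iterator via return()
  }
  return items;
}

// Hypothetical stand-in for crawlSite: yields `total` fake URLs and
// records how many it actually produced.
function fakeCrawl(total) {
  const stats = { produced: 0 };
  async function* gen() {
    for (let i = 1; i <= total; i++) {
      stats.produced++;
      yield `https://example.com/page${i}`;
    }
  }
  return { links: gen(), stats };
}

const { links, stats } = fakeCrawl(10);
firstN(links, 3).then((first) => {
  console.log(first);          // the first three URLs
  console.log(stats.produced); // 3
});
```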
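The function `crawlSite` takes two parameters:

- `site`: Required. The URL of the site to crawl.
- `timeout`: Optional. The delay between each link, in milliseconds. Defaults to `500`.

It yields the links one at a time, so iterate over it with `for await...of`, as in the examples.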
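Going by the examples, `crawlSite(site, timeout)` behaves like an async generator that yields URLs and pauses `timeout` milliseconds between them. The sketch below only illustrates that contract and is not the package's implementation; `delayedLinks` is a hypothetical name, and it takes a fixed array of URLs instead of fetching `site`, so it runs offline:

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical sketch of the crawlSite contract: yield each URL,
// pausing `timeout` ms (default 500) before every URL after the first.
async function* delayedLinks(urls, timeout = 500) {
  for (let i = 0; i < urls.length; i++) {
    if (i > 0) await sleep(timeout);
    yield urls[i];
  }
}

(async () => {
  const start = Date.now();
  for await (const url of delayedLinks(["https://a.example/", "https://b.example/"], 100)) {
    // Prints each URL with the time elapsed since the loop started.
    console.log(url, `${Date.now() - start} ms`);
  }
})();
```

With this design the consumer controls the pace: the next pause only starts when the loop asks for the next URL.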
Besides the examples above, here is one that picks a random link from the page:
```javascript
import { crawlSite } from "crawling";

// Collect every link, then print one chosen at random.
const links = [];
for await (const url of crawlSite("https://github.com/", 500)) {
  links.push(url);
}
console.log(links[Math.floor(Math.random() * links.length)]);
```
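To visit every collected link in random order, rather than picking a single one, a `shuffle` helper is needed: JavaScript has no built-in one, and this README does not document one in `crawling`. A minimal Fisher-Yates sketch (the name `shuffle` is an assumption) that returns a shuffled copy and leaves the input array unchanged:

```javascript
// Fisher-Yates shuffle: returns a new array with the elements of `items`
// in random order; the input array is not modified.
function shuffle(items) {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // 0 <= j <= i
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

const sample = ["https://github.com/a", "https://github.com/b", "https://github.com/c"];
console.log(shuffle(sample));
```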