Redis Spidey - Distributed Web Scraping Solution Powered by Redis

RedisSpidey combines Spidey with Redis to support distributed crawling and web scraping. Multiple RedisSpidey instances can run in parallel, all listening to the same Redis queue. RedisSpidey can also push scraped data back to a Redis queue, so post-processing can be distributed as well.

Features

Distributed Crawling: Multiple crawler instances can run at the same time, all listening to the same queue.
RedisPipeline: Pushes crawled data back to a Redis queue for distributed post-processing.
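
As a rough illustration of the distributed-crawling idea, the sketch below models the shared queue in memory instead of in Redis. `SharedQueue` and `runWorkers` are hypothetical names made up for this example and are not part of the spidey-redis API; FIFO ordering and round-robin turns are assumptions as well. The only point is that each URL is removed from the queue by exactly one worker.

```typescript
// In-memory stand-in for a shared Redis queue (assumption: FIFO semantics).
class SharedQueue {
  private items: string[] = [];

  push(...urls: string[]): void {
    this.items.push(...urls);
  }

  // Removes and returns the oldest URL, or null when the queue is empty.
  pop(): string | null {
    return this.items.length > 0 ? (this.items.shift() as string) : null;
  }
}

// Lets `workerCount` hypothetical workers take turns popping from one queue.
// Returns, for each worker, the URLs that worker handled.
function runWorkers(queue: SharedQueue, workerCount: number): string[][] {
  const handled: string[][] = [];
  for (let i = 0; i < workerCount; i++) {
    handled.push([]);
  }
  let turn = 0;
  for (let url = queue.pop(); url !== null; url = queue.pop()) {
    handled[turn % workerCount].push(url);
    turn++;
  }
  return handled;
}

const queue = new SharedQueue();
queue.push('https://example.com/a', 'https://example.com/b', 'https://example.com/c');
const handled = runWorkers(queue, 2);
console.log(handled);
```

Because popping removes the item, no URL ends up in more than one worker's list, which is the property that lets several real instances share one queue.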

Installation

npm install spidey-redis

Options

RedisSpidey supports all Spidey options in addition to the following specific options.

Option	Type	Description	Default	Required
`redisUrl`	`string`	Redis URL, such as `redis://localhost:6379`	`null`	Yes
`urlsKey`	`string`	Name of the Redis input queue, such as `urls:queue`	`null`	Yes
`dataKey`	`string`	Name of the Redis output queue, such as `data:queue`	`null`	Yes, if using RedisPipeline
`sleepDelay`	`number`	Time in milliseconds to wait for new items when the input queue is empty	`5000`	No
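
The table above can be read as a small set of validation and defaulting rules. The sketch below spells them out under stated assumptions: `RedisOptions`, `ResolvedRedisOptions`, and `resolveRedisOptions` are hypothetical names for this example only, `sleepDelay` is assumed to be in milliseconds, and the library's own handling of missing options may differ.

```typescript
// Hypothetical types mirroring the options table; not taken from the library source.
interface RedisOptions {
  redisUrl?: string;
  urlsKey?: string;
  dataKey?: string;
  sleepDelay?: number; // assumed to be milliseconds
}

interface ResolvedRedisOptions {
  redisUrl: string;
  urlsKey: string;
  dataKey: string | null;
  sleepDelay: number;
}

const DEFAULT_SLEEP_DELAY = 5000;

// Checks the required options and fills in the documented default for sleepDelay.
function resolveRedisOptions(
  options: RedisOptions,
  usesRedisPipeline: boolean,
): ResolvedRedisOptions {
  if (!options.redisUrl) {
    throw new Error('redisUrl is required');
  }
  if (!options.urlsKey) {
    throw new Error('urlsKey is required');
  }
  if (usesRedisPipeline && !options.dataKey) {
    throw new Error('dataKey is required when RedisPipeline is used');
  }
  return {
    redisUrl: options.redisUrl,
    urlsKey: options.urlsKey,
    dataKey: options.dataKey !== undefined ? options.dataKey : null,
    sleepDelay: options.sleepDelay !== undefined ? options.sleepDelay : DEFAULT_SLEEP_DELAY,
  };
}

const resolved = resolveRedisOptions(
  { redisUrl: 'redis://localhost:6379', urlsKey: 'urls:queue' },
  false,
);
console.log(resolved.sleepDelay);
```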

Usage

import { RedisSpidey, RedisPipeline } from 'spidey-redis';

class AmazonSpidey extends RedisSpidey {
  constructor() {
    super({
      // spidey options ...
      redisUrl: 'redis://localhost:6379',

      // Input queue
      urlsKey: 'amazon:urls',

      // Output queue
      dataKey: 'amazon:data',

      // Redis pipeline to push crawled data to data queue 
      pipelines: [RedisPipeline],
    });
  }
}
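
Once crawled data lands in the output queue, a separate process can pick it up for post-processing. The sketch below shows one possible shape for such a consumer. It assumes, for illustration only, that each queue entry is a JSON string; the serialization actually used by RedisPipeline is not specified here. `ScrapedItem`, `drainItems`, and the in-memory `pending` array (standing in for the `amazon:data` queue) are hypothetical.

```typescript
// Hypothetical record shape; real items depend on what the spider emits.
interface ScrapedItem {
  url: string;
  title: string;
}

// Pops entries until the queue is empty, skipping any that are not valid JSON.
// `pop` stands in for a Redis pop on the output queue (assumption).
function drainItems(pop: () => string | null): ScrapedItem[] {
  const items: ScrapedItem[] = [];
  for (let raw = pop(); raw !== null; raw = pop()) {
    try {
      items.push(JSON.parse(raw) as ScrapedItem);
    } catch (err) {
      // Malformed entry: ignored in this sketch.
    }
  }
  return items;
}

// In-memory stand-in for the 'amazon:data' queue.
const pending: string[] = [
  JSON.stringify({ url: 'https://example.com/p/1', title: 'First' }),
  'not json',
  JSON.stringify({ url: 'https://example.com/p/2', title: 'Second' }),
];
const items = drainItems(() => (pending.length > 0 ? (pending.shift() as string) : null));
console.log(items.length);
```

Passing `pop` in as a function keeps the parsing logic independent of any particular Redis client, so the same function could be wired to a real queue later.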

Conclusion

RedisSpidey is aimed at large-scale scraping jobs that need to be spread across several crawler instances. It uses Redis queues both to feed URLs to the crawlers and to collect scraped data for distributed post-processing.

License

Spidey is licensed under the MIT License.
