This project is a distributed web crawler designed specifically to crawl data from Quora.com.
The server uses message-queue middleware: either RabbitMQ or Redis.
Using a virtual environment is not recommended, as it makes the tasks impossible to run.
| Dependency | Version |
|---|---|
| Python | 3.11 or higher |
| RabbitMQ | latest |
| Redis | latest |
```shell
git clone https://github.com/LxYxvv/quora_distributed_crawler.git
cd quora_distributed_crawler
pip install -r requirements.txt
```
```shell
cd quora_distributed_crawler/server
python main.py
```
Set `broker_url` in the `config.py` file to the address of your message-queue middleware.
Set `worker_concurrency` to 2 so that only two worker processes run, preventing crawler requests from being sent too often or in too large a volume.
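Putting the two settings together, a minimal `config.py` might look like the sketch below. The broker addresses are placeholders for your own middleware, and the file may contain other settings not shown here:

```python
# config.py -- a minimal sketch; replace the broker address with your own.
# Use an amqp:// URL for RabbitMQ, or a redis:// URL for Redis.
broker_url = "amqp://guest:guest@127.0.0.1:5672//"
# broker_url = "redis://127.0.0.1:6379/0"

# Limit each worker to 2 processes to avoid flooding Quora with requests.
worker_concurrency = 2
```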
Set the `url` in `utils/upload.py` to the address that should receive the crawled data.
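For illustration, `utils/upload.py` could expose the `url` setting as sketched below; the endpoint address, the helper function, and the JSON payload shape are assumptions, not the project's actual code:

```python
import json
import urllib.request

# Replace with the address of the service that should receive crawled data.
url = "http://127.0.0.1:8000/api/upload"

def upload(item: dict) -> int:
    """POST one crawled item as JSON and return the HTTP status code.
    (Hypothetical helper -- the real module may structure this differently.)"""
    body = json.dumps(item).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```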
How do I submit a task to the queue?

Refer to the Celery documentation. You need to configure the same `broker_url` as in the server's `config.py`, then run:

```shell
python main.py
```
To run a worker on Android, download ZeroTermux from https://github.com/hanxinhao000/ZeroTermux/releases and run:

```shell
pkg update && pkg upgrade
pkg install python3
```

Then follow the Installation, Configure worker, and Start worker steps above.