Scrapy Webarchive is a plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.
- Save web crawls in WACZ format, with support for multiple storage backends (local and cloud).
- Crawl against existing WACZ archives.
- Integrate seamlessly with Scrapy’s spider request and response cycle.
- Supports Python 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12.
Documentation is available online at developers.thequestionmark.org/scrapy-webarchive/