The ScrapeOps Scrapy Proxy Middleware is an easy way to integrate the ScrapeOps Proxy API into your Scrapy spiders.
To use the middleware you first need to get a free API key here if you don't have one already.
To install the ScrapeOps Scrapy Proxy Middleware simply use pip:
pip install scrapeops-scrapy-proxy-sdk
Integrate into your Scrapy project by updating your settings.py
file:
SCRAPEOPS_API_KEY = 'YOUR_API_KEY'
SCRAPEOPS_PROXY_ENABLED = True
DOWNLOADER_MIDDLEWARES = {
'scrapeops_scrapy_proxy_sdk.scrapeops_scrapy_proxy_sdk.ScrapeOpsScrapyProxySdk': 725,
}
Now when you run your spiders, the requests will be automatically sent through the ScrapeOps Proxy API.
The ScrapeOps Proxy API supports a range of more advanced features that you can enable by adding extra query parameters to your request.
To enable them using the ScrapeOps Scrapy Proxy Middleware you can do so using 3 methods:
You can apply the proxy setting to every spider that runs in your project by adding a SCRAPEOPS_PROXY_SETTINGS
dictionary to your settings.py
file with the extra features you want to enable.
SCRAPEOPS_API_KEY = 'YOUR_API_KEY'
SCRAPEOPS_PROXY_ENABLED = True
SCRAPEOPS_PROXY_SETTINGS = {'country': 'us'}
DOWNLOADER_MIDDLEWARES = {
'scrapeops_scrapy_proxy_sdk.scrapeops_scrapy_proxy_sdk.ScrapeOpsScrapyProxySdk': 725,
}
You can apply the proxy setting to every request a spider makes by adding a SCRAPEOPS_PROXY_SETTINGS
dictionary to the custom_settings
attribute in your spider with the extra features you want to enable.
import scrapy
class QuotesSpider(scrapy.Spider):
name = "quotes"
custom_settings = {
'SCRAPEOPS_PROXY_SETTINGS': {'country': 'us'}
}
def start_requests(self):
urls = [
'https://quotes.toscrape.com/page/1/',
'https://quotes.toscrape.com/page/2/',
]
for url in urls:
yield scrapy.Request(url=url, callback=self.parse)
def parse(self, response):
pass
You can apply the proxy setting to each individual request a spider makes by adding the extra features you want to enable to the meta
parameter of each request.
When using this method you need to add 'sops_' to the start of the feature you key you want to enable. So to enable 'country': 'uk'
, you would use 'sops_country': 'uk'
.
import scrapy
class QuotesSpider(scrapy.Spider):
name = "quotes"
def start_requests(self):
urls = [
'https://quotes.toscrape.com/page/1/',
'https://quotes.toscrape.com/page/2/',
]
for url in urls:
yield scrapy.Request(url=url, meta={'sops_country': 'uk'}, callback=self.parse)
def parse(self, response):
pass
A full list of advanced features can be found here.
When using Scrapy with the ScrapeOps Proxy you need to make sure you don't exceed your concurrency limit of the plan you are using.
For example, if you were using the Free Plan which has a concurrency limit of 1 thread, then you would set CONCURRENT_REQUESTS=1
in your settings.py
file.
For maximum performance you would also ensure that DOWNLOAD_DELAY
is set to zero in your settings.py
file (this is the default setting).
## settings.py
CONCURRENT_REQUESTS=1
DOWNLOAD_DELAY=0