
Retry won't pick a new proxy. #15

Open
HGYD opened this issue Sep 20, 2016 · 9 comments

HGYD commented Sep 20, 2016

Hi,
I use a proxy list to run my spider. However, it fails to pick a new proxy when a connection failure happens.

2016-09-20 17:48:25 [scrapy] DEBUG: Using proxy http://xxx.160.162.95:8080, 3 proxies left
2016-09-20 17:48:27 [scrapy] INFO: Removing failed proxy http://xxx.160.162.95:8080, 2 proxies left
2016-09-20 17:48:27 [scrapy] DEBUG: Retrying <GET http://jsonip.com/> (failed 1 times): User timeout caused connection failure: Getting http://jsonip.com/ took longer than 2.0 seconds..
2016-09-20 17:48:29 [scrapy] INFO: Removing failed proxy http://xxx.160.162.95:8080, 2 proxies left
2016-09-20 17:48:29 [scrapy] DEBUG: Retrying <GET http://jsonip.com/> (failed 2 times): User timeout caused connection failure: Getting http://jsonip.com/ took longer than 2.0 seconds..
2016-09-20 17:48:31 [scrapy] INFO: Removing failed proxy http://xxx.160.162.95:8080, 2 proxies left
2016-09-20 17:48:31 [scrapy] DEBUG: Gave up retrying <GET http://jsonip.com/> (failed 3 times): User timeout caused connection failure: Getting http://jsonip.com/ took longer than 2.0 seconds..

Please help fix this problem.
Thanks a lot.

HGYD commented Sep 20, 2016

The problem may be caused by this code:

if 'proxy' in request.meta:
    return

I deleted that code and it fixed the problem.

I think that when a request is retried, it already has a proxy in its request.meta, so the middleware just returns early without assigning a new one.
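
For illustration, here is a minimal sketch of that idea as a standalone downloader middleware (my own simplification, not the actual scrapy-proxies code; the class name and constructor are hypothetical). Instead of returning early whenever request.meta already holds a proxy, it checks the retry_times meta key, which Scrapy's built-in RetryMiddleware sets on retried requests, and re-picks a proxy on retries:

import random

class RandomProxyMiddleware(object):
    def __init__(self, proxies):
        self.proxies = proxies  # e.g. ['http://1.2.3.4:8080', ...]

    def process_request(self, request, spider):
        # retry_times is set by Scrapy's built-in RetryMiddleware, so it
        # is only present once the request has failed at least once.
        is_retry = request.meta.get('retry_times', 0) > 0
        if 'proxy' in request.meta and not is_retry:
            return  # first attempt: keep the proxy already chosen
        # retry (or first assignment): pick a fresh proxy
        request.meta['proxy'] = random.choice(self.proxies)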

@watermelonjuice

Same issue

astwyg commented Oct 11, 2016

same +1

@watermelonjuice

@aivarsk any chance you can update us on this?

IvanIrk commented Dec 26, 2016

Same issue

IvanIrk commented Dec 26, 2016

HGYD's solution works for me.

flash5 commented Jul 6, 2017

I have a similar issue with selecting a new proxy, as mentioned above. In my case, execution stops after the retry attempts are exhausted. Please suggest a solution.

Here is the backtrace:

2017-07-06 18:54:45 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6042
2017-07-06 18:54:45 [scrapy.core.engine] INFO: Spider opened
2017-07-06 18:54:45 [scrapy.proxies] DEBUG: Using proxy http://72.169.78.1:87, 200 proxies left
2017-07-06 18:54:59 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 1 times): 403 Forbidden
2017-07-06 18:55:09 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 2 times): 403 Forbidden
2017-07-06 18:55:17 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
2017-07-06 18:55:24 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:24 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 4 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
2017-07-06 18:55:31 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:31 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 5 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
2017-07-06 18:55:45 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 6 times): 403 Forbidden
2017-07-06 18:55:53 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:53 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 7 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
2017-07-06 18:56:01 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 8 times): 403 Forbidden
2017-07-06 18:56:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 9 times): 403 Forbidden
2017-07-06 18:56:33 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:56:33 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 10 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
2017-07-06 18:56:41 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:56:41 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://xyz.com> (failed 11 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
Traceback (most recent call last):
  File "/usr/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 149, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 156, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/python2.7/site-packages/scrapy/commands/shell.py", line 73, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "/usr/lib/python2.7/site-packages/scrapy/shell.py", line 48, in start
    self.fetch(url, spider, redirect=redirect)
  File "/usr/lib/python2.7/site-packages/scrapy/shell.py", line 115, in fetch
    reactor, self._schedule, request, spider)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]

@shadow-ru

Use errbacks in Requests:

def start_requests(self):
    # ...
    yield scrapy.Request(url=url, callback=self.parse, errback=self.make_new_request)

def make_new_request(self, failure):
    # On a download error, re-issue the request from scratch; the proxy
    # middleware will then assign it a proxy again. dont_filter=True is
    # needed because the duplicate filter has already seen this URL.
    return scrapy.Request(url=failure.request.url, callback=self.parse,
                          errback=self.make_new_request, dont_filter=True)
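
One caveat worth noting (my addition, not from the thread): as written, the errback re-issues the request unconditionally, so the spider can loop forever if every proxy fails for that URL. A variant that caps the number of re-issues with a counter in meta (the errback_attempts key is my own invention):

def make_new_request(self, failure):
    # Stop re-issuing after a fixed number of attempts so the spider
    # cannot loop forever when all proxies fail for this URL.
    attempts = failure.request.meta.get('errback_attempts', 0)  # hypothetical key
    if attempts >= 5:
        return  # give up on this URL
    return scrapy.Request(url=failure.request.url, callback=self.parse,
                          errback=self.make_new_request, dont_filter=True,
                          meta={'errback_attempts': attempts + 1})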

wvengen commented Jan 3, 2020

What about setting a new proxy if a retry has happened? On line 81:

# Don't overwrite with a random one (server-side state for IP).
# But when randomizing every request, we do want to update the proxy on retry.
if not (self.mode == Mode.RANDOMIZE_PROXY_EVERY_REQUESTS and request.meta.get('retry_times', 0) > 0):
    if 'proxy' in request.meta:
        if request.meta["exception"] is False:
            return
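
If I read the middleware correctly, this guard wraps the bare `if 'proxy' in request.meta:` check in RandomProxy.process_request: in RANDOMIZE_PROXY_EVERY_REQUESTS mode a retried request (retry_times > 0, set by Scrapy's RetryMiddleware) falls through and gets a fresh proxy, while first attempts and the other modes keep the proxy they already have.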
