
Retry won't pick a new proxy. #15

Open
HGYD opened this issue Sep 20, 2016 · 9 comments

HGYD commented Sep 20, 2016

Hi,
I use a proxy list to run my spider. However, it fails to pick a new proxy when a connection failure happens.

2016-09-20 17:48:25 [scrapy] DEBUG: Using proxy http://xxx.160.162.95:8080, 3 proxies left
2016-09-20 17:48:27 [scrapy] INFO: Removing failed proxy http://xxx.160.162.95:8080, 2 proxies left
2016-09-20 17:48:27 [scrapy] DEBUG: Retrying <GET http://jsonip.com/> (failed 1 times): User timeout caused connection failure: Getting http://jsonip.com/ took longer than 2.0 seconds..
2016-09-20 17:48:29 [scrapy] INFO: Removing failed proxy http://xxx.160.162.95:8080, 2 proxies left
2016-09-20 17:48:29 [scrapy] DEBUG: Retrying <GET http://jsonip.com/> (failed 2 times): User timeout caused connection failure: Getting http://jsonip.com/ took longer than 2.0 seconds..
2016-09-20 17:48:31 [scrapy] INFO: Removing failed proxy http://xxx.160.162.95:8080, 2 proxies left
2016-09-20 17:48:31 [scrapy] DEBUG: Gave up retrying <GET http://jsonip.com/> (failed 3 times): User timeout caused connection failure: Getting http://jsonip.com/ took longer than 2.0 seconds..

Please help fix this problem.
Thanks a lot.

HGYD commented Sep 20, 2016

The problem may be caused by this code:

if 'proxy' in request.meta:
    return

I deleted that code and it fixed the problem.

I think that when a request is retried, it already has a proxy in its request.meta, so the middleware just returns early without assigning a new one.
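
For illustration, here is a minimal sketch of that idea as a standalone downloader middleware (my own simplification, not the actual scrapy-proxies code; the class name and constructor are hypothetical). Instead of returning early whenever request.meta already holds a proxy, it checks the retry_times meta key, which Scrapy's built-in RetryMiddleware sets on retried requests, and re-picks a proxy on retries:

import random

class RandomProxyMiddleware(object):
    def __init__(self, proxies):
        self.proxies = proxies  # e.g. ['http://1.2.3.4:8080', ...]

    def process_request(self, request, spider):
        # retry_times is set by Scrapy's built-in RetryMiddleware, so it
        # is only present once the request has failed at least once.
        is_retry = request.meta.get('retry_times', 0) > 0
        if 'proxy' in request.meta and not is_retry:
            return  # first attempt: keep the proxy already chosen
        # retry (or first assignment): pick a fresh proxy
        request.meta['proxy'] = random.choice(self.proxies)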

@watermelonjuice

Same issue

astwyg commented Oct 11, 2016

same +1

@watermelonjuice

@aivarsk any chance you can update us on this?

IvanIrk commented Dec 26, 2016

Same issue

IvanIrk commented Dec 26, 2016

HGYD's solution works for me.

flash5 commented Jul 6, 2017

I have a similar issue with selecting a new proxy, as mentioned above. In my case, execution stops after the retry attempts are exhausted. Please suggest a solution.

Here is the backtrace:

2017-07-06 18:54:45 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6042
2017-07-06 18:54:45 [scrapy.core.engine] INFO: Spider opened
2017-07-06 18:54:45 [scrapy.proxies] DEBUG: Using proxy http://72.169.78.1:87, 200 proxies left
2017-07-06 18:54:59 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 1 times): 403 Forbidden
2017-07-06 18:55:09 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 2 times): 403 Forbidden
2017-07-06 18:55:17 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 3 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
2017-07-06 18:55:24 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:24 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 4 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
2017-07-06 18:55:31 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:31 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 5 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
2017-07-06 18:55:45 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 6 times): 403 Forbidden
2017-07-06 18:55:53 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:53 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 7 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
2017-07-06 18:56:01 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 8 times): 403 Forbidden
2017-07-06 18:56:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 9 times): 403 Forbidden
2017-07-06 18:56:33 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:56:33 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 10 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
2017-07-06 18:56:41 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:56:41 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://xyz.com> (failed 11 times): [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]
Traceback (most recent call last):
  File "/usr/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 149, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 156, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/python2.7/site-packages/scrapy/commands/shell.py", line 73, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "/usr/lib/python2.7/site-packages/scrapy/shell.py", line 48, in start
    self.fetch(url, spider, redirect=redirect)
  File "/usr/lib/python2.7/site-packages/scrapy/shell.py", line 115, in fetch
    reactor, self._schedule, request, spider)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "<string>", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion.>]

@shadow-ru

Use errbacks in Requests:

def start_requests(self):
    # ...
    yield scrapy.Request(url=url, callback=self.parse, errback=self.make_new_request)

def make_new_request(self, failure):
    # On a download error, re-issue the request from scratch; the proxy
    # middleware will then assign it a proxy again. dont_filter=True is
    # needed because the duplicate filter has already seen this URL.
    return scrapy.Request(url=failure.request.url, callback=self.parse,
                          errback=self.make_new_request, dont_filter=True)
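
One caveat worth noting (my addition, not from the thread): as written, the errback re-issues the request unconditionally, so the spider can loop forever if every proxy fails for that URL. A variant that caps the number of re-issues with a counter in meta (the errback_attempts key is my own invention):

def make_new_request(self, failure):
    # Stop re-issuing after a fixed number of attempts so the spider
    # cannot loop forever when all proxies fail for this URL.
    attempts = failure.request.meta.get('errback_attempts', 0)  # hypothetical key
    if attempts >= 5:
        return  # give up on this URL
    return scrapy.Request(url=failure.request.url, callback=self.parse,
                          errback=self.make_new_request, dont_filter=True,
                          meta={'errback_attempts': attempts + 1})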

wvengen commented Jan 3, 2020

What about setting a new proxy if a retry has happened? On line 81:

# Don't overwrite with a random one (server-side state for IP).
# But when randomizing every request, we do want to update the proxy on retry.
if not (self.mode == Mode.RANDOMIZE_PROXY_EVERY_REQUESTS and request.meta.get('retry_times', 0) > 0):
    if 'proxy' in request.meta:
        if request.meta["exception"] is False:
            return
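
If I read the middleware correctly, this guard wraps the bare `if 'proxy' in request.meta:` check in RandomProxy.process_request: in RANDOMIZE_PROXY_EVERY_REQUESTS mode a retried request (retry_times > 0, set by Scrapy's RetryMiddleware) falls through and gets a fresh proxy, while first attempts and the other modes keep the proxy they already have.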
