This repository has been archived by the owner on Sep 28, 2022. It is now read-only.

Spider closes on exception #5

Open

samos123 opened this issue May 18, 2013 · 4 comments

Comments

@samos123
Contributor

If an exception is raised in a parse method handling a WebdriverResponse/WebdriverRequest, the whole spider closes/exits and does not continue.

Steps to reproduce:
Raise an exception in any of your parse methods that handles a WebdriverResponse.

Current result:
Scrapy stops crawling.

Expected result:
Scrapy continues crawling the next requests/URLs.

When parsing a normal Scrapy Request/Response, raising an error seems to let the crawl continue. I only did some quick testing on this, so I may be wrong. This is a related error log:

2013-05-18 00:10:43+0800 [xxxxxx] ERROR: Spider error processing <GET http://item.xxxxx.com/>
        Traceback (most recent call last):
          File "/usr/lib/python2.7/site-packages/twisted/internet/base.py", line 824, in runUntilCurrent
            call.func(*call.args, **call.kw)
          File "/usr/lib/python2.7/site-packages/twisted/internet/task.py", line 607, in _tick
            taskObj._oneWorkUnit()
          File "/usr/lib/python2.7/site-packages/twisted/internet/task.py", line 484, in _oneWorkUnit
            result = next(self._iterator)
          File "/home/samos/.virtualenvs/scrapy/lib/python2.7/site-packages/scrapy/utils/defer.py", line 57, in <genexpr>
            work = (callable(elem, *args, **named) for elem in iterable)
        --- <exception caught here> ---
          File "/home/samos/.virtualenvs/scrapy/lib/python2.7/site-packages/scrapy/utils/defer.py", line 96, in iter_errback
            yield it.next()
          File "/home/samos/.virtualenvs/scrapy/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/offsite.py", line $8, in process_spider_output
            for x in result:
          File "/home/samos/.virtualenvs/scrapy/lib/python2.7/site-packages/scrapy_webdriver/middlewares.py", line 36, in process_spider_output
            for item_or_request in self._process_requests(result):
          File "/home/samos/.virtualenvs/scrapy/lib/python2.7/site-packages/scrapy_webdriver/middlewares.py", line 51, in _process_requests
            for request in iter(items_or_requests):
          File "/home/samos/.virtualenvs/scrapy/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/referer.py", line $2, in <genexpr>
            return (_set_referer(r) for r in result or ())
          File "/home/samos/.virtualenvs/scrapy/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/urllength.py", line 33, in <genexpr>
            return (r for r in result or () if _filter(r))
          File "/home/samos/.virtualenvs/scrapy/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/depth.py", line 50, in <genexpr>
            return (r for r in result or () if _filter(r))
          File "/home/samos/workspace/alex-scrapy/crawler/spiders/xxx_spider.py", line 50, in parse_item
            raise Exception("test")
        exceptions.Exception: test
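A plausible reading of this traceback: the spider middlewares chain generator expressions over the callback's output, and a Python generator that raises an exception is finished for good, so every result after the failing callback is lost. This is a minimal stdlib-only sketch of that behavior (no Scrapy involved; the names are illustrative):

```python
def parse_results():
    """Stands in for a spider callback that fails partway through."""
    yield "item-1"
    raise Exception("test")   # simulates the raise in parse_item
    yield "item-2"            # never reached

collected = []
error = None
it = parse_results()
try:
    for x in it:
        collected.append(x)
except Exception as e:
    error = e

# A generator that raised cannot be resumed: it is permanently exhausted,
# so iterating it again yields nothing.
print(collected)        # ['item-1']
print(next(it, None))   # None
```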
@stringertheory

I've been trying to figure this out, and I thought the issue might be that the lock on the webdriver instance was not getting released when there is an exception in the parse method (it is released in the process_spider_output method when the parse method succeeds). However, I tried adding a process_spider_exception method:

def process_spider_exception(self, response, exception, spider):
    # Release the webdriver lock for the failed request so the crawl can go on.
    if isinstance(response.request, WebdriverRequest):
        self.manager.release(response.request.url)
        return None

with no luck. The first exception is clearly getting logged by the handle_spider_error method in https://github.com/scrapy/scrapy/blob/master/scrapy/core/scraper.py, but I can't follow the scrapy source code through all of the callbacks/errbacks well enough to understand.
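The suspected failure mode can be sketched with a toy lock manager standing in for scrapy-webdriver's real one (all names here are illustrative, not the real API): if the release only happens after the callback's output has been fully consumed, an exception in the callback skips it and the lock leaks.

```python
class ToyWebdriverManager:
    """Illustrative stand-in for scrapy-webdriver's per-URL lock manager."""
    def __init__(self):
        self.locked = set()

    def acquire(self, url):
        self.locked.add(url)

    def release(self, url):
        self.locked.discard(url)

manager = ToyWebdriverManager()

def process_spider_output(url, parse):
    # Mirrors the suspected middleware flow: the lock is only released
    # after the parse output has been consumed without error.
    manager.acquire(url)
    result = list(parse(url))   # an exception here skips the release below
    manager.release(url)
    return result

def failing_parse(url):
    raise Exception("test")
    yield  # makes this a generator, like a Scrapy callback

error = None
try:
    process_spider_output("http://item.example.com/", failing_parse)
except Exception as e:
    error = e

# The URL is still locked, so a webdriver crawl would stall on it.
print(manager.locked)  # {'http://item.example.com/'}
```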

@ncadou
Collaborator

ncadou commented Jul 11, 2013

If you could submit a pull request with a failing test case, that'd be awesome.

@stringertheory

I'll add a test case as soon as I can figure out how to do it. My attempts to add one keep failing miserably with ReactorNotRestartable errors. Any suggestions for good resources for understanding Twisted?

@ncadou
Collaborator

ncadou commented Jul 13, 2013

Not that I know of. To make sense of Twisted, I looked at its documentation and at the Scrapy source code, and googled the specific problems I encountered. This blog post was also useful: http://jessenoller.com/blog/2009/02/11/twisted-hello-asynchronous-programming

tonal pushed a commit to tonal/scrapy-webdriver that referenced this issue Apr 14, 2017