
[Question] Fetch request url from redis fail #285

Open
KokoTa opened this issue Aug 13, 2023 · 8 comments

KokoTa commented Aug 13, 2023

Description

If I insert the start URL into Redis before running Scrapy, it works.

But if I run Scrapy first and then insert the URL, the idle handler fails with:

2023-08-13 17:11:59 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method RedisMixin.spider_idle of <TestHtmlSpider 'test_html' at 0x2b05c4162d0>>
Traceback (most recent call last):
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\utils\signal.py", line 43, in send_catch_log
    response = robustApply(
               ^^^^^^^^^^^^
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_redis\spiders.py", line 208, in spider_idle
    self.schedule_next_requests()
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_redis\spiders.py", line 197, in schedule_next_requests
    self.crawler.engine.crawl(req, spider=self)
TypeError: ExecutionEngine.crawl() got an unexpected keyword argument 'spider'

I can't feed URLs dynamically, and Scrapy crashes.
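The TypeError in the traceback comes from a change in the engine's crawl() signature. A minimal stand-in illustrates the failure mode (these classes are illustrative stubs, not Scrapy's real ones):

```python
class OldEngine:
    """Mimics ExecutionEngine before Scrapy 2.10: a `spider` keyword is accepted."""
    def crawl(self, request, spider=None):
        return request

class NewEngine:
    """Mimics ExecutionEngine in Scrapy 2.10+: the `spider` keyword is gone."""
    def crawl(self, request):
        return request

OldEngine().crawl("req", spider="me")      # fine on older versions
try:
    NewEngine().crawl("req", spider="me")  # fails like the traceback above
except TypeError as err:
    print(err)  # message mirrors the one in the log
```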


Shleif91 commented Sep 4, 2023

Same error... Found a solution?


gc1423 commented Sep 14, 2023

Passing a spider argument to the crawl() method of scrapy.core.engine.ExecutionEngine is no longer supported as of Scrapy 2.10.0 (see the release notes).

As a workaround, pin Scrapy to 2.9.0.
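If pinning is acceptable, one way to do it (assuming a requirements.txt workflow) is:

```text
# requirements.txt: last Scrapy release whose engine.crawl() still accepts `spider`
scrapy==2.9.0
```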

RoyHuang2 added a commit to RoyHuang2/scrapy-redis that referenced this issue Nov 13, 2023
@GeorgeA92

It looks like pull request #286, which fixes this, has existed since August.
The fix can easily be applied to an app on the current scrapy-redis version by overriding the schedule_next_requests method:

from scrapy import version_info as scrapy_version  # needed for the check below

class SomeSpider(RedisSpider):
    ## vvv add this override to the spider code
    def schedule_next_requests(self):
        """Schedules a request if available"""
        # TODO: While there is capacity, schedule a batch of redis requests.
        for req in self.next_requests():
            # see https://github.com/scrapy/scrapy/issues/5994
            if scrapy_version >= (2, 6):
                self.crawler.engine.crawl(req)
            else:
                self.crawler.engine.crawl(req, spider=self)
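The version check above relies on Python's tuple comparison against scrapy.version_info. A stdlib-only sketch of the same dispatch pattern (the engine classes here are stand-ins, not Scrapy's):

```python
def crawl_compat(engine, request, spider, version):
    """Dispatch to the right engine.crawl() signature based on a version tuple."""
    if version >= (2, 6):  # `spider` argument deprecated in 2.6, removed in 2.10
        return engine.crawl(request)
    return engine.crawl(request, spider=spider)

class ModernEngine:
    def crawl(self, request):
        return ("no-spider", request)

class LegacyEngine:
    def crawl(self, request, spider):
        return ("with-spider", request, spider)

print(crawl_compat(ModernEngine(), "req", None, (2, 11, 0)))  # ('no-spider', 'req')
print(crawl_compat(LegacyEngine(), "req", "sp", (2, 5, 1)))   # ('with-spider', 'req', 'sp')
```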

@xuexingdong

Hoping the fixed version is released soon.


jordinl commented May 16, 2024

@rmax would it be possible to release a fix for this? I'm also encountering this issue


migrant commented Jun 18, 2024

The same problem...

@georgeJzzz

@rmax would it be possible to release a fix for this? I'm also encountering this issue. Thanks

Owner

rmax commented Jul 4, 2024

Thank you for your patience. V0.8.0 has been released 🎉
