Read one entry from redis queue #136

Open
@t75bernd

Hello,

First of all: great project, and easy to use.

I have a suggestion for solving a problem I ran into. Let me explain it:

  1. Add 2 URLs to the queue for one spider.
  2. The spider reads both entries and yields each of them as a request.
  3. Sometimes the following then happens in my case: both requests are yielded and each one obtains a session key. The first one to finish returns a bunch of new requests (yielding a new request is not possible), and the second yielded request waits until the first one has crawled an item, which can take several minutes. In the meantime the session key of the second request expires, and it gets rejected when it tries to make a new request.

My idea would be to have an attribute on the spider that lets me specify that only 1 item is read from the queue and yielded as a request per call.

from scrapy_redis.spiders import RedisSpider

class MySpider(RedisSpider):
    yield_1_request = True  # proposed attribute, does not exist yet

And next_requests in spiders.py has to be changed to something like:

if req:
    yield req
    found += 1
    # Stop after the first request if the spider opts in (not in set mode).
    if getattr(self, 'yield_1_request', False) and not use_set:
        break
else:
    self.logger.debug("Request not made from data: %r", data)

This way you could still decide per spider whether this is necessary, and the impact on the code is small. What do you think about the idea and the implementation? Is there anything I could do better? I would also prepare a PR if this is accepted as a feature.
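
To show the intended effect in isolation, here is a minimal, self-contained sketch that uses an in-memory deque instead of redis. `FakeQueueSpider`, `OneAtATimeSpider`, the batch size of 16, and treating a request as a plain URL string are all made up for illustration; this is a simplified stand-in, not the actual scrapy-redis code:

```python
from collections import deque


class FakeQueueSpider:
    """Hypothetical stand-in for a queue-fed spider (not the scrapy-redis class)."""

    redis_batch_size = 16    # assumed number of entries read per call
    yield_1_request = False  # proposed opt-in flag

    def __init__(self, urls):
        self.queue = deque(urls)  # plays the role of the redis list

    def next_requests(self, use_set=False):
        """Yield up to redis_batch_size entries, or only one if the flag is set."""
        found = 0
        while found < self.redis_batch_size:
            if not self.queue:
                break  # queue is empty
            data = self.queue.popleft()
            req = data.strip() or None  # a "request" is just the URL string here
            if req:
                yield req
                found += 1
                # Proposed change: stop after the first request.
                if getattr(self, "yield_1_request", False) and not use_set:
                    break
            else:
                print("Request not made from data: %r" % data)


class OneAtATimeSpider(FakeQueueSpider):
    yield_1_request = True


urls = ["http://example.com/a", "http://example.com/b"]

# Default behaviour: both queued entries come out of a single call.
default_batch = list(FakeQueueSpider(urls).next_requests())

# With the flag: each call hands out exactly one entry.
spider = OneAtATimeSpider(urls)
first_batch = list(spider.next_requests())
second_batch = list(spider.next_requests())
```

With the flag set, the second URL stays in the queue until `next_requests` is called again, so its session key would only be requested once the spider is ready for it.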

Thanks for your help in advance :)
