
scrapy won't create new requests after I pause and resume the crawl, and Redis won't recreate the spidername:requests set, even though I set SCHEDULER_PERSIST = True in settings.py. #149

Vickey-Wu opened this issue Jul 24, 2019:
I paused the crawl (Ctrl+C) and resumed it until the spidername:requests set in Redis reached 0; when the spider closed, the spidername:requests key was deleted.

127.0.0.1:6379> KEYS *
1) "spidername:requests"
2) "spidername:dupefilter"
127.0.0.1:6379> ZCARD spidername:requests
(integer) 7
127.0.0.1:6379> ZCARD spidername:requests
(integer) 0
127.0.0.1:6379> KEYS *
1) "spidername:dupefilter"
127.0.0.1:6379> SCARD spidername:dupefilter
(integer) 26
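
For context, a minimal scrapy-redis persistence setup along these lines is presumably what is configured here (the class paths and setting names are the standard scrapy-redis ones; the Redis URL is an assumption, not taken from the report):

# settings.py -- minimal scrapy-redis persistence sketch (assumed, not the reporter's actual file)
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_PERSIST = True  # keep spidername:requests and spidername:dupefilter across runs
REDIS_URL = "redis://127.0.0.1:6379"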

I set SCHEDULER_PERSIST = True in settings.py and tried to run the spider again, but Redis would not recreate the spidername:requests set; only the dupefilter is still there.

127.0.0.1:6379> KEYS *
1) "spidername:dupefilter"
127.0.0.1:6379> SCARD spidername:dupefilter
(integer) 26
2019-07-24 06:58:05 [scrapy.core.engine] INFO: Spider opened
2019-07-24 06:58:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-07-24 06:58:05 [spidername] INFO: Spider opened: spidername
2019-07-24 06:58:05 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2019-07-24 06:58:16 [root] INFO: ***************** crawling page 2 ***************** 
2019-07-24 06:58:16 [scrapy.core.engine] INFO: Closing spider (finished)
2019-07-24 06:58:16 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 926,
 'downloader/request_count': 4,
 'downloader/request_method_count/GET': 4,
 'downloader/response_bytes': 8868,
 'downloader/response_count': 4,
 'downloader/response_status_count/200': 2,
 'downloader/response_status_count/301': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 7, 24, 6, 58, 16, 951389),
 'log_count/INFO': 11,
 'memusage/max': 53563392,
 'memusage/startup': 53563392,
 'request_depth_max': 1,
 'response_received_count': 2,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/200': 1,
 'scheduler/dequeued/redis': 2,
 'scheduler/enqueued/redis': 2,
 'start_time': datetime.datetime(2019, 7, 24, 6, 58, 5, 486811)}
2019-07-24 06:58:16 [scrapy.core.engine] INFO: Spider closed (finished)
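
The immediate "Closing spider (finished)" looks consistent with the persisted dupefilter: with SCHEDULER_PERSIST = True the spidername:dupefilter set survives the restart, so every previously seen request is filtered on the new run and nothing gets re-enqueued into spidername:requests. If the intent is to re-crawl those URLs, one workaround sketch (assuming the standard scrapy-redis settings above and a local Redis; not a confirmed fix) would be:

# Option 1 (settings.py): flush the persisted queue and dupefilter when the spider opens
SCHEDULER_FLUSH_ON_START = True

# Option 2: delete just the dupefilter key by hand before re-running
import redis
r = redis.Redis(host="127.0.0.1", port=6379)
r.delete("spidername:dupefilter")  # key name taken from the KEYS * output above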