
CentOS 7 Spider not found: SafariBooks #36

Open
anutator opened this issue Feb 24, 2018 · 2 comments

Comments

@anutator

2018-02-24 05:19:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-02-24 05:19:34 [scrapy.utils.log] INFO: Versions: lxml 4.1.1.0, libxml2 2.9.7, cssselect 1.0.3, parsel 1.4.0, w3lib 1.19.0, Twisted 16.4.1, Python 2.7.5 (default, Aug  4 2017, 00:39:18) - [GCC 4.8.5 20150623 (Red Hat 4.8.5-16)], pyOpenSSL 17.5.0 (OpenSSL 1.1.0g  2 Nov 2017), cryptography 2.1.4, Platform Linux-3.10.0-693.11.1.el7.x86_64-x86_64-with-centos-7.4.1708-Core
Traceback (most recent call last):
  File "/usr/bin/safaribooks", line 9, in <module>
    load_entry_point('safaribooks==0.1.0', 'console_scripts', 'safaribooks')()
  File "/usr/lib/python2.7/site-packages/safaribooks/__main__.py", line 121, in main
    args.func(args)
  File "/usr/lib/python2.7/site-packages/safaribooks/__main__.py", line 28, in download_epub
    output_directory=args.output_directory
  File "/usr/lib64/python2.7/site-packages/scrapy/crawler.py", line 170, in crawl
    crawler = self.create_crawler(crawler_or_spidercls)
  File "/usr/lib64/python2.7/site-packages/scrapy/crawler.py", line 198, in create_crawler
    return self._create_crawler(crawler_or_spidercls)
  File "/usr/lib64/python2.7/site-packages/scrapy/crawler.py", line 202, in _create_crawler
    spidercls = self.spider_loader.load(spidercls)
  File "/usr/lib64/python2.7/site-packages/scrapy/spiderloader.py", line 71, in load
    raise KeyError("Spider not found: {}".format(spider_name))
KeyError: 'Spider not found: SafariBooks'

I installed Python 2.7.14 alongside the system Python 2.7.5 (without removing it), following https://tecadmin.net/install-python-2-7-on-centos-rhel/. Do I need to add something to the configuration?
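The last frames of the traceback show where the error comes from: `spiderloader.load` raises `KeyError` when the requested name is not among the spiders the loader discovered. As a rough illustration only, here is a hypothetical, simplified model of such a loader, not Scrapy's actual code. It keeps a registry keyed by each spider's `name` attribute. `SimpleSpiderLoader` and `SafariBooksSpider` are made-up names.

```python
# Hypothetical, simplified model of a Scrapy-style spider loader.
# NOT Scrapy's real implementation: it only shows why load() raises
# KeyError when no spider with the requested name was registered.


class SimpleSpiderLoader(object):
    def __init__(self, spider_classes):
        # Registry keyed by each spider class's `name` attribute.
        self._spiders = {cls.name: cls for cls in spider_classes}

    def list(self):
        # Names of all registered spiders, sorted for stable output.
        return sorted(self._spiders)

    def load(self, spider_name):
        # Unknown names produce the same message seen in the traceback.
        if spider_name not in self._spiders:
            raise KeyError("Spider not found: {}".format(spider_name))
        return self._spiders[spider_name]


class SafariBooksSpider(object):
    # Hypothetical stand-in for the project's spider class.
    name = "SafariBooks"


# No spider modules configured: the registry is empty, so lookups fail.
empty_loader = SimpleSpiderLoader([])

# Spider modules found: the spider is registered and can be loaded.
project_loader = SimpleSpiderLoader([SafariBooksSpider])
```

If the registry is empty, as it presumably is when Scrapy cannot find the project settings that list the spider modules, every lookup fails with the same `Spider not found: SafariBooks` message.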

@anutator

anutator commented Feb 24, 2018

The solution is to run the command from inside the safaribooks directory. However, the resulting EPUB files are very small. (I have replaced the actual book ID in the log below.)

2018-02-24 05:37:38 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <401 https://www.safaribooksonline.com//api/v1/book/book-id-number/chapter/ch01s04.html>: HTTP status code is not handled or not allowed
2018-02-24 05:37:39 [scrapy.core.engine] DEBUG: Crawled (401) <GET https://www.safaribooksonline.com//api/v1/book/book_id/chapter/ch01s03.html> (referer: https://www.safaribooksonline.com/nest/epub/toc/?book_id=book_id_number)
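Running from inside the checkout presumably works because Scrapy searches the current directory and its parents for a `scrapy.cfg` to locate the project settings, including `SPIDER_MODULES`. Without one, the spider loader has nothing to load. Below is a minimal standard-library sketch of that upward search. It assumes the safaribooks checkout has a `scrapy.cfg` at its root, which the thread does not confirm. `find_scrapy_cfg` is a hypothetical helper. Scrapy's own equivalent is `closest_scrapy_cfg` in `scrapy.utils.conf`.

```python
import os


def find_scrapy_cfg(start="."):
    """Return the path of the nearest scrapy.cfg at or above `start`, or None.

    Hypothetical helper mimicking the upward search Scrapy performs
    (scrapy.utils.conf.closest_scrapy_cfg) to find project settings.
    """
    path = os.path.abspath(start)
    while True:
        candidate = os.path.join(path, "scrapy.cfg")
        if os.path.exists(candidate):
            return candidate
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root without a match
            return None
        path = parent


if __name__ == "__main__":
    cfg = find_scrapy_cfg()
    if cfg is None:
        print("No scrapy.cfg found: run the command from inside the safaribooks checkout.")
    else:
        print("Using project config: " + cfg)
```

Run it from the directory where you would launch `safaribooks` to check whether a project configuration is reachable from there.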

@yuankunzhang

I ran into this problem too; thank you @bestann for the solution. This behavior is really confusing and should be improved.
